Arxiv:1803.04680V4 [Cs.CV] 16 Mar 2018 1

Multi-Frame Quality Enhancement for Compressed Video Ren Yang, Mai Xu,∗ Zulin Wang and Tianyi Li School of Electronic and Information Engineering, Beihang University, Beijing, China fyangren, maixu, wzulin, [email protected] HEVC Our MFQE approach PQF Abstract PSNR (dB) PSNR 81 85 90 95 100 105 Frames The past few years have witnessed great success in ap- Compressed frame 93 Compressed frame 96 Compressed frame 97 plying deep learning to enhance the quality of compressed image/video. The existing approaches mainly focus on en- hancing the quality of a single frame, ignoring the similar- ity between consecutive frames. In this paper, we investigate that heavy quality fluctuation exists across compressed MF-CNN of our MFQE approach video frames, and thus low quality frames can be enhanced DS-CNN [42] using the neighboring high quality frames, seen as Multi- Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we Enhanced frame 96 Enhanced frame 96 Raw frame 96 firstly develop a Support Vector Machine (SVM) based de- Figure 1. Example for quality fluctuation (top) and quality enhancement tector to locate Peak Quality Frames (PQFs) in compressed performance (bottom). video. Then, a novel Multi-Frame Convolutional Neural Recently, there has been increasing interest in enhanc- Network (MF-CNN) is designed to enhance the quality of ing the visual quality of compressed image/video [27, 10, compressed video, in which the non-PQF and its nearest 18, 36, 6, 9, 38, 12, 24, 24, 42]. For example, Dong et two PQFs are as the input. The MF-CNN compensates al. [9] designed a four-layer Convolutional Neural Net- motion between the non-PQF and PQFs through the Mo- work (CNN) [22], named AR-CNN, which considerably tion Compensation subnet (MC-subnet). Subsequently, the improves the quality of JPEG images. Later, Yang et al. de- Quality Enhancement subnet (QE-subnet) reduces compres- signed a Decoder-side Scalable CNN (DS-CNN) for video sion artifacts of the non-PQF with the help of its near- quality enhancement [42, 44]. However, when process- est PQFs. Finally, the experiments validate the effective- ing a single frame, all existing quality enhancement ap- ness and generality of our MFQE approach in advanc- proaches do not take any advantage of information in the ing the state-of-the-art quality enhancement of compressed neighbouring frames, and thus their performance is largely video. The code of our MFQE approach is available at limited. As Figure 1 shows, the quality of compressed https://github.com/ryangBUAA/MFQE.git. video dramatically fluctuates across frames. Therefore, it arXiv:1803.04680v4 [cs.CV] 16 Mar 2018 1. Introduction is possible to use the high quality frames (i.e., Peak Qual- 1 During the past decades, video has become significantly ity Frames, called PQFs ) to enhance the quality of their popular over the Internet. According to the Cisco Data Traf- neighboring low quality frames (non-PQFs). This can be fic Forecast [7], video generates 60% of Internet traffic in seen as Multi-Frame Quality Enhancement (MFQE), simi- 2016, and this figure is predicted to reach 78% by 2020. lar to multi-frame super-resolution [20, 3]. When transmitting video over the bandwidth-limited Inter- This paper proposes an MFQE approach for compressed net, video compression has to be applied to significantly video. Specifically, we first investigate that there exists save the coding bit-rate [31, 39, 33, 26]. However, the com- large quality fluctuation across frames, for video sequences pressed video inevitably suffers from compression artifacts compressed by almost all coding standards. Thus, it is nec- [45, 35, 25, 41, 43], which may severely degrade the Qual- essary to find PQFs that can be used to enhance the quality ity of Experience (QoE). Therefore, it is necessary to study of their adjacent non-PQFs. To this end, we train a Sup- on quality enhancement for compressed video. 1PQFs are defined as the frames whose quality is higher than their pre- ∗Mai Xu is the corresponding author of this paper. vious and subsequent frames. 1 port Vector Machine (SVM) as a no-reference method to coding. However, the CNN in [8] was designed as a com- detect PQFs. Then, a novel Multi-Frame CNN (MF-CNN) ponent of the video encoder, so that it is not practical for architecture is proposed for quality enhancement, in which already compressed video. Most recently, a Deep CNN- both the current frame and its adjacent PQFs are as the in- based Auto Decoder (DCAD), which contains 10 CNN lay- puts. Our MF-CNN includes two components, i.e., Motion ers, was proposed in [37] to reduce the distortion of com- Compensation subnet (MC-subnet) and Quality Enhance- pressed video. Moreover, Yang et al. [42] proposed the ment subnet (QE-subnet). The MC-subnet is developed to DS-CNN approach for video quality enhancement. In [42], compensate the motion between the current non-PQF and DS-CNN-I and DS-CNN-B, as two subnetworks of DS- its adjacent PQFs. The QE-subnet, with a spatio-temporal CNN, are used to reduce the artifacts of intra- and inter- architecture, is designed to extract and merge the features coding, respectively. More importantly, the video encoder of the current non-PQF and the compensated PQFs. Finally, does not need to be modified when applying the DCAD [37] the quality of the current non-PQF can be enhanced by QE- and DS-CNN [42] approaches. Nevertheless, all above ap- subnet that takes advantage of the high quality content in the proaches can be seen as single-frame quality enhancement adjacent PQFs. For example, as shown in Figure 1, the cur- approaches, as they do not use any advantageous informa- rent non-PQF (frame 96) and the nearest PQFs (frames 93 tion available in the neighboring frames. Consequently, the and 97) are fed in to the MF-CNN of our MFQE approach. video quality enhancement performance is severely limited. As a result, the low quality content (basketball) of the non- Multi-frame super-resolution. To our best knowledge, PQF (frame 96) can be enhanced upon the same content but there exists no MFQE work for compressed video. The with high quality in the neighboring PQFs (frames 93 and closest area is multi-frame video super-resolution. In the 97). Moreover, Figure 1 shows that our MFQE approach early years, Brandi et al. [2] and Song et al. [32] pro- also mitigates the quality fluctuation, because of the con- posed to enlarge video resolution by taking advantage of siderable quality improvement of non-PQFs. high resolution key-frames. Recently, many multi-frame The main contributions of this paper are: (1) We analyze super-resolution approaches have employed deep neural the frame-level quality fluctuation of video sequences com- networks. For example, Huang et al. [17] developed a Bidi- pressed by various video coding standards. (2) We propose rectional Recurrent Convolutional Network (BRCN), which a novel CNN-based MFQE approach, which can reduce the improves the super-resolution performance over traditional compression artifacts of non-PQFs by making use of the single-frame approaches. In 2016, Kappeler et al. proposed neighboring PQFs. a Video Super-Resolution network (VSRnet) [20], in which 2. Related works the neighboring frames are warped according to the esti- mated motion, and both the current and warped neighbor- Quality enhancement. Recently, extensive works [27, ing frames are fed into a super-resolution CNN to enlarge 10, 36, 18, 19, 6, 9, 12, 38, 24, 4] have focused on enhanc- the resolution of the current frame. Later, Li et al. [23] pro- ing the visual quality of compressed image. Specifically, posed replacing VSRnet by a deeper network with residual Foi et al. [10] applied pointwise Shape-Adaptive DCT (SA- learning strategy. Besides, other deep learning approaches DCT) to reduce the blocking and ringing effects caused by of video super-resolution were proposed in [28, 3]. JPEG compression. Later, Jancsary et al. [18] proposed The aforementioned multi-frame super-resolution ap- reducing JPEG image blocking effects by adopting Regres- proaches are motivated by the fact that different obser- sion Tree Fields (RTF). Moreover, sparse coding was uti- vations of the same objects or scenes are probably avail- lized to remove the JPEG artifacts, such as [19] and [6]. able across frames of a video. As a result, the neighbor- Recently, deep learning has also been successfully applied ing frames may contain the content missed when down- to improve the visual quality of compressed image. Particu- sampling the current frame. Similarly, for compressed larly, Dong et al. [9] proposed a four-layer AR-CNN to re- 3 video, the low quality frames can be enhanced by taking duce the JPEG artifacts of images. Afterwards, D [38] and advantage of their adjacent higher quality frames, because Deep Dual-domain Convolutional Network (DDCN) [12] heavy quality fluctuation exists across compressed frames. were proposed as advanced deep networks for the quality Consequently, the quality of compressed video may be ef- enhancement of JPEG image, utilizing the prior knowledge fectively improved by leveraging the multi-frame informa- of JPEG compression. Later, DnCNN was proposed in [46] tion. To the best of our knowledge, our MFQE approach for several tasks of image restoration, including quality en- proposed in this paper is the first attempt in this direction. hancement. Most recently, Li et al. [24] proposed a 20- layer CNN, achieving the state-of-the-art quality enhance- 3. Quality fluctuation of compressed video ment performance for compressed image. In this section, we analyze the quality fluctuation of com- For the quality enhancement of compressed video, the pressed video alongside the frames.

Arxiv:1803.04680V4 [Cs.CV] 16 Mar 2018 1

A Comprehensive Benchmark for Single Image Compression Artifact Reduction

Learning a Virtual Codec Based on Deep Convolutional Neural

New Image Compression Artifact Measure Using Wavelets

ITC Confplanner DVD Pages Itcconfplanner

Life Science Journal 2013; 10 (2) [email protected] 102 P

Image Compression Using Anti-Forensics Method

Differentiable GIF Encoding Framework

Gifnets: Differentiable GIF Encoding Framework

Review of Artifacts in JPEG Compression and Reduction

Dail and Hammar's Pulmonary Pathology

Progress in Respiratory Research

Better Compression with Deep Pre-Editing