A New Real-Time Embedded Video Denoising Algorithm
To cite this version: Andrea Petreto, Thomas Romera, Ian Masliah, Boris Gaillard, Manuel Bouyer, et al. A New Real-Time Embedded Video Denoising Algorithm. DASIP 2019 - The Conference on Design and Architectures for Signal and Image Processing, Oct 2019, Montréal, Canada. hal-02343597

HAL Id: hal-02343597, https://hal.archives-ouvertes.fr/hal-02343597. Submitted on 2 Nov 2019.

Andrea Petreto∗†, Thomas Romera∗†, Florian Lemaitre∗, Ian Masliah∗, Boris Gaillard†, Manuel Bouyer∗, Quentin L. Meunier∗ and Lionel Lacassagne∗
∗ Sorbonne University, CNRS, LIP6, F-75005 Paris, France. Email: fi[email protected]
† Lhéritier - Alcen, F-95862, Cergy-Pontoise, France. Email: [email protected]

Abstract—Many embedded applications rely on video processing or on video visualization. Noisy video is thus a major issue for such applications. However, video denoising requires a lot of computational effort, and most state-of-the-art algorithms cannot run in real-time at camera framerate. This article introduces a new real-time video denoising algorithm for embedded platforms called RTE-VD. We first compare its denoising capabilities with other online and offline algorithms. We show that RTE-VD can achieve real-time performance (25 frames per second) for qHD video (960×540 pixels) on embedded CPUs, with output image quality comparable to state-of-the-art algorithms. In order to reach real-time denoising, we applied several high-level transforms and optimizations (SIMDization, multi-core parallelization, operator fusion and pipelining). We study the relation between computation time and power consumption on several embedded CPUs and show that it is possible to determine different frequency and core configurations in order to minimize either the computation time or the energy.

Index Terms—Embedded system, Video denoising, Real-time, Image processing, night-vision, SIMD.

I. INTRODUCTION

Image denoising is a major research area for computer vision. It can be based on methods such as pixel-wise filters [1]–[3], patch-based methods [4], [5] or Convolutional Neural Networks (CNN) [6]–[8]. Video denoising can take advantage of the temporal dimension to reduce the noise more efficiently.

Denoising algorithms can be classified according to two criteria: processing time and denoising efficiency. In this work, we focus on video with heavy noise, and we target real-time processing at 25 frames per second (fps) on embedded systems. Concerning image quality, we focus on detail enhancement rather than aesthetic rendering.

Some video denoising algorithms are able to deal with heavy noise [9]–[14]. However, all these algorithms are slow and cannot be used for real-time denoising. Moreover, they have a high latency, as they need future frames (typically 5 to 7) to process the current one.

There are algorithms designed for real-time denoising [15] on embedded architectures [16], but only for light noise such as compression artifacts; they give poor results on heavy noise. As far as we know, there is no real-time algorithm able to handle heavily noisy video as in this work.

In this paper, we introduce a new Real-Time Embedded Video Denoising (RTE-VD) algorithm targeting very noisy video in section II, which has been implemented for general purpose processors. We then compare our algorithm to the state-of-the-art in terms of denoising efficiency in section III. We then present the main optimizations applied to achieve real-time performance in section IV. Finally, in section V, we compare the execution time of RTE-VD to other state-of-the-art algorithms and study its speed and power consumption depending on various embedded CPUs and frequencies.

II. DENOISING ALGORITHM

Most state-of-the-art video denoising algorithms rely on patch-based filtering methods [10], [13], [14], which tend to be effective at reducing noise but are very time consuming. In addition, their memory access patterns generate many cache misses, making the computation difficult to speed up. Therefore, we did not consider this technique for real-time denoising.

A crucial aspect of video denoising is to keep temporal coherence between frames. While we cannot ensure this coherence using patch search, it is possible to do so with an optical flow estimation [11], [17].

Optical flow algorithms are also time consuming, but they are sensitive to code transformations and can be highly accelerated [18]–[22]. We thus decided to use an optical flow algorithm in our real-time denoising chain.

While there exist many optical flow algorithms [23], optical flow estimation for video denoising should be dense, robust to noise, and able to handle flow discontinuities [17]. Therefore, we decided to use the TV-L1 algorithm to compute the optical flow, since it satisfies all these constraints [24].

Fig. 1. Principal components of the denoising application.

Our complete denoising application is composed of three steps (Figure 1). First, the video stream is stabilized using a global one-pass Lucas-Kanade approach [25] in order to compensate for camera motion. Then, the dense optical flow is computed using the TV-L1 algorithm. The flow makes it possible to extrapolate denoised past frames to the current frame and match the two together. Finally, a 3D spatio-temporal bilateral filter is applied [26].

Fig. 2. Visual comparison on the pedestrians sequence with a standard deviation noise of 40 (PSNR in Table I). Panels: Input, Noisy, STMKF, RTE-VD, VBM3D, VBM4D.

TABLE I
PSNR results for seven sequences of the Derf's Test Media Collection on state-of-the-art algorithms and our method (RTE-VD).

Noise    Method       crowd  park joy  pedestrians  station  sunflower  touchdown  tractor  overall
σ = 20   STMKF [15]   26.25  25.59     28.34        26.66    26.97      28.87      25.37    26.70
         RTE-VD       26.38  25.65     30.58        30.98    32.51      30.17      29.38    28.73
         VBM3D [12]   28.75  27.89     35.49        34.19    35.48      32.85      31.44    31.34
         VBM4D [13]   28.43  27.11     35.91        35.00    35.97      32.73      31.65    31.11
σ = 40   STMKF [15]   20.80  20.75     20.70        20.41    20.70      20.86      19.80    20.56
         RTE-VD       22.55  21.64     25.72        27.76    27.87      27.05      25.99    24.85
         VBM3D [12]   24.81  23.78     30.65        30.62    30.88      30.21      27.82    27.43
         VBM4D [13]   24.65  23.22     31.32        31.53    31.39      30.09      28.09    27.35

III. DENOISING EFFICIENCY COMPARISON

In order to evaluate the denoising efficiency of RTE-VD, we reproduced the conditions of the comparison made in [27]. We ran our algorithm on seven sequences from the Derf's Test Media Collection [28]. The sequences are originally color 1080p videos. We down-scaled them to 960×540 pixels in grayscale by keeping only the brightness. Gaussian noise with a standard deviation of 20 and 40 is added to each pixel. Then, the algorithms are run on the first hundred frames of each sequence.

However, we were not able to reproduce the same results as in [27]: all the PSNR values that we obtained are significantly lower for a given method.

We compare RTE-VD to VBM3D [12], VBM4D [13] and STMKF [15]. VBM3D is one of the most popular video denoising methods. It is also one of the fastest high-performance denoising algorithms, even though its computation time is still over a second for a 960×540 pixel frame on a Xeon server. VBM4D provides more powerful filtering than VBM3D, but at a higher computational cost. STMKF is a state-of-the-art real-time video denoising method. For each algorithm, the source code provided on the authors' website is used with the default parameters depending on the noise intensity. Both VBM3D and VBM4D require offline computation. Processing times are studied in more detail further in this article.

Some visual results are presented in Figure 2: we can see that RTE-VD is a clear improvement compared to the noisy input. In most cases, VBM3D and VBM4D are visually better. Also, STMKF is much less effective than the other considered methods.

For the two noise deviations tested, the RTE-VD PSNR is overall less than 3 dB below those of VBM3D and VBM4D. The RTE-VD PSNR is overall 2 dB above STMKF for a noise deviation of 20, and more than 4 dB above for a noise deviation of 40. This confirms that RTE-VD handles heavy-noise situations better than STMKF, as can be seen in Figure 2.

RTE-VD tends to trade off noise reduction for information preservation. Figure 2 shows that RTE-VD is able to produce more detailed results than STMKF on moving objects, but encounters difficulties on still objects. The loss of sharpness in static scenes can be explained by a bad optical flow estimation on non-moving objects. Estimation errors are indeed proportionally larger on small movements, and bad compensations may generate a slight blur effect on non-moving objects. Filtering the estimated optical flow may reduce this
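The evaluation protocol above (add zero-mean Gaussian noise of standard deviation σ to each 8-bit pixel, then score the result against the clean frame) can be sketched as follows. This is an illustrative sketch in plain Python, not the authors' benchmark code; the helper names are ours.

```python
import math
import random

def add_gaussian_noise(frame, sigma, seed=0):
    """Add i.i.d. zero-mean Gaussian noise to an 8-bit grayscale frame
    (a list of pixel rows), clamping the result back to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in frame]

def psnr(clean, noisy):
    """Peak signal-to-noise ratio in dB between two 8-bit grayscale images."""
    diffs = [(p - q) ** 2
             for row_c, row_n in zip(clean, noisy)
             for p, q in zip(row_c, row_n)]
    mse = sum(diffs) / len(diffs)
    return float('inf') if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)
```

Ignoring clamping, the PSNR of the noisy input itself is about 10·log10(255²/σ²), i.e. roughly 22 dB for σ = 20, which puts the Table I figures in perspective.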
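The first stage of the pipeline in Section II, global one-pass Lucas-Kanade stabilization, amounts to solving a single 2×2 linear system built from image gradients. A minimal sketch, with our own naming, central differences, and no pyramid or iteration (unlike a production stabilizer):

```python
def global_lucas_kanade(prev, curr):
    """Estimate one global translation (dx, dy) between two grayscale
    frames with a single Lucas-Kanade step: accumulate the normal
    equations sum(g g^T) d = -sum(g It) over all interior pixels."""
    h, w = len(prev), len(prev[0])
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradients
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]                   # temporal gradient
            a11 += ix * ix; a12 += ix * iy; a22 += iy * iy
            b1 -= ix * it; b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:       # degenerate (flat or purely planar) image
        return 0.0, 0.0
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

The estimate is only reliable for small translations on textured images; in the paper this stage compensates camera motion before the dense TV-L1 flow is computed.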
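The last stage, the 3D spatio-temporal bilateral filter, weights each space-time neighbour by both its distance and its intensity difference, so edges survive while noise is averaged across motion-compensated frames. A simplified single-pixel sketch under our own naming and parameter choices (the paper's filter is [26], not this code); it assumes the temporal neighbours were already aligned by the optical flow and that t >= radius:

```python
import math

def bilateral_3d(frames, t, y, x, radius=1, sigma_s=1.0, sigma_r=20.0):
    """Spatio-temporal bilateral filter at pixel (y, x) of frame t.
    `frames` is a list of motion-compensated grayscale frames."""
    center = frames[t][y][x]
    num = den = 0.0
    for dt in range(-radius, 1):            # past frames up to the current one
        ft = frames[t + dt]
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if not (0 <= yy < len(ft) and 0 <= xx < len(ft[0])):
                    continue
                v = ft[yy][xx]
                # spatial/temporal closeness weight times intensity similarity
                w = math.exp(-(dx * dx + dy * dy + dt * dt) / (2 * sigma_s ** 2)) \
                    * math.exp(-(v - center) ** 2 / (2 * sigma_r ** 2))
                num += w * v
                den += w
    return num / den
```

On a constant region every weight multiplies the same value, so the pixel is returned unchanged; across a strong edge the range term suppresses the dissimilar side, which is what preserves detail.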