Image Restoration Using Total Variation Regularized Deep Image Prior

Jiaming Liu¹, Yu Sun², Xiaojian Xu², and Ulugbek S. Kamilov¹,²

¹ Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130
² Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130

ABSTRACT

In the past decade, sparsity-driven regularization has led to significant improvements in image reconstruction. Traditional regularizers, such as total variation (TV), rely on analytical models of sparsity. However, the field is increasingly moving towards trainable models inspired by deep learning. Deep image prior (DIP) is a recent regularization framework that uses a convolutional neural network (CNN) architecture without data-driven training. This paper extends the DIP framework by combining it with the traditional TV regularization. We show that the inclusion of TV leads to considerable performance gains when tested on several traditional restoration tasks such as image denoising and deblurring.

Index Terms: Image reconstruction, image restoration, deep learning, deep image prior, total variation regularization.

1. INTRODUCTION

Image reconstruction is one of the most widely studied problems in computational imaging. Since the problem is often ill-posed, the process is traditionally regularized by constraining the solutions to be consistent with our prior knowledge about the image. Some traditional imaging priors include nonnegativity, transform-domain sparsity, and self-similarity [1–4]. Recently, however, the attention in the field has been shifting towards new imaging formulations based on deep learning [5].

The most common deep-learning approach is based on end-to-end training of a convolutional neural network (CNN) for reproducing the desired image from its noisy measurements [6–10]. A popular alternative considers training a CNN as an image denoiser and using it within iterative reconstruction algorithms [11–14]. However, it was recently also shown that a CNN can by itself regularize image reconstruction without data-driven training [15]. This deep image prior (DIP) framework naturally regularizes reconstruction by optimizing the weights of a CNN for it to synthesize the measurements from a given random input vector. The intuition behind DIP is that natural images can be well represented by CNNs, which is not the case for random noise and certain other image degradations. DIP was shown to achieve remarkable performance on a number of image reconstruction tasks [15, 16].

In this paper, we propose to further improve DIP by combining an implicit CNN regularization with an explicit TV penalty. The idea of our DIP-TV approach is simple: by including an additional TV term into the objective function, we restrict the solutions synthesized by the CNN to those that are piecewise smooth. We experimentally show that our DIP-TV method outperforms the traditional formulations of DIP and TV, and performs on a par with other state-of-the-art image restoration methods such as BM3D [17] and IRCNN [12].

[Figure 1. Panels: Original, Corrupt, CBM3D (20.93 dB), DIP (20.91 dB), DIP-TV (21.35 dB).]
Fig. 1: Comparison of DIP-TV against several standard algorithms for image denoising. DIP-TV achieves the best SNR performance on Monarch with AWGN of σ = 65. The combination of the CNN and TV priors preserves the homogeneity of the background as well as the texture, highlighted by rectangles drawn inside the images.

This material is based upon work supported by the National Science Foundation under Grant No. 1813910.
978-1-5386-4658-8/18/$31.00 ©2019 IEEE. ICASSP 2019.

2. BACKGROUND

Consider the restoration as a linear inverse problem

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{e}, \tag{1}$$

where the goal is to reconstruct an unknown image $\mathbf{x} \in \mathbb{R}^N$ from the measurements $\mathbf{y} \in \mathbb{R}^M$. Here, $\mathbf{H} \in \mathbb{R}^{M \times N}$ is a degradation matrix and $\mathbf{e} \in \mathbb{R}^M$ corresponds to the measurement noise, which is assumed to be additive white Gaussian noise (AWGN) of variance $\sigma^2$.

As practical inverse problems are often ill-posed, it is common to regularize the task by constraining the solution according to some prior knowledge. In practice, the reconstruction often relies on the regularized least-squares formulation

$$\mathbf{x}^\ast = \arg\min_{\mathbf{x}} \left\{ \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_{\ell_2}^2 + \lambda \rho(\mathbf{x}) \right\}, \tag{2}$$

where the data-fidelity term ensures consistency with the measurements, and the regularizer $\rho$ constrains the solution to the desired image class. The parameter $\lambda > 0$ controls the strength of regularization.

Total variation (TV) is one of the most widely used image priors that promotes sparsity in image gradients [18]. It has been shown to be effective in a number of applications [19–21]. The $\ell_1$-based anisotropic TV is given by

$$\rho_{\mathrm{TV}}(\mathbf{x}) \triangleq \sum_{n=1}^{N} \left( |[\mathbf{D}_1 \mathbf{x}]_n| + |[\mathbf{D}_2 \mathbf{x}]_n| \right), \tag{3}$$

where $\mathbf{D}_1$ and $\mathbf{D}_2$ denote the finite difference operation along the first and second dimension of a two-dimensional (2D) image with appropriate boundary conditions.

Currently, deep learning achieves the state-of-the-art performance for different image restoration problems [22–24]. The core idea is to train a CNN via the following optimization

$$\Theta^\ast = \arg\min_{\Theta} \mathcal{L}(f_\Theta(\mathbf{y}), \mathbf{x}), \tag{4}$$

where $f_\Theta(\cdot)$ represents the CNN parametrized by $\Theta$ and $\mathcal{L}$ denotes the loss function. In practice, (4) can be effectively optimized using the family of stochastic gradient descent (SGD) methods, such as adaptive moment estimation (ADAM) [25].

Recently, Ulyanov et al. [15] proposed to use CNN-based methods in an alternative way. They discovered that the architecture of deep CNN models is well-suited for representing natural images, but not random noise. With a random input vector, a CNN can reproduce the clean image without supervised training on a large dataset. In the context of image restoration, the associated optimization for DIP can be formulated as

$$\Theta^\ast = \arg\min_{\Theta} \|\mathbf{y} - \mathbf{H} f_\Theta(\mathbf{z})\|_{\ell_2}^2, \quad \text{such that } \mathbf{x}^\ast = f_{\Theta^\ast}(\mathbf{z}), \tag{5}$$

where $\mathbf{z} \in \mathbb{R}^N$ denotes the random input vector. The CNN generator is initialized with random variables $\Theta$, and these variables are iteratively optimized so that the output of the network is as close to the target measurement as possible.

[Figure 2. Image grid.]
Fig. 2: The set of 14 grayscale images used in experiments.

[Figure 3. Diagram labels: Skip Connections, 3x3 conv., 1x1 conv., BN, Leaky ReLU, d[i], u[i]; 128 filters per block.]
Fig. 3: CNN architecture [15] used in this paper. The architecture is based on the well-known U-net with skip connections between the down layers and up layers. Two different kernel sizes are noted under each convolutional layer, and the number of filters is illustrated above each block. The variable ns[i] denotes the number of feature maps at the ith skip layer; all other layers have 128 feature maps.

3. PROPOSED METHOD

The goal of DIP-TV is to use the TV regularization to improve the basic DIP approach. We first consider the optimization problem shown in (2) and the objective function of DIP in (5). One can see that the $\|\mathbf{y} - \mathbf{H} f_\Theta(\mathbf{z})\|_{\ell_2}^2$ term in (5) corresponds to the data-fidelity term in (2), with the unknown image $\mathbf{x}$ replaced by the network output $f_\Theta(\mathbf{z})$. Thus, we can consider replacing (5) with the optimization problem

$$\Theta^\ast = \arg\min_{\Theta} \left\{ \|\mathbf{y} - \mathbf{H} f_\Theta(\mathbf{z})\|_{\ell_2}^2 + \lambda \rho_{\mathrm{TV}}(f_\Theta(\mathbf{z})) \right\}, \quad \text{such that } \mathbf{x}^\ast = f_{\Theta^\ast}(\mathbf{z}). \tag{6}$$

Optimization in (6) is similar to the training of a CNN, and one can rely on any standard optimization algorithm.

Figure 3 illustrates the CNN architecture we used in this paper, which was adapted from [15]. In particular, the popular U-net architecture [26] is modified such that the skip connections contain a convolutional layer. The decoder uses a down-sampling and up-sampling based scaling-expanding structure, which makes the effective receptive field of the network increase as the input goes deeper into the network [27]. In addition, the skip connections enable the later layers to reconstruct the feature maps with both local details and global texture. Here, the input $\mathbf{z}$ can be initialized as a fixed 3D tensor with 32 feature maps, of the same spatial size as $\mathbf{x}$, filled with uniform noise. The proposed framework can deal with both grayscale and color images, where for color images anisotropic TV jointly regularizes all three channels.

4. EXPERIMENTS

We now present the experimental results on image denoising and deblurring. We consider 14 grayscale images and 8 standard color images (256×256 and 512×512) from Set12, Set14, and BSD68 as our testing images. The grayscale images are shown in Figure 2, while the color images are: Monarch, Parrots, House, Lena, Peppers, Baby, and Jet.

4.1. Image Denoising

In this subsection, we analyze the performance of the DIP-TV method for image denoising problems. The CNN architecture in Figure 3 is used for both color and grayscale images, with ns[i] = 4 for each skip layer. All algorithmic hyperparameters were optimized in each experiment for the best signal-to-noise ratio (SNR) performance with respect to the ground truth test image. Both DIP-TV and DIP were set to run 5000 optimization steps. We use the average SNR to denote the SNR values averaged over the associated set of test images.

[Figure 4. Row 1 panels: Original, Corrupt (SNR: 5.00 dB), EPLL (SNR: 21.20 dB), BM3D (SNR: 21.22 dB), TV (SNR: 20.60 dB), DIP (SNR: 20.51 dB), DIP-TV (SNR: 21.57 dB). Row 2 panels: Original, Corrupt (SNR: 15.00 dB), EPLL (SNR: 27.21 dB), BM3D (SNR: 27.26 dB), TV (SNR: 25.49 dB), DIP (SNR: 26.89 dB), DIP-TV (SNR: 27.33 dB). Column groups: Baselines; TV / DIP / DIP-TV.]
Fig. 4: Image denoising results on Tower and Jet obtained by EPLL, BM3D, TV-FISTA, DIP, and DIP-TV. The first and second columns display the original images and corrupted images, respectively.
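To make the objective in (6) concrete for the denoising case, the following Python sketch evaluates the anisotropic TV penalty of (3) and the resulting DIP-TV cost for a candidate image. It is only an illustration under stated assumptions, not the implementation used in the experiments: the function names `anisotropic_tv` and `dip_tv_objective` are hypothetical, H is taken to be the identity, finite differences that would cross the image border are simply dropped (one possible reading of "appropriate boundary conditions"), and `x_hat` stands in for the network output f_Θ(z) rather than being produced by a CNN.

```python
from typing import List

Image = List[List[float]]  # 2D grayscale image stored as rows of pixel values


def anisotropic_tv(x: Image) -> float:
    """l1-based anisotropic TV of a 2D image, following (3).

    Assumption: differences that would cross the image border are dropped.
    """
    rows, cols = len(x), len(x[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # finite difference along the first dimension
                total += abs(x[r + 1][c] - x[r][c])
            if c + 1 < cols:  # finite difference along the second dimension
                total += abs(x[r][c + 1] - x[r][c])
    return total


def dip_tv_objective(y: Image, x_hat: Image, lam: float) -> float:
    """Value of the objective in (6) for denoising, assuming H is the identity.

    x_hat is a stand-in for the network output f_Theta(z).
    """
    fidelity = sum(
        (y_val - x_val) ** 2
        for y_row, x_row in zip(y, x_hat)
        for y_val, x_val in zip(y_row, x_row)
    )
    return fidelity + lam * anisotropic_tv(x_hat)


# A vertical edge has nonzero TV, while a constant image has none.
edge = [[0.0, 1.0], [0.0, 1.0]]
flat = [[1.0, 1.0], [1.0, 1.0]]
print(anisotropic_tv(edge))               # 2.0
print(dip_tv_objective(edge, flat, 0.5))  # 2.0 (fidelity only, TV of flat is 0)
```

In an actual DIP-TV run, this scalar would be computed on the CNN output at every step and minimized with respect to Θ by a gradient-based optimizer; the sketch only shows how the two terms of (6) are combined and how λ trades data fidelity against piecewise smoothness.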
