Regularization and Applications of a Network Structure Deep Image Prior

Timothy Anderson
Department of Electrical Engineering, Stanford University, Stanford, CA
[email protected]

Abstract

Finding a robust image prior is one of the fundamental challenges in image recovery problems. Many priors are based on the statistics of the source or assumed features (e.g. sparse gradients) of the image. More recently, priors based on convolutional neural networks have gained increased attention, due to the availability of training data and the flexibility of a neural network-based prior.

Here, we present results for a novel neural-network based image prior introduced in [18], known as a network structure deep image prior. We test the effect of regularization on convergence of the network structure prior, and apply this prior to problems in deconvolution, denoising, and single-pixel camera image recovery.

Our results show that regularization does not improve the convergence properties of the network. Performance on deconvolution and single-pixel camera image recovery is also poor. However, the results for denoising are comparable to baseline methods, achieving a strong PSNR for several test images. We believe that, with further work, this method can be readily applied to similar computational imaging problems such as inpainting and demosaicking.

1. Introduction

Finding an effective prior is one of the most fundamental challenges in computational imaging. The general image denoising or restoration problem is of the form:

    min_x E(b, x) + λΓ(x)    (1)

where E(·) is an energy function dependent on the observed image b and the reconstructed image x, λ is a hyperparameter, and Γ(·) is a prior used to regularize the reconstructed image.

In a typical image reconstruction problem, the energy function is almost entirely task-dependent [18], meaning the quality of the reconstruction depends heavily on the prior Γ(·). Image priors are either applied directly in the optimization process (if they are differentiable), or used in an iterative process where first the prior is applied and then the energy function is minimized.

Classical image priors are typically based on some existing or assumed structure of the image. For example, the isotropic total variation (TV) prior Γ(x) = √(|D_x x|² + |D_y x|²) is based on the assumption that natural images contain sparse gradients. The non-local means (NLM) prior Γ(x) = NLM(x, σ) is similarly based on the assumption that natural images have self-similarity [3].
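As a concrete illustration, the isotropic TV prior can be evaluated in a few lines of NumPy. This is a minimal sketch for intuition only, not the implementation used in any of the experiments below.

```python
import numpy as np

def tv_prior(x):
    """Isotropic total variation of a 2D grayscale image x.

    Approximates Gamma(x) = sum over pixels of sqrt(|D_x x|^2 + |D_y x|^2),
    where D_x and D_y are forward finite-difference operators.
    """
    dx = np.diff(x, axis=1)[:-1, :]  # horizontal differences, cropped to a common shape
    dy = np.diff(x, axis=0)[:, :-1]  # vertical differences, cropped to a common shape
    return np.sqrt(dx ** 2 + dy ** 2).sum()

# A piecewise-constant image has a much smaller TV value than its noisy version.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
noisy = img + 0.1 * np.random.randn(64, 64)
print(tv_prior(img), tv_prior(noisy))
```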
2. Background

Deep neural networks have exhibited stunning success in a variety of computer vision and image processing tasks [12, 16, 10], and provide a general framework for learning function approximators from data. Consequently, there has been much recent interest in applying neural network architectures to image recovery and restoration tasks.

Several methods apply neural networks to directly find the latent image. That is, these methods train a neural network to directly solve the inverse problem in eqn. 1. For example, SRCNN from [9] showed state-of-the-art super-resolution results using a convolutional neural network (CNN) to super-resolve an input image. [14] similarly introduced the SRResNet and SRGAN architectures to perform super-resolution. [4] introduced a blind deconvolution method which uses a neural network to predict the motion blur kernel. [19] develops an iterative non-blind deconvolution method which alternately deblurs an image using quadratic optimization and denoises it using a CNN.

Beyond using the network to directly perform image recovery or restoration, much recent work has also centered on using neural networks as priors for image recovery and restoration. Image priors bifurcate into two distinct classes: those learned from data, and "data-free" priors based only on a single input image. The vast majority of neural-network priors fall into the former class, where a prior function is learned from multiple examples.

The most direct approach to using a CNN-based prior is to substitute a trained CNN for the prior term Γ(x) in eqn. 1. This is the approach taken by [13], which integrates FFT-based deconvolution with a CNN-based image prior. [8] presents a more general framework for integrating a CNN prior with off-the-shelf optimization algorithms.

A popular optimization technique for solving the split-objective optimization problem in eqn. 1 is the Alternating Direction Method of Multipliers (ADMM) [2]. ADMM is an attractive framework for imaging inverse problems because there often exist computationally efficient closed-form solutions for the proximal operators [1]; several neural network-based image priors have been developed with the goal of preserving the optimization structure provided by ADMM, and therefore the acceleration it provides. [5] takes a similar approach to [19], but embeds the method in an ADMM optimization framework. Instead of learning the prior itself, [15] attempts to directly learn the proximal operators for ADMM-based deconvolution.

While CNN-based image prior methods have been successful, they often require extensive training and/or large amounts of data to achieve strong performance. Methods not based on neural networks (e.g. BM3D and CBM3D from [6]) have also been successful, but rarely surpass neural network-based methods in generality or performance. Consequently, there exists a need for a method with the speed and generality of neural network methods that does not require large amounts of training data.

3. Network Structure Deep Image Prior

To overcome the difficulties of learned and data-free priors, [18] introduced a novel type of image prior based on the structure of a neural network. The basic optimization problem for a neural network is:

    min_θ ℓ(f_θ(x), y)

where ℓ(·) is the loss function, θ the vector of network parameters, x the input image, f_θ(x) the output, and y the desired output of the network.

In the approach of [18], a neural network is trained to recreate a corrupted/distorted input image, and in doing so is able to remove the noise or artifacts present in the image. Let z be a random input vector ([18] uses z ∼ U(0, 1/10) i.i.d.), f_θ(z) be the neural network output, and x_0 the image we would like to reconstruct. Then, for fixed z, they solve:

    min_θ ℓ(f_θ(z), x_0)

From here on, we will refer to this type of prior as a network structure (NS) prior. This setup effectively minimizes over the space of natural images realizable by the neural network. To apply the NS prior to an image, the above optimization is run until some stopping criterion is met ([18] uses early stopping in all of their experiments). An example of denoising with this method is shown in fig. 1.

[Figure 1: Denoising using a NS deep image prior. Panels: (a) ground truth, (b) noisy image, (c) denoised.]

The NS prior is based on the observation that a neural network will gravitate towards natural images more quickly than unnatural (e.g. noisy) images. If we realize the predicted image x as the output of a neural network, then the network will first produce natural-looking images, and in this sense it acts as an implicit image prior. The primary advantage of the NS prior over many other neural network-based methods is that it does not require any training data, and instead relies on the network structure to act as a prior. As shown in [18], the network structure greatly affects the results, depending on the task and the architecture used.
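To make the NS prior concrete, the sketch below shows one way this optimization could be set up in PyTorch. The tiny convolutional stack, input size, iteration count, and stopping point are illustrative placeholders; [18] and the experiments below use the deeper autoencoder of fig. 3 with task-specific early stopping.

```python
import torch
import torch.nn as nn

# x0: the corrupted image to restore, shape (1, C, H, W), values in [0, 1].
# Here a random tensor stands in for a real noisy image.
x0 = torch.rand(1, 3, 64, 64)

# f_theta: any image-to-image CNN; this small stack is only a stand-in for
# the autoencoder architecture of fig. 3.
f = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)

# Fixed random input z ~ U(0, 1/10), following [18].
z = torch.rand(1, 32, 64, 64) * 0.1

opt = torch.optim.Adam(f.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for it in range(2000):            # stop early: running to convergence overfits the noise
    opt.zero_grad()
    loss = loss_fn(f(z), x0)      # min over theta of l(f_theta(z), x0)
    loss.backward()
    opt.step()

restored = f(z).detach()          # the network output serves as the recovered image
```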
4. Regularization

4.1. Regularization Approaches

In order to improve network generalization, regularization techniques are often applied to the network, either during the training process or in the form of a penalty on the optimization objective. Here, we explore three common regularization methods for neural networks.

Dropout [17] is a training technique where features are passed to the next network layer with some probability p_keep, i.e.

    x_{i+1} = m_i ◦ x_i

where x_i is the feature map at the i-th network layer and m_i ∼ Bernoulli(p_keep) is a Bernoulli random vector. Randomly passing feature maps to the next layer effectively applies a stochastic relaxation to the optimization process of the network, thereby allowing the network to find a more general local minimum or saddle point.

L1 and L2 regularization differ from dropout in that they impose a penalty on the objective, so the optimization objective becomes:

    min_θ ℓ(f_θ(x), y) + λR(θ)

where R(θ) is the regularization function and λ the associated regularization parameter. For L1 regularization, the regularization function has the form:

    R(θ) = Σ_i ||W_i||_1

where W_i is the weight matrix for each layer of the network. Similarly, for L2 regularization:

    R(θ) = Σ_i (1/2)||W_i||_2^2

L2 regularization is by far the most commonly employed method of regularization in neural networks, and has been shown to improve network generalization in a wide range of contexts.
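As a sketch of how these three regularizers could be attached to the NS-prior optimization above (the λ and p_keep values here are illustrative, not the settings used in our experiments):

```python
import torch
import torch.nn as nn

lam = 1e-5      # illustrative regularization weight
p_keep = 0.8    # illustrative keep probability for dropout

# Dropout: insert dropout layers into the network (PyTorch specifies the drop probability).
f = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.Dropout2d(p=1 - p_keep),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)

# L2: weight decay in the optimizer adds a penalty proportional to ||W||_2^2.
opt = torch.optim.Adam(f.parameters(), lr=1e-4, weight_decay=lam)

def l1_penalty(model):
    """R(theta) = sum_i ||W_i||_1 over the layer weight matrices."""
    return sum(p.abs().sum() for name, p in model.named_parameters() if "weight" in name)

# L1: inside the training loop the regularized objective would be
#   loss = mse(f(z), x0) + lam * l1_penalty(f)
```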

[Figure 2: Regularization results. Each panel plots MSE (log scale) and PSNR against iteration (10^0 to 10^4) for three targets: the image, the image + noise, and pure noise. Panels: (a) no regularization, (b) L2 regularization, (c) L1 regularization, (d) dropout. We observe that adding regularization does not improve convergence; all three types of regularization also decrease the PSNR for all three cases.]
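For reference, curves like those in fig. 2 can be produced by logging, at each iteration of the NS optimization, the MSE against the optimization target and the PSNR against the clean image. A minimal sketch, with stand-in tensors and a toy network in place of our actual setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def psnr(a, b, peak=1.0):
    """PSNR in dB between two tensors scaled to [0, peak]."""
    return 10.0 * torch.log10(peak ** 2 / F.mse_loss(a, b))

# Stand-ins: in the experiment, `clean` is the Lena image and `target` is one of
# {clean image, image + noise, pure noise}; f and z are as in the NS-prior sketch above.
clean = torch.rand(1, 3, 64, 64)
target = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)
f = nn.Sequential(nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
z = torch.rand(1, 32, 64, 64) * 0.1
opt = torch.optim.Adam(f.parameters(), lr=1e-4)

mse_hist, psnr_hist = [], []
for it in range(10_000):          # reduce the iteration count for a quick run
    opt.zero_grad()
    out = f(z)
    loss = F.mse_loss(out, target)
    loss.backward()
    opt.step()
    if it % 100 == 0:
        mse_hist.append(loss.item())                        # MSE vs. the optimization target
        psnr_hist.append(psnr(out.detach(), clean).item())  # PSNR vs. the clean image
```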

4.2. Convergence Issues

The primary drawback of the approach in [18] is that early stopping must be employed during the training process to achieve strong results. As shown in fig. 3 of [18], in applications such as denoising or JPEG artifact removal, the network gravitates towards the natural (noise- or artifact-free) image in the early stages of training. However, in later stages of training the network begins to overfit to the noise and artifacts, causing the output image to almost exactly match the input and rendering the result useless.

4.3. Regularization Results

To test the effect of regularization on convergence, we trained a NS deep image prior to reproduce the original Lena image, Lena with additive noise, and an image of random noise. In these experiments, we apply the architecture (without skip connections) from [18], and train the network with the Adam optimizer [11]. The results are shown in fig. 2.

The results show that regularization is actually harmful to the effectiveness of the deep image prior. Specifically, for all three regularization methods we see that the PSNR after 10^4 iterations is lower than for the unregularized network. Furthermore, the loss plots do not show improved convergence for the regularized networks.

The negative results for regularization are likely because the regularization methods limit how much of the network's structure the deep prior is able to exploit. Specifically, L2 regularization penalizes the parameters for moving too far from the origin, L1 regularization enforces sparsity, and dropout randomizes the network structure itself. A possible explanation for the poor performance of L1 regularization is that sparsity limits how many network connections the network can use to recreate the input image.

The poor performance of dropout in many ways validates the conclusions in [18]. There, the authors claimed that the deep image prior arises because of a synergy between the network structure and the image data. Dropout creates uncertainty in the network structure, meaning that it is impossible for the network to synchronize with the image. As a corollary, this result may partially explain why dropout is a robust regularization method: the uncertainty in the network structure prevents the network from becoming too aligned with the data in the training set, forcing it to seek more general features to perform classification.

[Figure 3: Network architecture and convolutional blocks: (a) network structure, (b) downsampling block, (c) upsampling block. This autoencoder architecture was used for all experiments presented.]

5. Applications

In the following section, we apply the NS prior to several image recovery and reconstruction problems and compare against several well-known baseline methods, including the total variation (TV) prior, non-local means, and BM3D. In all experiments, we use an autoencoder architecture (shown in fig. 3) similar to the autoencoder architecture in [18], and train the NS prior using the Adam optimizer with a learning rate of 10^-4.

5.1. Deconvolution

The first application is non-blind image deconvolution. The basic image deconvolution problem can be stated as:

    min_x (1/2)||Cx − b||_2^2 + λΓ(x)

where C applies a (known) blur kernel to the image x, and b is the observed (blurry) image. There are several approaches to solving this problem. When using a uniform prior, Wiener deconvolution is known to be optimal. When using a known prior such as total variation (TV), a popular approach is to use ADMM [2] to solve the resulting optimization problem:

    min_x (1/2)||Cx − b||_2^2 + λ TV(x)
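As a point of reference, a minimal frequency-domain Wiener deconvolution, used as a baseline below, can be sketched as follows; the noise-to-signal ratio, kernel size, and lack of boundary handling and kernel centering are simplifications, not our exact experimental configuration.

```python
import numpy as np

def wiener_deconvolve(b, kernel, nsr=1e-2):
    """Frequency-domain Wiener deconvolution of a blurry image b.

    kernel: 2D blur kernel (assumed known), zero-padded to the image size.
    nsr:    noise-to-signal ratio, which acts as the regularizer.
    Note: this sketch ignores kernel centering and edge handling.
    """
    K = np.fft.fft2(kernel, s=b.shape)
    B = np.fft.fft2(b)
    X = np.conj(K) * B / (np.abs(K) ** 2 + nsr)   # Wiener filter: conj(K) / (|K|^2 + NSR)
    return np.real(np.fft.ifft2(X))

def gaussian_kernel(size=21, sigma=5.0):
    """Normalized Gaussian blur kernel (sigma in pixels is a placeholder value)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()
```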
Table 1 summarizes results for deconvolution using Wiener filtering, ADMM + TV prior (with two different regularization parameters), and a NS prior applied after Wiener filtering. We see that the NS prior causes a negligible change in the PSNR when it is applied after Wiener filtering. The NS prior assumes no information about the image other than that the image is natural, and is therefore unable to perform operations such as deconvolution. This illustrates one of the major shortcomings of the NS prior: it has no knowledge of blurry vs. non-blurry, just an implicit knowledge of natural vs. unnatural images, and therefore cannot differentiate a blurry image from a non-blurry one.

σ      | Wiener  | ADMM + TV (λ = 5 × 10^-3) | ADMM + TV (λ = 5 × 10^-2) | Wiener + NS
10^-3  | 24.9376 | 23.5814                   | 22.4688                   | 24.8879
10^-2  | 23.5545 | 23.5759                   | 22.4694                   | 23.4526
10^-1  | 18.1195 | 23.0354                   | 22.3920                   | 17.6956

(a) Results for the House image.

σ      | Wiener  | ADMM + TV (λ = 5 × 10^-3) | ADMM + TV (λ = 5 × 10^-2) | Wiener + NS
10^-3  | 26.6473 | 25.9782                   | 25.2714                   | 25.7898
10^-2  | 25.8921 | 25.9751                   | 25.2696                   | 25.2730
10^-1  | 19.6848 | 25.3060                   | 25.1263                   | 19.5468

(b) Results for the Parrots image.

Table 1: PSNR values for various deconvolution methods. σ represents the additive noise standard deviation, and λ is the regularization parameter for ADMM. All images were blurred with a kernel with σ_blur = 5 px.

5.2. Denoising

We also compare the NS prior to two popular denoising algorithms: non-local means (NLM) from [3] and BM3D/CBM3D from [6]. We test each algorithm on four different images (taken from [6]): two grayscale and two color. To generate the noisy images, we add (clipped) noise with standard deviation σ to each image, for a range of σ values. Table 2 summarizes the results for each denoising algorithm, and fig. 4 shows the results for an example image.

At lower noise levels, the NS prior performs comparably to or worse than the other methods. However, at the highest noise level the NS prior significantly outperforms NLM and is competitive with BM3D. For all images, the NS prior is most competitive relative to the baselines at σ = 0.5, and in general its performance changes very little except at that highest noise level.

These results correspond with those in [18] for both denoising and inpainting. Denoising at very high noise levels is similar to image inpainting in that there are many pixels distributed across the image for which the information is missing. The NS prior does very well with inpainting tasks; the strong performance for denoising images with large noise seems to be an extension of this same phenomenon.
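The baseline denoising pipeline can be sketched with scikit-image as below; the NLM parameters here are generic defaults rather than our exact baseline settings, and the BM3D/CBM3D baseline (run separately) is not shown.

```python
import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_nl_means, estimate_sigma
from skimage.metrics import peak_signal_noise_ratio

clean = img_as_float(data.camera())                 # grayscale test image scaled to [0, 1]

for sigma in [5e-3, 1e-2, 5e-2, 1e-1, 5e-1]:
    noisy = np.clip(clean + sigma * np.random.randn(*clean.shape), 0.0, 1.0)
    sigma_est = np.mean(estimate_sigma(noisy))      # noise level estimate used by NLM
    denoised = denoise_nl_means(noisy, h=1.15 * sigma_est, sigma=sigma_est,
                                fast_mode=True)
    print(sigma, peak_signal_noise_ratio(clean, denoised, data_range=1.0))
```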
[Figure 4: Denoising results for an example image. Panels compare NLM (a)–(d), CBM3D (e)–(h), and the NS prior (i)–(l) at σ = 0.01, 0.05, 0.1, and 0.5, with PSNR (dB): NLM 36.54, 33.33, 26.61, 9.22; CBM3D 43.15, 35.67, 32.70, 19.17; NS 29.24, 28.38, 28.56, 18.17.]

5.3. Compressive Imaging

Finally, we show results for reconstructing images taken with a single pixel camera using a NS prior. Single-pixel camera image recovery is based on solving an underdetermined linear system Ax = b, where A ∈ R^{M×N} is the measurement matrix (for N pixels and M measurements) and b ∈ R^M is the vector of measurements.

Most problems have a compression factor N/M > 1 [7], so in order to make the problem well-posed we must impose a prior Γ(x) on the recovered image and solve:

    min_x (1/2)||Ax − b||_2^2 + λΓ(x)

Two common priors in single pixel camera image recovery are the TV prior and a self-similarity prior (e.g. NLM). With the self-similarity priors, NLM and BM3D, we simply deploy these algorithms as priors in the ADMM iteration. The NS prior is also effectively a self-similarity prior [18], so we similarly use it as a proximal operator.

We test all four priors at compression factors (CF) of 1, 2, 4, and 8 for two different images: the Stanford logo, and the Barbara image from the Set14 dataset. Table 3 shows the results for these methods, and fig. 5 shows the recovered images for the Stanford logo.
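To illustrate how a denoiser slots into ADMM as the proximal operator for the prior, consider the sketch below; the function name admm_pnp, the measurement matrix, the penalty parameter ρ, the iteration count, and the identity "denoiser" are all placeholders rather than our experimental settings.

```python
import numpy as np

def admm_pnp(A, b, denoise, rho=1.0, iters=50):
    """Plug-and-play ADMM for min_x 0.5||Ax - b||^2 + lambda*Gamma(x),
    where the prior's proximal operator is replaced by a denoiser."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    AtA = A.T @ A + rho * np.eye(n)   # precompute the x-update system (A^T A + rho I)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))   # data-fidelity (least-squares) step
        z = denoise(x + u)                              # prior step: NLM, BM3D, or the NS prior
        u = u + x - z                                   # dual update
    return z

# Example: a random +/-1 measurement matrix at compression factor N/M = 4,
# with a trivial identity "denoiser" standing in for the actual prior.
N, M = 256, 64
A = np.sign(np.random.randn(M, N))
x_true = np.zeros(N); x_true[::16] = 1.0
b = A @ x_true
x_hat = admm_pnp(A, b, denoise=lambda v: v)
```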

Overall, the network structure prior performs very poorly in compressive imaging recovery compared to the other priors. Furthermore, the NS prior is extremely computationally intensive: it can take up to an hour to process a single image even on GPU hardware, whereas the other methods converge on the order of seconds to minutes. In all cases, BM3D seems to be the best method for recovering images with a single pixel camera.

6. Conclusion

The work presented here shows that regularization of a network structure deep image prior decreases performance and does not improve convergence. We also present results for numerous experiments applying the NS prior to various image processing tasks. These experiments showed that the NS prior works well for image denoising, but performs poorly for tasks where image information (specifically, information in the frequency domain) is entirely missing, such as deconvolution and single pixel camera image recovery.

While some of the results for the NS prior are impressive, it is important to note that all NS prior computations take several minutes on a GPU. For example, each denoising experiment took around 15 minutes on an Nvidia 1080 Ti GPU. Most other algorithms discussed here (e.g. NLM) run efficiently on a CPU on the order of seconds. The NS prior method is therefore intriguing for its flexibility across image processing tasks and for the potential theoretical insight it offers into CNNs, but it is not a computationally efficient approach.

For this reason, future work on the NS prior should focus primarily on analyzing the connection between the structure of a network and its ability to reproduce and process natural images. There are still many deep mysteries surrounding neural networks, and studying the synergy between a CNN architecture and input images may provide significant insight to this end.

σ         | NLM     | BM3D    | NS
5 × 10^-3 | 31.3169 | 47.1748 | 28.7472
10^-2     | 31.2866 | 42.4136 | 28.6989
5 × 10^-2 | 30.3413 | 32.6561 | 28.1155
10^-1     | 26.8188 | 29.1090 | 27.12
5 × 10^-1 | 9.0870  | 18.5934 | 18.17

(a) Cameraman image (grayscale).

σ         | NLM     | BM3D    | NS
5 × 10^-3 | 32.1112 | 46.4566 | 29.6394
10^-2     | 32.0878 | 41.3285 | 29.1170
5 × 10^-2 | 31.3310 | 32.5035 | 29.5311
10^-1     | 27.4636 | 29.7353 | 28.7299
5 × 10^-1 | 9.0704  | 20.9436 | 19.0584

(b) Hill image (grayscale).

σ         | NLM     | CBM3D   | NS
5 × 10^-3 | 36.6710 | 47.5239 | 29.4907
10^-2     | 36.5363 | 43.1490 | 29.2364
5 × 10^-2 | 33.3319 | 35.6655 | 28.3844
10^-1     | 26.6053 | 32.6968 | 28.5641
5 × 10^-1 | 9.2171  | 19.1744 | 18.1657

(c) F-16 image (color).

σ         | NLM     | CBM3D   | NS
5 × 10^-3 | 28.2113 | 46.1411 | 21.2136
10^-2     | 28.1641 | 40.4113 | 20.9389
5 × 10^-2 | 26.8323 | 29.2229 | 21.0951
10^-1     | 22.9737 | 25.8018 | 21.1132
5 × 10^-1 | 9.0955  | 18.0825 | 17.2367

(d) Baboon image (color).

Table 2: PSNR values for denoising experiments. σ is the standard deviation of the additive noise.
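For completeness, the PSNR values reported in Tables 1–3 follow the usual definition; a small sketch, assuming images scaled to [0, 1]:

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB between x and a reference image."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```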

CF | TV      | NLM     | BM3D    | NS
1  | 51.1815 | 47.8160 | 56.1739 | 18.9983
2  | 43.1129 | 42.3209 | 50.2682 | 23.4179
4  | 17.6178 | 33.6182 | 42.1280 | 13.5192
8  | 15.7656 | 16.4523 | 20.9842 | 10.9844

(a) Results for the Stanford Logo image.

CF | TV      | NLM     | BM3D    | NS
1  | 38.1158 | 40.1670 | 41.6852 | 23.3253
2  | 28.9712 | 31.2341 | 31.8059 | 20.7045
4  | 23.7100 | 25.4420 | 26.2328 | 19.0856
8  | 20.2931 | 20.1481 | 21.6473 | 16.7318

(b) Results for the Barbara image.

Table 3: PSNR results for single pixel camera image reconstruction with various image priors. CF is the compression factor.

[Figure 5: Single pixel camera image reconstruction results for the Stanford logo. Panels show reconstructions using the TV (a)–(d), NLM (e)–(h), CBM3D (i)–(l), and NS (m)–(p) priors at compression factors 1, 2, 4, and 8; the corresponding PSNR values match Table 3(a).]

7. Acknowledgements

Many thanks to Vincent Sitzmann for his advice and mentorship on this project.

References

[1] M. S. C. Almeida and O. C. Mar. Deconvolving Images with Unknown Boundaries Using the Alternating Direction Method of Multipliers. pages 1–12.
[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. 3(1):1–122, 2011.
[3] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Conference on, 2:60–65, 2005.
[4] A. Chakrabarti. A Neural Approach to Blind Motion Deblurring. pages 1–15.
[5] J. H. R. Chang, C.-L. Li, B. Poczos, B. V. Kumar, and A. C. Sankaranarayanan. One Network to Solve Them All: Solving Linear Inverse Problems using Deep Projection Models. pages 1–12.
[6] K. Dabov, A. Foi, and K. Egiazarian. Video denoising by sparse 3D transform-domain collaborative filtering. European Signal Processing Conference, 16(8):145–149, 2007.
[7] M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk. Single-Pixel Imaging via Compressive Sampling. (March 2008):83–91.
[8] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein. Unrolled Optimization with Deep Priors. pages 1–11, 2017.
[9] C. Dong, C. C. Loy, and K. He. Image Super-Resolution Using Deep Convolutional Networks. pages 1–14.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
[11] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ICLR, pages 1–15, 2015.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, pages 1–9, 2012.
[13] J. Kruse. Learning to Push the Limits of Efficient FFT-based Image Deconvolution. (October), 2017.
[14] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. 2016.
[15] T. Meinhardt. Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems. (October), 2017.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. ICLR, 2014.
[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. 15:1929–1958, 2014.
[18] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep Image Prior. 2017.
[19] J. Zhang, J. Pan, and W.-S. Lai. Learning Fully Convolutional Networks for Iterative Non-blind Deconvolution.