Image deconvolution with deep image prior: Final Report

Abstract

Image deconvolution has always been a hard inverse problem because of its mathematically ill-posed property (Chan & Wong, 1998). A recent work (Ulyanov et al., 2018) proposed deep image prior (DIP), which uses neural net structures to represent image prior information. This work focuses on the task of image deconvolution, using DIP to express image prior information and refine the domain of its objectives. It proposes new energy functions for kernel-known and blind deconvolution respectively in terms of DIP, and uses an hourglass ConvNet (Newell et al., 2016) as the DIP structure to represent sharpness or other higher-level priors of natural images, as well as the prior in degradation kernels. From the results on 6 standard test images, we first show that DIP with a ConvNet structure is strongly capable of capturing natural image priors in image deconvolution. We then conclude that our DIP models significantly improve deconvolution performance compared with the baselines in both kernel-known (Beck & Teboulle, 2009) and blind (Wang, 2017) settings, in terms of both the PSNR metric and visual deconvolution effects.

1. Introduction

Image degradation is a realistic and hard problem because it tends to make images lose the key information they originally had. Recovering from degraded images helps people obtain more pictorial information from their origins (Basavaprasad & Ravi, 2014). One of the many ways to model the process of image degradation is convolution with the translational invariance property (Xu et al., 2014)

B = X \ast K + E    (1)

where X \in \mathbb{R}^{d \times m \times n} is the original image, K \in \mathbb{R}^{h \times w} is the convolution kernel, E \in \mathbb{R}^{d \times m \times n} is the additive noise, B \in \mathbb{R}^{d \times m \times n} is the degraded image, and d denotes the number of channels in the images (1 for greyscale images and 3 for color images). Image deconvolution is the process of recovering the original image X from the observed degraded image B, i.e. the inverse process of convolutional image degradation. This work focuses on the task of image deconvolution in two different settings, kernel-known and kernel-unknown (a.k.a. blind deconvolution).

Kernel-known: The preliminary stage of image deconvolution research mainly considers the case where the convolution kernel is given (Sezan & Tekalp, 1990), i.e. recovering X with K known in Equation 1. This problem is ill-posed, because simply applying the inverse of the convolution operation to the degraded image B with kernel K, i.e. B \ast^{-1} K, gives an inverted noise term E \ast^{-1} K, which dominates the solution (Hansen et al., 2006).

Blind deconvolution: In reality, we can hardly obtain detailed kernel information, in which case the deconvolution problem is formulated in a blind setting (Kundur & Hatzinakos, 1996). More concisely, blind deconvolution is to recover X without knowing K. This task is much more challenging than under non-blind settings, because less information is observed and the domains of the variables become larger (Chan & Wong, 1998).

In image deconvolution, prior information on unknown images and kernels (in blind settings) can significantly improve the deconvolved results. A traditional representation of such prior information is handcrafted regularisers in image energy minimisation (Fornasier et al., 2010), e.g. total variation (TV) regularisation for image sharpness (Chan & Wong, 1998) and L1 regularisation for kernel sparsity (Shan et al., 2008; Wang, 2017). However, prior representations like the above-mentioned regularisers have limited expressiveness (Majumdar & Ward, 2009). Therefore, this work aims to find better prior representations of images and kernels to improve deconvolution performance.

Deep image prior (DIP) (Ulyanov et al., 2018) is a recently proposed image prior representation. The main idea of DIP is to substitute the image variable in energy minimisation by the output of a deep convolutional neural net (ConvNet) with random noise input, so that the image prior information is captured by the hyperparameters of the ConvNet, and the output image is determined by the parameters of the ConvNet. More specifically, DIP-based image deconvolution can be described as follows. First, find a hyperparameter setting for the ConvNet which is capable of expressing the image prior implicitly. Then, search for the optimal parameters of the ConvNet that minimise the energy. Compared to handcrafted regularisers, DIP expresses prior information by its network structure instead of by terms in the minimisation objective, and explores solutions in the ConvNet's parameter space instead of the image space. One point to emphasise here is that the prior information expressed by both is embodied in their own formulations and structures, and does not require a large amount of data for training.

Deep neural architectures have a strong capability to accommodate and express information because of their intricate and flexible structure (Szegedy et al., 2013). Compared to other image prior representations with limited structure (e.g. regularisers), neural nets with such powerful expressiveness seem more capable of capturing high-level prior information of natural images and achieving remarkable results in image deconvolution. Therefore, this work aims to analyse the expressive power of DIP on natural images, and to use DIP to improve the performance of image deconvolution compared to regularisation-based priors. To summarise the contribution, this work proposes the general energy function of DIP-based image deconvolution and constructs its optimisation pipeline (see Section 3), which achieves considerably better performance than other learning-free baselines with different degradation kernels (see Section 4), especially motion blur (see Figure 9).
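Before moving on, a minimal NumPy/SciPy sketch of the degradation model in Equation 1, which the rest of this report assumes (the reflexive boundary handling anticipates Section 4.1; the helper name is our own illustration, not the report's exact implementation):

import numpy as np
from scipy.ndimage import convolve

def degrade(x, k, sigma=0.01, seed=0):
    """Apply B = X * K + E (Equation 1) channel-wise.

    x: (d, m, n) image in [0, 1]; k: (h, w) kernel whose entries sum to 1.
    'reflect' corresponds to the reflexive boundary condition of Section 4.1.
    """
    rng = np.random.default_rng(seed)
    b = np.stack([convolve(c, k, mode='reflect') for c in x])  # all channels share the kernel
    return b + rng.normal(0.0, sigma, size=b.shape)            # additive i.i.d. Gaussian noise E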

2. Data set and evaluation metrics

As discussed in Section 1, capturing the image prior by either regularisation terms or deep neural net structures is learning-free. Therefore, the data set explored in this work is only used for testing. Experiments and performance evaluation are conducted on a data set of 6 standard test images shown in Figure 2, containing 4 greyscale and 2 color images. These images have also been used to test performance in many related works, such as denoising (Dabov et al., 2007) and TV deblurring (Beck & Teboulle, 2009), which to some extent guarantees the consistency and reliability of experimental results.

Figure 2: Standard test image data set experimented in our work (zoomed out), containing 4 greyscale and 2 color images, named cameraman (abbr. C.man), house, Lena, boat, house.c, peppers respectively from left to right. The original resolution (256 × 256 or 512 × 512) is marked below each image.

2.1. Observed data generation and kernels

To preprocess the image data and obtain degraded observations, we use the degradation model formulated in Equation 1 to transfer each original standard test image X_std to an observed image B. The generation process is illustrated by the diagram in Figure 1. The noise matrix E is independently and identically distributed Gaussian with respect to each entry, and the noise strength (i.e. standard deviation) σ is fixed to 0.01 in order to reduce experimental variables. To explore different kinds of degradation models, three common kernels for different kinds of degradation are used to generate the data set: Gaussian (Hummel et al., 1987), defocus (Hansen et al., 2006) and motion blur (Yitzhaky & Kopeika, 1997).

Figure 1: The generation processes of observed images. For each process, we first convolve the original standard image by a given kernel, then add a noise term to the convolved image.

Gaussian: The kernel for degradation caused by atmospheric turbulence can be described as a two-dimensional Gaussian function (Jain, 1989; Roggemann et al., 2018), and the entries of the unscaled kernel are given by the formula (Hansen et al., 2006)

K_{i,j} = \exp\left( -\frac{1}{2}\left(\frac{i - c_1}{s_1}\right)^2 - \frac{1}{2}\left(\frac{j - c_2}{s_2}\right)^2 \right)

where (c_1, c_2) is the center of K, and (s_1, s_2) determines the width of the kernel (i.e. the standard deviation of the Gaussian function). In this work, s_1 and s_2 are set to s_1 = s_2 = 2.0.

Defocus: Out-of-focus is another common issue in optical imaging. Knowledge of the physical process that causes out-of-focus provides an explicit formulation of the kernel (Hansen et al., 2006)

K_{i,j} = \begin{cases} 1/(\pi r^2) & \text{if } (i - c_1)^2 + (j - c_2)^2 \le r^2, \\ 0 & \text{otherwise} \end{cases}

where r denotes the radius of the kernel, which is set to r = \lfloor \min(h/2, w/2) \rfloor in this work.

Motion blur: This usually happens if the recorded scene changes during a single exposure when taking a photograph, e.g. when moving objects are captured at high speed or the lens shakes. Disregarding the noise, the convolution process of motion blur with amplitude u and shifting angle α is given by the formula (Kalalembang et al., 2009)

B_{i,j} = \frac{1}{2u + 1} \sum_{k=-u}^{u} X_{i + k\cos\alpha,\; j + k\sin\alpha}

in which the shape of the kernel is a line segment, as Figure 3c shows. In this work, the blur amplitude and shifting angle are set to u = \lfloor \sqrt{2} \cdot \min(h/2, w/2) \rfloor and α = 3π/4.

All the kernels adopted in the data generation processes and experiments of this work are of shape 9 × 9 (i.e. h = w = 9) with center (4, 4) (i.e. c_1 = c_2 = 4), and scaled such that the elements in each kernel sum to 1 (Hansen et al., 2006). Figure 3 gives visualisation examples of the 3 different kernels with the settings mentioned above.

2.2. Evaluation metrics

We use the Mean Square Error (MSE) between the degraded objective image B_obj = X \ast K and the observed image B,

\mathrm{MSE}(B_{\mathrm{obj}}, B) = \frac{1}{d \cdot m \cdot n} \| B_{\mathrm{obj}} - B \|_2^2,

to measure the energy function (Ulyanov et al., 2018), and to track parameter learning iterations in the first experiment in the non-blind setting. Using this metric, energy minimisation seeks the image X that, when degraded, is the same as the observation B.
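The three kernels with the settings above can be generated with a short NumPy sketch (the helper names are our own; the formulas follow Hansen et al. (2006) and Kalalembang et al. (2009) as quoted above, with the line-segment kernel rasterised by rounding):

import numpy as np

def gaussian_kernel(h=9, w=9, s1=2.0, s2=2.0):
    c1, c2 = (h - 1) / 2, (w - 1) / 2          # center (4, 4) for a 9x9 kernel
    i, j = np.mgrid[0:h, 0:w]
    k = np.exp(-0.5 * ((i - c1) / s1) ** 2 - 0.5 * ((j - c2) / s2) ** 2)
    return k / k.sum()                          # scale so entries sum to 1

def defocus_kernel(h=9, w=9):
    c1, c2 = (h - 1) / 2, (w - 1) / 2
    r = int(min(h / 2, w / 2))                  # radius r = floor(min(h/2, w/2))
    i, j = np.mgrid[0:h, 0:w]
    k = ((i - c1) ** 2 + (j - c2) ** 2 <= r ** 2) / (np.pi * r ** 2)
    return k / k.sum()

def motion_kernel(h=9, w=9, alpha=3 * np.pi / 4):
    c1, c2 = (h - 1) // 2, (w - 1) // 2
    u = int(np.sqrt(2) * min(h / 2, w / 2))     # amplitude u = floor(sqrt(2)*min(h/2, w/2))
    k = np.zeros((h, w))
    for t in range(-u, u + 1):                  # line segment along angle alpha
        i = c1 + int(round(t * np.cos(alpha)))
        j = c2 + int(round(t * np.sin(alpha)))
        if 0 <= i < h and 0 <= j < w:
            k[i, j] += 1.0
    return k / k.sum()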


To measure the quality of image deconvolution quantitatively, we use the Peak Signal to Noise Ratio (PSNR, in dB) (Huynh-Thu & Ghanbari, 2008) between the obtained objective image X and the standard test image X_std

\mathrm{PSNR}(X, X_{\mathrm{std}}) = 10 \log_{10} \left[ \frac{R^2}{\mathrm{MSE}(X, X_{\mathrm{std}})} \right]

where R is the maximum possible pixel value of the image, e.g. R = 1 if images are in double-precision floating-point data type, and R = 255 if in 8-bit data type. In this work, we use the double-precision floating-point data type, i.e. R = 1.

In the experiment described in Section 4.3, we compare the gradient distributions among model output images and standard test images. To measure the similarity between a gradient frequency distribution Pr(·) and the one given by the standard test images Pr_std(·), we use the Kullback-Leibler (KL) divergence (Kullback, 1997)

D_{\mathrm{KL}}(\mathrm{Pr} \,\|\, \mathrm{Pr}_{\mathrm{std}}) = -\sum_{b \in \mathcal{B}} \mathrm{Pr}(b) \log \frac{\mathrm{Pr}_{\mathrm{std}}(b)}{\mathrm{Pr}(b)}

where b denotes a bin corresponding to a range of gradient values, and \mathcal{B} denotes the whole bin set covering all possible gradient values. By definition, the larger the KL divergence, the smaller the similarity between the two distributions, and vice versa.
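Both metrics are straightforward to compute; a minimal NumPy sketch consistent with the definitions above (function names are our own):

import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)               # 1/(d*m*n) * ||A - B||_2^2

def psnr(x, x_std, r=1.0):                     # r = 1 for double-precision images
    return 10.0 * np.log10(r ** 2 / mse(x, x_std))

def kl_divergence(pr, pr_std, eps=1e-12):      # distributions over the same bins
    pr, pr_std = pr + eps, pr_std + eps        # guard against empty bins
    return np.sum(pr * np.log(pr / pr_std))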

Figure 3: Visualisation examples of the 3 kernels: (a) Gaussian, (b) defocus, (c) motion blur.

3. Methodology

According to Section 1, both regularisation-based priors and deep image priors are embedded in energy minimisation models, which, in general, are formulated as (Fornasier et al., 2010)

\min_X \; E(X; B) + R(X)    (2)

where E(X; B) indicates the energy term associated with the data, and R(X) is the prior term. A general explanation of the energy term is the numerical difference between the given image data and the optimised image processed by the given degradation. For image deconvolution, the given degradation is convolution, therefore the energy is designed as E(X; B) = MSE(X \ast K, B). The energy term E(X; B) can also be designed for other specific tasks in image restoration, such as inpainting (Shen et al., 2003), super-resolution (Gerchberg, 1974) and image denoising (Rudin et al., 1992). Methods adopted in this work are all based on the deconvolution energy model and its variants.

3.1. Baseline models with regularisation prior

The gradient magnitude of a two-dimensional function x(s, t) is defined and formulated as the following new two-dimensional function (Gonzalez & Wintz, 1977)

|\nabla x|(s, t) = \| \nabla x(s, t) \|_2 = \sqrt{ \left( \frac{\partial x}{\partial s} \right)^2 + \left( \frac{\partial x}{\partial t} \right)^2 },

the discrete formulation of which for an image X is given by the following matrix

|\nabla X| = \sqrt{ \left( X D_{1,n}^\top \right)^2 + \left( D_{1,m} X \right)^2 }

where the square and the square root calculations are entry-wise, and D_{1,n} is the n × n discrete partial derivative operator

D_{1,n} = \begin{pmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \\ & & & 0 \end{pmatrix}.

In image processing, discrete gradient magnitudes are proven to be a strong prior for natural images (Shan et al., 2008; Hansen et al., 2006). The sum of such magnitudes over a single image is a regularisation representation of the image prior, i.e. the total variation (TV) norm of the image

\| X \|_{\mathrm{TV}} = \sum_{i=1}^{m} \sum_{j=1}^{n} \sqrt{ \left( X D_{1,n}^\top \right)_{i,j}^2 + \left( D_{1,m} X \right)_{i,j}^2 }.

The efficiency of the TV image norm has been proven for recovering blocky images (Dobson & Santosa, 1996) and images with sharp edges (Chan et al., 2000).

It is also known that the L1 norm is capable of expressing the sparsity of matrices (Friedman et al., 2001), defined as

\| X \|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} | X_{i,j} |.

In most instances, degradation convolution kernels are sparse (Shan et al., 2008). Thus L1 sparsity regularisation is a strong prior for convolution kernels in blind settings.

The baseline models in this work are energy minimisation with TV and L1 regularisation priors, the details of which in the two main settings are as follows.

Kernel-known: The baseline model with K known is formulated as the following energy minimisation model with TV regularisation prior

\min_X \; \mathrm{MSE}(X \ast K, B) + \alpha \| X \|_{\mathrm{TV}}    (3)

where α is the TV regularisation parameter. To solve the TV regularisation system efficiently, we adopt a fast gradient-based algorithm named MFISTA (Beck & Teboulle, 2009), which has shown remarkable time-efficiency and convergence properties for TV regularisation.
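As an illustration of the TV term in Equation 3, the discrete gradient magnitude and TV norm above reduce to a few lines of NumPy (forward differences with a zero last row/column, matching D_{1,n}; helper names are ours):

import numpy as np

def grad_components(x):
    """Forward-difference gradients of a 2-D image, zero at the last row/column."""
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    gx[:, :-1] = x[:, 1:] - x[:, :-1]   # X D_{1,n}^T : horizontal differences
    gy[:-1, :] = x[1:, :] - x[:-1, :]   # D_{1,m} X  : vertical differences
    return gx, gy

def tv_norm(x):
    gx, gy = grad_components(x)
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))  # isotropic TV norm ||X||_TV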

Blind deconvolution: The baseline system in the blind setting introduces an additional sparsity prior compared to the non-blind baseline above, and is formulated as

\min_{X, K} \; \mathrm{MSE}(X \ast K, B) + \alpha \| X \|_{\mathrm{TV}} + \beta \| K \|_1    (4)

where β is the L1 regularisation parameter. This TV-L1 double-prior system can be solved using the TNIP-MFISTA algorithm proposed in (Wang, 2017). To optimise both the image and the kernel, this algorithm adopts fix-update iterations alternating between MFISTA and an L1-regularised optimisation algorithm named the Truncated Newton Interior Point method (TNIP) (Kim et al., 2007).

3.2. Deconvolution with deep image prior

Deep image prior (Ulyanov et al., 2018) aims to capture the prior of images by the structure of a generative deep neural net. It re-parameterises the image X as the neural net output X = f(z; θ), defined as the following surjection

f^*: \mathrm{supp}\, p \times \Theta \to \mathcal{X}, \quad (z, \theta) \mapsto X

where supp p denotes the support of the input noise probability density function p, i.e. supp p = {z ∈ Ω | p(z) ≠ 0} (Royden & Fitzpatrick, 1988) with Ω the sample space of the noise vector z; Θ denotes the weight space determined by the network structure; and 𝒳 is the solution space of X, containing the prior information. The neural net f^* maps the random noise network input z and the network weights θ to the output X. Ideally, by adjusting the network structure to its optimum, the solution space 𝒳 only contains images with the desired prior information.

From the perspective of mechanics, the desired prior is expressed by the network structure, and the weights θ explore solutions on the prior. The random input noise z is a high-dimensional Gaussian vector. The main reason to take random noise as the network input is to increase robustness (Morales et al., 2007) and overcome degeneracy issues. On the other hand, high-dimensional Gaussian vectors are essentially concentrated uniformly on a sphere (Johnstone, 2006). Therefore the input space supp p can be approximated as a single point, and the surjection can be rewritten with the input space eliminated, giving a ConvNet with parameters

f: \Theta \to \mathcal{X}, \quad \theta \mapsto X

which maps only a selection of parameters θ of the network to an output image X. In the rest of the report, f(θ) denotes the output image of the deep image prior f with weights θ.

3.2.1. Deconvolution energy functions with DIP

Traditional energy minimisation (formulated as Equation 2) for image deconvolution explores the whole image space as the domain. By re-parameterising the image term X into the neural net output f(θ), the solution space contains the prior information expressed by the structure of f instead of the prior term R(X). Thereby, with deep image prior the general energy model of Equation 2 turns into

\min_\theta \; E(f(\theta); B).    (5)

By optimising the network weights θ on an ideal structure, an image is optimised conditioned on the desired prior.

Kernel-known: The image deconvolution objective with deep image prior is derived directly from Equation 5, by applying the deconvolution energy function

\min_\theta \; \mathrm{MSE}(f(\theta) \ast K, B)    (6)

where K is the observed kernel. The minimiser θ* is obtained by the Adam optimiser (Kingma & Ba, 2014) with random initialisation.

Blind deconvolution: In blind settings, the convolution kernel K is assumed to be unobservable. Thereby the kernel is parameterised by another deep neural net g with weights η, containing prior information regarding degradation kernels. After parameterising the kernel matrix in Equation 6, the blind deconvolution objective with deep image prior is formulated as the following system

\min_{\theta, \eta} \; \mathrm{MSE}(f(\theta) \ast g(\eta), B)    (7)

where f and g have different ConvNet structures, since the prior information of natural images and of kernels are apparently different. To obtain the minimisers θ* and η*, we use Adam to update the two variables simultaneously.

Figure 4 gives a diagram summarising our DIP deconvolution models in both settings; hyperparameter settings for both f and g are explained in detail in Section 4.

Figure 4: The overall pipeline of our DIP deconvolution model, corresponding to Equations 6 and 7.
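To make the pipeline of Figure 4 concrete, the following PyTorch sketch optimises the blind objective in Equation 7 (the kernel-known objective in Equation 6 corresponds to fixing the kernel to the observed one); image_net and kernel_net stand for the hourglass ConvNets specified in Section 4.1, and all helper names are our own illustration rather than the report's exact implementation:

import torch
import torch.nn.functional as F

def dip_blind_deconv(image_net, kernel_net, z_img, z_ker, b, n_iter=5000, lr=1e-3):
    """Minimise MSE(f(theta) * g(eta), B) over theta and eta jointly (Equation 7)."""
    params = list(image_net.parameters()) + list(kernel_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        x = image_net(z_img)                       # f(theta): (1, d, m, n), Sigmoid output
        k = kernel_net(z_ker).view(1, 1, 9, 9)     # g(eta): 9x9 kernel, Softmax output
        k = k.expand(x.shape[1], 1, 9, 9)          # all channels share the kernel
        # zero padding here for brevity; the report uses reflexive boundaries
        b_hat = F.conv2d(x, k, padding=4, groups=x.shape[1])
        loss = F.mse_loss(b_hat, b)                # the energy of Equation 7
        loss.backward()
        opt.step()
    return image_net(z_img).detach(), kernel_net(z_ker).detach()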

4. Experiments

To explore to what extent DIP can capture prior information of natural images in deconvolution models, we

1. compare the energy convergence property during DIP deconvolution optimisation between natural images and noise images;

2. compare the gradient distributions among standard test images and images deconvolved by both the baseline model and DIP.

This part of our experiment aims to evaluate DIP's expressiveness on natural images, and is therefore only conducted in the kernel-known setting.

The second part of our experiment aims to find out whether our proposed DIP-based deconvolution models improve the performance of image deconvolution in both kernel-known and blind settings, compared with the baseline models. In our results, a PSNR comparison is conducted for quantitative analysis of deconvolution performance, and qualitative analysis is based on the deconvolved images presented.

In the following, we first specify our experiment setup, including the regularisation parameters in the baseline models and the ConvNet hyperparameters for both images and kernels in the DIP models. Then we show the achieved results and discuss them with respect to our research objectives.

4.1. Experiment Setup

Convolution: All convolution processes in this work, including data generation and energy function calculations, are subject to the reflexive boundary condition (Hansen et al., 2006). Specifically, for convolution on color images, all channels share the same kernel (Hansen et al., 2006).

Baseline: In the kernel-known setting, the TV regularisation parameter α is set to 2 × 10⁻², within a reasonable range for image deconvolution according to (Beck & Teboulle, 2009). In the blind setting, the regularisation parameters are set to α = 2 × 10⁻³ and β = 5, the same as in (Wang, 2017), among whose experiments this setting achieved the best results.

ConvNet architecture as DIP: As suggested for the super-resolution task setting in (Ulyanov et al., 2018), we use the hourglass architecture (shown in Figure 5) as the main body of the DIP for both images and kernels (if blind), with the following hyperparameter settings:

DIP for images:
z ∈ R^{32×m×n} ~ U(0, 0.1);
n_u = n_d = [128] × 5;
k_u = k_d = [3, 3, 3, 3, 3];
n_s = [4, 4, 4, 4, 4];
k_s = [1, 1, 1, 1, 1];
upsample stride size 2;
Sigmoid to output.

DIP for kernels (if blind):
z ∈ R^{32×w×h} ~ U(0, 0.1);
n_u = n_d = [128] × 5;
k_u = k_d = [3, 3, 3, 3, 3];
n_s = [4, 4, 4, 4, 4];
k_s = [1, 1, 1, 1, 1];
upsample stride size 1;
Softmax to output.

Figure 5: The hourglass architecture used in our experiments as the main part of the DIP structure. Above is the high-level encoder-decoder network with skip connections (Mao et al., 2016), built from Conv, Batch Normalization, LeakyReLU, Upsample and Downsample blocks. The details inside each downsample connection d_i, upsample connection u_i and skip connection s_i are shown below the high-level structure, where n_u[i], n_d[i], n_s[i] denote the number of filters in the downsample, upsample and skip connection at depth i respectively, and k_u[i], k_d[i], k_s[i] are the corresponding kernel sizes.
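For concreteness, these settings might be collected as configuration dictionaries in code (a hypothetical layout of ours, not the exact implementation used for the experiments); the add-noise regularisation described next is then applied to z before each forward pass:

dip_image_cfg = dict(
    input_depth=32,                      # z ~ U(0, 0.1), shape (32, m, n)
    n_up=[128] * 5, n_down=[128] * 5,    # filters per depth
    k_up=[3] * 5, k_down=[3] * 5,        # conv kernel sizes
    n_skip=[4] * 5, k_skip=[1] * 5,      # skip connections
    upsample_stride=2,
    output_activation='sigmoid',         # image pixels range over [0, 1]
)

dip_kernel_cfg = dict(
    input_depth=32,                      # z ~ U(0, 0.1), shape (32, w, h)
    n_up=[128] * 5, n_down=[128] * 5,
    k_up=[3] * 5, k_down=[3] * 5,
    n_skip=[4] * 5, k_skip=[1] * 5,
    upsample_stride=1,                   # stride 1 avoids degeneration on a 9x9 output
    output_activation='softmax',         # kernel entries sum to 1
)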

We put Sigmoid and Softmax on the ConvNet outputs for images and kernels respectively, because image pixels range over [0, 1] and kernel entries sum to 1. The reason for setting the upsample stride size to 1 for kernel generation is to prevent degeneration due to the kernels' small size (9 × 9). It is worth mentioning that we apply add-noise regularisation to the neural network, i.e. we perturb the noise input z with an additive update z ← z + Δz at the beginning of each iteration. This technique aims to increase model robustness to perturbation (Morales et al., 2007). Although this regularisation has a negative impact on the optimisation process, we find that the network can still drive the energy function towards 0 given a sufficient number of iterations, and it improves deconvolution performance.

4.2. Bias in convergence

Even though the complex structure of the neural network in a DIP model allows the solution space to have a variety of features regarding natural images, it is still possible for the DIP model to express interference information other than natural images (Szegedy et al., 2013), e.g. noise. Therefore, we introduce noise into our experiments, using our DIP kernel-known model on natural images (incl. greyscale and color images) and on noise respectively. By comparing the convergence properties of the energy functions on the two during the optimisation process, we can tell whether our model blocks such interference information from its solution space.

In our control experiment, we use Gaussian white noise and uniform noise, generated from the Gaussian distribution N(0, 1) and the uniform distribution U(0, 1). Figure 6 shows the optimisation curves of energy values with respect to the number of iterations in DIP kernel-known deconvolution, where each plot corresponds to one degradation kernel.

Figure 6: Optimisation curves of different types of images/noise (greyscale, color, noise ~ N(0, 1), noise ~ U(0, 1)) for kernel-known DIP deconvolution with the 3 kernels (Gaussian, defocus, motion blur) respectively (to iteration 5000).

Except for the Gaussian kernel, energy value convergence shows obvious differences between natural images and noise in DIP deconvolution with the defocus and motion blur kernels. More specifically, we observe that the curves for noise are clearly above those for natural images, and sudden leaps take place in the energy values for noise in both plots. We speculate that the cause of this observation is that the ConvNet structure in DIP is unstable to parameter fluctuations when generating noise, which also explains how DIP deconvolution blocks noise information. For the Gaussian kernel, although Figure 6 shows no wide difference between noise and natural images, in Figure 7 we can still observe that the energy value for the uniform noise converges more slowly than those for natural images in early iterations, which also indicates that the DIP model blocks uniform noise in Gaussian-degraded deconvolution.

Figure 7: Zoomed-in optimisation curve of the Gaussian case originally shown in Figure 6 (from iteration 80 to 200).

The DIP deconvolution in the control experiments with noise indeed shows biases towards natural images from the perspective of energy function convergence, which means that in most cases DIP is capable of blocking interference and irrelevant information in image deconvolution.

4.3. Image gradient distributions

Previous image statistics studies (Weiss & Freeman, 2007; Roth & Black, 2005) have shown that natural image gradients follow heavy-tailed distributions, which provide natural prior information for natural images. Starting from this, we evaluate the gradient distributions of our model-generated images against a "standard" distribution which can be assumed to represent the natural prior.

The gradient of image X at pixel (i, j) is defined as the following 2-D vector (Di Zenzo, 1986)

\partial_{i,j} X = \left[ X_{i+1,j} - X_{i,j},\; X_{i,j+1} - X_{i,j} \right]

where each component is a gradient value. In this experiment, we calculate the image gradient value distributions of 3 image sets: the standard test images, the images deconvolved by the baseline kernel-known model, and the images deconvolved by the DIP kernel-known model. The probability distribution estimated from frequencies for each set is denoted by Pr̂_std(·), Pr̂_reg(·) and Pr̂_dip(·) respectively, where Pr̂_std(·) is assumed to be the "standard" distribution. Therefore, between the distributions of the 2 model-generated image sets, the one with greater similarity to the "standard" distribution is more in line with the natural prior.

Since the values of image gradients are continuous (due to their double-precision floating-point data type), we split the range of gradient values [−1, 1] into 64 disjoint bins and count the number of gradient values that fall into each bin as the frequency. Figure 8 plots the logarithm probability distribution for each image set.

Figure 8: Logarithm probability distributions of image gradient components (standard test images, reg prior, DIP prior), with close-ups of the peak and of the mid-range.
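These binned distributions, and the KL comparison reported below, can be reproduced with a short NumPy sketch (our own helper, building on the kl_divergence function sketched in Section 2.2):

import numpy as np

def gradient_distribution(images, n_bins=64):
    """Estimate Pr over gradient components, binned on [-1, 1]."""
    comps = []
    for x in images:                                    # x: (m, n) image plane
        comps.append((x[1:, :] - x[:-1, :]).ravel())    # vertical differences
        comps.append((x[:, 1:] - x[:, :-1]).ravel())    # horizontal differences
    hist, _ = np.histogram(np.concatenate(comps), bins=n_bins, range=(-1, 1))
    return hist / hist.sum()                            # frequencies -> probabilities

# e.g. D_KL(Pr_dip || Pr_std):
# kl_divergence(gradient_distribution(dip_images), gradient_distribution(std_images))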

Since the plot is in log scale, we can infer that all three distributions have the previously mentioned heavy-tailed property, and their log-probability curves are similar in shape to each other. The close-up of the peak, where the gradient values lie around 0, shows a decreasing order of baseline, DIP, standard in terms of log-probability. This shows that the density of the baseline and DIP models for gradient values close to 0 is larger than that of the standard images, and furthermore that the DIP model performs closer to the standard than the baseline in this range.

However, the close-up between the peak and the tail gives the order standard, baseline, DIP, the exact opposite of the peak-range results. These results are as expected, because the TV regulariser in the baseline tends to reduce image gradient values due to the property of the TV norm (Chan et al., 2005), and thereby gives high frequency where gradients are close to 0 and low frequency outside the peak range; this also illustrates DIP's better performance on high-magnitude gradients.

Overall, the KL divergence between the DIP-generated image gradient distribution and that of the standard test images is D_KL(Pr̂_dip ∥ Pr̂_std) = 0.9537, while D_KL(Pr̂_reg ∥ Pr̂_std) = 1.2598, which indicates that DIP has greater similarity to the "standard" than the baseline in terms of image gradient distributions. This result is foreseeable because, although the baseline model performs closer to the standard than DIP in the middle range, DIP performs closer to the standard in the peak range, which has much higher frequency.

4.4. Performance on deconvolution

To determine whether the DIP deconvolution models beat the baselines, we run our baselines and DIP deconvolution models on 18 degraded images (3 degradation kernels on 6 standard test images) in both kernel-known and blind settings. Then we compute the PSNR between the generated results and the original standard test images, and visualise some of the results, for quantitative and qualitative comparison respectively.

Shown in Table 1 and Table 2 are the PSNR comparisons between the baseline (denoted by reg) and deep image prior for kernel-known and blind deconvolution respectively.

                      C.man    house    Lena     boat     house.c  peppers  avg.
Gaussian      reg     24.108   29.541   29.663   26.353   27.842   28.550   27.676
              DIP     25.093   30.745   30.705   27.436   29.021   28.827   28.638
Defocus       reg     23.841   29.053   29.164   25.874   27.488   28.210   27.272
              DIP     25.688   30.473   30.355   27.480   29.594   29.089   28.780
Motion blur   reg      6.921    6.142    5.251    6.268    6.172    5.697    6.075
              DIP     27.089   31.566   31.801   28.435   30.007   29.661   29.760

Table 1: Kernel-known deconvolution PSNR comparison between baseline (denoted by reg above) and deep image prior.

                      C.man    house    Lena     boat     house.c  peppers  avg.
Gaussian      reg     19.553   14.214   29.798   26.323   14.662   24.790   21.557
              DIP     23.230   27.748   26.094   24.977   27.122   21.347   25.086
Defocus       reg     18.845   13.519   27.435   24.035   13.849   24.782   20.411
              DIP     23.021   23.094   26.286   25.154   24.462   28.229   25.041
Motion blur   reg     16.835   12.865   25.304   22.625   15.295   22.207   19.189
              DIP     23.935   24.382   26.156   25.039   22.862   26.152   24.754

Table 2: Blind deconvolution PSNR comparison between baseline (denoted by reg above) and deep image prior.

Overall, our DIP deconvolution models always perform better than the baseline models in terms of average PSNR for each degradation kernel. In the kernel-known setting, DIP even gives a larger PSNR value on every single degraded image. In particular, when the kernel is motion blur, the baseline gives unexpectedly bad results, as shown by the anomalously low PSNR values in Table 1. We suspect this is because the TV regulariser overfits the gradient prior in motion deblurring, so that the non-edge regions of the image tend towards the same pixel value (as Figure 9b shows). When the kernel type is Gaussian or defocus, the performance is improved by around 1.2 ± 0.3 in terms of PSNR, as we expect. In the blind setting, DIP improves the PSNR performance by around 5.0 ± 0.5, which is significantly beyond the performance of the baseline. However, the baseline gives higher PSNR values than deep image prior for a few pictures and kernel types, such as Lena degraded by Gaussian or defocus. A possible reason is that the gradient values in Lena are relatively small, so that TV regularisation gives better results on this specific image.

Figure 9: Comparison on motion blurred cameraman between baseline and DIP in both kernel-known and blind settings: (a) motion blurred, (b) kernel-known baseline, (c) kernel-known DIP, (d) blind baseline, (e) blind DIP.

Figure 10 visualises the comparison between images restored from Gaussian-degraded Lena and defocused house.c in the kernel-known setting. From the pictures and the close-ups of specific areas, we can see that DIP performs better in detail recovery. For example, the hair shown in Figure 10b has only a clear outline, while the details shown in Figure 10c are more abundant, and likewise for the trees shown in Figure 10e and Figure 10f respectively. One possible explanation is that the TV regulariser over-optimises the sharpness of images, resulting in great performance on outlines but not on details.

Figure 10: Comparison on Gaussian-degraded Lena and defocused house.c between baseline and deep image prior in the kernel-known (abbreviated as kk above) setting, with close-ups of specified areas: (a) Gaussian, (b) baseline (kk), (c) DIP (kk), (d) defocused, (e) baseline (kk), (f) DIP (kk).

Beyond the two kernels mentioned above, this work achieves especially remarkable results in motion blur deconvolution. Figure 9 visualises the comparison between images restored from motion blurred C.man in both settings. As mentioned in the previous paragraph, the kernel-known baseline gives an unsatisfactory result (Figure 9b), where only the basic outline of the cameraman can be observed and all other details inside the image are lost, while kernel-known DIP restores the image almost perfectly, as shown in Figure 9c. For blind motion deblurring on C.man, the result given by the baseline (Figure 9d) still exhibits motion blur, and the shape of its estimated kernel is completely different from the motion blur kernel, while DIP removes the motion blur efficiently and the shape of its estimated kernel is much closer to the motion blur kernel than the baseline's (see Figure 9e), which also verifies ConvNet's expressiveness on degradation kernels.

5. Related work

The earliest traditional methods of image deconvolution include the Richardson-Lucy (RL) method (Richardson, 1972) and Wiener filtering (Wiener, 1949). Due to their simplicity and efficiency, these two methods are still widely used today, but they may be subject to ringing artifacts (Proakis, 2001). To address this, many refinements based on handcrafted regularisation priors came out. Dey et al. (2006) adopted a TV regulariser as the prior in kernel-known deconvolution. Yuan et al. (2008) proposed a progressive multi-scale optimisation method based on the RL method, with edge-preserving regularisation as the image prior. For degradation kernels, early methods (Reeves & Mersereau, 1992) only dealt with simple parametric forms. Later, natural image statistics were used to estimate kernels (Fergus et al., 2006; Levin, 2007). After that, Shan et al. (2008) and Wang (2017) adopted an L1 regulariser as the kernel prior in blind deconvolution. However, the handcrafted priors mentioned above have relatively simple structures, so their expressiveness is rather limited (Majumdar & Ward, 2009).

This work is inspired by traditional image deconvolution methods with handcrafted priors (Rudin et al., 1992; Wang, 2017), but tries to use deep image priors instead of handcrafted ones. It uses ConvNets to express the prior information of both natural images and degradation kernels, bridging the gap between kernel-known and blind deconvolution. Besides, its ConvNet-based image prior representation links two sets of popular deconvolution methods (Ulyanov et al., 2018): learning-based approaches by ConvNet (Xu et al., 2014; Zhang et al., 2017; Mao et al., 2016) and learning-free approaches by handcrafted priors (Shan et al., 2008).

6. Conclusions

In this work, we investigate a deep ConvNet's expressiveness on natural image prior information in DIP-based deconvolution, and present the performance of the DIP model on image deconvolution in both kernel-known and blind settings. We also provide detailed descriptions of data generation, the adopted kernels and the evaluation metrics. Most importantly, we propose DIP-based energy minimisation pipelines for image deconvolution in both kernel-known and blind settings, and achieve performance far beyond our baselines (Beck & Teboulle, 2009; Wang, 2017).

The motivation of this work is to adopt DIP with more complex structures to express image prior information, building on the ideas of traditional learning-free optimisation methods, and at the same time to improve the performance of traditional learning-free methods on the image deconvolution task. Through the first two experiments, we show to what extent DIP blocks noise during the optimisation process and how it captures the image gradient prior, and thereby demonstrate that the ConvNet structure of DIP captures strong prior information on natural images in terms of generation types and image gradient distributions. In the final experiment, we show how DIP deconvolution performs better than the baselines in terms of both PSNR values and visual effects, especially with the motion blur kernel, and thereby confirm the significant improvement of DIP models over the baselines. We did find a fairly good DIP structure for expressing the prior of degradation kernels in the blind setting, but we believe better ones exist. Therefore, future endeavours on this topic should focus on the structures of the kernel-generating DIP, trying other hyperparameters for the hourglass, or other ConvNet structures, e.g. texture nets (Ulyanov et al., 2016). Besides, as in (Shan et al., 2008), the formulation of our energy function may be adjusted with gradient terms to become more suitable for the task.

References

Basavaprasad, B. and Ravi, M. A study on the importance of image processing and its applications. IJRET: International Journal of Research in Engineering and Technology, 3, 2014.

Beck, Amir and Teboulle, Marc. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11):2419–2434, 2009.

Chan, Tony, Marquina, Antonio, and Mulet, Pep. High-order total variation-based image restoration. SIAM Journal on Scientific Computing, 22(2):503–516, 2000.

Chan, Tony, Esedoglu, Selim, Park, Frederick, and Yip, A. Recent developments in total variation image restoration. Mathematical Models of Computer Vision, 17(2), 2005.

Chan, Tony F and Wong, Chiu-Kwong. Total variation blind deconvolution. IEEE Transactions on Image Processing, 7(3):370–375, 1998.

Dabov, Kostadin, Foi, Alessandro, and Egiazarian, Karen. Video denoising by sparse 3D transform-domain collaborative filtering. In 2007 15th European Signal Processing Conference, pp. 145–149. IEEE, 2007.

Dey, Nicolas, Blanc-Feraud, Laure, Zimmer, Christophe, Roux, Pascal, Kam, Zvi, Olivo-Marin, Jean-Christophe, and Zerubia, Josiane. Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microscopy Research and Technique, 69(4):260–266, 2006.

Di Zenzo, Silvano. A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing, 33(1):116–125, 1986.

Dobson, David C and Santosa, Fadil. Recovery of blocky images from noisy and blurred data. SIAM Journal on Applied Mathematics, 56(4):1181–1198, 1996.

Fergus, Rob, Singh, Barun, Hertzmann, Aaron, Roweis, Sam T, and Freeman, William T. Removing camera shake from a single photograph. In ACM Transactions on Graphics (TOG), volume 25, pp. 787–794. ACM, 2006.

Fornasier, Massimo, Langer, Andreas, and Schönlieb, Carola-Bibiane. A convergent overlapping domain decomposition method for total variation minimization. Numerische Mathematik, 116(4):645–685, 2010.

Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. The Elements of Statistical Learning, volume 1. Springer Series in Statistics, New York, NY, USA, 2001.

Gerchberg, R. W. Super-resolution through error energy reduction. Optica Acta: International Journal of Optics, 21(9):709–720, 1974.

Gonzalez, Rafael C and Wintz, Paul. Digital Image Processing. Addison-Wesley Publishing Co., Reading, Mass., 1977.

Hansen, Per Christian, Nagy, James G, and O'Leary, Dianne P. Deblurring Images: Matrices, Spectra, and Filtering, volume 3. SIAM, 2006.

Hummel, Robert A, Kimia, B, and Zucker, Steven W. Deblurring Gaussian blur. Computer Vision, Graphics, and Image Processing, 38(1):66–80, 1987.

Huynh-Thu, Quan and Ghanbari, Mohammed. Scope of validity of PSNR in image/video quality assessment. Electronics Letters, 44(13):800–801, 2008.

Jain, Anil K. Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ, 1989.

Johnstone, Iain M. High dimensional statistical inference and random matrices. arXiv preprint math/0611589, 2006.

Kalalembang, Erik, Usman, Koredianto, and Gunawan, Irwan Prasetya. DCT-based local motion blur detection. In International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering 2009, pp. 1–6. IEEE, 2009.

Kim, Seung-Jean, Koh, Kwangmoo, Lustig, Michael, and Boyd, Stephen. An efficient method for compressed sensing. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, volume 3, pp. III–117. IEEE, 2007.

Kingma, Diederik P and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kullback, Solomon. Information Theory and Statistics. Courier Corporation, 1997.

Kundur, Deepa and Hatzinakos, Dimitrios. Blind image deconvolution. IEEE Signal Processing Magazine, 13(3):43–64, 1996.

Levin, Anat. Blind motion deblurring using image statistics. In Advances in Neural Information Processing Systems, pp. 841–848, 2007.

Majumdar, Angshul and Ward, Rabab K. Classification via group sparsity promoting regularization. 2009.

Mao, Xiao-Jiao, Shen, Chunhua, and Yang, Yu-Bin. Image denoising using very deep fully convolutional encoder-decoder networks with symmetric skip connections. CoRR, abs/1603.09056, 2016. URL http://arxiv.org/abs/1603.09056.

Morales, Nicolás, Gu, Liang, and Gao, Yuqing. Adding noise to improve noise robustness in speech recognition. In Eighth Annual Conference of the International Speech Communication Association, 2007.

Newell, Alejandro, Yang, Kaiyu, and Deng, Jia. Stacked hourglass networks for human pose estimation. CoRR, abs/1603.06937, 2016. URL http://arxiv.org/abs/1603.06937.

Proakis, John G. Digital Signal Processing: Principles, Algorithms and Applications. Pearson Education India, 2001.

Reeves, Stanley J and Mersereau, Russell M. Blur identification by the method of generalized cross-validation. IEEE Transactions on Image Processing, 1(3):301–311, 1992.

Richardson, William Hadley. Bayesian-based iterative method of image restoration. JOSA, 62(1):55–59, 1972.

Roggemann, Michael C, Welsh, Byron M, and Hunt, Bobby R. Imaging Through Turbulence. CRC Press, 2018.

Roth, S. and Black, M. J. Fields of experts: A framework for learning image priors. In IEEE Conf. on Computer Vision and Pattern Recognition, volume 2, pp. 860–867, June 2005.

Royden, Halsey Lawrence and Fitzpatrick, Patrick. Real Analysis, volume 32. Macmillan, New York, 1988.

Rudin, Leonid I, Osher, Stanley, and Fatemi, Emad. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.

Sezan, M Ibrahim and Tekalp, A Murat. Survey of recent developments in digital image restoration. Optical Engineering, 29(5):393–405, 1990.

Shan, Qi, Jia, Jiaya, and Agarwala, Aseem. High-quality motion deblurring from a single image. In ACM Transactions on Graphics (TOG), volume 27, pp. 73. ACM, 2008.

Shen, Jianhong, Kang, Sung Ha, and Chan, Tony F. Euler's elastica and curvature-based inpainting. SIAM Journal on Applied Mathematics, 63(2):564–592, 2003.

Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian, and Fergus, Rob. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Ulyanov, Dmitry, Lebedev, Vadim, Vedaldi, Andrea, and Lempitsky, Victor S. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML, volume 1, pp. 4, 2016.

Ulyanov, Dmitry, Vedaldi, Andrea, and Lempitsky, Victor. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018.

Wang, Zhunxuan. An iterative method for image deblurring based on total variation and compressed sensing. Bachelor's thesis, School of Mathematical Sciences, Fudan University, Shanghai, China, June 2017.

Weiss, Yair and Freeman, William T. What makes a good model of natural images? In CVPR. IEEE Computer Society, 2007.

Wiener, Norbert. Extrapolation, Interpolation and Smoothing of Stationary Time Series, with Engineering Applications. MIT Press, 1949.

Xu, Li, Ren, Jimmy SJ, Liu, Ce, and Jia, Jiaya. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems, pp. 1790–1798, 2014.

Yitzhaky, Yitzhak and Kopeika, Norman S. Identification of blur parameters from motion blurred images. Graphical Models and Image Processing, 59(5):310–320, 1997.

Yuan, Lu, Sun, Jian, Quan, Long, and Shum, Heung-Yeung. Progressive inter-scale and intra-scale non-blind image deconvolution. In ACM Transactions on Graphics (TOG), volume 27, pp. 74. ACM, 2008.

Zhang, Kai, Zuo, Wangmeng, Gu, Shuhang, and Zhang, Lei. Learning deep CNN denoiser prior for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3929–3938, 2017.