Image Deconvolution with Deep Image Prior: Final Report
Abstract

Image deconvolution has always been a hard inverse problem because of its mathematically ill-posed property (Chan & Wong, 1998). A recent work (Ulyanov et al., 2018) proposed deep image prior (DIP), which uses neural net structures to represent image prior information. This work mainly focuses on the task of image deconvolution, using DIP to express image prior information and refine the domain of its objectives. It proposes new energy functions for kernel-known and blind deconvolution respectively in terms of DIP, and uses the hourglass ConvNet (Newell et al., 2016) as the DIP structure to represent sharpness or other higher-level priors of natural images, as well as the prior in degradation kernels. From the results on 6 standard test images, we first show that DIP with a ConvNet structure is strongly capable of capturing natural image priors in image deconvolution. We then conclude that our DIP models significantly improve deconvolution performance compared with the baselines in both kernel-known (Beck & Teboulle, 2009) and blind (Wang, 2017) settings, in terms of both the PSNR metric and visual deconvolution quality.

1. Introduction

Image degradation is a realistic and hard problem because it tends to make images lose the key information they originally had. Recovering from degraded images helps people obtain more pictorial information from their origins (Basavaprasad & Ravi, 2014). One of the many ways to model the process of image degradation is convolution with the translational invariance property (Xu et al., 2014):

    B = X ∗ K + E    (1)

where X ∈ R^{d×m×n} is the original image, K ∈ R^{h×w} is the convolution kernel, E ∈ R^{d×m×n} is the additive noise, B ∈ R^{d×m×n} is the degraded image, and d denotes the number of channels in the images (1 for greyscale images and 3 for color images). Image deconvolution is the process of recovering the original image X from the observed degraded image B, i.e. the inverse process of convolutional image degradation. This work focuses on the task of image deconvolution in two different settings, kernel-known and kernel-unknown (a.k.a. blind deconvolution).

Kernel-known: The preliminary stage of image deconvolution research mainly considers the case where the convolution kernel is given (Sezan & Tekalp, 1990), i.e. recovering X with K in Equation 1 known. This problem is ill-posed, because simply applying the inverse of the convolution operation to the degraded image B with kernel K, i.e. B ∗ K^{−1}, gives an inverted noise term E ∗ K^{−1}, which dominates the solution (Hansen et al., 2006).

Blind deconvolution: In reality, we can hardly obtain detailed kernel information, in which case the deconvolution problem is formulated in a blind setting (Kundur & Hatzinakos, 1996). More concisely, blind deconvolution is to recover X without knowing K. This task is much more challenging than it is under non-blind settings, because the observed information becomes less and the domains of the variables become larger (Chan & Wong, 1998).

In image deconvolution, prior information on unknown images and kernels (in blind settings) can significantly improve the deconvolved results. A traditional representation for such prior information is handcrafted regularisers in image energy minimisation (Fornasier et al., 2010), e.g. total variation (TV) regularisation for image sharpness (Chan & Wong, 1998) and L1 regularisation for kernel sparsity (Shan et al., 2008; Wang, 2017). However, prior representations like the above-mentioned regularisers have limited expressiveness (Majumdar & Ward, 2009). Therefore, this work aims to find better prior representations of images and kernels to improve deconvolution performance.

Deep image prior (DIP) (Ulyanov et al., 2018) is a recently proposed image prior representation. The main idea of DIP is to substitute the image variable in energy minimisation by the output of a deep convolutional neural net (ConvNet) with random noise input, so that the image prior information is captured by the hyperparameters of the ConvNet, and the output image is determined by the parameters of the ConvNet. More specifically, DIP-based image deconvolution can be described as follows. First, find a hyperparameter setting for the ConvNet which is capable of expressing the image prior implicitly. Then, search for the optimal parameters of the ConvNet that minimise the energy.

Compared to handcrafted regularisers, DIP expresses prior information by its network structure instead of by terms in the minimisation objective, and explores solutions in the ConvNet's parameter space instead of the image space. One point to emphasise here is that the prior information expressed by both is embodied in their own formulations and structures, and does not require a large amount of data for training.

Deep neural architectures have a strong capability to accommodate and express information because of their intricate and flexible structure (Szegedy et al., 2013). Compared to other image prior representations with limited structure (e.g. regularisers), neural nets with such powerful expressiveness seem more capable of capturing high-level prior information of natural images and achieving remarkable results in image deconvolution. Therefore, this work aims to analyse the expressive power of DIP on natural images, and to use DIP to improve the performance of image deconvolution compared to regularisation-based priors. To summarise with respect to contribution, this work proposes the general energy function of DIP-based image deconvolution and constructs its optimisation pipeline (see Section 3), which achieves considerably better performance than other learning-free baselines under different degradation kernels (see Section 4), especially for motion blur (see Figure 9).

MLP Coursework 4 – Final Report (G045)

2. Data set and evaluation metrics

As discussed in Section 1, capturing image priors by either regularisation terms or deep neural net structures is learning-free. Therefore, the data set explored in this work is only used for testing. Experiments and performance evaluation are conducted on a data set of 6 standard test images shown in Figure 2. It contains 4 greyscale images and 2 color images. These images have also been used for performance testing in many related works, such as denoising (Dabov et al., 2007) and TV deblurring (Beck & Teboulle, 2009), which to some extent guarantees the consistency and reliability of experimental results.

Figure 2: Standard test image data set experimented on in our work (zoomed out), containing 4 greyscale and 2 color images, named cameraman (abbr. C.man), house, Lena, boat, house.c and peppers respectively from left to right. The original resolution (256×256 for C.man, house and house.c; 512×512 for Lena, boat and peppers) is marked below each image.

…the observed image B. The generation processes can be illustrated by the diagram in Figure 1. The noise matrix E has entries independently and identically distributed as a Gaussian, and the noise strength (i.e. standard deviation) σ is fixed to 0.01 in order to reduce experimental variables. To explore different kinds of degradation models, three common kernels for different kinds of degradation, the Gaussian kernel (Hummel et al., 1987), defocus (Hansen et al., 2006) and motion blur (Yitzhaky & Kopeika, 1997), are used to generate the data set.

[Figure 1 diagram: Original Image ∗ Kernel → + Noise → Observed Image]
Figure 1: The generation processes of observed images. For each process, we first convolve the original standard image by a given kernel, then add a noise term to the convolved image.

Gaussian: The kernel for degradation caused by atmospheric turbulence can be described as a two-dimensional Gaussian function (Jain, 1989; Roggemann et al., 2018), and the entries of the unscaled kernel are given by the formula (Hansen et al., 2006)

    K_{i,j} = exp( −(1/2)((i − c1)/s1)² − (1/2)((j − c2)/s2)² )

where (c1, c2) is the center of K, and (s1, s2) determines the width of the kernel (i.e. the standard deviation of the Gaussian function). In this work, s1 and s2 are set to s1 = s2 = 2.0.

Defocus: Out-of-focus is another common issue in optical imaging. Knowledge of the physical process that causes out-of-focus provides an explicit formulation of the kernel (Hansen et al., 2006)

    K_{i,j} = 1/(πr²) if (i − c1)² + (j − c2)² ≤ r², and 0 otherwise,

where r denotes the radius of the kernel, which is set to r = ⌊min(h/2, w/2)⌋ in this work.

Motion blur: This usually happens if the scene being recorded changes within a single exposure, for example when objects move at high speed or the lens shakes while taking a picture. Disregarding the noise, the convolution process of motion blur with amplitude u and shifting angle α is given by the formula (Kalalembang et al., 2009)

    B_{i,j} = (1/(2u + 1)) Σ_{k=−u}^{u} X_{i + k cos α, j + k sin α}

in which the shape of the kernel is a line segment, as Figure 3c shows. In this work, the blur amplitude and shifting angle are set to u = ⌊√2 · min(h/2, w/2)⌋ and α = 3π/4.

All the kernels adopted in the data generation processes and experiments of this work are of shape 9×9 (i.e. h = w = 9) with center (4, 4) (i.e. c1 = c2 = 4), and are scaled such that the elements of each kernel sum to 1 (Hansen et al., 2006). Figure 3 gives visualisation examples of the 3 different kernels with the settings mentioned above.

2.2.
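The three kernel formulas and the generation process of Equation 1 are concrete enough to sketch in code. The following NumPy sketch is illustrative only: the function names, the FFT-based circular convolution, and the placement of the floor in the motion-blur amplitude u are our own choices, not the report's implementation. It builds the 9×9 kernels with the stated settings (s1 = s2 = 2.0, r = ⌊min(h/2, w/2)⌋, α = 3π/4, normalised to sum to 1) and degrades an image with noise strength σ = 0.01.

```python
import numpy as np

H = W = 9    # kernel shape used in this work (h = w = 9)
C1 = C2 = 4  # kernel center (c1 = c2 = 4)

def gaussian_kernel(h=H, w=W, s1=2.0, s2=2.0):
    """2-D Gaussian kernel for atmospheric turbulence, scaled to sum to 1."""
    i, j = np.mgrid[0:h, 0:w]
    k = np.exp(-0.5 * ((i - C1) / s1) ** 2 - 0.5 * ((j - C2) / s2) ** 2)
    return k / k.sum()

def defocus_kernel(h=H, w=W):
    """Uniform disc of radius r = floor(min(h/2, w/2)), scaled to sum to 1."""
    r = int(min(h / 2, w / 2))  # floor
    i, j = np.mgrid[0:h, 0:w]
    k = ((i - C1) ** 2 + (j - C2) ** 2 <= r ** 2) / (np.pi * r ** 2)
    return k / k.sum()

def motion_kernel(h=H, w=W, alpha=3 * np.pi / 4):
    """Line-segment kernel; u = floor(sqrt(2) * min(h/2, w/2)) is our reading
    of the formula's floor placement."""
    u = int(np.sqrt(2) * min(h / 2, w / 2))
    k = np.zeros((h, w))
    for t in range(-u, u + 1):
        i = C1 + int(np.rint(t * np.cos(alpha)))
        j = C2 + int(np.rint(t * np.sin(alpha)))
        k[i, j] += 1.0
    return k / k.sum()

def degrade(x, k, sigma=0.01, rng=None):
    """Equation (1): B = X * K + E, using circular (FFT) convolution."""
    rng = rng or np.random.default_rng(0)
    kp = np.zeros_like(x)
    kp[: k.shape[0], : k.shape[1]] = k
    kp = np.roll(kp, (-C1, -C2), axis=(0, 1))  # move kernel center to (0, 0)
    b = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(kp)))
    return b + rng.normal(0.0, sigma, x.shape)
```

Each kernel is normalised after construction, so the scaling step described at the end of Section 2 is already folded into the three constructors.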
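Section 2.2 is cut off at the end of this excerpt, but the abstract names PSNR as the evaluation metric. A minimal PSNR sketch follows, assuming images scaled to [0, 1]; the `peak` parameter is our assumption, not stated in the excerpt.

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference image x and a
    reconstruction y, for intensities in [0, peak]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```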