Solving Inverse Problems with a Flow-based Noise Model

Jay Whang 1 Qi Lei 2 Alexandros G. Dimakis 3

1 Dept. of Computer Science, UT Austin, TX, USA. 2 Dept. of Electrical and Computer Engineering, Princeton University, NJ, USA. 3 Dept. of Electrical and Computer Engineering, UT Austin, TX, USA. Correspondence to: Jay Whang.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

We study image inverse problems with a normalizing flow prior. Our formulation views the solution as the maximum a posteriori estimate of the image conditioned on the measurements. This formulation allows us to use noise models with arbitrary dependencies as well as non-linear forward operators. We empirically validate the efficacy of our method on various inverse problems, including compressed sensing with quantized measurements and denoising with highly structured noise patterns. We also present initial theoretical recovery guarantees for solving inverse problems with a flow prior.

Figure 1. Result of denoising MNIST digits (columns: Noisy, Ours, Asim et al., Bora et al., BM3D). The first column contains noisy observations, and subsequent columns contain reconstructions.

1. Introduction

Inverse problems seek to reconstruct an unknown signal from observations (or measurements), which are produced by some process that transforms the original signal. Because such processes are often lossy and noisy, inverse problems are typically formulated as reconstructing x from its measurements

y = f(x) + δ,    (1)

where f is a known deterministic forward operator and δ is an additive noise which may have a complex structure itself. An impressively wide range of applications can be posed under this formulation with an appropriate choice of f and δ, such as compressed sensing (Candes et al., 2006; Donoho, 2006), computed tomography (Chen et al., 2008), magnetic resonance imaging (MRI) (Lustig et al., 2007), and phase retrieval (Candes et al., 2015a;b).

In general, for a non-invertible forward operator f, there can be potentially infinitely many signals that match given observations. Thus the recovery algorithm must critically rely on a priori knowledge about the original signal to find the most plausible solution among them. Sparsity has classically been a very influential structural prior for various inverse problems (Candes et al., 2006; Donoho, 2006; Baraniuk et al., 2010). Alternatively, recent approaches introduced deep generative models as a powerful signal prior, showing significant gains in reconstruction quality compared to sparsity priors (Bora et al., 2017; Asim et al., 2019; Van Veen et al., 2018; Menon et al., 2020).

However, most existing methods assume a Gaussian noise model and are unsuitable for structured and correlated noise, making them less applicable in many real-world scenarios. For instance, when an image is ruined by hand-drawn scribbles or a piece of audio track is overlaid with human whispering, the noise to be removed follows a very complex distribution. These settings deviate significantly from the commonly studied Gaussian noise setting, yet they are much more realistic and deserve more attention. In this paper, we propose to use a normalizing flow model to represent the noise distribution and derive principled methods for solving inference problems under this general noise model.

Contributions.

• We present a general formulation for obtaining maximum a posteriori (MAP) reconstructions for dependent noise and general forward operators. Notably, our method can leverage deep generative models for both the original image and the noise.

• We extend our framework to the general setting where the signal prior is given as a latent-variable model, for which likelihood evaluation is intractable. The resulting formulation presents a unified view of existing approaches based on GAN, VAE and flow priors.

• We empirically show that our method achieves excellent reconstruction in the presence of noise with various complex and dependent structures. Specifically, we demonstrate the efficacy of our method on various inverse problems with structured noise and non-linear forward operators.

• We provide the initial theoretical characterization of likelihood-based priors for image denoising. Specifically, we show a reconstruction error bound that depends on the local concavity of the log-likelihood function.

2. Background

2.1. Normalizing Flow Models

Normalizing flow models are a class of likelihood-based generative models that represent complex distributions by transforming a simple distribution (such as a standard Gaussian) through an invertible mapping (Tabak & Turner, 2013). Compared to other types of generative models, flow models are computationally flexible in that they provide efficient sampling, inversion and likelihood estimation (Papamakarios et al., 2019, and references therein).

Concretely, given a differentiable invertible mapping G : R^n → R^n, samples x from this model are generated via z ∼ p(z), x = G(z). Since G is invertible, the change of variables formula allows us to compute the log-density of x:

log p(x) = log p(z) + log |det J_{G^{-1}}(x)|,    (2)

where z = G^{-1}(x) and J_{G^{-1}}(x) is the Jacobian of G^{-1} evaluated at x. Since p(z) is a simple distribution, computing the likelihood at any point x is straightforward as long as G^{-1} and the log-determinant term can be efficiently evaluated.
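As a concrete illustration of eq. (2), the sketch below (our own toy example with illustrative names, not code from the paper) evaluates the exact log-density under an element-wise affine map x = G(z) = a ⊙ z + b; the RealNVP-style couplings used in our experiments follow the same recipe with richer, learned transforms.

```python
import torch

# Toy invertible map G(z) = a * z + b (element-wise), with a > 0 so G is invertible.
a = torch.tensor([2.0, 0.5, 1.5])
b = torch.tensor([0.1, -0.3, 0.0])

def G(z):
    return a * z + b

def G_inv(x):
    return (x - b) / a

def log_p_x(x):
    # Change of variables: log p(x) = log p(z) + log|det J_{G^{-1}}(x)|, with z = G_inv(x).
    z = G_inv(x)
    log_p_z = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum()
    # For this element-wise map, J_{G^{-1}} = diag(1/a), so the log-determinant is -sum(log a).
    log_det = -torch.log(a).sum()
    return log_p_z + log_det

x = G(torch.randn(3))   # sampling: z ~ N(0, I), then x = G(z)
print(log_p_x(x))       # exact log-density of x under the flow
```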

Notably, when a flow model is used as the prior for an inverse problem, the invertibility of G guarantees that it has an unrestricted range. Thus the recovered signal can represent images that are out-of-distribution, albeit at lower probability. This is a key distinction from a GAN-based prior, whose generator has a restricted range and can only generate samples from the distribution it was trained on. As pointed out by Asim et al. (2019), and also shown below in our experiments, this leads to performance benefits on out-of-distribution examples.

2.2. Inverse Problems with a Generative Prior

We briefly review the existing literature on the application of deep generative models to inverse problems. While vast literature exists on compressed sensing and other inverse problems, the idea of replacing the classical sparsity-based prior (Candes et al., 2006; Donoho, 2006) with a neural network was introduced relatively recently. In their pioneering work, Bora et al. (2017) proposed to use the generator from a pre-trained GAN or a VAE (Goodfellow et al., 2014; Kingma & Welling, 2013) as the prior for compressed sensing. This led to a substantial gain in reconstruction quality compared to classical methods, particularly at small numbers of measurements.

Following this work, numerous studies have investigated different ways to utilize various neural network architectures for inverse problems (Mardani et al., 2018; Heckel & Hand, 2019; Mixon & Villar, 2018; Pandit et al., 2019; Lucas et al., 2018; Shah & Hegde, 2018; Liu & Scarlett, 2020; Kabkab et al., 2018; Mousavi et al., 2018; Raj et al., 2019; Sun et al., 2019). One straightforward extension of (Bora et al., 2017) proposes to expand the range of the pre-trained generator by allowing sparse deviations (Dhar et al., 2018). Similarly, Shah & Hegde (2018) proposed another algorithm based on projected gradient descent with convergence guarantees. Van Veen et al. (2018) showed that an untrained convolutional neural network can be used as a prior for imaging tasks based on Deep Image Prior by Ulyanov et al. (2018). More recently, Wu et al. (2019) applied techniques from meta-learning to improve the reconstruction speed, and Ardizzone et al. (2018) showed that by modelling the forward process with a flow model, one can implicitly learn the inverse process through the invertibility of the model. Asim et al. (2019) proposed to replace the GAN prior of (Bora et al., 2017) with a normalizing flow model and reported excellent reconstruction performance, especially on out-of-distribution images.

3. Our Method

3.1. Notations and Setup

We use bold lower-case variables to denote vectors, ‖·‖ to denote the ℓ2 norm, and ⊙ to denote element-wise multiplication. We also assume that we are given a pre-trained latent-variable generative model p_G(x) that we can efficiently sample from. Importantly, we assume access to a noise distribution p_∆ parametrized as a normalizing flow, which itself can be an arbitrarily complex, pre-trained distribution. We let f denote the deterministic and differentiable forward operator for our measurement process. Thus an observation is generated via y = f(x) + δ, where x ∼ p_G(x) and δ ∼ p_∆(δ).

Note that while f and p_∆ are known, they cannot be treated as fixed across different examples; e.g. in compressed sensing the measurement matrix is random and thus only known at the time of observation. This precludes the use of end-to-end training methods that require having a fixed forward operator.

3.2. MAP Formulation

When the likelihood under the prior p_G(x) can be computed efficiently (e.g. when it is a flow model), we can pose the inverse problem as a MAP estimation task. Concretely, for a given observation y, we wish to recover x as the MAP estimate of the conditional distribution p_G(x|y):

arg max_x log p(x|y) = arg max_x [log p(y|x) + log p_G(x) − log p(y)]
                     = arg max_x [log p_∆(y − f(x)) + log p_G(x)]     (1)
                     := arg min_x L_MAP(x; y),

where

L_MAP(x; y) = − log p_∆(y − f(x)) − log p_G(x).    (3)

Note that in (1) we drop the marginal density log p(y), as it is constant, and rewrite p(y|x) as p_∆(y − f(x)).

Recalling that the generative procedure for the flow model is z ∼ N(0, I), x = G(z), we arrive at the following loss:

L_MAP(z; y) := − log p_∆(y − f(G(z))) − log p_G(G(z)).    (4)

The invertibility of G allows us to minimize the above loss with respect to either z or x:

arg min_z L_MAP(z; y) = arg min_z [− log p_∆(y − f(G(z))) − log p_G(G(z))]
                      = arg min_x [− log p_∆(y − f(x)) − log p_G(x)]
                      = arg min_x L_MAP(x; y).

We have experimented with optimizing the loss both in the image space x and the latent space z, and found that the latter achieved better performance across almost all experiments. Since the above optimization objective is differentiable, any gradient-based optimizer can be used to approximately find the minimizer. In practice, even with an imperfect model and approximate optimization, we observe that our approach performs well across a wide range of tasks, as shown in the experimental results below.
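As a concrete sketch of this procedure (illustrative interfaces and hyperparameters, not the paper's code): given pre-trained flows exposing differentiable log-densities and a differentiable forward operator, the latent-space loss in eq. (4) can be minimized with any gradient-based optimizer such as Adam. Dropping the − log p_G(G(z)) term recovers the MLE objective of Section 3.3.

```python
import torch

def map_recover(y, G, f, log_p_noise, log_p_prior, dim_z, steps=1000, lr=0.05):
    """Approximately minimize eq. (4):
        L_MAP(z; y) = -log p_noise(y - f(G(z))) - log p_prior(G(z)).

    Assumed (illustrative) interfaces: G maps a latent vector to an image,
    f is the differentiable forward operator, and log_p_noise / log_p_prior
    return scalar log-densities under the pre-trained noise and signal flows.
    """
    z = torch.zeros(dim_z, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = G(z)
        loss = -log_p_noise(y - f(x)) - log_p_prior(x)
        loss.backward()
        opt.step()
    return G(z).detach()   # reconstructed signal x = G(z*)
```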

3.3. MLE Formulation

When the signal prior does not provide tractable likelihood evaluation (e.g. for the case of GAN and VAE priors), we view the problem as maximum-likelihood estimation under the noise model. Thus we attempt to find the signal that maximizes the noise likelihood within the support of p_G(x), and arrive at a similar but different loss:

arg max_{x ∈ supp p_G} log p_∆(y − f(x)) = arg max_z log p_∆(y − f(G(z)))
                                          := arg min_z L_MLE(z; y),

where

L_MLE(z; y) := − log p_∆(y − f(G(z))).    (5)

3.4. Prior Work

In (Bora et al., 2017), the authors proposed to use a deep generative prior for inverse problems, but the choice of models was restricted to GANs and VAEs with an explicit low-dimensional prior. Subsequently, Asim et al. (2019) generalized this paradigm using flow-based models. We describe here the methods proposed in those papers in detail. Importantly, we show that their approaches are special cases of our MAP/MLE formulations under Gaussian noise assumptions. Furthermore, note that both papers considered linear inverse problems, so they correspond to the case where f(x) = Ax under our notation.

GAN Prior: (Bora et al., 2017) considers the following loss:

L_Bora(z; y) = ‖y − AG(z)‖² + λ‖z‖²,    (6)

which tries to project the input y onto the range of the generator G with ℓ2 regularization on the latent variable. Aside from the regularization term, this corresponds exactly to our MLE loss for a Gaussian p_∆. While (Bora et al., 2017) motivated this objective as a projection onto the range of G, our approach reveals a probabilistic interpretation based on the MLE objective for the noise.

Flow Prior: (Asim et al., 2019) replaces the GAN prior of (Bora et al., 2017) with a flow model. In that paper, the authors consider the objective below that tries to simultaneously match the observation and maximize the likelihood of the reconstruction x = G(z) under the model:

L(z; y) = ‖y − AG(z)‖² − γ log p_G(x),    (7)

for some hyperparameter γ > 0. This loss is a special case of our MAP loss for isotropic Gaussian noise δ ∼ N(0, γI), since the log-density of δ becomes log p_∆(δ) = −(1/(2γ)) ‖δ‖² − C for a constant C. However, Asim et al. (2019) report that due to optimization difficulty, they found the following proxy loss to perform better in experiments:

L_Asim(z; y) = ‖y − AG(z)‖² + γ‖z‖.    (8)

This is again related to a specific instance of our loss when the flow model is volume-preserving (i.e. the log-determinant term is constant): continuing from eq. (2), a constant log-determinant gives − log p_G(G(z)) = (1/2) ‖z‖² + const, which allows us to recover the ℓ2-regularized form of L_Asim.

We reiterate that both of the aforementioned objectives are special cases of our formulation for the case of zero-mean isotropic Gaussian noise. Thus we expect that our method better handles non-Gaussian noise, and we experimentally confirm that our approach indeed leads to better reconstruction performance for noise with nonzero mean or conditional dependence across different pixel locations.

Connections to Blind Source Separation: Here we focus on the connection between our formulation and blind source separation. For the denoising case where we have an identity forward operator, we see that our observation is simply the sum of two random variables y = x + δ. Given two flow-based priors (one for each of x and δ), the task of extracting x from y thus becomes a blind source separation problem with two sources. While rich literature exists for various source separation problems (Hu et al., 2017; Subakan & Smaragdis, 2018; Wang & Chen, 2018; Hoshen, 2019), two recent studies are particularly relevant to our setting as they make use of a neural network prior.

In Double-DIP, Gandelsman et al. (2019) utilize Deep Image Prior (Ulyanov et al., 2018) as a signal prior to perform blind source separation from multiple mixtures. This work differs from ours in that we focus on the single-mixture setting with pre-trained signal priors. Our use of pre-trained priors is a key distinction, since DIP is untrained and may not apply to other modalities and datasets. In contrast, our method is applicable as long as we are able to train a deep generative prior for the signal and the noise.

In (Jayaram & Thickstun, 2020), the authors use a flow-based prior (Kingma & Dhariwal, 2018) for blind source separation. Unlike our approach, however, they sample from the posterior using Langevin dynamics (Welling & Teh, 2011; Neal et al., 2011). The authors use simulated annealing to speed up mixing, and this approach would in theory be able to asymptotically sample from the correct posterior. The advantage of our approach is that it is generally faster (as it avoids a costly MCMC procedure), and it can be applied to non-likelihood based priors for the signal x.

4. Theoretical Analysis

In this section we provide some theoretical analysis of our approach in denoising problems with a flow-based prior. Unlike most prior work, we take a probabilistic approach and avoid making specific structural assumptions on the signal we want to recover, such as sparsity or being generated from a low-dimensional Gaussian prior. For denoising, we show that better likelihood estimates lead to lower reconstruction error. Note that while our experiments employed flow models, our results apply to any likelihood-based generative model. Detailed proof is included in the appendix.

4.1. Recovery Guarantee for Denoising

Suppose we observe y = x∗ + δ with Gaussian noise δ ∼ N(0, σ²I) with ‖δ‖ = r. We perform MAP inference by minimizing the following loss with gradient descent:

L_MAP(x) = − log p_∆(y − x) − log p(x) = (1/(2σ²)) ‖y − x‖² + q(x),    (9)

where we write q(x) := − log p(x) for notational convenience. Notice that the image we wish to recover is a natural image with high probability rather than an arbitrary one, and reconstruction is not expected to succeed for the latter case. Thus we consider the case where the ground truth image x∗ is a local maximum of p.

Theorem 4.1. Let x∗ be a local optimum of the model p(x) and y = x∗ + δ be the noisy observation. Assume that q satisfies local strong convexity within the ball around x∗ defined as B_r^d(x∗) := {x ∈ R^d : ‖x − x∗‖ ≤ r}, i.e. the Hessian of q satisfies H_q(x) ⪰ µI for all x ∈ B_r^d(x∗), for some µ > 0. Then gradient descent starting from y on the loss function (9) converges to x̄, a local minimizer of L_MAP(x), that satisfies

‖x̄ − x∗‖ ≤ (1 / (µσ² + 1)) ‖δ‖.

Even though the theorem is quite straightforward, it still serves as some initial understanding of the denoising task under a likelihood-based prior. It sheds light on how the reconstruction is affected by the structure of the probabilistic model and the likelihood of the natural signal one wants to recover. This theorem shows that a well-conditioned model with large µ leads to better denoising and confirms that our MAP formulation encourages reconstructions with high density. Thus, the maximum-likelihood training objective is directly aligned with better denoising performance.
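As a simple sanity check of the bound (our own illustration under an idealized Gaussian prior, not part of the paper's analysis), take p(x) = N(0, τ²I), so that q(x) = ‖x‖²/(2τ²) up to a constant, µ = 1/τ², and x∗ = 0 is the unique maximizer of p. The loss (9) is then quadratic and its minimizer has a closed form,

x̄ = (τ² / (τ² + σ²)) y = x∗ + δ / (σ²/τ² + 1),

so ‖x̄ − x∗‖ = ‖δ‖ / (µσ² + 1) and the guarantee holds with equality in this case.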

Figure 2. Result of removing MNIST noise from CelebA-HQ faces (rows: Ground Truth, Noisy Input, Ours (MAP), Ours (MLE), Asim et al., Bora et al., BM3D; columns: test set examples and out-of-distribution examples). Notice that without any understanding of the complex noise structure, baseline methods fail to produce good reconstructions.

5. Experiments

Our experiments are designed to test how well our MAP/MLE formulation performs in practice as we deviate from the commonly studied setting of linear inverse problems with Gaussian noise. Specifically, we focus on two aspects: (1) complex noise with dependencies and (2) non-linear forward operators. For all our experiments, we quantitatively evaluate each method by reporting peak signal-to-noise ratio (PSNR). We also visually inspect sample reconstructions for qualitative assessment.

Models: We trained multi-scale RealNVP models on two image datasets, MNIST and CelebA-HQ (LeCun et al., 1998; Liu et al., 2015). Due to computational constraints, all experiments were done on 100 randomly-selected images (1000 for MNIST) from the test set, as well as out-of-distribution images. We additionally train a DCGAN on the CelebA-HQ dataset for the MLE experiments as well as for (Bora et al., 2017). A detailed description of the datasets, models and hyperparameters is provided in the appendix.

Baseline Methods: We compare our approach to the methods of (Bora et al., 2017) and (Asim et al., 2019), as they are two recently proposed approaches that use a deep generative prior on inverse problems. Depending on the task, we also compare against BM3D (Dabov et al., 2006), a popular off-the-shelf image denoising algorithm, and LASSO (Tibshirani, 1996) with a Discrete Cosine Transform (DCT) basis, as appropriate. Note that for the 1-bit compressed sensing experiment, most existing techniques do not apply, since our task involves quantization as well as non-Gaussian noise.

We point out that the baseline methods are not designed to make use of the noise distribution, whereas our method does utilize it. Thus, the experiments are not meant to be taken as direct comparisons, but rather as empirical evidence that the MAP formulation indeed benefits from the knowledge of noise structure.

Smoothing Parameter: Since our objective in Equation (4) depends on the density p_G(x) given by the flow model, our recovery of x depends heavily on the quality of density estimates from the model. Unfortunately, likelihood-based models exhibit counter-intuitive properties, such as assigning higher density to out-of-distribution examples or random noise than to in-distribution examples (Nalisnick et al., 2018; Choi et al., 2018; Hendrycks & Dietterich, 2018; Nalisnick et al., 2019). We empirically observe such behavior from our models as well. To remedy this, we use a smoothed version of the model density p_G(x)^β, where β ≥ 0 is the smoothing parameter. Since the two extremes β = 0 and β = ∞ correspond to only using the noise density and the model density, respectively, β controls the degree to which we rely on p_G. Thus the loss we minimize becomes

L_MAP(z; y, β) = − log p_∆(y − f(G(z))) − β log p_G(G(z)).    (10)
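In code, the smoothed objective of Equation (10) is a one-line change to the MAP loss sketched in Section 3.2 (again with illustrative interfaces, not the paper's code):

```python
def smoothed_map_loss(z, y, G, f, log_p_noise, log_p_prior, beta):
    """Eq. (10): -log p_noise(y - f(G(z))) - beta * log p_prior(G(z)).

    beta = 0 ignores the flow prior entirely, while large beta leans heavily
    on it; all callables are assumed differentiable (e.g. PyTorch modules).
    """
    x = G(z)
    return -log_p_noise(y - f(x)) - beta * log_p_prior(x)
```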

5.1. Results

We test our methods on denoising and compressed sensing tasks that involve various structured, non-Gaussian noises as well as a non-linear forward operator. Note that many existing methods cannot be applied in this setting, as they are designed for linear inverse problems and a specific noise model. While specialized algorithms do exist (e.g. for quantized compressed sensing), we note that our method is general and can be applied in a wide range of settings without modification.

5.1.1. DENOISING MNIST DIGITS

The measurement process is y = 0.5 · x + δ_MNIST, where δ_MNIST represents MNIST digits added at different locations and color channels. Each digit itself comes from a flow model trained on the MNIST dataset. As shown in Figure 2, our method successfully removes MNIST noise in both MAP and MLE settings. Recall that we use the same GAN-based prior for "Ours (MLE)" and (Bora et al., 2017). The difference in the reconstruction quality between these two methods in Figure 3 confirms that even for non-likelihood based priors (e.g. GAN and VAE), an accurate noise model is critical to accurate signal recovery. While the method of Bora et al. (2017) manages to remove MNIST digits, we note that this is because its outputs are forced to be in the range of the DCGAN.

For the rest of the experiments, we focus on the MAP formulation as it generally outperforms the MLE formulation. We posit that this is due to the restricted range of our DCGAN used for experiments.

5.1.2. NOISY COMPRESSED SENSING

Now we consider the measurement process y = Ax + δ_sine, where A ∈ R^{m×d} is a random Gaussian measurement matrix and δ_sine has positive mean with variance that follows a sinusoidal pattern. Specifically, the k-th pixel of the noise has standard deviation σ_k ∝ exp(sin(2πk/16)), normalized to have a maximum variance of 1.

Figure 4a shows that our method is able to make better use of additional measurements. Interestingly, in Figure 4b, all three methods with a deep generative prior produced plausible human faces. However, the reconstructions from Asim et al. (2019) and Bora et al. (2017) significantly differ from the ground truth images. We posit that this is due to the implicit Gaussian noise assumption made by the two methods, again showing the benefits of explicitly incorporating the knowledge of the noise distribution.

5.1.3. REMOVING SINUSOIDAL NOISE

We consider another denoising task with observation y = x + δ_sine. This is the 2-dimensional version of periodic noise, where the noise variance for all pixels within the k-th row follows σ_k from above.

From Figure 5 and Figure 6a, we see that baseline methods do not perform well even though the noise is simply Gaussian at each pixel. This reemphasizes an important point: without an understanding of the structure of the noise, algorithms designed to handle Gaussian noise have difficulty removing it when we introduce variability across different pixel locations.
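For concreteness, a sketch of how such sinusoidal-variance noise could be generated (the positive mean and the exact normalization below are illustrative choices; the paper only specifies σ_k ∝ exp(sin(2πk/16)) with maximum variance 1):

```python
import math
import torch

def sinusoidal_noise(height, width, mean=0.1, period=16.0):
    # Per-row standard deviation sigma_k ∝ exp(sin(2*pi*k / period)),
    # rescaled so the largest standard deviation (hence variance) is 1.
    k = torch.arange(height, dtype=torch.float32)
    sigma = torch.exp(torch.sin(2 * math.pi * k / period))
    sigma = sigma / sigma.max()
    sigma = sigma.unsqueeze(1).expand(height, width)   # constant within each row
    return mean + sigma * torch.randn(height, width)

delta_sine = sinusoidal_noise(64, 64)   # one channel of structured noise
```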

5.1.4. NOISY 1-BIT COMPRESSED SENSING

This task considers a combination of a non-linear forward operator and non-Gaussian noise. The measurement process is y = sign(Ax) + δ_sine, identical to noisy compressed sensing except with the sign function. This is the most extreme case of quantized compressed sensing, since y only contains the (noisy) sign {+1, −1} of the measurements. Because the gradient of the sign function is zero everywhere, we use the Straight-Through Estimator (Bengio et al., 2013) for backpropagation. See Figure 7a and Figure 7b for a comparison of our method to the baselines at varying numbers of measurements.
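A minimal sketch of the straight-through trick for the sign operation (one standard implementation with illustrative dimensions, not the paper's code):

```python
import torch

def sign_ste(x):
    # Forward: sign(x). Backward: gradient passes through as if sign were the identity.
    return x + (torch.sign(x) - x).detach()

# Illustrative noisy 1-bit measurement y = sign(Ax) + noise with usable gradients.
A = torch.randn(300, 784)
x = torch.randn(784, requires_grad=True)
y = sign_ste(A @ x) + 0.1 * torch.randn(300)
y.sum().backward()   # x.grad is nonzero despite the flat sign() forward
```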

Figure 3. Reconstruction PSNR on the MNIST denoising task.

5.1.5. SENSITIVITY TO HYPERPARAMETERS

We also observe that our method is more robust to hyperparameter choices than (Asim et al., 2019), as shown in Figure 6b. This may come as a surprise given that our objective has an additional log-determinant term. We speculate that this is due to the regularization term in L_Asim(z; y) = ‖y − f(G(z))‖² + γ‖z‖. It is known that samples from a d-dimensional isotropic Gaussian concentrate around a thin "shell" around the sphere of radius √d. This suggests that the range of ‖z‖ corresponding to natural images may be small. Thus, forcing the latent variable z to have a small norm without taking the log-determinant term into account could lead to a sudden drop in the reconstruction quality.
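This concentration claim is easy to verify numerically (our own quick check, with an arbitrary dimension):

```python
import torch

d = 3072                         # e.g. a 32x32x3 latent
z = torch.randn(10000, d)        # z ~ N(0, I_d)
norms = z.norm(dim=1)
print(norms.mean().item(), norms.std().item(), d ** 0.5)
# ||z|| concentrates near sqrt(d) ≈ 55.4 with standard deviation ≈ 0.7
```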

(a) PSNR at different measurement counts (best viewed in color; x-axis: number of measurements, y-axis: PSNR; methods: Ours, Asim et al., Bora et al., LASSO). The approaches by Asim et al. (2019) and Bora et al. (2017) show little improvement from having more measurements due to their inability to utilize the noise model. (b) Reconstructions at 2500 measurements (rows: Ground Truth, Ours, Asim et al., Bora et al., LASSO). Notice that even though existing approaches produce reconstructions that resemble human faces, they do not match the ground truth as well as our method.

Figure 4. Experiment results for noisy compressed sensing on CelebA-HQ images.

Figure 5. Result of denoising SINUSOIDAL noise on CelebA-HQ faces and out-of-distribution images (rows: Ground Truth, Noisy Input, Ours, Asim et al., Bora et al., BM3D; columns: test set examples and out-of-distribution examples).

(a) Result for SINUSOIDAL denoising on CelebA-HQ images at various noise rates (x-axis: average noise standard deviation, y-axis: PSNR; methods: Ours, Asim et al., Bora et al., BM3D). Our method achieves the same reconstruction performance even when the noise has up to 3× higher average standard deviation compared to the best baseline method (BM3D). (b) Compressed sensing performance of our method and (Asim et al., 2019) at different hyperparameter values (x-axis: hyperparameter value from 0.01 to 100, y-axis: PSNR). For our method, we vary the smoothing parameter β in L_MAP. For Asim et al. (2019), we vary the regularization coefficient γ in L_Asim.

Figure 6. SINUSOIDAL denoising results (left) and hyperparameter sensitivity plot (right).

(a) Result of 1-bit noisy compressed sensing at different measurement counts (x-axis: number of measurements from 10 to 3000, y-axis: PSNR; methods: Ours, Asim et al., Bora et al.). Our method achieves the same reconstruction performance using up to 2× fewer measurements compared to the best baseline method (Asim et al., 2019). (b) Reconstructions from noisy 1-bit compressed sensing with 3000 binary measurements (rows: Ground Truth, Ours, Asim et al., Bora et al.). Notice that our method fails more gracefully compared to other methods, i.e. even when the reconstructions differ from the ground truth, substantial parts of the reconstructions are still correct. On the other hand, other methods predict a completely different digit.

Figure 7. Results of 1-bit compressed sensing experiments.

6. Conclusion

We propose a novel method to solve inverse problems for general differentiable forward operators and structured noise. Our method generalizes that of (Asim et al., 2019) to arbitrary differentiable forward operators and non-Gaussian noise distributions. The power of our approach stems from the flexibility of invertible generative models, which can be combined in a modular way to solve MAP inverse problems in very general settings, as we demonstrate.

For future work, it would be interesting to consider extending our method to blind source separation problems (Amari et al., 1996), as the noise and signal considered in our setting are actually exchangeable for the MAP case. Furthermore, one may investigate the applicability of the MLE formulation with an even more general family of generative models such as energy-based models (Gao et al., 2020) and score networks (Song & Ermon, 2019). On the theoretical side, one central question that remains open is to analyze the optimization problem we formulated. In this paper we empirically minimize this loss using gradient descent, but some theoretical guarantees would be desirable, possibly under additional assumptions, e.g. random weights following the framework of (Hand & Voroninski, 2020).

Acknowledgements

This research has been supported by NSF Grants CCF 1934932, AF 1901292, 2008710, 2019844, the NSF IFML 2019844 award, as well as research gifts by Western Digital, WNCG and MLL, computing resources from TACC, and the Archie Straiton Fellowship.

References

Amari, S.-i., Cichocki, A., Yang, H. H., et al. A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems, pp. 757–763, 1996.

Ardizzone, L., Kruse, J., Wirkert, S. J., Rahner, D., Pellegrini, E. W., Klessen, R. S., Maier-Hein, L., Rother, C., and Köthe, U. Analyzing inverse problems with invertible neural networks. CoRR, abs/1808.04730, 2018. URL http://arxiv.org/abs/1808.04730.

Asim, M., Ahmed, A., and Hand, P. Invertible generative models for inverse problems: mitigating representation error and dataset bias. CoRR, abs/1905.11672, 2019. URL http://arxiv.org/abs/1905.11672.

Baraniuk, R. G., Cevher, V., Duarte, M. F., and Hegde, C. Model-based compressive sensing. IEEE Transactions on Information Theory, 56(4):1982–2001, 2010.

Bengio, Y., Léonard, N., and Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

Bora, A., Jalal, A., Price, E., and Dimakis, A. G. Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 537–546. JMLR.org, 2017.

Candes, E. J., Romberg, J. K., and Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.

Candes, E. J., Eldar, Y. C., Strohmer, T., and Voroninski, V. Phase retrieval via matrix completion. SIAM Review, 57(2):225–251, 2015a.

Candes, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015b.

Chen, G.-H., Tang, J., and Leng, S. Prior image constrained compressed sensing (PICCS): a method to accurately reconstruct dynamic CT images from highly undersampled projection data sets. Medical Physics, 35(2):660–663, 2008.

Choi, H., Jang, E., and Alemi, A. A. WAIC, but why? Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.

Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. Image denoising with block-matching and 3D filtering. In Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, volume 6064, pp. 606414. International Society for Optics and Photonics, 2006.

Dhar, M., Grover, A., and Ermon, S. Modeling sparse deviations for compressed sensing using generative models, 2018.

Dinh, L., Sohl-Dickstein, J., and Bengio, S. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016.

Donoho, D. L. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

Gandelsman, Y., Shocher, A., and Irani, M. "Double-DIP": Unsupervised image decomposition via coupled deep-image-priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11026–11035, 2019.

Gao, R., Nijkamp, E., Kingma, D. P., Xu, Z., Dai, A. M., and Wu, Y. N. Flow contrastive estimation of energy-based models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.

Hand, P. and Voroninski, V. Global guarantees for enforcing deep generative priors by empirical risk. IEEE Transactions on Information Theory, 2020.

Heckel, R. and Hand, P. Deep decoder: Concise image representations from untrained non-convolutional networks. In International Conference on Learning Representations, 2019.

Hendrycks, D. and Dietterich, T. G. Benchmarking neural network robustness to common corruptions and surface variations. arXiv preprint arXiv:1807.01697, 2018.

Hoshen, Y. Towards unsupervised single-channel blind source separation using adversarial pair unmix-and-remix. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3272–3276. IEEE, 2019.

Hu, W., Liu, R., Lin, X., Li, Y., Zhou, X., and He, X. A deep learning method to estimate independent source number. In 2017 4th International Conference on Systems and Informatics (ICSAI), pp. 1055–1059, 2017. doi: 10.1109/ICSAI.2017.8248441.

Jayaram, V. and Thickstun, J. Source separation with deep generative priors. In International Conference on Machine Learning, pp. 4724–4735. PMLR, 2020.

Kabkab, M., Samangouei, P., and Chellappa, R. Task-aware compressed sensing with generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224, 2018.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Liu, Z. and Scarlett, J. Information-theoretic lower bounds for compressive sensing with generative models. IEEE Journal on Selected Areas in Information Theory, 1(1):292–303, 2020.

Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.

Lucas, A., Iliadis, M., Molina, R., and Katsaggelos, A. K. Using deep neural networks for inverse problems in imaging: beyond analytical methods. IEEE Signal Processing Magazine, 35(1):20–36, 2018.

Lustig, M., Donoho, D., and Pauly, J. M. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.

Mardani, M., Sun, Q., Donoho, D., Papyan, V., Monajemi, H., Vasanawala, S., and Pauly, J. Neural proximal gradient descent for compressive imaging. In Neural Information Processing Systems, pp. 9573–9583, 2018.

Menon, S., Damian, A., Hu, S., Ravi, N., and Rudin, C. PULSE: Self-supervised photo upsampling via latent space exploration of generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2437–2445, 2020.

Mixon, D. G. and Villar, S. SUNLayer: Stable denoising with generative networks. arXiv preprint arXiv:1803.09319, 2018.

Mousavi, A., Dasarathy, G., and Baraniuk, R. G. A data-driven and distributed approach to sparse signal representation and recovery. In International Conference on Learning Representations, 2018.

Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. Do deep generative models know what they don't know? arXiv preprint arXiv:1810.09136, 2018.

Nalisnick, E., Matsukawa, A., Teh, Y. W., and Lakshminarayanan, B. Detecting out-of-distribution inputs to deep generative models using a test for typicality. arXiv preprint arXiv:1906.02994, 2019.

Neal, R. M. et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.

Pandit, P., Sahraee, M., Rangan, S., and Fletcher, A. K. Asymptotics of MAP inference in deep networks. arXiv preprint arXiv:1903.01293, 2019.

Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., and Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. arXiv preprint arXiv:1912.02762, 2019.

Raj, A., Li, Y., and Bresler, Y. GAN-based projector for faster recovery with convergence guarantees in linear inverse problems. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5602–5611, 2019.

Shah, V. and Hegde, C. Solving linear inverse problems using GAN priors: An algorithm with provable guarantees. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4609–4613. IEEE, 2018.

Song, Y. and Ermon, S. Generative modeling by estimating gradients of the data distribution. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, 2019.

Subakan, Y. C. and Smaragdis, P. Generative adversarial source separation. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 26–30. IEEE, 2018.

Sun, Y., Liu, J., and Kamilov, U. S. Block coordinate regularization by denoising. NeurIPS, 2019.

Tabak, E. G. and Turner, C. V. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2):145–164, 2013.

Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

Ulyanov, D., Vedaldi, A., and Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018.

Van Veen, D., Jalal, A., Soltanolkotabi, M., Price, E., Vishwanath, S., and Dimakis, A. G. Compressed sensing with deep image prior and learned regularization. arXiv preprint arXiv:1806.06438, 2018.

Wang, D. and Chen, J. Supervised speech separation based on deep learning: An overview. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10):1702–1726, 2018.

Welling, M. and Teh, Y. W. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 681–688. Citeseer, 2011.

Wu, Y., Rosca, M., and Lillicrap, T. P. Deep compressed sensing. CoRR, abs/1905.06723, 2019. URL http://arxiv.org/abs/1905.06723.