Deconvolution with ADMM
EE367/CS448I: Computational Imaging and Display — stanford.edu/class/ee367
Lecture 6, Gordon Wetzstein, Stanford University

Lens as Optical Low-pass Filter
• A point source on the focal plane maps to a point; a point source away from the focal plane maps to an out-of-focus blur.
• For a fixed amount of defocus, this blur acts as a shift-invariant convolution.
• The convolution kernel is called the point spread function (PSF) $c$:
  $b = c \ast x$
  where $x$ is the sharp image, $b$ is the measured, blurred image, and $c$ is the PSF.
• The diffraction-limited PSF of a circular aperture is the "Airy" pattern.

PSF, OTF, MTF
• The point spread function (PSF) is a fundamental concept in optics.
• The optical transfer function (OTF) is the (complex-valued) Fourier transform of the PSF: $\mathrm{OTF} = \mathcal{F}\{\mathrm{PSF}\}$.
• The modulation transfer function (MTF) is the magnitude of the OTF: $\mathrm{MTF} = |\mathrm{OTF}|$.

Deconvolution
• Given the measurements $b$ and the convolution kernel $c$, what is $x$?

Deconvolution with Inverse Filtering
• Naive solution: apply the inverse kernel
  $\tilde{x} = c^{-1} \ast b = \mathcal{F}^{-1}\!\left\{ \frac{\mathcal{F}\{b\}}{\mathcal{F}\{c\}} \right\}$
• With Gaussian noise ($\sigma = 0.05$), the results are terrible. Why? The problem is ill-posed: the division by values (close to) zero in the frequency domain drastically amplifies noise.
• We need to include prior(s) on images to make up for the lost data, for example noise statistics (signal-to-noise ratio).

Deconvolution with Wiener Filtering
• Apply the inverse kernel, but don't divide by zero:
  $\tilde{x} = \mathcal{F}^{-1}\!\left\{ \frac{|\mathcal{F}\{c\}|^2}{|\mathcal{F}\{c\}|^2 + \tfrac{1}{\mathrm{SNR}}} \cdot \frac{\mathcal{F}\{b\}}{\mathcal{F}\{c\}} \right\}$
  The first factor is an amplitude-dependent damping factor.
• Compared to the naive inverse filter, and across noise levels $\sigma = 0.01, 0.05, 0.1$, the results are not too bad, but noisy.
• Wiener filtering is a heuristic: it dampens noise amplification rather than modeling the image itself.

Total Variation
• Deconvolution with a total variation (TV) prior [Rudin et al. 1992]:
  $\underset{x}{\text{minimize}}\; \|Cx - b\|_2^2 + \lambda\,\mathrm{TV}(x) \;=\; \underset{x}{\text{minimize}}\; \|Cx - b\|_2^2 + \lambda \|\nabla x\|_1, \qquad \|x\|_1 = \sum_i |x_i|$
• Idea: promote sparse gradients (edges).
• $\nabla$ is the finite-differences operator, i.e. a matrix of the form
  $\begin{bmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \\ & & & -1 & 1 \end{bmatrix}$
• The (forward finite-difference) gradient can be expressed as a convolution:
  $\nabla_x x = x \ast \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \nabla_y x = x \ast \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$
• Isotropic TV (better): $\sqrt{(\nabla_x x)^2 + (\nabla_y x)^2}$; anisotropic TV (easier): $|\nabla_x x| + |\nabla_y x|$.
• For simplicity, this lecture only discusses anisotropic TV:
  $\mathrm{TV}(x) = \|\nabla_x x\|_1 + \|\nabla_y x\|_1 = \left\| \begin{bmatrix} \nabla_x \\ \nabla_y \end{bmatrix} x \right\|_1$
• Problem: the $\ell_1$ norm does not allow for an inverse filtering approach. However, there is a simple solution for the data-fitting term alone and a simple solution for the TV term alone, so we split the problem! (A small numerical sketch of Wiener filtering and anisotropic TV follows below; the splitting itself is worked out next.)
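The following is a minimal NumPy sketch, not part of the original lecture materials, that ties these pieces together: it blurs a toy image with a placeholder Gaussian PSF (standing in for a diffraction-limited Airy pattern), applies Wiener deconvolution with an assumed SNR of 100, and evaluates the anisotropic TV of the result. All helper names (psf2otf, wiener_deconvolve, anisotropic_tv) are illustrative, not course starter code.

```python
import numpy as np

def psf2otf(psf, shape):
    """Zero-pad the PSF to the image shape, circularly shift its center
    to pixel (0, 0), and take the 2D FFT; the result is the OTF."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    for axis, size in enumerate(psf.shape):
        pad = np.roll(pad, -(size // 2), axis=axis)
    return np.fft.fft2(pad)

def wiener_deconvolve(b, psf, snr=100.0):
    """Wiener filter x = F^-1{ |C|^2 / (|C|^2 + 1/SNR) * B / C },
    written equivalently as conj(C) * B / (|C|^2 + 1/SNR)."""
    C = psf2otf(psf, b.shape)
    B = np.fft.fft2(b)
    X = np.conj(C) * B / (np.abs(C) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(X))

def anisotropic_tv(x):
    """Anisotropic TV: l1 norm of forward finite differences
    (circular boundary, matching the convolution view of the gradient)."""
    dx = np.roll(x, -1, axis=1) - x
    dy = np.roll(x, -1, axis=0) - x
    return np.abs(dx).sum() + np.abs(dy).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros((128, 128))
    x[40:90, 40:90] = 1.0                          # toy sharp image
    g = np.arange(-7, 8)                           # placeholder Gaussian PSF
    psf = np.exp(-(g[:, None] ** 2 + g[None, :] ** 2) / (2.0 * 2.0 ** 2))
    psf /= psf.sum()
    b = np.real(np.fft.ifft2(psf2otf(psf, x.shape) * np.fft.fft2(x)))
    b += 0.05 * rng.standard_normal(b.shape)       # Gaussian noise, sigma = 0.05
    x_wiener = wiener_deconvolve(b, psf, snr=100.0)
    print("anisotropic TV of Wiener estimate:", anisotropic_tv(x_wiener))
```

Writing the Wiener filter as $\mathcal{F}\{c\}^* / (|\mathcal{F}\{c\}|^2 + 1/\mathrm{SNR})$ is mathematically equivalent to the slide formula and avoids an explicit division by near-zero values of $\mathcal{F}\{c\}$.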
Deconvolution with ADMM
• Split the deconvolution problem with a TV prior:
  $\underset{x,z}{\text{minimize}}\; \|Cx - b\|_2^2 + \lambda \|z\|_1 \quad \text{subject to} \quad \nabla x = z$
• This is the general form of ADMM (alternating direction method of multipliers),
  $\underset{x,z}{\text{minimize}}\; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c,$
  with $f(x) = \|Cx - b\|_2^2$, $g(z) = \lambda \|z\|_1$, $A = \nabla$, $B = -I$, $c = 0$.

ADMM
• Lagrangian (bring the constraints into the objective, as in a penalty method):
  $L(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c)$
  where $y$ is the dual variable or Lagrange multiplier.
• Augmented Lagrangian: add a quadratic penalty term,
  $L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho/2)\,\|Ax + Bz - c\|_2^2$
• The augmented Lagrangian is differentiable under mild conditions, which usually gives better convergence.
• ADMM consists of 3 steps per iteration $k$:
  $x^{k+1} := \underset{x}{\text{argmin}}\; L_\rho(x, z^k, y^k)$
  $z^{k+1} := \underset{z}{\text{argmin}}\; L_\rho(x^{k+1}, z, y^k)$
  $y^{k+1} := y^k + \rho\,(A x^{k+1} + B z^{k+1} - c)$
• In scaled form, with the scaled dual variable $u = (1/\rho)\,y$:
  $x^{k+1} := \underset{x}{\text{argmin}}\; f(x) + (\rho/2)\,\|Ax + Bz^k - c + u^k\|_2^2$
  $z^{k+1} := \underset{z}{\text{argmin}}\; g(z) + (\rho/2)\,\|Ax^{k+1} + Bz - c + u^k\|_2^2$
  $u^{k+1} := u^k + Ax^{k+1} + Bz^{k+1} - c$
  In the $x$- and $z$-updates, the other variables are held constant.
• This splits $f(x)$ and $g(z)$ into independent subproblems; $u$ connects them.

Deconvolution with ADMM
• For our problem,
  $\underset{x,z}{\text{minimize}}\; \tfrac{1}{2}\|Cx - b\|_2^2 + \lambda \|z\|_1 \quad \text{subject to} \quad \nabla x - z = 0,$
  the three steps per iteration $k$ are
  $x^{k+1} := \underset{x}{\text{argmin}}\; \tfrac{1}{2}\|Cx - b\|_2^2 + (\rho/2)\,\|\nabla x - z^k + u^k\|_2^2$
  $z^{k+1} := \underset{z}{\text{argmin}}\; \lambda \|z\|_1 + (\rho/2)\,\|\nabla x^{k+1} - z + u^k\|_2^2$
  $u^{k+1} := u^k + \nabla x^{k+1} - z^{k+1}$
• 1. x-update: with the constant $v = z^k - u^k$,
  $x^{k+1} := \underset{x}{\text{argmin}}\; \tfrac{1}{2}\|Cx - b\|_2^2 + (\rho/2)\,\|\nabla x - v\|_2^2$
  Solve the normal equations
  $(C^T C + \rho\,\nabla^T \nabla)\,x = C^T b + \rho\,\nabla^T v, \qquad \nabla^T v = \begin{bmatrix} \nabla_x^T & \nabla_y^T \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \nabla_x^T v_1 + \nabla_y^T v_2,$
  i.e. $x^{k+1} = (C^T C + \rho\,\nabla^T \nabla)^{-1} (C^T b + \rho\,\nabla^T v)$.
• Because $C$, $\nabla_x$, and $\nabla_y$ are all convolutions, this is inverse filtering:
  $x^{k+1} = \mathcal{F}^{-1}\!\left\{ \frac{\mathcal{F}\{c\}^* \cdot \mathcal{F}\{b\} + \rho\left(\mathcal{F}\{\nabla_x\}^* \cdot \mathcal{F}\{v_1\} + \mathcal{F}\{\nabla_y\}^* \cdot \mathcal{F}\{v_2\}\right)}{\mathcal{F}\{c\}^* \cdot \mathcal{F}\{c\} + \rho\left(\mathcal{F}\{\nabla_x\}^* \cdot \mathcal{F}\{\nabla_x\} + \mathcal{F}\{\nabla_y\}^* \cdot \mathcal{F}\{\nabla_y\}\right)} \right\}$
  The denominator does not change across iterations, so precompute it! The intermediate estimate $x^{k+1}$ may blow up, but that's okay: the following steps correct it. (A minimal code sketch of this x-update is shown below.)
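Below is a minimal sketch of this x-update, assuming periodic (circular) boundary conditions so that all operators are diagonalized by the 2D FFT; the helper names and the direct construction of the finite-difference OTFs are assumptions of this sketch, not the course starter code.

```python
import numpy as np

def grad_otfs(shape):
    """OTFs of the forward finite-difference kernels under circular
    boundaries, so that (Dx x)[i, j] = x[i, j+1] - x[i, j] and
    (Dy x)[i, j] = x[i+1, j] - x[i, j]."""
    kx = np.zeros(shape)
    kx[0, 0] = -1.0
    kx[0, -1] = 1.0
    ky = np.zeros(shape)
    ky[0, 0] = -1.0
    ky[-1, 0] = 1.0
    return np.fft.fft2(kx), np.fft.fft2(ky)

def admm_x_update(b, otf_c, otf_dx, otf_dy, v1, v2, rho):
    """ADMM x-update by inverse filtering:
    x = F^-1{ (C* F{b} + rho (Dx* F{v1} + Dy* F{v2}))
              / (|C|^2 + rho (|Dx|^2 + |Dy|^2)) }."""
    numerator = (np.conj(otf_c) * np.fft.fft2(b)
                 + rho * (np.conj(otf_dx) * np.fft.fft2(v1)
                          + np.conj(otf_dy) * np.fft.fft2(v2)))
    # the denominator is iteration-independent, so precompute it in practice
    denominator = (np.abs(otf_c) ** 2
                   + rho * (np.abs(otf_dx) ** 2 + np.abs(otf_dy) ** 2))
    return np.real(np.fft.ifft2(numerator / denominator))
```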
• 2. z-update: with the constant $a = \nabla x^{k+1} + u^k$,
  $z^{k+1} := \underset{z}{\text{argmin}}\; \lambda \|z\|_1 + (\rho/2)\,\|\nabla x^{k+1} - z + u^k\|_2^2 = \underset{z}{\text{argmin}}\; \lambda \|z\|_1 + (\rho/2)\,\|z - a\|_2^2$
• The $\ell_1$ norm is not differentiable! Yet, there is a closed-form solution via element-wise soft thresholding:
  $z^{k+1} := \mathcal{S}_{\lambda/\rho}(a), \qquad \mathcal{S}_\kappa(a) = \begin{cases} a - \kappa & a > \kappa \\ 0 & |a| \le \kappa \\ a + \kappa & a < -\kappa \end{cases} \;=\; (a - \kappa)_+ - (-a - \kappa)_+, \qquad \kappa = \lambda/\rho$

Deconvolution with ADMM: the full algorithm
for k = 1 : max_iters
  $x^{k+1} := \underset{x}{\text{argmin}}\; \tfrac{1}{2}\left\| \begin{bmatrix} C \\ \sqrt{\rho}\,\nabla \end{bmatrix} x - \begin{bmatrix} b \\ \sqrt{\rho}\,v \end{bmatrix} \right\|_2^2$   (inverse filtering)
  $z^{k+1} := \mathcal{S}_{\lambda/\rho}(\nabla x^{k+1} + u^k)$   (element-wise threshold)
  $u^{k+1} := u^k + \nabla x^{k+1} - z^{k+1}$   (trivial)
→ easy! (An end-to-end code sketch of this loop is given at the end of these notes.)

Results
• Wiener filtering vs. ADMM with anisotropic TV, $\lambda = 0.01$, $\rho = 10$.
• Too much TV looks "patchy", too little TV stays noisy: compare $\lambda = 0.01, 0.05, 0.1$ at $\rho = 10$.
• Wiener filtering vs. ADMM with anisotropic TV, $\lambda = 0.1$, $\rho = 10$: here "too much" TV is okay, because this image actually has sparse gradients!

Outlook: ADMM
• A powerful tool for many computational imaging problems.
• A generic prior can be included in $g(z)$; one just needs to derive its proximal operator:
  $\underset{x}{\text{minimize}}\; \underbrace{\tfrac{1}{2}\|Ax - b\|_2^2}_{\text{data fidelity}} + \underbrace{\Gamma(x)}_{\text{regularization}} \quad \Leftrightarrow \quad \underset{x,z}{\text{minimize}}\; f(x) + g(z) \;\;\text{subject to}\;\; Ax = z$
• Example priors: noise statistics, sparse gradients, smoothness, ...
• A weighted sum of different priors is also possible.
• Anisotropic TV is one of the easiest priors.

Remember!
• Implement matrix-free operations for $Ax$ and $A^T x$ if efficient (e.g. ...)
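To tie the lecture together, here is a hedged end-to-end sketch of the anisotropic-TV ADMM loop. It reuses psf2otf from the Wiener sketch and grad_otfs/admm_x_update from the x-update sketch; $\lambda = 0.01$ and $\rho = 10$ match the slide values, while the stopping rule (a fixed number of iterations) and all helper names are assumptions of this sketch.

```python
import numpy as np

def soft_threshold(a, kappa):
    """Element-wise soft thresholding: S_kappa(a) = (a - kappa)_+ - (-a - kappa)_+."""
    return np.maximum(a - kappa, 0.0) - np.maximum(-a - kappa, 0.0)

def admm_tv_deconvolve(b, otf_c, lam=0.01, rho=10.0, max_iters=50):
    """ADMM deconvolution with an anisotropic TV prior (periodic boundaries).
    Relies on grad_otfs() and admm_x_update() from the x-update sketch above."""
    otf_dx, otf_dy = grad_otfs(b.shape)
    z1 = np.zeros_like(b); z2 = np.zeros_like(b)   # slack variables (gradients)
    u1 = np.zeros_like(b); u2 = np.zeros_like(b)   # scaled dual variables
    x = b.copy()
    for _ in range(max_iters):
        # 1. x-update: inverse filtering with v = z - u
        x = admm_x_update(b, otf_c, otf_dx, otf_dy, z1 - u1, z2 - u2, rho)
        # gradients of the new estimate, evaluated as convolutions via the OTFs
        gx = np.real(np.fft.ifft2(otf_dx * np.fft.fft2(x)))
        gy = np.real(np.fft.ifft2(otf_dy * np.fft.fft2(x)))
        # 2. z-update: element-wise soft thresholding with kappa = lambda / rho
        z1 = soft_threshold(gx + u1, lam / rho)
        z2 = soft_threshold(gy + u2, lam / rho)
        # 3. u-update: accumulate the constraint residual
        u1 = u1 + gx - z1
        u2 = u2 + gy - z2
    return x

# usage, with b and psf as in the Wiener sketch:
# x_admm = admm_tv_deconvolve(b, psf2otf(psf, b.shape), lam=0.01, rho=10.0)
```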