Sparse Estimation for Image and Vision Processing
Julien Mairal, Inria, Grenoble
SPARS summer school, Lisbon, December 2017

Course material (freely available on arXiv):
- J. Mairal, F. Bach, and J. Ponce. Sparse Modeling for Image and Vision Processing. Foundations and Trends in Computer Graphics and Vision, 2014.
- F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with Sparsity-Inducing Penalties. Foundations and Trends in Machine Learning, 4(1), 2012.

Outline
1 A short introduction to parsimony
  - Early thoughts
  - Sparsity in the statistics literature from the 60's and 70's
  - Wavelet thresholding in signal processing from the 90's
  - The modern parsimony and the ℓ1-norm
  - Structured sparsity
  - Compressed sensing and sparse recovery
2 Discovering the structure of natural images
3 Sparse models for image processing
4 Optimization for sparse estimation
5 Application cases

Part I: A Short Introduction to Parsimony

Early thoughts

(a) Dorothy Wrinch (1894–1980); (b) Harold Jeffreys (1891–1989)

"The existence of simple laws is, then, apparently, to be regarded as a quality of nature; and accordingly we may infer that it is justifiable to prefer a simple law to a more complex one that fits our observations slightly better." [Wrinch and Jeffreys, 1921]. Philosophical Magazine Series.
Historical overview of parsimony
- 14th century: Ockham's razor;
- 1921: Wrinch and Jeffreys' simplicity principle;
- 1952: Markowitz's portfolio selection;
- 60's and 70's: best subset selection in statistics;
- 70's: use of the ℓ1-norm for signal recovery in geophysics;
- 90's: wavelet thresholding in signal processing;
- 1996: Olshausen and Field's dictionary learning;
- 1996–1999: Lasso (statistics) and basis pursuit (signal processing);
- 2006: compressed sensing (signal processing) and Lasso consistency (statistics);
- 2006–now: applications of dictionary learning in various scientific fields such as image processing and computer vision.

Sparsity in the statistics literature from the 60's and 70's

Given observed data points z_1, ..., z_n, assumed to be independent samples from a statistical model with parameters \theta in \mathbb{R}^p, maximum likelihood estimation (MLE) consists of minimizing

\min_{\theta \in \mathbb{R}^p} \; \mathcal{L}(\theta) \triangleq -\sum_{i=1}^n \log P_\theta(z_i).

Example: ordinary least squares. The observations are z_i = (y_i, x_i), with y_i in \mathbb{R}, and the linear model is y_i = x_i^\top \theta + \varepsilon_i with \varepsilon_i \sim \mathcal{N}(0, 1). MLE then reduces to

\min_{\theta \in \mathbb{R}^p} \; \frac{1}{2} \sum_{i=1}^n \left( y_i - x_i^\top \theta \right)^2.

Motivation for finding a sparse solution, i.e., a \theta with at most k non-zero entries:
- removing irrelevant variables from the model;
- obtaining an easier interpretation;
- preventing overfitting.

Two questions:
1 how to choose k?
2 how to find the best subset of k variables?

How to choose k?
- Mallows's Cp statistic [Mallows, 1964, 1966];
- Akaike information criterion (AIC) [Akaike, 1973];
- Bayesian information criterion (BIC) [Schwarz, 1978];
- minimum description length (MDL) [Rissanen, 1978].

These approaches lead to penalized problems

\min_{\theta \in \mathbb{R}^p} \; \mathcal{L}(\theta) + \lambda \|\theta\|_0,

with different choices of \lambda depending on the chosen criterion.

How to solve the best k-subset selection problem? Unfortunately, the problem is NP-hard [Natarajan, 1995]. Two strategies:
- combinatorial exploration with branch-and-bound techniques ("leaps and bounds") [Furnival and Wilson, 1974]: an exact algorithm, but with exponential complexity;
- a greedy approach: forward selection [Efroymson, 1960] (originally developed for observing intermediate solutions), which already contains all the ideas of matching pursuit algorithms.

Important reference: [Hocking, 1976]. The analysis and selection of variables in linear regression. Biometrics.
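The greedy forward-selection strategy is easy to sketch: at each step, add the variable whose inclusion most reduces the residual sum of squares. The following is a minimal illustration, not Efroymson's original stepwise procedure; the function name and toy data are invented for the example.

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedily select k columns of X that best fit y in least squares."""
    n, p = X.shape
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = X[:, selected + [j]]
            # Least-squares fit restricted to the candidate subset.
            theta, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = np.sum((y - cols @ theta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return sorted(selected)

# Toy data: y depends only on columns 0 and 3 of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3]
print(forward_selection(X, y, 2))  # recovers [0, 3]
```

Each step solves up to p least-squares problems from scratch; matching pursuit variants avoid this refitting by updating residuals incrementally.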
Wavelet thresholding in signal processing from the 90's

A wavelet basis is a set of functions \varphi_1, \varphi_2, \ldots that are essentially dilated and shifted versions of each other [see Mallat, 2008].

Concept of parsimony with wavelets: when a signal f is "smooth", it is close to an expansion \sum_i \alpha_i \varphi_i in which only a few coefficients \alpha_i are non-zero.

Figure: (a) Meyer's wavelet; (b) Morlet's wavelet.

Wavelets were the topic of a long quest for representing natural images:
- 2D Gabors [Daugman, 1985];
- steerable wavelets [Simoncelli et al., 1992];
- curvelets [Candès and Donoho, 2002];
- contourlets [Do and Vetterli, 2003];
- bandlets [Le Pennec and Mallat, 2005];
- ⋆-lets (joke).

Figure: (a) 2D Gabor filter; (b) with shifted phase; (c) with rotation.

The theory of wavelets is well developed for continuous signals, e.g., in L^2(\mathbb{R}), but also for discrete signals x in \mathbb{R}^n.

Given an orthogonal wavelet basis D = [d_1, \ldots, d_n] in \mathbb{R}^{n \times n}, the wavelet decomposition of x in \mathbb{R}^n is simply \beta = D^\top x, and we have x = D\beta.

The k-sparse approximation problem

\min_{\alpha \in \mathbb{R}^p} \; \frac{1}{2}\|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le k

is not NP-hard here: since D is orthogonal, it is equivalent to

\min_{\alpha \in \mathbb{R}^p} \; \frac{1}{2}\|\beta - \alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le k.
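The analysis/synthesis identities for an orthogonal basis are easy to verify numerically. The sketch below uses a random orthogonal matrix as a stand-in for an actual wavelet basis; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# QR factorization of a random matrix yields an orthogonal D; a real
# wavelet basis is a structured instance of such a matrix.
D, _ = np.linalg.qr(rng.standard_normal((n, n)))

x = rng.standard_normal(n)
beta = D.T @ x      # analysis: compute the coefficients
x_rec = D @ beta    # synthesis: exact reconstruction

assert np.allclose(x_rec, x)

# Orthogonality also gives the equivalence between the two problems:
# ||x - D a||_2 equals ||beta - a||_2 for any coefficient vector a.
a = rng.standard_normal(n)
assert np.allclose(np.linalg.norm(x - D @ a),
                   np.linalg.norm(beta - a))
```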
The solution of the k-sparse approximation problem is obtained by hard-thresholding:

\alpha_{\mathrm{ht}}[j] = \delta_{|\beta[j]| \ge \mu} \, \beta[j] = \begin{cases} \beta[j] & \text{if } |\beta[j]| \ge \mu, \\ 0 & \text{otherwise,} \end{cases}

where \mu is the k-th largest value among the set \{|\beta[1]|, \ldots, |\beta[p]|\}.

Another key operator, introduced by Donoho and Johnstone [1994], is the soft-thresholding operator:

\alpha_{\mathrm{st}}[j] \triangleq \mathrm{sign}(\beta[j]) \max(|\beta[j]| - \lambda, 0) = \begin{cases} \beta[j] - \lambda & \text{if } \beta[j] \ge \lambda, \\ \beta[j] + \lambda & \text{if } \beta[j] \le -\lambda, \\ 0 & \text{otherwise,} \end{cases}

where \lambda is a parameter playing the same role as \mu previously. With \beta = D^\top x and D orthogonal, soft-thresholding provides the solution of the following sparse reconstruction problem, which will be of high importance later:

\min_{\alpha \in \mathbb{R}^p} \; \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1.

Figure: the soft-thresholding operator \alpha_{\mathrm{st}} = \mathrm{sign}(\beta)\max(|\beta| - \lambda, 0) and the hard-thresholding operator \alpha_{\mathrm{ht}} = \delta_{|\beta| \ge \mu}\,\beta, which are commonly used for signal estimation with orthogonal wavelet bases.

Various works have tried to exploit the structure of wavelet coefficients.

Figure: illustration of a wavelet tree with four scales (coefficients \alpha_1, \ldots, \alpha_{15}) for one-dimensional signals, together with the zero-tree coding scheme [Shapiro, 1993].

To model spatial relations, it is possible to define (non-overlapping) groups \mathcal{G} of wavelet coefficients and a corresponding group soft-thresholding operator [Hall et al., 1999, Cai, 1999].
For every group g in \mathcal{G},

\alpha_{\mathrm{gt}}[g] \triangleq \begin{cases} \left(1 - \frac{\lambda}{\|\beta[g]\|_2}\right) \beta[g] & \text{if } \|\beta[g]\|_2 \ge \lambda, \\ 0 & \text{otherwise,} \end{cases}

where \beta[g] is the vector of size |g| recording the entries of \beta in g.
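The three thresholding operators translate directly from their formulas into a few lines of NumPy. This is a sketch; the array layout and the group encoding (lists of indices) are chosen for the example.

```python
import numpy as np

def hard_threshold(beta, k):
    """Keep the k largest-magnitude coefficients; zero out the rest."""
    alpha = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    alpha[keep] = beta[keep]
    return alpha

def soft_threshold(beta, lam):
    """Entrywise sign(beta) * max(|beta| - lam, 0)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def group_soft_threshold(beta, groups, lam):
    """Shrink each group toward zero; kill groups with small l2-norm."""
    alpha = np.zeros_like(beta)
    for g in groups:  # groups: non-overlapping lists of indices
        norm = np.linalg.norm(beta[g])
        if norm >= lam:
            alpha[g] = (1.0 - lam / norm) * beta[g]
    return alpha

beta = np.array([3.0, -0.5, 1.5, -2.0])
print(hard_threshold(beta, 2))    # keeps 3.0 and -2.0 only
print(soft_threshold(beta, 1.0))  # shrinks every entry by 1 in magnitude
print(group_soft_threshold(beta, [[0, 1], [2, 3]], 1.0))
```

With \beta = D^\top x for an orthogonal D, applying soft_threshold to \beta gives exactly the solution of the ℓ1-penalized reconstruction problem stated earlier.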