IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020

Functional Nonlinear Sparse Models

Luiz F. O. Chamon, Student Member, IEEE, Yonina C. Eldar, Fellow, IEEE, and Alejandro Ribeiro

Abstract—Signal processing is rich in inherently continuous and often nonlinear applications, such as spectral estimation, optical imaging, and super-resolution microscopy, in which sparsity plays a key role in obtaining state-of-the-art results. Coping with the infinite dimensionality and non-convexity of these problems typically involves discretization and convex relaxations, e.g., using atomic norms. Nevertheless, grid mismatch and other coherence issues often lead to discretized versions of sparse signals that are not sparse. Even if they are, recovering sparse solutions using convex relaxations requires assumptions that may be hard to meet in practice. What is more, problems involving nonlinear measurements remain non-convex even after relaxing the sparsity objective. We address these issues by directly tackling the continuous, nonlinear problem cast as a sparse functional optimization program. We prove that when these problems are non-atomic, they have no duality gap and can therefore be solved efficiently using duality and (stochastic) convex optimization methods. We illustrate the wide range of applications of this approach by formulating and solving problems from nonlinear spectral estimation and robust classification.

Index Terms—Functional optimization, sparsity, nonlinear models, strong duality, compressive sensing.

Manuscript received March 22, 2019; revised September 18, 2019 and January 31, 2020; accepted March 15, 2020. Date of publication March 30, 2020; date of current version April 28, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vincent Y. F. Tan. The work of Luiz Chamon and Alejandro Ribeiro was supported by ARL DCIST CRA W911NF-17-2-0181. This article was presented in part at the IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada, 2018, and in part at the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, U.K., 2019. (Corresponding author: Luiz F. O. Chamon.) Luiz F. O. Chamon and Alejandro Ribeiro are with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104 USA (e-mail: [email protected]; [email protected]). Yonina C. Eldar is with the Math and Computer Science Department, Weizmann Institute of Science, Rehovot 234, Israel (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2020.2982834. 1053-587X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

The analog and often nonlinear nature of the physical world makes for two of the main challenges in signal and information processing applications. Indeed, there are many examples of inherently continuous¹ problems, such as spectral estimation, image recovery, and source localization [3]–[6], as well as nonlinear ones, e.g., magnetic resonance fingerprinting, spectrum cartography, and manifold data sparse coding [7]–[9].

¹Throughout this work, we use the term "continuous" only in contrast to "discrete" and not to refer to a smoothness property of signals.

These challenges are often tackled by imposing structure on the signals. For instance, bandlimited, finite rate of innovation, or union-of-subspaces signals can be processed using an appropriate discrete set of samples [10]–[13]. Functions in reproducing kernel Hilbert spaces (RKHSs) also admit finite descriptions through variational results known as representer theorems [14], [15]. The infinite dimensionality of continuous problems is therefore often overcome by means of sampling theorems. Similarly, nonlinear functions with bounded total variation or lying in an RKHS can be written as a finite linear combination of basis functions. Under certain smoothness assumptions, nonlinearity can thus be addressed using "linear-in-the-parameters" methods.

Due to the limited number of measurements, however, discretization often leads to underdetermined problems. Sparsity priors then play an important role in achieving state-of-the-art results by leveraging the assumption that there exists a signal representation in terms of only a few atoms from an overparametrized dictionary [13], [16], [17]. Since fitting these models leads to non-convex (and possibly NP-hard [18]) problems, sparsity is typically replaced by a tractable relaxation based on an atomic norm (e.g., the ℓ1-norm). For linear, incoherent dictionaries, these relaxed problems have been shown to retrieve the desired sparse solution [16], [17]. Nevertheless, discretized continuous problems rarely meet these conditions. Only in specific instances, such as line spectrum estimation, do there exist guarantees for relaxations that forgo discretization [12], [19]–[24].

This discretization/relaxation approach, however, is not always effective. Indeed, discretization can lead to grid mismatch issues and even loss of sparsity: infinite dimensional sparse signals need not be sparse when discretized [25]–[27]. Also, sampling theorems are sensitive to the function class considered and are often asymptotic: results improve as the discretization becomes finer. This leads to high dimensional statistical problems with potentially poor numerical properties (high condition number). In fact, ℓ1-norm-based recovery of spikes on fine grids (essentially) finds twice the number of actual spikes, and the number of support candidate points increases as the number of measurements decreases [28], [29]. Furthermore, performance guarantees for convex relaxations rely on incoherence assumptions (e.g., restricted isometry/eigenvalue properties) that may be difficult to meet in practice and are NP-hard to check [30]–[32]. Finally, these guarantees hold for linear measurement models.

Directly accounting for nonlinearities in sparse models makes a difficult problem harder, since the optimization program remains non-convex even after relaxing the sparsity objective. This is evidenced by the weaker guarantees existing for ℓ1-norm relaxations in nonlinear compressive sensing problems [33], [34]. Though "linear-in-the-parameters" models, such as splines or kernel methods, may sometimes be used (e.g., spectrum cartography [8]), they are not applicable in general. Indeed, the number of kernels needed to represent a generic nonlinear model may be so large that the solution is no longer sparse. What is more, there is no guarantee that these models meet the incoherence assumptions required for the convex relaxation to be effective [13], [16], [17].

In this work, we propose to forgo both discretization and relaxation and directly tackle the continuous problem using sparse functional programming. Although sparse functional programs (SFPs) combine the infinite dimensionality of functional programming with the non-convexity of sparsity and nonlinear atoms, we show that they are tractable under mild conditions (Theorem 1). To do so, this paper develops the theory of sparse functional programming by formulating a general SFP (Section II), deriving its Lagrangian dual problem (Section II-B), and proving that strong duality holds under mild conditions (Section III). This shows that SFPs can be solved exactly by means of their dual problems. Moreover, we use this result to obtain a relation between minimizing the support of a function ("L0-norm") and its L1-norm, even though the latter may yield non-sparse solutions (Section III-B). We then propose two algorithms to solve SFPs, based on subgradient and stochastic subgradient ascent, by leveraging different numerical integration methods (Section IV). Finally, we illustrate the expressiveness of SFPs by using them to cast different signal processing problems and provide numerical examples to showcase the effectiveness of this approach (Section V).

Throughout the paper, we use lowercase boldface letters for vectors (x), uppercase boldface letters for matrices (X), calligraphic letters for sets (A), and fraktur font for measures (h). In particular, we denote the Lebesgue measure by m. We use C to denote the set of complex numbers, R for real numbers, and R+ for non-negative real numbers. For a complex number z = a + jb, j = √−1, we write Re[z] = a for its real part and Im[z] = b for its imaginary part. We use z^H for the conjugate transpose.

A general SFP is then defined as the optimization problem

\begin{align}
\underset{X \in \mathcal{X},\ \mathbf{z} \in \mathbb{C}^p}{\text{minimize}} \quad
  & \int_\Omega F_0\left[X(\beta), \beta\right] \, d\beta + \lambda \left\| X \right\|_{L_0}
  \tag{P-SFP} \\
\text{subject to} \quad
  & g_i(\mathbf{z}) \leq 0, \quad i = 1, \ldots, m, \\
  & \mathbf{z} = \int_\Omega F\left[X(\beta), \beta\right] \, d\beta, \\
  & X(\beta) \in \mathcal{P} \ \text{a.e.}
\end{align}

where λ > 0 is a regularization parameter that controls the sparsity of the solution; g_i : ℂ^p → ℝ are convex functions; F_0 : ℂ × Ω → ℝ is an optional, not necessarily convex regularization term (e.g., take F_0(x, β) = |x|² for shrinkage); F : ℂ × Ω → ℂ^p is a vector-valued (possibly nonlinear) function; 𝒫 is a (possibly non-convex) set defining an almost everywhere (a.e.) pointwise constraint on X, i.e., a constraint that holds for all β ∈ Ω except perhaps over a set of measure zero (e.g., 𝒫 = {x ∈ ℂ : |x| ≤ Γ} for some Γ > 0); and 𝒳 is a decomposable function space, i.e., if X, X′ ∈ 𝒳, then for any 𝒵 ∈ ℬ it holds that X̄ ∈ 𝒳 for

\begin{equation*}
\bar{X}(\beta) =
\begin{cases}
X(\beta), & \beta \in \mathcal{Z} \\
X'(\beta), & \beta \notin \mathcal{Z}.
\end{cases}
\end{equation*}

Lebesgue spaces (e.g., 𝒳 = L2 or 𝒳 = L∞) or, more generally, Orlicz spaces are typical examples of decomposable function spaces. The spaces of constant or continuous functions, for instance, are not decomposable [35].

The linear, continuous sparse recovery/denoising problem is a particular case of (P-SFP). Here, we seek to represent a signal y ∈ ℝ^p as a linear combination of a continuum of atoms φ(β) indexed by β ∈ Ω ⊂ ℝ, i.e., as ∫_Ω X(β)φ(β) dβ, using a sparse functional coefficient X.
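The grid-mismatch phenomenon discussed above can be made concrete with a small numerical sketch. The example below is not from the paper: the cosine atoms φ(β)(t) = cos(βt), the off-grid spike locations, and all parameter values are illustrative assumptions. It discretizes the continuum of atoms on an integer grid, generates measurements from two spikes that fall between grid points, and solves the ℓ1-relaxed (LASSO) discretization with plain ISTA. With off-grid spikes, the recovered support typically spreads over several neighboring grid points rather than concentrating on two atoms.

```python
import numpy as np

# Measurement points and two *off-grid* spikes (illustrative values).
t = np.linspace(0.0, 1.0, 64)
beta_true, amps = [10.3, 24.7], [1.0, 0.8]
y = sum(a * np.cos(b * t) for b, a in zip(beta_true, amps))

# Discretized dictionary: atoms phi(beta)(t) = cos(beta * t) on an integer grid,
# so the true betas fall strictly between grid points (grid mismatch).
beta_grid = np.arange(0.0, 40.0, 1.0)
A = np.cos(np.outer(t, beta_grid))            # 64 x 40 dictionary matrix

# ISTA for the l1 relaxation: min_z 0.5*||A z - y||^2 + lam*||z||_1
lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = Lipschitz constant of the gradient
z = np.zeros(beta_grid.size)
for _ in range(2000):
    v = z - step * (A.T @ (A @ z - y))        # gradient step on the quadratic term
    z = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)  # soft-thresholding

support = np.flatnonzero(np.abs(z) > 1e-3)
print("true number of spikes:", len(beta_true))
print("recovered support size:", support.size, "at grid betas:", beta_grid[support])
```

In runs of this kind the recovered support tends to exceed the number of true spikes, in line with the observations of [28], [29]; the effect is sensitive to λ and to the grid spacing, so the exact support size should not be read as a general quantitative claim.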