Sparse Recovery by means of Nonnegative Least Squares

Simon Foucart and David Koslicki

Abstract—This short note demonstrates that sparse recovery can be achieved by an $\ell_1$-minimization ersatz easily implemented using a conventional nonnegative least squares algorithm. A connection with orthogonal matching pursuit is also highlighted. The preliminary results call for more investigations on the potential of the method and on its relations to classical sparse recovery algorithms.

Index Terms—compressive sensing, sparse recovery, $\ell_1$-minimization, orthogonal matching pursuit, nonnegative least squares, adjacency matrices of bipartite graphs, k-mer frequency matrices, Gaussian matrices.

This work is supported by the NSF under grant DMS-1120622. S. Foucart is with the Department of Mathematics, University of Georgia, Athens, GA 30602, USA (e-mail: [email protected]). D. Koslicki is with the Department of Mathematics, Oregon State University, Corvallis, OR 97331, USA (e-mail: [email protected]). Manuscript last updated February 14, 2014.

Throughout this note, we consider the standard compressive sensing problem, i.e., the recovery of a sparse (possibly just almost sparse) vector $x \in \mathbb{R}^N$ from the mere knowledge of its measurement vector (possibly corrupted by noise) $y = Ax \in \mathbb{R}^m$ with $m < N$. The measurement matrix $A \in \mathbb{R}^{m \times N}$ is perfectly known. Among the many methods available to produce an estimate for $x$, we single out the very popular

• basis pursuit:

$\min_{z \in \mathbb{R}^N} \|z\|_1$ subject to $Az = y$;   (BP)

• orthogonal matching pursuit: construct sequences $(S^n)$ of support sets and $(x^n)$ of vectors from $S^0 = \emptyset$, $x^0 = 0$, and iterate, for $n \ge 1$,

$S^n = S^{n-1} \cup \{j^n\}$, with $j^n$ chosen as $|A^*(y - Ax^{n-1})_{j^n}| = \max_{1 \le j \le N} |A^*(y - Ax^{n-1})_j|$,   (OMP1)

$x^n = \operatorname{argmin}_{z \in \mathbb{R}^N} \{\|y - Az\|_2,\ \operatorname{supp}(z) \subseteq S^n\}$.   (OMP2)

We will introduce a seemingly (and surprisingly) new method which has analogies with both (BP) and (OMP). We start by dealing with nonnegative sparse vectors before considering arbitrary sparse vectors.
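For later comparison, the (OMP1)-(OMP2) iteration can be written in a few lines of MATLAB. The sketch below is not the authors' reproducible code; in particular, the stopping rule (a fixed number $s$ of iterations, $s$ being the anticipated sparsity) is just one simple choice among several.

```matlab
% Bare-bones orthogonal matching pursuit, following (OMP1)-(OMP2).
% Not the authors' reproducible code; the stopping rule (s iterations,
% s being the anticipated sparsity) is one simple choice among several.
function x = omp(A, y, s)
    N = size(A, 2);
    S = [];                                    % current support set S^n
    x = zeros(N, 1);                           % current iterate x^n
    for n = 1:s
        [~, jn] = max(abs(A' * (y - A * x))); % (OMP1): largest correlation with the residual
        S = union(S, jn);                      % enlarge the support
        x = zeros(N, 1);
        x(S) = A(:, S) \ y;                    % (OMP2): least squares on the current support
    end
end
```

Calling x = omp(A, y, s) then returns the greedy estimate after $s$ iterations.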

I. NONNEGATIVE SPARSE RECOVERY

A. Exact Measurements

In a number of practical situations, the sparse vector $x \in \mathbb{R}^N$ to be recovered from $y = Ax \in \mathbb{R}^m$ has nonnegative entries. Although ordinary basis pursuit is sometimes used (see [2]), it can be modified to incorporate the nonnegativity constraint, leading to

• nonnegative basis pursuit:

$\min_{z \in \mathbb{R}^N} \|z\|_1$ subject to $Az = y$ and $z \ge 0$.   (NNBP)

It is folklore — but not well-known enough¹ — that, for certain matrices $A$ (including frequency matrices, see (1) below), if $x$ is the unique solution of (BP) or the unique solution of (NNBP), then it is in fact the unique solution of the feasibility problem

find $z \in \mathbb{R}^N$ such that $Az = y$ and $z \ge 0$.   (F)

Appendix A contains the proofs of this fact and of a few others. For frequency matrices, it is therefore irrelevant to try and recover nonnegative vectors via $\ell_1$-minimization, since we might as well minimize a constant function subject to the constraints $Az = y$ and $z \ge 0$. We may also consider solving the problem

• nonnegative least squares:

$\min_{z \in \mathbb{R}^N} \|y - Az\|_2$ subject to $z \ge 0$.   (NNLS)

This minimization program is directly implemented in MATLAB as the function lsqnonneg. It executes the active-set algorithm of Lawson and Hanson (see [14, Chapter 23]), which seems particularly adapted to sparse recovery. Indeed, it is structured as

• Lawson–Hanson algorithm: construct sequences $(S^n)$ of support sets and $(x^n)$ of vectors from $S^0 = \emptyset$, $x^0 = 0$, and iterate, for $n \ge 1$,

$S^n = S^{n-1} \cup \{j^n\}$, with $j^n$ chosen as $A^*(y - Ax^{n-1})_{j^n} = \max_{1 \le j \le N} A^*(y - Ax^{n-1})_j$,   (LH1)

$\widetilde{x}^n = \operatorname{argmin}_{z \in \mathbb{R}^N} \{\|y - Az\|_2,\ \operatorname{supp}(z) \subseteq S^n\}$,   (LH2)

if $\widetilde{x}^n \ge 0$, then one sets $x^n$ to $\widetilde{x}^n$; otherwise, an inner loop reduces the active set $S^n$ until $\widetilde{x}^n \ge 0$ and then one sets $x^n$ to $\widetilde{x}^n$.

Aside from the inner loop, there is a peculiar analogy between this algorithm (dating from 1974) and orthogonal matching pursuit (usually attributed to [16] in 1993). In the reproducible MATLAB file accompanying this note, the benefit of the inner loop is revealed by comparing the frequencies of successful nonnegative recovery via (NNLS) and (OMP).

¹ See for instance [5], which contains results along these lines.
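To illustrate how little machinery is needed, the following minimal MATLAB sketch recovers a nonnegative sparse vector from exact measurements by a single call to lsqnonneg, i.e., by solving (NNLS). The random frequency matrix and the dimensions are arbitrary choices made here for illustration; they are not the data of the reproducible file.

```matlab
% Minimal sketch: nonnegative sparse recovery from exact measurements via (NNLS),
% i.e. a single call to MATLAB's lsqnonneg. The test data are synthetic and chosen
% only for illustration (not the data of the reproducible file).
m = 50; N = 200; s = 10;
A = rand(m, N); A = A ./ sum(A, 1);         % random frequency matrix: columns sum to one
support = randperm(N, s);
x = zeros(N, 1); x(support) = rand(s, 1);   % nonnegative s-sparse vector
y = A * x;                                  % exact measurements
xhat = lsqnonneg(A, y);                     % solve min ||y - Az||_2 subject to z >= 0
fprintf('relative l1 recovery error: %.2e\n', norm(xhat - x, 1) / norm(x, 1));
```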

Usual conditions ensuring sparse recovery via (BP) and (OMP) involve restricted isometry constants of the matrix $A$ (see e.g. [10, Chapter 6]). The guarantees are valid for all sparse vectors $x \in \mathbb{R}^N$ simultaneously, in particular for all nonnegative sparse vectors $x \in \mathbb{R}^N$. Under these conditions, from the strict point of view of recovery success (i.e., disregarding computational effort), there is nothing to be gained from solving (NNBP) or (F) instead of (BP). Even for (renormalized) adjacency matrices of lossless expanders, which do not possess the restricted isometry property (see [7]), the recovery of all sparse vectors $x \in \mathbb{R}^N$ via (BP) is ensured (see [3] or [10, Chapter 13]). Since these are frequency matrices, the recovery of all nonnegative sparse vectors $x \in \mathbb{R}^N$ is guaranteed via (NNBP) and (F), too, but again no gain in terms of recovery success arises. Nonetheless, there are empirical examples where nonnegative sparse recovery via (NNBP) and (F) succeeds while arbitrary sparse recovery via (BP) fails. One such example is provided by the k-mer frequency matrix of DNA sequences. Its columns are indexed by organisms in a database and its rows are indexed by all $4^k$ possible DNA subwords of length $k$. It is obtained after column-normalizing the matrix whose $(i,j)$th entry counts the number of occurrences of the $i$th subword in the DNA sequence of the $j$th organism. We supply such a k-mer frequency matrix in the reproducible MATLAB file and we verify (using the optimization package CVX [11]) that all nonnegative sparse vectors supported on a given set $S$ are recovered via (NNBP) — or equivalently via (F) or (NNLS) — while arbitrarily signed sparse vectors supported on $S$ may not be recovered by (BP).

B. Inaccurate Measurements

Let us now consider the more realistic situation where the acquisition of the vector $x \in \mathbb{R}^N$ is not perfect, so that the measurement vector $y = Ax + e \in \mathbb{R}^m$ contains an error term $e \in \mathbb{R}^m$. If we know a reasonable bound $\|e\|_2 \le \eta$, a common strategy for the approximate recovery of $x$ is to solve the inequality-constrained basis pursuit consisting of minimizing $\|z\|_1$ subject to $\|Az - y\|_2 \le \eta$. Another common strategy is the LASSO (see [17]), consisting of minimizing $\|Az - y\|_2$ subject to $\|z\|_1 \le \tau$. Ordinary LASSO is sometimes used for nonnegative recovery (as in [1]), but one can also incorporate the constraint $z \ge 0$, leading to the so-called nonnegative garrote (see [4]). A third strategy (intimately linked to the previous two) is basis pursuit denoising (see [8]), consisting of minimizing $\|z\|_1 + \nu\|Az - y\|_2$. For nonnegative recovery, incorporating an additional nonnegativity constraint has been employed e.g. in [15], [13]. Here, we replace the $\ell_1$-norm in the objective function by a squared $\ell_1$-norm (which is arguably more natural), i.e., we consider the program

• $\ell_1$-squared nonnegative regularization:

$\min_{z \in \mathbb{R}^N} \|z\|_1^2 + \lambda^2 \|Az - y\|_2^2$ subject to $z \ge 0$   (NNREG)

for some large $\lambda > 0$. Our main observation (already exploited fruitfully in [12]) is that (NNREG) can be recast as a nonnegative least squares problem, for which the algorithm of Lawson and Hanson is particularly well adapted. Indeed, since $z \ge 0$, we have $\|z\|_1 = \sum_{j=1}^N z_j$, so (NNREG) takes the form

$\min_{z \in \mathbb{R}^N} \|\widetilde{A} z - \widetilde{y}\|_2$ subject to $z \ge 0$,

where $\widetilde{A} \in \mathbb{R}^{(m+1) \times N}$ and $\widetilde{y} \in \mathbb{R}^{m+1}$ are defined by

$\widetilde{A} = \begin{bmatrix} 1 \cdots 1 \\ \lambda A \end{bmatrix}$ and $\widetilde{y} = \begin{bmatrix} 0 \\ \lambda y \end{bmatrix}$.

As exhibited in the reproducible file, the execution time compares favorably to state-of-the-art algorithms for $\ell_1$-minimization (we selected YALL1 [20]), but the choice of the parameter $\lambda$ is critical in our current implementation.
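For concreteness, here is a minimal MATLAB sketch of the recast just described: a row of ones and a zero entry are appended to $\lambda A$ and $\lambda y$, and lsqnonneg does the rest. The synthetic data, the noise level, and the value of $\lambda$ are arbitrary illustrative choices, not recommendations.

```matlab
% Minimal sketch of (NNREG) recast as a nonnegative least squares problem:
% append a row of ones to lambda*A and a zero entry to lambda*y, then call lsqnonneg.
% The test data and the value of lambda are arbitrary illustrative choices.
m = 50; N = 200; s = 10; lambda = 100;
A = rand(m, N); A = A ./ sum(A, 1);              % random frequency matrix
x = zeros(N, 1); x(randperm(N, s)) = rand(s, 1); % nonnegative s-sparse vector
y = A * x + 1e-3 * randn(m, 1);                  % noisy measurements
Atilde = [ones(1, N); lambda * A];               % (m+1) x N augmented matrix
ytilde = [0; lambda * y];                        % augmented right-hand side
xlambda = lsqnonneg(Atilde, ytilde);             % minimizes ||z||_1^2 + lambda^2*||Az-y||_2^2 over z >= 0
fprintf('relative l1 recovery error: %.2e\n', norm(xlambda - x, 1) / norm(x, 1));
```

For $z \ge 0$, the squared residual of the augmented system equals $\|z\|_1^2 + \lambda^2\|Az - y\|_2^2$, which is why this single lsqnonneg call solves (NNREG).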
C. Theoretical Justifications

To support the previous heuristics, we prove that, as $\lambda \to \infty$, the solutions $x^\lambda$ of (NNREG) converge to the solution $x^\sharp$ of (NNBP) — assumed to be unique. This is done for matrices $A$ with nonnegative entries and all column-sums equal to one, i.e.,

$A_{i,j} \ge 0$ for all $i, j$ and $\sum_{i=1}^m A_{i,j} = 1$ for all $j$.   (1)

We call such matrices frequency matrices. Examples are given by k-mer frequency matrices and by (renormalized) adjacency matrices of left-regular bipartite graphs. For $t \in [0, 1]$, the vector $(1-t)x^\sharp + t x^\lambda$ is nonnegative, so the minimality of $x^\lambda$ yields

$\|x^\lambda\|_1^2 + \lambda^2 \|A x^\lambda - y\|_2^2 \le \|(1-t)x^\sharp + t x^\lambda\|_1^2 + \lambda^2 \|A((1-t)x^\sharp + t x^\lambda) - y\|_2^2$.

The triangle inequality and the fact that $y = A x^\sharp$ give

$\|x^\lambda\|_1^2 + \lambda^2 \|A(x^\lambda - x^\sharp)\|_2^2 \le \big((1-t)\|x^\sharp\|_1 + t\|x^\lambda\|_1\big)^2 + \lambda^2 t^2 \|A(x^\lambda - x^\sharp)\|_2^2$.

After rearrangement, we obtain

$\lambda^2 (1 - t^2) \|A(x^\lambda - x^\sharp)\|_2^2 \le (1-t)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \times \big((1-t)\|x^\sharp\|_1 + (1+t)\|x^\lambda\|_1\big)$.   (2)

The left-hand side being nonnegative, we notice that $\|x^\lambda\|_1 \le \|x^\sharp\|_1$. We also take into account that

$\|A(x^\lambda - x^\sharp)\|_2 \ge \frac{1}{\sqrt{m}} \|A(x^\lambda - x^\sharp)\|_1 = \frac{1}{\sqrt{m}} \sum_{i=1}^m \Big| \sum_{j=1}^N A_{i,j}\big(x^\lambda_j - x^\sharp_j\big) \Big|$

$\ge \frac{1}{\sqrt{m}} \sum_{j=1}^N \sum_{i=1}^m A_{i,j}\big(x^\sharp_j - x^\lambda_j\big) = \frac{1}{\sqrt{m}} \sum_{j=1}^N \big(x^\sharp_j - x^\lambda_j\big) = \frac{1}{\sqrt{m}} \big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)$.

Substituting the latter inequality into (2) and dividing by $(1-t)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \ge 0$ implies that

$\frac{\lambda^2 (1+t)}{m} \big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \le (1-t)\|x^\sharp\|_1 + (1+t)\|x^\lambda\|_1$,

that is to say

$\Big( \frac{\lambda^2}{m} - \frac{1-t}{1+t} \Big) \|x^\sharp\|_1 \le \Big( 1 + \frac{\lambda^2}{m} \Big) \|x^\lambda\|_1$.

With the optimal choice $t = 1$, we derive the estimates

$\frac{\lambda^2/m}{1 + \lambda^2/m} \|x^\sharp\|_1 \le \|x^\lambda\|_1 \le \|x^\sharp\|_1$,   (3)

which shows that $\|x^\lambda\|_1 \to \|x^\sharp\|_1$ as $\lambda \to \infty$. From here, we can prove that $x^\lambda \to x^\sharp$ as $\lambda \to \infty$. Indeed, if not, there would exist a number $\varepsilon > 0$ and an increasing sequence $(\lambda_n)$ such that $\|x^{\lambda_n} - x^\sharp\|_1 \ge \varepsilon$ for all $n$. Since the sequence $(x^{\lambda_n})$ is bounded, we can extract a subsequence $(x^{\lambda_{n_k}})$ converging to some $x^\star \in \mathbb{R}^N$. We note that $\|x^\star - x^\sharp\|_1 \ge \varepsilon$. We also note that $x^\star \ge 0$ and $\|x^\star\|_1 = \|x^\sharp\|_1$. Moreover, the inequality

$\|x^{\lambda_{n_k}}\|_1^2 + \lambda_{n_k}^2 \|A x^{\lambda_{n_k}} - y\|_2^2 \le \|x^\sharp\|_1^2 + \lambda_{n_k}^2 \|A x^\sharp - y\|_2^2 = \|x^\sharp\|_1^2$

passed to the limit yields $A x^\star = y$. By uniqueness of $x^\sharp$ as a minimizer of $\|z\|_1$ subject to $Az = y$ and $z \ge 0$, we deduce that $x^\star = x^\sharp$, which contradicts $\|x^\star - x^\sharp\|_1 \ge \varepsilon$. We conclude that $x^\lambda$ does converge to $x^\sharp$, as announced. The reproducible file includes experimental evidence that the left-hand side of (3) captures the correct behavior of $\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)/\|x^\sharp\|_1$ as a function of the parameter $\lambda$.

II. ARBITRARILY SIGNED SPARSE RECOVERY

We consider in this section the sparse recovery of arbitrary vectors, not necessarily nonnegative ones. We build on the previous considerations to introduce a nonnegative-least-squares surrogate for basis pursuit. The method does not require knowledge of convex optimization to be implemented — it is easily written in MATLAB and its execution time is rather fast.

A. Rationale

The crucial point is again to replace the $\ell_1$-norm in basis pursuit denoising by its square, thus introducing

• $\ell_1$-squared regularization:

$\min_{z \in \mathbb{R}^N} \|z\|_1^2 + \lambda^2 \|Az - y\|_2^2$.   (REG)

Next, decomposing any $z \in \mathbb{R}^N$ as $z = z_+ - z_-$ with $z_+ \ge 0$ and $z_- \ge 0$, (REG) is transformed into the nonnegative least squares problem

$\min_{\widetilde{z} \in \mathbb{R}^{2N}} \|\widetilde{A} \widetilde{z} - \widetilde{y}\|_2$ subject to $\widetilde{z} \ge 0$,

where $\widetilde{A} \in \mathbb{R}^{(m+1) \times 2N}$ and $\widetilde{y} \in \mathbb{R}^{m+1}$ are defined by

$\widetilde{A} = \begin{bmatrix} 1 \cdots 1 & 1 \cdots 1 \\ \lambda A & -\lambda A \end{bmatrix}$, $\widetilde{y} = \begin{bmatrix} 0 \\ \lambda y \end{bmatrix}$.

To see the equivalence of the two optimization problems, we simply observe that the objective function in (REG) is, with $\widetilde{z} = \begin{bmatrix} z_+ \\ z_- \end{bmatrix}$,

$\big(\|z_+\|_1 + \|z_-\|_1\big)^2 + \lambda^2 \|A z_+ - A z_- - y\|_2^2 = \|\widetilde{A} \widetilde{z} - \widetilde{y}\|_2^2$.
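A minimal MATLAB sketch of this surrogate for basis pursuit reads as follows; the synthetic Gaussian data and the value of $\lambda$ are arbitrary illustrative choices, not recommendations.

```matlab
% Minimal sketch of (REG) for arbitrarily signed vectors: split z = z_plus - z_minus
% and solve one (m+1) x 2N nonnegative least squares problem with lsqnonneg.
% The Gaussian test data and the value of lambda are arbitrary illustrative choices.
m = 50; N = 200; s = 8; lambda = 100;
A = randn(m, N) / sqrt(m);                          % Gaussian matrix with variance-1/m entries
x = zeros(N, 1); x(randperm(N, s)) = randn(s, 1);   % signed s-sparse vector
y = A * x;
Atilde = [ones(1, 2*N); lambda * A, -lambda * A];   % matrix of the recast problem
ytilde = [0; lambda * y];
ztilde = lsqnonneg(Atilde, ytilde);                 % nonnegative minimizer of ||Atilde*z - ytilde||_2
xlambda = ztilde(1:N) - ztilde(N+1:2*N);            % recombine positive and negative parts
fprintf('relative l1 recovery error: %.2e\n', norm(xlambda - x, 1) / norm(x, 1));
```

The recombination $x^\lambda = z_+ - z_-$ then plays the role of the (REG) solution.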
B. Theoretical Justifications

We support the previous heuristics by a result valid for random matrices $A \in \mathbb{R}^{m \times N}$ whose entries are independent Gaussian variables with mean zero and variance $1/m$. With high probability, these matrices satisfy both the $\ell_1$-quotient property and the restricted isometry property of optimal order $s \asymp m/\ln(N/m)$. This implies (see [18] or [10, Chapter 11]) that, for any $x \in \mathbb{R}^N$ and any $e \in \mathbb{R}^m$ in $y = Ax + e$, the solution $x^\sharp$ of (BP) approximates $x$ with an error controlled as

$\|x - x^\sharp\|_1 \le C \sigma_s(x)_1 + D \sqrt{s}\, \|e\|_2$   (4)

for some constants $C, D > 0$. Here, the sparsity defect appears in $\sigma_s(x)_1 = \min\{\|x - z\|_1,\ z \text{ is } s\text{-sparse}\}$. Unlike in Subsection I-C, we do not prove that the solution $x^\lambda$ of (REG) converges to $x^\sharp$, but rather that it approximates $x$ with an error controlled almost as in (4), namely

$\|x^\lambda - x\|_1 \le C \sigma_s(x)_1 + D \sqrt{s}\, \|e\|_2 + \frac{E}{\lambda^2} \|x\|_1$   (5)

for some constants $C, D, E > 0$ when $\lambda$ is chosen large enough. First, we use the minimality of $x^\lambda$ to write

$\|x^\lambda\|_1^2 + \lambda^2 \|A x^\lambda - y\|_2^2 \le \|x^\sharp\|_1^2 + \lambda^2 \|A x^\sharp - y\|_2^2 = \|x^\sharp\|_1^2$,   (6)

so $\|x^\lambda\|_1 \le \|x^\sharp\|_1$ follows in particular. Next, the restricted isometry property implies the $\ell_2$-robust null space property (see [10, Theorem 6.13]) and we derive in turn (see [10, Theorem 4.20]) that

$\|x^\lambda - x^\sharp\|_1 \le c\big(\|x^\lambda\|_1 - \|x^\sharp\|_1 + 2\sigma_s(x^\sharp)_1\big) + d \sqrt{s}\, \|A(x^\lambda - x^\sharp)\|_2 \le 2 c\, \sigma_s(x^\sharp)_1 + d \sqrt{s}\, \|A x^\lambda - y\|_2$

for some constants $c, d > 0$. Therefore,

$\|x^\lambda - x^\sharp\|_1^2 \le 8 c^2 \sigma_s(x^\sharp)_1^2 + 2 d^2 s\, \|A x^\lambda - y\|_2^2$.   (7)

But (6) also implies that

$\|A x^\lambda - y\|_2^2 \le \frac{\big(\|x^\sharp\|_1 + \|x^\lambda\|_1\big)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)}{\lambda^2} \le \frac{2 \|x^\sharp\|_1 \|x^\sharp - x^\lambda\|_1}{\lambda^2}$.

This inequality substituted in (7) gives

$\|x^\lambda - x^\sharp\|_1^2 \le 8 c^2 \sigma_s(x^\sharp)_1^2 + \frac{4 d^2 s\, \|x^\sharp\|_1 \|x^\sharp - x^\lambda\|_1}{\lambda^2}$.

Solving this quadratic inequality in $\|x^\lambda - x^\sharp\|_1$ yields

$\|x^\lambda - x^\sharp\|_1 \le \frac{2 d^2 s \|x^\sharp\|_1}{\lambda^2} + \sqrt{\frac{4 d^4 s^2 \|x^\sharp\|_1^2}{\lambda^4} + 8 c^2 \sigma_s(x^\sharp)_1^2} \le \frac{4 d^2 s \|x^\sharp\|_1}{\lambda^2} + \sqrt{8}\, c\, \sigma_s(x^\sharp)_1$.

Now using the facts that $\|x^\sharp\|_1 \le \|x\|_1 + \|x^\sharp - x\|_1$ and that $\sigma_s(x^\sharp)_1 \le \sigma_s(x)_1 + \|x^\sharp - x\|_1$, we derive

$\|x^\lambda - x^\sharp\|_1 \le \frac{4 d^2 s \|x\|_1}{\lambda^2} + \sqrt{8}\, c\, \sigma_s(x)_1 + \Big( \frac{4 d^2 s}{\lambda^2} + \sqrt{8}\, c \Big) \|x^\sharp - x\|_1$.

We finally deduce (5) by calling upon (4) — provided that $\lambda$ is chosen large enough to have $\lambda^2 \ge s$, say. The experiment conducted in the reproducible file strongly suggests that (5) reflects the correct behavior as a function of $\lambda$ when $x$ is exactly $s$-sparse and $e = 0$.

III. CONCLUSION

We have established a connection between basis pursuit and orthogonal matching pursuit by introducing a simple method to approximate $\ell_1$-minimizers by solutions of nonnegative least squares problems. Several computational improvements can be contemplated (e.g., the choice of the parameter $\lambda$, the possibility of varying $\lambda$ at each iteration within Lawson and Hanson's algorithm, the combined reconstruction of positive and negative parts, and the implementation of QR-updates in the iterative algorithm). Several theoretical questions can also be raised (e.g., is the number of iterations needed to solve (NNLS) at most proportional to the sparsity, at least under restricted isometry conditions as in [19], see also [10, Theorem 6.25]?). Importantly, a careful understanding of the relations between $\ell_1$-minimization-like and OMP-like algorithms would be extremely valuable.

APPENDIX A

Let $x \in \mathbb{R}^N$ be a fixed nonnegative vector, let $S$ be its support, and let $\bar S$ denote the complement of $S$. It is not hard to verify that

• $x$ is the unique solution of (BP) if and only if

for all $v \in \ker A \setminus \{0\}$, $\sum_{j \in S} v_j < \sum_{\ell \in \bar S} |v_\ell|$;   (BP')

• $x$ is the unique solution of (NNBP) if and only if

for all $v \in \ker A \setminus \{0\}$, $v_{\bar S} \ge 0 \Rightarrow \sum_{i=1}^N v_i > 0$;   (NNBP')

• $x$ is the unique solution of (F) if and only if²

for all $v \in \ker A \setminus \{0\}$, $v_{\bar S} \ge 0$ is impossible.   (F')

The implication (F')⇒(NNBP') is clear ((F)⇒(NNBP) is clearer if the logical statement 'FALSE implies anything' is troublesome). The implication (BP')⇒(NNBP') also holds in general: indeed, for $v \in \ker A \setminus \{0\}$ with $v_{\bar S} \ge 0$, applying (BP') to both $v$ and $-v$ gives $\big|\sum_{j \in S} v_j\big| < \sum_{\ell \in \bar S} |v_\ell|$, which yields

$\sum_{i=1}^N v_i = \sum_{j \in S} v_j + \sum_{\ell \in \bar S} v_\ell \ge -\Big|\sum_{j \in S} v_j\Big| + \sum_{\ell \in \bar S} |v_\ell| > 0$.

Let us now assume that the matrix $A$ satisfies the condition $\sum_{i=1}^N v_i = 0$ for all $v \in \ker A \setminus \{0\}$. This condition can be imposed by appending a row of ones to any matrix, but it is also fulfilled for frequency matrices, since $Av = 0$ gives

$0 = \sum_{i=1}^m (Av)_i = \sum_{i=1}^m \sum_{j=1}^N A_{i,j} v_j = \sum_{j=1}^N \sum_{i=1}^m A_{i,j} v_j = \sum_{j=1}^N v_j$.

Under this condition on $A$, (NNBP')⇒(F')⇒(BP') holds, so that (BP), (NNBP), and (F) are all equivalent, as announced in Subsection I-A. Indeed, (NNBP') implies (F') because if $v_{\bar S} \ge 0$, then $\sum_{i=1}^N v_i$ could not be zero by (NNBP'); and (F'), applied to $-v$, implies (BP') because the impossibility of $(-v)_{\bar S} \ge 0$ gives $-\sum_{\ell \in \bar S} v_\ell < \sum_{\ell \in \bar S} |v_\ell|$, so

$\sum_{j \in S} v_j = -\sum_{\ell \in \bar S} v_\ell < \sum_{\ell \in \bar S} |v_\ell|$.

² This allows one to check that an algorithm's output $x^\sharp$ satisfying $A x^\sharp = y$ and $x^\sharp \ge 0$ is the true solution $x$: verify (F') for $S^\sharp = \operatorname{supp}(x^\sharp)$ by using e.g. CVX [11] to solve the convex program minimize $\max_{\ell \in \overline{S^\sharp}} v_\ell$ subject to $Av = 0$ (and an additional constraint to exclude the zero solution) — the minimum should be positive.

REFERENCES

[1] M. AlQuraishi and H.H. McAdams, Direct inference of protein–DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci., 108(36), 14819–14824, 2011.
[2] A. Amir and O. Zuk, Bacterial community reconstruction using compressed sensing, J. Comput. Biol., 18(11), 1723–1741, 2011.
[3] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, Combining geometry and combinatorics: A unified approach to sparse signal recovery, In 'Proc. 46th Annual Allerton Conf. on Communication, Control, and Computing', 798–805, 2008.
[4] L. Breiman, Better subset regression using the nonnegative garrote, Technometrics, 37(4), 373–384, 1995.
[5] A. Bruckstein, M. Elad, and M. Zibulevsky, On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations, IEEE Trans. Inf. Theory, 54(11), 4813–4820, 2008.
[6] E.J. Candès, Compressive sampling, In 'Proc. of the International Congress of Mathematicians', 2006.
[7] V. Chandar, A negative result concerning explicit matrices with the restricted isometry property, Preprint, 2008.
[8] S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20(1), 33–61, 1998.
[9] D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, 52(4), 1289–1306, 2006.
[10] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser, 2013.
[11] M. Grant, S. Boyd, and Y. Ye, CVX: Matlab software for disciplined convex programming, 2008.
[12] D. Koslicki, S. Foucart, and G. Rosen, Quikr: a method for rapid reconstruction of bacterial communities via compressive sensing, Bioinformatics, 29(17), 2096–2102, 2013.
[13] A.S. Lan, A.E. Waters, C. Studer, and R.G. Baraniuk, Sparse factor analysis for learning and content analytics, Preprint.
[14] C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974.
[15] W. Li, J. Feng, and T. Jiang, IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly, J. Comput. Biol., 18(11), 1693–1707, 2011.
[16] S.G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., 41(12), 3397–3415, 1993.
[17] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B, 58(1), 267–288, 1996.
[18] P. Wojtaszczyk, Stability and instance optimality for Gaussian measurements in compressed sensing, Found. Comput. Math., 10, 1–13, 2010.
[19] T. Zhang, Sparse recovery with orthogonal matching pursuit under RIP, IEEE Trans. Inf. Theory, 57(9), 6215–6221, 2011.
[20] Y. Zhang, J. Yang, and W. Yin, YALL1: Your ALgorithms for L1, online at yall1.blogs.rice.edu, 2011.