Sparse Recovery by means of Nonnegative Least Squares

Simon Foucart and David Koslicki

Abstract—This short note demonstrates that sparse recovery can be achieved by an $\ell_1$-minimization ersatz easily implemented using a conventional nonnegative least squares algorithm. A connection with orthogonal matching pursuit is also highlighted. The preliminary results call for more investigations on the potential of the method and on its relations to classical sparse recovery algorithms.

Index Terms—compressive sensing, sparse recovery, $\ell_1$-minimization, orthogonal matching pursuit, nonnegative least squares, adjacency matrices of bipartite graphs, k-mer frequency matrices, Gaussian matrices.

This work is supported by the NSF under grant DMS-1120622. S. Foucart is with the Department of Mathematics, University of Georgia, Athens, GA 30602, USA (e-mail: [email protected]). D. Koslicki is with the Department of Mathematics, Oregon State University, Corvallis, OR 97331, USA (e-mail: [email protected]). Manuscript last updated February 14, 2014.

Throughout this note, we consider the standard compressive sensing problem, i.e., the recovery of a sparse (possibly just almost sparse) vector $x \in \mathbb{R}^N$ from the mere knowledge of its measurement vector (possibly corrupted by noise) $y = Ax \in \mathbb{R}^m$ with $m < N$. The measurement matrix $A \in \mathbb{R}^{m \times N}$ is perfectly known. Among the many methods available to produce an estimate for $x$, we single out the very popular

• basis pursuit:

$\min_{z \in \mathbb{R}^N} \|z\|_1$ subject to $Az = y$;   (BP)

• orthogonal matching pursuit: construct sequences $(S^n)$ of support sets and $(x^n)$ of vectors from $S^0 = \emptyset$, $x^0 = 0$, and iterate, for $n \ge 1$,

$S^n = S^{n-1} \cup \{j^n\}$, with $j^n$ chosen as $|A^*(y - Ax^{n-1})_{j^n}| = \max_{1 \le j \le N} |A^*(y - Ax^{n-1})_j|$,   (OMP1)

$x^n = \operatorname{argmin}_{z \in \mathbb{R}^N} \{\|y - Az\|_2,\ \operatorname{supp}(z) \subseteq S^n\}$.   (OMP2)

We will introduce a seemingly (and surprisingly) new method which has analogies with both (BP) and (OMP). We start by dealing with nonnegative sparse vectors before considering arbitrary sparse vectors.
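For later comparison, the (OMP1)-(OMP2) iteration can be written in a few lines of MATLAB. The sketch below is not the authors' reproducible code; in particular, the stopping rule (a fixed number $s$ of iterations, $s$ being the anticipated sparsity) is just one simple choice among several.

```matlab
% Bare-bones orthogonal matching pursuit, following (OMP1)-(OMP2).
% Not the authors' reproducible code; the stopping rule (s iterations,
% s being the anticipated sparsity) is one simple choice among several.
function x = omp(A, y, s)
    N = size(A, 2);
    S = [];                                    % current support set S^n
    x = zeros(N, 1);                           % current iterate x^n
    for n = 1:s
        [~, jn] = max(abs(A' * (y - A * x))); % (OMP1): largest correlation with the residual
        S = union(S, jn);                      % enlarge the support
        x = zeros(N, 1);
        x(S) = A(:, S) \ y;                    % (OMP2): least squares on the current support
    end
end
```

Calling x = omp(A, y, s) then returns the greedy estimate after $s$ iterations.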

I. NONNEGATIVE SPARSE RECOVERY

A. Exact Measurements

In a number of practical situations, the sparse vector $x \in \mathbb{R}^N$ to be recovered from $y = Ax \in \mathbb{R}^m$ has nonnegative entries. Although ordinary basis pursuit is sometimes used (see [2]), it can be modified to incorporate the nonnegativity constraint, leading to

• nonnegative basis pursuit:

$\min_{z \in \mathbb{R}^N} \|z\|_1$ subject to $Az = y$ and $z \ge 0$.   (NNBP)

It is folklore — but not well-known enough¹ — that, for certain matrices $A$ (including frequency matrices, see (1) below), if $x$ is the unique solution of (BP) or the unique solution of (NNBP), then it is in fact the unique solution of the feasibility problem

find $z \in \mathbb{R}^N$ such that $Az = y$ and $z \ge 0$.   (F)

Appendix A contains the proofs of this fact and of a few others. For frequency matrices, it is therefore irrelevant to try and recover nonnegative vectors via $\ell_1$-minimization, since we might as well minimize a constant function subject to the constraints $Az = y$ and $z \ge 0$. We may also consider solving the problem

• nonnegative least squares:

$\min_{z \in \mathbb{R}^N} \|y - Az\|_2$ subject to $z \ge 0$.   (NNLS)

This minimization program is directly implemented in MATLAB as the function lsqnonneg. It executes the active-set algorithm of Lawson and Hanson (see [14, Chapter 23]), which seems particularly adapted to sparse recovery. Indeed, it is structured as

• Lawson–Hanson algorithm: construct sequences $(S^n)$ of support sets and $(x^n)$ of vectors from $S^0 = \emptyset$, $x^0 = 0$, and iterate, for $n \ge 1$,

$S^n = S^{n-1} \cup \{j^n\}$, with $j^n$ chosen as $A^*(y - Ax^{n-1})_{j^n} = \max_{1 \le j \le N} A^*(y - Ax^{n-1})_j$,   (LH1)

$\widetilde{x}^n = \operatorname{argmin}_{z \in \mathbb{R}^N} \{\|y - Az\|_2,\ \operatorname{supp}(z) \subseteq S^n\}$,   (LH2)

if $\widetilde{x}^n \ge 0$, then one sets $x^n$ to $\widetilde{x}^n$; otherwise, an inner loop reduces the active set $S^n$ until $\widetilde{x}^n \ge 0$ and then one sets $x^n$ to $\widetilde{x}^n$.

Aside from the inner loop, there is a peculiar analogy between this algorithm (dating from 1974) and orthogonal matching pursuit (usually attributed to [16] in 1993). In the reproducible MATLAB file accompanying this note, the benefit of the inner loop is revealed by comparing the frequencies of successful nonnegative recovery via (NNLS) and (OMP).

¹ See for instance [5], which contains results along these lines.
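To illustrate how little machinery is needed, the following minimal MATLAB sketch recovers a nonnegative sparse vector from exact measurements by a single call to lsqnonneg, i.e., by solving (NNLS). The random frequency matrix and the dimensions are arbitrary choices made here for illustration; they are not the data of the reproducible file.

```matlab
% Minimal sketch: nonnegative sparse recovery from exact measurements via (NNLS),
% i.e. a single call to MATLAB's lsqnonneg. The test data are synthetic and chosen
% only for illustration (not the data of the reproducible file).
m = 50; N = 200; s = 10;
A = rand(m, N); A = A ./ sum(A, 1);         % random frequency matrix: columns sum to one
support = randperm(N, s);
x = zeros(N, 1); x(support) = rand(s, 1);   % nonnegative s-sparse vector
y = A * x;                                  % exact measurements
xhat = lsqnonneg(A, y);                     % solve min ||y - Az||_2 subject to z >= 0
fprintf('relative l1 recovery error: %.2e\n', norm(xhat - x, 1) / norm(x, 1));
```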

Usual conditions ensuring sparse recovery via (BP) and (OMP) involve restricted isometry constants of the matrix $A$ (see e.g. [10, Chapter 6]). The guarantees are valid for all sparse vectors $x \in \mathbb{R}^N$ simultaneously, in particular for all nonnegative sparse vectors $x \in \mathbb{R}^N$. Under these conditions, from the strict point of view of recovery success (i.e., disregarding computational effort), there is nothing to be gained from solving (NNBP) or (F) instead of (BP). Even for (renormalized) adjacency matrices of lossless expanders, which do not possess the restricted isometry property (see [7]), the recovery of all sparse vectors $x \in \mathbb{R}^N$ via (BP) is ensured (see [3] or [10, Chapter 13]). Since these are frequency matrices, the recovery of all nonnegative sparse vectors $x \in \mathbb{R}^N$ is guaranteed via (NNBP) and (F), too, but again no gain in terms of recovery success arises. Nonetheless, there are empirical examples where nonnegative sparse recovery via (NNBP) and (F) succeeds while arbitrary sparse recovery via (BP) fails. One such example is provided by the k-mer frequency matrix of DNA sequences. Its columns are indexed by organisms in a database and its rows are indexed by all $4^k$ possible DNA subwords of length $k$. It is obtained after column-normalizing the matrix whose $(i,j)$th entry counts the number of occurrences of the $i$th subword in the DNA sequence of the $j$th organism. We supply such a k-mer frequency matrix in the reproducible MATLAB file and we verify (using the optimization package CVX [11]) that all nonnegative sparse vectors supported on a given set $S$ are recovered via (NNBP) — or equivalently via (F) or (NNLS) — while arbitrarily signed sparse vectors supported on $S$ may not be recovered by (BP).

B. Inaccurate Measurements

Let us now consider the more realistic situation where the acquisition of the vector $x \in \mathbb{R}^N$ is not perfect, so that the measurement vector $y = Ax + e \in \mathbb{R}^m$ contains an error term $e \in \mathbb{R}^m$. If we know a reasonable bound $\|e\|_2 \le \eta$, a common strategy for the approximate recovery of $x$ is to solve the inequality-constrained basis pursuit consisting of minimizing $\|z\|_1$ subject to $\|Az - y\|_2 \le \eta$. Another common strategy is the LASSO (see [17]), consisting of minimizing $\|Az - y\|_2$ subject to $\|z\|_1 \le \tau$. Ordinary LASSO is sometimes used for nonnegative recovery (as in [1]), but one can also incorporate the constraint $z \ge 0$, leading to the so-called nonnegative garrote (see [4]). A third strategy (intimately linked to the previous two) is basis pursuit denoising (see [8]), consisting of minimizing $\|z\|_1 + \nu\|Az - y\|_2$. For nonnegative recovery, incorporating an additional nonnegativity constraint has been employed e.g. in [15], [13]. Here, we replace the $\ell_1$-norm in the objective function by a squared $\ell_1$-norm (which is arguably more natural), i.e., we consider the program

• $\ell_1$-squared nonnegative regularization:

$\min_{z \in \mathbb{R}^N} \|z\|_1^2 + \lambda^2 \|Az - y\|_2^2$ subject to $z \ge 0$   (NNREG)

for some large $\lambda > 0$. Our main observation (already exploited fruitfully in [12]) is that (NNREG) can be recast as a nonnegative least squares problem, for which the algorithm of Lawson and Hanson is particularly well adapted. Indeed, since $z \ge 0$, we have $\|z\|_1 = \sum_{j=1}^N z_j$, so (NNREG) takes the form

$\min_{z \in \mathbb{R}^N} \|\widetilde{A} z - \widetilde{y}\|_2$ subject to $z \ge 0$,

where $\widetilde{A} \in \mathbb{R}^{(m+1) \times N}$ and $\widetilde{y} \in \mathbb{R}^{m+1}$ are defined by

$\widetilde{A} = \begin{bmatrix} 1 \cdots 1 \\ \lambda A \end{bmatrix}$ and $\widetilde{y} = \begin{bmatrix} 0 \\ \lambda y \end{bmatrix}$.

As exhibited in the reproducible file, the execution time compares favorably to state-of-the-art algorithms for $\ell_1$-minimization (we selected YALL1 [20]), but the choice of the parameter $\lambda$ is critical in our current implementation.
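For concreteness, here is a minimal MATLAB sketch of the recast just described: a row of ones and a zero entry are appended to $\lambda A$ and $\lambda y$, and lsqnonneg does the rest. The synthetic data, the noise level, and the value of $\lambda$ are arbitrary illustrative choices, not recommendations.

```matlab
% Minimal sketch of (NNREG) recast as a nonnegative least squares problem:
% append a row of ones to lambda*A and a zero entry to lambda*y, then call lsqnonneg.
% The test data and the value of lambda are arbitrary illustrative choices.
m = 50; N = 200; s = 10; lambda = 100;
A = rand(m, N); A = A ./ sum(A, 1);              % random frequency matrix
x = zeros(N, 1); x(randperm(N, s)) = rand(s, 1); % nonnegative s-sparse vector
y = A * x + 1e-3 * randn(m, 1);                  % noisy measurements
Atilde = [ones(1, N); lambda * A];               % (m+1) x N augmented matrix
ytilde = [0; lambda * y];                        % augmented right-hand side
xlambda = lsqnonneg(Atilde, ytilde);             % minimizes ||z||_1^2 + lambda^2*||Az-y||_2^2 over z >= 0
fprintf('relative l1 recovery error: %.2e\n', norm(xlambda - x, 1) / norm(x, 1));
```

For $z \ge 0$, the squared residual of the augmented system equals $\|z\|_1^2 + \lambda^2\|Az - y\|_2^2$, which is why this single lsqnonneg call solves (NNREG).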
C. Theoretical Justifications

To support the previous heuristics, we prove that, as $\lambda \to \infty$, the solutions $x^\lambda$ of (NNREG) converge to the solution $x^\sharp$ of (NNBP) — assumed to be unique. This is done for matrices $A$ with nonnegative entries and all column-sums equal to one, i.e.,

$A_{i,j} \ge 0$ for all $i, j$ and $\sum_{i=1}^m A_{i,j} = 1$ for all $j$.   (1)

We call such matrices frequency matrices. Examples are given by k-mer frequency matrices and by (renormalized) adjacency matrices of left-regular bipartite graphs. For $t \in [0, 1]$, the vector $(1-t)x^\sharp + t x^\lambda$ is nonnegative, so the minimality of $x^\lambda$ yields

$\|x^\lambda\|_1^2 + \lambda^2 \|A x^\lambda - y\|_2^2 \le \|(1-t)x^\sharp + t x^\lambda\|_1^2 + \lambda^2 \|A((1-t)x^\sharp + t x^\lambda) - y\|_2^2$.

The triangle inequality and the fact that $y = A x^\sharp$ give

$\|x^\lambda\|_1^2 + \lambda^2 \|A(x^\lambda - x^\sharp)\|_2^2 \le \big((1-t)\|x^\sharp\|_1 + t\|x^\lambda\|_1\big)^2 + \lambda^2 t^2 \|A(x^\lambda - x^\sharp)\|_2^2$.

After rearrangement, we obtain

$\lambda^2 (1 - t^2) \|A(x^\lambda - x^\sharp)\|_2^2 \le (1-t)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \times \big((1-t)\|x^\sharp\|_1 + (1+t)\|x^\lambda\|_1\big)$.   (2)

The left-hand side being nonnegative, we notice that $\|x^\lambda\|_1 \le \|x^\sharp\|_1$. We also take into account that

$\|A(x^\lambda - x^\sharp)\|_2 \ge \frac{1}{\sqrt{m}} \|A(x^\lambda - x^\sharp)\|_1 = \frac{1}{\sqrt{m}} \sum_{i=1}^m \Big| \sum_{j=1}^N A_{i,j}\big(x^\lambda_j - x^\sharp_j\big) \Big|$

$\ge \frac{1}{\sqrt{m}} \sum_{j=1}^N \sum_{i=1}^m A_{i,j}\big(x^\sharp_j - x^\lambda_j\big) = \frac{1}{\sqrt{m}} \sum_{j=1}^N \big(x^\sharp_j - x^\lambda_j\big) = \frac{1}{\sqrt{m}} \big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)$.

Substituting the latter inequality into (2) and dividing by $(1-t)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \ge 0$ implies that

$\frac{\lambda^2 (1+t)}{m} \big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big) \le (1-t)\|x^\sharp\|_1 + (1+t)\|x^\lambda\|_1$,

that is to say

$\Big( \frac{\lambda^2}{m} - \frac{1-t}{1+t} \Big) \|x^\sharp\|_1 \le \Big( 1 + \frac{\lambda^2}{m} \Big) \|x^\lambda\|_1$.

With the optimal choice $t = 1$, we derive the estimates

$\frac{\lambda^2/m}{1 + \lambda^2/m} \|x^\sharp\|_1 \le \|x^\lambda\|_1 \le \|x^\sharp\|_1$,   (3)

which shows that $\|x^\lambda\|_1 \to \|x^\sharp\|_1$ as $\lambda \to \infty$. From here, we can prove that $x^\lambda \to x^\sharp$ as $\lambda \to \infty$. Indeed, if not, there would exist a number $\varepsilon > 0$ and an increasing sequence $(\lambda_n)$ such that $\|x^{\lambda_n} - x^\sharp\|_1 \ge \varepsilon$ for all $n$. Since the sequence $(x^{\lambda_n})$ is bounded, we can extract a subsequence $(x^{\lambda_{n_k}})$ converging to some $x^\star \in \mathbb{R}^N$. We note that $\|x^\star - x^\sharp\|_1 \ge \varepsilon$. We also note that $x^\star \ge 0$ and $\|x^\star\|_1 = \|x^\sharp\|_1$. Moreover, the inequality

$\|x^{\lambda_{n_k}}\|_1^2 + \lambda_{n_k}^2 \|A x^{\lambda_{n_k}} - y\|_2^2 \le \|x^\sharp\|_1^2 + \lambda_{n_k}^2 \|A x^\sharp - y\|_2^2 = \|x^\sharp\|_1^2$

passed to the limit yields $A x^\star = y$. By uniqueness of $x^\sharp$ as a minimizer of $\|z\|_1$ subject to $Az = y$ and $z \ge 0$, we deduce that $x^\star = x^\sharp$, which contradicts $\|x^\star - x^\sharp\|_1 \ge \varepsilon$. We conclude that $x^\lambda$ does converge to $x^\sharp$, as announced. The reproducible file includes experimental evidence that the left-hand side of (3) captures the correct behavior of $\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)/\|x^\sharp\|_1$ as a function of the parameter $\lambda$.

II. ARBITRARILY SIGNED SPARSE RECOVERY

We consider in this section the sparse recovery of arbitrary vectors, not necessarily nonnegative ones. We build on the previous considerations to introduce a nonnegative-least-squares surrogate for basis pursuit. The method does not require knowledge of convex optimization to be implemented — it is easily written in MATLAB and its execution time is rather fast.

A. Rationale

The crucial point is again to replace the $\ell_1$-norm in basis pursuit denoising by its square, thus introducing

• $\ell_1$-squared regularization:

$\min_{z \in \mathbb{R}^N} \|z\|_1^2 + \lambda^2 \|Az - y\|_2^2$.   (REG)

Next, decomposing any $z \in \mathbb{R}^N$ as $z = z_+ - z_-$ with $z_+ \ge 0$ and $z_- \ge 0$, (REG) is transformed into the nonnegative least squares problem

$\min_{\widetilde{z} \in \mathbb{R}^{2N}} \|\widetilde{A} \widetilde{z} - \widetilde{y}\|_2$ subject to $\widetilde{z} \ge 0$,

where $\widetilde{A} \in \mathbb{R}^{(m+1) \times 2N}$ and $\widetilde{y} \in \mathbb{R}^{m+1}$ are defined by

$\widetilde{A} = \begin{bmatrix} 1 \cdots 1 & 1 \cdots 1 \\ \lambda A & -\lambda A \end{bmatrix}$, $\widetilde{y} = \begin{bmatrix} 0 \\ \lambda y \end{bmatrix}$.

To see the equivalence of the two optimization problems, we simply observe that the objective function in (REG) is, with $\widetilde{z} = \begin{bmatrix} z_+ \\ z_- \end{bmatrix}$,

$\big(\|z_+\|_1 + \|z_-\|_1\big)^2 + \lambda^2 \|A z_+ - A z_- - y\|_2^2 = \|\widetilde{A} \widetilde{z} - \widetilde{y}\|_2^2$.
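A minimal MATLAB sketch of this surrogate for basis pursuit reads as follows; the synthetic Gaussian data and the value of $\lambda$ are arbitrary illustrative choices, not recommendations.

```matlab
% Minimal sketch of (REG) for arbitrarily signed vectors: split z = z_plus - z_minus
% and solve one (m+1) x 2N nonnegative least squares problem with lsqnonneg.
% The Gaussian test data and the value of lambda are arbitrary illustrative choices.
m = 50; N = 200; s = 8; lambda = 100;
A = randn(m, N) / sqrt(m);                          % Gaussian matrix with variance-1/m entries
x = zeros(N, 1); x(randperm(N, s)) = randn(s, 1);   % signed s-sparse vector
y = A * x;
Atilde = [ones(1, 2*N); lambda * A, -lambda * A];   % matrix of the recast problem
ytilde = [0; lambda * y];
ztilde = lsqnonneg(Atilde, ytilde);                 % nonnegative minimizer of ||Atilde*z - ytilde||_2
xlambda = ztilde(1:N) - ztilde(N+1:2*N);            % recombine positive and negative parts
fprintf('relative l1 recovery error: %.2e\n', norm(xlambda - x, 1) / norm(x, 1));
```

The recombination $x^\lambda = z_+ - z_-$ then plays the role of the (REG) solution.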
B. Theoretical Justifications

We support the previous heuristics by a result valid for random matrices $A \in \mathbb{R}^{m \times N}$ whose entries are independent Gaussian variables with mean zero and variance $1/m$. With high probability, these matrices satisfy both the $\ell_1$-quotient property and the restricted isometry property of optimal order $s \asymp m/\ln(N/m)$. This implies (see [18] or [10, Chapter 11]) that, for any $x \in \mathbb{R}^N$ and any $e \in \mathbb{R}^m$ in $y = Ax + e$, the solution $x^\sharp$ of (BP) approximates $x$ with an error controlled as

$\|x - x^\sharp\|_1 \le C \sigma_s(x)_1 + D \sqrt{s}\, \|e\|_2$   (4)

for some constants $C, D > 0$. Here, the sparsity defect appears in $\sigma_s(x)_1 = \min\{\|x - z\|_1,\ z \text{ is } s\text{-sparse}\}$. Unlike in Subsection I-C, we do not prove that the solution $x^\lambda$ of (REG) converges to $x^\sharp$, but rather that it approximates $x$ with an error controlled almost as in (4), namely

$\|x^\lambda - x\|_1 \le C \sigma_s(x)_1 + D \sqrt{s}\, \|e\|_2 + \frac{E}{\lambda^2} \|x\|_1$   (5)

for some constants $C, D, E > 0$ when $\lambda$ is chosen large enough. First, we use the minimality of $x^\lambda$ to write

$\|x^\lambda\|_1^2 + \lambda^2 \|A x^\lambda - y\|_2^2 \le \|x^\sharp\|_1^2 + \lambda^2 \|A x^\sharp - y\|_2^2 = \|x^\sharp\|_1^2$,   (6)

so $\|x^\lambda\|_1 \le \|x^\sharp\|_1$ follows in particular. Next, the restricted isometry property implies the $\ell_2$-robust null space property (see [10, Theorem 6.13]) and we derive in turn (see [10, Theorem 4.20]) that

$\|x^\lambda - x^\sharp\|_1 \le c\big(\|x^\lambda\|_1 - \|x^\sharp\|_1 + 2\sigma_s(x^\sharp)_1\big) + d \sqrt{s}\, \|A(x^\lambda - x^\sharp)\|_2 \le 2 c\, \sigma_s(x^\sharp)_1 + d \sqrt{s}\, \|A x^\lambda - y\|_2$

for some constants $c, d > 0$. Therefore,

$\|x^\lambda - x^\sharp\|_1^2 \le 8 c^2 \sigma_s(x^\sharp)_1^2 + 2 d^2 s\, \|A x^\lambda - y\|_2^2$.   (7)

But (6) also implies that

$\|A x^\lambda - y\|_2^2 \le \frac{\big(\|x^\sharp\|_1 + \|x^\lambda\|_1\big)\big(\|x^\sharp\|_1 - \|x^\lambda\|_1\big)}{\lambda^2} \le \frac{2 \|x^\sharp\|_1 \|x^\sharp - x^\lambda\|_1}{\lambda^2}$.

This inequality substituted in (7) gives

$\|x^\lambda - x^\sharp\|_1^2 \le 8 c^2 \sigma_s(x^\sharp)_1^2 + \frac{4 d^2 s\, \|x^\sharp\|_1 \|x^\sharp - x^\lambda\|_1}{\lambda^2}$.

Solving this quadratic inequality in $\|x^\lambda - x^\sharp\|_1$ yields

$\|x^\lambda - x^\sharp\|_1 \le \frac{2 d^2 s \|x^\sharp\|_1}{\lambda^2} + \sqrt{\frac{4 d^4 s^2 \|x^\sharp\|_1^2}{\lambda^4} + 8 c^2 \sigma_s(x^\sharp)_1^2} \le \frac{4 d^2 s \|x^\sharp\|_1}{\lambda^2} + \sqrt{8}\, c\, \sigma_s(x^\sharp)_1$.

Now using the facts that $\|x^\sharp\|_1 \le \|x\|_1 + \|x^\sharp - x\|_1$ and that $\sigma_s(x^\sharp)_1 \le \sigma_s(x)_1 + \|x^\sharp - x\|_1$, we derive

$\|x^\lambda - x^\sharp\|_1 \le \frac{4 d^2 s \|x\|_1}{\lambda^2} + \sqrt{8}\, c\, \sigma_s(x)_1 + \Big( \frac{4 d^2 s}{\lambda^2} + \sqrt{8}\, c \Big) \|x^\sharp - x\|_1$.

We finally deduce (5) by calling upon (4) — provided that $\lambda$ is chosen large enough to have $\lambda^2 \ge s$, say. The experiment conducted in the reproducible file strongly suggests that (5) reflects the correct behavior as a function of $\lambda$ when $x$ is exactly $s$-sparse and $e = 0$.

III. CONCLUSION

We have established a connection between basis pursuit and orthogonal matching pursuit by introducing a simple method to approximate $\ell_1$-minimizers by solutions of nonnegative least squares problems. Several computational improvements can be contemplated (e.g., the choice of the parameter $\lambda$, the possibility of varying $\lambda$ at each iteration within Lawson and Hanson's algorithm, the combined reconstruction of positive and negative parts, and the implementation of QR-updates in the iterative algorithm). Several theoretical questions can also be raised (e.g., is the number of iterations needed to solve (NNLS) at most proportional to the sparsity, at least under restricted isometry conditions as in [19], see also [10, Theorem 6.25]?). Importantly, a careful understanding of the relations between $\ell_1$-minimization-like and OMP-like algorithms would be extremely valuable.

APPENDIX A

Let $x \in \mathbb{R}^N$ be a fixed nonnegative vector, let $S$ be its support, and let $\bar S$ denote the complement of $S$. It is not hard to verify that

• $x$ is the unique solution of (BP) if and only if

for all $v \in \ker A \setminus \{0\}$, $\sum_{j \in S} v_j < \sum_{\ell \in \bar S} |v_\ell|$;   (BP')

• $x$ is the unique solution of (NNBP) if and only if

for all $v \in \ker A \setminus \{0\}$, $v_{\bar S} \ge 0 \Rightarrow \sum_{i=1}^N v_i > 0$;   (NNBP')

• $x$ is the unique solution of (F) if and only if²

for all $v \in \ker A \setminus \{0\}$, $v_{\bar S} \ge 0$ is impossible.   (F')

The implication (F')⇒(NNBP') is clear ((F)⇒(NNBP) is clearer if the logical statement 'FALSE implies anything' is troublesome). The implication (BP')⇒(NNBP') also holds in general: indeed, for $v \in \ker A \setminus \{0\}$ with $v_{\bar S} \ge 0$, applying (BP') to both $v$ and $-v$ gives $\big|\sum_{j \in S} v_j\big| < \sum_{\ell \in \bar S} |v_\ell|$, which yields

$\sum_{i=1}^N v_i = \sum_{j \in S} v_j + \sum_{\ell \in \bar S} v_\ell \ge -\Big|\sum_{j \in S} v_j\Big| + \sum_{\ell \in \bar S} |v_\ell| > 0$.

Let us now assume that the matrix $A$ satisfies the condition $\sum_{i=1}^N v_i = 0$ for all $v \in \ker A \setminus \{0\}$. This condition can be imposed by appending a row of ones to any matrix, but it is also fulfilled for frequency matrices, since $Av = 0$ gives

$0 = \sum_{i=1}^m (Av)_i = \sum_{i=1}^m \sum_{j=1}^N A_{i,j} v_j = \sum_{j=1}^N \sum_{i=1}^m A_{i,j} v_j = \sum_{j=1}^N v_j$.

Under this condition on $A$, (NNBP')⇒(F')⇒(BP') holds, so that (BP), (NNBP), and (F) are all equivalent, as announced in Subsection I-A. Indeed, (NNBP') implies (F') because if $v_{\bar S} \ge 0$, then $\sum_{i=1}^N v_i$ could not be zero by (NNBP'); and (F'), applied to $-v$, implies (BP') because the impossibility of $(-v)_{\bar S} \ge 0$ gives $-\sum_{\ell \in \bar S} v_\ell < \sum_{\ell \in \bar S} |v_\ell|$, so

$\sum_{j \in S} v_j = -\sum_{\ell \in \bar S} v_\ell < \sum_{\ell \in \bar S} |v_\ell|$.

² This allows one to check that an algorithm's output $x^\sharp$ satisfying $A x^\sharp = y$ and $x^\sharp \ge 0$ is the true solution $x$: verify (F') for $S^\sharp = \operatorname{supp}(x^\sharp)$ by using e.g. CVX [11] to solve the convex program minimize $\max_{\ell \in \overline{S^\sharp}} v_\ell$ subject to $Av = 0$ (and an additional constraint to exclude the zero solution) — the minimum should be positive.

REFERENCES

[1] M. AlQuraishi and H.H. McAdams, Direct inference of protein–DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci., 108(36), 14819–14824, 2011.
[2] A. Amir and O. Zuk, Bacterial community reconstruction using compressed sensing, J. Comput. Biol., 18(11), 1723–1741, 2011.
[3] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, Combining geometry and combinatorics: A unified approach to sparse signal recovery, In 'Proc. 46th Annual Allerton Conf. on Communication, Control, and Computing', 798–805, 2008.
[4] L. Breiman, Better subset regression using the nonnegative garrote, Technometrics, 37(4), 373–384, 1995.
[5] A. Bruckstein, M. Elad, and M. Zibulevsky, On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations, IEEE Trans. Inf. Theory, 54(11), 4813–4820, 2008.
[6] E.J. Candès, Compressive sampling, In 'Proc. of the International Congress of Mathematicians', 2006.
[7] V. Chandar, A negative result concerning explicit matrices with the restricted isometry property, Preprint, 2008.
[8] S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput., 20(1), 33–61, 1998.
[9] D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, 52(4), 1289–1306, 2006.
[10] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser, 2013.
[11] M. Grant, S. Boyd, and Y. Ye, CVX: Matlab software for disciplined convex programming, 2008.
[12] D. Koslicki, S. Foucart, and G. Rosen, Quikr: a method for rapid reconstruction of bacterial communities via compressive sensing, Bioinformatics, 29(17), 2096–2102, 2013.
[13] A.S. Lan, A.E. Waters, C. Studer, and R.G. Baraniuk, Sparse factor analysis for learning and content analytics, Preprint.
[14] C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974.
[15] W. Li, J. Feng, and T. Jiang, IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly, J. Comput. Biol., 18(11), 1693–1707, 2011.
[16] S.G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., 41(12), 3397–3415, 1993.
[17] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B, 58(1), 267–288, 1996.
[18] P. Wojtaszczyk, Stability and instance optimality for Gaussian measurements in compressed sensing, Found. Comput. Math., 10, 1–13, 2010.
[19] T. Zhang, Sparse recovery with orthogonal matching pursuit under RIP, IEEE Trans. Inf. Theory, 57(9), 6215–6221, 2011.
[20] Y. Zhang, J. Yang, and W. Yin, YALL1: Your ALgorithms for L1, online at yall1.blogs.rice.edu, 2011.