A New Algorithm for Non-Negative Sparse Approximation

Nicholas Schachter

July 2, 2020

Abstract

In this article we introduce a new algorithm for non-negative sparse approximation problems, based on a combination of the approaches used in orthogonal matching pursuit and basis pursuit de-noising for solving sparse approximation problems. By taking advantage of structural properties inherent to non-negative sparse approximation problems, a branch and bound (BnB) scheme is developed that enables fast and accurate recovery of the underlying dictionary atoms, even in the presence of noise. A detailed analysis of the performance of the algorithm is given, with attention specifically paid to situations in which the algorithm will perform better or worse depending on the properties of the dictionary and the required sparsity of the solution. Performance on test sets is presented, along with possible directions for future research and improvements.

1 Introduction

Non-negative sparse approximation (NNSA) is a special case of the sparse approximation (SA) problem. In SA we are given a dictionary $D \in \mathbb{R}^{m \times n}$, where $m < n$, and a signal vector $y \in \mathbb{R}^m$, and are asked to find $x \in \mathbb{R}^n$ such that $\|y - Dx\|_p$, for a given choice of $p$-norm, and $\|x\|_0$ are minimized. In NNSA we add the constraints $x \geq 0$ element-wise and $y - Dx \geq 0$.¹ In spectroscopy and applied chemistry this problem is sometimes called mixture analysis, as it is commonly used to analyze unknown chemical mixtures.

Unfortunately, optimization problems involving the 0-norm are known to be NP-hard in many cases [6], so we must make do with methods of approximation [4]. Commonly chosen methods include $\ell_1$ regularization (also known as LASSO) [15], elastic net regularization (which includes $\ell_1$ and $\ell_2$ regularization as special cases) [19], matching pursuit and its extensions [12], and proximal gradient methods [5]; a minimal sketch of the non-negative LASSO baseline is given below.

When $D$ satisfies certain conditions relating to the matrix spark², it can be shown that the convex relaxation of the 0-norm formulation is guaranteed to find the optimal solution to the problem [8], [16]. In situations where these conditions are not met, these methods can often produce solutions that contain noticeable residual components. When it is suspected that some of the components of the measured signal are small relative to the others, determining whether these residual signals are necessary or extraneous becomes non-trivial. This challenge is especially acute for NNSA instances where it is likely that a substance of interest constitutes only a very small proportion of the measured signal.
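As a concrete point of reference for the convex-relaxation baseline mentioned above, here is a minimal sketch of non-negative LASSO using scikit-learn's `Lasso` with `positive=True` to enforce the non-negativity constraint. The dictionary, signal, and regularization weight are illustrative stand-ins chosen for this sketch, not values from the experiments reported later.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Illustrative stand-ins: a random non-negative dictionary (m < n) and a
# signal mixed from two atoms, one of which is a very small component.
m, n = 50, 200
D = rng.random((m, n))
x_true = np.zeros(n)
x_true[[10, 40]] = [1.0, 0.05]
y = D @ x_true

# Non-negative LASSO: min (1/2m)||y - Dx||_2^2 + alpha * ||x||_1  s.t. x >= 0.
model = Lasso(alpha=1e-3, positive=True, max_iter=10_000)
model.fit(D, y)

support = np.flatnonzero(model.coef_ > 1e-6)
print("recovered atoms:", support, "weights:", model.coef_[support])
```

Whether the 0.05 component survives depends heavily on the choice of `alpha`; this sensitivity of small components to the relaxation is exactly the failure mode discussed next.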
Methods that are based on minimizing the residual norm will naturally be more prone to missing these smaller components. Additionally, when the measured signal contains noise there are further complications.

¹ This constraint prevents over-explaining the signal vector and is not strictly necessary, but it is included here because it both empirically improves results and simplifies some of the analysis.
² The smallest integer $k$ such that there exists a set of $k$ columns of $D$ that are linearly dependent.

In most (if not all) real-life scenarios there will be some degree of imprecision, or noise, present in the measured signal. Unless the signal-to-noise ratio of the measured signal is sufficiently large, methods based on a convex relaxation of the $\ell_0$-norm constraint will lead to extraneous coefficients being non-zero. Empirically, the signal-to-noise ratio does not have to be particularly poor for this effect to occur; in one of the tests detailed in section X the signal-to-noise ratio is roughly 10 to 1 and the effect can still be observed. Fundamentally, this is because we are not actually solving
$$\arg\min_x \|y - Dx\|_p \quad \text{s.t.} \quad x \geq 0,$$
but rather
$$\arg\min_x \|\tilde{y} - Dx\|_p \quad \text{s.t.} \quad x \geq 0,$$
where $\tilde{y}$ is our inexact measurement of the pure signal $y$.

As the goal of NNSA is to estimate the relative amounts of dictionary atoms present in a signal, the ideal point of comparison is the pure signal, not the measured one. Obviously in real life we cannot actually observe the pure signal, but with some assumptions we can reconstruct a very good post-hoc approximation of it. The primary assumption necessary is that the dictionary $D$ contains nearly noise-free representations of its atoms. This is a reasonable assumption when either the dictionary has been compiled on a more precise instrument than the tool being used to take measurements of the target spectra, or the dictionary atoms are the results of numerous measurements averaged out in order to reduce noise, as is often the case in applications of hand-held spectroscopic tools. The second major assumption is that the noise present in the measurement is randomly distributed (ideally with a mean of 0, but this is not strictly necessary). If the noise is randomly distributed, its effects will, on average, be distributed globally across the measured signal, and thus the improvement in fit gained by accounting for the noise in one area will be outweighed by the loss introduced elsewhere as a result. When combined with an explicit sparsity constraint, which ensures that extraneous components are not added to the estimate of the pure signal in an attempt to account for noise, we can generate a close approximation of the pure signal by finding the subset of atoms in the dictionary that best fits the measured signal. This is the fundamental underpinning of the algorithm presented in this article.

In section 2 we describe the algorithm, prove that it always produces an optimal solution to the NNSA problem, and examine its asymptotic time complexity. Section 3 contains theoretical analysis of the conditions necessary for the algorithm to perform optimally in terms of speed, as well as detailed exposition on the relevance of the composition of the dictionary atoms when performing sparse approximation in general. Section 4 presents the results of computational tests of this algorithm on a simulated dictionary, compared against LASSO, with specific emphasis on the relative performance in correctly identifying and weighting smaller components of the measured signal.
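The subproblems solved repeatedly by the algorithm of section 2, $\arg\min_x \|D_S x - y\|_1$ subject to $x \geq 0$ and $y \geq D_S x$ element-wise, have a convenient structure: because the constraints force the residual $y - D_S x$ to be non-negative, the $\ell_1$ objective is linear in $x$, and each fit is an ordinary linear program. The sketch below shows one way to exploit this with `scipy.optimize.linprog`; this LP reformulation is a standard reduction used here for illustration, not necessarily how the author's implementation solves the subproblems.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(D_S: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Solve min ||y - D_S x||_1  s.t.  x >= 0 and D_S x <= y element-wise.

    Since D_S x <= y forces the residual y - D_S x to be non-negative,
    ||y - D_S x||_1 = sum(y - D_S x), so minimizing it is equivalent to
    maximizing 1^T D_S x over the feasible polytope -- a linear program.
    """
    c = -(np.ones(len(y)) @ D_S)  # negated because linprog minimizes
    res = linprog(c, A_ub=D_S, b_ub=y, bounds=(0, None), method="highs")
    x = res.x
    return x, float(np.abs(y - D_S @ x).sum())
```

For a single atom $D_i$ (step 2 of the procedure below) the call is simply `l1_fit(D[:, [i]], y)` with a one-column matrix.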
Section 5 summarizes the conclusions of this article and describes possible areas of improvement and future research.

2 The Algorithm

Given a dictionary $D \in \mathbb{R}^{m \times n}$, a measured vector $y \in \mathbb{R}^m$, a sparsity parameter $k$, and a noise estimate $\varepsilon$, the algorithm works as follows.

1. Set $j = 0$ and set the minimum $\ell_1$ residual to $\frac{\|y\|_1 - \varepsilon}{k - j}$.

2. For each atom $D_i$ in $D$, minimize $\|D_i x - y\|_1$ subject to $x \geq 0$ and $y \geq D_i x$ element-wise. Store the indices of all atoms such that $1 - \|D_i x - y\|_1 \geq \frac{\|y\|_1 - \varepsilon}{k - j}$.

3. Set $j = j + 1$.

4. If $k > j$, set the minimum $\ell_1$ residual to $\frac{\|y\|_1 - \varepsilon}{k - j}$; otherwise return the best value of $x$ and the corresponding indices.

5. For each stored index (or combination of indices), iterate over all other atoms in $D$ and minimize $\|D_S x - y\|_1$ subject to $x \geq 0$ and $y \geq D_S x$ element-wise, where $S$ is the set of atoms being examined. Store all combinations of indices such that $1 - \|D_S x - y\|_1 \geq \frac{\|y\|_1 - \varepsilon}{k - j}$.

6. If $k > j$, go to step 3; otherwise return the best value of $x$ and the corresponding indices.

Algorithm 1 BnB Algorithm for NNSA
Precondition: $D$ is an $m \times n$ dictionary with $m < n$; $\varepsilon$ is an estimate of the $\ell_1$-norm of the noise in $y$

1: function SparseApprox($D$, $k$, $y$, $\varepsilon$)
2:     $\lambda \leftarrow \frac{\|y\|_1 - \varepsilon}{k}$
3:     $S_1, \gamma, \rho \leftarrow \emptyset$
4:     $\tau \leftarrow \infty$
5:     for $i \leftarrow 1$ to $n$ do
6:         $x \leftarrow \arg\min_x \|D_i x - y\|_1$ s.t. $x \geq 0$ and $y \geq D_i x$ element-wise
7:         if $1 - \|D_i x - y\|_1 \geq \lambda$ then
8:             $S_1 \leftarrow [S_1\,;\,i]$    ▷ $[x\,;\,y]$ denotes the concatenation of $x$ and $y$
9:         if $\|D_i x - y\|_1 < \tau$ then
10:            $\tau \leftarrow \|D_i x - y\|_1$
11:            $\gamma \leftarrow x$
12:            $\rho \leftarrow i$
13:    for $i \leftarrow 2$ to $k$ do
14:        $S_i \leftarrow \emptyset$
15:        $\lambda \leftarrow \frac{\|y\|_1 - \varepsilon}{k - (i - 1)}$
16:        for $j \leftarrow 1$ to $|S_{i-1}|$ do
17:            for $q \leftarrow 1$ to $n$ do
18:                $s \leftarrow [S_{i-1}[j]\,;\,q]$    ▷ The elements of $S_i$ are sets of indices of size $i$
19:                $x \leftarrow \arg\min_x \|D_s x - y\|_1$ s.t. $x \geq 0$ and $y \geq D_s x$ element-wise
20:                if $1 - \|D_s x - y\|_1 \geq \lambda$ then
21:                    $S_i \leftarrow [S_i\,;\,s]$
22:                if $\|D_s x - y\|_1 < \tau$ then
23:                    $\tau \leftarrow \|D_s x - y\|_1$
24:                    $\gamma \leftarrow x$
25:                    $\rho \leftarrow s$
26:    return $\tau, \gamma, \rho$

This algorithm can be seen as an extension of orthogonal matching pursuit, which iterates $k$ times over the atoms in the dictionary and greedily tracks the subset of atoms that yields the best least-squares fit to the observed signal [17]. By taking advantage of the non-negativity of the data being analyzed, we can use the properties of the $\ell_1$ norm to establish an upper bound on goodness of fit for each atom (and combination of atoms) as the algorithm progresses, as well as a minimum goodness-of-fit bound at each stage, removing candidate solutions whose upper bound does not satisfy the lower bound for an optimal solution.
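To make the control flow of Algorithm 1 concrete, the following is a compact Python transcription. Variable names mirror the pseudocode ($\lambda$ as `lam`, $\tau$ as `tau`, $\gamma$ as `gamma`, $\rho$ as `rho`), index sets are kept sorted so the same combination is not solved twice in different orders, and the bound check `1 - r >= lam` is copied from lines 7 and 20 of the pseudocode as written. This is an illustrative sketch assuming non-negative $D$ and $y$, as in the spectroscopic setting, not an optimized implementation.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(D_S, y):
    # LP form of min ||y - D_S x||_1 s.t. x >= 0, D_S x <= y (see earlier sketch).
    c = -(np.ones(len(y)) @ D_S)
    res = linprog(c, A_ub=D_S, b_ub=y, bounds=(0, None), method="highs")
    return res.x, float(np.abs(y - D_S @ res.x).sum())

def sparse_approx(D, k, y, eps):
    """BnB search mirroring Algorithm 1; returns (tau, gamma, rho)."""
    n = D.shape[1]
    lam = (np.abs(y).sum() - eps) / k
    tau, gamma, rho = np.inf, None, None
    level_sets = []                      # S_1: surviving single-atom index sets
    for i in range(n):
        x, r = l1_fit(D[:, [i]], y)
        if 1 - r >= lam:                 # bound check, as in line 7
            level_sets.append((i,))
        if r < tau:
            tau, gamma, rho = r, x, (i,)
    for level in range(2, k + 1):        # build S_2 .. S_k
        lam = (np.abs(y).sum() - eps) / (k - (level - 1))
        next_sets, seen = [], set()
        for s in level_sets:
            for q in range(n):
                if q in s:
                    continue             # only extend with *other* atoms
                t = tuple(sorted(s + (q,)))
                if t in seen:
                    continue             # same index set, different order
                seen.add(t)
                x, r = l1_fit(D[:, list(t)], y)
                if 1 - r >= lam:         # bound check, as in line 20
                    next_sets.append(t)
                if r < tau:
                    tau, gamma, rho = r, x, t
        level_sets = next_sets
    return tau, gamma, rho
```

The pruning happens because only index sets that survive into `level_sets` are ever extended: any combination whose upper bound on goodness of fit falls below $\lambda$ is discarded and never revisited.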