

Review

Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms

A. Rakotomamonjy

LITIS EA4108, University of Rouen, Avenue de l'Université, 76800 Saint Étienne du Rouvray, France

Article history:
Received 20 April 2010
Received in revised form 4 November 2010
Accepted 16 January 2011
Available online 25 January 2011

Keywords:
Simultaneous sparse approximation
Block-sparse regression
Group lasso
Iterative reweighted algorithms

Abstract: In this paper, we survey and compare different algorithms that, given an overcomplete dictionary of elementary functions, solve the problem of simultaneous sparse signal approximation with a common sparsity profile induced by an $\ell_p$–$\ell_q$ mixed-norm. Such a problem is also known in the statistical learning community as the group lasso problem. We have gathered and detailed different algorithmic results concerning these two equivalent approximation problems. We have also enriched the discussion by providing relations between several algorithms. Experimental comparisons of the detailed algorithms have also been carried out. The main lesson learned from these experiments is that, depending on the performance measure, greedy approaches and iterative reweighted algorithms are the most efficient algorithms in terms of computational complexity, sparsity recovery or mean-square error.

Contents

1. Introduction
   1.1. Problem formalization
2. Solving the $\ell_1$–$\ell_2$ optimization problem
   2.1. Block coordinate descent
      2.1.1. Deriving optimality conditions
      2.1.2. The algorithm and its convergence
   2.2. Landweber iterations
   2.3. Other works
3. Generic algorithms for large classes of p and q
   3.1. M-FOCUSS algorithm
      3.1.1. Deriving the algorithm
      3.1.2. Discussing M-FOCUSS
   3.2. Automatic relevance determination approach
      3.2.1. Exhibiting the relation with ARD
   3.3. Solving the ARD formulation for p = 1 and 1 ≤ q ≤ 2
   3.4. Iterative reweighted $\ell_1$–$\ell_q$ algorithms for p < 1
      3.4.1. Iterative reweighted algorithm
      3.4.2. Connections with Majorization-Minimization algorithm
      3.4.3. Connection with group bridge Lasso

E-mail address: [email protected]

doi:10.1016/j.sigpro.2011.01.012

4. Specific case algorithms
   4.1. S-OMP
   4.2. M-CosAmp
   4.3. Sparse Bayesian learning and reweighted algorithm
5. Numerical experiments
   5.1. Experimental set-up
   5.2. Comparing $\ell_1$–$\ell_2$ M-BP problem solvers
   5.3. Computational performances
   5.4. Comparing performances
6. Conclusions
Acknowledgments
Appendix A
   A.1. $J_{1,2}(C)$ subdifferential
   A.2. Proof of Lemma 1
   A.3. Proof of Eq. (37)
References

1. Introduction

For several years now, there has been a lot of interest in sparse signal approximation. This large interest comes from the frequent wish of practitioners to represent data in the most parsimonious way.

Recently, researchers have focused their efforts on a natural extension of the sparse approximation problem, namely the problem of finding jointly sparse representations of multiple signal vectors. This problem is also known as simultaneous sparse approximation and can be stated as follows. Suppose we have several signals describing the same phenomenon, each signal being contaminated by noise. We want to find the sparsest approximation of each signal by using the same set of elementary functions. Hence, the problem consists in finding the best approximation of each signal while controlling the number of functions involved in all the approximations.

Such a situation arises in many different application domains such as sensor network signal processing [30,11], neuroelectromagnetic imaging [22,37,59], source localization [31], image restoration [17], distributed compressed sensing [27] and signal and image processing [24,49].

1.1. Problem formalization

Formally, the problem of simultaneous sparse approximation is the following. Suppose that we have measured $L$ signals $\{s_i\}_{i=1}^{L}$ where each signal is of the form $s_i = U c_i + e$, where $s_i \in \mathbb{R}^N$, $U \in \mathbb{R}^{N \times M}$ is a matrix of unit-norm elementary functions, $c_i \in \mathbb{R}^M$ is a weighting vector and $e$ is a noise vector. $U$ will be denoted in the sequel as the dictionary matrix. Since we have several signals, the overall measurements can be written as

$$S = UC + E \qquad (1)$$

where $S = [s_1\ s_2\ \cdots\ s_L]$ is a signal matrix, $C = [c_1\ c_2\ \cdots\ c_L]$ a coefficient matrix and $E$ a noise matrix. Note that in the sequel we adopt the following notations: $c_{i,\cdot}$ and $c_{\cdot,j}$ respectively denote the $i$-th row and $j$-th column of matrix $C$, and $c_{i,j}$ is the $i$-th element in the $j$-th column of $C$.

For the simultaneous sparse approximation (SSA) problem, the goal is then to recover the matrix $C$ given the signal $S$ and the dictionary $U$, under the hypothesis that all signals $s_i$ share the same sparsity profile. This latter hypothesis can also be translated into the coefficient matrix $C$ having a minimal number of non-zero rows. In order to measure the number of non-zero rows of $C$, a possible criterion is the so-called row-support or row-diversity measure of a coefficient matrix, defined as

$$\mathrm{rowsupp}(C) = \{\, i \in [1\ \cdots\ M] : c_{i,k} \neq 0 \text{ for some } k \,\}$$

The row-support of $C$ tells us which atoms of the dictionary have been used for building the signal matrix. Hence, if the cardinality of the row-support is lower than the dictionary cardinality, it means that at least one atom of the dictionary has not been used for synthesizing the signal matrix. The row-$\ell_0$ pseudo-norm of a coefficient matrix can then be defined as $\|C\|_{\mathrm{row}\text{-}0} = |\mathrm{rowsupp}(C)|$. According to this definition, the simultaneous sparse approximation problem can be stated as

$$\min_C \ \tfrac{1}{2}\|S - UC\|_F^2 \quad \text{s.t.} \quad \|C\|_{\mathrm{row}\text{-}0} \le T \qquad (2)$$

where $\|\cdot\|_F$ is the Frobenius norm and $T$ a user-defined parameter that controls the sparsity of the solution. Note that the problem can also take the different form:

$$\min_C \ \|C\|_{\mathrm{row}\text{-}0} \quad \text{s.t.} \quad \tfrac{1}{2}\|S - UC\|_F^2 \le \varepsilon \qquad (3)$$

For this latter formulation, the problem translates into minimizing the number of non-zero rows of the coefficient matrix $C$ while keeping control of the approximation error. Both problems (2) and (3) are appealing for their formulation clarity. However, similarly to the single signal approximation case, solving these optimization problems is notably intractable because $\|\cdot\|_{\mathrm{row}\text{-}0}$ is a discrete-valued function. Two ways of addressing the intractable problems (2) and (3) are possible: relaxing the problem by replacing the $\|\cdot\|_{\mathrm{row}\text{-}0}$ function with a more tractable row-diversity measure, or using some sub-optimal algorithms.

A large class of relaxed versions of $\|\cdot\|_{\mathrm{row}\text{-}0}$ proposed in the literature is encompassed by the following form [52]:

$$J_{p,q}(C) = \sum_i \|c_{i,\cdot}\|_q^p \quad \text{with} \quad \|c_{i,\cdot}\|_q = \Big(\sum_j |c_{i,j}|^q\Big)^{1/q}$$

where typically $p \le 1$ and $q \ge 1$. This penalty term can be interpreted as the $\ell_p$ quasi-norm of the sequence $\{\|c_{i,\cdot}\|_q\}_i$ to the power $p$. According to this relaxed version of the row-diversity measure, most of the algorithms proposed in the literature try to solve the relaxed problem

$$\min_C \ \tfrac{1}{2}\|S - UC\|_F^2 + \lambda J_{p,q}(C) \qquad (4)$$

where $\lambda$ is another user-defined parameter that balances the approximation error and the sparsity-inducing penalty $J_{p,q}(C)$. The choice of $p$ and $q$ results in a compromise between the row-support sparsity and the convexity of the optimization problem. Indeed, problem (4) is known to be convex when $p,q \ge 1$, while it is known to produce a row-sparse $C$ if $p \le 1$ (due to the penalty function singularity at $C = 0$ [15]).
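To make the notation concrete, the following short Python/NumPy sketch (ours, not from the paper) evaluates the mixed-norm penalty $J_{p,q}(C)$ and the penalized objective of problem (4); for $p=1$, $q=2$ this is the M-BP objective discussed in Section 2. Variable names and the toy data are our own.

```python
import numpy as np

def mixed_norm_penalty(C, p=1.0, q=2.0):
    """J_{p,q}(C) = sum_i ||c_{i,.}||_q^p, the row-wise mixed-norm penalty."""
    row_norms = np.sum(np.abs(C) ** q, axis=1) ** (1.0 / q)   # ||c_{i,.}||_q for each row
    return np.sum(row_norms ** p)

def mbp_objective(S, U, C, lam, p=1.0, q=2.0):
    """Objective of problem (4): 0.5 * ||S - U C||_F^2 + lambda * J_{p,q}(C)."""
    residual = S - U @ C
    return 0.5 * np.sum(residual ** 2) + lam * mixed_norm_penalty(C, p, q)

# Tiny usage example with random data (N=8 samples, M=5 atoms, L=3 signals).
rng = np.random.default_rng(0)
U = rng.standard_normal((8, 5))
U /= np.linalg.norm(U, axis=0)                    # unit-norm atoms, as assumed in the text
C_true = np.zeros((5, 3)); C_true[1] = 1.0; C_true[3] = -0.5   # two active rows
S = U @ C_true + 0.01 * rng.standard_normal((8, 3))
print(mbp_objective(S, U, C_true, lam=0.1))
```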
The simultaneous sparse approximation problem as described here is equivalent to several other problems studied in other research communities. Problem (4) can be reformulated so as to make clear its relation with some other problems known as the $\ell_p$–$\ell_q$ group lasso [61] or block-sparse regression [28] in statistics, or block-sparse signal recovery [13] in signal processing. Indeed, let us define

$$\tilde{s} = \begin{pmatrix} s_1 \\ \vdots \\ s_L \end{pmatrix}, \qquad \tilde{U} = \begin{pmatrix} U & & (0) \\ & \ddots & \\ (0) & & U \end{pmatrix}, \qquad \tilde{c} = \begin{pmatrix} c_1 \\ \vdots \\ c_L \end{pmatrix}$$

then problem (4) can be equivalently rewritten as

$$\min_{\tilde{c}} \ \tfrac{1}{2}\|\tilde{s} - \tilde{U}\tilde{c}\|_2^2 + \lambda J'_{p,q}(\tilde{c}) \qquad (5)$$

where $J'_{p,q}(\tilde{c}) = \sum_{i=1}^{M} \|\tilde{c}_{g_i}\|_q^p$ with $g_i$ being all the indices in $\tilde{c}$ related to the $i$-th element of the dictionary matrix $U$.

As we have already stated, simultaneous sparse approximation and equivalent problems have been investigated by diverse communities and have also been applied to various application problems, and the literature on these topics has become overwhelming over the last few years. The aim of this work is to gather results from different communities (statistics, signal processing and machine learning) so as to survey, analyze and compare different proposed algorithms for solving optimization problem (4) for different values of $p$ and $q$. In particular, instead of merely summarizing existing results, we enrich our survey by providing results such as proofs of convergence, formal relations between different algorithms and experimental comparisons that were not available in the literature. These experimental comparisons essentially focus on the computational complexity of the different methods, on their ability to recover the signal sparsity pattern and on the quality of approximation they provide, evaluated through mean-square error.

Note that recently there have been various works which addressed the simultaneous sparse approximation in the noiseless case [47,45] and many works which aim at providing theoretical approximation guarantees in both the noiseless and noisy cases [6,13,33,14,25,29]. Surveying these works is beyond the scope of this paper and we suggest that interested readers follow, for instance, these pointers and the references therein.

The most frequent case of problem (4) encountered in the literature is the one where $p=1$ and $q=2$. This case is the simpler one since it leads to a convex problem. We discuss different algorithms for solving this problem in Section 2. Then we survey in Section 3 generic algorithms that are able to solve our approximation problems with different values of $p$ and $q$. These algorithms are essentially iterative reweighted $\ell_1$ or $\ell_2$ algorithms. Many works in the literature have focused on one algorithm for solving a particular case of problem (4). These specific algorithms are discussed in Section 4. Notably, we present some greedy algorithms and discuss a sparse Bayesian learning algorithm. The experimental comparisons we carried out aim at clarifying how each algorithm behaves in terms of computational complexity, sparsity recovery and approximation mean-square error. These results are described in Section 5. For the sake of reproducibility, the code used in this paper has been made freely available (http://asi.insa-rouen.fr/enseignants/arakotom/code/SSAindex.html). Final discussions and conclusions are given in Section 6.

2. Solving the $\ell_1$–$\ell_2$ optimization problem

When $p=1$ and $q=2$, optimization problem (4) becomes a particular problem named M-BP (Multiple Basis Pursuit) in the sequel. It is a special case that deserves attention. Indeed, it seems to be the most natural extension of the so-called Lasso problem [50] or Basis Pursuit Denoising [7], since for $L=1$ it can easily be shown that problem (4) reduces to these two problems. The key point of this case is that it yields a convex optimization problem and thus benefits from all the properties resulting from convexity, e.g. a global minimum. One of the first algorithms addressing the M-BP problem is the one proposed by Cotter et al. [8], known as M-FOCUSS. This algorithm, based on factored gradient descent, works for any $p \le 1$ and has been proved to converge towards a local or global (when $p=1$) minimum of problem (4) if it does not get stuck in a fixed point. Because M-FOCUSS is better tailored to situations where $p \le 1$, we postpone its description to the next section. The two algorithms on which we focus herein are the block-coordinate descent proposed by Yuan et al. [61] for the group lasso and the Landweber iterations of Fornasier et al. [17]. We have biased our survey towards these two approaches because of their efficiency and their convergence properties.

2.1. Block coordinate descent

The specific structure of the optimization problem (4) for $p=1$ and $q=2$ leads to a very simple block coordinate descent algorithm that we name M-BCD. Here, we provide a more detailed derivation of the algorithm and we also give results that were not available in the work of Yuan et al. [61]. We provide a proof of convergence of the M-BCD algorithm and we show that if the dictionary is orthogonal, then the solution of the problem is equivalent to a simple shrinkage of the coefficient matrix.

2.1.1. Deriving optimality conditions

The M-BP optimization problem is the following:

$$\min_C \ W(C) = \tfrac{1}{2}\|S - UC\|_F^2 + \lambda \sum_i \|c_{i,\cdot}\|_2 \qquad (6)$$

where the objective function $W(C)$ is a non-smooth but convex function. Since the problem is unconstrained, a necessary and sufficient condition for a matrix $C^\star$ to be a minimizer of (6) is that $0 \in \partial W(C^\star)$, where $\partial W(C)$ denotes the subdifferential of our objective function $W(C)$ [2]. By computing the subdifferential of $W(C)$ with respect to each row $c_{i,\cdot}$ of $C$, the KKT optimality condition of problem (6) is then

$$-r_i + \lambda g_{i,\cdot} = 0 \quad \forall i$$

where $r_i = u_i^T(S - UC)$, with $u_i$ denoting the $i$-th column (atom) of $U$, and $g_{i,\cdot}$ is the $i$-th row of a subdifferential matrix $G$ of $J_{1,2}(C) = \sum_i \|c_{i,\cdot}\|_2$. According to the definition of $J_{1,2}$'s subdifferential given in the Appendix, the KKT optimality conditions can be rewritten as

$$-r_i + \lambda \frac{c_{i,\cdot}}{\|c_{i,\cdot}\|_2} = 0 \quad \forall i,\ c_{i,\cdot} \neq 0, \qquad \|r_i\|_2 \le \lambda \quad \forall i,\ c_{i,\cdot} = 0 \qquad (7)$$

A matrix $C$ satisfying these equations can be obtained through the following algebra. Let us expand each $r_i$ so that

$$r_i = u_i^T(S - UC_{-i}) - u_i^T u_i\, c_{i,\cdot} = T_i - c_{i,\cdot} \qquad (8)$$

where $C_{-i}$ is the matrix $C$ with the $i$-th row set to 0 and $T_i = u_i^T(S - UC_{-i})$. The second equality is obtained by remembering that $u_i^T u_i = 1$. Then, Eq. (7) tells us that if $c_{i,\cdot}$ is non-zero, $T_i$ and $c_{i,\cdot}$ have to be collinear. Plugging all these points into Eq. (7) yields an optimal solution that can be obtained as

$$c_{i,\cdot} = \left(1 - \frac{\lambda}{\|T_i\|}\right)_+ T_i \quad \forall i \qquad (9)$$

where $(x)_+ = x$ if $x > 0$ and 0 otherwise. From this update equation, we can derive a simple algorithm which consists in iteratively applying the update (9) to each row $i$ of $C$ while keeping the other rows fixed.

2.1.2. The algorithm and its convergence

The block-coordinate descent algorithm is detailed in Algorithm 1. It is a simple and efficient algorithm for solving M-BP. Basically, the idea consists in solving for each row $c_{i,\cdot}$ at a time while keeping the other rows fixed. Starting from a sparse solution such as $C = 0$, at each iteration one checks for a given $i$ whether the row $c_{i,\cdot}$ is optimal or not based on conditions (7). If not, $c_{i,\cdot}$ is then updated according to Eq. (9).

Although such a block-coordinate algorithm does not converge in general for non-smooth optimization problems, Tseng [55] has shown that for an optimization problem whose objective is the sum of a smooth and convex function and a non-smooth but block-separable convex function, block-coordinate optimization converges towards the global minimum of the problem. Our proof of convergence is based on such properties and follows the same lines as the one proposed by Sardy et al. [42] for a single signal approximation.

Theorem 1. The M-BCD algorithm converges to a solution of the M-Basis Pursuit problem given in Eq. (6), where convergence is understood in the sense that any accumulation point of the M-BCD algorithm is a minimum of problem (6) and the sequence $\{C^{(k)}\}$ generated by the algorithm is bounded.

Proof. Note that the M-BP problem presents a particular structure with a smooth and differentiable convex function $\tfrac{1}{2}\|S - UC\|_F^2$ and a row-separable penalty function $\sum_i h_i(c_{i,\cdot})$, where $h(\cdot)$ is a continuous and convex function with respect to $c_{i,\cdot}$. Also note that our algorithm considers a cyclic rule where, within each loop, for any $i \in [1, \ldots, M]$, each $c_{i,\cdot}$ is considered for optimization. The main particularity is that, for some $i$, $c_{i,\cdot}$ may be left unchanged by the block-coordinate descent if already optimal. This occurs especially for rows $c_{i,\cdot}$ which are equal to 0. Then, according to the special structure of the problem and the use of a cyclic rule, the results of Tseng [55] prove that our M-BCD algorithm converges. □

Algorithm 1. Solving M-BP through block-coordinate descent.

1: $t=1$, $C^{(0)} = 0$, Loop = 1
2: while Loop do
3:   for $i = 1, 2, \ldots, M$ do
4:     Compute $\|r_i\|$
5:     if the optimality condition of $c_{i,\cdot}$ according to Eqs. (7) is not satisfied then
6:       $T_i \leftarrow u_i^T(S - UC^{(t-1)}_{-i})$
7:       $c^{(t)}_{i,\cdot} \leftarrow \left(1 - \frac{\lambda}{\|T_i\|}\right)_+ T_i$
8:     end if
9:     $t \leftarrow t+1$
10:  end for
11:  if all optimality conditions are satisfied then
12:    Loop = 0
13:  end if
14: end while
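As a concrete illustration, here is a minimal NumPy sketch of the M-BCD update (9) together with the optimality check (7). It is our own rendering of Algorithm 1, not code from the paper; it assumes unit-norm columns of $U$ and does not include the precomputation tricks discussed below.

```python
import numpy as np

def m_bcd(S, U, lam, max_sweeps=200, tol=1e-6):
    """Block-coordinate descent for M-BP (p=1, q=2), following Algorithm 1.

    Assumes the columns of U are unit-norm. Returns the row-sparse matrix C.
    """
    N, M = U.shape
    L = S.shape[1]
    C = np.zeros((M, L))
    R = S.copy()                           # residual S - U C (C = 0 at start)
    for _ in range(max_sweeps):
        max_violation = 0.0
        for i in range(M):
            # T_i = u_i^T (S - U C_{-i}) = u_i^T R + c_{i,.}  since u_i^T u_i = 1
            Ti = U[:, i] @ R + C[i]
            norm_Ti = np.linalg.norm(Ti)
            # shrinkage update (9): c_{i,.} = (1 - lam/||T_i||)_+ T_i
            new_ci = max(0.0, 1.0 - lam / norm_Ti) * Ti if norm_Ti > 0 else 0.0 * Ti
            delta = new_ci - C[i]
            if np.any(delta):
                R -= np.outer(U[:, i], delta)      # keep the residual consistent
                C[i] = new_ci
            # optimality conditions (7)
            ri = U[:, i] @ R
            if np.linalg.norm(C[i]) == 0:
                max_violation = max(max_violation, np.linalg.norm(ri) - lam)
            else:
                max_violation = max(max_violation,
                                    np.linalg.norm(ri - lam * C[i] / np.linalg.norm(C[i])))
        if max_violation < tol:
            break
    return C
```

With the toy data of the earlier snippet, `m_bcd(S, U, lam=0.05)` returns a matrix whose non-zero rows match the two active rows of `C_true` for a suitable value of λ.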

Regarding computational complexity, if we suppose that the computations of $U^T S$ and $U^T U$ have been performed beforehand, the main computational burden of Algorithm 1 for each loop is the computation of $\|r_i\|$ and $T_i$. The complexity of computing these two terms for all $i$ is about $O(M^2 L)$. Note however that since $T_i$ is computed only for non-optimal $c_{i,\cdot}$, in practice, and as shown in our experiments, the computational complexity is empirically lower than quadratic with respect to $M$.

Another point we want to emphasize, and that has not been shed light on yet, is that this algorithm should be initialized at $C = 0$ so as to take advantage of the sparsity pattern of the solution. Indeed, by doing so, only a few updates are needed before convergence.

Intuitively, we can understand this algorithm as one which tends to shrink to zero the rows of the coefficient matrix that contribute poorly to the approximation. Indeed, $T_i$ can be interpreted as the correlation between the residual when row $i$ has been removed and $u_i$. Hence, the smaller the norm of $T_i$ is, the less $u_i$ is relevant in the approximation, and according to Eq. (9), the smaller the resulting $c_{i,\cdot}$ is. Insight into this block-coordinate descent algorithm can be further obtained by supposing that $M \le N$ and that $U$ is composed of orthonormal elements of $\mathbb{R}^N$, hence $U^T U = I$. In such a situation, we have

$$T_i = u_i^T S \quad \text{and} \quad \|T_i\|_2^2 = \sum_{k=1}^{L} (u_i^T s_k)^2$$

and thus

$$c_{i,\cdot} = \left(1 - \frac{\lambda}{\sqrt{\sum_k (u_i^T s_k)^2}}\right)_+ u_i^T S$$

This last equation highlights the relation between the single Basis Pursuit (when $L=1$) and the Multiple Basis Pursuit algorithm presented here. Both algorithms lead to a shrinkage of the coefficient projections when considering orthonormal dictionary elements. With the inclusion of multiple signals, the shrinking factor becomes more robust to noise since it depends on the correlation of the atom $u_i$ with all signals.
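As a quick numerical sanity check of this closed form (our own example, reusing the `m_bcd` sketch above and random data), one can verify that for an orthonormal dictionary the block-coordinate solution coincides with the row-wise shrinkage of $U^T S$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, L, lam = 8, 8, 3, 0.3
U, _ = np.linalg.qr(rng.standard_normal((N, M)))   # orthonormal dictionary, U^T U = I
S = rng.standard_normal((N, L))

P = U.T @ S                                        # coefficient projections u_i^T S
shrink = np.maximum(0.0, 1.0 - lam / np.linalg.norm(P, axis=1, keepdims=True))
C_closed_form = shrink * P                         # row-wise shrinkage formula above

C_bcd = m_bcd(S, U, lam)                           # from the earlier sketch
print(np.allclose(C_bcd, C_closed_form, atol=1e-6))   # expected: True
```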
This M-BCD algorithm can be considered as an extension to simultaneous signal approximation of the works of Sardy et al. [42] and Elad [12], which also considered block coordinate descent for single signal approximation. In addition to the works of Sardy et al. and Elad, many other authors have considered block-coordinate descent algorithms for related sparse approximation problems. For instance, block coordinate descent has also been used for solving the Lasso [19] and the elastic net [62].

2.2. Landweber iterations

For recovering vector-valued data with joint sparsity constraints, Fornasier et al. [17] have proposed an extension of the iterative Landweber approach introduced by Daubechies et al. [9]. In their work, Fornasier et al. used an iterative shrinkage algorithm (which has the flavor of a gradient projection approach) that is able to solve the general problem (4) with $p=1$ and $q \in \{1, 2, \infty\}$. We summarize here the details of their derivation for general $q$. The problem to solve is the one given in Eq. (4) with $p=1$:

$$\min_C \ \tfrac{1}{2}\|S - UC\|_F^2 + \lambda \sum_i \|c_{i,\cdot}\|_q \qquad (10)$$

The iterative algorithm can be derived first by noting that the objective function given above is strictly convex, then by defining the following surrogate objective function:

$$J_{sur}(C,A) = \tfrac{1}{2}\|S - UC\|_F^2 + \lambda \sum_i \|c_{i,\cdot}\|_q - \tfrac{1}{2}\|UC - UA\|_F^2 + \tfrac{1}{2}\|C - A\|_F^2$$

From this surrogate function, Fornasier et al. consider the solution of problem (4) as the limit of the sequence $C^{(t)}$:

$$C^{(t+1)} = \arg\min_C \ J_{sur}(C, C^{(t)})$$

Note that, according to the additional terms (the last two ones) in the surrogate function, $C^{(t+1)}$ is a coefficient matrix that minimizes the desired objective function plus terms that constrain $C^{(t+1)}$ and the novel signal approximation $UC^{(t+1)}$ to be ''close'' respectively to $C^{(t)}$ and the previous signal approximation $UC^{(t)}$.

Now, the minimizer of $J_{sur}$ for fixed $A$ can be obtained by expanding $J_{sur}$ as

$$J_{sur}(C,A) = \tfrac{1}{2}\|(A + U^T(S - UA)) - C\|_F^2 + \lambda \sum_i \|c_{i,\cdot}\|_q - \tfrac{1}{2}\|A + U^T(S - UA)\|_F^2 + \tfrac{1}{2}\|S\|_F^2 - \tfrac{1}{2}\|UA\|_F^2 + \tfrac{1}{2}\|A\|_F^2$$

and noticing that the last four terms of this expression do not depend on $C$. Thus we have

$$\arg\min_C \ J_{sur}(C,A) = \mathbb{U}_q(A + U^T(S - UA))$$

with $\mathbb{U}_q(X)$ being defined as

$$\mathbb{U}_q(X) = \arg\min_C \ \tfrac{1}{2}\|X - C\|_F^2 + \lambda \sum_i \|c_{i,\cdot}\|_q \qquad (11)$$

Thus the iterative algorithm we get simply reads, at iteration $t$, as

$$C^{(t+1)} = \mathbb{U}_q(C^{(t)} + U^T(S - UC^{(t)})) \qquad (12)$$

Algorithm 2. Solving M-BP through Landweber iterations.

1: $C^{(0)} = 0$, Loop = 1, $t=0$
2: while Loop do
3:   $C^{(t+1/2)} \leftarrow C^{(t)} + U^T(S - UC^{(t)})$
4:   for $i = 1, 2, \ldots, M$ do
5:     Compute $\|c^{(t+1/2)}_{i,\cdot}\|$
6:     $c^{(t+1)}_{i,\cdot} = \left(1 - \frac{\lambda}{\|c^{(t+1/2)}_{i,\cdot}\|}\right)_+ c^{(t+1/2)}_{i,\cdot}$
7:   end for
8:   $t \leftarrow t+1$
9:   if the stopping criteria are satisfied then
10:    Loop = 0
11:  end if
12: end while

The Landweber algorithm proposed by Fornasier et al. involves two steps: a first step, similar to a gradient descent with fixed step size, which does not take into account the sparsity constraint, and a thresholding step through the operator $\mathbb{U}_q(\cdot)$, which updates the solution according to the penalty $\sum_i \|c_{i,\cdot}\|_q$. Fornasier et al. give detailed information on the closed-form expression of $\mathbb{U}_q(\cdot)$ for $q \in \{1, 2, \infty\}$. We can note that problem (11) decouples with respect to the rows of $C$ and that the thresholding operator $\mathbb{U}_q(\cdot)$ acts independently on each row. According to Fornasier et al. [17], for $q=2$, $\mathbb{U}_2(C)$ writes for each row $c_{i,\cdot}$ as

$$[\mathbb{U}_2(C)]_{i,\cdot} = \left(1 - \frac{\lambda}{\|c_{i,\cdot}\|}\right)_+ c_{i,\cdot}$$

and the full algorithm is summarized in Algorithm 2. For this algorithm, the computational complexity at each iteration is similar to that of the block-coordinate descent algorithm and is dominated by the computation of $U^T U C$, which is of the order $O(M^2 L)$ if $U^T U$ is precomputed. We also note that block-coordinate descent and Landweber iterations yield algorithms that are very similar. Two differences can be noted. The first one is that the shrinking operation in one case is based on $T_i$ while in the other case it directly considers $c_{i,\cdot}$. At the moment, it is not clear what the implications of these two different schemes are in terms of convergence rate, and more investigation is needed to clarify this point. The other main difference between the two approaches is that, by optimizing at each loop only the $c_{i,\cdot}$'s that are not optimal yet, the BCD algorithm is more efficient than the one of Fornasier et al. Our experimental results corroborate this finding.

Note that Fornasier et al. have also shown that this iterative scheme converges towards the minimizer of problem (4) for $1 \le q \le \infty$.
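For illustration, a minimal NumPy version of the Landweber iteration (12) with the $q=2$ row-thresholding operator could look as follows. This is our own sketch of Algorithm 2, not the authors' code, and it assumes the standard normalization condition (spectral norm of $U$ at most 1, e.g. after rescaling $U$ and $S$) under which the surrogate-based scheme is valid.

```python
import numpy as np

def row_threshold_q2(X, lam):
    """U_2 operator of Eq. (11): row-wise shrinkage (1 - lam/||x_i||)_+ x_i."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * X

def landweber_mbp(S, U, lam, n_iter=500):
    """Iterative shrinkage of Eq. (12): C <- U_2(C + U^T (S - U C)).

    Assumes ||U||_2 <= 1 so that the surrogate J_sur majorizes the objective.
    """
    M, L = U.shape[1], S.shape[1]
    C = np.zeros((M, L))
    for _ in range(n_iter):
        C = row_threshold_q2(C + U.T @ (S - U @ C), lam)
    return C
```

In practice one would replace the fixed iteration count by a stopping criterion on the change in $C$, as in Algorithm 2.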
2.3. Other works

Besides the M-FOCUSS algorithm of Cotter et al., another seminal work which considered the SSA problem is the one of Malioutov et al. [31]. While dealing with a source localization problem, they had to consider a sparse approximation problem with joint sparsity constraints. From the block-sparse signal approximation problem

$$\min_{\tilde{c}} \ \tfrac{1}{2}\|\tilde{s} - \tilde{U}\tilde{c}\|_2^2 + \lambda \sum_{i=1}^{M} \|\tilde{c}_{g_i}\|_2$$

they derived an equivalent problem which writes as

$$\begin{aligned} \min_{p,q,r,\tilde{c}} \quad & p + \lambda q \\ \text{s.t.} \quad & \tfrac{1}{2}\|\tilde{s} - \tilde{U}\tilde{c}\|_2^2 \le p \\ & \sum_{i=1}^{M} r_i \le q \\ & \|\tilde{c}_{g_i}\|_2 \le r_i \quad \forall i = 1, \ldots, M \end{aligned}$$

Note that this problem has a linear objective function together with linear and second-order cone constraints. Such a problem is known as a second-order cone programming problem and there exist interior point methods for solving it [46]. While very attractive because of its simple formulation and the global convergence of the algorithm, this approach suffers from a high complexity, of the order $O(M^3 L^3)$.

Very recently, two other algorithms have been proposed for addressing our jointly sparse approximation problem. The first one, derived by van den Berg et al. [56], is very similar to the one of Fornasier et al., although obtained from a different perspective. Indeed, they develop a method based on spectral gradient projection for solving the group lasso problem, which is the one given in Eq. (5) with $p=1$ and $q=2$. However, instead of the regularized version, they considered the following constrained version:

$$\min_{\tilde{c}} \ \tfrac{1}{2}\|\tilde{s} - \tilde{U}\tilde{c}\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{M} \|\tilde{c}_{g_i}\|_2 \le \tau$$

which they iteratively solve owing to spectral gradient projection. Basically, their solution iterates write as

$$\tilde{c}^{(t+1)} = P\big(\tilde{c}^{(t)} + \alpha \tilde{U}^T(\tilde{s} - \tilde{U}\tilde{c}^{(t)})\big) \qquad (13)$$

where $\alpha$ is some step size, to be chosen for instance by backtracking, and $P(\cdot)$ is the projection operator defined as

$$P(z) = \arg\min_x \ \|z - x\|^2 \quad \text{subject to} \quad \sum_i \|x_{g_i}\|_2 \le \tau$$

van den Berg et al. then proposed an algorithm that computes this projection in linear time. Note how similar the iterates given in Eqs. (12) and (13) are. Actually, the approaches of Fornasier et al. and van den Berg et al. are equivalent, and the only point in which they differ is the way the projections are computed. This difference essentially comes from the problem formulation: Fornasier et al. use a penalized problem with known $\lambda$, while van den Berg considers the constrained optimization problem. Hence, the latter cannot directly use the analytic solution of the projection $P(\cdot)$ but has to compute the appropriate value of $\lambda$ given the value of $\tau$. However, the algorithm they derive for this projection is efficient. Note that if one has some prior knowledge on the value of $\tau$, then this algorithm would be preferable to the one of Fornasier et al.

The other recent work that is noteworthy is the one of Obozinski et al. [35]. They address the problem of SSA within the context of a multi-task classification problem. Indeed, they want to recover a common set of covariates that are simultaneously relevant for all classification tasks. For achieving this aim, they propose to solve problem (4) with $p=1$, $q=2$ and a differentiable loss function such as the logistic loss or the square loss. One of their most prominent contributions is to have proposed a path-following algorithm that is able to compute the regularization path of the solution with respect to $\lambda$. Their algorithm uses a continuation method based on prediction–correction steps. The prediction step is built upon the optimality conditions given in Eq. (7), while the correction step needs an algorithm which solves problem (4), but only for a small subset of covariates. The main advantage of such an algorithm is its efficiency for selecting an appropriate value of $\lambda$ in a model selection context.
3. Generic algorithms for large classes of p and q

In this section, we present several algorithms that are able to solve the simultaneous sparse approximation problem for large classes of $p$ and $q$. In the first part of the section, we review the M-FOCUSS algorithm of Cotter et al. [8], which addresses the case where $0 < p \le 1$ and $q=2$. In a second part, we make clear how the penalization term $J_{p,q}(C)$ with $p=1$ and $1 \le q \le 2$ is related to automatic relevance determination. From this novel insight, we then propose an iterative reweighted least-squares algorithm that solves this general case. The last part of the section is devoted to the extension of reweighted $\ell_1$ algorithms [63,4] to the SSA problem. We describe how algorithms tailored for solving the SSA problem with $p=1$ and any $q$ can be re-used for solving a problem with the same $q$ but $p < 1$.

3.1. M-FOCUSS algorithm

We first detail how the M-FOCUSS algorithm proposed by Cotter et al. [8] can be derived, then we discuss some of its properties and its relation with other works.

3.1.1. Deriving the algorithm

M-FOCUSS is, as far as we know, the first algorithm that was introduced for solving the simultaneous sparse approximation problem. M-FOCUSS addresses the general case

ð Þ ð Þ ð Þ where q=2 and pr1. This algorithm can be understood as a C t ¼ W t Z t where Z is the minimizer of fixed-point algorithm which can be derived through an 1 l appropriate factorization of problem (4) objective function JSUWðtÞZJ2 þ JZJ2 2 F 2 F gradient. ðtÞ Indeed, since the partial derivative of Jp,2ðCÞ with and thus C can be understood as the minimizer of respects to an entry c is m,n 1 l 0 1 JSUCJ2 þ JðWðtÞÞ1CJ2 p=2 2 F 2 F @J ðCÞ @ X X p,2 ¼ @ c2 A ¼ pJc Jp2c ð14Þ @c @c i,j m, 2 m,n This above equation makes clear the relation between m,n m,n i j sparse approximation problem and iterative reweighted the gradient of the objective function writes least-square. Such an connection has already been high- lighted in other context. Indeed, while sparsity is usually UtðSUCÞþlPC induced by using ‘1 norm penalty, it has been proved that J Jp2 where P ¼ diagðp ci, 2 Þ. Then, we define the weight- solving the problem in which the ‘1 norm has been 1=2J J1p=2 ing matrix W as W ¼ diagðp ci, 2 Þ. After having replaced by an adaptive ‘2 norm leads to equivalent replaced W2 ¼ P in the above gradient, and after simple solutions [10,41,23]. algebras, we have the following necessary optimality Despite this nice insight, M-FOCUSS presents an condition: important issue. Indeed, the updates proposed by Cotter et al. are not guaranteed to converge to a local minimum ððUWÞtðUWÞþlIÞW1C ¼ðUWÞtS of the problem (if the problem is not convex po1) or to Hence, the optimum solution writes as the global minimum of the convex problem (p=1). Their

% algorithm presents several fixed-points since when a row C ¼ WððUWÞtðUWÞþlIÞ1ðUWÞtS ð15Þ of C is equal to 0, it stays at 0 at the next iteration. Note that since W also depends on C, the above closed- Although such a point may be harmless if the algorithm is form expression of the optimal matrix C can be inter- initialized with a ‘‘good’’ starting point, it is nonetheless preted as a fixed-point algorithm. From this insight on C, an undesirable point when solving a convex problem. At Cotter et al. suggest the following iterates for solving the the contrary, the M-BCD and the Landweber iteration simultaneous sparse approximation problem: algorithms do not suffer from the presence of such fixed points. However, in the M-FOCUSS algorithm, this patho- Cðt þ 1Þ ¼ WðtÞððUWðtÞÞtðUWðtÞÞþlIÞ1ðUWðtÞÞtS ð16Þ logical behavior can be handled by introducing a smooth- ðtÞ 1=2J ðtÞJ1p=2 where W ¼ diagðp ci, 2 Þ. Now, since we have ing term e in the weight so that the updated weight ððUWðtÞÞtðUWðtÞÞþlIÞ1ðUWðtÞÞt ¼ðUWðtÞÞtððUWðtÞÞðUWðtÞÞt þlIÞ1 becomes ðtÞ 1=2 J ðtÞJ 1p=2 Cotter et al. actually consider the following iterative W ¼ diagðp ð ci, þeÞÞ Þ scheme: where e40. The use of e avoids a given weight to be at ðt þ 1Þ ðtÞ ðtÞ t ðtÞ ðtÞ t 1 C ¼ W ðUW Þ ððUW ÞðUW Þ þlIÞ S ð17Þ zero and consequently it avoids the related ci, to stay Resulting algorithm is detailed in Algorithm 3. The com- permanently at zero. If we furthermore note that putational complexity of this algorithm is dominated by M-FOCUSS is not more than an iterative reweighted the linear system to be solved at each iteration. Forming least-square, then according to the very recent works of the linear system and its resolutions costs respectively Daubechies et al. [10] and Chartrand et al. [5], it seems about OðN2MÞ and OðN3Þ. justified to iterate the M-FOCUSS algorithm using decreasing value of e. In our numerical experimental, we Algorithm 3. M-FOCUSS with Jp r 1,q ¼ 2 penalty. will consider the M-FOCUSS algorithm with fixed and decreasing value of e. 1: Initialize Cð0Þ to a matrix of 1, t ¼ 1 2: while Loop do 3: WðtÞ’diagðp1=2Jcðt1ÞJ1p=2Þ i, 2 3.2. Automatic relevance determination approach 4: AðtÞ’UWðtÞ 5: ðtÞ’ ðtÞ ðtÞ t ðtÞ ðtÞ t 1 C W ðA Þ ðA ðA Þ þlIÞÞ S In this section, we focus on the relaxed optimization 6: t’t þ1 7: if stopping condition is satisfied then problem given in (4) with the general penalization Jp,qðCÞ. 8: Loop = 0 Our objective here is to clarify the connection between 9: end if such a form of penalization and the automatic relevance 10: end while determination of C’s rows, which has been the keystone of the Bayesian approach of Wipf et al. [60]. We will show

3.1.2. Discussing M-FOCUSS that for some values of p and q, the mixed-norm Jp,qðCÞ has Rao et al. [40] has provided an interesting interpreta- an equivalent variational formulation. Then, by using this tion on the FOCUSS algorithm for a single signal approx- novel formulation in problem (4), instead of Jp,qðCÞ, imation which can be readily extend to M-FOCUSS. we exhibit the relation between our sparse approxima- Indeed, M-FOCUSS can be viewed as an iterative re- tion problem and ARD. We then propose an iterative weighted least-square algorithm. Indeed, From the updat- reweighted least-square approach for solving this result- ing equation of CðtÞ (see Eq. (16)) , we can note that ing ARD problem. 1512 A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526

3.2.1. Exhibiting the relation with ARD smaller the ct,k norm should be. Furthermore, optimization For this purpose, we first consider the following formu- problem (21) also involves some constraints on fdt,kg.These lation of the simultaneous sparse approximation problem: constraints impose the matrix d to have positive elements

andtobesothatits‘1=ðr þ sÞ,1=s mixed-norm is smaller than 1J J2 2=p min SUC F þluðJp,qðCÞÞ ð18Þ 1. Note that this mixed-norm on d plays an important role C 2 since it induces the row-norm sparsity on C. Indeed, Intheconvexcase(forpZ1andqZ1), since the power according to the relation between p, r and s,forpo1, we function is strictly monotonically increasing, problems (4) also have r þs41, making the ‘1=ðr þ sÞ,1=s non-differentiable and (18) are equivalent, in the sense that for a given value with respect to the first norm. Such singularities favor lu,thereexistsal so that solutions of the two problems are row-norm sparsity of the matrix d at optimality, inducing equivalent. When Jp,q is not convex, this equivalence does row-norm sparsity of C. As we have noted above, when a not strictly apply. However, due to the nature of the row-norm of d is equal to 0, the corresponding row-norm of problem, the problem formulation (18) is more convenient C should also be equal to 0 which means that the corre- for exhibiting the relation with ARD. sponding element of the dictionary is ‘‘irrelevant’’ for the Let us introduce the key lemma that allows us to derive approximation of all signals. Problem (21) proposes an the ARD-based formulation of the problem. This lemma equivalent formulation of problem (4) for which the row- gives a variational form of the ‘p,q norm of a sequence diversity measure has been transformed in another penalty fat,kg. Note that similar proofs have been derived [48,39]. function owing to an ARD formulation. The trade-off between convexity of the problem and the sparsity of the Lemma 1. If s40 and fa : k 2 N ,t 2 N g2R such that t,k n T solution has been transferred from p and q to r and s. at least one a 40, then t,k From a Bayesian perspective, we can interpret the 8 ! 9 < s=ðr þ sÞ = mixed-norm on d as the diagonal term of the covariance X ja j2 X X t,k Z 1=s min : dt,k 0, dt,k r1 matrix of a Gaussian prior over the row-norm of C d : d ; t,k t,k k t distribution. This is typically the classical Bayesian auto- 0 1 != X X p=q 2 p matic relevance determination approach as proposed for @ q A instance by Tipping [51]. This novel insight on the ARD ¼ jat,kj ð19Þ k t interpretation of Jp,qðCÞ clarifies the connection between the M-FOCUSS algorithm of Cotter et al. [8] and the where q ¼ 2=ðsþ1Þ and p ¼ 2=ðsþr þ1Þ. Furthermore, at Multiple Sparse Bayesian Learning (M-SBL) algorithm of optimality, we have P Wipf et al. [60] for any value of po1. In their previous ja j2s=ðs þ 1Þð ja j2=ðs þ 1ÞÞr=ðs þ r þ 1Þ works, Wipf et al. have proved that these two algorithms d% ¼ Pt,k P u u,k ð20Þ t,k 2=ðs þ 1Þ ðs þ 1Þ=ðs þ r þ 1Þ r þ s were related when p 0. Here, we refine their result by ð vð ujau,vj Þ Þ enlarging the connection to other values of p by showing Proof. The proof has been postponed to the Appendix. that both algorithms actually solve a problem with auto-

According to this lemma, the ‘p,q norm of a sequence can matic relevance determination on the row-norm of C. be computed through a minimization problem. Hence, applying this lemma to ðJ ðCÞÞ2=p by defining a ¼ c , p,q t,k t,k 3.3. Solving the ARD formulation for p ¼ 1 and 1rqr2 we get a variational form of the penalization term. We can also note that the mixed-norm on the matrix coefficients Herein, we propose a simple iterative algorithm for has been transformed to a mixed-norm on weight matrix d. solving problem (21) for p=1 and 1rqr2. This algo- Plugging the above variational formulation of the penaliza- rithm, named as M-EMq, is based on an iterative tion term in problem (18) yields to the following equivalent reweighted where the weights are updated problem: according to Eq. (20). Thus, it can be seen as an extension 1 X c2 of the M-FOCUSS algorithm of Cotter et al. for qr2. Note min JSUCJ2 þl t,k F t,k that we have restricted ourselves to p=1 since we will C,d 2 dt,k ! show in the next section that the case po1 can be X X s=ðr þ sÞ 1=s handled using another reweighted scheme. s:t: dt,k r1 k t Since p=1, thus sþr ¼ 1, the problem we are consider- d Z0 8t,k ð21Þ ing is t,k ! X X c2 This problem is the one which makes clear the automatic 1 J J2 t,k min skUc,k 2 þl ¼ JobjðC,dÞ relevance determination interpretation of the original for- C,d 2 dt,k k ! t mulation (4). Indeed, we have transformed problem (4) into X X s aproblemwithasmoothobjectivefunctionattheexpense 1=s s:t: dt,k r1 of some additional variables dt,k. These parameters dt,k k t actually aim at determining the relevance of each element dt,k Z0 ð22Þ of C. Indeed, in the objective function, each squared-value Since, we consider that 1rqr2 hence 0rsr1,2 this c is now inversely weighted by a coefficient d .Bytaking t,k t,k optimization problem is convex with a smooth objective the convention that x=0 ¼1 if xa0 and 0 otherwise, the objective value of the optimization problem becomes finite 2 2 only if dt,k ¼ 0forct,k ¼ 0. Then the smaller the dt,k is, the For s=0, we use the sup norm of vector d,k in the constraints. A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526 1513 function. We propose to address this problem through a in some fixed points (e.g. all the elements of d being zero block-coordinate descent algorithm which alternatively expect for one entry ft1,k1g). In practice, we have experi- solves the problem with respect to C with the weight d enced, by comparing for q=2 with the M-BCD algorithm, being fixed, and then keeping C fixed and computing the that if d is initialized to non-zero entries, Algorithm 4 optimal weight d. The resulting algorithm is detailed converges to the global minimum of problem (22). Numer- in Algorithm 4. ical experiments will illustrate this point. Regarding computational complexity, it is easy to see Algorithm 4. Iterative reweighted least-square for that at each iteration, the main burden of the algorithm is addressing J penalty. 1,1 r q r 2 the linear system to be solved. Supposing that UtU has 1: Initialize dð0Þ to a strictly positive matrix, t=1 been precomputed, solving the systems for all signals 2: while Loop costs about OðM3LÞ. Hence, the flexibility on the choice of 3: CðtÞ’ solution of problem (22) with fixed d ¼ dðt1Þ as given q that this algorithm allows is heavily paid by an by Eq. (23) increased computational complexity. 4: dðtÞ’ solution of problem (22) with fixed C ¼ CðtÞ as given by Eq. (24) 5: t’tþ1 3.4. 
Iterative reweighted ‘1‘q algorithms for po1 6: if stopping condition is satisfied then 7: Loop = 0 This section introduces an iterative reweighted M- 8: end if Basis Pursuit (IrM-BP) algorithm and proposes a way for 9: end while setting the weights. Through this approach, we are able to provide an iterative algorithm which solves problem (4) Owing to the problem structure, steps 4 and 5 of this with po1 and 1rqr2. algorithm has a simple closed form. Indeed, for fixed d, each vector cðtÞ at iteration t is given by ,k 3.4.1. Iterative reweighted algorithm ðtÞ t ðt1Þ 1 t c,k ¼ðU Uþ2lDk Þ U sk ð23Þ Recently, several works have advocated that sparse approximations can be recovered through iterative algo- where Dðt1Þ is a diagonal matrix of entries 1=dðt1Þ.Ina k ,k rithms based on a reweighted ‘ minimization [63,4,5,58,18]. similar way, for fixed C, step 5 boils down in solving 1 Typically,forasinglesignalcase,theideaconsistsin problem (19). Hence, by defining a ¼ cðtÞ , we also have a t,k t,k iteratively solving the following problem: closed form for dðtÞ as P 1 X 2s=ðs þ 1Þ 2=ðs þ 1Þ ð1sÞ=2 2 ja j ð ja j Þ min JsUcJ þl zijcij dðtÞ ¼ t,kP P u u,k ð24Þ c 2 2 t,k 2=ðs þ 1Þ ðs þ 1Þ=2 i vð ujau,vj Þ where z are some positive weights, and then to update the Note that similar to the M-FOCUSS algorithm, this i positive weights z according to the solution c% of the above algorithm can also be seen as an iterative reweighted i problem. Besides providing empirical evidences that rew- least-square approach or as an Expectation-Minimization eighted ‘ minimization yields to sparser solutions than a algorithm, where the weights are defined in Eq. (24). 1 simple ‘ minimization, the above cited works theoretically Furthermore, it can be shown that if the weights d are 1 support such a claim. These results for the single signal initialized to non-zero values then at each loop involving approximation case suggest that in the simultaneous sparse steps 4 and 5, the objective value of problem (22) approximation problem, reweighted M-Basis Pursuit would decreases. Hence, since the problem is convex, our algo- also lead to sparser solutions than the classical M-Basis rithm should converge towards the global minimum of Pursuit. the problem. The iterative reweighted M-Basis Pursuit algorithm is Theorem 2. If the objective value of problem (22) is strictly defined as follows. We iteratively construct a sequence ðtÞ convex (for instance when U is full-rank), and if for the tth C defined as loop, after the step5, we have dðtÞadðt1Þ, then the objective X ðtÞ 1 J J2 ðtÞJ J value has decreased, i.e.: C ¼ argmin SUC F þl z ci, q ð25Þ C 2 i i ðt þ 1Þ ðtÞ ðtÞ ðtÞ ðtÞ ðt1Þ JobjðC ,d ÞoJobjðC ,d ÞoJobjðC ,d Þ: ðtÞ ðtÞ ðtÞ ðtÞ ðt1Þ where the positive weight vector z depends on the Proof. The right inequality JobjðC ,d ÞoJobjðC ,d Þ ðt1Þ ðtÞ previous iterate C . For t=1, we typically define comes from d being the optimal value of the optimiza- zð1Þ ¼ 1 and for t 41, we will consider the following tion problem resulting from step 5 of Algorithm 4. The weighting scheme: strict inequality yields from the hypothesis that ð Þ ð Þ t a t 1 ðtÞ 1 d d and from the strict convexity of the objective z ¼ 8i ð26Þ i J ðt1ÞJ r function. A similar reasoning allows us to derive the left ð ci, q þeÞ ðtÞ inequality. 
Indeed, since C is not optimal with respect to ðt1Þ ðt1Þ ðtÞ where fc g is the i-th row of C , r auser-defined d for the problem given by step ð4Þ, invoking the strict i, positive constant and e a small regularization term that convexity of the associated objective function and optim- ðt þ 1Þ prevents from having an infinite regularization term for ci, ality of C concludes the proof. & ðt1Þ as soon as ci, vanishes. This is a classical trick that has As stated by the above theorem, the decrease in objective been used for instance by Candes et al. [4] or Chartrand value is actually guaranteed unless, the algorithm get stuck et al. [41]. Note that for any positive weight vector z, 1514 A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526 problem (25) is a convex problem that does not present the non-convex penalty with a quadratic term. In the light local minima. Furthermore, for 1rqr2, it can be solved by of all these previous works, what we have exposed here is the block-coordinate descent algorithm, the Landweber merely an extension of iterative reweighted ‘1 algorithm iteration approach or by the M-EMq given in Algorithm 4, to simultaneous sparse approximation problem. Through by simply replacing l with li ¼ l zi. This reweighting this iterative scheme, we can solve problem (4) with po1 scheme is similar to the adaptive lasso algorithm of Zou and any q provided that we have an algorithm that is able et al. [63] but uses a larger number of iterations and to solve problem (4) with p ¼ 1 and the desired q. addresses the simultaneous approximation problem. Computational complexity of these iterative reweigh- ted algorithms are difficult to evaluate since it naturally 3.4.2. Connections with Majorization-Minimization depends on the complexity of each inner algorithm and algorithm the number of iterations needed before convergence. This IrM-BP algorithm can be interpreted as an algo- However, we have empirically noted that only few itera- rithm for solving an approximation of problem (4) when tions of the inner algorithm are needed (typically about 0opo1 and 1rqr2. Indeed, similar to the reweighted 10) and that the iterative approach greatly benefits from ‘1 scheme of Candes et al. [4] or the one-step reweighted warm-start initializations. Hence, only a slight increase of lasso of Zou et al. [64], this algorithm falls into the class of complexity can be noted between algorithms that use an majorize-minimize (MM) algorithms [26]. MM algorithms ‘1‘q penalty and their reweighted ‘p o 1‘q counterparts. ðtÞ consists in replacing a difficult optimization problem with Analyzing the convergence of the sequence C towards a easier one, for instance by linearizing the objective the global minimizer of problem (4) is a challenging issue. function, by solving the resulting optimization problem Indeed, several points make a formal proof of convergence ðtÞ and by iterating such a procedure. difficult. At first, in order to avoid a row ci, to be perma- The connection between MM algorithms and our nently at zero, we have introduced a smoothing term e, thus reweighted scheme can be made through linearization. we are only solving an eapproximation of problem (4). Let us first define Jp,q,eðCÞ as an approximation of the Furthermore, the penalty we use is non-convex, thus using a penalty term Jp,qðCÞ: monotonic algorithm like an MM approach which decreases X the objective value at each iteration, cannot guarantee con- J J Jp,q,eðCÞ¼ gð ci, q þeÞ vergence to the global minimum of our eapproximate i problem. 
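As an illustration of the reweighted scheme of Eqs. (25)–(26), the following Python sketch wraps a generic weighted M-BP solver; the callback name `solve_weighted_mbp`, the default exponent and the fixed number of outer iterations are our own choices, not specifications from the paper.

```python
import numpy as np

def irm_bp(S, U, lam, solve_weighted_mbp, r=1.0, eps=1e-3, n_reweight=10):
    """Iterative reweighted M-Basis Pursuit (Eqs. (25)-(26)).

    `solve_weighted_mbp(S, U, row_penalties)` is a placeholder for any solver
    returning the C minimizing 0.5*||S - U C||_F^2 + sum_i row_penalties[i]*||c_{i,.}||_q,
    e.g. M-BCD, Landweber iterations or M-EMq, ideally warm-started across calls.
    r = 1 corresponds to the log penalty, r = 1 - p to an l_p penalty (0 < p < 1).
    """
    M = U.shape[1]
    z = np.ones(M)                                # z^(1) = 1
    C = None
    for _ in range(n_reweight):
        C = solve_weighted_mbp(S, U, lam * z)     # problem (25) with lambda_i = lam * z_i
        row_norms = np.linalg.norm(C, axis=1)
        z = 1.0 / (row_norms + eps) ** r          # weight update (26)
    return C
```

An annealing variant, as discussed in the text, would additionally decrease `eps` across the outer iterations.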
Hence, due to these two major obstacles, we have where gðÞ ¼ j jp. Since gðÞ is concave for 0opo1, a left this convergence proof for future works. Note however ðt1Þ linear approximation of Jp,q,eðCÞ around C yields to that few works have addressed the convergence issue of the following majorizing inequality: reweighted ‘1 or ‘2 algorithms for single sparse signal X ðt1Þ p ðt1Þ recovery. Notably, we can mention the recent work of Jp,q, ðCÞrJp,q, ðC Þþ ðJc JqJc JqÞ e e J ðt1ÞJ 1p i, i, Daubechies et al. [10] which provides a convergence proof i ð c q þeÞ i, of iterative reweighted least square for exact sparse recovery. then for the minimization step, replacing in problem (4) In the same flavor, Foucart et al. [18] have proposed a ten-

Jp,q with the right part of the inequality and dropping tative of rigorous convergence proof for reweighted ‘1 sparse constant terms lead to our optimization problem (25) signal recovery. Although, we do not have any rigorous proof with appropriately chosen zi and r. Note that for the of convergence, in practice, we will show that the reweighted weights given in Eq. (26), r ¼P1 corresponds to the algorithm provides good sparse approximations. J J linearization of a log penalty ilogð ci, þeÞ whereas As already noted by several authors [4,41,10], e plays a setting r ¼ 1p corresponds to a ‘p penalty (0opo1). major role in the quality of the solution. In the experi- mental results presented below, we have investigated two Algorithm 5. Majorization-Minimization algorithm leading methods for setting e: the first one is to set e to a fixed to iterative reweighted ‘ for addressing J penalty. 1 p r 1,q value (e.g.0.001) and the other one, denoted as an anneal- 1: ð1Þ ing approach, consists in gradually decreasing e after Initialize zi ¼ 1, r ¼ 1p, t=1 2: while Loop do having solved problem (25). 3: CðtÞ’ solution of problem (25) 4: t’t þ1 3.4.3. Connection with group bridge Lasso 5: zðtÞ’ 1 8i Recently, in a context of variable selection through i ðJcðt1Þ J þ Þr i, q e 6: if stopping condition is satisfied then group lasso, Huang et al. [25] have proposed an algorithm 7: Loop = 0 for solving problem (5) with po1 and q=1: 8: end if 1 X 9: end while J~ ~ ~J2 þ J~ Jp ð Þ min s Uc l cgi 1: 27 c~ 2 i MM algorithms have already been considered in opti- mization problems with sparsity-inducing penalties. For Instead of directly addressing this optimization problem, instance, an MM approach has been used by Figueiredo they have shown that the above problem is equivalent to et al. [16] for solving a least-square problem with a ‘p the minimization problem: sparsity-inducing penalty, whereas Candes et al. [4] have 1 X Jc~ J X ~ 2 gi 1 addressed the problem for exact sparse signal recovery. In min Js~Uc~J þ þt zi c~,z 2 z1=p1 a context of simultaneous approximation, Simila [43,44] i i i has also considered MM algorithms while approximating s:t: zi Z0 8i ð28Þ A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526 1515 where t is a regularization parameter. Here equivalence is 4.1. S-OMP understood as follows: c~ % minimizes Eq. (27) if and only if the pair ðc~ % ,z% Þ minimizes Eq. (27) with an appropriate Simultaneous Orthogonal Matching Pursuit (S-OMP) [54] value of t. This value of t will be made clear in the sequel. is an extension of the Orthogonal Matching Pursuit [36,53] algorithm to multiple approximation problem. Algorithm 6. Group bridge Lasso based on iterative S-OMP is a greedy algorithm which is based on the reweighted ‘1 for addressing Jp r 1,q penalty. idea of selecting, at each iteration, an element of the 1: ~ ð0Þ 1p 1=ðp1Þ dictionary and on building all signal approximations as Initialize c , t ¼ p l , t=1 2: while Loop do the projection of the signal matrix S on the span of these  p 3: ðtÞ’ pJ~ ðt1ÞJp 1p selected dictionary elements. Since, according to pro- zi t cg 1 p i P blem (2), one looks for dictionary element that can 4: c~ ðtÞ’argmin 1 Js~U~ c~J2 þ ðzðtÞÞ11=pJc~ J c~ 2 i i gi 1 provide simultaneously good approximation of all signals, 5: t’tþ1 6: if stopping condition is satisfied then Tropp et al. have suggested to select the dictionary 7: Loop = 0 element that maximizes the sum of absolute correlation 8: end if between the basis element and the signal residuals. 
This 9: end while greedy selection criterion is formalized as follows:

The connection between iterative reweighted ‘1 appro- XL ðtÞ max j/skP ðskÞ,UiSj ach and the algorithm proposed by Huang et al. comes from i the way problem (28) has been solved. Indeed, Huang et al. k ¼ 1 have considered a block-coordinate approach which con- where PðtÞðÞ is the projection operator on the span of the t sists in minimizing the problem with respect to z and then, selected dictionary elements. While conceptually very sim- after having plugged the optimal z in Eq. (28) in minimizing ple, this algorithm whose details are given in Algorithm 7, the problem with respect to c~. For the first step, the optimal provide theoretical guarantees about the quality of its z is derived by minimizing the convex problem approximation. For instance, Tropp et al. [54] have shown X J~ J X that if the algorithm is stopped when T dictionary elements cgi 1 min þt zi have been selected then, under some mild conditions on the z 1=p1 i z i ðTÞ i matrix U, there exists a upper bound on JSUC JF .This s:t: z Z0 8i ð29Þ % % i upper bound depends on JSUC JF with C being the Using Lagrangian theory, simple algebras yield to a closed- solution of problem (2) and multiplicative constant which form solution of the optimal z: is a function of T, L and the coherence of U [54].Inthis sense, S-OMP can be understood as an algorithm that 1p p % ¼ pJ~ Jp approximately solves problem (4) with p=0. zi t cgi 1 p Regarding complexity, we note that at each iteration,

Plugging these zi’s in Eq. (28) and using t such that the most demanding part of the S-OMP is the correlation l ¼ðð1pÞ=ptÞp1 proves the equivalence of the pro- E computation and the linear system to be solved for ðtÞ blems (27) and (28). The relation with iterative reweighted obtaining C . These steps respectively cost OðMNLÞ and

‘1 is made clear in the second step of the block-coordinate OðjOjÞ. Since jOj is typically smaller than M, this algorithm descent. Indeed, in problem (28), since we only optimize is a very efficient algorithm. with respect to c~ with fixed zi,thisstepisequivalentto Algorithm 7. S-OMP algorithm. T the number of diction- 1 X Jc~ J min Js~U~ c~J2 þ gi 1 ð30Þ ary elements to select. c~ 2 1=p1 i z i 1: Initialize : t’0,RðtÞ’S,O’|,Loop’1 which is clearly a weighted ‘1 problem. The full iterative 2: while Loop do 3: 9 t ðtÞ reweighted algorithm is detailed in Algorithm 6. Note that E U R P 4: i’argmax je j although the genuine work on group bridge Lasso only k j j,k 5: O’O [ i addresses the case q=1 (because of theoretical develop- 6: t’t þ1 ments considered in the paper), the algorithm proposed by 7: ðtÞ’ t 1 t C ðUOUOÞ UOS Huang et al. can be applied to any choice of q provided that 8: RðtÞ ¼ SUCðtÞ oneisabletosolvetherelated‘1‘q problem (for instance 9: if t=T then using the algorithm described in Section 3). 10: Loop = 0 11: end if 12: end while 4. Specific case algorithms

In this section, we survey some algorithms that provide 4.2. M-CosAmp solutions of the SSA problem (4) when p=0. These algorithms are different in nature and provide different qualities of Very recently, Needell et al. have developed a novel approximation. The first two algorithms we detail are greedy signal reconstruction algorithm known as CosAmp [34]. algorithms which have the flavor of Matching Pursuit [32]. This algorithm uses ideas from Orthogonal Matching The third one is based on a sparse Bayesian learning and is Pursuit but also takes advantage of ideas from the related to iterative reweighted ‘1 algorithm [60]. compressive sensing literature. This novel approach 1516 A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526 provides stronger theoretical guarantees on the quality of empirical Bayesian learning approach based on automatic signal approximation. The main particularities of CosAmp relevance determination (ARD). The ARD prior over each are the following. Firstly, unlike Orthogonal Matching row they have introduced is Pursuit, CosAmp selects many elements of the dictionary pðci,; diÞ¼N ð0,diIÞ8i at each iteration, the ones that have the largest correla- tion with respect to the signal residual. These novel where d is a vector of non-negative hyperparameters that dictionary elements are thus incorporated into the set of govern the prior variance of each coefficient matrix row. current dictionary elements. The other specificity of Hence, these hyperparameters aim at catching the spar- CosAmp is a pruning dictionary element step after signal sity profile of the approximation. Mathematically, the estimation. This step is justified by the fact that the signal resulting optimization problem is to minimize according to approximate is supposed to be sparse. to d the following cost function: This CosAmp algorithm have been extended so as to XL t 1 handle signals with specific structures such as block- LlogjStjþ sj St sj ð31Þ sparse signal [1]. Extension to jointly sparse approxima- j ¼ 1 tion has also been provided by Duarte et al. [11]. Such an 2 t 2 where St ¼ s IþUDU , D ¼ diagðdÞ and s a parameter of extension of the CosAmp algorithm to simultaneous the algorithm related to the noise level presented in the sparse approximation is detailed in Algorithm 8. Note signals to be approximated. The algorithm is then based that lines 4–5 of the algorithm estimate the dictionary on a likelihood maximization which is performed through elements which have the largest correlations over all the an Expectation-Minimization approach. Very recently, a signals while lines 8–9 prune the set of current dictionary very efficient algorithm for solving this problem has been elements so as to keep the matrix C row-sparse. This proposed [27]. However, the main drawback of this latter algorithm is denoted as M-CosAmp in the experimental approach is that due to its greedy nature, and like any EM section. Complexity of this algorithm is similar to those of algorithm where the objective function is not convex, the S-OMP. However we can note that computing the estima- algorithm can be easily stuck in local minima. ^ tion C involves in the worst case 3T atoms hence its Recently, Wipf et al. [57] have proposed some new 3 complexity is about Oð9T Þ. Besides, computing the cor- insights on automatic relevance determination and sparse relation E is slightly cheaper than for S-OMP since the ðtÞ Bayesian learning. They have shown that, for the vector matrix C has been pruned. 
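To make the greedy selection and pruning steps concrete, here is a short NumPy sketch of these steps (merge the 2T best-correlated atoms with the current support, least-squares estimation, then prune to the T largest rows), consistent with Algorithm 8 given next. It is our own illustrative rendering with a fixed iteration count, not the reference implementation of [11].

```python
import numpy as np

def m_cosamp(S, U, T, n_iter=20):
    """Simultaneous-sparse CoSaMP-style recovery with target row-sparsity T."""
    M = U.shape[1]
    C = np.zeros((M, S.shape[1]))
    support = np.array([], dtype=int)
    R = S.copy()                                   # residual
    for _ in range(n_iter):
        E = U.T @ R                                # correlations with the residuals
        e = np.sum(E ** 2, axis=1)                 # row energies
        new_atoms = np.argsort(e)[::-1][:2 * T]    # 2T best-correlated atoms
        merged = np.union1d(support, new_atoms)
        C_hat, *_ = np.linalg.lstsq(U[:, merged], S, rcond=None)
        v = np.sum(C_hat ** 2, axis=1)             # row norms of the estimate
        keep = np.argsort(v)[::-1][:T]             # prune to the T largest rows
        support = merged[keep]
        C = np.zeros_like(C)
        C[support] = C_hat[keep]
        R = S - U @ C
    return C
```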
regression case, ARD can be achieved by means of iterative Algorithm 8. M-CosAmp algorithm. T the number of reweighted ‘1 minimization. Furthermore, in that paper, dictionary elements to select. they have sketched an extension of such results for matrix regression in which ARD is used for automatically selecting 1: ð0Þ ’ ðtÞ ’| Initialize C ¼ 0,Loop ,R ¼ S,Ores the most relevant covariance components in a dictionary of 2: while Loop do covariance matrices. Such an extension is more related to 3: E9Ut RðtÞ learning with multiple kernels in regression as introduced 4: e ¼ JE J2 i ¼ 1, ...,M i i;: 2 by Girolami et al. [21] or Rakotomamonjy et al. [38] 5: Ores’suppðe,2TÞ 6: O’O [ Ores although some connections with simultaneous sparse 7: ^ ’ t 1 t approximation can be made. Here, we build on the works C ðUOUOÞ UOS 8: J J2 on Wipf et al. [57] and give all the details about how M-SBL vi ¼ ci;: 2 i ¼ 1, ...,M 9: O’suppðv,TÞ and iterative reweighted M-BP are related. 10: t’tþ1 Recall that the cost function minimized by the M-SBL ðtÞ 11: C ¼ 0 of Wipf et al. [60] is 12: ðtÞ ^ CO ¼ CO ðtÞ ðtÞ XL 13: R ¼ SUC LðdÞ¼LlogjS jþ stS 1s ð32Þ 14: if stopping condition is satisfied then t j t j j ¼ 1 15: Loop = 0 16: end if 2 t where St ¼ s IþUDU and D ¼ diagðdÞ, with d being a 17: end while vector of hyperparameters that govern the prior variance of each coefficient matrix row. Now, let us define g% ðzÞ as 4.3. Sparse Bayesian learning and reweighted algorithm the conjugate function of the concave logjStj. Since, that M log function is concave and continuous on R þ , according The approach proposed by Wipf et al. [60], denoted in to the scaling property of conjugate functions we have [3]  the sequel as Multiple-Sparse Bayesian Learning (M-SBL), t % z L logjStj¼minz dLg for solving the sparse simultaneous approximation is z2RM L somewhat related to the optimization problem in Eq. (4) Thus, the cost function LðdÞ in Eq. (32) can then be upper- but from a very different perspective. Indeed, if we bounded by consider that the approaches described in Section 3 are  equivalent to a Maximum a Posteriori estimation proce- z XL Lðd,zÞ9ztdLg% þ stS1s ð33Þ dures, then Wipf et al. have explored a Bayesian model L j t j j ¼ 1 whose prior encourages sparsity. In this sense, their approach is related to the relevance vector machine of Hence when optimized over all its parameters, Lðd,zÞ Tipping et al. [51]. Algorithmically, they proposed an converges to a local minima or a saddle point of (32). A. Rakotomamonjy / Signal Processing 91 (2011) 1505–1526 1517
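As a companion to Algorithm 7, the following is a minimal NumPy sketch of S-OMP. It is an illustrative reimplementation, not the authors' code: the function name s_omp and the choice of stopping after exactly T selections are assumptions made here.

import numpy as np

def s_omp(S, U, T):
    """Minimal S-OMP sketch (Algorithm 7): greedily select T atoms shared by all signals.

    S: (N, L) signal matrix, U: (N, M) dictionary, T: number of atoms to select.
    Returns an (M, L) coefficient matrix with at most T non-zero rows.
    """
    M = U.shape[1]
    R = S.copy()                              # residual R^(t), initialized to S
    omega = []                                # current row-support
    C = np.zeros((M, S.shape[1]))
    for _ in range(T):
        E = U.T @ R                           # correlations of every atom with every residual
        scores = np.abs(E).sum(axis=1)        # sum of |e_{j,k}| over the signals (line 4)
        scores[omega] = -np.inf               # do not reselect an already chosen atom
        omega.append(int(np.argmax(scores)))
        U_om = U[:, omega]
        C_om = np.linalg.lstsq(U_om, S, rcond=None)[0]   # (U_om^t U_om)^{-1} U_om^t S (line 7)
        R = S - U_om @ C_om                   # residual update (line 8)
    C[omega, :] = C_om
    return C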

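Similarly, a rough sketch of the M-CosAmp iteration of Algorithm 8 is given below, again as an illustrative reimplementation under the same assumptions; the fixed iteration budget n_iter simply stands in for the unspecified stopping condition.

import numpy as np

def m_cosamp(S, U, T, n_iter=30):
    """Sketch of M-CosAmp (Algorithm 8): merge 2T candidate atoms, least-squares fit, prune to T rows."""
    M = U.shape[1]
    C = np.zeros((M, S.shape[1]))
    R = S.copy()
    omega = np.array([], dtype=int)
    for _ in range(n_iter):
        E = U.T @ R
        e = (E ** 2).sum(axis=1)                    # row energies of the correlations (line 4)
        omega_res = np.argsort(e)[::-1][:2 * T]     # supp(e, 2T): the 2T most correlated atoms (line 5)
        omega = np.union1d(omega, omega_res)        # merge with the current support (line 6)
        C_hat = np.zeros_like(C)
        C_hat[omega, :] = np.linalg.lstsq(U[:, omega], S, rcond=None)[0]   # line 7
        v = (C_hat ** 2).sum(axis=1)                # row norms of the estimate (line 8)
        omega = np.argsort(v)[::-1][:T]             # prune back to the T largest rows (line 9)
        C = np.zeros_like(C)
        C[omega, :] = C_hat[omega, :]               # lines 11-12
        R = S - U @ C                               # line 13
    return C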
4.3. Sparse Bayesian learning and reweighted algorithm

The approach proposed by Wipf et al. [60], denoted in the sequel as Multiple-Sparse Bayesian Learning (M-SBL), for solving the simultaneous sparse approximation problem is somewhat related to the optimization problem in Eq. (4), but it takes a very different perspective. Indeed, if we consider that the approaches described in Section 3 are equivalent to Maximum a Posteriori estimation procedures, then Wipf et al. have explored a Bayesian model whose prior encourages sparsity. In this sense, their approach is related to the relevance vector machine of Tipping et al. [51]. Algorithmically, they proposed an empirical Bayesian learning approach based on automatic relevance determination (ARD). The ARD prior they have introduced over each row is

$$p(c_{i,\cdot};\, d_i) = \mathcal{N}(0, d_i I)\quad \forall i$$

where d is a vector of non-negative hyperparameters that govern the prior variance of each coefficient matrix row. Hence, these hyperparameters aim at catching the sparsity profile of the approximation. Mathematically, the resulting optimization problem is to minimize with respect to d the following cost function:

$$L\log|\Sigma_t| + \sum_{j=1}^{L} s_j^t \Sigma_t^{-1} s_j \qquad (31)$$

where Σ_t = σ²I + UDU^t, D = diag(d), and σ² is a parameter of the algorithm related to the noise level present in the signals to be approximated. The algorithm is then based on a likelihood maximization which is performed through an Expectation–Maximization (EM) approach. Very recently, a very efficient algorithm for solving this problem has been proposed [27]. However, the main drawback of this latter approach is that, due to its greedy nature, and like any EM algorithm where the objective function is not convex, the algorithm can easily get stuck in local minima.

Recently, Wipf et al. [57] have proposed some new insights on automatic relevance determination and sparse Bayesian learning. They have shown that, for the vector regression case, ARD can be achieved by means of iterative reweighted ℓ1 minimization. Furthermore, in that paper, they have sketched an extension of such results to matrix regression, in which ARD is used for automatically selecting the most relevant covariance components in a dictionary of covariance matrices. Such an extension is more related to learning with multiple kernels in regression, as introduced by Girolami et al. [21] or Rakotomamonjy et al. [38], although some connections with simultaneous sparse approximation can be made. Here, we build on the work of Wipf et al. [57] and give all the details about how M-SBL and iterative reweighted M-BP are related.

Recall that the cost function minimized by the M-SBL of Wipf et al. [60] is

$$\mathcal{L}(d) = L\log|\Sigma_t| + \sum_{j=1}^{L} s_j^t \Sigma_t^{-1} s_j \qquad (32)$$

where Σ_t = σ²I + UDU^t and D = diag(d), with d being a vector of hyperparameters that govern the prior variance of each coefficient matrix row. Now, let us define g*(z) as the conjugate function of the concave log|Σ_t|. Since that log function is concave and continuous on R^M_+, according to the scaling property of conjugate functions we have [3]

$$L\log|\Sigma_t| = \min_{z\in\mathbb{R}^M}\ z^t d - L\, g^{\star}\!\left(\frac{z}{L}\right)$$

Thus, the cost function L(d) in Eq. (32) can then be upper-bounded by

$$\mathcal{L}(d,z) \triangleq z^t d - L\, g^{\star}\!\left(\frac{z}{L}\right) + \sum_{j=1}^{L} s_j^t \Sigma_t^{-1} s_j \qquad (33)$$

Hence, when optimized over all its parameters, L(d,z) converges to a local minimum or a saddle point of (32).

However, for any fixed d, one can optimize over z and get the tight optimal upper bound. If we denote as z* such an optimal z for any fixed d†, since L log|Σ_t| is differentiable, we have, according to conjugate function properties, the following closed form of z*:

$$z^{\star} = L\,\nabla\log|\Sigma_t|(d^{\dagger}) = \mathrm{diag}(U^t \Sigma_t^{-1} U) \qquad (34)$$

Similar to what was proposed by Wipf et al., Eqs. (33) and (34) suggest an alternate optimization scheme for minimizing L(d,z). Such a scheme would consist, after initialization of z to some arbitrary vector, in keeping z fixed and computing

$$d^{\dagger} = \arg\min_d\ \mathcal{L}_z(d) \triangleq z^t d + \sum_{j=1}^{L} s_j^t \Sigma_t^{-1} s_j \qquad (35)$$

then minimizing L(d†, z) for fixed d†, which can be done analytically according to Eq. (34). This alternate scheme is then performed until convergence to some d*. Owing to this iterative scheme proposed for solving M-SBL, we can now make clear the connection between M-SBL and an iterative reweighted M-BP according to the following lemma. Again, this is an extension to the multiple signals case of Wipf's lemma.

Lemma 2. The objective function in Eq. (35) is convex and can be equivalently solved by computing

$$C^{\star} = \arg\min_C\ \mathcal{L}_z(C) = \frac{1}{2}\|S - UC\|_F^2 + \sigma^2 \sum_i z_i^{1/2}\,\|c_{i,\cdot}\| \qquad (36)$$

and then by setting

$$d_i = z_i^{-1/2}\,\|c^{\star}_{i,\cdot}\| \quad \forall i$$

Proof. Convexity of the objective function in Eq. (35) is straightforward since it is just a sum of convex functions [2]. The key point of the proof is based on the equality

$$s_j^t \Sigma_t^{-1} s_j = \min_{c_{\cdot,j}}\ \frac{1}{\sigma^2}\|s_j - U c_{\cdot,j}\|_2^2 + \sum_i \frac{c_{i,j}^2}{d_i} \qquad (37)$$

whose proof is given in the Appendix. According to this equality, we can upper-bound L_z(d) with

$$\mathcal{L}_z(d,C) = z^t d + \frac{1}{\sigma^2}\sum_j \|s_j - U c_{\cdot,j}\|_2^2 + \sum_{i,j}\frac{c_{i,j}^2}{d_i} \qquad (38)$$

The problem of minimizing L_z(d,C) is smooth and jointly convex in its parameters C and d, and thus an iterative coordinate-wise optimization scheme (iteratively optimizing over d with fixed C and then optimizing over C with fixed d) yields the global minimum. It is easy to show that, for any fixed C, the minimal value of L_z(d,C) with respect to d is achieved when

$$d_i = z_i^{-1/2}\,\|c_{i,\cdot}\| \quad \forall i$$

Plugging these solutions back into (38) and multiplying the resulting objective function by σ²/2 yields

$$\mathcal{L}_z(C) = \frac{1}{2}\sum_j \|s_j - U c_{\cdot,j}\|_2^2 + \sigma^2 \sum_i z_i^{1/2}\,\|c_{i,\cdot}\| \qquad (39)$$

Making the relation between ℓ2 and Frobenius norms concludes the proof. □

Minimizing L_z(C) boils down to minimizing the M-BP problem with an adaptive penalty λ_i = σ² z_i^{1/2} on each row-norm. This latter point makes the alternate optimization scheme based on Eqs. (34) and (35) equivalent to our iterative reweighted M-BP, for which the weights z_i would be given by Eq. (34). According to Wipf et al. [60], M-SBL can be implemented so that the cost per iteration is about O(N²M).³

The impact of this relation between M-SBL and iterative reweighted M-BP is essentially methodological. Indeed, its main advantage is that it turns the original M-SBL optimization problem into a series of convex optimization problems. In this sense, the iterative reweighted algorithm described here can again be viewed as an application of the MM approach for solving problem (32). Indeed, we are actually iteratively minimizing a proxy function which has been obtained by majorizing each term of Eq. (32). Owing to this MM point of view, convergence of our iterative algorithm towards a local minimum of Eq. (32) is guaranteed [26]. Convergence for the single signal case, using other arguments, has also been shown by Wipf et al. [57]. Note that, similar to M-FOCUSS, the original M-SBL algorithm based on the EM approach suffers from the presence of fixed points (when d_i = 0). Hence, such an algorithm is not guaranteed to converge towards a local minimum of (32). This is then another argument for preferring IrM-BP.

³ Our experimental analysis with M-SBL has been performed using the code provided and made available by Wipf et al.

5. Numerical experiments

Some computer simulations have been carried out in order to evaluate the algorithms described in the above sections. The results that have been obtained from these numerical studies are detailed in this section.

5.1. Experimental set-up

In order to quantify the performance of the different algorithms and compare them, we have used simulated datasets with different redundancy factors M/N, numbers k of active elements and numbers L of signals to approximate. The dictionary U is based on M vectors sampled from the unit hypersphere of R^N. The true coefficient matrix C* has been obtained as follows. The positions of the k non-zero rows of the matrix are randomly drawn. The non-zero coefficients of C* are then drawn from a zero-mean, unit-variance Gaussian distribution. The signal matrix S is obtained as in Eq. (1), with the noise matrix being drawn i.i.d. from a zero-mean Gaussian distribution whose variance is set so that the signal-to-noise ratio of each single signal is 10 dB. For a given experiment, when several trials are needed, we only resample the dictionary U and the additive noise E.

Each algorithm is provided with the signal matrix S and the dictionary U and outputs an estimate of C. The performance criteria we have considered are the mean-square error between the true and the approximated signals and the sparsity profile of the coefficient matrix that has been recovered. For the latter, we use as a performance criterion the F-measure between the row-support of the true matrix C* and the estimated one Ĉ. In order to take into account numerical precision, we have overloaded the row-support definition as

$$\mathrm{rowsupp}(C) = \{\, i \in [1\ M] : \|c_{i,\cdot}\| > \mu \,\}$$

where μ is a threshold coefficient that has been set by default to 0.01 in our experiments. From rowsupp(Ĉ) and rowsupp(C*), respectively the estimated and true sparsity profiles, we define

$$F\text{-measure} = 2\,\frac{|\mathrm{rowsupp}(\hat C) \cap \mathrm{rowsupp}(C^{\star})|}{|\mathrm{rowsupp}(\hat C)| + |\mathrm{rowsupp}(C^{\star})|}$$

Note that the F-measure is equal to 1 when the estimated sparsity profile coincides exactly with the true one.

Regarding the stopping criterion, in the experiments presented below, we have considered convergence of our M-BCD algorithm when the optimality conditions given in Eq. (7) are satisfied up to a tolerance of 0.001 and when all matrix coefficient c_{i,j} variations are smaller than 0.001. This latter condition has also been used as a stopping criterion for the Landweber iteration algorithm and the M-EM, IrM-BP, M-FOCUSS and M-SBL algorithms. When the annealing approach is in play for IrM-BP and M-FOCUSS, the annealing loop is also stopped under the same condition, e.g. max_{i,j} |C^(t+1)_{i,j} − C^(t)_{i,j}| < 0.001, where (t) and (t+1) denote two consecutive annealing solutions.
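The data-generation protocol and the row-support-based F-measure described in Section 5.1 can be sketched as follows. This is a minimal reimplementation for illustration only: the function names and the threshold argument mu are choices made here, not the authors' code.

import numpy as np

def make_data(M=512, N=128, k=10, L=5, snr_db=10.0, rng=None):
    """Draw a unit-norm dictionary, a k-row-sparse coefficient matrix and noisy signals at the given SNR."""
    rng = np.random.default_rng(rng)
    U = rng.standard_normal((N, M))
    U /= np.linalg.norm(U, axis=0)               # atoms sampled on the unit hypersphere of R^N
    C = np.zeros((M, L))
    rows = rng.choice(M, size=k, replace=False)  # common sparsity profile
    C[rows, :] = rng.standard_normal((k, L))
    S0 = U @ C
    noise = rng.standard_normal((N, L))
    # scale the noise column-wise so that each signal has the requested signal-to-noise ratio
    noise *= np.linalg.norm(S0, axis=0) / np.linalg.norm(noise, axis=0) * 10 ** (-snr_db / 20)
    return U, C, S0 + noise

def row_support(C, mu=0.01):
    """Rows whose Euclidean norm exceeds the threshold mu."""
    return set(np.flatnonzero(np.linalg.norm(C, axis=1) > mu))

def f_measure(C_hat, C_true, mu=0.01):
    """F-measure between estimated and true row-supports (1 means perfect recovery)."""
    est, true = row_support(C_hat, mu), row_support(C_true, mu)
    if not est and not true:
        return 1.0
    return 2 * len(est & true) / (len(est) + len(true))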

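As a complement, the alternating scheme of Section 4.3 (Eqs. (34)–(36)) can also be sketched in a few lines. In the sketch below, the inner weighted M-BP problem (36) is solved by plain proximal-gradient iterations with block soft-thresholding; this solver choice, the function names and the iteration counts are illustrative assumptions and do not correspond to the algorithms benchmarked in this section.

import numpy as np

def block_soft_threshold(C, thresholds):
    """Row-wise proximal operator of sum_i thresholds[i] * ||c_{i,.}||_2."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresholds[:, None] / np.maximum(norms, 1e-12))
    return scale * C

def reweighted_mbp(S, U, sigma2, n_outer=10, n_inner=200):
    """Sketch of the M-SBL / IrM-BP equivalence: alternate Eq. (34) (weights) and Eq. (36) (weighted M-BP)."""
    N, M = U.shape
    C = np.zeros((M, S.shape[1]))
    z = np.ones(M)                                   # arbitrary initialization of the weights
    step = 1.0 / np.linalg.norm(U, 2) ** 2           # gradient step for the quadratic data-fit term
    for _ in range(n_outer):
        lam = sigma2 * np.sqrt(z)                    # adaptive row penalties lambda_i = sigma^2 z_i^{1/2}
        for _ in range(n_inner):                     # proximal-gradient iterations on Eq. (36)
            grad = U.T @ (U @ C - S)
            C = block_soft_threshold(C - step * grad, step * lam)
        d = np.linalg.norm(C, axis=1) / np.maximum(np.sqrt(z), 1e-12)   # d_i = z_i^{-1/2} ||c*_{i,.}||
        Sigma = sigma2 * np.eye(N) + (U * d) @ U.T   # Sigma_t = sigma^2 I + U diag(d) U^t
        z = np.einsum('nm,nm->m', U, np.linalg.solve(Sigma, U))         # Eq. (34): diag(U^t Sigma_t^{-1} U)
    return C, d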
5.2. Comparing ℓ1–ℓ2 M-BP problem solvers

In this first experiment, we have compared different algorithms which solve the M-BP problem with p=1 and q=2. Besides our M-BCD and M-EM algorithms, we have also used the M-FOCUSS of Cotter et al. [8] and the approach of Fornasier et al. [17] based on Landweber iterations, denoted in the sequel as M-BPLand. Note that for M-FOCUSS we have modified the genuine algorithm by introducing an ε parameter, set to 0.001, which helps in avoiding a row-norm of C being permanently stuck at 0. For all compared problems, the regularization parameter λ has been set so as to make the problems equivalent. In this experiment, we have set λ = (1/5) max_i ‖φ_i^t S‖_2 for problem (6).

Fig. 1 shows two examples of how the objective value of the different algorithms evolves with respect to computational time. We can note that the two iterative reweighted least-squares algorithms (M-EM and M-FOCUSS) are the most computationally demanding. Furthermore, we also see that the Landweber iteration approach of Fornasier et al. quickly reduces its objective value but, compared to our M-BCD method, it needs more time before properly converging. Table 1 summarizes more accurately the differences between the four algorithms. These results have been averaged over 100 trials and consider two values of k. As comparison criteria, we have considered the computational time before convergence, the difference (compared to the M-BCD algorithm) in objective values and the maximal absolute difference in the coefficient matrix c_{i,j}. The table clearly shows that the M-BCD algorithm is faster than M-BPLand (especially when the signals to approximate are highly sparse) and than the two iterative reweighted least-squares approaches. We can also note from the table that, although M-FOCUSS and our M-EM are not provided with a formal convergence proof, these two algorithms seem to empirically converge to the global minimum of the problem.

Table 1
Summary of the M-BP solvers comparison. Comparisons have been carried out for two values of k, the number of active elements in the dictionary, and have been averaged over 100 trials. Comparison measures are the time needed before convergence, the difference in objective value and the largest coefficient matrix difference. For the two latter measures, the baseline algorithm is the M-BCD one.

             Time (ms)        Δ ObjVal (×10⁻³)    ‖ΔC‖∞ (×10⁻³)
  k=5
  M-BCD      4.2 ± 2.7        –                   –
  M-EM       57.2 ± 1.7       2.6 ± 2.2           0.8 ± 1.0
  M-Focuss   24.1 ± 4.2       3.6 ± 1.8           16.6 ± 1.0
  M-BPLand   7.5 ± 1.3        5.3 ± 1.1           0.04 ± 1.1
  k=32
  M-BCD      11.5 ± 2.87      –                   –
  M-EM       130.9 ± 17.0     23.1 ± 5.9          15.4 ± 8.0
  M-Focuss   60.0 ± 6.2       28.8 ± 5.8          38.8 ± 5.4
  M-BPLand   14.1 ± 2.7       16.9 ± 3.8          0.41 ± 3.8

Fig. 1. Examples of objective value evolution with respect to computational time. Here we have M=128, N=64, L=3. The number of active elements is: (left) k=5; (right) k=32. For each curve, the large point corresponds to the objective value at convergence. (Both panels plot the objective value against time (s) on logarithmic axes, for M-BCD, M-BPem, M-Focuss and M-BPland.)

Fig. 2. Estimating the empirical exponent, given in parentheses, of the computational complexity of the different algorithms (M-BCD, IrM-BP, M-SBL, M-FOCUSS, CosAmp, Landweber iterations). The top plots (N=128, L=5, k=M/16) give the computation time of the algorithms with respect to the dictionary size M. The bottom plots (M=128, L=5, k=8) depict the computational complexity with respect to the signal dimensionality N. For the sake of readability, we have separated the algorithms into two groups: (left) the ones that solve the ℓ1–ℓq problem; (right) the ones that solve the ℓp–ℓ2 problem (M-BCD result provided for baseline comparison). "IrM-BP Ann" and "M-FOC Ann" refer to the IrM-BP and M-FOCUSS algorithms using an annealing approach for iteratively decreasing ε. Empirical exponents reported in the legends — top row: SBL (1.87), IrM-BCD (1.85), BCD (1.75), M-FOC (1.32), IrM-BCD Ann (1.91), EM (2.08), M-FOC Ann (1.29), CosAmp (2.70), Landweber (1.98); bottom row: BCD (0.10), SBL (1.21), M-FOC (1.29), IrM-BCD (−0.11), EM (0.05), IrM-BCD Ann (−0.08), CosAmp (0.02), Landweber (−0.01). (Figure best viewed on a computer screen.)

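The empirical exponents reported in Fig. 2 (and discussed in Section 5.3 below) summarize how the measured runtime scales with one problem dimension. The original timing and fitting code is not available here; the following is only a sketch of the idea, assuming a least-squares fit of log-runtime against log-size.

import time
import numpy as np

def empirical_exponent(solver, sizes, make_problem, n_trials=20):
    """Fit t ~ size^alpha: return the slope of log(mean runtime) versus log(size)."""
    mean_times = []
    for size in sizes:
        trial_times = []
        for _ in range(n_trials):
            S, U = make_problem(size)            # user-supplied problem generator (e.g., varying M or N)
            tic = time.perf_counter()
            solver(S, U)
            trial_times.append(time.perf_counter() - tic)
        mean_times.append(np.mean(trial_times))
    alpha, _ = np.polyfit(np.log(sizes), np.log(mean_times), 1)
    return alpha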
5.3. Computational performances

We have also empirically assessed the computational complexity of our algorithms (we used s=0.2, hence q=5/3, for M-EM and r=1 for IrM-BP). We varied one of the different parameters (dictionary size M, signal dimensionality N) while keeping the others fixed. All matrices U, C and S are created as described above. Experiments have been run on a Pentium D 3 GHz with 4 GB of RAM using Matlab code. The results in Fig. 2, averaged over 20 trials, show the computational complexity of the different algorithms for different experimental settings. Note that we have also experimented on the M-SBL computational performances owing to the code of Wipf et al. [60], and we have implemented the M-FOCUSS of Cotter et al. [8], the CosAmp block-sparse approach of Baraniuk et al. [1] and the Landweber iteration method of Fornasier et al. [17].⁴ All algorithms need one hyperparameter to be set; for M-SBL and CosAmp, we were able to choose the optimal one since the hyperparameter respectively depends on a known noise level and a known number of active elements in the dictionary. For the other algorithms, we have reported the computational complexity for the λ that yields the best sparsity recovery. Note that our aim here is not to give an exact comparison of the computational complexities of the algorithms but just to give an order of magnitude of these complexities. Indeed, accurate comparisons are difficult since the different algorithms do not solve the same problem and do not use the same stopping criterion.

⁴ All the implementations are included in the toolbox.

We can remark in Fig. 2 that, with respect to the dictionary size, all algorithms present an empirical exponent between 1.3 and 2.7. Interestingly, we have theoretically evaluated the complexity of the M-BCD algorithm as quadratic whereas we measure a sub-quadratic complexity. We suppose that this happens because, at each iteration, only the non-optimal c_{i,·}'s are updated and thus the number of updates drastically reduces along iterations. We can note that among all approaches that solve the ℓ1–ℓq problem (left plots), M-BCD, the Landweber iteration approach and M-CosAmp have similar complexity, with a slight advantage to M-BCD for large dictionary sizes. However, we have to note that the M-CosAmp algorithm sometimes suffers from lack of convergence and thus stops only when the maximal number of allowed iterations is reached. This is the reason why for large dictionary sizes M-CosAmp is computationally expensive. However, for small and medium dictionary sizes, M-CosAmp is slightly more efficient than M-BCD. When considering the algorithms that solve the ℓp–ℓ2 problem (right plots), they all have similar complexity, with a slightly better constant for IrM-BP, while M-SBL seems to be the most demanding algorithm.

The bottom plots of Fig. 2 depict the complexity dependency of all algorithms with respect to the signal dimension N. Interestingly, the results show that, except for the M-SBL and M-FOCUSS algorithms, the algorithms do not suffer from the signal dimension increase. We assume that this is due to the fact that, as the dimension increases, the approximation problem becomes easier and thus faster convergence of those algorithms occurs.

5.4. Comparing performances

The objective of the next empirical study is to compare the performances of the algorithms we surveyed: M-SBL, M-CosAmp, Landweber iterations, S-OMP, M-FOCUSS with an annealing decrease of ε, the M-BCD algorithm and the IrM-BP approach with two values of p and an annealing decrease of ε.

The baseline experimental context is M=512, N=128, k=10 and L=5. For this experiment, we have considered an agnostic context with no prior knowledge about the noise level being available. Hence, for all models, we have performed model selection (either for selecting λ, the noise level σ for M-SBL, or the number of elements for M-CosAmp and S-OMP). The model selection procedure is the following. The training signals S are randomly split into two parts of N/2 samples. Each algorithm is then trained on one part of the signals and the mean-square error of the resulting model is evaluated on the second part. This splitting and training is run 5 times and the hyperparameter yielding the minimal averaged mean-square error is considered as optimal. Each method is then run on the full signals with that hyperparameter. Performances of all methods, averaged over 10 trials, have been evaluated according to the F-measure and a mean-square error computed on 10 000 samples.

Fig. 3 shows, from top to bottom, these performances when k increases from 2 to 75, when M goes from 64 to 1024 and when L = 2, ..., 75. This framework allows us to capture different experimental settings such as high/low sparsity patterns and large/small numbers of signals to be jointly approximated.

When varying k, we can note that, across the range of variation, the IrM-BP method with p=0 is competitive compared to all other approaches both with respect to the F-measure and the mean-square error criterion. When k increases, IrM-BP and M-FOCUSS (both with p=0.5) and M-SBL also perform very well. This may be explained by the fact that as k increases, the optimal solution becomes less and less sparse, hence the need for a less aggressive penalty. CosAmp and S-OMP are very competitive for small k but, as soon as the latter increases, these two methods are no longer able to recover a "reasonable" sparsity pattern. Interestingly, we remark that M-SBL yields a poor sparsity recovery measure while the resulting model achieves a good mean-square error. A reason for this is that the model selection procedure tends to under-estimate the noise level and thus leads to a model which keeps many spurious dictionary elements, as illustrated in Fig. 5 and detailed in the sequel. From Fig. 3, we can also notice that the two M-BP solvers, our M-BCD and the Landweber iteration approach, perform poorly compared to the other methods. However, Fornasier's method seems to be less sensitive to noise and model selection since it provides a better sparsity pattern recovery. It is worth noting that M-SBL and these two latter methods always correctly select all the true dictionary elements but they also have the tendency to include other spurious ones.

In the middle plots, as the dictionary size increases, we remark that the M-CosAmp, S-OMP and IrM-BP with p=0 algorithms yield very good (and similar) sparsity recovery. Regarding mean-square error, again, S-OMP and IrM-BP seem to be the most efficient algorithms. M-SBL and M-BCD keep too many spurious dictionary elements. All other methods provide in-between performances both in terms of F-measure and mean-square error.

Finally, the bottom plots show that when the number of signals to be approximated is rather low, IrM-BP is the most interesting approach with respect to both criteria we consider. However, as soon as L becomes large enough, S-OMP is able to perform better in sparsity recovery as well as in mean-square error.

In order to investigate how the different algorithms behave when the elements of U are correlated, we have run the same experiments with the dictionary U being replaced by a matrix UW^{1/2}, where W is a Toeplitz matrix built from a vector v of components v_k = ρ^{k-1} + a_k, with the vector a being obtained from a vector of uniform random noise of variance σ_v² sorted in decreasing order [20]. Here, we have set ρ = 0.9 and σ_v² = 0.1. Fig. 4 depicts the different algorithm performances. The main difference that we can note is that M-CosAmp and S-OMP seem to suffer more from this correlation between dictionary elements as the number of active elements becomes large (top panels).
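A minimal sketch of this correlated-dictionary construction follows. It is one plausible reading of the description above: the exact Toeplitz convention, the scaling of the uniform noise and the final re-normalization of the atoms are assumptions made for illustration.

import numpy as np
from scipy.linalg import toeplitz, sqrtm

def correlated_dictionary(U, rho=0.9, sigma_v2=0.1, rng=None):
    """Replace U by U W^{1/2}, with W Toeplitz built from v_k = rho^{k-1} + a_k (a: sorted uniform noise)."""
    rng = np.random.default_rng(rng)
    M = U.shape[1]
    a = rng.uniform(0.0, np.sqrt(12 * sigma_v2), size=M)   # uniform noise with variance sigma_v2
    a = np.sort(a)[::-1]                                    # sorted in decreasing order
    v = rho ** np.arange(M) + a
    W = toeplitz(v)
    Uc = U @ np.real(sqrtm(W))
    return Uc / np.linalg.norm(Uc, axis=0)                  # re-normalize the atoms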

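The sample-splitting model selection used in this section can likewise be sketched in a few lines. The helper names are hypothetical, and the way a candidate hyperparameter is passed to each solver is an assumption of this sketch.

import numpy as np

def select_hyperparameter(S, U, solver, candidates, n_splits=5, rng=None):
    """Pick the hyperparameter whose model, fitted on one half of the samples, best predicts the other half."""
    rng = np.random.default_rng(rng)
    N = S.shape[0]
    errors = np.zeros(len(candidates))
    for _ in range(n_splits):
        perm = rng.permutation(N)
        train, valid = perm[: N // 2], perm[N // 2:]
        for i, value in enumerate(candidates):
            C = solver(S[train], U[train], value)                 # fit on one half of the samples
            errors[i] += np.sum((S[valid] - U[valid] @ C) ** 2)   # validation mean-square error
    return candidates[int(np.argmin(errors))]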
Fig. 3. Results comparing the performances of the different simultaneous sparse algorithms. We have varied (top) the number k of active elements in the dictionary (M=512, N=128, L=5), (middle) the dictionary size M (N=128, L=5, k=10) and (bottom) the number L of signals to approximate (M=512, N=128, k=10). The left column gives the F-measure of all methods while the average mean-square errors are on the right column. (Figure best viewed in color on a computer screen.)

Fig. 4. Results comparing the performances of the different simultaneous sparse algorithms for a dictionary with increased correlation between elements. We have varied (top) the number k of active elements in the dictionary (M=512, N=128, L=5), (middle) the dictionary size M (N=128, L=5, k=10) and (bottom) the number L of signals to approximate (M=512, N=128, k=10). The left column gives the F-measure of all methods while the average mean-square errors are on the right column. (Figure best viewed in color on a computer screen.)

Fig. 5. Examples of estimated row-norms using three different algorithms (amplitude versus dictionary element index, with the true row-norms overlaid). (left) M=128, N=64, k=10 and L=3: IrM-BCD ann (p=0) F-meas 1.00, MSE 613.46; M-SBL F-meas 0.59, MSE 738.79; M-BPcosamp F-meas 0.95, MSE 2881.35. (right) M=64, N=64, k=10 and L=3: IrM-BCD ann (p=0) F-meas 0.95, MSE 543.93; M-SBL F-meas 0.38, MSE 812.21; M-BPcosamp F-meas 1.00, MSE 670.23. Here, we want to illustrate cases where a "good" sparsity recovery does not necessarily lead to a low mean-square error.

Fig. 5 illustrates the behavior of M-CosAmp, M-SBL and IrM-BP with p=0 for N=64, M=128, k=10 and L=3, for two different random samplings of U and of the locations and amplitudes of the non-zero components, with W = I. In the left plot, we have a case where, on the one hand, M-CosAmp misses recovering the first active dictionary element, thus yielding a high mean-square error. On the other hand, M-SBL achieves a lower mean-square error while keeping a few spurious dictionary elements in the model. In the meantime, IrM-BP recovers the sparsity pattern perfectly and yields a low mean-square error. In the right plot, we have another case, where M-CosAmp achieves perfect sparsity recovery but provides a model with a higher mean-square error than IrM-BP.

In most of the experimental situations presented here, M-CosAmp or S-OMP and the IrM-BP seem to be the algorithms that perform best. These methods are actually related, since both approaches solve a simultaneous sparse approximation with a J_{0,2}(C) penalty. The main difference lies in the algorithms: our IrM-BP, owing to the ε term, provides a smooth approximation of the ℓ0 quasi-norm, whereas M-CosAmp or S-OMP directly solve the approximation problem with the J_{0,2}(C) penalty. To summarize, we suggest the following rule: in applications where computational complexity is critical, where the signals to be approximated are highly sparse and where the number of signals to be approximated is large, greedy approaches like M-CosAmp or S-OMP are to be preferred. In other situations, like low sparsity patterns or a small number of signals, iterative algorithms like IrM-BP should be considered.

6. Conclusions

This paper aimed at surveying and comparing different simultaneous sparse signal approximation algorithms available in the statistical learning and signal processing communities. These algorithms usually solve a relaxed version of an intractable problem, where the relaxation is based on a sparsity-inducing mixed-norm on the coefficient approximation. In the first part of the paper, we have described several algorithms which address the most common SSA problem, which is a convex problem with an ℓ1–ℓ2 penalty function. For other choices of penalty, one usually considers iterative reweighted algorithms. These algorithms have been detailed in a second part of the paper. We have also brought our attention to greedy algorithms such as S-OMP or M-CosAmp, since they both have the ability to be very efficient and to provide theoretical guarantees on the quality of their solutions.

The last part of the paper is devoted to experimental comparisons of the different algorithms we surveyed. The lesson learned from these comparisons is that the algorithms that seem to be the best performing, either in terms of efficiency, sparsity recovery or low mean-square error, are the greedy methods such as M-CosAmp and S-OMP or an iterative reweighted ℓ1 approach.

Acknowledgments

We would like to thank the anonymous reviewers for their comments. This work was supported in part by the IST Program of the European Community, under the PASCAL2 Network of Excellence, IST-216886. This publication only reflects the authors' views. This work was partly supported by grants from the ANR ClasSel 08-EMER and ASAP ANR-09-EMER-001.

Appendix A

A.1. J_{1,2}(C) subdifferential

By definition, a matrix G lies in ∂J_{1,2}(B) if and only if, for every matrix Z, we have

$$J_{1,2}(Z) \geq J_{1,2}(B) + \langle Z - B,\, G\rangle_F \qquad (40)$$

If we expand this equation, we have the following equivalent expression:

$$\sum_i \|z_{i,\cdot}\|_2 \geq \sum_i \|b_{i,\cdot}\|_2 + \sum_i \langle z_{i,\cdot} - b_{i,\cdot},\, g_{i,\cdot}\rangle \qquad (41)$$

From this latter equation, we understand that, since both J_{1,2} and the Frobenius inner product are row-separable, a matrix G ∈ ∂J_{1,2}(B) if and only if each row of G belongs to the subdifferential of the ℓ2 norm of the corresponding row of B.

Indeed, suppose that G is such that any row of G belongs to the subdifferential of the ℓ2 norm of the corresponding row of B. We thus have, for any row i,

$$\forall z,\quad \|z\|_2 \geq \|b_{i,\cdot}\|_2 + \langle z - b_{i,\cdot},\, g_{i,\cdot}\rangle \qquad (42)$$

A summation over all the rows then proves that G satisfies Eq. (41) and thus belongs to the subdifferential of J_{1,2}(B).

Now, let us show that a matrix G for which there exists a row that does not belong to the subdifferential of the ℓ2 norm of the corresponding row of B cannot belong to the subdifferential of J_{1,2}(B). Let us consider g_{i,·}, the i-th row of G; since we have supposed that g_{i,·} ∉ ∂‖b_{i,·}‖_2, the following equation holds:

$$\exists\, z_0\ \text{s.t.}\ \|z_0\|_2 < \|b_{i,\cdot}\|_2 + \langle z_0 - b_{i,\cdot},\, g_{i,\cdot}\rangle$$

Now let us construct Z so that Z = B except for the i-th row, where z_{i,·} = z_0. Then it is easy to show that this matrix Z does not satisfy Eq. (41), which means that G does not belong to ∂J_{1,2}(B). In conclusion, we get ∂J_{1,2}(B) by applying the ℓ2 norm subdifferential to each row of B. And it is well known [2] that

$$\partial\|b\|_2 = \begin{cases} \{\, g \in \mathbb{R}^L : \|g\|_2 \leq 1 \,\} & \text{if } b = 0 \\[4pt] \left\{\dfrac{b}{\|b\|_2}\right\} & \text{otherwise} \end{cases} \qquad (43)$$

A.2. Proof of Lemma 1

We aim at proving the closed-form expression for d given in Eq. (48) below. According to the optimality conditions of the problem, at a stationary point we have either d_{m,n} = 0 or

$$d_{m,n} = \left(\frac{\lambda}{r+s}\right)^{-s/(s+1)} |a_{m,n}|^{2s/(s+1)} \left(\sum_t d_{t,n}^{1/s}\right)^{rs/[(r+s)(s+1)]} \qquad (44)$$

Then, we can derive

$$\left(\sum_m d_{m,n}^{1/s}\right)^{s+1} = \frac{r+s}{\lambda}\left(\sum_m |a_{m,n}|^{2/(s+1)}\right)^{s+1}\left(\sum_m d_{m,n}^{1/s}\right)^{r/(r+s)} \qquad (45)$$

and thus

$$\sum_m d_{m,n}^{1/s} = \left(\frac{r+s}{\lambda}\left(\sum_m |a_{m,n}|^{2/(s+1)}\right)^{s+1}\right)^{(r+s)/[s(r+s+1)]} \qquad (46)$$

As λ ≠ 0, the inequality on the mixed-norm on d_{t,k} becomes an equality. Hence, after raising each side of Eq. (46) to the power 1/(r+s) and summing each side over n, we have

$$\left(\frac{\lambda}{r+s}\right)^{1/(r+s+1)} = \sum_n g_n^{(s+1)/(r+s+1)} \qquad (47)$$

where g_n = Σ_m |a_{m,n}|^{2/(s+1)}. Then, plugging Eqs. (47) and (46) into (44) gives the desired result:

$$d_{m,n} = \frac{|a_{m,n}|^{2s/(s+1)}\, g_n^{r/(s+r+1)}}{\left(\sum_n g_n^{(s+1)/(s+r+1)}\right)^{r+s}} \qquad (48)$$

A.3. Proof of Eq. (37)

We want to show that at optimality, which occurs at C*, we have

$$s_j^t \Sigma_t^{-1} s_j = \frac{1}{\sigma^2}\, s_j^t \left(s_j - U c^{\star}_{\cdot,j}\right)$$

which is equivalent, after factorizing with s_j^t, to showing that Σ_t^{-1} s_j = (1/σ²)(s_j − U c*_{·,j}).
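The identity in Eq. (37), together with the optimality relation above, can also be checked numerically. The following standalone sketch, with freely chosen dimensions, compares the quadratic form s^t Σ_t^{-1} s with the minimized penalized least-squares value.

import numpy as np

rng = np.random.default_rng(0)
N, M, sigma2 = 20, 50, 0.3
U = rng.standard_normal((N, M))
d = rng.uniform(0.1, 2.0, size=M)          # strictly positive hyperparameters
s = rng.standard_normal(N)

# Left-hand side of Eq. (37): s^t Sigma_t^{-1} s with Sigma_t = sigma^2 I + U diag(d) U^t
Sigma = sigma2 * np.eye(N) + (U * d) @ U.T
lhs = s @ np.linalg.solve(Sigma, s)

# Right-hand side: min_c (1/sigma^2) ||s - Uc||^2 + sum_i c_i^2 / d_i,
# whose minimizer is c* = (U^t U + sigma^2 diag(1/d))^{-1} U^t s
c_star = np.linalg.solve(U.T @ U + sigma2 * np.diag(1.0 / d), U.T @ s)
rhs = np.sum((s - U @ c_star) ** 2) / sigma2 + np.sum(c_star ** 2 / d)

print(lhs, rhs)                            # the two values coincide up to numerical precision
assert np.isclose(lhs, rhs)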

References

[6] J. Chen, X. Huo, Sparse representations for multiple measurement vectors (MMV) in an overcomplete dictionary, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, 2005, pp. 257–260.
[7] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal of Scientific Computing 20 (1) (1999) 33–61.
[8] S. Cotter, B. Rao, K. Engan, K. Kreutz-Delgado, Sparse solutions to linear inverse problems with multiple measurement vectors, IEEE Transactions on Signal Processing 53 (7) (2005) 2477–2488.
[9] I. Daubechies, M. Defrise, C.D. Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57 (2004) 1413–1541.
[10] I. Daubechies, R. DeVore, M. Fornasier, S. Gunturk, Iteratively reweighted least squares minimization for sparse recovery, Communications on Pure and Applied Mathematics 63 (1) (2010) 1–38.
[11] M. Duarte, V. Cevher, R. Baraniuk, Model-based compressive sensing for signal ensembles, in: Proceedings of the 47th Allerton Conference on Communication, Control and Computing, 2009.
[12] M. Elad, Why simple shrinkage is still relevant for redundant representations?, IEEE Transactions on Information Theory 52 (12) (2006) 5559–5569.
[13] Y. Eldar, P. Kuppinger, H. Bölcskei, Compressed sensing of block-sparse signals: uncertainty relations and efficient recovery, IEEE Transactions on Signal Processing 58 (6) (2010) 3042–3054.
[14] Y. Eldar, M. Mishali, Robust recovery of signals from a structured union of subspaces, IEEE Transactions on Information Theory 55 (11) (2009) 5302–5316.
[15] J. Fan, R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association 96 (456) (2001) 1348–1360.
[16] M. Figueiredo, J. Bioucas-Dias, R. Nowak, Majorization–minimization algorithms for wavelet-based image restoration, IEEE Transactions on Image Processing 16 (12) (2007) 2980–2991.
[17] M. Fornasier, H. Rauhut, Recovery algorithms for vector valued data with joint sparsity constraints, SIAM Journal of Numerical Analysis 46 (2) (2008) 577–613.
[18] S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via ℓq minimization for 0 < q ≤ 1, Applied and Computational Harmonic Analysis 26 (3) (2009) 395–407.
[19] J. Friedman, T. Hastie, H. Höfling, R. Tibshirani, Pathwise coordinate optimization, The Annals of Applied Statistics 1 (2) (2007) 302–332.
[20] G. Gasso, A. Rakotomamonjy, S. Canu, Recovering sparse signals with a certain family of non-convex penalties and DC programming, IEEE Transactions on Signal Processing 57 (12) (2009) 4686–4698.
[21] M. Girolami, S. Rogers, Hierarchic Bayesian models for kernel learning, in: Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 241–248.
[22] I. Gorodnitsky, J. George, B. Rao, Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm, Journal of Electroencephalography and Clinical Neurophysiology 95 (4) (1995) 231–251.
[23] Y. Grandvalet, Least absolute shrinkage is equivalent to quadratic penalization, in: L. Niklasson, M. Bodén, T. Ziemske (Eds.), ICANN'98, Perspectives in Neural Computing, vol. 1, Springer, 1998, pp. 201–206.
[24] R. Gribonval, M. Nielsen, Sparse approximations in signal and image processing, Signal Processing 86 (3) (2006) 415–416.
[25] J. Huang, S. Ma, H. Xie, C. Zhang, A group bridge approach for variable selection, Biometrika 96 (2) (2009) 339–355.
[26] D. Hunter, K. Lange, A tutorial on MM algorithms, The American Statistician 58 (2004) 30–37.
[27] S. Ji, D. Dunson, L. Carin, Multi-task compressive sensing, IEEE Transactions on Signal Processing 57 (1) (2008) 92–106.
[28] Y. Kim, J. Kim, Y. Kim, Blockwise sparse regression, Statistica Sinica 16 (2006) 375–390, http://www3.stat.sinica.edu.tw/statistica/j16n2/16-2.html.
[29] H. Liu, J. Zhang, On the estimation and variable selection consistency of the sum of q-norm regularized regression, Technical Report, Department of Statistics, Carnegie Mellon University, 2009.
[30] Z. Luo, M. Gaspar, J. Liu, A. Swami, Distributed signal processing in sensor networks, IEEE Signal Processing Magazine 23 (4) (2006) 14–15.
[31] D. Malioutov, M. Cetin, A. Willsky, Sparse signal reconstruction perspective for source localization with sensor arrays, IEEE Transactions on Signal Processing 53 (8) (2005) 3010–3022.
[32] S. Mallat, Z. Zhang, Matching pursuit with time–frequency dictionaries, IEEE Transactions on Signal Processing 41 (12) (1993) 3397–3415.
[33] M. Mishali, Y. Eldar, Reduce and boost: recovering arbitrary sets of jointly sparse vectors, IEEE Transactions on Signal Processing 56 (10) (2008) 4692–4702.
[34] D. Needell, J. Tropp, CosAmp: iterative signal recovery from incomplete and inaccurate samples, Applied and Computational Harmonic Analysis 26 (3) (2009) 301–321.
[35] G. Obozinski, B. Taskar, M. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statistics and Computing 20 (2) (2010) 231–252.
[36] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in: Proceedings of the 27th Annual Asilomar Conference on Signals, Systems and Computers, 1993.
[37] C. Phillips, J. Mattout, M. Rugg, P. Maquet, K. Friston, An empirical Bayesian solution to the source reconstruction problem in EEG, NeuroImage 24 (2005) 997–1011.
[38] A. Rakotomamonjy, F. Bach, Y. Grandvalet, S. Canu, SimpleMKL, Journal of Machine Learning Research 9 (2008) 2491–2521.
[39] A. Rakotomamonjy, R. Flamary, G. Gasso, S. Canu, ℓp–ℓq penalty for sparse linear and sparse multiple kernel multi-task learning, IEEE Transactions on Neural Networks, submitted for publication, hal-00509608, http://hal.archives-ouvertes.fr/hal-00509608/fr/.
[40] B. Rao, K. Engan, S. Cotter, J. Palmer, K. Kreutz-Delgado, Subset selection in noise based on diversity measure minimization, IEEE Transactions on Signal Processing 51 (3) (2003) 760–770.
[41] R. Saab, R. Chartrand, Ö. Yilmaz, Stable sparse approximations via nonconvex optimization, in: 33rd International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2008.
[42] S. Sardy, A. Bruce, P. Tseng, Block coordinate relaxation methods for non-parametric wavelet denoising, Journal of Computational and Graphical Statistics 9 (2) (2000) 361–379.
[43] T. Simila, Majorize–minimize algorithm for multiresponse sparse regression, in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2007, pp. 553–556.
[44] T. Simila, J. Tikka, Input selection and shrinkage in multiresponse linear regression, Computational Statistics and Data Analysis 52 (1) (2007) 406–422.
[45] M. Stojnic, F. Parvaresh, B. Hassibi, On the reconstruction of block-sparse signals with an optimal number of measurements, Technical Report, arXiv:0804.0041v1, California Institute of Technology, 2008.
[46] J. Sturm, Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones, Optimization Methods and Software 11 (12) (1999) 625–653.
[47] L. Sun, J. Liu, J. Chen, J. Ye, Efficient recovery of jointly sparse vectors, in: Advances in Neural Information Processing Systems, vol. 23, 2009.
[48] M. Szafranski, Y. Grandvalet, A. Rakotomamonjy, Composite kernel learning, Machine Learning Journal 79 (1–2) (2010) 73–103.
[49] F. Theis, G. Garcia, On the use of sparse signal decomposition in the analysis of multi-channel surface electromyograms, Signal Processing 86 (3) (2006) 603–623.
[50] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society 46 (1996) 267–288.
[51] M. Tipping, Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research 1 (2001) 211–244.
[52] J. Tropp, Algorithms for simultaneous sparse approximation. Part II: convex relaxation, Signal Processing 86 (2006) 589–602.
[53] J. Tropp, A. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Transactions on Information Theory 53 (12) (2007) 4655–4666.
[54] J. Tropp, A. Gilbert, M. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit, Signal Processing 86 (2006) 572–588.
[55] P. Tseng, Convergence of block coordinate descent method for nondifferentiable minimization, Journal of Optimization Theory and Applications 109 (2001) 475–494.
[56] E. van den Berg, M. Schmidt, M. Friedlander, K. Murphy, Group sparsity via linear-time projection, Technical Report TR-2008-09, University of British Columbia, Department of Computer Science, 2008.
[57] D. Wipf, S. Nagarajan, A new view of automatic relevance determination, in: Advances in Neural Information Processing Systems, vol. 20, MIT Press, Cambridge, MA, 2008.
[58] D. Wipf, S. Nagarajan, Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions, Journal of Selected Topics in Signal Processing 4 (2) (2010) 317–320.
[59] D. Wipf, J. Owen, H. Attias, K. Sekihara, S. Nagarajan, Robust Bayesian estimation of the location, orientation and time course of multiple correlated neural sources using MEG, NeuroImage 49 (1) (2010) 641–655.
[60] D. Wipf, B. Rao, An empirical Bayesian strategy for solving the simultaneous sparse approximation problem, IEEE Transactions on Signal Processing 55 (7) (2007) 3704–3716.
[61] M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society B 68 (2006) 49–67.
[62] H. Zou, T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B 67 (2005) 301–320.
[63] H. Zou, The adaptive lasso and its oracle properties, Journal of the American Statistical Association 101 (476) (2006) 1418–1429.
[64] H. Zou, R. Li, One-step sparse estimates in nonconcave penalized likelihood models, The Annals of Statistics 36 (4) (2008) 1509–1533.