Analysis of Regularized LS Reconstruction and Random Matrix Ensembles in Compressed Sensing

Mikko Vehkaperä, Member, IEEE, Yoshiyuki Kabashima, and Saikat Chatterjee, Member, IEEE

Abstract—Performance of regularized least-squares estimation in noisy compressed sensing is analyzed in the limit when the dimensions of the measurement matrix grow large. The sensing matrix is considered to be from a class of random ensembles that encloses as special cases the standard Gaussian, row-orthogonal, geometric and so-called T-orthogonal constructions. Source vectors that have non-uniform sparsity are included in the system model. Regularization based on the ℓ1-norm, leading to LASSO estimation or basis pursuit denoising, is given the main emphasis in the analysis. Extensions to ℓ2-norm and "zero-norm" regularization are also briefly discussed. The analysis is carried out using the replica method in conjunction with some novel matrix integration results. Numerical experiments for LASSO are provided to verify the accuracy of the analytical results.

The numerical experiments show that for noisy compressed sensing, the standard Gaussian ensemble is a suboptimal choice for the measurement matrix. Orthogonal constructions provide superior performance in all considered scenarios and are easier to implement in practical applications. It is also discovered that for non-uniform sparsity patterns the T-orthogonal matrices can further improve the mean square error behavior of the reconstruction when the noise level is not too high. However, as the additive noise becomes more prominent in the system, the simple row-orthogonal measurement matrix appears to be the best choice out of the considered ensembles.

Index Terms—Compressed sensing, eigenvalues of random matrices, compressed sensing matrices, noisy linear measurements, ℓ1 minimization

I. INTRODUCTION

Consider the standard compressed sensing (CS) [1]–[3] setup where the sparse vector x⁰ ∈ R^N of interest is observed via noisy linear measurements

y = Ax⁰ + σw, (1)

where A ∈ R^{M×N} represents the compressive (M ≤ N) sampling system. Measurement errors are captured by the vector w ∈ R^M and the parameter σ controls the magnitude of the distortions. The task is then to infer x⁰ from y, given the measurement matrix A. Depending on the chosen performance metric, the level of knowledge about the statistics of the source and error vectors, or computational complexity constraints, multiple choices are available for achieving this task. One possible solution that does not require detailed information about σ or the statistics of {x⁰, w} is regularized least-squares (LS) based reconstruction

x̂ = arg min_{x∈R^N} { (1/2λ) ‖y − Ax‖² + c(x) }, (2)

where ‖·‖ is the standard Euclidean norm, λ a non-negative design parameter and c : R^N → R a fixed non-negative valued (cost) function. If we interpret (2) as a maximum a posteriori (MAP) estimator, the implicit assumption would be that: 1) the additive noise can be modeled by a zero-mean Gaussian random vector with covariance λI_M, and 2) the distribution of the source is proportional to e^{−c(x)}. Neither is in general true for the model (1) and, therefore, reconstruction based on (2) is clearly suboptimal.

In the sparse estimation framework, the purpose of the cost function c is to penalize the trial x so that some desired property of the source is carried over to the solution x̂. In the special case when the measurements are noise-free, that is, σ = 0, the choice λ → 0 reduces (2) to solving the constrained optimization problem

min_{x̂∈R^N} c(x̂)  s.t.  y = Ax̂. (3)

It is well-known that in the noise-free case the ℓ1-cost c(x) = ‖x‖₁ = Σ_j |x_j| leads to sparse solutions that can be found using linear programming. For the noisy case the resulting scheme is called LASSO [4] or basis pursuit denoising [5]

x̂_{ℓ1} = arg min_{x∈R^N} { (1/2λ) ‖y − Ax‖² + ‖x‖₁ }. (4)

Just like its noise-free counterpart, it is of particular importance in CS since (4) can be solved by using standard convex optimization tools such as cvx [6]. Due to the prevalence of reconstruction methods based on ℓ1-norm regularization in CS, we shall keep the special case of the ℓ1-cost c(x) = ‖x‖₁ as the main example of the paper, although it is known to be a suboptimal choice in general.

Manuscript received December 1, 2013; revised November 6, 2015; accepted January 25, 2016. The editor coordinating the review of this manuscript and approving it for publication was Prof. Venkatesh Saligrama. The research was funded in part by the Swedish Research Council under VR Grant 621-2011-1024 (MV) and MEXT KAKENHI Grant No. 25120013 (YK). M. Vehkaperä's visit to Tokyo Institute of Technology was funded by the MEXT KAKENHI Grant No. 24106008. This paper was presented in part at the 2014 IEEE International Symposium on Information Theory.

M. Vehkaperä was with the KTH Royal Institute of Technology, Sweden and Aalto University, Finland. He is now with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield S1 3JD, UK. (e-mail: m.vehkapera@sheffield.ac.uk)

Y. Kabashima is with the Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama 226-8502, Japan.

S. Chatterjee is with the School of Electrical Engineering and the ACCESS Linnaeus Center, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
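As an aside, the LASSO problem (4) is easy to reproduce numerically. The following is a minimal sketch in Python using the cvxpy package (an open-source analogue of the cvx toolbox cited above); the problem dimensions and parameter values are arbitrary choices for illustration, not values used in the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
M, N, rho, sigma, lam = 100, 300, 0.15, 0.3, 0.1   # illustrative values

# Sparse source: Bernoulli-Gaussian mixture as in (6) with pi(x) = g_x(0; 1)
x0 = rng.standard_normal(N) * (rng.random(N) < rho)
A = rng.standard_normal((M, N)) / np.sqrt(M)        # standard Gaussian setup
y = A @ x0 + sigma * rng.standard_normal(M)

# LASSO / basis pursuit denoising, cf. (4)
x = cp.Variable(N)
cp.Problem(cp.Minimize(cp.sum_squares(y - A @ x) / (2 * lam)
                       + cp.norm1(x))).solve()
print("per-component MSE:", np.mean((x.value - x0) ** 2))
```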


A. Brief Literature Review

In the literature, compressed sensing has a strong connotation of sparse representations. We shall next provide a brief review of the CS literature while keeping this in mind. The theoretical works in CS can be roughly divided into two principal directions: 1) worst case analysis, and 2) average / typical case analysis. In the former approach, analytical tools that examine the algebraic properties of the sensing matrix A, such as mutual coherence, spark or the restricted isometry property (RIP), are used. The goal is then to find sufficient conditions for the chosen property of A that guarantee perfect reconstruction — at least with high probability. The latter case usually strives for sharp conditions on when the reconstruction is possible when A is sampled from some random distribution. The analytical tools vary from combinatorial geometry to statistical physics methods. Both worst case and average case analysis have their merits and flaws, as we shall discuss below.

For mutual coherence, several works have considered the case of noise-free observations (σ = 0) and ℓ1-norm minimization based reconstruction. The main objective is usually to find the conditions that need to be satisfied between the allowed sparsity level of x and the mutual coherence property of A so that exact reconstruction is possible. In particular, the authors of [7] established such conditions for the special case when A is constructed by concatenating a pair of orthonormal bases. These conditions were further refined in [8] and the extension to general matrices was reported in [9] using the concept of spark.

Another direction in the worst case analysis was taken in [10], where the basic setup (1) with sparse additive noise was considered. The threshold for exact reconstruction under these conditions was derived using RIP. By establishing a connection between the Johnson–Lindenstrauss lemma and RIP, the authors of [11] later proved that RIP holds with high probability when M grows large for a certain class of random matrices. Special cases of this ensemble are, for example, matrices whose components are independent identically distributed (IID) Gaussian or Bernoulli random variables. This translates roughly to a statement that such matrices are "good" for CS problems with an ℓ1-norm based penalty if the system size is sufficiently large.

In addition to the basic problem stated above, mutual coherence and RIP based worst case analyses are prominent also in the study of greedy CS algorithms and fusion strategies. Some examples are the analysis of orthogonal matching pursuit [12]–[14], subspace pursuit [15], CoSaMP [16], group LASSO [17] and the fusion strategy [18]. The general weakness of these approaches is, however, that if one is interested in typical or average case performance, the results provided by the worst case analysis are often very pessimistic and loose. This consideration is tackled by the second class of analytical results we mentioned at the beginning of the review.

In a series of papers, the authors of [19]–[21] used tools from combinatorial geometry to show that in the limit of increasing system size, the ℓ1-reconstruction has a sharp phase transition when the measurements are noise-free. A completely different approach based on the approximate message passing (AMP) algorithm [22], [23] was introduced in [24] and shown to match the combinatorial results perfectly. Both of the above methods are mathematically rigorous, and the AMP approach has the additional benefit that it also provides a low-complexity computational algorithm that matches the threshold behavior. The downside is that extending these analyses to more general ensembles, both for the measurement matrix and the source vector, seems to be quite difficult. An alternative route is to use statistical mechanics inspired tools like the replica method [25]–[27].

By now the replica method has been accepted in the information theory society as a mathematical tool that can tackle problems that are very difficult, or impossible, to solve using other (rigorous) approaches. Although the outcomes of the replica analysis have enjoyed considerable success (see, e.g., [28]–[34] for some results related to the present paper), one should keep in mind that mathematical rigor is still lacking in parts of the method [35]. However, recent results in mathematical physics have provided at least circumstantial evidence that the main problem of the replica method is most likely in the assumed structure of the solution [35]–[39] and not in parts such as replica continuity that lack mathematical proof. In particular, the mistake in the original solution of the Sherrington–Kirkpatrick spin glass has now been traced to the assumption of the replica symmetric (RS) ansatz in the saddle-point evaluation of the free energy. Indeed, the end result of Parisi's full replica symmetry breaking (RSB) solution (see, e.g., [25]) has been proved to be correct [38], [39] in this case. Similar rigorous methods have also been applied in wireless communications [40] and error correction coding [41], [42], to name just a few examples¹.

¹To avoid the misconception that these methods have made non-rigorous approaches obsolete, some comments are in place. Firstly, the scope of the rigorous methods tends to be much more limited than that of the non-rigorous ones. Secondly, the analysis typically gives bounds for the quantities of interest rather than sharp predictions. Thirdly, it is often helpful to know the end result obtained through some non-rigorous way, like the replica method, before applying the mathematically exact tools on the problem.

B. Related Prior Work

The authors of [28] analyzed the asymptotic performance of LASSO and "zero-norm" regularized LS by extending the minimum mean square error (MMSE) estimation problem in code division multiple access (CDMA) to MAP detection in linear vector models. More specifically, the MMSE formulas obtained with the replica method [43], [44] were first assumed to be valid and then transformed to the case of MAP decoding through "hardening". Unfortunately, this approach was limited to the cases where the appropriate MMSE formulas already existed, and the end result of the analysis still required quite a lot of numerical computations. The scope of the analysis was extended to a more general class of random matrices by employing the Harish-Chandra–Itzykson–Zuber (HCIZ) integral formula [45], [46] in [30]. Although the emphasis there was on support recovery, also the MSE could be inferred from the given results. A slightly different scenario where the additive noise is sparse was analyzed in [47], [48]. For such a measurement model, if one replaces the squared ℓ2-norm

distance in (2) by the ℓ1-norm and uses also ℓ1-regularization, perfect reconstruction becomes sometimes feasible [47], [48]. It is also possible to characterize the MSE of the reconstruction outside of this region using the replica method [48].

The references above left open the question of how the choice of measurement matrix affects the fidelity of the reconstruction in the noisy setup. In [49] a partial answer was obtained through information theoretic analysis. The authors showed that standard Gaussian sensing matrices incur no loss in the noise sensitivity threshold if optimal encoding and decoding are used. A similar result was obtained earlier using the replica method in [29], and extended to more general matrix ensembles in the aforementioned paper [30]. On the other hand, a generalization of the Lindeberg principle was used by the authors of [50] to show that the average cost in LASSO is universal for a class of matrices of standard type.

Based on the above results and the knowledge that for the noise-free case the perfect reconstruction threshold is quite universal [20], [21], [34], one might be tempted to conclude that using sensing matrices sampled from the standard Gaussian ensemble is the optimal choice also in the noisy case when practical algorithms such as LASSO are used. However, there is also some counter-evidence in other settings, such as the noise-free case with non-uniform sparsity [31], [32] and spreading sequence design in CDMA [51], [52], that leaves the problem still interesting to investigate in more detail².

²After the initial submission of the present paper, parallel studies using completely different mathematical methods and arguing for the superiority of the orthogonal constructions have been presented in [53] and [54]. Since then, an extension to the present paper has been proposed in [55] and iterative algorithms approximating Bayesian optimal estimation for structured matrices have been devised, see for example [56]–[59].

Albeit from a slightly different motivational point of view, a similar endeavor was undertaken earlier in [60]–[63], where it was discovered that measurement matrices with a specific structure are beneficial for message passing decoding in noise-free settings. These spatially coupled, or seeded, measurement matrices helped the iterative algorithm to get past local extrema and hence improved the perfect reconstruction threshold of ℓ1-recovery significantly. Such constructions, however, turned out to be detrimental for convex relaxation based methods when compared to the standard Gaussian ensemble.

Finally, we remark that the uniform sparsity model studied in [34] was extended to a non-uniform noise-free setting in [33]. The goal there was to optimize the recovery performance using weighted ℓ1-minimization when the sparsity pattern is known. We deviate from those goals by considering a noisy setup with a more general matrix ensemble for the measurements. On the other hand, we do not try to optimize the reconstruction with block-wise adaptive weights and leave such extensions as future research topics.

C. Contribution and Summary of Results

The main goal of the present paper is to extend the scope of [28] and [30] to a wider range of matrix ensembles and to non-uniform sparsities of the vector of interest. We deviate from the approach of [28], [30] by evaluating the performance directly using the replica method as in [31]–[34]. The derivations are also akin to some earlier works on linear models [64], [65]. After obtaining the results for ℓ1-regularization, we sketch how they can be generalized to other cases like ℓ2-norm and "zero-norm" based regularization.

The analysis shows that under the assumption of the RS ansatz (for details, see Section IV), the average MSE of reconstruction is obtained via a system of coupled fixed point equations that can be solved numerically. For the T-orthogonal case, we find that the solution depends on the sparsity pattern (how the non-zero components are located block-wise in the vector) of the source — even when such knowledge is not used in the reconstruction. In the case of the rotationally invariant ensemble, the results are obtained as a function of the Stieltjes transform of the eigenvalue spectrum that describes the measurement matrix. For this case only the total sparsity of the source vector has influence on the reconstruction performance. The end results for the rotationally invariant case are also shown to be equivalent to those in [30], bridging the gap between two different approaches to replica analysis.

Finally, solving the MSE of the replica analysis for some practical settings reveals that the standard Gaussian ensemble is suboptimal as a sensing matrix when the system is corrupted by additive noise. For example, a random row-orthogonal measurement matrix provides uniformly better reconstructions compared to the Gaussian one. This is in contrast to the noise-free case, where it is well known that the perfect reconstruction threshold is the same for the whole rotationally invariant ensemble (see, e.g., [34]). On the other hand, although T-orthogonal measurement matrices are able to offer lower MSE than any other ensemble we tested when the sparsity of the source is not uniform, the effect diminishes as the noise level in the system increases. This may be intuitively explained by the fact that the additive noise in the system makes it more difficult to differentiate between blocks of different sparsities when we have no prior information about them.

D. Notation and Paper Outline

Boldface symbols denote (column) vectors and matrices. The identity matrix of size M × M is written I_M and the transpose of a matrix A as A^T. Given a variable x_k with a countable index set K, we abbreviate {x_k} = {x_k : k ∈ K}. We write i = √−1 and, for some (complex) function f(z), denote f(z₀) = extr_z f(z), where z₀ is an extremum of the function f, that is, satisfies df/dz |_{z₀} = 0. An analogous definition holds for functions of multiple variables. The indicator function satisfies 1(A) = 1 if A is true and is zero otherwise. Dirac's delta function is written δ(x) and the Kronecker symbol δ_ij.

Throughout the paper we assume for simplicity that given any continuous (discrete) random variable, the respective probability density (probability mass) function exists. The same notation is used for both cases and, given a general continuous / discrete random variable (RV), we often refer to its probability density function (PDF) for brevity. The true and postulated PDFs of a random variable are denoted p and q, respectively. If x is a real-valued Gaussian RV with mean µ and covariance Σ, we write the density of x as p(x) = g_x(µ; Σ).

The rest of the paper is organized as follows. The problem formulation and a brief introduction to the replica trick are given in Section II.
Section III provides the end results of the replica analysis for LASSO estimation. This case is also used in the detailed replica analysis provided in Appendices A and B. A sketch of the main steps involved in the replica analysis and a comparison to existing results are given in Section IV for the rotationally invariant setup. Conclusions are provided in Section V, and two matrix integral results used as a part of the replica analysis are proved in Appendix C. Finally, Appendix D provides the details of the geometric ensemble.

II. PROBLEM FORMULATION AND METHODS

Consider the CS setup (1) and assume that the elements of w are IID standard Gaussian random variables, so that

p(y | A, x⁰) = g_y(Ax⁰; σ²I_M) (5)

is the conditional PDF of the observations. Recall that the notation x⁰ means here that the observation (1) was generated as y = Ax⁰ + σw, that is, x⁰ is the true vector generated by the source. Note that in this setting the additive noise is dense and, therefore, perfect reconstruction is in general not possible [47], [48]. Let the sparse vector of interest x⁰ be partitioned into T equal length parts {x⁰_t}_{t=1}^T that are statistically independent. The components in each of the blocks t = 1, ..., T are drawn IID according to the mixture distribution

p_t(x) = (1 − ρ_t)δ(x) + ρ_t π(x),  t = 1, ..., T, (6)

where ρ_t ∈ [0, 1] is the expected fraction of non-zero elements in x⁰_t, which are drawn independently according to π(x). The expected density, or sparsity, of the whole signal is thus ρ = T^{−1} Σ_t ρ_t. We denote the true prior according to which the data is generated by p(x⁰; {ρ_t}) and call {ρ_t} the sparsity pattern of the source. For future reference, we define the following nomenclature.

Definition 1. When the system size grows without bound, namely, M, N → ∞ with fixed and finite compression rate α = M/N and sparsity levels {ρ_t}, we say the CS setup approaches the large system limit (LSL).

Definition 2. Let A ∈ R^{M×N} be a sensing matrix with compression rate α = M/N ≤ 1. We say that the recovery problem (2) is:

1) T-orthogonal setup, if N = TM and the sensing matrix is constructed as

A = [O_1 ··· O_T], (7)

where {O_t} are independent and distributed uniformly on the group of orthogonal M × M matrices according to the Haar measure³;

2) Standard Gaussian setup, if the elements of A are drawn IID according to g_a(0; 1/M);

3) Row-orthogonal setup, if O is an N × N Haar matrix and the sensing matrix is constructed as A = α^{−1/2}PO, where P = [I_M 0_{M×(N−M)}] picks the first M rows of O. Clearly AA^T = α^{−1}I_M and A has orthogonal rows;

4) Geometric setup, if A = UΣV^T where U, V are independent Haar matrices and Σ ∈ R^{M×N} is a diagonal matrix whose (m, m)th entry is given by σ_m ∝ τ^{m−1} for m = 1, ..., M. The parameter τ ∈ (0, 1] is chosen so that a given value of the peak-to-average eigenvalue ratio

κ = σ₁² / ( M^{−1} Σ_{m=1}^M σ_m² ) (8)

is met, and the singular values are scaled to satisfy the power constraint N^{−1} Σ_{m=1}^M σ_m² = 1. For details, see Appendix D;

5) General rotationally invariant setup, if the decomposition R = A^TA = O^TDO exists, so that O is an N × N Haar matrix and D is a diagonal matrix containing the eigenvalues of R. We also assume that the empirical distribution of the eigenvalues

F_R^M(x) = M^{−1} Σ_{i=1}^M 1(λ_i(R) ≤ x), (9)

where 1(·) is the indicator function and λ_i(R) denotes the ith eigenvalue of R, converges to some non-random limit in the LSL and satisfies E tr(AA^T)/N = E tr(D)/N = 1. The setups 2) – 4) are all special cases of this ensemble.

To make the comparison fair between different setups, all cases above are defined so that E tr(AA^T)/N = 1. In addition, both of the orthogonal setups satisfy the condition αAA^T = I_M. A numerical sketch for sampling these ensembles is given after Remark 1 below.

Remark 1. The T-orthogonal sensing matrix was considered in [31], [32] under the assumption of noise-free measurements. There it was shown to improve the perfect recovery threshold when the source had non-uniform sparsity. On the other hand, the row-orthogonal setup is the same matrix ensemble that was studied in the context of CDMA in [51], [52]. There it was called the Welch bound equality (WBE) spreading sequence ensemble and shown to provide maximum spectral efficiency both for Gaussian [51] and non-Gaussian [52] inputs given optimal MMSE decoding. The geometric setup is inspired by [66], where a similar sensing matrix was used to examine the robustness of the AMP algorithm and its variants via Monte Carlo simulations. It reduces to the row-orthogonal ensemble when κ → 1.

³In the following, a matrix O that has this distribution is said to be simply a Haar matrix.
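The following sketch in Python is one possible way to sample the ensembles of Definition 2 (our own illustration; the helper names are hypothetical). The Haar matrices are generated by QR decomposition of a Gaussian matrix with the standard sign correction, and the κ-to-τ calibration of the geometric setup (Appendix D) is omitted, τ being passed directly instead.

```python
import numpy as np

def haar(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))     # sign fix makes the law exactly Haar

def t_orthogonal(M, T, rng):
    """Setup 1): A = [O_1 ... O_T], N = T*M."""
    return np.hstack([haar(M, rng) for _ in range(T)])

def row_orthogonal(M, N, rng):
    """Setup 3): first M rows of an N x N Haar matrix, scaled by alpha^(-1/2)."""
    return haar(N, rng)[:M, :] / np.sqrt(M / N)

def geometric(M, N, tau, rng):
    """Setup 4): A = U S V^T, geometric singular values s_m ~ tau^(m-1),
    scaled to meet the power constraint sum(s_m^2)/N = 1."""
    s = tau ** np.arange(M)
    s *= np.sqrt(N / np.sum(s ** 2))
    S = np.zeros((M, N)); S[:M, :M] = np.diag(s)
    return haar(M, rng) @ S @ haar(N, rng).T

rng = np.random.default_rng(1)
A = row_orthogonal(100, 200, rng)
print(np.allclose(A @ A.T, 2 * np.eye(100)))   # AA^T = alpha^{-1} I_M, alpha = 1/2
```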

A. Bayesian Framework

To enable the use of statistical mechanics tools, we reformulate the original optimization problem (2) in a probabilistic framework. For simplicity⁴, we also make the additional restriction that the cost function separates as

c(x) = Σ_{j=1}^N c(x_j), (10)

where c is a function whose actual form depends on the type of the argument (scalar or vector). Then, the postulated prior (recall our notational convention from Section I-D) of the source is defined as

q_β(x) = (1/z_β) e^{−βc(x)}, (11)

where z_β = ∫ e^{−βc(x)} dx < ∞ is a normalization constant. Hence, c(x) needs to be such that the above integral is convergent for given finite M, N and β > 0. The purpose of the non-negative parameter β (inverse temperature) is to enable MAP detection, as will become clear later. Note that (11) encodes no information about the sparsity pattern of the source and is mismatched from the true prior p(x⁰; {ρ_t}). From an algorithmic point of view, this means that the system operator has no specific knowledge about the underlying sparsity structure, or does not want to utilize it due to increased computational complexity. We also define a postulated PDF for the measurement process

q_β(y | A, x) = g_y(Ax; (λ/β) I_M), (12)

so that unless λ/β = σ², the observations are generated according to a different model than what the reconstruction algorithm assumes. Note that λ is the same parameter as in the original problem (2).

⁴This assumption is in fact not necessary for the replica analysis. However, if the source vector has independent elements and the regularization function decouples element-wise, the numerical evaluation of the saddle-point equations is a particularly simple task.

Due to Bayes' theorem, the (mismatched) posterior density of x based on the postulated distributions reads

q_β(x | y, A) = (1/Z_β(y, A)) exp{ −β[ (1/2λ) ‖y − Ax‖² + c(x) ] }, (13)

where Z_β(y, A) is the normalization factor, or partition function, of the above PDF. We could now estimate x based on (13), for example by computing the posterior mean ⟨x⟩_β, where we used the notation

⟨h(x)⟩_β = ∫ h(x) q_β(x | y, A) dx (14)

for some given β > 0 and trial function h of x. The specific case that maximizes the a posteriori probability for given λ (and σ²) is the zero-temperature configuration, obtained by letting β → ∞. In this limit (13) reduces to a uniform distribution over the x that provide the global minimum of ‖y − Ax‖²/(2λ) + c(x). If the problem has a unique solution, we have ⟨x⟩_{β→∞} = x̂, where x̂ is the solution of (2). Thus, the behavior of regularized LS reconstruction can be obtained by studying the density (13). This is a standard problem in statistical mechanics if we interpret q_β(x | y, A) as the Boltzmann distribution of a spin glass, as described next.

B. Free Energy, the Replica Trick and the Mean Square Error

The key to finding the statistical properties of the reconstruction (2) is the normalization factor, or partition function, Z_β(y, A). Based on the statistical mechanics approach, our goal is to assess the (normalized) free energy

f_β(y, A) = −(1/βN) ln Z_β(y, A) (15)

in the LSL where M, N → ∞ with α = M/N fixed, and to obtain the desired statistical properties from it. However, the formulation above is problematic since f_β depends on the observations y and the measurement process A. One way to circumvent this difficulty is to notice that the law of large numbers guarantees that for all ε > 0, the probability that |f_β(y, A) − E{f_β(y, A)}| > ε tends to vanish in the LSL for any finite and positive λ, σ². This leads to the computation of the average free energy f̄_β = E{f_β(y, A)} instead of (15) and is called self-averaging in statistical mechanics.

Concentrating on the average free energy f̄_β avoids the explicit dependence on {y, A}. Unfortunately, assessing the necessary expectations is still difficult, and we need some further manipulations to turn the problem into a tractable one. The first step is to rewrite the average free energy in the zero-temperature limit as

f = − lim_{β,N→∞} (1/βN) lim_{n→0⁺} (∂/∂n) ln E{[Z_β(y, A)]ⁿ}. (16)

So far the development has been rigorous if n ∈ R and the limits are unique and exist⁵. The next step is to employ the replica trick to overcome the apparent road block of evaluating the necessary expectations as a function of the real-valued parameter n.

⁵In principle, the existence of a unique thermodynamic limit can be checked using the techniques introduced in [37]. However, since the replica method itself is already non-rigorous, we have opted to verify the results in the end using numerical simulations.

Replica Trick. Consider the free energy in (16) and let y = Ax⁰ + σw be a fixed observation vector. Assume that the limits commute, which in conjunction with the expression

[Z_β(y, A; λ)]ⁿ = ∫ Π_{a=1}^n exp( −(β/2λ) ‖y − Ax^a‖² − βc(x^a) ) dx^a (17)

for n = 1, 2, ... allows the evaluation of the expectation in (16) as a function of n ∈ R. The obtained functional expression is then utilized in taking the limit n → 0⁺.

It is important to note that, as written above, the validity of the analytical continuation remains an open question and the replica trick is for this part still lacking mathematical validation. However, as remarked in the Introduction, the most serious problem in practice seems to arise from the simplifying assumptions one makes about how the correlations between the variables {x^a} behave in the LSL. The simplest case is the RS ansatz, which is described by the overlap matrix Q ∈ R^{(n+1)×(n+1)} of the form

Q = [Q^{[a,b]}]_{a,b=0}^n =
⎡ r  m  ···  m ⎤
⎢ m  Q   q      ⎥
⎢ ⋮   q   ⋱     ⎥
⎣ m           Q ⎦ (18)

with slightly non-standard indexing that is common in the literature related to replica analysis. The elements of Q are defined as overlaps, or empirical correlations, Q^{[a,b]} = N^{−1} x^a · x^b. The implication of the RS ansatz is that the replica indexes a = 1, 2, ..., n can be arbitrarily permuted without changing the end result when M = αN → ∞. This seems a priori reasonable since the replicas introduced in (17) are identical and in no specific order. But as also mentioned in the Introduction, the RS assumption is not always correct. Sometimes the discrepancy is easy to fix, for example as in the case of the random energy model [67], while much more intricate methods like Parisi's full RSB solution are needed in other cases [25]. For the purposes of the present paper, we restrict ourselves to the RS case and check the accuracy of the end result against simulations. Although this might seem a mathematically somewhat unsatisfying approach, we believe that the RS results are useful for practical purposes due to their simple form and can serve as a stepping stone for possible extensions to the RSB cases.

Finally, let us consider the problem of finding the MSE of the reconstruction (2). Using the notation introduced earlier, we may write

mse = N^{−1} E{ ‖x⁰ − ⟨x⟩_{β→∞}‖² }
    = ρ − 2 E{ N^{−1} ⟨x⟩_{β→∞}^T x⁰ } + E{ N^{−1} ⟨‖x‖²⟩_{β→∞} }
    = ρ − 2m + Q, (19)

where ⟨···⟩_β was defined in (14) and E{···} denotes the expectation w.r.t. the variables in (1). Thus, if we can compute m and Q using the replica method, the MSE of reconstruction follows immediately from (19). As shown above, this amounts to computing the overlap matrix (18).

III. RESULTS FOR LASSO RECONSTRUCTION

In this section we provide the results of the replica analysis for LASSO reconstruction (4) for the ensembles introduced in Definition 2. For simplicity, we let the non-zero elements of the source be standard Gaussian, that is, π(x) = g_x(0; 1) in (6). Recall also that LASSO is the special case of regularization c(x) = ‖x‖₁ in the general problem (2). The replica symmetric ansatz is assumed in the derivations given in Appendices A and B. The casual reader can find a sketch of the replica analysis, along with some generalizations for different choices of the cost function c(x), in Section IV. Further interpretation of the results and connections to the earlier work in [28] are also discussed there. After the analytical results, we provide some numerical examples in the following subsection.

A. Analytical Results

The first result shows that when the measurement matrix is of the T-orthogonal form, the MSE over the whole vector may depend not just on the average sparsity ρ but also on the block-wise sparsities {ρ_t}.

Proposition 1. Consider the T-orthogonal setup described in Definition 2 and let mse_t denote the MSE of the LASSO reconstruction in block t = 1, ..., T. The average MSE over the entire vector of interest reads

mse = (1/T) Σ_{t=1}^T mse_t = (1/T) Σ_{t=1}^T (ρ_t − 2m_t + Q_t). (20)

Then, under the RS ansatz,

m_t = 2ρ_t Q( 1/√(χ̂_t + m̂_t²) ), (21)

Q_t = −(2(1 − ρ_t)/m̂_t²) r(χ̂_t) − (2ρ_t/m̂_t²) r(χ̂_t + m̂_t²), (22)

where Q(x) = ∫_x^∞ dz e^{−z²/2}/√(2π) is the standard Q-function and we denoted

r(h) ≜ √(h/2π) e^{−1/(2h)} − (1 + h) Q(1/√h), (23)

m̂_t ≜ 1 / ( λ + Σ_{k≠t} Λ_k^{−1} ), (24)

for notational convenience. The parameters {Λ_t} and {χ̂_t} are the solutions to the set of coupled equations

Λ_t = (1/R_t − 1) m̂_t, (25)

χ̂_t = (ρ_t − 2m_t + Q_t) Λ_t² / (1 − R_t)² + Σ_{s=1}^T Δ_{s,t} (ρ_s − 2m_s + Q_s − σ²R_s²), (26)

where we also used the auxiliary variables R_t = χ_t m̂_t with

χ_t ≜ (2(1 − ρ_t)/m̂_t) Q(1/√χ̂_t) + (2ρ_t/m̂_t) Q( 1/√(χ̂_t + m̂_t²) ), (27)

Δ_{s,t} ≜ ( R_s R_t Λ_s Λ_t / ((1 − 2R_s)(1 − 2R_t)) ) [ 1 + Σ_{k=1}^T R_k²/(1 − 2R_k) ]^{−1} − ( Λ_t²/(1 − 2R_t) ) δ_{st}, (28)

and the Kronecker delta symbol δ_ij to simplify the notation.

Proof: See Appendix A.

The connection of the MSE provided in Proposition 1 to the formulation given in (19) is as follows. Here m = T^{−1} Σ_t m_t and Q = T^{−1} Σ_t Q_t due to the assumption of block-wise sparsity, as described in Section II. The parameters {Λ_t, χ̂_t} have to be solved for all t = 1, ..., T, i.e., we have a set of 2T non-linear equations in 2T variables. Note that for the purpose of solving these equations, the parameters {m̂_t, R_t, Δ_{s,t}, m_t, Q_t} are just notation and do not act as additional variables in the problem. Except for m_t and Q_t, the rest of the variables can in fact be considered to be "auxiliary". They arise in the replica analysis when we assess the expectations w.r.t. the randomness of the measurement process, the additive noise and the vector of interest. Hence, one may think that the replica trick transformed the task of computing difficult expectations into the problem of finding solutions to a set of coupled fixed point equations defined by some auxiliary variables. In terms of computational complexity, this is a very fair trade indeed.

The implication of Proposition 1 is that the performance of the T-orthogonal ensemble is in general dependent on the details of the sparsity pattern {ρ_t} — in a rather complicated way.
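Note that the fixed point equations above involve only Gaussian tail integrals, so they are cheap to evaluate. The following sketch (Python with scipy; our own illustration, not code from the paper) implements the scalar functions Q(x) and r(h) of (23), together with a numerical self-check of our reading of (22): for the soft-thresholding estimator behind the proposition, −2r(h) should equal E[max(|y| − 1, 0)²] for y ~ N(0, h).

```python
import numpy as np
from scipy.special import erfc
from scipy.integrate import quad

def qfunc(x):
    """Standard Gaussian tail Q(x) = int_x^inf exp(-z^2/2)/sqrt(2*pi) dz."""
    return 0.5 * erfc(x / np.sqrt(2.0))

def r(h):
    """r(h) of (23)."""
    return (np.sqrt(h / (2 * np.pi)) * np.exp(-1 / (2 * h))
            - (1 + h) * qfunc(1 / np.sqrt(h)))

# Consistency check (our reading of (22)): -2*r(h) should equal
# E[ max(|y| - 1, 0)^2 ] for y ~ N(0, h).
h = 1.7
pdf = lambda y: np.exp(-y**2 / (2 * h)) / np.sqrt(2 * np.pi * h)
tail, _ = quad(lambda y: (y - 1)**2 * pdf(y), 1, np.inf)
print(np.isclose(-2 * r(h), 2 * tail))   # True up to quadrature error
```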

A similar result was reported for the noise-free case in [32], where the T-orthogonal ensemble was shown to provide a superior reconstruction threshold compared to rotationally invariant cases when the source vector had non-uniform sparsity.

For future convenience, we next present the special case of uniform sparsity as an example. Since now ρ = ρ_t for all t = 1, ..., T, we have only two coupled fixed point equations to solve. The auxiliary parameters also simplify significantly, and we have the following result.

Example 1. Consider the T-orthogonal setup and assume uniform sparsity ρ_t = ρ for all t = 1, ..., T. The per-component MSE of reconstruction can be obtained by solving the set of equations

Λ = (1/λ) [ 2(1 − ρ)Q(1/√χ̂) + 2ρQ( 1/√(χ̂ + m̂²) ) ]^{−1} − T/λ, (29)

χ̂ = Λ² [ (ρ − 2m + Q)/(1 − R)² − (ρ − 2m + Q − σ²R²)/(1 − 2R + TR²) ], (30)

where we introduced the definitions

R ≜ 1/(T + λΛ), (31)

m̂ ≜ Λ/(T + λΛ − 1), (32)

for notational simplicity.

For completeness, we next provide a result similar to Proposition 1 for the rotationally invariant case. It shows that the performance of this matrix ensemble does not depend on the specific values of {ρ_t} but only on the expected sparsity level ρ = T^{−1} Σ_t ρ_t of the vector of interest. The MSE is given as a function of the Stieltjes transform, and its first order derivative, of the asymptotic eigenvalue distribution F_{AA^T} of the measurement matrix. As shown in Section IV-B, the end result is essentially the same as the HCIZ integral formula based approach in [30], but the derivation and the form of the proposition are chosen here to match the previous analysis. It should be remarked, however, that Proposition 1 cannot be obtained from [30], and the T-orthogonal ensemble requires a special treatment.

Proposition 2. Recall the rotationally invariant setup given in Definition 2. The average MSE of reconstruction for LASSO in this case is given by mse = ρ − 2m + Q, where the parameters m and Q are as in (21) and (22) with the block index t omitted. To obtain the MSE, the following set of equations

χ = (2(1 − ρ)/m̂) Q(1/√χ̂) + (2ρ/m̂) Q( 1/√(χ̂ + m̂²) ), (33)

χ̂ = (ρ − 2m + Q)(1/χ² + Λ′) − ασ² [ G_{AA^T}(−λΛ) − (λΛ) · G′_{AA^T}(−λΛ) ] Λ′, (34)

Λ = (1/χ) [ 1 − α( 1 − (λΛ) · G_{AA^T}(−Λλ) ) ], (35)

need to be solved given the definitions

Λ′ ≜ ∂Λ/∂χ = −[ (1 − α)/Λ² + αλ² G′_{AA^T}(−Λλ) ]^{−1}, (36)

m̂ ≜ 1/χ − Λ = (α/χ)( 1 − (λΛ) · G_{AA^T}(−Λλ) ). (37)

The function G_{AA^T}(s) is the Stieltjes transform of the asymptotic distribution F_{AA^T}(x) and G′_{AA^T}(s) is the derivative of G_{AA^T}(s) w.r.t. the argument.

Proof: See Appendix B.

Remark 2. Due to (27) and (33), the rotationally invariant and T-orthogonal setups have exactly the same form for the variables {m, Q, χ}. Hence, the choice of random matrix ensemble does not affect these variables, except for adding the indexes t = 1, ..., T to them in the case of the T-orthogonal setup.

Notice that in contrast to Definition 2, the above proposition uses the eigenvalue distribution of AA^T instead of A^TA. This is more convenient in the present setup since M ≤ N, so we do not have to deal with the zero eigenvalues. Compared to the T-orthogonal case, here we have a fixed set of three equations and unknowns to solve, regardless of {ρ_t} and the partition of the source vector. Thus, if one knows the Stieltjes transform G_{AA^T}(s) and it is (once) differentiable with respect to the argument, the required parameters can be solved numerically from the coupled equations given in Proposition 2. For some G_{AA^T}(s), however, the equations can be reduced analytically to simpler forms that allow for efficient numerical evaluation of the MSE, as seen in the following two examples.

Example 2. Recall the row-orthogonal setup. For this case D = α^{−1}I_M and, thus, the Stieltjes transform of F_{AA^T} reads

G_{AA^T}(s) = 1/(α^{−1} − s), (38)

so that G′_{AA^T}(s) = (α^{−1} − s)^{−2}. Plugging the above in Proposition 2 provides

Λ = [ λ − α^{−1}χ + √( −4λχ + (λ + α^{−1}χ)² ) ] / (2λχ), (39)

m̂ = [ λ + α^{−1}χ − √( −4λχ + (λ + α^{−1}χ)² ) ] / (2λχ), (40)

Λ′ = −[ (1 − α)/Λ² + αλ²/(α^{−1} + λΛ)² ]^{−1}, (41)

χ̂ = (ρ − 2m + Q)(1/χ² + Λ′) − ( σ²/(α^{−1} + λΛ)² ) Λ′, (42)

so that the MSE can be obtained by solving the set {χ, χ̂} of equations given by (33) and (34). As expected, for the special case of uniform sparsity ρ_t = ρ and α = 1/T, the row-orthogonal and T-orthogonal setups always give the same MSE (see Remark 5 in Appendix B).

In the above example, we first used (38) in (35) and (37) to solve Λ and m̂ as functions of χ. Plugging then G′ into (36) provides immediately Λ′, and χ̂ follows similarly from (34). However, if the form of G is more cumbersome, it may be more convenient to compute the parameters in a slightly different order, as demonstrated in the next example that considers a generalized version of the standard Gaussian CS setup.
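To make Example 2 concrete, a damped fixed-point iteration of (39)–(42) together with (21), (22) and (33) can be coded in a few lines. The sketch below is our own implementation under the RS equations as reconstructed above; the starting point, damping factor and parameter values are arbitrary, and convergence is not guaranteed for every (α, ρ, σ², λ).

```python
import numpy as np
from scipy.special import erfc

def qfunc(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

def r(h):
    return (np.sqrt(h/(2*np.pi))*np.exp(-1/(2*h))
            - (1 + h)*qfunc(1/np.sqrt(h)))

def lasso_mse_row_orthogonal(alpha, rho, sigma2, lam, iters=500, damp=0.5):
    """Damped fixed-point iteration of Example 2 (our sketch).
    Assumes the iterates (chi, chih) stay positive; tune damp if not."""
    chi, chih = 1.0, 1.0
    for _ in range(iters):
        disc = np.sqrt((lam + chi/alpha)**2 - 4*lam*chi)
        Lam  = (lam - chi/alpha + disc) / (2*lam*chi)          # (39)
        mh   = (lam + chi/alpha - disc) / (2*lam*chi)          # (40)
        m    = 2*rho*qfunc(1/np.sqrt(chih + mh**2))            # (21)
        Q    = (-2*(1-rho)/mh**2 * r(chih)
                - 2*rho/mh**2 * r(chih + mh**2))               # (22)
        Lamp = -1.0/((1-alpha)/Lam**2
                     + alpha*lam**2/(1/alpha + lam*Lam)**2)    # (41)
        chi_new  = (2*(1-rho)/mh*qfunc(1/np.sqrt(chih))
                    + 2*rho/mh*qfunc(1/np.sqrt(chih+mh**2)))   # (33)
        chih_new = ((rho - 2*m + Q)*(1/chi**2 + Lamp)
                    - sigma2*Lamp/(1/alpha + lam*Lam)**2)      # (42)
        chi  = damp*chi  + (1-damp)*chi_new
        chih = damp*chih + (1-damp)*chih_new
    return rho - 2*m + Q                                       # (19)

print(lasso_mse_row_orthogonal(alpha=1/3, rho=0.15, sigma2=0.1, lam=0.4))
```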

Example 3. Consider the rotationally invariant setup where the elements of A are zero-mean IID with variance 1/M. The eigenvalue spectrum of AA^T is then given by the Marchenko–Pastur law

G_{AA^T}(s) = [ −1 + α^{−1} − s − √( −4s + (−1 + α^{−1} − s)² ) ] / (2s), (43)

which together with Proposition 2 provides

Λ = 1/χ − 1/(λ + α^{−1}χ), (44)

m̂ = 1/(λ + α^{−1}χ), (45)

Λ′ = −1/χ² + α^{−1}/(λ + α^{−1}χ)², (46)

G′_{AA^T}(−λΛ) = −(1/(αλ²)) [ (1 − α)/Λ² + 1/Λ′ ]. (47)

Plugging the above into Proposition 2 and solving {χ, χ̂} yields the MSE of reconstruction.

In this case, G′_{AA^T}(s) is of a more complex form than in Example 2, and the approach used there does not provide as simple a solution as before. However, since now Λ has a particularly convenient form, we may use the definition Λ′ = ∂Λ/∂χ to write Λ′ as a function of χ. This can then be used in (36) to obtain G′ indirectly.

Remark 3. As shown in Section IV-B, Examples 2 and 3 provide the same average reconstruction error as reported in [30], which also implies that the IID case matches [28]. The benefit of Example 3 compared to [28] is that there are no integrals or expectations left to solve. Example 2, on the other hand, proved directly that for the special case of uniform sparsity ρ_t = ρ and α = 1/T, the row-orthogonal and T-orthogonal setups always give the same MSE.

The final example demonstrates the capabilities of the analytical framework for a "non-standard" ensemble that has singular values defined by a geometric progression.

Example 4. Recall the geometric setup, where the singular values σ_m ∝ τ^{m−1}, m = 1, ..., M satisfy the peak-to-average condition (8). The equivalent limiting eigenvalue spectrum of AA^T in the large system limit is described by the Stieltjes transform

G_{AA^T}(s) = (1/sγ) ln( (A − s)/(Ae^{−γ} − s) ) − 1/s, (48)

G′_{AA^T}(s) = −(1/s) G_{AA^T}(s) − A(e^{−γ} − 1) / ( sγ(Ae^{−γ} − s)(A − s) ), (49)

where A = κ/α and γ satisfies

κ = γ/(1 − e^{−γ}), (50)

for some given κ. For details on how to generate the geometric ensemble for simulations and how the Stieltjes transform arises for this setup, see Appendix D.

B. Numerical Examples

Having obtained the theoretical performance of various matrix ensembles, we now examine the behavior of the MSE in some chosen setups numerically. First we consider the case of uniform density and the MSE of reconstruction as a function of the tuning parameter λ, as shown in Fig. 1. The solid lines depict the performances of the row-orthogonal and standard Gaussian setups, as given by Examples 2 and 3. The markers, on the other hand, correspond to the result obtained by Rangan et al. [28, Section V-B] and to Example 1 given in this paper. As expected, the solid lines and markers match perfectly, although the analytical results are represented in completely different forms. It is important to notice, however, that we plot here the MSE (in decibels) while the normalized mean square error (in decibels) mse/ρ is used in [28]. Also, the definition of signal-to-noise ratio SNR₀ there would correspond to the value ρ/σ² in this paper.

Fig. 1. Average MSE in decibels mse → 10 log₁₀(mse) dB as a function of the tuning parameter λ. Solid lines given by Examples 2 and 3. Markers for the standard Gaussian setup are obtained from the analytical results provided in [28, Section V] and by Example 1 for the T-orthogonal case. All results should thus be considered to be in the LSL. As predicted by the analysis, the lines and markers match perfectly. Parameter values: α = 1/3, ρ = 0.15, σ² = 0.1.

Comparing the two curves in Fig. 1 makes it clear that the orthogonal constructions provide superior MSE performance compared to the standard Gaussian setup. It is also worth pointing out that the optimal value of λ depends on the choice of the matrix ensemble.

In the next experiment we consider the case of "localized sparsity" where all non-zero elements are concentrated in one subvector x_t, namely, ρ_t = ρT for some t ∈ {1, ..., T} and ρ_s = 0 for all s ≠ t. For simplicity, we take the simplest case of T = 2 and choose the overall sparsity to be ρ = 0.2. The average mean square error vs. inverse noise variance 1/σ² for this case is depicted in Fig. 2. For clarity of presentation, the variables related to both axes are given in decibel form, that is, x → 10 log₁₀(x) dB, where x ∈ {mse, σ^{−2}}. The tuning parameter λ is chosen for each point so that the lowest possible MSE is obtained for all considered methods. Due to the simple form of the analytical results, this is easy to do numerically. Examining Fig. 2 reveals a surprising phenomenon, namely, for small noise variance the T-orthogonal setup gives the lowest average MSE, while for more noisy setups it is the row-orthogonal ensemble that achieves the best reconstruction performance.
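As a quick numerical sanity check of the transforms used in Examples 2 and 3 (our own illustration, not part of the paper), the analytic Stieltjes transform (43) can be compared against the empirical estimate M^{−1} Σ_i (λ_i(AA^T) − s)^{−1} computed from a single sampled matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 300, 900
alpha, s = M / N, -0.5          # evaluate G at a point off the spectrum

# Empirical Stieltjes transform of AA^T for the standard Gaussian setup
A = rng.standard_normal((M, N)) / np.sqrt(M)
eig = np.linalg.eigvalsh(A @ A.T)
G_emp = np.mean(1.0 / (eig - s))

# Marchenko-Pastur form (43)
b = -1 + 1/alpha - s
G_mp = (b - np.sqrt(b**2 - 4*s)) / (2*s)

print(G_emp, G_mp)              # agree up to finite-size fluctuations
```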

Fig. 2. Average MSE vs. inverse noise variance 1/σ², where both quantities are presented in decibels, that is, x → 10 log₁₀(x) dB. For each point, the value of λ that minimizes the MSE is chosen numerically. The parameter values are α = 1/2, ρ = 0.2 and localized sparsity is considered, that is, either ρ₁ = 0.4 and ρ₂ = 0 or vice versa. The dashed line at σ^{−2} = 12 dB represents the point where the MSE performance of the T-orthogonal and row-orthogonal ensembles approximately cross each other. Markers at σ^{−2} = 5, 12, 19 dB are obtained by using cvx for reconstruction and averaging over 100 000 realizations of the problem.

The universality of this behavior

for other parameter values is left as an open question for future research. The point where these two ensembles give approximately the same MSE for the given setup is located at σ^{−2} = 12 dB. Hence, for optimal performance, if one is given the choice of these three matrix ensembles, the row-orthogonal setup should be chosen when on the left hand side of the dashed line and the T-orthogonal setup otherwise. At any point in the figure, however, the standard Gaussian setup gives the worst reconstruction in the MSE sense and should never be chosen if such freedom is presented.

To illustrate how the experimental points in Fig. 2 were obtained, we have plotted in Fig. 3 the average MSE for the point σ^{−2} = 12 dB, for which the asymptotic prediction given by the replica method is mse = −12.685 dB. To obtain more accurate results, the experimental data is now averaged over 10⁶ realizations and the estimates are obtained by using SpaSM [68], which provides an efficient MATLAB implementation of the least angle regression (LARS) algorithm [69]. The simulated data is fitted with a quadratic function of N^{−1} and the estimates for the asymptotic MSEs are obtained by extrapolating M = αN → ∞. The end result for both the row-orthogonal and T-orthogonal ensembles is mse = −12.68 dB, showing that the replica analysis provides a good approximation of the MSE for large systems. One can also observe that the simulations approach the asymptotic result relatively fast, and that for realistic system sizes the match is already very good. Although the convergence behavior depends somewhat on the noise variance σ², the present figure is a typical example of what was observed in our simulations.

Fig. 3. Experimental assessment of the MSE for the point σ^{−2} = 12 dB in Fig. 2. Markers correspond to simulated values using SpaSM with path-following optimization over λ. Problem sizes N = 80, 160, ..., 400, each averaged over 10⁶ realizations. The experimental data were fitted with quadratic functions of 1/N (best fits mse = 199.8N^{−2} − 0.7099N^{−1} − 12.68 for the row-orthogonal and mse = 137.8N^{−2} − 14.30N^{−1} − 12.68 for the T-orthogonal ensemble) and plotted with solid lines. Extrapolation for M = αN → ∞ provides the estimates for the asymptotic MSE that agree well with the replica prediction for the RS ansatz, mse = −12.685 dB.

Finally, we use the geometric setup to examine the robustness of LASSO against a varying peak-to-average ratio κ introduced in (8). Substituting the formulas given in Example 4 into Proposition 2 provides the MSE of reconstruction for LASSO as given in Fig. 4. The system parameters are α = 1/2, ρ = 0.2, and σ² is chosen to match the markers in Fig. 2. As mentioned earlier, the geometric setup reduces to the row-orthogonal one when κ → 1, which can be verified by comparing the κ = 1 values to the markers of the row-orthogonal curve in Fig. 2. We have also included additional simulation points at κ = 5, obtained by using SpaSM as in Fig. 3. One can observe that the performance degradation with increasing κ for the LASSO problem is relatively graceful compared to the abrupt transition of GAMP observed in [66]. Hence, the algorithmic considerations are indeed very important, as studied therein.
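The extrapolation procedure used for Fig. 3 is easy to restate in code. The sketch below (our illustration; the data points are synthetic placeholders, not the paper's simulation results) fits mse(N) = aN^{−2} + bN^{−1} + c and reads off the asymptotic intercept c:

```python
import numpy as np

# Placeholder per-size averages (synthetic, for illustration only)
N = np.array([80, 160, 240, 320, 400], dtype=float)
mse_db = np.array([-12.40, -12.55, -12.60, -12.63, -12.64])

# Quadratic fit in 1/N: mse(N) ~ a/N^2 + b/N + c
a, b, c = np.polyfit(1.0 / N, mse_db, deg=2)
print(f"extrapolated N -> inf MSE: {c:.3f} dB")
```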

Fig. 4. Average MSE vs. the peak-to-average ratio κ for the geometric setup. Markers are obtained by extrapolating simulated values as in Fig. 3, and the optimal λ is used for each point. Parameter values are α = 1/2, ρ = 0.2, and the points at κ = 1 match the markers of the row-orthogonal setup in Fig. 2.

We also observe that, as expected, the MSE is a monotonically increasing function of κ for all σ², highlighting the fact that sensing matrices with "flat" eigenvalue distributions are in general good for the reconstruction of noisy sparse signals.

IV. EXTENSIONS AND A SKETCH OF A DERIVATION WITH THE REPLICA METHOD

Although the LASSO reconstruction examined in the previous section is one of the most important special cases of the regularized LS problem (2), one might be interested in expanding the scope of the analysis to some other cases. In fact, up to a certain point, the evaluation of the free energy (16) is independent of the choice of regularization c(x) as well as the marginal distribution of the non-zero components of the source vector. To this end, let us assume in the following that the source vector has elements drawn independently according to (6), where π(x) is a suitable PDF with zero mean, unit variance and finite moments. Note, however, that in order to obtain the final saddle-point conditions, one needs to make a choice about both c and π in the end.

A. Sketch of a Replica Analysis

To provide a brief sketch of how the results in the previous section were obtained, and to elucidate where the choice of regularization (10) affects the analysis, let us consider for simplicity the rotationally invariant ensemble with a source vector that has uniform sparsity, i.e., we set T = 1 and ρ = ρ₁. By Appendices A and B we know that (16) can be written under the RS ansatz as

f = − lim_{n→0⁺} (∂/∂n) lim_{β,N→∞} (1/βN) ln Ξ_{β,N}(n), (51)

where n is the number of replicas in the system. To characterize Ξ_{β,N}(n), we consider the simplest case of the RS overlap matrix (18) and remind the reader that albeit this choice may seem intuitively reasonable, it is known to be incorrect in some cases (see the Introduction for further discussion).

For a given replica symmetric Q, we may compute its probability weight⁶

p_{β,N}(Q; n) ∝ ∫ [V_β(Q̂; n)]^N e^{Nβn(QQ̂ − χχ̂ − 2mm̂)/2} dQ̂, (52)

where χ = β(Q − q). With some abuse of notation, we used above Q as a shorthand for the set {χ, Q, m} and similarly Q̂ for the auxiliary parameters {χ̂, Q̂, m̂}. Given that the prior of x⁰ factorizes and the regularization separates as (10), the auxiliary term V_β(Q̂; n) is given in (53) below, where Dz = dz e^{−z²/2}/√(2π) and p(x⁰) is assumed to be of the form (6) with ρ = ρ₁ and T = 1. It is important to realize that the replica analysis could, in principle, be done with general forms of c(x) and p(x), but then (53) would have vectors in place of scalars. This creates computational problems, as explained shortly. Note also that we have taken here a slightly different route (on purpose) compared to the analysis carried out in the Appendices. The approach there is more straightforward mathematically, but the methods presented here give some additional insight into the solution and provide a connection to the results given in [28] and [30].

Next, define a function

H_{β,λ}(σ², ν) = −( 1 + ln(βν) − α ln λ )/2 + (1/2) extr_Λ { Λ(βν) − (1 − α) ln Λ − α E ln(Λβσ² + Λλ + x) }, (54)

where α = M/N, ν > 0 and the expectation of the ln-term is w.r.t. the asymptotic eigenvalue distribution F_{AA^T}(x). Then

Ξ_{β,N}(n) = ∫ dQ p_{β,N}(Q; n) e^{N H_{β,λ}(nσ², n(r−2m+q)+Q−q) + N(n−1) H_{β,λ}(0, Q−q)}, (55)

where the exponential terms involving the function H arise from the expectation of (17) w.r.t. the noise w and the sensing matrix A given {x^a} and Q. The relevant matrix integral is proved in Appendix C-B and more details can be found in the derivations following Lemma 1 in Appendix A. In the limit of vanishing temperature and increasing system size β, N → ∞, the saddle point method⁷ may be employed to assess the integrals over Q and Q̂. Taking also the partial derivative w.r.t. n and then letting n → 0 provides

f = extr_{χ,Q,m,χ̂,Q̂,m̂} { mm̂ − QQ̂/2 + χχ̂/2 + ∫∫ p(x⁰) φ( z√χ̂ + m̂x⁰; Q̂ ) dx⁰ Dz
    − lim_{β→∞} (1/β) lim_{n→0} (∂/∂n) H_{β,λ}(nσ², n(ρ − 2m + q) + Q − q) }, (56)

⁶We have omitted vanishing multiplicative terms and a term −n²βχ̂q/2 in the exponent since they do not affect the free energy, see (134) in Appendix A.

⁷For more information, see for example [70, Ch. 6] and [71, Ch. 12.7], or for a gentle introduction [72]. Note that in our case when β, N → ∞, the correction terms in front of the exponentials after the saddle point approximation vanish due to the limits and the division by βN in (51). With some abuse of notation, we have simply omitted them in the paper to avoid unnecessary distractions.

V_β(Q̂; n) = ∫ p(x⁰) ( Π_{a=1}^n dx^a ) exp( −β Σ_{a=1}^n c(x^a) ) exp( −(βQ̂/2) Σ_{a=1}^n (x^a)² + βm̂x⁰ Σ_{a=1}^n x^a + (1/2)( β√χ̂ Σ_{a=1}^n x^a )² ) dx⁰

          = ∫∫ p(x⁰) { ∫ exp( β[ −(Q̂/2)x² + (z√χ̂ + m̂x⁰)x − c(x) ] ) dx }ⁿ dx⁰ Dz, (53)

where we defined a scalar function

φ(y; Q̂) = min_x { (Q̂/2)x² − yx + c(x) }. (57)

Note that due to the limit n → 0 in (56), the function H has the arguments σ² = 0 and ν = Q − q when it comes to solving the extremization over Λ in (54). The value of Λ which provides a solution for this is denoted Λ* in the following. We also remark that (57) is the only part of the free energy that directly depends on the choice of the cost function c. If we had chosen a p(x) that is not a product distribution and a c that did not separate as in (10), we would need to consider here a multivariate optimization over x. In fact, the dimensions should grow without bound by the assumptions of the analysis and, hence, the problem would be infeasible in general. In practice one could consider some large but finite setting and use it to approximate the infinite dimensional limit, or concentrate on forms of c(x) and p(x) that have only a few local dependencies. For simplicity we have chosen to restrict ourselves to the case where the problem fully decouples into a single dimensional one.

We can get an interpretation of the remaining parameters as follows. First, let

⟨h(x); y⟩_β = (1/Z_β(y)) ∫ h(x) e^{−β[Q̂x²/2 − yx + c(x)]} dx (58)

be a posterior mean of some function h(x). Note that the structure of this scalar estimator is essentially the same as that of the vector valued counterpart given in (13) and (14). If we denote the x that minimizes (57) by x̂(y; Q̂), then clearly

⟨h(x); y⟩_{β→∞} = h( x̂(y; Q̂) ), (59)

and also

x̂(y; Q̂) = −(∂/∂y) φ(y; Q̂). (60)

We thus obtain from (53), with a little bit of calculus, that as N, β → ∞ and n → 0, the terms that yield the mean square error mse = ρ − 2m + Q are given by

m = ∫∫ x⁰ x̂( z√χ̂ + m̂x⁰; Q̂ ) p(x⁰) dx⁰ Dz, (61)

Q = ∫∫ x̂( z√χ̂ + m̂x⁰; Q̂ )² p(x⁰) dx⁰ Dz, (62)

where we now have y = z√χ̂ + m̂x⁰ and z is a standard Gaussian RV. Similarly, the extremum condition for the variable χ reads

χ = (1/√χ̂) ∫∫ z x̂( z√χ̂ + m̂x⁰; Q̂ ) p(x⁰) dx⁰ Dz
  = ∫∫ (∂/∂(z√χ̂)) x̂( z√χ̂ + m̂x⁰; Q̂ ) p(x⁰) dx⁰ Dz, (63)

where the latter equality is obtained using the integration by parts formula. For many cases of interest, these equations can be evaluated analytically, or at least numerically, and they provide the single body representation of the variables in (19). If one substitutes (6) with π(x) = g_x(0; 1) into the above formulas along with c(x) = |x|, some (tedious) calculus shows that the results of the previous section are recovered. An alternative way of obtaining the parameters is provided in Appendices A and B.

Given {m̂, Q̂, χ̂}, one may now obtain the MSE of the original reconstruction described in Section II by considering an equivalent scalar problem, namely,

mse = ρ − 2m + Q = ∫∫ ( x⁰ − x̂( z√χ̂ + m̂x⁰; Q̂ ) )² p(x⁰) dx⁰ Dz. (64)

We may thus conclude that the minimizing x in (57) is the x̂ above, which can be interpreted as the output of a regularized LS estimator that postulates y = Q̂x̂ + Q̂^{1/2}z, or equivalently,

y = x + Q̂^{−1/2}z, (65)

while the true model is y = m̂x⁰ + z√χ̂, or equivalently,

y = x⁰ + z√χ̂/m̂. (66)

In the notation of [30] (resp. [28]), we thus have the relations ξ ↔ Q̂ (↔ λ_p) and η ↔ m̂²/χ̂ (↔ µ^{−1}) between the parameters. The above also implies m̂ = Q̂, as is indeed verified later in (70). We shall expand on the connection between this paper and [28], [30] in Section IV-B.

Let us denote by Λ* the solution of the extremization in (54) under the condition σ² = 0, namely,

Λ* − 1/χ = −(α/χ)( 1 − (λΛ*) · G_{AA^T}(−λΛ*) ), (67)

which is the same condition as (35) in Proposition 2. Then we can plug into (56) the identity

lim_{β→∞} (1/β) lim_{n→0} (∂/∂n) H_{β,λ}(nσ², ν(n)) = −(ασ²Λ*/2) G_{AA^T}(−λΛ*) + ( (ρ − 2m + Q)/2 )( Λ* − 1/χ ), (68)

where we used the fact that in the LSL, r → ρ in (18) by the weak law of large numbers.
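For the ℓ1-cost c(x) = |x|, elementary calculus gives the minimizer of (57) in closed form as the soft-thresholding rule x̂(y; Q̂) = sign(y) max(|y| − 1, 0)/Q̂, and the Gaussian integrals (61)–(64) can then be evaluated by Gauss–Hermite quadrature. The following is our sketch of the equivalent scalar channel MSE (64) under these formulas; the quadrature order and the test values are arbitrary choices.

```python
import numpy as np

def xhat_l1(y, Qh):
    """Minimizer of (57) for c(x) = |x|: soft thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - 1.0, 0.0) / Qh

def scalar_mse(mh, Qh, chih, rho):
    """MSE (64) of the scalar channel y = mh*x0 + sqrt(chih)*z."""
    # Gauss-Hermite nodes/weights for E over a standard Gaussian
    z, w = np.polynomial.hermite_e.hermegauss(61)
    w = w / np.sqrt(2 * np.pi)
    # Component x0 = 0 (weight 1 - rho)
    mse0 = np.sum(w * xhat_l1(np.sqrt(chih) * z, Qh) ** 2)
    # Component x0 ~ N(0, 1) (weight rho): second quadrature over x0
    x0, v = np.polynomial.hermite_e.hermegauss(61)
    v = v / np.sqrt(2 * np.pi)
    y = mh * x0[:, None] + np.sqrt(chih) * z[None, :]
    mse1 = np.sum(v[:, None] * w[None, :]
                  * (x0[:, None] - xhat_l1(y, Qh)) ** 2)
    return (1 - rho) * mse0 + rho * mse1

print(scalar_mse(mh=0.8, Qh=0.8, chih=0.3, rho=0.15))  # illustrative values
```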
The RS free energy thus becomes where we now have y = z χˆ+mx ˆ 0 and z is a standard Gaus-  QQˆ χχˆ sian RV. Similarly, the extremum condition for the variable χ f = extr mmˆ − + reads χ,Q,m,χ,ˆ Q,ˆ mˆ 2 2 1 Z Z ZZ p χ = √ z xˆzpχˆ +mx ˆ 0; Qˆp(x0)dx0Dz + p(x0)φz χˆ +mx ˆ 0; Qˆdx0Dz (69) χˆ ZZ 2 ∗   ∂ p 0  0 0 ασ Λ ∗ ρ − 2m + Q ∗ 1 = √ xˆ z χˆ +mx ˆ ; Qˆ p(x )dx Dz, (63) + G T (−λΛ ) − Λ − , ∂(z χˆ) 2 AA 2 χ 12 IEEE TRANSACTIONS ON INFORMATION THEORY where the parameters {m, Q, χ} satisfy (61)–(63) Note that from (195) by setting σ2 = 0 and χ = β(Q − q), namely so-far we have not made any assumptions about the details 2 of the function c(x), which means that (69) is valid for any Hβ,λ(σ = 0, ν = Q − q) type of regularization that separates as given in (10). In fact, 1  Z  ' extr Λχ − ln(x + λΛ)dFATA(x) , (73) we can go even further and solve the partial derivatives w.r.t. 2 Λ variables m and Q, which reveals that where we have omitted the terms that do not depend on Λ. 1 Qˆ =m ˆ = − Λ∗. (70) The solution to the extremization then provides the condition χ (we write Λ = Λ∗ here for simplicity) With some additional effort, one also finds that Z 1 χ = λ dFATA(x) = λGATA(−λΛ) (74)  1 ∂Λ∗  x + λΛ χˆ = (ρ − 2m + Q) + 1 χ χ2 ∂χ −1 ⇐⇒ Λ = − GATA , (75) ∗ λ λ 2 ∗ ∗ 0 ∗ ∂Λ −ασ G T (−λΛ ) − (λΛ ) · G T (−λΛ ) (71) AA AA ∂χ −1  where GATA GATA(s) = s is the functional inverse of the holds for the RS free energy with Stieltjes transform. Note that this also implies    ∗  −1 −1 χ χ ∂Λ 1 − α T T 2 0 ∗ GA A(−λΛ) = GA A GATA = . (76) = − + (αλ ) · G T (−λΛ ) . (72) λ λ ∂χ (Λ∗)2 AA R (z) = G−1(−z) − z−1 It is now easy to see that for the rotationally invariant setup, Using the definition X X of the R- our initial assumption of uniform sparsity, i.e., T = 1 and ρ = transform in (75) yields ρ1 is not necessary and the same set of equations is obtained 1  χ 1 −1 P R T − = − Λ. (77) for arbitrary sparsity pattern {ρt} that satisfies ρ = T t ρt. λ A A λ χ Similarly, we may obtain the equivalent representation for the T -orthogonal setup considered in Proposition 1. On the other hand, we know from (70) that a solution to (69) satisfies Qˆ =m ˆ = χ−1 − Λ, so that Remark 4. Comparing (67) together with (70)–(72) to the saddle-point conditions (34)–(37) given in Proposition 2 shows 1  χ Qˆ =m ˆ = R T − (78) that the choice of c or the marginal PDF of the non-zero λ A A λ elements in the source vector in (6) has no direct impact on the expressions that provide the variables {m,ˆ Q,ˆ χˆ}. Hence, the is the saddle-point condition for Qˆ and mˆ in terms of the R- form of these conditions is the same for all setups where the transform. Note that compared to the Stieltjes-transform that is sensing matrix is from the same ensemble. The parameters related to AAT at the saddle-point solution, the R-transform {m, Q, χ} on the other hand are affected by the choice of describes the eigenvalue spectrum of ATA. The condition (78) regularization and source distribution. As stated in Remark 2, matches [30, (131)] apart from a slightly different placements the effect of sensing matrix ensemble is the reverse, namely, of regularization parameters so that mˆ ↔ ξ as already for fixed c and p(x0), the form of {m, Q, χ} is always the remarked earlier. The above also suggests that apart from same while {m,ˆ Q,ˆ χˆ} can have different form depending on scalings by the regularization parameter χ ↔ E[σ2(Y ; ξ)], the choice of the measurement matrix. which can also be inferred from [28, Lemma 9]. 
Finally, we know from the above developments and [30, Appendix B] that χˆ ↔ f ? should hold if the results are equal. B. Alternative Representation of Rotationally Invariant Case To this end, let us examine the last line of (69), namely, and Comparison to Existing Results ασ2Λ ρ − 2m + Q 1  The saddle-point condition for the rotationally invariant case G T (−λΛ) − Λ − , (79) 2 AA 2 χ is described in Proposition 2 in terms of the Stieltjes transform of FAAT . In [30] the HCIZ-formula is used, which makes and substitute (75)–(77) there. Considering the end result as a it natural to express the results in terms of the R-transform function of χ, we obtain of FATA. Furthermore, different sets of auxiliary variables 2    are used in these two papers. In this section we sketch an ασ 1 1 χ χ ϕ(χ) = − R T − alternative representation of Proposition 2 that is equivalent 2 χ λ A A λ λ to [30] up to some minor scaling factors. This also implies ρ − 2m + Q  χ + R T − (80) that apart from minor differences in scalings, our results for 2λ A A λ the IID case are also equivalent to [28] as explained in [30, 1ρ − 2m + Q ασ2χ  χ Sec. IV-C] and shown in Fig. 1. ' − R T − , (81) 2 λ λ2 A A λ Let us first consider the conditions enforced by the matrix integration formula (54) through Λ. By the remark following where (81) is obtained by omitting the terms that do not (57), we know that the relevant terms can also be obtained depend on χ. We are interested in the point where the partial VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 13 derivative in (69) w.r.t. χ vanishes, that is, affect the MSE. In this special case though it is the same for ∂ the T -orthogonal and row-orthogonal setups — also for non- χˆ = −2 ϕ(χ) ∂χ uniform sparsities.     ρ − 2m + Q 0 χ The benefit of `1 and `2-norm regularizations is that both = R T − λ2 A A λ are of polynomial complexity. Implementation of (84) is trivial ασ2  ∂   χ and for solving (4) one may use standard convex optimization + χR T − cvx λ2 ∂χ A A λ tools like [6]. However, one may wonder if there are better choices for regularization when the goal is to reconstruct ασ2   χ = R T − a sparse vector. If we take the noise-free case as the guide, λ2 A A λ instead of say `1-norm, we should have a direct penalty on the   2  1 0 χ χασ number of non-zero elements in the source vector. We may + R T − ρ − 2m + Q − . (82) λ2 A A λ λ achieve this by so-called “zero-norm” based regularization. 8 Thus we have a formula for χˆ as a function of χ and mse, One way to write the corresponding cost function is expressed in terms of the R-transform (and its derivative R0). N X  Comparing to [30, (195)] we see that the expressions are c(x) = 1 xj ∈ R \{0} (88) the same apart from minor scalings. The differences can be j=1 explained by noticing that in [30]: 1) the noise variance is = number of non-zero elements in x. (89) σ2 = 1, 2) λ = γ−1 by definition, and 3) the matrices If the postulated and true scalar outputs are given by are square so that α = 1. Thus, we conclude that χˆ ↔ f ? (65) and (66), respectively, we obtain and Proposition 2 is indeed identical to [30], just expressed p differently. xˆ(y; Qˆ) = y · 1|y| > 2Qˆ, (90) which is just hard thresholding estimator of scalar input (66), C. Regularization with `2-norm and “zero-norm” given mismatched model (65). 
To proceed further, we need to As a first example of regularization other than `1-norm, fix the marginal distribution π(x) of the non-zero components consider the case when in (6). For the special case of Gaussian distribution, some N 2 1 X xj algebra provides the following result. c(x) = kxk2 = . (83) 2 2 j=1 Example 6. Let π(x) = gx(0; 1), that is, consider the case of Gaussian marginals (6) with the rotationally invariant setup This regularization implies that in the MAP-framework, the and general problem (2) with the “zero-norm” regularization desired signal is postulated to have a standard Gaussian given by (88). Define a function distribution. It is thus not surprising that such an assumption r √ reduces the estimate (14) to the standard linear form −h h r0(h) = e + Q( 2h). (91) T T −1 π xˆ = A (AA + λIM ) y, (84) Then, the average MSE for the rotationally invariant setup which is independent of the parameter β. For the replica mse = ρ − 2m + Q is obtained from analysis one obtains from (65) and (66)  mˆ  0 √ Qˆ Qˆ=m ˆ mxˆ + z χˆ m = 2ρr0 2 , (92) xˆ(y; Qˆ) = y = , (85) mˆ +χ ˆ 1 + Qˆ 1 +m ˆ  χˆ  mˆ  Q = 2(1 − ρ) r which can be interpreted as mismatched linear MMSE estima- mˆ 2 0 χˆ 0 tion of x from observation (66). From (61)–(63) we obtain mˆ 2 +χ ˆ  mˆ  + 2ρ r , (93) the following simple result. mˆ 2 0 mˆ 2 +χ ˆ Example 5. Let the distribution of the non-zero elements using the condition π(x) have zero-mean, unit variance and finite moments. For 2(1 − ρ) mˆ  2ρ  mˆ  rotationally invariant setup and general problem (2) with the χ = r + r (94) mˆ 0 χˆ mˆ 0 mˆ 2 +χ ˆ `2-regularization we have ρ +χ ˆ and equations (34)–(37) in Proposition 2. Furthermore, by mse = , (86) (1 +m ˆ )2 Remark 4, the T -orthogonal case can also be obtained easily 1 χ = . (87) 8As remarked also in [28, Section V-C], “zero-norm” regularization does 1 +m ˆ not satisfy the requirement that the normalization constant in (11) is well- defined for any finite M,N and β > 0. Hence, appropriate limits should Comparing to [30, (135)–(138)], we see that the results indeed be considered for mathematically rigorous treatment. However, using similar match as discussed in Section IV-B. The value of the parameter analysis as given in [32, Section 3.5], it is possible to show that the RS λ that minimizes the MSE is λ∗ = σ2/ρ. The marginal density solution for the “zero-norm” regularization is in fact always unstable due to the discontinuous nature of (90). For this reason, we skip the formalities of π(x) of the non-zero elements (6) has no impact on the MSE. taking appropriate limits and report the results as they arise by directly using The choice of the sensing matrix, on the other hand, does the form given in (88). 14 IEEE TRANSACTIONS ON INFORMATION THEORY

0 reconstruction was derived in the limit when the system size ℓ1 T−orthogonal grows very large. By introducing some novel results for taking −2 ℓ0 an expectation of a matrix in an exponential form, the previous ℓ2 results concerning standard Gaussian measurement matrix was −4 extended to more general ensembles. As specific examples, row-orthogonal, geometric and T -orthogonal random matrices −6 were considered in addition to the Gaussian one. The assump- tion about uniform sparsity of the source was also relaxed and blockwise sparsity levels were allowed. −8 The analytical results show that while the MSE of recon- struction depends only on the average sparsity level of the −10 source for rotationally invariant cases, the performance of T -

ℓ1 orthogonal setup depends on the individual sparsities of the normalized MSE [dB] −12 sub-blocks. In case of uniform sparsity, row-orthogonal and T -orthogonal setups have provably the same performance. It −14 was also found that while the row-orthogonal, geometric and Gaussian setups each fall under the category of rotationally

−16 ℓ0 invariant ensemble, that is known to have a unique perfect re- covery threshold in a noise-free setting, with additive noise the MSE performance of these ensembles can be very different. −18 1 1.5 2 2.5 3 The numerical experiments revealed the fact that under 1/α all considered settings, the standard Gaussian ensemble per- formed always worse than the orthogonal constructions. The Fig. 5. Normalized mean square error mse/ρ in decibels vs. inverse compression rate 1/α as given by Examples 5 and 6. Rotationally invariant MSE for the geometric ensemble was found to be an increasing ensemble with arbitrary sparsity pattern and T -orthogonal case with localized function of the peak-to-average ratio of the eigenvalues of the sparsity, namely, ρt = ρT for some t ∈ {1,...,T } and ρs = 0 ∀s 6= t. sensing matrix, suggesting that spectrally uniform sensing ma- Thin red lines for IID sensing matrix, thick blue lines for row-orthogonal case. Noise variance σ2 = 0.01 and the parameter λ is chosen so that the MSE is trices are beneficial for recovery. When the sparsity was non- minimized. The red lines match the corresponding curves in [28, Fig. 3]. uniform, the ranking of the orthogonal constructions depended on the noise level. For highly noisy measurements, the row- orthogonal measurement matrix was found to provide the best by first adding the block index t to all variables and then overall MSE, while relatively clean measurements benefited replacing (21) and (22) by the formulas given here. The last from the T -orthogonal sensing matrices. These findings show modification is to write (27) simply as Rt = χtmˆ t with the that the choice of random measurement matrix does have an χt given in (94). impact in the MSE of the reconstruction when noise is present. Furthermore, if the source does not have a uniform sparsity, To illustrate the above analytical results, we have plotted the effect becomes even more varied and complex. the normalized mean square error 10 log (mse/ρ) dB as 10 A natural extension of the current work is to consider a function of inverse compression rate 1/α in Fig. 5. The Bayesian optimal recovery and its message passing approx- axes are chosen so that the curves can be directly compared imation for the matrix ensembles that differ from the standard to [28, Fig. 3]. Note, however, that we plot only the region ones. Indeed, since the initial submission of the present paper, 1/3 ≤ α ≤ 1 (in contrast to 1/3 ≤ α ≤ 2 there) since this such algorithms have been developed and shown to benefit of corresponds to the assumption of CS setup and one cannot matrices with structure, see for example, [56]–[59]. construct a row-orthogonal matrix for α > 1. It is clear that for all estimators, using row-orthogonal ensemble for measurements is beneficial compared to the standard Gaussian APPENDIX A setup in this region. Furthermore, if the source has non- REPLICA ANALYSIS OF T -ORTHOGONAL SETUP uniform sparsity and `0 or `1 regularization is used, the T - A. Free Energy orthogonal setup (the black markers in the figure) provides Recall the T -orthogonal setup from Definition 2 and let the an additional gain in average MSE for the compression ratios partition of A be one that matches that of x, i.e., α = 1/2 and α = 1/3. T X V. CONCLUSIONS Ax = Otxt. 
(95) t=1 The main emphasis of the present paper was in the analy- We then recall (16), invoke the replica trick introduced in Sec- sis of `1-regularized least-squares estimation, also known as LASSO or basis pursuit denoising, in the noisy compressed tion II-B, and assume that the limits commute. The normalized free energy of the system reads thus sensing setting. Extensions to `2-norm and “zero-norm” reg- ularization were briefly discussed. Using the replica method 1 ∂ 1 f = − lim lim ln Ξβ,M (n), (96) from statistical physics, the mean square error behavior of T n→0+ ∂n β,M→∞ βM VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 15

a a √ where we denoted of radius ||Ot∆xt || = ||∆xt || = MQt that are centered at Z  n  the origin (see Appendix C). As Ot are sampled independently 0 X a Ξβ,M (n) = Ew,{Ot} p(x ; {ρt}) exp − β c(x ) among t = 1, 2,...,T , the cross-correlations for all t1 6= t2 a=1 reduce to  n T 2 n β X X a Y a E (O ∆xa )T(O ∆xb ) × exp − σw − Ot∆x dx (97) {Ot} t1 t1 t2 t2 2λ t a=1 t=1 a=0 = (∆xa )TE {OT O }∆xb t1 {Ot} t1 t2 t2 for notational convenience. The form of (97) implies that a T b = (∆x ) 0M ∆x = 0, (109) a 0 t1 t2 {x } are independently drawn according to (11), xt has the same distribution as xt, i.e., elements drawn according to where 0M is the M ×M zero matrix. On the other hand, given a 0 a t = t = t we obtain (6) for each block t = 1,...,T , and ∆xt = xt − xt for 1 2 a = 1, . . . , n. The outer expectation is w.r.t. the additive noise −1  a T b M E{Ot} (Ot∆xt ) (Ot∆xt ) and measurement matrices. −1 a T T b a n = M (∆x ) E{O }{O Ot}∆x For each set of random vectors {∆xt }a=1, let us now t t t t n×n −1 a T b [a,b] construct a matrix St ∈ R for all t = 1,...,T whose = M (∆xt ) IM ∆xt = St . (110) (a, b)th element represents the empirical covariance [a,b] Since St are in general non-zero for any pairs of replica [a,b] 1 a b St = ∆xt · ∆xt indexes a, b = 1, 2, . . . , n, the expectation in (97) is nontrivial M to compute. However, these correlations can be decoupled by kx0k2 x0 · xb xa · x0 xa · xb = t − t t − t t + t t (98) linearly transforming the variables using a matrix M M M M   n×n [0,0] [0,b] [a,0] [a,b] E = e1 e2 ··· en ∈ , (111) = Qt − Qt − Qt + Qt . (99) R T T √ T that satisfies E E = EE = In, and e1 = 1n/ n by We also construct a similar set of matrices {Qt}t=1 whose [a,b] definition. More precisely, if we let ∆x˜1 ··· ∆x˜n = elements {Qt } have the obvious definitions. The rotational t t ∆x1 ··· ∆xn E be the transformed vectors, symmetry of distributions wt and {Qt} indicates that t t  n T 2 1 n  1 n T β X X a E{Ot} Ot∆x˜t ··· Ot∆x˜t Ew,{O } exp − σw − Ot∆x (100) M t 2λ t o a=1 t=1  1 n × Ot∆x˜t ··· Ot∆x˜t [a,b] a n becomes a function of {Q } for any fixed set of {∆x } . 1 n T t t a=1 = E O ∆x1 ··· ∆xn E In addition, inserting a set of trivial identities M {Ot} t t t Z  1 n o n(n+1)/2 Y [a,b] a b [a,b] × Ot ∆xt ··· ∆xt E 1 = M δ(MQt − xt · xt )dQt 1 T 0≤a≤b≤n = ∆x1 ··· ∆xn E (101) M t t for t = 1, 2,...,T into (97) and performing the integration  T  1 n ×E{Ot} Ot Ot ∆xt ··· ∆xt E over {xa} yields an expression that allows for saddle point [a,b] 1  1 n T  1 n evaluation of Ξ (n) with respect to {Q }. = ∆x ··· ∆x E ∆x ··· ∆x E β,M t M t t t t To proceed with the analysis, we make the RS assumption S˜ . (112) which states that at the dominant saddle point, the overlap , t ˜ n×n matrices {Qt} are invariant under the rotation of the replica But St ∈ R is just indexes a = 1, 2, . . . , n (see also (18)) T S˜ t = E StE [0,0] Q = rt, (102) [1,2] [1,1] [1,2] t = diag(nSt + St − St , [0,b] [a,0] Q = Q = mt ∀a, b ≥ 1, (103) [1,1] [1,2] [1,1] [1,2] t t St − St ,...,St − St ), (113) [a,a] | {z } Qt = Qt ∀a ≥ 1, (104) n−1 times [a,b] Qt = qt ∀a 6= b ≥ 1. (105) T T since ea 1n = ea e1 = 0 for all a = 2, . . . , n and, thus, Under the RS assumption, the matrices St introduced above 1 E (O ∆x˜a)T(O ∆x˜b) have also a simple form M {Ot} t t t t  [1,2] T [1,1] [1,2] 0 if a 6= b, St = St 1n1n + (St − St )In, (106)  T n = n(rt − 2mt + qt) + Qt − qt if a = b = 1, where 1n = [1 ··· 1] ∈ R is an all-ones vector and  Qt − qt if a = b = 2, . . . 
, n, [1,1] St = rt − 2mt + Qt (107) (114) [1,2] St = rt − 2mt + qt. (108) a n holds. The above shows that for given {∆xt }a=1, the set a When Ot are independent Haar matrices, the vectors of vectors {Ot∆x˜t } are independent among t = 1,...,T a Ot∆xt are distributed on the M-dimensional hyper spheres and also uncorrelated in the space of replicas, as indicated 16 IEEE TRANSACTIONS ON INFORMATION THEORY

a by (114). This is in contrast to the original set {Ot∆xt } 2) Average the obtained result w.r.t. pβ,M (Qt; n) as given whose replica space correlation structure (110) is much more in (120) when β → ∞. cumbersome to deal with. The first step may be achieved by separately averaging over To proceed with the analysis, we first notice that since the replicas a = 1, . . . , n using the following result. Note this T EE = In, the quadratic term in (97) can be expressed as is always possible since we consider the setting of large M n T 2 n T 2 and hence n  M. X X a X X a Ot∆xt = Ot∆x˜t (115) T Lemma 1. Let {ut}t=1 be a set of length-M vectors that a=1 t=1 a=1 t=1 2 satisfy kutk = Mνt for some given non-negative reals {νt}. a using the uncorrelated random vectors {Ot∆x˜t }. For nota- Let {Ot} a set of independent Haar matrices and define tional convenience, we define next an auxiliary matrix E0 = MG (σ2,{ν }) − β kσw−PT O u k2  0 0  T e β,λ t = E e 2λ t=1 t t , (121) e1 ··· en = E so that w,{Ot} a  1 n 0 where w is a standard Gaussian random vector. Then, for ∆xt = ∆x˜t ··· ∆x˜t ea, a = 1, . . . , n, (116) large M 0 where {ea} again forms an orthonormal set that is independent T of t. Then, after the transformation (116), the linear term in 1 X  G (σ2, {ν }) = − T − ln λ + ln(βν ) (97) becomes β,λ t 2 t t=1 T n T n  T  T  X X T a X X T  1 n 0 1 X   2 X 1 w Ot∆xt = w Ot ∆x˜t ··· ∆x˜t ea + extr Λt(βνt) − ln Λt −ln λ + βσ + , 2 {Λ } Λ t=1 a=1 t=1 a=1 t t=1 t=1 t T (122) √ T X 1 = nw Ot∆x˜t , (117) t=1 where we have omitted terms of the order O(1/M). where the we used the fact that Proof: Proof is given in Appendix C-A. n a n Since {Ot∆x˜t }a=1 are uncorrelated, we can apply the X 0 T √ T ea = E 1n = n 0 ··· 0 . (118) above result to (119) separately for all replica indexes a = a=1 1, . . . , n in order to evaluate the expectations w.r.t. w and Combining the above findings and re-arranging implies that {Ot}. Thus, for n  M, we get (123) at the top of the next (97) can be equivalently expressed as page, where n is now just a parameter in Gβ,λ and is not enforced to be an integer by the function itself. The next step Ξ (n) β,M is to compute the integral over {Qt}. With some abuse of Z  √ T 2 notation, we start by using the Dirac’s delta identity (181) to β 2 X 1 = Ew,{Ot} exp − nσ w − Ot∆x˜t write 2λ t=1 Z  ˜[a,b]   n T 2 1 Y dQt β X X a pβ,M (Qt; n) = n × exp − Ot∆x˜ z 2πi 2λ t β,M 0≤a≤b≤n a=2 t=1    n  n X ˜[a,b] [a,b] ˜ X Y × exp M Q Q Vβ,M (Q ; n), (124) ×p(x0; {ρ }) exp − β c(xa) dxa. (119) t t t t 0≤a≤b≤n a=1 a=0 ˜ T Next, recall the definition of the matrix Q whose elements where {Qt}t=1 is a set of transform domain matrices whose t ˜[a,b] are as given in (99). From the identity of (101) we obtain the elements are {Qt } and probability weight for Qt, t = 1,...,T as Z n Y  a  ˜ 0 0 −βc(xt ) a Z n Vβ,M (Qt; n) = p(xt )dxt e dxt 1 0 0 Y  −βc(xa) a p (Q ; n) = p(x )dx e t dx a=1 β,M t zn t t t   β,M X [a,b] a=1 × exp − Q˜ xa · xb . (125) Y a b [a,b] t t t × δ(xt · xt − MQt ), (120) 0≤a≤b≤n 0≤a≤b≤n Note that n has to be an integer in (125) and the goal is a where c(xt ) is interpreted as in (10) to be a sum of scalar thus to write it in a form where n can be regarded as a real regularization functions and the normalization constant is valued non-negative variable. One can then verify that since given by z = z M −(n+1)/2, where z is as in (11). ˜[0,0] ˜[0,0] β,M β β Qt is connected only to the zeroth replica, Qt → 0 Then we proceed as follows: when n → 0. 
Therefore, it plays no role in the evaluation of T ˜[a,a] 1) Fix the matrices {Qt}t=1 so that the lengths St , a = the asymptotic free energy and consequently, the MSE. This a 1, . . . , n in (114) are constant and, thus, {∆x˜t } have is indeed a common feature of replica symmetric solution and fixed (squared) lengths. Then, assuming M grows with- similar conclusion can be found, e.g., in [30], [43] and [44]. ˜[0,0] out bound, average over the joint distribution of w, {Ot} To simplify notation, we therefore omit Qt from further a T and {∆x˜t }, given {Qt}t=1. consideration. VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 17

1 ln Ξ (n) M β,M Z T √  n−1 1 n β 2 PT 1 2 o n β PT 2 2 o Y   − k nσ w− Ot∆x˜ k − k Ot∆x˜ k = ln p (Q ; n)dQ E e 2λ t=1 t E e 2λ t=1 t M β,M t t w,{Ot} w,{Ot} t=1 Z T 1 Y 2 = ln p (Q ; n)dQ eMGβ,λ(nσ ,{n(rt−2mt+qt)+Qt−qt})eM(n−1)Gβ,λ(0,{Qt−qt}) (123) M β,M t t t=1

With some foresight we now impose the RS assumption on where the second equality follows by the fact that the x that ˜ ˆ 9 {Qt} via auxiliary parameters {Qt, mˆ t, χˆt} as minimizes the cost in (131) is given by y − 1 ˜[a,0] ˜[0,b] Q = Q = −βmˆ t, ∀a, b ≥ 1, (126)  , if y > 1; t t  Qˆ βQˆ − β2χˆ ˆ  Q˜[a,a] = t t , ∀a ≥ 1, (127) xˆ(y; Q) = 0, if |y| ≤ 1; (132) t 2 y + 1  ˜[a,b] 2  , if y < −1. Qt = −β χˆt, ∀a 6= b ≥ 1. (128)  Qˆ The saddle-point method then provides the following expres- Recalling that the elements of x0 are IID according to (6) with t sion π(x) = gx(0; 1) simplifies the function Vβ,M under the RS M Z h i assumption to Vβ,M (Q˜ ; n) = [Vβ(Qˆ ; n)] where ˆ p ˆ  t t Vβ(Qt; n) = (1 − ρt) exp − βnφ zt χˆt; Qt Dzt Z  n   n  Z h q i ˆ Y a X a +ρ exp − βnφz χˆ +m ˆ 2; Qˆ  Dz . Vβ(Qt; n) = dxt exp − β c(xt ) t t t t t t a=1 a=1 (133) (  ˆ n  n 2 βQt X 1 X × (1 − ρ ) exp − (xa)2 + βpχˆ xa t 2 t 2 t t Note that the structure of the equations does not force n a=1 a=1 to be an integer anymore, so we assume that analytical  ˆ n  q n 2) βQt X a 2 1 2 X a continuation can be used to take the limit n → 0. This provides +ρt exp − (xt ) + β χˆt +m ˆ t xt , ˆ 2 2 Vβ(Qt; n) → 1 for the data dependent part of the probability a=1 a=1 weight (124), which is consistent with (125). (129) Returning to (124) and denoting χt = β(Qt − qt), we have ˆ ˆ under RS ansatz and Qt should be read as a shorthand for the set {χˆt, Qt, mˆ t}. To assess the integrals in (129) w.r.t. the replicated variables pβ,M (Qt; n) we first decouple the quadratic terms that have summations 1 Z   Qˆ Q − χˆ χ χˆ q inside by using (182). By the fact that all integrals for a = = exp βM n t t t t − nmˆ m − n2β t t zn 2 t t 2 1, 2, . . . , n are identical we obtain β,M  1 ˆ ˆ ˆ + log Vβ(Qt; n) dQt, (134) Vβ(Qt; n) β n Z  Z 2 √  −β[Qˆtx /2−zt χˆtxt+c(xt)] ˆ ˆ = (1 − ρt) e t dxt Dzt where dQ is a short-hand for dˆχtdQtdm ˆ t. It is important to recognize that we have now managed to write the components Z  Z √ n −β[Qˆ x2/2−z χˆ +m ˆ 2x +c(x )] of the free energy in a functional form of n where the limit +ρ e t t t t t t t dx Dz , t t t n → 0 can be taken, at least in principle. Applying the saddle- (130) point method to integrate w.r.t. Qˆ and Q as β, M → ∞ and √ changing the order of extremization and partial derivation, we −z2/2 where Dzt = dzte t / 2π is the standard Gaussian get (135) at the top of the next page. Here we used the fact −1 measure. For large β we may then employ the saddle-point that β Gβ,λ(0, {Qt − qt}) → 0 as β → ∞ for χ, Λ ∈ R and integration w.r.t. xt. If we now specialize to LASSO recon- 1 ∂ ˆ struction (4) so that the per-element regularization function is − lim log Vβ(Qt; n) β n→0 ∂n c(x) = |x|, we may define Z p  = (1 − ρt) Dztφ zt χˆt; Qˆt Qˆ  φ(y; Qˆ) = min x2 − yx + |x| Z q x∈ 2 2 ˆ  R +ρt Dztφ zt χˆt +m ˆ t ; Qt . (136)  (|y| − 1)2 − , |y| > 1, 9 ˆ = ˆ Note that xˆ(y; Q) can be interpreted as soft thresholding of observation 2Q (131) y. Compare the above also to (60). For further discussion on the relevance 0 otherwise, and interpretation of this function, see Section IV. 18 IEEE TRANSACTIONS ON INFORMATION THEORY

 T  ˆ Z Z q  1 X QtQt χˆtχt p ˆ  2 ˆ  f = extr mˆ tmt − + + (1 − ρt) Dztφ zt χˆt; Qt + ρt Dztφ zt χˆt +m ˆ t ; Qt T ˆ 2 2 Q,Q t=1  ∂ 1 2 − lim lim Gβ,λ(nσ , {n(ρt − 2mt + qt) + Qt − qt}) (135) β→∞ n→0 ∂n β

−1 We also used above the fact that rt → ρt for large M and that all values of β > 0, which means χt (ρt − 2mt + qt) → 1 −1 the term n in (120) is irrelevant for the analysis because χt (ρt − 2mt + Qt) as β → ∞. zβ,M

1 ∂ n M→∞ lim ln zβ,M −−−−→ 0. (137) M n→0 ∂n B. Saddle-Point Conditions By the chain rule Using the short-hand notation

1 ∂ 2 T ∗ lim Gβ,λ(nσ , {νt(n)}t=1) Rt = Rt(λ, {Λt }), (145) β n→0 ∂n σ2 ∂G (nσ2, {ν (n)}T ) the partial derivatives w.r.t. {m ,Q } in (142) provide the = lim β,λ t t=1 t t β n→0 ∂(nσ2) saddle-point conditions T   2 T R 1 1 X ∂νt(n) ∂Gβ,λ(nσ , {νt(n)}t=1) ˆ t ∗ + lim , (138) Qt =m ˆ t = = − Λt . (146) β n→0 ∂n ∂ν (n) χt χt t=1 t By the fact that so that by plugging νt(n) = n(ρt − 2mt + qt) + Qt − qt to (138) we have ∂ ∂h  1  r(h) = − Q √ , (147) 1 ∂ 2 T ∂x ∂x h lim Gβ,λ(nσ , {νt(n)}t=1) β n→0 ∂n we may assess the partial derivatives w.r.t. the variables 2  T −1 T   σ X 1 X ρt − 2mt + qt 1 {Qˆ , mˆ , χˆ } as well to obtain = − λ + + Λ∗ − , t t t 2 Λ∗ 2 t χ t=1 t t=1 t  1  mt = 2ρtQ , (148) (139) p 2 χˆt +m ˆ t where {Λ∗} denotes the solution to the extremization problem 2(1 − ρ ) 2ρ t Q = − t r(ˆχ ) − t r(ˆχ +m ˆ 2), (149) in (122), given σ2 = 0. t 2 t 2 t t mˆ t mˆ t To solve the last integrals in (135), let us denote     2(1 − ρt) 1 2ρt 1 χt = Q √ + Q , (150) r   mˆ mˆ p 2 h − 1 1 t χˆt t χˆt +m ˆ t r(h) = e 2h − (1 + h)Q √ . (140) 2π h where we used the identity Qˆt =m ˆ t to simplify the results. With some calculus one may verify that for h > 0 The MSE of the reconstruction for xt thus becomes Z √ ˆ  r(h) φ zt h; Qt Dzt = , (141) mse = ρ − 2m + Q Qˆ t t t t t  1  which implies that combining all of the above, the free energy = ρt − 4ρtQ pχˆ +m ˆ 2 has the form (142) given at the top of the next page. To obtain t t this result, we used the fact that denoting 2(1 − ρt) 2ρt 2 − 2 r(ˆχt) − 2 r(ˆχt +m ˆ t ). (151) mˆ t mˆ t 1  T 1 −1 X ∗ Rt(λ, {Λt}) = λ + , (143) Finally, recalling that Λt is a function of {χt}, we obtain from Λt Λs s=1 the partial derivative of χt the extremization in (122) implies the condition T ∗ mset X 2 2 ∗ 1 Rt(λ, {Λt }) χˆt = 2 + (mses − σ Rs)∆s,t, (152) Λt − = − , (144) χt χt χt s=1

∂Λs between the variables χt and {Λt}. Furthermore, in order to where we denoted ∆s,t = for the partial derivative of Λs ∂χt have a meaningful solution to (142) and (144) for σ > 0, we w.r.t. χt. 10 also need to have χt = β(Qt − qt) positive and finite for To solve the equation for χˆt, we need an expression for ∆s,t. We do this via the inverse function theorem that relates 10The case of χ → 0 is in fact relevant for the noise-free scenario σ = 0 t the Jacobian matrices as and corresponds to the perfect recovery condition ρt = mt = Qt =⇒ mset = ρt − 2mt + Qt = 0, which automatically satisfies Qt = qt as well.  −1 ˆ ∂(Λ ,..., Λ ) ∂(χ , . . . , χ ) Furthermore, for this scenario Q =m ˆ → ∞ as β → ∞, while in the noisy ∆ = 1 T = 1 T . (153) case they are always positive and finite parameters. ∂(χ1, . . . , χT ) ∂(Λ1,..., ΛT ) VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 19

 T  ˆ  1 X QtQt χˆtχt 1 − ρt ρt 2 f = extr mˆ tmt − + + r(ˆχt) + r(ˆχt +m ˆ t ) T ˆ 2 2 ˆ ˆ {mt,Qt,χt,mˆ t,Qt,χˆt} t=1 Qt Qt 2  T −1 T   σ X 1 X ρt − 2mt + Qt 1 + λ + + − Λ∗ (142) 2 Λ∗ 2 χ t t=1 t t=1 t

Here the (i, j)th element of the Jacobian ∂(χ1,...,χT ) is given where ∆xa = x0 − xa ∈ N for a = 1, . . . , n, so that the ∂(Λ1,...,ΛT ) R by counterpart of (96) reads

∂χi ∂ (1 − Ri) ∂ 1 = f = − lim lim ln Ξβ,N (n). (158) n→0+ β,N→∞ ∂Λj ∂Λj Λi ∂n βN (1 − Ri) 1 ∂Ri The goal is then to assess the normalized free energy (158) = − 2 δij − Λ Λi ∂Λj by following the same steps as given in Appendix A. i n×n Let us construct matrices St ∈ and Q for all (1 − 2Ri) RiRj R t = − 2 δij − . (154) t = 1,...,T with elements as given in (98) and (99). Λi ΛiΛj −1 P Also define the “empirical mean” matrices S = T t St, T −1 P [a,b] In other words, denoting b = [R1/Λ1 ··· RT /ΛT ] and Q = T t Qt, that have the respective elements S defining C to be diagonal matrix whose (t, t)th entry is given and Q[a,b] and invoke the RS assumption (102)–(105). We −2 a a by (1−2Rt)Λt , we obtain by (153) and the matrix inversion then make the transformation {∆xt } → {∆x˜t } as with lemma the desired Jacobian as the T -orthogonal setup so that the empirical correlations of a −1 −1 T {∆x˜t } satisfy (114). Note that this means that given {Qt}, T −1 −1 (C b)(C b) a  a T a TT ∆ = −(C + bb ) = −C + T −1 , (155) the transformed vectors ∆x˜ = (∆x˜1) ··· (∆x˜T ) 1 + b C b satisfy which means that T a 2 ˆ X ˜[a,b] ˜[a,b] k∆x˜ k = M St = NS , (159)  T 2 −1 RsRtΛsΛt X Rk t=1 ∆s,t = 1 + (1 − 2Rs)(1 − 2Rt) 1 − 2Rk ˜[a,b] −1 P ˜[a,b] k=1 where S = T t St . Combining the above provides 2 Λt the counterpart of (119) as − δst. (156) 1 − 2Rt Ξβ,N (n) Combining all the results completes the derivation. Z  n   n  Y a 0 X a = Ew,{Ot} dx p(x ; {ρt}) exp − β c(x ) a=0 a=1 APPENDIX B n  β √ β X  REPLICA ANALYSIS OF ROTATIONALLY INVARIANT SETUP × exp − k nσ2w − A∆x˜1k2 − kA∆x˜ak2 . 2λ 2λ The derivation in this Appendix provides an end result that a=2 is essentially the same as the HCIZ-formula based [45], [46] (160) approach used in Section IV and Appendix B in [30]. In our We then need the following small result to proceed. case the difference is that the source does not need to have M×N IID elements, but can have a block structure. Furthermore, Lemma 2. Consider the case where A ∈ R is sampled our analytical approach is slightly different to the one in [30] from the rotationally invariant setup given in Definition 2. Let T ˆ 2 since we do not seek to find first a decoupling result for finite {ut}t=1 be a fixed set of length-M vectors satisfying kutk = ˆ ˆ β and then use hardening arguments as in [28] to obtain the Mνt for some given non-negative values {νt} and N = T M. N final result when β → ∞. Both end results are equivalent as Denote u ∈ R for the vector obtained by stacking {ut} and shown in Section IV-B. define 2 β 2 NHβ,λ(σ ,{νt}) − kσw−Auk Recall the rotationally invariant setup as given in Defini- e = Ew,Ae 2λ , (161) 0 tion 2. Let p(x ; {ρt}) be the distribution of the source vector 0 N 0 ˆ where w is a standard Gaussian random vector. Then, for x ∈ R and assume that each of the sub-vectors xt has M elements drawn independently according to (6). Clearly we large N ˆ ˆ 2 2 have to have N = MT but it is not necessary to have M = M Hβ,λ(σ , ν) = Hβ,λ(σ , {νt}) as in the case of T -orthogonal setup. Define 1  = extr Λ(βν) − (1 − α) ln Λ Z  n  2 Λ 0 X a Ξβ,N (n) = Ew,A p(x ; {ρt}) exp − β c(x ) Z  2 a=1 −α ln(Λβσ + Λλ + x)dFAAT (x) n n  β X  Y × exp − kσw − A∆xak2 dxa, (157) 1 + ln(βν) − α ln λ 2λ − , (162) a=1 a=0 2 20 IEEE TRANSACTIONS ON INFORMATION THEORY

−1 PT where ν = T t=1 νt and we omitted terms of the order where we used the Stieltjes transform of FAAT (x), O(1/N). Z 1 G T (s) = dF T (x), (168) Proof: Proof is given in Appendix C-B. AA x − s AA Notice that the H-function in (162) depends on the pa- along with the chain rule rameters {νt} only through the “empirical mean” ν = −1 PT 1 ∂ 2 T t=1 νt. This will translate later to the fact that the lim lim Hβ,λ(nσ , ν(n)) performance of rotationally invariant setup depends on the β→∞ β n→0 ∂n 2 2 −1 P σ ∂Hβ,λ(nσ , ν(n)) sparsities {ρt} only through ρ = T t ρt. With the above in = lim lim n→0 2 mind, we may obtain the probability weight pβ,N (Q; n) of Q β→∞ β ∂(nσ ) by using (101) with suitable variable substitutions. Applying 1 ∂ν(n)∂H (nσ2, ν(n)) + lim lim β,λ then Lemma 2 to (160) provides β→∞ β n→0 ∂n ∂ν(n) 2 Z ∗   1 ασ Λ ρ − 2m + Q ∗ 1 ln Ξ (n) = − dF T (x) + Λ − N β,N 2 λΛ∗ + x AA 2 χ 1 Z 2 2 ∗   = ln p (Q; n)eNHβ,λ(nσ ,n(r−2m+q)+Q−q) ασ Λ ∗ ρ − 2m + Q ∗ 1 β,N = − G T (−λΛ ) + Λ − , N 2 AA 2 χ ×eN(n−1)Hβ,λ(0,Q−q)dQ, (163) (169) −1 P −1 P −1 P ∗ where r = T t rt, m = T t mt,Q = T t Qt, where ν(n) = n(r − 2m + q) + Q − q. Here Λ is the solution −1 P 2 and q = T t qt are the “averaged” versions of the RS to the extremization in (162), given σ = 0, and satisfies the variables {rt, mt,Qt, qt}. The probability weight of Q reads condition ˆ ∗ Z  Y dQ˜[a,b]  ∗ 1 α ∗ ∗  R(Λ ) p (Q; n) = N n(n+1)/2 Λ − = − 1−(λΛ )·GAAT (−λΛ ) = − , (170) β,N 2πi χ χ χ 0≤a≤b≤n   where X ˜[a,b] [a,b] ˜ × exp N Q Q Vβ,N (Q; n), (164) ∗  ∗ ∗  Rˆ(Λ ) = α 1 − (λΛ ) · G T (−λΛ ) . (171) 0≤a≤b≤n AA Finally, we need to resolve the saddle point conditions in where Q˜ is a (n+1)×(n+1) transform domain matrix whose (167). The partial derivatives w.r.t. {m, Q} provide elements are {Q˜[a,b]} and ˆ ∗ n ˆ 1 ∗ R(Λ ) Z  a  Q =m ˆ = − Λ = , (172) 0 0 Y −βkx k1 a Vβ,N (Q˜ ; n) = p(x )dx e dx χ χ a=1 while the partial derivatives w.r.t. {Q,ˆ m,ˆ χˆ} are of the same  X  × exp − Q˜[a,b]xa · xb . (165) format as in (148)–(150) but without indexes t. Finally, ∗ 0≤a≤b≤n recalling that Λ depends on χ   We then get directly using the arguments from Appendix A ∂ ∗ ∂Λ 0 ∗ G T (−λΛ ) = −λ G T (−λΛ ), (173) that (165) becomes in the limit N → ∞ ∂χ AA ∂χ AA Z 0 h p i where G T denotes the derivative of G T w.r.t. the argu- V (Qˆ ; n) = (1 − ρ) exp − βnφz χˆ; Qˆ Dz AA AA β ment, gives Z h i p 2 ˆ  1 ∂Λ∗  +ρ exp − βnφ z χˆ +m ˆ ; Q Dz, (166) χˆ = mse + χ2 ∂χ −1 P ∗ where ρ = T t ρt is the expected sparsity of the entire 2 ∗ ∗ 0 ∗ ∂Λ −ασ G T (−λΛ ) − (λΛ ) · G T (−λΛ ) , (174) source vector x. Therefore, the details of how the non- AA AA ∂χ zero elements are distributed on different sub-blocks {xt} is in which irrelevant for the rotationally invariant case. ∗  −1 Combining everything above and denoting ∂Λ 1 − α 2 0 ∗ = − + (αλ ) · G T (−λΛ ) . (175) −1 P ∂χ (Λ∗)2 AA χ = T t β(Qt − qt) implies that the free energy for the rotationally invariant case reads To obtain the last formula we used the fact that  ˆ ˆ QQ χχˆ ∂χ 1 ˆ 1 ∂R f = extr mmˆ − + ∗ = − ∗ 2 (1 − R) − ∗ ∗ {m,Q,χ,m,ˆ Q,ˆ χˆ} 2 2 ∂Λ (Λ ) Λ ∂Λ 1 − α  1  2  2 0 ∗ + (1 − ρ)r(ˆχ) − ρr(ˆχ +m ˆ ) = − + (αλ ) · GAAT (−λΛ ) . (176) Qˆ (Λ∗)2 2 ∗   ασ Λ ∗ ρ − 2m + Q 1 ∗ Remark 5. Consider the row-orthogonal setup where + GAAT (−λΛ ) + − Λ , 2 2 χ 1 G T (s) = . (177) (167) AA α−1 − s VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 21

For this case, the extremization in (162) can also be written where the second equality is obtained by solving the extrem- in the form ization problem. Substituting (184) back to (180) provides an (M) 1 1  1  expression for p ({ut}; {νt}). Λ − = − βν βν α−1 + Λ(λ + βσ2) Recall the T -orthogonal setup given in Definition 2. Fix the 1  1  parameters M, β, λ and define −−−→−σ=0 . (178) βν α−1 + Λλ (M) 2 Gβ,λ (σ , {νt}) We may then plug (177) and (178) to (167) and compare 1 − β kσw−PT O ∆x k2 = ln E e 2λ t=1 t t the end result with (142)–(144). It is clear that the two free M w,{Ot} energies are exactly the same if we set α = 1/T and ρ = ρt so 1 Z = ln E p(M)({u }; {ν }) that ν = νt and Λ = Λt for all t = 1,...,T . Therefore, also M w t t the saddle point solutions of row-orthogonal and T -orthogonal √ √ T 1 2 PT 2 − k βσ w− β utk Y setups match for this special case and the MSE is the same. ×e 2λ t=1 dut, (185) t=1 APPENDIX C USEFUL MATRIX INTEGRALS where {Ot}, {∆xt}, {ut} and {νt} are as before. Applying A. T -Orthogonal Setup the Gaussian integration formula (182) from right-to-left along T with the expressions (180) and (184) provides Let {Ot}t=1 be a set of independent M ×M Haar matrices T (M) 2 and {∆xt}t=1 a set of (fixed) length-M vectors that satisfy G (σ , {νt}) 2 β,λ k∆xtk = Mνt for some given non-negative values {νt}. T 1 Z  M  Z Given {∆xt} and {νt}, the vector ut = Ot∆xt is uniformly Y Λtνt a(k,w) = ln Ew dΛte 2 dke distributed on a surface of a sphere that has a fixed radius M √ t=1 Mν for each t = 1,...,T . Thus, the joint PDF of {u } t t T √ Z  1 2 T  Y − Λtkutk −i βk ut reads × e 2 dut (M) t=1 p ({ut}; {νt}) T 1 λ 1 + ln − ln Z({νt}), (186) 1 Y 2 = δ(kutk − Mνt) (179) 2 2π M Z({νt}) t=1 where k ∈ RM , the normalization factor is given in (184) and −T T (4πi) Z  Λt 2  we denoted Y − (kutk −Mνt) = e 2 dΛt , (180) Z({νt}) λ p t=1 a(k, w) = − kkk2 + i βσ2kTw. (187) 2 where Z({νt}) is the normalization factor, {Λt} is a set of complex numbers and we used the identity Using next Gaussian integration repeatedly to assess the expec- Z c+i∞ tations w.r.t. {ut}, k and w yields (188) at the top of the next 1 − Λ (t−a) δ(t − a) = e 2 dΛ, (181) page. We then change the integration variables as Λt → βΛt, 4πi c−i∞ take the limit M → ∞ and employ saddle-point integration. where a, c, t ∈ R, Λ ∈ C. Using the Gaussian integration Omitting all terms that vanish in the large-M limit provides formula the final expression Z 1 − 1 zTMz+bTz 1 1 bTM −1b e 2 dz = e 2 ,  T  N/2 p 1 X (2π) det(M) G (σ2, {ν }) = − T − ln λ + ln(βν ) β,λ t 2 t (182) t=1 N where b, z ∈ R and M is symmetric positive definite, the  T  T  1 X   2 X 1 normalization factor becomes + extr Λt(βνt)−ln Λt −ln λ + βσ + . 2 {Λ } Λ T t t=1 t=1 t 1 Z  Λt 1 2  Y Mνt − Λtkutk Z({νt}) = e 2 e 2 dutdΛt (189) (4πi)T t=1 2  M/2 T Z T Finally, we remark that the extremization in Gβ,λ(σ , {νt}) (2π) Y  M (Λ ν −ln Λ )  = e 2 t t t dΛ . (183) as given above enforces the condition 4πi t t=1 −1 ! 2 1 Λt Since the argument of the exponent in (183) is a complex βνt(σ , β, λ) = 1 − , (190) Λ 2 PT −1 analytic function of {Λt} and we are interested in the large- t λ + βσ + t=1 Λt M asymptotic, the saddle-point method further simplifies the implying Λ ∈ \{0} for all {β, λ, σ2} and t = 1,...,T . normalization factor to the form t R Thus, the expression (189) together with the condition (190) T 1 1 X  −1 provides the solution to the integration problem defined in ln Z({νt}) = extr Λtνt − ln Λt + O(M ) M 2 Λt (185). 
Furthermore, for the special case of σ = 0 we have t=1 T −1 ! X 1 + ln νt 1 Λ = + O(M −1), (184) βν (σ2 = 0, β, λ) = 1 − t , (191) 2 t Λ PT −1 t=1 t λ + k=1 Λk 22 IEEE TRANSACTIONS ON INFORMATION THEORY

Z T Z   T   (M) 2 1 Y  M Λ ν − M ln Λ  1 X β 2 p T G (σ , {ν }) = ln E dΛ e 2 t t 2 t exp − λ + kkk + i βσ2w k dk β,λ t M w t 2 Λ t=1 t=1 t 1 λ 1 + ln − ln Z({ν }) 2 2π M t T T T T 1 Z  Y  M  X X  X β  = ln dΛ exp Λ ν − ln Λ − ln λ + M t 2 t t t Λ t=1 t=1 t=1 t=1 t T −1 1 Z  1  X β    1 1 × exp − 1 + βσ2 λ + kwk2 dw + ln λ − ln Z({ν }) (2π)M/2 2 Λ 2 M t t=1 t T T T T 1 Z M  X X  X β  Y = ln exp Λ ν − ln Λ − ln λ + βσ2 + dΛ M 2 t t t Λ t t=1 t=1 t=1 t t=1 1 1 + ln λ − ln Z({ν }) (188) 2 M t

2 −1 2 so that νt(σ = 0, β → ∞, λ) → 0 and β Gβ,λ(σ = fixed ratio α = M/N, this expectation is by assumption w.r.t. β→∞ a probability measure that has a single non-zero point corre- 0, {νt}) −−−−→ 0. This is fully compatible with the earlier result obtained in [32], as expected. sponding to the limiting deterministic eigenvalue distribution FATA. Finally, using saddle point method to integrate over Λ, we obtain B. Rotationally Invariant Setup M×N 2 Let us consider the case where A ∈ R is sam- Hβ,λ(σ , ν) pled from an ensemble that allows the decomposition R = 1  Z  1   T T = extr Λ(βν) − ln Λ + x dFATA(x) A A = O DO where O is an N × N Haar matrix and 2 Λ λ + βσ2 D = diag(d1, . . . , dN ) contains the eigenvalues of R. This is α  βσ2  1 + ln(βν) the case of rotationally invariant setup given in Definition 2. − ln 1 + − (195) T ˆ 2 λ 2 Furthermore, let {∆xt}t=1 be a set of (fixed) length-M vec-  2 ˆ 1 tors satisfying k∆xtk = Mνt for some given non-negative = extr Λ(βν) − (1 − α) ln Λ 2 Λ values {νt} and N = T Mˆ . For notational convenience, we N Z  write ∆x ∈ for the vector obtained by stacking {∆xt}. 2 R −α ln(Λβσ + Λλ + x)dF T (x) The counterpart of (185) reads then AA 1 + ln(βν) − α ln λ H(N)(σ2, {ν }) − , (196) β,λ t 2 1 − β kσw−A∆xk2 = ln Ew,Ae 2λ , where the second equality is obtained by changing the integral N 2 α  βσ2  measure and simplifying. For the case σ = 0, the extremiza- = − ln 1 + tion then provides the condition 2 λ      Z 2  1 1 β T 1 α Λβσ + Λλ + ln ER exp − ∆x R∆x , (192) Λ − = − 1 − dF T (x) N 2 λ + βσ2 βν βν Λβσ2 + Λλ + x AA where the second equality follows by using Gaussian integra- σ2=0 1 α   −−−→ Λ − = − 1 − (Λλ)GAAT (−Λλ) , (197) tion formula (182) to average over the additive noise term w. βν βν T Recall next the fact that R = O DO and denote u = O∆x. where we used again the Stieltjes transformation (168) of Since O are Haar matrices and FAAT (x). T X νt kO∆xk2 = T Mˆ = Nν, (193) T t=1 APPENDIX D GEOMETRIC ENSEMBLE where ν is the “empirical average” over {νt}, we get by the same arguments as in Appendix C-A an expression for Recall that the geometric singular value ensemble is gen- (N) 2 A = UΣV T U V Hβ,λ (σ , ν) as given in (194) at the top of the next page. erated as where and are independent Considering next the limit of large M and N, we replace Haar matrices. The diagonal elements of Σ are the singu- p m−1 the summation in (194) by an integral over the empirical lar values σm = a(κ)τ , m = 1,...,M of A with −1 PM distribution of the eigenvalues (9), so that the outer expectation τ ∈ (0, 1] and a(κ) > 0 such that N m=1 λm = 1 2 T w.r.t. D becomes an expectation over all empirical eigenvalue where λm = σm are the eigenvalues of AA . Alternatively, −γM (i/M) distributions of R. But when M,N → ∞ with a finite and we may write λi+1 = a(κ)e , i = 0, 1,...,M − 1 VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 23

Z Z     (N) 2 1 Λ Nν 1 T β H (σ , ν) = ln E dΛe 2 exp − u ΛI + D u du β,λ N D 2 N λ + βσ2 α  βσ2  1 + ln ν − ln 1 + − + O(N −1) 2 λ 2 N 1 Z N  1 X  1  = ln E exp Λ(βν) − ln Λ + d dΛ N D 2 N λ + βσ2 n n=1 α  βσ2  1 + ln(βν) − ln 1 + − + O(N −1) (194) 2 λ 2

where γM = −2M ln τ ≥ 0. Letting M → ∞ provides the 3) Replace S by Σ to create a sensing matrix X = continuous limit function for the eigenvalues UΣV T. Note that permutations of the diagonal ele- ments in Σ has no impact on the reconstruction per- λ(t) = A(κ)e−γt, t ∈ [0, 1), (198) formance. where γ > 0 satisfies λ(0) γ REFERENCES κ = = , (199) R 1 1 − e−γ [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, 0 λ(t)dt vol. 52, no. 4, pp. 1289–1306, Apr. 2006. for the given peak-to-average ratio κ. The normalization [2] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency informa- −1 PM condition N m=1 λm = 1 becomes now tion,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006. 1 [3] E. J. Candes and T. Tao, “Near-optimal signal recovery from random Z κ projections: Universal encoding strategies?” IEEE Trans. Inf. Theory, α A(κ)e−γtdt = 1 ⇐⇒ A(κ) = , (200) α vol. 52, no. 12, pp. 5406–5425, Dec. 2006. 0 [4] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. κ −γ κ Royal. Statist. Soc., Ser. B, vol. 58, no. 1, pp. 267–288, 1996. which means that λ(t) ∈ [ α e , α ]. T [5] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition The function (198) describes the eigenvalues of AA in by basis pursuit,” SIAM J. Sci Comp., vol. 20, no. 1, pp. 33–61, 1998. the large system limit. Since the order of the eigenvalues and [6] CVX Research, Inc., “CVX: Matlab software for disciplined convex associated eigenvectors does not affect the performance of the programming,” http://cvxr.com/cvx. [7] D. Donoho and X. Huo, “Uncertainty principles and ideal atomic reconstruction, we may also consider sampling randomly and decomposition,” IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2845– uniformly t ∈ [0, 1) and assigning the corresponding eigenval- 2862, Nov. 2001. ues according to (198). Then, by construction the limit of (9) [8] M. Elad and A. Bruckstein, “A generalized uncertainty principle and −γt sparse representation in pairs of bases,” IEEE Trans. Inform. Theory, for this ensemble is given by FAAT (Ae ) = 1−t, t ∈ [0, 1) vol. 48, no. 9, pp. 2558–2567, Sep. 2002. or more conveniently [9] D. Donoho and M. Elad, “Optimally sparse representation in general ( (non-orthogonal) dictionaries via l1 minimization,” Proc. Nat. Acad. 1 + γ−1 ln x − γ−1 ln A, if x ∈ (Ae−γ ,A], Sci., vol. 100, no. 5, pp. 2197–2202, Nov. 2003. FAAT (x) = [10] E. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. 0, otherwise, Inform. Theory, vol. 51, no. 12, pp. 4203 – 4215, dec. 2005. (201) [11] R. Baraniuk, M. Davenport, R. Devore, and M. Wakin, “A simple proof where we wrote for simplicity A = A(κ). This is also called of the restricted isometry property for random matrices,” Constr Approx, vol. 28, no. 3, pp. 253–263, 2008. the reciprocal distribution whose density reads [12] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, ( 1 −γ γx , if x ∈ (Ae ,A], no. 12, pp. 4655–4666, Dec. 2007. f T (x) = (202) AA 0, otherwise. [13] M. A. Davenport and W. B. Wakin, “Analysis of orthogonal matching pursuit using the restricted isometry property,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4395–4401, Sep. 2010. For the analysis, one can obtain the Stieltjes transform of (202) [14] T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal directly from the definition (168), as given in Example 4. 
recovery with noise,” IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4680– The sensing matrices for the geometric setup in finite size 4688, Jul. 2011. [15] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing simulations, on the other hand, can be constructed as follows: signal reconstruction,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230– 1) Generate M × N matrix X with IID standard normal 2249, May 2009. elements and calculate the singular value decomposition [16] D. Needell and J. A. Tropp, “CoSaMP: Iterative signal recovery from T incomplete and inaccurate samples,” Applied and Computational Har- X = USV . For the Gaussian ensemble, U and V monic Analysis, vol. 26, no. 3, pp. 301–321, 2009. are independent Haar matrices. [17] X. Lv, G. Bi, and C. Wan, “The group lasso for stable recovery of block- 2) Find numerically the value of τ that meets the peak-to- sparse signal representations,” IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1371–1382, Apr. 2011. average constraint (8) and set [18] S. K. Ambat, S. Chatterjee, and K. V. S. Hari, “Fusion of algorithms 1 for compressed sensing,” IEEE Trans. Signal Process., vol. 61, no. 14, a(κ) = , (203) pp. 3699–3704, Jul. 2013. −1 PM 2(m−1) [19] D. Donoho and J. Tanner, “Counting faces of randomly projected N m=1 τ polytopes when the projection radically lowers dimension,” Journal of so that the average power constraint is satisfied. the American Mathematical Society, vol. 22, no. 1, pp. 1–53, 2009. 24 IEEE TRANSACTIONS ON INFORMATION THEORY

[20] D. L. Donoho and J. Tanner, “Counting the faces of randomly-projected [49] Y. Wu and S. Verdu,´ “Optimal phase transitions in compressed sensing,” hypercubes and orthants, with applications,” Discrete & Computational IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6241–6263, Oct. 2012. Geometry, vol. 43, no. 3, pp. 522–541, 2010. [50] S. B. Korada and A. Montanari, “Applications of the Lindeberg principle [21] ——, “Precise undersampling theorems,” Proc. IEEE, vol. 98, no. 6, pp. in communications and statistical learning,” IEEE Trans. Inf. Theory, 913–924, Jun. 2010. vol. 57, no. 4, pp. 2440–2450, Apr. 2011. [22] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algo- [51] P. Viswanath, V. Anantharam, and D. N. C. Tse, “Optimal sequences, rithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, pp. power control, and user capacity of synchronous CDMA systems with 18 914–18 919, 2009. linear MMSE multiuser receivers,” IEEE Trans. Inf. Theory, vol. 45, [23] A. Montanari, “Graphical models concepts in compressed sensing,” in no. 6, pp. 1968–1983, Sep. 1999. Compressed sensing: Theory and Applications, Y. Eldar and G. Ku- [52] K. Kitagawa and T. Tanaka, “Optimization of sequences in CDMA tyniok, Eds. Cambridge University Press, 2012, pp. 394–438. systems: A statistical-mechanics approach,” Computer Networks, vol. 54, [24] M. Bayati and A. Montanari, “The dynamics of message passing on no. 6, pp. 917–924, 2010. dense graphs, with applications to compressed sensing,” IEEE Trans. [53] S. Oymak and B. Hassibi, “A case for orthogonal measurements in linear Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011. inverse problems,” in Proc. IEEE Int. Symp. Inform. Theory, 2014, pp. [25] M. Mezard,´ G. Parisi, and M. A. Virasoro, Spin Glass Theory and 3175–3179. Beyond. Singapore: World Scientific, 1987. [54] C. Thrampoulidis and B. Hassibi, “Isotropically random orthogonal [26] V. Dotsenko, Introduction to the Replica Theory of Disordered Statistical matrices: Performance of LASSO and minimum conic singular values,” Systems. New York: Cambridge University Press, 2001. in Proc. IEEE Int. Symp. Inform. Theory, 2015, pp. 556–560. [27] H. Nishimori, Statistical Physics of Spin Glasses and Information [55] C.-K. Wen, J. Zhang, K.-K. Wong, J.-C. Chen, and C. Yuen, “On Processing. New York: Oxford University Press, 2001. sparse vector recovery performance in structurally orthogonal matrices [28] S. Rangan, A. K. Fletcher, and V. K. Goyal, “Asymptotic analysis of via LASSO,” arXiv:1410.7295 [cs.IT], Oct. 2014. MAP estimation via the replica method and applications to compressed [56] Y. Kabashima and M. Vehkapera,¨ “Signal recovery using expectation sensing,” IEEE Trans. Inform. Theory, vol. 58, no. 3, pp. 1902–1923, consistent approximation for linear observations,” in Proc. IEEE Int. Mar. 2012. Symp. Inform. Theory, 2014, pp. 226–230. [29] D. Guo, D. Baron, and S. Shamai, “A single-letter characterization [57] B. Cakmak, O. Winther, and B. H. Fleury, “S-AMP: Approximate of optimal noisy compressed sensing,” in Proc. Annual Allerton Conf. message passing for general matrix ensembles,” in Proc. IEEE Inform. Commun., Contr., Computing, Sep. 30 - Oct. 2 2009, pp. 52–59. Theory Workshop, 2014, pp. 192–196. [30] A. M. Tulino, G. Caire, S. Verdu,´ and S. Shamai, “Support recovery [58] J. Ma, X. Yuan, and L. Ping, “On the performance of turbo signal re- with sparsely sampled free random matrices,” IEEE Trans. Inf. Theory, covery with partial DFT sensing matrices,” IEEE Trans. Signal Process., vol. 
59, no. 7, pp. 4243–4271, Jul. 2013. vol. 22, no. 10, pp. 1580–1584, Oct. 2015. [31] M. Vehkapera,¨ Y. Kabashima, S. Chatterjee, E. Aurell, M. Skoglund, and [59] M. Opper, B. Cakmak, and O. Winther, “A theory of solving TAP L. Rasmussen, “Analysis of sparse representations using bi-orthogonal equations for Ising models with general invariant random matrices,” dictionaries,” in Proc. IEEE Inform. Theory Workshop, Sep. 3–7 2012. arXiv:1509.01229 [cond-mat.dis-nn], Sep. 2015. [32] Y. Kabashima, M. Vehkapera,¨ and S. Chatterjee, “Typical l1-recovery [60] S. Kudekar and H. D. Pfister, “The effect of spatial coupling on limit of sparse vectors represented by concatenations of random orthog- compressive sensing,” in Proc. Annual Allerton Conf. Commun., Contr., onal matrices,” J. Stat. Mech., vol. 2012, no. 12, p. P12003, 2012. Computing, Sep. 29 - Oct. 1 2010, pp. 347–353. [33] T. Tanaka and J. Raymond, “Optimal incorporation of sparsity infor- [61] D. L. Donoho, A. Javanmard, and A. Montanari, “Information- mation by weighted l1-optimization,” in Proc. IEEE Int. Symp. Inform. theoretically optimal compressed sensing via spatial coupling and ap- Theory, Jun. 2010, pp. 1598–1602. proximate message passing,” in Proc. IEEE Int. Symp. Inform. Theory, [34] Y. Kabashima, T. Wadayama, and T. Tanaka, “A typical reconstruction Jul. 1 - 6 2012, pp. 1231–1235. limit for compressed sensing based on lp-norm minimization,” J. Stat. [62] F. Krzakala, M. Mezard,´ F. Sausset, Y. Sun, and L. Zdeborova,´ Mech., vol. 2009, no. 9, p. L09003, 2009. “Probabilistic reconstruction in compressed sensing: algorithms, phase [35] M. Talagrand, Spin Glasses: A Challenge for Mathematicians, Cavity diagrams, and threshold achieving matrices,” J. Stat. Mech., vol. 2012, and Mean Field Models. Berlin Heidelberg: Springer-Verlag, 2003. no. 08, p. P08009, 2012. [36] F. Guerra and F. L. Toninelli, “Quadratic replica coupling in the [63] F. Krzakala, M. Mezard,´ F. Sausset, Y. F. Sun, and L. Zdeborova,´ Sherrington-Kirkpatrick mean field spin glass model,” J. Math. Phys., “Statistical-physics-based reconstruction in compressed sensing,” Phys. vol. 43, no. 7, pp. 3704–3716, 2002. Rev. X, vol. 2, p. 021005, May 2012. [37] ——, “The thermodynamic limit in mean field spin glass models,” [64] K. Takeda, S. Uda, and Y. Kabashima, “Analysis of CDMA systems Commun. Math. Phys., vol. 230, no. 1, pp. 71–79, 2002. that are characterized by eigenvalue spectrum,” Europhys. Lett., vol. 76, [38] F. Guerra, “Broken replica symmetry bounds in the mean field spin glass no. 6, p. 1193, 2006. model,” Commun. Math. Phys., vol. 233, no. 1, pp. 1–12, 2003. [65] Y. Kabashima, “Inference from correlated patterns: A unified theory [39] M. Talagrand, “The Parisi formula,” Annals of Math, vol. 163, no. 1, for perceptron learning and linear vector channels,” J. Phys. Conf. Ser., pp. 221–263, 2006. vol. 95, no. 1, p. 012001, 2008. [40] S. B. Korada and N. Macris, “Tight bounds on the capacity of binary [66] S. Rangan, A. K. Fletcher, P. Schniter, and U. Kamilov, “Inference input random CDMA systems,” IEEE Trans. Inf. Theory, vol. 56, no. 11, for generalized linear models via alternating directions and Bethe free pp. 5590–5613, Nov. 2010. energy minimization,” arXiv:1501.01797 [cs.IT], Jan. 2015. [41] A. Montanari, “Tight bounds for LDPC and LDGM codes under MAP [67] M. Mezard´ and A. Montanari, Information, Physics, and Computation. decoding,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3221–3246, Sep. New York: Oxford University Press, 2009. 2005. [68] K. 
Sjostrand¨ and B. Ersbøll, “SpaSM: A Matlab toolbox for sparse [42] S. Kudekar and N. Macris, “Sharp bounds for optimal decoding of low- statistical modeling,” http://www2.imm.dtu.dk/projects/spasm/. density parity-check codes,” IEEE Trans. Inf. Theory, vol. 55, no. 10, [69] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle pp. 4635–4650, Oct. 2009. regression,” Ann. Statist., vol. 32, no. 2, pp. 407–499, 04 2004. [43] T. Tanaka, “A statistical-mechanics approach to large-system analysis [70] S. A. Orszag and C. M. Bender, Advanced Mathematical Methods for of CDMA multiuser detectors,” IEEE Trans. Inform. Theory, vol. 48, Scientists and Engineers. McGraw-Hill, 1978. no. 11, pp. 2888–2910, Nov. 2002. [71] G. B. Arfken, H. J. Weber, and F. E. Harris, Mathematical Methods for [44] D. Guo and S. Verdu,´ “Randomly spread CDMA: Asymptotics via Physicists, 7th ed. Elsevier, 2013. statistical physics,” IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 1983– [72] C. Goutis and G. Casella, “Explaining the saddlepoint approximation,” 2010, Jun. 2005. The American Statistician, vol. 53, no. 3, pp. 216–224, 1999. [45] Harish-Chandra, “Differential operators on a semisimple lie algebra,” Amer. J. Math., vol. 79, no. 1, pp. 87–120, 1957. [46] C. Itzykson and J. B. Zuber, “Planar approximation 2,” J. Math. Phys., vol. 21, no. 3, pp. 411–421, 1980. [47] J. Wright and Y. Ma, “Dense error correction via `1-minimization,” IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3540–3560, Jul. 2010. [48] M. Vehkapera,¨ Y. Kabashima, and S. Chatterjee, “Statistical mechanics approach to sparse noise denoising,” in Proc. European Sign. Proc. Conf., Sep. 9–13 2013. VEHKAPERA¨ et al.: ANALYSIS OF REGULARIZED LS RECONSTRUCTION AND RANDOM MATRIX ENSEMBLES IN COMPRESSED SENSING 25

Mikko Vehkapera¨ received the Ph.D. degree from Norwegian University of Science and Technology (NTNU), Trondheim, Norway, in 2010. Between 2010–2013 he was a post-doctoral researcher at School of Electrical Engineer- ing, and the ACCESS Linnaeus Center, KTH Royal Institute of Technology, Sweden, and 2013–2015 an Academy of Finland Postdoctoral Researcher at Aalto University School of Electrical Engineering, Finland. He is now a lec- turer (assistant professor) at University of Sheffield, Department of Electronic and Electrical Engineering, United Kingdom. He held visiting appointments at Massachusetts Institute of Technology (MIT), US, Kyoto University and Tokyo Institute of Technology, Japan, and University of Erlangen-Nuremberg, Germany. His research interests are in the field of wireless communications, information theory and signal processing. Dr. Vehkapera¨ was a co-recipient for the Best Student Paper Award at IEEE International Conference on Networks (ICON2011) and IEEE Sweden Joint VT-COM-IT Chapter Best Student Conference Paper Award 2015.

Yoshiyuki Kabashima received the B.Sci., M.Sci., and Ph.D. degrees in physics from Kyoto University, Japan, in 1989, 1991, and 1994, respectively. From 1993 until 1996, he was with the Department of Physics, Nara Women’s University, Japan. In 1996, he moved to the Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Japan, where he is currently a professor. His research interests include statistical mechanics, information theory, and machine learning. Dr. Kabashima received the 14th Japan IBM Science Prize in 2000, the Young Scientist Award from the Ministry of Education, Culture, Sports, Science, and Technology, Japan, in 2006, and the 11th Ryogo Kubo Memorial Prize in 2007.

Saikat Chatterjee is an assistant professor and docent in the Dept of Communication Theory, KTH-Royal Institute of Technology, Sweden. He is also part of Dept of Signal Processing, KTH. Before moving to Sweden, he received Ph.D. degree in 2009 from Indian Institute of Science, India. He has published more than 80 papers in international journals and conferences. He was a co-author of the paper that won the best student paper award at ICASSP 2010. His current research interests are signal processing, machine learning, coding, speech and audio processing, and computational biology.