Estimating Unknown Sparsity in Compressed Sensing

Miles E. Lopes ([email protected])
UC Berkeley, Dept. of Statistics, 367 Evans Hall, Berkeley, CA 94720-3860

Abstract

In the theory of compressed sensing (CS), the sparsity ||x||_0 of the unknown signal x ∈ R^p is commonly assumed to be a known parameter. However, it is typically unknown in practice. Due to the fact that many aspects of CS depend on knowing ||x||_0, it is important to estimate this parameter in a data-driven way. A second practical concern is that ||x||_0 is a highly unstable function of x. In particular, for real signals with entries not exactly equal to 0, the value ||x||_0 = p is not a useful description of the effective number of coordinates. In this paper, we propose to estimate a stable measure of sparsity s(x) := ||x||_1^2 / ||x||_2^2, which is a sharp lower bound on ||x||_0. Our estimation procedure uses only a small number of linear measurements, does not rely on any sparsity assumptions, and requires very little computation. A confidence interval for s(x) is provided, and its width is shown to have no dependence on the signal dimension p. Moreover, this result extends naturally to the recovery setting, where a soft version of matrix rank can be estimated with analogous guarantees. Finally, we show that the use of randomized measurements is essential to estimating s(x). This is accomplished by proving that the minimax risk for estimating s(x) with deterministic measurements is large when n ≪ p.

1. Introduction

The central problem of compressed sensing (CS) is to estimate an unknown signal x ∈ R^p from n linear measurements y = (y_1, ..., y_n) given by

    y = Ax + ε,    (1)

where A ∈ R^{n×p} is a user-specified measurement matrix, ε ∈ R^n is a random noise vector, and n is much smaller than the signal dimension p. During the last several years, the theory of CS has drawn widespread attention to the fact that this seemingly ill-posed problem can be solved reliably when x is sparse — in the sense that the parameter ||x||_0 := card{j : x_j ≠ 0} is much less than p. For instance, if n is approximately ||x||_0 log(p/||x||_0), then accurate recovery can be achieved with high probability when A is drawn from a Gaussian ensemble (Donoho, 2006; Candès et al., 2006). Along these lines, the value of the parameter ||x||_0 is commonly assumed to be known in the analysis of recovery algorithms — even though it is typically unknown in practice. Due to the fundamental role that sparsity plays in CS, this issue has been recognized as a significant gap between theory and practice by several authors (Ward, 2009; Eldar, 2009; Malioutov et al., 2008). Nevertheless, the literature has been relatively quiet about the problems of estimating this parameter and quantifying its uncertainty.
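To make the setting concrete, the following snippet (not from the paper; the dimensions and constants are arbitrary toy choices) generates a sparse signal, a Gaussian measurement ensemble with n on the order of ||x||_0 log(p/||x||_0), and noisy measurements following model (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the model y = Ax + eps from eq. (1); sizes are arbitrary.
p, k = 1000, 10                              # ambient dimension and sparsity ||x||_0
x = np.zeros(p)
x[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)

n = int(np.ceil(k * np.log(p / k)))          # n of order ||x||_0 * log(p / ||x||_0)
A = rng.normal(size=(n, p)) / np.sqrt(n)     # Gaussian measurement ensemble
eps = 0.01 * rng.normal(size=n)              # small noise vector
y = A @ x + eps                              # n = 47 noisy measurements of x
```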

1.1. Motivations and the role of sparsity

At a conceptual level, the problem of estimating ||x||_0 is quite different from the more well-studied problems of estimating the full signal x or its support set S := {j : x_j ≠ 0}. The difference arises from sparsity assumptions. On one hand, a procedure for estimating ||x||_0 should make very few assumptions about sparsity (if any). On the other hand, methods for estimating x or S often assume that a sparsity level is given, and then impose this value on the solution x̂ or Ŝ. Consequently, a simple plug-in estimate of ||x||_0, such as ||x̂||_0 or card(Ŝ), may fail when the sparsity assumptions underlying x̂ or Ŝ are invalid.

To emphasize that there are many aspects of CS that depend on knowing ||x||_0, we provide several examples below. Our main point here is that a method for estimating ||x||_0 is valuable because it can help to address a broad range of issues.

• Modeling assumptions. One of the core modeling assumptions invoked in applications of CS is that the signal of interest has a sparse representation. Likewise, the problem of checking whether or not this assumption is supported by data has been an active research topic, particularly in areas of face recognition and image classification (Rigamonti et al., 2011; Shi et al., 2011). In this type of situation, an estimate of ||x||_0 that does not rely on any sparsity assumptions is a natural device for validating the use of sparse representations.

• The number of measurements. If the choice of n is too small compared to the "critical" number n*(x) := ||x||_0 log(p/||x||_0), then there are known information-theoretic barriers to the accurate reconstruction of x (Arias-Castro et al., 2011). At the same time, if n is chosen to be much larger than n*(x), then the measurement process is wasteful, as there are known algorithms that can reliably recover x with approximately n*(x) measurements (Davenport et al., 2011). To deal with the selection of n, a sparsity estimate of ||x||_0 may be used in two different ways, depending on whether measurements are collected sequentially, or in a single batch. In the sequential case, an estimate of ||x||_0 can be computed from a set of "preliminary" measurements, and then the estimated value determines how many additional measurements should be collected to recover the full signal. Also, it is not always necessary to take additional measurements, since the preliminary set may be re-used to compute x̂ (as discussed in Section 5). Alternatively, if all of the measurements must be taken in one batch, the estimated value of ||x||_0 can be used to certify whether or not enough measurements were actually taken.

• The measurement matrix. Two of the most well-known design characteristics of the matrix A are defined explicitly in terms of sparsity. These are the restricted isometry property of order k (RIP-k), and the restricted null-space property of order k (NSP-k), where k is a presumed upper bound on the sparsity level of the true signal. Since many recovery guarantees are closely tied to RIP-k and NSP-k, a growing body of work has been devoted to certifying whether or not a given matrix satisfies these properties (d'Aspremont & El Ghaoui, 2011; Juditsky & Nemirovski, 2011; Tang & Nehorai, 2011). When k is treated as given, this problem is already computationally difficult. Yet, when the sparsity of x is unknown, we must also remember that such a "certificate" is not meaningful unless we can check that k is consistent with the true signal.

• Recovery algorithms. When recovery algorithms are implemented, the sparsity level of x is often treated as a tuning parameter. For example, if k is a presumed bound on ||x||_0, then the Orthogonal Matching Pursuit algorithm (OMP) is typically initialized to run for k iterations. A second example is the Lasso algorithm, which computes the solution x̂ ∈ argmin{ ||y − Av||_2^2 + λ||v||_1 : v ∈ R^p }, for some choice of λ ≥ 0. The sparsity of x̂ is determined by the size of λ, and in order to select the appropriate value, a family of solutions is examined over a range of λ values. In the case of either OMP or Lasso, a sparsity estimate would reduce computation by restricting the possible choices of λ or k, and it would also ensure that the chosen values conform to the true signal.

1.2. An alternative measure of sparsity

Despite the important theoretical role of the parameter ||x||_0 in many aspects of CS, it has the practical drawback of being a highly unstable function of x. In particular, for real signals x ∈ R^p whose entries are not exactly equal to 0, the value ||x||_0 = p is not a useful description of the effective number of coordinates. In order to estimate sparsity in a way that accounts for the instability of ||x||_0, it is desirable to replace the ℓ_0 norm with a "soft" version. More precisely, we would like to identify a function of x that can be interpreted like ||x||_0, but remains stable under small perturbations of x. A natural quantity that serves this purpose is the numerical sparsity

    s(x) := ||x||_1^2 / ||x||_2^2,    (2)

which always satisfies 1 ≤ s(x) ≤ p for any non-zero x. Although the ratio ||x||_1^2 / ||x||_2^2 appears sporadically in different areas (Tang & Nehorai, 2011; Hurley & Rickard, 2009; Hoyer, 2004; Lopes et al., 2011), it does not seem to be well known as a sparsity measure in CS. A key property of s(x) is that it is a sharp lower bound on ||x||_0 for all non-zero x,

    s(x) ≤ ||x||_0,    (3)

which follows from applying the Cauchy-Schwarz inequality to the relation ||x||_1 = ⟨x, sgn(x)⟩. (Equality in (3) is attained iff the non-zero coordinates of x are equal in magnitude.) We also note that this inequality is invariant to scaling of x, since s(x) and ||x||_0 are individually scale invariant. In the opposite direction, it is easy to see that the only continuous upper bound on ||x||_0 is the trivial one: if a continuous function f satisfies ||x||_0 ≤ f(x) ≤ p for all x in some open subset of R^p, then f must be identically equal to p. (There is a dense set of points where ||x||_0 = p.) Therefore, we must be content with a continuous lower bound.

The fact that s(x) is a sensible measure of sparsity for non-idealized signals is illustrated in Figure 1. In essence, if x has k large coordinates and p − k small coordinates, then s(x) ≈ k, whereas ||x||_0 = p. In the left panel, the sorted coordinates of three different vectors in R^100 are plotted. The value of s(x) for each vector is marked with a triangle on the x-axis, which shows that s(x) adapts well to the decay profile. This idea can be seen in a more geometric way in the middle and right panels, which plot the sub-level sets S_c := {x ∈ R^p : s(x) ≤ c} with c ∈ [1, p]. When c ≈ 1, the vectors in S_c are closely aligned with the coordinate axes, and hence contain one effective coordinate. As c ↑ p, the set S_c includes more dense vectors, until S_p = R^p.
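As a quick numerical illustration of definition (2) (a minimal sketch, not code from the paper), the following computes s(x) for a vector with 45 equal-magnitude nonzero entries and for two dense power-law vectors in R^100; the results show the equality case of (3) and values close to those marked in Figure 1 (16.4 and 66.6).

```python
import numpy as np

def numerical_sparsity(x):
    """Numerical sparsity s(x) = ||x||_1^2 / ||x||_2^2 from eq. (2)."""
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x, 1) ** 2 / np.linalg.norm(x, 2) ** 2

p = 100
i = np.arange(1, p + 1)

x_equal = np.r_[np.ones(45), np.zeros(55)]   # 45 equal nonzero entries
x_fast  = i ** (-1.0)                        # fast power-law decay, ||x||_0 = 100
x_slow  = i ** (-0.5)                        # slow power-law decay, ||x||_0 = 100

# s(x_equal) = 45 exactly (the equality case of (3)); the dense power-law
# vectors have s(x) of roughly 16 and 67 "effective" coordinates.
for x in (x_equal, x_fast, x_slow):
    print(np.count_nonzero(x), round(numerical_sparsity(x), 1))
```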

[Figure 1 — three panels. Left: "vectors in R^100 and s(x) values", showing sorted coordinate values against coordinate index for three vectors, with legend entries s(x) = 16.4, ||x||_0 = 100; s(x) = 32.7, ||x||_0 = 45; s(x) = 66.6, ||x||_0 = 100. Middle: the sub-level set {s(x) ≤ 1.1} in R^2. Right: the sub-level set {s(x) ≤ 1.9} in R^2.]

Figure 1. Characteristics of s(x). Left panel: Three vectors (red, blue, black) in R^100 have been plotted with their coordinates in order of decreasing size (maximum entry normalized to 1). Two of the vectors have power-law decay profiles, and one is a dyadic vector with exactly 45 positive coordinates (red: x_i ∝ i^{-1}, blue: dyadic, black: x_i ∝ i^{-1/2}). Color-coded triangles on the bottom axis indicate that the s(x) value represents the "effective" number of coordinates.

1.3. Related work

Some of the challenges described in Section 1.1 can be approached with the general tools of cross-validation (CV) and empirical risk minimization (ERM). This approach has been used to select various parameters, such as the number of measurements n (Malioutov et al., 2008; Ward, 2009), the number of OMP iterations k (Ward, 2009), or the Lasso regularization parameter λ (Eldar, 2009). At a high level, these methods consider a collection of (say m) solutions x̂^(1), ..., x̂^(m) obtained from different values θ_1, ..., θ_m of some tuning parameter of interest. For each solution, an empirical error estimate êrr(x̂^(j)) is computed, and the value θ_{j*} corresponding to the smallest êrr(x̂^(j)) is chosen.

Although methods based on CV/ERM share common motivations with our work here, these methods differ from our approach in several ways. In particular, the problem of estimating a soft measure of sparsity, such as s(x), has not been considered from that angle. Also, the cited methods do not give any theoretical guarantees to ensure that the estimated sparsity level is close to the true one. Note that even if an estimate x̂ has small error ||x̂ − x||_2, it is not necessary for ||x̂||_0 to be close to ||x||_0. This point is especially relevant when one is interested in identifying a set of important variables or interpreting features.

From a computational point of view, the CV/ERM approaches can also be costly, since x̂^(j) is typically computed from a separate optimization problem for each choice of the tuning parameter. By contrast, our method for estimating s(x) requires no optimization, and can be computed easily from just a small set of preliminary measurements.

1.4. Our contributions

The primary contribution of this paper is our treatment of unknown sparsity as a parameter estimation problem. Specifically, we identify a stable measure of sparsity that is relevant to CS, and propose an efficient estimator with provable guarantees. Secondly, we are not aware of any other papers that have demonstrated a distinction between random and deterministic measurements with regard to unknown sparsity (as in Section 4).

The remainder of the paper is organized as follows. In Section 2, we show that a principled choice of n can be made if s(x) is known. This is accomplished by formulating a recovery condition for the Basis Pursuit algorithm directly in terms of s(x). Next, in Section 3, we propose an estimator ŝ(x), and derive a dimension-free confidence interval for s(x). The procedure is also shown to extend to the problem of estimating a soft measure of rank for matrix-valued signals. In Section 4, we show that the use of randomized measurements is essential to estimating s(x). Finally, we present simulations in Section 5 to validate the consequences of our theoretical results. Due to space constraints, we defer all of our proofs to the supplement.

Notation. We define ||x||_q := (Σ_{j=1}^p |x_j|^q)^{1/q} for any q > 0 and x ∈ R^p, which only corresponds to a genuine norm for q ≥ 1. For sequences of numbers a_n and b_n, we write a_n ≲ b_n or a_n = O(b_n) if there is an absolute constant c > 0 such that a_n ≤ c b_n for all large n. If a_n/b_n → 0, we write a_n = o(b_n). For a matrix M, we define the Frobenius norm ||M||_F = (Σ_{i,j} M_{ij}^2)^{1/2} and the matrix ℓ_1-norm ||M||_1 = Σ_{i,j} |M_{ij}|. Finally, for two matrices A and B of the same size, we define the inner product ⟨A, B⟩ := tr(A^T B).

2. Recovery conditions in terms of s(x)

The purpose of this section is to present a simple proposition that links s(x) with recovery conditions for the Basis Pursuit algorithm (BP). This is an important motivation for studying s(x), since it implies that if s(x) can be estimated well, then n can be chosen appropriately. In other words, we offer an adaptive choice of n.

In order to explain the connection between s(x) and recovery, we first recall a standard result (Candès et al., 2006) that describes the ℓ_2 error rate of the BP algorithm. Informally, the result assumes that the noise is bounded as ||ε||_2 ≤ ε_0 for some constant ε_0 > 0, the matrix A ∈ R^{n×p} is drawn from a suitable ensemble, and n satisfies n ≳ T log(pe/T) for some T ∈ {1, ..., p} with e = exp(1). The conclusion is that with high probability, the solution x̂ ∈ argmin{ ||v||_1 : ||Av − y||_2 ≤ ε_0, v ∈ R^p } satisfies

    ||x̂ − x||_2 ≤ c_1 ε_0 + c_2 ||x − x_T||_1 / √T,    (4)

where x_T ∈ R^p is the best T-term approximation to x,¹ and c_1, c_2 > 0 are constants. This bound is a fundamental point of reference, since it matches the minimax optimal rate under certain conditions (Candès, 2006), and applies to all signals x ∈ R^p (rather than just k-sparse signals). Additional details may be found in (Cai et al., 2010) [Theorem 3.3] and (Vershynin, 2010) [Theorem 5.65].

¹ The vector x_T ∈ R^p is obtained by setting to 0 all coordinates of x except the T largest ones (in magnitude).

We now aim to answer the question, "If s(x) were known, how large should n be in order for x̂ to be close to x?" Since the bound (4) assumes n ≳ T log(pe/T), our question amounts to choosing T. For this purpose, it is natural to consider the relative ℓ_2 error

    ||x̂ − x||_2 / ||x||_2 ≤ c_1 ε_0 / ||x||_2 + c_2 (1/√T) ||x − x_T||_1 / ||x||_2,    (5)

so that the T-term approximation error (1/√T) ||x − x_T||_1 / ||x||_2 does not depend on the scale of x (i.e. it is invariant under x ↦ cx with c ≠ 0).

Proposition 1 below shows how knowledge of s(x) allows us to control the approximation error. Specifically, the result shows that the condition T ≳ s(x) is necessary for the approximation error to be small, and the condition T ≳ s(x) log(p) is sufficient.

Proposition 1. Let x ∈ R^p \ {0}, and T ∈ {1, ..., p}. The following statements hold for any c_0, ε > 0.

(i) If the T-term approximation error satisfies (1/√T) ||x − x_T||_1 / ||x||_2 ≤ ε, then

    T ≥ (1/(1+ε)^2) · s(x).

(ii) If T ≥ c_0 s(x) log(p), then the T-term approximation error satisfies

    (1/√T) ||x − x_T||_1 / ||x||_2 ≤ (1/√(c_0 log(p))) (1 − T/p).

In particular, if T ≥ 2 s(x) log(p) with p ≥ 100, then (1/√T) ||x − x_T||_1 / ||x||_2 ≤ 1/3.

Remarks. A notable feature of these bounds is that they hold for all non-zero signals. In our simulations in Section 5, we show that choosing n = 2⌈ŝ(x)⌉ log(p/⌈ŝ(x)⌉) based on an estimate ŝ(x) leads to accurate reconstruction across many sparsity levels.
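The following small check (a sketch for the power-law example used later in the paper, not code from the paper) evaluates the T-term approximation error at the choice T ≥ 2 s(x) log(p) from Proposition 1(ii), and confirms that the error falls below 1/3.

```python
import numpy as np

def rel_T_term_error(x, T):
    """(1/sqrt(T)) * ||x - x_T||_1 / ||x||_2, where x_T keeps the T largest |x_j|."""
    x = np.asarray(x, dtype=float)
    tail = np.sort(np.abs(x))[::-1][T:]            # coordinates outside the top T
    return tail.sum() / (np.sqrt(T) * np.linalg.norm(x, 2))

p = 10**4
x = np.arange(1, p + 1) ** (-1.0)                  # power-law signal with ||x||_0 = p
s = np.linalg.norm(x, 1) ** 2 / np.linalg.norm(x, 2) ** 2

T = int(np.ceil(2 * s * np.log(p)))                # T >= 2 s(x) log(p), as in Prop. 1(ii)
print(round(s, 1), T, round(rel_T_term_error(x, T), 3))   # error is well below 1/3
```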

3. Estimation results for s(x)

In this section, we give a simple procedure to estimate s(x) for any x ∈ R^p \ {0}. The procedure uses a small number of measurements, makes no sparsity assumptions, and requires very little computation. The measurements we prescribe may also be re-used to recover the full signal after s(x) has been estimated.

The results in this section are based on the measurement model (1), written in scalar notation as

    y_i = ⟨a_i, x⟩ + ε_i,    i = 1, ..., n.    (6)

We assume only that the noise variables ε_i are independent, and bounded by |ε_i| ≤ σ_0, for some constant σ_0 > 0. No additional structure on the noise is needed.

3.1. Sketching with stable laws

Our estimation procedure derives from a technique known as sketching in the streaming computation literature (Indyk, 2006). Although this area deals with problems that have mathematical connections to CS, the use of sketching techniques in CS does not seem to be well known.

For any q ∈ (0, 2], the sketching technique offers a way to estimate ||x||_q from a set of randomized linear measurements. In our approach, we estimate s(x) = ||x||_1^2 / ||x||_2^2 by estimating ||x||_1 and ||x||_2 from separate sets of measurements. The core idea is to generate the measurement vectors a_i ∈ R^p using stable laws (Zolotarev, 1986).

Definition 1. A random variable V has a symmetric stable distribution if its characteristic function is of the form E[exp(√−1 tV)] = exp(−|γt|^q) for some q ∈ (0, 2] and some γ > 0. We denote the distribution by V ∼ S_q(γ), and γ is referred to as the scale parameter.

The most well-known examples of symmetric stable laws are the cases of q = 2, 1, namely the Gaussian distribution N(0, γ^2) = S_2(γ), and the Cauchy distribution C(0, γ) = S_1(γ). To fix some notation, if a vector a_1 = (a_{1,1}, ..., a_{1,p}) ∈ R^p has i.i.d. entries drawn from S_q(γ), we write a_1 ∼ S_q(γ)^⊗p. The connection with ℓ_q norms hinges on the following property of stable distributions (Zolotarev, 1986).

Lemma 1. Suppose x ∈ R^p, and a_1 ∼ S_q(γ)^⊗p with parameters q ∈ (0, 2] and γ > 0. Then, the random variable ⟨x, a_1⟩ is distributed according to S_q(γ ||x||_q).

Using this fact, if we generate a set of i.i.d. vectors a_1, ..., a_n from S_q(γ)^⊗p and let y_i = ⟨a_i, x⟩, then y_1, ..., y_n is an i.i.d. sample from S_q(γ ||x||_q). Hence, in the special case of noiseless linear measurements, the task of estimating ||x||_q is equivalent to a well-studied univariate problem: estimating the scale parameter of a stable law from an i.i.d. sample.

When the y_i are corrupted with noise, our analysis shows that standard estimators for scale parameters are only moderately affected. The impact of the noise can also be reduced via the choice of γ when generating a_i ∼ S_q(γ)^⊗p. The γ parameter controls the "energy level" of the measurement vectors a_i. (Note that in the Gaussian case, if a_1 ∼ S_2(γ)^⊗p, then E||a_1||_2^2 = γ^2 p.) In our results, we leave γ as a free parameter to show how the effect of noise is reduced as γ is increased.

3.2. Estimation procedure for s(x)

Two sets of measurements are used to estimate s(x), and we write the total number as n = n_1 + n_2. The first set is obtained by generating i.i.d. measurement vectors from a Cauchy distribution,

    a_i ∼ C(0, γ)^⊗p,    i = 1, ..., n_1.    (7)

The corresponding values y_i are then used to estimate ||x||_1 via the statistic

    T̂_1 := (1/γ) median(|y_1|, ..., |y_{n_1}|),    (8)

which is a standard estimator of the scale parameter of the Cauchy distribution (Fama & Roll, 1971; Li et al., 2007). Next, a second set of i.i.d. measurement vectors are generated from a Gaussian distribution,

    a_i ∼ N(0, γ^2)^⊗p,    i = n_1 + 1, ..., n_1 + n_2.    (9)

In this case, the associated y_i values are used to compute an estimate of ||x||_2^2 given by

    T̂_2^2 := (1/(γ^2 n_2)) (y_{n_1+1}^2 + ··· + y_{n_1+n_2}^2),    (10)

which is a natural estimator of the variance of a Gaussian distribution. Combining these two statistics, our estimate of s(x) = ||x||_1^2 / ||x||_2^2 is defined as

    ŝ(x) := T̂_1^2 / T̂_2^2.    (11)
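To make the procedure concrete, here is a minimal sketch (not the authors' code) of the estimator defined by (7)-(11); the bounded uniform noise below is just one admissible choice under model (6).

```python
import numpy as np

def sketch_sparsity(x, n1=500, n2=500, gamma=1.0, sigma0=0.0, rng=None):
    """Estimate s(x) = ||x||_1^2 / ||x||_2^2 from n1 + n2 randomized measurements,
    following (7)-(11). sigma0 controls bounded uniform noise as in model (6)."""
    rng = np.random.default_rng(rng)
    p = x.size

    # First set: Cauchy measurement vectors a_i ~ C(0, gamma)^p, eq. (7).
    A1 = gamma * rng.standard_cauchy(size=(n1, p))
    y1 = A1 @ x + rng.uniform(-sigma0, sigma0, size=n1)
    T1 = np.median(np.abs(y1)) / gamma           # scale estimate of ||x||_1, eq. (8)

    # Second set: Gaussian measurement vectors a_i ~ N(0, gamma^2)^p, eq. (9).
    A2 = gamma * rng.normal(size=(n2, p))
    y2 = A2 @ x + rng.uniform(-sigma0, sigma0, size=n2)
    T2_sq = np.mean(y2 ** 2) / gamma ** 2        # estimate of ||x||_2^2, eq. (10)

    return T1 ** 2 / T2_sq                       # eq. (11)

p = 10**4
x = np.arange(1, p + 1) ** (-1.0)                # true s(x) is about 58 here
print(round(sketch_sparsity(x, rng=0), 1))
```

Each measurement costs a single inner product, and no optimization is involved, consistent with the claim that the procedure requires very little computation.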
3.3. Confidence interval

The following theorem describes the relative error |ŝ(x)/s(x) − 1| via an asymptotic confidence interval for s(x). Our result is stated in terms of the noise-to-signal ratio

    ρ := σ_0 / (γ ||x||_2),

and the standard Gaussian quantile z_{1−α}, which satisfies Φ(z_{1−α}) = 1 − α for any coverage level α ∈ (0, 1). In this notation, the following parameters govern the width of the confidence interval,

    η_n(α, ρ) := z_{1−α}/√n + ρ    and    δ_n(α, ρ) := π z_{1−α}/√(2n) + ρ,

and we write these simply as δ_n and η_n. As is standard in high-dimensional statistics, we allow all of the model parameters p, x, σ_0 and γ to vary implicitly as functions of (n_1, n_2), and let (n_1, n_2, p) → ∞. For simplicity, we choose to take measurement sets of equal sizes, n_1 = n_2 = n/2, and we place a mild constraint on ρ, namely η_n(α, ρ) < 1. (Note that standard algorithms such as Basis Pursuit are not expected to perform well unless ρ ≪ 1, as is clear from the bound (5).) Lastly, we make no restriction on the growth of p/n, which makes ŝ(x) well-suited to high-dimensional problems.

Theorem 1. Let α ∈ (0, 1/2) and x ∈ R^p \ {0}. Assume that ŝ(x) is constructed as above, and that the model (6) holds. Suppose also that n_1 = n_2 = n/2 and η_n(α, ρ) < 1 for all n. Then as (n, p) → ∞, we have

    P( √(ŝ(x)/s(x)) ∈ [ (1 − δ_n)/(1 + η_n), (1 + δ_n)/(1 − η_n) ] ) ≥ (1 − 2α)^2 + o(1).    (12)

Remarks. The most important feature of this result is that the width of the confidence interval does not depend on the dimension or sparsity of the unknown signal. Concretely, this means that the number of measurements needed to estimate s(x) to a fixed precision is only O(1) with respect to the size of the problem. Our simulations in Section 5 also show that the relative error of ŝ(x) does not depend on dimension or sparsity of x. Lastly, when δ_n and η_n are small, we note that the relative error |ŝ(x)/s(x) − 1| is at most of order (n^{−1/2} + ρ) with high probability, which follows from the simple Taylor expansion (1 + ε)^2/(1 − ε)^2 = 1 + 4ε + o(ε).

3.4. Estimating rank and sparsity of matrices

The framework of CS naturally extends to the problem of recovering an unknown matrix X ∈ R^{p_1×p_2} on the basis of the measurement model

    y = A(X) + ε,    (13)

where y ∈ R^n, A is a user-specified linear operator from R^{p_1×p_2} to R^n, and ε ∈ R^n is a vector of noise variables. In recent years, many researchers have explored the recovery of X when it is assumed to have sparse or low rank structure. We refer to the papers (Candès & Plan, 2011; Chandrasekaran et al., 2010) for descriptions of numerous applications. In analogy with the previous section, the parameters rank(X) or ||X||_0 play important theoretical roles, but are very sensitive to perturbations of X. Likewise, it is of basic interest to estimate stable measures of rank and sparsity for matrices. Since the sparsity analogue s(X) := ||X||_1^2 / ||X||_F^2 can be estimated as a straightforward extension of Section 3.2, we restrict our attention to the more distinct problem of rank estimation.

3.4.1. The rank of semidefinite matrices

In the context of recovering a low-rank positive semidefinite matrix X ∈ S_+^{p×p} \ {0}, the quantity rank(X) plays the role of ||x||_0 in the recovery of a sparse vector. If we let λ(X) ∈ R_+^p denote the vector of ordered eigenvalues of X, the connection can be made explicit by writing rank(X) = ||λ(X)||_0. As in Section 3.2, our approach is to consider a robust alternative to the rank. Motivated by the quantity s(x) = ||x||_1^2 / ||x||_2^2 in the vector case, we now consider

    r(X) := ||λ(X)||_1^2 / ||λ(X)||_2^2 = tr(X)^2 / ||X||_F^2

as our measure of the effective rank for non-zero X, which always satisfies 1 ≤ r(X) ≤ p. The quantity r(X) has appeared elsewhere as a measure of rank (Lopes et al., 2011; Tang & Nehorai, 2010), but is less well known than other rank relaxations, such as the numerical rank ||X||_F^2 / ||X||_op^2 (Rudelson & Vershynin, 2007). The relationship between r(X) and rank(X) is completely analogous to that of s(x) and ||x||_0. Namely, we have a sharp, scale-invariant inequality

    r(X) ≤ rank(X).

The quantity r(X) is more stable than rank(X) in the sense that if X has k large eigenvalues, and p − k small eigenvalues, then r(X) ≈ k, whereas rank(X) = p.

Our procedure for estimating r(X) is based on estimating tr(X) and ||X||_F^2 from separate sets of measurements. The semidefinite condition is exploited through the basic relation ⟨I_{p×p}, X⟩ = tr(X) = ||λ(X)||_1. To estimate tr(X), we use n_1 linear measurements of the form

    y_i = ⟨γ I_{p×p}, X⟩ + ε_i,    i = 1, ..., n_1,    (14)

and compute the estimator T̆_1 := (1/(γ n_1)) Σ_{i=1}^{n_1} y_i, where γ > 0 is again the measurement energy parameter. Next, to estimate ||X||_F^2, we note that if Z ∈ R^{p×p} has i.i.d. N(0, 1) entries, then E⟨X, Z⟩^2 = ||X||_F^2. Hence, if we collect n_2 additional measurements of the form

    y_i = ⟨γ Z_i, X⟩ + ε_i,    i = n_1 + 1, ..., n_1 + n_2,    (15)

where the Z_i ∈ R^{p×p} are independent random matrices with i.i.d. N(0, 1) entries, then a suitable estimator of ||X||_F^2 is T̆_2^2 := (1/(γ^2 n_2)) Σ_{i=n_1+1}^{n_1+n_2} y_i^2. Combining these statistics, we propose

    r̂(X) := T̆_1^2 / T̆_2^2

as our estimate of r(X). In principle, this procedure can be refined by using the measurements (14) to estimate the noise distribution, but we omit these details. Also, we retain the assumptions of the previous section, and assume only that the ε_i are independent and bounded by |ε_i| ≤ σ_0. The next theorem shows that the estimator r̂(X) mirrors ŝ(x) as in Theorem 1, but with ρ being replaced by ϱ := σ_0/(γ ||X||_F), and with η_n being replaced by ζ_n = ζ_n(ϱ, α) := z_{1−α}/√n + ϱ.
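As with the vector case, the following is a minimal sketch of the rank estimator (not the authors' code), shown in the noiseless setting for simplicity; the example matrix with 5 dominant eigenvalues is an arbitrary illustration.

```python
import numpy as np

def sketch_rank(X, n1=500, n2=500, gamma=1.0, rng=None):
    """Estimate r(X) = tr(X)^2 / ||X||_F^2 for a PSD matrix X from n1 + n2
    linear measurements, following (14)-(15); noiseless case for simplicity."""
    rng = np.random.default_rng(rng)
    p = X.shape[0]

    # Measurements y_i = <gamma * I, X> = gamma * tr(X), eq. (14).
    y1 = np.full(n1, gamma * np.trace(X))
    T1 = np.mean(y1) / gamma                       # estimates tr(X) = ||lambda(X)||_1

    # Measurements y_i = <gamma * Z_i, X>, with Z_i having i.i.d. N(0,1) entries, eq. (15).
    y2 = np.array([gamma * np.sum(rng.normal(size=(p, p)) * X) for _ in range(n2)])
    T2_sq = np.mean(y2 ** 2) / gamma ** 2          # estimates ||X||_F^2

    return T1 ** 2 / T2_sq

# Example: a PSD matrix with 5 dominant eigenvalues among p = 200.
rng = np.random.default_rng(1)
p = 200
U = np.linalg.qr(rng.normal(size=(p, p)))[0]
eigs = np.r_[np.ones(5), 1e-3 * np.ones(p - 5)]
X = (U * eigs) @ U.T
print(round(np.trace(X) ** 2 / np.sum(X ** 2), 2))   # true r(X), close to 5
print(round(sketch_rank(X, rng=2), 2))
```

In the noiseless case the measurements (14) are all equal to γ tr(X), so only the Gaussian sketch contributes randomness; with bounded noise, the averaging in T̆_1 plays its intended role.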

Theorem 2. Let α ∈ (0, 1/2) and X ∈ S_+^{p×p} \ {0}. Assume that r̂(X) is constructed as above, and that the model (13) holds. Suppose also that n_1 = n_2 = n/2 and ζ_n(α, ϱ) < 1 for all n. Then as (n, p) → ∞, we have

    P( √(r̂(X)/r(X)) ∈ [ (1 − ϱ)/(1 + ζ_n), (1 + ϱ)/(1 − ζ_n) ] ) ≥ 1 − 2α + o(1).    (16)

Remarks. In parallel with Theorem 1, this confidence interval has the valuable property that its width does not depend on the rank or dimension of X, but merely on the noise-to-signal ratio ϱ = σ_0/(γ ||X||_F). The relative error |r̂(X)/r(X) − 1| is at most of order (n^{−1/2} + ϱ) with high probability when ζ_n is small.

4. Deterministic measurement matrices

The problem of constructing deterministic matrices A with good recovery properties (e.g. RIP-k or NSP-k) has been a longstanding topic within CS. Since our procedure in Section 3.2 selects A at random, it is natural to ask if randomization is essential to the estimation of unknown sparsity. In this section, we show that estimating s(x) with a deterministic matrix A leads to results that are inherently different from our randomized procedure.

At an informal level, the difference between random and deterministic matrices makes sense if we think of the estimation problem as a game between nature and a statistician. Namely, the statistician first chooses a matrix A ∈ R^{n×p} and an estimation rule δ : R^n → R. (The function δ takes y ∈ R^n as input and returns an estimate of s(x).) In turn, nature chooses a signal x ∈ R^p \ {0}, with the goal of maximizing the statistician's error. When the statistician chooses A deterministically, nature has the freedom to adversarially select an x that is ill-suited to the fixed matrix A. By contrast, if the statistician draws A at random, then nature does not know what value A will take, and therefore has less knowledge to choose a "bad" signal.

In the case of noiseless random measurements, Theorem 1 implies that our particular estimation rule ŝ(x) can achieve a relative error of order |ŝ(x)/s(x) − 1| = O(n^{−1/2}) with high probability for any non-zero x (cf. Remarks for Theorem 1). Our aim is now to show that for noiseless deterministic measurements, all estimation rules δ have a worst-case relative error |δ(Ax)/s(x) − 1| that is much larger than n^{−1/2}. In other words, there is always a choice of x that can defeat a deterministic procedure, whereas ŝ(x) is likely to succeed under any choice of x.

In stating the following result, we note that it involves no randomness whatsoever — since we assume that the observed measurements y = Ax are noiseless and obtained from a deterministic matrix A.

Theorem 3. The minimax relative error for estimating s(x) from noiseless deterministic measurements y = Ax satisfies

    inf_{A ∈ R^{n×p}}  inf_{δ : R^n → R}  sup_{x ∈ R^p \ {0}}  | δ(Ax)/s(x) − 1 |  ≥  (1 − (n+1)/p) / ( 2 (1 + 2√(2 log(2p)))^2 ).

Remarks. Under the typical high-dimensional scenario where there is some κ ∈ (0, ∞) for which p/n → κ as (n, p) → ∞, we have the lower bound |δ(Ax)/s(x) − 1| ≳ 1/log(n), which is much larger than n^{−1/2}.

5. Simulations

Relative error of ŝ(x). To validate the consequences of Theorem 1, we study how the relative error |ŝ(x)/s(x) − 1| depends on the parameters p, ρ, and s(x). We generated measurements y = Ax + ε under a broad range of parameter settings, with x ∈ R^{10^4} in most cases. Note that although p = 10^4 is a very large dimension, it is not at all extreme from the viewpoint of applications (e.g. a megapixel image with p = 10^6). Details regarding parameter settings are given below.

As anticipated by Theorem 1, the left and right panels in Figure 2 show that the relative error has no noticeable dependence on p or s(x). The middle panel shows that for fixed n_1 + n_2, the relative error grows moderately with ρ = σ_0/(γ ||x||_2). Lastly, our theoretical bounds on |ŝ(x)/s(x) − 1| conform to the empirical curves in the case of low noise (ρ = 10^{−2}).

Reconstruction of x based on ŝ(x). For the problem of choosing n, we considered the choice n̂ := 2⌈ŝ(x)⌉ log(p/⌈ŝ(x)⌉). The simulations show that n̂ adapts to the structure of the true signal, and is also sufficiently large for accurate reconstruction. First, to compute ŝ(x), we followed Section 3.2, and drew initial measurement sets of Cauchy and Gaussian vectors with n_1 = n_2 = 500 and γ = 1. If it happened to be the case that 500 ≥ n̂, then reconstruction was performed using only the initial 500 Gaussian measurements. Alternatively, if n̂ > 500, then (n̂ − 500) additional measurement vectors a_i were drawn from N(0, n̂^{−1} I_{p×p}) for reconstruction. Further details are given below. Figure 3 illustrates the results for three power-law signals in R^{10^4} with x_[i] ∝ i^{−ν}, ν = 0.7, 1.0, 1.3 and ||x||_1 = 1 (corresponding to s(x) = 823, 58, 11). In each panel, the coordinates of x are plotted in black, and those of x̂ are plotted in red. Clearly, there is good qualitative agreement in all cases. From left to right, the value of n̂ = 2⌈ŝ(x)⌉ log(p/⌈ŝ(x)⌉) was 4108, 590, and 150.
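A compact sketch of this two-stage pipeline (an illustration, not the authors' code; the Basis Pursuit solve itself is left to an external solver such as SPGL1, which the paper uses):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10**4
x = np.arange(1, p + 1) ** (-1.0)                 # nu = 1.0, so s(x) is about 58
x /= np.abs(x).sum()                              # normalize ||x||_1 = 1 as in the text

# Stage 1: sketch s(x) from 500 Cauchy and 500 Gaussian measurements (gamma = 1).
y1 = rng.standard_cauchy(size=(500, p)) @ x
y2 = rng.normal(size=(500, p)) @ x
s_hat = np.median(np.abs(y1)) ** 2 / np.mean(y2 ** 2)

# Adaptive choice of n: n_hat = 2 * ceil(s_hat) * log(p / ceil(s_hat)).
c = np.ceil(s_hat)
n_hat = int(np.ceil(2 * c * np.log(p / c)))       # comparable to the 590 reported above

# Stage 2: Gaussian measurements with entries of variance 1/n_hat; a Basis Pursuit
# solver would then be applied to (A, y) to reconstruct x.
A = rng.normal(size=(n_hat, p)) / np.sqrt(n_hat)
y = A @ x
print(round(float(s_hat), 1), n_hat)
```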

Settings for relative error of ŝ(x) (Figure 2). For each parameter setting labeled in the figures, we let n_1 = n_2 and averaged |ŝ(x)/s(x) − 1| over 200 problem instances of y = Ax + ε. In all cases, the matrix A was chosen according to (7) and (9) with γ = 1 and ε_i ∼ Uniform[−σ_0, σ_0]. We always chose the normalization ||x||_2 = 1, and γ = 1 so that ρ = σ_0. (In the left and right panels, ρ = 10^{−2}.) For the left and middle panels, all signals have the decay profile x_[i] ∝ i^{−1}. For the right panel, the values s(x) = 2, 58, 4028, and 9878 were obtained using decay profiles x_i ∝ i^{−ν} with ν = 2, 1, 1/2, 1/10. In the left and right panels, we chose p = 10^4 for all curves. A theoretical bound on |ŝ(x)/s(x) − 1| in black was computed from Theorem 1 with α = 1/2 − 1/(2√2). (This bound holds with probability at least 1/2, and hence may be reasonably plotted against the average of |ŝ(x)/s(x) − 1|.)

Settings for reconstruction (Figure 3). We computed reconstructions using the SPGL1 solver (van den Berg & Friedlander, 2007; 2008) for the BP problem x̂ ∈ argmin{ ||v||_1 : ||Av − y||_2 ≤ ε_0, v ∈ R^p }, with the choice ε_0 = σ_0 √n̂ being based on i.i.d. noises ε_i ∼ Uniform[−σ_0, σ_0] and σ_0 = 0.001. When re-using the first 500 Gaussian measurements, we re-scaled the vectors a_i and the values y_i by 1/√n̂ so that the matrix A ∈ R^{n̂×p} would be expected to satisfy the RIP property. For each choice of ν = 0.7, 1.0, 1.3, we generated 25 problem instances and plotted the vector x̂ corresponding to the median of ||x̂ − x||_2 over the 25 runs (so that the plots reflect typical performance).

Acknowledgements

MEL thanks the reviewers for constructive comments, as well as valuable references that substantially improved the paper. Peter Bickel is thanked for helpful discussions, and the DOE CSGF fellowship is gratefully acknowledged for support under grant DE-FG02-97ER25308.

[Figure 2 — three panels titled "error dependence on p", "error dependence on ρ", and "error dependence on s(x)". Each panel plots the averaged relative error of ŝ(x) against the total number of measurements n_1 + n_2 (200 to 1000), together with the theoretical bound. Left panel: curves for p = 10, 10^2, 10^3, 10^4 (all curves: ν = 1, ρ = 10^{−2}). Middle panel: curves for ρ = 0, 10^{−2}, 1, 2, 5 (all curves: p = 10^4, ν = 1). Right panel: curves for s(x) = 2, 58, 4028, 9878 (all curves: p = 10^4, ρ = 10^{−2}).]

Figure 2. Performance of ŝ(x) as a function of p, ρ, s(x), and number of measurements.

[Figure 3 — three panels titled "reconstruction with ν = 0.7", "reconstruction with ν = 1.0", and "reconstruction with ν = 1.3", each plotting coordinate value against coordinate index (0 to 10^4).]

Figure 3. Signal recovery after choosing n based on ŝ(x). True signal x in black, and x̂ in red.

References

Arias-Castro, E., Candès, E. J., and Davenport, M. On the fundamental limits of adaptive sensing. arXiv preprint arXiv:1111.4646, 2011.
Cai, T. T., Wang, L., and Xu, G. New bounds for restricted isometry constants. IEEE Transactions on Information Theory, 56(9), 2010.
Candès, E. J. Compressive sampling. In Proceedings of the International Congress of Mathematicians: Madrid, August 22-30, 2006: invited lectures, pp. 1433–1452, 2006.
Candès, E. J. and Plan, Y. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.
Candès, E. J., Romberg, J. K., and Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. The convex algebraic geometry of linear inverse problems. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pp. 699–703. IEEE, 2010.
d'Aspremont, A. and El Ghaoui, L. Testing the nullspace property using semidefinite programming. Mathematical Programming, 127(1):123–144, 2011.
Davenport, M. A., Duarte, M. F., Eldar, Y. C., and Kutyniok, G. Introduction to compressed sensing. Preprint, 93, 2011.
Donoho, D. L. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
Eldar, Y. C. Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481, 2009.
Fama, E. F. and Roll, R. Parameter estimates for symmetric stable distributions. Journal of the American Statistical Association, 66(334):331–338, 1971.
Hoyer, P. O. Non-negative matrix factorization with sparseness constraints. The Journal of Machine Learning Research, 5:1457–1469, 2004.
Hurley, N. and Rickard, S. Comparing measures of sparsity. IEEE Transactions on Information Theory, 55(10):4723–4741, 2009.
Indyk, P. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006.
Juditsky, A. and Nemirovski, A. On verifiable sufficient conditions for sparse signal recovery via ℓ1 minimization. Mathematical Programming, 127(1):57–88, 2011.
Li, P., Hastie, T., and Church, K. Nonlinear estimators and tail bounds for dimension reduction in ℓ1 using Cauchy random projections. Journal of Machine Learning Research, pp. 2497–2532, 2007.
Lopes, M. E., Jacob, L., and Wainwright, M. J. A more powerful two-sample test in high dimensions using random projection. In NIPS 24, pp. 1206–1214, 2011.
Malioutov, D. M., Sanghavi, S., and Willsky, A. S. Compressed sensing with sequential observations. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3357–3360. IEEE, 2008.
Rigamonti, R., Brown, M. A., and Lepetit, V. Are sparse representations really relevant for image classification? In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1545–1552. IEEE, 2011.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM (JACM), 54(4):21, 2007.
Shi, Q., Eriksson, A., van den Hengel, A., and Shen, C. Is face recognition really a compressive sensing problem? In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 553–560. IEEE, 2011.
Tang, G. and Nehorai, A. The stability of low-rank matrix reconstruction: a constrained singular value view. arXiv:1006.4088, submitted to IEEE Transactions on Information Theory, 2010.
Tang, G. and Nehorai, A. Performance analysis of sparse recovery based on constrained minimal singular values. IEEE Transactions on Signal Processing, 59(12):5734–5745, 2011.
van den Berg, E. and Friedlander, M. P. SPGL1: A solver for large-scale sparse reconstruction, June 2007. http://www.cs.ubc.ca/labs/scl/spgl1.
van den Berg, E. and Friedlander, M. P. Probing the Pareto frontier for basis pursuit solutions. SIAM Journal on Scientific Computing, 31(2):890–912, 2008. doi: 10.1137/080714488. URL http://link.aip.org/link/?SCE/31/890.
Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
Ward, R. Compressed sensing with cross validation. IEEE Transactions on Information Theory, 55(12):5773–5782, 2009.
Zolotarev, V. M. One-dimensional stable distributions, volume 65. American Mathematical Society, 1986.