
Frame Coherence and Sparse Signal Processing

Dustin G. Mixon∗, Waheed U. Bajwa†, Robert Calderbank†
∗Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544
†Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708

arXiv:1105.4279v1 [math.FA] 21 May 2011

Abstract—The sparse signal processing literature often uses random sensing matrices to obtain performance guarantees. Unfortunately, in the real world, sensing matrices do not always come from random processes. It is therefore desirable to evaluate whether an arbitrary matrix, or frame, is suitable for sensing sparse signals. To this end, the present paper investigates two parameters that measure the coherence of a frame: worst-case and average coherence. We first provide several examples of frames that have small spectral norm, worst-case coherence, and average coherence. Next, we present a new lower bound on worst-case coherence and compare it to the Welch bound. Later, we propose an algorithm that decreases the average coherence of a frame without changing its spectral norm or worst-case coherence. Finally, we use worst-case and average coherence, as opposed to the Restricted Isometry Property, to garner near-optimal probabilistic guarantees on both sparse signal detection and reconstruction in the presence of noise. This contrasts with recent results that only guarantee noiseless signal recovery from arbitrary frames, and which further assume independence across the nonzero entries of the signal—in a sense, requiring small average coherence replaces the need for such an assumption.

This work was supported by the Office of Naval Research under Grant N00014-08-1-1110, by the Air Force Office of Scientific Research under Grants FA9550-09-1-0551 and FA9550-09-1-0643, and by NSF under Grant DMS-0914892. Mixon was supported by the A.B. Krongard Fellowship. The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government.

I. INTRODUCTION

Many classical applications, such as radar and error-correcting codes, make use of over-complete spanning systems [1]. Oftentimes, we may view an over-complete spanning system as a frame. Take F = {f_i}_{i∈I} to be a collection of vectors in some separable Hilbert space H. Then F is a frame if there exist frame bounds A and B with 0 < A ≤ B < ∞ such that A‖x‖² ≤ Σ_{i∈I} |⟨x, f_i⟩|² ≤ B‖x‖² for every x ∈ H. When A = B, F is called a tight frame. For finite-dimensional unit norm frames, where I = {1,…,N}, the worst-case coherence is a useful parameter:

    µ_F := max_{i,j∈{1,…,N}, i≠j} |⟨f_i, f_j⟩|.    (1)

Note that orthonormal bases are tight frames with A = B = 1 and have zero worst-case coherence. In both ways, frames form a natural generalization of orthonormal bases.

In this paper, we only consider finite-dimensional frames. Those not familiar with frame theory can simply view a finite-dimensional frame as an M × N matrix of rank M whose columns are the frame elements. With this view, the tightness condition is equivalent to having the spectral norm be as small as possible; for an M × N unit norm frame F, this equivalently means ‖F‖₂² = N/M.

Throughout the literature, applications require finite-dimensional frames that are nearly tight and have small worst-case coherence [1]–[8]. Among these, a foremost application is sparse signal processing, where frames of small spectral norm and/or small worst-case coherence are commonly used to analyze sparse signals [4]–[8]. Recently, [9] introduced another notion of frame coherence called average coherence:

    ν_F := (1/(N−1)) max_{i∈{1,…,N}} |Σ_{j=1, j≠i}^{N} ⟨f_i, f_j⟩|.    (2)

Note that, in addition to having zero worst-case coherence, orthonormal bases also have zero average coherence. It was established in [9] that when ν_F is sufficiently smaller than µ_F, a number of guarantees can be provided for sparse signal processing. It is therefore evident from [1]–[9] that there is a pressing need for nearly tight frames with small worst-case and average coherence, especially in the area of sparse signal processing.

This paper offers four main contributions in this regard. First, we discuss three types of frames that exhibit small spectral norm, worst-case coherence, and average coherence: normalized Gaussian, random harmonic, and code-based frames. With all three frame parameters provably small, these frames are guaranteed to perform well in relevant applications. Second, performance in many applications is dictated by worst-case coherence [1]–[8]. It is therefore particularly important to understand which worst-case coherence values are achievable. To this end, the Welch bound [1] is commonly used in the literature. However, the Welch bound is only tight when the number of frame elements N is less than the square of the spatial dimension M [1]. Another lower bound, given in [10] and [11], beats the Welch bound when there are more frame elements, but it is known to be loose for real frames [12]. Given this context, our next contribution is a new lower bound on the worst-case coherence of real frames. Our bound beats both the Welch bound and the bound in [10] and [11] when the number of frame elements far exceeds the spatial dimension. Third, since average coherence is new to the frame theory literature, we investigate how it relates to worst-case coherence and spectral norm. In particular, we want average coherence to satisfy the following property, which is used in [9] to provide various guarantees for sparse signal processing:

Definition 1. We say an M × N unit norm frame F satisfies the Strong Coherence Property if

    (SCP-1) µ_F ≤ 1/(164 log N)   and   (SCP-2) ν_F ≤ µ_F/√M,

where µ_F and ν_F are given by (1) and (2), respectively.
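Both parameters are inexpensive to compute directly from the Gram matrix, so conditions like (SCP-1) and (SCP-2) can be checked numerically for a given frame. The following NumPy sketch is ours, not part of the paper; the three-vector example is the standard "Mercedes-Benz" tight frame in R², whose pairwise angles are all 120 degrees:

```python
import numpy as np

def worst_case_coherence(F):
    """mu_F from eq. (1): largest |<f_i, f_j>| over distinct columns of F."""
    G = np.abs(F.conj().T @ F)   # Gram matrix magnitudes
    np.fill_diagonal(G, 0)       # ignore the unit diagonal <f_i, f_i>
    return G.max()

def average_coherence(F):
    """nu_F from eq. (2): worst off-diagonal Gram row sum, normalized by N - 1."""
    N = F.shape[1]
    G = F.conj().T @ F
    np.fill_diagonal(G, 0)
    return np.abs(G.sum(axis=1)).max() / (N - 1)

# Orthonormal bases have zero worst-case and zero average coherence,
# while three unit vectors at 120 degrees in R^2 have mu = nu = 1/2.
mb = np.array([[0.0, -np.sqrt(3) / 2, np.sqrt(3) / 2],
               [1.0, -0.5,            -0.5          ]])
```

As noted above, an orthonormal basis makes both quantities vanish, which these helpers reproduce exactly.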
Since average coherence is so new, there is currently no intuition as to when (SCP-2) is satisfied. As a third contribution, this paper shows how to transform a frame that satisfies (SCP-1) into another frame with the same spectral norm and worst-case coherence that additionally satisfies (SCP-2). Finally, this paper uses the Strong Coherence Property to provide new guarantees on both sparse signal detection and reconstruction in the presence of noise. These guarantees are related to those in [4], [5], [7], and we elaborate on this relationship in Section V. In the interest of space, the proofs have been omitted throughout, but they can be found in [13].

II. FRAME CONSTRUCTIONS

Many applications require nearly tight frames with small worst-case and average coherence. In this section, we give three types of frames that satisfy these conditions.

A. Normalized Gaussian frames

Construct a matrix with independent, Gaussian distributed entries that have zero mean and unit variance. By normalizing the columns, we get a matrix called a normalized Gaussian frame. This is perhaps the most widely studied type of frame in the signal processing and statistics literature. To be clear, the term "normalized" is intended to distinguish the results presented here from results reported in earlier works, such as [9], [14]–[16], which only ensure that the frame elements of Gaussian frames have unit norm in expectation. In other words, normalized Gaussian frames are frames with individual frame elements independently and uniformly distributed on the unit hypersphere in R^M. That said, the following theorem characterizes the spectral norm and the worst-case and average coherence of normalized Gaussian frames.

Theorem 2 (Geometry of normalized Gaussian frames). Build a real M × N frame G by drawing entries independently at random from a Gaussian distribution of zero mean and unit variance. Next, construct a normalized Gaussian frame F by taking f_n := g_n/‖g_n‖ for every n = 1,…,N. Provided 60 log N ≤ M ≤ (N−1)/(4 log N), the following inequalities simultaneously hold with probability exceeding 1 − 11N⁻¹:
    (i)   µ_F ≤ √(15 log N) / (√M − √(12 log N)),
    (ii)  ν_F ≤ √(15 log N) / (M − √(12M log N)),
    (iii) ‖F‖₂ ≤ (√M + √N + √(2 log N)) / √(M − √(8M log N)).

B. Random harmonic frames

Random harmonic frames, constructed by randomly selecting rows of a discrete Fourier transform (DFT) matrix and normalizing the resulting columns, have received considerable attention lately in the compressed sensing literature [17]–[19]. However, to the best of our knowledge, there is no result in the literature that shows that random harmonic frames have small worst-case coherence. To fill this gap, the following theorem characterizes the spectral norm and the worst-case and average coherence of random harmonic frames.

Theorem 3 (Geometry of random harmonic frames). Let U be an N × N non-normalized discrete Fourier transform matrix, explicitly, U_{kℓ} := e^{2πikℓ/N} for each k, ℓ = 0,…,N−1. Next, let {B_i}_{i=1}^{N} be a collection of independent Bernoulli random variables with mean M/N, and take M := {i : B_i = 1}. Finally, construct an |M| × N harmonic frame F by collecting the rows of U which correspond to indices in M and normalizing the columns. Then F is a unit norm tight frame: ‖F‖₂² = N/|M|. Furthermore, provided 16 log N ≤ M ≤ N/3, the following inequalities simultaneously hold with probability exceeding 1 − 4N⁻¹ − N⁻²:
    (i)   M/2 ≤ |M| ≤ 3M/2,
    (ii)  ν_F ≤ µ_F/√|M|,
    (iii) µ_F ≤ √(118(N−M) log N / (MN)).

C. Code-based frames

Many structures in coding theory are also useful for constructing frames. Here, we build frames from a code that originally emerged with Berlekamp in [20], and found recent reincarnation with [21]. We build a 2^m × 2^{(t+1)m} frame, indexing rows by elements of F_{2^m} and indexing columns by (t+1)-tuples of elements from F_{2^m}. For x ∈ F_{2^m} and α ∈ F_{2^m}^{t+1}, the corresponding entry of the matrix F is

    F_{xα} := (1/√(2^m)) (−1)^{Tr(α₀x + Σ_{i=1}^{t} α_i x^{2^i+1})},    (3)

where Tr : F_{2^m} → F₂ denotes the trace map, defined by Tr(z) = Σ_{i=0}^{m−1} z^{2^i}. The following theorem gives the spectral norm and the worst-case and average coherence of this frame.

Theorem 4 (Geometry of code-based frames). The 2^m × 2^{(t+1)m} frame defined by (3) is unit norm and tight, i.e., ‖F‖₂² = 2^{tm}, with worst-case coherence µ_F ≤ 1/√(2^{m−2t−1}) and average coherence ν_F ≤ µ_F/√(2^m).

III. FUNDAMENTAL LIMITS ON WORST-CASE COHERENCE

In many applications of frames, performance is dictated by worst-case coherence. It is therefore particularly important to understand which worst-case coherence values are achievable. To this end, the following bound is commonly used in the literature:

Theorem 5 (Welch bound [1]). Every M × N unit norm frame F has worst-case coherence µ_F ≥ √((N−M)/(M(N−1))).

The Welch bound is not tight whenever N > M² [1]. For this region, the following gives a better bound:

Theorem 6 ([10], [11]). Every M × N unit norm frame F has worst-case coherence µ_F ≥ 1 − 2N^{−1/(M−1)}. Taking N = Θ(a^M), this lower bound goes to 1 − 2/a as M → ∞.
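To see where the two lower bounds trade off, one can simply evaluate them. This small sketch (ours, not part of the paper) does so for M = 3: the Welch bound dominates for small N, and the bound of Theorem 6 takes over once N grows well beyond M²:

```python
import numpy as np

def welch_bound(M, N):
    """Theorem 5: every M x N unit norm frame has mu >= sqrt((N-M)/(M(N-1)))."""
    return np.sqrt((N - M) / (M * (N - 1)))

def theorem6_bound(M, N):
    """Theorem 6 ([10], [11]): mu >= 1 - 2 * N**(-1/(M-1))."""
    return 1 - 2 * N ** (-1 / (M - 1))

M = 3
for N in (6, 9, 30, 100):
    print(N, welch_bound(M, N), theorem6_bound(M, N))
```

For instance, at N = M² = 9 the Welch bound still gives the larger value, while at N = 100 the Theorem 6 bound is clearly stronger.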

Algorithm 1 Linear-time flipping
Input: An M × N unit norm frame F
Output: An M × N unit norm frame G that is flipping equivalent to F
    g₁ ← f₁  {Keep first frame element}
    for n = 2 to N do
        if ‖Σ_{i=1}^{n−1} g_i + f_n‖ ≤ ‖Σ_{i=1}^{n−1} g_i − f_n‖ then
            g_n ← f_n  {Keep frame element for shorter sum}
        else
            g_n ← −f_n  {Flip frame element for shorter sum}
        end if
    end for

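Algorithm 1 translates directly into a few lines of code. The sketch below is ours rather than the authors' implementation; it keeps a running sum of the frame elements chosen so far and flips each new element whenever that shortens the sum. The random test frame at the end is a hypothetical example chosen to satisfy N ≥ M² + 3M + 3:

```python
import numpy as np

def linear_time_flipping(F):
    """Algorithm 1: one pass over the columns, flipping a sign whenever
    that shortens the running sum of the frame elements chosen so far."""
    G = np.array(F, dtype=float)     # work on a copy of the frame
    s = G[:, 0].copy()               # running sum, initialized with g_1 = f_1
    for n in range(1, G.shape[1]):
        # Keep f_n if ||s + f_n|| <= ||s - f_n||, otherwise flip its sign.
        if np.linalg.norm(s + G[:, n]) > np.linalg.norm(s - G[:, n]):
            G[:, n] *= -1
        s += G[:, n]
    return G

# Hypothetical test frame: M = 4, N = 40 >= M^2 + 3M + 3 = 31.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 40))
F /= np.linalg.norm(F, axis=0)       # unit norm columns
G = linear_time_flipping(F)
```

Since flipping only multiplies columns by ±1, it leaves the column norms, worst-case coherence, and spectral norm untouched, matching the behavior the surrounding text attributes to flipping equivalence.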
Fig. 1. Different bounds on worst-case coherence for M = 3, N = 3,…,55. Stars give numerically determined optimal worst-case coherence of N real unit vectors, found in [12]. Dotted curve gives the Welch bound, dash-dotted curve gives the bound from Theorem 6, dashed curve gives the general bound from Theorem 7, and solid curve gives the bound from Theorem 8.

For many applications, it does not make sense to use a complex frame, but the bound in Theorem 6 is known to be loose for real frames [12]. We therefore improve Theorem 6 for the case of real unit norm frames:

Theorem 7. Every real M × N unit norm frame F has worst-case coherence

    µ_F ≥ cos( π ( Γ((M−1)/2) / (N π^{1/2} Γ(M/2)) )^{1/(M−1)} ).    (4)

Furthermore, taking N = Θ(a^M), this lower bound goes to cos(π/a) as M → ∞.

In [12], numerical results are given for M = 3, and we compare these results to Theorems 6 and 7 in Figure 1. Considering this figure, we note that the bound in Theorem 6 is inferior to the maximum of the Welch bound and the bound in Theorem 7, at least when M = 3. This illustrates the degree to which Theorem 7 improves the bound in Theorem 6 for real frames. In fact, since cos(π/a) ≥ 1 − 2/a for all a ≥ 2, the bound for real frames in Theorem 7 is asymptotically better than the bound for complex frames in Theorem 6. Moreover, for M = 2, Theorem 7 says µ_F ≥ cos(π/N), and [22] proved this bound to be tight for every N ≥ 2. For M = 3, Theorem 7 can be further improved as follows:

Theorem 8. Every real 3 × N unit norm frame F has worst-case coherence µ_F ≥ 1 − 4/N + 2/N².

IV. REDUCING AVERAGE COHERENCE

In [9], average coherence is used to garner a number of guarantees on sparse signal processing. Since average coherence is so new to the frame theory literature, this section will investigate how average coherence relates to worst-case coherence and the spectral norm. We start with a definition:

Definition 9 (Wiggling and flipping equivalent frames). We say the frames F and G are wiggling equivalent if there exists a diagonal matrix D of unimodular entries such that G = FD. Furthermore, they are flipping equivalent if D is real, having only ±1's on the diagonal.

The terms "wiggling" and "flipping" are inspired by the fact that individual frame elements of such equivalent frames are related by simple unitary operations. Note that every frame with N nonzero frame elements belongs to a flipping equivalence class of size 2^N, while being wiggling equivalent to uncountably many frames. The importance of this type of frame equivalence is, in part, due to the following lemma, which characterizes the shared geometry of wiggling equivalent frames:

Lemma 10 (Geometry of wiggling equivalent frames). Wiggling equivalence preserves the norms of frame elements, the worst-case coherence, and the spectral norm.

Now that we understand wiggling and flipping equivalence, we are ready for the main idea behind this section. Suppose we are given a unit norm frame with acceptable spectral norm and worst-case coherence, but we also want the average coherence to satisfy (SCP-2). Then by Lemma 10, all of the wiggling equivalent frames will also have acceptable spectral norm and worst-case coherence, and so it is reasonable to check these frames for good average coherence. In fact, the following theorem guarantees that at least one of the flipping equivalent frames will have good average coherence, with only modest requirements on the original frame's redundancy.

Theorem 11 (Frames with low average coherence). Let F be an M × N unit norm frame with M < (N−1)/(4 log 4N). Then there exists a frame G that is flipping equivalent to F and satisfies ν_G ≤ µ_G/√M.

While Theorem 11 guarantees the existence of a flipping equivalent frame with good average coherence, the result does not describe how to find it. Certainly, one could check all 2^N frames in the flipping equivalence class, but such a procedure is computationally slow. As an alternative, we propose a linear-time flipping algorithm (Algorithm 1). The following theorem guarantees that linear-time flipping will produce a frame with good average coherence, but it requires the original frame's redundancy to be higher than what suffices in Theorem 11.

Theorem 12. Suppose N ≥ M² + 3M + 3. Then Algorithm 1 outputs an M × N frame G that is flipping equivalent to F and satisfies ν_G ≤ µ_G/√M.

As an example of how linear-time flipping improves average coherence, consider the following matrix:

    F := (1/√5) [ + + + + − + + + + −
                  + − + + + − − − + −
                  + + + + + + + + − +
                  − − − + − + + − − −
                  − + + − − + − − − − ]

Here, ν_F ≈ 0.3778 > 0.2683 ≈ µ_F/√M. Even though N < M² + 3M + 3, we can run linear-time flipping to get the flipping pattern D := diag(+ − + − − + + − + +). Then FD has average coherence ν_{FD} ≈ 0.1556 < µ_F/√M = µ_{FD}/√M.
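The numbers in this example are easy to reproduce. The sketch below (ours, not part of the paper) builds the 5 × 10 sign matrix, applies the stated flipping pattern D, and recovers ν_F = 17/45 ≈ 0.3778 and ν_{FD} = 7/45 ≈ 0.1556 via the Gram-row computation in (2):

```python
import numpy as np

rows = ["+ + + + - + + + + -",
        "+ - + + + - - - + -",
        "+ + + + + + + + - +",
        "- - - + - + + - - -",
        "- + + - - + - - - -"]
F = np.array([[1.0 if c == "+" else -1.0 for c in r.split()]
              for r in rows]) / np.sqrt(5)
D = np.diag([1.0 if c == "+" else -1.0 for c in "+ - + - - + + - + +".split()])

def average_coherence(A):
    """nu from eq. (2): worst off-diagonal Gram row sum, normalized by N - 1."""
    G = A.T @ A
    np.fill_diagonal(G, 0)
    return np.abs(G.sum(axis=1)).max() / (A.shape[1] - 1)

print(average_coherence(F))      # 17/45, about 0.3778
print(average_coherence(F @ D))  # 7/45, about 0.1556
```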
This example illustrates that the condition N ≥ M² + 3M + 3 in Theorem 12 is sufficient but not necessary.

V. NEAR-OPTIMAL SPARSE SIGNAL PROCESSING WITHOUT THE RESTRICTED ISOMETRY PROPERTY

Frames with small spectral norm, worst-case coherence, and/or average coherence have found use in recent years with applications involving sparse signals. Donoho et al. used the worst-case coherence in [5] to provide uniform bounds on the signal and support recovery performance of combinatorial and convex optimization methods and greedy algorithms. Later, Tropp [7] and Candès and Plan [4] used both the spectral norm and worst-case coherence to provide tighter bounds on the signal and support recovery performance of convex optimization methods for most support sets under the additional assumption that the sparse signals have independent nonzero entries with zero median. Recently, Bajwa et al. [9] made use of the spectral norm and both coherence parameters to report tighter bounds on the noisy model selection and noiseless signal recovery performance of an incredibly fast greedy algorithm called one-step thresholding (OST) for most support sets and arbitrary nonzero entries. In this section, we discuss further implications of the spectral norm and worst-case and average coherence of frames in applications involving sparse signals.

A. The Weak Restricted Isometry Property

A common task in signal processing applications is to test whether a collection of measurements corresponds to mere noise [23]. For applications involving sparse signals, one can test measurements y ∈ C^M against the null hypothesis H₀ : y = e and alternative hypothesis H₁ : y = Fx + e, where the entries of the noise vector e ∈ C^M are independent, identical zero-mean complex-Gaussian random variables and the signal x ∈ C^N is K-sparse. The performance of such signal detection problems is directly proportional to the energy in Fx [23]–[25]. In particular, existing literature on the detection of sparse signals [24], [25] leverages the fact that ‖Fx‖² ≈ ‖x‖² when F satisfies the Restricted Isometry Property (RIP) of order K. In contrast, we now show that the Strong Coherence Property also guarantees ‖Fx‖² ≈ ‖x‖² for most K-sparse vectors. We start with a definition:

Definition 13. We say an M × N frame F satisfies the (K, δ, p)-Weak Restricted Isometry Property (Weak RIP) if for every K-sparse vector x ∈ C^N, a random permutation y of x's entries satisfies

    (1 − δ)‖y‖² ≤ ‖Fy‖² ≤ (1 + δ)‖y‖²

with probability exceeding 1 − p.

We note the distinction between RIP and Weak RIP—Weak RIP requires that F preserves the energy of most sparse vectors. Moreover, the manner in which we quantify "most" is important. For each sparse vector, F preserves the energy of most permutations of that vector, but for different sparse vectors, F might not preserve the energy of permutations with the same support. That is, unlike RIP, Weak RIP is not a statement about the singular values of submatrices of F. Certainly, matrices for which most submatrices are well-conditioned, such as those discussed in [7], will satisfy Weak RIP, but Weak RIP does not require this. That said, the following theorem shows, in part, the significance of the Strong Coherence Property.

Theorem 14. Any M × N unit norm frame F that satisfies the Strong Coherence Property also satisfies the (K, δ, 4K/N²)-Weak Restricted Isometry Property provided N ≥ 128 and 2K log N ≤ min{δ²/(100µ_F²), M}.

B. Reconstruction of sparse signals from noisy measurements

Another common task in signal processing applications is to reconstruct a K-sparse signal x ∈ C^N from a small collection of linear measurements y ∈ C^M. Recently, Tropp [7] used both the worst-case coherence and spectral norm of frames to find bounds on the reconstruction performance of basis pursuit (BP) [26] for most support sets under the assumption that the nonzero entries of x are independent with zero median. In contrast, [9] used the spectral norm and worst-case and average coherence of frames to find bounds on the reconstruction performance of OST for most support sets and arbitrary nonzero entries. However, both [7] and [9] limit themselves to recovering x in the absence of noise, corresponding to y = Fx, a rather ideal scenario.

Our goal in this section is to provide guarantees for the reconstruction of sparse signals from noisy measurements y = Fx + e, where the entries of the noise vector e ∈ C^M are independent, identical complex-Gaussian random variables with mean zero and variance σ². In particular, and in contrast with [5], our guarantees will hold for arbitrary frames F without requiring the signal's sparsity level to satisfy K = O(µ_F⁻¹). The reconstruction algorithm that we analyze here is the OST algorithm of [9], which is described in Algorithm 2. The following theorem extends the analysis of [9] and shows that the OST algorithm leads to near-optimal reconstruction error for large classes of sparse signals.

Algorithm 2 One-Step Thresholding (OST) [9]
Input: An M × N unit norm frame F, a vector y = Fx + e, and a threshold λ > 0
Output: An estimate K̂ ⊆ {1,…,N} of the support of x and an estimate x̂ ∈ C^N of x
    x̂ ← 0  {Initialize}
    z ← F*y  {Form signal proxy}
    K̂ ← {n : |z_n| > λ}  {Select indices via OST}
    x̂_K̂ ← (F_K̂)†y  {Reconstruct signal via least-squares}
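Algorithm 2 is equally short in code. The sketch below is ours, not the authors' implementation; the frame size, support, threshold, and noise level in the demonstration are illustrative choices only, with the threshold fixed by hand rather than set from σ, µ_F, and the SNR as in the analysis that follows:

```python
import numpy as np

def ost(F, y, lam):
    """Algorithm 2 (OST): threshold the proxy F* y at lam, then refit by least squares."""
    z = F.conj().T @ y                        # form signal proxy
    K_hat = np.flatnonzero(np.abs(z) > lam)   # select indices via OST
    x_hat = np.zeros(F.shape[1], dtype=complex)
    if K_hat.size:
        # Least-squares reconstruction on the selected columns.
        x_hat[K_hat] = np.linalg.lstsq(F[:, K_hat], y, rcond=None)[0]
    return K_hat, x_hat

# Illustrative run: real unit norm Gaussian frame, 3-sparse signal, mild noise.
rng = np.random.default_rng(1)
M, N, sigma = 200, 250, 0.01
F = rng.standard_normal((M, N))
F /= np.linalg.norm(F, axis=0)
x = np.zeros(N)
support = np.array([7, 42, 199])
x[support] = 1.0
y = F @ x + sigma * rng.standard_normal(M)
K_hat, x_hat = ost(F, y, lam=0.5)
```

With the nonzero entries well above both the noise floor and the self-interference floor, the estimated support contains the true support and the least-squares refit drives the reconstruction error down to the noise level.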
Before proceeding further, we first define some notation. We use SNR := ‖x‖²/E[‖e‖²] to denote the signal-to-noise ratio associated with the signal reconstruction problem. Also, we use T_σ(t) := {n : |x_n| > (2/(1−t)) √(2σ² log N)} for any t ∈ (0, 1) to denote the locations of all the entries of x that, roughly speaking, lie above the noise floor σ. Finally, we use T_µ(t) := {n : |x_n| > (20/t) µ_F ‖x‖ √(2 log N)} to denote the locations of entries of x that, roughly speaking, lie above the self-interference floor µ_F ‖x‖.

Theorem 15 (Reconstruction of sparse signals). Take an M × N unit norm frame F which satisfies the Strong Coherence Property, pick t ∈ (0, 1), and choose λ = √(2σ² log N) max{(10/t) µ_F √(M · SNR), √(2/(1−t))}. Further, suppose x ∈ C^N has support K drawn uniformly at random from all possible K-subsets of {1,…,N}. Then provided

    K ≤ N/(c₁‖F‖₂² log N),    (5)

Algorithm 2 produces K̂ such that T_σ(t) ∩ T_µ(t) ⊆ K̂ ⊆ K and x̂ such that

    ‖x − x̂‖ ≤ c₂ √(σ²|K̂| log N) + c₃ ‖x_{K\K̂}‖    (6)

with probability exceeding 1 − 10N⁻¹. Finally, defining T := |T_σ(t) ∩ T_µ(t)|, we further have

    ‖x − x̂‖ ≤ c₂ √(σ²K log N) + c₃ ‖x − x_T‖    (7)

in the same probability event. Here, c₁ = 37e, c₂ = 2/(1 − e^{−1/2}), and c₃ = 1 + e^{−1/2}/(1 − e^{−1/2}) are numerical constants.

A few remarks are in order now for Theorem 15. First, if F satisfies the Strong Coherence Property and F is nearly tight, then OST handles sparsity that is almost linear in M: K = O(M/log N) from (5). Second, the ℓ₂ error associated with the OST algorithm is the near-optimal (modulo the log factor) error of √(σ²K log N) plus the best T-term approximation error caused by the inability of the OST algorithm to recover signal entries that are smaller than O(µ_F ‖x‖ √(2 log N)). Nevertheless, it is easy to convince oneself that such error is still near-optimal for large classes of sparse signals. Consider, for example, the case where µ_F = O(1/√M), the magnitudes of K/2 nonzero entries of x are some α = Ω(√(σ² log N)), while the magnitudes of the other K/2 nonzero entries are not necessarily the same but scale as O(√(σ² log N)). Then we have from Theorem 15 that ‖x − x_T‖ = O(√(σ²K log N)), which leads to the near-optimal ℓ₂ error of ‖x − x̂‖ = O(√(σ²K log N)). To the best of our knowledge, this is the first result in the sparse signal processing literature that does not require RIP and still provides near-optimal reconstruction guarantees for such signals in the presence of noise, while using either random or deterministic frames, even when K = O(M/log N).

REFERENCES

[1] T. Strohmer, R.W. Heath, Grassmannian frames with applications to coding and communication, Appl. Comput. Harmon. Anal. 14 (2003) 257–275.
[2] R.B. Holmes, V.I. Paulsen, Optimal frames for erasures, Linear Algebra Appl. 377 (2004) 31–51.
[3] D.G. Mixon, C. Quinn, N. Kiyavash, M. Fickus, Equiangular tight frame fingerprinting codes, to appear in: Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (2011) 4 pages.
[4] E.J. Candès, Y. Plan, Near-ideal model selection by ℓ₁ minimization, Ann. Statist. 37 (2009) 2145–2177.
[5] D.L. Donoho, M. Elad, V.N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inform. Theory 52 (2006) 6–18.
[6] J.A. Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Trans. Inform. Theory 50 (2004) 2231–2242.
[7] J.A. Tropp, On the conditioning of random subdictionaries, Appl. Comput. Harmon. Anal. 25 (2008) 1–24.
[8] R. Zahedi, A. Pezeshki, E.K.P. Chong, Robust measurement design for detecting sparse signals: Equiangular uniform tight frames and Grassmannian packings, American Control Conference (2010) 6 pages.
[9] W.U. Bajwa, R. Calderbank, S. Jafarpour, Why Gabor frames? Two fundamental measures of coherence and their role in model selection, J. Commun. Netw. 12 (2010) 289–307.
[10] K. Mukkavilli, A. Sabharwal, E. Erkip, B. Aazhang, On beamforming with finite rate feedback in multiple antenna systems, IEEE Trans. Inform. Theory 49 (2003) 2562–2579.
[11] P. Xia, S. Zhou, G.B. Giannakis, Achieving the Welch bound with difference sets, IEEE Trans. Inform. Theory 51 (2005) 1900–1907.
[12] J.H. Conway, R.H. Hardin, N.J.A. Sloane, Packing lines, planes, etc.: Packings in Grassmannian spaces, Experiment. Math. 5 (1996) 139–159.
[13] W.U. Bajwa, R. Calderbank, D.G. Mixon, Two are better than one: Fundamental parameters of frame coherence, arXiv:1103.0435v1.
[14] R. Baraniuk, M. Davenport, R.A. DeVore, M.B. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28 (2008) 253–263.
[15] E.J. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005) 4203–4215.
[16] M.J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ₁-constrained quadratic programming (lasso), IEEE Trans. Inform. Theory 55 (2009) 2183–2202.
[17] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52 (2006) 489–509.
[18] E.J. Candès, T. Tao, Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. Inform. Theory 52 (2006) 5406–5425.
[19] M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Commun. Pure Appl. Math. 61 (2008) 1025–1045.
[20] E.R. Berlekamp, The weight enumerators for certain subcodes of the second order binary Reed–Muller codes, Inform. Control 17 (1970) 485–500.
[21] N.Y. Yu, G. Gong, A new binary sequence family with low correlation and large size, IEEE Trans. Inform. Theory 52 (2006) 1624–1636.
[22] J.J. Benedetto, J.D. Kolesar, Geometric properties of Grassmannian frames for R² and R³, EURASIP J. Appl. Signal Process. (2006) 17 pages.
[23] S.M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice Hall, Upper Saddle River, 1998.
[24] M.A. Davenport, P.T. Boufounos, M.B. Wakin, R.G. Baraniuk, Signal processing with compressive measurements, IEEE J. Select. Topics Signal Process. 4 (2010) 445–460.
[25] J. Haupt, R. Nowak, Compressive sampling for signal detection, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (2007) 1509–1512.
[26] S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998) 33–61.