MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

Signal Processing with Compressive Measurements

Mark Davenport, Petros Boufounos, Michael Wakin, Richard Baraniuk

TR2010-002 February 2010


IEEE Journal of Selected Topics in Signal Processing


Signal Processing with Compressive Measurements

Mark A. Davenport, Student Member, IEEE, Petros T. Boufounos, Member, IEEE, Michael B. Wakin, Member, IEEE, and Richard G. Baraniuk, Fellow, IEEE

Abstract—The recently introduced theory of compressive sensing enables the recovery of sparse or compressible signals from a small set of nonadaptive, linear measurements. If properly chosen, the number of measurements can be much smaller than the number of Nyquist-rate samples. Interestingly, it has been shown that random projections are a near-optimal measurement scheme. This has inspired the design of hardware systems that directly implement random measurement protocols. However, despite the intense focus of the community on signal recovery, many (if not most) signal processing problems do not require full signal recovery. In this paper, we take some first steps in the direction of solving inference problems—such as detection, classification, or estimation—and filtering problems using only compressive measurements and without ever reconstructing the signals involved. We provide theoretical bounds along with experimental results.

MAD and RGB are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005. They are supported by the grants NSF CCF-0431150, CCF-0728867, CNS-0435425, and CNS-0520280, DARPA/ONR N66001-08-1-2065, ONR N00014-07-1-0936, N00014-08-1-1067, N00014-08-1-1112, and N00014-08-1-1066, AFOSR FA9550-07-1-0301, ARO MURI W911NF-07-1-0185, ARO MURI W911NF-09-1-0383, and by the Texas Instruments Leadership University Program. PTB is with Mitsubishi Electric Research Laboratories, Cambridge, MA 02139. MBW is with the Division of Engineering, Colorado School of Mines, Golden, CO 80401. He is supported by NSF grants DMS-0603606 and CCF-0830320, and DARPA Grant HR0011-08-1-0078. Email: {md, richb}@rice.edu, [email protected], [email protected].

I. INTRODUCTION

A. From DSP to CSP

In recent decades the digital signal processing (DSP) community has enjoyed enormous success in developing algorithms for capturing and extracting information from signals. Capitalizing on the early work of Whittaker, Nyquist, and Shannon on sampling and representation of continuous signals, signal processing has moved from the analog to the digital domain and ridden the wave of Moore's law. Digitization has enabled the creation of sensing and processing systems that are more robust, flexible, cheaper and, therefore, more ubiquitous than their analog counterparts.

As a result of this success, the amount of data generated by sensing systems has grown from a trickle to a torrent. We are thus confronted with the challenges of:
1) acquiring signals at ever higher sampling rates,
2) storing the resulting large amounts of data, and
3) processing/analyzing large amounts of data.

Until recently the first challenge was largely ignored by the signal processing community, with most advances being made by hardware designers. Meanwhile, the signal processing community made tremendous progress on the remaining two challenges, largely via research in the fields of modeling, compression, and dimensionality reduction. However, the solutions to these problems typically rely on having a complete set of digital samples. Only after the signal is acquired—presumably using a sensor tailored to the signals of interest—could one distill the necessary information ultimately required by the user or the application. This requirement has placed a growing burden on analog-to-digital converters [1]. As the required sampling rate is pushed higher, these devices move inevitably toward a physical barrier, beyond which their design becomes increasingly difficult and costly [2].

Thus, in recent years, the signal processing community has also begun to address the challenge of signal acquisition more directly by leveraging its successes in addressing the second two. In particular, compressive sensing (CS) has emerged as a framework that can significantly reduce the acquisition cost at a sensor. CS builds on the work of Candès, Romberg, and Tao [3], and Donoho [4], who showed that a signal that can be compressed using classical methods such as transform coding can also be efficiently acquired via a small set of nonadaptive, linear, and usually randomized measurements.

A fundamental difference between CS and classical sampling is the manner in which the two frameworks deal with signal recovery, i.e., the problem of recovering the signal from the measurements. In the Shannon-Nyquist framework, signal recovery is achieved through sinc interpolation—a linear process that requires little computation and has a simple interpretation. In CS, however, signal recovery is achieved using nonlinear and relatively expensive optimization-based or iterative algorithms [3–5]. Thus, up to this point, most of the CS literature has focused on improving the speed and accuracy of this process [6–9].

However, signal recovery is not actually necessary in many signal processing applications. Very often we are only interested in solving an inference problem (extracting certain information from measurements) or in filtering out information that is not of interest before further processing. While one could always attempt to recover the full signal from the compressive measurements and then solve the inference or filtering problem using traditional DSP techniques, this approach is typically suboptimal in terms of both accuracy and efficiency.

This paper takes some initial steps towards a general framework for what we call compressive signal processing (CSP), an alternative approach in which signal processing problems are solved directly in the compressive measurement domain without first resorting to a full-scale signal reconstruction. In espousing the potential of CSP we focus on four fundamental signal processing problems: detection, classification, estimation, and filtering. The first three enable the extraction of information from the samples, while the last enables the removal of irrelevant information and separation of signals into distinct components. While these choices do not exhaust the set of canonical signal processing operations, we believe that they provide a strong initial foundation.

B. Relevance

In what settings is it actually beneficial to take randomized, compressive measurements of a signal in order to solve an inference problem? One may argue that prior knowledge of the signal to be acquired or of the inference task to be solved could lead to a customized sensing protocol that very efficiently acquires the relevant information. For example, suppose we wish to acquire a length-N signal that is K-sparse (i.e., has K nonzero coefficients) in a known transform basis. If we knew in advance which elements were nonzero, then the most efficient and direct measurement scheme would simply project the signal into the appropriate K-dimensional subspace. As a second example, suppose we wish to detect a known signal. If we knew in advance the signal template, then the optimal and most efficient measurement scheme would simply involve a receiving filter explicitly matched to the candidate signal.

Clearly, in cases where strong a priori information is available, customized sensing protocols may be appropriate. However, a key objective of this paper is to illustrate the agnostic and universal nature of random compressive measurements as a compact signal representation. These features enable the design of exceptionally efficient and flexible compressive sensing hardware that can be used for the acquisition of a variety of signal classes and applied to a variety of inference tasks.

As has been demonstrated in the CS literature, for example, random measurements can be used to acquire any sparse signal without requiring advance knowledge of the locations of the nonzero coefficients. Thus, compressive measurements are agnostic in the sense that they capture the relevant information for the entire class of possible K-sparse signals. We extend this concept to the CSP framework and demonstrate that it is possible to design agnostic measurement schemes that preserve the necessary structure of large signal classes in a variety of signal processing settings.

Furthermore, we observe that one can select a randomized measurement scheme without any prior knowledge of the signal class. For instance, in conventional CS it is not necessary to know the transform basis in which the signal has a sparse representation when acquiring the measurements. The only dependence is between the complexity of the signal class (e.g., the sparsity level of the signal) and the number of random measurements that must be acquired. Thus, random compressive measurements are universal in the sense that if one designs a measurement scheme at random, then with high probability it will preserve the structure of the signal class of interest, and thus explicit a priori knowledge of the signal class is unnecessary. We broaden this result and demonstrate that random measurements can universally capture the information relevant for many CSP applications without any prior knowledge of either the signal class or the ultimate signal processing task. In such cases, the requisite number of measurements scales efficiently with both the complexity of the signal and the complexity of the task to be performed.

It follows that, in contrast to the task-specific hardware used in many classical acquisition systems, hardware designed to use a compressive measurement protocol can be extremely flexible. Returning to the binary detection scenario, for example, suppose that the signal template is unknown at the time of acquisition, or that one has a large number of candidate templates. Then what information should be collected at the sensor? A complete set of Nyquist samples would suffice, or a bank of matched filters could be employed. From a CSP standpoint, however, the solution is more elegant: one need only collect a small number of compressive

measurements from which many candidate signals can be tested, many signal models can be posited, and many other inference tasks can be solved. What one loses in performance compared to a tailor-made matched filter, one may gain in simplicity and in the ability to adapt to future information about the problem at hand. In this sense, CSP impacts sensors in a similar manner as DSP impacted analog signal processing: expensive and inflexible analog components can be replaced by a universal, flexible, and programmable digital system.

C. Applications

A stylized application to demonstrate the potential and applicability of the results in this paper is summarized in Figure 1. The figure schematically presents a wide-band signal monitoring and processing system that receives signals from a variety of sources, including various television, radio, and cell-phone transmissions, radar signals, and satellite communication signals. The extremely wide bandwidth monitored by such a system makes CS a natural approach for efficient signal acquisition [10].

Fig. 1. Example CSP application: broadband signal monitoring. (A CS-based receiver feeds compressive-domain modules for target detection, signal identification, filtering, and signal recovery.)

In many cases, the system user might only be interested in extracting very small amounts of information from each signal. This can be efficiently performed using the tools we describe in the subsequent sections. For example, the user might be interested in detecting and classifying some of the signal sources, and in estimating some parameters, such as the location, of others. Full-scale signal recovery might be required for only a few of the signals in the monitored bandwidth.

The detection, estimation, and classification tools we develop in this paper enable the system to perform these tasks much more efficiently in the compressive domain. Furthermore, the filtering procedure we describe facilitates the separation of signals after they have been acquired in the compressive domain so that each signal can be processed by the appropriate algorithm, depending on the information sought by the user.

D. Related Work

In this paper we consider a variety of estimation and decision tasks. The data streaming community, which is concerned with efficient algorithms for processing large streams of data, has examined many similar problems over the past several years. In the data stream setting, one is typically interested in estimating some function of the data stream (such as an ℓ_p norm, a histogram, or a linear functional) based on sketches, which in many cases can be thought of as random projections. For a concise review of these results, see [11]. Main differences with our work include: (i) data stream algorithms are typically designed to operate in noise-free environments on man-made digital signals, whereas we view compressive measurements as a sensing scheme that will operate in an inherently noisy environment; (ii) data stream algorithms typically provide probabilistic guarantees, while we focus on providing deterministic guarantees; and (iii) data stream algorithms tend to tailor the measurement scheme to the task at hand, while we demonstrate that it is often possible to use the same measurements for a variety of signal processing tasks.

There have been a number of related thrusts involving detection and classification using random measurements in a variety of settings. For example, in [12] sparsity is leveraged to perform classification with very few random measurements, while in [13, 14] random measurements are exploited to perform manifold-based image classification. In [15], small numbers of random measurements have also been noted as capturing sufficient information to allow robust face recognition. However, the most directly relevant work has been the discussions of classification in [16] and detection in [17]. We will contrast our results to those of [16, 17] below. This paper builds upon work initially presented in [18, 19].

E. Organization

This paper is organized as follows. Section II provides the necessary background on dimensionality reduction and CS. In Sections III, IV, and V we develop algorithms for signal detection, classification, and estimation with compressive measurements. In Section VI we explore the problem of filtering compressive measurements in the compressive domain. Finally, Section VII concludes with directions for future work.

II. COMPRESSIVE MEASUREMENTS AND STABLE EMBEDDINGS

A. Compressive Sensing and Restricted Isometries

In the standard CS framework, we acquire a signal x ∈ ℝ^N via the linear measurements

y = Φx,   (1)

where Φ is an M × N matrix representing the sampling system and y ∈ ℝ^M is the vector of measurements.
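To make the measurement model (1) concrete, the following minimal NumPy sketch (our own illustration, not code from the paper; all parameter values are arbitrary) acquires M = 100 random Gaussian measurements of a K-sparse signal of length N = 1000.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 1000, 100, 10          # ambient dimension, measurements, sparsity

# Random Gaussian measurement matrix with E(phi_ij^2) = 1/M (see (6) below).
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

# A K-sparse test signal: K nonzero coefficients at random locations.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.normal(size=K)

y = Phi @ x                      # the compressive measurements of (1)
print(y.shape)                   # (M,) -- far fewer numbers than N
```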

For simplicity, we deal with real-valued rather than quantized measurements y. Classical sampling theory dictates that, in order to ensure that there is no loss of information, the number of samples M should be at least the signal dimension N. The CS theory, on the other hand, allows for M ≪ N, as long as the signal x is sparse or compressible in some basis [3, 4, 20, 21].

To understand how many measurements are required to enable the recovery of a signal x, we must first examine the properties of Φ that guarantee satisfactory performance of the sensing system. In [21], Candès and Tao introduced the restricted isometry property (RIP) of a matrix Φ and established its important role in CS. First define Σ_K to be the set of all K-sparse signals, i.e.,

Σ_K = {x ∈ ℝ^N : ‖x‖₀ := |supp(x)| ≤ K},

where supp(x) ⊂ {1, 2, …, N} denotes the set of indices on which x is nonzero. We say that a matrix Φ satisfies the RIP of order K if there exists a constant δ ∈ (0, 1), such that

(1 − δ)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ)‖x‖₂²   (2)

holds for all x ∈ Σ_K. In other words, Φ is an approximate isometry for vectors restricted to be K-sparse.

It is clear that if we wish to be able to recover all K-sparse signals x from the measurements y, then a necessary condition on Φ is that Φx₁ ≠ Φx₂ for any pair x₁, x₂ ∈ Σ_K with x₁ ≠ x₂. Equivalently, we require ‖Φ(x₁ − x₂)‖₂² > 0, which is guaranteed if Φ satisfies the RIP of order 2K with constant δ < 1. Furthermore, the RIP also ensures that a variety of practical algorithms can successfully recover any compressible signal from noisy measurements. The following result (Theorem 1.2 of [22]) makes this precise by bounding the recovery error of x with respect to the sampling noise and with respect to the ℓ₁-distance from x to its best K-term approximation denoted x_K:

x_K = arg min_{x′ ∈ Σ_K} ‖x − x′‖₁.

Theorem 1 ([Candès]). Suppose that Φ satisfies the RIP of order 2K with isometry constant δ < √2 − 1. Given measurements of the form y = Φx + e, where ‖e‖₂ ≤ ε, the solution to

x̂ = arg min_{x′ ∈ ℝ^N} ‖x′‖₁ subject to ‖Φx′ − y‖₂ ≤ ε   (3)

obeys

‖x̂ − x‖₂ ≤ C₀ ε + C₁ ‖x − x_K‖₁/√K,   (4)

where

C₀ = 4√(1 + δ) / (1 − (1 + √2)δ),   C₁ = 2 (1 − (1 − √2)δ) / (1 − (1 + √2)δ).

Note that in practice we may wish to acquire signals that are sparse or compressible with respect to a certain sparsity basis Ψ, i.e., x = Ψα where Ψ is represented as a unitary N × N matrix and α ∈ Σ_K. In this case we would require instead that ΦΨ satisfy the RIP, and the performance guarantee would be on ‖α̂ − α‖₂.

Before we discuss how one can actually obtain a matrix Φ that satisfies the RIP, we observe that we can restate the RIP in a more general form. Let δ ∈ (0, 1) and U, V ⊂ ℝ^N be given. We say that a mapping Φ is a δ-stable embedding of (U, V) if

(1 − δ)‖u − v‖₂² ≤ ‖Φu − Φv‖₂² ≤ (1 + δ)‖u − v‖₂²   (5)

for all u ∈ U and v ∈ V. A mapping satisfying this property is also commonly called bi-Lipschitz. Observe that for a matrix Φ, satisfying the RIP of order 2K is equivalent to being a δ-stable embedding of (Σ_K, Σ_K) or of (Σ_2K, {0}). (In general, if Φ is a δ-stable embedding of (U, V), this is equivalent to it being a δ-stable embedding of (Ũ, {0}) where Ũ = {u − v : u ∈ U, v ∈ V}. This formulation can sometimes be more convenient.) Furthermore, if the matrix ΦΨ satisfies the RIP of order 2K then Φ is a δ-stable embedding of (Ψ(Σ_K), Ψ(Σ_K)) or (Ψ(Σ_2K), {0}), where Ψ(Σ_K) = {Ψα : α ∈ Σ_K}.

B. Random Matrix Constructions

We now turn to the more general question of how to construct linear mappings Φ that satisfy (5) for particular sets U and V. While it is possible to obtain deterministic constructions of such operators, at present the most efficient designs (i.e., those requiring the fewest number of rows) rely on random matrix constructions.

We construct our random matrices as follows: given M and N, we generate random M × N matrices Φ by choosing the entries φ_ij as independent and identically distributed (i.i.d.) random variables. We impose two conditions on the random distribution. First, we require that the distribution yields a matrix that is norm-preserving, which requires that

E(φ_ij²) = 1/M.   (6)

Second, we require that the distribution is a sub-Gaussian distribution, meaning that there exists a constant C > 0 such that

E(e^{φ_ij t}) ≤ e^{C²t²/2}   (7)

for all t ∈ ℝ. This says that the moment-generating function of our distribution is dominated by that of a Gaussian distribution, which is also equivalent to requiring that the tails of our distribution decay at least as fast as the tails of a Gaussian distribution. Examples of sub-Gaussian distributions include the Gaussian distribution, the Rademacher distribution, and the uniform distribution. In general, any distribution with bounded support is sub-Gaussian. See [23] for more details on sub-Gaussian random variables.
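The stable embedding property (5) is easy to probe numerically. The sketch below (ours; the empirical δ it reports is an observation for one draw of Φ, not a guarantee) draws a Gaussian Φ satisfying (6) and measures how far ‖Φ(u − v)‖₂²/‖u − v‖₂² strays from 1 over random pairs of K-sparse vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K, trials = 1000, 100, 10, 2000

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))  # satisfies (6)

# Empirically bound delta in (5) over random pairs of K-sparse vectors.
ratios = []
for _ in range(trials):
    u = np.zeros(N); v = np.zeros(N)
    u[rng.choice(N, K, replace=False)] = rng.normal(size=K)
    v[rng.choice(N, K, replace=False)] = rng.normal(size=K)
    d = u - v
    ratios.append(np.sum((Phi @ d) ** 2) / np.sum(d ** 2))

ratios = np.array(ratios)
print("empirical delta ~", max(1 - ratios.min(), ratios.max() - 1))
```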

The key property of sub-Gaussian random variables that will be of use in this paper is that for any x ∈ ℝ^N, the random variable ‖Φx‖₂² is highly concentrated about ‖x‖₂²; that is, there exists a constant c > 0 that depends only on the constant C in (7) such that

Pr( |‖Φx‖₂² − ‖x‖₂²| ≥ δ‖x‖₂² ) ≤ 2e^{−cMδ²},   (8)

where the probability is taken over all M × N matrices Φ (see Lemma 6.1 of [24] or [25]).

C. Stable Embeddings

We now provide a number of results that we will use extensively in the sequel to ensure the stability of our compressive detection, classification, estimation, and filtering algorithms.

We start with the simple case where we desire a δ-stable embedding of (U, V) where U = {u_i}_{i=1}^{|U|} and V = {v_j}_{j=1}^{|V|} are finite sets of points in ℝ^N. In the case where U = V, this is essentially the Johnson-Lindenstrauss (JL) lemma [26–28].

Lemma 1. Let U and V be sets of points in ℝ^N. Fix δ, β ∈ (0, 1). Let Φ be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (8). If

M ≥ (ln(|U||V|) + ln(2/β)) / (cδ²)   (9)

then with probability exceeding 1 − β, Φ is a δ-stable embedding of (U, V).

Proof: To prove the result we apply (8) to the |U||V| vectors corresponding to all possible u_i − v_j. By applying the union bound, we obtain that the probability of (5) not holding is bounded above by 2|U||V|e^{−cMδ²}. By requiring 2|U||V|e^{−cMδ²} ≤ β and solving for M we obtain the desired result.

We now consider the case where U = X is a K-dimensional subspace of ℝ^N and V = {0}. Thus, we wish to obtain a Φ that nearly preserves the norm of any vector x ∈ X. At first glance, this goal might seem very different than the setting for Lemma 1, since a subspace forms an uncountable point set. However, we will see that the dimension K bounds the complexity of this space and thus it can be characterized in terms of a finite number of points. The following lemma is an adaptation of Lemma 5.1 in [29]. (The constants in [29] differ from those in Lemma 2, but the proof is substantially the same, so we provide only a sketch.)

Lemma 2. Suppose that X is a K-dimensional subspace of ℝ^N. Fix δ, β ∈ (0, 1). Let Φ be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (8). If

M ≥ 2 (K ln(42/δ) + ln(2/β)) / (cδ²)   (10)

then with probability exceeding 1 − β, Φ is a δ-stable embedding of (X, {0}).

Sketch of proof: It suffices to prove the result for x ∈ X satisfying ‖x‖₂ = 1, since Φ is linear. We consider a finite sampling of points U ⊂ X of unit norm and with resolution on the order of δ/14; one can show that it is possible to construct such a U with |U| ≤ (42/δ)^K (see Ch. 15 of [30]). Applying Lemma 1 and setting M to ensure a (δ/√2)-stable embedding of (U, {0}), we can use simple geometric arguments to conclude that we must have a δ-stable embedding of ({x}, {0}) for every x ∈ X satisfying ‖x‖₂ = 1. For details, see Lemma 5.1 in [29].

We now observe that we can extend this result beyond a single K-dimensional subspace to all possible K-dimensional subspaces that are defined with respect to an orthonormal basis Ψ, i.e., Ψ(Σ_K). The proof follows that of Theorem 5.2 of [29].

Lemma 3. Let Ψ be an orthonormal basis for ℝ^N and fix δ, β ∈ (0, 1). Let Φ be an M × N random matrix with i.i.d. entries chosen from a distribution satisfying (8). If

M ≥ 2 (K ln(42eN/δK) + ln(2/β)) / (cδ²)   (11)

with e denoting the base of the natural logarithm, then with probability exceeding 1 − β, Φ is a δ-stable embedding of (Ψ(Σ_K), {0}).

Proof: This is a simple generalization of Lemma 2, which follows from the observation that there are (N choose K) ≤ (eN/K)^K K-dimensional subspaces aligned with the coordinate axes of Ψ, and so the size of U increases to |U| ≤ (42eN/δK)^K.
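The concentration bound (8) and the measurement bound (9) of Lemma 1 can be illustrated numerically. In the sketch below (ours), the constant c in (8) and (9) is not specified by the text, so the value c = 0.1 is a placeholder chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, delta, trials = 1000, 100, 0.2, 1000

x = rng.normal(size=N)
x /= np.linalg.norm(x)            # unit-norm test vector

# Estimate the left-hand side of (8) over fresh random draws of Phi.
fails = 0
for _ in range(trials):
    Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
    if abs(np.sum((Phi @ x) ** 2) - 1.0) >= delta:
        fails += 1
print("empirical failure rate:", fails / trials)  # decays like 2*exp(-c*M*delta^2)

# Lemma 1 sizing for finite sets U, V (c = 0.1 is a placeholder, not from the paper).
c, beta, sizeU, sizeV = 0.1, 0.01, 100, 1
M_required = (np.log(sizeU * sizeV) + np.log(2 / beta)) / (c * delta ** 2)
print("Lemma 1 bound on M:", int(np.ceil(M_required)))
```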

A similar technique has recently been used to demonstrate that random projections also provide a stable embedding of nonlinear manifolds [31]: under certain assumptions on the curvature and volume of a K-dimensional manifold M ⊂ ℝ^N, a random sensing matrix with M = O(K log(N)/δ²) will with high probability provide a δ-stable embedding of (M, M). Under slightly different assumptions on M, a number of similar embedding results involving random projections have been established [32–34].

We will make further use of these connections in the following sections in our analysis of a variety of algorithms for compressive-domain inference and filtering.

D. Stable embeddings in practice

Several hardware architectures have been proposed that enable the acquisition of compressive measurements in practical settings. Examples include the random demodulator [35], random filtering and random convolution [36–38], and several compressive imaging architectures [39–41].

We briefly describe the random demodulator as an example of such a system. Figure 2 depicts the block diagram of the random demodulator. The four key components are a pseudo-random ±1 "chipping sequence" p_c(t) operating at the Nyquist rate or higher, a low-pass filter, represented by an ideal integrator with reset, a low-rate sample-and-hold, and a quantizer. An input analog signal x(t) is modulated by the chipping sequence and integrated. The output of the integrator is sampled and quantized, and the integrator is reset after each sample.

Fig. 2. Random demodulator for obtaining compressive measurements of analog signals. (A pseudorandom number generator seeded at the receiver drives the chipping sequence; the modulated signal passes through an integrator, a sample-and-hold, and a quantizer to produce y[n].)

Mathematically, systems such as these implement a linear operator that maps the analog input signal to a discrete output vector, followed by a quantizer. It is possible to relate this operator to a discrete measurement matrix Φ which maps, for example, the Nyquist-rate samples of the input signal to the discrete output vector. The resulting Φ matrices, while randomized, typically contain some degree of structure. For example, a random convolution architecture gives rise to a Φ matrix with a subsampled Toeplitz structure. While theoretical analysis of these matrices remains a topic of active study in the CS community, there do exist guarantees of stable embeddings for such practical architectures [35, 37].

E. Deterministic versus probabilistic guarantees

Throughout this paper, we state a variety of theorems that begin with the assumption that Φ is a δ-stable embedding of a pair of sets and then use this assumption to establish performance guarantees for a particular CSP algorithm. These guarantees are completely deterministic and hold for any Φ that is a δ-stable embedding. However, we use random constructions as our main tool for obtaining stable embeddings. Thus, all of our results could be modified to be probabilistic statements in which we fix M and then argue that with high probability, a random Φ is a δ-stable embedding. Of course, the concept of "high probability" is somewhat arbitrary. However, if we fix this probability of error to be an acceptable constant β, then as we increase M, we are able to reduce δ to be arbitrarily close to 0. This will typically improve the accuracy of the guarantees.

As a side comment, it is important to note that in the case where one is able to generate a new Φ before acquiring each new signal x, then it is often possible to drastically reduce the required M. This is because one may be able to eliminate the requirement that Φ is a stable embedding for an entire class of candidate signals x, and instead simply argue that for each x, a new random matrix Φ with M very small is a δ-stable embedding of ({x}, {0}) (this is a direct consequence of (8)). Thus, if such a probabilistic "for each" guarantee is acceptable, it is typically possible to place no assumptions on the signals being sparse, or indeed having any structure at all. However, in the remainder of this paper we will restrict ourselves to the sort of deterministic guarantees that hold for a class of signals when Φ provides a stable embedding of that class.

III. DETECTION WITH COMPRESSIVE MEASUREMENTS

A. Problem Setup and Applications

We begin by examining the simplest of detection problems. We aim to distinguish between two hypotheses:

H₀ : y = Φn
H₁ : y = Φ(s + n)

where s ∈ ℝ^N is a known signal, n ~ N(0, σ²I_N) is i.i.d. Gaussian noise, and Φ is a known (fixed) measurement matrix. If s is known at the time of the design of Φ, then it is easy to show that the optimal design would be to set Φ = sᵀ, which is just the matched filter. However, as mentioned in the introduction, we are often interested in universal or agnostic Φ. As an example, if we design hardware to implement the matched filter for a particular s, then we are very limited in what other signal processing tasks that hardware can perform. Even if we are only interested in detection, it is possible that the signal s that we wish to detect may evolve over time. Thus, we will consider instead the case where Φ is designed without knowledge of s but is instead a random matrix. From the results of Section II, this will imply performance bounds that depend on how many measurements are acquired and the class S of possible s that we wish to detect.

B. Theory

To set notation, let

P_F = Pr(H₁ chosen when H₀ true) and P_D = Pr(H₁ chosen when H₁ true)

denote the false alarm rate and the detection rate, respectively. The Neyman-Pearson (NP) detector is the decision rule that maximizes P_D subject to the constraint that P_F ≤ α. In order to derive the NP detector, we first observe that for our hypotheses, H₀ and H₁, we have the probability density functions

f₀(y) = exp( −½ yᵀ(σ²ΦΦᵀ)⁻¹y ) / ( |σ²ΦΦᵀ|^{1/2} (2π)^{M/2} )

and

f₁(y) = exp( −½ (y − Φs)ᵀ(σ²ΦΦᵀ)⁻¹(y − Φs) ) / ( |σ²ΦΦᵀ|^{1/2} (2π)^{M/2} ).

(This formulation assumes that rank(Φ) = M so that ΦΦᵀ is invertible. If the entries of Φ are generated according to a continuous distribution and M < N, then this will be true with probability 1. This will also be true with high probability for discrete distributions provided that M ≪ N. In the event that Φ is not full rank, appropriate adjustments can be made.)

It is easy to show (see [42, 43], for example) that the NP-optimal decision rule is to compare the ratio f₁(y)/f₀(y) to a threshold η, i.e., the likelihood ratio test:

Λ(y) = f₁(y)/f₀(y) ≷ η   (choose H₁ if above, H₀ if below)

where η is chosen such that

P_F = ∫_{Λ(y)>η} f₀(y) dy = α.

By taking a logarithm we obtain an equivalent test that simplifies to

yᵀ(ΦΦᵀ)⁻¹Φs ≷ σ² log(η) + ½ sᵀΦᵀ(ΦΦᵀ)⁻¹Φs := γ.

We now define the compressive detector:

t := yᵀ(ΦΦᵀ)⁻¹Φs.   (12)

It can be shown that t is a sufficient statistic for our detection problem, and thus t contains all of the information relevant for distinguishing between H₀ and H₁. We must now set γ to achieve the desired performance. To simplify notation, we define

P_{Φᵀ} = Φᵀ(ΦΦᵀ)⁻¹Φ

as the orthogonal projection operator onto R(Φᵀ), i.e., the row space of Φ. Since P_{Φᵀ}ᵀ = P_{Φᵀ} and P_{Φᵀ}² = P_{Φᵀ}, we then have that

sᵀΦᵀ(ΦΦᵀ)⁻¹Φs = ‖P_{Φᵀ}s‖₂².   (13)

Using this notation, it is easy to show that

t ~ N(0, σ²‖P_{Φᵀ}s‖₂²)   under H₀,
t ~ N(‖P_{Φᵀ}s‖₂², σ²‖P_{Φᵀ}s‖₂²)   under H₁.

Thus we have

P_F = P(t > γ | H₀) = Q( γ / (σ‖P_{Φᵀ}s‖₂) ),
P_D = P(t > γ | H₁) = Q( (γ − ‖P_{Φᵀ}s‖₂²) / (σ‖P_{Φᵀ}s‖₂) ),

where

Q(z) = (1/√(2π)) ∫_z^∞ exp(−u²/2) du.

To determine the threshold, we set P_F = α, and thus

γ = σ‖P_{Φᵀ}s‖₂ Q⁻¹(α),

resulting in

P_D(α) = Q( Q⁻¹(α) − ‖P_{Φᵀ}s‖₂/σ ).   (14)

In general, this performance could be either quite good or quite poor depending on Φ. In particular, the larger ‖P_{Φᵀ}s‖₂ is, then the better the performance. Recalling that P_{Φᵀ} is the orthogonal projection onto the row space of Φ, we see that ‖P_{Φᵀ}s‖₂ is simply the norm of the component of s that lies in the row space of Φ. This quantity is clearly at most ‖s‖₂, which would yield the same performance as the traditional matched filter, but it could also be 0 if s lies in the null space of Φ. As we will see below, however, in the case where Φ is random, we can expect that ‖P_{Φᵀ}s‖₂ concentrates around √(M/N)‖s‖₂.

Let us now define

SNR := ‖s‖₂²/σ².   (15)
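The detector (12), the threshold γ, and the prediction (14) can be exercised in a few lines. The sketch below (our illustration, with arbitrary parameter choices) estimates P_F and P_D by Monte Carlo and compares against (14); it assumes SciPy, whose norm.isf and norm.sf implement Q⁻¹ and Q.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
N, M, sigma, alpha, trials = 1000, 100, 0.1, 0.1, 5000

s = rng.normal(size=N); s /= np.linalg.norm(s)        # unit-norm target
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

# Precompute the pieces of the detector (12) and the threshold gamma.
G = np.linalg.inv(Phi @ Phi.T)
w = G @ (Phi @ s)                                     # t = y^T w
Ps_norm = np.sqrt(s @ Phi.T @ w)                      # ||P_{Phi^T} s||_2 via (13)
gamma = sigma * Ps_norm * norm.isf(alpha)             # Q^{-1}(alpha) = norm.isf

false_alarms = detections = 0
for _ in range(trials):
    n = rng.normal(scale=sigma, size=N)
    false_alarms += (Phi @ n) @ w > gamma             # H0 data
    detections += (Phi @ (s + n)) @ w > gamma         # H1 data
print("P_F ~", false_alarms / trials, "(target", alpha, ")")
print("P_D ~", detections / trials,
      "(theory", norm.sf(norm.isf(alpha) - Ps_norm / sigma), ")")
```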

We can bound the performance of the compressive detector as follows.

Theorem 2. Suppose that √(N/M) P_{Φᵀ} provides a δ-stable embedding of (S, {0}). Then for any s ∈ S, we can detect s with error rate

P_D(α) ≤ Q( Q⁻¹(α) − √(1 + δ) √(M/N) √SNR )   (16)

and

P_D(α) ≥ Q( Q⁻¹(α) − √(1 − δ) √(M/N) √SNR ).   (17)

Proof: By our assumption that √(N/M) P_{Φᵀ} provides a δ-stable embedding of (S, {0}), we know from (5) that

√(1 − δ) ‖s‖₂ ≤ √(N/M) ‖P_{Φᵀ}s‖₂ ≤ √(1 + δ) ‖s‖₂.   (18)

Combining (18) with (14) and recalling the definition of the SNR from (15), the result follows.

For certain randomized measurement systems, one can anticipate that √(N/M) P_{Φᵀ} will provide a δ-stable embedding of (S, {0}). As one example, if Φ has orthonormal rows spanning a random subspace (i.e., it represents a random orthogonal projection), then ΦΦᵀ = I, and so P_{Φᵀ} = ΦᵀΦ. It follows that ‖P_{Φᵀ}s‖₂ = ‖ΦᵀΦs‖₂ = ‖Φs‖₂, and for random orthogonal projections, it is known [27] that ‖P_{Φᵀ}s‖₂ = ‖Φs‖₂ satisfies

(1 − δ)(M/N)‖s‖₂² ≤ ‖P_{Φᵀ}s‖₂² ≤ (1 + δ)(M/N)‖s‖₂²   (19)

with probability at least 1 − 2e^{−cMδ²}. This statement is analogous to (8) but rescaled to account for the unit-norm rows of Φ. As a second example, if Φ is populated with i.i.d. zero-mean Gaussian entries (of any fixed variance), then the orientation of the row space of Φ has random uniform distribution. Thus, ‖P_{Φᵀ}s‖₂ for a Gaussian Φ has the same distribution as ‖P_{Φᵀ}s‖₂ for a random orthogonal projection. It follows that Gaussian Φ also satisfy (19) with probability at least 1 − 2e^{−cMδ²}.

The similarity between (19) and (8) immediately implies that we can generalize Lemmas 1, 2, and 3 to establish δ-stable embedding results for orthogonal projection matrices P_{Φᵀ}. It follows that, when Φ is a Gaussian matrix (with entries satisfying (6)) or a random orthogonal projection (multiplied by √(N/M)), the number of measurements required to establish a δ-stable embedding for √(N/M) P_{Φᵀ} on a particular signal family S is equivalent to the number of measurements required to establish a δ-stable embedding for Φ on S.

Theorem 2 tells us in a precise way how much information we lose by using random projections rather than the signal samples themselves, not in terms of our ability to recover the signal as is typically addressed in CS, but in terms of our ability to solve a detection problem. Specifically, for typical values of δ,

P_D(α) ≈ Q( Q⁻¹(α) − √(M/N) √SNR ),   (20)

which increases the miss probability by an amount determined by the SNR and the ratio M/N.

In order to more clearly illustrate the behavior of P_D(α) as a function of M, we also establish the following corollary of Theorem 2.

Corollary 1. Suppose that √(N/M) P_{Φᵀ} provides a δ-stable embedding of (S, {0}). Then for any s ∈ S, we can detect s with success rate

P_D(α) ≥ 1 − C₂ e^{−C₁M/N},   (21)

where C₁ and C₂ are absolute constants depending only on α, δ, and the SNR.

Proof: We begin with the following bound from (13.48) of [44]:

Q(z) ≤ ½ e^{−z²/2},   (22)

which allows us to bound P_D as follows. Let C₁ = (1 − δ)SNR/2. Then

P_D(α) ≥ Q( Q⁻¹(α) − √(2C₁M/N) )
       = 1 − Q( √(2C₁M/N) − Q⁻¹(α) )
       ≥ 1 − ½ e^{−C₁M/N + √(2C₁M/N) Q⁻¹(α) − (Q⁻¹(α))²/2}
       ≥ 1 − ½ e^{−C₁M/N + √(2C₁) Q⁻¹(α) − (Q⁻¹(α))²/2}.

Thus, if we let

C₂ = ½ e^{−Q⁻¹(α)( Q⁻¹(α)/2 − √(2C₁) )},   (23)

we obtain the desired result.

Thus, for a fixed SNR and signal length, the detection probability approaches 1 exponentially fast as we increase the number of measurements.

C. Experiments and Discussion

We first explore how M affects the performance of the compressive detector. As described above, decreasing M does cause a degradation in performance. However, as illustrated in Figure 3, in certain cases (relatively high SNR; 20 dB in this example) the compressive detector can perform almost as well as the traditional detector with a very small fraction of the number of measurements required by traditional detection. Specifically, in Figure 3 we illustrate the receiver operating characteristic (ROC) curve, i.e., the relationship between P_D and P_F predicted by (20). Observe that as M increases, the ROC curve approaches the upper-left corner, meaning that we can achieve very high detection rates while simultaneously keeping the false alarm rate very low. As M grows we see that we rapidly reach a regime where any additional increase in M yields only marginal improvements in the tradeoff between P_D and P_F.

Fig. 3. Effect of M on P_D(α) predicted by (20) (SNR = 20dB; curves shown for M = 0.4N, 0.2N, 0.1N, and 0.05N).

Furthermore, the exponential increase in the detection probability as we take more measurements is illustrated in Figure 4, which plots the performance predicted by (20) for a range of SNRs with α = 0.1. However, we again note that in practice this rate can be significantly affected by the SNR, which determines the constants in the bound of (21). These results are consistent with those obtained in [17], which also established that P_D should approach 1 exponentially fast as M is increased.

Fig. 4. Effect of M on P_D predicted by (20) at several different SNR levels (α = 0.1; SNR = 25dB, 20dB, 15dB, and 10dB).

Finally, we close by noting that for any given instance of Φ, its ROC curve may be better or worse than that predicted by (20). However, with high probability it is tightly concentrated around the expected performance curve. Figure 5 illustrates this for the case where s is fixed and the SNR is 20dB, Φ has i.i.d. Gaussian entries, M = 0.05N, and N = 1000. The predicted ROC curve is illustrated along with curves displaying the best and worst ROC curves obtained over 100 independent draws of Φ. We see that our performance is never significantly different from what we expect. Furthermore, we have also observed that these bounds grow significantly tighter as we increase N; so for large problems the difference between the predicted and actual curves will be insignificant. We also note that while some of our theory has been limited to Φ that are Gaussian or random orthogonal projections, we observe that in practice this does not seem to be necessary. We repeated the above experiment for matrices with independent Rademacher entries and observed no significant differences in the results.

Fig. 5. Concentration of ROC curves for random Φ near the expected ROC curve (SNR = 20dB, M = 0.05N, N = 1000; predicted curve shown with upper and lower bounds).

IV. CLASSIFICATION WITH COMPRESSIVE MEASUREMENTS

A. Problem Setup and Applications

We can easily generalize the setting of Section III to the problem of binary classification. Specifically, if we wish to distinguish between Φ(s₀ + n) and Φ(s₁ + n), then it is equivalent to be able to distinguish Φ(s₀ + n) − Φs₀ = Φn and Φ(s₁ − s₀ + n). Thus, the conclusions for the case of binary classification are identical to those discussed in Section III.

More generally, suppose that we would like to distinguish between the hypotheses:

H̃ᵢ : y = Φ(sᵢ + n),   for i = 1, 2, …, R,

where each sᵢ ∈ S is one of our known signals and, as before, n ~ N(0, σ²I_N) is i.i.d. Gaussian noise and Φ is a known M × N matrix.

It is straightforward to show (see [42, 43], for example), in the case where each hypothesis is equally likely, that the classifier with minimum probability of error selects the H̃ᵢ that minimizes

tᵢ := (y − Φsᵢ)ᵀ(ΦΦᵀ)⁻¹(y − Φsᵢ).   (24)

If the rows of Φ are orthogonal and have equal norm, then this reduces to identifying which Φsᵢ is closest to y. The (ΦΦᵀ)⁻¹ term arises when the rows of Φ are not orthogonal because the noise is no longer uncorrelated.

As an alternative illustration of the classifier behavior, let us suppose that y = Φx for some x ∈ ℝ^N. Then, starting with (24), we have

tᵢ = (y − Φsᵢ)ᵀ(ΦΦᵀ)⁻¹(y − Φsᵢ)
   = (Φx − Φsᵢ)ᵀ(ΦΦᵀ)⁻¹(Φx − Φsᵢ)
   = (x − sᵢ)ᵀΦᵀ(ΦΦᵀ)⁻¹Φ(x − sᵢ)
   = ‖P_{Φᵀ}x − P_{Φᵀ}sᵢ‖₂²,   (25)

where (25) follows from the same argument as (13). Thus, we can equivalently think of the classifier as simply projecting x and each candidate signal sᵢ onto the row space of Φ and then classifying according to the nearest neighbor in this space.

B. Theory

While in general it is difficult to find analytical expressions for the probability of error even in non-compressive classification settings, we can provide a bound for the performance of the compressive classifier as follows.

Theorem 3. Suppose that √(N/M) P_{Φᵀ} provides a δ-stable embedding of (S, S), and let R = |S|. Let

d = min_{i≠j} ‖sᵢ − sⱼ‖₂   (26)

denote the minimum separation among the sᵢ. For some i* ∈ {1, 2, …, R}, let y = Φ(s_{i*} + n), where n ~ N(0, σ²I_N) is i.i.d. Gaussian noise. Then with probability at least

1 − ((R − 1)/2) e^{−d²(1−δ)M/8σ²N},   (27)

the signal can be correctly classified, i.e.,

i* = arg min_{i∈{1,2,…,R}} tᵢ.   (28)

Proof: Let j ≠ i*. We will argue that tⱼ > t_{i*} with high probability. From (25) we have that

t_{i*} = ‖P_{Φᵀ}n‖₂²

and

tⱼ = ‖P_{Φᵀ}(s_{i*} − sⱼ + n)‖₂² = ‖P_{Φᵀ}(s_{i*} − sⱼ) + P_{Φᵀ}n‖₂² = ‖τ + P_{Φᵀ}n‖₂²,

where we have defined τ = P_{Φᵀ}(s_{i*} − sⱼ) to simplify notation. Let us define P_τ = ττᵀ/‖τ‖₂² as the orthogonal projection onto the 1-dimensional span of τ, and P_{τ⊥} = (I_N − P_τ). Then we have

t_{i*} = ‖P_τ P_{Φᵀ}n‖₂² + ‖P_{τ⊥} P_{Φᵀ}n‖₂²

and

tⱼ = ‖P_τ(τ + P_{Φᵀ}n)‖₂² + ‖P_{τ⊥}(τ + P_{Φᵀ}n)‖₂² = ‖τ + P_τ P_{Φᵀ}n‖₂² + ‖P_{τ⊥} P_{Φᵀ}n‖₂².

Thus, tⱼ ≤ t_{i*} if and only if

‖τ + P_τ P_{Φᵀ}n‖₂² ≤ ‖P_τ P_{Φᵀ}n‖₂²,

or equivalently, if

( τᵀ(τ + P_{Φᵀ}n)/‖τ‖₂ )² ≤ ( τᵀP_{Φᵀ}n/‖τ‖₂ )²,

or equivalently, if

| ‖τ‖₂ + τᵀP_{Φᵀ}n/‖τ‖₂ | ≤ | τᵀP_{Φᵀ}n/‖τ‖₂ |,

or equivalently, if

τᵀP_{Φᵀ}n/‖τ‖₂ ≤ −‖τ‖₂/2.

The quantity τᵀP_{Φᵀ}n/‖τ‖₂ is a scalar, zero-mean Gaussian random variable with variance

(τᵀ/‖τ‖₂) P_{Φᵀ} (σ²I_N) P_{Φᵀ} (τ/‖τ‖₂) = σ² τᵀP_{Φᵀ}τ / ‖τ‖₂² = σ².

Because √(N/M) P_{Φᵀ} provides a δ-stable embedding of (S, S), and by our assumption that ‖s_{i*} − sⱼ‖₂ ≥ d, we have that ‖τ‖₂² ≥ d²(1 − δ)M/N. Thus, using also (22), we have

P(tⱼ ≤ t_{i*}) = P( τᵀP_{Φᵀ}n/‖τ‖₂ ≤ −‖τ‖₂/2 ) = Q( ‖τ‖₂/2σ ) ≤ ½ e^{−‖τ‖₂²/8σ²} ≤ ½ e^{−d²(1−δ)M/8σ²N}.

Finally, because t_{i*} is compared to R − 1 other candidates, we use a union bound to conclude that (28) holds with probability exceeding that given in (27).
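A minimal implementation of the classifier (24) (our own sketch, with arbitrary parameters) is:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, R, sigma = 1000, 100, 3, 0.1

signals = rng.normal(size=(R, N))                 # candidate signals s_1..s_R
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
G = np.linalg.inv(Phi @ Phi.T)                    # whitening term (Phi Phi^T)^{-1}

def classify(y, Phi, G, signals):
    """Minimum-error classifier (24): pick i minimizing
    (y - Phi s_i)^T (Phi Phi^T)^{-1} (y - Phi s_i)."""
    stats = [(y - Phi @ s) @ G @ (y - Phi @ s) for s in signals]
    return int(np.argmin(stats))

# One noisy trial: the true class is i_star.
i_star = 1
y = Phi @ (signals[i_star] + rng.normal(scale=sigma, size=N))
print("chose", classify(y, Phi, G, signals), "truth", i_star)
```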

We see from the above that, within the M-dimensional measurement subspace (as mapped to by P_{Φᵀ}), we will have a compaction of distances between points in S by a factor of approximately √(M/N). However, the variance of the additive noise in this subspace is unchanged. In other words, the noise present in the test statistics does not decrease, but the relative sizes of the test statistics do. Hence, just as in detection (see equation (20)), the probability of error of our classifier will increase upon projection to a lower-dimensional space in a way that depends on the SNR and the number of measurements. However, it is again important to note that in a high-SNR regime, we may be able to successfully distinguish between the different classes with very few measurements.

C. Experiments and Discussion

In Figure 6 we display experimental results for classification among R = 3 test signals of length N = 1000. The signals s₁, s₂, and s₃ are drawn according to a Gaussian distribution with mean 0 and variance 1 and then fixed. For each value of M, a single Gaussian Φ is drawn and then P_E is computed by averaging the results over 10⁶ realizations of the noise vector n. The error rates are very similar in spirit to those for detection (see Figure 4). The results agree with Theorem 3, in which we demonstrate that just as was the case for detection, as M increases the probability of error decays exponentially fast. This also agrees with the related results of [16].

Fig. 6. Effect of M on P_E (the probability of error of a compressive domain classifier) for R = 3 signals at several different SNR levels (10dB through 25dB), where SNR = 10 log₁₀(d²/σ²).

V. ESTIMATION WITH COMPRESSIVE MEASUREMENTS

A. Problem Setup and Applications

While many signal processing problems can be reduced to a detection or classification problem, in some cases we cannot reduce our task to selecting among a finite set of hypotheses. Rather, we might be interested in estimating some function of the data. In this section we will focus on estimating a linear function of the data from compressive measurements.

Suppose that we observe y = Φs and wish to estimate ⟨ℓ, s⟩ from the measurements y, where ℓ ∈ ℝ^N is a fixed test vector. In the case where Φ is a random matrix, a natural estimator is essentially the same as the compressive detector. Specifically, suppose we have a set L of |L| linear functions we would like to estimate from y. Example applications include computing the coefficients of a basis or frame representation of the signal, estimating the signal energy in a particular linear subspace, parametric modeling, and so on. One potential estimator for this scenario, which is essentially a simple generalization of the compressive detector in (12), is given by

(N/M) yᵀ(ΦΦᵀ)⁻¹Φℓᵢ,   (29)

for i = 1, 2, …, |L|. While this approach, which we shall refer to as the orthogonalized estimator, has certain advantages, it is also enlightening to consider an even simpler estimator, given by

⟨y, Φℓᵢ⟩.   (30)

We shall refer to this approach as the direct estimator since it eliminates the orthogonalization step by directly correlating the compressive measurements with Φℓᵢ. We will provide a more detailed experimental comparison of these two approaches below, but in the proof of Theorem 4 we focus only on the direct estimator.
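The two estimators (29) and (30) differ by a single linear solve. The sketch below (our code, not the authors'; parameters chosen to mirror the mean-estimation experiment discussed later in this section) computes both for one draw of Φ.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 1000, 200

s = rng.normal(loc=1.0, scale=1.0, size=N)        # fixed signal
ell = np.full(N, 1.0 / N)                         # test vector: <ell, s> = mean(s)
true = ell @ s

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
y = Phi @ s

direct = y @ (Phi @ ell)                                       # estimator (30)
ortho = (N / M) * y @ np.linalg.solve(Phi @ Phi.T, Phi @ ell)  # estimator (29)

print("true mean:", true)
print("direct   :", direct, " error:", abs(direct - true))
print("orthog.  :", ortho, " error:", abs(ortho - true))
```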

B. Theory

We now provide bounds on the performance of our simple estimator. (The same guarantee can be established for the orthogonalized estimator under the assumption that √(N/M) P_{Φᵀ} is a δ-stable embedding of (L, S ∪ −S).) This bound is a generalization of Lemma 2.1 of [22] to the case where ⟨ℓ, s⟩ ≠ 0.

Theorem 4. Suppose that ℓ ∈ L and s ∈ S and that Φ is a δ-stable embedding of (L, S ∪ −S). Then

|⟨Φℓ, Φs⟩ − ⟨ℓ, s⟩| ≤ δ‖ℓ‖₂‖s‖₂.   (31)

Proof: We first assume that ‖ℓ‖₂ = ‖s‖₂ = 1. Since

‖ℓ ± s‖₂² = ‖ℓ‖₂² + ‖s‖₂² ± 2⟨ℓ, s⟩ = 2 ± 2⟨ℓ, s⟩

and since Φ is a δ-stable embedding of both (L, S) and (L, −S), we have that

1 − δ ≤ ‖Φℓ ± Φs‖₂² / (2 ± 2⟨ℓ, s⟩) ≤ 1 + δ.

From the parallelogram identity we obtain

⟨Φℓ, Φs⟩ = ¼ ( ‖Φℓ + Φs‖₂² − ‖Φℓ − Φs‖₂² )
         ≤ ( (1 + ⟨ℓ, s⟩)(1 + δ) − (1 − ⟨ℓ, s⟩)(1 − δ) ) / 2
         = ⟨ℓ, s⟩ + δ.

Similarly, one can show that ⟨Φℓ, Φs⟩ ≥ ⟨ℓ, s⟩ − δ. Thus −δ ≤ ⟨Φℓ, Φs⟩ − ⟨ℓ, s⟩ ≤ δ. From the bilinearity of the inner product the result follows for ℓ, s with arbitrary norm.

One way of interpreting our result is that the angle between two vectors can be estimated accurately; this is formalized as follows.

Corollary 2. Suppose that ℓ ∈ L and s ∈ S and that Φ is a δ-stable embedding of (L ∪ {0}, S ∪ −S ∪ {0}). Then

|cos ∠(Φℓ, Φs) − cos ∠(ℓ, s)| ≤ 2δ,

where ∠(·, ·) denotes the angle between two vectors.

Proof: Using the standard relationship between inner products and angles, we have

cos ∠(ℓ, s) = ⟨ℓ, s⟩ / (‖ℓ‖₂‖s‖₂)

and

cos ∠(Φℓ, Φs) = ⟨Φℓ, Φs⟩ / (‖Φℓ‖₂‖Φs‖₂).

Thus, from (31) we have

| ⟨Φℓ, Φs⟩/(‖ℓ‖₂‖s‖₂) − cos ∠(ℓ, s) | ≤ δ.   (32)

Now, using (5), we can show that

(1 − δ)/(‖Φℓ‖₂‖Φs‖₂) ≤ 1/(‖ℓ‖₂‖s‖₂) ≤ (1 + δ)/(‖Φℓ‖₂‖Φs‖₂),

from which we infer that

| ⟨Φℓ, Φs⟩/(‖ℓ‖₂‖s‖₂) − ⟨Φℓ, Φs⟩/(‖Φℓ‖₂‖Φs‖₂) | ≤ δ |⟨Φℓ, Φs⟩|/(‖Φℓ‖₂‖Φs‖₂) ≤ δ.   (33)

Therefore, combining (32) and (33) using the triangle inequality, the desired result follows.

While Theorem 4 suggests that the absolute error in estimating ⟨ℓ, s⟩ must scale with ‖ℓ‖₂‖s‖₂, this is probably the best we can expect. If the ‖ℓ‖₂‖s‖₂ terms were omitted on the right hand side of (31), one could estimate ⟨ℓ, s⟩ with arbitrary accuracy using the following strategy: (i) choose a large positive constant C_big, (ii) estimate the inner product ⟨C_big ℓ, C_big s⟩, obtaining an accuracy δ, and then (iii) divide the estimate by C_big² to estimate ⟨ℓ, s⟩ with accuracy δ/C_big². Similarly, it is not possible to replace the right hand side of (31) with an expression proportional merely to ⟨ℓ, s⟩, as this would imply that ⟨Φℓ, Φs⟩ = ⟨ℓ, s⟩ exactly when ⟨ℓ, s⟩ = 0, and unfortunately this is not the case. (Were this possible, one could exploit this fact to immediately identify the non-zero locations in a sparse signal by letting ℓᵢ = eᵢ, the ith canonical basis vector, for i = 1, 2, …, N.)

C. Experiments and Discussion

In Figure 7 we display the average estimation error for the orthogonalized and direct estimators, i.e., |(N/M) sᵀΦᵀ(ΦΦᵀ)⁻¹Φℓ − ⟨ℓ, s⟩| / ‖s‖₂‖ℓ‖₂ and |⟨Φℓ, Φs⟩ − ⟨ℓ, s⟩| / ‖s‖₂‖ℓ‖₂, respectively. The signal s is a length N = 1000 vector with entries distributed according to a Gaussian distribution with mean 1 and unit variance. We choose ℓ = [1/N 1/N ⋯ 1/N]ᵀ to compute the mean of s. The result displayed is the mean error averaged over 10⁴ different draws of Gaussian Φ with s fixed. Note that we obtain nearly identical results for other candidate ℓ, including ℓ both highly correlated with s and ℓ nearly orthogonal to s. In all cases, as M increases, the error decays because the random matrices Φ become δ-stable embeddings of {s} for smaller values of δ. Note that for small values of M, there is very little difference between the orthogonalized and direct estimators. The orthogonalized estimator only provides notable improvement when M is large, in which case the computational difference is significant. In this case one must weigh the relative importance of speed versus accuracy in order to judge which approach is best, so the proper choice will ultimately be dependent on the application.

Fig. 7. Average error in the estimate of the mean of a fixed signal s for the direct and orthogonalized estimators.

In the case where |L| = 1, Theorem 4 is a deterministic version of Theorem 4.5 of [45] and Lemma 3.1 of [46], which both show that for certain random constructions of Φ, with probability at least 1 − β,

|⟨Φℓ, Φs⟩ − ⟨ℓ, s⟩| ≤ δ‖ℓ‖₂‖s‖₂.   (34)

In [45] β = 2δ²/M, while in [46] more sophisticated methods are used to achieve a bound on β of the form β ≤ 2e^{−cMδ²} as in (8). Our result extends these results to a wider class of random matrices. Furthermore,

our approach generalizes naturally to simultaneously estimating multiple linear functions of the data. Specifically, it is straightforward to extend our analysis beyond the estimation of scalar-valued linear functions to more general linear operators. Any finite-dimensional linear operator on a signal x ∈ ℝ^N can be represented as a matrix multiplication Lx, where L has size Z × N for some Z. Decomposing L in terms of its rows ℓ₁ᵀ, ℓ₂ᵀ, …, ℓ_Zᵀ, this computation can be expressed as

Lx = [ℓ₁ᵀ; ℓ₂ᵀ; …; ℓ_Zᵀ] x = [⟨ℓ₁, x⟩; ⟨ℓ₂, x⟩; …; ⟨ℓ_Z, x⟩].

From this point, the bound (31) can be applied to each component of the resulting vector. It is also interesting to note that by setting L = I, we can observe that

‖ΦᵀΦx − x‖_∞ ≤ δ‖x‖₂.

This could be used to establish deterministic bounds on the performance of the thresholding signal recovery algorithm described in [46], which simply thresholds Φᵀy to keep only the K largest elements.

We can also consider more general estimation problems in the context of parameterized signal models. Suppose, for instance, that a K-dimensional parameter θ controls the generation of a signal s = s_θ, and we denote by Θ the K-dimensional space to which the parameter θ belongs. Common examples of such articulations include the angle and orientation of an edge in an image (K = 2), or the start and end frequencies and times of a linear radar chirp (K = 4). In cases such as these, the set of possible signals of interest

M := {s_θ : θ ∈ Θ} ⊂ ℝ^N

forms a nonlinear K-dimensional manifold. The actual position of a given signal on the manifold reveals the value of the underlying parameter θ. It is now understood [47] that, because random projections can provide δ-stable embeddings of nonlinear manifolds using M = O(K log N / δ²) measurements [31], the task of estimating position on a manifold can also be addressed in the compressive domain. Recovery bounds on ‖θ̂ − θ‖₂ akin to (4) can be established; see [47] for more details.
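The remark that ‖ΦᵀΦx − x‖_∞ ≤ δ‖x‖₂ is what makes simple thresholding recovery plausible. The sketch below (ours; the parameter choices are illustrative and correct support recovery is not guaranteed at these values) forms the proxy ΦᵀΦx and keeps its K largest entries.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, K = 1000, 300, 5

Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = 5.0 * rng.choice([-1.0, 1.0], size=K)  # well-separated nonzeros

y = Phi @ x
proxy = Phi.T @ y                                   # Phi^T Phi x, close to x in l_inf

print("l_inf deviation:", np.max(np.abs(proxy - x)))
est_support = np.argsort(np.abs(proxy))[-K:]        # keep the K largest entries
print("support recovered:", set(est_support) == set(support))
```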

VI. FILTERING

A. Subspace Filtering

Suppose that the signal we acquire can be decomposed as $x = x_S + x_I$, where $x_S$ represents the signal of interest and $x_I$ represents an unwanted signal that we would like to reject. We refer to $x_I$ as interference in the remainder of this section, although it might be the signal of interest for a different system module. Supposing that we acquire measurements of both components simultaneously,

$$y = \Phi(x_S + x_I), \qquad (35)$$

our goal is to remove the contribution of $x_I$ to the measurements $y$ while preserving the information about $x_S$. In this section we will assume that $x_S \in S_S$ and that $x_I \in S_I$. In our discussion we will further assume that $\Phi$ is a $\delta$-stable embedding of $(\widetilde{S}_S, S_I)$, where $\widetilde{S}_S$ is a set with a simple relationship to $S_S$ and $S_I$.

While one could consider more general interference models, we restrict our attention to the case where either the interfering signal or the signal of interest lives in a known subspace. For example, suppose we have obtained measurements of a radio signal that has been corrupted by narrowband interference such as a TV or radio station operating at a known carrier frequency. In this case we can project the compressive measurements into a subspace orthogonal to the interference, and hence eliminate the contribution of the interference to the measurements. We further demonstrate that, provided that the signal of interest is orthogonal to the set of possible interference signals, the projection operator maintains a stable embedding for the set of signals of interest. Thus, the projected measurements retain sufficient information to enable the use of efficient compressive-domain algorithms for further processing.

B. Theory

We first consider the case where $S_I$ is a $K_I$-dimensional subspace, and we place no restrictions on the set $S_S$. We will later see that by symmetry the methods we develop for this case will have implications for the setting where $S_S$ is a $K_S$-dimensional subspace and where $S_I$ is a more general set.

We filter out the interference by constructing a linear operator $P$ that operates on the measurements $y$. The design of $P$ is based solely on the measurement matrix $\Phi$ and knowledge of the subspace $S_I$. Our goal is to construct a $P$ that maps $\Phi x_I$ to zero for any $x_I \in S_I$. To simplify notation, we assume that $\Psi_I$ is an $N \times K_I$ matrix whose columns form an orthonormal basis for the $K_I$-dimensional subspace $S_I$, and we define the $M \times K_I$ matrix $\Omega = \Phi \Psi_I$. Letting $\Omega^\dagger := (\Omega^T \Omega)^{-1} \Omega^T$ denote the Moore-Penrose pseudoinverse of $\Omega$, we define

$$P_\Omega = \Omega \Omega^\dagger \qquad (36)$$

and

$$P_{\Omega^\perp} = I - P_\Omega = I - \Omega \Omega^\dagger. \qquad (37)$$

The resulting $P_{\Omega^\perp}$ is our desired operator $P$: it is an orthogonal projection operator onto the orthogonal complement of $\mathcal{R}(\Omega)$, and its nullspace equals $\mathcal{R}(\Omega)$.

Using Theorem 4, we now show that the fact that $\Phi$ is a stable embedding allows us to argue that $P_{\Omega^\perp}$ preserves the structure of $\widetilde{S}_S = P_{S_I^\perp} S_S$ (where $S_I^\perp$ denotes the orthogonal complement of $S_I$ and $P_{S_I^\perp}$ denotes the orthogonal projection onto $S_I^\perp$), while simultaneously cancelling out signals from $S_I$. Additionally, $P_\Omega$ preserves the structure in $S_I$ while nearly cancelling out signals from $\widetilde{S}_S$. (Note that we do not claim that $P_{\Omega^\perp}$ preserves the structure of $S_S$, but rather the structure of $\widetilde{S}_S$. This is because we do not restrict $S_S$ to be orthogonal to the subspace $S_I$ which we cancel; clearly, we cannot preserve the structure of the component of $S_S$ that lies within $S_I$ while simultaneously eliminating interference from $S_I$.)

Theorem 5. Suppose that $\Phi$ is a $\delta$-stable embedding of $(\widetilde{S}_S \cup \{0\}, S_I)$, where $S_I$ is a $K_I$-dimensional subspace of $\mathbb{R}^N$ with orthonormal basis $\Psi_I$. Set $\Omega = \Phi \Psi_I$ and define $P_\Omega$ and $P_{\Omega^\perp}$ as in (36) and (37). For any $x \in \widetilde{S}_S \oplus S_I$ we can write $x = \widetilde{x}_S + \widetilde{x}_I$, where $\widetilde{x}_S \in \widetilde{S}_S$ and $\widetilde{x}_I \in S_I$. Then

$$P_{\Omega^\perp} \Phi x = P_{\Omega^\perp} \Phi \widetilde{x}_S \qquad (38)$$

and

$$P_\Omega \Phi x = \Phi \widetilde{x}_I + P_\Omega \Phi \widetilde{x}_S. \qquad (39)$$

Furthermore,

$$1 - \frac{\delta}{1-\delta} \le \frac{\|P_{\Omega^\perp} \Phi \widetilde{x}_S\|_2^2}{\|\widetilde{x}_S\|_2^2} \le 1 + \delta \qquad (40)$$

and

$$\frac{\|P_\Omega \Phi \widetilde{x}_S\|_2^2}{\|\widetilde{x}_S\|_2^2} \le \delta^2 \frac{1+\delta}{(1-\delta)^2}. \qquad (41)$$

Proof: We begin by observing that since $\widetilde{S}_S$ and $S_I$ are orthogonal, the decomposition $x = \widetilde{x}_S + \widetilde{x}_I$ is unique. Furthermore, since $\widetilde{x}_I \in S_I$, we have that $\Phi \widetilde{x}_I \in \mathcal{R}(\Omega)$ and hence, by the design of $P_{\Omega^\perp}$, $P_{\Omega^\perp} \Phi \widetilde{x}_I = 0$ and $P_\Omega \Phi \widetilde{x}_I = \Phi \widetilde{x}_I$, which establishes (38) and (39).

In order to establish (40) and (41), we decompose $\Phi \widetilde{x}_S$ as $\Phi \widetilde{x}_S = P_\Omega \Phi \widetilde{x}_S + P_{\Omega^\perp} \Phi \widetilde{x}_S$. Since $P_\Omega$ is an orthogonal projection we can write

$$\|\Phi \widetilde{x}_S\|_2^2 = \|P_\Omega \Phi \widetilde{x}_S\|_2^2 + \|P_{\Omega^\perp} \Phi \widetilde{x}_S\|_2^2. \qquad (42)$$

Furthermore, note that $P_\Omega^T = P_\Omega$ and $P_\Omega^2 = P_\Omega$, so that

$$\langle P_\Omega \Phi \widetilde{x}_S, \Phi \widetilde{x}_S \rangle = \|P_\Omega \Phi \widetilde{x}_S\|_2^2. \qquad (43)$$

Since $P_\Omega$ is a projection onto $\mathcal{R}(\Omega)$ there exists a $z \in S_I$ such that $P_\Omega \Phi \widetilde{x}_S = \Phi z$. Since $\widetilde{x}_S \in \widetilde{S}_S$, we have that $\langle \widetilde{x}_S, z \rangle = 0$, and since $S_I$ is a subspace, $S_I = S_I \cup -S_I$, and so we may apply Theorem 4 to obtain

$$|\langle P_\Omega \Phi \widetilde{x}_S, \Phi \widetilde{x}_S \rangle| = |\langle \Phi z, \Phi \widetilde{x}_S \rangle| \le \delta \|z\|_2 \|\widetilde{x}_S\|_2.$$

Since $0 \in S_I$ and $\Phi$ is a $\delta$-stable embedding of $(\widetilde{S}_S \cup \{0\}, S_I)$, we have that

$$\|z\|_2 \|\widetilde{x}_S\|_2 \le \frac{\|\Phi z\|_2 \|\Phi \widetilde{x}_S\|_2}{1 - \delta}.$$

Recalling that $\Phi z = P_\Omega \Phi \widetilde{x}_S$, we obtain

$$\frac{|\langle P_\Omega \Phi \widetilde{x}_S, \Phi \widetilde{x}_S \rangle|}{\|P_\Omega \Phi \widetilde{x}_S\|_2 \|\Phi \widetilde{x}_S\|_2} \le \frac{\delta}{1 - \delta}.$$

Combining this with (43), we obtain

$$\|P_\Omega \Phi \widetilde{x}_S\|_2 \le \frac{\delta}{1 - \delta} \|\Phi \widetilde{x}_S\|_2.$$

Since $\widetilde{x}_S \in \widetilde{S}_S$, $\|\Phi \widetilde{x}_S\|_2 \le \sqrt{1 + \delta}\, \|\widetilde{x}_S\|_2$, and thus we obtain (41).

Since we trivially have that $\|P_\Omega \Phi \widetilde{x}_S\|_2 \ge 0$, we can combine this with (42) to obtain

$$\left(1 - \left(\frac{\delta}{1-\delta}\right)^2\right) \|\Phi \widetilde{x}_S\|_2^2 \le \|P_{\Omega^\perp} \Phi \widetilde{x}_S\|_2^2 \le \|\Phi \widetilde{x}_S\|_2^2.$$

Again, since $\widetilde{x}_S \in \widetilde{S}_S$, we have that

$$\left(1 - \left(\frac{\delta}{1-\delta}\right)^2\right)(1 - \delta) \le \frac{\|P_{\Omega^\perp} \Phi \widetilde{x}_S\|_2^2}{\|\widetilde{x}_S\|_2^2} \le 1 + \delta,$$

which simplifies to yield (40).
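To make the construction in (36) and (37) concrete, here is a minimal numerical sketch (our own, not from the paper; all names are illustrative, and a Gaussian $\Phi$ stands in for any matrix providing the required stable embedding) that builds $P_\Omega$ and $P_{\Omega^\perp}$ and spot-checks (38) and (40):

    import numpy as np

    rng = np.random.default_rng(1)
    N, M, K_I = 256, 128, 10

    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    # Orthonormal basis Psi_I for the interference subspace S_I
    # (here a random subspace, orthonormalized by QR).
    Psi_I, _ = np.linalg.qr(rng.standard_normal((N, K_I)))

    Omega = Phi @ Psi_I                           # M x K_I, Omega = Phi Psi_I
    P_Omega = Omega @ np.linalg.pinv(Omega)       # eq. (36): Omega Omega^dagger
    P_perp = np.eye(M) - P_Omega                  # eq. (37)

    # (38): any interferer x_I in S_I is annihilated by P_perp Phi.
    x_I = Psi_I @ rng.standard_normal(K_I)
    print(np.linalg.norm(P_perp @ (Phi @ x_I)))   # ~ 1e-14

    # (40): a signal in the orthogonal complement of S_I keeps nearly all
    # of its energy after projection.
    x_S = rng.standard_normal(N)
    x_S -= Psi_I @ (Psi_I.T @ x_S)                # project onto S_I-perp
    ratio = np.linalg.norm(P_perp @ (Phi @ x_S))**2 / np.linalg.norm(x_S)**2
    print(ratio)            # within [1 - delta/(1-delta), 1 + delta] per (40)

Here $\widetilde{S}_S$ is simply $S_I^\perp$, so the printed ratio illustrates the energy bound of (40) directly.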

Corollary 3. Suppose that $\Phi$ is a $\delta$-stable embedding of $(\widetilde{S}_S \cup \{0\}, S_I)$, where $S_I$ is a $K_I$-dimensional subspace of $\mathbb{R}^N$ with orthonormal basis $\Psi_I$. Set $\Omega = \Phi \Psi_I$ and define $P_\Omega$ and $P_{\Omega^\perp}$ as in (36) and (37). Then $P_{\Omega^\perp} \Phi$ is a $\delta/(1-\delta)$-stable embedding of $(\widetilde{S}_S, \{0\})$ and $P_\Omega \Phi$ is a $\delta$-stable embedding of $(S_I, \{0\})$.

Proof: This follows from Theorem 5 by picking $x \in \widetilde{S}_S$, in which case $x = \widetilde{x}_S$, or picking $x \in S_I$, in which case $x = \widetilde{x}_I$.

Theorem 5 and Corollary 3 have a number of practical benefits. For example, if we are interested in solving an inference problem based only on the signal $x_S$, then we can use $P_\Omega$ or $P_{\Omega^\perp}$ to filter out the interference and then apply the compressive-domain inference techniques developed above. The performance of these techniques will be significantly improved by eliminating the interference due to $x_I$. Furthermore, this result also has implications for the problem of signal recovery, as demonstrated by the following corollary.

Corollary 4. Suppose that $\Psi$ is an orthonormal basis for $\mathbb{R}^N$ and that $\Phi$ is a $\delta$-stable embedding of $(\Psi(\Sigma_{2K_S}), \mathcal{R}(\Psi_I))$, where $\Psi_I$ is an $N \times K_I$ submatrix of $\Psi$. Set $\Omega = \Phi \Psi_I$ and define $P_\Omega$ and $P_{\Omega^\perp}$ as in (36) and (37). Then $P_{\Omega^\perp} \Phi$ is a $\delta/(1-\delta)$-stable embedding of $(P_{\mathcal{R}(\Psi_I)^\perp} \Psi(\Sigma_{2K_S}), \{0\})$.

Proof: This follows from the observation that $P_{\mathcal{R}(\Psi_I)^\perp} \Psi(\Sigma_{2K_S}) \subset \Psi(\Sigma_{2K_S})$ and then applying Corollary 3.

We emphasize that in the above corollary, $P_{\mathcal{R}(\Psi_I)^\perp} \Psi(\Sigma_{2K_S})$ will simply be the original family of sparse signals, but with zeros in the positions indexed by $\Psi_I$. One can easily verify that if $\delta \le (\sqrt{2} - 1)/\sqrt{2}$, then $\delta/(1-\delta) \le \sqrt{2} - 1$, and thus Corollary 4 is sufficient to ensure that the conditions for Theorem 1 are satisfied. We therefore conclude that, under a slightly more restrictive bound on the required RIP constant, we can directly recover a sparse signal of interest $x_S$ that is orthogonal to the interfering $x_I$ without actually recovering $x_I$. Note that in addition to filtering out true interference, this framework is also relevant to the problem of signal recovery when the support is partially known, in which case the known support defines a subspace that can be thought of as interference to be rejected prior to recovering the remaining signal. Thus, our approach provides an alternative method for solving and analyzing the problem of CS recovery with partially known support considered in [48]. Furthermore, this result can also be useful in analyzing iterative recovery algorithms (in which the signal coefficients identified in previous iterations are treated as interference) or in the case where we wish to recover a slowly varying signal as it evolves in time, as in [49].

This cancel-then-recover approach to signal recovery has a number of advantages. Observe that if we attempt to first recover $x$ and then cancel $x_I$, then we require the RIP of order $2(K_S + K_I)$ to ensure that the recover-then-cancel approach will be successful. In contrast, filtering out $x_I$ followed by recovery of $x_S$ requires the RIP of order only $2K_S + K_I$. In certain cases (when $K_I$ is significantly larger than $K_S$), this results in a substantial decrease in the required number of measurements. Furthermore, since all recovery algorithms have computational complexity that is at least linear in the sparsity of the recovered signal, this can also result in substantial computational savings for signal recovery.

C. Experiments and Discussion

In this section we evaluate the performance of the cancel-then-recover approach suggested by Corollary 4. Rather than $\ell_1$-minimization we use the iterative CoSaMP greedy algorithm [7], since it more naturally lends itself to the simple modification described below. More specifically, we evaluate three interference cancellation approaches:

1) Cancel-then-recover: This is the approach advocated in this paper. We cancel out the contribution of $x_I$ to the measurements $y$ and directly recover $x_S$ using the CoSaMP algorithm (sketched after this subsection's setup).

2) Modified recovery: Since we know the support of $x_I$, rather than cancelling out its contribution to the measurements, we modify a greedy algorithm such as CoSaMP to exploit the fact that part of the support of $x$ is known in advance. This modification is made simply by forcing CoSaMP to always keep the elements of $J$ in the active set at each iteration. After recovering $\widehat{x}$, we then set $\widehat{x}_n = 0$ for $n \in J$ to filter out the interference.

3) Recover-then-cancel: In this approach, we ignore that we know the support of $x_I$, try to recover the signal $x$ using the standard CoSaMP algorithm, and then set $\widehat{x}_n = 0$ for $n \in J$ as before.

In our experiments, we set $N = 1000$, $M = 200$, and $K_S = 10$. We then considered values of $K_I$ from 1 to 100. We choose $S_S$ and $S_I$ by selecting random, non-overlapping sets of indices, so in this experiment $S_S$ and $S_I$ are orthogonal (although they need not be in general, since $\widetilde{S}_S$ will always be orthogonal to $S_I$).
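The following condensed sketch (our own, not the authors' code) mirrors this setup for a single value of $K_I$: a bare-bones CoSaMP, without the stopping rules or noise handling of [7], is applied to the filtered measurements. All names are illustrative, and $K_I = 50$ is just one sample point from the range above:

    import numpy as np

    def cosamp(A, y, K, n_iter=25):
        # Bare-bones CoSaMP [7]: fixed iteration count, no halting criterion.
        M, N = A.shape
        xhat = np.zeros(N)
        r = y.copy()
        for _ in range(n_iter):
            proxy = A.T @ r
            omega = np.argsort(np.abs(proxy))[-2 * K:]   # 2K largest entries
            T = np.union1d(omega, np.flatnonzero(xhat))  # merge supports
            b = np.zeros(N)
            b[T] = np.linalg.lstsq(A[:, T], y, rcond=None)[0]
            xhat = np.zeros(N)
            top = np.argsort(np.abs(b))[-K:]             # prune to K terms
            xhat[top] = b[top]
            r = y - A @ xhat
        return xhat

    rng = np.random.default_rng(2)
    N, M, K_S, K_I = 1000, 200, 10, 50
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)

    idx = rng.choice(N, K_S + K_I, replace=False)        # disjoint supports
    S, J = idx[:K_S], idx[K_S:]                          # J: interference support
    x_S = np.zeros(N); x_S[S] = rng.standard_normal(K_S)
    x_I = np.zeros(N); x_I[J] = rng.standard_normal(K_I)
    y = Phi @ (x_S + x_I)

    # Cancel: with Psi_I the identity columns indexed by J, Omega = Phi[:, J],
    # and P_perp v = v - Omega Omega^dagger v, as in (37).
    Omega = Phi[:, J]

    def cancel(v):
        return v - Omega @ np.linalg.lstsq(Omega, v, rcond=None)[0]

    y_f = cancel(y)        # P_perp y: the contribution of x_I is removed (38)
    Phi_f = cancel(Phi)    # P_perp Phi: columns on J are (numerically) zero

    # Recover x_S directly from the filtered measurements.
    xhat = cosamp(Phi_f, y_f, K_S)
    snr = 20 * np.log10(np.linalg.norm(x_S) / np.linalg.norm(x_S - xhat))
    print(f"recovered SNR: {snr:.1f} dB")

Consistent with the discussion above, the recovery step here only needs to handle a $K_S$-sparse signal, which is the source of both the measurement and computational savings when $K_I$ is large.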

Fig. 8. SNR of $x_S$ recovered using the three different cancellation approaches for different ratios of $K_I$ to $K_S$, compared to the performance of an oracle. (Axes: recovered SNR in dB versus $K_I/K_S$.)

Fig. 9. Recovery time for the three different cancellation approaches for different ratios of $K_I$ to $K_S$. (Axes: recovery time in seconds versus $K_I/K_S$.)

For each value of $K_I$, we generated 2000 test signals where the coefficients were selected according to a Gaussian distribution and then contaminated with an $N$-dimensional Gaussian noise vector. For comparison, we also considered an oracle decoder that is given the support of both $x_I$ and $x_S$ and solves the least-squares problem restricted to the known support set.

We considered a range of signal-to-noise ratios (SNRs) and signal-to-interference ratios (SIRs). Figure 8 shows the results for the case where $x_S$ and $x_I$ are normalized to have equal energy (an SIR of 0 dB) and where the variance of the noise is selected so that the SNR is 15 dB. Our results were consistent for a wide range of SNR and SIR values, and we omit the plots due to space limitations.

Our results show that the cancel-then-recover approach performs significantly better than both of the other methods as $K_I$ grows larger than $K_S$; in fact, the cancel-then-recover approach performs almost as well as the oracle decoder for the entire range of $K_I$. We also note that while the modified recovery method did perform slightly better than the recover-then-cancel approach, the improvement is relatively minor.

We observe similar results in Figure 9 for the recovery time (which includes the cost of computing $P$ in the cancel-then-recover approach), with the cancel-then-recover approach performing significantly faster than the other approaches as $K_I$ grows larger than $K_S$.

We also note that in the case where $\Phi$ admits a fast transform-based implementation (as is often the case for the constructions described in Section II-D), the projections $P_\Omega$ and $P_{\Omega^\perp}$ can leverage the structure of $\Phi$ in order to ease the computational cost of applying $P_\Omega$ and $P_{\Omega^\perp}$. For example, $\Phi$ may consist of random rows of a discrete Fourier transform or a permuted Hadamard transform matrix. In such a scenario, there are fast transform-based implementations of $\Phi$ and $\Phi^T$. By observing that

$$P_\Omega = \Phi \Psi_I (\Psi_I^T \Phi^T \Phi \Psi_I)^{-1} \Psi_I^T \Phi^T,$$

we see that one can use the conjugate gradient method or Richardson iteration to efficiently compute $P_\Omega y$ and, by extension, $P_{\Omega^\perp} y$ [7].

VII. CONCLUSIONS

In this paper we have taken some first steps towards a theory of compressive signal processing (CSP) by showing that compressive measurements can be effective for a variety of detection, classification, estimation, and filtering problems. We have provided theoretical bounds, backed up by experimental results, which indicate that in many applications it can be more efficient and accurate to extract information directly from a signal's compressive measurements than to first recover the signal and then extract the information. It is important to reemphasize that our techniques are universal and agnostic to the signal structure, and that they provide deterministic guarantees for a wide variety of signal classes.

In the future we hope to provide a more detailed analysis of the classification setting, consider more general models, and consider detection, classification, and estimation settings that utilize more specific models, such as sparsity or manifold structure.
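As a concrete companion to the fast-transform remark at the end of Section VI-C, the following sketch (ours; SciPy's conjugate-gradient solver is assumed, and the dense Gaussian $\Phi$ below is only a stand-in for a fast partial-Fourier or Hadamard operator) applies $P_\Omega$ using nothing but matrix-vector products with $\Phi$ and $\Phi^T$:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(3)
    N, M, K_I = 1024, 256, 16
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # stand-in for a fast transform
    Psi_I, _ = np.linalg.qr(rng.standard_normal((N, K_I)))
    y = rng.standard_normal(M)

    # P_Omega y = Phi Psi_I (Psi_I^T Phi^T Phi Psi_I)^{-1} Psi_I^T Phi^T y.
    # The K_I x K_I Gram system is solved by conjugate gradients, touching
    # Phi only through products with Phi and Phi^T (fast transforms in practice).
    G = LinearOperator(
        (K_I, K_I),
        matvec=lambda a: Psi_I.T @ (Phi.T @ (Phi @ (Psi_I @ a))),
    )
    rhs = Psi_I.T @ (Phi.T @ y)
    a, info = cg(G, rhs)              # info == 0 on convergence
    P_y = Phi @ (Psi_I @ a)           # P_Omega y
    P_perp_y = y - P_y                # P_Omega_perp y, as in (37)

Since the Gram matrix $\Omega^T \Omega$ is symmetric positive definite whenever $\Omega$ has full column rank, conjugate gradients converges quickly; Richardson iteration could replace the cg call with a plain fixed-point loop, and in either case no $M \times M$ projection matrix is ever formed.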

REFERENCES

[1] D. Healy, "Analog-to-information," 2005, BAA #05-35. Available from http://www.darpa.mil/mto/solicitations/baa05-35/s/index.html.
[2] R. Walden, "Analog-to-digital converter survey and analysis," IEEE J. Selected Areas in Comm., vol. 17, no. 4, pp. 539–550, 1999.
[3] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[4] D. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[5] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[6] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Found. Comput. Math., vol. 9, no. 3, pp. 317–334, 2009.
[7] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, 2009.
[8] A. Cohen, W. Dahmen, and R. DeVore, "Instance optimal decoding by thresholding in compressed sensing," 2008, Preprint.
[9] T. Blumensath and M. Davies, "Iterative hard thresholding for compressive sensing," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009.
[10] J. Treichler, M. Davenport, and R. Baraniuk, "Application of compressive sensing to the design of wideband signal acquisition receivers," in U.S./Australia Joint Work. Defense Apps. of Signal Processing (DASP), Lihue, Hawaii, Sept. 2009.
[11] S. Muthukrishnan, Data Streams: Algorithms and Applications, Now Publishers, 2005.
[12] M. Duarte, M. Davenport, M. Wakin, and R. Baraniuk, "Sparse signal detection from incoherent projections," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, May 2006.
[13] M. Davenport, M. Duarte, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk, "The smashed filter for compressive classification and target recognition," in Proc. SPIE Symp. Electronic Imaging: Comput. Imaging, San Jose, CA, Jan. 2007.
[14] M. Duarte, M. Davenport, M. Wakin, J. Laska, D. Takhar, K. Kelly, and R. Baraniuk, "Multiscale random projections for compressive classification," in Proc. IEEE Int. Conf. Image Processing (ICIP), San Antonio, TX, Sept. 2007.
[15] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. and Machine Intel., vol. 31, no. 2, pp. 210–227, 2009.
[16] J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. Yeh, "Compressive sampling for signal classification," in Proc. Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, Nov. 2006.
[17] J. Haupt and R. Nowak, "Compressive sampling for signal detection," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Honolulu, HI, Apr. 2007.
[18] M. Davenport, M. Wakin, and R. Baraniuk, "Detection and estimation with compressive measurements," Tech. Rep. TREE0610, Rice University ECE Department, 2006.
[19] M. Davenport, P. Boufounos, and R. Baraniuk, "Compressive domain interference cancellation," in Structure et parcimonie pour la représentation adaptative de signaux (SPARS), Saint-Malo, France, Apr. 2009.
[20] E. Candès, "Compressive sampling," in Proc. Int. Congress of Mathematicians, Madrid, Spain, Aug. 2006.
[21] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[22] E. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes rendus de l'Académie des Sciences, Série I, vol. 346, no. 9-10, pp. 589–592, 2008.
[23] V. Buldygin and Y. Kozachenko, Metric Characterization of Random Variables and Random Processes, AMS, 2000.
[24] R. DeVore, G. Petrova, and P. Wojtaszczyk, "Instance-optimality in probability with an $\ell_1$-minimization decoder," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 275–288, 2009.
[25] M. Davenport, "Concentration of measure and sub-Gaussian distributions," 2009. Available from http://cnx.org/content/m32583/latest/.
[26] W. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," in Proc. Conf. Modern Anal. and Prob., New Haven, CT, Jun. 1982.
[27] S. Dasgupta and A. Gupta, "An elementary proof of the Johnson-Lindenstrauss lemma," Tech. Rep. TR-99-006, Berkeley, CA, 1999.
[28] D. Achlioptas, "Database-friendly random projections," in Proc. Symp. Principles of Database Systems (PODS), Santa Barbara, CA, May 2001.
[29] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Const. Approx., vol. 28, no. 3, pp. 253–263, Dec. 2008.
[30] G. Lorentz, M. von Golitschek, and Y. Makovoz, Constructive Approximation: Advanced Problems, Springer-Verlag, 1996.
[31] R. Baraniuk and M. Wakin, "Random projections of smooth manifolds," Found. Comput. Math., vol. 9, no. 1, pp. 51–77, 2009.
[32] P. Indyk and A. Naor, "Nearest-neighbor-preserving embeddings," ACM Trans. Algorithms, vol. 3, no. 3, 2007.
[33] P. Agarwal, S. Har-Peled, and H. Yu, "Embeddings of surfaces, curves, and moving points in Euclidean space," in Proc. Symp. Comput. Geometry, Gyeongju, South Korea, Jun. 2007.
[34] S. Dasgupta and Y. Freund, "Random projection trees and low dimensional manifolds," in Proc. ACM Symp. Theory of Computing (STOC), Victoria, BC, May 2008.
[35] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse, bandlimited signals," to appear in IEEE Trans. Inform. Theory, 2009.
[36] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk, "Random filters for compressive sampling and reconstruction," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France, May 2006.
[37] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, "Toeplitz-structured compressed sensing matrices," in Proc. IEEE Work. Statistical Signal Processing (SSP), Madison, WI, Aug. 2007.
[38] J. Romberg, "Compressive sensing by random convolution," to appear in SIAM J. Imaging Sciences, 2009.
[39] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Processing Mag., vol. 25, no. 2, pp. 83–91, 2008.
[40] R. Robucci, L. Chiu, J. Gray, J. Romberg, P. Hasler, and D. Anderson, "Compressive sensing on a CMOS separable transform image sensor," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, Apr. 2008.
[41] R. Marcia, Z. Harmany, and R. Willett, "Compressive coded aperture imaging," in Proc. SPIE Symp. Electronic Imaging: Comput. Imaging, San Jose, CA, Jan. 2009.
[42] S. Kay, Fundamentals of Statistical Signal Processing, Volume 2: Detection Theory, Prentice Hall, 1998.
[43] L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, Addison-Wesley, 1991.
[44] N. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Volume 1, Wiley, 1994.
[45] N. Alon, P. Gibbons, Y. Matias, and M. Szegedy, "Tracking join and self-join sizes in limited storage," in Proc. Symp. Principles of Database Systems (PODS), Philadelphia, PA, May 1999.

[46] H. Rauhut, K. Schnass, and P. Vandergheynst, "Compressed sensing and redundant dictionaries," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 2210–2219, 2008.
[47] M. Wakin, "Manifold-based signal recovery and parameter estimation from compressive measurements," 2008, Preprint.
[48] N. Vaswani and W. Lu, "Modified-CS: Modifying compressive sensing for problems with partially known support," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Seoul, Korea, Jun. 2009.
[49] N. Vaswani, "Analyzing least squares and Kalman filtered compressed sensing," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009.

Mark A. Davenport received the B.S.E.E. degree in Electrical and Computer Engineering in 2004 and the M.S. degree in Electrical and Computer Engineering in 2007, both from Rice University. He is currently a Ph.D. student in the Department of Electrical and Computer Engineering at Rice. His research interests include compressive sensing, nonlinear approximation, and the application of low-dimensional signal models to a variety of problems in signal processing and machine learning. In 2007, he shared the Hershel M. Rich Invention Award from Rice for his work on the single-pixel camera and compressive sensing. He is also co-founder and an editor of Rejecta Mathematica.

Petros T. Boufounos completed his undergraduate and graduate studies at MIT. He received the S.B. degree in Economics in 2000, the S.B. and M.Eng. degrees in Electrical Engineering and Computer Science (EECS) in 2002, and the Sc.D. degree in EECS in 2006. Since January 2009 he has been with Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA. He is also a visiting scholar at the Rice University Electrical and Computer Engineering department. Between September 2006 and December 2008, Dr. Boufounos was a postdoctoral associate with the Digital Signal Processing Group at Rice University doing research in compressive sensing. In addition to compressive sensing, his immediate research interests include signal processing, data representations, frame theory, and machine learning applied to signal processing. He is also looking into how compressive sensing interacts with other fields that use sensing extensively, such as robotics and mechatronics. Dr. Boufounos has received the Ernst A. Guillemin Master Thesis Award for his work on DNA sequencing and the Harold E. Hazen Award for Teaching Excellence, both from the MIT EECS department. He has also been an MIT Presidential Fellow. Dr. Boufounos is a member of the IEEE, Sigma Xi, and Phi Beta Kappa.

Michael B. Wakin received the B.S. degree in electrical engineering and the B.A. degree in mathematics in 2000 (summa cum laude), the M.S. degree in electrical engineering in 2002, and the Ph.D. degree in electrical engineering in 2007, all from Rice University. He was an NSF Mathematical Sciences Postdoctoral Research Fellow at the California Institute of Technology from 2006-2007 and an Assistant Professor at the University of Michigan in Ann Arbor from 2007-2008. He is now an Assistant Professor in the Division of Engineering at the Colorado School of Mines. His research interests include sparse, geometric, and manifold-based models for signal and image processing, approximation, compression, compressive sensing, and dimensionality reduction. In 2007, Dr. Wakin shared the Hershel M. Rich Invention Award from Rice University for the design of a single-pixel camera based on compressive sensing, and in 2008 he received the DARPA Young Faculty Award for his research in compressive multi-signal processing for environments such as sensor and camera networks.

Richard G. Baraniuk received the B.Sc. degree in 1987 from the University of Manitoba (Canada), the M.Sc. degree in 1988 from the University of Wisconsin-Madison, and the Ph.D. degree in 1992 from the University of Illinois at Urbana-Champaign, all in Electrical Engineering. After spending 1992-1993 with the Signal Processing Laboratory of the École Normale Supérieure in Lyon, France, he joined Rice University, where he is currently the Victor E. Cameron Professor of Electrical and Computer Engineering. He spent sabbaticals at the École Nationale Supérieure des Télécommunications in Paris in 2001 and the École Polytechnique Fédérale de Lausanne in Switzerland in 2002. His research interests lie in the area of signal and image processing. He has been a Guest Editor of several special issues of the IEEE Signal Processing Magazine, the IEEE Journal of Selected Topics in Signal Processing, and the Proceedings of the IEEE, and has served as technical program chair or on the technical program committee for several IEEE workshops and conferences. In 1999, Dr. Baraniuk founded Connexions (cnx.org), a non-profit publishing project that invites authors, educators, and learners worldwide to "create, rip, mix, and burn" free textbooks, courses, and learning materials from a global open-access repository. Dr. Baraniuk received a NATO postdoctoral fellowship from NSERC in 1992, the National Young Investigator award from the National Science Foundation in 1994, a Young Investigator Award from the Office of Naval Research in 1995, the Rosenbaum Fellowship from the Isaac Newton Institute of Cambridge University in 1998, the C. Holmes MacDonald National Outstanding Teaching Award from Eta Kappa Nu in 1999, the Charles Duncan Junior Faculty Achievement Award from Rice in 2000, the University of Illinois ECE Young Alumni Achievement Award in 2000, the George R. Brown Award for Superior Teaching at Rice in 2001, 2003, and 2006, the Hershel M. Rich Invention Award from Rice in 2007, the Wavelet Pioneer Award from SPIE in 2008, and the Internet Pioneer Award from the Berkman Center for Internet and Society at Harvard Law School in 2008. He was selected as one of Edutopia Magazine's Daring Dozen educators in 2007. Connexions received the Tech Museum Laureate Award from the Tech Museum of Innovation in 2006. His work with Kevin Kelly on the Rice single-pixel compressive camera was selected by MIT Technology Review Magazine as a TR10 Top 10 Emerging Technology in 2007. He was co-author on a paper with Matthew Crouse and Robert Nowak that won the IEEE Signal Processing Society Junior Paper Award in 2001 and another with Vinay Ribeiro and Rolf Riedi that won the Passive and Active Measurement (PAM) Workshop Best Student Paper Award in 2003. He was elected a Fellow of the IEEE in 2001 and a Plus Member of AAA in 1986.
