LEARNING SPARSE REPRESENTATIONS FOR ADAPTIVE COMPRESSIVE SENSING

Akshay Soni and Jarvis Haupt

University of Minnesota, Twin Cities
Department of Electrical and Computer Engineering
Minneapolis, Minnesota USA 55455
e-mail: {sonix022,jdhaupt}@umn.edu

ABSTRACT

Breakthrough results in compressive sensing (CS) have shown that high dimensional signals (vectors) can often be accurately recovered from a relatively small number of non-adaptive linear projection observations, provided that they possess a sparse representation in some basis. Subsequent efforts have established that the reconstruction performance of CS can be improved by employing additional prior signal knowledge, such as dependency in the location of the non-zero signal coefficients (structured sparsity), or by collecting measurements sequentially and adaptively in order to focus measurements into the proper subspace where the unknown signal resides. In this paper, we examine a powerful hybrid of adaptivity and structure. We identify a particular form of structured sparsity that is amenable to adaptive sensing, and using concepts from sparse hierarchical dictionary learning we demonstrate that sparsifying dictionaries exhibiting the appropriate form of structured sparsity can be learned from a collection of training data. The combination of these techniques (structured dictionary learning and adaptive sensing) results in an effective and efficient adaptive compressive acquisition approach, which we refer to as LASeR (Learning Adaptive Sensing Representations).

Index Terms— Compressive sensing, adaptive sensing, structured sparsity, principal component analysis

1. INTRODUCTION

Motivated in large part by the surge of research activity in compressive sensing [1–3], the identification of efficient sensing and reconstruction procedures for high dimensional inference problems remains an extremely active research area. The basic problem can be explained as follows. Let x ∈ R^n represent some unknown signal, and suppose that x can be accurately represented using only a small number of atoms d_i ∈ R^n from some dictionary D, so that

x = Σ_{i∈S} a_i d_i + ε,   (1)

where |S| is small (relative to the ambient dimension n), the a_i are the coefficients corresponding to the relative weight of the contribution of each of the d_i that contribute to the approximation, and the vector ε represents a (nominally small) modeling error. The dictionary D may, for example, consist of all of the columns of an orthonormal matrix (e.g., a discrete wavelet or Fourier transform matrix), though other representations are possible (e.g., D may be a frame). In any case, when |S| is small relative to the ambient dimension n, we say that the signal x is sparse, or that it possesses a sparse representation in the dictionary D.

(This work was supported by grant DARPA/ONR N66001-10-1-4090.)
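To make the synthesis model (1) concrete, the following minimal Python sketch (illustrative only; the dimension, sparsity level, and noise scale are arbitrary choices rather than values from the paper) generates a signal with a sparse representation in an orthonormal dictionary:

import numpy as np

rng = np.random.default_rng(0)
n, s = 128, 5  # ambient dimension and sparsity level (arbitrary demo values)

# An orthonormal dictionary D, obtained by orthogonalizing a random matrix
D, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Sparse coefficient vector: support S of size s with random weights a_i
a = np.zeros(n)
S = rng.choice(n, size=s, replace=False)
a[S] = rng.standard_normal(s)

eps = 0.01 * rng.standard_normal(n)  # small modeling error
x = D @ a + eps                      # x = sum_{i in S} a_i d_i + eps, as in (1)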
Initial results in compressive sensing (CS) established that sparse vectors can often be recovered from m ≪ n measurements, each in the form of a randomized linear combination of the entries of x. The weights associated with each linear combination may, for example, be selected as i.i.d. realizations of zero-mean random variables such as Gaussian or symmetric Bernoulli, and these random measurements can be modeled as inner products between the signal vector and a sequence of randomly generated "test" vectors. Suppose, for the sake of illustration, that D is an orthonormal basis. Then, the main result of CS is that signals that possess a sparse representation with no more than s nonzero coefficients in this (known) dictionary D can, with high probability, be exactly recovered from m ≤ Cs log n so-called randomized projection measurements, where C ≥ 0 is a constant independent of s and n (see, e.g., [2, 3]). The salient point is that the number of measurements required for exact reconstruction is on the order of the sparsity s, not the ambient dimension n, and when s ≪ n the savings in the number of measurements required for recovery can be quite significant.

A number of subsequent efforts in CS have examined settings where, in addition to being sparse, the representation of x in terms of the dictionary D possesses some additional structure (see, for example, the tutorial article [4] and the references therein). The nonzero coefficients may, for example, occur in clusters, or it may be the case that the presence of a particular coefficient in the representation guarantees the presence of other coefficients, according to some a priori known dependency structure. This latter case of coefficient dependency occurs, for example, in the wavelet coefficients of piecewise smooth signals and many natural images, where the nonzero coefficients cluster across levels of a rooted connected tree. In any case, taking advantage of this so-called structured sparsity has been shown to result in further reductions in the number of measurements required for recovery of s-sparse n-dimensional vectors from randomized projections. For example, it was shown in [5, 6] that in these cases m ≤ C′s randomized measurements suffice for exact recovery, where C′ ≥ 0 is another constant. The savings in this case amount to a reduction in the scaling behavior by a factor of log n, which can itself be a significant savings when n is large.

Several techniques for implementing some form of feedback in the compressive measurement process have also been examined recently in the CS literature. These so-called sequential adaptive measurement procedures attempt to glean some information about the unknown signal from initial compressive measurements, which is then used to shape subsequent test vectors in order to focus more directly on the subspace in which the signal resides. Adaptive CS procedures have been shown to provide an improved resilience (relative to traditional CS) in the presence of additive measurement noise (see, for example, [7–9], as well as the summary article [10] and the references therein).

In this paper, we examine a powerful hybrid of the notions of structured sparsity and adaptivity. Our approach, which we refer to as Learning Adaptive Sensing Representations, or LASeR, entails the identification of dictionaries in which each element in a given collection of training data exhibits a special form of structured sparsity that is amenable to a particular adaptive compressive measurement strategy. This approach is described and evaluated in the remainder of this paper, which is organized as follows. Our dictionary identification procedure, which comprises an extension of techniques recently proposed in the literature on dictionary learning, is described in Section 2, along with our proposed adaptive sensing procedure. The performance of the LASeR procedure is evaluated in Section 3, and conclusions and directions for future work are discussed in Section 4.
2. LEARNING ADAPTIVE SENSING REPRESENTATIONS

2.1. Structured Dictionary Learning

Consider a matrix X = [x_1, . . . , x_p] ∈ R^{n×p}, whose p columns of ambient dimension n each represent a training vector from a collection of training data. Dictionary learning describes a general matrix factorization problem, the goal of which is to identify matrices D ∈ R^{n×q} and A ∈ R^{q×p} such that X ≈ DA. Such factorizations are generally non-unique, so it is common to impose additional constraints on the coefficient matrix A — for example, requiring its columns a_i ∈ R^q, i = 1, 2, . . . , p, to be sparse [11, 12]. Such conditions result in learned representations having the property that each column of the training data matrix may be expressed as a sparse linear combination of dictionary atoms, denoted by columns of the matrix D. Overall, this type of factorization may be accomplished by obtaining a (local) solution to an optimization of the form

{D, A} = arg min_{D∈R^{n×q}, {a_i}∈R^q} Σ_{i=1}^{p} ( ‖x_i − D a_i‖_2^2 + λ‖a_i‖_1 ),   (2)

where λ is a (non-negative) regularization parameter.

Techniques for enforcing that each column of A additionally exhibit some form of pre-defined dependency structure have recently been examined in the literature [13]. Suppose that the set {1, 2, . . . , q} can be put into one-to-one association with q nodes in a known rooted binary tree T. We say that a coefficient vector a_i ∈ R^q exhibits tree sparsity if the set of nonzero coefficients of a_i lies on a rooted connected subtree of T. Efficient software packages have been developed (e.g., [14]) for solving the dictionary learning problem while enforcing tree sparsity on the columns of the learned coefficient matrix. In this case, the factorization can be obtained by solving an optimization of the form

{D, A} = arg min_{D∈R^{n×q}, {a_i}∈R^q} Σ_{i=1}^{p} ( ‖x_i − D a_i‖_2^2 + λΩ(a_i) ).   (3)

The (convex) regularization term is given by

Ω(a_i) = Σ_{g∈G} ω_g ‖(a_i)_g‖,   (4)

where G is the set of groups, each comprised of a node together with all of its descendants in the tree T, the notation (a_i)_g refers to the sub-vector of a_i restricted to the indices in the set g ∈ G, the ω_g are non-negative weights, and the norm can be either the ℓ2 or ℓ∞ norm.
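To illustrate the group structure behind the regularizer (4), the following Python sketch (our own illustration, not the authors' code; the heap-style node indexing for a complete binary tree is an assumption made for the example) constructs the set G of groups, each a node together with all of its descendants, and evaluates Ω(a) with unit weights and the ℓ2 norm:

import numpy as np

def binary_tree_groups(num_levels):
    """Groups for a complete binary tree with nodes indexed 1..q, q = 2**num_levels - 1.

    Each group contains a node together with all of its descendants, as in (4).
    """
    q = 2 ** num_levels - 1
    groups = []
    for root in range(1, q + 1):
        group, frontier = [], [root]
        while frontier:
            node = frontier.pop()
            group.append(node)
            for child in (2 * node, 2 * node + 1):
                if child <= q:
                    frontier.append(child)
        groups.append(sorted(group))
    return groups

def omega(a, groups, weights=None):
    """Hierarchical regularizer Omega(a) = sum_g w_g ||a_g||_2 (groups are 1-indexed)."""
    a = np.asarray(a)
    if weights is None:
        weights = np.ones(len(groups))
    return sum(w * np.linalg.norm(a[np.array(g) - 1]) for w, g in zip(weights, groups))

# Example: 3-level tree (q = 7); coefficients supported on the rooted subtree {1, 2, 5}
groups = binary_tree_groups(3)
a = np.zeros(7)
a[[0, 1, 4]] = [1.0, -0.5, 0.25]
print(omega(a, groups))

Because each group ties a node to its entire subtree, setting a group to zero removes a whole subtree at once, which is why minimizers of (3) tend to have supports that form rooted connected subtrees.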

2.2. Structured Sparsity and Adaptive Sensing

Suppose that we wish to efficiently sense and acquire a signal x ∈ R^n. If we know a priori that x possesses a sparse representation (1) in some dictionary D having orthonormal columns, and that the coefficients a_i exhibit tree sparsity (as described above), then we may acquire x using an efficient sequential adaptive sensing procedure, as follows. Without loss of generality, let the index 1 correspond to the root of the tree. Begin by initializing a stack (or queue) with the index 1, and collect a measurement by projecting x onto the dictionary element d_1; that is, obtain the measurement

y_1 = d_1^T x = Σ_{i∈S} a_i d_1^T d_i + d_1^T ε   (5)
    = a_1 + d_1^T ε.   (6)

Notice that the last equality follows from the fact that we assumed D to have orthonormal columns, so that the sum reduces to the single coefficient a_1. Now, perform a significance test on the measured value y_1 (e.g., compare its amplitude to a threshold τ). If the measurement is deemed significant, then add the locations of its immediate descendants to the stack (or queue). If the measurement is not deemed significant, then proceed by processing the stack (or queue) to obtain the index of the next coefficient to observe (i.e., the index of the next vector d_i to project x onto). Notice that using a stack in the aforementioned process results in a depth-first traversal of the tree, while using a queue results in a breadth-first traversal. Similar tree-based sequential sensing procedures have been proposed in the context of rapid MRI imaging using non-Fourier encoding [15] and more recently in the context of adaptive compressive imaging via so-called Direct Wavelet Sensing [16].
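The following Python sketch renders this adaptive traversal (a schematic illustration under our own naming conventions, not the authors' implementation; measurement noise is omitted, and the dictionary columns are assumed to be indexed in heap order, so that node i has children 2i and 2i + 1). Using a stack gives the depth-first variant; processing a queue from the front gives the breadth-first variant. The reconstruction rule at the end matches the weighted-sum estimate used later in Section 3.

import numpy as np
from collections import deque

def adaptive_tree_sense(x, D, tau, breadth_first=False):
    """Sequentially measure y_i = d_i^T x, descending the tree only below
    coefficients whose measured amplitude exceeds the threshold tau.

    D has orthonormal columns indexed 1..q in heap order (children of node i
    are 2i and 2i + 1).  Returns a dict mapping node index -> measurement.
    """
    q = D.shape[1]
    pending = deque([1])                     # start at the root index
    measurements = {}
    while pending:
        i = pending.popleft() if breadth_first else pending.pop()
        y = D[:, i - 1] @ x                  # y_i = d_i^T x, as in (5)-(6)
        measurements[i] = y
        if abs(y) > tau:                     # significance test
            for child in (2 * i, 2 * i + 1):
                if child <= q:
                    pending.append(child)
    return measurements

def laser_reconstruction(measurements, D):
    """Weighted sum of the measured atoms, with the observations as weights."""
    x_hat = np.zeros(D.shape[0])
    for i, y in measurements.items():
        x_hat += y * D[:, i - 1]
    return x_hat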
For LASeR, our goal is to learn tree-sparse approximations of each element in a collection of training data, and then to apply the aforementioned adaptive (compressive) sensing procedure to efficiently acquire signals that are similar to the training data. Here, we learn the sparsifying representation for the training data by solving an optimization of the form presented in (3), with the additional constraint that the learned dictionary have orthonormal columns (i.e., we solve (3) subject to D^T D = I). A local solution to this optimization can be obtained by alternating minimization over the dictionary D and the coefficient matrix A = [a_1, a_2, . . . , a_p]. Here, we used the SPAMS software [17] to solve the optimization over A (keeping D fixed), while the optimization over D keeping A fixed has the following closed-form expression:

D = X A^T [A X^T X A^T]^{−1/2}.   (7)
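A minimal Python sketch of the dictionary update (7) follows (our own rendering under the stated orthonormality constraint; computing the symmetric inverse square root via an eigendecomposition is one of several equivalent choices):

import numpy as np

def update_dictionary(X, A, eps=1e-10):
    """Closed-form update D = X A^T [A X^T X A^T]^(-1/2) from (7).

    X is the n x p training matrix and A the q x p coefficient matrix;
    the result has (approximately) orthonormal columns.
    """
    M = A @ X.T @ X @ A.T                    # q x q symmetric PSD matrix
    w, V = np.linalg.eigh(M)                 # M = V diag(w) V^T
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T
    return X @ A.T @ inv_sqrt

Alternating this update with a tree-structured sparse coding step for A (handled in the paper with the SPAMS software [17]) yields a local solution of the constrained problem.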
The next section describes empirical results for LASeR.

3. EXPERIMENTAL RESULTS

We performed separate experiments on two databases. The first is comprised of 72 man-made and 91 natural images from the Psychological Image Collection at Stirling [18] (some example images are shown in the top row of Fig. 1). Each image in the original database is of size 256 × 256, but here we rescaled each to 128 × 128 to reduce computational demands on the dictionary learning procedure. The training data were then each reshaped to a 16384 × 1 vector and stacked together to form the training matrix X ∈ R^{16384×163}. The second database we used was comprised of a total of 4500 synthetically generated training vectors of length 1023, where each vector possessed a tree-sparse representation with 150 nonzeros (the support of each corresponding to a rooted connected subtree of a tree of degree two) in a randomly generated orthonormal basis. We learn balanced binary tree-structured orthonormal dictionaries with 7 levels (comprising 127 orthogonal dictionary elements) for the data from the PICS database and 10 levels (comprising 1023 orthogonal dictionary elements) for the database of synthetically generated data.

Fig. 1: Sample images from the PICS database (top row), and reconstructions obtained from different compressive sensing approaches (bottom row) for the first image in the top row. From left to right, the reconstructions depicted in the second row correspond to estimates obtained from 40 measurements of the first image in the top row, using LASeR (with thresholds τ = 0, 0.4, 0.8, and 2), PCA, direct wavelet sensing (τ = 0), and CS/LASSO. In each case, the measurements were subject to additive noise with σ = 0.3.

For evaluation, the LASeR sensing procedure was applied to acquire a signal from each database. We evaluated the performance of the procedure for various values of τ (the threshold for determining significance of a measured coefficient) in a noise-free setting as well as when measurements are corrupted by zero-mean additive white Gaussian measurement noise. In either case, the reconstruction from the LASeR procedure is obtained as the weighted sum of the atoms of the dictionary used to obtain the projections, where the weights are taken to be the actual observation values obtained by projecting onto the corresponding atom. When assessing the performance of the procedure in noisy settings, we averaged over a total of 500 trials corresponding to different realizations of the random noise.

Reconstruction performance is quantified by the reconstruction signal to noise ratio (SNR), given by

SNR = 10 log_10 ( ‖x‖_2^2 / ‖x̂ − x‖_2^2 ).   (8)

To provide a performance comparison for LASeR, we also evaluate the reconstruction performance of the direct wavelet sensing algorithm described in [16], as well as principal component analysis (PCA) based reconstruction. For PCA, the reconstruction is obtained by taking projections of the test signal onto the principal components (obtained from the mean-centered training data), along with one additional projection onto the mean of the training data, and then performing a fit to get the final reconstruction. We also compare with "traditional" CS, where measurements are obtained by projecting onto random vectors (in this case, vectors whose entries are i.i.d. zero-mean Gaussian distributed) and reconstruction is obtained via the LASSO by enforcing sparsity in the learned dictionary (the regularization parameter was chosen clairvoyantly to give the best performance). In order to make a fair comparison among all of the different measurement strategies, we normalize so that each measurement is obtained by projecting onto vectors of unit norm.
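A short Python sketch of the evaluation metric (8) and of one plausible reading of the PCA baseline described above (our own schematic, not the authors' code; the number of retained principal components is a hypothetical parameter for illustration):

import numpy as np

def reconstruction_snr_db(x, x_hat):
    """Reconstruction SNR (8): 10 log10( ||x||_2^2 / ||x_hat - x||_2^2 )."""
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x_hat - x) ** 2))

def pca_reconstruction(x, X_train, num_components):
    """PCA baseline: project the test signal onto the training mean and the
    leading principal components of the mean-centered training data, then
    reconstruct from those projections.

    X_train is an n x p matrix whose columns are training signals.
    """
    mean = X_train.mean(axis=1)
    U, _, _ = np.linalg.svd(X_train - mean[:, None], full_matrices=False)
    P = U[:, :num_components]              # leading principal components
    return mean + P @ (P.T @ (x - mean))   # fit within the span of the components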

Plots of reconstruction SNR values vs. number of measurements for two test signals are shown in Fig. 2. The results in the top row (for a test signal from the PICS database) show that a range of choices of the threshold value τ results in good reconstruction SNR from only 40–65 measurements using LASeR. Similar behavior is observed in the noisy case, where a good reconstruction SNR can be obtained by around 65 measurements with any threshold value. Actual estimates obtained via the various procedures for this test image are shown in the second row of Fig. 1. We can see that LASeR gives a very good reconstruction with only a few measurements for some (larger) choices of the threshold, whereas the other approaches perform comparably poorly. The results in the bottom row of Fig. 2 (corresponding to estimates of a test signal from the synthetic database) further demonstrate the performance gain of LASeR over the other schemes compared.

Fig. 2: Reconstruction SNR vs. number of measurements (best viewed in color) at different noise levels for the different schemes (LASeR, PCA, direct wavelet sensing, and CS/LASSO). The top row corresponds to acquiring a test image from the PICS database (column 1: σ = 0, column 2: σ = 0.3, column 3: σ = 0.8); markers denote PCA, CS/LASSO (black ◦), and direct wavelet sensing with τ = 0 and τ = 0.5 (sky blue and magenta, respectively), while colored solid lines (left to right) are LASeR with τ = 0, 0.3, 0.5, and 2. The bottom row corresponds to acquiring a test signal from the synthetically generated data (column 1: σ = 0, column 2: σ = 3, column 3: σ = 5); markers denote PCA and CS/LASSO (◦), while colored solid lines (left to right) are LASeR with τ = 0, 1, 2, 3, 5, and 8.

4. CONCLUSIONS

In this paper, we presented a novel sensing and reconstruction procedure called LASeR, which uses dictionaries learned from training data, in conjunction with adaptive sensing, to perform compressive sensing. Simulations demonstrate that the proposed procedure can provide significant improvements over traditional compressive sensing (based on random projection measurements), as well as other established methods such as PCA. The proposed procedure was also shown to be robust to additive noise. Future work in this direction will entail obtaining a complete characterization of the performance of the LASeR procedure for different dictionaries and for different learned tree structures (we restricted attention here to binary trees, though higher degrees can also be obtained via the same procedure). We also note that, in a related effort, we have recently obtained precise performance guarantees (for support recovery and estimation error) to quantify the performance of the top-down adaptive compressive sensing procedures (as employed in LASeR, and in the procedure proposed in [16]) in the presence of measurement noise [19].

5. REFERENCES

[1] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[2] D. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[3] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.

[4] M. F. Duarte and Y. C. Eldar, "Structured compressed sensing: From theory to applications," Technical Report, 2011, Online: arxiv.org/pdf/1106.6224.

[5] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1982–2001, 2010.

[6] J. Huang, T. Zhang, and D. Metaxas, "Learning with structured sparsity," Technical Report, 2009, Online: arxiv.org/pdf/0903.3002v2.

[7] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. on Sig. Proc., vol. 56, no. 6, pp. 2346–2356, 2008.

[8] R. Castro, J. Haupt, R. Nowak, and G. Raz, "Finding needles in noisy haystacks," in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Proc., 2008, pp. 5133–5136.

[9] J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, "Compressive distilled sensing: Sparse recovery using adaptivity in compressive measurements," in Proc. Asilomar Conf. on Signals, Systems, and Computers, 2009, pp. 1551–1555.

[10] J. Haupt and R. Nowak, "Adaptive sensing for sparse recovery," to appear in Compressed Sensing: Theory and Applications, Cambridge University Press, 2011, www.ece.umn.edu/~jdhaupt/publications/cs10_adaptive_sensing.pdf.

[11] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?," Vision Research, vol. 37, pp. 3311–3325, 1997.

[12] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Proc., vol. 54, no. 11, pp. 4311–4322, 2006.

[13] P. Zhao, G. Rocha, and B. Yu, "The composite absolute penalties family for grouped and hierarchical variable selection," Ann. Statist., vol. 37, no. 6A, pp. 3468–3497, 2009.

[14] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, "Proximal methods for sparse hierarchical dictionary learning," in Proc. ICML, 2010.

[15] L. P. Panych and F. A. Jolesz, "A dynamically adaptive imaging algorithm for wavelet-encoded MRI," Magnetic Resonance in Medicine, vol. 32, pp. 738–748, 1994.

[16] S. Deutsch, A. Averbuch, and S. Dekel, "Adaptive compressed image sensing based on wavelet modelling and direct sampling," in 8th International Conference on Sampling Theory and Applications, Marseille, France, 2009.

[17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and R. Jenatton, "Sparse modeling software," www.di.ens.fr/willow/SPAMS/.

[18] "Psychological image collection at Stirling," www.pics.stir.ac.uk/.

[19] A. Soni and J. Haupt, "Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries," in Proc. Asilomar Conf. on Signals, Systems, and Computers, November 2011.