Kezhi LI et al. State of the Art and Prospects of Structured Sensing Matrices in Compressed Sensing
Front. Comput. Sci. 2014, X(X): XXXXX
arXiv:1408.1391v1 [cs.IT] 7 Aug 2014

Abstract Compressed sensing (CS) makes it possible to acquire compressed measurements directly and to recover sparse or compressible signals faithfully even when the sampling rate is much lower than the Nyquist rate. However, purely random sensing matrices usually require a large amount of memory for storage and a high computational cost for signal reconstruction. Many structured sensing matrices have been proposed recently to simplify the sensing scheme and the hardware implementation in practice. Based on the restricted isometry property and coherence, several existing structured sensing matrices are reviewed in this paper. They have special structures, high recovery performance, and advantages such as simple construction, fast calculation and easy hardware implementation. The number of measurements and the universality of the different structured matrices are compared.

Keywords Compressed sensing, structured sensing matrices, RIP, coherence

1 Introduction

In the digital revolution, people are employing various signal processing techniques and new sensing systems in general electronic products with ever-increasing resolution and fidelity. The conventional way of sampling signals, images, videos, or other data follows the celebrated Shannon theorem, which requires a signal to be sampled at a rate of at least twice the highest frequency present in the signal (the so-called Nyquist rate) to retain the signal information intact [1, 2]. Shannon's theorem solves the problem perfectly in theory, yet unfortunately it is not omnipotent. In many applications such as remote surveillance or spectroscopy, sampling at the Nyquist rate is expensive, or even physically impossible. So as long as the recovery performance reaches an acceptable level, people want to build devices that acquire samples at as low a rate as possible. In some other applications, such as imaging systems or video processing, sampling a large number of measurements seems feasible. However, because of limited storage space and the use of advanced compression techniques, people often discard most of the received data and save only a small amount of compressed data (e.g., JPEG). This wastes valuable sensing resources, since the entire data set is sampled first.

Aiming at solving the above problems, compressed sensing (CS) theory [3–7] has become one of the hottest research areas in signal processing since 2006. Research on CS has been growing very fast, and it focuses on acquiring and reconstructing sparse or compressible signals. By using CS, compressed measurements can be acquired directly and one may recover the original sparse or compressible signal faithfully even when the sampling rate is much lower than the Nyquist rate. A length-N signal x is regarded as sparse if x has K nonzero values and K ≪ N. A compressible x is one that can be well approximated in a certain domain Ψ by a sparse signal f using only K nonzero coefficients: x = Ψf, ‖f‖_0 = K. Traditional compression techniques, such as JPEG, JPEG2000 and MPEG, normally preserve the values and locations of the largest coefficients. CS, in contrast, has more efficient sensing or sampling protocols that capture the essential information content embedded in the original signal and obtain the condensed data directly. More precisely, these protocols are nonadaptive linear transforms, which can be represented by well-designed matrices, called sensing matrices Φ. These matrices should be incoherent with the sparsifying matrix Ψ of the compressible signal. Given the measurements and the sensing matrix, exact reconstruction of signals from a subset of measurements can be implemented by solving a nonlinear optimization problem. The approaches to solving these nonlinear problems are called reconstruction/recovery algorithms. From a mathematical point of view, compressed sensing can also be regarded as a technique for finding sparse solutions to underdetermined linear systems.

CS theory is a revolution in both the theory of reliable signal sampling and the physical design of sensors. Since the original signal can be sensed from fewer linear projections rather than acquired in its initial domain, the sensing matrices play an important role in the CS framework. The properties of the sensing matrices directly affect the number of necessary measurements and the recovery performance. Early researchers proved that a random projection is one of the best solutions [5, 8]. The projection matrices are generated by orthogonalizing vectors drawn uniformly and independently on a unit sphere. In addition, sensing matrices consisting of independent and identically distributed (i.i.d.) entries drawn from a Gaussian or Bernoulli distribution also perform well in both theory and practice [3, 4]. Though the problem has been solved mathematically, there are still many obstacles to overcome. One main drawback of purely random sensing matrices is that they require a large amount of storage, namely M × N entries to recover a length-N signal, and a high computational cost for signal reconstruction. Moreover, the difficulty of hardware implementation also makes them expensive in practice.

To simplify the sensing scheme, many structured sensing matrices have been proposed in recent years. In this paper, after explaining terminology such as the restricted isometry property and coherence, we give an introduction to several existing structured sensing matrices, including subsampled incoherent bases, random Toeplitz matrices, random demodulator matrices, random convolution matrices, structurally random matrices, convolutional matrices using sequences and some other structured sensing matrices. These matrices have special structures that make them efficient in construction, calculation or hardware implementation. For many of them, corresponding fast recovery algorithms have been developed by exploiting their specific structures. Here we put our emphasis on how the structured matrices are generated and what their recovery performance is. Based on these, the number of measurements and the universality of the different structured matrices are compared. The paper should help readers understand the characteristics of popular sensing matrices and may inspire them to explore or pursue more efficient sensing schemes in the CS area.

The remainder of the paper is organized as follows. In Section 2, we describe the core concept of this paper, sensing matrices, and introduce two prevalent criteria for examining the effectiveness of sensing matrices: the restricted isometry property and coherence. In Section 3, several structured sensing matrices are analyzed. An overview of applications is given in Section 4. Finally, the prospects of structured sensing matrices are discussed in Section 5, followed by the conclusions in Section 6.

2 Sensing Matrices

In the real world, useful signals are normally not random. Images, videos or voices often contain specific structures and strong correlation among pixels, frames or samples. These structures and correlations are the assumptions behind sparse representation theory. Given an N-dimensional signal x, Ψ denotes the sparsifying transform basis for x, where throughout this paper we assume that Ψ is an N × N normalized unitary matrix satisfying Ψ*Ψ = N I_N. So x can easily be decomposed as a linear superposition of K elementary components:

$$x = \sum_{k=1}^{K} f_k \psi_k, \tag{1}$$

which can be rewritten in matrix form as

$$x = \Psi f, \tag{2}$$

where f is a length-N sparse column vector with K nonzero values, K ≪ N. Typical transforms Ψ include the discrete Fourier transform (DFT), the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). Sometimes the number of nonzero values in f is larger than K. In this case people usually encode the K most significant nonzero entries of f and disregard the rest, which is also the core principle of the image compression standards JPEG (using DCT) and JPEG2000 (using DWT).

Now assuming a length-N signal x as defined in (1), the data acquisition process can be described as

$$y = \Phi x = \Phi \Psi f = \Theta f, \tag{3}$$

where the measurement y is an M × 1 sampled vector, Φ is an M × N measurement/sensing matrix, and Θ = ΦΨ. Equation (3) is the kernel equation of the sensing process. The sensing process with a random Gaussian measurement matrix Φ and a DCT matrix Ψ is illustrated in Figure 1, in which four columns of Ψ correspond to nonzero coefficients f_i; the measurement vector y is a linear combination of these columns. CS theory considers problems based on the fundamental equation (3). These problems can be summarized as how to design an efficient sensing matrix Φ, and how to recover x given y and Φ.

Here we focus on the problem of designing proper sensing matrices. Conventionally, if K entries in y are more important than the other entries, people may capture the signal energy roughly and recover the original signal from K measurements, as is done when recovering a natural image from its frequency spectrum. However, in the CS framework the entries of x are assumed sparse and randomly distributed, which means that the locations of the large entries are unknown. In this circumstance x can be recovered by exploiting the sparsity from M measurements, with M ≪ N and M > K. Because only M partial measurements are captured to recover the original signal, a good sensing scheme should spread the information of the nonzero entries evenly into every measurement y_k, so that no significant information is lost. Following this intuitive idea, people found that random projection is one of the best candidates for a sensing matrix [5, 9]. Besides, if Φ represents a Gaussian or Bernoulli random operator, x can also be faithfully recovered from y using nonlinear optimization approaches provided that M ≥ O(K log(N/K)) [3, 4, 10]. These early works by D. Donoho, E. Candès, T. Tao and Romberg established the foundation of CS theory.

From Eq. (3) an essential question arises naturally: apart from general random operators, what kinds of sensing matrices Φ are capable of recovering x uniquely from the measurements y? Fortunately, two important criteria for evaluating proper operators were created to provide fundamental insights into the geometry of sensing matrices. The best known one is often referred to as the Restricted Isometry Property (RIP):

Definition 1 (RIP [5]). An M × N matrix Θ = ΦΨ is said to satisfy the RIP with parameters (K, δ) (δ ∈ (0, 1)) if

$$(1 - \delta)\|f\|_2 \le \|\Theta f\|_2 \le (1 + \delta)\|f\|_2, \quad \text{for all } f \in \Gamma, \tag{4}$$

where Γ represents the set of all length-N vectors with K nonzero coefficients.

Generally speaking, the RIP requires the sensing matrix to act as a near isometry on the set of all K-sparse signals. It is consistent with the idea of spreading energy behind random sensing matrices: the energy of the measurement y neither shrinks nor expands too much compared with that of the original signal. If Θ satisfies the RIP, many reconstruction algorithms, such as Basis Pursuit (BP) or Matching Pursuit recovery algorithms [11, 12], can be used to recover any K-sparse signal f from the M measurements Θf. In addition, the RIP guarantees the uniqueness of the reconstruction result f, which does not hold automatically for some other RIP-related properties, such as the weaker Statistical Restricted Isometry Property (StRIP) [13].

Because there is no existing algorithm for efficiently verifying whether a matrix satisfies the RIP, people also need the coherence property to examine the "quality" of Θ:

Definition 2 (Coherence [14, 15]). The coherence µ(Θ) is the largest absolute inner product between any two distinct normalized columns of Θ,

$$\mu(\Theta) = \max_{1 \le i \ne j \le N} |\langle \Theta_i, \Theta_j \rangle|, \tag{5}$$

where Θ_i, Θ_j represent two columns of Θ.

If Θ = ΦΨ, the coherence can also be quantified by calculating the maximal correlation among all rows of Φ and all columns of Ψ:

$$\mu(\Phi, \Psi) = \max_{1 \le i, j \le N} |\langle \Phi_i, \Psi_j \rangle| = \max_{1 \le i, j \le N} |\Theta(i, j)|. \tag{6}$$

Note that for a unitary matrix Φ with Φ*Φ = N I_N, the mutual coherence coefficient µ is bounded by 1 ≤ µ(Θ) ≤ √N [6]. When Φ is chosen as the DFT or the Walsh-Hadamard transform and Ψ is an identity matrix, µ(Θ) = 1. If Φ is a matrix of random basis vectors or a matrix of i.i.d. Gaussian entries N(0, 1), the mutual coherence between Φ and any orthonormal matrix Ψ is on the order of O(√(2 log N)) with very high probability, which does not attain the lower bound but stays far below the upper bound √N [16]. Coherence µ is a core concept in constructing CS matrices, and it will be used frequently in the sensing matrix analysis.
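As a rough numerical illustration of Eq. (6), the following sketch evaluates the mutual coherence for three pairs (Φ, Ψ). It assumes the scaling Φ*Φ = N I_N for Φ and an orthonormal Ψ; the helper name `mutual_coherence` and the size N = 64 are illustrative choices, not part of the original text.

```python
import numpy as np

def mutual_coherence(Phi, Psi):
    """Return max |Theta(i, j)| with Theta = Phi @ Psi, as in Eq. (6)."""
    # Entry (i, j) of Phi @ Psi pairs row i of Phi with column j of Psi.
    return float(np.max(np.abs(Phi @ Psi)))

N = 64  # illustrative signal length (assumption)
idx = np.arange(N)

# Unnormalised DFT matrix, so that F* F = N I (the scaling assumed here).
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N)
identity = np.eye(N)                      # signals sparse in the time domain
inverse_dft = F.conj().T / np.sqrt(N)     # orthonormal basis Psi = F* / sqrt(N)

rng = np.random.default_rng(0)
G = rng.standard_normal((N, N))           # i.i.d. N(0, 1) entries

mu_dft_identity = mutual_coherence(F, identity)      # lower bound of the range
mu_dft_inverse = mutual_coherence(F, inverse_dft)    # upper bound of the range
mu_gauss_identity = mutual_coherence(G, identity)    # random case, in between

print(mu_dft_identity, mu_dft_inverse, mu_gauss_identity)
```

Under these assumptions the first two values are 1 and √N (8 for N = 64), the two ends of the range stated above, while the Gaussian value depends on the random draw and falls between them.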

Fig. 1 (a) Compressive sensing measurement process with a random Gaussian measurement matrix Φ and a DCT matrix Ψ as the sparsifying matrix. (b) Measurement process with Θ = ΦΨ. The original figure is from [7]. Here f denotes the K-sparse vector.
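The measurement process of Fig. 1 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Ψ is taken as an orthonormal DCT-II synthesis basis, Φ has i.i.d. Gaussian entries, the sizes N, M, K are arbitrary, and a plain orthogonal matching pursuit (the hypothetical helper `omp`) stands in for the greedy pursuit algorithms mentioned in the text.

```python
import numpy as np

def dct_basis(N):
    """Orthonormal DCT-II synthesis basis: columns are cosine atoms, x = Psi @ f."""
    n = np.arange(N)
    # C[k, n] is the orthonormal DCT-II analysis matrix.
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C.T

def omp(Theta, y, K):
    """Orthogonal matching pursuit: greedily select K columns of Theta to explain y."""
    residual, support = y.copy(), []
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        support.append(int(np.argmax(np.abs(Theta.T @ residual))))
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    f_hat = np.zeros(Theta.shape[1])
    f_hat[support] = coef
    return f_hat

rng = np.random.default_rng(0)
N, M, K = 128, 64, 4                      # illustrative sizes (assumption)

Psi = dct_basis(N)
f = np.zeros(N)
f[rng.choice(N, size=K, replace=False)] = rng.choice([-1.0, 1.0], size=K)
x = Psi @ f                               # signal that is K-sparse in the DCT domain

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian sensing matrix
y = Phi @ x                               # Eq. (3): y = Phi x = Theta f
Theta = Phi @ Psi

f_hat = omp(Theta, y, K)
print("recovery error:", np.linalg.norm(f_hat - f))
```

With K = 4 nonzero coefficients, y is a linear combination of four columns of Θ, which mirrors the four highlighted columns in Fig. 1.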

3 Structured Sensing Matrices

The initial work on CS focused on randomized sensing matrices, in which the entries are generated independently from standard probability distributions. For instance, with overwhelming probability, matrices with i.i.d. Gaussian/Bernoulli entries obey the RIP, and sparse signals can be uniquely recovered from M measurements provided that

$$M \ge C \cdot K \log(N/K), \tag{7}$$

where C is some constant depending on each instance [4]. As mentioned in the introduction, purely random matrices are not easily applicable to real implementations because of their large storage and heavy computation. In recent years some structured sensing matrices have been proposed. Unlike purely random matrices, special constructions make structured sensing matrices suitable for various applications. We introduce them chronologically and analyze the performance of each.

3.1 Subsampled Incoherent Bases

For subsampled incoherent bases, the most famous examples are randomly subsampled Fourier and Walsh-Hadamard matrices. An M × N sensing matrix is constructed by randomly selecting rows from an N × N square DFT (or FFT) matrix F or a Walsh-Hadamard transform (WHT) matrix H, respectively. Specifically,

$$F = \frac{1}{\sqrt{N}} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^{2} & \cdots & \omega^{N-1} \\ 1 & \omega^{2} & \omega^{4} & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)(N-1)} \end{bmatrix}, \tag{8}$$

where $\omega = e^{-2\pi i/N}$ is a primitive Nth root of unity with $i = \sqrt{-1}$, and

$$H_N = \begin{bmatrix} H_{N/2} & H_{N/2} \\ H_{N/2} & -H_{N/2} \end{bmatrix}, \tag{9}$$

with an initial matrix H_1 = [+1]. The oversampling factor for the partial DFT matrix was first proved to be (log N)^6 in [5], and was then improved to (log N)^4 in [17]. Generally speaking, the RIP of subsampled unitary matrices is summarized in the following theorems.

Theorem 1 (RIP for randomly subsampled unitary matrix [17, 18]). Suppose that the M × N matrix Θ is a randomly subsampled unitary matrix, i.e., it can be written as $\Theta = \frac{1}{\sqrt{M}} R_\Omega U$, where $\frac{1}{\sqrt{M}}$ is a normalizing coefficient, $R_\Omega$ is a random sampling operator that selects M samples out of N uniformly at random, and U is an N × N unitary matrix satisfying $U^* U = N I_N$. Then Θ satisfies the RIP with high probability provided that

$$M \ge O\left(\delta^{-2} \mu^{2}(U) K \log^{4} N\right), \tag{10}$$

where δ denotes the restricted isometry constant in Definition 1.

Theorem 1 implies that the RIP bound of a randomly subsampled unitary matrix depends on µ(U). Note that for a unitary matrix U with U*U = N I_N, 1 ≤ µ(U) ≤ √N. When U is chosen as the FFT or the Walsh-Hadamard transform, µ(U) = 1 and by Eq. (10), one has

$$M \ge O\left(\delta^{-2} K \log^{4} N\right). \tag{11}$$

All the above bounds are for uniform reconstruction, which means that once the sampling operator Φ is constructed, all signals that are sparse in a certain basis Ψ can be recovered as long as M is sufficiently large. If one fixes x and wants to recover it specifically, the problem becomes a non-uniform one, and this weaker requirement leads to fewer measurements. In detail:

Theorem 2 (Non-uniform recovery [19]). Assume that Θ is a randomly subsampled unitary matrix as defined in Theorem 1. Let f in (3) be a fixed arbitrary K-sparse signal. Then f can be faithfully recovered from y using l1 norm optimization, if M satisfies

$$M \ge O\left(\mu^{2}(U) K \log N\right). \tag{12}$$

In addition, if we fix x ∈ R^N and suppose that the coefficient sequence f of x is K-sparse in the basis Ψ, and select M measurements in the Φ domain uniformly at random, then if

$$M \ge C \cdot \mu^{2}(\Phi, \Psi) K \log N \tag{13}$$

for some positive constant C, the l1 norm minimization solution is exact with overwhelming probability [6, 20]. For the cases of DFT and WHT matrices, Φ = F or H, the bound on M becomes O(K log N). The theorems listed here are also very useful for proving the feasibility of other structured sensing matrices.

Although the partial FFT (or WHT) has a near-optimal theoretical guarantee, easy hardware implementation and fast-computable recovery, its major shortcoming is the lack of universality. A universal sensing matrix is one that can handle signals that are sparse in any domain. If Φ is a Gaussian random matrix, the matrix ΦΨ remains Gaussian for any unitary transform Ψ. However, if Φ is randomly sampled from an FFT, it is not universal, as µ(FΨ) cannot be O(1) for all bases Ψ; e.g., when Ψ = F*, µ(FΨ) is large.

3.2 Random Toeplitz Matrices

Because all elements in random matrices are required to be i.i.d., it is natural to take the question one step further: can we reduce the randomness a little and achieve a similar reconstruction performance? Bajwa et al. first followed this idea and proposed random Toeplitz matrices (RTM) in 2007 [21, 22]. In an RTM, the entries within one row are independently distributed, while a certain structure is retained among the rows. Specifically, if a probability distribution P(a) yields an i.i.d. CS matrix (having unit-norm columns in expectation), then an M × N (partial) Toeplitz matrix A (also having unit-norm columns in expectation) of the form

$$A = \begin{bmatrix} a_{N-1} & a_{N-2} & \cdots & a_{0} \\ a_{N} & a_{N-1} & \cdots & a_{1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N+M-2} & a_{N+M-3} & \cdots & a_{M-1} \end{bmatrix}, \tag{14}$$

where the entries $\{a_i\}_{i=0}^{N+M-2}$ have been drawn independently from P(a), is also a CS matrix in the sense that it satisfies the RIP of order 3K with high probability for every δ ∈ (0, 1/3) provided M > C · K^3 log(N/K), where C is a constant [21].

On the technical side, the proof of the RIP for RTM used the celebrated Hajnal-Szemeredi theorem on equitable coloring of graphs to partition an M × 3K Toeplitz-structured submatrix A_T into roughly O(K^2) i.i.d. submatrices having dimensions approximately equal to O(M/K^2) × 3K. With random Toeplitz matrices, only O(N) independent random variables need to be generated. Multiplication with Toeplitz matrices can be implemented more efficiently using the fast Fourier transform (FFT), resulting in faster acquisition and reconstruction algorithms. In addition, Toeplitz-structured matrices arise naturally in certain application areas such as system identification. Later Haupt et al. and Rauhut improved the bound on M to O(K^2 log N) [22] and O(K log^2(N)) [23], respectively.

Meanwhile, random Toeplitz matrices also have disadvantages. For example, RTM are proved to be able to recover only signals that are sparse in the time domain. Their strong structure makes them unsuitable for processing signals that are sparse in other bases, such as the DCT domain.

3.3 Random Demodulator

The random demodulation (RD) matrix was proposed by Tropp et al. in 2010 [24]. Pseudorandom binary sequences are often used to modulate the input signal. Similar implementations include Bernoulli or Rademacher random variables. The random demodulator is a sampling system that can be used to acquire sparse, bandlimited signals in an analog model. Fig. 2 displays a block diagram of the RD system [24]. It is designed for a continuous-time signal f whose highest frequency is less than W/2 hertz. Tropp et al. modulated the signal by multiplying it with a high-rate pseudonoise sequence, which smears the tones across the entire spectrum. Then a low-pass anti-aliasing filter is applied, and the signal x is captured by sampling at a relatively low rate. Simulations suggested that the RD requires just O(K log(W/K)) samples per second to stably reconstruct the original signal.

Mathematically, the random demodulator can be seen as a linear system that maps a continuous-time signal to a discrete sequence of samples. To express the system in matrix form, let ε_0, ε_1, ··· , ε_{W−1} be the chipping sequence, placed on the diagonal of a matrix D, and let H be an R × W accumulate-and-dump sampling matrix, where R is the sampling rate. Assume that W is divisible
by R, the overall action of the system is

$$\Theta = \Phi \Psi = H D \cdot \hat{F}, \tag{15}$$

where

$$H = \begin{bmatrix} 1\ 1 \cdots 1 & & & \\ & 1\ 1 \cdots 1 & & \\ & & \ddots & \\ & & & 1\ 1 \cdots 1 \end{bmatrix}, \tag{16}$$

with W/R ones in each row of H,

$$D = \begin{bmatrix} \varepsilon_{0} & & & \\ & \varepsilon_{1} & & \\ & & \ddots & \\ & & & \varepsilon_{W-1} \end{bmatrix}, \tag{17}$$

and $\hat{F}$ is a W × W permuted DFT matrix with

$$\hat{F} = \frac{1}{\sqrt{W}} \left[ e^{-2\pi i \cdot n w / W} \right]_{n,w}, \tag{18}$$

where n = 0, 1, ··· , W − 1 and w = 0, ±1, ··· , ±(W/2 − 1), W/2.

Fig. 2 Block diagram for the random demodulator. The components include a random number generator, a mixer, an accumulator, and a sampler (taken from [24]).

The main advantage of the RD system is that it bypasses the need for a high-rate analog-to-digital converter (ADC). It is typically much easier to implement demodulation than sampling, so a low-rate ADC can be used and a more robust, low-power system can be achieved. In theory, the RD guarantees recovery with a sampling rate of $R \sim O(K \log W + K \log^{3} W)$ in the noiseless case and $R \sim O(K \log^{6} W)$ in the noisy case.

3.4 Random Convolution

The random convolution (RC) model was first proposed by Romberg in 2007 [25, 26]. In RC, the construction has two steps. The signal x ∈ R^N is circularly convolved with a "pulse" h ∈ R^N, then subsampled. The pulse is supposed to be random, with its energy spread uniformly across the discrete spectrum. The convolution of x and h can be written in matrix form as Hx, where [25]

$$H = N^{-1/2} F^{*} \Sigma F, \tag{19}$$

with F the discrete Fourier matrix and Σ a diagonal matrix whose nonzero elements are the Fourier transform of h. The matrix Σ can be generated by

$$\Sigma = \begin{bmatrix} \sigma_{0} & & & \\ & \sigma_{1} & & \\ & & \ddots & \\ & & & \sigma_{N-1} \end{bmatrix}, \tag{20}$$

where the diagonal entries σ_w are unit-magnitude complex numbers with random phases as follows:

$$\begin{aligned} w = 0 &: \ \sigma_{0} \sim \pm 1 \text{ with equal probability}, \\ 1 \le w < N/2 &: \ \sigma_{w} = e^{j\theta_{w}}, \text{ where } \theta_{w} \sim \mathrm{Uniform}([0, 2\pi]), \\ w = N/2 &: \ \sigma_{N/2} \sim \pm 1 \text{ with equal probability}, \\ N/2 + 1 \le w \le N - 1 &: \ \sigma_{w} = \sigma_{N-w}^{*}, \text{ the conjugate of } \sigma_{N-w}. \end{aligned} \tag{21}$$

From (21) one can see that the action of H on a signal x can be broken down into a DFT, followed by a randomization of the phases under symmetry constraints, followed by an inverse DFT. A Fourier optics imaging architecture implementing random convolution followed by randomly pre-modulated summation (RPMS) is shown in Fig. 3. Alternatively, the random sampling process can be replaced with randomly pre-modulated summation, which means breaking the samples into blocks of size N/M and summarizing each block with a single number. This changes the bound on the number of measurements sufficient for recovery by a factor of log N.

Random convolution is significant since it is regarded as an efficient data acquisition strategy that can recover noiseless length-N signals in any fixed representation from O(K log N) measurements, which is relatively small for structured CS matrices. The randomness exists in both the sampling process
and the generation of the entries, making RC universal (or uniform) with respect to the choice of signal representation. This is especially important for signals that are sparse in unknown bases.

Fig. 3 Fourier optics imaging architecture implementing random convolution followed by RPMS [25]. SLM represents the spatial light modulator.

3.5 Structurally Random Matrices

The Structurally Random Matrix (SRM) is a novel framework for fast and efficient CS introduced by Do et al. [27, 28]. In the SRM, the sensed signal is pre-randomized by scrambling its sample locations or flipping its sample signs, and the randomized samples are then fast-transformed. Finally, the sensing measurements are obtained by subsampling the resulting transform coefficients. The sampling algorithm contains three steps, illustrated in Fig. 4: (i) pre-randomize the signal; (ii) apply some fast transform to the randomized signal; (iii) randomly subsample the transform coefficients to get the compressed measurements.

Fig. 4 Block diagram for sampling scheme of SRM [27]. Blocks shown: input signal, pre-randomizer, FFT/WHT/DCT, random downsampler, compressed measurements, reconstruction (Basis Pursuit), signal recovery.

If the algorithm is decomposed mathematically as a product of three matrices, the SRM can be represented as [27]

$$A = D F R, \tag{22}$$

where

• R, the randomizer, is a random permutation matrix (denoted as the global randomizer) or a random diagonal matrix of i.i.d. Bernoulli entries (denoted as the local randomizer);
• F is some fast computable transform such as the FFT, the DCT, the WHT, etc.;
• D, the random downsampler, is a matrix composed of the nonzero rows of a random diagonal matrix whose diagonal entries D_ii are i.i.d. binary random variables with P(D_ii = 1) = M/N, where M is the number of measurements.

The reconstruction algorithm can be any l1 norm minimization or greedy pursuit algorithm. SRMs are highly relevant for large-scale, real-time compressed sensing applications as they have fast computation and support block-based processing. Meanwhile, SRMs have a theoretical sensing performance of O(K log N) measurements for exact recovery, which is comparable to that of completely random sensing matrices. In the construction of the sensing operator, SRMs use the random downsampler, a fast transform and a random diagonal matrix, like random convolution. SRMs provide universality and hardware-friendly implementation for reconstructing sparse signals.

3.6 Structured matrices using sequences

In most previous work, random sequences have been exploited to generate sensing matrices. For example, [22, 29] use Bernoulli random sequences. An alternative is to obtain matrices from diagonal unimodular sequences σ with random phases [25], i.e., $\sigma_k = e^{j\theta_k}$, where θ_k is a random variable uniformly distributed in [0, 2π). In [30, 31], σ can be a perfect or nearly perfect sequence.

In contrast to random sequences, many researchers have recently adopted deterministic sequences to construct sensing matrices. These sequences are carefully designed, and many of them have been widely used in communication and coding theory. Because the sequences follow a fixed formulation, the sensing matrices based on them often have less randomness, and many of them are even deterministic [13, 32–35]. Here we introduce only one of them, convolutional matrices using a deterministic filter [34], as an example of how to construct sensing matrices from sequences.

The sampling operator Φ can be represented as a partial circulant matrix of the following form [34]

$$\Phi = \frac{1}{\sqrt{M}} R_\Omega A, \tag{23}$$

where A is a circulant matrix that can be expressed as

$$A = \begin{bmatrix} a_{0} & a_{N-1} & \cdots & a_{1} \\ a_{1} & a_{0} & \cdots & a_{2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N-1} & a_{N-2} & \cdots & a_{0} \end{bmatrix}. \tag{24}$$

For Φ given in (23), the measurement process can be realized by circularly convolving x with a filter $a = [a_0\ a_1\ \cdots\ a_{N-1}]^T$ and then downsampling the output at the locations indexed by Ω. As is well known, the circulant matrix A can be diagonalized using the FFT. This property gives the convolutional matrix fast computation. It is easy to see that the filter vector a (i.e., the
first column of A) can be obtained by taking the inverse FFT of the sequence $\sigma = [\sigma_0\ \sigma_1\ \cdots\ \sigma_{N-1}]^T$, i.e.,

$$a = \frac{1}{\sqrt{N}} F^{*} \sigma. \tag{25}$$

σ may be any of various unimodular sequences. The coherence bounds for different sequences are given in Table 1. For real sensing matrices A, the diagonal sequence needs to be conjugate symmetric; these are listed as extended sequences in Table 1.

Table 1 Coherence parameter µ(A) for different diagonal sequences σ

Matrix type       Sequence σ           Length N                                                        µ(A)
Complex matrices  FZC                  Arbitrary                                                       $1$
Complex matrices  m-sequence           $2^{k} - 1$, $k \in \mathbb{N}$                                 $\sqrt{1 + 1/N}$
Complex matrices  Legendre sequence    $N \equiv 3 \pmod 4$ and N prime                                $\sqrt{1 + 1/N}$
Complex matrices  Legendre sequence    $N \equiv 1 \pmod 4$ and N prime                                $1 + 1/\sqrt{N}$
Complex matrices  Golay sequence       $2^{\kappa_1} 10^{\kappa_2} 26^{\kappa_3}$, $\kappa_1, \kappa_2, \kappa_3 \in \mathbb{N}$   $\sqrt{2}$
Real matrices     Extended FZC         Even N                                                          $4 + 4/\sqrt{N}$
Real matrices     Extended FZC         Odd N                                                           $2.69 + 8.15/\sqrt{N}$
Real matrices     Extended Golay       Even N, $N = 2^{\kappa_1} 10^{\kappa_2} 26^{\kappa_3}$, $\kappa_1, \kappa_2, \kappa_3 \in \mathbb{N}$       $2\,(1 + 1/\sqrt{N})$
Real matrices     Extended Golay       Odd N, $N = 2^{\kappa_1} 10^{\kappa_2} 26^{\kappa_3} - 1$, $\kappa_1, \kappa_2, \kappa_3 \in \mathbb{N}$    $1 + 2/\sqrt{N}$

Using the uniform and non-uniform theorems, the coherence bounds reveal that $M \ge O(\delta^{-2} \mu K \log^{4} N)$ measurements are enough for uniform recovery and $M \ge O(\delta^{-2} \mu K \log N)$ for non-uniform recovery, where δ denotes the restricted isometry constant. These convolutional matrices are not universal, but they are effective for signals that are sparse in either the time or the frequency domain. When σ is the Frank-Zadoff-Chu (FZC) sequence, the corresponding sensing matrices are also capable of recovering signals that are sparse in the DCT domain. The number of measurements for different sequences can be calculated easily from Table 1 and Theorems 1 and 2.

3.7 Other Sensing Matrices

Many other sensing matrices have been developed in recent years. To accelerate computation for large data, block structures were introduced for Gaussian matrix [36], Toeplitz matrix [37], [38] and SRM [27] etc. The block structure means that the sensing matrices have the following form with structured matrices A_i, i = 1, ··· , l, as blocks:

$$A = \begin{bmatrix} A_{1} & & & \\ & A_{2} & & \\ & & \ddots & \\ & & & A_{l} \end{bmatrix}. \tag{26}$$

Block-based sensing is more advantageous for real-time applications, since the encoder does not need to wait until the whole signal is measured before sending the sampled data. Besides structured sensing matrices, sensing matrices can even be deterministic. Various deterministic matrices have been introduced in [13, 32, 39–42]. Compared with structured sensing matrices, deterministic sensing matrices have fixed forms and there is no randomness in the construction. Specifically, second-order Reed-Muller codes are used in [13, 43] and duals of extended binary BCH codes are exploited in [13, 44]. Many other sequences are also employed in deterministic matrix design, such as discrete chirp sequences [13, 32, 34], Kerdock and Delsarte-Goethals codes [45], Sidelnikov sequences [46] and Alltop sequences [13, 47, 48]. Deterministic sensing matrices have fixed constructions, and thus they normally cannot guarantee the recovery of all signals with high probability. With high probability they are able to recover all but an exponentially small fraction of signals. Some papers focus on designing sensing matrices that lead to good expected-case mean squared error (MSE) performance rather than worst-case performance [49]. For more information regarding sensing matrices, readers may refer to references [18, 50] and the CS website [51].

3.8 Relations Between Structured Sensing Matrices

The sensing matrices introduced in this section were not developed independently; they are related to each other. Subsampled Fourier and Hadamard matrices were the first to be proved to be qualified structured sensing matrices. They belong to the group of subsampled incoherent bases. Random Toeplitz matrices are well known and significant in many applications, such as channel estimation [22] and system identification [40]. Their randomness exists within each row, while between rows they have strong structure. In real-time signal processing the modulation idea has been widely implemented, and it is also used in the random demodulator. The celebrated random convolution is actually a special modulation of signals in the Fourier domain. Moreover, SRMs are a group of structured matrices generated from an approach based on random convolution but with Bernoulli diagonal phase modulation, for signals in more flexible domains. Finally, structured and deterministic sensing matrices using sequences are analyzed as a new sub-area of sensing matrix design.

In practice, people may use different structured sensing matrices according to the sensing model and hardware constraints. For instance, if one needs structured matrices to model a 1-dimensional convolution in the sensing process, random Toeplitz or circulant matrices are employed because of the nature of the convolution calculation. In addition, in the same model, if the target signal is sparse in the Fourier domain and the phases of the modulated signal can be symmetric, random convolution is suitable for solving the problem, accelerated by fast algorithms. When the phases can only be modulated as ±1 and must be determined in advance for hardware reasons, convolutional CS matrices using deterministic sequences are recommended, at the price of a larger number of measurements M compared with random convolution. If the entire sensing scheme is fixed, the sensing matrices are deterministic and there is no randomness in the construction. In this case deterministic matrices constructed from coding theory are the only candidates, and they usually come with strict size constraints. In general, there is a tradeoff between randomness and the number of measurements M. Less randomness simplifies the sensing scheme, but it often leads to more measurements and consequently a longer sensing time.

MRI scanners sequentially sample the human body in the 2-D continuous Fourier domain, and the sensed coefficients satisfy the sparsity property, which is also the prerequisite of CS theory. Moreover, MRI is very time consuming. To obtain a clearer image, one often needs a long time to collect the data. However, the speed of data collection is limited by physical and physiological constraints. Applying the CS technique may accelerate the scanning process with the same accuracy, because fewer CS measurements are required. The schematic diagram of MRI using CS is shown in Fig. 5 (a).

Another category of the application involves the design of new acquisition hardware that is able to acquire projec- 4 Applications of Structured Sensing Matrices tions of a signal against a class of vectors. In this case, the sensing process is accomplished by physical optical instru- Essentially CS theory can be recognized as a data processing ments, and the research normally focuses on the problem of technique that recovers sparse data from under-determined how to design sensing matrices whose entries belong to some equations. The advantage of CS is to process sparse signals patterns/bases that can be easily implemented on the hard- that can not be processed appropriately before, or obtain the ware. One example is the framework of recovering an im- compressed data using proper physical instruments directly. age based on optical modulators, known as the single pixel Fortunately most of the signals in the real world that people camera shown in Fig. 5 (b) [58]. The digital micromirror are interest in belong to sparse signals or can be approximated device (DMD) is a reflective spatial light modulator that se- in certain domain. So from its emergence CS has been imple- lectively redirects parts of the light beam [59]. The DMD mented in numerous applications including communications, is comprised of an array of bacterium-sized, electrostatically machine learning, imaging, geophysical data analysis, radar, actuated micro-mirrors, and each mirror rotates about a hinge remote sensing, data streaming, quantum state tomography, and can swing between two stages +10o or −10o. The state and so on. 
For instance the matrices mentioned previously, of each mirror depends on the bit loaded in the correspond- the Toeplitz matrices are quite suitable for communication ing position of the programmable sensing matrix, and many channel estimation [22]; random demodulators are designed structured sensing matrices may be implemented in this sce- for sampling of sparse wideband analog signals [24,52]; ran- nario. People have tested that the system works well when dom convolution matrices can be exploited in radar imag- matrix entries are drawn randomly from a fast transform such ing [25]; also, the validity of SRMs has been verified in image as a Walsh Hadamard transform [60]. Many advanced imag- processing [27, 28]; convolutional matrices using sequences ing hardware architectures based on the single pixel camera have widely applications in communication and signal pro- model have been developed after these techniques mature, cessing [34,35,53,54]. Apart from these works, here we sim- e.g. in terahertz imaging [61, 62]. With regard to real ap- ply introduce two celebrated applications of CS, in medical plications, actually it is not trivial to decide which strategy imaging and single pixel camera. or structured sensing matrices we shall use. Because differ- A promising application for compressed sensing is in re- ent matrices have their own features and performances, we ducing the sampling rate in magnetic resonance imaging have to investigate the practical scenarios and make the trade- (MRI) [55–57]. The main motivation of CS MRI is that, off between number of measurements M, universality or not, Kezhi LI et al. State of the Art and Prospects of Structured Sensing Matrices in Compressed Sensing 10

Fig. 5 (a) Illustration of the domains and operators used in [57] as well as the requirements of CS: sparsity in the transform domain, incoherence of the undersampling artifacts, and the need for nonlinear reconstruction that enforces sparsity. (b) Diagram of the single pixel camera. The image x is reflected off a digital micro-mirror device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern supplied by the random number generator (RNG) [58].
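As a rough illustration of the single pixel camera measurement in Fig. 5 (b), the following Python sketch simulates a DMD loaded with randomly selected rows of a Walsh-Hadamard matrix. It is only a sketch under stated assumptions, not the implementation of [58] or [60]: the function names `walsh_hadamard` and `single_pixel_measure`, the sizes N and M, the mapping of ±1 entries to mirror bits 1/0, and the extra all-mirrors-on reference reading are all hypothetical choices made for the example.

```python
import numpy as np

def walsh_hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def single_pixel_measure(x, rows, H):
    """Simulate photodiode readings for the selected Hadamard rows.

    Assumption: a mirror can only pass (1) or reject (0) light, so each
    +1/-1 row is loaded as a 0/1 pattern and corrected afterwards.
    """
    patterns = (H[rows] + 1) // 2       # map +1/-1 entries to mirror bits 1/0
    y_dmd = patterns @ x                # light collected with each 0/1 pattern
    total = x.sum()                     # assumed reading with all mirrors "on"
    return patterns, 2 * y_dmd - total  # equivalent +1/-1 measurements

rng = np.random.default_rng(0)
N, M = 64, 16                           # signal length and number of measurements
x = np.zeros(N)                         # a 4-sparse test "image", flattened
x[rng.choice(N, size=4, replace=False)] = rng.uniform(0.5, 1.0, size=4)

H = walsh_hadamard(N)
rows = rng.choice(N, size=M, replace=False)  # random row subsampling
patterns, y = single_pixel_measure(x, rows, H)
```

The corrected readings `y` coincide with the product of the subsampled Walsh-Hadamard matrix and `x`, so any standard CS recovery algorithm could then be applied to `y`; the recovery step is deliberately left out of this sketch.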

hardware constraints, computation, and so on.

5 Prospects and Future Works

As the key research area of the encoding part of compressed sensing theory, the study of structured sensing matrices is important and has attracted more and more attention in recent years. Although many structured sensing matrices have been proposed in the literature, matrices with special structures are still badly needed for special settings or hardware requirements.

Generally speaking, the future development of sensing matrices will focus on two aspects. The first is to use less randomness and less memory. For instance, compared with fully random matrices, sparser sensing matrices with certain structures have been and will be exploited to reduce the computation in CS [63]. The structure of a network can also be embodied in a matrix through a one-to-one correspondence with an expander graph [64]. The second aspect is to design sensing matrices that satisfy structures arising in practice. This will be the main motivation for developing more structured sensing matrices. Several examples illustrate this. In a communication system, the convolution process is equivalent to a Toeplitz matrix consisting of the transmitted signals multiplying the system impulse response function. That is why Toeplitz CS matrices can be utilized in sparse channel estimation [22]. In [52] the authors proposed a practical sampling system called the modulated wideband converter (MWC), which adopts periodic waveforms, a low-pass filter and a low-rate sampler. They proved that perfect recovery of multi-bandlimited signals from the proposed samples can be achieved under certain necessary and sufficient conditions. Mathematically, the sampling process can be reformulated with a structured sensing matrix, $y = SF\bar{D}$, where $S$, $F$ and $\bar{D}$ represent the sign pattern matrix, the reordered Fourier matrix and a diagonal matrix, respectively. This matrix structure comes from the hardware design, and it performs well in practice [52]. In addition, structured sensing matrices have also been used in statistical physics, such as the seeding matrix with a coupled block-diagonal structure. This work was proposed in [65] within a framework named seeded compressed sensing. Krzakala et al. showed that in their model the experimental recovery results approached the theoretical limit for large systems. To sum up, people will continue to pursue various structured sensing matrices with less randomness, fewer measurements, better performance and hardware-friendly properties, incorporating knowledge from