
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 19, OCTOBER 1, 2017

Structured Sparse Representation With Union of Data-Driven Linear and Multilinear Subspaces Model for Compressive Video Sampling

Yong Li, Student Member, IEEE, Wenrui Dai, Member, IEEE, Junni Zou, Member, IEEE, Hongkai Xiong, Senior Member, IEEE, and Yuan F. Zheng, Fellow, IEEE

Abstract—The standard compressive sensing (CS) theory can be improved for robust recovery with fewer measurements under the assumption that signals lie on a union of subspaces (UoS). However, the UoS model is restricted to specific types of signal regularities with predetermined topology for subspaces. This paper proposes a generalized model which adaptively decomposes signals into a union of data-driven subspaces (UoDS) for structured sparse representation. The proposed UoDS model leverages subspace clustering to derive the optimal structures and bases for the subspaces conditioned on the sample signals. For multidimensional signals with various statistics, it supports linear and multilinear subspace learning for compressive sampling. As an improvement over the generic CS model, the basis which represents the sparsity of sample signals is adaptively generated via a linear subspace learning method. Furthermore, a generalized model with multilinear subspace learning is considered for CS to avoid vectorization of high-dimensional signals. In comparison to UoS, the UoDS model requires fewer degrees of freedom for a desirable recovery quality. Experimental results demonstrate that the proposed model for video sampling is promising and applicable.

Index Terms—Structured sparsity, compressive video sampling, union of data-driven subspaces, subspace clustering.

Manuscript received November 16, 2016; revised May 4, 2017; accepted June 10, 2017. Date of publication June 29, 2017; date of current version July 24, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xin Wang. The work was supported in part by the National Natural Science Foundation of China under Grants 61501294, 61622112, 61529101, 61472234, and 61425011, in part by China Postdoctoral Science Foundation 2015M581617, and in part by Program of Shanghai Academic Research Leader 17XD1401900. (Corresponding author: Hongkai Xiong.)

Y. Li and H. Xiong are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]). W. Dai is with the Department of Biomedical Informatics, University of California, San Diego, CA 92093 USA (e-mail: [email protected]). J. Zou is with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]). Y. F. Zheng is with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2017.2721905

I. INTRODUCTION

SPARSITY is of wide concern in many areas such as statistics, machine learning and signal processing. A vector admits a sparse representation over a basis (or dictionary) if it can be represented by a linear combination of a few column vectors from the dictionary. In a certain sense, accompanied by a corresponding dictionary, sparsity has a close connection with the concept of the subspace spanned by those few column vectors. Recently, it has been proved that a trained dictionary is more effective than a predefined one (e.g., wavelets [1]) for many tasks such as denoising [2]–[4], compressive sensing [5], [18] and classification tasks [6], [7]. A dictionary directly learned from the input signals is shown to provide more adaptivity and sparsity than off-the-shelf bases.

As an application of sparsity, compressive sensing (CS) is a desirable framework for signal acquisition [8]–[10]. CS attempts to acquire the dictionary-based sparse representation for unknown signals by randomly projecting them onto a space (observation) with much lower dimension. Recently, CS has been widely adopted in video acquisition and reconstruction [11]–[18]. In comparison to conventional video acquisition and compression approaches, CS can relieve the burden of the video encoder by reducing the number of measurements to be sampled. At the decoder side, its recovery quality can be guaranteed with effective reconstruction methods based on a sparse representation with a certain basis. Wakin et al. [11] first applied CS to video acquisition, jointly performing compressive sampling and reconstruction for video with a 3-D wavelet transform. Recognizing that compressive sampling was inadequate for the entire frame, a block-based CS (BCS) method [12] utilized CS for non-overlapping blocks to exploit the local sparsity within the DCT domain. Later, temporal correlations were exploited to improve BCS. The distributed BCS framework [13] approximated each block by a linear combination of blocks in previous frames. Liu et al. [14] developed an adaptive CS strategy for blocks in various regions of independent motion and texture complexity. The BCS-SPL framework [15] performed smooth projected Landweber (SPL) reconstruction by incorporating motion estimation and compensation. Chen et al. [16] combined multi-hypothesis prediction with BCS-SPL to enhance the reconstruction performance. The approximate message passing (AMP) reconstruction based on the dual-tree complex wavelet transform was adopted in [17]. Liu et al. [18] improved the recovery performance of the BCS framework by introducing a Karhunen-Loève transform (KLT) basis in the decoder. To overcome the deficiency of sampling high-order signals, multidimensional CS techniques [28]–[32] have been developed for practical implementations. However, these methods assume that signals live in a single vector or tensor space, which ignores the structures within the sparse coefficients.

1053-587X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

To capture the underlying structures for recovery, the union of subspaces (UoS) model [19]–[21] has been established to significantly reduce the number of measurements. In [19], a general sampling framework was studied, where sampled signals lived in a known union of subspaces with linear operators. Under the notion of the block restricted isometry property (RIP), robust block-sparse recovery with a mixed ℓ2/ℓ1 programming was developed for the UoS model in [21]. Blumensath [20] demonstrated that the projected Landweber algorithm could recover signals from a UoS for all Hilbert spaces and unions of such subspaces, as long as the sampling operator satisfied a bi-Lipschitz embedding condition. The UoS model exploits structured sparsity like tree-sparsity and block-sparsity [22], [23] to span subspaces with DCT and wavelet bases. However, it is too restrictive for signals with varying signal regularities, especially video sequences.

Furthermore, multilinear (tensor) subspace learning (MSL [33]–[35], [37]) was adopted under the assumption that high-dimensional signals live in a union of tensor subspaces, due to the fact that a predetermined vectorization of tensor data can obscure the statistical correlations among samples and discard important structural information. Tensor subspace analysis (TSA [33]) detected the intrinsic local geometric structures of the tensor space to generate a representation for images. In the meantime, generalized low-rank approximations of matrices (GLRAM [34]) iteratively performed bi-directional linear projections for feature extraction of images, which outperformed traditional SVD-based methods. For efficient tensor subspace representation, the multilinear principal component analysis (MPCA [35]) framework suggested feature extraction with dimensionality reduction to model a major part of the input signals. Recently, L1-norm-based tensor analysis (TPCA-L1 [37]) replaced the Frobenius and L2 norms with the L1 norm to formulate tensor analysis with robustness to outliers. Unfortunately, their performance for video sequences is degraded, as they fail to consider the temporal correlations.

The proposed compressive tensor sensing (CTS) scheme adaptively derives the basis for each tensor subspace with multilinear subspace learning (MSL). Moreover, it relieves the large-scale sensing matrices caused by vectorization of high-dimensional signals and preserves the intrinsic structures of the original signals. Therefore, CTS is efficient and practical for sampling high-dimensional signals in comparison to previous schemes in the literature.

The rest of the paper is organized as follows. Section II provides the background and motivation of UoDS. Section III proposes the UoDS model for compressive video sampling, including subspace clustering, linear subspace learning and stable recovery. Furthermore, the UoDS model is generalized to multilinear subspaces in Section IV. Section V provides the experimental results for validation in terms of SR-PSNR performance and visual quality. Finally, Section VI concludes this paper.

II. DEFINITION AND MOTIVATION

To motivate the proposed UoDS model, we begin with a review of compressive sensing models, including both the single subspace model and the union of subspaces model.

A. Single Subspace Model

Given an orthonormal basis Ψ = {ψ_i}, an n-dimensional signal x ∈ R^n has a sparse representation in the form of x = Ψc, where c is the representation vector with k nonzero components. This fact implies that x lives in a k-dimensional single subspace spanned by the k basis vectors corresponding to the nonzero components in Ψ. Under the assumption of simple sparsity, this single subspace model is widely adopted in traditional compressive sampling. Thus, the measurement y ∈ R^m is obtained by linear sampling y = Φx based on a sensing matrix Φ ∈ R^{m×n} with k < m ≪ n.
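The single subspace model above can be sketched in a few lines of numpy. This is an illustrative toy (random orthonormal basis and Gaussian Φ, not the paper's learned bases); it shows that, when the k-dimensional subspace (the support of c) is known, m ≥ k generic measurements invert the sampling by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 16, 4                      # signal dim, measurements, sparsity

# Orthonormal basis Psi (random for illustration)
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

# k-sparse representation c -> signal x = Psi c
c = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
c[support] = rng.standard_normal(k)
x = Psi @ c

# Linear sampling y = Phi x with an i.i.d. Gaussian sensing matrix
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x

# Oracle recovery: with the active k-dimensional subspace known,
# least squares on the m x k matrix Phi Psi_S recovers x exactly
A = Phi @ Psi[:, support]
c_hat = np.zeros(n)
c_hat[support] = np.linalg.lstsq(A, y, rcond=None)[0]
x_hat = Psi @ c_hat
print(np.allclose(x_hat, x, atol=1e-6))  # -> True
```

Without the oracle support, recovery requires an ℓ1 (or structured) program, which is the gap the UoDS model addresses.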

Fig. 1. The comparison of the single subspace model and the union of subspaces model. As shown in this figure, basic (simple) sparsity can be induced from a data-driven basis based on the single subspace model, while, based on the union of subspaces model, block (group) sparsity can be induced from a predefined basis.

C. Motivation

High-dimensional signals are characterized by various types of signal regularities. Taking video sequences for example, their spatial correlations within a frame are represented by regular textures and structures, while the temporal correlations are characterized by the motion trajectory of contents among neighboring frames. Consequently, most standard compressive video sampling methods treat the processing of video as the processing of a set of patches with the single subspace model.

Instead of assuming that high-dimensional signals lie on a single subspace, their sparse representation can be improved by considering intrinsic structures among representation vectors. Under the independence assumption for c, the single subspace model obscures the dependencies and structures within the representation vector c. Thus, it cannot utilize the various types of prior knowledge for high-dimensional signals to make a sparse representation and stable recovery.

As an alternative, the UoS model projects such structures onto a union of low-dimensional linear subspaces while still preserving their information. It improves the applicability and efficiency of the single subspace model, especially for signals with noise and in the presence of under-sampling. However, it is still rigid in generating the geometry of a union of subspaces and describing the structure of each subspace spanned by a predefined basis. Thus, we propose a generalized model which adaptively endows structures and dependencies to sample high-dimensional signals with non-stationary statistics, especially for patch-based video sampling.

III. UNION OF DATA-DRIVEN SUBSPACES (UODS) MODEL

A. The Proposed Framework

Fig. 2 depicts the proposed framework of compressive video sampling with the UoDS model. For efficient sampling and stable recovery, a video with groups of pictures (GOPs) is decomposed into the selected reference frames (RFs) and the remaining non-reference frames (NRFs). RFs are fully sampled and recovered with a trivial loss of quality. Therefore, they are suitable for generating a union of data-driven subspaces U∗ to recover NRFs with learned structural information. In Section III-B, the UoDS model incorporates subspace clustering to adaptively group patches based on their spatio-temporal structures. The overcomplete basis Ψ∗ = {Ψ∗[i]} for recovery is adaptively derived for each subspace with PCA-based learning methods, as shown in Section III-C. On the other hand, NRFs are sparsely represented with a low sampling rate. For stable recovery, NRFs are reconstructed based on RFs to maintain the spatio-temporal consistency in video sequences. Hence, the trained overcomplete basis Ψ∗ is adopted to utilize the dependencies and structures within the representation vectors c. For each reconstructed patch x∗, the block-sparse vector c∗ is recovered by optimizing Eq. (15) in Section III-D. Consequently, x∗ can be obtained based on the subspace consisting of patches with similar structures in U∗. A detailed description of the proposed UoDS model can be found in Algorithm 1 in Section III-D, which considers both sampling and reconstruction for the UoDS model.

B. Subspace Clustering for UoDS

Denote X = [x_1, ···, x_K] the set of K vectorized patches extracted from RFs. In this section, we incorporate sparse subspace clustering (SSC [26]) to segment X into t clusters that naturally correspond to the underlying subspaces. SSC is suitable for training-based recovery, as it is rooted in the fact that each point in the union of subspaces has a sparse representation based on a trained dictionary constrained by the self-expressiveness property over the training set. Consequently, the representation vector c_i of the vectorized patch x_i ∈ R^n can be obtained over X:

    min_{c_i} ||c_i||_1   s.t.   x_i = X c_i,  c_i[i] = 0.   (4)

Here, the extracted patches are overlapped to guarantee the smoothness of reconstructed frames. The non-zero coefficients of c_i correspond to patches from the same subspace. Fig. 3 provides an example for the generation of the representation vector, where c_122 is obtained over X with K = 1505. It shows that the structure of x_122 is similar to those of x_34, x_78, x_166, and x_210, which implies that they lie on the same subspace. Eq. (4) can be rewritten in matrix form by collecting all the patches in X:

    min_C ||C||_1   s.t.   X = XC,  diag(C) = 0,   (5)

where C ≜ [c_1, ···, c_K] ∈ R^{K×K} is the matrix of coefficients used to generate the balanced similarity graph.

For a valid representation of the similarity, a weighted balanced similarity graph G = (V, E, W) is established based on the normalized columns c_i/||c_i||_∞ of C, where V is the set of vertices corresponding to the K vectorized patches x_1, ···, x_K, and E is the set of directed edges (v_i, v_j) indicating that x_j is one of the vectorized patches in the sparse representation of x_i. Taking Fig. 3 for example, since x_122 = 0.39 x_78 + 0.22 x_166 + ···, there exist two edges (v_122, v_78) and (v_122, v_166) with weights c_122[78] = 0.39 and c_122[166] = 0.22, respectively. To ensure that G is balanced, we adopt the adjacency matrix W = |C| + |C|^T ∈ R^{K×K} with elements w_ij = |c_i[j]| + |c_j[i]|. As shown in [24], each connected component in G collects the vertices representing the vectorized patches in the same subspace. Fig. 4 provides an example of the similarity graph with four connected components for the UoDS model, showing that vectorized patches in one connected component have similar structures.
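The self-expressiveness program of Eqs. (4)–(5) needs a convex solver in exact form; the following numpy sketch substitutes a Lasso relaxation solved by ISTA (an assumption for illustration, not the paper's solver) on toy points drawn from two lines in R^3, and then forms the balanced adjacency W = |C| + |C|^T:

```python
import numpy as np

def ssc_coefficients(X, lam=0.05, n_iter=500):
    """Lasso relaxation of Eq. (5): min 0.5||X - XC||_F^2 + lam*||C||_1
    s.t. diag(C) = 0, solved by ISTA (proximal gradient)."""
    n, K = X.shape
    C = np.zeros((K, K))
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ C - X)
        C = C - grad / L
        C = np.sign(C) * np.maximum(np.abs(C) - lam / L, 0.0)  # soft threshold
        np.fill_diagonal(C, 0.0)           # enforce c_i[i] = 0
    return C

# Toy data: 20 points from two 1-D subspaces (lines) in R^3
rng = np.random.default_rng(1)
u, v = rng.standard_normal(3), rng.standard_normal(3)
X = np.hstack([np.outer(u, rng.standard_normal(10)),
               np.outer(v, rng.standard_normal(10))])

C = ssc_coefficients(X)
W = np.abs(C) + np.abs(C).T                # w_ij = |c_i[j]| + |c_j[i]|
```

On this toy set, the within-line entries of W dominate the cross-line entries, so the two underlying subspaces appear as two (near-)disconnected components of G.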

Fig. 2. The proposed union of data-driven subspaces (UoDS) model for compressive video sampling.

Algorithm 1: UoDS model for compressive video sampling and reconstruction.
  Task: Sample and reconstruct a video with each group of N frames {X_i}, i = 1, ···, N.
  Initialization: Generate a random Gaussian matrix Φ ∈ R^{m×n}; set X_1 as the RF and X_2, ···, X_N as NRFs.
  Sampling: Fully sample the non-overlapped blocks x_1 in X_1, and compressively sample the blocks in X_2, ···, X_N with Eq. (13).
  Reconstruction:
  Step 1: Reconstruct each block x̂_1 to obtain the recovered RF X̂_1.
  Step 2: Apply SSC to the patch set X of X̂_1 and cluster X into t clusters X[1], ···, X[t].
  Step 3: for j = 1 to t do
            Obtain Ψ∗_j for S∗_j by LSL over X[j] with Eq. (8) or (9);
          end for
  Step 4: Concatenate the Ψ∗_j to form the dictionary Ψ∗ = [Ψ∗_1, ···, Ψ∗_t].
  Step 5: for i = 2 to N do
            Recover the block-sparse vector c∗ by Eq. (15) and reconstruct each non-overlapped block of X_i by x∗ = Ψ∗c∗; then assemble the blocks to form the recovered NRF X̂_i.
          end for
  Return X̂_1, ···, X̂_N.

Fig. 3. The representation vector c_122 of a 16 × 16 patch x_122 over a training set X of K = 1505 patches extracted from the first frame of the Foreman sequence.

Fig. 4. The balanced similarity graph G with four connected components, where each subspace in U∗ forms one component in G.

Denote L = D − W the Laplacian matrix of W, where D is a diagonal matrix with D_ii = Σ_j w_ij. We leverage spectral graph theory to segment X by applying K-means to the eigenvectors of the Laplacian matrix L. The connected components of G can be determined from the eigenspace of the zero eigenvalue. Based on the t eigenvectors corresponding to the t smallest eigenvalues, X is segmented into t clusters {X[i]}_{i=1}^t with K-means. For X drawn from the underlying subspaces, each cluster corresponds to a low-dimensional linear subspace S∗_i, as the similarity graph G has t connected components corresponding to these subspaces. Therefore, U∗ is a union of low-dimensional linear subspaces obtained by collecting all t clusters S∗_i:

    x ∈ U∗ = ∪_{i=1}^t S∗_i.   (6)

Remarkably, the number of underlying subspaces can be estimated as the number of zero eigenvalues of L. Thus, an adaptive clustering of vectorized patches can be achieved, even though the number of their underlying subspaces is unknown.

Fig. 5. Subspace clustering for the UoDS model. Vectorized patches are segmented into three low-dimensional linear subspaces, where S∗_1 is a bi-dimensional plane and the dimensions of S∗_2 and S∗_3 are one. For example, blue points like x_1 and x_2 live in S∗_1 spanned by basis Ψ∗_1, and green and red points like x_3 and x_4 live in S∗_2 spanned by basis Ψ∗_2 and S∗_3 spanned by basis Ψ∗_3, respectively.

Fig. 5 shows a conceptual diagram for clustering the vectorized patches into three low-dimensional linear subspaces. We first solve Eq. (4) for each vectorized patch x_i and thus solve Eq. (5) for X. Then, we derive a balanced similarity graph G as in Fig. 4 but with three underlying connected components. Finally, we apply the K-means algorithm with K = 3 to the eigenvectors of the Laplacian matrix L of G and obtain the final clustering result shown in Fig. 5: blue points x_1 and x_2 are clustered together into the common underlying data-driven subspace S∗_1 spanned by basis Ψ∗_1 with two basis vectors, while green and red points like x_3 and x_4 fall into S∗_2 spanned by Ψ∗_2 with one basis vector and S∗_3 spanned by Ψ∗_3 with one basis vector, respectively. Because the texture complexity of patches x_1 and x_2 is larger than that of x_3 and x_4, the dimension of S∗_1 is larger than those of S∗_2 and S∗_3.

1) Subspace Clustering Comparison: We compared SSC with another popular subspace clustering method, low-rank representation (LRR) [25]. Different from SSC, the nuclear norm ||C||_∗ (sum of singular values) is used to minimize the rank of C in LRR:

    min_C ||C||_∗   s.t.   X = XC,  diag(C) = 0.   (7)

For simplicity, we use UoDS-SSC and UoDS-LRR for the proposed UoDS incorporated with SSC and LRR, respectively.

Fig. 6. The performance comparison between UoDS-SSC and UoDS-LRR for the Football sequence with various numbers of subspaces t. UoDS-SSC and UoDS-LRR denote the proposed UoDS incorporated with SSC and LRR, respectively. SR denotes sampling rate.

Fig. 6 compares UoDS-SSC and UoDS-LRR in terms of recovery performance. It is obvious that UoDS-SSC outperforms UoDS-LRR under different numbers of subspaces. SSC derives a "cleaner" latent matrix C in comparison to LRR. Thus, SSC can yield more accurate clustering results when t is known, which better fits the signals with the derived union of data-driven subspaces U∗.

2) Number of Subspaces Discussion: Furthermore, we studied the effect of the number t of subspaces in the union U∗ on the accuracy of the UoDS model. As shown in Fig. 6, when the number of subspaces increases, the recovery accuracy increases rapidly at first and then slightly fluctuates around a certain maximum value. This is because t affects the distribution of the data-driven subspaces in U∗. When t is small, different underlying subspaces tend to merge into one cluster, which leads to a blend of structural information in these subspaces; such inaccurate clustering produces poor recovery accuracy. On the other hand, a large t makes these subspaces independent, as we assume. However, the clustering accuracy will be suppressed and the computational complexity will increase with the growth of t.
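The spectral step can be sketched as follows. On a toy adjacency with two components, the number of zero eigenvalues of L = D − W estimates t, and vertices can be labeled from their rows in the zero-eigenspace embedding (a minimal stand-in for the K-means step; the toy W is an assumption, not data from the paper):

```python
import numpy as np

# Toy balanced adjacency with two connected components (two subspaces)
W = np.zeros((6, 6))
W[np.ix_([0, 1, 2], [0, 1, 2])] = 0.5
W[np.ix_([3, 4, 5], [3, 4, 5])] = 0.5
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))         # degree matrix, D_ii = sum_j w_ij
L = D - W                          # graph Laplacian of Eq. (6)'s discussion

evals, evecs = np.linalg.eigh(L)   # ascending eigenvalues
t = int(np.sum(evals < 1e-10))     # zero eigenvalues = #connected components
print(t)                           # -> 2

# Embed each vertex with the t eigenvectors of the smallest eigenvalues;
# vertices in the same component share the same embedding row
U = evecs[:, :t]
labels = np.unique(np.round(U, 6), axis=0, return_inverse=True)[1]
```

The embedding rows are constant within each component, so grouping identical rows reproduces the clustering that K-means would find.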

Fig. 7. PCA-based learning for the bases of 10 clusters. Each cluster corresponds to a subspace and its dimension is fixed to 5. PCA is performed for each cluster separately.

Fig. 8. Representation vector c∗ obtained based on bases trained by PCA and PCA-L1. The dimension of each subspace is 10.

C. Linear Subspace Learning for UoDS

The basis for the UoDS model is derived based on the clustered training set X = {X[i]}_{i=1}^t. Principal component analysis (PCA) is leveraged for each cluster X[i] to learn the basis Ψ∗_i for the corresponding linear subspace S∗_i independently, as shown in Fig. 7. Therefore, Ψ∗ = [Ψ∗_1, Ψ∗_2, ..., Ψ∗_t] is the derived basis for U∗. This fact implies that Ψ∗_i ∈ R^{n×d_i} spans the d_i-dimensional subspace S∗_i. In comparison to the UoS model, the dimensions and bases of these subspaces are conditioned on the structures embodied by their patches. Fig. 5 shows that patches from plain backgrounds lie in a lower-dimensional subspace than patches from regions with dramatic changes of detail. Furthermore, Ψ∗ is nonlocal, as it is learned from the complete frame rather than a fixed local window of pixels.

To sufficiently capture the patch-based structures, Ψ∗ ∈ R^{n×r} is an overcomplete basis with r = Σ_{i=1}^t d_i > n. It is solved based on the singular value decomposition (SVD) X[i] = Ψ∗_i Σ V^T:

    Ψ∗_i = arg max_{Ψ_i} ||Ψ_i^T X[i]||_2,   s.t.   Ψ_i^T Ψ_i = I_{d_i}.   (8)

Instead of the classical PCA algorithm, we utilize the PCA-L1 algorithm [27] to obtain a subspace that is robust to outliers as well as invariant to rotations:

    Ψ∗_i = arg max_{Ψ_i} ||Ψ_i^T X[i]||_1 = arg max_{ψ_iq} Σ_{j=1}^{l_i} |ψ_iq^T x_ij|,   s.t.   Ψ_i^T Ψ_i = I,  q = 1, ..., d_i,   (9)

where q is the index of the vectors in Ψ∗_i = [w∗_i1, w∗_i2, ..., w∗_{id_i}], and l_i is the number of patches in X[i]. The (sub-)optimal solution to Ψ∗_i is obtained by a greedy algorithm. Fig. 8 shows the representation vectors of the same signal lying in the UoDS spanned by the PCA and PCA-L1 bases, respectively.

1) Subspace Dimension Selection: Since the dimensions of the linear subspaces depend on the structures of their patches, they should be determined adaptively and independently. In the UoDS model, we employ the average gradient to measure the texture complexity of the patches in each subspace, which is associated with the cost of representing structures like textures and edges. Linear subspaces are classified into three categories with refined candidate dimensions for various texture complexities. The classification is based on adaptive calculation of the average gradient g_i for X[i]:

    g_i = (1/K_i) Σ_{j=1}^{K_i} g(x_j),  x_j ∈ X[i],   (10)

where K_i is the number of patches in cluster X[i], and g(x_j) is the average gradient for patch x_j. Assuming that x is the vectorized version of patch F ∈ R^{a×a}, we can obtain g(x) by

    g(x) = (1/(a−1)²) Σ_{i=1}^{a−1} Σ_{j=1}^{a−1} sqrt( ((F_{i,j} − F_{i+1,j})² + (F_{i,j} − F_{i,j+1})²) / 2 ).   (11)

Therefore, the dimension of each subspace is selected according to the three categories:

    d_i = { d_h,  if g_i ≥ T_2;
            d_m,  if T_1 ≤ g_i < T_2;   (12)
            d_l,  if g_i < T_1, }

where T_1 and T_2 are related to the minimum and maximum of the average gradients.

D. Stable Recovery

Stable recovery is significant for patches with a sparse representation in NRFs. Linear sampling in the UoDS model is based on the block-diagonal measurement matrix A∗:

    y = Φx = ΦΨ∗c∗ = A∗c∗,   (13)

where Φ is an i.i.d. random sensing matrix, r = Σ_{i=1}^t d_i, and A∗ takes the block-diagonal form A∗ = diag(A∗_1, A∗_2, ..., A∗_t).

The UoDS model inherits the merit of the UoS model, which implies that input signals have a block-sparse representation over Ψ∗. When these subspaces are disjoint or independent, c∗ is a 1-block-sparse vector that is sparser than c in Eq. (3). Fig. 8 shows the sparsity of the proposed model for one selected block in c∗. For a complete evaluation, the uniqueness and stability conditions are derived in this section.

Denote S∗_ij = S∗_i ⊕ S∗_j the convex hull of a set of two data-driven subspaces S∗_i and S∗_j, and k_max = max_{i≠j} dim(S∗_ij) its maximum dimension. In Proposition 1, we demonstrate the invertibility of the linear sampling operator Φ: U∗ → R^m.

Proposition 1: The linear sampling operator Φ: U∗ → R^m is invertible for U∗ if m ≥ k_max.
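Sections III-C and III-D can be sketched end-to-end under simplifying assumptions (toy clusters lying exactly on random subspaces, and a per-block least-squares search in place of the SOCP of Eq. (15)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, t, m = 64, 6, 5, 16      # patch dim, subspace dim, #clusters, measurements

# --- C. Linear subspace learning: PCA basis per cluster (cf. Eq. (8)) ---
def learn_basis(patches, d):
    """d leading left singular vectors of the cluster's patch matrix."""
    U, _, _ = np.linalg.svd(patches, full_matrices=False)
    return U[:, :d]

# Toy clusters: 40 patches drawn exactly from t random d-dim subspaces of R^n
bases_true = [np.linalg.qr(rng.standard_normal((n, d)))[0] for _ in range(t)]
clusters = [B @ rng.standard_normal((d, 40)) for B in bases_true]
Psi = [learn_basis(Xi, d) for Xi in clusters]       # per-subspace bases Psi*_i

# --- D. Stable recovery: sample a 1-block-sparse signal (cf. Eq. (13)) ---
j = 2
x = Psi[j] @ rng.standard_normal(d)                 # x lives in subspace S*_j
Phi = rng.standard_normal((m, n)) / np.sqrt(m)      # i.i.d. Gaussian sensing
y = Phi @ x

# Exploit the block structure (a stand-in for the SOCP of Eq. (15)):
# least squares per candidate block, keep the smallest residual
def block_recover(y, Phi, Psi):
    best = None
    for i, B in enumerate(Psi):
        c, *_ = np.linalg.lstsq(Phi @ B, y, rcond=None)
        r = np.linalg.norm(y - Phi @ B @ c)
        if best is None or r < best[0]:
            best = (r, i, B @ c)
    return best[1], best[2]

i_hat, x_hat = block_recover(y, Phi, Psi)
print(i_hat == j, np.allclose(x_hat, x, atol=1e-6))
```

With m ≥ d, the active block fits y with zero residual while the others generically do not, which is the intuition behind Proposition 1's m ≥ k_max requirement.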

Proof: Please refer to Appendix A. □

Proposition 1 provides the minimal number of samples required to guarantee a stable reconstruction for the UoDS model. It suggests that k_max samples are required for an invertible representation based on the UoDS model. Substituting Φ and x with A∗ and the k-block-sparse vector c∗, we can draw Proposition 2.

Proposition 2: The measurement matrix A∗ is stable for every 2k-block-sparse vector u if and only if there exist C_1 > 0 and C_2 < ∞ such that

    C_1 ||u||²_2 ≤ ||A∗u||²_2 ≤ C_2 ||u||²_2,   (14)

where u = c∗_1 − c∗_2.

Proof: Please refer to Appendix A. □

Proposition 2 ensures the stable sampling and recovery of Φ in the UoDS model. In fact, it imposes the classical conditioning requirements on the Gram matrix A = (⟨ψ_i, φ_j⟩)_{1≤i≤m, 1≤j≤n} between the sets of vectors Φ = {φ_i}_{i=1}^m and Ψ = {ψ_j}_{j=1}^n. Thus, C_1 and C_2 are the tightest stability bounds, related to the minimum and maximum singular values of A, respectively.

Under the invertibility of linear sampling, convex optimization is considered for efficient exact recovery. Theorem 1 proves that there exists a unique block-sparse vector c∗ when A∗ satisfies the block-RIP condition with constant δ_{2k} < 1.

Theorem 1: If A∗ satisfies the block-RIP condition with δ_{2k} ≤ √2 − 1, the vector c∗ of Eq. (15) can be determined according to the convex second-order cone programming (SOCP) [21]:

    min_{c∗} ||c∗||_{2,I}   s.t.   y = A∗c∗,   (15)

where I represents the group index set.

Proof: Please refer to Appendix B. □

Theorem 1 implies that the unique solution to Eq. (15) exists as long as δ_{2k} is small enough. The SOCP formulation is desirable in comparison to standard CS results, as it considers the block structure of c∗ explicitly and adaptively. In practice, we reconstruct the block-sparse vector c∗ by solving Eq. (15) with the group-BP algorithm [39].

Fig. 9. Visual illustration of the n-mode unfolding of tensor X ∈ R^{I_1×I_2×I_3}; X_(n) is the n-mode unfolding matrix, n = 1, 2, 3.

IV. TENSOR GENERALIZATION FOR UODS

In Section III-C, PCA utilizes linear subspace learning (LSL) to deal with vectorized tensor data (e.g., two-order tensors like images). For standard compressive sampling, its efficiency is degraded by the large-scale sampling matrix generated by the vectorization of high-dimensional signals, e.g., images and videos. For example, it is less practical to store and transmit a sampling matrix of size 1000 × 10000 if we sample and reconstruct an image of size 100 × 100 at SR = 0.1. Besides, reshaping high-order tensors into vectors destroys the intrinsic structure and correlation in the original tensors, making the representation less compact or useful. Therefore, we generalize the model to compressive tensor sensing (CTS) by extending LSL to MSL. The proposed CTS algorithm directly samples the tensor (signal) in each of its modes under the assumption that the tensor lives in a lower-dimensional tensor subspace. The data-driven basis is obtained by using multilinear subspace learning (MSL).

A. Generalized Formulation

Given an arbitrary tensor X, it can be decomposed according to the standard Tucker decomposition:

    X = Θ ×_1 Ψ(1) ×_2 Ψ(2) × ··· ×_N Ψ(N),   (16)

where Θ is the tensor representation for X, Ψ(n) = (ψ(n)_1 ψ(n)_2 ··· ψ(n)_{I_n}) is an orthogonal I_n × I_n basis matrix, and ×_n denotes the n-mode product of a tensor by a matrix. Given the matrix Ψ, the n-mode product of X is defined as

    Θ = X ×_n Ψ  ⇔  Θ_(n) = Ψ X_(n),   (17)

where X_(n) is the n-mode unfolding matrix of tensor X. Fig. 9 illustrates the n-mode unfolding of tensor X with n ≤ 3.

MSL methods seek a tensor subspace that captures most of the variation in the original tensor objects and project X from the original tensor space R^{I_1} ⊗ R^{I_2} ⊗ ··· ⊗ R^{I_N} onto the tensor subspace S̃ = R^{P_1} ⊗ R^{P_2} ⊗ ··· ⊗ R^{P_N}, with P_n ≪ I_n. The projection is Θ̃ = X ×_1 Ψ̃(1)T ×_2 Ψ̃(2)T × ··· ×_N Ψ̃(N)T, where Ψ̃(n) is an I_n × P_n basis matrix for the n-mode linear space R^{I_n}. If a tensor X lies in a tensor subspace S̃, the vectors corresponding to its modes admit sparse representations over the bases of the corresponding modes. The data-driven orthonormal basis matrices {Ψ̃(n) ∈ R^{I_n×P_n}, n = 1, ..., N} can be learned based on the set of K available tensor objects {X_1, X_2, ..., X_K} via MPCA [35] by solving the following problem:

    {Ψ̃(n), n = 1, ..., N} = arg max_{Ψ̃(1),...,Ψ̃(N)} Σ_{i=1}^K ||Θ̃_i − Θ̄||²_F,   (18)

where Θ̃_i = X_i ×_1 Ψ̃(1)T ×_2 Ψ̃(2)T × ··· ×_N Ψ̃(N)T and Θ̄ = (1/K) Σ_{i=1}^K Θ̃_i. The n-mode basis matrix Ψ̃(n) consists of the P_n eigenvectors corresponding to the largest P_n eigenvalues of the matrix

    Λ(n) = Σ_{i=1}^K (X_{i(n)} − X̄_(n)) · Ψ̃_{Λ(n)} Ψ̃^T_{Λ(n)} · (X_{i(n)} − X̄_(n))^T,   (19)

where Ψ̃_{Λ(n)} = Ψ̃(1) ⊗ ··· ⊗ Ψ̃(n−1) ⊗ Ψ̃(n+1) ⊗ ··· ⊗ Ψ̃(N). The higher-order singular value decomposition (HOSVD) [36] is used in multilinear principal component analysis.

Later, we define the compressive tensor sampling and recovery for arbitrary signals.

Fig. 10. 1, 2, 3-mode compressive sampling by projecting tensor X ∈ R^{16×10×8} onto tensor Y ∈ R^{8×5×4}, where Y = X ×_1 Φ_1 ×_2 Φ_2 ×_3 Φ_3 with sampling matrices Φ_1, Φ_2, and Φ_3.

Fig. 11. 1, 2, 3-mode recovery of tensor X ∈ R^{16×10×8} from tensor Y ∈ R^{8×5×4}, where Y = X ×_1 Φ_1 ×_2 Φ_2 ×_3 Φ_3 with sampling matrices Φ_1, Φ_2, and Φ_3, and bases Ψ(1), Ψ(2), and Ψ(3) for the three modes.

Definition 1 (Compressive Tensor Sampling): Given an Nth-order tensor X ∈ R^{I_1×I_2×···×I_N} which is K_1-K_2-···-K_N-sparse, the compressive tensor sampling (CTS) is defined by

$$\mathcal{Y} = \hat{\Phi}\mathcal{X} = (\Phi_1 \otimes \Phi_2 \otimes \cdots \otimes \Phi_N)\mathcal{X} = \mathcal{X} \times_1 \Phi_1 \times_2 \Phi_2 \times \cdots \times_N \Phi_N, \tag{20}$$

where ⊗ is the Kronecker product, Y ∈ R^(M1×M2×···×MN) is the measurement, and Φi ∈ R^(Mi×Ii) is the sensing matrix for the i-mode linear space. Here, the k1, k2, k3 corresponding column vectors span the tensor subspace containing most of the variation of X1, ..., XK. Thus, X can be considered as a "k1-k2-k3-sparse" tensor with respect to the whole basis Ψ̂ = Ψ^(1) ⊗ Ψ^(2) ⊗ Ψ^(3).

There exist C1 > 0 and C2 < ∞ such that

$$C_1\|u\|_2^2 \le \|A^{(n)}u\|_2^2 \le C_2\|u\|_2^2. \tag{23}$$

Proof: Please refer to Appendix A. □

Definition 2 (n-Mode Recovery): Provided that each column vector θ of Θ^(n) is sparse, θ can be recovered by

$$\min \|\theta\|_1, \quad \text{s.t.} \quad y_{(n)} = \Phi_n \Psi^{(n)} \theta = A^{(n)} \theta. \tag{24}$$

Here A^(n) satisfies the RIP condition with δ2k ≤ √2 − 1, and y(n) denotes each corresponding column vector of the n-mode unfolding matrix Y(n). Fig. 11 illustrates an example of n-mode recovery.

Subsequently, each of the NRF cubes X can be recovered for each mode according to (24). The proposed algorithm can alleviate the computational and storage burden in sampling and recovery because of the small-scale sampling matrix utilized for each mode. Moreover, in comparison to existing works [28]–[32], it makes a compressible representation with a sparse tensor based on a data-driven basis, so that it can capture the non-stationarity in high-dimensional signals.

C. Generalized Model for Video Sampling and Recovery

A detailed description of the proposed tensor sampling and recovery framework is given in Algorithm 2. Taking video sequences for example, we decompose each GOP into a set of RFs…

Algorithm 2: Framework of CTS based on a tensor subspace.
Task: Sample and reconstruct the tensor X ∈ R^(I1×I2×···×IN).
Input: Tensor X, training set {X1, ..., XK}, random Gaussian sampling matrices Φ1, ..., ΦN.
1: Sampling: …

V. EXPERIMENTAL RESULTS

A. Implementation

In this subsection, the UoDS model is employed on a variety of video sequences with multiple resolutions, including CIF (352 × 288), DVD (720 × 480), and 1080p (1920 × 1080).
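The mode-wise sampling in (20) can be sketched in a few lines of NumPy; the `unfold`/`fold` helpers and the tensor sizes (taken from Fig. 10) are illustrative, not the authors' implementation:

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: bring axis n to the front and flatten the rest."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold() for a target tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, Phi, n):
    """X x_n Phi, computed through the unfolding identity Y_(n) = Phi @ X_(n)."""
    shape = list(X.shape)
    shape[n] = Phi.shape[0]
    return fold(Phi @ unfold(X, n), n, shape)

# Sample a 16 x 10 x 8 tensor down to an 8 x 5 x 4 measurement tensor,
# mirroring Fig. 10 (random Phis stand in for the i.i.d. Gaussian
# sensing matrices of the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 10, 8))
Phi1, Phi2, Phi3 = (rng.standard_normal(s) for s in [(8, 16), (5, 10), (4, 8)])
Y = mode_n_product(mode_n_product(mode_n_product(X, Phi1, 0), Phi2, 1), Phi3, 2)
print(Y.shape)  # (8, 5, 4)
```

Storing three small Φi instead of one Kronecker matrix Φ1 ⊗ Φ2 ⊗ Φ3 is exactly the storage saving the section refers to.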

Fig. 12. Sampling-rate-distortion curves for the proposed UoDS-PCA and UoDS-MH-PCA, BCS-DCT, BCS-KLT, MC-BCS-SPL, MH-BCS-SPL, and UoS-BCS-DCT under 8 × 8 patches (n = 64) and dimension d = 6, respectively.

Fig. 13. Sampling-rate-distortion curves for the proposed UoDS-PCA and UoDS-MH-PCA, BCS-DCT, BCS-KLT, MC-BCS-SPL, MH-BCS-SPL, and UoS-BCS-DCT under 16 × 16 patches (n = 256) and dimension d = 10, respectively.

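The CIF training-set size quoted below (1505 overlapping blocks for 16 × 16 blocks with step 8) can be reproduced with a two-line sliding-window count; the helper below is illustrative and assumes blocks are placed only where they fit entirely inside the frame:

```python
def num_blocks(width, height, block, step):
    """Count overlapping block positions along each axis and multiply."""
    return ((width - block) // step + 1) * ((height - block) // step + 1)

# CIF frame (352 x 288), 16x16 blocks, step sqrt(n)/2 = 8 -> 43 x 35 positions
print(num_blocks(352, 288, 16, 8))  # 1505
```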
Commonly, the size of each non-overlapping block is 8 × 8 and 16 × 16, thereby n = 64 and 256, respectively. As an exception, the block size for Park Scene is fixed to 32 × 32. The sampling matrix Φ ∈ R^(m×n) is an i.i.d. Gaussian random matrix with zero mean and unit variance. The sampling rate SR ∈ {0.1, 0.2, ..., 0.6}. The maximal value of SR is set to 0.6, as it can already guarantee perfect recovery. Each GOP contains 10 grayscale frames. Without loss of generality, the first frame of each GOP is set as the RF, while the remaining nine frames are NRFs. The training set of patches for SSC¹ consists of overlapping blocks with a step of √n/2 pixels along both rows and columns. For example, the training set for CIF sequences contains 1505 blocks when the block size is 16 × 16 and the step is 8. Thus, the representation matrix C ∈ R^(1505×1505). In SSC, 50 clusters are adopted for spectral clustering of vectorized patches from RFs.

The basis of UoDS is learned for each cluster by PCA and PCA-L1, respectively. The dimension of each subspace di is selected from {2, 4, 6, 8, 10, 20}. The proposed models (UoDS-PCA and UoDS-PCAL1) are compared with five state-of-the-art sampling methods, including one UoS model, UoS-BCS-DCT, and four BCS-based methods, i.e., BCS-DCT [12], BCS-KLT [18], BCS-MC-SPL [15], and BCS-MH-SPL [16]. Here, the BCS-based methods are single-subspace models, with an adaptive KLT basis for BCS-KLT and predefined bases for BCS-DCT, BCS-MC-SPL, and BCS-MH-SPL. To be concrete, BCS-MC-SPL incorporates motion compensation and BCS-MH-SPL utilizes multihypothesis prediction. For a fair comparison, we also incorporate multihypothesis prediction into improved versions of the UoDS model, MH-UoDS-PCA and MH-UoDS-PCAL1. The proposed models are implemented with the SPGL1 Matlab solver [39] under the configuration of a 3.2-GHz CPU and 12-GB RAM.

¹Available at http://www.cis.jhu.edu/∼ehsan/

B. Results for Linear Subspaces

In this section, the UoDS model is evaluated for the union of data-driven linear subspaces. Figs. 12 and 13 show the sampling-rate-distortion curves obtained under various sampling rates and block sizes for video sequences of various resolutions. We can see that the proposed UoDS model with multihypothesis prediction is competitive with the BCS-based methods and the UoS model in the low-SR region, and outperforms them when the sampling rate is high. The UoDS model can be sampled for a desirable recovery quality under a high sampling rate, as the dependencies between RFs and NRFs can be exactly learned. Table I summarizes the results for six video sequences under various configurations of block sizes, dimensions, and sampling rates, where the proposed model is shown to perform better than the benchmark methods in most cases. This implies that the UoDS model can adaptively deal with structures with varying signal singularities to significantly decrease the necessary measurements for recovery in comparison to the single-subspace and UoS models. Furthermore, the proposed model is suitable for a wide range of resolutions. Fig. 14 provides the comparative results for the Park Scene sequence with a resolution of 1920 × 1080, where UoDS-MH-PCA still maintains its sampling and recovery performance.

Figs. 15–17 present the reconstructed fourth NRF in the Football, Whale Show, and Park Scene sequences obtained by the proposed model, the BCS-based models, and the UoS model, respectively. These reconstructed frames are obtained under sufficient sampling and proper block sizes to contain local structures, i.e., n = 256, SR = 0.6, d = 10 for CIF and DVD

TABLE I AVERAGE PSNR (dB) FOR RECONSTRUCTED VIDEO SEQUENCES WITH VARIOUS BLOCK SIZES n AND DIMENSIONS d

Fig. 14. Sampling-rate-distortion curves for the Park Scene sequence obtained by the proposed UoDS-PCA and UoDS-MH-PCA, BCS-DCT, BCS-KLT, BCS-MC-SPL, BCS-MH-SPL, and UoS-BCS-DCT under 32 × 32 blocks (n = 1024) and dimension d = 40, respectively.

sequences, and n = 1024, SR = 0.6, d = 40 for the 1080p sequence. Overall, the proposed model achieves better visual quality in comparison to the BCS-based models and the UoS model, especially in texture regions like "grass", "water", and "tree" in the three video sequences. PSNR performance of the reconstructed frames is also provided for validation. The proposed model is shown to achieve the best visual quality under this benchmark. These results are consistent with the sampling and recovery performance, which validates the UoDS model.

Moreover, Table I shows that 16 × 16 patches can achieve better recovery performance in comparison to 8 × 8 ones. It means that c* can be exactly reconstructed with a high probability when n grows. The dimension of each subspace d also affects the performance of the UoDS model, as shown in Fig. 18. In the region of low SR, the recovery performance for small d (e.g., d = 2 at SR = 0.1) is superior to that for large d (e.g., d = 20 at SR = 0.1). Since reconstructed RFs are decomposed into 50 clusters, we can obtain from Eq. (13) that m = 6, r = 100 for t = 50, d = 2, SR = 0.1, n = 64, while

Fig. 15. Reconstructed frames for the Football sequence under n = 256, SR = 0.6 and d = 10. (a) the 4th NRF; (b) BCS-DCT; (c) BCS-KLT; (d) BCS-MC-SPL; (e) BCS-MH-SPL; (f) UoS-BCS-DCT; (g) UoDS-PCA; (h) UoDS-MH-PCA.
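The (m, r) values quoted in the surrounding discussion (e.g., m = 6, r = 100 for t = 50, d = 2, SR = 0.1, n = 64) are consistent with m = ⌊SR · n⌋ measurements per block and r = t · d total basis vectors; the floor rounding is an assumption that happens to match the quoted numbers:

```python
import math

def measurements_and_redundancy(SR, n, t, d):
    """m: measurements per n-pixel block at sampling rate SR; r: total basis size
    over t clusters of dimension d."""
    return math.floor(SR * n), t * d

print(measurements_and_redundancy(0.1, 64, 50, 2))     # (6, 100)
print(measurements_and_redundancy(0.6, 1024, 50, 20))  # (614, 1000)
print(measurements_and_redundancy(0.6, 64, 50, 20))    # (38, 1000)
```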

Fig. 16. Reconstructed frames for the Whale Show sequence under n = 256, SR = 0.6 and d = 10. (a) the 4th NRF; (b) BCS-DCT; (c) BCS-KLT; (d) BCS-MC-SPL; (e) BCS-MH-SPL; (f) UoS-BCS-DCT; (g) UoDS-PCA; (h) UoDS-MH-PCA.

m = 6, r = 200 for d = 4, SR = 0.1, n = 64. Therefore, c* is more accurate for lower dimensions. However, Fig. 18(b) and (d) suggest that larger d can improve the recovery performance at a high SR, as enough measurements are provided for subspaces with larger dimensions. For example, r = 1000, m = 614 when t = 50, d = 20, SR = 0.6, n = 1024. Consequently, it is desirable to properly select n and d for an exact recovery. For example, r = 1000, m = 38 for d = 20, SR = 0.6, n = 64 would fail to recover c* with the necessary measurements.

C. Results for Multilinear Subspaces

For further validation, we employ CTS in video sampling and reconstruction. In this subsection, experiments are conducted on a variety of video sequences with CIF (352 × 288) resolution (i.e., Bike, Bus, Football, NBA) and DVD (720 × 480) resolution (i.e., Driving, Whale Show). We compare the proposed CTS method with the state-of-the-art GTCS [31], [32], which utilizes a DCT basis to make video compressible in the DCT domain. Video sequences are represented by a 352 × 288 × 16 tensor. For CTS, we choose the first 3 frames and split them into overlapping 32 × 32 × 3 tensors to generate the training set. MPCA is applied to this training set to derive the 1-mode and 2-mode bases, which are adaptive and make the tensors compressible. We use the DCT basis for the 3-mode basis. For both methods, we sample each 32 × 32 × 16 non-overlapping sub-video cube of the following 16 frames by i.i.d. random Gaussian matrices Φ1, Φ2 ∈ R^(32SR×32) with zero mean and unit variance for 1-mode and 2-mode sampling, respectively, and set Φ3 ∈ R^(16×16) to an identity matrix for 3-mode full sampling. Thus, the total number of measurements

Fig. 17. Reconstructed frames for the Park Scene sequence under n = 1024, SR = 0.6 and d = 40. (a) the 4th NRF; (b) BCS-DCT; (c) BCS-KLT; (d) BCS-MC-SPL; (e) BCS-MH-SPL; (f) UoS-BCS-DCT; (g) UoDS-PCA; (h) MH-UoDS-PCA.

Fig. 18. The performance of the proposed scheme (UoDS-PCA) with various dimensions d and patch sizes.
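The per-cube measurement budget in the multilinear experiments (SR² · 16384 for a 32 × 32 × 16 cube with modes 1 and 2 sampled at rate SR and full 3-mode sampling) can be checked directly; an integer-valued 32 · SR is assumed in this sketch:

```python
def cube_measurements(SR, side=32, frames=16):
    """Measurements after sampling modes 1 and 2 at rate SR and mode 3 fully."""
    m = round(side * SR)   # rows of Phi1 and Phi2
    return m * m * frames  # Phi3 is the 16 x 16 identity

print(cube_measurements(0.5))   # 4096 == 0.5**2 * 16384
print(cube_measurements(0.75))  # 9216 == 0.75**2 * 16384
```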

Fig. 19. Average PSNR performance (dB) of the 16 recovered frames with SR from 0.4 to 0.9. Tensor sizes 32 × 32 × 16 and 64 × 64 × 16 are adopted for CIF and DVD test sequences, respectively.

Fig. 20. The experimental result of sampling rate allocation for the CTS model over four sequences.

of each sub-video cube is SR² · 16384. We also test a tensor size of 64 × 64 × 16 for the Driving and Whale Show sequences with similar settings. CTS and GTCS are compared under SR ranging from 0.4 to 0.9. For the recovery of each mode, we apply the SPGL1 Matlab solver [38] for both schemes. Besides, the MATLAB Tensor Toolbox [40] is used in this experiment. Fig. 19 shows the recovery performance for CTS and GTCS, where the proposed CTS outperforms GTCS in five out of six test sequences. In comparison to the DCT basis adopted by GTCS, CTS is more flexible in fitting the non-stationary statistics of video sequences with the adaptive basis of each mode of the tensors to be sampled.

D. Sampling Rate Allocation

Moreover, we discuss the rate allocation for the spatial and temporal sampling to obtain optimal overall performance. Instead of full sampling along the temporal dimension, the proposed CTS varies the temporal sampling rate SR3 from 0.5

TABLE II
AVERAGE RECOVERY SPEED (SEC/FRAME) FOR VIDEO SEQUENCES Bike, Bus, Football, NBA, Driving AND Whale Show OBTAINED UNDER BLOCK SIZE n = 256, DIMENSION d = 10 AND SAMPLING RATE SR = 0.6

to 0.9 to exploit the compressible temporal components. Under the fixed overall sampling rate SR = 0.5, Φ3 ∈ R^(16·SR3×16) and Φ1, Φ2 ∈ R^(32·SR1×32) are adopted for temporal and spatial sampling with the rates SR1 = SR2 = SR/SR3. Fig. 20 provides the sampling-rate-distortion curves obtained by CTS, where the recovery performance is obtained under temporal sampling rates varying from 0.5 to 0.9 with a step of 0.05. It shows that the optimal sampling rate allocation differs across test sequences. Specifically, as shown in Fig. 20, a lower temporal sampling rate is required for test sequences with smooth motion trajectories, e.g., Bike and Bus. On the contrary, the recovery performance would be improved with a higher temporal sampling rate for test sequences with sharp variation of corresponding pixels along motion trajectories, e.g., Football. As mentioned in Section III-C, the dimensions of subspaces are determined by the texture complexities, which serve as the measure of patch-based structures. Thus, the distribution of sparsities would affect the sampling rate allocation, as block sparsities are related to the number of basis vectors (dimensions). These facts imply that the allocation of sampling rates for different dimensions tends to follow the distribution of sparsity to guarantee the quality of reconstruction.

E. Computational Complexity

The computational complexity of the proposed UoDS method comes from the training and recovery processes. For each group of pictures (GOP), the training process (Steps 1-4 in Algorithm 1) utilizes the patches from the recovered fully-sampled RF to derive the union of data-driven subspaces. Thus, the training speed is related to the subspace clustering and learning conditioned on the RF. In the recovery process (Step 5 in Algorithm 1), the efficiency of UoDS depends on the recovery of NRFs, where second-order cone programming (SOCP) is leveraged to reconstruct the block-sparse vector c*, and multihypothesis prediction can be introduced to improve motion compensation.

The experiments for compressive video sampling and reconstruction are implemented with Matlab on a PC with a 3.2-GHz CPU and 12-GB RAM. The complexity of the proposed UoDS model is evaluated in terms of recovery speed (sec/frame) and training speed (sec/GOP). Table II provides the recovery speeds for the proposed UoDS model and the benchmarks for block CS, i.e., BCS-DCT, BCS-KLT, BCS-MC-SPL, BCS-MH-SPL, and UoS-BCS-DCT, under block size n = 256, dimension d = 10, and sampling rate SR = 0.6. The UoDS models UoDS-PCA and UoDS-PCAL1 require approximately 1.4 to 1.7 times the complexity of UoS-BCS-DCT, due to the SOCP determination that optimizes coefficients for data-driven subspaces. When compared with BCS-KLT, motion compensation leads to a 30% to 50% excess time cost in UoDS-PCA and UoDS-PCAL1. Moreover, it would require an additional 70% to 90% time cost to introduce multihypothesis compensated prediction in the UoDS model. These facts demonstrate that the recovery efficiency is affected by the SOCP determination and motion compensation.

Furthermore, we evaluate the computational complexity of the training process. Fig. 21 compares the training speed (sec/GOP) for UoDS and UoS over the six video sequences. Here, subspace clustering and basis derivation are not required for the other benchmarks, as they are based on a single subspace (simple sparsity). Fig. 21 shows that the proposed UoDS model is competitive with UoS in terms of training speed, but achieves a better reconstruction performance with the data-driven subspaces and corresponding bases. It should be noted that the training of the union of subspaces and their bases is only performed once on the RFs for each GOP. It is also promising to leverage low-rank representation (LRR) to balance the complexity and performance of the training process. Fig. 6 shows that UoDS-LRR can significantly reduce the complexity (by about 70%–90%) of the training process with a slight loss (up to 0.5 dB) of reconstruction performance when the number t of subspaces ranges from 10 to 100.

Fig. 21. Average training speed (sec/GOP) for video sequences Bike, Bus, Football, NBA, Driving and Whale Show obtained by UoS and UoDS under SR = 0.6.

VI. CONCLUSION

This paper proposes an explicit sampling scheme to recover an unknown signal from a union of data-driven subspaces (UoDS). It investigates neighboring data structures by clustering to form classified signal series. Subsequently, the union of subspaces is learned uniquely from the classified signal series by a linear

subspace learning method, thereby deriving an adaptive basis and enhancing the structured sparse representation. Besides, the general sampling model based on tensors is also proposed. With the proof of stable reconstruction, the proposed scheme is fulfilled in video acquisition, where the UoDS is learned from decoded reference frames. Experimental results show that the proposed method achieves better performance in comparison to the other compressive video sampling methods. Additionally, we extend traditional CS to its tensor form for signals lying in a tensor subspace.

APPENDIX A
PROOF OF PROPOSITION 1–PROPOSITION 3

Proof of Proposition 1: The proof is similar to that of Proposition 3 in [19]. Since the UoDS model inherits the merit of the UoS model, U still follows the property of the predefined UoS. Thus, Φ is one-to-one on each S*ij with dim(S*ij) ≤ dim(Φ) = m. As a result, m ≥ kmax = max_{i≠j} dim(S*ij). □

Proof of Proposition 2: First, we obtain from Eq. (13) that A* = ΦΨ*. Since the PCA-based learning methods generate an orthonormal basis Ψi for each linear subspace Si and Φ is an i.i.d. random matrix, Proposition 2 is obtained according to Propositions 4 and 5 in [19]. □

Proof of Proposition 3: Eq. (22) shows that A^(n) = Φn Ψ^(n) for the orthonormal basis Ψ^(n) of S̃^(n) obtained by HOSVD and the i.i.d. random matrix Φn. According to Propositions 4 and 5 in [19], we can easily obtain Proposition 3. □

APPENDIX B
PROOF OF THEOREM 1

According to Proposition 1, δ2k < 1 makes c* unique. First, we assume that c = c* + b is a solution of Eq. (15). To prove that c* is the true solution of Eq. (15), we just need to prove b = 0. We know that c* is a block k-sparse vector, so let I0 denote the indices of c* where the coefficients are nonzero, and let b_{I0} denote the restriction of b to these blocks. Then we can decompose b as

$$b = \sum_{i=0}^{t-1} b_{I_i} \tag{25}$$

where b_{Ii} is the restriction of b to the set Ii, which comprises k blocks, selected such that the norm of b_{I0^c} over I1 is largest, the norm over I2 is the second largest, and so on. Therefore,

$$\|b\|_2 = \|b_{I_0\cup I_1} + b_{(I_0\cup I_1)^c}\|_2 \le \|b_{I_0\cup I_1}\|_2 + \|b_{(I_0\cup I_1)^c}\|_2.$$

Because our goal is to prove b = 0, the work below is to prove both ‖b_{I0∪I1}‖2 = 0 and ‖b_{(I0∪I1)^c}‖2 = 0.

Part 1: ‖b_{(I0∪I1)^c}‖2 ≤ ‖b_{I0∪I1}‖2.

First we have

$$\|b_{(I_0\cup I_1)^c}\|_2 = \Big\|\sum_{i=2}^{t-1} b_{I_i}\Big\|_2 \le \sum_{i=2}^{t-1} \|b_{I_i}\|_2. \tag{26}$$

We then bound ‖b_{Ii}‖2 for i ≥ 2:

$$\|b_{I_i}\|_2 \le k^{1/2}\|b_{I_i}\|_{\infty,I} \le k^{-1/2}\|b_{I_{i-1}}\|_{2,I} \tag{27}$$

where we define ‖a‖∞,I = max_i ‖a[i]‖2. Since there are at most k nonzero blocks, k‖b_{Ii}‖∞,I ≤ ‖b_{I_{i-1}}‖_{2,I}. Therefore, combining Eqs. (26) and (27), we have

$$\|b_{(I_0\cup I_1)^c}\|_2 \le k^{-1/2}\sum_{i=1}^{t-2}\|b_{I_i}\|_{2,I} \le k^{-1/2}\sum_{i=1}^{t-1}\|b_{I_i}\|_{2,I} = k^{-1/2}\|b_{I_0^c}\|_{2,I}. \tag{28}$$

To bound ‖b_{I0^c}‖_{2,I}, we use the facts that c = c* + b_{I0} + b_{I0^c}, that c* is supported on I0, that c is the solution of Eq. (15), and that ‖c*‖_{2,I} ≥ ‖c‖_{2,I}; we have

$$\|c^*\|_{2,I} \ge \|c^* + b_{I_0}\|_{2,I} + \|b_{I_0^c}\|_{2,I} \ge \|c^*\|_{2,I} - \|b_{I_0}\|_{2,I} + \|b_{I_0^c}\|_{2,I} \tag{29}$$

thereby we have

$$\|b_{I_0^c}\|_{2,I} \le \|b_{I_0}\|_{2,I} \le k^{1/2}\|b_{I_0}\|_2. \tag{30}$$

Combining Eqs. (30) and (28), we have

$$\|b_{(I_0\cup I_1)^c}\|_2 \le \|b_{I_0}\|_2 \le \|b_{I_0\cup I_1}\|_2. \tag{31}$$

Part 2: ‖b_{I0∪I1}‖2 = 0.

Since y = A*c* = A*c, A*b = 0. Besides, we have the fact that b = b_{I0∪I1} + Σ_{i≥2} b_{Ii}. Hence

$$\|A^*b_{I_0\cup I_1}\|_2^2 = -\sum_{i=2}^{t-1}\big\langle A^*(b_{I_0} + b_{I_1}),\, A^*b_{I_i}\big\rangle. \tag{32}$$

From the block-RIP and the parallelogram identity, we have

$$|\langle A^*c_1, A^*c_2\rangle| \le \delta_{2k}\|c_1\|_2\|c_2\|_2 \tag{33}$$

for any two block k-sparse vectors with disjoint support. Therefore we have

$$|\langle A^*b_{I_0}, A^*b_{I_i}\rangle| \le \delta_{2k}\|b_{I_0}\|_2\|b_{I_i}\|_2 \tag{34}$$

and similarly for |⟨A*b_{I1}, A*b_{Ii}⟩|. Thereby, Eq. (32) becomes

$$\|A^*b_{I_0\cup I_1}\|_2^2 \le \sum_{i=2}^{t-1}\big(|\langle A^*b_{I_0}, A^*b_{I_i}\rangle| + |\langle A^*b_{I_1}, A^*b_{I_i}\rangle|\big) \le \delta_{2k}\big(\|b_{I_0}\|_2 + \|b_{I_1}\|_2\big)\sum_{i=2}^{t-1}\|b_{I_i}\|_2. \tag{35}$$

From the Cauchy–Schwarz inequality, we have

$$\|b_{I_0}\|_2 + \|b_{I_1}\|_2 \le \sqrt{2\big(\|b_{I_0}\|_2^2 + \|b_{I_1}\|_2^2\big)} = \sqrt{2}\,\|b_{I_0\cup I_1}\|_2$$

where the last equality is a result of the fact that b_{I0} and b_{I1} have disjoint support. Combining Eqs. (27), (28), (30), and (35), we have

$$\|A^*b_{I_0\cup I_1}\|_2^2 \le \sqrt{2}\,k^{-1/2}\delta_{2k}\|b_{I_0\cup I_1}\|_2\|b_{I_0^c}\|_{2,I} \le \sqrt{2}\,\delta_{2k}\|b_{I_0\cup I_1}\|_2\|b_{I_0}\|_2 \le \sqrt{2}\,\delta_{2k}\|b_{I_0\cup I_1}\|_2^2 \tag{36}$$

where the first inequality is obtained by Eqs. (27) and (28), the second follows from Eq. (30), and the last from ‖b_{I0}‖2 ≤ ‖b_{I0∪I1}‖2. Then, with the block-RIP, we have

$$(1-\delta_{2k})\|b_{I_0\cup I_1}\|_2^2 \le \|A^*b_{I_0\cup I_1}\|_2^2 \le \sqrt{2}\,\delta_{2k}\|b_{I_0\cup I_1}\|_2^2. \tag{37}$$

Since δ2k < √2 − 1, Eq. (37) can only hold if ‖b_{I0∪I1}‖2 = 0, which completes the second part of the proof.

Since ‖b_{(I0∪I1)^c}‖2 ≤ ‖b_{I0∪I1}‖2, which Part 1 has proved, we therefore have ‖b_{(I0∪I1)^c}‖2 = 0 and ‖b‖2 = 0, which means c = c* and completes the whole proof. □

REFERENCES

[1] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. New York, NY, USA: Academic, Sep. 1999.
[2] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[3] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Non-local sparse models for image restoration," in Proc. IEEE Int. Conf. Comput. Vis., Kyoto, Japan, Sep. 2009, pp. 2272–2279.
[4] J. Mairal, F. Bach, and J. Ponce, "Task-driven dictionary learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 4, pp. 791–804, Apr. 2012.
[5] J. Zhang, D. Zhao, F. Jiang, and W. Gao, "Structural group sparse representation for image compressive sensing recovery," in Proc. Data Compression Conf., Snowbird, UT, USA, Mar. 2013, pp. 331–340.
[6] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 1794–1801.
[8] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[9] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[10] E. J. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[11] M. Wakin et al., "Compressive imaging for video representation and coding," in Proc. Picture Coding Symp., Beijing, China, Apr. 2006, pp. 1–6.
[12] V. Stankovic, L. Stankovic, and S. Cheng, "Compressive video sampling," in Proc. 16th Eur. Signal Process. Conf., Lausanne, Switzerland, Aug. 2008, pp. 1–5.
[13] J. Prades-Nebot, Y. Ma, and T. Huang, "Distributed video coding using compressive sampling," in Proc. Picture Coding Symp., Chicago, IL, USA, May 2009, pp. 1–4.
[14] Z. Liu, A. Elezzabi, and H. Zhao, "Maximum frame rate video acquisition using adaptive compressed sensing," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 11, pp. 1704–1718, Nov. 2011.
[15] S. Mun and J. E. Fowler, "Residual reconstruction for block-based compressed sensing of video," in Proc. Data Compression Conf., Snowbird, UT, USA, Mar. 2011, pp. 183–192.
[16] C. Chen, E. W. Tramel, and J. E. Fowler, "Compressed-sensing recovery of images and video using multihypothesis predictions," in Proc. 45th Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, Nov. 2011, pp. 1193–1198.
[17] J. Ma, G. Plonka, and M. Yousuff Hussaini, "Compressive video sampling with approximate message passing decoding," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 9, pp. 1354–1364, Sep. 2012.
[18] Y. Liu, M. Li, and D. A. Pados, "Motion-aware decoding of compressed-sensed video," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 3, pp. 438–444, Mar. 2013.
[19] Y. M. Lu and M. N. Do, "A theory for sampling signals from a union of subspaces," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2334–2345, Jun. 2008.
[20] T. Blumensath, "Sampling and reconstructing signals from a union of linear subspaces," IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4660–4671, Jul. 2011.
[21] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, Nov. 2009.
[22] Y. C. Eldar, P. Kuppinger, and H. Bölcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, Jun. 2010.
[23] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982–2001, Apr. 2010.
[24] E. Elhamifar and R. Vidal, "Sparse subspace clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 2790–2797.
[25] G. Liu et al., "Robust recovery of subspace structures by low-rank representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, Jan. 2013.
[26] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, Nov. 2013.
[27] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1672–1680, Sep. 2008.
[28] L. Lim and P. Comon, "Multiarray signal processing: Tensor decomposition meets compressed sensing," Comptes Rendus Mecanique, vol. 338, no. 6, pp. 311–320, Jun. 2010.
[29] M. Duarte and R. Baraniuk, "Kronecker product matrices for compressive sensing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Dallas, TX, USA, Mar. 2010, pp. 3650–3653.
[30] N. Sidiropoulos and A. Kyrillidis, "Multi-way compressed sensing for sparse low-rank tensors," IEEE Signal Process. Lett., vol. 19, no. 11, pp. 757–760, Nov. 2012.
[31] Q. Li, D. Schonfeld, and S. Friedland, "Generalized tensor compressive sensing," in Proc. IEEE Int. Conf. Multimedia Expo., San Jose, CA, USA, Jul. 2013, pp. 1–6.
[32] S. Friedland, Q. Li, and D. Schonfeld, "Compressive sensing of sparse tensors," IEEE Trans. Image Process., vol. 23, no. 10, pp. 4438–4447, Oct. 2014.
[33] X. He, D. Cai, and P. Niyogi, "Tensor subspace analysis," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, Dec. 2005, pp. 499–506.
[34] J. Ye, "Generalized low rank approximations of matrices," Mach. Learn., vol. 61, no. 1-3, pp. 167–191, Nov. 2005.
[35] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 18–39, Jan. 2008.
[36] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.
[37] Y. Pang, X. Li, and Y. Yuan, "Robust tensor analysis with L1-norm," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 2, pp. 172–178, Feb. 2010.
[38] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM J. Sci. Comput., vol. 31, no. 2, pp. 890–912, Nov. 2008.
[39] E. van den Berg and M. P. Friedlander, "Sparse optimization with least-squares constraints," SIAM J. Optim., vol. 21, no. 4, pp. 1201–1229, 2011.
[40] T. G. Kolda et al., MATLAB Tensor Toolbox Version 2.5, Jan. 2012. [Online]. Available: http://www.sandia.gov/ tgkolda/TensorToolbox/

Yong Li (S'15) received the B.S. degree from Xuzhou Normal University, Xuzhou, China, and the M.S. degree from China University of Mining and Technology, Xuzhou, China, in 2009 and 2012, respectively. He is currently working toward the Ph.D. degree with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China. From November 2014 to May 2015 and November 2015 to May 2016, he was with the Department of Biomedical Informatics, University of California, San Diego (UCSD), as a visiting scholar. His current research interests include compressive sensing, sparse representation, and image/signal processing.

Wenrui Dai (M'15) received the B.S., M.S., and Ph.D. degrees in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2005, 2008, and 2014. He is currently a Postdoctoral Scholar with the Department of Biomedical Informatics, University of California, San Diego, CA, USA. His research interests include learning-based image/video coding, image/signal processing, and predictive modeling.

Junni Zou (M'07) received the M.S. and the Ph.D. degrees in communication and information system from Shanghai University, Shanghai, China, in 2004 and 2006, respectively. She is currently a full Professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University (SJTU), Shanghai, China. From 2006 to 2016, she was with the School of Communication and Information Engineering, Shanghai University, Shanghai, and became a full Professor in 2014. From June 2011 to June 2012, she was with the Department of Electrical and Computer Engineering, University of California, San Diego, CA, USA, as a visiting Professor. Her research interests include multimedia communication, network resource optimization, wireless communication, and network information theory. She has published more than 80 IEEE journal/conference papers and 2 book chapters, including 15 IEEE Transactions journal papers. She holds 12 patents and has more than 10 patents under review. Dr. Zou was granted the National Science Fund for Outstanding Young Scholar in 2016. She received the Shanghai Young Rising Star Scientist Award in 2011, the First Prize of the Shanghai Technological Innovation Award in 2011, and the First Prize of the Shanghai Science and Technology Advancement Award in 2008. Also, she has served on technical program committees for the IEEE and other international conferences.

Yuan F. Zheng (F'97) received the M.S. and Ph.D. degrees in electrical engineering from The Ohio State University, Columbus, OH, USA, in 1980 and 1984, respectively. He received undergraduate education at Tsinghua University, Beijing, China, in 1970. From 1984 to 1989, he was with the Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA. Since August 1989, he has been with The Ohio State University, where he is currently Winbigler Designated Chair Professor and was the Chairman of the Department of Electrical and Computer Engineering from 1993 to 2004. From 2004 to 2005, he spent a sabbatical year at Shanghai Jiao Tong University, Shanghai, China, and continued to be involved as the Dean of the School of Electronic, Information and Electrical Engineering until 2008. His research interests include two aspects. One is wavelet transform for image and video, radar waveform and signal, and object tracking. The other is robotics, which includes robotics for life science applications, multiple robots coordination, legged walking robots, and service robots. He has been on the editorial board of five international journals. For his research contributions, he received the Presidential Young Investigator Award from President Ronald Reagan in 1986, and Research Awards from the College of Engineering of The Ohio State University in 1993, 1997, and 2007, respectively. He along with his students received the best conference and best student paper award several times in 2000, 2002, 2006, 2009, and 2010, and received the Fred Diamond Award for Best Technical Paper from the Air Force Research Laboratory, Rome, NY, USA, in 2006. In 2004 and 2005, he was appointed to the International Robotics Assessment Panel by the NSF, NASA, and NIH to assess robotics technologies worldwide. Most recently, in 2017, he received the Innovator of the Year Award from The Ohio State University.

Hongkai Xiong (M'01–SM'10) received the Ph.D. degree in communication and information system from Shanghai Jiao Tong University (SJTU), Shanghai, China, in 2003. Since 2003, he has been with the Department of Electronic Engineering, SJTU, where he is currently a full Professor. From December 2007 to December 2008, he was with the Department of Electrical and Computer Engineering, Carnegie Mellon University (CMU), Pittsburgh, PA, USA, as a Research Scholar. From 2011 to 2012, he was a Scientist with the Division of Biomedical Informatics, University of California, San Diego (UCSD), CA, USA. His research interests include source coding/network information theory, signal processing, computer vision, and machine learning. He has published more than 180 refereed journal/conference papers. Dr. Xiong was granted the Yangtze River Scholar Distinguished Professorship by the Ministry of Education of China in 2016, and was named a Youth Science and Technology Innovation Leader by the Ministry of Science and Technology of China. He became a Shanghai Academic Research Leader. In 2014, he was granted the National Science Fund for Distinguished Young Scholars as well as the Shanghai Youth Science and Technology Talent award. In 2013, he was selected as a Shanghai Shu Guang Scholar. Since 2012, he has been a member of the Innovative Research Groups of the National Natural Science Foundation of China. In 2011, he obtained the First Prize of the Shanghai Technological Innovation Award for "Network-oriented Video Processing and Dissemination: Theory and Technology". In 2010 and 2013, he obtained the SMC-A Excellent Young Faculty Award of Shanghai Jiao Tong University. In 2009, he received the New Century Excellent Talents in University award from the Ministry of Education of China. He has served as a TPC member for prestigious conferences such as ACM Multimedia, ICIP, ICME, and ISCAS.
He received the Top 10% Paper Award at the 2016 IEEE Visual Communication and Image Processing, the Best Student Paper Award at the 2014 IEEE Visual Communication and Image Processing, the Best Paper Award at the 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, and the Top 10% Paper Award at the 2011 IEEE International Workshop on Multimedia Signal Processing (IEEE MMSP'11).