5062 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 19, OCTOBER 1, 2017 Structured Sparse Representation With Union of Data-Driven Linear and Multilinear Subspaces Model for Compressive Video Sampling Yong Li, Student Member, IEEE,WenruiDai, Member, IEEE, Junni Zou, Member, IEEE, Hongkai Xiong, Senior Member, IEEE, and Yuan F. Zheng, Fellow, IEEE
Abstract—The standard compressive sensing (CS) theory can be be represented by a linear combination of a few column vectors improved for robust recovery with fewer measurements under the from the dictionary. In a certain sense, accompanied by a cor- assumption that signals lie on a union of subspaces (UoS). However, responding dictionary, sparsity has a close connection with the the UoS model is restricted to specific types of signal regularities concept subspace spanned by its few column vectors. Recently, with predetermined topology for subspaces. This paper proposes it has been proved that a trained dictionary is more effective a generalized model which adaptively decomposes signals into a union of data-driven subspaces (UoDS) for structured sparse rep- than a predefined one (e.g., wavelets [1]) for many tasks such resentation. The proposed UoDS model leverages subspace clus- as image denoising [2]Ð[4], compressive sensing [5], [18] and tering to derive the optimal structures and bases for the subspaces classification tasks [6], [7]. The dictionary directly learned from conditioned on the sample signals. For multidimensional signals input signals is shown to provide more adaptivity and sparsity with various statistics, it supports linear and multilinear subspace than the off-the-shelf bases. learning for compressive sampling. As an improvement for generic As an application of sparsity, compressive sensing (CS) is CS model, the basis which represents the sparsity of sample signals a desirable framework for signal acquisition [8]Ð[10]. CS at- is adaptively generated via linear subspace learning method. Fur- tempts to acquire the dictionary-based sparse representation for thermore, a generalized model with multilinear subspace learning unknown signals by randomly projecting them onto a space is considered for CS to avoid vectorization of high-dimensional (observation) with much lower dimension. Recently, CS has signals. In comparison to UoS, the UoDS model requires fewer degrees of freedom for a desirable recovery quality. Experimental been widely adopted in video acquisition and reconstruction results demonstrate that the proposed model for video sampling is [11]Ð[18]. In comparison to conventional video acquisition and promising and applicable. compression approaches, CS can relieve the burden of video encoder by reducing the number of measurements to be sam- Index Terms—Structured sparsity, compressive video sampling, pled. At the decoder side, its recovery quality can be guaranteed union of data-driven subspaces, tensor subspace. with effective reconstruction methods based on a sparse repre- sentation with certain basis. Wakin et al. [11] first applied CS I. INTRODUCTION into video acquisition, which jointly made compressive sam- PARSITY is widely concerned in many areas such as statis- pling and reconstruction for video sequences with 3-D wavelet tics, machine learning and signal processing. A vector ad- transform. Recognizing that compressive sampling was inade- mitsS a sparse representation over a basis (or dictionary) if it can quate to the entire frame, a block-based CS (BCS) method [12]) utilized CS for non-overlapping blocks to exploit the local spar- Manuscript received November 16, 2016; revised May 4, 2017; accepted June sity within the DCT domain. Later, temporal correlations were 10, 2017. Date of publication June 29, 2017; date of current version July 24, exploited to improve BCS. The distributed BCS framework [13] 2017. The associate editor coordinating the review of this manuscript and ap- approximated each block by a linear combination of blocks in proving it for publication was Prof. Xin Wang. The work was supported in part by previous frames. Liu et al. [14] developed an adaptive CS strat- the National Natural Science Foundation of China under Grant 61501294, under egy for blocks in various regions of independent motion and Grant 61622112, under Grant 61529101, under Grant 61472234, under Grant 61425011, in part by China Postdoctoral Science Foundation 2015M581617, texture complexity. BCS-SPL framework [15] made smooth and in part by Program of Shanghai Academic Research Leader 17XD1401900. projected Landweber (SPL) reconstruction by incorporating (Corresponding author: Hongkai Xiong.) motion estimation and compensation. Chen et al. [16] combined Y. Li and H. Xiong are with the Department of Electronic Engineering, multi-hypothesis prediction with BCS-SPL to enhance the re- Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: marsleely@ sjtu.edu.cn; [email protected]). construction performance. The approximate message passing W. Dai is with the Department of Biomedical Informatics, University of (AMP) reconstruction based on the dual-tree complex wavelet California, San Diego, CA 92093 USA (e-mail: [email protected]). transform was adopted in [17]. Liu et al. [18] improved J. Zou is with the Department of Computer Science and Engineering, the recovery performance of BCS framework by introducing Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: zou-jn@cs. Karhunen-Loeve` transform (KLT) basis in the decoder. To over- sjtu.edu.cn). Y. F. Zheng is with the Department of Electrical and Computer Engineer- come the deficiency of sampling high-order signals, multidi- ing, The Ohio State University, Columbus, OH 43210 USA (e-mail: zheng@ mensional CS techniques [28]Ð[32] have been developed for ece.osu.edu). practical implementations. However, these methods assume that Color versions of one or more of the figures in this paper are available online signals live in a single vector or tensor space, which ignore the at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2017.2721905 structures within the sparse coefficients.
1053-587X © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information. LI et al.: STRUCTURED SPARSE REPRESENTATION WITH UNION OF DATA-DRIVEN LINEAR AND MULTILINEAR SUBSPACES MODEL 5063
To capture the underlying structures for recovery, the union of tensors and adaptively derives the basis for each tensor sub- subspaces model [19]Ð[21] has been established to significantly space with multilinear subspace learning (MSL). Moreover, it reduce the number of measurements. In [19], a general sampling relieves the large-scale sensing matrices led by vectorization of framework was studied, where sampled signals lived in a known high-dimensional signals and preserves the intrinsic structures union of subspaces with linear operators. Under the notion of in the original signals. Therefore, CTS is efficient and prac- block restricted isometry property (RIP), robust block sparse tical for sampling high-dimensional signals in comparison to recovery with a mixed 2 / 1 programming was developed for previous schemes in literature. the UoS model in [21]. Blumensath [20] demonstrated that pro- The rest of the paper is organized as follows. Section II jected Landweber algorithm could recover signals from UoS for provides the background and motivation of UoDS. Section III all Hilbert spaces and unions of such subspaces, as long as the proposes the UoDS model for compressive video sampling, in- sampling operator satisfied bi-Lipschitz embedding condition. cluding subspace clustering, linear subspace learning and stable The UoS model exploits the structural sparsity like tree-sparsity recovery. Furthermore, the UoDS model is generalized to the and block-sparsity [22], [23] to span subspaces with DCT and multilinear subspaces in Section IV. Section V provides the wavelet bases. However, it is restricted to signals with varying experimental results for validation in terms of SR-PSNR per- signal regularities, especially video sequences. formance and visual quality. Finally, Section VI concludes this Furthermore, multi-linear subspaces (tensor subspaces) learn- paper. ing (MSL [33]Ð[35], [37]) was adopted under the assumption that high-dimensional signals live in a tensor product of sub- II. DEFINITION AND MOTIVATION spaces. It is due to the fact that a predetermined vectorization of tensor data can obscure the statistical correlations among sam- To motivate the proposed UoDS model, we begin with a ples and discard important structural information. Tensor sub- review of the compressive sensing models, including both single space analysis (TSA [33]) detected the intrinsic local geometric subspace model and union of subspace model. structures of the tensor space to generate a representation matrix for images. In the meantime, generalized low rank approxima- A. Single Subspace Model tions of matrices (GLRAM [34]) iteratively made bi-directional { } linear projecting transform for feature extraction of images, Given an orthonormal basis Ψ= ψi ,ann-dimensional sig- x ∈ Rn x c which outperformed the traditional SVD-based methods. For nal has a sparse representation in the form of =Ψ , where c is the representation vector with k nonzero components. efficient tensor subspace representation, multilinear principal x component analysis (MPCA [35]) framework suggested fea- This fact implies that lives in a k-dimension single subspace ture extraction with dimensionality reduction to model a major spanned by the k basis vectors corresponding to the nonzero part of input signals. Recently, L1-norm-based tensor analy- components in Ψ. Under the assumption of simple sparsity, this sis (TPCA-L1 [37]) replaced Frobenius norm and L2 norm with single subspace model is widely adopted in traditional compres- y ∈ Rm L1-norm to formulate the problem of tensor analysis with the ro- sive sampling. Thus, the measurement is obtained by linear sampling y =Φx based on a sensing matrix Φ ∈ Rm ×n bustness to outliers. Unfortunately, their performance for video sequences is degraded, as they failed to consider the temporal with k