<<

Robust Matrix Decomposition with Sparse Corruptions

Daniel Hsu, Sham M. Kakade, and Tong Zhang

Abstract—Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix, and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of $\ell_1$ norm and trace norm minimization. We are specifically interested in the question of how many sparse corruptions are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of corruptions is random, which stands in contrast to related analyses under such assumptions via matrix completion.

Index Terms—Matrix decompositions, sparsity, low-rank, outliers

D. Hsu is with Microsoft Research New England, Cambridge, MA 02142 USA (e-mail: [email protected]).
S. M. Kakade is with Microsoft Research New England, Cambridge, MA 02142 USA, and also with the Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104 (e-mail: [email protected]). This author was partially supported by NSF grant IIS-1018651.
T. Zhang is with the Department of Statistics, Rutgers University (e-mail: [email protected]). This author was partially supported by the following grants: AFOSR FA9550-09-1-0425, NSA-AMS 081024, NSF DMS-1007527, and NSF IIS-1016061.

I. INTRODUCTION

THIS work studies additive decompositions of matrices into sparse and low-rank components. Such decompositions have found applications in a variety of numerical problems, including system identification [1], latent variable graphical modeling [2], and principal component analysis (PCA) [3]. In these settings, the user has an input matrix $Y \in \mathbb{R}^{m \times n}$ which is believed to be the sum of a sparse matrix $X_S$ and a low-rank matrix $X_L$. For instance, in the application to PCA, $X_L$ represents a matrix of $m$ data points from a low-dimensional subspace of $\mathbb{R}^n$, and is corrupted by a sparse matrix $X_S$ of errors before being observed as
$$Y = \underbrace{X_S}_{\text{(sparse)}} + \underbrace{X_L}_{\text{(low-rank)}}.$$
The goal is to recover the original data matrix $X_L$ (and the error components $X_S$) from the corrupted observations $Y$. In the latent variable model application of Chandrasekaran et al. [2], $Y$ represents the precision matrix over visible nodes of a Gaussian graphical model, and $X_S$ represents the precision matrix over the visible nodes when conditioned on the hidden nodes. In general, $Y$ may be dense as a result of dependencies between visible nodes through the hidden nodes. However, $X_S$ will be sparse when the visible nodes are mostly independent after conditioning on the hidden nodes, and the difference $X_L = Y - X_S$ will be low-rank when the number of hidden nodes is small. The goal is then to infer the relevant dependency structure from just the visible nodes and measurements of their correlations.

Even if the matrix $Y$ is exactly the sum of a sparse matrix $X_S$ and a low-rank matrix $X_L$, it may be impossible to identify these components from the sum. For instance, the sparse matrix $X_S$ may be low-rank, or the low-rank matrix $X_L$ may be sparse. In such cases, these components may be confused for each other, and thus the desired decomposition of $Y$ may not be identifiable. Therefore, one must impose conditions on the sparse and low-rank components in order to guarantee their identifiability from $Y$.

We present sufficient conditions under which $X_S$ and $X_L$ are identifiable from the sum $Y$. Essentially, we require that $X_S$ not be too dense in any single row or column, and that the singular vectors of $X_L$ not be too sparse. The levels of denseness and sparseness are considered jointly in the conditions in order to obtain the weakest possible conditions. Under a mild strengthening of the conditions, we also show that $X_S$ and $X_L$ can be recovered by solving certain convex programs, and that the solution is robust under small perturbations of $Y$. The first program we consider is
$$\min \ \lambda \|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_*$$
(subject to certain feasibility constraints such as $\|X_S + X_L - Y\| \le \epsilon$), where $\|\cdot\|_{\mathrm{vec}(1)}$ is the entry-wise 1-norm and $\|\cdot\|_*$ is the trace norm. These norms are natural convex surrogates for the sparsity of $X_S$ and the rank of $X_L$ [4], [5], which are generally intractable to optimize. We also consider a regularized formulation
$$\min \ \frac{1}{2\mu} \|X_S + X_L - Y\|_{\mathrm{vec}(2)}^2 + \lambda \|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_*$$
where $\|\cdot\|_{\mathrm{vec}(2)}$ is the Frobenius norm; this formulation may be more suitable in certain applications and enjoys different recovery guarantees.

A. Related work

Our work closely follows that of Chandrasekaran et al. [1], who initiated the study of rank-sparsity incoherence and its application to matrix decompositions. There, the authors identify parameters that characterize the incoherence of $X_S$ and $X_L$ sufficient to guarantee identifiability and recovery using convex programs. However, their analysis of this characterization yields conditions that are significantly stronger than those given in our present work. For instance, the allowed fraction of non-zero entries in $X_S$ is quickly vanishing as a function of the matrix size, even under the most favorable conditions on $X_L$; our analysis does not have this restriction and allows $X_S$ to have up to $\Omega(mn)$ non-zero entries when $X_L$ is low-rank and has non-sparse singular vectors. In terms of the PCA application, our analysis allows for up to a constant fraction of the data matrix entries to be corrupted by noise of arbitrary magnitude, while the analysis of [1] requires that it decrease as a function of the matrix dimensions. Moreover, [1] only considers exact decompositions, which may be unrealistic in certain applications; we allow for approximate decompositions, and study the effect of perturbations on the accuracy of the recovered components.

The application to principal component analysis with gross sparse errors was studied by Candès et al. [3], building on previous results and analysis techniques for the related matrix completion problem (e.g., [6], [7]). The sparse errors model of [3] requires that the support of the sparse matrix $X_S$ be random, which can be unrealistic in some settings. However, the conditions are significantly weaker than those of [1]: for instance, they allow for $\Omega(mn)$ non-zero entries in $X_S$. Our work makes no probabilistic assumption on the sparsity pattern of $X_S$ and instead studies purely deterministic structural conditions. The price we pay, however, is roughly a factor of $\mathrm{rank}(X_L)$ in what is allowed for the support size of $X_S$ (relative to the probabilistic analysis of [3]). Narrowing this gap with alternative deterministic conditions is an interesting open problem. Follow-up work to [3] studies the robustness of the recovery procedure [8], as well as quantitatively weaker conditions on $X_S$ [9], but these works are only considered under the random support model. Our work is therefore largely complementary to these probabilistic analyses.

B. Outline

We describe our main results in Section II. In Section III, we review a number of technical tools such as matrix and operator norms that are used to characterize the rank-sparsity incoherence properties of the desired decomposition. Section IV analyzes these incoherence properties in detail, giving sufficient conditions for identifiability as well as for certifying the (approximate) optimality of a target decomposition for our optimization formulations. The main recovery guarantees are proved in Sections V and VI.

II. MAIN RESULTS

Fix an observation matrix $Y \in \mathbb{R}^{m \times n}$. Our goal is to (approximately) decompose the matrix $Y$ into the sum of a sparse matrix $X_S$ and a low-rank matrix $X_L$.

A. Optimization formulations

We consider two convex optimization problems over $(X_S, X_L) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$. The first is the constrained formulation (parametrized by $\lambda > 0$, $\epsilon_{\mathrm{vec}(1)} \ge 0$, and $\epsilon_* \ge 0$)
$$\begin{aligned} \min \quad & \lambda \|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_* \\ \text{s.t.} \quad & \|X_S + X_L - Y\|_{\mathrm{vec}(1)} \le \epsilon_{\mathrm{vec}(1)} \\ & \|X_S + X_L - Y\|_* \le \epsilon_* \end{aligned} \qquad (1)$$
where $\|\cdot\|_{\mathrm{vec}(1)}$ is the entry-wise 1-norm, and $\|\cdot\|_*$ is the trace norm (i.e., sum of singular values). The second is the regularized formulation (with regularization parameter $\mu > 0$)
$$\min \ \frac{1}{2\mu} \|X_S + X_L - Y\|_{\mathrm{vec}(2)}^2 + \lambda \|X_S\|_{\mathrm{vec}(1)} + \|X_L\|_* \qquad (2)$$
where $\|\cdot\|_{\mathrm{vec}(2)}$ is the Frobenius norm (entry-wise 2-norm).

We also consider adding a constraint to control $\|X_L\|_{\mathrm{vec}(\infty)}$, the entry-wise $\infty$-norm of $X_L$. To (1), we add the constraint
$$\|X_L\|_{\mathrm{vec}(\infty)} \le b$$
and to (2), we add
$$\|X_S - Y\|_{\mathrm{vec}(\infty)} \le b.$$
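Both formulations are standard convex programs and can be prototyped with an off-the-shelf modeling tool. The following sketch is ours, not part of the paper: it poses (1) and (2) with cvxpy, with the function names and optional box constraints chosen for illustration.

```python
# Minimal sketch (ours, not from the paper): formulations (1) and (2) in cvxpy.
import cvxpy as cp

def solve_constrained(Y, lam, eps_vec1, eps_tr, b=None):
    """Formulation (1): min lam*||X_S||_vec(1) + ||X_L||_* subject to entry-wise
    1-norm and trace-norm feasibility constraints on the residual X_S + X_L - Y."""
    m, n = Y.shape
    XS, XL = cp.Variable((m, n)), cp.Variable((m, n))
    R = XS + XL - Y
    constraints = [cp.sum(cp.abs(R)) <= eps_vec1,    # ||R||_vec(1) <= eps_vec(1)
                   cp.normNuc(R) <= eps_tr]          # ||R||_*      <= eps_*
    if b is not None:
        constraints.append(cp.max(cp.abs(XL)) <= b)  # optional ||X_L||_vec(inf) <= b
    objective = cp.Minimize(lam * cp.sum(cp.abs(XS)) + cp.normNuc(XL))
    cp.Problem(objective, constraints).solve()
    return XS.value, XL.value

def solve_regularized(Y, lam, mu, b=None):
    """Formulation (2): min (1/(2*mu))*||X_S + X_L - Y||_F^2 + lam*||X_S||_vec(1)
    + ||X_L||_*, optionally with the box constraint ||X_S - Y||_vec(inf) <= b."""
    m, n = Y.shape
    XS, XL = cp.Variable((m, n)), cp.Variable((m, n))
    objective = cp.Minimize(cp.sum_squares(XS + XL - Y) / (2 * mu)
                            + lam * cp.sum(cp.abs(XS)) + cp.normNuc(XL))
    constraints = [] if b is None else [cp.max(cp.abs(XS - Y)) <= b]
    cp.Problem(objective, constraints).solve()
    return XS.value, XL.value
```

A generic conic solver suffices for moderate dimensions; scaling to large matrices would call for specialized first-order methods, which are outside the scope of the paper.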

The parameter $b$ is intended as a natural bound for $X_L$ and is typically known in applications. For example, in image processing, the values of interest may lie in the interval $[0, 255]$ (say), and hence, we might take $b = 500$ as a relaxation of the box constraint $[0, 255]$. The core of our analyses does not rely on these additional constraints; we only consider them to obtain improved robustness guarantees for recovering $X_L$, which may be important in some applications.

B. Identifiability conditions

Our first result is a refinement of the rank-sparsity incoherence notion developed by [1]. We characterize a target decomposition of $Y$ into $Y = \bar{X}_S + \bar{X}_L$ by the projection operators to subspaces associated with $\bar{X}_S$ and $\bar{X}_L$. Let
$$\bar{\Omega} = \Omega(\bar{X}_S) := \{X \in \mathbb{R}^{m \times n} : \mathrm{supp}(X) \subseteq \mathrm{supp}(\bar{X}_S)\}$$
be the space of matrices whose supports are subsets of the support of $\bar{X}_S$, and let $P_{\bar{\Omega}}$ be the orthogonal projector to $\bar{\Omega}$ under the inner product $\langle A, B \rangle = \mathrm{tr}(A^\top B)$; this projection is given by
$$[P_{\bar{\Omega}}(M)]_{i,j} = \begin{cases} M_{i,j} & \text{if } (i,j) \in \mathrm{supp}(\bar{X}_S) \\ 0 & \text{otherwise} \end{cases}$$
for all $i \in [m] := \{1, \dots, m\}$, $j \in [n] := \{1, \dots, n\}$. Furthermore, let
$$\bar{T} = T(\bar{X}_L) := \{X_1 + X_2 \in \mathbb{R}^{m \times n} : \mathrm{range}(X_1) \subseteq \mathrm{range}(\bar{X}_L), \ \mathrm{range}(X_2^\top) \subseteq \mathrm{range}(\bar{X}_L^\top)\}$$
be the span of matrices either with row-space contained in that of $\bar{X}_L$, or with column-space contained in that of $\bar{X}_L$. Let $P_{\bar{T}}$ be the orthogonal projector to $\bar{T}$, again, under the inner product $\langle A, B \rangle = \mathrm{tr}(A^\top B)$; this projection is given by
$$P_{\bar{T}}(M) = \bar{U}\bar{U}^\top M + M\bar{V}\bar{V}^\top - \bar{U}\bar{U}^\top M \bar{V}\bar{V}^\top$$
where $\bar{U} \in \mathbb{R}^{m \times \bar{r}}$ and $\bar{V} \in \mathbb{R}^{n \times \bar{r}}$ are, respectively, matrices of left and right orthonormal singular vectors corresponding to the non-zero singular values of $\bar{X}_L$, and $\bar{r}$ is the rank of $\bar{X}_L$. We will see that certain operator norms of $P_{\bar{\Omega}}$ and $P_{\bar{T}}$ can be bounded in terms of structural properties of $\bar{X}_S$ and $\bar{X}_L$.

The first property measures the maximum number of non-zero entries in any row or column of $\bar{X}_S$:
$$\alpha(\rho) := \max\{\rho \,\|\mathrm{sign}(\bar{X}_S)\|_{1 \to 1}, \ \rho^{-1} \|\mathrm{sign}(\bar{X}_S)\|_{\infty \to \infty}\},$$
where $\|M\|_{p \to q} := \max\{\|Mv\|_q : v \in \mathbb{R}^n, \|v\|_p \le 1\}$,
$$\mathrm{sign}(M)_{i,j} = \begin{cases} -1 & \text{if } M_{i,j} < 0 \\ 0 & \text{if } M_{i,j} = 0 \\ +1 & \text{if } M_{i,j} > 0 \end{cases} \quad \forall i \in [m], \ j \in [n],$$
and $\rho > 0$ is a balancing parameter to accommodate disparity between the number of rows and columns; a natural choice for the balancing parameter is $\rho := \sqrt{n/m}$. We remark that $\rho$ is only a parameter for the analysis; the optimization formulations do not directly involve $\rho$. Note that $\bar{X}_S$ may have $\Omega(mn)$ non-zero entries and $\alpha(\sqrt{n/m}) = O(\sqrt{mn})$ as long as the non-zero entries of $\bar{X}_S$ are spread out over the entire matrix. Conversely, a sparse matrix with just $O(m + n)$ non-zero entries could have $\alpha(\sqrt{n/m}) = \sqrt{mn}$ by having all of its non-zero entries in just a few rows and columns.

The second property measures the sparseness of the singular vectors of $\bar{X}_L$:
$$\beta(\rho) := \rho^{-1} \|\bar{U}\bar{U}^\top\|_{\mathrm{vec}(\infty)} + \rho \,\|\bar{V}\bar{V}^\top\|_{\mathrm{vec}(\infty)} + \|\bar{U}\|_{2 \to \infty} \|\bar{V}\|_{2 \to \infty}.$$
For instance, if the singular vectors of $\bar{X}_L$ are perfectly aligned with the coordinate axes, then $\beta(\rho) = \Omega(1)$. On the other hand, if the left and right singular vectors have entries bounded by $\sqrt{c/m}$ and $\sqrt{c/n}$, respectively, for some $c \ge 1$, then $\beta(\sqrt{n/m}) \le 3c\bar{r}/\sqrt{mn}$.

Our main identifiability result is the following.

Theorem 1. If $\inf_{\rho > 0} \alpha(\rho)\beta(\rho) < 1$, then $\bar{\Omega} \cap \bar{T} = \{0\}$.

Theorem 1 is an immediate consequence of the following lemma (also given as Lemma 10).

Lemma 1. For all $M \in \mathbb{R}^{m \times n}$, $\|P_{\bar{\Omega}}(P_{\bar{T}}(M))\|_{\mathrm{vec}(1)} \le \inf_{\rho > 0} \alpha(\rho)\beta(\rho) \,\|M\|_{\mathrm{vec}(1)}$.

Proof of Theorem 1: Take any $M \in \bar{\Omega} \cap \bar{T}$. By Lemma 1, $\|P_{\bar{\Omega}}(P_{\bar{T}}(M))\|_{\mathrm{vec}(1)} \le \alpha(\rho)\beta(\rho) \,\|M\|_{\mathrm{vec}(1)}$. On the other hand, $P_{\bar{\Omega}}(P_{\bar{T}}(M)) = M$, so $\alpha(\rho)\beta(\rho) < 1$ implies $\|M\|_{\mathrm{vec}(1)} = 0$, i.e., $M = 0$.

Clearly, if $\bar{\Omega} \cap \bar{T}$ contains a matrix other than $0$, then $\{(\bar{X}_S + M, \bar{X}_L - M) : M \in \bar{\Omega} \cap \bar{T}\}$ gives a family of sparse/low-rank decompositions of $Y = \bar{X}_S + \bar{X}_L$ with at least the same sparsity and rank as $(\bar{X}_S, \bar{X}_L)$. Conversely, if $\bar{\Omega} \cap \bar{T} = \{0\}$, then any matrix in the direct sum $\bar{\Omega} \oplus \bar{T}$ has exactly one decomposition into a matrix $A \in \bar{\Omega}$ plus a matrix $B \in \bar{T}$, and in this sense $(\bar{X}_S, \bar{X}_L)$ is identifiable.

Note that, as we have argued above, the condition $\inf_{\rho > 0} \alpha(\rho)\beta(\rho) < 1$ may be achieved even by matrices $\bar{X}_S$ with $\Omega(mn)$ non-zero entries, provided that the non-zero entries of $\bar{X}_S$ are sufficiently spread out, and that $\bar{X}_L$ is low-rank and has singular vectors far from the coordinate basis. This is in contrast with the conditions studied by [1]. Their analysis uses a different characterization of $\bar{X}_S$ and $\bar{X}_L$, which leads to a stronger identifiability condition in certain cases. Roughly, if $\bar{X}_S$ has an approximately symmetric sparsity pattern (so $\|\mathrm{sign}(\bar{X}_S)\|_{1 \to 1} \approx \|\mathrm{sign}(\bar{X}_S)\|_{\infty \to \infty}$), then [1] requires $\alpha(1)\sqrt{\beta(1)} < 1$ for square $n \times n$ matrices.¹ Since $\beta(1) = \Omega(1/n)$ for any $\bar{X}_L \in \mathbb{R}^{n \times n}$, the condition implies $\alpha(1)^2 = O(n)$. Therefore $\bar{X}_S$ must have at most $O(n)$ non-zero entries (or else $\alpha(1)^2$ becomes super-linear). In other words, the fraction of non-zero entries allowed in $\bar{X}_S$ by the condition $\alpha(1)\sqrt{\beta(1)} < 1$ is quickly vanishing as a function of $n$.

¹ [1] does not explicitly work out the non-square case, but claims that $n$ can be replaced in their analysis by the larger matrix dimension $\max\{m, n\}$. However this does not seem possible, and the analysis there should only lead to the quite suboptimal dimensionality dependency $\min\{m, n\}$. This is because a rectangular matrix $\bar{X}_L$ will have left and right singular vectors of different dimensions and thus different allowable ranges of infinity norms.

C. Recovery guarantees

Our next results are guarantees for (approximately) recovering the sparse/low-rank decomposition $(\bar{X}_S, \bar{X}_L)$ from $Y = \bar{X}_S + \bar{X}_L$ via solving either convex optimization problem (1) or (2). We require a mild strengthening of the condition $\inf_{\rho > 0} \alpha(\rho)\beta(\rho) < 1$, as well as appropriate settings of $\lambda > 0$ and $\mu > 0$ for our recovery guarantees. Before continuing, we first define another property of $\bar{X}_L$:
$$\gamma := \|\bar{U}\bar{V}^\top\|_{\mathrm{vec}(\infty)}$$
which is approximately the same as (in fact, bounded above by) the third term in the definition of $\beta(\rho)$.

The quantities $\alpha(\rho)$, $\beta(\rho)$, and $\gamma$ are central to our analysis. Therefore we state the following proposition for reference, which provides a more intuitive understanding of their behavior. We note that this is the only part in which any explicit dimensional dependence comes into our analysis.

Proposition 1. Let $m_0$ be the maximum number of non-zero entries of $\bar{X}_S$ per column and $n_0$ be the maximum number of non-zero entries of $\bar{X}_S$ per row. Let $\bar{r}$ be the rank of $\bar{U}$ and $\bar{V}$. Assume further that $m_0 \le c_1 m/\bar{r}$ and $n_0 \le c_1 n/\bar{r}$ for some $c_1 \in (0, 1)$, and $\|\bar{U}\|_{\mathrm{vec}(\infty)} \le \sqrt{c_2/m}$ and $\|\bar{V}\|_{\mathrm{vec}(\infty)} \le \sqrt{c_2/n}$ for some $c_2 > 0$. Then with $\rho = \sqrt{n/m}$, we have
$$\alpha(\rho) \le \frac{c_1}{\bar{r}} \sqrt{mn}, \qquad \beta(\rho) \le \frac{3 c_2 \bar{r}}{\sqrt{mn}}, \qquad \gamma \le \frac{c_2 \bar{r}}{\sqrt{mn}}.$$

We now proceed with conditions for the regularized formulation (2). Let $E := Y - (\bar{X}_S + \bar{X}_L)$ and
$$\epsilon_{2\to2} := \|E\|_{2\to2}, \qquad \epsilon_{\mathrm{vec}(\infty)} := \|E\|_{\mathrm{vec}(\infty)} + \|P_{\bar{T}}(E)\|_{\mathrm{vec}(\infty)}.$$
We require the following, for some $\rho > 0$ and $c > 1$:
$$\alpha(\rho)\beta(\rho) < 1 \qquad (3)$$
$$\lambda \le \frac{(1 - \alpha(\rho)\beta(\rho))\,(1 - c \cdot \mu^{-1}\epsilon_{2\to2})}{c \cdot \alpha(\rho)} - \frac{\alpha(\rho)\,\mu^{-1}\epsilon_{\mathrm{vec}(\infty)} + \alpha(\rho)\,\gamma}{\alpha(\rho)} \qquad (4)$$
$$\lambda \ge c \cdot \frac{\gamma + \mu^{-1}(2 - \alpha(\rho)\beta(\rho))\,\epsilon_{\mathrm{vec}(\infty)}}{1 - \alpha(\rho)\beta(\rho) - c \cdot \alpha(\rho)\beta(\rho)} > 0. \qquad (5)$$
For instance, if for some $\rho > 0$,
$$\alpha(\rho)\gamma \le \frac{1}{41} \quad \text{and} \quad \alpha(\rho)\beta(\rho) \le \frac{3}{41}, \qquad (6)$$
then the conditions are satisfied for $c = 2$ provided that $\mu$ and $\lambda$ are chosen to satisfy
$$\mu \ge \max\left\{4 \cdot \epsilon_{2\to2}, \ \frac{2}{15} \cdot \frac{\epsilon_{\mathrm{vec}(\infty)}}{\lambda}\right\} \quad \text{and} \quad \frac{15}{2} \cdot \gamma \le \lambda \le \frac{15}{82} \cdot \frac{1}{\alpha(\rho)}. \qquad (7)$$
Note that (6) can be satisfied when $c_1 \le c_2^{-1}/41$ in Proposition 1.

For the constrained formulation (1), our analysis requires the same conditions as above, except with $E$ set to $0$. Note that our analysis still allows for approximate decompositions; it is only the conditions that are formulated with $E = 0$. Specifically, we require for some $\rho > 0$ and $c > 1$:
$$\alpha(\rho)\beta(\rho) < 1 \qquad (8)$$
$$\lambda \le \frac{1 - \alpha(\rho)\beta(\rho) - c \cdot \alpha(\rho)\gamma}{c \cdot \alpha(\rho)} \qquad (9)$$
$$\lambda \ge c \cdot \frac{\gamma}{1 - \alpha(\rho)\beta(\rho) - c \cdot \alpha(\rho)\beta(\rho)} > 0. \qquad (10)$$
For instance, if for some $\rho > 0$,
$$\alpha(\rho)\gamma \le \frac{1}{15} \quad \text{and} \quad \alpha(\rho)\beta(\rho) \le \frac{1}{5}, \qquad (11)$$
then the conditions are satisfied for $c = 2$ provided that $\lambda$ is chosen to satisfy
$$5\gamma \le \lambda \le \frac{1}{3\alpha(\rho)}. \qquad (12)$$
Note that (11) can be satisfied when $c_1 \le c_2^{-1}/15$ in Proposition 1.
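Since the conditions above are stated entirely in terms of $\alpha(\rho)$, $\beta(\rho)$, and $\gamma$, they are easy to evaluate numerically for a given target pair. The sketch below is our illustration (the function names are our own): it computes the three quantities for a candidate $(\bar{X}_S, \bar{X}_L)$ and, when the simplified constrained-formulation condition (11) holds, returns a $\lambda$ from the admissible range (12).

```python
# Illustrative sketch (ours): evaluate alpha(rho), beta(rho), gamma and check condition (11).
import numpy as np

def incoherence_quantities(XS_bar, XL_bar, rho):
    S = np.sign(XS_bar)
    # alpha(rho): balanced induced norms of sign(XS_bar), i.e. max column / row counts.
    alpha = max(rho * np.abs(S).sum(axis=0).max(),      # ||sign(XS)||_{1->1}
                np.abs(S).sum(axis=1).max() / rho)      # ||sign(XS)||_{inf->inf}
    # Thin SVD of XL_bar restricted to its non-zero singular values.
    U, s, Vt = np.linalg.svd(XL_bar, full_matrices=False)
    r = int((s > 1e-12 * s.max()).sum())
    U, V = U[:, :r], Vt[:r, :].T
    beta = (np.abs(U @ U.T).max() / rho + rho * np.abs(V @ V.T).max()
            + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())
    gamma = np.abs(U @ V.T).max()                        # ||U V^T||_vec(inf)
    return alpha, beta, gamma

def lambda_for_constrained(XS_bar, XL_bar):
    """If condition (11) holds with rho = sqrt(n/m), return a lambda inside (12)."""
    m, n = XS_bar.shape
    rho = np.sqrt(n / m)
    alpha, beta, gamma = incoherence_quantities(XS_bar, XL_bar, rho)
    if alpha * gamma <= 1 / 15 and alpha * beta <= 1 / 5:
        return 0.5 * (5 * gamma + 1 / (3 * alpha))       # any value in [5*gamma, 1/(3*alpha)]
    return None
```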

4 In summary, Proposition 1 shows that our results can It is possible to modify the constraints in (1) to use be applied even with m0 = Ω(m/r¯) and n0 = Ω(n/r¯) norms other than k · kvec(1) and k · k∗; the analysis could corruptions. In contrast, the results of [1] only apply un- at the very least be modified by simply using standard p der the condition max(m0, n0) = O( min(m, n)/r¯), relationships to change between norms, although this which is significantly stronger. Moreover, unlike the may introduce new slack in the bounds. Finally, the analysis of [3], we do not have to assume that supp(X¯S) second part of the theorem shows how the accuracy of is random. XˆL in Frobenius norm can be improved by adding an The following theorem gives our recovery guarantee additional constraint or by post-processing the solution. for the constrained formulation (1). Now we state our recovery guarantees for the regular- ¯ ¯ m×n ized formulation (2). Theorem 2. Fix a target pair (XS, XL) ∈ R × m×n ¯ ¯ ¯ ¯ m×n R satisfying kY − (XS + XL)kvec(1) ≤ vec(1) and Theorem 3. Fix a target pair (XS, XL) ∈ R × ¯ ¯ m×n ¯ ¯ kY − (XS + XL)k∗ ≤ ∗. Assume the conditions (8), R . Let E := Y − (XS + XL) and (9), and (10) hold for some ρ > 0 and c > 1. ˆ ˆ m×n  := kEk Let (XS, XL) ∈ R be the solution to the convex 2→2 2→2 optimization problem (1). We have vec(∞) := kEkvec(∞) + kPT¯(E)kvec(∞) 0 n ˆ ¯ ˆ ¯ o ∗ := kPT¯(E)k∗. max kXS − XSkvec(1), kXL − XLkvec(1) ¯   Let k := | supp(X¯S)| and r¯ := rank(X¯L). Assume the −1 2 − α(ρ)β(ρ) ≤ 1 + (1 − 1/c) · · vec(1) conditions (3), (4), and (5) hold for some ρ > 0 and c > 1 − α(ρ)β(ρ) ˆ ˆ m×n 1. Let (XS, XL) ∈ R be the solution to the convex −1 2 − α(ρ)β(ρ) + (1 − 1/c) · · ∗/λ. optimization problem (2) augmented with the constraint 1 − α(ρ)β(ρ) ¯ kXS − Y kvec(∞) ≤ b for some b ≥ kXS − Y kvec(∞) ¯ If, in addition for some b ≥ kXLkvec(∞), either: (b = ∞ is allowed). Let • the optimization problem is augmented with the (1) r¯0 := constraint kX k ≤ b (and letting X := L vec(∞) eL   ¯   ˆ vec(∞) 2k vec(∞) XL), or λ + · · λ + γ + µ 1 − α(ρ)β(ρ) µ • XˆL is post-processed by replacing [XˆL]i,j with ˆ −1  [XeL]i,j := min{max{[XL]i,j, −b}, b} for all i, j, + 1 + 2µ 2→2 · 2¯r     then we also have 2α(ρ) vec(∞) 22→2 · · λ + γ + + 1 + . 1 − α(ρ)β(ρ) µ µ ¯ kXeL − XLkvec(2)  q  We have ˆ ¯ ˆ ¯ ≤ min kXL − XLkvec(1), 2b · kXL − XLkvec(1) . kXˆS − X¯Sk vec(1) √ The proof of Theorem 2 is in Section V. It is clear that r¯0·µ ¯ ¯ ¯ (1−1/c)λ + λk · µ + 2 kr¯ · µ + k · vec(∞) if Y = X¯ + X¯ , then we can set  =  = 0 and ≤ S L vec(1) ∗ 1 − α(ρ)β(ρ) we obtain exact recovery: XˆS = X¯S and XˆL = X¯L. kXˆ − X¯ k Moreover, any perturbation Y − (X¯S + X¯L) affects S S vec(2) the accuracy of (Xˆ , Xˆ ) in entry-wise 1-norm by an  q  S L ≤ min kXˆ − X¯ k , 2b · kXˆ − X¯ k amount O( +  /λ). Note that here, the parameter S S vec(1) S S vec(1) vec(1) ∗ √ λ serves to balance the entry-wise 1-norm and trace ˆ ¯ ˆ ¯ 0 kXL − XLk∗ ≤ 2¯r · kXS − XSkvec(2) + ∗ norm of the perturbation in the same way it is used r¯0 · (1 − 1/c)−1  in the objective function of (1). So, for instance, if we + + 2¯r · µ. have the simplified conditions (11), then we may choose 2 p λ = (5/3)γ/α(ρ) to satisfy (12), upon which the error The proof of Theorem 3 is in Section VI. As before, bound becomes if Y = X¯S + X¯L so E = 0, then we can set µ → 0 and n o obtain exact recovery with Xˆ = X¯ and Xˆ = X¯ . max kXˆ − X¯ k , kXˆ − X¯ k S S L L S S vec(1) L L vec(1) When the perturbation E is non-zero, we control the s ! 
¯ α(ρ) accuracy of XS in entry-wise 1-norm and 2-norm, and = O  + ·  . the accuracy of X¯ in trace norm. Under the simplified vec(1) γ ∗ L conditions (6), we can choose λ = (15/82)/α(ρ) and

5 µ = max{42→2, 2vec(∞)/(15λ)} to satisfy (7); this at random over all families of r¯ orthonormal vectors in leads to the error bounds Rm and Rn, respectively. Using arguments similar to ˆ ¯  those in [6], one can show that with high probability, kXS − XSkvec(1) = O rα¯ (ρ) max{2→2, α(ρ)vec(∞)} ˆ ¯ r¯log m kXL − XLk∗ = ¯ ¯ > kUU kvec(∞) = O √ q  m ˆ ¯ ˆ ¯ O r¯min b · kXS − XSkvec(1), kXS − XSkvec(1) r¯log n kV¯ V¯ >k = O  vec(∞) n 0  +  +r ¯ · max 2→2, α(ρ)vec(∞) r ! ∗ r¯log m kU¯k2→∞ = O (here, we have used the facts k¯ ≤ α(ρ)2, α(ρ)λ = Θ(1), m 0 ¯ r ! and r¯ = O(¯r), which also implies that k · vec(∞) = r¯log n kV¯ k = O , O(α(ρ)·α(ρ)vec(∞))). Finally, note that if the constraint 2→∞ n kXS − Y kvec(∞) ≤ b is added (i.e., b < ∞), then the ¯ requirement b ≥ kXS − Y kvec(∞) can be satisfied with ¯ so for the previously chosen ρ, we have b := kXSkvec(∞) + vec(∞). This allows for a possibly ˆ ¯ r ! improved bound on kXL − XLk∗. (log m)(log n) Our analysis centers around the construction of a dual β(ρ) = O r¯ and mn certificate using a least-squares method similar to that r ! in related works [1], [3]. The construction requires the (log m)(log n) γ = O r¯ . invertibility of PΩ¯ ◦ PT¯ (a composition of projection mn operators), which is established in our analysis by study- ing certain operator norms of PΩ¯ and PT¯ (in previous Therefore works, invertibility is established only under probabilis- ! tic assumptions [3] or stricter sparsity conditions [1]). k˜r¯(log m)(log n) α(ρ)β(ρ) = O and The rest of the analysis then relates the accuracy of the mn solutions to (1) and (2) to properties of the constructed ! dual certificate. k˜r¯(log m)(log n) α(ρ)γ = O , mn D. Examples both of which are  1 provided that We illustrate our main results with some simple ex- amples. mn k˜ ≤ δ · 1) Random models: We first consider a random model r¯(log m)(log n) for the matrices X¯S and X¯L [1]. Let the support of X¯S be chosen uniformly at random k˜ times over the [m]×[n] for a small enough constant δ ∈ (0, 1). In other words, matrix entries (so that one entry can be selected multiple when X¯L is low-rank, the matrix X¯S can have nearly times). The value of the entries in the chosen support can a constant fraction of its entries be non-zero while still be arbitrary. With high probability, we have allowing for exact decomposition of Y = X¯S +X¯L. Our guarantee improves over that of [1] by roughly a factor of ˜ ! k log n Ω((mn)1/4), but is worse by a factor of r¯(log m)(log n) k sign(X¯S)k1→1 = O and n relative to the guarantees of [3] for the random model. ! Therefore there is a gap between our generic determin- k˜ log m k sign(X¯S)k∞→∞ = O istic analysis and a direct probabilistic analysis of this m random model, and this gap seems unavoidable with sparsity conditions based on α(ρ). This is because X¯ so for ρ := p(n log m)/(m log n), we have L could be an n × n (for simplicity) block r ! (log m)(log n) with r blocks of n/r × n/r rank-1 matrices; such a α(ρ) = O k˜ . 2 mn matrix guarantees β(1) = O(r/n) but has just n /r non-zero entries. It is an interesting open problem to The logarithmic factors are due to collisions in the find alternative characterizations of supp(X¯S) that can random process. Now let U¯ and V¯ be chosen uniformly narrow or close this gap.
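The random model just described is easy to simulate. The following sketch is ours, not from the paper: it draws the support of $\bar{X}_S$ by $\tilde{k}$ independent uniform draws over the entries (so collisions are allowed, as in the text) and builds $\bar{X}_L$ from uniformly random orthonormal singular vector matrices; the particular choice of entry values and singular values is arbitrary, as the model permits.

```python
# Illustrative sketch (ours): a synthetic instance from the random model of Section II-D.1.
import numpy as np

def random_model_instance(m, n, k_tilde, r, seed=None):
    rng = np.random.default_rng(seed)
    # Sparse component: k_tilde uniform draws of entry positions (collisions possible),
    # with arbitrary (here standard normal) values on the chosen support.
    XS = np.zeros((m, n))
    rows = rng.integers(0, m, size=k_tilde)
    cols = rng.integers(0, n, size=k_tilde)
    XS[rows, cols] = rng.standard_normal(k_tilde)
    # Low-rank component: U, V uniformly random orthonormal r-frames (QR of Gaussians),
    # with arbitrary positive singular values.
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    XL = U @ np.diag(1.0 + rng.random(r)) @ V.T
    return XS, XL, XS + XL          # (X_S_bar, X_L_bar, Y)
```

Such instances can be fed to the solver sketches given after the optimization formulations to check exact recovery empirically.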

6 2) Principal component analysis with sparse corrup- III.TECHNICALPRELIMINARIES tions: Suppose X¯ is matrix of m data points lying in a L A. Norms, inner products, and projections low-dimensional subspace of Rn, and Z is a random matrix with independent Gaussian noise entries with Our analysis involves a variety of norms of vectors, 2 0 variance σ . Then Y = X¯L + Z is the standard matrices (viewed as elements of a as well model for principal component analysis. We augment as linear operators of vectors), and linear operators of the model with a sparse noise component X¯S to obtain matrices; we define these and related notions in this Y = X¯S + X¯L + Z; here, we allow the non-zero entries section. of X¯S to possibly approach infinity. 1) Entry-wise norms: For any p ∈ [1, ∞], define P p 1/p According to Theorem 3, we need to estimate kvkp := ( i |vi| ) be the p-norm of a vector v kZk2→2, kZkvec(∞), kPT¯(Z)kvec(∞), and kPT¯(Z)k∗. (with kvk∞ := maxi |vi|). Also, define kMkvec(p) := P p 1/p We have the following with high probability [10], ( |Mi,j| ) to be the entry-wise p-norm of a √ √ i,j matrix M (again, with kMkvec(∞) := maxi,j |Mi,j|). kZk2→2 ≤ σ m + σ n + O(σ). Note that k · kvec(2) corresponds to the Frobenius norm. Using standard arguments with the rotational invariance 2) Inner products, linear operators, and orthogonal of the Gaussian distribution, we also have m×n projections: We endow R with the inner product kZkvec(∞) ≤ O(σ log(mn)) and h·, ·i between matrices that induces the Frobenius norm > k · kvec(2); this is given by hM,Ni = tr(M N). kPT¯(Z)kvec(∞) ≤ O(σ log(mn)) For a linear operator T : Rm×n → Rm×n, we denote with high probability. Finally, by Lemma 5, we have its adjoint by T ∗; this is the unique linear operator that √ √ satisfies hT ∗(M),Ni = hM, T (N)i for all M ∈ m×n kPT¯(Z)k∗ ≤ 2¯rkZk2→2 ≤ 2¯rσ m + 2¯rσ n + O(¯rσ). R √ and N ∈ m×n (in this work, we only consider bounded Suppose (X¯ , X¯ ) has α(ρ) ≤ c ( mn/r¯), β(ρ) = R √ S L √ 1 linear operators). For any two linear operators T and Θ(¯r/ mn), and γ = Θ(¯r/ mn) and satisfies the 1 T , we let T ◦ T denote their composition as defined simplified condition (6). This can be achieved with 2 1 2 by (T ◦ T )(M) := T (T (M)). c c ≤ 1/41 in Proposition 1. Also assume λ and µ are 1 2 1 2 1 2 Given a subspace W ⊆ m×n, we let W ⊥ denote its chosen to satisfy (7), and that b ≥ kX¯ k + . R L vec(∞) vec(∞) orthogonal complement, and let P : m×n → m×n Then we note that k¯ = O(c2mn/r¯2), and thus have from W R R 1 denote the orthogonal projector to W with respect to Theorem 3 (see the discussion thereafter): h·, ·i, i.e., the unique linear operator with range W and ˆ ¯ ∗ kXS − XSkvec(1) satisfying PW = PW and PW ◦ PW = PW . √ √ √ √  = O c1 mn max{σ m + σ n, σ mn log(mn)/r¯} 3) Induced norms: For any two vector norms k · kp and k · kq, define kMkp→q := maxx6=0 kMxkq/kxkp to = O (σc1mn log(mn)/r¯) be the corresponding induced operator norm of a matrix ˆ ¯ p kXL − XLk∗ = O bσc1mn log(mn)/r¯ M. Our analysis uses the following special cases which √ √ √  +rσ ¯ ( m + n)) + c1 mn , have alternative definitions: • where we may take b = O(σ log(mn) + kXLkvec(∞))). kMk1→1 = maxj kMejk1, Now consider the situation where both m, n → • kMk1→2 = maxj kMejk2, ¯ • ∞, and assume that kXLkvec(∞) remains bounded. 
If kMk2→2 = spectral norm of M 2 c1(log(mn)) = o(1)—which means that the number (i.e., largest of M), > of corruptions per column is o(m/(log(mn))2) and • kMk2→∞ = maxi kM eik2, and > the number of deterministic corruptions per row is • kMk∞→∞ = maxi kM eik1. 2 o(n/(log(mn)) )—then Here, ei is the ith coordinate vector which has a 1 in the √ √ ith position and 0 elsewhere. kXˆL − X¯Lk∗ = O(¯rσ( m + n)) Finally, we also consider induced operator norms so the normalized trace norm error of Xˆ tends to zero m×n m×n L of linear matrix operators T : R → R (in 1 particular, projection operators with respect to h·, ·i). √ kXˆL − X¯Lk∗ → 0. mn For any two matrix norms k · k♦ and k · k♥, define This means that we can correctly recover the principal kT k♦→♥ := maxM6=0 kT (M)k♥/kMk♦. components of X¯L with both deterministic corruptions 4) Other norms: The trace norm (or nuclear norm) and random noise, when both m and n are large and kMk∗ of a matrix M is the sum of the singular values 2 c1(log(mn)) = o(1) in Proposition 1. of M. We will also make use of a hybrid matrix norm

7 k · k](ρ), parametrized by ρ > 0, which we define by −1 The following lemma is the dual of Lemma 2. kMk](ρ) := max{ρkMk1→1, ρ kMk∞→∞}. Lemma 3. For any M ∈ m×n, we have for all ρ > 0, Also define kMk := sup hM,Ni, i.e., the R [(ρ) kNk](ρ)≤1 dual of k · k](ρ) (see below). kMk[(ρ) ≤ kMk∗. 5) Dual pairs: The matrix norm k · k♥ is said to m×n be dual to k · k♠ if, for all M ∈ R , kMk♥ = Proof: We know that kMk[(ρ) = hM,Ni for some sup hM,Ni matrix N such that kNk](ρ) = 1. Therefore kNk2→2 ≤ kNk♠≤1 . 1 from Lemma 2, and thus using Proposition 2, Proposition 2. Fix any matrix norm k·k♠, and let k·k♥ m×n m×n be its dual. For all M ∈ R and N ∈ R , we have kMk[(ρ) = hM,Ni ≤ kMk∗kNk2→2 ≤ kMk∗.
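All of the norms catalogued in this section reduce to a few lines of numpy. The sketch below is our illustration (function names are ours): it implements the special-case induced norms, the entry-wise norms, and the hybrid $\sharp(\rho)$ norm, and numerically exercises the inequality of Lemma 2 on a random matrix.

```python
# Illustrative sketch (ours): the matrix norms of Section III and the Lemma 2 inequality.
import numpy as np

def norm_1to1(M):    return np.abs(M).sum(axis=0).max()                # max column abs sum
def norm_inftoinf(M): return np.abs(M).sum(axis=1).max()               # max row abs sum
def norm_2to2(M):    return np.linalg.svd(M, compute_uv=False)[0]      # spectral norm
def norm_2toinf(M):  return np.linalg.norm(M, axis=1).max()            # max row 2-norm
def norm_vec(M, p):  return np.linalg.norm(M.ravel(), p)               # entry-wise p-norm
def norm_sharp(M, rho):                                                # hybrid #(rho) norm
    return max(rho * norm_1to1(M), norm_inftoinf(M) / rho)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((30, 50))
    for rho in (0.5, 1.0, 2.0):
        assert norm_2to2(M) <= norm_sharp(M, rho) + 1e-9               # Lemma 2
```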

hM,Ni ≤ kMk♠kNk♥. Proposition 3. Fix any any linear matrix operator T : Finally we state a lemma concerning the invertibility m×n m×n of a certain block-form operator used in our analysis. R → R and any pair of matrix norms k · k♠ and k · k . We have m×n ♣ Lemma 4. Fix any matrix norm k · k♠ on R ∗ m×n m×n kT k♠→♣ = kT k♦→♥, and linear operators T1 : R → R and T2 : Rm×n → Rm×n. Let I : Rm×n → Rm×n be the identity where k·k♥ is dual to k·k♠, and k·k♦ is dual to k·k♣. operator, and suppose kT1 ◦ T2k♠→♠ < 1.

The following pairs of matrix norms are dual to each 1) I − T1 ◦ T2 is invertible and satisfies other: −1 1 1) k · kvec(p) and k · kvec(q) where 1/p + 1/q = 1; k(I − T1 ◦ T2) k♠→♠ ≤ . 1 − kT1 ◦ T2k♠→♠ 2) k · k∗ and k · k2→2; m×n m×n 3) k · k](ρ) and k · k[(ρ) (by definition). 2) The linear operator on R × R 6) Some lemmas: First we show that the k·k](ρ) norm   IT1 (for any ρ > 0) bounds the spectral norm k · k2→2. T2 I m×n Lemma 2. For any M ∈ R , we have for all ρ > 0, is invertible, and its inverse is given by kMk ≤ kMk . 2→2 ](ρ)  −1 IT1 Proof: Let σ be the largest singular value of M, T I m n 2 and let u ∈ R and v ∈ R be, respectively, associated  −1 −1  (I − T1 ◦ T2) −(I − T1 ◦ T2) ◦ T1 left and right singular vectors. Then = −1 −1 . −(I − T2 ◦ T1) ◦ T2 (I − T2 ◦ T1)  0 ρM   ρ1/2u  Proof: The first claim is a standard application ρ−1M > 0 ρ−1/2v 1 of Taylor expansions. The second claim then follows  ρ1/2Mv   ρ1/2u  = = σ . from formulae of inverses using Schur ρ−1/2M >u ρ−1/2v 1 1 complements.
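Lemma 4 is the mechanism later used to invert $I - P_{\bar{\Omega}} \circ P_{\bar{T}}$, and its proof (a Taylor, i.e. Neumann, expansion) also suggests how the inverse can be applied in practice: sum the series $\sum_k (T_1 \circ T_2)^k$, which converges when the composition is a contraction. A small sketch of this idea, ours and with the operators supplied as callables on matrices:

```python
# Illustrative sketch (ours): applying (I - T1∘T2)^{-1} via the Neumann series of Lemma 4.
import numpy as np

def apply_inverse(T1, T2, M, tol=1e-12, max_terms=10_000):
    """Return (I - T1∘T2)^{-1}(M) = sum_k (T1∘T2)^k(M), valid when ||T1∘T2|| < 1
    in some operator norm (e.g. the contraction of Lemma 9)."""
    term = M.copy()
    out = M.copy()
    for _ in range(max_terms):
        term = T1(T2(term))          # next term of the series
        out += term
        if np.abs(term).max() <= tol:
            break
    return out
```

In the dual-certificate construction of Section IV-B, $T_1$ and $T_2$ would be the projectors $P_{\bar{\Omega}}$ and $P_{\bar{T}}$, with the required contraction supplied by Lemma 9.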

Moreover, by definition of k · k1→1,

   1/2  B. Projection operators and subdifferential sets 0 ρM ρ u ρ−1/2M > 0 ρ−1/2v Recall the definitions of the following subspaces 1    1/2  m×n 0 ρM ρ u Ω(XS) := {X ∈ : supp(X) ⊆ supp(XS)} ≤ . R ρ−1M > 0 ρ−1/2v 1→1 1 and Therefore m×n   T (XL) := {X1 + X2 ∈ R : 0 ρM kMk2→2 = σ ≤ −1 > range(X ) ⊆ range(X ), ρ M 0 1 L 1→1 > > −1 > range(X2 ) ⊆ range(XL )}. = max{kρ M k1→1, kρMk1→1} −1 = max{ρ kMk∞→∞, ρkMk1→1} The orthogonal projectors to these spaces are given in the following proposition. = kMk](ρ).

8 m×n m×n Proposition 4. Fix any XS ∈ R and XL ∈ R . Proposition 5. The subdifferential set of XS 7→ m×n For any matrix M ∈ R , kXSkvec(1) is  Mi,j if (i, j) ∈ supp(XS) ∂X (kXSkvec(1)) [PΩ(X )(M)]i,j = S S 0 otherwise m×n = {G ∈ R : kGkvec(∞) ≤ 1, PΩ(XS )(G) for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, and = sign(XS)};

> > > > PT (XL)(M) = UU M + MVV − UU MVV the subdifferential set of XL 7→ kXLk∗ is where U and V are the matrices of left and right singular m×n ∂XL (kXLk∗) = {G ∈ R : vectors of XL. kGk2→2 ≤ 1, PT (XL)(G) = orth(XL)}. Lemma 5. Under the setting of Proposition 4, with k := | supp(XS)|, The following lemma is a simple consequence of √ subgradient properties. kP (M)k ≤ kkP (M)k Ω(XS ) vec(1) √ Ω(XS ) vec(2) Lemma 6. Fix λ > 0 and define the function ≤ kkMkvec(2) g(XS,XL) := λkXSkvec(1) + kXLk∗. Consider any (X¯ , X¯ ) in m×n × m×n. If there exists Q ∈ m×n kPΩ(XS )(M)kvec(1) ≤ kkPΩ(XS )(M)kvec(∞) S L R R R such that: Q is a subgradient of λkXSkvec(1) at XS = ≤ kkMkvec(∞) X¯S, Q is a subgradient of kXLk∗ at XL = X¯L, and kPT (XL)(M)k2→2 ≤ 2kMk2→2 ⊥ ⊥ kPΩ(X¯S ) (Q)kvec(∞) ≤ λ/c and kPT (X¯L) (Q)k2→2 ≤ kPT (XL)(M)k∗ ≤ 2 rank(XL)kMk2→2 1/c for some c > 1, then p kPT (XL)(M)kvec(2) ≤ 2 rank(XL)kMk2→2. g(XS,XL) − g(X¯S, X¯L)

Proof: The first and second claims rely on the fact ≥ hQ, XS + XL − X¯S − X¯Li that | supp(PΩ(X )(M))| ≤ | supp(XS)|, as well as the ¯ S + (1 − 1/c)λkPΩ¯ ⊥ (XS − XS)kvec(1) fact that P is an orthonormal projector with respect Ω(XS ) ¯ + (1 − 1/c)kPT¯⊥ (XL − XL)k∗ to the inner product that induces the k·kvec(2) norm. For the third claim, note that m×n m×n for all (XS,XL) ∈ R × R . ¯ ¯ ¯ ¯ kPT (XL)(M)k2→2 Proof: Let Ω := Ω(XS), T := T (XL), ∆S := ¯ ¯ ≤ kUU >Mk + k(I − UU >)MVV >k XS − XS, and ∆L : XL − XL. For any subgradient 2→2 2→2 ¯ G ∈ ∂XS (λkXSkvec(1)), we have G − Q = PΩ¯ (G) + ≤ 2kMk2→2. PΩ¯ ⊥ (G) − PΩ¯ (Q) − PΩ¯ ⊥ (Q) = PΩ¯ ⊥ (G) − PΩ¯ ⊥ (Q). The remaining claims use a similar decomposition as Therefore the third claim as well as the fact that rank(UU >M) ≤ λkX¯ + ∆ k − λkX¯ k − hQ, ∆ i rank(X ) and rank((I−UU >)MVV >)} ≤ rank(X ). S S vec(1) S vec(1) S L L ¯ ≥ sup{hG, ∆Si − hQ, ∆Si : G ∈ ∂XS (λkXSkvec(1))} ¯ Define ≥ sup{hG − Q, ∆Si : G ∈ ∂XS (λkXSkvec(1))}

= sup{hP¯ ⊥ (G) − P¯ ⊥ (Q), ∆ i : sign(X ) ∈ {−1, 0, +1}m×n Ω Ω S S ¯ G ∈ ∂XS (λkXSkvec(1))} to be the matrix whose (i, j)th entry is sign([XS]i,j), = sup{hPΩ¯ ⊥ (G) − PΩ¯ ⊥ (Q), PΩ¯ ⊥ (∆S)i : and define ¯ G ∈ ∂X (λkXSkvec(1))} > S orth(XL) := UV , = sup{hPΩ¯ ⊥ (G), PΩ¯ ⊥ (∆S)i ¯ where U and V , respectively, are matrices of the left and − hPΩ¯ ⊥ (Q), PΩ¯ ⊥ (∆S)i : G ∈ ∂XS (λkXSkvec(1))} right orthonormal singular vectors of XL corresponding = λkPΩ¯ ⊥ (∆S)kvec(1) − hPΩ¯ ⊥ (Q), PΩ¯ ⊥ (∆S)i to non-zero singular values. The following proposition ≥ λkPΩ¯ ⊥ (∆S)kvec(1) characterizes the subdifferential sets for the non-smooth − kPΩ¯ ⊥ (Q)kvec(∞)kPΩ¯ ⊥ (∆S)kvec(1) norms k · kvec(1) and k · k∗ [11]. ≥ λ(1 − 1/c)kPΩ¯ ⊥ (∆S)kvec(1)

9 where the second-to-last inequality uses the duality of This implies, for all ρ > 0, k · kvec(1) and k · kvec(∞) and Proposition 3. Similarly, kPΩ¯ kvec(∞)→](ρ) ≤ α(ρ). ¯ ¯ m×n kXL − ∆Lk∗ − kXLk∗ − hQ, ∆Li Proof: Define s(XS) ∈ {0, 1} to be the entry-

≥ (1 − 1/c)kPT¯⊥ (∆L)k∗ wise absolute value of sign(XS). We have by noting the duality of k · k∗ and k · k2→2. Combining kPΩ¯ (M)kp→p = max{kPΩ¯ (M)vkp : kvkp ≤ 1} these gives the desired inequality. ≤ kPΩ¯ (M)kvec(∞)

max{ks(PΩ¯ (M))vkp : kvkp ≤ 1} IV. RANK-SPARSITY INCOHERENCE ≤ kMkvec(∞) Throughout this section, we fix a target (X¯S, X¯L) ∈ max{ks(X¯S)vkp : kvkp ≤ 1} m×n × m×n, and let Ω¯ := Ω(X¯ ) and T¯ := T (X¯ ). R R S L = kMk k sign(X¯ )k . Also let U¯ and V¯ be, respectively, matrices of the left and vec(∞) S p→p ¯ right singular vectors of XL corresponding to non-zero The second part follows from the definitions of k · k](ρ) singular values. Recall the following structural properties and α(ρ). of X¯S and X¯L: Lemma 8. For any M ∈ Rm×n, we have ¯ α(ρ) := k sign(XS)k](ρ) kP ¯(M)kvec(∞) ¯ −1 ¯ T = max{ρk sign(XS)k1→1, ρ k sign(XS)k∞→∞}; ¯ ¯ > ¯ ¯ > ≤ kUU kvec(∞)kMk1→1 + kV V kvec(∞)kMk∞→∞ −1 ¯ ¯ > ¯ ¯ > β(ρ) := ρ kUU kvec(∞) + ρkV V kvec(∞) + kU¯k2→∞kV¯ k2→∞kMk2→2. + kU¯k2→∞kV¯ k2→∞; This implies, for all ρ > 0, ¯ ¯ ¯ > γ := k orth(XL)kvec(∞) = kUV kvec(∞). kPT¯k](ρ)→vec(∞) ≤ β(ρ). The parameter ρ is a balancing parameter to handle ¯ ¯ > disparity between row and column dimensions. The Proof: We have kPT¯(M)kvec(∞) = kUU M + ¯ ¯ > ¯ ¯ > ¯ ¯ > ¯ ¯ > quantity α(ρ) is the maximum number of non-zero MV V − UU MV V kvec(∞) ≤ kUU Mkvec(∞) + ¯ ¯ > ¯ ¯ > ¯ ¯ > entries in any single row or column. The quantities β(ρ) kMV V kvec(∞) + kUU MV V kvec(∞) by the trian- and γ measure the coherence of the singular vectors of gle inequality. The bounds for each term now follow from the definitions: X¯L, that is, the alignment of the singular vectors with the coordinate basis. For instance, under the conditions ¯ ¯ > > ¯ ¯ > kUU Mkvec(∞) = max kM UU eik∞ of Proposition 1, we have (with ρ = pn/m) i > > ≤ kM k∞→∞ max kU¯U¯ eik∞ √ i α(ρ) ≤ c1 mn, = kMk kU¯U¯ >k ; 3c rank(X¯ ) c rank(X¯ ) 1→1 vec(∞) β (ρ) ≤ 2 √ L and γ ≤ 2 √ L mn mn ¯ ¯ > ¯ ¯ > kMV V kvec(∞) = max kMV V ejk∞ j for some constants c1 and c2. > ≤ kMk∞→∞ max kV¯ V¯ ejk∞ j ¯ ¯ A. Operator norms of projection operators = kMk∞→∞kV V kvec(∞);

We show that under the condition infρ>0 α(ρ)β(ρ) < and 1, the pair (X¯ , X¯ ) is identifiable from its sum X¯ + S L S ¯ ¯ > ¯ ¯ > kUU MV V kvec(∞) X¯L (Theorem 1). This is achieved by proving that the > ¯ ¯ > ¯ ¯ > composition of projection operators PΩ¯ and PT¯ is a = max |ei U(U MV )V ej| i,j contraction as per Lemma 1, which in turn implies that ¯ > ¯ > ¯ ¯ > ¯ ¯ ≤ max kU eik2kU MV k2→2kV ejk2 Ω ∩ T = {0}. i,j The following two lemmas bound the projection op- ≤ kMk2→2kU¯k2→∞kV¯ k2→∞ erators PΩ¯ and PT¯ in complementary norms. ¯ ¯ ≤ kMk](ρ)kUk2→∞kV k2→∞ Lemma 7. For any M ∈ Rm×n and p ∈ {1, ∞}, we have where the second step follows by Cauchy-Schwarz, and the fourth step follows from Lemma 2. The second part ¯ kPΩ¯ (M)kp→p ≤ k sign(XS)kp→pkMkvec(∞). now follows the definition of β(ρ).
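The two projectors, and the contraction that drives Theorem 1, can also be checked numerically. The following sketch is ours: it implements $P_{\bar{\Omega}}$ and $P_{\bar{T}}$ exactly as defined in Section II-B and verifies the bound of Lemma 1 (Lemma 10) on random inputs.

```python
# Illustrative sketch (ours): the projectors P_Omega, P_T and the contraction of Lemma 1.
import numpy as np

def proj_omega(M, support_mask):
    """P_Omega: keep entries on supp(XS_bar), zero elsewhere."""
    return M * support_mask

def proj_T(M, U, V):
    """P_T: U U^T M + M V V^T - U U^T M V V^T, with U, V the singular vector matrices."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

def check_contraction(XS_bar, XL_bar, rho, trials=20, seed=None):
    rng = np.random.default_rng(seed)
    mask = (XS_bar != 0).astype(float)
    U, s, Vt = np.linalg.svd(XL_bar, full_matrices=False)
    r = int((s > 1e-12 * s.max()).sum())
    U, V = U[:, :r], Vt[:r, :].T
    alpha = max(rho * mask.sum(axis=0).max(), mask.sum(axis=1).max() / rho)
    beta = (np.abs(U @ U.T).max() / rho + rho * np.abs(V @ V.T).max()
            + np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max())
    for _ in range(trials):
        M = rng.standard_normal(XS_bar.shape)
        lhs = np.abs(proj_omega(proj_T(M, U, V), mask)).sum()   # ||P_Omega(P_T(M))||_vec(1)
        assert lhs <= alpha * beta * np.abs(M).sum() + 1e-8     # Lemma 1 / Lemma 10
    return alpha * beta
```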

10 Now we show that the composition of PΩ¯ and PT¯ with E = 0, while the guarantees for the regularized gives a contraction under the certain norms and their formulation takes E = Y − (X¯S + X¯L). duals. Theorem 5. Pick any c > 1, ρ > 0, and E ∈ Rm×n. ¯ Lemma 9. For all ρ > 0, Let k := | supp(X¯S)| and r¯ := rank(X¯L). Let

1) kPΩ¯ ◦ PT¯k](ρ)→](ρ) ≤ α(ρ)β(ρ); 2→2 := kEk2→2 2) kPT¯ ◦ PΩ¯ kvec(∞)→vec(∞) ≤ α(ρ)β(ρ); vec(∞) := kEkvec(∞) + kPT¯(E)kvec(∞). Proof: Immediate from Lemma 7 and Lemma 8. If the following conditions hold: Lemma 10. For all ρ > 0, α(ρ)β(ρ) < 1 (13) 1) kPT¯ ◦ PΩ¯ k[(ρ)→[(ρ) ≤ α(ρ)β(ρ); (1 − α(ρ)β(ρ))(1 − c · µ−1 ) 2) kPΩ¯ ◦ PT¯kvec(1)→vec(1) ≤ α(ρ)β(ρ). λ ≤ 2→2 ∗ ∗ ∗ c · α(ρ) Proof: First note that (PT¯ ◦ PΩ¯ ) = PΩ¯ ◦ PT¯ = −1 α(ρ)µ vec(∞) + α(ρ)γ PΩ¯ ◦ PT¯ because PΩ¯ and PT¯ are self-adjoint, and − (14) ∗ α(ρ) similarly (PΩ¯ ◦PT¯) = PT¯ ◦PΩ¯ . Now the claim follows −1 by Proposition 3 and Lemma 9, using the facts that γ + µ (2 − α(ρ)β(ρ))vec(∞) λ ≥ c · > 0 (15) k · k[(ρ) is dual to k · k](ρ) and that k · kvec(1) is dual 1 − α(ρ)β(ρ) − c · α(ρ)β(ρ) to k · k . vec(∞) (these are a restatement of (3), (4), and (5)), then Note that Lemma 1 is encompassed by Lemma 10. −1 ¯ ¯ Another consequence of these contraction properties is QΩ¯ := (I − PΩ¯ ◦ PT¯) λ sign(XS) − PΩ¯ (orth(XL)) the following uncertainty principle, analogous to one −1  − µ (P¯ ◦ P ¯⊥ )(E) ∈ Ω¯ and stated by [1], which effectively states that a matrix X Ω T −1 ¯ ¯ QT¯ := (I − PT¯ ◦ PΩ¯ ) orth(XL) − λPT¯(sign(XS)) cannot have both k sign(X)k](ρ) and k orth(X)kvec(∞) −1  ¯ simultaneously small. − µ (PT¯ ◦ PΩ¯ ⊥ )(E) ∈ T

Theorem 4. If X = X¯S = X¯L 6= 0, then are well-defined and satisfy infρ>0 α(ρ)β(ρ) ≥ 1. −1 ¯ PΩ¯ (QΩ¯ + QT¯ + µ E) = λ sign(XS) −1 ¯ Proof: Note that the non-zero element X lives in PT¯(QΩ¯ + QT¯ + µ E) = orth(XL) Ω¯ ∩ T¯, so we get the conclusion by the contrapositive of and Theorem 1. −1 kPΩ¯ ⊥ (QΩ¯ + QT¯ + µ E)kvec(∞) ≤ λ/c −1 kPT¯⊥ (QΩ¯ + QT¯ + µ E)k2→2 ≤ 1/c. B. Dual certificate Moreover, The incoherence properties allow us to construct an α(ρ) approximate dual certificate (Q¯ ,Q ¯) ∈ Ω¯ × T¯ that is −1  Ω T kQΩ¯ k2→2 ≤ · λ + γ + µ vec(∞) central to the analysis of the optimization problems (1) 1 − α(ρ)β(ρ) and (2). 2α(ρ) −1  kQ ¯k ≤ · λ + γ + µ  The certificate is constructed as the solution to the T 2→2 1 − α(ρ)β(ρ) vec(∞) −1 linear system + 1 + 2µ 2→2  −1 ¯ PΩ¯ (QΩ¯ + QT¯ + µ E) = λ sign(XS) kQT¯k∗ ≤ 2¯rkQT¯k2→2 −1 ¯ PT¯(QΩ¯ + QT¯ + µ E) = orth(XL) 1 −1  kQ ¯k ≤ · λ + γ + µ  T vec(∞) 1 − α(ρ)β(ρ) vec(∞) for some matrix E ∈ Rm×n; this can be equivalently 2 −1  written as kQ¯ k ≤ · λ + γ + µ  Ω vec(∞) 1 − α(ρ)β(ρ) vec(∞)     −1  IP¯ Q¯ λ sign(X¯ ) − µ P¯ (E) Ω Ω S Ω kQ¯ k ≤ k¯kQ¯ k = ¯ −1 . Ω vec(1) Ω vec(∞) PT¯ I QT¯ orth(XL) − µ PT¯(E) 2 −1 −1  kQΩ¯ + QT¯kvec(2) ≤ λkQΩ¯ kvec(1) 1 + µ λ vec(∞) We show the existence of the dual certificate (QΩ¯ ,QT¯) −1  + kQ ¯k∗ 1 + 2µ 2→2 . under the conditions (3), (4), and (5) relative to an T arbitrary matrix E. Recall that the recovery guarantees Remark 1. The dual certificate constitutes an approxi- −1 for the constrained formulation requires the conditions mate subgradient in the sense that QΩ¯ + QT¯ + µ E

11 ¯ is a subgradient of both λkXSkvec(1) at XS = XS, and Now we bound kQT¯kvec(∞) as kXLk∗ at XL = X¯L. Proof: Under the condition (13), we have kQT¯kvec(∞) −1 ¯ ¯ α(ρ)β(ρ) < 1, and therefore Lemma 9 and Lemma 4 = (I − PT¯ ◦ PΩ¯ ) orth(XL) − λPT¯(sign(XS) −1  imply that the operators I − PΩ¯ ◦ PT¯ and I − PT¯ ◦ PΩ¯ ⊥ − µ (PT¯ ◦ PΩ¯ )(E) vec(∞) are invertible and satisfy 1 ¯ ¯ ≤ · orth(XL) − λP ¯(sign(XS) 1 − α(ρ)β(ρ) T −1 −1 1 − µ (PT¯ ◦ PΩ¯ ⊥ )(E) vec(∞) k(I − PΩ¯ ◦ PT¯) k](ρ)→](ρ) ≤ , 1 − α(ρ)β(ρ) 1 ¯ ≤ · k orth(XL)kvec(∞) −1 1 1 − α(ρ)β(ρ) k(I − P ¯ ◦ P¯ ) k ≤ . T Ω vec(∞)→vec(∞) 1 − α(ρ)β(ρ) ¯ + λkPT¯(sign(XS))kvec(∞) −1  + µ k(PT¯ ◦ PΩ¯ ⊥ )(E)kvec(∞) 1 Thus QΩ¯ and QT¯ are well-defined. We can bound ≤ · γ + λα(ρ)β(ρ) + µ−1  1 − α(ρ)β(ρ) vec(∞) kQΩ¯ k2→2 as (Lemma 9).

kQΩ¯ k2→2 ≤ kQΩ¯ k](ρ) (Lemma 2) Above, we have used the bound k(PT¯ ◦ −1 ¯ ¯ P¯ ⊥ )(E)k = kP ¯(E) − (P ¯ ◦ P¯ )(E)k ≤ = (I − PΩ¯ ◦ PT¯) λ sign(XS) − PΩ¯ (orth(XL)) Ω vec(∞) T T Ω vec(∞) −1  kPT¯(E)kvec(∞) + α(ρ)β(ρ)kEkvec(∞) ≤ vec(∞). − µ (P¯ ◦ P ¯⊥ )(E) Ω T ](ρ) Therefore, 1 ¯ ¯ ≤ · λ sign(XS) − PΩ¯ (orth(XL)) 1 − α(ρ)β(ρ) −1 kP¯ ⊥ (Q ¯ + µ E)k −1 Ω T vec(∞) − µ (PΩ¯ ◦ PT¯⊥ )(E) ](ρ) −1 ≤ kQT¯kvec(∞) + µ kPΩ¯ ⊥ (E)kvec(∞) 1 ¯ 1 ≤ · λk sign(XS)k](ρ) ≤ · γ + λα(ρ)β(ρ) + µ−1  1 − α(ρ)β(ρ) 1 − α(ρ)β(ρ) vec(∞) ¯ −1  + kP¯ (orth(XL))k](ρ) + µ k(P¯ ◦ P ¯⊥ )(E)k](ρ) −1 Ω Ω T + µ vec(∞). α(ρ) −1  ≤ · λ + γ + µ kPT¯⊥ (E)kvec(∞) 1 − α(ρ)β(ρ) The condition (15) now implies that this quantity is at (Lemma 7) most λ/c. α(ρ) We also have ≤ · λ + γ + µ−1  . 1 − α(ρ)β(ρ) vec(∞) −1 ¯ kQT¯k2→2 = kPT¯(QΩ¯ + µ E) − orth(XL)k2→2 2α(ρ) −1  Above, we have used the bound kP ¯⊥ (E)k = ≤ · λ + γ + µ  T vec(∞) 1 − α(ρ)β(ρ) vec(∞) kE − PT¯(E)kvec(∞) ≤ vec(∞). Therefore, −1 + 1 + 2µ 2→2

−1 since kP ¯(Q¯ )k ≤ 2kQ¯ k and kP ¯(E)k ≤ kPT¯⊥ (QΩ¯ + µ E)k2→2 T Ω 2→2 Ω 2→2 T 2→2 22→2 by Lemma 5, and > > kPT¯⊥ (E)k2→2 ≤ k(I − U¯U¯ )Q¯ (I − V¯ V¯ )k + Ω 2→2 µ −1 ¯ −1 kQΩ¯ kvec(∞) = kPΩ¯ (QT¯ + µ E) − λ sign(XS)kvec(∞) ≤ kQΩ¯ k2→2 + µ 2→2 1 −1  α(ρ) −1 −1 ≤ · λ + γ + µ vec(∞) ≤ · (λ + γ + µ vec(∞)) + µ 2→2. 1 − α(ρ)β(ρ) 1 − α(ρ)β(ρ) −1 + λ + µ vec(∞).

The condition (14) now implies that this quantity is at The bounds on kQT¯k∗ and kQΩ¯ kvec(1) follow from ¯ most 1/c. the facts that rank(QT¯) ≤ 2¯r and k supp(QΩ¯ )k ≤ k.

12 Finally, Using the triangle inequality, we have 2 kQΩ¯ + QT¯kvec(2) ˆ ˆ = hQΩ¯ , PΩ¯ (QΩ¯ + QT¯)i + hQT¯, PT¯(QΩ¯ + QT¯)i λkXSkvec(1) + kXLk∗ ¯ −1 ¯ ¯ = hQΩ¯ , λPΩ¯ (sign(XS)) − µ PΩ¯ (E)i = λkXS + ∆Skvec(1) + kXL + ∆Lk∗ ¯ −1 ¯ + hQT¯, PT¯(orth(XL)) − µ PT¯(E)i = λkXS + ∆mid + ∆avgkvec(1) −1 −1  ≤ λkQΩ¯ kvec(1) 1 + µ λ kPΩ¯ (E)kvec(∞) + kX¯L − ∆mid + ∆avgk∗ −1  ¯ + kQT¯k∗ 1 + µ kPT¯(E)k2→2 ≥ λkXS + ∆midkvec(1) − λk∆avgkvec(1) −1 −1  ¯ ≤ λkQΩ¯ kvec(1) 1 + µ λ vec(∞) + kXL − ∆midk∗ − k∆avgk∗. −1  + kQT¯k∗ 1 + 2µ 2→2 . ˆ ˆ Now using the fact that λkXSkvec(1) + kXLk∗ ≤ λkX¯ k + kX¯ k gives the claim. V. ANALYSIS OF CONSTRAINED FORMULATION S vec(1) L ∗ Throughout this section, we fix a target decomposition Lemma 13. Let k¯ := | supp(X¯ )|. Assume the condi- ¯ ¯ S (XS, XL) that satisfies the constraints of (1), and let tions of Theorem 5 hold with E = 0. We have (XˆS, XˆL) be the optimal solution to (1). Let ∆S := XˆS − X¯S and ∆L := XˆL − X¯L. We show that under the conditions of Theorem 5 with E = 0 (recall that kP¯ (∆mid)kvec(1) this does not mean we assume Y − X¯ − X¯ = 0) and Ω S L (1 − 1/c)−1 appropriately chosen λ, solving (1) accurately recovers ≤ · (k∆ k + k∆ k /λ). 1 − α(ρ)β(ρ) avg vec(1) avg ∗ the target decomposition (X¯S, X¯L). We decompose the errors into symmetric and anti- symmetric parts ∆avg := (∆S + ∆L)/2 and ∆mid := Proof: ∆ = P¯ (∆ ) + (∆S − ∆L)/2. The constraints allow us to easily bound Because mid Ω mid P¯ ⊥ (∆ ) = P ¯(∆ ) + P ¯⊥ (∆ ) ∆avg, so most of the analysis involves bounding ∆mid Ω mid T mid T mid , we have in terms of ∆avg. the equation

Lemma 11. k∆avgkvec(1) ≤ vec(1) and k∆avgk∗ ≤ ∗.

PΩ¯ (∆mid) − PT¯(∆mid) = −PΩ¯ ⊥ (∆mid) + PT¯⊥ (∆mid). Proof: Since both (XˆS, XˆL) and (X¯S, X¯L) are feasible solutions to (1), we have for ♦ ∈ {vec(1), ∗},

1 k∆avgk♦ = /2k∆S + ∆Lk♦ Separately applying PΩ¯ and PT¯ to both sides gives 1 = /2k(XˆS + XˆL − Y ) − (X¯S + X¯L − Y )k♦   1 ˆ ˆ ¯ ¯       ≤ /2 kX + X − Y k + kX + X − Y k IP¯ P¯ (∆ ) (P¯ ◦ P ¯⊥ )(∆ ) S L ♦ S L ♦ Ω Ω mid = Ω T mid . PT¯ I −PT¯(∆mid) −(PT¯ ◦ PΩ¯ ⊥ )(∆mid) ≤ ♦.

Under the condition α(ρ)β(ρ) < 1, Lemma 10 and Lemma 12. Assume the conditions of Theorem 5 hold Lemma 4 imply that with E = 0. We have

λkPΩ¯ ⊥ (∆mid)kvec(1) + kPT¯⊥ (∆mid)k∗ −1 1 −1  k(I − P¯ ◦ P ¯) kvec(1)→vec(1) ≤ ≤ (1 − 1/c) λk∆avgkvec(1) + k∆avgk∗ . Ω T 1 − α(ρ)β(ρ)

Proof: Let Q := QΩ¯ + QT¯ be the dual certificate guaranteed by Theorem 5. Note that Q satisfies the and that conditions of Lemma 6, so we have ¯ ¯ λkXS + ∆midkvec(1) + kXL − ∆midk∗ −1 P¯ (∆mid) = (I − P¯ ◦ P ¯) (P¯ ◦ P ¯⊥ )(∆mid) − λkX¯Sk − kX¯Lk∗ Ω Ω T Ω T vec(1)   + (PΩ¯ ◦ PT¯ ◦ PΩ¯ ⊥ )(∆mid) . ≥ (1 − 1/c) λkPΩ¯ ⊥ (∆mid)kvec(1) + kPT¯⊥ (∆mid)k∗ .

13 Therefore by Lemma 12 and Lemma 13. The bounds on k∆Skvec(1) and k∆Lkvec(1) follow from the bounds kPΩ¯ (∆mid)kvec(1) on k∆midkvec(1), k∆avgkvec(1), and k∆avgk∗ (from 1 Lemma 11). ≤ · k(PΩ¯ ◦ PT¯⊥ )(∆mid)kvec(1) 1 − α(ρ)β(ρ) If the constraint kX k ≤ b is added, then we  L vec(∞) + k(PΩ¯ ◦ PT¯ ◦ PΩ¯ ⊥ )(∆mid)kvec(1) can use the facts 1 p ˆ ¯ ≤ · k¯ · kP ¯⊥ (∆ )k k∆Lkvec(∞) ≤ kXLkvec(∞) + kXLkvec(∞) 1 − α(ρ)β(ρ) T mid vec(2)  ≤ 2b + α(ρ)β(ρ) · kPΩ¯ ⊥ (∆mid)kvec(1) (Lemma 10) and 1 p¯ ≤ · k · kPT¯⊥ (∆mid)k∗ q 1 − α(ρ)β(ρ) k∆Lkvec(2) ≤ k∆Lkvec(∞)k∆Lkvec(1)  + α(ρ)β(ρ) · kP¯ ⊥ (∆mid)kvec(1) q Ω ≤ 2bk∆ k . (1 − 1/c)−1 p L vec(1) ≤ · max k,¯ α(ρ)β(ρ)/λ 1 − α(ρ)β(ρ) If XˆL is post-processed, then (letting clip(XˆL) be the · (λk∆avgkvec(1) + k∆avgk∗) (Lemma 12) result of the post-processing) (1 − 1/c)−1 |[X ] − [X¯ ] | ≤ |[Xˆ ] − [X¯ ] | ≤ · (k∆ k + k∆ k /λ) eL i,j L i,j L i,j L i,j 1 − α(ρ)β(ρ) avg vec(1) avg ∗ for all i, j, so ¯ 2 where the last inequality uses the facts k ≤ α(ρ) , ¯ kXeL − XLkvec(1) ≤ k∆Lkvec(1) α(ρ)β(ρ) < 1, and λα(ρ) ≤ 1 (this last inequality uses the condition in (14)). and q We now prove Theorem 2, which we restate here for ¯ ¯ kXeL − XLkvec(2) ≤ 2bkXeL − XLkvec(1) convenience. q Theorem 6 (Theorem 2 restated). Assume the conditions ≤ 2bk∆Lkvec(1). of Theorem 5 hold with E = 0. We have

max{k∆Skvec(1), k∆Lkvec(1)} VI.ANALYSIS OF REGULARIZED FORMULATION   −1 2 − α(ρ)β(ρ) Throughout this section, we fix a target decomposition ≤ 1 + (1 − 1/c) · · vec(1) ¯ ¯ ¯ 1 − α(ρ)β(ρ) (XS, XL) that satisfies kXS − Y kvec(∞) ≤ b, and let ˆ ˆ −1 2 − α(ρ)β(ρ) (XS, XL) be the optimal solution to (2) augmented with + (1 − 1/c) · · ∗/λ. ¯ 1 − α(ρ)β(ρ) the constraint kXS −Y kvec(∞) ≤ b for some b ≥ kXS − ˆ ¯ ¯ Y kvec(∞) (b = ∞ is allowed). Let ∆S := XS − XS and If, in addition for some b ≥ kXLkvec(∞), either: ∆L := XˆL − X¯L. We show that under the conditions of • the optimization problem (1) is augmented with the Theorem 5 with E = Y − (X¯S + X¯L) and appropriately constraint kXLkvec(∞) ≤ b (and letting XeL := chosen λ and µ, solving (2) accurately recovers the target XˆL), or decomposition (X¯S, X¯L). • XˆL is post-processed by replacing [XˆL]i,j with ˆ Lemma 14. There exists G ,G ,H ∈ m×n such that [XeL]i,j := min{max{[XL]i,j, −b}, b} for all i, j, S L S R −1 ˆ ˆ then we also have 1) µ (XS + XL − Y ) + λGS + HS = 0; n q o kGSkvec(∞) ≤ 1; ¯ −1 kXeL−XLkvec(2) ≤ min k∆Lkvec(1), 2bk∆Lkvec(1) . 2) µ (XˆS + XˆL − Y ) + GL = 0; kGLk2→2 ≤ 1; 3) [HS]i,j[∆S]i,j ≥ 0 ∀i, j . Proof: First note that since ∆S = ∆avg + ∆mid and ∆L = ∆avg − ∆mid, we have Proof: We express the constraint kXS −Y kvec(∞) ≤ max{k∆Skvec(1), k∆Lkvec(1)} ≤ k∆avgkvec(1) + b in (2) as 2mn constraints [XS]i,j − Yi,j − b ≤ 0 k∆midkvec(1). We can bound k∆midkvec(1) as and −[XS]i,j + Yi,j − b ≤ 0 for all i, j. Now the corresponding Lagrangian is k∆midkvec(1) ≤ kPΩ¯ ⊥ (∆mid)kvec(1) + kPΩ¯ (∆mid)kvec(1)   1 −1 1 2 ≤ (1 − 1/c) · 1 + kXS + XL − Y kvec(2) + λkXSkvec(1) + kXLk∗ 1 − α(ρ)β(ρ) 2µ + − + hΛ ,XS − Y − b1m,ni + hΛ , −XS + Y − b1m,ni · (k∆avgkvec(1) + k∆avgk∗/λ)

14 + − ¯ ¯ where Λ , Λ ≥ 0 and 1m,n is the all-ones m × n hold with E = Y − (XS + XL), and let (QΩ¯ ,QT¯) be matrix. First-order optimality conditions imply that there the dual certificate from the conclusion. We have ˆ exists a subgradient GS of kXSkvec(1) at XS = XS and a subgradient G of kX k at X = Xˆ such that L L ∗ L L (1 − α(ρ)β(ρ)) · k∆Skvec(1) −1 + − −1 −1 2 µ (XˆS + XˆL − Y ) + λGS + (Λ − Λ ) = 0 ≤ λ (1 − 1/c) kQΩ¯ + QT¯kvec(2)µ ¯ p¯ ¯ and + λkµ + 2 krµ¯ + kk(PΩ¯ ◦ PT¯⊥ )(E)kvec(∞), −1 µ (XˆS + XˆL − Y ) + GL = 0. ¯ ¯ Now since kXS − Y kvec(∞) ≤ b, we have [XS]i,j ≤ ¯ n q o Yi,j + b and −[XS]i,j ≤ −Yi,j + b. By complementary k∆ k ≤ min k∆ k , 2bk∆ k , + ˆ S vec(2) S vec(1) S vec(1) slackness, if Λi,j > 0, then [XS]i,j −Yi,j −b = 0, which means [XˆS]i,j − [X¯S]i,j ≥ [XˆS]i,j − (Yi,j + b) = 0. So + − ˆ Λi,j[∆S]i,j ≥ 0. Similarly, if Λi,j > 0, then [XS]i,j − [X¯ ] ≤ 0. So Λ− [∆ ] ≤ 0. Therefore H := Λ+ − −1 2 S i,j i,j S i,j k∆Lk[(ρ) ≤ (1 − 1/c) kQ¯ + Q ¯k µ/2 − Ω T vec(2) Λ satisfies Hi,j[∆S]i,j ≥ 0. n √ o + min β(ρ)k∆Skvec(1), 2¯rk∆Skvec(2) Lemma 15. Assume the conditions of Theorem 5 hold ¯ ¯ + kP ¯(E)k∗ + 2¯rµ, with E = Y − (XS + XL), and let (QΩ¯ ,QT¯) be the T dual certificate from the conclusion. We have and λkPΩ¯ ⊥ (∆S)kvec(1) + kPT¯⊥ (∆L)k∗ −1 2 ≤ (1 − 1/c) kQΩ¯ + QT¯kvec(2)µ/2. −1 2 k∆Lk∗ ≤ (1 − 1/c) kQΩ¯ + QT¯k µ/2 √ vec(2) Proof: Let Q := Q¯ + Q ¯ and ∆ := ∆S + ∆L. Ω T + 2¯rk∆Skvec(2) + kPT¯(E)k∗ + 2¯rµ. Since Q + µ−1E satisfies the conditions of Lemma 6,  (1 − 1/c) λkPΩ¯ ⊥ (∆S)kvec(1) + kPT¯⊥ (∆L)k∗ Proof: From Lemma 14, we obtain GS,GL,HS ∈ ˆ ˆ ¯ ¯ m×n and the following equations: ≤ (λkXSkvec(1) + kXLk∗) − (λkXSkvec(1) + kXLk∗) R −1 − hQ + µ E, ∆S + ∆Li. −1 λPΩ¯ (GS) = −µ (PΩ¯ (∆S) + PΩ¯ (∆L) − PΩ¯ (E)) Furthermore, by the optimality of (XˆS, XˆL), − PΩ¯ (HS) (16) ˆ ˆ ¯ ¯ −1 (λkXSkvec(1) + kXLk∗) − (λkXSkvec(1) + kXLk∗) PT¯(GL) = −µ (PT¯(∆S) + PT¯(∆L) − PT¯(E)) (17) 1 ¯ ¯ 2 1 ˆ ˆ 2 ≤ kXS + XL − Y kvec(2) − kXS + XL − Y kvec(2) −1 2µ 2µ (PΩ¯ ◦ PT¯)(GL) = −µ (PΩ¯ ◦ PT¯)(∆S) 1 2 1 2  = kEk − k∆ + ∆ − Ek + (PΩ¯ ◦ PT¯)(∆L) − (PΩ¯ ◦ PT¯)(E) . (18) 2µ vec(2) 2µ S L vec(2) 1 = (2hE, ∆i − h∆, ∆i). Subtracting (18) from (16) gives 2µ

Combining the inequalities gives −1 µ PΩ¯ (∆S) − (PΩ¯ ◦ PT¯ ◦ PΩ¯ )(∆S)   (1 − 1/c) λkPΩ¯ ⊥ (∆S)kvec(1) + kPT¯⊥ (∆L)k∗ − (PΩ¯ ◦ PT¯ ◦ PΩ¯ ⊥ )(∆S) + (PΩ¯ ◦ PT¯⊥ )(∆L) 1 ≤ −hQ, ∆i − h∆, ∆i ≤ kQk2 µ/2 + PΩ¯ (HS) 2µ vec(2) = −λPΩ¯ (GS) + (PΩ¯ ◦ PT¯)(GL) where the last inequality follows by taking the maximum −1 + µ (PΩ¯ ◦ PT¯⊥ )(E). value over ∆ at ∆ = −µQ. Now we prove Theorem 3, restated below (with an Moreover, we have hsign(∆S), PΩ¯ (∆S)i = additional result for k∆Lk[(ρ)). kPΩ¯ (∆S)kvec(1) and hsign(∆S), PΩ¯ (HS)i = ¯ ¯ Theorem 7 (Theorem 3 restated). Let k := | supp(XS)| kPΩ¯ (HS)kvec(1), so taking inner products with and r¯ := rank(X¯L). Assume the conditions of Theorem 5 sign(∆S) on both sides of the equation gives the

15 following chain of inequalities: For the third and fourth bounds, we obtain from (17)

−1 kPT¯(∆L)k[(ρ) µ kPΩ¯ (∆S)kvec(1) + kPΩ¯ (HS)kvec(1) −1 ≤ kPT¯(∆S)k[(ρ) + kPT¯(E)k[(ρ) + µkPT¯(GL)k[(ρ) ≤ µ k(PΩ¯ ◦ PT¯ ◦ PΩ¯ )(∆S)kvec(1) −1 ≤ kPT¯kvec(1)→[(ρ)k∆Skvec(1) + kPT¯(E)k∗ + µ k(PΩ¯ ◦ PT¯ ◦ PΩ¯ ⊥ )(∆S)kvec(1) −1 + µkPT¯(GL)k∗ (Lemma 3) + µ k(PΩ¯ ◦ PT¯⊥ )(∆L)kvec(1) ∗ = kPT¯k](ρ)→vec(∞)k∆Skvec(1) + kPT¯(E)k∗ + λkPΩ¯ (GS)kvec(1) + k(PΩ¯ ◦ PT¯)(GL)kvec(1) −1 + µkPT¯(GL)k∗ (Proposition 3) + µ k(PΩ¯ ◦ PT¯⊥ )(E)kvec(1) −1 ≤ β(ρ)k∆Skvec(1) + kPT¯(E)k∗ + µkPT¯(GL)k∗ ≤ µ α(ρ)β(ρ)kPΩ¯ (∆S)kvec(1) −1 ¯ (Lemma 8) + µ α(ρ)β(ρ)kPΩ¯ ⊥ (∆S)kvec(1) + λk ≤ β(ρ)k∆ k + kP ¯(E)k + 2¯rµ −1p p S vec(1) T ∗ + µ k¯kP ¯⊥ (∆ )k + k¯kP ¯(G )k T L vec(2) T L vec(2) (Lemma 5 and kG k ≤ 1) −1¯ L 2→2 + µ kk(PΩ¯ ◦ PT¯⊥ )(E)kvec(∞) −1 and ≤ µ α(ρ)β(ρ)kPΩ¯ (∆S)kvec(1) −1 kP ¯(∆L)k∗ + µ α(ρ)β(ρ)kPΩ¯ ⊥ (∆S)kvec(1) T −1p¯ ≤ kPT¯(∆S)k∗ + kPT¯(E)k∗ + µkPT¯(GL)k∗ + µ kkPT¯⊥ (∆L)kvec(2) √ p ≤ 2¯rk∆Skvec(2) + kP ¯(E)k∗ + 2¯rµ + λk¯ + 2 k¯r¯kG k T L 2→2 (Lemma 5 and kG k ≤ 1). −1¯ L 2→2 + µ kk(PΩ¯ ◦ PT¯⊥ )(E)kvec(∞) −1 Now we combine these with ≤ µ α(ρ)β(ρ)kPΩ¯ (∆S)kvec(1) −1 k∆ k ≤ kP ¯⊥ (∆ )k + kP ¯(∆ )k + µ α(ρ)β(ρ)kPΩ¯ ⊥ (∆S)kvec(1) L [(ρ) T L [(ρ) T L [(ρ) −1p¯ ≤ kPT¯⊥ (∆L)k∗ + µ kkPT¯⊥ (∆L)kvec(2) + min{kP ¯(∆ )k , kP ¯(∆ )k } p −1 T L ∗ T L [(ρ) + λk¯ + 2 k¯r¯ + µ k¯k(P¯ ◦ P ¯⊥ )(E)k . Ω T vec(∞) (Lemma 3)

The second and third inequalities above follow from k∆Lk∗ ≤ kPT¯⊥ (∆L)k∗ + kPT¯(∆L)k∗ Lemma 5 and Lemma 10, and the fourth inequality uses and Lemma 15. the fact that kG k ≤ 1. Rearranging the inequality L 2→2 Note that we have an error bound for ∆ in k · k and applying Lemma 15 gives L [(ρ) norm, which can be significantly smaller than the bound for the trace norm of ∆L. (1 − α(ρ)β(ρ)) · kPΩ¯ (∆S)kvec(1) p¯ ≤ α(ρ)β(ρ)kPΩ¯ ⊥ (∆S)kvec(1) + kkPT¯⊥ (∆L)kvec(2) ACKNOWLEDGMENT ¯ p¯ ¯ We thank Emmanuel Candes` for clarifications about + λkµ + 2 krµ¯ + kk(PΩ¯ ◦ PT¯⊥ )(E)kvec(∞) p the results in [3]. ≤ maxα(ρ)β(ρ)/λ, k¯ −1 2 · (1 − 1/c) kQΩ¯ + QT¯kvec(2)µ/2 REFERENCES ¯ p¯ ¯ [1] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, + λkµ + 2 krµ¯ + kk(PΩ¯ ◦ PT¯⊥ )(E)kvec(∞) “Rank-Sparsity Incoherence for Matrix Decomposition,” ArXiv e- −1 −1 2 prints, Jun. 2009. ≤ λ (1 − 1/c) kQΩ¯ + QT¯k µ/2 vec(2) [2] V. Chandrasekaran, P. A. Parrilo, and A. S. Willsky, “Latent p Variable Graphical Model Selection via Convex Optimization,” + λkµ¯ + 2 k¯rµ¯ + k¯k(P¯ ◦ P ¯⊥ )(E)k Ω T vec(∞) ArXiv e-prints, Aug. 2010. 2 [3] E. J. Candes,` X. Li, Y. Ma, and J. Wright, “Robust Principal since k¯ ≤ α(ρ) , α(ρ)β(ρ) < 1, and λα(ρ) ≤ 1. Now Component Analysis?” ArXiv e-prints, Dec. 2009. we combine this with k∆Skvec(1) ≤ kPΩ¯ ⊥ (∆S)kvec(1) + [4] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Royal. Statist. Soc B., vol. 58, no. 1, pp. 267–288, 1996. kP¯ (∆ )k and Lemma 15 to get the first bound. Ω S vec(1) [5] M. Fazel, “Matrix rank minimization with applications,” Ph.D. For the second bound, we use the facts dissertation, Department of Electrical Engineering, Stanford Uni- ˆ ¯ versity, 2002. k∆Skvec(∞) ≤ kXS −Y kvec(∞) +kXS −Y kvec(∞) ≤ 2b p [6] E. Candes` and R. Recht, “Exact matrix completion via convex op- and k∆Skvec(2) ≤ k∆Skvec(1)k∆Skvec(∞) ≤ p timization,” Foundations of Computational , vol. 9, 2bk∆Skvec(1). pp. 717–772, 2009.

[7] D. Gross, "Recovering low-rank matrices from few coefficients in any basis," ArXiv e-prints, Oct. 2009.
[8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, "Stable principal component pursuit," in Proceedings of the International Symposium on Information Theory, 2010.
[9] A. Ganesh, J. Wright, X. Li, E. J. Candès, and Y. Ma, "Dense error correction for low-rank matrices via principal component pursuit," in Proceedings of the International Symposium on Information Theory, 2010.
[10] K. R. Davidson and S. J. Szarek, "Local operator theory, random matrices and Banach spaces," in Handbook of the Geometry of Banach Spaces, Vol. I, 2001, pp. 317–366.
[11] G. A. Watson, "Characterization of the subdifferential of some matrix norms," Linear Algebra and its Applications, vol. 170, pp. 1039–1053, 1992.

Daniel Hsu received a B.S. in Computer Science and Engineering from the University of California, Berkeley in 2004 and a Ph.D. in Computer Science from the University of California, San Diego in 2010. From 2010 to 2011, he was a postdoctoral scholar at Rutgers University and a visiting scholar at the Wharton School of the University of Pennsylvania. He is currently a postdoctoral researcher at Microsoft Research New England. His research interests are in algorithmic statistics and .

Sham M. Kakade is currently an associate professor of statistics at the Wharton School at the University of Pennsylvania, and a Senior Researcher at Microsoft Research New England. He received his B.A. in Physics from the California Institute of Technology and his Ph.D. from the Gatsby Computational Neuroscience Unit affiliated with University College London. He spent the following two years as a Postdoctoral Researcher at the Department of Computer and Information Science at the University of Pennsylvania. Subsequently, he joined the Toyota Technological Institute, where he was an assistant professor for four years. His research focuses on artificial intelligence and machine learning, and their connections to other areas such as game theory and economics.

Tong Zhang received a B.A. in Mathematics and Computer Science from Cornell University in 1994 and a Ph.D. in Computer Science from Stanford University in 1998. After graduation, he worked at IBM T.J. Watson Research Center in Yorktown Heights, New York, and Yahoo Research in New York City. He is currently a professor of statistics at Rutgers University. His research interests include machine learning, algorithms for statistical computation, their mathematical analysis and applications.
