Arxiv:1803.00420V1

Tractable and Scalable Schatten Quasi-Norm Approximations for Rank Minimization Fanhua Shang Yuanyuan Liu James Cheng Department of Computer Science and Engineering, The Chinese University of Hong Kong Abstract convex optimization problem. In fact, the trace norm penalty is an ℓ1-norm regularization of the singular values, and thus it motivates a low-rank solution. How- The Schatten quasi-norm was introduced to ever, [8] pointed out that the ℓ -norm over-penalizes bridge the gap between the trace norm and 1 large entries of vectors, and results in a biased solution. rank function. However, existing algorithms Similar to the ℓ -norm case, the trace norm penalty are too slow or even impractical for large- 1 shrinks all singular values equally, which also leads to scale problems. Motivated by the equivalence over-penalize large singular values. In other words, the relation between the trace norm and its bilin- trace norm may make the solution deviate from the ear spectral penalty, we define two tractable original solution as the ℓ -norm does. Compared with Schatten norms, i.e. the bi-trace and tri-trace 1 the trace norm, although the Schatten-p quasi-norm norms, and prove that they are in essence the for 0<p<1 is non-convex, it gives a closer approxima- Schatten-1/2 and 1/3 quasi-norms, respec- tion to the rank function. Therefore, the Schatten-p tively. By applying the two defined Schat- quasi-norm minimization has attracted a great deal of ten quasi-norms to various rank minimiza- attention in images recovery [9, 10], collaborative fil- tion problems such as MC and RPCA, we tering [11] and MRI analysis [12]. only need to solve much smaller factor matrices. We design two efficient linearized alter- [13] and [14] proposed iterative reweighted lease nating minimization algorithms to solve our squares (IRLS) algorithms to approximate associated problems and establish that each bounded Schatten-p quasi-norm minimization problems. In ad- sequence generated by our algorithms condition, [10] proposed an iteratively reweighted nu- verges to a critical point. We also provide the clear norm (IRNN) algorithm to solve non-convex restricted strong convexity (RSC) based and surrogate minimization problems. In some recent MC error bounds for our algorithms. Our ex- work [15, 16, 11, 9, 10], the Schatten-p quasi-norm perimental results verified both the efficiency has been shown to be empirically superior to the trace and effectiveness of our algorithms compared norm. Moreover, [17] theoretically proved that the with the state-of-the-art methods. Schatten-p quasi-norm minimization with small p re- arXiv:1803.00420v1 [cs.LG] 28 Feb 2018 quires significantly fewer measurements than the convex trace norm minimization. However, all existing 1 Introduction algorithms have to be solved iteratively and involve singular value decomposition (SVD) or eigenvalue de- The rank minimization problem has a wide range composition (EVD) in each iteration. Thus they suffer of applications in matrix completion (MC) [1], ro- from high computational cost and are even not appli- bust principal component analysis (RPCA) [2], low- cable for large-scale problems [18]. rank representation [3], multivariate regression [4] and In contrast, the trace norm has a scalable equivalent multi-task learning [5]. To efficiently solve these prob- formulation, the bilinear spectral regularization [19, 7], lems, a principled way is to relax the rank function which has been successfully applied in many large-scale by its convex envelope [6, 7], i.e., the trace norm (also applications, such as collaborative filtering [20, 21]. known as the nuclear norm), which also leads to a Since the Schatten-p quasi-norm is equivalent to the ℓp quasi-norm on the singular values, it is natural to Appearing in Proceedings of the 19th International Con- ask the following question: can we design an equiv- ference on Artificial Intelligence and Statistics (AISTATS) alent matrix factorization form to some cases of the 2016, Cadiz, Spain. JMLR: W&CP volume 51. Copyright 2016 by the authors. Schatten-p quasi-norm, e.g., p=1/2or 1/3? Tractable and Scalable Schatten Quasi-Norm Approximations for Rank Minimization In this paper we first define two tractable Schatten 3 Tractable Schatten Quasi-Norm norms, the bi-trace (Bi-tr) and tri-trace (Tri-tr) norms. Minimization We then prove that they are in essence the Schatten- 1/2 and 1/3 quasi-norms, respectively, for solving [19] and [7] pointed out that the trace norm has the whose minimization we only need to perform SVDs following equivalent non-convex formulations. on much smaller factor matrices to replace the large Lemma 1. Given a matrix X Rm n with matrices in the algorithms mentioned above. Then we × rank(X)= r d, the following holds: ∈ design two efficient linearized alternating minimization ≤ X tr = min U F V F algorithms with guaranteed convergence to solve our k k U Rm×d,V Rn×d:X=UV T k k k k problems. Finally, we provide the sufficient condition ∈ ∈ U 2 + V 2 for exact recovery, and the restricted strong convexity = min k kF k kF . U,V :X=UV T 2 (RSC) based and MC error bounds. 3.1 Bi-Trace Quasi-Norm 2 Notations and Background Motivated by the equivalence relation between the The Schatten-p norm (0 <p< ) of a matrix X trace norm and its bilinear spectral regularization form m n ∞ ∈ R × (m n) is defined as stated in Lemma 1, our bi-trace (Bi-tr) norm is natu- ≥ 1/p n p rally defined as follows [18]. X Sp = σi (X) , k k i=1 m n Definition 1. For any matrix X R × with X ∈ where σi(X) denotes the i-th singular value of X. For rank(X) = r d, we can factorize it into two much ≤ m d n d p 1 it defines a natural norm, for instance, the smaller matrices U R × and V R × such that ≥ ∈ ∈ Schatten-1 norm is the so-called trace norm, X tr, X = UV T . Then the bi-trace norm of X is defined as whereas for p< 1 it defines a quasi-norm. As thek non-k convex surrogate for the rank function, the Schatten-p X Bi-tr := min U tr V tr. k k U,V :X=UV T k k k k quasi-norm with 0 <p< 1 is the better approximation of the matrix rank than the trace norm [17] (analo- In fact, the bi-trace norm defined above is not a real gous to the superiority of the ℓp quasi-norm to the norm, because it is non-convex and does not satisfy ℓ1-norm [14, 22]). the triangle inequality of a norm. Similar to the well- known Schatten-p quasi-norm (0 <p< 1), the bi-trace We mainly consider the following Schatten quasi-norm norm is also a quasi-norm, and their relationship is minimization problem to recover a low-rank matrix stated in the following theorem [18]. from a small set of linear observations, b Rl, ∈ Theorem 1. The bi-trace norm Bi-tr is a quasi- min X p : (X)= b , (1) k·k × Sp norm. Surprisingly, it is also the Schatten-1/2 quasi- X Rm n k k A ∈ m n n l o norm, i.e., where : R × R is a linear measurement oper- X Bi-tr = X S1 2 , ator. Alternatively,A → the Lagrangian version of (1) is k k k k / where X S1/2 is the Schatten-1/2 quasi-norm of X. p 1 k k min X S + f( (X) b) , (2) X Rm×n k k p µ A − The proof of Theorem 1 can be found in the Supple- ∈ where µ > 0 is a regularization parameter, and the mentary Materials. Due to such a relationship, it is loss function f( ): Rl R generally denotes certain easy to verify that the bi-trace quasi-norm possesses · → measurement for characterizing the loss term (X) b the following properties. A − (for instance, is the linear projection operator Ω, Rm n A P Property 1. For any matrix X × with and f( )= 2 in MC problems [15, 13, 23, 10]). ∈ · k·k2 rank(X)= r d, the following holds: ≤ 2 2 The Schatten-p quasi-norm minimization problems (1) U tr + V tr X Bi-tr = min U tr V tr = min k k k k and (2) are non-convex, non-smooth and even non- k k U,V :X=UVTk k k k U,V :X=UVT 2 Lipschitz [24]. Therefore, it is crucial to develop effi- U + V 2 cient algorithms that are specialized to solve some al- = min k ktr k ktr . U,V :X=UV T 2 ternative formulations of Schatten-p quasi-norm minimization (1) or (2). So far, only few algorithms, such Property 2. The bi-trace quasi-norm satisfies the fol- as IRLS [14, 13] and IRNN [10], have been developed lowing properties: to solve such problems. In addition, since all existing Schatten-p quasi-norm minimization algorithms in- 1. X Bi-tr 0, with equality iff X =0. k k ≥ volve SVD or EVD in each iteration, they suffer from 2. X Bi-tr is unitarily invariant, i.e., X Bi-tr = 2 k k T m m k k n n a high computational cost of O(n m), which severely PXQ Bi-tr, where P R × and Q R × limits their applicability to large-scale problems. havek orthonormalk columns.∈ ∈ Fanhua Shang, Yuanyuan Liu, James Cheng m n 3.2 Tri-Trace Quasi-Norm Suppose X R × is of rank r, and denote its skinny SVD by X ∈= UΣV T .

Arxiv:1803.00420V1

Functional Analysis 1 Winter Semester 2013-14

HYPERCYCLIC SUBSPACES in FRÉCHET SPACES 1. Introduction

L P and Sobolev Spaces

Fact Sheet Functional Analysis

Basic Differentiable Calculus Review

Normed Linear Spaces of Continuous Functions

Bounded Operator - Wikipedia, the Free Encyclopedia

Chapter 6 Linear Transformations and Operators

An Introduction to Some Aspects of Functional Analysis

Normed Vector Spaces

SCALAR PRODUCTS, NORMS and METRIC SPACES 1. Definitions Below, “Real Vector Space” Means a Vector Space V Whose Field Of

Convexity “Warm-Up” II: the Minkowski Functional