
Robust Low-rank Matrix Completion via an Alternating Manifold Proximal Gradient Continuation Method
Minhui Huang, Shiqian Ma, and Lifeng Lai

arXiv:2008.07740v1 [cs.LG] 18 Aug 2020

Abstract—Robust low-rank matrix completion (RMC), or robust principal component analysis with partially observed data, has been studied extensively for computer vision, signal processing and machine learning applications. This problem aims to decompose a partially observed matrix into the superposition of a low-rank matrix and a sparse matrix, where the sparse matrix captures the grossly corrupted entries of the matrix. A widely used approach to tackle RMC is to consider a convex formulation, which minimizes the nuclear norm of the low-rank matrix (to promote low-rankness) and the ℓ1 norm of the sparse matrix (to promote sparsity). In this paper, motivated by some recent works on low-rank matrix completion and Riemannian optimization, we formulate this problem as a nonsmooth Riemannian optimization problem over the Grassmann manifold. This new formulation is scalable because the low-rank matrix is factorized into the product of two much smaller matrices. We then propose an alternating manifold proximal gradient continuation (AManPGC) method to solve the proposed new formulation. The convergence rate of the proposed algorithm is rigorously analyzed. Numerical results on both synthetic data and real data on background extraction from surveillance videos are reported to demonstrate the advantages of the proposed new formulation and algorithm over several popular existing approaches.

Index Terms—Robust Matrix Completion, Nonsmooth Optimization, Manifold Optimization

M. Huang and L. Lai are with the Department of Electrical and Computer Engineering, University of California, Davis, CA, 95616. Email: {mhhuang, lflai}@ucdavis.edu. S. Ma is with the Department of Mathematics, University of California, Davis, CA, 95616. Email: [email protected]. This research was partially supported by NSF HDR TRIPODS grant CCF-1934568, NSF grants CCF-1717943, CNS-1824553, CCF-1908258, DMS-1953210 and CCF-2007797, and UC Davis CeDAR (Center for Data Science and Artificial Intelligence Research) Innovative Data Science Seed Funding Program.

I. INTRODUCTION

Robust matrix completion (RMC) aims at filling in the missing entries of a partially observed matrix in the presence of sparse noisy entries. The recovery relies on a critical low-rank assumption about real datasets: a low-dimensional subspace can capture most of the information in high-dimensional observations. Therefore, the RMC model frequently arises in a wide range of applications including recommendation systems [1], face recognition [2], collaborative filtering [3], and MRI image processing [4].

Due to its low-rank structure, matrix completion can be naturally formulated as a rank-constrained optimization problem. However, it is computationally intractable to directly minimize the rank function, since the low-rank constraint is discrete and nonconvex. To resolve this issue, there is a line of research focusing on convex relaxations of RMC formulations [5], [6]. For example, a widely used technique in these works is to replace the rank by its tightest convex proxy, i.e., the nuclear norm. However, handling the nuclear norm in each iteration can bring numerical difficulty for large-scale problems in practice, because algorithms for solving these formulations need to compute a singular value decomposition (SVD) in each iteration, which can be very time consuming. Recently, nonconvex RMC formulations based on low-rank matrix factorization were proposed in the literature [7], [8], [9], [10]. In these formulations, the target matrix is factorized as the product of two much smaller matrices, so that the low-rank property is automatically satisfied. The factorization greatly reduces the cost of promoting low-rankness and thus eases the computational burden. On the other hand, researchers have also posed the RMC problem over a fixed-rank manifold [5] and solved it by manifold optimization. The retraction operation on a fixed-rank manifold only requires a truncated SVD, and is therefore much more computationally efficient.

In many practical scenarios, the collected matrix datasets come with noise. Without an effective way to deal with this noise, it may be impossible to recover the ground-truth low-rank matrix reliably. Fortunately, the noise in many real datasets has some common structure, such as sparsity. Therefore, it is reasonable to assume that the noisy entries are sparse in matrix completion problems. To deal with sparse noisy entries in matrix completion, researchers have employed the ℓ1-norm instead of the ℓ0-norm for decomposing the sparse noisy entries in matrix completion formulations. This leads to the robust matrix completion model, which is robust to sparse outliers. In practice, the ℓ1-norm minimization can be solved efficiently by iterative soft thresholding in many cases. Some other robust loss functions, such as the Huber loss, were also proposed in the literature [11].

In this paper, by utilizing recent developments in manifold optimization, we propose a nonsmooth RMC formulation over the Grassmann manifold with a properly designed regularizer. The Grassmann manifold automatically restricts our factor to a subspace of fixed dimension, and thus enforces the fixed low-rank property. In each iteration, we only require a QR decomposition as our retraction for the Grassmann manifold, which is more computationally efficient than a truncated SVD. Notice that in [12], the authors designed a regularizer to balance the scale of the two factors. In our formulation, the QR decomposition automatically balances the scale of the two factors, which prevents our formulation from being ill-conditioned. Moreover, compared with previous work on matrix completion over the Grassmann manifold [13], [14], [15], [16], our formulation is continuous with a smaller search space. We then propose an alternating manifold proximal gradient (AManPG) algorithm for solving it. The AManPG algorithm alternates between updating the low-rank factor with a Riemannian gradient step and updating the nonsmooth sparse variable with a proximal gradient step. While recent ADMM-based methods [10], [17], [18] lack any convergence guarantee, we can rigorously analyze the convergence of the AManPG algorithm and prove a complexity bound of O(ε^{-2}) to reach an ε-stationary point. Furthermore, compared with the recent smoothing technique developed in [5], we solve the nonsmooth subproblem directly. To further boost the convergence speed and recover the low-rank matrix to high accuracy, we apply a continuation framework [19] to AManPG. Finally, we conduct extensive numerical experiments and show that the proposed AManPG with continuation is more efficient than previous approaches to the RMC problem.

The remainder of the paper is organized as follows. In Section II, we briefly review some existing related work on the RMC problem and give our new RMC formulation over the Grassmann manifold. In Section III, we review some basics of manifold optimization and propose the ManPG algorithm for our RMC formulation. In Section IV and Section V, we propose our AManPG with continuation algorithm and provide its convergence analysis. In Section VI, numerical results are presented to demonstrate the advantages of the proposed new formulation and algorithm. Finally, conclusions are given in Section VII.

II. PROBLEM FORMULATION

In this section, to motivate our proposed problem formulation, we first introduce some closely related work. We then present our problem formulation.

A. Related Work

Robust principal component analysis (PCA) [20], [21] is an important tool in data analysis and has found many interesting applications in computer vision, signal processing, machine learning, statistics, and so on. The goal is to decompose a given matrix M ∈ R^{m×n} into the superposition of a low-rank matrix L and a sparse matrix S, i.e., M = L + S. The works by Candès et al. [20] and Chandrasekaran et al. [21] formulate this as the following problem:

    \min_{L,S} \|L\|_* + \gamma\|S\|_1, \quad \text{s.t.} \quad L + S = M,    (1)

where γ > 0 is a weighting parameter, the nuclear norm \|L\|_* sums the singular values of L, and the ℓ1 norm \|S\|_1 sums the absolute values of all entries of S. When only a subset of the entries of M is observed, robust PCA becomes the robust low-rank matrix completion problem [5]. Similar to (1), a convex formulation of RMC can be cast as follows:

    \min_{L,S} \|L\|_* + \gamma\|S\|_1, \quad \text{s.t.} \quad P_\Omega(L + S) = P_\Omega(M),    (2)

where Ω denotes the set of indices of the observed entries and P_Ω : R^{m×n} → R^{m×n} denotes the projection defined as [P_Ω(Z)]_{ij} = Z_{ij} if (i, j) ∈ Ω, and [P_Ω(Z)]_{ij} = 0 otherwise. The convex formulations (1) and (2) have been studied extensively in the literature, and we refer to the recent survey paper [6] for algorithms for solving them.

By assuming that the rank of L is known (denoted by r), the idea of many nonconvex formulations is based on the fact that L can be factorized as L = UV, where U ∈ R^{m×r}, V ∈ R^{r×n}. Replacing L by UV in (1) and (2) leads to various nonconvex formulations for robust PCA and RMC. In particular, Li et al. [12] suggested using a subgradient method to solve the following nonsmooth robust matrix recovery model

    \min_{U \in R^{m\times r},\, V \in R^{r\times n}} \frac{1}{|\Omega|}\|y - \mathcal{A}(UV)\|_1,    (3)

where y is a small number of linear measurements and \mathcal{A} : R^{m×n} → R^{|Ω|} is a known linear operator. Shen et al. [10] proposed the LMaFit algorithm that implements an alternating direction method of multipliers (ADMM) for solving the following nonconvex formulation of robust PCA:

    \min_{U \in R^{m\times r},\, V \in R^{r\times n},\, Z \in R^{m\times n}} \|P_\Omega(Z - M)\|_1, \quad \text{s.t.} \quad UV - Z = 0.    (4)

Note that if (Û, V̂) solves (4), then (ÛQ, Q^{-1}V̂) also solves (4) for any invertible Q ∈ R^{r×r}. Since all matrices ÛQ share the same column space, Dai et al. [13], [14] exploited this fact and formulated the matrix completion problem as the following optimization problem over a Grassmann manifold:

    \min_{U \in \mathrm{Gr}(m,r),\, V \in R^{r\times n}} \|P_\Omega(UV - M)\|_F^2,    (5)

where Gr(m, r) denotes the Grassmann manifold. However, it is noticed that the outer problem for U might be discontinuous at points U for which the V problem does not have a unique solution. To address this issue, Keshavan et al. [15], [16] proposed to optimize both the column space and the row space at the same time, which results in the following so-called OptSpace formulation for matrix completion:

    \min_{U \in \mathrm{Gr}(m,r),\, V \in \mathrm{Gr}(n,r)} \ \min_{\Sigma \in R^{r\times r}} \|P_\Omega(U\Sigma V^\top - M)\|_F^2 + \lambda\|U\Sigma V^\top\|_F^2.    (6)

Here λ > 0 is a weighting parameter, and the regularizer \|U\Sigma V^\top\|_F^2 is used so that the outer problem is continuous. Boumal and Absil [22], [23] proposed to study the following variant of (5):

    \min_{U \in \mathrm{Gr}(m,r),\, V \in R^{r\times n}} \frac{1}{2}\|P_\Omega(UV - M)\|_F^2 + \frac{\lambda^2}{2}\|P_{\bar\Omega}(UV)\|_F^2,    (7)

where Ω̄ is the complement of Ω, and they proposed to use a Riemannian trust region method to solve this problem. Compared with OptSpace (6), formulation (7) has a much smaller search space. Note that in (7), λ is usually chosen to be very close to zero, as it indicates that we have only a small confidence that the entries (UV)_{ij} for (i, j) ∉ Ω are equal to zero.
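To make the notation above concrete, the following is a minimal numpy sketch (our own illustrative code, not from the paper) that builds the projection P_Ω as an entrywise mask and evaluates the residual of the completion constraint in (2) for a factorized candidate L = UV, which has rank at most r by construction.

```python
import numpy as np

def proj_omega(Z, mask):
    """P_Omega: keep the observed entries (mask == True) and zero out the rest."""
    return np.where(mask, Z, 0.0)

rng = np.random.default_rng(0)
m, n, r = 50, 40, 3

# Ground truth: a rank-r matrix observed on a random index set Omega.
M_full = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.3          # Omega: roughly 30% of the entries observed
M = proj_omega(M_full, mask)             # only the observed entries are available

# A factorized candidate L = U V automatically has rank at most r,
# which is what the nonconvex formulations above exploit.
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, n))
S = np.zeros((m, n))                     # candidate sparse component

# Residual of the constraint P_Omega(L + S) = P_Omega(M) in (2).
residual = np.linalg.norm(proj_omega(U @ V + S - M, mask), "fro")
print(residual)
```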

For RMC, Cambier and Absil [5] proposed the following Riemannian optimization formulation:

    \min_{X \in \mathcal{M}_r} \|P_\Omega(X - M)\|_1 + \lambda\|P_{\bar\Omega}(X)\|_F^2,    (8)

where \mathcal{M}_r denotes the fixed-rank manifold, i.e., \mathcal{M}_r := \{X \mid \mathrm{rank}(X) = r\}. The algorithm proposed in [5] needs to smooth the ℓ1 norm first to change the problem into a smooth problem, and then applies the Riemannian conjugate gradient method to solve the smoothed problem. As a result, the algorithm in [5] does not solve (8) exactly. Related to (5) and (4), He et al. proposed the GRASTA algorithm [17], [18], which can be used to solve the following formulation of RMC:

    \min_{U \in \mathrm{Gr}(m,r),\, V \in R^{r\times n}} \|P_\Omega(UV - M)\|_1.    (9)

GRASTA uses alternating minimization and ADMM to solve (9), which is efficient in practice but lacks convergence guarantees. Moreover, Zhao et al. [11] explore the effect of different robust loss functions for achieving robustness against specific categories of outliers. He et al. [24] derived a correntropy-based cost function and applied the half-quadratic technique to solve the resulting formulation. Zeng et al. [25] proposed two schemes, namely an iterative ℓp-regression algorithm and ADMM, for RMC under ℓp minimization. Zhang et al. [26] posed the RMC problem over Hankel matrices and claimed that they can deal with the case when all the observations in one column are erroneous.

B. Our formulation and contributions

Motivated by these existing works, in this paper we propose to solve the following formulation of RMC:

    \min_{U \in \mathrm{Gr}(m,r),\, V \in R^{r\times n},\, S \in R^{m\times n}} F(U, V, S) = \frac{1}{2}\|P_\Omega(UV - M + S)\|_F^2 + \frac{\lambda^2}{2}\|P_{\bar\Omega}(UV)\|_F^2 + \gamma\|P_\Omega(S)\|_1.    (10)

We show that the manifold proximal gradient method (ManPG) proposed by Chen et al. [27] can be applied to solve (10), and the corresponding convergence analysis applies naturally. We then propose a variant of ManPG, named alternating ManPG (AManPG), that can significantly improve the efficiency of ManPG for solving (10). We further rigorously analyze the convergence rate of AManPG. Compared with GRASTA, our proposed algorithms for solving (10) have rigorous convergence guarantees and a convergence rate analysis. Compared with RMC (8), our algorithms solve the nonsmooth problem (10) directly. Compared with the convex formulation (2), our nonconvex formulation appears to be more robust and scalable; see the numerical experiments for comparison results. Finally, to further accelerate the convergence of AManPG, we incorporate the so-called continuation technique on the weighting parameter γ in (10). Our numerical results on both synthetic data and real data on background extraction from surveillance video demonstrate that our final algorithm, AManPG with Continuation (AManPGC), compares favorably with existing methods for RMC.

III. THE MANPG ALGORITHM FOR ROBUST MATRIX COMPLETION

In this section, we show that the ManPG algorithm recently proposed by Chen et al. [27] can be naturally adopted to solve (10). Note that the ManPG algorithm was originally proposed for solving problems over the Stiefel manifold, while the manifold in (10) is the Grassmann manifold. Therefore, we need to further elaborate on how ManPG works on the Grassmann manifold. To this end, we first introduce some background on the geometry of the Grassmann manifold.

A. Geometry of the Grassmann Manifold

In this subsection we briefly introduce concepts and properties of the Grassmann manifold. Much of the material here is from [22], and we include it for ease of discussion later. The Grassmann manifold Gr(m, r) is the set of r-dimensional linear subspaces of R^m endowed with a quotient manifold structure, whose dimension is dim(Gr(m, r)) = r(m − r) [28]. Each point of Gr(m, r) is a linear subspace spanned by the columns of a full-rank matrix U:

    \mathrm{Gr}(m, r) = \{\mathrm{span}(U) : U \in R_*^{m\times r}\},    (11)

where R_*^{m×r} denotes the set of all m × r matrices with full column rank, and span(U) denotes the subspace spanned by the columns of U. Since multiplying by an r × r orthogonal matrix does not change the column space of U, we can regard Gr(m, r) as a quotient of R_*^{m×r} by the equivalence relation U' = UQ, where Q is any r × r orthogonal matrix. Endowed with the Riemannian metric ⟨U, V⟩ = Tr(U^⊤V), the Grassmann manifold is also a Riemannian quotient manifold, and it admits a tangent space at each point of Gr(m, r) given by

    T_U\mathrm{Gr}(m, r) = \{H \in R^{m\times r} : U^\top H = 0\}.    (12)

For a Riemannian manifold \mathcal{M}, the Riemannian gradient of a smooth function f : \mathcal{M} → R is defined as follows.

Definition III.1. (Riemannian Gradient) Given a smooth function f : \mathcal{M} → R, the Riemannian gradient of f at X ∈ \mathcal{M}, denoted by grad f(X), is the unique tangent vector in T_X\mathcal{M} such that

    ⟨\mathrm{grad}\, f(X), \xi⟩ = Df(X)[\xi], \quad \forall \xi \in T_X\mathcal{M},    (13)

where Df denotes the directional derivative of f.

For the Grassmann manifold, the orthogonal projector from R^{m×r} onto the tangent space T_U Gr(m, r) is given by:

    \mathrm{Proj}_U : R^{m\times r} \to T_U\mathrm{Gr}(m, r), \quad \mathrm{Proj}_U(H) = (I - UU^\top)H.    (14)

The definition of the retraction operation for a manifold \mathcal{M} is given below.

Definition III.2. (Retraction) Let Retr_X(ξ) : T\mathcal{M} → \mathcal{M} be a mapping from the tangent bundle T\mathcal{M} to the manifold \mathcal{M}. We call Retr_X(·) a retraction at X if

    \mathrm{Retr}_X(0) = X, \quad \frac{d}{dt}\mathrm{Retr}_X(t\xi)\Big|_{t=0} = \xi, \quad \forall X \in \mathcal{M},\ \forall \xi \in T_X\mathcal{M}.    (15)
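As a small illustration of these objects, the following numpy sketch (ours, with illustrative names) forms the tangent-space projection (14), checks the tangency condition (12), and applies a QR-based retraction satisfying Definition III.2, which is also the retraction adopted in (16) just below.

```python
import numpy as np

def proj_tangent(U, H):
    """Orthogonal projection (14) onto T_U Gr(m, r): (I - U U^T) H."""
    return H - U @ (U.T @ H)

def qr_retraction(U, H):
    """QR-based retraction: the Q-factor of U + H (cf. (16) below)."""
    Q, R = np.linalg.qr(U + H)
    return Q * np.sign(np.diag(R))       # fix the sign ambiguity of the QR factors

rng = np.random.default_rng(1)
m, r = 20, 4
U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal representative of a subspace
H = proj_tangent(U, rng.standard_normal((m, r)))

print(np.allclose(U.T @ H, 0.0))                   # tangency condition (12)
U_new = qr_retraction(U, 0.1 * H)
print(np.allclose(U_new.T @ U_new, np.eye(r)))     # the retracted point is again orthonormal
```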

In our numerical experiments, we choose the QR decomposition as the retraction for the Grassmann manifold:

    \mathrm{Retr}_U(H) = \mathrm{qf}(U + H),    (16)

where qf(X) denotes the Q-factor of the QR decomposition of X.

B. The ManPG Algorithm

Recently, Chen et al. [27] proposed a novel ManPG algorithm for solving nonsmooth optimization problems over the Stiefel manifold St(n, r) of the form:

    \min_{X \in \mathrm{St}(n,r)} F_1(X) + F_2(X),    (17)

in which F_1 is smooth with Lipschitz continuous gradient and F_2 is nonsmooth and convex. Here the smoothness, convexity and Lipschitz continuity are interpreted when the functions are considered in the ambient Euclidean space. A typical iteration of the ManPG algorithm for solving (17) is as follows:

    Y^k := \mathrm{argmin}_Y\ ⟨\nabla F_1(X^k), Y⟩ + \frac{1}{2t}\|Y\|_F^2 + F_2(X^k + Y), \quad \text{s.t.}\ Y \in T_{X^k}\mathrm{St}(n, r),
    X^{k+1} := \mathrm{Retr}_{X^k}(\alpha_k Y^k),    (18)

where t > 0 and α_k > 0 are step sizes. Chen et al. [27] proved that the iteration complexity of ManPG (18) is O(1/ε²) for obtaining an ε-stationary solution. They also demonstrated that ManPG is very efficient for solving sparse PCA and compressed modes problems.

Now we discuss how to apply ManPG [27] to solve (10). For ease of presentation, we denote the smooth part of F in (10) by f̄, i.e.,

    \bar f(U, V, S) = \frac{1}{2}\|P_\Omega(UV - M + S)\|_F^2 + \frac{\lambda^2}{2}\|P_{\bar\Omega}(UV)\|_F^2.    (19)

Note that for fixed U and S, the optimal V of F(U, V, S) (and of f̄(U, V, S)) is uniquely determined. Therefore, by denoting

    V_{U,S} := \mathrm{argmin}_V\ \bar f(U, V, S),    (20)

and

    f(U, S) = \bar f(U, V_{U,S}, S),    (21)

we know that our RMC formulation (10) reduces to

    \min_{U \in \mathrm{Gr}(m,r),\, S \in R^{m\times n}} f(U, S) + \gamma\|P_\Omega(S)\|_1.    (22)

It is easy to see that ManPG for solving (22) reduces to the following two subproblems in the k-th iteration:

    \Delta S^k := \mathrm{argmin}_{\Delta S}\ ⟨\nabla_S f(U^k, S^k), \Delta S⟩ + \frac{1}{2t_S}\|\Delta S\|_F^2 + \gamma\|P_\Omega(S^k + \Delta S)\|_1,    (23a)
    \Delta U^k := \mathrm{argmin}_{\Delta U}\ ⟨\nabla_U f(U^k, S^k), \Delta U⟩ + \frac{1}{2t_U}\|\Delta U\|_F^2, \quad \text{s.t.}\ \Delta U \in T_{U^k}\mathrm{Gr}(m, r),    (23b)
    S^{k+1} := S^k + \alpha\Delta S^k,    (23c)
    U^{k+1} := \mathrm{Retr}_{U^k}(U^k + \beta\Delta U^k),    (23d)

where t_S, t_U, α and β are all step sizes. We now make some necessary remarks on this ManPG scheme (23). First, (23) is actually slightly different from a direct application of ManPG: for a direct application of ManPG, we should have t_S = t_U and α = β. Here in (23) we allow these step sizes to be different so that we have more freedom to choose the best step sizes in practice. Also, we note that (23b) and (23d) correspond to a Riemannian gradient step with respect to the U variable, while the updates (23a) and (23c) correspond to a proximal gradient step for the S variable in the Euclidean space. This can be interpreted in the following way. First, in RMC (22), the U variable does not appear in the nonsmooth part of the objective, so it is natural to perform a Riemannian gradient step for U. Second, for fixed U, the S problem is an unconstrained problem in the Euclidean space, so it is reasonable to take a proximal gradient step. Moreover, the two subproblems (23b) and (23a) are very easy to solve. Specifically, (23b) reduces to

    \Delta U^k = -t_U\,\mathrm{grad}_U f(U^k, S^k),    (24)

i.e., the negative Riemannian gradient of f multiplied by the step size t_U. The ∆S subproblem (23a) can be solved by a simple ℓ1-norm shrinkage operation (note that we are only interested in P_Ω(∆S^k), and P_Ω̄(∆S^k) can simply be set to 0):

    P_\Omega(\Delta S^k) = \mathrm{Prox}_{\gamma t_S\|\cdot\|_1}\big(P_\Omega(S^k - t_S\nabla_S f(U^k, S^k))\big) - P_\Omega(S^k).    (25)
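The proximal mapping in (25) is an entrywise soft-thresholding restricted to the observed entries. Below is a minimal numpy sketch of this S-update; it assumes the Euclidean gradient ∇_S f(U^k, S^k) has already been formed (for the smooth part (19) this gradient equals P_Ω(U V_{U^k,S^k} − M + S^k)), and the function names are ours rather than the authors' code.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise proximal mapping of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def s_update(S, grad_S, mask, t_S, gamma):
    """Proximal gradient step (25) for S, with Delta S set to zero outside Omega."""
    prox_arg = np.where(mask, S - t_S * grad_S, 0.0)        # P_Omega(S^k - t_S * grad_S f)
    delta_S = soft_threshold(prox_arg, gamma * t_S) - np.where(mask, S, 0.0)
    return S + delta_S                                       # update (23c) with alpha = 1
```

With t_S = 1, the choice used in the experiments of Section VI, this step is exactly one soft-thresholding of P_Ω(S^k − ∇_S f(U^k, S^k)).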

IV. ALTERNATING MANPG WITH CONTINUATION The continuation technique. There are two parameters in the model (10): λ and γ. Since λ indicates our confidence It should be noted that ManPG updates S and U in parallel. level of the entries of (UV ) being zero, it needs to very That is, ManPG (23) is a Jacobi type iterative algorithm. One small. In practice, it is easy to choose λ, and in our numerical way that can possibly improve the speed of ManPG is to experiments, we always choose λ = 10−8. The parameter use a Gauss-Seidel type algorithm. This idea has also been γ in (10) controls the sparsity level of P (S). A larger γ adopted in [29], where the authors showed that the Gauss- Ω yields sparser P (S). However, in practice, we usually have Seidel type ManPG performs much better than the original Ω no clue how sparse the matrix S should be. Thus, it is not Jacobi type ManPG. Motivated by this, here we also propose easy to choose γ. A usual practice in the literature to deal an alternating ManPG (AManPG), which updates S and U with this issue is to conduct a continuation technique on γ. sequentially, instead of in parallel. However, one crucial thing Roughly speaking, the continuation starts with solving (10) to note here is that the V variable will need to be re-calculated, with a relatively large γ. Then the parameter γ is decreased and when we have a new S variable before we update the U (10) is solved again. This process is repeated until γ is very variable. Our AManPG algorithm is summarized in Algorithm small. This idea has been widely adopted in the literature, e.g., 2. [19], [30], [31], [32]. Combining this continuation idea with Algorithm 2 AManPG for solving (22) our AManPG algorithm, we obtain the AManPGC algorithm which works greatly in practice as confirmed by our numerical 1: Input: step sizes tS, tU ,α, β, parameters λ, γ, threshold , U 0,S0. results in Section VI. AManPGC is summarized in Algorithm 3. Note that in Algorithm 3, we also shrink the accuracy tolerance 2: for k = 0, 1,... do  in each iteration, as we want to solve the problem more and 3: Compute V k using (27) U k,Sk more accurately. 4: Compute ∆Sk by (25) 5: Update Sk+1 by (23c) k 6: Compute VU k,Sk+1 using (27) k k k+1 7: Compute ∆U = −tU gradU f(U ,S ) 8: Update U k+1 by (23d) 2 2 V. CONVERGENCE ANALYSIS FOR AMANPG 9: k+1 k+1 2 if ∆U F + ∆S F ≤  then 10: break 11: end if In this section, we analyze the convergence behavior and 12: end for iteration complexity of AManPG (Algorithm 2). To simplify the 13: Output: U k+1, Sk+1 notation, we denote M = Gr(m, r) and h(S) = γkPΩ(S)k1, and we analyze the convergence of AManPG for solving the following problem: Remark IV.1. When we compute ∆U k in AManPG, we used the latest Sk+1, which requires us to compute the latest V k . While in ManPG, we used Sk in the updates of U k,Sk+1 min F (U, S) = f(U, S) + h(S), s.t., U ∈ M, (28) ∆U k, and this does not require us to compute another V k. This is the main difference between AManPG (Algorithm 2) and ManPG (Algorithm 1). In both algorithms, we always where f(U, S) is smooth and h(S) is nonsmooth and convex. set α = β = 1. Noticing that the matrix A only depends on Here the smoothness and convexity are interpreted when the variable U, there is no need to recalculate A when computing functions are considered in the ambient Euclidean space. For V k . U k,Sk+1 simplicity, we rewrite the AManPG for solving (28) here. One 6

V. CONVERGENCE ANALYSIS FOR AMANPG

In this section, we analyze the convergence behavior and iteration complexity of AManPG (Algorithm 2). To simplify the notation, we denote \mathcal{M} = Gr(m, r) and h(S) = γ‖P_Ω(S)‖_1, and we analyze the convergence of AManPG for solving the following problem:

    \min\ F(U, S) = f(U, S) + h(S), \quad \text{s.t.}\ U \in \mathcal{M},    (28)

where f(U, S) is smooth and h(S) is nonsmooth and convex. Here the smoothness and convexity are interpreted when the functions are considered in the ambient Euclidean space. For simplicity, we restate the AManPG iteration for solving (28) here. One typical iteration of AManPG for solving (28) is:

    \Delta S^k := \mathrm{argmin}_{\Delta S}\ ⟨\nabla_S f(U^k, S^k), \Delta S⟩ + \frac{1}{2t_S}\|\Delta S\|_F^2 + h(S^k + \Delta S),    (29a)
    S^{k+1} := S^k + \alpha\Delta S^k,    (29b)
    \Delta U^k := \mathrm{argmin}_{\Delta U}\ ⟨\nabla_U f(U^k, S^{k+1}), \Delta U⟩ + \frac{1}{2t_U}\|\Delta U\|_F^2, \quad \text{s.t.}\ \Delta U \in T_{U^k}\mathcal{M},    (29c)
    U^{k+1} := \mathrm{Retr}_{U^k}(U^k + \beta\Delta U^k).    (29d)

We make the following assumptions on (28) throughout this section.

Assumption V.1. (F is lower bounded) There exists a finite constant F^* such that

    F(U, S) \ge F^*, \quad \forall U \in \mathcal{M},\ S \in R^{m\times n}.

Note that for (22), it is easy to see that F^* = 0.

Assumption V.2. (Lipschitz continuity of ∇_S f(U, S)) The gradient ∇_S f(U, S) is Lipschitz continuous with Lipschitz constant L_S, that is,

    \|\nabla_S f(U, S_1) - \nabla_S f(U, S_2)\|_F \le L_S\|S_1 - S_2\|_F, \quad \forall S_1, S_2 \in R^{m\times n},\ U \in \mathcal{M}.

The following assumption is about grad_U f(U, S); it concerns the regularity of the pullback function f̂(∆U, S) = f(Retr_U(∆U), S), and differs from the standard Lipschitz continuity assumption because of the retraction operator. This assumption was originally suggested in [33].

Assumption V.3. (Restricted Lipschitz-type gradient for pullbacks) There exists L_U ≥ 0 such that, for the sequence (U^k, S^k)_{k≥0} generated by AManPG (Algorithm 2), the pullback function f̂_k(∆U) = f(Retr_{U^k}(∆U), S^{k+1}) satisfies

    \hat f_k(\Delta U) - \big[f(U^k, S^{k+1}) + ⟨\Delta U, \mathrm{grad}_U f(U^k, S^{k+1})⟩\big] \le \frac{L_U}{2}\|\Delta U\|_F^2, \quad \forall \Delta U \in T_{U^k}\mathcal{M}.

From Theorem 4.1 in [34], we can define a stationary point of problem (28) as follows.

Definition V.4. (Stationary point) A pair (U, S) ∈ \mathcal{M} × R^{m×n} is called a stationary point of problem (28) if it satisfies the first-order necessary conditions:

    0 = \mathrm{grad}_U f(U, S), \quad 0 \in \nabla_S f(U, S) + \partial h(S).    (30)

According to Theorem 4.1 in [34], the optimality conditions of the subproblems (29c) and (29a) are

    0 = \mathrm{grad}_U f(U^k, S^{k+1}) + \frac{1}{t_U}\Delta U^k, \quad 0 \in \frac{1}{t_S}\Delta S^k + \nabla_S f(U^k, S^k) + \partial h(S^k + \Delta S^k).    (31)

If ∆U^k = 0 and ∆S^k = 0, then we know that S^{k+1} = S^k, and (U^k, S^k) satisfies (30) and thus is a stationary point of (28). Therefore, we can use the norm of (∆U^k, ∆S^k) to measure the closeness to a stationary point, and we define an ε-stationary point of (28) as follows.

Definition V.5. (ε-stationary point) We say that (U^k, S^k) ∈ \mathcal{M} × R^{m×n} is an ε-stationary point of (28) if (∆S^k, ∆U^k) given by (29a) and (29c) satisfies

    \|\Delta S^k\|_F^2 + \|\Delta U^k\|_F^2 \le \epsilon^2/L^2,    (32)

where L := min{L_S, L_U}.

Now we are ready to analyze the iteration complexity of AManPG for obtaining an ε-stationary point of (28). First, we prove two lemmas, which show that there is a sufficient reduction of the objective value after each update of S and U.

Lemma V.6. Assume Assumption V.2 holds, and the sequence (S^k, U^k, ∆S^k, ∆U^k) is generated by AManPG. By choosing t_S = 1/L_S and α = 1, the following inequality holds:

    F(U^k, S^{k+1}) - F(U^k, S^k) \le -\frac{L_S}{2}\|\Delta S^k\|_F^2.    (33)

Proof. Please see Appendix A for details.

Lemma V.7. Assume Assumption V.3 holds, and the sequence (S^k, U^k, ∆S^k, ∆U^k) is generated by AManPG. By choosing t_U = 1/L_U and β = 1, the following inequality holds:

    F(U^{k+1}, S^{k+1}) - F(U^k, S^{k+1}) \le -\frac{L_U}{2}\|\Delta U^k\|_F^2.    (34)

Proof. Please see Appendix B for details.

Now we are ready to present our main convergence result for AManPG.

Theorem V.8. Assume Assumptions V.1, V.2 and V.3 hold. By choosing t_U = 1/L_U, t_S = 1/L_S, α = β = 1 in AManPG (29), every limit point of the sequence {U^k, S^k} generated by AManPG (29) is a stationary point of problem (28). Moreover, AManPG (29) returns an ε-stationary point of problem (28) in at most ⌈2L(F(U^0, S^0) − F^*)/ε²⌉ iterations, where L := min(L_S, L_U).

Proof. Please see Appendix C for details.

VI. NUMERICAL EXPERIMENTS

In this section, we provide numerical results for both synthetic and real datasets to verify the performance of the proposed algorithms. We focus on comparing our ManPG and AManPG algorithms with some baseline algorithms that use the robustness of the ℓ1-norm, in particular, the subgradient method (SubGM) [12], LMaFit [10], and the Riemannian conjugate gradient method for the smoothed ℓ1-norm objective function (RMC) [5]. We use the same continuation framework for ManPG and call it ManPGC. For the SubGM method, we assume that the linear operator \mathcal{A} is a simple projection. For the LMaFit algorithm, we turn off the rank estimation, since we assume that the rank is known in all cases. We use the original settings for the RMC algorithm. All algorithms use the same C-Mex code for accelerating the matrix multiplication between a sparse matrix and a full matrix and some other bottleneck computations. Moreover, to guarantee a fair comparison, each algorithm is carefully tuned to achieve its best performance. All experiments were run in Matlab R2018b on a 2.3 GHz Dual-Core Intel Core i5 CPU.

[Figure 1 panels: (a)-(d) Cases 1-4 and (i)-(l) Cases 5-8 show relative difference vs. CPU time; (e)-(h) Cases 1-4 and (m)-(p) Cases 5-8 show relative difference vs. iteration count.]

Fig. 1. Relative difference for synthetic data on different cases. (a), (b), (c), (d), (i), (j), (k), (l) present the CPU time comparison; (e), (f), (g), (h), (m), (n), (o), (p) present the running iteration comparison.

A. Synthetic Data

We first test our ManPGC and AManPGC algorithms on different cases of synthetic data. After picking the values of m, n, r, we generate the ground truth U^* ∈ R^{m×r}, V^* ∈ R^{r×n} with i.i.d. normal entries of zero mean and unit variance. The target matrix is X^* = U^*V^*. We then choose a sampling ratio and sample entries uniformly at random to get the observed matrix M. Finally, we add a sparse matrix S^*, whose nonzero entries are generated by a normal distribution with zero mean and unit variance at a given sparsity rate, to the observed matrix M (a code sketch of this construction is given at the end of this subsection).

Parameters: In our experiments, we observe that setting t_S = 1 gives very good performance. Due to the problem formulation (10), t_S = 1 gives us a direct proximal mapping for the S subproblem when we fix V_{U^k,S^k}. We tune t_U between 1/|Ω| and 3/|Ω| and set γ_0 = 10, λ = 10^{-8}, µ_1 = µ_2 = 1/10 in all experiments. We use different ε_0 values, specified in the cases below. We may use any random matrix U^0 as our initial point, but in practice we use the singular value decomposition of the observed matrix M as the initial point for all algorithms.

We then test our algorithms in the following settings:

Case 1: We pick m = n = 5000, r = 5, a sampling ratio of around 10% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 2: We pick m = 1000, n = 30000, r = 5, a sampling ratio of around 10% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 3: We pick m = n = 10000, r = 5, a sampling ratio of around 10% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 4: We pick m = n = 2000, r = 10, a sampling ratio of around 20% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for AManPGC and t_U = 1.6/|Ω|, ε_0 = 20 for ManPGC.
Case 5: We pick m = n = 5000, r = 10, a sampling ratio of around 10% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 6: We pick m = n = 5000, r = 5, a sampling ratio of around 20% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 7: We pick m = n = 5000, r = 5, a sampling ratio of around 10% and a sparsity of 20%. We tune t_U = 2/|Ω|, ε_0 = 30 for both AManPGC and ManPGC.
Case 8: We pick m = n = 10000, r = 5, a sampling ratio of around 20% and a sparsity of 10%. We tune t_U = 2/|Ω|, ε_0 = 100 for both AManPGC and ManPGC.

In the k-th iteration, we calculate the relative difference for each method as

    \text{Relative Difference}(k) = \frac{\|U^k V^k - X^*\|_F}{\|X^*\|_F},    (35)

and report the relative difference versus both CPU time (seconds) and iteration number in Figure 1.

From Figure 1, we can see that the proposed ManPGC and AManPGC algorithms outperform all other baseline algorithms in both CPU time and number of iterations in all cases. It also shows that in most cases AManPGC performs better than ManPGC.
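The following numpy sketch (illustrative code, not the authors' Matlab implementation) generates a synthetic test instance as described above and evaluates the relative difference (35). Whether the sparsity rate of S^* is counted with respect to all entries or only the observed ones is not specified in the text; the sketch places the outliers on a random subset of the observed entries.

```python
import numpy as np

def make_instance(m, n, r, sample_ratio, sparsity, seed=0):
    """Synthetic RMC instance following the description above (illustrative names)."""
    rng = np.random.default_rng(seed)
    U_star = rng.standard_normal((m, r))
    V_star = rng.standard_normal((r, n))
    X_star = U_star @ V_star                                # ground-truth low-rank matrix
    mask = rng.random((m, n)) < sample_ratio                # observed index set Omega
    # Sparse outliers with i.i.d. N(0, 1) nonzero entries, placed on a random
    # subset of the observed entries (an assumption; see the lead-in above).
    outliers = mask & (rng.random((m, n)) < sparsity)
    S_star = np.where(outliers, rng.standard_normal((m, n)), 0.0)
    M = np.where(mask, X_star + S_star, 0.0)                # observed, corrupted matrix
    return X_star, M, mask

def relative_difference(U, V, X_star):
    """Recovery error (35)."""
    return np.linalg.norm(U @ V - X_star, "fro") / np.linalg.norm(X_star, "fro")

# Example: a small analogue of Case 1 (the paper uses m = n = 5000).
X_star, M, mask = make_instance(m=500, n=500, r=5, sample_ratio=0.1, sparsity=0.1)
```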


Fig. 2. Background estimation for the “Hall of a business building” video data. The first row is recovered from 50% observed pixels and the second row from 10% observed pixels. (a), (g) One of the original image frames. (b), (h) Background frame estimated by ManPGC. (c), (i) Background frame estimated by AManPGC. (d), (j) Background frame estimated by RMC [5]. (e), (k) Background frame estimated by LMaFit [10]. (f), (l) Background frame estimated by SubGM [12].

TABLE I CPU TIME AND ITERATION NUMBER COMPARISON (DATASET 1).

Algorithm               AManPGC   ManPGC   RMC    LMaFit   SubGM
CPU time (50%)          1.53      1.67     3.55   15.90    5.78
Iteration number (50%)  9         10       14     31       50
CPU time (10%)          1.02      0.91     1.56   13.44    1.07
Iteration number (10%)  11        12       30     27       60


Fig. 3. Background estimation for the “Campus Trees” video data. The first row is recovered from 50% observed pixels and the second row from 10% observed pixels. (a), (g) One of the original image frames. (b), (h) Background frame estimated by ManPGC. (c), (i) Background frame estimated by AManPGC. (d), (j) Background frame estimated by RMC [5]. (e), (k) Background frame estimated by LMaFit [10]. (f), (l) Background frame estimated by SubGM [12].

B. Real Data: Video Background Estimation from Partial Observation

We now evaluate the performance of ManPGC and AManPGC for video background estimation [35]. By stacking the columns of each frame into a long vector, we obtain a low-rank-plus-sparse matrix with the almost fixed background as the low-rank part. In this setting, we observe that even if we only have partial observations of each video frame, we can still recover the whole background with very good quality. Additionally, using partial observations speeds up the background reconstruction, since less computation is needed in each iteration for all algorithms. We apply all algorithms for background estimation to three surveillance video datasets: “Hall of a business building”, “Campus Trees” and “Airport Elevator”. In our implementation, the observed pixels are randomly picked from the stacked matrix (a code sketch of this stacking and sampling is given below).

Dataset 1: The “Hall of a business building” video is a sequence of 300 grayscale frames of size 144 × 176, so the matrix X^* ∈ R^{25344×300}.
Dataset 2: The “Campus Trees” video is a sequence of 994 grayscale frames of size 128 × 160, so the matrix X^* ∈ R^{20480×994}.
Dataset 3: The “Airport Elevator” video is a sequence of 997 grayscale frames of size 130 × 160, so the matrix X^* ∈ R^{20800×997}.

We set r = 2 and test two cases, with 50% and 10% observed pixels, for each dataset. We terminate each algorithm when the recovered matrix is stable. Specifically, we stop each algorithm when the following inequality is satisfied:

    \frac{\|U^k V^k - U^{k-1}V^{k-1}\|_F}{\|U^{k-1}V^{k-1}\|_F} \le \delta,    (36)

where we choose δ = 0.01 for all cases. We report the background pictures, the running times and the numbers of iterations in Figures 2-4 and Tables I-III.
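The stacking and random pixel sampling described above can be written in a few lines; the following numpy sketch (our own, with illustrative names and placeholder frames) builds the stacked matrix for a video with Dataset 1's dimensions and draws a 50% observation mask.

```python
import numpy as np

def stack_frames(frames):
    """Stack grayscale frames (each H x W) as the columns of an (H*W) x T matrix."""
    return np.stack([f.reshape(-1) for f in frames], axis=1)

def sample_pixels(shape, ratio, seed=0):
    """Randomly pick the observed pixels of the stacked matrix."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < ratio

# Placeholder frames with Dataset 1's dimensions: 300 frames of size 144 x 176,
# giving a stacked matrix in R^{25344 x 300}; 50% of the pixels are observed.
frames = [np.zeros((144, 176)) for _ in range(300)]
X = stack_frames(frames)
mask = sample_pixels(X.shape, 0.5)
```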

TABLE II CPU TIME AND ITERATION NUMBER COMPARISON (DATASET 2).

Algorithm               AManPGC   ManPGC   RMC    LMaFit   SubGM
CPU time (50%)          1.76      1.56     3.21   13.42    3.78
Iteration number (50%)  7         7        17     26       50
CPU time (10%)          0.82      0.67     2.31   10.93    1.76
Iteration number (10%)  9         11       26     28       50


Fig. 4. Background estimation for the “Airport Elevator” video data. The first row is recovered from 50% observed pixels and the second row from 10% observed pixels. (a), (g) One of the original image frames. (b), (h) Background frame estimated by ManPGC. (c), (i) Background frame estimated by AManPGC. (d), (j) Background frame estimated by RMC [5]. (e), (k) Background frame estimated by LMaFit [10]. (f), (l) Background frame estimated by SubGM [12].

TABLE III CPU TIME AND ITERATION NUMBER COMPARISON (DATASET 3).

Algorithm               AManPGC   ManPGC   RMC    LMaFit   SubGM
CPU time (50%)          1.61      1.37     2.70   9.97     3.65
Iteration number (50%)  8         8        26     25       60
CPU time (10%)          0.74      0.95     1.72   9.12     1.08
Iteration number (10%)  15        15       28     26       60

From Figures 2-4 and Tables I-III, we conclude that when we have 50% observed pixels, all algorithms can recover the background to a very high quality. Furthermore, the proposed ManPGC and AManPGC algorithms recover the background using the fewest iterations and run the fastest among all algorithms. We see that 10% observed pixels can also give a good recovery, and it further reduces the running times and iteration numbers for all algorithms. In each case, our proposed ManPGC and AManPGC algorithms still run the fastest. It is reported in [36] that, with a fully observed matrix (i.e., the robust PCA setting), recovering the background for Dataset 1 takes at least 18 seconds.

Here, by only using partial observations, we finish the same task in less than one second, which shows the advantage of background recovery from partial observations.

VII. CONCLUSION

In this paper, we have proposed a new formulation for RMC over the Grassmann manifold. Inspired by the recent work on ManPG, we have developed a new algorithm called AManPGC for solving this nonconvex, nonsmooth manifold optimization problem. We have provided a rigorous convergence analysis for the AManPG algorithm. In our numerical experiments, we have tested the proposed formulation and algorithms on both synthetic and real datasets. Both sets of experiments show that our proposed methods outperform state-of-the-art methods.

APPENDIX A
PROOF OF LEMMA V.6

Proof. For fixed U ∈ \mathcal{M} and S ∈ R^{m×n}, define

    g_{U,S}(T) := ⟨\nabla_S f(U, S), T⟩ + \frac{1}{2t_S}\|T\|_F^2 + h(S + T).

It is obvious that g_{U,S} is (1/t_S)-strongly convex, so we have

    g_{U,S}(T_1) \ge g_{U,S}(T_2) + ⟨\partial g_{U,S}(T_2), T_1 - T_2⟩ + \frac{1}{2t_S}\|T_1 - T_2\|_F^2, \quad \forall T_1, T_2 \in R^{m\times n}.    (37)

By letting T_1 = 0, T_2 = ∆S^k in (37), we have

    g_{U^k,S^k}(0) \ge g_{U^k,S^k}(\Delta S^k) - ⟨\partial g_{U^k,S^k}(\Delta S^k), \Delta S^k⟩ + \frac{1}{2t_S}\|\Delta S^k\|_F^2 = g_{U^k,S^k}(\Delta S^k) + \frac{1}{2t_S}\|\Delta S^k\|_F^2,    (38)

where the equality follows from the optimality condition of (29a), i.e., 0 ∈ ∂g_{U^k,S^k}(∆S^k). Using Assumption V.2, we can get

    f(U^k, S^{k+1}) - f(U^k, S^k) \le ⟨\nabla_S f(U^k, S^k), \Delta S^k⟩ + \frac{L_S}{2}\|\Delta S^k\|_F^2.    (39)

Therefore, writing g := g_{U^k,S^k},

    F(U^k, S^{k+1}) - F(U^k, S^k)
    = f(U^k, S^{k+1}) - f(U^k, S^k) + h(S^k + \Delta S^k) - h(S^k)
    \le ⟨\nabla_S f(U^k, S^k), \Delta S^k⟩ + \frac{L_S}{2}\|\Delta S^k\|_F^2 + h(S^k + \Delta S^k) - h(S^k)
    = \frac{L_S}{2}\|\Delta S^k\|_F^2 + g(\Delta S^k) - \frac{1}{2t_S}\|\Delta S^k\|_F^2 - g(0)
    \le \Big(\frac{L_S}{2} - \frac{1}{t_S}\Big)\|\Delta S^k\|_F^2
    = -\frac{L_S}{2}\|\Delta S^k\|_F^2,

where the first inequality comes from (39), and the second inequality is from (38). This completes the proof.

APPENDIX B
PROOF OF LEMMA V.7

Proof. From Assumption V.3, we have

    F(U^{k+1}, S^{k+1}) - F(U^k, S^{k+1})
    = f(U^{k+1}, S^{k+1}) - f(U^k, S^{k+1})
    \le f(\mathrm{Retr}_{U^k}(\Delta U^k), S^{k+1}) - f(U^k, S^{k+1})
    \le ⟨\Delta U^k, \mathrm{grad}_U f(U^k, S^{k+1})⟩ + \frac{L_U}{2}\|\Delta U^k\|_F^2
    \le \Big(\frac{L_U}{2} - \frac{1}{t_U}\Big)\|\Delta U^k\|_F^2
    = -\frac{L_U}{2}\|\Delta U^k\|_F^2,

where the last inequality is due to (31). This completes the proof.

APPENDIX C
PROOF OF THEOREM V.8

Proof. Combining (33) and (34) yields

    F(U^{k+1}, S^{k+1}) - F(U^k, S^k)
    = F(U^{k+1}, S^{k+1}) - F(U^k, S^{k+1}) + F(U^k, S^{k+1}) - F(U^k, S^k)
    \le -\frac{L_S}{2}\|\Delta S^k\|_F^2 - \frac{L_U}{2}\|\Delta U^k\|_F^2
    \le -\frac{L}{2}\big(\|\Delta S^k\|_F^2 + \|\Delta U^k\|_F^2\big).    (40)

Since F is decreasing and bounded below, we have

    \lim_{k\to\infty}\big(\|\Delta S^k\|_F^2 + \|\Delta U^k\|_F^2\big) = 0.

It follows that every limit point of {(U^k, S^k)} is a stationary point of (28). Moreover, if AManPG (29) does not terminate after K iterations, i.e., (32) is not satisfied, we have

    \|\Delta S^k\|_F^2 + \|\Delta U^k\|_F^2 > \epsilon^2/L^2, \quad \text{for } k = 0, 1, \ldots, K.

Then, summing (40) over k = 0, ..., K − 1, we have

    F(U^0, S^0) - F^* \ge F(U^0, S^0) - F(U^K, S^K) \ge \sum_{k=0}^{K-1}\frac{L}{2}\big(\|\Delta S^k\|_F^2 + \|\Delta U^k\|_F^2\big) \ge \frac{K\epsilon^2}{2L}.    (41)

Therefore, AManPG (29) with termination criterion (32) finds an ε-stationary point of problem (28) in at most ⌈2L(F(U^0, S^0) − F^*)/ε²⌉ iterations.

REFERENCES

[1] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, Aug 2009.
[2] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” in Advances in Neural Information Processing Systems, Vancouver, Canada, Dec 2009, pp. 2080–2088.
[3] Y. Koren, “Factorization meets the neighborhood: a multifaceted collaborative filtering model,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, Aug 2008, pp. 426–434.

[4] R. Otazo, E. Candès, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magnetic Resonance in Medicine, vol. 73, no. 3, pp. 1125–1136, Mar 2015.
[5] L. Cambier and P.-A. Absil, “Robust low-rank matrix completion by Riemannian optimization,” SIAM Journal on Scientific Computing, vol. 38, no. 5, pp. S440–S460, Oct 2016.
[6] S. Ma and N. S. Aybat, “Efficient optimization algorithms for robust principal component analysis and its variants,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1411–1426, Jun 2018.
[7] X. Guo and Z. Lin, “Low-rank matrix recovery via robust outlier estimation,” IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5316–5327, Nov 2018.
[8] X. Cao, Q. Zhao, D. Meng, Y. Chen, and Z. Xu, “Robust low-rank matrix factorization under general mixture noise distributions,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4677–4690, Oct 2016.
[9] Y. Chen and M. J. Wainwright, “Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees,” arXiv preprint arXiv:1509.03025, 2015.
[10] Y. Shen, Z. Wen, and Y. Zhang, “Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization,” Optimization Methods and Software, vol. 29, no. 2, pp. 239–263, Jul 2014.
[11] L. Zhao, P. Babu, and D. P. Palomar, “Efficient algorithms on robust low-rank matrix completion against outliers,” IEEE Transactions on Signal Processing, vol. 64, no. 18, pp. 4767–4780, May 2016.
[12] X. Li, Z. Zhu, A. Man-Cho So, and R. Vidal, “Nonconvex robust low-rank matrix recovery,” SIAM Journal on Optimization, vol. 30, no. 1, pp. 660–686, Feb 2020.
[13] W. Dai, O. Milenkovic, and E. Kerman, “Subspace evolution and transfer (SET) for low-rank matrix completion,” IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3120–3132, Jul 2011.
[14] W. Dai, E. Kerman, and O. Milenkovic, “A geometric approach to low-rank matrix completion,” IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 237–247, Jan 2012.
[15] R. H. Keshavan, A. Montanari, and S. Oh, “Low-rank matrix completion with noisy observations: a quantitative comparison,” in Proceedings of the 47th Allerton Conference on Communication, Control, and Computing, Allerton House, IL, Sep 2009.
[16] R. H. Keshavan and S. Oh, “OptSpace: A gradient descent algorithm on the Grassman manifold for matrix completion,” arXiv:0910.5260, 2009.
[17] J. He, L. Balzano, and J. Lui, “Online robust subspace tracking from partial information,” arXiv preprint arXiv:1109.3827, 2011.
[18] J. He, L. Balzano, and A. Szlam, “Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video,” in Proceedings of the 25th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), Providence, RI, Jun 2012.
[19] E. T. Hale, W. Yin, and Y. Zhang, “Fixed-point continuation for ℓ1-minimization: Methodology and convergence,” SIAM Journal on Optimization, vol. 19, no. 3, pp. 1107–1130, Oct 2008.
[20] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, Jun 2011.
[21] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Rank-sparsity incoherence for matrix decomposition,” SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, Jun 2011.
[22] N. Boumal and P.-A. Absil, “RTRMC: A Riemannian trust-region method for low-rank matrix completion,” in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NeurIPS 2011), Granada, Spain, Dec 2011, pp. 406–414.
[23] ——, “Low-rank matrix completion via preconditioned optimization on the Grassmann manifold,” Linear Algebra and its Applications, vol. 475, pp. 200–239, Jun 2015.
[24] Y. He, F. Wang, Y. Li, J. Qin, and B. Chen, “Robust matrix completion via maximum correntropy criterion and half-quadratic optimization,” IEEE Transactions on Signal Processing, vol. 68, pp. 181–195, Nov 2019.
[25] W.-J. Zeng and H. C. So, “Outlier-robust matrix completion via ℓp-minimization,” IEEE Transactions on Signal Processing, vol. 66, no. 5, pp. 1125–1140, Mar 2017.
[26] S. Zhang and M. Wang, “Correction of corrupted columns through fast robust Hankel matrix completion,” IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2580–2594, Mar 2019.
[27] S. Chen, S. Ma, A. M.-C. So, and T. Zhang, “Proximal gradient method for nonsmooth optimization over the Stiefel manifold,” SIAM Journal on Optimization, vol. 30, no. 1, pp. 210–239, Jan 2020.
[28] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2009.
[29] S. Chen, S. Ma, L. Xue, and H. Zou, “An alternating manifold proximal gradient method for sparse PCA and sparse CCA,” arXiv preprint arXiv:1903.11576, 2019.
[30] S. Ma, D. Goldfarb, and L. Chen, “Fixed point and Bregman iterative methods for matrix rank minimization,” Mathematical Programming, vol. 128, no. 1-2, pp. 321–353, Sep 2011.
[31] D. Goldfarb and S. Ma, “Convergence of fixed-point continuation algorithms for matrix rank minimization,” Foundations of Computational Mathematics, vol. 11, no. 2, pp. 183–210, Feb 2011.
[32] L. Xiao and T. Zhang, “A proximal-gradient homotopy method for the sparse least-squares problem,” SIAM Journal on Optimization, vol. 23, pp. 1062–1091, May 2013.
[33] N. Boumal, P.-A. Absil, and C. Cartis, “Global rates of convergence for nonconvex optimization on manifolds,” IMA Journal of Numerical Analysis, vol. 39, no. 1, pp. 1–33, Jan 2018.
[34] W. H. Yang, L.-H. Zhang, and R. Song, “Optimality conditions for the nonlinear programming problems on Riemannian manifolds,” Pacific Journal of Optimization, vol. 10, no. 2, pp. 415–434, Jul 2014.
[35] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–1472, Oct 2004.
[36] X. Zhang, L. Wang, and Q. Gu, “A unified framework for nonconvex low-rank plus sparse matrix recovery,” in Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, Playa Blanca, Spain, Apr 2018.