An exact penalty method for semidefinite-box constrained low-rank matrix optimization problems
Tianxiang Liu∗ Zhaosong Lu† Xiaojun Chen‡ Yu-Hong Dai§
September 9, 2018
Abstract. This paper considers a matrix optimization problem where the objective function is continuously differentiable and the constraints involve a semidefinite-box constraint and a rank constraint. We first replace the rank constraint by adding a non-Lipschitz penalty function to the objective and prove that this penalty problem is exact with respect to the original problem. Next, for the penalty problem, we present a nonmonotone proximal gradient (NPG) algorithm whose subproblem can be solved by Newton's method with global quadratic convergence. We also prove convergence of the NPG algorithm to a first-order stationary point of the penalty problem. Furthermore, based on the NPG algorithm, we propose an adaptive penalty method (APM) for solving the original problem. Finally, the efficiency of APM is shown via numerical experiments for the sensor network localization (SNL) problem and the nearest low-rank correlation matrix problem.
Keywords: rank constrained optimization, non-Lipschitz penalty, nonmonotone proximal gradient, penalty method.
1 Introduction
In this paper we consider the following constrained problem
$\min\, f(X)$  s.t.  $0 \preceq X \preceq I$,  $\mathrm{rank}(X) \le r$,  (1.1)
where $f : \mathcal{S}^n_+ \to \mathbb{R}$ is continuously differentiable with gradient $\nabla f$ being Lipschitz continuous, and $r < n$ is a given positive integer. Here, $\mathcal{S}^n_+$ denotes the cone of $n \times n$ symmetric positive semidefinite matrices, $I$ is the $n \times n$ identity matrix, and $0 \preceq X \preceq I$ means $X \in \mathcal{S}^n_+$ and $I - X \in \mathcal{S}^n_+$,
∗ Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China ([email protected]). This author was supported partly by the AMSS-PolyU Joint Research Institute Postdoctoral Scheme.
† Department of Mathematics, Simon Fraser University, Canada ([email protected]). This author was supported in part by an NSERC Discovery Grant.
‡ Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China ([email protected]). This author was supported in part by NSFC/Hong Kong Research Grant Council grant N-PolyU504/14.
§ Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China ([email protected]). This author was supported in part by the Chinese Natural Science Foundation (Nos. 11631013 and 11331012) and the National 973 Program of China (No. 2015CB856002).
which is referred to as a semidefinite-box constraint. Many application problems can be modeled by (1.1), including the wireless sensor network localization problem [4, 14] and the nearest low-rank correlation matrix problem [5, 12, 21].
Problem (1.1) is generally difficult to solve, due to the discontinuity and nonconvexity of the rank function. Recently, approximations of the rank function have been extensively studied. One well-known convex approximation is the nuclear norm $\|X\|_*$, namely, the sum of the singular values of $X$ (see for example [10]). For other work involving this approximation, see for example [7, 22, 23]. Besides, a nonconvex and nonsmooth approximation, the so-called Schatten $p$-norm $\|X\|_p^p = \sum_{i \ge 1} \sigma_i^p(X)$, where $p \in (0, 1)$ and $\sigma_i(X)$ is the $i$-th largest singular value, has attracted a lot of attention due to its good computational performance (see for example [14, 20, 17]). However, simply adding these approximations to the objective generally cannot guarantee a solution satisfying the rank constraint $\mathrm{rank}(X) \le r$, since they are not exact penalty functions for this constraint.
Inspired by the relation
  $\mathrm{rank}(X) \le r \iff \sum_{i=r+1}^n \lambda_i^p(X) = 0$  for $X \succeq 0$
and the good computational performance of the $p$-norm with $p \in (0, 1]$ for sparsity, we propose the following penalty model for problem (1.1):
  $\min_{0 \preceq X \preceq I} F_\mu(X) := f(X) + \mu \sum_{i=r+1}^n \lambda_i^p(X)$,  (1.2)
where $\mu > 0$ and $\lambda_i(X)$ ($i = 1, \ldots, n$) is the $i$-th largest eigenvalue of $X$. Such a penalty term with $p = 1$ has been used in [11] for solving a nearest low-rank correlation matrix problem. Nevertheless, we observe in numerical experiments that the penalty term with $p \in (0, 1)$ is generally more efficient than $p = 1$ in producing a low-rank solution of problem (1.1).
The main contributions of this paper are as follows.
• We propose a new penalty model (1.2) for the low-rank constrained problem (1.1) and prove that (1.2) is an exact penalty reformulation of (1.1) in the following sense: there exists some $\bar\mu > 0$ such that for any $\mu > \bar\mu$, $X^*$ is a global minimizer of problem (1.1) if and only if it is a global minimizer of problem (1.2). Furthermore, for any $\mu \ge \bar\mu$, any local minimizer of problem (1.1) is a local minimizer of problem (1.2).
• We propose a nonmonotone proximal gradient (NPG) method for solving the penalty model (1.2). Although the associated proximal subproblem is sophisticated and challenging due to the partial sum of eigenvalues, we reduce it to a set of univariate root-finding problems and show that they can be solved by Newton's method with global quadratic convergence.
• We propose an adaptive penalty method (APM) for (1.1) with a suitable updating scheme for the penalty parameter, in which each penalty subproblem is solved by the aforementioned NPG method. We establish its global convergence and also provide an estimate of the iteration complexity for finding an approximate stationary point of (1.1).
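As a small illustration of the penalty model (not part of the paper's algorithms), the term $\sum_{i=r+1}^n \lambda_i^p(X)$ in (1.2) can be evaluated with a dense eigenvalue decomposition. The following sketch is ours; the function name and the clipping safeguard against tiny negative eigenvalues are assumptions, not from the paper:

```python
import numpy as np

def rank_penalty(X, r, p):
    """Sum of the p-th powers of the n - r smallest eigenvalues of a
    symmetric PSD matrix X, i.e. the penalty term in (1.2)."""
    lam = np.linalg.eigvalsh(X)                       # eigenvalues in ascending order
    tail = np.clip(lam[: X.shape[0] - r], 0.0, None)  # n - r smallest, clipped at 0
    return float(np.sum(tail ** p))

# A rank-2 projector satisfies rank(X) <= 2, so its penalty is (numerically) zero.
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((5, 2)))
X = Q @ Q.T
print(rank_penalty(X, r=2, p=0.5))   # near zero up to rounding
```

Consistent with the equivalence stated above, the penalty vanishes exactly on the feasible set of (1.1) and is positive otherwise.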
The rest of this paper is organized as follows. In Section 2, notation and preliminaries are given. In Section 3, we show that the penalty model (1.2) is an exact penalty reformulation of problem (1.1). In Section 4, we present an NPG algorithm for solving the penalty problem (1.2). In Section 5, we propose an APM for solving problem (1.1). In Section 6, we present numerical experiments for solving a sensor network localization problem and a nearest low-rank correlation matrix problem.
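Since the semidefinite-box constraint $0 \preceq X \preceq I$ recurs throughout the sections outlined above, it may help to note that projecting a symmetric matrix onto this set amounts to clipping its eigenvalues to $[0, 1]$. A minimal sketch, with our own helper name (not code from the paper):

```python
import numpy as np

def proj_box(X):
    """Projection of a symmetric matrix X onto {X : 0 <= X <= I} in the
    Loewner order: clip the eigenvalues of X to the interval [0, 1]."""
    lam, U = np.linalg.eigh(X)
    return U @ np.diag(np.clip(lam, 0.0, 1.0)) @ U.T

A = np.array([[2.0, 0.3],
              [0.3, -1.0]])
P = proj_box(A)
lam = np.linalg.eigvalsh(P)
print(lam)  # all eigenvalues lie in [0, 1]
```

The decomposition-and-clip form follows because the Frobenius-norm projection of a symmetric matrix onto a spectral set acts eigenvalue-wise.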
2 Notation and preliminaries
The following notation will be used throughout this paper. Given any $x \in \mathbb{R}^n$, $x_{[i]}$ denotes the $i$-th largest entry of $x$ and $\mathrm{supp}(x)$ denotes the support of $x$, namely, $\mathrm{supp}(x) = \{i : x_i \ne 0\}$. The symbol $1_n$ denotes the all-ones vector of dimension $n$. Given $x, y \in \mathbb{R}^n$ and $\Omega \subseteq \mathbb{R}^n$, $x \le y$ means $x_i \le y_i$ for all $i$, and $\delta_\Omega(\cdot)$ is the indicator function of $\Omega$, i.e., $\delta_\Omega(x) = 0$ if $x \in \Omega$ and $\delta_\Omega(x) = \infty$ otherwise. For $x \in \mathbb{R}^n$ and a closed convex set $\Omega \subseteq \mathbb{R}^n$, $P_\Omega(x)$ is the projection of $x$ onto $\Omega$. The space of symmetric $n \times n$ matrices is denoted by $\mathcal{S}^n$. If $X \in \mathcal{S}^n$ is positive semidefinite, we write $X \succeq 0$. Given any $X$ and $Y$ in $\mathcal{S}^n$, $X \preceq Y$ means $Y - X$ is positive semidefinite. In addition, given matrices $X$ and $Y$ in $\mathcal{S}^n$, $\langle X, Y \rangle$ denotes their trace inner product. We use $\mathcal{C}$ and $\Omega$ to denote the feasible regions of problems (1.2) and (1.1), respectively, that is,
  $\mathcal{C} := \{X \in \mathcal{S}^n : 0 \preceq X \preceq I\}$,  $\Omega := \{X \in \mathcal{S}^n : 0 \preceq X \preceq I,\ \mathrm{rank}(X) \le r\}$.  (2.1)
Given any $X \in \mathcal{S}^n$, let $X_\Omega$ be a projection of $X$ onto $\Omega$, that is, $X_\Omega \in \Omega$ and
  $\|X - X_\Omega\|_F = \min_{Z \in \Omega} \|X - Z\|_F$.  (2.2)
Recall that $f$ is assumed to be continuously differentiable in $\mathcal{C}$. Since $\mathcal{C}$ is compact, it follows that $f$ is Lipschitz continuous in $\mathcal{C}$, that is, there exists some constant $L_f > 0$ such that
  $|f(X) - f(Y)| \le L_f \|X - Y\|_F$,  $\forall X, Y \in \mathcal{C}$.  (2.3)
Before ending this section, we present some preliminary technical results that will be used subsequently.
Lemma 2.1. Let $p \in (0, 1]$ and $X_\Omega$ be a projection of $X$ onto $\Omega$. Then it holds that
  $\|X - X_\Omega\|_F \le \sum_{i=r+1}^n \lambda_i^p(X)$,  $\forall X \in \mathcal{C}$.  (2.4)
Proof. By Proposition 2.6 of [19], it is not hard to show that
  $\|X - X_\Omega\|_F = \sqrt{\sum_{i=r+1}^n \lambda_i^2(X)}$,  $\forall X \in \mathcal{C}$.  (2.5)
Notice from (2.1) that $0 \le \lambda_i(X) \le 1$ for all $i$ and $X \in \mathcal{C}$. In view of this fact and $p \in (0, 1]$, one can observe that
  $\sqrt{\sum_{i=r+1}^n \lambda_i^2(X)} \le \sum_{i=r+1}^n \lambda_i(X) \le \sum_{i=r+1}^n \lambda_i^p(X)$,  $\forall X \in \mathcal{C}$.  (2.6)
It then follows from this relation and (2.5) that (2.4) holds as desired. This completes the proof.

3 Exact penalty reformulation

In this section we study the relationship between the penalty model (1.2) and problem (1.1). The following theorem shows that (1.2) is an exact penalty reformulation of (1.1) in terms of global minimizers.
Theorem 3.1.
Let $p \in (0, 1]$. For any $\mu \ge L_f$, any global minimizer of problem (1.1) is a global minimizer of problem (1.2). Conversely, for any $\mu > L_f$, any global minimizer of (1.2) is also a global minimizer of (1.1).
Proof. For the first part, let $X^*$ be a global minimizer of (1.1) and let $X$ be an arbitrary matrix in $\mathcal{C}$. Let $X_\Omega$ denote a projection of $X$ onto $\Omega$. Then we know from the global optimality of $X^*$ that $f(X_\Omega) \ge f(X^*)$. Using this relation and (2.3), we have
  $f(X^*) - f(X) = f(X^*) - f(X_\Omega) + f(X_\Omega) - f(X) \le f(X_\Omega) - f(X) \le L_f \|X - X_\Omega\|_F$.  (3.1)
This together with (2.4), $\mu \ge L_f$ and $\mathrm{rank}(X^*) \le r$ implies that
  $f(X) + \mu \sum_{i=r+1}^n \lambda_i^p(X) \ge f(X) + L_f \|X - X_\Omega\|_F \ge f(X^*) = f(X^*) + \mu \sum_{i=r+1}^n \lambda_i^p(X^*)$,
which together with the arbitrariness of $X \in \mathcal{C}$ and $X^* \in \mathcal{C}$ implies that $X^*$ is a global minimizer of (1.2).
For the second part, assume $\mu > L_f$. Let $X^*$ be a global minimizer of problem (1.2) and $X^*_\Omega$ be a projection of $X^*$ onto $\Omega$. It is easy to observe that if $X^* \in \Omega$, then it is a global minimizer of problem (1.1). Thus it suffices to show that $X^* \in \Omega$. Suppose for contradiction that $X^* \notin \Omega$. Then we have $\|X^* - X^*_\Omega\|_F > 0$, and hence
  $f(X^*_\Omega) \le f(X^*) + L_f \|X^* - X^*_\Omega\|_F < f(X^*) + \mu \|X^* - X^*_\Omega\|_F \le f(X^*) + \mu \sum_{i=r+1}^n \lambda_i^p(X^*) \le f(X^*_\Omega)$,
where the first inequality follows from (2.3), the second inequality is due to $\mu > L_f$, the third inequality is due to (2.4), and the last inequality follows from the global optimality of $X^*$ for (1.2) together with $\mathrm{rank}(X^*_\Omega) \le r$. These inequalities immediately lead to the contradiction $f(X^*_\Omega) < f(X^*_\Omega)$. This completes the proof.

We show in the next theorem that any local minimizer of problem (1.1) is also a local minimizer of problem (1.2), provided $\mu \ge L_f$.
Theorem 3.2. Let $p \in (0, 1]$. For any $\mu \ge L_f$, any local minimizer of problem (1.1) is a local minimizer of problem (1.2).
Proof. Suppose that $X^*$ is an arbitrary local minimizer of problem (1.1) and $\mu \ge L_f$. Then there exists some $\varepsilon > 0$ such that
  $f(X) \ge f(X^*)$,  $\forall X \in B(X^*; \varepsilon) \cap \Omega$.  (3.2)
It follows from (2.2) that for every $X \in B(X^*; \varepsilon/2)$,
  $\|X_\Omega - X^*\|_F \le \|X_\Omega - X\|_F + \|X - X^*\|_F \le 2\|X - X^*\|_F \le \varepsilon$,
where $X_\Omega$ is a projection of $X$ onto $\Omega$. This implies $X_\Omega \in B(X^*; \varepsilon) \cap \Omega$ for every $X \in B(X^*; \varepsilon/2)$. It follows from this and (3.2) that $f(X_\Omega) \ge f(X^*)$ for any $X \in B(X^*; \varepsilon/2)$. Using this relation and (2.3), we see that (3.1) also holds for every $X \in B(X^*; \varepsilon/2) \cap \mathcal{C}$. In view of (2.4), (3.1) and an argument similar to the proof of Theorem 3.1, one can obtain that for every $X \in B(X^*; \varepsilon/2) \cap \mathcal{C}$,
  $f(X) + \mu \sum_{i=r+1}^n \lambda_i^p(X) \ge f(X) + L_f \|X - X_\Omega\|_F \ge f(X^*) = f(X^*) + \mu \sum_{i=r+1}^n \lambda_i^p(X^*)$,
where the equality is due to $\mathrm{rank}(X^*) \le r$. Hence, $X^*$ is a local minimizer of problem (1.2). This completes the proof.

4 A nonmonotone proximal gradient method for solving (1.2)

In this section, we present a nonmonotone proximal gradient (NPG) method for solving problem (1.2), which is similar to the one proposed by Wright et al. [26]. We show that the subproblems arising in NPG can be efficiently solved. We also establish convergence of this method.

4.1 NPG algorithm and convergence

We first present an NPG method for solving problem (1.2).

Algorithm 1 Nonmonotone proximal gradient (NPG) method for (1.2)
Initialization. Let $0 < L_{\min} < L_{\max}$, $\gamma > 1$, $c > 0$ and an integer $N \ge 0$ be given. Choose an arbitrary $0 \preceq X^0 \preceq I$ and set $k = 0$.
Step 1. Choose $L_k^0 \in [L_{\min}, L_{\max}]$ arbitrarily. Set $L_k = L_k^0$.
(1a) Solve the subproblem
  $X^{k+1} \in \mathrm{Arg\,min}_{0 \preceq X \preceq I} \left\{ \langle \nabla f(X^k), X - X^k \rangle + \frac{L_k}{2} \|X - X^k\|_F^2 + \mu \sum_{i=r+1}^n \lambda_i^p(X) \right\}$.  (4.1)
(1b) Go to Step 2 if
  $F_\mu(X^{k+1}) \le \max_{[k-N]_+ \le i \le k} F_\mu(X^i) - \frac{c}{2} \|X^{k+1} - X^k\|_F^2$.  (4.2)
(1c) Set $L_k \leftarrow \gamma L_k$ and go to (1a).
Step 2. Set $k \leftarrow k + 1$ and go to Step 1.

Remark 4.1. (i) When $N = 0$, the sequence $\{F_\mu(X^k)\}$ is monotonically decreasing. Otherwise, it may increase at some iterations and thus the above method is generally a nonmonotone method.
(ii) A popular choice of $L_k^0$ is given by the following formula proposed by Barzilai and Borwein [1] (see also [2]):
  $L_k^0 = \max\left\{ L_{\min}, \min\left\{ L_{\max}, \frac{\langle S^k, Y^k \rangle}{\|S^k\|_F^2} \right\} \right\}$,  (4.3)
where $S^k = X^k - X^{k-1}$ and $Y^k = \nabla f(X^k) - \nabla f(X^{k-1})$.
We next study the convergence of the NPG method for solving problem (1.2). Before proceeding, we introduce two definitions, which can be found in [24].
Definition 4.1 (limiting subdifferential). For a lower semicontinuous function $g$ on $\mathcal{S}^n$, the limiting subdifferential of $g$ at $X \in \mathcal{S}^n$ is defined as
  $\partial g(X) := \left\{ V : \exists\, Z^k \xrightarrow{g} X,\ V^k \to V \text{ with } \liminf_{Z \to Z^k} \frac{g(Z) - g(Z^k) - \langle V^k, Z - Z^k \rangle}{\|Z - Z^k\|_F} \ge 0 \ \ \forall k \right\}$,
where $Z^k \xrightarrow{g} X$ means $Z^k \to X$ and $g(Z^k) \to g(X)$.
Definition 4.2 (first-order stationary point). We say that $X^*$ is a first-order stationary point of (1.2) if $X^* \in \mathcal{C}$ and
  $0 \in \nabla f(X^*) + \partial(\mu \Theta(X^*) + \delta_{\mathcal{C}}(X^*))$,  (4.4)
where $\Theta(X) := \sum_{i=r+1}^n \lambda_i^p(X)$, $\mathcal{C}$ is defined in (2.1) and $\partial(\cdot)$ is given in Definition 4.1.
Notice from [24, Theorem 10.1] and [24, Exercise 10.10] that any local minimizer $\bar X \in \mathcal{C}$ of (1.2) is a first-order stationary point of (1.2).
The following theorem states that at each outer iteration of Algorithm 1, the number of its inner iterations is uniformly bounded. Its proof is similar to that of [19, Theorem 4.2].
Theorem 4.1. For each $k \ge 0$, the inner termination criterion (4.2) is satisfied after at most
  $\max\left\{ \frac{\log(L_{\nabla f} + c) - \log(L_{\min})}{\log \gamma} + 1,\ 1 \right\}$
inner iterations, where $L_{\nabla f}$ is the Lipschitz constant associated with $\nabla f$.
We next show that any accumulation point of $\{X^k\}$ is a first-order stationary point of problem (1.2).
Theorem 4.2. Let the sequence $\{X^k\}$ be generated by Algorithm 1. The following statements hold:
(i) $\|X^{k+1} - X^k\|_F \to 0$ as $k \to \infty$;
(ii) any accumulation point of $\{X^k\}$ is a first-order stationary point of (1.2).
Proof. (i) The proof is similar to that of [26, Lemma 4].
(ii) Let $\bar L_k$ be the final value of $L_k$ at the $k$th outer iteration. It follows from Theorem 4.1 that $\{\bar L_k\}$ is bounded.
By the first-order optimality condition of (4.1), we have $X^{k+1} \in \mathcal{C}$ and
  $0 \in \nabla f(X^k) + \bar L_k (X^{k+1} - X^k) + \partial(\mu \Theta(X^{k+1}) + \delta_{\mathcal{C}}(X^{k+1}))$,  (4.5)
where $\Theta(X) := \sum_{i=r+1}^n \lambda_i^p(X)$. Notice that $\{X^k\} \subset \mathcal{C}$ and $\mathcal{C}$ is bounded. Hence, $\{X^k\}$ is bounded and has at least one accumulation point, say $X^*$. Let $K$ be a subsequence index set such that $\{X^k\}_K \to X^*$, which together with $\{X^k\} \subset \mathcal{C}$ and $\|X^{k+1} - X^k\|_F \to 0$ implies that $X^* \in \mathcal{C}$ and $\{X^{k+1}\}_K \to X^*$. Using this, the boundedness of $\{\bar L_k\}$, the continuity of $\nabla f$, and the outer semicontinuity of $\partial(\mu \Theta + \delta_{\mathcal{C}})$ [24, Proposition 8.7], and taking limits on both sides of (4.5) as $k \in K \to \infty$, we have
  $0 \in \nabla f(X^*) + \partial(\mu \Theta(X^*) + \delta_{\mathcal{C}}(X^*))$.
Hence, $X^*$ is a first-order stationary point of (1.2). This completes the proof.

4.2 An efficient algorithm for solving subproblem (4.1)

In this subsection we propose an efficient algorithm for solving subproblem (4.1). To proceed, we first consider the parametric univariate optimization problem
  $\min_{0 \le z \le 1} \Phi(z, t) := \frac{1}{2}(z - t)^2 + \nu z^p$  (4.6)
for $\nu > 0$ and $p \in (0, 1]$. Clearly, problem (4.6) has at least one optimal solution $z^*$ and $\Phi(z^*, t)$ is well defined for any $t \in (-\infty, \infty)$. In addition, it is not hard to see that for $p = 1$, problem (4.6) has the unique optimal solution $z^* = \min(1, \max(t - \nu, 0))$. In what follows, we study some properties of the optimal solution set of (4.6) for $p \in (0, 1)$.
Lemma 4.1. Let $Z^*(t)$ denote the set of optimal solutions of problem (4.6) for $t \in (-\infty, \infty)$ and $p \in (0, 1)$. Let
  $\alpha := \min\left\{ [2(1-p)\nu]^{\frac{1}{2-p}},\ 1 \right\}$,  $\beta := [\nu p(1-p)]^{\frac{1}{2-p}}$,  (4.7)
  $t_1 := \frac{\alpha}{2} + \nu \alpha^{p-1}$,  $t_2 := \max\left\{ \frac{1}{2} + \nu,\ 1 + \nu p \right\}$.  (4.8)
Then the following statements hold:
(i) $0 \in Z^*(t)$ if and only if $t \le t_1$;
(ii) $1 \in Z^*(t)$ if and only if $t \ge t_2$;
(iii) $Z^*(t) = \{z^*\} \subseteq [\beta, \min\{t, 1\})$ if and only if $t \in (t_1, t_2)$, where $z^*$ is the unique root of the equation
  $g(z) := z - t + \nu p z^{p-1} = 0$  (4.9)
in the interval $[\beta, \infty)$.
The proof of this lemma is given in the Appendix.
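The case analysis in Lemma 4.1 translates directly into a small numerical routine: return the closed-form answers for $t \le t_1$ and $t \ge t_2$, and otherwise locate the root of (4.9) by Newton's method. The sketch below uses our own helper name and minimal safeguards; it is not the paper's implementation:

```python
import numpy as np

def solve_univariate(t, nu, p, tol=1e-12, max_iter=50):
    """One optimal solution of min_{0<=z<=1} 0.5*(z-t)^2 + nu*z**p for
    p in (0,1), following the case analysis of Lemma 4.1."""
    alpha = min((2.0 * (1.0 - p) * nu) ** (1.0 / (2.0 - p)), 1.0)
    beta = (nu * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    t1 = alpha / 2.0 + nu * alpha ** (p - 1.0)        # threshold (4.8), left
    t2 = max(0.5 + nu, 1.0 + nu * p)                  # threshold (4.8), right
    if t <= t1:
        return 0.0
    if t >= t2:
        return 1.0
    # Interior case: Newton's method on g(z) = z - t + nu*p*z**(p-1),
    # started at the midpoint of (beta, min(t, 1)) as in Remark 4.2.
    g = lambda z: z - t + nu * p * z ** (p - 1.0)
    if g(beta) == 0.0:
        return beta
    z = 0.5 * (beta + min(t, 1.0))
    for _ in range(max_iter):
        step = g(z) / (1.0 - nu * p * (1.0 - p) * z ** (p - 2.0))  # g(z)/g'(z)
        z -= step
        if abs(step) <= tol:
            break
    return z

z = solve_univariate(0.8, 0.1, 0.5)   # interior case: Newton root of (4.9)
```

Since $g$ is convex and increasing on $(\beta, \infty)$, the iteration overshoots the root at most once and then decreases monotonically to it, which is why no bracketing safeguard is included here.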
As an immediate consequence of Lemma 4.1, we obtain the following formula for computing an optimal solution of problem (4.6) for $p \in (0, 1)$.
Corollary 4.1. Let $Z^*(t)$ denote the set of optimal solutions of problem (4.6) for $t \in (-\infty, \infty)$ and $p \in (0, 1)$. Let $\beta$, $t_1$ and $t_2$ be defined in (4.7) and (4.8). Then we have $z^*(t) \in Z^*(t)$, where $z^* : \mathbb{R} \to [0, 1]$ is defined as follows:
  $z^*(t) = \begin{cases} 0 & \text{if } t \le t_1, \\ \tilde z^* & \text{if } t_1 < t < t_2, \\ 1 & \text{otherwise}, \end{cases}$  (4.10)
where $\tilde z^*$ is the unique root of equation (4.9) in $[\beta, \infty)$.
As seen from (4.10), the value of $z^*(t)$ is known exactly for $t \le t_1$ or $t \ge t_2$. Nevertheless, for $t \in (t_1, t_2)$, the exact value of $z^*(t)$ is typically unknown since equation (4.9) generally does not have a closed-form root. We next present an efficient numerical scheme for estimating the root $\tilde z^*$ of equation (4.9) by Newton's method.
Newton's method for solving (4.9): Let $\beta$, $t_1$, $t_2$ and $g(\cdot)$ be defined in (4.7), (4.8) and (4.9). Let $t \in (t_1, t_2)$ be given. If $g(\beta) = 0$, set $\tilde z^* = \beta$. Otherwise, choose $z_0 \in (\beta, \infty)$ and perform
  $z_{k+1} = z_k - g(z_k)/g'(z_k)$ for $k \ge 0$.  (4.11)
Remark 4.2. Recall from Lemma 4.1 (iii) that the unique root $\tilde z^*$ of equation (4.9) in $[\beta, \infty)$ lies in $[\beta, \min\{t, 1\})$. Therefore, for practical efficiency, it is natural to choose $z_0 = (\beta + \min\{t, 1\})/2$.
The following theorem shows that the above Newton's method is able to find an approximate root of equation (4.9) in $[\beta, \infty)$, and moreover, it is globally and quadratically convergent.
Theorem 4.3. Let $\beta$, $t_1$ and $t_2$ be defined in (4.7) and (4.8). Then for any $t \in (t_1, t_2)$ and $p \in (0, 1)$, Newton's method given above either finds the root $\tilde z^*$ of equation (4.9) or generates a sequence $\{z_k\}$ which is globally and quadratically convergent to $\tilde z^*$; in particular,
  $0 \le z_{k+1} - \tilde z^* \le \frac{\nu p(1-p)(2-p)(\tilde z^*)^{p-3}}{1 - \nu p(1-p)(\tilde z^*)^{p-2}} (z_k - \tilde z^*)^2$,  $\forall k \ge 1$.
Proof.
In view of Corollary 4.1, we know that for any $t \in (t_1, t_2)$ and $p \in (0, 1)$, equation (4.9) has a unique root $\tilde z^*$ in $[\beta, \infty)$. Therefore, if $g(\beta) = 0$, then $\tilde z^* = \beta$. Otherwise, $\tilde z^*$ is the unique root of (4.9) in $(\beta, \infty)$ and Newton's iteration (4.11) generates a sequence $\{z_k\}$. We have from (4.9) that
  $g'(z) = 1 - \nu p(1-p) z^{p-2}$,  $g''(z) = \nu p(1-p)(2-p) z^{p-3}$.  (4.12)
Notice that $g'(\beta) = 0$ and $\beta > 0$. It is easy to see that $g'(z) > 0$, $g''(z) > 0$ and $g''(z)$ is continuous for every $z \in (\beta, \infty)$. Hence, the assumptions of Lemma A.1 hold for $q = g$, $a = \beta$ and $z_* = \tilde z^*$. In addition, one can observe from (4.12) that $g'(\tilde z^*) = 1 - \nu p(1-p)(\tilde z^*)^{p-2}$ and $\max_{z \in [\tilde z^*, z_1]} g''(z) = \nu p(1-p)(2-p)(\tilde z^*)^{p-3}$. Therefore, the conclusion follows directly from Lemma A.1. This completes the proof.
The proof of the following lemma uses an approach similar to the one proposed in [18].
Lemma 4.2. Let $p \in (0, 1]$ and $\nu > 0$ be given, and let
  $V(t) := \underbrace{\min_{0 \le z \le 1} \left\{ \frac{1}{2}(z - t)^2 + \nu z^p \right\}}_{V_1(t)} - \underbrace{\min_{0 \le z \le 1} \frac{1}{2}(z - t)^2}_{V_2(t)}$,  $\forall t \in \mathbb{R}$.  (4.13)
Then $V(t)$ is increasing on $(-\infty, \infty)$.
Proof. It is not hard to observe from (4.13) that $V_1$, $V_2$ and $V$ are well defined. Also, we see from (4.6) that
  $V_1(t) = \min_{0 \le z \le 1} \Phi(z, t)$,  (4.14)
where $\Phi$ is defined in (4.6). Notice that $\Phi$ and $\nabla_t \Phi$ are continuous on $[0, \infty) \times \mathbb{R}$. Moreover,
  $|\nabla_t \Phi(z, t)| = |t - z| \le |t| + 1$,  $\forall z \in [0, 1]$.
Hence, $\Phi$ is locally Lipschitz in $t$, uniformly for all $z \in [0, 1]$. Using these facts, one can observe that the assumptions of [9, Theorem 2.1] hold for $g = \Phi$ and $U = [0, 1]$. Thus it follows from [9, Theorem 2.1] that $V_1$ is locally Lipschitz continuous on $\mathbb{R}$ and moreover
  $\partial V_1(t) = \mathrm{conv}(\{\nabla_t \Phi(z, t) : z \in Z^*(t)\}) = \mathrm{conv}(t - Z^*(t))$,  (4.15)
where $\partial V_1$ denotes the Clarke subdifferential of $V_1$, $\mathrm{conv}(\cdot)$ is the convex hull of the associated set, and $Z^*(t)$ denotes the set of optimal solutions of (4.14). Notice that $V_2$ is differentiable and moreover
  $\nabla V_2(t) = t - P_{[0,1]}(t)$,  $\forall t \in \mathbb{R}$.
It then follows from the above two equalities that