STRIDE Along Spectrahedral Vertices for Solving Large-Scale Rank-One Semidefinite Relaxations

STRIDE along Spectrahedral Vertices for Solving Large-Scale Rank-One Semidefinite Relaxations Heng Yang1, Ling Liang2, Kim-Chuan Toh2, and Luca Carlone1 1Laboratory for Information and Decision Systems, MIT 2Department of Mathematics, National University of Singapore Abstract We consider solving high-order semidefinite programming (SDP) relaxations of nonconvex polynomial optimization problems (POPs) that admit rank-one optimal solutions. Existing approaches, which solve the SDP independently from the POP, either cannot scale to large problems or suffer from slow convergence due to the typical degeneracy of such SDPs. We propose a new algorithmic framework, called SpecTrahedral pRoximal gradIent Descent along vErtices (STRIDE), that blends fast local search on the nonconvex POP with global descent on the convex SDP. Specifically, STRIDE follows a globally convergent trajectory driven by a proximal gradient method (PGM) for solving the SDP, while simultaneously probing long, but safeguarded, rank-one “strides”, generated by fast nonlinear programming algorithms on the POP, to seek rapid descent. We prove STRIDE has global convergence. To solve the subproblem of projecting a given point onto the feasible set of the SDP, we reformulate the projection step as a continuously differentiable unconstrained optimization and apply a limited-memory BFGS method to achieve both scalability and accuracy. We conduct numerical experiments on solving second-order SDP relaxations arising from two important applications in machine learning and computer vision. STRIDE dominates a diverse set of five existing SDP solvers and is the only solver that can solve degenerate rank-one SDPs to high accuracy (e.g., KKT residuals below 1e−9), even in the presence of millions of equality constraints. 1 Introduction d Let p; h1; : : : ; hl 2 R[x] be real-valued multivariate polynomials in x 2 R . We consider the following equality constrained polynomial optimization problem arXiv:2105.14033v1 [math.OC] 28 May 2021 min fp(x) j hi(x) = 0; i = 1; : : : ; lg : (POP) d x2R Assuming problem (POP) is feasible and bounded below, we denote by −∞ < p? < 1 the global minimum and by x? a global minimizer. Finding (p?; x?) has applications in various fields of engineering and science [28]. It is, however, well recognized that problem (POP) is NP-hard except T ? ? for some special cases (e.g., minxTx=1 x Qx for a symmetric matrix Q, has (p ; x ) as the minimum eigenvalue and associated eigenvector of Q). Therefore, solving convex relaxations of (POP), especially semidefinite programming (SDP) relaxations arising from the celebrated moment/sums- of-squares hierarchy [27, 40], has been the predominant way of finding (p?; x?). In this paper, we design robust and scalable algorithms for computing (p?; x?) of (POP) through its SDP relaxations. SDP Relaxations for Polynomial Optimization Problems. We first give an overview of the moment/sums-of-squares semidefinite relaxation hierarchy [27, 40]. Let κ ≥ 1 be an integer such that 2κ is equal or greater than the maximum degree of p; hi; i = 1; : : : ; l, and denote v := [x]κ as Preprint. Under review. T the standard vector of monomials in x of degree up to κ. We build the moment matrix X := [x]κ[x]κ that contains the standard monomials in x of degree up to 2κ. As a result, p and hi’s can be written as linear functions in X. We then consider two types of necessary linear constraints that can be imposed on X: (i) the entries of X are not linearly independent, in fact the same monomial could appear in multiple locations of X; (ii) each constraint hi in (POP) generates a set of equalities of the form hi(x)[x]2κ−deg(hi) = 0, having monomials in x of degree up to 2κ and thus linear in X, where deg(hi) is the degree of hi. Since X is positive semidefinite by construction, reformulating (POP) using the moment matrix X leads to the following primal semidefinite relaxation min fhC; Xi j A(X) = b; X 0g ; (P) n X2S ¯ d+κ m n m where n = dκ := ( κ ) is the dimension of [x]κ, b 2 R , A : S ! R is a linear map A(X) := m n n (hAi;Xi)i=1; 8X 2 S with Ai 2 S ; i = 1; : : : ; m, that is onto and collects all the linearly independent constraints generated by the relaxation. The feasible set of (P) is called a spectrahedron and is defined by the intersection of the positive semidefinite cone with a finite set of linear equality ∗ m n ∗ Pm m constraints. Let A : R 2 S be the adjoint of A defined as A y := i=1 yiAi; 8y 2 R , the standard Lagrangian dual of (P) is min fhb; yi j A∗y + S = C; S 0g ; (D) m n y2R ;S2S and admits an interpretation using sums-of-squares polynomials [40]. SDP relaxations (P)-(D) correspond to the so-called dense hierarchy, while many sparse variants exist to exploit the sparsity of p and hi’s to generate relaxations with multiple smaller blocks, see [51]. The algorithms developed in this paper can be extended in a straightforward way to solve sparse multi-block relaxations. ? ? ? ? Assuming strong duality holds between (P) and (D), it follows that p ≥ fP = fD, where fP and ? fD are the optimal values of (P) and (D), respectively. The SDP relaxation is said to be tight if ? ? ? ? ? T p = fP. In such cases, for any global minimizer x of (POP), the rank-one lifting X = [x ]κ[x ]κ is a minimizer of (P), and any rank-one optimal solution X? of (P) corresponds to a global minimizer of (POP). A special case of the relaxation hierarchy is Shor’s relaxation (κ = 1) for quadratically constrained quadratic programs [47, 30], of which applications and analyses have been extensively studied [15, 1, 12, 45, 42, 17, 50, 14]. Although κ = 1 is interesting and typically leads to SDPs with a small number of constraints (e.g., m = n in MAXCUT [21]) that can be solved efficiently by existing solvers, higher order relaxations (κ ≥ 2) are often desirable to produce tighter relaxations. ? ? In the seminal work [27], Lasserre proved that fP asymptotically approaches p as κ increases to infinity [27, Theorem 4.2]. Later on, several authors showed that tightness indeed happens for a finite κ [29, 37, 38]. Notably, Nie proved that, under the archimedean condition, if constraint qualification, strict complementarity and second-order sufficient conditions hold at every global minimizer of (POP), then the hierarchy converges at a finite κ [38, Theorem 1.1]. What is more encouraging is that, empirically, for many important (POP) instances, tightness happens at a very low relaxation order such as κ = 2; 3; 4. For instance, Lasserre [26] showed that κ = 2 attains tightness for a set of 50 randomly generated MAXCUT problems; in the experiments of Henrion and Lasserre [22], tightness holds at a small κ for most of the (POP) problems in the literature; Doherty, Parrilo and Spedalieri [20] demonstrated that the second-order SOS relaxation is sufficient to decide quantum separability in all bound entangled states found in the literature of dimensions up to 6 by 6; Cifuentes [16] designed and proved that a sparse variant of the second-order moment relaxation is guaranteed to be tight for the structured total least squares problem under a low noise assumption; Yang and Carlone applied a sparse second-order moment relaxation to globally solve a broad family of nonconvex and combinatorial computer vision problems [54, 56, 55]. Computational Challenges in Solving High-order Relaxations. The surging applications listed above make it imperative to design computational tools for solving high-order (tight) SDP relaxations. However, high-order relaxations pose notorious challenges to existing SDP solvers. The computational challenges mainly come from two aspects. In the first place, in stark contrast to Shor’s semidefinite relaxation, high-order relaxations lead to SDPs with a large amount of linear constraints, even if the dimension of the original (POP) is small. Taking binary quadratic programming as an example (i.e., xi 2 f+1; −1g and p is quadratic in (POP)), when d = 60, the number of equality constraints for κ = 2 is m = 1; 266; 971. For this reason, in all applications above, the experiments were performed on (POP) problems of very small size, e.g., the original (POP) has d up to 20 for dense relaxations. In the second place, the resulting SDP is not only large, but 2 also highly degenerate. Specifically, since the relaxation is tight and the optimal solution of (P) is rank one, it necessarily fails the primal nondegeneracy condition [3], which is crucial for ensuring numerical stability and fast convergence of existing algorithms [4, 61]. Due to these two challenges, our experiments show that no existing solver can consistently solve semidefinite relaxations with rank-one solutions to a desired accuracy when m is larger than 500; 000. In particular, interior point methods, such as SDPT3 [49] and MOSEK [6], can only handle up to m = 50; 000 before they run out of memory on an ordinary workstation. First-order methods based on ADMM and conditional gradient method, such as CDCS [62] and SketchyCGAL [59], albeit scalable, converge very slowly and cannot attain even modest accuracy. The best performing existing solver is SDPNAL+ [58], which employs an augmented Lagrangian framework where the inner problem is solved inexactly by a semi-smooth Newton method with a conjugate gradient solver (SSNCG). However, the problem with SDPNAL+ is that, in the presence of large m and degeneracy, computing the inexact SSN step involves solving a large and singular linear system of size m × m, and CG becomes incapable of computing the search direction with sufficient accuracy.

Load more