An Exact Penalty Method for Binary Optimization Based on MPEC Formulation

Ganzhao Yuan and Bernard Ghanem
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
[email protected], [email protected]

Abstract

Binary optimization is a central problem in mathematical optimization and its applications are abundant. To solve this problem, we propose a new class of continuous optimization techniques, which is based on Mathematical Programming with Equilibrium Constraints (MPECs). We first reformulate the binary program as an equivalent augmented biconvex optimization problem with a bilinear equality constraint, then we propose an exact penalty method to solve it. The resulting algorithm seeks a desirable solution to the original problem via solving a sequence of linear programming convex relaxation subproblems. In addition, we prove that the penalty function, induced by adding the complementarity constraint to the objective, is exact, i.e., it has the same local and global minima as those of the original binary program when the penalty parameter exceeds some threshold. The convergence of the algorithm can be guaranteed, since it essentially reduces to block coordinate descent in the literature. Finally, we demonstrate the effectiveness of our method on the problem of dense subgraph discovery. Extensive experiments show that our method outperforms existing techniques, such as iterative hard thresholding and linear programming relaxation.

1 Introduction

In this paper, we mainly focus on the following binary optimization problem:

$$\min_{\mathbf{x}}\ f(\mathbf{x}),\ \ \text{s.t.}\ \mathbf{x}\in\{-1,+1\}^n,\ \mathbf{x}\in\Omega \qquad (1)$$

where the objective function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is convex but not necessarily smooth on some convex set $\Omega$, and the non-convexity of (1) is only caused by the binary constraints. In addition, we assume $\{-1,+1\}^n\cap\Omega\neq\emptyset$.

The optimization in (1) describes many applications of interest in both computer vision and machine learning, including graph bisection (Goemans and Williamson, 1995; Keuchel et al., 2003), Markov random fields (Boykov, Veksler, and Zabih, 2001), the permutation problem (Jiang, Liu, and Wen, 2016; Fogel et al., 2015), graph matching (Cour, Srinivasan, and Shi, 2007; Toshev, Shi, and Daniilidis, 2007; Zaslavskiy, Bach, and Vert, 2009), image (co-)segmentation (Shi and Malik, 2000; Joulin, Bach, and Ponce, 2010), image registration (Wang et al., 2016), and social network analysis (e.g., subgraph discovery (Yuan and Zhang, 2013; Ames, 2015), biclustering (Ames, 2014), planted clique and biclique discovery (Ames and Vavasis, 2011), and community discovery (He et al., 2016; Chan and Yeung, 2011)), etc.

The binary optimization problem is difficult to solve, since it is NP-hard. One type of method to solve this problem is continuous in nature. The simplest way is to relax the binary constraint with Linear Programming (LP) relaxation constraints $-1\le\mathbf{x}\le 1$ and round the entries of the resulting continuous solution to the nearest integer at the end. However, not only may this solution not be optimal, it may not even be feasible and may violate some constraint. Another type of optimization focuses on the cutting-plane and branch-and-cut methods. The cutting-plane method solves the LP relaxation and then adds linear constraints that drive the solution towards integers. The branch-and-cut method partially develops a binary tree and iteratively cuts out the nodes having a lower bound that is worse than the current upper bound, while the lower bound can be found using convex relaxation, Lagrangian duality, or Lipschitz continuity. However, this class of methods ends up solving all $2^n$ convex subproblems in the worst case. Our algorithm aligns with the first research direction. It relies on solving a convex LP relaxation subproblem iteratively, but it provably terminates in a polynomial number of iterations.

In non-convex optimization, good initialization is very important to the quality of the solution. Motivated by this, several papers design smart initialization strategies and establish optimality qualifications of the solutions for non-convex problems. For example, the work of (Zhang, 2010) considers a multi-stage convex optimization algorithm to refine the global solution by the initial convex method; the work of (Candès, Li, and Soltanolkotabi, 2015) starts with a careful initialization obtained by a spectral method and improves this estimate by gradient descent; the work of (Jain, Netrapalli, and Sanghavi, 2013) uses the top-k singular vectors of the matrix as initialization and provides theoretical guarantees for a biconvex alternating minimization algorithm. The proposed method also uses a similar initialization strategy, since it reduces to convex LP relaxation in the first iteration.

The contributions of this paper are three-fold. (a) We reformulate the binary program as an equivalent augmented optimization problem with a bilinear equality constraint via a variational characterization of the binary constraint. Then, we propose an exact penalty method to solve it. The resulting algorithm seeks a desirable solution to the original binary program. (b) We prove that the penalty function, induced by adding the complementarity constraint to the objective, is exact, i.e., the set of its globally optimal solutions coincides with that of (1) when the penalty parameter exceeds some threshold. Thus, the convergence of the algorithm can be guaranteed, since it reduces to block coordinate descent in the literature (Tseng, 2001; Bolte, Sabach, and Teboulle, 2014). To our knowledge, this is the first attempt to solve general non-smooth binary optimization with guaranteed convergence. (c) We provide numerical comparisons with state-of-the-art techniques, such as iterative hard thresholding (Yuan and Zhang, 2013) and linear programming relaxation (Komodakis and Tziritas, 2007; Kumar, Kolmogorov, and Torr, 2009) on dense subgraph discovery. Extensive experiments demonstrate the effectiveness of our proposed method.
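To make the relax-and-round baseline above concrete, here is a minimal Python sketch (illustrative only, not from the paper): it assumes a smooth convex objective given by a hypothetical gradient callback `grad_f` and takes $\Omega=\mathbb{R}^n$, so that the box $[-1,1]^n$ is the only constraint; it minimizes the relaxation by projected gradient descent and rounds with the sign function at the end.

```python
import numpy as np

def relax_and_round(grad_f, n, steps=500, lr=0.01):
    """LP-style relax-and-round baseline for problem (1).

    Relaxes x in {-1,+1}^n to the box -1 <= x <= 1 (Omega assumed to be R^n),
    minimizes the convex objective by projected gradient descent, and rounds
    the continuous solution to the nearest binary point at the end.
    """
    x = np.zeros(n)
    for _ in range(steps):
        x = x - lr * grad_f(x)          # gradient step on the relaxation
        x = np.clip(x, -1.0, 1.0)       # projection onto the box [-1, 1]^n
    # Rounding: may lose optimality and can violate constraints in Omega.
    return np.where(x >= 0.0, 1.0, -1.0)

# Example: f(x) = 0.5 * ||x - c||^2 pulls entries toward a continuous target c.
rng = np.random.default_rng(0)
c = rng.uniform(-1, 1, size=8)
x_bin = relax_and_round(lambda x: x - c, n=8)
print(x_bin)  # entries of c rounded to +/-1
```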

Table 1: Existing continuous methods for binary optimization.

Relaxed Approximation:
- spectral relaxation (Cour and Shi, 2007): $\{-1,+1\}^n \approx \{\mathbf{x} \mid \|\mathbf{x}\|_2^2 = n\}$
- linear programming relaxation (Komodakis and Tziritas, 2007): $\{-1,+1\}^n \approx \{\mathbf{x} \mid -1 \le \mathbf{x} \le 1\}$
- SDP relaxation (Wang et al., 2016): $\{0,+1\}^n \approx \{\mathbf{x} \mid \mathbf{X}\succeq\mathbf{x}\mathbf{x}^T,\ \mathrm{diag}(\mathbf{X})=\mathbf{x}\}$; $\{-1,+1\}^n \approx \{\mathbf{x} \mid \mathbf{X}\succeq\mathbf{x}\mathbf{x}^T,\ \mathrm{diag}(\mathbf{X})=\mathbf{1}\}$
- doubly positive relaxation (Huang, Chen, and Guibas, 2014): $\{0,+1\}^n \approx \{\mathbf{x} \mid \mathbf{X}\succeq\mathbf{x}\mathbf{x}^T,\ \mathrm{diag}(\mathbf{X})=\mathbf{x},\ \mathbf{x}\ge 0,\ \mathbf{X}\ge 0\}$
- completely positive relaxation (Burer, 2009): $\{0,+1\}^n \approx \{\mathbf{x} \mid \mathbf{X}\succeq\mathbf{x}\mathbf{x}^T,\ \mathrm{diag}(\mathbf{X})=\mathbf{x},\ \mathbf{x}\ge 0,\ \mathbf{X}\text{ is CP}\}$
- SOCP relaxation (Kumar, Kolmogorov, and Torr, 2009): $\{-1,+1\}^n \approx \{\mathbf{x} \mid \langle\mathbf{X}-\mathbf{x}\mathbf{x}^T,\mathbf{L}\mathbf{L}^T\rangle\ge 0,\ \mathrm{diag}(\mathbf{X})=\mathbf{1}\},\ \forall\mathbf{L}$

Equivalent Optimization:
- iterative hard thresholding (Yuan and Zhang, 2013): $\min_{\mathbf{x}} \|\mathbf{x}-\mathbf{x}^0\|_2^2$, s.t. $\mathbf{x}\in\{-1,+1\}^n$
- piecewise separable reformulation (Zhang et al., 2007): $\{-1,+1\}^n \Leftrightarrow \{\mathbf{x} \mid (\mathbf{1}+\mathbf{x})\odot(\mathbf{1}-\mathbf{x})=\mathbf{0}\}$
- $\ell_0$ norm non-separable reformulation (Yuan and Ghanem, 2016b): $\{-1,+1\}^n \Leftrightarrow \{\mathbf{x} \mid \|\mathbf{x}+\mathbf{1}\|_0+\|\mathbf{x}-\mathbf{1}\|_0\le n\}$
- $\ell_2$ box non-separable reformulation (Murray and Ng, 2010): $\{-1,+1\}^n \Leftrightarrow \{\mathbf{x} \mid -1\le\mathbf{x}\le 1,\ \|\mathbf{x}\|_2^2=n\}$
- $\ell_p$ box non-separable reformulation (Wu and Ghanem, 2016): $\{-1,+1\}^n \Leftrightarrow \{\mathbf{x} \mid -1\le\mathbf{x}\le 1,\ \|\mathbf{x}\|_p^p=n,\ 0<p<\infty\}$
- $\ell_2$ box non-separable MPEC [This paper]: $\{-1,+1\}^n \Leftrightarrow \{\mathbf{x} \mid -1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \langle\mathbf{x},\mathbf{v}\rangle=n,\ \forall\mathbf{v}\}$

Notations. We use lowercase and uppercase boldfaced letters to denote real vectors and matrices, respectively. The Euclidean inner product between $\mathbf{x}$ and $\mathbf{y}$ is denoted by $\langle\mathbf{x},\mathbf{y}\rangle$ or $\mathbf{x}^T\mathbf{y}$. $\mathbf{X}\succeq 0$ means that matrix $\mathbf{X}$ is positive semidefinite. Finally, $\mathrm{sign}$ is a signum function with $\mathrm{sign}(0)=\pm 1$.

2 Related Work

This paper proposes a new continuous method for binary optimization. We briefly review existing related work in this research direction (see Table 1).

There are generally two types of methods in the literature. One is the relaxed approximation method. Spectral relaxation (Cour and Shi, 2007; Olsson, Eriksson, and Kahl, 2007; Shi and Malik, 2000) replaces the binary constraint with a spherical one and solves the problem using eigendecomposition. Despite its computational merits, it is difficult to generalize to handle linear or nonlinear constraints. Linear programming relaxation (Komodakis and Tziritas, 2007; Kumar, Kolmogorov, and Torr, 2009) transforms the NP-hard optimization problem into a convex box-constrained optimization problem, which can be solved by well-established optimization methods and software. Semi-Definite Programming (SDP) relaxation (Huang, Chen, and Guibas, 2014) uses a lifting technique $\mathbf{X}=\mathbf{x}\mathbf{x}^T$ and relaxes it to the convex conic constraint $\mathbf{X}\succeq\mathbf{x}\mathbf{x}^T$ [1] to handle the binary constraint. Combining this with a unit-ball randomized rounding algorithm, the work of (Goemans and Williamson, 1995) proves that at least a factor of 87.8% of the global optimal solution can be achieved for the graph bisection problem. Since the original paper of (Goemans and Williamson, 1995), SDP has been applied to develop numerous approximation algorithms for NP-hard problems. As more constraints lead to tighter bounds for the objective, doubly positive relaxation considers constraining both the eigenvalues and the elements of the SDP solution to be nonnegative, leading to better solutions than canonical SDP methods. In addition, Completely Positive (CP) relaxation (Burer, 2010, 2009) further constrains the entries of the factorization $\mathbf{X}=\mathbf{L}\mathbf{L}^T$ of the solution to be nonnegative ($\mathbf{L}\ge 0$). It can be solved by tackling its associated dual co-positive program, which is related to the study of indefinite optimization and sum-of-squares optimization in the literature. Second-Order Cone Programming (SOCP) relaxation relaxes the SDP cone into the nonnegative orthant (Kumar, Kolmogorov, and Torr, 2009) using the fact that $\langle\mathbf{X}-\mathbf{x}\mathbf{x}^T,\mathbf{L}\mathbf{L}^T\rangle\ge 0,\ \forall\mathbf{L}$, resulting in a tighter bound than the LP method, but looser than that of the SDP method. Therefore, it can be viewed as a balance between efficiency and efficacy.

[1] Using the Schur complement lemma, one can rewrite $\mathbf{X}\succeq\mathbf{x}\mathbf{x}^T$ as $\begin{pmatrix}\mathbf{X}&\mathbf{x}\\ \mathbf{x}^T&1\end{pmatrix}\succeq 0$.
Another type of methods for binary optimization relates to equivalent optimization. The iterative hard thresholding method directly handles the non-convex constraint via projection and has been widely used due to its simplicity and efficiency (Yuan and Zhang, 2013). However, this method is often observed to obtain sub-optimal accuracy and it is not directly applicable when the objective is non-smooth. A piecewise separable reformulation has been considered in (Zhang et al., 2007), which can exploit existing smooth optimization techniques. Binary optimization can also be reformulated as an $\ell_0$ norm semi-continuous optimization problem; thus, existing $\ell_0$ norm sparsity-constrained optimization techniques, such as the quadratic penalty decomposition method (Lu and Zhang, 2013) and the multi-stage convex optimization method (Zhang, 2010; Yuan and Ghanem, 2016b), can be applied. A continuous $\ell_2$ box non-separable reformulation [2] has been used in the literature (Raghavachari, 1969; Kalantari and Rosen, 1982), and a second-order interior point method (Murray and Ng, 2010; De Santis and Rinaldi, 2012) has been developed to solve this continuous reformulation; a quick numerical check of this equivalence is sketched below.

[2] They replace $\mathbf{x}\in\{0,1\}^n$ with $0\le\mathbf{x}\le 1,\ \mathbf{x}^T(\mathbf{1}-\mathbf{x})=0$. We extend this strategy to replace $\{-1,+1\}^n$ with $-1\le\mathbf{x}\le 1,\ (\mathbf{1}+\mathbf{x})^T(\mathbf{1}-\mathbf{x})=0$, which reduces to $\|\mathbf{x}\|_\infty\le 1,\ \|\mathbf{x}\|_2^2=n$.
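The $\ell_2$ box equivalence just cited rests on the fact that, over the box, $\|\mathbf{x}\|_2^2 \le n$ with equality exactly at the binary vertices. A small illustrative Python check (our own, not from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Any interior point of the box satisfies ||x||_2^2 < n ...
x_interior = rng.uniform(-0.99, 0.99, size=n)
assert np.sum(x_interior**2) < n
# ... while the constraint ||x||_2^2 = n together with -1 <= x <= 1
# is met only by binary vectors:
x_binary = rng.choice([-1.0, 1.0], size=n)
assert np.isclose(np.sum(x_binary**2), n)
# Pushing any coordinate strictly inside (-1, 1) breaks the equality:
x_perturbed = x_binary.copy(); x_perturbed[0] = 0.5
assert np.sum(x_perturbed**2) < n
print("l2-box check passed: ||x||_2^2 = n on the box forces |x_i| = 1")
```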
A continuous $\ell_p$ box non-separable reformulation has recently been used in (Wu and Ghanem, 2016), where an interesting geometric illustration of the $\ell_p$-box intersection has been shown [3]. In addition, they infuse this equivalence into the optimization framework of the Alternating Direction Method of Multipliers (ADMM). However, their guarantee of convergence is weak. In this paper, to tackle the problem of binary optimization, we propose a new framework that is based on Mathematical Programming with Equilibrium Constraints (MPECs). Our resulting algorithm is theoretically convergent and empirically effective.

[3] We adapt their formulation to our $\{-1,+1\}$ formulation.

Mathematical programs with equilibrium constraints are optimization problems where the constraints include complementarities or variational inequalities. They are difficult to deal with because their feasible region may not necessarily be convex, or even connected. Motivated by recent developments of MPECs for non-convex optimization (Yuan and Ghanem, 2015, 2016a,b), we consider a continuous $\ell_2$ box non-separable MPEC for binary optimization [4].

[4] For a $\{0,+1\}$ binary variable, we have: $\{0,+1\}^n\Leftrightarrow\{\mathbf{x}\mid 0\le\mathbf{x}\le 1,\ \|2\mathbf{v}-\mathbf{1}\|_2^2\le n,\ \langle 2\mathbf{x}-\mathbf{1},2\mathbf{v}-\mathbf{1}\rangle=n,\ \forall\mathbf{v}\}$.

3 An Exact Penalty Method

This section presents an exact penalty method for binary optimization, which is based on a new MPEC formulation. First, we present our reformulation of the binary constraint.

Lemma 1 ($\ell_2$ box non-separable MPEC). We define $\Theta\triangleq\{(\mathbf{x},\mathbf{v})\mid\mathbf{x}^T\mathbf{v}=n,\ \|\mathbf{v}\|_2^2\le n,\ -1\le\mathbf{x}\le 1\}$. Assume that $(\mathbf{x},\mathbf{v})\in\Theta$; then $\mathbf{x}\in\{-1,+1\}^n$, $\mathbf{v}\in\{-1,+1\}^n$, and $\mathbf{x}=\mathbf{v}$.

Proof. (i) Firstly, we prove that $\mathbf{x}\in\{-1,+1\}^n$. Using the definition of $\Theta$ and the Cauchy-Schwarz inequality, we have: $n=\mathbf{x}^T\mathbf{v}\le\|\mathbf{x}\|_2\|\mathbf{v}\|_2\le\sqrt{n}\,\|\mathbf{x}\|_2=\sqrt{n}\sqrt{\mathbf{x}^T\mathbf{x}}\le\sqrt{n}\sqrt{\|\mathbf{x}\|_1\|\mathbf{x}\|_\infty}\le\sqrt{n}\sqrt{\|\mathbf{x}\|_1}$. Thus, we obtain $\|\mathbf{x}\|_1\ge n$. We define $\mathbf{z}=|\mathbf{x}|$. Combining $\|\mathbf{x}\|_\infty\le 1$, we have the following constraint set for $\mathbf{z}$: $\sum_i z_i\ge n,\ 0\le\mathbf{z}\le 1$. Therefore, we have $\mathbf{z}=\mathbf{1}$ and it holds that $\mathbf{x}\in\{-1,+1\}^n$. (ii) Secondly, we prove that $\mathbf{v}\in\{-1,+1\}^n$. We have:

$$n=\mathbf{x}^T\mathbf{v}\le\|\mathbf{x}\|_\infty\|\mathbf{v}\|_1\le\|\mathbf{v}\|_1=|\mathbf{v}|^T\mathbf{1}\le\|\mathbf{v}\|_2\|\mathbf{1}\|_2 \qquad (2)$$

Thus, we obtain $\|\mathbf{v}\|_2\ge\sqrt{n}$. Combining $\|\mathbf{v}\|_2^2\le n$, we have $\|\mathbf{v}\|_2=\sqrt{n}$ and $\|\mathbf{v}\|_2\|\mathbf{1}\|_2=n$. By the Squeeze Theorem, all the equalities in (2) hold automatically. Using the equality condition of the Cauchy-Schwarz inequality, we have $|\mathbf{v}|=\mathbf{1}$ and it holds that $\mathbf{v}\in\{-1,+1\}^n$. (iii) Finally, since $\mathbf{x}\in\{-1,+1\}^n$, $\mathbf{v}\in\{-1,+1\}^n$, and $\langle\mathbf{x},\mathbf{v}\rangle=n$, we obtain $\mathbf{x}=\mathbf{v}$. □

Using Lemma 1, we can rewrite (1) in an equivalent form as follows:

$$\min_{-1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n}\ f(\mathbf{x}),\ \ \text{s.t.}\ \mathbf{x}^T\mathbf{v}=n,\ \mathbf{x}\in\Omega \qquad (3)$$

We remark that $\mathbf{x}^T\mathbf{v}=n$ is referred to as the complementarity (or equilibrium) constraint in the literature (Luo, Pang, and Ralph, 1996; Ralph and Wright, 2004), and it always holds that $\mathbf{x}^T\mathbf{v}\le\|\mathbf{x}\|_\infty\|\mathbf{v}\|_1\le\sqrt{n}\|\mathbf{v}\|_2\le n$ for any feasible $\mathbf{x}$ and $\mathbf{v}$.

We now present our exact penalty method for solving the optimization problem in (3). It is worthwhile to point out that there are many studies on exact penalty methods for MPECs (refer to (Luo, Pang, and Ralph, 1996; Hu and Ralph, 2004; Ralph and Wright, 2004; Yuan and Ghanem, 2016b) for examples), but they do not afford the exactness of our penalty problem. In an exact penalty method, we penalize the complementarity error directly by a penalty function. The resulting objective $\mathcal{J}:\mathbb{R}^n\times\mathbb{R}^n\rightarrow\mathbb{R}$ is defined in (7), where $\rho$ is the penalty parameter that is iteratively increased to enforce the bilinear constraint:

$$\mathcal{J}_\rho(\mathbf{x},\mathbf{v})=f(\mathbf{x})+\rho(n-\mathbf{x}^T\mathbf{v}),\ \ \text{s.t.}\ -1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \mathbf{x}\in\Omega \qquad (7)$$

In each iteration, we minimize over $\mathbf{x}$ and $\mathbf{v}$ alternatingly (Tseng, 2001; Bolte, Sabach, and Teboulle, 2014), while fixing the parameter $\rho$. We summarize our exact penalty method in Algorithm 1.

Algorithm 1 MPEC-EPM: An Exact Penalty Method for Solving MPEC Problem (3)
(S.0) Set $t=0$, $\mathbf{x}^0=\mathbf{v}^0=\mathbf{0}$, $\rho>0$, $\sigma>1$.
(S.1) Solve the following $\mathbf{x}$-subproblem [primal step]:
$$\mathbf{x}^{t+1}=\arg\min_{\mathbf{x}}\ \mathcal{J}(\mathbf{x},\mathbf{v}^t),\ \ \text{s.t.}\ -1\le\mathbf{x}\le 1,\ \mathbf{x}\in\Omega \qquad (4)$$
(S.2) Solve the following $\mathbf{v}$-subproblem [dual step]:
$$\mathbf{v}^{t+1}=\arg\min_{\mathbf{v}}\ \mathcal{J}(\mathbf{x}^{t+1},\mathbf{v}),\ \ \text{s.t.}\ \|\mathbf{v}\|_2^2\le n \qquad (5)$$
(S.3) Update the penalty every $T$ iterations:
$$\rho\Leftarrow\min(2L,\ \rho\times\sigma) \qquad (6)$$
(S.4) Set $t:=t+1$ and go to Step (S.1)
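A minimal Python sketch of Algorithm 1 follows (illustrative only; the paper's actual implementation is in Matlab). It assumes a smooth convex objective given by a hypothetical gradient callback `grad_f` and takes $\Omega=\mathbb{R}^n$, so that the $\mathbf{x}$-subproblem (4) reduces to a box-constrained problem handled here by a few projected gradient steps; the $\mathbf{v}$-update uses the closed form (9) derived in observation (c) below.

```python
import numpy as np

def mpec_epm(grad_f, n, L, rho=0.01, sigma=np.sqrt(10), T=10,
             outer=30, inner=50, eps=0.01):
    """Sketch of Algorithm 1 (MPEC-EPM); assumes Omega = R^n and smooth f."""
    x = np.zeros(n)
    v = np.zeros(n)
    step = 1.0 / (L + 1e-12)                 # conservative step size (illustrative)
    for t in range(outer * T):
        # (S.1) x-subproblem (4): minimize f(x) - rho*<x, v> over the box
        for _ in range(inner):
            x = np.clip(x - step * (grad_f(x) - rho * v), -1.0, 1.0)
        # (S.2) v-subproblem (5): closed form (9); any feasible v is optimal at x = 0
        nx = np.linalg.norm(x)
        if nx > 0:
            v = np.sqrt(n) * x / nx
        # (S.3) increase the penalty every T iterations, capped at 2L as in (6)
        if (t + 1) % T == 0:
            rho = min(2 * L, rho * sigma)
        # stop once the complementarity constraint holds up to eps
        if n - x.dot(v) <= eps:
            break
    return np.where(x >= 0.0, 1.0, -1.0)     # x is (near-)binary at termination
```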
We summarize our exact penalty method in 4For {0, +1} binary variable, we have: {0, +1}n ⇔ {x | 0 ≤ Algorithm1. The parameter T is the number of inner itera- 2 x ≤ 1, k2v − 1k2 ≤ n, h2x − 1, 2v − 1i = n, ∀v} tions for solving the biconvex problem and the parameter L is the Lipschitz constant of the objective function f(·). We inequalities: make the following observations about the algorithm. √ √ p 2 0 n − nkxk2 n − n (n − 1) + δ (a) Initialization. We initialize v to 0. This is for the sake > p of finding a reasonable local minimum in the first iteration, ksign(x) − xk2 (1 − δ)2 √ √ as it reduces to convex LP relaxation (Komodakis and Tzir- n − n( n − 1 + δ) itas, 2007) for the binary optimization problem. ≥ (1 − δ) (b) Exact property. One remarkable feature of our method √ √ √ n − n n − 1 nδ is the boundedness of the penalty parameter ρ (see Theo- = + rem1). Therefore, we terminate the optimization when the (1 − δ) (1 − δ) √ √ threshold is reached (see (6)). This distinguishes it from the n − n n − 1 > + 0 quadratic penalty method (Lu and Zhang, 2013), where the 1 penalty may become arbitrarily large for non-convex prob- √ √ √ lems. where we use the inequality a + b ≤ a + b, ∀a, b > 0 (c) v-Subproblem. Variable v in (5) is updated by solving and the fact that 0 < δ < 1. Since the lower bound above can the following convex problem: be applied to an arbitrary vector, we finish the proof of the first inequality. (ii) We prove the second inequality in (10). We have the following results: 1/4 > 0 ⇒ n2 −n+1/4 > vt+1 = arg min hv, −xt+1i s.t. kvk2 ≤ n (8) √ √ 2 n2 − n ⇒ (n − 1/2) > n2 − n ⇒ n − n2 − n > 1/2. When xt+1 = 0, any feasible solution is also an optimal solution. When xt+1 6= 0, the optimal solution will be The following lemma is useful in establishing the exact- 2 achieved at the constraint boundary with kvk2 = n and ness property of the penalty function in Algorithm1. 1 2 t+1 (8) is equivalent to solving: min 2 kvk − hv, x i. kvk2=n 2 2 Lemma 3. Consider the following optimization problem: Thus, we have the following optimal solution for v: ∗ ∗ (x , v ) = arg min Jρ(x, v). (11) ρ ρ 2 −1≤x≤1,kvk2≤n, x∈Ω  √ t+1 t+1 t+1 t+1 n · x /kx k2, x 6= 0; v = 2 (9) Assume that f(·) is a L-Lipschitz continuous convex func- any v with kvk2 ≤ n, otherwise. ∗ ∗ tion on −1 ≤ x ≤ 1. When ρ > 2L, hxρ, vρi = n will be achieved for any local optimal solution of (11). (d) x-Subproblem. Variable x in (4) is updated by solving a box constrained convex problem, which has no closed-form Proof. First of all, we focus on the v-subproblem in (11): ∗ T 2 ∗ solution in general. However, it can be solved using Nes- vρ = arg minv −x v, s.t. kvk2 ≤ n. Assume that xρ 6= ∗ √ ∗ ∗ terov’s proximal gradient method (Nesterov, 2003) or clas- 0, we have vρ = n · xρ/kxρk2 by (9). Then the biconvex sical/linearized ADM (He and Yuan, 2012). optimization problem reduces to the following: ∗ √ Theoretical Analysis. In the following, we present some xρ = arg min p(x) , f(x) + ρ(n − nkxk2) (12) theoretical analysis of our exact penalty method. The fol- x∈[−1,+1]n∩Ω lowing lemma is very crucial and useful in our proofs. ∗ For any xρ ∈ Ω, we derive the following inequalities: n ∗ ∗ Lemma 2. Let x ∈ R be an arbitrary vector with −1 ≤ 0.5ρksign(xρ) − xρk2  1, x > 0; √ ∗ x ≤ 1. We define sign(x) = ±1, x = 0; and assume ≤ ρ(n − nkxρk2) −1, x < 0. √ ∗ ∗ ∗ sign(x) 6= x. 
Lemma 2. Let $\mathbf{x}\in\mathbb{R}^n$ be an arbitrary vector with $-1\le\mathbf{x}\le 1$. We define

$$\mathrm{sign}(x)=\begin{cases}1,&x>0;\\ \pm 1,&x=0;\\ -1,&x<0,\end{cases}$$

and assume $\mathrm{sign}(\mathbf{x})\neq\mathbf{x}$. The following inequalities hold:

$$h(\mathbf{x})\triangleq\frac{n-\sqrt{n}\|\mathbf{x}\|_2}{\|\mathrm{sign}(\mathbf{x})-\mathbf{x}\|_2} > n-\sqrt{n^2-n} > 1/2 \qquad (10)$$
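Before the formal proof, the bound (10) is easy to sanity-check numerically. The snippet below (illustrative only, not part of the paper) evaluates $h(\mathbf{x})$ on random non-binary box points and confirms it exceeds both $n-\sqrt{n^2-n}$ and $1/2$.

```python
import numpy as np

def h(x):
    n = x.size
    s = np.where(x >= 0.0, 1.0, -1.0)        # sign with sign(0) taken as +1
    return (n - np.sqrt(n) * np.linalg.norm(x)) / np.linalg.norm(s - x)

rng = np.random.default_rng(2)
n = 10
for _ in range(10000):
    x = rng.uniform(-1, 1, size=n)           # almost surely sign(x) != x
    assert h(x) > n - np.sqrt(n**2 - n) > 0.5
print("Lemma 2 bound (10) held on all sampled points")
```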

Proof. (i) We prove the first inequality in (10). We define $N(\mathbf{x})$ as the number of $\pm 1$ binary entries in $\mathbf{x}$, i.e., $N(\mathbf{x})\triangleq\#(|\mathbf{x}|=1)$. Clearly, the objective function $h(\mathbf{x})$ decreases as $N(\mathbf{x})$ increases. Note that $N(\mathbf{x})\neq n$, since otherwise it violates the assumption that $\mathrm{sign}(\mathbf{x})\neq\mathbf{x}$. We consider the objective value $h(\mathbf{x})$ when $N(\mathbf{x})=n-1$. In this situation, there exists only one coordinate such that $\mathrm{sign}(x_i)\neq x_i$ with $x_i=\pm\delta$, $0<\delta<1$, and the remaining coordinates take binary values in $\{-1,+1\}$. Note that $\delta\neq 0$ and $\delta\neq 1$, since otherwise it also violates the assumption that $\mathrm{sign}(\mathbf{x})\neq\mathbf{x}$. Therefore, we derive the following inequalities:

$$h(\mathbf{x}) > \frac{n-\sqrt{n}\sqrt{(n-1)+\delta^2}}{\sqrt{(1-\delta)^2}} \ge \frac{n-\sqrt{n}(\sqrt{n-1}+\delta)}{1-\delta} = \frac{n-\sqrt{n}\sqrt{n-1}}{1-\delta}-\frac{\sqrt{n}\,\delta}{1-\delta} > \frac{n-\sqrt{n}\sqrt{n-1}}{1}+0$$

where we use the inequality $\sqrt{a+b}\le\sqrt{a}+\sqrt{b},\ \forall a,b>0$, and the fact that $0<\delta<1$. Since the lower bound above can be applied to an arbitrary vector, we finish the proof of the first inequality. (ii) We prove the second inequality in (10). We have the following results: $1/4>0 \Rightarrow n^2-n+1/4>n^2-n \Rightarrow (n-1/2)^2>n^2-n \Rightarrow n-1/2>\sqrt{n^2-n} \Rightarrow n-\sqrt{n^2-n}>1/2$. □

The following lemma is useful in establishing the exactness property of the penalty function in Algorithm 1.

Lemma 3. Consider the following optimization problem:

$$(\mathbf{x}^*_\rho,\mathbf{v}^*_\rho)=\arg\min_{-1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \mathbf{x}\in\Omega}\ \mathcal{J}_\rho(\mathbf{x},\mathbf{v}). \qquad (11)$$

Assume that $f(\cdot)$ is an $L$-Lipschitz continuous convex function on $-1\le\mathbf{x}\le 1$. When $\rho>2L$, $\langle\mathbf{x}^*_\rho,\mathbf{v}^*_\rho\rangle=n$ will be achieved for any local optimal solution of (11).

Proof. First of all, we focus on the $\mathbf{v}$-subproblem in (11): $\mathbf{v}^*_\rho=\arg\min_{\mathbf{v}}\ -\mathbf{x}^T\mathbf{v}$, s.t. $\|\mathbf{v}\|_2^2\le n$. Assuming that $\mathbf{x}^*_\rho\neq\mathbf{0}$, we have $\mathbf{v}^*_\rho=\sqrt{n}\cdot\mathbf{x}^*_\rho/\|\mathbf{x}^*_\rho\|_2$ by (9). Then the biconvex optimization problem reduces to the following:

$$\mathbf{x}^*_\rho=\arg\min_{\mathbf{x}\in[-1,+1]^n\cap\Omega}\ p(\mathbf{x})\triangleq f(\mathbf{x})+\rho(n-\sqrt{n}\|\mathbf{x}\|_2) \qquad (12)$$

For any $\mathbf{x}^*_\rho\in\Omega$, we derive the following inequalities:

$$\begin{aligned}0.5\rho\|\mathrm{sign}(\mathbf{x}^*_\rho)-\mathbf{x}^*_\rho\|_2 &\le \rho(n-\sqrt{n}\|\mathbf{x}^*_\rho\|_2)\\ &= [\rho(n-\sqrt{n}\|\mathbf{x}^*_\rho\|_2)+f(\mathbf{x}^*_\rho)]-f(\mathbf{x}^*_\rho)\\ &\le [\rho(n-\sqrt{n}\|\mathrm{sign}(\mathbf{x}^*_\rho)\|_2)+f(\mathrm{sign}(\mathbf{x}^*_\rho))]-f(\mathbf{x}^*_\rho)\\ &= f(\mathrm{sign}(\mathbf{x}^*_\rho))-f(\mathbf{x}^*_\rho)\\ &\le L\|\mathrm{sign}(\mathbf{x}^*_\rho)-\mathbf{x}^*_\rho\|_2\end{aligned} \qquad (13)$$

where the first step uses Lemma 2, i.e., $\|\mathrm{sign}(\mathbf{x})-\mathbf{x}\|_2\le 2(n-\sqrt{n}\|\mathbf{x}\|_2)$ for any $\mathbf{x}$ with $\|\mathbf{x}\|_\infty\le 1$; the third step uses the optimality of $\mathbf{x}^*_\rho$ in (12), i.e., $p(\mathbf{x}^*_\rho)\le p(\mathbf{y})$ for any $\mathbf{y}\in[-1,+1]^n\cap\Omega$; the fourth step uses the fact that $\mathrm{sign}(\mathbf{x}^*_\rho)\in\{-1,+1\}^n$ and $\sqrt{n}\|\mathrm{sign}(\mathbf{x}^*_\rho)\|_2=n$; and the last step exploits the Lipschitz continuity of $f(\cdot)$.

From (13), we have $\|\mathbf{x}^*_\rho-\mathrm{sign}(\mathbf{x}^*_\rho)\|_2\cdot(\rho-2L)\le 0$. Since $\rho-2L>0$, we conclude that it always holds that $\|\mathbf{x}^*_\rho-\mathrm{sign}(\mathbf{x}^*_\rho)\|_2=0$; thus, $\mathbf{x}^*_\rho\in\{-1,+1\}^n$. Finally, we have $\mathbf{v}^*_\rho=\sqrt{n}\cdot\mathbf{x}^*_\rho/\|\mathbf{x}^*_\rho\|_2=\mathbf{x}^*_\rho$ and $\langle\mathbf{x}^*_\rho,\mathbf{v}^*_\rho\rangle=n$. □

The following theorem shows that when the penalty parameter $\rho$ is larger than some threshold, the biconvex objective function in (7) is equivalent to the original constrained MPEC problem in (3). This essentially implies the theoretical convergence of the algorithm, since it reduces to well-known block coordinate descent in the literature [5].

[5] Specifically, using Tseng's convergence results of block coordinate descent for non-differentiable minimization (Tseng, 2001), one can guarantee that every cluster point of Algorithm 1 is also a stationary point. In addition, stronger convergence results (Bolte, Sabach, and Teboulle, 2014; Yuan and Ghanem, 2016b) can be obtained by combining a proximal strategy with the Kurdyka-Łojasiewicz inequality assumption on $\mathcal{J}(\cdot)$.

Theorem 1 (Exactness of the Penalty Function). Assume that $f(\cdot)$ is an $L$-Lipschitz continuous convex function on $-1\le\mathbf{x}\le 1$. When $\rho>2L$, the biconvex optimization $\min_{\mathbf{x},\mathbf{v}}\ \mathcal{J}_\rho(\mathbf{x},\mathbf{v})$, s.t. $-1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \mathbf{x}\in\Omega$ in (7) has the same local and global minima as the original problem in (3).

Proof. We let $\mathbf{x}^*$ be any global minimizer of (3) and $(\mathbf{x}^*_\rho,\mathbf{v}^*_\rho)$ be any global minimizer of (7) for some $\rho>2L$. (i) We first prove that $\mathbf{x}^*$ is also a global minimizer of (7). For any feasible $\mathbf{x}$ and $\mathbf{v}$, we derive the following inequalities:
$$\begin{aligned}\mathcal{J}_\rho(\mathbf{x},\mathbf{v}) &\ge \min_{\|\mathbf{x}\|_\infty\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \mathbf{x}\in\Omega}\ f(\mathbf{x})+\rho(n-\mathbf{x}^T\mathbf{v})\\ &= \min_{\|\mathbf{x}\|_\infty\le 1,\ \|\mathbf{v}\|_2^2\le n,\ \mathbf{x}\in\Omega}\ f(\mathbf{x}),\ \ \text{s.t.}\ \mathbf{x}^T\mathbf{v}=n\\ &= f(\mathbf{x}^*)+\rho(n-\mathbf{x}^{*T}\mathbf{v}^*) = \mathcal{J}_\rho(\mathbf{x}^*,\mathbf{v}^*)\end{aligned}$$

where the first equality holds due to the fact that the constraint $\mathbf{x}^T\mathbf{v}=n$ is satisfied at the optimal solution when $\rho>2L$ (see Lemma 3). Therefore, we conclude that any optimal solution of (3) is also an optimal solution of (7). (ii) We now prove that $\mathbf{x}^*_\rho$ is also a global minimizer of (3). For any feasible $\mathbf{x}$ and $\mathbf{v}$, we naturally have the following inequalities:

$$f(\mathbf{x}^*_\rho)-f(\mathbf{x}) = f(\mathbf{x}^*_\rho)+\rho(n-\mathbf{x}^{*T}_\rho\mathbf{v}^*_\rho)-f(\mathbf{x})-\rho(n-\mathbf{x}^T\mathbf{v}) = \mathcal{J}_\rho(\mathbf{x}^*_\rho,\mathbf{v}^*_\rho)-\mathcal{J}_\rho(\mathbf{x},\mathbf{v}) \le 0$$

where the first equality uses Lemma 3. Therefore, we conclude that any optimal solution of (7) is also an optimal solution of (3). (iii) In summary, we conclude that when $\rho>2L$, the biconvex optimization in (7) has the same local and global minima as the original problem in (3). □

The following theorem characterizes the convergence rate and asymptotic monotone property of Algorithm 1.

Theorem 2 (Convergence Rate and Asymptotic Monotone Property of Algorithm 1). Assume that $f(\cdot)$ is an $L$-Lipschitz continuous convex function on $-1\le\mathbf{x}\le 1$. Algorithm 1 will converge to a first-order KKT point in at most $\lceil(\ln(L\sqrt{2n})-\ln(\rho^0\epsilon))/\ln\sigma\rceil$ outer iterations [6], with accuracy at least $n-\mathbf{x}^T\mathbf{v}\le\epsilon$. Moreover, after $\langle\mathbf{x},\mathbf{v}\rangle=n$ is obtained, the sequence $\{f(\mathbf{x}^t)\}$ generated by Algorithm 1 is monotonically non-increasing.

[6] Every time we increase $\rho$, we call it one outer iteration.

Proof. We denote $s$ and $t$ as the outer and inner iteration counters in Algorithm 1, respectively. (i) We first prove the convergence rate of Algorithm 1. Assume that Algorithm 1 takes $s$ outer iterations to converge, and denote by $f'(\mathbf{x})$ a subgradient of $f(\cdot)$ at $\mathbf{x}$. According to the $\mathbf{x}$-subproblem in (12), if $\mathbf{x}^*$ solves (12), then we have the following mixed variational inequality condition (He and Yuan, 2012; Jiang et al., 2016):

$$\forall\mathbf{x}\in[-1,+1]^n\cap\Omega,\ \ \langle\mathbf{x}-\mathbf{x}^*,f'(\mathbf{x}^*)\rangle+\rho(n-\sqrt{n}\|\mathbf{x}\|_2)-\rho(n-\sqrt{n}\|\mathbf{x}^*\|_2)\ge 0.$$

Letting $\mathbf{x}$ be any feasible solution with $\mathbf{x}\in\{-1,+1\}^n\cap\Omega$, we have the following inequality:

$$n-\sqrt{n}\|\mathbf{x}^*\|_2 \le n-\sqrt{n}\|\mathbf{x}\|_2+\tfrac{1}{\rho}\langle\mathbf{x}-\mathbf{x}^*,f'(\mathbf{x}^*)\rangle \le \tfrac{1}{\rho}\|\mathbf{x}-\mathbf{x}^*\|_2\|f'(\mathbf{x}^*)\|_2 \le L\sqrt{2n}/\rho \qquad (14)$$

where the second inequality is due to the Cauchy-Schwarz inequality, and the third inequality is due to the fact that $\|\mathbf{x}-\mathbf{y}\|_2\le\sqrt{2n},\ \forall -1\le\mathbf{x},\mathbf{y}\le 1$, together with the Lipschitz continuity of $f(\cdot)$, i.e., $\|f'(\mathbf{x}^*)\|_2\le L$. (14) implies that when $\rho\ge L\sqrt{2n}/\epsilon$, Algorithm 1 achieves accuracy at least $n-\sqrt{n}\|\mathbf{x}\|_2\le\epsilon$. Noticing that $\rho^s=\sigma^s\rho^0$, we have that $\epsilon$ accuracy will be achieved when $\sigma^s\rho^0\ge L\sqrt{2n}/\epsilon$. Thus, we obtain

$$\sigma^s \ge \frac{L\sqrt{2n}}{\rho^0\epsilon}\ \Rightarrow\ s \ge \bigl(\ln(L\sqrt{2n})-\ln(\rho^0\epsilon)\bigr)/\ln\sigma$$

(ii) We now prove the asymptotic monotone property of Algorithm 1. We naturally derive the following inequalities:

$$f(\mathbf{x}^{t+1})-f(\mathbf{x}^t) \le \rho(n-\langle\mathbf{x}^t,\mathbf{v}^t\rangle)-\rho(n-\langle\mathbf{x}^{t+1},\mathbf{v}^t\rangle) = \rho\bigl(\langle\mathbf{x}^{t+1},\mathbf{v}^t\rangle-\langle\mathbf{x}^t,\mathbf{v}^t\rangle\bigr) \le \rho\bigl(\langle\mathbf{x}^{t+1},\mathbf{v}^{t+1}\rangle-\langle\mathbf{x}^t,\mathbf{v}^t\rangle\bigr) = 0$$

where the first inequality uses the fact that $f(\mathbf{x}^{t+1})+\rho(n-\langle\mathbf{x}^{t+1},\mathbf{v}^t\rangle)\le f(\mathbf{x}^t)+\rho(n-\langle\mathbf{x}^t,\mathbf{v}^t\rangle)$, which holds because $\mathbf{x}^{t+1}$ is the optimal solution of (4); the second inequality uses the fact that $-\langle\mathbf{x}^{t+1},\mathbf{v}^{t+1}\rangle\le-\langle\mathbf{x}^{t+1},\mathbf{v}^t\rangle$, which holds due to the optimality of $\mathbf{v}^{t+1}$ for (5); and the last step uses $\langle\mathbf{x},\mathbf{v}\rangle=n$. Note that the equality $\langle\mathbf{x},\mathbf{v}\rangle=n$ together with the feasible set $-1\le\mathbf{x}\le 1,\ \|\mathbf{v}\|_2^2\le n$ also implies that $\mathbf{x}\in\{-1,+1\}^n$. □
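To make the rate in Theorem 2 concrete, the snippet below (illustrative only; $\rho^0$, $\sigma$, and $\epsilon$ mirror the experimental settings reported in Section 4, while the Lipschitz constant L = 1e3 and n = 1e6 are assumed example values) evaluates the outer-iteration bound.

```python
import math

def outer_iteration_bound(L, n, rho0, sigma, eps):
    """Theorem 2 bound: smallest s with sigma^s * rho0 >= L*sqrt(2n)/eps."""
    return math.ceil((math.log(L * math.sqrt(2 * n)) - math.log(rho0 * eps))
                     / math.log(sigma))

print(outer_iteration_bound(L=1e3, n=1_000_000, rho0=0.01,
                            sigma=math.sqrt(10), eps=0.01))
# -> 21 outer iterations; the bound grows only logarithmically in L, n, 1/eps.
```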

[Figure 1 about here: eight panels plotting subgraph density (y-axis) against cardinality k from 100 to 5000 (x-axis) on (a) wordassociation, (b) enron, (c) uk-2007-05, (d) cnr-2000, (e) dblp-2010, (f) in-2004, (g) amazon-2008, (h) dblp-2011.]

Figure 1: Experimental results for dense subgraph discovery.

We have a few remarks on the theorems above. We assume that the objective function is $L$-Lipschitz continuous. However, such a hypothesis is not strict: because the solution $\mathbf{x}$ is defined on a compact set, the Lipschitz constant can always be computed for any continuous objective (e.g., a norm function or a min/max envelope function). In fact, it is equivalent to saying that the (sub-)gradient of the objective is bounded by $L$ [7]. Although exact penalty methods have been studied in the literature (Han and Mangasarian, 1979; Di Pillo and Grippo, 1989; Di Pillo, 1994), their results cannot directly apply here. The theoretical bound $2L$ (on the penalty parameter $\rho$) heavily depends on the specific structure of the optimization problem. Moreover, we also establish the convergence rate and asymptotic monotone property of our algorithm.

[7] For example, for the quadratic function $f(\mathbf{x})=0.5\mathbf{x}^T\mathbf{A}\mathbf{x}+\mathbf{x}^T\mathbf{b}$ with $\mathbf{A}\in\mathbb{R}^{n\times n}$ and $\mathbf{b}\in\mathbb{R}^n$, the Lipschitz constant is bounded by $L\le\|\mathbf{A}\mathbf{x}+\mathbf{b}\|\le\|\mathbf{A}\|\|\mathbf{x}\|+\|\mathbf{b}\|\le\|\mathbf{A}\|\sqrt{n}+\|\mathbf{b}\|$; for the $\ell_1$ regression function $f(\mathbf{x})=\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_1$ with $\mathbf{A}\in\mathbb{R}^{m\times n}$ and $\mathbf{b}\in\mathbb{R}^m$, the Lipschitz constant is bounded by $L\le\|\mathbf{A}^T\partial|\mathbf{A}\mathbf{x}-\mathbf{b}|\|\le\|\mathbf{A}^T\|\sqrt{m}$.

Based on the discussions above, we summarize the merits of our MPEC-based exact penalty method as follows. (a) It exhibits strong convergence guarantees, since it essentially reduces to block coordinate descent in the literature. (b) It seeks desirable solutions, since the LP convex relaxation method in the first iteration provides a good initialization. (c) It is efficient, since it is amenable to the use of existing convex methods for solving the subproblems. (d) It has a monotone/greedy property owing to the complementarity constraints brought on by the MPEC: we penalize the complementarity error and ensure that it decreases in every iteration, leading to binary solutions.

4 Experimental Validation

This section demonstrates the advantages of our MPEC-based exact penalty method (MPEC-EPM) on the dense subgraph discovery problem. All codes are implemented in Matlab on an Intel 3.20GHz CPU with 8 GB RAM [8].

[8] For the purpose of reproducibility, we provide our MATLAB code at: yuanganzhao.weebly.com.

Dense subgraph discovery (Ravi, Rosenkrantz, and Tayi, 1994; Feige, Peleg, and Kortsarz, 2001; Yuan and Zhang, 2013) is a fundamental graph-theoretic problem, as it captures numerous graph mining applications, such as community finding, regulatory motif detection, and real-time story identification. It aims at finding the maximum-density subgraph on $k$ vertices, which can be formulated as the following binary program:

$$\max_{\mathbf{x}\in\{0,1\}^n}\ \mathbf{x}^T\mathbf{W}\mathbf{x},\ \ \text{s.t.}\ \mathbf{x}^T\mathbf{1}=k \qquad (15)$$

where $\mathbf{W}\in\mathbb{R}^{n\times n}$ is the adjacency matrix of the graph. Although the objective function in (15) may not be convex, one can append an additional term $\lambda\mathbf{x}^T\mathbf{x}$ to the objective with a sufficiently large $\lambda$ such that $\lambda\mathbf{I}-\mathbf{W}\succeq 0$ (similar to (Ghanem, Cao, and Wonka, 2015)). This is equivalent to adding a constant to the objective, since $\lambda\mathbf{x}^T\mathbf{x}=\lambda k$ in the effective domain. Therefore, we have the following equivalent problem:

$$\min_{\mathbf{x}\in\{0,1\}^n}\ f(\mathbf{x})\triangleq\mathbf{x}^T(\lambda\mathbf{I}-\mathbf{W})\mathbf{x},\ \ \text{s.t.}\ \mathbf{x}^T\mathbf{1}=k \qquad (16)$$

In the experiments, $\lambda$ is set to the largest eigenvalue of $\mathbf{W}$; a construction sketch follows Table 2.

Table 2: The statistics of the web graph datasets used in our dense subgraph discovery experiments.

Graph            # Nodes    # Arcs      Avg. Degree
wordassociation  10617      72172       6.80
enron            69244      276143      3.99
uk-2007-05       100000     3050615     30.51
cnr-2000         325557     3216152     9.88
dblp-2010        326186     1615400     4.95
in-2004          1382908    16917053    12.23
amazon-2008      735323     5158388     7.02
dblp-2011        986324     6707236     6.80
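For concreteness, the convexified objective in (16) can be set up in a few lines of Python (an illustrative sketch using scipy; the paper's actual implementation is in Matlab). Here $\lambda$ is the largest eigenvalue of W, so that $\lambda\mathbf{I}-\mathbf{W}\succeq 0$:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def build_dsd_objective(W):
    """Objective f(x) = x^T (lam*I - W) x of (16) for a symmetric adjacency W."""
    lam = eigsh(W.asfptype(), k=1, which='LA', return_eigenvectors=False)[0]
    f = lambda x: lam * x.dot(x) - x.dot(W @ x)
    grad_f = lambda x: 2.0 * (lam * x - W @ x)
    return f, grad_f, lam

# Tiny example: a 4-cycle; the densest 2-subgraph is any edge (density 2/2 = 1).
rows = [0, 1, 1, 2, 2, 3, 3, 0]
cols = [1, 0, 2, 1, 3, 2, 0, 3]
W = sp.csr_matrix((np.ones(8), (rows, cols)), shape=(4, 4))
f, grad_f, lam = build_dsd_objective(W)
x = np.array([1.0, 1.0, 0.0, 0.0])            # pick vertices {0, 1}
print(x @ (W @ x) / 2)                         # density x^T W x / k = 1.0
```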

[Figure 2 about here: eight panels plotting objective value (y-axis) against iteration 0–100 (x-axis) on enron, uk-2007-05, in-2004, and amazon-2008, for k = 3000 (first row, panels (a)–(d)) and k = 4000 (second row, panels (e)–(h)).]

Figure 2: Convergence curve for dense subgraph discovery on different datasets with k = 3000 (first row) and k = 4000 (second row).

Compared Methods. In our experiments, we compare the following methods with different cardinalities $k\in\{100, 1000, 2000, 3000, 4000, 5000\}$ on 8 datasets [9] (see Table 2), which contain up to 1 million nodes and 7 million arcs. (i) Feige's greedy algorithm (FEIGE) (Feige, Peleg, and Kortsarz, 2001) is included in our comparisons; this method is known to achieve the best approximation ratio for general $k$. (ii) Ravi's greedy algorithm (RAVI) (Ravi, Rosenkrantz, and Tayi, 1994) starts from a heaviest edge and repeatedly adds a vertex to the current subgraph so as to maximize the weight of the resulting new subgraph; it has an asymptotic performance guarantee of $\pi/2$ when the weights satisfy the triangle inequality. (iii) LP relaxation solves the capped simplex problem $\min_{\mathbf{x}} f(\mathbf{x})$, s.t. $0\le\mathbf{x}\le 1,\ \mathbf{x}^T\mathbf{1}=k$ by the proximal gradient descent method via $\mathbf{x}^{k+1}\Leftarrow\mathrm{proj}(\mathbf{x}^k-\nabla f(\mathbf{x}^k)/\eta)$ based on the current gradient $\nabla f(\mathbf{x}^k)$. Here, the projection operator $\mathrm{proj}(\mathbf{a})\triangleq\arg\min_{0\le\mathbf{x}\le 1,\ \mathbf{x}^T\mathbf{1}=k}\|\mathbf{x}-\mathbf{a}\|_2^2$ can be evaluated analytically and exactly in $n\log(n)$ time by a break-point search method (Helgason, Kennington, and Lall, 1980); a sketch of this projection is given after this list. We use the Matlab implementation provided in (Yuan and Ghanem, 2016b). $\eta$ is the gradient Lipschitz constant and is set to the largest eigenvalue of $\lambda\mathbf{I}-\mathbf{W}$. (iv) Truncated Power Method (TPM) (Yuan and Zhang, 2013) considers an iterative procedure that combines power iteration and hard-thresholding truncation. It works by greedily decreasing the objective while maintaining the desired binary property of the intermediate solutions. We use the code provided by the authors [10]. As suggested in (Yuan and Zhang, 2013), the initial solution is set to the indicator vector of the vertices with the top $k$ weighted degrees of the graph. (v) L2-box ADMM (Wu and Ghanem, 2016) applies ADMM directly to the $\ell_2$ box non-separable reformulation $\min_{\mathbf{x}}\ \mathbf{x}^T(\lambda\mathbf{I}-\mathbf{W})\mathbf{x}$, s.t. $0\le\mathbf{x}\le 1,\ \mathbf{x}^T\mathbf{1}=k,\ \|2\mathbf{x}-\mathbf{1}\|_2^2=n$. It introduces auxiliary variables to separate the two constraint sets and then performs block coordinate descent on each variable. (vi) MPEC-EPM (Algorithm 1) solves the NP-hard problem in (16) via successive convex LP relaxation. We stop Algorithm 1 when the complementarity constraint is satisfied up to a threshold, i.e., $n-\mathbf{x}^T\mathbf{v}\le\epsilon$, where $\epsilon$ is set to 0.01. Moreover, we choose $\rho=0.01$, $T=10$, $\sigma=\sqrt{10}$.

[9] http://law.di.unimi.it/datasets.php
[10] https://sites.google.com/site/xtyuan1980/publications
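The capped simplex projection used in item (iii) above can be sketched as follows. This illustrative Python version locates the optimal shift by bisection on the Lagrange multiplier of the constraint $\mathbf{x}^T\mathbf{1}=k$, rather than the O(n log n) break-point search of (Helgason, Kennington, and Lall, 1980) used in the paper's implementation, but it computes the same projection.

```python
import numpy as np

def proj_capped_simplex(a, k, iters=60):
    """Euclidean projection of a onto {x : 0 <= x <= 1, sum(x) = k}.

    The KKT conditions give x_i = clip(a_i + tau, 0, 1) for a scalar tau;
    sum_i clip(a_i + tau, 0, 1) is nondecreasing in tau, so tau can be
    located by bisection.
    """
    lo, hi = -a.max(), 1.0 - a.min()           # sum is 0 at lo and n at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(a + tau, 0.0, 1.0).sum() < k:
            lo = tau
        else:
            hi = tau
    return np.clip(a + 0.5 * (lo + hi), 0.0, 1.0)

a = np.array([0.9, 0.6, 0.3, -0.2])
x = proj_capped_simplex(a, k=2)
print(x, x.sum())                              # feasible point summing to 2
```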
Solution Quality. We compare the quality of the solution $\mathbf{x}^*$ by measuring the density of the extracted $k$-subgraphs, computed as $\mathbf{x}^{*T}\mathbf{W}\mathbf{x}^*/k$. Several observations can be drawn from Figure 1. (i) Both FEIGE and RAVI generally fail to solve the dense subgraph discovery problem and lead to solutions with low density. (ii) LP relaxation gives better performance than the state-of-the-art technique TPM in some cases. (iii) L2-box ADMM outperforms LP relaxation in all cases, but it generates unsatisfactory accuracy on 'dblp-2010', 'in-2004', 'amazon-2008' and 'dblp-2011'. (iv) Our proposed method MPEC-EPM generally outperforms all compared methods.

Convergence Curve. We show the convergence curves of the methods {LP, TPM, L2box-ADMM, MPEC-EPM} for dense subgraph discovery on different datasets. As can be seen in Figure 2, MPEC-EPM converges within 100 iterations. Moreover, its objective values generally decrease monotonically, and we attribute this to the greedy property of the penalty method.

Computational Efficiency. We provide some runtime comparisons for the four methods on different datasets. As can be seen in Table 3, even for a dataset such as 'dblp-2011', which contains about one million nodes and 7 million edges, all the methods terminate within 15 minutes. Moreover, our method is several times slower than LP and comparable with L2-box ADMM. This is expected, since (i) MPEC-EPM needs to call the LP procedure multiple times, and (ii) the methods {LP, L2-box ADMM, MPEC-EPM} are alternating methods with the same per-iteration computational complexity. Our method calls the convex LP procedure many times until convergence. Although we only implement a simple projection method, we argue that this convex LP procedure could be significantly accelerated further by integrating existing, more advanced optimization techniques (such as coordinate gradient descent). However, this is outside the scope of this paper and left as future work.

Table 3: CPU time (in seconds) comparisons.

Graph        LP    TPM   L2box-ADMM   MPEC-EPM
wordassoc.   1     1     7            2
enron        2     1     40           29
uk-2007-05   6     1     75           65
cnr-2000     16    1     210          209
dblp-2010    15    1     234          282
in-2004      79    2     834          1023
amazon-2008  49    5     501          586
dblp-2011    59    8     554          621

5 Conclusions and Future Work

This paper presents a new continuous MPEC-based optimization method to solve general binary programs. Although the problem is non-convex, we design an exact penalty method to solve its equivalent MPEC reformulation. It works by solving a sequence of convex relaxation subproblems, resulting in better and better approximations to the original non-convex formulation. We also shed some theoretical light on the equivalent formulation and the optimization algorithm. Experimental results on binary problems demonstrate that our method generally outperforms existing solutions in terms of solution quality.

As for future work, we plan to investigate the optimality qualification of our multi-stage convex relaxation method for some specific objective functions, e.g., as is done in (Goemans and Williamson, 1995; Zhang, 2010; Candès, Li, and Soltanolkotabi, 2015; Jain, Netrapalli, and Sanghavi, 2013).

Acknowledgments

This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding. Yuan is also supported by NSF-China (61402182). A special thanks is extended to Prof. Shaohua Pan and Dr. Li Shen (South China University of Technology) for their helpful discussions on this paper.

References

Ames, B. P. W., and Vavasis, S. A. 2011. Nuclear norm minimization for the planted clique and biclique problems. Mathematical Programming 129(1):69–89.
Ames, B. P. 2014. Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming 147(1-2):429–465.
Ames, B. P. 2015. Guaranteed recovery of planted cliques and dense subgraphs by convex relaxation. Journal of Optimization Theory and Applications 167(2):653–675.
Bolte, J.; Sabach, S.; and Teboulle, M. 2014. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1-2):459–494.
Boykov, Y.; Veksler, O.; and Zabih, R. 2001. Fast approximate energy minimization via graph cuts. TPAMI 23(11):1222–1239.
Burer, S. 2009. On the copositive representation of binary and continuous nonconvex quadratic programs. Mathematical Programming 120(2):479–495.
Burer, S. 2010. Optimizing a polyhedral-semidefinite relaxation of completely positive programs. Mathematical Programming Computation 2(1):1–19.
Candès, E. J.; Li, X.; and Soltanolkotabi, M. 2015. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory 61(4):1985–2007.
Chan, E. Y. K., and Yeung, D. 2011. A convex formulation of modularity maximization for community detection. In IJCAI, 2218–2225.
Cour, T., and Shi, J. 2007. Solving Markov random fields with spectral relaxation. In AISTATS, volume 2, 15.
Cour, T.; Srinivasan, P.; and Shi, J. 2007. Balanced graph matching. NIPS 19:313.
De Santis, M., and Rinaldi, F. 2012. Continuous reformulations for zero-one programming problems. Journal of Optimization Theory and Applications 153(1):75–84.
Di Pillo, G., and Grippo, L. 1989. Exact penalty functions in constrained optimization. SIAM Journal on Control and Optimization 27(6):1333–1360.
Di Pillo, G. 1994. Exact penalty methods. In Algorithms for Continuous Optimization. Springer. 209–253.
Feige, U.; Peleg, D.; and Kortsarz, G. 2001. The dense k-subgraph problem. Algorithmica 29(3):410–421.
Fogel, F.; Jenatton, R.; Bach, F. R.; and d'Aspremont, A. 2015. Convex relaxations for permutation problems. SIMAX 36(4):1465–1488.
Ghanem, B.; Cao, Y.; and Wonka, P. 2015. Designing camera networks by convex quadratic programming. Computer Graphics Forum (Proceedings of Eurographics).
Goemans, M. X., and Williamson, D. P. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42(6):1115–1145.
Han, S.-P., and Mangasarian, O. L. 1979. Exact penalty functions in nonlinear programming. Mathematical Programming 17(1):251–269.
He, B., and Yuan, X. 2012. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SINUM 50(2):700–709.
He, L.; Lu, C.; Ma, J.; Cao, J.; Shen, L.; and Yu, P. S. 2016. Joint community and structural hole spanner detection via harmonic modularity. In SIGKDD, 875–884.
Helgason, R.; Kennington, J.; and Lall, H. 1980. A polynomially bounded algorithm for a singly constrained quadratic program. Mathematical Programming 18(1):338–343.
Hu, X., and Ralph, D. 2004. Convergence of a penalty method for mathematical programming with complementarity constraints. Journal of Optimization Theory and Applications 123(2):365–390.
Huang, Q.; Chen, Y.; and Guibas, L. J. 2014. Scalable semidefinite relaxation for maximum a posterior estimation. In ICML, 64–72.
Jain, P.; Netrapalli, P.; and Sanghavi, S. 2013. Low-rank matrix completion using alternating minimization. In STOC, 665–674.
Jiang, B.; Lin, T.; Ma, S.; and Zhang, S. 2016. Structured nonconvex and nonsmooth optimization: Algorithms and iteration complexity analysis. arXiv preprint.
Jiang, B.; Liu, Y.-F.; and Wen, Z. 2016. ℓp-norm regularization algorithms for optimization over permutation matrices. SIAM Journal on Optimization (SIOPT) 26(4):2284–2313.
Joulin, A.; Bach, F. R.; and Ponce, J. 2010. Discriminative clustering for image co-segmentation. In CVPR, 1943–1950.
Kalantari, B., and Rosen, J. B. 1982. Penalty formulation for zero-one integer equivalent problem. Mathematical Programming 24(1):229–232.
Keuchel, J.; Schnörr, C.; Schellewald, C.; and Cremers, D. 2003. Binary partitioning, perceptual grouping, and restoration with semidefinite programming. TPAMI 25(11):1364–1379.
Komodakis, N., and Tziritas, G. 2007. Approximate labeling via graph cuts based on linear programming. TPAMI 29(8):1436–1453.
Kumar, M. P.; Kolmogorov, V.; and Torr, P. H. S. 2009. An analysis of convex relaxations for MAP estimation of discrete MRFs. JMLR 10:71–106.
Lu, Z., and Zhang, Y. 2013. Sparse approximation via penalty decomposition methods. SIOPT 23(4):2448–2478.
Luo, Z.-Q.; Pang, J.-S.; and Ralph, D. 1996. Mathematical Programs with Equilibrium Constraints. Cambridge University Press.
Murray, W., and Ng, K. 2010. An algorithm for nonlinear optimization problems with binary variables. Computational Optimization and Applications 47(2):257–288.
Nesterov, Y. E. 2003. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers.
Olsson, C.; Eriksson, A. P.; and Kahl, F. 2007. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In CVPR, 1–8.
Raghavachari, M. 1969. On connections between zero-one integer programming and concave programming under linear constraints. Operations Research 17(4):680–684.
Ralph, D., and Wright, S. J. 2004. Some properties of regularization and penalization schemes for MPECs. Optimization Methods and Software 19(5):527–556.
Ravi, S. S.; Rosenkrantz, D. J.; and Tayi, G. K. 1994. Heuristic and special case algorithms for dispersion problems. Operations Research 42(2):299–310.
Shi, J., and Malik, J. 2000. Normalized cuts and image segmentation. TPAMI 22(8):888–905.
Toshev, A.; Shi, J.; and Daniilidis, K. 2007. Image matching via saliency region correspondences. In CVPR, 1–8. IEEE.
Tseng, P. 2001. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications 109(3):475–494.
Wang, P.; Shen, C.; van den Hengel, A.; and Torr, P. 2016. Large-scale binary quadratic optimization using semidefinite relaxation and applications. TPAMI.
Wu, B., and Ghanem, B. 2016. ℓp-box ADMM: A versatile framework for integer programming. arXiv preprint.
Yuan, G., and Ghanem, B. 2015. ℓ0TV: A new method for image restoration in the presence of impulse noise. In CVPR, 5369–5377.
Yuan, G., and Ghanem, B. 2016a. A proximal alternating direction method for semi-definite rank minimization. In AAAI, 2300–2308.
Yuan, G., and Ghanem, B. 2016b. Sparsity constrained minimization via mathematical programming with equilibrium constraints. arXiv preprint.
Yuan, X., and Zhang, T. 2013. Truncated power method for sparse eigenvalue problems. JMLR 14(1):899–925.
Zaslavskiy, M.; Bach, F. R.; and Vert, J. 2009. A path following algorithm for the graph matching problem. TPAMI 31(12):2227–2242.
Zhang, Z.; Li, T.; Ding, C.; and Zhang, X. 2007. Binary matrix factorization with applications. In ICDM, 391–400.
Zhang, T. 2010. Analysis of multi-stage convex relaxation for sparse regularization. JMLR 11:1081–1107.