DC Decomposition of Nonconvex Polynomials with Algebraic Techniques

Amir Ali Ahmadi · Georgina Hall

ORFE, Princeton University, Sherrerd Hall, Princeton, NJ 08540

arXiv:1510.01518v2 [math.OC] 12 Sep 2018

Abstract We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard.

Keywords Difference of convex programming · conic relaxations · polynomial optimization · algebraic decomposition of polynomials

The authors are partially supported by the Young Investigator Program Award of the AFOSR and the CAREER Award of the NSF.

1 Introduction

A difference of convex (dc) program is an optimization problem of the form

\[
\begin{aligned}
\min\quad & f_0(x) \\
\text{s.t.}\quad & f_i(x) \le 0, \quad i = 1,\dots,m,
\end{aligned}
\tag{1}
\]

where $f_0,\dots,f_m$ are difference of convex functions; i.e.,

\[
f_i(x) = g_i(x) - h_i(x), \quad i = 0,\dots,m, \tag{2}
\]

and $g_i : \mathbb{R}^n \to \mathbb{R}$, $h_i : \mathbb{R}^n \to \mathbb{R}$ are convex functions. The class of functions that can be written as a difference of convex functions is very broad, containing for instance all functions that are twice continuously differentiable [18], [22]. Furthermore, any continuous function over a compact set is the uniform limit of a sequence of dc functions; see, e.g., reference [25], where several properties of dc functions are discussed.

Optimization problems that appear in dc form arise in a wide range of applications. Representative examples from the literature include machine learning and statistics (e.g., kernel selection [9], feature selection in support vector machines [23], sparse principal component analysis [30], and reinforcement learning [37]), operations research (e.g., packing problems and production-transportation problems [45]), communications and networks [8], [31], circuit design [30], finance and game theory [17], and computational chemistry [13].

We also observe that dc programs can encode constraints of the type $x \in \{0,1\}$ by replacing them with the dc constraints $0 \le x \le 1$, $x - x^2 \le 0$. This entails that any binary optimization problem can in theory be written as a dc program, but it also implies that dc problems are hard to solve in general.
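To see why this encoding is exact, note that $x - x^2 = x(1-x)$ is nonnegative on $[0,1]$ and vanishes only at the endpoints; spelled out for concreteness:

\[
\begin{aligned}
0 \le x \le 1 \ \text{ and } \ x - x^2 \le 0
&\iff 0 \le x \le 1 \ \text{ and } \ x(1-x) \le 0 \\
&\iff 0 \le x \le 1 \ \text{ and } \ \big(x \le 0 \ \text{ or } \ x \ge 1\big) \\
&\iff x \in \{0,1\}.
\end{aligned}
\]

Note also that $x - x^2$ is itself in the form (2), with $g(x) = x$ and $h(x) = x^2$ both convex.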
As described in [40], there are essentially two schools of thought when it comes to solving dc programs. The first approach is global and generally consists of rewriting the original problem as a concave minimization problem (i.e., minimizing a concave function over a convex set; see [46], [44]) or as a reverse convex problem (i.e., a convex problem with a linear objective and one constraint of the type $h(x) \ge 0$, where $h$ is convex). We refer the reader to [43] for an explanation of how one can convert a dc program to a reverse convex problem, and to [21] for more general results on reverse convex programming. These problems are then solved using branch-and-bound or cutting plane techniques (see, e.g., [45] or [25]). The goal of these approaches is to return global solutions, but their main drawback is scalability.

The second approach, by contrast, aims for local solutions while still exploiting the dc structure of the problem by applying the tools of convex analysis to the two convex components of a dc decomposition. One such algorithm is the Difference of Convex Algorithm (DCA), introduced by Pham Dinh Tao in [41] and expanded on by Le Thi Hoai An and Pham Dinh Tao. This algorithm exploits the duality theory of dc programming [42] and is popular because of its ease of implementation, scalability, and ability to handle nonsmooth problems.

In the case where the functions $g_i$ and $h_i$ in (2) are differentiable, DCA reduces to another popular algorithm called the Convex-Concave Procedure (CCP) [27]. The idea of this technique is simply to replace the concave part of $f_i$ (i.e., $-h_i$) by a linear overestimator, as described in Algorithm 1. By doing this, problem (1) becomes a convex optimization problem that can be solved using tools from convex analysis. The simplicity of CCP has made it an attractive algorithm in various areas of application. These include statistical physics (for minimizing Bethe and Kikuchi free energy functions [48]), machine learning [30], [15], [11], and image processing [47], just to name a few. In addition, CCP enjoys two valuable features: (i) if one starts with a feasible solution, the solution produced after each iteration remains feasible, and (ii) the objective value improves in every iteration, i.e., the method is a descent algorithm. The proof of both claims readily comes out of the description of the algorithm and can be found, e.g., in [30, Section 1.3], where several other properties of the method are also laid out. Like many iterative algorithms, CCP relies on a stopping criterion to end. This criterion can be chosen among a few alternatives. For example, one could stop if the value of the objective does not improve enough, or if the iterates are too close to one another, or if the norm of the gradient of $f_0$ gets small.

Algorithm 1 CCP
Input: $x_0$; $f_i = g_i - h_i$, $i = 0,\dots,m$
1: $k \leftarrow 0$
2: while stopping criterion not satisfied do
3:   Convexify: $f_i^k(x) := g_i(x) - \big(h_i(x_k) + \nabla h_i(x_k)^T (x - x_k)\big)$, $i = 0,\dots,m$
4:   Solve convex subroutine: $\min f_0^k(x)$, s.t. $f_i^k(x) \le 0$, $i = 1,\dots,m$
5:   $x_{k+1} := \operatorname{argmin}_{f_i^k(x) \le 0} f_0^k(x)$
6:   $k \leftarrow k + 1$
7: end while
Output: $x_k$

Convergence results for CCP can be derived from existing results found for DCA, since CCP is a subcase of DCA as mentioned earlier. But CCP can also be seen as a special case of the family of majorization-minimization (MM) algorithms. Indeed, the general concept of MM algorithms is to iteratively upper-bound the objective by a convex function and then minimize this function, which is precisely what is done in CCP. This fact is exploited by Lanckriet and Sriperumbudur in [27] and Salakhutdinov et al. in [39] to obtain convergence results for the algorithm, showing, e.g., that under mild assumptions, CCP converges to a stationary point of the optimization problem (1).
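To make the steps of Algorithm 1 concrete, the following is a minimal Python sketch of CCP using the CVXPY modeling library. The instance it solves, $\min \|Ax-b\|^2 - \mu\|x\|^2$ over the box $-1 \le x \le 1$, together with the data and the stopping tolerance, is an illustrative choice and not taken from the paper; here $g(x) = \|Ax-b\|^2$ and $h(x) = \mu\|x\|^2$ are the two convex components.

```python
import numpy as np
import cvxpy as cp

# Hypothetical dc instance: f(x) = ||Ax - b||^2 - mu*||x||^2 on the box [-1, 1]^n,
# with convex components g(x) = ||Ax - b||^2 and h(x) = mu*||x||^2.
rng = np.random.default_rng(0)
n, m_rows, mu = 5, 8, 2.0
A = rng.standard_normal((m_rows, n))
b = rng.standard_normal(m_rows)

def f(x):
    """Evaluate the nonconvex objective f = g - h numerically."""
    return np.sum((A @ x - b) ** 2) - mu * np.sum(x ** 2)

x_k = np.zeros(n)  # feasible starting point (line 1 of Algorithm 1)
for k in range(100):
    x = cp.Variable(n)
    # Convexify (line 3): replace -h by its linear overestimator at x_k,
    # i.e., -(h(x_k) + grad h(x_k)^T (x - x_k)), with grad h(x_k) = 2*mu*x_k.
    h_lin = mu * (x_k @ x_k) + 2 * mu * x_k @ (x - x_k)
    f_k = cp.sum_squares(A @ x - b) - h_lin
    # Solve the convex subroutine (line 4); the box constraint is already convex.
    cp.Problem(cp.Minimize(f_k), [x >= -1, x <= 1]).solve()
    x_next = x.value  # line 5
    if np.linalg.norm(x_next - x_k) <= 1e-6:  # stop when iterates are close
        x_k = x_next
        break
    x_k = x_next

print("CCP point:", np.round(x_k, 4), "objective:", round(f(x_k), 4))
```

Each iteration solves a small convex quadratic program; in line with features (i) and (ii) above, every iterate remains feasible and the objective value is nonincreasing.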
1.1 Motivation and organization of the paper

Although a wide range of problems already appear in dc form (2), such a decomposition is not always available. In this situation, algorithms of dc programming, such as CCP, generally fail to be applicable. Hence, the question arises as to whether one can (efficiently) compute a difference of convex decomposition (dcd) of a given function. This challenge has been raised several times in the literature. For instance, Hiriart-Urruty [22] states: "All the proofs [of existence of dc decompositions] we know are 'constructive' in the sense that they indeed yield [$g_i$] and [$h_i$] satisfying (2) but could hardly be carried over [to] computational aspects." As another example, Tuy [45] writes: "The dc structure of a given problem is not always apparent or easy to disclose, and even when it is known explicitly, there remains for the problem solver the hard task of bringing this structure to a form amenable to computational analysis."

Ideally, we would like to have not just the ability to find one dc decomposition, but also to optimize over the set of valid dc decompositions. Indeed, dc decompositions are not unique: given a decomposition $f = g - h$, one can produce infinitely many others by writing

\[
f = (g + p) - (h + p),
\]

for any convex function $p$. This naturally raises the question of whether some dc decompositions are better than others, for example for the purposes of CCP; a small worked example at the end of this subsection illustrates the point.

In this paper, we consider these decomposition questions for multivariate polynomials. Since polynomial functions are finitely parameterized by their coefficients, they provide a convenient setting for a computational study of the dc decomposition questions. Moreover, in most practical applications, the class of polynomial functions is large enough for modeling purposes, as polynomials can approximate any continuous function on compact sets with arbitrary accuracy. It could also be interesting for future research to explore the potential of dc programming techniques for solving the polynomial optimization problem. This is the problem of minimizing a multivariate polynomial subject to polynomial inequalities, and it is currently an active area of research with applications throughout engineering and applied mathematics. In the case of quadratic polynomial optimization problems, the dc decomposition approach has already been studied [10], [24].

With these motivations in mind, we organize the paper as follows. In Section 2, we start by showing that, unlike the quadratic case, the problem of testing whether two given polynomials $g, h$ form a valid dc decomposition of a third polynomial $f$ is NP-hard (Proposition 1). We then investigate a few candidate optimization problems for finding dc decompositions that speed up the convex-concave procedure. In particular, we extend the notion of an undominated dc decomposition from the quadratic case [10] to higher order polynomials. We show that an undominated dcd always exists (Theorem 1) and can be found by minimizing a certain linear function of one of the two convex functions in the decomposition.
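As the promised illustration of non-uniqueness (a univariate example constructed here for exposition, not taken from the paper), consider $f(x) = x^4 - x^2$ with the two dcds

\[
f(x) = \underbrace{x^4}_{g(x)} - \underbrace{x^2}_{h(x)} = \underbrace{\left(x^4 + x^2\right)}_{g(x) + p(x)} - \underbrace{2x^2}_{h(x) + p(x)}, \qquad p(x) = x^2.
\]

Linearizing the respective concave parts around an iterate $x_k$, as in line 3 of Algorithm 1, gives the two CCP majorants

\[
f_1^k(x) = x^4 - x_k^2 - 2x_k(x - x_k), \qquad
f_2^k(x) = x^4 + x^2 - 2x_k^2 - 4x_k(x - x_k) = f_1^k(x) + (x - x_k)^2.
\]

Hence $f_2^k(x) \ge f_1^k(x)$ for all $x$: the extra convex term $p$ gets linearized away and produces a uniformly looser upper bound on $f$. Comparisons of this kind are what motivate searching for decompositions, such as undominated ones, that make CCP progress faster.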