DC Decomposition of Nonconvex Polynomials with Algebraic Techniques
Amir Ali Ahmadi · Georgina Hall
Abstract We consider the problem of decomposing a multivariate polynomial as the difference of two convex polynomials. We introduce algebraic techniques which reduce this task to linear, second order cone, and semidefinite programming. This allows us to optimize over subsets of valid difference of convex decompositions (dcds) and find ones that speed up the convex-concave procedure (CCP). We prove, however, that optimizing over the entire set of dcds is NP-hard.

Keywords Difference of convex programming, conic relaxations, polynomial optimization, algebraic decomposition of polynomials
1 Introduction
A difference of convex (dc) program is an optimization problem of the form
min  f_0(x)
s.t. f_i(x) ≤ 0,  i = 1, . . . , m,          (1)
where f_0, . . . , f_m are difference of convex functions; i.e.,

f_i(x) = g_i(x) − h_i(x),  i = 0, . . . , m,          (2)
The authors are partially supported by the Young Investigator Program Award of the AFOSR and the CAREER Award of the NSF.
Amir Ali Ahmadi
ORFE, Princeton University, Sherrerd Hall, Princeton, NJ 08540
E-mail: a a [email protected]

Georgina Hall
ORFE, Princeton University, Sherrerd Hall, Princeton, NJ 08540
E-mail: [email protected]

arXiv:1510.01518v2 [math.OC] 12 Sep 2018
and g_i : R^n → R, h_i : R^n → R are convex functions. The class of functions that can be written as a difference of convex functions is very broad, containing for instance all functions that are twice continuously differentiable [18], [22]. Furthermore, any continuous function over a compact set is the uniform limit of a sequence of dc functions; see, e.g., reference [25], where several properties of dc functions are discussed.

Optimization problems that appear in dc form arise in a wide range of applications. Representative examples from the literature include machine learning and statistics (e.g., kernel selection [9], feature selection in support vector machines [23], sparse principal component analysis [30], and reinforcement learning [37]), operations research (e.g., packing problems and production-transportation problems [45]), communications and networks [8], [31], circuit design [30], finance and game theory [17], and computational chemistry [13]. We also observe that dc programs can encode constraints of the type x ∈ {0, 1} by replacing them with the dc constraints 0 ≤ x ≤ 1, x − x^2 ≤ 0. This entails that any binary optimization problem can in theory be written as a dc program, but it also implies that dc problems are hard to solve in general.

As described in [40], there are essentially two schools of thought when it comes to solving dc programs. The first approach is global and generally consists of rewriting the original problem as a concave minimization problem (i.e., minimizing a concave function over a convex set; see [46], [44]) or as a reverse convex problem (i.e., a convex problem with a linear objective and one constraint of the type h(x) ≥ 0, where h is convex). We refer the reader to [43] for an explanation of how one can convert a dc program to a reverse convex problem, and to [21] for more general results on reverse convex programming.
These problems are then solved using branch-and-bound or cutting plane techniques (see, e.g., [45] or [25]). The goal of these approaches is to return global solutions, but their main drawback is scalability. The second approach, by contrast, aims for local solutions while still exploiting the dc structure of the problem by applying the tools of convex analysis to the two convex components of a dc decomposition. One such algorithm is the Difference of Convex Algorithm (DCA) introduced by Pham Dinh Tao in [41] and expanded on by Le Thi Hoai An and Pham Dinh Tao. This algorithm exploits the duality theory of dc programming [42] and is popular because of its ease of implementation, scalability, and ability to handle nonsmooth problems.

In the case where the functions g_i and h_i in (2) are differentiable, DCA reduces to another popular algorithm called the Convex-Concave Procedure (CCP) [27]. The idea of this technique is to simply replace the concave part of f_i (i.e., −h_i) by a linear overestimator, as described in Algorithm 1. By doing this, problem (1) becomes a convex optimization problem that can be solved using tools from convex analysis. The simplicity of CCP has made it an attractive algorithm in various areas of application. These include statistical physics (for minimizing Bethe and Kikuchi free energy functions [48]), machine learning [30], [15], [11], and image processing [47], just to name a few. In addition, CCP enjoys two valuable features: (i) if one starts with a feasible solution, the solution produced after each iteration remains feasible, and (ii) the objective value improves in every iteration; i.e., the method is a descent algorithm. The proof of both claims comes readily out of the description of the algorithm and can be found, e.g., in [30, Section 1.3], where several other properties of the method are also laid out.
Like many iterative algorithms, CCP relies on a stopping criterion to terminate. This criterion can be chosen from among a few alternatives. For example, one could stop if the value of the objective does not improve enough, if the iterates are too close to one another, or if the norm of the gradient of f_0 becomes small.
Algorithm 1 CCP
Input: x_0, f_i = g_i − h_i, i = 0, . . . , m
1: k ← 0
2: while stopping criterion not satisfied do
3:    Convexify: f_i^k(x) := g_i(x) − (h_i(x_k) + ∇h_i(x_k)^T (x − x_k)), i = 0, . . . , m
4:    Solve convex subroutine: min f_0^k(x), s.t. f_i^k(x) ≤ 0, i = 1, . . . , m
5:    x_{k+1} := argmin_{f_i^k(x) ≤ 0} f_0^k(x)
6:    k ← k + 1
7: end while
Output: x_k
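As a toy illustration (our own example, not from the paper), the sketch below runs Algorithm 1 on the unconstrained univariate dc program min x^4 − 8x^2, with g(x) = x^4 and h(x) = 8x^2. For this choice, the convex subproblem on line 4 has the closed-form solution 4x^3 = h′(x_k), i.e., x = (4x_k)^(1/3).

```python
# Toy run of CCP (Algorithm 1) on min f(x) = x^4 - 8x^2, with the
# dc decomposition g(x) = x^4 (convex) and h(x) = 8x^2 (convex).
# The instance and starting point are our own choices for illustration.

def g(x): return x**4
def h(x): return 8 * x**2
def f(x): return g(x) - h(x)

def ccp(x0, tol=1e-10, max_iter=200):
    """At iterate x_k, replace -h by its linearization and minimize
    f^k(x) = g(x) - h(x_k) - h'(x_k)(x - x_k).  Here the convex
    subproblem solves 4x^3 = h'(x_k) = 16 x_k in closed form."""
    x, values = x0, [f(x0)]
    for _ in range(max_iter):
        x_next = (4 * x) ** (1 / 3)   # argmin of the convexified objective
        values.append(f(x_next))
        if abs(x_next - x) < tol:
            break
        x = x_next
    return x, values

x_star, values = ccp(1.0)
```

On this instance the iterates converge to x = 2, a stationary point of f, and the recorded objective values are nonincreasing, matching the descent property (ii) discussed above.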
Convergence results for CCP can be derived from existing results for DCA, since CCP is a subcase of DCA as mentioned earlier. But CCP can also be seen as a special case of the family of majorization-minimization (MM) algorithms. Indeed, the general concept of MM algorithms is to iteratively upper-bound the objective by a convex function and then minimize this function, which is precisely what is done in CCP. This fact is exploited by Lanckriet and Sriperumbudur in [27] and Salakhutdinov et al. in [39] to obtain convergence results for the algorithm, showing, e.g., that under mild assumptions, CCP converges to a stationary point of the optimization problem (1).
1.1 Motivation and organization of the paper
Although a wide range of problems already appear in dc form (2), such a decomposition is not always available. In this situation, algorithms of dc programming, such as CCP, generally fail to be applicable. Hence, the question arises as to whether one can (efficiently) compute a difference of convex decomposition (dcd) of a given function. This challenge has been raised several times in the literature. For instance, Hiriart-Urruty [22] states: “All the proofs [of existence of dc decompositions] we know are “constructive” in the sense that they indeed yield [g_i] and [h_i] satisfying (2) but could hardly be carried over [to] computational aspects”. As another example, Tuy [45] writes: “The dc structure of a given problem is not always apparent or easy to disclose, and even when it is known explicitly, there remains for the problem solver the hard task of bringing this structure to a form amenable to computational analysis.”
Ideally, we would like to have not just the ability to find one dc decomposition, but also to optimize over the set of valid dc decompositions. Indeed, dc decompositions are not unique: given a decomposition f = g − h, one can produce infinitely many others by writing f = g + p − (h + p), for any convex function p. This naturally raises the question of whether some dc decompositions are better than others, for example for the purposes of CCP.

In this paper, we consider these decomposition questions for multivariate polynomials. Since polynomial functions are finitely parameterized by their coefficients, they provide a convenient setting for a computational study of the dc decomposition questions. Moreover, in most practical applications, the class of polynomial functions is large enough for modeling purposes, as polynomials can approximate any continuous function on compact sets with arbitrary accuracy. It could also be interesting for future research to explore the potential of dc programming techniques for solving the polynomial optimization problem. This is the problem of minimizing a multivariate polynomial subject to polynomial inequalities and is currently an active area of research with applications throughout engineering and applied mathematics. In the case of quadratic polynomial optimization problems, the dc decomposition approach has already been studied [10], [24].

With these motivations in mind, we organize the paper as follows. In Section 2, we start by showing that unlike the quadratic case, the problem of testing if two given polynomials g, h form a valid dc decomposition of a third polynomial f is NP-hard (Proposition 1). We then investigate a few candidate optimization problems for finding dc decompositions that speed up the convex-concave procedure. In particular, we extend the notion of an undominated dc decomposition from the quadratic case [10] to higher order polynomials.
We show that an undominated dcd always exists (Theorem 1) and can be found by minimizing a certain linear function of one of the two convex functions in the decomposition. However, this optimization problem is proved to be NP-hard for polynomials of degree four or larger (Proposition 4). To cope with the intractability of finding optimal dc decompositions, we propose in Section 3 a class of algebraic relaxations that allow us to optimize over subsets of dcds. These relaxations are based on the notions of dsos-convex, sdsos-convex, and sos-convex polynomials (see Definition 5), which respectively lend themselves to linear, second order cone, and semidefinite programming. In particular, we show that a dc decomposition can always be found by linear programming (Theorem 2). Finally, in Section 4, we perform some numerical experiments to compare the scalability and performance of our different algebraic relaxations.
2 Polynomial dc decompositions and their complexity
To study questions around dc decompositions of polynomials more formally, let us start by introducing some notation. A multivariate polynomial p(x) in variables x := (x_1, . . . , x_n)^T is a function from R^n to R that is a finite linear
combination of monomials:
p(x) = Σ_α c_α x^α = Σ_{α_1,...,α_n} c_{α_1,...,α_n} x_1^{α_1} · · · x_n^{α_n},          (3)

where the sum is over n-tuples of nonnegative integers α_i. The degree of a monomial x^α is equal to α_1 + · · · + α_n. The degree of a polynomial p(x) is defined to be the highest degree of its component monomials. A simple counting argument shows that a polynomial of degree d in n variables has (n+d choose d) coefficients. A homogeneous polynomial (or a form) is a polynomial where all the monomials have the same degree. An n-variate form p of degree d has (n+d−1 choose d) coefficients. We denote the set of polynomials (resp. forms) of degree 2d in n variables by H̃_{n,2d} (resp. H_{n,2d}).

Recall that a symmetric matrix A is positive semidefinite (psd) if x^T A x ≥ 0 for all x ∈ R^n; this will be denoted by the standard notation A ⪰ 0. Similarly, a polynomial p(x) is said to be nonnegative or positive semidefinite if p(x) ≥ 0 for all x ∈ R^n. For a polynomial p, we denote its Hessian by H_p. The second order characterization of convexity states that p is convex if and only if H_p(x) ⪰ 0 for all x ∈ R^n.
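The two counting facts above are easy to verify computationally. The following sketch (our own) enumerates exponent tuples directly and compares the counts against the binomial coefficients.

```python
# Sanity check of the counting facts above: a polynomial of degree d in n
# variables has binomial(n+d, d) coefficients, and an n-variate form of
# degree d has binomial(n+d-1, d) coefficients.
from itertools import product
from math import comb

def count_monomials(n, d, exact_degree):
    """Count exponent tuples (a_1, ..., a_n) with sum == d (forms) or
    sum <= d (polynomials)."""
    alphas = product(range(d + 1), repeat=n)
    if exact_degree:
        return sum(1 for a in alphas if sum(a) == d)
    return sum(1 for a in alphas if sum(a) <= d)

for n in range(1, 5):
    for d in range(0, 5):
        assert count_monomials(n, d, exact_degree=False) == comb(n + d, d)
        assert count_monomials(n, d, exact_degree=True) == comb(n + d - 1, d)
```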
Definition 1 We say a polynomial g is a dcd of a polynomial f if g is convex and g − f is convex.
Note that if we let h := g − f, then indeed we are writing f as a difference of two convex functions, f = g − h. It is known that any polynomial f has a (polynomial) dcd g. A proof of this is given, e.g., in [47], or in Section 3.2, where it is obtained as a corollary of a stronger theorem (see Corollary 1). By default, all dcds considered in the sequel will be of even degree. Indeed, if f is of even degree 2d, then it admits a dcd g of degree 2d. If f is of odd degree 2d − 1, it can be viewed as a polynomial f̃ of even degree 2d whose highest-degree coefficients are 0. The previous result then remains true, and f̃ admits a dcd of degree 2d. Our results show that such a decomposition can be found efficiently (e.g., by linear programming); see Theorem 3. Interestingly enough though, it is not easy to check if a candidate g is a valid dcd of f.
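As a numerical sanity check of Definition 1 on a toy univariate example of our own choosing (this is only a grid check, not an exact certificate): for f(x) = x^4 − 3x^2, the candidate g(x) = x^4 is a dcd, since in one variable convexity amounts to a nonnegative second derivative.

```python
# Grid-based sanity check of Definition 1 on a toy example (ours):
# f(x) = x^4 - 3x^2 with candidate dcd g(x) = x^4 and h(x) = g(x) - f(x).
# In one variable, convexity means p'' >= 0, which we test on a grid.

def second_derivative(p, x, eps=1e-4):
    """Central finite-difference approximation of p''(x)."""
    return (p(x + eps) - 2 * p(x) + p(x - eps)) / eps**2

f = lambda x: x**4 - 3 * x**2
g = lambda x: x**4
h = lambda x: g(x) - f(x)          # h(x) = 3x^2, so f = g - h

grid = [i / 10 for i in range(-50, 51)]
g_convex = all(second_derivative(g, x) >= -1e-6 for x in grid)
h_convex = all(second_derivative(h, x) >= -1e-6 for x in grid)
```

Both flags come out true here, while f itself fails the test at the origin (f″(0) = −6), so f is genuinely nonconvex.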
Proposition 1 Given two n-variate polynomials f and g of degree 4, with f ≠ g, it is strongly NP-hard¹ to determine whether g is a dcd of f.²
Proof We will show this via a reduction from the problem of testing nonnegativity of biquadratic forms, which is already known to be strongly NP-hard
¹ For a strongly NP-hard problem, even a pseudo-polynomial time algorithm cannot exist unless P=NP [16].
² If we did not add the condition f ≠ g on the input, the problem would again be NP-hard (in fact, this is even easier to prove). However, we believe that in any interesting instance of this question, one would have f ≠ g.
[29], [4]. A biquadratic form b(x; y) in the variables x = (x_1, . . . , x_n)^T and y = (y_1, . . . , y_m)^T is a quartic form that can be written as

b(x; y) = Σ_{i≤j, k≤l} a_{ijkl} x_i x_j y_k y_l.
Given a biquadratic form b(x; y), define the n × m polynomial matrix C(x, y) by setting [C(x, y)]_{ij} := ∂b(x; y)/(∂x_i ∂y_j), and let γ be the largest coefficient in absolute value of any monomial present in some entry of C(x, y). Moreover, we define
r(x; y) := (n^2 γ / 2) ( Σ_{i=1}^n x_i^4 + Σ_{i=1}^m y_i^4 + Σ_{1≤i<j≤n} x_i^2 x_j^2 + Σ_{1≤i<j≤m} y_i^2 y_j^2 ).

It is proven in [4, Theorem 3.2] that b(x; y) is nonnegative if and only if q(x, y) := b(x; y) + r(x, y) is convex.

We now give our reduction. Given a biquadratic form b(x; y), we take g = q(x, y) + r(x, y) and f = r(x, y). If b(x; y) is nonnegative, then from the theorem quoted above, g − f = q is convex. Furthermore, it is straightforward to establish that r(x, y) is convex, which implies that g is also convex. This means that g is a dcd of f. If b(x; y) is not nonnegative, then we know that q(x, y) is not convex. This implies that g − f is not convex, and so g cannot be a dcd of f. ⊓⊔

Unlike the quartic case, it is worth noting that in the quadratic case it is easy to test whether a polynomial g(x) = x^T G x is a dcd of f(x) = x^T F x. Indeed, this amounts to testing whether G ⪰ 0 and G − F ⪰ 0, which can be done in O(n^3) time.

As mentioned earlier, there is not just one dcd for a given polynomial f, but an infinite number: if f = g − h with g and h convex, then any convex polynomial p generates a new dcd f = (g + p) − (h + p). It is natural then to investigate whether some dcds are better than others, e.g., for use in the convex-concave procedure. Recall that the main idea of CCP is to upper-bound the nonconvex function f = g − h by a convex function f^k. These convex functions are obtained by linearizing h around the optimal solution of the previous iteration. Hence, a reasonable way of choosing a good dcd is to look for dcds of f that minimize the curvature of h around a point. Two natural formulations of this problem are given below. The first one attempts to minimize the average³ curvature of h at a point x̄ over all directions:

min_g  Tr H_h(x̄)          (4)
s.t.  f = g − h,  g, h convex.

³ Note that Tr H_h(x̄) (resp. λ_max H_h(x̄)) is proportional to the average (resp. equal to the maximum) of y^T H_h(x̄) y over {y | ||y|| = 1}.
The second one attempts to minimize the worst-case³ curvature of h at a point x̄ over all directions:

min_g  λ_max H_h(x̄)          (5)
s.t.  f = g − h,  g, h convex.

A few numerical experiments using these objective functions will be presented in Section 4.2.

Another popular notion that appears in the literature and that also relates to finding dcds with minimal curvature is that of undominated dcds. These were studied in depth by Bomze and Locatelli in the quadratic case [10]. We extend their definition to general polynomials here.

Definition 2 Let g be a dcd of f. A dcd g′ of f is said to dominate g if g − g′ is convex and nonaffine. A dcd g of f is undominated if no dcd of f dominates g.

Arguments for choosing undominated dcds can be found in [10], [12, Section 3]. One motivation that is relevant to CCP appears in Proposition 2⁴. Essentially, the proposition shows that if we were to start at some initial point and apply one iteration of CCP, the iterate obtained using a dc decomposition g would always beat the iterate obtained using a dcd dominated by g.

Proposition 2 Let g and g′ be two dcds of f. Define the convex functions h := g − f and h′ := g′ − f, and assume that g′ dominates g. For a point x_0 in R^n, define the convexified versions of f

f_g(x) := g(x) − (h(x_0) + ∇h(x_0)^T (x − x_0)),
f_{g′}(x) := g′(x) − (h′(x_0) + ∇h′(x_0)^T (x − x_0)).

Then, we have

f_{g′}(x) ≤ f_g(x),  ∀x.

Proof As g′ dominates g, there exists a nonaffine convex polynomial c such that c = g − g′. We then have g′ = g − c and h′ = h − c, and

f_{g′}(x) = g(x) − c(x) − (h(x_0) − c(x_0)) − (∇h(x_0) − ∇c(x_0))^T (x − x_0)
          = f_g(x) − (c(x) − c(x_0) − ∇c(x_0)^T (x − x_0)).

The first order characterization of convexity of c then gives us

f_{g′}(x) ≤ f_g(x),  ∀x.  ⊓⊔

In the quadratic case, it turns out that an optimal solution to (4) is an undominated dcd [10]. A solution given by (5), on the other hand, is not necessarily undominated.
Consider the quadratic function

f(x) = 8x_1^2 − 2x_2^2 − 8x_3^2,

⁴ A variant of this proposition in the quadratic case appears in [10, Proposition 12].

and assume that we want to decompose it using (5). An optimal solution is given by g*(x) = 8x_1^2 + 6x_2^2 and h*(x) = 8x_2^2 + 8x_3^2, with λ_max H_{h*} = 8. This is clearly dominated by g′(x) = 8x_1^2, as g*(x) − g′(x) = 6x_2^2, which is convex.

When the degree is higher than 2, it is no longer true, however, that solving (4) returns an undominated dcd. Consider for example the degree-12 polynomial

f(x) = x^{12} − x^{10} + x^6 − x^4.

A solution to (4) with x̄ = 0 is given by g(x) = x^{12} + x^6 and h(x) = x^{10} + x^4 (as Tr H_h(0) = 0). This is dominated by the dcd g′(x) = x^{12} − x^8 + x^6 and h′(x) = x^{10} − x^8 + x^4, as g − g′ = x^8, which is clearly convex.

It is unclear at this point how one can obtain an undominated dcd for higher degree polynomials, or even if one exists. In the next theorem, we show that such a dcd always exists and provide an optimization problem whose optimal solution(s) will always be undominated dcds. This optimization problem involves the integral of a polynomial over a sphere, which conveniently turns out to be an explicit linear expression in its coefficients.

Proposition 3 ([14]) Let S^{n−1} denote the unit sphere in R^n. For a monomial x_1^{α_1} · · · x_n^{α_n}, define β_j := (α_j + 1)/2. Then

∫_{S^{n−1}} x_1^{α_1} · · · x_n^{α_n} dσ = { 0                                             if some α_j is odd,
                                          { 2Γ(β_1) · · · Γ(β_n) / Γ(β_1 + · · · + β_n)   if all α_j are even,

where Γ denotes the gamma function, and σ is the rotation invariant surface measure on S^{n−1}.

Theorem 1 Let f ∈ H̃_{n,2d}. Consider the optimization problem

min_{g ∈ H̃_{n,2d}}  (1/A_n) ∫_{S^{n−1}} Tr H_g dσ          (6)
s.t.  g convex,  g − f convex,

where A_n = 2π^{n/2}/Γ(n/2) is a normalization constant which equals the area of S^{n−1}. Then an optimal solution to (6) exists, and any optimal solution is an undominated dcd of f.
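The closed-form expression in Proposition 3 can be checked directly for small cases (here we read σ as the unnormalized surface measure, consistent with the normalization by A_n in Theorem 1).

```python
# Quick check of Proposition 3's formula for integrating monomials
# over the unit sphere S^(n-1), sigma being the surface measure.
from math import gamma, pi, isclose

def sphere_integral(alphas):
    """Integral of x_1^a_1 * ... * x_n^a_n over S^(n-1), per Proposition 3."""
    if any(a % 2 == 1 for a in alphas):
        return 0.0
    betas = [(a + 1) / 2 for a in alphas]
    num = 2.0
    for b in betas:
        num *= gamma(b)
    return num / gamma(sum(betas))

# alpha = (0, ..., 0) recovers the area A_n = 2 pi^(n/2) / Gamma(n/2):
assert isclose(sphere_integral((0, 0)), 2 * pi)        # circumference of S^1
assert isclose(sphere_integral((0, 0, 0)), 4 * pi)     # area of S^2
# integral of x1^2 equals the area times the mean of x1^2 (1/2 on S^1, 1/3 on S^2):
assert isclose(sphere_integral((2, 0)), pi)
assert isclose(sphere_integral((2, 0, 0)), 4 * pi / 3)
assert sphere_integral((1, 2)) == 0.0                  # odd exponent vanishes
```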
Note that problem (6) is exactly equivalent to (4) in the case where 2d = 2, and so it can be seen as a generalization of the quadratic case.

Proof We first show that an optimal solution to (6) exists. As any polynomial f admits a dcd, (6) is feasible. Let g̃ be a dcd of f and define γ := ∫_{S^{n−1}} Tr H_{g̃} dσ. Consider the optimization problem given by (6) with additional constraints:

min_{g ∈ H̃_{n,2d}}  (1/A_n) ∫_{S^{n−1}} Tr H_g dσ
s.t.  g convex and with no affine terms,          (7)
      g − f convex,
      ∫_{S^{n−1}} Tr H_g dσ ≤ γ.

Notice that any optimal solution to (7) is an optimal solution to (6). Hence, it suffices to show that (7) has an optimal solution. Let U denote the feasible set of (7). Evidently, the set U is closed and the map g ↦ ∫_{S^{n−1}} Tr H_g dσ is continuous. If we also show that U is bounded, we will know that the optimal solution to (7) is achieved. To see this, assume that U is unbounded. Then for any β, there exists a coefficient c_g of some g ∈ U that is larger than β. By absence of affine terms in g, c_g features in an entry of H_g as the coefficient of a nonzero monomial. Take x̄ ∈ S^{n−1} such that this monomial evaluated at x̄ is nonzero: this entails that at least one entry of H_g(x̄) can get arbitrarily large. However, since g ↦ Tr H_g is continuous and ∫_{S^{n−1}} Tr H_g dσ ≤ γ, there exists γ̄ such that Tr H_g(x) ≤ γ̄ for all x ∈ S^{n−1}. This, combined with the fact that H_g(x) ⪰ 0 for all x, implies that ||H_g(x)|| ≤ γ̄ for all x ∈ S^{n−1}, which contradicts the fact that an entry of H_g(x̄) can get arbitrarily large.

We now show that if g* is any optimal solution to (6), then g* is an undominated dcd of f. Suppose that this is not the case. Then there exists a dcd g′ of f such that g* − g′ is nonaffine and convex. As g′ is a dcd of f, g′ is feasible for (6). The fact that g* − g′ is nonaffine and convex implies that

∫_{S^{n−1}} Tr H_{g*−g′} dσ > 0  ⇔  ∫_{S^{n−1}} Tr H_{g*} dσ > ∫_{S^{n−1}} Tr H_{g′} dσ,

which contradicts the assumption that g* is optimal to (6).
⊓⊔

Although optimization problem (6) is guaranteed to produce an undominated dcd, we show that unfortunately it is intractable to solve.

Proposition 4 Given an n-variate polynomial f of degree 4 with rational coefficients, and a rational number k, it is strongly NP-hard to decide whether there exists a feasible solution to (6) with objective value ≤ k.

Proof We give a reduction from the problem of deciding convexity of quartic polynomials. Let q be a quartic polynomial. We take f = q and k = (1/A_n) ∫_{S^{n−1}} Tr H_q dσ. If q is convex, then g = q is trivially a dcd of f and

(1/A_n) ∫_{S^{n−1}} Tr H_g dσ ≤ k.          (8)

If q is not convex, assume that there exists a feasible solution g for (6) that satisfies (8). From (8), we have

∫_{S^{n−1}} Tr H_g dσ ≤ ∫_{S^{n−1}} Tr H_f dσ  ⇔  ∫_{S^{n−1}} Tr H_{f−g} dσ ≥ 0.          (9)

But from (6), as g − f is convex, ∫_{S^{n−1}} Tr H_{g−f} dσ ≥ 0. Together with (9), this implies that

∫_{S^{n−1}} Tr H_{g−f} dσ = 0,

which in turn implies that H_{g−f}(x) = H_g(x) − H_f(x) = 0. To see this, note that Tr H_{g−f} is a nonnegative polynomial which must be identically equal to 0, since its integral over the sphere is 0. As H_{g−f}(x) ⪰ 0 for all x, we get that H_{g−f} = 0. Thus, H_g(x) = H_f(x) for all x, which is not possible as g is convex and f is not. ⊓⊔

We remark that solving (6) in the quadratic case (i.e., 2d = 2) is simply a semidefinite program.

3 Algebraic relaxations and more tractable subsets of the set of convex polynomials

We have just seen in the previous section that for polynomials with degree as low as four, some basic tasks related to dc decomposition are computationally intractable. In this section, we identify three subsets of the set of convex polynomials that lend themselves to polynomial-time algorithms. These are the sets of sos-convex, sdsos-convex, and dsos-convex polynomials, which will respectively lead to semidefinite, second order cone, and linear programs.
The latter two concepts are to our knowledge new and are meant to serve as more scalable alternatives to sos-convexity. All three concepts certify convexity of polynomials via explicit algebraic identities, which is the reason why we refer to them as algebraic relaxations.

3.1 DSOS-convexity, SDSOS-convexity, SOS-convexity

To present these three notions we need to introduce some notation and briefly review the concepts of sos, dsos, and sdsos polynomials.

We denote the set of nonnegative polynomials (resp. forms) in n variables and of degree d by P̃SD_{n,d} (resp. PSD_{n,d}). A polynomial p is a sum of squares (sos) if it can be written as p(x) = Σ_{i=1}^r q_i^2(x) for some polynomials q_1, . . . , q_r. The set of sos polynomials (resp. forms) in n variables and of degree d is denoted by S̃OS_{n,d} (resp. SOS_{n,d}). We have the obvious inclusion S̃OS_{n,d} ⊆ P̃SD_{n,d} (resp. SOS_{n,d} ⊆ PSD_{n,d}), which is strict unless d = 2, or n = 1, or (n, d) = (2, 4) (resp. d = 2, or n = 2, or (n, d) = (3, 4)) [20], [38].

Let z̃_{n,d}(x) (resp. z_{n,d}(x)) denote the vector of all monomials in x = (x_1, . . . , x_n) of degree up to (resp. exactly) d; the length of this vector is (n+d choose d) (resp. (n+d−1 choose d)). It is well known that a polynomial (resp. form) p of degree 2d is sos if and only if it can be written as p(x) = z̃_{n,d}(x)^T Q z̃_{n,d}(x) (resp. p(x) = z_{n,d}(x)^T Q z_{n,d}(x)) for some psd matrix Q [35], [34]. The matrix Q is generally called the Gram matrix of p. An SOS optimization problem is the problem of minimizing a linear function over the intersection of the convex cone SOS_{n,d} with an affine subspace. The previous statement implies that SOS optimization problems can be cast as semidefinite programs.

We now define dsos and sdsos polynomials, which were recently proposed by Ahmadi and Majumdar [3], [2] as more tractable subsets of sos polynomials.
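As a toy illustration of the Gram matrix characterization (our own example): for p(x) = x^4 + 2x^2 + 1 and z̃(x) = (1, x, x^2)^T, one valid choice of Gram matrix (Gram matrices are not unique) is psd, certifying the decomposition p(x) = (x^2 + 1)^2.

```python
# Gram-matrix certificate that p(x) = x^4 + 2x^2 + 1 is sos, with the
# monomial vector z(x) = (1, x, x^2).  Q = v v^T with v = (1, 0, 1),
# matching the sos decomposition p(x) = (x^2 + 1)^2.
import numpy as np

Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# Verify p(x) = z(x)^T Q z(x) on a few points ...
for x in np.linspace(-2.0, 2.0, 9):
    z = np.array([1.0, x, x**2])
    assert np.isclose(z @ Q @ z, x**4 + 2 * x**2 + 1)

# ... and that Q is psd, which is the sos certificate.
assert np.min(np.linalg.eigvalsh(Q)) >= -1e-12
```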
When working with dc decompositions of n-variate polynomials, we will end up needing to impose sum of squares conditions on polynomials that have 2n variables (see Definition 5). While in theory the SDPs arising from sos conditions are of polynomial size, in practice we rather quickly face a scalability challenge. For this reason, we also consider the class of dsos and sdsos polynomials, which, while more restrictive than sos polynomials, are considerably more tractable. For example, Table 2 in Section 4.2 shows that when n = 14, dc decompositions using these concepts are about 250 times faster than an sos-based approach. At n = 18 variables, we are unable to run the sos-based approach on our machine. With this motivation in mind, let us start by recalling some concepts from linear algebra.

Definition 3 A symmetric matrix M is said to be diagonally dominant (dd) if m_ii ≥ Σ_{j≠i} |m_ij| for all i, and strictly diagonally dominant if m_ii > Σ_{j≠i} |m_ij| for all i. We say that M is scaled diagonally dominant (sdd) if there exists a diagonal matrix D, with positive diagonal entries, such that DMD is dd.

We have the following implications from Gershgorin's circle theorem:

M dd ⇒ M sdd ⇒ M psd.          (10)

Furthermore, notice that requiring M to be dd can be encoded via a linear program (LP), as the constraints are linear inequalities in the coefficients of M. Requiring that M be sdd can be encoded via a second order cone program (SOCP). This follows from the fact that M is sdd if and only if

M = Σ_{i<j} M^{ij},

where each M^{ij} is an n × n symmetric matrix with zeros everywhere except for four entries M_ii, M_ij, M_ji, M_jj, which must make the 2 × 2 matrix

[ M_ii  M_ij ]
[ M_ji  M_jj ]

symmetric positive semidefinite [3]. These constraints are rotated quadratic cone constraints and can be imposed via SOCP [7].
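The conditions of Definition 3 and the implications (10) are easy to test numerically. The matrix M below is our own example of a matrix that is sdd but not dd.

```python
# Numerical check of Definition 3 and the chain (10): the example matrix M
# (our own) is not dd, but a diagonal scaling D M D is dd, so M is sdd,
# and M is indeed psd.
import numpy as np

def is_dd(M):
    """Diagonal dominance: m_ii >= sum_{j != i} |m_ij| for every row i."""
    A = np.abs(M)
    return bool(np.all(np.diag(M) >= A.sum(axis=1) - np.diag(A)))

M = np.array([[1.0, 2.0],
              [2.0, 5.0]])
D = np.diag([2.0, 1.0])           # positive diagonal scaling

assert not is_dd(M)               # row 1 fails: 1 < |2|
assert is_dd(D @ M @ D)           # scaled matrix [[4, 4], [4, 5]] is dd, so M is sdd
assert np.min(np.linalg.eigvalsh(M)) >= 0   # sdd => psd, as in (10)
```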
Definition 4 ([3]) A polynomial p ∈ H̃_{n,2d} is said to be

– diagonally-dominant-sum-of-squares (dsos) if it admits a representation p(x) = z̃_{n,d}(x)^T Q z̃_{n,d}(x), where Q is a dd matrix;
– scaled-diagonally-dominant-sum-of-squares (sdsos) if it admits a representation p(x) = z̃_{n,d}(x)^T Q z̃_{n,d}(x), where Q is an sdd matrix.

Identical conditions involving z_{n,d} instead of z̃_{n,d} define the sets of dsos and sdsos forms. The following implications are again straightforward:

p(x) dsos ⇒ p(x) sdsos ⇒ p(x) sos ⇒ p(x) nonnegative.          (11)

Given the fact that our Gram matrices and polynomials are related to each other via linear equalities, it should be clear that optimizing over the set of dsos (resp. sdsos, sos) polynomials is an LP (resp. SOCP, SDP).

Let us now get back to convexity.

Definition 5 Let y = (y_1, . . . , y_n)^T be a vector of variables. A polynomial p := p(x) is said to be

– dsos-convex if y^T H_p(x) y is dsos (as a polynomial in x and y);
– sdsos-convex if y^T H_p(x) y is sdsos (as a polynomial in x and y);
– sos-convex if y^T H_p(x) y is sos (as a polynomial in x and y).⁵

We denote the set of dsos-convex (resp. sdsos-convex, sos-convex, convex) forms in H_{n,2d} by ΣDC_{n,2d} (resp. ΣSC_{n,2d}, ΣC_{n,2d}, C_{n,2d}). Similarly, Σ̃DC_{n,2d} (resp. Σ̃SC_{n,2d}, Σ̃C_{n,2d}, C̃_{n,2d}) denote the sets of dsos-convex (resp. sdsos-convex, sos-convex, convex) polynomials in H̃_{n,2d}. The inclusions

ΣDC_{n,2d} ⊆ ΣSC_{n,2d} ⊆ ΣC_{n,2d} ⊆ C_{n,2d}          (12)

are a direct consequence of (11) and of the second order necessary and sufficient condition for convexity, which reads

p(x) is convex ⇔ H_p(x) ⪰ 0, ∀x ∈ R^n ⇔ y^T H_p(x) y ≥ 0, ∀x, y ∈ R^n.

Optimizing over ΣDC_{n,2d} (resp. ΣSC_{n,2d}, ΣC_{n,2d}) is an LP (resp. SOCP, SDP). The same statements are true for Σ̃DC_{n,2d}, Σ̃SC_{n,2d} and Σ̃C_{n,2d}.

Let us draw these sets for a parametric family of polynomials

p(x_1, x_2) = 2x_1^4 + 2x_2^4 + a x_1^3 x_2 + b x_1^2 x_2^2 + c x_1 x_2^3.          (13)

Here, a, b and c are parameters.
It is known that for bivariate quartics, all convex polynomials are sos-convex;⁶ i.e., ΣC_{2,4} = C_{2,4}. To obtain Figure 1, we fix c to some value and then plot the values of a and b for which p(x_1, x_2) is s/d/sos-convex. As we can see, the quality of the inner approximation of the set of convex polynomials by the sets of dsos/sdsos-convex polynomials can be very good (e.g., c = 0) or less so (e.g., c = 1).

⁵ The notion of sos-convexity has already appeared in the study of semidefinite representability of convex sets [19] and in applications such as shape-constrained regression in statistics [32].
⁶ In general, constructing polynomials that are convex but not sos-convex seems to be a nontrivial task [5]. A complete characterization of the dimensions and degrees for which convexity and sos-convexity are equivalent is given in [6].

Fig. 1: The sets ΣDC_{n,2d}, ΣSC_{n,2d} and ΣC_{n,2d} for the parametric family of polynomials in (13)

3.2 Existence of difference of s/d/sos-convex decompositions of polynomials

The reason we introduced the notions of s/d/sos-convexity is that in our optimization problems for finding dcds, we would like to replace the condition

f = g − h,  g, h convex,

with the computationally tractable condition

f = g − h,  g, h s/d/sos-convex.

The first question that needs to be addressed is whether such a decomposition exists for any polynomial. In this section, we prove that this is indeed the case. This in particular implies that a dcd can be found efficiently. We start by proving a lemma about cones.

Lemma 1 Consider a vector space E and a full-dimensional cone K ⊆ E. Then any v ∈ E can be written as v = k_1 − k_2, where k_1, k_2 ∈ K.

Proof Let v ∈ E. If v ∈ K, then we take k_1 = v and k_2 = 0. Assume now that v ∉ K and let k be any element in the interior of the cone K. As k ∈ int(K), there exists 0 < α < 1 such that k′ := (1 − α)v + αk ∈ K.
Rewriting the previous equation, we obtain

v = (1/(1 − α)) k′ − (α/(1 − α)) k.

By taking k_1 := (1/(1 − α)) k′ and k_2 := (α/(1 − α)) k, we observe that v = k_1 − k_2 and k_1, k_2 ∈ K. ⊓⊔

The following theorem is the main result of the section.

Theorem 2 Any polynomial p ∈ H̃_{n,2d} can be written as the difference of two dsos-convex polynomials in H̃_{n,2d}.

Corollary 1 Any polynomial p ∈ H̃_{n,2d} can be written as the difference of two sdsos-convex, sos-convex, or convex polynomials in H̃_{n,2d}.

Proof This is straightforward from the inclusions

Σ̃DC_{n,2d} ⊆ Σ̃SC_{n,2d} ⊆ Σ̃C_{n,2d} ⊆ C̃_{n,2d}.  ⊓⊔

In view of Lemma 1, to prove Theorem 2 it suffices to show that Σ̃DC_{n,2d} is full dimensional in the vector space H̃_{n,2d}. We do this by constructing a polynomial in int(Σ̃DC_{n,2d}) for any n, d.

Recall that z_{n,d} (resp. z̃_{n,d}) denotes the vector of all monomials in x = (x_1, . . . , x_n) of degree exactly (resp. up to) d. If y = (y_1, . . . , y_n) is a vector of variables of length n, we define

w_{n,d}(x, y) := y · z_{n,d}(x),

where y · z_{n,d}(x) = (y_1 z_{n,d}(x), . . . , y_n z_{n,d}(x))^T. Analogously, we define

w̃_{n,d}(x, y) := y · z̃_{n,d}(x).

Theorem 3 For all n, d, there exists a polynomial p ∈ H̃_{n,2d} such that

y^T H_p(x) y = w̃_{n,d−1}(x, y)^T Q w̃_{n,d−1}(x, y),          (14)

where Q is strictly dd.

Any such polynomial will be in int(Σ̃DC_{n,2d}). Indeed, if we were to perturb the coefficients of p slightly, then each coefficient of Q would undergo a slight perturbation. As Q is strictly dd, Q would remain dd, and hence p would remain dsos-convex.

We will prove Theorem 3 through a series of lemmas. First, we show that the result is true in the homogeneous case when n = 2 (Lemma 2). By induction, we then prove that it still holds in the homogeneous case for any n (Lemma 3). We finally extend the result to the nonhomogeneous case.

Lemma 2 For all d, there exists a polynomial p ∈ H_{2,2d} such that

y^T H_p(x) y = w_{2,d−1}(x, y)^T Q w_{2,d−1}(x, y),          (15)

for some strictly dd matrix Q.
We remind the reader that Lemma 2 corresponds to the base case of a proof by induction on n for Theorem 3.

Proof In this proof, we show that there exists a polynomial p that satisfies (15) for some strictly dd matrix Q in the case where n = 2 and for any d ≥ 1. First, if 2d = 2, we simply take p(x_1, x_2) = x_1^2 + x_2^2, as y^T H_p(x) y = 2y^T I y and the identity matrix is strictly dd. Now assume 2d > 2. We consider two cases depending on whether d is divisible by 2. In the case that it is, we construct p as

p(x_1, x_2) := a_0 x_1^{2d} + a_1 x_1^{2d−2} x_2^2 + a_2 x_1^{2d−4} x_2^4 + . . . + a_{d/2} x_1^d x_2^d + . . . + a_1 x_1^2 x_2^{2d−2} + a_0 x_2^{2d},

with the sequence {a_k}, k = 0, . . . , d/2, defined as follows:

a_1 = 1,
a_{k+1} = ((2d − 2k)/(2k + 2)) a_k,  k = 1, . . . , d/2 − 1  (for 2d > 4),          (16)
a_0 = 1/d + a_{d/2}/(2(2d − 1)).

Let

β_k = a_k (2d − 2k)(2d − 2k − 1),  k = 0, . . . , d/2 − 1,
γ_k = a_k · 2k(2k − 1),  k = 1, . . . , d/2,          (17)
δ_k = a_k (2d − 2k) · 2k,  k = 1, . . . , d/2.

We claim that the symmetric matrix Q whose diagonal entries read β_0, . . . , β_{d/2−1}, γ_{d/2}, . . . , γ_1, γ_1, . . . , γ_{d/2}, β_{d/2−1}, . . . , β_0, and whose only nonzero off-diagonal entries are the coefficients δ_k, is strictly dd and satisfies (15) with w_{2,d−1}(x, y) ordered as