arXiv:1902.10921v1 [math.OC] 28 Feb 2019

A survey of hidden convex optimization

Yong Xia

Abstract Motivated by the fact that not all nonconvex optimization problems are difficult to solve, we survey in this paper three widely-used ways to reveal the hidden convex structure for different classes of nonconvex optimization problems. Finally, ten open problems are raised.

Keywords convex programming · quadratic programming · quadratic matrix programming · fractional programming · Lagrangian dual · semidefinite programming

Mathematics Subject Classification (2010) 90C20, 90C25, 90C26, 90C32

This research was supported by National Natural Science Foundation of China under grants 11822103, 11571029 and Beijing Natural Science Foundation Z180005.

Y. Xia
LMIB of the Ministry of Education; School of Mathematics and System Sciences, Beihang University, Beijing, 100191, P. R. China
E-mail: yxia@buaa.edu.cn

1 Introduction

Nowadays, convex programming has become very popular, due not only to its various real applications but also to the charming property that any local solution is also globally optimal. However, this does not mean that convex programming problems are easy to solve. First, it may be difficult to identify convexity. Actually, deciding whether a quartic polynomial is globally convex is NP-hard [1]. Second, evaluating a convex function is not always easy either. For example, the induced matrix norm
$$\|A\|_p = \sup_{x\neq 0} \frac{\|Ax\|_p}{\|x\|_p}$$
is a convex function. But evaluating $\|A\|_p$ is NP-hard if $p \neq 1, 2, \infty$ [35]. Third, convex programming problems may be difficult to solve. It has been shown in [20] that the general mixed-binary quadratic optimization problems can be equivalently reformulated as convex programming over the copositive cone
$$\mathrm{Co} := \{A \in \mathbb{R}^{n\times n} : A = A^T,\ x^T A x \ge 0,\ \forall x \ge 0,\ x \in \mathbb{R}^n\}.$$
Notice that checking whether $A \in \mathrm{Co}$ is co-NP-complete [52].

Like a coin has two sides, there are quite a few nonconvex optimization problems that can be globally and efficiently solved in polynomial time. The reason behind this observation is that most of them belong to hidden convex optimization, i.e., they admit equivalent convex programming reformulations.

In the past two decades, hidden convex optimization problems were studied case by case in the literature. In this survey, we summarize three classes of approaches to reveal the hidden convex structure, namely, nonlinear transformation, Lagrangian dual and primal tight convex relaxation.

The remainder of this paper is organized as follows. Section 2 presents the most commonly used nonlinear transformations for different classes of problems. Section 3 shows how Lagrangian duality and its variations achieve zero gap. Section 4 summarizes some classes of nonconvex optimization problems admitting tight primal relaxations. As a concluding remark, ten open problems are raised in Section 5.

Throughout the paper, let $v(\cdot)$ be the optimal value of problem $(\cdot)$. $\mathbb{R}^n_+$ denotes the nonnegative orthant in $\mathbb{R}^n$. For a matrix $A$, denote by $A \succ (\succeq)\, 0$ that $A$ is positive (semi)definite. The trace of $A$ is defined as the sum of its diagonal elements, i.e., $\mathrm{tr}(A) = \sum_{i=1}^n A_{ii}$. $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the minimal and maximal eigenvalues of $A$, respectively. $I_n$ is the identity matrix of order $n$. $\|x\| = \sqrt{x^T x}$ stands for the $\ell_2$ norm of a vector $x$. $\mathrm{diag}(x_1,\ldots,x_n)$ returns a diagonal matrix with diagonal elements $x_1,\ldots,x_n$. $e = (1, 1, \ldots, 1)^T \in \mathbb{R}^n$. Denote by $\mathrm{conv}\{\Omega\}$ the convex hull of $\Omega$. $|E|$ denotes the number of elements in the set $E$. For a real value $x$, $[x]$ returns the largest integer less than or equal to $x$.
2 Nonlinear transformation
In this section, we survey some nonlinear transformation approaches widely used in convexifying different classes of nonconvex optimization problems.
2.1 A univariate example
Univariate examples are not always as simple as converting the concave function $\sqrt{x}$ to $y$ ($\ge 0$) by introducing the one-to-one mapping $x = y^2$ with $y \ge 0$. The approach of [77] for reducing the duality gap of box-constrained nonconvex quadratic programs requires solving the following nonconvex subproblem:
$$(\mathrm{G})\quad \max_{\theta}\ \varphi(\theta) := \min\{a_1\theta,\ a_2\delta^2(\theta)\}, \tag{1}$$
where $a_1, a_2$ are two positive scalars,
$$\begin{aligned}
\delta^2(\theta) = \min\ & \sum_{i\in I\cup J} (x_i - y_i)^2\\
\text{s.t.}\ & x \in C,\ y_i \in [-1, 1],\ i \in I,\\
& \sqrt{1-\theta} \le \omega_i y_i \le 1,\ i \in J,
\end{aligned}$$
$C$ is a linear manifold and $\omega_i \in \{-1, 1\}$ is given. The objective function $\varphi(\theta)$ in (1) is not concave and thus (G) is a nonconvex maximization problem. Introducing $\sigma = \sqrt{1-\theta}$, one can reformulate (G) as the following convex program:
$$\begin{aligned}
\max\ & \sigma\\
\text{s.t.}\ & \left\| \begin{pmatrix} \sqrt{a_2}\cdot(x - y)\\ \sqrt{a_1}\cdot\sigma \end{pmatrix} \right\| \le \sqrt{a_1},\\
& -1 \le y_i \le 1,\ i \in I,\\
& \sigma \le \omega_i y_i \le 1,\ i \in J,\\
& x \in C,\ \sigma \ge 0.
\end{aligned}$$
2.2 p-th power approach
Consider the following constrained nonconvex optimization problems:
$$(\mathrm{P})\quad \min\{f(x) : g_i(x) \le b_i,\ i = 1,\ldots,m,\ x \in S\},$$
where $f(x)$, $g_i(x)$ ($i = 1, 2, \ldots, m$) are strictly positive over $S$ and $S$ is closed, bounded, connected and of full dimension. It is known [41] that if the perturbation function
$$w(y) := \min\{f(x) : g_i(x) \le y_i,\ i = 1,\ldots,m,\ x \in S\}$$
is locally convex in a closed and non-degenerate neighbourhood of $b$, then there is no duality gap between (P) and its Lagrangian dual in the sense that the inner minimization is local and the outer maximization is in the neighbourhood of the optimal Lagrangian multiplier. In order to achieve such a zero duality gap, Li [43] first introduced the $p$-th power transformation:
$$(\mathrm{P}_p)\quad \min\{f^p(x) : g_i^p(x) \le b_i^p,\ i = 1,\ldots,m,\ x \in S\}.$$
Under some additional assumptions, Li [43] showed that, for a sufficiently large $p$, the perturbation function of $(\mathrm{P}_p)$ is a convex function of $y^p$ for any $y$ in a neighbourhood of $b$. For more applications of the $p$-th power formulation, we refer to [45,46] and references therein. Further generalizations of the $p$-th power convexification approach are studied in [44,75]. Recently, a novel shifted $p$-th power reformulation was introduced in [84]:
$$(\mathrm{P}_{p,\mu})\quad \min\{f(x) : (g_i(x) + \mu_i)^p \le (b_i + \mu_i)^p,\ i = 1,\ldots,m,\ x \in S\}.$$
Surprisingly, under the same assumptions, there is a parametric vector $\mu \in \mathbb{R}^m$ such that $p = 3$ is sufficient to guarantee the convexity of the perturbation function of $(\mathrm{P}_{p,\mu})$ in terms of $(y+\mu)^p$ with $y$ lying in a neighbourhood of $b$. Besides, the assumption on the strict positiveness of $f(x)$ and $g_i(x)$ is redundant in this new reformulation. For more details, we refer to [84].
2.3 Minimal-volume ellipsoid cover

Consider the geometric problem of finding the $n$-dimensional ellipsoid of minimal volume covering a set of $m$ given points $a_i$, $i = 1,\ldots,m$:
$$\begin{aligned}
(\mathrm{MVE})\quad \min_{Q,x}\ & u_n \sqrt{\det(Q^{-1})}\\
\text{s.t.}\ & (x - a_i)^T Q (x - a_i) \le 1,\ i = 1,\ldots,m,
\end{aligned}$$
where $u_n$ is the volume of the unit ball in $\mathbb{R}^n$ and the objective is the $n$-dimensional volume. By first introducing $M = Q^{1/2}$ (which is well defined as $Q \succ 0$) and $z = Mx$, and then taking a logarithmic transformation of the objective function, we can reformulate the nonconvex optimization problem (MVE) as the following convex program [70]:
$$\begin{aligned}
\min_{M,z}\ & \log u_n - \log\det(M)\\
\text{s.t.}\ & (z - Ma_i)^T (z - Ma_i) \le 1,\ i = 1,\ldots,m,
\end{aligned}$$
where the fact $\det(Q^{-1}) = (\det(M))^{-2}$ is used.
2.4 Multiplicative programming
Consider the linear multiplicative programming [40]:
$$\begin{aligned}
(\mathrm{LMP})\quad \max_{x\in C}\ & (d_1^T x + c_1)(d_2^T x + c_2)\\
\text{s.t.}\ & d_i^T x + c_i \ge 0,\ i = 1, 2,
\end{aligned}$$
where $C$ is a convex set. (LMP) becomes NP-hard [51] if the maximization is replaced with minimization. The objective function of (LMP) is nonconcave but quasiconcave [40]. The hidden convexity of (LMP) is revealed by the following equivalent convex programming problem, equivalent in the sense that both problems share the same optimal solution:
$$\begin{aligned}
\max_{x\in C}\ & \log(d_1^T x + c_1) + \log(d_2^T x + c_2)\\
\text{s.t.}\ & d_i^T x + c_i \ge 0,\ i = 1, 2.
\end{aligned}$$
Moreover, we notice that (LMP) has the following new second-order cone programming representation:
$$\begin{aligned}
\max_{x\in C}\ & z\\
\text{s.t.}\ & d_i^T x + c_i \ge 0,\ i = 1, 2,\\
& s \le d_1^T x + c_1,\ t \le d_2^T x + c_2,\\
& \left\| \begin{pmatrix} s - t\\ 2z \end{pmatrix} \right\| \le s + t.
\end{aligned}$$
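The two reformulations above can be illustrated on a toy instance. The following is a minimal sketch, assuming the hypothetical one-dimensional data $C = [0,1]$, $d_1 = 1$, $c_1 = 1$, $d_2 = -1$, $c_2 = 2$ (none of which comes from the paper). It compares the maximizers of the product and of the sum of logarithms on a grid, and checks on a few points that the cone constraint $\|(s-t, 2z)\| \le s+t$ behaves like $z^2 \le st$ with $s, t \ge 0$.

```python
import math

def product_obj(x):
    # (d1*x + c1) * (d2*x + c2) with the assumed toy data d1=1, c1=1, d2=-1, c2=2
    return (x + 1.0) * (2.0 - x)

def log_obj(x):
    # Concave reformulation: sum of the logarithms of the two affine factors
    return math.log(x + 1.0) + math.log(2.0 - x)

def in_soc(s, t, z):
    # Second-order cone constraint ||(s - t, 2z)||_2 <= s + t
    return math.hypot(s - t, 2.0 * z) <= s + t

# Grid over the assumed feasible set C = [0, 1]
grid = [k / 1000.0 for k in range(1001)]
x_prod = max(grid, key=product_obj)  # maximizer of the nonconcave product
x_log = max(grid, key=log_obj)       # maximizer of the concave log-sum

print(x_prod, x_log)
```

On this instance both grid searches return the same point, and the objective value $z$ of the cone program corresponds to the square root of the product rather than the product itself.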
2.5 Geometric programming
Geometric programming was first introduced in the book [26]. A geometric program is an optimization problem of the form
$$\begin{aligned}
(\mathrm{GP})\quad \min\ & f_0(x)\\
\text{s.t.}\ & f_i(x) \le 1,\ i = 1,\ldots,m, &(2)\\
& g_j(x) = 1,\ j = 1,\ldots,p, &(3)
\end{aligned}$$
where $g_j$ ($j = 1,\ldots,p$) are monomials, i.e., of the form $c_j \prod_{i=1}^n x_i^{a_{ij}}$ with $c_j > 0$, and $f_i$ ($i = 0,\ldots,m$) are posynomials, i.e., sums of several monomials. We also assume that the constraints (2)-(3) implicitly imply that all the variables are positive.

Obviously, the general (GP) is a nonconvex optimization problem. One can verify that introducing the logarithmic change of variables $y_i = \log(x_i)$ and logarithmic transformations of $f_i(x)$ yields a convex programming reformulation of (GP):
$$\begin{aligned}
\min\ & \log f_0(e^y) &(4)\\
\text{s.t.}\ & \log f_i(e^y) \le 0,\ i = 1,\ldots,m, &(5)\\
& \log g_j(e^y) = 0,\ j = 1,\ldots,p, &(6)
\end{aligned}$$
where $(e^y)_i = e^{y_i}$.
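As an illustration of why the logarithmic change of variables helps, the sketch below evaluates a hypothetical two-term posynomial (the coefficients and exponents are made up for this example) in both coordinate systems. It shows one midpoint at which the posynomial violates convexity in $x$, while the transformed function $\log f(e^y)$, a log-sum-exp of affine functions, satisfies the midpoint inequality at the corresponding points.

```python
import math

# Hypothetical posynomial f(x) = x1^0.5 + 0.1 * x1^-1 * x2, stored as
# (coefficient, exponent vector) pairs. Not taken from the paper.
TERMS = [(1.0, (0.5, 0.0)), (0.1, (-1.0, 1.0))]

def posy(x, terms=TERMS):
    """Evaluate the posynomial at a positive point x."""
    return sum(c * math.prod(xi ** a for xi, a in zip(x, expo))
               for c, expo in terms)

def log_posy(y, terms=TERMS):
    """Evaluate log f(e^y) as a numerically stable log-sum-exp."""
    vals = [math.log(c) + sum(a * yi for a, yi in zip(expo, y))
            for c, expo in terms]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

u, v = (1.0, 1.0), (9.0, 1.0)
mid_x = tuple(0.5 * (a + b) for a, b in zip(u, v))

yu = tuple(math.log(a) for a in u)
yv = tuple(math.log(a) for a in v)
mid_y = tuple(0.5 * (a + b) for a, b in zip(yu, yv))

# Midpoint test in x: convexity would require the left value <= the right one
print(posy(mid_x), 0.5 * (posy(u) + posy(v)))
# Midpoint test in y: here the inequality holds
print(log_posy(mid_y), 0.5 * (log_posy(yu) + log_posy(yv)))
```

A single midpoint check does not prove convexity, of course; it only makes the contrast between the two parametrizations concrete.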
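2.6 Fractional programming

Consider the linear fractional program
$$\begin{aligned}
(\mathrm{LF})\quad \min\ & f(x) = \frac{c^T x + \alpha}{d^T x + \beta} &(7)\\
\text{s.t.}\ & x \in \Omega := \{x : Ax \le b,\ x \ge 0\}. &(8)
\end{aligned}$$
To guarantee that (LF) is well defined, we assume $\min_{x\in\Omega} d^T x + \beta > 0$. It is trivial to verify that the objective function $f(x)$ in (7) is quasi-convex and generally nonconvex. Introducing new variables $y$ and $t$ to replace $\frac{x}{d^T x+\beta}$ and $\frac{1}{d^T x+\beta}$, respectively, we can equivalently reformulate (LF) as the following linear program, in which the normalization $d^T y + \beta t = 1$ encodes the definition of $t$:
$$\begin{aligned}
\min\ & c^T y + \alpha t\\
\text{s.t.}\ & Ay \le tb,\ d^T y + \beta t = 1,\ y \ge 0,\ t \ge 0.
\end{aligned}$$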
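The change of variables can be checked numerically. The following is a minimal sketch on hypothetical data (the vectors `c`, `d`, the scalars `alpha`, `beta` and the point `x` are assumptions chosen for illustration): it maps a point $x$ with $d^T x + \beta > 0$ to $(y, t)$, and lets one confirm that the linear objective $c^T y + \alpha t$ reproduces the ratio $f(x)$, that $d^T y + \beta t = 1$, and that $x$ is recovered as $y/t$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def charnes_cooper(x, d, beta):
    """Map x to (y, t) with y = x / (d^T x + beta) and t = 1 / (d^T x + beta)."""
    denom = dot(d, x) + beta
    if denom <= 0:
        raise ValueError("d^T x + beta must be positive")
    t = 1.0 / denom
    return [t * xi for xi in x], t

def lf_objective(x, c, alpha, d, beta):
    """Original fractional objective (c^T x + alpha) / (d^T x + beta)."""
    return (dot(c, x) + alpha) / (dot(d, x) + beta)

def lp_objective(y, t, c, alpha):
    """Linear objective c^T y + alpha * t of the transformed problem."""
    return dot(c, y) + alpha * t

# Hypothetical toy data, not from the paper
c, alpha = [1.0, 2.0], 1.0
d, beta = [1.0, 1.0], 2.0
x = [1.0, 3.0]

y, t = charnes_cooper(x, d, beta)
print(lf_objective(x, c, alpha, d, beta), lp_objective(y, t, c, alpha))
```

The sketch only verifies the variable mapping pointwise; solving the resulting linear program would require an LP solver, which is deliberately left out.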
The above approach was initially suggested by Charnes and Cooper [23]. It was then extended by Schaible [64] to nonlinear convex fractional programs:
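$$\begin{aligned}
(\mathrm{NLF})\quad \min\ & \frac{g_0(x)}{h(x)} &(9)\\
\text{s.t.}\ & g_i(x) \le 0,\ i = 1,\ldots,m, &(10)
\end{aligned}$$
where $g_i(x)$ ($i = 0, 1,\ldots,m$) are all convex functions, $h(x)$ is a concave function and $h(x) > 0$ over the feasible region. Then, introducing $y = \frac{x}{h(x)}$ and $t = \frac{1}{h(x)}$ reduces the nonconvex optimization problem (NLF) to the following convex program:
$$\begin{aligned}
(\mathrm{CLF})\quad \min\ & t\cdot g_0\!\left(\frac{y}{t}\right) &(11)\\
\text{s.t.}\ & t\cdot g_i\!\left(\frac{y}{t}\right) \le 0,\ i = 1,\ldots,m, &(12)\\
& t\cdot h\!\left(\frac{y}{t}\right) \ge 1,\ t > 0. &(13)
\end{aligned}$$
Here each $t\cdot g_i(y/t)$ is the perspective of a convex function and hence jointly convex in $(y,t)$, while $t\cdot h(y/t)$ is jointly concave, so that (13), which relaxes the identity $t\cdot h(y/t) = 1$, defines a convex set.

Minimizing the sum of a linear and a linear fractional function over a polyhedral set is generally NP-hard [51]. A necessary and sufficient condition for the pseudoconcavity of the objective function is established in [22]. The pseudoconvex but nonconvex case is the following: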
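$$\begin{aligned}
(\mathrm{SLF})\quad \min\ & \frac{a_1^T x + b_1}{a_2^T x + b_2} + a_2^T x\\
\text{s.t.}\ & Ax = b,\ x \ge 0.
\end{aligned}$$
Based on the Charnes-Cooper transformation, it was shown in [27] that (SLF) enjoys a second-order cone programming reformulation:
$$\begin{aligned}
(\mathrm{SOCPLP})\quad \min_{s,t,y,z}\ & t\\
\text{s.t.}\ & Ay - sb = 0,\ y \ge 0,\\
& a_2^T y + b_2 s = 1,\\
& z = a_1^T y + b_1 s + b_2 a_2^T y,\\
& \left\| (2a_2^T y,\ s - t + z)^T \right\| \le s + t - z.
\end{aligned}$$
Here $s = 1/(a_2^T x + b_2)$ and $y = sx$. The affine term $z$ is written in the form obtained directly from this substitution: the objective of (SLF) equals $z + (a_2^T y)^2/s$, and the cone constraint is equivalent to $(a_2^T y)^2 \le s(t - z)$.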
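The lifting behind the cone program can be sanity-checked pointwise. The sketch below assumes the substitution $s = 1/(a_2^T x + b_2)$, $y = sx$ and the affine term $z = a_1^T y + b_1 s + b_2 a_2^T y$, which is how the identity is derived here and may differ in form from the expression printed in [27]; the data are hypothetical. It verifies that the (SLF) objective equals $z + (a_2^T y)^2/s$ and that the cone constraint is active when $t$ is set to the objective value.

```python
import math

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slf_objective(x, a1, b1, a2, b2):
    """(a1^T x + b1) / (a2^T x + b2) + a2^T x."""
    return (dot(a1, x) + b1) / (dot(a2, x) + b2) + dot(a2, x)

def lift(x, a1, b1, a2, b2):
    """Charnes-Cooper style lifting: returns (s, y, z) as assumed in the lead-in."""
    s = 1.0 / (dot(a2, x) + b2)
    y = [s * xi for xi in x]
    z = dot(a1, y) + b1 * s + b2 * dot(a2, y)
    return s, y, z

# Hypothetical toy data with a2^T x + b2 > 0
a1, b1 = [1.0, 0.5], 0.3
a2, b2 = [0.2, 1.0], 1.5
x = [1.0, 2.0]

s, y, z = lift(x, a1, b1, a2, b2)
t = slf_objective(x, a1, b1, a2, b2)  # smallest t allowed by the cone constraint
u = dot(a2, y)

lhs = math.hypot(2.0 * u, s - t + z)  # ||(2 a2^T y, s - t + z)||
rhs = s + t - z
print(t, z + u * u / s, lhs, rhs)
```

The check covers only the algebraic identity at one point; the linear constraints $Ay - sb = 0$, $y \ge 0$ are not exercised.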
2.7 Eigenvalue and trust-region subproblem
Define the Rayleigh quotient of a symmetric matrix $A \in \mathbb{R}^{n\times n}$ as:
$$R_A(x) = \frac{x^T A x}{x^T x},\quad x \neq 0.$$
The Rayleigh-Ritz formula is well known for the maximal eigenvalue of $A$:
$$\sup_{x\neq 0} R_A(x) = \max_{x^T x = 1} x^T A x. \tag{14}$$
Let $A = UDU^T$ be the eigenvalue decomposition of $A$, where $U$ is orthogonal and $D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ with $\lambda_1 \le \ldots \le \lambda_n$ being the $n$ eigenvalues of $A$. Then, we have
$$\max_{x^T x=1} x^T A x = \max_{y^T y=1} y^T D y = \max_{\sum_{i=1}^n y_i^2 = 1} \sum_{i=1}^n \lambda_i y_i^2 = \max_{e^T z = 1,\ z \ge 0} \sum_{i=1}^n \lambda_i z_i,$$
where $y = U^T x$ is substituted in the first equality, and it holds that $\|y\| = \|x\|$ as $U$ is orthogonal. The nonlinear transformations $z_i = y_i^2$, $i = 1,\ldots,n$, are introduced in the last equality, and we note that these mappings are no longer one-to-one. The problem on the right-hand side of the above chain of equalities is a linear program over a standard simplex, which serves as the hidden convex reformulation of (14).

Now we move to the well-known Kantorovich inequality, which is related to the following nonconvex optimization problem:
$$(\mathrm{KI})\quad \max_{x\neq 0} \frac{(x^T A x)(x^T A^{-1} x)}{(x^T x)^2},$$
where $A \succ 0$. Based on the same transformation as above, (KI) is reduced to
$$\max_{e^T z=1,\ z\ge 0} \left(\sum_{i=1}^n \lambda_i z_i\right)\left(\sum_{i=1}^n \lambda_i^{-1} z_i\right),$$
which is further equivalent (in the sense that both optimal solutions are the same) to solving
$$\begin{aligned}
& \max_{e^T z=1,\ z\ge 0}\ \min_{t\in[t_1,t_2]}\ t\left(\sum_{i=1}^n \lambda_i z_i\right) + t^{-1}\left(\sum_{i=1}^n \lambda_i^{-1} z_i\right)\\
&= \min_{t\in[t_1,t_2]}\ \max_{e^T z=1,\ z\ge 0}\ t\left(\sum_{i=1}^n \lambda_i z_i\right) + t^{-1}\left(\sum_{i=1}^n \lambda_i^{-1} z_i\right) &(15)\\
&= \min_{t\in[t_1,t_2]}\ f(t) := \max_{i=1,\ldots,n} \left\{ t\lambda_i + t^{-1}\lambda_i^{-1} \right\}, &(16)
\end{aligned}$$
where $t_1 = \sqrt{\lambda_1/\lambda_n}$, $t_2 = t_1^{-1}$, and (15) follows from the classical von Neumann's minimax theorem. Problem (16) is a univariate convex optimization problem and can be explicitly solved.
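The simplex form of (KI) is easy to explore numerically. The sketch below assumes the hypothetical spectrum $\lambda = (1, 2, 4)$ and compares a grid search over the standard simplex with the classical Kantorovich bound $(\lambda_1+\lambda_n)^2/(4\lambda_1\lambda_n)$. It also evaluates $f(t)$ from (16) at $t^\ast = 1/\sqrt{\lambda_1\lambda_n}$, the point where the two extreme terms of the maximum balance; the relation $(f(t^\ast)/2)^2$ used to compare the two values is a derivation made for this example, not a formula quoted from the paper.

```python
import math

def ki_value(lams, z):
    """(sum_i lam_i z_i) * (sum_i z_i / lam_i) for z in the standard simplex."""
    return (sum(l * zi for l, zi in zip(lams, z))
            * sum(zi / l for l, zi in zip(lams, z)))

def simplex_grid(n_steps):
    """Yield grid points of the 3-dimensional standard simplex."""
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            yield (i / n_steps, j / n_steps, (n_steps - i - j) / n_steps)

def f(t, lams):
    """Univariate convex function from (16): max_i (t*lam_i + 1/(t*lam_i))."""
    return max(t * l + 1.0 / (t * l) for l in lams)

lams = (1.0, 2.0, 4.0)  # assumed eigenvalues, sorted increasingly

best = max(ki_value(lams, z) for z in simplex_grid(100))
closed_form = (lams[0] + lams[-1]) ** 2 / (4.0 * lams[0] * lams[-1])
t_star = 1.0 / math.sqrt(lams[0] * lams[-1])

print(best, closed_form, (f(t_star, lams) / 2.0) ** 2)
```

On this instance the grid contains the maximizer $z = (1/2, 0, 1/2)$, so the three printed values coincide.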
The inhomogeneous extension of (14) is the well-known trust-region subproblem (TRS) [30]:
$$(\mathrm{TRS})\quad \min\{x^T A x + b^T x : x^T x \le \delta\}, \tag{17}$$
where $\delta$ is a positive scalar. (TRS) (17) plays a great role in the trust-region method [91] and also has some other applications such as constrained least squares [31]. In the case that $A \not\succeq 0$, (TRS) is a nonconvex optimization problem. Interestingly, it was shown in [50] that (TRS) has at most one local non-global minimizer.

As above, let $A = U\mathrm{diag}(\lambda_1,\ldots,\lambda_n)U^T$ be the eigenvalue decomposition of $A$ and $\tilde{b} = U^T b$. By introducing $y = U^T x$, (TRS) is equivalent to
$$\min\left\{\sum_{i=1}^n \lambda_i y_i^2 + \sum_{i=1}^n \tilde{b}_i y_i : \sum_{i=1}^n y_i^2 \le \delta\right\}. \tag{18}$$
For $i = 1,\ldots,n$, by further introducing
$$y_i = \begin{cases} \sqrt{z_i}, & \tilde{b}_i \le 0,\\ -\sqrt{z_i}, & \tilde{b}_i > 0, \end{cases}$$
we can reduce (TRS) (18) to the following convex program over a simplex:
$$\min\left\{\sum_{i=1}^n \lambda_i z_i - \sum_{i=1}^n |\tilde{b}_i| \sqrt{z_i} : \sum_{i=1}^n z_i \le \delta,\ z_i \ge 0,\ i = 1,\ldots,n\right\}.$$
The above convexification approach is further extended by Ben-Tal and Teboulle [14] to the two-sided trust-region subproblem [68]. A similar application in convexifying the following regularized problem
$$\min_{x\in\mathbb{R}^n}\ x^T A x + b^T x + \rho\|x\|^p$$
with $p > 2$ can be found in [38].
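The sign rule in the substitution $y_i = \pm\sqrt{z_i}$ can be illustrated by brute force. The sketch below assumes a hypothetical diagonal instance with $n = 2$ (so that $U = I$ and $\tilde{b} = b$), $\lambda = (-1, 2)$, $b = (1, -0.5)$ and $\delta = 1$. For every grid point $z$ of the simplex it evaluates the nonconvex objective (18) at all four sign patterns of $y = \pm\sqrt{z}$ and the convex objective at $z$, then compares the best values found.

```python
import math
from itertools import product

def recover_y(z, b):
    """Sign rule of the text: y_i = sqrt(z_i) if b_i <= 0, else -sqrt(z_i)."""
    return [math.sqrt(zi) if bi <= 0 else -math.sqrt(zi) for zi, bi in zip(z, b)]

def trs_grid_check(lams, b, delta, n_steps=50):
    """Return (best value of (18), best value of the convex form) on a shared grid."""
    best_orig, best_conv = math.inf, math.inf
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            z = (delta * i / n_steps, delta * j / n_steps)  # z >= 0, z1 + z2 <= delta
            r = [math.sqrt(zi) for zi in z]
            conv = (sum(l * zi for l, zi in zip(lams, z))
                    - sum(abs(bi) * ri for bi, ri in zip(b, r)))
            best_conv = min(best_conv, conv)
            for signs in product((1.0, -1.0), repeat=2):
                y = [sg * ri for sg, ri in zip(signs, r)]
                orig = (sum(l * yi * yi for l, yi in zip(lams, y))
                        + sum(bi * yi for bi, yi in zip(b, y)))
                best_orig = min(best_orig, orig)
    return best_orig, best_conv

lams, b, delta = (-1.0, 2.0), (1.0, -0.5), 1.0  # assumed toy data
best_orig, best_conv = trs_grid_check(lams, b, delta)
print(best_orig, best_conv)
```

Because both searches use the same grid, the two minima agree up to rounding; this reflects that choosing the sign of each $y_i$ opposite to that of $\tilde{b}_i$ is optimal coordinate-wise.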
2.8 Quadratic matrix programming over orthogonal constraints
For vectors $u$ and $v$, define the minimal product as
$$\langle u, v\rangle_- := \min_{\pi} \sum_{i=1}^n u_i v_{\pi(i)}, \tag{19}$$
where $\pi$ is a permutation of $1, 2,\ldots,n$. Notice that $\langle u, v\rangle_-$ is easy to compute by first sorting $u$ in ascending order and $v$ in descending order. $\langle u, v\rangle_+$ is similarly defined and computed:
$$\langle u, v\rangle_+ := \max_{\pi} \sum_{i=1}^n u_i v_{\pi(i)}. \tag{20}$$
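The sorting rule for (19) and (20) follows from the rearrangement inequality and can be sketched as below. The helper names `min_product`, `max_product` and `brute_force` are hypothetical, and the brute-force enumeration over all permutations is included only to cross-check the sorted formulas on a small example.

```python
from itertools import permutations

def min_product(u, v):
    """<u, v>_-: pair the ascending sort of u with the descending sort of v."""
    return sum(a * b for a, b in zip(sorted(u), sorted(v, reverse=True)))

def max_product(u, v):
    """<u, v>_+: pair the ascending sorts of u and v."""
    return sum(a * b for a, b in zip(sorted(u), sorted(v)))

def brute_force(u, v, pick):
    """Enumerate all permutations of v; pick is min or max."""
    return pick(sum(a * b for a, b in zip(u, perm)) for perm in permutations(v))

u = [3.0, -1.0, 2.0, 0.5]
v = [1.0, 4.0, -2.0, 0.0]
print(min_product(u, v), max_product(u, v))
```

Sorting costs $O(n \log n)$, whereas the enumeration grows factorially with $n$.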
The quadratic assignment problem (QAP) is a classical combinatorial optimization problem [48]. The trace formulation reads as follows:
$$\begin{aligned}
(\mathrm{QAP})\quad \min\ & \mathrm{tr}(AXB^TX^T) &(21)\\
\text{s.t.}\ & X \in \Pi_n := \{X \in \{0,1\}^{n\times n} : Xe = X^Te = e\}, &(22)
\end{aligned}$$
where $A, B \in \mathbb{R}^{n\times n}$ correspond to the flow matrix and the distance matrix in a facility location application, respectively, and $\Pi_n$ is the set of all $n\times n$ permutation matrices. The orthogonal relaxation of (QAP) was first proposed in [28]:
$$\begin{aligned}
(\mathrm{O})\quad \min\ & \mathrm{tr}(AXB^TX^T) &(23)\\
\text{s.t.}\ & X \in \mathcal{O}_n := \{X \in \mathbb{R}^{n\times n} : X^TX = I_n\}. &(24)
\end{aligned}$$
Let $\lambda$ and $\mu$ be the vectors composed of the eigenvalues of $A$ and $B$, respectively, i.e., $A = U\mathrm{diag}(\lambda)U^T$, $B = V\mathrm{diag}(\mu)V^T$. It is not difficult to show the following result.

Theorem 1 ([28,61]) For any $X \in \mathcal{O}_n$, it holds that
$$\langle\lambda, \mu\rangle_- \le \mathrm{tr}(AXB^TX^T) \le \langle\lambda, \mu\rangle_+.$$

The hidden convex reformulation of (O) reads as follows:
$$\begin{aligned}
& \min_{X^TX=I_n} \mathrm{tr}(U\mathrm{diag}(\lambda)U^T X V\mathrm{diag}(\mu)V^T X^T)\\
&= \min_{X^TX=I_n} \mathrm{tr}(\mathrm{diag}(\lambda)U^T X V\mathrm{diag}(\mu)V^T X^T U)\\
&= \min_{Y^TY=I_n} \mathrm{tr}(\mathrm{diag}(\lambda)Y\mathrm{diag}(\mu)Y^T) &(25)\\
&\ge \min_{\sum_{i=1}^n y_{ij}^2 = 1\ \forall j,\ \sum_{j=1}^n y_{ij}^2 = 1\ \forall i}\ \sum_{i=1}^n\sum_{j=1}^n \lambda_i\mu_j y_{ij}^2 &(26)\\
&= \min_{\sum_{i=1}^n z_{ij} = 1\ \forall j,\ \sum_{j=1}^n z_{ij} = 1\ \forall i,\ z_{ij}\ge 0}\ \sum_{i=1}^n\sum_{j=1}^n \lambda_i\mu_j z_{ij}, &(27)
\end{aligned}$$
where $Y = U^TXV$ is introduced in (25) so that $Y$ remains orthogonal, and in (27), $z_{ij} = y_{ij}^2$ ($i, j = 1,\ldots,n$) are introduced. Notice that the problem (27) is a linear assignment problem. Therefore, there is an optimal solution, denoted by $Z^*$, lying at one of the vertices. That is, for all $i, j$, we have $z^*_{ij} \in \{0, 1\}$. It follows that $y^{*2}_{ij} \in \{0,1\}$, so that one can take $y^*_{ij} = \sqrt{z^*_{ij}} \in \{0,1\}$, which further implies that $Y^* \in \Pi_n \subseteq \mathcal{O}_n$. Therefore, the inequality (26) holds as an equality.

This convexification approach could be further extended to the trust-region type relaxation [3] by replacing (24) with
$$\{X \in \mathbb{R}^{n\times n} : I_n \succeq X^TX\},$$
and the enhanced version [76] with (24) being replaced by
$$\{X \in \mathbb{R}^{n\times n} : I_n \succeq X^TX,\ \mathrm{tr}(X^TX) \ge m\},$$
where $m$ is an integer between 0 and $n$.
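Theorem 1 can be checked numerically in the smallest nontrivial case. The sketch below assumes $n = 2$ and two hypothetical symmetric matrices `A` and `B`. Every $2\times 2$ orthogonal matrix is either a rotation or a reflection, so sweeping the angle over a fine grid for both families samples $\mathcal{O}_2$; the eigenvalues of a symmetric $2\times 2$ matrix are computed in closed form.

```python
import math

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(P):
    return [list(row) for row in zip(*P)]

def sym_eigs(S):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix, in closed form."""
    mean = 0.5 * (S[0][0] + S[1][1])
    radius = math.hypot(0.5 * (S[0][0] - S[1][1]), S[0][1])
    return (mean - radius, mean + radius)

def orthogonal_2x2(theta, reflect=False):
    """Rotation by theta, or a reflection when reflect is True."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]] if reflect else [[c, -s], [s, c]]

def qap_obj(A, B, X):
    """tr(A X B^T X^T)."""
    M = matmul(matmul(A, X), matmul(transpose(B), transpose(X)))
    return M[0][0] + M[1][1]

# Hypothetical symmetric data, not from the paper
A = [[2.0, 1.0], [1.0, -1.0]]
B = [[0.0, 2.0], [2.0, 3.0]]

lam, mu = sym_eigs(A), sym_eigs(B)
lower = lam[0] * mu[1] + lam[1] * mu[0]  # <lam, mu>_- for n = 2
upper = lam[0] * mu[0] + lam[1] * mu[1]  # <lam, mu>_+ for n = 2

values = [qap_obj(A, B, orthogonal_2x2(2.0 * math.pi * k / 3600, refl))
          for k in range(3600) for refl in (False, True)]
print(lower, min(values), max(values), upper)
```

The sampled values stay within the two bounds and approach them up to the grid resolution, consistent with the bounds being attained on $\mathcal{O}_n$.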
3 Lagrangian dual and its variation
In this section, we first show that strong Lagrangian duality can hold for a few nonconvex optimization problems or their special cases. Sometimes, in order to achieve strong duality, techniques such as adding redundant constraints or making a suitable transformation must be applied in advance.
3.1 Strong Lagrangian duality
Let us begin with the generalized trust-region subproblem with interval bounds [60]:
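$$\begin{aligned}
(\mathrm{QP})\quad \min\ & f_1(x)\\
\text{s.t.}\ & x \in \Omega := \{x \in \mathbb{R}^n : \alpha \le f_2(x) \le \beta\},
\end{aligned}$$
where $f_i(x) = x^TA_ix + 2a_i^Tx + c_i$ for $i = 1, 2$, $A_1, A_2 \in \mathbb{R}^{n\times n}$ are symmetric matrices, $a_1, a_2 \in \mathbb{R}^n$ and $c_1, c_2 \in \mathbb{R}$. We make a further assumption:

Assumption 1 $A_2 \neq 0$, $-\infty < \alpha \le \beta < +\infty$ and there are $y, z \in \mathbb{R}^n$ such that either $\alpha$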