
Duke University, Department of Electrical and Computer Engineering
Optimization for Scientists and Engineers
© Alex Bronstein, 2014

Conic Programming

Recall the linear programming (LP) problem that we saw in the last lecture:

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad Ax - b \geq 0,$$

with $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $b \in \mathbb{R}^m$. The constraints of this problem (interpreted as an element-wise inequality) can be written equivalently as

$$Ax - b \in \mathbb{R}^m_+ = \{y \in \mathbb{R}^m : y_i \geq 0 \ \forall i\}.$$

The space $\mathbb{R}^m_+$ is usually called the positive (or, more accurately, non-negative) orthant; the constraints of the LP can be interpreted as the intersection of the affine space $Ax - b$ with the positive orthant.

1 Convex cones

Geometrically, the orthant is a particular case of a construction called a convex cone:

Definition. A set $C$ is a convex cone iff $\alpha C + \beta C = C$ for any $\alpha, \beta > 0$. The cone is said to be pointed if $0 \in C$ and blunt otherwise. The cone is said to be salient if for every $0 \neq x \in C$, $-x \notin C$, and flat otherwise.

The way we defined $\mathbb{R}^m_+$ makes it a pointed salient convex cone. Other examples of cones we will see next fall into this category:

1. Lorentz (aka “ice cream” or “second-order”) cone defined as

$$L^m = \left\{x \in \mathbb{R}^m : x_m \geq \sqrt{x_1^2 + \cdots + x_{m-1}^2}\right\} = \{x : x_m \geq \|(x_1, \dots, x_{m-1})\|_2\}.$$

2. Positive semidefinite cone

$$S^m_+ = \{A \in \mathbb{R}^{m \times m} : A \succeq 0\}.$$

Exercise 1. Show that $\mathbb{R}^m_+$, $L^m$, and $S^m_+$ are pointed salient convex cones.

The notion of a cone gives rise to a generalized form of inequality, the conic inequality: for a cone $K$, we will denote $a \in K$ equivalently as $a \geq_K 0$. Similarly, we will say $a \geq_K b$, implying $a - b \geq_K 0$ or, equivalently, $a - b \in K$. In the same way, $Ax - b \geq_K 0$ defines the intersection of an affine space with the cone $K$.
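As a quick numerical companion to these definitions, the following Python sketch (my own illustration; the function names are hypothetical) tests membership in each of the three cones:

```python
import numpy as np

# Membership tests for the three standard cones, up to a numerical tolerance.
def in_orthant(x, tol=1e-12):
    """x in R^m_+ : all coordinates non-negative."""
    return bool(np.all(x >= -tol))

def in_lorentz(x, tol=1e-12):
    """x in L^m : last coordinate dominates the norm of the rest."""
    return bool(x[-1] >= np.linalg.norm(x[:-1]) - tol)

def in_psd_cone(A, tol=1e-12):
    """A in S^m_+ : symmetric with non-negative eigenvalues."""
    return bool(np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() >= -tol)

print(in_orthant(np.array([1.0, 0.0, 2.0])))             # True
print(in_lorentz(np.array([3.0, 4.0, 5.0])))             # True: 5 >= ||(3, 4)||_2
print(in_psd_cone(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True: eigenvalues 1, 3
```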

2 Conic programs

The notion of a conic inequality allows us to generalize a linear program into a conic program (CP)
$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad Ax - b \geq_K 0$$

by simply replacing the standard inequality (i.e., $\geq_{\mathbb{R}^m_+}$) with a conic inequality w.r.t. some convex cone $K$. Apart from linear programming, which is an obvious particular case of CP, the following two problems are of major importance (both are illustrated in a short solver sketch after the list):

1. Second-order conic program (SOCP), corresponding to $K = L^m$:
$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \begin{pmatrix} Ax - b \\ d^T x - e \end{pmatrix} \geq_K 0,$$

which can be rewritten explicitly as

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \|Ax - b\|_2 \leq d^T x - e.$$

2. Semidefinite program (SDP), corresponding to $K = S^m_+$. In order to avoid high-order tensor notation, we will define a general linear operator $\mathcal{A} : \mathbb{R}^n \to \mathbb{R}^{m \times m}$ as

$$\mathcal{A}x = \sum_{i=1}^{n} x_i A_i,$$

where $A_1, \dots, A_n \in S^m_+$. In these terms, an SDP can be written as

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \mathcal{A}x - B \succeq 0.$$
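To make both problem classes concrete, here is a minimal sketch using the CVXPY modeling package; the problem data below is hypothetical, chosen only so that both instances are feasible and bounded:

```python
import cvxpy as cp
import numpy as np

# SOCP instance: min x1 + x2  s.t.  ||(x1 - 1, x2 - 1)||_2 <= x1 + x2
x = cp.Variable(2)
socp = cp.Problem(cp.Minimize(cp.sum(x)),
                  [cp.norm(x - np.ones(2), 2) <= cp.sum(x)])
socp.solve()
print(socp.value, x.value)

# SDP instance: min y1 + 2*y2  s.t.  y1*A1 + y2*A2 - B is PSD
A1, A2 = np.eye(3), np.diag([1.0, 2.0, 3.0])
B = -np.eye(3)
y = cp.Variable(2)
sdp = cp.Problem(cp.Minimize(y[0] + 2 * y[1]),
                 [y[0] * A1 + y[1] * A2 - B >> 0])
sdp.solve()
print(sdp.value, y.value)  # optimum -1, attained on the constraint boundary
```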

Surprisingly, SDP generalizes both LP and SOCP! Any linear program can be written in the form
$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad a_i^T x - b_i \geq 0, \quad i = 1, \dots, m,$$
which can be written as an equivalent SDP

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \begin{pmatrix} a_1^T x - b_1 & & \\ & \ddots & \\ & & a_m^T x - b_m \end{pmatrix} \succeq 0.$$

Note that a diagonal matrix is positive semi-definite iff all its diagonal elements are non-negative. In order to reduce SOCP to SDP, we will need the following very useful result:

Theorem 1 (Schur complement). Let $A$ be a symmetric block matrix
$$A = \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix}$$
with $R \succ 0$. Then $A \succ 0$ iff $P - Q^T R^{-1} Q \succ 0$. The theorem holds for a semi-definite $A$ as well.

To prove this result, observe that $A \succeq 0$ iff for every $u$,
$$\inf_v\, (u^T, v^T) \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \inf_v\, u^T P u + 2 u^T Q^T v + v^T R v \geq 0.$$
The latter is an infimum of a quadratic function in $v$, for which we can write the following necessary optimality condition (which is also sufficient, since $R \succ 0$ and the function is convex): $2Qu + 2Rv = 0$, from which $v = -R^{-1} Q u$. Substituting this solution into the quadratic function yields
$$\inf_v\, (u^T, v^T)\, A \begin{pmatrix} u \\ v \end{pmatrix} = u^T P u - 2 u^T Q^T R^{-1} Q u + u^T Q^T R^{-1} R R^{-1} Q u = u^T (P - Q^T R^{-1} Q)\, u.$$
The latter quadratic form is non-negative for every $u$ iff $P - Q^T R^{-1} Q \succeq 0$.
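The theorem is easy to test numerically; this sketch (random, hypothetical data) compares positive semidefiniteness of $A$ with that of its Schur complement up to a small eigenvalue tolerance:

```python
import numpy as np

# For A = [[P, Q^T], [Q, R]] with R > 0:  A is PSD  iff  P - Q^T R^{-1} Q is PSD.
rng = np.random.default_rng(0)
min_eig = lambda M: np.linalg.eigvalsh(M).min()

for _ in range(1000):
    P = rng.standard_normal((2, 2)); P = P + P.T              # symmetric, sign-indefinite
    Q = rng.standard_normal((3, 2))
    R = rng.standard_normal((3, 3)); R = R @ R.T + np.eye(3)  # R positive definite
    A = np.block([[P, Q.T], [Q, R]])
    S = P - Q.T @ np.linalg.inv(R) @ Q                        # Schur complement
    assert (min_eig(A) >= -1e-9) == (min_eig(S) >= -1e-9)
```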

Armed with the Schur complement theorem, we can now return to the SOCP problem, which we will now write as
$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \|u(x)\|_2 \leq v(x),$$
where $u(x) = Ax - b$ and $v(x) = d^T x - e$. Note that $v$ in the constraint upper-bounds a norm, so it cannot be negative. We rewrite the constraint as $u^T u \leq v^2$ or, dividing by $v$, as

$$v - \frac{1}{v}\, u^T u = v - u^T \begin{pmatrix} v & & \\ & \ddots & \\ & & v \end{pmatrix}^{-1} u \geq 0.$$
Invoking the Schur complement, we can equivalently write the constraint as
$$\begin{pmatrix} v(x) & u^T(x) \\ u(x) & v(x)\, I \end{pmatrix} \succeq 0,$$
turning the SOCP into an SDP.
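A quick numeric illustration (again with hypothetical random data): the block matrix above is positive semidefinite exactly when $\|u\|_2 \leq v$:

```python
import numpy as np

# The SOCP constraint ||u||_2 <= v and PSD-ness of the block matrix coincide.
rng = np.random.default_rng(1)
for _ in range(1000):
    u, v = rng.standard_normal(4), abs(rng.standard_normal()) + 0.1
    M = np.block([[np.array([[v]]), u[None, :]],
                  [u[:, None],      v * np.eye(4)]])
    psd = np.linalg.eigvalsh(M).min() >= -1e-9
    assert psd == (np.linalg.norm(u) <= v + 1e-9)
```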

3 Barrier (interior point) methods

Among the most successful methods for solving conic programs are barrier methods, also known as interior point methods. Recall that we defined barriers as a particular family of penalty functions that disallow infeasible solutions. For our purpose, a barrier representing a cone $K$ is a function $\varphi : \operatorname{int} K \to \mathbb{R}$ such that
$$\lim_{K \ni x \to \partial K} \varphi(x) = \infty.$$
A barrier method consists of solving a sequence of unconstrained minimization problems with the barrier aggregate

$$\min_{x \in \mathbb{R}^n} F_p(x) = \min_{x \in \mathbb{R}^n} c^T x + \frac{1}{p}\, \varphi(Ax - b),$$

with increasing $p$. In the limit $p \to \infty$, unconstrained minimization of $F_\infty$ is equivalent to the original conic program. As we have already mentioned, the barrier method has to be initialized with a feasible point, and it will always produce a strictly feasible solution (this is the reason for the name interior point). The following barrier functions are frequently used for the standard convex cones:
$$\begin{array}{ll}
\text{Cone} & \text{Barrier function} \\[2pt]
\mathbb{R}^m_+ & \varphi(x) = -\sum_{i=1}^{m} \log x_i \\[2pt]
L^m & \varphi(x) = -\log\left( x_m^2 - \sum_{i=1}^{m-1} x_i^2 \right) \\[2pt]
S^m_+ & \varphi(X) = -\log \det X
\end{array}$$

Exercise 2. Derive the gradients of the above functions and show that they qualify as barriers. Hint: for the semidefinite cone barrier, show that $-\log \det X$ can be represented as $-\mathrm{tr} \log X$, where $\log X$ is understood as a function of a matrix.
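The following sketch implements the barrier aggregate for the LP case with the log-barrier of the non-negative orthant. The plain gradient-descent inner solver, the step sizes, and the toy problem data are all illustrative choices of mine; a practical interior point method would use Newton steps:

```python
import numpy as np

def barrier_lp(c, A, b, x0, p0=1.0, mu=10.0, outer=6, inner=2000, lr=1e-3):
    """Crude barrier method for  min c^T x  s.t.  Ax - b >= 0.

    Minimizes F_p(x) = c^T x - (1/p) * sum(log(Ax - b)) for increasing p,
    using gradient descent with backtracking to stay strictly feasible.
    """
    x, p = x0.astype(float), p0
    for _ in range(outer):
        for _ in range(inner):
            y = A @ x - b                       # slacks; must stay positive
            grad = c - (A.T @ (1.0 / y)) / p    # gradient of the aggregate
            step = lr
            while np.any(A @ (x - step * grad) - b <= 0):
                step /= 2                       # backtrack into the interior
            x = x - step * grad
        p *= mu                                 # tighten the barrier
    return x

# Toy LP: min x1 + x2  s.t.  x1 >= 1, x2 >= 2  (solution (1, 2))
c, A, b = np.array([1.0, 1.0]), np.eye(2), np.array([1.0, 2.0])
print(barrier_lp(c, A, b, x0=np.array([5.0, 5.0])))  # close to (1, 2)
```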

4 Dual cones

Definition. The dual of a cone $K$ is
$$K^* = \{y : \langle x, y \rangle \geq 0 \ \forall x \in K\}.$$

Exercise 3. Prove that $K^*$ is a cone.

Geometrically, the dual cone consists of the vectors forming non-obtuse angles (at most $90^\circ$) with all the vectors of the cone $K$.

Exercise 4. Prove that $(K^*)^* = K$.

Definition. A cone $K$ is said to be self-dual iff $K^* = K$.

Exercise 5. Show that $\mathbb{R}^m_+$, $L^m$, and $S^m_+$ are self-dual.
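For the Lorentz cone, the "easy half" of Exercise 5 ($L^m \subseteq (L^m)^*$) can be observed numerically; the sampling scheme below is my own illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_lorentz(m):
    """Draw a random point of L^m: last coordinate dominates the norm."""
    z = rng.standard_normal(m - 1)
    t = np.linalg.norm(z) + abs(rng.standard_normal())
    return np.concatenate([z, [t]])

# <u, v> >= 0 for all u, v in L^m, i.e., L^m is contained in its dual
vals = [sample_lorentz(5) @ sample_lorentz(5) for _ in range(10000)]
print(min(vals) >= 0)  # True
```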

5 Dual conic programs

Before we show the dual form of a general convex conic program, we need another important notion from linear algebra.

Definition. The adjoint of a linear operator $\mathcal{A} : U \to V$ is the linear operator $\mathcal{A}^* : V \to U$ such that for every $u \in U$ and $v \in V$,

$$\langle u, \mathcal{A}^* v \rangle_U = \langle \mathcal{A} u, v \rangle_V.$$
In general, there is no relation between the adjoint operator and the inverse operator!

Exercise 6. Show that the adjoint of a linear operator expressed by a real matrix $A$ is the matrix $A^T$.

Let us now consider the (primal) conic program

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \mathcal{A}x - b \geq_K 0.$$
For every feasible $x$, $\mathcal{A}x - b \in K$. Therefore, by the definition of the dual cone, for every $y \in K^*$, $\langle \mathcal{A}x - b, y \rangle \geq 0$, from which
$$\langle \mathcal{A}x, y \rangle = \langle x, \mathcal{A}^* y \rangle \geq \langle b, y \rangle.$$

In particular, for $\mathcal{A}^* y = c$, we have

$$\langle c, x \rangle \geq \langle b, y \rangle.$$

In other words, $\langle b, y \rangle$ is a lower bound on the primal objective. Maximizing this bound yields the dual problem:

$$\max_{y \in K^*} b^T y \quad \text{s.t.} \quad \mathcal{A}^* y = c.$$
Most practical conic programs satisfy the conditions of the following strong duality theorem:

Theorem 2 (Strong duality). If either the primal or the dual problem is strictly feasible and bounded, then the other one is solvable and $\langle c, x^* \rangle = \langle b, y^* \rangle$.

Let $x^*$ and $y^*$ be the primal and the dual solutions, respectively. Substituting $\mathcal{A}^* y = c$ into the strong duality result yields

$$0 = \langle c, x^* \rangle - \langle b, y^* \rangle = \langle \mathcal{A}^* y^*, x^* \rangle - \langle b, y^* \rangle = \langle y^*, \mathcal{A} x^* \rangle - \langle y^*, b \rangle = \langle y^*, \mathcal{A} x^* - b \rangle.$$

Note that we have already seen this condition in the general nonlinear programming case, $\lambda^{*T} g(x^*) = 0$, under the name of complementary slackness. As before, the dual variable $y$ plays the role of the Lagrange multipliers.
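Strong duality and complementary slackness are easy to observe on a random LP; in the CVXPY sketch below, the data is hypothetical and constructed so that the primal is strictly feasible and the dual is feasible (hence the conditions of Theorem 2 hold):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
b = A @ rng.standard_normal(3) - 1.0   # a strictly feasible primal point exists
c = A.T @ rng.random(6)                # c = A^T y0 with y0 >= 0: dual feasible

x = cp.Variable(3)
cons = [A @ x - b >= 0]
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()

y = cons[0].dual_value                             # Lagrange multipliers = dual solution
print(np.isclose(c @ x.value, b @ y))              # c^T x* = b^T y*   (strong duality)
print(np.isclose(y @ (A @ x.value - b), 0, atol=1e-6))  # <y*, Ax* - b> = 0  (slackness)
```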

As an example, let us derive the dual problem of SDP, in which

$$\mathcal{A}x = \sum_{i=1}^{n} x_i A_i.$$

Since $S^m_+$ is self-dual, the dual problem can be written as

$$\max_{Y \succeq 0} \langle Y, B \rangle \quad \text{s.t.} \quad \mathcal{A}^* Y = c.$$

Here, the inner product in the objective is a matrix inner product, $\langle Y, B \rangle = \mathrm{tr}(YB)$ (no transpose due to symmetry). The adjoint operator $\mathcal{A}^*$ assigns to each matrix $Y \in \mathbb{R}^{m \times m}$ a vector in $\mathbb{R}^n$. In order to derive the adjoint, let us fix some $x$ and $Y$ and use the definition:

$$\langle x, \mathcal{A}^* Y \rangle = \langle \mathcal{A}x, Y \rangle = \left\langle \sum_{i=1}^{n} x_i A_i, Y \right\rangle = \sum_{i=1}^{n} x_i \langle A_i, Y \rangle = \left\langle x, \begin{pmatrix} \langle A_1, Y \rangle \\ \vdots \\ \langle A_n, Y \rangle \end{pmatrix} \right\rangle.$$

Since the latter equality holds for every $x$ and $Y$, we recognize in the vector of inner products $\langle A_i, Y \rangle$ the action of the adjoint operator on $Y$.
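As a final sanity check, the adjoint formula can be verified numerically on random symmetric matrices (hypothetical data, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3
sym = lambda M: (M + M.T) / 2
As = [sym(rng.standard_normal((m, m))) for _ in range(n)]   # A_1, ..., A_n
x = rng.standard_normal(n)
Y = sym(rng.standard_normal((m, m)))

Ax = sum(xi * Ai for xi, Ai in zip(x, As))                  # A x = sum_i x_i A_i
adjY = np.array([np.trace(Ai @ Y) for Ai in As])            # (A* Y)_i = <A_i, Y>
print(np.isclose(x @ adjY, np.trace(Ax @ Y)))               # <x, A*Y> = <Ax, Y>: True
```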
