Duke University, Department of Electrical and Computer Engineering
Optimization for Scientists and Engineers
© Alex Bronstein, 2014

Conic Programming

Recall the linear programming (LP) problem that we have seen in the last lecture:

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad Ax - b \ge 0,$$

with $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. The constraints of this problem (interpreted as an element-wise inequality) can be written equivalently as

$$Ax - b \in \mathbb{R}^m_+ = \{ y \in \mathbb{R}^m : y_i \ge 0 \ \forall i \}.$$

The space $\mathbb{R}^m_+$ is usually called the positive (or, more accurately, non-negative) orthant; the constraints of LP can be interpreted as the intersection of the affine space $\{Ax - b : x \in \mathbb{R}^n\}$ with the positive orthant.

1 Convex cones

Geometrically, the orthant is a particular case of a construction called a convex cone:

Definition. A set $C$ is a convex cone iff $\alpha C + \beta C = C$ for any $\alpha, \beta > 0$. The cone is said to be pointed if $0 \in C$ and blunt otherwise. The cone is said to be salient if for every $0 \ne x \in C$, $-x \notin C$, and flat otherwise.

The way we defined $\mathbb{R}^m_+$ makes it a pointed salient convex cone. The other examples of cones we will see next fall into the same category:

1. Lorentz (aka "ice cream" or "second-order") cone, defined as

$$L^m = \left\{ x \in \mathbb{R}^m : x_m \ge \sqrt{x_1^2 + \cdots + x_{m-1}^2} \right\} = \{ x : x_m \ge \| (x_1, \dots, x_{m-1}) \|_2 \}.$$

2. Positive semidefinite cone

$$S^m_+ = \{ A \in \mathbb{R}^{m \times m} : A = A^T, \ A \succeq 0 \}.$$

Exercise 1. Show that $\mathbb{R}^m_+$, $L^m$, and $S^m_+$ are pointed salient convex cones.

The notion of a cone gives rise to a generalized form of conic inequalities: for a cone $K$, we will denote $a \in K$ equivalently as $a \ge_K 0$. Similarly, we will write $a \ge_K b$ to mean $a - b \ge_K 0$ or, equivalently, $a - b \in K$. In the same way, $Ax - b \ge_K 0$ defines the intersection of an affine space with the cone $K$.

2 Conic programs

The notion of a conic inequality allows us to generalize a linear program into a conic program (CP)

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad Ax - b \ge_K 0$$

by simply replacing the standard inequality (i.e., $\ge_{\mathbb{R}^m_+}$) with a conic inequality w.r.t. some convex cone $K$.
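The conic inequality $a \ge_K b$ used above reduces to the membership test $a - b \in K$. The following is a minimal NumPy sketch of such tests for the three cones of Section 1. The function names (`in_orthant`, `in_lorentz`, `in_psd`, `conic_geq`) and the numerical tolerance `tol` are illustrative assumptions, not part of the notes.

```python
import numpy as np

def in_orthant(x, tol=1e-9):
    """Membership in the non-negative orthant: every entry >= 0 (up to tol)."""
    return bool(np.all(np.asarray(x, dtype=float) >= -tol))

def in_lorentz(x, tol=1e-9):
    """Membership in the Lorentz cone: last entry >= 2-norm of the others."""
    x = np.asarray(x, dtype=float)
    return bool(x[-1] >= np.linalg.norm(x[:-1]) - tol)

def in_psd(A, tol=1e-9):
    """Membership in the PSD cone: symmetric with non-negative eigenvalues."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    return bool(np.linalg.eigvalsh(A).min() >= -tol)

def conic_geq(a, b, member):
    """a >=_K b holds iff a - b belongs to K (K given by its membership test)."""
    return member(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
```

For instance, `conic_geq([3, 4, 6], [0, 0, 1], in_lorentz)` tests whether $(3, 4, 5) \in L^3$, which holds with equality since $\|(3, 4)\|_2 = 5$.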
Apart from linear programming, which is an obvious particular case of CP, the following two problems are of major importance:

1. Second-order conic program (SOCP), corresponding to $K = L^m$:

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \begin{pmatrix} Ax - b \\ d^T x - e \end{pmatrix} \ge_K 0,$$

which can be rewritten explicitly as

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \| Ax - b \|_2 \le d^T x - e.$$

2. Semidefinite program (SDP), corresponding to $K = S^m_+$. In order to avoid high-order tensor notation, we will define a general linear operator $\mathcal{A} : \mathbb{R}^n \to \mathbb{R}^{m \times m}$ as

$$\mathcal{A} x = \sum_{i=1}^n x_i A_i,$$

where $A_1, \dots, A_n$ are symmetric $m \times m$ matrices (they need not be positive semidefinite themselves). In these terms, an SDP can be written as

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \mathcal{A} x - B \succeq 0.$$

Surprisingly, SDP generalizes both LP and SOCP! Any linear program can be written in the form

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad a_i^T x - b_i \ge 0, \quad i = 1, \dots, m,$$

which can be written as an equivalent SDP

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \begin{pmatrix} a_1^T x - b_1 & & \\ & \ddots & \\ & & a_m^T x - b_m \end{pmatrix} \succeq 0.$$

Note that a diagonal matrix is positive semidefinite iff all its diagonal elements are non-negative. In order to reduce SOCP to SDP, we will need the following very useful result:

Theorem 1 (Schur complement). Let $A$ be a symmetric block matrix

$$A = \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix}$$

with $R \succ 0$. Then, $A \succ 0$ iff $P - Q^T R^{-1} Q \succ 0$. The theorem holds for a semidefinite $A$ as well: $A \succeq 0$ iff $P - Q^T R^{-1} Q \succeq 0$.

To prove this result, observe that $A \succeq 0$ iff for every $u$,

$$\inf_v \, (u^T, v^T) \begin{pmatrix} P & Q^T \\ Q & R \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \inf_v \left( u^T P u + 2 u^T Q^T v + v^T R v \right) \ge 0.$$

The latter is an infimum of a quadratic function in $v$, for which we can write the following necessary optimality condition (which is also sufficient, since $R \succ 0$ and the function is convex):

$$2 Q u + 2 R v = 0,$$

from where $v = -R^{-1} Q u$. Substituting the solution into the quadratic function yields

$$\inf_v \, (u^T, v^T) A \begin{pmatrix} u \\ v \end{pmatrix} = u^T P u - 2 u^T Q^T R^{-1} Q u + u^T Q^T R^{-1} R R^{-1} Q u = u^T (P - Q^T R^{-1} Q) u.$$

The latter quadratic form is non-negative for every $u$ iff $P - Q^T R^{-1} Q \succeq 0$.

Armed with the Schur complement theorem, we can now return to the SOCP problem, which we will now write as

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad \| u(x) \|_2 \le v(x),$$

where $u(x) = Ax - b$ and $v(x) = d^T x - e$. Note that $v$ in the constraint upper-bounds a norm, so it cannot be negative.
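As a numerical sanity check of Theorem 1 before it is applied, the sketch below compares positive semidefiniteness of the block matrix with that of its Schur complement, on two hand-picked cases and on random instances. The helper names, the tolerance, and the way random test matrices are drawn are assumptions made for illustration only.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

def schur_complement(P, Q, R):
    """P - Q^T R^{-1} Q, computed with a linear solve instead of an inverse."""
    return P - Q.T @ np.linalg.solve(R, Q)

def assemble(P, Q, R):
    """Block matrix [[P, Q^T], [Q, R]] as in Theorem 1."""
    return np.block([[P, Q.T], [Q, R]])

# Two hand-picked cases with R = I and Q = (1, 1)^T, so Q^T R^{-1} Q = 2.
Q = np.array([[1.0], [1.0]])
R = np.eye(2)
P_good = np.array([[3.0]])   # Schur complement is  1
P_bad = np.array([[1.0]])    # Schur complement is -1

# Random instances with R made positive definite by construction.
rng = np.random.default_rng(0)
mismatches = 0
for _ in range(200):
    Qr = rng.standard_normal((3, 2))
    G = rng.standard_normal((3, 3))
    Rr = G @ G.T + 0.1 * np.eye(3)
    H = rng.standard_normal((2, 2))
    Pr = (H + H.T) / 2 + rng.uniform(0.0, 15.0) * np.eye(2)
    mismatches += is_psd(assemble(Pr, Qr, Rr)) != is_psd(schur_complement(Pr, Qr, Rr))
```

Agreement on random samples is of course no substitute for the proof above; it only guards against sign or transposition slips when the theorem is used in code.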
We rewrite the constraint as $u^T u \le v^2$ or, dividing by $v$ (for $v > 0$), as

$$v - \frac{1}{v} u^T u = v - u^T \begin{pmatrix} v & & \\ & \ddots & \\ & & v \end{pmatrix}^{-1} u \ge 0.$$

Invoking the Schur complement theorem with $P = v$, $Q = u$ and $R = vI$, we can equivalently write the constraint as

$$\begin{pmatrix} v(x) & u^T(x) \\ u(x) & v(x) I \end{pmatrix} \succeq 0,$$

where $v(x) I$ is the diagonal matrix with $v(x)$ on its diagonal. This turns SOCP into an SDP.

3 Barrier (interior point) methods

Among the most successful ways of solving conic programs are barrier methods, also known as interior point methods. Recall that we defined barriers as a particular family of penalty functions that disallow infeasible solutions. For our purpose, a barrier representing a cone $K$ is a function $\varphi : K \to \mathbb{R}$ such that

$$\lim_{K \ni x \to \partial K} \varphi(x) = \infty.$$

A barrier method consists of solving a sequence of unconstrained minimization problems with the barrier aggregate

$$\min_{x \in \mathbb{R}^n} F_p(x) = \min_{x \in \mathbb{R}^n} c^T x + \frac{1}{p} \varphi(Ax - b),$$

with increasing $p$. In the limit $p \to \infty$, unconstrained minimization of $F_\infty$ is equivalent to the original conic program. As we have already mentioned, the barrier method has to be initialized with a feasible point, and it will always produce a strictly feasible solution (this is the reason for the name interior point). The following barrier functions are frequently used for standard convex cones:

Cone                 Barrier function
$\mathbb{R}^m_+$     $\varphi(x) = -\sum_{i=1}^m \log x_i$
$L^m$                $\varphi(x) = -\log\left( x_m^2 - \sum_{i=1}^{m-1} x_i^2 \right)$
$S^m_+$              $\varphi(X) = -\log \det X$

Exercise 2. Derive the gradients of the above functions and show that they qualify as barriers. Hint: for the semidefinite cone barrier, show that $-\log \det X$ can be represented as $-\mathrm{tr} \log X$, where $\log X$ is understood as a function of a matrix.

4 Dual cones

Definition. The dual of a cone $K$ is

$$K^* = \{ y : \langle x, y \rangle \ge 0 \ \forall x \in K \}.$$

Exercise 3. Prove that $K^*$ is a cone.

Geometrically, the dual cone consists of vectors forming non-obtuse angles (at most $90^\circ$) with all the vectors of the cone $K$.

Exercise 4. Prove that $(K^*)^* = K$.

Definition. A cone $K$ is said to be self-dual iff $K^* = K$.

Exercise 5. Show that $\mathbb{R}^m_+$, $L^m$ and $S^m_+$ are self-dual.
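Returning briefly to the barrier method of Section 3, the following is a minimal sketch of the scheme for the LP case $K = \mathbb{R}^m_+$ with the logarithmic barrier. The inner solver (damped Newton with backtracking), the schedule for $p$, the stopping thresholds, and the small test problem are all assumptions chosen for illustration; the notes do not prescribe a particular inner method.

```python
import numpy as np

def barrier_objective(x, c, A, b, p):
    """F_p(x) = c^T x - (1/p) * sum_i log((Ax - b)_i); requires Ax - b > 0."""
    return c @ x - np.sum(np.log(A @ x - b)) / p

def barrier_lp(c, A, b, x0, p0=1.0, growth=10.0, outer_iters=8, newton_iters=50):
    """Minimize c^T x s.t. Ax - b >= 0, starting from a strictly feasible x0."""
    x = np.asarray(x0, dtype=float)
    p = p0
    for _ in range(outer_iters):
        for _ in range(newton_iters):
            s = A @ x - b                          # slacks, strictly positive
            grad = c - A.T @ (1.0 / s) / p         # gradient of F_p
            hess = (A.T * (1.0 / s**2)) @ A / p    # Hessian A^T diag(1/s^2) A / p
            step = -np.linalg.solve(hess, grad)    # Newton direction
            decrement = -(grad @ step)             # squared Newton decrement
            if decrement < 1e-12:
                break
            f, t = barrier_objective(x, c, A, b, p), 1.0
            while t > 1e-12:                       # backtracking line search
                x_new = x + t * step
                if (np.all(A @ x_new - b > 0) and
                        barrier_objective(x_new, c, A, b, p) <= f - 0.25 * t * decrement):
                    x = x_new
                    break
                t *= 0.5
            else:
                break                              # no acceptable step was found
        p *= growth                                # tighten the barrier
    return x

# Hypothetical test problem: minimize -x1 - 2*x2 over the triangle
# x1 >= 0, x2 >= 0, x1 + x2 <= 1; the minimizer is the vertex (0, 1).
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, -1.0])
x_hat = barrier_lp(c, A, b, x0=[0.25, 0.25])
```

Every iterate stays strictly inside the feasible set because the line search rejects any step that makes a slack non-positive, which is the "interior point" behaviour described in Section 3.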
5 Dual conic programs

Before we show the dual form of a general convex conic program, we need another important notion from linear algebra.

Definition. An adjoint of a linear operator $A : U \to V$ is the linear operator $A^* : V \to U$ such that for every $u \in U$ and $v \in V$,

$$\langle u, A^* v \rangle_U = \langle A u, v \rangle_V.$$

In general, there is no relation between the adjoint operator and the inverse operator!

Exercise 6. Show that the adjoint of a linear operator expressed by a real matrix $A$ is the matrix $A^T$.

Let us now consider the (primal) conic program

$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{s.t.} \quad Ax - b \ge_K 0.$$

For every feasible $x$, $Ax - b \in K$. Therefore, by definition of the dual cone, for every $y \in K^*$, $\langle Ax - b, y \rangle \ge 0$, from where

$$\langle Ax, y \rangle = \langle x, A^* y \rangle \ge \langle b, y \rangle.$$

In particular, for $A^* y = c$, we have

$$\langle c, x \rangle \ge \langle b, y \rangle.$$

In other words, $\langle b, y \rangle$ is a lower bound on the primal objective. Maximizing this bound yields the dual problem:

$$\max_{y \in K^*} b^T y \quad \text{s.t.} \quad A^* y = c.$$

Most practical conic programs satisfy the conditions of the following strong duality theorem:

Theorem 2 (Strong duality). If either the primal or the dual problem is strictly feasible and bounded, then the other one is solvable and $\langle c, x^* \rangle = \langle b, y^* \rangle$.

Let $x^*$ and $y^*$ be the primal and the dual solutions, respectively. Substituting $A^* y^* = c$ into the strong duality result yields

$$0 = \langle c, x^* \rangle - \langle b, y^* \rangle = \langle A^* y^*, x^* \rangle - \langle b, y^* \rangle = \langle y^*, A x^* \rangle - \langle y^*, b \rangle = \langle y^*, A x^* - b \rangle.$$

Note that we have already seen this condition in the general nonlinear programming case, $\lambda^{*T} g(x^*) = 0$, under the name of complementary slackness. As before, the dual variable $y$ plays the role of the Lagrange multipliers.

As an example, let us derive the dual problem of SDP, in which

$$\mathcal{A} x = \sum_{i=1}^n x_i A_i.$$

Since $S^m_+$ is self-dual, the dual problem can be written as

$$\max_{Y \succeq 0} \langle Y, B \rangle \quad \text{s.t.} \quad \mathcal{A}^* Y = c.$$

Here, the inner product in the objective is a matrix inner product $\langle Y, B \rangle = \mathrm{tr}(Y B)$ (no transpose due to symmetry). The adjoint operator $\mathcal{A}^*$ assigns to each matrix $Y \in \mathbb{R}^{m \times m}$ a vector in $\mathbb{R}^n$.
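The defining identity of the adjoint can also be probed numerically. The sketch below evaluates $\mathcal{A}^* Y$ entry by entry by plugging the standard basis vectors $e_i$ into the definition, $(\mathcal{A}^* Y)_i = \langle e_i, \mathcal{A}^* Y \rangle = \langle \mathcal{A} e_i, Y \rangle$, and then checks the identity for a random $x$. The function names, the dimensions, and the random symmetric test matrices are illustrative assumptions.

```python
import numpy as np

def apply_A(x, mats):
    """The SDP operator: A x = sum_i x_i A_i."""
    return sum(xi * Ai for xi, Ai in zip(x, mats))

def inner(X, Y):
    """Matrix inner product <X, Y> = tr(X Y) for symmetric X and Y."""
    return float(np.trace(X @ Y))

def adjoint_from_definition(Y, mats):
    """Entries of A* Y obtained as <A e_i, Y> for each basis vector e_i."""
    n = len(mats)
    return np.array([inner(apply_A(e, mats), Y) for e in np.eye(n)])

rng = np.random.default_rng(1)
n, m = 3, 4

def random_symmetric(size):
    G = rng.standard_normal((size, size))
    return (G + G.T) / 2

mats = [random_symmetric(m) for _ in range(n)]   # A_1, ..., A_n
Y = random_symmetric(m)
x = rng.standard_normal(n)

lhs = float(x @ adjoint_from_definition(Y, mats))  # <x, A* Y>
rhs = inner(apply_A(x, mats), Y)                   # <A x, Y>
```

The two sides agree up to rounding because both are linear in $x$ and coincide on the basis vectors by construction.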
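In order to derive the adjoint, let us fix some $x$ and $Y$ and use the definition

$$\langle x, \mathcal{A}^* Y \rangle = \langle \mathcal{A} x, Y \rangle = \left\langle \sum_{i=1}^n x_i A_i, Y \right\rangle = \sum_{i=1}^n x_i \langle A_i, Y \rangle.$$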