A Short Course on Semidefinite Programming
A Short Course on Semidefinite Programming
(in order of appearance) Henry Wolkowicz, Etienne de Klerk, Levent Tunçel, Franz Rendl
[email protected]
Department of Combinatorics and Optimization, University of Waterloo
A Short Course on Semidefinite Programming – p. 1

About these Notes: Semidefinite Programming (SDP) refers to optimization problems in which the variable is a symmetric matrix that is required to be positive semidefinite. Though SDPs (under various names) have been studied as far back as the 1940s, interest has grown tremendously during the last ten years. This is partly due to the many diverse applications in e.g. engineering, combinatorial optimization, and statistics, and partly due to the great advances in the efficient solution of these types of problems. These notes summarize the theory, algorithms, and applications of semidefinite programming. They were prepared for a minicourse given at the Workshop on Large Scale Nonlinear and Semidefinite Programming, held in memory of Jos Sturm, May 12, 2004, at the University of Waterloo.

Preface
The purpose of these notes is threefold: first, they provide a comprehensive treatment of the area of Semidefinite Programming, a new and exciting area in Optimization; second, the notes illustrate the strength of convex analysis in Optimization; third, they emphasize the interaction between theory, algorithms, and the solution of practical problems.

1 Introduction and Motivation

1.1 Outline
• Basic Properties and Notation
• Examples
• Historical Notes

1.2 Basic Properties and Notation
Basic linear Semidefinite Programming looks just like Linear Programming:

(PSDP)   p* = max trace CX   (= ⟨C, X⟩)
         s.t. A(X) = AX = b   (linear)
              X ⪰ 0           (X ∈ P)   (nonneg)

with C, X ∈ Sⁿ, where Sⁿ := the space of n × n symmetric matrices.
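The objects in (PSDP) are easy to experiment with numerically. The following is a minimal sketch (not part of the original notes; all names are illustrative) that builds a symmetric cost matrix C, a positive semidefinite X, and a constraint map A(X)_i = trace(A_i X), and checks the basic facts used above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

# A symmetric cost matrix C in S^n.
M = rng.standard_normal((n, n))
C = (M + M.T) / 2

# A positive semidefinite X: any matrix of the form B B^T is PSD.
B = rng.standard_normal((n, n))
X = B @ B.T

# Objective <C, X> = trace(CX); equals the entrywise sum of C_ij * X_ij.
inner = np.trace(C @ X)
assert np.isclose(inner, np.sum(C * X))

# X >= 0 in the Löwner sense: all eigenvalues nonnegative.
assert np.linalg.eigvalsh(X).min() >= -1e-10

# The linear constraint map A(X)_i = trace(A_i X) for symmetric A_1, ..., A_m.
A_mats = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    A_mats.append((M + M.T) / 2)
AX = np.array([np.trace(Ai @ X) for Ai in A_mats])
print(AX)  # a vector in R^m
```

This mirrors the primal data of (PSDP): the constraint AX = b would fix the printed vector to a given b ∈ ℝᵐ.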
Space of Symmetric Matrices
A ∈ Sⁿ := the space of n × n symmetric matrices. A is positive semidefinite (positive definite), written A ⪰ 0 (A ≻ 0), if xᵀAx ≥ 0 (> 0), ∀x ≠ 0. ⪰ denotes the Löwner partial order, [53]: A ⪯ B if B − A ⪰ 0 (positive semidefinite).
TFAE:
1. A ⪰ 0 (A ≻ 0)
2. the eigenvalues λ(A) ≥ 0 (λ(A) > 0)

Linear Transformation A; Adjoint
A : Sⁿ → ℝᵐ, A* : ℝᵐ → Sⁿ. A is defined by an associated set {Aᵢ ∈ Sⁿ, i = 1, …, m}:
(AX)ᵢ = trace(AᵢX);   A*y = Σᵢ yᵢAᵢ,
where ⟨AX, y⟩ = ⟨X, A*y⟩, ∀X ∈ Sⁿ, ∀y ∈ ℝᵐ.
P = Sⁿ₊, the cone of positive semidefinite matrices, replaces ℝⁿ₊, the nonnegative orthant.

SDP or LMI
trace CX = ⟨C, X⟩ = Σᵢ Σⱼ Cᵢⱼ Xᵢⱼ. SDP is equivalent to (Linear Matrix Inequalities):

(PSDP)   p* = max Σᵢ Σⱼ Cᵢⱼ Xᵢⱼ
         s.t. ⟨Aᵢ, X⟩ = bᵢ, i = 1, …, m
              X ⪰ 0   (X ∈ P)

1.2.1 Duality
Payoff function, player Y to player X (Lagrangian):
L(X, y) := trace(CX) + yᵀ(b − AX)   (= ⟨C, X⟩ + ⟨y, b − AX⟩)
Optimal (worst case) strategy for player X:
p* = max_{X⪰0} min_y L(X, y)
For each fixed X ⪰ 0: y free yields the hidden constraint b − AX = 0; this recovers the primal problem (PSDP).

Rewrite the Lagrangian/payoff:
L(X, y) = trace(CX) + yᵀ(b − AX) = bᵀy + trace((C − A*y)X),
using the adjoint operator A*y = Σᵢ yᵢAᵢ, with ⟨A*y, X⟩ = ⟨y, AX⟩, ∀X, y.
Then:
p* = max_{X⪰0} min_y L(X, y) ≤ d* := min_y max_{X⪰0} L(X, y)
For the dual: for each fixed y, X ⪰ 0 yields the hidden constraint g(y) := C − A*y ⪰ 0.

Hidden Constraint
p* = max_{X⪰0} min_y L(X, y) ≤ d* := min_y max_{X⪰0} L(X, y)
The dual is obtained from the optimal strategy of the competing player Y, using the hidden constraint g(y) = C − A*y ⪰ 0:

(DSDP)   d* = min bᵀy
         s.t. A*y ⪰ C

for the primal

(PSDP)   p* = max trace CX
         s.t. AX = b
              X ⪰ 0

1.2.2 Weak Duality – Optimality
Proposition 1.2.2.1 (Weak Duality) If X is feasible in (PSDP) and y is feasible
in (DSDP), with slack variable Z = A*y − C ⪰ 0, then
trace CX − bᵀy = −trace XZ ≤ 0.
Proof. (Direct, using trace ZX = trace(X^{1/2}X^{1/2}Z) = trace(X^{1/2}Z^{1/2}Z^{1/2}X^{1/2}) = ‖X^{1/2}Z^{1/2}‖² ≥ 0; note: XZ = 0 iff trace XZ = 0.)
trace CX − bᵀy = trace((A*y − Z)X) − bᵀy
             = yᵀ(AX) − trace ZX − bᵀy
             = yᵀ(AX − b) − trace ZX
             = −trace ZX.

Characterization of Optimality
For the dual pair X ⪰ 0, y (and slack Z ⪰ 0):
A*y − Z = C   (dual feasibility)
AX = b        (primal feasibility)
ZX = 0        (complementary slackness)
and the perturbed condition ZX = µI. This forms the basis for interior point methods (the central path (Xµ, yµ, Zµ)); (cf. the primal simplex and dual simplex methods).

Strong Duality
primal value = ⟨C, X⟩
             = ⟨A*y − Z, X⟩      (dual feasibility)
             = ⟨y, AX⟩ − ⟨Z, X⟩  (adjoint)
             = ⟨y, b⟩ − ⟨Z, X⟩   (primal feasibility)
             = ⟨y, b⟩            (complementary slackness)
             = dual value

1.2.3 Preliminary Examples
Example 1.2.3.1 (Minimizing the Maximum Eigenvalue) Arises e.g. in the stability of differential equations.
• The mathematical problem: given A(x) ∈ Sⁿ depending linearly on a vector x ∈ ℝᵐ, find x to minimize the maximum eigenvalue of A(x).
• SDP model: the largest eigenvalue λmax(A(x)) ≤ α iff λmax(A(x) − αI) ≤ 0 iff A(x) − αI ⪯ 0. The (DSDP) (in dual form) is:
max −α   s.t. A(x) − αI ⪯ 0.

Pseudoconvex Optimization
Example 1.2.3.2 (Pseudoconvex (Nonlinear) Optimization)

(PCP)   d* = min (cᵀx)² / (dᵀx)
        s.t. Ax + b ≥ 0,

(given that Ax + b ≥ 0 ⇒ dᵀx > 0). Then (using a 2 × 2 determinant), (PCP) is equivalent to

d* = min t
     s.t.  [ Diag(Ax + b)   0     0
             0              t     cᵀx
             0              cᵀx   dᵀx ]  ⪰ 0

1.2.4 Further Properties, Definitions
• For matrices P, Q compatible for multiplication, trace PQ = trace QP.
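The two trace facts used above, cyclicity (trace PQ = trace QP) and trace ZX = ‖X^{1/2}Z^{1/2}‖² ≥ 0 for X, Z ⪰ 0 from the weak duality proof, can be checked numerically. A small numpy sketch (illustrative, not from the notes), where the PSD square root is computed from the spectral decomposition:

```python
import numpy as np

def psd_sqrt(S):
    # Spectral decomposition S = Q diag(lam) Q^T; take sqrt of eigenvalues.
    lam, Q = np.linalg.eigh(S)
    return Q @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ Q.T

rng = np.random.default_rng(1)
n = 4

# Cyclicity of the trace for arbitrary compatible P, Q.
P = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))
assert np.isclose(np.trace(P @ Q), np.trace(Q @ P))

# For X, Z >= 0: trace(ZX) = ||X^{1/2} Z^{1/2}||_F^2 >= 0 (the weak duality step).
BX = rng.standard_normal((n, n))
X = BX @ BX.T
BZ = rng.standard_normal((n, n))
Z = BZ @ BZ.T
lhs = np.trace(Z @ X)
rhs = np.linalg.norm(psd_sqrt(X) @ psd_sqrt(Z), 'fro') ** 2
assert lhs >= 0 and np.isclose(lhs, rhs)
print(lhs)
```

In particular, trace ZX = 0 with X, Z ⪰ 0 forces ZX = 0, which is exactly the complementary slackness condition above.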
• Every symmetric matrix S has an orthogonal (spectral) decomposition S = QΛQᵀ, where Λ is diagonal with the eigenvalues of S on the diagonal, and Q has orthonormal columns consisting of eigenvectors of S.
• X ⪰ 0 ⇒ PᵀXP ⪰ 0 (congruence)
• For S symmetric, TFAE:
1. S ⪰ 0 (≻ 0)
2. vᵀSv ≥ 0, ∀v ∈ ℝⁿ (vᵀSv > 0, ∀0 ≠ v ∈ ℝⁿ)
3. λ(S) ≥ 0, nonnegative eigenvalues (> 0, positive eigenvalues)
4. S = PᵀP for some P (for some nonsingular P)
5. S = (S^{1/2})², for some S^{1/2} ⪰ 0 (≻ 0)

• SDP – Linear Matrix Inequality: g(y) = C − A*y ⪰ 0, with the adjoint operator A*y = Σᵢ yᵢAᵢ, Aᵢ = Aᵢᵀ.
• g(y) is a P-convex constraint, i.e.
g(λx + (1 − λ)y) ⪯ λg(x) + (1 − λ)g(y), ∀0 ≤ λ ≤ 1, ∀x, y ∈ ℝᵐ;
equivalently, for every K ⪰ 0, the real-valued function f_K(x) := trace(K g(x)) is a convex function, i.e.
f_K(λx + (1 − λ)y) ≤ λf_K(x) + (1 − λ)f_K(y), ∀0 ≤ λ ≤ 1, ∀x, y ∈ ℝᵐ.

Geometry – the Feasible Set
• the feasible set {x ∈ ℝᵐ : g(x) ⪰ 0} is a convex set
• the optimum point is on the boundary (where g(x) is singular)
• the boundary of the feasible set is not smooth (it is a piecewise smooth algebraic surface)

1.2.5 Historical Notes
• arguably the most active area in optimization (see the HANDBOOK OF SEMIDEFINITE PROGRAMMING: Theory, Algorithms, and Applications, 2000, [94], for comprehensive results, history, references, ..., and the books [15, 95, 96])
• Lyapunov, over 100 years ago, on the stability analysis of differential equations
• Bohnenblust, 1948, on the geometry of the cone of positive semidefinite matrices, [10]
• Yakubovich in the 1960s, and Boyd and others on convex optimization in control in the 1980s, e.g. solving Riccati equations (called LMIs), e.g. [13], [92, 91]

More: Historical Notes
• matrix completion problems (another name for SDP) started in the early 1980s and continue to be a very active area of research, [19], [28], and e.g. [47, 46, 45, 44, 48, 40, 32, 20, 41].
(More recently, matrix completion is being used to solve large-scale SDPs.)
• combinatorial optimization applications in the 1980s: the Lovász theta function [52]; the strong approximation results for the max-cut problem by Goemans and Williamson, e.g. [27]; survey papers: [25, 26], [78].
• linear complementarity problems can be extended to problems over the cone of semidefinite matrices, e.g. [22, 39, 43, 42, 59].

More: Historical Notes
• Complexity, Distance to Ill-Posedness, and Condition Numbers: SDP is a convex program, and it can be solved to any desired accuracy in polynomial time; see the seminal work of Nesterov and Nemirovski, e.g. [63, 64, 68, 66, 62, 65, 67]. Another measure of complexity is the distance to ill-posedness: see e.g. the work of Renegar [80, 84, 83, 82, 81].
• Cone Programming: this is a generalization of SDP, also called generalized linear programming in the 1963 paper of Bellman and Fan, [8]. Other books dealing with problems over cones date back to the 1960s, e.g.