
A gentle introduction to Continuous and Mixed-Integer Conic Programming

Università di Bologna
November 23rd, 2020
Sven Wiese
www.mosek.com

Overview

1. From Linear to Conic Programming
2. Examples & Applications of cones
   - Symmetric cones
   - Non-symmetric cones
   - Yet more cones
3. Conic duality
4. Numerical solution methods
   - The continuous case
   - The mixed-integer case

1. From Linear to Conic Programming

Linear Programming

Linear Optimization in standard form:

$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b, \\
& x \ge 0.
\end{array}
$$

Pro:
• Structure is explicit and simple.
• Data is simple: $c$, $A$, $b$.
• Structure implies data-independent convexity.
• Powerful duality theory, including the Farkas lemma.

Therefore, we have powerful algorithms and software.

Con:
• It is linear only.

Nonlinear Programming

The classical nonlinear optimization problem:

$$
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & h(x) = 0, \\
& g(x) \le 0.
\end{array}
$$

Pro:
• It is very general.

Con:
• Structure is less explicit.
• How to specify the problem at all in software?
• How to compute gradients and Hessians if needed?
• How to exploit structure?
• Smoothness?
• Verifying convexity is NP-hard!

A fundamental question

Is there a class of nonlinear optimization problems that preserves as many as possible of the good properties of Linear Programming?

Good partial orderings

Definition (Ben-Tal & Nemirovski, 2001). A "good" partial ordering of $\mathbb{R}^n$ is a vector relation $\succeq$ that satisfies:
1. reflexivity
2. antisymmetry
3. transitivity
4. compatibility with linear operations

The coordinatewise ordering

$$x \ge y \iff x_i \ge y_i \quad \forall i = 1, \ldots, n$$

is an example, but not the only one!

• If for some good partial ordering $\succeq$ we define
  $$\mathcal{K} := \{a \in \mathbb{R}^n \mid a \succeq 0\},$$
  then $\mathcal{K}$ must be a pointed convex cone:
  1. $a, a' \in \mathcal{K} \implies a + a' \in \mathcal{K}$
  2. $a \in \mathcal{K},\ \lambda \ge 0 \implies \lambda a \in \mathcal{K}$
  3. $a \in \mathcal{K}$ and $-a \in \mathcal{K} \implies a = 0$
• Conversely, if $\mathcal{K}$ is a non-empty pointed convex cone, then
  $$x \succeq_{\mathcal{K}} y \;:\iff\; x - y \in \mathcal{K}$$
  defines a good partial ordering.
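As a small numerical illustration of how a cone induces an ordering, the following Python sketch compares vectors under "x is at least y if and only if x - y lies in the cone" for two choices of cone: the non-negative orthant, which gives the coordinatewise ordering, and the quadratic cone that these slides define later. The function names (`in_orthant`, `in_quadratic_cone`, `geq`) and the tolerance are illustrative assumptions, not part of any library.

```python
import math

def in_orthant(a, tol=1e-12):
    """Membership in the non-negative orthant R^n_+ (up to a small tolerance)."""
    return all(ai >= -tol for ai in a)

def in_quadratic_cone(a, tol=1e-12):
    """Membership in {a : a_1 >= ||a_{2:n}||_2} (up to a small tolerance)."""
    return a[0] >= math.sqrt(sum(ai * ai for ai in a[1:])) - tol

def geq(x, y, in_cone):
    """Cone-induced ordering: x >=_K y holds iff x - y is in K."""
    return in_cone([xi - yi for xi, yi in zip(x, y)])

zero = [0.0, 0.0, 0.0]
a = [3.0, 1.0, -2.0]   # lies in the quadratic cone, not in the orthant
b = [1.0, 1.0, 1.0]    # lies in the orthant, not in the quadratic cone

# Each pair is (result under the orthant, result under the quadratic cone).
a_vs_zero = (geq(a, zero, in_orthant), geq(a, zero, in_quadratic_cone))
b_vs_zero = (geq(b, zero, in_orthant), geq(b, zero, in_quadratic_cone))

# Pointedness checked on one sample: a and -a are not both in the cone.
neg_a = [-ai for ai in a]
pointed_on_sample = not (in_quadratic_cone(a) and in_quadratic_cone(neg_a))

print(a_vs_zero, b_vs_zero, pointed_on_sample)
```

The two orderings disagree on both sample vectors, which illustrates that the coordinatewise ordering is only one of the orderings a pointed convex cone can induce.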
The cone $\mathbb{R}^n_+$ is also closed and has a non-empty interior.

(Mixed-Integer) Conic Programming

We thus consider problems of the form

$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b, \\
& x \in \mathcal{K} \cap (\mathbb{Z}^p \times \mathbb{R}^{n-p}),
\end{array}
$$

where $\mathcal{K}$ is a (closed) pointed convex cone (with non-empty interior).
• (MI)LP is a special case!
• Typically, $\mathcal{K} = \mathcal{K}_1 \times \mathcal{K}_2 \times \cdots \times \mathcal{K}_K$ is a product of lower-dimensional cones.
• A conic building block $\mathcal{K}_i$ can be thought of as encoding some specific type of non-linearity.

The beauty of conic optimization

• Separation of data and structure:
  • Data: $c$, $A$ and $b$.
  • Structure: $\mathcal{K}$.
• No issues with smoothness and differentiability.
• Structural convexity.
• Duality (almost...).

References I

• A. Ben-Tal & A. Nemirovski: Lectures on Modern Convex Optimization (2001).

2. Examples & Applications of cones

The conic wheel

[Figure: the conic wheel around $\mathcal{K}$, showing the LP cone.]

Quadratic cones

After the non-negative orthant $\mathbb{R}^n_+$, the quadratic-cone family is arguably the most prominent.
• the quadratic cone
  $$\mathcal{Q}^n = \{x \in \mathbb{R}^n \mid x_1 \ge (x_2^2 + \cdots + x_n^2)^{1/2} = \|x_{2:n}\|_2\},$$
• the rotated quadratic cone
  $$\mathcal{Q}_r^n = \{x \in \mathbb{R}^n \mid 2 x_1 x_2 \ge x_3^2 + \cdots + x_n^2 = \|x_{3:n}\|_2^2,\ x_1, x_2 \ge 0\}.$$

The two cones are equivalent in the sense that $x \in \mathcal{Q}^n \iff T_n x \in \mathcal{Q}_r^n$ with

$$
T_n = \begin{pmatrix}
1/\sqrt{2} & 1/\sqrt{2} & 0 \\
1/\sqrt{2} & -1/\sqrt{2} & 0 \\
0 & 0 & I_{n-2}
\end{pmatrix}.
$$

[Figure: the quadratic cone and the rotated quadratic cone in dimension 3, axes $x_1$, $x_2$, $x_3$.]

Conic quadratic case study: least squares regression

In least squares regression we use the penalty function $\varphi(r) = \|r\|_2$. In its simplest form, given observations $y \in \mathbb{R}^n$ and features $X \in \mathbb{R}^{n \times d}$, it solves

$$\min_{w \in \mathbb{R}^d} \|y - Xw\|_2.$$

Start with a small and simple reformulation:

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & t \ge \|y - Xw\|_2, \\
& t \in \mathbb{R},\ w \in \mathbb{R}^d.
\end{array}
$$

In the conic framework this would be written as

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & s = y - Xw, \\
& (t, s) \in \mathcal{Q}^{n+1}, \\
& w \in \mathbb{R}^d.
\end{array}
$$

We usually use the more compact notation $(t, y - Xw) \in \mathcal{Q}^{n+1}$.

More conic quadratic modeling

• Second-order cone inequality:
  $$c^T x + d \ge \|Ax + b\|_2 \iff (c^T x + d,\ Ax + b) \in \mathcal{Q}^{m+1}.$$
• Squared Euclidean norm:
  $$t \ge \|x\|_2^2 \iff (t,\ 1/2,\ x) \in \mathcal{Q}_r^{n+2}.$$
• Convex quadratic inequality:
  $$t \ge (1/2)\, x^T Q x \iff (t,\ 1,\ F^T x) \in \mathcal{Q}_r^{k+2}$$
  with $Q = F F^T$, $F \in \mathbb{R}^{n \times k}$.

Any convex (MI)QCQP can be cast in conic form!

• Square roots, convex hyperbolic function, some convex negative rational powers...
• Convex positive rational power $t \ge x^{3/2},\ x \ge 0$: if we impose
  $$(s, t, x),\ (x, 1/8, s) \in \mathcal{Q}_r^3 \iff 2st \ge x^2,\quad 2x \cdot \tfrac{1}{8} \ge s^2,$$
  it follows that
  $$4 s^2 t^2 \cdot \tfrac{1}{4} x \ge x^4 s^2 \implies t^2 \ge x^3 \implies t \ge x^{3/2}.$$

The positive semidefinite cone

The positive semidefinite cone can be defined as a subset of the vector space $\mathbb{R}^{n(n+1)/2}$:

$$\mathcal{S}^{n(n+1)/2} := \{x \in \mathbb{R}^{n(n+1)/2} \mid z^T \,\mathrm{smat}(x)\, z \ge 0,\ \forall z \in \mathbb{R}^n\},$$

with

$$
\mathrm{smat}(x) := \begin{pmatrix}
x_1 & x_2/\sqrt{2} & \cdots & x_n/\sqrt{2} \\
x_2/\sqrt{2} & x_{n+1} & \cdots & x_{2n-1}/\sqrt{2} \\
\vdots & \vdots & \ddots & \vdots \\
x_n/\sqrt{2} & x_{2n-1}/\sqrt{2} & \cdots & x_{n(n+1)/2}
\end{pmatrix}.
$$

An equivalent definition via matrix variables:

$$X \in \mathcal{S}^n_+ \;:\iff\; X \in \mathcal{S}^n \text{ and } z^T X z \ge 0 \quad \forall z \in \mathbb{R}^n.$$

$X$ is mapped to $\mathcal{S}^{n(n+1)/2}$ via

$$\mathrm{svec}(X) := (X_{11},\ \sqrt{2} X_{21},\ \ldots,\ \sqrt{2} X_{n1},\ X_{22},\ \sqrt{2} X_{32},\ \ldots,\ X_{nn})^T.$$

SDP use-case: Nearest correlation matrix

Let $A \in \mathcal{S}^n$ and assume we want to find its nearest correlation matrix

$$X^* \in \mathcal{C} := \{X \in \mathcal{S}^n_+ \mid X_{ii} = 1\ \ \forall i = 1, \ldots, n\},$$

i.e.,

$$X^* = \arg\min_{X \in \mathcal{C}} \|A - X\|_F.$$

A conic formulation in vector space is given by

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & x_1 = x_{n+1} = x_{2n} = \cdots = x_{n(n+1)/2} = 1, \\
& (t,\ \mathrm{svec}(A) - x) \in \mathcal{Q}^{n(n+1)/2+1}, \\
& x \in \mathcal{S}^{n(n+1)/2}.
\end{array}
$$

More semidefinite modeling

• SDP can come in handy in eigenvalue optimization, e.g., if
  $$tI - X \succeq_{\mathcal{S}^n_+} 0,$$
  then $t$ is an upper bound on the largest eigenvalue of $X$.
• SDP relaxations play a role in Quadratic Programming and in Combinatorial Optimization:
  $$X = x x^T \quad \text{can be relaxed to} \quad X - x x^T \succeq_{\mathcal{S}^n_+} 0.$$
• There are applications for Mixed-Integer SDP, see, e.g., Gally, Pfetsch and Ulbrich (2018).

Back to the conic wheel

[Figure: the conic wheel around $\mathcal{K}$, showing the LP cone, the quadratic cones and SDP.]

The three cones we have seen so far are so-called symmetric cones, i.e., they are
1. homogeneous
2. self-dual

The exponential cone

The exponential cone is defined as the closure of the epigraph of the perspective of the exponential function:

$$K_{\exp} := \mathrm{cl}\{x \in \mathbb{R}^3 \mid x_1 \ge x_2 \exp(x_3/x_2),\ x_2 > 0\},$$

or more explicitly

$$K_{\exp} = \{(x_1, x_2, x_3) \mid x_1 \ge x_2 \exp(x_3/x_2),\ x_2 > 0\} \cup \{(x_1, 0, x_3) \mid x_1 \ge 0,\ x_3 \le 0\}.$$

The exponential cone is non-symmetric!

[Figure: the exponential cone, axes $x_1$, $x_2$, $x_3$.]

Exponential cone use case: Geometric Programming

Consider the very simple Geometric Program

$$
\begin{array}{ll}
\text{minimize} & x + y^{0.3} z \\
\text{subject to} & \sqrt{x} + y^{-1} \le 1, \\
& x, y, z > 0.
\end{array}
$$

First note that $e^{x_1} + \cdots + e^{x_k} \le 1$ can be modeled as

$$(u_i, 1, x_i) \in K_{\exp} \quad \forall i = 1, \ldots, k \quad \text{and} \quad \sum_{i=1}^k u_i \le 1,$$

and then substitute $x = e^p$, $y = e^q$, $z = e^r$:

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & (u_1, 1, p - t),\ (u_2, 1, 0.3q + r - t) \in K_{\exp},\quad u_1 + u_2 \le 1, \\
& (v_1, 1, p/2),\ (v_2, 1, -q) \in K_{\exp},\quad v_1 + v_2 \le 1.
\end{array}
$$

More exponential cone modeling

• Logarithm:
  $$\log x \ge t \iff (x, 1, t) \in K_{\exp}.$$
• Entropy:
  $$-x \log x \ge t \iff (1, x, t) \in K_{\exp}.$$
• Relative entropy:
  $$x \log(x/y) \le t \iff (y, x, -t) \in K_{\exp}.$$
• Softplus function:
  $$\log(1 + e^x) \le t \iff (u, 1, x - t),\ (v, 1, -t) \in K_{\exp},\quad u + v \le 1.$$

The power cone

The power cone is defined as

$$\mathcal{P}_n^{\alpha} = \{x \in \mathbb{R}^n \mid x_1^{\alpha} x_2^{1-\alpha} \ge \|x_{3:n}\|_2,\ x_1, x_2 \ge 0\},$$

for $0 < \alpha < 1$. One may also restrict to the three-dimensional power cone without losing any modeling capabilities: for some $z \in \mathbb{R}$,

$$(x_1, \ldots, x_n) \in \mathcal{P}_n^{\alpha} \iff (x_1, x_2, z) \in \mathcal{P}_3^{\alpha},\quad (z, x_3, \ldots, x_n) \in \mathcal{Q}^{n-1}.$$

The power cone is also non-symmetric!

[Figure: the power cone for $\alpha = 0.6$ and $\alpha = 0.8$, axes $x_1$, $x_2$, $x_3$.]

Power cone modeling

• Simple powers:
  $$|t| \le x^p,\ x \ge 0 \text{ with } 0 < p < 1 \iff (x, 1, t) \in \mathcal{P}_3^{p}.$$
  $$t \ge |x|^p \text{ with } p > 1 \iff (t, 1, x) \in \mathcal{P}_3^{1/p}.$$
  Example: $t \ge x^{3/2},\ x \ge 0 \iff (t, 1, x) \in \mathcal{P}_3^{2/3}$ (instead of $(s, t, x),\ (x, 1/8, s) \in \mathcal{Q}_r^3$ ...)
• p-norms, geometric mean, ...

How general is the conic wheel?

[Figure: the conic wheel around $\mathcal{K}$, showing the LP cone, the quadratic cones, SDP, the exponential cone and the power cone.]

Continuous Optimization Folklore: "Almost all convex constraints which arise in practice are representable using these 5 cones."

More evidence: Lubin et al. (2016) show that all convex instances (333) in MINLPLIB2 are conic representable using only 4 of the above cones.

Verifying convexity

Is $\log(1/(1 + \exp(-x))) \le 0$ a convex constraint? (Example from ask.cvxr.com.)

Verifying convexity can be hard!

Solution: Disciplined Convex Programming (DCP) by Grant, Boyd and Ye (2006): only allow for modeling operations that preserve convexity.
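To make the DCP idea concrete, here is a minimal Python sketch of a curvature calculus in that spirit. It is not the rule set of any particular DCP tool: the atoms (`exp`, `log`, `inv_pos`, `softplus`), the `Expr` class and the function `leq_is_dcp` are hypothetical names chosen for illustration, and domain conditions such as positivity of arguments are ignored. Each atom carries a known curvature and monotonicity, and a composition is accepted only if the standard composition rules certify it.

```python
from dataclasses import dataclass

AFFINE, CONVEX, CONCAVE, UNKNOWN = "affine", "convex", "concave", "unknown"

@dataclass(frozen=True)
class Expr:
    curvature: str

def is_convex(e):
    return e.curvature in (AFFINE, CONVEX)

def is_concave(e):
    return e.curvature in (AFFINE, CONCAVE)

def variable():
    return Expr(AFFINE)

def constant():
    return Expr(AFFINE)

def neg(e):
    # Negation swaps convex and concave; affine and unknown are unchanged.
    flipped = {CONVEX: CONCAVE, CONCAVE: CONVEX}
    return Expr(flipped.get(e.curvature, e.curvature))

def add(a, b):
    if a.curvature == AFFINE and b.curvature == AFFINE:
        return Expr(AFFINE)
    if is_convex(a) and is_convex(b):
        return Expr(CONVEX)
    if is_concave(a) and is_concave(b):
        return Expr(CONCAVE)
    return Expr(UNKNOWN)

def exp(e):       # convex, nondecreasing: certified for a convex argument
    return Expr(CONVEX if is_convex(e) else UNKNOWN)

def log(e):       # concave, nondecreasing: certified for a concave argument
    return Expr(CONCAVE if is_concave(e) else UNKNOWN)

def inv_pos(e):   # 1/u on u > 0: convex, nonincreasing: needs a concave argument
    return Expr(CONVEX if is_concave(e) else UNKNOWN)

def softplus(e):  # log(1 + exp(u)): convex, nondecreasing
    return Expr(CONVEX if is_convex(e) else UNKNOWN)

def leq_is_dcp(lhs, rhs):
    # A constraint lhs <= rhs is accepted only as "convex <= concave".
    return is_convex(lhs) and is_concave(rhs)

x = variable()
as_written = log(inv_pos(add(constant(), exp(neg(x)))))  # log(1/(1+exp(-x)))
rewritten = neg(softplus(neg(x)))                        # -log(1+exp(-x))
simple = exp(x)                                          # exp(x)

print(as_written.curvature, rewritten.curvature,
      leq_is_dcp(as_written, constant()),
      leq_is_dcp(rewritten, constant()),
      leq_is_dcp(simple, constant()))
```

Under these rules the expression as written has unknown curvature, and the algebraically equal form -log(1 + exp(-x)) is recognized as concave, so neither is accepted on the left of a "≤" constraint, whereas exp(x) ≤ 1 is. Rules of this kind are sufficient conditions only: here the left-hand side is negative for every real x, so the feasible set is all of R and hence convex, yet the constraint is still rejected in this form.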