A Gentle Introduction to Continuous and Mixed-Integer Conic Programming

A gentle introduction to Continuous and Mixed-Integer Conic Programming Universitàdi Bologna November 23th, 2020 Sven Wiese www.mosek.com Overview 1. From Linear to Conic Programming 2. Examples & Applications of cones - Symmetric cones - Non-symmetric cones - Yet more cones 3. Conic duality 4. Numerical solution methods - The continuous case - The mixed-integer case 1 / 67 1. From Linear to Conic Programming Linear Programming Linear Optimization in standard form: minimize cT x subject to Ax = b x ≥ 0: Pro: • Structure is explicit and simple. • Data is simple: c; A; b. • Structure implies data-independent convexity. • Powerful duality theory including Farkas lemma. Therefore, we have powerful algorithms and software. 3 / 67 Linear Programming Con: • It is linear only. 4 / 67 Nonlinear Programming The classical nonlinear optimization problem: minimize f (x) subject to h(x) = 0 g(x) ≤ 0: Pro • It is very general. Con: • Structure is less explicit. • How to specify the problem at all in software? • How to compute gradients and Hessians if needed? • How to exploit structure? • Smoothness? • Verifying convexity is NP-hard! 5 / 67 A fundamental question Is there a class of nonlinear optimization problems that preserves possibly many of the good properties of Linear Programming? 6 / 67 Good partial orderings Definition (Ben-Tal & Nemirovski, (2001)) n A \good" partial ordering of R is a vector relation that satisfies: 1. reflexivity 2. antisymmetry 3. transitivity 4. compatibility with linear operations The coordinatewise ordering x ≥ y () xi ≥ yi 8i = 1;:::; n is an example, but not the only one! 7 / 67 Good partial orderings • If for some good partial ordering \" we define n K := fa 2 R j a 0g; then K must be a pointed, convex cone: 1. a; a0 2 K =) a + a0 2 K 2. a 2 K; λ ≥ 0 =) λa 2 K 3. a 2 K and −a 2 K =) a = 0 • Conversely, if K is a non-empty pointed convex cone, then x K y :() x − y 2 K defines a good partial ordering. n The cone R+ is also closed and has a non-empty interior. 8 / 67 (Mixed-Integer) Conic Programming We thus consider problems of the form minimize cT x subject to Ax = b n p n x 2≥ RK0 +\ (Z × R ) where K is a (closed) pointed convex cone (with non-empty interior). • (MI)LP is a special case! • Typically, K = K1 × K2 × · · · × KK is a product of lower-dimensional cones. • A conic building block Ki can be thought of as encoding some type of specific non-linearity. 9 / 67 The beauty of conic optimization • Separation of data and structure: • Data: c, A and b. • Structure: K. • No issues with smoothness and differentiability. • Structural convexity. • Duality (almost...). 10 / 67 References I • A. Ben-Tal & A. Nemirovski: Lectures on Modern Convex Optimization (2001). 11 / 67 2. Examples & Applications of cones The conic wheel LP cone K 13 / 67 Quadratic cones n After the non-negative orthant R+, the quadratic-cone family is arguably most prominent. • the quadratic cone n n 2 21=2 Q = fx 2 R j x1 ≥ x2 + ··· + xn = kx2:nk2g; • the rotated quadratic cone n n 2 2 2 Qr = fx 2 R j 2x1x2 ≥ x3 + ··· + xn = kx3:nk2; x1; x2 ≥ 0g: n n Are equivalent in the sense that x 2 Q () Tnx 2 Qr with p p 0 1 1=p21 = p20 @1= 2 −1= 20 A : 00 In−2 14 / 67 Quadratic cones in dimension 3 x1 x1 x3 x3 x2 x2 15 / 67 Conic quadratic case study: least squares regression In least squares regression we use the penalty function φ(r) = krk2: n In its simplest form, given observations y 2 R and features n×d X 2 R , it solves min ky − Xwk2: w2Rd Start with a small and simple amount of reformulation: minimize t subject to t ≥ ky − Xwk2 d t 2 R; w 2 R 16 / 67 Conic quadratic case study: least squares regression In the conic framework this would be written as minimize t subject to s = y − Xw (t; s) 2 Qn+1 d w 2 R : We usually use the more compact notation (t; y − Xw) 2 Qn+1: 17 / 67 More conic quadratic modeling • Second-order cone inequality: T T m+1 c x + d ≥ kAx + bk2 () (c x + d; Ax + b) 2 Q : • Squared Euclidean norm: 2 n+2 t ≥ kxk2 () (t; 1=2; x) 2 Qr : • Convex quadratic inequality: T T k+2 t ≥ (1=2)x Qx () (t; 1; F x) 2 Qr T n×k with Q = F F , F 2 R . Any convex (MI)QCQP can be cast in conic form! 18 / 67 More conic quadratic modeling • Square roots, convex hyperbolic function, some convex negative rational powers... • Convex positive rational power t ≥ x3=2; x ≥ 0 : If we impose 1 (s; t; x); (x; 1=8; s) 2 Q3 () 2st ≥ x2; 2x · ≥ s2; r 8 it follows that 1 4s2t2 · x ≥ x4s2 =) t2 ≥ x3 =) t ≥ x3=2: 4 19 / 67 The positive semidefinite cone The positive semidefinite cone can be defined as a subspace of the n(n+1)=2 vector space R n(n+1)=2 n(n+1)=2 T n S := fx 2 R j z smat(x)z ≥ 0; 8z 2 R g; with p p 0 1 x1p x2= 2 ::: xn= p2 B x2= 2 xn+1 ::: x2n−1= 2 C smat(x) := B C : B . C @ .p . p . A xn= 2 x2n−1= 2 ::: xn(n+1)=2 An equivalent definition via matrix variables: n n T n X 2 S+ :() X 2 S and z Xz ≥ 0 8z 2 R : X is mapped to Sn(n+1)=2 via p p p T svec(X ) := (X11; 2X21;:::; 2Xn1; X22; 2X32;:::; Xnn) : 20 / 67 SDP use-case: Nearest correlation matrix n Let A 2 S and assume we want to find its nearest correlation matrix ∗ n X 2 C := fX 2 S+ j Xii = 1 8i = 1;:::; ng; i.e., ∗ X = min kA − X kF : X 2C A conic formulation in vector space is given by minimize t subject to x1 = xn+1 = x2n = ::: = xn(n+1)=2 = 1 (t; svec(A) − x) 2 Qn(n+1)=2+1 x 2 Sn(n+1)=2: 21 / 67 More semidefinite modeling • SDP can come in handy in eigenvalue optimization, e.g., if tI − X n 0; S+ then t is an upper bound on the largest eigenvalue of X . • SDP-relaxations play a role in Quadratic Programming and in Combinatorial Optimization: T T X = xx can be relaxed to X − xx n 0: S+ • There are applications for Mixed-Integer SDP, see, e.g., Gally, Pfetsch and Ulbrich (2018). 22 / 67 Back to the conic wheel LP cone K quadratic cones SDP The three cones we have seen so far are so-called symmetric cones, i.e., they are 1. homogeneous 2. self-dual 23 / 67 The exponential cone The exponential cone is defined as the closure of the epigraph of the perspective of the exponential function: 3 Kexp := clfx 2 R j x1 ≥ x2 exp(x3=x2); x2 > 0g; or more explicitly Kexp = f(x1; x2; x3) j x1 ≥ x2 exp(x3=x2); x2 > 0g [ f(x1; 0; x3) j x1 ≥ 0; x3 ≤ 0g: The exponential cone is non-symmetric! 24 / 67 The exponential cone x1 x3 x2 25 / 67 Exponential cone use case: Geometric Programming Consider the very simple Geometric Program 0:3 minimize px + y z subject to x + y −1 ≤ 1 x; y; z > 0 First note that ex1 + ::: + exk ≤ 1 can be modeled as k X (ui ; 1; xi ) 2 Kexp 8i = 1;:::; k and ui ≤ 1; i=1 and then substitute x = ep; y = eq; z = er : minimize t subject to( u1; 1; p − t); (u2; 1; 0:3q + w − t) 2 Kexp; u1 + u2 ≤ 1 (v1; 1; p=2); (v2; 1; −q) 2 Kexp; v1 + v2 ≤ 1 26 / 67 More exponential cone modeling • Logarithm: log x ≥ t () (x; 1; t) 2 Kexp: • Entropy: −x log x ≥ t () (1; x; t) 2 Kexp: • Relative entropy: x log(x=y) ≤ t () (y; x; −t) 2 Kexp: • Softplus function: x log(1+e ) ≤ t () (u; 1; x−t); (v; 1; −t) 2 Kexp; u+v ≤ 1: 27 / 67 The power cone The power cone is defined as α n α (1−α) Pn = fx 2 R j x1 x2 ≥ kx3:nk2; x1; x2 ≥ 0g; for0 < α < 1. One may also restrict to the three-dimensional power cone without losing any modeling capabilities: α α n−1 (x1;:::; xn) 2 Pn () (x1; x2; z) 2 P3 ; (z; x3;:::; xn) 2 Q : Also the power cone is non-symmetric! 28 / 67 The power cone x1 x1 x3 x3 α = 0:6 x2 α = 0:8 x2 29 / 67 Power cone modeling • Simple powers: p p jtj ≤ x ; x ≥ 0 with 0 < p < 1 () (x; 1; t) 2 P3 : p 1=p t ≥ jxj with p > 1 () (t; 1; x) 2 P3 : 3=2 2=3 Example: t ≥ x ; x ≥ 0 () (t; 1; x) 2 P3 (instead of 3 (s; t; x); (x; 1=8; s) 2 Qr ...) • p-norms, geometric mean, ... 30 / 67 How general is the conic wheel? exponential LP cone cone Continuous Optimization Folklore power K quadratic \Almost all convex constraints which cone cones arise in practice are representable using these 5 cones." SDP More evidence: Lubin et al. (2016) show that all convex instances (333) in MINLPLIB2 are conic representable using only 4 of the above cones. 31 / 67 Verifying convexity Is log(1=(1 + exp(−x))) ≤ 0 a convex constraint? From ask.cvxr.com: Verifying convexity can be hard! Solution: Disciplined Convex Programming (DCP) by Grant, Boyd and Ye (2006): only allow for modeling operations that preserve convexity.

A Gentle Introduction to Continuous and Mixed-Integer Conic Programming

Multi-Objective Optimization of Unidirectional Non-Isolated Dc/Dcconverters

Julia, My New Friend for Computing and Optimization? Pierre Haessig, Lilian Besson

Numericaloptimization

TOMLAB –Unique Features for Optimization in MATLAB

COSMO: a Conic Operator Splitting Method for Convex Conic Problems

Introduction to Mosek

AJINKYA KADU 503, Hans Freudenthal Building, Budapestlaan 6, 3584 CD Utrecht, the Netherlands

Robert Fourer, David M. Gay INFORMS Annual Meeting

Julia: a Modern Language for Modern ML

Open Source Tools for Optimization in Python

Benchmarks for Current Linear and Mixed Integer Optimization Solvers

Insight MFR By