A gentle introduction to Continuous and Mixed-Integer Conic Programming
Universit`adi Bologna November 23th, 2020
Sven Wiese
www.mosek.com Overview
1. From Linear to Conic Programming
2. Examples & Applications of cones - Symmetric cones - Non-symmetric cones - Yet more cones
3. Conic duality
4. Numerical solution methods - The continuous case - The mixed-integer case
1 / 67 1. From Linear to Conic Programming Linear Programming
Linear Optimization in standard form:
minimize cT x subject to Ax = b x ≥ 0.
Pro: • Structure is explicit and simple. • Data is simple: c, A, b. • Structure implies data-independent convexity. • Powerful duality theory including Farkas lemma.
Therefore, we have powerful algorithms and software.
3 / 67 Linear Programming
Con: • It is linear only.
4 / 67 Nonlinear Programming
The classical nonlinear optimization problem:
minimize f (x) subject to h(x) = 0 g(x) ≤ 0.
Pro • It is very general. Con: • Structure is less explicit. • How to specify the problem at all in software? • How to compute gradients and Hessians if needed? • How to exploit structure? • Smoothness? • Verifying convexity is NP-hard! 5 / 67 A fundamental question
Is there a class of nonlinear optimization problems that preserves possibly many of the good properties of Linear Programming?
6 / 67 Good partial orderings
Definition (Ben-Tal & Nemirovski, (2001)) n A “good” partial ordering of R is a vector relation that satisfies: 1. reflexivity 2. antisymmetry 3. transitivity 4. compatibility with linear operations
The coordinatewise ordering
x ≥ y ⇐⇒ xi ≥ yi ∀i = 1,..., n
is an example, but not the only one!
7 / 67 Good partial orderings
• If for some good partial ordering “” we define
n K := {a ∈ R | a 0}, then K must be a pointed, convex cone: 1. a, a0 ∈ K =⇒ a + a0 ∈ K 2. a ∈ K, λ ≥ 0 =⇒ λa ∈ K 3. a ∈ K and −a ∈ K =⇒ a = 0
• Conversely, if K is a non-empty pointed convex cone, then x K y :⇐⇒ x − y ∈ K defines a good partial ordering.
n The cone R+ is also closed and has a non-empty interior.
8 / 67 (Mixed-Integer) Conic Programming
We thus consider problems of the form minimize cT x subject to Ax = b n p n x ∈≥ RK0 +∩ (Z × R )
where K is a (closed) pointed convex cone (with non-empty interior).
• (MI)LP is a special case!
• Typically, K = K1 × K2 × · · · × KK is a product of lower-dimensional cones.
• A conic building block Ki can be thought of as encoding some type of specific non-linearity. 9 / 67 The beauty of conic optimization
• Separation of data and structure: • Data: c, A and b. • Structure: K.
• No issues with smoothness and differentiability. • Structural convexity. • Duality (almost...).
10 / 67 References I
• A. Ben-Tal & A. Nemirovski: Lectures on Modern Convex Optimization (2001).
11 / 67 2. Examples & Applications of cones The conic wheel
LP cone
K
13 / 67 Quadratic cones
n After the non-negative orthant R+, the quadratic-cone family is arguably most prominent.
• the quadratic cone
n n 2 21/2 Q = {x ∈ R | x1 ≥ x2 + ··· + xn = kx2:nk2},
• the rotated quadratic cone n n 2 2 2 Qr = {x ∈ R | 2x1x2 ≥ x3 + ··· + xn = kx3:nk2, x1, x2 ≥ 0}.
n n Are equivalent in the sense that x ∈ Q ⇐⇒ Tnx ∈ Qr with √ √ 1/√21 / √20 1/ 2 −1/ 20 . 00 In−2 14 / 67 Quadratic cones in dimension 3
x1
x1
x3
x3 x2
x2
15 / 67 Conic quadratic case study: least squares regression
In least squares regression we use the penalty function
φ(r) = krk2.
n In its simplest form, given observations y ∈ R and features n×d X ∈ R , it solves min ky − Xwk2. w∈Rd Start with a small and simple amount of reformulation:
minimize t subject to t ≥ ky − Xwk2 d t ∈ R, w ∈ R
16 / 67 Conic quadratic case study: least squares regression
In the conic framework this would be written as minimize t subject to s = y − Xw (t, s) ∈ Qn+1 d w ∈ R .
We usually use the more compact notation
(t, y − Xw) ∈ Qn+1.
17 / 67 More conic quadratic modeling
• Second-order cone inequality:
T T m+1 c x + d ≥ kAx + bk2 ⇐⇒ (c x + d, Ax + b) ∈ Q .
• Squared Euclidean norm:
2 n+2 t ≥ kxk2 ⇐⇒ (t, 1/2, x) ∈ Qr .
• Convex quadratic inequality:
T T k+2 t ≥ (1/2)x Qx ⇐⇒ (t, 1, F x) ∈ Qr
T n×k with Q = F F , F ∈ R .
Any convex (MI)QCQP can be cast in conic form! 18 / 67 More conic quadratic modeling
• Square roots, convex hyperbolic function, some convex negative rational powers...
• Convex positive rational power
t ≥ x3/2, x ≥ 0 :
If we impose 1 (s, t, x), (x, 1/8, s) ∈ Q3 ⇐⇒ 2st ≥ x2, 2x · ≥ s2, r 8 it follows that 1 4s2t2 · x ≥ x4s2 =⇒ t2 ≥ x3 =⇒ t ≥ x3/2. 4
19 / 67 The positive semidefinite cone
The positive semidefinite cone can be defined as a subspace of the n(n+1)/2 vector space R n(n+1)/2 n(n+1)/2 T n S := {x ∈ R | z smat(x)z ≥ 0, ∀z ∈ R }, with √ √ x1√ x2/ 2 ... xn/ √2 x2/ 2 xn+1 ... x2n−1/ 2 smat(x) := . . . . .√ . √ . xn/ 2 x2n−1/ 2 ... xn(n+1)/2 An equivalent definition via matrix variables: n n T n X ∈ S+ :⇐⇒ X ∈ S and z Xz ≥ 0 ∀z ∈ R . X is mapped to Sn(n+1)/2 via √ √ √ T svec(X ) := (X11, 2X21,..., 2Xn1, X22, 2X32,..., Xnn) .
20 / 67 SDP use-case: Nearest correlation matrix
n Let A ∈ S and assume we want to find its nearest correlation matrix
∗ n X ∈ C := {X ∈ S+ | Xii = 1 ∀i = 1,..., n},
i.e., ∗ X = min kA − X kF . X ∈C A conic formulation in vector space is given by
minimize t subject to x1 = xn+1 = x2n = ... = xn(n+1)/2 = 1 (t, svec(A) − x) ∈ Qn(n+1)/2+1 x ∈ Sn(n+1)/2.
21 / 67 More semidefinite modeling
• SDP can come in handy in eigenvalue optimization, e.g., if
tI − X n 0, S+ then t is an upper bound on the largest eigenvalue of X .
• SDP-relaxations play a role in Quadratic Programming and in Combinatorial Optimization:
T T X = xx can be relaxed to X − xx n 0. S+
• There are applications for Mixed-Integer SDP, see, e.g., Gally, Pfetsch and Ulbrich (2018).
22 / 67 Back to the conic wheel
LP cone
K quadratic cones
SDP
The three cones we have seen so far are so-called symmetric cones, i.e., they are 1. homogeneous 2. self-dual 23 / 67 The exponential cone
The exponential cone is defined as the closure of the epigraph of the perspective of the exponential function:
3 Kexp := cl{x ∈ R | x1 ≥ x2 exp(x3/x2), x2 > 0},
or more explicitly
Kexp = {(x1, x2, x3) | x1 ≥ x2 exp(x3/x2), x2 > 0} [
{(x1, 0, x3) | x1 ≥ 0, x3 ≤ 0}.
The exponential cone is non-symmetric!
24 / 67 The exponential cone
x1
x3
x2
25 / 67 Exponential cone use case: Geometric Programming
Consider the very simple Geometric Program
0.3 minimize √x + y z subject to x + y −1 ≤ 1 x, y, z > 0
First note that ex1 + ... + exk ≤ 1 can be modeled as
k X (ui , 1, xi ) ∈ Kexp ∀i = 1,..., k and ui ≤ 1, i=1 and then substitute x = ep, y = eq, z = er :
minimize t subject to( u1, 1, p − t), (u2, 1, 0.3q + w − t) ∈ Kexp, u1 + u2 ≤ 1 (v1, 1, p/2), (v2, 1, −q) ∈ Kexp, v1 + v2 ≤ 1
26 / 67 More exponential cone modeling
• Logarithm:
log x ≥ t ⇐⇒ (x, 1, t) ∈ Kexp.
• Entropy:
−x log x ≥ t ⇐⇒ (1, x, t) ∈ Kexp.
• Relative entropy:
x log(x/y) ≤ t ⇐⇒ (y, x, −t) ∈ Kexp.
• Softplus function: x log(1+e ) ≤ t ⇐⇒ (u, 1, x−t), (v, 1, −t) ∈ Kexp, u+v ≤ 1.
27 / 67 The power cone
The power cone is defined as
α n α (1−α) Pn = {x ∈ R | x1 x2 ≥ kx3:nk2, x1, x2 ≥ 0},
for0 < α < 1.
One may also restrict to the three-dimensional power cone without losing any modeling capabilities:
α α n−1 (x1,..., xn) ∈ Pn ⇐⇒ (x1, x2, z) ∈ P3 , (z, x3,..., xn) ∈ Q .
Also the power cone is non-symmetric!
28 / 67 The power cone
x1 x1
x3 x3
α = 0.6 x2 α = 0.8 x2
29 / 67 Power cone modeling
• Simple powers:
p p |t| ≤ x , x ≥ 0 with 0 < p < 1 ⇐⇒ (x, 1, t) ∈ P3 .
p 1/p t ≥ |x| with p > 1 ⇐⇒ (t, 1, x) ∈ P3 .
3/2 2/3 Example: t ≥ x , x ≥ 0 ⇐⇒ (t, 1, x) ∈ P3 (instead of 3 (s, t, x), (x, 1/8, s) ∈ Qr ...)
• p-norms, geometric mean, ...
30 / 67 How general is the conic wheel?
exponential LP cone cone Continuous Optimization Folklore
power K quadratic “Almost all convex constraints which
cone cones arise in practice are representable using these 5 cones.” SDP
More evidence: Lubin et al. (2016) show that all convex instances (333) in MINLPLIB2 are conic representable using only 4 of the above cones.
31 / 67 Verifying convexity
Is log(1/(1 + exp(−x))) ≤ 0 a convex constraint?
From ask.cvxr.com:
Verifying convexity can be hard!
Solution: Disciplined Convex Programming (DCP) by Grant, Boyd and Ye (2006): only allow for modeling operations that preserve convexity.
32 / 67 Extremely Disciplined Convex Programming
We call modeling with the aforementioned 5 cones
Extremely Disciplined Convex Programming.
• More strict than DCP.. • ... but leading to guaranteed convexity and conic-representability. • Aiming at the development of efficient numerical algorithms.
33 / 67 Are there more cones?
For every convex function g(x) the set
K := cl{(y, s, x) | y ≥ s · g(x/s)}
is a closed pointed convex cone. So
y ≥ g(x) ⇐⇒ (y, 1, x) ∈ K.
But how de we handle K computationally, and is it tractable?
34 / 67 Do we need more cones?
35 / 67 Do we need more cones?
The Extremely-DCP framework is very general, but does it have limitations? • A folkloristic saying is not a formal theorem. • More cones may lead to less reformulation.
Coey, Kapelevich and Vielma (2020) introduce a framework for Generic Conic Programming, treating more exotic cones.
In the literature, note the prominent appearance of • the completely positive, • the copositive • and the doubly-non-negative cone.
36 / 67 Exotic cones
• The infinity norm cone n K`∞ = {x ∈ R | x1 ≥ kx2:nk∞}
• The relative entropy cone d d d X Kentr = cl{(x, u, v) ∈ R × R × R | x ≥ ui log(ui /vi )} i=1 • The spectral norm cone
d1×d2 Kspec(d1,d2) = {(x, X ) ∈ R × R | x ≥ σ1(X )}
• Root-determinant cone, Log-determinant cone, Polynomial weighted sum-of-squares cone, ... xId1 X d1+d2 For example,( x, X ) ∈ Kspec(d ,d ) ⇐⇒ ∈ . 1 2 T S+ 37 / 67 X xId2 The extended conic wheel
Conic software: • MOSEK: LP, QCP, SDP, Exp, Pow, with MI support • SeDuMi, CSDP, SDPA, SDPD, N orm SDPT3: SDP, QCP co n exponential LP e • CPLEX, Gurobi, XPRESS: s , (MI)-LP and -SOCP cone cone m
a
t • SCS: LP, QCP, SDP, Exp, Pow
r i
power K quadratic x
c • ECOS: QCP, Exp
cone cones o n
e
s • SCIP-SDP: MI-SDP
SDP ,
o
.
t
.
h
.
s
e •
e : OA-framework for r Pajarito.jl
n
e o x
c o t c i MI, -QCP, -SDP, -Exp • Hypatia.jl: Generic Conic Programming • Modeling: CVX, Yalmip, JuMP 38 / 67 References II
• The MOSEK modeling cookbook (2020).
• T. Gally, M. Pfetsch, S. Ulbrich: A Framework for Solving Mixed-Integer Semidefinite Programs (2018). • M. Lubin and E. Yamangil and R. Bent, J. P. Vielma: Extended Formulations in Mixed-integer Convex Programming (2016). • M. Grant, S. Boyd, Y. Ye: Disciplined Convex Programming (2006). • C. Coey, L. Kapelevich, J. P. Vielma: Towards Practical Generic Conic Optimization (2020).
39 / 67 3. Conic duality Lagrangian duality
Recall the nonlinear optimization problem
minimize f (x) subject to h(x) = 0 g(x) ≤ 0.
The Lagrangian duality approach defines the Lagrange function
L(x, µ, λ) = f (x) + µT h(x) + λT g(x),
and the dual function
g(µ, λ) = inf L(x, µ, λ). x If g(µ, λ) > −∞, we call( µ, λ) dual feasible.
41 / 67 LP is special...
... because the dual takes on an explicit form:
f (x) = cT x, h(x) = Ax − b, g(x) = −x
leads to the Lagrange function
L(x, µ, λ) = cT x + µT (Ax − b) − λT x,
and the dual function is finite if (dual feasibility!)
AT µ + c − λ = 0 and λ ≥ 0.
Note that λ ≥ 0 guarantees −λT x ≤ 0 (for primal feasible x).
42 / 67 Extending duality to Conic Programming
In the conic framework
minimize cT x subject to Ax − b = 0 x ∈ K,
we need dual variables λ that satisfy
−λT x ≤ 0 ∀x ∈ K,
thus giving rise to the set
∗ n T K = {y ∈ R | y x ≥ 0 ∀x ∈ K}.
For any ∅= 6 K, K∗ is a closed convex cone, and if K is a cone, we call K∗ its dual cone!
43 / 67 Conic dual problem
The conic dual takes on the (explicit!) form
maximize bT y subject to −AT y + c − λ = 0 λ ∈ K∗,
and the feasible set can more compactly be written as
T ∗ T c − A y ∈ K or c ≥K∗ A y.
Weak duality comes for free:
bT y = (Ax)T y = xT · AT y = xT · (c − λ) = cT x − λT x ≤ cT x.
44 / 67 Dual cones
n n n Self-duality: for K ∈ {R+, Q , S+},
∗ x1 K = K.
x 3 x1
x2
x3 x2
Kexp is not self-dual:
∗ 3 (Kexp) = cl{x ∈ R | x1 ≥ −x3 exp(x2/x3), x3 < 0}
45 / 67 Farkas lemma - the LP case
b
n A(R+) = {Ax | x ≥ 0} a1 a2
a3