Cutting Plane and Subgradient Methods

John E. Mitchell

Department of Mathematical Sciences, RPI, Troy, NY 12180, USA

October 12, 2009

Collaborators: Luc Basescu, Brian Borchers, Mohammad Oskoorouchi, Srini Ramaswamy, Kartik Sivaramakrishnan, Mike Todd

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial

Outline

1 Interior Point Cutting Plane and Column Generation Methods
  Introduction
  MaxCut
  Interior point cutting plane methods
  Warm starting
  Theoretical results
  Stabilization

2 Cutting Surfaces for Semidefinite Programming
  Semidefinite Programming
  Relaxations of dual SDP
  Computational experience
  Theoretical results with conic cuts
  Condition number

3 Smoothing techniques and subgradient methods

4 Conclusions

Interior Point Cutting Plane and Column Generation Methods

Introduction

The Traveling Salesman Problem

TSP webpage: Bill Cook, Georgia Tech. Robert Bosch, Oberlin College.

Lower bounds determined using cutting planes and branch-and-cut.

MIP Formulation

Want to solve the problem

max_{y ∈ R^m}  b^T y
subject to     A^T y ≤ c
               y_i integer for i ∈ I

LP Relaxation

Relax the integrality restriction:

Dual problem:                       Primal problem:
max_{y ∈ R^m}  b^T y                min_{x ∈ R^n, x_0}  c^T x + c_0 x_0
s.t.  A^T y ≤ c                     s.t.  Ax + a_0 x_0 = b
      a_0^T y ≤ c_0                       x, x_0 ≥ 0

Add a cutting plane to the dual problem

Corresponds to column generation in the primal.


Cutting plane algorithm illustration

[Figure sequence: the cutting plane algorithm on the dual feasible region.
Start with {y ∈ R^m : A^T y ≤ c} and solve the LP relaxation.
Add the cutting plane a_1^T y = c_1, giving {y ∈ R^m : A^T y ≤ c, a_1^T y ≤ c_1}, and re-solve.
Add the cutting plane a_2^T y = c_2, giving {y ∈ R^m : A^T y ≤ c, a_1^T y ≤ c_1, a_2^T y ≤ c_2}, and re-solve.
The solution to this LP relaxation is integral, so it is optimal for the IP.]

MaxCut

Valid constraints for MaxCut

Find the maximum cut in a graph.

Find the ground state of an Ising spin glass.
Vertices ↔ spins: "Up" or "Down".
Edges ↔ interactions: ±1.
Interactions known. Deduce spins.

y_uv = 1 if u, v have opposite spins, 0 if u, v have the same spin.
y ↔ incidence vector of the cut.

Valid constraint: any cut and any cycle intersect in an even number of edges. For a cycle with edges e, f, g, h this gives, for example, y_e + y_f + y_g − y_h ≤ 2.
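The parity fact behind these cut-cycle inequalities can be checked by brute force; here is a minimal Python sketch on a toy graph (the 4-cycle with a chord, and the listed cycles, are illustrative choices, not from the talk):

```python
from itertools import combinations

# Check, by enumerating every vertex bipartition of a toy graph, the
# valid-constraint fact above: every cut and every cycle intersect in
# an even number of edges.  Graph: 4-cycle a-b-c-d plus chord a-c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
cycles = [
    [("a", "b"), ("b", "c"), ("c", "a")],              # triangle using the chord
    [("a", "c"), ("c", "d"), ("d", "a")],              # the other triangle
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],  # outer 4-cycle
]

und = frozenset  # treat edges as undirected

for r in range(len(nodes) + 1):
    for side in combinations(nodes, r):   # every cut (vertex bipartition)
        s = set(side)
        cut = {und(e) for e in edges if (e[0] in s) != (e[1] in s)}
        for cyc in cycles:
            crossings = sum(und(e) in cut for e in cyc)
            assert crossings % 2 == 0     # always even
print("every cut meets every cycle in an even number of edges")
```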


Integer programming formulation

(IP)  max Σ_{e ∈ E} b_e y_e
      subject to  y satisfies all cut-cycle inequalities
                  y binary

(LP)  max Σ_{e ∈ E} b_e y_e
      subject to  y satisfies all cut-cycle inequalities
                  y satisfies additional linear constraints
                  0 ≤ y ≤ 1

Algorithm:
1 Initialize: just the box constraints 0 ≤ y ≤ 1.
2 Solve the LP relaxation.
3 Add violated constraints as required.
4 If not yet converged, return to Step 2.

Can typically solve 100x100 grids in about 5 minutes with b_e = ±1.


Computational results

Compare interior point and simplex cutting plane algorithms. Times in seconds (rounded). b_e = ±1. One problem of each size.

Grid      Interior   Simplex   Ratio
50x50         4         18      4.87
60x60        31        271      8.90
70x70        54        786     14.75
80x80        51        639     12.72
90x90       223          *         *
100x100     187          *         *

Interior point: one core of a Mac Pro with 2 x 2.8 GHz Quad-Core Intel Xeon. Personal code.
Simplex (spin glass server): Intel Celeron M CPU 440 @ 1.86 GHz. CPLEX 9.1.

Interior is faster, with the ratio getting better as problem size increases.

Interior point cutting plane methods

Solve the LP relaxations using interior point methods: good for large-scale LPs.
These methods work well when a lot of violated cuts can be added at once.
The LP relaxations need only be solved approximately, so the resulting cutting planes can be more centered.
Combining interior point and simplex methods is especially effective: use the interior point method initially; switch to simplex once sufficiently close to optimality.

Comparing the strength of simplex and interior point cutting planes

[Figure, two panels. Simplex: the added cutting plane passes through the optimal vertex found by simplex. Interior point method: the added cutting plane passes through an interior point iterate lying on the central trajectory, near the optimal face.]

Linear ordering problems

[Figure: scatter plot of the ratio (combined time / simplex time, from about 0.1 up to 6) against simplex time in seconds (up to about 8000), for several problem classes: 0% zeroes, 10% zeroes, 20% zeroes, Type B, Type C.]

Warm starting

Warm starting an interior point method

Can’t be done

Can be done to some degree! Can reduce iteration counts by 50%, perhaps more.

Need to make some effort to recover centrality instead of immediately aiming for the new optimal solution.

Useful to store potential restart points: earlier iterates, or earlier approximate analytic centers (Gondzio), or problem-specific points.

Want to get to the final, straighter portion of the trajectory.

Using Dikin ellipsoid to restart

[Figure sequence: the current relaxation {y ∈ R^m : A^T y ≤ c} with iterate ȳ. The Dikin ellipsoid is an inscribed ellipsoid centered at ȳ. A cutting plane a_0^T y = a_0^T ȳ is added through ȳ.]

Use the Dikin ellipsoid as a proxy for the old constraints.

Dikin ellipsoid: find restart direction d

[Figure: the restart direction d leads from ȳ into the Dikin ellipsoid E(ȳ, s̄), moving off the added constraint a_0^T y = a_0^T ȳ.]

Move as far off the added constraint as possible, while staying within the Dikin ellipsoid.
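For one added cut, this subproblem has a standard closed-form solution. The NumPy sketch below is an assumption-laden illustration, not code from the talk: the data are randomly generated, and the Dikin ellipsoid is taken in the usual dual form {y : (y − ȳ)^T A S̄^{-2} A^T (y − ȳ) ≤ 1} with slacks s̄ = c − A^T ȳ.

```python
import numpy as np

# Move as far off the added constraint a0^T y <= a0^T y_bar as possible
# while staying inside the Dikin ellipsoid d^T H d <= 1, H = A S^{-2} A^T.
# Closed form: d = -H^{-1} a0 / sqrt(a0^T H^{-1} a0).

rng = np.random.default_rng(1)
m, n = 2, 5
A = rng.standard_normal((m, n))       # columns a_i of the dual constraints
y_bar = rng.standard_normal(m)        # current iterate (illustrative)
s = rng.uniform(0.5, 2.0, n)          # positive slacks c - A^T y_bar
a0 = rng.standard_normal(m)           # normal of the added cut

H = A @ np.diag(1.0 / s**2) @ A.T
Hinv_a0 = np.linalg.solve(H, a0)
d = -Hinv_a0 / np.sqrt(a0 @ Hinv_a0)  # maximizer of -a0^T d on the ellipsoid

assert np.isclose(d @ H @ d, 1.0)     # d lands on the ellipsoid boundary
print(a0 @ d < 0)                     # True: moves into a0^T y < a0^T y_bar
```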

Another Dikin ellipsoid: find restart direction d when adding multiple constraints

[Figure: the restart direction d leads from ȳ into the Dikin ellipsoid E(ȳ, s̄), moving off the added constraints A_0^T y ≤ A_0^T ȳ.]

Minimize the potential function of the new slack variables, −Σ_i ln s_i, while staying within the Dikin ellipsoid. (Goffin and Vial)

Primal restart

min_{x ∈ R^n, x_0}  c^T x + c_0 x_0
s.t.  Ax + a_0 x_0 = b
      x, x_0 ≥ 0

Restart from a point x = x̄, x_0 = 0. Using a scaling matrix D, get direction

∆x := −D^2 A^T (A D^2 A^T)^{-1} a_0

This can be derived from the Dikin ellipsoid. Can be generalized to the addition of multiple columns.
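A minimal NumPy sketch of this restart direction. The choice D = diag(x̄) is one common interior point scaling, assumed here since the slides leave D unspecified, and the data are randomly generated for illustration:

```python
import numpy as np

# Primal restart direction from the slides: after adding a column a0 to
#   min c^T x + c0 x0  s.t.  Ax + a0 x0 = b,  x, x0 >= 0,
# restart from x = x_bar, x0 = 0 along
#   dx := -D^2 A^T (A D^2 A^T)^{-1} a0.

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x_bar = rng.uniform(0.5, 2.0, n)      # current interior iterate
b = A @ x_bar                         # so x_bar satisfies A x = b
a0 = rng.standard_normal(m)           # column of the added variable

D2 = np.diag(x_bar**2)                # D^2 with D = diag(x_bar), an assumption
dx = -D2 @ A.T @ np.linalg.solve(A @ D2 @ A.T, a0)

# Key property: A dx = -a0, so A(x_bar + t*dx) + a0*t = b for any step t,
# i.e. the equality constraints stay satisfied as x0 grows from zero.
print(np.allclose(A @ dx, -a0))       # True
```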


Theoretical results

Theoretical results for the convex feasibility problem

Convex feasibility problem: given a convex set C and a polyhedral outer approximation to it, either find a point in C or determine that C is empty.

[Figure sequence: the iterate ȳ is an approximate center of the polyhedral outer approximation; a cut through ȳ updates the outer approximation, and the iterate is updated to a center of the new approximation.]

Convergence theorem (Goffin and Vial)

Use an interior point cutting plane algorithm.
Use approximate analytic centers as iterates.
Add up to p cuts at each call to the oracle.
Never drop cuts.

Theorem
If C contains a ball of radius ε, then a certain interior point cutting plane method converges after adding no more than O(m^2 p^2 / ε^2) cutting planes.

Each recentering step requires at most O(p ln p) Newton steps.

Convergence theorems (Atkinson and Vaidya)

Also allow dropping of constraints. Add at most one cut at a time.

Theorem
If C contains a ball of radius ε, then an interior point variant converges after no more than O(m (ln(1/ε))^2) iterations, when extra conditions are used to determine the constraints to add and drop.

Volumetric center cutting plane methods: add or drop one constraint at a time.

Theorem
If C contains a ball of radius ε, then an interior point variant stops in O(m ln(1/ε)) calls to the oracle and O(m ln(1/ε)) approximate Newton steps.

Stabilization

Links with Stabilization

Maximize a concave function f(y). Subgradient inequality: f(y) ≤ f(ȳ) + ξ^T (y − ȳ).

[Figure sequence: subgradient inequalities at the iterates y^1, y^2, y^3, ... build a piecewise linear overestimator of f; each new iterate maximizes the current overestimator, and its subgradient inequality is added as a new piece.]

Stabilization / regularization

Iterates jump around if the piecewise linear overestimator is solved to optimality.

Try to stabilize the process.

One possibility: subtract a proximal term (u/2) ||y − ȳ||^2 from the objective. Used in bundle methods.

Alternative: use interior point method and only solve the relaxations approximately.
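A minimal 1-D sketch of the proximal variant in Python. The function f, the weight u, and the grid search standing in for the model solve are all illustrative choices, not details from the talk:

```python
# Cutting plane model with a proximal term, as in bundle methods:
# maximize concave f via subgradient cuts f(y) <= f(yk) + g_k (y - yk),
# subtracting (u/2)(y - y_bar)^2 from the piecewise linear model.

f  = lambda y: -(y - 3.0) ** 2          # concave, maximized at y = 3
df = lambda y: -2.0 * (y - 3.0)         # its gradient (a supergradient)

cuts = []                               # (f(yk), g_k, yk) triples
y_bar, u = 0.0, 1.0                     # stability center, proximal weight

for _ in range(30):
    cuts.append((f(y_bar), df(y_bar), y_bar))
    model = lambda y: min(fk + gk * (y - yk) for fk, gk, yk in cuts)
    # Maximize model(y) - (u/2)(y - y_bar)^2; a grid search stands in
    # for the LP/QP solve purely for illustration.
    grid = [i / 100.0 for i in range(-500, 1001)]
    y_bar = max(grid, key=lambda y: model(y) - 0.5 * u * (y - y_bar) ** 2)

print(round(y_bar, 2))                  # settles at the maximizer y = 3.0
```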

Stabilization with interior point methods

[Figure sequence: the piecewise linear overestimator together with a lower bound on the optimal value; the next iterate is an approximate analytic center of the region between the overestimator and the lower bound, and its subgradient inequality is added before recentering.]

Cutting Surfaces for Semidefinite Programming

Semidefinite Programming

SDP relaxation of MaxCut

y_e = 1 if edge e is in the cut, 0 otherwise. Let x_uv = 1 − 2 y_uv for each pair of vertices u, v.

The x_uv can be regarded as entries in a matrix X.

Let z be a vector with z_u = +1 on one side of the cut and z_u = −1 on the other side. Then X = z z^T.

max_X  C • X   (Frobenius inner product)
subject to  X_vv = 1 for all v ∈ V
            X ⪰ 0   (i.e., X symmetric, positive semidefinite)
            rank(X) = 1

Relaxation: drop the rank requirement. Then get within 0.878 of optimality if all c_e ≥ 0 (Goemans and Williamson).
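The rank-one lifting can be sanity-checked numerically. In this NumPy sketch the toy graph, its weights, and the standard cut-weight form (1/4) Σ_{u,v} w_uv (1 − X_uv) are illustrative assumptions, since the slides leave the cost matrix C unspecified:

```python
import numpy as np

# Check the MaxCut lifting on a toy 4-vertex example: X = z z^T is
# feasible for the rank-constrained SDP, and the cut weight is
# recovered by the standard objective (1/4) sum_uv w_uv (1 - X_uv).
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # 4-cycle, unit weights

z = np.array([1.0, -1.0, 1.0, -1.0])        # cut {0, 2} vs {1, 3}
X = np.outer(z, z)                          # X = z z^T

assert np.allclose(np.diag(X), 1)                # X_vv = 1 for all v
assert np.linalg.matrix_rank(X) == 1             # rank one
assert np.all(np.linalg.eigvalsh(X) >= -1e-9)    # positive semidefinite

cut_weight = 0.25 * np.sum(W * (1 - X))     # each edge counted twice in W
print(cut_weight)                           # 4.0: all four edges are cut
```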


Applications of Semidefinite Programming

Combinatorial optimization: obtain stronger relaxations than using linear programming, for MaxCut, Satisfiability, Independent Set, ...
Control theory.
Structural optimization.
Electronic structure calculation and ground states.
Relaxations of other optimization problems: for example, quadratically constrained quadratic programs.
Statistics: for example, principal component analysis.
Data mining: distance metric learning.


Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 34 / 72 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Semidefinite Programming

A semidefinite program has a linear objective function and linear constraints. The variables can be written in the form of a matrix. This matrix of variables is constrained to be positive semidefinite. SDPs can be solved in polynomial time using interior point methods. SDPs generalize linear programming. Can be generalized further to conic programming: linear objective function, linear constraints, the variables belong to a convex cone.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 35 / 72 Cutting Surfaces for Semidefinite Programming Semidefinite Programming SDP primal-dual pair

Primal                              Dual
min_X  C • X                        max_y  b^T y
s.t.   A(X) = b                     s.t.   A*(y) + S = C
       X ⪰ 0                               S ⪰ 0

Notation: C • X = Σ_{i=1}^n Σ_{j=1}^n C_ij X_ij = trace(CX).

A(X) is an m-vector with components A_i • X. A*(y) is an n × n matrix: A*(y) = Σ_{i=1}^m y_i A_i.
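The operators A and A* above are adjoint to each other: ⟨A(X), y⟩ = A*(y) • X. A minimal numpy sketch of both maps (the function names and the random data are illustrative):

```python
import numpy as np

# A(X) stacks A_i . X into an m-vector; A*(y) = sum_i y_i A_i.
def A_op(A_list, X):
    # A_i . X = trace(A_i X) for symmetric A_i (Frobenius inner product)
    return np.array([np.tensordot(Ai, X) for Ai in A_list])

def A_adj(A_list, y):
    return sum(yi * Ai for yi, Ai in zip(y, A_list))

# Adjoint identity: <A(X), y> = A*(y) . X
rng = np.random.default_rng(1)
A_list = [(M + M.T) / 2 for M in rng.standard_normal((3, 4, 4))]
X = rng.standard_normal((4, 4)); X = (X + X.T) / 2
y = rng.standard_normal(3)
assert np.isclose(A_op(A_list, X) @ y, np.tensordot(A_adj(A_list, y), X))
```

The adjoint identity is what makes the primal and dual above a matched pair: the dual constraint A*(y) + S = C is the transpose of the primal constraint map.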

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 36 / 72 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Assumptions

1 The matrices Ai are linearly independent.

2 Primal and dual problems both have strictly feasible solutions. Consequently: the primal and dual values agree.

3 Every primal feasible solution satisfies trace(X) = 1. Equivalent to assuming the primal problem has a bounded feasible region. SDP relaxations of combinatorial optimization problems can be rescaled to satisfy this assumption. First two assumptions are standard in interior point literature for SDP. Third one is due to Helmberg and Rendl (2000).

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 37 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 38 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Why cutting planes and surfaces?

Interior point algorithms solve SDPs in polynomial time. But, computational time becomes impractical for larger problems.

Alternatively, use relaxations or approximations to get a good solution reasonably quickly. These relaxation approaches will also typically give an indication of a final duality gap.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 39 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP An example of an SDP feasible region

[Figure: feasible region in y-space; the boundary passes through (4, 2) and (4, −2)]

Feasible region in y-space:

        [ y1    y2       0      ]
S  =    [ y2    y1 − 3   0      ]   ⪰  0
        [ 0     0        y1 − 4 ]

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 40 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Tangent line approximation

[Figure: tangent line approximations to the same feasible region, touching the boundary near (4, 2) and (4, −2)]

Linear approximation to the region where

        [ y1    y2       0      ]
S  =    [ y2    y1 − 3   0      ]   ⪰  0.
        [ 0     0        y1 − 4 ]

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 41 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP A relaxation of SDP

Restricted primal, relaxed dual: For a given matrix P:

Restricted primal:  min_V  C • PVP^T    s.t.  A(PVP^T) = b,   V ⪰ 0
Relaxed dual:       max_{y,S}  b^T y    s.t.  A*(y) + S = C,  P^T S P ⪰ 0

Restrict X to be equal to PVP^T. Note V ⪰ 0 ⇒ X ⪰ 0. Relax S ⪰ 0 to P^T S P ⪰ 0. Possibly have additional restrictions on V and a corresponding weakening of the condition P^T S P ⪰ 0. The matrix P is updated over the course of the algorithm.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 42 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Algorithmic framework

1. Approximately solve the modified problem, obtaining X = PVP^T and S.

2. If S is not positive semidefinite: add one or more columns to P corresponding to eigenvectors of S with negative eigenvalues. If desired, delete columns of P, or aggregate them into a linear term. Return to Step 1.

3. If the duality gap is too large, tighten the tolerance and return to Step 1.

4. STOP with an approximate solution to the SDP.
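Step 2 of the framework can be sketched in a few lines of numpy: eigendecompose S and append the eigenvectors with negative eigenvalues as new columns of P. The helper name `add_cut_columns` and the data below are illustrative, not from the slides:

```python
import numpy as np

# One application of Step 2: append eigenvectors of S with negative
# eigenvalues as new columns of P; report whether S was already psd.
def add_cut_columns(P, S, tol=1e-8):
    vals, vecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    neg = vals < -tol
    if not neg.any():
        return P, True                  # S psd: no cut needed
    return np.hstack([P, vecs[:, neg]]), False

S = np.diag([2.0, 1.0, -0.5])           # not psd: one negative eigenvalue
P = np.eye(3)[:, :1]
P_new, done = add_cut_columns(P, S)
assert not done and P_new.shape == (3, 2)
```

Each appended column v adds the cut v^T S v ≥ 0, which the current S violates since v^T S v is its negative eigenvalue.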

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 43 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Why update P with eigenvectors?

(See Helmberg (2000), for example).

Lagrangian relaxation of primal SDP:

Θ(y) := min_X   b^T y + (C − Σ_{i=1}^m y_i A_i) • X
        subject to   I • X = 1
                     X ⪰ 0.

Lagrangian dual problem:

max_y  Θ(y) = b^T y + λ_min(C − Σ_{i=1}^m y_i A_i).

(λmin denotes the minimum eigenvalue.)

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 44 / 72 Cutting Surfaces for Semidefinite Programming Relaxations of dual SDP Lagrangian dual problem

max_y  Θ(y) = b^T y + λ_min(C − Σ_{i=1}^m y_i A_i).

Under our assumptions, the optimal value of the Lagrangian dual problem is equal to the optimal value of the SDP, so there is no duality gap. The Lagrangian dual function is concave and nonsmooth. Overestimate it; improve the overestimate with subgradient inequalities. The subdifferential of Θ(y) is derived from the eigenspace corresponding to the minimum eigenvalue. If the columns of P form an orthonormal basis for this eigenspace, the subdifferential inequality is P^T S P ⪰ θI, where θ is an estimate of λ_min(C − Σ_{i=1}^m y_i A_i).
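Evaluating Θ(y) and one supergradient only requires a minimum-eigenvalue computation: with v a unit eigenvector for λ_min(C − Σ_i y_i A_i), the i-th supergradient component is b_i − v^T A_i v. A hedged numpy sketch (function name and random data are illustrative):

```python
import numpy as np

# Theta(y) = b^T y + lambda_min(C - sum_i y_i A_i), with a supergradient
# built from the eigenvector v of the minimum eigenvalue.
def theta_and_supergradient(C, A_list, b, y):
    M = C - sum(yi * Ai for yi, Ai in zip(y, A_list))
    vals, vecs = np.linalg.eigh(M)          # ascending eigenvalues
    v = vecs[:, 0]                          # eigenvector of lambda_min
    theta = b @ y + vals[0]
    g = b - np.array([v @ Ai @ v for Ai in A_list])
    return theta, g

rng = np.random.default_rng(2)
C = rng.standard_normal((4, 4)); C = (C + C.T) / 2
A_list = [(M + M.T) / 2 for M in rng.standard_normal((2, 4, 4))]
b = rng.standard_normal(2)
y = np.zeros(2)
th, g = theta_and_supergradient(C, A_list, b, y)

# Concavity check: Theta(y') <= Theta(y) + g . (y' - y) for any y'
yp = y + 0.1 * rng.standard_normal(2)
thp, _ = theta_and_supergradient(C, A_list, b, yp)
assert thp <= th + g @ (yp - y) + 1e-9
```

The final assertion is exactly the supergradient inequality for the concave function Θ; it holds globally, not just locally.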

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 45 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 46 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Variant 1: No additional restrictions on V

min_V  C • PVP^T                    max_{y,S}  b^T y
s.t.   A(PVP^T) = b                 s.t.  A*(y) + S = C
       V ⪰ 0                              P^T S P ⪰ 0

X = PVP^T

Approximate the original SDP constraint by a single lower-dimensional SDP constraint. If rank(P) is equal to the dimension of the original X then this formulation is exactly equivalent to the original SDP. Typically, also include an extra linear term in the primal problem, and corresponding linear constraint in the dual, W • S ≥ 0.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 47 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Experience with lower-dimensional SDP constraint Eg: Spectral bundle method (Helmberg, Rendl, Kiwiel)

Solve dual relaxation with additional proximal term:

max_y  b^T y − (u/2) ||y − ȳ||²
subject to   A*(y) + S = C
             P^T S P ⪰ 0
             W • S ≥ 0

ȳ: current iterate. u > 0 and W ⪰ 0 are dynamically updated. Solved efficiently by a primal-dual interior point method.

Converges under various assumptions. Using W lets algorithm approach an optimal solution even when the number of columns in P is restricted. Excellent computational results for SDP relaxations of combinatorial optimization problems.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 48 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Variant 2: Require V to be diagonal

min_V  C • PVP^T                          max_y  b^T y
s.t.   A(PVP^T) = b                       s.t.  A*(y) + S = C
       V ⪰ 0, diagonal                          diagonal entries of P^T S P ≥ 0

Gives a linear programming relaxation of the dual SDP. Each column p of P gives a constraint pT Sp ≥ 0, a linear constraint on the entries in S.
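The cut p^T S p ≥ 0 is linear in the entries of S because p^T S p = (pp^T) • S, so the coefficient matrix of the cut is simply the outer product pp^T. A one-line numpy check (random data is illustrative):

```python
import numpy as np

# Each column p of P yields the linear cut p^T S p >= 0; its coefficients
# on the entries of S are the entries of p p^T, since p^T S p = (p p^T) . S.
rng = np.random.default_rng(3)
p = rng.standard_normal(4)
S = rng.standard_normal((4, 4)); S = (S + S.T) / 2
assert np.isclose(p @ S @ p, np.tensordot(np.outer(p, p), S))
```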

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 49 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Experience with LP relaxation

Eg: Sivaramakrishnan (2008) Preprocess to get a block angular SDP:

min_{X_i}  Σ_i C_i • X_i                  max_{y,w_i}  b^T y + Σ_i d_i^T w_i
s.t.  Σ_i A_i(X_i) = b                    s.t.  A_i^T y + B_i^T w_i ⪯ C_i   ∀i
      B_i(X_i) = d_i   ∀i
      X_i ⪰ 0   ∀i

Block angular constraints and linking constraints

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 50 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Block angular method (continued) Lagrangian dual function Θ(y):

Θ(y) := b^T y + Σ_{i=1}^r min{ (C_i − A_i^T y) • X_i : B_i(X_i) = d_i, X_i ⪰ 0 }

No duality gap under the assumptions. Construct piecewise linear approximations by solving in parallel the disaggregated semidefinite programming subproblems

Θ_i(y) := min{ (C_i − A_i^T y) • X_i : B_i(X_i) = d_i, X_i ⪰ 0 }

for different choices of y arising as solutions of a master problem.

Get subgradients −A_i(X_i) of Θ_i(y). Master problem: use a stabilized column generation procedure. Can solve SDPs with n, m > 10000 to a duality gap of 10^−2 or 10^−3.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 51 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Variant 3: Require V to be block-diagonal

min_V  C • PVP^T                          max_y  b^T y
s.t.   A(PVP^T) = b                       s.t.  A*(y) + S = C
       V ⪰ 0, block diagonal                    diagonal blocks of P^T S P ⪰ 0

Approximate the original SDP constraint by several lower-dimensional SDP constraints. For example, diagonal blocks of size two correspond to second order cone constraints:

M  =  [ a  b ]  ⪰ 0   ⇔   a ≥ 0,  c ≥ 0,  ac ≥ b².
      [ b  c ]

(Oskoorouchi, Goffin, Mitchell)
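The 2×2 equivalence can be verified numerically against an eigenvalue test. A small sketch (the sampling scheme is illustrative):

```python
import numpy as np

# M = [[a, b], [b, c]] is psd  iff  a >= 0, c >= 0 and a*c >= b^2.
def psd_by_ineq(a, b, c):
    return a >= 0 and c >= 0 and a * c >= b * b

rng = np.random.default_rng(4)
for _ in range(1000):
    a, b, c = rng.uniform(-1, 1, 3)
    psd_by_eig = np.min(np.linalg.eigvalsh([[a, b], [b, c]])) >= -1e-12
    assert psd_by_eig == psd_by_ineq(a, b, c)
```

This is why size-two blocks give rotated second order cone constraints: a ≥ 0, c ≥ 0, ac ≥ b² describes exactly such a cone.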

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 52 / 72 Cutting Surfaces for Semidefinite Programming Computational experience Experience with SOCP constraints

problem              lc  socc   p  dim     gap   SOCCSM  SDPLR  SDPT3  SeDuMi
300,  50,  100       14    46   3  109  9.2e-4       16     29     69     352
300, 100,  600        4    31   4  121  7.8e-4       56     78    225     843
300, 200, 1100        4    44   4  169  8.6e-4      191    146    503    1834
300, 300,  800       11    79   8  518  9.2e-4      739     --    656      --
300, 300, 1000        5    62   7  335  9.2e-4      561     --    747      --
500,  50,  200        4    33   4  128  7.6e-4       77    144    247    1245
500, 100, 1000        9    21   3   69  7.3e-4       67    358   1227    2504
500, 200,  500       11    81   5  384  9.1e-4      643     --   1555      --
500, 200, 2000        9    30   3   94  8.6e-4      398     --   1791      --
500, 300, 1000        1    44   7  221  4.7         646     --   2558      --
800,  10,  800        8     1   2   10  4.2e-4        7    159    553    2798
800,  50,  500        1    22   3   58  9.1e-4       77    433   1453    3449
800, 100,  800        2    32   4  114  9.4e-4      313     --   2811      --
800, 200, 1000        1    28   4  107  3.4e-3      467     --   5814      --
800, 200, 1500        1    24   5   98  4.5e-3      424     --   6332      --

INFORMS Annual Meeting 2007 Seattle, WA Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 53 / 72 Cutting Surfaces for Semidefinite Programming Computational experience The additional linear term

Most implementations include an additional linear term, αW .

Use X = PVP^T + αW, with α ≥ 0 a variable.

Used to aggregate columns of P: instead of dropping a column p, add a multiple of pp^T to W.

In addition to saving old information, this also makes it easier to restart.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 54 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 55 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Theoretical results for the convex feasibility problem

Given a convex set Y ⊆ IRm containing a ball of radius ε, find a point y ∈ Y or determine that Y is empty.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 56 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Find approximate analytic center y¯

[Figure: convex set Y inside a polyhedral outer approximation, with approximate analytic center ȳ]

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 57 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Add a conic cut at y¯

[Figure: a conic cut added at ȳ slices into the outer approximation of Y]

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 58 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Update outer approximation

[Figure: the updated outer approximation of Y after the conic cut ... and repeat]

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 59 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Conic relaxation The feasibility problem is approximated by a conic program:

max 0   subject to   A* y + s = c,   s ∈ K

where K is a full-dimensional self-scaled cone.

Note that K may be a product of smaller cones.

Possible cones include IR^n_+, the SDP cone, the SOCP cone.

Potential functions: IR^n_+ has f*(s) = − Σ_{i=1}^n ln s_i;  S ⪰ 0 has f*(S) = − ln det(S).

Analytic center minimizes the potential function.
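For the simplest case the analytic center can be computed by inspection: the constraints 0 ≤ y ≤ 1 give slacks s = (y, 1 − y) in IR²_+, with potential −ln y − ln(1 − y), minimized at y = 1/2. A numpy sketch using a grid search (the grid resolution is arbitrary):

```python
import numpy as np

# Analytic center of 0 <= y <= 1: slacks (y, 1 - y), potential
# f*(s) = -ln y - ln(1 - y), minimized at y = 1/2.
ys = np.linspace(0.01, 0.99, 9801)
potential = -np.log(ys) - np.log(1 - ys)
y_center = ys[np.argmin(potential)]
assert abs(y_center - 0.5) < 1e-3
```

For the SDP cone the analogous potential is −ln det(S), and the analytic center is characterized by first-order conditions rather than a grid search.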

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 60 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Using the Dikin Ellipsoid Find restart direction d when adding conic constraint

[Figure: Dikin ellipsoid E_D(ȳ, s̄) around the current iterate ȳ; the restart direction d moves the slack s_0 of the added constraint into its cone K_0]

Minimize the potential function of the new slack variables, while staying within the Dikin ellipsoid.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 61 / 72 Cutting Surfaces for Semidefinite Programming Theoretical results with conic cuts Convergence Convergence to new analytic center: Can solve an NLP in the new variables in order to find a restart direction. Number of steps to get convergence to new approximate analytic center is linear in the “size” of the added constraint.

Global convergence for feasibility problem: Polynomial in the dimension m, the required tolerance ε, and a condition number based on the added constraints.

Proof uses upper and lower bounds on the dual potential function.

(Oskoorouchi and Goffin for SDP and SOCP, Basescu and Mitchell for more general problems)

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 62 / 72 Cutting Surfaces for Semidefinite Programming Condition number Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 63 / 72 Cutting Surfaces for Semidefinite Programming Condition number The need for a condition number

The dual potential function f*(c − A*y) is equal to the sum of the dual potential functions over each of the added constraints. Upper bound obtained from the assumption of an ε-ball: if c − A*(y + εu) ⪰_K 0 for any unit vector u, then

f*_K(c − A*y) ≤ f*_K(ε A*u) = f*_K(A*u) − ϑ_{f*_K} ln ε.

Want f*_K(A*u) small in order to get a good upper bound.

Define the condition number μ_K so that

ln μ_K := inf{ f*_K(A*u) : A*u ⪰_K 0, ||u|| = 1 }.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 64 / 72 Cutting Surfaces for Semidefinite Programming Condition number An SDP constraint with a bad condition number

Look at {d ∈ IR^4 : A*(d) := Σ_{i=1}^4 A_i d_i ⪰ 0} for

A_1 = [ 1  0 ]   A_2 = [ 0  1 ]   A_3 = [ 1  0 ]   A_4 = [ 1  1 ]
      [ 0  δ ]         [ 1  δ ]         [ 0  0 ]         [ 1  0 ]

If ||d|| = 1 then det(A*(d)) is at most O(δ).

Thus, for any choice of direction, need to move quite far to get a reasonable potential function value. Can remove need for the condition number by selective orthonormalization, which unfortunately weakens the constraints.
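The O(δ) bound on det(A*(d)) is easy to probe numerically: sample unit vectors d and evaluate the determinant. A sketch (sample count and the 3δ threshold are illustrative; the analytic bound is roughly 1.73δ):

```python
import numpy as np

# For the four matrices above and small delta, det(A*(d)) stays O(delta)
# over all unit vectors d, so every direction into the cone is poor.
delta = 1e-3
A = [np.array([[1.0, 0.0], [0.0, delta]]),
     np.array([[0.0, 1.0], [1.0, delta]]),
     np.array([[1.0, 0.0], [0.0, 0.0]]),
     np.array([[1.0, 1.0], [1.0, 0.0]])]

rng = np.random.default_rng(5)
D = rng.standard_normal((5000, 4))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit vectors
dets = [np.linalg.det(sum(di * Ai for di, Ai in zip(d, A))) for d in D]
assert max(dets) <= 3 * delta
```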

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 65 / 72 Smoothing techniques and subgradient methods Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 66 / 72 Smoothing techniques and subgradient methods Subgradient methods

Subgradient methods are used to solve min_x {f(x) : x ∈ C}, where C is a bounded closed convex set and f(x) is a continuous convex function on C. Given a feasible x̄, find a subgradient ξ of f(·) at x̄. Update x ← x̄ − τξ for a steplength τ. Converges under certain conditions on τ. Work required at each iteration: (i) find f(x̄) and ξ (e.g., using an oracle); (ii) update the iterate. Advantage: low cost of (ii). Disadvantage: the number of iterations may be large.
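The update rule above, combined with a projection onto C, can be sketched on a toy problem (the objective, set, and diminishing steplength rule are illustrative, not from the slides):

```python
import numpy as np

# Projected subgradient for min { f(x) : x in C } with f(x) = |x - 3|
# and C = [0, 2]; the constrained optimum is x = 2.
def subgrad(x):
    return np.sign(x - 3.0)          # a subgradient of |x - 3|

x = 0.0
for k in range(1, 2001):
    tau = 1.0 / k                    # diminishing steplength
    x = np.clip(x - tau * subgrad(x), 0.0, 2.0)   # step, then project onto C

assert abs(x - 2.0) < 1e-6
```

Each iteration costs one oracle call (the subgradient) plus a cheap projection, which is the low per-iteration cost the slide refers to; the price is the large iteration count when high accuracy is needed.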

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 67 / 72 Smoothing techniques and subgradient methods Nesterov’s smoothing technique

Complexity to get within ε of optimality: O(1/ε²) for the best possible generic subgradient scheme.

If f(x) is smooth and C satisfies certain conditions: complexity is O(1/√ε).

So, construct a smooth approximation to f(x). Complexity: O(1/ε). The complexity depends on the Lipschitz constant of the gradient of the approximation.

Used on certain Lagrangian dual problems, variational inequalities, semidefinite programming, conic programming, Nash equilibria computation . . .

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 68 / 72 Smoothing techniques and subgradient methods Smoothing technique for SDPs

Consider the SDP:

min_y  f(y) := λ_max(C + A^T y)

Smoothed function is

f_μ(y) = μ ln( Σ_{i=1}^n e^{λ_i(C + A^T y)/μ} ),   where λ_i(M) is the i-th eigenvalue of M.

More accurate for smaller values of μ.

Calculating gradient requires calculating matrix exponential. d’Aspremont (2008): approximate gradient using first few eigenvalues. Computationally: works well for approximate solution to large SDPs.
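The smoothed function itself is cheap to evaluate from the eigenvalues, and it sandwiches λ_max: λ_max(M) ≤ f_μ ≤ λ_max(M) + μ ln n. A numpy sketch with a stabilized log-sum-exp (the random matrix stands in for C + A^T y; all data is illustrative):

```python
import numpy as np

# f_mu = mu * ln( sum_i exp(lambda_i / mu) ), computed stably by shifting
# by the largest eigenvalue; overestimates lambda_max by at most mu*ln(n).
def f_mu(M, mu):
    lam = np.linalg.eigvalsh(M)
    m = lam.max()
    return m + mu * np.log(np.exp((lam - m) / mu).sum())

rng = np.random.default_rng(6)
n = 6
M = rng.standard_normal((n, n)); M = (M + M.T) / 2
lam_max = np.linalg.eigvalsh(M)[-1]
for mu in (1.0, 0.1, 0.01):
    assert lam_max <= f_mu(M, mu) <= lam_max + mu * np.log(n) + 1e-12
```

The μ ln n gap is the uniform approximation error that drives the O(1/ε) complexity: μ is chosen of order ε/ln n.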

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 69 / 72 Conclusions Outline 1 Interior Point Cutting Plane and Column Generation Methods Introduction MaxCut Interior point cutting plane methods Warm starting Theoretical results Stabilization 2 Cutting Surfaces for Semidefinite Programming Semidefinite Programming Relaxations of dual SDP Computational experience Theoretical results with conic cuts Condition number 3 Smoothing techniques and subgradient methods 4 Conclusions

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 70 / 72 Conclusions Conclusions for Cutting Plane Methods

Interior point methods provide an excellent choice for stabilizing a column generation approach. The strength of interior point methods for linear programming means that these column generation approaches scale well, with theoretical polynomial or fully polynomial convergence depending on the variant. Combining interior point and simplex column generation methods has proven especially effective, with the interior point method used early on and the simplex method used later. The development of methods that automatically combine the two linear programming approaches would be useful. Theoretical development of a more intuitive polynomial interior point cutting plane algorithm is desirable.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 71 / 72 Conclusions Cutting Surface Conclusions

Research in cutting surface and subgradient methods for quickly finding good solutions to semidefinite programming problems is an active area. The links between the cutting surface approaches and the smoothed subgradient methods can be developed further. Complexity results for cutting surface methods have been developed, but can probably be improved further; e.g., remove the condition number without weakening the constraints. These methods have been developed for conic problems where the cones are self-dual, and it would be interesting to extend the methods and results to more general classes of conic optimization problems.

Mitchell (RPI) Cutting Planes, Subgradients INFORMS TutORial 72 / 72