Sequential Convex Programming
• sequential convex programming • alternating convex optimization • convex-concave procedure
EE364b, Stanford University Methods for nonconvex optimization problems
• convex optimization methods are (roughly) always global, always fast
• for general nonconvex problems, we have to give up one – local optimization methods are fast, but need not find global solution (and even when they do, cannot certify it) – global optimization methods find global solution (and certify it), but are not always fast (indeed, are often slow)
• this lecture: local optimization methods that are based on solving a sequence of convex problems
EE364b, Stanford University 1 Sequential convex programming (SCP)
• a local optimization method for nonconvex problems that leverages convex optimization – convex portions of a problem are handled ‘exactly’ and efficiently
• SCP is a heuristic – it can fail to find optimal (or even feasible) point – results can (and often do) depend on starting point (can run algorithm from many initial points and take best result)
• SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value
EE364b, Stanford University 2 Problem we consider nonconvex problem
minimize f0(x) subject to fi(x) ≤ 0, i =1,...,m hi(x)=0, j =1,...,p with variable x ∈ Rn
• f0 and fi (possibly) nonconvex
• hi (possibly) non-affine
EE364b, Stanford University 3 Basic idea of SCP
• maintain estimate of solution x(k), and convex trust region T (k) ⊂ Rn
(k) • form convex approximation fˆi of fi over trust region T
(k) • form affine approximation hˆi of hi over trust region T
• x(k+1) is optimal point for approximate convex problem
minimize fˆ0(x) subject to fˆi(x) ≤ 0, i =1,...,m hˆi(x)=0, i =1,...,p x ∈T (k)
EE364b, Stanford University 4 Trust region
• typical trust region is box around current point:
(k) (k) T = {x ||xi − xi |≤ ρi, i =1,...,n}
• if xi appears only in convex inequalities and affine equalities, can take ρi = ∞
EE364b, Stanford University 5 Affine and convex approximations via Taylor expansions
• (affine) first order Taylor expansion:
fˆ(x)= f(x(k))+ ∇f(x(k))T (x − x(k))
• (convex part of) second order Taylor expansion:
fˆ(x)= f(x(k))+ ∇f(x(k))T (x − x(k))+(1/2)(x − x(k))T P (x − x(k))
2 (k) P = ∇ f(x ) +, PSD part of Hessian • give local approximations, which don’t depend on trust region radii ρi
EE364b, Stanford University 6 Quadratic trust regions
• full second order Taylor expansion:
fˆ(x)= f(x(k))+∇f(x(k))T (x−x(k))+(1/2)(x−x(k))∇2f(x(k))(x−x(k)),
• trust region is compact ellipse around current point: for some P ≻ 0
T (k) = {x | (x − x(k))T P (x − x(k)) ≤ ρ}
• Update is any x(k+1) for which there is λ ≥ 0 s.t.
2 (k) (k+1) ∇ f(x )+ λP 0, λ(kx k2 − 1)=0, (∇2f(x(k))+ λP )x(k) = −∇f(x(k))
EE364b, Stanford University 7 Particle method
• particle method:
(k) – choose points z1,...,zK ∈T (e.g., all vertices, some vertices, grid, random, . . . ) – evaluate yi = f(zi) – fit data (zi,yi) with convex (affine) function (using convex optimization)
• advantages: – handles nondifferentiable functions, or functions for which evaluating derivatives is difficult – gives regional models, which depend on current point and trust region radii ρi
EE364b, Stanford University 8 Fitting affine or quadratic functions to data
fit convex quadratic function to data (zi,yi)
K (k) T (k) T (k) 2 minimize i=1 (zi − x ) P (zi − x )+ q (zi − x )+ r − yi subject to P 0 P with variables P ∈ Sn, q ∈ Rn, r ∈ R
• can use other objectives, add other convex constraints • no need to solve exactly • this problem is solved for each nonconvex constraint, each SCP step
EE364b, Stanford University 9 Quasi-linearization
• a cheap and simple method for affine approximation
• write h(x) as A(x)x + b(x) (many ways to do this)
• use hˆ(x)= A(x(k))x + b(x(k))
• example:
h(x)=(1/2)xT Px + qT x + r = ((1/2)Px + q)T x + r
(k) T • hˆql(x) = ((1/2)Px + q) x + r
(k) T (k) (k) • hˆtay(x)=(Px + q) (x − x )+ h(x )
EE364b, Stanford University 10 Example
• nonconvex QP
minimize f(x)=(1/2)xT Px + qT x subject to kxk∞ ≤ 1
with P symmetric but not PSD
• use approximation
(k) (k) T (k) (k) T (k) f(x )+(Px + q) (x − x )+(1/2)(x − x ) P+(x − x )
EE364b, Stanford University 11 • example with x ∈ R20 • SCP with ρ =0.2, started from 10 different points
−10
−20
−30 ) ) k
( −40 x ( f −50
−60
−70 5 10 15 20 25 30 k • runs typically converge to points between −60 and −50 • dashed line shows lower bound on optimal value ≈−66.5
EE364b, Stanford University 12 Lower bound via Lagrange dual
2 • write constraints as xi ≤ 1 and form Lagrangian
n T T 2 L(x,λ) = (1/2)x Px + q x + λi(xi − 1) i=1 X = (1/2)xT (P +2 diag(λ)) x + qT x − 1T λ
− • g(λ)= −(1/2)qT (P +2 diag(λ)) 1 q − 1T λ; need P +2 diag(λ) ≻ 0
• solve dual problem to get best lower bound:
− maximize −(1/2)qT (P +2 diag(λ)) 1 q − 1T λ subject to λ 0, P +2 diag(λ) ≻ 0
EE364b, Stanford University 13 Some (related) issues
• approximate convex problem can be infeasible
• how do we evaluate progress when x(k) isn’t feasible? need to take into account
(k) – objective f0(x ) (k) – inequality constraint violations fi(x )+ (k) – equality constraint violations |hi(x )|
• controlling the trust region size – ρ too large: approximations are poor, leading to bad choice of x(k+1) – ρ too small: approximations are good, but progress is slow
EE364b, Stanford University 14 Exact penalty formulation
• instead of original problem, we solve unconstrained problem
m p minimize φ(x)= f0(x)+ λ ( i=1 fi(x)+ + i=1 |hi(x)|)
where λ> 0 P P • for λ large enough, minimizer of φ is solution of original problem • for SCP, use convex approximation
m p
φˆ(x)= fˆ0(x)+ λ fˆi(x)+ + |hˆi(x)| i=1 i=1 ! X X • approximate problem always feasible
EE364b, Stanford University 15 Trust region update
• judge algorithm progress by decrease in φ, using solution x˜ of approximate problem • decrease with approximate objective: δˆ = φ(x(k)) − φˆ(˜x) (called predicted decrease) • decrease with exact objective: δ = φ(x(k)) − φ(˜x) • if δ ≥ αδˆ, ρ(k+1) = βsuccρ(k), x(k+1) =x ˜ (α ∈ (0, 1), βsucc ≥ 1; typical values α =0.1, βsucc =1.1) • if δ<αδˆ, ρ(k+1) = βfailρ(k), x(k+1) = x(k) (βfail ∈ (0, 1); typical value βfail =0.5) • interpretation: if actual decrease is more (less) than fraction α of predicted decrease then increase (decrease) trust region size
EE364b, Stanford University 16 Nonlinear optimal control
l2, m2 τ 2 θ2
τ1 l1, m1
θ1
• 2-link system, controlled by torques τ1 and τ2 (no gravity)
EE364b, Stanford University 17 • dynamics given by M(θ)θ¨ + W (θ, θ˙)θ˙ = τ, with
(m + m )l2 m l l (s s + c c ) M(θ) = 1 2 1 2 1 2 1 2 1 2 m l l (s s + c c ) m l2 2 1 2 1 2 1 2 2 2 0 m l l (s c − c s )θ˙ W (θ, θ˙) = 2 1 2 1 2 1 2 2 m l l (s c − c s )θ˙ 0 2 1 2 1 2 1 2 1
si = sin θi, ci = cos θi
• nonlinear optimal control problem:
T 2 minimize J = 0 kτ(t)k2 dt subject to θ(0) = θ , θ˙(0) = 0, θ(T )= θ , θ˙(T )=0 R init final kτ(t)k∞ ≤ τmax, 0 ≤ t ≤ T
EE364b, Stanford University 18 Discretization
• discretize with time interval h = T/N N 2 • J ≈ h i=1 kτik2, with τi = τ(ih) • approximateP derivatives as
θ − θ − θ − 2θ + θ − θ˙(ih) ≈ i+1 i 1, θ¨(ih) ≈ i+1 i i 1 2h h2
• approximate dynamics as set of nonlinear equality constraints:
θ − 2θ + θ − θ − θ − θ − θ − M(θ ) i+1 i i 1 + W θ , i+1 i 1 i+1 i 1 = τ i h2 i 2h 2h i
• θ0 = θ1 = θinit; θN = θN+1 = θfinal
EE364b, Stanford University 19 • discretized nonlinear optimal control problem:
N 2 minimize h i=1 kτik2 subject to θ0 = θ1 = θinit, θN = θN+1 = θfinal P kτik∞ ≤ τmax, i =1,...,N θi+1−2θi+θi−1 θi+1−θi−1 θi+1−θi−1 M(θi) h2 + W θi, 2h 2h = τi • replace equality constraints with quasilinearized versions
(k) (k) (k) θi+1 − 2θi + θi−1 (k) θi+1 − θi−1 θi+1 − θi−1 M(θi ) 2 + W θi , = τi h 2h ! 2h
• trust region: only on θi
• initialize with θi = ((i − 1)/(N − 1))(θfinal − θinit), i =1,...,N
EE364b, Stanford University 20 Numerical example
• m1 =1, m2 =5, l1 =1, l2 =1
• N = 40, T = 10
• θinit = (0, −2.9), θfinal = (3, 2.9)
• τmax =1.1
• α =0.1, βsucc =1.1, βfail =0.5, ρ(1) = 90◦
• λ =2
EE364b, Stanford University 21 SCP progress
70
60
50 ) ) k ( 40 x ( φ 30
20
10 5 10 15 20 25 30 35 40 k
EE364b, Stanford University 22 Convergence of J and torque residuals
2 14 10
13.5 1 10 13
0 10
) 12.5 k (
J 12 −1 10
11.5 −2 10 11 sum of torque residuals
−3 10.5 10 5 10 15 20 25 30 35 40 5 10 15 20 25 30 35 40 k k
EE364b, Stanford University 23 Predicted and actual decreases in φ
2 140 10
120 1 10 100
(solid) 80 0 ) 10 ◦ δ ( 60 )
k −1 ( 10
40 ρ
20
(dotted), −2 10 ˆ δ 0
−3 −20 10 5 10 15 20 25 30 35 40 5 10 15 20 25 30 35 40 k k
EE364b, Stanford University 24 Trajectory plan
1.5 3.5
3 1 2.5
2 1 1 0.5 1.5 τ θ 1 0 0.5
−0.5 0 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 t t 2 5
1 2 2 0 τ 0 θ
−1 −5 0 2 4 6 8 10 0 2 4 6 8 10 t t
EE364b, Stanford University 25 Convex composite
• general form: for h : Rm → R convex, c : Rn → Rm smooth,
f(x)= h(c(x))
• exact penalty formulation of
minimize f(x) subject to c(x)=0
• approximate f locally by convex approximation: near x,
T f(y) ≈ fˆx(y)= h(c(x)+ ∇c(x) (y − x))
EE364b, Stanford University 26 Convex composite (prox-linear) algorithm
given function f = h ◦ c and convex domain C, line search parameters α ∈ (0,.5), β ∈ (0, 1), stopping tolerance ǫ> 0 k := 0 repeat ˆ Use model f = fx(k) (k+1) ˆ (k+1) (k+1) (k) Set xˆ = argminx∈C{f(x)} and direction ∆ =x ˆ − x Set δ(k) = fˆ(x(k) +∆(k)) − f(x(k)) Set t =1 while f(x(k) + t∆(k)) ≥ f(x(k))+ αtδ(k) t = β · t (k+1) If k∆ k2/t ≤ ǫ, quit k := k +1
EE364b, Stanford University 27 Nonlinear measurements (phase retrieval)
n n • phase retrieval problem: for ai ∈ C , x⋆ ∈ C , observe ∗ 2 bi = |ai x⋆| • goal is to find x, natural objectives are of form f(x)= |Ax|2 − b
• “robust” phase retrieval problem m ∗ 2 f(x)= |ai x| − bi i=1 X or quadratic objective
m 1 2 f(x)= |a∗x|2 − b 2 i i i=1 X
EE364b, Stanford University 28 Numerical example
• m = 200,n = 50, over reals R (sign retrieval)
m×n 2 • Generate 10 independent examples, A ∈ R , b = |Ax⋆| ,
Aij ∼N (0, 1), x⋆ ∼N (0,I)
• Two sets of experiments: initialize at
(0) (0) x ∼N (0,I) or x ∼N (x⋆,I)
2 2 • Use h(z)= kzk1 or h(z)= kzk2, c(x)=(Ax) − b.
EE364b, Stanford University 29 Numerical example (absolute loss, random initialization)