
Sequential Convex Programming

• sequential convex programming
• alternating convex optimization
• convex-concave procedure

EE364b, Stanford University

Methods for nonconvex optimization problems

• convex optimization methods are (roughly) always global, always fast

• for general nonconvex problems, we have to give up one
– local optimization methods are fast, but need not find the global solution (and even when they do, cannot certify it)
– global optimization methods find the global solution (and certify it), but are not always fast (indeed, are often slow)

• this lecture: local optimization methods that are based on solving a sequence of convex problems

Sequential convex programming (SCP)

• a local optimization method for nonconvex problems that leverages convex optimization
– convex portions of a problem are handled ‘exactly’ and efficiently

• SCP is a heuristic
– it can fail to find an optimal (or even feasible) point
– results can (and often do) depend on the starting point (can run the algorithm from many initial points and take the best result)

• SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value

Problem

we consider the nonconvex problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

with variable x ∈ Rn

• f0 and fi (possibly) nonconvex

• hi (possibly) non-affine

Basic idea of SCP

• maintain estimate of solution x(k), and convex trust region T(k) ⊆ Rn

• form convex approximation f̂i of fi over trust region T(k)

• form affine approximation ĥi of hi over trust region T(k)

• x(k+1) is optimal point for approximate convex problem

minimize f̂0(x)
subject to f̂i(x) ≤ 0, i = 1, . . . , m
ĥi(x) = 0, i = 1, . . . , p
x ∈ T(k)

Trust region

• typical trust region is box around current point:

T(k) = {x | |xi − xi(k)| ≤ ρi, i = 1, . . . , n}

• if xi appears only in convex inequalities and affine equalities, can take ρi = ∞
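A minimal sketch of the box trust region, as a membership test; the function name and example values are illustrative, with `math.inf` playing the role of ρi = ∞ for coordinates that appear only convexly:

```python
import math

def in_box_trust_region(x, x_k, rho):
    """Check x in T(k) = {x : |x_i - x_k[i]| <= rho[i]}; rho[i] = inf
    means coordinate i is unrestricted (appears only convexly)."""
    return all(abs(xi - xki) <= ri for xi, xki, ri in zip(x, x_k, rho))

x_k = [0.0, 1.0, -2.0]
rho = [0.2, 0.2, math.inf]   # third coordinate unrestricted
print(in_box_trust_region([0.1, 1.1, 100.0], x_k, rho))  # True
print(in_box_trust_region([0.5, 1.0, 0.0], x_k, rho))    # False
```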

Affine and convex approximations via Taylor expansions

• (affine) first order Taylor expansion:

fˆ(x)= f(x(k))+ ∇f(x(k))T (x − x(k))

• (convex part of) second order Taylor expansion:

fˆ(x)= f(x(k))+ ∇f(x(k))T (x − x(k))+(1/2)(x − x(k))T P (x − x(k))

with P = ∇2f(x(k))+, the PSD part of the Hessian

• these give local approximations, which don’t depend on trust region radii ρi
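The PSD part of the Hessian can be computed by clipping negative eigenvalues to zero; a short numpy sketch (the function name is ours):

```python
import numpy as np

def psd_part(H):
    """Return the PSD part of a symmetric matrix: eigendecompose H
    and clip its negative eigenvalues to zero."""
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, 0)) @ V.T

H = np.array([[2.0, 0.0], [0.0, -3.0]])
P = psd_part(H)   # the -3 eigenvalue is clipped: P = diag(2, 0)
```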

Quadratic trust regions

• full second order Taylor expansion:

f̂(x) = f(x(k)) + ∇f(x(k))T (x − x(k)) + (1/2)(x − x(k))T ∇2f(x(k)) (x − x(k))

• trust region is a compact ellipsoid around the current point: for some P ≻ 0,

T (k) = {x | (x − x(k))T P (x − x(k)) ≤ ρ}

• update is any x(k+1) for which there is λ ≥ 0 s.t. (with s = x(k+1) − x(k))

∇2f(x(k)) + λP ⪰ 0,   λ(sT Ps − ρ) = 0,   (∇2f(x(k)) + λP)s = −∇f(x(k))

Particle method

• particle method:

– choose points z1, . . . , zK ∈ T(k) (e.g., all vertices, some vertices, grid, random, . . . )
– evaluate yi = f(zi)
– fit data (zi, yi) with convex (or affine) function (using convex optimization)

• advantages:
– handles nondifferentiable functions, or functions for which evaluating derivatives is difficult
– gives regional models, which depend on current point and trust region radii ρi

Fitting affine or quadratic functions to data

fit convex quadratic function to data (zi,yi)

minimize ∑i=1,...,K ((zi − x(k))T P (zi − x(k)) + qT (zi − x(k)) + r − yi)²
subject to P ⪰ 0

with variables P ∈ Sn, q ∈ Rn, r ∈ R

• can use other objectives, add other convex constraints
• no need to solve exactly
• this problem is solved for each nonconvex constraint, at each SCP step
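For the affine case of the particle fit (no P ⪰ 0 constraint, so plain least squares suffices; the convex-quadratic fit would be a small SDP), a sketch with an illustrative nonconvex function of our choosing:

```python
import numpy as np

# sample a nonconvex function on particles in the trust region
f = lambda z: np.sin(z[0]) + z[1] ** 2
rng = np.random.default_rng(0)
x_k, rho = np.array([0.5, -0.3]), 0.2
Z = x_k + rng.uniform(-rho, rho, size=(30, 2))   # particles z_i in T(k)
y = np.array([f(z) for z in Z])

# affine fit fhat(z) = g @ (z - x_k) + r by least squares on (z_i, y_i)
Zc = np.hstack([Z - x_k, np.ones((30, 1))])
coef, *_ = np.linalg.lstsq(Zc, y, rcond=None)
g, r = coef[:2], coef[2]
```

Because the trust region is small, the affine model matches the sampled values closely; this is the "regional model" flavor of the particle method.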

Quasi-linearization

• a cheap and simple method for affine approximation

• write h(x) as A(x)x + b(x) (many ways to do this)

• use hˆ(x)= A(x(k))x + b(x(k))

• example:

h(x)=(1/2)xT Px + qT x + r = ((1/2)Px + q)T x + r

• ĥql(x) = ((1/2)Px(k) + q)T x + r

• ĥtay(x) = (Px(k) + q)T (x − x(k)) + h(x(k))
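Both approximations agree with h at the expansion point; a quick numerical check for the quadratic example above (random data of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((3, 3)); P = P + P.T   # symmetric
q, r = rng.standard_normal(3), 0.7
h = lambda x: 0.5 * x @ P @ x + q @ x + r

x_k = rng.standard_normal(3)
h_ql  = lambda x: (0.5 * P @ x_k + q) @ x + r            # quasi-linearized
h_tay = lambda x: (P @ x_k + q) @ (x - x_k) + h(x_k)     # Taylor

# both are exact at x_k, but differ away from it
print(h(x_k), h_ql(x_k), h_tay(x_k))
```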

Example

• nonconvex QP

minimize f(x) = (1/2)xT Px + qT x
subject to ‖x‖∞ ≤ 1

with P symmetric but not PSD

• use approximation

f(x(k)) + (Px(k) + q)T (x − x(k)) + (1/2)(x − x(k))T P+ (x − x(k))

• example with x ∈ R20
• SCP with ρ = 0.2, started from 10 different points

[figure: f(x(k)) versus iteration k for the 10 runs]

• runs typically converge to points between −60 and −50
• dashed line shows lower bound on optimal value ≈ −66.5

Lower bound via Lagrange dual

• write constraints as xi² ≤ 1 and form Lagrangian

L(x, λ) = (1/2)xT Px + qT x + ∑i=1,...,n λi(xi² − 1)
        = (1/2)xT (P + 2 diag(λ)) x + qT x − 1T λ

• g(λ) = −(1/2)qT (P + 2 diag(λ))⁻¹ q − 1T λ; need P + 2 diag(λ) ≻ 0

• solve dual problem to get best lower bound:

maximize −(1/2)qT (P + 2 diag(λ))⁻¹ q − 1T λ
subject to λ ⪰ 0, P + 2 diag(λ) ≻ 0
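By weak duality, any dual-feasible λ already gives a valid lower bound, even without solving the dual to optimality; a numpy sketch with a random instance and a simple (not optimal) choice of λ:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
P = rng.standard_normal((n, n)); P = P + P.T   # symmetric, indefinite
q = rng.standard_normal(n)

# a simple feasible lam >= 0 with P + 2 diag(lam) > 0: shift by |lambda_min(P)|
lam = np.full(n, abs(np.linalg.eigvalsh(P)[0]))
M = P + 2 * np.diag(lam)
g = -0.5 * q @ np.linalg.solve(M, q) - lam.sum()

# weak duality: g lower-bounds f over the box; spot-check on feasible points
f = lambda x: 0.5 * x @ P @ x + q @ x
xs = rng.uniform(-1, 1, size=(1000, n))
assert all(f(x) >= g for x in xs)
```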

Some (related) issues

• approximate convex problem can be infeasible

• how do we evaluate progress when x(k) isn’t feasible? need to take into account

– objective f0(x(k))
– inequality constraint violations fi(x(k))+
– equality constraint violations |hi(x(k))|

• controlling the trust region size
– ρ too large: approximations are poor, leading to bad choice of x(k+1)
– ρ too small: approximations are good, but progress is slow

Exact penalty formulation

• instead of original problem, we solve unconstrained problem

minimize φ(x) = f0(x) + λ (∑i=1,...,m fi(x)+ + ∑i=1,...,p |hi(x)|)

where λ > 0

• for λ large enough, minimizer of φ is solution of original problem
• for SCP, use convex approximation

φ̂(x) = f̂0(x) + λ (∑i=1,...,m f̂i(x)+ + ∑i=1,...,p |ĥi(x)|)

• approximate problem always feasible
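The penalty φ is easy to evaluate at any point, feasible or not; a minimal sketch with a toy one-dimensional instance of our choosing:

```python
def phi(x, f0, fs, hs, lam):
    """Exact penalty phi(x) = f0(x) + lam*(sum f_i(x)_+ + sum |h_i(x)|),
    a merit function defined even at infeasible x."""
    return (f0(x)
            + lam * (sum(max(fi(x), 0.0) for fi in fs)
                     + sum(abs(hi(x)) for hi in hs)))

# toy instance: f0(x) = x^2, one inequality x - 1 <= 0, one equality x - 2 = 0
f0 = lambda x: x * x
fs = [lambda x: x - 1.0]
hs = [lambda x: x - 2.0]
print(phi(0.0, f0, fs, hs, lam=2.0))  # 0 + 2*(0 + 2) = 4.0
```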

Trust region update

• judge algorithm progress by decrease in φ, using solution x̃ of approximate problem
• decrease with approximate objective: δ̂ = φ(x(k)) − φ̂(x̃) (called predicted decrease)
• decrease with exact objective: δ = φ(x(k)) − φ(x̃)
• if δ ≥ αδ̂: ρ(k+1) = βsucc ρ(k), x(k+1) = x̃ (α ∈ (0, 1), βsucc ≥ 1; typical values α = 0.1, βsucc = 1.1)
• if δ < αδ̂: ρ(k+1) = βfail ρ(k), x(k+1) = x(k) (βfail ∈ (0, 1); typical value βfail = 0.5)
• interpretation: if actual decrease is more (less) than fraction α of predicted decrease, then increase (decrease) trust region size
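The update rule above can be sketched directly, with the typical parameter values as defaults (the function name is ours):

```python
def update_trust_region(delta_hat, delta, rho, alpha=0.1,
                        beta_succ=1.1, beta_fail=0.5):
    """One rho update: accept the step and grow rho if the actual
    decrease delta is at least alpha times the predicted decrease
    delta_hat; otherwise reject the step and shrink rho."""
    if delta >= alpha * delta_hat:
        return beta_succ * rho, True    # accept x~, enlarge region
    return beta_fail * rho, False       # keep x(k), shrink region

rho1, ok1 = update_trust_region(delta_hat=1.0, delta=0.5, rho=0.2)
# 0.5 >= 0.1 * 1.0, so the step is accepted and rho grows to 0.22
```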

Nonlinear optimal control

[figure: 2-link manipulator with link parameters l1, m1 and l2, m2, joint angles θ1, θ2, and torques τ1, τ2]

• 2-link system, controlled by torques τ1 and τ2 (no gravity)

• dynamics given by M(θ)θ̈ + W(θ, θ̇)θ̇ = τ, with

M(θ) = [ (m1 + m2)l1²               m2 l1 l2 (s1 s2 + c1 c2)
         m2 l1 l2 (s1 s2 + c1 c2)   m2 l2² ]

W(θ, θ̇) = [ 0                              m2 l1 l2 (s1 c2 − c1 s2) θ̇2
             m2 l1 l2 (s1 c2 − c1 s2) θ̇1   0 ]

si = sin θi, ci = cos θi

• nonlinear optimal control problem:

minimize J = ∫0T ‖τ(t)‖2² dt
subject to θ(0) = θinit, θ̇(0) = 0, θ(T) = θfinal, θ̇(T) = 0
‖τ(t)‖∞ ≤ τmax, 0 ≤ t ≤ T

Discretization

• discretize with time interval h = T/N
• J ≈ h ∑i=1,...,N ‖τi‖2², with τi = τ(ih)
• approximate derivatives as

θ̇(ih) ≈ (θi+1 − θi−1)/(2h),   θ̈(ih) ≈ (θi+1 − 2θi + θi−1)/h²

• approximate dynamics as set of nonlinear equality constraints:

M(θi) (θi+1 − 2θi + θi−1)/h² + W(θi, (θi+1 − θi−1)/(2h)) (θi+1 − θi−1)/(2h) = τi

• θ0 = θ1 = θinit; θN = θN+1 = θfinal

• discretized nonlinear optimal control problem:

minimize h ∑i=1,...,N ‖τi‖2²
subject to θ0 = θ1 = θinit, θN = θN+1 = θfinal
‖τi‖∞ ≤ τmax, i = 1, . . . , N
M(θi)(θi+1 − 2θi + θi−1)/h² + W(θi, (θi+1 − θi−1)/(2h))(θi+1 − θi−1)/(2h) = τi

• replace equality constraints with quasilinearized versions

M(θi(k)) (θi+1 − 2θi + θi−1)/h² + W(θi(k), (θi+1(k) − θi−1(k))/(2h)) (θi+1 − θi−1)/(2h) = τi

• trust region: only on θi

• initialize with θi = θinit + ((i − 1)/(N − 1))(θfinal − θinit), i = 1, . . . , N

Numerical example

• m1 =1, m2 =5, l1 =1, l2 =1

• N = 40, T = 10

• θinit = (0, −2.9), θfinal = (3, 2.9)

• τmax =1.1

• α =0.1, βsucc =1.1, βfail =0.5, ρ(1) = 90◦

• λ =2

SCP progress

[figure: φ(x(k)) versus iteration k]

Convergence of J and torque residuals

[figure: left, J(k) versus k; right, sum of torque residuals versus k (log scale)]

Predicted and actual decreases in φ

[figure: left, predicted decrease δ̂ (dotted) and actual decrease δ (solid) versus k; right, trust region size ρ(k) versus k (log scale)]

Trajectory plan

[figure: planned joint angles θ1, θ2 and torques τ1, τ2 versus t]

Convex composite

• general form: for h : Rm → R convex, c : Rn → Rm smooth,

f(x)= h(c(x))

• exact penalty formulation of

minimize f(x) subject to c(x)=0

• approximate f locally by convex approximation: near x,

f(y) ≈ f̂x(y) = h(c(x) + ∇c(x)T (y − x))

Convex composite (prox-linear) algorithm

given function f = h ◦ c and convex domain C, parameters α ∈ (0, 0.5), β ∈ (0, 1), stopping tolerance ǫ > 0
k := 0
repeat
  use model f̂ = f̂x(k)
  set x̂(k+1) = argminx∈C f̂(x) and direction ∆(k) = x̂(k+1) − x(k)
  set δ(k) = f̂(x(k) + ∆(k)) − f(x(k)) (predicted change; δ(k) ≤ 0)
  set t = 1
  while f(x(k) + t∆(k)) ≥ f(x(k)) + αtδ(k)
    t := βt
  set x(k+1) = x(k) + t∆(k)
  if ‖∆(k)‖2/t ≤ ǫ, quit
  k := k + 1
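For the special case h(z) = ‖z‖2² with C = Rⁿ (the Gauss–Newton case discussed later), the convex subproblem is a linear least squares, so the whole algorithm can be sketched with numpy; the function name and toy instance are illustrative:

```python
import numpy as np

def prox_linear_gn(c, jac, x0, alpha=0.25, beta=0.5, eps=1e-8, iters=100):
    """Prox-linear sketch for f(x) = ||c(x)||_2^2 with C = R^n:
    the subproblem min ||c(x) + J d||^2 is solved by least squares,
    followed by a backtracking line search on the true f."""
    x = x0.astype(float)
    f = lambda x: float(np.sum(c(x) ** 2))
    for _ in range(iters):
        J, r = jac(x), c(x)
        d, *_ = np.linalg.lstsq(J, -r, rcond=None)       # direction
        delta = float(np.sum((r + J @ d) ** 2)) - f(x)   # predicted change (<= 0)
        t = 1.0
        while f(x + t * d) > f(x) + alpha * t * delta and t > 1e-12:
            t *= beta                                    # backtrack
        if np.linalg.norm(d) <= eps:
            break
        x = x + t * d
    return x

# toy nonlinear equations c(x) = 0 with a root at (1, 1)
c = lambda x: np.array([x[0] ** 2 + x[1] - 2.0, x[0] + x[1] ** 2 - 2.0])
jac = lambda x: np.array([[2 * x[0], 1.0], [1.0, 2 * x[1]]])
x = prox_linear_gn(c, jac, np.array([2.0, 0.5]))
```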

Nonlinear measurements (phase retrieval)

• phase retrieval problem: for ai ∈ Cn, x⋆ ∈ Cn, observe bi = |ai* x⋆|²

• goal is to find x; natural objectives are of the form f(x) = ‖|Ax|² − b‖

• “robust” phase retrieval problem

f(x) = ∑i=1,...,m | |ai* x|² − bi |

or quadratic objective

f(x) = (1/2) ∑i=1,...,m ( |ai* x|² − bi )²

Numerical example

• m = 200,n = 50, over reals R (sign retrieval)

• Generate 10 independent examples, A ∈ Rm×n, b = |Ax⋆|², with Aij ∼ N(0, 1), x⋆ ∼ N(0, I)

• Two sets of experiments: initialize at x(0) ∼ N(0, I) or x(0) ∼ N(x⋆, I)

• Use h(z) = ‖z‖1 or h(z) = ‖z‖2², with c(x) = (Ax)² − b

Numerical example (absolute loss, random initialization)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]

Numerical example (absolute loss, good initialization)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]

Numerical example (squared loss, random init)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]

Numerical example (squared loss, good init)

[figure: f(x(k)) − f(x⋆) versus k (log scale)]

Extensions and convergence of basic prox-linear method

• regularization or “trust” region: update

x(k+1) = argminx∈C { h(c(x(k)) + ∇c(x(k))T (x − x(k))) + (1/(2αk)) ‖x − x(k)‖2² }

• with a line search or αk small enough, and a lower bound infx f(x) = infx h(c(x)) > −∞, guaranteed to converge to a stationary point

• when h(z) = ‖z‖2², this is often called the ‘Gauss–Newton’ method; some variants are called ‘Levenberg–Marquardt’

‘Difference of convex’ programming

• express problem as

minimize f0(x) − g0(x)
subject to fi(x) − gi(x) ≤ 0, i = 1, . . . , m

where fi and gi are convex

• fi − gi are called ‘difference of convex’ functions

• problem is sometimes called ‘difference of convex programming’

Convex-concave procedure

• obvious convexification at x(k): replace f(x) − g(x) with

fˆ(x)= f(x) − g(x(k)) −∇g(x(k))T (x − x(k))

• since f̂(x) ≥ f(x) − g(x) for all x, no trust region is needed
– true objective at x̃ is better than convexified objective
– true feasible set contains feasible set of convexified problem

• SCP sometimes called ‘convex-concave procedure’
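The majorization property (the linearization underestimates the convex g, so f̂ overestimates f − g, with equality at x(k)) can be checked numerically; the one-dimensional DC pair below is our own toy choice:

```python
import numpy as np

# DC objective f(x) - g(x), with f and g convex
f = lambda x: x ** 4          # convex
g = lambda x: 3 * x ** 2      # convex, so f - g is nonconvex
dg = lambda x: 6 * x          # gradient of g

x_k = 1.0
fhat = lambda x: f(x) - g(x_k) - dg(x_k) * (x - x_k)   # convex majorizer

# fhat >= f - g everywhere, with equality at the expansion point x_k
xs = np.linspace(-3, 3, 601)
assert all(fhat(x) >= f(x) - g(x) - 1e-12 for x in xs)
assert abs(fhat(x_k) - (f(x_k) - g(x_k))) < 1e-12
```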

Example (BV §7.1)

• given samples y1, . . . , yN ∈ Rn from N(0, Σtrue)
• negative log-likelihood function is

f(Σ) = log det Σ + Tr(Σ⁻¹Y),   Y = (1/N) ∑i=1,...,N yi yiT

(dropping a constant and positive scale factor)

• ML estimate of Σ, with prior knowledge Σij ≥ 0:

minimize f(Σ) = log det Σ + Tr(Σ⁻¹Y)
subject to Σij ≥ 0, i, j = 1, . . . , n

with variable Σ (constraint Σ ≻ 0 is implicit)

• first term in f is concave; second term is convex
• linearize first term in objective to get

f̂(Σ) = log det Σ(k) + Tr((Σ(k))⁻¹(Σ − Σ(k))) + Tr(Σ⁻¹Y)

Numerical example

convergence of problem instance with n = 10, N = 15

[figure: f(Σ(k)) versus iteration k]

Alternating convex optimization

• given nonconvex problem with variable (x1, . . . , xn) ∈ Rn

• I1, . . . , Ik ⊆ {1, . . . , n} are index subsets with ∪j Ij = {1, . . . , n}
• suppose problem is convex in the variables xi, i ∈ Ij, when xi, i ∉ Ij are fixed

• alternating convex optimization method: cycle through j, in each step optimizing over variables xi, i ∈Ij

• special case: bi-convex problem
– x = (u, v); problem is convex in u (v) with v (u) fixed
– alternate optimizing over u and v

Nonnegative matrix factorization

• NMF problem:

minimize ‖A − XY‖F
subject to Xij ≥ 0, Yij ≥ 0

variables X ∈ Rm×k, Y ∈ Rk×n; data A ∈ Rm×n

• difficult problem, except for a few special cases (e.g., k =1)

• alternating convex optimization: solve QPs to optimize over X, then Y, then X, . . .
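A crude numpy sketch of the alternating scheme, using unconstrained least squares followed by clipping negatives to zero in each half-step (a projected variant of the exact nonnegative QP subproblem); function name, sizes, and data are illustrative:

```python
import numpy as np

def nmf_als(A, k, iters=50, seed=0):
    """Alternating sketch for min ||A - XY||_F with X, Y >= 0:
    each half-step solves the least-squares subproblem in one factor
    and clips negatives (the exact step would solve a nonneg QP/NNLS)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.random((m, k))
    for _ in range(iters):
        Y = np.clip(np.linalg.lstsq(X, A, rcond=None)[0], 0, None)       # fix X, update Y
        Xt = np.clip(np.linalg.lstsq(Y.T, A.T, rcond=None)[0], 0, None)  # fix Y, update X
        X = Xt.T
    return X, Y

rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))   # exactly rank-3, nonnegative
X, Y = nmf_als(A, k=3)
err = np.linalg.norm(A - X @ Y) / np.linalg.norm(A)
```

As with SCP, the result depends on the random starting point, so one would typically rerun from several initializations and keep the best.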

Example

• convergence for example with m = n = 50, k =5 (five starting points)

[figure: ‖A − XY‖F versus iteration k for the five starting points]
