Applying Subgradient Methods – and Accelerated Methods – to Solve General Convex, Conic Optimization Problems

Jim Renegar School of Operations Research and Information Engineering Cornell University

Workshop on “Optimization and Parsimonious Modeling”
IMA, January 25-29, 2016

min f(x)   s.t. x ∈ Q        (f convex, assumed lower semicontinuous; Q closed)

A vector g is a subgradient of f at x if the graph of the affine function
y ↦ f(x) + ⟨g, y − x⟩ lies below the graph of f, touching it at x.

The set of all subgradients at x is denoted ∂f(x) – the “subdifferential” at x.
For a concave function, supgradients play the analogous role.

min f(x)   s.t. x ∈ Q        (Q a closed, convex set)

Assume f is Lipschitz continuous on an open neighborhood of Q:

    |f(x) − f(y)| ≤ M ‖x − y‖        (M the Lipschitz constant)

Goal: Compute x ∈ Q satisfying f(x) ≤ f* + ε, where f* is the optimal value.

A typical subgradient method:

Initialize: x_0 ∈ Q
Iterate:    compute g_k ∈ ∂f(x_k), and let x_{k+1} = P_Q(x_k − (ε/‖g_k‖²) g_k),
            where P_Q is projection onto Q.
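As a minimal, runnable sketch of the iteration above (illustrative only: the test function, the choice of Q as a Euclidean ball, and all names are mine, not from the talk; the step size ε/‖g‖² matches the displayed update):

```python
import numpy as np

def projected_subgradient(f, subgrad, proj, x0, eps, iters):
    # x_{k+1} = P_Q(x_k - (eps/||g_k||^2) g_k), keeping the best iterate seen
    x = x0
    best_x, best_f = x0, f(x0)
    for _ in range(iters):
        g = subgrad(x)
        x = proj(x - (eps / g.dot(g)) * g)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Illustration: minimize f(x) = ||x - a||_1 over the unit ball Q = {x : ||x|| <= 1}
a = np.array([2.0, -1.0])
f = lambda x: np.abs(x - a).sum()
subgrad = lambda x: np.sign(x - a)                 # a subgradient of f at x
proj = lambda x: x / max(1.0, np.linalg.norm(x))   # projection onto the unit ball
x_best, f_best = projected_subgradient(f, subgrad, proj, np.zeros(2), 0.01, 3000)
```

Here the optimal value is 3 − √2, attained on the boundary of the ball; the method approaches it to within roughly ε.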

A typical theorem: if x* is an optimal solution and

    ℓ ≥ ( M ‖x_0 − x*‖ / ε )²,

then min_{k ≤ ℓ} f(x_k) ≤ f* + ε.

In the special case of

    min cᵀx   s.t. Ax ≥ b        (so Q = {x : Ax ≥ b}),

this becomes ... Then, of course, the objective function is Lipschitz continuous,

    |cᵀx − cᵀy| ≤ ‖c‖ ‖x − y‖,

with Lipschitz constant ‖c‖.

Goal: Compute x satisfying Ax ≥ b and cᵀx ≤ z* + ε, where z* is the optimal value.

A typical subgradient method:

Initialize: x_0 ∈ Q
Iterate:    let x_{k+1} = P_Q(x_k − (ε/‖c‖²) c),
            where P_Q is projection onto Q.

A typical theorem: if x* is an optimal solution and

    ℓ ≥ ( ‖c‖ ‖x_0 − x*‖ / ε )²,

then min_{k ≤ ℓ} cᵀx_k ≤ z* + ε.

But in general, projecting onto Q = {x : Ax ≥ b} is no easier than solving linear programs!!!


There are ways, however, to use a subgradient method to “solve” an LP. For example, here is a way to “approximate” an LP by an unconstrained problem:

    min cᵀx s.t. Ax ≥ b    ≈    min_x  cᵀx + β · max{ 0, b_i − a_iᵀx : i = 1, ..., m },

where β is a user-chosen positive constant and a_iᵀ is the i-th row of A.
However, the optimal solution for the problem on the right will not necessarily
be feasible for the LP.

In the literature, in fact, the only subgradient methods producing feasible iterates require the feasible region to be “simple.”

“Why is this?” Consider an LP in standard form:

    min cᵀx   s.t. Ax = b, x ≥ 0

Think of this 2-dimensional plane as being the slice of ℝⁿ cut out by {x : Ax = b}.

(figure: π* marks the optimal solution)

Assume the objective function x ↦ cᵀx is constant on horizontal slices.

To begin with simplicity, assume the vector 1 of all ones is feasible.

Affine_z := {x : Ax = b and cᵀx = z}

The “radial projection” π maps a point x ∈ Affine_z to the point π(x) where the
ray from 1 through x meets the boundary of the nonnegative orthant. In particular,
x_z* denotes the point of Affine_z whose radial projection is the optimal
solution: π* = π(x_z*).

    π(x) := 1 + (1 / (1 − min_j x_j)) (x − 1)

LP:  min cᵀx  s.t. Ax = b, x ≥ 0;    z* = optimal value, assumed finite;
z a fixed value satisfying z < cᵀ1.

For x ∈ Affine_z,

    cᵀπ(x) = cᵀ1 + (1 / (1 − min_j x_j)) (cᵀx − cᵀ1)
           = cᵀ1 + (1 / (1 − min_j x_j)) (z − cᵀ1),

and (z − cᵀ1) is a negative constant. Thus, for x, y ∈ Affine_z,

    cᵀπ(x) ≤ cᵀπ(y)   ⇔   min_j x_j ≥ min_j y_j.

Theorem: LP is equivalent to

    max_x min_j x_j   s.t. Ax = b, cᵀx = z.

The only constraints in the equivalent problem are linear equations. It’s thus
easy to project onto the feasible region for the equivalent problem.

    min cᵀx s.t. Ax = b, x ≥ 0    ≡    max_x min_j x_j s.t. Ax = b, cᵀx = z

• x ↦ min_j x_j is the exemplary nonsmooth concave function, Lipschitz
  continuous with constant M = 1.
• Supgradients at x are the convex combinations of the standard basis vectors
  e(k) for which x_k = min_j x_j.

Thus, projected supgradients at x are the convex combinations of the
corresponding columns of the projection matrix

    P̄ := I − Āᵀ(Ā Āᵀ)⁻¹ Ā,    where Ā = [A; cᵀ].

Hence, in implementing a supgradient method, one option in choosing a
supgradient at x is simply to compute any column P̄_k for which x_k = min_j x_j,
and step

    x⁺ = x + (ε/‖P̄_k‖²) P̄_k.

For large n, do not (cannot) compute (store in memory) all of P̄; compute columns
of P̄ as needed. With a modest amount of preprocessing work, the cost of each
iteration is proportional to the number of nonzero entries in A.
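A small numerical sketch of this projected-supgradient step (illustrative, with a dense random A; the variable names and data are mine, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
Abar = np.vstack([A, c])    # Abar stacks the rows of A with the row c^T

# Orthogonal projector onto the nullspace of Abar
Pbar = np.eye(n) - Abar.T @ np.linalg.solve(Abar @ Abar.T, Abar)

x = np.ones(n) + Pbar @ rng.standard_normal(n)   # a point with Ax = b, c^T x = z
b, z = A @ x, c @ x

eps = 0.1
k = int(np.argmin(x))                      # a coordinate achieving min_j x_j
col = Pbar[:, k]                           # projected supgradient: column k of Pbar
x_next = x + (eps / col.dot(col)) * col    # the supgradient step

# x_next remains in Affine_z: A x_next = b and c^T x_next = z
```

For large sparse A one would not form P̄ explicitly; instead, factor ĀĀᵀ once and compute only the single needed column each iteration, as the slide suggests.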

Now consider a semidefinite program, and for simplicity, assume the identity
matrix is feasible:

    min ⟨C, X⟩   s.t. A(X) = b, X ⪰ 0.

Affine_z := {X : A(X) = b and ⟨C, X⟩ = z}

    π(X) := I + (1 / (1 − λ_min(X))) (X − I)

Equivalent problem:

    max λ_min(X)   s.t. A(X) = b, ⟨C, X⟩ = z.

As before, X_z* denotes the point of Affine_z whose radial projection is the
optimal solution: π* = π(X_z*) – so z* = ⟨C, π(X_z*)⟩.
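A quick numerical check of the radial projection (my own illustration): π(X) moves X along the ray from I through X so that its smallest eigenvalue lands exactly at 0, i.e. on the boundary of the PSD cone.

```python
import numpy as np

def radial_projection(X):
    # pi(X) = I + (X - I) / (1 - lambda_min(X)); requires lambda_min(X) < 1
    n = X.shape[0]
    lam_min = np.linalg.eigvalsh(X)[0]   # eigvalsh returns ascending eigenvalues
    return np.eye(n) + (X - np.eye(n)) / (1.0 - lam_min)

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
X = (B + B.T) / 4.0                      # symmetric; lambda_min(X) < 1 here
piX = radial_projection(X)
# lambda_min(pi(X)) = 0: pi(X) lies on the boundary of the PSD cone
```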

Goal: Compute X satisfying

    (⟨C, π(X)⟩ − z*) / (⟨C, I⟩ − z*) ≤ ε.

To accomplish this, how accurately does λ_min(X) need to approximate λ_min(X_z*)?

    SDP: min ⟨C, X⟩ s.t. A(X) = b, X ⪰ 0   ≡   max λ_min(X) s.t. A(X) = b, ⟨C, X⟩ = z

    (⟨C, π(X)⟩ − z*) / (⟨C, I⟩ − z*) ≤ ε
        ⇔   λ_min(X_z*) − λ_min(X) ≤ (ε / (1 − ε)) · (⟨C, I⟩ − z) / (⟨C, I⟩ − z*)

(figure: the ratio (⟨C, I⟩ − z) / (⟨C, I⟩ − z*), which can be small)

Affine_z := {X : A(X) = b and ⟨C, X⟩ = z}

To get around the high accuracy required if the ratio is small, and to get around
having to know the ratio (that is, having to know the optimal value z*), we apply
a supgradient method to multiple layers (details of which can be found in the
arXiv posting) . . .

    max λ_min(X)   s.t. A(X) = b, ⟨C, X⟩ = z

(figure: X_0 is an initial user-supplied SDP-feasible matrix, shown with I)

(figure: successive iterates X_1, X_2, ..., X_5 moving through the layers)

(figure: level sets for SDP, with the initial user-supplied SDP-feasible matrix X_0 and I)

Diam := supremum of diameters of level sets for objective values ≤ ⟨C, X_0⟩

Thm: If

    ℓ ≥ 8 ( Diam²/ε² + (1/ε) log_{4/3}( (⟨C, I⟩ − z*) / (⟨C, I⟩ − ⟨C, X_0⟩) ) + 1 ),

then

    min_{k ≤ ℓ} (⟨C, π(X_k)⟩ − z*) / (⟨C, I⟩ − z*) ≤ ε.

Now consider a general convex conic optimization problem, and fix a strictly
feasible point e:

    min c·x   s.t. Ax = b, x ∈ K,

where K is a closed, convex cone with nonempty interior.

Affine_z := {x : Ax = b and c·x = z}

    π(x) := e + (1 / (1 − λ_min(x))) (x − e)

... where λ_min(x) is the scalar λ satisfying x − λe ∈ boundary(K).

Prop: The map x ↦ λ_min(x) is concave and Lipschitz continuous.

    min c·x s.t. Ax = b, x ∈ K    ≡    max λ_min(x) s.t. Ax = b, c·x = z

    (c·π(x) − z*) / (c·e − z*) ≤ ε
        ⇔   λ_min(x_z*) − λ_min(x) ≤ (ε / (1 − ε)) · (c·e − z) / (c·e − z*)

Even in this general setting we have “if and only if”.
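As a concrete instance (my example, not the talk's): for the second-order cone K = {(u, t) : ‖u‖ ≤ t} with e = (0, ..., 0, 1), the scalar λ with x − λe ∈ boundary(K) has a closed form, and its concavity is easy to check numerically.

```python
import numpy as np

# K = {(u, t) : ||u|| <= t}, e = (0, ..., 0, 1).
# x - lam*e = (u, t - lam) is on boundary(K) exactly when ||u|| = t - lam,
# so lam_min(u, t) = t - ||u||: linear minus a norm, hence concave.
def lam_min(x):
    return x[-1] - np.linalg.norm(x[:-1])

e = np.array([0.0, 0.0, 1.0])
x = np.array([0.6, -0.8, 2.5])       # ||u|| = 1.0, so lam_min(x) = 1.5
y = np.array([-1.0, 2.0, 4.0])
bd = x - lam_min(x) * e              # lands on boundary(K): ||u|| equals t
```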

Whereas for linear programming we relied on the dot product, and for SDP we relied on the trace product, in this general setting we allow computations to be done with respect to any inner product.

However, the extent to which the inner product reflects the geometry of the
cone K affects the Lipschitz constant . . .

(figure: e and the ball {x : ‖x − e‖ ≤ r_e and Ax = b}, contained in K)

Prop:  |λ_min(x) − λ_min(y)| ≤ (1/r_e) ‖x − y‖  for all x, y ∈ Affine_z
       and for every z.

(see arXiv posting for full explanation)

(figure: level sets, with the initial user-supplied feasible point x_0 and e)

Diam := supremum of diameters of level sets for objective values ≤ c·x_0

Applying a supgradient method results in a sequence x0,x1,... for which . . .

Thm: The Lipschitz constant satisfies M ≤ 1/r_e. If

    ℓ ≥ 8 ( (M·Diam)²/ε² + (1/ε) log_{4/3}( (c·e − z*) / (c·e − c·x_0) ) + 1 ),

then

    min_{k ≤ ℓ} (c·π(x_k) − z*) / (c·e − z*) ≤ ε.

Our motivation now is to develop an approach similar to the one of Nesterov, but
which applies to optimization problems with complicated feasible regions rather
than just “simple” ones.

We depend heavily on various works of Nesterov, as well as results from the literature on “hyperbolic polynomials.”

(Our most recent arXiv posting has all of the details.)

Smoothing

Following Nesterov, rely on the smooth concave function

    f_µ(X) := −µ ln Σ_j exp(−λ_j(X)/µ)        (for fixed µ > 0)

Easy to see:   λ_min(X) − µ ln n ≤ f_µ(X) ≤ λ_min(X)

Not so obvious, but which Nesterov showed:

    ‖∇f_µ(X) − ∇f_µ(Y)‖₁* ≤ (1/µ) ‖X − Y‖₁,

that is, X ↦ ∇f_µ(X) has Lipschitz constant L = 1/µ.

    ∇f_µ(X) = (1 / Σ_j exp(−λ_j(X)/µ)) · Q diag( exp(−λ_1(X)/µ), ..., exp(−λ_n(X)/µ) ) Qᵀ,

where X = Q diag(λ_1(X), ..., λ_n(X)) Qᵀ is an eigendecomposition of X – expensive!

A relevant line of work thus begins with . . .

d’Aspremont, “Smooth optimization with approximate gradient,” SIAM J Opt (2008)


For linear programming:

    ∇f_µ(x) = (1 / Σ_j exp(−x_j/µ)) · ( exp(−x_1/µ), ..., exp(−x_n/µ) )   – inexpensive!

    min ⟨C, X⟩ s.t. A(X) = b, X ⪰ 0   ≡   max λ_min(X) s.t. A(X) = b, ⟨C, X⟩ = z
                                      ≈   max f_µ(X) s.t. A(X) = b, ⟨C, X⟩ = z
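A numerically stable sketch of the LP-case smoothing (shifting by min_j x_j inside the log-sum-exp to avoid overflow; the helper names and data are mine):

```python
import numpy as np

def f_mu(x, mu):
    # f_mu(x) = -mu * log(sum_j exp(-x_j/mu)), computed stably by shifting
    m = x.min()
    return m - mu * np.log(np.exp(-(x - m) / mu).sum())

def grad_f_mu(x, mu):
    # the gradient is a probability vector concentrating on the minimizing coords
    w = np.exp(-(x - x.min()) / mu)
    return w / w.sum()

x = np.array([0.3, -0.2, 1.5, -0.2])
mu = 0.05
val, g = f_mu(x, mu), grad_f_mu(x, mu)
# sandwich bound: min(x) - mu*ln(n) <= f_mu(x) <= min(x)
```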

Same goal as before: Compute feasible X satisfying

    (⟨C, X⟩ − z*) / (⟨C, I⟩ − z*) ≤ ε.

Choosing µ = ε/(6 ln n) and relying on Nesterov’s original accelerated method . . .

Thm: There is a universal constant κ such that if

    k ≥ κ · √(ln n) · (1/ε) · ( Diam + log( (⟨C, I⟩ − z*) / (⟨C, I⟩ − ⟨C, X_0⟩) ) ),

then

    (⟨C, π(X_k)⟩ − z*) / (⟨C, I⟩ − z*) ≤ ε.

Especially-notable earlier work with similar iteration bounds:

Lu, Nemirovski and Monteiro, “Large-scale semidefinite programming via a saddle
point Mirror-Prox algorithm,” Math Prog (2007)

Lan, Lu and Monteiro, “Primal-dual first-order methods with O(1/ε)
iteration-complexity for cone programming,” Math Prog (2011)

    min c·x   s.t. Ax = b, x ∈ K ⊆ E        (E a Euclidean space)

Defn: K is a “hyperbolicity cone” if there is a homogeneous polynomial p satisfying:
• p(x) ≠ 0 for all x ∈ int(K)
• p(x) = 0 for all x ∈ bdy(K)
• there exists e ∈ int(K) such that for all x ∈ E, the univariate polynomial
  λ ↦ p(x − λe) has only real roots.

Example: K = S₊ⁿˣⁿ, p(X) = det(X), e = I, λ ↦ det(X − λI).

Also: the non-negative orthant, second-order cones, and many others.
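A small check of this example (my illustration): for K = S₊ⁿˣⁿ with p(X) = det(X) and e = I, the roots of λ ↦ p(X − λI) are just the ordinary eigenvalues of X, and for symmetric X they are all real.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
X = B + B.T                  # symmetric, so all eigenvalues are real

coeffs = np.poly(X)          # coefficients of the characteristic polynomial of X
roots = np.roots(coeffs)     # roots of lambda -> det(lambda*I - X); same roots
                             # as det(X - lambda*I), which differs only in sign
eigs = np.linalg.eigvalsh(X) # eigenvalues in ascending order, guaranteed real
```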

Theorem (Gårding, 1959): For every e ∈ int(K) and all x ∈ E, the univariate
polynomial λ ↦ p(x − λe) has only real roots.

• If each of K₁ and K₂ is a hyperbolicity cone, then so are K₁ × K₂ and K₁ ∩ K₂.
• If K′ ⊆ E′ is a hyperbolicity cone and T : E → E′ is a linear transformation,
  then K := {x ∈ E : T(x) ∈ K′} is a hyperbolicity cone.

    min c·x   s.t. Ax = b, x ∈ K (a hyperbolicity cone)        – a “hyperbolic program”

Güler, “Hyperbolic polynomials and interior point methods for convex
programming,” Math of Oper Res (1997)

Now every x has n real “eigenvalues” λ_1(x), ..., λ_n(x) – the roots of the
univariate polynomial λ ↦ p(x − λe) – where n is the degree of the hyperbolic
polynomial p.

Affine_z := {x : Ax = b and c·x = z}

    f_µ(x) := −µ ln Σ_j exp(−λ_j(x)/µ)        (for fixed µ > 0)

Easy to see:   λ_min(x) − µ ln n ≤ f_µ(x) ≤ λ_min(x)

Prop: f_µ is concave and infinitely Fréchet differentiable. Moreover,

    ‖∇f_µ(x) − ∇f_µ(y)‖₁* ≤ (1/µ) ‖x − y‖₁   for all x, y.

(Mostly a corollary to Nesterov’s result and the Helton-Vinnikov Theorem.)

Moreover, the eigenvalues λ_j(x) are readily computable, especially if the
underlying “hyperbolic polynomial” can be factored as the product of polynomials
of low degrees.

(see the arXiv posting for details)

(figure: e and the ball {x : ‖x − e‖ ≤ r_e and Ax = b}, contained in K)

Cor:  ‖P̄∇f_µ(x) − P̄∇f_µ(y)‖ ≤ (1/(r_e² µ)) ‖x − y‖  for all x, y ∈ Affine_z
      and for every z.

    min c·x s.t. Ax = b, x ∈ K   ≡   max λ_min(x) s.t. Ax = b, c·x = z
                                 ≈   max f_µ(x) s.t. Ax = b, c·x = z

Same goal as before: Compute feasible x satisfying (c·x − z*) / (c·e − z*) ≤ ε.

Choosing µ = ε/(6 ln n) and using a “uniformly optimal” (or “universal”)
accelerated gradient method (Lan (2010), Nesterov (2014)) . . .

Thm: The algorithm produces x_k satisfying

    (c·π(x_k) − z*) / (c·e − z*) ≤ ε

and does so within computing a total number of gradients not exceeding

    O( √L · Diam · (1/√ε) · ( 1 + log( (c·e − z*) / (c·e − c·x_0) ) + log(L/L_0) ) ),

where L is the Lipschitz constant for the gradient map and where L_0 > 0 is the
input guess of L.

Note:  L ≤ 1/(r_e² µ) = (6 ln n)/(r_e² ε)

Let’s see what results by applying the framework to general convex optimization
problems by putting those problems into conic form.

First we consider minimizing a convex function subject to no constraints, but we
make some assumptions on the function so that we can clarify how the new
approach differs from applying subgradient methods directly . . .

Assume: f is lower semicontinuous and has a minimizer.

    min f(x)   ≡   min_{x,t} t    s.t. (x, t) ∈ epi(f) := {(x, t) : f(x) ≤ t}
                                       (a closed convex set)

               ≡   min_{x,t,t′} t    s.t. t′ = 1, (x, t, t′) ∈ K,

where K is the closed cone for which (x, t, 1) ∈ K ⇔ (x, t) ∈ epi(f).

The third problem is a conic formulation to which we apply our supgradient
framework.

Assume:
• f is lower semicontinuous and has a minimizer
• {x : f(x) < ∞} is open
• ‖x‖ < 1 ⇒ f(x) < 0

(figure: the graph of such a function f, with an asymptote)

Initialize: x_0 = 0 (the zero vector), z = f(0)
Iterate:

(1) Compute the positive scalar α_k satisfying f(α_k x_k) = α_k z.

(2) Let y_k := α_k x_k.

(figure: the graphs of α ↦ f(α x_k) and α ↦ αz; their intersection determines α_k)

Here z < 0 is an upper bound on the optimal value of min f(x) (the value z is
occasionally updated).

(figure: in ℝⁿ, the points 0, x_k, and y_k := α_k x_k, along with the step to x_{k+1})

If α_k < 4/3, then let x_{k+1} = x_k + (ε/‖g_k‖²) g_k, where

    g_k = (1 / (f(y_k) + ⟨∇f(y_k), 0 − y_k⟩)) ∇f(y_k),

built from a subgradient ∇f(y_k) at y_k.

The subgradient is for y_k but the step is taken from x_k!

(figure: in ℝⁿ, the next point x_{k+1}; determine the positive scalar α_{k+1}
for which f(α_{k+1} x_{k+1}) = α_{k+1} z, and define y_{k+1} = α_{k+1} x_{k+1})

If α_{k+1} ≥ 4/3, define x_{k+2} = y_{k+1} and update z:  z ← f(x_{k+2}).

Initialize: x_0 = 0, z = f(0)
Iterate:

(1) Compute the positive scalar α_k satisfying f(α_k x_k) = α_k z.

(2) Let y_k := α_k x_k.

(3) If α_k ≥ 4/3, let x_{k+1} = y_k and z ← f(y_k).
    Else let

        x_{k+1} = x_k + (ε/‖g_k‖²) g_k,

    where

        g_k = (1 / (f(y_k) + ⟨∇f(y_k), 0 − y_k⟩)) ∇f(y_k),

    ∇f(y_k) being a subgradient at y_k. (The denominator satisfies
    f(y_k) + ⟨∇f(y_k), 0 − y_k⟩ ≤ f(0) < 0, by the subgradient inequality.)
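A compact sketch of this iteration (illustrative only: α_k is found by bisection, and the test function, iteration count, and tolerances are all my choices, assuming f < 0 on the unit ball and f growing at infinity):

```python
import numpy as np

def radial_supgradient(f, subgrad, dim, eps, iters):
    x, z = np.zeros(dim), f(np.zeros(dim))   # z = f(0) < 0 by assumption
    best = z
    for _ in range(iters):
        # (1) find alpha > 0 with f(alpha*x) = alpha*z, via bisection on
        #     g(alpha) = f(alpha*x) - alpha*z  (g(0) = f(0) < 0, g -> +inf)
        lo, hi = 0.0, 1.0
        while f(hi * x) - hi * z < 0.0:
            hi *= 2.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if f(mid * x) - mid * z < 0.0:
                lo = mid
            else:
                hi = mid
        alpha = 0.5 * (lo + hi)
        y = alpha * x                        # (2)
        best = min(best, f(y))
        if alpha >= 4.0 / 3.0:               # (3) enough progress: move to y
            x, z = y, f(y)
        else:                                # supgradient step, taken from x
            gy = subgrad(y)
            denom = f(y) - gy.dot(y)         # = f(y) + <grad, 0 - y> <= f(0) < 0
            g = gy / denom
            x = x + (eps / g.dot(g)) * g
    return best

# Test function: f(x) = ||x - a|| - r with r > ||a|| + 1, so f < 0 on the
# unit ball; the minimum value is -r, attained at x = a.
a, r = np.array([2.0, 0.0]), 4.0
f = lambda x: np.linalg.norm(x - a) - r
subgrad = lambda x: (x - a) / max(np.linalg.norm(x - a), 1e-12)
best = radial_supgradient(f, subgrad, 2, eps=0.1, iters=100)
```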

Cor: (the optimal value satisfies f* < 0)

The supgradient algorithm computes y_k satisfying

    (f(y_k) − f*) / (0 − f*) ≤ ε,

where

    k ≤ 8 ( D²/ε² + (1/ε) log_{4/3}( (D + 1)/(1 − ε) ) + 1 ),

defining D = diameter of {x : f(x) ≤ f(0)}.

Differs from the traditional subgradient literature in that f is not required to
be Lipschitz continuous.

More generally . . .

    min f(x)   s.t. x ∈ Feas := {x ∈ S : Ax = b},

where f is extended-valued, convex and lower semicontinuous, and S is closed
and convex.

Assume:
• x̄ satisfies Ax̄ = b and x̄ ∈ interior(S) ∩ effective domain(f)
• {x ∈ B(x̄, 1) : Ax = b} ⊆ S ∩ effective domain(f), where B is the unit ball
  for the Euclidean norm; let f̂ be a scalar upper bound on f(x) for all x in
  this set
• D = diameter of {x ∈ Feas : f(x) ≤ f(x̄)}

Then one can compute feasible x satisfying

    (f(x) − f*) / (f̂ − f*) ≤ ε        (f* the optimal value)

within O( D²/ε² + (1/ε) log D ) iterations.

(see arXiv posting for details)

But the most important takeaway is entirely elementary: the radial projection

    π(x) = e + (1 / (1 − λ_min(x))) (x − e),

where λ_min(x) is the scalar λ satisfying x − λe ∈ boundary(K), turns the conic
program min c·x s.t. Ax = b, x ∈ K into the equivalent problem

    max λ_min(x)   s.t. Ax = b, c·x = z

over Affine_z := {x : Ax = b and c·x = z}, whose only constraints are linear
equations.

Thanks for listening!