The Linear Convergence of a Successive Linear Programming

The Linear Convergence of a Successive Linear Programming Algorithm y z Michael C Ferris Sergei K Zavriev December Abstract We present a successive linear programming algorithm for solving constrained nonlinear optimization problems The algorithm employs an Armijo pro cedure for up dating a trust region radius We prove the linear convergence of the metho d by relating the solutions of our sub problems to standard trust region and gradient pro jection subproblems and adapting an error b ound analysis due to Luo and Tseng Com putational results are provided for p olyhedrally constrained nonlinear programs Introduction In this pap er we develop a rst order technique for constrained optimization problems based on subproblems with a linear ob jective function There have b een many attempts to generate robust successive linear programming al gorithms for nonlinear programming see for example This pap er sp ecies a new algorithm that has the same convergence prop erties as the gradient pro jection metho d We mo dify the conditional gradient metho d sometimes called the FrankWolfe metho d for p olyhedrally constrained optimization problems to obtain a linearly con vergent algorithm The only known linear rate of convergence is given in but this is for problems where the constraints are not p olyhedral but have p ositive curvature there are even simple examples p This work is partially supp orted by the National Science Foundation under grant CCR y Computer Sciences Department University of Wisconsin Madison WI z Op erations Research Department Faculty of Computational Mathematics and Cy b ernetics Moscow State University Moscow Russia that show the conditional gradient metho d has a sublinear rate of conver gence The main idea is to use a p olyhedral trust region constraint in the stan dard linear programming subproblem of the conditional gradient metho d This maintains the attractive computational feature that all subproblems are linear programs for which a wealth of highly eective and ecient software has b een developed It also guarantees that every subproblem we generate has an optimal solution We develop an adaptive mechanism to up date the trust region parameter that guarantees the linear convergence of the metho d under the standard assumptions given by Luo and Tseng Our analysis is somewhat involved and relies up on estimating descent prop erties of the directions generated by our algorithm in terms of standard quadratic subproblems arising for example in gradient pro jection algorithms To our knowledge every linearly convergent technique for such problems is based on such quadratic subproblems The main contribution of this pap er is that the solutions of our linear subproblems are considered as approxi mate solutions of the quadratic subproblems and analysis is carried out to precisely quantify this relationship We will consider the following optimization problem P inf f x xX n where X R is a nonempty closed convex set and the ob jective function f is assumed to b e continuously dierentiable Furthermore it will b e assumed that f has a Lipschitz continuous gradient rf on X that is x y X krf x rf y k L kx y k n Here L is a p ositive scalar and kk represents the Euclidean norm on R We also supp ose that f inf f x opt xX and the optimal solution set for P is not empty Let N x denote the normal cone to X at x X The stationary set X for the problem P is the set of all p oints satisfying the rst order optimality conditions for P that is X fx X j rf x N xg stat X fx X j y X hrf x y xi g Our algorithm will converge under standard assumptions to a stationary p oint Conditions relating stationary p oints to solutions of P are well known Our metho d resembles a trust region metho d see by virtue of the fact that the subproblems are of the form min hrf x hi h LP x r sub ject to x h X khk r n where kk represents an arbitrary norm on R and r is the trust region radius We actually prove convergence of our algorithm based on X b eing a convex set and the trust region constraint b eing determined by an arbi trary norm When sp ecialized to the case where X p olyhedral and kk is a p olyhedral norm the subproblems LP x r are linear programs and our algorithm is a successive linear programming algorithm There are two key approximations in our analysis The rst is that we relate solutions of our linear programming descent problem LP x r to the following subproblem min hrf x hi khk h a QP x a sub ject to x h X where a is a corresp onding stepsize When X is p olyhedral QP x a is a quadratic programming problem It is imp ortant that we guarantee in our algorithm that the trust region radius r is not to o small We therefore employ a sp ecial Armijo type pro cedure for choosing r at each iteration The essential diculty in the pro of relates to nding an expression for the inverse ax r of r x a and this is explored in Section The key relationship is b etween solutions of LP x r and QP x a app ears as Lemma Usually rapid convergence techniques ie at least a linear rate for solving P require the solution of a subproblem with a quadratic ob jective function at each iteration k For example the gradient pro jection algorithm solves subproblems of the form QP x a The second piece of our analysis given as Theorem in Section of this pap er relates the solutions of QP x r to solutions of the following standard trust region problem min hrf x hi h TP x r sub ject to x h X khk r Note that the ob jective function of this problem is linear and that the trust region constraint is dened in terms of the Euclidean norm In Section we state the general form of the metho d which we call the Sequential Linearization Metho d and prove its linear convergence to the stationary set X in Section The sp ecial case of p olyhedral constraints stat is considered in Section The resulting algorithm is a linearly convergent sequential linear programming SLP technique for which we also provide some numerical results using Matlab Cplex and the Cute suite of problems Technical Preliminaries For every x X consider the following auxiliary problem min hrf x hi h TP x r sub ject to x h X khk r where r is a parameter of the problem Let Hx r and v x r denote the optimal set and the optimal value of TP x r resp ectively We shall relate this to the following problem min hrf x hi h TP x sub ject to x h X Note that it is p ossible for v x and Hx The standard optimality conditions see for the problem TP x r provide the following lemma n Lemma i For every x X r and h R a and b are equivalent a h Hx r b there exists such that rf x h N x h X x h X khk r khk r n ii For every x X and h R a and b are equivalent a h Hx b rf x N x h x h X X The following corollary is immediate from Lemma Corollary For every x X and r the fol lowing assertion holds x X v x r stat n For a subset S R let x denote the set of all orthogonal pro jections S of x on S that is x fy S j kx y k dist x S g S Here dist x S represents the distance from x to S dist x S inf kx y k y S Let x b e a typical element of x it is clear that if S is closed and S S convex then x f xg S S For x X and a dene p x a x arf x x X Clearly for a p x a is the unique solution of the following problem khk min hrf x hi h a QP x a sub ject to x h X The following two simple results prove to b e useful in the sequel Lemma Let x X and a be given The fol lowing assertions hold i h p x a if and only if h N x h x h X rf x X a ii hrf x p x ai kp x ak a n Corollary Let h R be a limit point of a sequence fp x a g a k k then h Hx We now dene a residual function r x R R by r x a kp x ak where R fx R j x g We collect some well known prop erties of the function r x R R in the following lemma The pro ofs can b e found in Lemma For every x X the fol lowing properties hold i r x C R ii a a r x a r x a r xa r xa iii a a a a iv a r x a x X stat The following lemma b ounds the residual in terms of any solution of TP x r Lemma Let x X and suppose Hx Then for every h Hx and every a r x a kp x ak khk Pro of The result is trivial for a Otherwise let us x arbitrary x X h Hx and a Supp ose that khk kp x ak By the denition of Hx we have hrf x hi hrf x p x ai Hence applying we obtain khk hrf x p x ai kp x ak hrf x hi a a which is a contradiction to the fact that p x a is the solution of QP x a We now investigate some limiting prop erties of this residual function For every x X we let r x lim r x a a Obviously it is p ossible that r x for some x X The follow ing lemma allows us to identify certain limits of the functions p x and r x It is really just a technical result used for proving the ensuing result Lemma Let x X and suppose for some a r x a r x Then x a a a p x a p In this case lim p x a p x a Hx a Pro of Let a a Lemma ii and show that x a kp x ak p We now derive a contradiction to the supp osition that x a p x a p Since p x a and p x a are the unique solutions of the problems QP x a and QP x a resp ectively we have hrf x p x ai x a x a p kp x ak rf x p a a x a p x a rf x p a hrf x p x ai kp x ak a Hence by x a hrf x p x ai hrf x p x ai rf x p which is a contradiction and so is proved It now follows from

The Linear Convergence of a Successive Linear Programming

12. Coordinate Descent Methods

Process Optimization

The Gradient Sampling Methodology

A New Algorithm of Nonlinear Conjugate Gradient Method with Strong Convergence*

Using a New Nonlinear Gradient Method for Solving Large Scale Convex Optimization Problems with an Application on Arabic Medical Text

Conditional Gradient Method for Multiobjective Optimization

Successive Linear Programming to The

Metaheuristic Techniques in Enhancing the Efficiency and Performance of Thermo-Electric Cooling Devices

Branch and Bound Algorithm for Dependency Parsing with Non-Local Features

Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes

Conjugate Gradient Descent Conjugate Direction Methods

Block Stochastic Gradient Method