Acta Numerica (1995), pp. 1-51

Sequential Quadratic Programming†

Paul T. Boggs
Applied and Computational Mathematics Division
National Institute of Standards and Technology
Gaithersburg, Maryland, USA
Email: boggs@cam.nist.gov

Jon W. Tolle
Departments of Mathematics and Operations Research
University of North Carolina
Chapel Hill, North Carolina, USA
Email: tolle@math.unc.edu

† Contribution of the National Institute of Standards and Technology and not subject to copyright in the United States.

CONTENTS

1  Introduction
2  The Basic SQP Method
3  Local Convergence
4  Merit Functions and Global Convergence
5  Global to Local Behavior
6  SQP Trust Region Methods
7  Practical Considerations
   References

1. Introduction

Since its popularization in the late 1970s, Sequential Quadratic Programming (SQP) has arguably become the most successful method for solving nonlinearly constrained optimization problems. As with most optimization methods, SQP is not a single algorithm, but rather a conceptual method from which numerous specific algorithms have evolved. Backed by a solid theoretical and computational foundation, both commercial and public domain SQP algorithms have been developed and used to solve a remarkably large set of important practical problems. Recently large scale versions have been devised and tested with promising results.

In this paper we examine the underlying ideas of the SQP method and the theory that establishes it as a framework from which effective algorithms can be derived. In the process we will describe the most popular manifestations of the method, discuss their theoretical properties, and comment on their practical implementations.

The problem to be solved is

\[
\mbox{(NLP)}\qquad
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{x} & f(x)\\[2pt]
\mbox{subject to} & h(x) = 0,\\
 & g(x) \le 0,
\end{array}
\]

where $f:\mathbb{R}^n \to \mathbb{R}$, $h:\mathbb{R}^n \to \mathbb{R}^m$, and $g:\mathbb{R}^n \to \mathbb{R}^p$. Such problems arise in a variety of applications in science, engineering, industry, and management. In the form (NLP) the problem is quite general; it includes as special cases linear and quadratic programs, in which the functions $h$ and $g$ are affine and $f$ is linear or quadratic. While these problems are important and numerous, the great strength of the SQP method is its ability to solve problems with nonlinear constraints. For this reason it is assumed that (NLP) contains at least one nonlinear constraint.

The basic idea of SQP is to model (NLP) at a given approximate solution, say $x^k$, by a quadratic programming subproblem, and then to use the solution to this subproblem to construct a better approximation $x^{k+1}$. This process is iterated to create a sequence of approximations that, it is hoped, will converge to a solution $x^*$. Perhaps the key to understanding the performance and theory of SQP is the fact that, with an appropriate choice of quadratic subproblem, the method can be viewed as the natural extension of Newton and quasi-Newton methods to the constrained optimization setting. Thus one would expect SQP methods to share the characteristics of Newton-like methods, namely, rapid convergence when the iterates are close to the solution but possible erratic behavior that needs to be carefully controlled when the iterates are far from a solution. While this correspondence is valid in general, the presence of constraints makes both the analysis and implementation of SQP methods significantly more complex.

Two additional properties of the SQP method should be pointed out. First, SQP is not a feasible-point method; that is, neither the initial point nor any of the subsequent iterates need be feasible (a feasible point satisfies all of the constraints of (NLP)). This is a major advantage, since finding a feasible point when there are nonlinear constraints may be nearly as hard as solving (NLP) itself. SQP methods can be easily modified so that linear constraints, including simple bounds, are always satisfied. Second, the success of the SQP methods depends on the existence of rapid and accurate algorithms for solving quadratic programs. Fortunately, quadratic programs are easy to solve in the sense that there are good procedures for their solution. Indeed, when there are only equality constraints, the solution of a quadratic program reduces to the solution of a linear system of equations. When there are inequality constraints, a sequence of systems may have to be solved.

A comprehensive theoretical foundation does not guarantee that a proposed algorithm will be effective in practice. In real-life problems hypotheses may not be satisfied, certain constants and bounds may not be computable, matrices may be numerically singular, or the scaling may be poor. A successful algorithm needs adaptive safeguards that deal with these pathologies. The algorithmic details to overcome such difficulties, as well as more mundane questions (how to choose parameters, how to recognize convergence, and how to carry out the numerical linear algebra) are lumped under the term implementation. While a detailed treatment of this topic is not possible here, we will take care to point out questions of implementation that pertain specifically to the SQP method.

This survey is arranged as follows. In Section 2 we state the basic SQP method along with the assumptions about (NLP) that will hold throughout the paper. We also make some necessary remarks about notation and terminology. Section 3 treats local convergence, that is, behavior of the iterates when they are close to the solution. Rates of convergence are provided, both in general terms and for some specific SQP algorithms. The goal is not to present the strongest results, but to establish the relation between Newton's method and SQP, to delineate the kinds of quadratic models that will yield satisfactory theoretical results, and to place current variations of the SQP method within this scheme.

The term global is used in two different contexts in nonlinear optimization and is often the source of confusion. An algorithm is said to be globally convergent if, under suitable conditions, it will converge to some local solution from any remote starting point. Nonlinear optimization problems can have multiple local solutions; the global solution is that local solution corresponding to the least value of $f$. SQP methods, like Newton's method and steepest descent, are only guaranteed to find a local solution of (NLP); they should not be confused with algorithms for finding the global solution, which are of an entirely different flavor.

To establish global convergence for constrained optimization algorithms, a way of measuring progress towards a solution is needed. For SQP this is done by constructing a merit function, a reduction in which implies that an acceptable step has been taken. In Section 4 two standard merit functions are defined and their advantages and disadvantages for forcing global convergence are considered. In Section 5 the problems of the transition from global to local convergence are discussed. In Sections 4 and 5 the emphasis is on line search methods. Because trust region methods, which have been found to be effective in unconstrained optimization, have been extended to fit into the SQP framework, a brief description of this approach is given in Section 6. Section 7 is devoted to implementation issues, including those associated with large scale problems.

To avoid interrupting the flow of the presentation, comments on related results and references to the literature are provided at the end of each section. The number of papers on the SQP method and its variants is large, and space prohibits us from compiling a complete list of references; we have tried to give enough references to each topic to direct the interested reader to further material.

Notes and References

The earliest reference to SQP-type algorithms seems to have been in the PhD thesis of Wilson (1963) at Harvard University, in which he proposed the method we call in Section 3 the Newton-SQP algorithm. The development of the secant or variable-metric algorithms for unconstrained optimization in the late 1960s and early 1970s naturally led to the extension of these methods to the constrained problem. The initial work on these methods was done by Mangasarian and his students at the University of Wisconsin. Garcia-Palomares and Mangasarian (1976) investigated an SQP-type algorithm in which the entire Hessian of the Lagrangian, i.e., the matrix of second derivatives with respect to both $x$ and the multipliers, was updated at each step. Shortly thereafter, Han (Han 1976, Han 1977) provided further impetus for the study of SQP methods. In the first paper Han gave local convergence and rate of convergence theorems for the PSB- and BFGS-SQP algorithms for the inequality-constrained problem, and in the second employed the $\ell_1$ merit function to obtain a global convergence theorem in the convex case. In a series of papers presented at conferences (Powell 1977, Powell 1978a and Powell 1978b), Han's work was brought to the attention of the general optimization audience. From that time there has been a continuous production of research papers on the SQP method.

As noted, a significant advantage of SQP is that feasible points are not required at any stage of the process. Nevertheless, a version of SQP that always remains feasible has been developed and studied, for example, by Bonnans, Panier, Tits and Zhou (1992).

2. The Basic SQP Method

2.1. Assumptions and Notation

As with virtually all nonlinear problems, it is necessary to make some assumptions on the problem (NLP) that clarify the class of problems for which the algorithm can be shown to perform well. These assumptions, as well as the consequent theory of nonlinear programming, are also needed to describe the SQP algorithm itself. In this presentation we do not attempt to make the weakest assumptions possible, but rather provide what we consider reasonable assumptions under which the main ideas can be illustrated.

We make the blanket assumption that all of the functions in (NLP) are three times continuously differentiable. We denote the gradient of a scalar-valued function by $\nabla$, e.g., $\nabla f(x)$. (Here, and throughout the paper, all vectors are assumed to be column vectors; we use the superscript $t$ to denote transpose.) For vector-valued functions we also use $\nabla$ to denote the Jacobian of the function. For example,

\[
\nabla h(x) = \bigl(\nabla h_1(x),\, \nabla h_2(x),\, \ldots,\, \nabla h_m(x)\bigr).
\]

The Hessian of a scalar-valued function, denoted by the letter $H$, is defined to be the symmetric matrix whose $(i,j)$th component is

\[
[Hf(x)]_{ij} = \frac{\partial^2 f(x)}{\partial x_i\,\partial x_j}.
\]

Where a function is defined on more than one set of variables, such as the Lagrangian function defined below, the differentiation operators $\nabla$ and $H$ will refer to differentiation with respect to $x$ only.

A key function, one that plays a central role in all of the theory of constrained optimization, is the scalar-valued Lagrangian function, defined by

\[
L(x, u, v) = f(x) + u^t h(x) + v^t g(x),
\]

where $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^p$ are the multiplier vectors. Given a vector $x$, the set of active constraints at $x$ consists of the inequality constraints, if any, satisfied as equalities at $x$. We denote the index set of active constraints by

\[
I(x) = \{\, i : g_i(x) = 0 \,\}.
\]

The matrix $G(x)$, made up of the matrix $\nabla h(x)$ along with the columns $\nabla g_i(x)$, $i \in I(x)$, will be important in describing the basic assumptions and carrying out the subsequent analyses. Assuming that the matrix $G(x)$ has full column rank, the null space of $G(x)^t$ defines the tangent space to the equality and active inequality constraints at $x$. The projection onto this tangent space can be written as

\[
P(x) = I - G(x)\bigl(G(x)^t G(x)\bigr)^{-1} G(x)^t. \tag{2.1}
\]

The corresponding projection onto the range space of $G(x)$ will be written

\[
Q(x) = I - P(x).
\]

For convenience, these projection matrices evaluated at iterates $x^k$ and at a solution $x^*$ will be denoted by $P^k$, $P^*$, $Q^k$, and $Q^*$. Similarly, we will write $HL^* = HL(x^*, u^*, v^*)$ throughout the remainder of this paper.
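For illustration, the projections (2.1) can be computed stably from a QR factorization of $G(x)$ instead of forming $(G(x)^t G(x))^{-1}$ explicitly. The following sketch, in Python with NumPy (the code examples in this survey are illustrative sketches under stated assumptions, not part of the original analysis), assumes only that $G$ is supplied as an $n \times r$ array of full column rank.

    import numpy as np

    def projections(G):
        # Given G = G(x) (n x r, full column rank), return P(x), the projector
        # onto the null space of G^t as in (2.1), and Q(x) = I - P(x), the
        # projector onto the range space of G.
        n = G.shape[0]
        Qr, _ = np.linalg.qr(G)        # orthonormal basis for range(G)
        Q = Qr @ Qr.T                  # projection onto range(G)
        P = np.eye(n) - Q              # projection onto null(G^t)
        return P, Q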


In this paper $x^*$ will represent any particular local solution of (NLP). We assume that the following conditions apply to each such solution:

(A1) The first order necessary conditions hold, i.e., there exist optimal multiplier vectors $u^*$ and $v^*$ such that

\[
\nabla L(x^*, u^*, v^*) = \nabla f(x^*) + \nabla h(x^*)\, u^* + \nabla g(x^*)\, v^* = 0.
\]

(A2) The columns of $G(x^*)$ are linearly independent.

(A3) Strict complementary slackness holds, i.e., for each $i = 1, \ldots, p$, $g_i(x^*) < 0$ implies $v_i^* = 0$, and if $g_i(x^*) = 0$ then $v_i^* > 0$.

(A4) The Hessian of the Lagrangian function with respect to $x$ is positive definite on the null space of $G(x^*)^t$, i.e.,

\[
d^t\, HL^*\, d > 0
\]

for all $d \ne 0$ such that $G(x^*)^t d = 0$.

The above conditions, sometimes called the strong second order sufficient conditions, in fact guarantee that $x^*$ is an isolated local minimum of (NLP) and that the optimal multiplier vectors $u^*$ and $v^*$ are unique. It should be noted that, without strong additional assumptions, (NLP) may have multiple local solutions.

We use the term critical point to denote a feasible point that satisfies the first order necessary conditions (A1). A critical point may or may not be a local minimum of (NLP).

In discussing convergence of SQP algorithms the asymptotic rate of convergence plays an important role. Three standard measures of convergence rates will be emphasized in this paper. In the definitions that follow, and throughout the paper, the norm $\|\cdot\|$ will refer to the 2-norm unless specifically noted otherwise. (Other norms can be used for most of the analysis.)

Definition 2.1  Let $\{x^k\}$ be a sequence converging to $x^*$. The sequence is said to converge linearly if there exists a positive constant $\gamma < 1$ such that

\[
\|x^{k+1} - x^*\| \le \gamma\,\|x^k - x^*\|
\]

for all $k$ sufficiently large; superlinearly if there exists a sequence of positive constants $\gamma_k \to 0$ such that

\[
\|x^{k+1} - x^*\| \le \gamma_k\,\|x^k - x^*\|
\]

for all $k$ sufficiently large; and quadratically if there exists a positive constant $\gamma$ such that

\[
\|x^{k+1} - x^*\| \le \gamma\,\|x^k - x^*\|^2
\]

for all $k$ sufficiently large.

These rates, which measure improvement at each step, are sometimes referred to as Q-rates. We will also have occasion to state results in terms of a measure of the average rate of convergence, called the R-rate. A sequence $\{x^k\}$ will be said to converge R-linearly if the sequence $\{\|x^k - x^*\|\}$ is bounded by a sequence that converges Q-linearly to zero. Similar definitions exist for R-superlinear and R-quadratic convergence. There are two relations between R-rate and Q-rate convergence that are of importance for this paper. First, $m$-step Q-linear convergence (where $k+1$ is replaced by $k+m$ in the above definitions) implies an R-linear rate of convergence; and second, a Q-rate of convergence of a sequence of vectors implies the same R-rate, but not the same Q-rate, of convergence of its components. In the analyses that follow, rates not designated explicitly as Q- or R-rates are assumed to be Q-rates.
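As a concrete illustration of Definition 2.1 and the Q-rates just described, the ratios $\|x^{k+1} - x^*\|/\|x^k - x^*\|$ and $\|x^{k+1} - x^*\|/\|x^k - x^*\|^2$ can be monitored numerically; the error sequences in the sketch below are synthetic, chosen only to exhibit the three rates.

    import numpy as np

    # Synthetic error sequences for the three Q-rates of Definition 2.1.
    lin  = [0.5 ** k for k in range(10)]                   # e_{k+1} = 0.5 e_k
    sup  = [0.5 ** (k * (k + 1) // 2) for k in range(10)]  # e_{k+1}/e_k = 0.5^{k+1} -> 0
    quad = [0.5 ** (2 ** k) for k in range(6)]             # e_{k+1} = e_k^2

    for name, e in [("linear", lin), ("superlinear", sup), ("quadratic", quad)]:
        q1 = [e[k + 1] / e[k] for k in range(len(e) - 1)]       # stays near a constant < 1?
        q2 = [e[k + 1] / e[k] ** 2 for k in range(len(e) - 1)]  # bounded only in the quadratic case
        print(name, np.round(q1[:4], 4), np.round(q2[:4], 4))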

2.2. The Quadratic Subproblem

As suggested in the Introduction, the SQP method is an iterative method in which, at a current iterate $x^k$, the step to the next iterate is obtained through information generated by solving a quadratic subproblem. The subproblem is assumed to reflect in some way the local properties of the original problem. The major reason for using a quadratic subproblem, i.e., a problem with a quadratic objective function and linear constraints, is that such problems are relatively easy to solve and yet in their objective function can reflect the nonlinearities of the original problem. The technical details of solving quadratic programs will not be dealt with here, although the algorithmic issues involved in solving these quadratic subproblems are nontrivial and their resolution will affect the overall performance of the SQP algorithm. Further comments on this are made in Section 7.

A major concern in SQP methods is the choice of appropriate quadratic subproblems. At a current approximation $x^k$ a reasonable choice for the constraints is a linearization of the actual constraints about $x^k$. Thus the quadratic subproblem will have the form

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{d_x} & r_k^t\, d_x + \tfrac12\, d_x^t B_k d_x\\[2pt]
\mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0,\\
 & \nabla g(x^k)^t d_x + g(x^k) \le 0,
\end{array}
\]

where $d_x = x - x^k$. The vector $r_k$ and the symmetric matrix $B_k$ remain to be chosen.

The most obvious choice for the objective function in this quadratic program is the local quadratic approximation to $f$ at $x^k$. That is, $B_k$ is taken as the Hessian and $r_k$ as the gradient of $f$ at $x^k$. While this is a reasonable approximation to use if the constraints are linear, the presence of nonlinear constraints makes this choice inappropriate. For example, consider the problem

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{x} & x_1 + x_2\\[2pt]
\mbox{subject to} & x_1^2 + x_2^2 - 2 = 0.
\end{array}
\]

The point $(-1, -1)$ is a solution satisfying (A1)-(A4), but at the point $(-1, -1+\varepsilon)$ the approximating quadratic program is (with $d$ replacing $d_x$)

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{d} & d_1 + d_2\\[2pt]
\mbox{subject to} & -2 d_1 + 2(\varepsilon - 1)\, d_2 + \varepsilon^2 - 2\varepsilon = 0,
\end{array}
\]

which is unbounded (on the constraint line the objective reduces to $\varepsilon\, d_2$ plus a constant) regardless of how small $\varepsilon$ is. Thus the algorithm that computes $d$ breaks down for this problem.

To take nonlinearities in the constraints into account, while maintaining the linearity of the constraints in the subproblem, the SQP method uses a quadratic model of the Lagrangian function as the objective. This can be justified by noting that conditions (A1)-(A4) imply that $x^*$ is a local minimum for the problem

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{x} & L(x, u^*, v^*)\\[2pt]
\mbox{subject to} & h(x) = 0,\\
 & g(x) \le 0.
\end{array}
\]

Note that the constraint functions are included in the objective function for this equivalent problem. Although the optimal multipliers are not known, approximations $u^k$ and $v^k$ to the multipliers can be maintained as part of the iterative process. Then, given a current iterate $(x^k, u^k, v^k)$, the quadratic Taylor series approximation in $x$ for the Lagrangian is

\[
L(x^k, u^k, v^k) + \nabla L(x^k, u^k, v^k)^t d_x + \tfrac12\, d_x^t\, HL(x^k, u^k, v^k)\, d_x.
\]

A strong motivation for using this function as the objective function in the quadratic subproblem is that it generates iterates that are identical to those generated by Newton's method when applied to the system composed of the first order necessary condition (condition (A1)) and the constraint equations (including the active inequality constraints). This means that the resulting algorithm will have good local convergence properties. In spite of these local convergence properties, there are good reasons to consider choices other than the actual Hessian of the Lagrangian, for example, approximating matrices that have properties that permit the quadratic subproblem to be solved at any $x^k$ and the resulting algorithm to be amenable to a global convergence analysis. Letting $B_k$ be an approximation of $HL(x^k, u^k, v^k)$, we can write the quadratic subproblem as

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{d_x} & \nabla L(x^k, u^k, v^k)^t d_x + \tfrac12\, d_x^t B_k d_x\\[2pt]
\mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0,\\
 & \nabla g(x^k)^t d_x + g(x^k) \le 0.
\end{array} \tag{2.2}
\]

The form of the quadratic subproblem most often found in the literature, and the one that will be employed here, is

\[
\mbox{(QP)}\qquad
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{d_x} & \nabla f(x^k)^t d_x + \tfrac12\, d_x^t B_k d_x\\[2pt]
\mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0,\\
 & \nabla g(x^k)^t d_x + g(x^k) \le 0.
\end{array}
\]

These two forms are equivalent for problems with only equality constraints since, by virtue of the linearized constraints, the term $(u^k)^t \nabla h(x^k)^t d_x$ is constant and the objective function becomes $\nabla f(x^k)^t d_x + \tfrac12\, d_x^t B_k d_x$ plus a constant. The two subproblems are not quite equivalent in the inequality-constrained case unless the multiplier estimate $v^k$ is zero for all inactive linearized constraints. However, (QP) is equivalent to (2.2) for the slack-variable formulation of (NLP) given by

\[
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{x,\,z} & f(x)\\[2pt]
\mbox{subject to} & h(x) = 0,\\
 & g(x) + z = 0,\\
 & z \ge 0,
\end{array}
\]

where $z \in \mathbb{R}^p$ is the vector of slack variables. Therefore (QP) can be considered an appropriate quadratic subproblem for (NLP).

The solution $d_x$ of (QP) can be used to generate a new iterate $x^{k+1}$ by taking a step from $x^k$ in the direction of $d_x$. But to continue to the next iteration, new estimates for the multipliers are needed. There are several ways in which these can be chosen, but one obvious approach is to use the optimal multipliers of the quadratic subproblem. (Other possibilities will be discussed where appropriate.) Letting the optimal multipliers of (QP) be denoted by $u_{qp}$ and $v_{qp}$, and setting

\[
d_u = u_{qp} - u^k, \qquad d_v = v_{qp} - v^k,
\]

allows the updates of $(x, u, v)$ to be written in the compact form

\[
\begin{array}{rcl}
x^{k+1} &=& x^k + \alpha\, d_x,\\
u^{k+1} &=& u^k + \alpha\, d_u,\\
v^{k+1} &=& v^k + \alpha\, d_v,
\end{array} \tag{2.3}
\]

for some selection of the steplength parameter $\alpha$. Once the new iterates are constructed, the problem functions and derivatives are evaluated and a prescribed choice of $B_{k+1}$ calculated. It will be seen that the effectiveness of the algorithm is determined in large part by this choice.

2.3. The Basic Algorithm

Since the quadratic subproblem (QP) has to be solved to generate steps in the algorithm, the first priority in analyzing an SQP algorithm is to determine conditions that guarantee that (QP) has a solution. To have a solution, the system of constraints of (QP) must have a nonempty feasible set (i.e., the system must be consistent), and the quadratic objective function should be bounded below on that set (although a local solution can sometimes exist without this condition). The consistency condition can be guaranteed when $x^k$ is in a neighborhood of $x^*$ by virtue of Assumption (A2) but, depending on the problem, may fail at nonlocal points. Practical means of dealing with this possibility will be considered in Section 7. An appropriate choice of $B_k$ will assure that a consistent quadratic problem will have a solution; the discussion of this point will be a significant part of the analysis of the next two sections.

Assuming that (QP) can be solved, the question of whether the sequence generated by the algorithm will converge must then be resolved. As described in the Introduction, convergence properties generally are classified as either local or global. Local convergence results proceed from the assumptions that the initial $x$-iterate is close to a solution $x^*$ and the initial Hessian approximation is, in an appropriate sense, close to $HL^*$. Conditions on the updating schemes that ensure that the $x^k$ and the $B_k$ stay close to $x^*$ and $HL^*$ are then formulated. These conditions allow one to conclude that (QP) is a good model of (NLP) at each iteration, and hence that the system of equations defined by the first order conditions and the constraints for (QP) is nearly the same as that for (NLP) at $x^*$. Local convergence proofs can then be modeled on the proof of convergence of the classical Newton's method.

Convergence from a remote starting point is called global convergence. As stated in the Introduction, to ensure global convergence the SQP algorithm needs to be equipped with a measure of progress, a merit function $\phi$, whose reduction implies progress towards a solution. In order to guarantee that $\phi$ is reduced at each step, a procedure for adjusting the steplength parameter $\alpha$ in (2.3) is required. Using the decrease in $\phi$, it can be shown that under certain assumptions the iterates will converge to a solution (or, to be precise, a potential solution) even if the initial $x$-iterate $x^0$ is not close to a solution.

Local convergence theorems are based on Newton's method, whereas global convergence theorems are based on descent. Ideally, an algorithm should be such that, as the iterates get close to the solution, the conditions for local convergence will be satisfied and the local convergence theory will apply without the need for the merit function. In general, a global algorithm, although ultimately forcing the $x$-iterates to get close to $x^*$, does not automatically force the other local conditions, such as unit steplengths and close Hessian approximations, to hold, and therefore the merit function must be used throughout the iteration process. Since there is no practical way to know when, if ever, the local convergence conditions will hold, an implementation's true performance can be deduced only from numerical experience. These issues will be discussed more fully in Section 5.

With this background we can now state a basic SQP algorithm. This template indicates the general steps only, without the numerous details required for a general code.

Basic Algorithm

0. Given approximations $(x^0, u^0, v^0)$, $B_0$, and a merit function $\phi$, set $k = 0$.
1. Form and solve (QP) to obtain $(d_x, d_u, d_v)$.
2. Choose a steplength $\alpha$ so that
   $\phi(x^k + \alpha\, d_x) < \phi(x^k)$.
3. Set
   $x^{k+1} = x^k + \alpha\, d_x$,
   $u^{k+1} = u^k + \alpha\, d_u$,
   $v^{k+1} = v^k + \alpha\, d_v$.
4. Stop if converged.
5. Compute $B_{k+1}$.
6. Set $k = k + 1$; go to 1.
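A minimal executable sketch of this template for the equality-constrained case may clarify how the pieces fit together. The user-supplied callables f, gradf, h, jach and hessL below are hypothetical; the subproblem is solved through its first order conditions (a single symmetric linear system, as noted in Section 1); and a crude $\ell_1$ penalty $\phi(x) = f(x) + \eta\,\|h(x)\|_1$ with a fixed weight stands in for the merit functions of Section 4. None of the safeguards of Section 7 are included.

    import numpy as np

    def basic_sqp(f, gradf, h, jach, hessL, x, u, eta=10.0, tol=1e-8, maxit=50):
        # Sketch of the Basic Algorithm for: minimize f(x) s.t. h(x) = 0.
        # jach(x) returns the n x m matrix whose columns are the gradients of h_i;
        # hessL(x, u) returns the choice of B_k (e.g., the Lagrangian Hessian).
        phi = lambda z: f(z) + eta * np.abs(h(z)).sum()   # crude l1 merit function
        for k in range(maxit):
            A, hx, g = jach(x), h(x), gradf(x)
            B = hessL(x, u)
            n, m = A.shape
            # Step 1: solve the subproblem via its first order conditions:
            #   [ B    A ] [ d_x  ]   [ -g  ]
            #   [ A^t  0 ] [ u_qp ] = [ -hx ]
            K = np.block([[B, A], [A.T, np.zeros((m, m))]])
            sol = np.linalg.solve(K, -np.concatenate([g, hx]))
            d_x, u_qp = sol[:n], sol[n:]
            # Step 4: stop if the first order conditions of (NLP) nearly hold
            if max(np.linalg.norm(g + A @ u_qp), np.linalg.norm(hx)) < tol:
                return x, u_qp
            # Step 2: backtracking choice of the steplength alpha
            alpha = 1.0
            while phi(x + alpha * d_x) >= phi(x) and alpha > 1e-10:
                alpha *= 0.5
            # Step 3: update the iterates
            x = x + alpha * d_x
            u = u + alpha * (u_qp - u)
        return x, u

The linear solve presupposes that $B_k$ is positive definite on the null space of $\nabla h(x^k)^t$; when it is not, the system may be singular, which is one reason for the careful choices of $B_k$ discussed in Section 3.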

Notes and References

The basic second order sufficient conditions, as well as a description of the major theoretical ideas of finite-dimensional nonlinear constrained optimization, can be found in numerous sources; see, e.g., Luenberger (1984), Nash and Sofer, or Gill, Murray and Wright (1981). The basic definitions for various rates of convergence and the relations among them can be found in Ortega and Rheinboldt (1970) and Dennis and Schnabel (1983).

The trust region methods discussed in Section 6 use different quadratic subproblems than those given here. Also, Fukushima (1986) considers a different quadratic subproblem in order to avoid the Maratos effect discussed in Section 5.


3. Local Convergence

In this section the theory of local convergence for the most common versions of the SQP method is developed. The local convergence analysis establishes conditions under which the iterates converge to a solution, and at what rate, given that the starting data (e.g., $x^0$, $u^0$, $v^0$, $B_0$) are sufficiently close to the corresponding data at a solution $x^*$. As will be seen, it is the method of generating the matrices $B_k$ that will be the determining factor in obtaining the convergence results.

An SQP method can have excellent theoretical local convergence properties: quadratic, superlinear, or two-step superlinear convergence of the $x$-iterates can be achieved by requiring the $B_k$ to approximate $HL^*$ in an appropriate manner. Although each of these local convergence rates is achievable in practice under certain conditions, the need for a globally convergent algorithm further restricts the choice of the $B_k$. Certain choices of the $B_k$ have led to algorithms that extensive numerical experience has shown to be quite acceptable. Therefore, while there is a gap between what is theoretically possible and what has been achieved in terms of local convergence, this discrepancy should not obscure the very real progress that the SQP methods represent in solving nonlinearly constrained optimization problems.

Two important assumptions simplify the presentation. First, it will be assumed that the active inequality constraints for (NLP) at $x^*$ are known. As will be discussed in Section 5, this assumption can be justified for many of the SQP algorithms because the problem (QP) at $x^k$ will have the same active constraints as (NLP) at $x^*$ when $x^k$ is near $x^*$. The fact that the active constraints, and hence the inactive constraints, are correctly identified at $x^k$ means that those inequality constraints that are inactive for (NLP) at $x^*$ can be ignored, and those that are active can be changed to equality constraints, without changing the solution of (QP). Thus, under this assumption, only equality-constrained problems need be considered for the local analysis. For the remainder of this section (NLP) will be assumed to be an equality-constrained problem, and the reference to inequality multiplier vectors (the $v^t g(x)$ term) will be eliminated from the Lagrangian. For reference we rewrite the quadratic subproblem with only equality constraints:

\[
\mbox{(ECQP)}\qquad
\begin{array}{ll}
\displaystyle\mathop{\mbox{minimize}}_{d_x} & \nabla f(x^k)^t d_x + \tfrac12\, d_x^t B_k d_x\\[2pt]
\mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0.
\end{array}
\]

The second assumption follows from the fact that the local convergence for SQP methods, as for many optimization methods, is based on Newton's method; that is, the convergence and rate of convergence results are obtained by demonstrating an asymptotic relation between the iterates of the method being analyzed and the classical Newton steps. Because Newton's method for a system with a nonsingular Jacobian at the solution requires that unit steps be taken to obtain rapid convergence, we assume that the merit function allows $\alpha = 1$ to be used in the update formulas (2.3). Similar results to those given in this section can be obtained if the steplengths converge to one sufficiently quickly.

For the equality-constrained problem, a good initial estimate of $x^*$ can be used to obtain a good estimate for the optimal multiplier vector. The first order necessary condition (A1), together with condition (A2), leads to the formula

\[
u(x, A) = -\bigl[\nabla h(x)^t A\, \nabla h(x)\bigr]^{-1} \nabla h(x)^t A\, \nabla f(x) \tag{3.1}
\]

for any nonsingular matrix $A$ that is positive definite on the null space of $\nabla h(x)^t$. In particular, if $A$ is taken to be the identity matrix, then (3.1) defines the least squares solution of the first order necessary conditions. By our smoothness assumptions, it follows that

\[
u^0 = -\bigl[\nabla h(x^0)^t\, \nabla h(x^0)\bigr]^{-1} \nabla h(x^0)^t\, \nabla f(x^0) \tag{3.2}
\]

can be made arbitrarily close to $u^*$ by choosing $x^0$ close to $x^*$. Consequently, no additional assumption will be made about an initial optimal multiplier estimate for the local convergence analysis.
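Numerically, $u^0$ in (3.2) is just the solution of a linear least squares problem and is best computed without forming $\nabla h(x^0)^t \nabla h(x^0)$; a sketch, with the Jacobian stored columnwise as an $n \times m$ array as in Section 2:

    import numpy as np

    def multiplier_estimate(gradf_x, jach_x):
        # Least squares multiplier estimate (3.2): minimize over u the
        # residual || jach_x @ u + gradf_x ||_2.
        u, *_ = np.linalg.lstsq(jach_x, -gradf_x, rcond=None)
        return u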

Denoting the optimal multiplier for (ECQP) by $u_{qp}$, we see that the first order conditions for this problem are

\[
\begin{array}{rcl}
B_k d_x + \nabla h(x^k)\, u_{qp} &=& -\nabla f(x^k),\\[2pt]
\nabla h(x^k)^t d_x &=& -h(x^k).
\end{array} \tag{3.3}
\]

If, as discussed in Section 2, we set

\[
u^{k+1} = u_{qp} = u^k + d_u,
\]

then the above equations become

\[
\begin{array}{rcl}
B_k d_x + \nabla h(x^k)\, d_u &=& -\nabla L(x^k, u^k),\\[2pt]
\nabla h(x^k)^t d_x &=& -h(x^k).
\end{array} \tag{3.4}
\]

We are now in a position to begin the analysis of the SQP methods.

3.1. The Newton SQP Method

The straightforward SQP method derived from setting

\[
B_k = HL(x^k, u^k)
\]

will be analyzed first. Assuming that $x^0$ is close to $x^*$, it follows from (3.2) that $u^0$ can be presumed to be close to $u^*$, and hence that $HL(x^0, u^0)$ is close to $HL^*$. The local convergence for the SQP algorithm now follows from the application of Newton's method to the nonlinear system of equations obtained from the first order necessary conditions and the constraint:

\[
\begin{array}{rcl}
\nabla L(x, u) &=& 0,\\
h(x) &=& 0.
\end{array} \tag{3.5}
\]

From assumptions (A2) and (A4), the Jacobian of this system at the solution,

\[
J(x^*, u^*) = \left[\begin{array}{cc} HL^* & \nabla h(x^*)\\ \nabla h(x^*)^t & 0 \end{array}\right], \tag{3.6}
\]

is nonsingular. Therefore the Newton iteration scheme

\[
\begin{array}{rcl}
x^{k+1} &=& x^k + s_x,\\
u^{k+1} &=& u^k + s_u,
\end{array} \tag{3.7}
\]

where the vector $s = (s_x, s_u)$ is the solution to

\[
J(x^k, u^k)\, s = -\left[\begin{array}{c} \nabla L(x^k, u^k)\\ h(x^k) \end{array}\right], \tag{3.8}
\]

yields iterates that converge to $(x^*, u^*)$ quadratically, provided $(x^0, u^0)$ is sufficiently close to $(x^*, u^*)$. The equations (3.8) are identical to the equations (3.4) with $B_k = HL(x^k, u^k)$, $d_x = s_x$, and $d_u = s_u$. Consequently, the iterates $(x^{k+1}, u^{k+1})$ are exactly those generated by the SQP algorithm. For this reason we call this version of the algorithm the (local) Newton SQP method. The basic local convergence results are summarized in the following theorem.



Theorem 3.1  Let $x^0$ be an initial estimate of the solution to (NLP) and let $u^0$ be given by (3.2). Suppose that the sequence of iterates $\{(x^k, u^k)\}$ is generated by

\[
x^{k+1} = x^k + d_x, \qquad u^{k+1} = u^k + d_u,
\]

where $d_x$ and $u_{qp} = u^k + d_u$ are the solution and multiplier of the quadratic program (ECQP) with $B_k = HL(x^k, u^k)$. Then, if $\|x^0 - x^*\|$ is sufficiently small, the sequence of iterates in $(x, u)$-space is well defined and converges quadratically to the pair $(x^*, u^*)$.

It is important to emphasize that this version of the SQP algorithm always requires a unit step in the variables; there is no line search to try to determine a better point. If a line search is used (the so-called damped Newton method), then the rate of convergence of the iterates can be significantly decreased, usually to linear.
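Computationally, one iteration of the local Newton SQP method is a single solve with the Jacobian (3.6) evaluated at the current iterate; a sketch with the same hypothetical callables as before (hessL must here return the exact Hessian of the Lagrangian for Theorem 3.1 to apply):

    import numpy as np

    def newton_sqp_step(x, u, gradf, h, jach, hessL):
        # One unit step (3.7) of the local Newton SQP method: solve (3.8).
        A = jach(x)
        n, m = A.shape
        J = np.block([[hessL(x, u), A],
                      [A.T, np.zeros((m, m))]])      # Jacobian (3.6) at (x, u)
        rhs = -np.concatenate([gradf(x) + A @ u, h(x)])
        s = np.linalg.solve(J, rhs)                  # Newton system (3.8)
        return x + s[:n], u + s[n:]                  # always a unit step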

In a sense, the Newton version of the SQP algorithm can be thought of as the ideal algorithm for solving the nonlinear program: it provides rapid convergence with no requirement for line searches and no necessity for the introduction of additional parameters. Of course, in this form it has little practical value. It is often difficult to choose an initial point close enough to the true solution to guarantee that the Newton algorithm will converge to a minimum of $f$. When remote from the solution, $HL(x^k, u^k)$ cannot be assumed to be positive definite on the appropriate subspace, and hence a solution to the subproblem (ECQP) cannot be guaranteed to exist. The value of this Newton SQP method is that it provides a standard against which the local behavior of other versions can be measured. In fact, to obtain superlinear convergence it is necessary and sufficient that the steps approach the Newton steps as the solution is neared. Despite the disadvantages, the use of $HL(x^k, u^k)$ often makes (ECQP) an excellent model of (NLP), and its use, combined with some of the techniques of Sections 4 and 6, can lead to effective algorithms. The actual computation of $HL(x^k, u^k)$ can be accomplished efficiently by using finite difference techniques or by automatic differentiation methods.

Conditions for Local Convergence

Here we discuss the theoretical properties of the approximation matrices $B_k$ that are sufficient to guarantee that the local SQP method will give Newton-like convergence results. In the following subsections the practical attempts to generate matrices that satisfy these properties will be described. In order to formulate conditions on $B_k$ that will yield locally convergent SQP algorithms, the following assumption on $HL^*$ will be imposed:

(A5) The matrix $HL^*$ is nonsingular.

In light of (A4), most of the results that follow could be extended to cover the case where this condition is not satisfied, but the resulting complexity of the arguments would clutter the exposition.

The following conditions on the matrix approximations will be referenced in the remainder of the paper:

(B1) The matrices $B_k$ are uniformly positive definite on the null spaces of the matrices $\nabla h(x^k)^t$, i.e., there exists a $\beta_1 > 0$ such that, for each $k$,

\[
d^t B_k d \ge \beta_1\, \|d\|^2
\]

for all $d$ satisfying $\nabla h(x^k)^t d = 0$.

(B2) The sequence $\{B_k\}$ is uniformly bounded, i.e., there exists a $\beta_2 > 0$ such that, for each $k$,

\[
\|B_k\| \le \beta_2.
\]

(B3) The matrices $B_k$ have uniformly bounded inverses, i.e., there exists a $\beta_3 > 0$ such that, for each $k$, $B_k^{-1}$ exists and

\[
\|B_k^{-1}\| \le \beta_3.
\]

As the SQP methods require the solution of the quadratic subproblem at each step, it is imperative that the choice of matrices $B_k$ make that possible. The second order sufficient conditions for a solution to (ECQP) to exist require that the matrix $B_k$ be positive definite on the null space of $\nabla h(x^k)^t$ (cf. (A4)). Condition (B1) is a slight strengthening of this requirement. Assumption (A5) and the fact that the matrices $B_k$ are to approximate the Hessian of the Lagrangian suggest that conditions (B2) and (B3) are not unreasonable.

Under assumptions (B1)-(B3), (3.3) and (3.4) have a solution provided $(x^k, u^k)$ is sufficiently close to $(x^*, u^*)$. Moreover, the solutions can be written in the form

\[
d_u = \bigl[\nabla h(x^k)^t B_k^{-1} \nabla h(x^k)\bigr]^{-1}\bigl[h(x^k) - \nabla h(x^k)^t B_k^{-1}\, \nabla L(x^k, u^k)\bigr] \tag{3.9}
\]

and

\[
d_x = -B_k^{-1}\, \nabla L(x^k, u^k + d_u), \tag{3.10}
\]

where $u^{k+1} = u^k + d_u$ is given by (2.3) with $\alpha = 1$. In particular, (3.9) leads to the relation

\[
u^k + d_u = -\bigl[\nabla h(x^k)^t B_k^{-1} \nabla h(x^k)\bigr]^{-1}\bigl[\nabla h(x^k)^t B_k^{-1}\, \nabla f(x^k) - h(x^k)\bigr] =: W(x^k, B_k). \tag{3.11}
\]

Setting $A = B_k^{-1}$ in (3.1) and evaluating at $x^*$ (where $h(x^*) = 0$) yields

\[
u^* = u(x^*, B_k^{-1}) = W(x^*, B_k).
\]

It can now be deduced that

\[
u^k + d_u - u^* = W(x^k, B_k) - W(x^*, B_k)
= -\bigl[\nabla h(x^k)^t B_k^{-1} \nabla h(x^k)\bigr]^{-1} \nabla h(x^k)^t B_k^{-1}\,\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr) + w_k, \tag{3.12}
\]

where, by assumptions (B2) and (B3),

\[
\|w_k\| \le \beta_4\, \|x^k - x^*\|^2
\]

for some $\beta_4$ independent of $k$. It should be noted that (3.12) can be used in conjunction with Theorem 3.1 to prove that the sequence $\{x^k\}$ is quadratically convergent when the Newton SQP method is used and $HL^*$ is nonsingular. In that case $\{u^k\}$ converges R-quadratically.

Equations (3.10), (3.12), and (A1) now yield

\[
x^k + d_x - x^* = \bigl(x^k - x^*\bigr) - B_k^{-1}\bigl[\nabla L(x^k, u^k + d_u) - \nabla L(x^*, u^*)\bigr]
\]
\[
= B_k^{-1}\Bigl[\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr) - \nabla h(x^*)\bigl(u^k + d_u - u^*\bigr)\Bigr] + O\bigl(\|x^k - x^*\|^2\bigr)
\]
\[
= B_k^{-1} V_k\,\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr) + O\bigl(\|x^k - x^*\|^2\bigr), \tag{3.13}
\]

where

\[
V_k = I - \nabla h(x^k)\bigl[\nabla h(x^k)^t B_k^{-1} \nabla h(x^k)\bigr]^{-1} \nabla h(x^k)^t B_k^{-1}. \tag{3.14}
\]

The projection operator defined by (2.1), which in the equality-constrained case has the form

\[
P(x) = I - \nabla h(x)\bigl[\nabla h(x)^t\, \nabla h(x)\bigr]^{-1} \nabla h(x)^t,
\]

satisfies

\[
V_k P^k = V_k.
\]

Thus (3.13) leads to the inequality

\[
\|x^k + d_x - x^*\| \le \|B_k^{-1}\|\,\|V_k\|\,\bigl\|P^k\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr)\bigr\| + O\bigl(\|x^k - x^*\|^2\bigr). \tag{3.15}
\]

Using induction, the above analysis can be made rigorous to yield the following local convergence theorem.

Theorem 3.2  Let assumptions (B1)-(B3) hold. Then there exist positive constants $\epsilon$ and $\delta$ such that if

\[
\|x^0 - x^*\| \le \epsilon, \qquad \|u^0 - u^*\| \le \epsilon,
\]

and

\[
\bigl\|P^k\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr)\bigr\| \le \delta\, \|x^k - x^*\| \tag{3.16}
\]

for all $k$, then the sequences $\{x^k\}$ and $\{(x^k, u^k)\}$ generated by the SQP algorithm are well defined and converge linearly to $x^*$ and $(x^*, u^*)$, respectively. The sequence $\{u^k\}$ converges R-linearly to $u^*$.

Condition (3.16) is, in conjunction with the other hypotheses of the theorem, almost necessary for linear convergence. In fact, if the other hypotheses of the theorem hold and the sequence $\{x^k\}$ converges linearly, then there exists a $\bar\delta > 0$ such that

\[
\bigl\|P^k\bigl(B_k - HL^*\bigr)\bigl(x^k - x^*\bigr)\bigr\| \le \bar\delta\, \|x^k - x^*\|
\]

for all $k$. These results, and those that follow, indicate the crucial role that the projection of $(B_k - HL^*)$ plays in the theory. The inequality (3.16) is guaranteed by either of the stronger conditions

\[
\bigl\|P^k\bigl(B_k - HL^*\bigr)\bigr\| \le \delta \tag{3.17}
\]

or

\[
\|B_k - HL^*\| \le \delta, \tag{3.18}
\]

which are easier to verify in practice.

In order to satisfy (3.16) it is not necessary that the approximations converge to the true Hessian, but only that the growth of the difference $\|B_k - HL^*\|$ be kept under some control. In the quasi-Newton theory for unconstrained optimization this can be accomplished if the approximating matrices have a property called bounded deterioration. This concept can be generalized from unconstrained optimization to the constrained setting in a straightforward way.

Definition 3.1  A sequence of matrix approximations $\{B_k\}$ for the SQP method is said to have the bounded deterioration property if there exist constants $\alpha_1$ and $\alpha_2$, independent of $k$, such that

\[
\|B_{k+1} - HL^*\| \le (1 + \alpha_1 \sigma_k)\, \|B_k - HL^*\| + \alpha_2 \sigma_k, \tag{3.19}
\]

where

\[
\sigma_k = \max\bigl\{\|x^{k+1} - x^*\|,\ \|x^k - x^*\|,\ \|u^{k+1} - u^*\|,\ \|u^k - u^*\|\bigr\}.
\]

It seems plausible that the bounded deterioration condition, when applied to the SQP process, will lead to a sequence of matrices that satisfy (B2), (B3), and (3.16), provided the initial matrix is close to the Hessian of the Lagrangian and $(x^k, u^k)$ is close to $(x^*, u^*)$. Indeed this is the case, as can be shown by induction to lead to the following result.

Theorem 3.3  Suppose that the sequence of iterates $\{(x^k, u^k)\}$ is generated by the SQP algorithm and the sequence $\{B_k\}$ of symmetric matrix approximations satisfies (B1) and (3.19). If $\|x^0 - x^*\|$ and $\|B_0 - HL^*\|$ are sufficiently small, and $u^0$ is given by (3.2), then the hypotheses of Theorem 3.2 hold.

The linear convergence for the iterates guaranteed by the above theorem is hardly satisfactory in light of the quadratic convergence of the Newton SQP method. As would be expected, a stronger relation between the approximation matrices $B_k$ and the Hessian of the Lagrangian is needed to improve the rate of convergence; however, this relation still depends only on the projection of the difference between the approximation and the true Hessian. With some effort the following theorem can be deduced as a consequence of the inequalities above.

Theorem 3.4  Let assumptions (B1)-(B3) hold and let the sequence $\{(x^k, u^k)\}$ be generated by the SQP algorithm. Assume that $x^k \to x^*$. Then the sequence $\{x^k\}$ converges to $x^*$ superlinearly if and only if the matrix approximations satisfy

\[
\lim_{k\to\infty} \frac{\bigl\|P^k\bigl(B_k - HL^*\bigr)\bigl(x^{k+1} - x^k\bigr)\bigr\|}{\|x^{k+1} - x^k\|} = 0. \tag{3.20}
\]

If this equation holds, then the sequence $\{u^k\}$ converges R-superlinearly to $u^*$ and the sequence $\{(x^k, u^k)\}$ converges superlinearly.

Note that this theorem requires convergence of the $x$-iterates; (3.20) does not appear to be enough to yield convergence by itself. A slightly different result, one that uses a two-sided approximation of the Hessian, can be obtained by writing (3.20) as

\[
\lim_{k\to\infty}\left\{ \frac{\bigl\|P^k\bigl(B_k - HL^*\bigr) Q^k s^k\bigr\|}{\|s^k\|} + \frac{\bigl\|P^k\bigl(B_k - HL^*\bigr) P^k s^k\bigr\|}{\|s^k\|} \right\} = 0, \tag{3.21}
\]

where $s^k = x^{k+1} - x^k$. It can be shown that if only the second term goes to zero, then a weaker form of superlinear convergence holds.

Theorem 3.5  Let assumptions (B1)-(B3) hold and let the sequence $\{(x^k, u^k)\}$ be generated by the SQP algorithm. Assume that $x^k \to x^*$. Then, if

\[
\lim_{k\to\infty} \frac{\bigl\|P^k\bigl(B_k - HL^*\bigr) P^k\bigl(x^{k+1} - x^k\bigr)\bigr\|}{\|x^{k+1} - x^k\|} = 0, \tag{3.22}
\]

the sequence $\{x^k\}$ converges to $x^*$ two-step superlinearly.

If the sequence $\{x^{k+1} - x^k\}$ approaches zero in a manner tangent to the null space of the Jacobians, i.e., if

\[
\lim_{k\to\infty} \frac{\bigl\|Q^k\bigl(x^{k+1} - x^k\bigr)\bigr\|}{\|x^{k+1} - x^k\|} = 0, \tag{3.23}
\]

then (3.22) implies (3.20), and superlinear convergence results. This tangential convergence has been observed in practice for some of the methods discussed below.

It has been difficult to find useful updating schemes for the $B_k$ that satisfy these conditions for linear and superlinear convergence. Two basic approaches have been tried for generating good matrix approximations. In full Hessian approximations the matrices $B_k$ are chosen to approximate $HL^*$, while in reduced Hessian approximations only matrices that approximate the Hessian on the null space of the Jacobians of the constraints are computed. Each of these methods will be described in turn.

3.3. Full Hessian Approximations

An obvious choice of the matrix $B_k$ is a finite difference approximation of the Hessian of the Lagrangian at $(x^k, u^k)$. It is clear from Theorems 3.2 and 3.4 that if a finite difference method is used, the resulting sequence will be superlinearly convergent if the finite difference step size goes to zero, and will have rapid linear convergence for fixed step size. Of course, using finite difference approximations, while not requiring evaluation of second derivatives, suffers from the same global difficulties as the Newton SQP method described in the earlier subsection.
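A sketch of such a finite difference choice of $B_k$, using central differences of the Lagrangian gradient with a fixed step size $\delta$ (the rapid linear convergence case noted above; the callables are the same hypothetical ones used earlier):

    import numpy as np

    def fd_hess_lagrangian(x, u, gradf, jach, delta=1e-5):
        # Central-difference approximation to the Hessian (in x) of
        # L(x, u) = f(x) + u^t h(x); symmetrized at the end.
        gradL = lambda z: gradf(z) + jach(z) @ u
        n = x.size
        B = np.zeros((n, n))
        for i in range(n):
            e = np.zeros(n)
            e[i] = delta
            B[:, i] = (gradL(x + e) - gradL(x - e)) / (2 * delta)
        return 0.5 * (B + B.T)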

As has been seen, one way of obtaining convergence is to use a sequence of Hessian approximations that satisfies the bounded deterioration property. However, this property is not by itself enough to guarantee that (3.20) is satisfied, and hence it does not guarantee superlinear convergence. The condition (3.20) essentially requires the component of the step generated by the Hessian approximation in the direction of the null space to converge asymptotically to the corresponding component of the step generated by the Newton SQP method. In the following, a class of approximations called secant approximations, which satisfy a version of this condition and have the bounded deterioration property as well, is considered.

Under the smoothness assumptions of this paper, the Lagrangian satisfies

\[
\nabla L(x^{k+1}, u^{k+1}) - \nabla L(x^k, u^{k+1}) \approx HL(x^{k+1}, u^{k+1})\,\bigl(x^{k+1} - x^k\bigr),
\]

with equality if the Lagrangian is quadratic in $x$. As a result, it makes sense to approximate the Hessian of the Lagrangian at $(x^{k+1}, u^{k+1})$ by requiring $B_{k+1}$ to satisfy

\[
B_{k+1}\,\bigl(x^{k+1} - x^k\bigr) = \nabla L(x^{k+1}, u^{k+1}) - \nabla L(x^k, u^{k+1}), \tag{3.24}
\]

especially since this approximation strongly suggests that (3.20) is likely to be satisfied near the solution. Equation (3.24) is called the secant equation; it plays an important role in the algorithmic theory of nonlinear systems and unconstrained optimization.

A common procedure for generating Hessian approximations that satisfy (3.24) is to compute $(x^{k+1}, u^{k+1})$ with a given $B_k$ and then to update $B_k$ according to

\[
B_{k+1} = B_k + U_k,
\]

where $U_k$ is a rank-one or rank-two matrix that depends on the values of $B_k$, $x^k$, $u^k$, $x^{k+1}$, and $u^{k+1}$. In choosing secant approximations that have the bounded deterioration property, it is natural to look to those updating schemes of this type that have been developed for unconstrained optimization.

The rank-two Powell-symmetric-Broyden (PSB) formula gives one such updating scheme. For the constrained case this update is given by

\[
B_{k+1} = B_k + \frac{(y - B_k s)\, s^t + s\, (y - B_k s)^t}{s^t s} - \frac{(y - B_k s)^t s}{(s^t s)^2}\; s\, s^t, \tag{3.25}
\]

where

\[
s = x^{k+1} - x^k \tag{3.26}
\]

and

\[
y = \nabla L(x^{k+1}, u^{k+1}) - \nabla L(x^k, u^{k+1}). \tag{3.27}
\]

The PSB-SQP algorithm which employs this update has been shown to have the desired local convergence properties.

Theorem 3.6  Suppose that the sequence of iterates $\{(x^k, u^k)\}$ is generated by the SQP algorithm using the sequence $\{B_k\}$ of matrix approximations generated by the PSB update formulas (3.25)-(3.27). Then, if $\|x^0 - x^*\|$ and $\|B_0 - HL(x^*, u^*)\|$ are sufficiently small and $u^0$ is given by (3.2), the sequence $\{B_k\}$ is of bounded deterioration and the iterates $(x^k, u^k)$ are well defined and converge superlinearly to $(x^*, u^*)$. In addition, the $x$-iterates converge superlinearly and the multipliers converge R-superlinearly.

Note that there is no assumption of positive definiteness here. In fact, while (B2) and (B3) are satisfied as a result of the bounded deterioration, (B1) does not necessarily hold. Consequently, $d_x$ and $d_u$ are not necessarily solutions of (ECQP), but of the first order conditions (3.3) and (3.4).

As a practical method, the PSB-SQP algorithm has the advantage over the Newton SQP method of not requiring the computation of the Hessian of the Lagrangian, but, as a result, it yields only superlinear rather than quadratic convergence. However, because the matrices are not necessarily positive definite, it suffers from the same serious drawback: the problem (ECQP) may not have a solution if the initial starting point is not close to the solution. Consequently, it does not appear to be useful in establishing a globally convergent algorithm (see, however, Section 6). As mentioned above, solving (ECQP) requires that the matrices be positive definite on the appropriate subspace for each subproblem. One way to enforce this is to require the matrices to be positive definite.

A rank-two updating scheme that is considered to be the most effective for unconstrained problems, and that has useful positive definite properties, is the BFGS method. The formula for this update, generalized to be applicable to the constrained problem, is

\[
B_{k+1} = B_k + \frac{y\, y^t}{y^t s} - \frac{B_k s\, s^t B_k}{s^t B_k s}, \tag{3.28}
\]

where $s$ and $y$ are as in (3.26) and (3.27). The important qualities of the matrices generated by this formula are that they have the bounded deterioration property and satisfy a hereditary positive definiteness condition. The latter states that if $B_k$ is positive definite and

\[
y^t s > 0, \tag{3.29}
\]

then $B_{k+1}$ is positive definite. If the matrix approximations are positive definite, then (ECQP) is easily solved and the SQP steps are well defined. Unfortunately, because the Hessian of the Lagrangian at $(x^*, u^*)$ need only be positive definite on a subspace, it need not be the case that (3.29) is satisfied, and hence the algorithm may break down even close to the solution. However, if the Hessian of the Lagrangian is positive definite, (3.29) is valid provided $B_k$ is positive definite and $(x^k, u^k)$ is close to $(x^*, u^*)$. In this case this BFGS-SQP algorithm has the same local convergence properties as the PSB-SQP method.
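Both updates are rank-two corrections and are short to state in code; a sketch, with $s$ and $y$ as in (3.26) and (3.27) (the BFGS branch presupposes the curvature condition (3.29)):

    import numpy as np

    def psb_update(B, s, y):
        # Powell-symmetric-Broyden update (3.25).
        r = y - B @ s
        ss = s @ s
        return (B + (np.outer(r, s) + np.outer(s, r)) / ss
                  - ((r @ s) / ss ** 2) * np.outer(s, s))

    def bfgs_update(B, s, y):
        # BFGS update (3.28); requires y^t s > 0, i.e. condition (3.29).
        Bs = B @ s
        return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)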

Theorem 3.7  Suppose that $HL^*$ is positive definite and let $B_0$ be an initial positive definite matrix. Suppose that the sequence of iterates $\{(x^k, u^k)\}$ is generated by the SQP algorithm using the sequence of matrix approximations generated by the BFGS update (3.28). Then, if $\|x^0 - x^*\|$ and $\|B_0 - HL(x^*, u^*)\|$ are sufficiently small and $u^0$ is given by (3.2), the sequence $\{B_k\}$ is of bounded deterioration and (3.29) is satisfied. Therefore the iterates $(x^k, u^k)$ converge superlinearly to the pair $(x^*, u^*)$. In addition, the $x$-iterates converge superlinearly and the multipliers converge R-superlinearly.

The assumption that $HL^*$ is positive definite allows $B_0$ to be both positive definite and close to $HL^*$. The requirement that $HL^*$ be positive definite is satisfied, for example, if (NLP) is convex, but it cannot be assumed in general. However, because the positive definiteness of $B_k$ permits the solution of the quadratic subproblems independent of the nearness of the iterate to the solution, the BFGS method has been the focus of vigorous research efforts to adapt it to the general case. Several of these efforts have resulted in implementable algorithms.

One scheme, which we call the Powell-SQP method, is to maintain the positive definite property of the Hessian approximations by modifying the update formula, replacing $y$ in (3.28) by

\[
\hat y = \theta\, y + (1 - \theta)\, B_k s \tag{3.30}
\]

for some $\theta \in (0, 1]$. With this modification the condition (3.29) can always be satisfied, although the updates no longer satisfy the secant condition. Nevertheless, it has been shown that a specific choice of $\theta$ leads to a sequence $\{x^k\}$ that converges R-superlinearly to $x^*$, provided that the sequence converges. Unfortunately, no proof of local convergence has been found, although algorithms based on this procedure have proven to be quite successful in practice.
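A sketch of this damping strategy; the particular rule for $\theta$ below (keeping $s^t \hat y$ at least a fixed fraction $\sigma$ of $s^t B_k s$, with $\sigma = 0.2$ a conventional value) is a standard choice and an assumption of this sketch rather than a statement from the theory above:

    import numpy as np

    def damped_y(B, s, y, sigma=0.2):
        # Return y_hat = theta*y + (1 - theta)*B s, as in (3.30), with theta
        # chosen so that s^t y_hat >= sigma * s^t B s (> 0 if B is positive definite).
        Bs = B @ s
        sBs = s @ Bs
        sy = s @ y
        if sy >= sigma * sBs:
            return y                       # no damping needed; secant equation kept
        theta = (1 - sigma) * sBs / (sBs - sy)
        return theta * y + (1 - theta) * Bs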

A second approach is to transform the problem so that $HL^*$ is positive definite and then to apply the BFGS-SQP method on the transformed problem. It is a common trick in developing algorithms for solving equality constrained optimization problems to replace the objective function in (NLP) by the function

\[
f_A(x) = f(x) + \tfrac{\eta}{2}\, \|h(x)\|^2
\]

for a positive value of the parameter $\eta$. This change gives a new Lagrangian function, the so-called augmented Lagrangian,

\[
L_A(x, u) = L(x, u) + \tfrac{\eta}{2}\, \|h(x)\|^2,
\]

for which $(x^*, u^*)$ is a critical point. The Hessian of the augmented Lagrangian at $(x^*, u^*)$ has the form

\[
HL_A(x^*, u^*) = HL^* + \eta\, \nabla h(x^*)\, \nabla h(x^*)^t.
\]

It follows from (A4) that there exists a positive value $\bar\eta$ such that, for $\eta \ge \bar\eta$, $HL_A(x^*, u^*)$ is positive definite. If the value of $\bar\eta$ is known, then the BFGS-SQP algorithm can be applied directly to the transformed problem with $\eta \ge \bar\eta$ and, by Theorem 3.7, a superlinearly convergent algorithm results. This version of the SQP algorithm has been extensively studied and implemented. Although adequate for local convergence theory, this augmented-BFGS method has major drawbacks that are related to the fact that it is difficult to choose an appropriate value of $\eta$. To apply Theorem 3.7 the value of $\eta$ must be large enough to ensure that $HL_A(x^*, u^*)$ is positive definite, a condition that requires a priori knowledge of $x^*$. If unnecessarily large values of $\eta$ are used without care, numerical instabilities can result. This problem is exacerbated by the fact that, if the iterates are not close to the solution, appropriate values of $\eta$ may not exist.

Quite recently an intriguing adaptation of the BFGS-SQP algorithm has been suggested that shows promise of leading to a resolution of the above difficulties. This method, called the SALSA-SQP method, is related to the augmented Lagrangian SQP method, but differs locally in the sense that a precise estimate of $\eta$ can be chosen independent of $x^*$. If

\[
y_A = \nabla L_A(x^{k+1}, u^{k+1}) - \nabla L_A(x^k, u^{k+1}),
\]

then

\[
y_A = y + \eta\,\bigl[\nabla h(x^{k+1})\, h(x^{k+1}) - \nabla h(x^k)\, h(x^k)\bigr],
\]

where $y$ is the vector generated by the ordinary Lagrangian in (3.27). If

\[
y_A^t\, s > 0, \tag{3.31}
\]

then the BFGS updates for the augmented Lagrangian have the hereditary positive definiteness property. A minimum value of $\eta$, not directly dependent on $x^*$, can be given that guarantees that $y_A^t s$ is sufficiently positive for given values of $x^k$ and $u^{k+1}$ in a neighborhood of $(x^*, u^*)$. The local version of this algorithm proceeds by using the augmented Lagrangian at each iteration, with the required value of $\eta$ (which may be zero) to force the satisfaction of this condition. The $B_k$ generated by this procedure are positive definite while $HL^*$ may not be; hence the standard bounded deterioration property is not applicable. As a result, a local convergence theorem is not yet available. As in the Powell-SQP algorithm, an R-superlinear rate of convergence of the $x^k$ to $x^*$ has been shown, provided that the sequence is known to converge.

3.4. Reduced Hessian SQP Methods

An entirely different approach to approximating the Hessian is based on the fact that assumption (A4) requires $HL^*$ to be positive definite only on a particular subspace. The reduced Hessian methods approximate only the portion of the Hessian relevant to this subspace. The advantages of these methods are that the standard positive definite updates can be used and that the dimension of the problem is reduced to $n - m$, possibly a significant reduction. In this section we discuss the local convergence properties of such an approach. Several versions of a reduced Hessian type of algorithm have been proposed; they differ in the ways the multiplier vectors are chosen and the way the reduced Hessian approximation is updated, in particular, in the form of $y$ that is used (see (3.37) below). A general outline of the essential features of the method is given below.

Let $x^k$ be an iterate for which $\nabla h(x^k)$ has full rank, and let $Z_k$ and $Y_k$ be matrices whose columns form bases for the null space of $\nabla h(x^k)^t$ and the range space of $\nabla h(x^k)$, respectively. Also assume that the columns of $Z_k$ are orthonormal. ($Z_k$ and $Y_k$ could be obtained, for example, by a QR factorization of $\nabla h(x^k)$.)

Definition 3.2  Let $(x^k, u^k)$ be a given solution-multiplier pair and assume that $\nabla h(x^k)$ has full rank. The matrix

\[
Z_k^t\, HL(x^k, u^k)\, Z_k
\]

is called a reduced Hessian for the Lagrangian function at $(x^k, u^k)$.

The reduced Hessian is not unique; its form depends upon the choice of basis for the null space of $\nabla h(x^k)^t$. Since, by (A4), a reduced Hessian at the solution is positive definite, it follows that, if $(x^k, u^k)$ is close enough to $(x^*, u^*)$, then the reduced Hessian at $(x^k, u^k)$ is positive definite. Decomposing the vector $d_x$ as

\[
d_x = Z_k\, p_Z + Y_k\, p_Y, \tag{3.32}
\]

it can be seen that the constraint equation of (ECQP) becomes

\[
\nabla h(x^k)^t\, Y_k\, p_Y + h(x^k) = 0,
\]

which, by virtue of (A2), can be solved to obtain

\[
p_Y = -\bigl[\nabla h(x^k)^t\, Y_k\bigr]^{-1} h(x^k). \tag{3.33}
\]

The minimization problem (ECQP) is now an unconstrained problem in the $n - m$ variables $p_Z$:

\[
\mathop{\mbox{minimize}}_{p_Z}\ \ \bigl[\nabla f(x^k) + B_k Y_k\, p_Y\bigr]^t Z_k\, p_Z + \tfrac12\, p_Z^t\, \bigl(Z_k^t B_k Z_k\bigr)\, p_Z. \tag{3.34}
\]

The matrix $Z_k^t B_k Z_k$ in this unconstrained problem is an approximation to the reduced Hessian of the Lagrangian at $x^k$. Rather than update $B_k$ and then compute $Z_k^t B_k Z_k$, the reduced matrix itself is updated. Thus, at a particular iteration, given a positive definite $(n-m)\times(n-m)$ matrix $R_k$, an iterate $x^k$, and a multiplier estimate $u^k$, the new iterate can be found by first computing $p_Y$ from (3.33), then setting

\[
p_Z = -R_k^{-1}\, Z_k^t\,\bigl[\nabla f(x^k) + B_k Y_k\, p_Y\bigr], \tag{3.35}
\]

and setting $x^{k+1} = x^k + d_x$, where $d_x$ is given by (3.32). A new multiplier estimate $u^{k+1}$ is generated, and the reduced Hessian approximation is updated using the BFGS formula (3.28) with $R_k$ replacing $B_k$,

\[
s = Z_k^t\,\bigl(x^{k+1} - x^k\bigr) = Z_k^t Z_k\, p_Z \tag{3.36}
\]

and

\[
y = Z_k^t\,\bigl[\nabla L(x^k + Z_k p_Z,\, u^{k+1}) - \nabla L(x^k, u^{k+1})\bigr]. \tag{3.37}
\]

The choices of $s$ and $y$ are motivated by the fact that only the reduced Hessian is being approximated.
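One iteration of this outline can be sketched as follows, with $Z_k$ and $Y_k$ taken from a full QR factorization of $\nabla h(x^k)$; dropping the cross term $Z_k^t B_k Y_k\, p_Y$ in (3.35) is a common simplification and, like the callables, an assumption of the sketch:

    import numpy as np

    def reduced_hessian_step(x, u, R, gradf, h, jach):
        # One reduced Hessian SQP step: d_x = Z p_Z + Y p_Y as in (3.32),
        # with p_Y from (3.33) and R p_Z = -Z^t grad f (cross term dropped).
        A = jach(x)                                  # n x m, full rank assumed
        n, m = A.shape
        Qfull, _ = np.linalg.qr(A, mode="complete")
        Y, Z = Qfull[:, :m], Qfull[:, m:]            # range and null space bases
        p_Y = -np.linalg.solve(A.T @ Y, h(x))        # (3.33)
        p_Z = -np.linalg.solve(R, Z.T @ gradf(x))    # reduced system, cf. (3.35)
        return x + Z @ p_Z + Y @ p_Y                 # new iterate x^{k+1}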

This method does not stipulate a new multiplier iterate directly, since the problem being solved at each step is unconstrained. However, the least squares solution for the first order conditions (cf. (3.1)) can be used. Generally, all that is needed is that the multipliers satisfy

\[
\|u^k - u^*\| = O\bigl(\|x^k - x^*\|\bigr). \tag{3.38}
\]

Since $P^k = Z_k Z_k^t$, the approximation $Z_k R_k Z_k^t$ can be thought of as an approximation of

\[
P^k\, HL(x^k, u^k)\, P^k.
\]

Thus, since this method does not approximate $P^k\, HL^*$, neither the local convergence theorem nor the superlinear rate of convergence theorem (Theorems 3.2 and 3.4) follows as for full Hessian approximations. Nevertheless, the two-sided approximation of the Hessian matrix suggests that the conditions of Theorem 3.5 may hold. In fact, if it is assumed that the matrices $Z_k$ are chosen in a smooth way, i.e., so that

\[
\|Z_k - Z(x^*)\| = O\bigl(\|x^k - x^*\|\bigr), \tag{3.39}
\]

the assumption of local convergence leads to two-step superlinear convergence.

Theorem 3.8  Assume that the reduced Hessian algorithm is applied with $u^k$ and $Z_k$ chosen so that (3.38) and (3.39) are satisfied. If the sequence $\{x^k\}$ converges to $x^*$ R-linearly, then $\{R_k\}$ and $\{R_k^{-1}\}$ are uniformly bounded and $\{x^k\}$ converges two-step superlinearly.

While the condition (3.38) on the multiplier iterates is easy to satisfy, some care is required to ensure that (3.39) is satisfied, as it has been observed that arbitrary choices of $Z_k$ may be discontinuous and hence invalidate the theorem.

Notes and References

The study of Newton's method applied to the first order necessary conditions for constrained optimization can be found in the papers of Tapia and of Goodman. The equivalence of Newton's method, applied to the system of first order equations and feasibility conditions, and the SQP step with the Lagrangian Hessian was first mentioned by Tapia.

The expository paper of Dennis and Moré (1977) is a good source for an introductory discussion of bounded deterioration and of updating methods satisfying the secant condition in the setting of unconstrained optimization. The paper by Broyden, Dennis and Moré (1973) provides the basic results on the convergence properties for quasi-Newton methods.

Theorems 3.2 and 3.4 were first proven by Boggs, Tolle and Wang (1982). The latter theorem was proven under the assumption that the iterates converge linearly, generalizing the result for unconstrained optimization given in Dennis and Moré (1974). The assumption was substantially weakened by several authors; see, for example, Fontecilla, Steihaug and Tapia (1987). Theorem 3.5 is due to Powell (1978b). The paper by Coleman contains a good general overview of superlinear convergence theorems for SQP methods.

The local superlinear convergence of the PSB-SQP method was proven by Han (1976) in his first paper on SQP methods. This paper also included the proof of superlinear convergence for the positive definite updates when the Hessian of the Lagrangian is positive definite (Theorem 3.7), and suggested the use of the augmenting term in the general case. The augmented method was also investigated by Schittkowski and by Tapia. A description of the Powell-SQP method appears in Powell (1978b), while the SALSA-SQP method is introduced in Tapia (1988) and developed further in Byrd, Tapia and Zhang (1992).

The reduced Hessian SQP method has been the subject of research for a number of authors. The presentation here follows that of Byrd and Nocedal (1991). Other important articles include Coleman and Conn (1984), Nocedal and Overton (1985), Gabay (1982), Yuan (1985) and Gilbert. The problems involved in satisfying (3.39) have been addressed in Byrd and Schnabel (1986), Coleman and Sorensen (1984) and Gill, Murray, Saunders, Stewart and Wright (1985).

Other papers of interest on local convergence for SQP methods include Schittkowski, Panier and Tits, Bertsekas, Coleman and Fenyes, Glad, Fukushima and Fontecilla.

4. Merit Functions and Global Convergence

In the previous section we demonstrated the existence of variants of the SQP method that are rapidly locally convergent. Here we show how the merit function assures that the iterates eventually get close to a critical point; in the next section we point out some gaps in the theory that prevent a complete analysis.

A merit function $\phi$ is incorporated into an SQP algorithm for the purpose of achieving global convergence. As described earlier, a line search procedure is used to modify the length of the step $d_x$ so that the step from $x^k$ to $x^{k+1}$ reduces the value of $\phi$. This reduction is taken to imply that acceptable progress towards the solution is being made.

The standard way to ensure that a reduction in $\phi$ indicates progress is to construct $\phi$ so that the solutions of NLP are the unconstrained minimizers of $\phi$. Then it must be possible to decrease $\phi$ by taking a step in the direction $d_x$ generated by solving the quadratic subproblem, i.e., $d_x$ must be a descent direction for $\phi$. If this is the case, then for a sufficiently small steplength $\alpha$, $\phi(x^k + \alpha d_x)$ will be less than $\phi(x^k)$. An appropriate steplength that decreases $\phi$ can then be computed, for example by a backtracking procedure of trying successively smaller values of $\alpha$ until a suitable one is obtained.
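A minimal sketch of the backtracking procedure just described, assuming a merit function `phi`, its gradient `grad_phi`, and a precomputed SQP direction `d`; the sufficient-decrease test shown is the standard Armijo condition, and all names and constants are illustrative rather than taken from the paper.

    import numpy as np

    def backtrack(phi, grad_phi, x, d, mu=1e-4, shrink=0.5, max_iter=30):
        """Try alpha = 1, 1/2, 1/4, ... until phi decreases sufficiently."""
        slope = grad_phi(x) @ d          # directional derivative; < 0 for descent
        alpha = 1.0
        for _ in range(max_iter):
            if phi(x + alpha * d) <= phi(x) + mu * alpha * slope:
                return alpha             # sufficient decrease achieved
            alpha *= shrink              # otherwise try a smaller step
        raise RuntimeError("no acceptable steplength found")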

Given that a merit function is found that has these properties, and that a procedure is used to take a step that decreases the function, global convergence proofs for the resulting SQP algorithm are generally similar to those found in unconstrained optimization: they depend on proving that the limit points of the $x$-iterates are critical points of $\phi$. These proofs rely heavily on being able to guarantee that a sufficient decrease in $\phi$ can be achieved at each iteration.

In unconstrained minimization there is a natural merit function, namely the objective function itself. In the constrained setting, unless the iterates are always feasible, a merit function has to balance the drive to decrease $f$ with the need to satisfy the constraints. This balance is often controlled by a parameter in $\phi$ that weights a measure of the infeasibility against the value of either the objective function or the Lagrangian function. In this section we illustrate these ideas by describing two of the most popular merit functions: a differentiable augmented Lagrangian function and the simpler, but nondifferentiable, $\ell_1$ penalty function. We derive some of the important properties of these functions and discuss some of the advantages and disadvantages of each. In addition, we provide appropriate rules for choosing the steplength parameter so as to obtain the basic convergence results.

To simplify the presentation we restrict our attention at the beginning to the problem with only equality constraints. Extensions that allow inequality constraints are discussed at the ends of the sections.

As might be expected, some additional assumptions are needed if global convergence is to be achieved. In fact, the need for these assumptions raises the issue as to what exactly is meant by global convergence. It is impossible to conceive of an algorithm that would converge from any starting point for every NLP. The very nature of nonlinearity allows the possibility, for example, of perfectly satisfactory iterations following a steadily decreasing function towards achieving feasibility and a minimum at some infinite point. In order to focus attention on the methods and not the problem structure, we make the following assumptions:

C1: The starting point and all succeeding iterates lie in some compact set $\mathcal{C}$.

C2: The columns of $\nabla h(x)$ are linearly independent for all $x \in \mathcal{C}$.

The first assumption is made in some guise in almost all global convergence proofs, often by making specific assumptions about the functions. The second assumption assures that the systems of linearized constraints are consistent. An additional assumption will be made about the matrix approximations $B_k$, depending on the particular merit function.

Augmented Lagrangian merit functions

Our first example of a merit function is the augmented Lagrangian function. There are several versions of this function; we use the following version to illustrate the class:
$$\phi_F(x; \eta) = f(x) - h(x)^t u(x) + \frac{\eta}{2}\, \|h(x)\|^2,$$
where $\eta$ is a constant to be determined and
$$u(x) = \left[ \nabla h(x)^t \nabla h(x) \right]^{-1} \nabla h(x)^t \nabla f(x).$$
The multipliers $u(x)$ are the least squares estimates of the optimal multipliers based on the first order necessary conditions, and hence $u(x^*) = u^*$. Under the assumptions, $\phi_F$ and $u$ are differentiable, and $\phi_F$ is bounded below on $\mathcal{C}$ for $\eta$ sufficiently large. The following formulas will be useful in the discussion:
$$\nabla u(x) = H L(x, u(x))\, \nabla h(x) \left[ \nabla h(x)^t \nabla h(x) \right]^{-1} + R(x),$$
$$\nabla \phi_F(x; \eta) = \nabla f(x) - \nabla h(x)\, u(x) - \nabla u(x)\, h(x) + \eta\, \nabla h(x)\, h(x).$$
The function $R(x)$ above is bounded and satisfies $R(x^*) = 0$ if $x^*$ satisfies the first order necessary conditions.
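As a sketch of how the least squares multipliers and $\phi_F$ might be evaluated in practice (using a least squares solve rather than forming the normal equations explicitly); the function names are assumptions for illustration.

    import numpy as np

    def least_squares_multipliers(grad_h, grad_f):
        """u(x) solves min_u ||grad_h @ u - grad_f||, i.e.
        u = [grad_h^t grad_h]^{-1} grad_h^t grad_f."""
        u, *_ = np.linalg.lstsq(grad_h, grad_f, rcond=None)
        return u

    def phi_F(f_val, h_val, grad_f, grad_h, eta):
        """Augmented Lagrangian merit phi_F(x; eta), given f, h and the
        gradients already evaluated at the current point."""
        u = least_squares_multipliers(grad_h, grad_f)
        return f_val - h_val @ u + 0.5 * eta * (h_val @ h_val)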

The following theorem establishes that the augmented Lagrangian merit function has the essential properties of a merit function for the equality constrained nonlinear program.

Theorem. Assume B1, B2, C1 and C2 are satisfied. Then, for $\eta$ sufficiently large, the following hold:

(i) $x^* \in \mathcal{C}$ is a strong local minimum of $\phi_F$ if and only if $x^*$ is a strong local minimum of NLP; and

(ii) if $x^k$ is not a critical point of NLP, then $d_x$ is a descent direction for $\phi_F$.

The proof is rather technical, but it does illustrate the techniques that are often used in such arguments, and it provides a useful intermediate result, namely the descent estimate derived below. A key idea is to decompose certain vectors according to the range and null space projections $P$ and $Q$.

Proof. If $x \in \mathcal{C}$ is feasible and satisfies the first order necessary conditions, then it follows from the gradient formula above that $\nabla \phi_F(x; \eta) = 0$. Conversely, if $x \in \mathcal{C}$, $\nabla \phi_F(x; \eta) = 0$, and $\eta$ is sufficiently large, then it follows from C1 and C2 that $h(x) = 0$ and $x$ satisfies the first order necessary conditions for NLP. To establish (i), the relation between $H \phi_F(x^*; \eta)$ and $H L^*$ must be explored for such points $x^*$. It follows from the formulas above that
$$H \phi_F(x^*; \eta) = H L^* - Q^* H L^* - H L^* Q^* + \eta\, \nabla h(x^*) \nabla h(x^*)^t,$$
where $Q^*$ is the range space projection at $x^*$. Now let $y \in \mathbf{R}^n$, $y \neq 0$, be arbitrary. Then setting $y = Q^* y + P^* y$ yields
$$y^t H \phi_F(x^*; \eta)\, y = (P^* y)^t H L^* (P^* y) - (Q^* y)^t H L^* (Q^* y) + \eta\, (Q^* y)^t \left[ \nabla h(x^*) \nabla h(x^*)^t \right] (Q^* y).$$
Assume that $x^*$ is a strong local minimum of NLP. Since $Q^* y$ is in the range space of $\nabla h(x^*)$, it follows from the full rank assumption that there exists a constant $\beta > 0$ such that
$$(Q^* y)^t \left[ \nabla h(x^*) \nabla h(x^*)^t \right] (Q^* y) \geq \beta\, \|Q^* y\|^2.$$
Let $\gamma_{\max}$ be the maximum eigenvalue of $H L^*$ in absolute value, and let $\gamma_{\min}$ be the minimum eigenvalue of $P^* H L^* P^*$, which by assumption is positive. Defining
$$\delta = \frac{\|Q^* y\|}{\|y\|},$$
which implies
$$\frac{\|P^* y\|^2}{\|y\|^2} = 1 - \delta^2,$$
and dividing both sides of the preceding expression by $\|y\|^2$ gives
$$\frac{y^t H \phi_F(x^*; \eta)\, y}{\|y\|^2} \geq (1 - \delta^2)\, \gamma_{\min} - \delta^2\, \gamma_{\max} + \eta\, \beta\, \delta^2.$$
This last expression will be positive for $\eta$ sufficiently large, which implies that $x^*$ is a strong local minimum of $\phi_F$. Conversely, if $x^*$ is a strong local minimum of $\phi_F$, then this quadratic form must be positive for all $y$. This implies that $H L^*$ must be positive definite on the null space of $\nabla h(x^*)^t$, and hence that $x^*$ is a strong local minimum of NLP. This establishes (i).

To show that the direction $d_x$ is always a descent direction for $\phi_F$, the inner product of both sides of the gradient formula is taken with $d_x$ to yield
$$d_x^t \nabla \phi_F(x^k; \eta) = d_x^t \nabla f(x^k) - d_x^t \nabla h(x^k)\, u(x^k) - d_x^t \nabla u(x^k)\, h(x^k) + \eta\, d_x^t \nabla h(x^k)\, h(x^k).$$
Using the least squares formula for $u(x)$, it follows that
$$d_x^t \nabla h(x^k)\, u(x^k) = d_x^t \nabla h(x^k) \left[ \nabla h(x^k)^t \nabla h(x^k) \right]^{-1} \nabla h(x^k)^t \nabla f(x^k) = d_x^t Q^k \nabla f(x^k).$$
Writing $d_x = Q^k d_x + P^k d_x$ and noting that $h(x^k) = -\nabla h(x^k)^t Q^k d_x$ results in
$$d_x^t \nabla \phi_F(x^k; \eta) = (P^k d_x)^t \nabla f(x^k) + d_x^t \nabla u(x^k)\, \nabla h(x^k)^t Q^k d_x - \eta\, (Q^k d_x)^t \left[ \nabla h(x^k) \nabla h(x^k)^t \right] (Q^k d_x).$$

Finally, using the first order necessary conditions for ECQP and the fact that $P^k \nabla h(x^k) = 0$, the following is obtained:
$$d_x^t \nabla \phi_F(x^k; \eta) = -(P^k d_x)^t B_k (P^k d_x) - (Q^k d_x)^t B_k (P^k d_x) + d_x^t \nabla u(x^k)\, \nabla h(x^k)^t Q^k d_x - \eta\, (Q^k d_x)^t \left[ \nabla h(x^k) \nabla h(x^k)^t \right] (Q^k d_x).$$
Now, from this expression and assumptions C1 and C2, it follows that
$$d_x^t \nabla \phi_F(x^k; \eta) \leq -(P^k d_x)^t B_k (P^k d_x) + \beta_1\, \|Q^k d_x\|\, \|d_x\| - \eta\, \beta_2\, \|Q^k d_x\|^2$$
for some constants $\beta_1, \beta_2 > 0$, the bound on the middle terms following from the boundedness of $\nabla u(x) \nabla h(x)^t$ on $\mathcal{C}$. Dividing both sides by $\|d_x\|^2$, letting
$$\theta = \frac{\|Q^k d_x\|}{\|d_x\|},$$
and using the eigenvalue bounds on $B_k$ yields
$$\frac{d_x^t \nabla \phi_F(x^k; \eta)}{\|d_x\|^2} \leq -\beta_3\, (1 - \theta^2) + \beta_1\, \theta - \eta\, \beta_2\, \theta^2$$
for a constant $\beta_3 > 0$. The quantity on the left of this inequality is then negative, and can be uniformly bounded away from zero, provided $\eta$ is sufficiently large, thus proving that $d_x$ is a descent direction for $\phi_F$.

It is interesting to observe that the value of $\eta$ necessary to obtain descent of $d_x$ depends on the eigenvalue bounds on $\{B_k\}$, whereas the value of $\eta$ necessary to ensure that $x^*$ is a strong local minimizer of $\phi_F$ depends on the eigenvalues of $H L^*$. Thus a strategy to adjust $\eta$ to achieve a good descent direction may not be sufficient to prove that $H \phi_F(x^*; \eta)$ is positive definite. We comment further on this below.

This line of reasoning can be continued to obtain a more useful form of the descent result. From the above and similar arguments,
$$\|d_x\| \geq \beta_4\, \|\nabla \phi_F(x^k; \eta)\|$$
for some constant $\beta_4 > 0$. Thus it follows that there exists a constant $\beta_5 > 0$ such that
$$\nabla \phi_F(x^k; \eta)^t d_x \leq -\beta_5\, \|\nabla \phi_F(x^k; \eta)\|\, \|d_x\|$$

for all $k$. This inequality ensures that the cosine of the angle between the direction $d_x$ and the negative gradient of $\phi_F$ is bounded away from zero, i.e., for all $k$,
$$\cos \theta_k = \frac{-\nabla \phi_F(x^k; \eta)^t d_x}{\|\nabla \phi_F(x^k; \eta)\|\, \|d_x\|} \geq \beta_5 > 0.$$

This condition is sufficient to guarantee convergence of the sequence if it is coupled with suitable line search criteria that impose restrictions on the steplength. For example, the Wolfe conditions require a steplength $\alpha$ to satisfy
$$\phi_F(x^k + \alpha d_x; \eta) \leq \phi_F(x^k; \eta) + \mu_1\, \alpha\, \nabla \phi_F(x^k; \eta)^t d_x$$
and
$$\nabla \phi_F(x^k + \alpha d_x; \eta)^t d_x \geq \mu_2\, \nabla \phi_F(x^k; \eta)^t d_x,$$
where $0 < \mu_1 < \mu_2 < 1$. The first inequality ensures that there will be a sufficient reduction in $\phi_F$, while the second guarantees that the steplength will not be too small. An important property of these conditions is that if $d_x$ is a descent direction for $\phi_F$, then a steplength satisfying both can always be found. Furthermore, the reduction in $\phi_F$ for such a step satisfies
$$\phi_F(x^k; \eta) - \phi_F(x^{k+1}; \eta) \geq c\, \cos^2 \theta_k\, \|\nabla \phi_F(x^k; \eta)\|^2$$

for some positive constant $c$. Therefore
$$\sum_{k=0}^{\infty} \cos^2 \theta_k\, \|\nabla \phi_F(x^k; \eta)\|^2 < \infty,$$
and since $\cos \theta_k$ is uniformly bounded away from $0$, it follows that
$$\lim_{k \to \infty} \|\nabla \phi_F(x^k; \eta)\| = 0.$$
This implies that $\{x^k\}$ converges to a critical point of $\phi_F$. The following theorem results.

Theorem. Assume that $\eta$ is chosen such that the descent condition above holds. Then the SQP algorithm with steplength chosen to satisfy the Wolfe conditions is globally convergent to a critical point of $\phi_F$.

If the critical points of $\phi_F$ all correspond to local minima of NLP, then the algorithm will converge to a local solution. This, however, can rarely be guaranteed: only convergence to a critical point, and not to a local minimum, of $\phi_F$ is guaranteed. It is easy to see from the following example that convergence can occur to a local maximum of NLP. Consider
$$\begin{array}{ll} \mbox{minimize} & x_1 \\ \mbox{subject to} & x_1^2 + x_2^2 = 1, \end{array}$$
with a starting approximation on the $x_1$-axis to the right of $(1, 0)^t$ and $B_k = I$ for all $k$. The iterates will converge from the right to the point $(1, 0)^t$, which is a local maximum of the problem, but $\phi_F$ will decrease appropriately at each iteration. The point $(1, 0)^t$, however, is a saddle point of $\phi_F$. In practice, convergence to a maximizer of NLP rarely occurs.
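For the circle example above, the SQP iterates with $B_k = I$ stay on the $x_1$-axis, and the recursion reduces to $x \leftarrow (x^2 + 1)/(2x)$, which is Newton's method for $x^2 = 1$; a few lines confirm the convergence to the local maximizer from the right. This numerical check is ours, under the reconstruction of the example given above.

    # SQP iterates for: minimize x1  s.t.  x1^2 + x2^2 = 1,
    # with B_k = I, started on the x1-axis to the right of (1, 0):
    # the iteration collapses to x <- (x^2 + 1) / (2 x).
    x = 2.0
    for k in range(6):
        x = (x * x + 1.0) / (2.0 * x)
        print(k, x)            # decreases monotonically to 1.0
    # The limit (1, 0) is a local maximizer of x1 on the circle.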

The theorems above require that $\eta$ be sufficiently large. Since it is not known in advance how large $\eta$ needs to be, it is necessary to employ an adaptive strategy for adjusting $\eta$ when designing a practical code. Such adjustment strategies can have a dramatic effect on the performance of the implementation, as is discussed in the section on practical considerations below.

Augmented Lagrangian merit functions have been extended to handle inequality constrained problems in several ways. Two successful approaches are described below.

In the first, an active set strategy is employed; i.e., at each iteration a set of the inequality constraints is selected and treated as if they were equality constraints, while the remaining inequalities are handled differently. The active set is selected by using the multipliers from QP. In particular,
$$I_k = \left\{ i : g_i(x^k) \geq -(v_{qp})_i \right\}.$$
With this choice $I_k$ will contain all unsatisfied constraints and no safely satisfied ones. The merit function at $x^k$ is defined by
$$\phi_{FI}(x, u_{qp}, v_{qp}; \eta) = f(x) - h(x)^t u_{qp} + \frac{\eta}{2}\, \|h(x)\|^2 - \sum_{i \in I_k} \left[ g_i(x)\, (v_{qp})_i - \frac{\eta}{2}\, g_i(x)^2 \right] - \frac{1}{2\eta} \sum_{i \notin I_k} (v_{qp})_i^2.$$
$\phi_{FI}$ is still differentiable and, as a result of strict complementary slackness, the correct active set is eventually identified in a neighborhood of the solution. In this formulation the multipliers are the multipliers from QP, not the least squares multipliers. Therefore $\phi_{FI}$ is a function of both $x$ and these multipliers, and consequently the analysis for $\phi_{FI}$ is somewhat more complicated.
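A sketch of the active set selection, under the reconstruction of the test given above; the threshold $g_i(x^k) \geq -(v_{qp})_i$ is an assumption recovered from the surrounding text, which asks that all violated constraints be included and safely satisfied ones excluded.

    def active_set(g_val, v_qp):
        """Indices treated as equalities in phi_FI: every violated
        constraint (g_i > 0) and every constraint whose slack -g_i
        does not safely exceed its QP multiplier (an assumed test)."""
        return [i for i in range(len(g_val)) if g_val[i] >= -v_qp[i]]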

A second approach uses the idea of squared slack variables. One can consider the problem
$$\begin{array}{ll} \mbox{minimize} & f(x) \\ \mbox{subject to} & h(x) = 0 \\ & g_i(x) + (t_i)^2 = 0, \quad i = 1, \ldots, p, \end{array}$$
where $t$ is the vector of slack variables. This problem is equivalent to NLP in the sense that both have a strong local solution at $x^*$, where $(t_i^*)^2 = -g_i(x^*)$. By writing out the function $\phi_F$ for this problem, it is observed that the merit function only involves the squares of the $t_i$. Thus, by letting $z_i = (t_i)^2$ and setting
$$\tilde h(x, z) = \left( \begin{array}{c} h(x) \\ g(x) + z \end{array} \right),$$

we can construct the merit function
$$\phi_{FZ}(x, z; \eta) = f(x) - \tilde h(x, z)^t u(x, z) + \frac{\eta}{2}\, \|\tilde h(x, z)\|^2.$$
Here $u(x, z)$ is the least squares estimate for all of the multipliers. Note that $\phi_{FZ}$ is not used to create the quadratic subproblem; rather, QP is solved at each iteration to obtain the step $d_x$. The slack variables can then be updated at each iteration in a manner guaranteed to maintain the nonnegativity of $z$. For example, a step
$$d_z = -\left( \nabla g(x^k)^t d_x + g(x^k) + z^k \right)$$
can be calculated. Then the constraints of QP imply that $z^k + \alpha d_z \geq 0$ if $z^k \geq 0$ and $\alpha \in [0, 1]$.

The $\ell_1$ Merit Function

One of the first merit functions to be introduced was the $\ell_1$ exact penalty function, which in the equality constrained case is given by
$$\phi_1(x; \rho) = f(x) + \rho\, \|h(x)\|_1,$$
where $\rho$ is a positive constant to be chosen. The properties of this function vis-a-vis the equality constrained nonlinear programming problem have been well documented in the literature. For our purposes it is sufficient to note that $\phi_1$, like the augmented Lagrangian of the previous section, is an exact penalty function; that is, there exists a positive $\rho^*$ such that for all $\rho > \rho^*$ an unconstrained minimum of $\phi_1$ corresponds to a solution of NLP. Note that $\phi_1$ is not differentiable on the feasible set. If the penalty term were squared to achieve differentiability, then the exact property would be lost: minimizers of the squared penalty function would only converge to solutions of NLP as $\rho \to \infty$.

Although $\phi_1$ is not differentiable, it does have a directional derivative along $d_x$. It can be shown that this directional derivative, denoted by the operator $D$, is given by
$$D\!\left( \phi_1(x^k; \rho);\, d_x \right) = \nabla f(x^k)^t d_x - \rho\, \|h(x^k)\|_1.$$
Substituting the first order necessary conditions for ECQP into this expression yields
$$D\!\left( \phi_1(x^k; \rho);\, d_x \right) = -d_x^t B_k d_x - d_x^t \nabla h(x^k)\, u_{qp} - \rho\, \|h(x^k)\|_1.$$
It follows from the linearized constraints of ECQP that
$$d_x^t \nabla h(x^k)\, u_{qp} = -h(x^k)^t u_{qp},$$
and since
$$h(x^k)^t u_{qp} \leq \|h(x^k)\|_1\, \|u_{qp}\|_\infty,$$
the inequality
$$D\!\left( \phi_1(x^k; \rho);\, d_x \right) \leq -d_x^t B_k d_x - \left( \rho - \|u_{qp}\|_\infty \right) \|h(x^k)\|_1$$
is obtained.

In order to have $d_x$ be a descent direction for $\phi_1$, and to obtain a convergence theorem, it is sufficient to assume the uniform positive definiteness of $\{B_k\}$:

B3: For all $d \in \mathbf{R}^n$ there are positive constants $\beta_1$ and $\beta_2$ such that
$$\beta_1\, \|d\|^2 \leq d^t B_k d \leq \beta_2\, \|d\|^2$$
for all $k$.

The assumptions B1 and B2 that were made for the augmented Lagrangian are sufficient, but we make the stronger assumption to simplify the presentation. Using B3, the first term in the inequality above is always negative, and thus $d_x$ is a guaranteed descent direction for $\phi_1$ if $\rho > \|u_{qp}\|_\infty$.

Global convergence of an algorithm that uses $\phi_1$ as a merit function can be demonstrated using arguments similar to those for the augmented Lagrangian. First, at each iteration the parameter $\rho$ is chosen by
$$\rho = \|u_{qp}\|_\infty + \bar\rho$$
for some constant $\bar\rho > 0$. Next, the first Wolfe line search condition is replaced by
$$\phi_1(x^k + \alpha d_x; \rho) \leq \phi_1(x^k; \rho) + \mu_1\, \alpha\, D\!\left( \phi_1(x^k; \rho);\, d_x \right),$$
where $\mu_1 \in (0, \frac{1}{2})$. The upper bound on $\mu_1$ is for technical reasons; in practice $\mu_1$ is usually chosen to be much smaller.
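A sketch combining the penalty rule $\rho = \|u_{qp}\|_\infty + \bar\rho$ with the modified sufficient-decrease test, using the directional derivative $D(\phi_1(x^k; \rho); d_x)$ derived above; names and constants are illustrative.

    import numpy as np

    def choose_rho(u_qp, rho_bar=1.0):
        # rho = ||u_qp||_inf + rho_bar guarantees descent under B3
        return np.linalg.norm(u_qp, np.inf) + rho_bar

    def l1_armijo(phi, dphi, x, d_x, mu1=1e-4, shrink=0.5, max_iter=30):
        """Backtrack until phi(x + a*d_x) <= phi(x) + mu1*a*dphi, where
        phi evaluates phi_1(.; rho) and dphi = D(phi_1(x; rho); d_x) < 0."""
        a = 1.0
        for _ in range(max_iter):
            if phi(x + a * d_x) <= phi(x) + mu1 * a * dphi:
                return a
            a *= shrink
        raise RuntimeError("no acceptable steplength found")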

Recall that the second of the Wolfe conditions is to prevent a steplength that is too small. The assumptions that have been made guarantee that the use of a backtracking line search procedure produces steplengths that are uniformly bounded away from zero for all iterations. Denoting this lower bound by $\bar\alpha$, it now follows from the descent inequality and the choice of $\rho$ that the reduction in $\phi_1$ at each step satisfies
$$\phi_1(x^k; \rho) - \phi_1(x^k + \alpha d_x; \rho) \geq \mu_1\, \bar\alpha \left[ \beta_1\, \|d_x\|^2 + \bar\rho\, \|h(x^k)\|_1 \right].$$
Assumption C1 and this bound imply that
$$\sum_{k=0}^{\infty} \left[ \|d_x^k\|^2 + \|h(x^k)\|_1 \right] < \infty,$$
and therefore $\|d_x^k\| \to 0$ and $\|h(x^k)\|_1 \to 0$ as $k \to \infty$. Since $d_x = 0$ if and only if $x^k$ is a feasible point satisfying the first order necessary conditions, the following theorem results.

Theorem. Assume that $\rho$ is chosen as above. Then the SQP algorithm, started at any point $x^0$, with steplength chosen to satisfy the modified first Wolfe condition, converges to a stationary point of NLP.

As in the case of $\phi_F$, convergence to a local minimum of NLP cannot be guaranteed. In fact, the same counterexample used in that case applies here.

An advantage of $\phi_1$ over the augmented Lagrangian is that $\phi_1$ is easily extended to handle inequalities. Since differentiability is not an issue in this case, the function can simply be defined as
$$\phi_1(x; \rho) = f(x) + \rho \left[ \|h(x)\|_1 + \|g(x)^+\|_1 \right],$$
where
$$g_i(x)^+ = \left\{ \begin{array}{ll} 0 & \mbox{if } g_i(x) \leq 0, \\ g_i(x) & \mbox{if } g_i(x) > 0. \end{array} \right.$$
All of the theoretical results continue to hold under this extension.

The $\phi_1$ merit function has been popular because of its simplicity: it requires only the evaluation of $f$, $h$ and $g$ to check a prospective point. $\phi_F$, on the other hand, is expensive to evaluate in that it requires the evaluation of $f$, $h$, $g$ and their gradients just to check a prospective point. In addition, the evaluation of $u(x)$ involves nontrivial algebra in the large scale case. Implementations that use these merit functions partially circumvent this difficulty by using an approximation to $\phi_F$, e.g., by approximating $u$ by a linear function or by keeping it fixed over the current step.

Notes and References

The augmented Lagrangian merit function was first proposed as an exact penalty function by Fletcher. See also Bertsekas and Boggs and Tolle. It was suggested as a merit function in Powell and Yuan and, in a slightly different form, in Boggs and Tolle. See Byrd and Nocedal for its application to reduced Hessian methods. The Wolfe conditions have been studied by numerous authors; we recommend Nocedal or Dennis and Schnabel.

Schittkowski developed the form for inequalities given by $\phi_{FI}$. This form has been further developed and incorporated into a highly successful algorithm by Gill, Murray, Saunders and Wright. Boggs, Tolle and Kearsley (b) proposed the form given by $\phi_{FZ}$.

The $\ell_1$ exact penalty function was originally suggested as a merit function by Han, where he obtained the first global convergence results for SQP methods. See Fletcher for a good introduction to nondifferentiable optimization and the use of this function; see also Polak and Wolfe. Byrd and Nocedal study the $\ell_1$ merit function in the context of reduced Hessian methods. The analysis for the $\ell_1$ merit function applies to the corresponding $\ell_p$ merit function, with the only change occurring where the $\ell_1$ norm and the $\ell_\infty$ norm are replaced by the $p$-norm and the $q$-norm with $1/p + 1/q = 1$.

Global to Local Behavior

In the previous two sections we have examined conditions that imply that the basic SQP algorithm will converge. The local convergence analysis emphasized rates of convergence, while the global analysis was concerned with obtaining convergence from remote starting points, the implicit hope being that the two theories would come together to produce a unified theory applicable to a given algorithm. For the global convergence theories, where the process has to be controlled by a merit function, it was seen that convergence of $\{x^k\}$ to a critical point of the merit function is all that can be demonstrated. Assuming that this critical point is a solution of NLP, the question that arises is whether or not the conditions for the local convergence theories are eventually satisfied by this sequence. If they are, then the more rapid rates of local convergence can be achieved. In this section we discuss three questions concerning this possibility: will the correct active set be chosen by QP when $x^k$ is close to $x^*$; will $B_k$ eventually approximate $H L^*$ in one of the ways stipulated earlier that yield rapid local convergence; and will the merit function allow a steplength of one if the underlying process can achieve rapid local convergence?

Recall that the local convergence theory was proven for equality constrained problems only. This was based on the assumption that the active inequality constraints at the solution of NLP would be identified by SQP when $x^k$ gets close to the solution. The question of when the active constraints for QP will be the same as those for NLP is resolved by using a perturbation theory result, as follows. Consider NLP with only inequality constraints,
$$\begin{array}{ll} \mbox{minimize} & f(x) \\ \mbox{subject to} & g(x) \leq 0, \end{array}$$
and the quadratic program with the same solution,
$$\begin{array}{ll} \mbox{minimize} & \nabla f(x^*)^t (x - x^*) + \frac{1}{2} (x - x^*)^t B\, (x - x^*) \\ \mbox{subject to} & \nabla g(x^*)^t (x - x^*) + g(x^*) \leq 0, \end{array}$$
where the only restriction on $B$ is that
$$y^t B y > 0$$
for all $y \neq 0$ such that $\nabla g_i(x^*)^t y = 0$ for the indices $i$ of the active constraints. It is easily verified that $(x^*, v^*)$ is a strong local minimizer of both problems.

This implies that the active sets are the same and that the strict complementary slackness assumption holds for both. The quadratic programming approximation to the inequality constrained problem is
$$\begin{array}{ll} \mbox{minimize} & \nabla f(x^k)^t d_x + \frac{1}{2}\, d_x^t B_k d_x \\ \mbox{subject to} & \nabla g(x^k)^t d_x + g(x^k) \leq 0. \end{array}$$
A standard perturbation argument now asserts that if $x^k$ is close enough to $x^*$ and $B_k$ is close enough to $B$, then the active set for this quadratic program is the same as the active set for the quadratic program at the solution. It follows that the active sets for the quadratic program and for NLP are the same. As a result, if $x^k$ is sufficiently close to $x^*$ and $B_k$ is sufficiently close to any matrix $B$ satisfying the condition above, then QP will identify the correct active set. The condition will be satisfied if the $B_k$ are always positive definite, or if they satisfy the condition
$$\lim_{k \to \infty} P^k \left( B_k - H L^* \right) P^k = 0.$$

From Section linear sup erlinear or twostep sup erlinear convergence

is assured if B approaches H L in one of the ways hyp othesized in The

k

orems Since the initial matrix is unlikely in practice to b e a go o d

approximation to H L it is reasonable to ask under what conditions the

subsequent B will b e go o d enough approximations to H L for these results

k

of Section to hold If the B satisfy the secant equation then it is

k

reasonable to exp ect that the B will converge to H L provided that the

k

n

directions d span R rep eatedly for example if each set of n directions is

x

uniformly linearly indep endent Some recent research supp orts this exp ec

tation However if p ositive deniteness is maintained it is not p ossible for

the B to converge to H L unless the latter is at least p ositive semidenite

k

So for the general case further analysis is necessary As seen in Section

is equivalentto

P P B H L P s B H L Q s

k k k k

lim

k 

ks k ks k

k k

k  k

where s x x For sup erlinear convergence b oth of these terms

k

n

must tend to zero if the directions s ks k rep eatedly span R Thus for

k k

p ositive denite B sup erlinear convergence is not likely unless the second

k

term go es to zero which is the typ e of tangential convergence mentioned

in Section However two step sup erlinear convergence in which the

rst term of go es to zero is p ossible for p ositive denite up dates and

for reduced Hessian up dates as well provided that convergence o ccurs and

steplengths of one are acceptable Thus it is imp ortant from the p ointof

view of obtaining a rapid lo cal rate of convergence that a steplength of one

b e taken near x

For the general algorithm, however, the use of the merit function with the line search continues throughout the course of the algorithm. Thus the line search for the merit function should allow a steplength of one if the matrix approximations are such that rapid local convergence is possible. In the case of the augmented Lagrangian merit function it can be shown, using arguments similar to those above, that
$$\phi_F(x^{k+1}; \eta) - \phi_F(x^k; \eta) \leq -\frac{1}{2}\, d_x^t B_k d_x + \frac{\left\| P^k \left[ B_k - H L(x^k, u(x^k)) \right] d_x \right\|}{\|d_x\|}\, \|d_x\|^2 + O\!\left( \|d_x\|^3 \right).$$
Then, if the process is converging superlinearly, the penultimate term in this bound tends to zero by the superlinear characterization theorem, and the right hand side is ultimately less than zero, thus showing that a steplength of one decreases $\phi_F$. A slight extension to this argument shows that the second Wolfe condition also can be satisfied for a steplength of one.

A significant disadvantage of the $\phi_1$ merit function is that it may not allow a steplength of one near the solution, no matter how good the approximation of QP to NLP. This phenomenon, which can prohibit superlinear convergence, is called the Maratos effect. Several procedures have been proposed to overcome this problem, including so-called nonmonotone line searches, i.e., techniques that allow the merit function to increase over one or more iterations (see the discussion of practical considerations below).

Notes and References

The perturbation argument showing conditions under which QP has the same active set as NLP is derived from the work of Robinson, who proves a general perturbation result. For a discussion of the convergence of matrix updates see Boggs and Tolle, Ge and Powell, and Stoer.

The Maratos effect, i.e., the fact that the $\phi_1$ merit function may not allow a steplength of one even when the iterates are close to the solution and $B_k$ is a good approximation to $H L^*$, was discussed by Chamberlain, Lemarechal, Pedersen and Powell. These authors suggested the watchdog technique, which is essentially a nonmonotone line search method. See also Bonnans et al. The fact that the augmented Lagrangian merit function does not suffer from this problem has been shown by many authors.

SQP Trust Region Methods

Trust region algorithms have become a part of the arsenal for solving unconstrained optimization problems, so it is natural to attempt to extend the ideas to solving constrained optimization problems and, in particular, to SQP methods. In this section an outline of the basic ideas of this approach will be provided. As the subject is still in a state of flux, no attempt will be made to give a comprehensive account of the algorithms that have been proposed. The discussion will be limited to equality constrained problems; the inclusion of inequality constraints into trust region algorithms has been the subject of little research.

A major difficulty associated with SQP methods arises from trying to ensure that QP has a solution. As was seen in the preceding, the requirement that the Hessian approximations be positive definite on the null space of the constraints is difficult to guarantee short of requiring the $B_k$ to be positive definite. The trust region methods attempt to avoid this difficulty by the addition of a bounding constraint. Since this added constraint causes the feasible region to be bounded, the subproblem is guaranteed to have a solution independent of the choice of $B_k$, provided the feasible region is nonempty.

To be precise, a straightforward trust region adaptation of the SQP method would generate the iterates by solving the problem
$$\begin{array}{ll} \mbox{minimize} & \nabla f(x^k)^t d_x + \frac{1}{2}\, d_x^t B_k d_x \\ \mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0 \\ & \|S_k d_x\| \leq \Delta_k, \end{array}$$
where $S_k$ is a positive diagonal scaling matrix and $\Delta_k$ is the trust region radius. In this discussion we assume the norm is the Euclidean norm, but other norms have been used in the trust region constraint. The trust region radius is updated at each iteration based on a comparison of the actual decrease in a merit function to the decrease predicted by the model: if there is good agreement, the radius is maintained or increased; if not, it is decreased.

It is worth noting that the necessary condition for $d_x$ to be a solution to this subproblem is that
$$\left( B_k + \lambda\, S_k^t S_k \right) d_x = -\nabla f(x^k) - \nabla h(x^k)\, u$$
for some multiplier vector $u$ and some nonnegative scalar $\lambda$. This equation is used to generate approximate solutions.

The removal of the positive definite requirement on $B_k$ does not come without cost. The additional constraint is a quadratic inequality constraint, and hence it is not a trivial matter to find a good approximate solution to the subproblem. Moreover, there is the bothersome possibility that the solution sets of the linear constraints and the trust region constraint may be disjoint. For this reason, research on trust region methods has centered on finding different types of subproblems that have feasible solutions but still capture the essence of the quadratic subproblem. Several avenues of investigation are summarized below.

One approach is to relax the linear constraints in such a way that the resulting problem is feasible. In this case the linear constraint is replaced by
$$\nabla h(x^k)^t d_x + \theta_k\, h(x^k) = 0,$$
where $\theta_k \in (0, 1]$. A major difficulty with this approach is the problem of choosing $\theta_k$ so that the relaxed constraint, together with the trust region constraint, has a solution.

A second approach is to replace the equality constraints by a least squares approximation. Then the equality constraints become the quadratic constraint
$$\left\| \nabla h(x^k)^t d_x + h(x^k) \right\|^2 \leq \sigma_k,$$
where $\sigma_k$ is an appropriate value. One choice of $\sigma_k$ is the error in the linear constraints evaluated at the Cauchy point, where the Cauchy point is defined to be the optimal step in the steepest descent direction for the function
$$\frac{1}{2} \left\| \nabla h(x^k)^t d_x + h(x^k) \right\|^2,$$
i.e., the step $s_{cp}$ which minimizes this function in the steepest descent direction. This yields
$$\sigma_k = \left\| \nabla h(x^k)^t s_{cp} + h(x^k) \right\|^2.$$
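A sketch of the Cauchy point computation for this model: the steepest descent direction at $s = 0$ is $-\nabla h(x^k)\, h(x^k)$, and the exact minimizing step along it has a closed form; $\sigma_k$ is then the residual at $s_{cp}$. The trust region truncation $\|s\| \leq \theta \Delta_k$ is omitted for brevity, and the function name is an assumption.

    import numpy as np

    def cauchy_point(grad_h, h_val):
        """Exact minimizer of q(s) = 0.5*||grad_h^t s + h||^2 along the
        steepest descent direction p = -grad_h @ h at s = 0 (assumes
        p != 0; trust region truncation omitted for brevity)."""
        p = -grad_h @ h_val                # steepest descent direction for q
        Ap = grad_h.T @ p                  # induced change in the residual
        t = -(h_val @ Ap) / (Ap @ Ap)      # exact one-dimensional minimizer
        s_cp = t * p
        sigma = np.linalg.norm(grad_h.T @ s_cp + h_val) ** 2
        return s_cp, sigma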

Another possibility is to take $\sigma_k$ to be any value of $\|\nabla h(x^k)^t s + h(x^k)\|^2$ for which
$$\|s\| \leq \theta\, \Delta_k,$$
where $\theta \in (0, 1)$. This approach assures that the quadratic subproblem is always feasible, but at the cost of some extra computation to obtain $\sigma_k$. Moreover, while the resulting subproblem has only two quadratic constraints, finding a good approximate solution quickly is a matter that has not been completely resolved.

Finally, in a reduced Hessian approach to the trust region method, the linear constraint is replaced by
$$\nabla h(x^k)^t d_x = \nabla h(x^k)^t s_k,$$
where $s_k$ is the solution of
$$\begin{array}{ll} \mbox{minimize} & \left\| \nabla h(x^k)^t s + h(x^k) \right\| \\ \mbox{subject to} & \|s\| \leq \theta\, \Delta_k \end{array}$$
for $\theta \in (0, 1)$. Using the decomposition
$$d_x = Z_k p_Z + Y_k p_Y,$$
where the columns of $Z_k = Z(x^k)$ are a basis for the null space of $\nabla h(x^k)^t$ and the columns of $Y_k$ are a basis for the range space of $\nabla h(x^k)$, it follows from the modified linear constraint that
$$Y_k p_Y = s_k.$$
As in the reduced Hessian methods described earlier, the null space component $p_Z$ can now be seen to be the solution of the quadratic problem
$$\begin{array}{ll} \mbox{minimize} & \left[ \nabla f(x^k) + B_k s_k \right]^t Z_k p_Z + \frac{1}{2}\, p_Z^t Z_k^t B_k Z_k p_Z \\ \mbox{subject to} & \|p_Z\|^2 \leq \Delta_k^2 - \|s_k\|^2. \end{array}$$

A great deal of research has been done on the subject of minimizing a quadratic function subject to a trust region constraint, so quick and accurate methods for approximating the solutions to these two problems are available.

Much of the work to be done in transforming these approaches into algorithms for which local and global convergence theorems can be proven is similar in nature to that which must be done for standard SQP methods. In particular, a method for updating the matrices $B_k$ needs to be specified (a wider class of updating schemes is now available, at least in theory, because there is no necessity of requiring them to have the positive definiteness properties), and a merit function has to be specified. As was shown earlier, local superlinear convergence depends on the steps approaching the Newton-SQP steps as the solution is neared. In particular, this requires that the modified constraints reduce to the linearized constraint of ECQP and that the trust region constraint not be active, so that steplengths of one can be accepted. Conditions under which these events will occur have not been established. Global convergence theorems have been proven for most of the above variations of the trust region method by utilizing either the $\ell_1$ or the augmented Lagrangian merit function under the assumptions C1 and C2.

Notes and References

An introductory exposition of trust region methods for unconstrained optimization can be found in the book by Dennis and Schnabel. Methods for minimizing a quadratic function subject to a trust region constraint can be found in Gay and in Moré and Sorensen.

The first application of trust region methods to the constrained problem appears to be that of Vardi, who used the relaxed constraint in place of the linearized equality constraints. This approach was also used by Byrd, Schnabel and Schultz. The introduction of the quadratic constraint as a substitute for the linear constraint is due to Celis, Dennis and Tapia, who used the error at the Cauchy point as the $\sigma_k$. A global convergence theory for this strategy is given in El-Alem. Powell and Yuan suggested the second version of this approach mentioned in the text; Yuan proposed solution techniques for the resulting subproblem. The reduced Hessian approach has been introduced by Omojokun, who also considers the case when inequality constraints are present. This approach has also been used by Lalee, Nocedal and Plantega for large scale problems.

Practical Considerations

Our goal throughout this survey has been to concentrate on those theoretical properties that bear on actual implementations, but we have mentioned some of the difficulties that arise in the implementation of an SQP algorithm when the assumptions that we have made are not satisfied. In this section we elaborate on a few of the more important of these difficulties and suggest some common computational techniques that can be used to work around them. No attempt is made to be complete; the object is to give an indication of the issues involved. We also discuss briefly the extension of the SQP algorithm to the large scale case, where special considerations are necessary to create an efficient algorithm.

The Solution of QP

The assumptions behind the theory imply that QP always has a solution. In practice, of course, this may not be the case: QP may be infeasible, or the objective function may be unbounded on the feasible set and have no local minima. As stated earlier, surveying methods for solving QP is beyond the scope of this paper, but we discuss some ways of continuing the computations when these difficulties arise.

One technique to avoid infeasibilities is to relax the constraints by using the trust region methods described above. Another approach is to take $d_x$ to be some convenient direction when QP is infeasible, e.g., the steepest descent direction for the merit function. The infeasibility of QP is usually detected during a Phase I procedure that is used to obtain a feasible point with which to begin the quadratic programming algorithm. If the constraints are inconsistent, then there are no feasible points and the Phase I procedure will fail. However, the direction $d_x$ obtained in this case can sometimes be used to produce directions that reduce the constraint infeasibilities, and can thus be used to improve $x^k$. For example, the Phase I procedure known as the big-M method modifies QP by adding one new variable, say $\xi$, in such a way that the new problem
$$\begin{array}{ll} \mbox{minimize} & \nabla f(x^k)^t d_x + \frac{1}{2}\, d_x^t B_k d_x + M \xi \\ \mbox{subject to} & \nabla h(x^k)^t d_x + h(x^k) = 0 \\ & \nabla g(x^k)^t d_x + g(x^k) \leq \xi e \end{array}$$
is feasible. In this form $M$ is a constant and $e$ is the vector of ones. For $\xi^0 \geq \max_i \{ g_i(x^k) \}$, the initial point $(d_x, \xi) = (0, \xi^0)$ is feasible with respect to the inequalities. The constant $M$ is chosen large enough so that, as the problem is being solved, $\xi$ is forced to be reduced. Once $\xi = 0$, $d_x$ is a feasible point for QP. If $\xi$ cannot be reduced to zero, then the inequalities are inconsistent. Nevertheless, it can be shown under certain conditions that the resulting $d_x$ is a descent direction for the merit function $\phi_F$.
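A sketch of how the big-M subproblem data might be assembled for a generic QP solver in the variables $z = (d_x, \xi)$; the value of $M$, the matrix layout, and the nonnegativity of $\xi$ as an explicit bound are illustrative assumptions.

    import numpy as np

    def big_m_data(grad_f, B, grad_h, h_val, grad_g, g_val, M=1e6):
        """Extended data for the relaxed QP in z = (d_x, xi):
        minimize c^t z + 0.5 z^t H z
        s.t. A_eq z = -h,  A_in z <= -g,  (and xi >= 0)."""
        n = grad_f.size
        c = np.concatenate([grad_f, [M]])            # linear term, M on xi
        H = np.zeros((n + 1, n + 1))
        H[:n, :n] = B                                # quadratic term on d_x only
        A_eq = np.hstack([grad_h.T, np.zeros((h_val.size, 1))])
        A_in = np.hstack([grad_g.T, -np.ones((g_val.size, 1))])
        return c, H, A_eq, -h_val, A_in, -g_val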

If the problem is unbounded but feasible, then the difficulty is due to the structure of the Hessian approximation $B_k$. For example, if $B_k$ is not positive definite, then a multiple of the identity or a nonnegative diagonal matrix can be added to $B_k$ to ensure that it is positive definite. Note that adding a strictly positive diagonal matrix to $B_k$ is equivalent to using a trust region constraint (compare the necessary condition for the trust region subproblem above).
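A sketch of the simplest such modification: increase $\tau$ until $B_k + \tau I$ admits a Cholesky factorization. The initial value and growth factor are illustrative.

    import numpy as np

    def make_positive_definite(B, tau0=1e-3, grow=10.0, max_tries=20):
        """Add a multiple of the identity to B until it is positive
        definite (i.e., until a Cholesky factorization succeeds)."""
        tau = 0.0
        for _ in range(max_tries):
            try:
                np.linalg.cholesky(B + tau * np.eye(B.shape[0]))
                return B + tau * np.eye(B.shape[0])
            except np.linalg.LinAlgError:
                tau = tau0 if tau == 0.0 else tau * grow
        raise RuntimeError("could not make B positive definite")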

Finally, in cases where QP does not yield approximate multipliers for the nonlinear program, such as when QP is infeasible or the matrix $G(x^k)$ has linearly dependent columns, any reasonable approximation of the multipliers will usually suffice. The most common approach in these cases is to use a least squares multiplier approximation. If NLP does not satisfy the regularity assumptions, then at best the theoretical convergence rate will be slowed if $x^*$ is a simple degeneracy, and at worst even the Newton method may fail.

Adjusting the Merit Function Parameter

It was shown above that setting the parameter $\rho$ to ensure that the direction $d_x$ is a descent direction for the merit function $\phi_1$ can be relatively straightforward. Setting the parameter for $\phi_F$ is only a little more difficult. In either case, adjusting the parameter can cause both theoretical and computational difficulties. In theory there is no problem with having the parameter large. Indeed, if only increases in the parameter are allowed, the assumptions made for global convergence, coupled with a reasonable adjustment strategy, will lead to a provably finite value of the parameter, and convergence can be proved using the techniques described earlier. Computationally, however, having a large value of the parameter implies that there will be too much emphasis on satisfying the constraints. The result will be that the iterates will be forced to follow the constraint boundary closely, which in highly nonlinear problems can cause the algorithm to be slow. A strategy of only allowing increases in the parameter can lead to an excessively large value due entirely to an early iterate being far from the solution.

Ideally, the parameter should be adjusted up or down at various stages of the iteration process to ensure both good theoretical and computational performance. Some strategies for this have been proposed. For example, it is possible to allow controlled decreases in the parameter that ensure that the predicted decrease in the merit function is not dominated by a decrease in the constraint infeasibilities. For such choices it is still possible to prove convergence.

Nonmonotone Decrease of the Merit Function

Global convergence results are usually proved by insisting that an appropriate merit function be sufficiently reduced at each iteration. Sometimes, however, it is more efficient computationally to be less conservative and to allow steps to be accepted even if the merit function is temporarily increased. If, for example, the merit function is forced to be reduced over any fixed number of iterations, then convergence follows. In practice such strategies have been quite successful, especially near the solution. As a particular example, note that the $\phi_1$ merit function may not allow a steplength of one near the solution. One remedy for this is to accept the full step temporarily, even if $\phi_1$ increases. Then, if $\phi_1$ is not sufficiently reduced after a small number of steps, the last good iterate is restored and a reduction in $\phi_1$ is required for the next iteration.
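A sketch of this watchdog-style acceptance logic: full steps are accepted provisionally, and if the merit function has not been sufficiently reduced within a fixed number of steps, the last good iterate is restored. All names and thresholds are illustrative.

    def watchdog(step, phi, x0, patience=3, tol=1e-8, max_iter=100):
        """step(x) returns the next SQP iterate (full step, alpha = 1).
        Full steps are accepted provisionally even if phi increases; if
        phi has not been sufficiently reduced within `patience` steps,
        the last good iterate is restored."""
        x = best_x = x0
        best_phi = phi(x0)
        waited = 0
        for _ in range(max_iter):
            x = step(x)
            if phi(x) < best_phi - tol:      # sufficient decrease: record it
                best_x, best_phi, waited = x, phi(x), 0
            else:
                waited += 1
                if waited > patience:        # restore the last good iterate;
                    x, waited = best_x, 0    # a reduction would then be
        return best_x                        # enforced on the next step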

As a second example, it was noted above that the use of $\phi_F$ requires the evaluation of $f$, $h$, $g$ and their gradients just to test a prospective point for acceptance. This difficulty can be circumvented by using an approximate or working merit function that only requires evaluation of $f$, $h$ and $g$ (and no gradients) to test a point. For example, $\phi_F$ could be approximated in the equality constrained case by
$$\tilde\phi_F(x; \eta) = f(x) - h(x)^t u(x^k) + \frac{\eta}{2}\, \|h(x)\|^2,$$
where $u(x^k)$ stays fixed throughout the $k$-th iteration. This is then coupled with a strategy to monitor the iterations to ensure that $\phi_F$ is sufficiently reduced after a certain number of iterations, as discussed above.
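A sketch of the working merit function with the multipliers frozen at $u(x^k)$: testing a trial point then needs only $f$ and $h$, not the gradients.

    def working_phi_F(f_val, h_val, u_k, eta):
        """Approximate merit with u frozen at the current iterate:
        f(x) - h(x)^t u(x^k) + (eta/2)*||h(x)||^2."""
        return f_val - h_val @ u_k + 0.5 * eta * (h_val @ h_val)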

Large Scale Problems

Efficient SQP algorithms in the large scale case depend on carefully addressing many factors. In this section we mention two of these, namely problem structure and the solution of large scale quadratic programs.

Problems are considered large if, to solve them efficiently, either their structure must be exploited or the storage and manipulation of the matrices involved must be handled in a special manner. The most obvious structure, and the one most commonly considered, is sparsity of the matrices $\nabla h$, $\nabla g$ and $H L$. Typically, in large scale problems, most constraints depend only on a few of the variables and the objective function is partially separable (a partially separable function is, for example, made up of a sum of functions each of which depends only on a few of the variables). In such cases the matrices are sparse. In other cases one or more of the matrices is of low rank, and this structure can also be exploited.

A key to using SQP in the large scale case is to be able to solve, or to solve approximately, the large quadratic programming subproblems. In the small scale case it rarely matters how QP is solved, and it usually makes sense to solve it completely. Depending on the algorithm used in the large scale case, it is often essential to realize that QP at iteration $k+1$ differs only slightly from that at iteration $k$. In these cases the active set may remain the same or change only slightly, and it follows that solving the next QP approximately can be accomplished quickly. It must be shown, however, that approximate solutions of QP are descent directions for an appropriate merit function.

Recently, efficient interior point methods have been developed for solving large scale linear programs; these ideas are now being applied to large scale quadratic programs. One such method is the subspace method where, at each iteration, a low-dimensional subspace is chosen and QP, restricted to that subspace, is solved. A step in the resulting direction is calculated and the procedure iterated. It has been shown that any number of iterations of this technique gives rise to a descent direction for an augmented Lagrangian merit function; an SQP algorithm based on this has shown promise.

Notes and References

See Fletcher for an introduction to the solution of quadratic programs using classical techniques, including the big-M method. Recently there have been numerous papers on the application of interior point methods to quadratic programming problems. This is still the subject of intense research and there are no good survey papers on the subject; see Vanderbei and Carpenter, and Boggs, Domich, Rogers and Witzgall (a), for a discussion of two different approaches.

Adjusting the penalty parameter in the augmented Lagrangian merit function is discussed in Schittkowski, Powell and Yuan, Byrd and Nocedal, Boggs, Tolle and Kearsley (b), and El-Alem. The idea of allowing the merit function to increase temporarily is an old one; it is sometimes referred to as a nonmonotone line search, and for a general discussion see Grippo, Lampariello and Lucidi. Its use with the $\phi_1$ merit function is discussed in Chamberlain et al. Using approximate merit functions is suggested in Powell and Yuan, Boggs and Tolle, and Boggs et al. (b).

Based on the work to date there is significant promise for SQP in the large scale case, but more work is required to compare SQP with other large scale techniques. For work to date see Murray, Murray and Prieto (to appear), and Boggs et al. (b).

Acknowledgments. The authors would like to thank Ubaldo Garcia-Palomares, Anthony Kearsley, Stephen Nash, Dianne O'Leary and G. W. Stewart for reading and commenting on early drafts of the paper.

REFERENCES

D. P. Bertsekas, Variable metric methods for constrained optimization based on differentiable exact penalty functions, in Proceedings of the Eighteenth Allerton Conference on Communications, Control and Computing, Allerton Park, Illinois.

D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, London.

P. T. Boggs and J. W. Tolle, Augmented Lagrangians which are quadratic in the multiplier, Journal of Optimization Theory and Applications.

P. T. Boggs and J. W. Tolle, A family of descent functions for constrained optimization, SIAM Journal on Numerical Analysis.

P. T. Boggs and J. W. Tolle, A strategy for global convergence in a sequential quadratic programming algorithm, SIAM Journal on Numerical Analysis.

P. T. Boggs and J. W. Tolle, Convergence properties of a class of rank-two updates, SIAM Journal on Optimization.

P. T. Boggs, P. D. Domich, J. E. Rogers and C. Witzgall (a), An interior point method for linear and quadratic programming problems, Mathematical Programming Society Committee on Algorithms Newsletter.

P. T. Boggs, P. D. Domich, J. E. Rogers and C. Witzgall (a), An interior point method for general large scale quadratic programming problems, Internal Report (in process), National Institute of Standards and Technology.

P. T. Boggs, J. W. Tolle and A. J. Kearsley (b), A merit function for inequality constrained nonlinear programming problems, Internal Report, National Institute of Standards and Technology.

P. T. Boggs, J. W. Tolle and A. J. Kearsley (b), A practical algorithm for general large scale nonlinear optimization problems, Internal Report, National Institute of Standards and Technology.

P. T. Boggs, J. W. Tolle and P. Wang, On the local convergence of quasi-Newton methods for constrained optimization, SIAM Journal on Control and Optimization.

J. F. Bonnans, E. R. Panier, A. L. Tits and J. L. Zhou, Avoiding the Maratos effect by means of a nonmonotone line search II: inequality constrained problems, feasible iterates, SIAM Journal on Numerical Analysis.

C. G. Broyden, J. E. Dennis Jr and J. J. Moré, On the local and superlinear convergence of quasi-Newton methods, Journal of the Institute of Mathematics and its Applications.

R. H. Byrd and J. Nocedal, An analysis of reduced Hessian methods for constrained optimization, Mathematical Programming.

R. H. Byrd and R. B. Schnabel, Continuity of the null space basis and constrained optimization, Mathematical Programming.

R. H. Byrd, R. B. Schnabel and G. A. Schultz, A trust region strategy for nonlinearly constrained optimization, SIAM Journal on Numerical Analysis.

R. H. Byrd, R. A. Tapia and Y. Zhang, An SQP augmented Lagrangian BFGS algorithm for constrained optimization, SIAM Journal on Optimization.

M. R. Celis, J. E. Dennis Jr and R. A. Tapia, A trust region strategy for nonlinear equality constrained optimization, in Numerical Optimization (P. Boggs, R. Byrd and R. Schnabel, eds), SIAM, Philadelphia.

R. Chamberlain, C. Lemarechal, H. C. Pedersen and M. J. D. Powell, The watchdog technique for forcing convergence in algorithms for constrained optimization, Mathematical Programming Study.

T. F. Coleman, On characterizations of superlinear convergence for constrained optimization, in Lectures in Applied Mathematics, American Mathematical Society, Providence, Rhode Island.

T. F. Coleman and A. R. Conn, On the local convergence of a quasi-Newton method for the nonlinear programming problem, SIAM Journal on Numerical Analysis.

T. F. Coleman and P. A. Feynes, Partitioned quasi-Newton methods for nonlinear equality constrained optimization, Mathematical Programming.

T. F. Coleman and D. C. Sorensen, A note on the computation of an orthonormal basis for the null space of a matrix, Mathematical Programming.

J. E. Dennis Jr and J. J. Moré, Quasi-Newton methods: motivation and theory, SIAM Review.

J. E. Dennis Jr and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey.

M. M. El-Alem, A global convergence theory for the Celis-Dennis-Tapia trust region algorithm for constrained optimization, SIAM Journal on Numerical Analysis.

M. M. El-Alem, A robust trust region algorithm with nonmonotonic penalty parameter scheme for constrained optimization, Department of Mathematical Sciences, Rice University.

R. Fletcher, A class of methods for nonlinear programming III: rates of convergence, in Numerical Methods for Nonlinear Optimization (F. A. Lootsma, ed.), Academic Press, New York.

R. Fletcher, Practical Methods of Optimization, John Wiley and Sons, New York.

R. Fontecilla, Local convergence of secant methods for nonlinear constrained optimization, SIAM Journal on Numerical Analysis.

R. Fontecilla, T. Steihaug and R. A. Tapia, A convergence theory for a class of quasi-Newton methods for constrained optimization, SIAM Journal on Numerical Analysis.

M. Fukushima, A successive quadratic programming algorithm with global and superlinear convergence properties, Mathematical Programming.

D. Gabay, Reduced quasi-Newton methods with feasibility improvement for nonlinearly constrained optimization, Mathematical Programming Study.

U. M. Garcia-Palomares and O. L. Mangasarian, Superlinearly convergent quasi-Newton methods for nonlinearly constrained optimization problems, Mathematical Programming.

D. M. Gay, Computing optimal locally constrained steps, SIAM Journal on Scientific and Statistical Computing.

R. Ge and M. J. D. Powell, The convergence of variable metric matrices in unconstrained optimization, Mathematical Programming.

J. C. Gilbert, Superlinear convergence of a reduced BFGS method with piecewise line-search and update criterion, Rapport de Recherche, Institut National de Recherche en Informatique et en Automatique.

P. E. Gill, W. Murray and M. H. Wright, Practical Optimization, Academic Press, New York.

P. E. Gill, W. Murray, M. A. Saunders and M. H. Wright, User's guide for NPSOL: a Fortran package for nonlinear programming, Technical Report, Department of Operations Research, Stanford University.

P. E. Gill, W. Murray, M. A. Saunders, G. W. Stewart and M. H. Wright, Properties of a representation of a basis for the null space, Mathematical Programming.

S. T. Glad, Properties of updating methods for the multipliers in augmented Lagrangians, Journal of Optimization Theory and Applications.

J. Goodman, Newton's method for constrained optimization, Mathematical Programming.

L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis.

S.-P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Mathematical Programming.

S.-P. Han, A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications.

M. Lalee, J. Nocedal and T. Plantega, On the implementation of an algorithm for large-scale equality optimization, Report NAM, Department of Electrical Engineering and Computer Science, Northwestern University.

D. G. Luenberger, Linear and Nonlinear Programming, 2nd Edition, Addison-Wesley, Reading, Massachusetts.

J. Moré and D. C. Sorensen, Computing a trust region step, SIAM Journal on Scientific and Statistical Computing.

W. Murray, Algorithms for large nonlinear problems, in Mathematical Programming: State of the Art (J. R. Birge and K. G. Murty, eds), University of Michigan, Ann Arbor, MI.

W. Murray and F. J. Prieto (to appear), A sequential quadratic programming algorithm using an incomplete solution of the subproblem, SIAM Journal on Optimization.

S. G. Nash and A. Sofer, Linear and Nonlinear Programming, McGraw-Hill, New York.

J. Nocedal, Theory of algorithms for unconstrained optimization, Acta Numerica.

J. Nocedal and M. L. Overton, Projected Hessian updating algorithms for nonlinearly constrained optimization problems, SIAM Journal on Numerical Analysis.

E. M. Omojokun, Trust Region Algorithms for Optimization with Nonlinear Equality and Inequality Constraints, PhD thesis, University of Colorado.

J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York.

E. R. Panier and A. L. Tits, On combining feasibility, descent and superlinear convergence in inequality constrained optimization, Mathematical Programming.

E. Polak, Basics of minimax algorithms, in Nonsmooth Optimization and Related Topics (F. H. Clarke, V. F. Demyanov and F. Giannessi, eds), Plenum, New York.

M. J. D. Powell, A fast algorithm for nonlinearly constrained optimization calculations, in Numerical Analysis, Dundee (G. A. Watson, ed.), Springer-Verlag, Berlin.

M. J. D. Powell (a), Algorithms for nonlinear constraints that use Lagrangian functions, Mathematical Programming.

M. J. D. Powell (b), The convergence of variable metric methods for nonlinearly constrained optimization calculations, in Nonlinear Programming (O. Mangasarian, R. Meyer and S. Robinson, eds), Academic Press, New York.

M. J. D. Powell and Y. Yuan, A recursive quadratic programming algorithm that uses differentiable exact penalty functions, Mathematical Programming.

S. M. Robinson, Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear programming algorithms, Mathematical Programming.

K. Schittkowski, The nonlinear programming method of Wilson, Han and Powell with an augmented Lagrangian type line search function, part 1: convergence analysis, Numerische Mathematik.

K. Schittkowski, On the convergence of a sequential quadratic programming method with an augmented Lagrangian line search function, Math. Operationsforsch. u. Statist., Ser. Optimization.

J. Stoer, The convergence of matrices generated by rank-two methods from the restricted class of Broyden, Numerische Mathematik.

R. A. Tapia, A stable approach to Newton's method for optimization problems with equality constraints, Journal of Optimization Theory and Applications.

R. A. Tapia, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization, Journal of Optimization Theory and Applications.

R. A. Tapia, Quasi-Newton methods for equality constrained optimization: equivalence of existing methods and a new implementation, in Nonlinear Programming (O. Mangasarian, R. Meyer and S. Robinson, eds), Academic Press, New York.

R. A. Tapia, On secant updates for use in general constrained optimization, Mathematics of Computation.

R. J. Vanderbei and T. J. Carpenter, Symmetric indefinite systems for interior point methods, Mathematical Programming.

A. Vardi, A trust region algorithm for equality constrained minimization and implementation, SIAM Journal on Numerical Analysis.

R. B. Wilson, A Simplicial Algorithm for Concave Programming, PhD thesis, Harvard University.

P. Wolfe, A method of conjugate subgradients for minimizing nondifferentiable functions, in Mathematical Programming Study (M. Balinski and P. Wolfe, eds), North-Holland.

Y. Yuan, An only two-step Q-superlinear convergence example for some algorithms that use reduced Hessian approximations, Mathematical Programming.

Y. Yuan, On a subproblem of trust region algorithms for constrained optimization, Mathematical Programming.