The Simplex Gradient and Noisy Optimization Problems

D. M. Bortz
C. T. Kelley

North Carolina State University
Department of Mathematics
Center for Research in Scientific Computation
Box 8205, Raleigh, N. C. 27695-8205

Many classes of methods for noisy optimization problems are based on function information computed on sequences of simplices. The Nelder–Mead, multidirectional search, and implicit filtering methods are three such methods. The performance of these methods can be explained in terms of the difference approximation of the gradient implicit in the function evaluations. Insight can be gained into choice of termination criteria, detection of failure, and design of new methods.

1. Introduction

Noisy, nonsmooth, and discontinuous optimization problems arise in many fields of science and engineering. A few of these are semiconductor modeling and manufacturing [19, 20, 23, 24], design and calibration of instruments, design of wireless systems [10], and automotive engineering [5, 6].

In this paper we consider objective functions that are perturbations of simple smooth functions. The surface on the left in Figure 1 and the graph on the right illustrate this type of problem.

The perturbations may be the results of discontinuities or nonsmooth effects in the underlying models, of randomness in the function evaluation, or of experimental or measurement errors. Conventional gradient-based methods will be trapped in local minima even if the noise is smooth.



This research was partially supported by National Science Foundation grant DMS.


Figure 1: Optimization Landscapes. (Left: a noisy surface in two variables; right: the graph of a one-dimensional example.)

Many classes of methods for noisy optimization problems are based on function information computed on sequences of simplices. The Nelder–Mead [18], multidirectional search [8, 21], and implicit filtering [11, 12] methods are three examples. The performance of such methods can be explained in terms of the difference approximation of the gradient that is implicit in the function evaluations they perform.

In this paper we show how use of that gradient information can unify, extend, and simplify the analysis of these methods in the context of this important class of problems.

We begin by recalling, from [14], the simplex gradient, the first order estimates it satisfies, and its application to the Nelder–Mead method. In §3 we show how this idea can be directly applied to the multidirectional search and implicit filtering algorithms in a way that allows for aggressive attempts to improve performance and/or exploit parallelism.

2. The Simplex Gradient

The methods we discuss in this paper all examine a simplex of $N + 1$ points in $R^N$ at each iteration and then change the simplex in response. We consider problems where the objective $f$ that is sampled is a perturbation of a smooth function $f_s$ by a small function $\phi$,
\[
f(x) = f_s(x) + \phi(x). \tag{1}
\]
The small oscillations could cause $f$ to have several local minima that would trap any conventional gradient-based algorithm. The perturbation can be random, and therefore need not even be a function; we take $\phi \in L^\infty$ only to make the analysis simpler. The ideas in this section were originally used in [14] to analyze the Nelder–Mead algorithm, and we will restate those results at the end of this section.

Definition 1. A simplex $S$ in $R^N$ is the convex hull of $N + 1$ points $\{x_j\}_{j=1}^{N+1}$; $x_j$ is the $j$th vertex of $S$. We let $V$ (or $V(S)$) denote the $N \times N$ matrix of simplex directions
\[
V(S) = (x_2 - x_1, x_3 - x_1, \ldots, x_{N+1} - x_1) = (v_1, \ldots, v_N).
\]
We say $S$ is nonsingular if $V$ is nonsingular. The simplex diameter $\mathrm{diam}(S)$ is
\[
\mathrm{diam}(S) = \max_{1 \le i, j \le N+1} \|x_i - x_j\|.
\]

We will refer to the $l^2$ condition number $\kappa(V)$ of $V$ as the simplex condition.

We let $\delta(f : S)$ denote the vector of objective function differences
\[
\delta(f : S) = \left( f(x_2) - f(x_1), f(x_3) - f(x_1), \ldots, f(x_{N+1}) - f(x_1) \right)^T.
\]

We will not use the simplex diameter directly in our estimates or algorithms. Rather, we will use two oriented lengths
\[
\sigma_+(S) = \max_{2 \le j \le N+1} \|x_1 - x_j\| \quad \text{and} \quad \sigma_-(S) = \min_{2 \le j \le N+1} \|x_1 - x_j\|.
\]
Clearly,
\[
\sigma_+(S) \le \mathrm{diam}(S) \le 2 \sigma_+(S).
\]

Definition 2. Let $S$ be a nonsingular simplex with vertices $\{x_j\}_{j=1}^{N+1}$. The simplex gradient $D(f : S)$ is
\[
D(f : S) = V^{-T} \delta(f : S).
\]
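The simplex gradient is cheap to form once the function values at the vertices are known. The following is a minimal NumPy sketch of Definition 2 (the helper names and the vertex layout are ours, for illustration only):

```python
import numpy as np

def simplex_gradient(f, vertices):
    """Simplex gradient D(f:S) = V^{-T} delta(f:S) of Definition 2.

    vertices: (N+1) x N array whose first row is the base vertex x_1.
    """
    x1 = vertices[0]
    V = (vertices[1:] - x1).T                 # columns v_j = x_{j+1} - x_1
    delta = np.array([f(x) for x in vertices[1:]]) - f(x1)
    return np.linalg.solve(V.T, delta)        # solve V^T D = delta

def sigma_plus(vertices):
    """Oriented length sigma_+(S) = max_j ||x_1 - x_j||."""
    return np.max(np.linalg.norm(vertices[1:] - vertices[0], axis=1))
```

For the right simplex $S(x, h)$ used by implicit filtering below, $V = hI$ and simplex_gradient reduces to the usual forward difference gradient.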


Note that the matrix of simplex directions and the vector of objective function differences depend on which of the vertices is labeled $x_1$. Each of the algorithms we consider in this section uses a vertex ordering, and hence at least implicitly maintains a simplex gradient.

This definition of simplex gradient is motivated by the first order estimate:

Lemma 1. Let $S$ be a nonsingular simplex and let $\nabla f$ be Lipschitz continuous in a neighborhood of $S$. Then there is $K > 0$, depending only on $N$ and the Lipschitz constant of $\nabla f$, such that
\[
\|\nabla f(x_1) - D(f : S)\| \le K \kappa(V)\, \sigma_+(S).
\]

Search algorithms are not, of course, intended for smooth problems. Minimization of objective functions of the form (1) is one of the applications of these methods. Lemma 2 is a first order estimate that takes perturbations into account.

We will need to measure the perturbations on each simplex. To that end we define, for a set $T$,
\[
\|\phi\|_T = \operatorname*{ess\,sup}_{x \in T} |\phi(x)|.
\]

The analog of Lemma 1 for objective functions that satisfy (1) is:

Lemma 2. Let $S$ be a nonsingular simplex. Let $f$ satisfy (1) and let $\nabla f_s$ be Lipschitz continuous in a neighborhood of $S$. Then there is $K > 0$ such that
\[
\|\nabla f_s(x_1) - D(f : S)\| \le K \kappa(V) \left( \sigma_+(S) + \frac{\|\phi\|_S}{\sigma_+(S)} \right).
\]
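The two terms in Lemma 2 pull in opposite directions: shrinking the simplex reduces the $O(\sigma_+)$ Taylor error but amplifies the noise term $\|\phi\|_S / \sigma_+(S)$, so the bound is smallest when $\sigma_+ \sim \|\phi\|_S^{1/2}$. A small experiment (a sketch with illustrative names, reusing simplex_gradient from above) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(0)
f_s = lambda x: 0.5 * x @ x                       # smooth part; grad f_s(x) = x
f = lambda x: f_s(x) + 1e-6 * rng.uniform(-1, 1)  # noise of size ~1e-6

x1 = np.ones(3)
for h in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    vertices = np.vstack([x1, x1 + h * np.eye(3)])  # right simplex: kappa(V) = 1
    err = np.linalg.norm(simplex_gradient(f, vertices) - x1)
    print(f"h = {h:.0e}   error = {err:.2e}")
# The error decreases like h until h ~ 1e-3 = sqrt(noise), then grows like 1e-6/h.
```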

In [14] these ideas were applied to the Nelder–Mead algorithm [18] with a view toward detecting stagnation in the iteration. The Nelder–Mead algorithm uses a simplex $S$ of approximations to an optimal point. In this algorithm the vertices $\{x_j\}_{j=1}^{N+1}$ are sorted according to the objective function values
\[
f(x_1) \le f(x_2) \le \cdots \le f(x_{N+1}). \tag{2}
\]


$x_1$ is called the best vertex and $x_{N+1}$ the worst. The specific nature of the sort and tie-breaking rules has no effect on the performance of the algorithm.

The algorithm attempts to replace the worst vertex $x_{N+1}$ with a new point of the form
\[
x(\mu) = (1 + \mu) \bar{x} - \mu x_{N+1}, \tag{3}
\]
where $\bar{x}$ is the centroid of the convex hull of $\{x_i\}_{i=1}^{N}$,
\[
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.
\]

The value of $\mu$ is selected from a sequence
\[
-1 < \mu_{ic} < 0 < \mu_{oc} < \mu_r < \mu_e
\]
by rules that we formally describe in Algorithm nelder. Our formulation of the algorithm allows for termination if either $f(x_{N+1}) - f(x_1)$ is sufficiently small or a user-specified number of function evaluations has been expended.

Formally, the algorithm is:

Algorithm nelder$(S, f, \tau, kmax)$

1. Evaluate $f$ at the vertices of $S$ and sort the vertices of $S$ so that (2) holds.
2. Set $fcount = N + 1$.
3. While $f(x_{N+1}) - f(x_1) > \tau$:
 (a) Compute $\bar{x}$, $x(\mu_r)$, and $f_r = f(x(\mu_r))$; $fcount = fcount + 1$.
 (b) Reflect: If $fcount = kmax$ then exit. If $f(x_1) \le f_r < f(x_N)$, replace $x_{N+1}$ with $x(\mu_r)$ and go to step (g).
 (c) Expand: If $fcount = kmax$ then exit. If $f_r < f(x_1)$, then compute $f_e = f(x(\mu_e))$ and set $fcount = fcount + 1$. If $f_e < f_r$, replace $x_{N+1}$ with $x(\mu_e)$; otherwise replace $x_{N+1}$ with $x(\mu_r)$. Go to step (g).
 (d) Outside Contraction: If $fcount = kmax$ then exit. If $f(x_N) \le f_r < f(x_{N+1})$, compute $f_c = f(x(\mu_{oc}))$ and set $fcount = fcount + 1$. If $f_c \le f_r$, replace $x_{N+1}$ with $x(\mu_{oc})$ and go to step (g); otherwise go to step (f).
 (e) Inside Contraction: If $fcount = kmax$ then exit. If $f_r \ge f(x_{N+1})$, compute $f_c = f(x(\mu_{ic}))$ and set $fcount = fcount + 1$. If $f_c < f(x_{N+1})$, replace $x_{N+1}$ with $x(\mu_{ic})$ and go to step (g); otherwise go to step (f).
 (f) Shrink: If $fcount \ge kmax - N$, exit. For $2 \le i \le N + 1$: set $x_i = x_1 + (x_i - x_1)/2$ and compute $f(x_i)$; $fcount = fcount + N$.
 (g) Sort: Sort the vertices of $S$ so that (2) holds.

A typical sequence of candidate values for $\mu$ is
\[
\{\mu_r, \mu_e, \mu_{oc}, \mu_{ic}\} = \{1, 2, 1/2, -1/2\}.
\]
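For concreteness, here is a compact transcription of Algorithm nelder in NumPy (a sketch: it checks the evaluation budget only at the top of the loop rather than before every evaluation, and the names are ours):

```python
import numpy as np

def nelder(S, f, tau, kmax, mu=(-0.5, 0.5, 1.0, 2.0)):
    """Nelder-Mead sketch. S: (N+1) x N array of vertices; mu = (ic, oc, r, e)."""
    mu_ic, mu_oc, mu_r, mu_e = mu
    S = np.asarray(S, dtype=float)
    N = S.shape[1]
    fv = np.array([f(x) for x in S])
    fcount = N + 1

    def sort():
        nonlocal S, fv
        idx = np.argsort(fv)
        S, fv = S[idx], fv[idx]

    sort()
    while fv[-1] - fv[0] > tau and fcount < kmax:
        xbar = S[:-1].mean(axis=0)                    # centroid of best N vertices
        x = lambda m: (1 + m) * xbar - m * S[-1]      # candidate point x(mu)
        fr = f(x(mu_r)); fcount += 1
        if fv[0] <= fr < fv[-2]:                      # (b) reflect
            S[-1], fv[-1] = x(mu_r), fr
        elif fr < fv[0]:                              # (c) expand
            fe = f(x(mu_e)); fcount += 1
            S[-1], fv[-1] = (x(mu_e), fe) if fe < fr else (x(mu_r), fr)
        else:
            if fr < fv[-1]:                           # (d) outside contraction
                fc = f(x(mu_oc)); fcount += 1
                new, fn, ok = x(mu_oc), fc, fc <= fr
            else:                                     # (e) inside contraction
                fc = f(x(mu_ic)); fcount += 1
                new, fn, ok = x(mu_ic), fc, fc < fv[-1]
            if ok:
                S[-1], fv[-1] = new, fn
            else:                                     # (f) shrink toward best vertex
                S[1:] = S[0] + 0.5 * (S[1:] - S[0])
                fv[1:] = [f(xi) for xi in S[1:]]; fcount += N
        sort()                                        # (g)
    return S[0], fv[0], fcount
```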

Figure 2 is an illustration of the options in two dimensions. The vertices labeled $x_1$, $x_2$, and $x_3$ are those of the original ordered simplex.

Figure 2 illustrates both the benefits and the disadvantages of the Nelder–Mead algorithm. Unlike the other algorithms we consider in this paper, the simplex shape is free to adapt to the optimization landscape. However, the price for that adaptability is that the simplex can become highly ill-conditioned. The results from [14], which we now state, must assume that the conditioning of the simplices remains under control in order to guarantee convergence.

Figure 2: Nelder–Mead Simplex and New Points. (The ordered vertices $x_1, x_2, x_3$ and the candidate points $x(\mu_{ic})$, $x(\mu_{oc})$, $x(\mu_r)$, and $x(\mu_e)$.)

Another difference is that there is no distinguished vertex in the simplex whose function value is reduced. Unless a shrink step occurs, the Nelder–Mead iteration reduces the average
\[
\bar{f} = \frac{1}{N + 1} \sum_{j=1}^{N+1} f(x_j),
\]

because the worst vertex is replaced by one with a lower function value. We will assume that shrink steps, which are rare, do not occur.

The two theorems below are from [14]. Both theorems require an analog of the sufficient decrease condition from gradient-based optimization and an assumption on the conditioning of the simplices. Failure of these conditions, which can happen in the Nelder–Mead algorithm [17], is an indicator of stagnation.


Theorem 1. Assume that the Nelder–Mead simplices are such that $V^k = V(S^k)$ is nonsingular and that
\[
\bar{f}^{k+1} - \bar{f}^k < -\alpha \|D(f : S^k)\|^2 \tag{4}
\]
holds for some $\alpha > 0$ and all but finitely many $k$. Let the assumptions of Lemma 1 hold, with the Lipschitz constants $K^k$ uniformly bounded. Then if
\[
\lim_{k \to \infty} \sigma_+(S^k)\, \kappa(V^k) = 0,
\]
any accumulation point of the simplices is a critical point of $f$.

Theorem 2 makes an assumption, similar to one made in [11, 12], that the noise decays to zero as the minimum is approached.

Theorem 2. Assume that the Nelder–Mead simplices are such that $V^k$ is nonsingular, and let the assumptions of Lemma 2 hold with the Lipschitz constants $K_s^k$ uniformly bounded. Then if (4) holds for all but finitely many $k$ and
\[
\lim_{k \to \infty} \kappa(V^k) \left( \sigma_+(S^k) + \frac{\|\phi\|_{S^k}}{\sigma_+(S^k)} \right) = 0,
\]
any accumulation point of the simplices is a critical point of $f_s$.

3. Convergence Results

3.1. Implicit Filtering

Implicit filtering is a difference-gradient implementation of the gradient projection algorithm [2] in which the difference increment is reduced in size as the iteration progresses. In this way the simplex gradient is used directly. It was originally proposed in [19, 20, 23, 24] for various problems in semiconductor modeling and analyzed in [11, 12].

In this paper we focus on the unconstrained form (see also [13]). This is sufficient to show how the simplex gradient can be used and to show how superlinear convergence might be possible. Implicit filtering is a point-based algorithm and, unlike


the simplex-based algorithms, does not distinguish a best point on a simplex. Rather, the current iterate $x_c$ is the point from which a simplex is built to compute a difference gradient. The new iterate $x_+$ is computed using a line search, which may fail even for smooth problems because the forward difference gradient may not be a descent direction. For a given $x \in R^N$ and $h > 0$ we let the simplex $S(x, h)$ be the right simplex from $x$ with edges having length $h$. Hence the vertices are $x$ and $x + h v_i$ for $1 \le i \le N$, with $V = hI$, so $\kappa(V) = 1$.

The forward difference gradient is, of course,
\[
\nabla_h f(x) = D(f : S(x, h)).
\]

While a centered difference can be better in practice, a forward difference will illustrate the idea, and we use that in this paper. We use a simple Armijo [1] line search and demand that the sufficient decrease condition
\[
f(x - \lambda \nabla_h f(x)) - f(x) < -\alpha \lambda \|\nabla_h f(x)\|^2 \tag{5}
\]
hold (compare to (4)) for some $\alpha > 0$. Our forward difference steepest descent algorithm fdsteep terminates when
\[
\|\nabla_h f(x)\| \le \tau h \tag{6}
\]
for some $\tau > 0$, when more than $kmax$ iterations have been taken, or when the line search fails by taking more than $amax$ backtracks. Even the failures of fdsteep can be used to advantage by triggering a reduction in $h$. The line search parameters $\alpha$, $\beta$ and the parameter $\tau$ in the termination criterion (6) do not affect the convergence analysis that we present here, but can affect performance.

Algorithm fdsteep$(x, f, kmax, h, amax)$

For $k = 1, \ldots, kmax$:
 (a) Compute $f$ and $\nabla_h f$; terminate if (6) holds.
 (b) Find the least integer $0 \le m \le amax$ such that (5) holds for $\lambda = \beta^m$. If no such $m$ exists, terminate.
 (c) $x = x - \lambda \nabla_h f(x)$.

Algorithm fdsteep will terminate after finitely many iterations because of the limits on the number of iterations and the number of backtracks. If the set $\{x \mid f(x) \le f(x_0)\}$ is bounded, then the iterations will remain in that set. Implicit filtering calls fdsteep repeatedly, reducing $h$ after each termination of fdsteep. Aside from the data needed by fdsteep, a sequence of difference increments $\{h_k\}_{k=0}^{\infty}$, called scales in [11, 12], is needed for the form of the algorithm given here.

Algorithm imfilter$(x, f, kmax, \{h_k\}, amax)$

For $k = 0, 1, \ldots$:
 Call fdsteep$(x, f, kmax, h_k, amax)$.
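A sketch of fdsteep and imfilter in NumPy (the line search parameters $\alpha$, $\beta$, $\tau$ follow the text; their default values here are conventional choices of ours, not prescribed by the paper):

```python
import numpy as np

def grad_fd(f, x, h):
    """Forward difference gradient: the simplex gradient of the right simplex S(x, h)."""
    fx = f(x)
    g = np.array([(f(x + h * e) - fx) / h for e in np.eye(len(x))])
    return g, fx

def fdsteep(x, f, kmax, h, amax, alpha=1e-4, beta=0.5, tau=1.0):
    for _ in range(kmax):
        g, fx = grad_fd(f, x, h)
        if np.linalg.norm(g) <= tau * h:            # (6): stencil cannot resolve f
            return x
        for m in range(amax + 1):                    # Armijo backtracking, lambda = beta^m
            lam = beta ** m
            if f(x - lam * g) - fx < -alpha * lam * (g @ g):   # (5)
                x = x - lam * g
                break
        else:
            return x                                 # line search failure
    return x

def imfilter(x, f, kmax, scales, amax):
    """Implicit filtering: run fdsteep once for each difference increment h_k."""
    for h in scales:
        x = fdsteep(x, f, kmax, h, amax)
    return x
```

Note how a line search failure simply returns control to imfilter, which then reduces $h$; this is the sense in which failures of fdsteep are used to advantage.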

Since $h_k = \sigma_+(S^k)$ and $\kappa(V^k) = 1$, the first order estimate of Lemma 2 implies a convergence result that is different from the one in [12].

Theorem 3. Let $f$ satisfy (1), let $\{x_k\}$ be the implicit filtering sequence, and let $S^k = S(x_k, h_k)$. Assume that (5) holds (i.e., there is no line search failure) for all but finitely many $k$. Then if
\[
\lim_{k \to \infty} \left( h_k + h_k^{-1} \|\phi\|_{S^k} \right) = 0,
\]
any limit point of the sequence $\{x_k\}$ is a critical point of $f_s$.

Proof. If (5) holds for all but finitely many $k$ then, as is standard,
\[
\nabla_{h_k} f(x_k) = D(f : S^k) \to 0.
\]
Hence, using Lemma 2 and the hypothesis,
\[
\nabla f_s(x_k) \to 0,
\]
as asserted.

Because implicit filtering directly maintains an approximate gradient and uses that to compute a descent direction, it is natural to try a quasi-Newton Hessian. Successful experiments with an SR1 update have been reported in [5] and [6].

3.2. Multidirectional Search

A natural way to address the possible ill-conditioning in the Nelder–Mead algorithm is to require that the condition numbers of the simplices be bounded. The most direct way to do that is to insist that the simplices have the same shape. The multidirectional search method [8, 21] does this by making each new simplex congruent to the previous one. In the special case of equilateral simplices, $V^k$ is a constant multiple of $V^0$ and the simplex condition number is constant. If the simplices are not equilateral, then $\kappa(V)$ may vary depending on which vertex is called $x_1$, but we will have, for some $\kappa_+$,
\[
\kappa(V) \le \kappa_+ \quad \text{and} \quad x^T V^T V x \ge \kappa_+^{-2} \sigma_+(S)^2 \|x\|^2 \ \text{for all } x. \tag{7}
\]

The algorithm is best understood by consideration of Figure 3, which illustrates the two-dimensional case for two types of simplices. Beginning with the ordered simplex $S^c$ with vertices $x_1, x_2, x_3$, one first attempts a rotation step, leading to a simplex $S^r$ with vertices $x_1, r_2, r_3$.

If the best function value of the vertices of $S^r$ is better than the best $f(x_1)$ in $S^c$, $S^r$ is provisionally accepted and an expansion is attempted. The expansion step is similar to that in the Nelder–Mead algorithm. The expansion simplex $S^e$ has vertices $x_1, e_2, e_3$ and is accepted over $S^r$ if the best function value of its vertices is better than the best in $S^r$. If the best function value of the vertices of $S^r$ is not better than the best in $S^c$, then the simplex is contracted and the new simplex has vertices $x_1, c_2, c_3$. After the new simplex is identified, the vertices are reordered to create the new ordered simplex $S^+$.

Figure 3: MDS Simplices and New Points. (Left: right simplex; right: equilateral simplex; reflected, expanded, and contracted vertices $r_j$, $e_j$, and $c_j$.)

Similarly to the Nelder–Mead algorithm, there are expansion and contraction parameters, $\mu_e$ and $\mu_c$. Typical values for these are $2$ and $1/2$, respectively.

Algorithm mds$(S, f, \tau, kmax)$

1. Evaluate $f$ at the vertices of $S$ and sort the vertices of $S$ so that (2) holds.
2. Set $fcount = N + 1$.
3. While $f(x_{N+1}) - f(x_1) > \tau$:
 (a) Reflect: If $fcount = kmax$ then exit. For $j = 2, \ldots, N + 1$: $r_j = x_1 - (x_j - x_1)$; compute $f(r_j)$; $fcount = fcount + N$. If $f(x_1) > \min_j \{f(r_j)\}$, go to step (b); else go to step (c).
 (b) Expand:
  (i) For $j = 2, \ldots, N + 1$: $e_j = x_1 - \mu_e (x_j - x_1)$; compute $f(e_j)$; $fcount = fcount + N$.
  (ii) If $\min_j \{f(r_j)\} > \min_j \{f(e_j)\}$, then for $j = 2, \ldots, N + 1$: $x_j = e_j$; else for $j = 2, \ldots, N + 1$: $x_j = r_j$.
  (iii) Go to step (d).
 (c) Contract: For $j = 2, \ldots, N + 1$: $x_j = x_1 + \mu_c (x_j - x_1)$; compute $f(x_j)$; $fcount = fcount + N$.
 (d) Sort: Sort the vertices of $S$ so that (2) holds.
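A NumPy sketch of Algorithm mds (illustrative names; budget checks simplified to the top of the loop):

```python
import numpy as np

def mds(S, f, tau, kmax, mu_e=2.0, mu_c=0.5):
    """Multidirectional search sketch. S: (N+1) x N array of vertices."""
    S = np.asarray(S, dtype=float)
    N = S.shape[1]
    fv = np.array([f(x) for x in S])
    fcount = N + 1

    def sort():
        nonlocal S, fv
        idx = np.argsort(fv)
        S, fv = S[idx], fv[idx]

    sort()
    while fv[-1] - fv[0] > tau and fcount + 2 * N <= kmax:
        R = S[0] - (S[1:] - S[0])                 # (a) reflected vertices r_j
        fR = np.array([f(r) for r in R]); fcount += N
        if fv[0] > fR.min():                      # (b) reflection improved: expand
            E = S[0] - mu_e * (S[1:] - S[0])      # expanded vertices e_j
            fE = np.array([f(e) for e in E]); fcount += N
            if fR.min() > fE.min():
                S[1:], fv[1:] = E, fE
            else:
                S[1:], fv[1:] = R, fR
        else:                                     # (c) contract toward x_1
            S[1:] = S[0] + mu_c * (S[1:] - S[0])
            fv[1:] = [f(x) for x in S[1:]]; fcount += N
        sort()                                    # (d)
    return S[0], fv[0], fcount
```

The $N$ function evaluations in each batch are independent of one another, which is what makes the method attractive on parallel machines.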

c

IF the function values at the vertices of S are known then

+

the cost of computing S is N additional evaluations Just as

with NelderMead the expansion step is optional but has b een

observed to improve p erformance

Assume that the simplices are either equilateral or right simplices (having one vertex from which all $N$ edges are at right angles). In those cases, as pointed out in [21], the possible vertices created by expansion and reflection steps form a regular lattice of points. If the MDS simplices remain bounded, only finitely many reflections and expansions are possible before every point on that lattice has been visited and a contraction to a new maximal simplex size must take place. This exhaustion of a lattice takes place under more general conditions [22], but is most clear for the equilateral case.

The point of Lemma 3 is that infinitely many contractions and convergence of the simplex diameters to zero imply convergence of the simplex gradient to zero.

Lemma 3. Let $S$ be an ordered simplex such that (7) holds. Let $f$ satisfy (1), and let $\nabla f_s$ be Lipschitz continuous in a ball $B$ of radius $2 \sigma_+(S)$ about $x_1$. Assume that
\[
f(x_1) \le \min_j f(r_j). \tag{8}
\]
Then, if $K$ is the constant from Lemma 2,
\[
\|\nabla f_s(x_1)\| \le 3 N K \kappa_+^3 \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right).
\]


Proof. Let $R$, the unordered reflected simplex, have vertices $x_1$ and $\{r_j\}$. The ordering of $S$ and (8) imply that each component of $\delta(f : S)$ and $\delta(f : R)$ is nonnegative. Now, since
\[
V = V(S) = -V(R),
\]
we must have
\[
0 \le \delta(f : S)^T \delta(f : R) = \left( V^T D(f : S) \right)^T \left( V(R)^T D(f : R) \right) = -D(f : S)^T V V^T D(f : R).
\]
We apply Lemma 2 to both $D(f : S)$ and $D(f : R)$ to obtain
\[
D(f : S) = \nabla f_s(x_1) + E_1 \quad \text{and} \quad D(f : R) = \nabla f_s(x_1) + E_2,
\]
where, since $\kappa(V(R)) = \kappa(V) \le \kappa_+$ and $\sigma_+(R) = \sigma_+(S)$,
\[
\|E_k\| \le K \kappa_+ \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right) \quad \text{for } k = 1, 2.
\]
Since $\|V\| \le \sqrt{N}\, \sigma_+(S)$, we have, by the inequality above,
\[
\nabla f_s(x_1)^T V V^T \nabla f_s(x_1) \le N \sigma_+(S)^2 \left( \|\nabla f_s(x_1)\| \left( \|E_1\| + \|E_2\| \right) + \|E_1\| \|E_2\| \right).
\]

The assumptions of the lemma give a lower estimate of the left side: by (7), and because $V V^T$ and $V^T V$ have the same eigenvalues,
\[
w^T V V^T w \ge \kappa_+^{-2} \sigma_+(S)^2 \|w\|^2 \quad \text{for all } w.
\]
Hence
\[
\|\nabla f_s(x_1)\|^2 \le 2 B \|\nabla f_s(x_1)\| + C,
\]
where, using the bound on the $E_k$,
\[
B = N K \kappa_+^3 \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right)
\]
and
\[
C = N K^2 \kappa_+^4 \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right)^2.
\]
So $\|\nabla f_s(x_1)\|^2 - 2 B \|\nabla f_s(x_1)\| - C \le 0$, and the quadratic formula then implies that
\[
\|\nabla f_s(x_1)\| \le B + \sqrt{B^2 + C} \le 2 B + \sqrt{C} \le 3 N K \kappa_+^3 \left( \sigma_+(S) + \frac{\|\phi\|_B}{\sigma_+(S)} \right),
\]
as asserted.

The similarity of Lemma 3 to Lemma 2, and of Theorem 4 (the convergence result for multidirectional search) to Theorem 2, is no accident. The Nelder–Mead iteration, which is more aggressive than the multidirectional search iteration, requires far stronger assumptions (well conditioning and sufficient decrease) for convergence, but the ideas are the same. Lemma 3 and Theorem 4 extend the results in [21] to the noisy case. The observation in [8], that one can apply any heuristic or machine-dependent idea to improve performance (say, by exploring far-away points on spare processors via speculative function evaluations) without affecting the analysis, is still valid here.

Theorem 4. Let $f$ satisfy (1) and assume that the set
\[
\{x \mid f(x) \le f(x_1^0)\}
\]
is bounded. Assume that the simplex shape is such that (7) holds and that
\[
\lim_{k \to \infty} \sigma_+(S^k) = 0.
\]
Let $B^k$ be a ball of radius $2 \sigma_+(S^k)$ about $x_1^k$. Then if
\[
\lim_{k \to \infty} \frac{\|\phi\|_{B^k}}{\sigma_+(S^k)} = 0,
\]
every limit point of the vertices is a critical point of $f_s$.


Recall that if the simplices are equilateral or right simplices, then (7) holds.
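This is easy to check numerically: reflection, expansion, and contraction all produce simplices similar to the original, so the worst-case condition number over the possible vertex orderings is invariant. A quick sketch (kappa_plus is an illustrative helper of ours, not from the paper):

```python
import numpy as np

def kappa_plus(S):
    """Largest l2 condition number of V(S) over the N+1 choices of base vertex."""
    S = np.asarray(S, dtype=float)
    return max(np.linalg.cond(np.delete(S, i, axis=0) - S[i]) for i in range(len(S)))

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # right simplex in R^2
R = np.vstack([S[0], S[0] - (S[1:] - S[0])])          # reflection
E = np.vstack([S[0], S[0] - 2.0 * (S[1:] - S[0])])    # expansion, mu_e = 2
C = np.vstack([S[0], S[0] + 0.5 * (S[1:] - S[0])])    # contraction, mu_c = 1/2
print([round(kappa_plus(T), 3) for T in (S, R, E, C)])  # four identical values
```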

The more general class of pattern search algorithms studied in [22] can also be analyzed in this way, and we plan to do that in future work.

References

[1] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16 (1966), pp. 1–3.

[2] D. P. Bertsekas, On the Goldstein–Levitin–Polyak gradient projection method, IEEE Trans. Autom. Control, 21 (1976), pp. 174–184.

[3] C. G. Broyden, Quasi-Newton methods and their application to function minimization, Math. Comp., 21 (1967), pp. 368–381.

[4] R. H. Byrd, R. B. Schnabel, and G. A. Schultz, Parallel quasi-Newton methods for unconstrained optimization, Math. Prog., 42 (1988), pp. 273–306.

[5] J. W. David, C. Y. Cheng, T. D. Choi, C. T. Kelley, and J. Gablonsky, Optimal design of high speed mechanical systems, Tech. Rep., Center for Research in Scientific Computation, North Carolina State University; Mathematical Modeling and Scientific Computing, to appear.

[6] J. W. David, C. T. Kelley, and C. Y. Cheng, Use of an implicit filtering algorithm for mechanical system parameter identification, SAE Paper, SAE International Congress and Exposition Conference Proceedings: Modeling of CI and SI Engines.

[7] J. E. Dennis and R. B. Schnabel, Numerical Methods for Nonlinear Equations and Unconstrained Optimization, no. 16 in Classics in Applied Mathematics, SIAM, Philadelphia, 1996.

[8] J. E. Dennis and V. Torczon, Direct search methods on parallel machines, SIAM J. Optim., 1 (1991), pp. 448–474.

[9] A. V. Fiacco and G. P. McCormick, Nonlinear Programming, John Wiley and Sons, New York, 1968.

[10] S. J. Fortune, D. M. Gay, B. W. Kernighan, O. Landron, R. A. Valenzuela, and M. H. Wright, WISE design of indoor wireless systems, IEEE Computational Science and Engineering, Spring 1995.

[11] P. Gilmore, An Algorithm for Optimizing Functions with Multiple Minima, Ph.D. thesis, North Carolina State University, Raleigh, North Carolina.

[12] P. Gilmore and C. T. Kelley, An implicit filtering algorithm for optimization of functions with many local minima, SIAM J. Optim., 5 (1995), pp. 269–285.

[13] P. Gilmore, C. T. Kelley, C. T. Miller, and G. A. Williams, Implicit filtering and optimal design problems, in Optimal Design and Control (Proceedings of the Workshop on Optimal Design and Control, Blacksburg, VA, April 1994), J. Borggaard, J. Burkhardt, M. Gunzburger, and J. Peterson, eds., Progress in Systems and Control Theory, Birkhäuser, Boston.

[14] C. T. Kelley, Detection and remediation of stagnation in the Nelder–Mead algorithm using a sufficient decrease condition, Tech. Rep., Center for Research in Scientific Computation, North Carolina State University; submitted for publication.

[15] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, Convergence properties of the Nelder–Mead simplex algorithm in low dimensions, Tech. Rep., AT&T Bell Laboratories.

[16] D. Q. Mayne and E. Polak, Nondifferential optimization via adaptive smoothing, J. Optim. Theory Appl.

[17] K. I. M. McKinnon, Convergence of the Nelder–Mead simplex method to a nonstationary point, tech. rep., Department of Mathematics and Statistics, University of Edinburgh, Edinburgh.

[18] J. A. Nelder and R. Mead, A simplex method for function minimization, Comput. J., 7 (1965), pp. 308–313.

[19] D. Stoneking, G. Bilbro, R. Trew, P. Gilmore, and C. T. Kelley, Yield optimization using a GaAs process simulator coupled to a physical device model, IEEE Transactions on Microwave Theory and Techniques.

[20] D. E. Stoneking, G. L. Bilbro, R. J. Trew, P. Gilmore, and C. T. Kelley, Yield optimization using a GaAs process simulator coupled to a physical device model, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE.

[21] V. Torczon, On the convergence of the multidirectional search algorithm, SIAM J. Optim., 1 (1991), pp. 123–145.

[22] V. Torczon, On the convergence of pattern search algorithms, SIAM J. Optim., 7 (1997), pp. 1–25.

[23] T. A. Winslow, R. J. Trew, P. Gilmore, and C. T. Kelley, Doping profiles for optimum class B performance of GaAs MESFET amplifiers, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE.

[24] T. A. Winslow, R. J. Trew, P. Gilmore, and C. T. Kelley, Simulated performance optimization of GaAs MESFET amplifiers, in Proceedings IEEE/Cornell Conference on Advanced Concepts in High Speed Devices and Circuits, IEEE.

[25] S. K. Zavriev, On the global optimization properties of finite-difference local descent algorithms, J. Global Optimization.