Chapter

Introduction

There is scarcely a modern journal, whether of engineering, economics, management, mathematics, physics, or the social sciences, in which the concept of optimization is missing from the subject index. If one abstracts from all specialist points of view, the recurring problem is to select a better or best (Leibniz eventually introduced the term optimal) alternative from among a number of possible states of affairs. However, if one were to follow the hypothesis of Leibniz, as presented in his Theodicee, that our world is the best of all possible worlds, one could justifiably sink into passive fatalism: there would be nothing to improve or to optimize.

Biology, especially since Darwin, has replaced the static world picture of Leibniz' time by a dynamic one: that of the more or less gradual development of the species, culminating in the appearance of man. Paleontology is providing an increasingly complete picture of organic evolution. So-called missing links repeatedly turn out to be not missing, but rather hitherto undiscovered, stages of this process. Very much older than the recognition that man is the result (or better, an intermediate state) of a meliorization process is the seldom-questioned assumption that he is a perfect end product, the pinnacle of creation. Furthermore, long before man conceived of himself as an active participant in the development of things, he had unconsciously influenced this evolution. There can be no doubt that his ability and efforts to make the environment meet his needs raised him above other forms of life and have enabled him, despite physical inferiority, to find, to hold, and to extend his place in the world, so far at least. As long as mankind has existed on our planet, spaceship Earth, we, together with other species, have mutually influenced and changed our environment. Has this always been done in the sense of meliorization?

The French philosopher Voltaire, dissatisfied with the conditions of his age, was already taking up arms against Leibniz' philosophical optimism and calling for conscious effort to change the state of affairs. In the same way today, when we optimize, we find that we are both the subject and the object of the history of development. In the desire to improve an object, a process, or a system, Wilde and Beightler see an expression of the human striving for perfection. Whether such a lofty goal can be attained depends on many conditions.

It is not possible to optimize when there is only one way to carry out a task; then one has no alternative. If it is not even known whether the problem at hand is soluble, the situation calls for an invention or discovery, and not, at that stage, for any optimization. But wherever two or more solutions exist and one must decide upon one of them, one should choose the best, that is to say, optimize. Those independent features that distinguish the results from one another are called independent variables, or parameters, of the object or system under consideration; they may be represented as binary, integer (otherwise discrete), or real values. A rational decision between the real or imagined variants presupposes a value judgement, which requires a scale of values: a quantitative criterion of merit according to which one solution can be classified as better, another as worse. This dependent variable is usually called an objective function, because it depends on the objective of the system, the goal to be attained with it, and is functionally related to the parameters. There may even exist several objectives at the same time, the normal case in living systems, where the mix of objectives also changes over time and may in fact be induced by the actual course of the evolutionary paths themselves.

Sometimes the hardest part of optimization is to define an objective function clearly. For instance, if several subgoals are aimed at, a relative weight must be attached to each of the individual criteria. If these are contradictory, one can only hope to find a compromise on a trade-off subset of non-dominated solutions. Variability and a distinct order of merit are the unavoidable conditions of any optimization. One may sometimes also think one has found the right objective for a subsystem, only to realize later that in doing so one has provoked unwanted side effects, the ramifications of which have worsened the disregarded total objective function. We are just now experiencing how narrow-minded scales of value can steer us into dangerous plights, and how it is sometimes necessary to consider the whole Earth as a system, even if this is where differences of opinion about value criteria are the greatest.

The second difficulty in optimization, particularly of multiparameter objectives or processes, lies in the choice or design of a suitable strategy for proceeding. Even when the objective has been sufficiently clearly defined, indeed even when its functional dependence on the independent variables has been mathematically or computationally formulated, it often remains difficult enough, if not completely impossible, to find the optimum, especially in the time available.

The uninitiated often think that it must be an easy matter to solve a problem expressed in the language of mathematics, that most exact of all sciences. Far from it! The problem of how to solve problems is unsolved, and mathematicians have been working on it for centuries. For exact answers to questions of extremal values and corresponding positions or conditions, we are indebted, for example, to the differential and variational calculus, whose development in the 17th and 18th centuries is associated with such illustrious names as Newton, Euler, Lagrange, and Bernoulli. These constitute the foundations of the present methods referred to as classical, and of the further developments in the theory of optimization. Still, there is often a long way from the theory, which is concerned with establishing necessary and sufficient conditions for the existence of minima and maxima, to the practice: the determination of these most desirable conditions. Practically significant solutions of optimization problems first became possible with the arrival of large and fast programmable computers in the mid-20th century. Since then the flood of publications on the subject of optimization has been steadily rising in volume; it is a simple matter to collect several thousand published articles about optimization methods.

Even an interested party finds it difficult to keep pace nowadays with the development that is going on, and the development seems far from being over, for there still exists no all-embracing theory of optimization, nor is there any universal method of solution. Thus it is appropriate to begin with a general survey of optimization problems and methods. The special role of direct, static, non-discrete, and non-stochastic parameter optimization emerges here, for many of these methods can be transferred to other fields; the converse is less often possible. Some of these strategies are then presented in more depth, principally those that extract the information they require only from values of the objective function, that is to say, without recourse to analytic partial derivatives (derivative-free methods). Methods of a probabilistic nature are omitted there.

Methods which use chance as an aid to decision making are treated separately in a later chapter. In numerical optimization, chance is simulated deterministically by means of a pseudorandom number generator, able to produce only some kind of deterministic chaos. One of the random strategies proves to be extremely promising: it imitates, in a highly simplified manner, the mutation-selection game of nature. This concept, a two-membered evolution strategy, is formulated there in a manner suitable for numerical optimization. Following the hypothesis put forward by Rechenberg that biological evolution is, or possesses, an especially advantageous optimization process and is therefore worthy of imitation, an extended, multimembered scheme that imitates the population principle of evolution is introduced as well. It permits a more natural, as well as more effective, specification of the step lengths than the two-membered scheme, and actually invites the addition of further evolutionary principles, such as, for example, sexual propagation and recombination. An approximate theory of the rate of convergence can also be set up for the evolution strategy in which only the best of the descendants of a generation become parents of the following one.

A short excursion, new to this edition, introduces nearly concurrent developments that the author was unaware of when compiling his dissertation in the early 1970s, i.e., genetic algorithms and tabu search.

A later chapter then makes a comparison of the evolution strategies with the direct search methods of zero, first, and second order that are treated in detail beforehand. Since the predictive power of theoretical proofs of convergence and statements of rates of convergence is limited to simple problem structures, the comparison mainly comprises numerical tests employing various model objective functions. The results are evaluated from two points of view:

- Efficiency, or speed of approach to the objective
- Effectivity, or reliability under varying conditions

The evolution strategies are highly successful in the test of effectivity, or robustness. Contrary to the widely held opinion that biological evolution is a very wasteful method of optimization, the convergence-rate test shows that, in this respect too, the evolution methods can hold their own and are sometimes even more efficient than many purely deterministic methods. The circle is closed in the final chapter, where the analogy between iterative optimization and evolution is raised once again for discussion, with a look at some natural improvements and extensions of the concept of the evolution strategy.

The list of test problems that were used can be found in Appendix A, and FORTRAN codes of the evolution strategies, with detailed guidance for users, are in Appendix B. Finally, Appendix C explains how to use the C and FORTRAN programs on the floppy disk.

Chapter

Problems and Methods of Optimization

General Statement of the Problems

According to whether one emphasizes the theoretical aspect (existence conditions of optimal solutions) or the practical procedures for reaching optima, optimization nowadays is classified as a branch of applied or numerical mathematics, of operations research, or of computer-assisted systems engineering (design). In fact, many optimization methods are based on principles which were developed in linear and non-linear algebra. Whereas for equations or systems of equations the problem is to determine a quantity or set of quantities such that functions which depend on them have specified values, in the case of an optimization problem an initially unknown extremal value is sought. Many of the current methods of solution of systems of linear equations start with an approximation and successively improve it by minimizing the deviation from the required value. For non-linear equations, and for incomplete or overdetermined systems, this way of proceeding is actually essential (Ortega and Rheinboldt). Thus many seemingly quite different and apparently unrelated problems turn out, after a suitable reformulation, to be optimization problems.

Into this class come, for example, the solution of differential equations (boundary and initial value problems) and eigenvalue problems, as well as problems of observational calculus, adaptation, and approximation (Stiefel; Schwarz, Rutishauser, and Stiefel; Collatz and Wetterling). In the first case the basic problem again is to solve equations; in the second, the problem is often reduced to minimizing deviations in the Gaussian sense (sum of squares of residues) or the Tchebycheff sense (maximum of the absolute residues). Even game theory (Vogelsang) and pattern or shape recognition, as a branch of information theory (Andrews; Niemann), have features in common with the theory of optimization. In one case, from among a stored set of idealized types, a pattern will be sought that has the maximum similarity to the one presented; in another case the search will be for optimal courses of action in conflict situations. Here two or more interests are competing: each player tries to maximize his chance of winning with regard to the way in which his opponent supposedly plays. Most optimization problems, however, are characterized by a single interest: to reach an objective that is not influenced by others.

The engineering aspect of optimization has manifested itself especially clearly in the design of learning robots, which have to adapt their operation to the prevailing conditions (see, for example, Feldbaum; Zypkin). The feedback between the environment and the behavior of the robot is effected here by a program, a strategy, which can perhaps even alter itself. Wiener goes even further and considers self-reproducing machines, thus arriving at a consideration of robots that are similar to living beings. Computers are often regarded as the most highly developed robots, and it is therefore tempting to make comparisons with the human brain and its neurons and synapses (von Neumann; Marfeld; Steinbuch). They are nowadays the most important aid to optimization, and many problems are intractable without them.

Particular Problems and Methods of Solution

The lack of a universal method of optimization has led to the present availability of numerous procedures, each of which has only limited application to special cases. No attempt will be made here to list them all. A short survey should help to distinguish the parameter optimization strategies treated in detail later from the other procedures, while at the same time exhibiting some features they have in common. The chosen scheme of presentation is to discuss two opposing concepts together.

Experimental Versus Mathematical Optimization

If the functional relation between the variables and the objective function is unknown, one is forced to experiment, either on the real object or on a scale model. To do so one must be as free as possible to vary the independent variables, and one must have access to measuring instruments with which the dependent variable, the quality, can be measured. Systematic investigation of all possible states of the system will be too costly if there are many variables, and random sampling of various combinations is too unreliable for achieving the desired result. A procedure must be significantly more effective if it systematically exploits whatever information is retained about preceding attempts. Such a plan is also called a strategy. The concept originated in game theory and was formulated by von Neumann and Morgenstern.

Many of the search strategies of mathematical optimization, to be discussed later, were also applied under experimental conditions, not always successfully. An important characteristic of the experiment is the unavoidable effect of stochastic disturbances on the measured results. A good experimental optimization strategy has to take account of this fact and approach the desired extremum with the least possible cost in attempts. Two methods in particular are most frequently mentioned in this connection:

- The EVOP (evolutionary operation) method proposed by G. E. P. Box, a development of the experimental gradient method of Box and Wilson
- The strategy of artificial evolution designed by Rechenberg


The algorithm of Rechenberg's evolution strategy will be treated in detail in a later chapter. In the experimental field it has often been applied successfully, for example to the solution of multiparameter shaping problems (Rechenberg; Schwefel; Klockgether and Schwefel). All variables are changed simultaneously by a small random amount. The changes are binomially or normally distributed, and the expected value of the random vector is zero for all components. Failures leave the starting condition unchanged; only successes are adopted. Stochastic disturbances, or perturbations brought about by errors of measurement, do not affect the reliability, but they influence the speed of convergence according to their magnitude. Rechenberg gives rules for the optimal choice of a common variance of the probability density distribution of the random changes, for both the unperturbed and the perturbed cases.
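As a minimal sketch, the mutation-selection rule just described might look as follows; the sphere objective, the step size, and the iteration count are illustrative assumptions, not values from the text, and no perturbations of the measurements are simulated.

```python
import random

def two_membered_es(f, x, sigma, iterations, seed=1):
    """Sketch of a two-membered evolution strategy: all variables are
    changed simultaneously by zero-mean, normally distributed random
    amounts; failures leave the starting condition unchanged, and only
    successes are adopted (minimization assumed)."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iterations):
        # mutate every component at once
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        fy = f(y)
        if fy <= fx:          # success: adopt the mutant as new parent
            x, fx = y, fy
    return x, fx

# illustrative objective (assumed): sphere model with minimum at the origin
sphere = lambda v: sum(vi * vi for vi in v)
best, val = two_membered_es(sphere, [1.0, -1.0, 0.5], sigma=0.1, iterations=2000)
```

Because only successes are adopted, the objective value can never worsen from one iteration to the next; with a fixed variance the search stalls near the optimum, which is why the rules for choosing the variance mentioned above matter.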

The EVOP method of G. E. P. Box changes only two or three parameters at a time, if possible those which have the strongest influence. A square or cube is constructed with an initial condition at its midpoint; its 2^2 = 4 or 2^3 = 8 corners represent the points in a cycle of trials. These deterministically established states are tested sequentially, several times over if perturbations are acting. The state with the best result becomes the midpoint of the next pattern of points. Under some conditions one also changes the scaling of the variables, or exchanges one or more parameters for others. Details of this altogether heuristic way of proceeding are described by Box and Draper. The method has mainly been applied to the dynamic optimization of chemical processes, with experiments performed on the real system, sometimes over a period of several years.
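The cycle of trials described above can be sketched roughly as follows; the quadratic "process quality", the choice of two factors, and the step width are invented for illustration, and a real EVOP application would repeat each measurement several times to average out perturbations.

```python
from itertools import product

def evop_cycle(f, center, delta):
    """One EVOP cycle sketch for two factors: evaluate the current
    midpoint and the 2^2 corners of a square around it; the state with
    the best (here: lowest) result becomes the next midpoint."""
    candidates = [tuple(center)]
    for signs in product((-1, 1), repeat=len(center)):
        candidates.append(tuple(c + s * delta for c, s in zip(center, signs)))
    return min(candidates, key=f)

# illustrative quality function to be minimized (assumed), optimum at (1, -0.5)
quality = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 0.5) ** 2

mid = (0.0, 0.0)
for _ in range(20):               # repeated cycles of trials
    mid = evop_cycle(quality, mid, delta=0.125)
```

Since the midpoint itself stays among the candidates, the pattern stops moving once no corner improves on it; rescaling the variables (changing `delta`), as mentioned above, would then refine the result further.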

The counterpart to experimental optimization is mathematical optimization: the functional relation between the criterion of merit, or quality, and the variables is known, at least approximately; to put it another way, a more or less simplified mathematical model of the object, process, or system is available. In place of experiments there appears the manipulation of variables and the objective function. It is sometimes easy to set up a mathematical model, for example if the laws governing the behavior of the physical processes involved are known. If, however, these are largely unresearched, as is often the case for ecological or economic processes, the work of model building can far exceed that of the subsequent optimization.

Depending on what deliberate influence one can have on the process, one is either restricted to the collection of available data, or one can uncover the relationships between independent and dependent variables by judiciously planning and interpreting tests. Such methods (Cochran and Cox; Kempthorne; Davies; Cox; Fisher; Vajda; Yates; John) were first applied only to agricultural problems, but later spread into industry. Since the analyst is intent on building the best possible model with the fewest possible tests, such an analysis itself constitutes an optimization problem, just as does the synthesis that follows it. Wald therefore recommends proceeding sequentially: that is, to construct a model as a hypothesis from initial experiments or given a priori information, and then to improve it in a stepwise fashion by further series of tests, or, alternatively, to sometimes reject the model completely. The fitting of model parameters to the measured data can itself be considered as an optimization problem, insofar as the expected error or maximum risk is to be minimized. This is a special case of optimization, called calculus of observations, which involves statistical tests like regression and variance analyses on data subject to errors, for which the principle of maximum likelihood or minimum chi-square is used (see Heinhold and Gaede).

The cost of constructing a model of large systems with many variables, or of very complicated objects, can become so enormous that it is preferable to get to the desired optimal condition by direct variation of the parameters of the process, in other words to optimize experimentally. The fact that one tries to analyze the behavior of a model or system at all is founded on the hope of understanding the processes more fully and of being able to solve the synthesis problem in a more general way than is possible in the case of experimental optimization, which is tied to a particular situation.

If one has succeeded in setting up a mathematical model of the system under consideration, then the optimization problem can be expressed mathematically as follows:

    F(x) = F(x_1, x_2, ..., x_n) → extr

The round brackets symbolize the functional relationship between the n independent variables

    x = {x_i | i = 1(1)n}

and the dependent variable F, the quality or objective function. In the following it is always a scalar quantity. The variables can be scalars or functions of one or more parameters. Whether a maximum or a minimum is sought is of no consequence for the method of optimization, because of the relation

    max{F(x)} = -min{-F(x)}

Without loss of generality one can concentrate on one of the two types of problem; usually the minimum problem is considered. Restrictions arise insofar as in many practical problems the variables cannot be chosen arbitrarily; such restrictions are called constraints. The simplest of them are the non-negativity conditions

    x_i ≥ 0, for all i = 1(1)n

More generally, constraints are formulated like the objective function:

    G_j(x) = G_j(x_1, x_2, ..., x_n) ≥ 0 (or = 0), for all j = 1(1)m

The notation chosen here follows the convention of parameter optimization. One distinguishes between equalities and inequalities. Each equality constraint reduces the number of true variables of the problem by one. Inequalities, on the other hand, simply reduce the size of the allowed space of solutions without altering its dimensionality. The sense of an inequality is not critical: like the interchanging of minimum and maximum problems, one can transform one type into the other by reversing the signs of the terms. It is sufficient to limit consideration to one formulation. For minimum problems this is



(The notation i = 1(1)n stands for i = 1, 2, ..., n.)


normally the type G_j(x) ≥ 0. Points on the edge of the closed allowed space are thereby permitted. A different situation arises if the constraint is given as a strict inequality of the form G_j(x) > 0: then the allowed space can be open if G_j(x) is continuous. If for G_j(x) ≥ 0, with other conditions the same, the minimum lies on the border G_j(x) = 0, then for G_j(x) > 0 there is no true minimum. One refers here to an infimum, the greatest lower bound, at which actually G_j(x) = 0. In an analogous fashion one distinguishes between maxima and suprema (smallest upper bounds). Optimization in the following always means to find a maximum or a minimum, perhaps under inequality constraints.
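To make the notation concrete, a small sketch: an invented objective F is minimized over a coarse grid of candidate points, keeping only those that satisfy all inequality constraints G_j(x) ≥ 0. The particular F, the constraints, and the grid are assumptions for illustration only.

```python
def feasible(x, constraints):
    """A point is allowed if every inequality constraint G_j(x) >= 0 holds;
    points on the border G_j(x) = 0 are permitted (closed allowed space)."""
    return all(g(x) >= 0.0 for g in constraints)

# illustrative objective (assumed): quadratic with minimum at x = (2, 0)
F = lambda x: (x[0] - 2.0) ** 2 + x[1] ** 2

# the non-negativity conditions written as constraints G_j(x) >= 0
G = [lambda x: x[0], lambda x: x[1]]

# maximizing F would be the same problem as minimizing -F:
#   max{F(x)} = -min{-F(x)}
grid = [(0.5 * i, 0.5 * j) for i in range(-4, 9) for j in range(-4, 9)]
best = min((p for p in grid if feasible(p, G)), key=F)
```

Here the unconstrained minimizer happens to lie inside the allowed space; had it lain outside, the best feasible point would sit on a border G_j(x) = 0, as discussed above.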

Static Versus Dynamic Optimization

The term static optimization means that the optimum is time invariant, or stationary. It is sufficient to determine its position and size once and for all; once the location of the extremum has been found, the search is over. In many cases one cannot control all the variables that influence the objective function. Then it can happen that these uncontrollable variables change with time and displace the optimum (non-stationary case).

The goal of dynamic optimization is therefore to maintain an optimal condition in the face of varying conditions of the environment. The search for the extremum becomes a more or less continuous process. According to the speed of movement of the optimum, it may be necessary, instead of making the slow adjustment of the independent variables by hand (as, for example, in the EVOP method described above), to give the task to a robot or automaton.

The automaton and the process together form a control loop. However, unlike conventional control loops, this one is not required to maintain a desired value of a quantity, but to discover the most favorable value of an unknown and time-dependent quantity. Feldbaum, Frankovic et al., and Zach investigate in detail such automatic optimization systems, known as extreme value controllers or optimizers. In each case they are built around a search process. For only one variable (adjustable setting) a variety of schemes can be designed. It is significantly more complicated for an optimal value loop when several parameters have to be adjusted.

Many of the search methods are so very costly because there is no a priori information about the process to be controlled. Hence nowadays one tries to build adaptive control systems that use information gathered over a period of time to set up an internal model of the system, or that, in a sense, learn. Oldenburger and, in more detail, Zypkin tackle the problems of learning and self-learning robots. Adaptation is said to take place if the change in the control characteristics is made on the basis of measurements of those input quantities to the process that cannot be altered, also known as disturbing variables. If the output quantities themselves (here the objective function) are used to adjust the control system, the process is called self-learning, or self-adaptation. The latter possibility is more reliable, but, because of the time lag, slower. Cybernetic engineering is concerned with learning processes in a more general form, and always sees, or even seeks, links with natural analogues.

An example of a robot that adapts itself to the environment is the homeostat of Ashby.



(Some authors use the term dynamic optimization in a different way than is done here.)


Nowadays, however, one does not build one's own optimizer every time there is a given problem to be solved. Rather, one makes use of so-called process computers, which for a new task only need another special program. They can handle large and complicated problems, and they are coupled to the process by sensors and transducers in a closed loop, on-line (Levine and Vilis; McGrew and Haimes). The actual computer usually works digitally, so that analogue-digital and digital-analogue converters are required for input and output. Process computers are employed for keeping process quantities constant and for maintaining required profiles, as well as for optimization. In the latter case an internal model (a computer program) usually serves to determine the optimal process parameters, taking account of the latest measured data values in the calculation.

If the position of the optimum in a dynamic process is shifting very rapidly, the manner in which the search process follows the extremum takes on a greater significance for the overall quality. In this case one has to go about setting up a dynamic model and specifying all variables, including the controllable ones, as functions of time. The original parameter optimization goes over to functional optimization.

Parameter Versus Functional Optimization

The case when not only the objective function but also the independent variables are scalar quantities is called parameter optimization. Numerical values

    {x_i | i = 1(1)n}

of the variables, or parameters, are sought for which the value of the objective function

    F* = F(x*) = extr{F(x)}

is an optimum. The number of parameters describing a state of the object or system is finite. In the simplest case of only one variable (n = 1), the behavior of the objective function is readily visualized on a diagram with two orthogonal axes: the value of the parameter is plotted on the abscissa and that of the objective function on the ordinate, and the functional dependence appears as a curve. For n = 2 a three-dimensional Cartesian coordinate system is required. The state of the system is represented as a point in the horizontal plane, and the value of the objective function as the vertical height above it. A mountain range is obtained, the surface of which expresses the relation of dependent to independent variables. To simplify the representation further, the curves of intersection between the mountain range and parallel horizontal planes are projected onto the base plane, which provides a contour diagram of the objective function. From this three-dimensional picture and its two-dimensional projection, concepts like peak, plateau, valley, ridge, and contour line are readily transferred to the n-dimensional case, which is otherwise beyond our powers of description and visualization.

In functional optimization, instead of optimal points in three-dimensional Euclidean space, optimal trajectories in function spaces, such as Banach or Hilbert space, are to be determined. Thus one also refers to infinite-dimensional optimization, as opposed to the finite-dimensional parameter optimization. Since the variables to be determined are themselves functions of one or more parameters, the objective function is a function of a function, that is, a functional.

A classical problem is to determine the smooth curve down which a point mass will slide between two points in the shortest time, acted upon by the force of gravity and without friction. Known as the brachistochrone problem, it can be solved by means of the ordinary variational calculus (Courant and Hilbert; Denn; Clegg). If the functions to be determined depend on several variables, it is a multidimensional variational problem (Klotzler). In many cases the time t appears as the only parameter. The objective function is commonly an integral, in the integrand of which appear not only the independent variables

    x(t) = {x_1(t), x_2(t), ..., x_n(t)}

but also their derivatives x_i'(t), and sometimes also the parameter t itself:

    F(x(t)) = ∫ from t_0 to t_1 of f(x(t), x'(t), t) dt → extr

Such problems are typical in control theory, where one has to find optimal controlling functions for control processes (e.g., Chang; Lee; Leitmann; Hestenes; Balakrishnan and Neustadt; Karreman; Demyanov and Rubinov).
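Numerically, such a functional can only be handled after discretization. The sketch below approximates the integral above for a given trajectory by sampling it at finitely many points, with finite differences standing in for x'(t) and the trapezoidal rule for the integral; the arc-length integrand and the straight-line test trajectory are illustrative assumptions.

```python
import math

def functional(x, t0, t1, n, f):
    """Approximate the functional F[x] = integral from t0 to t1 of
    f(x(t), x'(t), t) dt: sample the trajectory at n+1 points, estimate
    x'(t) by finite differences, and integrate by the trapezoidal rule."""
    h = (t1 - t0) / n
    ts = [t0 + i * h for i in range(n + 1)]
    xs = [x(t) for t in ts]
    # one-sided differences at the ends, central differences inside
    dxs = [(xs[1] - xs[0]) / h] + \
          [(xs[i + 1] - xs[i - 1]) / (2 * h) for i in range(1, n)] + \
          [(xs[n] - xs[n - 1]) / h]
    vals = [f(xs[i], dxs[i], ts[i]) for i in range(n + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:n]) + 0.5 * vals[n])

# illustrative check: the arc-length functional of the straight line
# x(t) = t on [0, 1] should come out close to sqrt(2)
arc = lambda x, dx, t: math.sqrt(1.0 + dx * dx)
L = functional(lambda t: t, 0.0, 1.0, 100, arc)
```

In this discrete form a trajectory is just a finite vector of sample values, which is exactly how functional optimization problems are reduced to (large) parameter optimization problems on a digital computer.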

Whereas the variational calculus and its extensions provide the mathematical basis of functional optimization (in the language of control engineering: optimization with distributed parameters), parameter optimization (with localized parameters) is based on the theory of maxima and minima from the elementary differential calculus. Consequently, both branches have followed independent paths of development and have become almost separate disciplines. The functional analysis theory of Dubovitskii and Milyutin (see Girsanov) has bridged the gap between the problems by allowing them to be treated as special cases of one fundamental problem, and it could thus lead to a general theory of optimization. However different their theoretical bases, in cases of practical significance the problems must be solved on a computer, and the iterative methods employed are then broadly the same.

One of these iterative methods is the dynamic programming, or stepwise optimization, of Bellman. It was originally conceived for the solution of economic problems, in which time-dependent variables are changed in a stepwise way at fixed points in time. The method is a discrete form of functional optimization, in which the trajectory sought appears as a step-like function. At each step a decision is taken; the sequence of decisions is called a policy. Assuming that the state at a given step depends only on the decision at that step and on the preceding state (i.e., there is no feedback), dynamic programming can be applied. The Bellman optimum principle implies that each piece of the optimal trajectory that includes the end point is also optimal. Thus one begins by optimizing the final decision, at the transition from the last-but-one to the last step. Nowadays dynamic programming is frequently applied to solving discrete problems of optimal control and regulation (Gessner and Spremann; Lerner and Rosenman). Its advantage compared to other analytic methods is that its algorithm can be formulated as a program suitable for digital computers, allowing fairly large problems to be tackled (Gessner and Wacker). Bellman's optimum principle can, however, also be expressed in differential form and applied to an area of continuous functional optimization (Jacobson and Mayne).

The principle of stepwise optimization can be applied to problems of parameter optimization if the objective function is separable (Hadley); that is, it must be expressible as a sum of partial objective functions in which just one or a very few variables appear at a time. The number of steps $k$ corresponds to the number of the partial functions; at each step a decision is made only on the variables in the partial objective. They are also called control or decision variables. Subsidiary conditions (number $m$) in the form of inequalities can be taken into account. The constraint functions, like the variables, are allowed to take a finite number $b$ of discrete values and are called state variables. The recursion formula for the stepwise optimization will not be discussed here. Only the number of required operations $N$ in the calculation will be mentioned, which is of the order

$$N \sim k \cdot b^m$$

For this reason the usefulness of dynamic programming is mainly restricted to the case $k = n$ and $m = 1$. Then at each of the $n$ steps just one control variable is specified with respect to one subsidiary condition. In the other limiting case, where all variables have to be determined at one step (the normal case of parameter optimization), the process goes over to a grid method (complete enumeration) with a computational requirement of order $O(b^{nm})$. Herein lies its capability for locating global optima, even of complicated multimodal objective functions. However, it is only especially advantageous if the structure of the objective function permits the enumeration to be limited to a small part of the allowed region.
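The Bellman recursion for a separable objective can be sketched in a few lines. The toy resource-allocation problem below is my own illustrative choice, not one from the text: $k$ stages share an integer budget, the remaining budget is the state carried between stages, and one variable is decided per stage, so the cost grows only polynomially with the table size rather than exponentially with $n$.

```python
# Stepwise (dynamic programming) minimization of a separable objective:
# minimize sum_i costs[i](x_i) subject to sum_i x_i == budget, x_i >= 0.
# The state is the budget still available; a backward pass fills the
# Bellman tables, a forward pass recovers the optimal policy.

def dp_allocate(costs, budget):
    k = len(costs)
    INF = float("inf")
    # best[s] = minimal cost of the stages processed so far when s units
    # remain; before the last stage only s == 0 is feasible.
    best = [0.0] + [INF] * budget
    choice = {}
    for i in range(k - 1, -1, -1):
        new, ch = [INF] * (budget + 1), [0] * (budget + 1)
        for s in range(budget + 1):
            for x in range(s + 1):
                v = costs[i](x) + best[s - x]
                if v < new[s]:
                    new[s], ch[s] = v, x
        best, choice[i] = new, ch
    # forward pass: read off the sequence of optimal decisions (the policy)
    s, policy = budget, []
    for i in range(k):
        policy.append(choice[i][s])
        s -= choice[i][s]
    return best[budget], policy

costs = [lambda x: (x - 1) ** 2, lambda x: (x - 2) ** 2, lambda x: (x - 3) ** 2]
value, policy = dp_allocate(costs, budget=6)   # optimal policy is (1, 2, 3)
```

Each partial objective depends on one decision variable only, which is exactly the separability the text requires.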

Digital computers are poorly suited to solving continuous problems because they cannot operate directly with functions. Numerical integration procedures are possible, but costly. Analogue computers are more suitable because they can directly imitate dynamic processes. Compared to digital computers, however, they have a small numerical range and low accuracy, and are not so easily programmable. Thus sometimes digital and analogue computers are coupled for certain tasks as so-called hybrid computers. With such systems a set of differential equations can be tackled to the same extent as a problem in functional optimization (Volz). The digital computer takes care of the iteration control, while on the analogue computer the differentiation and integration operations are carried out according to the parameters supplied by the digital computer. Korn and Korn and Bekey and Karplus describe the operations involved in trajectory optimization and the solution of differential equations by means of hybrid computers. The fact that random methods are often used for such problems has to do with the computational imprecision of the analogue part, with which deterministic processes usually fail to cope. If requirements for accuracy are very high, however, purely digital computation has to take over, with the consequent greater cost in computation time.

Particular Problems and Methods of Solution

Direct (Numerical) Versus Indirect (Analytic) Optimization

The classification of mathematical methods of optimization into direct and indirect procedures is attributed to Edelbaum. Especially if one has a computer model of a system with which one can perform simulation experiments, the search for a certain set of exogenous parameters to generate excellent results asks for robust direct optimization methods. Direct, or numerical, methods are those that approach the solution in a stepwise manner (iteratively), at each step (hopefully) improving the value of the objective function. If this cannot be guaranteed, a trial and error process results.

An indirect, or analytic, procedure attempts to reach the optimum in a single calculation step, without tests or trials. It is based on the analysis of the special properties of the objective function at the position of the extremum. In the simplest case, parameter optimization without constraints, one proceeds on the assumption that the tangent plane at the optimum is horizontal, i.e., the first partial derivatives of the objective function exist and vanish in $x^*$:

$$\left.\frac{\partial F}{\partial x_i}\right|_{x = x^*} = 0 \quad \text{for all } i = 1, \dots, n$$

This system of equations can be expressed with the so-called Nabla operator $\nabla$ as a single vector equation for the stationary point $x^*$:

$$\nabla F(x^*) = 0$$

Either form transforms the original optimization problem into a problem of solving a set of (perhaps nonlinear) simultaneous equations. If $F(x)$ or one or more of its derivatives are not continuous, there may be extrema that do not satisfy the otherwise necessary conditions. On the other hand, not every point in $\mathbb{R}^n$, the $n$-dimensional space of real variables, that satisfies these conditions need be a minimum; it could also be a maximum or a saddle point. The vanishing of the first derivatives is therefore referred to as a necessary condition for the existence of a local minimum.

To give sufficient conditions requires further processes of differentiation. In fact, differentiations must be carried out until the determinant of the matrix of the second or higher partial derivatives at the point $x^*$ is non-zero. Things remain simple in the case of only one variable, when it is required that the lowest order non-vanishing derivative is positive and of even order. Then, and only then, is there a minimum. If the derivative is negative, $x^*$ represents a maximum. A saddle point exists if the order is odd.

For $n$ variables, at least the $n^2$ second partial derivatives

$$\frac{\partial^2 F(x)}{\partial x_i \, \partial x_j} \quad \text{for all } i, j = 1, \dots, n$$

must exist at the point $x^*$. The determinant of the Hessian matrix $\nabla^2 F(x^*)$ must be positive, as well as the further $n - 1$ principal subdeterminants of this matrix. While MacLaurin had already completely formulated the sufficient conditions for the existence of minima and maxima of one-parameter functions, the corresponding theory for functions of several variables was only completed much later by Scheeffer and Stolz (see also Hancock).
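The two conditions can be checked numerically for a candidate point. The following sketch (my own illustration, using central differences with step sizes chosen only for demonstration) verifies that the gradient vanishes and that the leading principal minors of the Hessian are positive for a simple quadratic objective:

```python
# Numerical check of the necessary condition (vanishing gradient) and
# the sufficient condition (positive leading principal minors of the
# Hessian) at a candidate point, via central differences.

def grad(F, x, h=1e-5):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((F(xp) - F(xm)) / (2 * h))
    return g

def hessian(F, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (F(xpp) - F(xpm) - F(xmp) + F(xmm)) / (4 * h * h)
    return H

F = lambda x: x[0] ** 2 + 2 * x[1] ** 2     # minimum at (0, 0)
g = grad(F, [0.0, 0.0])
H = hessian(F, [0.0, 0.0])                   # approx. [[2, 0], [0, 4]]
minors = [H[0][0], H[0][0] * H[1][1] - H[0][1] * H[1][0]]
```

Here both minors come out positive, confirming a strong local minimum; a saddle point would give a negative second minor.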

Sufficient conditions can only be applied to check a solution that was obtained from the necessary conditions. The analytic path thus always leads the original optimization problem back to the problem of solving a system of simultaneous equations. If the objective function is of second order, one is dealing with a linear system, which can be solved with the aid of one of the usual methods of linear algebra. Even if non-iterative procedures are used, such as the Gaussian elimination algorithm or the matrix decomposition method of Cholesky, this cannot be done with a single-step calculation. Rather, the number of operations grows as $O(n^3)$. With fast digital computers it is certainly a routine matter to solve systems of equations with even thousands of variables; however, the inevitable rounding errors mean that complete accuracy is never achieved (Broyden). One can normally be satisfied with a sufficiently good approximation. Here relaxation methods, which are iterative, show themselves to be comparable or superior; it depends in detail on the structure of the coefficient matrix. Starting from an initial approximation, the error, as measured by the residues of the equations, is minimized. Relaxation procedures are therefore basically optimization methods, but of a special kind, since the value of the objective function at the optimum is known beforehand. This a priori information can be exploited to make savings in the computations, as can the fact that each component of the residue vector must individually go to zero (e.g., Traub; Wilkinson and Reinsch; Hestenes; Hestenes and Stein).
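A minimal relaxation sketch (my own example, Gauss-Seidel sweeps; convergence is assumed here because the chosen matrix is diagonally dominant) shows the optimization character of the procedure: the residual norm is driven towards its known optimal value, zero.

```python
# Gauss-Seidel relaxation: sweep through the equations, each time
# solving the i-th equation for x_i with the other components fixed.
# The residual plays the role of an objective function whose optimal
# value (zero) is known in advance.

def gauss_seidel(A, b, iters=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # diagonally dominant
b = [1.0, 2.0]
x = gauss_seidel(A, b)          # residuals A x - b go to zero
```

Each sweep reduces the residual by a fixed factor, which is why such iterative schemes can beat direct elimination when only a sufficiently good approximation is needed.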

Objective functions having terms or members of higher than second order lead to nonlinear equations as the necessary conditions for the existence of extrema. In this case the stepwise approach to the null position is essential, e.g., with the interpolation method, which was conceived in its original form by Newton (see the chapter on hill climbing strategies). The equations are linearized about the current approximation point; linear relations for the correcting terms are then obtained. In this way a complete system of $n$ linear equations has to be solved at each step of the iteration. Occasionally a more convenient approach is to search for the minimum of the function

$$\tilde F(x) = \sum_{i=1}^{n} \left( \frac{\partial F(x)}{\partial x_i} \right)^2$$

with the help of a direct optimization method. Besides the fact that $\tilde F(x)$ goes to zero not only at the sought-for minimum of $F(x)$ but also at its maxima and saddle points, it can sometimes yield non-zero minima of no interest for the solution of the original problem. Thus it is often preferable not to proceed via the necessary conditions, but to minimize $F(x)$ directly. Only in special cases do indirect methods lead to faster, more elegant solutions than direct methods. Such is, for example, the case if the necessary existence condition for minima with one variable leads to an algebraic equation and sectioning algorithms like the computational scheme of Horner can be used, or if objective functions are in the form of so-called posynomes, for which Duffin, Peterson, and Zener devised geometric programming, an entirely indirect method.
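The pitfall of minimizing the squared gradient can be seen on a one-variable example of my own choosing: for $F(x) = x^3 - 3x$, the derivative vanishes at both the minimum $x = +1$ and the maximum $x = -1$, so the auxiliary function is zero at both, while a direct search on $F$ itself distinguishes them.

```python
# Why minimizing the squared gradient can mislead: for F(x) = x^3 - 3x,
# F'(x) = 3x^2 - 3 vanishes at x = +1 (minimum) AND x = -1 (maximum).
# The auxiliary function Ft = (F')^2 is zero at both points and cannot
# tell them apart.

def F(x):
    return x ** 3 - 3 * x

def Fprime(x):
    return 3 * x ** 2 - 3

def Ft(x):                       # squared first derivative
    return Fprime(x) ** 2

zero_at_min = Ft(1.0)            # 0 at the minimum ...
zero_at_max = Ft(-1.0)           # ... and 0 at the maximum as well

# a crude direct scan on F over [0, 2] heads for the true minimum x = 1
grid = [2.0 * i / 400 for i in range(401)]
xbest = min(grid, key=F)
```

This is exactly the reason the text gives for preferring direct minimization of $F(x)$ over root-finding on the necessary conditions.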

Subsidiary conditions, or constraints, complicate matters. In rare cases equality constraints can be expressed as equations in one variable that can be eliminated from the objective function, or constraints in the form of inequalities can be made superfluous by a transformation of the variables. Otherwise there are the methods of bounded variation and Lagrange multipliers, in addition to penalty functions and the procedures of mathematical programming.

The situation is very similar for functional optimization, except that here the indirect methods are still dominant even today. The variational calculus provides as conditions for optima differential instead of ordinary equations: actually ordinary differential equations (Euler-Lagrange) or partial differential equations (Hamilton-Jacobi). In only a few cases can such a system be solved in a straightforward way for the unknown functions. One must usually resort again to the help of a computer. Whether it is advantageous to use a digital or an analogue computer depends on the problem; it is a matter of speed versus accuracy. A hybrid system often turns out to be especially useful. If, however, the solution cannot be found by a purely analytic route, why not choose from the start the direct procedure, also for functional optimization? In fact, with the increasing complexity of practical problems in numerical optimization this field is becoming more important, as illustrated by the work of Daniel, who takes over methods without derivatives from parameter optimization and applies them to the optimization of functionals. An important point in this is the discretization or parameterization of the originally continuous problem, which can be achieved in at least two ways:

By approximation of the desired functions using a sum of suitable known functions or polynomials, so that only the coefficients of these remain to be determined (Sirisena)

By approximation of the desired functions using step functions or sides of polygons, so that only heights and positions of the connecting points remain to be determined

Recasting a functional into a parameter optimization problem has the great advantage that a digital computer can be used straightaway to find the solution numerically. The disadvantage, that the result only represents a suboptimum, is often not serious in practice, because the assumed values of parameters of the process are themselves not exactly known (Dixon). The experimentally determined numbers are prone to errors or to statistical uncertainties. In any case, large and complicated functional optimization problems cannot be completely solved by the indirect route.
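The first parameterization route can be sketched on a classical toy functional (my own example): minimize $J[y] = \int_0^1 y'(t)^2 \, dt$ with $y(0) = 0$, $y(1) = 1$, whose exact solution is the straight line $y = t$. A one-coefficient trial family reduces the functional to an ordinary parameter problem, and the integral is replaced by a finite sum.

```python
# Ritz-style parameterization of a functional: the trial family
# y_c(t) = t + c * t * (1 - t) satisfies both boundary values, so only
# the coefficient c remains.  The integral is approximated by a
# midpoint sum; the exact optimum is c = 0 (the straight line y = t).

def J(c, steps=1000):
    total = 0.0
    h = 1.0 / steps
    for i in range(steps):
        t = (i + 0.5) * h                      # midpoint of subinterval
        yprime = 1.0 + c * (1.0 - 2.0 * t)     # derivative of y_c(t)
        total += yprime ** 2 * h
    return total

# crude direct search over the single remaining coefficient
cbest = min((c / 100.0 for c in range(-200, 201)), key=J)
```

The result is only a suboptimum of the original functional, in precisely the sense discussed above, but any direct parameter strategy can now be applied to it.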

The direct procedure can either start directly with the functional to be minimized, if the integration over the substituted function can be carried out (Rayleigh-Ritz method), or with the necessary conditions, the differential equations which specify the optimum. In the latter case the integral is replaced by a finite sum of terms (Beveridge and Schechter). In this situation gradient methods are readily applied (Kelley; Klessig and Polak). The detailed way to proceed depends very much on the subsidiary conditions, or constraints, of the problem.

Constrained Versus Unconstrained Optimization

Special techniques have been developed for handling problems of optimization with constraints. In parameter optimization these are the methods of penalty functions and mathematical programming. In the first case a modified objective function $\tilde F(x)$ is set up, which:

For the minimum problem takes the value $\tilde F(x) = \infty$ in the forbidden region, but which remains unchanged in the allowed (feasible) region (barrier method, e.g., used within the evolution strategies)

Only near the boundary, inside the allowed region, yields values different from $F(x)$, and thus keeps the search at a distance from the edge (partial penalty function, e.g., used within Rosenbrock's strategy)

Differs from $F(x)$ over the whole space spanned by the variables (global penalty function)

This last is the most common way of treating constraints in the form of inequalities. The main ideas here are due to Carroll (created response surface technique) and to Fiacco and McCormick (SUMT, sequential unconstrained minimization technique). For the problem

$$F(x) \to \min$$
$$G_j(x) \ge 0 \quad \text{for all } j = 1, \dots, m$$
$$H_k(x) = 0 \quad \text{for all } k = 1, \dots, \ell$$

the penalty function is of the form (with $r > 0$, $v_j > 0$, $w_k > 0$, and $G_j(x) > 0$)

$$\tilde F(x) = F(x) + r \sum_{j=1}^{m} \frac{v_j}{G_j(x)} + \frac{1}{\sqrt{r}} \sum_{k=1}^{\ell} w_k \left[ H_k(x) \right]^2$$

The coefficients $v_j$ and $w_k$ are weighting factors for the individual constraints, and $r$ is a free parameter. The optimum of $\tilde F(x)$ will depend on the choice of $r$, so it is necessary to alter $r$ in a stepwise way. The original extreme value problem is thereby solved by a sequence of optimizations in which $r$ is gradually reduced to zero. One can hope in this way at least to find good approximations for the required minimum problem within a finite sequence of optimizations.
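The sequential idea can be illustrated on a one-variable toy problem of my own choosing (this is a minimal sketch, not Fiacco and McCormick's full algorithm): minimize $F(x) = x^2$ subject to $G(x) = x - 1 \ge 0$, whose constrained optimum is $x^* = 1$. The barrier term $r / G(x)$ keeps the inner search strictly inside the feasible region, and shrinking $r$ drives the iterates towards the boundary.

```python
# A minimal SUMT-style sketch: minimize Ft(x) = F(x) + r / G(x) for a
# decreasing sequence of r; the unconstrained minimizer approaches the
# constrained optimum x* = 1 from inside the feasible region.

def minimize_1d(f, lo, hi, steps=20000):
    # crude direct line scan standing in for a proper inner method
    best_x, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        t = lo + (hi - lo) * i / steps
        ft = f(t)
        if ft < best_f:
            best_x, best_f = t, ft
    return best_x

def sumt(r=1.0, shrink=0.1, rounds=6):
    x = 2.0                                   # feasible starting point
    for _ in range(rounds):
        Ft = lambda t, r=r: t ** 2 + r / (t - 1.0)   # barrier objective
        x = minimize_1d(Ft, 1.0 + 1e-9, 3.0)
        r *= shrink                           # reduce r stepwise
    return x

x = sumt()    # close to, but strictly greater than, 1.0
```

Each round is an ordinary unconstrained minimization, exactly the "sequence of optimizations" described above.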

The choice of suitable values for $r$ is not, however, easy. Fiacco and Fiacco and McCormick give some indications and also suggest further possibilities for penalty functions. These procedures are usually applied in conjunction with gradient methods. The hemstitching method and the riding the constraints method of Roberts and Lyvers work by changing the chosen direction whenever a constraint is violated, without using a modified objective function. They orient themselves with respect to the gradient of the objective and the derivatives of the constraint functions (Jacobian matrix). In hemstitching there is always a return into the feasible region, while in riding the constraints the search runs along the active constraint boundaries. The variables are reset into the allowed region by the complex method of M. J. Box, a direct search strategy, whenever explicit bounds are crossed. Implicit constraints, on the other hand, are treated as barriers (see the chapter on hill climbing strategies).

The methods of mathematical programming, both linear and nonlinear, treat the constraints as the main aspect of the problem. They were specially evolved for operations research (Müller-Merbach) and assume that all variables must always be positive. Such non-negativity conditions allow special solution procedures to be developed. The simplest models of economic processes are linear; there are often no better ones available. For this purpose Dantzig developed the simplex method of linear programming (see also Krelle and Künzi; Hadley; Weber).

The linear constraints, together with the condition on the signs of the variables, span the feasible region in the form of a polygon (for $n = 2$) or a polyhedron, sometimes called a simplex. Since the objective function is also linear, except in special cases the desired extremum must lie in a corner of the polyhedron. It is therefore sufficient just to examine the corners. The simplex method of Dantzig does this in a particularly economic way, since only those corners are considered in which the objective function has progressively better values. It can even be thought of as a gradient method along the edges of the polyhedron. It can be applied in a straightforward way to many hundreds, even thousands, of variables and constraints. For very large problems, which may have a particular structure, special methods have also been developed (Künzi and Tan; Künzi). Into this category come the revised and the dual simplex methods, the multiphase and duplex methods, and decomposition algorithms. An unpleasant property of linear programs is that sometimes just small changes of the coefficients in the objective function or the constraints can cause a big alteration in the solution. To reveal such dependencies, methods of parametric linear programming and sensitivity analysis have been developed (Dinkelbach).
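The corner property ("it is sufficient just to examine the corners") can be demonstrated by brute force in two dimensions. The sketch below is my own illustration, not Dantzig's simplex method, which would visit only improving corners: every pair of constraint boundary lines is intersected, the feasible intersections are the vertices, and the objective is scored at each.

```python
# Brute-force corner enumeration for a 2-D linear program:
# maximize 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6,  x, y >= 0.
from itertools import combinations

# each constraint as (a1, a2, b): a1*x + a2*y <= b
cons = [(1.0, 1.0, 4.0),     # x + y <= 4
        (1.0, 3.0, 6.0),     # x + 3y <= 6
        (-1.0, 0.0, 0.0),    # x >= 0
        (0.0, -1.0, 0.0)]    # y >= 0
obj = (3.0, 2.0)

def vertices(cons):
    for (a1, a2, b1), (c1, c2, b2) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue                               # parallel boundaries
        x = (b1 * c2 - b2 * a2) / det              # Cramer's rule
        y = (a1 * b2 - c1 * b1) / det
        if all(a * x + b * y <= c + 1e-9 for a, b, c in cons):
            yield (x, y)

best = max(vertices(cons), key=lambda p: obj[0] * p[0] + obj[1] * p[1])
```

This examines all corners; the point of the simplex method is precisely that it avoids doing so, moving only along edges with improving objective values.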

Most strategies of nonlinear programming resemble the simplex method or use it as a subprogram (Abadie). This is the case in particular for the techniques of quadratic programming, which are conceived for quadratic objective functions and linear constraints. The theory of nonlinear programming is based on the optimality conditions developed by Kuhn and Tucker, an extension of the theory of maxima and minima to problems with constraints in the form of inequalities. These can be expressed geometrically as follows: at the optimum, in a corner of the allowed region, the gradient of the objective function lies within the cone formed by the gradients of the active constraints. To start with, this is only a necessary condition. It becomes sufficient under certain assumptions concerning the structure of the objective and constraint functions. For minimum problems, the objective function and the feasible region must be convex; that is, the constraints must be concave. Such a problem is also called a convex program. Finally, the Kuhn-Tucker theorem transforms a convex program into an equivalent saddle point problem (Arrow and Hurwicz), just as the Lagrange multiplier method does for constraints in the form of equalities. A complete theory of equality constraints is due to Apostol.

Nonlinear programming is therefore only applicable to convex problems, in which, to be precise, one must distinguish at least seven types of convexity (Ponstein). In addition, all the functions are usually required to be continuously differentiable, with an analytic specification of their partial derivatives. There is an extensive literature on this subject, of which the books by Arrow, Hurwicz, and Uzawa; Zoutendijk; Vajda; Künzi, Krelle, and Oettli; Künzi, Tzschach, and Zehnder; Künzi and Krelle; Zangwill; Suchowitzki and Awdejewa; Mangasarian; Stoer and Witzgall; Whittle; Luenberger; and Varga are but a small sample. Kappler considers some of the procedures from the point of view of gradient methods. Künzi and Oettli give a survey of the more extended procedures, together with an extensive bibliography. FORTRAN programs are to be found in McMillan; Kuester and Mize; and Land and Powell.

Of special importance in control theory are optimization problems in which the constraints are partly specified as differential equations. They are also called non-holonomous constraints. Pontrjagin et al. have given necessary conditions for the existence of optima in these problems. Their trick was to distinguish between the free control functions, to be determined, and the local or state functions, which are bound by constraints. Although the theory has given a strong foothold to the analytic treatment of optimal control processes, it must be regarded as a case of good luck if a practical problem can be made to yield an exact solution in this way. One must usually resort in the end to numerical approximation methods in order to obtain the desired optimum (e.g., Balakrishnan and Neustadt; Rosen; Leitmann; Kopp; Mufti; Tabak; Canon, Cullum, and Polak; Tolle; Unbehauen; Boltjanski; Luenberger; Polak).

Other Special Cases

According to the type of variables, there are still other special areas of mathematical optimization. In parameter optimization, for example, the variables can sometimes be restricted to discrete or integer values. The extreme case is if a parameter may only take two distinct values, zero and unity. Mixed variable types can also appear in the same problem; hence the terms discrete, integer, binary (or zero-one), and mixed-integer programming. Most of the solution procedures that have been worked out deal with linear integer problems, e.g., those proposed by Gomory, Balas, and Beale. An important class of methods, the branch and bound methods, is described, for example, by Weinberg. They are classed together with dynamic programming as decision tree strategies.

For the general nonlinear case, a last resort can be to try out all possibilities. This kind of optimization is referred to as complete enumeration. Since the cost of such a procedure is usually prohibitive, heuristic approaches are also tried, with which usable, not necessarily optimal, solutions can be found (Weinberg and Zehnder). More clever ways of proceeding in special cases, for example by applying non-integer techniques of linear and nonlinear programming, can be found in Korbut and Finkelstein; Greenberg; Plane and McMillan; Burkard; Hu; and Garfinkel and Nemhauser.

By stochastic programming is meant the solution of problems with objective functions, and sometimes also constraints, that are subject to statistical perturbations (Faber). It is simplest if such problems can be reduced to deterministic ones, for example by working with expectation values. However, there are some problems in which the probability distributions significantly influence the optimal solution. Operational methods at first only existed for special cases, such as, for example, warehouse problems (Beckmann). Their numbers, as well as the fields of application, are growing steadily (Hammer; Ermoliev and Wets; Ermakov). In general, one has to make a clear distinction between deterministic solution methods for more or less noisy or stochastic situations, and stochastic methods for deterministic but difficult situations like multimodal or fractal topologies. Here we refer to the former; later on we will do so for the latter, especially under the aspect of global optimization.

In a rather new branch within the mathematical programming field, called nonsmooth or non-differentiable optimization, more or less classical gradient-type methods for finding solutions still persist (e.g., Balinski and Wolfe; Lemarechal and Mifflin; Nurminski; Kiwiel).

For successively approaching the zero or extremum of a function, if the measured values are subject to uncertainties, a familiar strategy is that of stochastic approximation (Wasan). The original concept is due to Robbins and Monro. Kiefer and Wolfowitz have adapted it for problems in which the maximum of a unimodal regression function is sought. Blum has proved that the method is certain to converge. It distinguishes between test (or trial) steps and work steps. With one variable, starting at the point $x^{(k)}$, the value of the objective function is obtained at the two positions $x^{(k)} \pm c^{(k)}$. The slope is then calculated as

$$\tilde y^{(k)} = \frac{F\!\left(x^{(k)} + c^{(k)}\right) - F\!\left(x^{(k)} - c^{(k)}\right)}{2\, c^{(k)}}$$

A work step follows from the recursion formula (for minimum searches)

$$x^{(k+1)} = x^{(k)} - a^{(k)}\, \tilde y^{(k)}$$

The choice of the positive sequences $c^{(k)}$ and $a^{(k)}$ is important for convergence of the process. These should satisfy the relations

$$\lim_{k \to \infty} c^{(k)} = 0$$

$$\sum_{k=1}^{\infty} a^{(k)} = \infty$$

$$\sum_{k=1}^{\infty} a^{(k)}\, c^{(k)} < \infty$$

$$\sum_{k=1}^{\infty} \left( \frac{a^{(k)}}{c^{(k)}} \right)^2 < \infty$$

One chooses, for example, the sequences

$$a^{(k)} = \frac{a^{(1)}}{k}, \qquad c^{(k)} = \frac{c^{(1)}}{\sqrt[4]{k}}$$

This means that the work step length goes to zero very much faster than the test step length, in order to compensate for the growing influence of the perturbations.
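The one-variable scheme is short enough to sketch directly (a toy illustration of my own with a noisy quadratic objective; the sequences $a^{(k)} = a/k$ and $c^{(k)} = c/k^{1/4}$ are of the standard form satisfying the convergence conditions):

```python
# Kiefer-Wolfowitz stochastic approximation for a noisy unimodal
# objective: trial steps at x +/- c_k estimate the slope from two
# noisy measurements, then a work step of length a_k is taken.
import random

def kiefer_wolfowitz(F_noisy, x0, a=1.0, c=1.0, iters=2000):
    x = x0
    for k in range(1, iters + 1):
        ak = a / k               # work step lengths, sum diverges
        ck = c / k ** 0.25       # trial step lengths, go to zero slowly
        slope = (F_noisy(x + ck) - F_noisy(x - ck)) / (2 * ck)
        x = x - ak * slope       # work step (minimum search)
    return x

random.seed(0)
# noisy measurements of F(x) = (x - 2)^2, true minimum at x = 2
F_noisy = lambda x: (x - 2.0) ** 2 + random.gauss(0.0, 0.1)
x = kiefer_wolfowitz(F_noisy, x0=0.0)
```

Because each slope estimate uses two independent noisy measurements, the shrinking work steps are what average the perturbations away, at the cost of very slow convergence when the noise is strong.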

Blum and Dvoretzky describe how to apply this process to multidimensional problems. The increment in the objective function, hence an approximation to the gradient vector, is obtained from n + 1 observations; Sacks uses 2n trial steps. The stochastic approximation can thus be regarded, in a sense, as a particular gradient method.

Yet other basic strategies have been proposed; these adopt only the choice of step lengths from the stochastic approximation, while the directions are governed by other criteria. Thomas and Wilde, for example, combine the stochastic approximation with the relaxation method of Southwell. Kushner and Schmitt even take random directions into consideration. All the proofs of convergence of the stochastic approximation assume unimodal objective functions. A further disadvantage is that stability against perturbations is bought at a very high cost, especially if the number of variables is large. How many steps are required to achieve a given accuracy can only be stated if the probability density distribution of the stochastic perturbations is known. Many authors have tried to devise methods in which the basic procedure can be accelerated, e.g., Kesten, who only reduces the step lengths after a change in direction of the search, or Odell, who makes the lengths of the work steps dependent on measured values of the objective function. Other attempts are directed towards reducing the effect of the perturbations (Venter; Fabian), for example by making only the direction and not the size of the gradients determine the step lengths. Bertram describes various examples of applications. More of such work is that of Krasulina and Engelhardt.

In this introduction, many classes of possible or practically occurring optimization problems and methods have been sketched briefly, but the coverage is far from complete. No mention has been made, for example, of broken rational programming, nor of graphical methods of solution. In operations research especially (Henn and Künzi) there are many special techniques for solving transport, allocation, routing, queuing, and warehouse problems, such as network planning and other graph theoretical methods. This excursion into the vast realm of optimization problems was undertaken because some of the algorithms to be studied in more depth in what follows, especially the random methods, owe their origin and nomenclature to other fields. It should also be seen to what extent methods of direct parameter optimization permeate the other branches of the subject and how they are related to each other. An overall scheme of how the various branches are interrelated can be found in Saaty.

If there are two or more objectives at the same time and occasion, and especially if these are not conflict-free, single solution points in the decision variable space can no longer give the full answer to an optimization question, not even in the otherwise simplest situation. How to look for the whole subset of efficient, non-dominated, or Pareto-optimal solutions can be found under keywords like vector optimization, polyoptimization, or multiple criteria decision making (MCDM), e.g., Bell, Keeney, and Raiffa; Hwang and Masud; Peschel; Grauer, Lewandowski, and Wierzbicki; Steuer. Game theory comes into play when several decision makers have access to different parts of the decision variable set only (e.g., Luce and Raiffa; Maynard Smith; Axelrod; Sigmund). No consideration is given here to these special fields.

Chapter

Hill climbing Strategies

In this chapter some of the direct mathematical parameter optimization methods will be treated in more detail for static, non-discrete, non-stochastic, mostly unconstrained functions. They come under the general heading of hill climbing strategies, because their manner of searching for a maximum corresponds closely to the intuitive way a sightless climber might feel his way from a valley up to the highest peak of a mountain. For minimum problems the sense of the displacements is simply reversed; otherwise uphill (or ascent) and downhill (or descent) methods (Bach) are identical. Whereas methods of mathematical programming are dominant in operations research, and the special methods of functional optimization in control theory, the hill climbing strategies are most frequently applied in engineering design. Analytic methods often prove unsuitable in this field:

Because the assumptions are not satisfied under which necessary conditions for extrema can be stated, e.g., continuity of the objective function and its derivatives

Because there are difficulties in carrying out the necessary differentiations

Because a solution of the equations describing the conditions does not always lead to the desired optimum; it can be a local minimum, maximum, or saddle point

Because the equations describing the conditions (in general a system of simultaneous nonlinear equations) are not immediately soluble

To what extent hill climbing strategies take care of these particular characteristics depends on the individual method. Very thorough presentations covering some topics can be found in Wilde; Rosenbrock and Storey; Wilde and Beightler; Kowalik and Osborne; Box, Davies, and Swann; Pierre; Pun; Converse; Cooper and Steinberg; Hoffmann and Hofmann; Beveridge and Schechter; Aoki; Zahradnik; Fox; Céa; Daniel; Himmelblau; Dixon; Jacoby, Kowalik, and Pizzo; Stark and Nicholls; Brent; Gottfried and Weisman; Vanderplaats; and Papageorgiou. More variations or theoretical and numerical studies of older methods can be found as individual publications in a wide variety of journals, or in the volumes of collected articles such as Graves and Wolfe; Blakemore and Davis; Lavi and Vogl; Klerer and Korn; Abadie; Fletcher; Rosen, Mangasarian, and Ritter; Geoffrion; Murray; Lootsma; Szegö; and Sebastian and Tammer.

Formulated as a minimum problem without constraints, the task can be stated as follows:

$$\min_x \left\{ F(x) \mid x \in \mathbb{R}^n \right\}$$

The column vector $x^*$ at the extreme position is required,

$$x^* = \begin{pmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{pmatrix}, \qquad x^{*T} = (x_1^*, x_2^*, \dots, x_n^*)$$

and the associated extreme value $F^* = F(x^*)$ of the objective function $F(x)$, in this case the minimum. The expression $x \in \mathbb{R}^n$ means that the variables are allowed to take all real values; $x$ can thus be represented by any point in an $n$-dimensional Euclidean space $\mathbb{R}^n$. Different types of minima are distinguished: strong and weak, local and global.

For a local minimum the following relationship holds:

$$F(x^*) \le F(x) \quad \text{for} \quad \|x - x^*\| = \sqrt{\sum_{i=1}^{n} \left( x_i - x_i^* \right)^2} \le \varepsilon, \quad \varepsilon > 0, \quad x \in \mathbb{R}^n$$

This means that in the neighborhood of $x^*$, defined by the size of $\varepsilon$, there is no vector $x$ for which $F(x)$ is smaller than $F(x^*)$. If the equality sign only applies when $x = x^*$, the minimum is called strong; otherwise it is weak. An objective function that only displays one minimum or maximum is referred to as unimodal. In many cases, however, $F(x)$ has several local minima and maxima, which may be of different heights. The smallest, absolute or global, minimum (minimum minimorum) of a multimodal objective function satisfies the stronger condition

$$F(x^*) \le F(x) \quad \text{for all } x \in \mathbb{R}^n$$

This is always the preferred object of the search.

If there are also constraints in the form of inequalities,

$$G_j(x) \ge 0 \quad \text{for all } j = 1, \dots, m$$

or equalities,

$$H_k(x) = 0 \quad \text{for all } k = 1, \dots, \ell$$

then $\mathbb{R}^n$ in the relations above must either be replaced by the (hopefully non-empty) subset $M \subseteq \mathbb{R}^n$ representing the feasible region defined by the inequality constraints, or by $\mathbb{R}^{n-\ell}$, the subspace of lower dimensionality spanned by the variables that now depend on each other according to the equality constraints. If solutions at infinity are excluded, then the theorem of Weierstrass holds (see, for example, Rothe): in a closed, compact region $a \le x \le b$, every function which is continuous there has at least one (i.e., an absolute) minimum and maximum. This can lie inside or on the boundary. In the case of discontinuous functions, every point of discontinuity is also a potential candidate for the position of an extremum.

One Dimensional Strategies

The search for a minimum is esp ecially easy if the ob jective function only dep ends on one

variable

F(x)

x

a b cd e f g h

Figure Sp ecial p oints of a function of one variable

a lo cal maximum at the b oundary

b lo cal minimum at a p oint of discontinuityof F x

x

c saddle p oint or p oint of inection

de weak lo cal maximum

f lo cal minimum

g maximum may b e global at a p ointofdiscontinuityof F x

h global minimum at the b oundary

Hill climbing Strategies

This problem would be of little practical interest, however, were it not for the fact that many of the multidimensional strategies make use of one dimensional minimizations in selected directions, referred to as line searches. The figure above shows some possible ways minima and other special points can arise in the one dimensional case.

Simultaneous Methods

One possible way of discovering the minimum of a function of one parameter is to determine the value of the objective function at a number of points and then to declare the point with the smallest value the minimum. Since, in principle, all trials can be carried out at the same time, this procedure is referred to as simultaneous optimization. How closely the true minimum is approached depends on the choice of the number and location of the trial points. The more trials are made, the more accurate the solution can be. One will be concerned, however, to obtain a result at the lowest cost in time and computation, or material. The two requirements of high accuracy and lowest cost are contradictory; thus an optimal compromise must be sought.

The effectiveness of a search method is judged by the size of the largest remaining interval of uncertainty in the least favorable case, relative to the position of the minimum, for a given number of trials: the so-called minimax concept (see Wilde; Beamer and Wilde). Assuming that the points in the series of trials are so densely distributed that several at a time are in the neighborhood of a local minimum, then the length of the interval of uncertainty is the same as the distance between the two points in the neighborhood of the smallest value of F(x). The number of necessary trials can thus get very large unless one has at least some idea of whereabouts the desired minimum is situated. In practice one must limit investigation of the objective function to a finite interval [a, b]. It is obvious, and it can be proved theoretically, that the optimal choice among all simultaneous search methods is the one in which the trial points are evenly distributed over the interval [a, b] (Boas).

If N equidistant points are used, the interval of uncertainty is of length

\[ \ell_N = \frac{2\,(b - a)}{N + 1} \]

and the effectiveness takes the value

\[ \eta_N = \frac{\ell_N}{b - a} = \frac{2}{N + 1} \]

Put another way: to be sure of achieving an accuracy of ε, the equidistant search (also called lattice, grid, or tabulation method) requires N trials, where

\[ N \ge \frac{2\,(b - a)}{\varepsilon} - 1, \quad N \text{ integer} \]

Apart from the requirements that the chosen interval [a, b] should contain the absolute minimum being sought, and that N should be big enough in relation to the waviness of the objective function, no further conditions need to be fulfilled in order for the grid method to succeed.
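The tabulation idea above can be sketched in a few lines (the function and parameter names are illustrative; the text prescribes no particular implementation):

```python
def grid_search(f, a, b, N):
    """Simultaneous (grid) search: evaluate f at N equidistant interior
    points of [a, b] and return the point with the smallest value."""
    points = [a + (k + 1) * (b - a) / (N + 1) for k in range(N)]
    return min(points, key=f)
```

For a unimodal f the returned point lies within one grid spacing of the true minimum; all N evaluations are independent and could be carried out simultaneously.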


Even more effective search schemes can be devised if the objective function is unimodal in the interval [a, b]. Wilde and Beightler describe a procedure using evenly distributed pairs of points, which is also referred to as a simultaneous dichotomous search. The distance δ between the two points of a pair must be chosen to be sufficiently large that their objective function values are different. In the limit δ → 0, the dichotomous search with an even number of trials (even block search) is the best. The number of trials required is

\[ N \ge \frac{2\,(b - a)}{\varepsilon} - 2, \quad N \text{ even} \]

This is one less than for equidistant searches with the same accuracy requirement. Such a scheme is referred to as optimal in the sense of the minimax concept. Investigations of arrangements of trials also for uneven numbers (odd block search) with non-vanishing δ can be found in Wilde and in Wilde and Beightler.

The one dimensional search procedures dealt with so far, which strive to reduce the interval of uncertainty, are called by Wilde direct elimination procedures. As a second category one can consider the various interpolation methods. These can be much more effective, but only when the objective function F(x) satisfies certain smoothness conditions, or can be sufficiently well approximated in the interval under consideration by a polynomial P(x). In this case, instead of the minimum of F(x), a zero of P_x(x) = dP(x)/dx is determined. The number of points at which the objective function must be investigated depends on the order of the chosen polynomial and on the type of information at the points that determine it. Besides the values of the objective function itself, consideration is given to its first, second, and (less frequently) higher derivatives. In general, no exact statements can be made regarding the quality of the approximation to the desired minimum for a given number of trials. Details of the various methods of locating real zeros of rational functions of one variable, such as regula falsi and Lagrangian and Newtonian interpolation, can be found in books on practical or numerical mathematics under the heading of non-linear equations, e.g., Booth; Faddejew and Faddejewa; Saaty and Bram; Traub; Zurmühl; Walsh; Ralston and Wilf; Householder; Ortega and Rheinboldt; and Brent.

Although from a fundamental point of view interpolation methods represent indirect optimization procedures, they are of interest here as strategies, especially when they are applied iteratively and obtain information about derivatives from function values.

Sequential Methods

If the trials for determining a minimum can be made sequentially, the intermediate results retained at a given time can be used to locate the next trial of the sequence more favorably than would be possible without this information. With the digital computers usually available nowadays, which work in a serial way, one is actually obliged to execute all steps one after the other.

Sequential methods, in which the solution is approached in a stepwise or iterative manner, are advantageous here. The evaluation of the intermediate results and the prediction of favorable conditions for the next trial presuppose a more or less precise internal model of the objective function: the better the model corresponds to reality, the better will be the results of the interpolation and extrapolation processes. The simplest assumption is that the objective function is unimodal, which means that local minima also always represent global minima. On this basis a number of sequential interval-dividing procedures have been constructed (see below). Iterative interpolation methods demand more smoothness of the objective function (see further below). In the former case it is necessary, in the latter useful, to determine at the outset a suitable interval [a^(0), b^(0)] in which the desired extremum lies.

Boxing in the Minimum

If there are no clues as to whereabouts the desired minimum might be situated, one can start with two points, x^(0) and x^(1) = x^(0) + s, and determine the objective function there. If F(x^(1)) ≤ F(x^(0)), one proceeds in the chosen direction, keeping the same step length,

\[ x^{(k+1)} = x^{(k)} + s \]

until

\[ F\!\left(x^{(k+1)}\right) > F\!\left(x^{(k)}\right) \]

If, however, F(x^(1)) > F(x^(0)), one chooses the opposite direction,

\[ x^{(2)} = x^{(0)} - s \]

and

\[ x^{(k+1)} = x^{(k)} - s \quad \text{for} \quad k \ge 2 \]

similarly until a step past the minimum is taken. One has thus determined the minimum of the unimodal function to within an uncertainty interval of length 2s (Beveridge and Schechter).

In numerical optimization problems the values of the variables often run through several powers of ten, or alternatively they must be precisely determined at many points. In this case the boxing-in method with a very small fixed step length is too costly. Box, Davies, and Swann therefore suggest starting with an initial step length s and doubling it at each successful step. Their recursion formula is as follows:

\[ x^{(k+1)} = x^{(k)} + 2^{k}\, s, \quad k = 0, 1, 2, \dots \]

It is applied as long as F(x^(k+1)) ≤ F(x^(k)) holds. As soon as F(x^(k+1)) > F(x^(k)) is registered, however, b^(0) = x^(k+1) is set as the upper bound of the interval, and the starting point x^(0) is returned to. The lower bound a^(0) is found by a corresponding process with negative step lengths, going in the opposite direction. In this way a starting interval [a^(0), b^(0)] is obtained for the one dimensional search procedures to be described below. It can happen, because of the convention for equality of two function values, that the search for a bound of the interval does not end if the objective function reaches a constant, horizontal level. It is therefore useful to specify a maximum step length that may not be exceeded.
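As a sketch, the doubling scheme just described might look as follows (the function name, default step length, and plateau guard are illustrative assumptions; for brevity both bounds are found in a single forward run rather than by the separate backward pass of the text):

```python
def bracket_minimum(f, x0, s=0.25, s_max=1e6):
    """Box in a minimum of a unimodal f by step doubling: after each
    success the step is doubled; the first failure yields a bracket."""
    f0 = f(x0)
    if f(x0 + s) > f0:
        s = -s                          # uphill: reverse the direction
    prev, cur, f_cur = x0 - s, x0, f0   # prev lies on the uphill side
    while abs(s) <= s_max:              # guard against flat plateaus
        nxt = cur + s
        f_nxt = f(nxt)
        if f_nxt > f_cur:               # first failure: minimum boxed in
            return tuple(sorted((prev, nxt)))
        prev, cur, f_cur = cur, nxt, f_nxt
        s *= 2                          # double the step after success
    raise RuntimeError("no bracket found within the maximum step length")
```

The returned pair plays the role of [a^(0), b^(0)] for the interval division procedures described next.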


The boxing-in method has also occasionally been proposed as a one dimensional optimization strategy in its own right (Rosenbrock; Berman). In order not to waste too many trials far from the target when the accuracy requirement is very high, it is useful to start with relatively large steps. Each time a loop ends with a failure, the step length is reduced by a constant factor less than one. If the above rules for increasing and reducing the step lengths are combined, a very flexible procedure is obtained; Dixon calls it the success-failure routine. If a starting interval [a^(0), b^(0)] is already at hand, however, there are significantly better strategies for successively reducing the size of the interval.

Interval Division Methods

If an equidistant division method is applied repeatedly, the interval of uncertainty is reduced at each step by the same factor, and thus for k steps exponentially. This exponential progression is considerably stronger than the linear dependence of the effectiveness on the number of trials per step. Thus as few simultaneous trials as possible should be used per step. A comparison of two schemes, with two and three simultaneous trials, shows that, except in the first loop, only two new objective function values must be obtained at a time in both cases, since of the three trial points in one step one coincides with a point of the previous step. The total number of trials required with sequential application of the equidistant three-point scheme is

\[ N \simeq 2\, \frac{\log \left( \frac{b - a}{\varepsilon} \right)}{\log 2} + 1, \quad N \text{ odd} \]

Even better results are provided by the sequential dichotomous search with one pair per step. For the limiting case δ → 0 one obtains

\[ N \simeq 2\, \frac{\log \left( \frac{b - a}{\varepsilon} \right)}{\log 2}, \quad N \text{ even} \]

Detailed investigations of the influence of δ on various equidistant and dichotomous search schemes can be found in Avriel and Wilde and in Beamer and Wilde.

Of greater interest are the two purely sequential elimination methods described below, which require only a single objective function value per step. They require that the objective function be unimodal; otherwise they only guarantee that a local minimum or maximum will be found. Shubert describes an interval dividing procedure that is able to locate all local extrema, including the global optimum. To use it, however, an upper bound on the slope of the function must be known. The method is rather costly, especially with regard to the storage space required.

Fibonacci Division

This interval division strategy was introduced by Kiefer. It operates with a series due to Leonardo of Pisa, which bears his pseudonym Fibonacci:

\[ f_0 = f_1 = 1, \qquad f_k = f_{k-1} + f_{k-2} \quad \text{for} \quad k \ge 2 \]

An initial interval [a^(0), b^(0)] is required containing the extremum, together with a number N, which represents the total number of intended trials. If the general interval is called [a^(k), b^(k)], the lengths

\[ s^{(k)} = t^{(k)} \left( b^{(k)} - a^{(k)} \right) \]

are subtracted from its ends, with the reduction factor

\[ t^{(k)} = \frac{f_{N-k-1}}{f_{N-k}} \]

giving

\[ c^{(k)} = a^{(k)} + s^{(k)} \]
\[ d^{(k)} = b^{(k)} - s^{(k)} \]

The values of the objective function at c^(k) and d^(k) are compared, and whichever subinterval contains the better (in a minimum search: lower) value is taken as defining the interval for the next step.

If

\[ F\!\left(d^{(k)}\right) \le F\!\left(c^{(k)}\right) \]

then

\[ a^{(k+1)} = a^{(k)} \quad \text{and} \quad b^{(k+1)} = c^{(k)} \]

If

\[ F\!\left(d^{(k)}\right) > F\!\left(c^{(k)}\right) \]

then

\[ a^{(k+1)} = d^{(k)} \quad \text{and} \quad b^{(k+1)} = b^{(k)} \]

A consequence of the Fibonacci series is that, except at the first interval division, at all of the following steps one of the two new points c^(k+1) and d^(k+1) is always already known:

If

\[ F\!\left(d^{(k)}\right) \le F\!\left(c^{(k)}\right) \]

then

\[ c^{(k+1)} = d^{(k)} \]

and if

\[ F\!\left(d^{(k)}\right) > F\!\left(c^{(k)}\right) \]

then

\[ d^{(k+1)} = c^{(k)} \]


[Figure: three successive intervals of the search, each drawn over the function F(x) and showing the points a, d, c, b for steps k, k+1, and k+2.]

Figure: Interval division in the Fibonacci search

so that each time only one new value of the objective function needs to be obtained. The figure above illustrates two steps of the procedure. The process is continued until k = N − 2. At this last division, because f_2 = 2 f_1, the points d and c coincide. A further interval reduction can only be achieved by slightly displacing one of the test points. The displacement must be at least big enough for the two objective function values still to be distinguishable. Then the remaining interval after N trials is of length

\[ \ell_N = \frac{b^{(0)} - a^{(0)}}{f_N} \]

As the displacement tends to zero, the effectiveness tends to 1/f_N. Johnson and Kiefer show that this value is optimal in the sense of the minimax concept, according to which the Fibonacci search is the best of all sequential interval division procedures. However, by taking account of the displacement not only at the last but at all the steps, Oliver and Wilde give a recursion formula that, for the same number of trials, yields a slightly smaller residual interval. Avriel and Wilde provide a proof of optimality. If one has a priori information about the structure of the objective function, it can be exploited to advantage (Gal) in order to reduce further the number of trials. Overholt suggests that in general there is no a priori information available with which to fix the displacement suitably, and that it is therefore better to omit the final division using a displacement rule and to choose N one bigger from the start. In order to obtain the minimum with accuracy ε, one


should choose N such that

\[ f_N \ge \frac{b^{(0)} - a^{(0)}}{\varepsilon} > f_{N-1} \]

Then the effectiveness of the procedure becomes

\[ \eta = \frac{1}{f_N} \]

and since (Lucas)

\[ f_N = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{N+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{N+1} \right] \]

the number of trials is approximately

\[ N \simeq \frac{\log \left( \sqrt{5}\; \frac{b^{(0)} - a^{(0)}}{\varepsilon} \right)}{\log \left( \frac{1 + \sqrt{5}}{2} \right)} - 1 \]

Overholt shows by means of numerical tests that the procedure must often be terminated prematurely, as F(d^(k)) becomes equal to F(c^(k)), for example because of computing with a finite number of significant figures. Further divisions of the interval of uncertainty are then pointless.

For the boxing-in method of determining the initial interval, one would fix a suitable initial step length and a maximum step length such that, for the word length of the computer at hand, the number range of integers is not exceeded by the largest required Fibonacci number. Finally, two further applications of the Fibonacci procedure may be mentioned. By reversing the scheme, Wilde and Beightler obtain a method of boxing in the minimum. Kiefer shows how to proceed if values of the objective function can only be obtained at discrete, not necessarily equidistant, points. More about such lattice search problems can be found in Wilde and in Beveridge and Schechter.

The Golden Section

It can sometimes be inconvenient to have to specify in advance the number of interval divisions. In this case Kiefer and Johnson propose, instead of the reduction factor t^(k), which varies with the iteration number in the Fibonacci search, the constant factor

\[ t = \frac{\sqrt{5} - 1}{2} \simeq 0.618, \quad \text{the positive root of} \quad t^2 = 1 - t \]

For large N − k, t^(k) reduces to t. In addition, t is identical to the ratio of lengths a to b which is obtained by dividing a total length a + b into two pieces such that the smaller piece a has the same ratio to the larger piece b as the larger has to the total. This harmonic division, after Euclid, is also known as the golden section, which gave the procedure its name (Wilde). After N function calls the uncertainty interval is of length

\[ \ell_N = t^{N-1} \left( b^{(0)} - a^{(0)} \right), \quad N \ge 2 \]


For the limiting case N → ∞, since

\[ \lim_{N \to \infty} t^{N-1} f_N = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^2 \simeq 1.17 \]

the residual interval is about 17% longer than that of the Fibonacci procedure; for the same accuracy this amounts to less than one additional trial. Compared to the Fibonacci search without displacement, since

\[ \lim_{N \to \infty} \frac{t^{N-1} f_{N+1}}{2} = \frac{1}{2\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^3 \simeq 0.95 \]

the number of trials is even slightly lower. It should further be noted that, when using the Fibonacci method on digital computers, the Fibonacci numbers must first be generated, or a sufficient number of them must be provided and stored. The number of trials needed for a sequential golden section is

\[ N \simeq 1 + \frac{\log \left( \frac{b^{(0)} - a^{(0)}}{\varepsilon} \right)}{\log \left( 1/t \right)} \]

Other properties of the iteration sequence, including the criterion for termination at equal function values, are the same as those of the method of interval division according to Fibonacci numbers. Further details can be found, for example, in Avriel and Wilde. Complete programs for the interval division procedures have been published by Pike and Pixner and by Overholt; see also Boothroyd; Pike, Hill, and James; Overholt.
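The constant reduction factor makes the golden section especially compact to state in code; this sketch mirrors the Fibonacci scheme but needs no precomputed series (names and the default tolerance are illustrative):

```python
import math

def golden_section(f, a, b, eps=1e-6):
    """Golden section search: the constant factor t = (sqrt(5) - 1)/2
    replaces the varying Fibonacci ratios, so the number of divisions
    need not be fixed in advance."""
    t = (math.sqrt(5.0) - 1.0) / 2.0     # positive root of t^2 = 1 - t
    c = a + t * (b - a)                  # again d < c
    d = b - t * (b - a)
    fc, fd = f(c), f(d)
    while b - a > eps:
        if fd <= fc:                     # minimum in [a, c]; reuse c' = d
            b, c, fc = c, d, fd
            d = b - t * (b - a)
            fd = f(d)
        else:                            # minimum in [d, b]; reuse d' = c
            a, d, fd = d, c, fc
            c = a + t * (b - a)
            fc = f(c)
    return (a + b) / 2.0
```

Because t^2 = 1 - t, one interior point of each reduced interval coincides exactly with a point of the previous step, just as in the Fibonacci search.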

Interpolation Methods

In many cases one is dealing with a continuous function, the minimum of which is to be determined. If, in addition to the value of the objective function, its slope can be specified everywhere, many methods can be derived that may converge faster than the optimal elimination methods. One of the oldest schemes is based on the procedure named after Bolzano for determining the zeros of a function. Assuming that one has two points at which the slopes of the objective function have opposite signs, one bisects the interval between them and determines the slope at the midpoint. This replaces the interval end point which has a slope of the same sign. The procedure can then be repeated iteratively. At each trial the interval is halved. If the slope has to be calculated from the difference of two objective function values, the bisection or midpoint strategy becomes the sequential dichotomous search. Avriel and Wilde propose, as a variant of the Bolzano search, evaluating the slope at two points in the interval so as to increase the reduction factor. They show that their diblock strategy is slightly superior to the dichotomous search.

If derivatives of the objective function are available, or at least if it can be assumed that they exist, i.e., the function F(x) is continuous and differentiable, far better strategies for the minimum search can be devised. These determine analytically the minimum of a trial function that coincides with the objective function, and possibly also its derivatives, at selected argument values. One distinguishes linear, quadratic, and cubic models, according to the order of the trial polynomial. Polynomials of higher order are virtually never used: they require too much information about the function F(x). Furthermore, it turns out that, in contrast to all the methods referred to so far, such strategies do not always converge, for reasons other than rounding error.

Regula Falsi Iteration

Given two points a^(k) and b^(k) with their function values F(a^(k)) and F(b^(k)), the simplest approximation formula for a zero c^(k) of F(x) is

\[ c^{(k)} = a^{(k)} - F\!\left(a^{(k)}\right) \frac{b^{(k)} - a^{(k)}}{F\!\left(b^{(k)}\right) - F\!\left(a^{(k)}\right)} \]

This technique, known as regula falsi or regula falsorum, predicts the position of the zero correctly if F(x) depends linearly on x. For one dimensional minimization it can be applied to find a zero of F_x(x) = dF(x)/dx:

\[ c^{(k)} = a^{(k)} - F_x\!\left(a^{(k)}\right) \frac{b^{(k)} - a^{(k)}}{F_x\!\left(b^{(k)}\right) - F_x\!\left(a^{(k)}\right)} \]

The underlying model here is a second order polynomial with linear slope. If F_x(a^(k)) and F_x(b^(k)) have opposite signs, c^(k) lies between a^(k) and b^(k). If F_x(c^(k)) ≠ 0, the procedure can be continued iteratively by using the reduced interval [a^(k+1), b^(k+1)] = [a^(k), c^(k)] if F_x(c^(k)) and F_x(b^(k)) have the same sign, or [a^(k+1), b^(k+1)] = [c^(k), b^(k)] if F_x(c^(k)) and F_x(a^(k)) have the same sign. If F_x(a^(k)) and F_x(b^(k)) have the same sign, c^(k) must lie outside [a^(k), b^(k)]. If F_x(c^(k)) has the same sign again, c^(k) replaces the argument value at which |F_x| is greatest; this extrapolation is also called the secant method. If F_x(c^(k)) has the opposite sign, one can continue using regula falsi to interpolate iteratively. As a termination criterion one can apply F_x(c^(k)) = 0 or |F_x(c^(k))| ≤ ε. A minimum can only be found reliably in this way if the starting point of the search lies in its neighborhood. Otherwise the iteration sequence can also converge to a maximum, at which, of course, the slope also goes to zero if F_x(x) is continuous.

Whereas in the Bolzano interval bisection method only the sign of the function whose zero is sought needs to be known at the argument values, the regula falsi method also makes use of the magnitude of the function. This extra information should enable it to converge more rapidly. As Ostrowski and Jarratt show, for example, this is only the case if the function corresponds closely enough to the assumed model. The simpler bisection method is better, even optimal as a zero-finding method in the minimax sense, if the function has opposite signs at the two starting points but is neither linear nor convex. In this case the linear interpolation sometimes converges very slowly. According to Stanton, a cubic interpolation as a line search in the eccentric quadratic case often yields even worse results. Dixon names two variants of the regula falsi recursion formula, but it is not known whether they lead to better convergence. Fox proposes a combination of the Bolzano method with the linear interpolation. Dekker (see also Forsythe) credits this procedure with better than linear convergence. Even greater reliability and speed are attributed to the algorithm of Brent, which follows Dekker's method with a quadratic interpolation process as soon as the latter promises to be successful.


It is inconvenient when dealing with minimization problems that the derivatives of the function are required. If the slopes are obtained from function values by a difference method, difficulties can arise from the finite accuracy of such a process. For this reason Brent combines regula falsi iteration with division according to the golden section. Further variations can be found in Schmidt and Trinkaus; Dowell and Jarratt; King; and Anderson and Björck.
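A sketch of the interpolation step applied, as described above, to the slope F_x (passed here directly as a function df; the bracketing variant shown keeps the sign change inside [a, b], and the names and tolerances are illustrative):

```python
def regula_falsi_min(df, a, b, tol=1e-8, max_iter=100):
    """Locate a zero of the slope df = F_x by linear interpolation
    (regula falsi), retaining a bracket on which df changes sign."""
    fa, fb = df(a), df(b)
    if fa * fb > 0:
        raise ValueError("df must change sign on [a, b]")
    c = a
    for _ in range(max_iter):
        c = a - fa * (b - a) / (fb - fa)   # regula falsi step
        fc = df(c)
        if abs(fc) < tol:                  # |F_x(c)| <= tol: terminate
            return c
        if fa * fc < 0:                    # zero of df lies in [a, c]
            b, fb = c, fc
        else:                              # zero of df lies in [c, b]
            a, fa = c, fc
    return c
```

For a quadratic objective (linear slope) the very first step already lands on the stationary point, as the text notes.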

Newton-Raphson Iteration

Newton's interpolation formula for improving an approximate solution x^(k) of the equation F(x) = 0 is (see, for example, Madsen)

\[ x^{(k+1)} = x^{(k)} - \frac{F\!\left(x^{(k)}\right)}{F_x\!\left(x^{(k)}\right)} \]

It uses only one argument value, but requires the value of the derivative of the function as well as the function itself. If F(x) is linear in x, the zero is correctly predicted; otherwise an improved approximation is obtained at best, and the process must be repeated. Like regula falsi, Newton's recursion formula can also be applied to determining F_x(x) = 0, with of course the reservations already stated. The so-called Newton-Raphson rule is then

\[ x^{(k+1)} = x^{(k)} - \frac{F_x\!\left(x^{(k)}\right)}{F_{xx}\!\left(x^{(k)}\right)} \]

If F(x) is not quadratic, as many iterations as necessary must be made until a termination criterion is satisfied; Dixon, for example, uses the condition |x^(k+1) − x^(k)| ≤ ε. To set against the advantages, namely that only one argument value is required and that for quadratic objective functions one iteration is sufficient to find a point where F_x(x) = 0, there are several disadvantages:

- If the derivatives F_x and F_xx are obtained approximately by numerical differentiation, the efficiency of the procedure is worsened not only by rounding errors but also by inaccuracies in the approximation. This is especially true in the neighborhood of the minimum being sought, since F_x becomes vanishingly small there.

- Minima, maxima, and saddle points are not distinguished. The starting point x^(0) must already be located as close as possible to the minimum being sought.

- If the objective function is of higher than second order, the Newton-Raphson iteration can diverge. The condition for convergence towards a minimum is that F_xx(x) > 0 for all x.
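The rule itself is only a few lines; this sketch takes the first and second derivatives as given functions (df, ddf) and uses a Dixon-style step-size termination criterion (all names and defaults are illustrative; note that the caveats listed above, in particular F_xx > 0, are not checked here):

```python
def newton_raphson_min(df, ddf, x0, eps=1e-10, max_iter=50):
    """Newton-Raphson iteration on the slope:
    x_{k+1} = x_k - F_x(x_k) / F_xx(x_k).
    Converges only to a stationary point; whether it is a minimum
    must be verified separately via F_xx > 0."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / ddf(x)
        x -= step
        if abs(step) < eps:       # |x_{k+1} - x_k| <= eps
            return x
    return x
```

For a quadratic objective the first step is already exact; for higher-order functions convergence depends on the starting point, as discussed above.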

Lagrangian Interpolation

Whereas regula falsi and Newton-Raphson iteration attempt to approximate the minimum using information about the derivatives of the objective function at the argument values, and can therefore be classified as indirect optimization methods, in Lagrangian interpolation only values of the objective function itself are required. In its general form the procedure consists of fitting a pth order polynomial through p + 1 points (Zurmühl). One usually uses three argument values and a parabola as the model function (quadratic interpolation). Assuming that the three points are a^(k) < b^(k) < c^(k), with the objective function values F(a^(k)), F(b^(k)), and F(c^(k)), the trial parabola P(x) has a vanishing first derivative at the point

\[ d^{(k)} = \frac{1}{2}\,
\frac{\left( c^{(k)2} - b^{(k)2} \right) F\!\left(a^{(k)}\right)
    + \left( a^{(k)2} - c^{(k)2} \right) F\!\left(b^{(k)}\right)
    + \left( b^{(k)2} - a^{(k)2} \right) F\!\left(c^{(k)}\right)}
     {\left( c^{(k)} - b^{(k)} \right) F\!\left(a^{(k)}\right)
    + \left( a^{(k)} - c^{(k)} \right) F\!\left(b^{(k)}\right)
    + \left( b^{(k)} - a^{(k)} \right) F\!\left(c^{(k)}\right)} \]

This point is a minimum only if the denominator is positive. Otherwise d^(k) represents a maximum or a saddle point. In the case of a minimum, d^(k) is introduced as a new argument value and one of the old ones is deleted:

\[
\begin{aligned}
&a^{(k+1)} = a^{(k)}, \quad b^{(k+1)} = d^{(k)}, \quad c^{(k+1)} = b^{(k)}
&&\text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F\!\left(d^{(k)}\right) \le F\!\left(b^{(k)}\right) \\[4pt]
&a^{(k+1)} = d^{(k)}, \quad b^{(k+1)} = b^{(k)}, \quad c^{(k+1)} = c^{(k)}
&&\text{if } a^{(k)} < d^{(k)} < b^{(k)} \text{ and } F\!\left(d^{(k)}\right) > F\!\left(b^{(k)}\right) \\[4pt]
&a^{(k+1)} = b^{(k)}, \quad b^{(k+1)} = d^{(k)}, \quad c^{(k+1)} = c^{(k)}
&&\text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F\!\left(d^{(k)}\right) \le F\!\left(b^{(k)}\right) \\[4pt]
&a^{(k+1)} = a^{(k)}, \quad b^{(k+1)} = b^{(k)}, \quad c^{(k+1)} = d^{(k)}
&&\text{if } b^{(k)} < d^{(k)} < c^{(k)} \text{ and } F\!\left(d^{(k)}\right) > F\!\left(b^{(k)}\right)
\end{aligned}
\]

The figure below shows one of the possible cases.

 

It is useful to determine the ends of the interval, a^(0) and c^(0), at the start of the one dimensional search by a procedure such as one of those described above. The third argument value is best positioned at the center of the interval. It can be seen from the recursion formulae that only one value of the objective function needs to be obtained at each iteration, at the current position of d^(k). The three points clearly do not, in general, remain equidistant. When the interpolation formula predicts a minimum that coincides to the desired accuracy with one of the argument values, the search can be terminated. As the result one will select the smallest objective function value from among the argument values at hand and the computed minimum. There is little prospect of success if the minimum is predicted to lie outside the interval [a^(k), c^(k)], at any rate when a bounding procedure of the type described above has been applied initially; in this case too the procedure is best terminated. The same holds if a negative denominator in the interpolation formula indicates a maximum, or if a point of inflection is expected because the denominator vanishes. If the measured objective function values are subject to error, for example in experimental optimization, special precautions must be taken; Hotelling treats this problem in detail.

How often the interpolation must be repeated until the minimum is sufficiently well approximated cannot be predicted in general; it depends on the level of agreement between the objective function and the trial function.

[Figure: an objective function F(x) with the fitted parabola P(x) through the points a^(k), d^(k), b^(k), c^(k); the minimum of P(x) is marked and the new triple a^(k+1), b^(k+1), c^(k+1) is indicated.]

Figure: Lagrangian quadratic interpolation

In the most favorable case the objective function is itself quadratic; then one iteration is sufficient. This is why it can be advantageous to use an interpolation method rather than an interval division method such as the optimal Fibonacci search. Dijkhuis describes a variant of the basic procedure in which four argument values are taken. The two inner ones, together with each of the outer ones in turn, are used for two separate quadratic interpolations. The weighted mean of the two results yields a new iteration point. This procedure is claimed to increase the reliability of the minimum search for non-quadratic objective functions.
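A single parabola step can be sketched as follows: it returns the vertex of the parabola through the three points, with the sign of the denominator distinguishing a minimum from a maximum, as discussed above (names are illustrative, and the exchange rules for building the next triple are left out):

```python
def quadratic_interpolation_step(f, a, b, c):
    """Vertex of the Lagrangian parabola through (a, f(a)), (b, f(b)),
    (c, f(c)) for a < b < c; a positive denominator signals a minimum."""
    fa, fb, fc = f(a), f(b), f(c)
    num = (c * c - b * b) * fa + (a * a - c * c) * fb + (b * b - a * a) * fc
    den = (c - b) * fa + (a - c) * fb + (b - a) * fc
    if den == 0:
        raise ArithmeticError("denominator vanishes: point of inflection")
    if den < 0:
        raise ArithmeticError("negative denominator: maximum predicted")
    return 0.5 * num / den
```

If the objective function is itself a parabola, this single step lands exactly on the minimum.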

Hermitian Interpolation

If one chooses, instead of a parabola, a third order polynomial as a trial function, more information is needed to make it agree with the objective function. Beveridge and Schechter describe such a cubic interpolation procedure. In place of four argument values with associated objective function values, two points a^(k) and b^(k) are enough if, in addition to the values of the objective function, values of its slope, i.e., the first order differentials, are available. This Hermitian interpolation is mainly used in conjunction with gradient or quasi-Newton methods, because these in any case require the partial derivatives of the objective function, or approximate them using finite difference methods.

The interpolation formula is

\[ c^{(k)} = a^{(k)} + \left( b^{(k)} - a^{(k)} \right)
\frac{z + w - F_x\!\left(a^{(k)}\right)}
     {F_x\!\left(b^{(k)}\right) - F_x\!\left(a^{(k)}\right) + 2w} \]

where

\[ z = \frac{3 \left( F\!\left(a^{(k)}\right) - F\!\left(b^{(k)}\right) \right)}{b^{(k)} - a^{(k)}}
     + F_x\!\left(a^{(k)}\right) + F_x\!\left(b^{(k)}\right) \]

and

\[ w = \sqrt{z^2 - F_x\!\left(a^{(k)}\right) F_x\!\left(b^{(k)}\right)} \]

Recursive exchange of the argument values takes place according to the sign of F_x(c^(k)), in a similar way to the Bolzano method. It should also be verified here that a^(k) and b^(k) always bound the minimum. Fletcher and Reeves use Hermitian interpolation in their conjugate gradient method, as a subroutine to approximate a relative minimum in specified directions. They terminate the iteration as soon as |a^(k) − b^(k)| ≤ ε.

As in all interpolation procedures, the speed and reliability of convergence depend on the degree of agreement between the model function and the objective function. Pearson, for example, reports that the Fibonacci search is superior to Hermitian interpolation if the objective function is logarithmic or a polynomial of high order. Guilfoyle, Johnson, and Wheatley propose a combination of cubic interpolation and either the Fibonacci search or the golden section.
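One cubic step can be sketched directly from the interpolation formula above; F and its slope dF are evaluated at the two bracket points only (names are illustrative, and the recursive exchange of argument values is omitted):

```python
import math

def cubic_interpolation_step(F, dF, a, b):
    """One Hermitian (cubic) interpolation step: fit a third-order
    polynomial to F and its slope dF at a and b, and return the
    stationary point of the fitted cubic inside the bracket."""
    Fa, Fb, dFa, dFb = F(a), F(b), dF(a), dF(b)
    z = 3.0 * (Fa - Fb) / (b - a) + dFa + dFb
    w = math.sqrt(z * z - dFa * dFb)   # real if [a, b] brackets a minimum
    return a + (b - a) * (z + w - dFa) / (dFb - dFa + 2.0 * w)
```

For a cubic objective function the prediction is exact in one step; for quadratics it reduces to the parabola vertex.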

Multidimensional Strategies

There have been frequent attempts to extend the basic ideas of one dimensional optimization procedures to several dimensions. The equidistant grid strategy, also known in the experimental field as the method of factorial design, places an evenly meshed grid over the space under consideration and evaluates the objective function at all the nodes. If n is the dimension of the space under consideration and N_i, i = 1, …, n, is the number of discrete values that the variable x_i can take, then the number of combinations to be tested is given by the product

\[ N = \prod_{i=1}^{n} N_i \]

If N_i = N' for all i = 1, …, n, one obtains N = N'^n. This exponential increase in the number of trials, and hence in the computational requirements, is what provoked Bellman's now famous curse of dimensions (see Wilde and Beightler). On a traditional computer, which works sequentially, the trials must all be carried out one after another. The computation time therefore increases as O(c^n), in which the constant c depends on the required accuracy and the size of the interval to be investigated.
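The combinatorial count above is trivial to state in code, which makes the exponential growth concrete (an illustrative one-liner, not part of the original text):

```python
def grid_trials(points_per_axis):
    """Total number of grid-method trials: the product of the N_i over
    all n axes -- Bellman's curse of dimensions in one line."""
    total = 1
    for n_i in points_per_axis:
        total *= n_i
    return total
```

With only ten values per axis, six variables already demand a million objective function evaluations.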

Proceeding sequentially brought considerable advantages in the one dimensional case if it could only be assumed that the objective function was unimodal. Krolak and Cooper (see also Krolak; Sugie) have given an extension of the Fibonacci search scheme to several dimensions. For n = 2, two points are chosen on one of the two coordinate axes, within the given interval, in the same way as for the usual one dimensional Fibonacci search. The values of this variable are first held constant while two complete Fibonacci searches are made to find the relative optima with respect to the second variable. Both end results are then used to reject one of the values of the first variable that were held constant and to reduce the size of the interval with respect to this parameter. By analogy, a three dimensional minimization consists of a recursive sequence of two dimensional Fibonacci searches. If the number of function calls needed to reduce the uncertainty interval [a_i, b_i] sufficiently with respect to the variable x_i is N_i, then the total number N also obeys the product formula above. The advantage compared to the grid method is simply that N_i depends logarithmically on the ratio of initial interval size to accuracy. Aside from the fact that each variable must be suitably fixed in advance, and that the unimodality requirement on the objective function only guarantees that local minima are approached, there is furthermore no guarantee that a desired accuracy will be reached within a finite number of objective function calls (Kaupe).

Other elimination procedures have been extended in a similar way to the multivariable case, such as, for example, the dichotomous search (Wilde) and a sequential boxing-in method (Berman). In each case the effort rises exponentially with the number of variables. Another elimination concept for the multidimensional case, the method of contour tangents, is due to Wilde (see also Beamer and Wilde). It requires, however, the determination of gradient vectors. Newman indicates how to proceed in the two dimensional case, and also for discrete values of the variables (lattice search). He requires that F(x) be convex and unimodal. Then the cost should only increase linearly with the number of variables. For higher dimensions, however, no applications of the contour tangent method are as yet known.

Transferring interpolation methods to the n-dimensional case means transforming the original minimum problem into a series of problems in the form of a set of equations to be solved. As non-linear equations can only be solved iteratively, this procedure is limited to the special case of linear interpolation with quadratic objective functions. Practical algorithms based on the regula falsi iteration can be found in Schmidt and Schwetlick, and in Schwetlick. The procedure is not widely used as a minimization method (Schmidt and Vetters). The slopes of the objective function that it requires are implicitly calculated from function values. The secant method described by Wolfe for solving a system of non-linear equations also works without derivatives of the functions: from the current argument values it extracts the required information about the structure of the n equations.

Just as the transition from simultaneous to sequential one-dimensional search methods reduces the effort required at the expense of global convergence, so each further acceleration in the multidimensional case is bought by a reduction in reliability. High convergence rates are achieved by gathering more information and interpreting it in the form of a model of the objective function. If assumptions and reality agree, then this procedure is successful; if they do not agree, then extrapolations lead to worse predictions, and possibly even to the abandonment of an optimization strategy. The accompanying figure shows the contour diagram of a smooth two-parameter objective function.

All the strategies to be described assume a degree of smoothness in the objective function. They do not converge with certainty to the global minimum, but at best to one of the local minima, or sometimes only to a saddle point.


[Figure: Contour lines of a two-parameter function F(x_1, x_2). Legend: a - global minimum; b - local minimum; c - local maxima; d, e - saddle points.]

Various methods are distinguished according to the kind of information they need, namely:

  - Direct search methods, which only need objective function values F(x)

  - Gradient methods, which also use the first partial derivatives ∇F(x) (first order strategies)

  - Newton methods, which in addition make use of the second partial derivatives ∇²F(x) (second order strategies)

The emphasis here will be placed on derivative-free strategies, that is, on direct search methods and on such higher order procedures as glean their required information about derivatives from a sequence of function values. The recursion scheme of most multidimensional strategies is based on the formula

    x^(k+1) = x^(k) + s^(k) v^(k)

They differ from each other with regard to the choice of the step length s^(k) and the search direction v^(k), the former being a scalar and the latter a vector of unit length.
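As a minimal illustration of this recursion (not taken from any of the strategies discussed here), the following sketch chooses v^(k) at random and keeps s^(k) fixed, accepting only improvements; all names and parameter values are arbitrary choices for the example.

```python
# The recursion x^(k+1) = x^(k) + s^(k) v^(k) as a generic driver: a strategy
# is anything that proposes a unit direction v and a step length s from the
# history of function values. Here: random unit direction, fixed step length,
# and only improvements are accepted.

import random

def hill_climb(f, x, s=0.1, iters=2000, seed=1):
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iters):
        v = [rng.gauss(0, 1) for _ in x]           # random direction ...
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]                  # ... normalized to unit length
        trial = [a + s * b for a, b in zip(x, v)]  # x + s v
        ft = f(trial)
        if ft < fx:                                # accept improvements only
            x, fx = trial, ft
    return x, fx

x, fx = hill_climb(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, [0.0, 0.0])
```

With a fixed step length the final accuracy is limited to the order of s; the strategies that follow differ precisely in how they adapt s^(k) and v^(k).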

Direct Search Strategies

Direct search strategies do without constructing a model of the objective function. Instead, the directions, and to some extent also the step lengths, are fixed heuristically, or by a scheme of some sort (not always in an optimal way), under the assumption of a specified internal model. Thus the risk is run of not being able to improve the objective function value at each step. Failures must accordingly be planned for, if something can also be learned from them. This trial character of search strategies has earned them the name of trial-and-error methods. The most important of them that are still in current use will be presented in the following chapters. Their attraction lies not in theoretical proofs of convergence and rates of convergence, but in their simplicity and the fact that they have proved themselves in practice. In the case of convex or quadratic, unimodal objective functions, however, they are generally inferior to the first and second order strategies to be described later.

Coordinate Strategy

The oldest of the multidimensional search procedures trades under a variety of names, e.g., successive variation of the variables, relaxation, parallel axis search, univariate or univariant search, one-variable-at-a-time method, axial iteration technique, cyclic coordinate ascent method, alternating variable search, sectioning method, Gauss-Seidel strategy, and it manifests itself in a large number of variations.

The basic idea of the coordinate strategy, as it will be called here, comes from linear algebra and was first put into practice by Gauss and Seidel in the single step relaxation method of solving systems of linear equations (see Ortega and Rockoff; Ortega and Rheinboldt; Van Norton; Schwarz, Rutishauser, and Stiefel). As an optimization strategy it is attributed to Southwell, or Friedmann and Savage (see also D'Esopo; Zangwill; Zadeh; Schechter).

The parameters in the iteration formula are varied in turn individually, i.e., the search directions are fixed by the rule

    v^(k) = e_i,  with  i = { n,              if k + 1 = p n, p integer
                            { (k + 1) mod n,  otherwise

where e_i is the unit vector whose components have the value zero, except for the i-th, which is unity. In its simplest form the coordinate strategy uses a constant step length s^(k). Since, however, the direction to the minimum is unknown, both positive and negative values of s^(k) must be tried. In a first and easy improvement on the basic procedure, a successful step is followed by further steps in the same direction until a worsening of the objective function is noted. It is clear that the choice of step length strongly influences the number of trials required on the one hand, and the accuracy that can be achieved in the approximation on the other.

One can avoid the problem of the choice of step length most effectively by using a line search method each time to locate the relative optimum in the chosen direction. Besides the interval division methods (the Fibonacci search and the golden section), Lagrangian interpolation can also be used, since all these procedures work without knowledge of the partial derivatives of the objective function. A further strategy for boxing in the minimum must be added in order to establish a suitable starting interval for each one-dimensional minimization.


The algorithm can be described as follows:

Step 0: (Initialization)
  Establish a starting point x^(0,0) and choose an accuracy bound ε > 0 for
  the one-dimensional search.
  Set k = 0 and i = 1.

Step 1: (Boxing in the minimum)
  Starting from x^(k,i-1), with an initial step length s = s_min, box in the
  minimum in the direction e_i.
  Double the step length at each successful trial, as long as s < s_max.
  Define the interval limits a_i^(k) and b_i^(k).

Step 2: (Line search)
  By varying x_i within the interval [a_i^(k), b_i^(k)], search for the
  relative minimum x' with the required accuracy (line search with an
  interval division procedure or an iterative interpolation method):

    F(x') = min {F(x^(k,i-1) + s e_i)}
             s

Step 3: (Check for improvement)
  If F(x') < F(x^(k,i-1)), then set x^(k,i) = x';
  otherwise set x^(k,i) = x^(k,i-1).

Step 4: (Inner loop over all coordinates)
  If i < n, increase i ← i + 1 and go to step 1.

Step 5: (Termination criterion, outer iteration loop)
  Set x^(k+1,0) = x^(k,n).
  If |x_i^(k+1,0) - x_i^(k,0)| ≤ ε for all i = 1(1)n, then end the search;
  otherwise increase k ← k + 1, set i = 1, and go to step 1.
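A compressed sketch of these steps might look as follows; the boxing-in and the subsequent interval division search are collapsed here into one crude doubling line search, and all parameter values are arbitrary choices rather than recommendations.

```python
# Minimal coordinate strategy (Gauss-Seidel search): cycle through the
# coordinate axes, minimizing along each by a crude bracketing line search.

def line_minimize(f, x, i, s_min=1e-3, s_max=1.0):
    """Crude line search along axis i: try both signs, double on success."""
    best = f(x)
    for sign in (+1, -1):
        s = s_min
        while s <= s_max:
            trial = list(x)
            trial[i] = x[i] + sign * s
            ft = f(trial)
            if ft < best:
                x, best = trial, ft
                s *= 2            # keep going in the successful direction
            else:
                break
    return x, best

def coordinate_strategy(f, x0, eps=1e-4, max_iter=200):
    x = list(x0)
    for _ in range(max_iter):
        x_old = list(x)
        for i in range(len(x)):   # one cycle through all coordinates
            x, _ = line_minimize(f, x, i)
        if all(abs(a - b) <= eps for a, b in zip(x, x_old)):
            break                 # no coordinate changed significantly
    return x

x = coordinate_strategy(lambda v: (v[0] - 3) ** 2 + 2 * (v[1] + 1) ** 2,
                        [0.0, 0.0])
```

On this separable quadratic the cycles converge quickly; on non-separable functions the zigzagging described in the text sets in.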

The accompanying figure shows a typical sequence of iteration points for n = 2 variables.

In theory, the coordinate strategy always converges if F(x) has continuous partial derivatives and the line searches are carried out exactly (Kowalik and Osborne; Schechter; Ortega and Rheinboldt). Rapid convergence is only achieved, however, if the contour surfaces F(x) = const are approximately concentric surfaces of a hypersphere, or, in the case of elliptic contours, if the principal axes almost coincide with the coordinate axes.

If the significant system parameters influence each other (non-separable objective function), then the distances covered by each line search are small without the minimum being within striking distance. This is especially true when the number of variables is large. At discontinuities in the objective function it may happen that improvements in the objective function can only be made in directions that are not along a coordinate axis. In this case the coordinate strategy fails. There is a similar outcome at steep valleys of a continuously differentiable function, i.e., if the step lengths that would enable successful


[Figure: Coordinate strategy. A typical sequence of iteration points for n = 2, with a table of the iteration index k, direction index i, and variable values at each numbered point.]

convergence are so small that the number of significant figures to which data are handled by the computer is insufficient for the variables to be significantly altered.

Numerical tests with the coordinate strategy show that an exact determination of the relative minima is unnecessary, at least at distances far from the objective. It can even happen that one inaccurate line search makes the next one particularly effective. This phenomenon is exploited in the procedures known as under- or overrelaxation (Engeli, Ginsburg, Rutishauser, and Stiefel; Varga; Schechter; Cryer). Although the relative optimum is determined as before, either an increment is added on in the same direction, or an iteration point is defined on the route between the start and finish of the one-dimensional search. The choice of the under- or overrelaxation factor requires assumptions about the structure of the problem. The necessary information is available for the problem of solving systems of linear equations with a positive definite matrix of coefficients, but not for general optimization problems.

Further possible variations of the coordinate strategy are obtained if the sequence of searches parallel to the axes is not made to follow the cyclic scheme. Southwell, for example, always selects either the direction in which the slope of the objective function, ∂F(x)/∂x_i, is greatest, or the direction in which the largest step can be taken. To evaluate the choice of direction, Synge uses the ratio of first to second partial derivatives, (∂F/∂x_i)/(∂²F/∂x_i²), at the point x^(k). Whether or not the additional effort for this scheme is worthwhile depends on the particular topology of the contour surface. Adding directions other than those parallel to the axes is also often found to accelerate the convergence (Pinsker and Tseitlin; Elkin).

Its great simplicity has always made the coordinate strategy attractive, despite its sometimes slow convergence. Rules for handling constraints (not counting here penalty function methods) have been devised, for example, by Singer, Murata, and Mugele. Singer's maze method departs from the coordinate directions as soon as a constraint is violated, and progresses into the feasible region or along the boundary. For this, however, the gradient of the active constraints must be known. Mugele's poor man's optimizer, a discrete coordinate strategy without line searches, not only handles active constraints but can also cope with narrow valleys that do not run parallel to the coordinate axes. In this case diagonal steps are permitted. Similar to this strategy is the direct search method of Hooke and Jeeves, which, because it has become very widely used, will be treated in detail in the following chapter.

Strategy of Hooke and Jeeves: Pattern Search

The direct pattern search of Hooke and Jeeves was originally devised as an automatic experimental strategy (see Hooke; Hooke and Van Nice). It is nowadays much more widely used as a numerical parameter optimization procedure.

The method by which the direct pattern search works is characterized by two types of move. At each iteration there is an exploratory move, which represents a simplified Gauss-Seidel variation with one discrete step per coordinate direction. No line searches are made. On the assumption that the line joining the first and last points of the exploratory move represents an especially favorable direction, an extrapolation is made along it (pattern move) before the variables are varied again individually. The extrapolations do not necessarily lead to an improvement in the objective function value. The success of the iteration is only checked after the following exploratory move. The length of the pattern step is thereby increased each time, while the optimal search direction only changes gradually. This pays off to most advantage where there are narrow valleys. An ALGOL implementation of the strategy is due to Kaupe. It was improved by Bell and Pike, as well as by Smith (see also DeVogelaere; Tomlin and Smith). In the first case, the sequence of plus and minus exploratory steps in the coordinate directions is modified to suit the conditions at any instant. The second improvement aims at permitting a retrospective scaling of the variables, as the step lengths can be chosen individually to be different from each other.

The algorithm runs as follows:

Step 0: (Initialization)
  Choose a starting point x^(0,0) = x^(-1,n), an accuracy bound ε > 0, and
  initial step lengths s_i^(0) for all i = 1(1)n (if no more plausible values
  are at hand, the same for all i).
  Set k = 0 and i = 1.

Step 1: (Exploratory move)
  Construct x' = x^(k,i-1) + s_i^(k) e_i (discrete step in the positive
  direction).
  If F(x') < F(x^(k,i-1)), go to step 2 (successful first trial);
  otherwise replace x' = x^(k,i-1) - s_i^(k) e_i (discrete step in the
  negative direction).
  If F(x') < F(x^(k,i-1)), go to step 2 (success);
  otherwise replace x' = x^(k,i-1) (back to the original situation).

Step 2: (Retention and switch to the next coordinate)
  Set x^(k,i) = x'.
  If i < n, increase i ← i + 1 and go to step 1.

Step 3: (Test for total failure in all directions)
  If F(x^(k,n)) = F(x^(k,0)), set x^(k+1,0) = x^(k,n) and go to step 9.

Step 4: (Pattern move)
  Set x^(k+1,0) = 2 x^(k,n) - x^(k-1,n) (extrapolation)
  and s_i^(k+1) = s_i^(k) sign(x_i^(k,n) - x_i^(k-1,n)) for all i = 1(1)n.
  This may change the sequence of positive and negative directions in the
  next exploratory move.
  Increase k ← k + 1 and set i = 1.
  Observe: there is no success control of the pattern move so far.

Step 5: (Exploration after the extrapolation)
  Construct x' = x^(k,i-1) + s_i^(k) e_i.
  If F(x') < F(x^(k,i-1)), go to step 6;
  otherwise replace x' = x^(k,i-1) - s_i^(k) e_i.
  If F(x') < F(x^(k,i-1)), go to step 6;
  otherwise replace x' = x^(k,i-1).

Step 6: (Inner loop over the coordinates)
  Set x^(k,i) = x'.
  If i < n, increase i ← i + 1 and go to step 5.

Step 7: (Test for failure of the pattern move)
  If F(x^(k,n)) ≥ F(x^(k-1,n)) (back to the position before the pattern move),
  set x^(k+1,0) = x^(k-1,n) and s_i^(k+1) = s_i^(k) for all i = 1(1)n, and
  go to step 10.

Step 8: (After a successful pattern move: retention and first termination test)
  If |x_i^(k,n) - x_i^(k-1,n)| ≤ |s_i^(k)| for all i = 1(1)n,
  set x^(k+1,0) = x^(k,n) and go to step 9;
  otherwise go to step 4 for another pattern move.

Step 9: (Step size reduction and termination test)
  If |s_i^(k)| ≤ ε for all i = 1(1)n, end the search with result x^(k+1,0);
  otherwise set s_i^(k+1) = s_i^(k)/2 for all i = 1(1)n.

Step 10: (Iteration loop)
  Increase k ← k + 1, set i = 1, and go to step 1.
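The interplay of exploratory and pattern moves can be condensed into the following sketch. It simplifies the bookkeeping of the algorithm above (no sign memory for the step lengths, success control folded into the loop), so it illustrates the principle rather than reproducing Hooke and Jeeves' procedure exactly; the test function and step values are arbitrary.

```python
# Compact sketch of the Hooke-Jeeves pattern search: exploratory moves of
# fixed step length along the axes, followed by a pattern (extrapolation)
# move through the best point found; step lengths are halved when neither
# kind of move succeeds.

def explore(f, base, fbase, steps):
    """One exploratory move: try +/- one discrete step per coordinate."""
    x = list(base)
    for i, s in enumerate(steps):
        for trial in (x[i] + s, x[i] - s):
            old = x[i]
            x[i] = trial
            ft = f(x)
            if ft < fbase:
                fbase = ft        # keep the improving step
                break
            x[i] = old            # undo the failed step
    return x, fbase

def hooke_jeeves(f, x0, step=0.5, eps=1e-6):
    base, fb = list(x0), f(x0)
    steps = [step] * len(x0)
    while max(steps) > eps:
        x, fx = explore(f, base, fb, steps)
        if fx < fb:
            while True:           # repeated pattern moves while they pay off
                pattern = [2 * b - a for a, b in zip(base, x)]
                base, fb = x, fx
                x, fx = explore(f, pattern, fb, steps)
                if fx >= fb:      # pattern move failed: fall back to base
                    break
        else:
            steps = [s / 2 for s in steps]   # total failure: halve the steps
    return base, fb

x, fmin = hooke_jeeves(lambda v: (v[0] - 1) ** 2 + 5 * (v[1] - 2) ** 2,
                       [0.0, 0.0])
```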

The figure, together with the following table, presents a possible sequence of iteration points. From the starting point, a successful step is taken in each coordinate direction. Since the end point of this exploratory move is better than the starting point, it serves as a basis for the first extrapolation. It is not checked at this stage whether any improvement over the previous best point has occurred. At the next exploratory move, the objective function value can only be improved in one coordinate direction. It is now checked whether the condition reached is better than that before the pattern move; this is the case. The next extrapolation step has a changed direction, because of the partial failure of the exploration, but maintains its increased length. Now it will be assumed that, starting from the new point with the hitherto constant exploratory step length, no success is scored in any coordinate direction. The comparison with the state before the pattern move shows that a reduction in the value of the objective function has nevertheless occurred. Thus the next extrapolation remains the same as the previous one with respect to direction and step length. The following exploratory move leads to a point which, although better than its own starting point, is worse than the comparison point; now there is a return to the latter. Only after the exploration again has no success here are the step lengths halved, in order to make further progress possible. The fact that at some points in this example the objective function is tested several times is not typical for larger n.

[Figure: Strategy of Hooke and Jeeves. A possible sequence of iteration points, marking the starting point, successes, failures, extrapolations, and the final point.]


[Table: the sequence of trials for the pattern search example, with columns for the numbering of the points, iteration index k, direction index i, variable values x_1 and x_2, comparison point, step lengths s_1 and s_2, and remarks (starting point, success, failure, extrapolation, return, step lengths halved).]

A proof of convergence of the direct search of Hooke and Jeeves has been derived by Céa; it is valid under the condition that the objective function F(x) is strictly convex and continuously differentiable. The computational operations are very simple, and even in unforeseen circumstances they cannot lead to invalid arithmetical manipulations, such as, for example, division by zero. A further advantage of the strategy is its small storage requirement, which is of order O(n). The selected pattern accelerates the search in valleys, provided they are not sharply bent. The extrapolation steps follow, in an approximate way, the gradient trajectory. However, the limitation of the trial steps to coordinate directions can also lead to a premature termination of the search, here as in the coordinate strategy.

Further variations on the method, which have not achieved such popularity, are due, among others, to Wood (see also Weisman and Wood; Weisman, Wood, and Rivlin), Emery and O'Hagan (spider method), Fend and Chandler (moment rosetta search), Bandler and MacDonald (razor search; see also Bandler), Pierre (bunny-hop search), Erlicki and Appelbaum, and Houston and Huffman. A more detailed enumeration of older methods can be found in Lavi and Vogl. Some of these modifications allow constraints in the form of inequalities to be taken into account directly. Similar to them is a program designed by M. Schneider (see Drenick). Aside from the fact that, in order to use it, one must specify which of the variables enter the individual constraints, it does not appear to work very effectively. Excessively long computation times and inaccurate results, especially with many variables, made it seem reasonable to omit M. Schneider's procedure from the strategy comparison (see the comparison of strategies in a later chapter). The problem of how to take constraints into account in a direct search has also been investigated by Klingman and Himmelblau, and by Glass and Cooper. The resulting methods transform the original problem to a greater or lesser extent. They have nowadays been superseded by the general penalty function methods. Automatic optimizers for the on-line optimization of chemical processes, which once were well known under the names Opcon (Bernard and Sonderquist) and Optimat (Weiss, Archer, and Burt), also apply modified versions of the direct search method. Another application is described by Sawaragi et al.

Strategy of Rosenbrock: Rotating Coordinates

Rosenbrock's idea was to remove the limitation on the number of search directions in the coordinate strategy, so that the search steps can move parallel to the axes of a coordinate system that can rotate in the space IR^n. One of the axes is set to point in the direction that appears most favorable. For this purpose, the experience of successes and failures gathered in the course of the iterations is used, in the manner of Hooke and Jeeves' direct search. The remaining directions are fixed normal to the first and mutually orthogonal.

To start with, the search directions comprise the unit vectors

    v_i^(0) = e_i   for all i = 1(1)n

Starting from the point x^(0), a trial is made in each direction with the discrete initial step lengths s_i^(0) for all i = 1(1)n. When a success is scored (including equality of the objective function values), the changed variable vector is retained and the step length is multiplied by a positive factor α > 1; for a failure, the vector of variables is left unchanged and the step length is multiplied by a negative factor β. Rosenbrock proposes the choice α = 3 and β = -0.5. This process is repeated until at least one success followed (not necessarily immediately) by a failure is registered in each direction. As a rule, several cycles must be run through, because if there is a failure in any particular direction, the opposite direction is not tested in the same cycle. Following this first part of the search, the coordinate axes are rotated. Rosenbrock uses for this the orthogonalization procedure of Gram and Schmidt (see, for example, Birkhoff and MacLane; Rutishauser; Nake). The recursion formulae are as follows:

    v_i^(k+1) = w_i / ||w_i||   for all i = 1(1)n

where

    w_i = a_i                                             for i = 1

                i-1
    w_i = a_i -  Σ  (a_i^T v_j^(k+1)) v_j^(k+1)           for i = 2(1)n
                j=1

and

           n
    a_i =  Σ  d_j^(k) v_j^(k)   for all i = 1(1)n
          j=i
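In code, the construction of the a_i and the Gram-Schmidt step read, for instance, as follows. This is a plain-Python sketch with hypothetical names; it presupposes, as the text requires, that at least one success was scored in every direction, so that no w_i vanishes.

```python
# Direction update by Gram-Schmidt, following the recursion above: the a_i
# are cumulative progress vectors built from the distances d_j moved along
# the old directions v_j, and the new v_i are obtained by orthonormalizing
# the a_i.

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def rosenbrock_rotate(v, d):
    """v: list of n orthonormal direction vectors; d: distances moved."""
    n = len(v)
    # a_i = sum_{j=i..n} d_j v_j  (a_1 is the total progress of the iteration)
    a = [[sum(d[j] * v[j][c] for j in range(i, n)) for c in range(n)]
         for i in range(n)]
    new_v = []
    for i in range(n):
        w = list(a[i])
        for u in new_v:              # subtract the projections onto the
            p = dot(a[i], u)         # directions already fixed
            w = [wc - p * uc for wc, uc in zip(w, u)]
        norm = dot(w, w) ** 0.5      # nonzero if all d_i are nonzero
        new_v.append([wc / norm for wc in w])
    return new_v

v = rosenbrock_rotate([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0])
# v[0] now points along the total progress (3, 4) / 5
```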

k  k 

A scalar d represents the distance covered in direction v in the k th iteration

i i

k 

p oints in the overall successful direction of the step k It is exp ected that a Thus v



particularly large search step can b e taken in this direction at the next iteration The

requirementofwaiting for at least one success in each direction has the eect that no

k 

direction is lost and the v always span the full ndimensional Euclidean space The

i

termination rule or convergence criterion is determined by the lengths of the vectors

k  k  k 

a and a Before each orthonormalization there is a test whether ka k and

  

k  k 

ka k ka k When this condition is satised in six consecutive iterations the

 

search is ended The second condition is designed to ensure that a premature termination

of the search do es not o ccur just b ecause the distances covered have b ecome small More

signicantly the requirement is also that the main success direction changes suciently

rapidly something that Rosenbro ck regards as a sure sign of the proximity of a minimum

As the strategy comparison will show see Chap this requirement is often to o strong

It even hinders the ending of the pro cedure in manycases

In his original publication, Rosenbrock has already given detailed rules as to how inequality constraints can be treated. His procedure for doing this can be viewed as a partial penalty function method, since the objective function is only altered in the neighborhood of the boundaries. Immediately after each variation of the variables, the objective function value is tested. If the comparison is unfavorable, a failure is registered, as in the unconstrained case. For equality or an improvement, however, if the iteration point lies near a boundary of the region, the success criterion changes. For example, for constraints of the form G_j(x) ≥ 0 for all j = 1(1)m, the extended objective function F~(x) takes the form (this is one of several suggestions of Rosenbrock)

                   m
    F~(x) = F(x) + Σ  (f_j - F(x)) φ_j(x)
                  j=1

in which

    φ_j(x) = 0                           if G_j(x) ≥ δ
    φ_j(x) = 3 λ_j - 4 λ_j² + 2 λ_j³     if 0 < G_j(x) < δ
    φ_j(x) = 1                           if G_j(x) ≤ 0

and

    λ_j = 1 - G_j(x) / δ

The auxiliary quantity f_j is the value of the objective function belonging to the last success of the search that did not fall in the region of the j-th boundary. As a reasonable width for the boundary zone, Rosenbrock sets δ proportional to b_j(x^(k)) - a_j(x^(k)) for constraints of the form a_j(x) ≤ G_j(x) ≤ b_j(x); this kind of double-sided bounding is not always given, however. The basis of the procedure is fully described in Rosenbrock and Storey.

Using the notations:

    x_i   object variables
    s_i   step sizes
    v_i   direction components
    d_i   distances travelled
    λ_i   success/failure indicators

the extended algorithm of the strategy runs as follows:

Step 0: (Initialization)
  Choose a starting point x^(0,0) such that G_j(x^(0,0)) > 0 for all j = 1(1)m.
  Choose an accuracy parameter ε > 0.
  Set v_i^(0) = e_i for all i = 1(1)n.
  Set k = 0 (outer loop counter) and ℓ = 1 (inner loop counter).
  If there are constraints (m > 0), set f_j = F(x^(0,0)) for all j = 1(1)m.

Step 1: (Initialization of the step sizes, distances travelled, and indicators)
  Set s_i^(k,1) to the initial step lengths,
  d_i^(k) = 0, and λ_i^(k) = 0 for all i = 1(1)n.
  Set ℓ = 1 and i = 1.

Step 2: (Trial step)
  Construct x' = x^(k,n(ℓ-1)+i-1) + s_i^(k,ℓ) v_i^(k).
  If F(x') ≤ F(x^(k,n(ℓ-1)+i-1)):
    if m = 0, go to step 5 (success);
    if m > 0, set F' = F(x'), set j = 1, and continue with step 3.
  Otherwise go to step 6 (failure).

Step 3: (Test of feasibility)
  If G_j(x') ≤ 0, go to step 6 (infeasible);
  if G_j(x') ≥ δ (outside the boundary zone), set f_j = F(x') and go to step 4;
  otherwise replace F' by the extended objective function value as in the
  penalty expression above; if F' ≥ F(x^(k,n(ℓ-1)+i-1)), go to step 6.

Step 4: (Constraints loop)
  If j < m, increase j ← j + 1 and go to step 3.

Step 5: (Store the success and update the internal memory)
  Set x^(k,n(ℓ-1)+i) = x', s_i^(k,ℓ+1) = α s_i^(k,ℓ), and replace
  d_i^(k) ← d_i^(k) + s_i^(k,ℓ).
  If λ_i^(k) = 0, set λ_i^(k) = 1.
  Go to step 7.

Step 6: (Internal memory update in the case of a failure)
  Set x^(k,n(ℓ-1)+i) = x^(k,n(ℓ-1)+i-1) and s_i^(k,ℓ+1) = β s_i^(k,ℓ).
  If λ_i^(k) = 1, set λ_i^(k) = 2.

Step 7: (Main loop)
  If λ_j^(k) = 2 for all j = 1(1)n (at least one success followed by a failure
  in every direction), go to step 8;
  otherwise:
    if i < n, increase i ← i + 1;
    if i = n, increase ℓ ← ℓ + 1 and set i = 1.
  Go to step 2.

Step 8: (Preparation for the orthogonalization and check for termination)
  Set
                            n
    x^(k+1,0) = x^(k,0) +   Σ  d_j^(k) v_j^(k)
                           j=1

  Construct the vectors

           n
    a_i =  Σ  d_j^(k) v_j^(k)   for all i = 1(1)n
          j=i

  (a_1 is the total progress during the loop just finished).
  If ||a_1^(k)|| < ε and ||a_1^(k)|| < ||a_1^(k-1)||, increase the count of
  consecutive iterations for which the test has held; otherwise reset the
  count to zero.
  If the test has held in six consecutive iterations, end the search.

Step 9: (Orthogonalization)
  If n > 1, construct new direction vectors v_i^(k+1) for i = 1(1)n according
  to the recursion formula of the Gram-Schmidt orthogonalization given above.
  Increase k ← k + 1 and go to step 1.


[Figure: Strategy of Rosenbrock. A few iterations for n = 2, showing the starting point, successes, failures, the overall success of each iteration, and the rotated direction vectors v_1^(0), v_2^(0); v_1^(1), v_2^(1); v_1^(2), v_2^(2).]


[Table: the sequence of trials for the Rosenbrock example, with columns for the numbering of the points, iteration index k, test index within the iteration, variable values x_1 and x_2, step lengths s_1 and s_2, and remarks (starting point, success, failure, transformation and orthogonalization).]

In the figure, together with the accompanying table, a few iterations of the Rosenbrock strategy for n = 2 are represented geometrically. At the starting point x^(0,0), the search directions are the same as the unit vectors. After three runs through the trial loop, the trial steps in each direction have led to a success followed by a failure. At the best condition thus attained, x^(1,0), new direction vectors v_1^(1) and v_2^(1) are generated. Five further trials lead to the best point x^(2,0) of the second iteration, at which a new choice of directions is again made. The complete sequence of steps can be followed, if desired, with the help of the accompanying table.

Numerical experiments show that within a few iterations the rotating coordinates become oriented such that one of the axes points along the gradient direction. The strategy thus allows sharp valleys in the topology of the objective function to be followed. Like the method of Hooke and Jeeves, Rosenbrock's procedure needs no information about partial derivatives and uses no line search method for the exact location of relative minima. This makes it very robust. It has, however, one disadvantage compared to the direct pattern search: the orthogonalization procedure of Gram and Schmidt is very costly. It requires storage space of order O(n²) for the matrices A = {a_ij} and V = {v_ij}, and the number of computational operations even increases with O(n³). At least in cases where the objective function call costs relatively little, the computation time for the orthogonalization with many variables becomes highly significant. Besides this, the number of parameters is in any case limited by the high storage space requirement.

If there are constraints, care must be taken to ensure that the starting point is inside the allowed region and sufficiently far from the boundaries. Examples of the application of Rosenbrock's strategy can be found in Storey, and in Storey and Rosenbrock; among them is also a discretized functional optimization problem. For unconstrained problems there exists the code of Machura and Mulawa. The Gram-Schmidt orthogonalization has been programmed, for example, by Clayton.

Lange-Nielsen and Lance have proposed, on the basis of numerical experiments, two improvements to the Rosenbrock strategy. The first involves not setting constant step lengths at the beginning of a cycle or after each orthogonalization, but rather modifying them, and simultaneously scaling them, according to the successes and failures during the preceding cycle. The second improvement concerns the termination criterion: Rosenbrock's original version is replaced by the simpler condition that, within the achievable computational accuracy, several consecutive trials yield the same value of the objective function.

Strategy of Davies, Swann, and Campey (DSC)

A combination of the Rosenbrock idea of rotating coordinates with one-dimensional search methods is due to Swann. It has become known under the name Davies-Swann-Campey (abbreviated DSC) strategy. The description of the procedure given by Box, Davies, and Swann differs somewhat from that in Swann, and so several versions of the strategy have arisen in the subsequent literature. Preference is given here to the original concept of Swann, which exhibits some features in common with the method of conjugate directions of Smith (treated in a later section). Starting from x^(0), a line search is made in each of the unit directions v_i^(0) = e_i for all i = 1(1)n. This process is followed by a one-dimensional minimization in the direction of the overall success so far achieved,

    v_{n+1}^(0) = (x^(n) - x^(0)) / ||x^(n) - x^(0)||

with the result x^(n+1).

The orthogonalization follows this, e.g., by the Gram-Schmidt method. If one of the line searches was unsuccessful, the new set of directions would no longer span the complete parameter space. Therefore, only those old direction vectors along which a prescribed minimum distance has been moved are included in the orthogonalization process; the other directions remain unchanged. The DSC method, however, places a further hurdle before the coordinate rotation: if the distance covered in one iteration is smaller than the step length used in the line search, the latter is reduced by a factor and the next iteration is carried out with the old set of directions.

After an orthogonalization, one of the new directions (the first) coincides with that of the last line search of the previous iteration. This can therefore also be interpreted as the first minimization in the new coordinate system, and only n more one-dimensional searches need be made to finish the iteration. As a termination criterion, the DSC strategy uses the length of the total vector between the starting point and end point of an iteration. The search is ended when it is less than a prescribed accuracy bound.


The algorithm runs as follows:

Step 0: (Initialization)
  Specify a starting point x^(0,0) and an initial step length s^(0)
  (the same for all directions).
  Define an accuracy requirement epsilon > 0.
  Choose as a first set of directions v_i^(0) = e_i for all i = 1, ..., n.
  Set k = 0 and i = 1.

Step 1: (Line search)
  Starting from x^(k,i-1), seek the relative minimum x^(k,i)
  in the direction v_i^(k) such that
      F(x^(k,i)) = F(x^(k,i-1) + d_i^(k) v_i^(k)) = min_d { F(x^(k,i-1) + d v_i^(k)) } .

Step 2: (Main loop)
  If i < n, increase i to i + 1 and go to step 1.
  If i = n, go to step 3.
  If i = n + 1, go to step 4.

Step 3: (Eventually one more line search)
  Construct z = x^(k,n) - x^(k,0).
  If ||z|| > 0, set v_{n+1}^(k) = z / ||z||, i = n + 1, and go to step 1;
  otherwise set x^(k,n+1) = x^(k,n), d_{n+1}^(k) = 0, and go to step 4.

Step 4: (Check appropriateness of step length)
  If ||x^(k,n+1) - x^(k,0)|| > s^(k), go to step 6.

Step 5: (Termination criterion)
  Set s^(k+1) = a s^(k), with a fixed reduction factor a < 1.
  If s^(k+1) < epsilon, end the search;
  otherwise set x^(k+1,0) = x^(k,n+1),
  increase k to k + 1, set i = 1, and go to step 1.

Step 6: (Check appropriateness of orthogonalization)
  Reorder the directions v_i^(k) and associated distances d_i^(k) such that
      |d_i^(k)| > 0 for all i = 1, ..., p, and
      d_i^(k) = 0 for all i = p + 1, ..., n.
  If p = 1 (thus always for n = 1), go to step 5.

Step 7: (Orthogonalization)
  Construct new direction vectors v_i^(k+1) for i = 1, ..., p by means of the
  orthogonalization process of Gram and Schmidt.
  Set s^(k+1) = s^(k) and x^(k+1,0) = x^(k,n+1).
  Increase k to k + 1, set i = 1, and go to step 1.
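The direction-rotation step can be sketched in code. The following Python fragment is an illustrative sketch, not the DSC program itself; the function name and array layout are my own. It builds the classical Gram-Schmidt system whose first vector points along the overall progress of the iteration:

```python
import numpy as np

def rotate_directions(V, d):
    """Gram-Schmidt rotation of a direction set, as used in the
    Rosenbrock/DSC family of methods.

    V : (n, n) array whose rows are the old unit directions v_i
    d : length-n array of distances moved along each direction
    Returns an orthonormal set whose first vector points along the
    overall progress sum_j d_j v_j.
    """
    n = len(d)
    # a_i = sum over j >= i of d_j v_j  (classical construction)
    A = np.array([sum(d[j] * V[j] for j in range(i, n)) for i in range(n)])
    W = np.zeros_like(A)
    for i in range(n):
        # subtract projections onto the already orthonormalized vectors
        w = A[i] - sum((A[i] @ W[j]) * W[j] for j in range(i))
        W[i] = w / np.linalg.norm(w)
    return W
```

If some of the d_j are zero, the construction can produce zero vectors; this is exactly why the algorithm above includes only directions along which at least a minimum distance was moved.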

Hill climbing Strategies

No geometric representation has been attempted here, since the fine deviations from
the Rosenbrock method would hardly be apparent on a simple diagram.

The line search procedure of the DSC method has been described in detail by Box,
Davies, and Swann. It boxes in the minimum in the chosen direction using three
equidistant points and then applies a single Lagrangian quadratic interpolation. The
authors state that, in their experience, this is more economical with regard to the number
of objective function calls than an exact line search with a sequence of interpolations.
The algorithm of the line search is:

Step 0: (Initialization)
  Specify a starting point x^(0), a step length s, and a direction v
  (all given from the main program).

Step 1: (Step forward)
  Construct x^(1) = x^(0) + s v.
  If F(x^(1)) <= F(x^(0)), go to step 3.

Step 2: (Step backward)
  Replace s by -s and x^(1) by x^(0) + s v.
  If F(x^(1)) <= F(x^(0)), go to step 3;
  otherwise (both first trials without success), go to step 4.

Step 3: (Further steps)
  Replace s by 2 s and set x^(0) = x^(1).
  Construct x^(1) = x^(0) + s v.
  If F(x^(1)) <= F(x^(0)), repeat step 3.

Step 4: (Prepare the interpolation)
  Replace s by s/2.
  Construct x^(2) = x^(1) + s v.
  Of the four equidistant points just generated, reject the one which is
  furthest from the point that has the smallest value of the objective
  function.

Step 5: (Interpolation)
  Define the three available equidistant points x_1, x_2, x_3 and the
  associated function values F_1, F_2, and F_3 (this can be done in the course
  of the trial steps to box in the minimum). Fit a trial parabola through the
  three points and solve the necessary condition for its minimum. Because the
  argument values are equidistant, the general formula for the Lagrangian
  quadratic interpolation simplifies to

      x = x_2 + s (F_1 - F_3) / ( 2 (F_1 - 2 F_2 + F_3) ) .

  If the denominator vanishes, or if it turns out that F(x) >= F_2, then the
  line search ends with the result x = x_2 and F = F_2;
  otherwise with the result x and F = F(x).
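The interpolation formula of step 5 is easy to state in code. A minimal sketch follows; the function name and the None convention for a vanishing denominator are my own:

```python
def parabola_minimum(x2, s, f1, f2, f3):
    """Minimum of the parabola through three equidistant points
    (x2 - s, f1), (x2, f2), (x2 + s, f3).

    Returns None if the denominator vanishes, i.e., the three points
    are collinear and no parabola minimum exists.
    """
    denom = 2.0 * (f1 - 2.0 * f2 + f3)
    if denom == 0.0:
        return None
    return x2 + s * (f1 - f3) / denom
```

For F(x) = x^2 sampled at 0, 1, 2 the formula returns the exact minimizer 0, illustrating that a single interpolation suffices for a quadratic.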

A numerical strategy comparison by M. J. Box shows the method to be a very
effective optimization procedure, in general superior both to the Hooke and Jeeves and
the Rosenbrock methods. However, the tests only refer to smooth objective functions with
few variables. If the number of parameters is large, the costly orthogonalization process
makes its inconvenient presence felt also in the DSC strategy.

Several suggestions have been made to date as to how to simplify the Gram-Schmidt
procedure and to reduce its susceptibility to numerical rounding error (Rice; Powell;
Palmer; Golub and Saunders: Householder method).

Palmer replaces the conditions of the Gram-Schmidt construction by

    v_i^(k+1) = v_i^(k)   if sum_{j=i}^{n} d_j^2 = 0 ;  otherwise

    v_1^(k+1) = ( sum_{j=1}^{n} d_j v_j^(k) ) / sqrt( sum_{j=1}^{n} d_j^2 )

and

    v_i^(k+1) = ( d_{i-1} sum_{j=i}^{n} d_j v_j^(k) - v_{i-1}^(k) sum_{j=i}^{n} d_j^2 )
                / sqrt( ( sum_{j=i-1}^{n} d_j^2 ) ( sum_{j=i}^{n} d_j^2 ) )
                for i = 2, ..., n .

He shows that even if no success was obtained in one of the directions v_i^(k) (that is,
d_i = 0), the new vectors v_i^(k+1) for all i = 1, ..., n still span the complete parameter
space, because v_i^(k+1) is then set equal to v_i^(k). Thus the algorithm does not need
to be restricted to directions for which d_i != 0, as happens in the algorithm with
Gram-Schmidt orthogonalization.

The significant advantage of the revised procedure lies in the fact that the number
of computational operations remains only of the order O(n^2). The storage requirement
is also somewhat less, since one n x n matrix as an intermediate storage area is omitted.

For problems with linear constraints (equalities and inequalities), Box, Davies, and
Swann recommend a modification of the orthogonalization procedure that works in a
similar way to the method of projected gradients of Rosen (see also Davies). Nonlinear
constraints (inequalities) can be handled with the created response surface technique
devised by Carroll, which is one of the penalty function methods. Further publications
on the DSC strategy, also with comparison tests, are those of Swann and of Davies and
Swann. Hoshino observes that in a narrow valley the search causes zigzag movements.
His remedy for this is to add a further search, again in the direction v_1^(k), after each
set of n line searches. With the help of two examples he shows the accelerating effect
of this measure.

Simplex Strategy of Nelder and Mead

There is a group of methods called simplex strategies that work quite differently from
the direct search methods described so far. In spite of their common name, they have
nothing to do with the simplex method of linear programming of Dantzig. The
idea (Spendley, Hext, and Himsworth) originates in an attempt to reduce as much
as possible the number of simultaneous trials in the experimental identification procedure
of factorial design (see, for example, Davies). The minimum number, according to
Brooks and Mickey, is n + 1. Thus, instead of a single starting point, n + 1 vertices
are used. They are arranged so as to be equidistant from each other: for n = 2 in an
equilateral triangle, for n = 3 a tetrahedron, and in general a polyhedron, also referred to
as a simplex. The objective function is evaluated at all the vertices. The iteration rule is:
Replace the vertex with the largest objective function value by a new one situated at its
reflection in the midpoint of the other n vertices. This rule aims to locate the new point
at an especially promising place. If one lands near a minimum, the newest vertex can
also be the worst. In this case the second worst vertex should be reflected. If the edge
length of the polyhedron is not changed, the search eventually stagnates: the polyhedra
rotate about the vertex with the best objective function value. A closer approximation to
the optimum can then only be achieved by halving the edge lengths of the simplex. Spendley,
Hext, and Himsworth suggest doing this whenever a vertex is common to more than
a prescribed number of consecutive polyhedra. Himsworth holds that this strategy is
especially advantageous when the number of variables is large and the determination of
the objective function prone to error.

To this basic procedure, various modifications have been proposed by, among others,
Nelder and Mead; Box; Ward, Nag, and Dixon; and Dambrauskas. Richardson and
Kuester have provided a complete program. The most common version is that of Nelder
and Mead, in which the main difference from the basic procedure is that the size and
shape of the simplex is modified during the run to suit the conditions at each stage.

The algorithm, with an extension by O'Neill, runs as follows:

Step 0: (Initialization)
  Choose a starting point x^(0,0), initial step lengths s_i for all
  i = 1, ..., n (if no better scaling is known, the same for all directions),
  and an accuracy parameter epsilon > 0. Set c = 1 and k = 0.

Step 1: (Establish the initial simplex)
  x^(k,l) = x^(k,0) + c s_l e_l  for all l = 1, ..., n.

Step 2: (Determine worst and best points for the normal reflection)
  Determine the indices w (worst point) and b (best point) such that
      F(x^(k,w)) = max { F(x^(k,l)), l = 0, ..., n } ,
      F(x^(k,b)) = min { F(x^(k,l)), l = 0, ..., n } .
  Construct xbar = (1/n) sum_{l=0, l != w}^{n} x^(k,l)
  and x' = xbar + (xbar - x^(k,w))  (normal reflection).
  If F(x') < F(x^(k,b)), go to step 4.

Step 3: (Compare trial with other vertices)
  Determine the number mu of indices l, l = 0, ..., n, for which
  F(x') < F(x^(k,l)) holds.
  If mu > 1, set x^(k,w) = x' and go to step 8.
  If mu = 1, go to step 5.
  If mu = 0, go to step 6.

Step 4: (Expansion)
  Construct x'' = x' + (x' - xbar).
  If F(x'') < F(x^(k,b)), set x^(k,w) = x'';
  otherwise set x^(k,w) = x'.
  Go to step 8.

Step 5: (Partial outside contraction)
  Construct x'' = xbar + (1/2)(x' - xbar).
  If F(x'') <= F(x'), set x^(k,w) = x'' and go to step 8;
  otherwise go to step 7.

Step 6: (Partial inside contraction)
  Construct x'' = xbar + (1/2)(x^(k,w) - xbar).
  If F(x'') < F(x^(k,w)), set x^(k,w) = x'' and go to step 8.

Step 7: (Total contraction)
  Construct x^(k+1,l) = x^(k,b) + (1/2)(x^(k,l) - x^(k,b))  for all
  l = 0, ..., n. Go to step 9.

Step 8: (Normal iteration loop)
  Assign x^(k+1,l) = x^(k,l) for all l = 0, ..., n (except w, whose new
  value was set above).

Step 9: (Termination criterion)
  Increase k to k + 1.
  If the variance of the objective function values at the vertices,
      (1/n) sum_{l=0}^{n} [ F(x^(k,l)) - Fbar ]^2
      with  Fbar = (1/(n+1)) sum_{l=0}^{n} F(x^(k,l)) ,
  is less than epsilon^2, go to step 10;
  otherwise go to step 2.

Step 10: (Restart test; note that index w points to the new vertex)
  Test whether any vector
      x = x^(k,w) +/- c s_i e_i  for i = 1, ..., n
  exists such that F(x) < F(x^(k,w)).
  If so, set x^(k,0) = x, reduce c to a small value, and go to step 1
  (restart);
  otherwise end the search with result x^(k,w).
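The reflection, expansion, and contraction rules above can be condensed into a short routine. The following Python sketch uses the customary coefficients 1, 2, and 1/2 and omits the termination and restart logic; all names are illustrative only:

```python
import numpy as np

def nelder_mead_step(simplex, f):
    """One iteration of the Nelder-Mead polyhedron update (a sketch of the
    reflection / expansion / contraction rules; coefficients 1, 2, 0.5 are
    the usual choices)."""
    simplex = sorted(simplex, key=f)          # best first, worst last
    best, worst = simplex[0], simplex[-1]
    centroid = np.mean(simplex[:-1], axis=0)  # midpoint of all but the worst
    refl = centroid + (centroid - worst)      # normal reflection
    if f(refl) < f(best):                     # very good: try expansion
        exp = centroid + 2.0 * (centroid - worst)
        simplex[-1] = exp if f(exp) < f(best) else refl
    elif f(refl) < f(simplex[-2]):            # better than second worst
        simplex[-1] = refl
    else:                                     # partial contraction
        toward = refl if f(refl) < f(worst) else worst
        cont = centroid + 0.5 * (toward - centroid)
        if f(cont) < min(f(worst), f(refl)):
            simplex[-1] = cont
        else:                                 # total contraction toward best
            simplex = [best + 0.5 * (v - best) for v in simplex]
    return simplex
```

Iterating this step until the variance of the function values at the vertices falls below a bound reproduces the behavior of the algorithm described above.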

The criterion for ending the minimum search is based on testing whether the variance
of the objective function values at the vertices of the simplex is less than a prescribed
limit. A few hypothetical iterations of the procedure for two variables are shown in
the accompanying figure, including the following table. The sequence of reflections,
expansions, and contractions is taken from the table, in which the simplex vertices are
numbered sequentially and sorted at each stage in order of their objective function values.

[Figure: Simplex strategy of Nelder and Mead -- starting point, numbered vertex
points, and the first and last simplex of a hypothetical two-variable run.]


Iteration  Simplex vertices
index      (worst ... best)    Remarks
                               start simplex
                               reflection
                               expansion successful
                               reflection
                               reflection
                               expansion unsuccessful
                               reflection
                               reflection
                               partial outside contraction
                               reflection
                               expansion unsuccessful
                               reflection
                               partial inside contraction
                               total contraction

The main difference between this program and the original strategy of Nelder and Mead
is that, after a normal ending of the minimization, there is an attempt to construct a new
starting simplex. To this end, small trial steps are taken in each coordinate direction.
If just one of these tests is successful, the search is started again, but with a simplex
of considerably reduced edge lengths. This restart procedure recommends itself because,
especially for a large number of variables, the simplex tends to no longer span the complete
parameter space (i.e., to collapse) without reaching the minimum.

For few variables the simplex method is known to be robust and reliable, but also to be
relatively costly. There are n + 1 parameter vectors to be stored, and the reflection requires
a number of computational operations of order O(n^2). According to Nelder and Mead, the
number of function calls increases approximately as O(n^2); however, this empirical value
is based only on test results with few variables. Parkinson and Hutchinson describe a
variant of the strategy in which the real storage requirement can be reduced by about
half (see also Spendley). Masters and Drucker recommend altering the expansion or
contraction factor after consecutive successes or failures, respectively.

Complex Strategy of Box

M. J. Box calls his modification of the polyhedron strategy the complex method,
an abbreviation for "constrained simplex," since he conceived it also for problems with
inequality constraints. The starting point of the search does not need to lie in the feasible
region. For this case, Box suggests locating an allowed point by minimizing the function

    F'(x) = - sum_{j=1}^{m} G_j(x) delta_j(x)

with

    delta_j(x) = 1 if G_j(x) < 0,
                 0 otherwise,

until F'(x) = 0.

The two most important differences from the Nelder-Mead strategy are the use of more
vertices and the expansion of the polyhedron at each normal reflection. Both measures
are intended to prevent the complex from eventually spanning only a subspace of reduced
dimensionality, especially at active constraints. If an allowed starting point is given, or
has been found, it defines one of the N >= n + 1 vertices of the polyhedron. The
remaining vertex points are fixed by a random process, in which each vector inside the
closed region defined by the explicit constraints has an equal probability of selection. If an
implicit constraint is violated, the new point is displaced stepwise towards the midpoint
of the allowed vertices that have already been defined, until it satisfies all the constraints.
Implicit constraints G_j(x) >= 0 are dealt with similarly during the course of the minimum
search. If an explicit boundary is crossed (x_i < a_i or x_i > b_i), the offending variable
is simply set back in the allowed region, to a value near the boundary.

The details of the algorithm are as follows:

Step 0: (Initialization)
  Choose a starting point x^(0,0) and a number of vertices N >= n + 1
  (e.g., N = 2n). Number the constraints such that the first j = 1, ..., m'
  each depend on only one variable x_j: G_j(x) = G_j(x_j) (explicit form).
  Test whether x^(0,0) satisfies all the constraints.
  If not, construct a substitute objective function according to the penalty
  formulation above.
  Set up the initial complex as follows: retain x^(0,0), and for
  l = 1, ..., N - 1 set
      x^(0,l) = sum_i z_i e_i ,
  where the z_i are uniformly distributed random numbers from the range
  [a_i, b_i] if constraints are given in the form a_i <= x_i <= b_i, and
  otherwise from [x_i^(0,0) - s, x_i^(0,0) + s] for some fixed s.
  If G_j(x^(0,l)) < 0 for any explicit constraint j <= m', set the offending
  component x_j^(0,l) back to a value just inside the boundary.
  If G_j(x^(0,l)) < 0 for any implicit constraint j > m', replace x^(0,l) by
  the point halfway towards the centroid of the already accepted vertices:
      x^(0,l) <- (1/2) ( x^(0,l) + (1/l) sum_{p=0}^{l-1} x^(0,p) ) .
  If necessary, repeat this process until G_j(x^(0,l)) >= 0 for all
  j = 1, ..., m.
  Set k = 0.

Step 1: (Reflection)
  Determine the index w (worst vertex) such that
      F(x^(k,w)) = max { F(x^(k,l)), l = 0, ..., N - 1 } .
  Construct xbar = (1/(N-1)) sum_{l=0, l != w}^{N-1} x^(k,l)
  and x' = xbar + alpha (xbar - x^(k,w))  (overreflection, factor alpha > 1).

Step 2: (Check for constraints)
  If m = 0 (no constraints), go to step 7; otherwise set j = 1.
  If m' = 0 (no explicit constraints), go to step 5.

Step 3: (Set vertex back into bounds for explicit constraints)
  Obtain g = G_j(x') = G_j(x'_j).
  If g >= 0, go to step 4;
  otherwise set the offending component x'_j back into the allowed region,
  to a value near the violated boundary; if G_j(x') < 0 still holds at the
  opposite bound, correct it in the same way.

Step 4: (Explicit constraints loop)
  Increase j to j + 1.
  If j <= m', go to step 3.
  If m' < j <= m, go to step 5.
  If j > m, go to step 7.

Step 5: (Check implicit constraints)
  If G_j(x') >= 0, go to step 6;
  otherwise go to step 8 (contraction), unless the same constraint caused a
  failure five times in a row without its function value G_j(x') being
  changed; in this case go to step 9.

Step 6: (Implicit constraints loop)
  If j < m, increase j to j + 1 and go to step 5.

Step 7: (Check for improvement)
  If F(x') < F(x^(k,l)) for at least one l = 0, ..., N - 1 except w,
  set x^(k+1,l) = x^(k,l) for all l except w and x^(k+1,w) = x',
  increase k to k + 1, and go to step 1;
  otherwise go to step 8, unless a failure occurred five times in a row with
  no change in the objective function value F(x'); in this case go to step 9.

Step 8: (Contraction)
  Replace x' by (1/2)(x' + xbar). Go to step 2.

Step 9: (Termination)
  Determine the index b (best vertex) such that
      F(x^(k,b)) = min { F(x^(k,l)), l = 0, ..., N - 1 } .
  End the search with the result x^(k,b) and F(x^(k,b)).
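The repeated displacement towards the midpoint of the accepted vertices (steps 0 and 8) can be sketched as follows; the function name, the feasibility predicate, and the iteration cap are assumptions of this illustration:

```python
import numpy as np

def pull_into_feasible(x, centroid, feasible, max_tries=50):
    """Box-style repair move (a sketch): a point violating an implicit
    constraint is displaced stepwise halfway towards the centroid of the
    already accepted vertices until it becomes feasible."""
    for _ in range(max_tries):
        if feasible(x):
            return x
        x = 0.5 * (x + centroid)   # halve the distance to the centroid
    raise ValueError("no feasible point found near the centroid")
```

As Guin's later modification points out, this only works reliably when the region around the centroid is feasible, i.e., essentially for convex allowed regions.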

Box himself reports that in numerical tests his complex strategy gives similar results
to the simplex method of Nelder and Mead, but both are inferior to the method of
Rosenbrock with regard to the number of objective function calls. (He actually uses his
own modification of the Rosenbrock method.) Investigation of the effect of the number
of vertices of the complex and of the expansion factor led him to the conclusion that
neither value has a significant effect on the efficiency of the strategy. For large n he
considers that a number of vertices as large as N = 2n is unnecessarily high, especially
when there are no constraints.

The convergence criterion appears very reliable. While Nelder and Mead require that
the standard deviation of all objective function values at the polyhedron vertices, referred
to its midpoint, must be less than a prescribed size, the complex search is only ended
when several consecutive values of the objective function are the same to computational
accuracy.

Because of the larger number of polyhedron vertices, the complex method needs even
more storage space than the simplex strategy; the order of magnitude O(n^2) remains
the same. No investigations are known of the computational effort in the case of many
variables. Modifications of the strategy are due to Guin, Mitchell and Kaplan, and
Dambrauskas. Guin defines a contraction rule with which an allowed point can be
generated even if the allowed region is not convex. This is not always the case in the
original method, because the midpoint to which the worst vertex is reflected is not
tested for feasibility.

Mitchell finds that the initial configuration of the complex influences the results ob-
tained. It is therefore better to place the vertices in a deterministic way rather than to
make a random choice. Dambrauskas combines the complex method with the step length
rule of the stochastic approximation. He requires that the step lengths, or edge lengths of
the polyhedron, go to zero in the limit of an infinite number of iterations, while their sum
tends to infinity. This measure may well increase the reliability of convergence; however,
it also increases the cost. Beveridge and Schechter describe how the iteration rules
must be changed if the variables can take only discrete values. A practical application,
in which a process has to be optimized dynamically, is described by Tazaki, Shindo, and
Umeda; this is the original problem for which Spendley, Hext, and Himsworth
conceived their simplex EVOP (evolutionary operation) procedure.

Compared to other numerical optimization procedures, the polyhedra strategies have
the disadvantage that in the closing phase near the optimum they converge rather slowly
and sometimes even stagnate. The direction of progress selected by the reflection then
no longer coincides at all with the gradient direction. To remove this difficulty, it has
been suggested that information about the topology of the objective function, as given by
function values at the vertices of the polyhedron, be exploited to carry out a quadratic
interpolation. Such surface fitting is familiar from the related methods of test planning and
evaluation (lattice search, factorial design), in which the task is to set up mathematical
models of physical or other processes. This territory is entered, for example, by G. E. P.
Box; Box and Wilson; Box and Hunter; Box and Behnken; Box and Draper; Box et al.;
and Beveridge and Schechter. It will not be covered in any more detail here.

Gradient Strategies

The Gauss-Seidel strategy very straightforwardly uses only directions parallel to the co-
ordinate axes to successively improve the objective function value. All other direct search
methods strive to advance more rapidly by taking steps in other directions. To do so,
they exploit the knowledge about the topology of the objective function gleaned from
the successes and failures of previous iterations. Directions are viewed as most promising
in which the objective function decreases rapidly (for minimization) or increases rapidly
(for maximization). Southwell, for example, improves the relaxation by choosing
the coordinate directions not cyclically, but in order of the size of the local gradient in
them. If the restriction of parallel axes is removed, the local best direction is given by
the negative gradient vector

    -grad F(x) = -( F_{x_1}(x), F_{x_2}(x), ..., F_{x_n}(x) )^T

with

    F_{x_i}(x) = dF(x)/dx_i   for all i = 1, ..., n,

at the point x. All hill climbing procedures that orient their choice of search directions
v according to the first partial derivatives of the objective function are called gradient
strategies. They can be thought of as analogues of the total step procedure of Jacobi for
solving systems of linear equations (see Schwarz, Rutishauser, and Stiefel).

So great is the number of methods of this type which have been suggested or applied
up to the present, that merely to list them all would be difficult. The reason lies in
the fact that the gradient represents a local property of a function. To follow the path
of the gradient exactly would mean determining, in general, a curved trajectory in the
n-dimensional space. This problem is only approximately soluble numerically and is more
difficult than the original optimization problem. With the help of analogue computers,
continuous gradient methods have actually been implemented (Bekey and McGhee;
Levine). They consider the trajectory x(t) as a function of time and obtain it as
the solution of a system of first order differential equations.

All the numerical variants of the gradient method differ in the lengths of the discrete
steps, and thereby also with regard to how exactly they follow the gradient trajectory.
The iteration rule is generally

    x^(k+1) = x^(k) - s^(k) grad F(x^(k)) / || grad F(x^(k)) || .

It assumes that the partial derivatives everywhere exist and are unique. If F(x) is con-
tinuously differentiable, then the partial derivatives exist and grad F(x) is continuous.
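The iteration rule can be sketched as a short program. This is an illustrative fixed-step variant with a simple step-halving safeguard, not one of the historical codes; all names are my own:

```python
import numpy as np

def steepest_descent(f, grad, x0, s=0.1, eps=1e-8, max_iter=10_000):
    """Plain gradient strategy with normalized direction and step length s
    (a sketch of the short-step iteration rule above; long-step variants
    determine s^(k) by a line search instead)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < eps:                # stationary point reached
            break
        x_new = x - s * g / gnorm      # step of length s downhill
        if f(x_new) < f(x):            # accept only improvements
            x = x_new
        else:
            s *= 0.5                   # otherwise shorten the step
    return x
```

The explicit re-test of the objective function after each step mirrors the remark below that, for finite step lengths, the new point is not guaranteed to be better than the old one.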

Hill climbing Strategies

A distinction is sometimes drawn between short step methods, which evaluate the gra-
dients again after a small step in the direction grad F(x^(k)) (for maximization; or
-grad F(x^(k)) for minimization), and their equivalent long step methods. Since for
finite step lengths s^(k) it is not certain whether the new variable vector is really better
than the old after the step, the value of the objective function must be tested again.
Working with small steps increases the number of objective function calls and gradient
evaluations. Besides F(x), n partial derivatives must be evaluated. Even if the slopes
can be obtained analytically and can be specified as functions, there is no reason to
suppose that the number of computational operations per function call is much less
than for the objective function itself. Except in special cases, the total cost increases
roughly as the weighting factor (n + 1) times the number of objective function calls.
This also holds if the partial derivatives are approximated by differential quotients
obtained by means of trial steps:

    dF(x)/dx_i = ( F(x + delta e_i) - F(x) ) / delta + O(delta)
                 for all i = 1, ..., n .

Additional difficulties arise here, since for values of delta that are too small the
subtraction is subject to rounding error, while for trial steps that are too large the
neglect of the higher order terms leads to incorrect values. The choice of suitable
deviations delta requires special care in all cases (Hildebrand; Curtis and Reid).
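The forward-difference approximation above translates directly into code (a sketch; the choice delta = 10^-6 is only an example of the compromise just described between rounding and truncation error):

```python
import numpy as np

def fd_gradient(f, x, delta=1e-6):
    """Forward-difference approximation of grad F at x: one extra function
    call per variable, truncation error O(delta)."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    g = np.empty_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = delta               # trial step along the i-th unit vector
        g[i] = (f(x + e) - f0) / delta
    return g
```

The n + 1 function evaluations per gradient are exactly the weighting factor mentioned above.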

Cauchy, Kantorovich, Levenberg, and Curry are the originators of the gradient
strategy, which started life as a method of solving equations and systems of equations.
It is first referred to as an aid to solving variational problems by Hadamard and
Courant. Whereas Cauchy works with fixed step lengths s^(k), Curry tries to determine
the distance covered in the (not normalized) direction v^(k) = -grad F(x^(k)) so as to
reach a relative minimum (see also Brown). In principle, any one of the one dimensional
search methods described earlier can be called upon to find the optimal value for s^(k):

    F( x^(k) + s^(k) v^(k) ) = min_s { F( x^(k) + s v^(k) ) } .

This variant of the basic strategy could thus be called a longest step procedure. It is
better known, however, under the name optimum gradient method, or method of steepest
descent (for maximization: ascent). Theoretical investigations of convergence and rate
of convergence of the method can be found, e.g., in Akaike, Goldstein, Ostrowski,
Forsythe, Elkin, Zangwill, and Wolfe. Zangwill proves convergence based on the
assumptions that the line searches are exact and the objective function is continuously
twice differentiable. Exactness of the one dimensional minimization is not, however, a
necessary assumption (Wolfe). It is significant that one can only establish theoretically
that a stationary point (grad F(x) = 0) will be reached or approached (||grad F(x)|| -> 0).
The stationary point is a minimum only if F(x) is convex and three times differentiable
(Akaike). Zellnik, Sondak, and Davis, however, show that saddle points are in practice
an obstacle only if the search is started at one, or on a straight gradient trajectory
passing through one. In other cases, numerical rounding errors ensure that the path to
a saddle point is unstable.


The gradient strategy, however, cannot distinguish global from local minima. The
optimum at which it aims depends only on the choice of the starting point for the search.
The only chance of finding absolute extrema is to start sufficiently often from various
initial values of the variables and to iterate each time until the convergence criterion is
satisfied (Jacoby, Kowalik, and Pizzo). The termination rules usually recommended
for gradient methods are that the absolute value of the gradient vector,

    || grad F(x^(k)) || ,

or the difference

    F(x^(k-1)) - F(x^(k)) ,

vanishes or falls below a given limit.

The rate of convergence of the strategy of steepest descent depends on the structure
of the objective function, but is no better than first order, apart from exceptional cases
like contours that are concentric, spherically symmetric hypersurfaces (Forsythe).
In the general quadratic case,

    F(x) = x^T A x + x^T b + c ,

with elliptic contours, i.e., positive definite matrix A, the convergence rate depends on
the ratio of smallest to greatest eigenvalue of A, or, geometrically expressed, on the
oblateness of the ellipsoid. It can be extremely small (Curry; Akaike; Goldstein) and
is in principle no better than the coordinate strategy with line searches (Elkin). In
narrow valleys both procedures execute zigzag movements, with very small changes in
the variable values in relation to the distance from the objective. The individual steps
can even become too small to effect any change in the objective function value if this
is defined with a finite number of decimal places. Then the search ends before reaching
the desired optimum. To obviate this difficulty, Booth has suggested going only part of
the way to the relative minimum in each line search (see also Stein; Kantorovich;
Faddejew and Faddejewa). In fact, one often obtains much better results with this
kind of underrelaxation. Even more advantageous is a modification due to Forsythe
and Motzkin. It is based on the observation that the search movements in the
minimization of quadratic objective functions oscillate between two asymptotic
directions (Stiefel; Forsythe). Forsythe and Motzkin therefore from time to time
include a line search in the direction

    v^(k) = x^(k) - x^(k-2)   for k >= 2 ,

in order to accelerate the convergence. For few variables the gradient method is thereby
greatly improved; with many variables the efficiency advantage is lost again. Similar
proposals for increasing the rate of convergence have been made by Baer, Humphrey
and Cottrell, Witte and Holst, Schinzinger, and McCormick.

The Partan (acronym for parallel tangents) methods of Buehler, Shah, and
Kempthorne have attracted particular attention. One of these, continued gradient
Partan, alternates between gradient directions,

    v^(k) = -grad F(x^(k))   for k = 1 as well as k odd,

and those derived from previous iteration points,

    v^(k) = x^(k) - x^(k-3)   for k even, with x^(-1) = x^(0) .

For quadratic functions the minimum is reached after at most 2n line searches
(Shah, Buehler, and Kempthorne). This desirable property of converging after a
finite number of iterations, which is also called quadratic convergence, is only shared
by strategies that apply conjugate gradients, of which the Partan methods can be
regarded as forerunners (Pierre; Sorenson).

In the fifties, simple gradient strategies were very popular, especially the method of
steepest descent. Today they are usually only to be found as components of program
packages, together with other hill climbing methods, e.g., in GROPE of Flood and Leon,
in AID of Casey and Rustay, in AESOP of Hague and Glatt, and in GOSPEL of
Huelsman. McGhee presents a detailed flow diagram. Wasscher has published two
ALGOL codings (see also Haubrich; Wallack; Varah; Wasscher). The partial derivatives
are obtained numerically. A comprehensive bibliography by Leon names most of the
older versions of strategies and gives many examples of their application. Numerical
comparison tests have been carried out by Fletcher, Box, Leon, Colville, and Kowalik
and Osborne. They show the superiority of first and second order methods over
direct search strategies for objective functions with smooth topology. Gradient methods
for solving systems of differential equations are described, for example, by Talkin.
For such problems, as well as for functional optimization problems, analogue and
hybrid computers have often been applied (Rybashov; Sydow; Fogarty and Howe). A
literature survey on this subject has been compiled by Gilbert. For the treatment of
variational problems see Kelley, Altman, Miele, Bryson and Ho, Cea, and Daniel
and Tolle.

In the experimental field there are considerable difficulties in determining the partial
derivatives. Errors in the values of the objective function can cause the predicted
direction of steepest descent to lie almost perpendicular to the true gradient vector
(Kowalik and Osborne). Box and Wilson attempt to compensate for the perturbations
by repeating the trial steps or increasing their number above the necessary minimum
of n + 1. With 2^n trials, for example, a complete factorial design can be constructed
(e.g., Davies). The slope in one direction is then obtained by averaging the function
value differences over 2^(n-1) pairs of points (Lapidus et al.). Another possibility is to
determine the coefficients of a linear polynomial such that the sum of the squares of the
errors between measured and model function values at N > n + 1 points is a minimum.
The linear function then represents the tangent plane of the objective function at the
point under consideration. The cost of obtaining the gradients when there are many
variables is too great for practical application, and only justified if the aim is rather to
set up a mathematical model of the system than simply to perform the optimization.

In the EVOP (acronym for evolutionary operation) scheme, G. E. P. Box has
presented a practical simplification of this gradient method. It actually counts as a
direct search strategy, because it does not obtain the direction of the gradient, but only
one of a finite number of especially good directions. Spendley, Hext, and Himsworth
have devised a variant of the procedure (see also the earlier sections). Lowe has
gathered together the various schemes of trial steps for the EVOP strategy. The
philosophy of the EVOP strategy is treated in detail by Box and Draper. Some
examples of applications are given by Kenworthy. The efficiency of methods of
determining the gradient in the case of stochastic perturbations is dealt with by
Mlynski, Sergiyevskiy and Ter-Saakov, and others.

Strategy of Powell Conjugate Directions

The most imp ortant idea for overcoming the convergence diculties of the gradient strat

egy is due to Hestenes and Stiefel and again comes from the eld of linear algebra

see also Ginsburg Beckman It trades under the names conjugate directions

or conjugate gradients The directions fv i ng are said to b e conjugate with

i

resp ect to a p ositive denite n n matrix A if Hestenes

T

v Av for all i j ni j

j

i

A further property of conjugate directions is their linear independence; i.e.,

    c_1 v_1 + c_2 v_2 + ... + c_n v_n = 0

only holds if all the constants {c_i, i = 1, ..., n} are zero. If A is replaced by the unit matrix, A = I, then the v_i are mutually orthogonal. With A = ∇²F(x), the Hessian matrix, the minimum of a quadratic function is obtained exactly in n line searches in the directions v_i. This is a factor of two better than the gradient Partan method. For general nonlinear problems the convergence rate cannot be specified. As it is frequently assumed, however, that many problems behave roughly quadratically near the optimum, it seems worthwhile to use conjugate directions. The quadratic convergence of the search with conjugate directions comes about because second order properties of the objective function are taken into account. In this respect it is not in fact a first order gradient method, but a second order procedure. If all the n first and n(n+1)/2 second partial derivatives are available, the conjugate directions can be generated in one process corresponding to the Gram-Schmidt orthogonalization (Kowalik and Osborne). This calls for expensive matrix operations. Conjugate directions can, however, be constructed without knowledge of the second derivatives, for example from the changes in the gradient vector in the course of the iterations (Fletcher and Reeves). Because of this implicit exploitation of second order properties, the method of conjugate directions has been classified as a gradient method.
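The two properties above are easy to check numerically. The following sketch (with a hypothetical 2 × 2 example matrix, not taken from the text) builds A-conjugate directions from the unit vectors by a Gram-Schmidt-like A-orthogonalization and verifies that n exact line searches along them locate the minimum of a quadratic objective:

```python
import numpy as np

def conjugate_set(A):
    """Gram-Schmidt-like construction of A-conjugate directions from
    the unit vectors (assumes A symmetric positive definite)."""
    n = A.shape[0]
    vs = []
    for i in range(n):
        v = np.eye(n)[i]
        for u in vs:
            v = v - (u @ A @ v) / (u @ A @ u) * u  # remove A-projection
        vs.append(v)
    return vs

def minimize_quadratic(A, b, x0, directions):
    """Exact line searches along the given directions for
    F(x) = 0.5 x^T A x - b^T x; n conjugate steps reach the minimum."""
    x = x0.astype(float)
    for v in directions:
        g = A @ x - b                # gradient of F at x
        s = -(g @ v) / (v @ A @ v)   # exact line-search step length
        x = x + s * v
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
vs = conjugate_set(A)
x_min = minimize_quadratic(A, b, np.zeros(2), vs)
```

After the two line searches, x_min agrees with the solution of A x = b, the stationary point of the quadratic.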

The conjugate gradients method of Fletcher and Reeves consists of a sequence of line searches with Hermitian interpolation. As the first search direction v^(0) at the starting point x^(0), the simple gradient direction

    v^(0) = −∇F(x^(0))

is used. The recursion formula for the subsequent iterations is

    v^(k) = −∇F(x^(k)) + β^(k−1) v^(k−1),   for k = 1, 2, ...

Hill climbing Strategies

with the correction factor

    β^(k−1) = ∇F(x^(k))^T ∇F(x^(k)) / (∇F(x^(k−1))^T ∇F(x^(k−1)))

For a quadratic objective function with a positive-definite Hessian matrix, conjugate directions are generated in this way and the minimum is found within n line searches. Since at any time only the last direction needs to be stored, the storage requirement increases linearly with the number of variables. This often signifies a great advantage over other strategies. In the general nonlinear, non-quadratic case more than n iterations must be carried out, for which the method of Fletcher and Reeves must be modified. Continued application of the recursion formula can lead to linear dependence of the search directions. For this reason it seems necessary to forget the accumulated information from time to time and to start afresh with the simple gradient direction (Crowder and Wolfe). Various suggestions have been made for the frequency of this restart rule (Fletcher). Absolute reliability of convergence in the general case is still not guaranteed by this approach. If the Hessian matrix of second partial derivatives has points of singularity, then the conjugate gradient strategy can fail. The exactness of the line searches also has an important effect on the convergence rate (Kawamura and Volz). Polak defines conditions under which the method of Fletcher and Reeves achieves greater than linear convergence. Fletcher himself has written a FORTRAN program.
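As a minimal sketch of the method, the recursion above can be exercised on a quadratic objective F(x) = 0.5 x^T A x − b^T x, for which the line search has a closed form; the optional restart parameter mimics the forgetting rule mentioned above (the example matrix is hypothetical):

```python
import numpy as np

def fletcher_reeves(A, b, x0, restart=None, iters=50):
    """Fletcher-Reeves conjugate gradients for F(x) = 0.5 x^T A x - b^T x.
    The exact line-search step below is valid only for this quadratic;
    a general F needs a one-dimensional search (e.g. interpolation)."""
    x = x0.astype(float)
    g = A @ x - b                # gradient
    v = -g                       # first direction: simple gradient
    for k in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        s = -(g @ v) / (v @ A @ v)        # exact step length
        x = x + s * v
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # correction factor
        g = g_new
        if restart and (k + 1) % restart == 0:
            v = -g                        # forget accumulated information
        else:
            v = -g + beta * v             # conjugate recursion
    return x

A = np.array([[3.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 5.0]])
b = np.array([1.0, 0.0, 2.0])
x_min = fletcher_reeves(A, b, np.zeros(3))
```

With exact line searches the quadratic is minimized in at most n = 3 recursion steps, in line with the statement above.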

Other conjugate gradient methods have been proposed by Powell, by Polak and Ribiere (see also Klessig and Polak), by Hestenes, and by Zoutendijk. Schley has published a complete FORTRAN program. Conjugate directions are also produced by the projected gradient methods (Myers; Pearson; Sorenson; Cornick and Michel) and the memory gradient methods of Miele and Cantrell (see also Cantrell; Cragg and Levy; Miele; Miele, Huang, and Heidemann; Miele, Levy, and Cragg; Miele, Tietze, and Levy; Miele et al.). Relevant theoretical investigations have been made by, among others, Greenstadt, Daniel, Huang, Beale, and Cohen.

Conjugate gradient methods are encountered especially frequently in the fields of functional optimization and optimal control problems (Daniel; Pagurek and Woodside; Nenonen and Pagurek; Roberts and Davis; Polyak; Lasdon; Kelley and Speyer; Kelley and Myers; Speyer et al.; Kammerer and Nashed; Szego and Treccani; Polak; McCormick and Ritter). Variable metric strategies are also sometimes classified as conjugate gradient procedures, but more usually as quasi-Newton methods. For quadratic objective functions they generate the same sequence of points as the Fletcher-Reeves strategy and its modifications (Myers; Huang). In the non-quadratic case, however, the search directions are different. With the variable metric, but not with conjugate directions, Newton directions are approximated.

For many practical problems it is very difficult, if not impossible, to specify the partial derivatives as functions. The sensitivity of most conjugate gradient methods to imprecise

Multidimensional Strategies

specification of the gradient directions makes it seem inadvisable to apply finite difference methods to approximate the slopes of the objective function. This is taken into account by some procedures that attempt to construct conjugate directions without knowledge of the derivatives. The oldest of these was devised by Smith. On the basis of numerical tests by Fletcher, however, the version of Powell has proved to be better. It will be briefly presented here. It is arguable whether it should be counted as a gradient strategy. Its intermediate position between direct search methods, which only use function values, and Newton methods, which make use of second order properties of the objective function (if only implicitly), nevertheless makes it come close to this category.

The strategy of conjugate directions is based on the observation that a line through the minimum of a quadratic objective function cuts all contours at the same angle. Powell's idea is to construct such special directions by a sequence of line searches. The unit vectors are taken as initial directions for the first n line searches. After these, a minimization is carried out in the direction of the overall result. Then the first of the old direction vectors is eliminated, the indices of the remainder are reduced by one, and the direction that was generated and used last is put in the place freed by the nth vector. As shown by Powell, after n cycles, each of n + 1 line searches, a set of conjugate directions is obtained, provided the objective function is quadratic and the line searches are carried out exactly.

Zangwill indicates how this scheme might fail: if no success is obtained in one of the search directions, i.e., the distance covered becomes zero, then the direction vectors become linearly dependent and no longer span the complete parameter space. The same phenomenon can be provoked by computational inaccuracy. To prevent this, Powell has modified the basic algorithm. First of all, he designs the scheme of exchanging directions to be more flexible, actually by maximizing the determinant of the normalized direction vectors. It can be shown that, assuming a quadratic objective function, it is most favorable to eliminate the direction in which the largest distance was covered (see Dixon). Powell would also sometimes leave the set of directions unchanged. This depends on how the value of the determinant would change under exchange of the search directions. The objective function is here tested at the position given by doubling the distance covered in the cycle just completed. Powell makes the termination of the search depend on all variables having changed by less than ε_i within an iteration, where ε_i represents the required accuracy. Besides this first convergence criterion he offers a second, stricter one, according to which the state reached at the end of the normal procedure is slightly displaced and the minimization repeated until the termination conditions are again fulfilled. This is followed by a line search in the direction of the difference vector between the last two end points. The optimization is only finally ended when the result agrees with those previously obtained to within the allowed deviation for each component.

The algorithm of Powell runs as follows:

Step 0: (Initialization)
Specify a starting point x^(0,0) and accuracy requirements ε_i > 0 for all i = 1, ..., n.

Step 1: (Specify first set of directions)
Set v_i^(0) = e_i (unit vectors) for all i = 1, ..., n, and set k = 0.

Step 2: (Start outer loop)
Set i = 1.

Step 3: (Line search)
Determine x^(k,i) such that

    F(x^(k,i)) = min_s {F(x^(k,i−1) + s v_i^(k))}

Step 4: (Inner loop)
If i < n, increase i ← i + 1 and go to step 3.

Step 5: (First termination criterion)
Increase k ← k + 1.
Set x^(k,0) = x^(k−1,n) and v_i^(k) = v_i^(k−1) for all i = 1, ..., n.
If |x_i^(k,0) − x_i^(k−1,0)| ≤ ε_i for all i = 1, ..., n, go to step 9.

Step 6: (First test for direction exchange)
Determine F̂ = F(2 x^(k,0) − x^(k−1,0)).
If F̂ ≥ F(x^(k−1,0)), go to step 2.

Step 7: (Second test for direction exchange)
Determine the index m ≤ n such that Δ_m = max{Δ_i, i = 1, ..., n}, where Δ_i = F(x^(k−1,i−1)) − F(x^(k−1,i)).
If

    [F(x^(k−1,0)) − 2 F(x^(k,0)) + F̂] [F(x^(k−1,0)) − F(x^(k,0)) − Δ_m]² ≥ (1/2) Δ_m [F(x^(k−1,0)) − F̂]²,

go to step 2.

Step 8: (New direction set and additional line search)
Eliminate v_m^(k) from the list of directions, so that the position of the nth vector becomes free: set v_i^(k) = v_{i+1}^(k) for i = m, ..., n − 1, and

    v_n^(k) = (x^(k,0) − x^(k−1,0)) / ‖x^(k,0) − x^(k−1,0)‖

Determine a new x^(k,0) such that

    F(x^(k,0)) = min_s {F(x^(k−1,n) + s v_n^(k))}

Go to step 2.

Step 9: (Second termination criterion)
Set y^(1) = x^(k,0) and replace x^(0,0) = y^(1) + Σ_{i=1}^{n} ε_i e_i.
Repeat steps 2 to 8 until the convergence criterion of step 5 is fulfilled again, and call the result y^(2).
Determine y^(3) such that

    F(y^(3)) = min_s {F(y^(2) + s (y^(2) − y^(1)))}

If |y_i^(2) − y_i^(1)| ≤ ε_i for all i = 1, ..., n, and |y_i^(3) − y_i^(2)| ≤ ε_i for all i = 1, ..., n, then end the search with the result y^(3) and F(y^(3));
otherwise set x^(0,0) = y^(3), v_1^(0) = y^(3) − y^(1), v_i^(0) = v_i^(k) for i = 2, ..., n, set k = 0, and go to step 2.
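The core cycle of the procedure can be sketched as follows. This is a deliberately simplified illustration: it uses a golden-section line search in place of Powell's Lagrangian interpolation, always exchanges the oldest direction, and omits Powell's exchange and termination tests; the test function is hypothetical.

```python
import numpy as np

def line_search(F, x, v, lo=-10.0, hi=10.0, tol=1e-10):
    """Derivative-free golden-section search for min_s F(x + s*v) on
    [lo, hi] (a stand-in for Powell's quadratic interpolation)."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if F(x + c * v) < F(x + d * v):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return x + 0.5 * (a + b) * v

def powell_basic(F, x0, cycles=10):
    """Basic Powell cycle: n line searches along the current directions,
    then one along the total displacement, which replaces the oldest
    direction (Powell's safeguards are omitted here)."""
    n = len(x0)
    V = [np.eye(n)[i] for i in range(n)]   # start with unit vectors
    x = np.asarray(x0, dtype=float)
    for _ in range(cycles):
        x_old = x.copy()
        for v in V:
            x = line_search(F, x, v)
        d = x - x_old                       # overall displacement
        if np.linalg.norm(d) < 1e-12:
            break
        V.pop(0)                            # drop the oldest direction
        V.append(d / np.linalg.norm(d))     # adopt the new one
        x = line_search(F, x, V[-1])        # additional line search
    return x

F = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[0] + x[1] - 3.0)**2
x_min = powell_basic(F, np.zeros(2))
```

For this quadratic the minimum lies at (1, 2); the basic cycle reaches it to within line-search accuracy after a few loops.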

The figure below illustrates a few iterations for a hypothetical two-parameter function. Each of the first loops consists of n + 1 line searches and leads to the adoption of a new search direction. If the objective function had been of second order, the minimum would certainly have been found by the last line search of the second loop. In the third and fourth loops it has been assumed that the trial steps have led to a decision not to exchange directions; thus the old direction vectors, numbered v_4 and v_5, are retained. Further loops, e.g., according to the second termination criterion, are omitted.

The quality of the line searches has a strong influence on the construction of the conjugate directions. Powell uses a sequence of Lagrangian quadratic interpolations. It is terminated as soon as the required accuracy is reached. For the first minimization within an iteration, three points and the quadratic interpolation formula are used. The argument values taken in direction v_i are x (the starting point), x + s_i v_i, and either x + 2 s_i v_i or x − s_i v_i, according to whether F(x + s_i v_i) < F(x) or not. The step length s_i is given initially by the associated accuracy ε_i multiplied by a maximal factor, and is later adjusted in the course of the iterations.

In the direction constructed from the successful results of the n one-dimensional searches, the argument values are called x^(k,0), x^(k,n), and 2 x^(k,n) − x^(k,0). With three points a, b, c and associated objective function values F_a, F_b, F_c, not only the minimum but also the second derivative of the quadratic trial function P(x) can be specified. The formula for the curvature in the direction v_i is

[Figure: Strategy of Powell (conjugate directions). The plot marks the starting point, the end points of the line searches, the tests for direction exchange, and the end points of the iterations for a two-parameter example; the line-search end points (0,1), (0,2), (0,3) = (1,0), (1,1), (1,2), (1,3) = (2,0), (2,1), (2,2) = (3,0), (3,1), (3,2), (3,3) = (4,0) are connected by the successive search directions v_1 to v_5.]




    ∂²P(x + s v_i)/∂s² = 2 [(c − b) F_a + (a − c) F_b + (b − a) F_c] / [(a − b)(b − c)(c − a)]

Powell uses this quantity for all subsequent interpolations in the direction v_i as a scale for the second partial derivative of the objective function. He scales the directions v_i, which in his case are not normalized, by the reciprocal square root of this curvature. This allows the possibility of subsequently carrying out a simplified interpolation with only two argument values, x and x + s v_i. It is a worthwhile procedure, since each direction is used several times. The predicted minimum, assuming that the second partial derivatives have value unity, is then

    x̄ = x + [ s/2 − (F(x + s v_i) − F(x)) / s ] v_i

For the trial step lengths s_i, Powell uses the empirical recursion formula

    s_i^(k) ∝ sqrt(F(x^(k−1)) − F(x^(k)))

Because of the scaling, all the step lengths actually become the same. A more detailed justification can be found in Hoffmann and Hofmann.
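Under the same three-point setup, a small helper recovers both the curvature and the predicted minimum of the interpolating parabola (the test values below are arbitrary):

```python
def quad_interp(a, Fa, b, Fb, c, Fc):
    """Quadratic (Lagrangian) interpolation through three points:
    returns (s_min, curvature) of the parabola P with P(a) = Fa, etc."""
    curv = 2.0 * ((c - b) * Fa + (a - c) * Fb + (b - a) * Fc) \
           / ((a - b) * (b - c) * (c - a))
    dd1 = (Fb - Fa) / (b - a)          # first divided difference
    s_min = (a + b) / 2.0 - dd1 / curv  # stationary point of P
    return s_min, curv

# arbitrary sample points on the parabola P(s) = 2 s^2 - 5 s + 5
s, curv = quad_interp(0.0, 5.0, 1.0, 2.0, 3.0, 8.0)
```

For these values the parabola is P(s) = 2s² − 5s + 5, so the helper returns the minimum position 1.25 and curvature 4.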

Contrary to most other optimization procedures, Powell's strategy is available as a precise algorithm in a tested code (Powell). As Fletcher reports, this method of conjugate directions is superior, for the case of a few variables, both to the DSC method and to a strategy of Smith, especially in the neighborhood of minima. For many variables, however, the strict criterion for adopting a new direction more frequently causes the old set of directions to be retained, and the procedure then converges slowly. A problem which had a singular Hessian matrix at the minimum made the DSC strategy look better. In a later article, Fletcher defines a limit on n, above which the Powell strategy should no longer be applied. This is confirmed by the test results presented later. Zangwill combines the basic idea of Powell with relaxation steps in order to avoid linear dependence of the search directions. Some results of Rhead lead to the conclusion that Powell's improved concept is superior to Zangwill's.

Brent also presents a variant of the strategy without derivatives, derived from Powell's basic idea, which is designed to prevent the occurrence of linear dependence of the search directions without endangering the quadratic convergence. After every n iterations, the set of directions is replaced by an orthogonal set of vectors. So as not to lose all the information, however, the unit vectors are not chosen. For quadratic objective functions the new directions remain conjugate to each other. This procedure requires O(n³) computational operations to determine orthogonal eigenvectors. As, however, they are only performed every O(n²) line searches, the extra cost is O(n) per function call and is thus of the same order as the cost of evaluating the objective function itself. Results of tests by Brent confirm the usefulness of the strategy.

Newton Strategies

Newton strategies exploit the fact that, if a function can be differentiated any number of times, its value at the point x^(k+1) can be represented by a series of terms constructed at another point x^(k):

    F(x^(k+1)) = F(x^(k)) + (h^(k))^T ∇F(x^(k)) + (1/2) (h^(k))^T ∇²F(x^(k)) h^(k) + ...

where

    h^(k) = x^(k+1) − x^(k)

In this Taylor series, as it is called, all the terms of higher than second order are zero if F(x) is quadratic. Differentiating with respect to h^(k) and setting the derivative equal to zero, one obtains a condition for the stationary points of a second order function:

    ∇F(x^(k+1)) = ∇F(x^(k)) + ∇²F(x^(k)) (x^(k+1) − x^(k)) = 0

or

    x^(k+1) = x^(k) − [∇²F(x^(k))]^(−1) ∇F(x^(k))

If F(x) is quadratic and ∇²F(x^(k)) is positive-definite, this relation yields the solution in a single step from any starting point x^(k), without needing a line search. If it is taken as the iteration rule in the general case, it represents the extension of the Newton-Raphson method to functions of several variables (Householder). It is also sometimes called a second order gradient method, with the choice of direction and step length (Crockett and Chernoff):

    v^(k) = −[∇²F(x^(k))]^(−1) ∇F(x^(k)),   s^(k) = 1

The real length of the iteration step is hidden in the non-normalized Newton direction v^(k). Since no explicit value of the objective function is required, but only its derivatives, the Newton-Raphson strategy is classified as an indirect or analytic optimization method. Its ability to predict the minimum of a quadratic function in a single calculation at first sight looks very attractive. This single step, however, requires a considerable effort. Apart from the necessity of evaluating n first and n(n+1)/2 second partial derivatives, the Hessian matrix ∇²F(x^(k)) must be inverted. This corresponds to the problem of solving the system of linear equations

    ∇²F(x^(k)) Δx^(k) = −∇F(x^(k))

for the unknown quantities Δx^(k). All the standard methods of linear algebra, e.g., Gaussian elimination (Brown and Dennis; Brown) and the matrix decomposition method of Cholesky (Wilkinson), need O(n³) computational operations for this (see Schwarz, Rutishauser, and Stiefel). For the same cost, the strategies of conjugate directions and conjugate gradients can execute O(n) steps. Thus, in principle, the Newton-Raphson iteration offers no advantage in the quadratic case.
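For a quadratic objective the single-step property is easy to demonstrate; the sketch below also follows the advice of solving the linear system rather than inverting the Hessian (the example matrix is hypothetical):

```python
import numpy as np

# Hypothetical quadratic objective F(x) = 0.5 x^T A x - b^T x:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b       # gradient of F
hess = lambda x: A               # Hessian of F (constant for a quadratic)

def newton_step(grad, hess, x):
    """One Newton-Raphson step: solve hess(x) * dx = -grad(x), an O(n^3)
    linear solve that avoids forming the inverse Hessian explicitly."""
    return x + np.linalg.solve(hess(x), -grad(x))

x0 = np.array([7.0, -5.0])
x1 = newton_step(grad, hess, x0)  # minimum reached in a single step
```

The gradient vanishes at x1, confirming that one step suffices here; for non-quadratic F the same step must be iterated, with the caveats listed below.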

If the objective function is not quadratic, then:

- v^(k) does not in general point towards a minimum; the iteration rule must be applied repeatedly.
- The step length s^(k) = 1 may lead to a point with a worse value of the objective function; the search diverges, e.g., when ∇²F(x^(k)) is not positive-definite.
- It can happen that ∇²F(x^(k)) is singular or almost singular, so that the Hessian matrix cannot be inverted.

Furthermore, it depends on the starting point x^(0) whether a minimum, a maximum, or a saddle point is approached, or whether the whole iteration diverges. The strategy itself does not distinguish the stationary points with regard to type.

If the method does converge, then the convergence is of better than linear order (Goldstein). Under certain very strict conditions on the structure of the objective function and its derivatives, even second order convergence can be achieved (e.g., Polak); that is, the number of exact significant figures in the approximation to the minimum solution doubles from iteration to iteration. This phenomenon is exhibited in the solution of some test problems, particularly in the neighborhood of the desired extremum.

All the variations of the basic procedure to be described are aimed at increasing the reliability of the Newton iteration without sacrificing the high convergence rate. A distinction is made here between quasi-Newton strategies, which do not evaluate the Hessian matrix explicitly, and modified Newton methods, for which first and second derivatives must be provided at each point. The only strategy presently known which makes use of higher than second order properties of the objective function is due to Biggs.

The simplest modification of the Newton-Raphson scheme consists of determining the step length s^(k) by a line search in the Newton direction v^(k) until the relative optimum is reached (e.g., Dixon):

    F(x^(k+1)) = F(x^(k) + s^(k) v^(k)) = min_s {F(x^(k) + s v^(k))}

To save computational operations, the second partial derivatives can be redetermined less frequently and used for several iterations. Care must always be taken, however, that v^(k) always points downhill, i.e., that the angle between v^(k) and −∇F(x^(k)) is less than 90°. The Hessian matrix must also be positive-definite. If the eigenvalues of the matrix are calculated when it is inverted, their signs show whether this condition is fulfilled. If a negative eigenvalue appears, Pearson suggests proceeding in the direction of the associated eigenvector until a point is reached with positive-definite ∇²F(x). Greenstadt simply replaces negative eigenvalues by their absolute value, and vanishing eigenvalues by unity. Other proposals have been made to keep the Hessian matrix positive-definite by addition of a correction matrix (Goldfeld, Quandt, and Trotter; Shanno), or to include simple gradient steps in the iteration scheme (Dixon and Biggs). Further modifications, which operate on the matrix inversion procedure itself, have been suggested by Goldstein and Price, Fiacco and McCormick, and Matthews and Davies. A good survey has been given by Murray.

Very few algorithms exist that determine the first and second partial derivatives numerically from trial step operations (Whitley; see also Wasscher; Wegge). The inevitable approximation errors too easily cancel out the advantages of the Newton directions.


Davidon-Fletcher-Powell Method (DFP): Quasi-Newton Strategy, Variable Metric Strategy

Much greater interest has been shown in a group of second order gradient methods that attempt to approximate the Hessian matrix and its inverse during the iterations, using only first order data. This now extensive class of quasi-Newton strategies has grown out of the work of Davidon; Fletcher and Powell improved the scheme and translated it into a practical procedure. The Davidon-Fletcher-Powell or DFP method and some variants of it are also known as variable metric strategies. They are sometimes also regarded as conjugate gradient methods, because in the quadratic case they generate conjugate directions. For higher order objective functions this is no longer so. Whereas the variable metric concept is to approximate Newton directions, this is not the case for conjugate gradient methods. The basic recursion formula for the DFP method is

    x^(k+1) = x^(k) + s^(k) v^(k)

with

    v^(k) = −H^(k) ∇F(x^(k)),   H^(0) = I

and

    H^(k+1) = H^(k) + A^(k)

The correction A^(k) to the approximation H^(k) for the inverse Hessian matrix is derived from information collected during the last iteration: from the change in the variable vector,

    y^(k) = x^(k+1) − x^(k) = s^(k) v^(k),

and the change in the gradient vector,

    z^(k) = ∇F(x^(k+1)) − ∇F(x^(k));

it is given by

    A^(k) = y^(k) (y^(k))^T / ((y^(k))^T z^(k)) − H^(k) z^(k) (H^(k) z^(k))^T / ((z^(k))^T H^(k) z^(k))
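A minimal sketch of the DFP iteration on a quadratic objective (where the line search is available in closed form; the example matrix is hypothetical) illustrates the update; for a quadratic, n exact steps reach the minimum and H then equals the inverse Hessian:

```python
import numpy as np

def dfp(A, b, x0, iters=None):
    """DFP quasi-Newton method for F(x) = 0.5 x^T A x - b^T x, using
    the exact line search available in closed form for this quadratic."""
    n = len(b)
    x = np.asarray(x0, dtype=float)
    H = np.eye(n)                    # H^(0) = I approximates the inverse Hessian
    g = A @ x - b
    for _ in range(iters or n):
        v = -H @ g                   # quasi-Newton direction
        s = -(g @ v) / (v @ A @ v)   # exact step length for the quadratic
        y = s * v                    # change in the variable vector
        x = x + y
        g_new = A @ x - b
        z = g_new - g                # change in the gradient vector
        Hz = H @ z
        H = H + np.outer(y, y) / (y @ z) - np.outer(Hz, Hz) / (z @ Hz)
        g = g_new
    return x, H

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min, H = dfp(A, b, np.zeros(2))
```

After n = 2 steps, x_min solves A x = b and H has been driven to A^(−1), illustrating how the rank-two correction accumulates second order information from first order data.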

The step length s^(k) is obtained by a line search along v^(k). Since the first partial derivatives are needed in any case, they can be made use of in the one-dimensional minimization. Fletcher and Powell do so in the context of a cubic Hermitian interpolation. A corresponding ALGOL program has been published by Wells (for corrections see Fletcher; Hamilton and Boothroyd; House). The first derivatives must be specified as functions, which is usually inconvenient and often impossible. The convergence properties of the DFP method have been thoroughly investigated, e.g., by Broyden, Adachi, Polak, and Powell. Numerous suggestions have thereby been made for improvements. Convergence is achieved if F(x) is convex. Under stricter conditions it can be proved that the convergence rate is greater than linear and the sequence of iterations converges quadratically, i.e., after a finite number (at most n) of steps the minimum of a quadratic function is located. Myers and Huang show that, if the same starting point is chosen and the objective function is of second order, the DFP algorithm generates the same iteration points as the conjugate gradient method of Fletcher and Reeves.

All these observations are based on the assumption that the computational operations, including the line searches, are carried out exactly. Then H^(k) always remains positive-definite if H^(0) was positive-definite, and the minimum search is stable, i.e., the objective function is improved at each iteration. Numerical tests (e.g., Pearson; Tabak; Huang and Levy; Murtagh and Sargent; Himmelblau) and theoretical considerations (Bard; Dixon) show that rounding errors, and especially inaccuracies in the one-dimensional minimization, frequently cause stability problems: the matrix H^(k) can easily lose its positive-definiteness, without this being due to a singularity in the inverse Hessian matrix. The simplest remedy for a singular matrix H^(k), or one of reduced rank, is to forget from time to time all the experience stored within H^(k) and to begin again with the unit matrix and simple gradient directions (Bard; McCormick and Pearson). To do so certainly increases the number of necessary iterations; but in optimization, as in other activities, it is wise to put safety before speed. Stewart makes use of this procedure. His algorithm is of very great practical interest, since he obtains the required information about the first partial derivatives from function values alone, by means of a cleverly constructed difference scheme.

Strategy of Stewart: Derivative-Free Variable Metric Method

Stewart focuses his attention on choosing the length of the trial step d_i^(k) for the approximation

    g_i^(k) ≈ ∂F(x)/∂x_i at x = x^(k)

to the first partial derivatives in such a way as to minimize the influence of rounding errors on the actual iteration process. Two difference schemes are available:

    g_i^(k) = [F(x^(k) + d_i^(k) e_i) − F(x^(k))] / d_i^(k)   (forward difference)

and

    g_i^(k) = [F(x^(k) + d_i^(k) e_i) − F(x^(k) − d_i^(k) e_i)] / (2 d_i^(k))   (central difference)

Application of the one-sided forward difference is preferred, since it only involves one extra function evaluation. To simplify the computation, Stewart introduces the vector h^(k), which contains the diagonal elements of the matrix (H^(k))^(−1), representing information about the curvature of the objective function in the coordinate directions.
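The two schemes can be written down directly; the function and the fixed step lengths below are arbitrary illustrations, not Stewart's adaptive choices:

```python
import numpy as np

def grad_forward(F, x, d):
    """Forward differences: one extra function evaluation per component."""
    n = len(x)
    Fx = F(x)
    e = np.eye(n)
    return np.array([(F(x + d[i] * e[i]) - Fx) / d[i] for i in range(n)])

def grad_central(F, x, d):
    """Central differences: twice the cost, but truncation errors of
    order d^2 instead of d (used when the forward estimate degrades)."""
    n = len(x)
    e = np.eye(n)
    return np.array([(F(x + d[i] * e[i]) - F(x - d[i] * e[i])) / (2 * d[i])
                     for i in range(n)])

F = lambda x: x[0]**2 + 3.0 * x[0] * x[1] + 2.0 * x[1]**2
x = np.array([1.0, -1.0])
d = np.full(2, 1e-6)
g_fwd = grad_forward(F, x, d)
g_ctr = grad_central(F, x, d)
```

At (1, −1) the true gradient is (−1, −1); the forward estimate carries an O(d) truncation error, while the central one is limited essentially by rounding, which is exactly the trade-off Stewart's step-length rules manage.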

The algorithm for determining the g_i^(k), i = 1, ..., n, runs as follows:


Step 1:
Set η = max{ε_b, ε_c |F(x^(k))|}.
Here ε_b represents an estimate of the absolute error in the calculation of F(x), and ε_c an estimate of the corresponding relative error; Stewart specifies fixed values for both.

Step 2:
If (g_i^(k))² ≤ h_i^(k) F(x^(k)), so that the rounding error of the function values dominates, define the trial step length from the curvature information by a relation of the form

    d_i ∼ sqrt(η / h_i^(k));

otherwise define it from g_i^(k) and h_i^(k) in such a way that the truncation error of the difference quotient and the rounding error balance. In both cases the curvature estimate h_i^(k) is also updated from the new difference quotients.

Step 3:
Give d_i^(k) the magnitude obtained in step 2 and a sign determined by sign(h_i^(k)) and sign(g_i^(k)), and bound it by fixed factors relative to its previous value d_i^(k−1). If |h_i^(k) d_i^(k)| remains small compared to |g_i^(k)|, use the forward difference scheme; otherwise enlarge the trial step to

    d_i^(k) ∼ sqrt(|g_i^(k)| / h_i^(k))

and use the central difference scheme. Stewart chooses fixed numerical values for the factors involved.

Stewart's main algorithm takes the following form:

Step 0: (Initialization)
Choose an initial value x^(0), accuracy requirements ε_ai > 0 for i = 1, ..., n, and initial step lengths d_i^(0) for the gradient determination, e.g., proportional to |x_i^(0)| if x_i^(0) ≠ 0, or equal to a fixed small value if x_i^(0) = 0.
Calculate the vector g^(0) by the forward difference scheme, using the step lengths d_i^(0).
Set H^(0) = I, h_i^(0) = 1 for all i = 1, ..., n, and k = 0.

Step 1: (Prepare for line search)
Determine v^(k) = −H^(k) g^(k).
If k = 0, go to step 3.
If (g^(k))^T v^(k) ≥ 0, go to step 2.
If h_i^(k) > 0 for all i = 1, ..., n, go to step 3.

Step 2: (Forget second order information)
Replace H^(k) by H^(0) = I, h_i^(k) by h_i^(0) = 1 for all i = 1, ..., n, and v^(k) by −g^(k).

Step 3: (Line search and eventual break-off)
Determine x^(k+1) such that

    F(x^(k+1)) = min_s {F(x^(k) + s v^(k))}

If F(x^(k+1)) ≥ F(x^(k)), end the search with result x^(k), F(x^(k)).

Step 4: (Prepare for inverse Hessian update)
Determine g^(k+1) by the above difference scheme.
Construct y^(k) = x^(k+1) − x^(k) and z^(k) = g^(k+1) − g^(k).
If k ≥ n and |v_i^(k)| ≤ ε_ai and |y_i^(k)| ≤ ε_ai for all i = 1, ..., n, end the search with result x^(k+1), F(x^(k+1)).

Step 5: (Update inverse Hessian)
Construct H^(k+1) = H^(k) + A^(k) using the DFP correction, and update the curvature estimates h_i^(k+1) for all i = 1, ..., n from h_i^(k), v^(k), z^(k), g^(k), and the step length s^(k).

Step 6: (Main loop termination criterion)
If the denominators are non-zero, increase k ← k + 1 and go to step 1; otherwise terminate with result x^(k+1), F(x^(k+1)).

In place of the cubic Hermitian interpolation, Stewart includes a quadratic Lagrangian interpolation, as used by Powell in his conjugate directions strategy. Gradient calculations at the argument values are thereby avoided. One point, x^(k), is given each time by the initial vector of the line search. The second, x^(k) + s v^(k), is situated in the direction v^(k) at a distance

    s = min{1, −2 [F(x^(k)) − F_m] / ((g^(k))^T v^(k))}

F_m is an estimate of the value of the objective function at the minimum being sought; it must be specified beforehand. The bound s = 1 corresponds to the length of a Newton-Raphson step. The third argument value is calculated from knowledge of the points x^(k) and x^(k) + s v^(k), their associated objective function values, and (g^(k))^T v^(k), the derivative of the objective function at the point x^(k) in the direction v^(k). The sequence of Lagrangian interpolations is broken off if at any time the predicted minimum worsens the situation, or lies at such a distance outside the interval that it is more than twice as far from the next point as the latter is from the midpoint.

Lill (see also Kovacs and Lill) has published a complete ALGOL program for the derivative-free DFP strategy of Stewart; it differs slightly from the original only in the line search. Fletcher reports tests that demonstrate the superiority of Stewart's algorithm over Powell's as the number of variables increases.


Brown and Dennis, and Gill and Murray, have suggested other schemes for obtaining the partial derivatives numerically from values of the objective function. Stewart himself reports tests that show the usefulness of his rules, insofar as the results are completely comparable to others obtained with the help of analytically specified derivatives. This may be simply because rounding errors are in any case more significant here (due to the matrix operations) than, for example, in conjugate gradient methods. Kelley and Myers therefore recommend carrying out the matrix operations with double precision.

Further Extensions

The ability of the quasi-Newton strategy of Davidon, Fletcher, and Powell (DFP) to construct Newton directions without needing explicit second partial derivatives makes it very attractive from a computational point of view. All efforts in the further rapid and intensive development of the concept have been directed at modifying the correction formula so as to reduce the tendency to instability caused by rounding errors and inexact line searches, while retaining as far as possible the quadratic convergence. There has been a spate of corresponding proposals, and both theoretical and experimental investigations on the subject; for example:

Adachi
Bass
Broyden
Broyden, Dennis, and Moré
Broyden and Johnson
Davidon
Dennis
Dixon
Fiacco and McCormick
Fletcher
Gill and Murray
Goldfarb
Goldstein and Price
Greenstadt
Hestenes
Himmelblau
Hoshino
Huang
Huang and Chambliss
Huang and Levy
Jones
Lootsma
Mamen and Mayne
Matthews and Davies
McCormick and Pearson
McCormick and Ritter
Murray
Murtagh
Murtagh and Sargent
Oi, Sayama, and Takamatsu
Oren
Ortega and Rheinboldt
Pierson and Rajtora
Powell
Rauch
Ribière
Sargent and Sebastian
Shanno and Kettler
Spedicato
Tabak
Tokumaru, Adachi, and Goto
Werner
Wolfe

Many of the differently sophisticated strategies, e.g., the classes or families of similar methods defined by Broyden and by Huang, are theoretically equivalent: they generate the same conjugate directions v^(k) and, with an exact line search, the same sequence x^(k) of iteration points, if F(x) is quadratic. Dixon even proves this identity for more general objective functions, under the condition that no term of the sequence H^(k) is singular.

The important finding that, under certain assumptions, convergence can also be achieved without line searches is attributed to Wolfe. A recursion formula satisfying these conditions is as follows:

    H^(k+1) = H^(k) + B^(k)

where

    B^(k) = (y^(k) − H^(k) z^(k)) (y^(k) − H^(k) z^(k))^T / ((y^(k) − H^(k) z^(k))^T z^(k))
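A sketch of this rank-one update on a quadratic model (with a hypothetical matrix): for a quadratic, the gradient change along a step y is z = A y, and feeding n linearly independent steps reproduces the inverse Hessian without any line search — the property that motivates formulas of this type. The denominator test guards against the unbounded-correction case discussed below.

```python
import numpy as np

def rank_one_inverse_hessian(A, steps):
    """Symmetric rank-one update of H ≈ inverse Hessian from observed
    pairs (y, z) = (step, gradient change); for a quadratic objective
    z = A y, and n independent steps reproduce A^-1 without line searches."""
    n = A.shape[0]
    H = np.eye(n)
    for y in steps:
        z = A @ y                   # gradient change along the step
        r = y - H @ z
        denom = r @ z
        if abs(denom) > 1e-12:      # skip the update when it would blow up
            H = H + np.outer(r, r) / denom
    return H

A = np.array([[5.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 3.0]])
steps = [np.eye(3)[i] for i in range(3)]  # any n independent steps
H = rank_one_inverse_hessian(A, steps)
```

The update preserves all previously satisfied secant conditions H z = y, so after three independent steps H coincides with A^(−1).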

The formula was proposed independently by Broyden, Davidon, Pearson, and Murtagh and Sargent (see Powell). The correction matrix B^(k) has rank one, while A^(k) of the DFP formula is of rank two. Rank one methods, also called variance methods by Davidon, cannot guarantee that H^(k) remains positive-definite. It can also happen, even in the quadratic case, that H^(k) becomes singular or B^(k) increases without bound. Hence, in order to make methods of this type useful in practice, a number of additional precautions must be taken (Powell; Murray). The following compromise proposal,

    H^(k+1) = H^(k) + A^(k) + θ^(k) B^(k),

where the scalar parameter θ^(k) can be freely chosen, is intended to exploit the advantages of both concepts while avoiding their disadvantages (e.g., Fletcher; Broyden).


Shanno, and Shanno and Kettler, give criteria for choosing suitable θ^(k). However, the mixed correction, also known as the BFS or Broyden-Fletcher-Shanno formula, cannot guarantee quadratic convergence either, unless line searches are carried out. It can merely be proved that there will be a monotonic decrease in the eigenvalues of the matrix H^(k). From numerical tests, however, it turns out that the increased number of iterations is usually more than compensated for by the saving in function calls made by dropping the one-dimensional optimizations (Fletcher). Fielding has designed an ALGOL program following Broyden's work, with line searches (Broyden). With regard to the number of function calls it is usually inferior to the DFP method, but it sometimes also converges where the variable metric method fails. Dixon defines a correction to the chosen directions,

k  k  k  k 

v H rF x w

where



w

and

k  k  T k 

x x rF x

k  k  k  k 

x x w w

k  k  T k 

x x z

by which together with a matrix correction as given by Equation quadratic con

vergence can b e achieved without line searches He shows that at most n function

k 

calls and gradient calculations are required each time if after arriving at v an

iteration

k  k  k  k 

x x H rF x

is included. Nearly all the procedures defined assume that at least the first partial derivatives are specified as functions of the variables, and are therefore exact to the significant figure accuracy of the computer used. The more costly matrix computations should, wherever possible, be executed with double precision in order to keep down the effect of rounding errors.
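The mixed correction above blends two update matrices with a free scalar parameter. A minimal numerical sketch, assuming the standard DFP and BFGS corrections of the inverse-Hessian approximation play the roles of A^{(k)} and B^{(k)} (the text leaves their exact form to the cited papers, so the function names and layout here are illustrative only):

```python
import numpy as np

def dfp(H, s, y):
    # DFP correction of the inverse-Hessian approximation:
    # H+ = H + s s^T/(s^T y) - (H y)(H y)^T/(y^T H y)
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

def bfgs(H, s, y):
    # BFGS correction: H+ = V H V^T + s s^T/(s^T y),  V = I - s y^T/(s^T y)
    rho = 1.0 / (s @ y)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def mixed_update(H, s, y, theta):
    # One-parameter family: theta = 0 gives DFP, theta = 1 gives BFGS.
    # Every member inherits the secant condition H+ y = s.
    return (1.0 - theta) * dfp(H, s, y) + theta * bfgs(H, s, y)
```

Since both corrections satisfy the secant condition H^{(k+1)} y^{(k)} = s^{(k)} (with s the step and y the gradient difference), so does any convex combination of the two, which is what makes the free choice of the parameter possible.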

Just two more suggestions for derivative-free quasi-Newton methods will be mentioned here: those of Greenstadt and of Cullum. While Cullum's algorithm, like Stewart's, approximates the gradient vector by function value differences, Greenstadt attempts to get away from this. Analogously to Davidon's idea of approximating the Hessian matrix during the course of the iterations from knowledge of the gradients, Greenstadt proposes approximating the gradients by using information from objective function values over several subiterations. Only at the starting point must a difference scheme for the first partial derivatives be applied. Another interesting variable metric technique, described by Elliott and Sworder, combines the concept of stochastic approximation for the sequence of step lengths with the direction algorithms of the quasi-Newton strategy.
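Approximating the gradient vector by function value differences, as Stewart's and Cullum's algorithms do, can be sketched in a few lines; forward differences are used here, and the step size h is an illustrative choice, not one prescribed by either author:

```python
import numpy as np

def fd_gradient(F, x, h=1e-7):
    # Forward-difference gradient estimate: n extra objective function
    # calls for n variables, in addition to the one call at x itself.
    f0 = F(x)
    g = np.empty(len(x))
    for i in range(len(x)):
        xh = x.astype(float).copy()
        xh[i] += h           # perturb one coordinate at a time
        g[i] = (F(xh) - f0) / h
    return g
```

The truncation error of this scheme is of order h, while the rounding error grows as h shrinks; this trade-off is one reason for the double-precision advice given above.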

Quasi-Newton strategies of degree one are especially suitable if the objective function is a sum of squares (Bard). Problems of minimizing a sum of squares arise, for example, from the problem of solving systems of simultaneous nonlinear equations, or from determining the parameters of a mathematical model from experimental data (nonlinear regression and curve fitting). Such objective functions are easier to handle because Newton directions can be constructed straightaway, without second partial derivatives, as long as the Jacobian matrix of first derivatives of each term of the objective function is given. The oldest iteration procedure constructed on this basis is variously known as the Gauss-Newton method (after Gauss), the generalized least squares method, or the Taylor series method. It has all the advantages and disadvantages of the Newton-Raphson strategy. Improvements on the basic procedure are given by Levenberg and Marquardt.

Wolfe's secant method (see also Jeeves) is the forerunner of many variants which do not require the Jacobian matrix to be specified at the start, but construct it in the course of the iterations. Further details will not be described here; the reader is referred to the specialist literature, again covering only the period up to the time of writing:

Barnes, J.G.P.
Bauer, F.L.
Beale
Brown and Dennis
Broyden
Davies and Whitting
Dennis
Fletcher
Golub
Jarratt
Jones
Kowalik and Osborne
Morrison
Ortega and Rheinboldt
Osborne
Peckham
Powell
Powell and MacDonald
Rabinowitz
Ross
Smith and Shanno
Späth (see also Silverman)
Stewart
Vitale and Taylor
Zeleznik

Brent gives further references. Peckham's strategy is perhaps of particular interest. It represents a modification of the simplex method of Nelder and Mead and of Spendley, and in tests it proves to be superior to Powell's strategy with regard to the number of function calls. It should be mentioned here, at least, that nonlinear regression, where parameters that enter the model in a nonlinear way (e.g., as exponents) have to be estimated, in general requires a global optimization method, because the squared sum of residuals defines a multimodal function.
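The multimodality warning can be illustrated with the single residual r(x) = x^2 - 1: the squared sum F = r^2 has minima at both x = -1 and x = +1, and a local descent method simply converges to whichever basin its starting point lies in. A crude multistart sketch (the step size and iteration count are arbitrary choices for this toy problem):

```python
def descend(dF, x, step=0.05, iters=2000):
    # plain gradient descent stands in for any local hill climbing method
    for _ in range(iters):
        x -= step * dF(x)
    return x

def multistart(F, dF, starts):
    # run a local search from every start and keep the best local minimum
    locals_found = [descend(dF, x0) for x0 in starts]
    return min(locals_found, key=F), locals_found

F = lambda x: (x * x - 1.0) ** 2          # squared residual, two minima
dF = lambda x: 4.0 * x * (x * x - 1.0)    # its derivative
best, found = multistart(F, dF, [-2.0, 0.5])
```

Starting from -2.0 the descent settles at x = -1, starting from 0.5 it settles at x = +1; only by comparing several local results does one obtain a defensible global answer.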

Reference has been made to a number of publications in this and the preceding chapter in which strategies are described that can hardly be called genuine hill climbing methods; they would fall more naturally under the headings of mathematical programming or functional optimization. It was not, however, the intention to give an introduction to the basic principles of these two very wide subjects. The interested reader will easily find out that, although a nearly exponentially increasing number of new books and journals have become available during the last three decades, she or he will look in vain for new direct search strategies in that realm. Such methods form the core of this book.

Hill climbing Strategies