Addis Ababa University
School of Graduate Studies
Department of Mathematics
A Graduate project report
On
Derivative Free Optimization
By: Teklebirhan Abraha
A Project submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Mathematics
Advisor: Semu Mitku (Ph.D)
Addis Ababa, Ethiopia
January, 2011
Acknowledgment

With much pleasure, I take the opportunity to thank all the people who have helped me through the course of my graduate study.
First and foremost I would like to thank my advisor and instructor, Dr. Semu Mitku, for his invaluable guidance, advice, encouragement and for providing me plenty of materials related to my project work, without which this well organized (compiled) project would have been impossible. His constant confidence, persistent questioning and deep knowledge of the subject matter have been and will always be inspiring to me. Finally I would like to express my gratitude to all my families, friends, and those who supported me in any means to complete this project work.
T/Birhan Abraha
A.A.U
January, 2011
Abstract

We will present derivative free algorithms which optimize nonlinear unconstrained optimization problems of the following kind:

min f(x) over x ∈ ℝ^n, where f : ℝ^n → ℝ.

The algorithms developed for this type of problems are categorized as one-dimensional search methods (golden section and Fibonacci) and multidimensional search methods (Powell's method and trust region). These algorithms will, hopefully, find the value of x for which f(x) is the lowest.

The dimension n of the search space must be lower than some number (say 100). We do NOT have to know the derivatives of f(x). We must only have a code which evaluates f(x) for a given value of x. Each component of the vector x must be a continuous real parameter of f(x).
Table of contents

Introduction
Chapter 1 Preliminary concepts
1.1 Nonlinear optimization methods
    Need for derivative free optimization
1.2 Overview of differentiable nonlinear unconstrained optimization methods
    1.2.1 The problem
    1.2.2 Solution concepts
    1.2.3 Basic concepts of the methods
    1.2.4 Necessary conditions for solutions of differentiable optimization problems
1.3 Methods of solving nonlinear differentiable optimization problems
    1.3.1 Gradient method (first order derivative method)
    1.3.2 Newton's method (second order derivative method)
    1.3.3 Line search methods
    1.3.4 Trust region methods
Chapter 2 Derivative Free Optimization Methods
2.1 What is derivative free optimization
2.2 Line search methods
    2.2.3 Interpolation Methods
2.3 Multidimensional search
    2.3.1 Univariate Method
    2.3.2 Pattern Directions
    2.3.4 Powell's Method
Chapter 3 Trust region methods
3.1 Trust region framework
3.2 Quadratic interpolation
Appendix (MATLAB codes)
References
Declaration Letter
I, Teklebirhan Abraha, declare that this project has been composed by me and that no part of
the project has formed the basis for the award of any Degree, Diploma, Associateship,
Fellowship or any other similar title to me.
T/Birhan Abraha
______
Addis Ababa University
January, 2011
Permission Letter
This is to certify that this project is compiled by Mr. Teklebirhan Abraha in the department of
Mathematics, College of Mathematics and Computational Sciences, Addis Ababa University, under my supervision.
Semu Mitku (Ph.D)
______
Addis Ababa University
January, 2011
Introduction

In the process of solving optimization problems, it is well known that expensive useful information is contained in the derivatives of the objective function one wishes to optimize. After all, the standard mathematical characterization of a local minimum, given by the first order necessary conditions, requires continuous differentiability of the function and that the first order derivatives be zero. However, for a variety of reasons there have always been many instances where (at least some) derivatives are unavailable or unreliable. Nevertheless, under such circumstances, it may still be desirable to carry out optimization [1].
Consequently, classes of nonlinear optimization techniques called derivative-free optimization methods are needed. In fact, we consider optimization techniques without derivatives as one of the most important and challenging areas in computational science and engineering. Derivative free optimization (DFO) is developed to date for solving small dimensional problems (less than 100 variables) in which the computation of an objective function is relatively expensive and the derivatives of the objective function are not available. Problems of this nature arise more and more in modern physical, chemical and econometric measurements and in engineering applications where computer simulation is employed for the evaluation of the objective function [6].
There are two important components of derivative free methods. The first is sampling better points in the iteration procedure; the other is searching appropriate subspaces where the chance of finding a minimum is relatively high. In order to be able to use the extensive convergence theory for derivative based methods, these derivative free methods need to satisfy some properties. For instance, to guarantee the convergence of a derivative free method, we need to ensure that the error in the gradient converges to zero when the trust region or line search steps are reduced. Hence a descent step will be found if the gradient of the model function is not zero at the current iterate.
The problem of minimizing a nonlinear function f : ℝ^n → ℝ of several variables, when the derivatives of the function are not available, is attempted to be solved by the derivative free methods (DFM). The formal statement of the above problem can be written as

    min f(x)
    s.t. a_i ≤ c_i(x) ≤ b_i  (i = 1, 2, …, m)          (*)
         x ∈ B ⊆ ℝ^n,

where ∇f(x) cannot be computed, or just does not exist, for every x. Here B is an arbitrary subset of ℝ^n, and x ∈ B is an easy constraint, while the functions c_i(x) (i = 1, 2, …, m) represent difficult constraints. By easy constraints we mean bound constraints on the variables, linear constraints or, more generally, nonlinear smooth constraints whose values and Jacobian matrix can be computed cheaply (easily).

Difficult constraints are nonlinear constraints whose values are expensive (difficult) to compute and whose derivatives are unavailable. As a brief review and preparation, we include some basic concepts of differentiable optimization in this project.
If the objective function f in (*) above is differentiable and x_0 is a local minimizer of f, then ∇f(x_0) = 0; conversely, if ∇f(x_0) = 0 and ∇²f(x_0) is positive definite, then x_0 is a local minimizer of f. But in most of the cases, finding algebraically a point x_0 such that ∇f(x_0) = 0 may be difficult, or computing ∇f(x) or ∇²f(x) may be expensive, as the function cannot (may not) be defined explicitly, or the function may not even be differentiable at all. In this case, numerical methods with derivative free algorithms are required. Methods such as line search and trust region methods will be discussed in this project, because line search without derivatives and trust-region algorithms are used to solve optimization problems without derivatives.
The first chapter of this project deals with nonlinear optimization problems and the methods of solving these problems. The second and the third chapters are on derivative free optimization methods. Particularly, in the second chapter we include line search methods which help us to solve one dimensional minimization problems, such as the golden section method and the Fibonacci method, and best known methods for multidimensional search, like Powell's method, are discussed as well. Finally, in the last chapter we include the trust region method, which is one of the methods for derivative free optimization.
Chapter 1 Preliminary concepts
1.1 Nonlinear optimization methods

Optimization methods can be classified as derivative based and derivative free methods, depending on their use of derivatives (or the absence of them) in the process of finding a solution.
Derivative based optimization methods are characterized by: explicit use of the derivatives of the objective function, the possibility of analytical solutions, and faster convergence. Derivative free optimization methods are characterized by: only objective function evaluations, no required derivative information, and the ability to handle noisy functions (since the methods rely only on function comparisons).
Need for derivative free optimization

Some of the reasons to apply derivative free optimization (DFO) methods are: growing sophistication of computer hardware and mathematical algorithms and software (which opens new possibilities for optimization); derivative evaluations that are costly and noisy (one cannot trust derivatives or approximate them by finite differences); binary codes (source codes not available or owned by a company), making automatic differentiation impossible to apply; legacy codes (written in the past and not maintained by the original author); and lack of sophistication of the user (the user needs improvement but wants to use something simple).
With the current state of the art, DFO methods can successfully address problems where:
• The evaluation of derivatives is expensive and/or computed with noise (and for which accurate finite difference derivative estimation is ruled out).
• The number of variables does not exceed, say, a hundred (in serial computation).
• The functions are not excessively nonsmooth.
• Rapid asymptotic convergence is not of primary importance.
• Only a few digits of accuracy are required.
It is hard to minimize nonconvex functions without derivatives; however, it is generally accepted that DFO methods have the ability to find "good" local optima.
1.2 Overview of differentiable nonlinear unconstrained optimization methods
1.2.1 The problem

Optimization problems can be divided into two large classes, namely constrained and unconstrained problems. The basic unconstrained optimization problem can be stated in its standard form as

    minimize f(x) subject to x ∈ ℝ^n,                 (1.1)

where f : ℝ^n → ℝ is the objective function. On the other hand, constrained optimization problems can be written as

    minimize f(x)                                      (1.2a)
    subject to x ∈ ℝ^n,
               c_i(x) ≤ 0, i ∈ I,                      (1.2b)
               c_i(x) = 0, i ∈ E,                      (1.2c)

where conditions (1.2b)-(1.2c) indicate the constraints. The disjoint index sets I and E correspond to the inequality and equality constraints, respectively, defined by the functions c_i : ℝ^n → ℝ, i ∈ I ∪ E; the set of points they describe is contained in ℝ^n and is also contained in the domain of f and of the c_i, i ∈ I ∪ E.

A point x is said to be feasible if it satisfies all the constraints, and the set of all feasible points is called the feasible set, denoted by ℱ.
The formulations (1.1) and (1.2) are called standard formulations due to the observation that

    max f(x) = −min(−f(x)).
1.2.2 Solution concepts

The solution of an optimization problem can be characterized by certain properties. In a minimization problem, we are looking for a point x* of ℝ^n such that

    f(x*) ≤ f(x) for all x ∈ ℝ^n.

Then x* is called a global minimizer, whereas f(x*) is the global minimum of f. Similarly, in a constrained problem, the solution must lie in the feasible set ℱ, and thus a global constrained minimizer x* satisfies

    f(x*) ≤ f(x) for all x ∈ ℱ.

However, in both cases, finding a global minimizer of a function f can prove to be very difficult in practice. It might be interesting, thus, to look for a solution x* in a neighborhood N of x*, such that

    f(x*) ≤ f(x), for all (feasible) x in N.           (1.3)

The point x* is then called a local minimizer, whereas f(x*) is a local minimum of f in N. If x* is such that

    f(x*) < f(x), for all (feasible) x ∈ N, x ≠ x*,    (1.4)

then x* is said to be a strict local minimizer, and f(x*) is a strict local minimum of f in N.
1.2.3 Basic concepts of the methods

Derivatives and Taylor's Theorem

All methods considered here are based on the fact that the function f to be minimized has derivatives. If f is differentiable and all derivatives of f are continuous with respect to x, then we say that f is continuously differentiable, denoted f ∈ C¹. If all the second partial derivatives of f exist, then f is said to be twice differentiable. If, furthermore, all second partial derivatives of f are continuous, we say that f is twice continuously differentiable, denoted f ∈ C².

The gradient of f is a vector that groups all its partial derivatives and is denoted by

    ∇f(x) = (∂f(x)/∂x_1, …, ∂f(x)/∂x_n)^T.

The n × n matrix of second partial derivatives,

    ∇²f(x) = [∂²f(x)/∂x_i∂x_j]  (i, j = 1, …, n),

is called the Hessian matrix of f. The curvature of f at x along a direction d ∈ ℝ^n is given by

    ⟨d, ∇²f(x)d⟩ / ||d||².

If ∇²f(x) is positive semidefinite for every x in the domain of f, we say that f is a convex function. If ∇²f(x) is positive definite in its domain, we say that f is strictly convex.
Theorem 1.1 (Taylor's Theorem)

Let f : ℝ^n → ℝ be continuously differentiable and s ∈ ℝ^n. Then there exists some t ∈ (0, 1) such that

    f(x + s) = f(x) + ⟨∇f(x + ts), s⟩.

Moreover, if f is twice continuously differentiable, then

    ∇f(x + s) = ∇f(x) + ∫₀¹ ∇²f(x + ts)s dt.

Furthermore, we have that, for some t ∈ (0, 1),

    f(x + s) = f(x) + ⟨∇f(x), s⟩ + (1/2)⟨s, ∇²f(x + ts)s⟩.        (1.5)

A function f : ℝ^n → ℝ is said to be differentiable if all its partial derivatives ∂f(x)/∂x_i exist.
1.2.4 Necessary conditions for solutions of differentiable nonlinear optimization problems (the case of unconstrained problems)

All the unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. The iterative process is given by

    x_{i+1} = x_i + α_i* s_i,                          (1.6)

where x_i is the starting point, s_i is the search direction, α_i* is the optimal step length, and x_{i+1} is the final point in iteration i. It is important to note that all the unconstrained minimization methods (1) require an initial point x_1 to start the iterative procedure, and (2) differ from one another only in the method of generating the new point x_{i+1} (from x_i) and in testing the point x_{i+1} for optimality.

Rate of Convergence

Different iterative optimization methods have different rates of convergence. In general, an optimization method is said to have convergence of order p if
    ||x_{k+1} − x*|| / ||x_k − x*||^p ≤ c,  c ≥ 0, p ≥ 1,          (1.7)

where x_k and x_{k+1} denote the points obtained at the end of iterations k and k + 1, respectively, x* represents the optimum point, and ||x|| denotes the length or norm of the vector x:

    ||x|| = (x_1² + x_2² + ⋯ + x_n²)^{1/2}.

If p = 1 and 0 ≤ c ≤ 1, the method is said to be linearly convergent (corresponds to slow convergence). If p = 2, the method is said to be quadratically convergent (corresponds to fast convergence). An optimization method is said to have superlinear convergence (corresponds to fast convergence) if

    lim_{k→∞} ||x_{k+1} − x*|| / ||x_k − x*|| = 0.                 (1.8)

The definitions of rates of convergence given in Eqs. (1.7) and (1.8) are applicable to single-variable as well as multivariable optimization problems. In the case of single-variable problems, the vector x_k degenerates to a scalar.
Theorem 1.2 (First-Order Necessary Conditions)

If x* is a local minimizer of f : ℝ^n → ℝ, where f is continuously differentiable in an open neighborhood N of x*, then

    ∇f(x*) = 0.                                        (1.9)

If ∇²f exists and is continuous in a neighborhood of x*, we can state another necessary condition satisfied by a local minimizer.
Theorem 1.3 (Second-Order Necessary Conditions)

If x* is a local minimizer of f, and f is twice continuously differentiable in an open neighborhood N of x*, then

    ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.   (1.10)

Any point x* that satisfies (1.9) is called a stationary point of f. Thus Theorem 1.2 states that any local minimizer must be a stationary point; it is important to note, however, that the converse is not necessarily true. Fortunately, if the next conditions, called sufficient conditions, are satisfied by a stationary point x*, they guarantee that it is a local minimizer.

Theorem 1.4 (Second-Order Sufficient Conditions)

Let f be twice continuously differentiable on an open neighborhood N of x*. If x* satisfies ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimizer of f.

Note that the second-order sufficient conditions are not necessary: a point x* can be a strict local minimizer and fail to satisfy the sufficient conditions.
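As a small added illustration (the example function here is assumed, not taken from the text), these conditions can be checked numerically in MATLAB by inspecting the gradient and the eigenvalues of the Hessian at a candidate point:

    % Classify the stationary point x* = (0,0) of the assumed example
    % f(x1,x2) = x1^2 - x2^2, whose gradient vanishes at the origin.
    g  = [0; 0];                   % gradient of f at x*
    H  = [2 0; 0 -2];              % Hessian of f (constant for a quadratic)
    ev = eig(H);
    if norm(g) < 1e-10 && all(ev > 0)
        disp('strict local minimizer: second-order sufficient conditions hold')
    elseif norm(g) < 1e-10 && any(ev < 0)
        disp('stationary point but not a minimizer (a saddle point)')
    end
    % Here ev = [-2; 2], so the origin is a saddle point of this f.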
1.3 Methods of solving nonlinear differentiable optimization problems
1.3.1 Gradient method (first order derivative method)

Mathematical programming is concerned with methods that can be used to solve optimization problems. In practice we will be concerned with algorithms, defined so that their computational implementation finds either an approximate or an exact solution to the original mathematical programming problem [7].
These algorithms are mostly iterative methods, which choose a starting point x_0 and use some rule to compute a sequence of points {x_1, x_2, …, x_k, …} such that lim_{k→∞} x_k = x*, where x* is the solution to the problem.

Here we are mostly interested in unconstrained problems. The simplest method developed for the solution of a minimization problem is the steepest descent method. This method is based on the fact that, from any starting point x, the direction in which any function f decreases most rapidly is the direction −∇f(x).

Definition 1.1 (descent direction). Let f : ℝ^n → ℝ be differentiable at x̄. A vector d ∈ ℝ^n is a descent direction of f at x̄ if there exists δ > 0 such that f(x̄ + λd) < f(x̄) for all λ ∈ (0, δ].

Consider

    min f(x), x ∈ ℝ^n, where f : ℝ^n → ℝ.

If f is continuously differentiable at x_k ∈ ℝ^n and ∇f(x_k) ≠ 0, then there are infinitely many descent directions of f at x_k; that is, there exists d ∈ ℝ^n \ {0} such that ⟨∇f(x_k), d⟩ < 0.

To find the direction in which f decreases most rapidly, we solve the problem

    min ⟨∇f(x_k), d⟩ with ||d|| = 1

to get

    d = −∇f(x_k) / ||∇f(x_k)||.

Now d_k = −∇f(x_k), or d_k = −∇f(x_k)/||∇f(x_k)||, will be used as the searching direction. This method is called the steepest descent method. The following algorithm was given by Cauchy [8].
Algorithm (Method of steepest descent)

Given f : ℝ^n → ℝ continuously differentiable, at each iteration k find the lowest point of f in the direction −∇f(x_k) from x_k; that is, find α_k that solves

    min_{α>0} f(x_k − α∇f(x_k)),

then set x_{k+1} = x_k − α_k ∇f(x_k), and continue until ∇f(x_k) = 0.

If ∇f(x_k) = 0, then x_{k+1} = x_k and the algorithm stops. However, x_k may be either a local minimizer or a saddle point; i.e., ∇²f(x_k) is either positive semidefinite or indefinite. We know from the second-order Taylor expansion that, for sufficiently small α, at x_k

    f(x_k + αd) = f(x_k) + α∇f(x_k)^T d + (1/2)α² d^T ∇²f(x_k)d.

If ∇f(x_k) = 0 and the searching direction d is an eigenvector corresponding to a negative eigenvalue λ of ∇²f(x_k), we have

    f(x_k + αd) = f(x_k) + (1/2)α²λ||d||² ≤ f(x_k).

This implies that f(x_k + αd) < f(x_k) whenever α ≠ 0.

The steepest descent method is a line search method that moves along d_k = −∇f(x_k); at each step it can choose the step length α_k in various ways. One advantage of this method is that it requires only the calculation of ∇f(x_k); it does not require second derivatives. However, it can be slow on difficult problems: this method usually works quite well only during the early steps of the optimization process, depending on the point of initialization. The method of steepest descent is the simplest of the gradient methods. The choice of direction is where f decreases most quickly, which is the direction opposite to ∇f(x_k). The search starts at an arbitrary point x_0 and then slides down the gradient until we are close enough to the solution. The iterative procedure is

    x_{k+1} = x_k − α_k ∇f(x_k),

where ∇f(x_k) is the gradient at the given point x_k.

Now the question is how big the step taken in that direction should be; that is, what is the value of α_k? Obviously, we want to move to the point where the function f takes on a minimum value, which is where the directional derivative is zero. The directional derivative is given by

    d/dα_k f(x_{k+1}) = ∇f(x_{k+1})^T (d/dα_k x_{k+1}) = −∇f(x_{k+1})^T ∇f(x_k).

Setting this expression to zero, we see that α_k should be chosen so that ∇f(x_{k+1}) and ∇f(x_k) are orthogonal. The next step is then taken in the direction of the negative gradient at this new point, and we may get a zigzag pattern.
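The following minimal MATLAB sketch (added for illustration; the quadratic test problem is assumed, chosen because the exact minimizing step along −∇f(x_k) is then available in closed form) shows the resulting iteration:

    % Steepest descent with exact line search on f(x) = (1/2)x'Qx - b'x.
    Q = [4 1; 1 2];  b = [1; 1];       % assumed data, Q positive definite
    x = [2; 3];                        % starting point
    for k = 1:100
        g = Q*x - b;                   % gradient at the current iterate
        if norm(g) < 1e-8, break; end  % stop at (approximate) stationarity
        alpha = (g'*g)/(g'*Q*g);       % exact minimizer of f(x - alpha*g)
        x = x - alpha*g;               % successive gradients are orthogonal
    end
    disp(x')                           % approaches the solution of Qx = b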
However, the steepest descent method can be extremely slow for some problems. There are, fortunately, several other methods that work very well in practice, and here we present some of them.
1.3.2 Newton's method (second order derivative method)

Consider any nonlinear system of equations of the form

    F(x) = 0,                                          (1.14)

where F : ℝ^n → ℝ^n. If the Jacobian of F exists, then we can write the first-order Taylor approximation of this function as

    F(x + s) ≈ F(x) + J(x)s,                           (1.15)

where J(x) denotes the Jacobian of F evaluated at x. From these equations we can derive an iterative method. Given an initial point x_0, at each iteration k we compute a new iterate x_{k+1} = x_k + s_k such that F(x_k) + J(x_k)s_k = 0, which means that s_k must satisfy the linear system

    J(x_k)s_k = −F(x_k).

This is called Newton's method for solving nonlinear systems of equations.

Now, returning to our optimization problem, we note that when the first and second derivatives of f are available, we can use Newton's method to solve the (possibly nonlinear) system of equations defined by

    ∇f(x) = 0,                                         (1.16)

since we know from Theorem 1.2 above that any minimizer of f must satisfy this condition. This is the basis of Newton's method for optimization problems and, with some variations, it is the basis of many other methods in unconstrained optimization. More formally, if we want to apply this method to equation (1.16), we also know that the second-order Taylor approximation of f at x_k is

    f(x_k + s_k) ≈ f(x_k) + ⟨∇f(x_k), s_k⟩ + (1/2)⟨s_k, ∇²f(x_k)s_k⟩.        (1.17)

In order to find a minimum of this function, we try to find a solution to ∇f(x_k + s_k) = 0, which is equivalent to

    ∇f(x_k) + ∇²f(x_k)s_k = 0.

Thus we have that s_k must satisfy the so-called Newton equations

    ∇²f(x_k)s_k = −∇f(x_k).                            (1.18)
If ∇²f(x_k) is positive definite (PD), then we can find its inverse, and the solution to (1.18) is

    s_k = −[∇²f(x_k)]⁻¹ ∇f(x_k).                       (1.19)

This direction s_k is called the Newton direction.

If ∇²f(x_k) is not positive definite, the Newton direction is not suitable as a search direction. One strategy to overcome this problem is a modification of the Hessian matrix ∇²f(x_k), during or before the solution of ∇²f(x_k)s_k = −∇f(x_k), so that it becomes positive definite.
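A minimal MATLAB sketch of this modification strategy follows; the test function and the simple doubling rule for the shift μ are assumptions made for illustration (in practice the modified Newton step is also combined with a line search):

    % Modified Newton iteration: shift the Hessian by mu*I until it is
    % safely positive definite, then solve the Newton equations (1.18).
    f = @(x) x(1)^4 + x(1)*x(2) + (1 + x(2))^2;       % assumed test function
    g = @(x) [4*x(1)^3 + x(2); x(1) + 2*(1 + x(2))];  % its gradient
    H = @(x) [12*x(1)^2, 1; 1, 2];                    % its Hessian
    x = [0; 0];                       % here the Hessian is indefinite
    for k = 1:50
        gk = g(x);
        if norm(gk) < 1e-10, break; end
        Hk = H(x);  mu = 0;           % mu stays 0 if Hk is safely PD
        while min(eig(Hk + mu*eye(2))) <= 1e-8
            mu = max(2*mu, 1e-4);     % increase the shift until PD
        end
        x = x - (Hk + mu*eye(2)) \ gk;   % (modified) Newton step
    end
    disp(x')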
One of the problems with Newton's method is that it is based on a necessary first-order optimality condition (namely, that the gradient of the objective function be equal to zero). In order to guarantee that we have found a minimizer x* of f, it is also necessary to guarantee that the Hessian ∇²f(x*) be positive definite. Moreover, the approximations (1.15) and (1.17) are only valid in a neighborhood of the solution of (1.14) and (1.16), respectively.

Thus Newton's method is only appropriate when the starting point x_0 is sufficiently close to the solution x*. However, when it works, it is very fast, and most optimization methods try to mimic its behavior around the solution.

There are so-called globalization techniques that can be used to guarantee the convergence of Newton's method from any starting point. These techniques give rise to different methods, which can be divided into two classes: line search methods and trust region methods. The main difference between these two classes is that in line search methods the direction in which we choose to take our next iterate is selected first, while the size of the step to be taken in this direction is computed with the direction fixed. On the other hand, in trust region methods the step size and the direction are chosen more or less simultaneously [7]. The two strategies are discussed below in more detail.
1.3.3 Line search methods

Although Newton's direction is effective as a search direction, the method may not converge to the solution: the local model may not be a good representative of the given (objective) function. Hence we have to backtrack. The strategy we consider for proceeding from a solution estimate outside the convergence region of Newton's method is the method of line searches.

As the name suggests, the idea behind line search methods is to find a step size along a certain line which gives us a good reduction of the function value, while being reasonably inexpensive to compute. More formally, they are iterative methods that, at every step, choose a certain descent direction and move along this direction. Each iteration of a line search method computes a search direction p_k and then decides how far to move along that direction. The iteration is given by

    x_{k+1} = x_k + α_k p_k,

where the positive scalar α_k is called the step length, and p_k can be chosen as, e.g., the Newton direction given by (1.19).

Moreover, the search direction often has the form

    p_k = −B_k⁻¹ ∇f(x_k),                              (1.20)

where B_k is a symmetric and nonsingular matrix. In the steepest descent method B_k is simply the identity matrix I, while in Newton's method B_k is the exact Hessian ∇²f(x_k). In quasi-Newton methods, B_k is an approximation to the Hessian that is updated at every iteration by means of a low-rank formula. When p_k is defined by (1.20) and B_k is positive definite, we have

    ⟨p_k, ∇f(x_k)⟩ = −∇f(x_k)^T B_k⁻¹ ∇f(x_k) < 0,

and therefore p_k is a descent direction.

In the following we study how to choose the matrix B_k or, more generally, how to compute the search direction, and we give careful consideration to the choice of the step length parameter α_k.

Step length: the ideal choice of the step length would be the global minimizer α_k of the univariate function φ(·) defined by

    φ(α) = f(x_k + αp_k), α > 0,                       (1.21)

but, in general, it is too expensive to identify this value. Even finding a local minimizer of φ to moderate precision generally requires too many evaluations of the objective function f and possibly of the gradient ∇f.
In order to ensure that even an approximate solution is enough to guarantee the convergence of the line search method to a minimizer of the objective function, some conditions are imposed on the step length at each iteration. One very important condition that must be satisfied is that the decrease in the objective function is not too small. One way of measuring this is by using the following inequality:

    f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ⟨∇f(x_k), p_k⟩, for some c_1 ∈ (0, 1).      (1.22)

This condition is sometimes called the Armijo condition, and it states that the reduction in f should be proportional to the step length α_k and to the directional derivative of f. On the other hand,
we must also guarantee that the step is not too short. Indeed, condition (1.22) is satisfied for all sufficiently small values of α. One way of enforcing this is by imposing a curvature condition, which requires that α_k satisfy

    ⟨∇f(x_k + α_k p_k), p_k⟩ ≥ c_2 ⟨∇f(x_k), p_k⟩, for some c_2 ∈ (c_1, 1).        (1.23)

Conditions (1.22) and (1.23) are known as the Wolfe conditions.
Remark: for every function f that is smooth and bounded below, there exist step lengths that satisfy the Wolfe conditions.
These conditions on the step length are very important in practice and are widely used in the line search methods.
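For concreteness (an added illustration with assumed data), checking the two Wolfe conditions for a candidate step length takes only a few lines of MATLAB:

    % Check the Wolfe conditions (1.22)-(1.23) on the assumed f(x) = x'x.
    f  = @(x) x'*x;   gr = @(x) 2*x;
    xk = [1; 1];  pk = -gr(xk);               % steepest descent direction
    c1 = 1e-4;  c2 = 0.9;  alpha = 0.4;       % assumed constants, trial step
    armijo    = f(xk + alpha*pk) <= f(xk) + c1*alpha*(gr(xk)'*pk);
    curvature = gr(xk + alpha*pk)'*pk >= c2*(gr(xk)'*pk);
    fprintf('Armijo: %d, curvature: %d\n', armijo, curvature)   % both hold here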
Algorithm (line search)

Given a descent direction p_k, set x_{k+1} = x_k + α_k p_k for some α_k > 0 that makes x_{k+1} an acceptable iterate. However, rather than setting p_k = −∇f(x_k), we will use Newton's direction or a modification of it, for instance p_k = −H_k⁻¹∇f(x_k), where H_k = ∇²f(x_k) + μ_k I is positive definite and μ_k = 0 if ∇²f(x_k) is safely positive definite, in order to retain its fast local convergence. The term line search refers to the choice of α_k in this algorithm. The common procedure is to use α_k = 1 (the full quasi-Newton step); if it fails, we backtrack in a systematic way along the same direction. The common-sense requirement is that f(x_{k+1}) < f(x_k), but this simple condition does not guarantee that {x_k} will converge to the minimizer x* of f. If very small reductions in f are taken relative to the lengths of the steps, the limiting value of f(x_k) may not be a local minimum [8].

For example, let f(x) = x² and x_0 = 2, and choose {p_k} = {(−1)^{k+1}} and {α_k} = {2 + 3(2^{−k−1})}. Then

    {x_k} = {2, −3/2, 5/4, −9/8, …}, i.e. x_k = (−1)^k(1 + 2^{−k}).

Each p_k is a descent direction at x_k, and f(x_{k+1}) < f(x_k), so f(x_k) is monotonically decreasing, with

    lim_{k→∞} f(x_k) = 1,

which is not the minimum of f; moreover, {x_k} has the two limit points ±1, so it does not converge. We fix this by requiring that the average rate of decrease from f(x_k) to f(x_{k+1}) be at least some prescribed fraction of the initial rate; that is, we pick c ∈ (0, 1) and choose α_k among those α > 0 that satisfy

    f(x_k + α_k p_k) ≤ f(x_k) + cα_k ∇f(x_k)^T p_k.    (1.24)

It is also possible that, if the steps are too small relative to the initial rate of decrease of f, then the limiting value of {x_k} may not be a local minimizer of f. For example, taking the same function with the same initial estimate, let us take {p_k} = {−1} and {α_k} = {2^{−(k+1)}}. Then the sequence is {x_k} = {2, 3/2, 5/4, 9/8, …}, i.e. x_k = 1 + 2^{−k}; each p_k is a descent direction at x_k, and f(x_k) is monotonically decreasing, with

    lim_{k→∞} x_k = 1,

which is not the minimizer of f. In the above examples we see monotonically decreasing sequences of iterates that do not converge to the minimizer. To ensure sufficiently large steps, we require that the rate of decrease of f in the direction p_k at x_{k+1} be larger than some prescribed fraction of the rate of decrease of f in the direction p_k at x_k:

    ∇f(x_{k+1})^T p_k ≥ β∇f(x_k)^T p_k, for some β ∈ (c, 1).      (1.25)

But this condition is not necessarily required, because the use of the backtracking strategy avoids excessively small steps.
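The first counterexample above is easy to verify numerically; this short MATLAB check (added for illustration) reproduces it:

    % f(x) = x^2, x0 = 2, p_k = (-1)^(k+1), alpha_k = 2 + 3*2^(-k-1):
    % every step decreases f, yet f(x_k) -> 1 and the iterates oscillate
    % near +1 and -1, never approaching the true minimizer x* = 0.
    x = 2;
    for k = 0:20
        p = (-1)^(k+1);               % a descent direction at x_k
        alpha = 2 + 3*2^(-k-1);       % step lengths that are "too long"
        x = x + alpha*p;
    end
    fprintf('x = %.7f, f(x) = %.7f\n', x, x^2)  % x near -1, f(x) near 1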
Step selection by backtracking
Here the strategy is to start with α_k = 1 and then, if x_{k+1} = x_k + α_k p_k is not acceptable, to backtrack (reduce α_k) until an acceptable x_{k+1} = x_k + α_k p_k is found.

Example 1.1: consider the problem

    minimize f(x_1, x_2) = 5x_1² + 7x_2² − 3x_1x_2.

Let x_k = (2, 3) and p_k = (−5, −7), so that f(x_k) = 65. If α_k = 1, then

    f(x_k + α_k p_k) = f(−3, −4) = 121 > 65 = f(x_k),

so this is not an acceptable step length. If α_k = 1/2, then x_k + α_k p_k = (−1/2, −1/2) and f(x_k + α_k p_k) = 9/4, so this step length produces a decrease in the function value, as desired. The framework of this algorithm is given below.
Algorithm (backtracking line search framework)

Given c ∈ (0, 1/2) and 0 < l < u < 1, set α_k = 1. While f(x_k + α_k p_k) > f(x_k) + cα_k ∇f(x_k)^T p_k, set α_k := ρα_k for some ρ ∈ [l, u], where ρ is chosen at each iteration of the backtracking. Then set x_{k+1} = x_k + α_k p_k. The constant c is set to be very small (typically on the order of 10⁻⁴), so that hardly more than a simple decrease in the function value is required [8].
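A MATLAB sketch of this framework, run on the data of Example 1.1, is shown below; the choices c = 10⁻⁴ and the fixed contraction factor ρ = 1/2 are assumptions made for the illustration:

    % Backtracking line search on f(x1,x2) = 5x1^2 + 7x2^2 - 3x1x2
    % with x_k = (2,3) and descent direction p_k = (-5,-7) (Example 1.1).
    f = @(x) 5*x(1)^2 + 7*x(2)^2 - 3*x(1)*x(2);
    g = @(x) [10*x(1) - 3*x(2); 14*x(2) - 3*x(1)];
    xk = [2; 3];  pk = [-5; -7];
    c = 1e-4;  rho = 0.5;  alpha = 1;
    while f(xk + alpha*pk) > f(xk) + c*alpha*(g(xk)'*pk)
        alpha = rho*alpha;            % backtrack: reduce the step length
    end
    fprintf('alpha = %g, f = %g\n', alpha, f(xk + alpha*pk))
    % alpha = 1 is rejected (f = 121 > 65); alpha = 1/2 is accepted (f = 9/4).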
Strategy for choosing α_k

Define φ(α) = f(x_k + αp_k), which is the one-dimensional restriction of f to the line through x_k in the direction p_k. If we need to backtrack, we use our most current information about φ to model it, and then take the value of α that minimizes the model as the next value of α_k in the above algorithm.

Initially we have

    φ(0) = f(x_k) and φ′(0) = ∇f(x_k)^T p_k,           (1.26)

    φ(1) = f(x_k + p_k).                               (1.27)

So if f(x_k + p_k) does not satisfy (1.24), that is, φ(1) > φ(0) + cφ′(0), then we model φ(α) by the one-dimensional quadratic model satisfying (1.26) and (1.27),

    m(α) = [φ(1) − φ(0) − φ′(0)]α² + φ′(0)α + φ(0),

and calculate the point

    α̂ = −φ′(0) / (2[φ(1) − φ(0) − φ′(0)]),

for which m′(α̂) = 0.

Now m″(α̂) > 0, since φ(1) > φ(0) + cφ′(0) > φ(0) + φ′(0); thus α̂ minimizes the model function. Furthermore, α̂ > 0 because φ′(0) < 0; therefore we take α̂ as our new value of α_k in the backtracking algorithm. We note that, since φ(1) > φ(0) + cφ′(0), we have

    α̂ < 1 / (2(1 − c)).

In fact, if φ(1) ≥ φ(0), then α̂ ≤ 1/2, and this gives an upper bound u = 1/2 on the first value of α_k in the algorithm. On the other hand, if φ(1) is much larger than φ(0), α̂ can be very small, so we impose a lower bound l = 1/10 in the algorithm; that is, at the first backtrack, if α̂ ≤ 0.1, then we take α_k = 1/10. We can use this quadratic model at the first backtrack; if φ(α_k) = f(x_k + α_k p_k) still does not satisfy (1.24), in this case we need to backtrack again.
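In MATLAB, one quadratic backtracking step amounts to a few lines; the numbers below reuse Example 1.1 (an added illustration, with the safeguards l = 1/10 and u = 1/2 applied at the end):

    % One quadratic-model backtracking step from phi(0), phi'(0), phi(1).
    phi0  = 65;      % phi(0)  = f(x_k)                (Example 1.1)
    dphi0 = -307;    % phi'(0) = grad f(x_k)' * p_k
    phi1  = 121;     % phi(1)  = f(x_k + p_k), which fails (1.24)
    alpha_hat = -dphi0 / (2*(phi1 - phi0 - dphi0));   % minimizer of the model
    alpha_hat = min(max(alpha_hat, 0.1), 0.5)         % safeguarded; about 0.423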
Although we could use the quadratic model again, at this point we have four pieces of information about φ, namely φ(0), φ′(0) and the last two computed values of φ(α). So at this and any subsequent backtrack during the current iteration we use a cubic model of φ. The calculation of α̂ proceeds as follows. Let α_prev and α_2prev be the last two previous values of α_k. Then the cubic model that fits φ(0), φ′(0), φ(α_prev) and φ(α_2prev) is

    m(α) = aα³ + bα² + φ′(0)α + φ(0),

where, writing Δ_1 = φ(α_prev) − φ(0) − φ′(0)α_prev and Δ_2 = φ(α_2prev) − φ(0) − φ′(0)α_2prev,

    a = [Δ_1/α_prev² − Δ_2/α_2prev²] / (α_prev − α_2prev),
    b = [−α_2prev·Δ_1/α_prev² + α_prev·Δ_2/α_2prev²] / (α_prev − α_2prev).

Its local minimizing point is

    α̂ = (−b + (b² − 3aφ′(0))^{1/2}) / (3a).

It can be shown that α̂ is always real if c < 1/4. Finally, we impose an upper bound u = 1/2 and a lower bound l = 1/10 relative to the previous step length; this
means that if α̂ > (1/2)α_prev, we set the new α_k = (1/2)α_prev, and if α̂ < (1/10)α_prev, then we use the new α_k = (1/10)α_prev.

1.3.4 Trust region methods

The concept of the trust region first appeared in papers of Levenberg (1944) and Marquardt (1963) for solving nonlinear least squares problems. Trust region methods are iterative numerical procedures, like the line search methods, in which an approximation of the objective function f(x) by a model m_k(p) is computed in a neighborhood of the current iterate x_k, which we refer to as the trust region. The model m_k(p) should be constructed so that it is easier to handle than f(x) itself. Let us assume for this that our function f is of class C² [7].
We solve the following subproblem to obtain the next iterate at each step k of a trust region method:

    min_p m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p subject to ||p|| ≤ Δ_k,

where Δ_k > 0 is the trust region radius and ||·|| is defined to be the Euclidean norm. These subproblems are constrained optimization problems in which the objective function and the constraint are both quadratic. The constraint is a quadratic inequality constraint and can be written as −p^T p + Δ_k² ≥ 0. In fact, usually the model m_k(p) is a quadratic function which is truncated from a Taylor series of f around the point x_k:

    m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p  (k ∈ N_0),

where N_0 = {0, 1, 2, …}.

We note that one can choose any other norm in the formulations. In this project we consider the Euclidean norm ||·|| = ||·||_2, since it makes some computations easier. Hence our trust region for the model m_k(p) is a bounded neighborhood of the current iterate x_k:

    B(x_k) = {x_k + p | ||p||_2 ≤ Δ_k}.
After constructing the model m_k(p) and its trust region, one seeks a trial step p_k to the next iterate x_{k+1} = x_k + p_k which will result in a reduction of the model, while the size of the step is bounded by B(x_k), that is, ||p_k||_2 ≤ Δ_k. Then the objective function is evaluated at x_k + p_k to compare its value to the one predicted by the model at this point. If the sufficient reduction predicted by the model is accomplished by the objective function, x_k + p_k is accepted as the next iterate and the trust region is possibly expanded to include this new point (Δ_k increases). If the reduction in the model is a poor predictor of the actual reduction of the objective function, then the trial point is rejected. We conclude that the region is too large, and the size of the trust region is reduced (Δ_k decreases), with the hope that the model provides a better prediction in the smaller region.
1.3.4.1 Outline and properties of the trust region algorithm

In a trust region algorithm, a strategy for determining the trust region radius Δ_k at each iteration needs to be developed. The trust region radius can be determined by looking at the agreement between the model function and the objective function at previous iterations. Given a step p_k, we define the ratio

    ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)) = actual reduction / predicted reduction.      (1.28)

There are various definitions of ρ_k in the mathematical literature, but we shall prefer the above one in particular here. We note that the denominator of ρ_k, namely the predicted reduction, is always nonnegative, since the step p_k is computed from the subproblem min m_k(p) over a region that includes the step p = 0. In fact, ρ_k can be seen as a measure of how well the model m_k predicts the reduction in the value of f. If ρ_k is close to 0 or negative, the actual reduction is smaller than the predicted one. This indicates that the model cannot be trusted in this region with radius Δ_k; thus p_k will be rejected and Δ_k will be reduced. On the other hand, if ρ_k is close to one, an adequate prediction is obtained and we can safely expand the trust region, since the model can be trusted over a wider region, so Δ_k should be increased. If ρ_k is positive but not close to 1, then the trust region radius is not changed.
Algorithm (Trust-Region Algorithm)

The steps in derivative based trust region methods can be summarized as follows [2].

1. Specify some initial guess of the solution x_0. Select the initial trust-region bound Δ_0 > 0. Specify the constants 0 < μ < η < 1 (perhaps μ = 1/4 and η = 3/4).

2. For k = 0, 1, …
(i) If x_k is optimal, stop.
(ii) Solve

    min_p m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p subject to ||p|| ≤ Δ_k

for the trial step p_k.
(iii) Compute

    ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)) = actual reduction / predicted reduction.

(iv) If ρ_k ≤ μ, then x_{k+1} = x_k (unsuccessful step); else x_{k+1} = x_k + p_k (successful step).
(v) Update Δ_k:

    ρ_k ≤ μ      ⟹  Δ_{k+1} = (1/2)Δ_k,
    μ < ρ_k < η  ⟹  Δ_{k+1} = Δ_k,
    ρ_k ≥ η      ⟹  Δ_{k+1} = 2Δ_k.
The value of ρ_k indicates how well the model predicts the reduction in the function value. If ρ_k is small (that is, ρ_k ≤ μ), then the actual reduction in the function value is much smaller than that predicted by m_k(p_k), indicating that the model cannot be trusted for a bound as large as Δ_k; in this case the step will be rejected and Δ_k will be reduced. If ρ_k is large (that is, ρ_k ≥ η), then the model is adequately predicting the reduction in the function value, suggesting that the model can be trusted over an even wider region; in this case the bound Δ_k will be increased.
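To close the chapter, here is a minimal MATLAB sketch of the trust-region algorithm above on an assumed quadratic test problem; for simplicity, the subproblem in step (ii) is solved only approximately by the Cauchy point (the model minimizer along −∇f(x_k) within the ball), a common simplification rather than the method prescribed in the text:

    % Trust-region iteration with mu = 1/4, eta = 3/4 and Cauchy-point steps.
    f = @(x) 5*x(1)^2 + 7*x(2)^2 - 3*x(1)*x(2);     % assumed test function
    g = @(x) [10*x(1) - 3*x(2); 14*x(2) - 3*x(1)];
    B = [10 -3; -3 14];                              % its (constant) Hessian
    x = [2; 3];  Delta = 1;  mu = 0.25;  eta = 0.75;
    for k = 1:60
        gk = g(x);
        if norm(gk) < 1e-8, break; end
        gBg = gk'*B*gk;                              % curvature along -gk
        tau = 1;
        if gBg > 0, tau = min(1, norm(gk)^3/(Delta*gBg)); end
        p = -tau*(Delta/norm(gk))*gk;                % Cauchy step, ||p|| <= Delta
        rho = (f(x) - f(x + p)) / -(gk'*p + 0.5*p'*B*p);   % the ratio (1.28)
        if rho > mu, x = x + p; end                  % accept a successful step
        if rho <= mu, Delta = Delta/2;               % shrink the region
        elseif rho >= eta, Delta = 2*Delta; end      % expand the region
    end
    disp(x')                                         % tends to the minimizer (0,0)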
Chapter 2 Derivative Free Optimization Methods
Introduction (What is derivative free optimization)

Derivative Free Optimization (DFO) methods are typically designed to solve optimization problems whose objective function is computed by a "black box"; hence the gradient computation is unavailable. Each call to the "black box" is often expensive, so estimating derivatives by finite differences may be prohibitively costly. Finally, the objective function value may be computed with some noise, and the finite difference estimates may not be accurate [6].
The derivative free optimization method which we use approximates the objective function explicitly, without approximating its derivatives. The theoretical analysis presented in this project assumes that no noise is present. Extensive experiments and intuition support the claim that the robustness of DFO does not suffer from the presence of a moderate level of noise [1].
Derivative free optimization has been developed for solving optimization problems of the following form
    min f(x)
    s.t. a_i ≤ c_i(x) ≤ b_i  (i = 1, 2, …, m),
         x ∈ F ⊆ ℝ^n,

where the objective function f(x) and the constraints c_i(x) are expensive to compute at a given vector x, and the derivatives of f at x are not available.
The idea is to approximate the objective function by a model which is assumed to describe the objective function well in a trust region, without explicitly modeling its derivatives. This model is computationally less expensive to evaluate and easier to optimize than the objective function itself. The model is obtained by interpolating the objective function using a quadratic interpolation polynomial.
Derivative free optimization methods for unconstrained optimization build a linear or quadratic model of the objective function and apply one of the fundamental approaches of continuous optimization, i.e. a trust region or a line search, to optimize this model. While derivative based methods typically use a Taylor-based model which is an approximation of the objective function, derivative free methods use interpolation, regression or other sample-based models. If the problem has constraints, the strategy of derivative free methods is usually to apply sequential quadratic programming methods for the linearization of the constraints [6]. The main concern of this project is unconstrained nonlinear programming problems without using derivatives.
We now consider two basic methods in DFO: line search and trust region methods, both without using derivatives.
2.2 Line search methods

Line search, also called one-dimensional search, refers to an optimization procedure for solving nonlinear univariate problems. One-dimensional line search is the backbone of many algorithms for solving nonlinear programming problems. Many nonlinear programming algorithms proceed as follows: given a point x_k, find a direction vector d_k and then a suitable step size α_k, yielding a new point x_{k+1} = x_k + α_k d_k; the process is then repeated.

Finding the new step size α_k involves solving the subproblem of minimizing f(x_k + αd_k), which is a one-dimensional search problem in the variable α. The minimization is over all real α, over nonnegative α, or over α such that x_k + αd_k is feasible.

As stated before, in multivariable optimization algorithms, for a given x_k the iterative scheme is

    x_{k+1} = x_k + α_k d_k.                           (2.1)

The key is to find the direction vector d_k and a suitable step size α_k. Let

    φ(α) = f(x_k + αd_k).                              (2.2)

So the problem that departs from x_k and finds a step size α_k in the direction d_k such that

    φ(α_k) < φ(0)
is just a line search about α. If we find α_k such that the objective function in the direction d_k is minimized, i.e.,

    f(x_k + α_k d_k) = min_{α>0} f(x_k + αd_k),

or

    φ(α_k) = min_{α>0} φ(α),

such a line search is called an exact line search or optimal line search, and α_k is called the optimal step size. If we choose α_k such that the objective function has an acceptable descent amount, i.e., such that the descent f(x_k) − f(x_k + α_k d_k) > 0 is acceptable by users, such a line search is called an inexact line search, or approximate line search, or acceptable line search. Since, in practical computation, a theoretically exact optimal step size generally cannot be found, and it is also expensive to find an almost exact step size, the inexact line search with its lower computational load is highly popular.

The framework of a line search is as follows. First, determine or give an initial search interval which contains the minimizer; then employ some section techniques or interpolations to reduce the interval iteratively until the length of the interval is less than some given tolerance [10]. Next, we give a notation for the search interval and a simple method to determine the initial search interval.

Interval of uncertainty

Definition 2.1. Let φ : ℝ → ℝ, α* ∈ [0, +∞), and

    φ(α*) = min_{α≥0} φ(α).

If there exists a closed interval [a, b] ⊆ [0, +∞) such that α* ∈ [a, b], then [a, b] is called a search interval for the one-dimensional minimization problem min_{α≥0} φ(α). Since the exact location of the minimum of φ over [a, b] is not known, this interval is also called the interval of uncertainty.

A simple method to determine an initial interval is called the forward-backward method. The basic idea of this method is as follows. Given an initial point and an initial step length, we attempt to determine three points at which the function values show a "high-low-high" geometry. If it is not successful to go forward, we go backward. Concretely, given an initial point α_0 and a step length h_0 > 0: if

    φ(α_0 + h_0) < φ(α_0),

then, in the next step, we depart from α_0 + h_0 and continue going forward with a larger step until the objective function increases. If

    φ(α_0 + h_0) > φ(α_0),
then, in the next step, we depart from α_0 and go backward until the objective function increases. In this way we obtain an initial interval which contains the minimizer α*.

Algorithm 2.1.2 (Forward-Backward Method)

Step 1. Given α_0 ∈ [0, ∞), h_0 > 0 and the multiple coefficient t > 1 (usually t = 2). Evaluate φ(α_0), set k := 0.

Step 2. Compare the objective function values. Set α_{k+1} := α_k + h_k and evaluate φ_{k+1} = φ(α_{k+1}). If φ_{k+1} < φ_k, go to Step 3; otherwise, go to Step 4.

Step 3. Forward step. Set h_{k+1} := t·h_k, α := α_k, α_k := α_{k+1}, φ_k := φ_{k+1}, k := k + 1, and go to Step 2.

Step 4. Backward step. If k = 0, invert the search direction: set h_k := −h_k, α := α_{k+1}, and go to Step 2; otherwise, set

    a := min{α, α_{k+1}}, b := max{α, α_{k+1}},

output [a, b] and stop.
โ๐๐ โถ โโ๐๐ ๐ผ๐ผ๐๐ โถ ๐ผ๐ผ๐๐ = { , +1}, = { , +1}, Output [a, b] and stop. ๐๐ ๐๐๐๐๐๐ ๐ผ๐ผ ๐ผ๐ผ๐๐ ๐๐ ๐๐๐๐๐๐ ๐ผ๐ผ ๐ผ๐ผ๐๐ The methods of line search presented in this chapter use the unimodality of the function and interval. T he f ollowing de finitions and t heorem i ntroduce their c oncepts a nd properties. Definition 2.2 Let ,[ , ] . If there [ , ] such that โ ( ) is strictly decreasing๐๐ โถ โ โ onโ [ ๐๐, ๐๐ ] andโ โ strictly increasing๐๐๐๐ ๐ผ๐ผ โ on๐๐ [๐๐ , ], then โ โ ๐๐(๐ผ๐ผ) is called a unimodal function๐๐ ๐ผ๐ผ on [ , ]. Such an interval [ ๐ผ๐ผ, ]๐๐ is called a unimodal ๐๐interval๐ผ๐ผ related to ( ). ๐๐ ๐๐ ๐๐ ๐๐ The unimodal function๐๐ ๐ผ๐ผ can also be defined as follows. Definition 2.3 If there exists a unique [ , ], such that for any โ 1, 2 [ , ], 1 < 2, th e fo llowing๐ผ๐ผ โs tatements๐๐ ๐๐ h old: if 2 < , t hen ( 1) > โ ๐ผ๐ผ (๐ผ๐ผ2);โ if ๐๐1 ๐๐ > ๐ผ๐ผ , then๐ผ๐ผ ( 1) < ( 2); then ( ) is the unimodal๐ผ๐ผ ๐ผ๐ผfunction on๐๐ ๐ผ๐ผ[ , ]. โ ๐๐ ๐ผ๐ผ ๐ผ๐ผ ๐ผ๐ผ ๐๐ ๐ผ๐ผ ๐๐ ๐ผ๐ผ 26 ๐๐ ๐ผ๐ผ ๐๐ ๐๐
Note that, first, a unimodal function does not require continuity or differentiability; second, using the property of unimodal functions, we can exclude portions of the interval of uncertainty that do not contain the minimum, so that the interval of uncertainty is reduced. The following theorem shows that if the function φ is unimodal on [a, b], then the interval of uncertainty can be reduced by comparing the function values of φ at two points within the interval.

Theorem 2.1. Let φ : ℝ → ℝ be unimodal on [a, b]. Let α_1, α_2 ∈ [a, b] and α_1 < α_2. Then

1. if φ(α_1) ≤ φ(α_2), then [a, α_2] is a unimodal interval related to φ;
2. if φ(α_1) ≥ φ(α_2), then [α_1, b] is a unimodal interval related to φ.

Proof. From Definition 2.2, there exists α* ∈ [a, b] such that φ(α) is strictly decreasing over [a, α*] and strictly increasing over [α*, b]. Since φ(α_1) ≤ φ(α_2), then α* ∈ [a, α_2]. Since φ(α) is unimodal on [a, b], it is also unimodal on [a, α_2]. Therefore [a, α_2] is a unimodal interval related to φ(α), and the proof of the first part is complete. The second part of the theorem can be proved similarly.

This theorem indicates that, in order to reduce the interval of uncertainty, we must select at least two observations and evaluate and compare their function values.

Theorem 2.2. Let θ : ℝ → ℝ be strictly quasiconvex over the interval [a, b], and let λ, μ ∈ [a, b] be such that λ < μ. If θ(λ) > θ(μ), then θ(z) ≥ θ(μ) for all z ∈ [a, λ). If θ(λ) ≤ θ(μ), then θ(z) ≥ θ(λ) for all z ∈ (μ, b].

Proof: suppose that θ(λ) > θ(μ), and let z ∈ [a, λ). By contradiction, suppose that θ(z) < θ(μ). Since λ can be written as a convex combination of z and μ, by the strict quasiconvexity of θ we have θ(λ) < max{θ(z), θ(μ)} = θ(μ), contradicting θ(λ) > θ(μ). Hence θ(z) ≥ θ(μ). The second part of the theorem can be proved similarly.
From Theorem 2.2, under strict quasiconvexity, if θ(λ) > θ(μ), the new interval of uncertainty is [λ, b]; on the other hand, if θ(λ) ≤ θ(μ), the new interval of uncertainty is [a, μ].
The Golden Section Method and the Fibonacci Method

The golden section method and the Fibonacci method are section methods. Their basic idea for minimizing a unimodal function over [a, b] is iteratively reducing the interval of uncertainty by comparing the function values of the observations. When the length of the interval of uncertainty is reduced to some desired degree, the points of the interval can be regarded as approximations of the minimizer. Such a class of methods only needs to evaluate the function and has wide applications; in particular, it is suitable for nonsmooth problems and problems with complicated derivative expressions.
2.2.1.1 The golden section method

We now describe the more efficient golden section method for minimizing a strictly quasiconvex function. At a general iteration k of the golden section method, let the interval of uncertainty be [a_k, b_k]. Then, by the above theorem, the new interval of uncertainty [a_{k+1}, b_{k+1}] is given by [λ_k, b_k] if θ(λ_k) > θ(μ_k), and by [a_k, μ_k] if θ(λ_k) ≤ θ(μ_k). The points λ_k and μ_k are selected such that:

1) The length of the new interval of uncertainty, b_{k+1} − a_{k+1}, does not depend upon the outcome of the kth iteration, that is, on whether θ(λ_k) > θ(μ_k) or θ(λ_k) ≤ θ(μ_k). Therefore we must have b_k − λ_k = μ_k − a_k. Thus, if λ_k is of the form

    λ_k = a_k + (1 − τ)(b_k − a_k),                    (*)

where τ ∈ (0, 1), then μ_k must be of the form

    μ_k = a_k + τ(b_k − a_k),                          (**)

so that

    b_{k+1} − a_{k+1} = τ(b_k − a_k).

2) As λ_{k+1} and μ_{k+1} are selected for the purpose of a new iteration, either λ_{k+1} coincides with μ_k or μ_{k+1} coincides with λ_k. If this can be realized, then during iteration k + 1 only one extra observation is needed. Consider the following two cases.
Case 1: θ(λ_k) > θ(μ_k). In this case a_{k+1} = λ_k and b_{k+1} = b_k. To satisfy λ_{k+1} = μ_k, applying (*) with k replaced by k + 1, we get

    μ_k = λ_{k+1} = a_{k+1} + (1 − τ)(b_{k+1} − a_{k+1}) = λ_k + (1 − τ)(b_k − λ_k).

Substituting the expressions of λ_k and μ_k from (*) and (**) into the above equation, we get τ² + τ − 1 = 0.
Case 2: θ(λ_k) ≤ θ(μ_k). In this case a_{k+1} = a_k and b_{k+1} = μ_k. To satisfy μ_{k+1} = λ_k, applying (**) with k replaced by k + 1, we get

    λ_k = μ_{k+1} = a_{k+1} + τ(b_{k+1} − a_{k+1}) = a_k + τ(μ_k − a_k).

Noting (*) and (**), the above equation also gives τ² + τ − 1 = 0.
The roots of the equation τ² + τ − 1 = 0 are τ ≈ 0.618 and τ ≈ −1.618. Since τ must be in the interval (0, 1), τ ≈ 0.618.
To summarize, if at iteration k, λ_k and μ_k are chosen according to (*) and (**), where α = 0.618, then the interval of uncertainty is reduced by a factor of 0.618 at each iteration; after k iterations its length is (0.618)^k times the original length. At the first iteration, two observations are needed, at λ_1 and μ_1, but at each subsequent iteration only one evaluation is needed, since either λ_{k+1} = μ_k or μ_{k+1} = λ_k.
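Since each iteration multiplies the length of the interval of uncertainty by 0.618, the number of iterations required for a prescribed final length can be computed in advance. A small MATLAB check, using for illustration the data of Example 2.1 below:

% Number of golden section iterations needed to shrink an interval of
% initial length L0 to a final length of at most ell.
L0 = 8; ell = 0.2;                      % data of Example 2.1 below
n = ceil(log(ell/L0)/log(0.618))        % returns n = 8 iterations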
Algorithm (the golden section method)

The following is a summary of the golden section method for minimizing a strictly quasi-convex function f over the interval [a, b] [8].

Initialization step: Choose an allowable final length of uncertainty ℓ > 0. Let [a_1, b_1] be the initial interval of uncertainty, and let λ_1 = a_1 + (1 − α)(b_1 − a_1) and μ_1 = a_1 + α(b_1 − a_1), where α = 0.618. Evaluate f(λ_1) and f(μ_1), let k = 1, and go to the main step.

Step 1: If b_k − a_k < ℓ, stop: the optimal solution lies in the interval [a_k, b_k]. Otherwise, if f(λ_k) > f(μ_k), go to step 2; and if f(λ_k) ≤ f(μ_k), go to step 3.

Step 2: Let a_{k+1} = λ_k and b_{k+1} = b_k. Furthermore, let λ_{k+1} = μ_k, and let μ_{k+1} = a_{k+1} + α(b_{k+1} − a_{k+1}). Evaluate f(μ_{k+1}) and go to step 4.

Step 3: Let a_{k+1} = a_k and b_{k+1} = μ_k. Furthermore, let μ_{k+1} = λ_k, and let λ_{k+1} = a_{k+1} + (1 − α)(b_{k+1} − a_{k+1}). Evaluate f(λ_{k+1}) and go to step 4.

Step 4: Replace k by k + 1 and go to step 1.
Example 2.1: Consider the following problem:

minimize x² + 2x subject to −3 ≤ x ≤ 5.

Note that the true minimum is x = −1.0.

Table 2.1: Summary of computations for the above problem using the golden section method

Iteration k   a_k       b_k       λ_k       μ_k       f(λ_k)     f(μ_k)
1            −3.000     5.000     0.056     1.944     0.115*     7.667*
2            −3.000     1.944    −1.112     0.056    −0.987*     0.115
3            −3.000     0.056    −1.832    −1.112    −0.308*    −0.987
4            −1.832     0.056    −1.112    −0.664    −0.987     −0.887*
5            −1.832    −0.664    −1.384    −1.112    −0.853*    −0.987
6            −1.384    −0.664    −1.112    −0.936    −0.987     −0.996*
7            −1.112    −0.664    −0.936    −0.840    −0.996     −0.974*
8            −1.112    −0.840    −1.016    −0.936    −1.000*    −0.996
Clearly the function f to be minimized is strictly quasi-convex, and the initial interval of uncertainty is of length 8. We reduce this interval of uncertainty to one whose length is at most 0.2. The first two observations are located at

λ_1 = −3 + 0.382(8) = 0.056
μ_1 = −3 + 0.618(8) = 1.944

Note that f(λ_1) ≤ f(μ_1). Hence the new interval of uncertainty is [−3, 1.944]. The process is repeated, and the computations are summarized in Table 2.1. The values of f that are computed at each iteration are indicated by an asterisk.

After eight iterations involving nine observations, the interval of uncertainty is [−1.112, −0.936], so that the minimum can be estimated to be the midpoint −1.024.

The numerical result for Example 2.1 is −0.9998 (see the appendix for the MATLAB code).
2.2.1.2 The Fibonacci search

The Fibonacci search is a line search procedure for minimizing a strictly quasi-convex function f over a closed bounded interval. Similar to the golden section method, the Fibonacci search procedure makes two functional evaluations at the first iteration and then only one evaluation at each of the subsequent iterations. However, the procedure differs from the golden section method in that the reduction of the interval of uncertainty varies from one iteration to another [8].

The procedure is based on the Fibonacci sequence {F_v} defined as follows:

F_{v+1} = F_v + F_{v−1},  v = 1, 2, …   (2.3)

F_0 = F_1 = 1

The sequence is therefore 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …. At iteration k, suppose that the interval of uncertainty is [a_k, b_k]. Consider the two points λ_k and μ_k given below, where n is the total number of functional evaluations planned:
λ_k = a_k + (F_{n−k−1}/F_{n−k+1})(b_k − a_k),  k = 1, 2, …, n − 1   (2.4)

μ_k = a_k + (F_{n−k}/F_{n−k+1})(b_k − a_k),  k = 1, 2, …, n − 1   (2.5)

By Theorem 2.2, the new interval of uncertainty [a_{k+1}, b_{k+1}] is given by [λ_k, b_k] if f(λ_k) > f(μ_k), and by [a_k, μ_k] if f(λ_k) ≤ f(μ_k).

In the former case, noting (2.4) and letting v = n − k in (2.3), we get

b_{k+1} − a_{k+1} = b_k − λ_k = (b_k − a_k) − (F_{n−k−1}/F_{n−k+1})(b_k − a_k) = (F_{n−k}/F_{n−k+1})(b_k − a_k)   (2.6)

In the latter case, noting (2.5), we get

b_{k+1} − a_{k+1} = μ_k − a_k = (F_{n−k}/F_{n−k+1})(b_k − a_k)   (2.7)
Thus, in either case the interval of uncertainty is reduced by the factor F_{n−k}/F_{n−k+1}. We now show that at iteration k + 1, either λ_{k+1} = μ_k or μ_{k+1} = λ_k, so that only one functional evaluation is needed. Suppose that f(λ_k) > f(μ_k). Then, by Theorem 2.2, a_{k+1} = λ_k and b_{k+1} = b_k. Thus, applying (2.4) with k replaced by k + 1, we get

λ_{k+1} = a_{k+1} + (F_{n−k−2}/F_{n−k})(b_{k+1} − a_{k+1}) = λ_k + (F_{n−k−2}/F_{n−k})(b_k − λ_k)

Substituting for λ_k from (2.4), we get

λ_{k+1} = a_k + (F_{n−k−1}/F_{n−k+1})(b_k − a_k) + (F_{n−k−2}/F_{n−k})(1 − F_{n−k−1}/F_{n−k+1})(b_k − a_k)

Letting v = n − k in (2.3), it follows that 1 − F_{n−k−1}/F_{n−k+1} = F_{n−k}/F_{n−k+1}, so that

λ_{k+1} = a_k + ((F_{n−k−1} + F_{n−k−2})/F_{n−k+1})(b_k − a_k)

Now letting v = n − k − 1 in (2.3), and noting (2.5), it follows that λ_{k+1} = a_k + (F_{n−k}/F_{n−k+1})(b_k − a_k) = μ_k. Similarly, if f(λ_k) ≤ f(μ_k), then we can easily verify that μ_{k+1} = λ_k. Thus, in either case, only one observation is needed at iteration k + 1.
To summarize, at the first iteration two observations are made, and at each subsequent iteration only one observation is necessary. Thus, at the end of iteration n − 2, we have computed n − 1 functional evaluations. Further, letting k = n − 1, it follows from (2.4) and (2.5) that λ_{n−1} = μ_{n−1} = (1/2)(a_{n−1} + b_{n−1}). Since either λ_{n−1} = μ_{n−2} or μ_{n−1} = λ_{n−2}, theoretically no new observations are to be made at this stage. However, in order to further reduce the interval of uncertainty, the last observation is placed slightly to the right or to the left of the midpoint λ_{n−1} = μ_{n−1}, so that (1/2)(b_{n−1} − a_{n−1}) is the length of the final interval of uncertainty [a_n, b_n].
Algorithm (Fibonacci search method)

The following is a summary of the Fibonacci search for minimizing a strictly quasi-convex function f over the interval [a_1, b_1].

Initialization step: Choose an allowable final length of uncertainty ℓ > 0 and a distinguishability constant ε > 0. Let [a_1, b_1] be the initial interval of uncertainty, and choose the number n of observations to be taken such that F_n > (b_1 − a_1)/ℓ. Let λ_1 = a_1 + (F_{n−2}/F_n)(b_1 − a_1) and μ_1 = a_1 + (F_{n−1}/F_n)(b_1 − a_1). Evaluate f(λ_1) and f(μ_1), let k = 1, and go to the main steps.

Main steps:

1) If f(λ_k) > f(μ_k), go to step 2; and if f(λ_k) ≤ f(μ_k), go to step 3.

2) Let a_{k+1} = λ_k and b_{k+1} = b_k. Furthermore, let λ_{k+1} = μ_k, and let μ_{k+1} = a_{k+1} + (F_{n−k−1}/F_{n−k})(b_{k+1} − a_{k+1}). If k = n − 2, go to step 5; otherwise evaluate f(μ_{k+1}) and go to step 4.

3) Let a_{k+1} = a_k and b_{k+1} = μ_k. Furthermore, let μ_{k+1} = λ_k, and let λ_{k+1} = a_{k+1} + (F_{n−k−2}/F_{n−k})(b_{k+1} − a_{k+1}). If k = n − 2, go to step 5; otherwise evaluate f(λ_{k+1}) and go to step 4.

4) Replace k by k + 1 and go to step 1.

5) Let λ_n = λ_{n−1} and μ_n = λ_{n−1} + ε. If f(λ_n) > f(μ_n), let a_n = λ_n and b_n = b_{n−1}. Otherwise, if f(λ_n) ≤ f(μ_n), let a_n = a_{n−1} and b_n = λ_n. Stop; the optimal solution lies in the interval [a_n, b_n].
Example 2.2: Consider the following problem:

minimize x² + 2x subject to −3 ≤ x ≤ 5.

Note that the function is strictly quasi-convex on the interval and that the true minimum occurs at x = −1. We reduce the interval of uncertainty to one whose length is at most 0.2. Hence we must have F_n > 8/0.2 = 40, so that n = 9. We adopt the distinguishability constant ε = 0.01.

The first two observations are located at

λ_1 = −3 + (F_7/F_9)(8) = 0.054545,  μ_1 = −3 + (F_8/F_9)(8) = 1.945454.

Note that f(λ_1) < f(μ_1). Hence, the new interval of uncertainty is [−3.000000, 1.945454].
Table 2.2: Summary of computations for the Fibonacci search method

Iteration k   a_k          b_k          λ_k          μ_k          f(λ_k)       f(μ_k)
1            −3.000000     5.000000     0.054545     1.945454     0.112065*    7.675690*
2            −3.000000     1.945454    −1.109091     0.054545    −0.988099*    0.112065
3            −3.000000     0.054545    −1.836363    −1.109091    −0.300490*   −0.988099
4            −1.836363     0.054545    −1.109091    −0.672727    −0.988099    −0.892890*
5            −1.836363    −0.672727    −1.399999    −1.109091    −0.840000*   −0.988099
6            −1.399999    −0.672727    −1.109091    −0.963636    −0.988099    −0.998670*
7            −1.109091    −0.672727    −0.963636    −0.818182    −0.998670    −0.966940*
8            −1.109091    −0.818182    −0.963636    −0.963636    −0.998670    −0.998670
9            −1.109091    −0.963636    −0.963636    −0.953636    −0.998670    −0.997850*
The process is repeated, and the computations are summarized in Table 2.2. The values of f that are computed at each iteration are indicated by an asterisk. Note that at k = 8, λ_8 = μ_8 = −0.963636, so that no functional evaluations are needed at this stage. For k = 9, λ_9 = λ_8 = −0.963636 and μ_9 = λ_8 + ε = −0.953636. Since f(μ_9) > f(λ_9), the final interval of uncertainty [a_9, b_9] is [−1.109091, −0.963636], whose length is 0.145455. We approximate the minimum to be the midpoint −1.036364. Note from Example 2.1 that with the same number of observations, n = 9, the golden section method gave a final interval of uncertainty whose length is 0.176.
The Fibonacci method has the following limitations [9]:

1. The initial interval of uncertainty, in which the optimum lies, has to be known.
2. The function being optimized has to be unimodal in the initial interval of uncertainty.
3. The exact optimum cannot be located in this method. Only an interval known as the final interval of uncertainty will be known. The final interval of uncertainty can be made as small as desired by using more computations.
4. The number of function evaluations to be used in the search or the resolution required has to be specified beforehand.
2.2.3 Interpolation Methods

The interpolation methods were originally developed as one-dimensional searches within multivariable optimization techniques, and are generally more efficient than Fibonacci-type approaches. The aim of all the one-dimensional minimization methods is to find λ*, the smallest nonnegative value of λ, for which the function

f(λ) = f(X + λS)   (2.8)

attains a local minimum. Hence if the original function f(X) is expressible as an explicit function of x_i (i = 1, 2, …, n), we can readily write the expression for f(λ) = f(X + λS) for any specified vector S, set

df(λ)/dλ = 0   (2.9)

and solve Eq. (2.9) to find λ* in terms of X and S. However, in many practical problems, the function f(λ) cannot be expressed explicitly in terms of λ. In such cases the interpolation methods can be used to find the value of λ*.

Example 2.3 Derive the one-dimensional minimization problem for the following case:

Minimize f(X) = (x_1² − x_2)² + (1 − x_1)²   (E1)

from the starting point X_1 = (−2, −2)ᵀ along the search direction S = (1.00, 0.25)ᵀ.

SOLUTION The new design point X can be expressed as

X = (x_1, x_2)ᵀ = X_1 + λS = (−2 + λ, −2 + 0.25λ)ᵀ

By substituting x_1 = −2 + λ and x_2 = −2 + 0.25λ in Eq. (E1), we obtain f as a function of λ:

f(λ) = f(−2 + λ, −2 + 0.25λ) = [(−2 + λ)² − (−2 + 0.25λ)]² + [1 − (−2 + λ)]²
     = λ⁴ − 8.5λ³ + 31.0625λ² − 57.0λ + 45.0

The value of λ at which f(λ) attains a minimum gives λ*.

In the following sections, we discuss different interpolation methods with reference to one-dimensional minimization problems that arise during multivariable optimization problems.
2.2.3.1 Quadratic interpolation method

The quadratic interpolation method uses the function values only; hence it is useful to find the minimizing step λ* of functions f(X) for which the partial derivatives with respect to the variables x_i are not available or difficult to compute. This method finds the minimizing step length λ* in three stages. In the first stage the S-vector is normalized so that a step length of λ = 1 is acceptable. In the second stage the function f(λ) is approximated by a quadratic function h(λ) and the minimum, λ̃*, of h(λ) is found. If λ̃* is not sufficiently close to the true minimum λ*, a third stage is used. In this stage a new quadratic function (refit) h′(λ) = a′ + b′λ + c′λ² is used to approximate f(λ), and a new value of λ̃* is found. This procedure is continued until a λ̃* that is sufficiently close to λ* is found.

Stage 1. In this stage the vector S is normalized as follows. Find Δ = max_i |s_i|, where s_i is the ith component of S, and divide each component of S by Δ. Another method of normalization is to find Δ = (s_1² + s_2² + ⋯ + s_n²)^(1/2) and divide each component of S by Δ.

Stage 2. Let

h(λ) = a + bλ + cλ²   (2.10)

be the quadratic function used for approximating the function f(λ). It is worth noting at this point that a quadratic is the lowest-order polynomial for which a finite minimum can exist. The necessary condition for the minimum of h(λ) is that

dh/dλ = b + 2cλ = 0,

that is,

λ̃* = −b/(2c)   (2.11)

The sufficiency condition for the minimum of h(λ) is that d²h/dλ² > 0 at λ̃*, that is,

c > 0   (2.12)

To evaluate the constants a, b, and c in Eq. (2.10), we need to evaluate the function f(λ) at three points.
Let λ = A, λ = B, and λ = C be the points at which the function f(λ) is evaluated, and let f_A, f_B, and f_C be the corresponding function values, that is,

f_A = a + bA + cA²
f_B = a + bB + cB²
f_C = a + bC + cC²   (2.13)

The solution of Eqs. (2.13) gives

a = [f_A BC(C − B) + f_B CA(A − C) + f_C AB(B − A)] / [(A − B)(B − C)(C − A)]   (2.14)

b = [f_A(B² − C²) + f_B(C² − A²) + f_C(A² − B²)] / [(A − B)(B − C)(C − A)]   (2.15)

c = −[f_A(B − C) + f_B(C − A) + f_C(A − B)] / [(A − B)(B − C)(C − A)]   (2.16)

From Eqs. (2.11), (2.15), and (2.16), the minimum of h(λ) can be obtained as

λ̃* = −b/(2c) = [f_A(B² − C²) + f_B(C² − A²) + f_C(A² − B²)] / {2[f_A(B − C) + f_B(C − A) + f_C(A − B)]}   (2.17)

provided that c, as given by Eq. (2.16), is positive.

To start with, for simplicity, the points A, B, and C can be chosen as 0, t, and 2t, respectively, where t is a preselected trial step length. By this procedure we can save one function evaluation, since f_A = f(λ = 0) is generally known from the previous iteration (of a multivariable search). For this case, Eqs. (2.14) to (2.17) reduce to

a = f_A   (2.18)

b = (4f_B − 3f_A − f_C)/(2t)   (2.19)

c = (f_C + f_A − 2f_B)/(2t²)   (2.20)

λ̃* = t(4f_B − 3f_A − f_C)/(4f_B − 2f_C − 2f_A)   (2.21)

provided that

c = (f_C + f_A − 2f_B)/(2t²) > 0   (2.22)
The inequality (2.22) can be satisfied if

f_B < (f_A + f_C)/2   (2.23)

(i.e., the function value f_B should be smaller than the average value of f_A and f_C). This can be satisfied if f_B lies below the line joining f_A and f_C.

The following procedure can be used not only to satisfy the inequality (2.23) but also to ensure that the minimum λ̃* lies in the interval 0 < λ̃* < 2t_0.

1. Assuming that f_A = f(λ = 0) and the initial step size t_0 are known, evaluate the function at λ = t_0 and obtain f_1 = f(λ = t_0).
2. If f_1 > f_A, set f_C = f_1, evaluate the function at λ = t_0/2, and compute λ̃* using Eq. (2.21) with t = t_0/2.
3. If f_1 ≤ f_A, set f_B = f_1, and evaluate the function at λ = 2t_0 to find f_2 = f(λ = 2t_0).
4. If f_2 turns out to be greater than f_1, set f_C = f_2 and compute λ̃* according to Eq. (2.21) with t = t_0.
5. If f_2 turns out to be smaller than f_1, set new f_1 = f_2 and t_0 = 2t_0, and repeat steps 2 to 4 until we are able to find λ̃*.

Stage 3. The λ̃* found in stage 2 is the minimum of the approximating quadratic h(λ), and we have to make sure that this λ̃* is sufficiently close to the true minimum λ* of f(λ) before taking λ* ≈ λ̃*. Several tests are possible to ascertain this. One possible test is to compare f(λ̃*) with h(λ̃*) and consider λ̃* a sufficiently good approximation if they differ by no more than a small amount. This criterion can be stated as

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| ≤ ε_1   (2.24)

Another possible test is to examine whether df/dλ is close to zero at λ̃*. Since the derivatives of f are not used in this method, we can use a finite-difference formula for df/dλ and use the criterion

|(f(λ̃* + Δλ̃*) − f(λ̃* − Δλ̃*)) / (2Δλ̃*)| ≤ ε_2   (2.25)

to stop the procedure. In Eqs. (2.24) and (2.25), ε_1 and ε_2 are small numbers to be specified depending on the accuracy desired. If the convergence criteria stated in Eqs. (2.24) and (2.25) are not satisfied, a new quadratic function

h′(λ) = a′ + b′λ + c′λ²

is used to approximate the function f(λ). To evaluate the constants a′, b′, and c′, the three best function values among the current f_A = f(λ = 0), f_B = f(λ = t_0), f_C = f(λ = 2t_0), and f̃ = f(λ = λ̃*) are to be used. This process of trying to fit another polynomial to obtain a better approximation to λ̃* is known as refitting the polynomial.

For refitting the quadratic, we consider all possible situations and select the best three points of the present A, B, C, and λ̃*. There are four possibilities. A new value of λ̃* is computed by using the general formula, Eq. (2.17). If this λ̃* also does not satisfy the convergence criteria stated in Eqs. (2.24) and (2.25), a new quadratic has to be refitted.

Example 2.4 Find the minimum of f = λ⁵ − 5λ³ − 20λ + 5.
SOLUTION Since this is not a multivariable optimization problem, we can proceed directly to stage 2. Let the initial step size be taken as t_0 = 0.5 and A = 0.

Iteration 1

f_A = f(λ = 0) = 5

f_1 = f(λ = t_0) = 0.03125 − 5(0.125) − 20(0.5) + 5 = −5.59375

Since f_1 < f_A, we set f_B = f_1 = −5.59375, and find that

f_2 = f(λ = 2t_0 = 1.0) = −19.0

As f_2 < f_1, we set new t_0 = 1 and f_1 = −19.0. Again we find that f_1 < f_A and hence set f_B = f_1 = −19.0, and find that f_2 = f(λ = 2t_0 = 2) = −43. Since f_2 < f_1, we again set t_0 = 2 and f_1 = −43. As this f_1 < f_A, we set f_B = f_1 = −43 and evaluate f_2 = f(λ = 2t_0 = 4) = 629. This time f_2 > f_1 and hence we set f_C = f_2 = 629 and compute λ̃* from Eq. (2.21) as

λ̃* = 2[4(−43) − 3(5) − 629] / [4(−43) − 2(629) − 2(5)] = −1632/−1440 = 1.135
Convergence test: Since A = 0, f_A = 5, B = 2, f_B = −43, C = 4, and f_C = 629, the values of a, b, and c can be found to be

a = 5, b = −204, c = 90

and

h(λ̃*) = h(1.135) = 5 − 204(1.135) + 90(1.135)² = −110.9

Since

f̃ = f(λ̃*) = (1.135)⁵ − 5(1.135)³ − 20(1.135) + 5.0 = −23.127

we have

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| = |(−110.9 + 23.127) / (−23.127)| = 3.8

As this quantity is very large, convergence is not achieved, and hence we have to use refitting.

Iteration 2

Since λ̃* < B and f̃ > f_B, we take the new values of A, B, and C as

A = 1.135, f_A = −23.127
B = 2.0,   f_B = −43.0
C = 4.0,   f_C = 629.0

and compute the new λ̃*, using Eq. (2.17), as

λ̃* = [−23.127(4.0 − 16.0) + (−43.0)(16.0 − 1.29) + 629.0(1.29 − 4.0)] / {2[−23.127(2.0 − 4.0) + (−43.0)(4.0 − 1.135) + 629.0(1.135 − 2.0)]} = 1.661
Convergence test: To test the convergence, we compute the coefficients of the quadratic as

a = 288.0, b = −417.0, c = 125.3

As

h(λ̃*) = h(1.661) = 288.0 − 417.0(1.661) + 125.3(1.661)² = −59.7

f̃ = f(λ̃*) = (1.661)⁵ − 5(1.661)³ − 20(1.661) + 5.0 = −38.37

we obtain

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| = |(−59.70 + 38.37) / (−38.37)| = 0.556

Since this quantity is not sufficiently small, we need to proceed to the next refit.
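A brief MATLAB sketch of stage 2 for this example is given below; it is illustrative only, and the branch for f_1 > f_A (step 2 of the bracketing procedure) is omitted for brevity.

% Hedged sketch of stage 2 for Example 2.4: bracket the minimum with
% A = 0, B = t0, C = 2*t0 and apply Eq. (2.21).
f  = @(lam) lam.^5 - 5*lam.^3 - 20*lam + 5;
t0 = 0.5;
fA = f(0);  f1 = f(t0);
while f(2*t0) < f1          % steps 3-5: double t0 until f rises
    t0 = 2*t0;  f1 = f(t0);
end
fB = f1;  fC = f(2*t0);
lamTilde = t0*(4*fB - 3*fA - fC)/(4*fB - 2*fC - 2*fA)   % Eq. (2.21)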
2.3. Multidimensional search
2.3.1 Univariate Method

In this method we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. By starting at a base point X_i in the ith iteration, we fix the values of n − 1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods discussed above can be used to produce a new base point X_{i+1}. The search is now continued in a new direction. This new direction is obtained by changing any one of the n − 1 variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete, and hence we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows [9]:

1. Choose an arbitrary starting point X_1 and set i = 1.

2. Find the search direction S_i as

S_iᵀ = (1, 0, 0, …, 0) for i = 1, n + 1, 2n + 1, …
S_iᵀ = (0, 1, 0, …, 0) for i = 2, n + 2, 2n + 2, …
S_iᵀ = (0, 0, 1, …, 0) for i = 3, n + 3, 2n + 3, …
  ⋮
S_iᵀ = (0, 0, 0, …, 1) for i = n, 2n, 3n, …   (2.26)

3. Determine whether λ_i should be positive or negative. For the current direction S_i, this means finding whether the function value decreases in the positive or negative direction. For this we take a small probe length (ε) and evaluate f_i = f(X_i), f⁺ = f(X_i + εS_i), and f⁻ = f(X_i − εS_i). If f⁺ < f_i, S_i will be the correct direction for decreasing the value of f, and if f⁻ < f_i, −S_i will be the correct one. If both f⁺ and f⁻ are greater than f_i, we take X_i as the minimum along the direction S_i.

4. Find the optimal step length λ_i* such that

f(X_i ± λ_i*S_i) = min over λ_i of f(X_i ± λ_iS_i)   (2.27)

where the + or − sign has to be used depending upon whether S_i or −S_i is the direction for decreasing the function value.

5. Set X_{i+1} = X_i ± λ_i*S_i depending on the direction for decreasing the function value, and f_{i+1} = f(X_{i+1}).

6. If |f_{i+1} − f_i| < ε, stop. Otherwise go to step 7.

7. Set the new value of i = i + 1 and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.

The univariate method is very simple and can be implemented easily (a MATLAB sketch is given after Example 2.5 below). However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it will be better to stop the computations at some point near the optimum point rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. If the univariate search starts at such a point P, the function value cannot be decreased either in the direction ±S_1 or in the direction ±S_2. Thus the search comes to a halt, and one may be misled to take the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S_1 or ±S_2) happens to be less than the number of significant figures used in the computations.

Example 2.5 Minimize f(x_1, x_2) = x_1 − x_2 + 2x_1² + 2x_1x_2 + x_2² with the starting point (0, 0).
decreasing t he f unction va lue i n s tep 3. F urther,๐๐ w e w ill us e t he di fferential c alculus method to find the optimum step length along the direction ยฑ in step 4. โ Iteration = 1 ๐๐๐๐ ๐๐๐๐ 43 ๐๐
1 Step 2: Choose the search direction as = . 1 1 0
Step 3: To f ind w hether t he va lue ๐๐ of ๐๐decreases๏ฟฝ ๏ฟฝ al ong 1 or 1, w e us e t he pr obe
length . Since ๐๐ ๐๐ โ๐๐ 1 = ๐๐ ( 1) = (0, 0) = 0, + ๐๐ =๐๐ ๐๐ ( 1 +๐๐ 1 ) = ( , 0) = 0.01 0 + 2(0.0001) + 0 + 0 = 0.0102 ๐๐ ๐๐ ๐๐ > ๐๐๐๐1 ๐๐ ๐๐ โ ( ) ( ) ( ) = ๐๐ 1 โ 1 = , 0 = 0.01 0 + 2 0.0001 โ ๐๐ ๐๐ +๐๐ 0 +๐๐๐๐ 0 = ๐๐0.0098โ๐๐ < 1โ, โ is the correct direction for minimizingโ f from๐๐ 1. โ๐๐Step๐๐ 4: To find the optimum step length 1 , we minimize๐๐ โ ( 1 1 1) = ( 1, 0)๐๐ ( ) 2 2 ๐๐ ๐๐ โ ๐๐ ๐๐ = ๐๐ โ๐๐1 0 + 2( 1) + 0 + 0 = 2 1 1 โ๐๐ โ โ๐๐ ๐๐ โ ๐๐ As 1 is a step length we take 0 1 1 and solving above equation for 1 using golden 1 section method we get = ๐๐ 1 4 โค ๐๐ โค ๐๐ โ Step 5: Set ๐๐ 1 0 1 1 = โ = = 2 1 1 1 0 4 0 4 โ 0 โ ๐๐ ๐๐ ๐๐ ๐๐ ๏ฟฝ ๏ฟฝ1 โ ๏ฟฝ ๏ฟฝ ๏ฟฝ1 ๏ฟฝ = ( ) = ( , 0) = 2 2 4 8 Iteration = ๐๐ ๐๐ ๐๐ ๐๐ โ โ 0 Step 2: Choose๐๐ ๐๐ the search direction = 2 2 1
Step 3: Since 2 = ( 2) = 0.125๐๐ ๐๐, ๐๐ ๐๐ ๏ฟฝ ๏ฟฝ + ๐๐ =๐๐ (๐๐ 2 + โ2) = ( 0.25, 0.01) = 0.1399 < 2 ๐๐ = ๐๐ ( ๐๐2 + ๐๐๐๐2) = ๐๐ ( โ0.25, 0.01) =โ 0.1099 >๐๐2 โ 2 is the correct๐๐ direction๐๐ ๐๐ for decreasing๐๐๐๐ ๐๐ theโ valueโ of f from 2โ. ๐๐ 2 ๐๐Step 4: We minimize ( 2 + 2 2) to find 2 . ๐๐ Here ๐๐ ๐๐ ๐๐ ๐๐ ๐๐ ( 2 + 2 2) = ( 0.25, 2)
๐๐ ๐๐ ๐๐ ๐๐ 44 ๐๐ โ ๐๐
2 2 = 0.25 2 + 2(0.25) 2(0.25)( 2) + 2 2 โ โ ๐๐= 2 1.5 2 โ 0.125 ๐๐ ๐๐ using golden section๐๐ โ method๐๐ โ we get 2 = 0.75 โ Step 5: Set ๐๐ 025 0 0.25 = + = + 0.75 + = 3 2 2 2 0 1 0.75 โ โ โ ๐๐ ๐๐ ๐๐ 3๐๐ = ๏ฟฝ( 3) =๏ฟฝ 0.6875๏ฟฝ ๏ฟฝ ๏ฟฝ ๏ฟฝ
Next we set the iteration number๐๐ as ๐๐ = ๐๐3, and continueโ the procedure until the optimum 1.0 solution = ( ) ๐๐ = 1.25 is found. 1.5 โ โ โ Note: If the๐๐ method๏ฟฝ is๏ฟฝ to๐ค๐ค ๐ค๐ค๐ค๐คbeโ computerized,๐๐ ๐๐ โ a suitable convergence criterion has to be used
to test the point +1( = 1, 2, . . . ) for optimality.
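A minimal MATLAB sketch of the univariate method applied to Example 2.5 follows; it is illustrative only, and the built-in fminbnd stands in for the golden section line search.

% Hedged sketch of the univariate (cyclic coordinate) method.
f = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;
X = [0; 0];  n = 2;  epsP = 0.01;            % probe length
for cycle = 1:50
    Xold = X;
    for i = 1:n
        S = zeros(n,1);  S(i) = 1;           % coordinate direction
        if f(X + epsP*S) > f(X), S = -S; end % pick the downhill sign
        lamStar = fminbnd(@(lam) f(X + lam*S), 0, 1);
        X = X + lamStar*S;
    end
    if norm(X - Xold) < 1e-6, break; end     % convergence test
end
% X tends to the optimum (-1.0, 1.5) with f = -1.25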
2.3.2 Pattern Directions

In the univariate method, we search for the minimum along directions parallel to the coordinate axes. We noticed that this method may not converge in some cases, and that even if it converges, its convergence will be very slow as we approach the optimum point. These problems can be avoided by changing the directions of search in a favorable manner instead of retaining them always parallel to the coordinate axes. Let the points X_1, X_2, X_3, … indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (e.g., X_1, X_3; X_2, X_4; X_3, X_5; X_4, X_6; …) lie in the general direction of the minimum and are known as pattern directions. It can be proved that if the objective function is a quadratic in two variables, all such lines pass through the minimum. Unfortunately, this property will not be valid for multivariable functions even when they are quadratics. However, this idea can still be used to achieve rapid convergence while finding the minimum of an n-variable function.

Methods that use pattern directions as search directions are known as pattern search methods. One of the best-known pattern search methods, Powell's method, is discussed here. In general, a pattern search method takes n univariate steps, where n denotes the number of design variables, and then searches for the minimum along the pattern direction S_p, defined by

S_p = X_i − X_{i−n}   (2.28)

where the point X_i is obtained at the end of the n univariate steps and X_{i−n} is the starting point before taking the n univariate steps. In general, the directions used prior to taking a move along a pattern direction need not be univariate directions.
2.3.4 POWELL'S METHOD

Powell's method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions [9]. A conjugate directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate directions method is expected to speed up the convergence of even general nonlinear objective functions. The definition, a method of generation of conjugate directions, and the property of quadratic convergence are presented in this section.

2.3.4.1 Conjugate Directions

Definition 2.4 (Conjugate Directions). Let A = [A] be an n × n symmetric matrix. A set of n vectors (or directions) {S_i} is said to be conjugate (more accurately A-conjugate) if

S_iᵀ A S_j = 0 for all i ≠ j, i = 1, 2, …, n, j = 1, 2, …, n   (2.29)

It can be seen that orthogonal directions are a special case of conjugate directions (obtained with [A] = [I] in Eq. (2.29)).
Definition 2.5 (Quadratically Convergent Method): If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem 2.3 Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n. Let the constrained stationary points of the quadratic function in the hyperplanes be X_1 and X_2, respectively. Then the line joining X_1 and X_2 is conjugate to any line parallel to the hyperplanes.

Proof: Let the quadratic function be expressed as

Q(X) = ½ XᵀAX + BᵀX + C   (2.30)
The gradient of Q is given by

∇Q(X) = AX + B

and hence

∇Q(X_1) − ∇Q(X_2) = A(X_1 − X_2)   (2.31)

If S is any vector parallel to the hyperplanes, it must be orthogonal to the gradients ∇Q(X_1) and ∇Q(X_2). Thus

Sᵀ∇Q(X_1) = SᵀAX_1 + SᵀB = 0   (2.32)

Sᵀ∇Q(X_2) = SᵀAX_2 + SᵀB = 0   (2.33)

By subtracting Eq. (2.33) from Eq. (2.32), we obtain

SᵀA(X_1 − X_2) = 0   (2.34)

Hence S and (X_1 − X_2) are A-conjugate.

Theorem 2.4 If a quadratic function

Q(X) = ½ XᵀAX + BᵀX + C   (2.35)

is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step
irrespective of the starting point.

Proof: Let X* minimize the quadratic function Q(X). Then

∇Q(X*) = B + AX* = 0   (2.36)

Given a point X_1 and a set of linearly independent directions S_1, S_2, …, S_n, constants β_i can always be found such that

X* = X_1 + Σ_{i=1}^{n} β_i S_i   (2.37)

where the vectors S_1, S_2, …, S_n have been used as basis vectors. If the directions S_i are A-conjugate and none of them is zero, the S_i can easily be shown to be linearly independent, and the β_i can be determined as follows.

Equations (2.36) and (2.37) lead to

B + AX_1 + A(Σ_{i=1}^{n} β_i S_i) = 0   (2.38)

Multiplying this equation throughout by S_jᵀ, we obtain

S_jᵀ(B + AX_1) + S_jᵀA(Σ_{i=1}^{n} β_i S_i) = 0   (2.39)

Equation (2.39) can be rewritten as

(B + AX_1)ᵀ S_j + β_j S_jᵀ A S_j = 0   (2.40)

that is,

β_j = −(B + AX_1)ᵀ S_j / (S_jᵀ A S_j)   (2.41)

Now consider an iterative minimization procedure starting at point X_1, and successively minimizing the quadratic Q(X) in the directions S_1, S_2, …, S_n, where these directions satisfy Eq. (2.29). The successive points are determined by the relation

X_{i+1} = X_i + λ_i* S_i,  i = 1 to n   (2.42)

where λ_i* is found by minimizing Q(X_i + λS_i), so that

S_iᵀ ∇Q(X_{i+1}) = 0   (2.43)

Since the gradient of Q at the point X_{i+1} is given by

∇Q(X_{i+1}) = B + AX_{i+1}   (2.44)

Eq. (2.43) can be written as

S_iᵀ {B + A(X_i + λ_i* S_i)} = 0   (2.45)

This equation gives

λ_i* = −(B + AX_i)ᵀ S_i / (S_iᵀ A S_i)   (2.46)

From Eq. (2.42), we can express X_i as

X_i = X_1 + Σ_{j=1}^{i−1} λ_j* S_j   (2.47)

so that

X_iᵀ A S_i = X_1ᵀ A S_i + Σ_{j=1}^{i−1} λ_j* S_jᵀ A S_i = X_1ᵀ A S_i   (2.48)

using the relation (2.29). Thus Eq. (2.46) becomes
λ_i* = −(B + AX_1)ᵀ S_i / (S_iᵀ A S_i)   (2.49)

which can be seen to be identical to Eq. (2.41). Hence the minimizing step lengths are given by β_i = λ_i*. Since the optimal point X* is originally expressed as a sum of n quantities β_1, β_2, …, β_n, which have been shown to be equivalent to the minimizing step lengths, the minimization process leads to the minimum point in n steps or less. Since we have not made any assumption regarding X_1 and the order of S_1, S_2, …, S_n, the process converges in n steps or less, independent of the starting point as well as the order in which the minimization directions are used.

Example 2.6 Consider the minimization of the function

f(x_1, x_2) = 6x_1² + 2x_2² − 6x_1x_2 − x_1 − 2x_2

If S_1 = (1, 2)ᵀ denotes a search direction, find a direction S_2 that is conjugate to the direction S_1.

SOLUTION The objective function can be expressed in matrix form as

f(X) = BᵀX + ½ XᵀAX = (−1, −2)(x_1, x_2)ᵀ + ½ (x_1, x_2) [12 −6; −6 4] (x_1, x_2)ᵀ

and the Hessian matrix [A] can be identified as

[A] = [12 −6; −6 4]

The direction S_2 = (s_1, s_2)ᵀ will be conjugate to S_1 = (1, 2)ᵀ if

S_1ᵀ[A]S_2 = (1 2) [12 −6; −6 4] (s_1, s_2)ᵀ = 0

which upon expansion gives 2s_2 = 0, or s_1 = arbitrary and s_2 = 0. Since s_1 can have any value, we select s_1 = 1, and the desired conjugate direction can be expressed as S_2 = (1, 0)ᵀ.
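This conjugacy condition is easy to verify numerically; a short illustrative check in MATLAB:

% Numerical check of Example 2.6.
A  = [12 -6; -6 4];        % Hessian of f
S1 = [1; 2];  S2 = [1; 0];
S1' * A * S2               % returns 0, so S1 and S2 are A-conjugate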
Powell's Algorithm

In Powell's method, for a two-variable function, the function is first minimized once along each of the coordinate directions, starting with the second coordinate direction, and then in the corresponding pattern direction. For the next cycle of minimization, we discard one of the coordinate directions (the x_1 direction in the present case) in favor of the pattern direction. Then we generate a new pattern direction S_p^(2). For the next cycle of minimization, we discard one of the previously used coordinate directions (the x_2 direction in this case) in favor of the newly generated pattern direction. Then we minimize along the directions S_p^(1) and S_p^(2). For the next cycle of minimization, since there is no coordinate direction to discard, we restart the whole procedure by minimizing along the x_2 direction. This procedure is continued until the desired minimum point is found.

Note that the search will be made sequentially in the directions S_n; S_1, S_2, S_3, …, S_{n−1}, S_n; S_p^(1); S_2, S_3, …, S_{n−1}, S_n, S_p^(1); S_p^(2); S_3, S_4, …, S_{n−1}, S_n, S_p^(1), S_p^(2); S_p^(3); … until the minimum point is found. Here S_i indicates the ith coordinate direction u_i and S_p^(j) the jth pattern direction.

Quadratic Convergence: The pattern directions S_p^(1), S_p^(2), S_p^(3), … are nothing but the lines joining the minima found along the directions S_n, S_p^(1), S_p^(2), …, respectively. Hence, by Theorem 2.3, the pairs of directions (S_n, S_p^(1)), (S_p^(1), S_p^(2)), and so on, are A-conjugate. Thus all the directions S_n, S_p^(1), S_p^(2), … are A-conjugate. Since, by Theorem 2.4, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell's method is quadratically convergent. From the method used for constructing the conjugate directions S_p^(1), S_p^(2), …, we find that n minimization cycles are required to complete the construction of n conjugate directions. In the ith cycle, the minimization is done along the already constructed i conjugate directions and the n − i nonconjugate (coordinate) directions. Thus after n cycles, all the n search directions are mutually conjugate, and a quadratic will theoretically be minimized in n² one-dimensional minimizations. This proves the quadratic convergence of Powell's method.

It is to be noted that, as with most numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell's method may require many more iterations to minimize a function than the theoretically estimated number. There are several reasons for this:
1. Since the number of cycles n is valid only for quadratic functions, it will generally take more than n cycles for nonquadratic functions.

2. The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths λ_i* will be only approximate, and hence the subsequent directions will not be conjugate. Thus the method requires more iterations to achieve the overall convergence.

3. Powell's method, described above, can break down before the minimum point is found. This is because the search directions S_i might become dependent or almost dependent during numerical computation.

Powell's method is a very popular means of successive minimizations along conjugate directions. It is a zero-order method, requiring the evaluation of F(x) only. If the problem involves n design variables, the basic algorithm is given by the following [3] (a MATLAB sketch of one cycle follows the list):

• Choose a point x_0 in the design space.
• Choose the starting vectors v_i, i = 1, 2, …, n (the usual choice is v_i = e_i, where e_i is the unit vector in the x_i-coordinate direction).
• Cycle
  - do with i = 1, 2, …, n
    * Minimize F(x) along the line through x_{i−1} in the direction of v_i. Let the minimum point be x_i.
  - end do
  - v_{n+1} ← x_n − x_0 (this vector can be shown to be conjugate to the v_{n+1} produced in the previous cycle)
  - Minimize F(x) along the line through x_0 in the direction of v_{n+1}. Let the minimum point be x_{n+1}.
  - if |x_{n+1} − x_0| < ε, exit loop
  - do with i = 1, 2, …, n
    * v_i ← v_{i+1} (v_1 is discarded, the other vectors are reused)
  - end do
• end cycle
Powell demonstrated that the vectors v_{n+1} produced in successive cycles are mutually conjugate, so that the minimum point of a quadratic surface is reached in precisely n cycles. In practice, the merit function is seldom quadratic, but as long as it can be approximated locally by a quadratic, Powell's method will work. Of course, it usually takes more than n cycles to arrive at the minimum of a nonquadratic function. Note that it takes n line minimizations to construct each conjugate direction.

We start with point x_0 and vectors v_1 and v_2. Then we find the distance s_1 that minimizes F(x_0 + sv_1), finishing up at point x_1 = x_0 + s_1v_1. Next, we determine s_2 that minimizes F(x_1 + sv_2), which takes us to x_2 = x_1 + s_2v_2. The last search direction is v_3 = x_2 − x_0. After finding s_3 by minimizing F(x_0 + sv_3) we get to x_3 = x_0 + s_3v_3, completing the cycle.

As explained before, the first cycle starts at point P_0 and ends up at P_3. The second cycle takes us to P_6, which is the optimal point. The directions P_0P_3 and P_3P_6 are mutually conjugate.

Powell's method does have a major flaw that has to be remedied if F(x) is not a quadratic: the algorithm tends to produce search directions that gradually become linearly dependent, thereby ruining the progress toward the minimum. The source of the problem is the automatic discarding of v_1 at the end of each cycle. It has been suggested that it is better to throw out the direction that resulted in the largest decrease of F(x), a policy that we adopt. It seems counter-intuitive to discard the best direction, but it is likely to be close to the direction added in the next cycle, thereby contributing to linear dependence. As a result of the change, the search directions cease to be mutually conjugate, so that a quadratic form is not minimized in n cycles any more. This is not a significant loss, since in practice F(x) is seldom a quadratic anyway.

Example 2.7 Minimize f(x_1, x_2) = x_1 − x_2 + 2x_1² + 2x_1x_2 + x_2² from the starting point X_1 = (0, 0)ᵀ using Powell's method.

SOLUTION

Cycle 1: Univariate Search
We minimize f along S = S_2 = (0, 1)ᵀ from X_1. To find the correct direction (+S_2 or −S_2) for decreasing the value of f, we take the probe length as ε = 0.01. As

f_1 = f(X_1) = 0.0, and
f⁺ = f(X_1 + εS_2) = f(0.0, 0.01) = −0.0099 < f_1,

f decreases along the direction +S_2. To find the minimizing step length λ* along S_2, we minimize

f(X_1 + λS_2) = f(0.0, λ) = λ² − λ.

Using the golden section method we get λ* = 0.5, and hence X_2 = X_1 + λ*S_2 = (0.0, 0.5)ᵀ.

Next we minimize f along S_1 = (1, 0)ᵀ from X_2 = (0.0, 0.5)ᵀ. Since

f_2 = f(X_2) = f(0.0, 0.5) = −0.25,
f⁺ = f(X_2 + εS_1) = f(0.01, 0.50) = −0.2298 > f_2,
f⁻ = f(X_2 − εS_1) = f(−0.01, 0.50) = −0.2698 < f_2,

f decreases along −S_1. As f(X_2 − λS_1) = f(−λ, 0.50) = 2λ² − 2λ − 0.25, using the golden section method we get λ* = 0.5. Hence X_3 = X_2 − λ*S_1 = (−0.5, 0.5)ᵀ.

Now we minimize f along S = S_2 = (0, 1)ᵀ from X_3 = (−0.5, 0.5)ᵀ. As f_3 = f(X_3) = −0.75 and

f⁺ = f(X_3 + εS_2) = f(−0.5, 0.51) = −0.7599 < f_3,

f decreases along the +S_2 direction. Since f(X_3 + λS_2) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75, using the golden section method we get λ* = 1/2. This gives

X_4 = X_3 + λ*S_2 = (−0.5, 1.0)ᵀ

Cycle 2: Pattern Search

Now we generate the first pattern direction as

S_p^(1) = X_4 − X_2 = (−0.5 − 0.0, 1.0 − 0.5)ᵀ = (−0.5, 0.5)ᵀ
and minimize f along S_p^(1) from X_4. Since

f_4 = f(X_4) = f(−0.5, 1.0) = −1.0
f⁺ = f(X_4 + εS_p^(1)) = f(−0.5 − 0.005, 1 + 0.005) = f(−0.505, 1.005) = −1.004975,

f decreases in the positive direction of S_p^(1). As

f(X_4 + λS_p^(1)) = f(−0.5 − 0.5λ, 1.0 + 0.5λ) = 0.25λ² − 0.50λ − 1.00,

using the golden section method we get λ* = 1.0, and hence

X_5 = X_4 + λ*S_p^(1) = (−0.5, 1.0)ᵀ + 1.0(−0.5, 0.5)ᵀ = (−1.0, 1.5)ᵀ

The point X_5 can be identified to be the optimum point.

If we do not recognize X_5 as the optimum point at this stage, we proceed to minimize f along the direction S_2 = (0, 1)ᵀ from X_5. Then we would obtain

f_5 = f(X_5) = −1.25, f⁺ = f(X_5 + εS_2) > f_5, and f⁻ = f(X_5 − εS_2) > f_5.

This shows that f cannot be minimized along S_2, and hence X_5 will be the optimum point. In this example the convergence has been achieved in the second cycle itself. This is to be expected in this case, as f is a quadratic function, and the method is a quadratically convergent method.

The numerical result for this example is summarized below (see the appendix for Powell's MATLAB code).

xmin                 fmin
------------------------------------
(-1.503, 2.356)      -0.872520631787970
(-1.000, 1.500)      -1.249999992736614
(-1.000, 1.500)      -1.249999999974476

The minimum point (-1.000, 1.500) is reached at the 3rd cycle.
Chapter 3 Trust region methods
3.1 Trust region framework

In the last section of the first chapter we explained trust region methods as one of the solution procedures for treating a nonlinear programming problem. Here a brief review will be helpful to follow the integration of the trust region method into a DFO algorithm. The trust region framework is usually used in the context where at least the gradient, and sometimes the Hessian, of the objective function can be evaluated or estimated accurately.
Main steps of a typical trust region method are [2]
1. Given a current iterate, build a good local approximation model.
2. Choose a neighborhood around the current iterate where the model "is trusted" to be accurate. Minimize the model in this neighborhood.
3. Determine if the step is successful by evaluating the true objective function at the new point, comparing the true reduction in the value of the objective with the reduction predicted by the model.
4. If the step is successful, accept the new point as the next iterate. Increase the size of the trust region if the success is really significant. Otherwise, reject the new point and reduce the size of the trust region.
5. Repeat until convergence.
For a model based on the Taylor series expansion we know that if the trust region is made small enough, then the approximation is sufficiently accurate and the algorithm will make a successful step (unless the optimum has been reached).
To use the trust region framework in the derivative free case we use an alternative approximation technique, which does not use derivative estimates. Quadratic interpolation is one such technique which can be applied successfully within a trust region method. However, we need to guarantee that the approximation model is locally good: that is, that a successful step will be made after sufficient reduction of the trust region.
3.2 Quadratic interpolation

Consider the problem of interpolating a given or suitably chosen function f: ℝⁿ → ℝ by a quadratic polynomial Q(x) at a chosen set of points Y = {y¹, y², …, y^p} ⊂ ℝⁿ. The quadratic polynomial Q(x) is an interpolant of the function f(x) with respect to the set Y if

Q(y^i) = f(y^i)  (i = 1, 2, …, p)   (3.1)

such that f is known at all finitely many elements of Y. Here we note that Q is our model, which we defined as m_k in the first chapter within the context of trust region methods; that is,

Q(x) = m_k(x)

Suppose that the space of quadratic polynomials is spanned by a set of basis functions φ_i(·) (i = 1, 2, …, q). Then any quadratic polynomial can be written in terms of these basis functions, that is,

Q(x) = Σ_{i=1}^{q} α_i φ_i(x),

where the coefficient vector α = (α_1, α_2, …, α_q)ᵀ is to be determined. We need

q = 1 + n + n(n + 1)/2 = (n + 1)(n + 2)/2

points to find all of the interpolation parameters. If we have p = (n + 1)(n + 2)/2 points, we can ensure that the quadratic model is entirely determined by the following system of equations. When this is the case, the system of linear equations

Σ_{i=1}^{q} α_i φ_i(y^j) = f(y^j)  (j = 1, 2, …, p)   (3.2)

can be solved to derive the interpolation parameters. The coefficient matrix Φ(Y) of this system is of the type p × q and looks as follows:

Φ(Y) = [ φ_1(y¹)  ⋯  φ_q(y¹) ]
       [    ⋮     ⋱     ⋮    ]
       [ φ_1(y^p) ⋯  φ_q(y^p) ]   (3.3)

For a given set of points and a set of function values, an interpolation polynomial exists and is unique if and only if Φ(Y) is square, that is p = q, and nonsingular. Theoretically, this means that the system (3.2) can be solved, but in practice the solvability of this system depends on whether the matrix Φ(Y) is ill-conditioned or not.
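As an illustration, the following MATLAB fragment assembles and solves the system (3.2) for n = 2 with the monomial basis {1, x_1, x_2, x_1², x_1x_2, x_2²}; the sample function and points are chosen for demonstration only (the points are those of Example 3.3 below).

% Build Phi(Y) and solve for the model coefficients (illustrative).
f   = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;
Y   = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2];    % six points, one per row
phi = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];
P   = zeros(6);  rhs = zeros(6,1);
for j = 1:6
    P(j,:) = phi(Y(j,:));                 % row j of Phi(Y)
    rhs(j) = f(Y(j,:));
end
alpha = P \ rhs;   % coefficients; fails if Phi(Y) is singular (not poised)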
From the above argument, we conclude that if we manage to determine the quadratic polynomial uniquely, then we have p = q = (n + 1)(n + 2)/2. However, we need to be aware of the fact that not any (n + 1)(n + 2)/2 points in ℝⁿ can be interpolated by a quadratic polynomial. Obviously, although 3 distinct points can be interpolated by a quadratic function in univariate interpolation, this is not the case in multivariate interpolation. In fact, 3 points will not be enough to obtain a quadratic interpolation polynomial whenever the dimension of the interpolation space is greater than one. By inspection, one can see that 6 points are necessary to obtain a unique quadratic interpolation of a function in two dimensions. However, an interpolation set Y of six points lying on one line cannot be interpolated by a quadratic function. Therefore, the points of Y must satisfy a geometric condition to ensure the existence and uniqueness of the quadratic model. This geometric condition is known as the poisedness of the point set.

Definition 3.1: A set of points Y is called poised, with respect to a given subspace of polynomials, if the considered function f(x) can be interpolated at the points of Y by the polynomials from this subspace, that is, if there always exists a suitable interpolating polynomial in that subspace.
Remark: In DFO, poisedness is a necessary geometric condition on the interpolation set Y that ensures the existence and uniqueness of the quadratic model Q(x) wanted and used in the DFO algorithm.
We illustrate the implied geometric character of poisedness by the following examples.
Example 3.1: Suppose n = 2 and Y is a set of six points on a unit circle in ℝ². Then Y cannot be interpolated by a polynomial of the form

α_0 + α_1x_1 + α_2x_2 + α_{1,1}x_1² + α_{1,2}x_1x_2 + α_{2,2}x_2².

Hence, Y is not poised with respect to the space of quadratic polynomials. On the other hand, Y can be interpolated by a polynomial of the form

α_0 + α_1x_1 + α_2x_2 + α_{1,1}x_1² + α_{1,2}x_1x_2 + α_{2,2}x_2² + α_{1,1,1}x_1³.

Therefore Y is poised in an appropriate subspace of the space of cubic polynomials.

Example 3.2: Consider the two quadrics f_1(x, y) = 2x² + x − y and f_2(x, y) = x² + y², whose intersection curve projects in the (x, y)-plane to the conic C(x, y) = 0 obtained by equating the two; namely, 2x² + x − y = x² + y², that is, x² − y² + x − y = 0.

Definition 3.2: A set of points Y is called well-poised if it remains poised under small perturbations. For example, if n = 2, six points almost on a line may define a poised set Y. However, since some small perturbation of the points might make them aligned, it is not a well-poised set.

As we mentioned before, a set of points Y is poised if Φ(Y) is nonsingular with respect to the space of quadratic polynomials. If we look for an understanding of this by the fact that an interpolation polynomial exists and is unique if and only if Φ(Y) is square and nonsingular, then we conclude:

Y is poised if the determinant of Φ(Y) is nonvanishing, that is,

δ(Y) := det [ φ_1(y¹)  ⋯  φ_q(y¹) ]
            [    ⋮     ⋱     ⋮    ]
            [ φ_1(y^p) ⋯  φ_q(y^p) ] ≠ 0   (3.4)
The measure of poisedness in a DFO algorithm can be explained by a methodology based on Newton fundamental polynomials. In DFO, the approach of handling the poisedness in combination with the Newton fundamental polynomials is a distinctive issue. This is so because it allows us not only to choose a good interpolation set from a given set of sample points, but also to find a new sample point which improves the poisedness of the interpolation set. If we had no such useful tool, then removing a point from the set would have caused the conditioning of the coefficient matrix to get worse in the updating step for the interpolation set of DFO. There is also a detailed work on DFO in which the quadratic approximation model is determined by Lagrange interpolation polynomials instead of the Newton fundamental polynomials.

Let us focus on the Newton fundamental polynomials. The points y in our interpolation set Y = {y¹, y², …, y^p}, a subset of ℝⁿ, are organized into d + 1 blocks Y^[ℓ] (ℓ = 0, 1, …, d), where d is the degree of the interpolating polynomial and the ℓth block contains |Y^[ℓ]| points.

Definition 3.3: A single Newton fundamental polynomial N_i^[ℓ] of degree ℓ corresponds to each point y_i^[ℓ] ∈ Y^[ℓ], satisfying the following conditions:

N_i^[ℓ](y_j^[m]) = δ_{ij} δ_{ℓm} for all y_j^[m] ∈ Y^[m] with m ∈ {0, 1, 2, …, ℓ}.

Here δ_{ij} denotes Kronecker's symbol for i, j = 0, 1, 2, …:

δ_{ij} = 1 if i = j, and δ_{ij} = 0 otherwise.

Consider the set of interpolation points being partitioned into three disjoint blocks Y^[0], Y^[1], Y^[2], which correspond to the constant term, the linear terms, and the quadratic terms of a quadratic polynomial, respectively. Hence Y^[0] has a single element, Y^[1] has n elements, and Y^[2] has n(n + 1)/2 elements. The basis {N_i(·)} of NFPs is also partitioned into three blocks {N_i^[0](·)}, {N_i^[1](·)}, {N_i^[2](·)}, with the appropriate number of elements in each block. The unique element of {N_i^[0](·)} is a polynomial of degree zero. Each of the n elements of {N_i^[1](·)} is a polynomial of degree one and, finally, each of the n(n + 1)/2 elements of {N_i^[2](·)} is a polynomial of degree two.

The basis elements and the interpolation points are set in one-to-one correspondence, so that the points from block Y^[ℓ] correspond to the polynomials from block {N_i^[ℓ](·)}. A Newton fundamental polynomial (NFP) N_i(·) and a point y^i are in correspondence with each other if and only if the value of that polynomial at that point is one, and its value at any other point in the same block or in any previous block is zero. In other words, if y^i corresponds to N_i, then N_i(y^i) = 1 and N_i(y^j) = 0 for all other such indices j.

Example 3.3: Consider quadratic interpolation on a plane. We require six interpolation points, organized in three blocks:

Y^[0] = {(0,0)}, Y^[1] = {(1,0), (0,1)}, and Y^[2] = {(2,0), (1,1), (0,2)},

corresponding to the initial basis functions 1, x_1, x_2, x_1², x_1x_2, x_2², respectively. Applying some procedures we find the NFPs:

N^[0] = 1, N_1^[1] = x_1, N_2^[1] = x_2, N_1^[2] = (x_1² − x_1)/2, N_2^[2] = x_1x_2, and N_3^[2] = (x_2² − x_2)/2.

Algorithm (derivative free trust region method)

The steps of derivative free trust region methods are given as follows [6].

Step 0: Initializations. Let a starting point x_s and the value of f(x_s) be given.
Choose an initial trust๐ฅ๐ฅ๐ ๐ region radius 0๐๐>๐ฅ๐ฅ 0.๐ ๐ Choose at least one additional point โnot further than 0> 0 away from to create an initial w ell-poised interpolation s et and initial โb asis of N ewton๐ฅ๐ฅ ๐ ๐ f undamental polynomials. ๐๐ Determine 0 Y which has the best objective function value; i.e. 0 solves the problem min ( ) ๐ฅ๐ฅ โ . ๐ฅ๐ฅ
Set = 0, ch oose p arameters 0, 1,๐ ๐ ๐ก๐ก ๐ฅ๐ฅโ๐๐ ๐๐ 0๐ฅ๐ฅ < 0 < 1 < 1 0 < 0 1 < 1
2 ๐๐ ๐๐ ๐๐ ๐ค๐คโ๐๐๐๐๐๐ ๐๐ ๐๐ ๐๐๐๐๐๐ ๐พ๐พ โค ๐พ๐พ โค ๐พ๐พStep 1: build the model using the interpolation set Y and basis of NFP, build a quadratic interpolation polynomial ( ).
Step 2: minimize the model๐๐๐๐ with๐ฅ๐ฅ in the trust region. Set = { : }. compute the point such that ๐๐ ( ) ( ) ๐ฝ๐ฝ๐๐ ๐ฅ๐ฅ โโ โ๐ฅ๐ฅ๐๐ โ๐ฅ๐ฅ โ โค โ๐๐ = min ๐ฅ๐ฅ๏ฟฝ. ๐๐ ๐๐ ๐๐ ๐๐ Compute ( ) and the ratio ๐๐ ๐ฅ๐ฅ๏ฟฝ ๐ฅ๐ฅโ๐ฝ๐ฝ๐๐ ๐๐ ๐ฅ๐ฅ
๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ 60
( ) ( ) = . ( ) ( ) ๐๐ ๐ฅ๐ฅ๐๐ โ ๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ ๐๐๐๐ Step3: update the interpolation set ๐๐๐๐ ๐ฅ๐ฅ๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐
โข If 0, Include in , dropping one of the existing interpolation points if
necessary.๐๐๐๐ โฅ ๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ ๐๐
โข If < 0, include in , if it improves the quality of the model
• If ρ_k < η_0 and there are fewer than n + 1 points in the intersection of Y and B_k, generate a new interpolation point in B_k, while preserving or improving well-poisedness.
• Update the basis of Newton fundamental polynomials.
Step 4: Update the trust region radius.
• If ρ_k ≥ η_1, increase the trust region radius: Δ_{k+1} ∈ [Δ_k, γ_2 Δ_k].
• If ρ_k < η_0 and the cardinality of Y ∩ B_k was at least n + 1 when x̃_k was computed, reduce the trust region radius: Δ_{k+1} ∈ [γ_0 Δ_k, γ_1 Δ_k].
• Otherwise, set Δ_{k+1} = Δ_k.
Step 5: Update the current iterate.
Determine x̂_k ∈ Y with the best objective function value by solving the discrete problem
min { f(y^i) : y^i ∈ Y }.
If the improvement is sufficient in the sense of the prediction, that is,
(f(x_k) − f(x̂_k)) / (Q_k(x_k) − Q_k(x̃_k)) ≥ η_0,
then we put x_{k+1} = x̂_k; otherwise set x_{k+1} = x_k. Increase k by one and go to Step 1.
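The following minimal MATLAB sketch (function and parameter names are our own, not taken from [6]) shows how the prediction ratio of Step 2 drives the radius update of Step 4; model minimization and geometry management are left out:
function [rho, DeltaNew] = ratioAndRadius(fxk, fxt, qxk, qxt, ...
                           Delta, eta0, eta1, gamma1, gamma2, modelComplete)
% rho compares the actual decrease f(x_k) - f(x~_k) with the decrease
% Q_k(x_k) - Q_k(x~_k) predicted by the quadratic model.
rho = (fxk - fxt)/(qxk - qxt);
if rho >= eta1
    DeltaNew = gamma2*Delta;           % good prediction: enlarge region
elseif rho < eta0 && modelComplete     % model had >= n+1 points in B_k
    DeltaNew = gamma1*Delta;           % poor prediction: shrink region
else
    DeltaNew = Delta;                  % keep the radius unchanged
end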
Appendix (MATLAB codes)
1. Code for Golden search
function [xmin,fmin] = goldSearch(f,a,b)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% usage: [xmin,fmin] = goldSearch(f,a,b)
% input:
% f = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% output:
% fmin = minimum value of f(x).
% xmin = value of x at the minimum point.
N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;
x1 = c*a + (1-c)*b; f1 = feval(f,x1);
x2 = (1-c)*a + c*b; f2 = feval(f,x2);
%fprintf('------\n');
fprintf(' x1 x2 f(x1) f(x2) b - a \n');
fprintf('------\n');
fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);
% Main loop
for i = 1:N-2
if f1 < f2
b = x2;
x2 = x1;
f2 = f1;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
else
a = x1;
x1 = x2;
f1 = f2;
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
end;
fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);
end
if (abs(b-a) < eps)
fprintf('succeeded after %d steps\n', i);
return;
end;
if f1 < f2
fmin = f1; xmin = x1;
else
fmin = f2; xmin = x2;
end
2. Code for Powell method
The algorithm for Powell's method is listed below. It utilizes two arrays: df contains the decreases of the merit function in the first n moves of a cycle, and the matrix u stores the corresponding direction vectors (one vector per column).
To implement this algorithm we use the gold bracket (goldBracket) and the golden search (goldensearch) routines together with it.
i. Gold bracket
function [a,b] = goldBracket(fun,x1,h)
% Brackets the minimum point of f(x).
% USAGE: [a,b] = goldBracket(func,xStart,h)
% INPUT:
% func = handle of function that returns f(x).
% x1 = starting value of x.
% h = initial step size used in search.
% OUTPUT:
% a, b = limits on x at the minimum point.
c = 1.618033989;
f1 = feval(fun,x1);
x2 = x1 + h; f2 = feval(fun,x2);
% Determine downhill direction & change sign of h if needed.
if f2 > f1
h = -h;
x2 = x1 + h; f2 = feval(fun,x2);
% Check if minimum is between x1 - h and x1 + h
if f2 > f1
a = x2; b = x1 - h; return
end
end
% Search loop
for i = 1:50
h = c*h;
x3 = x2 + h; f3 = feval(fun,x3);
if f3 > f2
a = x1;
b = x3;
return
end
x1 = x2; x2 = x3; f2 = f3;
end
error('goldBracket did not find minimum')
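An illustrative call (the test function is our own): starting from x = 0 with step h = 0.1, goldBracket returns an interval containing the minimizer x = 2 of f(x) = (x - 2)^2.
% Example call of goldBracket on f(x) = (x - 2)^2:
f = @(x) (x - 2).^2;
[a, b] = goldBracket(f, 0.0, 0.1);
fprintf('minimum bracketed in [%.4f, %.4f]\n', a, b);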
ii. Golden search
function [xmin,fmin] = goldensearch(f,a,b)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% usage: [xmin,fmin] = goldensearch(f,a,b)
% input:
% f = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% output:
% fmin = minimum value of f(x).
% xmin = value of x at the minimum point.
%eps = 1.0e-3;
N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
% Main loop
for i = 1:N-2
if f1 < f2
b = x2;
x2 = x1;
f2 = f1;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
else
a = x1;
x1 = x2;
f1 = f2;
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
end;
end
if f1 < f2
fmin = f1; xmin = x1;
else
fmin = f2; xmin = x2;
end
iii. Powell's method

function powell
clc
global x V
x = [0;0];
tol = 1.0e-4; h = 0.5;
if size(x,2) > 1; x = x'; end % x must be column vector
n = length(x); % Number of design variables
df = zeros(n,1); % Decreases of f stored here
u = eye(n); % Columns of u store search directions V
fprintf(' xmin fmin \n');
fprintf(' ------\n');
for j = 1:30 % Allow up to 30 cycles
xOld = x;
fOld = feval(@myfun2,xOld);
% First n line searches record the decrease of f
for i = 1:n
V = u(1:n,i);
[a,b] = goldBracket(@fLine,0.0,h);
[s,fmin] = goldensearch(@fLine,a,b);
df(i) = fOld - fmin;
fOld = fmin;
x = x + s*V;
end
% Last line search in the cycle
V = x - xOld;
[a,b] = goldBracket(@fLine,0.0,h);
[s,fmin] = goldensearch(@fLine,a,b);
x = x + s*V;
fprintf(' (%3.3f,%3.3f) %2.15f \n', x, fmin);
if sqrt(dot(x-xOld,x-xOld)/n) < tol
y = x; break
end
% Identify biggest decrease of f & update search
% directions
iMax = 1; dfMax = df(1);
for i = 2:n
if df(i) > dfMax
iMax = i; dfMax = df(i);
end
end
for i = iMax:n-1
u(1:n,i) = u(1:n,i+1);
end
u(1:n,n) = V;
end
fprintf('The minimum point (%2.3f,%2.3f) is reached at the %3dth cycle.\n', y, j)
function z = fLine(s)
% f in the search direction V
global x V
z = feval(@myfun2,x+s*V);
For the example on page 55 we have the following objective function.
function y = myfun2(x)
y = x(1) - x(2) + 2*(x(1)).^2 + 2*x(1)*x(2) + (x(2)).^2;
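For this objective the stationarity conditions 1 + 4x_1 + 2x_2 = 0 and -1 + 2x_1 + 2x_2 = 0 give x_1 = -1 and x_2 = 1.5, so running powell from the starting point x = [0; 0] should print cycles converging to approximately (-1.000, 1.500), where the minimum value is f = -1.25.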
References
[1] Basak Akteke, Derivative Free Optimization Methods: Application in Stirrer Configuration and Data Clustering, M.Sc. Thesis, Middle East Technical University, July 2005.
[2] Igor Griva, Stephen G. Nash and Ariela Sofer, Linear and Nonlinear Optimization, Second Edition, George Mason University, Fairfax, Virginia, 2009.
[3] Jaan Kiusalaas, Numerical Methods in Engineering with MATLAB, Second Edition, Cambridge University Press, 2010.
[4] Jorge J. Moré and Stefan M. Wild, Benchmarking Derivative-Free Optimization Algorithms, Preprint ANL/MCS-P1471-1207, December 2007.
[5] Jorge Nocedal and Stephen J. Wright, Numerical Optimization, 2nd Edition, Springer-Verlag New York, Inc., 1999.
[6] Katya Scheinberg, Derivative Free Optimization Method, course notes CS 4/6-TE3, SEW ENG 4/6-TE3 (Tamas Terlaky), IBM T. J. Watson Research Center.
[7] Melissa Weber Mendonça, Multilevel Optimization: Convergence Theory, Algorithms and Application to Derivative-Free Optimization, Ph.D. Thesis, Facultés Universitaires Notre-Dame de la Paix, Faculté des Sciences, Namur, Belgium, 2009.
[8] Mokhtar S. Bazaraa, Hanif D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd Edition.
[9] Singiresu S. Rao, Engineering Optimization: Theory and Practice, Fourth Edition, John Wiley & Sons, 2009.
[10] Wenyu Sun and Ya-Xiang Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer Science+Business Media, LLC, 2006.