Addis Ababa University

School of Graduate Studies, Department of Mathematics

A Graduate project report

On

Derivative Free Optimization

By: Teklebirhan Abraha

A project submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Mathematics

Advisor: Semu Mitku (Ph.D)

Addis Ababa, Ethiopia

January, 2011

Acknowledgment

With much pleasure, I take the opportunity to thank all the people who have helped me through the course of my graduate study.

First and foremost I would like to thank my advisor and instructor, Dr. Semu Mitku, for his invaluable guidance, advice, encouragement and for providing me plenty of materials related to my project work, without which this well organized (compiled) project would have been impossible. His constant confidence, persistent questioning and deep knowledge of the subject matter have been and will always be inspiring to me. Finally I would like to express my gratitude to all my family, friends, and those who supported me in any way to complete this project work.

T/Birhan Abraha

A.A.U

January, 2011

Abstract

We will present derivative free algorithms which optimize non-linear unconstrained optimization problems of the following kind:

$\min_{x \in \mathbb{R}^n} f(x), \quad \text{where } f : \mathbb{R}^n \to \mathbb{R}.$

The algorithms developed for this type of problems are categorized as one-dimensional search (golden section and Fibonacci) methods and multidimensional search methods (Powell's method and trust region). These algorithms will, hopefully, find the value of $x$ for which $f$ is the lowest. The dimension $n$ of the search space must be lower than some number (say 100). We do NOT have to know the derivatives of $f(x)$. We must only have a code which evaluates $f(x)$ for a given value of $x$. Each component of the vector $x$ must be a continuous real parameter of $f(x)$.

Table of contents

Introduction ...... 1
Chapter 1 Preliminary concepts ...... 4
1.1 Nonlinear optimization methods ...... 4
Need for derivative free optimization ...... 4
1.2 Overview of differentiable nonlinear unconstrained optimization methods ...... 5
1.2.1 The problem ...... 5
1.2.2 Solution concepts ...... 6
1.2.3 Basic concepts of the methods ...... 6
1.2.4 Necessary conditions for solutions of differentiable optimization problems ...... 8
1.3 Methods of solving nonlinear differentiable optimization problems ...... 9
1.3.1 Gradient method (first order derivative method) ...... 9
1.3.2 Newton's method (second order derivative method) ...... 12
1.3.3 Line search methods ...... 14
1.3.4 Trust region methods ...... 20
Chapter 2 Derivative Free Optimization Methods ...... 23
2.1 What is derivative free optimization ...... 23
2.2 Line search methods ...... 24
2.2.3 Interpolation Methods ...... 36
2.3 Multidimensional search ...... 42
2.3.1 Univariate Method ...... 42
2.3.2 Pattern Directions ...... 45
2.3.4 Powell's Method ...... 46
Chapter 3 Trust region methods ...... 55
3.1 Trust region framework ...... 55
3.2 Quadratic interpolation ...... 56
Appendix (MATLAB codes) ...... 62
References ...... 71

Declaration Letter

I, Teklebirhan Abraha, declare that this project has been composed by me and that no part of

the project has formed the basis for the award of any Degree, Diploma, Associateship,

Fellowship or any other similar title to me.

T/Birhan Abraha

______

Addis Ababa University

January, 2011

iii

Permission Letter

This is to certify that this project is compiled by Mr. Teklebirhan Abraha in the department of

Mathematics, College of Mathematics and Computational Sciences, Addis Ababa University, under my supervision.

Semu Mitku (Ph.D)

______

Addis Ababa University

January, 2011

Introduction

In the process of solving optimization problems, it is well known that useful information is contained in the derivatives of the objective function one wishes to optimize. After all, the standard mathematical characterization of a local minimum, given by the first order necessary conditions, requires continuous differentiability of the function and that the first order derivatives are zero. However, for a variety of reasons there have always been many instances where (at least some) derivatives are unavailable or unreliable. Nevertheless, under such circumstances, it may still be desirable to carry out optimization [1].

Consequently, classes of nonlinear optimization techniques called derivative-free optimization methods are needed. In fact we consider optimization techniques without derivatives as one of the most important and challenging areas in computational science and engineering. Derivative free optimization (DFO) is developed to date for solving small dimensional problems (less than 100 variables) in which the computation of an objective function is relatively expensive and the derivatives of the objective function are not available. Problems of this nature arise more and more in modern physical, chemical and econometric measurements and in engineering applications where computer simulation is employed for the evaluation of the objective function [6].

There are two important components of derivative free methods; sampling better points in the iteration procedure is the first of these components. The other one is searching appropriate subspaces where the chance of finding a minimum is relatively high. In order to be able to use the extensive convergence theory for derivative based methods, these derivative free methods need to satisfy some properties. For instance, to guarantee the convergence of a derivative free method, we need to ensure that the error in the gradient converges to zero when the trust region or line search steps are reduced. Hence a descent step will be found if the gradient of the true function is not zero at the current iterate.


The problem of minimizing a nonlinear function $f : \mathbb{R}^n \to \mathbb{R}$ of several variables when the derivatives of the function are not available is attempted to be solved by the derivative free methods (DFM). The formal statement for the above problem can be written as

$\min f(x)$

$(P)\quad \text{s.t. } a_i \le g_i(x) \le b_i \; (i = 1, 2, \ldots, m),$

$\qquad\; x \in X \subseteq \mathbb{R}^n,$

where $\nabla f(x)$ cannot be computed or just does not exist for every $x$. Here $X$ is an arbitrary subset of $\mathbb{R}^n$, and $x \in X$ is an easy constraint, while the functions $g_i(x) \; (i = 1, 2, \ldots, m)$ represent difficult constraints. By easy constraints we mean bound constraints on the variables, linear constraints or, more generally, nonlinear smooth constraints whose values and Jacobian matrix can be computed cheaply (easily).

Difficult constraints are nonlinear constraints whose values are expensive (difficult) to compute and whose derivatives are unavailable. As a small repetition and preparation, we include some basic concepts of differentiable optimization in this project.

If the problem function $f$ in $(P)$ above is differentiable and $x^0$ is a local minimizer of $(P)$, then $\nabla f(x^0) = 0$; conversely, if $\nabla f(x^0) = 0$ and $\nabla^2 f(x^0)$ is positive definite, then $x^0 \in X \subseteq \mathbb{R}^n$ is a local minimizer of $f$. But in most of the cases finding algebraically a point $x^0$ such that $\nabla f(x^0) = 0$ may be difficult, or computing $\nabla f(x)$ or $\nabla^2 f(x)$ may be expensive, as the function cannot (may not) be defined explicitly, or the function may not even be differentiable at all. In this case numerical methods with derivative free algorithms are required. Methods such as line search and trust region methods will be discussed in this project, because line search without derivatives and trust-region algorithms are used to solve optimization problems without derivatives.

The first chapter of this project deals with nonlinear optimization problems and the methods of solving these problems. The second and the third chapters are on derivative free optimization methods. Particularly, in the second chapter we include line search methods which help us to solve one dimensional minimization problems, such as the golden section method and the Fibonacci method, and the best known methods for minimization by multidimensional search, like Powell's method, are discussed as well. Finally, in the last chapter we include the trust region method, which is one of the methods for derivative free optimization.


Chapter 1 Preliminary concepts

1.1 Nonlinear optimization methods

Optimization methods can be classified as derivative based and derivative free methods depending on their use of derivatives (or absence of it) in the process of finding a solution.

Derivative based optimization methods are characterized by: explicit use of the derivatives of the objective function, the possibility of an analytical solution, and faster convergence. Derivative free optimization methods are characterized by: only objective function evaluations, no requirement of derivative information, and the ability to handle noisy functions (since the methods only rely on function comparisons).

Need for derivative free optimization

Some of the reasons to apply derivative free optimization (DFO) methods are: the growing sophistication of computer hardware and mathematical algorithms and software (which opens new possibilities for optimization); derivative evaluations that are costly and noisy (one cannot trust derivatives or approximate them by finite differences); binary codes (source codes not available or owned by a company), making automatic differentiation impossible to apply; legacy codes (written in the past and not maintained by the original author); and lack of sophistication of the user (the user needs improvement but wants to use something simple).

With the current state of the art, DFO methods can successfully address problems where:

- The evaluation of derivatives is expensive and/or computed with noise (and for which accurate finite difference derivative estimation is ruled out).
- The number of variables does not exceed, say, a hundred (in serial computation).
- The functions are not excessively nonsmooth.
- Rapid asymptotic convergence is not of primary importance.
- Only a few digits of accuracy are required.


It is hard to minimize nonconvex functions without derivatives; however, it is generally accepted that DFO methods have the ability to find "good" local optima.

1.2 Overview of differentiable nonlinear unconstrained optimization methods

1.2.1 The problem

Optimization problems can be divided into two large classes, namely constrained and unconstrained problems. The basic unconstrained optimization problem can be stated in its standard form as

$\text{minimize } f(x), \text{ subject to } x \in \mathbb{R}^n, \qquad (1.1)$

where $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function. On the other hand, constrained optimization problems can be written as

$\text{minimize } f(x) \qquad (1.2a)$

$\text{subject to } x \in \mathbb{R}^n, \qquad (1.2b)$

$g_i(x) \le 0, \; i \in \mathcal{I}, \qquad (1.2c)$

$g_i(x) = 0, \; i \in \mathcal{E}. \qquad (1.2d)$

Conditions (1.2b)-(1.2d) indicate the constraints. The disjoint index sets $\mathcal{I}$ and $\mathcal{E}$ correspond to the inequality and equality constraints, respectively, defined by the functions $g_i : \mathbb{R}^n \to \mathbb{R}$, $i \in \mathcal{I} \cup \mathcal{E}$. The set $X$ is contained in $\mathbb{R}^n$ and is also contained in the domain of $f$ and $g_i$, $i \in \mathcal{I} \cup \mathcal{E}$.

A point $x \in X$ is said to be feasible if it satisfies all the constraints, and the set of all feasible points is called the feasible set, denoted by $\mathcal{F}$.

The formulations (1.1) and (1.2) are called standard formulationsโ„ฑ due to the observation that

$\max f(x) = -\min(-f(x)).$

1.2.2 Solution concepts

The solution of an optimization problem can be characterized by certain properties. In a minimization problem, if we are looking for a point $x^*$ in the domain $\mathfrak{D}$ of $f$ such that

$f(x^*) \le f(x) \text{ for all } x \in \mathfrak{D},$

then $x^*$ is called a global minimizer, whereas $f(x^*)$ is the global minimum of $f$. Similarly, in a constrained problem, the solution must lie in the feasible set $\mathcal{F}$, and thus a global constrained minimizer $x^*$ satisfies

$f(x^*) \le f(x) \text{ for all } x \in \mathcal{F}.$

However, in both cases, finding a global minimizer of a function $f$ can prove to be very difficult in practice. It might be interesting, thus, to look for a solution in a neighborhood $\aleph$ of $x^*$, such that $\aleph \subseteq \mathfrak{D}$ and

$f(x^*) \le f(x), \text{ for all (feasible) } x \text{ in } \aleph. \qquad (1.3)$

The point $x^*$ is then called a local minimizer, whereas $f(x^*)$ is a local minimum of $f$ in $\aleph$. If $x^*$ is such that

$f(x^*) < f(x), \text{ for all (feasible) } x \in \aleph, \; x \ne x^*, \qquad (1.4)$

then $x^*$ is said to be a strict local minimizer, and $f(x^*)$ is a strict local minimum of $f$ in $\aleph$.

1.2.3 Basic concepts of the methods

Derivatives and Taylor's Theorem

All methods considered here are based on the fact that the function to be minimized has derivatives. If $f$ is differentiable and all derivatives of $f$ are continuous with respect to $x$, then we say that $f$ is continuously differentiable, and this is denoted by $f \in C^1$.

If all the second partial derivatives of $f$ exist, then $f$ is said to be twice differentiable. If furthermore all second partial derivatives of $f$ are continuous, we say that $f$ is twice continuously differentiable, and this is denoted by $f \in C^2$.

The gradient of $f$ is a vector that groups all its partial derivatives and is denoted by

$\nabla f(x) = \left( \dfrac{\partial f(x)}{\partial x_1}, \ldots, \dfrac{\partial f(x)}{\partial x_n} \right)^T.$

The $n \times n$ matrix defined as

$\nabla^2 f(x) = \left[ \dfrac{\partial^2 f(x)}{\partial x_i \, \partial x_j} \right]_{i,j = 1, \ldots, n}$

is called the Hessian matrix of $f$.

The curvature of $f$ at $x \in \mathbb{R}^n$ along a direction $d \in \mathbb{R}^n$ is given by

$\dfrac{\langle d, \nabla^2 f(x)\, d \rangle}{\|d\|^2}.$

If $\nabla^2 f(x)$ is positive semidefinite for every $x$ in the domain of $f$, we say that $f$ is a convex function. If $\nabla^2 f(x)$ is positive definite in its domain, we say that $f$ is strictly convex.

Theorem 1.1 (Taylor's Theorem)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and $s \in \mathbb{R}^n$. Then there exists some $t \in [0,1]$ such that

$f(x+s) = f(x) + \langle \nabla f(x + ts), s \rangle.$

Moreover, if $f$ is twice continuously differentiable, then

$\nabla f(x+s) = \nabla f(x) + \int_0^1 \nabla^2 f(x + ts)\, s \, dt.$

Furthermore, we have that, for some $t \in (0,1)$,

$f(x+s) = f(x) + \langle \nabla f(x), s \rangle + \tfrac{1}{2} \langle s, \nabla^2 f(x + ts)\, s \rangle. \qquad (1.5)$

A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable if all its partial derivatives $\partial F_j(x) / \partial x_i$ exist.

1.2.4 Necessary conditions for solutions of differentiable nonlinear optimization problems (the case of unconstrained problems)

All the unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. The iterative process is given by

$x_{i+1} = x_i + \alpha_i s_i, \qquad (1.6)$

where $x_i$ is the starting point, $s_i$ is the search direction, $\alpha_i$ is the optimal step length, and $x_{i+1}$ is the final point in iteration $i$. It is important to note that all the unconstrained minimization methods (1) require an initial point $x_1$ to start the iterative procedure, and (2) differ from one another only in the method of generating the new point $x_{i+1}$ (from $x_i$) and in testing the point $x_{i+1}$ for optimality.

Rate of Convergence

Different iterative optimization methods have different rates of convergence. In general, an optimization method is said to have convergence of order $p$ if

$\dfrac{\|x_{i+1} - x^*\|}{\|x_i - x^*\|^p} \le k, \quad k \ge 0, \; p \ge 1, \qquad (1.7)$

where $x_i$ and $x_{i+1}$ denote the points obtained at the end of iterations $i$ and $i+1$, respectively, $x^*$ represents the optimum point, and $\|x\|$ denotes the length or norm of the vector $x$:

$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$

If $p = 1$ and $0 \le k \le 1$, the method is said to be linearly convergent (corresponding to slow convergence). If $p = 2$, the method is said to be quadratically convergent (corresponding to fast convergence). An optimization method is said to have superlinear convergence (corresponding to fast convergence) if

$\lim_{i \to \infty} \dfrac{\|x_{i+1} - x^*\|}{\|x_i - x^*\|} \to 0. \qquad (1.8)$

The definitions of rates of convergence given in Eqs. (1.7) and (1.8) are applicable to single variable as well as multivariable optimization problems. In the case of single-variable problems, the vector $x_i$, for example, degenerates to a scalar $x_i$.

Theorem 1.2 (First-Order Necessary Conditions). If $x^*$ is a local minimizer of $f : \mathbb{R}^n \to \mathbb{R}$, where $f$ is continuously differentiable in an open neighborhood $\aleph$ of $x^*$, then

$\nabla f(x^*) = 0. \qquad (1.9)$

If $\nabla^2 f(x)$ exists and is continuous in a neighborhood of $x^*$, we can state another necessary condition satisfied by a local minimizer.

Theorem 1.3 (Second-Order Necessary Conditions). If $x^*$ is a local minimizer of $f$, and $f$ is twice continuously differentiable in an open neighborhood $\aleph$ of $x^*$, then

$\nabla f(x^*) = 0 \text{ and } \nabla^2 f(x^*) \text{ is positive semidefinite.} \qquad (1.10)$

Any point $x^*$ that satisfies (1.9) is called a stationary point of $f$. Thus Theorem 1.2 states that any local minimizer must be a stationary point; it is important to note, however, that the converse is not necessarily true. Fortunately, if the next conditions, called sufficient conditions, are satisfied by a stationary point $x^*$, they guarantee that it is a local minimizer.

Theorem 1.4 (Second-Order Sufficient Conditions). Let $f$ be twice continuously differentiable on an open neighborhood $\aleph$ of $x^*$. If $x^*$ satisfies

$\nabla f(x^*) = 0 \text{ and } \nabla^2 f(x^*) \text{ is positive definite,}$

then $x^*$ is a strict local minimizer of $f$.

Note that the second order sufficient conditions are not necessary: a point $x^*$ can be a strict local minimizer and fail to satisfy the sufficient conditions.

1.3 Methods of solving nonlinear differentiable optimization problems

1.3.1 Gradient method (first order derivative method)

Mathematical programming is concerned with methods that can be used to solve optimization problems. In practice we will be concerned with algorithms, defined so that their computational implementation finds either an approximate or an exact solution to the original mathematical programming problem [7].


These algorithms are mostly iterative methods which, from a starting point $x_0$, use some rule to compute a sequence of points $x_1, x_2, \ldots, x_k, \ldots$ such that $\lim_{k \to \infty} x_k = x^*$, where $x^*$ is the solution to the problem.

Here we are mostly interested in unconstrained problems. The simplest method developed for the solution of a minimization problem is the steepest descent method. This method is based on the fact that, from any starting point $x$, the direction in which any function $f$ decreases most rapidly is the direction $-\nabla f(x)$.

Definition๐‘“๐‘“1.1(descent direction) let : beโˆ’โˆ‡๐‘“๐‘“ differentiable๐‘ฅ๐‘ฅ at 0 ๐‘›๐‘› ( ๐‘›๐‘› ) ( ) is a descent direction of at if๐‘“๐‘“ thereโ„ โ†’ exists โ„ > 0 such that ๐‘ฅ๐‘ฅฬ…+ โˆˆโ„ ๐‘ก๐‘กโ„Ž๐‘’๐‘’๐‘’๐‘’< โ‰  ๐‘‘๐‘‘ for โˆˆ ๐‘›๐‘› allโ„ (0, ]. ๐‘“๐‘“ ๐‘ฅ๐‘ฅฬ… ๐›ฟ๐›ฟ ๐‘“๐‘“ ๐‘ฅ๐‘ฅฬ… ๐œ†๐œ†๐œ†๐œ† ๐‘“๐‘“ ๐‘ฅ๐‘ฅฬ…

Consider๐œ†๐œ† โˆˆ ๐›ฟ๐›ฟ

Consider

$\min_{x \in \mathbb{R}^n} f(x), \quad \text{where } f : \mathbb{R}^n \to \mathbb{R}.$

If $f$ is continuously differentiable at $x_k \in \mathbb{R}^n$ and $\nabla f(x_k) \ne 0$, then there are infinitely many descent directions of $f$ at $x_k$; that is, there exists $d_k \in \mathbb{R}^n \setminus \{0\}$ such that $\nabla f(x_k)^T d_k < 0$.

To find the direction in which $f$ decreases most rapidly we solve the problem

$\min_{d_k \in \mathbb{R}^n} \nabla f(x_k)^T d_k \quad \text{with } \|d_k\| = 1$

to get

$d_k = -\dfrac{\nabla f(x_k)}{\|\nabla f(x_k)\|}.$

Now $d_k = -\nabla f(x_k)$, or $d_k = -\nabla f(x_k)/\|\nabla f(x_k)\|$, will be used as the search direction. This method is called the steepest descent method. The following algorithm was given by Cauchy [8].

Algorithm (Method of steepest descent)


Given $f : \mathbb{R}^n \to \mathbb{R}$ continuously differentiable and $x_0 \in \mathbb{R}^n$, at each iteration $k$ find the lowest point of $f$ in the direction $-\nabla f(x_k)$ from $x_k$; that is, find $\alpha_k$ that solves

$\min_{\alpha_k > 0} f(x_k - \alpha_k \nabla f(x_k)), \quad \text{then set } x_{k+1} = x_k - \alpha_k \nabla f(x_k),$

and we continue until $\nabla f(x_k) = 0$.

If $\nabla f(x_k) = 0$, then $x_{k+1} = x_k$ and the algorithm stops. However, $x_k$ may be either a local minimizer or a saddle point, i.e., $\nabla^2 f(x_k)$ is either positive semidefinite or indefinite. We know from the second order Taylor expansion that, for sufficiently small $\alpha$ at $x_k$,

$f(x_k + \alpha d_k) \approx f(x_k) + \alpha \nabla f(x_k)^T d_k + \tfrac{1}{2}\alpha^2 d_k^T \nabla^2 f(x_k)\, d_k.$

If the search direction $x$ is the eigenvector corresponding to the largest negative eigenvalue $\lambda$ of $\nabla^2 f(x_k)$ at the point $x_k$ where $\nabla f(x_k) = 0$, we have

$f(x_k + \alpha x) = f(x_k) + \alpha \nabla f(x_k)^T x + \tfrac{1}{2}\alpha^2 x^T \nabla^2 f(x_k)\, x \le f(x_k).$

This implies that $f(x_k + \alpha x) < f(x_k)$.

The steepest descent method is a line search method that moves along $d_k = -\nabla f(x_k)$; at each step it can choose the step length $\alpha_k$ in various ways. One advantage of this method is that it requires only the calculation of $\nabla f(x_k)$; it does not require second derivatives. However, it can be slow on difficult problems; that is, this method usually works quite well during the early steps of the optimization process, depending on the point of initialization. The method of steepest descent is the simplest of the gradient methods. The chosen direction is the one along which $f$ decreases most quickly, which is the direction opposite to $\nabla f(x_k)$. The search starts at an arbitrary point $x_0$ and then slides down the gradient until we are close enough to the solution. The iterative procedure is

$x_{k+1} = x_k - \alpha_k \nabla f(x_k) = x_k - \alpha_k g(x_k),$

where $g(x_k)$ is the gradient at the given point.

๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ Now๐‘ฅ๐‘ฅ the ๐‘ฅ๐‘ฅquestionโˆ’ ๐›ผ๐›ผ โˆ‡ is ๐‘ฅ๐‘ฅhow big๐‘ฅ๐‘ฅ shouldโˆ’ ๐›ผ๐›ผ ๐‘”๐‘” the๐‘ฅ๐‘ฅ step taken๐‘”๐‘” in๐‘ฅ๐‘ฅ that direction be, that is what is the value of ? obviously w e w ant t o m ove t he poi nt w here t he f unction f t akes on a 11 ๐›ผ๐›ผ๐‘˜๐‘˜

minimum v alue, w hich i s w here t he d irectional d erivative i s zer o. The d irectional derivative is given by

( +1) = ( +1) +1 = ( +1) ( ) Setting t his e xpression t o ๐‘‘๐‘‘ ๐‘‡๐‘‡ ๐‘‘๐‘‘ ๐‘‡๐‘‡ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘‘๐‘‘๐›ผ๐›ผzero,๐‘“๐‘“ we๐‘ฅ๐‘ฅ see thatโˆ‡๐‘“๐‘“ ๐‘ฅ๐‘ฅ shouldโˆ™ ๐‘‘๐‘‘๐›ผ๐›ผ be ๐‘ฅ๐‘ฅchoosenโˆ’โˆ‡๐‘“๐‘“ so that๐‘ฅ๐‘ฅ f( โˆ™+1 ๐‘”๐‘” )๐‘ฅ๐‘ฅ ( ) are orthogonal. The

next step is taken ๐›ผ๐›ผin๐‘˜๐‘˜ the direction of the negativeโˆ‡ gradient๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ๐‘Ž๐‘Ž ๐‘Ž๐‘Ž๐‘Ž๐‘Žat this๐‘”๐‘” ๐‘ฅ๐‘ฅ new๐‘˜๐‘˜ point and we may get a zigzag pattern.

However, the steepest descent method can be extremely slow for some problems. There are, fortunately, several other methods that work very well in practice, and here we present some of them.

1.3.2 Newton's method (second order derivative method)

Consider any nonlinear system of equations of the form

$F(x) = 0, \qquad (1.14)$

where $F : \mathbb{R}^n \to \mathbb{R}^n$. If the Jacobian of $F$ exists, then we can write the Taylor first-order approximation to this function as

$F(x + s) \approx F(x) + J(x)\, s, \qquad (1.15)$

where $J(x)$ denotes the Jacobian of $F$ evaluated at $x$. From these equations, we can derive an iterative method. Given an initial point $x_0$, at each iteration $k$ we will compute a new iterate $x_{k+1} = x_k + s_k$ such that $F(x_k + s_k) = 0$ (approximately), which means that $s_k$ must satisfy the linear system

$J(x_k)\, s_k = -F(x_k).$

This is called Newton's method for solving nonlinear systems of equations.
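A minimal MATLAB sketch of this iteration, under the assumption that the user supplies $F$ and its Jacobian $J$; the two-equation test system below is an illustrative choice.

% Newton's method for a nonlinear system F(x) = 0 (illustrative test system)
F = @(x) [x(1)^2 + x(2)^2 - 4;      % circle of radius 2
          x(1) - x(2)];             % line x1 = x2
J = @(x) [2*x(1), 2*x(2);
          1,      -1];
x = [1; 0.5];                        % initial point
for k = 1:20
    s = -J(x) \ F(x);                % solve J(x_k) s_k = -F(x_k)
    x = x + s;
    if norm(F(x)) < 1e-10, break; end
end
disp(x')                             % approaches (sqrt(2), sqrt(2))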

Now, returning to our optimization problem, we note that when the first and second derivatives of $f$ are available, we can use Newton's method to solve the (possibly nonlinear) system of equations defined by

$\nabla f(x) = 0. \qquad (1.16)$

We know from Theorem 1.2 above that any minimizer of $f$ must satisfy this condition. This is the basis for Newton's method for optimization problems, and, with some variations, it is the basis for many other methods in unconstrained optimization. More formally, if we want to apply this method to equation (1.16), we also know that the second order Taylor approximation to $f$ at $x_k$ is

$f(x_k + s_k) \approx f(x_k) + \langle \nabla f(x_k), s_k \rangle + \tfrac{1}{2} \langle s_k, \nabla^2 f(x_k)\, s_k \rangle. \qquad (1.17)$

In order to find a minimum of this function, we try to find a solution to $\nabla f(x_k + s_k) = 0$,

which is equivalent to

$\nabla f(x_k) + \nabla^2 f(x_k)\, s_k = 0.$

Thus we have that $s_k$ must satisfy the so-called Newton equations

$\nabla^2 f(x_k)\, s_k = -\nabla f(x_k). \qquad (1.18)$

If $\nabla^2 f(x_k)$ is positive definite (PD), then we can find its inverse, and the solution to (1.18) is

$s_k = -\left[ \nabla^2 f(x_k) \right]^{-1} \nabla f(x_k). \qquad (1.19)$

This direction $s_k$ is called the Newton direction.

If $\nabla^2 f(x_k)$ is not positive definite, the Newton direction is not suitable as a search direction. One strategy to overcome this problem is to modify $\nabla^2 f(x_k)$, during or before the solution of $\nabla^2 f(x_k)\, p_k = -\nabla f(x_k)$, so that it becomes positive definite.
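One common way to realize this strategy, sketched below in MATLAB, is to add a multiple of the identity to the Hessian until a Cholesky factorization succeeds; this particular modification is a standard choice and only an assumption about what "modification" means here.

% Modified Newton direction: shift the Hessian until it is positive definite
function p = modified_newton_direction(H, g)
% H : Hessian at the current iterate, g : gradient at the current iterate
tau = 0;
while true
    [R, flag] = chol(H + tau*eye(size(H,1)));  % flag == 0 iff the matrix is PD
    if flag == 0, break; end
    tau = max(2*tau, 1e-3);                    % increase the shift and retry
end
p = -(R \ (R' \ g));                           % solve (H + tau*I) p = -g
end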

One of the problems in Newton's method is that the method is based on a necessary first order optimality condition (namely that the gradient of the objective function be equal to zero). In order to guarantee that we have found a minimizer of $f$, it is also necessary to guarantee that the Hessian $\nabla^2 f(x^*)$ be positive definite. Moreover, the approximations (1.15) and (1.17) are only valid in a neighborhood of the solution of (1.14) and (1.16), respectively.

Thus Newton's method is only appropriate when the starting point $x_0$ is sufficiently close to the solution $x^*$. However, when it works, it is very fast, and most optimization methods try to mimic its behavior around the solution.

There are so called globalization techniques that can be used to guarantee the convergence of Newton's method from any starting point. These techniques give rise to different methods which can be divided into two classes: line search methods and trust region methods. The main difference between these two classes is that in line search methods the direction in which we choose to take our next iteration is selected first, while the size of the step to be taken in this direction is computed with the direction fixed. On the other hand, in trust region methods, the step size and the direction are more or less chosen simultaneously [7]. The two strategies are discussed below in more detail.

1.3.3 Line search methods

Although Newton's direction is effective as a search direction, the method may not converge to the solution. The local model may not be a good representative of the given (objective) function. Hence we have to backtrack. The strategy we consider for proceeding from a solution estimate outside the convergence region of Newton's method is the method of line searches.

As the name suggests, the idea behind line search methods is to find a step size along a certain line which gives us a good reduction in the function value, while being reasonably inexpensive to compute. More formally, they are iterative methods that, at every step, choose a certain descent direction and move along this direction. Each iteration of a line search method computes a search direction $p_k$ and then decides how far to move along that direction. The iteration is given by

$x_{k+1} = x_k + \alpha_k p_k,$

where the positive scalar $\alpha_k$ is called a step length, and $p_k$ can be chosen as the Newton direction $s_k$ given by (1.19).

Moreover, the search direction often has the form

$p_k = -\beta_k^{-1} \nabla f(x_k), \qquad (1.20)$

where $\beta_k$ is a symmetric and nonsingular matrix. In the steepest descent method $\beta_k$ is simply the identity matrix $I$, while in Newton's method $\beta_k$ is the exact Hessian $\nabla^2 f(x_k)$. In quasi-Newton methods, $\beta_k$ is an approximation to the Hessian that is updated at every iteration by means of a low rank formula. When $p_k$ is defined by (1.20) and $\beta_k$ is positive definite, we have

$\nabla f(x_k)^T p_k = -\nabla f(x_k)^T \beta_k^{-1} \nabla f(x_k) < 0,$

and therefore $p_k$ is a descent direction.

In the following we study how to choose the matrix $\beta_k$, or more generally how to compute the search direction, and give careful consideration to the choice of the step length parameter $\alpha_k$.

Step length: the ideal choice of the step length $\alpha_k$ would be the global minimizer of the univariate function $\phi(\cdot)$ defined by

$\phi(\alpha) = f(x_k + \alpha p_k), \quad \alpha > 0, \qquad (1.21)$

but, in general, it is too expensive to identify this value. To find even a local minimizer of $\phi$ to moderate precision generally requires too many evaluations of the objective function $f$ and possibly the gradient $\nabla f$.

In or der ๐‘“๐‘“t o e nsure t hat e ven a n aโˆ‡๐‘“๐‘“ pproximate s olution i s e nough t o gua rantee t he convergence of t he l ine s earch m ethod t o a minimizer of t he obj ective f unction, s ome conditions are imposed on the step length at each iteration. One very important condition that must be satisfied is that the decrease in the objective function is not too small. One way of measuring this is by using the following inequality.

$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \langle \nabla f(x_k), p_k \rangle, \quad \text{for some } c_1 \in (0,1). \qquad (1.22)$

This condition is sometimes called the Armijo condition and it states that the reduction in $f$ should be proportional to the step length $\alpha_k$ and the directional derivative of $f$. On the other hand,

we must also guarantee that the step is not too short. Indeed, condition (1.22) is satisfied for all sufficiently small values of $\alpha$. One way of enforcing this is by imposing a curvature condition, which requires that $\alpha_k$ satisfy

$\langle \nabla f(x_k + \alpha_k p_k), p_k \rangle \ge c_2 \langle \nabla f(x_k), p_k \rangle, \quad \text{for some } c_2 \in (c_1, 1). \qquad (1.23)$

Conditions (1.22) and (1.23) are known as the Wolfe conditions.

Remark: for every function $f$ that is smooth and bounded below, there exist step lengths that satisfy the Wolfe conditions.

These conditions on the step length are very important in practice and are widely used in the line search methods.
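The following MATLAB sketch checks whether a trial step length satisfies the two Wolfe conditions (1.22) and (1.23); the objective, the point and the constants $c_1$, $c_2$ are illustrative assumptions.

% Check the Wolfe conditions (1.22)-(1.23) for a trial step length alpha
f     = @(x) x(1)^2 + 10*x(2)^2;                 % illustrative objective
gradf = @(x) [2*x(1); 20*x(2)];
xk = [1; 1];
pk = -gradf(xk);                                  % steepest descent direction
c1 = 1e-4;  c2 = 0.9;  alpha = 0.05;
armijo    = f(xk + alpha*pk) <= f(xk) + c1*alpha*(gradf(xk)'*pk);        % (1.22)
curvature = gradf(xk + alpha*pk)'*pk >= c2*(gradf(xk)'*pk);              % (1.23)
fprintf('Armijo satisfied: %d, curvature satisfied: %d\n', armijo, curvature);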

Algorithm (line search)

Given a descent direction $p_k$, set $x_{k+1} = x_k + \alpha_k p_k$ for some $\alpha_k > 0$ that makes $x_{k+1}$ an acceptable iterate. However, rather than setting $p_k = -\nabla f(x_k)$, we will use Newton's direction or a modification of it, for instance $p_k = -H_k^{-1} \nabla f(x_k)$, where $H_k = \nabla^2 f(x_k) + \mu_k I$ is positive definite and $\mu_k = 0$ if $\nabla^2 f(x_k)$ is itself safely positive definite, in order to retain the fast local convergence. The term line search refers to the choice of $\alpha_k$ in this algorithm. The common procedure is to use $\alpha_k = 1$ (the full quasi-Newton step); if it fails, we backtrack in a systematic way along the same direction. The common sense requirement is that $f(x_{k+1}) < f(x_k)$, but this simple condition does not guarantee that $\{x_k\}$ will converge to the minimizer of $f$. If very small reductions in $f$ are taken relative to the length of the steps, the limiting value of $f(x_k)$ may not be a local minimum [8].

For example, let $f(x) = x^2$ and $x_0 = 2$. If we choose $\{p_k\} = \{(-1)^{k+1}\}$ and $\{\alpha_k\} = \{2 + 3(2^{-(k+1)})\}$, then $\{x_k\} = \{2, -\tfrac{3}{2}, \tfrac{5}{4}, -\tfrac{9}{8}, \ldots\} = \{(-1)^k (1 + 2^{-k})\}$, and each $p_k$ is a descent direction at $x_k$.

๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ๐‘Ž๐‘Ž๐‘Ž๐‘Ž๐‘Ž๐‘Ž ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ 16

lim ( ) = 1 โˆž ๐‘˜๐‘˜ ๐‘˜๐‘˜โ†’ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ Which is not the minimizer of and { } has limit ยฑ1 so it does not converge.We fix ( ) ( ) this by requiring that the average๐‘“๐‘“ rate of๐‘ฅ๐‘ฅ decrease๐‘˜๐‘˜ from to +1 be at least some prescribed fraction of the initial rate, that is we pick ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜(0,1)๐‘“๐‘“ and๐‘ฅ๐‘ฅ๐‘˜๐‘˜ choose among > 0 that sat ๐œ†๐œ† โˆˆ ๐›ผ๐›ผ๐‘˜๐‘˜ isfies ( ) ( ) ๐›ผ๐›ผ + (1.24) ๐‘‡๐‘‡ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ It๐‘“๐‘“ is๐‘ฅ๐‘ฅ also๐›ผ๐›ผ๐‘๐‘ possibleโ‰ค ๐œ†๐œ†๐œ†๐œ† thatโˆ‡๐‘“๐‘“ ๐‘ฅ๐‘ฅif steps๐‘๐‘ are too small relative to the initial rate of decrease of ,

then any limiting value { } may not be local minimizer of . For example in the above๐‘“๐‘“ { } example taking the same๐‘ฅ๐‘ฅ function๐‘˜๐‘˜ with the same initial estimate๐‘“๐‘“ and let us take pk = 3 5 9 { 1}, { } = 2 (k+1) , then the sequence {x } = 2, , , ,โ€ฆ, = 1 + 2 k . each k k 2 4 8 โˆ’ โˆ’ โˆ’is a descentฮฑ ๏ฟฝ direction๏ฟฝ at , ( ) is monotonically๏ฟฝ decreasing ๏ฟฝwith๏ฟฝ ๏ฟฝ

๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘˜๐‘˜ ๐‘๐‘ ๐‘ฅ๐‘ฅ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ lim = 1 โˆž ๐‘˜๐‘˜ ๐‘˜๐‘˜โ†’ ๐‘ฅ๐‘ฅ Which is n ot th e m inimizer o f . I n t he a bove e xample, we s ee t hat monotonically decreasing sequence o f i terates t๐‘“๐‘“ hat d oes n ot co nverge t o t he m inimizer, t o en sure sufficiently large steps we require that the rate of decrease of in the direction of at

+1 be larger than some prescribed fraction of the rate of decrease๐‘“๐‘“ of in the direction๐‘๐‘ ๐‘ฅ๐‘ฅof๐‘˜๐‘˜ at ๐‘“๐‘“ ๐‘๐‘ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ( +1 ) ( +1 ) for some ( , 1) (1.25) ๐‘‡๐‘‡ ๐‘‡๐‘‡ ๐‘˜๐‘˜ ๐‘˜๐‘˜ โˆ‡๐‘“๐‘“But ๐‘ฅ๐‘ฅthis condition๐‘๐‘ is โ‰ฅ ๐›ฝ๐›ฝโˆ‡๐‘“๐‘“not๐‘ฅ๐‘ฅ necessarily๐‘๐‘ required,๐›ฝ๐›ฝ โˆˆ because๐œ†๐œ† the use of the backtracking strategy avoids excessively small steps.

Step selection by backtracking

Here the strategy is to start with $\alpha_k = 1$, and then, if $x_{k+1} = x_k + \alpha_k p_k$ is not acceptable, to backtrack (reduce $\alpha_k$) until an acceptable $x_{k+1} = x_k + \alpha_k p_k$ is found.


Example 1.1: consider the problem

$\text{minimize } f(x_1, x_2) = 5 x_1^2 + 7 x_2^2 - 3 x_1 x_2.$

Let $x_k = (2, 3)^T$ and $p_k = (-5, -7)^T$, so that $f(x_k) = 65$. If $\alpha_k = 1$, then $f(x_k + \alpha_k p_k) = 121 > f(x_k)$, so this is not an acceptable step length. If $\alpha_k = \tfrac{1}{2}$, then $f(x_k + \alpha_k p_k) = f(-\tfrac{1}{2}, -\tfrac{1}{2}) = \tfrac{9}{4}$, and so this step length produces a decrease in the function value, as desired. The framework of this algorithm is given below.

Algorithm (backtracking line search framework)

Given $x_0$, $\lambda \in (0, \tfrac{1}{2})$, $0 < l < u < 1$, set $\alpha_k = 1$;

while $f(x_k + \alpha_k p_k) > f(x_k) + \lambda \alpha_k \nabla f(x_k)^T p_k$, set $\alpha_k = \rho \alpha_k$ for some $\rho \in [l, u]$, where $\rho$ is chosen at each iteration by the line search;

then set $x_{k+1} = x_k + \alpha_k p_k$. In practice $\lambda$ is set to be very small, so that hardly more than a simple decrease in the function value is required [8].
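A minimal MATLAB sketch of this backtracking framework, assuming a fixed reduction factor $\rho = 1/2$ and $\lambda = 10^{-4}$ (both illustrative choices within the bounds above):

% Backtracking line search enforcing the sufficient decrease condition (1.24)
function alpha = backtracking(f, gk, xk, pk)
% f : objective handle, gk : gradient at xk, pk : descent direction
lambda = 1e-4;  rho = 0.5;  alpha = 1;           % start from the full step
while f(xk + alpha*pk) > f(xk) + lambda*alpha*(gk'*pk)
    alpha = rho*alpha;                           % shrink the step and try again
    if alpha < 1e-12, break; end                 % safeguard against an endless loop
end
end

A call then looks like alpha = backtracking(f, gradf(xk), xk, pk).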

Strategy for choosing $\rho$ (the next $\alpha_k$)

Define $\theta(\alpha) = f(x_k + \alpha p_k)$, which is the one dimensional restriction of $f$ to the line through $x_k$ in the direction $p_k$. If we need to backtrack, we will use our most current information about $f$ to model $\theta$, and then take the value of $\alpha$ that minimizes the model as the next value of $\alpha_k$ in the above algorithm.

Initially we have

$\theta(0) = f(x_k) \text{ and } \dot{\theta}(0) = \nabla f(x_k)^T p_k, \qquad (1.26)$

$\theta(1) = f(x_k + p_k). \qquad (1.27)$

๐œƒ๐œƒ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ๐‘๐‘๐‘˜๐‘˜ So if ( + ) does not satisfy (1.24), that is, (1) > (0) + (0), then we model

( ) by๐‘“๐‘“ one๐‘ฅ๐‘ฅ๐‘˜๐‘˜ dimensional๐‘๐‘๐‘˜๐‘˜ quadratic model satisfying๐œƒ๐œƒ (1.26)๐œƒ๐œƒ and (1.๐œ†๐œ†๐œƒ๐œƒ27ฬ‡), that is ๐œƒ๐œƒ ๐›ผ๐›ผ ( ) = (1) (0) (0) 2 + (0) + (0) . A nd c alculate t he poi nt, =

(0) ๐‘š๐‘š๏ฟฝ ๐›ผ๐›ผ ๏ฟฝ ๐œƒ๐œƒ โˆ’> ๐œƒ๐œƒ 0 forโˆ’ which ๐œƒ๐œƒฬ‡ ๏ฟฝ๐›ผ๐›ผ( ) =๐œƒ๐œƒฬ‡ 0. ๐›ผ๐›ผ ๐œƒ๐œƒฬ‡ ๐›ผ๐›ผ๏ฟฝ 2[ (1) (0) (0)] ๐œƒ๐œƒฬ‡ ๐œƒ๐œƒ โˆ’๐œƒ๐œƒ โˆ’๐œƒ๐œƒฬ‡ ๐‘š๐‘š๏ฟฝฬ‡ ๐›ผ๐›ผ 18

Now $\widehat{m}''(\widehat{\alpha}) > 0$, since $\theta(1) > \theta(0) + \lambda \dot{\theta}(0) > \theta(0) + \dot{\theta}(0)$. Thus $\widehat{\alpha}$ minimizes the model function. Furthermore, $\widehat{\alpha} > 0$ because $\dot{\theta}(0) < 0$; therefore we take $\widehat{\alpha}$ as our new value of $\alpha_k$ in our backtracking algorithm. We note that, since $\theta(1) > \theta(0) + \lambda \dot{\theta}(0)$, we have

$\widehat{\alpha} < \dfrac{1}{2(1 - \lambda)}.$

In fact, if $\theta(1) \ge \theta(0)$, then $\widehat{\alpha} \le \tfrac{1}{2}$, and this gives the upper bound $u = \tfrac{1}{2}$ on the first value of $\rho$ in the algorithm. On the other hand, if $\theta(1)$ is much larger than $\theta(0)$, $\widehat{\alpha}$ can be very small, but we impose the lower bound $l = \tfrac{1}{10}$ in the algorithm; that is, at the first backtrack, if $\widehat{\alpha} \le 0.1$ then we take $\alpha_k = \tfrac{1}{10}$. We use this quadratic model only for the first backtrack in an iteration; if $\theta(\alpha_k) = f(x_k + \alpha_k p_k)$ still does not satisfy (1.24), then we need to backtrack again.
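A short MATLAB sketch of this first, quadratic-model backtrack, including the safeguards $l = 1/10$ and $u = 1/2$ discussed above:

% First backtrack using the one-dimensional quadratic model of theta
function alpha_new = quad_backtrack(theta0, dtheta0, theta1)
% theta0 = theta(0), dtheta0 = theta'(0) < 0, theta1 = theta(1)
alpha_hat = -dtheta0 / (2*(theta1 - theta0 - dtheta0));  % minimizer of the model
alpha_new = min(max(alpha_hat, 0.1), 0.5);               % clamp to [1/10, 1/2]
end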

Although we used a quadratic model for the first backtrack, we now have four pieces of information about $\theta$, namely $\theta(0)$, $\dot{\theta}(0)$, and the last two computed values of $\theta(\alpha)$. So at this and any subsequent backtrack during the current iteration, we use a cubic model of $\theta$; the calculation of $\widehat{\alpha}$ proceeds as follows.

Let $\alpha_p$ and $\alpha_{2p}$ be the last two previous values of $\alpha_k$. Then the cubic model that satisfies $\theta(0)$, $\dot{\theta}(0)$, $\theta(\alpha_p)$, $\theta(\alpha_{2p})$ is

$\widetilde{M}(\alpha) = a \alpha^3 + b \alpha^2 + \dot{\theta}(0)\, \alpha + \theta(0),$

where

$\begin{pmatrix} a \\ b \end{pmatrix} = \dfrac{1}{\alpha_p - \alpha_{2p}} \begin{pmatrix} \dfrac{1}{\alpha_p^2} & -\dfrac{1}{\alpha_{2p}^2} \\[2mm] -\dfrac{\alpha_{2p}}{\alpha_p^2} & \dfrac{\alpha_p}{\alpha_{2p}^2} \end{pmatrix} \begin{pmatrix} \theta(\alpha_p) - \theta(0) - \dot{\theta}(0)\,\alpha_p \\[1mm] \theta(\alpha_{2p}) - \theta(0) - \dot{\theta}(0)\,\alpha_{2p} \end{pmatrix}.$

Its local minimizing point is

$\widehat{\alpha} = \dfrac{-b + \sqrt{b^2 - 3 a \dot{\theta}(0)}}{3a}.$

It can be shown that $\widehat{\alpha}$ is always real if $\lambda < \tfrac{1}{4}$. Finally, we use the upper bound $\widehat{\alpha} \le \tfrac{1}{2}\alpha_p$ and the lower bound $\widehat{\alpha} \ge \tfrac{1}{10}\alpha_p$; this


means that if $\widehat{\alpha} > \tfrac{1}{2}\alpha_p$, we set the new $\alpha_k = \tfrac{1}{2}\alpha_p$, and if $\widehat{\alpha} < \tfrac{1}{10}\alpha_p$, then we use the new $\alpha_k = \tfrac{1}{10}\alpha_p$.

1.3.4 Trust region methods

The concept of the trust region first appears in the papers of Levenberg (1944) and Marquardt (1963) for solving nonlinear least squares problems. Trust region methods are iterative numerical procedures, like the line search methods, in which an approximation of the objective function $f(x)$ by a model $m_k(p)$ is computed in a neighborhood of the current iterate $x_k$, which we refer to as the trust region. The model $m_k(p)$ should be constructed so that it is easier to handle than $f(x)$ itself. Let us assume for this that our function $f$ is of class $C^2$ [7].

We solve๐‘“๐‘“ the following๐ถ๐ถ subproblem to obtain the next iteration at each step k of a trust 1 min ( ) ( ) + ( )Tp + pT 2f( )p region method. ( ) 2

๐‘˜๐‘˜ ๐‘๐‘ ๐‘š๐‘š๐‘˜๐‘˜ ๐‘๐‘ โ‰” ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆ‡๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆ‡ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ ๐‘„๐‘„๐‘„๐‘„ ๐‘ก๐‘ก๐‘ก๐‘ก ๏ฟฝ ๐‘˜๐‘˜ Where, > 0 is the t rust region r adius,๐‘ ๐‘ ๐‘ ๐‘  a๐‘ ๐‘  nd๐‘ ๐‘ ๐‘ ๐‘ ๐‘ ๐‘ ๐‘ ๐‘  .๐‘ก๐‘ก๐‘ก๐‘ก โ€–is๐‘๐‘ dโ€– efinedโ‰ค โˆ† t o be t he E uclidean norm.

These sโˆ† ub๐‘˜๐‘˜ problems a re c onstrained opt imizationโ€– โ€– p roblems i n w hich t he ob jective function and the constraints are both quadratic. The constraint is a quadratic inequality constraint and can be written as + 2 0. In fact usually, the model ( ) is a ๐‘‡๐‘‡ quadratic function which is truncatedโˆ’๐‘๐‘ from๐‘๐‘ โˆ†a๐‘˜๐‘˜ Taylorโ‰ฅ series for around the point๐‘š๐‘š ๐‘˜๐‘˜ ๐‘๐‘: ๐‘˜๐‘˜ 1 ๐‘“๐‘“ ๐‘ฅ๐‘ฅ ( ) = ( ) + ( )Tp + pT 2f( )p (k N ) 2 0 ๐‘š๐‘š๐‘˜๐‘˜ ๐‘๐‘ ๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆ‡๐‘“๐‘“ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆ‡ ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆˆ Where N0 = {0,1,2, โ€ฆ }

We note that one can choose any other norm in the formulations. In this project we consider the Euclidean norm $\|\cdot\| = \|\cdot\|_2$, since it makes some computations easier. Hence our trust region for the model $m_k(p)$ is a bounded neighborhood of the current iterate $x_k$,

$\beta(x_k) := \{ x_k + p \mid \|p\|_2 \le \Delta_k \}.$


After constructing the model $m_k(p)$ and its trust region, one seeks a trial step $p$ to the next iterate $x_{k+1} = x_k + p$ which will result in a reduction of the model, while the size of the step is bounded by $\beta(x_k)$, that is, $\|p\|_2 \le \Delta_k$. Then the objective function is evaluated at $x_k + p$ to compare its value to the one predicted by the model at this point. If the sufficient reduction predicted by the model is accomplished by the objective function, $x_k + p$ is accepted as the next iterate and the trust region is possibly expanded to include this new point ($\Delta_k$ increases). If the reduction in the model is a poor predictor of the actual reduction of the objective function, then the trial point is rejected. We conclude that the region is too large, and the size of the trust region is reduced ($\Delta_k$ decreases), with the hope that the model provides a better prediction in the smaller region.

1.3.4.1 Outline and properties of the trust region algorithm

In a trust region algorithm, a strategy for determining the trust region radius $\Delta_k$ at each iteration needs to be developed. The trust region radius can be determined by looking at the agreement between the model function $m_k$ and the objective function $f$ at previous iterations. Given a step $p_k$, we define the ratio

$\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)} = \dfrac{\text{actual reduction}}{\text{predicted reduction}}. \qquad (1.28)$

There are various definitions for $\rho_k$ in the mathematical literature, but we shall prefer the

above one in particular here. We note that the denominator of $\rho_k$, namely the predicted reduction, is always nonnegative, since the step $p_k$ is computed from the subproblem $(QP_k)$ over a region that includes the step $p = 0$. In fact $\rho_k$ can be called a measure of how well the model $m_k$ predicts the reduction in the value of $f$. If $\rho_k$ is close to 0 or negative, then the actual reduction is smaller than the predicted one. This indicates that the model cannot be trusted in this region with radius $\Delta_k$; thus $p_k$ will be rejected and $\Delta_k$ will be reduced. On the other hand, if $\rho_k$ is close to one, an adequate prediction is obtained and we safely expand the trust region, since the model can be trusted over a wider region, so $\Delta_k$ should be increased. If $\rho_k$ is positive but not close to 1, then the trust region radius is not changed.

Algorithm (Trust-Region Algorithm)

The steps in derivative based trust region methods can be summarized by the following [2].

1. Specify some initial guess of the solution $x_0$. Select the initial trust-region bound $\Delta_0 > 0$. Specify the constants $0 < \mu < \eta < 1$ (perhaps $\mu = \tfrac{1}{4}$ and $\eta = \tfrac{3}{4}$).
2. For $k = 0, 1, \ldots$
(i) If $x_k$ is optimal, stop.
(ii) Solve

$\min_p \; q_k(p) = f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2} p^T \nabla^2 f(x_k)\, p \quad \text{subject to } \|p\| \le \Delta_k$

for the trial step $p_k$.
(iii) Compute

$\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{f(x_k) - q_k(p_k)} = \dfrac{\text{actual reduction}}{\text{predicted reduction}}.$

(iv) If $\rho_k \le \mu$, then $x_{k+1} = x_k$ (unsuccessful step), else $x_{k+1} = x_k + p_k$ (successful step).
(v) Update $\Delta_k$:

$\rho_k \le \mu \implies \Delta_{k+1} = \tfrac{1}{2}\Delta_k$
$\mu < \rho_k < \eta \implies \Delta_{k+1} = \Delta_k$
$\rho_k \ge \eta \implies \Delta_{k+1} = 2\Delta_k.$

If is small๐œŒ๐œŒ ๐‘˜๐‘˜(that is, ), then the actual reduction in the function value is much smaller๐œŒ๐œŒ๐‘˜๐‘˜ than that predicted๐œŒ๐œŒ๐‘˜๐‘˜ โ‰ค by๐œ‡๐œ‡ ( ), indicating that the model cannot be trusted for a bound as large as ; in this case๐‘ž๐‘ž๐‘˜๐‘˜ the๐œŒ๐œŒ๐‘˜๐‘˜ step will be rejected and will be reduced. If is large (that is, โˆ†๐‘˜๐‘˜ ), then the model๐œŒ๐œŒ ๐‘˜๐‘˜is adequately predictingโˆ†๐‘˜๐‘˜ the reduction in the๐œŒ๐œŒ๐‘˜๐‘˜ function value, suggesting๐œŒ๐œŒ๐‘˜๐‘˜ โ‰ฅ ๐œ‚๐œ‚ that the model can be trusted over an even wider region; in this case the bound will be increased.

. โˆ†๐‘˜๐‘˜


Chapter 2 Derivative Free Optimization Methods

Introduction (What is derivative free optimization)

Derivative Free Optimization (DFO) methods are typically designed to solve optimization problems whose objective function is computed by a "black box"; hence, the gradient computation is unavailable. Each call to the "black box" is often expensive, so estimating derivatives by finite differences may be prohibitively costly. Finally, the objective function value may be computed with some noise, and the finite difference estimates may not be accurate [6].

The derivative free optimization method which we use approximates the objective function explicitly without approximating its derivatives. The theoretical analysis presented in this project assumes that no noise is present. Experiments and intuition support the claim that the robustness of DFO does not suffer from the presence of a moderate level of noise [1].

Derivative free optimization has been developed for solving optimization problems of the following form

$\min f(x)$

$(P)\quad \text{such that } a_i \le g_i(x) \le b_i \; (i = 1, 2, \ldots, m),$

$\qquad\; x \in F \subseteq \mathbb{R}^n,$

where the objective function $f(x)$ and the constraints $g_i(x)$ are expensive to compute at a given vector $x$, and the derivatives of $f$ at $x$ are not available.

The idea is to approximate the objective function by a model which is assumed to describe the objective function well in a trust region, without explicitly modeling its derivatives. This model is computationally less expensive to evaluate and easier to optimize than the objective function itself. The model is obtained by interpolating the objective function using a quadratic interpolation polynomial.
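To illustrate the idea, the MATLAB sketch below builds a quadratic interpolation model $m(x) = c + g^T x + \tfrac{1}{2}x^T H x$ from function values only, by solving a small linear system at six sample points; the 2-D "black box", the sample set and the exact-interpolation setup are illustrative assumptions, and the project's actual interpolation scheme is the subject of Chapter 3.

% Quadratic interpolation model of f built from sampled function values only
f = @(x) exp(x(1)) + x(2)^2 + x(1)*x(2);       % the "black box" (illustrative)
Y = [0 0; 1 0; 0 1; -1 0; 0 -1; 1 1];          % six sample points in R^2
A = zeros(6, 6);  rhs = zeros(6, 1);
for i = 1:6
    x1 = Y(i,1);  x2 = Y(i,2);
    A(i,:) = [1, x1, x2, 0.5*x1^2, x1*x2, 0.5*x2^2];   % quadratic basis
    rhs(i) = f(Y(i,:)');
end
coef = A \ rhs;                                 % interpolation conditions m(y_i) = f(y_i)
c = coef(1);  g = coef(2:3);
H = [coef(4), coef(5); coef(5), coef(6)];
m = @(x) c + g'*x + 0.5*x'*H*x;                 % the interpolation model
fprintf('model value at (0.5,0.5): %g, true value: %g\n', m([0.5;0.5]), f([0.5;0.5]));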


Derivative free optimization methods for unconstrained optimization build a linear or quadratic model of the objective function and apply one of the fundamental approaches of continuous optimization, i.e. a trust region or a line search, to optimize this model. While derivative based methods typically use a Taylor-based model which is an approximation of the objective function, derivative free methods use interpolation, regression or other sample-based models. If the problem has constraints, the strategy of derivative free methods is usually to apply sequential quadratic programming methods for the linearization of the constraints [6]. The main concern of this project is on unconstrained problems without using derivatives.

We now consider two basic classes of methods in DFO: line search methods and trust region methods, both without using derivatives.

2.2 Line search methods

Line search, also called one-dimensional search, refers to an optimization procedure for solving nonlinear univariate problems. One-dimensional line search is the backbone of many algorithms for solving nonlinear programming problems. Many nonlinear algorithms proceed as follows. Given a point $x_k$, find a direction vector $d_k$ and then a suitable step size $\alpha_k$, yielding a new point $x_{k+1} = x_k + \alpha_k d_k$; the process is then repeated.

Finding the new step size $\alpha_k$ involves solving the subproblem of minimizing $f(x_k + \alpha_k d_k)$, which is a one-dimensional search problem in the variable $\alpha_k$. The minimization is over all real $\alpha$, over nonnegative $\alpha$, or over $\alpha$ such that $x_k + \alpha d_k$ is feasible.

As stated before, in multivariable optimization algorithms, for a given $x_k$, the iterative scheme is

$$x_{k+1} = x_k + \alpha_k d_k. \qquad (2.1)$$

The key is to find the direction vector $d_k$ and a suitable step size $\alpha_k$. Let

$$\phi(\alpha) = f(x_k + \alpha d_k). \qquad (2.2)$$

The problem that departs from $x_k$ and finds a step size $\alpha_k$ in the direction $d_k$ such that

$$\phi(\alpha_k) < \phi(0)$$

is just a line search with respect to $\alpha$. If we find $\alpha_k$ such that the objective function is minimized along the direction $d_k$, i.e.,

$$f(x_k + \alpha_k d_k) = \min_{\alpha > 0} f(x_k + \alpha d_k)$$

or

$$\phi(\alpha_k) = \min_{\alpha > 0} \phi(\alpha),$$

such a line search is called an exact line search or optimal line search, and $\alpha_k$ is called the optimal step size. If we choose $\alpha_k$ such that the objective function has an acceptable descent amount, i.e., such that the descent $f(x_k) - f(x_k + \alpha_k d_k) > 0$ is acceptable by users, such a line search is called an inexact line search, approximate line search, or acceptable line search. Since, in practical computation, the theoretically exact optimal step size generally cannot be found, and it is also expensive to find an almost exact step size, the inexact line search with its lower computational load is highly popular.

The framework of a line search is as follows. First, determine or give an initial search interval which contains the minimizer; then employ some section techniques or interpolations to reduce the interval iteratively until the length of the interval is less than some given tolerance [10]. Next, we give a notation for the search interval and a simple method to determine the initial search interval.

Interval of uncertainty

Definition 2.1. Let $\phi : \mathbb{R} \to \mathbb{R}$, $\alpha^* \in [0, +\infty)$, and $\phi(\alpha^*) = \min_{\alpha \ge 0} \phi(\alpha)$. If there exists a closed interval $[a, b] \subset [0, +\infty)$ such that $\alpha^* \in [a, b]$, then $[a, b]$ is called a search interval for the one-dimensional minimization problem $\min_{\alpha \ge 0} \phi(\alpha)$. Since the exact location of the minimum of $\phi$ over $[a, b]$ is not known, this interval is also called the interval of uncertainty.

A simple method to determine an initial interval is called the forward-backward method. The basic idea of this method is as follows. Given an initial point and an initial step length, we attempt to determine three points at which the function values show a "high-low-high" geometry. If it is not successful to go forward, we go backward.

Concretely, we are given an initial point $\alpha_0$ and a step length $h_0 > 0$. If

$$\phi(\alpha_0 + h_0) < \phi(\alpha_0),$$

then, in the next step, depart from $\alpha_0 + h_0$ and continue going forward with a larger step until the objective function increases. If

$$\phi(\alpha_0 + h_0) > \phi(\alpha_0),$$

then, in the next step, depart from $\alpha_0$ and go backward until the objective function increases. In this way we obtain an initial interval which contains the minimizer $\alpha^*$.

Algorithm 2.1.2 (Forward-Backward Method)

Step 1. Given $\alpha_0 \in [0, \infty)$, $h_0 > 0$ and the multiple coefficient $t > 1$ (usually $t = 2$), evaluate $\phi(\alpha_0)$ and set $k := 0$.

Step 2. Compare the objective function values. Set $\alpha_{k+1} = \alpha_k + h_k$ and evaluate $\phi_{k+1} = \phi(\alpha_{k+1})$. If $\phi_{k+1} < \phi_k$, go to Step 3; otherwise, go to Step 4.

Step 3. Forward step. Set $h_{k+1} := t h_k$, $\alpha := \alpha_k$, $\alpha_k := \alpha_{k+1}$, $\phi_k := \phi_{k+1}$, $k := k + 1$, and go to Step 2.

Step 4. Backward step. If $k = 0$, invert the search direction: set $h_k := -h_k$, $\alpha_k := \alpha_{k+1}$, and go to Step 2; otherwise, set

$$a = \min\{\alpha, \alpha_{k+1}\}, \qquad b = \max\{\alpha, \alpha_{k+1}\},$$

output $[a, b]$ and stop.

The methods of line search presented in this chapter use the unimodality of the function on an interval. The following definitions and theorem introduce these concepts and their properties.

Definition 2.2. Let $\phi : \mathbb{R} \to \mathbb{R}$ and $[a, b] \subset \mathbb{R}$. If there is an $\alpha^* \in [a, b]$ such that $\phi(\alpha)$ is strictly decreasing on $[a, \alpha^*]$ and strictly increasing on $[\alpha^*, b]$, then $\phi(\alpha)$ is called a unimodal function on $[a, b]$. Such an interval $[a, b]$ is called a unimodal interval related to $\phi(\alpha)$.

The unimodal function can also be defined as follows.

Definition 2.3. If there exists a unique $\alpha^* \in [a, b]$ such that for any $\alpha_1, \alpha_2 \in [a, b]$ with $\alpha_1 < \alpha_2$ the following statements hold: if $\alpha_2 < \alpha^*$, then $\phi(\alpha_1) > \phi(\alpha_2)$; if $\alpha_1 > \alpha^*$, then $\phi(\alpha_1) < \phi(\alpha_2)$; then $\phi(\alpha)$ is a unimodal function on $[a, b]$.

Note that, first, a unimodal function does not require continuity or differentiability of the function; second, using the property of the unimodal function, we can exclude portions of the interval of uncertainty that do not contain the minimum, so that the interval of uncertainty is reduced. The following theorem shows that if the function $\phi$ is unimodal on $[a, b]$, then the interval of uncertainty can be reduced by comparing the function values of $\phi$ at two points within the interval.

Theorem 2.1. Let $\phi : \mathbb{R} \to \mathbb{R}$ be unimodal on $[a, b]$, and let $\alpha_1, \alpha_2 \in [a, b]$ with $\alpha_1 < \alpha_2$. Then

1. if $\phi(\alpha_1) \le \phi(\alpha_2)$, then $[a, \alpha_2]$ is a unimodal interval related to $\phi$;

2. if $\phi(\alpha_1) \ge \phi(\alpha_2)$, then $[\alpha_1, b]$ is a unimodal interval related to $\phi$.

Proof. By Definition 2.2 there exists $\alpha^* \in [a, b]$ such that $\phi(\alpha)$ is strictly decreasing over $[a, \alpha^*]$ and strictly increasing over $[\alpha^*, b]$. Since $\phi(\alpha_1) \le \phi(\alpha_2)$, then $\alpha^* \in [a, \alpha_2]$. Since $\phi(\alpha)$ is unimodal on $[a, b]$, it is also unimodal on $[a, \alpha_2]$. Therefore $[a, \alpha_2]$ is a unimodal interval related to $\phi(\alpha)$, and the proof of the first part is complete. The second part of the theorem can be proved similarly.

This theorem indicates that, to reduce the interval of uncertainty, we must select at least two observations, and evaluate and compare their function values.

Theorem 2.2. Let $\phi : \mathbb{R} \to \mathbb{R}$ be strictly quasi-convex over the interval $[a, b]$, and let $\alpha, \mu \in [a, b]$ be such that $\alpha < \mu$. Then:

If $\phi(\alpha) > \phi(\mu)$, then $\phi(z) \ge \phi(\mu)$ for all $z \in [a, \alpha)$. If $\phi(\alpha) \le \phi(\mu)$, then $\phi(z) \ge \phi(\alpha)$ for all $z \in (\mu, b]$.

Proof. Suppose that $\phi(\alpha) > \phi(\mu)$, and let $z \in [a, \alpha)$. By contradiction, suppose that $\phi(z) < \phi(\mu)$. Since $\alpha$ can be written as a convex combination of $z$ and $\mu$, and by the strict quasi-convexity of $\phi$, we have $\phi(\alpha) < \max\{\phi(z), \phi(\mu)\} = \phi(\mu)$, contradicting $\phi(\alpha) > \phi(\mu)$. Hence $\phi(z) \ge \phi(\mu)$. The second part of the theorem can be proved similarly.

From Theorem 2.2, under strict quasi-convexity, if $\phi(\alpha) > \phi(\mu)$, the new interval of uncertainty is $[\alpha, b]$; on the other hand, if $\phi(\alpha) \le \phi(\mu)$, the new interval of uncertainty is $[a, \mu]$.

The Golden Section Method and the Fibonacci Method

The golden section method and the Fibonacci method are section methods. Their basic idea for minimizing a unimodal function over $[a, b]$ is to iteratively reduce the interval of uncertainty by comparing the function values of the observations. When the length of the interval of uncertainty is reduced to some desired degree, the points of the interval can be regarded as approximations of the minimizer. Such a class of methods only needs to evaluate the function and has wide applications; in particular, it is suitable for nonsmooth problems and for problems with complicated derivative expressions.

2.2.1.1 The golden section method

We now describe the more efficient golden section method for minimizing a strictly quasi-convex function. At a general iteration $k$ of the golden section method, let the interval of uncertainty be $[a_k, b_k]$; then, by the above theorem, the new interval of uncertainty $[a_{k+1}, b_{k+1}]$ is given by $[\alpha_k, b_k]$ if $\phi(\alpha_k) > \phi(\mu_k)$ and by $[a_k, \mu_k]$ if $\phi(\alpha_k) \le \phi(\mu_k)$. The points $\alpha_k$ and $\mu_k$ are selected such that:

1) The length of the new interval of uncertainty $b_{k+1} - a_{k+1}$ does not depend upon the outcome of the $k$th iteration, that is, on whether $\phi(\alpha_k) > \phi(\mu_k)$ or $\phi(\alpha_k) \le \phi(\mu_k)$. Therefore we must have $b_k - \alpha_k = \mu_k - a_k$. Thus, if $\alpha_k$ is of the form

$$\alpha_k = a_k + (1 - \lambda)(b_k - a_k), \qquad (*)$$

where $\lambda \in (0, 1)$, then $\mu_k$ must be of the form

$$\mu_k = a_k + \lambda(b_k - a_k), \qquad (**)$$

so that

$$b_{k+1} - a_{k+1} = \lambda(b_k - a_k).$$

2) As $\alpha_{k+1}$ and $\mu_{k+1}$ are selected for the purpose of a new iteration, either $\alpha_{k+1}$ coincides with $\mu_k$ or $\mu_{k+1}$ coincides with $\alpha_k$. If this can be realized, then during iteration $k + 1$ only one extra observation is needed. Consider the following two cases.


Case 1: $\phi(\alpha_k) > \phi(\mu_k)$. In this case $a_{k+1} = \alpha_k$ and $b_{k+1} = b_k$. To satisfy $\alpha_{k+1} = \mu_k$, we apply $(*)$ with $k$ replaced by $k + 1$ and get

$$\mu_k = \alpha_{k+1} = a_{k+1} + (1 - \lambda)(b_{k+1} - a_{k+1}) = \alpha_k + (1 - \lambda)(b_k - \alpha_k).$$

Substituting the expressions of $\alpha_k$ and $\mu_k$ from $(*)$ and $(**)$ into the above equation, we get $\lambda^2 + \lambda - 1 = 0$.

Case 2: $\phi(\alpha_k) \le \phi(\mu_k)$. In this case $a_{k+1} = a_k$ and $b_{k+1} = \mu_k$. To satisfy $\mu_{k+1} = \alpha_k$, we apply $(**)$ with $k$ replaced by $k + 1$ and get

$$\alpha_k = \mu_{k+1} = a_{k+1} + \lambda(b_{k+1} - a_{k+1}) = a_k + \lambda(\mu_k - a_k).$$

Noting $(*)$ and $(**)$, the above equation again gives $\lambda^2 + \lambda - 1 = 0$.

The roots of the equation $\lambda^2 + \lambda - 1 = 0$ are $\lambda \approx 0.618$ and $\lambda \approx -1.618$. Since $\lambda$ must lie in the interval $(0, 1)$, we take $\lambda \approx 0.618$.

To summarize, if at iteration $k$ the points $\alpha_k$ and $\mu_k$ are chosen according to $(*)$ and $(**)$, where $\lambda = 0.618$, then the interval of uncertainty is reduced by a factor of $0.618$ at each iteration. At the first iteration two observations are needed, at $\alpha_1$ and $\mu_1$, but at each subsequent iteration only one evaluation is needed, since either $\alpha_{k+1} = \mu_k$ or $\mu_{k+1} = \alpha_k$.

Algorithm (the golden section method)

The following is a summary of the golden section method for minimizing a strictly quasi-convex function over the interval $[a, b]$ [8].

Initialization step: choose an allowable final length of uncertainty $\ell > 0$. Let $[a_1, b_1]$ be the initial interval of uncertainty, and let $\alpha_1 = a_1 + (1 - \lambda)(b_1 - a_1)$ and $\mu_1 = a_1 + \lambda(b_1 - a_1)$ with $\lambda = 0.618$. Evaluate $\phi(\alpha_1)$ and $\phi(\mu_1)$, let $k = 1$, and go to the main step.

Step 1: if $b_k - a_k < \ell$, stop: the optimal solution lies in the interval $[a_k, b_k]$. Otherwise, if $\phi(\alpha_k) > \phi(\mu_k)$, go to step 2; and if $\phi(\alpha_k) \le \phi(\mu_k)$, go to step 3.

Step 2: let $a_{k+1} = \alpha_k$ and $b_{k+1} = b_k$. Furthermore, let $\alpha_{k+1} = \mu_k$, and let $\mu_{k+1} = a_{k+1} + \lambda(b_{k+1} - a_{k+1})$. Evaluate $\phi(\mu_{k+1})$ and go to step 4.

Step 3: let $a_{k+1} = a_k$ and $b_{k+1} = \mu_k$. Furthermore, let $\mu_{k+1} = \alpha_k$, and let $\alpha_{k+1} = a_{k+1} + (1 - \lambda)(b_{k+1} - a_{k+1})$. Evaluate $\phi(\alpha_{k+1})$ and go to step 4.

Step 4: replace $k$ by $k + 1$ and go to step 1.

Example 2.1: consider the following problem:

$$\text{minimize } x^2 + 2x \quad \text{subject to } -3 \le x \le 5.$$

Note that the true minimum is $-1.0$.

Table 2.1: Summary of computations for the above problem using the golden section method

k    a_k       b_k       alpha_k    mu_k      phi(alpha_k)   phi(mu_k)
1    -3.000     5.000     0.056      1.944     0.115*         7.667*
2    -3.000     1.944    -1.112      0.056    -0.987*         0.115
3    -3.000     0.056    -1.832     -1.112    -0.308*        -0.987
4    -1.832     0.056    -1.112     -0.664    -0.987         -0.887*
5    -1.832    -0.664    -1.384     -1.112    -0.853*        -0.987
6    -1.384    -0.664    -1.112     -0.936    -0.987         -0.996*
7    -1.112    -0.664    -0.936     -0.840    -0.996         -0.974*
8    -1.112    -0.840    -1.016     -0.936    -1.000*        -0.996


Clearly the function $\phi$ to be minimized is strictly quasi-convex, and the initial interval of uncertainty is of length 8. We reduce this interval of uncertainty to one whose length is at most 0.2. The first two observations are located at

$$\alpha_1 = -3 + 0.382(8) = 0.056,$$
$$\mu_1 = -3 + 0.618(8) = 1.944.$$

Note that $\phi(\alpha_1) \le \phi(\mu_1)$. Hence the new interval of uncertainty is $[-3, 1.944]$. The process is repeated, and the computations are summarized in the table above. The values of $\phi$ that are computed at each iteration are indicated by an asterisk.

After eight iterations involving nine observations, the interval of uncertainty is $[-1.112, -0.936]$, so the minimum can be estimated to be the midpoint $-1.024$.

Theโˆ’ numericalโˆ’ result for example 2.1 is -0.9998 (see appendix for the matlab โˆ’code)

2.2.1.2 The Fibonacci search

The Fibonacci search is a line search procedure for minimizing a strictly quasi-convex function $\phi$ over a closed bounded interval. Similar to the golden section method, the Fibonacci search procedure makes two functional evaluations at the first iteration and then only one evaluation at each of the subsequent iterations. However, the procedure differs from the golden section method in that the reduction of the interval of uncertainty varies from one iteration to another [8].

The procedure is based on the Fibonacci sequence $\{F_v\}$ defined as follows:

$$F_{v+1} = F_v + F_{v-1}, \quad v = 1, 2, \dots, \qquad (2.3)$$
$$F_0 = F_1 = 1.$$

The sequence is therefore $1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, \dots$. At iteration $k$, suppose that the interval of uncertainty is $[a_k, b_k]$. Consider the two points $\alpha_k$ and $\mu_k$ given below, where $n$ is the total number of functional evaluations planned.


$$\alpha_k = a_k + \frac{F_{n-k-1}}{F_{n-k+1}}(b_k - a_k), \quad k = 1, 2, \dots, n-1, \qquad (2.4)$$

$$\mu_k = a_k + \frac{F_{n-k}}{F_{n-k+1}}(b_k - a_k), \quad k = 1, 2, \dots, n-1. \qquad (2.5)$$

By Theorem 2.2, the new interval of uncertainty $[a_{k+1}, b_{k+1}]$ is given by $[\alpha_k, b_k]$ if $\phi(\alpha_k) > \phi(\mu_k)$, and by $[a_k, \mu_k]$ if $\phi(\alpha_k) \le \phi(\mu_k)$.

In the former case, noting (2.4) and letting $v = n - k$ in (2.3), we get

$$b_{k+1} - a_{k+1} = b_k - \alpha_k = (b_k - a_k) - \frac{F_{n-k-1}}{F_{n-k+1}}(b_k - a_k) = \frac{F_{n-k}}{F_{n-k+1}}(b_k - a_k). \qquad (2.6)$$

In the latter case, noting (2.5), we get

$$b_{k+1} - a_{k+1} = \mu_k - a_k = \frac{F_{n-k}}{F_{n-k+1}}(b_k - a_k). \qquad (2.7)$$

Thus, in either case the interval of uncertainty is reduced by the factor $F_{n-k}/F_{n-k+1}$. We now show that at iteration $k + 1$ either $\alpha_{k+1} = \mu_k$ or $\mu_{k+1} = \alpha_k$, so that only one functional evaluation is needed. Suppose that $\phi(\alpha_k) > \phi(\mu_k)$. Then, by Theorem 2.2, $a_{k+1} = \alpha_k$ and $b_{k+1} = b_k$. Thus, applying (2.4) with $k$ replaced by $k + 1$, we get

$$\alpha_{k+1} = a_{k+1} + \frac{F_{n-k-2}}{F_{n-k}}(b_{k+1} - a_{k+1}) = \alpha_k + \frac{F_{n-k-2}}{F_{n-k}}(b_k - \alpha_k).$$

Substituting for $\alpha_k$ from (2.4), we get

$$\alpha_{k+1} = a_k + \frac{F_{n-k-1}}{F_{n-k+1}}(b_k - a_k) + \frac{F_{n-k-2}}{F_{n-k}}\left(1 - \frac{F_{n-k-1}}{F_{n-k+1}}\right)(b_k - a_k).$$

Letting $v = n - k$ in (2.3), it follows that

$$1 - \frac{F_{n-k-1}}{F_{n-k+1}} = \frac{F_{n-k}}{F_{n-k+1}},$$

so that

$$\alpha_{k+1} = a_k + \frac{F_{n-k-1} + F_{n-k-2}}{F_{n-k+1}}(b_k - a_k).$$

Now let $v = n - k - 1$ in (2.3); noting (2.5), it follows that $\alpha_{k+1} = a_k + \frac{F_{n-k}}{F_{n-k+1}}(b_k - a_k) = \mu_k$. Similarly, if $\phi(\alpha_k) \le \phi(\mu_k)$, then we can easily verify that $\mu_{k+1} = \alpha_k$. Thus, in either case, only one observation is needed at iteration $k + 1$.

To summarize, at the first iteration two observations are made, and at each subsequent iteration only one observation is necessary. Thus, at the end of iteration $n - 2$ we have computed $n - 1$ functional evaluations. Further, for $k = n - 1$ it follows from (2.4) and (2.5) that

$$\alpha_{n-1} = \mu_{n-1} = \tfrac{1}{2}(a_{n-1} + b_{n-1}).$$

Since either $\alpha_{n-1} = \mu_{n-2}$ or $\mu_{n-1} = \alpha_{n-2}$, theoretically no new observations are to be made at this stage. However, in order to further reduce the interval of uncertainty, the last observation is placed slightly to the right or to the left of the midpoint $\alpha_{n-1} = \mu_{n-1}$, so that $\frac{1}{F_n}(b_1 - a_1)$ is the length of the final interval of uncertainty $[a_n, b_n]$.

Algorithm (Fibonacci search method)

The following is a summary of the Fibonacci search for minimizing a strictly quasi-convex function over the interval $[a_1, b_1]$.

Initialization step: choose an allowable final length of uncertainty $\ell > 0$ and a distinguishability constant $\varepsilon > 0$. Let $[a_1, b_1]$ be the initial interval of uncertainty, and choose the number of observations $n$ to be taken such that $F_n > (b_1 - a_1)/\ell$. Let $\alpha_1 = a_1 + \frac{F_{n-2}}{F_n}(b_1 - a_1)$ and

$\mu_1 = a_1 + \frac{F_{n-1}}{F_n}(b_1 - a_1)$. Evaluate $\phi(\alpha_1)$ and $\phi(\mu_1)$, let $k = 1$, and go to the main step.

Main steps

1) If $\phi(\alpha_k) > \phi(\mu_k)$, go to step 2; and if $\phi(\alpha_k) \le \phi(\mu_k)$, go to step 3.

2) Let $a_{k+1} = \alpha_k$ and $b_{k+1} = b_k$. Furthermore, let $\alpha_{k+1} = \mu_k$, and let $\mu_{k+1} = a_{k+1} + \frac{F_{n-k-1}}{F_{n-k}}(b_{k+1} - a_{k+1})$. If $k = n - 2$, go to step 5; otherwise evaluate $\phi(\mu_{k+1})$ and go to step 4.

3) Let $a_{k+1} = a_k$ and $b_{k+1} = \mu_k$. Furthermore, let $\mu_{k+1} = \alpha_k$, and let $\alpha_{k+1} = a_{k+1} + \frac{F_{n-k-2}}{F_{n-k}}(b_{k+1} - a_{k+1})$. If $k = n - 2$, go to step 5; otherwise evaluate $\phi(\alpha_{k+1})$ and go to step 4.

4) Replace $k$ by $k + 1$ and go to step 1.

5) Let $\alpha_n = \alpha_{n-1}$ and $\mu_n = \alpha_{n-1} + \varepsilon$. If $\phi(\alpha_n) > \phi(\mu_n)$, let $a_n = \alpha_n$ and $b_n = b_{n-1}$. Otherwise, if $\phi(\alpha_n) \le \phi(\mu_n)$, let $a_n = a_{n-1}$ and $b_n = \alpha_n$. Stop; the optimal solution lies in the interval $[a_n, b_n]$.

Example 2.2: consider the following problem:

$$\text{minimize } x^2 + 2x \quad \text{subject to } -3 \le x \le 5.$$

Note that the function is strictly quasi-convex on the interval and that the true minimum occurs at $x = -1$. We reduce the interval of uncertainty to one whose length is at most 0.2. Hence we must have $F_n > \frac{8}{0.2} = 40$, so that $n = 9$. We adopt the distinguishability constant $\varepsilon = 0.01$.

The first two observations are located at

$$\alpha_1 = -3 + \frac{F_7}{F_9}(8) = 0.054545, \qquad \mu_1 = -3 + \frac{F_8}{F_9}(8) = 1.945454.$$

Note that $\phi(\alpha_1) < \phi(\mu_1)$. Hence, the new interval of uncertainty is $[-3.000000, 1.945454]$.

Table 2.2: Summary of computations for the Fibonacci search method


k    a_k          b_k          alpha_k      mu_k         phi(alpha_k)   phi(mu_k)
1    -3.000000     5.000000     0.054545     1.945454     0.112065*      7.675699*
2    -3.000000     1.945454    -1.109091     0.054545    -0.988099*      0.112065
3    -3.000000     0.054545    -1.836363    -1.109091    -0.300497*     -0.988099
4    -1.836363     0.054545    -1.109091    -0.672727    -0.988099      -0.892893*
5    -1.836363    -0.672727    -1.399999    -1.109091    -0.840000*     -0.988099
6    -1.399999    -0.672727    -1.109091    -0.963636    -0.988099      -0.998678*
7    -1.109091    -0.672727    -0.963636    -0.818182    -0.998678      -0.966942*
8    -1.109091    -0.818182    -0.963636    -0.963636    -0.998678      -0.998678
9    -1.109091    -0.963636    -0.963636    -0.953636    -0.998678      -0.997850*

The process is repeated, and the computations are summarized in the above table. The values of $\phi$ that are computed at each iteration are indicated by an asterisk. Note that at $k = 8$, $\alpha_k = \mu_k$, so that no functional evaluations are needed at this stage. For $k = 9$, $\alpha_k = \alpha_{k-1} = -0.963636$ and $\mu_k = \alpha_k + \varepsilon = -0.953636$. Since $\phi(\mu_k) > \phi(\alpha_k)$, the final interval of uncertainty $[a_9, b_9]$ is $[-1.109091, -0.963636]$, whose length $\ell$ is 0.145455. We approximate the minimum to be the midpoint $-1.036364$. Note from Example 2.1 that with the same number of observations, $n = 9$, the golden section method gave a final interval of uncertainty whose length is 0.176.

The Fibonacci method has the following limitations [9]:
1. The initial interval of uncertainty, in which the optimum lies, has to be known.
2. The function being optimized has to be unimodal in the initial interval of uncertainty.
3. The exact optimum cannot be located by this method; only an interval known as the final interval of uncertainty will be known. The final interval of uncertainty can be made as small as desired by using more computations.
4. The number of function evaluations to be used in the search, or the resolution required, has to be specified beforehand.
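Despite these limitations, the search itself is short to implement once n has been fixed. The sketch below is a minimal illustrative MATLAB version of the algorithm above, using the data of Example 2.2; for clarity it re-evaluates one or two points that the algorithm proper would reuse, so it is not the most economical implementation.

% Minimal Fibonacci search sketch (illustrative only).
phi  = @(x) x.^2 + 2*x;          % objective of Example 2.2
a = -3;  b = 5;  n = 9;  eps0 = 0.01;
F = ones(1, n+1);                % F(1) = F_0, F(2) = F_1, ..., F(j+1) = F_j
for v = 3:n+1, F(v) = F(v-1) + F(v-2); end
alpha = a + F(n-1)/F(n+1)*(b - a);   % F_{n-2}/F_n
mu    = a + F(n)/F(n+1)*(b - a);     % F_{n-1}/F_n
fa = phi(alpha);  fm = phi(mu);
for k = 1:n-2
    if fa > fm                                   % new interval [alpha, b]
        a = alpha;  alpha = mu;  fa = fm;
        mu = a + F(n-k)/F(n-k+1)*(b - a);        % F_{n-k-1}/F_{n-k}
        fm = phi(mu);
    else                                         % new interval [a, mu]
        b = mu;  mu = alpha;  fm = fa;
        alpha = a + F(n-k-1)/F(n-k+1)*(b - a);   % F_{n-k-2}/F_{n-k}
        fa = phi(alpha);
    end
end
mu = alpha + eps0;                % place the last observation near the midpoint
if phi(alpha) > phi(mu), a = alpha; else b = alpha; end
x_min = (a + b)/2;                % approximate minimizer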


2.2.3 Interpolation Methods

The interpolation methods were originally developed as one-dimensional searches within multivariable optimization techniques, and are generally more efficient than Fibonacci-type approaches. The aim of all the one-dimensional minimization methods is to find $\lambda^*$, the smallest nonnegative value of $\lambda$, for which the function

$$f(\lambda) = f(X + \lambda S) \qquad (2.8)$$

attains a local minimum. Hence, if the original function $f(X)$ is expressible as an explicit function of $x_i$ ($i = 1, 2, \dots, n$), we can readily write the expression for $f(\lambda) = f(X + \lambda S)$ for any specified vector $S$, set

$$\frac{df(\lambda)}{d\lambda} = 0, \qquad (2.9)$$

and solve Eq. (2.9) to find $\lambda^*$ in terms of $X$ and $S$. However, in many practical problems the function $f(\lambda)$ cannot be expressed explicitly in terms of $\lambda$. In such cases the interpolation methods can be used to find the value of $\lambda^*$.

Example 2.3. Derive the one-dimensional minimization problem for the following case:

$$\text{Minimize } f(X) = (x_1^2 - x_2)^2 + (1 - x_1)^2 \qquad (E_1)$$

from the starting point $X_1 = (-2, -2)^T$ along the search direction $S = (1.00, 0.25)^T$.

SOLUTION. The new design point $X$ can be expressed as

$$X = X_1 + \lambda S = \begin{pmatrix} -2 + \lambda \\ -2 + 0.25\lambda \end{pmatrix}.$$

By substituting $x_1 = -2 + \lambda$ and $x_2 = -2 + 0.25\lambda$ in Eq. $(E_1)$, we obtain $f$ as a function of $\lambda$:

$$f(\lambda) = \left[(-2 + \lambda)^2 - (-2 + 0.25\lambda)\right]^2 + \left[1 - (-2 + \lambda)\right]^2 = \lambda^4 - 8.5\lambda^3 + 31.0625\lambda^2 - 57.0\lambda + 45.0.$$

The value of $\lambda$ at which $f(\lambda)$ attains a minimum gives $\lambda^*$.

In the following sections, we discuss different interpolation methods with reference to one-dimensional minimization problems that arise during multivariable optimization problems.
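Note that the restriction f(lambda) = f(X + lambda*S) of Eq. (2.8) is trivial to form numerically even when, unlike in Example 2.3, no explicit expression in lambda is available. A minimal MATLAB sketch (illustrative only), reusing the data of Example 2.3:

% Forming the one-dimensional restriction f(lambda) = f(X1 + lambda*S).
f  = @(x) (x(1)^2 - x(2))^2 + (1 - x(1))^2;   % objective of (E1)
X1 = [-2; -2];                                % starting point
S  = [1.00; 0.25];                            % search direction
phi = @(lambda) f(X1 + lambda*S);             % restriction along S
% e.g. phi(1) returns 11.5625, the same value as the expanded polynomial at lambda = 1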

2.2.3.1 Quadratic interpolation method

The quadratic interpolation method uses the function values only; hence it is useful for finding the minimizing step $\lambda^*$ of functions $f(X)$ for which the partial derivatives with respect to the variables $x_i$ are not available or are difficult to compute. This method finds the minimizing step length $\lambda^*$ in three stages. In the first stage the $S$-vector is normalized so that a step length of $\lambda = 1$ is acceptable. In the second stage the function $f(\lambda)$ is approximated by a quadratic function $h(\lambda)$ and the minimum, $\tilde\lambda^*$, of $h(\lambda)$ is found. If $\tilde\lambda^*$ is not sufficiently close to the true minimum $\lambda^*$, a third stage is used. In this stage a new quadratic function (refit) $h'(\lambda) = a' + b'\lambda + c'\lambda^2$ is used to approximate $f(\lambda)$, and a new value of $\tilde\lambda^*$ is found. This procedure is continued until a $\tilde\lambda^*$ that is sufficiently close to $\lambda^*$ is found.

Stage 1. In this stage the vector $S$ is normalized as follows. Find $\Delta = \max_i |s_i|$, where $s_i$ is the $i$th component of $S$, and divide each component of $S$ by $\Delta$. Another method of normalization is to find $\Delta = (s_1^2 + s_2^2 + \cdots + s_n^2)^{1/2}$ and divide each component of $S$ by $\Delta$.

Stage 2. Let

$$h(\lambda) = a + b\lambda + c\lambda^2 \qquad (2.10)$$

be the quadratic function used for approximating the function $f(\lambda)$. It is worth noting at this point that a quadratic is the lowest-order polynomial for which a finite minimum can exist. The necessary condition for the minimum of $h(\lambda)$ is that

$$\frac{dh}{d\lambda} = b + 2c\lambda = 0,$$

that is,

$$\tilde\lambda^* = -\frac{b}{2c}. \qquad (2.11)$$

The sufficiency condition for the minimum of $h(\lambda)$ is that

$$\left.\frac{d^2h}{d\lambda^2}\right|_{\tilde\lambda^*} > 0,$$

that is,

$$c > 0. \qquad (2.12)$$

To evaluate the constants $a$, $b$ and $c$ in Eq. (2.10), we need to evaluate the function $f(\lambda)$ at three points. Let $\lambda = A$, $\lambda = B$ and $\lambda = C$ be the points at which the function $f(\lambda)$ is evaluated, and let $f_A$, $f_B$ and $f_C$ be the corresponding function values, that is,

$$f_A = a + bA + cA^2,$$
$$f_B = a + bB + cB^2,$$
$$f_C = a + bC + cC^2. \qquad (2.13)$$

The solution of Eqs. (2.13) gives

$$a = \frac{f_A BC(C - B) + f_B CA(A - C) + f_C AB(B - A)}{(A - B)(B - C)(C - A)}, \qquad (2.14)$$

$$b = \frac{f_A(B^2 - C^2) + f_B(C^2 - A^2) + f_C(A^2 - B^2)}{(A - B)(B - C)(C - A)}, \qquad (2.15)$$

$$c = -\frac{f_A(B - C) + f_B(C - A) + f_C(A - B)}{(A - B)(B - C)(C - A)}. \qquad (2.16)$$

From Eqs. (2.11), (2.15) and (2.16), the minimum of $h(\lambda)$ can be obtained as

$$\tilde\lambda^* = -\frac{b}{2c} = \frac{f_A(B^2 - C^2) + f_B(C^2 - A^2) + f_C(A^2 - B^2)}{2\left[f_A(B - C) + f_B(C - A) + f_C(A - B)\right]}, \qquad (2.17)$$

provided that $c$, as given by Eq. (2.16), is positive.

To start with, for simplicity, the points $A$, $B$ and $C$ can be chosen as $0$, $t$ and $2t$, respectively, where $t$ is a preselected trial step length. By this procedure we can save one function evaluation, since $f_A = f(\lambda = 0)$ is generally known from the previous iteration (of a multivariable search). For this case, Eqs. (2.14) to (2.17) reduce to

$$a = f_A, \qquad (2.18)$$
$$b = \frac{4f_B - 3f_A - f_C}{2t}, \qquad (2.19)$$
$$c = \frac{f_C + f_A - 2f_B}{2t^2}, \qquad (2.20)$$
$$\tilde\lambda^* = \frac{t\,(4f_B - 3f_A - f_C)}{4f_B - 2f_C - 2f_A}, \qquad (2.21)$$

provided that

$$c = \frac{f_C + f_A - 2f_B}{2t^2} > 0. \qquad (2.22)$$

The inequality (2.22) can be satisfied if


$$f_B < \frac{f_A + f_C}{2} \qquad (2.23)$$

(i.e., the function value $f_B$ should be smaller than the average of $f_A$ and $f_C$). This can be satisfied if $f_B$ lies below the line joining $f_A$ and $f_C$.

The following procedure can be used not only to satisfy the inequality (2.23) but also to ensure that the minimum $\tilde\lambda^*$ lies in the interval $0 < \tilde\lambda^* < 2t$.

1. Assuming that $f_A = f(\lambda = 0)$ and the initial step size $t_0$ are known, evaluate the function at $\lambda = t_0$ and obtain $f_1 = f(\lambda = t_0)$.

2. If $f_1 > f_A$, set $f_C = f_1$, evaluate the function at $\lambda = t_0/2$, and compute $\tilde\lambda^*$ using Eq. (2.21) with $t = t_0/2$.

3. If $f_1 \le f_A$, set $f_B = f_1$, and evaluate the function at $\lambda = 2t_0$ to find $f_2 = f(\lambda = 2t_0)$.

4. If $f_2$ turns out to be greater than $f_1$, set $f_C = f_2$ and compute $\tilde\lambda^*$ according to Eq. (2.21) with $t = t_0$.

5. If $f_2$ turns out to be smaller than $f_1$, set the new $f_1 = f_2$ and $t_0 = 2t_0$, and repeat steps 2 to 4 until we are able to find $\tilde\lambda^*$.

Stage 3. The $\tilde\lambda^*$ found in stage 2 is the minimum of the approximating quadratic $h(\lambda)$, and we have to make sure that this $\tilde\lambda^*$ is sufficiently close to the true minimum $\lambda^*$ of $f(\lambda)$ before taking $\lambda^* \simeq \tilde\lambda^*$. Several tests are possible to ascertain this. One possible test is to compare $f(\tilde\lambda^*)$ with $h(\tilde\lambda^*)$ and consider $\tilde\lambda^*$ a sufficiently good approximation if they differ by no more than a small amount. This criterion can be stated as

$$\left|\frac{h(\tilde\lambda^*) - f(\tilde\lambda^*)}{f(\tilde\lambda^*)}\right| \le \varepsilon_1. \qquad (2.24)$$

Another possible test is to examine whether $df/d\lambda$ is close to zero at $\tilde\lambda^*$. Since the derivatives of $f$ are not used in this method, we can use a finite-difference formula for $df/d\lambda$ and use the criterion


$$\left|\frac{f(\tilde\lambda^* + \Delta\tilde\lambda^*) - f(\tilde\lambda^* - \Delta\tilde\lambda^*)}{2\,\Delta\tilde\lambda^*}\right| \le \varepsilon_2 \qquad (2.25)$$

to stop the procedure. In Eqs. (2.24) and (2.25), $\varepsilon_1$ and $\varepsilon_2$ are small numbers to be specified depending on the accuracy desired. If the convergence criteria stated in Eqs. (2.24) and (2.25) are not satisfied, a new quadratic function

$$h'(\lambda) = a' + b'\lambda + c'\lambda^2$$

is used to approximate the function $f(\lambda)$. To evaluate the constants $a'$, $b'$ and $c'$, the three best function values among the current $f_A = f(\lambda = 0)$, $f_B = f(\lambda = t_0)$, $f_C = f(\lambda = 2t_0)$ and $\tilde f = f(\lambda = \tilde\lambda^*)$ are to be used. This process of trying to fit another polynomial to obtain a better approximation to $\lambda^*$ is known as refitting the polynomial.

For refitting the quadratic, we consider all possible situations and select the best three points among the present $A$, $B$, $C$ and $\tilde\lambda^*$. There are four possibilities. A new value of $\tilde\lambda^*$ is computed by using the general formula, Eq. (2.17). If this $\tilde\lambda^*$ also does not satisfy the convergence criteria stated in Eqs. (2.24) and (2.25), a new quadratic has to be refitted.

Example 2.4. Find the minimum of $f = \lambda^5 - 5\lambda^3 - 20\lambda + 5$.

SOLUTION. Since this is not a multivariable optimization problem, we can proceed directly to stage 2. Let the initial step size be taken as $t_0 = 0.5$ and $A = 0$.

Iteration 1

$$f_A = f(\lambda = 0) = 5,$$
$$f_1 = f(\lambda = t_0 = 0.5) = 0.03125 - 5(0.125) - 20(0.5) + 5 = -5.59375.$$

Since $f_1 < f_A$, we set $f_B = f_1 = -5.59375$ and find that

$$f_2 = f(\lambda = 2t_0 = 1.0) = -19.0.$$

As $f_2 < f_1$, we set the new $t_0 = 1$ and $f_1 = -19.0$. Again we find that $f_1 < f_A$ and hence set $f_B = f_1 = -19.0$, and find that $f_2 = f(\lambda = 2t_0 = 2) = -43$. Since $f_2 < f_1$, we again set $t_0 = 2$ and $f_1 = -43$. As this $f_1 < f_A$, we set $f_B = f_1 = -43$ and evaluate $f_2 = f(\lambda = 2t_0 = 4) = 629$. This time $f_2 > f_1$, and hence we set $f_C = f_2 = 629$ and compute $\tilde\lambda^*$ from Eq. (2.21) as

$$\tilde\lambda^* = \frac{(2)\left[4(-43) - 3(5) - 629\right]}{4(-43) - 2(629) - 2(5)} = \frac{-1632}{-1440} = 1.135.$$

Convergence test: Since $A = 0$, $f_A = 5$, $B = 2$, $f_B = -43$, $C = 4$ and $f_C = 629$, the values of $a$, $b$ and $c$ can be found to be

$$a = 5, \quad b = -204, \quad c = 90,$$

and

$$h(\tilde\lambda^*) = h(1.135) = 5 - 204(1.135) + 90(1.135)^2 = -110.9.$$

Since

$$\tilde f = f(\tilde\lambda^*) = (1.135)^5 - 5(1.135)^3 - 20(1.135) + 5.0 = -23.127,$$

we have

$$\left|\frac{h(\tilde\lambda^*) - f(\tilde\lambda^*)}{f(\tilde\lambda^*)}\right| = \left|\frac{-110.9 + 23.127}{-23.127}\right| = 3.8.$$

As this quantity is very large, convergence is not achieved, and hence we have to use refitting.

Iteration 2

Since $\tilde\lambda^* < B$ and $\tilde f > f_B$, we take the new values of $A$, $B$ and $C$ as

$$A = 1.135, \quad f_A = -23.127; \qquad B = 2.0, \quad f_B = -43.0; \qquad C = 4.0, \quad f_C = 629.0,$$

and compute the new $\tilde\lambda^*$, using Eq. (2.17), as

$$\tilde\lambda^* = \frac{(-23.127)(4.0 - 16.0) + (-43.0)(16.0 - 1.29) + (629.0)(1.29 - 4.0)}{2\left[(-23.127)(2.0 - 4.0) + (-43.0)(4.0 - 1.135) + (629.0)(1.135 - 2.0)\right]} = 1.661.$$

Convergence test: To test the convergence, we compute the coefficients of the quadratic as

$$a = 288.0, \quad b = -417.0, \quad c = 125.3.$$

As

$$h(\tilde\lambda^*) = h(1.661) = 288.0 - 417.0(1.661) + 125.3(1.661)^2 = -59.7$$

and

$$\tilde f = f(\tilde\lambda^*) = 12.8 - 5(4.59) - 20(1.661) + 5.0 = -38.37,$$

we obtain


$$\left|\frac{h(\tilde\lambda^*) - f(\tilde\lambda^*)}{f(\tilde\lambda^*)}\right| = \left|\frac{-59.70 + 38.37}{-38.37}\right| = 0.556.$$

Since this quantity is not sufficiently small, we need to proceed to the next refit.
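A single fit-and-test step of the quadratic interpolation method is easy to reproduce numerically. The following MATLAB sketch is illustrative only: it fits the quadratic through three given points A, B, C by Eqs. (2.14)-(2.16), computes the minimizer of Eq. (2.17), and evaluates the convergence measure of Eq. (2.24), using the function of Example 2.4 for concreteness.

% One quadratic-interpolation step, Eqs. (2.14)-(2.17) and (2.24) (sketch).
f = @(lam) lam.^5 - 5*lam.^3 - 20*lam + 5;     % function of Example 2.4
A = 0;  B = 2;  C = 4;                         % three bracketing points
fA = f(A);  fB = f(B);  fC = f(C);
num = fA*(B^2 - C^2) + fB*(C^2 - A^2) + fC*(A^2 - B^2);
den = 2*(fA*(B - C) + fB*(C - A) + fC*(A - B));
lam_tilde = num/den;                           % Eq. (2.17), about 1.13 here
D = (A - B)*(B - C)*(C - A);
a = (fA*B*C*(C - B) + fB*C*A*(A - C) + fC*A*B*(B - A))/D;   % Eq. (2.14)
b = (fA*(B^2 - C^2) + fB*(C^2 - A^2) + fC*(A^2 - B^2))/D;   % Eq. (2.15)
c = -(fA*(B - C) + fB*(C - A) + fC*(A - B))/D;              % Eq. (2.16)
h = a + b*lam_tilde + c*lam_tilde^2;           % quadratic model at lam_tilde
rel_err = abs((h - f(lam_tilde))/f(lam_tilde)); % convergence measure, Eq. (2.24)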

2.3. Multidimensional search

2.3.1 Univariate Method

In this method we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. By starting at a base point $X_i$ in the $i$th iteration, we fix the values of $n - 1$ variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods discussed above can be used to produce a new base point $X_{i+1}$. The search is now continued in a new direction. This new direction is obtained by changing any one of the $n - 1$ variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the $n$ directions are searched sequentially, the first cycle is complete, and hence we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the $n$ directions of a cycle. The univariate method can be summarized as follows [9]:

1. Choose an arbitrary starting point $X_1$ and set $i = 1$.

2. Find the search direction $S_i$ as

$$S_i^T = \begin{cases} (1, 0, 0, \dots, 0) & \text{for } i = 1, n+1, 2n+1, \dots \\ (0, 1, 0, \dots, 0) & \text{for } i = 2, n+2, 2n+2, \dots \\ (0, 0, 1, \dots, 0) & \text{for } i = 3, n+3, 2n+3, \dots \\ \quad\vdots & \\ (0, 0, 0, \dots, 1) & \text{for } i = n, 2n, 3n, \dots \end{cases} \qquad (2.26)$$

3. Determine whether $\lambda_i$ should be positive or negative. For the current direction $S_i$, this means finding whether the function value decreases in the positive or negative direction. For this we take a small probe length $\varepsilon$ and evaluate $f_i = f(X_i)$, $f^+ = f(X_i + \varepsilon S_i)$ and $f^- = f(X_i - \varepsilon S_i)$. If $f^+ < f_i$, $S_i$ will be the correct direction for decreasing the value of $f$, and if $f^- < f_i$, $-S_i$ will be the correct one. If both $f^+$ and $f^-$ are greater than $f_i$, we take $X_i$ as the minimum along the direction $S_i$.

4. Find the optimal step length $\lambda_i^*$ such that

$$f(X_i \pm \lambda_i^* S_i) = \min_{\lambda_i} f(X_i \pm \lambda_i S_i), \qquad (2.27)$$

where the $+$ or $-$ sign has to be used depending upon whether $S_i$ or $-S_i$ is the direction for decreasing the function value.

5. Set $X_{i+1} = X_i \pm \lambda_i^* S_i$, depending on the direction for decreasing the function value, and $f_{i+1} = f(X_{i+1})$.

6. If $\|X_{i+1} - X_i\| < \varepsilon$, stop. Otherwise go to step 7.

7. Set the new value of $i = i + 1$ and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.

The univariate method is very simple and can be implemented easily. However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it will be better to stop the computations at some point near the optimum rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. If the univariate search starts at a point $P$ at which the function value cannot be decreased either in the direction $\pm S_1$ or in the direction $\pm S_2$, the search comes to a halt, and one may be misled into taking the point $P$, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length $\varepsilon$ needed for detecting the proper direction ($\pm S_1$ or $\pm S_2$) happens to be smaller than the precision (number of significant figures) used in the computations.
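A minimal MATLAB sketch of the univariate (cyclic coordinate) scheme above is given below. It is illustrative only: it uses the built-in one-dimensional minimizer fminbnd in place of the section methods of the previous sections, and it searches over lambda in [-1, 1] along each coordinate so that the sign choice of step 3 is subsumed in the line search. The objective is that of Example 2.5.

% Minimal univariate (cyclic coordinate) search sketch, illustrative only.
f   = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;  % Example 2.5
n   = 2;  X = [0; 0];  tol = 1e-6;
for cycle = 1:100
    Xold = X;
    for i = 1:n
        S = zeros(n, 1);  S(i) = 1;              % i-th coordinate direction
        phi = @(lambda) f(X + lambda*S);         % restriction along S
        lambda_star = fminbnd(phi, -1, 1);       % one-dimensional search
        X = X + lambda_star*S;
    end
    if norm(X - Xold) < tol, break; end          % stop when a full cycle stalls
end
% X approaches the optimum (-1.0, 1.5) of Example 2.5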

computations.๐‘†๐‘† ๐‘œ๐‘œ๐‘œ๐‘œ ๐‘†๐‘† 2 2 Example 2.5 Minimize ( 1, 2) = 1 2 + 2 1 + 2 1 2 + 2 with the starting

point (0, 0). ๐‘“๐‘“ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ โˆ’ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ ๐‘ฅ๐‘ฅ SOLUTION we will take the probe length ( ) as 0.01 to find the correct direction for

decreasing t he f unction va lue i n s tep 3. F urther,๐œ€๐œ€ w e w ill us e t he di fferential c alculus method to find the optimum step length along the direction ยฑ in step 4. โˆ— Iteration = 1 ๐œ†๐œ†๐‘–๐‘– ๐‘†๐‘†๐‘–๐‘– 43 ๐‘–๐‘–

1 Step 2: Choose the search direction as = . 1 1 0

Step 3: To f ind w hether t he va lue ๐‘†๐‘† of ๐‘†๐‘†decreases๏ฟฝ ๏ฟฝ al ong 1 or 1, w e us e t he pr obe

length . Since ๐‘“๐‘“ ๐‘†๐‘† โˆ’๐‘†๐‘† 1 = ๐œ€๐œ€ ( 1) = (0, 0) = 0, + ๐‘“๐‘“ =๐‘“๐‘“ ๐‘‹๐‘‹ ( 1 +๐‘“๐‘“ 1 ) = ( , 0) = 0.01 0 + 2(0.0001) + 0 + 0 = 0.0102 ๐‘“๐‘“ ๐‘“๐‘“ ๐‘‹๐‘‹ > ๐œ€๐œ€๐‘†๐‘†1 ๐‘“๐‘“ ๐œ€๐œ€ โˆ’ ( ) ( ) ( ) = ๐‘“๐‘“ 1 โ€“ 1 = , 0 = 0.01 0 + 2 0.0001 โˆ’ ๐‘“๐‘“ ๐‘“๐‘“ +๐‘‹๐‘‹ 0 +๐œ€๐œ€๐‘†๐‘† 0 = ๐‘“๐‘“0.0098โˆ’๐œ€๐œ€ < 1โˆ’, โˆ’ is the correct direction for minimizingโˆ’ f from๐‘“๐‘“ 1. โˆ’๐‘†๐‘†Step๐Ÿ๐Ÿ 4: To find the optimum step length 1 , we minimize๐‘‹๐‘‹ โˆ— ( 1 1 1) = ( 1, 0)๐œ†๐œ† ( ) 2 2 ๐‘“๐‘“ ๐‘‹๐‘‹ โˆ’ ๐œ†๐œ† ๐‘†๐‘† = ๐‘“๐‘“ โˆ’๐œ†๐œ†1 0 + 2( 1) + 0 + 0 = 2 1 1 โˆ’๐œ†๐œ† โˆ’ โˆ’๐œ†๐œ† ๐œ†๐œ† โˆ’ ๐œ†๐œ† As 1 is a step length we take 0 1 1 and solving above equation for 1 using golden 1 section method we get = ๐œ†๐œ† 1 4 โ‰ค ๐œ†๐œ† โ‰ค ๐œ†๐œ† โˆ— Step 5: Set ๐œ†๐œ† 1 0 1 1 = โ€“ = = 2 1 1 1 0 4 0 4 โˆ— 0 โˆ’ ๐‘‹๐‘‹ ๐‘‹๐‘‹ ๐œ†๐œ† ๐‘†๐‘† ๏ฟฝ ๏ฟฝ1 โˆ’ ๏ฟฝ ๏ฟฝ ๏ฟฝ1 ๏ฟฝ = ( ) = ( , 0) = 2 2 4 8 Iteration = ๐‘“๐‘“ ๐‘“๐‘“ ๐‘“๐‘“ ๐‘“๐‘“ โˆ’ โˆ’ 0 Step 2: Choose๐‘–๐‘– ๐Ÿ๐Ÿ the search direction = 2 2 1

Step 3: Since 2 = ( 2) = 0.125๐‘†๐‘† ๐‘Ž๐‘Ž, ๐‘Ž๐‘Ž ๐‘†๐‘† ๏ฟฝ ๏ฟฝ + ๐‘“๐‘“ =๐‘“๐‘“ (๐‘‹๐‘‹ 2 + โˆ’2) = ( 0.25, 0.01) = 0.1399 < 2 ๐‘“๐‘“ = ๐‘“๐‘“ ( ๐‘‹๐‘‹2 + ๐œ€๐œ€๐‘†๐‘†2) = ๐‘“๐‘“ ( โˆ’0.25, 0.01) =โˆ’ 0.1099 >๐‘“๐‘“2 โˆ’ 2 is the correct๐‘“๐‘“ direction๐‘“๐‘“ ๐‘‹๐‘‹ for decreasing๐œ€๐œ€๐‘†๐‘† ๐‘“๐‘“ theโˆ’ valueโˆ’ of f from 2โˆ’. ๐‘“๐‘“ 2 ๐‘†๐‘†Step 4: We minimize ( 2 + 2 2) to find 2 . ๐‘‹๐‘‹ Here ๐‘“๐‘“ ๐‘‹๐‘‹ ๐œ†๐œ† ๐‘†๐‘† ๐œ†๐œ† ( 2 + 2 2) = ( 0.25, 2)

๐‘“๐‘“ ๐‘‹๐‘‹ ๐œ†๐œ† ๐‘†๐‘† 44 ๐‘“๐‘“ โˆ’ ๐œ†๐œ†

2 2 = 0.25 2 + 2(0.25) 2(0.25)( 2) + 2 2 โˆ’ โˆ’ ๐œ†๐œ†= 2 1.5 2 โˆ’ 0.125 ๐œ†๐œ† ๐œ†๐œ† using golden section๐œ†๐œ† โˆ’ method๐œ†๐œ† โˆ’ we get 2 = 0.75 โˆ— Step 5: Set ๐œ†๐œ† 025 0 0.25 = + = + 0.75 + = 3 2 2 2 0 1 0.75 โˆ— โˆ’ โˆ’ ๐‘‹๐‘‹ ๐‘‹๐‘‹ ๐œ†๐œ† 3๐‘†๐‘† = ๏ฟฝ( 3) =๏ฟฝ 0.6875๏ฟฝ ๏ฟฝ ๏ฟฝ ๏ฟฝ

Next we set the iteration number๐‘“๐‘“ as ๐‘“๐‘“ = ๐‘‹๐‘‹3, and continueโˆ’ the procedure until the optimum 1.0 solution = ( ) ๐‘–๐‘– = 1.25 is found. 1.5 โˆ— โˆ’ โˆ— Note: If the๐‘‹๐‘‹ method๏ฟฝ is๏ฟฝ to๐‘ค๐‘ค ๐‘ค๐‘ค๐‘ค๐‘คbeโ„Ž computerized,๐‘“๐‘“ ๐‘‹๐‘‹ โˆ’ a suitable convergence criterion has to be used

to test the point +1( = 1, 2, . . . ) for optimality.

2.3.2 Pattern Directions

In the univariate method, we search for the minimum along directions parallel to the coordinate axes. We noticed that this method may not converge in some cases, and that even if it converges, its convergence will be very slow as we approach the optimum point. These problems can be avoided by changing the directions of search in a favorable manner instead of retaining them always parallel to the coordinate axes. Let the points $1, 2, 3, \dots$ indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (e.g., $1, 3$; $2, 4$; $3, 5$; $4, 6$; $\dots$) lie in the general direction of the minimum and are known as pattern directions. It can be proved that if the objective function is a quadratic in two variables, all such lines pass through the minimum. Unfortunately, this property is not valid for multivariable functions even when they are quadratics. However, this idea can still be used to achieve rapid convergence while finding the minimum of an $n$-variable function. Methods that use pattern directions as search directions are known as pattern search methods. One of the best-known pattern search methods, Powell's method, is discussed here. In general, a pattern search method takes $n$ univariate steps, where $n$ denotes the number of design variables, and then searches for the minimum along the pattern direction $S_i$, defined by

$$S_i = X_i - X_{i-n}, \qquad (2.28)$$

where the point $X_i$ is obtained at the end of the $n$ univariate steps and $X_{i-n}$ is the starting point before taking the $n$ univariate steps. In general, the directions used prior to taking a move along a pattern direction need not be univariate directions.
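To make Eq. (2.28) concrete, the sketch below (illustrative only, again using fminbnd and the objective of Example 2.5) performs n univariate steps and then one additional line search along the resulting pattern direction:

% One pattern move after n univariate steps (illustrative sketch).
f = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;   % Example 2.5
n = 2;  X_start = [0; 0];  X = X_start;
for i = 1:n                                   % n univariate steps
    S = zeros(n, 1);  S(i) = 1;
    lambda_star = fminbnd(@(lambda) f(X + lambda*S), -1, 1);
    X = X + lambda_star*S;
end
S_pattern = X - X_start;                      % pattern direction, Eq. (2.28)
lambda_star = fminbnd(@(lambda) f(X + lambda*S_pattern), -2, 2);
X = X + lambda_star*S_pattern;                % move along the pattern direction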

2.3.4 POWELL'S METHOD

Powell's method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions [9]. A conjugate directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate directions method is expected to speed up the convergence of even general nonlinear objective functions. The definition, a method of generating conjugate directions, and the property of quadratic convergence are presented in this section.

2.3.4.1 Conjugate Directions

Definition 2.4 (Conjugate Directions). Let $A = [A]$ be an $n \times n$ symmetric matrix. A set of $n$ vectors (or directions) $\{S_i\}$ is said to be conjugate (more accurately $A$-conjugate) if

$$S_i^T A S_j = 0 \quad \text{for all } i \ne j, \; i = 1, 2, \dots, n, \; j = 1, 2, \dots, n. \qquad (2.29)$$

It can be seen that orthogonal directions are a special case of conjugate directions (obtained with $[A] = [I]$ in Eq. (2.29)).

Definition 2.5 (Quadratically Convergent Method). If a minimization method, using exact arithmetic, can find the minimum point in $n$ steps while minimizing a quadratic function in $n$ variables, the method is called a quadratically convergent method.

Theorem 2.3. Given a quadratic function of $n$ variables and two parallel hyperplanes 1 and 2 of dimension $k < n$, let the constrained stationary points of the quadratic function in the hyperplanes be $X_1$ and $X_2$, respectively. Then the line joining $X_1$ and $X_2$ is conjugate to any line parallel to the hyperplanes.

Proof. Let the quadratic function be expressed as

$$Q(X) = \tfrac{1}{2} X^T A X + B^T X + C. \qquad (2.30)$$

The gradient of $Q$ is given by

$$\nabla Q(X) = AX + B,$$

and hence

$$\nabla Q(X_1) - \nabla Q(X_2) = A(X_1 - X_2). \qquad (2.31)$$

If $S$ is any vector parallel to the hyperplanes, it must be orthogonal to the gradients $\nabla Q(X_1)$ and $\nabla Q(X_2)$. Thus

$$S^T \nabla Q(X_1) = S^T A X_1 + S^T B = 0, \qquad (2.32)$$
$$S^T \nabla Q(X_2) = S^T A X_2 + S^T B = 0. \qquad (2.33)$$

By subtracting Eq. (2.33) from Eq. (2.32), we obtain

$$S^T A (X_1 - X_2) = 0. \qquad (2.34)$$

Hence $S$ and $(X_1 - X_2)$ are $A$-conjugate.

Theorem 2.4. If a quadratic function

$$Q(X) = \tfrac{1}{2} X^T A X + B^T X + C \qquad (2.35)$$

is minimized sequentially, once along each direction of a set of $n$ mutually conjugate directions, the minimum of the function $Q$ will be found at or before the $n$th step, irrespective of the starting point.

irrespective of the starting point. ๐‘„๐‘„ Proof: Let X minimize the quadratic function (X). Then โˆ— (X ) = B + AX = 0 ๐‘„๐‘„ (2.36) โˆ— โˆ— Given a point๐›ป๐›ป๐‘„๐‘„X1 and a set of linearly independent directions S ,S2,...,Sn, constants Can always be found such that ๐Ÿ๐Ÿ

๐›ฝ๐›ฝ๐‘–๐‘– = 1 + ๐‘›๐‘› (2.37) โˆ— =1 ๐‘‹๐‘‹ ๐‘‹๐‘‹ ๏ฟฝ ๐›ฝ๐›ฝ๐‘–๐‘– ๐‘†๐‘†๐‘–๐‘– where the vectors S ,S2,...,Snhave๐‘–๐‘– been used as basis vectors. If the directions S are A-

conjugate and none ๐Ÿ๐Ÿof them is zero, the Si can easily be shown to be linearly independent๐’Š๐’Š and the can be determined as follows.

Equations๐›ฝ๐›ฝ๐‘–๐‘– (2.36) and (2.37) lead to

B + AX1 + A ๐‘›๐‘› = 0 (2.38) =1 ๏ฟฝ๏ฟฝ ๐›ฝ๐›ฝ๐‘–๐‘– ๐‘†๐‘†๐‘–๐‘–๏ฟฝ Multiplying this equation throughout๐‘–๐‘– by , we obtain ๐‘ป๐‘ป ๐’‹๐’‹ ๐‘บ๐‘บ 47

T T Sj (B + AX1) + Sj A ๐‘›๐‘› = 0 (2.39) =1 ๏ฟฝ๏ฟฝ ๐›ฝ๐›ฝ๐‘–๐‘–๐‘†๐‘†๐‘–๐‘–๏ฟฝ Equation (2.39) can be rewritten as ๐‘–๐‘– T T (B + AX1) Sj + Sj ASj = 0 (2.40)

that is, ๐›ฝ๐›ฝ๐‘—๐‘— T (B + AX1) Sj = T (2.41) Sj ASj ๐›ฝ๐›ฝ๐‘—๐‘— โˆ’ Now consider an iterative minimization procedure starting at point 1, and successively

minimizing t he qua dratic (X) in th e d irections S ,S2,...,Sn , w here๐‘‹๐‘‹ t hese directions satisfy Eq. (2.29). The successive๐‘„๐‘„ points are determined๐Ÿ๐Ÿ by the relation Xi+1 = Xi + i Si , i = 1 to n (2.42) โˆ— where is found by minimizing (Xiฮป + Si) so that โˆ— T ๐œ†๐œ†๐‘–๐‘– Si ๐‘„๐‘„(Xi+1) ๐œ†๐œ† =๐‘–๐‘– 0 (2.43) Since the gradient of at the point๐›ป๐›ป๐‘„๐‘„ Xi+1 is given by ( ) ๐‘„๐‘„ Xi+1 = B + AXi+1 (2.44) Eq. (2.43) can be written๐›ป๐›ป๐‘„๐‘„ as T Si {B + A(Xi + S )} = 0 (2.45) โˆ— This equation gives ๐œ†๐œ†๐‘–๐‘– ๐’Š๐’Š (B + AX ) S = i (2.46) ST AS ๐‘‡๐‘‡ โˆ— i i ๐’Š๐’Š ๐‘–๐‘– From Eq. (2.42), we can express๐œ†๐œ† โˆ’ as 1 ๐‘ฟ๐‘ฟ๐’Š๐’Š Xi = X1 + ๐‘–๐‘–โˆ’ (2.47) =1 โˆ— ๏ฟฝ ๐œ†๐œ†๐‘—๐‘— ๐‘†๐‘†๐‘—๐‘— so that ๐‘—๐‘— 1 T T Xi ASi = X1 ASi + ๐‘–๐‘–โˆ’ =1 โˆ— โˆ— ๐‘—๐‘— ๐‘—๐‘— ๐‘—๐‘— T ๏ฟฝ ๐œ†๐œ† ๐‘†๐‘† ๐ด๐ด๐‘†๐‘† = X1 AS๐‘—๐‘—i (2.48) using the relation (2.29). Thus Eq. (2.46) becomes

48

S = (B + AX ) i (2.49) 1 STAS โˆ— ๐‘‡๐‘‡ i i ๐‘–๐‘– which can be seen๐œ†๐œ† toโˆ’ be identical to Eq. (2.41). Hence the minimizing step lengths are given by . S ince t he opt imal poi nt X is o riginally ex pressed as a su m o f n โˆ— โˆ— quantities ๐›ฝ๐›ฝ1๐‘–๐‘–,๐‘œ๐‘œ๐‘œ๐‘œ2,...,๐œ†๐œ†๐‘–๐‘– , which have been shown to be equivalent to the minimizing step lengths, the๐›ฝ๐›ฝ minimization๐›ฝ๐›ฝ ๐›ฝ๐›ฝ๐‘›๐‘› process leads to the minimum point in n steps or less. Since we have not made any assumption regarding X1 and the order of S1,S2,...,Sn , the process converges in n steps or less, independent of the starting point as well as the order in which the minimization directions are used. Example 2.6 Consider the minimization of the function 2 2 ( 1, 2) = 6 1 + 2 2 6 1 2 1 2 2 1 If = denotes๐‘“๐‘“ a๐‘ฅ๐‘ฅ s๐‘ฅ๐‘ฅ earch d irection,๐‘ฅ๐‘ฅ f๐‘ฅ๐‘ฅ indโˆ’ a di๐‘ฅ๐‘ฅ rection๐‘ฅ๐‘ฅ โˆ’ ๐‘ฅ๐‘ฅ โˆ’ that๐‘ฅ๐‘ฅ is conjugate to the 2 2 ๐Ÿ๐Ÿ direction๐‘บ๐‘บ ๏ฟฝ 1๏ฟฝ. ๐‘†๐‘†

SOLUTION The objective function can be expressed in matrix form as
$$f(X) = B^T X + \tfrac{1}{2} X^T [A] X = \begin{pmatrix} -1 & -2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \tfrac{1}{2} \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 12 & -6 \\ -6 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$
and the Hessian matrix $[A]$ can be identified as
$$[A] = \begin{pmatrix} 12 & -6 \\ -6 & 4 \end{pmatrix}.$$
The direction $S_2 = (s_1, s_2)^T$ will be conjugate to $S_1 = (1, 2)^T$ if
$$S_1^T [A] S_2 = \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 12 & -6 \\ -6 & 4 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} = 0,$$
which upon expansion gives $2 s_2 = 0$, that is, $s_1$ arbitrary and $s_2 = 0$. Since $s_1$ can have any value, we select $s_1 = 1$, and the desired conjugate direction can be expressed as $S_2 = (1, 0)^T$.
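As a quick numerical check (an illustration added here, not part of the original example), the conjugacy condition of Definition 2.4 can be verified in MATLAB using the data of Example 2.6:

% Verify that S1 = (1,2)' and S2 = (1,0)' are A-conjugate for Example 2.6.
A  = [12 -6; -6 4];    % Hessian of f(x1,x2) = 6x1^2 + 2x2^2 - 6x1x2 - x1 - 2x2
S1 = [1; 2];
S2 = [1; 0];
disp(S1' * A * S2)     % prints 0, so the directions are A-conjugate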

Powell's Algorithm
In Powell's method for a two-variable function, the function is first minimized once along each of the coordinate directions, starting with the second coordinate direction, and then in the corresponding pattern direction. For the next cycle of minimization, we discard one of the coordinate directions (the $x_1$ direction in the present case) in favor of the pattern direction. Then we generate a new pattern direction $S_p^{(2)}$. For the next cycle of minimization, we discard one of the previously used coordinate directions (the $x_2$ direction in this case) in favor of the newly generated pattern direction. Then we minimize along the directions $S_p^{(1)}$ and $S_p^{(2)}$. For the next cycle of minimization, since there is no coordinate direction left to discard, we restart the whole procedure by minimizing along the $x_2$ direction. This procedure is continued until the desired minimum point is found.

Note that the search will be made sequentially in the directions $S_n$; $S_1, S_2, \ldots, S_{n-1}, S_n$; $S_p^{(1)}$; $S_2, S_3, \ldots, S_{n-1}, S_n, S_p^{(1)}$; $S_p^{(2)}$; $S_3, S_4, \ldots, S_{n-1}, S_n, S_p^{(1)}, S_p^{(2)}$; $S_p^{(3)}$; ... until the minimum point is found. Here $S_i$ indicates the $i$th coordinate direction $u_i$ and $S_p^{(j)}$ the $j$th pattern direction.

Quadratic Convergence. The pattern directions $S_p^{(1)}, S_p^{(2)}, S_p^{(3)}, \ldots$ are nothing but the lines joining the minima found along the directions $S_n$, $S_p^{(1)}$, $S_p^{(2)}, \ldots$, respectively. Hence, by Theorem 2.3, the pairs of directions $(S_n, S_p^{(1)})$, $(S_p^{(1)}, S_p^{(2)})$, and so on, are A-conjugate. Thus all the directions $S_n, S_p^{(1)}, S_p^{(2)}, \ldots$ are A-conjugate. Since, by Theorem 2.4, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell's method is quadratically convergent. From the method used for constructing the conjugate directions $S_p^{(1)}, S_p^{(2)}, \ldots$, we find that $n$ minimization cycles are required to complete the construction of $n$ conjugate directions. In the $i$th cycle, the minimization is done along the already constructed $i$ conjugate directions and the $n - i$ nonconjugate (coordinate) directions. Thus after $n$ cycles, all the $n$ search directions are mutually conjugate and a quadratic will theoretically be minimized in $n^2$ one-dimensional minimizations. This proves the quadratic convergence of Powell's method.

It is to be noted that, as with most numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell's method may require many more iterations to minimize a function than the theoretically estimated number. There are several reasons for this:


1. Since the number of cycles $n$ is valid only for quadratic functions, it will generally take more than $n$ cycles for nonquadratic functions.
2. The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths $\lambda_i^*$ will be only approximate, and hence the subsequent directions will not be conjugate. Thus the method requires more iterations to achieve overall convergence.
3. Powell's method, described above, can break down before the minimum point is found. This is because the search directions $S_i$ might become dependent or almost dependent during numerical computation.

Powell's method is a very popular means of successive minimizations along conjugate directions. It is a zero-order method, requiring the evaluation of $F(x)$ only. If the problem involves $n$ design variables, the basic algorithm is given by the following [3] (a sketch of one cycle in MATLAB is given after the list):
• Choose a point $x_0$ in the design space.
• Choose the starting vectors $v_i$, $i = 1, 2, \ldots, n$ (the usual choice is $v_i = e_i$, where $e_i$ is the unit vector in the $x_i$-coordinate direction).
• Cycle
  – do with $i = 1, 2, \ldots, n$
    * Minimize $F(x)$ along the line through $x_{i-1}$ in the direction of $v_i$. Let the minimum point be $x_i$.
  – end do
  – $v_{n+1} \leftarrow x_n - x_0$ (this vector can be shown to be conjugate to the $v_{n+1}$ produced in the previous cycle)
  – Minimize $F(x)$ along the line through $x_0$ in the direction of $v_{n+1}$. Let the minimum point be $x_{n+1}$.
  – if $|x_{n+1} - x_0| < \varepsilon$ exit loop
  – do with $i = 1, 2, \ldots, n$
    * $v_i \leftarrow v_{i+1}$ ($v_1$ is discarded, the other vectors are reused)
  – end do
• end cycle
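The cycle above can be summarized in a minimal MATLAB sketch (an illustration under assumptions, not the appendix implementation): the helper name powellCycle, the use of MATLAB's fminbnd for the one-dimensional minimizations, and the bracketing interval [-10, 10] are all illustrative choices.

% One cycle of the basic Powell algorithm (sketch).
% F - objective function handle, x0 - starting point (column vector),
% V - matrix whose columns are the current search directions v_1,...,v_n.
function [x, V] = powellCycle(F, x0, V)
    n = numel(x0);
    x = x0;
    for i = 1:n
        s = fminbnd(@(t) F(x + t*V(:,i)), -10, 10);  % minimize along v_i
        x = x + s*V(:,i);
    end
    vNew = x - x0;                                   % pattern direction v_{n+1}
    s = fminbnd(@(t) F(x0 + t*vNew), -10, 10);       % minimize along it from x0
    x = x0 + s*vNew;
    V = [V(:,2:n), vNew];                            % discard v_1, append v_{n+1}
end

A full implementation along these lines, using golden-section line searches instead of fminbnd and the modified direction-discarding policy discussed below, is listed in the appendix.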


Powell demonstrated that the vectors $v_{n+1}$ produced in successive cycles are mutually conjugate, so that the minimum point of a quadratic surface is reached in precisely $n$ cycles. In practice, the merit function is seldom quadratic, but as long as it can be approximated locally by a quadratic, Powell's method will work. Of course, it usually takes more than $n$ cycles to arrive at the minimum of a nonquadratic function. Note that it takes $n$ line minimizations to construct each conjugate direction.

For a two-variable function, we start with point $x_0$ and vectors $v_1$ and $v_2$. Then we find the distance $s_1$ that minimizes $F(x_0 + s v_1)$, finishing up at point $x_1 = x_0 + s_1 v_1$. Next, we determine $s_2$ that minimizes $F(x_1 + s v_2)$, which takes us to $x_2 = x_1 + s_2 v_2$. The last search direction is $v_3 = x_2 - x_0$. After finding $s_3$ by minimizing $F(x_0 + s v_3)$ we get to $x_3 = x_0 + s_3 v_3$, completing the cycle. As explained before, the first cycle starts at point $P_0$ and ends up at $P_3$. The second cycle takes us to $P_6$, which is the optimal point. The directions $P_0 P_3$ and $P_3 P_6$ are mutually conjugate.

Powell's method does have a major flaw that has to be remedied if $F(x)$ is not a quadratic: the algorithm tends to produce search directions that gradually become linearly dependent, thereby ruining the progress toward the minimum. The source of the problem is the automatic discarding of $v_1$ at the end of each cycle. It has been suggested that it is better to throw out the direction that resulted in the largest decrease of $F(x)$, a policy that we adopt. It seems counter-intuitive to discard the best direction, but it is likely to be close to the direction added in the next cycle, thereby contributing to linear dependence. As a result of the change, the search directions cease to be mutually conjugate, so that a quadratic form is no longer minimized in $n$ cycles. This is not a significant loss, since in practice $F(x)$ is seldom a quadratic anyway.

Example 2.7. Minimize $f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$ from the starting point $X_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ using Powell's method.

SOLUTION
Cycle 1: Univariate Search


We minimize $f$ along $S_2 = S_n = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ from $X_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. To find the correct direction ($+S_2$ or $-S_2$) for decreasing the value of $f$, we take the probe length as $\varepsilon = 0.01$. As
$$f_1 = f(X_1) = 0.0 \quad \text{and} \quad f^+ = f(X_1 + \varepsilon S_2) = f(0.0,\, 0.01) = -0.0099 < f_1,$$
$f$ decreases along the direction $+S_2$. To find the minimizing step length $\lambda^*$ along $S_2$, we minimize
$$f(X_1 + \lambda S_2) = f(0.0,\, \lambda) = \lambda^2 - \lambda.$$
Using the golden section method we get $\lambda^* = \tfrac{1}{2}$, and hence $X_2 = X_1 + \lambda^* S_2 = \begin{pmatrix} 0 \\ 0.5 \end{pmatrix}$.

Next we minimize $f$ along $S_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ from $X_2 = \begin{pmatrix} 0.0 \\ 0.5 \end{pmatrix}$. Since
$$f_2 = f(X_2) = f(0.0,\, 0.5) = -0.25,$$
$$f^+ = f(X_2 + \varepsilon S_1) = f(0.01,\, 0.50) = -0.2298 > f_2,$$
$$f^- = f(X_2 - \varepsilon S_1) = f(-0.01,\, 0.50) = -0.2698 < f_2,$$
$f$ decreases along $-S_1$. As $f(X_2 - \lambda S_1) = f(-\lambda,\, 0.50) = 2\lambda^2 - 2\lambda - 0.25$, the golden section method gives $\lambda^* = \tfrac{1}{2}$. Hence $X_3 = X_2 - \lambda^* S_1 = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix}$.

Now we minimize $f$ along $S_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ from $X_3 = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix}$. As $f_3 = f(X_3) = -0.75$ and
$$f^+ = f(X_3 + \varepsilon S_2) = f(-0.5,\, 0.51) = -0.7599 < f_3,$$
$f$ decreases along the $+S_2$ direction. Since $f(X_3 + \lambda S_2) = f(-0.5,\, 0.5 + \lambda) = \lambda^2 - \lambda - 0.75$, the golden section method gives $\lambda^* = \tfrac{1}{2}$. This gives
$$X_4 = X_3 + \lambda^* S_2 = \begin{pmatrix} -0.5 \\ 1.0 \end{pmatrix}.$$

Cycle 2: Pattern Search
Now we generate the first pattern direction as
$$S_p^{(1)} = X_4 - X_2 = \begin{pmatrix} -0.5 \\ 1 \end{pmatrix} - \begin{pmatrix} 0 \\ 0.5 \end{pmatrix} = \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix}$$
and minimize $f$ along $S_p^{(1)}$ from $X_4$. Since
$$f_4 = f(X_4) = -1.0,$$
$$f^+ = f(X_4 + \varepsilon S_p^{(1)}) = f(-0.5 - 0.005,\, 1 + 0.005) = f(-0.505,\, 1.005) = -1.004975 < f_4,$$
$f$ decreases in the positive direction of $S_p^{(1)}$. As
$$f(X_4 + \lambda S_p^{(1)}) = f(-0.5 - 0.5\lambda,\, 1.0 + 0.5\lambda) = 0.25\lambda^2 - 0.50\lambda - 1.00,$$
the golden section method gives $\lambda^* = 1.0$, and hence
$$X_5 = X_4 + \lambda^* S_p^{(1)} = \begin{pmatrix} -0.5 \\ 1 \end{pmatrix} + 1.0 \begin{pmatrix} -0.5 \\ 0.5 \end{pmatrix} = \begin{pmatrix} -1 \\ 1.5 \end{pmatrix}.$$
The point $X_5$ can be identified to be the optimum point.

If we do not recognize $X_5$ as the optimum point at this stage, we proceed to minimize $f$ along the direction $S_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ from $X_5$. Then we would obtain
$$f_5 = f(X_5) = -1.25, \quad f^+ = f(X_5 + \varepsilon S_2) > f_5, \quad f^- = f(X_5 - \varepsilon S_2) > f_5.$$
This shows that $f$ cannot be minimized along $S_2$, and hence $X_5$ is the optimum point. In this example the convergence has been achieved in the second cycle itself. This is to be expected here, as $f$ is a quadratic function and the method is quadratically convergent.

The numerical result for this example is summarized below (see the appendix for the Powell MATLAB code).

xmin                 fmin
-------------------------------------------
(-1.503, 2.356)      -0.872520631787970
(-1.000, 1.500)      -1.249999992736614
(-1.000, 1.500)      -1.249999999974476

The minimum point (-1.000, 1.500) is reached at the 3rd cycle.
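As an independent check (added here; it is not part of the original solution), the optimum can also be verified analytically from the first-order conditions:
$$\frac{\partial f}{\partial x_1} = 1 + 4x_1 + 2x_2 = 0, \qquad \frac{\partial f}{\partial x_2} = -1 + 2x_1 + 2x_2 = 0.$$
Subtracting the second equation from the first gives $2 + 2x_1 = 0$, so $x_1 = -1$, and then $x_2 = 1.5$, with $f(-1,\, 1.5) = -1.25$, in agreement with the point and value reached by Powell's method.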


Chapter 3 Trust region methods

3.1 Trust region framework
In the last section of the first chapter we explained trust region methods as one of the solution procedures for treating a nonlinear programming problem. Here a brief review will be helpful to follow the integration of the trust region method into the DFO algorithm. The trust region framework is usually used in the context where at least the gradient, and sometimes the Hessian, of the objective function can be evaluated or estimated accurately.

The main steps of a typical trust region method are [2]:

1. Given a current iterate, build a good local approximation model.
2. Choose a neighborhood around the current iterate where the model 'is trusted' to be accurate. Minimize the model in this neighborhood.
3. Determine if the step is successful by evaluating the true objective function at the new point and comparing the true reduction in the value of the objective with the reduction predicted by the model.
4. If the step is successful, accept the new point as the next iterate. Increase the size of the trust region if the success is really significant. Otherwise, reject the new point and reduce the size of the trust region.
5. Repeat until convergence.

A minimal MATLAB sketch of the acceptance test in steps 3 and 4 is given below.
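In the sketch that follows, the threshold values 0.1 and 0.75 and the shrink/expand factors 0.5 and 2 are illustrative choices, not values prescribed by the text.

% Acceptance test and radius update for one trust-region step (sketch).
% f - true objective (function handle), m - current model (function handle),
% x - current iterate, xnew - trial point, Delta - trust-region radius.
function [x, Delta] = trustRegionUpdate(f, m, x, xnew, Delta)
    rho = (f(x) - f(xnew)) / (m(x) - m(xnew));  % true vs. predicted reduction
    if rho >= 0.1               % successful step: accept the trial point
        x = xnew;
        if rho >= 0.75          % very successful: enlarge the trust region
            Delta = 2*Delta;
        end
    else                        % unsuccessful: reject the point and shrink the region
        Delta = 0.5*Delta;
    end
end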

For a model based on the Taylor series expansion we know that if the trust region is made small enough, then the approximation is sufficiently accurate and the algorithm will make a successful step (unless the optimum has been reached).

To use the trust region framework in the derivative-free case we use an alternative approximation technique, one which does not rely on derivative estimates. Quadratic interpolation is one such technique and can be applied successfully within a trust region method. However, we need to guarantee that the approximation model is locally good; that is, that a successful step will be made after sufficient reduction of the trust region.


3.2 Quadratic interpolation
Consider the problem of interpolating a given or suitably chosen function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ by a quadratic polynomial $Q(x)$ at a chosen set of points $Y = \{y^1, y^2, \ldots, y^p\} \subseteq \mathbb{R}^n$. The quadratic polynomial $Q(x)$ is an interpolation of the function $f(x)$ with respect to the set $Y$ if
$$Q(y^j) = f(y^j) \quad (j = 1, 2, \ldots, p), \qquad (3.1)$$
such that $f$ is known at all of the finitely many elements of $Y$. Here we note that $Q$ is our model $m_k$ which we defined in the first chapter; in the context of trust region methods,
$$m_k(x) = Q(x).$$

Suppose that the space of quadratic polynomials is spanned by a set of basis functions $\phi_i(\cdot)$ $(i = 1, 2, \ldots, q)$. Then any quadratic polynomial can be written in terms of these basis functions, that is,
$$Q(x) = \sum_{i=1}^{q} \alpha_i \phi_i(x),$$
where the coefficient vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_q)^T$ is to be determined. We need
$$q = 1 + n + n + \frac{n(n-1)}{2} = \frac{1}{2}(n+1)(n+2)$$
points to find all of the interpolation parameters. If we have $\frac{1}{2}(n+1)(n+2)$ points, we can ensure that the quadratic model is entirely determined by the interpolation conditions. When this is the case, the system of linear equations
$$\sum_{i=1}^{q} \alpha_i \phi_i(y^j) = f(y^j) \quad (j = 1, 2, \ldots, p) \qquad (3.2)$$
can be solved to derive the interpolation parameters. The parameter (coefficient) matrix of this system is of type $p \times q$ and looks as follows:
$$\phi(Y) = \begin{pmatrix} \phi_1(y^1) & \cdots & \phi_q(y^1) \\ \vdots & \ddots & \vdots \\ \phi_1(y^p) & \cdots & \phi_q(y^p) \end{pmatrix}. \qquad (3.3)$$
For a given set of points and a set of function values, an interpolation polynomial exists and is unique if and only if $\phi(Y)$ is square, that is $p = q$, and nonsingular. Theoretically, this means that the system (3.2) can be solved, but in practice the solvability of this system depends on whether or not the matrix $\phi(Y)$ is ill-conditioned.
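As an illustration (added here, with the sample function and points chosen for the purpose), the coefficient matrix (3.3) can be assembled and the system (3.2) solved in MATLAB for n = 2 using the monomial basis 1, x1, x2, x1^2, x1*x2, x2^2:

% Build phi(Y) and solve system (3.2) for a quadratic model in two variables.
f   = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;  % sample function
Y   = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2];                      % p = q = 6 points, one per row
phi = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];     % basis functions
M   = zeros(6);  rhs = zeros(6,1);
for j = 1:6
    M(j,:) = phi(Y(j,:));   % row j of the coefficient matrix phi(Y)
    rhs(j) = f(Y(j,:));     % interpolation condition Q(y^j) = f(y^j)
end
alpha = M \ rhs;            % coefficients of the quadratic model Q(x)

Since the sample function is itself quadratic, the recovered coefficients are exactly (0, 1, -1, 2, 2, 1), that is, Q reproduces f.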

From the above argument, we conclude that if we manage to determine the quadratic polynomial uniquely, then we have $p = q = \frac{1}{2}(n+1)(n+2)$. However, we need to be aware of the fact that not every set of $\frac{1}{2}(n+1)(n+2)$ points in $\mathbb{R}^n$ can be interpolated by a quadratic polynomial. Although 3 distinct points can be interpolated by a quadratic function in univariate interpolation, this is not the case in multivariate interpolation. In fact, 3 points will not be enough to obtain a quadratic interpolation polynomial whenever the dimension of the interpolation space is greater than one. By inspection, one can see that 6 points are necessary to obtain a unique quadratic interpolation of a function in two dimensions. However, an interpolation set Y of six points lying on one line cannot be interpolated by a quadratic function. Therefore, the points of Y must satisfy a geometric condition to ensure the existence and uniqueness of the quadratic model. This geometric condition is known as the poisedness of the point set.

Definition 3.1: A set of points Y is called poised, with respect to a given subspace of polynomials, if the considered function $f(x)$ can be interpolated at the points of Y by the polynomials from this subspace, that is, if there always exists a suitable interpolating polynomial in that subspace.

Remark: In DFO, poisedness is a necessary geometric condition on the interpolation set Y that ensures the existence and uniqueness of the quadratic model $Q(x)$ wanted and used in the DFO algorithm.

We illustrate the geometric character of poisedness by the following examples.


Example 3.1: Suppose n = 2 and Y is a set of six points on a unit circle. Then $Y \subseteq \mathbb{R}^2$ cannot be interpolated by a polynomial of the form
$$a_0 + a_1 x_1 + a_2 x_2 + a_{1,1} x_1^2 + a_{1,2} x_1 x_2 + a_{2,2} x_2^2.$$
Hence, Y is not poised with respect to the space of quadratic polynomials. On the other hand, Y can be interpolated by a polynomial of the form
$$a_0 + a_1 x_1 + a_2 x_2 + a_{1,1} x_1^2 + a_{1,2} x_1 x_2 + a_{2,2} x_2^2 + a_{1,1,1} x_1^3.$$
Therefore Y is poised in an appropriate subspace of the space of cubic polynomials.

Example 3.2: Consider the two quadrics $q_1(x, y) = 2x + x^2 - y^2$ and $q_2(x, y) = x^2 + y^2$, whose intersection curve projects in the $(x, y)$-plane to the conic $C(x, y) = 0$ with $C(x, y) = x - y^2$. Namely,
$$2x + x^2 - y^2 = x^2 + y^2 \iff 2x = 2y^2 \iff x = y^2.$$

Definition 3.2: A set of points Y is called well-poised if it remains poised under small perturbations. For example, if $n = 2$, six points almost on a line may define a poised set. However, since some small perturbation of the points might make them aligned, it is not a well-poised set.

As we mentioned before, a set of points is poised if $\phi(Y)$ is nonsingular with respect to the space of quadratic polynomials. If we look for an understanding of this through the fact that an interpolation polynomial exists and is unique if and only if $\phi(Y)$ is square and nonsingular, then we conclude: Y is poised if the determinant of $\phi(Y)$ is nonvanishing, that is,
$$\delta(Y) = \det \begin{pmatrix} \phi_1(y^1) & \cdots & \phi_q(y^1) \\ \vdots & \ddots & \vdots \\ \phi_1(y^p) & \cdots & \phi_q(y^p) \end{pmatrix} = \det \phi(Y) \neq 0. \qquad (3.4)$$
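The determinant test (3.4) can be carried out numerically; the following illustration (added here) contrasts six collinear points with the scattered set used in the earlier sketch:

% Poisedness test (3.4): six collinear points versus a scattered set.
phi   = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];
Ybad  = [(0:5)', (0:5)'];               % six points on the line x2 = x1
Ygood = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2]; % six scattered points
Mbad  = zeros(6);  Mgood = zeros(6);
for j = 1:6
    Mbad(j,:)  = phi(Ybad(j,:));
    Mgood(j,:) = phi(Ygood(j,:));
end
det(Mbad)    % 0: the collinear set is not poised for quadratic interpolation
det(Mgood)   % nonzero: the scattered set is poised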


The measure of poisedness in a DFO algorithm can be explained by a methodology based on Newton fundamental polynomials. In DFO, the approach of handling poisedness in combination with the Newton fundamental polynomials is a distinctive issue. This is so because it allows us not only to choose a good interpolation set from a given set of sample points but also to find a new sample point which improves the poisedness of the interpolation set. If we had no such useful tool, then removing a point from the set would have caused the conditioning of the coefficient matrix to get worse in the updating step for the interpolation set of DFO. There is also detailed work on DFO in which the quadratic approximation model is determined by Lagrange interpolation polynomials instead of the Newton fundamental polynomials.

Let us focus on the Newton fundamental polynomials. The points $y$ in our interpolation set $Y = \{y^1, y^2, \ldots, y^p\} \subseteq \mathbb{R}^n$ are organized into $d + 1$ blocks $Y^{[l]}$ $(l = 0, 1, \ldots, d)$, where the $l$th block contains $\left|Y^{[l]}\right| = \binom{n+l-1}{l}$ points.

Definition 3.3: A single Newton fundamental polynomial $N_i^{[l]}$ of degree $l$ corresponds to each point $y_i^{[l]} \in Y^{[l]}$, satisfying the following conditions:
$$N_i^{[l]}\left(y_j^{[m]}\right) = \delta_{ij}\,\delta_{lm} \quad \text{for all } y_j^{[m]} \in Y^{[m]} \text{ with } m \in \{0, 1, 2, \ldots, l\}.$$
Here $\delta_{ij}$ is Kronecker's symbol:
$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & \text{otherwise.} \end{cases}$$

Consider the set of interpolation points being partitioned into three disjoint blocks $Y^{[0]}, Y^{[1]}, Y^{[2]}$, which correspond to the constant term, the linear terms and the quadratic terms of a quadratic polynomial, respectively. Hence $Y^{[0]}$ has a single element, $Y^{[1]}$ has $n$ elements and $Y^{[2]}$ has $\frac{n(n+1)}{2}$ elements. The basis $\{N_i(\cdot)\}$ of NFPs is also partitioned into three blocks $\{N_i^{0}(\cdot)\}, \{N_i^{1}(\cdot)\}, \{N_i^{2}(\cdot)\}$ with the appropriate number of elements in each block. The unique element of $\{N_i^{0}(\cdot)\}$ is a polynomial of degree zero, each of the $n$ elements of $\{N_i^{1}(\cdot)\}$ is a polynomial of degree one and, finally, each of the $\frac{n(n+1)}{2}$ elements of $\{N_i^{2}(\cdot)\}$ is a polynomial of degree two.

The basis elements and the interpolation points are set in one-to-one correspondence, so that the points from block $Y^{[l]}$ correspond to the polynomials from block $\{N_i^{l}(\cdot)\}$. A Newton fundamental polynomial (NFP) $N_i^{l}(\cdot)$ and a point $y_i^{l}$ are in correspondence with each other if and only if the value of that polynomial at that point is one and its value at any other point in the same block or in any previous block is zero. In other words, if $y^i$ corresponds to $N_i$, then $N_i(y^i) = 1$ and $N_i(y^j) = 0$ for all other indices $j$ in the same or previous blocks.

Example 3.3: Consider quadratic interpolation in the plane. We require six interpolation points organized into three blocks:
$$Y^{[0]} = \{(0,0)\}, \quad Y^{[1]} = \{(1,0), (0,1)\} \quad \text{and} \quad Y^{[2]} = \{(2,0), (1,1), (0,2)\},$$
corresponding to the initial basis functions $1,\ x_1,\ x_2,\ x_1^2,\ x_1 x_2,\ x_2^2$, respectively. Applying some procedures we find the NFPs:
$$N_1^{0} = 1, \quad N_1^{1} = x_1, \quad N_2^{1} = x_2, \quad N_1^{2} = \frac{x_1^2 - x_1}{2}, \quad N_2^{2} = x_1 x_2 \quad \text{and} \quad N_3^{2} = \frac{x_2^2 - x_2}{2}.$$

Algorithm (derivative-free trust region method)
The steps of the derivative-free trust region method are given as follows [6].

Step 0: Initialization.
Let a starting point $x_s$ and the value of $f(x_s)$ be given.

Choose an initial trust region radius $\Delta_0 > 0$. Choose at least one additional point, not further than $\Delta_0$ away from $x_s$, to create an initial well-poised interpolation set $Y$ and an initial basis of Newton fundamental polynomials. Determine $x_0 \in Y$ which has the best objective function value; i.e., $x_0$ solves the problem $\min_{x \in Y} f(x)$. Set $k = 0$ and choose parameters $\eta_0, \eta_1$ and $\gamma_0, \gamma_1, \gamma_2$, where $0 < \eta_0 < \eta_1 < 1$ and $0 < \gamma_0 \leq \gamma_1 < 1 \leq \gamma_2$.

Step 1: Build the model.
Using the interpolation set $Y$ and the basis of NFPs, build a quadratic interpolation polynomial $Q_k(x)$.

Step 2: Minimize the model within the trust region.
Set $\beta_k = \{x \in \mathbb{R}^n : \|x_k - x\| \leq \Delta_k\}$. Compute the point $\hat{x}_k$ such that
$$Q_k(\hat{x}_k) = \min_{x \in \beta_k} Q_k(x).$$
Compute $f(\hat{x}_k)$ and the ratio
$$\rho_k = \frac{f(x_k) - f(\hat{x}_k)}{Q_k(x_k) - Q_k(\hat{x}_k)}.$$

Step 3: Update the interpolation set.
• If $\rho_k \geq \eta_0$, include $\hat{x}_k$ in $Y$, dropping one of the existing interpolation points if necessary.
• If $\rho_k < \eta_0$, include $\hat{x}_k$ in $Y$ if it improves the quality of the model.
• If $\rho_k < \eta_0$ and there are fewer than $n + 1$ points in the intersection of $Y$ and $\beta_k$, generate a new interpolation point in $\beta_k$, while preserving/improving well-poisedness.
• Update the basis of the Newton fundamental polynomials.

Step 4: Update the trust region radius.
• If $\rho_k \geq \eta_1$, increase the trust region radius: $\Delta_{k+1} \in [\Delta_k, \gamma_2 \Delta_k]$.
• If $\rho_k < \eta_0$ and the cardinality of $Y \cap \beta_k$ was less than $n + 1$ when $\hat{x}_k$ was computed, reduce the trust region radius: $\Delta_{k+1} \in [\gamma_0 \Delta_k, \gamma_1 \Delta_k]$.
• Otherwise, set $\Delta_{k+1} = \Delta_k$.

Step 5: Update the current iterate.
Determine $\bar{x}_k$ with the best objective function value by solving the discrete problem
$$\min_{y^i \in Y,\ y^i \neq x_k} f(y^i).$$
If the improvement is sufficient (in the sense of prediction), that is, if
$$\bar{\rho}_k = \frac{f(x_k) - f(\bar{x}_k)}{Q_k(x_k) - Q_k(\hat{x}_k)} \geq \eta_0,$$
then we put $x_{k+1} = \bar{x}_k$; otherwise set $x_{k+1} = x_k$. Increase $k$ by one and go to Step 1.


Appendix (MATLAB codes)
1. Code for Golden search

function [xmin,fmin] = goldSearch(f,a,b)

% Golden section search for the minimum of f(x).

% The minimum point must be bracketed in a <= x <= b.

% usage: [xmin,fmin] = goldSearch(f,a,b)

% input:

% f = handle of function that returns f(x).

% a, b = limits of the interval containing the minimum.

% output:

% fmin = minimum value of f(x).

% xmin = value of x at the minimum point.

N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;
x1 = c*a + (1-c)*b; f1 = feval(f,x1);
x2 = (1-c)*a + c*b; f2 = feval(f,x2);

%fprintf('------\n');

fprintf(' x1 x2 f(x1) f(x2) b - a \n');

fprintf('------\n');


fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);

% Main loop
for i = 1:N-2

if f1 < f2

b = x2;

x2 = x1;

f2 = f1;

x1 = c*a + (1-c)*b;

f1 = feval(f,x1);

else

a = x1;

x1 = x2;

f1 = f2;

x2 = (1-c)*a + c*b;

f2 = feval(f,x2);

end;

fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);

end

if (abs(b-a) < eps)

fprintf('succeeded after %d steps\n', i);

return;

end;

if f1 < f2; fmin = f1; xmin = x1;

else

fmin = f2; xmin = x2;

end

2. Code for Powell method

The algorithm for Powell's method is listed below. It utilizes two arrays: df contains the decreases of the merit function in the first n moves of a cycle, and the matrix u stores the corresponding direction vectors $v_i$ (one vector per column).

To implement this algorithm we use the gold bracket and the golden search routines together with it.

i. Gold bracket

function [a,b] = goldBracket(fun,x1,h)

% Brackets the minimum point of f(x).

% USAGE: [a,b] = goldBracket(func,xStart,h)

% INPUT:

% func = handle of function that returns f(x).

% x1 = starting value of x.

% h = initial step size used in search.

% OUTPUT:

% a, b = limits on x at the minimum point.

c = 1.618033989;
f1 = feval(fun,x1);
x2 = x1 + h; f2 = feval(fun,x2);

% Determine downhill direction & change sign of h if needed.
if f2 > f1

h = -h;

x2 = x1 + h; f2 = feval(fun,x2);

% Check if minimum is between x1 - h and x1 + h

if f2 >f1

a = x2; b = x1 - h; return

end
end

% Search loop
for i = 1:50

h = c*h;

x3 = x2 + h; f3 = feval(fun,x3);

if f3 >f2

a = x1;

b = x3;

return

end

x1 = x2; x2 = x3; f2 = f3;

end

error('goldbracket did not find minimum')
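A possible way to combine the bracketing routine with the golden-section search listed next (added for illustration; the test function and the initial step 0.1 are assumed):

[a, b]       = goldBracket(@(x) (x-1).^2 + 3, 0, 0.1);
[xmin, fmin] = goldensearch(@(x) (x-1).^2 + 3, a, b);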

ii. Golden search

function [xmin,fmin] = goldensearch(f,a,b)

% Golden section search for the minimum of f(x).

% The minimum point must be bracketed in a <= x <= b.

% usage: [xmin,fmin] = goldensearch(f,a,b)

% input:

% f = handle of function that returns f(x).

% a, b = limits of the interval containing the minimum.

% output:

% fmin = minimum value of f(x).

% xmin = value of x at the minimum point.

%eps = 1.0e-3;

N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;

x1 = c*a + (1-c)*b;

f1 = feval(f,x1);

x2 = (1-c)*a + c*b;

f2 = feval(f,x2);

% Main loop
for i = 1:N-2

if f1 < f2

b = x2;

x2 = x1;

f2 = f1;

x1 = c*a + (1-c)*b;

f1 = feval(f,x1);

else

a = x1;

x1 = x2;

f1 = f2;

x2 = (1-c)*a + c*b;

f2 = feval(f,x2);

end;

end

if f1 < f2

fmin = f1; xmin = x1;
else

fmin = f2; xmin = x2;
end

iii. Powell's method

function powell
clc
global x V
x = [0;0];
tol = 1.0e-4; h = 0.5;
if size(x,2) > 1; x = x'; end % x must be column vector
n = length(x); % Number of design variables
df = zeros(n,1); % Decreases of f stored here
u = eye(n); % Columns of u store search directions V
fprintf(' xmin fmin \n');
fprintf(' ------\n');
for j = 1:30 % Allow up to 30 cycles

xOld = x;

fOld = feval(@myfun2,xOld);

% First n line searches record the decrease of f

for i = 1:n

V = u(1:n,i);

[a,b] = goldBracket(@fLine,0.0,h);

[s,fmin] = goldensearch(@fLine,a,b);

df(i) = fOld - fmin;

fOld = fmin;

x = x + s*V;

end

% Last line search in the cycle

V = x - xOld;

[a,b] = goldBracket(@fLine,0.0,h);

[s,fmin] = goldensearch(@fLine,a,b);
x = x + s*V;
fprintf(' (%3.3f,%3.3f) %2.15f \n',x, fmin);
if sqrt(dot(x-xOld,x-xOld)/n) < tol

y = x; break
end

% Identify biggest decrease of f & update search

% directions

iMax = 1; dfMax = df(1);

for i = 2:n

if df(i) > dfMax

iMax = i; dfMax = df(i);

end

end

for i = iMax:n-1

u(1:n,i) = u(1:n,i+1);

end

u(1:n,n) = V;

end

fprintf('The minimum point (%2.3f,%2.3f)is reached at the %3dth cycle.\n',y,j)

function z = fLine(s) % f in the search direction V

global x V

z = feval(@myfun2,x+s*V);

For Example 2.7 we have the following objective function.
function y = myfun2(x)

%

y = x(1)-x(2)+2*(x(1)).^2+2*x(1)*x(2)+(x(2)).^2;


References

[1] Basak Aktek, Derivative Free Optimization Methods: Application in Stirrer Configuration and Data Clustering, M.Sc. Thesis, Middle East Technical University, July 2005.

[2] Igor Griva, Stephen G. Nash, Ariela Sofer, Linear and Nonlinear Optimization, Second Edition, George Mason University, Fairfax, Virginia, 2009.

[3] Jaan Kiusalaas, Numerical Methods in Engineering with MATLAB, Second Edition, Cambridge University Press, 2010.

[4] Jorge J. Moré and Stefan M. Wild, Benchmarking Derivative-Free Optimization Algorithms, Preprint ANL/MCS-P1471-1207, December 2007.

[5] Jorge Nocedal and Stephen J. Wright, Numerical Optimization, 2nd Edition, Springer-Verlag, New York, 1999.

[6] Katya Scheinberg, Derivative Free Optimization Method, CS 4/6-TE3, SEW ENG 4/6-TE3, Tamas Terlaky, IBM Watson Research Center.

[7] Melissa Weber Mendonça, Multilevel Optimization: Convergence Theory, Algorithms and Application to Derivative-Free Optimization, Ph.D. Thesis, Facultés Universitaires Notre-Dame de la Paix, Faculté des Sciences, rue de Bruxelles 61, B-5000 Namur, Belgium, 2009.

[8] Mokhtar S. Bazaraa, Hanif D. Sherali, C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd Edition.

[9] Singiresu S. Rao, Engineering Optimization: Theory and Practice, Fourth Edition, John Wiley & Sons, 2009.

[10] Wenyu Sun (Nanjing Normal University, Nanjing, China) and Ya-Xiang Yuan (Chinese Academy of Sciences, Beijing, China), Optimization Theory and Methods: Nonlinear Programming, Springer Science+Business Media, LLC, 2006.
