Addis Ababa University
School of Graduate Studies
Department of Mathematics
A Graduate project report
On
Derivative Free Optimization
By: Teklebirhan Abraha
A Project submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Mathematics
Advisor: Semu Mitku (Ph.D)
Addis Ababa, Ethiopia
January, 2011
Acknowledgment

With much pleasure, I take the opportunity to thank all the people who have helped me through the course of my graduate study.
First and foremost I would like to thank my advisor and instructor, Dr. Semu Mitku, for his invaluable guidance, advice, encouragement and for providing me plenty of materials related to my project work, without which this well organized (compiled) project would have been impossible. His constant confidence, persistent questioning and deep knowledge of the subject matter have been and will always be inspiring to me. Finally I would like to express my gratitude to all my families, friends, and those who supported me in any means to complete this project work.
T/Birhan Abraha
A.A.U
January, 2011
Abstract

We will present derivative free algorithms which optimize nonlinear unconstrained optimization problems of the following kind:

min f(x) over x ∈ ℝ^n, where f : ℝ^n → ℝ.

The algorithms developed for this type of problems are categorized as one-dimensional search methods (golden section and Fibonacci) and multidimensional search methods (Powell's method and trust region). These algorithms will, hopefully, find the value of x for which f(x) is the lowest.

The dimension n of the search space must be lower than some number (say 100). We do NOT have to know the derivatives of f(x). We must only have a code which evaluates f(x) for a given value of x. Each component of the vector x must be a continuous real parameter of f(x).
Table of contents

Introduction
Chapter 1 Preliminary concepts
1.1 Nonlinear optimization methods
    Need for derivative free optimization
1.2 Overview of differentiable nonlinear unconstrained optimization methods
    1.2.1 The problem
    1.2.2 Solution concepts
    1.2.3 Basic concepts of the methods
    1.2.4 Necessary conditions for solutions of differentiable optimization problems
1.3 Methods of solving nonlinear differentiable optimization problems
    1.3.1 Gradient method (first order derivative method)
    1.3.2 Newton's method (second order derivative method)
    1.3.3 Line search methods
    1.3.4 Trust region methods
Chapter 2 Derivative Free Optimization Methods
2.1 What is derivative free optimization
2.2 Line search methods
    2.2.3 Interpolation Methods
2.3 Multidimensional search
    2.3.1 Univariate Method
    2.3.2 Pattern Directions
    2.3.4 Powell's Method
Chapter 3 Trust region methods
3.1 Trust region framework
3.2 Quadratic interpolation
Appendix (MATLAB codes)
References
Declaration Letter
I, Teklebirhan Abraha, declare that this project has been composed by me and that no part of
the project has formed the basis for the award of any Degree, Diploma, Associateship,
Fellowship or any other similar title to me.
T/Birhan Abraha
______
Addis Ababa University
January, 2011
Permission Letter
This is to certify that this project is compiled by Mr. Teklebirhan Abraha in the department of
Mathematics, College of Mathematics and Computational Sciences, Addis Ababa University, under my supervision.
Semu Mitku (Ph.D)
______
Addis Ababa University
January, 2011
Introduction

In the process of solving optimization problems, it is well known that expensive useful information is contained in the derivatives of the objective function one wishes to optimize. After all, the standard mathematical characterization of a local minimum, given by the first order necessary conditions, requires continuous differentiability of the function and that the first order derivatives be zero. However, for a variety of reasons there have always been many instances where (at least some) derivatives are unavailable or unreliable. Nevertheless, under such circumstances, it may still be desirable to carry out optimization [1].
Consequently, classes of nonlinear optimization techniques called derivative-free optimization methods are needed. In fact, we consider optimization techniques without derivatives as one of the most important and challenging areas in computational science and engineering. Derivative free optimization (DFO) is developed to date for solving small dimensional problems (less than 100 variables) in which the computation of an objective function is relatively expensive and the derivatives of the objective function are not available. Problems of this nature arise more and more in modern physical, chemical and econometric measurements and in engineering applications where computer simulation is employed for the evaluation of the objective function [6].
There are two important components of derivative free methods. The first is sampling better points in the iteration procedure; the other is searching appropriate subspaces where the chance of finding a minimum is relatively high. In order to be able to use the extensive convergence theory for derivative based methods, these derivative free methods need to satisfy some properties. For instance, to guarantee the convergence of a derivative free method, we need to ensure that the error in the gradient converges to zero when the trust region or line search steps are reduced. Hence a descent step will be found if the gradient of the model function is not zero at the current iterate.
The problem of minimizing a nonlinear function f : ℝ^n → ℝ of several variables, when the derivatives of the function are not available, is attempted to be solved by the derivative free methods (DFM). The formal statement of the above problem can be written as

    min f(x)
    s.t. a_i ≤ c_i(x) ≤ b_i  (i = 1, 2, …, m)          (*)
         x ∈ B ⊆ ℝ^n,

where ∇f(x) cannot be computed, or just does not exist, for every x. Here B is an arbitrary subset of ℝ^n, and x ∈ B is an easy constraint, while the functions c_i(x) (i = 1, 2, …, m) represent difficult constraints. By easy constraints we mean bound constraints on the variables, linear constraints or, more generally, nonlinear smooth constraints whose values and Jacobian matrix can be computed cheaply (easily).

Difficult constraints are nonlinear constraints whose values are expensive (difficult) to compute and whose derivatives are unavailable. As a brief review and preparation, we include some basic concepts of differentiable optimization in this project.
If the objective function f in (*) above is differentiable and x_0 is a local minimizer of f, then ∇f(x_0) = 0; conversely, if ∇f(x_0) = 0 and ∇²f(x_0) is positive definite, then x_0 is a local minimizer of f. But in most of the cases, finding algebraically a point x_0 such that ∇f(x_0) = 0 may be difficult, or computing ∇f(x) or ∇²f(x) may be expensive, as the function cannot (may not) be defined explicitly, or the function may not even be differentiable at all. In this case, numerical methods with derivative free algorithms are required. Methods such as line search and trust region methods will be discussed in this project, because line search without derivatives and trust-region algorithms are used to solve optimization problems without derivatives.
The first chapter of this project deals with nonlinear optimization problems and the methods of solving these problems. The second and the third chapters are on derivative free optimization methods. Particularly, in the second chapter we include line search methods which help us to solve one dimensional minimization problems, such as the golden section method and the Fibonacci method, and best known methods for multidimensional search, like Powell's method, are discussed as well. Finally, in the last chapter we include the trust region method, which is one of the methods for derivative free optimization.
Chapter 1 Preliminary concepts
1.1 Nonlinear optimization methods

Optimization methods can be classified as derivative based and derivative free methods, depending on their use of derivatives (or the absence of them) in the process of finding a solution.
Derivative based optimization methods are characterized by: explicit use of the derivatives of the objective function, the possibility of analytical solutions, and faster convergence. Derivative free optimization methods are characterized by: only objective function evaluations, no required derivative information, and the ability to handle noisy functions (since the methods rely only on function comparisons).
Need for derivative free optimization

Some of the reasons to apply derivative free optimization (DFO) methods are: growing sophistication of computer hardware and mathematical algorithms and software (which opens new possibilities for optimization); derivative evaluations that are costly and noisy (one cannot trust derivatives or approximate them by finite differences); binary codes (source codes not available or owned by a company), making automatic differentiation impossible to apply; legacy codes (written in the past and not maintained by the original author); and lack of sophistication of the user (the user needs improvement but wants to use something simple).
With the current state of the art, DFO methods can successfully address problems where:
• The evaluation of derivatives is expensive and/or computed with noise (and for which accurate finite difference derivative estimation is ruled out).
• The number of variables does not exceed, say, a hundred (in serial computation).
• The functions are not excessively nonsmooth.
• Rapid asymptotic convergence is not of primary importance.
• Only a few digits of accuracy are required.
It is hard to minimize nonconvex functions without derivatives; however, it is generally accepted that DFO methods have the ability to find "good" local optima.
1.2 Overview of differentiable nonlinear unconstrained optimization methods
1.2.1 The problem

Optimization problems can be divided into two large classes, namely constrained and unconstrained problems. The basic unconstrained optimization problem can be stated in its standard form as

    minimize f(x) subject to x ∈ ℝ^n,                 (1.1)

where f : ℝ^n → ℝ is the objective function. On the other hand, constrained optimization problems can be written as

    minimize f(x)                                      (1.2a)
    subject to x ∈ ℝ^n,
               c_i(x) ≤ 0, i ∈ I,                      (1.2b)
               c_i(x) = 0, i ∈ E,                      (1.2c)

where conditions (1.2b)-(1.2c) indicate the constraints. The disjoint index sets I and E correspond to the inequality and equality constraints, respectively, defined by the functions c_i : ℝ^n → ℝ, i ∈ I ∪ E; the set of points they describe is contained in ℝ^n and is also contained in the domain of f and of the c_i, i ∈ I ∪ E.

A point x is said to be feasible if it satisfies all the constraints, and the set of all feasible points is called the feasible set, denoted by ℱ.
The formulations (1.1) and (1.2) are called standard formulations due to the observation that

    max f(x) = −min(−f(x)).
1.2.2 Solution concepts

The solution of an optimization problem can be characterized by certain properties. In a minimization problem, we are looking for a point x* of ℝ^n such that

    f(x*) ≤ f(x) for all x ∈ ℝ^n.

Then x* is called a global minimizer, whereas f(x*) is the global minimum of f. Similarly, in a constrained problem, the solution must lie in the feasible set ℱ, and thus a global constrained minimizer x* satisfies

    f(x*) ≤ f(x) for all x ∈ ℱ.

However, in both cases, finding a global minimizer of a function f can prove to be very difficult in practice. It might be interesting, thus, to look for a solution x* in a neighborhood N of x*, such that

    f(x*) ≤ f(x), for all (feasible) x in N.           (1.3)

The point x* is then called a local minimizer, whereas f(x*) is a local minimum of f in N. If x* is such that

    f(x*) < f(x), for all (feasible) x ∈ N, x ≠ x*,    (1.4)

then x* is said to be a strict local minimizer, and f(x*) is a strict local minimum of f in N.
1.2.3 Basic concepts of the methods

Derivatives and Taylor's Theorem

All methods considered here are based on the fact that the function f to be minimized has derivatives. If f is differentiable and all derivatives of f are continuous with respect to x, then we say that f is continuously differentiable, denoted f ∈ C¹. If all the second partial derivatives of f exist, then f is said to be twice differentiable. If, furthermore, all second partial derivatives of f are continuous, we say that f is twice continuously differentiable, denoted f ∈ C².

The gradient of f is a vector that groups all its partial derivatives and is denoted by

    ∇f(x) = (∂f(x)/∂x_1, …, ∂f(x)/∂x_n)^T.

The n × n matrix of second partial derivatives,

    ∇²f(x) = [∂²f(x)/∂x_i∂x_j]  (i, j = 1, …, n),

is called the Hessian matrix of f. The curvature of f at x along a direction d ∈ ℝ^n is given by

    ⟨d, ∇²f(x)d⟩ / ||d||².

If ∇²f(x) is positive semidefinite for every x in the domain of f, we say that f is a convex function. If ∇²f(x) is positive definite in its domain, we say that f is strictly convex.
Theorem 1.1 (Taylor's Theorem)

Let f : ℝ^n → ℝ be continuously differentiable and s ∈ ℝ^n. Then there exists some t ∈ (0, 1) such that

    f(x + s) = f(x) + ⟨∇f(x + ts), s⟩.

Moreover, if f is twice continuously differentiable, then

    ∇f(x + s) = ∇f(x) + ∫₀¹ ∇²f(x + ts)s dt.

Furthermore, we have that, for some t ∈ (0, 1),

    f(x + s) = f(x) + ⟨∇f(x), s⟩ + (1/2)⟨s, ∇²f(x + ts)s⟩.        (1.5)

A function f : ℝ^n → ℝ is said to be differentiable if all its partial derivatives ∂f(x)/∂x_i exist.
1.2.4 Necessary conditions for solutions of differentiable nonlinear optimization problems (the case of unconstrained problems)

All the unconstrained minimization methods are iterative in nature and hence they start from an initial trial solution and proceed toward the minimum point in a sequential manner. The iterative process is given by

    x_{i+1} = x_i + α_i* s_i,                          (1.6)

where x_i is the starting point, s_i is the search direction, α_i* is the optimal step length, and x_{i+1} is the final point in iteration i. It is important to note that all the unconstrained minimization methods (1) require an initial point x_1 to start the iterative procedure, and (2) differ from one another only in the method of generating the new point x_{i+1} (from x_i) and in testing the point x_{i+1} for optimality.

Rate of Convergence

Different iterative optimization methods have different rates of convergence. In general, an optimization method is said to have convergence of order p if
    ||x_{k+1} − x*|| / ||x_k − x*||^p ≤ c,  c ≥ 0, p ≥ 1,          (1.7)

where x_k and x_{k+1} denote the points obtained at the end of iterations k and k + 1, respectively, x* represents the optimum point, and ||x|| denotes the length or norm of the vector x:

    ||x|| = (x_1² + x_2² + ⋯ + x_n²)^{1/2}.

If p = 1 and 0 ≤ c ≤ 1, the method is said to be linearly convergent (corresponds to slow convergence). If p = 2, the method is said to be quadratically convergent (corresponds to fast convergence). An optimization method is said to have superlinear convergence (corresponds to fast convergence) if

    lim_{k→∞} ||x_{k+1} − x*|| / ||x_k − x*|| = 0.                 (1.8)

The definitions of rates of convergence given in Eqs. (1.7) and (1.8) are applicable to single-variable as well as multivariable optimization problems. In the case of single-variable problems, the vector x_k degenerates to a scalar.
Theorem 1.2 (First-Order Necessary Conditions)

If x* is a local minimizer of f : ℝ^n → ℝ, where f is continuously differentiable in an open neighborhood N of x*, then

    ∇f(x*) = 0.                                        (1.9)

If ∇²f exists and is continuous in a neighborhood of x*, we can state another necessary condition satisfied by a local minimizer.
Theorem 1.3 (Second-Order Necessary Conditions)

If x* is a local minimizer of f, and f is twice continuously differentiable in an open neighborhood N of x*, then

    ∇f(x*) = 0 and ∇²f(x*) is positive semidefinite.   (1.10)

Any point x* that satisfies (1.9) is called a stationary point of f. Thus Theorem 1.2 states that any local minimizer must be a stationary point; it is important to note, however, that the converse is not necessarily true. Fortunately, if the next conditions, called sufficient conditions, are satisfied by a stationary point x*, they guarantee that it is a local minimizer.

Theorem 1.4 (Second-Order Sufficient Conditions)

Let f be twice continuously differentiable on an open neighborhood N of x*. If x* satisfies ∇f(x*) = 0 and ∇²f(x*) is positive definite, then x* is a strict local minimizer of f.

Note that the second-order sufficient conditions are not necessary: a point x* can be a strict local minimizer and fail to satisfy the sufficient conditions.
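As a small added illustration (the example function here is assumed, not taken from the text), these conditions can be checked numerically in MATLAB by inspecting the gradient and the eigenvalues of the Hessian at a candidate point:

    % Classify the stationary point x* = (0,0) of the assumed example
    % f(x1,x2) = x1^2 - x2^2, whose gradient vanishes at the origin.
    g  = [0; 0];                   % gradient of f at x*
    H  = [2 0; 0 -2];              % Hessian of f (constant for a quadratic)
    ev = eig(H);
    if norm(g) < 1e-10 && all(ev > 0)
        disp('strict local minimizer: second-order sufficient conditions hold')
    elseif norm(g) < 1e-10 && any(ev < 0)
        disp('stationary point but not a minimizer (a saddle point)')
    end
    % Here ev = [-2; 2], so the origin is a saddle point of this f.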
1.3 Methods of solving nonlinear differentiable optimization problems
1.3.1 Gradient method (first order derivative method)

Mathematical programming is concerned with methods that can be used to solve optimization problems. In practice we will be concerned with algorithms, defined so that their computational implementation finds either an approximate or an exact solution to the original mathematical programming problem [7].
These algorithms are mostly iterative methods, which choose a starting point x_0 and use some rule to compute a sequence of points {x_1, x_2, …, x_k, …} such that lim_{k→∞} x_k = x*, where x* is the solution to the problem.

Here we are mostly interested in unconstrained problems. The simplest method developed for the solution of a minimization problem is the steepest descent method. This method is based on the fact that, from any starting point x, the direction in which any function f decreases most rapidly is the direction −∇f(x).

Definition 1.1 (descent direction). Let f : ℝ^n → ℝ be differentiable at x̄. A vector d ∈ ℝ^n is a descent direction of f at x̄ if there exists δ > 0 such that f(x̄ + λd) < f(x̄) for all λ ∈ (0, δ].

Consider

    min f(x), x ∈ ℝ^n, where f : ℝ^n → ℝ.

If f is continuously differentiable at x_k ∈ ℝ^n and ∇f(x_k) ≠ 0, then there are infinitely many descent directions of f at x_k; that is, there exists d ∈ ℝ^n \ {0} such that ⟨∇f(x_k), d⟩ < 0.

To find the direction in which f decreases most rapidly, we solve the problem

    min ⟨∇f(x_k), d⟩ with ||d|| = 1

to get

    d = −∇f(x_k) / ||∇f(x_k)||.

Now d_k = −∇f(x_k), or d_k = −∇f(x_k)/||∇f(x_k)||, will be used as the searching direction. This method is called the steepest descent method. The following algorithm was given by Cauchy [8].
Algorithm (Method of steepest descent)

Given f : ℝ^n → ℝ continuously differentiable, at each iteration k find the lowest point of f in the direction −∇f(x_k) from x_k; that is, find α_k that solves

    min_{α>0} f(x_k − α∇f(x_k)),

then set x_{k+1} = x_k − α_k ∇f(x_k), and continue until ∇f(x_k) = 0.

If ∇f(x_k) = 0, then x_{k+1} = x_k and the algorithm stops. However, x_k may be either a local minimizer or a saddle point; i.e., ∇²f(x_k) is either positive semidefinite or indefinite. We know from the second-order Taylor expansion that, for sufficiently small α, at x_k

    f(x_k + αd) = f(x_k) + α∇f(x_k)^T d + (1/2)α² d^T ∇²f(x_k)d.

If ∇f(x_k) = 0 and the searching direction d is an eigenvector corresponding to a negative eigenvalue λ of ∇²f(x_k), we have

    f(x_k + αd) = f(x_k) + (1/2)α²λ||d||² ≤ f(x_k).

This implies that f(x_k + αd) < f(x_k) whenever α ≠ 0.

The steepest descent method is a line search method that moves along d_k = −∇f(x_k); at each step it can choose the step length α_k in various ways. One advantage of this method is that it requires only the calculation of ∇f(x_k); it does not require second derivatives. However, it can be slow on difficult problems: this method usually works quite well only during the early steps of the optimization process, depending on the point of initialization. The method of steepest descent is the simplest of the gradient methods. The choice of direction is where f decreases most quickly, which is the direction opposite to ∇f(x_k). The search starts at an arbitrary point x_0 and then slides down the gradient until we are close enough to the solution. The iterative procedure is

    x_{k+1} = x_k − α_k ∇f(x_k),

where ∇f(x_k) is the gradient at the given point x_k.

Now the question is how big the step taken in that direction should be; that is, what is the value of α_k? Obviously, we want to move to the point where the function f takes on a minimum value, which is where the directional derivative is zero. The directional derivative is given by

    d/dα_k f(x_{k+1}) = ∇f(x_{k+1})^T (d/dα_k x_{k+1}) = −∇f(x_{k+1})^T ∇f(x_k).

Setting this expression to zero, we see that α_k should be chosen so that ∇f(x_{k+1}) and ∇f(x_k) are orthogonal. The next step is then taken in the direction of the negative gradient at this new point, and we may get a zigzag pattern.
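The following minimal MATLAB sketch (added for illustration; the quadratic test problem is assumed, chosen because the exact minimizing step along −∇f(x_k) is then available in closed form) shows the resulting iteration:

    % Steepest descent with exact line search on f(x) = (1/2)x'Qx - b'x.
    Q = [4 1; 1 2];  b = [1; 1];       % assumed data, Q positive definite
    x = [2; 3];                        % starting point
    for k = 1:100
        g = Q*x - b;                   % gradient at the current iterate
        if norm(g) < 1e-8, break; end  % stop at (approximate) stationarity
        alpha = (g'*g)/(g'*Q*g);       % exact minimizer of f(x - alpha*g)
        x = x - alpha*g;               % successive gradients are orthogonal
    end
    disp(x')                           % approaches the solution of Qx = b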
However, the steepest descent method can be extremely slow for some problems. There are, fortunately, several other methods that work very well in practice, and here we present some of them.
1.3.2 Newton's method (second order derivative method)

Consider any nonlinear system of equations of the form

    F(x) = 0,                                          (1.14)

where F : ℝ^n → ℝ^n. If the Jacobian of F exists, then we can write the first-order Taylor approximation of this function as

    F(x + s) ≈ F(x) + J(x)s,                           (1.15)

where J(x) denotes the Jacobian of F evaluated at x. From these equations we can derive an iterative method. Given an initial point x_0, at each iteration k we compute a new iterate x_{k+1} = x_k + s_k such that F(x_k) + J(x_k)s_k = 0, which means that s_k must satisfy the linear system

    J(x_k)s_k = −F(x_k).

This is called Newton's method for solving nonlinear systems of equations.

Now, returning to our optimization problem, we note that when the first and second derivatives of f are available, we can use Newton's method to solve the (possibly nonlinear) system of equations defined by

    ∇f(x) = 0,                                         (1.16)

since we know from Theorem 1.2 above that any minimizer of f must satisfy this condition. This is the basis of Newton's method for optimization problems and, with some variations, it is the basis of many other methods in unconstrained optimization. More formally, if we want to apply this method to equation (1.16), we also know that the second-order Taylor approximation of f at x_k is

    f(x_k + s_k) ≈ f(x_k) + ⟨∇f(x_k), s_k⟩ + (1/2)⟨s_k, ∇²f(x_k)s_k⟩.        (1.17)

In order to find a minimum of this function, we try to find a solution to ∇f(x_k + s_k) = 0, which is equivalent to

    ∇f(x_k) + ∇²f(x_k)s_k = 0.

Thus we have that s_k must satisfy the so-called Newton equations

    ∇²f(x_k)s_k = −∇f(x_k).                            (1.18)
If ∇²f(x_k) is positive definite (PD), then we can find its inverse, and the solution to (1.18) is

    s_k = −[∇²f(x_k)]⁻¹ ∇f(x_k).                       (1.19)

This direction s_k is called the Newton direction.

If ∇²f(x_k) is not positive definite, the Newton direction is not suitable as a search direction. One strategy to overcome this problem is a modification of the Hessian matrix ∇²f(x_k), during or before the solution of ∇²f(x_k)s_k = −∇f(x_k), so that it becomes positive definite.
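A minimal MATLAB sketch of this modification strategy follows; the test function and the simple doubling rule for the shift μ are assumptions made for illustration (in practice the modified Newton step is also combined with a line search):

    % Modified Newton iteration: shift the Hessian by mu*I until it is
    % safely positive definite, then solve the Newton equations (1.18).
    f = @(x) x(1)^4 + x(1)*x(2) + (1 + x(2))^2;       % assumed test function
    g = @(x) [4*x(1)^3 + x(2); x(1) + 2*(1 + x(2))];  % its gradient
    H = @(x) [12*x(1)^2, 1; 1, 2];                    % its Hessian
    x = [0; 0];                       % here the Hessian is indefinite
    for k = 1:50
        gk = g(x);
        if norm(gk) < 1e-10, break; end
        Hk = H(x);  mu = 0;           % mu stays 0 if Hk is safely PD
        while min(eig(Hk + mu*eye(2))) <= 1e-8
            mu = max(2*mu, 1e-4);     % increase the shift until PD
        end
        x = x - (Hk + mu*eye(2)) \ gk;   % (modified) Newton step
    end
    disp(x')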
One of the problems with Newton's method is that it is based on a necessary first-order optimality condition (namely, that the gradient of the objective function be equal to zero). In order to guarantee that we have found a minimizer x* of f, it is also necessary to guarantee that the Hessian ∇²f(x*) be positive definite. Moreover, the approximations (1.15) and (1.17) are only valid in a neighborhood of the solution of (1.14) and (1.16), respectively.

Thus Newton's method is only appropriate when the starting point x_0 is sufficiently close to the solution x*. However, when it works, it is very fast, and most optimization methods try to mimic its behavior around the solution.

There are so-called globalization techniques that can be used to guarantee the convergence of Newton's method from any starting point. These techniques give rise to different methods, which can be divided into two classes: line search methods and trust region methods. The main difference between these two classes is that in line search methods the direction in which we choose to take our next iterate is selected first, while the size of the step to be taken in this direction is computed with the direction fixed. On the other hand, in trust region methods the step size and the direction are chosen more or less simultaneously [7]. The two strategies are discussed below in more detail.
1.3.3 Line search methods

Although Newton's direction is effective as a search direction, the method may not converge to the solution: the local model may not be a good representative of the given (objective) function. Hence we have to backtrack. The strategy we consider for proceeding from a solution estimate outside the convergence region of Newton's method is the method of line searches.

As the name suggests, the idea behind line search methods is to find a step size along a certain line which gives us a good reduction of the function value, while being reasonably inexpensive to compute. More formally, they are iterative methods that, at every step, choose a certain descent direction and move along this direction. Each iteration of a line search method computes a search direction p_k and then decides how far to move along that direction. The iteration is given by

    x_{k+1} = x_k + α_k p_k,

where the positive scalar α_k is called the step length, and p_k can be chosen as, e.g., the Newton direction given by (1.19).

Moreover, the search direction often has the form

    p_k = −B_k⁻¹ ∇f(x_k),                              (1.20)

where B_k is a symmetric and nonsingular matrix. In the steepest descent method B_k is simply the identity matrix I, while in Newton's method B_k is the exact Hessian ∇²f(x_k). In quasi-Newton methods, B_k is an approximation to the Hessian that is updated at every iteration by means of a low-rank formula. When p_k is defined by (1.20) and B_k is positive definite, we have

    ⟨p_k, ∇f(x_k)⟩ = −∇f(x_k)^T B_k⁻¹ ∇f(x_k) < 0,

and therefore p_k is a descent direction.

In the following we study how to choose the matrix B_k or, more generally, how to compute the search direction, and we give careful consideration to the choice of the step length parameter α_k.

Step length: the ideal choice of the step length would be the global minimizer α_k of the univariate function φ(·) defined by

    φ(α) = f(x_k + αp_k), α > 0,                       (1.21)

but, in general, it is too expensive to identify this value. Even finding a local minimizer of φ to moderate precision generally requires too many evaluations of the objective function f and possibly of the gradient ∇f.
In order to ensure that even an approximate solution is enough to guarantee the convergence of the line search method to a minimizer of the objective function, some conditions are imposed on the step length at each iteration. One very important condition that must be satisfied is that the decrease in the objective function is not too small. One way of measuring this is by using the following inequality:

    f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ⟨∇f(x_k), p_k⟩, for some c_1 ∈ (0, 1).      (1.22)

This condition is sometimes called the Armijo condition, and it states that the reduction in f should be proportional to the step length α_k and to the directional derivative of f. On the other hand,
we must also guarantee that the step is not too short. Indeed, condition (1.22) is satisfied for all sufficiently small values of α. One way of enforcing this is by imposing a curvature condition, which requires that α_k satisfy

    ⟨∇f(x_k + α_k p_k), p_k⟩ ≥ c_2 ⟨∇f(x_k), p_k⟩, for some c_2 ∈ (c_1, 1).        (1.23)

Conditions (1.22) and (1.23) are known as the Wolfe conditions.
Remark: for every function f that is smooth and bounded below, there exist step lengths that satisfy the Wolfe conditions.
These conditions on the step length are very important in practice and are widely used in the line search methods.
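For concreteness (an added illustration with assumed data), checking the two Wolfe conditions for a candidate step length takes only a few lines of MATLAB:

    % Check the Wolfe conditions (1.22)-(1.23) on the assumed f(x) = x'x.
    f  = @(x) x'*x;   gr = @(x) 2*x;
    xk = [1; 1];  pk = -gr(xk);               % steepest descent direction
    c1 = 1e-4;  c2 = 0.9;  alpha = 0.4;       % assumed constants, trial step
    armijo    = f(xk + alpha*pk) <= f(xk) + c1*alpha*(gr(xk)'*pk);
    curvature = gr(xk + alpha*pk)'*pk >= c2*(gr(xk)'*pk);
    fprintf('Armijo: %d, curvature: %d\n', armijo, curvature)   % both hold here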
Algorithm (line search)

Given a descent direction p_k, set x_{k+1} = x_k + α_k p_k for some α_k > 0 that makes x_{k+1} an acceptable iterate. However, rather than setting p_k = −∇f(x_k), we will use Newton's direction or a modification of it, for instance p_k = −H_k⁻¹∇f(x_k), where H_k = ∇²f(x_k) + μ_k I is positive definite and μ_k = 0 if ∇²f(x_k) is safely positive definite, in order to retain its fast local convergence. The term line search refers to the choice of α_k in this algorithm. The common procedure is to use α_k = 1 (the full quasi-Newton step); if it fails, we backtrack in a systematic way along the same direction. The common-sense requirement is that f(x_{k+1}) < f(x_k), but this simple condition does not guarantee that {x_k} will converge to the minimizer x* of f. If very small reductions in f are taken relative to the lengths of the steps, the limiting value of f(x_k) may not be a local minimum [8].

For example, let f(x) = x² and x_0 = 2, and choose {p_k} = {(−1)^{k+1}} and {α_k} = {2 + 3(2^{−k−1})}. Then

    {x_k} = {2, −3/2, 5/4, −9/8, …}, i.e. x_k = (−1)^k(1 + 2^{−k}).

Each p_k is a descent direction at x_k, and f(x_{k+1}) < f(x_k), so f(x_k) is monotonically decreasing, with

    lim_{k→∞} f(x_k) = 1,

which is not the minimum of f; moreover, {x_k} has the two limit points ±1, so it does not converge. We fix this by requiring that the average rate of decrease from f(x_k) to f(x_{k+1}) be at least some prescribed fraction of the initial rate; that is, we pick c ∈ (0, 1) and choose α_k among those α > 0 that satisfy

    f(x_k + α_k p_k) ≤ f(x_k) + cα_k ∇f(x_k)^T p_k.    (1.24)

It is also possible that, if the steps are too small relative to the initial rate of decrease of f, then the limiting value of {x_k} may not be a local minimizer of f. For example, taking the same function with the same initial estimate, let us take {p_k} = {−1} and {α_k} = {2^{−(k+1)}}. Then the sequence is {x_k} = {2, 3/2, 5/4, 9/8, …}, i.e. x_k = 1 + 2^{−k}; each p_k is a descent direction at x_k, and f(x_k) is monotonically decreasing, with

    lim_{k→∞} x_k = 1,

which is not the minimizer of f. In the above examples we see monotonically decreasing sequences of iterates that do not converge to the minimizer. To ensure sufficiently large steps, we require that the rate of decrease of f in the direction p_k at x_{k+1} be larger than some prescribed fraction of the rate of decrease of f in the direction p_k at x_k:

    ∇f(x_{k+1})^T p_k ≥ β∇f(x_k)^T p_k, for some β ∈ (c, 1).      (1.25)

But this condition is not necessarily required, because the use of the backtracking strategy avoids excessively small steps.
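The first counterexample above is easy to verify numerically; this short MATLAB check (added for illustration) reproduces it:

    % f(x) = x^2, x0 = 2, p_k = (-1)^(k+1), alpha_k = 2 + 3*2^(-k-1):
    % every step decreases f, yet f(x_k) -> 1 and the iterates oscillate
    % near +1 and -1, never approaching the true minimizer x* = 0.
    x = 2;
    for k = 0:20
        p = (-1)^(k+1);               % a descent direction at x_k
        alpha = 2 + 3*2^(-k-1);       % step lengths that are "too long"
        x = x + alpha*p;
    end
    fprintf('x = %.7f, f(x) = %.7f\n', x, x^2)  % x near -1, f(x) near 1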
Step selection by backtracking
Here the strategy is to start with α_k = 1 and then, if x_{k+1} = x_k + α_k p_k is not acceptable, to backtrack (reduce α_k) until an acceptable x_{k+1} = x_k + α_k p_k is found.

Example 1.1: consider the problem

    minimize f(x_1, x_2) = 5x_1² + 7x_2² − 3x_1x_2.

Let x_k = (2, 3) and p_k = (−5, −7), so that f(x_k) = 65. If α_k = 1, then

    f(x_k + α_k p_k) = f(−3, −4) = 121 > 65 = f(x_k),

so this is not an acceptable step length. If α_k = 1/2, then x_k + α_k p_k = (−1/2, −1/2) and f(x_k + α_k p_k) = 9/4, so this step length produces a decrease in the function value, as desired. The framework of this algorithm is given below.
Algorithm (backtracking line search framework)

Given c ∈ (0, 1/2) and 0 < l < u < 1, set α_k = 1. While f(x_k + α_k p_k) > f(x_k) + cα_k ∇f(x_k)^T p_k, set α_k := ρα_k for some ρ ∈ [l, u], where ρ is chosen at each iteration of the backtracking. Then set x_{k+1} = x_k + α_k p_k. The constant c is set to be very small (typically on the order of 10⁻⁴), so that hardly more than a simple decrease in the function value is required [8].
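A MATLAB sketch of this framework, run on the data of Example 1.1, is shown below; the choices c = 10⁻⁴ and the fixed contraction factor ρ = 1/2 are assumptions made for the illustration:

    % Backtracking line search on f(x1,x2) = 5x1^2 + 7x2^2 - 3x1x2
    % with x_k = (2,3) and descent direction p_k = (-5,-7) (Example 1.1).
    f = @(x) 5*x(1)^2 + 7*x(2)^2 - 3*x(1)*x(2);
    g = @(x) [10*x(1) - 3*x(2); 14*x(2) - 3*x(1)];
    xk = [2; 3];  pk = [-5; -7];
    c = 1e-4;  rho = 0.5;  alpha = 1;
    while f(xk + alpha*pk) > f(xk) + c*alpha*(g(xk)'*pk)
        alpha = rho*alpha;            % backtrack: reduce the step length
    end
    fprintf('alpha = %g, f = %g\n', alpha, f(xk + alpha*pk))
    % alpha = 1 is rejected (f = 121 > 65); alpha = 1/2 is accepted (f = 9/4).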
Strategy for choosing α_k

Define φ(α) = f(x_k + αp_k), which is the one-dimensional restriction of f to the line through x_k in the direction p_k. If we need to backtrack, we use our most current information about φ to model it, and then take the value of α that minimizes the model as the next value of α_k in the above algorithm.

Initially we have

    φ(0) = f(x_k) and φ′(0) = ∇f(x_k)^T p_k,           (1.26)

    φ(1) = f(x_k + p_k).                               (1.27)

So if f(x_k + p_k) does not satisfy (1.24), that is, φ(1) > φ(0) + cφ′(0), then we model φ(α) by the one-dimensional quadratic model satisfying (1.26) and (1.27),

    m(α) = [φ(1) − φ(0) − φ′(0)]α² + φ′(0)α + φ(0),

and calculate the point

    α̂ = −φ′(0) / (2[φ(1) − φ(0) − φ′(0)]),

for which m′(α̂) = 0.

Now m″(α̂) > 0, since φ(1) > φ(0) + cφ′(0) > φ(0) + φ′(0); thus α̂ minimizes the model function. Furthermore, α̂ > 0 because φ′(0) < 0; therefore we take α̂ as our new value of α_k in the backtracking algorithm. We note that, since φ(1) > φ(0) + cφ′(0), we have

    α̂ < 1 / (2(1 − c)).

In fact, if φ(1) ≥ φ(0), then α̂ ≤ 1/2, and this gives an upper bound u = 1/2 on the first value of α_k in the algorithm. On the other hand, if φ(1) is much larger than φ(0), α̂ can be very small, so we impose a lower bound l = 1/10 in the algorithm; that is, at the first backtrack, if α̂ ≤ 0.1, then we take α_k = 1/10. We can use this quadratic model at the first backtrack; if φ(α_k) = f(x_k + α_k p_k) still does not satisfy (1.24), in this case we need to backtrack again.
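In MATLAB, one quadratic backtracking step amounts to a few lines; the numbers below reuse Example 1.1 (an added illustration, with the safeguards l = 1/10 and u = 1/2 applied at the end):

    % One quadratic-model backtracking step from phi(0), phi'(0), phi(1).
    phi0  = 65;      % phi(0)  = f(x_k)                (Example 1.1)
    dphi0 = -307;    % phi'(0) = grad f(x_k)' * p_k
    phi1  = 121;     % phi(1)  = f(x_k + p_k), which fails (1.24)
    alpha_hat = -dphi0 / (2*(phi1 - phi0 - dphi0));   % minimizer of the model
    alpha_hat = min(max(alpha_hat, 0.1), 0.5)         % safeguarded; about 0.423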
Although we could use the quadratic model again, at this point we have four pieces of information about φ, namely φ(0), φ′(0) and the last two computed values of φ(α). So at this and any subsequent backtrack during the current iteration we use a cubic model of φ. The calculation of α̂ proceeds as follows. Let α_prev and α_2prev be the last two previous values of α_k. Then the cubic model that fits φ(0), φ′(0), φ(α_prev) and φ(α_2prev) is

    m(α) = aα³ + bα² + φ′(0)α + φ(0),

where, writing Δ_1 = φ(α_prev) − φ(0) − φ′(0)α_prev and Δ_2 = φ(α_2prev) − φ(0) − φ′(0)α_2prev,

    a = [Δ_1/α_prev² − Δ_2/α_2prev²] / (α_prev − α_2prev),
    b = [−α_2prev·Δ_1/α_prev² + α_prev·Δ_2/α_2prev²] / (α_prev − α_2prev).

Its local minimizing point is

    α̂ = (−b + (b² − 3aφ′(0))^{1/2}) / (3a).

It can be shown that α̂ is always real if c < 1/4. Finally, we impose an upper bound u = 1/2 and a lower bound l = 1/10 relative to the previous step length; this
means that if α̂ > (1/2)α_prev, we set the new α_k = (1/2)α_prev, and if α̂ < (1/10)α_prev, then we use the new α_k = (1/10)α_prev.

1.3.4 Trust region methods

The concept of the trust region first appeared in papers of Levenberg (1944) and Marquardt (1963) for solving nonlinear least squares problems. Trust region methods are iterative numerical procedures, like the line search methods, in which an approximation of the objective function f(x) by a model m_k(p) is computed in a neighborhood of the current iterate x_k, which we refer to as the trust region. The model m_k(p) should be constructed so that it is easier to handle than f(x) itself. Let us assume for this that our function f is of class C² [7].
We solve the following subproblem to obtain the next iterate at each step k of a trust region method:

    min_p m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p subject to ||p|| ≤ Δ_k,

where Δ_k > 0 is the trust region radius and ||·|| is defined to be the Euclidean norm. These subproblems are constrained optimization problems in which the objective function and the constraint are both quadratic. The constraint is a quadratic inequality constraint and can be written as −p^T p + Δ_k² ≥ 0. In fact, usually the model m_k(p) is a quadratic function which is truncated from a Taylor series of f around the point x_k:

    m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p  (k ∈ N_0),

where N_0 = {0, 1, 2, …}.

We note that one can choose any other norm in the formulations. In this project we consider the Euclidean norm ||·|| = ||·||_2, since it makes some computations easier. Hence our trust region for the model m_k(p) is a bounded neighborhood of the current iterate x_k:

    B(x_k) = {x_k + p | ||p||_2 ≤ Δ_k}.
After constructing the model m_k(p) and its trust region, one seeks a trial step p_k to the next iterate x_{k+1} = x_k + p_k which will result in a reduction of the model, while the size of the step is bounded by B(x_k), that is, ||p_k||_2 ≤ Δ_k. Then the objective function is evaluated at x_k + p_k to compare its value to the one predicted by the model at this point. If the sufficient reduction predicted by the model is accomplished by the objective function, x_k + p_k is accepted as the next iterate and the trust region is possibly expanded to include this new point (Δ_k increases). If the reduction in the model is a poor predictor of the actual reduction of the objective function, then the trial point is rejected. We conclude that the region is too large, and the size of the trust region is reduced (Δ_k decreases), with the hope that the model provides a better prediction in the smaller region.
1.3.4.1 Outline and properties of the trust region algorithm

In a trust region algorithm, a strategy for determining the trust region radius Δ_k at each iteration needs to be developed. The trust region radius can be determined by looking at the agreement between the model function and the objective function at previous iterations. Given a step p_k, we define the ratio

    ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)) = actual reduction / predicted reduction.      (1.28)

There are various definitions of ρ_k in the mathematical literature, but we shall prefer the above one in particular here. We note that the denominator of ρ_k, namely the predicted reduction, is always nonnegative, since the step p_k is computed from the subproblem min m_k(p) over a region that includes the step p = 0. In fact, ρ_k can be seen as a measure of how well the model m_k predicts the reduction in the value of f. If ρ_k is close to 0 or negative, the actual reduction is smaller than the predicted one. This indicates that the model cannot be trusted in this region with radius Δ_k; thus p_k will be rejected and Δ_k will be reduced. On the other hand, if ρ_k is close to one, an adequate prediction is obtained and we can safely expand the trust region, since the model can be trusted over a wider region, so Δ_k should be increased. If ρ_k is positive but not close to 1, then the trust region radius is not changed.
Algorithm (Trust-Region Algorithm)

The steps in derivative based trust region methods can be summarized as follows [2].

1. Specify some initial guess of the solution x_0. Select the initial trust-region bound Δ_0 > 0. Specify the constants 0 < μ < η < 1 (perhaps μ = 1/4 and η = 3/4).

2. For k = 0, 1, …
(i) If x_k is optimal, stop.
(ii) Solve

    min_p m_k(p) = f(x_k) + ∇f(x_k)^T p + (1/2)p^T ∇²f(x_k)p subject to ||p|| ≤ Δ_k

for the trial step p_k.
(iii) Compute

    ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)) = actual reduction / predicted reduction.

(iv) If ρ_k ≤ μ, then x_{k+1} = x_k (unsuccessful step); else x_{k+1} = x_k + p_k (successful step).
(v) Update Δ_k:

    ρ_k ≤ μ      ⟹  Δ_{k+1} = (1/2)Δ_k,
    μ < ρ_k < η  ⟹  Δ_{k+1} = Δ_k,
    ρ_k ≥ η      ⟹  Δ_{k+1} = 2Δ_k.
The value of ρ_k indicates how well the model predicts the reduction in the function value. If ρ_k is small (that is, ρ_k ≤ μ), then the actual reduction in the function value is much smaller than that predicted by m_k(p_k), indicating that the model cannot be trusted for a bound as large as Δ_k; in this case the step will be rejected and Δ_k will be reduced. If ρ_k is large (that is, ρ_k ≥ η), then the model is adequately predicting the reduction in the function value, suggesting that the model can be trusted over an even wider region; in this case the bound Δ_k will be increased.
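To close the chapter, here is a minimal MATLAB sketch of the trust-region algorithm above on an assumed quadratic test problem; for simplicity, the subproblem in step (ii) is solved only approximately by the Cauchy point (the model minimizer along −∇f(x_k) within the ball), a common simplification rather than the method prescribed in the text:

    % Trust-region iteration with mu = 1/4, eta = 3/4 and Cauchy-point steps.
    f = @(x) 5*x(1)^2 + 7*x(2)^2 - 3*x(1)*x(2);     % assumed test function
    g = @(x) [10*x(1) - 3*x(2); 14*x(2) - 3*x(1)];
    B = [10 -3; -3 14];                              % its (constant) Hessian
    x = [2; 3];  Delta = 1;  mu = 0.25;  eta = 0.75;
    for k = 1:60
        gk = g(x);
        if norm(gk) < 1e-8, break; end
        gBg = gk'*B*gk;                              % curvature along -gk
        tau = 1;
        if gBg > 0, tau = min(1, norm(gk)^3/(Delta*gBg)); end
        p = -tau*(Delta/norm(gk))*gk;                % Cauchy step, ||p|| <= Delta
        rho = (f(x) - f(x + p)) / -(gk'*p + 0.5*p'*B*p);   % the ratio (1.28)
        if rho > mu, x = x + p; end                  % accept a successful step
        if rho <= mu, Delta = Delta/2;               % shrink the region
        elseif rho >= eta, Delta = 2*Delta; end      % expand the region
    end
    disp(x')                                         % tends to the minimizer (0,0)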
Chapter 2 Derivative Free Optimization Methods
Introduction (What is derivative free optimization)

Derivative Free Optimization (DFO) methods are typically designed to solve optimization problems whose objective function is computed by a "black box"; hence the gradient computation is unavailable. Each call to the "black box" is often expensive, so estimating derivatives by finite differences may be prohibitively costly. Finally, the objective function value may be computed with some noise, and the finite difference estimates may not be accurate [6].
The derivative free optimization method which we use approximates the objective function explicitly, without approximating its derivatives. The theoretical analysis presented in this project assumes that no noise is present. Extensive experiments and intuition support the claim that the robustness of DFO does not suffer from the presence of a moderate level of noise [1].
Derivative free optimization has been developed for solving optimization problems of the following form
    min f(x)
    s.t. a_i ≤ c_i(x) ≤ b_i  (i = 1, 2, …, m),
         x ∈ F ⊆ ℝ^n,

where the objective function f(x) and the constraints c_i(x) are expensive to compute at a given vector x, and the derivatives of f at x are not available.
The idea is to approximate the objective function by a model which is assumed to describe the objective function well in a trust region, without explicitly modeling its derivatives. This model is computationally less expensive to evaluate and easier to optimize than the objective function itself. The model is obtained by interpolating the objective function using a quadratic interpolation polynomial.
Derivative free optimization methods for unconstrained optimization build a linear or quadratic model of the objective function and apply one of the fundamental approaches of continuous optimization, i.e. a trust region or a line search, to optimize this model. While derivative based methods typically use a Taylor-based model which is an approximation of the objective function, derivative free methods use interpolation, regression or other sample-based models. If the problem has constraints, the strategy of derivative free methods is usually to apply sequential quadratic programming methods for the linearization of the constraints [6]. The main concern of this project is unconstrained nonlinear programming problems without using derivatives.
We now consider two basic methods in DFO: line search and trust region methods, both without using derivatives.
2.2 Line search methods

Line search, also called one-dimensional search, refers to an optimization procedure for solving nonlinear univariate problems. One-dimensional line search is the backbone of many algorithms for solving nonlinear programming problems. Many nonlinear programming algorithms proceed as follows: given a point x_k, find a direction vector d_k and then a suitable step size α_k, yielding a new point x_{k+1} = x_k + α_k d_k; the process is then repeated.

Finding the new step size α_k involves solving the subproblem of minimizing f(x_k + αd_k), which is a one-dimensional search problem in the variable α. The minimization is over all real α, over nonnegative α, or over α such that x_k + αd_k is feasible.

As stated before, in multivariable optimization algorithms, for a given x_k the iterative scheme is

    x_{k+1} = x_k + α_k d_k.                           (2.1)

The key is to find the direction vector d_k and a suitable step size α_k. Let

    φ(α) = f(x_k + αd_k).                              (2.2)

So the problem that departs from x_k and finds a step size α_k in the direction d_k such that

    φ(α_k) < φ(0)
is just a line search about α. If we find α_k such that the objective function in the direction d_k is minimized, i.e.,

    f(x_k + α_k d_k) = min_{α>0} f(x_k + αd_k),

or

    φ(α_k) = min_{α>0} φ(α),

such a line search is called an exact line search or optimal line search, and α_k is called the optimal step size. If we choose α_k such that the objective function has an acceptable descent amount, i.e., such that the descent f(x_k) − f(x_k + α_k d_k) > 0 is acceptable by users, such a line search is called an inexact line search, or approximate line search, or acceptable line search. Since, in practical computation, a theoretically exact optimal step size generally cannot be found, and it is also expensive to find an almost exact step size, the inexact line search with its lower computational load is highly popular.

The framework of a line search is as follows. First, determine or give an initial search interval which contains the minimizer; then employ some section techniques or interpolations to reduce the interval iteratively until the length of the interval is less than some given tolerance [10]. Next, we give a notation for the search interval and a simple method to determine the initial search interval.

Interval of uncertainty

Definition 2.1. Let φ : ℝ → ℝ, α* ∈ [0, +∞), and

    φ(α*) = min_{α≥0} φ(α).

If there exists a closed interval [a, b] ⊆ [0, +∞) such that α* ∈ [a, b], then [a, b] is called a search interval for the one-dimensional minimization problem min_{α≥0} φ(α). Since the exact location of the minimum of φ over [a, b] is not known, this interval is also called the interval of uncertainty.

A simple method to determine an initial interval is called the forward-backward method. The basic idea of this method is as follows. Given an initial point and an initial step length, we attempt to determine three points at which the function values show a "high-low-high" geometry. If it is not successful to go forward, we go backward. Concretely, given an initial point α_0 and a step length h_0 > 0: if

    φ(α_0 + h_0) < φ(α_0),

then, in the next step, we depart from α_0 + h_0 and continue going forward with a larger step until the objective function increases. If

    φ(α_0 + h_0) > φ(α_0),
then, in the next step, we depart from α_0 and go backward until the objective function increases. In this way we obtain an initial interval which contains the minimizer α*.

Algorithm 2.1.2 (Forward-Backward Method)

Step 1. Given α_0 ∈ [0, ∞), h_0 > 0 and the multiple coefficient t > 1 (usually t = 2). Evaluate φ(α_0), set k := 0.

Step 2. Compare the objective function values. Set α_{k+1} := α_k + h_k and evaluate φ_{k+1} = φ(α_{k+1}). If φ_{k+1} < φ_k, go to Step 3; otherwise, go to Step 4.

Step 3. Forward step. Set h_{k+1} := t·h_k, α := α_k, α_k := α_{k+1}, φ_k := φ_{k+1}, k := k + 1, and go to Step 2.

Step 4. Backward step. If k = 0, invert the search direction: set h_k := −h_k, α := α_{k+1}, and go to Step 2; otherwise, set

    a := min{α, α_{k+1}}, b := max{α, α_{k+1}},

output [a, b] and stop.
โ๐๐ โถ โโ๐๐ ๐ผ๐ผ๐๐ โถ ๐ผ๐ผ๐๐ = { , +1}, = { , +1}, Output [a, b] and stop. ๐๐ ๐๐๐๐๐๐ ๐ผ๐ผ ๐ผ๐ผ๐๐ ๐๐ ๐๐๐๐๐๐ ๐ผ๐ผ ๐ผ๐ผ๐๐ The methods of line search presented in this chapter use the unimodality of the function and interval. T he f ollowing de finitions and t heorem i ntroduce their c oncepts a nd properties. Definition 2.2 Let ,[ , ] . If there [ , ] such that โ ( ) is strictly decreasing๐๐ โถ โ โ onโ [ ๐๐, ๐๐ ] andโ โ strictly increasing๐๐๐๐ ๐ผ๐ผ โ on๐๐ [๐๐ , ], then โ โ ๐๐(๐ผ๐ผ) is called a unimodal function๐๐ ๐ผ๐ผ on [ , ]. Such an interval [ ๐ผ๐ผ, ]๐๐ is called a unimodal ๐๐interval๐ผ๐ผ related to ( ). ๐๐ ๐๐ ๐๐ ๐๐ The unimodal function๐๐ ๐ผ๐ผ can also be defined as follows. Definition 2.3 If there exists a unique [ , ], such that for any โ 1, 2 [ , ], 1 < 2, th e fo llowing๐ผ๐ผ โs tatements๐๐ ๐๐ h old: if 2 < , t hen ( 1) > โ ๐ผ๐ผ (๐ผ๐ผ2);โ if ๐๐1 ๐๐ > ๐ผ๐ผ , then๐ผ๐ผ ( 1) < ( 2); then ( ) is the unimodal๐ผ๐ผ ๐ผ๐ผfunction on๐๐ ๐ผ๐ผ[ , ]. โ ๐๐ ๐ผ๐ผ ๐ผ๐ผ ๐ผ๐ผ ๐๐ ๐ผ๐ผ ๐๐ ๐ผ๐ผ 26 ๐๐ ๐ผ๐ผ ๐๐ ๐๐
Note that, first, a unimodal function does not require continuity or differentiability; second, using the property of unimodal functions, we can exclude portions of the interval of uncertainty that do not contain the minimum, so that the interval of uncertainty is reduced. The following theorem shows that if the function φ is unimodal on [a, b], then the interval of uncertainty can be reduced by comparing the function values of φ at two points within the interval.

Theorem 2.1. Let φ : ℝ → ℝ be unimodal on [a, b]. Let α_1, α_2 ∈ [a, b] and α_1 < α_2. Then

1. if φ(α_1) ≤ φ(α_2), then [a, α_2] is a unimodal interval related to φ;
2. if φ(α_1) ≥ φ(α_2), then [α_1, b] is a unimodal interval related to φ.

Proof. From Definition 2.2, there exists α* ∈ [a, b] such that φ(α) is strictly decreasing over [a, α*] and strictly increasing over [α*, b]. Since φ(α_1) ≤ φ(α_2), then α* ∈ [a, α_2]. Since φ(α) is unimodal on [a, b], it is also unimodal on [a, α_2]. Therefore [a, α_2] is a unimodal interval related to φ(α), and the proof of the first part is complete. The second part of the theorem can be proved similarly.

This theorem indicates that, in order to reduce the interval of uncertainty, we must select at least two observations and evaluate and compare their function values.

Theorem 2.2. Let θ : ℝ → ℝ be strictly quasiconvex over the interval [a, b], and let λ, μ ∈ [a, b] be such that λ < μ. If θ(λ) > θ(μ), then θ(z) ≥ θ(μ) for all z ∈ [a, λ). If θ(λ) ≤ θ(μ), then θ(z) ≥ θ(λ) for all z ∈ (μ, b].

Proof: suppose that θ(λ) > θ(μ), and let z ∈ [a, λ). By contradiction, suppose that θ(z) < θ(μ). Since λ can be written as a convex combination of z and μ, by the strict quasiconvexity of θ we have θ(λ) < max{θ(z), θ(μ)} = θ(μ), contradicting θ(λ) > θ(μ). Hence θ(z) ≥ θ(μ). The second part of the theorem can be proved similarly.
From Theorem 2.2, under strict quasiconvexity, if θ(λ) > θ(μ), the new interval of uncertainty is [λ, b]; on the other hand, if θ(λ) ≤ θ(μ), the new interval of uncertainty is [a, μ].
The Golden Section Method and the Fibonacci Method

The golden section method and the Fibonacci method are section methods. Their basic idea for minimizing a unimodal function over [a, b] is iteratively reducing the interval of uncertainty by comparing the function values of the observations. When the length of the interval of uncertainty is reduced to some desired degree, the points of the interval can be regarded as approximations of the minimizer. Such a class of methods only needs to evaluate the function and has wide applications; in particular, it is suitable for nonsmooth problems and problems with complicated derivative expressions.
2.2.1.1 The golden section method

We now describe the more efficient golden section method for minimizing a strictly quasiconvex function. At a general iteration k of the golden section method, let the interval of uncertainty be [a_k, b_k]. Then, by the above theorem, the new interval of uncertainty [a_{k+1}, b_{k+1}] is given by [λ_k, b_k] if θ(λ_k) > θ(μ_k), and by [a_k, μ_k] if θ(λ_k) ≤ θ(μ_k). The points λ_k and μ_k are selected such that:

1) The length of the new interval of uncertainty, b_{k+1} − a_{k+1}, does not depend upon the outcome of the kth iteration, that is, on whether θ(λ_k) > θ(μ_k) or θ(λ_k) ≤ θ(μ_k). Therefore we must have b_k − λ_k = μ_k − a_k. Thus, if λ_k is of the form

    λ_k = a_k + (1 − τ)(b_k − a_k),                    (*)

where τ ∈ (0, 1), then μ_k must be of the form

    μ_k = a_k + τ(b_k − a_k),                          (**)

so that

    b_{k+1} − a_{k+1} = τ(b_k − a_k).

2) As λ_{k+1} and μ_{k+1} are selected for the purpose of a new iteration, either λ_{k+1} coincides with μ_k or μ_{k+1} coincides with λ_k. If this can be realized, then during iteration k + 1 only one extra observation is needed. Consider the following two cases.
Case 1: θ(λ_k) > θ(μ_k). In this case a_{k+1} = λ_k and b_{k+1} = b_k. To satisfy λ_{k+1} = μ_k, applying (*) with k replaced by k + 1, we get

    μ_k = λ_{k+1} = a_{k+1} + (1 − τ)(b_{k+1} − a_{k+1}) = λ_k + (1 − τ)(b_k − λ_k).

Substituting the expressions of λ_k and μ_k from (*) and (**) into the above equation, we get τ² + τ − 1 = 0.
Case 2: θ(λ_k) ≤ θ(μ_k). In this case a_{k+1} = a_k and b_{k+1} = μ_k. To satisfy μ_{k+1} = λ_k, applying (**) with k replaced by k + 1, we get

    λ_k = μ_{k+1} = a_{k+1} + τ(b_{k+1} − a_{k+1}) = a_k + τ(μ_k − a_k).

Noting (*) and (**), the above equation also gives τ² + τ − 1 = 0.
The roots of the equation τ² + τ − 1 = 0 are τ ≈ 0.618 and τ ≈ −1.618. Since τ must be in the interval (0, 1), τ ≈ 0.618.
To summarize, if at iteration k, λ_k and μ_k are chosen according to (*) and (**), where α = 0.618, then the interval of uncertainty is reduced by a factor of 0.618 at each iteration; after k iterations its length is (0.618)^k times the original length. At the first iteration, two observations are needed, at λ_1 and μ_1, but at each subsequent iteration only one evaluation is needed, since either λ_{k+1} = μ_k or μ_{k+1} = λ_k.
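Since each iteration multiplies the length of the interval of uncertainty by 0.618, the number of iterations required for a prescribed final length can be computed in advance. A small MATLAB check, using for illustration the data of Example 2.1 below:

% Number of golden section iterations needed to shrink an interval of
% initial length L0 to a final length of at most ell.
L0 = 8; ell = 0.2;                      % data of Example 2.1 below
n = ceil(log(ell/L0)/log(0.618))        % returns n = 8 iterations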
Algorithm (the golden section method)

The following is a summary of the golden section method for minimizing a strictly quasi-convex function f over the interval [a, b] [8].

Initialization step: Choose an allowable final length of uncertainty ℓ > 0. Let [a_1, b_1] be the initial interval of uncertainty, and let λ_1 = a_1 + (1 − α)(b_1 − a_1) and μ_1 = a_1 + α(b_1 − a_1), where α = 0.618. Evaluate f(λ_1) and f(μ_1), let k = 1, and go to the main step.

Step 1: If b_k − a_k < ℓ, stop: the optimal solution lies in the interval [a_k, b_k]. Otherwise, if f(λ_k) > f(μ_k), go to step 2; and if f(λ_k) ≤ f(μ_k), go to step 3.

Step 2: Let a_{k+1} = λ_k and b_{k+1} = b_k. Furthermore, let λ_{k+1} = μ_k, and let μ_{k+1} = a_{k+1} + α(b_{k+1} − a_{k+1}). Evaluate f(μ_{k+1}) and go to step 4.

Step 3: Let a_{k+1} = a_k and b_{k+1} = μ_k. Furthermore, let μ_{k+1} = λ_k, and let λ_{k+1} = a_{k+1} + (1 − α)(b_{k+1} − a_{k+1}). Evaluate f(λ_{k+1}) and go to step 4.

Step 4: Replace k by k + 1 and go to step 1.
Example 2.1: Consider the following problem:

minimize x² + 2x subject to −3 ≤ x ≤ 5.

Note that the true minimum is x = −1.0.

Table 2.1: Summary of computations for the above problem using the golden section method

Iteration k   a_k       b_k       λ_k       μ_k       f(λ_k)     f(μ_k)
1            −3.000     5.000     0.056     1.944     0.115*     7.667*
2            −3.000     1.944    −1.112     0.056    −0.987*     0.115
3            −3.000     0.056    −1.832    −1.112    −0.308*    −0.987
4            −1.832     0.056    −1.112    −0.664    −0.987     −0.887*
5            −1.832    −0.664    −1.384    −1.112    −0.853*    −0.987
6            −1.384    −0.664    −1.112    −0.936    −0.987     −0.996*
7            −1.112    −0.664    −0.936    −0.840    −0.996     −0.974*
8            −1.112    −0.840    −1.016    −0.936    −1.000*    −0.996
Clearly the function f to be minimized is strictly quasi-convex, and the initial interval of uncertainty is of length 8. We reduce this interval of uncertainty to one whose length is at most 0.2. The first two observations are located at

λ_1 = −3 + 0.382(8) = 0.056
μ_1 = −3 + 0.618(8) = 1.944

Note that f(λ_1) ≤ f(μ_1). Hence the new interval of uncertainty is [−3, 1.944]. The process is repeated, and the computations are summarized in Table 2.1. The values of f that are computed at each iteration are indicated by an asterisk.

After eight iterations involving nine observations, the interval of uncertainty is [−1.112, −0.936], so that the minimum can be estimated to be the midpoint −1.024.

The numerical result for Example 2.1 is −0.9998 (see the appendix for the MATLAB code).
2.2.1.2 The Fibonacci search

The Fibonacci search is a line search procedure for minimizing a strictly quasi-convex function f over a closed bounded interval. Similar to the golden section method, the Fibonacci search procedure makes two functional evaluations at the first iteration and then only one evaluation at each of the subsequent iterations. However, the procedure differs from the golden section method in that the reduction of the interval of uncertainty varies from one iteration to another [8].

The procedure is based on the Fibonacci sequence {F_v} defined as follows:

F_{v+1} = F_v + F_{v−1},  v = 1, 2, …   (2.3)

F_0 = F_1 = 1

The sequence is therefore 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …. At iteration k, suppose that the interval of uncertainty is [a_k, b_k]. Consider the two points λ_k and μ_k given below, where n is the total number of functional evaluations planned:
λ_k = a_k + (F_{n−k−1}/F_{n−k+1})(b_k − a_k),  k = 1, 2, …, n − 1   (2.4)

μ_k = a_k + (F_{n−k}/F_{n−k+1})(b_k − a_k),  k = 1, 2, …, n − 1   (2.5)

By Theorem 2.2, the new interval of uncertainty [a_{k+1}, b_{k+1}] is given by [λ_k, b_k] if f(λ_k) > f(μ_k), and by [a_k, μ_k] if f(λ_k) ≤ f(μ_k).

In the former case, noting (2.4) and letting v = n − k in (2.3), we get

b_{k+1} − a_{k+1} = b_k − λ_k = (b_k − a_k) − (F_{n−k−1}/F_{n−k+1})(b_k − a_k) = (F_{n−k}/F_{n−k+1})(b_k − a_k)   (2.6)

In the latter case, noting (2.5), we get

b_{k+1} − a_{k+1} = μ_k − a_k = (F_{n−k}/F_{n−k+1})(b_k − a_k)   (2.7)
Thus, in either case the interval of uncertainty is reduced by the factor F_{n−k}/F_{n−k+1}. We now show that at iteration k + 1, either λ_{k+1} = μ_k or μ_{k+1} = λ_k, so that only one functional evaluation is needed. Suppose that f(λ_k) > f(μ_k). Then, by Theorem 2.2, a_{k+1} = λ_k and b_{k+1} = b_k. Thus, applying (2.4) with k replaced by k + 1, we get

λ_{k+1} = a_{k+1} + (F_{n−k−2}/F_{n−k})(b_{k+1} − a_{k+1}) = λ_k + (F_{n−k−2}/F_{n−k})(b_k − λ_k)

Substituting for λ_k from (2.4), we get

λ_{k+1} = a_k + (F_{n−k−1}/F_{n−k+1})(b_k − a_k) + (F_{n−k−2}/F_{n−k})(1 − F_{n−k−1}/F_{n−k+1})(b_k − a_k)

Letting v = n − k in (2.3), it follows that 1 − F_{n−k−1}/F_{n−k+1} = F_{n−k}/F_{n−k+1}, so that

λ_{k+1} = a_k + ((F_{n−k−1} + F_{n−k−2})/F_{n−k+1})(b_k − a_k)

Now letting v = n − k − 1 in (2.3), and noting (2.5), it follows that λ_{k+1} = a_k + (F_{n−k}/F_{n−k+1})(b_k − a_k) = μ_k. Similarly, if f(λ_k) ≤ f(μ_k), then we can easily verify that μ_{k+1} = λ_k. Thus, in either case, only one observation is needed at iteration k + 1.
To summarize, at the first iteration two observations are made, and at each subsequent iteration only one observation is necessary. Thus, at the end of iteration n − 2, we have computed n − 1 functional evaluations. Further, letting k = n − 1, it follows from (2.4) and (2.5) that λ_{n−1} = μ_{n−1} = (1/2)(a_{n−1} + b_{n−1}). Since either λ_{n−1} = μ_{n−2} or μ_{n−1} = λ_{n−2}, theoretically no new observations are to be made at this stage. However, in order to further reduce the interval of uncertainty, the last observation is placed slightly to the right or to the left of the midpoint λ_{n−1} = μ_{n−1}, so that (1/2)(b_{n−1} − a_{n−1}) is the length of the final interval of uncertainty [a_n, b_n].
Algorithm (Fibonacci search method)

The following is a summary of the Fibonacci search for minimizing a strictly quasi-convex function f over the interval [a_1, b_1].

Initialization step: Choose an allowable final length of uncertainty ℓ > 0 and a distinguishability constant ε > 0. Let [a_1, b_1] be the initial interval of uncertainty, and choose the number n of observations to be taken such that F_n > (b_1 − a_1)/ℓ. Let λ_1 = a_1 + (F_{n−2}/F_n)(b_1 − a_1) and μ_1 = a_1 + (F_{n−1}/F_n)(b_1 − a_1). Evaluate f(λ_1) and f(μ_1), let k = 1, and go to the main steps.

Main steps:

1) If f(λ_k) > f(μ_k), go to step 2; and if f(λ_k) ≤ f(μ_k), go to step 3.

2) Let a_{k+1} = λ_k and b_{k+1} = b_k. Furthermore, let λ_{k+1} = μ_k, and let μ_{k+1} = a_{k+1} + (F_{n−k−1}/F_{n−k})(b_{k+1} − a_{k+1}). If k = n − 2, go to step 5; otherwise evaluate f(μ_{k+1}) and go to step 4.

3) Let a_{k+1} = a_k and b_{k+1} = μ_k. Furthermore, let μ_{k+1} = λ_k, and let λ_{k+1} = a_{k+1} + (F_{n−k−2}/F_{n−k})(b_{k+1} − a_{k+1}). If k = n − 2, go to step 5; otherwise evaluate f(λ_{k+1}) and go to step 4.

4) Replace k by k + 1 and go to step 1.

5) Let λ_n = λ_{n−1} and μ_n = λ_{n−1} + ε. If f(λ_n) > f(μ_n), let a_n = λ_n and b_n = b_{n−1}. Otherwise, if f(λ_n) ≤ f(μ_n), let a_n = a_{n−1} and b_n = λ_n. Stop; the optimal solution lies in the interval [a_n, b_n].
Example 2.2: Consider the following problem:

minimize x² + 2x subject to −3 ≤ x ≤ 5.

Note that the function is strictly quasi-convex on the interval and that the true minimum occurs at x = −1. We reduce the interval of uncertainty to one whose length is at most 0.2. Hence we must have F_n > 8/0.2 = 40, so that n = 9. We adopt the distinguishability constant ε = 0.01.

The first two observations are located at

λ_1 = −3 + (F_7/F_9)(8) = 0.054545,  μ_1 = −3 + (F_8/F_9)(8) = 1.945454.

Note that f(λ_1) < f(μ_1). Hence, the new interval of uncertainty is [−3.000000, 1.945454].
Table 2.2: Summary of computations for the Fibonacci search method

Iteration k   a_k          b_k          λ_k          μ_k          f(λ_k)       f(μ_k)
1            −3.000000     5.000000     0.054545     1.945454     0.112065*    7.675690*
2            −3.000000     1.945454    −1.109091     0.054545    −0.988099*    0.112065
3            −3.000000     0.054545    −1.836363    −1.109091    −0.300490*   −0.988099
4            −1.836363     0.054545    −1.109091    −0.672727    −0.988099    −0.892890*
5            −1.836363    −0.672727    −1.399999    −1.109091    −0.840000*   −0.988099
6            −1.399999    −0.672727    −1.109091    −0.963636    −0.988099    −0.998670*
7            −1.109091    −0.672727    −0.963636    −0.818182    −0.998670    −0.966940*
8            −1.109091    −0.818182    −0.963636    −0.963636    −0.998670    −0.998670
9            −1.109091    −0.963636    −0.963636    −0.953636    −0.998670    −0.997850*
The process is repeated, and the computations are summarized in Table 2.2. The values of f that are computed at each iteration are indicated by an asterisk. Note that at k = 8, λ_8 = μ_8 = −0.963636, so that no functional evaluations are needed at this stage. For k = 9, λ_9 = λ_8 = −0.963636 and μ_9 = λ_8 + ε = −0.953636. Since f(μ_9) > f(λ_9), the final interval of uncertainty [a_9, b_9] is [−1.109091, −0.963636], whose length is 0.145455. We approximate the minimum to be the midpoint −1.036364. Note from Example 2.1 that with the same number of observations, n = 9, the golden section method gave a final interval of uncertainty whose length is 0.176.
The Fibonacci method has the following limitations [9]:

1. The initial interval of uncertainty, in which the optimum lies, has to be known.
2. The function being optimized has to be unimodal in the initial interval of uncertainty.
3. The exact optimum cannot be located in this method. Only an interval known as the final interval of uncertainty will be known. The final interval of uncertainty can be made as small as desired by using more computations.
4. The number of function evaluations to be used in the search or the resolution required has to be specified beforehand.
2.2.3 Interpolation Methods

The interpolation methods were originally developed as one-dimensional searches within multivariable optimization techniques, and are generally more efficient than Fibonacci-type approaches. The aim of all the one-dimensional minimization methods is to find λ*, the smallest nonnegative value of λ, for which the function

f(λ) = f(X + λS)   (2.8)

attains a local minimum. Hence if the original function f(X) is expressible as an explicit function of x_i (i = 1, 2, …, n), we can readily write the expression for f(λ) = f(X + λS) for any specified vector S, set

df(λ)/dλ = 0   (2.9)

and solve Eq. (2.9) to find λ* in terms of X and S. However, in many practical problems, the function f(λ) cannot be expressed explicitly in terms of λ. In such cases the interpolation methods can be used to find the value of λ*.

Example 2.3 Derive the one-dimensional minimization problem for the following case:

Minimize f(X) = (x_1² − x_2)² + (1 − x_1)²   (E1)

from the starting point X_1 = (−2, −2)ᵀ along the search direction S = (1.00, 0.25)ᵀ.

SOLUTION The new design point X can be expressed as

X = (x_1, x_2)ᵀ = X_1 + λS = (−2 + λ, −2 + 0.25λ)ᵀ

By substituting x_1 = −2 + λ and x_2 = −2 + 0.25λ in Eq. (E1), we obtain f as a function of λ:

f(λ) = f(−2 + λ, −2 + 0.25λ) = [(−2 + λ)² − (−2 + 0.25λ)]² + [1 − (−2 + λ)]²
     = λ⁴ − 8.5λ³ + 31.0625λ² − 57.0λ + 45.0

The value of λ at which f(λ) attains a minimum gives λ*.

In the following sections, we discuss different interpolation methods with reference to one-dimensional minimization problems that arise during multivariable optimization problems.
2.2.3.1 Quadratic interpolation method

The quadratic interpolation method uses the function values only; hence it is useful to find the minimizing step λ* of functions f(X) for which the partial derivatives with respect to the variables x_i are not available or difficult to compute. This method finds the minimizing step length λ* in three stages. In the first stage the S-vector is normalized so that a step length of λ = 1 is acceptable. In the second stage the function f(λ) is approximated by a quadratic function h(λ) and the minimum, λ̃*, of h(λ) is found. If λ̃* is not sufficiently close to the true minimum λ*, a third stage is used. In this stage a new quadratic function (refit) h′(λ) = a′ + b′λ + c′λ² is used to approximate f(λ), and a new value of λ̃* is found. This procedure is continued until a λ̃* that is sufficiently close to λ* is found.

Stage 1. In this stage the vector S is normalized as follows. Find Δ = max_i |s_i|, where s_i is the ith component of S, and divide each component of S by Δ. Another method of normalization is to find Δ = (s_1² + s_2² + ⋯ + s_n²)^(1/2) and divide each component of S by Δ.

Stage 2. Let

h(λ) = a + bλ + cλ²   (2.10)

be the quadratic function used for approximating the function f(λ). It is worth noting at this point that a quadratic is the lowest-order polynomial for which a finite minimum can exist. The necessary condition for the minimum of h(λ) is that

dh/dλ = b + 2cλ = 0,

that is,

λ̃* = −b/(2c)   (2.11)

The sufficiency condition for the minimum of h(λ) is that d²h/dλ² > 0 at λ̃*, that is,

c > 0   (2.12)

To evaluate the constants a, b, and c in Eq. (2.10), we need to evaluate the function f(λ) at three points.
Let λ = A, λ = B, and λ = C be the points at which the function f(λ) is evaluated, and let f_A, f_B, and f_C be the corresponding function values, that is,

f_A = a + bA + cA²
f_B = a + bB + cB²
f_C = a + bC + cC²   (2.13)

The solution of Eqs. (2.13) gives

a = [f_A BC(C − B) + f_B CA(A − C) + f_C AB(B − A)] / [(A − B)(B − C)(C − A)]   (2.14)

b = [f_A(B² − C²) + f_B(C² − A²) + f_C(A² − B²)] / [(A − B)(B − C)(C − A)]   (2.15)

c = −[f_A(B − C) + f_B(C − A) + f_C(A − B)] / [(A − B)(B − C)(C − A)]   (2.16)

From Eqs. (2.11), (2.15), and (2.16), the minimum of h(λ) can be obtained as

λ̃* = −b/(2c) = [f_A(B² − C²) + f_B(C² − A²) + f_C(A² − B²)] / {2[f_A(B − C) + f_B(C − A) + f_C(A − B)]}   (2.17)

provided that c, as given by Eq. (2.16), is positive.

To start with, for simplicity, the points A, B, and C can be chosen as 0, t, and 2t, respectively, where t is a preselected trial step length. By this procedure we can save one function evaluation, since f_A = f(λ = 0) is generally known from the previous iteration (of a multivariable search). For this case, Eqs. (2.14) to (2.17) reduce to

a = f_A   (2.18)

b = (4f_B − 3f_A − f_C)/(2t)   (2.19)

c = (f_C + f_A − 2f_B)/(2t²)   (2.20)

λ̃* = t(4f_B − 3f_A − f_C)/(4f_B − 2f_C − 2f_A)   (2.21)

provided that

c = (f_C + f_A − 2f_B)/(2t²) > 0   (2.22)
The inequality (2.22) can be satisfied if

f_B < (f_A + f_C)/2   (2.23)

(i.e., the function value f_B should be smaller than the average value of f_A and f_C). This can be satisfied if f_B lies below the line joining f_A and f_C.

The following procedure can be used not only to satisfy the inequality (2.23) but also to ensure that the minimum λ̃* lies in the interval 0 < λ̃* < 2t_0.

1. Assuming that f_A = f(λ = 0) and the initial step size t_0 are known, evaluate the function at λ = t_0 and obtain f_1 = f(λ = t_0).
2. If f_1 > f_A, set f_C = f_1, evaluate the function at λ = t_0/2, and compute λ̃* using Eq. (2.21) with t = t_0/2.
3. If f_1 ≤ f_A, set f_B = f_1, and evaluate the function at λ = 2t_0 to find f_2 = f(λ = 2t_0).
4. If f_2 turns out to be greater than f_1, set f_C = f_2 and compute λ̃* according to Eq. (2.21) with t = t_0.
5. If f_2 turns out to be smaller than f_1, set new f_1 = f_2 and t_0 = 2t_0, and repeat steps 2 to 4 until we are able to find λ̃*.

Stage 3. The λ̃* found in stage 2 is the minimum of the approximating quadratic h(λ), and we have to make sure that this λ̃* is sufficiently close to the true minimum λ* of f(λ) before taking λ* ≈ λ̃*. Several tests are possible to ascertain this. One possible test is to compare f(λ̃*) with h(λ̃*) and consider λ̃* a sufficiently good approximation if they differ by no more than a small amount. This criterion can be stated as

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| ≤ ε_1   (2.24)

Another possible test is to examine whether df/dλ is close to zero at λ̃*. Since the derivatives of f are not used in this method, we can use a finite-difference formula for df/dλ and use the criterion

|(f(λ̃* + Δλ̃*) − f(λ̃* − Δλ̃*)) / (2Δλ̃*)| ≤ ε_2   (2.25)

to stop the procedure. In Eqs. (2.24) and (2.25), ε_1 and ε_2 are small numbers to be specified depending on the accuracy desired. If the convergence criteria stated in Eqs. (2.24) and (2.25) are not satisfied, a new quadratic function

h′(λ) = a′ + b′λ + c′λ²

is used to approximate the function f(λ). To evaluate the constants a′, b′, and c′, the three best function values among the current f_A = f(λ = 0), f_B = f(λ = t_0), f_C = f(λ = 2t_0), and f̃ = f(λ = λ̃*) are to be used. This process of trying to fit another polynomial to obtain a better approximation to λ̃* is known as refitting the polynomial.

For refitting the quadratic, we consider all possible situations and select the best three points of the present A, B, C, and λ̃*. There are four possibilities. A new value of λ̃* is computed by using the general formula, Eq. (2.17). If this λ̃* also does not satisfy the convergence criteria stated in Eqs. (2.24) and (2.25), a new quadratic has to be refitted.

Example 2.4 Find the minimum of f = λ⁵ − 5λ³ − 20λ + 5.
SOLUTION Since this is not a multivariable optimization problem, we can proceed directly to stage 2. Let the initial step size be taken as t_0 = 0.5 and A = 0.

Iteration 1

f_A = f(λ = 0) = 5

f_1 = f(λ = t_0) = 0.03125 − 5(0.125) − 20(0.5) + 5 = −5.59375

Since f_1 < f_A, we set f_B = f_1 = −5.59375, and find that

f_2 = f(λ = 2t_0 = 1.0) = −19.0

As f_2 < f_1, we set new t_0 = 1 and f_1 = −19.0. Again we find that f_1 < f_A and hence set f_B = f_1 = −19.0, and find that f_2 = f(λ = 2t_0 = 2) = −43. Since f_2 < f_1, we again set t_0 = 2 and f_1 = −43. As this f_1 < f_A, we set f_B = f_1 = −43 and evaluate f_2 = f(λ = 2t_0 = 4) = 629. This time f_2 > f_1 and hence we set f_C = f_2 = 629 and compute λ̃* from Eq. (2.21) as

λ̃* = 2[4(−43) − 3(5) − 629] / [4(−43) − 2(629) − 2(5)] = −1632/−1440 = 1.135
Convergence test: Since A = 0, f_A = 5, B = 2, f_B = −43, C = 4, and f_C = 629, the values of a, b, and c can be found to be

a = 5, b = −204, c = 90

and

h(λ̃*) = h(1.135) = 5 − 204(1.135) + 90(1.135)² = −110.9

Since

f̃ = f(λ̃*) = (1.135)⁵ − 5(1.135)³ − 20(1.135) + 5.0 = −23.127

we have

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| = |(−110.9 + 23.127) / (−23.127)| = 3.8

As this quantity is very large, convergence is not achieved, and hence we have to use refitting.

Iteration 2

Since λ̃* < B and f̃ > f_B, we take the new values of A, B, and C as

A = 1.135, f_A = −23.127
B = 2.0,   f_B = −43.0
C = 4.0,   f_C = 629.0

and compute the new λ̃*, using Eq. (2.17), as

λ̃* = [−23.127(4.0 − 16.0) + (−43.0)(16.0 − 1.29) + 629.0(1.29 − 4.0)] / {2[−23.127(2.0 − 4.0) + (−43.0)(4.0 − 1.135) + 629.0(1.135 − 2.0)]} = 1.661
Convergence test: To test the convergence, we compute the coefficients of the quadratic as

a = 288.0, b = −417.0, c = 125.3

As

h(λ̃*) = h(1.661) = 288.0 − 417.0(1.661) + 125.3(1.661)² = −59.7

f̃ = f(λ̃*) = (1.661)⁵ − 5(1.661)³ − 20(1.661) + 5.0 = −38.37

we obtain

|(h(λ̃*) − f(λ̃*)) / f(λ̃*)| = |(−59.70 + 38.37) / (−38.37)| = 0.556

Since this quantity is not sufficiently small, we need to proceed to the next refit.
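A brief MATLAB sketch of stage 2 for this example is given below; it is illustrative only, and the branch for f_1 > f_A (step 2 of the bracketing procedure) is omitted for brevity.

% Hedged sketch of stage 2 for Example 2.4: bracket the minimum with
% A = 0, B = t0, C = 2*t0 and apply Eq. (2.21).
f  = @(lam) lam.^5 - 5*lam.^3 - 20*lam + 5;
t0 = 0.5;
fA = f(0);  f1 = f(t0);
while f(2*t0) < f1          % steps 3-5: double t0 until f rises
    t0 = 2*t0;  f1 = f(t0);
end
fB = f1;  fC = f(2*t0);
lamTilde = t0*(4*fB - 3*fA - fC)/(4*fB - 2*fC - 2*fA)   % Eq. (2.21)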
2.3. Multidimensional search
2.3.1 Univariate Method

In this method we change only one variable at a time and seek to produce a sequence of improved approximations to the minimum point. By starting at a base point X_i in the ith iteration, we fix the values of n − 1 variables and vary the remaining variable. Since only one variable is changed, the problem becomes a one-dimensional minimization problem, and any of the methods discussed above can be used to produce a new base point X_{i+1}. The search is now continued in a new direction. This new direction is obtained by changing any one of the n − 1 variables that were fixed in the previous iteration. In fact, the search procedure is continued by taking each coordinate direction in turn. After all the n directions are searched sequentially, the first cycle is complete, and hence we repeat the entire process of sequential minimization. The procedure is continued until no further improvement is possible in the objective function in any of the n directions of a cycle. The univariate method can be summarized as follows [9]:

1. Choose an arbitrary starting point X_1 and set i = 1.

2. Find the search direction S_i as

S_iᵀ = (1, 0, 0, …, 0) for i = 1, n + 1, 2n + 1, …
S_iᵀ = (0, 1, 0, …, 0) for i = 2, n + 2, 2n + 2, …
S_iᵀ = (0, 0, 1, …, 0) for i = 3, n + 3, 2n + 3, …
  ⋮
S_iᵀ = (0, 0, 0, …, 1) for i = n, 2n, 3n, …   (2.26)

3. Determine whether λ_i should be positive or negative. For the current direction S_i, this means finding whether the function value decreases in the positive or negative direction. For this we take a small probe length (ε) and evaluate f_i = f(X_i), f⁺ = f(X_i + εS_i), and f⁻ = f(X_i − εS_i). If f⁺ < f_i, S_i will be the correct direction for decreasing the value of f, and if f⁻ < f_i, −S_i will be the correct one. If both f⁺ and f⁻ are greater than f_i, we take X_i as the minimum along the direction S_i.

4. Find the optimal step length λ_i* such that

f(X_i ± λ_i*S_i) = min over λ_i of f(X_i ± λ_iS_i)   (2.27)

where the + or − sign has to be used depending upon whether S_i or −S_i is the direction for decreasing the function value.

5. Set X_{i+1} = X_i ± λ_i*S_i depending on the direction for decreasing the function value, and f_{i+1} = f(X_{i+1}).

6. If |f_{i+1} − f_i| < ε, stop. Otherwise go to step 7.

7. Set the new value of i = i + 1 and go to step 2. Continue this procedure until no significant change is achieved in the value of the objective function.

The univariate method is very simple and can be implemented easily (a MATLAB sketch is given after Example 2.5 below). However, it will not converge rapidly to the optimum solution, as it has a tendency to oscillate with steadily decreasing progress toward the optimum. Hence it will be better to stop the computations at some point near the optimum point rather than trying to find the precise optimum point. In theory, the univariate method can be applied to find the minimum of any function that possesses continuous derivatives. However, if the function has a steep valley, the method may not even converge. If the univariate search starts at such a point P, the function value cannot be decreased either in the direction ±S_1 or in the direction ±S_2. Thus the search comes to a halt, and one may be misled to take the point P, which is certainly not the optimum point, as the optimum point. This situation arises whenever the value of the probe length ε needed for detecting the proper direction (±S_1 or ±S_2) happens to be less than the number of significant figures used in the computations.

Example 2.5 Minimize f(x_1, x_2) = x_1 − x_2 + 2x_1² + 2x_1x_2 + x_2² with the starting point (0, 0).
decreasing t he f unction va lue i n s tep 3. F urther,๐๐ w e w ill us e t he di fferential c alculus method to find the optimum step length along the direction ยฑ in step 4. โ Iteration = 1 ๐๐๐๐ ๐๐๐๐ 43 ๐๐
1 Step 2: Choose the search direction as = . 1 1 0
Step 3: To f ind w hether t he va lue ๐๐ of ๐๐decreases๏ฟฝ ๏ฟฝ al ong 1 or 1, w e us e t he pr obe
length . Since ๐๐ ๐๐ โ๐๐ 1 = ๐๐ ( 1) = (0, 0) = 0, + ๐๐ =๐๐ ๐๐ ( 1 +๐๐ 1 ) = ( , 0) = 0.01 0 + 2(0.0001) + 0 + 0 = 0.0102 ๐๐ ๐๐ ๐๐ > ๐๐๐๐1 ๐๐ ๐๐ โ ( ) ( ) ( ) = ๐๐ 1 โ 1 = , 0 = 0.01 0 + 2 0.0001 โ ๐๐ ๐๐ +๐๐ 0 +๐๐๐๐ 0 = ๐๐0.0098โ๐๐ < 1โ, โ is the correct direction for minimizingโ f from๐๐ 1. โ๐๐Step๐๐ 4: To find the optimum step length 1 , we minimize๐๐ โ ( 1 1 1) = ( 1, 0)๐๐ ( ) 2 2 ๐๐ ๐๐ โ ๐๐ ๐๐ = ๐๐ โ๐๐1 0 + 2( 1) + 0 + 0 = 2 1 1 โ๐๐ โ โ๐๐ ๐๐ โ ๐๐ As 1 is a step length we take 0 1 1 and solving above equation for 1 using golden 1 section method we get = ๐๐ 1 4 โค ๐๐ โค ๐๐ โ Step 5: Set ๐๐ 1 0 1 1 = โ = = 2 1 1 1 0 4 0 4 โ 0 โ ๐๐ ๐๐ ๐๐ ๐๐ ๏ฟฝ ๏ฟฝ1 โ ๏ฟฝ ๏ฟฝ ๏ฟฝ1 ๏ฟฝ = ( ) = ( , 0) = 2 2 4 8 Iteration = ๐๐ ๐๐ ๐๐ ๐๐ โ โ 0 Step 2: Choose๐๐ ๐๐ the search direction = 2 2 1
Step 3: Since 2 = ( 2) = 0.125๐๐ ๐๐, ๐๐ ๐๐ ๏ฟฝ ๏ฟฝ + ๐๐ =๐๐ (๐๐ 2 + โ2) = ( 0.25, 0.01) = 0.1399 < 2 ๐๐ = ๐๐ ( ๐๐2 + ๐๐๐๐2) = ๐๐ ( โ0.25, 0.01) =โ 0.1099 >๐๐2 โ 2 is the correct๐๐ direction๐๐ ๐๐ for decreasing๐๐๐๐ ๐๐ theโ valueโ of f from 2โ. ๐๐ 2 ๐๐Step 4: We minimize ( 2 + 2 2) to find 2 . ๐๐ Here ๐๐ ๐๐ ๐๐ ๐๐ ๐๐ ( 2 + 2 2) = ( 0.25, 2)
๐๐ ๐๐ ๐๐ ๐๐ 44 ๐๐ โ ๐๐
2 2 = 0.25 2 + 2(0.25) 2(0.25)( 2) + 2 2 โ โ ๐๐= 2 1.5 2 โ 0.125 ๐๐ ๐๐ using golden section๐๐ โ method๐๐ โ we get 2 = 0.75 โ Step 5: Set ๐๐ 025 0 0.25 = + = + 0.75 + = 3 2 2 2 0 1 0.75 โ โ โ ๐๐ ๐๐ ๐๐ 3๐๐ = ๏ฟฝ( 3) =๏ฟฝ 0.6875๏ฟฝ ๏ฟฝ ๏ฟฝ ๏ฟฝ
Next we set the iteration number๐๐ as ๐๐ = ๐๐3, and continueโ the procedure until the optimum 1.0 solution = ( ) ๐๐ = 1.25 is found. 1.5 โ โ โ Note: If the๐๐ method๏ฟฝ is๏ฟฝ to๐ค๐ค ๐ค๐ค๐ค๐คbeโ computerized,๐๐ ๐๐ โ a suitable convergence criterion has to be used
to test the point +1( = 1, 2, . . . ) for optimality.
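A minimal MATLAB sketch of the univariate method applied to Example 2.5 follows; it is illustrative only, and the built-in fminbnd stands in for the golden section line search.

% Hedged sketch of the univariate (cyclic coordinate) method.
f = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;
X = [0; 0];  n = 2;  epsP = 0.01;            % probe length
for cycle = 1:50
    Xold = X;
    for i = 1:n
        S = zeros(n,1);  S(i) = 1;           % coordinate direction
        if f(X + epsP*S) > f(X), S = -S; end % pick the downhill sign
        lamStar = fminbnd(@(lam) f(X + lam*S), 0, 1);
        X = X + lamStar*S;
    end
    if norm(X - Xold) < 1e-6, break; end     % convergence test
end
% X tends to the optimum (-1.0, 1.5) with f = -1.25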
2.3.2 Pattern Directions

In the univariate method, we search for the minimum along directions parallel to the coordinate axes. We noticed that this method may not converge in some cases, and that even if it converges, its convergence will be very slow as we approach the optimum point. These problems can be avoided by changing the directions of search in a favorable manner instead of retaining them always parallel to the coordinate axes. Let the points X_1, X_2, X_3, … indicate the successive points found by the univariate method. It can be noticed that the lines joining the alternate points of the search (e.g., X_1, X_3; X_2, X_4; X_3, X_5; X_4, X_6; …) lie in the general direction of the minimum and are known as pattern directions. It can be proved that if the objective function is a quadratic in two variables, all such lines pass through the minimum. Unfortunately, this property will not be valid for multivariable functions even when they are quadratics. However, this idea can still be used to achieve rapid convergence while finding the minimum of an n-variable function.

Methods that use pattern directions as search directions are known as pattern search methods. One of the best-known pattern search methods, Powell's method, is discussed here. In general, a pattern search method takes n univariate steps, where n denotes the number of design variables, and then searches for the minimum along the pattern direction S_p, defined by

S_p = X_i − X_{i−n}   (2.28)

where the point X_i is obtained at the end of the n univariate steps and X_{i−n} is the starting point before taking the n univariate steps. In general, the directions used prior to taking a move along a pattern direction need not be univariate directions.
2.3.4 POWELL'S METHOD

Powell's method is an extension of the basic pattern search method. It is the most widely used direct search method and can be proved to be a method of conjugate directions [9]. A conjugate directions method will minimize a quadratic function in a finite number of steps. Since a general nonlinear function can be approximated reasonably well by a quadratic function near its minimum, a conjugate directions method is expected to speed up the convergence of even general nonlinear objective functions. The definition, a method of generation of conjugate directions, and the property of quadratic convergence are presented in this section.

2.3.4.1 Conjugate Directions

Definition 2.4 (Conjugate Directions). Let A = [A] be an n × n symmetric matrix. A set of n vectors (or directions) {S_i} is said to be conjugate (more accurately A-conjugate) if

S_iᵀ A S_j = 0 for all i ≠ j, i = 1, 2, …, n, j = 1, 2, …, n   (2.29)

It can be seen that orthogonal directions are a special case of conjugate directions (obtained with [A] = [I] in Eq. (2.29)).
Definition 2.5 (Quadratically Convergent Method): If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem 2.3 Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n. Let the constrained stationary points of the quadratic function in the hyperplanes be X_1 and X_2, respectively. Then the line joining X_1 and X_2 is conjugate to any line parallel to the hyperplanes.

Proof: Let the quadratic function be expressed as

Q(X) = ½ XᵀAX + BᵀX + C   (2.30)
The gradient of Q is given by

∇Q(X) = AX + B

and hence

∇Q(X_1) − ∇Q(X_2) = A(X_1 − X_2)   (2.31)

If S is any vector parallel to the hyperplanes, it must be orthogonal to the gradients ∇Q(X_1) and ∇Q(X_2). Thus

Sᵀ∇Q(X_1) = SᵀAX_1 + SᵀB = 0   (2.32)

Sᵀ∇Q(X_2) = SᵀAX_2 + SᵀB = 0   (2.33)

By subtracting Eq. (2.33) from Eq. (2.32), we obtain

SᵀA(X_1 − X_2) = 0   (2.34)

Hence S and (X_1 − X_2) are A-conjugate.

Theorem 2.4 If a quadratic function

Q(X) = ½ XᵀAX + BᵀX + C   (2.35)

is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of the function Q will be found at or before the nth step
irrespective of the starting point.

Proof: Let X* minimize the quadratic function Q(X). Then

∇Q(X*) = B + AX* = 0   (2.36)

Given a point X_1 and a set of linearly independent directions S_1, S_2, …, S_n, constants β_i can always be found such that

X* = X_1 + Σ_{i=1}^{n} β_i S_i   (2.37)

where the vectors S_1, S_2, …, S_n have been used as basis vectors. If the directions S_i are A-conjugate and none of them is zero, the S_i can easily be shown to be linearly independent, and the β_i can be determined as follows.

Equations (2.36) and (2.37) lead to

B + AX_1 + A(Σ_{i=1}^{n} β_i S_i) = 0   (2.38)

Multiplying this equation throughout by S_jᵀ, we obtain

S_jᵀ(B + AX_1) + S_jᵀA(Σ_{i=1}^{n} β_i S_i) = 0   (2.39)

Equation (2.39) can be rewritten as

(B + AX_1)ᵀ S_j + β_j S_jᵀ A S_j = 0   (2.40)

that is,

β_j = −(B + AX_1)ᵀ S_j / (S_jᵀ A S_j)   (2.41)

Now consider an iterative minimization procedure starting at point X_1, and successively minimizing the quadratic Q(X) in the directions S_1, S_2, …, S_n, where these directions satisfy Eq. (2.29). The successive points are determined by the relation

X_{i+1} = X_i + λ_i* S_i,  i = 1 to n   (2.42)

where λ_i* is found by minimizing Q(X_i + λS_i), so that

S_iᵀ ∇Q(X_{i+1}) = 0   (2.43)

Since the gradient of Q at the point X_{i+1} is given by

∇Q(X_{i+1}) = B + AX_{i+1}   (2.44)

Eq. (2.43) can be written as

S_iᵀ {B + A(X_i + λ_i* S_i)} = 0   (2.45)

This equation gives

λ_i* = −(B + AX_i)ᵀ S_i / (S_iᵀ A S_i)   (2.46)

From Eq. (2.42), we can express X_i as

X_i = X_1 + Σ_{j=1}^{i−1} λ_j* S_j   (2.47)

so that

X_iᵀ A S_i = X_1ᵀ A S_i + Σ_{j=1}^{i−1} λ_j* S_jᵀ A S_i = X_1ᵀ A S_i   (2.48)

using the relation (2.29). Thus Eq. (2.46) becomes
λ_i* = −(B + AX_1)ᵀ S_i / (S_iᵀ A S_i)   (2.49)

which can be seen to be identical to Eq. (2.41). Hence the minimizing step lengths are given by β_i = λ_i*. Since the optimal point X* is originally expressed as a sum of n quantities β_1, β_2, …, β_n, which have been shown to be equivalent to the minimizing step lengths, the minimization process leads to the minimum point in n steps or less. Since we have not made any assumption regarding X_1 and the order of S_1, S_2, …, S_n, the process converges in n steps or less, independent of the starting point as well as the order in which the minimization directions are used.

Example 2.6 Consider the minimization of the function

f(x_1, x_2) = 6x_1² + 2x_2² − 6x_1x_2 − x_1 − 2x_2

If S_1 = (1, 2)ᵀ denotes a search direction, find a direction S_2 that is conjugate to the direction S_1.

SOLUTION The objective function can be expressed in matrix form as

f(X) = BᵀX + ½ XᵀAX = (−1, −2)(x_1, x_2)ᵀ + ½ (x_1, x_2) [12 −6; −6 4] (x_1, x_2)ᵀ

and the Hessian matrix [A] can be identified as

[A] = [12 −6; −6 4]

The direction S_2 = (s_1, s_2)ᵀ will be conjugate to S_1 = (1, 2)ᵀ if

S_1ᵀ[A]S_2 = (1 2) [12 −6; −6 4] (s_1, s_2)ᵀ = 0

which upon expansion gives 2s_2 = 0, or s_1 = arbitrary and s_2 = 0. Since s_1 can have any value, we select s_1 = 1, and the desired conjugate direction can be expressed as S_2 = (1, 0)ᵀ.
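This conjugacy condition is easy to verify numerically; a short illustrative check in MATLAB:

% Numerical check of Example 2.6.
A  = [12 -6; -6 4];        % Hessian of f
S1 = [1; 2];  S2 = [1; 0];
S1' * A * S2               % returns 0, so S1 and S2 are A-conjugate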
Powell's Algorithm

In Powell's method, for a two-variable function, the function is first minimized once along each of the coordinate directions, starting with the second coordinate direction, and then in the corresponding pattern direction. For the next cycle of minimization, we discard one of the coordinate directions (the x_1 direction in the present case) in favor of the pattern direction. Then we generate a new pattern direction S_p^(2). For the next cycle of minimization, we discard one of the previously used coordinate directions (the x_2 direction in this case) in favor of the newly generated pattern direction. Then we minimize along the directions S_p^(1) and S_p^(2). For the next cycle of minimization, since there is no coordinate direction to discard, we restart the whole procedure by minimizing along the x_2 direction. This procedure is continued until the desired minimum point is found.

Note that the search will be made sequentially in the directions S_n; S_1, S_2, S_3, …, S_{n−1}, S_n; S_p^(1); S_2, S_3, …, S_{n−1}, S_n, S_p^(1); S_p^(2); S_3, S_4, …, S_{n−1}, S_n, S_p^(1), S_p^(2); S_p^(3); … until the minimum point is found. Here S_i indicates the ith coordinate direction u_i and S_p^(j) the jth pattern direction.

Quadratic Convergence: The pattern directions S_p^(1), S_p^(2), S_p^(3), … are nothing but the lines joining the minima found along the directions S_n, S_p^(1), S_p^(2), …, respectively. Hence, by Theorem 2.3, the pairs of directions (S_n, S_p^(1)), (S_p^(1), S_p^(2)), and so on, are A-conjugate. Thus all the directions S_n, S_p^(1), S_p^(2), … are A-conjugate. Since, by Theorem 2.4, any search method involving minimization along a set of conjugate directions is quadratically convergent, Powell's method is quadratically convergent. From the method used for constructing the conjugate directions S_p^(1), S_p^(2), …, we find that n minimization cycles are required to complete the construction of n conjugate directions. In the ith cycle, the minimization is done along the already constructed i conjugate directions and the n − i nonconjugate (coordinate) directions. Thus after n cycles, all the n search directions are mutually conjugate, and a quadratic will theoretically be minimized in n² one-dimensional minimizations. This proves the quadratic convergence of Powell's method.

It is to be noted that, as with most numerical techniques, the convergence in many practical problems may not be as good as the theory seems to indicate. Powell's method may require many more iterations to minimize a function than the theoretically estimated number. There are several reasons for this:
1. Since the number of cycles n is valid only for quadratic functions, it will generally take more than n cycles for nonquadratic functions.

2. The proof of quadratic convergence has been established with the assumption that the exact minimum is found in each of the one-dimensional minimizations. However, the actual minimizing step lengths λ_i* will be only approximate, and hence the subsequent directions will not be conjugate. Thus the method requires more iterations to achieve the overall convergence.

3. Powell's method, described above, can break down before the minimum point is found. This is because the search directions S_i might become dependent or almost dependent during numerical computation.

Powell's method is a very popular means of successive minimizations along conjugate directions. It is a zero-order method, requiring the evaluation of F(x) only. If the problem involves n design variables, the basic algorithm is given by the following [3] (a MATLAB sketch of one cycle follows the list):

• Choose a point x_0 in the design space.
• Choose the starting vectors v_i, i = 1, 2, …, n (the usual choice is v_i = e_i, where e_i is the unit vector in the x_i-coordinate direction).
• Cycle
  - do with i = 1, 2, …, n
    * Minimize F(x) along the line through x_{i−1} in the direction of v_i. Let the minimum point be x_i.
  - end do
  - v_{n+1} ← x_n − x_0 (this vector can be shown to be conjugate to the v_{n+1} produced in the previous cycle)
  - Minimize F(x) along the line through x_0 in the direction of v_{n+1}. Let the minimum point be x_{n+1}.
  - if |x_{n+1} − x_0| < ε, exit loop
  - do with i = 1, 2, …, n
    * v_i ← v_{i+1} (v_1 is discarded, the other vectors are reused)
  - end do
• end cycle
Powell demonstrated that the vectors v_{n+1} produced in successive cycles are mutually conjugate, so that the minimum point of a quadratic surface is reached in precisely n cycles. In practice, the merit function is seldom quadratic, but as long as it can be approximated locally by a quadratic, Powell's method will work. Of course, it usually takes more than n cycles to arrive at the minimum of a nonquadratic function. Note that it takes n line minimizations to construct each conjugate direction.

We start with point x_0 and vectors v_1 and v_2. Then we find the distance s_1 that minimizes F(x_0 + sv_1), finishing up at point x_1 = x_0 + s_1v_1. Next, we determine s_2 that minimizes F(x_1 + sv_2), which takes us to x_2 = x_1 + s_2v_2. The last search direction is v_3 = x_2 − x_0. After finding s_3 by minimizing F(x_0 + sv_3) we get to x_3 = x_0 + s_3v_3, completing the cycle.

As explained before, the first cycle starts at point P_0 and ends up at P_3. The second cycle takes us to P_6, which is the optimal point. The directions P_0P_3 and P_3P_6 are mutually conjugate.

Powell's method does have a major flaw that has to be remedied if F(x) is not a quadratic: the algorithm tends to produce search directions that gradually become linearly dependent, thereby ruining the progress toward the minimum. The source of the problem is the automatic discarding of v_1 at the end of each cycle. It has been suggested that it is better to throw out the direction that resulted in the largest decrease of F(x), a policy that we adopt. It seems counter-intuitive to discard the best direction, but it is likely to be close to the direction added in the next cycle, thereby contributing to linear dependence. As a result of the change, the search directions cease to be mutually conjugate, so that a quadratic form is not minimized in n cycles any more. This is not a significant loss, since in practice F(x) is seldom a quadratic anyway.

Example 2.7 Minimize f(x_1, x_2) = x_1 − x_2 + 2x_1² + 2x_1x_2 + x_2² from the starting point X_1 = (0, 0)ᵀ using Powell's method.

SOLUTION

Cycle 1: Univariate Search
We minimize f along S = S_2 = (0, 1)ᵀ from X_1. To find the correct direction (+S_2 or −S_2) for decreasing the value of f, we take the probe length as ε = 0.01. As

f_1 = f(X_1) = 0.0, and
f⁺ = f(X_1 + εS_2) = f(0.0, 0.01) = −0.0099 < f_1,

f decreases along the direction +S_2. To find the minimizing step length λ* along S_2, we minimize

f(X_1 + λS_2) = f(0.0, λ) = λ² − λ.

Using the golden section method we get λ* = 0.5, and hence X_2 = X_1 + λ*S_2 = (0.0, 0.5)ᵀ.

Next we minimize f along S_1 = (1, 0)ᵀ from X_2 = (0.0, 0.5)ᵀ. Since

f_2 = f(X_2) = f(0.0, 0.5) = −0.25,
f⁺ = f(X_2 + εS_1) = f(0.01, 0.50) = −0.2298 > f_2,
f⁻ = f(X_2 − εS_1) = f(−0.01, 0.50) = −0.2698 < f_2,

f decreases along −S_1. As f(X_2 − λS_1) = f(−λ, 0.50) = 2λ² − 2λ − 0.25, using the golden section method we get λ* = 0.5. Hence X_3 = X_2 − λ*S_1 = (−0.5, 0.5)ᵀ.

Now we minimize f along S = S_2 = (0, 1)ᵀ from X_3 = (−0.5, 0.5)ᵀ. As f_3 = f(X_3) = −0.75 and

f⁺ = f(X_3 + εS_2) = f(−0.5, 0.51) = −0.7599 < f_3,

f decreases along the +S_2 direction. Since f(X_3 + λS_2) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75, using the golden section method we get λ* = 1/2. This gives

X_4 = X_3 + λ*S_2 = (−0.5, 1.0)ᵀ

Cycle 2: Pattern Search

Now we generate the first pattern direction as

S_p^(1) = X_4 − X_2 = (−0.5 − 0.0, 1.0 − 0.5)ᵀ = (−0.5, 0.5)ᵀ
and minimize f along S_p^(1) from X_4. Since

f_4 = f(X_4) = f(−0.5, 1.0) = −1.0
f⁺ = f(X_4 + εS_p^(1)) = f(−0.5 − 0.005, 1 + 0.005) = f(−0.505, 1.005) = −1.004975,

f decreases in the positive direction of S_p^(1). As

f(X_4 + λS_p^(1)) = f(−0.5 − 0.5λ, 1.0 + 0.5λ) = 0.25λ² − 0.50λ − 1.00,

using the golden section method we get λ* = 1.0, and hence

X_5 = X_4 + λ*S_p^(1) = (−0.5, 1.0)ᵀ + 1.0(−0.5, 0.5)ᵀ = (−1.0, 1.5)ᵀ

The point X_5 can be identified to be the optimum point.

If we do not recognize X_5 as the optimum point at this stage, we proceed to minimize f along the direction S_2 = (0, 1)ᵀ from X_5. Then we would obtain

f_5 = f(X_5) = −1.25, f⁺ = f(X_5 + εS_2) > f_5, and f⁻ = f(X_5 − εS_2) > f_5.

This shows that f cannot be minimized along S_2, and hence X_5 will be the optimum point. In this example the convergence has been achieved in the second cycle itself. This is to be expected in this case, as f is a quadratic function, and the method is a quadratically convergent method.

The numerical result for this example is summarized below (see the appendix for Powell's MATLAB code).

xmin                 fmin
------------------------------------
(-1.503, 2.356)      -0.872520631787970
(-1.000, 1.500)      -1.249999992736614
(-1.000, 1.500)      -1.249999999974476

The minimum point (-1.000, 1.500) is reached at the 3rd cycle.
Chapter 3 Trust region methods
3.1 Trust region framework

In the last section of the first chapter we explained trust region methods as one of the solution procedures for treating a nonlinear programming problem. Here a brief review will be helpful to follow the integration of the trust region method into a DFO algorithm. The trust region framework is usually used in the context where at least the gradient, and sometimes the Hessian, of the objective function can be evaluated or estimated accurately.
Main steps of a typical trust region method are [2]
1. Given a current iterate, build a good local approximation model.
2. Choose a neighborhood around the current iterate where the model "is trusted" to be accurate. Minimize the model in this neighborhood.
3. Determine if the step is successful by evaluating the true objective function at the new point, comparing the true reduction in the value of the objective with the reduction predicted by the model.
4. If the step is successful, accept the new point as the next iterate. Increase the size of the trust region if the success is really significant. Otherwise, reject the new point and reduce the size of the trust region.
5. Repeat until convergence.
For a model based on the Taylor series expansion we know that if the trust region is made small enough, then the approximation is sufficiently accurate and the algorithm will make a successful step (unless the optimum has been reached).
To use the trust region framework in the derivative free case we use an alternative approximation technique, which does not use derivative estimates. Quadratic interpolation is one such technique which can be applied successfully within a trust region method. However, we need to guarantee that the approximation model is locally good: that is, that a successful step will be made after sufficient reduction of the trust region.
3.2 Quadratic interpolation

Consider the problem of interpolating a given or suitably chosen function f: ℝⁿ → ℝ by a quadratic polynomial Q(x) at a chosen set of points Y = {y¹, y², …, y^p} ⊂ ℝⁿ. The quadratic polynomial Q(x) is an interpolant of the function f(x) with respect to the set Y if

Q(y^i) = f(y^i)  (i = 1, 2, …, p)   (3.1)

such that f is known at all finitely many elements of Y. Here we note that Q is our model, which we defined as m_k in the first chapter within the context of trust region methods; that is,

Q(x) = m_k(x)

Suppose that the space of quadratic polynomials is spanned by a set of basis functions φ_i(·) (i = 1, 2, …, q). Then any quadratic polynomial can be written in terms of these basis functions, that is,

Q(x) = Σ_{i=1}^{q} α_i φ_i(x),

where the coefficient vector α = (α_1, α_2, …, α_q)ᵀ is to be determined. We need

q = 1 + n + n(n + 1)/2 = (n + 1)(n + 2)/2

points to find all of the interpolation parameters. If we have p = (n + 1)(n + 2)/2 points, we can ensure that the quadratic model is entirely determined by the following system of equations. When this is the case, the system of linear equations

Σ_{i=1}^{q} α_i φ_i(y^j) = f(y^j)  (j = 1, 2, …, p)   (3.2)

can be solved to derive the interpolation parameters. The coefficient matrix Φ(Y) of this system is of the type p × q and looks as follows:

Φ(Y) = [ φ_1(y¹)  ⋯  φ_q(y¹) ]
       [    ⋮     ⋱     ⋮    ]
       [ φ_1(y^p) ⋯  φ_q(y^p) ]   (3.3)

For a given set of points and a set of function values, an interpolation polynomial exists and is unique if and only if Φ(Y) is square, that is p = q, and nonsingular. Theoretically, this means that the system (3.2) can be solved, but in practice the solvability of this system depends on whether the matrix Φ(Y) is ill-conditioned or not.
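As an illustration, the following MATLAB fragment assembles and solves the system (3.2) for n = 2 with the monomial basis {1, x_1, x_2, x_1², x_1x_2, x_2²}; the sample function and points are chosen for demonstration only (the points are those of Example 3.3 below).

% Build Phi(Y) and solve for the model coefficients (illustrative).
f   = @(x) x(1) - x(2) + 2*x(1)^2 + 2*x(1)*x(2) + x(2)^2;
Y   = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2];    % six points, one per row
phi = @(x) [1, x(1), x(2), x(1)^2, x(1)*x(2), x(2)^2];
P   = zeros(6);  rhs = zeros(6,1);
for j = 1:6
    P(j,:) = phi(Y(j,:));                 % row j of Phi(Y)
    rhs(j) = f(Y(j,:));
end
alpha = P \ rhs;   % coefficients; fails if Phi(Y) is singular (not poised)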
From the above argument, we conclude that if we manage to determine the quadratic polynomial uniquely, then we have p = q = (n + 1)(n + 2)/2. However, we need to be aware of the fact that not any (n + 1)(n + 2)/2 points in ℝⁿ can be interpolated by a quadratic polynomial. Obviously, although 3 distinct points can be interpolated by a quadratic function in univariate interpolation, this is not the case in multivariate interpolation. In fact, 3 points will not be enough to obtain a quadratic interpolation polynomial whenever the dimension of the interpolation space is greater than one. By inspection, one can see that 6 points are necessary to obtain a unique quadratic interpolation of a function in two dimensions. However, an interpolation set Y of six points lying on one line cannot be interpolated by a quadratic function. Therefore, the points of Y must satisfy a geometric condition to ensure the existence and uniqueness of the quadratic model. This geometric condition is known as the poisedness of the point set.

Definition 3.1: A set of points Y is called poised, with respect to a given subspace of polynomials, if the considered function f(x) can be interpolated at the points of Y by the polynomials from this subspace, that is, if there always exists a suitable interpolating polynomial in that subspace.
Remark: In DFO, poisedness is a necessary geometric condition on the interpolation set Y that ensures the existence and uniqueness of the quadratic model Q(x) wanted and used in the DFO algorithm.
We illustrate the implied geometric character of poisedness by the following examples.
Example 3.1: Suppose n = 2 and Y is a set of six points on a unit circle in ℝ². Then Y cannot be interpolated by a polynomial of the form

α_0 + α_1x_1 + α_2x_2 + α_{1,1}x_1² + α_{1,2}x_1x_2 + α_{2,2}x_2².

Hence, Y is not poised with respect to the space of quadratic polynomials. On the other hand, Y can be interpolated by a polynomial of the form

α_0 + α_1x_1 + α_2x_2 + α_{1,1}x_1² + α_{1,2}x_1x_2 + α_{2,2}x_2² + α_{1,1,1}x_1³.

Therefore Y is poised in an appropriate subspace of the space of cubic polynomials.

Example 3.2: Consider the two quadrics f_1(x, y) = 2x² + x − y and f_2(x, y) = x² + y², whose intersection curve projects in the (x, y)-plane to the conic C(x, y) = 0 obtained by equating the two; namely, 2x² + x − y = x² + y², that is, x² − y² + x − y = 0.

Definition 3.2: A set of points Y is called well-poised if it remains poised under small perturbations. For example, if n = 2, six points almost on a line may define a poised set Y. However, since some small perturbation of the points might make them aligned, it is not a well-poised set.

As we mentioned before, a set of points Y is poised if Φ(Y) is nonsingular with respect to the space of quadratic polynomials. If we look for an understanding of this by the fact that an interpolation polynomial exists and is unique if and only if Φ(Y) is square and nonsingular, then we conclude:

Y is poised if the determinant of Φ(Y) is nonvanishing, that is,

δ(Y) := det [ φ_1(y¹)  ⋯  φ_q(y¹) ]
            [    ⋮     ⋱     ⋮    ]
            [ φ_1(y^p) ⋯  φ_q(y^p) ] ≠ 0   (3.4)
The measure of poisedness in a DFO algorithm can be explained by a methodology based on Newton fundamental polynomials. In DFO, the approach of handling the poisedness in combination with the Newton fundamental polynomials is a distinctive issue. This is so because it allows us not only to choose a good interpolation set from a given set of sample points, but also to find a new sample point which improves the poisedness of the interpolation set. If we had no such useful tool, then removing a point from the set would have caused the conditioning of the coefficient matrix to get worse in the updating step for the interpolation set of DFO. There is also a detailed work on DFO in which the quadratic approximation model is determined by Lagrange interpolation polynomials instead of the Newton fundamental polynomials.

Let us focus on the Newton fundamental polynomials. The points y in our interpolation set Y = {y¹, y², …, y^p}, a subset of ℝⁿ, are organized into d + 1 blocks Y^[ℓ] (ℓ = 0, 1, …, d), where d is the degree of the interpolating polynomial and the ℓth block contains |Y^[ℓ]| points.

Definition 3.3: A single Newton fundamental polynomial N_i^[ℓ] of degree ℓ corresponds to each point y_i^[ℓ] ∈ Y^[ℓ], satisfying the following conditions:

N_i^[ℓ](y_j^[m]) = δ_{ij} δ_{ℓm} for all y_j^[m] ∈ Y^[m] with m ∈ {0, 1, 2, …, ℓ}.

Here δ_{ij} denotes Kronecker's symbol for i, j = 0, 1, 2, …:

δ_{ij} = 1 if i = j, and δ_{ij} = 0 otherwise.

Consider the set of interpolation points being partitioned into three disjoint blocks Y^[0], Y^[1], Y^[2], which correspond to the constant term, the linear terms, and the quadratic terms of a quadratic polynomial, respectively. Hence Y^[0] has a single element, Y^[1] has n elements, and Y^[2] has n(n + 1)/2 elements. The basis {N_i(·)} of NFPs is also partitioned into three blocks {N_i^[0](·)}, {N_i^[1](·)}, {N_i^[2](·)}, with the appropriate number of elements in each block. The unique element of {N_i^[0](·)} is a polynomial of degree zero. Each of the n elements of {N_i^[1](·)} is a polynomial of degree one and, finally, each of the n(n + 1)/2 elements of {N_i^[2](·)} is a polynomial of degree two.

The basis elements and the interpolation points are set in one-to-one correspondence, so that the points from block Y^[ℓ] correspond to the polynomials from block {N_i^[ℓ](·)}. A Newton fundamental polynomial (NFP) N_i(·) and a point y^i are in correspondence with each other if and only if the value of that polynomial at that point is one, and its value at any other point in the same block or in any previous block is zero. In other words, if y^i corresponds to N_i, then N_i(y^i) = 1 and N_i(y^j) = 0 for all other such indices j.

Example 3.3: Consider quadratic interpolation on a plane. We require six interpolation points, organized in three blocks:

Y^[0] = {(0,0)}, Y^[1] = {(1,0), (0,1)}, and Y^[2] = {(2,0), (1,1), (0,2)},

corresponding to the initial basis functions 1, x_1, x_2, x_1², x_1x_2, x_2², respectively. Applying some procedures we find the NFPs:

N^[0] = 1, N_1^[1] = x_1, N_2^[1] = x_2, N_1^[2] = (x_1² − x_1)/2, N_2^[2] = x_1x_2, and N_3^[2] = (x_2² − x_2)/2.

Algorithm (derivative free trust region method)

The steps of derivative free trust region methods are given as follows [6].

Step 0: Initializations. Let a starting point x_s and the value of f(x_s) be given.
Choose an initial trust๐ฅ๐ฅ๐ ๐ region radius 0๐๐>๐ฅ๐ฅ 0.๐ ๐ Choose at least one additional point โnot further than 0> 0 away from to create an initial w ell-poised interpolation s et and initial โb asis of N ewton๐ฅ๐ฅ ๐ ๐ f undamental polynomials. ๐๐ Determine 0 Y which has the best objective function value; i.e. 0 solves the problem min ( ) ๐ฅ๐ฅ โ . ๐ฅ๐ฅ
Set = 0, ch oose p arameters 0, 1,๐ ๐ ๐ก๐ก ๐ฅ๐ฅโ๐๐ ๐๐ 0๐ฅ๐ฅ < 0 < 1 < 1 0 < 0 1 < 1
2 ๐๐ ๐๐ ๐๐ ๐ค๐คโ๐๐๐๐๐๐ ๐๐ ๐๐ ๐๐๐๐๐๐ ๐พ๐พ โค ๐พ๐พ โค ๐พ๐พStep 1: build the model using the interpolation set Y and basis of NFP, build a quadratic interpolation polynomial ( ).
Step 2: minimize the model๐๐๐๐ with๐ฅ๐ฅ in the trust region. Set = { : }. compute the point such that ๐๐ ( ) ( ) ๐ฝ๐ฝ๐๐ ๐ฅ๐ฅ โโ โ๐ฅ๐ฅ๐๐ โ๐ฅ๐ฅ โ โค โ๐๐ = min ๐ฅ๐ฅ๏ฟฝ. ๐๐ ๐๐ ๐๐ ๐๐ Compute ( ) and the ratio ๐๐ ๐ฅ๐ฅ๏ฟฝ ๐ฅ๐ฅโ๐ฝ๐ฝ๐๐ ๐๐ ๐ฅ๐ฅ
๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ 60
( ) ( ) = . ( ) ( ) ๐๐ ๐ฅ๐ฅ๐๐ โ ๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ ๐๐๐๐ Step3: update the interpolation set ๐๐๐๐ ๐ฅ๐ฅ๐๐ โ ๐๐๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐
โข If 0, Include in , dropping one of the existing interpolation points if
necessary.๐๐๐๐ โฅ ๐๐ ๐ฅ๐ฅ๏ฟฝ๐๐ ๐๐
โข If < 0, include in , if it improves the quality of the model
• If ρ_k < η_0 and there are fewer than n + 1 points in the intersection of Y and B_k, generate a new interpolation point in B_k, while preserving or improving well-poisedness.
• Update the basis of Newton fundamental polynomials.
Step 4: Update the trust region radius.
• If ρ_k ≥ η_1, increase the trust region radius: Δ_{k+1} ∈ [Δ_k, γ_2 Δ_k].
• If ρ_k < η_0 and the cardinality of Y ∩ B_k was at least n + 1 when x̃_k was computed, reduce the trust region radius: Δ_{k+1} ∈ [γ_0 Δ_k, γ_1 Δ_k].
• Otherwise, set Δ_{k+1} = Δ_k.
Step 5: Update the current iterate.
Determine x̂_k ∈ Y with the best objective function value by solving the discrete problem
min { f(y^i) : y^i ∈ Y }.
If the improvement is sufficient in the sense of the prediction, that is,
(f(x_k) − f(x̂_k)) / (Q_k(x_k) − Q_k(x̃_k)) ≥ η_0,
then we put x_{k+1} = x̂_k; otherwise set x_{k+1} = x_k. Increase k by one and go to Step 1.
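The following minimal MATLAB sketch (function and parameter names are our own, not taken from [6]) shows how the prediction ratio of Step 2 drives the radius update of Step 4; model minimization and geometry management are left out:
function [rho, DeltaNew] = ratioAndRadius(fxk, fxt, qxk, qxt, ...
                           Delta, eta0, eta1, gamma1, gamma2, modelComplete)
% rho compares the actual decrease f(x_k) - f(x~_k) with the decrease
% Q_k(x_k) - Q_k(x~_k) predicted by the quadratic model.
rho = (fxk - fxt)/(qxk - qxt);
if rho >= eta1
    DeltaNew = gamma2*Delta;           % good prediction: enlarge region
elseif rho < eta0 && modelComplete     % model had >= n+1 points in B_k
    DeltaNew = gamma1*Delta;           % poor prediction: shrink region
else
    DeltaNew = Delta;                  % keep the radius unchanged
end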
Appendix (MATLAB codes)
1. Code for Golden search
function [xmin,fmin] = goldSearch(f,a,b)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% usage: [xmin,fmin] = goldSearch(f,a,b)
% input:
% f = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% output:
% fmin = minimum value of f(x).
% xmin = value of x at the minimum point.
N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;
x1 = c*a + (1-c)*b; f1 = feval(f,x1);
x2 = (1-c)*a + c*b; f2 = feval(f,x2);
%fprintf('------\n');
fprintf(' x1 x2 f(x1) f(x2) b - a \n');
fprintf('------\n');
fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);
% Main loop
for i = 1:N-2
if f1 < f2
b = x2;
x2 = x1;
f2 = f1;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
else
a = x1;
x1 = x2;
f1 = f2;
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
end;
fprintf('%.4e %.4e %.4e %.4e %.4e\n', x1, x2, f1, f2, b-a);
end
if (abs(b-a) < eps)
fprintf('succeeded after %d steps\n', i);
return;
end;
if f1 < f2
fmin = f1; xmin = x1;
else
fmin = f2; xmin = x2;
end
2. Code for Powell method
The algorithm for Powell's method is listed below. It utilizes two arrays: df contains the decreases of the merit function in the first n moves of a cycle, and the matrix u stores the corresponding direction vectors (one vector per column).
To implement this algorithm we use the gold bracket (goldBracket) and the golden search (goldensearch) routines together with it.
i. Gold bracket
function [a,b] = goldBracket(fun,x1,h)
% Brackets the minimum point of f(x).
% USAGE: [a,b] = goldBracket(func,xStart,h)
% INPUT:
% func = handle of function that returns f(x).
% x1 = starting value of x.
% h = initial step size used in search.
% OUTPUT:
% a, b = limits on x at the minimum point.
c = 1.618033989;
f1 = feval(fun,x1);
x2 = x1 + h; f2 = feval(fun,x2);
% Determine downhill direction & change sign of h if needed.
if f2 > f1
h = -h;
x2 = x1 + h; f2 = feval(fun,x2);
% Check if minimum is between x1 - h and x1 + h
if f2 > f1
a = x2; b = x1 - h; return
end
end
% Search loop
for i = 1:50
h = c*h;
x3 = x2 + h; f3 = feval(fun,x3);
if f3 > f2
a = x1;
b = x3;
return
end
x1 = x2; x2 = x3; f2 = f3;
end
error('goldBracket did not find minimum')
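An illustrative call (the test function is our own): starting from x = 0 with step h = 0.1, goldBracket returns an interval containing the minimizer x = 2 of f(x) = (x - 2)^2.
% Example call of goldBracket on f(x) = (x - 2)^2:
f = @(x) (x - 2).^2;
[a, b] = goldBracket(f, 0.0, 0.1);
fprintf('minimum bracketed in [%.4f, %.4f]\n', a, b);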
ii. Golden search
function [xmin,fmin] = goldensearch(f,a,b)
% Golden section search for the minimum of f(x).
% The minimum point must be bracketed in a <= x <= b.
% usage: [xmin,fmin] = goldensearch(f,a,b)
% input:
% f = handle of function that returns f(x).
% a, b = limits of the interval containing the minimum.
% output:
% fmin = minimum value of f(x).
% xmin = value of x at the minimum point.
%eps = 1.0e-3;
N = 20; % N is the number of function evaluations done.
c = (-1+sqrt(5))/2;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
% Main loop
for i = 1:N-2
if f1 < f2
b = x2;
x2 = x1;
f2 = f1;
x1 = c*a + (1-c)*b;
f1 = feval(f,x1);
else
a = x1;
x1 = x2;
f1 = f2;
x2 = (1-c)*a + c*b;
f2 = feval(f,x2);
end;
end
if f1 < f2
fmin = f1; xmin = x1;
else
fmin = f2; xmin = x2;
end
iii. Powell's method

function powell
clc
global x V
x = [0;0];
tol = 1.0e-4; h = 0.5;
if size(x,2) > 1; x = x'; end % x must be column vector
n = length(x); % Number of design variables
df = zeros(n,1); % Decreases of f stored here
u = eye(n); % Columns of u store search directions V
fprintf(' xmin fmin \n');
fprintf(' ------\n');
for j = 1:30 % Allow up to 30 cycles
xOld = x;
fOld = feval(@myfun2,xOld);
% First n line searches record the decrease of f
for i = 1:n
V = u(1:n,i);
[a,b] = goldBracket(@fLine,0.0,h);
[s,fmin] = goldensearch(@fLine,a,b);
df(i) = fOld - fmin;
fOld = fmin;
x = x + s*V;
end
% Last line search in the cycle
V = x - xOld;
[a,b] = goldBracket(@fLine,0.0,h);
[s,fmin] = goldensearch(@fLine,a,b);
x = x + s*V;
fprintf(' (%3.3f,%3.3f) %2.15f \n', x, fmin);
if sqrt(dot(x-xOld,x-xOld)/n) < tol
y = x; break
end
% Identify biggest decrease of f & update search
% directions
iMax = 1; dfMax = df(1);
for i = 2:n
if df(i) > dfMax
iMax = i; dfMax = df(i);
end
end
for i = iMax:n-1
u(1:n,i) = u(1:n,i+1);
end
u(1:n,n) = V;
end
fprintf('The minimum point (%2.3f,%2.3f) is reached at the %3dth cycle.\n', y, j)
function z = fLine(s)
% f in the search direction V
global x V
z = feval(@myfun2,x+s*V);
For the example on page 55 we have the following objective function.
function y = myfun2(x)
y = x(1) - x(2) + 2*(x(1)).^2 + 2*x(1)*x(2) + (x(2)).^2;
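For this objective the stationarity conditions 1 + 4x_1 + 2x_2 = 0 and -1 + 2x_1 + 2x_2 = 0 give x_1 = -1 and x_2 = 1.5, so running powell from the starting point x = [0; 0] should print cycles converging to approximately (-1.000, 1.500), where the minimum value is f = -1.25.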
References
[1] Basak Akteke, Derivative Free Optimization Methods: Application in Stirrer Configuration and Data Clustering, M.Sc. Thesis, Middle East Technical University, July 2005.
[2] Igor Griva, Stephen G. Nash and Ariela Sofer, Linear and Nonlinear Optimization, Second Edition, George Mason University, Fairfax, Virginia, 2009.
[3] Jaan Kiusalaas, Numerical Methods in Engineering with MATLAB, Second Edition, Cambridge University Press, 2010.
[4] Jorge J. Moré and Stefan M. Wild, Benchmarking Derivative-Free Optimization Algorithms, Preprint ANL/MCS-P1471-1207, December 2007.
[5] Jorge Nocedal and Stephen J. Wright, Numerical Optimization, 2nd Edition, Springer-Verlag New York, Inc., 1999.
[6] Katya Scheinberg, Derivative Free Optimization Method, course notes CS 4/6-TE3, SEW ENG 4/6-TE3 (Tamas Terlaky), IBM T. J. Watson Research Center.
[7] Melissa Weber Mendonça, Multilevel Optimization: Convergence Theory, Algorithms and Application to Derivative-Free Optimization, Ph.D. Thesis, Facultés Universitaires Notre-Dame de la Paix, Faculté des Sciences, Namur, Belgium, 2009.
[8] Mokhtar S. Bazaraa, Hanif D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd Edition.
[9] Singiresu S. Rao, Engineering Optimization: Theory and Practice, Fourth Edition, John Wiley & Sons, 2009.
[10] Wenyu Sun and Ya-Xiang Yuan, Optimization Theory and Methods: Nonlinear Programming, Springer Science+Business Media, LLC, 2006.