Applied Mathematical Sciences, Vol. 8, 2014, no. 55, 2731 - 2741 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43175

The Wolfe Epsilon Steepest Descent Algorithm

Hakima Degaichia

Department of Mathematics, University of Tebessa 12000 Tebessa, Algeria

Rachid Benzine

LANOS, Badji Mokhtar University, Annaba, Algeria

Copyright © 2014 Hakima Degaichia and Rachid Benzine. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this article, we study the unconstrained minimization problem

(P)    min { f(x) : x ∈ R^n },

where f : R^n → R is a continuously differentiable function. We introduce a new algorithm which accelerates the convergence of the steepest descent method. We study the global convergence of the new algorithm, named the Wolfe epsilon steepest descent algorithm, by using the Wolfe inexact line search ([35], [36]). In [16] and [33], Benzine, Djeghaba and Rahali studied the same problem by using exact and Armijo inexact line searches. Numerical tests show that the Wolfe epsilon steepest descent algorithm accelerates the convergence of the method and is more successful than the Armijo epsilon steepest descent, exact epsilon steepest descent and steepest descent algorithms.

Mathematics Subject Classification: 90C26; 65H10

Keywords: Unconstrained optimization; Epsilon steepest descent algorithm; Wolfe line search; Armijo line search; Exact line search; Global convergence

1. INTRODUCTION

Our problem is to minimize a function of n variables

(1.1)    min { f(x) : x ∈ R^n },

where f : R^n → R is smooth and its gradient g is available. We consider iterations of the form

(1.2)    x_{k+1} = x_k + α_k d_k,

where d_k is a search direction and α_k is a steplength obtained by means of a one-dimensional search. In conjugate gradient methods the search direction is of the form

d_k = -g_k + β_k d_{k-1},

where the scalar β_k is chosen so that the method reduces to the linear conjugate gradient method when the function is quadratic and the line search is exact. If β_k = 0, then one obtains the steepest descent method. Another broad class of methods defines the search direction by

(1.3)    d_k = -B_k^{-1} g_k,

where B_k is a nonsingular symmetric matrix. Important special cases are given by:

B_k = I (the steepest descent method),
B_k = ∇^2 f(x_k) (Newton's method).

Variable metric methods are also of the form (1.3), but in this case B_k is not only a function of x_k but depends also on B_{k-1} and x_{k-1}. All these methods are implemented so that d_k is a descent direction, i.e. so that d_k^T g_k < 0, which guarantees that the function can be decreased by taking a small step along d_k.

The convergence properties of line search methods can be studied by measuring the goodness of the search direction and by considering the length of the step. The quality of the search direction can be studied by monitoring the angle between the steepest descent direction -g_k and the search direction. Therefore we define

cos(θ_k) = -d_k^T g_k / (||g_k|| ||d_k||).

The length of the step is determined by a line search iteration. In computing the step length α_k, we face a tradeoff. We would like to choose α_k to give a substantial reduction of f, but at the same time we do not want to spend too much time making the choice. The ideal choice would be the global minimizer of the univariate function φ(α) defined by

φ(α) = f(x_k + α d_k).

An exact line search consists of finding α_k as a solution of the following one-dimensional optimization problem:

f(x_k + α_k d_k) = min { f(x_k + α d_k) : α > 0 }.

However, the exact line search is usually expensive. In particular, when an iterate is far from the solution of the problem, it is not effective to solve the one-dimensional subproblem exactly. A strategy that will play a central role in this paper consists in accepting a steplength α_k if it satisfies the two conditions:

(1.4)    f(x_k + α_k d_k) ≤ f(x_k) + δ_1 α_k g_k^T d_k,

(1.5)    g(x_k + α_k d_k)^T d_k ≥ δ_2 g_k^T d_k,

where 0 < δ_1 < δ_2 < 1. The first inequality (1.4) (the Armijo condition [5]) ensures that the function is reduced sufficiently, and the second prevents the steplength from being too small. We will call these two relations the Wolfe conditions. One can also choose α_k satisfying the following conditions:

(1.6)    f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k,

(1.7)    f(x_k + α_k d_k) ≥ f(x_k) + (1 - δ) α_k g_k^T d_k,

where 0 < δ < 1/2. Conditions (1.6) and (1.7) are called the Goldstein conditions. This line search strategy allows us to establish the following result, essentially proved by Zoutendijk ([39]) and Wolfe ([35], [36]). The starting point of the algorithm is denoted by x_1.

Theorem 1. ([39], [35], [36]). Suppose that f is bounded below in R^n and that f is continuously differentiable in a neighborhood N of the level set L = { x : f(x) ≤ f(x_1) }. Assume also that the gradient is Lipschitz continuous, i.e., there exists a constant L > 0 such that

(1.8)    ||g(x) - g(y)|| ≤ L ||x - y||

for all x, y ∈ N. Consider any iteration of the form (1.2), where d_k is a descent direction and α_k satisfies the Wolfe conditions (1.4), (1.5). Then

(1.9)    Σ_{k=1}^{∞} cos^2(θ_k) ||g_k||^2 < ∞.

We shall call inequality (1.9) the Zoutendijk condition. Suppose that an iteration of the form (1.2) is such that

(1.10)    cos(θ_k) ≥ η > 0

for all k. Then we conclude from (1.9) that

(1.11)    lim_{k→∞} ||g_k|| = 0.

The steepest descent method is one of the simplest and most fundamental minimization methods for unconstrained optimization. Since it uses the negative gradient as its descent direction, it is also called the gradient method.
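Before moving on, the Wolfe tests (1.4)-(1.5) used throughout this paper are simple to verify numerically. The short Python sketch below checks both inequalities for one candidate step; the function names, the quadratic test problem, and the parameter values δ_1 = 10^{-4}, δ_2 = 0.9 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, delta1=1e-4, delta2=0.9):
    """Check the Wolfe conditions (1.4)-(1.5) for a step alpha along d.

    f    : callable R^n -> R, the objective
    grad : callable R^n -> R^n, its gradient
    x    : current iterate x_k
    d    : descent direction d_k (must satisfy grad(x)^T d < 0)
    0 < delta1 < delta2 < 1 are the Wolfe parameters (illustrative defaults).
    """
    gTd = grad(x) @ d                                                       # g_k^T d_k < 0
    sufficient_decrease = f(x + alpha * d) <= f(x) + delta1 * alpha * gTd   # (1.4)
    curvature = grad(x + alpha * d) @ d >= delta2 * gTd                     # (1.5)
    return sufficient_decrease and curvature

# Example: quadratic f(x) = 0.5 ||x||^2 with the steepest descent direction
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, -2.0])
d = -grad(x)
print(satisfies_wolfe(f, grad, x, d, alpha=0.5))   # True for this quadratic
```

A line search routine would of course search over α until both tests hold, rather than merely verifying a single candidate.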

For many problems, the steepest descent method is very slow. Although the method usually works well in the early steps, as a stationary point is approached it descends very slowly, with a zigzagging phenomenon. There are ways to overcome these difficulties of zigzagging by deflecting the gradient. Rather than moving along d_k = -∇f(x_k), we can move along d_k = -D_k ∇f(x_k) ([8], [9], [10], [12], [13], [14], [15], [17], [21], [22], [23], [24], [25], [28], [31], [32], [33]) or along d_k = -g_k + h_k ([18], [19], [20], [21], [27], [29], [30]), where D_k is an appropriate matrix and h_k is an appropriate vector. In [16] and [33], Benzine, Djeghaba and Rahali provided another solution to this problem by accelerating the convergence of the gradient method. They achieved this goal by designing a new algorithm, named the epsilon steepest descent algorithm, in which the Cordellier-Wynn epsilon algorithm ([11], [37], [38]) with exact and Armijo inexact line searches plays a prominent role. In this work we accelerate the convergence of the gradient method by using the Cordellier epsilon algorithm ([11]) and the Wolfe inexact line search ([35], [36]). We study the global convergence of the new algorithm, named the Wolfe epsilon steepest descent algorithm, under the Wolfe inexact line searches (1.4), (1.5) ([35], [36]). Numerical tests show that our algorithm accelerates the convergence of the gradient method and is at least as efficient as the other epsilon steepest descent algorithms ([16], [33]).

This paper is organized as follows. In the next section, the epsilon algorithm is recalled. The new algorithm is stated in Section 3, its global convergence is established in Section 4, and numerical results are presented in Section 5.

2. THE EPSILON ALGORITHM

The Epsilon Algorithm is due to P. Wynn ([37], [38]). Given a sequence {x_k}_{k ∈ N}, x_k ∈ R^n, the coordinates of x_k will be denoted as follows:

x_k = (x_k^1, x_k^2, ..., x_k^i, ..., x_k^n) ∈ R^n.

For i ∈ {1, 2, ..., n}, the Epsilon Algorithm calculates quantities with two indices ε_j^{k,i} (j, k = 0, 1, ...) as follows:

(2.1)    ε_{-1}^{k,i} = 0,    ε_0^{k,i} = x_k^i,    k = 0, 1, ...,

         ε_{j+1}^{k,i} = ε_{j-1}^{k+1,i} + 1 / (ε_j^{k+1,i} - ε_j^{k,i}),    j, k = 0, 1, ...

For i ∈ {1, 2, ..., n}, these numbers can be placed in an array as follows:

ε_{-1}^{0,i} = 0
                     ε_0^{0,i} = x_0^i
ε_{-1}^{1,i} = 0                        ε_1^{0,i}
                     ε_0^{1,i} = x_1^i                     ε_2^{0,i}
ε_{-1}^{2,i} = 0                        ε_1^{1,i}                     ε_3^{0,i}
                     ε_0^{2,i} = x_2^i                     ε_2^{1,i}                     ε_4^{0,i}
ε_{-1}^{3,i} = 0                        ε_1^{2,i}                     ε_3^{1,i}                     ε_5^{0,i}
                     ε_0^{3,i} = x_3^i                     ε_2^{2,i}                     ε_4^{1,i}                     ε_6^{0,i}
ε_{-1}^{4,i} = 0                        ε_1^{3,i}                     ε_3^{2,i}                     ε_5^{1,i}                     ε_7^{0,i}
                     ε_0^{4,i} = x_4^i                     ε_2^{3,i}                     ε_4^{2,i}                     ε_6^{1,i}
...                                     ...                           ...                           ...                           ...

This array is called the ε-array. In it, the lower index denotes a column while the upper index denotes a diagonal. For i ∈ {1, 2, ..., n}, the Epsilon Algorithm relates the numbers located at the four vertices of a rhombus:

                      ε_j^{k,i}
ε_{j-1}^{k+1,i}                       ε_{j+1}^{k,i}
                      ε_j^{k+1,i}

To calculate the quantity ε_{j+1}^{k,i}, we need to know the numbers ε_{j-1}^{k+1,i}, ε_j^{k+1,i} and ε_j^{k,i}.
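To illustrate the recursion (2.1), here is a minimal Python sketch of the scalar ε-algorithm applied to a single coordinate sequence. It assumes the denominators in (2.1) never vanish; the example sequence (partial sums of the series for ln 2) is only an illustration and does not come from the paper.

```python
def wynn_epsilon(x, j_max):
    """Build the epsilon array of Wynn's recursion (2.1) for a scalar sequence x.

    eps[k][j + 1] stores eps_j^{k} (one fixed coordinate i of the paper):
    eps_{-1}^{k} = 0, eps_0^{k} = x_k, and
    eps_{j+1}^{k} = eps_{j-1}^{k+1} + 1 / (eps_j^{k+1} - eps_j^{k}).
    """
    n = len(x)
    eps = [[0.0] * (j_max + 2) for _ in range(n)]   # column -1 is the leading zeros
    for k in range(n):
        eps[k][1] = x[k]                            # eps_0^{k} = x_k
    for j in range(j_max):
        for k in range(n - j - 1):
            denom = eps[k + 1][j + 1] - eps[k][j + 1]
            eps[k][j + 2] = eps[k + 1][j] + 1.0 / denom   # recursion (2.1)
    return eps

# Partial sums of the slowly convergent series ln 2 = 1 - 1/2 + 1/3 - ...
s, partial = 0.0, []
for m in range(1, 9):
    s += (-1) ** (m + 1) / m
    partial.append(s)
table = wynn_epsilon(partial, 4)
print(partial[-1], table[0][5])   # eps_4^{0} is far closer to ln 2 ~ 0.6931 than the last partial sum
```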

3. THE WOLFE EPSILON STEEPEST DESCENT ALGORITHM

To construct our algorithm, we use the column ε_2^{k,i} (i = 1, 2, ..., n). Given a sequence {x_k}_{k ∈ N} (with coordinates x_k^i, i = 1, 2, ..., n), Aitken and F. Cordellier ([1], [6], [11]) proposed another formula to calculate the epsilon algorithm of order 2. The quantities ε_2^{k,i} can be calculated as follows:

(3.1)    ε_2^{k,i} = x_{k+1}^i + ( 1/(x_{k+2}^i - x_{k+1}^i) - 1/(x_{k+1}^i - x_k^i) )^{-1},    i = 1, 2, ..., n.

To calculate ε_2^{k,i}, we use the elements x_k^i, x_{k+1}^i and x_{k+2}^i (i = 1, 2, ..., n). Numerical experiments ([11]) showed that the epsilon algorithm of order 2 with the Cordellier formula (3.1) is more stable than the Wynn epsilon algorithm (2.1). We are now in a position to introduce our new algorithm: the Wolfe epsilon steepest descent algorithm.

Initialization step: Choose an initial point x_0 ∈ R^n. The coordinates of x_0 will be denoted as follows:

x_0 = (x_0^1, x_0^2, ..., x_0^i, ..., x_0^n) ∈ R^n.

Let k = 0 and go to the Main Step.

Main Step: Start with the vector x_k = (x_k^1, x_k^2, ..., x_k^i, ..., x_k^n).

If ∇f(x_k) = 0, stop. Otherwise, let r_k = x_k and compute the vectors s_k and t_k by applying the steepest descent step twice, with the Wolfe inexact line search:

s_k = r_k - λ_k ∇f(r_k)    and    t_k = s_k - μ_k ∇f(s_k),

where λ_k and μ_k are positive scalars obtained by the Wolfe inexact line search (1.4), (1.5). If

s_k^i - r_k^i ≠ 0,    t_k^i - s_k^i ≠ 0    and    1/(t_k^i - s_k^i) - 1/(s_k^i - r_k^i) ≠ 0,    i = 1, ..., n,

let

ε_2^{k,i} = s_k^i + ( 1/(t_k^i - s_k^i) - 1/(s_k^i - r_k^i) )^{-1},    i = 1, ..., n,

and

ε_2^k = (ε_2^{k,1}, ..., ε_2^{k,i}, ..., ε_2^{k,n}).

If f(ε_2^k) < f(t_k), let x_{k+1} = ε_2^k, replace k by k + 1 and go to the Main Step.

If f(ε_2^k) ≥ f(t_k), or if

s_k^{i_0} - r_k^{i_0} = 0    or    t_k^{i_0} - s_k^{i_0} = 0    or    1/(t_k^{i_0} - s_k^{i_0}) - 1/(s_k^{i_0} - r_k^{i_0}) = 0    for some    i_0 ∈ {1, ..., n},

let x_{k+1} = t_k, replace k by k + 1 and go to the Main Step.
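A compact Python sketch of the Main Step is given below. It is only an illustration of the scheme under stated assumptions: the Wolfe line search (1.4)-(1.5) is delegated to scipy.optimize.line_search (which enforces conditions of this form), the exact stopping test ∇f(x_k) = 0 is replaced by a small tolerance on ||∇f(x_k)||, and the test problem is an arbitrary ill-conditioned quadratic rather than one of the paper's test functions.

```python
import numpy as np
from scipy.optimize import line_search

def wolfe_epsilon_steepest_descent(f, grad, x0, tol=1e-6, max_iter=10000):
    """Sketch of the Wolfe epsilon steepest descent Main Step (assumed tolerances)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:
            break
        # two steepest descent steps with Wolfe line searches (1.4)-(1.5)
        r = x
        lam = line_search(f, grad, r, -grad(r))[0]           # lambda_k
        if lam is None:                                      # line search failed
            break
        s = r - lam * grad(r)
        mu = line_search(f, grad, s, -grad(s))[0]            # mu_k
        if mu is None:
            break
        t = s - mu * grad(s)
        # coordinate-wise Cordellier formula (3.1) when all denominators are nonzero
        d1, d2 = s - r, t - s
        if np.all(d1 != 0) and np.all(d2 != 0) and np.all(1.0 / d2 - 1.0 / d1 != 0):
            eps2 = s + 1.0 / (1.0 / d2 - 1.0 / d1)
            x = eps2 if f(eps2) < f(t) else t                # accept eps2 only if it improves on t
        else:
            x = t
    return x

# Ill-conditioned quadratic on which plain steepest descent zigzags (illustrative)
f = lambda x: 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)
grad = lambda x: np.array([x[0], 100.0 * x[1]])
print(wolfe_epsilon_steepest_descent(f, grad, np.array([10.0, 1.0])))   # ~ (0, 0)
```

On a quadratic like this one, the plain steepest descent iterates zigzag, and the coordinate-wise ε_2 extrapolation computed from r_k, s_k, t_k is precisely the device the algorithm uses to cut through that zigzagging.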

4. GLOBAL CONVERGENCE OF THE WOLFE EPSILON STEEPEST DESCENT ALGORITHM

The foregoing preparatory results enable us to establish the following theorem.

Theorem 2. For the unconstrained minimization problem (1.1), let x_1 be a starting point of the Wolfe epsilon steepest descent Algorithm, and assume that the following assumptions hold: f is bounded below in R^n and f is continuously differentiable in a neighborhood N of the level set L = { x : f(x) ≤ f(x_1) }. Assume also that the gradient is Lipschitz continuous, i.e., there exists a constant L > 0 such that

||g(x) - g(y)|| ≤ L ||x - y||

for all x, y ∈ N. Then the sequence {x_k}_{k ∈ N} generated by the Wolfe epsilon steepest descent algorithm satisfies one of the following properties: ∇f(x_{k_0}) = 0 for some k_0 ∈ N, or ||∇f(x_k)|| → 0 as k → ∞.

Proof. Suppose that an infinite sequence {x_k}_{k ∈ N} is generated by the Wolfe epsilon steepest descent Algorithm. According to the Algorithm, the vectors s_k and t_k are obtained by applying the steepest descent step twice, with the Wolfe inexact line search. Then we have

f(s_k) < f(r_k) = f(x_k)

and

f(t_k) < f(s_k).

Now, following the Algorithm, if the calculation of ε_2^k is possible, two cases can arise:

a) f(ε_2^k) < f(t_k). Then we have f(x_{k+1}) = f(ε_2^k) < f(t_k) < f(s_k) < f(x_k).

b) f(ε_2^k) ≥ f(t_k), or the calculation of ε_2^k is not possible. In this case, according to the Algorithm, we have f(x_{k+1}) = f(t_k) < f(s_k) < f(x_k).

In conclusion, the Wolfe epsilon steepest descent Algorithm guarantees

(4.1)    f(x_{k+1}) < f(x_k),    k = 0, 1, 2, ...

On the other hand, ε_2^k can be written as follows:

(4.2)    ε_2^k = x_{k+1} = x_k + α_k d_k,

where d_k and α_k depend on s_k, r_k, t_k, λ_k and μ_k. (4.1) implies that d_k is a descent direction. Then, by Theorem 1, we have

(4.3)    Σ_{k=1}^{∞} cos^2(θ_k) ||g_k||^2 < ∞.

The Wolfe epsilon steepest descent Algorithm is similar to the steepest descent algorithm, so it is not difficult to prove that

(4.4)    cos(θ_k) ≥ η > 0

for all k. Then we conclude from (4.3) that

(4.5)    lim_{k→∞} ||g_k|| = 0.    □

5. NUMERICAL RESULTS AND COMPARISONS

In this section we report some numerical results obtained with an implementation of the Wolfe Epsilon Steepest Descent algorithm. For our numerical tests, we used test functions and Fortran programs from [2]. Considering the same criteria as in [3], the code is written in Fortran and compiled with f90 on an Intel Pentium 4 workstation at 2 GHz. We selected 52 unconstrained optimization test functions in generalized or extended form [2] (some from the CUTE library [4]). For each test function we performed twenty (20) numerical experiments with the number of variables increasing as n = 2, 10, 30, 50, 70, 100, 300, 500, 700, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000. The algorithm implements the Wolfe line search conditions (1.4), (1.5) ([35], [36]), and the same stopping criterion ||∇f(x_k)|| < 10^{-6}. In all the algorithms considered in this numerical study, the maximum number of iterations is limited to 100000.

The comparisons of the algorithms are given in the following context. Let f_i^{ALG1} and f_i^{ALG2} be the optimal values found by ALG1 and ALG2, respectively, for problem i = 1, ..., 962. We say that, for the particular problem i, the performance of ALG1 was better than the performance of ALG2 if:

|f_i^{ALG1} - f_i^{ALG2}| < 10^{-3}

and the number of iterations, or the number of function-gradient evaluations, or the CPU time of ALG1 was less than the number of iterations, or the number of function-gradient evaluations, or the CPU time corresponding to ALG2, respectively.

In this set of numerical experiments we compare the Wolfe Epsilon Steepest Descent algorithm against the Armijo Epsilon Steepest Descent, Exact Epsilon Steepest Descent and Steepest Descent algorithms. Figure 1 shows the Dolan and Moré CPU performance profiles of the Wolfe Epsilon Steepest Descent algorithm versus the Armijo Epsilon Steepest Descent, Exact Epsilon Steepest Descent and Steepest Descent algorithms. In a performance profile plot, the top curve corresponds to the method that solved the most problems in a time that was within a factor τ of the best time. The percentage of the test problems for which a method is the fastest is given on the left side of the plot. The right side of the plot gives the percentage of the test problems that were successfully solved by each algorithm; it is mainly a measure of the robustness of an algorithm. When comparing the Wolfe Epsilon Steepest Descent algorithm to the Armijo Epsilon Steepest Descent, Exact Epsilon Steepest Descent and Steepest Descent algorithms with respect to the CPU time metric, we see that the Wolfe Epsilon Steepest Descent algorithm is the top performer: it is more successful than the Armijo Epsilon Steepest Descent, Exact Epsilon Steepest Descent and Steepest Descent algorithms.

[Figure 1. Dolan-Moré CPU time performance profiles of the Wolfe Epsilon Steepest Descent, Armijo Epsilon Steepest Descent, Exact Epsilon Steepest Descent and Steepest Descent algorithms.]
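As a reminder of how a Dolan-Moré profile such as Figure 1 is constructed, the sketch below computes the profile curves ρ_s(τ) from a matrix of CPU times (rows are test problems, columns are solvers). The timing numbers here are placeholders for illustration only; they are not the measurements behind Figure 1.

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More profile: fraction of problems each solver solves within factor tau of the best.

    times : (n_problems, n_solvers) array of CPU times (np.inf marks a failure)
    taus  : increasing array of performance ratios tau >= 1
    Returns rho with shape (len(taus), n_solvers).
    """
    ratios = times / times.min(axis=1, keepdims=True)          # r_{p,s} = t_{p,s} / best_p
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Placeholder timings for 4 solvers on 5 problems (seconds)
times = np.array([[1.0, 1.2, 2.0, 3.5],
                  [0.8, 0.9, 1.6, 2.2],
                  [2.0, 1.5, 2.5, np.inf],   # the last solver failed on this problem
                  [0.5, 0.7, 0.6, 0.9],
                  [1.1, 1.0, 1.8, 2.5]])
rho = performance_profile(times, taus=np.array([1.0, 2.0, 4.0]))
print(rho)   # row = tau, column = solver; the top curve in Figure 1 is the best solver
```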

References

[1] Aitken, A.C., "On Bernoulli's numerical solution of algebraic equations", Proc. Roy. Soc. Edinburgh, 46, pp. 289-305, 1926.
[2] Andrei, N., "An unconstrained optimization test functions collection", Advanced Modeling and Optimization, 10 (2008), pp. 147-161.
[3] Andrei, N., "Another conjugate gradient algorithm for unconstrained optimization", Annals of Academy of Romanian Scientists, Series on Science and Technology of Information, vol. 1, nr. 1, 2008, pp. 7-20.
[4] Bongartz, I., Conn, A.R., Gould, N.I.M., Toint, P.L., "CUTE: constrained and unconstrained testing environments", ACM Transactions on Mathematical Software, 21 (1995), pp. 123-160.
[5] Armijo, L., "Minimization of functions having Lipschitz continuous first partial derivatives", Pacific J. Mathematics, 16, 1, pp. 1-3, 1966.
[6] Brezinski, C., "Accélération de la convergence en analyse numérique", Lecture Notes in Mathematics, 584, Springer Verlag, 1977.
[7] Bongartz, I., Conn, A.R., Gould, N.I.M., Toint, P.L., "CUTE: constrained and unconstrained testing environments", ACM Trans. Math. Software, 21, pp. 123-160, 1995.
[8] Broyden, C.G., "Quasi-Newton methods and their application to function minimization", Mathematics of Computation, 21, pp. 368-381, 1967.
[9] Broyden, C.G., "The convergence of a class of double rank minimization algorithms 2. The new algorithm", J. Institute of Mathematics and its Applications, 6, pp. 222-231, 1970.
[10] Broyden, C.G., Dennis, J.E. Jr. and Moré, J.J., "On the local and superlinear convergence of quasi-Newton methods", J. Inst. Math. Appl., 12 (1973), pp. 223-246.
[11] Cordellier, F., "Transformations de suites scalaires et vectorielles", Thèse de doctorat d'état soutenue à l'université de Lille I, 1981.

[12] Davidon, W.C., "Variable metric method for minimization", AEC Research and Development Report ANL-5990, 1959.
[13] Dennis, J.E. Jr. and Moré, J.J., "A characterization of superlinear convergence and its application to quasi-Newton methods", Math. Comp., 28 (1974), pp. 549-560.
[14] Dennis, J.E. and Moré, J.J., "Quasi-Newton methods, motivation and theory", SIAM Rev., 19 (1977), pp. 46-89.
[15] Dixon, L.C.W., "Variable metric algorithms: necessary and sufficient conditions for identical behavior on nonquadratic functions", J. Opt. Theory Appl., 10 (1972), pp. 34-40.
[16] Djeghaba, N. and Benzine, R., "Accélération de la convergence de la méthode de la plus forte pente", Demonstratio Mathematica, Vol. 39, No. 1 (2006), pp. 169-181.
[17] Fletcher, R., "A new approach to variable metric algorithms", Computer Journal, 13, pp. 317-322, 1970.
[18] Fletcher, R., "Practical Methods of Optimization", Second Edition, John Wiley & Sons, Chichester, 1987.
[19] Fletcher, R., "An overview of unconstrained optimization", in Algorithms for Continuous Optimization: the State of the Art, E. Spedicato, ed., Kluwer Academic Publishers, 1994.
[20] Fletcher, R. and Reeves, C.M., "Function minimization by conjugate gradients", Computer J., 7, pp. 149-154, 1964.
[21] Fletcher, R. and Powell, M.J.D., "A rapidly convergent descent method for minimization", Computer Journal, 6, pp. 163-168, 1963.
[22] Forsythe, G.E., "On the asymptotic directions of the s-dimensional optimum gradient method", Numerische Mathematik, 11, pp. 57-76.
[23] Gill, P.E. and Murray, W., "Quasi-Newton methods for unconstrained optimization", J. Inst. Maths Applics, vol. 9, pp. 91-108, 1972.
[24] Griewank, A., "The global convergence of partitioned BFGS on problems with convex decompositions and Lipschitz gradients", Math. Prog., 50 (1991), pp. 141-175.
[25] Goldfarb, D., "A family of variable metric methods derived by variational means", Mathematics of Computation, 24, pp. 23-26, 1970.
[26] Goldstein, A.A. and Price, J.F., "An effective algorithm for minimization", Numerische Mathematik, 10, pp. 184-189, 1967.
[27] Hestenes, M.R. and Stiefel, E., "Methods of conjugate gradients for solving linear systems", J. Res. National Bureau of Standards, 49, No. 6, pp. 409-436, 1952.
[28] Nocedal, J. and Wright, S.J., "Numerical Optimization", Springer, Second Edition, 2006.
[29] Polak, E. and Ribière, G., "Note sur la convergence de méthodes de directions conjuguées", Revue Française Informatique et Recherche Opérationnelle, 16, pp. 35-43, 1969.
[30] Polyak, B.T., "The method of conjugate gradient in extremum problems", USSR Computational Mathematics and Mathematical Physics (English Translation), 9, pp. 94-112, 1969.
[31] Powell, M.J.D., "On the convergence of the variable metric algorithms", J. Inst. Math. Appl., 7 (1971), pp. 21-36.
[32] Powell, M.J.D., "Some global convergence properties of variable metric algorithms for minimization without exact line searches", in SIAM-AMS Proceedings, Vol. IX, R.W. Cottle and C.E. Lemke, eds., SIAM, 1976.
[33] Rahali, N., Djeghaba, N. and Benzine, R., "Global convergence of the Armijo epsilon steepest descent algorithm", Rev. Anal. Numer. Theor. Approx., 41 (2012), no. 2, pp. 169-180 (2013).
[34] Shanno, D.F., "Conditioning of quasi-Newton methods for function minimization", Mathematics of Computation, 24, pp. 641-656, 1970.
[35] Wolfe, P., "Convergence conditions for ascent methods", SIAM Review, 11, pp. 226-235, 1969.

[36] Wolfe, P., "Convergence conditions for ascent methods II: Some corrections", SIAM Review, 13, pp. 185-188, 1971.
[37] Wynn, P., "On a device for computing the e_m(S_n) transformation", M.T.A.C., 10 (1956), pp. 91-96.
[38] Wynn, P., "Upon systems of recursions which obtain among quotients of the Padé table", Numer. Math., 8 (1966), pp. 264-269.
[39] Zoutendijk, G., "Nonlinear programming, computational methods", in: J. Abadie (Ed.), Integer and Nonlinear Programming, North-Holland, Amsterdam, 1970, pp. 37-86.

Received: March 6, 2014