A Wolfe line search algorithm for vector optimization

L. R. LUCAMBIO PÉREZ and L. F. PRUDENTE, Universidade Federal de Goiás, Brazil

In a recent paper, Lucambio Pérez and Prudente extended the Wolfe conditions to vector-valued optimization. Here, we propose a line search algorithm for finding a step size satisfying the strong Wolfe conditions in the vector optimization setting. Well-definedness and finite termination results are provided. We discuss practical aspects related to the algorithm and present some numerical experiments illustrating its applicability. Codes supporting this paper are written in Fortran 90 and are freely available for download.

CCS Concepts: • Mathematics of computing → Mathematical software; Continuous optimization;

Additional Key Words and Phrases: line search algorithm, Wolfe conditions, vector optimization

ACM Reference Format:
L. R. Lucambio Pérez and L. F. Prudente. 2019. A Wolfe line search algorithm for vector optimization. ACM Trans. Math. Softw. 1, 1 (June 2019), 23 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Authors' address: L. R. Lucambio Pérez, [email protected]; L. F. Prudente, [email protected], Universidade Federal de Goiás, Instituto de Matemática e Estatística, Avenida Esperança, s/n, Campus Samambaia, 74690-900, Goiânia, GO, Brazil.

1 INTRODUCTION

In vector optimization, several objectives have to be minimized simultaneously with respect to a partial order induced by a closed convex pointed cone with non-empty interior. The particular case where the cone is the so-called Pareto cone, i.e., the positive orthant, corresponds to multiobjective optimization. Many practical models in different areas lead to multiobjective and vector optimization problems; see, for example, [De et al. 1992; Fliege and Vicente 2006; Fliege and Werner 2014; Gravel et al. 1992; Hong et al. 2008; Leschine et al. 1992; Tavana 2004; Tavana et al. 2010; White 1998]. Recently, the extension of iterative methods from scalar-valued to multiobjective and vector-valued optimization has received considerable attention from the optimization community. Works dealing with this subject include steepest descent and projected gradient [Chuong and Yao 2012; Cruz et al. 2011; Fliege and Svaiter 2000; Fukuda and Graña Drummond 2011, 2013; Graña Drummond and Iusem 2004; Graña Drummond and Svaiter 2005; Miglierina et al. 2008], conjugate gradient [Lucambio Pérez and Prudente 2018], Newton [Carrizo et al. 2016; Chuong 2013; Fliege et al. 2009; Graña Drummond et al. 2014; Lu and Chen 2014], Quasi-Newton [Ansary and Panda 2015; Qu et al. 2011, 2013], subgradient [Bello Cruz 2013], interior point [Villacorta and Oliveira 2011], and proximal methods [Bonnel et al. 2005; Ceng et al. 2010; Ceng and Yao 2007; Chuong 2011; Chuong et al. 2011].

Line search techniques are essential components to globalize nonlinear descent optimization algorithms. Concepts such as the Armijo and Wolfe conditions are the basis for a broad class of minimization methods that employ line search strategies. Hence, the development of efficient line search procedures that satisfy these kinds of conditions is a crucial ingredient in the formulation of practical algorithms for optimization. This is a well-explored topic in classical optimization; see [Al-Baali and Fletcher 1986; Dennis and Schnabel 1996; Fletcher 2013;
Hager 1989; Hager and Zhang 2005; Lemaréchal 1981; Moré and Sorensen 1982; Moré and Thuente 1994]. In particular, we highlight the paper [Moré and Thuente 1994], where Moré and Thuente proposed an algorithm designed to find a step size satisfying the strong scalar-valued Wolfe conditions. On the other hand, the Wolfe conditions were extended to the vector optimization setting only recently in [Lucambio Pérez and Prudente 2018]. In that work, the authors also introduced the Zoutendijk condition for vector optimization and proved that it holds for a general descent line search method that satisfies the vector-valued Wolfe conditions. The Zoutendijk condition is a powerful tool for analyzing the convergence of line search methods. These extensions opened perspectives for the formulation of new algorithms for vector optimization problems.

In the present paper, we take a step beyond [Lucambio Pérez and Prudente 2018] from the perspective of actual implementations of line search strategies. We propose a line search algorithm for finding a step size satisfying the strong vector-valued Wolfe conditions. At each iteration, our algorithm works with a scalar function and uses an inner solver designed to find a step size satisfying the strong scalar-valued Wolfe conditions. In the multiobjective optimization case, such a scalar function corresponds to one of the objectives. Assuming that the inner solver stops in finite time, we show that the proposed algorithm is well defined and terminates its execution in a finite number of (outer) iterations. In our implementation, we use the algorithm of Moré and Thuente as the inner solver. Its finite termination is guaranteed except in pathological cases, see [Moré and Thuente 1994]. It should be mentioned that an early version of the algorithm proposed here was used in the numerical experiments of [Lucambio Pérez and Prudente 2018], where a family of conjugate gradient methods for vector optimization problems was introduced and tested. We also present a theoretical result: using the convergence result of our algorithm, we improve the theorem in [Lucambio Pérez and Prudente 2018] on the existence of step sizes satisfying the vector-valued Wolfe conditions. We also discuss some practical aspects related to our implementation of the proposed algorithm. In particular, using the geometry of the image set under F in a multiobjective optimization problem, we show that a convex quadratic objective can be neglected, simplifying the problem. Numerical experiments illustrating the applicability of the algorithm are presented. Our codes are written in Fortran 90 and are freely available for download at https://lfprudente.ime.ufg.br/.

This paper is organized as follows. In Section 2 we provide some useful definitions, notations, and preliminary results. In Section 3 we describe the Wolfe line search algorithm for vector optimization and present its convergence analysis. Section 4 is devoted to practical aspects related to the algorithm. Section 5 describes the numerical experiments. Finally, we make some comments about this work in Section 6.

Notation. The symbol ⟨·, ·⟩ is the usual inner product in R^m and ∥·∥ denotes the Euclidean norm. If K = (k_1, k_2, ...) ⊆ N (with k_j < k_{j+1} for all j), we denote K ⊂_∞ N.
2 PRELIMINARIES

A closed, convex, and pointed cone K ⊂ R^m with non-empty interior induces a partial order in R^m as follows:

u ⪯_K v (u ≺_K v) if and only if v − u ∈ K (v − u ∈ int(K)),

where int(K) denotes the interior of K. In vector optimization, given a function F : R^n → R^m, we seek to find a K-efficient or K-optimal point of F, that is, a point x ∈ R^n for which there is no other y ∈ R^n with F(y) ⪯_K F(x) and F(y) ≠ F(x). We denote this problem by

Minimize_K F(x), x ∈ R^n.     (1)
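For the multiobjective case, where K is the Pareto cone R^m_+, the partial order ⪯_K is the componentwise order and K-efficiency is the usual Pareto efficiency. The following minimal Python sketch is only an illustration of these definitions (the function names are ours; the codes supporting this paper are written in Fortran 90): it checks whether one objective vector dominates another.

```python
import numpy as np

def dominates(Fy, Fx):
    """Return True if Fy ⪯_K Fx and Fy ≠ Fx for the Pareto cone K = R^m_+,
    i.e., Fx - Fy lies in K componentwise with at least one strict inequality."""
    Fy, Fx = np.asarray(Fy, dtype=float), np.asarray(Fx, dtype=float)
    return bool(np.all(Fx - Fy >= 0.0) and np.any(Fx - Fy > 0.0))

def is_efficient_over(F, x, candidates):
    """x is K-efficient for F over a finite candidate set if no candidate dominates it."""
    Fx = F(x)
    return not any(dominates(F(y), Fx) for y in candidates)

if __name__ == "__main__":
    # Simple bi-objective example: F(x) = (x^2, (x - 1)^2) for scalar x.
    F = lambda x: np.array([x[0] ** 2, (x[0] - 1.0) ** 2])
    print(dominates(F([0.5]), F([2.0])))   # True: [0.25, 0.25] dominates [4.0, 1.0]
```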

When F is continuously differentiable, a necessary condition for K-optimality of x ∈ R^n is

−int(K) ∩ Image(JF(x)) = ∅,     (2)

where JF(x) is the Jacobian of F at x and Image(JF(x)) ⊂ R^m denotes the image of JF(x). A point x satisfying (2) is called K-critical. If x is not K-critical, then there is d ∈ R^n such that JF(x)d ∈ −int(K). In this case, it is possible to show that d is a K-descent direction for F at x, i.e., there exists T > 0 such that 0 < t < T implies that F(x + td) ≺_K F(x), see [Luc 1989].

The positive polar cone of K is defined by K* := {w ∈ R^m | ⟨w, y⟩ ≥ 0, ∀y ∈ K}. Let C ⊂ K* \ {0} be a compact set such that

K* = cone(conv(C)),     (3)

where conv(C) denotes the convex hull of C, and cone(conv(C)) is the cone generated by conv(C). When K is polyhedral, K* is also polyhedral and the finite set of its extremal rays is such that (3) holds. For example, in multiobjective optimization, where K = R^m_+, we have K* = K and C can be defined as the canonical basis of R^m. If K is described as an intersection of half-spaces, i.e., K = {z | Az ≥ 0}, where A ∈ R^{p×m}, we have K* = {y | y = A^T µ, µ ∈ R^p_+}, which implies that a possible choice of C is the set of row-vectors of A. For a generic cone K, we may take C = {w ∈ K* | ∥w∥ = 1}.

Assume that F is continuously differentiable. Define φ : R^m → R by

φ(y) := sup{⟨w, y⟩ | w ∈ C},

and f : R^n × R^n → R by

f(x, d) := φ(JF(x)d) = sup{⟨w, JF(x)d⟩ | w ∈ C}.

Since C is compact, φ and f are well defined. The function φ provides characterizations of −K and −int(K), because

−K = {y ∈ R^m | φ(y) ≤ 0} and −int(K) = {y ∈ R^m | φ(y) < 0}.

In its turn, f gives a characterization of the K-descent directions of F at x, in the sense that d is a K-descent direction for F at x if and only if f(x, d) < 0, see [Graña Drummond and Svaiter 2005].

Let d ∈ R^n be a K-descent direction for F at x, and let C be as in (3). It is said that α > 0 satisfies the standard Wolfe conditions if

F(x + αd) ⪯_K F(x) + ρα f(x, d)e,
f(x + αd, d) ≥ σ f(x, d),     (4)

where 0 < ρ < σ < 1 and e ∈ K is a vector such that

0 < ⟨w, e⟩ ≤ 1 for all w ∈ C.     (5)

The strong Wolfe conditions require α > 0 to satisfy

F(x + αd) ⪯_K F(x) + ρα f(x, d)e,     (6a)
|f(x + αd, d)| ≤ σ |f(x, d)|,     (6b)
see [Lucambio Pérez and Prudente 2018, Definition 3.1]. It is easy to see that if α > 0 satisfies the strong Wolfe conditions, then the standard Wolfe conditions also hold for α.

Observe that e := ẽ/γ, where ẽ is any vector belonging to int(K) and γ is the maximum value of the continuous linear function ⟨w, ẽ⟩ over the compact set C, fulfills (5), see [Lucambio Pérez and Prudente 2018]. For multiobjective optimization, where K = R^m_+ and C is given by the canonical basis of R^m, we may take e = [1, ..., 1]^T ∈ R^m.
It is worth mentioning that if α > 0 and

⟨w, F(x + αd)⟩ ≤ ⟨w, F(x)⟩ + αρ f(x, d) for all w ∈ C,

then α satisfies condition (6a) for any e ∈ K such that (5) holds.

In [Lucambio Pérez and Prudente 2018, Proposition 3.2] the authors showed that if C is finite, x is non-critical, d is a K-descent direction for F at x, and F is bounded below, in the sense that there exists F̄ ∈ R^m such that

F(x + αd) ⪰_K F̄ for all α > 0,

then there exist intervals of positive step sizes satisfying the standard and strong Wolfe conditions. In the next section we will improve this result by showing that the boundedness of ⟨w, F(x + αd)⟩, for one w ∈ C, is sufficient to guarantee the existence of step sizes satisfying the Wolfe conditions.

Now consider a general line search method for vector optimization

x^{ℓ+1} = x^ℓ + α_ℓ d^ℓ, ℓ ≥ 0,     (7)

where d^ℓ is a K-descent direction for F at x^ℓ, and the step size α_ℓ satisfies the standard Wolfe conditions (4). Under mild assumptions, the general method (7) satisfies the vector Zoutendijk condition given by

∑_{ℓ≥0} f(x^ℓ, d^ℓ)^2 / ∥d^ℓ∥^2 < ∞,

see [Lucambio Pérez and Prudente 2018, Proposition 3.3]. This condition can be used to derive convergence results for several line search algorithms. Particularly, in [Lucambio Pérez and Prudente 2018], the concepts of Wolfe and Zoutendijk conditions were widely used in the study of nonlinear conjugate gradient methods in the vector optimization setting.

3 THE WOLFE LINE SEARCH ALGORITHM

Hereafter, we assume that F is continuously differentiable, d ∈ R^n is a K-descent direction for F at x, and C := {w_1, ..., w_p} ⊂ K* \ {0} is such that

K* = cone(conv({w_1, ..., w_p})).

Let I := {1, ..., p} be the set of indexes associated with the elements of C. Define ϕ_i : R → R by

ϕ_i(α) := ⟨w_i, F(x + αd)⟩,     (8)

for all i ∈ I, and

ϕ'_max(0) := max{ϕ'_i(0) | i ∈ I}.

Observe that f(x + αd, d) = max{ϕ'_i(α) | i ∈ I}, and so ϕ'_max(0) = f(x, d). Our algorithm seeks to find a step size α > 0 such that

ϕ_i(α) ≤ ϕ_i(0) + αρ ϕ'_max(0) for all i ∈ I,     (9a)
max{ϕ'_i(α) | i ∈ I} ≤ −σ ϕ'_max(0).     (9b)

It is easy to verify that any α > 0 satisfying conditions (9) also satisfies the strong Wolfe conditions (6) with any e ∈ K such that (5) holds.

The flowchart of Algorithm 1 in Figure 1 helps to understand the structure of the line search procedure.
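To give a concrete reading of the quantities manipulated by the algorithm, the following Python sketch evaluates ϕ_i, ϕ'_i and tests conditions (9) in the multiobjective case (K = R^m_+ and C the canonical basis, so that ϕ_i(α) = F_i(x + αd)). It is an illustrative sketch under these assumptions, not part of the authors' Fortran 90 implementation, and all identifiers are hypothetical.

```python
import numpy as np

def phi(F, x, d, alpha):
    """ϕ_i(α) = F_i(x + αd), returned as a vector over i ∈ I
    (Pareto cone K = R^m_+ with C the canonical basis)."""
    return F(x + alpha * d)

def dphi(JF, x, d, alpha):
    """ϕ'_i(α) = ⟨∇F_i(x + αd), d⟩, i.e., the i-th entry of JF(x + αd) d."""
    return JF(x + alpha * d) @ d

def satisfies_conditions_9(F, JF, x, d, alpha, rho, sigma):
    """Test (9a) and (9b): sufficient decrease for every ϕ_i, and the curvature
    bound on max_i ϕ'_i(α), using ϕ'_max(0) = max_i ϕ'_i(0) (= f(x, d))."""
    phi0 = phi(F, x, d, 0.0)
    dphimax0 = np.max(dphi(JF, x, d, 0.0))      # ϕ'_max(0) < 0 for a K-descent direction d
    armijo = np.all(phi(F, x, d, alpha) <= phi0 + alpha * rho * dphimax0)      # (9a)
    curvature = np.max(dphi(JF, x, d, alpha)) <= -sigma * dphimax0             # (9b)
    return bool(armijo and curvature)
```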

Algorithm 1: The Wolfe line search algorithm for vector optimization

Let 0 < ρ < ρ̄ < σ̄ < σ < 1, δ > 1, I = {1, ..., p}, and α_max > 0.

Step 0. Initialization
Choose α_0 ∈ (0, α_max). Set brackt ← false, and initialize k ← 0.

Step 1. Stopping criteria
1.1. If ϕ_i(α_k) ≤ ϕ_i(0) + α_k ρ ϕ'_max(0) for all i ∈ I, and max{ϕ'_i(α_k) | i ∈ I} ≤ −σ ϕ'_max(0), then stop with α = α_k declaring convergence.
1.2. If α_k = α_max,

ϕ_i(α_max) ≤ ϕ_i(0) + α_max ρ ϕ'_max(0) and ϕ'_i(α_max) < σ ϕ'_max(0),     (10)

for all i ∈ I, then stop with α = α_max declaring warning.

Step 2. Solving the subproblem
If there exists i ∈ I such that

ϕ_i(α_k) > ϕ_i(0) + α_k ρ ϕ'_max(0) or ϕ'_i(α_k) > −σ ϕ'_max(0),     (11)

then:
2.1. brackt ← true;
2.2. α_max ← α_k;
2.3. using an inner solver, compute α_{k+1} ∈ (0, α_max) satisfying

ϕ_{i_k}(α_{k+1}) ≤ ϕ_{i_k}(0) + α_{k+1} ρ̄ ϕ'_max(0) and ϕ'_{i_k}(α_{k+1}) ≤ −σ̄ ϕ'_max(0),     (12)

where i_k ∈ I is an index such that (11) holds.

Step 3. Trying to find an interval that brackets desired step sizes
If brackt = false, then choose α_{k+1} ∈ [min{δα_k, α_max}, α_max].

Step 4. Beginning a new outer iteration
Set k ← k + 1, and go to Step 1.

Let us explain the main steps of Algorithm 1. The main reason to require an upper bound α_max > 0 on the trial step sizes is to guarantee finite termination in the presence of functions ϕ_i that are unbounded below. At Step 1.1, Algorithm 1 tests whether α_k satisfies conditions (9). If this is the case, the algorithm stops with α = α_k, declaring convergence. This is the stopping criterion related to success, where a step size satisfying the strong Wolfe conditions is found. The second stopping criterion, at Step 1.2, tests for termination at α_max. Algorithm 1 stops at α_k = α_max if conditions (10) hold for all i ∈ I. This is a reasonable stopping criterion because, as we will show as a consequence of Theorems 1 and 2, it is possible to determine an acceptable step size when these conditions do not hold. In this case, only condition (9a) is verified at α and Algorithm 1 finishes with a warning message.

The boolean variable brackt indicates whether the working interval contains step sizes satisfying conditions (9). If brackt = true at iteration k, then it turns out that (0, α_k) brackets desired step sizes. On the other hand, brackt = false means that there is no guarantee that (0, α_k) contains an acceptable step size. Test (11) at Step 2 is the key for defining brackt. As we will see in the proof of Theorem 1, once brackt is set equal to true for the first time, it no longer changes and the algorithm always executes Step 2. Variable brackt also controls the flow of Algorithm 1.

Basically, Algorithm 1 works in two stages. The first stage corresponds to a bracketing phase where the algorithm seeks to identify an interval containing desirable step sizes. Given the parameter α_max > 0 and α_0 ∈ (0, α_max), it initializes with brackt = false and keeps increasing the trial step size until either α_{k_0}, for some k_0 ≥ 0, satisfies conditions (9) or
it is identified that the interval (0, α_{k_0}) brackets desired step sizes. In this process, we require that the upper limit α_max be used as a trial value in a finite number of iterations. In this first stage, brackt is always equal to false and Algorithm 1 executes Step 3.

Fig. 1. Flowchart of Algorithm 1.

Once it has been identified that the interval (0, α_{k_0}) brackets a desirable step size, the variable brackt is set equal to true and the algorithm enters the second stage, called the selection phase. In this phase, it always executes Step 2 and, thereafter, the generated sequence {α_k}_{k ≥ k_0} is decreasing. At Step 2.3, Algorithm 1 works only with one
function ϕ_{i_k} for some i_k ∈ I, and the subproblem consists of calculating α_{k+1} ∈ (0, α_k) satisfying (12). If α_{k+1} does not satisfy conditions (9), then it is possible to show that (0, α_{k+1}) also contains desirable step sizes, becoming the new working interval.

Note that (12) are the strong Wolfe conditions at α_{k+1} for the scalar function ϕ_{i_k}, with the auxiliary parameters 0 < ρ̄ < σ̄ < 1 and ϕ'_max(0) in place of ϕ'_{i_k}(0), see for example [Nocedal and Wright 2006]. In practical implementations, the calculation of α_{k+1} at Step 2.3 is performed by means of an iterative scheme. In this section, we assume that the inner solver is able to compute α_{k+1} ∈ (0, α_k) in a finite number of (inner) steps. In this case, we say that the inner solver has the finite termination property. We will come back to this issue in Section 4, where we will see that this property holds, except in pathological cases, for the well-known algorithm of Moré and Thuente [Moré and Thuente 1994].
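The overall structure of Algorithm 1, with its bracketing and selection phases, can be summarized in the following Python sketch. Here scalar_strong_wolfe stands for any inner solver with the finite termination property (the algorithm of Moré and Thuente in our implementation); its name and signature, and all other identifiers, are assumptions made only for this illustration and do not reproduce the authors' Fortran 90 code.

```python
import numpy as np

def wolfe_vector_line_search(phi, dphi, p, rho, sigma, rho_bar, sigma_bar,
                             alpha0, alpha_max, scalar_strong_wolfe,
                             delta=1.1, max_outer=100):
    """Sketch of Algorithm 1. phi(i, a) and dphi(i, a) return ϕ_i(a) and ϕ'_i(a)
    for i = 0, ..., p-1; scalar_strong_wolfe(f, fp, slope0, rho_bar, sigma_bar, amax)
    is assumed to return a step in (0, amax) satisfying the scaled conditions (12)."""
    I = range(p)
    phi0 = np.array([phi(i, 0.0) for i in I])
    dphimax0 = max(dphi(i, 0.0) for i in I)     # ϕ'_max(0) < 0 by assumption
    alpha, brackt = alpha0, False
    for _ in range(max_outer):
        phia = np.array([phi(i, alpha) for i in I])
        dphia = np.array([dphi(i, alpha) for i in I])
        # Step 1.1: conditions (9) hold -> convergence.
        if np.all(phia <= phi0 + alpha * rho * dphimax0) and dphia.max() <= -sigma * dphimax0:
            return alpha, "convergence"
        # Step 1.2: stop at alpha_max with a warning when (10) holds for all i.
        if alpha == alpha_max and np.all(phia <= phi0 + alpha_max * rho * dphimax0) \
                and np.all(dphia < sigma * dphimax0):
            return alpha, "warning"
        # Step 2: test (11); if it holds, (0, alpha) brackets acceptable step sizes.
        viol = [i for i in I if phia[i] > phi0[i] + alpha * rho * dphimax0
                or dphia[i] > -sigma * dphimax0]
        if viol:
            brackt, alpha_max, ik = True, alpha, viol[0]
            # Step 2.3: scalar strong Wolfe search on ϕ_{ik} with ρ̄, σ̄ and slope ϕ'_max(0).
            alpha = scalar_strong_wolfe(lambda a: phi(ik, a), lambda a: dphi(ik, a),
                                        dphimax0, rho_bar, sigma_bar, alpha_max)
        elif not brackt:
            # Step 3 (bracketing phase): any choice in [min{δα_k, α_max}, α_max];
            # here we take the left endpoint.  (Once brackt is True, Theorem 1
            # guarantees that test (11) keeps holding, so this branch is not reached.)
            alpha = min(delta * alpha, alpha_max)
    return alpha, "max_outer_reached"
```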

Lemma 1 establishes that condition (9a) is satisfied for small enough step sizes.

Lemma 1. There exists Λ > 0 such that

ϕ_i(α) ≤ ϕ_i(0) + αρ ϕ'_max(0) for all i ∈ I,     (13)

for all α ∈ (0, Λ).

Proof. Given i ∈ I, it is well known that there exists Λ_i > 0 such that

ϕ_i(α) ≤ ϕ_i(0) + αρ ϕ'_i(0),

for all α ∈ (0, Λ_i), see for example [Bertsekas 1999, Section 1.2]. Since ϕ'_max(0) ≥ ϕ'_i(0), it follows that

ϕ_i(α) ≤ ϕ_i(0) + αρ ϕ'_max(0),

for all α ∈ (0, Λ_i). Define Λ := min{Λ_i | i ∈ I}. Then, (13) clearly holds for all α ∈ (0, Λ), completing the proof. □

The following lemma is connected with the solvability of the subproblems at Step 2.3. In particular, Lemma 2 implies that it is possible to compute α_{k+1} ∈ (0, α_k) satisfying (12), provided that (11) holds.

Lemma 2. Let i ∈ I and β > 0. If

ϕ_i(β) > ϕ_i(0) + βρ ϕ'_max(0) or ϕ'_i(β) > −σ ϕ'_max(0),     (14)

then there exist α_* ∈ (0, β) and a neighborhood V of α_* such that

ϕ_i(α) ≤ ϕ_i(0) + αρ̄ ϕ'_max(0) and ϕ'_i(α) ≤ −σ̄ ϕ'_max(0),     (15)

for all α ∈ V.

Proof. Assume that (14) holds. Then,

ϕ_i(β) > ϕ_i(0) + βρ̄ ϕ'_max(0) or ϕ'_i(β) > −ρ̄ ϕ'_max(0),

because 0 < ρ < ρ̄ < σ̄ < σ and ϕ'_max(0) < 0. Let us define the auxiliary function ψ : R → R by

ψ(α) := ϕ_i(α) − ϕ_i(0) − αρ̄ ϕ'_max(0).

Suppose that ϕ_i(β) ≥ ϕ_i(0) + βρ̄ ϕ'_max(0) or, equivalently, ψ(β) ≥ 0. Since ψ(0) = 0 and ψ'(0) < 0, by continuity arguments, we have ψ(α) < 0 for α > 0 sufficiently small. Define α̃ ∈ (0, β] by

α̃ := min{α ∈ (0, β] | ψ(α) = 0}.

Note that, by the intermediate value theorem, α̃ is well defined. We clearly have ψ(α) < 0 for α ∈ (0, α̃). Since ψ(0) = ψ(α̃) = 0, by the mean value theorem, there is α_* ∈ (0, α̃) such that ψ'(α_*) = 0. Hence,

ϕ_i(α_*) < ϕ_i(0) + α_* ρ̄ ϕ'_max(0)

and ϕ'_i(α_*) = ρ̄ ϕ'_max(0). Thus,

ϕ'_i(α_*) < −σ̄ ϕ'_max(0),

because ϕ'_max(0) < 0 and ρ̄ < σ̄. Therefore, by continuity arguments, there exists a neighborhood V of α_* such that (15) holds for all α ∈ V.

Now consider that ϕ'_i(β) > ρ̄ ϕ'_max(0) or, equivalently, ψ'(β) > 0. Since ψ'(0) < 0, by the intermediate value theorem, there is α̂ ∈ (0, β) such that ψ'(α̂) = 0. Thus, ϕ'_i(α̂) = ρ̄ ϕ'_max(0). Hence,

ϕ'_i(α̂) < −σ̄ ϕ'_max(0),

because ϕ'_max(0) < 0 and ρ̄ < σ̄. If ϕ_i(α̂) < ϕ_i(0) + α̂ ρ̄ ϕ'_max(0), then the thesis holds for α_* = α̂. Otherwise, we can proceed as in the first part of the proof to obtain α_* ∈ (0, α̂) for which (15) holds. □

Next we show the well-definedness of Algorithm 1. We prove that if Algorithm 1 does not stop at Step 1, then one and only one step between Step 2 and Step 3 is executed. Furthermore, we show in Theorem 1 that, once Step 2 is executed and brackt is set equal to true for the first time, the algorithm also executes Step 2 in the subsequent iterations.

Theorem 1. Assume that the inner solver has the finite termination property. Then, Algorithm 1 is well defined.

Proof. We first observe that it is possible to perform Steps 2 and 3. Algorithm 1 executes Step 2 in iteration k when (11) holds for some i ∈ I. In this case, by Lemma 2 and the finite termination property of the inner solver, Algorithm 1 finds α_{k+1} ∈ (0, α_k) satisfying (12). At Step 3, we simply define α_{k+1} belonging to the interval [min{δα_k, α_max}, α_max]. In this last case we obtain α_{k+1} > α_k, because δ > 1.

We claim that if there exists k_0 ∈ N for which (11) holds for α_{k_0}, then for k > k_0 either the algorithm stops in iteration k or (11) also holds at α_k. This means that, once brackt is set equal to true for the first time, the algorithm always executes Step 2. Assume that (11) holds at α_k. Thus, Algorithm 1 finds α_{k+1} satisfying (12). If the algorithm does not stop at α_{k+1}, by Step 1.1, we obtain

ϕ_i(α_{k+1}) > ϕ_i(0) + α_{k+1} ρ ϕ'_max(0) for some i ∈ I     (16)

or

max{ϕ'_i(α_{k+1}) | i ∈ I} > −σ ϕ'_max(0).     (17)

It is straightforward to show that (16) implies that (11) holds for α_{k+1}. So, assume that (17) takes place. By (12), we have

ϕ'_{i_k}(α_{k+1}) ≤ −σ̄ ϕ'_max(0),

which, together with (17), implies that there exists i ∈ I such that

ϕ'_i(α_{k+1}) > −σ ϕ'_max(0),

because 0 < σ̄ < σ. Therefore, the test at Step 2 also holds for α_{k+1}. Hence, by an inductive argument, if there exists a first k_0 ∈ N for which (11) holds for α_{k_0}, then (11) also holds for α_k with k > k_0.

We prove the well-definedness by showing that Step 2 or Step 3 is necessarily executed if Algorithm 1 does not stop at Step 1. Assume that the algorithm does not stop in iteration k. If α_k = α_max, then (11) holds for some i ∈ I and Step 2 is executed. Indeed, if

ϕ_i(α_max) ≤ ϕ_i(0) + α_max ρ ϕ'_max(0) for all i ∈ I,

then

max{ϕ'_i(α_max) | i ∈ I} > −σ ϕ'_max(0)     (18)
and

ϕ'_i(α_max) ≥ σ ϕ'_max(0) for some i ∈ I,     (19)
by Steps 1.1 and 1.2, respectively. By (18) and (19), there exists i ∈ I such that

ϕ'_i(α_max) > −σ ϕ'_max(0),

implying that (11) holds for α_max. Now assume α_k < α_max. If (11) holds for α_k, then Algorithm 1 executes Step 2. By the previous discussion, this is certainly the case if brackt = true. Note that if Step 2 is executed, then brackt is set equal to true and the algorithm goes directly to Step 4. Finally, if brackt = false and the test at Step 2 is not satisfied, then Algorithm 1 executes Step 3. Hence, one and only one step between Step 2 and Step 3 is executed, concluding the proof. □

Theorem 2 is our main result. We show that, in a finite number of iterations, Algorithm 1 finishes with α = α_max, or finds a step size satisfying the strong Wolfe conditions.

Theorem 2. Assume that the inner solver has the finite termination property. Then, Algorithm 1 terminates its execution in a finite number of outer iterations satisfying one of the following conditions:

(1) the sequence {α_k}_{k≥0} is increasing and Algorithm 1 stops with α = α_k for which conditions (9) hold;
(2) the sequence {α_k}_{k≥0} is increasing and Algorithm 1 stops with α = α_max;
(3) there exists k_0 ∈ N such that the sequence {α_k}_{k≥k_0} is decreasing and Algorithm 1 stops with α = α_k for which conditions (9) hold.

Proof. We consider two possibilities:

(i) the test at Step 2 is not satisfied in any iteration k ≥ 0;
(ii) there exists a first k_0 ∈ N for which the test at Step 2 holds.

Consider case (i). In this case, brackt is always equal to false and Algorithm 1 executes Step 3 in all iterations k ≥ 0. Since

α_{k+1} ∈ [min{δα_k, α_max}, α_max],     (20)

where δ > 1, it follows that the generated sequence {α_k}_{k≥0} is increasing. If, eventually for some k, α_k satisfies the stopping criterion at Step 1.1, then Algorithm 1 stops with α = α_k for which conditions (9) hold, satisfying the first possibility of the theorem. Otherwise, by (20), the upper bound α_max is used as a trial value after a finite number of iterations. We claim that Algorithm 1 then stops at Step 1.2 with α = α_max. Indeed, otherwise Step 2 would be executed, as shown in Theorem 1, contradicting the assumption of case (i). This last case corresponds to the second condition of the theorem.

Now consider case (ii). As shown in Theorem 1, for k ≥ k_0, either Algorithm 1 stops at α_k or Step 2 is executed in iteration k. Since, by Step 2.3, α_{k+1} ∈ (0, α_k), it turns out that {α_k}_{k≥k_0} is decreasing. Thus, the stopping criterion of Step 1.1 is the only one that can be satisfied. We prove the finite termination of the algorithm proceeding by contradiction. Assume that Algorithm 1 iterates infinitely. Thus, {α_k}_{k≥k_0} converges, because it is a bounded monotonic sequence. Let α_* ≥ 0 be the limit of {α_k}_{k≥k_0}. We consider two cases:
(a) α_* = 0;
(b) α_* > 0.

Assume case (a). By Lemma 1, for sufficiently large k, we have

ϕ_i(α_k) ≤ ϕ_i(0) + α_k ρ ϕ'_max(0) for all i ∈ I.
Thus, since Step 2 is executed in iterations k ≥ k_0, by (11) and the definition of i_k, we obtain

ϕ'_{i_k}(α_k) > −σ ϕ'_max(0),     (21)

for sufficiently large k. Since I is a finite set of indexes, there are j ∈ I and K_1 ⊂_∞ N such that i_k = j for all k ∈ K_1. Hence, by (21) and (12), we have

ϕ'_j(α_k) > −σ ϕ'_max(0)     (22)

and

ϕ'_j(α_{k+1}) ≤ −σ̄ ϕ'_max(0),     (23)

for all k ∈ K_1. Taking limits in (22) and (23) for k ∈ K_1, by the continuity of ϕ'_j, we obtain

−σ ϕ'_max(0) ≤ ϕ'_j(α_*) ≤ −σ̄ ϕ'_max(0),

concluding that σ ≤ σ̄. We are in contradiction, because σ > σ̄ by definition.

Now consider case (b). Since Algorithm 1 executes Step 2 for all k ≥ k_0, by (11) and the definition of i_k, we obtain

ϕ_{i_k}(α_k) > ϕ_{i_k}(0) + α_k ρ ϕ'_max(0)     (24)

or

ϕ'_{i_k}(α_k) > −σ ϕ'_max(0).     (25)

Assume that (24) holds in infinitely many iterations k, k ∈ K_2 ⊂_∞ N. Since I is a finite set of indexes, there are ℓ ∈ I and K_3 ⊂_∞ K_2 such that i_k = ℓ for all k ∈ K_3. Hence, by (24) and (12), we have

ϕ_ℓ(α_k) > ϕ_ℓ(0) + α_k ρ ϕ'_max(0)     (26)

and

ϕ_ℓ(α_{k+1}) ≤ ϕ_ℓ(0) + α_{k+1} ρ̄ ϕ'_max(0),     (27)

for all k ∈ K_3. Taking limits in (26) and (27) for k ∈ K_3, by the continuity of ϕ_ℓ, we obtain

ϕ_ℓ(0) + α_* ρ ϕ'_max(0) ≤ ϕ_ℓ(α_*) ≤ ϕ_ℓ(0) + α_* ρ̄ ϕ'_max(0).

Thus ρ ≥ ρ̄, because α_* > 0 and ϕ'_max(0) < 0. Hence, we are in contradiction with the definition of ρ̄ and ρ. Finally, if (24) holds only in a finite number of iterations, then (25) is satisfied in infinitely many iterations. In this case, we can proceed as in case (a) to conclude that σ ≤ σ̄, generating a contradiction with the definition of σ and σ̄. Therefore, the proof is complete. □

We can rule out finite termination at α_max whenever

ϕ_i(α_max) > ϕ_i(0) + α_max ρ ϕ'_max(0) or ϕ'_i(α_max) ≥ σ ϕ'_max(0),     (28)

for some i ∈ I. In this case, by Theorem 2, Algorithm 1 finds a step size satisfying the strong Wolfe conditions. Geometrically, the first inequality of (28) implies that the graph of ϕ_i(α) is above the line l_i(α) := ϕ_i(0) + αρ ϕ'_max(0) at α = α_max. If ϕ_min is a strict lower bound of ϕ_i for all i ∈ I, then this condition holds, for example, whenever

α_max ≥ (ϕ_max(0) − ϕ_min) / (−ρ ϕ'_max(0)),

where ϕ_max(0) := max{ϕ_i(0) | i ∈ I}. Thus, in practical problems, condition (28) should hold for a large choice of α_max. The second inequality of (28) means that the slope of the graph of ϕ_i(α) at α = α_max is greater than or equal to the
constant value σ ϕ'_max(0), which trivially holds if ϕ'_i(α_max) ≥ 0 for some i ∈ I. A similar discussion of this matter for the scalar optimization case can be found in [Moré and Thuente 1994].

Algorithm 1 can also be used to derive some theoretical results. In this case, at Step 2.3, the existence of α_{k+1} ∈ (0, α_max) satisfying (12), guaranteed by Lemma 2, is sufficient for our analysis, and we can rule out the discussion about the inner
solver. Corollary 1 justifies our claim that, if the test at Step 2 holds for α_k, then the interval (0, α_k) brackets desired step sizes.

Corollary 1. Let β > 0. If, for some i ∈ I,

ϕ_i(β) > ϕ_i(0) + βρ ϕ'_max(0) or ϕ'_i(β) > −σ ϕ'_max(0),     (29)

then there exists an interval of step sizes contained in (0, β) for which conditions (9) hold, i.e., the interval (0, β) brackets desired step sizes.

Proof. Set α_max > β, and α_0 = β. Let ρ̂, ρ̄, σ̂, and σ̄ be constants such that

0 < ρ < ρ̂ < ρ̄ < σ̄ < σ̂ < σ < 1.

Apply Algorithm 1 with the auxiliary constants 0 < ρ̂ < ρ̄ < σ̄ < σ̂ < 1. By (29), we obtain

ϕ_i(β) > ϕ_i(0) + βρ̂ ϕ'_max(0) or ϕ'_i(β) > −σ̂ ϕ'_max(0),

because ρ < ρ̂, σ̂ < σ, and ϕ'_max(0) < 0. Thus, the test at Step 2 holds for α_0. Then, by Theorem 2, the generated sequence {α_k}_{k≥0} is decreasing and Algorithm 1 finds α ∈ (0, β) satisfying condition (9a) with ρ̂ and condition (9b) with σ̂. Hence,

ϕ_i(α) < ϕ_i(0) + αρ ϕ'_max(0) for all i ∈ I,

and

max{ϕ'_i(α) | i ∈ I} < −σ ϕ'_max(0),

because ρ < ρ̂, σ̂ < σ, and ϕ'_max(0) < 0. Therefore, by continuity arguments, there is a neighborhood of α contained in (0, β) for which conditions (9) hold. □

In [Lucambio Pérez and Prudente 2018, Proposition 3.2], given a K-descent direction d for F at x, the authors showed that there is an interval of positive step sizes satisfying the standard Wolfe conditions (4) and the strong Wolfe conditions (6), provided that F(x + αd) is bounded below for α > 0. For each i ∈ I, this assumption implies that ϕ_i is bounded below for α > 0. Proposition 1 improves this result. We show that the presence of a single ϕ_i that is bounded below is sufficient to guarantee the existence of step sizes satisfying the Wolfe conditions. In the multiobjective optimization case, the benefits of this new result are clear: it is sufficient to assume the boundedness of only one objective, in contrast to the boundedness of all objectives required in [Lucambio Pérez and Prudente 2018].

Proposition 1. Assume that F is of class C¹, C = {w_1, ..., w_p}, d is a K-descent direction for F at x, and there exists i ∈ I such that ⟨w_i, F(x + αd)⟩ is bounded below for α > 0. Then, there is an interval of positive step sizes satisfying the standard Wolfe conditions (4) and the strong Wolfe conditions (6).

Proof. Define l_i : R → R by

l_i(α) := ϕ_i(0) + αρ ϕ'_max(0).
Since ϕ'_max(0) < 0, the line l_i(α) is unbounded below for α > 0. Thus, by the boundedness of ϕ_i(α), there is a positive β such that ϕ_i(β) > l_i(β), i.e.,

ϕ_i(β) > ϕ_i(0) + βρ ϕ'_max(0).

Hence, by Corollary 1, there exists an interval of step sizes contained in (0, β) for which conditions (9) hold. Hence, the step sizes belonging to this interval satisfy the strong Wolfe conditions (6) and, a fortiori, the standard Wolfe conditions (4). □

4 IMPLEMENTATION DETAILS

We implemented the Wolfe line search algorithm described in Section 3. In the following we discuss some practical aspects related to our implementation of the proposed algorithm. The codes are written in Fortran 90 and are freely available at https://lfprudente.ime.ufg.br/.

4.1 Initial parameters

Parameters ρ and σ related to the Wolfe conditions must be set by the user. Usually ρ is chosen to be small, implying that condition (9a) requires little more than a simple decrease in each function ϕ_i, i ∈ I. A "traditional" recommended value is ρ = 10^{-4}. As we will see in Section 4.4, when we are dealing with multiobjective optimization problems involving quadratic functions, ρ has to be less than or equal to 0.5. An ideal line search should compute a step size α such that

max{ϕ'_i(α) | i ∈ I} = 0.

Such a line search is known as an exact line search and is usually expensive and impractical. Note that σ controls how small the left-hand side of (9b) must be. Thus, σ is related to the accuracy of the line search, and the trade-off between the computational cost and the accuracy of the search should be considered. Hence, the choice of σ is more delicate and may depend on the method used to solve problem (1). Values between 0.1 and 0.9 for σ should be appropriate. In [Lucambio Pérez and Prudente 2018], for nonlinear conjugate gradient methods, the authors used σ = 0.1, performing a reasonably accurate line search. The maximum allowed step size α_max is also a customizable parameter. Its value should be large in practice, say α_max = 10^{10}.

Given ρ and σ such that 0 < ρ < σ < 1, the auxiliary parameters ρ̄ and σ̄ used in the inner solver are set by

ρ̄ := min{1.1ρ, 0.75ρ + 0.25σ}

and

σ̄ := max{0.9σ, 0.25ρ + 0.75σ},

ensuring that 0 < ρ < ρ̄ < σ̄ < σ < 1. In our implementation, we do not explicitly set the parameter δ. However, the step size updating scheme at Step 3 holds with δ = 1.1, see Section 4.3 below.

The initial trial step size α_0 is a user-supplied value. A smart choice of α_0 can directly benefit the performance of the line search algorithm. Based on the scalar minimization case, for a Newton-type method, the step α_0 = 1 should be used. On the other hand, for a gradient-type method, a wise choice is to use the vector extension of the choice recommended by Shanno and Phua [Shanno and Phua 1980], i.e., at iteration ℓ of the underlying method (7), take

α_0 = α_{ℓ−1} f(x^{ℓ−1}, d^{ℓ−1}) / f(x^ℓ, d^ℓ).
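As a small illustration of the parameter choices just described, the sketch below computes ρ̄ and σ̄ from user-supplied ρ and σ, and the Shanno-Phua initial trial step size; the function names are hypothetical and the snippet is not part of the Fortran 90 package.

```python
def auxiliary_parameters(rho, sigma):
    """ρ̄ and σ̄ as set in the implementation, so that 0 < ρ < ρ̄ < σ̄ < σ < 1."""
    rho_bar = min(1.1 * rho, 0.75 * rho + 0.25 * sigma)
    sigma_bar = max(0.9 * sigma, 0.25 * rho + 0.75 * sigma)
    return rho_bar, sigma_bar

def shanno_phua_initial_step(alpha_prev, f_prev, f_curr):
    """Vector extension of the Shanno-Phua choice:
    α_0 = α_{ℓ-1} f(x^{ℓ-1}, d^{ℓ-1}) / f(x^ℓ, d^ℓ)."""
    return alpha_prev * f_prev / f_curr

if __name__ == "__main__":
    # With the 'traditional' values ρ = 1e-4 and σ = 0.1:
    print(auxiliary_parameters(1.0e-4, 0.1))   # ≈ (0.00011, 0.09)
```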

The special case where there exists a convex quadratic objective in a multiobjective problem will be discussed separately in Section 4.4.

4.2 Inner solver

In principle, any solver designed to find a step size satisfying the strong Wolfe conditions for scalar optimization can be used to solve the subproblems at Step 2.3 of Algorithm 1. On the other hand, the finite termination of the inner solver is essential for the convergence results shown in the previous section. In our implementation we used the well-known algorithm of Moré and Thuente [Moré and Thuente 1994] as the inner solver. It is worth mentioning that the algorithm of Moré and Thuente is incorporated in some notable works, such as the L-BFGS-B code, see [Zhu et al. 1997]. As we shall see below, this algorithm terminates in a finite number of steps, except for pathological cases.

Given a continuously differentiable function ϕ : R → R with ϕ'(0) < 0, and constants ρ̄ and σ̄ satisfying 0 < ρ̄ < σ̄ < 1, the algorithm of Moré and Thuente is designed to find α > 0 such that

ϕ(α) ≤ ϕ(0) + ρ̄α ϕ'(0) and ϕ'(α) ≤ −σ̄ ϕ'(0).     (30)

The algorithm generates a sequence {α_ℓ}_{ℓ≥0} fulfilling the bounds

0 < α_ℓ ≤ α_max for all ℓ ≥ 0,

where α_max > 0 is a given algorithmic parameter. Assume that α_max > 0 is such that

ϕ(α_max) > ϕ(0) + α_max ρ̄ ϕ'(0) or ϕ'(α_max) > −σ̄ ϕ'(0).     (31)

Then the interval (0, α_max) contains a step size α satisfying (30), see [Moré and Thuente 1994, Theorem 2.1]. Consider the application of the algorithm of Moré and Thuente with an initial iterate α_0 ∈ [0, α_max]. Theorem 2.3 of [Moré and Thuente 1994] implies that exactly one of the following two alternatives holds:

(1) the algorithm terminates in a finite number of steps with an α_ℓ ∈ (0, α_max) satisfying (30);
(2) the algorithm iterates infinitely, generating a sequence {α_ℓ}_{ℓ≥0} that converges to some α_* ∈ (0, α_max) such that

ϕ(α_*) ≤ ϕ(0) + ρ̄ α_* ϕ'(0) and ϕ'(α_*) = ρ̄ ϕ'(0).

Hence, conditions (30) hold at α_*, because ρ̄ < σ̄ and ϕ'(0) < 0. Moreover, if

ψ(α) := ϕ'(α) − ρ̄ ϕ'(0),

then ψ(α_*) = 0 and ψ changes sign an infinite number of times, in the sense that there exists a monotone sequence {β_ℓ} that converges to α_* and such that ψ(β_ℓ)ψ(β_{ℓ+1}) < 0.

Therefore, as claimed by the authors in [Moré and Thuente 1994], the algorithm produces a sequence of iterates that converges to a step size satisfying (30) and, except for the pathological cases described in alternative (2) above, terminates in a finite number of steps.

Now consider the problem of finding α > 0 satisfying

ϕ(α) ≤ ϕ(0) + ρ̄α Θ and ϕ'(α) ≤ −σ̄ Θ,     (32)
where Θ is a given constant such that ϕ'(0) ≤ Θ < 0. We claim that the algorithm of Moré and Thuente can be easily adapted to solve this problem. Indeed, taking

ρ̂ := ρ̄ Θ/ϕ'(0) and σ̂ := σ̄ Θ/ϕ'(0),

we obtain 0 < ρ̂ < σ̂ < 1, and (32) can be equivalently rewritten as (30) with constants ρ̂ and σ̂. Furthermore, if α_max > 0 is such that

ϕ(α_max) > ϕ(0) + α_max ρ̄ Θ or ϕ'(α_max) > −σ̄ Θ,     (33)

then it is possible to show that (0, α_max) contains a step size α satisfying (32), because (33) is equivalent to (31) with ρ̂ and σ̂.

In the context of Algorithm 1, conditions (12) at Step 2.3 can be viewed as (32) with ϕ ≡ ϕ_{i_k} and Θ = ϕ'_max(0). Hence, the algorithm of Moré and Thuente can be used to solve the subproblems of Algorithm 1. Note that, at Step 2.3, i_k ∈ I is an index such that (11) holds. Then, it follows that

ϕ_{i_k}(α_k) > ϕ_{i_k}(0) + α_k ρ̄ ϕ'_max(0) or ϕ'_{i_k}(α_k) > −σ̄ ϕ'_max(0),

because ρ < ρ̄ < σ̄ < σ and ϕ'_max(0) < 0. Thus, the algorithm of Moré and Thuente with an initial (inner) iterate belonging to [0, α_k] finds, except for pathological cases, a step size α_{k+1} ∈ (0, α_k) satisfying (12) in a finite number of (inner) steps. This allows us to conclude that the algorithm of Moré and Thuente can be expected to have the finite termination property, as assumed in Theorems 1 and 2.

4.3 Bracketing phase

In the bracketing phase, variable brackt is false and Algorithm 1 attempts to identify whether the interval (0, α_k) contains desirable step sizes by checking conditions (11). If conditions (11) are not fulfilled at iteration k, then Step 3 is executed by choosing
α_{k+1} ∈ [min{δα_k, α_max}, α_max],     (34)

where δ > 1 is an algorithmic parameter.

In our implementation, Step 3 is executed by the inner algorithm of Moré and Thuente. Let î ∈ I be such that

î = arg min{ϕ'_i(0) | i ∈ I}.

In the bracketing phase, if the test of Step 2 does not hold at iteration k, we call the algorithm of Moré and Thuente for ϕ_î with the (inner) initial trial step size equal to α_k. Then, based on interpolations of known function and derivative values of ϕ_î, the new trial step size α_{k+1} is computed, satisfying (34) with δ = 1.1, see [Moré and Thuente 1994]. Hence, as mentioned earlier, we do not explicitly set the algorithmic parameter δ.

4.4 Convex quadratic functions in multiobjective problems

In this section, we discuss the special case where there exists a convex quadratic objective in a multiobjective problem. Assume that F : R^n → R^m, F(x) = (F_1(x), ..., F_m(x)), K = R^m_+, C is the (ordered) canonical basis of R^m, e = [1, ..., 1]^T ∈ R^m, I = {1, ..., m}, ρ ≤ 1/2, and there is i ∈ I such that

F_i(x) = (1/2) x^T A_i x + b_i^T x + c_i,
where A_i ∈ R^{n×n} is a symmetric positive definite matrix, b_i ∈ R^n, and c_i ∈ R. Let d be a K-descent direction for F at x. Note that, in this case, we have

ϕ_i(α) = F_i(x + αd).

It is well known that the exact minimizer α*_i of ϕ_i(α) is given by

α*_i = −(A_i x + b_i)^T d / (d^T A_i d),

see, for example, [Bertsekas 1999; Nocedal and Wright 2006]. It is easy to see that

ϕ_i(α) ≤ ϕ_i(0) + αρ ϕ'_max(0) and ϕ'_i(α) ≤ 0,     (35)

for all α ∈ (0, α*_i], because ϕ'_i(0) ≤ ϕ'_max(0) < 0. Define I_Q ⊂ I by

I_Q := {i ∈ I | F_i is a convex quadratic function},

and α_Q > 0 by

α_Q := min{α*_i | i ∈ I_Q}.

Now assume that α_Q < α_max, and consider the application of Algorithm 1 with α_0 = α_Q. We claim that, in this case, Algorithm 1 needs to work only with the non-quadratic objectives. Indeed, since α_0 ∈ (0, α*_i] for all i ∈ I_Q, by (35), we obtain

ϕ_i(α_0) ≤ ϕ_i(0) + α_0 ρ ϕ'_max(0) for all i ∈ I_Q,

and

0 = max{ϕ'_i(α_0) | i ∈ I_Q} ≤ −σ ϕ'_max(0).

Therefore, either α_0 satisfies conditions (9) and Algorithm 1 stops at Step 1.1 with α = α_0 declaring convergence, or there is i ∈ I \ I_Q such that (11) holds for α_0. Note that the first case necessarily occurs when I_Q = I, i.e., all the objectives are convex quadratic functions. Consider the second case. Assuming that the inner solver has the finite termination property, by Theorems 1 and 2, the interval (0, α_0) brackets a desirable step size, Step 2 is always executed, and the generated sequence is decreasing. Since α_k ∈ (0, α_0) for all k ≥ 1, by (35), the index i_k at Step 2.3 is such that i_k ∈ I \ I_Q, that is, ϕ_{i_k} corresponds to a non-quadratic objective. Thus, the test at Step 2 needs to be performed only for the indexes in I \ I_Q. Moreover, by (12) and (35), it follows that, for all k ≥ 0,

ϕ_i(α_{k+1}) ≤ ϕ_i(0) + α_{k+1} ρ ϕ'_max(0) for all i ∈ {i_k} ∪ I_Q,

and

max{ϕ'_i(α_{k+1}) | i ∈ {i_k} ∪ I_Q} ≤ −σ ϕ'_max(0),

because ρ < ρ̄ < σ̄ < σ. This implies that it is sufficient to test the convergence criterion at Step 1.1 only for the indexes in I \ I_Q. Hence, the quadratic functions can be neglected, as we claimed.

In our implementation, we require the user to inform which objectives are convex quadratics. If I_Q = I, then we take α_0 = α_Q, and Algorithm 1 stops returning α = α_0. Otherwise, we set α_max ← α_Q and

α_0 ← min{α_0, α_Q},

where α_0 on the right-hand side of the above expression is the user-supplied initial trial step size. Then, Algorithm 1 is applied, neglecting the convex quadratic objectives.
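The treatment of convex quadratic objectives described above amounts to computing the exact minimizer α*_i of each quadratic ϕ_i, taking α_Q as the smallest of them, and using α_Q to cap α_max and α_0 before Algorithm 1 is applied to the remaining objectives. The following Python sketch illustrates this preprocessing under those assumptions (it omits the case I_Q = I, and all names are hypothetical).

```python
import numpy as np

def quadratic_exact_minimizer(A, b, x, d):
    """α*_i = -((A_i x + b_i)^T d) / (d^T A_i d) for F_i(x) = 0.5 x^T A_i x + b_i^T x + c_i."""
    return -float((A @ x + b) @ d) / float(d @ (A @ d))

def cap_with_quadratics(quadratics, x, d, alpha0, alpha_max):
    """quadratics is a list of (A_i, b_i) pairs for the convex quadratic objectives.
    Returns (alpha0, alpha_max) updated with α_Q, after which those objectives
    can be neglected by the line search (the case I_Q = I is not handled here)."""
    if not quadratics:
        return alpha0, alpha_max
    alpha_Q = min(quadratic_exact_minimizer(A, b, x, d) for A, b in quadratics)
    if alpha_Q < alpha_max:
        alpha_max = alpha_Q
        alpha0 = min(alpha0, alpha_Q)
    return alpha0, alpha_max
```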

5 NUMERICAL EXPERIMENTS

We define the test problems by directly considering the functions ϕ_i given by (8). For simplicity, we omit the index i in the description of the functions. The first two functions, given by

ϕ(α) = −α / (α² + β),     (36)

where β > 0, and

ϕ(α) = (α + β)⁵ − 2(α + β)⁴,     (37)

where 0 < β < 1.6, were proposed in [Moré and Thuente 1994]. These functions have unique minimizers at α = √β and α = 1.6 − β, respectively. The first function is concave to the right of the minimizer, while the region of concavity of the second function is to the left of the minimizer. The parameter β also controls the size of ϕ'(0), because ϕ'(0) = −1/β for function (36) and ϕ'(0) = (5β − 8)β³ for function (37). In our experiments we set β = 0.16 and β = 0.004, respectively. This choice for function (37) implies that condition (9b) is quite restrictive for any parameter σ < 1, because ϕ'(0) ≈ −5 × 10⁻⁷. Plots of these functions are in Figures 2(a) and 2(b).

The next function is defined by

ϕ(α) = −βα + β²α² if α < 0,
ϕ(α) = −log(1 + βα) if 0 ≤ α ≤ 1,     (38)
ϕ(α) = −log(1 + β) − [β/(1 + β)](α − 1) + [β²/(1 + β)²](α − 1)² if α > 1,

where β > 0. This function is convex and has a unique minimizer at α = 1.5 + 1/(2β). Parameter β controls the size of ϕ'(0) = −1/β. We take β = 100. This function appeared in [Lucambio Pérez and Prudente 2018], and its plot can be viewed in Figure 2(c).

The function defined by

ϕ(α) = (√(1 + β₁²) − β₁) √((1 − α)² + β₂²) + (√(1 + β₂²) − β₂) √(α² + β₁²),     (39)

where β₁, β₂ > 0, was suggested in [Yanai et al. 1981]. This function is convex, but different combinations of the parameters β₁ and β₂ lead to quite different characteristics. Plots of (39) with β₁ = 10⁻² and β₂ = 10⁻³, and with β₁ = 10⁻³ and β₂ = 10⁻², are in Figures 2(d) and 2(e), respectively. Function (39) with these parameters was also used in the numerical tests of [Moré and Thuente 1994].

The function given by

ϕ(α) = 2 − 0.8 e^{−((α−0.6)/0.4)²} − e^{−((α−0.2)/0.04)²}     (40)

is from [Miglierina et al. 2008]. This function has a global minimizer at α ≈ 0.2 and a local minimizer at α ≈ 0.6. As highlighted in [Miglierina et al. 2008], the global minimizer of function (40) has a narrow attraction region when compared with the attraction region of its local minimizer. A plot of function (40) appears in Figure 2(f).

The next two functions are defined by

ϕ(α) = e^{−βα},     (41)

where β > 0, and

ϕ(α) = 0 if α = 0,  ϕ(α) = −α + 10³ α³ sin(1/α) if α ≠ 0.     (42)
We have ϕ'(0) = −1/β and ϕ'(0) = −1, respectively. Although function (41) is bounded below, it has no minimizer. In contrast, function (42) has an infinite number of local minimizers. In our numerical tests, we set β = 10 for function (41). Figures 2(g) and 2(h) contain the plots of these functions.

Finally, we consider a simple convex quadratic function given by

ϕ(α) = β₁α² + β₂α,     (43)

where β₁ > 0 and β₂ < 0. In our experiments we take β₁ = 0.1 and β₂ = −1. A plot of function (43) with these parameters appears in Figure 2(i).
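For reference, some of the test functions above are straightforward to reproduce. The sketch below implements functions (36), (38), and (42) in Python with the parameter values used in our experiments; it is an illustration only, not the Fortran 90 test driver.

```python
import numpy as np

def phi36(alpha, beta=0.16):
    """Function (36): ϕ(α) = -α / (α² + β), with minimizer at α = sqrt(β)."""
    return -alpha / (alpha ** 2 + beta)

def phi38(alpha, beta=100.0):
    """Function (38): convex piecewise function with minimizer at 1.5 + 1/(2β)."""
    if alpha < 0.0:
        return -beta * alpha + beta ** 2 * alpha ** 2
    if alpha <= 1.0:
        return -np.log(1.0 + beta * alpha)
    return (-np.log(1.0 + beta)
            - beta / (1.0 + beta) * (alpha - 1.0)
            + beta ** 2 / (1.0 + beta) ** 2 * (alpha - 1.0) ** 2)

def phi42(alpha):
    """Function (42): ϕ(0) = 0 and ϕ(α) = -α + 10³ α³ sin(1/α) otherwise."""
    return 0.0 if alpha == 0.0 else -alpha + 1.0e3 * alpha ** 3 * np.sin(1.0 / alpha)
```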

(a) Function (36) with β = 0.16; (b) function (37) with β = 0.004; (c) function (38) with β = 100; (d) function (39) with β₁ = 10⁻² and β₂ = 10⁻³; (e) function (39) with β₁ = 10⁻³ and β₂ = 10⁻²; (f) function (40); (g) function (41) with β = 10; (h) function (42) with β = 10³; (i) function (43) with β₁ = 0.1 and β₂ = −1.

Fig. 2. Plots of the functions used in the numerical tests.

The set of test problems was defined by combining different functions of Figure 2. Except for one problem, we defined ρ = 10⁻⁴ and σ = 0.1. This choice for σ guarantees a reasonably accurate line search. Parameter α_max was set to 10¹⁰.
For each problem, we consider different values for the initial trial step size, including in particular the remote values α_0 = 10^{±3}. The results are in Table 1. In the table, the first two columns identify the problem and the number p of functions involved. Columns "α_0" and "α" give the initial trial step size and the final iterate, respectively. "SC" gives the satisfied stopping criterion, where C denotes convergence, meaning that the algorithm found a step size satisfying conditions (9). Column "outiter" contains the number of outer iterations of Algorithm 1 given in the form: total number of outer iterations (number of iterations corresponding to the bracketing phase / number of iterations corresponding to the selection phase). Recall that, in the bracketing phase, Algorithm 1 executes Step 3, whereas in the selection phase, it executes Step 2. Still in the table, "initer" is the total number of iterations used by the algorithm of Moré and Thuente, "nfev" is the number of evaluations of the functions, and "ngev" is the number of evaluations of the derivatives.

 #  Problem                                    α_0    α        SC  outiter  initer  nfev  ngev
 1  Functions in Figures 2(a) and 2(b)         10⁻³   0.4      C   5(4/1)   11      22    21
    p = 2                                      3      0.4      C   2(0/2)   17      21    19
                                               10     0.4      C   2(0/2)   17      21    19
                                               10³    0.4      C   2(0/2)   20      24    22
 2  Functions in Figures 2(c) and 2(i)         10⁻³   1.50     C   6(5/1)   9       15    15
    p = 2                                      0.5    1.48     C   2(1/1)   5       7     7
                                               2.5    1.49     C   1(0/1)   4       5     5
                                               10³    1.53     C   1(0/1)   2       3     2
 3  Functions in Figures 2(a), 2(b) and 2(c)   10⁻³   0.4      C   5(4/1)   11      28    27
    p = 3                                      1.55   0.4      C   2(0/2)   14      21    18
                                               2.0    0.4      C   3(0/3)   21      30    25
                                               10³    0.4      C   2(0/2)   17      22    20
 4  Functions in Figures 2(d) and 2(e)         10⁻³   0.0737   C   4(3/1)   7       16    15
    p = 2                                      0.4    0.0743   C   1(0/1)   5       8     7
                                               0.6    0.0743   C   1(0/1)   5       8     7
                                               10³    0.0762   C   1(0/1)   8       10    9
 5  Functions in Figures 2(d), 2(e) and 2(f)   10⁻³   0.0742   C   4(3/1)   8       22    22
    p = 3                                      0.4    0.0741   C   1(0/1)   7       12    12
                                               0.8    0.0742   C   2(0/2)   13      20    18
                                               10³    0.0742   C   2(0/2)   17      22    21
 6  Functions in Figures 2(a), 2(f) and 2(i)   10⁻³   0.400    C   5(4/1)   8       19    18
    p = 3                                      0.25   0.201    C   1(0/1)   8       11    11
                                               0.5    0.400    C   1(0/1)   4       7     6
                                               10³    0.399    C   2(0/2)   9       13    11
 7  Functions in Figures 2(b) and 2(g)         10⁻³   1.60     C   6(5/1)   13      25    24
    p = 2                                      0.3    1.60     C   3(2/1)   10      16    15
                                               1.5    1.60     C   2(1/1)   9       13    12
                                               10³    1.60     C   1(0/1)   11      13    12
 8  Functions in Figures 2(g) and 2(h)         10⁻³   0.0236   C   3(2/1)   10      17    17
    p = 2                                      0.2    0.245    C   2(1/1)   11      16    14
                                               10     0.00115  C   1(0/1)   10      13    11
                                               10³    0.00195  C   1(0/1)   9       12    10
 9  Functions in Figures 2(b), 2(f) and 2(h)   10⁻³   0.0206   C   3(2/1)   13      24    24
    p = 3                                      0.18   0.0934   C   2(1/1)   12      20    17
                                               0.3    0.201    C   2(0/2)   18      25    25
                                               10³    0.201    C   2(0/2)   21      25    23
10  Functions in Figures 2(a)–2(h)             10⁻³   0.0206   C   3(2/1)   13      44    44
    p = 8                                      0.1    0.0164   C   2(0/2)   20      42    31
                                               10     0.0164   C   3(0/3)   31      50    38
                                               10³    0.0164   C   3(0/3)   34      53    41

Table 1. Results of the numerical tests with respect to the functions of Figure 2.

As can be seen in Table 1, Algorithm 1 found a step size satisfying conditions (9) in all considered instances. Overall, regardless of the initial point used, few outer iterations and a moderate number of evaluations of functions/derivatives were needed to find an acceptable step size. Typically, one or two iterations of the selection phase were sufficient for convergence. For each problem, the number of outer iterations was largest when α_0 = 10⁻³. In these cases,
Algorithm 1 performed some (cheap) iterations of the bracketing phase until it identified an interval containing an acceptable step size and, consequently, entered the selection phase. For the cases where α_0 = 10³, as is to be expected, the algorithm identified that the interval (0, α_0) brackets desirable step sizes. This can be seen by noting that no bracketing iterations were performed.

We proceed by taking a detailed look at some test problems and the corresponding results. In Problem 1, except in the first choice for α_0, where the function in Figure 2(a) was used in the bracketing and selection phases, the algorithm identified that (0, α_0) brackets acceptable step sizes based on the function in Figure 2(b). In these cases, Algorithm 1 used this function to find α_1 ≈ 1.6. Since α_1 does not satisfy conditions (9), the second iteration of the selection phase was based on the function in Figure 2(a), resulting in the final iterate α_2 ≈ 0.4.

Problem 2 was proposed in [Lucambio Pérez and Prudente 2018] to show that it is not possible to replace ϕ'_max(0) by ϕ'_i(0) in condition (9a), in the following sense: there is no step size for this problem satisfying simultaneously condition (9a) with ϕ'_i(0) instead of ϕ'_max(0), and condition (9b). Considering the Wolfe conditions (9) in their own format, Problem 2 poses no challenges to the method proposed here.

In Problem 3, only the region in the neighborhood of the minimizer of the function in Figure 2(a) contains acceptable step sizes. For α_0 = 10⁻³, the function in Figure 2(c) was used in four iterations of the bracketing phase, generating α_4 ≈ 0.625. Then, the function in Figure 2(a) was used in the unique selection iteration to obtain the final acceptable iterate. The initial trial α_0 = 1.55 lies between the minimizers of the functions in Figures 2(b) and 2(c), while α_0 = 2.0 is to the right of the minimizers of the three functions. For α_0 = 1.55 and α_0 = 10³, Algorithm 1 performed two selection iterations using the functions in Figures 2(c) and 2(a), respectively. For α_0 = 2.0, three iterations of the selection phase generated α_1 ≈ 1.6, α_2 ≈ 1.5, and α_3 ≈ 0.4, which are in the neighborhood of the minimizers of the functions in Figures 2(c), 2(b) and 2(a), respectively.

For reasons of scale, we set σ = 10⁻³ for Problem 4. This value of the parameter σ forces the set of acceptable step sizes to be close to the minimizer of the function in Figure 2(d). For all initial trial step sizes used, only one iteration of the selection phase based on this function was sufficient to find a desired step size.

Adding the function in Figure 2(f) to Problem 4, we get Problem 5. In all instances of this latter problem, the final iterate was obtained based on the function in Figure 2(d). For α_0 = 0.8 and α_0 = 10³, in the first iteration of the selection phase, the algorithm generated iterates in the neighborhood of the global and local minimizers of the function in Figure 2(f), respectively.

Problem 6 has two regions of acceptable step sizes: around the minimizer of the function in Figure 2(a) and around the global minimizer of the function in Figure 2(f). As discussed in Section 4.4, the convex quadratic function in Figure 2(i) was neglected. For α_0 = 10⁻³, Algorithm 1 performed five bracketing iterations based on the function in Figure 2(a) until its minimizer was bracketed.
For α0 = 0.25 and α0 = 0.5, the algorithm identified that the interval (0, α0) brackets the global minimizer of the function in Figure 2(f) and the minimizer of the function in Figure 2(a), respectively. As a consequence, the final iterate for each of these choices was α ≈ 0.2 and α ≈ 0.4, respectively. Two selection iterations were performed for α0 = 10^3. In this case, the initial trial step size was modified as α0 ← αQ, where αQ = 5 is the minimizer of the quadratic function in Figure 2(i). Using the updated interval (0, α0), the first selection iteration, based on the function in Figure 2(f), produced α1 ≈ 0.59, which is around its local minimizer. The second selection iteration was based on the function in Figure 2(a), returning the acceptable step size α2 ≈ 0.4.

Problem 7 combines the function in Figure 2(b), which forces condition (9b) to be quite restrictive, with the function in Figure 2(g), which has no minimizer. Although there is a large region where ϕ′(α) ≈ 0 for the function in Figure 2(g), the acceptable step sizes are close to the minimizer of the function in Figure 2(b). Typically, the algorithm uses the function in Figure 2(g) in the bracketing phase until it generates a step size to the right of the minimizer of the function in Figure 2(b). Then, using the function in Figure 2(b) in the selection phase, Algorithm 1 finds an acceptable step size α ≈ 1.6.

Problem 8 combines two functions with quite different characteristics. While the function in Figure 2(g) has no minimizer, the function in Figure 2(h) has an infinite number of positive local minimizers close to zero. Table 1 shows that Algorithm 1 found a different acceptable step size for each α0 considered.

In Problem 9, in addition to several acceptable step sizes coming from the function in Figure 2(h), the neighborhood of the global minimizer of the function in Figure 2(d) also contains desirable points. For the first two choices of α0, the function in Figure 2(h) was used in both the bracketing and selection phases of the algorithm, leading to different final iterates. In turn, for the other choices of α0, Algorithm 1 performed only selection iterations and used the function in Figure 2(d) to obtain the final iterate α2 ≈ 0.2. For α0 = 0.3, the first iteration of the selection phase was based on the function in Figure 2(h) to obtain α1 ≈ 0.245, while for α0 = 10^3 it was based on the function in Figure 2(b), generating α1 ≈ 1.6.

Finally, the eight non-quadratic functions in Figure 2 compose Problem 10. All the final iterates were found based on the function in Figure 2(h). For α0 = 10^-3, the function in Figure 2(c) was used in the two iterations of the bracketing phase, returning α2 ≈ 0.025. Then, the function in Figure 2(h) was used to find the final iterate α3 ≈ 0.0206. In contrast, Algorithm 1 detects that (0, α0) brackets desirable step sizes for the other choices of α0. For α0 = 0.1, the first selection iteration was based on the function in Figure 2(d). For α0 = 10 and α0 = 10^3, the functions in Figures 2(b) and 2(d) were used in the first and second selection iterations, respectively. Then, in these last three cases, the acceptable step size α ≈ 0.0164 was found using the function in Figure 2(h).

Another question of interest is the behavior of Algorithm 1 as the number p of involved functions increases. Toward this goal, we consider an additional class of five problems with p = 5, 100, 200, 500, and 1000, based on function (36) with varying parameter β. Let us clarify how the problems were built. A list of 1000 such functions was made. We set β = 10^6 for the first one and randomly generated integer values between 1 and 10^4 for the others. For each value of p, the corresponding problem is defined by considering the first p functions in the list. Recall that function (36) has a unique minimizer at α = β and ϕ′(0) = −1/β. Thus, ϕ′max(0) = ϕ′1(0) = −10^-6, and condition (9b) is quite restrictive for any parameter σ < 1. We solved each problem starting from four different trial step sizes: one close to zero, one to the right of the largest minimizer, and the values α0 = 10 and α0 = 50, which are located between the minimizers of the considered functions. The results are presented in Table 2 and organized similarly to Table 1. We point out that, in each instance, the final step size is around the lowest minimizer of the corresponding functions.
In this class of problems, with respect to the number of outer and inner iterations, Algorithm 1 showed little sensitivity to the increase in the number of involved functions. In turn, there is a linear relationship between p and the number of evaluations of the functions and of their derivatives. This suggests that our scheme is potentially able to solve large problems.
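The construction of this family can be sketched as follows. The fragment below is a purely illustrative Fortran 90 sketch, in the spirit of the codes accompanying the paper but not part of them; function (36) itself is defined earlier in the paper and is not restated here, and all names are hypothetical. It only shows how the list of parameters β can be generated and why condition (9b) becomes restrictive.

! Build the list of parameters beta used in the scalability experiments:
! beta(1) = 10^6 and beta(2:1000) are random integers between 1 and 10^4.
program build_beta_list
  implicit none
  integer, parameter :: nmax = 1000
  double precision   :: beta(nmax), r
  integer            :: i

  beta(1) = 1.0d6
  call random_seed()
  do i = 2, nmax
     call random_number(r)
     beta(i) = dble(1 + int(r*1.0d4))
  end do

  ! Since phi_i'(0) = -1/beta_i for function (36), the slope entering
  ! conditions (9) is phi'max(0) = -1/maxval(beta) = -1.0d-6, which makes
  ! the curvature condition (9b) quite restrictive for any sigma < 1.
  write (*,*) "phi'max(0) = ", -1.0d0/maxval(beta)
end program build_beta_list

The problem with p functions then simply takes the first p entries of this list.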

   p     α0       α     SC  outiter  initer  nfev  ngev
   5     10^-3    18.9  C   9(7/2)   20        68    65
         10       18.9  C   3(1/2)   13        31    28
         50       18.9  C   2(0/2)   12        25    22
         2×10^3   18.9  C   3(0/3)   23        40    33
 100     10^-3    10.8  C   7(6/1)   13       812   718
         10       10.8  C   4(1/3)   19       516   229
         50       10.8  C   3(0/3)   18       415   128
         2×10^3   10.8  C   4(0/4)   29       525   139
 200     10^-3     5.2  C   8(6/2)   19      1817  1577
         10        5.2  C   1(0/1)    6       405   359
         50        5.2  C   4(0/4)   24      1020   387
         2×10^3    5.2  C   5(0/5)   35      1230   398
 500     10^-3     5.1  C   9(6/3)   24      5021  4138
         10        5.1  C   2(0/2)   11      1509  1120
         50        5.1  C   5(0/5)   29      3024  1148
         2×10^3    5.1  C   6(0/6)   40      3534  1159
1000     10^-3     1.0  C   6(5/1)   13      7012  6961
         10        1.0  C   3(0/3)   20      4017  2577
         50        1.0  C   6(0/6)   38      7032  2605
         2×10^3    1.0  C   7(0/7)   49      8042  2616

Table 2. Results of the numerical tests with respect to the problem based on (36), varying the number of involved functions.

Note. For each example of this section, it is easy to construct an associated vector optimization problem. Let K = R^m_+ and let C be the canonical basis of R^m. Define F : R → R^m such that its coordinate functions are some selected ϕ's. Since ϕ′(0) < 0, d = 1 is a descent direction for F at x = 0 with respect to the order induced by K. Thus, the problem of computing a step size satisfying the strong Wolfe conditions (6) for F at x = 0 in the direction d = 1 reduces to the corresponding example. For instance, F : R → R^2 given by F(x) = (F1(x), F2(x))^T, where F1(x) = −x/(x^2 + 0.16) and F2(x) = (x + 0.004)^5 − 2(x + 0.004)^4, corresponds to Problem 1, which involves the functions in Figures 2(a) and 2(b). A short illustrative sketch of this construction is given after the final remarks below.

6 FINAL REMARKS

In this work, an algorithm for finding a step size satisfying the strong vector-valued Wolfe conditions has been proposed. In our implementation, we used the well-known algorithm of Moré and Thuente as the inner solver. As a consequence, except in pathological cases, the main algorithm stops in a finite number of iterations. The first inequality of (28) is quite natural for practical problems. Hence, by Theorem 2, the algorithm is expected to converge to an acceptable step size.

The efficiency of our scheme is linked to the efficiency of the algorithm used as the inner solver. Improvements in Wolfe line search algorithms for scalar optimization increase the effectiveness of the associated vector algorithm. Therefore, our work can serve as a basis for future research, either by comparing the influence of different choices of inner solver or by incorporating new developments arising from scalar optimization.

Algorithm 1 is designed for vector-valued problems (1) where the positive polar cone K* given by (3) is finitely generated, i.e., when C is a finite set. Although this case covers most real applications, there are practical problems that require partial orders induced by cones with infinitely many extremal vectors. Applications involving such cones can be found, for example, in [Aliprantis et al. 2004], where problems of portfolio selection in security markets are studied. When F is K-convex and C is an infinite set, it is possible to show the existence of step sizes satisfying the vector-valued Wolfe conditions, see [Lucambio Pérez and Prudente 2018]. For the general case, even the existence of acceptable step sizes is unknown. The study of the Wolfe conditions for infinitely generated cones, as well as the extension of Algorithm 1 to these challenging problems, is left as an open problem for future research.
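To make the Note above concrete, the following purely illustrative Fortran 90 fragment (not part of the released codes; the values of rho and sigma below are arbitrary choices for this sketch) evaluates the two coordinate functions of the vector problem associated with Problem 1 and checks a candidate step size against the schematic scalarized form of conditions (9) recalled earlier in this section.

! Vector problem associated with Problem 1 (see the Note): with x = 0 and
! d = 1, the scalarizations are phi_i(alpha) = F_i(alpha).
program note_problem1
  implicit none
  double precision, parameter :: rho = 1.0d-4, sigma = 0.1d0  ! illustrative values only
  double precision :: alpha, phi0(2), dphi0(2), phi(2), dphi(2), dmax0, dmaxa

  call evalf(0.0d0, phi0, dphi0)
  dmax0 = maxval(dphi0)            ! phi'max(0) = max_i phi_i'(0)

  alpha = 0.4d0                    ! candidate step size (cf. Problem 1 in Table 1)
  call evalf(alpha, phi, dphi)
  dmaxa = maxval(dphi)             ! phi'max(alpha)

  ! Schematic sufficient-decrease and strong curvature checks.
  write (*,*) 'sufficient decrease: ', all(phi <= phi0 + rho*alpha*dmax0)
  write (*,*) 'strong curvature:    ', abs(dmaxa) <= -sigma*dmax0

contains

  subroutine evalf(x, f, g)
    double precision, intent(in)  :: x
    double precision, intent(out) :: f(2), g(2)
    f(1) = -x/(x**2 + 0.16d0)                          ! F1, Figure 2(a)
    g(1) = (x**2 - 0.16d0)/(x**2 + 0.16d0)**2
    f(2) = (x + 0.004d0)**5 - 2.0d0*(x + 0.004d0)**4   ! F2, Figure 2(b)
    g(2) = 5.0d0*(x + 0.004d0)**4 - 8.0d0*(x + 0.004d0)**3
  end subroutine evalf

end program note_problem1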

ACKNOWLEDGMENTS
The authors were partially supported by FAPEG/CNPq/PRONEM-201710267000532 and CNPq (Grant 424860/2018-0). The second author was partially supported by FAPEG/CNPq/PPP-03/2015 and CNPq (Grant 406975/2016-7).

REFERENCES
M. Al-Baali and R. Fletcher. 1986. An efficient line search for nonlinear least squares. Journal of Optimization Theory and Applications 48, 3 (1986), 359–377. https://doi.org/10.1007/BF00940566
C. D. Aliprantis, M. Florenzano, V. F. Martins da Rocha, and R. Tourky. 2004. Equilibrium analysis in financial markets with countably many securities. Journal of Mathematical Economics 40, 6 (2004), 683–699. https://doi.org/10.1016/j.jmateco.2003.06.003
Md. A. T. Ansary and G. Panda. 2015. A modified Quasi-Newton method for vector optimization problem. Optimization 64, 11 (2015), 2289–2306.
J. Y. Bello Cruz. 2013. A Subgradient Method for Vector Optimization Problems. SIAM Journal on Optimization 23, 4 (2013), 2169–2182. https://doi.org/10.1137/120866415
D. P. Bertsekas. 1999. Nonlinear Programming. Athena Scientific, Belmont.
H. Bonnel, A. N. Iusem, and B. F. Svaiter. 2005. Proximal methods in vector optimization. SIAM Journal on Optimization 15, 4 (2005), 953–970.
G. A. Carrizo, P. A. Lotito, and M. C. Maciel. 2016. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Mathematical Programming 159, 1 (2016), 339–369.
L. C. Ceng, B. S. Mordukhovich, and J. C. Yao. 2010. Hybrid approximate proximal method with auxiliary variational inequality for vector optimization. Journal of Optimization Theory and Applications 146, 2 (2010), 267–303.
L. C. Ceng and J. C. Yao. 2007. Approximate proximal methods in vector optimization. European Journal of Operational Research 183, 1 (2007), 1–19.
T. D. Chuong. 2011. Generalized proximal method for efficient solutions in vector optimization. Numerical Functional Analysis and Optimization 32, 8 (2011), 843–857.
T. D. Chuong. 2013. Newton-like methods for efficient solutions in vector optimization. Computational Optimization and Applications 54, 3 (2013), 495–516.
T. D. Chuong, B. S. Mordukhovich, and J. C. Yao. 2011. Hybrid approximate proximal algorithms for efficient solutions in vector optimization. Journal of Nonlinear and Convex Analysis 12, 2 (2011), 257–285.
T. D. Chuong and J. C. Yao. 2012. Steepest descent methods for critical points in vector optimization problems. Applicable Analysis 91, 10 (2012), 1811–1829.
J. Y. Bello Cruz, L. R. Lucambio Pérez, and J. G. Melo. 2011. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Analysis: Theory, Methods & Applications 74, 16 (2011), 5268–5273. https://doi.org/10.1016/j.na.2011.04.067
P. De, J. B. Ghosh, and C. E. Wells. 1992. On the minimization of completion time variance with a bicriteria extension. Operations Research 40, 6 (1992), 1148–1155.
J. E. Dennis and R. B. Schnabel. 1996. Numerical methods for unconstrained optimization and nonlinear equations. Vol. 16. SIAM.
R. Fletcher. 2013. Practical methods of optimization. John Wiley & Sons.
J. Fliege, L. M. Graña Drummond, and B. F. Svaiter. 2009. Newton's Method for Multiobjective Optimization. SIAM Journal on Optimization 20, 2 (2009), 602–626. https://doi.org/10.1137/08071692X
J. Fliege and B. F. Svaiter. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research 51, 3 (2000), 479–494. https://doi.org/10.1007/s001860000043
J. Fliege and L. N. Vicente. 2006. Multicriteria approach to bilevel optimization. Journal of Optimization Theory and Applications 131, 2 (2006), 209–225.
J. Fliege and R. Werner. 2014. Robust multiobjective optimization & applications in portfolio optimization. European Journal of Operational Research 234, 2 (2014), 422–433. https://doi.org/10.1016/j.ejor.2013.10.028
E. H. Fukuda and L. M. Graña Drummond. 2011. On the convergence of the projected gradient method for vector optimization. Optimization 60, 8-9 (2011), 1009–1021.
E. H. Fukuda and L. M. Graña Drummond. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications 54, 3 (2013), 473–493.
L. M. Graña Drummond and A. N. Iusem. 2004. A Projected Gradient Method for Vector Optimization Problems. Computational Optimization and Applications 28, 1 (2004), 5–29. https://doi.org/10.1023/B:COAP.0000018877.86161.8b
L. M. Graña Drummond, F. M. P. Raupp, and B. F. Svaiter. 2014. A quadratically convergent Newton method for vector optimization. Optimization 63, 5 (2014), 661–677. https://doi.org/10.1080/02331934.2012.693082

L. M. Graña Drummond and B. F. Svaiter. 2005. A steepest descent method for vector optimization. J. Comput. Appl. Math. 175, 2 (2005), 395–414. https://doi.org/10.1016/j.cam.2004.06.018
M. Gravel, J. M. Martel, R. Nadeau, W. Price, and R. Tremblay. 1992. A multicriterion view of optimal resource allocation in job-shop production. European Journal of Operational Research 61, 1-2 (1992), 230–244.
W. W. Hager. 1989. A derivative-based bracketing scheme for univariate minimization and the conjugate gradient method. Computers & Mathematics with Applications 18, 9 (1989), 779–795.
W. W. Hager and H. Zhang. 2005. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization 16, 1 (2005), 170–192.
T. S. Hong, D. L. Craft, F. Carlsson, and T. R. Bortfeld. 2008. Multicriteria Optimization in Intensity-Modulated Radiation Therapy Treatment Planning for Locally Advanced Cancer of the Pancreatic Head. International Journal of Radiation Oncology*Biology*Physics 72, 4 (2008), 1208–1214.
C. Lemaréchal. 1981. A view of line-searches. In Optimization and Optimal Control. Springer, 59–78.
T. M. Leschine, H. Wallenius, and W. A. Verdini. 1992. Interactive multiobjective analysis and assimilative capacity-based ocean disposal decisions. European Journal of Operational Research 56, 2 (1992), 278–289.
F. Lu and C.-R. Chen. 2014. Newton-like methods for solving vector optimization problems. Applicable Analysis 93, 8 (2014), 1567–1586.
D. T. Luc. 1989. Theory of vector optimization. Lecture Notes in Economics and Mathematical Systems, Vol. 319. Springer.
L. R. Lucambio Pérez and L. F. Prudente. 2018. Non-linear conjugate gradient methods for vector optimization. SIAM Journal on Optimization, to appear (2018).
E. Miglierina, E. Molho, and M. C. Recchioni. 2008. Box-constrained multi-objective optimization: A gradient-like method without a priori scalarization. European Journal of Operational Research 188, 3 (2008), 662–682. https://doi.org/10.1016/j.ejor.2007.05.015
J. J. Moré and D. C. Sorensen. 1982. Newton's method. Technical Report. Argonne National Lab., IL (USA).
J. J. Moré and D. J. Thuente. 1994. Line Search Algorithms with Guaranteed Sufficient Decrease. ACM Trans. Math. Softw. 20, 3 (1994), 286–307. https://doi.org/10.1145/192115.192132
J. Nocedal and S. Wright. 2006. Numerical optimization. Springer Science & Business Media.
S. Qu, M. Goh, and F. T. S. Chan. 2011. Quasi-Newton methods for solving multiobjective optimization. Operations Research Letters 39, 5 (2011), 397–399.
S. Qu, M. Goh, and B. Liang. 2013. Trust region methods for solving multiobjective optimisation. Optimization Methods and Software 28, 4 (2013), 796–811. https://doi.org/10.1080/10556788.2012.660483
D. F. Shanno and K. H. Phua. 1980. Remark on Algorithm 500: Minimization of unconstrained multivariate functions. ACM Trans. Math. Softw. 6, 4 (1980), 618–622.
M. Tavana. 2004. A subjective assessment of alternative mission architectures for the human exploration of Mars at NASA using multicriteria decision making. Computers & Operations Research 31, 7 (2004), 1147–1164.
M. Tavana, M. A. Sodenkamp, and L. Suhl. 2010. A soft multi-criteria decision analysis model with application to the European Union enlargement. Annals of Operations Research 181, 1 (2010), 393–421.
K. D. V. Villacorta and P. R. Oliveira. 2011. An interior proximal method in vector optimization. European Journal of Operational Research 214, 3 (2011), 485–492.
D. J. White. 1998. Epsilon-dominating solutions in mean-variance portfolio analysis. European Journal of Operational Research 105, 3 (1998), 457–466.
H. Yanai, M. Ozawa, and S. Kaneko. 1981. Interpolation methods in one dimensional optimization. Computing 27, 2 (1981), 155–163.
C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. 1997. Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-scale Bound-constrained Optimization. ACM Trans. Math. Softw. 23, 4 (1997), 550–560. https://doi.org/10.1145/279232.279236