Algorithms for Large-Scale Nonlinearly Constrained Optimization

Case for Support — Description of Research and Context

Purpose

The solution of large-scale nonlinear optimization problems—minimization or maximization—lies at the heart of scientific computation. Structures take up positions of minimal constrained potential energy, investors aim to maximize profit while controlling risk, public utilities run transmission networks to satisfy demand at least cost, and pharmaceutical companies desire minimal drug doses to target pathogens. All of these problems are large, either because the mathematical model involves many parameters or because they are actually finite discretisations of some continuous problem for which the variables are functions. The purpose of this grant application is to support the design, analysis and development of new algorithms for nonlinear optimization that are particularly aimed at the large-scale case. A post-doctoral researcher will be employed to perform a significant share of the design, analysis and implementation. The resulting software will be freely available, for research purposes, as part of the GALAHAD library [21].

Background

The design of iterative algorithms for constrained nonlinear optimization has been a vibrant research area for many years. The earliest methods were based on simple (penalty or barrier) transformations of the constrained problem to create one or more artificial unconstrained ones. This has particular advantages when the number of variables is large, since significant advances in large-scale unconstrained optimization, particularly in the use of truncated conjugate-gradient methods, were made during the 1980s. The first successful methods were arguably augmented Lagrangian methods, also based on sequential unconstrained minimization, which aimed to overcome the perceived ill-conditioning in the earlier approaches. Certainly both MINOS [27] and LANCELOT [11] are augmented-Lagrangian based packages. Fully-fledged constrained optimization approaches date back to the development of sequential linear programming (SLP) and sequential quadratic programming (SQP) methods, which have been applied to small-scale problems since the late 1960s. These were the first methods in which separate approximations to the objective function and constraints were treated explicitly, and it has been argued that in its basic form SQP is simply Newton’s method for constrained optimization. The names SLP/SQP arise since a sequence of linear (LP) or quadratic (QP) programs—specifically the minimization of a first- or second-order Taylor series approximation to the Lagrangian function subject to linearisations of the constraints—is solved. In practice, SQP methods have proved remarkably hard to generalise to the truly large-scale case, and to date only the SNOPT [18] and FilterSQP [15] packages are based on the SQP paradigm. The main issue is that, in the small-scale case, the Hessian of the Lagrangian is approximated by a so-called positive-definite quasi-Newton formula; the resulting QP is thus strictly convex and has a unique minimizer.
By contrast, in the large-scale case, it is often too expensive to maintain a positive-definite Hessian approximation, but the alternative of using exact second derivatives causes difficulties, since then the QP is often non-convex and may have undesirable local minimizers. The bulk of the development in constrained optimization over the past 15 years has been for interior-point methods. In these, an alternative Newton model based on perturbed optimality conditions is used to compute improved iterates within the interior of the feasible region, while at the same time reducing the perturbation to zero to encourage convergence. Crucially, there is coherence between the way the step is chosen and the logarithmic barrier function as a means to assess whether to accept or retract the step. The successful packages KNITRO [8], IPOPT [29] and LOQO [1] are all interior-point based. However, when it comes to equality constraints (which have no interiors), the strategies used appear somewhat ad hoc.

Programme and methodology

In this proposal, we intend to revisit the SQP approach, particularly, but not exclusively, to derive new convergence strategies and to investigate better ways of dealing with equality constraints. We will consider three approaches: (i) an SQP method in which extra constraints are imposed on the subproblem to ensure that the step gives local improvement, (ii) a regularized SLP-EQP method in which the choice of which constraints are imposed when finding the step is delegated to a subsidiary but simpler problem, and (iii) an approach where reduction of the constraint violation and objective function are controlled separately without using a “filter”. Our aim will be, in each case, to design and analyse suitable algorithms, and to produce high-quality, publicly available software as part of the GALAHAD library. We now give technical details of the three approaches we will consider.

Approach 1. Given an estimate xk of the solution to the continuous optimization problem

\[
\min_{x\in\mathbb{R}^n} \; f(x) \quad\text{subject to}\quad c(x) \ge 0,
\]
the basic SQP approach determines a correction sk by solving the quadratic program

\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s \quad\text{subject to}\quad J_k s + c_k \ge 0, \tag{1}
\]
where gk = ∇x f(xk), ck = c(xk), Jk = ∇x c(xk), and Hk is a symmetric approximation to the Hessian of the Lagrangian function—for simplicity we only refer to the inequality problem here, but in practice equality constraints will result in linearised equalities in the QP subproblem. Both line-search and trust-region globalisations are possible; in the former xk+1 = xk + αk sk for some suitable αk ∈ (0, 1], while in the latter an extra constraint ‖s‖∞ ≤ ∆k will be imposed on the QP subproblem for some suitably small radius ∆k and xk+1 = xk + sk. Traditionally, globalisation is achieved by requiring that xk+1 significantly reduces an appropriate penalty/merit function such as φ(x; σ) = f(x) + σ v(x), where v(x) = ‖min(0, c(x))‖ and σ is a sufficiently positive parameter [28]. More recently, “filter” methods [14, 15] achieve the same effect by insisting that the pair (f(xk+1), v(xk+1)) is not dominated in a Pareto sense by certain previous pairs (f(xi), v(xi)), i ≤ k.

The basic SQP method has two significant advantages: the QP subproblem ultimately predicts which of the inequality constraints are active (and thus which may be discarded), and, once this happens, the method converges very rapidly if Hk is suitably chosen [5, 23]. However, it also suffers from a number of defects. Firstly, there may be no feasible point for the QP subproblem, especially if a trust region with small radius is imposed. This may be overcome by replacing the QP by the related penalty-QP problem
\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s + \sigma \|\min(0, J_k s + c_k)\|,
\]
or by entering a “restoration phase” whose aim is simply to move closer to feasibility by discarding the objective. Secondly, the merit function or filter may reject the full SQP step, and thus destroy rapid convergence [26]. This may be avoided at additional cost by adding a “second-order” correction to the SQP step [13].
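To make the globalisation mechanism concrete, the following sketch implements the merit function φ(x; σ) and a simple backtracking line search on it, in Python with NumPy. The toy objective, constraint and candidate step are invented purely for illustration; a production line search (such as those inside GALAHAD) would use sufficient-decrease conditions and safeguards omitted here.

```python
import numpy as np

def merit(f, c, x, sigma):
    """l2 penalty/merit function phi(x; sigma) = f(x) + sigma * ||min(0, c(x))||."""
    return f(x) + sigma * np.linalg.norm(np.minimum(0.0, c(x)))

def linesearch_merit(f, c, x, s, sigma, max_backtracks=20):
    """Backtrack alpha in (0, 1] until the merit function decreases.

    Returns the last alpha tried even on failure -- a real implementation
    would instead signal failure and, e.g., shrink a trust region."""
    phi0 = merit(f, c, x, sigma)
    alpha = 1.0
    for _ in range(max_backtracks):
        if merit(f, c, x + alpha * s, sigma) < phi0:
            return alpha
        alpha *= 0.5
    return alpha

# toy problem: minimize (x0 - 2)^2 + x1^2 subject to x0 + x1 - 1 >= 0
f = lambda x: (x[0] - 2.0) ** 2 + x[1] ** 2
c = lambda x: np.array([x[0] + x[1] - 1.0])
x = np.array([0.0, 0.0])       # infeasible start: c(x) = -1
s = np.array([1.5, -0.5])      # an invented SQP-like candidate step
alpha = linesearch_merit(f, c, x, s, sigma=10.0)
```

Here the full step both reduces the objective and restores feasibility, so the merit function accepts αk = 1 immediately; the Maratos-type rejection mentioned above occurs precisely when no such α close to 1 passes this test.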
Finally, and most seriously, when Hk is indefinite, the QP may have many local minimizers, some of which are inappropriate for the merit function or filter; finding the global minimizer is simply too hard, and indeed the global minimizer may itself be inappropriate [22]. One way of avoiding this latter difficulty is to impose an extra constraint on the subproblem (1) to ensure that the step is a descent direction for the merit function. For example, if ei denotes the ith column of the identity matrix and Vk the set of indices of constraints that are violated or active at xk, consider the subproblem

\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s \quad\text{subject to}\quad J_k s + c_k \ge 0 \;\text{ and }\; \Big(g_k + \sigma \sum_{i\in V_k} J_k^T e_i\Big)^{T} s \le -\omega_k, \tag{2}
\]
for some appropriately small ωk > 0; any solution of the subproblem will be a descent direction for φ(x; σ) at xk. Of course, there is no a priori guarantee that the constraint set for (2) is consistent, and it may instead be better to consider the penalty-QP version of the subproblem
\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s + \sigma \|\min(0, J_k s + c_k)\| \quad\text{subject to}\quad \Big(g_k + \sigma \sum_{i\in V_k} J_k^T e_i\Big)^{T} s \le -\omega_k, \tag{3}
\]
for which there will be a solution. It is even possible to impose a trust region upon (2) or (3), and in the latter case the additional constraint set will remain feasible provided either the radius is sufficiently large or ωk sufficiently small. Nevertheless, there remain a number of interesting issues. In particular, it is unclear what effects the additional constraint might have on the ultimate speed of convergence. The intention is to implement these ideas as part of GALAHAD, using the library’s pre-existing QP solvers [10, 24, 25] as crucial sub-programs. This part of the research programme will be performed in conjunction with Professors Philippe Toint (Namur) and Dominique Orban (Montréal). Both have considerable experience in the field of large-scale nonlinear optimization, and have been part of the GALAHAD team since its inception. Collaboration will be, as always, primarily by email.

Approach 2. Sequential linear programming (SLP) approaches differ from their SQP counterparts (1) in that the simpler linear programming subproblem

\[
\min_{s\in\mathbb{R}^n} \; s^T g_k \quad\text{subject to}\quad J_k s + c_k \ge 0 \;\text{ and }\; \|s\|_\infty \le \Delta_k \tag{4}
\]
is solved—the trust-region constraint is imposed to guarantee a finite solution. Although SLP methods are almost inevitably slowly convergent (and thus impractical), they share with SQP the ability to predict which constraints are ultimately active so long as the radius ∆k stays sufficiently large. This naturally leads to combined SLP-EQP approaches in which the optimal active set Ak for (4) is simply used for prediction, and the step sk is actually chosen as the minimizer of the equality-constrained quadratic program (EQP)
\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s \quad\text{subject to}\quad J_{A_k} s + c_{A_k} = 0, \tag{5}
\]
where the subscript Ak indicates the rows of Jk and the entries of ck indexed by Ak. Thus predicted inactive constraints are temporarily ignored when finding the step. The main advantage of this SLP-EQP approach compared to SQP is that neither subproblem (4) nor (5) has undesirable local minimizers, and both may be solved very efficiently. Just as for SQP, the subproblem (4) may be infeasible if ∆k is too small, but this defect can be avoided by instead solving the penalty LP problem

\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \sigma \|\min(0, J_k s + c_k)\| \quad\text{subject to}\quad \|s\|_\infty \le \Delta_k, \tag{6}
\]
which has the same active-set predictive properties as (4), or by entering a restoration phase. Convergence of such methods has been established [6, 9, 17] and they have been shown to be effective in practice [7]. However, a significant defect of the LP subproblems (4) or (6) is that changes to ∆k, as typically arise during a trust-region method, often result in drastic changes to the optimal active faces of the norm constraint ‖s‖∞ ≤ ∆k—this is a feature of any non-smooth (polyhedral) norm. Hence, even though the set of optimally active problem constraints will settle down, those from the trust region may not, and thus the cost of solving each LP in the iterative sequence does not diminish. Such a defect would not occur with a smooth norm (such as ‖s‖2 ≤ ∆k), but in this case (4)/(6) would not be linear programs. One way around this defect is to replace (4) by the “regularized” LP

\[
\min_{s\in\mathbb{R}^n} \; \mu_k s^T g_k + \|s\|_2^2 \quad\text{subject to}\quad J_k s + c_k \ge 0, \tag{7}
\]
where the parameter µk is used to control the size of the step just as ∆k does. Problem (7) is a convex, specially-structured QP, and may be solved as a parametric problem if µk is changed.
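Once an active set Ak has been predicted (from (4) or (7)), the EQP step (5) follows from its first-order optimality (KKT) conditions, which form a single symmetric linear system. The dense sketch below, with invented data, is purely illustrative: large-scale implementations would instead use the structured iterative solvers of [19, 20].

```python
import numpy as np

def eqp_step(g, H, J_A, c_A):
    """Solve the EQP (5): min s^T g + 0.5 s^T H s  s.t.  J_A s + c_A = 0,
    via its KKT conditions
        H s + J_A^T y = -g,      (stationarity of the Lagrangian)
        J_A s         = -c_A,    (linearised feasibility)
    assembled as one symmetric indefinite system and solved densely."""
    n, m = H.shape[0], J_A.shape[0]
    K = np.block([[H, J_A.T],
                  [J_A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -c_A])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]          # step s_k and multiplier estimates y_k

# invented data: H = I, g = (1, 1), one predicted-active constraint s_0 = 1
H = np.eye(2)
g = np.array([1.0, 1.0])
J_A = np.array([[1.0, 0.0]])
c_A = np.array([-1.0])
s, y = eqp_step(g, H, J_A, c_A)
```

The multiplier estimates y returned as a by-product are exactly the quantities an SLP-EQP method would feed back into the next active-set prediction.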

We intend to consider regularized LP-EQP methods in which the active set Ak for (7) is used in conjunction with the step-selection problem (5), with a filter as the step acceptance mechanism. Since we believe that there is a close (but nonlinear) relationship between the parameter µk in (7) and ∆k in (4), we feel that it should be straightforward to show that such a scheme converges globally, based on known analyses [9, 16]. Showing that the method ultimately converges rapidly is likely to be less straightforward, but we plan to investigate this. Of course, the real test of whether this is a good method will only be apparent once we have carefully implemented it; many of the existing building blocks from GALAHAD will be useful here. This part of the research programme will be performed in conjunction with Dr. Sven Leyffer (Argonne National Laboratory), a founding expert on filter methods. Again collaboration will primarily be by email, although one visit is envisaged.
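The filter acceptance test itself is simple to state. The sketch below checks whether a trial pair (f, v) is acceptable to a filter of previously recorded pairs; small envelope margins of this kind appear in [14, 15], though the particular parameter values and data here are invented for illustration.

```python
import numpy as np

def acceptable_to_filter(f_new, v_new, filter_pairs, beta=0.999, gamma=1e-3):
    """Pareto-style filter test: the trial pair (f_new, v_new) is rejected if
    some filter entry (f_i, v_i) dominates it up to small margins, i.e. if
    f_new >= f_i - gamma * v_i  AND  v_new >= beta * v_i."""
    for f_i, v_i in filter_pairs:
        if f_new >= f_i - gamma * v_i and v_new >= beta * v_i:
            return False
    return True

# invented filter: pairs (objective value, constraint violation)
filter_pairs = [(3.0, 0.5), (1.0, 2.0)]
ok1 = acceptable_to_filter(0.5, 1.0, filter_pairs)  # improves f enough
ok2 = acceptable_to_filter(4.0, 3.0, filter_pairs)  # dominated by (3.0, 0.5)
```

An accepted pair would then typically be added to the filter (with entries it dominates removed), which is the bookkeeping that replaces a penalty parameter.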

Approach 3. When all of the constraints are equations, a typical SQP method computes a correction step sk from
\[
\min_{s\in\mathbb{R}^n} \; s^T g_k + \tfrac{1}{2} s^T H_k s \quad\text{subject to}\quad J_k s + c_k = 0. \tag{8}
\]

Since an additional trust-region constraint ‖s‖ ≤ ∆k may render (8) infeasible, it is often more convenient [12, §15.4] to compute sk as a composite step nk + tk, where the normal step nk approximately solves

\[
\min_{n\in\mathbb{R}^n} \; \|J_k n + c_k\| \quad\text{subject to}\quad \|n\| \le \omega_n \Delta_k \tag{9}
\]
and the tangential step tk approximately solves

\[
\min_{t\in\mathbb{R}^n} \; t^T (g_k + H_k n_k) + \tfrac{1}{2} t^T H_k t \quad\text{subject to}\quad J_k t = 0 \;\text{ and }\; \|t\| \le \omega_t \Delta_k \tag{10}
\]
for some appropriate scalars ωn and ωt. Thus the normal step is preoccupied with reducing infeasibility, while the tangential one is devoted to reducing the objective without jeopardizing any improvement in feasibility already obtained; this strategy lies behind the successful software package KNITRO [2, 8]. While there are good conjugate-gradient/Lanczos strategies for approximately solving (9) and (10) in many cases [19, 20], the requirement in (10) that the tangential step stay in the null space of the constraint Jacobian may limit this approach for truly large problems, particularly those that arise from the discretisation of three-dimensional partial differential equations (PDEs). Thus, there is a need to devise and analyse methods that do not require that Jk tk = 0, but instead impose a weaker condition such as
\[
\|c_k + J_k (n_k + t_k)\|^2 \le \kappa \|c_k\|^2 + (1 - \kappa) \|c_k + J_k n_k\|^2
\]
for some κ ∈ (0, 1). In this case, any improvement in feasibility derived through the normal step will not be completely lost by the tangential step, and there is therefore still a trend towards feasibility. At the same time, while it is traditional to constrain (9) and (10) using the same trust-region radius, this is undesirable since the two subproblems actually consider different aspects of the problem. Thus, we would like to consider methods in which the trust regions in (9) and (10) are replaced by $\|n\| \le \Delta_k^{C}$ and $\|t\| \le \Delta_k^{F}$ respectively, where $\Delta_k^{C}$ and $\Delta_k^{F}$ are controlled separately. In particular, the trust region for (9) is only concerned with the constraints, and thus it makes sense to control it simply on the basis of how accurately the linearisation Jk sk + ck predicts c(xk + sk). Similarly, the trust region for (10) should reflect how well the model $s_k^T g_k + \tfrac{1}{2} s_k^T H_k s_k$ predicts the reduction f(xk + sk) − f(xk).
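A dense, purely illustrative sketch of the composite step follows: the normal step treats (9) by least squares and is clipped to its trust region, the tangential step treats (10) through an orthonormal null-space basis, and a final routine checks the relaxed feasibility condition above. All data and tolerances are invented, and a serious implementation would use the iterative methods of [19, 20] rather than SVD-based dense factorizations.

```python
import numpy as np

def normal_step(J, c, radius):
    """Approximate (9): minimum-norm least-squares step towards feasibility,
    scaled back if it exceeds the trust region ||n|| <= radius."""
    n, *_ = np.linalg.lstsq(J, -c, rcond=None)
    nrm = np.linalg.norm(n)
    return n if nrm <= radius else (radius / nrm) * n

def tangential_step(g, H, J, n_step, radius):
    """Approximate (10): minimize the model over null(J) using an orthonormal
    basis Z from the SVD of J, then scale back to ||t|| <= radius.
    Assumes the reduced Hessian Z^T H Z is positive definite."""
    _, S, Vt = np.linalg.svd(J)
    rank = int(np.sum(S > 1e-12))
    Z = Vt[rank:].T                          # columns span null(J)
    if Z.shape[1] == 0:
        return np.zeros_like(g)
    u = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (g + H @ n_step))
    t = Z @ u
    nrm = np.linalg.norm(t)
    return t if nrm <= radius else (radius / nrm) * t

def feasibility_not_lost(c, J, n_step, t_step, kappa=0.5):
    """Relaxed condition ||c + J(n+t)||^2 <= kappa ||c||^2 + (1-kappa) ||c + Jn||^2."""
    lhs = np.linalg.norm(c + J @ (n_step + t_step)) ** 2
    rhs = kappa * np.linalg.norm(c) ** 2 \
        + (1 - kappa) * np.linalg.norm(c + J @ n_step) ** 2
    return lhs <= rhs

# invented data: one equality constraint x_0 = 1 in three variables
J = np.array([[1.0, 0.0, 0.0]])
c = np.array([-1.0])
g = np.array([0.0, 1.0, 1.0])
H = np.eye(3)
n_k = normal_step(J, c, radius=10.0)
t_k = tangential_step(g, H, J, n_k, radius=10.0)
```

Here the tangential step satisfies Jk t = 0 exactly, so the relaxed condition holds trivially; its purpose is precisely to permit inexact tangential steps for which Jk t is only small, not zero.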
Since f and c will be controlled separately, there is no need to use a penalty function to control convergence. While using a filter to ensure convergence might appear natural, preliminary analysis suggests that it might actually be possible to weaken this further so that all that is necessary is that there is a possibly tightening “tube” or “funnel” of permitted constraint values to drive convergence. Thus our aim is to derive a “trust-funnel” method for equality-constrained optimization which allows considerable flexibility in the computation of the step, and at the same time dispenses with the need for either penalty- or filter-based globalisation tools. Subsidiary issues that will be addressed include

whether a “second-order” corrector is needed to ensure fast ultimate convergence, and how to incorporate inequality constraints—they might be imposed directly on (9) and (10), or alternatively the trust-funnel approach might be wrapped inside an interior-point framework. As before, our principal goal is to provide a careful implementation of such a method within GALAHAD, using existing building blocks. We will be particularly interested to see how this may be applied to PDE-constrained optimization [3, 4], for which special-purpose methods (multi-grid or tailored preconditioners) from that field will be necessary when constructing normal and tangential steps. This part of the research programme will again be performed in collaboration with Professor Philippe Toint (Namur). His experience both of filter methods and as part of the GALAHAD team is invaluable.

Project management. Dr. Gould will be in overall control and will collaborate over the design and analysis of the algorithms relating to each of the three approaches outlined above. The post-doctoral assistant will be responsible for the implementation and testing of each of the proposed algorithms, as well as being involved with design and analysis. The overseas collaborators (Toint, Orban, Leyffer) will assist with design and analysis. The three approaches can proceed independently, although some re-use of software is both desirable and inevitable. All three approaches will follow the same general pattern: (i) background preparation and assessment of the current state-of-the-art, (ii) initial design of the algorithm, (iii) convergence analysis and implementation, (iv) redesign and further analysis on the basis of initial testing, and (v) final testing, documentation and release of software, along with supporting paper(s) for journal publication.

References

[1] H. Y. Benson, D. F. Shanno, and R. J. Vanderbei. Interior-point methods for nonconvex nonlinear programming: filter methods and merit functions. Computational Optimization and Applications, 23:257–272, 2002.
[2] L. Biegler, J. Nocedal, and C. Schmid. A reduced Hessian method for large-scale constrained optimization. SIAM Journal on Optimization, 5(2):314–347, 1995.
[3] L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. v. Bloemen Waanders, editors. Large-Scale PDE-Constrained Optimization, number 30 in Lecture Notes in Computational Science and Engineering. Springer Verlag, Heidelberg, Berlin, New York, 2003.
[4] G. Biros and O. Ghattas. A Lagrange-Newton-Krylov-Schur method for PDE-constrained optimization. SIAG/Optimization Views-and-News, 11(2):12–18, 2000.
[5] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1–51, 1995.
[6] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. An algorithm for nonlinear optimization using linear programming and equality constrained subproblems. Mathematical Programming, Series B, 100(1):27–48, 2004.
[7] R. H. Byrd, N. I. M. Gould, J. Nocedal, and R. A. Waltz. On the convergence of successive linear-quadratic programming algorithms. SIAM Journal on Optimization, 16(2):471–489, 2006.
[8] R. H. Byrd, M. E. Hribar, and J. Nocedal. An interior point algorithm for large scale nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.
[9] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Mathematical Programming, 96(1):161–177, 2003.
[10] A. R. Conn, N. I. M. Gould, D. Orban, and P. L. Toint. A primal-dual trust-region algorithm for non-convex nonlinear programming. Mathematical Programming, 87(2):215–249, 2000.
[11] A. R. Conn, N. I. M. Gould, and P. L. Toint. LANCELOT: a Fortran package for Large-scale Nonlinear Optimization (Release A). Springer Series in Computational Mathematics. Springer Verlag, Heidelberg, Berlin, New York, 1992.
[12] A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-Region Methods. SIAM, Philadelphia, 2000.
[13] R. Fletcher. Second-order corrections for non-differentiable optimization. In G. A. Watson, editor, Proceedings Dundee 1981, Lecture Notes in Mathematics, pages 85–114. Springer Verlag, Heidelberg, Berlin, New York, 1982.
[14] R. Fletcher, N. I. M. Gould, S. Leyffer, P. L. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for nonlinear programming. SIAM Journal on Optimization, 13(3):635–659, 2002.
[15] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91(2):239–269, 2002.
[16] R. Fletcher, S. Leyffer, and P. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report 98/13, Department of Mathematics, University of Namur, Namur, Belgium, 1998.
[17] R. Fletcher and E. Sainz de la Maza. Nonlinear programming and nonsmooth optimization by successive linear programming. Mathematical Programming, 43(3):235–256, 1989.
[18] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal on Optimization, 12(4):979–1006, 2002.
[19] N. I. M. Gould, M. E. Hribar, and J. Nocedal. On the solution of equality constrained quadratic programming problems arising in optimization. SIAM Journal on Scientific Computing, 23(4):1375–1394, 2001.
[20] N. I. M. Gould, S. Lucidi, M. Roma, and P. L. Toint. Solving the trust-region subproblem using the Lanczos method. SIAM Journal on Optimization, 9(2):504–525, 1999.
[21] N. I. M. Gould, D. Orban, and P. L. Toint. GALAHAD, a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization. ACM Transactions on Mathematical Software, 29(4):353–372, 2003.
[22] N. I. M. Gould, D. Orban, and P. L. Toint. Numerical methods for large-scale nonlinear optimization. Acta Numerica, 14:299–361, 2005.
[23] N. I. M. Gould and P. L. Toint. SQP methods for large-scale nonlinear programming. In M. J. D. Powell and S. Scholtes, editors, System Modelling and Optimization, Methods, Theory and Applications, pages 149–178. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[24] N. I. M. Gould and P. L. Toint. An iterative working-set method for large-scale non-convex quadratic programming. Applied Numerical Mathematics, 43(1–2):109–128, 2002.
[25] N. I. M. Gould and P. L. Toint. Numerical methods for large-scale non-convex quadratic programming. In A. H. Siddiqi and M. Kočvara, editors, Trends in Industrial and Applied Mathematics, pages 149–179. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[26] N. Maratos. Exact penalty function algorithms for finite-dimensional and control optimization problems. PhD thesis, University of London, London, England, 1978.
[27] B. A. Murtagh and M. A. Saunders. A projected Lagrangian algorithm and its implementation for sparse nonlinear constraints. Mathematical Programming Studies, 16:84–117, 1982.
[28] M. J. D. Powell. A fast algorithm for nonlinearly constrained optimization calculations. In G. A. Watson, editor, Numerical Analysis, Dundee 1977, number 630 in Lecture Notes in Mathematics, pages 144–157. Springer Verlag, Heidelberg, Berlin, New York, 1978.
[29] A. Wächter and L. T. Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, Series A, 106(1):25–57, 2006.

Relevance to Beneficiaries

Large-scale optimization is one of the cornerstones of modern applied mathematics. Successful optimization codes lead to advances in any area that relies on optimization, so the potential impact is significant, both scientifically and for wealth creation. Potential beneficiaries include those engaged in optimization research, and more generally researchers whose applications (science, engineering, planning, finance) rely on optimization. Beneficiaries outside of research include planners, investors, plant operators, designers and medics, to name but a few.

Dissemination and exploitation

Standard methods of dissemination (technical reports, journal papers, seminars, conference talks and proceedings) will be used to convey our results to the wider academic community and beyond. The ultimate goal is the production of new high-quality software, which will be freely available for research and development purposes as part of the GALAHAD library (http://galahad.rl.ac.uk/galahad-www/).
