Algorithms for Large-Scale Nonlinearly Constrained Optimization
Case for Support: Description of Research and Context

Purpose

The solution of large-scale nonlinear optimization (minimization or maximization) problems lies at the heart of scientific computation. Structures take up positions of minimal constrained potential energy, investors aim to maximize profit while controlling risk, public utilities run transmission networks to satisfy demand at least cost, and pharmaceutical companies desire minimal drug doses to target pathogens. All of these problems are large, either because the mathematical model involves many parameters or because they are actually finite discretisations of some continuous problem for which the variables are functions.

The purpose of this grant application is to support the design, analysis and development of new algorithms for nonlinear optimization that are particularly aimed at the large-scale case. A post-doctoral researcher will be employed to perform a significant share of the design, analysis and implementation. The resulting software will be freely available, for research purposes, as part of the GALAHAD library [21].

Background

The design of iterative algorithms for constrained nonlinear optimization has been a vibrant research area for many years. The earliest methods were based on simple (penalty or barrier) transformations of the constrained problem to create one or more artificial unconstrained ones. This has particular advantages when the number of variables is large, since significant advances in large-scale unconstrained optimization, particularly in the use of truncated conjugate-gradient methods, were made during the 1980s. The first successful methods were arguably augmented Lagrangian methods, also based on sequential unconstrained minimization, which aimed to overcome the perceived ill-conditioning in the earlier approaches. Certainly both MINOS [27] and LANCELOT [11] are augmented-Lagrangian based packages.
Fully-fledged constrained optimization approaches date back to the development of sequential linear (SLP) and quadratic programming (SQP) methods, and have been applied to small-scale problems since the late 1960s. These were the first methods in which separate approximations to the objective function and constraints were treated explicitly, and it has been argued that in its basic form SQP is simply Newton's method for constrained optimization. The names SLP/SQP arise since a sequence of linear (LP) or quadratic (QP) programs (specifically, the minimization of a first- or second-order Taylor-series approximation to the Lagrangian function subject to linearisations of the constraints) is solved. In practice, SQP methods have proved remarkably hard to generalise to the truly large-scale case, and to date only the SNOPT [18] and FilterSQP [15] packages are based on the SQP paradigm. The main issue is that in the small-scale case the Hessian of the Lagrangian is approximated by a so-called positive-definite quasi-Newton formula; the resulting QP is thus strictly convex, and has but one minimizer. By contrast, in the large-scale case it is often too expensive to maintain a positive-definite Hessian approximation, but the alternative of using exact second derivatives causes difficulties, since then the QP is often non-convex and may have undesirable local minimizers.

The bulk of the development in constrained optimization over the past 15 years has been for interior-point methods. In these, an alternative Newton model based on perturbed optimality conditions is used to compute improved iterates within the interior of the feasible region, while at the same time the perturbation is reduced to zero to encourage convergence. Crucially, there is coherence between the way the step is chosen and the logarithmic barrier function used as a means to assess whether to accept or retract the step.
The successful packages KNITRO [8], IPOPT [29] and LOQO [1] are all interior-point based. However, when it comes to equality constraints (which have no interiors), the strategies used appear somewhat ad hoc.

Programme and methodology

In this proposal, we intend to revisit the SQP approach, particularly, but not exclusively, to derive new convergence strategies and to investigate better ways of dealing with equality constraints. We will consider three approaches: (i) an SQP method in which extra constraints are imposed on the subproblem to ensure that the step gives local improvement, (ii) a regularized SLP-EQP method in which the choice of which constraints are imposed when finding the step is delegated to a subsidiary but simpler problem, and (iii) an approach in which reduction of the constraint violation and of the objective function are controlled separately without using a nonlinear programming “filter”. Our aim will be, in each case, to design and analyse suitable algorithms, and to produce high-quality, publicly available software as part of the GALAHAD library. We now give technical details of the three approaches we will consider.

Approach 1. Given an estimate xk of the solution to the continuous optimization problem

  minimize_{x ∈ ℝⁿ}  f(x)   subject to   c(x) ≥ 0,

the basic SQP approach determines a correction sk by solving the quadratic program

  minimize_{s ∈ ℝⁿ}  sᵀgk + ½ sᵀHk s   subject to   Jk s + ck ≥ 0,     (1)

where gk = ∇x f(xk), ck = c(xk), Jk = ∇x c(xk), and Hk is a symmetric approximation to the Hessian of the Lagrangian function (for simplicity we refer only to the inequality problem here, but in practice equality constraints will result in linearised equalities in the QP subproblem). Both line-search and trust-region globalisations are possible; in the former xk+1 = xk + αk sk for some suitable αk ∈ (0, 1], while in the latter an extra constraint ‖s‖∞ ≤ ∆k is imposed on the QP subproblem for some suitably small radius ∆k, and xk+1 = xk + sk.
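As a concrete illustration of the basic step computation, the following sketch forms the data of the QP subproblem (1) for a two-variable toy problem and hands it to SciPy's general-purpose SLSQP solver. The problem data, and the use of SLSQP rather than a dedicated QP code such as those in GALAHAD, are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance: f(x) = x1^2 + x2^2, one constraint c(x) = x1 + x2 - 1 >= 0.
# At xk = 0 we assemble the data of subproblem (1).
xk = np.zeros(2)
gk = 2.0 * xk                        # gradient of f at xk
ck = np.array([xk[0] + xk[1] - 1.0]) # constraint value at xk
Jk = np.array([[1.0, 1.0]])          # constraint Jacobian at xk
Hk = 2.0 * np.eye(2)                 # Hessian of the Lagrangian (exact here)

# Solve the QP (1): minimize s^T gk + 1/2 s^T Hk s  s.t.  Jk s + ck >= 0.
qp = minimize(
    lambda s: gk @ s + 0.5 * s @ Hk @ s,
    np.zeros(2),
    jac=lambda s: gk + Hk @ s,
    constraints=[{'type': 'ineq',            # SLSQP convention: fun(s) >= 0
                  'fun': lambda s: Jk @ s + ck,
                  'jac': lambda s: Jk}],
    method='SLSQP')

sk = qp.x
x_next = xk + sk                     # full step, i.e. alpha_k = 1
```

Since f is quadratic and c is linear here, a single full step lands on the constrained minimizer (0.5, 0.5), illustrating the rapid local convergence the text describes.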
Traditionally, globalisation is achieved by requiring that xk+1 significantly reduces an appropriate penalty/merit function such as φ(x, σ) = f(x) + σ v(x), where v(x) = ‖min(0, c(x))‖ and σ is a sufficiently positive parameter [28]. More recently, “filter” methods [14, 15] achieve the same effect by insisting that the pair (f(xk+1), v(xk+1)) is not dominated in a Pareto sense by certain previous pairs (f(xi), v(xi)), i ≤ k.

The basic SQP method has two significant advantages, namely that the QP subproblem ultimately predicts which of the inequality constraints are active (and thus which may be discarded), and that, once this happens, the method converges very rapidly if Hk is suitably chosen [5, 23]. However, it also suffers from a number of defects. Firstly, there may be no feasible point for the QP subproblem, especially if a trust region with small radius is imposed. This may be overcome by replacing the QP by the related penalty-QP problem

  minimize_{s ∈ ℝⁿ}  sᵀgk + ½ sᵀHk s + σ ‖min(0, Jk s + ck)‖,

or by entering a “restoration phase” whose aim is simply to move closer to feasibility by discarding the objective. Secondly, the merit function or filter may reject the full SQP step, and thus destroy rapid convergence [26]. This may be avoided, at additional cost, by adding a “second-order” correction to the SQP step [13]. Finally, and most seriously, when Hk is indefinite the QP may have many local minimizers, some of which are inappropriate for the merit function or filter; finding the global minimizer is simply too hard, and indeed the global minimizer may itself be inappropriate [22]. One way of avoiding this latter difficulty is to impose an extra constraint on the subproblem (1) to ensure that the step is a descent direction for the merit function.
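The Pareto-dominance test at the heart of a filter can be sketched in a few lines. The names `dominated` and `try_accept` are hypothetical, and the small margin (envelope) that practical filter methods of [14, 15] add around each stored pair to guarantee convergence is deliberately omitted.

```python
# A filter stores pairs (f, v) of objective value and constraint violation.
# A trial pair is rejected if some stored pair is at least as good in BOTH
# measures; otherwise it is accepted and entries it dominates are pruned.

def dominated(pair, filter_pairs):
    """True if `pair` = (f, v) is dominated in the Pareto sense."""
    f_new, v_new = pair
    return any(fi <= f_new and vi <= v_new for (fi, vi) in filter_pairs)

def try_accept(pair, filter_pairs):
    """Accept the trial pair iff it is not dominated; on acceptance, add it
    to the filter (in place) and prune entries it now dominates."""
    if dominated(pair, filter_pairs):
        return False
    f_new, v_new = pair
    filter_pairs[:] = [(fi, vi) for (fi, vi) in filter_pairs
                       if not (f_new <= fi and v_new <= vi)] + [pair]
    return True
```

For example, against a filter holding (1.0, 0.5), the pair (0.9, 0.6) is acceptable (a genuine trade-off: better objective, worse violation), while (1.1, 0.7) is rejected outright.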
For example, if ei denotes the ith column of the identity matrix and Vk is the set of indices of constraints that are violated or active at xk, and we consider the subproblem

  minimize_{s ∈ ℝⁿ}  sᵀgk + ½ sᵀHk s   subject to   Jk s + ck ≥ 0   and   (gk + σ Σ_{i ∈ Vk} Jkᵀ ei)ᵀ s ≤ −ωk,     (2)

for some appropriately small ωk > 0, then any solution of the subproblem will be a descent direction for φ(x, σ) at xk. Of course, there is no a priori guarantee that the constraint set for (2) is consistent, and it may instead be better to consider the penalty-QP version of the subproblem

  minimize_{s ∈ ℝⁿ}  sᵀgk + ½ sᵀHk s + σ ‖min(0, Jk s + ck)‖   subject to   (gk + σ Σ_{i ∈ Vk} Jkᵀ ei)ᵀ s ≤ −ωk,     (3)

for which there will always be a solution. It is even possible to impose a trust region upon (2) or (3), and in the latter case the additional constraint set will remain feasible provided either the radius is sufficiently large or ωk is sufficiently small. Nevertheless, there remain a number of interesting issues; in particular, it is unclear what effect the additional constraint might have on the ultimate speed of convergence.

The intention is to implement these ideas as part of GALAHAD, using the library's pre-existing QP solvers [10, 24, 25] as crucial sub-programs. This part of the research programme will be performed in conjunction with Professors Philippe Toint (Namur) and Dominique Orban (Montréal). Both have considerable experience in the field of large-scale nonlinear optimization, and have been part of the GALAHAD team since its inception. Collaboration will be, as always, primarily by email.

Approach 2. Sequential linear programming approaches differ from their SQP counterparts (1) in that the simpler linear programming subproblem

  minimize_{s ∈ ℝⁿ}  sᵀgk   subject to   Jk s + ck ≥ 0   and   ‖s‖∞ ≤ ∆k     (4)

is solved; the trust-region constraint is imposed to guarantee a finite solution.
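A minimal sketch of the LP subproblem (4) on toy data follows, with SciPy's `linprog` standing in for a production LP solver; the data, the variable names, and the active-set extraction at the end are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance of (4): minimize gk^T s  s.t.  Jk s + ck >= 0, |s|_inf <= Delta.
gk = np.array([1.0, 2.0])        # objective gradient at xk
Jk = np.array([[1.0, 1.0]])      # constraint Jacobian at xk
ck = np.array([-1.0])            # constraint value at xk (violated: ck < 0)
Delta = 2.0                      # trust-region radius

res = linprog(gk,
              A_ub=-Jk, b_ub=ck,             # Jk s + ck >= 0  <=>  -Jk s <= ck
              bounds=[(-Delta, Delta)] * 2,  # trust region |s|_inf <= Delta
              method='highs')
sk = res.x                       # LP step: here the vertex (2, -1)

# The optimal active set A_k to pass to the EQP phase: general constraints
# satisfied with equality at the LP solution.
Ak = np.where(np.abs(Jk @ sk + ck) < 1e-8)[0]
```

Note that the ∞-norm trust region keeps the subproblem an LP (it adds only simple bounds), which is precisely why (4) remains cheap to solve at scale.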
Although SLP methods are almost inevitably slowly convergent (and thus impractical), they share with SQP the ability to predict which constraints are ultimately active, so long as the radius ∆k stays sufficiently large. This then naturally leads to combined SLP-EQP approaches in which the optimal active set Ak for (4) is simply used for prediction, and the step sk is actually chosen as the minimizer of the equality-constrained quadratic program (EQP)

  minimize_{s ∈ ℝⁿ}  sᵀgk + ½ sᵀHk s   subject to   J_Ak s + c_Ak = 0,     (5)

where the subscript Ak indicates the rows of Jk and ck indexed by Ak.
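Given the active set Ak, the EQP (5) reduces to a single symmetric indefinite (KKT) linear solve. The sketch below uses a dense NumPy factorisation on toy data; a genuinely large-scale code would of course use a sparse or iterative KKT solver instead, and all names here are illustrative rather than GALAHAD routines.

```python
import numpy as np

# Solve the EQP (5) through its KKT system
#     [ Hk     J_Ak^T ] [ s ]   [ -gk    ]
#     [ J_Ak   0      ] [ y ] = [ -c_Ak ] ,
# where y = -lambda holds the (negated) Lagrange multipliers.

def eqp_step(gk, Hk, Jk, ck, Ak):
    JA, cA = Jk[Ak], ck[Ak]                 # rows of Jk and ck indexed by Ak
    n, m = Hk.shape[0], JA.shape[0]
    K = np.block([[Hk, JA.T], [JA, np.zeros((m, m))]])
    rhs = np.concatenate([-gk, -cA])
    return np.linalg.solve(K, rhs)[:n]      # discard the multiplier block

# Toy data: f(x) = x1^2 + x2^2 at xk = 0, active constraint x1 + x2 >= 1.
gk = np.zeros(2)
Hk = 2.0 * np.eye(2)
Jk = np.array([[1.0, 1.0]])
ck = np.array([-1.0])
sk = eqp_step(gk, Hk, Jk, ck, [0])          # -> array([0.5, 0.5])
```

Because (5) only ever involves equality constraints, the combinatorial difficulty of the inequality QP (1) is delegated entirely to the cheap LP (4), which is the essential attraction of the SLP-EQP paradigm.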