A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Line Search Algorithm
Renke Kuhlmann · Christof Büskens
Abstract Interior-point methods have been shown to be very efficient for large-scale nonlinear programming. Their combination with penalty methods increases their robustness due to the regularization of the constraints caused by the penalty term. In this paper a primal-dual penalty-interior-point algorithm is proposed that is based on an augmented Lagrangian approach with an ℓ2-exact penalty function. Global convergence is maintained by a combination of a merit function and a filter approach. Unlike other filter methods, no separate feasibility restoration phase is required. The algorithm has been implemented within the solver WORHP to study different penalty and line search options and to compare its numerical performance to two other state-of-the-art nonlinear programming algorithms, the interior-point method IPOPT and the sequential quadratic programming method of WORHP.
Keywords Nonlinear Programming · Constrained Optimization · Augmented Lagrangian · Penalty-Interior-Point Algorithm · Primal-Dual Method
Mathematics Subject Classification (2000) 49M05 · 49M15 · 49M29 · 49M37 · 90C06 · 90C26 · 90C30 · 90C51
Renke Kuhlmann
Optimization and Optimal Control, Center for Industrial Mathematics (ZeTeM), Bibliothekstr. 5, 28359 Bremen, Germany
E-mail: [email protected]

Christof Büskens
E-mail: [email protected]
1 Introduction
In this paper we consider the nonlinear optimization problem
    min_{x ∈ ℝ^n}  f(x)
    s.t.  c(x) = 0                                                  (1.1)
          x ≥ 0
with twice continuously differentiable functions f : Rn → R and c : Rn → Rm, but the methods can easily be extended to the general case with l ≤ x ≤ g and c(x) ≤ 0 (cf. [45]). The widely used and very efficient interior-point strategy (cf. [6,21,34]) handles the inequality constraints by adding a barrier term to the objective function f(x) and solving a sequence of barrier problems
    min_{x ∈ ℝ^n}  ϕ_µ(x) := f(x) − µ Σ_{i=1}^n ln x^(i)
    s.t.  c(x) = 0                                                  (1.2)
with a decreasing barrier parameter µ > 0. In this paper, we consider an algorithm that penalizes both the inequality bound constraints and the nonlinear equality constraints c(x), by a log-barrier term and an augmented Lagrangian term, respectively. However, unlike other augmented Lagrangian methods we do not use a quadratic ℓ2-norm as a measure of the constraint violation, but an exact ℓ2-penalty function (see Chen and Goldfarb [10,11,12]). The resulting unconstrained reformulation is
    min_x  Φ_{µ,λ,ρ,τ}(x) := ρ ( f(x) − µ Σ_{i=1}^n ln x^(i) + λ^T c(x) ) + τ ‖c(x)‖_2    (1.3)

with penalty parameters ρ ≥ 0 and τ > 0, a barrier parameter µ ≥ 0 and Lagrangian multipliers λ ∈ ℝ^m. For improved readability the dependence on ρ, τ and λ is suppressed when clear from the context and we write Φ_µ(x) := Φ_{µ,λ,ρ,τ}(x). The penalty parameter τ controls the size of the multipliers and will be updated until a certain threshold value is reached. The penalty parameter ρ balances the optimization of the Lagrangian function against the constraint violation of problem (1.2). In particular, the algorithm solves a sequence of problems (1.3) with a decreasing penalty parameter ρ until a first-order optimal point of (1.2) is found. However, unlike in penalty-interior-point algorithms with a quadratic penalty function (e.g. Armand et al. [1], Armand and Omheni [2,3] or Yamashita and Yabe [47]), the penalty parameter ρ does not have to converge to zero. A first-order optimal point of (1.2) satisfying the Mangasarian-Fromovitz constraint qualification (MFCQ) is a stationary point of the merit function Φ_µ(x) if ρ is smaller than a certain threshold value or the duals of (1.3) equal ρλ. Using two penalty parameters is mainly motivated by a better accuracy of the implemented algorithm.
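As a concrete numerical illustration, the merit function (1.3) can be evaluated as in the following minimal sketch. The function and argument names are ours and are not part of the implementation in WORHP:

```python
import numpy as np

def merit_phi(f, c, x, lam, mu, rho, tau):
    """Evaluate the merit function (1.3):
    Phi_{mu,lam,rho,tau}(x) = rho*(f(x) - mu*sum_i ln x_i + lam^T c(x)) + tau*||c(x)||_2.
    f and c are callables returning the objective value and the constraint
    vector; x must be strictly positive because of the log-barrier term."""
    cx = c(x)
    barrier = -mu * np.sum(np.log(x))       # log-barrier for the bounds x > 0
    return rho * (f(x) + barrier + lam @ cx) + tau * np.linalg.norm(cx)
```

For ρ = 1 and λ = 0 this reduces to the classical ℓ2-exact penalty-barrier function ϕ_µ(x) + τ‖c(x)‖_2, and at feasible points (c(x) = 0) it reduces to ρϕ_µ(x).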
It is an important feature of optimization algorithms to detect infeasibility of the given problem. In such a case a first-order optimal point of (1.2) does not exist and the penalty parameter ρ will converge to zero resulting in the optimization of
    min_{x ≥ 0}  ‖c(x)‖_2                                           (1.4)
The solution of (1.4) that is infeasible for (1.1) serves as a certificate of infeasibility. The presented algorithm follows the idea of Fletcher [18] and Byrd et al. [8] to place the penalty parameter in front of the objective function or the Lagrangian function, respectively, instead of in front of the measure of constraint violation, for better solver performance on infeasible problems. The proposed algorithm shares the following properties with other primal-dual penalty-interior-point algorithms (e.g. [1,10,15]): the step is a guaranteed descent direction for the merit function Φ_µ(x), and a rank-deficient Jacobian of the constraints at infeasible non-stationary points can be handled without modification of the Newton system. The latter avoids failure of global convergence, for example for the optimization problem in Wächter and Biegler [44]. The augmented Lagrangian methods (e.g. [14,35]) are an extension of the pure (quadratic) ℓ2-penalty function. Recently, primal-dual augmented Lagrangian methods have enjoyed an increased popularity. They have been studied by Armand and Omheni [2,3], Forsgren and Gill [20], Gertz and Gill [22], Gill and Robinson [23] and Goldfarb et al. [25]. These methods can remove the perturbation of the KKT system caused by the penalty term by an appropriate update of the Lagrangian multipliers λ. This makes it unnecessary to calculate a further unperturbed step per iteration as in Chen and Goldfarb [11,12], and naturally leads to a quadratic rate of convergence to first-order optimal points of (1.2) and a superlinear rate in case of the nonlinear program (1.1). Our update of the Lagrangian multipliers λ differs from other augmented Lagrangian based algorithms (e.g. [2,3,13]), as it does not rely on a criterion that measures the reduction of the constraint violation. Instead, it is based on dual information and is designed to be applied as often as possible when approaching the optimal solution.
For step acceptance, instead of following recent research trends to avoid penalties and a filter as in Liu and Yuan [33] or Gould and Toint [30], we combine the two – the merit function and the filter mechanism – as line search criteria, of which at least one has to indicate progress for a trial iterate. Comparable combinations have been proposed by Chen and Goldfarb [12] and Gould et al. [26,27]. The filter, originally introduced by Fletcher and Leyffer [19], significantly increases the flexibility of the step acceptance and, thus, is widely used by nonlinear programming solvers (e.g. [4,9,19,40,45]). Global convergence has been proved for several filter methods and usually depends on a further algorithm phase: the feasibility restoration. Due to the combination with the merit function, a feasibility restoration phase – which we believe to be a drawback of the filter approach – is not necessary for global convergence.
A further advantage is that our filter entries do not depend on parameter choices, e.g. the barrier parameter µ.

Other penalty-interior-point algorithms consider an ℓ1-penalty, see e.g. Benson et al. [5], Boman [7], Curtis [15], Fletcher [18], Tits et al. [39], Gould et al. [29] and Yamashita [46]. Many ℓ1-penalty-interior-point algorithms reformulate the problem into a smooth one using additional elastic variables. However, for large-scale nonlinear programming this can be a disadvantage. Closely related are also the stabilized sequential quadratic programming methods, like the works of Gill and Robinson [24] or Shen et al. [38].

The aim of this paper is to study the convergence properties of the proposed algorithm and its numerical performance. Therefore, we implemented the algorithm within the large-scale nonlinear programming solver WORHP. The paper is organized as follows. In Section 2 we describe the algorithm including the general approach of primal-dual penalty-interior-point algorithms, the step calculation and the line search. The global and local convergence of the presented algorithm are shown in Section 3 and Section 4, respectively. Finally, in Section 5 we perform numerical experiments using the CUTEst test set [28] to show the efficiency of the proposed algorithm and compare it to other solvers, in particular the interior-point method IPOPT [45] and the sequential quadratic programming algorithm of WORHP [9].

Notation Matrices are written in uppercase and vectors in lowercase. The i-th component of a vector x is denoted by x^(i). A diagonal matrix with the entries of a vector x on its diagonal has the same name in uppercase, i.e. X := diag(x). The vector e stands for a vector of all ones with appropriate dimension. The norm ‖·‖ is the Euclidean norm ‖·‖_2 unless stated differently, e.g. ‖·‖_∞ is the maximum norm.
The notation In(X) = (λ_+, λ_−, λ_0) stands for the inertia of a matrix X; in particular, λ_+, λ_− and λ_0 are the numbers of positive, negative and zero eigenvalues, respectively. We will denote the gradient of a function h_1 : ℝ^n → ℝ at a point x_0 by ∇h_1(x_0) ∈ ℝ^n, the Jacobian of a function h_2 : ℝ^n → ℝ^m by ∇h_2(x_0) ∈ ℝ^{n×m} and the subdifferential of h_1(x) at x_0 by ∂h_1(x_0).
2 Algorithm Description
2.1 The Primal-Dual Penalty-Interior-Point Approach
The first-order optimality conditions of problem (1.2) are
    ∇f(x) + ∇c(x)λ − ν = 0                                          (2.1a)
    c(x) = 0                                                        (2.1b)
    Xν − µe = 0,                                                    (2.1c)

where λ ∈ ℝ^m and ν ∈ ℝ^n correspond to the Lagrangian multipliers of the nonlinear equality constraints and the inequality bound constraints, respectively. In the case of µ = 0, the conditions (2.1) are the optimality conditions
of (1.1) if x ≥ 0, ν ≥ 0 are added. It is well established to consider (2.1) as a homotopy method with µ → 0 for finding an optimal solution of (1.1). To derive the first-order optimality conditions for problem (1.3), we consider the generic function
    Φ(x) := ρϕ(x) + h(c(x))

with c as above, a smooth function ϕ : ℝ^n → ℝ and a non-smooth but convex function h : ℝ^m → ℝ. The first-order necessary optimality condition is that if a point x minimizes Φ(x), then there exists y ∈ ∂h(c(x)) such that
    ρ∇ϕ(x) + ∇c(x)y = 0,

see Fletcher [17]. In case of problem (1.3), we have h(c) = ρλ^T c + τ‖c‖ and, thus,

    ∂h(c) = { ρλ + τ c/‖c‖ }                        if c ≠ 0,        (2.2)
            { ρλ + τ g | g ∈ ℝ^m, ‖g‖ ≤ 1 }        if c = 0.

We can transform the condition y ∈ ∂h(c(x)) to c(x) − τ^{-1}‖c(x)‖(y − ρλ) = 0 together with ‖y − ρλ‖ ≤ τ and retrieve the first-order optimality conditions for problem (1.3):
    ρ∇f(x) + ∇c(x)y − z = 0                                         (2.3a)
    c(x) − τ^{-1}‖c(x)‖(y − ρλ) = 0                                  (2.3b)
    ‖y − ρλ‖ ≤ τ                                                    (2.3c)
    Xz − ρµe = 0                                                    (2.3d)
The optimality conditions (2.3) can be interpreted as scaled and perturbed optimality conditions (2.1) of the barrier problem (1.2) with a scaling factor ρ and a perturbation τ^{-1}‖c(x)‖(y − ρλ) whose norm is smaller than or equal to ‖c(x)‖ by (2.3c). This perturbation vanishes if either the constraint violation ‖c(x)‖ is zero or the duals are chosen to be y = ρλ. This leads to the following propositions that formally state the relation of first-order optimal points of problems (1.2) and (1.3), similarly to [15].
Proposition 2.1 Let τ > 0, ρ > 0, µ > 0 and let (x̄, ȳ, z̄) be a first-order optimal point of problem (1.3), i.e. equations (2.3) hold. If ‖c(x̄)‖ = 0 and (λ̄, ν̄) = (ȳ/ρ, z̄/ρ), then (x̄, λ̄, ν̄) is first-order optimal for problem (1.2).
Proposition 2.2 Let µ > 0. If (x̄, λ̄, ν̄) is a first-order optimal point of (1.2), then for all penalty parameters ρ > 0 and τ > 0, the point (x̄, ȳ, z̄) with (ȳ, z̄) = (ρλ̄, ρν̄) is first-order optimal for (1.3).
Propositions 2.1 and 2.2 validate our approach of optimizing problem (1.3) with appropriate choices of the penalty parameters ρ and τ and Lagrangian multipliers λ for finding a first-order optimal point of the barrier problem (1.2). This in turn yields a solution of (1.1) for µ → 0. A difference
to other optimization algorithms is that our primal-dual algorithm works with the possibly scaled multipliers y instead of the multipliers λ of problem (1.2). For the penalization of the constraints there are two options with different properties: increasing τ or decreasing ρ. While ρ scales the multipliers y themselves, τ scales their distance to the multipliers λ, see (2.3a) and (2.3c). If a strong penalization of the constraints is needed, i.e. τ is very large or ρ is very small, both approaches have a disadvantage. On the one hand, if the problem (1.1) is infeasible, letting τ tend to infinity would lead to divergence of the multipliers y (cf. Chen and Goldfarb [10]), which can be harmful in practical implementations. On the other hand, a very small ρ can cause difficulties in finding the optimal solution with respect to a given tolerance due to the scaling (cf. Curtis [15]). That is why we propose the combination of both. Altogether, the parameters ρ, τ and λ form a dual trust-region algorithm.
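The role of (2.3) as a scaled and perturbed KKT system can be made concrete with a small residual check. The following sketch (function name and signature are ours) evaluates the residuals of (2.3a), (2.3b) and (2.3d) together with the dual trust-region test (2.3c) at a given primal-dual point:

```python
import numpy as np

def residuals_23(grad_f, jac_c, c_val, x, y, z, lam, mu, rho, tau):
    """Residuals of the first-order optimality conditions (2.3).
    grad_f: gradient of f at x, shape (n,);
    jac_c : Jacobian nabla c(x), shape (n, m) (columns are constraint gradients);
    c_val : c(x), shape (m,)."""
    r_dual = rho * grad_f + jac_c @ y - z                             # (2.3a)
    r_feas = c_val - (np.linalg.norm(c_val) / tau) * (y - rho * lam)  # (2.3b)
    r_comp = x * z - rho * mu                                         # (2.3d)
    in_trust_region = np.linalg.norm(y - rho * lam) <= tau            # (2.3c)
    return r_dual, r_feas, r_comp, in_trust_region
```

At a feasible point with y = ρλ the perturbation in (2.3b) vanishes and the residuals coincide with the ρ-scaled barrier KKT residuals of (1.2), as stated in Propositions 2.1 and 2.2.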
2.2 Step Computation
Instead of applying Newton’s method directly to the optimality conditions (2.3) at an iterate (xk, yk), we first rewrite the feasibility condition (2.3b) to
    c(x_k) + σ_k(ρ_kλ_k − y_k) = 0

and fix σ_k = ‖c(x_k)‖/τ_k throughout iteration k. The dual trust-region condition (2.3c) is omitted for the step computation. For a given penalty parameter ρ_k, the Newton iteration then yields the linear system

    [ H_k         ∇c(x_k)   −I   ] [ ∆x_k ]       [ ρ_k∇f(x_k) + ∇c(x_k)y_k − z_k ]
    [ ∇c(x_k)^T   −σ_k I     0   ] [ ∆y_k ]  = −  [ c(x_k) + σ_k(ρ_kλ_k − y_k)    ],       (2.4)
    [ Z_k          0         X_k ] [ ∆z_k ]       [ X_k z_k − µρ_k e              ]
where H_k := ρ_k∇²_{xx}f(x_k) + Σ_{i=1}^m y_k^(i) ∇²_{xx}c^(i)(x_k) is the Hessian of the Lagrangian function with respect to the multipliers y_k, or an approximation to it. In case of ‖c(x_k)‖ > 0 this linear equation system is equivalent to that of a primal-dual augmented Lagrangian method with a quadratic ℓ2-penalty function (cf. Armand and Omheni [3]) and penalty parameter adaptively set to σ_k = ‖c(x_k)‖/τ_k. Because the iterates x_k are kept strictly feasible throughout the optimization, we can eliminate the last equation of the Newton system (2.4) and solve the smaller linear equation system

        [ ∆x_k ]       [ ρ_k∇f(x_k) + ∇c(x_k)y_k − µρ_k X_k^{-1} e ]
    M_k [      ]  = −  [                                           ]                       (2.5a)
        [ ∆y_k ]       [ c(x_k) + σ_k(ρ_kλ_k − y_k)                ]

    M_k := [ H_k + X_k^{-1}Z_k   ∇c(x_k) ]
           [ ∇c(x_k)^T          −σ_k I   ]                                                 (2.5b)

    ∆z_k = µρ_k X_k^{-1} e − z_k − X_k^{-1}Z_k ∆x_k.                                       (2.5c)

Greif et al. [31] investigate eigenvalue bounds for the two matrices of (2.4) and (2.5) if the Hessian H_k is regularized or σ_k is constantly zero and conclude
that (2.4) is better conditioned when µ becomes very small. However, in the context of practical implementations for large-scale programming, the system (2.5) is generally preferred (cf. [41,45]). The next result shows that, provided the Hessian matrix is modified appropriately, the step computed by (2.4) will always yield a descent direction for the merit function Φ_µ.
Proposition 2.3 Let τ_k > 0, ρ_k ≥ 0, µ ≥ 0 and let (∆x_k, ∆y_k) be a solution of the linear system (2.5). Then,
    ∇Φ_µ(x_k)^T ∆x_k = ρ_k∇ϕ_µ(x_k)^T ∆x_k − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖
                       + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k)

                     = { −∆x_k^T (H_k + X_k^{-1}Z_k) ∆x_k,                                 if ‖c(x_k)‖ = 0
                       { −∆x_k^T (H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T) ∆x_k,      if ‖c(x_k)‖ > 0.
Furthermore, if the inertia of (2.5) satisfies In(M_k) = (n, m, 0) and the optimality conditions (2.3) are not satisfied at the current iterate (x_k, y_k, z_k), then (∆x_k, ∆y_k) is a descent direction for the merit function Φ_µ at x_k, i.e. ∇Φ_µ(x_k)^T ∆x_k < 0.

Proof The proof is similar to that of Lemma 3.2 in [10] but is extended here to the case of λ not being constantly zero. We split the proof into the two cases ‖c(x_k)‖ > 0 and ‖c(x_k)‖ = 0:
Case ‖c(x_k)‖ > 0: Then we have

    ∇Φ_µ(x_k)^T ∆x_k
      = ρ_k (∇f(x_k) − µX_k^{-1}e + ∇c(x_k)λ_k)^T ∆x_k + σ_k^{-1} c(x_k)^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T (σ_k (y_k + ∆y_k − ρ_kλ_k))
        − σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T c(x_k)
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k)
        − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖,

where the third equality follows by applying the second equation of (2.5a). This proves the first equation of the Proposition. In addition, we also have
    ∇Φ_µ(x_k)^T ∆x_k
      = ρ_k (∇f(x_k) − µX_k^{-1}e + ∇c(x_k)λ_k)^T ∆x_k + σ_k^{-1} c(x_k)^T ∇c(x_k)^T ∆x_k
      = −∆x_k^T (H_k + X_k^{-1}Z_k) ∆x_k
        − (y_k + ∆y_k − ρ_kλ_k − σ_k^{-1} c(x_k))^T ∇c(x_k)^T ∆x_k
      = −∆x_k^T (H_k + X_k^{-1}Z_k) ∆x_k − σ_k^{-1} ∆x_k^T ∇c(x_k)∇c(x_k)^T ∆x_k,
where the second equality follows from the first and the third equality from the second equation of (2.5a).

Case ‖c(x_k)‖ = 0: Then we have σ_k = 0 and ∇c(x_k)^T ∆x_k = 0 from the second equation of (2.5a) and, thus,
    lim_{t↓0} (‖c(x_k + t∆x_k)‖ − ‖c(x_k)‖)/t
      = lim_{t↓0} ( Σ_{i=1}^m ((c^(i)(x_k + t∆x_k) − c^(i)(x_k))/t)² )^{1/2}
      = ‖∇c(x_k)^T ∆x_k‖ = 0.
Using this together with the definition of the directional derivative and the fact that c(x_k) = 0, σ_k = 0 and, again, ∇c(x_k)^T ∆x_k = 0 yields
    ∇Φ_µ(x_k)^T ∆x_k
      = lim_{t↓0} [ ρ_k (ϕ_µ(x_k + t∆x_k) − ϕ_µ(x_k))/t + ρ_k λ_k^T (c(x_k + t∆x_k) − c(x_k))/t
                    + τ_k (‖c(x_k + t∆x_k)‖ − ‖c(x_k)‖)/t ]
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + ρ_kλ_k^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖
        + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k).
This proves the first equation of the Proposition. Furthermore, using the first equation of (2.5a) we get
    ∇Φ_µ(x_k)^T ∆x_k = ρ_k∇ϕ_µ(x_k)^T ∆x_k
      = −∆x_k^T (H_k + X_k^{-1}Z_k) ∆x_k − (y_k + ∆y_k)^T ∇c(x_k)^T ∆x_k
      = −∆x_k^T (H_k + X_k^{-1}Z_k) ∆x_k.
Combining the two cases, the two equations of the Proposition have been proven. Using Lemma 3.1 of [10], In(M_k) = (n, m, 0) yields the positive definiteness of H_k + X_k^{-1}Z_k or H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T for ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. If the optimality conditions (2.3) are not satisfied, the step (∆x_k, ∆y_k) is not zero. Thus, we have ∇Φ_µ(x_k)^T ∆x_k < 0. □
Proposition 2.3 states that the step calculated by (2.5) is a descent direction if the inertia of (2.5) satisfies In(M_k) = (n, m, 0). If this is not the case, it can be achieved by regularizing the Hessian H_k + X_k^{-1}Z_k, i.e. adding a multiple of the identity to H_k + X_k^{-1}Z_k until In(M_k) = (n, m, 0) holds (cf. [10,42,45]).
Adding a further vI with a small v > 0 then guarantees the conditions (cf. [10, Lemma 3.1])
    −∆x_k^T H_k ∆x_k ≤ −v ‖∆x_k‖²,  or                                                     (2.7a)
    −∆x_k^T (H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T) ∆x_k ≤ −v ‖∆x_k‖²,              (2.7b)

in case of ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. This strategy can also be interpreted as a proximal algorithm or a primal trust-region algorithm (cf. Parikh and Boyd [37]). It can only fail for this algorithm if the current iterate is feasible and the MFCQ fails to hold. In other words, this algorithm has the important property of handling problems with rank-deficient Jacobians ∇c(x_k) at infeasible non-stationary points due to the automatic dual regularization if σ_k > 0, i.e. the term −σ_kI in the (2,2)-block of M_k. The linear equation system (2.5) reveals another important feature of penalty-interior-point algorithms that rely on an augmented Lagrangian approach: choosing λ_k = y_k/ρ_k reduces the system (2.5) to a regularized Newton method applied to the optimality conditions of the barrier problem (1.2), which is relevant for the fast local convergence.
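The step computation of this subsection, i.e. the condensed system (2.5) combined with an inertia correction enforcing In(M_k) = (n, m, 0), can be sketched as follows. This is a dense illustrative sketch in which all names are ours: the inertia is read off the eigenvalues rather than obtained from a symmetric indefinite LDL^T factorization as a practical large-scale implementation would do.

```python
import numpy as np

def newton_step(H, A, x, z, c_val, grad_f, y, lam, mu, rho, tau, v=1e-8):
    """Solve the condensed system (2.5), adding delta*I to H_k + X_k^{-1}Z_k
    until the inertia of M_k is (n, m, 0), then recover dz via (2.5c)."""
    n, m = A.shape
    sigma = np.linalg.norm(c_val) / tau            # sigma_k = ||c(x_k)|| / tau_k
    W = H + np.diag(z / x)                         # H_k + X_k^{-1} Z_k
    rhs = -np.concatenate([
        rho * grad_f + A @ y - mu * rho / x,       # first block of (2.5a)
        c_val + sigma * (rho * lam - y),           # second block of (2.5a)
    ])
    delta = 0.0
    while True:
        M = np.block([[W + delta * np.eye(n), A],
                      [A.T, -sigma * np.eye(m)]])
        ev = np.linalg.eigvalsh(M)
        if np.sum(ev > 0) == n and np.sum(ev < 0) == m:  # In(M_k) = (n, m, 0)
            break
        delta = max(10.0 * delta, v)               # increase primal regularization
    sol = np.linalg.solve(M, rhs)
    dx, dy = sol[:n], sol[n:]
    dz = mu * rho / x - z - (z / x) * dx           # back-substitution (2.5c)
    return dx, dy, dz
```

Consistent with the discussion above, the regularization loop can only fail to terminate when σ_k = 0 (a feasible iterate) and the Jacobian ∇c(x_k) is rank deficient; for σ_k > 0 the −σ_kI block acts as an automatic dual regularization.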
2.3 Computation of the Step Sizes
After the step computation a step size α_k^x ∈ (0, α_max^x] with α_max^x ∈ (0, 1] has to be determined to update the primal iterates by
    x_{k+1} ← x_k + α_k^x ∆x_k.                                     (2.8)
The step size has to guarantee that the iterate xk+1 remains strictly positive, which is done by a fraction-to-the-boundary rule with a parameter sequence {ηk} with ηk ∈ (0, 1) and ηk → 1:
    α_max^x := max {α ∈ (0, 1] | x_k + α∆x_k ≥ (1 − η_k)x_k}.       (2.9)

To measure progress towards the optimal solution, two different approaches are combined: a filter and a merit function. Checking a reduction in the merit function Φ_µ is a straightforward criterion for penalty-interior-point algorithms. In particular, for a trial iterate x_k + α_k^x ∆x_k we check the Armijo condition

    Φ_µ(x_k + α_k^x ∆x_k) − Φ_µ(x_k) ≤ ω α_k^x ∇Φ_µ(x_k)^T ∆x_k.    (2.10)
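A minimal sketch of the two ingredients introduced so far, the fraction-to-the-boundary rule (2.9) and a backtracking search on the Armijo condition (2.10), might look as follows. The filter test of the full algorithm is omitted here, and all names are ours:

```python
import numpy as np

def armijo_backtracking(phi, x, dx, dphi0, eta_k, omega=1e-4, shrink=0.5, max_iter=50):
    """Backtracking line search: start from the fraction-to-the-boundary
    step size (2.9) and shrink until the Armijo condition (2.10) holds.
    phi evaluates Phi_mu at a point; dphi0 is the directional derivative
    grad Phi_mu(x_k)^T dx_k, which must be negative (descent direction)."""
    # (2.9): largest alpha in (0, 1] with x + alpha*dx >= (1 - eta_k)*x;
    # only components with dx_i < 0 restrict the step (x is strictly positive)
    neg = dx < 0
    alpha = min(1.0, float(np.min(-eta_k * x[neg] / dx[neg]))) if np.any(neg) else 1.0
    phi0 = phi(x)
    for _ in range(max_iter):
        if phi(x + alpha * dx) - phi0 <= omega * alpha * dphi0:   # (2.10)
            return alpha
        alpha *= shrink                                           # backtrack
    return None  # no acceptable step size found within max_iter trials
```

In the full algorithm a trial step size is accepted if either this Armijo test or the filter test indicates progress, as described above.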