
A Primal-Dual Augmented Lagrangian Penalty-Interior-Point Filter Algorithm

Renke Kuhlmann · Christof Büskens

Received: date / Accepted: date

Abstract Interior-point methods have been shown to be very efficient for large-scale nonlinear programming. The combination with penalty methods increases their robustness due to the regularization of the constraints caused by the penalty term. In this paper a primal-dual penalty-interior-point algorithm is proposed that is based on an augmented Lagrangian approach with an ℓ2-exact penalty function. Global convergence is maintained by a combination of a merit function and a filter approach. Unlike other filter methods, no separate feasibility restoration phase is required. The algorithm has been implemented within the solver WORHP to study different penalty and line search options and to compare its numerical performance to two other state-of-the-art nonlinear programming algorithms, the interior-point method IPOPT and the sequential quadratic programming method of WORHP.

Keywords Nonlinear Programming · Augmented Lagrangian · Penalty-Interior-Point Algorithm · Primal-Dual Method

Mathematics Subject Classification (2000) 49M05 · 49M15 · 49M29 · 49M37 · 90C06 · 90C26 · 90C30 · 90C51

Renke Kuhlmann
Optimization and Optimal Control, Center for Industrial Mathematics (ZeTeM), Bibliothekstr. 5, 28359 Bremen, Germany
E-mail: [email protected]

Christof Büskens
E-mail: [email protected]

1 Introduction

In this paper we consider the nonlinear program

    min_{x ∈ R^n}  f(x)
    s.t.  c(x) = 0                                              (1.1)
          x ≥ 0

with twice continuously differentiable functions f : Rn → R and c : Rn → Rm, but the methods can easily be extended to the general case with l ≤ x ≤ g and c(x) ≤ 0 (cf. [45]). The widely used and very efficient interior-point strategy (cf. [6,21,34]) handles the inequality constraints by adding a barrier term to the objective function f(x) and solving a sequence of barrier problems

    min_{x ∈ R^n}  ϕ_µ(x) := f(x) − µ Σ_{i=1}^n ln x^(i)
    s.t.  c(x) = 0                                              (1.2)

with a decreasing barrier parameter µ > 0. In this paper, we consider an algorithm that penalizes both the inequality box constraints and the nonlinear equality constraints c(x), by a log-barrier term and an augmented Lagrangian term, respectively. However, unlike other augmented Lagrangian methods we do not use a quadratic ℓ2-norm as the measure of constraint violation, but an exact ℓ2-penalty (see Chen and Goldfarb [10,11,12]). The resulting unconstrained reformulation is

    min_x  Φ_{µ,λ,ρ,τ}(x) := ρ ( f(x) − µ Σ_{i=1}^n ln x^(i) + λ^T c(x) ) + τ ‖c(x)‖_2    (1.3)

with penalty parameters ρ ≥ 0 and τ > 0, a barrier parameter µ ≥ 0 and Lagrangian multipliers λ ∈ R^m. For improved readability the dependences on ρ, τ and λ are neglected when clear from the context and we write Φ_µ(x) := Φ_{µ,λ,ρ,τ}(x). The penalty parameter τ controls the size of the multipliers and will be updated until a certain threshold value is reached. The penalty parameter ρ balances the optimization of the Lagrangian function and the constraint violation of problem (1.2). In particular, the algorithm solves a sequence of (1.3) with a decreasing penalty parameter ρ until finding a first-order optimal point of (1.2). However, unlike penalty-interior-point algorithms with a quadratic penalty function (e.g. Armand et al. [1], Armand and Omheni [2,3] or Yamashita and Yabe [47]), the penalty parameter ρ does not have to converge to zero. A first-order optimal point of (1.2) satisfying the Mangasarian-Fromovitz constraint qualification (MFCQ) is a stationary point of the merit function Φ_µ(x) if ρ is smaller than a certain threshold value or the duals of (1.3) equal ρλ. Using two penalty parameters is mainly motivated by a better accuracy of the implemented algorithm.
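For illustration, the merit function (1.3) can be evaluated directly. The following sketch is not part of the paper's implementation; the names `merit`, `f` and `c` are hypothetical, with `f` and `c` assumed to be callables for the objective and the equality constraints:

```python
import numpy as np

def merit(x, f, c, mu, lam, rho, tau):
    # Phi_{mu,lam,rho,tau}(x) = rho*(f(x) - mu*sum(ln x) + lam^T c(x)) + tau*||c(x)||_2
    barrier_lagrangian = f(x) - mu * np.sum(np.log(x)) + lam @ c(x)
    return rho * barrier_lagrangian + tau * np.linalg.norm(c(x))

# Toy problem: f(x) = x1 + x2 with one equality constraint c(x) = x1*x2 - 1.
f = lambda x: x[0] + x[1]
c = lambda x: np.array([x[0] * x[1] - 1.0])
x = np.array([1.0, 1.0])           # feasible point, so the tau-term vanishes
val = merit(x, f, c, mu=0.1, lam=np.zeros(1), rho=1.0, tau=10.0)  # -> 2.0
```

At the feasible point the log terms and the penalty term both vanish, so only ρ·f(x) remains.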

It is an important feature of optimization algorithms to detect infeasibility of the given problem. In such a case, a first-order optimal point of (1.2) does not exist and the penalty parameter ρ will converge to zero, resulting in the optimization of

    min_{x ≥ 0}  ‖c(x)‖_2 .                                     (1.4)

The solution of (1.4) that is infeasible for (1.1) serves as a certificate of infeasibility. The presented algorithm follows the idea of Fletcher [18] and Byrd et al. [8] to place the penalty parameter in front of the objective function or the Lagrangian function, respectively, instead of in front of the measure of constraint violation, for better solver performance on infeasible problems. The proposed algorithm shares the following properties with other primal-dual penalty-interior-point algorithms (e.g. [1,10,15]): The step is a guaranteed descent direction for the merit function Φ_µ(x), and a rank-deficient Jacobian of the constraints at infeasible non-stationary points can be handled without modification of the Newton system. The latter avoids failure of global convergence, for example for the optimization problem in Wächter and Biegler [44]. An extension of the pure (quadratic) ℓ2-penalty function are the augmented Lagrangian methods (e.g. [14,35]). Recently, primal-dual augmented Lagrangian methods have enjoyed an increased popularity. They have been studied by Armand and Omheni [2,3], Forsgren and Gill [20], Gertz and Gill [22], Gill and Robinson [23] and Goldfarb et al. [25]. These methods can remove the perturbation of the KKT system caused by the penalty term by an appropriate update of the Lagrangian multipliers λ. This makes it unnecessary to calculate a further unperturbed step per iteration as in Chen and Goldfarb [11,12], and naturally leads to a quadratic rate of convergence to first-order optimal points of (1.2) and a superlinear rate in case of the nonlinear program (1.1). Our update of the Lagrangian multipliers λ differs from other augmented Lagrangian based algorithms (e.g. [2,3,13]), as it does not rely on a criterion that measures the reduction of the constraint violation. Instead, it is based on dual information and is designed to be applied as often as possible when approaching the optimal solution.
For step acceptance, instead of following recent research trends to avoid penalties and a filter as in Liu and Yuan [33] or Gould and Toint [30], we combine the two – the merit function and the filter mechanism – as line search criteria, of which at least one has to indicate progress for a trial iterate. Comparable combinations have been proposed by Chen and Goldfarb [12] and Gould et al. [26,27]. The filter, originally introduced by Fletcher and Leyffer [19], significantly increases the flexibility of the step acceptance and, thus, is widely used by nonlinear programming solvers (e.g. [4,9,19,40,45]). Global convergence has been proved for several filter methods and usually depends on a further algorithm phase: the feasibility restoration. Due to the combination with the merit function, a feasibility restoration phase – which we believe to be a drawback of the filter approach – is not necessary for global convergence.

A further advantage is that our filter entries do not depend on parameter choices, e.g. the barrier parameter µ.

Other penalty-interior-point algorithms consider an ℓ1-penalty, see e.g. Benson et al. [5], Boman [7], Curtis [15], Fletcher [18], Tits et al. [39], Gould et al. [29] and Yamashita [46]. Many ℓ1-penalty-interior-point algorithms reformulate the problem into a smooth one using additional elastic variables. However, for large-scale nonlinear programming this can be a disadvantage. Closely related are also the stabilized sequential quadratic programming methods, like the works of Gill and Robinson [24] or Shen et al. [38].

The aim of this paper is to study the convergence properties of the proposed algorithm and its numerical performance. Therefore, we implemented the algorithm within the large-scale nonlinear programming solver WORHP. The paper is organized as follows. In Section 2 we describe the algorithm including the general approach of primal-dual penalty-interior-point algorithms, the step calculation and the line search. The global and local convergence of the presented algorithm are shown in Section 3 and Section 4, respectively. Finally, in Section 5 we perform numerical experiments using the CUTEst test set [28] to show the efficiency of the proposed algorithm and compare it to other solvers, in particular the interior-point method IPOPT [45] and the sequential quadratic programming algorithm of WORHP [9].

Notation Matrices are written in uppercase and vectors in lowercase. The i-th component of a vector x is denoted by x^(i). A diagonal matrix with the entries of a vector x on its diagonal has the same name in uppercase, i.e. X := diag(x). The vector e stands for a vector of all ones with appropriate dimension. The norm ‖·‖ is the Euclidean norm ‖·‖_2 unless stated differently, e.g. ‖·‖_∞ is the maximum norm.
The notation In(X) = (λ+, λ−, λ0) stands for the inertia of a matrix X; in particular, (λ+, λ−, λ0) are the numbers of positive, negative and zero eigenvalues, respectively. We will denote the gradient of a function h1 : R^n → R at the point x0 as ∇h1(x0) ∈ R^n, the Jacobian of a function h2 : R^n → R^m as ∇h2(x0) ∈ R^{n×m} and the subdifferential of h1(x) at x0 as ∂h1(x0).

2 Algorithm Description

2.1 The Primal-Dual Penalty-Interior-Point Approach

The first-order optimality conditions of problem (1.2) are

    ∇f(x) + ∇c(x)λ − ν = 0                                      (2.1a)
    c(x) = 0                                                    (2.1b)
    Xν − µe = 0,                                                (2.1c)

where λ ∈ R^m and ν ∈ R^n correspond to the Lagrangian multipliers of the nonlinear equality constraints and the inequality bound constraints, respectively. In the case of µ = 0, the conditions (2.1) are the optimality conditions

of (1.1) if x ≥ 0, ν ≥ 0 are added. It is well-established to consider (2.1) as a homotopy method with µ → 0 for finding an optimal solution of (1.1). To derive the first-order optimality conditions for problem (1.3), we consider the generic function

    Φ(x) := ρϕ(x) + h(c(x))

with c as above, the smooth function ϕ : R^n → R and the non-smooth but convex function h : R^m → R. The first-order necessary optimality condition is that if a point x minimizes Φ(x), then there exists y ∈ ∂h(c(x)) such that

    ρ∇ϕ(x) + ∇c(x)y = 0,

see Fletcher [17]. In case of problem (1.3), we have h(c) = ρλ^T c + τ‖c‖ and, thus,

    ∂h(c) = { ρλ + τ c/‖c‖                  if c ≠ 0
            { ρλ + τ {g ∈ R^m | ‖g‖ ≤ 1}    if c = 0.           (2.2)

We can transform the condition y ∈ ∂h(c(x)) to c(x) − τ^{-1}‖c(x)‖(y − ρλ) = 0 together with ‖y − ρλ‖ ≤ τ and retrieve the first-order optimality conditions for problem (1.3):

    ρ∇f(x) + ∇c(x)y − z = 0                                     (2.3a)
    c(x) − τ^{-1}‖c(x)‖(y − ρλ) = 0                             (2.3b)
    ‖y − ρλ‖ ≤ τ                                                (2.3c)
    Xz − ρµe = 0                                                (2.3d)
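In code, the residuals of (2.3) can be checked componentwise. This is a hedged sketch (function and argument names are ours, not the paper's); `jac_c` stores ∇c(x) with shape (n, m), matching the paper's convention:

```python
import numpy as np

def residuals_23(x, y, z, grad_f, jac_c, c_val, mu, lam, rho, tau):
    """Residuals of the first-order conditions (2.3) of problem (1.3)."""
    r_stat = rho * grad_f + jac_c @ y - z                           # (2.3a)
    r_feas = c_val - np.linalg.norm(c_val) / tau * (y - rho * lam)  # (2.3b)
    tr_ok  = np.linalg.norm(y - rho * lam) <= tau                   # (2.3c)
    r_comp = x * z - rho * mu                                       # (2.3d)
    return r_stat, r_feas, tr_ok, r_comp

# One-variable, one-constraint data chosen to satisfy (2.3) exactly:
r_stat, r_feas, tr_ok, r_comp = residuals_23(
    x=np.array([1.0]), y=np.array([-0.9]), z=np.array([0.1]),
    grad_f=np.array([1.0]), jac_c=np.array([[1.0]]), c_val=np.array([0.0]),
    mu=0.1, lam=np.array([-0.9]), rho=1.0, tau=1.0)
```

All four residuals vanish here because c(x) = 0 and y = ρλ, which matches the discussion of the perturbation below.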

The optimality conditions (2.3) can be interpreted as scaled and perturbed optimality conditions (2.1) of the barrier problem (1.2) with a scaling factor ρ and a perturbation τ^{-1}‖c(x)‖(y − ρλ) of size at most ‖c(x)‖. This perturbation vanishes if either the constraint violation ‖c(x)‖ is zero or the duals are chosen to be y = ρλ. This leads to the following propositions, which formally state the relation of first-order optimal points of problems (1.2) and (1.3), similarly to [15].

Proposition 2.1 Let τ > 0, ρ > 0, µ > 0 and let (x̄, ȳ, z̄) be a first-order optimal point of problem (1.3), i.e. equations (2.3) hold. If ‖c(x̄)‖ = 0 and (λ̄, ν̄) = (ȳ/ρ, z̄/ρ), then (x̄, λ̄, ν̄) is first-order optimal for problem (1.2).

Proposition 2.2 Let µ > 0. If (x̄, λ̄, ν̄) is a first-order optimal point of (1.2), then for all penalty parameters ρ > 0 and τ > 0, the point (x̄, ȳ, z̄) with (ȳ, z̄) = (ρλ̄, ρν̄) is first-order optimal for (1.3).

The Propositions 2.1 and 2.2 validate our approach of optimizing problem (1.3) with appropriate choices of the penalty parameters ρ and τ and Lagrangian multipliers λ for finding a first-order optimal point of the barrier problem (1.2). This in turn yields a solution of (1.1) for µ → 0. A difference

to other optimization algorithms is that our primal-dual algorithm works with the possibly scaled multipliers y instead of the multipliers λ of problem (1.2). For penalization of the constraints there are two options with different properties: increasing τ or decreasing ρ. While ρ scales the multipliers y themselves, τ scales their distance to the multipliers λ, see (2.3a) and (2.3c). If a huge penalization of the constraints is needed, i.e. τ being very large or ρ being very small, both approaches have a disadvantage. On the one hand, if the problem (1.1) is infeasible, letting τ tend to infinity would lead to divergence of the multipliers y (cf. Chen and Goldfarb [10]), which can be harmful in practical implementations. On the other hand, a very small ρ can cause difficulties in finding the optimal solution with respect to a given tolerance due to the scaling (cf. Curtis [15]). That is why we propose the combination of both. Altogether, the parameters ρ, τ and λ form a dual trust-region algorithm.

2.2 Step Computation

Instead of applying Newton’s method directly to the optimality conditions (2.3) at an iterate (xk, yk), we first rewrite the feasibility condition (2.3b) to

    c(x_k) + σ_k(ρ_kλ_k − y_k) = 0

and fix σ_k = ‖c(x_k)‖/τ_k throughout the iteration k. The dual trust-region condition (2.3c) is omitted for the step computation. For a given penalty parameter ρ_k, the Newton iteration then yields the linear system

    [ H_k          ∇c(x_k)   −I  ] [ ∆x_k ]       [ ρ_k∇f(x_k) + ∇c(x_k)y_k − z_k ]
    [ ∇c(x_k)^T    −σ_k I    0   ] [ ∆y_k ]  = −  [ c(x_k) + σ_k(ρ_kλ_k − y_k)    ]    (2.4)
    [ Z_k          0         X_k ] [ ∆z_k ]       [ X_k z_k − µρ_k e              ]

where H_k := ρ_k ∇²_{xx} f(x_k) + Σ_{i=1}^m y_k^(i) ∇²_{xx} c^(i)(x_k) is the Hessian of the Lagrangian function with respect to the multipliers y_k, or an approximation to it. In case of ‖c(x_k)‖ > 0 this linear equation system is equivalent to the one of a primal-dual augmented Lagrangian method with a quadratic ℓ2-penalty function (cf. Armand and Omheni [3]) and penalty parameter adaptively set to σ_k = ‖c(x_k)‖/τ_k. Because the iterates x_k are kept strictly feasible throughout the optimization, we can eliminate the last equation of the Newton system (2.4) and solve the smaller linear equation system

    M_k [ ∆x_k ]  = − [ ρ_k∇f(x_k) + ∇c(x_k)y_k − µρ_k X_k^{-1} e ]    (2.5a)
        [ ∆y_k ]      [ c(x_k) + σ_k(ρ_kλ_k − y_k)                ]

    M_k := [ H_k + X_k^{-1} Z_k   ∇c(x_k) ]                            (2.5b)
           [ ∇c(x_k)^T            −σ_k I  ]

    ∆z_k = µρ_k X_k^{-1} e − z_k − X_k^{-1} Z_k ∆x_k.                  (2.5c)

Greif et al. [31] investigate eigenvalue bounds for the two matrices of (2.4) and (2.5) if the Hessian H_k is regularized or σ_k is constantly zero and conclude

that (2.4) is better conditioned when µ becomes very small. However, in the context of practical implementations for large-scale nonlinear programming the system (2.5) is generally preferred (cf. [41,45]). The next result shows that, provided the Hessian is modified appropriately, the step computed by (2.4) will always yield a descent direction for the merit function Φ_µ.
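As a dense-algebra illustration of the reduced system (2.5) (the names and structure are our sketch; a production solver would use a sparse symmetric indefinite factorization instead of `np.linalg.solve`):

```python
import numpy as np

def pd_step(H, jac_c, x, y, z, c_val, grad_f, mu, lam, rho, tau):
    """Solve (2.5a)/(2.5b) for (dx, dy) and recover dz from (2.5c)."""
    n, m = jac_c.shape
    sigma = np.linalg.norm(c_val) / tau                  # sigma_k = ||c(x_k)||/tau_k
    W = H + np.diag(z / x)                               # H_k + X_k^{-1} Z_k
    M = np.block([[W, jac_c], [jac_c.T, -sigma * np.eye(m)]])
    rhs = np.concatenate([rho * grad_f + jac_c @ y - rho * mu / x,
                          c_val + sigma * (rho * lam - y)])
    step = np.linalg.solve(M, -rhs)
    dx, dy = step[:n], step[n:]
    dz = rho * mu / x - z - (z / x) * dx                 # (2.5c)
    return dx, dy, dz

# At a point that already satisfies (2.3) with c = 0, the Newton step is zero:
dx, dy, dz = pd_step(H=np.array([[1.0]]), jac_c=np.array([[1.0]]),
                     x=np.array([1.0]), y=np.array([-0.9]), z=np.array([0.1]),
                     c_val=np.array([0.0]), grad_f=np.array([1.0]),
                     mu=0.1, lam=np.array([-0.9]), rho=1.0, tau=1.0)
```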

Proposition 2.3 Let τk > 0, ρk ≥ 0, µ ≥ 0 and (∆xk, ∆yk) be a solution of the linear system (2.5). Then,

    ∇Φ_µ(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖
        + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k)
      = { −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k,                                  if ‖c(x_k)‖ = 0
        { −(∆x_k)^T (H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T) ∆x_k,       if ‖c(x_k)‖ > 0

Furthermore, if the inertia of (2.5) satisfies In(M_k) = (n, m, 0) and the optimality conditions (2.3) are not satisfied at the current iterate (x_k, y_k, z_k), then (∆x_k, ∆y_k) is a descent direction for the merit function Φ_µ at x_k, i.e. ∇Φ_µ(x_k)^T ∆x_k < 0.

Proof The proof is similar to the one of Lemma 3.2 in [10] but is extended here for the case of λ not being constantly zero. We split the proof into the two cases ‖c(x_k)‖ > 0 and ‖c(x_k)‖ = 0:

Case ‖c(x_k)‖ > 0: Then we have

    ∇Φ_µ(x_k)^T ∆x_k
      = ρ_k (∇f(x_k) − µX_k^{-1}e + ∇c(x_k)λ_k)^T ∆x_k + σ_k^{-1} c(x_k)^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T (σ_k (y_k + ∆y_k − ρ_kλ_k))
        − σ_k^{-1} (σ_kρ_kλ_k + c(x_k))^T c(x_k)
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k)
        − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖,

where the third equality follows by applying the second equation of (2.5a). This proves the first equation of the Proposition. In addition, we also have

    ∇Φ_µ(x_k)^T ∆x_k
      = ρ_k (∇f(x_k) − µX_k^{-1}e + ∇c(x_k)λ_k)^T ∆x_k + σ_k^{-1} c(x_k)^T ∇c(x_k)^T ∆x_k
      = −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k
        − (y_k + ∆y_k − ρ_kλ_k − σ_k^{-1} c(x_k))^T ∇c(x_k)^T ∆x_k
      = −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k − σ_k^{-1} (∆x_k)^T ∇c(x_k)∇c(x_k)^T ∆x_k,

where the second equality follows from the first and the third equality from the second equation of (2.5a).

Case ‖c(x_k)‖ = 0: Then we have σ_k = 0 and ∇c(x_k)^T ∆x_k = 0 from the second equation of (2.5a) and, thus,

    lim_{t↓0} (‖c(x_k + t∆x_k)‖ − ‖c(x_k)‖)/t
      = lim_{t↓0} (1/t) ( Σ_{i=1}^m (c^(i)(x_k + t∆x_k) − c^(i)(x_k))² )^{1/2}
      = ‖∇c(x_k)^T ∆x_k‖ = 0.

Using this together with the definition of the directional derivative and the fact that c(x_k) = 0, σ_k = 0 and, again, ∇c(x_k)^T ∆x_k = 0 yields

    ∇Φ_µ(x_k)^T ∆x_k
      = lim_{t↓0} ( ρ_k (ϕ_µ(x_k + t∆x_k) − ϕ_µ(x_k))/t + ρ_k λ_k^T (c(x_k + t∆x_k) − c(x_k))/t
                    + τ_k (‖c(x_k + t∆x_k)‖ − ‖c(x_k)‖)/t )
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k + ρ_kλ_k^T ∇c(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k
      = ρ_k∇ϕ_µ(x_k)^T ∆x_k − ρ_kλ_k^T c(x_k) − τ_k‖c(x_k)‖
        + (c(x_k) + σ_kρ_kλ_k)^T (y_k + ∆y_k − ρ_kλ_k).

This proves the first equation of the Proposition. Furthermore, using the first equation of (2.5a) we get

    ∇Φ_µ(x_k)^T ∆x_k = ρ_k∇ϕ_µ(x_k)^T ∆x_k
      = −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k − (y_k + ∆y_k)^T ∇c(x_k)^T ∆x_k
      = −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k.

Combining the two cases, the two equations of the Proposition have been proven. Using Lemma 3.1 of [10], In(M_k) = (n, m, 0) yields the positive definiteness of H_k + X_k^{-1}Z_k or H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T for ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. If the optimality conditions (2.3) are not satisfied, the step (∆x_k, ∆y_k) is not zero. Thus, we have ∇Φ_µ(x_k)^T ∆x_k < 0. ⊓⊔

Proposition 2.3 states that the step calculated by (2.5) is a descent direction if the inertia of (2.5) satisfies In(M_k) = (n, m, 0). If this is not the case, it can be achieved by regularizing the Hessian H_k + X_k^{-1}Z_k, i.e. adding a multiple of the identity to H_k + X_k^{-1}Z_k until In(M_k) = (n, m, 0) holds (cf. [10,42,45]).

Adding a further vI with a small v > 0 then guarantees the conditions (cf. [10, Lemma 3.1])

    −(∆x_k)^T (H_k + X_k^{-1}Z_k) ∆x_k ≤ −v‖∆x_k‖², or                            (2.7a)
    −(∆x_k)^T (H_k + X_k^{-1}Z_k + σ_k^{-1}∇c(x_k)∇c(x_k)^T) ∆x_k ≤ −v‖∆x_k‖²,    (2.7b)

in case of ‖c(x_k)‖ = 0 or ‖c(x_k)‖ > 0, respectively. This strategy can also be interpreted as a proximal algorithm or primal trust-region algorithm (cf. Parikh and Boyd [37]). It can only fail for this algorithm if the current iterate is feasible and the MFCQ fails to hold. In other words, this algorithm has the important property of handling problems with rank-deficient Jacobians ∇c(x_k) at infeasible non-stationary points due to the automatic dual regularization if σ_k > 0, i.e. the term −σ_k I in the (2,2)-block of M_k. The linear equation system (2.5) reveals another important feature of penalty-interior-point algorithms that rely on an augmented Lagrangian approach: Choosing λ_k = y_k/ρ_k reduces the system (2.5) to a regularized Newton method applied to the optimality conditions of the barrier problem (1.2), which is relevant for the fast local convergence.
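The inertia-correction loop can be sketched as follows. We read the inertia off the eigenvalues for clarity, which is only viable for small dense systems; real implementations obtain it from an LDLᵀ factorization. All names and constants are illustrative:

```python
import numpy as np

def regularize_hessian(H, jac_c, sigma, v=1e-8, factor=10.0):
    """Add delta*I to the (1,1) block until In(M) = (n, m, 0), then add v*I."""
    n, m = jac_c.shape
    delta = 0.0
    while True:
        M = np.block([[H + delta * np.eye(n), jac_c],
                      [jac_c.T, -sigma * np.eye(m)]])
        eigs = np.linalg.eigvalsh(M)              # M is symmetric
        if np.sum(eigs > 1e-12) == n and np.sum(eigs < -1e-12) == m:
            return H + (delta + v) * np.eye(n), delta
        delta = factor * max(delta, v)

# Indefinite Hessian with a negative direction in the null space of jac_c^T:
H_reg, delta = regularize_hessian(H=np.diag([-1.0, -1.0]),
                                  jac_c=np.array([[1.0], [0.0]]), sigma=0.0)
```

Here the loop has to push delta past 1 before the projected Hessian becomes positive definite and the inertia is correct.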

2.3 Computation of the Step Sizes

After the step computation, a step size α_k^x ∈ (0, α_max^x] with α_max^x ∈ (0, 1] has to be determined to update the primal iterates by

    x_{k+1} ← x_k + α_k^x ∆x_k.                                 (2.8)

The step size has to guarantee that the iterate xk+1 remains strictly positive, which is done by a fraction-to-the-boundary rule with a parameter sequence {ηk} with ηk ∈ (0, 1) and ηk → 1:

    α_max^x := max { α ∈ (0, 1] | x_k + α∆x_k ≥ (1 − η_k)x_k }  (2.9)

To measure progress towards the optimal solution, two different approaches are combined: a filter and a merit function. Checking a reduction in the merit function Φ_µ is a straightforward criterion for penalty-interior-point algorithms. In particular, for a trial iterate x_k + α_k^x ∆x_k we check the Armijo condition

    Φ_µ(x_k + α_k^x ∆x_k) − Φ_µ(x_k) ≤ ω α_k^x ∇Φ_µ(x_k)^T ∆x_k    (2.10)

with ω ∈ (0, 1/2). Since Proposition 2.3 guarantees that the step ∆x_k is a descent direction for the merit function Φ_µ(x_k) with an appropriate Hessian regularization, a step size α_k^x ∈ (0, α_max^x) that satisfies (2.10) exists. However, as pointed out by Fletcher and Leyffer [19], a bad choice of the penalty parameter can slow down the performance of the optimization method.
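The fraction-to-the-boundary rule (2.9) and the Armijo backtracking on (2.10) can be sketched as follows (the names are ours; `phi` is a callable for Φ_µ and `dphi` stands for the directional derivative ∇Φ_µ(x_k)^T ∆x_k):

```python
import numpy as np

def fraction_to_boundary(x, dx, eta):
    """Largest alpha in (0, 1] with x + alpha*dx >= (1 - eta)*x, rule (2.9)."""
    mask = dx < 0
    if not np.any(mask):
        return 1.0
    return min(1.0, np.min(-eta * x[mask] / dx[mask]))

def armijo_backtracking(phi, dphi, x, dx, alpha_max, omega=1e-4, beta=0.5):
    """Backtrack alpha <- beta*alpha until the Armijo condition (2.10) holds."""
    alpha = alpha_max
    while phi(x + alpha * dx) - phi(x) > omega * alpha * dphi:
        alpha *= beta
    return alpha

x, dx = np.array([1.0, 2.0]), np.array([-2.0, 1.0])
a_max = fraction_to_boundary(x, dx, eta=0.9)        # only the first component limits alpha
phi = lambda v: float(np.sum(v ** 2))
a = armijo_backtracking(phi, dphi=-2.0, x=np.array([1.0]),
                        dx=np.array([-1.0]), alpha_max=a_max)
```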

For example, if the penalty parameter τ_k is too large or ρ_k too small, respectively, the influence of the objective function may be damped out. Analogously, this may occur to the constraint violation if τ_k is too small or ρ_k too large. To avoid this and to increase the flexibility of the step acceptance, a trial iterate is said to be acceptable if one of the two measures, the constraint violation θ(x) = ‖c(x)‖ or the objective function, improves. This is the basic idea of the filter method, which can be interpreted as checking a reduction in the merit function Φ_µ for either ρ_k = 0 or τ_k = 0. The filter is defined as the prohibited region in the two-dimensional space of constraint violation θ and objective function value f and is initially set to

    F_0 ← { (θ, f) ∈ R² | θ ≥ θ_max }                           (2.11)

with a maximum allowed constraint violation θ_max > 0. Unlike other filter methods, we do not use ϕ_µ(x) as the objective function but the original f(x), to avoid the dependence of the filter data on specific parameter choices, e.g. µ or λ_k. A trial iterate x_k + α_k^x ∆x_k is accepted by the filter if it produces a sufficient reduction of the constraint violation or the objective function with respect to a filter envelope δ_k > 0, regarding the current iterate x_k, i.e. if

    θ(x_k + α_k^x ∆x_k) + γ_θ δ_k ≤ θ(x_k), or                  (2.12a)
    f(x_k + α_k^x ∆x_k) + γ_f δ_k ≤ f(x_k)                      (2.12b)

holds where γθ, γf > 0, and regarding the current filter:

    ( θ(x_k + α_k^x ∆x_k) + γ_θ δ_k, f(x_k + α_k^x ∆x_k) + γ_f δ_k ) ∉ F_k    (2.13)

The filter envelope δ_k measures the error of the optimality conditions of problem (1.2) and is defined as follows:

    δ_k := ( ‖ρ_k∇f(x_k) + ∇c(x_k)y_k − z_k‖₂² + ‖c(x_k)‖₂² + ‖z_k − ρ_kµX_k^{-1}e‖₂² )^{1/2}    (2.14)
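A minimal filter data structure covering the acceptance tests above might look as follows. This is a sketch under our own naming; the filter is stored by its corner points, and the envelope test uses the inflated pair as in (2.12) and (2.13):

```python
class Filter:
    """Prohibited region in the (theta, f) plane, cf. (2.11) and (2.15)."""
    def __init__(self, theta_max):
        self.theta_max = theta_max
        self.corners = []                       # stored (theta(x_k), f(x_k)) pairs

    def prohibits(self, theta, f):
        # (2.11): too large a constraint violation is always prohibited
        if theta >= self.theta_max:
            return True
        # north-east of any stored corner is prohibited, cf. (2.15)
        return any(theta >= t and f >= v for t, v in self.corners)

    def acceptable(self, theta_new, f_new, theta_k, f_k, delta,
                   gamma_theta=1e-5, gamma_f=1e-5):
        # sufficient decrease w.r.t. the current iterate, (2.12)
        if not (theta_new + gamma_theta * delta <= theta_k
                or f_new + gamma_f * delta <= f_k):
            return False
        # inflated pair must lie outside the filter, (2.13)
        return not self.prohibits(theta_new + gamma_theta * delta,
                                  f_new + gamma_f * delta)

    def augment(self, theta_k, f_k):
        self.corners.append((theta_k, f_k))

F = Filter(theta_max=10.0)
F.augment(1.0, 5.0)
ok = F.acceptable(theta_new=0.5, f_new=4.0, theta_k=1.0, f_k=5.0, delta=0.1)
```

A trial pair that improves the constraint violation is accepted even though its objective value alone would not suffice, which is exactly the flexibility the filter adds over a pure merit function.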

A switching condition, usually a mandatory feature of filter methods, is not required due to the choice of the filter envelope and the combination with the merit function. If the filter accepts the trial iterate, it is augmented by

    F_{k+1} ← F_k ∪ { (θ, f) ∈ R² | θ ≥ θ(x_k) and f ≥ f(x_k) }    (2.15)

to avoid cycling of the iterates. If neither of the two criteria, the filter or the merit function, accepts the maximum step size α_max^x, similar to [12] we try to refine the step by a second-order-correction step

" #  −1  ∆xdk ρk∇f(xk) + ∇c(xk)yk − µρkXk e Mk = − x x > , (2.16) ∆ydk c(xk + αmax∆xk) − αmax∇c(xk) ∆xk Primal-Dual Augmented Lagrangian Penalty-Interior-Point Algorithm 11

and apply a fraction-to-the-boundary rule for ∆x̂_k, i.e.

    α̂_max^x := max { α ∈ (0, 1] | x_k + α∆x̂_k ≥ (1 − η_k)x_k }.    (2.17)

The filter conditions for the second-order-correction that correspond to (2.12) and (2.13) are

    θ(x_k + α̂_max^x ∆x̂_k) + γ_θ δ_k ≤ θ(x_k), or                   (2.18a)
    f(x_k + α̂_max^x ∆x̂_k) + γ_f δ_k ≤ f(x_k), and                  (2.18b)
    ( θ(x_k + α̂_max^x ∆x̂_k) + γ_θ δ_k, f(x_k + α̂_max^x ∆x̂_k) + γ_f δ_k ) ∉ F_k,    (2.18c)

and the Armijo condition corresponding to (2.10) is

    Φ_µ(x_k + α̂_max^x ∆x̂_k) − Φ_µ(x_k) ≤ ω α̂_max^x ∇Φ_µ(x_k)^T ∆x_k.    (2.19)

If one of the two accepts the step, we update x_{k+1} by x_{k+1} ← x_k + α̂_max^x ∆x̂_k instead of (2.8). Since this is a strategy to improve local convergence, it is only applied if λ_k = y_k/ρ_k holds in the current iteration. Otherwise, or if the second-order-correction step does not help, the step ∆x̂_k will be rejected, a backtracking line search α_k^x ← βα_k^x with β ∈ (0, 1) is applied, and the filter and Armijo condition are checked for ∆x_k and the updated step size again. The dual iterates (y_k, z_k) are updated by

    y_{k+1} ← y_k + ∆y_k                                        (2.20a)
    z_{k+1} ← z_k + α_k^z ∆z_k                                  (2.20b)

with a fraction-to-the-boundary rule

    α_k^z := max { α ∈ (0, 1] | z_k + α∆z_k ≥ (1 − η_k)z_k }    (2.21)

and a further projection of z_{k+1}^(i) into the interval [ρ_kµ/(κ_z x_{k+1}^(i)), κ_z ρ_kµ/x_{k+1}^(i)] by

    z_{k+1}^(i) ← max { min { z_{k+1}^(i), κ_z ρ_kµ / x_{k+1}^(i) }, ρ_kµ / (κ_z x_{k+1}^(i)) },  i = 1, …, n,    (2.22)

with κ_z > 1. While the fraction-to-the-boundary rule again guarantees the strict positivity of the iterate z_{k+1}, the projection avoids that z_{k+1}^(i) deviates too much from ρ_kµ/x_{k+1}^(i), which has to be satisfied in an optimal solution of (1.2).
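The dual update (2.20b) with the rule (2.21) and the projection (2.22) can be sketched as follows (our naming, a hypothetical helper rather than the WORHP code):

```python
import numpy as np

def dual_z_update(z, dz, x_next, eta, mu, rho, kappa_z=1e10):
    """Fraction-to-the-boundary step (2.21) followed by the projection (2.22)."""
    mask = dz < 0
    alpha_z = 1.0 if not np.any(mask) else min(1.0, np.min(-eta * z[mask] / dz[mask]))
    z_next = z + alpha_z * dz
    lower = rho * mu / (kappa_z * x_next)        # rho*mu / (kappa_z * x^(i))
    upper = kappa_z * rho * mu / x_next          # kappa_z * rho*mu / x^(i)
    return np.clip(z_next, lower, upper)

# A component drifting above kappa_z*rho*mu/x is projected back onto the bound:
z_new = dual_z_update(z=np.array([1.0]), dz=np.array([0.5]),
                      x_next=np.array([1.0]), eta=0.9, mu=0.1, rho=1.0,
                      kappa_z=10.0)
```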

2.4 Update of Lagrangian Multipliers and Penalty Parameters

For the update of the Lagrangian multipliers λ and the penalty parameters ρ and τ we check whether problem (1.3) has been solved to a certain accuracy, i.e. whether

    ‖ρ_{k−1}∇f(x_k) + ∇c(x_k)y_k − z_k‖_∞ ≤ ε_k                         (2.23a)
    ‖c(x_k) − τ_{k−1}^{-1}‖c(x_k)‖(y_k − ρ_{k−1}λ_{k−1})‖_∞ ≤ ε_k       (2.23b)
    ‖X_k z_k − ρ_{k−1}µe‖_∞ ≤ ε_k,                                      (2.23c)

for a tolerance ε_k > 0 that converges to zero if (2.23) is satisfied infinitely many times. In the following, we define the maximum of the left-hand sides as E_{µ,ρ,τ,λ}(x_k, y_k, z_k). If the left-hand side of (2.23) is equal to zero, we have 0 ∈ ∂Φ_µ(x_k), meaning that x_k is a stationary point of the merit function Φ_µ. However, this condition is not sufficient for a first-order optimal solution of (1.2). In particular, Proposition 2.1 further requires the constraint violation ‖c(x_k)‖ to be zero. This can be checked by the omitted dual trust-region condition ‖y_k − ρ_{k−1}λ_{k−1}‖ ≤ τ_{k−1}, see (2.3c). From (2.2) we can conclude that if ‖y_k − ρ_{k−1}λ_{k−1}‖ < τ_{k−1} holds, the case ‖c(x_k)‖ = 0 in (2.2) will be true, and if ‖y_k − ρ_{k−1}λ_{k−1}‖ = τ_{k−1}, a constraint violation ‖c(x_k)‖ > 0 is possible. In the former case Proposition 2.1 suggests to update the Lagrangian multipliers λ, and in the latter case to update the penalty parameter ρ or τ, trying to avoid the case of ‖c(x_k)‖ > 0. In particular, we update the penalty parameter ρ or τ if

    ‖y_k − ρ_{k−1}λ_{k−1}‖ > κ_y τ_{k−1}                        (2.24)

with κy ∈ (0, 1) holds in addition to (2.23). Otherwise, if

    ‖y_k − ρ_{k−1}λ_{k−1}‖ ≤ κ_y τ_{k−1}                        (2.25)

is satisfied, an update of the Lagrangian multipliers

    (λ_k, ν_k) ← ρ_k^{-1}(y_k, z_k)                             (2.26)

is applied. This update strategy of the multipliers λ differs from other augmented Lagrangian based algorithms as it does not rely on a further criterion that measures a reduction in the constraint violation (see for example Armand and Omheni [2,3] or Conn et al. [13]). The penalty parameter update is defined as

    ρ_k ← χ_ρ ρ_{k−1}, or                                       (2.27a)
    τ_k ← min{τ_max, χ_τ τ_{k−1}}                               (2.27b)

with χ_ρ ∈ (0, 1), χ_τ > 1 and τ_max > 0. As mentioned at the end of Section 2.1, the updates of ρ_k and τ_k both may have a disadvantage. We prefer to update τ_k using (2.27b) if problem (1.1) is feasible and to update ρ_k using (2.27a) otherwise. In the feasible case, this avoids the scaling of the optimality conditions of (1.1) and (1.2), which can decrease the accuracy of the solver.

And in the infeasible case, this enables the solver to satisfy the optimality conditions of the feasibility problem (1.4) to a better accuracy. Because it is unknown in advance whether (1.1) is feasible or not, we propose the following strategy for a penalty update: if τ_k < τ_max holds, then we update τ_k using (2.27b) and, otherwise, we update ρ_k using (2.27a). This way the algorithm switches to the update of ρ_k as it becomes more likely that problem (1.1) is infeasible. With the threshold τ_max it is possible to adjust the timing of the switch.
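The multiplier and penalty updates of this section can be condensed into one routine. This is a sketch: `subproblem_solved` stands for the test (2.23), and all constants are assumed defaults within the stated ranges, not values taken from the paper:

```python
import numpy as np

def update_multipliers_and_penalties(y, z, lam, nu, rho, tau, subproblem_solved,
                                     kappa_y=0.5, chi_rho=0.1, chi_tau=10.0,
                                     tau_max=1e6):
    if not subproblem_solved:                            # (2.23) not yet satisfied
        return lam, nu, rho, tau
    if np.linalg.norm(y - rho * lam) > kappa_y * tau:    # (2.24): penalize harder
        if tau < tau_max:
            tau = min(tau_max, chi_tau * tau)            # (2.27b)
        else:
            rho = chi_rho * rho                          # (2.27a)
    else:                                                # (2.25): accept multipliers
        lam, nu = y / rho, z / rho                       # (2.26)
    return lam, nu, rho, tau

# Multiplier-update branch: y is already close to rho*lam, so (2.26) fires.
lam, nu, rho, tau = update_multipliers_and_penalties(
    y=np.array([1.0]), z=np.array([2.0]), lam=np.array([1.0]),
    nu=np.array([0.0]), rho=1.0, tau=1.0, subproblem_solved=True)
```

The switch from enlarging τ to shrinking ρ happens exactly when τ has reached τ_max, mirroring the strategy described above.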

2.5 The Algorithm

In this section we formally state our penalty-interior-point algorithm. We split the presentation into Algorithm A (inner algorithm), which solves problem (1.2) for a given barrier parameter µ, and Algorithm B (outer algorithm), which repeatedly calls Algorithm A with decreasing µ → 0. After an initialization in Step A-1, Algorithm A enters the optimization loop, which starts with the optimality and infeasibility checks at Step A-2 and Step A-3, respectively. Subsequently, the optimality of the ℓ2-exact augmented Lagrangian subproblem is evaluated at Step A-4 and Step A-5 to update the penalty parameters ρ_k or τ_k, or the Lagrangian multipliers (λ_k, ν_k). The search direction is calculated in Step A-7 and its step size in Step A-8. What remains is the update of the iterate in Step A-9 and the finalization of the iteration in Step A-11. The most expensive part of Algorithm A is the factorization of the linear equation system in Step A-7. Note that this factorization can be reused for the calculation of the second-order-correction in Step A-8.4.

Algorithm B starts with an initialization in Step B-1. Within the optimization loop, the optimality check is performed first in Step B-2. If it is not satisfied, Algorithm A is called in Step B-3, which solves (1.2) up to a tolerance ε_{µ_j} = δµ_j with δ ∈ (0, min{√n, δ_µ}) and δ_µ > 0. It determines how to update the iterate in Step B-4. For the barrier update in Step B-5 we apply the well-established update rule

    µ_{j+1} ← min { χ_µ µ_j, µ_j^{κ_µ} }                        (2.28)
    η_{j+1} ← max { η_min, 1 − µ_j^{κ_η} }                      (2.29)

with χ_µ ∈ (0, 1), κ_µ ∈ (1, 2), η_min ∈ (0, 1) and κ_η > κ_µ − 1 to eventually update µ at a superlinear rate, see [12]. Step B-6 finalizes the iteration of Algorithm B.
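Under assumed constants satisfying the stated ranges (χ_µ = 0.2, κ_µ = 1.5, κ_η = 1.1 and η_min = 0.1 are our choices, not necessarily the paper's), the rules (2.28) and (2.29) read:

```python
def barrier_update(mu_j, chi_mu=0.2, kappa_mu=1.5, eta_min=0.1, kappa_eta=1.1):
    """Superlinear barrier decrease (2.28) and eta -> 1 (2.29)."""
    mu_next = min(chi_mu * mu_j, mu_j ** kappa_mu)
    eta_next = max(eta_min, 1.0 - mu_j ** kappa_eta)
    return mu_next, eta_next

mu1, eta1 = barrier_update(0.01)   # mu: min(0.002, 0.001) = 0.001
```

For small µ the power term µ^{κ_µ} dominates the linear factor χ_µ µ, which is what produces the eventual superlinear decrease.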

3 Global Convergence

This algorithm can be seen as an extension or modification of the ℓ2-penalty-interior-point algorithm in [10,12]. Therefore, the proof of global convergence is based on these works and we focus on the differences that arise due to the augmented Lagrangian approach, the different penalty parameter and its updates. Throughout this section we make the following assumptions.

Algorithm A Inner Algorithm

A-1: (Initialization) Set k ← 0. Choose a starting point (x_0, y_0, z_0) with x_0 > 0 and initial penalty parameters ρ_0 ∈ (0, 1] and τ_0 > 0. Select a tolerance ε_tol > 0, a sequence ζ_k → 0 monotonically with ζ_k > 0 and an ℓ > 0. Choose a barrier parameter µ > 0 and a switching parameter τ_max > 0. Set θ_max > 0 and initialize the filter by (2.11). Choose a sequence η_k → 1 with η_k ∈ (0, 1). Furthermore, select ω ∈ (0, 1/2), κ_y ∈ (0, 1), κ_z > 1, κ_H > 0, κ_ε ∈ (0, 1), χ_ρ ∈ (0, 1), χ_τ > 1, γ_f > 0, γ_θ > 0 and β ∈ (0, 1).
A-2: (Optimality check) If (2.1) is satisfied to a tolerance ε_tol, then STOP; x_k is a first-order optimal point of (1.2).
A-3: (Infeasibility check) If (2.3) with ρ = 0 is satisfied to a tolerance ε_tol, then STOP; x_k is a first-order optimal point of (1.4) that is infeasible for (1.1) and (1.2).
A-4: (Penalty update) If (2.23), (2.24) and τ_k < τ_max are satisfied, update τ_k ← min{τ_max, χ_τ τ_{k−1}}, set ρ_k ← ρ_{k−1} and reduce the tolerance ε_k. Otherwise, if (2.23), (2.24) and τ_k ≥ τ_max are satisfied, set τ_k ← τ_{k−1}, update ρ_k ← χ_ρ ρ_{k−1} and update the tolerance ε_k. Otherwise, set τ_k ← τ_{k−1} and ρ_k ← ρ_{k−1}.
A-5: (Multiplier update) If (2.23) and (2.25) are satisfied, update the Lagrangian multipliers by (λ_k, ν_k) ← ρ_k^{-1}(y_k, z_k) and update the tolerance ε_k. Otherwise set (λ_k, ν_k) ← (λ_{k−1}, ν_{k−1}).
A-6: (Hessian regularization) Modify the Hessian H_k by adding a multiple of the identity until M_k has the correct inertia In(M_k) = (n, m, 0). If necessary, add a further κ_H I such that (2.7) holds. If the Hessian regularization fails, then STOP; x_k is feasible and the MFCQ fails to hold.
A-7: (Search direction) Compute a search direction (∆x_k, ∆y_k, ∆z_k) from (2.5).
A-8: (Line search)
A-8.1: Apply the fraction-to-the-boundary rule (2.9) to get α_max^x. Set α_k^x ← α_max^x.
A-8.2: (Filter) If (2.12) and (2.13) are satisfied, accept the current step size α_k^x, augment the filter by (2.15) and go to Step A-9.
A-8.3: (Merit function) If the Armijo condition (2.10) is satisfied, go to Step A-9.
A-8.4: (Second-order-correction) If λ_k ≠ y_k/ρ_k or α_k^x ≠ α_max^x, go to Step A-8.8. Otherwise, calculate the second-order-correction step ∆x̂_k from (2.16) and α̂_max^x from (2.17).
A-8.5: (Second-order-correction / Filter) If (2.18) is satisfied, augment the filter by (2.15) and go to Step A-8.7.
A-8.6: (Second-order-correction / Merit) If the Armijo condition (2.19) is satisfied, go to Step A-8.7. Otherwise, reject ∆x̂_k and go to Step A-8.8.
A-8.7: (Second-order-correction / Primal update) Update the primal iterate by x_{k+1} ← x_k + α̂_max^x ∆x̂_k and go to Step A-10.
A-8.8: Reduce the step size by setting α_k^x ← βα_k^x and go back to Step A-8.2.
A-9: (Primal update) Update the primal iterate by x_{k+1} ← x_k + α_k^x ∆x_k.
A-10: (Dual update) Use the fraction-to-the-boundary rule (2.21) to get α_k^z. Update the dual iterates by y_{k+1} ← y_k + ∆y_k and z_{k+1} ← z_k + α_k^z ∆z_k. Apply the dual projection (2.22).
A-11: (k increment) Set k ← k + 1 and go to Step A-2.

Assumptions G

G1. The functions f and c are real valued and twice continuously differentiable.
G2. The primal iterates {xk} are bounded.
G3. The modified Hessians {Hk} are bounded.

We will use the definition of a Fritz-John point within the global convergence analysis. The reader is referred to [10, Definition 2.1–2.4] for details.

Algorithm B (Outer Algorithm)

B-1: (Initialization) Set j ← 0. Choose a starting point (x0, y0, z0) with x0 > 0 and an initial barrier parameter µ0 > 0. Select a tolerance εtol. Choose initial penalty parameters ρ0 = 1 and τ0 > 0. Furthermore, select χµ ∈ (0, 1), κµ ∈ (1, 2), δµ > 0, ηmin ∈ (0, 1), κη > κµ − 1.
B-2: (Optimality check) If (2.1) with µ = 0 is satisfied to a tolerance εtol, then STOP; xj is a first-order optimal point of (1.1).
B-3: (Inner algorithm) Call Algorithm A with the barrier parameter µj, the tolerance ε_{µj}, the fraction-to-the-boundary parameter ηj and the initial guesses (xj, ρjλj, ρjνj) for the iterates and ρj and τj for the penalty parameters. Algorithm A returns the solution x̄j, multipliers (λ̄j, ν̄j) and penalty parameters ρ̄j and τ̄j.
B-4: (Iterate update) Update the iterate (xj+1, λj+1, νj+1) ← (x̄j, λ̄j, ν̄j) and set ρj+1 ← ρ̄j and τj+1 ← τ̄j.
B-5: (Barrier update) Update the barrier parameter µj+1 by (2.28) and the fraction-to-the-boundary parameter ηj+1 by (2.29).
B-6: (j increment) Set j ← j + 1 and go to Step B-2.
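Algorithm B is a barrier-continuation loop around Algorithm A. The following minimal sketch makes that structure concrete; since the update rule (2.28) is not reproduced in this excerpt, the classical superlinear rule µ_{j+1} = min(χµ µ_j, µ_j^{κµ}) is assumed here (with χµ = 0.2 and κµ = 1.5 as in Table 5.1), and `inner_solve` and `kkt_error` are placeholders for Algorithm A and the optimality error (2.1).

```python
def outer_loop(w0, mu0=0.1, eps_tol=1e-6, chi_mu=0.2, kappa_mu=1.5,
               inner_solve=None, kkt_error=None, max_outer=50):
    """Sketch of Algorithm B: solve the barrier problem for a fixed mu_j
    (Step B-3), then drive mu_j -> 0 (Step B-5, assumed update rule)."""
    w, mu = w0, mu0
    for _ in range(max_outer):
        if kkt_error(w, 0.0) <= eps_tol:       # Step B-2 with mu = 0
            return w, mu
        w = inner_solve(w, mu)                 # Step B-3 (Algorithm A)
        mu = min(chi_mu * mu, mu ** kappa_mu)  # Step B-5 (assumed rule)
    return w, mu
```

As a toy usage, take the scalar barrier problem min x²/2 − µ ln x, whose exact solution is x(µ) = √µ; the loop then drives the unperturbed stationarity error x² to zero.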

3.1 Global Convergence of the Inner Algorithm A

For the global convergence analysis we assume that Algorithm A does not terminate, i.e. it produces an infinite sequence of iterates. We write k− to refer to the iteration right before iteration k of the algorithm. In the first part of the global convergence analysis we study the convergence in case the penalty parameter ρk is updated infinitely many times, i.e. it tends to zero. Note that we do not have to consider an infinite update of the penalty parameter τk, as it is only updated up to the threshold τmax. We therefore assume without loss of generality that τk = τ̄ ≤ τmax for all k. We begin with a preliminary result stating that in case of infeasible problems the multipliers (λk, νk) are updated only finitely many times.

Lemma 3.1 Suppose Assumptions G hold. Let K be an index set such that the necessary conditions (2.23) for an update of the penalty parameter ρk or the multipliers (λk, νk) are satisfied. Further assume that {xk}K converges to a point x̄ with ‖c(x̄)‖ > 0. Then the multipliers (λk, νk) are updated finitely many times.

Proof Because xk → x̄ for k ∈ K → ∞ with ‖c(x̄)‖ > 0, we have ‖c(xk)‖ > 0 for large k ∈ K. With (2.23b) we can conclude that then

    ‖ τ̄ c(xk)/‖c(xk)‖ − yk + ρk− λk− ‖ ≤ τ̄ εk/‖c(xk)‖ .

It follows yk − ρk− λk− → τ̄ c(x̄)/‖c(x̄)‖ for k ∈ K → ∞ since ‖c(xk)‖ → ‖c(x̄)‖ > 0 by Assumptions G1 and G2 and εk tends to zero. Subsequently, we have

    ‖yk − ρk− λk−‖ → τ̄ .

For an index k0 ∈ K large enough we have that ‖yk − ρk− λk−‖ ∈ (κy τ̄, τ̄] for all k ∈ K, k ≥ k0. It follows that for k ≥ k0 condition (2.25) is violated and no update of the multipliers (λk, νk) can be performed. ⊓⊔

Lemma 3.1 can be used to study the possible outcomes if the penalty parameter ρk is updated infinitely many times.

Lemma 3.2 Suppose Assumptions G hold. If the penalty parameter ρk is decreased infinitely many times, then there exists an index set K such that one of the following holds:

1. The sequence (λk, νk) is updated finitely many times and {(xk, zk/τ̄)}K converges to a KKT point (x̄, z̄/τ̄) of the feasibility problem (1.4) that is infeasible for (1.1). The sequence {yk}K converges to τ̄ c(x̄)/‖c(x̄)‖.
2. The sequence {xk}K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

Proof Let K be an index set such that (2.23) holds, i.e. the conditions that are necessary for updates of the penalty parameter ρk and the multipliers λk. The index set K is infinite because ρk is updated infinitely many times by assumption. This implies ρk → 0 for k ∈ K → ∞ since χρ ∈ (0, 1). With Assumption G2 there exists an index set K′ ⊆ K such that xk → x̄ > 0 for k ∈ K′ → ∞. We have to distinguish between two cases:

Case ‖c(x̄)‖ > 0: Then, by Lemma 3.1, there exists an index k0 such that λk = λ̄ for all k ≥ k0. With

    ‖ τ̄ c(xk)/‖c(xk)‖ − yk ‖ = ‖ τ̄ c(xk)/‖c(xk)‖ − yk + ρk− λk− − ρk− λk− ‖
                              ≤ ‖ τ̄ c(xk)/‖c(xk)‖ − yk + ρk− λk− ‖ + ‖ρk− λk−‖
                              ≤ τ̄ εk/‖c(xk)‖ + ρk− ‖λ̄‖

it follows that yk → τ̄ ‖c(x̄)‖⁻¹ c(x̄). Now, letting k ∈ K′ → ∞ in (2.23a) and (2.23c) yields ‖c(x̄)‖⁻¹ ∇c(x̄)c(x̄) − τ̄⁻¹ z̄ = 0 and τ̄⁻¹ X̄Z̄e = 0, respectively. Since x̄ > 0 and z̄ ≥ 0, it follows that (x̄, z̄/τ̄) is a KKT point of the feasibility problem min_{x≥0} ‖c(x)‖ that is infeasible for problem (1.1).

Case ‖c(x̄)‖ = 0: Let K″ ⊂ K′ be the index set of iterations at which the penalty parameter ρk is updated. For all k ∈ K″ condition (2.24) is satisfied and thus ‖yk − ρk−1 λk−1‖ > 0. Then there exists (ȳ, z̄) such that

    ‖(yk − ρk−1 λk−1, zk)‖⁻¹ (yk, zk) → (ȳ, z̄)

with ‖(ȳ, z̄)‖ = 1. Dividing (2.23a) and (2.23c) by ‖(yk − ρk−1 λk−1, zk)‖ and letting k ∈ K″ → ∞ yields ∇c(x̄)ȳ − z̄ = 0 and X̄Z̄e = 0, since ρk and εk converge to zero for k ∈ K″ → ∞. Because of ‖(ȳ, z̄)‖ = 1 and ‖c(x̄)‖ = 0, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold. ⊓⊔

We next analyze the convergence if the penalty parameter ρk is bounded away from zero but the duals (λk, νk) are updated infinitely many times.

Lemma 3.3 Suppose Assumptions G hold. If ρk is updated finitely many times and (λk, νk) is updated infinitely many times, then there exists an index set K such that one of the following holds:

1. The sequence {(yk, zk)}K is bounded and {(xk, λk, νk)}K converges to a first-order optimal point of problem (1.2).
2. The sequence {(yk, zk)}K is unbounded and {xk}K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

Proof Let K be an index set such that (λk, νk) = (yk, zk)/ρk for all k ∈ K. The index set K is infinite because (λk, νk) is updated infinitely many times by assumption. This implies that (2.23) must be satisfied for all k ∈ K. With Assumption G2 and the assumption that ρk is updated finitely many times, there exists an index set K′ ⊆ K such that xk → x̄ > 0 and ρk → ρ̄ for k ∈ K′ → ∞. Furthermore, we have ‖c(x̄)‖ = 0, since otherwise Lemma 3.1 would be contradicted. We now have to distinguish between two cases:

Case {(yk, zk)}K′ is bounded: Then there exists an index set K″ ⊂ K′ such that (yk, zk) → (ȳ, z̄) for k ∈ K″ → ∞. This implies λk → λ̄ := ȳ/ρ̄ and νk → ν̄ := z̄/ρ̄ for k ∈ K″ → ∞. Dividing (2.23a) and (2.23c) by ρk−1 and letting k → ∞ yields

    ∇f(x̄) + (1/ρ̄) ∇c(x̄) ȳ − (1/ρ̄) z̄ = 0,
    (1/ρ̄) X̄ z̄ − µe = 0.

Together with c(x̄) = 0 it follows that (x̄, λ̄, ν̄) is a first-order optimal point of (1.2).

Case {(yk, zk)}K′ is unbounded: Then there exists an infinite index set K″ ⊂ K′ such that ‖(yk, zk)‖ > 0 and (yk, zk)/‖(yk, zk)‖ → (ȳ, z̄) for k ∈ K″ → ∞ with ‖(ȳ, z̄)‖ = 1. Dividing (2.23a) and (2.23c) by ‖(yk, zk)‖ and letting k ∈ K″ → ∞ then yields ∇c(x̄)ȳ − z̄ = 0 and X̄Z̄e = 0. Because of ‖(ȳ, z̄)‖ = 1 and c(x̄) = 0, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold. ⊓⊔

Next, we analyze the situation in which the filter accepts the trial iterates infinitely many times.

Lemma 3.4 Suppose Assumptions G hold. Let the penalty parameter ρk be updated finitely many times and let all feasible limit points satisfy the MFCQ. If the filter is augmented infinitely many times, then the multipliers {(λk, νk)} are updated infinitely many times.

Proof Let K be an index set of iterations at which the filter accepts the trial iterate, i.e. it is augmented, and for which xk → x̄ and ρk = ρ̄ hold for k ∈ K → ∞. Such a set exists due to Assumption G2 and the assumption that ρk is updated only finitely many times.

We next show by contradiction that δk of (2.14) converges to zero. This part of the proof is similar to [36, Theorem 5.1]. Assume that there exists an infinite K′ ⊂ K such that δk ≥ ε > 0 for k ∈ K′. Due to Assumptions G1 and G2 the sequences {|f(xk)|}K′ and {‖c(xk)‖}K′ are bounded. This implies that the area in which the filter entries (f(xk), ‖c(xk)‖) for k ∈ K′ are located is bounded by F̄ := {(f, θ) | fL ≤ f ≤ fU, 0 ≤ θ ≤ θU}, where fL and fU are the lower and upper bounds of {f(xk)}K′, respectively, and θU is the upper bound of {‖c(xk)‖}K′. With every filter augmentation an area Fk+1 \ Fk is added to the filter, which is at least of size δk² ≥ ε². Because of the monotonicity Fk ⊂ Fk+1, this contradicts the boundedness of F̄. It follows δk → 0 and, thus,

    ‖ρ̄ ∇f(xk) + ∇c(xk) yk − zk‖ → 0    (3.1a)
    ‖c(xk)‖ → 0    (3.1b)
    ‖zk − ρ̄ µ Xk⁻¹ e‖ → 0.    (3.1c)

We now have to distinguish between two cases:

Case {(yk, zk)}K′ is bounded: Then, dividing (3.1a) by ρ̄, multiplying (3.1c) by ρ̄⁻¹ Xk, which is bounded, and letting k ∈ K′ → ∞ in (3.1) yields that (2.23) will eventually be satisfied for every εk > 0. Since ρk is updated only finitely many times by assumption, (2.24) must be violated and (2.25) holds. This implies that {(λk, νk)}K′ is updated infinitely many times.

Case {(yk, zk)}K′ is unbounded: Then there exists an infinite index set K″ ⊂ K′ such that ‖(yk, zk)‖ > 0 and (yk, zk)/‖(yk, zk)‖ → (ȳ, z̄) for k ∈ K″ → ∞ with ‖(ȳ, z̄)‖ = 1. Dividing (3.1a) by ‖(yk, zk)‖, multiplying (3.1c) by ‖(yk, zk)‖⁻¹ Xk and letting k ∈ K″ → ∞ in (3.1) yields

    ∇c(x̄) ȳ − z̄ = 0,  c(x̄) = 0  and  X̄ z̄ = 0.

Because of ‖(ȳ, z̄)‖ = 1, it follows that (x̄, ȳ, z̄) is a Fritz-John point of problem (1.1) at which the MFCQ fails to hold, a contradiction. ⊓⊔

In the remainder of the global convergence analysis we study the case in which both the penalty parameter ρk and the filter are updated only finitely many times.

Lemma 3.5 Suppose Assumptions G hold. Further assume that the penalty parameter ρk and the duals (λk, νk) are updated finitely many times and that the filter is augmented finitely many times. Then {xk} is bounded away from zero and {zk} is bounded.

Proof The proof is by contradiction and similar to [10, Lemma 3.7] since λk is assumed to be bounded. Let K be an infinite index set of iterations at which the trial step is accepted by the Armijo condition (2.10). Now assume that there exists an index j ∈ {1, . . . , n} such that xk^(j) ↓ 0 for k ∈ K → ∞.

There exists an index k0 and constants ρ̄ and λ̄ such that ρk = ρ̄ and λk = λ̄ hold for k ∈ K, k ≥ k0. Then, on the one hand, for k ≥ k0 it follows from the Armijo condition (2.10) that

    Φµ,ρk,λk,τ̄(xk) = Φµ,ρ̄,λ̄,τ̄(xk) ≤ Φµ,ρ̄,λ̄,τ̄(xk0).    (3.2)

On the other hand, however, as {|f(xk)|}K and {‖c(xk)‖}K are bounded due to Assumptions G1 and G2, it follows that {Φµ,ρ̄,λ̄,τ̄(xk)}K → ∞ due to the barrier term, a contradiction to (3.2). Thus, {xk} is bounded away from zero, and together with (2.22) it follows that {zk} is bounded. ⊓⊔

Lemma 3.6 Suppose Assumptions G hold, the penalty parameter ρk and the duals (λk, νk) are updated finitely many times and the filter is augmented finitely many times. Further assume that all feasible limit points satisfy the MFCQ. Then the sequences {(xk, yk, zk)}, {(∆xk, ∆yk, ∆zk)} and {(∆x̂k, ∆ŷk)} are bounded.

Proof Consider the linear equation system

    ρk⁻¹ Hk ∆xk + ∇c(xk) λk′ − νk′ = −∇f(xk)    (3.3a)
    ∇c(xk)ᵀ ∆xk − ρk σk λk′ = −c(xk) − ρk σk λk    (3.3b)
    νk′ = µ Xk⁻¹ e − ρk⁻¹ Xk⁻¹ Zk ∆xk    (3.3c)

which is equal to (2.5) and to the linear equation system in [12, Section 1.1] perturbed by (0_{1×n}, −σk ρk λkᵀ)ᵀ and scaled with ρk. Since ρk is bounded away from zero and {Hk} is bounded by Assumption G3, the matrix ρk⁻¹(Hk + Xk⁻¹ Zk) is bounded by Lemma 3.5. Furthermore, due to the regularization strategy in Step A-6, condition (2.7) holds. Then, since the right-hand side of (3.3) is bounded due to Assumptions G1 and G2 and the assumption that λk is updated finitely many times, we can apply the proof of Lemma 3.3 of [12], which shows the boundedness of the sequences {(xk, yk, zk)}, {(∆xk, ∆yk, ∆zk)} and {(∆x̂k, ∆ŷk)}. ⊓⊔
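The inertia-correction mechanism of Step A-6, on which the argument above relies, can be illustrated on a condensed saddle-point system. This is a sketch under simplifying assumptions: the block matrix below is a generic stand-in for Mk rather than the exact matrix of (2.5)/(3.3), the right-hand side is schematic, and the inertia is obtained from eigenvalues purely for illustration (a production solver reads it off the LDLᵀ factorization, e.g. from HSL MA97).

```python
import numpy as np

def solve_regularized_kkt(H, A, x, z, rho, mu, grad_f, c,
                          kappa_H=1e-4, max_tries=30):
    """Sketch of Steps A-6/A-7: add delta*I to the (1,1) block of the
    saddle-point matrix until its inertia is (n, m, 0), then solve for
    a primal-dual step.  All block choices here are illustrative."""
    n, m = H.shape[0], A.shape[1]
    W = (H + np.diag(z / x)) / rho             # stand-in (1,1) block
    rhs = np.concatenate([-(grad_f - mu / x), -c])
    delta = 0.0
    for _ in range(max_tries):
        M = np.block([[W + delta * np.eye(n), A],
                      [A.T, np.zeros((m, m))]])
        eig = np.linalg.eigvalsh(M)            # inertia via eigenvalues
        pos, neg = int(np.sum(eig > 0)), int(np.sum(eig < 0))
        if pos == n and neg == m:              # In(M) = (n, m, 0)
            step = np.linalg.solve(M, rhs)
            return step[:n], step[n:], delta
        delta = kappa_H if delta == 0.0 else 10.0 * delta
    raise RuntimeError("regularization failed (cf. Step A-6)")
```

For a positive definite (1,1) block and full-column-rank A the inertia is already correct, so no regularization is triggered.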

Lemma 3.7 Suppose Assumptions G hold. Let the penalty parameter ρk be updated finitely many times and let all feasible limit points satisfy the MFCQ. If the filter is augmented finitely many times, then the multipliers {(λk, νk)} are updated infinitely many times.

Proof The proof is by contradiction. Assume that (λk, νk) are updated finitely many times. By Lemma 3.5, Lemma 3.6, Assumption G2 and the assumption that ρk is updated finitely many times, there exists an index set K such that xk → x̄ > 0, ∆xk → ∆x̄ and ρk = ρ̄ for k ∈ K → ∞. Then, by the same proof as in [12, Lemma 3.8], it can be shown that ∆x̄ = 0 for k ∈ K → ∞. This holds analogously in case the second-order-correction step ∆x̂k is used. Due to the boundedness of all components in (2.5) from Assumptions G1 and G2 and Lemmas 3.5 and 3.6, this implies that the left-hand sides of (2.23) converge to zero, i.e. for every εk > 0 there is an iteration k ∈ K such that (2.23) holds.

Since ρk = ρ̄ for all k ∈ K, condition (2.24) must be violated and thus (2.25) must be satisfied. It follows that (λk, νk) is updated infinitely many times in K. ⊓⊔

Combining the results of Lemma 3.4 and Lemma 3.7 yields that if the penalty parameter ρk does not tend to zero, the iterates either converge to a Fritz-John point at which the MFCQ fails to hold or the algorithm updates the Lagrangian multipliers (λk, νk) infinitely many times. The latter, together with Lemma 3.3, gives convergence to a first-order optimal point. However, if ρk tends to zero but the constraint violation does not, then by Lemma 3.2 the algorithm converges to a first-order optimal point of the feasibility problem (1.4). The global convergence result of Algorithm A for a fixed barrier parameter µ is formally stated in the following theorem.

Theorem 3.8 Suppose Assumptions G hold and Algorithm A generates an infinite sequence of iterates. Then there exists an index set K with one of the following:

1. The penalty parameter {ρk}K tends to zero, the multipliers {(λk, νk)}K are updated finitely many times and {(xk, zk/τ̄)}K converges to a KKT point (x̄, z̄/τ̄) of the feasibility problem (1.4) that is infeasible for (1.1). The sequence {yk}K converges to τ̄ c(x̄)/‖c(x̄)‖.
2. The penalty parameter {ρk}K is bounded away from zero, the multipliers {(λk, νk)}K are updated infinitely many times and {(xk, λk, νk)}K converges to a first-order optimal point of problem (1.2).
3. The sequence {xk}K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

3.2 Global Convergence of the Outer Algorithm B

The global convergence result of Algorithm B is the same as Theorem 3.13 of [10].

Theorem 3.9 Suppose Algorithm B generates an infinite sequence of iterates. Further suppose Assumption G2 holds with the same bound for every µj and Assumptions G1 and G3 hold for every µj. Let there be a sequence {µj} with µj → 0 and assume that Algorithm A terminates successfully for every µj. Then there exists an index set K for which one of the following holds:

1. The sequence {(λj, νj)}K is bounded and {(xj, λj, νj)}K converges to a first-order optimal point of (1.1).
2. The sequence {(λj, νj)}K is unbounded and {xj}K converges to a Fritz-John point of problem (1.1) at which the MFCQ fails to hold.

In summary, Algorithm B either terminates successfully with an optimal solution of (1.1), or the iterates converge within Algorithm A to an optimal solution of (1.4) that is infeasible for (1.1), or they converge to a Fritz-John point at which the MFCQ fails to hold. While the second outcome serves as a certificate of local infeasibility, the last outcome indicates that locally there may be no feasible first-order optimal point.

4 Fast Local Convergence

The local convergence of the proposed algorithm resembles that of Chen and Goldfarb [12], since both algorithms eventually switch to a regularized Newton method (in [12] by switching explicitly to that system, and here by an update of the Lagrangian multipliers λ). Therefore, we briefly show the two extensions of the local convergence analysis of [12] that are needed to cover the presented penalty-interior-point algorithm based on the augmented Lagrangian function: first, the acceptance of the full second-order-correction step by the Armijo condition (2.19), and second, the update of the Lagrangian multipliers (λ, ν) in every iteration, both near the optimal solution.

Let (x̄, λ̄, ν̄) be a first-order optimal point of problem (1.1). Then we make the following assumptions for the local convergence analysis:

Assumptions L

L1. The functions f and c are real valued and twice continuously differentiable, and the Hessian matrices ∇²f(x) and ∇²c^(i)(x) for i = 1, . . . , m are locally Lipschitz continuous at x̄.
L2. The linear independence constraint qualification (LICQ) holds: the gradients of the active constraints, i.e. ei, i ∈ B := {j = 1, . . . , n | x̄^(j) = 0}, and ∇c^(i)(x̄), i = 1, . . . , m, are linearly independent.
L3. The second-order sufficient conditions (SOSC) hold: there exists v > 0 such that

    dᵀ ( ∇²xx f(x̄) + Σ_{i=1}^m ȳ^(i) ∇²xx c^(i)(x̄) ) d ≥ v ‖d‖²

for all d ∈ Rⁿ with d^(i) = 0 for all i ∈ B and ∇c(x̄)ᵀ d = 0.
L4. Strict complementarity holds: x̄ + z̄ > 0.

4.1 Local Convergence of Algorithm A

In the case of convergence to a first-order optimal point, we know from Theorem 3.8 that the penalty parameter ρk is bounded away from zero. For simplicity we assume w.l.o.g. that ρk = 1 and τk = τ̄ for all k large enough. This also implies that λk = yk when updated in an iteration k. In this section we consider µ to be sufficiently small and w(µ) := (x(µ), y(µ), z(µ)) to be an optimal solution of (1.2) in a neighborhood of w̄ that converges to w̄ for µ → 0. Due to the LICQ of Assumption L2 the multipliers of w(µ) are unique. For improved readability, we introduce the notation wk := (xk, yk, zk) for the iterate and, for the steps, ∆wk := (∆xk, ∆yk, ∆zk) and ∆ŵk := (∆x̂k, ∆ŷk, ∆ẑk). The first result shows the quadratic convergence rate of the steps wk + ∆wk and wk + ∆ŵk towards w(µ). It is proven in [12, Theorem 5.1]¹. Let N(w̄) be

¹ Note that if λk = yk/ρk and ρk = 1, the step (∆xk, ∆yk, ∆zk) equals the step (∆x̃k, ∆λ̃k, ∆ỹk) of [12].

a neighborhood of w̄ such that ‖Mk⁻¹‖ ≤ M for all w ∈ N(w̄) and some M ∈ R, see [12].

Lemma 4.1 Suppose Assumptions L hold. If wk ∈ N(w̄) and λk = yk/ρk, then the following is satisfied:

1. ‖wk + ∆wk − w(µ)‖ = O(‖wk − w(µ)‖²)
2. ‖wk + ∆ŵk − w(µ)‖ = O(‖wk − w(µ)‖²)
3. ‖∆wk‖ = Ω(‖wk − w(µ)‖)
4. ‖∆ŵk‖ = Ω(‖wk − w(µ)‖)
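The quadratic rate asserted in statements 1 and 2 can be observed numerically on a deliberately tiny model. The scalar equation x² = µ with root x(µ) = √µ below is only a toy stand-in for the perturbed KKT system, not the paper's actual system; the point is simply that a full Newton step squares the error to x(µ).

```python
def newton_errors(x0, mu=1e-2, iters=5):
    """Errors |x_k - sqrt(mu)| of full Newton steps on x^2 - mu = 0,
    illustrating ||w_k + dw_k - w(mu)|| = O(||w_k - w(mu)||^2)."""
    x_mu = mu ** 0.5
    errs, x = [], x0
    for _ in range(iters):
        x = x - (x * x - mu) / (2.0 * x)   # full Newton step w_k + dw_k
        errs.append(abs(x - x_mu))
    return errs
```

Starting at x0 = 0.2 with µ = 0.01, the error sequence drops roughly like e, e², e⁴, …, i.e. each error is bounded by a constant times the square of its predecessor.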

Eventually, a full second-order-correction step is accepted by the Armijo condition; this is formally stated in the following lemma.

Lemma 4.2 Suppose Assumptions L hold. Let ω ∈ (0, 1/2). If µ is sufficiently small, ‖wk − w(µ)‖ = o(µ) and λk = yk/ρk, then

    Φµ(xk + ∆x̂k) − Φµ(xk) ≤ ω ∇Φµ(xk)ᵀ ∆xk.

Proof Because µ is sufficiently small and ‖wk − w(µ)‖ = o(µ), it holds that wk ∈ N(w̄). The proof of this lemma is similar to [12, Theorem 5.4] and we can use its result for estimating the change in the barrier function [12, Theorem 5.4, equations (5.19)–(5.37)]. It states that for ‖c(xk)‖ > 0

    φµ(xk + ∆x̂k) − φµ(xk) ≤ (1/2 − ω̄) ∇φµ(xk)ᵀ ∆xk + (1/2) c(xk)ᵀ (yk + ∆yk)
                             + ω̄ τ̄ ‖c(xk)‖ + ω̄ ykᵀ c(xk) + o(‖c(xk)‖) + o(‖∆xk‖²)    (4.1)

and similarly for ‖c(xk)‖ = 0

    φµ(xk + ∆x̂k) − φµ(xk) ≤ (1/2 − ω̄) ∇φµ(xk)ᵀ ∆xk + o(‖∆xk‖²),    (4.2)

where ω̄ is an arbitrary constant satisfying ω̄ ∈ (0, 1/2 − ω). Furthermore, it is shown that

    ‖c(xk + ∆x̂k)‖ = o(‖c(xk)‖) + o(‖∆xk‖²),    (4.3)

which is a consequence of Lemma 4.1. Now, for ‖c(xk)‖ > 0, combining (4.1) and (4.3) with Proposition 2.3 and using λk = yk and ‖∆wk‖ = o(µ), which is a consequence of Lemma 4.1 and ‖wk − w(µ)‖ = o(µ), it follows that

    Φµ(xk + ∆x̂k) − Φµ(xk)
      = φµ(xk + ∆x̂k) − φµ(xk) + λkᵀ (c(xk + ∆x̂k) − c(xk)) + τ̄ (‖c(xk + ∆x̂k)‖ − ‖c(xk)‖)
      ≤ (1/2 − ω̄) ∇φµ(xk)ᵀ ∆xk + (1/2) c(xk)ᵀ yk + ω̄ τ̄ ‖c(xk)‖ + ω̄ ykᵀ c(xk)
        − λkᵀ c(xk) − τ̄ ‖c(xk)‖ + o(‖c(xk)‖) + o(‖∆xk‖²)
      = (1/2 − ω̄) ∇Φµ(xk)ᵀ ∆xk − (1/2 − ω̄) λkᵀ c(xk) + (1/2 − ω̄) ykᵀ c(xk)
        − (1/2) τ̄ ‖c(xk)‖ + o(‖c(xk)‖) + o(‖∆xk‖²)
      ≤ (1/2 − ω̄) ∇Φµ(xk)ᵀ ∆xk + o(‖∆xk‖²).

In the case of ‖c(xk)‖ = 0 the same result follows directly from Proposition 2.3 and (4.2). By Assumption L3 and Proposition 2.3, there exists v > 0 such that

    (1/2 − ω̄ − ω) ∇Φµ(xk)ᵀ ∆xk + o(‖∆xk‖²) ≤ −(1/2 − ω̄ − ω) v ‖∆xk‖² + o(‖∆xk‖²) ≤ 0,

since ω̄ < 1/2 − ω. It then follows Φµ(xk + ∆x̂k) − Φµ(xk) ≤ ω ∇Φµ(xk)ᵀ ∆xk. ⊓⊔

The next result shows that eventually the Lagrangian multipliers (λk, νk) are updated in every iteration and the step computation resembles a regularized Newton method applied to the optimality conditions of (1.2).

Lemma 4.3 Suppose Assumptions L hold. Let εk+1 = Ω(‖wk − w(µ)‖). If µ is sufficiently small, ‖wk − w(µ)‖ = o(µ) and λk = yk/ρk, then the following holds:

1. Eµ,ρ,τ,λ(xk + ∆xk, yk + ∆yk, zk + ∆zk) ≤ εk+1
2. Eµ,ρ,τ,λ(xk + ∆x̂k, yk + ∆ŷk, zk + ∆ẑk) ≤ εk+1
3. ‖yk + ∆yk − ρk λk‖ ≤ κy τ̄
4. ‖yk + ∆ŷk − ρk λk‖ ≤ κy τ̄

Proof Because µ is sufficiently small and ‖wk − w(µ)‖ = o(µ), it holds that wk ∈ N(w̄). First, we show that (2.25) will be satisfied. With Lemma 4.1 we have for wk + ∆wk, and analogously for wk + ∆ŵk,

    ‖yk + ∆yk − ρk λk‖ = ‖∆yk‖ = o(µ) ≤ κy τ̄.

Second, consider the conditions (2.23) for wk + ∆wk and recall that ρk = 1 for all k. Then, by Taylor's theorem, the linear equation system (2.5) and Lemma 4.1, it follows that

    ‖∇f(xk + ∆xk) + ∇c(xk + ∆xk)(yk + ∆yk) − (zk + ∆zk)‖
      = ‖∇f(xk) + ∇c(xk) yk − zk + Hk ∆xk + ∇c(xk) ∆yk − ∆zk‖ + O(‖∆wk‖²)
      = O(‖∆wk‖²)    (4.4)

    ‖c(xk + ∆xk) − τ̄⁻¹ ‖c(xk + ∆xk)‖ (yk + ∆yk − λk)‖
      ≤ ‖c(xk) − τ̄⁻¹ ‖c(xk)‖ ∆yk + ∇c(xk)ᵀ ∆xk‖ + O(‖∆wk‖²)
      = O(‖∆wk‖²)    (4.5)

    ‖(Xk + diag(∆xk))(zk + ∆zk) − µe‖
      = ‖Xk zk + diag(∆xk) zk + Xk ∆zk − µe‖ + O(‖∆wk‖²)
      = O(‖∆wk‖²).    (4.6)

Then, by Lemma 4.1, we have ‖∆wk‖² = o(‖∆wk‖) = o(‖wk − w(µ)‖) ≤ εk+1. This shows Eµ,ρ,τ,λ(xk + ∆xk, yk + ∆yk, zk + ∆zk) ≤ εk+1. For wk + ∆ŵk, (4.4) and (4.6) are analogous. We further have, by (2.16) and Taylor's theorem,

    ‖c(xk + ∆x̂k) − τ̄⁻¹ ‖c(xk)‖ ∆yk‖
      ≤ ‖c(xk) − τ̄⁻¹ ‖c(xk)‖ ∆yk + ∇c(xk)ᵀ ∆x̂k‖ + O(‖∆ŵk‖²)
      = ‖c(xk) − c(xk + α^x_max ∆xk) + α^x_max ∇c(xk)ᵀ ∆xk‖ + O(‖∆ŵk‖²)
      = O(‖∆ŵk‖²) + O(‖∆wk‖²)

and thus, by Lemma 4.1, Eµ,ρ,τ,λ(xk + ∆x̂k, yk + ∆ŷk, zk + ∆ẑk) ≤ εk+1. ⊓⊔

With the two modifications in Lemma 4.2 and Lemma 4.3, the following local convergence result for Algorithm A can be formulated. The proof is that of [12, Theorem 5.5], in which the reference to [12, Lemma 5.3] has to be replaced by Lemma 4.3 and [12, Theorem 5.4] by Lemma 4.2. From Theorem 3.8 we know that when converging to a first-order optimal point of (1.2), the Lagrangian multipliers (λk, νk) are updated infinitely many times. This eventually fulfills the conditions of Lemma 4.3, as also shown in [12, Theorem 5.5].

Theorem 4.4 Suppose Assumptions L hold and Algorithm A generates an infinite sequence of iterates {wk} = {(xk, yk, zk)} with an accumulation point w(µ). Assume that µ is sufficiently small, w(µ) is sufficiently close to w̄ and ρk = ρ̄ for k large enough. If ‖wk − w(µ)‖ = o(µ), then {wk} converges quadratically, i.e. ‖wk+1 − w(µ)‖ = O(‖wk − w(µ)‖²).

4.2 Local Convergence of Algorithm B

Since the outer Algorithm B equals that of Chen and Goldfarb [12], we cite its local convergence result here, see Theorem 5.9 of [12]. The following theorem states the superlinear convergence of the proposed algorithm.

Theorem 4.5 Suppose Assumptions L and the assumptions of Theorem 3.9 hold. Assume that Algorithm B produces an infinite sequence of iterates {wj} with wj = (xj, λj, νj) and that w̄ is an accumulation point of {wj}. If wj is sufficiently close to w̄, then wj+1 can be determined by just one iteration of Algorithm A and ‖wj+1 − w̄‖ = o(‖wj − w̄‖).

5 Numerical Results

The augmented Lagrangian based penalty-interior-point algorithm has been implemented in the nonlinear programming solver WORHP². In this section we compare its performance with the sequential quadratic programming (SQP) method of WORHP 1.10 [9] and the interior-point solver IPOPT 3.12 [45]. In the following we refer to the penalty-interior-point algorithm as WORHP-IP and to the SQP method of WORHP as WORHP-SQP. The experiments are based on the CUTEst test set [28].

5.1 Implementation Details

In the following we briefly describe the implementation of the augmented Lagrangian based penalty-interior-point algorithm WORHP-IP. To handle nonlinear inequality constraints or general bound constraints, i.e. problems of the type

    min_{x∈Rⁿ} f(x)
    s.t. cL ≤ c(x) ≤ cU
         xL ≤ x ≤ xU

with bounds xL, xU ∈ Rⁿ and cL, cU ∈ Rᵐ, we use the same strategy as in [45, Section 3.4]; in particular, general inequality constraints are transformed to equality constraints by introducing slack variables. At the beginning of the optimization the initial guess x0 provided by the user or the test set is slightly shifted into the interior of the feasible region with respect to the bound constraints. To this end, we adapted the strategy of [45, Section 3.5] and set for lower bound constraints

    x0^(i) ← max{ x0^(i), xL^(i) + 0.02 max{1, |xL^(i)|} }.
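The shift of the initial guess can be sketched as follows. The symmetric treatment of the upper bounds is assumed analogous to the lower-bound rule quoted above, and the safeguards of [45, Section 3.5] for very narrow intervals between xL and xU are omitted.

```python
import numpy as np

def shift_into_interior(x0, xl, xu, kappa=0.02):
    """Move the user initial guess strictly inside its bounds: first push it
    above the (shifted) lower bound, then below the (shifted) upper bound.
    Upper-bound handling mirrors the lower-bound rule (an assumption)."""
    x = np.maximum(x0, xl + kappa * np.maximum(1.0, np.abs(xl)))
    x = np.minimum(x, xu - kappa * np.maximum(1.0, np.abs(xu)))
    return x
```

For example, a guess sitting exactly on the lower bound 0 is moved to 0.02, and a guess far below the feasible box is pulled to the same shifted value.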

² Further information and download of WORHP on www..de.

    Parameter  Value     Parameter  Value     Parameter  Value
    β          5.0e-01   χρ         2.0e-01   χτ         2.0e+01
    χµ         2.0e-01   δµ         1.0e+01   ηmin       9.9e-01
    εtol       1.0e-06   γf         1.0e-05   γθ         1.0e-05
    κy         9.9e-01   κz         1.0e+10   κµ         1.5e+00
    κη         5.1e-01   τmax       1.0e+05   ω          1.0e-08

Table 5.1 Parameter values for the penalty-interior-point algorithm of WORHP

The procedure is applied analogously for upper bound constraints. The multipliers z0^(i) and ν0^(i) are set to 1, whereas y0^(i) and λ0^(i) are initialized by 0. In addition, we adapted the scaling procedure and the inertia correction strategy of [45, Section 3.1 and Section 3.8], the slack reset of [43] and the step size update based on a cubic interpolation of the merit function of [35]. If the latter fails, we fall back to the strategy described in Section 2.3.

The algorithm terminates successfully with an optimal solution if the optimality conditions are satisfied to a certain accuracy, i.e.

    ‖∇f(xk) + ∇c(xk) λk − νk‖∞ ≤ εtol
    ‖c(xk)‖∞ ≤ εtol
    ‖Xk νk‖∞ ≤ εtol.
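The three stopping tests can be expressed directly in code. This is a straightforward transcription of the conditions above, with ∇c(x) stored as an n×m matrix so that the product ∇c(x)λ matches the paper's notation; the function names are illustrative.

```python
import numpy as np

def is_optimal(x, lam, nu, grad_f, jac_c, c, eps_tol=1e-6):
    """Unscaled stopping test: stationarity, feasibility and
    complementarity measured in the max norm, as stated above."""
    stat = np.linalg.norm(grad_f + jac_c @ lam - nu, ord=np.inf)
    feas = np.linalg.norm(c, ord=np.inf)
    comp = np.linalg.norm(x * nu, ord=np.inf)  # X_k nu_k componentwise
    return max(stat, feas, comp) <= eps_tol
```

Note that, as discussed below, these quantities are deliberately not scaled by ρ.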

Note that we consider the optimality conditions of the original unscaled problem despite the scaling of the objective function and the constraints explained above. Furthermore, the termination conditions are not scaled by ρ (compare (2.23)), which is sometimes the case for similar penalty algorithms and which would weaken the termination condition. The algorithm terminates with a Fritz-John point if ‖c(xk)‖ ≤ 10⁻⁹ and the linear equation system is numerically singular despite a Hessian regularization.

In our implementation we choose the following parameter values. We initialize µ0 with 0.1, ρ0 with 1 and τ0 by

    τ0 = 100 min{ 10⁵, max{ 1, |φµ(x0)| / max{1, ‖c(x0)‖}, ‖∇f(x0)‖∞ / max{1, ‖∇c(x0)‖∞} } }.
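As a worked instance of this formula, the helper below evaluates τ0; the Euclidean norm for c(x0) and the max norm for the derivative terms are assumed from the notation of the reconstructed display.

```python
import numpy as np

def initial_tau(phi_mu_x0, c0, grad_f0, jac_c0):
    """Initial penalty parameter tau_0 as defined above."""
    return 100.0 * min(
        1e5,
        max(1.0,
            abs(phi_mu_x0) / max(1.0, np.linalg.norm(c0)),
            np.linalg.norm(grad_f0, ord=np.inf)
            / max(1.0, np.linalg.norm(jac_c0, ord=np.inf))),
    )
```

For instance, with |φµ(x0)| = 50, ‖c(x0)‖ = 5 and ‖∇f(x0)‖∞ = 8 the inner maximum is 10, giving τ0 = 1000.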

For the tolerance εk we choose

    εk := 0.9 ρk max{ Eµ,ρ,τ,λ(xi, yi, zi) | i ∈ Kℓ } + 10 ρk (‖c(xk)‖ + 1)/(k τk),

where Kℓ is an index set of the last ℓ iterations at which the Lagrangian multiplier λk has been updated. In our experiments we choose ℓ = 4. It is easy to see that εk → 0 for ρk → 0. If the Lagrangian multipliers are updated infinitely many times, then εk → 0 follows from [2, Lemma 3.1]. The remaining parameter values are listed in Table 5.1.
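The tolerance rule is a one-line computation once the recent optimality errors are tracked; the helper below is a direct transcription, with `recent_errors` holding the values of E at the last ℓ multiplier-update iterations.

```python
def tolerance_eps(rho_k, tau_k, k, c_norm_k, recent_errors):
    """eps_k as defined above.  recent_errors: E_{mu,rho,tau,lambda} at the
    last ell iterations with a multiplier update (ell = 4 in the paper)."""
    return (0.9 * rho_k * max(recent_errors)
            + 10.0 * rho_k * (c_norm_k + 1.0) / (k * tau_k))
```

Both terms carry a factor ρk, which makes the claimed limit εk → 0 for ρk → 0 immediate.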

    IPOPT                                  WORHP-SQP
    Parameter        Value    (Default)    Parameter           Value  (Default)
    linear_solver    ma97     (ma27)       AcceptTolFeas       1e-06  (1e-03)
    max_cpu_time     1.8e+03  (1e+06)      AcceptTolOpti       1e-06  (1e-03)
    tol              1e-06    (1e-08)      MaxIter             3000   (10000)
    dual_inf_tol     1e-06    (1e+00)      KeepAcceptableSol   false  (true)
    constr_viol_tol  1e-06    (1e-04)      LowPassFilter       false  (true)
    compl_inf_tol    1e-06    (1e-04)      ScaledKKT           false  (true)
    acceptable_iter  0        (15)

Table 5.2 Parameter values different from the default for IPOPT and WORHP-SQP.

5.2 Experiments with the CUTEst test set

For the numerical results we use all instances of the CUTEst test set [28], giving a total of 1257 problems. Note that, because we do not exclude any problems and the test set also contains infeasible problems, the solvers will not be able to solve 100% of the instances to optimality.

All experiments are run on a machine with Intel Xeon E5-2637 v3 processors with a clock speed of 3.50 GHz running Ubuntu 16.04.2 LTS. Despite the multiple-core architecture no parallelization is enabled for any solver. Furthermore, for a fair comparison all solvers use the same linear solver (HSL MA97 [32]) and the same basic linear algebra subroutines (Reference BLAS 3.7³). Because the terminations of IPOPT, WORHP-SQP and WORHP-IP differ when it comes to scalings of the optimality conditions and the definition of acceptable solutions, both features are disabled and an absolute error of the optimality conditions of 10⁻⁶ is requested for an optimal solution. We note that all solvers may perform slightly better with default parameters. Finally, we apply a time limit of 30 minutes and an iteration limit of 3000. The changes we made to the default parameters of IPOPT and WORHP-SQP are listed in Table 5.2.

For comparing the different results we use the performance profiles proposed by Dolan and Moré [16]. We also provide shifted geometric means with respect to the CPU time with a shift of 10.0 on the subset where all solvers terminate successfully. For measuring time we do not use the solver-internal timing functionalities, as they may differ between solvers. Time is user time measured by the command time.
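The aggregation measure used throughout the comparisons can be sketched in a few lines. The formula exp((1/n) Σ log(tᵢ + s)) − s with shift s = 10.0 is the usual definition of the shifted geometric mean and is assumed here, since the paper does not restate it.

```python
import math

def shifted_geometric_mean(times, shift=10.0):
    """Shifted geometric mean of CPU times over the subset of problems
    solved by all solvers; the shift damps the influence of tiny times."""
    n = len(times)
    return math.exp(sum(math.log(t + shift) for t in times) / n) - shift
```

The shift makes the measure robust to near-zero timings: for identical times it reduces to that time, and it is never larger than the arithmetic mean.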

5.2.1 Experiments with different solver options

First, we present numerical results for different solver options of WORHP-IP. Since our augmented Lagrangian based penalty-interior-point algorithm with a filter line search also contains the ℓ2-penalty-interior-point algorithm and the merit function based line search as special cases, we compare the different combinations of these options in Figure 5.1.

The augmented Lagrangian based penalty function with the combination of the filter and the merit function (standard WORHP-IP configuration) can solve

³ See netlib.org/blas.

[Figure 5.1: three performance-profile panels (CPU time, iterations, function evaluations); vertical axis: % of problems, horizontal axis: not more than 2^x times worse than best solver.]

Fig. 5.1 Performance profiles comparing CPU time, number of iterations and number of function evaluations for different line search and penalty options of WORHP-IP. fil+mer: filter and merit function; mer: merit function only; aug: augmented Lagrangian penalty function; l2: ℓ2-penalty function.

1016 problems and the ℓ2-based penalty function with the same line search option solves 1018 problems. If only the merit function is used, the results are worse: 913 solved problems for the augmented Lagrangian penalty function and 900 solved problems for the ℓ2-penalty function. There are 828 instances that are solved by every algorithm option. The shifted geometric means are 1.80 (augmented Lagrangian with filter and merit function), 1.85 (ℓ2-penalty with filter and merit function), 2.07 (augmented Lagrangian with merit function) and 2.17 (ℓ2-penalty with merit function). The results clearly show the dominance of the filter line search option. The augmented Lagrangian based penalty function is only slightly better than the ℓ2-penalty function, a result that can be expected from theory: the augmented Lagrangian penalty function was introduced to improve the local convergence of the algorithm, an effect that is usually dominated by the global convergence properties in the performance profiles. In addition, the global convergence can differ if an update of the Lagrangian multipliers occurs rather early in the optimization.

5.2.2 Experiments with the standard version of the CUTEst test set

The second experiment compares the different solvers IPOPT, WORHP-SQP and WORHP-IP on the standard version of the CUTEst test set. Figure 5.2 presents the performance profiles. WORHP-IP solved 1016 problems, WORHP-SQP solved 985 and IPOPT 1030 of the overall 1257 problems. There are 865 instances that are solved by every solver. The shifted geometric means are 1.25 (WORHP-IP), 3.60 (WORHP-SQP) and 1.29 (IPOPT). The results show that WORHP-IP and IPOPT perform similarly well.

Fig. 5.2 Performance profiles comparing CPU time, number of iterations and number of function evaluations for WORHP-IP, WORHP-SQP and IPOPT on the standard version of the CUTEst test set.

While IPOPT solves slightly more problems, WORHP-IP is slightly more efficient. Although WORHP-SQP needs fewer iterations and fewer function evaluations, it is a bit slower on the CUTEst test set. The reason is that in every iteration a quadratic program has to be solved, whereas WORHP-IP and IPOPT only need the solution of a linear system. However, this result indicates that the CPU time performance profile may change in favor of WORHP-SQP for a test set where function evaluations, especially gradient and Hessian evaluations, are expensive. If we compare WORHP-IP and IPOPT alone, WORHP-IP is faster than IPOPT in 29.83% of the 1257 cases, equally fast in 35.64% and slower in 21.80%. Since IPOPT is a well established and well developed solver, these results show the high efficiency of the proposed method.
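The performance profiles in Figures 5.1–5.3 follow Dolan and Moré [16]: for each solver they plot the fraction of problems solved within a factor τ of the best solver. A small sketch of how such profile data can be computed; the list-of-lists input format and the function name are illustrative choices, not the paper's implementation:

```python
INF = float("inf")

def performance_profile(costs, taus):
    # costs[s][p]: cost (e.g. CPU time) of solver s on problem p,
    # INF if the solver failed; taus: factors, e.g. [2**x for x in range(9)].
    n_solvers, n_probs = len(costs), len(costs[0])
    best = [min(costs[s][p] for s in range(n_solvers)) for p in range(n_probs)]
    profile = []
    for s in range(n_solvers):
        ratios = [costs[s][p] / best[p] for p in range(n_probs)]
        # A failed run has ratio INF (or NaN) and never counts as solved.
        profile.append([sum(r <= t for r in ratios) / n_probs for t in taus])
    return profile
```

The x-axis label "not more than 2x times worse than best solver" in the figures corresponds to evaluating such a profile at τ = 2^x.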

5.2.3 Experiments with an infeasible version of the CUTEst test set

The last experiment tests the performance of IPOPT and WORHP-IP on an infeasible version of the CUTEst test set. To this end we consider the subset of all constrained problems of the CUTEst test set (897 instances) and added for every constraint c^(i)(x) ≤ c_U^(i) the constraint c_U^(i) + 1 ≤ c^(i)(x), for every c_L^(i) ≤ c^(i)(x) the constraint c^(i)(x) ≤ c_L^(i) − 1 and for every c_L^(i) ≤ c^(i)(x) ≤ c_U^(i) the constraint c_U^(i) + 1 ≤ c^(i)(x) ≤ 2c_U^(i) − c_L^(i) + 1. This test set is difficult not just because of the infeasibility that has to be detected but especially due to the lack of constraint qualification. Note that by construction there exist 383 problems that contain more equality constraints than optimization variables, a situation IPOPT cannot handle. Therefore, these problems are removed from the test set and we end up with 514 test problems.

Fig. 5.3 Performance profiles comparing CPU time, number of iterations and number of function evaluations for WORHP-IP, WORHP-IPτ and IPOPT on an infeasible version of the CUTEst test set.

Unfortunately, results for WORHP-SQP cannot be provided due to memory issues when solving the instances in combination with HSL MA97. For WORHP-IP we also included the solver option in which the penalty parameter appears only in front of the measure of constraint violation, i.e. ρ is fixed to 1 and only τ is updated. We refer to this option as WORHP-IPτ. In these experiments we consider a problem successfully solved if the solver returns a certificate of infeasibility, in case of WORHP-IP a point that satisfies the optimality conditions of (1.4) to εtol. Of the 514 instances WORHP-IP solved 474, WORHP-IPτ 434 and IPOPT 312. There are 276 problems for which all solvers return a certificate of infeasibility. The performance profiles are shown in Figure 5.3. The shifted geometric means are 4.09 (WORHP-IP), 4.11 (WORHP-IPτ) and 11.57 (IPOPT). The results, especially the numbers of successful terminations, clearly indicate that for infeasible problems it is preferable to place the penalty parameter in front of the objective function instead of in front of the measure of constraint violation in the merit function (or at least to eventually switch to that option, as WORHP-IP does). When comparing WORHP-IP and IPOPT alone, WORHP-IP is faster than IPOPT in 72.18% of the 514 cases, equally fast in 13.81% and slower in just 8.75%. This clearly shows the high efficiency of the proposed algorithm on infeasible problems.
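The constraint modification used to build the infeasible test set can be sketched as follows; the function name, the (cl, cu) bound representation, and the use of ±inf for one-sided constraints are illustrative assumptions, not the paper's actual implementation:

```python
import math

def infeasible_bounds(cl, cu):
    """Given original bounds cl <= c(x) <= cu (with +-inf for one-sided
    constraints), return the bounds of the additional constraint that
    renders the problem infeasible, following the construction in the text."""
    inf = math.inf
    if cl == -inf and cu < inf:   # c(x) <= cu      ->  cu + 1 <= c(x)
        return (cu + 1, inf)
    if cl > -inf and cu == inf:   # cl <= c(x)      ->  c(x) <= cl - 1
        return (-inf, cl - 1)
    if cl > -inf and cu < inf:    # cl <= c(x) <= cu -> cu+1 <= c(x) <= 2cu-cl+1
        return (cu + 1, 2 * cu - cl + 1)
    return (None, None)           # free constraint: left unchanged

# Example: a two-sided constraint 0 <= c(x) <= 3 gains 4 <= c(x) <= 7
print(infeasible_bounds(0.0, 3.0))  # -> (4.0, 7.0)
```

In each case the added interval lies strictly outside the original one, so no point can satisfy both constraints simultaneously, which makes the modified problem provably infeasible.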

6 Conclusions

We have developed a penalty-interior-point algorithm based on an augmented Lagrangian function but with an exact ℓ2-measure of the constraint violation. The line search combines a filter approach with the merit function for increased flexibility of the step acceptance. We have paid special attention to the handling of infeasible problems. In particular, the penalty parameter is located in front of the objective function and the filter line search does not need a feasibility restoration phase. We have shown that the method has the same global and local convergence properties as the ℓ2-penalty-interior-point algorithm of Chen and Goldfarb [12].

The numerical experiments tested our implementation of the proposed algorithm within the nonlinear programming solver WORHP on the full CUTEst test set and show its high efficiency: It is as efficient as IPOPT on feasible problems, but outperforms IPOPT if the problems are infeasible. The latter is an important feature, for example, for mixed-integer nonlinear programming, where within a branch-and-bound framework many infeasible subproblems may appear. In addition, the results indicate that the success of filter line search algorithms is based on the increased flexibility of the step acceptance and not on the general absence of a penalty parameter. Although an initial guess has to be chosen for the penalty parameter, this does not appear to be a disadvantage. Further improvements could be the adaptive update of the barrier and penalty parameters (cf. Curtis [15]) as well as of the Lagrangian multipliers. Since the proposed filter line search does not store filter entries that depend on any of the mentioned parameters, the algorithm is well prepared for this extension.

References

1. Armand, P., Benoist, J., Omheni, R., Pateloup, V.: Study of a primal-dual algorithm for equality constrained minimization. Computational Optimization and Applications 59(3), 405–433 (2014)
2. Armand, P., Omheni, R.: A globally and quadratically convergent primal-dual augmented Lagrangian algorithm for equality constrained optimization. Optimization Methods and Software 32(1), 1–21 (2017)
3. Armand, P., Omheni, R.: A mixed logarithmic barrier-augmented Lagrangian method for nonlinear optimization. Journal of Optimization Theory and Applications pp. 1–25 (2017)
4. Benson, H., Vanderbei, R., Shanno, D.: Interior-point methods for nonconvex nonlinear programming: Filter methods and merit functions. Computational Optimization and Applications 23(2), 257–272 (2002)
5. Benson, H.Y., Sen, A., Shanno, D.F.: Convergence analysis of an interior-point method for nonconvex nonlinear programming. Tech. rep. (2007)
6. Benson, H.Y., Shanno, D.F., Vanderbei, R.J.: A comparative study of large-scale nonlinear optimization algorithms. In: G. Di Pillo, A. Murli (eds.) High Performance Algorithms and Software for Nonlinear Optimization, Applied Optimization, vol. 82, pp. 95–127. Springer US (2003)
7. Boman, E.G.: Infeasibility and negative curvature in optimization. Ph.D. thesis, Stanford University (1999)
8. Byrd, R.H., Curtis, F.E., Nocedal, J.: Infeasibility detection and SQP methods for nonlinear optimization. SIAM Journal on Optimization 20(5), 2281–2299 (2010)
9. Büskens, C., Wassel, D.: The ESA NLP solver WORHP. In: G. Fasano, J.D. Pintér (eds.) Modeling and Optimization in Space Engineering, Springer Optimization and Its Applications, vol. 73, pp. 85–110. Springer New York (2013)
10. Chen, L., Goldfarb, D.: Interior-point ℓ2-penalty methods for nonlinear programming with strong global convergence properties. Mathematical Programming 108(1), 1–36 (2006)

11. Chen, L., Goldfarb, D.: On the fast local convergence of interior-point ℓ2-penalty methods for nonlinear programming. Tech. rep., IEOR Department, Columbia University, New York, NY 10027 (2006)
12. Chen, L., Goldfarb, D.: An interior-point piecewise linear penalty method for nonlinear programming. Mathematical Programming 128(1), 73–122 (2009)
13. Conn, A., Gould, N., Toint, P.: LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer Series in Computational Mathematics. Springer Berlin Heidelberg (2013)
14. Conn, A., Gould, N., Toint, P.: Trust-Region Methods. Society for Industrial and Applied Mathematics (2000)
15. Curtis, F.E.: A penalty-interior-point algorithm for nonlinear constrained optimization. Mathematical Programming Computation 4(2), 181–209 (2012)
16. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Mathematical Programming 91(2), 201–213 (2002)
17. Fletcher, R.: Penalty Functions, pp. 87–114. Springer Berlin Heidelberg (1983)
18. Fletcher, R.: An ℓ1 penalty method for nonlinear constraints. In: P.T. Boggs, R.H. Byrd, R.B. Schnabel (eds.) Numerical Optimization 1984, pp. 26–40. SIAM, Philadelphia (1985)
19. Fletcher, R., Leyffer, S.: Nonlinear programming without a penalty function. Mathematical Programming 91(2), 239–269 (2002)
20. Forsgren, A., Gill, P.E.: Primal-dual interior methods for nonconvex nonlinear programming. SIAM Journal on Optimization 8(4), 1132–1152 (1998)
21. Forsgren, A., Gill, P.E., Wright, M.H.: Interior methods for nonlinear optimization. SIAM Review 44(4), 525–597 (2002)
22. Gertz, E.M., Gill, P.E.: A primal-dual trust region algorithm for nonlinear optimization. Mathematical Programming 100(1), 49–94 (2004)
23. Gill, P.E., Robinson, D.P.: A primal-dual augmented Lagrangian. Computational Optimization and Applications 51(1), 1–25 (2010)
24. Gill, P.E., Robinson, D.P.: A globally convergent stabilized SQP method. SIAM Journal on Optimization 23(4), 1983–2010 (2013)
25. Goldfarb, D., Polyak, R., Scheinberg, K., Yuzefovich, I.: A modified barrier-augmented Lagrangian method for constrained minimization. Computational Optimization and Applications 14(1), 55–74 (1999)
26. Gould, N.I.M., Loh, Y., Robinson, D.P.: A filter method with unified step computation for nonlinear optimization. SIAM Journal on Optimization 24(1), 175–209 (2014)
27. Gould, N.I.M., Loh, Y., Robinson, D.P.: A nonmonotone filter SQP method: Local convergence and numerical results. SIAM Journal on Optimization 25(3), 1885–1911 (2015)
28. Gould, N.I.M., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Computational Optimization and Applications 60(3), 545–557 (2015)
29. Gould, N.I.M., Orban, D., Toint, P.L.: Numerical Analysis and Optimization: NAO-III, Muscat, Oman, January 2014, chap. An Interior-Point ℓ1-Penalty Method for Nonlinear Optimization, pp. 117–150. Springer International Publishing, Cham (2015)
30. Gould, N.I.M., Toint, P.L.: Nonlinear programming without a penalty function or a filter. Mathematical Programming 122(1), 155–196 (2010)
31. Greif, C., Moulding, E., Orban, D.: Bounds on eigenvalues of matrices arising from interior-point methods. SIAM Journal on Optimization 24(1), 49–83 (2014)
32. Hogg, J., Scott, J.A.: New parallel sparse direct solvers for engineering applications. Tech. rep., Science and Technology Facilities Council (2012)
33. Liu, X., Yuan, Y.: A sequential quadratic programming method without a penalty function or a filter for nonlinear equality constrained optimization. SIAM Journal on Optimization 21(2), 545–571 (2011)
34. Morales, J.L., Nocedal, J., Waltz, R.A., Liu, G., Goux, J.P.: Assessing the potential of interior methods for nonlinear optimization. In: L.T. Biegler, M. Heinkenschloss, O. Ghattas, B. van Bloemen Waanders (eds.) Large-Scale PDE-Constrained Optimization, Lecture Notes in Computational Science and Engineering, vol. 30, pp. 167–183. Springer Berlin Heidelberg (2003)
35. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media (2006)

36. Nocedal, J., Wächter, A., Waltz, R.A.: Adaptive barrier update strategies for nonlinear interior methods. SIAM Journal on Optimization 19(4), 1674–1693 (2009)
37. Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014)
38. Shen, C., Zhang, L.H., Liu, W.: A stabilized filter SQP algorithm for nonlinear programming. Journal of Global Optimization 65(4), 677–708 (2016)
39. Tits, A.L., Wächter, A., Bakhtiari, S., Urban, T.J., Lawrence, C.T.: A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM Journal on Optimization 14(1), 173–199 (2003)
40. Ulbrich, M., Ulbrich, S., Vicente, L.N.: A globally convergent primal-dual interior-point filter method for nonlinear programming. Mathematical Programming 100(2), 379–410 (2003)
41. Vanderbei, R.J.: LOQO: an interior point code for quadratic programming. Optimization Methods and Software 11(1–4), 451–484 (1999)
42. Vanderbei, R.J., Shanno, D.F.: An interior-point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications 13(1–3), 231–252 (1999)
43. Waltz, R., Morales, J., Nocedal, J., Orban, D.: An interior algorithm for nonlinear optimization that combines line search and trust region steps. Mathematical Programming 107(3), 391–408 (2006)
44. Wächter, A., Biegler, L.T.: Failure of global convergence for a class of interior point methods for nonlinear programming. Mathematical Programming 88(3), 565–574 (2000)
45. Wächter, A., Biegler, L.T.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming 106(1), 25–57 (2006)
46. Yamashita, H.: A globally convergent primal-dual interior point method for constrained optimization. Optimization Methods and Software 10(2), 443–469 (1998)
47. Yamashita, H., Yabe, H.: An interior point method with a primal-dual quadratic barrier penalty function for nonlinear optimization. SIAM Journal on Optimization 14(2), 479–499 (2003)