
Fast and Stable Nonconvex Constrained Distributed Optimization: The ELLADA Algorithm

Wentao Tang and Prodromos Daoutidis

Abstract—Distributed optimization, where the computations are performed in a localized and coordinated manner using multiple agents, is a promising approach for solving large-scale optimization problems, e.g., those arising in model predictive control (MPC) of large-scale plants. However, a distributed optimization algorithm that is computationally efficient, globally convergent, and amenable to nonconvex constraints and general inter-subsystem interactions remains an open problem. In this paper, we combine three important modifications to the classical alternating direction method of multipliers (ADMM) for distributed optimization. Specifically, (i) an extra-layer architecture is adopted to accommodate nonconvexity and handle inequality constraints, (ii) equality-constrained nonlinear programming (NLP) problems are allowed to be solved approximately, and (iii) a modified Anderson acceleration is employed to reduce the number of iterations. Theoretical convergence towards stationary solutions and the computational complexity of the proposed algorithm, named ELLADA, are established. Its application to distributed nonlinear MPC is also described and illustrated through a benchmark process system.

Index Terms—Distributed optimization, nonconvex optimization, model predictive control, acceleration

arXiv:2004.01977v1 [eess.SY] 4 Apr 2020

[Fig. 1. Primal-dual distributed optimization: distributed agents (Agents 1-4) exchange primal and dual information with a coordinator.]

The authors are with the Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN 55455, U.S.A. (e-mails: [email protected], [email protected]).

I. INTRODUCTION

Distributed optimization [1]-[3] refers to methods of performing optimization using a distributed architecture - multiple networked agents are used for subsystems, and the necessary information is communicated among the agents to coordinate the distributed computation. An important application of distributed optimization is in model predictive control (MPC), where control decisions are made by solving an optimal control problem that minimizes the cost associated with the predicted trajectory over a future horizon, subject to the system dynamics and operational constraints [4]. For large-scale systems, it is desirable to seek a decomposition (e.g., using community detection or network block structures [5]-[7]) and deploy distributed MPC strategies [8]-[10], which allow better performance than fully decentralized MPC by enabling coordination, while avoiding assembling and computing on a monolithic model as in centralized MPC.

Despite efficient algorithms for solving monolithic nonlinear programming (NLP) problems in centralized MPC (e.g., [11]-[13]), extending them into distributed algorithms is nontrivial. A typical approach to distributed MPC is to iterate the control inputs among the subsystems (in sequence or in parallel) [14]-[16]. The input iteration routine is typically either semi-decentralized, by implicitly assuming that the subsystems interact only through inputs and considering state coupling as disturbances, or semi-centralized, by using moving-horizon predictions based on the entire system, which, however, contradicts the fact that the subsystem models should usually be packaged inside the local agents rather than shared over the entire system. Distributed MPC under truly localized model information is typically restricted to linear systems [17]-[19].

We note that, in general, distributed nonlinear MPC with subsystem interactions should be considered as a distributed optimization problem under nonconvex constraints. To solve such problems using distributed agents with local model knowledge, Lagrangian decomposition using dual relaxation of the complicating interactions [20, Section 9] and the alternating direction method of multipliers (ADMM) using the augmented Lagrangian [21], [22] were proposed as general frameworks. These are primal-dual iterative algorithms. As illustrated in Fig. 1, in each iteration the distributed agents receive the dual information from the coordinator and execute subroutines to solve their own subproblems, and the coordinator collects information on their solutions to update the duals.

Convergence is the most basic requirement but also a major challenge in distributed optimization under nonconvex constraints. Although distributed optimization with nonconvex objective functions has been well discussed [23]-[26], nonconvex constraints appear much more difficult to handle. To guarantee convergence, [27] suggested dualizing and penalizing all nonconvex constraints, making them undifferentiated and tractable by ADMM; however, this alteration of the problem structure eliminates the option for the distributed agents to use any subroutine other than the method of multipliers (MM). [28] used an auxiliary problem to decide the dual variables in the augmented Lagrangian as well as an extrapolation of the primal updates; this algorithm, however, involves a central agent that extracts Hessian and gradient information of the subsystem models from the distributed agents in every iteration, and is thus essentially semi-centralized. [29] adopted feasibility-preserving convex approximations to approach the solution, which is applicable to problems without nonconvex equality constraints.
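The primal-dual agent-coordinator loop of Fig. 1 can be made concrete with a minimal sketch. The following Python fragment is purely illustrative (a hypothetical two-agent quadratic toy problem with one coupling constraint; the local objectives, targets, and step size are our own assumptions, not the paper's notation): each agent privately solves its local subproblem given the current dual variable, and the coordinator updates the dual by subgradient ascent on the coupling residual.

```python
# Hypothetical two-agent instance: agent i privately minimizes (x_i - t_i)^2,
# coupled by the shared constraint x1 + x2 = 2. Illustrative numbers only.

def agent(target, y):
    # local subproblem: min_x (x - target)^2 + y * x  =>  x = target - y / 2
    return target - y / 2.0

y = 0.0                           # dual variable held by the coordinator
for _ in range(100):
    x1 = agent(1.0, y)            # agents solve subproblems given the dual
    x2 = agent(3.0, y)
    y += 0.5 * (x1 + x2 - 2.0)    # coordinator: dual subgradient ascent
# iterates approach the constrained optimum x1 = 0, x2 = 2 with multiplier y = 2
```

Only the dual variable and the scalar residual cross the agent boundary here; the local models (the quadratic objectives) stay inside the agents, which is exactly the "localized model information" property discussed above.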
We note that several recent papers (e.g., [30]-[32]) proposed the idea of placing slack variables on the inter-subsystem constraints and forcing their decay to zero by tightening the penalty parameters on the slack variables. This modification of ADMM with slack variables and their penalties leads to a globally convergent extra-layer augmented Lagrangian-based algorithm with a preserved agent-coordinator problem architecture.

Computational efficiency is also of critical importance for distributed optimization, especially in MPC. The slowness of primal-dual algorithms typically arises from two issues. First, the subgradient (first-order) update of the dual variables restricts the number of iterations to be of linear complexity [33]-[35]. For convex problems, momentum methods [36], [37] can be adopted to obtain second-order dual updates. Such momentum acceleration cannot be directly extended to nonconvex problems without a positive definite curvature, although our previous numerical study showed that a discounted momentum may allow limited improvement [38]. Another effort to accelerate dual updates in convex ADMM is based on Krylov subspace methods [39]. Under nonconvexity, it was only very recently realized that Anderson acceleration, a multi-secant technique for fixed-point problems, can be generally used to accelerate the dual variables [40]-[42].

The second cause of the high computational cost of distributed optimization is the instruction to the distributed agents to fully solve their subproblems to high precision in each iteration. Such exhaustive effort may be unnecessary, since the dual information to be received from the coordinator will keep changing. For convex problems, it is possible to linearize the augmented Lagrangian and replace the distributed subproblems with Arrow-Hurwicz-Uzawa gradient flows [43]. In the presence of nonconvexity of the objective functions, a dual perturbation technique to restore the convergence of the augmented Lagrangian was proposed in [44]. It is yet unknown how to accommodate such gradient flows to nonconvex constraints. A different approach is to allow inexact solution of the subproblems with adaptively tightening tolerances [45]. Such an approximate ADMM algorithm allows a better balance between the primal and dual updates, and avoids wasteful computational steps inside the subroutines.

The purpose of this work is to develop a convergent and computationally efficient algorithm for distributed optimization under nonconvex constraints. Although the algorithm is in principle not restricted to any specific problem, we consider the implementation of distributed nonlinear MPC as an important application. Based on the above discussion, we identify the following modifications to the classical ADMM algorithm as the key to mitigating the challenges in convergence and computational complexity: (i) additional slack variables are placed on the constraints relating the distributed agents and the coordinator, (ii) approximate optimization is performed in the distributed agents, and (iii) the Anderson acceleration technique is adopted by the coordinator. We therefore combine and extend as appropriate these techniques into a new algorithm with a two-layer augmented Lagrangian-based architecture, in which the outer layer handles the slack variables as well as the inequality constraints by using a barrier technique, and the inner layer performs approximate ADMM under an acceleration scheme. With guaranteed stability and elevated speed, to the best knowledge of the authors, the proposed algorithm is the first practical and generic algorithm of its kind for distributed nonlinear MPC with truly localized model information. We name this algorithm ELLADA (standing for extra-layer augmented Lagrangian-based accelerated distributed approximate optimization).

The paper discusses the motivation, develops the ELLADA algorithm, and establishes its theoretical properties. An application to a benchmark quadruple tank process is also presented. The remainder of this paper is organized as follows. In Section II, we first review the classical ADMM and its modified versions. We then derive our ELLADA algorithm in Section III in three steps. First, a basic two-layer augmented Lagrangian-based algorithm (ELL) is introduced and its convergence is discussed. Then the approximate solution of equality-constrained NLP problems and the Anderson acceleration scheme are incorporated to form the ELLA and ELLADA algorithms. The implementation of the ELLADA algorithm on the distributed optimization problem involved in distributed nonlinear MPC is shown in Section IV, and a case study is examined in Section V. Conclusions and discussions are given in Section VI.

II. ADMM AND ITS MODIFICATIONS

A. ADMM

The alternating direction method of multipliers is the most commonly used algorithm for distributed optimization under linear equality constraints [1]. Specifically, consider the following problem

min_{x,x̄} f(x) + g(x̄)  s.t.  Ax + Bx̄ = 0   (1)

with two blocks of variables x and x̄, where f and g are usually assumed to be convex. (The symbols in (1) are not related to the ones in Section IV.) The augmented Lagrangian for such a constrained optimization problem is

L(x, x̄; y) = f(x) + g(x̄) + yᵀ(Ax + Bx̄) + (ρ/2)‖Ax + Bx̄‖²,   (2)

in which y stands for the vector of dual variables (Lagrange multipliers) and ρ > 0 is called the penalty parameter. According to duality theory, the optimal solution is determined by a saddle point of the augmented Lagrangian:

sup_y min_{x,x̄} L(x, x̄; y).   (3)

The classical method of multipliers (MM) deals with this saddle point problem through an iterative procedure, in which the primal variables are optimized first and then the dual variables are updated with a subgradient ascent [46, Chapter 6]:

(x^{k+1}, x̄^{k+1}) = argmin_{x,x̄} L(x, x̄; y^k),
y^{k+1} = y^k + ρ(Ax^{k+1} + Bx̄^{k+1}),   (4)

in which the superscript stands for the count of iterations.
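As a concrete illustration of the MM iteration (4), the following hedged Python sketch runs it on a toy instance of (1) with scalar blocks and quadratic objectives (all problem data are our own illustrative assumptions, not from the paper); the joint primal minimization of the augmented Lagrangian (2) then reduces to solving a 2-by-2 linear system.

```python
import numpy as np

# Toy instance of (1): f(x) = (x - 2)^2, g(xb) = xb^2, A = 1, B = -1,
# so the coupling constraint is x - xb = 0. Illustrative numbers only.
rho, y = 1.0, 0.0
for _ in range(60):
    # exact joint primal minimization of the augmented Lagrangian (2):
    #   d/dx : 2(x - 2) + y + rho (x - xb) = 0
    #   d/dxb: 2 xb     - y - rho (x - xb) = 0
    K = np.array([[2.0 + rho, -rho], [-rho, 2.0 + rho]])
    rhs = np.array([4.0 - y, y])
    x, xb = np.linalg.solve(K, rhs)
    y += rho * (x - xb)        # dual subgradient ascent, as in (4)
# iterates approach the optimum x = xb = 1 with multiplier y = 2
```

Note that this joint minimization over (x, x̄) is exactly what a distributed implementation cannot do, which motivates the block-wise ADMM splitting reviewed next.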
In a distributed context, x and x̄ usually cannot be optimized simultaneously. ADMM is thus an approximation of MM that allows the optimization of x and x̄ to be performed separately, i.e.,

x^{k+1} = argmin_x L(x, x̄^k; y^k),
x̄^{k+1} = argmin_{x̄} L(x^{k+1}, x̄; y^k),   (5)
y^{k+1} = y^k + ρ(Ax^{k+1} + Bx̄^{k+1}).

Since the appearance of ADMM in the 1970s [47], [48], there have been many works on its theoretical properties, extensions, and applications. As mentioned in the Introduction, ADMM is known to have a linear convergence rate for convex problems. This does not change when the variables are constrained in convex sets. For example, if x ∈ X, it suffices to modify the corresponding term f(x) in the objective function by adding an indicator function I_X(x) (equal to 0 if x ∈ X and +∞ otherwise), which is still a convex function.

B. ADMM with approximate updates

Unless the objective terms f(x) and g(x̄) are of simple forms such as quadratic functions, the optimization of x and x̄ in (5) does not have an exact solution. Usually, iterative algorithms for nonlinear programming need to be called for the first two lines of (5), and always searching for a highly accurate solution in each ADMM iteration will result in an excessive computational cost. It is thus desirable to solve the optimization subproblems in ADMM inexactly when the dual variables are still far from the optimum, i.e., to allow x^{k+1} and x̄^{k+1} to be chosen such that

d_x^{k+1} ∈ ∂_x L(x^{k+1}, x̄^k; y^k),  d_{x̄}^{k+1} ∈ ∂_{x̄} L(x^{k+1}, x̄^{k+1}; y^k),   (6)

where ∂_x and ∂_{x̄} represent the subgradients with respect to x and x̄, respectively, and d_x and d_{x̄} are not exactly 0 but only converge to 0 asymptotically. For example, one can assign externally a shrinking and summable sequence of absolute errors [49]:

‖d_x^k‖ ≤ ε_x^k,  ‖d_{x̄}^k‖ ≤ ε_{x̄}^k,  Σ_{k=1}^∞ ε_x^k < ∞,  Σ_{k=1}^∞ ε_{x̄}^k < ∞,   (7)

or a sequence of relative errors, proportional to other variations in the algorithm [45], [50].

It was shown in [45] that a relative error criterion for terminating the iterations in the subproblems, compared to other approximation criteria such as a summable absolute error sequence, better reduces the total number of subroutine iterations throughout the ADMM algorithm. Such a relative error criterion is a constructive one, designed to guarantee the decrease of a quadratic distance between the intermediate solutions (x^k, x̄^k, y^k) and the optimum (x^*, x̄^*, y^*). In the context of distributed optimization problems under nonconvex constraints, since the convergence proof is established on a different basis from the quadratic distance, the construction of such a criterion must be reconsidered. We will address this issue in Subsection III-B.

C. Anderson acceleration

The linear convergence of the classical ADMM is essentially the result of the subgradient dual update, which uses only first-order derivative information with respect to the dual variables: ∂_y L = Ax + Bx̄. The idea of creating a quadratically convergent algorithm using only first-order derivatives originates from Nesterov's approach to solving convex optimization problems, which performs iterations based on a linear extrapolation of the previous two iterations instead of the current solution alone [51]. Such a momentum method can be used to accelerate the ADMM algorithm, which can be seen as iterations over the second block of primal variables x̄ and the dual variables y [36]. However, such a momentum is inappropriate for nonconvex problems, since the behavior of the extrapolated point cannot be well controlled by a bound on the curvature of the objective function.

Therefore, we resort to a different type of technique - Anderson acceleration, which was first proposed in [52] and later "rediscovered" in the field of chemical physics [53]. Generally speaking, Anderson acceleration is used to solve the fixed-point iteration problem

w = h₀(w)   (8)

for some vector w and non-expansive mapping h₀ (satisfying ‖h₀(w) − h₀(w′)‖ ≤ ‖w − w′‖ for any w and w′). Different from the simple Krasnoselskii-Mann iteration w^{k+1} = κw^k + (1−κ)h₀(w^k) (κ ∈ (0,1)), Anderson acceleration takes a quasi-Newton approach, which aims at a nearly quadratic convergence rate [54]. Specifically¹, in each iteration k, the results from the previous m iterations are recalled from memory to form the matrices of secants in w and in h(w) = w − h₀(w):

Δ_w^k = [δ_w^{k−m} … δ_w^{k−1}],  δ_w^{k′} = w^{k′+1} − w^{k′},  k′ = k−m, …, k−1;
Δ_h^k = [δ_h^{k−m} … δ_h^{k−1}],  δ_h^{k′} = h(w^{k′+1}) − h(w^{k′}),  k′ = k−m, …, k−1.   (9)

An estimated Jacobian is given by

H_k = I + (Δ_h^k − Δ_w^k)(Δ_w^{kᵀ}Δ_w^k)^{−1}Δ_w^{kᵀ},   (10)

or

H_k^{−1} = I + (Δ_w^k − Δ_h^k)(Δ_w^{kᵀ}Δ_h^k)^{−1}Δ_w^{kᵀ},   (11)

which minimizes the Frobenius norm of H_k − I subject to H_k Δ_w^k = Δ_h^k. Then the quasi-Newton iteration w^{k+1} = w^k − H_k^{−1}h^k leads to a weighted sum of the previous m function values:

w^{k+1} = Σ_{m′=0}^{m} α_{m′}^k h₀(w^{k−m+m′})   (12)

where the weights {α_{m′}^k}_{m′=0}^m are specified by

α_{m′}^k = s₀^k if m′ = 0;  α_{m′}^k = s_{m′}^k − s_{m′−1}^k if m′ = 1, …, m−1;  α_{m′}^k = 1 − s_{m−1}^k if m′ = m,   (13)

¹There are two different types of Anderson acceleration. Here we focus on Type I, which was found to have better performance [54] and was improved in [40].

with s_{m′}^k being the m′-th component of s^k:

s^k = (Δ_w^{kᵀ}Δ_h^k)^{−1}Δ_w^{kᵀ}h^k.   (14)

Anderson acceleration (12) may not always be convergent, although local convergence was studied in some special cases [55]. Recently, a globally convergent modification of Anderson acceleration was proposed in [40], where regularization, restarting, and safeguarding measures are taken to ensure, respectively, the well-conditioning of the Δ_w matrix, the boundedness of the inverse Jacobian estimate (11), and acceleration only within a safety region.

The relevance of Anderson acceleration to ADMM lies in the fact that the ADMM algorithm (5) can be seen as fixed-point iterations (x̄^k, y^k) → (x̄^{k+1}, y^{k+1}), k = 0, 1, 2, … [42], which is the same idea underlying ADMM with Nesterov acceleration. For problems with nonconvex constraints, the iteration mapping h is not necessarily non-expansive, and hence one cannot directly establish the convergence of Anderson acceleration with the original techniques used in [40]. We will address this issue in Subsection III-C.

D. ADMM under nonconvex constraints

The presence of nonconvexity largely increases the difficulty of distributed optimization. Most of the work on nonconvex ADMM considers problems with nonconvex objective functions with bounded Hessian eigenvalues or under Kurdyka-Łojasiewicz property assumptions, for which a convergence rate of O(1/√k) (slower than that of convex ADMM, O(1/k)) was established [26], [56], [57]. However, in many distributed optimization problems, e.g., the distributed MPC of nonlinear processes, there exist nonconvex constraints on the variables, which is intrinsically non-equivalent to problems with nonconvex objective functions. For our problem of interest, the relevant works are scarce.

Here we introduce the algorithm of [30] for (1) under nonconvex constraints x ∈ X and x̄ ∈ X̄, reformulated with slack variables z:

min_{x,x̄,z} f(x) + g(x̄)  s.t.  Ax + Bx̄ + z = 0,  z = 0,  x ∈ X,  x̄ ∈ X̄.   (15)

The augmented Lagrangian is now written as

L(x, x̄, z; y, λ, ρ, β) = f(x) + g(x̄) + I_X(x) + I_X̄(x̄) + yᵀ(Ax + Bx̄ + z) + (ρ/2)‖Ax + Bx̄ + z‖² + λᵀz + (β/2)‖z‖².   (16)

The algorithm is a two-layer one, where each outer iteration (indexed by k) contains a series of inner iterations (indexed by r). In the inner iterations, the classical ADMM algorithm with slack variables is used to update x, x̄, z and y in sequence, while keeping λ and β unchanged:

x^{k,r+1} = argmin_x L(x, x̄^{k,r}, z^{k,r}; y^{k,r}, λ^k, ρ^k, β^k)
          = argmin_{x∈X} f(x) + (ρ^k/2)‖Ax + Bx̄^{k,r} + z^{k,r} + y^{k,r}/ρ^k‖²,
x̄^{k,r+1} = argmin_{x̄} L(x^{k,r+1}, x̄, z^{k,r}; y^{k,r}, λ^k, ρ^k, β^k)
          = argmin_{x̄∈X̄} g(x̄) + (ρ^k/2)‖Ax^{k,r+1} + Bx̄ + z^{k,r} + y^{k,r}/ρ^k‖²,   (17)
z^{k,r+1} = argmin_z L(x^{k,r+1}, x̄^{k,r+1}, z; y^{k,r}, λ^k, ρ^k, β^k)
          = −(ρ^k/(ρ^k+β^k))(Ax^{k,r+1} + Bx̄^{k,r+1} + y^{k,r}/ρ^k) − (1/(ρ^k+β^k))λ^k,
y^{k,r+1} = y^{k,r} + ρ^k(Ax^{k,r+1} + Bx̄^{k,r+1} + z^{k,r+1}).

Under mild assumptions, in the presence of the slack variables z, it was proved in [30] that if one chooses ρ^k = 2β^k, then the inner iterations converge to the set of stationary points (x^k, x̄^k, z^k, y^k) of the relaxed problem

min_{x,x̄,z} f(x) + g(x̄) + λ^{kᵀ}z + (β^k/2)‖z‖²  s.t.  Ax + Bx̄ + z = 0,  x ∈ X,  x̄ ∈ X̄.   (18)

Then, in the outer iterations, the dual variables λ^k are updated. To enforce the convergence of the slack variables to zero, the corresponding penalty β^k is amplified by a ratio γ > 1 if the returned z^k from the inner iterations does not decay enough from the previous outer iteration z^{k−1} (i.e., ‖z^k‖ > ω‖z^{k−1}‖, ω ∈ (0,1)). The outer iteration is written as

λ^{k+1} = Π_{[λ̲,λ̄]}(λ^k + β^k z^k),
β^{k+1} = γβ^k if ‖z^k‖ > ω‖z^{k−1}‖;  β^{k+1} = β^k if ‖z^k‖ ≤ ω‖z^{k−1}‖,   (19)

in which the projection Π onto a predefined compact hypercube [λ̲, λ̄] is used to guarantee the boundedness of the dual variables and hence of the augmented Lagrangian L. If the augmented Lagrangian L remains bounded despite the increase of the penalty parameters ρ^k and β^k, the algorithm converges to a stationary point of the original problem (1). The iteration complexity of such an algorithm to reach an ε-approximate stationary point is O(ε^{−4} ln(ε^{−1})).

In the next section, building on the algorithm of [30] that guarantees the convergence of distributed optimization under nonconvex constraints, we propose a new algorithm that integrates into it the ideas of approximate ADMM and Anderson acceleration, aiming at improving the computational efficiency.

III. PROPOSED ALGORITHM

A. Basic algorithm and its convergence

Consider an optimization problem in the following form:

min_{x,x̄} f(x) + g(x̄)  s.t.  Ax + Bx̄ = 0,  x ∈ X = {x | φ(x) ≤ 0, ψ(x) = 0},  x̄ ∈ X̄   (20)

or, equivalently, with slack variables,

min_{x,x̄,z} f(x) + g(x̄)  s.t.  Ax + Bx̄ + z = 0,  z = 0,  x ∈ X = {x | φ(x) ≤ 0, ψ(x) = 0},  x̄ ∈ X̄.   (21)

We make the following assumptions.

Assumption 1. Assume that f is lower bounded, i.e., there exists f̲ such that f(x) ≥ f̲ for any x ∈ X.

Assumption 2. Function g is convex and lower bounded.
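To make the two-layer structure of the inner iterations (17) and the outer update (19) concrete, here is a hedged Python sketch on a convex scalar toy instance, so that every update has a closed form. The problem data, dual bounds, ratios, and iteration counts are illustrative assumptions, and the constraint sets are taken as the whole real line purely to show the control flow, not the nonconvex setting the theory addresses.

```python
# Convex toy instance of (21): f(x) = (x - 2)^2, g(xb) = xb^2, A = 1, B = -1,
# X = Xb = R. All numbers below are illustrative choices only.
lam, beta = 0.0, 1.0           # outer dual variable and slack penalty
omega, gamma = 0.5, 2.0        # shrinking ratio and amplifying ratio
lam_lo, lam_hi = -10.0, 10.0   # dual bounds [lam_lo, lam_hi]
x = xb = z = 0.0
z_prev = float("inf")
for k in range(25):                      # outer (MM-type) iterations, eq. (19)
    rho = 2.0 * beta                     # the choice rho^k = 2 beta^k
    y = -lam - beta * z                  # warm start satisfying lam + beta z + y = 0
    for r in range(300):                 # inner (ADMM) iterations, eq. (17)
        x = (4.0 + rho * (xb - z) - y) / (2.0 + rho)
        xb = (rho * (x + z) + y) / (2.0 + rho)
        z = -(y + rho * (x - xb) + lam) / (rho + beta)
        y += rho * (x - xb + z)
    lam = min(max(lam + beta * z, lam_lo), lam_hi)   # projected dual update
    if abs(z) > omega * abs(z_prev):     # slack not decaying: amplify beta
        beta *= gamma
    z_prev = z
# the iterates approach x = xb = 1 with vanishing slack z
```

A fixed inner iteration count is used here in place of a stopping criterion; the algorithm stated next replaces this with the principled criterion (22).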

1 Set: Bound of dual variables [λ, λ], shrinking ratio of slack variables inner iterations are terminated at a finite r when (22) is met ω ∈ [0, 1), amplifying ratio of penalty parameter γ > 1, diminishing and the point (xk,r+1, x¯k,r+1, zk,r+1) the following conditions k k k ∞ outer iteration tolerances {1 , 2 , 3 }k=1 ↓ 0, terminating tolerances 1, 2, 3 > 0; k k,r+1 k,r+1 > k,r+1 0 0 0 d1 ∈ ∂f(x ) + NX (x ) + A y 2 Initialization: Starting points x , x¯ , z , dual variable and bounds k k,r+1 k,r+1 > k,r+1 λ1 ∈ [λ, λ], penalty parameter β1 > 0; d ∈ ∂g(¯x ) + N ¯ (¯x ) + B y 2 X (24) 3 outer iteration count k ← 0; 0 = λk + βkzk,r+1 + yk,r+1 4 while stationarity criterion (26) is not met do k k k k,r+1 k,r+1 k,r+1 5 ρ = 2β ; d3 = Ax + Bx¯ + z 6 inner iteration count r ← 0; k,0 k,0 k,0 k,0 k k k k k k k 7 Initialization: x , x¯ , z , y satisfying for some d1 , d2 and d3 satisfying kd1 k ≤ 1 , kd2 k ≤ 2 and λk + βkzk,0 + yk,0 = 0; k k kd3 k ≤ 3 , respectively. NX (x) (NX¯(¯x)) refers to the normal 8 while stopping criterion (22) is not met do ¯ k,r+1 cone to the set X (X ) at point x (x¯): 9 x = k k,r 2 arg min f(x) + ρ Ax + Bx¯k,r + zk,r + y ; > 0 0 x∈X 2 ρk NX (x) = {v|v (x − x) ≤ 0, ∀x ∈ X }. (25) k,r+1 10 x¯ = k k,r 2 ρ k,r+1 k,r y The proofs of the above lemma and corollary are given in arg min ¯ g(¯x)+ Ax + Bx¯ + z + ; x¯∈X 2 ρk k,r+1 Appendix A and Appendix B, respectively. It is apparent that 11 z = k k k k  k,r  if  ,  ,  are all equal to 0, (24) is the Karush-Kuhn-Tucker − ρ Axk,r+1 + Bx¯k,r+1 + y − 1 λk; 1 2 3 ρk+βk ρk ρk+βk k,r+1 k,r k k,r+1 k,r+1 k,r+1 optimality condition of the relaxed problem (18) [58]. 
12 y = y + ρ (Ax + Bx¯ + z ); 13 r ← r + 1; We note that although the augmented Lagrangian decreases 14 end throughout the inner iterations, the increase in the penalty pa- k+1 k+1 k+1 k+1 k,r k,r k,r k,r 15 (x , x¯ , z , y ) ← (x , x¯ , z , y ); rameters may cause an increase in the augmented Lagrangian k+1 k k k 16 λ = Π (λ + β z ); [λ,λ] across outer iterations, thus losing the guarantee of overall k+1 k 17 if kz k > ωkz k then k+1 k convergence. To establish the convergence of outer iterations, 18 β ← γβ ; 19 else we need to make the following assumption to restrict the upper k+1 k 20 β ← β ; level of the augmented Lagrangian. 21 end 22 k ← k + 1; Assumption 3. The augmented Lagrangians are uniformly 23 end upper bounded at initialization of all inner iterations, i.e., Algorithm 1: Basic algorithm (ELL). there exists L ≥ L(xk,0, x¯k,0, zk,0, yk,0, λk, ρk, βk) for all k. The above assumption is actually a “warm start” require- 0 0 Our basic algorithm (Algorithm 1) for (21) is slightly ment. Suppose that we have a feasible solution (x , x¯ ) modified from the procedure of [30], which considered the to the original problem (20), then we can always choose k,0 0 k,0 0 k,0 k,0 k case where g(x) = 0 and X¯ is a hypercube. The algorithm x = x , x¯ =x ¯ , z = 0, y = −λ to guarantee 0 0 uses an inner loop of ADMM iterations and an outer loop of an L = f(x ) + g(¯x ). MM with possibly amplifying penalty parameters. The inner Lemma 2 (Convergence of outer iterations). Suppose that iterations are terminated when the following criterion is met Assumptions 1, 2 and 3 hold. Then for any 1, 2, and 3 > 0, k k,r k > k,r+1 k,r+1 k,r k,r within a finite number of outer iterations k, Algorithm 1 finds 1 ≥  := kρ A (Bx¯ + z − Bx¯ − z )k, 1 k+1 k+1 k+1 k+1 k k,r k > k,r+1 k,r an approximate stationary point (x , x¯ , z , y ) of 2 ≥ 2 := kρ B (z − z )k, (22) k k,r k,r+1 k,r+1 k,r+1 (20), satisfying 3 ≥ 3 := kAx + Bx¯ + z k. 
k+1 k+1 > k+1 d1 ∈ ∂f(x ) + NX (x ) + A y The proof uses the augmented Lagrangian (16) as a decreas- k+1 k+1 > k+1 ing Lyapunov function throughout the inner iterations [34], d2 ∈ ∂g(¯x ) + NX¯ (¯x ) + B y (26) k+1 k+1 [57], which gives the convergence of the inner iterations. d3 = Ax + Bx¯

Lemma 1 (Descent of the augmented Lagrangian). Suppose for some d1, d2, d3 satisfying kdjk ≤ j, j = 1, 2, 3. k k that Assumptions 1 and 2 hold. When ρ = 2β , it holds that See Appendix C for a proof. In addition to the convergence, we can also establish a theoretical complexity. Previously in L(xk,r+1, x¯k,r+1, zk,r+1, yk,r+1) ≤ L(xk,r, x¯k,r, zk,r, yk,r) [30], it was shown that to reach an -approximate stationary k β point satisfying (26) with 1, 2, 3 =  > 0, the total number − βkkBx¯k,r+1 − Bx¯k,rk2 − kzk,r+1 − zk,rk2 2 of inner iterations needed is of the order O(−4 ln(1/)). (23) 2 Here, we show that by appropriately choosing the way that for r = 0, 1, 2,... , and hence the augmented Lagrangian k k k the tolerances (1 , 2 , 3 ) shrink, the iteration complexity can nonincreasingly converges to a limit Lk. be provably reduced anywhere in (O(−2), O(−4)], for which Corollary 1 (Convergence of inner iterations). Suppose that a proof is given in Appendix D. Assumptions 1 and 2 hold. As r → ∞, Bx¯k,r+1 −Bx¯k,r → 0, k,r+1 k,r k,r k,r k,r Lemma 3 (Complexity of the basic algorithm). Suppose that z − z → 0, and Ax + Bx¯ + z → 0. Hence the Assumptions 1, 2 and 3 hold. For some constant ϑ ∈ (0, ω], k k k k k k 2For simplicity we did not write the last three entries λk, ρk, βk that do choose 1 ∼ O(ϑ ), 2 ∼ O(ϑ ), and 3 ∼ O((ϑ/β) ). not change during inner iterations in the augmented Lagrangian. Then each outer iteration k requires Rk ∼ O((ϑω)−2k) 6 inner iterations. Hence, for the Algorithm 1 to reach an - Specifically, in the k-th outer iteration, the function f(x) k PCφ k approximate stationary point, the total iterations needed is is appended with a barrier term −b c=1 ln(−φc(x)) (b is −2(1+ς) R ∼ O( ), where ς = logϑ ω ∈ (0, 1]. the barrier parameter, converging to 0 as k → ∞). Hence a “barrier augmented Lagrangian” can be specified as

B. Approximate algorithm Cφ X We note that the basic algorithm requires complete min- Lb = L − b ln(−φc(x)). (29) imization of x and x¯ in each inner iteration (Lines 9–10, c=1 Algorithm 1). However, this is neither desirable due to the Based on the arguments in the previous subsection, if the x- k,r+1 computational cost, nor practical since any nonlinear pro- optimization step returns a x minimizing Lb with respect gramming (NLP) solver finds only a point that approximately to x, then the inner iterations result in the descent of Lbk , satisfies the KKT optimality conditions, except for very simple which implies the satisfaction of conditions (24), with f cases. For simplicity, we assume that such a minimization modified by the barrier function. Obviously, if Assumption 3 oracle , namely an explicit mapping G depending on matrix 3 holds for L, then it also holds for Lbk when X has a B, Axk,r+1 + zk,r + yk,r/ρk, and ρk, exists for x¯. For nonempty interior. It follows that the outer iterations can find example, if g(¯x) = 0 and B>B = aI for some a > 0, then an approximate stationary point of the original problem with G(B, v, ρ) = − 1 B>v. For the x-minimization, however, the decay of barrier parameters bk. 2a k,r+1 such an oracle usually does not exist. In this subsection, However, precisely finding the x that minimizes Lb we will modify Algorithm 1 so as to allow approximate x- with respect to x, which is an equality-constrained NLP optimization on Line 9. problem, still requires an iterative search procedure [59]. By matching the inner iterations of the interior point algorithm Assumption 4. The minimization of the augmented La- and the inner iterations of the ADMM, we propose to perform grangian with respect to x¯ (Line 10, Algorithm 1) admits a only a proper amount of searching steps instead of the entire unique explicit solution equality-constrained NLP in each inner iteration, so that the x¯k,r+1 = G(B, Axk,r+1 + zk,r + yk,r/ρk, ρk). 
(27) solution to the equality-constrained NLP problem can be approached throughout the inner iterations. For this purpose, Let us also assume that the problem has a smoothness we assume that we have at hand a solver that can find any property as follows. approximate solution of equality-constrained NLP. Assumption 5. Functions f, φ and ψ are continuously differ- Assumption 6. Assume that for any equality-constrained entiable, and X has a nonempty interior. smooth NLP problem Under this smoothness assumption, the KKT condition for min χ(x) s.t. ψ(x) = 0 (30) x-minimization is written as the following equalities with x µ ≥ 0 and ν representing the Lagrangian dual variables a solver that guarantees the convergence to any approximate corresponding to the inequalities φ(x) ≤ 0 and ψ(x) = 0, stationary point of the above problem with a lower objective respectively function is available. That is, starting from any initial point 0 k,r+1 k > k,r+1 k,r k,r k,r k x , for any tolerances 4, 5 > 0, within a finite number of 0 = ∇f(x ) + ρ A (Ax + Bx¯ + z + y /ρ ) searches the solver finds a point (x, ν) satisfying Cφ Cψ X k,r+1 X k,r+1 Cψ + µc∇φc(x ) + νc∇ψc(x ) X d = ∇χ(x) + ν ∇ψ (x) c=1 c=1 4 c c (31) k,r+1 c=1 0 = µcφc(x ), c = 1,...,Cφ d5c = ψc(x), c = 1,...,Cψ. 0 = ψ (xk,r+1), c = 1,...,C . c ψ 0 (28) for some kd4k ≤ 4, kd2k ≤ 5, and f(x) ≤ f(x ). Such an 0 Line 9 of Algorithm 1 is thus to solve the above equations approximate solution is denoted as F (x ; χ, ψ, 4, 5). k,r+1 for x . This can be achieved through an interior point The above approximate NLP solution oracle is realizable algorithm, which employs double-layer iterations to find the by NLP solvers where the tolerances of the KKT conditions solution. In the outer iteration, a barrier technique is used to are allowed to be specified by the user, e.g., the IPOPT solver convert the inequality constraints into an additional term in [60]. 
Under Assumption 6, the x-update step on Line 9 of the objective; the optima (or stationary points) of the resulting Algorithm 1 is replaced by an approximate NLP solution barrier problems converge to true optima (stationary points) k,r+1 k,r k,r k,r k,r as the barrier parameter converges to 0. In the inner iteration, x = F (x ; χ , ψ, 4 , 5 ), (32) a proper search method is used to obtain the optimum of the where the objective function in the current iteration is the part barrier problem. Since both the interior point algorithm and of barrier augmented Lagrangian Lbk that is related to x with the basic ADMM algorithm 1 have a double-layer structure, the indicator function IX (x) excluded: we consider matching these two layers. Cφ k,r X 3We use the word “oracle” with its typical meaning in mathematics and χ (x) =f(x) − bk ln(−φc(x)) computer science. An oracle refers to an ad hoc numerical or computational c=1 (33) procedure, regarded as a black box mechanism, to generate the needed results ρk 2 as its outputs based on some input information. + Ax + Bx¯k,r + zk,r + yk,r/ρk 2 7

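The two-layer structure formalized in Algorithm 2 below can be illustrated on a toy scalar consensus problem min (x−1)² + (x̄−3)² s.t. x − x̄ + z = 0, with A = 1, B = −1 and no φ, ψ constraints, so that the oracles F and G reduce to exact closed-form quadratic minimizations; the iteration counts and parameter values here are illustrative choices, not from the paper:

```python
# Scalar sketch of Algorithm 2 (ELLA): f(x) = (x-1)^2, g(xb) = (xb-3)^2,
# constraint x - xb + z = 0, with the slack z driven to 0 by the outer layer.
lam, beta = 0.0, 1.0                      # outer dual variable and penalty
omega, gamma = 0.5, 2.0                   # shrinking / amplifying ratios
x, xb, z, z_prev = 0.0, 0.0, 0.0, float("inf")
for k in range(15):                       # outer iterations
    rho = 2.0 * beta                      # Line 6
    y = -lam - beta * z                   # Line 8: lam + beta*z + y = 0
    for r in range(200):                  # inner (ADMM) iterations
        x = (2.0 + rho * (xb - z) - y) / (2.0 + rho)   # Line 10: exact F
        xb = (6.0 + rho * (x + z) + y) / (2.0 + rho)   # Line 11: exact G
        z = -rho / (rho + beta) * (x - xb + y / rho) - lam / (rho + beta)  # Line 12
        y = y + rho * (x - xb + z)        # Line 13
    lam = max(-10.0, min(10.0, lam + beta * z))        # Line 17: projected update
    if abs(z) > omega * z_prev:           # Lines 18-22: amplify beta if z stalls
        beta *= gamma
    z_prev = abs(z)
# x and xb approach the consensus optimum 2, and the slack z approaches 0.
```

Note how the outer multiplier update alone drives z to 0 here: at the inner fixed point, z = −(λ + y)/β, so λ ← λ + βz moves λ toward −y and shrinks the slack geometrically.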
1  Set: bound of dual variables [λ̲, λ̄], shrinking ratio of slack variables ω ∈ [0, 1), amplifying ratio of penalty parameter γ > 1, diminishing outer iteration tolerances {ε_1^k, ε_2^k, ε_3^k, ε_4^k, ε_5^k, ε_6^k}_{k=1}^∞ ↓ 0, diminishing barrier parameters {b^k}_{k=1}^∞ ↓ 0, terminating tolerances ε_1, …, ε_6 > 0;
2  Initialization: starting points x^0, x̄^0, z^0, dual variable within bounds λ^1 ∈ [λ̲, λ̄], penalty parameter β^1 > 0;
3  outer iteration count k ← 0;
4  while ε_4^k ≥ ε_4 or ε_5^k ≥ ε_5 or b^k ≥ ε_6 or stationarity criterion (26) is not met do
5      Set: diminishing tolerances {ε_4^{k,r}, ε_5^{k,r}}_{r=1}^∞ ↓ 0;
6      let ρ^k = 2β^k;
7      inner iteration count r ← 0;
8      Initialization: x^{k,0}, x̄^{k,0}, z^{k,0}, y^{k,0} satisfying λ^k + β^k z^{k,0} + y^{k,0} = 0;
9      while ε_4^{k,r} ≥ ε_4^k or ε_5^{k,r} ≥ ε_5^k or stopping criterion (22) is not met do
10         x^{k,r+1} = F(x^{k,r}; χ^{k,r}, ψ, ε_4^{k,r}, ε_5^{k,r}), where χ^{k,r} is given by (33);
11         x̄^{k,r+1} = G(B, A x^{k,r+1} + z^{k,r} + y^{k,r}/ρ^k, ρ^k), where G is given by (27);
12         z^{k,r+1} = −ρ^k/(ρ^k + β^k) (A x^{k,r+1} + B x̄^{k,r+1} + y^{k,r}/ρ^k) − λ^k/(ρ^k + β^k);
13         y^{k,r+1} = y^{k,r} + ρ^k (A x^{k,r+1} + B x̄^{k,r+1} + z^{k,r+1});
14         r ← r + 1;
15     end
16     (x^{k+1}, x̄^{k+1}, z^{k+1}, y^{k+1}) ← (x^{k,r}, x̄^{k,r}, z^{k,r}, y^{k,r});
17     λ^{k+1} = Π_{[λ̲,λ̄]}(λ^k + β^k z^k);
18     if ‖z^{k+1}‖ > ω‖z^k‖ then
19         β^{k+1} ← γβ^k;
20     else
21         β^{k+1} ← β^k;
22     end
23     k ← k + 1;
24 end
Algorithm 2: Approximate algorithm (ELLA).

This approximate algorithm with inexact x-minimization is summarized as Algorithm 2. The inner iterations are performed until ε_4^{k,r} and ε_5^{k,r} are lower than ε_4^k and ε_5^k, respectively, and (22) holds. The outer iterations are terminated when ε_4^k ≤ ε_4, ε_5^k ≤ ε_5, the barrier parameter is sufficiently small (b^k ≤ ε_6), and (26) holds.

Lemma 4 (Convergence of the approximate algorithm). Suppose that Assumptions 1–6 hold. For any outer iteration k, given any positive tolerances {ε_1^k, …, ε_5^k}, within a finite number of inner iterations r, the obtained solution satisfies

    d_1^k + d_4^k = ∇f(x^{k,r+1}) + Σ_{c=1}^{C_φ} μ_c^{k,r+1} ∇φ_c(x^{k,r+1})
                      + Σ_{c=1}^{C_ψ} ν_c^{k,r+1} ∇ψ_c(x^{k,r+1}) + A^⊤ y^{k,r+1},
    d_2^k ∈ ∂g(x̄^{k,r+1}) + N_X̄(x̄^{k,r+1}) + B^⊤ y^{k,r+1},
    0 = λ^k + β^k z^{k,r+1} + y^{k,r+1},
    d_3^k = A x^{k,r+1} + B x̄^{k,r+1} + z^{k,r+1},
    d_5^k = ψ(x^{k,r+1}),
    −b^k = μ_c^{k,r+1} φ_c(x^{k,r+1}),  c = 1, …, C_φ,                         (34)

for some d_1^k, …, d_5^k with ‖d_1^k‖ ≤ ε_1^k, …, ‖d_5^k‖ ≤ ε_5^k. Then, supposing that the outer iteration tolerances {ε_1^k, …, ε_5^k} and barrier parameters b^k are diminishing with increasing k, given any terminating tolerances ε_1, …, ε_6 > 0, within a finite number of outer iterations, Algorithm 2 finds a point (x^{k+1}, x̄^{k+1}, z^{k+1}, y^{k+1}, μ^{k+1}, ν^{k+1}) satisfying

    d_1 + d_4 = ∇f(x^{k+1}) + Σ_{c=1}^{C_φ} μ_c^{k+1} ∇φ_c(x^{k+1})
                  + Σ_{c=1}^{C_ψ} ν_c^{k+1} ∇ψ_c(x^{k+1}) + A^⊤ y^{k+1},
    d_2 ∈ ∂g(x̄^{k+1}) + N_X̄(x̄^{k+1}) + B^⊤ y^{k+1},
    0 = λ^k + β^k z^{k+1} + y^{k+1},
    d_3 = A x^{k+1} + B x̄^{k+1},
    d_5 = ψ(x^{k+1}),
    −d_6 = μ_c^{k+1} φ_c(x^{k+1}),  c = 1, …, C_φ,                             (35)

for some d_1, …, d_6 with ‖d_j‖ ≤ ε_j, j = 1, …, 5, and d_6 ∈ (0, ε_6].

The proof is self-evident following the techniques in the proofs of Lemma 1, Corollary 1 and Lemma 2 given in Appendix A to Appendix C. The conditions (34) indicate an (ε_1^k, …, ε_5^k)-approximate stationary point to the relaxed barrier problem

    min_{x, x̄, z} f(x) + g(x̄) − b^k Σ_{c=1}^{C_φ} ln(−φ_c(x))
    s.t. A x + B x̄ + z = 0,  ψ(x) = 0,  x̄ ∈ X̄,                               (36)

and the condition (35) gives an (ε_1, …, ε_6)-approximate stationary point to the original problem (20).

C. Accelerated algorithm

The key factor restricting the rate of convergence is the y-update, which is not a full or approximate maximization but only one step of subgradient ascent. As was proved in Lemma 3, such a subgradient ascent approach for nonconvex problems leads to a number of inner iterations proportional to the inverse squared error. Here, by modifying the Anderson acceleration scheme in [40], we propose an accelerated algorithm. Let us make the following assumption regarding our choice of tolerances ε_4^{k,r} and ε_5^{k,r}.

Assumption 7. Suppose that we choose a continuous and strictly monotonically increasing function π : [0, ∞) → [0, ∞) with π(0) = 0 such that ε_5^{k,r} = π(ε_4^{k,r}), and choose ε_4^{k,r+1} proportional to ‖ρ^k A^⊤ (B x̄^{k,r+1} − B x̄^{k,r} + z^{k,r+1} − z^{k,r})‖ when such a value is strictly smaller than the previous tolerance ε_4^{k,r} but not smaller than the ultimate one ε_4^k.

The choice of the function π relating the stationarity tolerance and the equality tolerance in the NLP subroutine is aimed at balancing the effort to reduce both errors. The choice of ε_4^{k,r+1} is based on the following rationale. After the r-th inner iteration, the obtained solution x^{k,r+1} satisfies the approximate stationarity condition

    d_4^{k,r+1} = ∇f(x^{k,r+1}) + Σ_{c=1}^{C_φ} μ_c^{k,r+1} ∇φ_c(x^{k,r+1})
                    + Σ_{c=1}^{C_ψ} ν_c^{k,r+1} ∇ψ_c(x^{k,r+1})
                    + ρ^k A^⊤ (A x^{k,r+1} + B x̄^{k,r} + z^{k,r} + y^{k,r}/ρ^k)   (37)

for some d_4^{k,r+1} with a modulus not exceeding ε_4^{k,r}, and μ^{k,r+1} satisfying μ_c^{k,r+1} φ_c(x^{k,r+1}) = −b^k, c = 1, …, C_φ. Using the formula for the y-update (Line 13, Algorithm 2), we rearrange the above equation to obtain

    d_4^{k,r+1} + ρ^k A^⊤ (B x̄^{k,r+1} − B x̄^{k,r} + z^{k,r+1} − z^{k,r})
      = ∇f(x^{k,r+1}) + Σ_{c=1}^{C_φ} μ_c^{k,r+1} ∇φ_c(x^{k,r+1})
          + Σ_{c=1}^{C_ψ} ν_c^{k,r+1} ∇ψ_c(x^{k,r+1}) + A^⊤ y^{k,r+1}.         (38)

Hence after the update of the x̄, z and y variables, the violation of the stationarity condition is bounded by ε_4^{k,r} + ‖ρ^k A^⊤ (B x̄^{k,r+1} − B x̄^{k,r} + z^{k,r+1} − z^{k,r})‖. Therefore, ε_4^{k,r+1} should be balanced with the second term, which, however, is computable only after the x̄- and z-updates following the x-update, and is hence assigned to ε_4^{k,r+1}.

We note from Algorithm 2 that under Assumption 7, each inner iteration r is a mapping from (x^{k,r}, x̄^{k,r}, z^{k,r}, y^{k,r}, ε_4^{k,r}) to (x^{k,r+1}, x̄^{k,r+1}, z^{k,r+1}, y^{k,r+1}, ε_4^{k,r+1}). In fact, despite the dependence of the latter variables on x^{k,r} and ε_4^{k,r}, such dependence can be ignored in the sense that the descent of the barrier augmented Lagrangian L_{b^k} will always guide the sequence of intermediate solutions towards the set of ε_4^k-approximate stationary points of the relaxed barrier problem (36). Using Lemma 1 with the augmented Lagrangian substituted by the barrier augmented Lagrangian, it immediately follows that under the approximate algorithm, the sequence {(x̄^{k,r}, z^{k,r})}_{r=1}^∞ converges to a fixed point, and the convergence of {y^{k,r}} accompanies the convergence of {z^{k,r}} due to (77). It is thus clear that we may resort to the Anderson acceleration introduced in Subsection II-C by denoting w = (x̄, z), the iteration as a mapping h′, and h(w) = w − h′(w), and collecting at the r-th inner iteration the following multi-secant information about the previous m inner iterations:

    Δ_w^{k,r} = [δ_w^{k,r−m} … δ_w^{k,r−1}],  Δ_h^{k,r} = [δ_h^{k,r−m} … δ_h^{k,r−1}],   (39)

where δ_w^{r−m′} = w^{k,r−m′+1} − w^{k,r−m′} and δ_h^{r−m′} = h(w^{k,r−m′+1}) − h(w^{k,r−m′}), m′ = m − 1, …, 0.

However, the possibility that Δ_w may not be of full rank and H may be singular requires certain modifications to the original acceleration scheme. The following technique was used in [40]. First, it can be shown that the matrix H defined in (10) can be constructed in an inductive way, starting from H_{k,r}^0 = I, by rank-one updates

    H_{k,r}^{m′+1} = H_{k,r}^{m′} + (δ_h^{k,r−m+m′} − H_{k,r}^{m′} δ_w^{k,r−m+m′}) (δ̂_w^{k,r−m+m′})^⊤ / ((δ̂_w^{k,r−m+m′})^⊤ δ_w^{k,r−m+m′})   (40)

for m′ = 0, …, m − 1, with H_{k,r} = H_{k,r}^m, where δ̂_w^{k,r−m}, …, δ̂_w^{k,r−1} are obtained from δ_w^{k,r−m}, …, δ_w^{k,r−1} through Gram–Schmidt orthogonalization. To ensure the invertibility of H_{k,r}, the δ_h vector in (40) is perturbed to

    δ̃_h^{k,r−m+m′} = (1 − θ_{k,r}^{m′}) δ_h^{k,r−m+m′} + θ_{k,r}^{m′} δ_w^{k,r−m+m′},   (41)

where the perturbation magnitude θ_{k,r}^{m′} is determined by

    θ_{k,r}^{m′} = ϕ( (δ̂_w^{k,r−m+m′})^⊤ (H_{k,r}^{m′})^{−1} δ_h^{k,r−m+m′} / ‖δ̂_w^{k,r−m+m′}‖² ; η_θ )   (42)

with regularization hyperparameter η_θ ∈ (0, 1). The function ϕ(θ; η) is defined by

    ϕ(θ; η) = (η sign θ − θ)/(1 − θ) if |θ| ≤ η;  ϕ(θ; η) = 0 if |θ| > η.      (43)

Using the Sherman–Morrison formula for inverting the rank-one update, H_{k,r}^{−1} can be induced from (H_{k,r}^0)^{−1} = I according to

    (H_{k,r}^{m′+1})^{−1} = (H_{k,r}^{m′})^{−1}
      + (δ_w^{k,r−m+m′} − (H_{k,r}^{m′})^{−1} δ̃_h^{k,r−m+m′}) (δ̂_w^{k,r−m+m′})^⊤ (H_{k,r}^{m′})^{−1}
        / ((δ̂_w^{k,r−m+m′})^⊤ (H_{k,r}^{m′})^{−1} δ̃_h^{k,r−m+m′}).            (44)

To avoid rank deficiency of Δ_w, a restart checking strategy is used, where the memory is cleared when the Gram–Schmidt orthogonalization becomes ill conditioned (‖δ̂_w^{k,r}‖ < η_w ‖δ_w^{k,r}‖ for some η_w ∈ (0, 1)) or the memory exceeds a maximum M; otherwise the memory is allowed to grow. Hence the Anderson acceleration is well conditioned.

Lemma 5 (Well-conditioning of Anderson acceleration, [40]). Using the regularization and restart checking techniques, it is guaranteed that

    ‖H_{k,r}^{−1}‖_2 ≤ θ^{−M} [3(1 + θ + η_w)^M η_w^{−N}]^{N−1} − 2 < +∞        (45)

where M is the maximum number of steps in the memory and N is the dimension of w.

A well-conditioned Anderson acceleration is not yet sufficient to guarantee convergence. Hence we employ a safeguarding technique modified from [40], which aims at suppressing too large an increase in the barrier augmented Lagrangian by rejecting such acceleration steps. When Anderson acceleration suggests an update from w = (x̄, z) to w̃ = (x̄̃, z̃) under the current value of Ax, the resulting Lagrangian increase is calculated as

    L̃^k(w, w̃; Ax) = g(x̄̃) − g(x̄) + λ^{k⊤}(z̃ − z) + (β^k/2)(‖z̃‖² − ‖z‖²)
                       + ỹ^⊤(Ax + B x̄̃ + z̃) − y^⊤(Ax + B x̄ + z)
                       + (ρ^k/2)(‖Ax + B x̄̃ + z̃‖² − ‖Ax + B x̄ + z‖²),           (46)

where y and ỹ are calculated by

    y = −λ^k − β^k z,  ỹ = −λ^k − β^k z̃,                                       (47)

which results from Lines 12–13 of Algorithm 2. We require that such a change, if positive, must not exceed an upper bound:

    L̃^k(w, w̃; Ax) ≤ L̃_0 η_L (R_+ + 1)^{−(1+σ)},                               (48)

where L̃_0 is the expected Lagrangian decrease after the first non-accelerated iteration after initialization according to Lemma 1, used as a scale for the change in the barrier augmented Lagrangian:

    L̃_0 = β^k ‖B x̄^{k,1} − B x̄^{k,0}‖² + (β^k/2) ‖z^{k,1} − z^{k,0}‖²,        (49)

where η_L, σ > 0 are hyperparameters, and R_+ is the number of already accepted acceleration steps. With safeguarding, it can be guaranteed that the barrier augmented Lagrangian always stays bounded, since Σ_{R_+=0}^∞ (R_+ + 1)^{−(1+σ)} < +∞. We also require that the acceleration should not lead to a drastic change in w:

    ‖w̃ − w‖² ≤ (L̃_0 η_w̃ / β^k) · 1/√(1 + R_+),                               (50)

where η_w̃ > 0 is a hyperparameter. The factor 1/√(1 + R_+) reflects an expected change according to the plain ADMM iteration, which is used to suppress disproportionately large deviations due to Anderson acceleration.

Finally, the accelerated algorithm using the Anderson acceleration technique for the fixed-point iteration of (x̄, z) is summarized as Algorithm 3. This is our final ELLADA algorithm, whose distributed implementation will be briefly discussed in the next subsection. With a well-conditioned H_{k,r}^{−1} matrix and a bounded barrier augmented Lagrangian, its convergence can now be guaranteed by the following lemma, the proof of which is given in Appendix E.

Lemma 6 (Convergence under Anderson acceleration). Suppose that Assumptions 1–7 hold. Under regulated and safeguarded Anderson acceleration, Algorithm 3 finds within a finite number of inner iterations r a point satisfying (34). The convergence of the outer iterations to an approximate stationary point satisfying (35) hence follows.

Summarizing the conclusions of all the previous lemmas in this section, we have arrived at the following theorem.

Theorem 1. Suppose that the following assumptions hold:
1) Function f is lower bounded on X;
2) Function g is convex and lower bounded on X̄;
3) Initialization of the outer iterations allows a uniform upper bound on the augmented Lagrangian, e.g., a feasible solution is known a priori;
4) Minimization of g(x̄) + (ρ/2)‖B x̄ + v‖² with respect to x̄ allows an oracle G(B, v, ρ) returning a unique solution for any v of appropriate dimension and ρ > 0;
5) Functions f, φ, and ψ are continuously differentiable, and the constraints (φ, ψ) are strictly feasible;
6) There exists a solver for equality-constrained NLP to any specified tolerances of the KKT conditions.
Then given any tolerances ε_1, …, ε_6 > 0, the ELLADA algorithm (Algorithm 3) gives an (ε_1, …, ε_6)-approximate KKT point satisfying the conditions (35).

If the problem itself has intrinsically better properties guaranteeing that each KKT point is a local minimum, e.g., the second-order sufficient condition [46, §4.3.2], then the algorithm converges to a local minimum. Of course, it is well known that certifying a local minimum is itself a difficult problem.

IV. IMPLEMENTATION ON DISTRIBUTED NONLINEAR MPC

Consider a nonlinear discrete-time dynamical system

    x(t + 1) = f(x(t), u(t)),                                                  (51)

where x(t) ∈ R^n and u(t) ∈ R^m are the vectors of states and inputs, respectively, for t = 0, 1, 2, …, and f : R^n × R^m → R^n. Suppose that at time t we have the current states x = x(t); then in MPC, the control inputs are determined by the following optimal control problem:

    min J = Σ_{τ=t}^{t+T−1} ℓ(x̂(τ), û(τ)) + ℓ^f(x̂(t + T))
    s.t. x̂(τ + 1) = f(x̂(τ), û(τ)),  τ = t, …, t + T − 1,
         p(x̂(τ), û(τ), τ) ≤ 0,  τ = t, …, t + T − 1,
         q(x̂(τ), û(τ), τ) = 0,  τ = t, …, t + T − 1,
         x̂(t) = x.                                                            (52)

In the above formulation, the optimization variables x̂(τ) and û(τ) represent the predicted states and inputs in a future horizon {t, t+1, …, t+T} with length T ∈ N. The predicted trajectory is constrained by the dynamics (51) as well as some additional path constraints p, q, such as bounds on the inputs and states or a Lyapunov descent condition to enforce stability. The functions ℓ and ℓ^f are called the stage cost and terminal cost, respectively. By solving (52), one executes u(t) = û(t). For simplicity it is assumed here that the states are observable; otherwise, the states can be estimated using an optimization formulation such as moving horizon estimation (MHE). For continuous-time systems, collocation techniques can be used to discretize the resulting optimal control problem into a finite-dimensional one.

Now suppose that the system (51) is large-scale, with its states and inputs decomposed into n subsystems: x = [x_1^⊤, x_2^⊤, …, x_n^⊤]^⊤, u = [u_1^⊤, u_2^⊤, …, u_n^⊤]^⊤, and that the optimal control problem should be solved by the corresponding n agents, each containing the model of its own subsystem:

    x_i(τ + 1) = f_i(x_i(τ), u_i(τ), {x_{ji}(τ), u_{ji}(τ)}_{j∈P(i)}),          (53)

where {x_{ji}, u_{ji}} stands for the states and inputs in subsystem j (i.e., components of x_j and u_j) that appear in the arguments of f_i, which comprises the components of f corresponding to the i-th subsystem. P(i) is the collection of subsystems j that have some inputs and outputs influencing subsystem i. We assume that the cost functions and the path constraints are separable:

    ℓ(x̂, û) = Σ_{i=1}^n ℓ_i(x̂_i, û_i),  ℓ^f(x̂) = Σ_{i=1}^n ℓ_i^f(x̂_i),
    p(x̂, û, τ) = [p_1(x̂_1, û_1, τ)^⊤, …, p_n(x̂_n, û_n, τ)^⊤]^⊤,
    q(x̂, û, τ) = [q_1(x̂_1, û_1, τ)^⊤, …, q_n(x̂_n, û_n, τ)^⊤]^⊤.              (54)

A. Formulation on directed and bipartite graphs

To better visualize the problem structure and systematically reformulate the optimal control problem (52) into the distributed optimization problem in the form of (21) for the implementation of the ELLADA algorithm, we introduce some graph-theoretic descriptions of optimization problems [7]. For problem (52), we first define a directed graph (digraph), which is a straightforward characterization of the relation of mutual impact among the subsystem models.

Definition 1 (Digraph). The digraph of system (51) under the decomposition x = [x_1^⊤, x_2^⊤, …, x_n^⊤]^⊤ and u = [u_1^⊤, u_2^⊤, …, u_n^⊤]^⊤ is G_1 = {V_1, E_1} with nodes V_1 =

1  Set: dual bounds [λ̲, λ̄], outer iteration parameters ω ∈ [0, 1), γ > 1, {ε_1^k, ε_2^k, ε_3^k, ε_4^k, ε_6^k}_{k=1}^∞ ↓ 0, {b^k}_{k=1}^∞ ↓ 0, final tolerances ε_1, ε_2, ε_3, ε_4, ε_6 > 0, function π, acceleration parameters θ ∈ (0, 1), σ > 0, η_θ > 0, η_w ∈ (0, 1), η_L > 0, η_w̃ > 0, M ∈ N. Let ε_5 = π(ε_4);
2  Initialization: starting points x^0, x̄^0, z^0, λ^1 ∈ [λ̲, λ̄], penalty parameter β^1 > 0, ε_5^0 = π(ε_4^0);
3  Outer iteration count k ← 0;
4  while ε_4^k ≥ ε_4 or ε_5^k ≥ ε_5 or b^k ≥ ε_6 or stationarity criterion (26) is not met do
5      Set: initial tolerances ε_4^{k,0}, ε_5^{k,0} = π(ε_4^{k,0}), penalty ρ^k = 2β^k, Jacobian estimate H_{k,0}^{−1} = I;
6      Inner iteration count r ← 0, count of accelerated steps R_+^k = 0, memory length m ← 0;
7      Initialization: x^{k,0}, x̄^{k,0}, z^{k,0}, y^{k,0} satisfying λ^k + β^k z^{k,0} + y^{k,0} = 0;
8      while ε_4^{k,r} ≥ ε_4^k or ε_5^{k,r} ≥ ε_5^k or stopping criterion (22) is not met do
9          x^{k,r+1} = F(x^{k,r}; χ^{k,r}, ψ, ε_4^{k,r}, ε_5^{k,r}), where χ^{k,r} is given by (33);
10         x̄^{k,r+1} = G(B, A x^{k,r+1} + z^{k,r} + y^{k,r}/ρ^k, ρ^k), where G is given by (27);
11         z^{k,r+1} = −ρ^k/(ρ^k + β^k) (A x^{k,r+1} + B x̄^{k,r+1} + y^{k,r}/ρ^k) − λ^k/(ρ^k + β^k);
12         y^{k,r+1} = y^{k,r} + ρ^k (A x^{k,r+1} + B x̄^{k,r+1} + z^{k,r+1});
13         ỹ^{k,r} = −λ^k − β^k z̃^{k,r};
14         x̃^{k,r+1} = F(x^{k,r}; χ̃^{k,r}, ψ, ε_4^{k,r}, ε_5^{k,r}), with χ̃^{k,r} as in (33) with x̄, z, y replaced by x̄̃, z̃, ỹ;
15         x̄̃^{k,r+1} = G(B, A x̃^{k,r+1} + z̃^{k,r} + ỹ^{k,r}/ρ^k, ρ^k);
16         z̃^{k,r+1} = −ρ^k/(ρ^k + β^k) (A x̃^{k,r+1} + B x̄̃^{k,r+1} + ỹ^{k,r}/ρ^k) − λ^k/(ρ^k + β^k);
17         if r = 0 then
18             w̃^{k,1} ← (x̄^{k,1}, z^{k,1}), and calculate L̃_0 by (49);
19         else
20             δ_w^{k,r−1} = w̃^{k,r} − w^{k,r−1},  δ_h^{k,r−1} = w̃^{k,r} − w̃^{k,r+1} − w^{k,r−1} + w^{k,r},  m ← m + 1;
21             δ̂_w^{k,r−1} = δ_w^{k,r−1} − Σ_{m′=2}^{m} ((δ̂_w^{k,r−m′})^⊤ δ_w^{k,r−1} / ‖δ̂_w^{k,r−m′}‖²) δ̂_w^{k,r−m′};
22             if m = M + 1 or ‖δ̂_w^{k,r−1}‖/‖δ_w^{k,r−1}‖ < η_w then m ← 0, δ̂_w^{k,r−1} ← δ_w^{k,r−1}, and H_{k,r−1}^{−1} ← I;
23             Compute δ̃_h^{k,r−1} by (41) with m′ = m − 1 and θ_{k,r−1}^{m′} = ϕ((δ̂_w^{k,r−1})^⊤ H_{k,r−1}^{−1} δ_h^{k,r−1} / ‖δ̂_w^{k,r−1}‖²; η_θ);
24             Update H_{k,r}^{−1} = H_{k,r−1}^{−1} + (δ_w^{k,r−1} − H_{k,r−1}^{−1} δ̃_h^{k,r−1}) (δ̂_w^{k,r−1})^⊤ H_{k,r−1}^{−1} / ((δ̂_w^{k,r−1})^⊤ H_{k,r−1}^{−1} δ̃_h^{k,r−1}), and suggest w̃^{k,r+1} = w^{k,r} − H_{k,r}^{−1} (w^{k,r} − w̃^{k,r+1});
25             if L̃^k(w^{k,r}, w̃^{k,r+1}; A x^{k,r}) / (L̃_0 η_L (R_+ + 1)^{−(1+σ)}) ≤ 1 and ‖w̃^{k,r+1} − w^{k,r}‖² ≤ (L̃_0 η_w̃ / β^k) / √(R_+ + 1) then accept the acceleration w^{k,r+1} ← w̃^{k,r+1}, and let y^{k,r+1} ← −λ^k − β^k z̃^{k,r+1};
26         end
27         ε_4^{k,r+1} = ‖ρ^k A^⊤ (B x̄^{k,r+1} − B x̄^{k,r} + z^{k,r+1} − z^{k,r})‖,  ε_5^{k,r+1} = π(ε_4^{k,r+1});
28         r ← r + 1;
29     end
30     (x^{k+1}, x̄^{k+1}, z^{k+1}, y^{k+1}) ← (x^{k,r}, x̄^{k,r}, z^{k,r}, y^{k,r});
31     Update λ^{k+1} = Π_{[λ̲,λ̄]}(λ^k + β^k z^k) and β^{k+1} according to (19);
32     k ← k + 1;
33 end
Algorithm 3: Accelerated algorithm (ELLADA).
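The multi-secant machinery of Lines 20–24 — the rank-one inverse update (44) with Gram–Schmidt restart checking (Lemma 5) — can be sketched on a small linear fixed-point map. This is a simplified illustration: the perturbation (41)–(43) and the safeguarding test of Line 25 are omitted, and the test map, thresholds, and iteration counts are assumptions of the sketch:

```python
import numpy as np

def anderson_update(Hinv, mem, dw, dh, eta_w=0.05, M=10):
    """One memory step of the type-I Anderson scheme: orthogonalize the new
    secant dw against the memory, restart if ill conditioned, otherwise apply
    the Sherman-Morrison rank-one update (44) to the inverse Jacobian estimate."""
    dw_hat = dw.copy()
    for q in mem:                                   # Gram-Schmidt vs. memory
        dw_hat = dw_hat - (q @ dw_hat) * q
    if len(mem) >= M or np.linalg.norm(dw_hat) < eta_w * np.linalg.norm(dw):
        return np.eye(len(dw)), []                  # restart: clear memory, Hinv = I
    mem = mem + [dw_hat / np.linalg.norm(dw_hat)]
    u = Hinv @ dh
    Hinv = Hinv + np.outer(dw - u, dw_hat @ Hinv) / (dw_hat @ u)
    return Hinv, mem

# Accelerated iteration w <- w - Hinv h(w) on the residual h(w) = w - h'(w)
# of a contractive linear fixed-point map h'(w) = G w + b.
G = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.1], [0.1, 0.0, 0.3]])
b = np.array([1.0, -2.0, 0.5])
h = lambda w: w - (G @ w + b)
w, Hinv, mem = np.zeros(3), np.eye(3), []
for _ in range(30):
    if np.linalg.norm(h(w)) < 1e-12:
        break
    w_new = w - Hinv @ h(w)
    Hinv, mem = anderson_update(Hinv, mem, w_new - w, h(w_new) - h(w))
    w = w_new
```

Because the residual here is linear, the inverse estimate matches (I − G)⁻¹ on the span of the stored secants, so the iterate converges to the fixed point far faster than the plain contraction would.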

{1, 2, …, n} and edges E_1 = {(j, i) | j ∈ P(i)}. If (j, i) ∈ E_1, i.e., j ∈ P(i), we say that j is a parent of i and i is a child of j (denoted as i ∈ C(j)).

Then under the decomposition, (52) can be written as

    min Σ_{i∈V_1} J_i = Σ_{i∈V_1} [ Σ_{τ=t}^{t+T−1} ℓ_i(x̂_i(τ), û_i(τ)) + ℓ_i^f(x̂_i(t + T)) ]
    s.t. x̂_i(τ + 1) = f_i(x̂_i(τ), û_i(τ), {x̂_{ji}(τ), û_{ji}(τ)}_{j∈P(i)}),  τ = t, …, t + T − 1, i ∈ V_1,
         p_i(x̂_i(τ), û_i(τ), τ) ≤ 0,  τ = t, …, t + T − 1, i ∈ V_1,
         q_i(x̂_i(τ), û_i(τ), τ) = 0,  τ = t, …, t + T − 1, i ∈ V_1,
         x̂_i(t) = x_i,  i ∈ V_1.                                              (55)

We denote the variables of the i-th agent as

    ξ_i = [x̂_i(t)^⊤, û_i(t)^⊤, …, x̂_i(t + T − 1)^⊤, û_i(t + T − 1)^⊤, x̂_i(t + T)^⊤,
           {x̂_{ji}(t)^⊤, û_{ji}(t)^⊤}_{j∈P(i)}, …, {x̂_{ji}(t + T − 1)^⊤, û_{ji}(t + T − 1)^⊤}_{j∈P(i)}]^⊤,   (56)

in which the variables related to the j-th subsystem are denoted as ξ_{ji}. Since ξ_{ji} is a part of the predicted states and inputs from subsystem j, i.e., some components of ξ_j, let the interactions between the parent j and the child i be captured by a matrix D⃗_{ji} with exactly one unit entry (“1”) on every row: ξ_{ji} = D⃗_{ji} ξ_j, where the right arrow represents the impact of the parent subsystem j on the child subsystem i. By denoting the model and path constraints in agent i as ξ_i ∈ Ξ_i, the optimal control problem (52) is expressed in a compact way as follows:

    min Σ_{i∈V_1} J_i(ξ_i)
    s.t. ξ_i ∈ Ξ_i, i ∈ V_1,  ξ_{ji} = D⃗_{ji} ξ_j, (j, i) ∈ E_1.              (57)

This is an optimization problem defined on a directed graph. An illustration for a simple case when E_1 = {(1, 2), (2, 3), (3, 1)} is shown in Fig. 2(a).

Although it is natural to represent the interactions among the subsystems in a digraph, performing distributed optimization on digraphs, where the agents communicate among themselves without a coordinator, can be challenging. For example, it is known that the ADMM algorithm, which behaves well for distributed optimization with 2 blocks of variables, can become divergent when directly extended to multi-block problems [61]. Hence we construct such a 2-block architecture by using a bipartite graph.

Definition 2 (Bipartite graph). The bipartite graph G_2 of system (51) is constructed from the digraph G_1 by taking both the nodes and edges as the new nodes, and adding an edge between i ∈ V_1 and e ∈ E_1 if i is the head or tail of e in the digraph, i.e., G_2 = (V_2, E_2) with V_2 = V_1 ∪ E_1, E_2 = {(i, e) | i ∈ V_1, e ∈ E_1, e = (i, j), j ∈ C(i) or e = (j, i), j ∈ P(i)}.

Such a graph is bipartite since any edge is between a node of V_1 and a node of E_1.

We note that the last line of (57) corresponds to the digraph edges E_1. In the bipartite graph, these edges become nodes, and hence new groups of variables should be associated with them. For this purpose, we simply need to pull out ξ_{ji} as overlapping variables ξ̄_{ji}, and add the constraint that ξ̄_{ji} are some selected components of ξ_i, ξ̄_{ji} = D⃖_{ji} ξ_i:

    min Σ_{i∈V_1} J_i(ξ_i)
    s.t. ξ_i ∈ Ξ_i, i ∈ V_1,  ξ̄_{ji} = D⃗_{ji} ξ_j = D⃖_{ji} ξ_i, (j, i) ∈ E_1.   (58)

In (58), variables ξ_i (i ∈ V_1) and ξ̄_{ji} ((j, i) ∈ E_1) are defined on the nodes of the bipartite graph, and the constraints captured by the matrices D⃗_{ji} and D⃖_{ji} correspond to the bipartite edges (j, (j, i)) and (i, (j, i)), respectively. We may also write the last line of (58) as

    ξ̄_e = D_{ie} ξ_i,  (i, e) ∈ E_2.                                          (59)

Therefore (58) is an optimization problem on the bipartite graph. An illustration is given in Fig. 2(b). Under this reformulation, the problem structure becomes a 2-block one – the distributed agents i = 1, …, n manage the decision variables ξ_i, i ∈ V_1, in parallel without interference, and the coordinator regulates the agents by using the overlapping variables ξ̄_e, e ∈ E_1.

B. Reformulation with slack variables

It is known that a key condition for distributed optimization in the context of the ADMM algorithm to converge is that one block of variables can always be made feasible given the other block [26]. Unfortunately this condition is not always met by problem (58). For example, given ξ_1 and ξ_2, there may not be a ξ̄_12 satisfying both ξ̄_12 = D⃗_12 ξ_1 and ξ̄_12 = D⃖_12 ξ_2. To deal with this issue, it was proposed to associate with each linear constraint in (58), namely each edge in the bipartite graph, a slack variable ζ_{ie} (e.g., [30]):

    min Σ_{i∈V_1} J_i(ξ_i)
    s.t. ξ_i ∈ Ξ_i, i ∈ V_1,
         D_{ie} ξ_i − ξ̄_e + ζ_{ie} = 0, (i, e) ∈ E_2,
         ζ_{ie} = 0, (i, e) ∈ E_2.                                             (60)

Similar to the notation for D, we write ζ_{ie} as ζ⃗_{ij} if e = (i, j) and ζ⃖_{ij} if e = (j, i). Such a problem structure is graphically illustrated in Fig. 2(c).

Finally, we stack all the subscripted variables into ξ, ξ̄, ζ in a proper ordering of i ∈ V_1, e ∈ E_1, and (i, e) ∈ E_2. The matrices D_{ie} are stacked in a block diagonal pattern in the same ordering of (i, e) ∈ E_2 into A. The appearance of ξ̄_e in the equality constraints is represented by a matrix B (satisfying B^⊤B = 2I). We write the objective function as J(ξ), and the set constraints Ξ_i are lumped into a Cartesian product Ξ = ×_{i∈V_1} Ξ_i. Finally, we reach a compact formulation for (60):

    min J(ξ)  s.t.  ξ ∈ Ξ,  Aξ + Bξ̄ + ζ = 0,  ζ = 0.                          (61)

Such an architecture is shown in Figs. 2(d) and 2(e). The variables ξ̄ and ζ belong to the coordinator (marked in red), and ξ is in the distributed agents.

Fig. 2. Graphical illustrations of the problem structure of distributed MPC.

C. Implementation of ELLADA

Clearly, the optimal control problem formulated as (60) is a special form of (21) with ξ, ξ̄ and ζ rewritten as x, x̄ and z, respectively, with g(x̄) = 0 and X̄ equal to the entire Euclidean space. As long as the cost function J is lower bounded (e.g., a quadratic cost), Algorithm 3 is applicable to (60), where the operations on x̄, z, y are performed by the coordinator, and the operations on x are handled by the distributed agents. Specifically,
• The update steps of x̄, z, y (Lines 10–13, 15, 16) and the entire Anderson acceleration (Lines 17–26) belong
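The stacking just described can be carried out mechanically. The sketch below uses the three-subsystem cycle E_1 = {(1, 2), (2, 3), (3, 1)} of Fig. 2 and assumes, for illustration only, a single scalar shared variable per edge, so that every selection matrix D_ie is simply [1]:

```python
import numpy as np

# Assemble the stacked matrices A and B of (61) for E1 = {(1,2),(2,3),(3,1)},
# assuming one scalar agent variable xi_i and one scalar overlap variable
# xibar_e per edge (each D_ie = [1]).  Rows follow the ordering of (i,e) in E2.
E1 = [(1, 2), (2, 3), (3, 1)]
E2 = [(i, e) for e in E1 for i in e]           # bipartite edges (i, e)
n, m = 3, len(E1)                              # agents, overlap variables
A = np.zeros((len(E2), n))                     # block-diagonal stack of D_ie
B = np.zeros((len(E2), m))                     # appearance of -xibar_e per row
for row, (i, e) in enumerate(E2):
    A[row, i - 1] = 1.0                        # D_ie xi_i
    B[row, E1.index(e)] = -1.0                 # -xibar_e
```

Each ξ̄_e appears in exactly two rows, one per endpoint of the edge e, which is precisely why B satisfies B^⊤B = 2I as noted above.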

to the coordinator. The updates of the penalty parameters and the outer-layer dual variables λ (Line 31) should also be performed by the coordinator. The conditions for ε_1^k, ε_2^k, ε_3^k and ε_1, ε_2, ε_3 are checked by the coordinator.
• The distributed agents are responsible for carrying out the trial x-update step for the Anderson acceleration (Line 9) as well as the plain x-update (Line 14). The conditions and updates for ε_4^{k,r}, ε_5^{k,r}, ε_4^k, ε_5^k, and ε_4, ε_5, ε_6 are checked by the agents.

When executing the updates, the agents need the values of B x̄ + z + y/ρ to add to Ax, and the coordinator needs the value of Ax from the agents. When the variables x are distributed into agents x_1, …, x_n, and the equality constraints between the agents and the coordinator are expressed on a bipartite graph:

    D_{ie} x_i − x̄_e + z_{ie} = 0,  (i, e) ∈ E_2,                             (62)

the communication of Ax and B x̄ + z + y/ρ takes place in a distributed and parallel way, i.e., the i-th agent obtains the information of −x̄_e + z_{ie} + y_{ie}/ρ for all e such that (i, e) ∈ E_2 from the coordinator. The coordinator, based on the inter-subsystem edges e in the digraph, obtains the information of D_{ie} x_i from all related agents i. When the objective function and X are separable, f(x) = Σ_{i=1}^n f_i(x_i), X = X_1 × ⋯ × X_n, then based on such distributed and parallel communication, the optimization problem

    min f(x) − b Σ_{c=1}^{C_φ} ln(−φ_c(x)) + (ρ/2) ‖Ax + B x̄ + z + y/ρ‖²
    s.t. ψ(x) = 0                                                              (63)

in an x-update step can be solved in a distributed and parallel manner:

    min_{x_i} f_i(x_i) − b Σ_{c=1}^{C_{φ,i}} ln(−φ_{c,i}(x_i))
                + (ρ/2) Σ_{e | (i,e)∈E_2} ‖D_{ie} x_i − x̄_e + z_{ie} + y_{ie}/ρ‖²
    s.t. ψ_i(x_i) = 0,    in parallel for each agent i.                        (64)

Similarly, the x̄-update with the G-mapping is performed in parallel for its components e, if X̄ is separable, i.e., if X̄ is a closed hypercube (whether bounded or unbounded), and if g is also separable. That is, the x̄-update can be expressed as

    min_{x̄_e} g_e(x̄_e) + (ρ/2) Σ_{i | (i,e)∈E_2} ‖D_{ie} x_i − x̄_e + z_{ie} + y_{ie}/ρ‖²
    s.t. x̄_e ∈ X̄_e,    in parallel for each e.                                (65)

The z- and y-updates are in parallel for the edges (i, e) of the bipartite graph.

In Algorithm 3, the procedures are written such that in each iteration, the update steps are carried out in sequence. This requires a synchronization of all the agents i and the coordinating elements e and (i, e). For example, for the x-update, every distributed agent needs to create a “finish” signal after solving for x_i in (64) and send it to the coordinator. Only after the coordinator receives the “finish” signals from all the distributed agents can the x̄-update be carried out. Due to the possible computational imbalance among the agents and the coordinator, such synchronization implies that faster updates must idle for some time to wait for slower ones. In fact, the convergence properties of the ELLADA algorithm do not rely on the synchronization. Even when the inner iterations are asynchronous, the update steps still contribute to the convergence of the barrier augmented Lagrangian and hence result in convergence to the KKT conditions. The only exception is that under Anderson acceleration, the steps for generating the candidate accelerated updates are allocated to another coordinator and another set of distributed agents, and they should communicate to make the decision on executing the accelerations.

V. APPLICATION TO A QUADRUPLE TANK PROCESS

The quadruple tank process is a simple benchmark process for distributed model predictive control [62] with 4 states (water heights in the 4 tanks) and 2 inputs (flow rates from the reservoir). The dynamic model is written as follows:

    ḣ_1 = −(a_1/A_1)√h_1 + (a_3/A_1)√h_3 + (γ_1 k_1/A_1) v_1,
    ḣ_2 = −(a_2/A_2)√h_2 + (a_4/A_2)√h_4 + (γ_2 k_2/A_2) v_2,
    ḣ_3 = −(a_3/A_3)√h_3 + ((1 − γ_2) k_2/A_3) v_2,
    ḣ_4 = −(a_4/A_4)√h_4 + ((1 − γ_1) k_1/A_4) v_1.                            (66)

The parameter values and the nominal steady state are given in Table I.

TABLE I
PARAMETERS AND NOMINAL STEADY STATE

    Parameter   Value    Parameter   Value
    A_1, A_3    28       a_1, a_3    3.145
    A_2, A_4    32       a_2, a_4    2.525
    γ_1         0.43     k_1         3.14
    γ_2         0.34     k_2         3.29
    Input       Value    Input       Value
    v_1         3.15     v_2         3.15
    State       Value    State       Value
    h_1         12.44    h_2         13.17
    h_3         4.73     h_4         4.99

The process is considered to have 2 subsystems, one containing tanks 1 and 4 and the other containing tanks 2 and 3. Each subsystem has 2 states, 1 input and 1 upstream state. We first design a centralized MPC with a quadratic objective function for each tank and bounds on the inputs 2.5 ≤ v_1, v_2 ≤ 3.5. We then decide through simulation of the centralized MPC that a receding horizon of T = 400 with sampling time δt = 10 is appropriate. (The computations are performed using the Python module pyomo.dae with an IPOPT solver [63].)

The closed-loop trajectories under the traditional MPC controllers, including a centralized MPC (black), a semi-centralized MPC where the inputs are iteratively updated based on predictions over the entire process (green), a decentralized MPC (blue), and a distributed MPC with only state feedforwarding among the agents (purple), are shown in Fig. 3. It was observed that a semi-centralized MPC based on system-wide prediction maintains the control performance, yielding
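The nominal point in Table I can be checked directly against (66). The explicit Euler step below is only an illustration of the discretization (the paper itself discretizes via collocation in pyomo.dae); it shows the tabulated steady state is stationary to within the rounding of the table values:

```python
import math

# Right-hand side of the quadruple tank model (66); parameters from Table I.
A = [28.0, 32.0, 28.0, 32.0]            # tank cross-sections A1..A4
a = [3.145, 2.525, 3.145, 2.525]        # outlet cross-sections a1..a4
g1, g2, k1, k2 = 0.43, 0.34, 3.14, 3.29

def f(h, v):
    v1, v2 = v
    return [(-a[0] * math.sqrt(h[0]) + a[2] * math.sqrt(h[2]) + g1 * k1 * v1) / A[0],
            (-a[1] * math.sqrt(h[1]) + a[3] * math.sqrt(h[3]) + g2 * k2 * v2) / A[1],
            (-a[2] * math.sqrt(h[2]) + (1.0 - g2) * k2 * v2) / A[2],
            (-a[3] * math.sqrt(h[3]) + (1.0 - g1) * k1 * v1) / A[3]]

# One explicit Euler step h(t+1) = h(t) + dt * f(h(t), v(t)) with dt = 10,
# started at the nominal steady state and nominal inputs of Table I.
h_ss, v_ss = [12.44, 13.17, 4.73, 4.99], [3.15, 3.15]
h_next = [hi + 10.0 * di for hi, di in zip(h_ss, f(h_ss, v_ss))]
```

Starting at the nominal state, the step leaves every level essentially unchanged, confirming the consistency of the table entries with (66).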

trajectories overlapping with those of the centralized MPC. However, the state-feedforward distributed MPC without sufficient coordination accounting for the state interactions results in unsatisfactory control performance, whose ultimate deviation from the steady state is even larger than that of the decentralized MPC without any communication between the controllers.

Fig. 3. Closed-loop trajectories under traditional MPC controllers.

Next we use the proposed ELLADA algorithm for distributed nonlinear MPC of the process. We first examine the basic ELL algorithm (Algorithm 1) by solving the corresponding distributed MPC problem at a state with h_1 = 12.6, h_2 = 12.4, h_3 = 5.0, h_4 = 4.5, where we set ω = 0.75, γ = 2, ε_1^k = ε_2^k = 10^{−2}/2^{k−1}, ε_3^k = 10^{−1}/2^{k−1}, ε_1 = ε_2 = 10^{−4}, ε_3 = 10^{−3} and λ̄ = −λ̲ = 10 (in an element-wise sense) through empirical tuning. The solution results in terms of the variation of the augmented Lagrangian L^{k,r}, the violations of the KKT conditions ε_{1,2,3}^{k,r}, and the penalty parameters ρ^k throughout the inner and outer iterations are presented in Fig. 4, where the rainbow colormap from blue to red stands for increasing outer iteration number. In accordance with the conclusion of Lemma 1, the augmented Lagrangian is monotonically decreasing in each outer iteration and remains upper bounded, which guarantees the convergence of the algorithm. Using the ELL algorithm for the aforementioned closed-loop MPC simulation, the resulting trajectories are found identical to those of centralized control, which corroborates the theoretical property of the algorithm of converging to the set of stationary solutions.

With the preserved control performance of the ELL algorithm, we seek to improve its computational efficiency with the ELLA and ELLADA algorithms (Algorithms 2 and 3). In ELLA, the tolerances for the approximate NLP solution are set as ε_1 = ε_2 = ε_4 = 10³ε_3 = 1, ε_1^k = ε_2^k = 10³ε_3^k = ε_4^k = 100/2^{k−1}, and ε_4^{k,r} = 10³ε_5^{k,r} = max(ε_4^k, 40(ε_1^{k,r})²). The barrier constants are updated throughout the outer iterations according to b^{k+1} = min(10^{−1}, max(10^{−4}, 25(ε_3^k)²)). Compared to ELL, the accumulated number of iterations and the computational time of ELLA are reduced by over an order of magnitude. To seek better computational performance, we apply the ELLADA algorithm, where we set M = 10, σ = 1, η_L = η_w̃ = 0.01, η_θ = 0.5, η_w = 0.05. This further reduces the number of iterations and the computational time. These results are shown in Fig. 5.

Compared to the basic ELL algorithm, ELLADA achieves acceleration by approximately 18 times in terms of iterations and 19 times in computational time over the entire simulation time span. These improvements are more significant when the states are far from the target steady state (43 and 45 times, respectively, for the first 1/6 of the simulation). We note that the improvement from ELLA to ELLADA by using the Anderson scheme is not an order-of-magnitude one, mainly because each outer iteration needs only a small number of inner iterations, leaving little space for further acceleration (e.g., for the first sampling time, 12 outer iterations with only 102 inner iterations are needed in ELLA, while in ELLADA, 61 inner iterations are needed). Under the accelerations, ELLADA returns a solution identical to that of the centralized optimization, thus preserving the control performance of the centralized MPC.

VI. CONCLUSIONS AND DISCUSSIONS

We have proposed a new algorithm for distributed optimization allowing nonconvex constraints, which simultaneously guarantees convergence under mild assumptions and achieves fast computation. Specifically, convergence is established by adopting a two-layer architecture. In the outer layer, the slack variables are tightened using the method of multipliers, and the inequalities are handled using a barrier technique. In the inner layer, ADMM iterations are performed in a distributed and coordinated manner. Approximate NLP solution and Anderson acceleration techniques are integrated into the inner iterations for computational acceleration.

Such an algorithm is generically suitable for distributed nonlinear MPC. The advantages include:
• Arbitrary input and state couplings among subsystems are allowed. No specific pattern is required a priori.

Fig. 4. Solution results of the ELL algorithm.
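The tolerance and barrier-constant schedules quoted in this section can be sketched in code. The routine below is an illustrative reconstruction from the reported constants only; the function name, its packaging into one loop, and the returned fields are our assumptions, not the authors' implementation:

```python
def ell_schedules(K):
    """Sketch of the outer-iteration schedules reported for the ELL/ELLA runs:
    KKT tolerances halved per outer iteration, barrier constant clipped."""
    rows = []
    for k in range(1, K + 1):
        eps1 = eps2 = 1e-2 / 2 ** (k - 1)   # dual-feasibility tolerances
        eps3 = 1e-1 / 2 ** (k - 1)          # primal-residual tolerance
        # barrier constant for the next outer iteration, clipped to [1e-4, 1e-1]
        b_next = min(1e-1, max(1e-4, 25 * eps3 ** 2))
        rows.append({"k": k, "eps1": eps1, "eps2": eps2,
                     "eps3": eps3, "b_next": b_next})
    return rows
```

For $k = 1$ the barrier update saturates at its upper clip $10^{-1}$, and it decreases with $\epsilon_3^k$ in later outer iterations.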

• The convergence of the algorithm towards a stationary point is theoretically guaranteed, and its performance can be monitored throughout the iterations.
• Equality-constrained NLP solvers are needed only as subroutines. No internal modification of the solvers is required, and the choice of any appropriate solver is flexible.
• Asynchronous updates are allowed without affecting the convergence properties.
• Although motivated by a nominal optimal control problem, the algorithm could be suitable for more intricate MPC formulations such as stochastic/robust MPC or sensitivity-based advance-step MPC.

The application of the ELLADA algorithm to the distributed nonlinear MPC of a quadruple tank process has already shown its improved computational performance compared to the basic convergent Algorithms 1 and 2, and its improved control performance compared to the decentralized MPC and the distributed MPC without accounting for state interactions. Of course, due to the small size of this specific benchmark process, the control can be realized easily with a centralized MPC. A truly large-scale control problem is more suitable to demonstrate the effectiveness of our algorithm, and this shall be presented in an upcoming separate paper.

Fig. 5. Iterations and computational times under the ELL, ELLA and ELLADA algorithms.

APPENDIX A
PROOF OF LEMMA 1

First, since $x^{k,r+1}$ is chosen as the minimizer of the augmented Lagrangian with respect to $x$ (Line 9, Algorithm 1), the update of $x$ leads to a decrease in $L$:

$$L(x^{k,r+1}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}) \le L(x^{k,r}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}). \tag{67}$$

Second, we consider the decrease resulting from the $\bar{x}$-update:

$$\begin{aligned}
&L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r}, y^{k,r}) - L(x^{k,r+1}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}) \\
&= g(\bar{x}^{k,r+1}) - g(\bar{x}^{k,r}) + y^{k,r\top}(B\bar{x}^{k,r+1} - B\bar{x}^{k,r}) \\
&\quad + \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}\|^2 - \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r} + z^{k,r}\|^2 \\
&= g(\bar{x}^{k,r+1}) - g(\bar{x}^{k,r}) - \frac{\rho^k}{2}\|B\bar{x}^{k,r+1} - B\bar{x}^{k,r}\|^2 \\
&\quad - \rho^k(\bar{x}^{k,r} - \bar{x}^{k,r+1})^\top B^\top\left(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho^k}\right).
\end{aligned} \tag{68}$$

The minimization over $\bar{x}$ (Line 10, Algorithm 1) should satisfy the optimality condition

$$0 \in \rho^k B^\top\left(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho^k}\right) + \partial g(\bar{x}^{k,r+1}) + N_{\bar{X}}(\bar{x}^{k,r+1}), \tag{69}$$

i.e., there exist vectors $v_1 \in \partial g(\bar{x}^{k,r+1})$ and $v_2 \in N_{\bar{X}}(\bar{x}^{k,r+1})$ with

$$\rho^k B^\top\left(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho^k}\right) = -v_1 - v_2. \tag{70}$$

Since $v_1 \in \partial g(\bar{x}^{k,r+1})$ and $g$ is convex, $v_1^\top(\bar{x}^{k,r} - \bar{x}^{k,r+1}) \le g(\bar{x}^{k,r}) - g(\bar{x}^{k,r+1})$. And $v_2 \in N_{\bar{X}}(\bar{x}^{k,r+1})$ implies $v_2^\top(\bar{x}^{k,r} - \bar{x}^{k,r+1}) \le 0$. Hence

$$\begin{aligned}
&\rho^k(\bar{x}^{k,r} - \bar{x}^{k,r+1})^\top B^\top\left(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho^k}\right) \\
&= -v_1^\top(\bar{x}^{k,r} - \bar{x}^{k,r+1}) - v_2^\top(\bar{x}^{k,r} - \bar{x}^{k,r+1}) \ge -(g(\bar{x}^{k,r}) - g(\bar{x}^{k,r+1})).
\end{aligned} \tag{71}$$

Substituting the above inequality into (68), we obtain

$$L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r}, y^{k,r}) \le L(x^{k,r+1}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}) - \frac{\rho^k}{2}\|B\bar{x}^{k,r+1} - B\bar{x}^{k,r}\|^2. \tag{72}$$

Third, we consider the decrease resulting from the $z$- and $y$-updates:

$$\begin{aligned}
&L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) - L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r}, y^{k,r}) \\
&= \lambda^{k\top}(z^{k,r+1} - z^{k,r}) + \frac{\beta^k}{2}(\|z^{k,r+1}\|^2 - \|z^{k,r}\|^2) \\
&\quad + y^{k,r+1\top}(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1}) - y^{k,r\top}(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}) \\
&\quad + \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1}\|^2 - \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}\|^2.
\end{aligned} \tag{73}$$

Since $\upsilon(z; \lambda, \beta) = \lambda^\top z + \frac{\beta}{2}\|z\|^2$ is a convex function whose gradient is $\nabla\upsilon(z; \lambda, \beta) = \lambda + \beta z$,

$$\upsilon(z^{k,r+1}; \lambda^k, \beta^k) - \upsilon(z^{k,r}; \lambda^k, \beta^k) \le (\lambda^k + \beta^k z^{k,r+1})^\top(z^{k,r+1} - z^{k,r}). \tag{74}$$

From Line 11 of Algorithm 1 it can be obtained that

$$\lambda^k + \beta^k z^{k,r+1} = -y^{k,r+1}. \tag{75}$$

Substituting into (73), we obtain

$$\begin{aligned}
&L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) - L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r}, y^{k,r}) \\
&\le (y^{k,r+1} - y^{k,r})^\top(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}) \\
&\quad + \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1}\|^2 - \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}\|^2 \\
&= \rho^k(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1})^\top(Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}) \\
&\quad + \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1}\|^2 - \frac{\rho^k}{2}\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r}\|^2 \\
&= -\frac{\rho^k}{2}\|z^{k,r+1} - z^{k,r}\|^2 + \rho^k\|Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1}\|^2.
\end{aligned} \tag{76}$$

From (75),

$$Ax^{k,r+1} + B\bar{x}^{k,r+1} + z^{k,r+1} = \frac{1}{\rho^k}(y^{k,r+1} - y^{k,r}) = -\frac{\beta^k}{\rho^k}(z^{k,r+1} - z^{k,r}). \tag{77}$$

Then (76) becomes

$$L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) - L(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r}, y^{k,r}) \le -\left(\frac{\rho^k}{2} - \frac{(\beta^k)^2}{\rho^k}\right)\|z^{k,r+1} - z^{k,r}\|^2 = -\frac{\beta^k}{2}\|z^{k,r+1} - z^{k,r}\|^2. \tag{78}$$

Summing up the inequalities (67), (72) and (78), we have proved the inequality (23). Next, we show that the augmented Lagrangian is bounded below, and hence is convergent towards some $\underline{L}^k \in \mathbb{R}$. We note that $\upsilon(z; \lambda, \beta)$ is a convex function of modulus $\beta$; it can be easily verified that

$$\upsilon(z^{k,r}; \lambda^k, \beta^k) + (\lambda^k + \beta^k z^{k,r})^\top(z' - z^{k,r}) + \frac{\rho^k}{2}\|z' - z^{k,r}\|^2 \ge \upsilon(z'; \lambda^k, \beta^k) \tag{79}$$

for any $z'$, i.e.,

$$\upsilon(z^{k,r}; \lambda^k, \beta^k) + y^{k,r\top}(z^{k,r} - z') \ge \upsilon(z'; \lambda^k, \beta^k) - \frac{\rho^k}{2}\|z' - z^{k,r}\|^2. \tag{80}$$

Let $z' = -(Ax^{k,r} + B\bar{x}^{k,r})$ and remove the last term on the right-hand side. Then

$$\upsilon(z^{k,r}; \lambda^k, \beta^k) + y^{k,r\top}(Ax^{k,r} + B\bar{x}^{k,r} + z^{k,r}) \ge \upsilon(-(Ax^{k,r} + B\bar{x}^{k,r}); \lambda^k, \beta^k). \tag{81}$$

Hence

$$\begin{aligned}
L(x^{k,r}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}) &= f(x^{k,r}) + g(\bar{x}^{k,r}) + \upsilon(z^{k,r}; \lambda^k, \beta^k) \\
&\quad + y^{k,r\top}(Ax^{k,r} + B\bar{x}^{k,r} + z^{k,r}) + \frac{\rho^k}{2}\|Ax^{k,r} + B\bar{x}^{k,r} + z^{k,r}\|^2 \\
&\ge f(x^{k,r}) + g(\bar{x}^{k,r}) + \upsilon(-(Ax^{k,r} + B\bar{x}^{k,r}); \lambda^k, \beta^k).
\end{aligned} \tag{82}$$

Since $\upsilon(z) = \lambda^\top z + \frac{\beta}{2}\|z\|^2 \ge -\|\lambda\|^2/(2\beta)$, $\lambda^k$ is bounded in $[\underline\lambda, \bar\lambda]$, $\beta^k \ge \beta^1$, and $f$ and $g$ are bounded below, $L$ has a lower bound. Lemma 1 is proved.

APPENDIX B
PROOF OF COROLLARY 1

Taking the limit $r \to \infty$ on both sides of inequality (23), it becomes obvious that $B\bar{x}^{k,r+1} - B\bar{x}^{k,r}$ and $z^{k,r+1} - z^{k,r}$ converge to 0. Due to (77), we have $Ax^{k,r} + B\bar{x}^{k,r} + z^{k,r} \to 0$. Hence there must exist an $r$ such that (22) is met. At this time, the optimality condition for $x^{k,r+1}$ is written as

$$0 \in \partial f(x^{k,r+1}) + N_X(x^{k,r+1}) + A^\top y^{k,r} + \rho^k A^\top(Ax^{k,r+1} + B\bar{x}^{k,r} + z^{k,r}). \tag{83}$$

According to the update rule of $y^{k,r}$, the above expression is equivalent to

$$0 \in \partial f(x^{k,r+1}) + N_X(x^{k,r+1}) + A^\top y^{k,r+1} - \rho^k A^\top(B\bar{x}^{k,r+1} + z^{k,r+1} - B\bar{x}^{k,r} - z^{k,r}), \tag{84}$$

i.e.,

$$\rho^k A^\top(B\bar{x}^{k,r+1} + z^{k,r+1} - B\bar{x}^{k,r} - z^{k,r}) \in \partial f(x^{k,r+1}) + N_X(x^{k,r+1}) + A^\top y^{k,r+1}. \tag{85}$$

According to the first inequality of (22), the norm of the left-hand side above is not larger than $\epsilon_1^k$, which directly implies the first condition in (24). In a similar manner, the second condition in (24) can be established. The third one follows from (75), and the fourth condition is obvious.
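The monotone decrease established in Lemma 1 (the sum of (67), (72) and (78)) can be checked numerically on a toy quadratic instance of the inner-layer iteration. All problem data and the closed-form minimization steps below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy instance of the inner-layer updates analyzed in Lemma 1:
#   min f(x) + g(xbar) + upsilon(z)  with coupling  A x + B xbar + z = 0,
# where f, g are quadratics, upsilon(z) = lam'z + (beta/2)||z||^2, and
# rho = 2*beta as in the analysis. All data below are made up.
n = 3
A = np.eye(n); B = np.eye(n)
a = np.array([1.0, -2.0, 0.5]); c = np.array([0.3, 1.5, -1.0])
lam = np.zeros(n); beta = 1.0; rho = 2.0 * beta

def aug_lagrangian(x, xb, z, y):
    r = A @ x + B @ xb + z
    return (np.sum((x - a) ** 2) + np.sum((xb - c) ** 2)
            + lam @ z + 0.5 * beta * z @ z + y @ r + 0.5 * rho * r @ r)

x = np.zeros(n); xb = np.zeros(n); z = np.zeros(n); y = np.zeros(n)
history = [aug_lagrangian(x, xb, z, y)]
for _ in range(40):
    x = (2 * a - y - rho * (B @ xb + z)) / (2 + rho)        # exact x-min
    xb = (2 * c - y - rho * (A @ x + z)) / (2 + rho)        # exact xbar-min
    z = -(lam + y + rho * (A @ x + B @ xb)) / (beta + rho)  # exact z-min
    y = y + rho * (A @ x + B @ xb + z)                      # dual update
    history.append(aug_lagrangian(x, xb, z, y))
```

On this instance `history` is nonincreasing and the residual $Ax + B\bar{x} + z$ vanishes, consistent with (77).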

APPENDIX C
PROOF OF LEMMA 2

We first consider the situation where $\beta^k$ is unbounded. From (82), we have

$$L \ge f(x^{k+1}) + g(\bar{x}^{k+1}) - \lambda^{k\top}(Ax^{k+1} + B\bar{x}^{k+1}) + \frac{\beta^k}{2}\|Ax^{k+1} + B\bar{x}^{k+1}\|^2. \tag{86}$$

Since $f$ and $g$ are both bounded below, as $\beta^k \to \infty$ we have $Ax^{k+1} + B\bar{x}^{k+1} \to 0$. Combined with the first two conditions of (24) in the limit $\epsilon_1^k, \epsilon_2^k, \epsilon_3^k \downarrow 0$, we have reached (26).

Then we suppose that $\beta^k$ is bounded, i.e., the amplification step $\beta^{k+1} = \gamma\beta^k$ is executed for only a finite number of outer iterations. According to Lines 17–21 of Algorithm 1, except for some finite choices of $k$, $\|z^{k+1}\| \le \omega\|z^k\|$ always holds. Therefore $z^{k+1} \to 0$. Apparently, (26) follows from the limit of (24).

APPENDIX D
PROOF OF LEMMA 3

From Lemma 1 one knows that within $R^k$ inner iterations

$$\frac{L^k - \underline{L}}{\beta^k} \ge \sum_{r=1}^{R^k}\left(\|B\bar{x}^{k,r+1} - B\bar{x}^{k,r}\|^2 + \frac{1}{2}\|z^{k,r+1} - z^{k,r}\|^2\right). \tag{87}$$

Then

$$\|B\bar{x}^{k,R+1} - B\bar{x}^{k,R}\|, \ \|z^{k,R+1} - z^{k,R}\| \sim O(1/\sqrt{\beta^k R^k}). \tag{88}$$

For the $k$-th outer iteration, the inner iterations are terminated when (22) is met, which is translated into the following relations:

$$O(\rho^k/\sqrt{\beta^k R^k}) \le \epsilon_1^k \sim O(\vartheta^k), \quad O(\rho^k/\sqrt{\beta^k R^k}) \le \epsilon_2^k \sim O(\vartheta^k), \quad O(1/\sqrt{\beta^k R^k}) \le \epsilon_3^k \sim O(\vartheta^k/\beta^k), \tag{89}$$

where the last relation uses (77) with $\rho^k = 2\beta^k$. Therefore

$$R^k \sim O(\beta^k/\vartheta^{2k}). \tag{90}$$

At the end of the $k$-th iteration, suppose that Lines 19–20 and Lines 17–18 of Algorithm 1 have been executed $k_1$ and $k_2$ times, respectively ($k_1 + k_2 = k$). Then the obtained $z^{k+1}$ satisfies $\|z^{k+1}\| \sim O(\omega^{k_1})$, and $\|Ax^{k+1} + B\bar{x}^{k+1} + z^{k+1}\| \le \epsilon_3^k \sim O(\vartheta^k/\beta^k)$, which imply

$$\|Ax^{k+1} + B\bar{x}^{k+1}\| \le O(\vartheta^k/\beta^k) + O(\omega^{k_1}). \tag{91}$$

From (86),

$$\beta^k\|Ax^{k+1} + B\bar{x}^{k+1}\|^2 \sim \beta^k\left(O(\vartheta^k/\beta^k) + O(\omega^{k_1})\right)^2 \sim O(1). \tag{92}$$

Substituting (92) into (90), we obtain

$$R^k \sim O\left(\frac{1}{\vartheta^{2k}\left(O(\vartheta^k/\beta^k) + O(\omega^{k_1})\right)^2}\right). \tag{93}$$

When $\vartheta \le \omega$, $\vartheta^k \le \omega^k \le \omega^{k_1}\gamma^{k_2}$, and hence $\gamma^{-k_2}\vartheta^k \le \omega^{k_1}$, i.e., $\omega^{k_1}$ dominates over $\vartheta^k/\beta^k$, leading to

$$R^k \sim O(1/(\vartheta^{2k}\omega^{2k_1})) \sim O(1/(\vartheta^{2k}\omega^{2k})). \tag{94}$$

For $K$ outer iterations, the total number of inner iterations is

$$R = \sum_{k=1}^{K} R^k \sim \sum_{k=1}^{K} O\left(\frac{1}{\vartheta^{2k}\omega^{2k}}\right) \sim O\left(\frac{1}{\vartheta^{2K}\omega^{2K}}\right). \tag{95}$$

The number of outer iterations needed to reach an $\epsilon$-approximate stationary point is obviously $K \sim O(\log_\vartheta \epsilon)$. Then

$$R \sim O(\epsilon^{-2(1+\varsigma)}). \tag{96}$$

APPENDIX E
PROOF OF LEMMA 6

Through the inner iterations, only the Anderson acceleration might lead to an increase in the barrier augmented Lagrangian. Combining Assumption 3, Assumption 5, and the safeguarding criterion (48), we obtain

$$L_{b^k}(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) \le L_0 + \tilde{L}\eta_L \sum_{r=1}^{\infty} \frac{1}{r^{1+\sigma}} < +\infty. \tag{97}$$

Together with Assumptions 1 and 2, $L_{b^k}$ is also bounded below. Therefore $L_{b^k}$ is bounded in a closed interval and must have converging subsequences, and we can choose a subsequence converging to the lower limit $\underline{L}$. For any $\varepsilon > 0$ there exists an index $R$ of inner iteration in this subsequence such that $\tilde{L}\eta_L \sum_{r=R}^{\infty} r^{-(1+\sigma)} < \varepsilon/2$ and $L_{b^k}(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) < \underline{L} + \varepsilon/2$ for any $r \ge R$ on this subsequence. It then follows that for any $r \ge R$, whether on the subsequence or not, it holds that

$$L_{b^k}(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) < \underline{L} + \varepsilon. \tag{98}$$

Hence the upper limit is not larger than $\underline{L} + \varepsilon$. Due to the arbitrariness of $\varepsilon > 0$, the lower limit coincides with the upper limit, and hence the sequence of barrier augmented Lagrangians is convergent.

The convergence of the barrier augmented Lagrangian implies that as $r \to \infty$, $L_{b^k}(x^{k,r+1}, \bar{x}^{k,r+1}, z^{k,r+1}, y^{k,r+1}) - L_{b^k}(x^{k,r}, \bar{x}^{k,r}, z^{k,r}, y^{k,r}) \to 0$. Suppose that $r$ is not an accelerated iteration; then since this quantity does not exceed $-\beta^k\|B\bar{x}^{k,r+1} - B\bar{x}^{k,r}\|^2 - (\beta^k/2)\|z^{k,r+1} - z^{k,r}\|^2$, we must have $B\bar{x}^{k,r+1} - B\bar{x}^{k,r} \to 0$ and $z^{k,r+1} - z^{k,r} \to 0$. Otherwise, if inner iteration $r$ is accelerated, the convergence of $B\bar{x}^{k,r+1} - B\bar{x}^{k,r}$ and $z^{k,r+1} - z^{k,r}$ is automatically guaranteed by the second criterion (50) for accepting the Anderson acceleration. The convergence properties of these two sequences naturally fall into the paradigm of Lemma 1 for establishing the convergence to approximate KKT conditions of the relaxed problem.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation (NSF-CBET). The authors would also like to thank Prof. Qi Zhang for his constructive opinions.
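The safeguarding mechanism analyzed in Lemma 6 — accept an Anderson candidate only when a merit test passes, otherwise fall back to the plain fixed-point step — can be sketched generically. The residual-based acceptance test below is a simplified stand-in for the paper's criteria (48) and (50), and all names here are hypothetical:

```python
import numpy as np

def anderson_safeguarded(gmap, x0, m=3, iters=50, tol=1e-10):
    """Anderson acceleration of the fixed-point iteration x <- gmap(x),
    with a safeguard: the accelerated candidate is taken only if its
    residual is no worse than the current one, so a failed acceleration
    degenerates to the plain (convergent) step."""
    xs, gs = [x0], [gmap(x0)]
    x = gs[-1]                                     # first step is plain
    for _ in range(iters):
        xs.append(x); gs.append(gmap(x))
        fs = [g - s for s, g in zip(xs, gs)]       # residuals f_i = g(x_i) - x_i
        if np.linalg.norm(fs[-1]) < tol:
            break
        mk = min(m, len(xs) - 1)                   # memory actually available
        dF = np.column_stack([fs[-i] - fs[-i - 1] for i in range(1, mk + 1)])
        dG = np.column_stack([gs[-i] - gs[-i - 1] for i in range(1, mk + 1)])
        gam, *_ = np.linalg.lstsq(dF, fs[-1], rcond=None)
        x_acc = gs[-1] - dG @ gam                  # Anderson candidate
        # safeguard: accept only if the candidate's residual does not grow
        if np.linalg.norm(gmap(x_acc) - x_acc) <= np.linalg.norm(fs[-1]):
            x = x_acc
        else:
            x = gs[-1]                             # fall back to plain step
    return x
```

Because a rejected candidate costs only one extra map evaluation, the safeguard preserves the convergence of the underlying iteration while still permitting acceleration on well-behaved stretches.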

REFERENCES

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trend. Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
[2] A. Nedić and A. Olshevsky, "Distributed optimization over time-varying directed graphs," IEEE Trans. Autom. Control, vol. 60, no. 3, pp. 601–615, 2014.
[3] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, "A survey of distributed optimization and control algorithms for electric power systems," IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2941–2962, 2017.
[4] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl, Model Predictive Control: Theory, Computation, and Design, 2nd ed. Nob Hill Publishing, 2017.
[5] P. Daoutidis, W. Tang, and S. S. Jogwar, "Decomposing complex plants for distributed control: perspectives from network theory," Comput. Chem. Eng., vol. 114, pp. 43–51, 2018.
[6] W. Tang, A. Allman, D. B. Pourkargar, and P. Daoutidis, "Optimal decomposition for distributed optimization in nonlinear model predictive control through community detection," Comput. Chem. Eng., vol. 111, pp. 43–54, 2018.
[7] P. Daoutidis, W. Tang, and A. Allman, "Decomposition of control and optimization problems by network structure: concepts, methods and inspirations from biology," AIChE J., vol. 65, no. 10, p. e16708, 2019.
[8] R. Scattolini, "Architectures for distributed and hierarchical model predictive control – a review," J. Process Control, vol. 19, no. 5, pp. 723–731, 2009.
[9] P. D. Christofides, R. Scattolini, D. Muñoz de la Peña, and J. Liu, "Distributed model predictive control: A tutorial review and future research directions," Comput. Chem. Eng., vol. 51, pp. 21–41, 2013.
[10] R. R. Negenborn and J. M. Maestre, "Distributed model predictive control: An overview and roadmap of future research opportunities," IEEE Control Syst. Mag., vol. 34, no. 4, pp. 87–97, 2014.
[11] M. A. Patterson and A. V. Rao, "GPOPS-II: A MATLAB software for solving multiple-phase optimal control problems using hp-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming," ACM Trans. Math. Softw. (TOMS), vol. 41, no. 1, pp. 1–37, 2014.
[12] Y. Mao, M. Szmuk, and B. Açıkmeşe, "Successive convexification of non-convex optimal control problems and its convergence properties," in Proc. 55th IEEE Conf. Decis. Control (CDC). IEEE, 2016, pp. 3636–3641.
[13] L. T. Biegler and D. M. Thierry, "Large-scale optimization formulations and strategies for nonlinear model predictive control," IFAC-PapersOnLine, vol. 51, no. 20, pp. 1–15, 2018, the 6th IFAC Conference on Nonlinear Model Predictive Control (NMPC).
[14] B. T. Stewart, A. N. Venkat, J. B. Rawlings, S. J. Wright, and G. Pannocchia, "Cooperative distributed model predictive control," Syst. Control Lett., vol. 59, no. 8, pp. 460–469, 2010.
[15] J. Liu, X. Chen, D. Muñoz de la Peña, and P. D. Christofides, "Sequential and iterative architectures for distributed model predictive control of nonlinear process systems," AIChE J., vol. 56, no. 8, pp. 2137–2149, 2010.
[16] X. Chen, M. Heidarinejad, J. Liu, and P. D. Christofides, "Distributed economic MPC: Application to a nonlinear chemical process network," J. Process Control, vol. 22, no. 4, pp. 689–699, 2012.
[17] A. N. Venkat, J. B. Rawlings, and S. J. Wright, "Stability and optimality of distributed model predictive control," in Proc. 44th Conf. Decis. Control (CDC). IEEE, 2005, pp. 6680–6685.
[18] M. Farina and R. Scattolini, "Distributed predictive control: a non-cooperative algorithm with neighbor-to-neighbor communication for linear systems," Automatica, vol. 48, no. 6, pp. 1088–1096, 2012.
[19] P. Giselsson, M. D. Doan, T. Keviczky, B. De Schutter, and A. Rantzer, "Accelerated gradient methods and dual decomposition in distributed model predictive control," Automatica, vol. 49, no. 3, pp. 829–833, 2013.
[20] L. Grüne and J. Pannek, Nonlinear Model Predictive Control. Springer, 2017.
[21] F. Farokhi, I. Shames, and K. H. Johansson, "Distributed MPC via dual decomposition and alternative direction method of multipliers," in Distributed Model Predictive Control Made Easy. Springer, 2014, pp. 115–131.
[22] J. F. Mota, J. M. Xavier, P. M. Aguiar, and M. Püschel, "Distributed optimization with local domains: applications in MPC and network flows," IEEE Trans. Autom. Control, vol. 60, no. 7, pp. 2004–2009, 2014.
[23] S. Magnússon, P. C. Weeraddana, M. G. Rabbat, and C. Fischione, "On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems," IEEE Trans. Control Netw. Syst., vol. 3, no. 3, pp. 296–309, 2015.
[24] T. Tatarenko and B. Touri, "Non-convex distributed optimization," IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3744–3757, 2017.
[25] N. Chatzipanagiotis and M. M. Zavlanos, "On the convergence of a distributed augmented Lagrangian method for nonconvex optimization," IEEE Trans. Autom. Control, vol. 62, no. 9, pp. 4405–4420, 2017.
[26] Y. Wang, W. Yin, and J. Zeng, "Global convergence of ADMM in nonconvex nonsmooth optimization," J. Sci. Comput., vol. 78, no. 1, pp. 29–63, 2019.
[27] J.-H. Hours and C. N. Jones, "A parametric nonconvex decomposition algorithm for real-time and distributed NMPC," IEEE Trans. Autom. Control, vol. 61, no. 2, pp. 287–302, 2015.
[28] B. Houska, J. Frasch, and M. Diehl, "An augmented Lagrangian based algorithm for distributed nonconvex optimization," SIAM J. Optim., vol. 26, no. 2, pp. 1101–1127, 2016.
[29] G. Scutari, F. Facchinei, and L. Lampariello, "Parallel and distributed methods for constrained nonconvex optimization – part I: theory," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1929–1944, 2016.
[30] K. Sun and X. A. Sun, "A two-level distributed algorithm for general constrained non-convex optimization with global convergence," arXiv preprint arXiv:1902.07654, 2019.
[31] B. Jiang, T. Lin, S. Ma, and S. Zhang, "Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis," Comput. Optim. Appl., vol. 72, no. 1, pp. 115–157, 2019.
[32] Y. Yang, G. Hu, and C. J. Spanos, "A proximal linearization-based decentralized method for nonconvex problems with nonlinear constraints," arXiv preprint arXiv:2001.00767, 2020.
[33] T. Lin, S. Ma, and S. Zhang, "On the global linear convergence of the ADMM with multiblock variables," SIAM J. Optim., vol. 25, no. 3, pp. 1478–1497, 2015.
[34] M. Hong and Z.-Q. Luo, "On the linear convergence of the alternating direction method of multipliers," Math. Prog., vol. 162, no. 1-2, pp. 165–199, 2017.
[35] A. Makhdoumi and A. Ozdaglar, "Convergence rate of distributed ADMM over networks," IEEE Trans. Autom. Control, vol. 62, no. 10, pp. 5082–5095, 2017.
[36] T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk, "Fast alternating direction optimization methods," SIAM J. Imaging Sci., vol. 7, no. 3, pp. 1588–1623, 2014.
[37] Y. Ouyang, Y. Chen, G. Lan, and E. Pasiliao Jr, "An accelerated linearized alternating direction method of multipliers," SIAM J. Imaging Sci., vol. 8, no. 1, pp. 644–681, 2015.
[38] W. Tang and P. Daoutidis, "Distributed nonlinear model predictive control through accelerated parallel ADMM," in Am. Control Conf. IEEE, 2019, pp. 1406–1411.
[39] R. Y. Zhang and J. K. White, "GMRES-accelerated ADMM for quadratic objectives," SIAM J. Optim., vol. 28, no. 4, pp. 3025–3056, 2018.
[40] J. Zhang, B. O'Donoghue, and S. Boyd, "Globally convergent type-I Anderson acceleration for non-smooth fixed-point iterations," arXiv preprint arXiv:1808.03971, 2018.
[41] A. Fu, J. Zhang, and S. Boyd, "Anderson accelerated Douglas-Rachford splitting," arXiv preprint arXiv:1908.11482, 2019.
[42] J. Zhang, Y. Peng, W. Ouyang, and B. Deng, "Accelerating ADMM for efficient simulation and optimization," ACM Trans. Graph., vol. 38, no. 6, p. 163, 2019.
[43] N. K. Dhingra, S. Z. Khong, and M. R. Jovanović, "The proximal augmented Lagrangian method for nonsmooth composite optimization," IEEE Trans. Autom. Control, vol. 64, no. 7, pp. 2861–2868, 2019.
[44] D. Hajinezhad and M. Hong, "Perturbed proximal primal-dual algorithm for nonconvex nonsmooth optimization," Math. Prog., vol. 176, no. 1-2, pp. 207–245, 2019.
[45] J. Eckstein and W. Yao, "Approximate ADMM algorithms derived from Lagrangian splitting," Comput. Optim. Appl., vol. 68, no. 2, pp. 363–405, 2017.
[46] D. P. Bertsekas, Nonlinear Programming, 3rd ed. Athena Scientific, 2016.
[47] R. Glowinski and A. Marroco, "Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires," Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér., vol. 9, no. R2, pp. 41–76, 1975.
[48] D. Gabay and B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite element approximation," Comput. Math. Appl., vol. 2, no. 1, pp. 17–40, 1976.

[49] J. Eckstein and D. P. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Prog., vol. 55, no. 1-3, pp. 293–318, 1992.
[50] J. Xie, A. Liao, and X. Yang, "An inexact alternating direction method of multipliers with relative error criteria," Optim. Lett., vol. 11, no. 3, pp. 583–596, 2017.
[51] Yu. E. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k²)," Dokl. Akad. Nauk SSSR, vol. 269, no. 3, pp. 543–547, 1983.
[52] D. G. Anderson, "Iterative procedures for nonlinear integral equations," J. ACM, vol. 12, no. 4, pp. 547–560, 1965.
[53] P. Pulay, "Convergence acceleration of iterative sequences. The case of SCF iteration," Chem. Phys. Lett., vol. 73, no. 2, pp. 393–398, 1980.
[54] H.-r. Fang and Y. Saad, "Two classes of multisecant methods for nonlinear acceleration," Numer. Linear Algebra Appl., vol. 16, no. 3, pp. 197–221, 2009.
[55] A. Toth and C. Kelley, "Convergence analysis for Anderson acceleration," SIAM J. Numer. Anal., vol. 53, no. 2, pp. 805–819, 2015.
[56] G. Li and T. K. Pong, "Global convergence of splitting methods for nonconvex composite optimization," SIAM J. Optim., vol. 25, no. 4, pp. 2434–2460, 2015.
[57] M. Hong, Z.-Q. Luo, and M. Razaviyayn, "Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems," SIAM J. Optim., vol. 26, no. 1, pp. 337–364, 2016.
[58] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis. Springer, 1998.
[59] A. Wächter and L. T. Biegler, "Line search filter methods for nonlinear programming: Motivation and global convergence," SIAM J. Optim., vol. 16, no. 1, pp. 1–31, 2005.
[60] ——, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Math. Prog., vol. 106, no. 1, pp. 25–57, 2006.
[61] C. Chen, B. He, Y. Ye, and X. Yuan, "The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent," Math. Prog., vol. 155, no. 1-2, pp. 57–79, 2016.
[62] K. H. Johansson, "The quadruple-tank process: A multivariable laboratory process with an adjustable zero," IEEE Trans. Control Syst. Technol., vol. 8, no. 3, pp. 456–465, 2000.
[63] B. Nicholson, J. D. Siirola, J.-P. Watson, V. M. Zavala, and L. T. Biegler, "pyomo.dae: a modeling and automatic discretization framework for optimization with differential and algebraic equations," Math. Prog. Comput., vol. 10, no. 2, pp. 187–223, 2018.

Wentao Tang was born in Yongzhou, Hunan Province, P. R. China. He received a Bachelor of Science degree in Chemical Engineering and a secondary degree in Mathematics from Tsinghua University, Beijing, China, in 2015. He is now pursuing a Ph.D. degree in Chemical Engineering at the University of Minnesota. He is the recipient of the Doctoral Dissertation Fellowship of the University of Minnesota for 2018–2019, and the 1st place in the CAST Directors' Student Presentation Award at the 2019 AIChE Annual Meeting. His current research interests include the architecture design and algorithms of distributed and hierarchical control and optimization problems, nonlinear system identification, and data-driven control of nonlinear processes.

Prodromos Daoutidis is a College of Science and Engineering Distinguished Professor and Executive Officer in the Department of Chemical Engineering and Materials Science at the University of Minnesota. He received a Diploma degree in Chemical Engineering (1987) from the Aristotle University of Thessaloniki, M.S.E. degrees in Chemical Engineering (1988) and Electrical Engineering: Systems (1991) from the University of Michigan, and a Ph.D. degree in Chemical Engineering (1991) from the University of Michigan. He has been on the faculty at Minnesota since 1992, while he has also held a position as Professor at the Aristotle University of Thessaloniki (2004–06). He is the recipient of several awards and recognitions, including the AIChE Computing in Chemical Engineering Award, the PSE Model Based Innovation Prize, the Best Paper Prize from the Journal of Process Control, an NSF CAREER Award, and the AIChE Ted Peterson Award. He has also been a Humphrey Institute Policy Fellow. He is the Associate Editor for Process Systems Engineering in the AIChE Journal, and an Associate Editor of the Journal of Process Control. He has co-authored 5 books and 290 refereed papers, and has supervised to completion 35 Ph.D. students and post-docs. His current research is on control and optimization of complex and networked systems, and the design and operation of distributed renewable systems for power generation and production of fuels and chemicals.