
Equality Constrained Differential Dynamic Programming Sarah Kazdadi, Justin Carpentier, Jean Ponce

To cite this version:

Sarah Kazdadi, Justin Carpentier, Jean Ponce. Equality Constrained Differential Dynamic Programming. ICRA 2021 - IEEE International Conference on Robotics and Automation, May 2021, Xi'an, China. hal-03184203v2

HAL Id: hal-03184203 https://hal.inria.fr/hal-03184203v2 Submitted on 7 Apr 2021


Sarah El Kazdadi∗, Justin Carpentier∗ and Jean Ponce∗

Abstract— Trajectory optimization is an important tool in task-based robot motion planning, due to its generality and convergence guarantees under some mild conditions. It is often used as a post-processing operation to smooth out trajectories that are generated by probabilistic methods, or to directly control the robot motion. Unconstrained trajectory optimization problems have been well studied, and are commonly solved using Differential Dynamic Programming methods that allow for fast convergence at a relatively low computational cost. In this paper, we propose an augmented Lagrangian approach that extends these ideas to equality-constrained trajectory optimization problems, while maintaining a balance between convergence speed and numerical stability. We illustrate our contributions on various standard robotic problems and highlight their benefits compared to standard approaches.

I. INTRODUCTION

Optimal control and trajectory optimization in general are powerful and versatile mathematical frameworks to program robot motions. They provide a way to efficiently compute, in a generic manner, complex robot motions which minimize a given cost criterion and satisfy a set of constraints, while taking advantage of the dynamical capacities of the robot in question. The most well-known optimal control principles [1], i.e., the Pontryagin Maximum Principle (PMP) and the Hamilton-Jacobi-Bellman (HJB) equations, were derived around the same time, in the '50s. For a certain period, indirect approaches [2], which directly exploit these principles, were preferred to analytically solve optimal control problems, but could only be applied to a limited class of dynamical systems (e.g., space rockets, the Dubins car). With the growth of the computational resources available on standard computers, together with the development of efficient numerical optimization methods for solving large-scale problems, direct approaches [2] have emerged as an efficient alternative for solving optimal control problems. These methods [2], such as single shooting, multiple shooting or collocation, first discretize the problem, reformulate it as a constrained Nonlinear Programming (NLP) instance, and then solve it using classic methods for nonlinear optimization [3] such as Penalty, Sequential Quadratic Programming (SQP), Interior Point (IP) or Augmented Lagrangian (AL) methods. While these approaches can handle a large variety of dynamical systems [4], [5], [6], their performance directly relies on the efficiency and numerical robustness of the underlying NLP solvers and on their ability to exploit the problem sparsity.

More recently, Differential Dynamic Programming (DDP), originally introduced in the late '60s [7], has emerged as a good trade-off between direct and indirect approaches. DDP belongs to the class of second-order methods (similar to Newton methods), and is relatively simple to implement while taking advantage of the sparsity pattern of the problem. Additionally, it also provides, along with the solution, a linear feedback term that can be used to correct the control sequence when the observed trajectory deviates from the optimal one. In particular, this allows the solution to be robust to some amount of external noise. Classical DDP was initially limited to unconstrained problems. Recent works have proposed using various optimization strategies to handle different levels of constraints: box control constraints (aka box-DDP) [8] via box Quadratic Programming (QP), nonlinear equality constraints [9], [10], [11], [12], a mix of box constraints and equality constraints, and generic nonlinear constraints [13], [14], [15] via either AL, penalty, or SQP methods. However, these solutions either provide limited convergence guarantees (no globalization strategies [3]) or require a significant number of iterations of the DDP or of the internal numerical routines to converge to a feasible solution, limiting the deployment of these methods for online Model Predictive Control, for instance.

In this paper, we extend the DDP method to handle additional equality constraints, beyond the constraints of the dynamical system itself, using an augmented Lagrangian-based formulation. The first contribution of this paper is an adaptation of the globalization strategy called Bound-Constrained Lagrangian (BCL) [16], [3] to the case of equality-constrained DDP. This goes beyond previous works which only provide limited convergence guarantees to find an optimal solution. As the second contribution, we propose two slightly different strategies to estimate the Lagrange multipliers of the constraints: one that directly applies the BCL method to DDP to converge to an optimal solution, and a second one that more closely mirrors the principles of the DDP method, by also providing a feedback term, extending the solution to a neighborhood of the optimal trajectory.

The paper is organized as follows. We first recall the main technical background in Sec. II concerning DDP and AL. Sec. III depicts the main contributions of the paper and Sec. IV empirically shows the performance of the proposed approach on several standard robotic problems.

This work was supported in part by the Louis Vuitton / ENS chair on artificial intelligence, the Inria / NYU collaboration agreement, and the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). ∗The authors are with Inria and Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France. Corresponding author: [email protected]
II. DIFFERENTIAL DYNAMIC PROGRAMMING AND AUGMENTED LAGRANGIAN

In this section, we recall the main equations behind the unconstrained DDP and the Augmented Lagrangian methods. This section also introduces the main notations used throughout the paper.

A. Optimal control

A discrete-time dynamical system is a system whose evolution from the time instant i to i + 1 is governed by a transition function f_i:

x_{i+1} = f_i(x_i, u_i),   (1)

where x ∈ X, u ∈ U, and X (resp. U) is the state space (resp. control space) of the dynamical system. A feasible trajectory (x, u) is a sequence of states x = (x_0, ..., x_N) and controls u = (u_0, ..., u_{N-1}) that satisfies (1).

A trajectory optimization problem is defined by a dynamical system f_i, a running cost ℓ_i : X × U → R, as well as a terminal cost ℓ_f : X → R. The total cost of a control sequence u, starting from an initial state x_0, corresponds to:

J(x_0, u) = \sum_{i=0}^{N-1} ℓ_i(x_i, u_i) + ℓ_f(x_N),   (2)

where the state sequence is obtained by integrating the dynamics of the system along the time horizon N. The goal is to find the optimal control sequence that minimizes the total cost, given a fixed initial state:

u^\star = \arg\min_{u ∈ U^N} J(x_0, u).   (3)
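To make the notation concrete, here is a minimal Python sketch (not from the paper; the single-integrator dynamics and all names are illustrative placeholders) of rolling a control sequence through (1) and scoring it with the total cost (2):

```python
import numpy as np

def rollout_cost(x0, us, f, l, lf):
    """Integrate the dynamics (1) and accumulate the total cost (2)."""
    x, J, xs = x0, 0.0, [x0]
    for i, u in enumerate(us):
        J += l(i, x, u)        # running cost l_i(x_i, u_i)
        x = f(i, x, u)         # transition x_{i+1} = f_i(x_i, u_i)
        xs.append(x)
    return xs, J + lf(x)       # add the terminal cost l_f(x_N)

# Illustrative single-integrator example (not from the paper).
f = lambda i, x, u: x + 0.1 * u
l = lambda i, x, u: 0.5 * float(u @ u)
lf = lambda x: 10.0 * float(x @ x)
xs, J = rollout_cost(np.zeros(2), [np.ones(2)] * 20, f, l, lf)
```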
B. Unconstrained differential dynamic programming

A DDP problem is solved by iteratively improving a starting feasible trajectory, also known as the initial guess. Let u_i = (u_i, ..., u_{N-1}) be the tail of the control sequence starting at the time i. We define the cost-to-go functions as:

J_i(x, u_i) = \sum_{j=i}^{N-1} ℓ_j(x_j, u_j) + ℓ_f(x_N)

and the value functions as:

V_i(x) = \min_{u_i ∈ U^{N-i}} J_i(x, u_i).

This optimality condition on the sequence of (V_i)_{i ∈ {0,...,N}} translates into the Bellman equation:

V_i(x) = \min_{u ∈ U} ℓ_i(x, u) + V_{i+1}(f_i(x, u)), with V_N(x) = ℓ_f(x).   (4)

We also define the Q function as the argument of the minimization operator:

Q_i(x, u) = ℓ_i(x, u) + V_{i+1}(f_i(x, u)).   (5)

Then, to find the optimal control policy at time i, one needs to solve:

\arg\min_{u ∈ U} Q_i(x, u).   (6)

As (4) or (6) may not have an analytical solution in general, they are modeled with second-order approximations. The idea behind DDP is to compute a second-order approximation of V_i, propagating it backward in time using the Bellman equation. The computed controls minimizing the previous expression (6) are then used as a refinement of the controls from the current trajectory.

Backward pass — Propagating the second-order approximation: We model the functions V_i and Q_i using second-order approximations, and compute their Jacobians and Hessians recursively, starting from

V_i(x + dx) = v + V_x dx + \frac{1}{2} dx^\top V_{xx} dx,

where v, V_x and V_{xx} are respectively the value, the Jacobian, and the Hessian at x. And for each i ∈ {N-1, ..., 0}, we define the second-order approximation of Q_i around (x_i, u_i) as:

Q_i(x + dx, u + du) = q + Q_x dx + Q_u du + \frac{1}{2} dx^\top Q_{xx} dx + \frac{1}{2} du^\top Q_{uu} du + du^\top Q_{ux} dx.

The equalities (4) then read:

For i = N: V_{xx} = ℓ_{f,xx}, V_x = ℓ_{f,x}, and v = ℓ_f.   (8)

For i ∈ {N-1, ..., 0}, we denote V_{i+1} by V':

Q_{xx} = ℓ_{xx} + f_x^\top V'_{xx} f_x + V'_x f_{xx},   (9a)
Q_{ux} = ℓ_{ux} + f_u^\top V'_{xx} f_x + V'_x f_{ux},   (9b)
Q_{uu} = ℓ_{uu} + f_u^\top V'_{xx} f_u + V'_x f_{uu},   (9c)
Q_x = ℓ_x + V'_x f_x,   (9d)
Q_u = ℓ_u + V'_x f_u,   (9e)
q = ℓ + v'.   (9f)

The minimizer of (6) is then found by taking a Newton step, and can be written as:

u^\star(x) = k + K dx, with k = -Q_{uu}^{-1} Q_u and K = -Q_{uu}^{-1} Q_{ux}.   (10)

Note that when the current solution is not close enough to the optimal solution, Q_{uu} may not necessarily be positive definite. In this case a regularization term λI may be added. Finally, the second-order approximation of the value function at the time instant i is given by:

V_{xx} = Q_{xx} - K^\top Q_{uu} K,   (11a)
V_x = Q_x - k^\top Q_{uu} K,   (11b)
v = q - \frac{1}{2} k^\top Q_{uu} k.   (11c)
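The backward recursion (8)-(11) can be sketched as follows in Python (illustrative, not the paper's C++ implementation); for brevity this sketch drops the dynamics second derivatives f_xx, f_ux, f_uu of (9), i.e., it is the Gauss-Newton (iLQR-like) variant, and applies the λI regularization mentioned above:

```python
import numpy as np

def backward_pass(lx, lu, lxx, luu, lux, fx, fu, lfx, lfxx, reg=0.0):
    """Backward sweep (8)-(11). Gauss-Newton (iLQR-like) variant: the dynamics
    second derivatives f_xx, f_ux, f_uu of (9a)-(9c) are omitted; column vectors."""
    N = len(fx)
    Vx, Vxx = lfx, lfxx                               # terminal condition (8)
    ks, Ks = [None] * N, [None] * N
    for i in reversed(range(N)):
        Qx = lx[i] + fx[i].T @ Vx                     # (9d)
        Qu = lu[i] + fu[i].T @ Vx                     # (9e)
        Qxx = lxx[i] + fx[i].T @ Vxx @ fx[i]          # (9a)
        Qux = lux[i] + fu[i].T @ Vxx @ fx[i]          # (9b)
        Quu = luu[i] + fu[i].T @ Vxx @ fu[i]          # (9c)
        Quu_reg = Quu + reg * np.eye(Quu.shape[0])    # lambda*I regularization
        k = -np.linalg.solve(Quu_reg, Qu)             # feedforward term of (10)
        K = -np.linalg.solve(Quu_reg, Qux)            # feedback term of (10)
        ks[i], Ks[i] = k, K
        Vxx = Qxx - K.T @ Quu @ K                     # value recursion (11a)
        Vx = Qx - K.T @ Quu @ k                       # (11b), column-vector form
    return ks, Ks
```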

Forward pass — Integrating the system dynamics: The refined control policies are given by k_i, K_i. To obtain an improved trajectory, we integrate the system using these new policies. A line-search parameter α is used to guarantee convergence, as the full step may not necessarily decrease the total cost function (2) unless we are close to the optimum. The new trajectory is then given by:

u_i^\star = u_i + α k_i + K_i (x_i^\star - x_i),   (12a)
x_0^\star = x_0 and x_{i+1}^\star = f_i(x_i^\star, u_i^\star).   (12b)

C. The Augmented Lagrangian

AL [17] is a standard method to solve nonlinear constrained optimization problems. We first review the AL approach for solving equality-constrained QPs before presenting the algorithm for solving generic nonlinear equality-constrained problems with global convergence guarantees.

Equality-constrained Quadratic Programming: Consider the following equality-constrained quadratic program:

\min_x \frac{1}{2} x^\top Q x + q^\top x,   (13)
s.t. Ax = b,   (14)

where Q is symmetric and A is full rank. The augmented Lagrangian of this QP is:

L_µ(x, λ) = \frac{1}{2} x^\top Q x + q^\top x + λ^\top (Ax - b) + \frac{µ}{2} ‖Ax - b‖^2.   (15)

The classic AL method consists of minimizing L_µ with respect to x. In this case, the minimizer is given by:

x' = -(Q + µA^\top A)^{-1} (q + A^\top λ - µA^\top b),   (16)

with the updated multipliers:

λ' = λ + µ(Ax' - b).   (17)

Each minimization + update iteration decreases the error on the constraint by a factor that grows proportionally with the penalty term µ as it tends to ∞. Also, note that the augmented Lagrangian can only be minimized if Q + µA^\top A is positive definite, which constrains µ to be larger than some lower bound. Because of this, it is often preferable to use an optimization strategy that increases the value of µ if the system cannot be solved or if the convergence rate is below the desired threshold, as explained below.

When µ becomes too large, the condition number of (Q + µA^\top A) blows up, which may lead to unstable computations. To overcome this issue, solving (16) and (17) is strictly equivalent to solving the system:

\begin{bmatrix} Q & A^\top \\ A & -\tfrac{1}{µ} I \end{bmatrix} \begin{bmatrix} x' \\ λ' \end{bmatrix} = \begin{bmatrix} -q \\ b - \tfrac{λ}{µ} \end{bmatrix},   (18)

with the main advantage of having to invert a matrix with a better condition number.
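As an illustration of (16)-(18), the following sketch performs augmented Lagrangian iterations on an equality-constrained QP through the better-conditioned saddle-point system (18); the toy problem data at the bottom are placeholders, not taken from the paper:

```python
import numpy as np

def al_qp_step(Q, q, A, b, lam, mu):
    """One AL iteration for min 1/2 x^T Q x + q^T x s.t. Ax = b,
    solved through the saddle-point system (18) instead of (16)-(17)."""
    n, m = Q.shape[0], A.shape[0]
    KKT = np.block([[Q, A.T],
                    [A, -np.eye(m) / mu]])
    rhs = np.concatenate([-q, b - lam / mu])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]            # new primal x' and multipliers lambda'

# Illustrative data: minimize ||x||^2 / 2 subject to x_0 + x_1 = 1.
Q, q = np.eye(2), np.zeros(2)
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
lam, mu = np.zeros(1), 10.0
for _ in range(5):
    x, lam = al_qp_step(Q, q, A, b, lam, mu)   # x tends to (0.5, 0.5)
```

Repeating the step drives Ax - b to zero at a rate governed by µ, without ever forming the ill-conditioned matrix Q + µA^\top A.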

Nonlinear equality-constrained optimization: In general, the constrained optimization problems we aim to solve in robotics go beyond QPs and are generically formulated as:

\min_x f(x) s.t. c(x) = 0,   (19)

where f is a smooth twice-differentiable cost function and c is a smooth equality constraint. The augmented Lagrangian function now reads:

L_µ(x, λ) = f(x) + λ^\top c(x) + \frac{µ}{2} ‖c(x)‖^2.   (20)

Within this generic setting, additional care must be taken in order to converge to the solution when using AL methods. A common strategy relies on applying a gradient or Newton step with a line search to minimize the augmented Lagrangian L_µ with respect to x, then updating the multipliers with the standard update rule: λ' = λ + µc(x). The penalty factor µ is increased after each iteration to ensure that the problem is convex in a neighborhood of the solution. In practice, however, this approach often fails to converge to the solution when the step size does not decrease rapidly enough. A regular increase of µ can also make it more difficult to properly minimize the augmented Lagrangian with a line-search approach, when the µ‖c(x)‖^2 term starts to dominate the others. On top of that, high penalty-parameter values interact poorly with the DDP algorithm, as the high-order terms can propagate during the backward pass, which affects the stability of the numerical computations.

We propose instead an alternative called Bound-Constrained Lagrangian (BCL) [16], [3], detailed in Alg. 1. It consists of only updating the multipliers when both the optimum of the augmented Lagrangian is reached, which we detect based on the norm of the gradient, and the error on the constraints is small enough. If the optimum of the augmented Lagrangian is reached, but the constraint error has not decreased sufficiently, we then increase the value of µ by a constant factor k. The BCL strategy is the basis of the NLP solver LANCELOT [16] and comes with general convergence guarantees under mild assumptions, often fulfilled in robotics. As highlighted in Sec. IV and shown in [16], this strategy allows (i) to converge to a locally optimal solution from any starting conditions, and (ii) to significantly reduce the number of required steps, by enforcing a consensus between primal and dual updates. We propose to adapt this strategy for handling constraints in DDP. A similar strategy was used in [18], with varying degrees of success depending on the accuracy of the computed derivatives.

Algorithm 1: Bound-Constrained Lagrangian [16]
Result: Optimal primal and dual variables x^\star and λ^\star
Given an initial x_0, λ_0, and parameters η_0, ω_0, µ_0;
Given hyperparameters k > 1, α ∈ (0, 1);
while η_n > η_threshold and ω_n > ω_threshold do
    x_{n+1} = x_n - t (∇^2_{xx} L_µ)^{-1} ∇_x L_µ, with t > 0 found by line search;
    if ‖∇_x L_µ(x_{n+1}, λ_n)‖_∞ < ω_n then
        if ‖c(x_{n+1})‖_∞ < η_n then
            λ_{n+1} = λ_n + µ_n c(x_{n+1});
            µ_{n+1} = µ_n; η_{n+1} = η_n / µ_n^α; ω_{n+1} = ω_n / µ_n;
        else
            µ_{n+1} = k µ_n;
        end
    end
end

Imposing a desired convergence rate: We also propose an alternative strategy to update the penalty factor µ in order to control the convergence rate. This strategy consists of increasing the penalty parameter adaptively, instead of using a constant factor k. Asymptotically, the constraint error ‖c(x_n)‖ between two consecutive multiplier updates decreases at a rate of O(µ_n^{-1}). We can estimate the decrease factor φ by keeping track of the norm of the constraint error at the previous optimum and computing the ratio with the new value: c'_0 = ‖c(x_0)‖, and c'_{n+1} = ‖c(x_{n+1})‖ if λ_{n+1} is updated, otherwise c'_n. φ is then estimated by letting c'_{n+1} = \frac{φ}{µ_n} c'_n. Then, assuming we want a convergence rate of 1/µ_{n+1}^β, with β ∈ (0, 1), we can deduce the ideal value of µ_{n+1} by letting φ/µ_{n+1} = 1/µ_{n+1}^β. The solution is given by µ_{n+1} = φ^{\frac{1}{1-β}}.
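The outer loop of Alg. 1, together with the adaptive rule µ_{n+1} = φ^{1/(1-β)} described above, can be sketched as follows. Here `minimize_auglag` is a placeholder for whatever inner solver is used (a Newton/line-search method here, the DDP sweep in Sec. III), the non-decrease guard and the growth cap mirror the safeguards used in Sec. IV, and all names are illustrative:

```python
import numpy as np

def bcl(minimize_auglag, c, x, lam, mu=10.0, eta=1.0, omega=1.0,
        k=10.0, alpha=0.5, beta=0.5, adaptive=False,
        eta_thr=1e-8, omega_thr=1e-8, max_outer=100):
    """Outer loop of Alg. 1. `minimize_auglag(x, lam, mu)` must return
    (x_new, grad_norm); `c` evaluates the equality constraints."""
    c_prev = np.linalg.norm(c(x), np.inf)
    for _ in range(max_outer):
        x, grad_norm = minimize_auglag(x, lam, mu)
        c_norm = np.linalg.norm(c(x), np.inf)
        if grad_norm < omega:                      # AL optimum reached
            if c_norm < eta:                       # constraints decreased enough
                lam = lam + mu * c(x)              # multiplier update
                eta, omega = eta / mu**alpha, omega / mu
                if adaptive:                       # mu_{n+1} = phi^(1/(1-beta))
                    phi = mu * c_norm / max(c_prev, 1e-300)
                    mu_new = phi ** (1.0 / (1.0 - beta))
                    mu = min(max(mu_new, mu), 1e6 * mu)  # keep non-decreasing, cap growth
                c_prev = c_norm
            else:
                mu = k * mu                        # constant-factor increase
        if eta <= eta_thr or omega <= omega_thr:
            break
    return x, lam, mu
```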
III. EQUALITY CONSTRAINED DIFFERENTIAL DYNAMIC PROGRAMMING

This section contains the core contribution of the paper. We consider the optimal control problem, characterized by the dynamics (1) and the cost function (2), under the additional equality constraints over x_i, u_i, given by:

h_i(x_i, u_i) = 0,   (21)

where h_i : X × U → R^{c_i}, and c_i is lower than or equal to the dimension of the tangent space of U. In order to solve the optimization problem, we suggest two approaches based on the well-known augmented Lagrangian methods.

A. Globally constant Lagrange multipliers

In this case, the augmented Lagrangian of the constrained DDP is given by:

L_µ(x, u, λ) = \sum_{i=0}^{N-1} \left\{ ℓ_i(x_i, u_i) + λ_i^\top h_i(x_i, u_i) + \frac{µ}{2} ‖h_i(x_i, u_i)‖^2 \right\} + ℓ_f(x_N).   (22)

The first strategy consists of (i) fixing the value of the multipliers, then (ii) finding the trajectory that minimizes the augmented Lagrangian, by relying on the previous unconstrained DDP algorithm. The forward and backward passes of the standard DDP algorithm remain largely unchanged. Eqs. (9) have to be augmented with the AL terms:

Q^c_{xx} = Q_{xx} + (λ + µh)^\top h_{xx} + µ h_x^\top h_x,
Q^c_{ux} = Q_{ux} + (λ + µh)^\top h_{ux} + µ h_u^\top h_x,
Q^c_{uu} = Q_{uu} + (λ + µh)^\top h_{uu} + µ h_u^\top h_u,
Q^c_x = Q_x + (λ + µh)^\top h_x,
Q^c_u = Q_u + (λ + µh)^\top h_u,
q^c = q + λ^\top h + \frac{µ}{2} ‖h‖^2.   (23)

The DDP iterations are repeated until the minimizer of (22) is found with respect to (x, u).
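In code, this amounts to adding the constraint contributions to the Q-derivatives at every node of the backward pass. The sketch below is illustrative (Gauss-Newton style: the constraint second derivatives h_xx, h_ux, h_uu are dropped, and all array names are placeholders):

```python
import numpy as np

def augment_q_terms(Qx, Qu, Qxx, Qux, Quu, q, h, hx, hu, lam, mu):
    """Add the AL terms (23) of Sec. III-A to the unconstrained Q-derivatives (9).
    Constraint second derivatives (h_xx, h_ux, h_uu) are omitted in this sketch."""
    w = lam + mu * h                       # (lambda + mu h), shape (c,)
    Qx_c = Qx + hx.T @ w
    Qu_c = Qu + hu.T @ w
    Qxx_c = Qxx + mu * hx.T @ hx
    Qux_c = Qux + mu * hu.T @ hx
    Quu_c = Quu + mu * hu.T @ hu
    q_c = q + float(lam @ h) + 0.5 * mu * float(h @ h)
    return Qx_c, Qu_c, Qxx_c, Qux_c, Quu_c, q_c
```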
We can then update the value of the multipliers using the classic update:

λ'_i = λ_i + µ h_i(x'_i, u'_i).   (24)

To overcome the issue of ill-conditioning when inverting Q^c_{uu}, we use the same strategy as the one exposed in (18). Note that, similarly to the generic augmented Lagrangian method, additional care must be taken in practice to ensure convergence, by increasing µ when needed (e.g., when Q_{uu} is not positive definite, when solving the DDP subproblem does not bring us closer to satisfying the constraints, etc.), by following the BCL strategy recalled in Alg. 1. As this method is a direct adaptation of the classic augmented Lagrangian procedure for solving generic optimization problems, we can readily borrow the convergence results to prove the desired properties of our algorithm. That is, if we converge to a feasible solution, then it is optimal, and the convergence is linear with a rate of approximately \frac{1}{µ}. Importantly, it is also possible to let µ → ∞ when we detect that we are close enough to the optimum and the constraints are linearly independent. This is equivalent to using second-order sequential quadratic programming methods, and can allow for quadratic convergence.

Unfortunately, the control policy given by the DDP iterations no longer holds the same meaning. In the unconstrained case, the feedback term allows correcting the control when the actual trajectory differs from the expected state. But in our current scenario, correcting back to a feasible trajectory is no longer possible once the constraints are violated. Our second approach was designed to fix this issue.

B. Locally affine Lagrange multipliers

The following method is similar to the previous one, with the difference that the multipliers are now allowed to vary with the state of the system (λ_i : X → R^c). We choose to represent them as affine functions defined in a neighborhood of x_i, which take the form

λ_i(x) = λ_i + λ_{i,x}(x - x_i).   (25)

In this case, the augmented Lagrangian is equal to:

L_µ(x, u, λ) = \sum_{i=0}^{N-1} \left\{ ℓ_i(x_i, u_i) + λ_i(x_i)^\top h_i(x_i, u_i) + \frac{µ}{2} ‖h_i(x_i, u_i)‖^2 \right\} + ℓ_f(x_N).   (26)

The equations of the backward pass (9) incorporate new terms:

Q^c_{xx} = Q_{xx} + (λ + µh)^\top h_{xx} + µ h_x^\top h_x + h_x^\top λ_x + λ_x^\top h_x,
Q^c_{ux} = Q_{ux} + (λ + µh)^\top h_{ux} + µ h_u^\top h_x + h_u^\top λ_x,
Q^c_{uu} = Q_{uu} + (λ + µh)^\top h_{uu} + µ h_u^\top h_u,
Q^c_x = Q_x + (λ + µh)^\top h_x + h^\top λ_x,
Q^c_u = Q_u + (λ + µh)^\top h_u,
q^c = q + λ^\top h + \frac{µ}{2} ‖h‖^2.   (27)

The same considerations as in the first method still apply with regards to increasing µ when necessary, to ensure convergence. In addition to that, since the multipliers are computed in a neighborhood of the trajectory's states x_i, the origin of the approximation and the value at the origin are updated at the end of the forward pass of each DDP iteration. When X is a vector space, this produces the same affine function: the Jacobian λ_x is unchanged and the value at the origin is updated as λ' = λ + λ_x(x' - x). Once the minimizer of the augmented Lagrangian is found, the multipliers can be updated at each node i using the constraint feasibility error:

λ' = λ + µh and λ'_x = λ_x + µ(h_x + h_u K).   (28)

This method converges at a similar rate to the first one, with the additional benefit that the computed feedback allows us to recompute the optimal control that satisfies the equality constraints, in a neighborhood of the solution trajectory. The pseudo-code of the algorithm is depicted in Alg. 2.

One limitation, however, is that convergence can only be guaranteed when the optimal policy exists, which adds the constraint that h_u and h_{ux,i} must have full rank. Otherwise, changing the control would have no effect (at first order) on the feasibility of the constraints, making it infeasible to find a policy that always satisfies them. An important consequence of this is that constraints of the form h(x) = 0 cannot be directly handled by this method, as h_u, h_{ux} are always zero. It is possible to get around this limitation in certain cases. A constraint h_i(x_i) can also be written as h_i(f(x_{i-1}, u_{i-1})), which creates a dependence on u_{i-1}. This means that for i < T we can choose to redefine the constraint as:

h'_i(x_i, u_i) = h_{i+1}(f(x_i, u_i), u_i).   (29)

This discards the constraint h_0 = 0, but this is unavoidable in the general case.
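A sketch of the update (28) and of the origin shift λ' = λ + λ_x(x' - x) used for the locally affine multipliers (illustrative names; K is the feedback gain of the constrained backward pass, and a vector-space state is assumed):

```python
import numpy as np

def update_affine_multipliers(lam, lam_x, h, hx, hu, K, mu):
    """Multiplier update (28) for the locally affine variant."""
    lam_new = lam + mu * h                    # lambda' = lambda + mu h
    lam_x_new = lam_x + mu * (hx + hu @ K)    # lambda_x' = lambda_x + mu (h_x + h_u K)
    return lam_new, lam_x_new

def recenter_multipliers(lam, lam_x, x_old, x_new):
    """Origin shift lambda' = lambda + lambda_x (x' - x), vector-space state."""
    return lam + lam_x @ (x_new - x_old), lam_x
```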

Algorithm 2: BCL-DDP
Result: Optimal trajectory u^\star, x^\star, λ^\star, λ^\star_x
Given an initial u_0, λ_0, λ_x, and parameters η_0, ω_0, µ_0;
Given hyperparameters k > 1, α ∈ (0, 1);
while η_n > η_threshold and ω_n > ω_threshold do
    Backward pass (27) to get feedback k_{n+1}, K_{n+1};
    Forward pass (12) with line search to get u_{n+1}, x_{n+1} that decrease L_µ;
    if ‖∇_u L_µ(x_{n+1}, u_{n+1}, λ_n(x_{n+1}))‖_∞ < ω_n then
        if ‖c(x_{n+1})‖_∞ < η_n then
            λ_{n+1} = AL multiplier update (28);
            µ_{n+1} = µ_n; η_{n+1} = η_n / µ_n^α; ω_{n+1} = ω_n / µ_n;
        else
            µ_{n+1} = k µ_n, or any other update rule;
        end
    end
end

C. Handling infeasible initial guesses

In some cases where the evolution of the system may be unstable (e.g., due to unstable dynamics or a large temporal horizon), it may be desirable to use a multiple shooting [19] method instead, as it allows for infeasible trajectories which can be easier to keep under control. In this case, we can readily modify our method to allow for a discontinuous trajectory. Given a dynamical system over the state space X and the control space U, where the evolution is dictated by the transition function f_i, and additional constraints h_i, we can transform it into a system over the same state space, with controls u ∈ U, ε ∈ T_{x_{i+1}}X, the tangent space of X at x_{i+1}, such that the dynamics are:

x_{i+1} = f'_i(x_i, u_i, ε_i) = f_i(x_i, u_i) + ε_i,   (30)

under the constraints:

h_i(x_i, u_i) = 0 and a ε_i = 0,   (31)

where a > 0 is a parameter controlling how the infeasibility penalty is weighted with respect to the standard constraints. Since convergence to the constraint zero set happens progressively along with the minimization of the objective function, rather than eagerly as in SQP methods, this allows the second constraint ε_i = 0 to control the degree of infeasibility, thereby improving the overall stability of the optimization process.
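A sketch of the relaxation (30)-(31): the control is augmented with a slack (defect) variable ε_i so that the rollout may start from a discontinuous trajectory, and ε_i = 0 is enforced as an additional equality constraint weighted by a (illustrative names, vector-space state assumed):

```python
import numpy as np

def relaxed_dynamics(f, i, x, u_aug, nu):
    """Augmented transition (30): u_aug = [u, eps], x_{i+1} = f_i(x_i, u_i) + eps_i."""
    u, eps = u_aug[:nu], u_aug[nu:]
    return f(i, x, u) + eps

def relaxed_constraints(h, i, x, u_aug, nu, a):
    """Stacked equality constraints (31): [h_i(x_i, u_i); a * eps_i] = 0."""
    u, eps = u_aug[:nu], u_aug[nu:]
    return np.concatenate([h(i, x, u), a * eps])
```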

D. Related works

The closest work to our paper is the ALTRO algorithm [13]. It is also based on an AL strategy and composed of two main stages: the first stage corresponds to an augmented Lagrangian strategy applied to DDP, similar to Sec. III-A, except that the multipliers are always updated and the penalty factor constantly increased. As illustrated in Sec. IV and detailed in the literature [3], [16], this may lead to poor convergence results (convergence to a feasible but sub-optimal solution), especially when working with finite precision. The second stage of ALTRO consists of a projection step, whose capacity to converge to a good optimum directly depends on the first stage, which does not come with convergence guarantees. Compared to this work, relying on BCL as a globalization strategy enables us to converge toward local optima with a controlled convergence rate. Other approaches have used a penalty-based method [15] without a guarantee of convergence to an optimal solution. The SQP-based strategy proposed by [14] requires solving multiple QPs when computing a feasible solution for the forward pass, which might slow down the computations. In our work instead, equality-constrained QPs are solved in the backward pass, not by calling an external QP solver, but directly when implementing the AL approach, as highlighted by (18). In addition, our work is the first to introduce a linear dependency on the state in the evolution of the multipliers, improving the feedback term of the constrained DDP solver.

IV. RESULTS

In order to validate and analyze the theoretical and practical properties of the methods introduced in Sec. III, we have developed an open-source C++ multi-precision-arithmetic (MA) framework¹ to solve equality-constrained DDP problems. This framework relies on the Eigen [20] and GNU MPFR [21] libraries for linear algebra and MA. It is based on Pinocchio [22], [23] for computing the robot dynamics and their analytical derivatives [24]. In practice, we performed the experiments using either double floating-point precision (≈ 10^{-16} relative error) or a much higher precision (≈ 10^{-500} relative error).

¹ https://github.com/s-elkazdadi/ddp-pinocchio

A. Cartpole System

Fig. 1: Cartpole: optimal trajectory found by our constrained DDP solver.

The first experiment (augmented precision) showcases a cartpole system, where the control variable is the force applied to the cart. The objective and constraint functions were defined such that the upright position must be reached, while minimizing the total norm of the applied force. We compare the two penalty update strategies: Fig. 2 uses the strategy described in Alg. 1, and increases the penalty parameter whenever the minimum of the augmented Lagrangian is reached without a sufficient decrease of the constraint error. The algorithm used in Fig. 3 uses an adaptive update strategy, relying on the shrinking rate of the constraint error to estimate the optimal penalty parameter. In order to avoid increasing the penalty too aggressively early on in the solving procedure, we limit the growth factor by a certain amount (µ_{n+1} < 10^6 µ_n). Both methods converge to the correct solution, but the second one manages to find an appropriate penalty parameter in fewer iterations.

Fig. 2: Convergence rate: µ is increased by a constant factor of 100.

Fig. 3: Convergence rate: µ is increased adaptively.

Note that, in general, the multiplier and penalty update strategies are important in order to guarantee convergence. Otherwise, we may fail to converge to the optimal value or diverge to infinity, as can be seen in Fig. 4.

Fig. 4: Convergence rate: every iteration, λ is updated and µ is increased by a factor of 3.

We can see that the algorithm variant that builds an affine model of the multipliers takes longer to converge in most cases. This is because, on top of finding the optimal trajectory, the feedback term needs to be computed in parallel as well, which can help with stability when the trajectory needs to be corrected in real time. Asymptotically, when a noise of norm ε is added to the trajectory during the rollout, the order of the constraint error with the constant multiplier variant is O(ε/µ), while the order of the variant with affine multipliers is O(ε^2).

Fig. 5: Evolution of the constraint error as a function of the noisiness of the dynamics.

B. Robotic arm: UR5

Fig. 6: Optimal trajectory of the UR-5 robot reaching the three targets.

The second experiment (usual precision) is performed on a robotic arm equipped with 6 degrees of freedom. The system dynamics are integrated over 2 seconds, with a discretization step of dt = 0.01 s. Once again, the objective function minimizes the cost, but the constraint was set so that the robot must reach the given targets sequentially at three consecutive time instants. Despite the instability of the system, the algorithm manages to converge to within an error of 10^{-6} of the optimal solution in approximately 50 iterations.

V. DISCUSSION AND CONCLUSION

The proposed augmented Lagrangian approach to deal with equality-constrained DDP displays consistent convergence properties, despite the genericity of the algorithm. This allows DDP methods to be reliably applied to a large class of optimal control problems, where constraints can be used to generate more robust and feasible movements.

However, there is still room for improvement to better take into account real-world application requirements. Most notably, inequality constraints are a natural next step, seeing as classical augmented Lagrangian methods already tend to handle them naturally. They play a key role in guaranteeing the robustness of robot motion planning, as they can express the boundaries of the valid controls, states, and the limits that are induced by contact and friction forces.

REFERENCES

[1] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, 2011.
[2] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, "Fast direct multiple shooting algorithms for optimal robot control," in Fast Motions in Biomechanics and Robotics. Springer, 2006, pp. 65–93.
[3] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.
[4] J. Carpentier and N. Mansard, "Multicontact locomotion of legged robots," IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1441–1460, 2018.
[5] M. Neunert, C. De Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, and J. Buchli, "Fast nonlinear model predictive control for unified trajectory optimization and tracking," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1398–1404.
[6] M. Posa, C. Cantu, and R. Tedrake, "A direct method for trajectory optimization of rigid bodies through contact," The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014.
[7] D. Mayne, "A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems," International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
[8] Y. Tassa, N. Mansard, and E. Todorov, "Control-limited differential dynamic programming," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175.
[9] D. M. Murray and S. J. Yakowitz, "Constrained differential dynamic programming and its application to multireservoir control," Water Resources Research, vol. 15, no. 5, pp. 1017–1027, 1979.
[10] M. Giftthaler and J. Buchli, "A projection approach to equality constrained iterative linear quadratic optimal control," in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). IEEE, 2017, pp. 61–66.
[11] R. Budhiraja, J. Carpentier, C. Mastalli, and N. Mansard, "Differential dynamic programming for multi-phase rigid contact dynamics," in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids). IEEE, 2018, pp. 1–9.
[12] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, "An efficient optimal planning and control framework for quadrupedal locomotion," in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017. [Online]. Available: http://dx.doi.org/10.1109/ICRA.2017.7989016
[13] T. A. Howell, B. E. Jackson, and Z. Manchester, "ALTRO: A fast solver for constrained trajectory optimization," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 7674–7679.
[14] Z. Xie, C. K. Liu, and K. Hauser, "Differential dynamic programming with nonlinear constraints," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 695–702.
[15] C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, "Crocoddyl: An efficient and versatile framework for multi-contact optimal control," in IEEE International Conference on Robotics and Automation (ICRA), 2020.
[16] A. R. Conn, G. Gould, and P. L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer Science & Business Media, 2013, vol. 17.
[17] M. J. Powell, "A method for nonlinear constraints in minimization problems," Optimization, pp. 283–298, 1969.
[18] B. Plancher, Z. Manchester, and S. Kuindersma, "Constrained unscented dynamic programming," Sep. 2017, pp. 5674–5680.
[19] H. G. Bock and K.-J. Plitt, "A multiple shooting algorithm for direct solution of optimal control problems," IFAC Proceedings Volumes, vol. 17, no. 2, pp. 1603–1608, 1984.
[20] G. Guennebaud, B. Jacob et al., "Eigen v3," http://eigen.tuxfamily.org, 2010.
[21] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann, "MPFR: A multiple-precision binary floating-point library with correct rounding," ACM Transactions on Mathematical Software (TOMS), vol. 33, no. 2, pp. 13–es, 2007.
[22] J. Carpentier, F. Valenza, N. Mansard et al., "Pinocchio: fast forward and inverse dynamics for poly-articulated systems," https://stack-of-tasks.github.io/pinocchio, 2015–2019.
[23] J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, "The Pinocchio C++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives," in IEEE International Symposium on System Integration (SII), 2019.
[24] J. Carpentier and N. Mansard, "Analytical derivatives of rigid body dynamics algorithms," in Robotics: Science and Systems, 2018.