
Equality Constrained Differential Dynamic Programming Sarah Kazdadi, Justin Carpentier, Jean Ponce

To cite this version:

Sarah Kazdadi, Justin Carpentier, Jean Ponce. Equality Constrained Differential Dynamic Programming. ICRA 2021 - IEEE International Conference on Robotics and Automation, May 2021, Xi'an, China. hal-03184203v2

HAL Id: hal-03184203 https://hal.inria.fr/hal-03184203v2 Submitted on 7 Apr 2021


Sarah El Kazdadi∗, Justin Carpentier∗ and Jean Ponce∗

Abstract— Trajectory optimization is an important tool in task-based robot motion planning, due to its generality and convergence guarantees under some mild conditions. It is often used as a post-processing operation to smooth out trajectories that are generated by probabilistic methods, or to directly control the robot motion. Unconstrained trajectory optimization problems have been well studied, and are commonly solved using Differential Dynamic Programming methods that allow for fast convergence at a relatively low computational cost. In this paper, we propose an augmented Lagrangian approach that extends these ideas to equality-constrained trajectory optimization problems, while maintaining a balance between convergence speed and numerical stability. We illustrate our contributions on various standard robotic problems and highlight their benefits compared to standard approaches.

I. INTRODUCTION

Optimal control and trajectory optimization in general are powerful and versatile mathematical frameworks to program robot motions. They provide a way to efficiently compute, in a generic manner, complex robot motions which minimize a given cost criterion and satisfy a set of constraints, while taking advantage of the dynamical capacities of the robot in question. The most well-known optimal control principles [1], i.e., the Pontryagin Maximum Principle (PMP) and the Hamilton-Jacobi-Bellman (HJB) equations, were derived around the same time, in the '50s. For a certain period, indirect approaches [2], which directly exploit these principles, were preferred to analytically solve optimal control problems, but could only be applied to a limited class of dynamical systems (e.g., space rockets, the Dubins car). With the growth of the computational resources available on standard computers, together with the development of efficient numerical optimization methods for solving large-scale problems, direct approaches [2] have emerged as an efficient alternative for solving optimal control problems. These methods [2], such as single shooting, multiple shooting or collocation, first discretize the problem, reformulate it as a constrained Nonlinear Programming (NLP) instance, and then solve it using classic methods for nonlinear optimization [3] such as Penalty, Sequential Quadratic Programming (SQP), Interior Point (IP) or Augmented Lagrangian (AL) methods. While these approaches can handle a large variety of dynamical systems [4], [5], [6], their performance directly relies on the efficiency and numerical robustness of the underlying NLP solvers and on their ability to exploit the problem sparsity.

More recently, Differential Dynamic Programming (DDP), originally introduced in the late '60s [7], has emerged as a good trade-off between direct and indirect approaches. DDP belongs to the class of second-order methods (similar to Newton methods), and is relatively simple to implement while taking advantage of the sparsity pattern of the problem. Additionally, it also provides, along with the solution, a linear feedback term that can be used to correct the control sequence when the observed trajectory deviates from the optimal one. In particular, this allows the solution to be robust to some amount of external noise. Classical DDP was initially limited to unconstrained problems. Recent works have proposed using various optimization strategies to handle different levels of constraints: box control constraints (aka box-DDP) [8] via box Quadratic Programming (QP), nonlinear equality constraints [9], [10], [11], [12], a mix of box constraints and equality constraints, and generic nonlinear constraints [13], [14], [15] via either AL, penalty, or SQP methods. However, these solutions either provide limited convergence guarantees (no globalization strategies [3]) or require a significant number of iterations of the DDP or of the internal numerical routines to converge to a feasible solution, limiting the deployment of these methods for online Model Predictive Control, for instance.

In this paper, we extend the DDP method to handle additional equality constraints, beyond the constraints of the dynamical system itself, using an augmented Lagrangian-based formulation. The first contribution of this paper is an adaptation of the globalization strategy called Bound-Constrained Lagrangian (BCL) [16], [3] to the case of equality-constrained DDP. This goes beyond previous works which only provide limited convergence guarantees to find an optimal solution. As the second contribution, we propose two slightly different strategies to estimate the Lagrange multipliers of the constraints: one that directly applies the BCL method to DDP to converge to an optimal solution, and a second one that more closely mirrors the principles of the DDP method, by also providing a feedback term, extending the solution to a neighborhood of the optimal trajectory.

The paper is organized as follows. We first recall the main technical background in Sec. II concerning DDP and AL. Sec. III depicts the main contributions of the paper and Sec. IV empirically shows the performance of the proposed approach on several standard robotic problems.

This work was supported in part by the Louis Vuitton / ENS chair on artificial intelligence, the Inria / NYU collaboration agreement, and the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). ∗The authors are with Inria and Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France. Corresponding author: [email protected]
II. DIFFERENTIAL DYNAMIC PROGRAMMING AND AUGMENTED LAGRANGIAN

In this section, we recall the main equations behind the unconstrained DDP and the Augmented Lagrangian methods. This section also introduces the main notations used throughout the paper.

A. Optimal control

A discrete-time dynamical system is a system whose evolution from the time instant i to i + 1 is governed by a transition function f_i:

x_{i+1} = f_i(x_i, u_i),   (1)

where x ∈ X, u ∈ U, and X (resp. U) is the state space (resp. control space) of the dynamical system. A feasible trajectory (x, u) is a sequence of states x = (x_0, ..., x_N) and controls u = (u_0, ..., u_{N-1}) that satisfies (1).

A trajectory optimization problem is defined by a dynamical system f_i, a running cost ℓ_i : X × U → R, as well as a terminal cost ℓ_f : X → R. The total cost of a control sequence u, starting from an initial state x_0, corresponds to:

J(x_0, u) = \sum_{i=0}^{N-1} ℓ_i(x_i, u_i) + ℓ_f(x_N),   (2)

where the state sequence is obtained by integrating the dynamics of the system along the time horizon N. The goal is to find the optimal control sequence that minimizes the total cost, given a fixed initial state:

u^\star = \arg\min_{u ∈ U^N} J(x_0, u).   (3)
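To make the notation concrete, here is a minimal Python sketch (not from the paper; the single-integrator dynamics and all names are illustrative placeholders) of rolling a control sequence through (1) and scoring it with the total cost (2):

```python
import numpy as np

def rollout_cost(x0, us, f, l, lf):
    """Integrate the dynamics (1) and accumulate the total cost (2)."""
    x, J, xs = x0, 0.0, [x0]
    for i, u in enumerate(us):
        J += l(i, x, u)        # running cost l_i(x_i, u_i)
        x = f(i, x, u)         # transition x_{i+1} = f_i(x_i, u_i)
        xs.append(x)
    return xs, J + lf(x)       # add the terminal cost l_f(x_N)

# Illustrative single-integrator example (not from the paper).
f = lambda i, x, u: x + 0.1 * u
l = lambda i, x, u: 0.5 * float(u @ u)
lf = lambda x: 10.0 * float(x @ x)
xs, J = rollout_cost(np.zeros(2), [np.ones(2)] * 20, f, l, lf)
```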
B. Unconstrained differential dynamic programming

A DDP problem is solved by iteratively improving a starting feasible trajectory, also known as the initial guess. Let u_i = (u_i, ..., u_{N-1}) be the tail of the control sequence starting at the time i. We define the cost-to-go functions as:

J_i(x, u_i) = \sum_{j=i}^{N-1} ℓ_j(x_j, u_j) + ℓ_f(x_N)

and the value functions as:

V_i(x) = \min_{u_i ∈ U^{N-i}} J_i(x, u_i).

This optimality condition on the sequence of (V_i)_{i ∈ {0,...,N}} translates into the Bellman equation:

V_i(x) = \min_{u ∈ U} ℓ_i(x, u) + V_{i+1}(f_i(x, u)), with V_N(x) = ℓ_f(x).   (4)

We also define the Q function as the argument of the minimization operator:

Q_i(x, u) = ℓ_i(x, u) + V_{i+1}(f_i(x, u)).   (5)

Then, to find the optimal control policy at time i, one needs to solve:

\arg\min_{u ∈ U} Q_i(x, u).   (6)

As (4) or (6) may not have an analytical solution in general, they are modeled with second-order approximations. The idea behind DDP is to compute a second-order approximation of V_i, propagating it backward in time using the Bellman equation. The computed controls minimizing the previous expression (6) are then used as a refinement of the controls from the current trajectory.

Backward pass — Propagating the second-order approximation: We model the functions V_i and Q_i using second-order approximations, and compute their Jacobians and Hessians recursively, starting from

V_i(x + dx) = v + V_x dx + \frac{1}{2} dx^\top V_{xx} dx,

where v, V_x and V_{xx} are respectively the value, the Jacobian, and the Hessian at x. And for each i ∈ {N-1, ..., 0}, we define the second-order approximation of Q_i around (x_i, u_i) as:

Q_i(x + dx, u + du) = q + Q_x dx + Q_u du + \frac{1}{2} dx^\top Q_{xx} dx + \frac{1}{2} du^\top Q_{uu} du + du^\top Q_{ux} dx.

The equalities (4) then read:

For i = N: V_{xx} = ℓ_{f,xx}, V_x = ℓ_{f,x}, and v = ℓ_f.   (8)

For i ∈ {N-1, ..., 0}, we denote V_{i+1} by V':

Q_{xx} = ℓ_{xx} + f_x^\top V'_{xx} f_x + V'_x f_{xx},   (9a)
Q_{ux} = ℓ_{ux} + f_u^\top V'_{xx} f_x + V'_x f_{ux},   (9b)
Q_{uu} = ℓ_{uu} + f_u^\top V'_{xx} f_u + V'_x f_{uu},   (9c)
Q_x = ℓ_x + V'_x f_x,   (9d)
Q_u = ℓ_u + V'_x f_u,   (9e)
q = ℓ + v'.   (9f)

The minimizer of (6) is then found by taking a Newton step, and can be written as:

u^\star(x) = k + K dx, with k = -Q_{uu}^{-1} Q_u and K = -Q_{uu}^{-1} Q_{ux}.   (10)

Note that when the current solution is not close enough to the optimal solution, Q_{uu} may not necessarily be positive definite. In this case a regularization term λI may be added. Finally, the second-order approximation of the value function at the time instant i is given by:

V_{xx} = Q_{xx} - K^\top Q_{uu} K,   (11a)
V_x = Q_x - k^\top Q_{uu} K,   (11b)
v = q - \frac{1}{2} k^\top Q_{uu} k.   (11c)
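The backward recursion (8)-(11) can be sketched as follows in Python (illustrative, not the paper's C++ implementation); for brevity this sketch drops the dynamics second derivatives f_xx, f_ux, f_uu of (9), i.e., it is the Gauss-Newton (iLQR-like) variant, and applies the λI regularization mentioned above:

```python
import numpy as np

def backward_pass(lx, lu, lxx, luu, lux, fx, fu, lfx, lfxx, reg=0.0):
    """Backward sweep (8)-(11). Gauss-Newton (iLQR-like) variant: the dynamics
    second derivatives f_xx, f_ux, f_uu of (9a)-(9c) are omitted; column vectors."""
    N = len(fx)
    Vx, Vxx = lfx, lfxx                               # terminal condition (8)
    ks, Ks = [None] * N, [None] * N
    for i in reversed(range(N)):
        Qx = lx[i] + fx[i].T @ Vx                     # (9d)
        Qu = lu[i] + fu[i].T @ Vx                     # (9e)
        Qxx = lxx[i] + fx[i].T @ Vxx @ fx[i]          # (9a)
        Qux = lux[i] + fu[i].T @ Vxx @ fx[i]          # (9b)
        Quu = luu[i] + fu[i].T @ Vxx @ fu[i]          # (9c)
        Quu_reg = Quu + reg * np.eye(Quu.shape[0])    # lambda*I regularization
        k = -np.linalg.solve(Quu_reg, Qu)             # feedforward term of (10)
        K = -np.linalg.solve(Quu_reg, Qux)            # feedback term of (10)
        ks[i], Ks[i] = k, K
        Vxx = Qxx - K.T @ Quu @ K                     # value recursion (11a)
        Vx = Qx - K.T @ Quu @ k                       # (11b), column-vector form
    return ks, Ks
```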

Forward pass — Integrating the system dynamics: The refined control policies are given by k_i, K_i. To obtain an improved trajectory, we integrate the system using these new policies. A line-search parameter α is used to guarantee convergence, as the full step may not necessarily decrease the total cost function (2) unless we are close to the optimum. The new trajectory is then given by:

u_i^\star = u_i + α k_i + K_i (x_i^\star - x_i),   (12a)
x_0^\star = x_0 and x_{i+1}^\star = f_i(x_i^\star, u_i^\star).   (12b)

C. The Augmented Lagrangian

AL [17] is a standard method to solve nonlinear constrained optimization problems. We first review the AL approach for solving equality-constrained QPs before presenting the algorithm for solving generic nonlinear equality-constrained problems with global convergence guarantees.

Equality-constrained Quadratic Programming: Consider the following equality-constrained quadratic program:

\min_x \frac{1}{2} x^\top Q x + q^\top x,   (13)
s.t. Ax = b,   (14)

where Q is symmetric and A is full rank. The augmented Lagrangian of this QP is:

L_µ(x, λ) = \frac{1}{2} x^\top Q x + q^\top x + λ^\top (Ax - b) + \frac{µ}{2} ‖Ax - b‖^2.   (15)

The classic AL method consists of minimizing L_µ with respect to x. In this case, the minimizer is given by:

x' = -(Q + µA^\top A)^{-1} (q + A^\top λ - µA^\top b),   (16)

with the updated multipliers:

λ' = λ + µ(Ax' - b).   (17)

Each minimization + update iteration decreases the error on the constraint by a factor that grows proportionally with the penalty term µ as it tends to ∞. Also, note that the augmented Lagrangian can only be minimized if Q + µA^\top A is positive definite, which constrains µ to be larger than some lower bound. Because of this, it is often preferable to use an optimization strategy that increases the value of µ if the system cannot be solved or if the convergence rate is below the desired threshold, as explained below.

When µ becomes too large, the condition number of (Q + µA^\top A) blows up, which may lead to unstable computations. To overcome this issue, solving (16) and (17) is strictly equivalent to solving the system:

\begin{bmatrix} Q & A^\top \\ A & -\tfrac{1}{µ} I \end{bmatrix} \begin{bmatrix} x' \\ λ' \end{bmatrix} = \begin{bmatrix} -q \\ b - \tfrac{λ}{µ} \end{bmatrix},   (18)

with the main advantage of having to invert a matrix with a better condition number.
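As an illustration of (16)-(18), the following sketch performs augmented Lagrangian iterations on an equality-constrained QP through the better-conditioned saddle-point system (18); the toy problem data at the bottom are placeholders, not taken from the paper:

```python
import numpy as np

def al_qp_step(Q, q, A, b, lam, mu):
    """One AL iteration for min 1/2 x^T Q x + q^T x s.t. Ax = b,
    solved through the saddle-point system (18) instead of (16)-(17)."""
    n, m = Q.shape[0], A.shape[0]
    KKT = np.block([[Q, A.T],
                    [A, -np.eye(m) / mu]])
    rhs = np.concatenate([-q, b - lam / mu])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]            # new primal x' and multipliers lambda'

# Illustrative data: minimize ||x||^2 / 2 subject to x_0 + x_1 = 1.
Q, q = np.eye(2), np.zeros(2)
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
lam, mu = np.zeros(1), 10.0
for _ in range(5):
    x, lam = al_qp_step(Q, q, A, b, lam, mu)   # x tends to (0.5, 0.5)
```

Repeating the step drives Ax - b to zero at a rate governed by µ, without ever forming the ill-conditioned matrix Q + µA^\top A.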

Nonlinear equality-constrained optimization: In general, the constrained optimization problems we aim to solve in robotics go beyond QPs and are generically formulated as:

\min_x f(x) s.t. c(x) = 0,   (19)

where f is a smooth twice-differentiable cost function and c is a smooth equality constraint. The augmented Lagrangian function now reads:

L_µ(x, λ) = f(x) + λ^\top c(x) + \frac{µ}{2} ‖c(x)‖^2.   (20)

Within this generic setting, additional care must be taken in order to converge to the solution when using AL methods. A common strategy relies on applying a gradient or Newton step with a line search to minimize the augmented Lagrangian L_µ with respect to x, then updating the multipliers with the standard update rule: λ' = λ + µc(x). The penalty factor µ is increased after each iteration to ensure that the problem is convex in a neighborhood of the solution. In practice, however, this approach often fails to converge to the solution when the step size does not decrease rapidly enough. A regular increase of µ can also make it more difficult to properly minimize the augmented Lagrangian with a line-search approach, when the µ‖c(x)‖^2 term starts to dominate the others. On top of that, high penalty-parameter values interact poorly with the DDP algorithm, as the high-order terms can propagate during the backward pass, which affects the stability of the numerical computations.

We propose instead an alternative called Bound-Constrained Lagrangian (BCL) [16], [3], detailed in Alg. 1. It consists of only updating the multipliers when both the optimum of the augmented Lagrangian is reached, which we detect based on the norm of the gradient, and the error on the constraints is small enough. If the optimum of the augmented Lagrangian is reached, but the constraint error has not decreased sufficiently, we then increase the value of µ by a constant factor k. The BCL strategy is the basis of the NLP solver LANCELOT [16] and comes with general convergence guarantees under mild assumptions, often fulfilled in robotics. As highlighted in Sec. IV and shown in [16], this strategy allows (i) to converge to a locally optimal solution from any starting conditions, and (ii) to significantly reduce the number of required steps, by enforcing a consensus between primal and dual updates. We propose to adapt this strategy for handling constraints in DDP. A similar strategy was used in [18], with varying degrees of success depending on the accuracy of the computed derivatives.

Algorithm 1: Bound-Constrained Lagrangian [16]
Result: Optimal primal and dual variables x^\star and λ^\star
Given an initial x_0, λ_0, and parameters η_0, ω_0, µ_0;
Given hyperparameters k > 1, α ∈ (0, 1);
while η_n > η_threshold and ω_n > ω_threshold do
    x_{n+1} = x_n - t (∇^2_{xx} L_µ)^{-1} ∇_x L_µ, with t > 0 found by line search;
    if ‖∇_x L_µ(x_{n+1}, λ_n)‖_∞ < ω_n then
        if ‖c(x_{n+1})‖_∞ < η_n then
            λ_{n+1} = λ_n + µ_n c(x_{n+1});
            µ_{n+1} = µ_n; η_{n+1} = η_n / µ_n^α; ω_{n+1} = ω_n / µ_n;
        else
            µ_{n+1} = k µ_n;
        end
    end
end

Imposing a desired convergence rate: We also propose an alternative strategy to update the penalty factor µ in order to control the convergence rate. This strategy consists of increasing the penalty parameter adaptively, instead of using a constant factor k. Asymptotically, the constraint error ‖c(x_n)‖ between two consecutive multiplier updates decreases at a rate of O(µ_n^{-1}). We can estimate the decrease factor φ by keeping track of the norm of the constraint error at the previous optimum and computing the ratio with the new value: c'_0 = ‖c(x_0)‖, and c'_{n+1} = ‖c(x_{n+1})‖ if λ_{n+1} is updated, otherwise c'_n. φ is then estimated by letting c'_{n+1} = \frac{φ}{µ_n} c'_n. Then, assuming we want a convergence rate of 1/µ_{n+1}^β, with β ∈ (0, 1), we can deduce the ideal value of µ_{n+1} by letting φ/µ_{n+1} = 1/µ_{n+1}^β. The solution is given by µ_{n+1} = φ^{\frac{1}{1-β}}.
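The outer loop of Alg. 1, together with the adaptive rule µ_{n+1} = φ^{1/(1-β)} described above, can be sketched as follows. Here `minimize_auglag` is a placeholder for whatever inner solver is used (a Newton/line-search method here, the DDP sweep in Sec. III), the non-decrease guard and the growth cap mirror the safeguards used in Sec. IV, and all names are illustrative:

```python
import numpy as np

def bcl(minimize_auglag, c, x, lam, mu=10.0, eta=1.0, omega=1.0,
        k=10.0, alpha=0.5, beta=0.5, adaptive=False,
        eta_thr=1e-8, omega_thr=1e-8, max_outer=100):
    """Outer loop of Alg. 1. `minimize_auglag(x, lam, mu)` must return
    (x_new, grad_norm); `c` evaluates the equality constraints."""
    c_prev = np.linalg.norm(c(x), np.inf)
    for _ in range(max_outer):
        x, grad_norm = minimize_auglag(x, lam, mu)
        c_norm = np.linalg.norm(c(x), np.inf)
        if grad_norm < omega:                      # AL optimum reached
            if c_norm < eta:                       # constraints decreased enough
                lam = lam + mu * c(x)              # multiplier update
                eta, omega = eta / mu**alpha, omega / mu
                if adaptive:                       # mu_{n+1} = phi^(1/(1-beta))
                    phi = mu * c_norm / max(c_prev, 1e-300)
                    mu_new = phi ** (1.0 / (1.0 - beta))
                    mu = min(max(mu_new, mu), 1e6 * mu)  # keep non-decreasing, cap growth
                c_prev = c_norm
            else:
                mu = k * mu                        # constant-factor increase
        if eta <= eta_thr or omega <= omega_thr:
            break
    return x, lam, mu
```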
III. EQUALITY CONSTRAINED DIFFERENTIAL DYNAMIC PROGRAMMING

This section contains the core contribution of the paper. We consider the optimal control problem, characterized by the dynamics (1) and the cost function (2), under the additional equality constraints over x_i, u_i, given by:

h_i(x_i, u_i) = 0,   (21)

where h_i : X × U → R^{c_i}, and c_i is lower than or equal to the dimension of the tangent space of U. In order to solve the optimization problem, we suggest two approaches based on the well-known augmented Lagrangian methods.

A. Globally constant Lagrange multipliers

In this case, the augmented Lagrangian of the constrained DDP is given by:

L_µ(x, u, λ) = \sum_{i=0}^{N-1} \left\{ ℓ_i(x_i, u_i) + λ_i^\top h_i(x_i, u_i) + \frac{µ}{2} ‖h_i(x_i, u_i)‖^2 \right\} + ℓ_f(x_N).   (22)

The first strategy consists of (i) fixing the value of the multipliers, then (ii) finding the trajectory that minimizes the augmented Lagrangian, by relying on the previous unconstrained DDP algorithm. The forward and backward passes of the standard DDP algorithm remain largely unchanged. Eqs. (9) have to be augmented with the AL terms:

Q^c_{xx} = Q_{xx} + (λ + µh)^\top h_{xx} + µ h_x^\top h_x,
Q^c_{ux} = Q_{ux} + (λ + µh)^\top h_{ux} + µ h_u^\top h_x,
Q^c_{uu} = Q_{uu} + (λ + µh)^\top h_{uu} + µ h_u^\top h_u,
Q^c_x = Q_x + (λ + µh)^\top h_x,
Q^c_u = Q_u + (λ + µh)^\top h_u,
q^c = q + λ^\top h + \frac{µ}{2} ‖h‖^2.   (23)

The DDP iterations are repeated until the minimizer of (22) is found with respect to (x, u).
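In code, this amounts to adding the constraint contributions to the Q-derivatives at every node of the backward pass. The sketch below is illustrative (Gauss-Newton style: the constraint second derivatives h_xx, h_ux, h_uu are dropped, and all array names are placeholders):

```python
import numpy as np

def augment_q_terms(Qx, Qu, Qxx, Qux, Quu, q, h, hx, hu, lam, mu):
    """Add the AL terms (23) of Sec. III-A to the unconstrained Q-derivatives (9).
    Constraint second derivatives (h_xx, h_ux, h_uu) are omitted in this sketch."""
    w = lam + mu * h                       # (lambda + mu h), shape (c,)
    Qx_c = Qx + hx.T @ w
    Qu_c = Qu + hu.T @ w
    Qxx_c = Qxx + mu * hx.T @ hx
    Qux_c = Qux + mu * hu.T @ hx
    Quu_c = Quu + mu * hu.T @ hu
    q_c = q + float(lam @ h) + 0.5 * mu * float(h @ h)
    return Qx_c, Qu_c, Qxx_c, Qux_c, Quu_c, q_c
```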
We can then update the value of the multipliers using the classic update:

λ'_i = λ_i + µ h_i(x'_i, u'_i).   (24)

To overcome the issue of ill-conditioning when inverting Q^c_{uu}, we use the same strategy as the one exposed in (18). Note that, similarly to the generic augmented Lagrangian method, additional care must be taken in practice to ensure convergence, by increasing µ when needed (e.g., when Q_{uu} is not positive definite, when solving the DDP subproblem does not bring us closer to satisfying the constraints, etc.), by following the BCL strategy recalled in Alg. 1. As this method is a direct adaptation of the classic augmented Lagrangian procedure for solving generic optimization problems, we can readily borrow the convergence results to prove the desired properties of our algorithm. That is, if we converge to a feasible solution, then it is optimal, and the convergence is linear with a rate of approximately \frac{1}{µ}. Importantly, it is also possible to let µ → ∞ when we detect that we are close enough to the optimum and the constraints are linearly independent. This is equivalent to using second-order sequential quadratic programming methods, and can allow for quadratic convergence.

Unfortunately, the control policy given by the DDP iterations no longer holds the same meaning. In the unconstrained case, the feedback term allows correcting the control when the actual trajectory differs from the expected state. But in our current scenario, correcting back to a feasible trajectory is no longer possible once the constraints are violated. Our second approach was designed to fix this issue.

B. Locally affine Lagrange multipliers

The following method is similar to the previous one, with the difference that the multipliers are now allowed to vary with the state of the system (λ_i : X → R^c). We choose to represent them as affine functions defined in a neighborhood of x_i, which take the form

λ_i(x) = λ_i + λ_{i,x}(x - x_i).   (25)

In this case, the augmented Lagrangian is equal to:

L_µ(x, u, λ) = \sum_{i=0}^{N-1} \left\{ ℓ_i(x_i, u_i) + λ_i(x_i)^\top h_i(x_i, u_i) + \frac{µ}{2} ‖h_i(x_i, u_i)‖^2 \right\} + ℓ_f(x_N).   (26)

The equations of the backward pass (9) incorporate new terms:

Q^c_{xx} = Q_{xx} + (λ + µh)^\top h_{xx} + µ h_x^\top h_x + h_x^\top λ_x + λ_x^\top h_x,
Q^c_{ux} = Q_{ux} + (λ + µh)^\top h_{ux} + µ h_u^\top h_x + h_u^\top λ_x,
Q^c_{uu} = Q_{uu} + (λ + µh)^\top h_{uu} + µ h_u^\top h_u,
Q^c_x = Q_x + (λ + µh)^\top h_x + h^\top λ_x,
Q^c_u = Q_u + (λ + µh)^\top h_u,
q^c = q + λ^\top h + \frac{µ}{2} ‖h‖^2.   (27)

The same considerations as in the first method still apply with regards to increasing µ when necessary, to ensure convergence. In addition to that, since the multipliers are computed in a neighborhood of the trajectory's states x_i, the origin of the approximation and the value at the origin are updated at the end of the forward pass of each DDP iteration. When X is a vector space, this produces the same affine function: the Jacobian λ_x is unchanged and the value at the origin is updated as λ' = λ + λ_x(x' - x). Once the minimizer of the augmented Lagrangian is found, the multipliers can be updated at each node i using the constraint feasibility error:

λ' = λ + µh and λ'_x = λ_x + µ(h_x + h_u K).   (28)

This method converges at a similar rate to the first one, with the additional benefit that the computed feedback allows us to recompute the optimal control that satisfies the equality constraints, in a neighborhood of the solution trajectory. The pseudo-code of the algorithm is depicted in Alg. 2.

One limitation, however, is that convergence can only be guaranteed when the optimal policy exists, which adds the constraint that h_u and h_{ux,i} must have full rank. Otherwise, changing the control would have no effect (at first order) on the feasibility of the constraints, making it infeasible to find a policy that always satisfies them. An important consequence of this is that constraints of the form h(x) = 0 cannot be directly handled by this method, as h_u, h_{ux} are always zero. It is possible to get around this limitation in certain cases. A constraint h_i(x_i) can also be written as h_i(f(x_{i-1}, u_{i-1})), which creates a dependence on u_{i-1}. This means that for i < T we can choose to redefine the constraint as:

h'_i(x_i, u_i) = h_{i+1}(f(x_i, u_i), u_i).   (29)

This discards the constraint h_0 = 0, but this is unavoidable in the general case.
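A sketch of the update (28) and of the origin shift λ' = λ + λ_x(x' - x) used for the locally affine multipliers (illustrative names; K is the feedback gain of the constrained backward pass, and a vector-space state is assumed):

```python
import numpy as np

def update_affine_multipliers(lam, lam_x, h, hx, hu, K, mu):
    """Multiplier update (28) for the locally affine variant."""
    lam_new = lam + mu * h                    # lambda' = lambda + mu h
    lam_x_new = lam_x + mu * (hx + hu @ K)    # lambda_x' = lambda_x + mu (h_x + h_u K)
    return lam_new, lam_x_new

def recenter_multipliers(lam, lam_x, x_old, x_new):
    """Origin shift lambda' = lambda + lambda_x (x' - x), vector-space state."""
    return lam + lam_x @ (x_new - x_old), lam_x
```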

Algorithm 2: BCL-DDP
Result: Optimal trajectory u^\star, x^\star, λ^\star, λ^\star_x
Given an initial u_0, λ_0, λ_x, and parameters η_0, ω_0, µ_0;
Given hyperparameters k > 1, α ∈ (0, 1);
while η_n > η_threshold and ω_n > ω_threshold do
    Backward pass (27) to get feedback k_{n+1}, K_{n+1};
    Forward pass (12) with line search to get u_{n+1}, x_{n+1} that decrease L_µ;
    if ‖∇_u L_µ(x_{n+1}, u_{n+1}, λ_n(x_{n+1}))‖_∞ < ω_n then
        if ‖c(x_{n+1})‖_∞ < η_n then
            λ_{n+1} = AL multiplier update (28);
            µ_{n+1} = µ_n; η_{n+1} = η_n / µ_n^α; ω_{n+1} = ω_n / µ_n;
        else
            µ_{n+1} = k µ_n, or any other update rule;
        end
    end
end

C. Handling infeasible initial guesses

In some cases where the evolution of the system may be unstable (e.g., due to unstable dynamics or a large temporal horizon), it may be desirable to use a multiple shooting [19] method instead, as it allows for infeasible trajectories which can be easier to keep under control. In this case, we can readily modify our method to allow for a discontinuous trajectory. Given a dynamical system over the state space X and the control space U, where the evolution is dictated by the transition function f_i, and additional constraints h_i, we can transform it into a system over the same state space, with controls u ∈ U, ε ∈ T_{x_{i+1}}X, the tangent space of X at x_{i+1}, such that the dynamics are:

x_{i+1} = f'_i(x_i, u_i, ε_i) = f_i(x_i, u_i) + ε_i,   (30)

under the constraints:

h_i(x_i, u_i) = 0 and a ε_i = 0,   (31)

where a > 0 is a parameter controlling how the infeasibility penalty is weighted with respect to the standard constraints. Since convergence to the constraint zero set happens progressively along with the minimization of the objective function, rather than eagerly as in SQP methods, this allows the second constraint ε_i = 0 to control the degree of infeasibility, thereby improving the overall stability of the optimization process.
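A sketch of the relaxation (30)-(31): the control is augmented with a slack (defect) variable ε_i so that the rollout may start from a discontinuous trajectory, and ε_i = 0 is enforced as an additional equality constraint weighted by a (illustrative names, vector-space state assumed):

```python
import numpy as np

def relaxed_dynamics(f, i, x, u_aug, nu):
    """Augmented transition (30): u_aug = [u, eps], x_{i+1} = f_i(x_i, u_i) + eps_i."""
    u, eps = u_aug[:nu], u_aug[nu:]
    return f(i, x, u) + eps

def relaxed_constraints(h, i, x, u_aug, nu, a):
    """Stacked equality constraints (31): [h_i(x_i, u_i); a * eps_i] = 0."""
    u, eps = u_aug[:nu], u_aug[nu:]
    return np.concatenate([h(i, x, u), a * eps])
```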

D. Related works

The closest work to our paper is the ALTRO algorithm [13]. It is also based on an AL strategy and composed of two main stages: the first stage corresponds to an augmented Lagrangian strategy applied to DDP, similar to Sec. III-A, except that the multipliers are always updated and the penalty factor constantly increased. As illustrated in Sec. IV and detailed in the literature [3], [16], this may lead to poor convergence results (convergence to a feasible but sub-optimal solution), especially when working with finite precision. The second stage of ALTRO consists of a projection step, whose capacity to converge to a good optimum directly depends on the first stage, which does not come with convergence guarantees. Compared to this work, relying on BCL as a globalization strategy enables us to converge toward local optima with a controlled convergence rate. Other approaches have used a penalty-based method [15] without a guarantee of convergence to an optimal solution. The SQP-based strategy proposed by [14] requires solving multiple QPs when computing a feasible solution for the forward pass, which might slow down the computations. In our work instead, equality-constrained QPs are solved in the backward pass, not by calling an external QP solver, but directly when implementing the AL approach, as highlighted by (18). In addition, our work is the first to introduce a linear dependency on the state in the evolution of the multipliers, improving the feedback term of the constrained DDP solver.

IV. RESULTS

In order to validate and analyze the theoretical and practical properties of the methods introduced in Sec. III, we have developed an open-source C++ multi-precision-arithmetic (MA) framework¹ to solve equality-constrained DDP problems. This framework relies on the Eigen [20] and GNU MPFR [21] libraries for linear algebra and MA. It is based on Pinocchio [22], [23] for computing the robot dynamics and their analytical derivatives [24]. In practice, we performed the experiments using either double floating-point precision (≈ 10^{-16} relative error) or a much higher precision (≈ 10^{-500} relative error).

¹ https://github.com/s-elkazdadi/ddp-pinocchio

A. Cartpole System

Fig. 1: Cartpole: optimal trajectory found by our constrained DDP solver.

The first experiment (augmented precision) showcases a cartpole system, where the control variable is the force applied to the cart. The objective and constraint functions were defined such that the upright position must be reached, while minimizing the total norm of the applied force. We compare the two penalty update strategies: Fig. 2 uses the strategy described in Alg. 1, and increases the penalty parameter whenever the minimum of the augmented Lagrangian is reached without a sufficient decrease of the constraint error. The algorithm used in Fig. 3 uses an adaptive update strategy, relying on the shrinking rate of the constraint error to estimate the optimal penalty parameter. In order to avoid increasing the penalty too aggressively early on in the solving procedure, we limit the growth factor by a certain amount (µ_{n+1} < 10^6 µ_n). Both methods converge to the correct solution, but the second one manages to find an appropriate penalty parameter in fewer iterations.

Fig. 2: Convergence rate: µ is increased by a constant factor of 100.

Fig. 3: Convergence rate: µ is increased adaptively.

Note that, in general, the multiplier and penalty update strategies are important in order to guarantee convergence. Otherwise, we may fail to converge to the optimal value or diverge to infinity, as can be seen in Fig. 4.

Fig. 4: Convergence rate: every iteration, λ is updated and µ is increased by a factor of 3.

We can see that the algorithm variant that builds an affine model of the multipliers takes longer to converge in most cases. This is because, on top of finding the optimal trajectory, the feedback term needs to be computed in parallel as well, which can help with stability when the trajectory needs to be corrected in real time. Asymptotically, when a noise of norm ε is added to the trajectory during the rollout, the order of the constraint error with the constant multiplier variant is O(ε/µ), while the order of the variant with affine multipliers is O(ε^2).

Fig. 5: Evolution of the constraint error as a function of the noisiness of the dynamics.

B. Robotic arm: UR5

Fig. 6: Optimal trajectory of the UR-5 robot reaching the three targets.

The second experiment (usual precision) is performed on a robotic arm equipped with 6 degrees of freedom. The system dynamics are integrated over 2 seconds, with a discretization step of dt = 0.01 s. Once again, the objective function minimizes the cost, but the constraint was set so that the robot must reach the given targets sequentially at three consecutive time instants. Despite the instability of the system, the algorithm manages to converge to within an error of 10^{-6} of the optimal solution in approximately 50 iterations.

V. DISCUSSION AND CONCLUSION

The proposed augmented Lagrangian approach to deal with equality-constrained DDP displays consistent convergence properties, despite the genericity of the algorithm. This allows DDP methods to be reliably applied to a large class of optimal control problems, where constraints can be used to generate more robust and feasible movements.

However, there is still room for improvement to better take into account real-world application requirements. Most notably, inequality constraints are a natural next step, seeing as classical augmented Lagrangian methods already tend to handle them naturally. They play a key role in guaranteeing the robustness of robot motion planning, as they can express the boundaries of the valid controls, states, and the limits that are induced by contact and friction forces.

REFERENCES

[1] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, 2011.
[2] M. Diehl, H. G. Bock, H. Diedam, and P.-B. Wieber, "Fast direct multiple shooting algorithms for optimal robot control," in Fast Motions in Biomechanics and Robotics. Springer, 2006, pp. 65–93.
[3] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.
[4] J. Carpentier and N. Mansard, "Multicontact locomotion of legged robots," IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1441–1460, 2018.
[5] M. Neunert, C. De Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, and J. Buchli, "Fast nonlinear model predictive control for unified trajectory optimization and tracking," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1398–1404.
[6] M. Posa, C. Cantu, and R. Tedrake, "A direct method for trajectory optimization of rigid bodies through contact," The International Journal of Robotics Research, vol. 33, no. 1, pp. 69–81, 2014.
[7] D. Mayne, "A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems," International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
[8] Y. Tassa, N. Mansard, and E. Todorov, "Control-limited differential dynamic programming," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175.
[9] D. M. Murray and S. J. Yakowitz, "Constrained differential dynamic programming and its application to multireservoir control," Water Resources Research, vol. 15, no. 5, pp. 1017–1027, 1979.
[10] M. Giftthaler and J. Buchli, "A projection approach to equality constrained iterative linear quadratic optimal control," in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). IEEE, 2017, pp. 61–66.
[11] R. Budhiraja, J. Carpentier, C. Mastalli, and N. Mansard, "Differential dynamic programming for multi-phase rigid contact dynamics," in 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids). IEEE, 2018, pp. 1–9.
[12] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, "An efficient optimal planning and control framework for quadrupedal locomotion," in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017. [Online]. Available: http://dx.doi.org/10.1109/ICRA.2017.7989016
[13] T. A. Howell, B. E. Jackson, and Z. Manchester, "ALTRO: A fast solver for constrained trajectory optimization," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 7674–7679.
[14] Z. Xie, C. K. Liu, and K. Hauser, "Differential dynamic programming with nonlinear constraints," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 695–702.
[15] C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, "Crocoddyl: An efficient and versatile framework for multi-contact optimal control," in IEEE International Conference on Robotics and Automation (ICRA), 2020.
[16] A. R. Conn, G. Gould, and P. L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer Science & Business Media, 2013, vol. 17.
[17] M. J. Powell, "A method for nonlinear constraints in minimization problems," Optimization, pp. 283–298, 1969.
[18] B. Plancher, Z. Manchester, and S. Kuindersma, "Constrained unscented dynamic programming," Sep. 2017, pp. 5674–5680.
[19] H. G. Bock and K.-J. Plitt, "A multiple shooting algorithm for direct solution of optimal control problems," IFAC Proceedings Volumes, vol. 17, no. 2, pp. 1603–1608, 1984.
[20] G. Guennebaud, B. Jacob et al., "Eigen v3," http://eigen.tuxfamily.org, 2010.
[21] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann, "MPFR: A multiple-precision binary floating-point library with correct rounding," ACM Transactions on Mathematical Software (TOMS), vol. 33, no. 2, pp. 13–es, 2007.
[22] J. Carpentier, F. Valenza, N. Mansard et al., "Pinocchio: fast forward and inverse dynamics for poly-articulated systems," https://stack-of-tasks.github.io/pinocchio, 2015–2019.
[23] J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, "The Pinocchio C++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives," in IEEE International Symposium on System Integration (SII), 2019.
[24] J. Carpentier and N. Mansard, "Analytical derivatives of rigid body dynamics algorithms," in Robotics: Science and Systems, 2018.