2005 American Control Conference ThA18.2 June 8-10, 2005. Portland, OR, USA
An Efficient Sequential Linear Quadratic Algorithm for Solving Nonlinear Optimal Control Problems
Athanasios Sideris and James E. Bobrow Department of Mechanical and Aerospace Engineering University of California, Irvine Irvine, CA 92697 {asideris, jebobrow}@uci.edu
Abstract— We develop a numerically efficient algorithm for computing controls for nonlinear systems that minimize a quadratic performance measure. We formulate the optimal control problem in discrete time, but many continuous-time problems can also be solved after discretization. Our approach is similar to sequential quadratic programming for finite-dimensional optimization problems in that we solve the nonlinear optimal control problem using a sequence of linear quadratic subproblems. Each subproblem is solved efficiently using the Riccati difference equation. We show that each iteration produces a descent direction for the performance measure, and that the sequence of controls converges to a solution that satisfies the well-known necessary conditions for optimal control.

I. INTRODUCTION

As the complexity of nonlinear systems such as robots increases, it is becoming more important to find controls for them that minimize a performance measure such as power consumption. Although the Maximum Principle [1] provides the optimality conditions for minimizing a given cost function, it does not provide a method for its numerical computation. Because of the importance of solving these problems, many numerical algorithms and commercial software packages have been developed since the 1960's [2]. The various approaches can be classified as either indirect or direct methods. Indirect methods explicitly solve the optimality conditions stated in terms of the maximum principle, the adjoint equations, and the transversality (boundary) conditions. Direct methods, such as collocation [3] techniques or direct transcription, replace the ODE's with algebraic constraints using a large set of unknowns. These collocation methods are powerful for solving trajectory optimization problems [4], [5] and problems with inequality constraints. However, due to the large-scale nature of these problems, convergence can be slow. Furthermore, optimal control problems can be difficult to solve numerically for a variety of reasons. For instance, the nonlinear system dynamics may create an unfavorable eigenvalue structure of the Hessian of the ensuing optimization problem, and gradient-based descent methods will be inefficient.

In this paper, we focus on finding optimal controls for nonlinear dynamic systems with general linear quadratic performance indexes (including tracking and regulation). The discrete-time case is considered, but continuous-time optimal control problems can be handled after discretization. We are interested primarily in systems for which the derivatives of the dynamic equations with respect to the state and the control are available, but whose second derivatives with respect to the states and the controls are too complex to compute analytically. Our algorithm is based on linearizing the system dynamics about an input/state trajectory and solving a corresponding linear quadratic optimal control problem. From the solution of the latter problem, we obtain a search direction along which we minimize the performance index by direct simulation of the system dynamics. Given the structure of the proposed algorithm, we refer to it in the sequel as the Sequential Linear Quadratic (SLQ) algorithm. We prove that search directions produced in this manner are descent directions, and that the algorithm converges to a control that locally minimizes the cost function. Solution of the linear quadratic problem is well known, and can be reliably obtained by solving a Riccati difference equation.

Algorithms similar in spirit are reported in [6], [7], [8], [9]. Such algorithms implement a Newton search, or an asymptotically Newton search, but require that second-order system derivatives be available. Newton algorithms can achieve quadratic local convergence under favorable circumstances, such as the existence of continuous third-order or Lipschitz continuous second-order derivatives, but cannot guarantee global convergence (that is, convergence from any starting point to a local minimum) unless properly modified. Many systems are too complex for second-order derivatives to be available, or do not satisfy such strong continuity assumptions (e.g., see Section IV for examples). In such cases, Newton's method cannot be applied, or its quadratic convergence rate does not materialize. We have found that our approach efficiently solves optimal control problems that are difficult to solve with other popular algorithms such as collocation methods (see the example in Section IV). More specifically, we have observed that our algorithm exhibits near-quadratic convergence in many of the problems that we have tested. Indeed, it is shown in the full version of the paper [10] that the proposed algorithm can be interpreted as a Gauss-Newton method, thus explaining the excellent rate of convergence observed in simulations. Thus, although many of the alternative methods ([2], [12], [11], [8]) can be applied to a broader class of problems, our SLQ algorithm provides a fast and reliable alternative for the important class of optimal control problems with quadratic cost under general nonlinear dynamics, while relying only on first-derivative information.

II. PROBLEM FORMULATION AND BACKGROUND RESULTS

We consider the following general formulation of discrete-time optimal control problems.
Minimize over u(n), x(n):

    J = φ(x(N)) + Σ_{n=0}^{N−1} L(x(n), u(n), n)                          (1)

subject to

    x(n+1) = f(x(n), u(n));  x(0) = x_0.                                  (2)

In the formulation above we assume a quadratic performance index, namely

    L(x(n), u(n), n) = ½ [x(n) − x^o(n)]^T Q(n) [x(n) − x^o(n)]
                     + ½ [u(n) − u^o(n)]^T R(n) [u(n) − u^o(n)]           (3)

and

    φ(x) = ½ [x − x^o(N)]^T Q(N) [x − x^o(N)].                            (4)

In (3) and (4), u^o(n), x^o(n), n = 1, ..., N, are given control input and state target sequences. In standard optimal regulator formulations, u^o(n) and x^o(n) are usually taken to be zero, with the exception perhaps of x^o(N), the desired final value of the state. The formulation considered here addresses the more general optimal tracking control problem and is required for the linear quadratic step of the proposed algorithm presented in Section III.

A. First Order Optimality Conditions

We next briefly review the first order optimality conditions for the optimal control problem of (1) and (2), in a manner that brings out certain important interpretations of the adjoint dynamical equations encountered in a control theoretic approach and of the Lagrange multipliers found in a pure optimization theory approach.

Let us consider the cost-to-go

    J(n) ≡ Σ_{k=n}^{N−1} L(x(k), u(k), k) + φ(x(N))                       (5)

with L and φ as defined in (3) and (4) respectively. We remark that J(n) is a function of x(n) and of u(k), k = n, ..., N−1, and introduce the sensitivity of the cost-to-go with respect to the current state:

    λ^T(n) = ∂J(n)/∂x(n).                                                 (6)

From

    J(n) = L(x(n), u(n), n) + J(n+1),                                     (7)

and since

    ∂J(n+1)/∂x(n) = [∂J(n+1)/∂x(n+1)] · [∂x(n+1)/∂x(n)] = λ^T(n+1) f_x(x(n), u(n)),

we have the recursion

    λ^T(n) = L_x(x(n), u(n), n) + λ^T(n+1) f_x(x(n), u(n))
           = [x(n) − x^o(n)]^T Q(n) + λ^T(n+1) f_x(x(n), u(n))            (8)

by using (3), where L_x and f_x denote the partials of L and f respectively with respect to the state variables. This recursion can be solved backward in time (n = N−1, ..., 0) given the control and state trajectories, and it can be started with the final value

    λ^T(N) = ∂φ/∂x(N) = [x(N) − x^o(N)]^T Q(N)                            (9)

derived from (4). In a similar manner, we compute the sensitivity of J(n) with respect to the current control u(n). Clearly, from (7),

    ∂J(n)/∂u(n) = L_u(x(n), u(n), n) + λ^T(n+1) f_u(x(n), u(n))
                = [u(n) − u^o(n)]^T R(n) + λ^T(n+1) f_u(x(n), u(n)).      (10)

In (10), L_u and f_u denote the partials of L and f respectively with respect to the control variables, and (3) is used. Next note that ∂J/∂u(n) = ∂J(n)/∂u(n), since the first n terms in J do not depend on u(n). We have then obtained the gradient of the cost with respect to the control variables, namely

    ∇_u J = [∂J(0)/∂u(0)  ∂J(1)/∂u(1)  ...  ∂J(N−1)/∂u(N−1)].             (11)

Assuming u is unconstrained, the first order optimality conditions require that

    ∇_u J = 0.                                                            (12)

We remark that by considering the Hamiltonian

    H(x, u, λ, n) ≡ L(x, u, n) + λ^T f(x, u),                             (13)

we have that H_u(x(n), u(n), λ(n+1), n) ≡ ∂J/∂u(n); i.e., we uncover the generally known but frequently overlooked fact that the partial of the Hamiltonian with respect to the control variables u is the gradient of the cost function with respect to u. We emphasize that in our approach for solving the optimal control problem we take the viewpoint of the control variables u(n) being the independent variables of the problem, since the dynamical equations express (recursively) the state variables in terms of the controls, which can thus be eliminated from the cost function. Thus, in taking the partials of J with respect to u, J is considered a function of u(n), n = 0, ..., N−1, alone, assuming that x(0) is given. With this perspective, the problem becomes one of unconstrained minimization, and having computed ∇_u J, Steepest Descent, Quasi-Newton, and other first-derivative methods can be brought to bear to solve it. However, due to the large-scale character of the problem, only methods that take advantage of its special structure become viable. The Linear Quadratic Regulator algorithm is such an approach in the case of linear dynamics. We briefly review it next.
B. Linear Quadratic Tracking Problem

We next consider the case of linear dynamics in the optimal control problem of (1) and (2). In the following, we distinguish all variables corresponding to the linear optimal control problem that may have different values in the nonlinear optimal control problem by using an over-bar. When the cost is quadratic as in (3), we have the well-known Linear Quadratic Tracking problem. The control theoretic approach to this problem is based on solving the first order necessary optimality conditions (also sufficient in this case) in an efficient manner by introducing the Riccati equation. We briefly elaborate on this derivation next, for completeness and also since most references assume that the target sequences x^o(n) and u^o(n) are zero. First, we summarize the first order necessary optimality conditions for this problem:

    x̄(n+1) = A(n) x̄(n) + B(n) ū(n)                                       (14)
    λ̄^T(n) = [x̄(n) − x̄^o(n)]^T Q(n) + λ̄^T(n+1) A(n)                      (15)
    ∂J̄(n)/∂ū(n) = [ū(n) − ū^o(n)]^T R(n) + λ̄^T(n+1) B(n) = 0.            (16)

Note that the system dynamical equations (14) run forward in time (n = 0, 1, ..., N−1) with initial condition x̄(0) = x̄_0 given, while the adjoint dynamical equations (15) run backward in time (n = N−1, ..., 0) with final condition λ̄^T(N) = [x̄(N) − x̄^o(N)]^T Q(N). From (16), we obtain

    ū(n) = ū^o(n) − R^{−1}(n) B^T(n) λ̄(n+1)                              (17)

and by substituting in (14) and (15), we obtain the classical two-point boundary value system, but with additional forcing terms due to the x̄^o(n) and ū^o(n) sequences:

    x̄(n+1) = A(n) x̄(n) − B(n) R^{−1}(n) B^T(n) λ̄(n+1) + B(n) ū^o(n)      (18)
    λ̄(n) = Q(n) x̄(n) + A^T(n) λ̄(n+1) − Q(n) x̄^o(n).                      (19)

The system of (18) and (19) can be solved by the sweep method [1], based on the postulated relation

    λ̄(n) = P(n) x̄(n) + s(n)                                              (20)

where P(n) and s(n) are appropriate matrices that can be found as follows. For n = N, (20) holds with

    P(N) = Q(N),   s(N) = −Q(N) x̄^o(N).                                  (21)

We now substitute (20) in (18), and after some algebra we obtain

    x̄(n+1) = M(n) A(n) x̄(n) + v(n)                                       (22)

where we defined

    M(n) = [I + B(n) R^{−1}(n) B^T(n) P(n+1)]^{−1}                       (23)
    v(n) = M(n) B(n) [ū^o(n) − R^{−1}(n) B^T(n) s(n+1)].                 (24)

By replacing λ̄(n) and λ̄(n+1) in (19) in terms of x̄(n) and x̄(n+1) from (20), we obtain

    P(n) x̄(n) + s(n) = Q(n) x̄(n) + A^T(n) [P(n+1) x̄(n+1) + s(n+1)] − Q(n) x̄^o(n),

and by expressing x̄(n+1) from (22) and (24) above, we get

    P(n) x̄(n) + s(n) = Q(n) x̄(n) + A^T(n) P(n+1) M(n) A(n) x̄(n)
                      − A^T(n) P(n+1) M(n) B(n) R^{−1}(n) B^T(n) s(n+1)
                      + A^T(n) P(n+1) M(n) B(n) ū^o(n)
                      + A^T(n) s(n+1) − Q(n) x̄^o(n).

The above equation is satisfied by taking

    P(n) = Q(n) + A^T(n) P(n+1) M(n) A(n)                                (25)
    s(n) = A^T(n) [I − P(n+1) M(n) B(n) R^{−1}(n) B^T(n)] s(n+1)
         + A^T(n) P(n+1) M(n) B(n) ū^o(n) − Q(n) x̄^o(n).                 (26)

Equation (25) is the well-known Riccati difference equation; together with the auxiliary equation (26), which is unnecessary if x̄^o(n) and ū^o(n) are zero, it is solved backward in time (n = N−1, ..., 1) together with (23) and (24), with final values given by (21). The resulting values P(n) and s(n) are stored and used to solve (22) and (17) forward in time for the optimal control and state trajectories.

III. MAIN RESULTS

A. Formulation of the SLQ Algorithm

In the proposed SLQ algorithm, the control at stage k+1 is found by performing a one-dimensional search from the control at stage k along a search direction that is found by solving a Linear Quadratic (LQ) optimal control problem. Specifically, let U_k = [u^T(0) u^T(1) ... u^T(N−1)]^T be the optimal solution candidate at step k, and X_k = [x^T(1) x^T(2) ... x^T(N)]^T the corresponding state trajectory obtained by solving the dynamical equations (2) using U_k with the initial condition x(0). We next linearize the state equations (2) about the nominal trajectory of U_k and X_k. The linearized equations are

    x̄(n+1) = f_x(x(n), u(n)) x̄(n) + f_u(x(n), u(n)) ū(n)                 (27)

with initial condition x̄(0) = 0. We then minimize the cost index (1) with respect to ū(n). The solution of this LQ problem gives Ū_k = [ū^T(0) ū^T(1) ... ū^T(N−1)]^T, the proposed search direction. Thus, the control variables at stage k+1 of the algorithm are obtained from

    U_{k+1} = U_k + α_k · Ū_k                                            (28)

where α_k ∈ R is an appropriate stepsize, the selection of which is discussed later in the paper. Note again our perspective of considering the optimal control problem as an unconstrained finite-dimensional optimization problem in U.

We emphasize that Ū_k as computed above is not the steepest descent direction. It is the solution to a linear quadratic tracking problem for a nonlinear system that has been linearized about U_k. Note that the objective function is not linearized for this solution. Our algorithm is different from standard Quasilinearization [1] and Neighboring Extremal [13] methods, where the adjoint equations are also linearized and two-point boundary value problems are solved. As mentioned earlier, our algorithm corresponds to the Gauss-Newton method for solving the optimal control problem of (1) and (2).

B. Properties of the SLQ Algorithm

In this section, we prove two important properties of the proposed algorithm. First, we show that the search direction Ū is a descent direction.

Theorem 1: Consider the discrete-time nonlinear optimal control problem of (1) and (2), and assume a quadratic cost function as in (3) and (4) with R(n) = R^T(n) > 0, Q(n) = Q^T(n) ≥ 0, n = 0, 1, ..., N−1, Q(0) = 0, and Q(N) = Q^T(N) ≥ 0. Also consider a control sequence U ≡ [u^T(0) ... u^T(N−1)]^T and the corresponding state trajectory X ≡ [x^T(1) ... x^T(N)]^T. Next, linearize system (2) about U and X and solve the following linear quadratic problem:

Minimize over ū(n), x̄(n):

    J̄ = ½ [x̄(N) − x̄^o(N)]^T Q(N) [x̄(N) − x̄^o(N)]
      + ½ Σ_{n=0}^{N−1} { [x̄(n) − x̄^o(n)]^T Q(n) [x̄(n) − x̄^o(n)]
      + [ū(n) − ū^o(n)]^T R(n) [ū(n) − ū^o(n)] }                         (29)

subject to

    x̄(n+1) = f_x(x(n), u(n)) x̄(n) + f_u(x(n), u(n)) ū(n);  x̄(0) = 0,    (30)

where x̄^o(n) ≡ x^o(n) − x(n) and ū^o(n) ≡ u^o(n) − u(n). Then, if Ū ≡ [ū^T(0) ... ū^T(N−1)]^T is not zero, it is a descent direction for the cost function (1), i.e., J(U + α·Ū) < J(U) for all sufficiently small α > 0.

Proof: Write the linearized dynamics (30) as x̄(n+1) = A(n) x̄(n) + B(n) ū(n), where A(n) = f_x(x(n), u(n)) and B(n) = f_u(x(n), u(n)). Let us define

    λ̃(n) = λ̄(n) − λ(n)                                                  (32)

and note from (8) and (15) that

    λ̃^T(n) = x̄^T(n) Q(n) + λ̃^T(n+1) A(n),   λ̃(N) = Q(N) x̄(N).           (33)

Next, through the indicated algebra, we can establish the following relation:

    [∂J(n)/∂u(n)] · ū(n)
      = {[u(n) − u^o(n)]^T R(n) + λ^T(n+1) B(n)} ū(n)                    (using (10))
      = −λ̃^T(n+1) B(n) ū(n) − ū^T(n) R(n) ū(n)                           (using (16))
      = −λ̃^T(n+1) x̄(n+1) + λ̃^T(n+1) A(n) x̄(n) − ū^T(n) R(n) ū(n)        (using (30))
      = −λ̃^T(n+1) x̄(n+1) + λ̃^T(n) x̄(n)
        − x̄^T(n) Q(n) x̄(n) − ū^T(n) R(n) ū(n).                           (using (33))

Finally, summing the above equation from n = 0 to n = N−1, and noting that x̄(0) = 0 and, from (33), that λ̃(N) = Q(N) x̄(N), gives

    ∇_u J · Ū = Σ_{n=0}^{N−1} [∂J(n)/∂u(n)] · ū(n)
              = − Σ_{n=0}^{N−1} [x̄^T(n) Q(n) x̄(n) + ū^T(n) R(n) ū(n)]
                − x̄^T(N) Q(N) x̄(N) < 0                                   (34)

and the proof of the theorem is complete.

We remark that the search direction Ū can be found by the LQ algorithm of Section II.B with A(n) ≡ f_x(x(n), u(n)) and B(n) ≡ f_u(x(n), u(n)).
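For concreteness, the Section II.B machinery that produces this search direction — the backward sweep (21), (23), (25), (26) followed by the forward pass (22), (24), (20), (17) — can be sketched in NumPy as follows. This is an illustrative sketch of the recursions as reconstructed above, not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def lq_tracking(A, B, Q, R, QN, xo, uo, x0):
    """Riccati-sweep solution of the LQ tracking problem, eqs. (17)-(26).
    A[n], B[n]: time-varying system matrices; xo, uo: state/control targets."""
    N = len(B)
    n_x = A[0].shape[0]
    P = [None] * (N + 1)
    s = [None] * (N + 1)
    P[N] = QN                                      # eq. (21)
    s[N] = -QN @ xo[N]
    for n in reversed(range(N)):                   # backward sweep
        Rinv = np.linalg.inv(R[n])
        M = np.linalg.inv(np.eye(n_x) + B[n] @ Rinv @ B[n].T @ P[n + 1])  # (23)
        P[n] = Q[n] + A[n].T @ P[n + 1] @ M @ A[n]                        # (25)
        s[n] = (A[n].T @ (np.eye(n_x)
                          - P[n + 1] @ M @ B[n] @ Rinv @ B[n].T) @ s[n + 1]
                + A[n].T @ P[n + 1] @ M @ B[n] @ uo[n]
                - Q[n] @ xo[n])                                           # (26)
    x = [np.asarray(x0, dtype=float)]
    u = []
    for n in range(N):                             # forward pass
        Rinv = np.linalg.inv(R[n])
        M = np.linalg.inv(np.eye(n_x) + B[n] @ Rinv @ B[n].T @ P[n + 1])
        v = M @ B[n] @ (uo[n] - Rinv @ B[n].T @ s[n + 1])                 # (24)
        x.append(M @ A[n] @ x[n] + v)                                     # (22)
        lam = P[n + 1] @ x[n + 1] + s[n + 1]                              # (20)
        u.append(uo[n] - Rinv @ B[n].T @ lam)                             # (17)
    return x, u
```

A one-step scalar check: with A = B = 1, Q(0) = 0, R = Q(N) = 1, x(0) = 0, and state target x^o(1) = 2, the cost ½(x(1) − 2)² + ½u² with x(1) = u is minimized by u = 1, which the sweep reproduces.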
[Fig. 1 appears here: the top panel plots the mass and piston positions y and y_p (inches) versus time, with alternating stance and flight phases marked; the bottom panel plots the minimum fuel control u versus time (seconds).]

Fig. 1. Maximum height hopping motion and minimum fuel control.
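Results such as those in Fig. 1 come from iterating the update (28): simulate, linearize, solve the LQ subproblem, and take a step. A generic sketch of this outer loop follows, with the cost simulation and the LQ direction computation supplied as black-box callables. The backtracking stepsize rule shown is one reasonable choice substituted here for illustration; the paper's own stepsize selection is discussed in material omitted from this excerpt. All names are hypothetical.

```python
import numpy as np

def slq_solve(simulate, lq_direction, U0, max_iter=50, beta=0.5, tol=1e-8):
    """Outer SLQ loop, eq. (28): U_{k+1} = U_k + alpha_k * Ubar_k.
    simulate(U): cost J(U) by direct simulation of the dynamics (2).
    lq_direction(U): LQ search direction Ubar_k (e.g., a Riccati sweep)."""
    U = np.asarray(U0, dtype=float)
    J = simulate(U)
    for _ in range(max_iter):
        Ubar = lq_direction(U)
        if np.linalg.norm(Ubar) < tol:       # Ubar = 0: first-order optimal
            break
        alpha = 1.0
        # Theorem 1 guarantees descent for sufficiently small alpha > 0,
        # so backtracking on J terminates in practice.
        while simulate(U + alpha * Ubar) >= J and alpha > 1e-12:
            alpha *= beta
        U = U + alpha * Ubar
        J = simulate(U)
    return U, J
```

On a toy quadratic cost with the LQ direction replaced by the exact Newton step, the loop converges in one iteration, which exercises both the line search and the stationarity test.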
the hopping system, and it naturally identified the times to switch between these phases. If one were to use collocation methods to solve this problem with explicit consideration of the different dynamics in the different phases, one would have to guess the number of switches between phases and would need to treat the times at which the switches occur as variables in the optimization. In the full version of the paper [10] (omitted here due to space limitations), we show that the proposed SLQ algorithm can be interpreted as a Gauss-Newton algorithm, and thus explain its excellent rate of convergence observed in simulations. Consistent with this interpretation, our algorithm converges much faster when the weighting on the control r is increased; also, the number of iterations required for convergence in this problem increases for larger y^o_N, ranging from 3 for y^o_N = 1 to 166 for y^o_N = 50. In addition, the algorithm failed to converge for α < 1 × 10^{−5}, which demonstrates the need for the dynamics to be continuously differentiable.

V. CONCLUSION

We developed an algorithm for solving nonlinear optimal control problems with quadratic performance measures and unconstrained controls. Each subproblem in the course of the algorithm is a linear quadratic optimal control problem that can be solved efficiently by Riccati difference equations. We show that each search direction generated in the linear quadratic subproblem is a descent direction, and that the algorithm is convergent. Computational experience has demonstrated that the algorithm converges quickly to the optimal solution. The SLQ algorithm is proposed as a powerful alternative to Newton methods in situations where second-order derivative information on the system dynamics is not available or is expensive to obtain, and a near real-time solver is required.

REFERENCES

[1] A. E. Bryson and Y. C. Ho, Applied Optimal Control, Wiley, New York, 1995.
[2] J. T. Betts, "Survey of Numerical Methods for Trajectory Optimization," Journal of Guidance, Control and Dynamics, Vol. 21, No. 2, pp. 193-207, 1999.
[3] O. von Stryk, "Numerical solution of optimal control problems by direct collocation," in Optimal Control (R. Bulirsch, A. Miele, J. Stoer, K. H. Well, eds.), International Series of Numerical Mathematics, Vol. 111, Birkhäuser Verlag, Basel, 1993, pp. 129-143.
[4] C. R. Hargraves and S. W. Paris, "Direct trajectory optimization using nonlinear programming and collocation," Journal of Guidance, Control, and Dynamics, Vol. 10, pp. 338-342, 1987.
[5] P. J. Enright and B. A. Conway, "Optimal finite-thrust spacecraft trajectories using collocation and nonlinear programming," Journal of Guidance, Control, and Dynamics, Vol. 14, pp. 981-985, 1991.
[6] J. F. Pantoja, "Differential Dynamic Programming and Newton's Method," International Journal of Control, Vol. 47, No. 5, pp. 1539-1553, 1988.
[7] J. C. Dunn and D. P. Bertsekas, "Efficient Dynamic Programming Implementations of Newton's Method for Unconstrained Optimal Control Problems," Journal of Optimization Theory and Applications, Vol. 63, No. 1, pp. 23-38, 1989.
[8] J. F. Pantoja and D. Q. Mayne, "Sequential Quadratic Programming Algorithm for Discrete Optimal Control Problems with Control Inequality Constraints," International Journal of Control, Vol. 53, No. 4, pp. 823-836, 1991.
[9] L. Z. Liao and C. A. Shoemaker, "Convergence in Unconstrained Discrete-Time Differential Dynamic Programming," IEEE Transactions on Automatic Control, Vol. 36, No. 6, pp. 692-706, 1991.
[10] A. Sideris and J. E. Bobrow, "An Efficient Sequential Linear Quadratic Algorithm for Solving Nonlinear Optimal Control Problems," submitted to IEEE Transactions on Automatic Control, June 2004.
[11] S. J. Wright, "Interior-Point Methods for Optimal Control of Discrete-Time Systems," Journal of Optimization Theory and Applications, Vol. 77, No. 1, pp. 161-187, 1993.
[12] C. R. Dohrmann and R. D. Robinett, "Dynamic Programming Method for Constrained Discrete-Time Optimal Control," Journal of Optimization Theory and Applications, Vol. 101, No. 2, pp. 259-283, 1999.
[13] T. Veeraklaew and S. K. Agrawal, "Neighboring optimal feedback law for higher-order dynamic systems," ASME Journal of Dynamic Systems, Measurement, and Control, Vol. 124, No. 3, pp. 492-497, 2002.
[14] R. Fletcher, Practical Methods of Optimization, 2nd edition, Wiley, 1987.
[15] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Harcourt Brace, 1981.