Ensemble Kalman Filtering for Inverse Optimal Control

Andrea Arnold and Hien Tran, Member, SIAM

Abstract—Solving the inverse optimal control problem for discrete-time nonlinear systems requires the construction of a stabilizing feedback control law based on a control Lyapunov function (CLF). However, there are few systematic approaches available for defining appropriate CLFs. We propose an approach that employs Bayesian filtering methodology to parameterize a quadratic CLF. In particular, we use the ensemble Kalman filter (EnKF) to estimate parameters used in defining the CLF within the control loop of the inverse optimal control problem formulation. Using the EnKF in this setting provides a natural link between uncertainty quantification and optimal design and control, as well as a novel and intuitive way to find the one control out of an ensemble that stabilizes the system the fastest. Results are demonstrated on both a linear and nonlinear test problem.

Index Terms—inverse optimal control, Bayesian statistics, nonlinear filtering, ensemble Kalman filter (EnKF).

I. INTRODUCTION

THE aim of nonlinear optimal control [1], [2] is to determine a control law for a given system that minimizes a cost functional relating the state and control variables. The solution to this problem relies on solving the Hamilton-Jacobi-Bellman (HJB) equation, which has been solved for linear systems [3] but is very difficult to solve for general nonlinear systems [4], [5]. An alternate approach is to find a stabilizing feedback control first, then establish that it optimizes a specified cost functional; this is known as the inverse optimal control problem.

Solving the inverse optimal control problem for discrete-time nonlinear systems requires the construction of a stabilizing feedback control law based on a control Lyapunov function (CLF). However, there are few systematic approaches available for defining appropriate CLFs. Available methods parameterize quadratic CLFs using a recursive speed-gradient algorithm [6], particle swarm optimization [7] or, more recently, the extended Kalman filter (EKF) [8].

This work develops a novel approach employing Bayesian filtering methodology to parameterize a quadratic CLF. In particular, we use the ensemble Kalman filter (EnKF) to estimate parameters used in defining the CLF within the control loop of the inverse optimal control problem. Using the EnKF in this setting provides a natural link between uncertainty quantification and optimal design and control, as well as an intuitive way to find the one control out of an ensemble that drives the system to zero the fastest.

In the Bayesian framework, unknown parameters are modeled as random variables with probability density functions representing distributions of possible values. The EnKF is a nonlinear Bayesian filter which uses ensemble statistics in combination with the classical Kalman filter equations for state and parameter estimation [9]–[11]. The EnKF has been employed in many settings, including weather prediction [12], [13] and mathematical biology [11]. To the authors' knowledge, this is the first proposed use of the EnKF in inverse optimal control problems. The novelty of using the EnKF in this setting allows us to generate an ensemble of control laws, from which we can then select the control law that drives the system to zero the fastest. While the nonlinear problem has no guarantee of a unique control, we use the control ensemble to find the best solution starting from a prior distribution of possible controls.

The paper is organized as follows. We review the main ideas behind optimal control and inverse optimal control in Section II and nonlinear Bayesian filtering and the EnKF in Section III. In Section IV, we describe the application of the EnKF to parametrizing the CLF for the inverse optimal control problem. The results in Section V demonstrate the effectiveness of the EnKF CLF procedure on both a linear and a nonlinear test example.

Manuscript submitted December 19, 2017. This work was supported in part by National Science Foundation grant number NSF RTG/DMS-1246991 (Research Training Group in Mathematical Biology at NC State).

Andrea Arnold is with the Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609, USA, e-mail: [email protected].

Hien Tran is with the Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA, e-mail: [email protected].

II. OPTIMAL AND INVERSE OPTIMAL CONTROL

In this section we describe the optimal control problem and inverse optimal control problem for discrete-time nonlinear systems using similar notation as in [8]. For details on feedback control methodology for nonlinear dynamic systems, see, e.g., [14].

Consider the discrete-time affine nonlinear system

    x_{k+1} = f(x_k) + g(x_k) u_k,   x_0 = x(0),   (1)

where x_k ∈ R^n is the state of the system at time k, u_k ∈ R^m is the control input at time k, and f : R^n → R^n and g : R^n → R^{n×m} are smooth mappings with f(0) = 0 and g(x_k) ≠ 0 for all x_k ≠ 0. The nonlinear optimal control problem is to determine a control law u_k that minimizes the associated cost functional

    V(x_k) = Σ_{n=k}^∞ ( L(x_n) + u_n^T E u_n ),   (2)

where V : R^n → R_+ has V(0) = 0, L : R^n → R_+ is positive semidefinite, and E is a real, symmetric positive definite m × m weighting matrix. The boundary condition V(0) = 0 is necessary so that V(x_k) can be used as a CLF.

The cost functional (2) can be rewritten as

    V(x_k) = L(x_k) + u_k^T E u_k + V(x_{k+1}).   (3)

For an infinite horizon control problem, the time-invariant function V*(x_k) satisfies the discrete-time Bellman equation

    V*(x_k) = min_{u_k} ( L(x_k) + u_k^T E u_k + V*(x_{k+1}) ).   (4)

Taking the gradient of (4) with respect to u_k yields the optimal control

    u_k* = −(1/2) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1},   (5)

which, when substituted into (3), yields the discrete-time Hamilton-Jacobi-Bellman (HJB) equation

    V*(x_k) = L(x_k) + V*(x_{k+1}) + (1/4) (∂V*(x_{k+1})/∂x_{k+1})^T g(x_k) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1}.   (6)

Since solving the discrete-time HJB equation (6) is very difficult for general nonlinear systems, an alternative approach is to consider the inverse optimal control problem. In inverse optimal control, the first step is to construct a stabilizing feedback control law, then to establish that the control law optimizes a given cost functional. By definition, the control law

    u_k* = −(1/2) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1}   (7)

is inverse optimal if it satisfies the following two criteria:

1) It achieves (global) exponential stability of the equilibrium point x_k = 0 for the system (1).
2) It minimizes the defined cost functional (2), for which L(x_k) = −V̄ with

    V̄ := V(x_{k+1}) − V(x_k) + u_k*^T E u_k* ≤ 0,   (8)

where V(x_k) is positive definite.

A control law satisfying the above definition can be defined using a quadratic control Lyapunov function (CLF) of the form

    V(x_k) = (1/2) x_k^T P x_k,   (9)

where the matrix P ∈ R^{n×n} is symmetric positive definite (i.e., P = P^T > 0). Once an appropriate CLF (9) has been selected, the state feedback control law (7) becomes

    u_k* = −(1/2) ( E + (1/2) g^T(x_k) P g(x_k) )^{−1} g^T(x_k) P f(x_k).   (10)

Therefore, the problem at hand is to select an appropriate matrix P to achieve stability and minimize a meaningful cost function. As noted in the introduction, currently proposed methods to estimate the entries of the matrix P in the CLF (9) include a recursive speed-gradient algorithm [6], particle swarm optimization [15] and, more recently, use of the extended Kalman filter [8]. In this work, we propose use of Bayesian filtering techniques, in particular the ensemble Kalman filter (EnKF), to estimate the entries of the matrix P from a distribution of possible values, which allows us to find the best control out of an ensemble.

III. NONLINEAR BAYESIAN FILTERING AND THE ENKF

We approach the solution to the inverse optimal control problem from the Bayesian statistical framework, using nonlinear Bayesian filtering methodology to parameterize the quadratic CLF. In the Bayesian framework, the quantities of interest (such as the system states or parameters) are treated as random variables with probability distributions, and their joint posterior density is assembled using Bayes' theorem. In particular, if x denotes the states of a system and y some partial, noisy system observations, then Bayes' theorem gives

    π(x | y) ∝ π(y | x) π(x),   (11)

where the likelihood function π(y | x) indicates how likely it is that the data y are observed if the state values were known, and the prior distribution π(x) encodes any known information on the states before taking the data into account.

Bayesian filtering methods rely on the use of discrete-time stochastic equations describing the model states and observations to sequentially update the joint posterior density. Assuming a time discretization t_k, k = 0, 1, ..., T, with the observations y_k occurring possibly in a subset of the discrete time instances (where y_k = ∅ if there is no observation at t_k), we can write an evolution-observation model for the stochastic state and parameter estimation problem using discrete-time Markov models. The state evolution equation

    X_{k+1} = F(X_k) + V_{k+1},   V_{k+1} ∼ N(0, Q_{k+1}),   (12)

where F is a known propagation model and V_{k+1} is an innovation process, computes the forward time propagation of the state variables X_k given parameters θ, while the observation equation

    Y_{k+1} = G(X_{k+1}) + W_{k+1},   W_{k+1} ∼ N(0, R_{k+1}),   (13)

where G is a known operator and W_{k+1} is the observation noise, predicts the observation at time t_{k+1} based on the current state and parameter values.

Letting D_k = { y_1, y_2, ..., y_k } denote the set of observations up to time t_k, the stochastic evolution-observation model allows us to sequentially update the posterior distribution π(x_k | D_k) using a two-step, predictor-corrector-type scheme:

    π(x_k | D_k) → π(x_{k+1} | D_k) → π(x_{k+1} | D_{k+1}).   (14)

The first step (the prediction step) employs the state evolution equation (12) to predict the values of the states at time t_{k+1} without knowledge of the data. The second step (the analysis step or observation update) then uses the observation equation (13) to correct the prediction by taking into account the data at time t_{k+1}. If there is no data observed at t_{k+1}, then D_{k+1} = D_k and the prediction density π(x_{k+1} | D_k) is equivalent to the posterior π(x_{k+1} | D_{k+1}). Starting with a prior density π(x_0 | D_0), D_0 = ∅, this updating scheme is repeated until the final posterior density is obtained when k = T.

A. Ensemble Kalman Filter

The ensemble Kalman filter [9], [10] is a Bayesian filter which uses ensemble statistics in combination with the classical Kalman filter equations to accommodate nonlinear models. While there are versions of the EnKF that perform joint state and parameter estimation [11], [16], for our purposes we need only consider the standard EnKF for state estimation, which will be adapted in the following section for the inverse optimal control problem. To avoid confusion with the states of the control system (1), here we denote the states in the filter as a_k, k = 0, ..., T, as opposed to the typical x_k notation. The EnKF algorithm for state estimation is outlined as follows.

Assume the current density π(a_k | D_k) at time t_k is represented in terms of a discrete ensemble of size N,

    S_{k|k} = { a_{k|k}^1, a_{k|k}^2, ..., a_{k|k}^N }.   (15)

In the prediction step, the states at time t_{k+1} are predicted using the state evolution equation (12) to form a state prediction ensemble,

    a_{k+1|k}^j = F(a_{k|k}^j) + v_{k+1}^j,   j = 1, ..., N,   (16)

where v_{k+1}^j ∼ N(0, Q_{k+1}) represents error in the model prediction. Ensemble statistics yield the prediction ensemble mean

    ā_{k+1|k} = (1/N) Σ_{j=1}^N a_{k+1|k}^j   (17)

and covariance matrix

    Γ_{k+1|k} = (1/(N−1)) Σ_{j=1}^N (a_{k+1|k}^j − ā_{k+1|k})(a_{k+1|k}^j − ā_{k+1|k})^T.   (18)

When an observation y_{k+1} arrives, an artificial observation ensemble is generated around the true observation, such that

    y_{k+1}^j = y_{k+1} + w_{k+1}^j,   j = 1, ..., N,   (19)

where w_{k+1}^j ∼ N(0, R_{k+1}) represents the observation error. The observation ensemble is compared to the observation model predictions,

    ŷ_{k+1}^j = G(a_{k+1|k}^j),   j = 1, ..., N,   (20)

which are computed using the observation function G as in (13). The posterior ensemble at time t_{k+1} is then computed by

    a_{k+1|k+1}^j = a_{k+1|k}^j + K_{k+1} ( y_{k+1}^j − ŷ_{k+1}^j )   (21)

for each j = 1, ..., N, where the Kalman gain is defined as

    K_{k+1} = Σ_{k+1}^{aŷ} ( Σ_{k+1}^{ŷŷ} + R_{k+1} )^{−1},   (22)

with Σ_{k+1}^{aŷ} denoting the cross covariance of the state predictions a_{k+1|k}^j and observation predictions ŷ_{k+1}^j, Σ_{k+1}^{ŷŷ} the forecast error covariance of the observation prediction ensemble, and R_{k+1} the observation noise covariance. This formulation of the Kalman gain straightforwardly allows for nonlinear observations, as opposed to the more familiar formula for linear observation models [17]. Use of the artificial observation ensemble (19) ensures that the resulting posterior ensemble in (21) does not have too low a variance [10]. The posterior means and covariances for the states are then computed using posterior ensemble statistics, and the process repeats.

IV. ENKF FOR INVERSE OPTIMAL CONTROL

To apply the EnKF to the inverse optimal control problem, we treat the entries of the symmetric positive definite matrix P defining the quadratic CLF in (9) as the states of the filter and apply the following updating procedure to find the control that drives the system to zero the fastest. At time k, assume a discrete ensemble of P matrices

    P_{k|k}^j = [ P_{1,1}^j, ..., P_{1,n}^j ; ... ; P_{1,n}^j, ..., P_{n,n}^j ] ∈ R^{n×n},   j = 1, ..., N,   (23)

where each P_{k|k}^j is symmetric positive definite. Using symmetry to our advantage, we need only update the upper triangular entries, which we place into vectors

    p_{k|k}^j = [ P_{1,1}^j, ..., P_{1,n}^j, ..., P_{n,n}^j ]^T ∈ R^{n̂},   n̂ = n(n+1)/2.   (24)

As in the prediction step of the filter, we generate a prediction ensemble

    p_{k+1|k}^j = p_{k|k}^j + v_{k+1}^j,   v_{k+1}^j ∼ N(0, Q),   (25)

where here the propagation function in equation (16) is the n̂ × n̂ identity matrix and the covariance of the innovation term v_{k+1}^j is some constant matrix Q. Prediction ensemble statistics can be computed as in (17)–(18); however, they are not needed for the remaining computations.

Reformulating the prediction ensemble vectors {p_{k+1|k}^j}_{j=1}^N into matrices {P_{k+1|k}^j}_{j=1}^N, we can compute the corresponding predicted controls, states, and root mean square error (RMSE) values for each ensemble member using the following formulas. The predicted controls are given by

    u_{k+1|k}^j = −(1/2) ( E + (1/2) g^T(x_{k|k}^j) P_{k+1|k}^j g(x_{k|k}^j) )^{−1} g^T(x_{k|k}^j) P_{k+1|k}^j f(x_{k|k}^j),   (26)

as in (10) for each j = 1, ..., N, which are then used to generate the state prediction ensemble

    x_{k+1|k}^j = f(x_{k|k}^j) + g(x_{k|k}^j) u_{k+1|k}^j,   j = 1, ..., N,   (27)

as in (1).

For the analysis step of the filter, we interpret as "observations" the RMSE values of the states as we drive them to zero. Since the aim is to find a control that drives the RMSE to zero, we treat RMSE = 0 as the true "observation" and generate an observation ensemble using the prescribed observation noise covariance matrix R as follows:

    RMSE_obs^j = w_{k+1}^j,   w_{k+1}^j ∼ N(0, R).   (28)

We then compare the "observed" RMSEs to the RMSEs of the predicted states, given by

    RMSE_{k+1|k}^j = sqrt( ( (x_{k+1|k}^j)_1^2 + (x_{k+1|k}^j)_2^2 + ... + (x_{k+1|k}^j)_n^2 ) / n ).   (29)
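The per-member computations (26), (27), and (29) reduce to a few lines when the control is scalar (m = 1). The following sketch writes them out for a two-state system; for concreteness it borrows the dynamics and weight E from the linear test system (31) of Section V, and all helper names are ours, not the paper's.

```python
import math

# Hypothetical two-state, single-input test system (values taken from the
# linear example (31)-(32); any concrete f, g, E could be substituted).
def f(x):
    return [0.9974 * x[0] + 0.0539 * x[1], -0.1078 * x[0] + 1.1591 * x[1]]

def g(x):
    return [0.0013, 0.0539]  # constant input map g(x) in R^{2x1}

E = 0.05  # scalar control weight (m = 1)

def control(p, x):
    """Predicted control (26): u = -1/2 (E + 1/2 g'Pg)^{-1} g'P f(x),
    with P rebuilt from its upper-triangular entries p = [P11, P12, P22]."""
    P = [[p[0], p[1]], [p[1], p[2]]]
    gx, fx = g(x), f(x)
    gP = [gx[0] * P[0][0] + gx[1] * P[1][0],   # g'P, a 1x2 row vector
          gx[0] * P[0][1] + gx[1] * P[1][1]]
    gPg = gP[0] * gx[0] + gP[1] * gx[1]        # g'Pg (scalar)
    gPf = gP[0] * fx[0] + gP[1] * fx[1]        # g'P f(x) (scalar)
    return -0.5 * gPf / (E + 0.5 * gPg)

def step(x, u):
    """State prediction (27): x_{k+1} = f(x) + g(x) u."""
    fx, gx = f(x), g(x)
    return [fx[0] + gx[0] * u, fx[1] + gx[1] * u]

def rmse(x):
    """RMSE of a state vector as in (29)."""
    return math.sqrt(sum(xi * xi for xi in x) / len(x))
```

In the filter these three helpers are evaluated once per ensemble member j, using that member's matrix P and state trajectory.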

We then compute the posterior ensemble as in (21) using

    p_{k+1|k+1}^j = p_{k+1|k}^j + K_{k+1} ( RMSE_obs^j − RMSE_{k+1|k}^j ),   (30)

where the Kalman gain is defined as in (22), with Σ_{k+1}^{aŷ} denoting the cross covariance of the predictions p_{k+1|k}^j and RMSE predictions RMSE_{k+1|k}^j, Σ_{k+1}^{ŷŷ} the forecast error covariance of the RMSE prediction ensemble, and R the observation noise covariance. Posterior control law, state, and RMSE ensembles can be computed after reformulating the posterior ensemble of entry vectors p_{k+1|k+1}^j into their corresponding matrices, and ensemble statistics can be computed.

This process is repeated for each successive time step until an appropriate control is found, based on some prescribed stopping criterion. In particular, if we want to find the control that drives the system to zero the fastest, we can stop when the minimum RMSE of all ensemble members is less than some prescribed tolerance.

Fig. 1. The states x1 and x2, control u, and RMSE for the linear system (31) corresponding to the CLF with P matrix (33) estimated using the EnKF CLF procedure.
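Because the "observation" here is the scalar RMSE, the generic EnKF analysis step of Section III-A, eqs (19)–(22), becomes especially compact: the gain (22) is just a vector divided by a scalar. A minimal sketch of that generic step (the function name, toy model, and noise levels are our own, not from the paper):

```python
import random

def enkf_analysis(ensemble, y, G, R, rng):
    """One EnKF analysis step, eqs (19)-(22), for a scalar observation y.
    ensemble: list of state vectors a^j (the prediction ensemble from (16)).
    G: observation operator mapping a state vector to a scalar, as in (13).
    R: scalar observation-noise variance."""
    N, n = len(ensemble), len(ensemble[0])
    yhat = [G(a) for a in ensemble]                    # predictions, eq (20)
    ybar = sum(yhat) / N
    abar = [sum(a[i] for a in ensemble) / N for i in range(n)]
    # Cross covariance Sigma^{a,yhat} (an n-vector, since y is scalar) and
    # forecast variance Sigma^{yhat,yhat} (a scalar), as used in eq (22)
    cross = [sum((a[i] - abar[i]) * (yh - ybar)
                 for a, yh in zip(ensemble, yhat)) / (N - 1) for i in range(n)]
    s_yy = sum((yh - ybar) ** 2 for yh in yhat) / (N - 1)
    K = [c / (s_yy + R) for c in cross]                # Kalman gain, eq (22)
    # Artificial observation ensemble around y, eq (19), and update, eq (21)
    posterior = []
    for a, yh in zip(ensemble, yhat):
        yj = y + rng.gauss(0.0, R ** 0.5)
        posterior.append([a[i] + K[i] * (yj - yh) for i in range(n)])
    return posterior

rng = random.Random(0)
# Toy check: two-component states observed through their first component only
prior = [[rng.gauss(1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(500)]
post = enkf_analysis(prior, 2.0, lambda a: a[0], 0.01, rng)
```

With a low-noise observation y = 2.0 of the first component, the posterior ensemble mean of that component moves from about 1 toward 2, since the gain in (22) is close to one when R is small relative to the forecast variance.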

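Putting the prediction and analysis steps together, the whole EnKF CLF iteration of eqs (25)–(30) can be sketched as below. This is an illustrative reimplementation on the linear test system (31), not the authors' code: the ensemble size, noise levels, step count, and seed are arbitrary choices, and the sketch does not enforce positive definiteness of each member's P matrix as the paper assumes.

```python
import math
import random

# Linear test system (31) and weight E from (32); Q_SD and R are our choices.
A = [[0.9974, 0.0539], [-0.1078, 1.1591]]
B = [0.0013, 0.0539]
E = 0.05        # control weight
Q_SD = 0.01     # std dev of the random-walk innovation in (25)
R = 1e-3        # "observation" noise variance on the RMSE, eq (28)

def f(x):
    return [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]

def control(p, x):
    # eq (26): u = -1/2 (E + 1/2 B'PB)^{-1} B'P f(x), P from p = [P11, P12, P22]
    P = [[p[0], p[1]], [p[1], p[2]]]
    BP = [B[0]*P[0][0] + B[1]*P[1][0], B[0]*P[0][1] + B[1]*P[1][1]]
    fx = f(x)
    return -0.5 * (BP[0]*fx[0] + BP[1]*fx[1]) / (E + 0.5*(BP[0]*B[0] + BP[1]*B[1]))

def rmse(x):
    return math.sqrt((x[0]**2 + x[1]**2) / 2)   # eq (29), n = 2

def enkf_clf(x0, N=200, steps=50, tol=1e-3, seed=1):
    rng = random.Random(seed)
    ps = [[rng.uniform(0.05, 0.2) for _ in range(3)] for _ in range(N)]  # prior
    xs = [list(x0) for _ in range(N)]      # per-member state trajectories
    best = None
    for _ in range(steps):
        # prediction (25): random walk on the upper-triangular entries of P
        ps = [[pi + rng.gauss(0.0, Q_SD) for pi in p] for p in ps]
        # predicted controls (26), states (27), and RMSEs (29)
        us = [control(p, x) for p, x in zip(ps, xs)]
        xs = [[f(x)[0] + B[0]*u, f(x)[1] + B[1]*u] for x, u in zip(xs, us)]
        rs = [rmse(x) for x in xs]
        # analysis (28)-(30): treat RMSE = 0 as the true "observation"
        rbar = sum(rs) / N
        pbar = [sum(p[i] for p in ps) / N for i in range(3)]
        s_yy = sum((r - rbar)**2 for r in rs) / (N - 1)
        K = [sum((p[i] - pbar[i])*(r - rbar) for p, r in zip(ps, rs)) / (N - 1)
             / (s_yy + R) for i in range(3)]
        ps = [[p[i] + K[i]*(rng.gauss(0.0, R**0.5) - r) for i in range(3)]
              for p, r in zip(ps, rs)]
        j = min(range(N), key=lambda i: rs[i])
        best = (ps[j], rs[j])
        if rs[j] < tol:        # stopping criterion on the minimum RMSE
            break
    return best

p_best, r_best = enkf_clf([2.0, 1.0])
```

The returned pair holds the upper-triangular entries of the best member's P and its RMSE; whether and how fast the loop reaches the tolerance depends on the noise levels and prior range, which is exactly the tuning reported in Section V.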
V. RESULTS

We demonstrate the effectiveness of the proposed methodology on two illustrative examples, one involving a linear system and the other a nonlinear system.

A. Example: Linear System

Consider the discrete-time linear system

    x_{k+1} = [ 0.9974, 0.0539 ; −0.1078, 1.1591 ] x_k + [ 0.0013 ; 0.0539 ] u_k,   (31)

with initial point x_0 = [2, 1], where the goal is to minimize the performance measure

    J = (1/2) Σ_{k=0}^{N−1} ( 0.25 (x_1)_k^2 + 0.05 (x_2)_k^2 + 0.05 u_k^2 ),

as described in [18]. We set up the EnKF CLF estimator by letting

    E = 0.05,   Q = q_0 I_2,   R = r_0,   (32)

where q_0 = 1 × 10^{−4} and r_0 = 1 × 10^{−3}. We generate a uniform prior ensemble of N = 1000 members on the upper-triangular entries of the P matrix with minimum value 0.05 and maximum value 0.2. We set the stopping criterion such that the filter stops when min(RMSE) < 1 × 10^{−3}, to find the control that drives the system to zero the fastest. After 103 steps, the procedure results in the following matrix P defining the CLF:

    P = [ 1.2429, 1.7809 ; 1.7809, 2.7101 ].   (33)

Fig. 1 shows the corresponding states x1 and x2, control u, and RMSE. After an initial transient, the control defined by this resulting CLF matches the expected control for the linear system (31) [18].

B. Example: Nonlinear System

Consider the discrete-time nonlinear system

    x_{k+1} = f(x_k) + g(x_k) u_k,   x_0 = [ 2.5 ; −1 ] ∈ R^2,   (34)

where

    f(x_k) = [ 2 x_{1,k} sin(0.5 x_{1,k}) + 0.1 x_{2,k}^2 ; 0.1 x_{1,k}^2 + 1.8 x_{2,k} ]   (35)

and

    g(x_k) = [ 0 ; 2 + 0.1 cos(x_{2,k}) ].   (36)

This system is also considered in [8]. Here we take

    E = 1,   Q = q_0 I_2,   R = r_0,   (37)

with q_0 = 1 × 10^{−2} and r_0 = 1 × 10^{−3}, and we generate a uniform prior ensemble of size N = 1000 on the upper-triangular entries of P with minimum value 0.001 and maximum value 0.1. We stop the algorithm when min(RMSE) < 1 × 10^{−3}. After 9 steps, the resulting matrix P defining the CLF that drives the system to zero the fastest is given by

    P = [ 0.2089, 0.2300 ; 0.2300, 0.2604 ].   (38)

Fig. 2 shows the corresponding states x1 and x2, control u, and RMSE.

Fig. 2. The states x1 and x2, control u, and RMSE for the nonlinear system (34) corresponding to the CLF with P matrix (38) estimated using the EnKF CLF procedure.

VI. CONCLUSION

In this work we present a novel approach using nonlinear Bayesian filtering, in particular the EnKF, to parameterize a quadratic CLF for inverse optimal control. We demonstrate the effectiveness of using the EnKF to estimate the upper-triangular entries of the symmetric positive definite matrix P in (10) on both a linear and a nonlinear example. Since the nonlinear problem does not guarantee a unique solution, the ensemble formulation allows us to find the control that stabilizes the system the fastest (i.e., using the fewest steps).

While the EnKF was our filter of choice in this work, the algorithm can be straightforwardly modified to use other filtering schemes (such as particle filters) instead. Future work may study if the choice of filter affects the number of iterations needed to find a CLF matching the prescribed criteria. We also plan to apply the EnKF CLF method to an application relating to HIV drug therapy.

ACKNOWLEDGMENT

The authors would like to thank the Research Training Group (RTG) in Mathematical Biology at NC State.

REFERENCES

[1] L. D. Berkovitz and N. G. Medhin, Nonlinear Optimal Control Theory. CRC Press, 2013.
[2] T. L. Vincent and W. J. Grantham, Nonlinear and Optimal Control Systems. New York: John Wiley & Sons, Inc., 1997.
[3] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[4] R. Sepulchre, M. Jankovic, and P. V. Kokotovic, Constructive Nonlinear Control. Berlin: Springer-Verlag, 1997.
[5] M. Krstic and H. Deng, Stabilization of Nonlinear Uncertain Systems. Berlin: Springer-Verlag, 1998.
[6] F. Ornelas-Tellez, E. N. Sanchez, A. G. Loukianov, and E. M. Navarro-Lopez, "Speed-gradient inverse optimal control for discrete-time nonlinear systems," in Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, 2011, pp. 290–295.
[7] R. Ruiz-Cruz, E. N. Sanchez, F. Ornelas-Tellez, A. G. Loukianov, and R. G. Harley, "Particle swarm optimization for discrete-time inverse optimal control of a doubly fed induction generator," IEEE Trans Cybernetics, vol. 43, no. 6, pp. 1698–1709, 2013.
[8] M. Almobaied, I. Eksin, and M. Guzelkaya, "Inverse optimal controller based on extended Kalman filter for discrete-time nonlinear systems," Optim Control Appl Meth, vol. 0, pp. 1–16, https://doi.org/10.1002/oca.2331, 2017.
[9] G. Evensen, "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics," J Geophys Res, vol. 99, no. C5, pp. 10143–10162, 1994.
[10] G. Burgers, P. J. van Leeuwen, and G. Evensen, "Analysis scheme in the ensemble Kalman filter," Mon Weather Rev, vol. 126, no. 6, pp. 1719–1724, 1998.
[11] A. Arnold, D. Calvetti, and E. Somersalo, "Parameter estimation for stiff deterministic dynamical systems via ensemble Kalman filter," Inverse Problems, vol. 30, no. 10, p. 105008, 2014.
[12] A. J. Majda, J. Harlim, and B. Gershgorin, "Mathematical strategies for filtering turbulent dynamical systems," Discrete and Continuous Dynamical Systems, vol. 27, pp. 441–486, 2010.
[13] P. L. Houtekamer, H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, "Atmospheric data assimilation with an ensemble Kalman filter: results with real observations," Mon Weather Rev, vol. 133, pp. 604–620, 2005.
[14] S. C. Beeler, H. Tran, and H. T. Banks, "Feedback control methodologies for nonlinear systems," J Optim Theory Appl, vol. 107, no. 1, pp. 1–33, 2000.
[15] F. Ornelas-Tellez, E. N. Sanchez, A. G. Loukianov, and J. J. Rico, "Robust inverse optimal control for discrete-time nonlinear system stabilization," European Journal of Control, vol. 20, pp. 38–44, 2014.
[16] G. Evensen, "The ensemble Kalman filter for combined state and parameter estimation," IEEE Control Syst Mag, vol. 29, no. 3, pp. 83–104, 2009.
[17] H. Moradkhani, S. Sorooshian, H. V. Gupta, and P. R. Houser, "Dual state-parameter estimation of hydrological models using ensemble Kalman filter," Adv Water Resour, vol. 28, no. 2, pp. 135–147, 2005.
[18] D. E. Kirk, Optimal Control Theory: An Introduction, ser. Electrical Engineering, Networks, R. W. Newcomb, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1970.