Ensemble Kalman Filtering for Inverse Optimal Control

Andrea Arnold and Hien Tran, Member, SIAM

Abstract—Solving the inverse optimal control problem for discrete-time nonlinear systems requires the construction of a stabilizing feedback control law based on a control Lyapunov function (CLF). However, there are few systematic approaches available for defining appropriate CLFs. We propose an approach that employs Bayesian filtering methodology to parameterize a quadratic CLF. In particular, we use the ensemble Kalman filter (EnKF) to estimate parameters used in defining the CLF within the control loop of the inverse optimal control problem formulation. Using the EnKF in this setting provides a natural link between uncertainty quantification and optimal design and control, as well as a novel and intuitive way to find the one control out of an ensemble that stabilizes the system the fastest. Results are demonstrated on both a linear and nonlinear test problem.

Index Terms—inverse optimal control, Bayesian statistics, nonlinear filtering, ensemble Kalman filter (EnKF).

I. INTRODUCTION

THE aim of nonlinear optimal control [1], [2] is to determine a control law for a given system that minimizes a cost functional relating the state and control variables. The solution to this problem relies on solving the Hamilton-Jacobi-Bellman (HJB) equation, which has been solved for linear systems [3] but is very difficult to solve for general nonlinear systems [4], [5]. An alternate approach is to find a stabilizing feedback control first, then establish that it optimizes a specified cost functional; this is known as the inverse optimal control problem.

Solving the inverse optimal control problem for discrete-time nonlinear systems requires the construction of a stabilizing feedback control law based on a control Lyapunov function (CLF). However, there are few systematic approaches available for defining appropriate CLFs. Available methods parameterize quadratic CLFs using a recursive speed-gradient algorithm [6], particle swarm optimization [7] or, more recently, the extended Kalman filter (EKF) [8].

This work develops a novel approach employing Bayesian filtering methodology to parameterize a quadratic CLF. In particular, we use the ensemble Kalman filter (EnKF) to estimate parameters used in defining the CLF within the control loop of the inverse optimal control problem. Using the EnKF in this setting provides a natural link between uncertainty quantification and optimal design and control, as well as an intuitive way to find the one control out of an ensemble that drives the system to zero the fastest.

In the Bayesian framework, unknown parameters are modeled as random variables with probability density functions representing distributions of possible values. The EnKF is a nonlinear Bayesian filter which uses ensemble statistics in combination with the classical Kalman filter equations for state and parameter estimation [9]–[11]. The EnKF has been employed in many settings, including weather prediction [12], [13] and mathematical biology [11]. To the authors' knowledge, this is the first proposed use of the EnKF in inverse optimal control problems. The novelty of using the EnKF in this setting allows us to generate an ensemble of control laws, from which we can then select the control law that drives the system to zero the fastest. While the nonlinear problem has no guarantee of a unique control, we use the control ensemble to find the best solution starting from a prior distribution of possible controls.

The paper is organized as follows. We review the main ideas behind optimal control and inverse optimal control in Section II and nonlinear Bayesian filtering and the EnKF in Section III. In Section IV, we describe the application of the EnKF to parametrizing the CLF for the inverse optimal control problem. The results in Section V demonstrate the effectiveness of the EnKF CLF procedure on both a linear and a nonlinear test example.

Manuscript submitted December 19, 2017. This work was supported in part by National Science Foundation grant number NSF RTG/DMS-1246991 (Research Training Group in Mathematical Biology at NC State).

Andrea Arnold is with the Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609, USA, e-mail: [email protected].

Hien Tran is with the Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA, e-mail: [email protected].

II. OPTIMAL AND INVERSE OPTIMAL CONTROL

In this section we describe the optimal control problem and inverse optimal control problem for discrete-time nonlinear systems using similar notation as in [8]. For details on feedback control methodology for nonlinear dynamic systems, see, e.g., [14].

Consider the discrete-time affine nonlinear system

    x_{k+1} = f(x_k) + g(x_k) u_k,   x_0 = x(0),   (1)

where x_k ∈ R^n is the state of the system at time k, u_k ∈ R^m is the control input at time k, and f : R^n → R^n and g : R^n → R^{n×m} are smooth mappings with f(0) = 0 and g(x_k) ≠ 0 for all x_k ≠ 0. The nonlinear optimal control problem is to determine a control law u_k that minimizes the associated cost functional

    V(x_k) = Σ_{n=k}^∞ ( L(x_n) + u_n^T E u_n ),   (2)

where V : R^n → R_+ has V(0) = 0, L : R^n → R_+ is positive semidefinite, and E is a real, symmetric positive definite m × m weighting matrix. The boundary condition V(0) = 0 is necessary so that V(x_k) can be used as a CLF.

The cost functional (2) can be rewritten as

    V(x_k) = L(x_k) + u_k^T E u_k + V(x_{k+1}).   (3)

For an infinite horizon control problem, the time-invariant function V*(x_k) satisfies the discrete-time Bellman equation

    V*(x_k) = min_{u_k} ( L(x_k) + u_k^T E u_k + V*(x_{k+1}) ).   (4)

Taking the gradient of (4) with respect to u_k yields the optimal control

    u_k* = −(1/2) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1},   (5)

which, when substituted into (3), yields the discrete-time Hamilton-Jacobi-Bellman (HJB) equation

    V*(x_k) = L(x_k) + V*(x_{k+1}) + (1/4) (∂V*(x_{k+1})/∂x_{k+1})^T g(x_k) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1}.   (6)

Since solving the discrete-time HJB equation (6) is very difficult for general nonlinear systems, an alternative approach is to consider the inverse optimal control problem. In inverse optimal control, the first step is to construct a stabilizing feedback control law, then to establish that the control law optimizes a given cost functional. By definition, the control law

    u_k* = −(1/2) E^{−1} g^T(x_k) ∂V*(x_{k+1})/∂x_{k+1}   (7)

is inverse optimal if it satisfies the following two criteria:

1) It achieves (global) exponential stability of the equilibrium point x_k = 0 for the system (1).
2) It minimizes the defined cost functional (2), for which L(x_k) = −V̄ with

    V̄ := V(x_{k+1}) − V(x_k) + u_k*^T E u_k* ≤ 0,   (8)

where V(x_k) is positive definite.

A control law satisfying the above definition can be defined using a quadratic control Lyapunov function (CLF) of the form

    V(x_k) = (1/2) x_k^T P x_k,   (9)

where the matrix P ∈ R^{n×n} is symmetric positive definite (i.e., P = P^T > 0). Once an appropriate CLF (9) has been selected, the state feedback control law (7) becomes

    u_k* = −(1/2) ( E + (1/2) g^T(x_k) P g(x_k) )^{−1} g^T(x_k) P f(x_k).   (10)

Therefore, the problem at hand is to select an appropriate matrix P to achieve stability and minimize a meaningful cost function. As noted in the introduction, currently proposed methods to estimate the entries of the matrix P in the CLF (9) include a recursive speed-gradient algorithm [6], particle swarm optimization [15] and, more recently, use of the extended Kalman filter [8]. In this work, we propose use of Bayesian filtering techniques, in particular the ensemble Kalman filter (EnKF), to estimate the entries of the matrix P from a distribution of possible values, which allows us to find the best control out of an ensemble.

III. NONLINEAR BAYESIAN FILTERING AND THE ENKF

We approach the solution to the inverse optimal control problem from the Bayesian statistical framework, using nonlinear Bayesian filtering methodology to parameterize the quadratic CLF. In the Bayesian framework, the quantities of interest (such as the system states or parameters) are treated as random variables with probability distributions, and their joint posterior density is assembled using Bayes' theorem. In particular, if x denotes the states of a system and y some partial, noisy system observations, then Bayes' theorem gives

    π(x | y) ∝ π(y | x) π(x),   (11)

where the likelihood function π(y | x) indicates how likely it is that the data y are observed if the state values were known, and the prior distribution π(x) encodes any known information on the states before taking the data into account.

Bayesian filtering methods rely on the use of discrete-time stochastic equations describing the model states and observations to sequentially update the joint posterior density. Assuming a time discretization t_k, k = 0, 1, ..., T, with the observations y_k occurring possibly in a subset of the discrete time instances (where y_k = ∅ if there is no observation at t_k), we can write an evolution-observation model for the stochastic state and parameter estimation problem using discrete-time Markov models. The state evolution equation

    X_{k+1} = F(X_k) + V_{k+1},   V_{k+1} ∼ N(0, Q_{k+1}),   (12)

where F is a known propagation model and V_{k+1} is an innovation process, computes the forward time propagation of the state variables X_k given parameters θ, while the observation equation

    Y_{k+1} = G(X_{k+1}) + W_{k+1},   W_{k+1} ∼ N(0, R_{k+1}),   (13)

where G is a known operator and W_{k+1} is the observation noise, predicts the observation at time t_{k+1} based on the current state and parameter values.

Letting D_k = { y_1, y_2, ..., y_k } denote the set of observations up to time t_k, the stochastic evolution-observation model allows us to sequentially update the posterior distribution π(x_k | D_k) using a two-step, predictor-corrector-type scheme:

    π(x_k | D_k) → π(x_{k+1} | D_k) → π(x_{k+1} | D_{k+1}).   (14)

The first step (the prediction step) employs the state evolution equation (12) to predict the values of the states at time t_{k+1} without knowledge of the data. The second step (the analysis step or observation update) then uses the observation equation (13) to correct the prediction by taking into account the data at time t_{k+1}. If there is no data observed at t_{k+1}, then D_{k+1} = D_k and the prediction density π(x_{k+1} | D_k) is equivalent to the posterior π(x_{k+1} | D_{k+1}). Starting with a prior density π(x_0 | D_0), D_0 = ∅, this updating scheme is repeated until the final posterior density is obtained when k = T.

A. Ensemble Kalman Filter

The ensemble Kalman filter [9], [10] is a Bayesian filter which uses ensemble statistics in combination with the classical Kalman filter equations to accommodate nonlinear models. While there are versions of the EnKF that perform joint state and parameter estimation [11], [16], for our purposes we need only consider the standard EnKF for state estimation, which will be adapted in the following section for the inverse optimal control problem. To avoid confusion with the states of the control system (1), here we denote the states in the filter as a_k, k = 0, ..., T, as opposed to the typical x_k notation. The EnKF algorithm for state estimation is outlined as follows.

Assume the current density π(a_k | D_k) at time t_k is represented in terms of a discrete ensemble of size N,

    S_{k|k} = { a_{k|k}^1, a_{k|k}^2, ..., a_{k|k}^N }.   (15)

In the prediction step, the states at time t_{k+1} are predicted using the state evolution equation (12) to form a state prediction ensemble,

    a_{k+1|k}^j = F(a_{k|k}^j) + v_{k+1}^j,   j = 1, ..., N,   (16)

where v_{k+1}^j ∼ N(0, Q_{k+1}) represents error in the model prediction. Ensemble statistics yield the prediction ensemble mean

    ā_{k+1|k} = (1/N) Σ_{j=1}^N a_{k+1|k}^j   (17)

and covariance matrix

    Γ_{k+1|k} = (1/(N−1)) Σ_{j=1}^N (a_{k+1|k}^j − ā_{k+1|k})(a_{k+1|k}^j − ā_{k+1|k})^T.   (18)

When an observation y_{k+1} arrives, an artificial observation ensemble is generated around the true observation, such that

    y_{k+1}^j = y_{k+1} + w_{k+1}^j,   j = 1, ..., N,   (19)

where w_{k+1}^j ∼ N(0, R_{k+1}) represents the observation error. The observation ensemble is compared to the observation model predictions,

    ŷ_{k+1}^j = G(a_{k+1|k}^j),   j = 1, ..., N,   (20)

which are computed using the observation function G as in (13). The posterior ensemble at time t_{k+1} is then computed by

    a_{k+1|k+1}^j = a_{k+1|k}^j + K_{k+1} ( y_{k+1}^j − ŷ_{k+1}^j )   (21)

for each j = 1, ..., N, where the Kalman gain is defined as

    K_{k+1} = Σ_{k+1}^{aŷ} ( Σ_{k+1}^{ŷŷ} + R_{k+1} )^{−1},   (22)

with Σ_{k+1}^{aŷ} denoting the cross covariance of the state predictions a_{k+1|k}^j and observation predictions ŷ_{k+1}^j, Σ_{k+1}^{ŷŷ} the forecast error covariance of the observation prediction ensemble, and R_{k+1} the observation noise covariance. This formulation of the Kalman gain straightforwardly allows for nonlinear observations, as opposed to the more familiar formula for linear observation models [17]. Use of the artificial observation ensemble (19) ensures that the resulting posterior ensemble in (21) does not have too low a variance [10]. The posterior means and covariances for the states are then computed using posterior ensemble statistics, and the process repeats.

IV. ENKF FOR INVERSE OPTIMAL CONTROL

To apply the EnKF to the inverse optimal control problem, we treat the entries of the symmetric positive definite matrix P defining the quadratic CLF in (9) as the states of the filter and apply the following updating procedure to find the control that drives the system to zero the fastest. At time k, assume a discrete ensemble of P matrices

    P_{k|k}^j = [ P_{1,1}^j, ..., P_{1,n}^j ; ... ; P_{1,n}^j, ..., P_{n,n}^j ] ∈ R^{n×n},   j = 1, ..., N,   (23)

where each P_{k|k}^j is symmetric positive definite. Using symmetry to our advantage, we need only update the upper triangular entries, which we place into vectors

    p_{k|k}^j = [ P_{1,1}^j, ..., P_{1,n}^j, ..., P_{n,n}^j ]^T ∈ R^{n̂},   n̂ = n(n+1)/2.   (24)

As in the prediction step of the filter, we generate a prediction ensemble

    p_{k+1|k}^j = p_{k|k}^j + v_{k+1}^j,   v_{k+1}^j ∼ N(0, Q),   (25)

where here the propagation function in equation (16) is the n̂ × n̂ identity matrix and the covariance of the innovation term v_{k+1}^j is some constant matrix Q. Prediction ensemble statistics can be computed as in (17)–(18); however, they are not needed for the remaining computations.

Reformulating the prediction ensemble vectors {p_{k+1|k}^j}_{j=1}^N into matrices {P_{k+1|k}^j}_{j=1}^N, we can compute the corresponding predicted controls, states, and root mean square error (RMSE) values for each ensemble member using the following formulas. The predicted controls are given by

    u_{k+1|k}^j = −(1/2) ( E + (1/2) g^T(x_{k|k}^j) P_{k+1|k}^j g(x_{k|k}^j) )^{−1} g^T(x_{k|k}^j) P_{k+1|k}^j f(x_{k|k}^j),   (26)

as in (10) for each j = 1, ..., N, which are then used to generate the state prediction ensemble

    x_{k+1|k}^j = f(x_{k|k}^j) + g(x_{k|k}^j) u_{k+1|k}^j,   j = 1, ..., N,   (27)

as in (1).

For the analysis step of the filter, we interpret as "observations" the RMSE values of the states as we drive them to zero. Since the aim is to find a control that drives the RMSE to zero, we treat RMSE = 0 as the true "observation" and generate an observation ensemble using the prescribed observation noise covariance matrix R as follows:

    RMSE_obs^j = w_{k+1}^j,   w_{k+1}^j ∼ N(0, R).   (28)

We then compare the "observed" RMSEs to the RMSEs of the predicted states, given by

    RMSE_{k+1|k}^j = sqrt( ( (x_{k+1|k}^j)_1^2 + (x_{k+1|k}^j)_2^2 + ... + (x_{k+1|k}^j)_n^2 ) / n ).   (29)
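The per-member computations (26), (27), and (29) reduce to a few lines when the control is scalar (m = 1). The following sketch writes them out for a two-state system; for concreteness it borrows the dynamics and weight E from the linear test system (31) of Section V, and all helper names are ours, not the paper's.

```python
import math

# Hypothetical two-state, single-input test system (values taken from the
# linear example (31)-(32); any concrete f, g, E could be substituted).
def f(x):
    return [0.9974 * x[0] + 0.0539 * x[1], -0.1078 * x[0] + 1.1591 * x[1]]

def g(x):
    return [0.0013, 0.0539]  # constant input map g(x) in R^{2x1}

E = 0.05  # scalar control weight (m = 1)

def control(p, x):
    """Predicted control (26): u = -1/2 (E + 1/2 g'Pg)^{-1} g'P f(x),
    with P rebuilt from its upper-triangular entries p = [P11, P12, P22]."""
    P = [[p[0], p[1]], [p[1], p[2]]]
    gx, fx = g(x), f(x)
    gP = [gx[0] * P[0][0] + gx[1] * P[1][0],   # g'P, a 1x2 row vector
          gx[0] * P[0][1] + gx[1] * P[1][1]]
    gPg = gP[0] * gx[0] + gP[1] * gx[1]        # g'Pg (scalar)
    gPf = gP[0] * fx[0] + gP[1] * fx[1]        # g'P f(x) (scalar)
    return -0.5 * gPf / (E + 0.5 * gPg)

def step(x, u):
    """State prediction (27): x_{k+1} = f(x) + g(x) u."""
    fx, gx = f(x), g(x)
    return [fx[0] + gx[0] * u, fx[1] + gx[1] * u]

def rmse(x):
    """RMSE of a state vector as in (29)."""
    return math.sqrt(sum(xi * xi for xi in x) / len(x))
```

In the filter these three helpers are evaluated once per ensemble member j, using that member's matrix P and state trajectory.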

We then compute the posterior ensemble as in (21) using

    p_{k+1|k+1}^j = p_{k+1|k}^j + K_{k+1} ( RMSE_obs^j − RMSE_{k+1|k}^j ),   (30)

where the Kalman gain is defined as in (22), with Σ_{k+1}^{aŷ} denoting the cross covariance of the predictions p_{k+1|k}^j and RMSE predictions RMSE_{k+1|k}^j, Σ_{k+1}^{ŷŷ} the forecast error covariance of the RMSE prediction ensemble, and R the observation noise covariance. Posterior control law, state, and RMSE ensembles can be computed after reformulating the posterior ensemble of entry vectors p_{k+1|k+1}^j into their corresponding matrices, and ensemble statistics can be computed.

This process is repeated for each successive time step until an appropriate control is found, based on some prescribed stopping criterion. In particular, if we want to find the control that drives the system to zero the fastest, we can stop when the minimum RMSE of all ensemble members is less than some prescribed tolerance.

Fig. 1. The states x1 and x2, control u, and RMSE for the linear system (31) corresponding to the CLF with P matrix (33) estimated using the EnKF CLF procedure.
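Because the "observation" here is the scalar RMSE, the generic EnKF analysis step of Section III-A, eqs (19)–(22), becomes especially compact: the gain (22) is just a vector divided by a scalar. A minimal sketch of that generic step (the function name, toy model, and noise levels are our own, not from the paper):

```python
import random

def enkf_analysis(ensemble, y, G, R, rng):
    """One EnKF analysis step, eqs (19)-(22), for a scalar observation y.
    ensemble: list of state vectors a^j (the prediction ensemble from (16)).
    G: observation operator mapping a state vector to a scalar, as in (13).
    R: scalar observation-noise variance."""
    N, n = len(ensemble), len(ensemble[0])
    yhat = [G(a) for a in ensemble]                    # predictions, eq (20)
    ybar = sum(yhat) / N
    abar = [sum(a[i] for a in ensemble) / N for i in range(n)]
    # Cross covariance Sigma^{a,yhat} (an n-vector, since y is scalar) and
    # forecast variance Sigma^{yhat,yhat} (a scalar), as used in eq (22)
    cross = [sum((a[i] - abar[i]) * (yh - ybar)
                 for a, yh in zip(ensemble, yhat)) / (N - 1) for i in range(n)]
    s_yy = sum((yh - ybar) ** 2 for yh in yhat) / (N - 1)
    K = [c / (s_yy + R) for c in cross]                # Kalman gain, eq (22)
    # Artificial observation ensemble around y, eq (19), and update, eq (21)
    posterior = []
    for a, yh in zip(ensemble, yhat):
        yj = y + rng.gauss(0.0, R ** 0.5)
        posterior.append([a[i] + K[i] * (yj - yh) for i in range(n)])
    return posterior

rng = random.Random(0)
# Toy check: two-component states observed through their first component only
prior = [[rng.gauss(1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(500)]
post = enkf_analysis(prior, 2.0, lambda a: a[0], 0.01, rng)
```

With a low-noise observation y = 2.0 of the first component, the posterior ensemble mean of that component moves from about 1 toward 2, since the gain in (22) is close to one when R is small relative to the forecast variance.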

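Putting the prediction and analysis steps together, the whole EnKF CLF iteration of eqs (25)–(30) can be sketched as below. This is an illustrative reimplementation on the linear test system (31), not the authors' code: the ensemble size, noise levels, step count, and seed are arbitrary choices, and the sketch does not enforce positive definiteness of each member's P matrix as the paper assumes.

```python
import math
import random

# Linear test system (31) and weight E from (32); Q_SD and R are our choices.
A = [[0.9974, 0.0539], [-0.1078, 1.1591]]
B = [0.0013, 0.0539]
E = 0.05        # control weight
Q_SD = 0.01     # std dev of the random-walk innovation in (25)
R = 1e-3        # "observation" noise variance on the RMSE, eq (28)

def f(x):
    return [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]

def control(p, x):
    # eq (26): u = -1/2 (E + 1/2 B'PB)^{-1} B'P f(x), P from p = [P11, P12, P22]
    P = [[p[0], p[1]], [p[1], p[2]]]
    BP = [B[0]*P[0][0] + B[1]*P[1][0], B[0]*P[0][1] + B[1]*P[1][1]]
    fx = f(x)
    return -0.5 * (BP[0]*fx[0] + BP[1]*fx[1]) / (E + 0.5*(BP[0]*B[0] + BP[1]*B[1]))

def rmse(x):
    return math.sqrt((x[0]**2 + x[1]**2) / 2)   # eq (29), n = 2

def enkf_clf(x0, N=200, steps=50, tol=1e-3, seed=1):
    rng = random.Random(seed)
    ps = [[rng.uniform(0.05, 0.2) for _ in range(3)] for _ in range(N)]  # prior
    xs = [list(x0) for _ in range(N)]      # per-member state trajectories
    best = None
    for _ in range(steps):
        # prediction (25): random walk on the upper-triangular entries of P
        ps = [[pi + rng.gauss(0.0, Q_SD) for pi in p] for p in ps]
        # predicted controls (26), states (27), and RMSEs (29)
        us = [control(p, x) for p, x in zip(ps, xs)]
        xs = [[f(x)[0] + B[0]*u, f(x)[1] + B[1]*u] for x, u in zip(xs, us)]
        rs = [rmse(x) for x in xs]
        # analysis (28)-(30): treat RMSE = 0 as the true "observation"
        rbar = sum(rs) / N
        pbar = [sum(p[i] for p in ps) / N for i in range(3)]
        s_yy = sum((r - rbar)**2 for r in rs) / (N - 1)
        K = [sum((p[i] - pbar[i])*(r - rbar) for p, r in zip(ps, rs)) / (N - 1)
             / (s_yy + R) for i in range(3)]
        ps = [[p[i] + K[i]*(rng.gauss(0.0, R**0.5) - r) for i in range(3)]
              for p, r in zip(ps, rs)]
        j = min(range(N), key=lambda i: rs[i])
        best = (ps[j], rs[j])
        if rs[j] < tol:        # stopping criterion on the minimum RMSE
            break
    return best

p_best, r_best = enkf_clf([2.0, 1.0])
```

The returned pair holds the upper-triangular entries of the best member's P and its RMSE; whether and how fast the loop reaches the tolerance depends on the noise levels and prior range, which is exactly the tuning reported in Section V.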
V. RESULTS

We demonstrate the effectiveness of the proposed methodology on two illustrative examples, one involving a linear system and the other a nonlinear system.

A. Example: Linear System

Consider the discrete-time linear system

    x_{k+1} = [ 0.9974, 0.0539 ; −0.1078, 1.1591 ] x_k + [ 0.0013 ; 0.0539 ] u_k,   (31)

with initial point x_0 = [2, 1], where the goal is to minimize the performance measure

    J = (1/2) Σ_{k=0}^{N−1} ( 0.25 (x_1)_k^2 + 0.05 (x_2)_k^2 + 0.05 u_k^2 ),

as described in [18]. We set up the EnKF CLF estimator by letting

    E = 0.05,   Q = q_0 I_2,   R = r_0,   (32)

where q_0 = 1 × 10^{−4} and r_0 = 1 × 10^{−3}. We generate a uniform prior ensemble of N = 1000 members on the upper-triangular entries of the P matrix with minimum value 0.05 and maximum value 0.2. We set the stopping criterion such that the filter stops when min(RMSE) < 1 × 10^{−3}, to find the control that drives the system to zero the fastest. After 103 steps, the procedure results in the following matrix P defining the CLF:

    P = [ 1.2429, 1.7809 ; 1.7809, 2.7101 ].   (33)

Fig. 1 shows the corresponding states x1 and x2, control u, and RMSE. After an initial transient, the control defined by this resulting CLF matches the expected control for the linear system (31) [18].

B. Example: Nonlinear System

Consider the discrete-time nonlinear system

    x_{k+1} = f(x_k) + g(x_k) u_k,   x_0 = [ 2.5 ; −1 ] ∈ R^2,   (34)

where

    f(x_k) = [ 2 x_{1,k} sin(0.5 x_{1,k}) + 0.1 x_{2,k}^2 ; 0.1 x_{1,k}^2 + 1.8 x_{2,k} ]   (35)

and

    g(x_k) = [ 0 ; 2 + 0.1 cos(x_{2,k}) ].   (36)

This system is also considered in [8]. Here we take

    E = 1,   Q = q_0 I_2,   R = r_0,   (37)

with q_0 = 1 × 10^{−2} and r_0 = 1 × 10^{−3}, and we generate a uniform prior ensemble of size N = 1000 on the upper-triangular entries of P with minimum value 0.001 and maximum value 0.1. We stop the algorithm when min(RMSE) < 1 × 10^{−3}. After 9 steps, the resulting matrix P defining the CLF that drives the system to zero the fastest is given by

    P = [ 0.2089, 0.2300 ; 0.2300, 0.2604 ].   (38)

Fig. 2 shows the corresponding states x1 and x2, control u, and RMSE.

Fig. 2. The states x1 and x2, control u, and RMSE for the nonlinear system (34) corresponding to the CLF with P matrix (38) estimated using the EnKF CLF procedure.

VI. CONCLUSION

In this work we present a novel approach using nonlinear Bayesian filtering, in particular the EnKF, to parameterize a quadratic CLF for inverse optimal control. We demonstrate the effectiveness of using the EnKF to estimate the upper-triangular entries of the symmetric positive definite matrix P in (10) on both a linear and a nonlinear example. Since the nonlinear problem does not guarantee a unique solution, the ensemble formulation allows us to find the control that stabilizes the system the fastest (i.e., using the fewest steps).

While the EnKF was our filter of choice in this work, the algorithm can be straightforwardly modified to use other filtering schemes (such as particle filters) instead. Future work may study if the choice of filter affects the number of iterations needed to find a CLF matching the prescribed criteria. We also plan to apply the EnKF CLF method to an application relating to HIV drug therapy.

ACKNOWLEDGMENT

The authors would like to thank the Research Training Group (RTG) in Mathematical Biology at NC State.

REFERENCES

[1] L. D. Berkovitz and N. G. Medhin, Nonlinear Optimal Control Theory. CRC Press, 2013.
[2] T. L. Vincent and W. J. Grantham, Nonlinear and Optimal Control Systems. New York: John Wiley & Sons, Inc., 1997.
[3] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[4] R. Sepulchre, M. Jankovic, and P. V. Kokotovic, Constructive Nonlinear Control. Berlin: Springer-Verlag, 1997.
[5] M. Krstic and H. Deng, Stabilization of Nonlinear Uncertain Systems. Berlin: Springer-Verlag, 1998.
[6] F. Ornelas-Tellez, E. N. Sanchez, A. G. Loukianov, and E. M. Navarro-Lopez, "Speed-gradient inverse optimal control for discrete-time nonlinear systems," in Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, 2011, pp. 290–295.
[7] R. Ruiz-Cruz, E. N. Sanchez, F. Ornelas-Tellez, A. G. Loukianov, and R. G. Harley, "Particle swarm optimization for discrete-time inverse optimal control of a doubly fed induction generator," IEEE Trans Cybernetics, vol. 43, no. 6, pp. 1698–1709, 2013.
[8] M. Almobaied, I. Eksin, and M. Guzelkaya, "Inverse optimal controller based on extended Kalman filter for discrete-time nonlinear systems," Optim Control Appl Meth, vol. 0, pp. 1–16, https://doi.org/10.1002/oca.2331, 2017.
[9] G. Evensen, "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics," J Geophys Res, vol. 99, no. C5, pp. 10143–10162, 1994.
[10] G. Burgers, P. J. van Leeuwen, and G. Evensen, "Analysis scheme in the ensemble Kalman filter," Mon Weather Rev, vol. 126, no. 6, pp. 1719–1724, 1998.
[11] A. Arnold, D. Calvetti, and E. Somersalo, "Parameter estimation for stiff deterministic dynamical systems via ensemble Kalman filter," Inverse Problems, vol. 30, no. 10, p. 105008, 2014.
[12] A. J. Majda, J. Harlim, and B. Gershgorin, "Mathematical strategies for filtering turbulent dynamical systems," Discrete and Continuous Dynamical Systems, vol. 27, pp. 441–486, 2010.
[13] P. L. Houtekamer, H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, "Atmospheric data assimilation with an ensemble Kalman filter: results with real observations," Mon Weather Rev, vol. 133, pp. 604–620, 2005.
[14] S. C. Beeler, H. Tran, and H. T. Banks, "Feedback control methodologies for nonlinear systems," J Optim Theory Appl, vol. 107, no. 1, pp. 1–33, 2000.
[15] F. Ornelas-Tellez, E. N. Sanchez, A. G. Loukianov, and J. J. Rico, "Robust inverse optimal control for discrete-time nonlinear system stabilization," European Journal of Control, vol. 20, pp. 38–44, 2014.
[16] G. Evensen, "The ensemble Kalman filter for combined state and parameter estimation," IEEE Control Syst Mag, vol. 29, no. 3, pp. 83–104, 2009.
[17] H. Moradkhani, S. Sorooshian, H. V. Gupta, and P. R. Houser, "Dual state-parameter estimation of hydrological models using ensemble Kalman filter," Adv Water Resour, vol. 28, no. 2, pp. 135–147, 2005.
[18] D. E. Kirk, Optimal Control Theory: An Introduction, ser. Electrical Engineering, Networks, R. W. Newcomb, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1970.