System Identification of Nonlinear State-Space Models
Linköping University Post Print

N.B.: When citing this work, cite the original article.

Original Publication: Thomas Schön, Adrian Wills and Brett Ninness, System identification of nonlinear state-space models, 2011, Automatica, 47(1), 39-49.
http://dx.doi.org/10.1016/j.automatica.2010.10.013

Copyright: Elsevier Science B.V., Amsterdam. http://www.elsevier.com/

Postprint available at: Linköping University Electronic Press
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-65958

System Identification of Nonlinear State-Space Models *

Thomas B. Schön (a), Adrian Wills (b), Brett Ninness (b)

(a) Division of Automatic Control, Linköping University, SE-581 83 Linköping, Sweden
(b) School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW 2308, Australia

Abstract

This paper is concerned with the parameter estimation of a general class of nonlinear dynamic systems in state-space form. More specifically, a Maximum Likelihood (ML) framework is employed and an Expectation Maximisation (EM) algorithm is derived to compute these ML estimates. The Expectation (E) step involves solving a nonlinear state estimation problem, where the smoothed estimates of the states are required. This problem lends itself perfectly to the particle smoother, which provides arbitrarily good estimates. The Maximisation (M) step is solved using standard techniques from numerical optimisation theory. Simulation examples demonstrate the efficacy of our proposed solution.

Key words: System identification, nonlinear models, dynamic systems, Monte Carlo method, smoothing filters, expectation maximisation algorithm, particle methods.

1 Introduction
The significance and difficulty of estimating nonlinear systems is widely recognised [1, 31, 32]. As a result, there is a very large and active research effort directed towards the problem. A key aspect of this activity is that it generally focuses on specific system classes, such as those described by Volterra kernels [4], neural networks [37], nonlinear ARMAX (NARMAX) [29], and Hammerstein–Wiener [41] structures, to name just some examples. In relation to this, the paper here considers Maximum Likelihood (ML) estimation of the parameters specifying a relatively general class of nonlinear systems that can be represented in state-space form.

Of course, the use of an ML approach (for example, with regard to linear dynamic systems) is common, and it is customary to employ a gradient based search technique such as a damped Gauss–Newton method to actually compute estimates [30, 46]. This requires the computation of a cost Jacobian, which typically necessitates implementing one filter, derived (in the linear case) from a Kalman filter, for each parameter that is to be estimated.

An alternative, recently explored in [17] in the context of bilinear systems, is to employ the Expectation Maximisation (EM) algorithm [8] for the computation of ML estimates.

Unlike gradient based search, which is applicable to the maximisation of any differentiable cost function, EM methods are only applicable to the maximisation of likelihood functions. However, a dividend of this specialisation is that while some gradient calculations may be necessary, the gradient of the likelihood function is not required, which will prove to be very important in this paper. In addition to this advantage, EM methods are widely recognised for their numerical stability [28].

Given these recommendations, this paper develops and demonstrates an EM-based approach to nonlinear system identification. This will require the computation of smoothed state estimates that, in the linear case, could be found by standard linear smoothing methods [17]. In the fairly general nonlinear (and possibly non-Gaussian) context considered in this work, we propose a "particle based" approach whereby the required smoothed state estimates are approximated by Monte Carlo based empirical averages [10].

It is important to acknowledge that there is a very significant body of previous work on the problems addressed here. Many approaches using various suboptimal nonlinear filters (such as the extended Kalman filter) to approximate the cost Jacobian have been proposed [5, 22, 27]. Additionally, there has been significant work [3, 12, 40] investigating the employment of particle filters to compute the Jacobians necessary for a gradient based search approach.

There has also been previous work on various approximate EM-based approaches. Several authors have considered using suboptimal solutions to the associated nonlinear smoothing problem, typically using an extended Kalman smoother [13, 15, 19, 43].

* Parts of this paper were presented at the 14th IFAC Symposium on System Identification, Newcastle, Australia, March 2006, and at the 17th IFAC World Congress, Seoul, South Korea, July 2008. Corresponding author: T. B. Schön. Tel. +46-13-281373. Fax +46-13-139282. Email addresses: [email protected] (Thomas B. Schön), [email protected] (Adrian Wills), [email protected] (Brett Ninness).

Preprint submitted to Automatica, 25 August 2010
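The EM strategy outlined in this introduction can be made concrete before the formal development. The sketch below is our own illustration, not code from the paper: it applies EM to a scalar linear Gaussian state-space model, where the E step (the smoothed state estimates) is exact via a Kalman filter and Rauch-Tung-Striebel smoother, and the M step for the transition coefficient has a closed form. The paper's contribution is precisely to replace this exact smoother with a particle smoother when the model is nonlinear; all function names and parameter values here are illustrative assumptions.

```python
import numpy as np

def kalman_rts_smoother(y, a, q, r, m0=0.0, p0=1.0):
    """E step for the scalar model x[t+1] = a*x[t] + v, y[t] = x[t] + e:
    Kalman filter forward pass, then a Rauch-Tung-Striebel backward pass."""
    N = len(y)
    mf, pf = np.zeros(N), np.zeros(N)         # filtered means / variances
    m, p = m0, p0
    for t in range(N):
        k = p / (p + r)                        # measurement update
        m, p = m + k * (y[t] - m), (1.0 - k) * p
        mf[t], pf[t] = m, p
        m, p = a * m, a * a * p + q            # time update (prediction)
    ms, ps = mf.copy(), pf.copy()              # smoothed means / variances
    g = np.zeros(N)                            # RTS smoother gains
    for t in range(N - 2, -1, -1):
        pp = a * a * pf[t] + q                 # predicted variance at t+1
        g[t] = a * pf[t] / pp
        ms[t] = mf[t] + g[t] * (ms[t + 1] - a * mf[t])
        ps[t] = pf[t] + g[t] ** 2 * (ps[t + 1] - pp)
    cs = np.zeros(N)                           # Cov(x[t], x[t-1] | Y_N)
    cs[1:] = g[:-1] * ps[1:]
    return ms, ps, cs

def em_estimate_a(y, q, r, a0=0.5, iters=100):
    """EM iteration: exact smoother in the E step, closed-form M step for a."""
    a = a0
    for _ in range(iters):
        ms, ps, cs = kalman_rts_smoother(y, a, q, r)   # E step
        # M step: maximise the expected complete-data log-likelihood over a
        a = np.sum(ms[1:] * ms[:-1] + cs[1:]) / np.sum(ms[:-1] ** 2 + ps[:-1])
    return a

# simulate data from the model and recover the transition coefficient
rng = np.random.default_rng(0)
a_true, q, r, N = 0.9, 0.1, 0.1, 2000
xt, y = 0.0, np.zeros(N)
for t in range(N):
    y[t] = xt + np.sqrt(r) * rng.standard_normal()
    xt = a_true * xt + np.sqrt(q) * rng.standard_normal()
a_hat = em_estimate_a(y, q, r)
```

Note that, as the introduction emphasises, no gradient of the likelihood is ever formed: each iteration only requires smoothed state estimates and the maximisation of a simple surrogate function.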
As already mentioned, this paper considers particle based approaches in order to solve the involved nonlinear smoothing problem. This idea has been partially reported by the authors in two earlier conference publications [45, 47].

An interesting extension, handling the case of missing data, is addressed in [20]. Furthermore, in [26], the authors introduce an EM algorithm using a particle smoother, similar to the algorithm we propose here, but tailored to stochastic volatility models. The survey paper [3] is one of the earliest papers to note the possibility of EM-based methods employing particle smoothing methods.

2 Problem Formulation

This paper considers the problem of identifying the parameters θ for certain members of the following nonlinear state-space model structure

    x_{t+1} = f_t(x_t, u_t, v_t, θ),    (1a)
    y_t = h_t(x_t, u_t, e_t, θ).        (1b)

Here, x_t ∈ R^{n_x} denotes the state variable, with u_t ∈ R^{n_u} and y_t ∈ R^{n_y} denoting (respectively) observed input and output responses. Furthermore, θ ∈ R^{n_θ} is a vector of (unknown) parameters that specifies the mappings f_t(·) and h_t(·), which may be nonlinear and time-varying. Finally, v_t and e_t represent mutually independent vector i.i.d. processes described by probability density functions (pdf's) p_v(·) and p_e(·). These are assumed to be of known form (e.g., Gaussian) but parameterized (e.g., mean and variance) by values that can be absorbed into θ for estimation if they are unknown.

Due to the random components v_t and e_t, the model (1) can also be represented via the stochastic description

    x_{t+1} ∼ p_θ(x_{t+1} | x_t),    (2a)
    y_t ∼ p_θ(y_t | x_t),            (2b)

where p_θ(x_{t+1} | x_t) is the pdf describing the dynamics for given values of x_t, u_t and θ, and p_θ(y_t | x_t) is the pdf describing the measurements. As is common practice, in (2) the same symbol p_θ is used for different pdf's that depend on θ, with the argument to the pdf denoting what is intended. Furthermore, note that we have, for brevity, dispensed with the input signal u_t in the notation (2). However, everything we derive throughout this paper is also valid if an input signal is present.

The formulation (1) and its alternative formulation (2) capture a relatively broad class of nonlinear systems, and we consider the members of this class where p_θ(x_{t+1} | x_t) and p_θ(y_t | x_t) can be explicitly expressed and evaluated.

The problem addressed here is the formation of an estimate θ̂ of the parameter vector θ based on N measurements U_N = [u_1, ···, u_N], Y_N = [y_1, ···, y_N] of observed system input-output responses. Concerning the notation, we will sometimes make use of Y_{t:N}, which is used to denote [y_t, ···, y_N]. However, as defined above, for brevity we denote Y_{1:N} simply as Y_N; hence, it is here implicitly assumed that the index starts at 1.

One approach is to employ the general prediction error (PE) framework [30] to deliver θ̂ according to

    θ̂ = arg min_{θ∈Θ} V(θ),    (3)

with cost function V(θ) of the form

    V(θ) = Σ_{t=1}^{N} ℓ(ε_t(θ)),    ε_t(θ) = y_t − ŷ_{t|t−1}(θ),    (4)

and with Θ ⊆ R^{n_θ} denoting a compact set of permissible values of the unknown parameter θ. Here,

    ŷ_{t|t−1}(θ) = E_θ{y_t | Y_{t−1}} = ∫ y_t p_θ(y_t | Y_{t−1}) dy_t    (5)

is the mean square optimal one-step-ahead predictor of y_t based on the model (1). The function ℓ(·) is an arbitrary and user-chosen positive function.

This PE solution has its roots in the Maximum Likelihood (ML) approach, which involves maximising the joint density (likelihood) p_θ(Y_N) of the observations:

    θ̂ = arg max_{θ∈Θ} p_θ(y_1, ···, y_N).    (6)

To compute this, Bayes' rule may be used to decompose the joint density according to

    p_θ(y_1, ···, y_N) = p_θ(y_1) ∏_{t=2}^{N} p_θ(y_t | Y_{t−1}).    (7)

Accordingly, since the logarithm is a monotonic function, the maximisation problem (6) is equivalent to the minimisation problem

    θ̂ = arg min_{θ∈Θ} −L_θ(Y_N),    (8)
where L_θ(Y_N) is the log-likelihood

    L_θ(Y_N) ≜ log p_θ(Y_N) = log p_θ(y_1) + Σ_{t=2}^{N} log p_θ(y_t | Y_{t−1}).    (9)

The PE and ML approaches both enjoy well understood theoretical properties, including strong consistency, asymptotic normality, and in some situations asymptotic efficiency. Evaluating the terms in (9), like computing the predictor (5), involves the difficulty of numerically evaluating the required n_x-dimensional integrals. In what follows, the recently popular methods of sequential importance resampling (SIR, or particle filtering) will be employed to address this problem. However, there is a remaining difficulty, which is related to the second challenge mentioned at the end of section
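The SIR idea just mentioned can be sketched on a toy problem. The example below is our own illustration, not the paper's algorithm: it uses a scalar linear Gaussian model so that the bootstrap particle filter's estimate of the log-likelihood (9) can be checked against the exact value obtained from a Kalman filter via the prediction error decomposition. Each factor p_θ(y_t | Y_{t−1}) in (7) is approximated by an empirical average over particles; all function names and parameter values are illustrative assumptions.

```python
import numpy as np

def kf_loglik(y, a, q, r, m0=0.0, p0=1.0):
    """Exact log p_theta(Y_N) for x[t+1] = a*x[t] + v, y[t] = x[t] + e,
    via the Kalman filter prediction error decomposition."""
    m, p, ll = m0, p0, 0.0
    for yt in y:
        s = p + r                                    # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * s) + (yt - m) ** 2 / s)
        k = p / s
        m, p = m + k * (yt - m), (1.0 - k) * p       # measurement update
        m, p = a * m, a * a * p + q                  # time update
    return ll

def sir_loglik(y, a, q, r, M=5000, seed=1):
    """SIR (bootstrap particle filter) estimate of log p_theta(Y_N):
    each factor p(y_t | Y_{t-1}) is estimated by an empirical average."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M)                       # particles from the N(0,1) prior
    ll = 0.0
    for yt in y:
        # weight particles by the measurement density p(y_t | x_t) = N(y_t; x_t, r)
        w = np.exp(-0.5 * (yt - x) ** 2 / r) / np.sqrt(2 * np.pi * r)
        ll += np.log(np.mean(w))                     # Monte Carlo estimate of p(y_t | Y_{t-1})
        x = x[rng.choice(M, size=M, p=w / w.sum())]  # resampling step
        x = a * x + np.sqrt(q) * rng.standard_normal(M)  # propagate through (2a)
    return ll

# simulate data, then compare the particle estimate with the exact value
rng = np.random.default_rng(0)
a, q, r, N = 0.9, 0.1, 0.1, 100
xt, y = 0.0, np.zeros(N)
for t in range(N):
    y[t] = xt + np.sqrt(r) * rng.standard_normal()
    xt = a * xt + np.sqrt(q) * rng.standard_normal()
ll_exact, ll_sir = kf_loglik(y, a, q, r), sir_loglik(y, a, q, r)
```

With a few thousand particles the two values agree closely; in the nonlinear setting of this paper the exact value is unavailable and the particle estimate is the quantity one must work with.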