On the Equivalence of Time and Frequency Domain Maximum Likelihood Estimation

Juan C. Agüero a, Juan I. Yuz b, Graham C. Goodwin a, and Ramón A. Delgado b

a ARC Centre for Complex Dynamic Systems and Control (CDSC), School of Electrical Engineering and Computer Science, The University of Newcastle, Australia
b Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile

This paper was not presented at any IFAC meeting. Corresponding author: Tel. +61-2-49216351, Fax +61-2-49601712. Work supported by CDSC and FONDECYT-Chile through grant 11070158.

Abstract

Maximum likelihood estimation has a rich history. It has been successfully applied to many problems including dynamical system identification. Different approaches have been proposed in the time and in the frequency domains. In this paper we discuss the relationship between these approaches and we establish conditions under which the different formulations are equivalent for finite length data. A key point in this context is how initial (and final) conditions are considered and how they are introduced in the likelihood function.

Key words: Maximum likelihood estimation, frequency domain identification, system identification.

1 Introduction

Maximum Likelihood (ML) estimation methods have become a popular approach to dynamic system identification [10,40,18]. Different approaches have been proposed in the time and frequency domains [19,17,21,33,34]. A commonly occurring question is how time- and frequency-domain versions of ML estimation are related. Some insights into the relationship between the methods have been given in past literature. However, to the best of the authors' knowledge, there has not previously been a comprehensive account of the equivalence between the two approaches, in particular, for finite length data. For example, [17,21,23] have shown that, when working in the frequency domain, an extra term arises in the likelihood function that depends on the noise model. This term vanishes asymptotically for long data sets when uniformly spaced frequency points over the full bandwidth [−π, π] are considered (see, for example, [34]).

In [34], Box-Jenkins identification has been analyzed. Extensions to identification in closed loop have also been presented [34], as has the case of reduced bandwidth estimation. A surprising result, in this context, is that, for processes operating in open loop, the commonly used frequency domain ML method requires exact knowledge of the noise model in order to obtain consistent estimates of the plant parameters. On the other hand, it is well known that the commonly used time domain ML method (for systems driven by a quasi-stationary input and Gaussian white noise that are mutually uncorrelated) provides consistent estimates of the transfer function from input to output irrespective of possible under-modelling of the transfer function from noise to output [18]. This fact suggests that there could be key differences between the time- and frequency-domain approaches in their usual formats. In the current paper we show that the apparent differences are a result of inconsistent formulations rather than of fundamental issues in the use of time or frequency domain data. In particular, we establish that the domain chosen to describe the available data (i.e., time or frequency) does not change the result of the estimation problem (see also [39,19]).
Instead, it is the choice of the likelihood function, i.e., which parameters are to be estimated and what data is assumed available, that leads to perceived differences in the estimation problems. This issue has previously been highlighted for time domain methods in the statistics literature where, for example, the way in which initial conditions are considered defines different likelihood functions and, thus, different estimation problems (see, e.g., [36, chapter 22]). More specifically, for dynamic system identification, the time domain likelihood function differs depending on the assumptions made regarding the initial state x_0, e.g.:

(T1) x_0 is assumed to be zero,
(T2) x_0 is assumed to be a deterministic parameter to be estimated, or
(T3) x_0 is assumed to be a random vector.

On the other hand, if we convert the data to the frequency domain by applying the discrete Fourier transform (DFT), then a term arises which depends on the difference between the initial and final states, α = x_0 − x_N. Different assumptions can be made about this term in the frequency domain, e.g.:

(F1) α is assumed to be zero (equivalent to assuming periodicity in the state),
(F2) α is estimated as a deterministic parameter (as in, e.g., [1,34]), or
(F3) α is considered as a hidden random variable.

In this paper we show that the case where the term α is considered as a random variable is the most general. In fact, we show that each of the six cases described above, i.e., (T1)–(T3) and (F1)–(F3), can be obtained by making particular assumptions regarding the statistical properties of the random variables α and x_0. In particular, our analysis shows that the same solution is obtained (in the time and in the frequency domain, and using finite data) only if the properties of α as a random variable are chosen such that they are consistent with the system dynamics and the way the initial state is considered (Theorem 21 in Section 4.2). Throughout the paper we assume that the system is operating in open loop. Closed loop data can be treated in a similar fashion by the inclusion of additional terms (see, for example, Remark 2 below).
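To see where the term α arises, consider the state-space form (2)–(3) introduced in Section 2.1 below, and take the DFT with the scaling X_N(ω_k) = N^{−1/2} Σ_{t=0}^{N−1} x_t e^{−jω_k t}, ω_k = 2πk/N (this scaling is assumed only for the following sketch; a different scaling merely changes the constant multiplying α). Since e^{−jω_k N} = 1,

Σ_{t=0}^{N−1} x_{t+1} e^{−jω_k t} = e^{jω_k} [ √N X_N(ω_k) − x_0 + x_N ]

so that transforming x_{t+1} = A x_t + B u_t + K w_t gives

e^{jω_k} X_N(ω_k) = A X_N(ω_k) + B U_N(ω_k) + K W_N(ω_k) + (e^{jω_k}/√N) (x_0 − x_N)

where U_N, W_N (and Y_N below) denote the correspondingly scaled DFTs of the input, noise and output. Substituting into y_t = C x_t + D u_t + w_t then yields

Y_N(ω_k) = G(e^{jω_k}) U_N(ω_k) + H(e^{jω_k}) W_N(ω_k) + (1/√N) F(e^{jω_k}) α

with G and H as in (4)–(5) and F(q) = C(qI − A)^{−1} q as in (8). The exact finite-data frequency-domain relation therefore involves the initial and final states only through α = x_0 − x_N; the extra term disappears exactly when α = 0 (case (F1)), and is of order 1/√N otherwise.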
2 Time Domain Maximum Likelihood

2.1 Time-domain model and data

We consider the following Single-Input Single-Output (SISO) linear system model:

y_t = G(q) u_t + H(q) w_t        (1)

where {u_t} and {y_t} are the (sampled time-domain) input and output signals, respectively, and {w_t} is zero-mean Gaussian noise with variance σ_w^2. G(q, θ) and H(q, θ) are rational functions in the forward shift operator q. We also assume that: (i) G(q) and H(q) are stable, with no poles on the unit circle; (ii) H^{−1}(q) is stable (i.e., H(q) is minimum phase, with no zeros on the unit circle); and (iii) lim_{q→∞} H(q) = 1. The transfer function description of the system in (1) can equivalently be represented in state space form as:

x_{t+1} = A x_t + B u_t + K w_t        (2)
y_t = C x_t + D u_t + w_t        (3)

The system parameter vector θ contains the coefficients of the transfer functions G(q) and H(q) in (1). This parameter vector also uniquely defines the matrices A, B, C, D, K in the state space representation (2)–(3) for controllable or observable canonical forms [16,22,5]. The two alternative models are related by

G(q) = C(qI − A)^{−1} B + D        (4)
H(q) = C(qI − A)^{−1} K + 1        (5)

The initial conditions in (2) summarize the past of the system prior to time t = 0. To include the effect of initial conditions on the system response, we note that the solution of the state-space model (2)–(3) can be written as

y_t = F(q) s_t + G(q) u_t + H(q) w_t        (6)
    = C A^t x_0 + [ Σ_{ℓ=0}^{t−1} C A^{t−1−ℓ} B u_ℓ + D u_t ] + [ Σ_{ℓ=0}^{t−1} C A^{t−1−ℓ} K w_ℓ + w_t ]        (7)

where the additional term s_t captures the effect of the initial state x_0 on the system response. If we interpret s_t as a Kronecker delta function, i.e., s_t = x_0 δ_K[t], then the transfer functions in (6) are given by (4), (5), and

F(q) = C(qI − A)^{−1} q        (8)

For the sake of simplicity, we represent the system response using block matrices. Equation (7) can then be rewritten as

~y = Γ x_0 + Λ ~u + Ω ~w        (9)

where

~y = [y_0, …, y_{N−1}]^T        (10)
~u = [u_0, …, u_{N−1}]^T        (11)
~w = [w_0, …, w_{N−1}]^T        (12)

and

Γ = [ C         ]        Λ = [ D           0           …   0 ]
    [ CA        ]            [ CB          D           …   0 ]
    [ ⋮         ]            [ ⋮           ⋮           ⋱   ⋮ ]
    [ CA^{N−1}  ]            [ CA^{N−2}B   CA^{N−3}B   …   D ]        (13)

Ω = [ I           0           …   0 ]
    [ CK          I           …   0 ]
    [ ⋮           ⋮           ⋱   ⋮ ]
    [ CA^{N−2}K   CA^{N−3}K   …   I ]        (14)

2.2 Time domain maximum likelihood

The time-domain likelihood function is defined as the conditional probability density function (PDF) of the data given the parameters, i.e., p_~y(~y | θ). When the initial state x_0 is modelled as a Gaussian random vector with mean µ_{x_0} and covariance Σ_0, uncorrelated with the noise sequence {w_t}, the output data ~y in (9) are Gaussian, and the mean and covariance matrix for the output data given the parameters are

µ_~y = Λ ~u + Γ µ_{x_0}        (17)
Σ_~y = Γ Σ_0 Γ^T + σ_w^2 Ω Ω^T        (18)

PROOF. The likelihood of the data given the parameter vector θ and the random vector x_0 can be obtained from (9):

~y = Λ ~u + [ Γ   Ω ] [ x_0 ]
                      [ ~w  ]        (19)

Using the PDF of a transformation of random variables (see, for example, [15, page 34]), the PDF of ~y is readily obtained from the affine transformation in (19), i.e.

p_~y(~y) = exp{ −(1/2) (~y − µ_~y)^T Σ_~y^{−1} (~y − µ_~y) } / √( (2π)^N det Σ_~y )        (20)

where the mean and covariance are given by equations (17) and (18). The negative log-likelihood function is readily obtained from (20).
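As a numerical illustration of (13)–(14) and (17)–(20), the following sketch evaluates the time-domain negative log-likelihood for a given parameter value. It is a minimal sketch assuming NumPy; the helper names, the first-order model matrices, the noise variance and the prior statistics (µ_{x_0}, Σ_0) below are placeholder values chosen only for illustration.

import numpy as np

def block_matrices(A, B, C, D, K, N):
    # Build Gamma, Lambda, Omega of (13)-(14) for a SISO model over a horizon of N samples.
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(N)])
    Lam = np.zeros((N, N))
    Om = np.eye(N)
    for t in range(N):
        Lam[t, t] = D
        for l in range(t):
            CAp = C @ np.linalg.matrix_power(A, t - 1 - l)
            Lam[t, l] = (CAp @ B).item()
            Om[t, l] = (CAp @ K).item()
    return Gamma, Lam, Om

def neg_log_likelihood(y, u, A, B, C, D, K, sigma_w2, mu_x0, Sigma_0):
    # -log p(~y) for the Gaussian model (9), using the mean (17), covariance (18) and PDF (20).
    N = len(y)
    Gamma, Lam, Om = block_matrices(A, B, C, D, K, N)
    mu_y = Lam @ u + Gamma @ mu_x0                                  # (17)
    Sigma_y = Gamma @ Sigma_0 @ Gamma.T + sigma_w2 * (Om @ Om.T)    # (18)
    e = y - mu_y
    _, logdet = np.linalg.slogdet(Sigma_y)
    quad = e @ np.linalg.solve(Sigma_y, e)
    return 0.5 * (quad + logdet + N * np.log(2.0 * np.pi))          # -log of (20)

# Placeholder first-order model (values chosen for illustration only).
rng = np.random.default_rng(0)
A = np.array([[0.8]]); B = np.array([[1.0]]); C = np.array([[0.5]])
D = 0.0; K = np.array([[0.3]])
sigma_w2, N = 0.1, 50
mu_x0, Sigma_0 = np.array([0.0]), np.array([[1.0]])
u = rng.standard_normal(N)

# Simulate (2)-(3) with a random initial state drawn from N(mu_x0, Sigma_0).
x = rng.multivariate_normal(mu_x0, Sigma_0)
y = np.zeros(N)
for t in range(N):
    w = np.sqrt(sigma_w2) * rng.standard_normal()
    y[t] = (C @ x).item() + D * u[t] + w               # output equation (3)
    x = A @ x + B.flatten() * u[t] + K.flatten() * w   # state update (2)

print(neg_log_likelihood(y, u, A, B, C, D, K, sigma_w2, mu_x0, Sigma_0))

Maximizing the likelihood then corresponds to minimizing this quantity over θ (and, depending on which of the cases (T1)–(T3) is adopted, over x_0 or its assumed statistics).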