Automatica 37 (2001) 573-580

Brief Paper

Robust maximum likelihood estimation in the linear model

Giuseppe Calafiore*, Laurent El Ghaoui

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Electrical Engineering and Computer Sciences Department, University of California at Berkeley, USA

Received 15 April 1999; revised 4 July 2000; received in final form 6 October 2000

This paper was not presented at any IFAC meeting. It was recommended for publication in revised form by Associate Editor T. Sugie under the direction of Editor Roberto Tempo. This work was supported in part by Italian CNR funds; research of the second author was partially supported by a National Science Foundation CAREER award.

*Corresponding author. Tel.: +39-011-564-7066; fax: +39-011-564-7099. E-mail address: [email protected] (G. Calafiore).

Abstract

This paper addresses the problem of maximum likelihood parameter estimation in linear models affected by Gaussian noise, whose mean and covariance matrix are uncertain. The proposed estimate maximizes a lower bound on the worst-case (with respect to the uncertainty) likelihood of the measured sample, and is computed by solving a semidefinite optimization problem (SDP). The problem of linear robust estimation is also studied in the paper, and the statistical and optimality properties of the resulting linear estimator are discussed. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Robust estimation; Distributional robustness; Least squares; Convex optimization; Linear matrix inequalities

1. Introduction

The problem of estimating parameters from noisy observed data has a long history in engineering and in experimental science in general. When the observations and the unknown parameters are related by a linear model, and a stochastic setting is assumed, the application of the maximum likelihood (ML) principle (see for instance the monograph by Berger & Wolpert, 1988) leads to the well-known least-squares (LS) parameter estimate. However, the well-established ML principle assumes that the true parametric model for the data is exactly known, a seldom verified assumption in practice, where models only approximate reality (Knight, 2000, Chapter 5). This paper introduces a family of estimators based on a robust version of the ML principle, in which uncertainty in the underlying statistical model is explicitly taken into account. In particular, we will study estimators that maximize a lower bound on the worst-case (with respect to model uncertainty) value of the likelihood function. Next, we will analyze the case of linear robust estimation, and discuss the bias, variance and optimality properties of the resulting estimator. The minimax approach to robustness undertaken here is in the spirit of the distributional robustness approach discussed in Huber (1981) for parametrized families of distributions. In our case, the minimax is performed with respect to unknown-but-bounded parameters appearing in the underlying statistical model. The techniques introduced in this paper may also be viewed as the stochastic counterpart of the deterministic robust estimation methods that appeared recently in Chandrasekaran, Golub, Gu, and Sayed (1998) and El Ghaoui and Lebret (1997). In particular, the model uncertainty will here be represented using the linear fractional transformation (LFT) formalism (El Ghaoui & Lebret, 1997), which allows us to treat cases where the regression matrix has a particular form, such as Toeplitz or Vandermonde, and where the uncertainty affects the data in a structured way.

Robust estimation trades the accuracy that is best achieved using standard techniques such as LS or total least squares (TLS) (Van Huffel & Vandewalle, 1991) for robustness, i.e. insensitivity with respect to parameter variations.
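The accuracy/robustness trade-off just mentioned can be seen in a toy computation: when the regression matrix is nearly rank-deficient, the plain LS estimate is hypersensitive to tiny data perturbations. The numbers below are invented for illustration and are not from the paper; a minimal NumPy sketch:

```python
import numpy as np

# Illustrative (not from the paper): a nearly rank-deficient regression
# matrix makes the least-squares estimate hypersensitive to the data.
C = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
y = np.array([2.0, 2.0001])

x_ls, *_ = np.linalg.lstsq(C, y, rcond=None)    # close to [1, 1]
x_pert, *_ = np.linalg.lstsq(C, y + np.array([0.0, 1e-3]), rcond=None)

# A 1e-3 change in the data moves the estimate by roughly 14 in norm.
print(np.linalg.norm(x_pert - x_ls))
```

This is exactly the kind of sensitivity that regularization, and the robust estimators developed in this paper, are designed to tame.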
In this latter context, links between robust estimation, sensitivity, and regularization techniques, such as Tikhonov regularization (Tikhonov & Arsenin, 1977), may be found in Björck (1991), Eldén (1985), El Ghaoui and Lebret (1997), and references therein.

In this paper, we will study a mixed-uncertainty problem, where the regression matrix is affected by deterministic, structured and norm-bounded uncertainty, while the measurement is affected by Gaussian noise whose covariance matrix is also uncertain. In this setting, we will compute a robust (with respect to the deterministic model uncertainty) estimate via semidefinite programming (SDP). An application of the theory to the estimation of the dynamic parameters of a robot manipulator from real experimental data is presented in Section 4.

Notation. For a square matrix X, X ≻ 0 (resp. X ⪰ 0) means that X is symmetric and positive definite (resp. positive semidefinite). λ_max(X), where X = X^T, denotes the maximum eigenvalue of X. ‖X‖ denotes the operator (maximum singular value) norm of X. For P ∈ R^{n×n} with P ≻ 0 and x̄ ∈ R^n, the notation x ~ N(x̄, P) means that x is a Gaussian random vector with expected value x̄ and covariance matrix P.

2. Problem statement

We consider the problem of estimating a parameter from noisy observations that are related to the unknown parameter by a linear statistical model. To set up the problem, we take the Bayesian point of view and assume an a priori normal distribution on the unknown parameter x ∈ R^n, i.e. x ~ N(x̄, P(Δ_P)), where x̄ is the expected value of x, and the a priori covariance P(Δ_P) ∈ R^{n×n} depends on a matrix Δ_P of uncertain parameters, as will be discussed in detail in Section 2.1. Similarly, the observations vector y ∈ R^m is assumed to be independent of x, with normal distribution y ~ N(ȳ, D(Δ_D)), where ȳ ∈ R^m and D(Δ_D) ∈ R^{m×m}. The linear statistical model further assumes that the expected values of x and y are related by a linear relation which, in our case, is also uncertain:

    ȳ = C(Δ_A) x̄.

Given some a priori estimate x_s of x, and given the vector of measurements y_s, we seek an estimate of x that maximizes a lower bound on the worst-case (with respect to the uncertainty) a posteriori probability of the observed event. When no deterministic uncertainty is present in the model, this is the celebrated maximum likelihood (ML) approach to parameter estimation, which enjoys special properties such as efficiency and unbiasedness (see for instance Berger and Wolpert (1988), Goodwin and Payne (1997), and Ljung (1987)). For the important special case of linear estimation, we will discuss in Section 3.2 how these properties extend to the robust estimator, and how the resulting estimate is related to the minimum a posteriori variance estimator.

To cast our problem in an ML setting, the log-likelihood function L is defined as the logarithm of the a posteriori joint probability density of x, y:

    L(x, Δ) = log (f_x(x_s) f_y(y_s)),

where f_x, f_y are the probability density functions of x, y, respectively. Since x, y are independent Gaussian vectors, maximizing the log-likelihood is equivalent to minimizing the function

    l(x, Δ) = (x_s − x)^T P^{-1}(Δ_P) (x_s − x) + (y_s − C(Δ_A) x)^T D^{-1}(Δ_D) (y_s − C(Δ_A) x),

where Δ is the total uncertainty matrix, containing the blocks Δ_P, Δ_D, Δ_A. We notice that, for fixed Δ, computing the ML estimate reduces to solving the standard norm minimization problem

    x̂_ML(Δ) = arg min_x ‖F(Δ) x − g(Δ)‖,

where

    F(Δ) = [ D^{-1/2}(Δ_D) C(Δ_A) ]        g(Δ) = [ D^{-1/2}(Δ_D) y_s ]
           [ P^{-1/2}(Δ_P)        ],              [ P^{-1/2}(Δ_P) x_s ].    (1)

If Δ is now allowed to vary in a given norm-bounded set, as specified in Section 2.1, we define the worst-case maximum likelihood (WCML) estimate x̂_WCML as

    x̂_WCML = arg min_{x ∈ R^n} max_Δ ‖F(Δ) x − g(Δ)‖.    (2)

The WCML estimate therefore provides a guaranteed level of the likelihood function, for any possible value of the uncertainty. In Section 2.1, we detail the uncertainty model used throughout the paper and state a fundamental technical lemma.
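For fixed Δ, the minimization of l(x, Δ) is an ordinary stacked least-squares problem, as in (1). The following sketch fixes Δ = 0, so the matrices below play the roles of the nominal C, D, P; all numerical values are invented toy data, not the paper's example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (not the paper's example); Delta frozen at 0.
n, m = 2, 6
C = rng.standard_normal((m, n))     # nominal regression matrix
D = 0.25 * np.eye(m)                # measurement-noise covariance
P = 4.0 * np.eye(n)                 # a priori covariance of x
x_s = np.zeros(n)                   # a priori estimate of x
y_s = rng.standard_normal(m)        # observed measurements

# Whitening factors: W^T W = D^{-1} (resp. P^{-1}) via Cholesky.
Wd = np.linalg.inv(np.linalg.cholesky(D))
Wp = np.linalg.inv(np.linalg.cholesky(P))

# Stack whitened measurements and whitened prior as in (1).
F = np.vstack([Wd @ C, Wp])
g = np.concatenate([Wd @ y_s, Wp @ x_s])

x_ml, *_ = np.linalg.lstsq(F, g, rcond=None)   # arg min ||F x - g||
print(x_ml)
```

Solving this stacked problem is equivalent to solving the normal equations (C^T D^{-1} C + P^{-1}) x = C^T D^{-1} y_s + P^{-1} x_s, a Tikhonov-type regularized LS, consistent with the regularization links mentioned in the introduction.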
2.1. LFT uncertainty models

We shall consider matrices subject to structured uncertainty in the so-called linear-fractional (LFT) form

    M(Δ) = M + L Δ (I − H Δ)^{-1} R,    (3)

where M, L, H, R are constant matrices, while the uncertainty matrix Δ belongs to the set D₁, where D₁ := {Δ ∈ D : ‖Δ‖ ≤ 1}, and D is a linear subspace. The norm used is the spectral (maximum singular value) norm. The subspace D, referred to as the structure subspace in the sequel, defines the structure of the perturbation, which is otherwise only bounded in norm. Together, the matrices M, L, H, R and the subspace D constitute a linear-fractional representation of an uncertain model. We make from now on the standard assumption that all LFT models are well posed over D₁, meaning that det(I − H Δ) ≠ 0 for all Δ ∈ D₁; see Fan, Tits, and Doyle (1991). We also introduce the following linear subspace B(D), referred to as the scaling subspace:

    B(D) = {(S, T, G) : S Δ = Δ T,  G Δ = −Δ^T G^T  for every Δ ∈ D}.
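To make the linear-fractional form (3) concrete, here is a small numerical sketch of evaluating an LFT model together with a well-posedness check; all matrices are made-up placeholders, not data from the paper:

```python
import numpy as np

def eval_lft(M, L, H, R, Delta):
    """Evaluate M(Delta) = M + L Delta (I - H Delta)^{-1} R, as in (3)."""
    IH = np.eye(H.shape[0]) - H @ Delta
    if abs(np.linalg.det(IH)) < 1e-12:          # well-posedness check
        raise ValueError("LFT not well posed at this Delta")
    return M + L @ Delta @ np.linalg.solve(IH, R)

# Placeholder data: since ||H|| < 1 here, the LFT is well posed
# for every ||Delta|| <= 1.
M = np.array([[1.0, 0.0],
              [0.0, 2.0]])
L = np.array([[1.0],
              [0.5]])
H = np.array([[0.2]])
R = np.array([[1.0, 1.0]])

print(eval_lft(M, L, H, R, np.array([[0.8]])))  # a perturbed M
```

Note that Δ = 0 recovers the nominal matrix M, and any ‖H‖ < 1 guarantees det(I − HΔ) ≠ 0 on the unit ball, which is one simple sufficient condition for the well-posedness assumed above.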
LFT models of uncertainty are general and now widely used in robust control (Fan et al., 1991; Zhou, Doyle, & Glover, 1996), especially in conjunction with SDP techniques (see for instance Asai, Hara, & Iwasaki, 1996), in identification (Wolodkin, Rangan, & Poolla, 1997), and in filtering (El Ghaoui & Calafiore, 1999; Xie, Soh, & de Souza, 1994).

Let

    D^{-1/2}(Δ_D) = D^{-1/2} + L_D Δ_D (I − H_D Δ_D)^{-1} R_D,
    P^{-1/2}(Δ_P) = P^{-1/2} + L_P Δ_P (I − H_P Δ_P)^{-1} R_P

be the LFT representations of the Cholesky factors of D(Δ_D) and P(Δ_P), respectively. Then, using the common rules for LFT operations (see for instance Zhou et al., 1996), we obtain an LFT representation

    [F(Δ) g(Δ)] = [F g] + L Δ (I − H Δ)^{-1} [R_F R_g],

where F(Δ), g(Δ) are given in (1), and Δ is a structured matrix containing the (possibly repeated) blocks Δ_A, Δ_D, Δ_P on the diagonal, and
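A crude way to appreciate what the inner maximization in (2) asks for is to sample the unit uncertainty ball and track the largest residual. Such sampling only lower-bounds the true worst case, whereas the SDP developed in the paper yields a guaranteed upper bound. The LFT data below are invented (H = 0, unstructured 2×2 Δ, constant g), purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented toy LFT data: F(Delta) = F0 + Lm Delta Rm (the H = 0 case
# of (3)), with g held constant.
F0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.5, 0.5]])
Lm = 0.3 * np.ones((3, 2))
Rm = np.eye(2)
g0 = np.array([1.0, -1.0, 0.2])

F_of = lambda Delta: F0 + Lm @ Delta @ Rm
x = np.array([0.9, -0.8])

# Sample the unit spectral-norm ball, keep the worst residual seen;
# start from Delta = 0 (the nominal residual), which lies in the ball.
worst = np.linalg.norm(F0 @ x - g0)
for _ in range(200):
    Delta = rng.standard_normal((2, 2))
    s = np.linalg.norm(Delta, 2)                 # spectral norm
    if s > 1.0:
        Delta /= s                               # project onto the ball
    worst = max(worst, np.linalg.norm(F_of(Delta) @ x - g0))

print(worst)   # lower bound on max over ||Delta|| <= 1 of ||F(Delta)x - g||
```

Minimizing this sampled worst case over x would give only a heuristic robust estimate; the point of the WCML formulation is that the maximization over Δ can instead be bounded exactly via semidefinite programming.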