Overview of Total Least-Squares Methods
Signal Processing 87 (2007) 2283–2302
www.elsevier.com/locate/sigpro

Ivan Markovsky (a,*) and Sabine Van Huffel (b)

(a) School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
(b) Katholieke Universiteit Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium

(*) Corresponding author. E-mail addresses: [email protected] (I. Markovsky), [email protected] (S. Van Huffel).

Received 28 September 2006; received in revised form 30 March 2007; accepted 3 April 2007. Available online 14 April 2007.

Abstract

We review the development and extensions of the classical total least-squares method and describe algorithms for its generalization to weighted and structured approximation problems. In the generic case, the classical total least-squares problem has a unique solution, which is given in analytic form in terms of the singular value decomposition of the data matrix. The weighted and structured total least-squares problems have no such analytic solution and are currently solved numerically by local optimization methods. We explain how special structure of the weight matrix and the data matrix can be exploited for efficient cost function and first derivative computation. This allows computationally efficient solution methods to be obtained. The total least-squares family of methods has a wide range of applications in system theory, signal processing, and computer algebra. We describe the applications for deconvolution, linear prediction, and errors-in-variables system identification.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Total least squares; Orthogonal regression; Errors-in-variables model; Deconvolution; Linear prediction; System identification

1. Introduction

The total least-squares method was introduced by Golub and Van Loan [1,2] as a solution technique for an overdetermined system of equations $AX \approx B$, where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times d}$ are the given data and $X \in \mathbb{R}^{n \times d}$ is unknown. With $m > n$, typically there is no exact solution for $X$, so an approximate one is sought. The total least-squares method is a natural generalization of the least-squares approximation method to the case when the data in both $A$ and $B$ are perturbed.

The least-squares approximation $\hat{X}_{\mathrm{ls}}$ is obtained as a solution of the optimization problem

$$\{\hat{X}_{\mathrm{ls}}, \Delta B_{\mathrm{ls}}\} := \arg\min_{X,\, \Delta B} \|\Delta B\|_{F} \quad \text{subject to} \quad AX = B + \Delta B. \tag{LS}$$

The rationale behind this approximation method is to correct the right-hand side $B$ as little as possible in the Frobenius norm sense, so that the corrected system of equations $AX = \hat{B}$, $\hat{B} := B + \Delta B$, has an exact solution. Under the condition that $A$ is of full column rank, the unique solution $\hat{X}_{\mathrm{ls}} = (A^{\top}A)^{-1}A^{\top}B$ of the optimally corrected system of equations $A\hat{X} = \hat{B}_{\mathrm{ls}}$, $\hat{B}_{\mathrm{ls}} := B + \Delta B_{\mathrm{ls}}$, is by definition the least-squares approximate solution of the original incompatible system of equations.
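As a concrete illustration of problem (LS), the following minimal sketch computes the least-squares solution and the optimal right-hand-side correction $\Delta B_{\mathrm{ls}}$ numerically. It is a sketch under stated assumptions: the paper itself gives no code, NumPy is used, and the data are random placeholders.

```python
import numpy as np

# Minimal numerical sketch of problem (LS). Assumptions: NumPy is used
# (the paper gives no code) and the data below are random placeholders.
rng = np.random.default_rng(0)
m, n, d = 20, 3, 2
A = rng.standard_normal((m, n))   # given data, m > n
B = rng.standard_normal((m, d))

# Least-squares solution X_ls = (A^T A)^{-1} A^T B; lstsq computes it in a
# numerically stable way without forming A^T A explicitly.
X_ls, *_ = np.linalg.lstsq(A, B, rcond=None)

# Optimal correction of the right-hand side: A X_ls = B + Delta_B_ls.
Delta_B_ls = A @ X_ls - B
print(np.linalg.norm(Delta_B_ls, "fro"))  # minimal Frobenius-norm correction
```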
Nomenclature

$\mathbb{R}$ and $\mathbb{R}_{+}$ : the set of real numbers and the set of nonnegative real numbers
$:=$ and $:\Leftrightarrow$ : left-hand side is defined by the right-hand side
$=:$ and $\Leftrightarrow:$ : right-hand side is defined by the left-hand side
$\operatorname{vec}$ : column-wise vectorization of a matrix
$C$, $\Delta C$, $\hat{C}$ : data, correction, and approximation matrices
$C = [A \; B]$ : input/output partitioning of the data
$c_{1}, \ldots, c_{m}$ : observations, $[c_{1} \; \cdots \; c_{m}]^{\top} = C$
$c = \operatorname{col}(a, b)$ : the column vector $c = \left[\begin{smallmatrix} a \\ b \end{smallmatrix}\right]$
$\mathcal{B} \subseteq \mathbb{R}^{n+d}$ : a static model in $\mathbb{R}^{n+d}$
$\mathcal{L}$ : linear static model class
$\mathcal{B} \in \mathcal{L}_{n}$ : linear static model of dimension at most $n$, i.e., a subspace (in $\mathbb{R}^{n+d}$) of dimension at most $n$
$X$, $R$, $P$ : parameters of input/output, kernel, and image representations
$\mathcal{B}_{\mathrm{i/o}}(X)$ : input/output representation, see (I/O repr) in Section 3.1.3
$\operatorname{col\,span}(P)$ : image representation, i.e., the space spanned by the columns of $P$
$\ker(R)$ : kernel representation, i.e., the right null space of $R$

The definition of the total least-squares method is motivated by the asymmetry of the least-squares method: $B$ is corrected, while $A$ is not. Provided that both $A$ and $B$ are given data, it is reasonable to treat them symmetrically. The classical total least-squares problem looks for the minimal (in the Frobenius norm sense) corrections $\Delta A$ and $\Delta B$ on the given data $A$ and $B$ that make the corrected system of equations $\hat{A}X = \hat{B}$, $\hat{A} := A + \Delta A$, $\hat{B} := B + \Delta B$ solvable, i.e.,

$$\{\hat{X}_{\mathrm{tls}}, \Delta A_{\mathrm{tls}}, \Delta B_{\mathrm{tls}}\} := \arg\min_{X,\, \Delta A,\, \Delta B} \| [\Delta A \;\; \Delta B] \|_{F} \quad \text{subject to} \quad (A + \Delta A)X = B + \Delta B. \tag{TLS1}$$

The total least-squares approximate solution $\hat{X}_{\mathrm{tls}}$ for $X$ is a solution of the optimally corrected system of equations $\hat{A}_{\mathrm{tls}}\hat{X} = \hat{B}_{\mathrm{tls}}$, $\hat{A}_{\mathrm{tls}} := A + \Delta A_{\mathrm{tls}}$, $\hat{B}_{\mathrm{tls}} := B + \Delta B_{\mathrm{tls}}$.

The least-squares approximation is statistically motivated as a maximum likelihood estimator in a linear regression model under standard assumptions (zero mean, normally distributed residual with a covariance matrix that is a multiple of the identity). Similarly, the total least-squares approximation is a maximum likelihood estimator in the errors-in-variables model

$$A = \bar{A} + \tilde{A}, \quad B = \bar{B} + \tilde{B}, \quad \text{there exists an } \bar{X} \in \mathbb{R}^{n \times d} \text{ such that } \bar{A}\bar{X} = \bar{B}, \tag{EIV}$$

under the assumption that $\operatorname{vec}([\tilde{A} \; \tilde{B}])$ is a zero mean, normally distributed random vector with a covariance matrix that is a multiple of the identity. In the errors-in-variables (EIV) model, $\bar{A}$ and $\bar{B}$ are the "true data", $\bar{X}$ is the "true" value of the parameter $X$, and $\tilde{A}$, $\tilde{B}$ consist of "measurement noise".

Our first aim is to review the development and generalizations of the total least-squares method. We start in Section 2 with an overview of the classical total least-squares method. Section 2.1 gives historical notes that relate the total least-squares method to work on consistent estimation in the EIV model. Section 2.2 presents the solution of the total least-squares problem and the resulting basic computational algorithm. Some properties, generalizations, and applications of the total least-squares method are stated in Sections 2.3–2.5.

Our second aim is to present an alternative formulation of the total least-squares problem as a matrix low rank approximation problem

$$\hat{C}_{\mathrm{tls}} := \arg\min_{\hat{C}} \| C - \hat{C} \|_{F} \quad \text{subject to} \quad \operatorname{rank}(\hat{C}) \le n, \tag{TLS2}$$

which in some respects, described in detail later, has advantages over the classical formulation. With $C = [A \; B]$, the classical total least-squares problem (TLS1) is generically equivalent to the matrix low rank approximation problem (TLS2); however, in certain exceptional cases, known in the literature as non-generic total least-squares problems, (TLS1) fails to have a solution, while (TLS2) always has a solution.
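The SVD-based solution of the classical problem is derived in Section 2.2; as a preview, the following sketch implements the generic-case formula via the standard partitioning of the right singular vectors of $C = [A \; B]$. This is a sketch under stated assumptions (NumPy, generic data), not the paper's own code; note that the inversion of the block $V_{22}$ fails precisely in the non-generic cases just described.

```python
import numpy as np

def tls(A, B):
    """Generic-case total least-squares solution of A X ~ B via the SVD of
    the data matrix C = [A B] (a preview of Section 2.2). A sketch only:
    when the block V22 is singular, the problem is non-generic and the
    formula below fails, as discussed in the text."""
    n, d = A.shape[1], B.shape[1]
    C = np.hstack((A, B))
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]     # partition conformably with [A B]
    X_tls = -V12 @ np.linalg.inv(V22)   # solution of (TLS1), generic case
    # Optimal rank-n approximation of C, i.e., the solution of (TLS2):
    # truncate the d smallest singular values.
    C_tls = (U[:, :n] * s[:n]) @ Vt[:n, :]
    return X_tls, C_tls
```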
The following example illustrates the geometry behind the least-squares and total least-squares approximations.

Example 1 (Geometry of the least-squares and total least-squares methods). Consider a data matrix $C = [a \; b]$ with $m = 20$ rows and $n + d = 2$ columns. The data are visualized in the plane: the rows $[a_{i} \; b_{i}]$ of $C$ correspond to the circles in Fig. 1. Finding an approximate solution $\hat{x}$ of the incompatible system of equations $ax \approx b$ amounts to fitting the data points by a non-vertical line passing through the origin. (The vertical line cannot be represented by an $x \in \mathbb{R}$.) The cases when the best fitting line happens to be vertical correspond to non-generic problems.

[Fig. 1, two panels titled "Least squares fit" and "Total least squares fit": least-squares and total least-squares fits of a set of $m = 20$ data points in the plane, showing the data points $[a_{i} \; b_{i}]$, the approximations $[\hat{a}_{i} \; \hat{b}_{i}]$, the fitting model $\hat{a}\hat{x} = \hat{b}$ (solid line), and the approximation errors (dashed lines).]

Alternatively, finding a rank-1 approximation $\hat{C}$ of the given matrix $C$ (refer to problem (TLS2)) amounts to fitting the data points $[a_{i} \; b_{i}]$ by points $[\hat{a}_{i} \; \hat{b}_{i}]$ (corresponding to the rows of $\hat{C}$) that lie on a line passing through the origin. Note that now we do not exclude an approximation by the vertical line, because approximation points lying on a vertical line define a rank deficient matrix $\hat{C}$, and problem (TLS2) does not impose further restrictions on the solution.

The least-squares and total least-squares methods assess the fitting accuracy in different ways: the least-squares method minimizes the sum of the squared vertical distances from the data points to the fitting line, while the total least-squares method minimizes the sum of the squared orthogonal distances from the data points to the fitting line (both fits are reproduced in the sketch at the end of this section).

In (TLS1) the constraint $\hat{A}X = \hat{B}$ represents the rank constraint $\operatorname{rank}(\hat{C}) \le n$, via the implication

$$\text{there exists an } X \in \mathbb{R}^{n \times d} \text{ such that } \hat{A}X = \hat{B} \;\Longrightarrow\; \operatorname{rank}(\hat{C}) \le n, \quad \text{where } \hat{C} := [\hat{A} \;\; \hat{B}].$$

Note, however, that the reverse implication does not hold in general. This lack of equivalence is the reason for the existence of non-generic total least-squares problems. Problem (TLS1) is non-generic when the rank deficiency of $\hat{C}_{\mathrm{tls}}$ (an optimal solution of (TLS2)) cannot be expressed as the existence of linear relations $\hat{A}X = \hat{B}$ for some $X \in \mathbb{R}^{n \times d}$. In Section 3.1, we give an interpretation of the linear system of equations $\hat{A}X = \hat{B}$ as an input/output representation of a linear static model.

Apart from $\hat{A}X = \hat{B}$ with $\hat{C} = [\hat{A} \; \hat{B}]$, there are numerous other ways to represent the rank constraint $\operatorname{rank}(\hat{C}) \le n$. For example, $\hat{A}X = \hat{B}$ with $\hat{C}P = [\hat{A} \; \hat{B}]$, where $P$ is an arbitrary permutation matrix; i.e., in (TLS2) we can choose to express any $d$ columns of $\hat{C}$ as a linear combination of the remaining columns in order to ensure rank deficiency of $\hat{C}$.
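The geometric distinction drawn in Example 1 can be reproduced numerically. The sketch below fits a line through the origin by both methods: the least-squares slope follows from the normal equations (vertical distances), while the total least-squares line is spanned by the leading right singular vector of $C$, equivalently the rank-1 approximation of problem (TLS2) (orthogonal distances). The 20 data points are synthetic stand-ins, since the example's actual data are not reproduced in the text.

```python
import numpy as np

# Numerical illustration in the spirit of Example 1. The 20 data points are
# synthetic stand-ins; both fitted models are lines through the origin.
rng = np.random.default_rng(1)
a = np.linspace(-2, 1.5, 20)
b = 0.5 * a + 0.2 * rng.standard_normal(20)
C = np.column_stack((a, b))          # m = 20 rows, n + d = 2 columns

# Least squares: minimize the sum of squared vertical distances.
x_ls = (a @ b) / (a @ a)

# Total least squares: minimize the sum of squared orthogonal distances.
# The fitting line is spanned by the leading right singular vector of C,
# and the projected points form the rank-1 approximation of (TLS2).
U, s, Vt = np.linalg.svd(C, full_matrices=False)
v = Vt[0]                             # direction of the fitting line
x_tls = v[1] / v[0]                   # slope; undefined for a vertical line
C_tls = s[0] * np.outer(U[:, 0], v)   # rows [a_hat_i, b_hat_i] on the line

print(f"LS slope: {x_ls:.3f}, TLS slope: {x_tls:.3f}")
```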