
Signal Processing 87 (2007) 2283–2302 www.elsevier.com/locate/sigpro

Overview of total least-squares methods

Ivan Markovsky^a,*, Sabine Van Huffel^b

^a School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
^b Katholieke Universiteit Leuven, ESAT-SCD/SISTA, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium

Received 28 September 2006; received in revised form 30 March 2007; accepted 3 April 2007. Available online 14 April 2007.

Abstract

We review the development and extensions of the classical total least-squares method and describe algorithms for its generalization to weighted and structured approximation problems. In the generic case, the classical total least-squares problem has a unique solution, which is given in analytic form in terms of the singular value decomposition of the data matrix. The weighted and structured total least-squares problems have no such analytic solution and are currently solved numerically by local optimization methods. We explain how special structure of the weight matrix and the data matrix can be exploited for efficient cost function and first derivative computation. This allows us to obtain computationally efficient solution methods. The total least-squares family of methods has a wide range of applications in system theory, signal processing, and computer algebra. We describe the applications for deconvolution, linear prediction, and errors-in-variables system identification.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Total least squares; Orthogonal regression; Errors-in-variables model; Deconvolution; Linear prediction; System identification

1. Introduction

The total least-squares method was introduced by Golub and Van Loan [1,2] as a solution technique for an overdetermined system of equations AX ≈ B, where A ∈ R^{m×n} and B ∈ R^{m×d} are the given data and X ∈ R^{n×d} is unknown. With m > n, typically there is no exact solution for X, so that an approximate one is sought. The total least-squares method is a natural generalization of the least-squares approximation method when the data in both A and B are perturbed.

The least-squares approximation X̂_ls is obtained as a solution of the optimization problem

{X̂_ls, ΔB_ls} := arg min_{X, ΔB} ||ΔB||_F subject to AX = B + ΔB. (LS)

The rationale behind this approximation method is to correct the right-hand side B as little as possible in the Frobenius norm sense, so that the corrected system of equations AX = B̂, B̂ := B + ΔB, has an exact solution. Under the condition that A is full column rank, the unique solution X̂_ls = (A⊤A)^{-1}A⊤B of the optimally corrected system of equations AX = B̂_ls, B̂_ls := B + ΔB_ls, is by definition the least-squares approximate solution of the original incompatible system of equations.


Nomenclature

R and R_+ — the set of real numbers and non-negative real numbers
:= and :⇔ — left-hand side is defined by the right-hand side
=: and ⇔: — right-hand side is defined by the left-hand side
vec — column-wise vectorization of a matrix
C, ΔC, Ĉ — data, correction, and approximation matrices
C = [A B] — input/output partitioning of the data
c_1, ..., c_m — observations, [c_1 ⋯ c_m] = C⊤
c = col(a, b) — the column vector c = [a; b]
B ⊆ R^{n+d} — a static model in R^{n+d}
L_n — linear static model class
B ∈ L_n — linear static model of dimension at most n, i.e., a subspace (in R^{n+d}) of dimension at most n
X, R, P — parameters of input/output, kernel, and image representations
B_{i/o}(X) — input/output representation, see (I/O repr) in Section 3.1.3
ker(R) — kernel representation, i.e., the right null space of R
col span(P) — image representation, i.e., the space spanned by the columns of P

The definition of the total least-squares method is motivated by the asymmetry of the least-squares method: B is corrected while A is not. Provided that both A and B are given data, it is reasonable to treat them symmetrically. The classical total least-squares problem looks for the minimal (in the Frobenius norm sense) corrections ΔA and ΔB on the given data A and B that make the corrected system of equations ÂX = B̂, Â := A + ΔA, B̂ := B + ΔB, solvable, i.e.,

{X̂_tls, ΔA_tls, ΔB_tls} := arg min_{X, ΔA, ΔB} ||[ΔA ΔB]||_F subject to (A + ΔA)X = B + ΔB. (TLS1)

The total least-squares approximate solution X̂_tls for X is a solution of the optimally corrected system of equations Â_tls X = B̂_tls, Â_tls := A + ΔA_tls, B̂_tls := B + ΔB_tls.

The least-squares approximation is statistically motivated as a maximum likelihood estimator in a linear regression model under standard assumptions (zero mean, normally distributed residual with a covariance matrix that is a multiple of the identity). Similarly, the total least-squares approximation is a maximum likelihood estimator in the errors-in-variables model

A = Ā + Ã, B = B̄ + B̃, there exists an X̄ ∈ R^{n×d} such that ĀX̄ = B̄ (EIV)

under the assumption that vec([Ã B̃]) is a zero mean, normally distributed random vector with a covariance matrix that is a multiple of the identity. In the errors-in-variables (EIV) model, Ā, B̄ are the "true data", X̄ is the "true" value of the parameter X, and Ã, B̃ consist of "measurement noise".

Our first aim is to review the development and generalizations of the total least-squares method. We start in Section 2 with an overview of the classical total least-squares method. Section 2.1 gives historical notes that relate the total least-squares method to work on consistent estimation in the EIV model. Section 2.2 presents the solution of the total least-squares problem and the resulting basic computational algorithm. Some properties, generalizations, and applications of the total least-squares method are stated in Sections 2.3–2.5.

Our second aim is to present an alternative formulation of the total least-squares problem as a matrix low rank approximation problem

Ĉ_tls := arg min_{Ĉ} ||C − Ĉ||_F subject to rank(Ĉ) ≤ n, (TLS2)

which in some respects, described in detail later, has advantages over the classical one. With C = [A B], the classical total least-squares problem (TLS1) is generically equivalent to the matrix low rank approximation problem (TLS2); however, in certain exceptional cases, known in the literature as non-generic total least-squares problems, (TLS1) fails to have a solution, while (TLS2) always has a solution.

The following example illustrates the geometry behind the least-squares and total least-squares approximations.

Example 1 (Geometry of the least-squares and total least-squares methods). Consider a data matrix C = [a b] with m = 20 rows and n + d = 2 columns.

Fig. 1. Least-squares and total least-squares fits of a set of m = 20 data points in the plane. Circles — data points [a_i b_i], crosses — approximations [â_i b̂_i], solid line — fitting model âx̂ = b̂, dashed lines — approximation errors.

The data are visualized in the plane: the rows [a_i b_i] of C correspond to the circles in Fig. 1. Finding an approximate solution x̂ of the incompatible system of equations ax ≈ b amounts to fitting the data points by a non-vertical line passing through the origin. (The vertical line cannot be represented by an x ∈ R.) The cases when the best fitting line happens to be vertical correspond to non-generic problems.

Alternatively, finding a rank-1 approximation Ĉ of the given matrix C (refer to problem (TLS2)) amounts to fitting the data points [a_i b_i] by points [â_i b̂_i] (corresponding to the rows of Ĉ) that lie on a line passing through the origin. Note that now we do not exclude an approximation by the vertical line, because approximation points lying on a vertical line define a rank deficient matrix Ĉ and problem (TLS2) does not impose further restrictions on the solution.

The least-squares and total least-squares methods assess the fitting accuracy in different ways: the least-squares method minimizes the sum of the squared vertical distances from the data points to the fitting line, while the total least-squares method minimizes the sum of the squared orthogonal distances from the data points to the fitting line. Fig. 1 shows the least-squares and total least-squares fitting lines as well as the data approximation (the crosses lying on the lines). In the least-squares case, the data approximation Ĉ_ls = [a  b + Δb_ls] is obtained by correcting the second coordinate only. In the total least-squares case, the data approximation Ĉ_tls = [a + Δa_tls  b + Δb_tls] is obtained by correcting both coordinates.

In (TLS1) the constraint ÂX = B̂ represents the rank constraint rank(Ĉ) ≤ n, via the implication

there exists an X ∈ R^{n×d} such that ÂX = B̂  ⇒  rank(Ĉ) ≤ n, where Ĉ := [Â B̂].

Note, however, that the reverse implication does not hold in general. This lack of equivalence is the reason for the existence of non-generic total least-squares problems. Problem (TLS1) is non-generic when the rank deficiency of Ĉ_tls (an optimal solution of (TLS2)) cannot be expressed as existence of linear relations ÂX = B̂ for some X ∈ R^{n×d}. In Section 3.1, we give an interpretation of the linear system of equations ÂX = B̂ as an input/output representation of a linear static model.

Apart from ÂX = B̂ with Ĉ = [Â B̂], there are numerous other ways to represent the rank constraint rank(Ĉ) ≤ n. For example, ÂX = B̂ with ĈP = [Â B̂], where P is an arbitrary permutation matrix, i.e., in (TLS2) we can choose to express any d columns of Ĉ as a linear combination of the remaining columns in order to ensure rank deficiency of Ĉ. Any a priori fixed selection, however, leads to non-generic problems and therefore will be inadequate in certain cases. Of special importance are the kernel representation RĈ⊤ = 0, where RR⊤ = I_d, and the image representation Ĉ⊤ = PL, where P ∈ R^{(n+d)×n}, L ∈ R^{n×m}. In contrast to the input/output representations, the kernel and image representations are equivalent to rank(Ĉ) ≤ n.

The representation-free total least-squares problem formulation (TLS2), described in Section 3, is inspired by the behavioral approach to system theory, put forward by Willems in the three part remarkable paper [3].
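As a hedged illustration of Example 1 (with synthetic data, not the data set of Fig. 1), the following NumPy sketch computes the least-squares line fit, which minimizes vertical distances, and the total least-squares line fit, which minimizes orthogonal distances via a rank-1 approximation of C = [a b]. All names are illustrative.

```python
# Minimal sketch: least-squares vs total least-squares fit of a line b = x*a
# through the origin, under the assumption of synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(0)
m = 20
a = np.linspace(-2, 1.5, m)
b = 0.5 * a + 0.1 * rng.standard_normal(m)      # noisy data, slope 0.5 assumed here

# Least squares: minimize the sum of squared vertical distances.
x_ls = (a @ b) / (a @ a)

# Total least squares: minimize the sum of squared orthogonal distances,
# i.e., take the best rank-1 approximation of C = [a b].
C = np.column_stack([a, b])
U, s, Vt = np.linalg.svd(C, full_matrices=False)
v = Vt[-1]                    # right singular vector of the smallest singular value
x_tls = -v[0] / v[1]          # fitting line b = x_tls * a (generic case: v[1] != 0)

C_tls = C - np.outer(C @ v, v)    # orthogonal projections of the rows onto the line
print(x_ls, x_tls)
```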
We give an interpretation of the abstract rank condition as the existence of a linear static model for the given data. Then

    the total least-squares method is viewed as a tool for deriving approximate linear static models.

This point of view is treated in more detail for dynamic as well as static models in [4].

In Sections 3 and 5 we describe the extensions of the classical total least-squares problem to weighted and structured total least-squares problems and classify the existing methods according to the representation of the rank constraint (input/output, kernel, or image) and the optimization method that is used for the solution of the resulting parameter optimization problem. We show that the block-Hankel structured total least-squares problem is a kernel problem for approximate modeling by a linear time-invariant dynamical model. Motivating examples are the deconvolution problem, the linear prediction problem, and the EIV system identification problem.

2. The classical total least-squares method

2.1. History

Although the name "total least squares" appeared only recently in the literature [1,2], this method is not new and has a long history in the statistical literature, where it is known as "orthogonal regression", "errors-in-variables", and "measurement errors". The univariate (n = 1, d = 1) problem is discussed already in 1877 by Adcock [5]. Later on, contributions were made by Adcock [6], Pearson [7], Koopmans [8], Madansky [9], and York [10]. The orthogonal regression method has been rediscovered many times, often independently. About 30 years ago, the technique was extended by Sprent [11] and Gleser [12] to multivariate (n > 1, d > 1) problems.

More recently, the total least-squares method also stimulated interest outside statistics. In the field of numerical analysis, this problem was first studied by Golub and Van Loan [1,2]. Their analysis, as well as their algorithm, is based on the singular value decomposition. Geometrical insight into the properties of the singular value decomposition brought Staar [13] independently to the same concept. Van Huffel and Vandewalle [14] generalized the algorithm of Golub and Van Loan to all cases in which their algorithm fails to produce a solution, described the properties of these so-called non-generic total least-squares problems, and proved that the proposed generalization still satisfies the total least-squares criteria if additional constraints are imposed on the solution space. This seemingly different linear algebraic approach is actually equivalent to the method of multivariate EIV regression analysis, studied by Gleser [12]. Gleser's method is based on an eigenvalue–eigenvector analysis, while the total least-squares method uses the singular value decomposition, which is numerically more robust in the sense of algorithmic implementation. Furthermore, the total least-squares algorithm computes the minimum norm solution whenever the total least-squares solution is not unique. These extensions are not considered by Gleser.

In engineering fields, e.g., experimental modal analysis, the total least-squares technique (more commonly known as the Hv technique) was also introduced about 20 years ago by Leuridan et al. [15]. In the field of system identification, Levin [16] first studied the problem. His method, called the eigenvector method or Koopmans–Levin method [17], computes the same estimate as the total least-squares algorithm whenever the total least-squares problem has a unique solution. Compensated least squares was yet another name arising in this area: this method compensates for the bias in the estimator due to measurement error, and is shown by Stoica and Söderström [18] to be asymptotically equivalent to total least squares. Furthermore, in the area of signal processing, the minimum norm method of Kumaresan and Tufts [19] was introduced and shown to be equivalent to minimum norm total least squares, see Dowling and Degroat [20]. Finally, the total least-squares approach is tightly related to the maximum likelihood principal component analysis method introduced in chemometrics by Wentzell et al. [21,22], see the discussion in Section 4.2.

The key role of least squares in regression analysis is the same as that of total least squares in EIV modeling. Nevertheless, a lot of confusion exists in the fields of numerical analysis and statistics about the principle of total least squares and its relation to EIV modeling. The computational advantages of total least squares are still largely unknown in the statistical community, while inversely the concept of EIV modeling did not penetrate sufficiently well into the field of computational mathematics and engineering.

A comprehensive description of the state of the art on total least squares from its conception up to the summer of 1990 and its use in parameter estimation has been presented in Van Huffel and Vandewalle [23]. While the latter book is entirely devoted to total least squares, a second [24] and a third [25] edited book present the progress in total least squares and in the field of EIV modeling, respectively, from 1990 till 1996 and from 1996 till 2001.

2.2. Algorithm

The following theorem gives conditions for the existence and uniqueness of a total least-squares solution.

Theorem 2 (Solution of the classical total least-squares problem). Let

C := [A B] = UΣV⊤, where Σ = diag(σ_1, ..., σ_{n+d}),

be a singular value decomposition of C, σ_1 ≥ ⋯ ≥ σ_{n+d} be the singular values of C, and define the partitionings

V =: [V_11 V_12; V_21 V_22], with V_11 ∈ R^{n×n}, V_22 ∈ R^{d×d}, and Σ =: [Σ_1 0; 0 Σ_2], with Σ_1 ∈ R^{n×n}, Σ_2 ∈ R^{d×d}.

A total least-squares solution exists if and only if V_22 is non-singular. In addition, it is unique if and only if σ_n ≠ σ_{n+1}. In the case when the total least-squares solution exists and is unique, it is given by

X̂_tls = −V_12 V_22^{-1}

and the corresponding total least-squares correction matrix is

ΔC_tls := [ΔA_tls ΔB_tls] = −U diag(0, Σ_2) V⊤.

In the generic case when a unique total least-squares solution X̂_tls exists, it is computed from the d right singular vectors corresponding to the smallest singular values by normalization. This gives Algorithm 1 as a basic algorithm for solving the classical total least-squares problem (TLS1). Note that the total least-squares correction matrix ΔC_tls is such that the total least-squares data approximation

Ĉ_tls := C + ΔC_tls = U diag(Σ_1, 0) V⊤

is the best rank-n approximation of C.

Algorithm 1. Basic total least-squares algorithm.
Input: A ∈ R^{m×n} and B ∈ R^{m×d}.
1: Compute the singular value decomposition [A B] = UΣV⊤.
2: if V_22 is non-singular then
3:   Set X̂_tls = −V_12 V_22^{-1}.
4: else
5:   Output a message that the problem (TLS1) has no solution and stop.
6: end if
Output: X̂_tls — a total least-squares solution of AX ≈ B.

Most total least-squares problems which arise in practice can be solved by Algorithm 1. Extensions of the basic total least-squares algorithm to problems in which the total least-squares solution does not exist or is not unique are considered in detail in [23]. In addition, it is shown there how to speed up the total least-squares computations directly by computing the singular value decomposition only partially or iteratively if a good starting vector is available. More recent advances, e.g., recursive total least-squares algorithms, neural based total least-squares algorithms, rank-revealing total least-squares algorithms, total least-squares algorithms for large scale problems, etc., are reviewed in [24,25]. A novel theoretical and computational framework for treating non-generic and non-unique total least-squares problems is presented by Paige and Strakos [26].

2.3. Properties

Consider the EIV model and assume that vec([Ã B̃]) is a zero mean random vector with a multiple of the identity covariance matrix. In addition, assume that lim_{m→∞} Ã⊤Ã/m exists and is a positive definite matrix. Under these assumptions it is proven [1,27] that the total least-squares solution X̂_tls is a weakly consistent estimator of the true parameter values X̄, i.e.,

X̂_tls → X̄ in probability as m → ∞.

This total least-squares property does not depend on the distribution of the errors. The total least-squares correction [ΔA_tls ΔB_tls], however, being a rank d matrix, is not an appropriate estimator for the measurement error matrix [Ã B̃] (which is a full rank matrix with probability one). Note that the least-squares estimator X̂_ls is inconsistent in the EIV case.
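The following is a minimal NumPy sketch of Algorithm 1; it is not the authors' implementation, and the function name and the rank test are choices made here for illustration.

```python
# Hedged sketch of Algorithm 1 (basic total least squares) via the SVD of [A B].
import numpy as np

def tls(A, B):
    """Total least-squares solution of A X ≈ B; A is m-by-n, B is m-by-d (2-D arrays)."""
    m, n = A.shape
    d = B.shape[1]
    U, s, Vt = np.linalg.svd(np.hstack([A, B]))
    V = Vt.T
    V12 = V[:n, n:]                      # top-right n-by-d block of V
    V22 = V[n:, n:]                      # bottom-right d-by-d block of V
    if np.linalg.matrix_rank(V22) < d:
        raise ValueError("non-generic problem: (TLS1) has no solution")
    X_tls = -V12 @ np.linalg.inv(V22)
    # optimal data approximation = best rank-n approximation of [A B]
    C_tls = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]
    return X_tls, C_tls
```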

In the special case of a single right-hand side (d = 1) and A full rank, the total least-squares problem has an analytic expression that is similar to the one of the least-squares solution:

least squares: x̂_ls = (A⊤A)^{-1} A⊤b,
total least squares: x̂_tls = (A⊤A − σ²_{n+1} I)^{-1} A⊤b, (*)

where σ_{n+1} is the smallest singular value of [A b]. From a numerical analyst's point of view, (*) tells that the total least-squares solution is more ill-conditioned than the least-squares solution, since it has a higher condition number. The implication is that errors in the data are more likely to affect the total least-squares solution than the least-squares solution. This is particularly true for the worst case perturbations. In fact, total least squares is a deregularizing procedure. However, from a statistician's point of view, (*) tells that the total least-squares method asymptotically removes the bias by subtracting the error covariance matrix (estimated by σ²_{n+1} I) from the data covariance matrix A⊤A.

While least squares minimizes a sum of squared residuals, total least squares minimizes a sum of weighted squared residuals:

least squares: min_x ||Ax − b||²,
total least squares: min_x ||Ax − b||² / (||x||² + 1).

From a numerical analyst's point of view, total least squares minimizes the Rayleigh quotient. From a statistician's point of view, total least squares weights the residuals by multiplying them with the inverse of the corresponding error covariance matrix in order to derive a consistent estimate.

Other properties of total least squares, which were studied in the field of numerical analysis, are its sensitivity in the presence of errors on all data [23]. Differences between the least-squares and total least-squares solutions are shown to increase when the ratio between the second smallest singular value of [A b] and the smallest singular value of A grows. In particular, this is the case when the set of equations Ax ≈ b becomes less compatible, the vector b grows in length, or A tends to be rank deficient. Assuming independent and identically distributed errors, the improved accuracy of the total least-squares solution compared to that of the least-squares solution is maximal when the orthogonal projection of b is parallel to the singular vector of A corresponding to the smallest singular value. Additional algebraic connections and sensitivity properties of the total least-squares and least-squares problems, as well as other statistical properties, have been described in [23,24].
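As a hedged illustration of the closed forms (*), the sketch below computes both estimates for a single right-hand side; it assumes A has full column rank and that A⊤A − σ²_{n+1}I is invertible (the generic case), and its output can be checked against the SVD-based solution of Algorithm 1.

```python
# Sketch of the d = 1 closed forms (*): least squares vs total least squares.
import numpy as np

def ls_and_tls_single_rhs(A, b):
    n = A.shape[1]
    x_ls = np.linalg.solve(A.T @ A, A.T @ b)
    sigma = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)
    s_min = sigma[n]                                   # smallest singular value of [A b]
    x_tls = np.linalg.solve(A.T @ A - s_min**2 * np.eye(n), A.T @ b)
    return x_ls, x_tls
```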
2.4. Extensions

The statistical model that corresponds to the basic total least-squares approach is the EIV model with the restrictive condition that the measurement errors are zero mean, independent and identically distributed. In order to relax these restrictions, several extensions of the total least-squares problem have been investigated. The mixed least-squares–total least-squares problem formulation extends consistency of the total least-squares estimator to EIV models in which some of the variables are measured without error. The data least-squares problem [28] refers to the special case in which the A matrix is noisy and the B matrix is exact. When the errors [Ã B̃] are row-wise independent with equal row covariance matrix (which is known up to a scaling factor), the generalized total least-squares problem formulation [29] extends consistency of the total least-squares estimator. More general problem formulations, such as restricted total least squares [30], which also allow the incorporation of equality constraints, have been proposed, as well as total least-squares problem formulations using ℓ_p norms in the cost function. The latter problems, called total ℓ_p approximations, proved to be useful in the presence of outliers. Robustness of the total least-squares solution is also improved by adding regularization, resulting in regularized total least-squares methods [31–35]. In addition, various types of bounded uncertainties have been proposed in order to improve robustness of the estimators under various noise conditions [36,37].

Similarly to the classical total least-squares estimator, the generalized total least-squares estimator is computed reliably using the singular value decomposition. This is no longer the case for more general weighted total least-squares problems, where the measurement errors are differently sized and/or correlated from row to row. Consistency of the weighted total least-squares estimator is proven and an iterative procedure for its computation is proposed in [38]. This problem is discussed in more detail in Section 4.

Furthermore, constrained total least-squares problems have been formulated. Arun [39] addressed the unitarily constrained total least-squares problem, i.e., AX ≈ B subject to the constraint that the solution matrix X is unitary. He proved that this solution is the same as the solution to the orthogonal Procrustes problem [40, p. 582]. Abatzoglou et al. [41] considered yet another constrained total least-squares problem, which extends the classical total least-squares problem to the case where the errors [Ã B̃] are algebraically related. In this case, the total least-squares solution is no longer statistically optimal (e.g., maximum likelihood in the case of normal distribution).

In the so-called structured total least-squares problems [42], the data matrix [A B] is structured. In order to preserve the maximum likelihood properties of the solution, the total least-squares problem formulation is extended [43] with the additional constraint that the structure of the data matrix [A B] is preserved in the correction matrix [ΔA ΔB]. Similarly to the weighted total least-squares problem, the structured total least-squares solution, in general, has no closed form expression in terms of the singular value decomposition. An important exception is the circulant structured total least-squares problem, which can be solved using the fast Fourier transform, see [44]. In the general case, a structured total least-squares solution is searched for via numerical optimization methods. However, efficient algorithms are proposed in the literature that exploit the matrix structure on the level of the computations. This research direction is further described in Section 5.

Regularized structured total least-squares solution methods are proposed in [45,46]. Regularization turns out to be important in the application of the structured total least-squares method to image deblurring [47–49]. In addition, solution methods for nonlinearly structured total least-squares problems are developed in [50,51].
2.5. Applications

Since the publication of the singular value decomposition based total least-squares algorithm [2], many new total least-squares algorithms have been developed and, as a result, the number of applications of total least squares and EIV modeling has increased in the last decade. Total least squares is applied in computer vision [52], image reconstruction [53–55], speech and audio processing [56,57], modal and spectral analysis [58,59], linear system theory [60,61], system identification [62–65], and astronomy [66]. An overview of EIV methods in system identification is given by Söderström in [67]. In [24,25], the use of total least squares and EIV models in the application fields is surveyed and new algorithms that apply the total least-squares concept are described.

A lot of common problems in system identification and signal processing can be reduced to special types of block-Hankel and block-Toeplitz structured total least-squares problems. In the field of signal processing, in particular in vivo magnetic resonance spectroscopy and audio coding, new state-space based methods have been derived by making use of the total least-squares approach for spectral estimation, with extensions to decimation and multichannel data quantification [68,69]. In addition, it has been shown how to extend the least mean squares algorithm to the EIV context for use in adaptive signal processing and various noise environments. Finally, total least-squares applications also emerge in other fields, including information retrieval [70], shape from moments [71], and computer algebra [72,73].

3. Representation-free total least-squares problem formulation

An insightful way of viewing the abstract rank constraint rank(Ĉ) ≤ n is as the existence of a linear static model for Ĉ: rank(Ĉ) ≤ n is equivalent to the existence of a subspace B ⊆ R^{n+d} of dimension at most n that contains the rows of Ĉ.

A subspace B ⊆ R^{n+d} is referred to as a linear static model. Its dimension n is a measure of the model complexity: the higher the dimension, the more complex and therefore less useful is the model B. The set of all linear static models of dimension at most n is denoted by L_n. It is a non-convex set and has special properties that make it a Grassman manifold.

Let [c_1 ⋯ c_m] := C⊤, i.e., c_i is the transposed ith row of the matrix C, and define the shorthand notation

C ∈ B ⊆ R^{n+d}  :⇔  c_i ∈ B for i = 1, ..., m.

We have the following equivalence

rank(C) ≤ n  ⇔  C ∈ B ∈ L_n,

which relates the total least-squares problem (TLS2) to approximate linear static modeling. We restate problem (TLS2) with this new interpretation and notation.

Problem 3 (Total least squares). Given a data matrix C ∈ R^{m×(n+d)} and a complexity specification n, solve the optimization problem

{B̂_tls, Ĉ_tls} := arg min_{B ∈ L_n} min_{Ĉ ∈ B} ||C − Ĉ||_F. (TLS)
Note that (TLS) is a double minimization problem. On the inner level is the search for the best approximation of the given data C in a given model B. The optimum value of this minimization

M_tls(C, B) := min_{Ĉ ∈ B} ||C − Ĉ||_F (Mtls)

is a measure of the lack of fit between the data and the model and is called misfit. On the outer level is the search for the optimal model in the model class L_n of linear static models with bounded complexity. The optimality of the model is in terms of the total least-squares misfit function M_tls.

The double minimization structure, described above, is characteristic for all total least-squares problems. Since the model B is linear and the cost function is convex quadratic, the inner minimization can be solved analytically, yielding a closed form expression for the misfit function. The resulting outer minimization, however, is a non-convex optimization problem and needs numerical solution methods. In the case of the basic total least-squares problem and the generalized total least-squares problem, presented in Section 3.3, the outer minimization can be brought back to a singular value decomposition computation. In more general cases, however, one has to rely on non-convex optimization methods and the guarantee to compute a global solution quickly and efficiently is lost.

In order to solve numerically the abstract total least-squares problem (TLS), we need to parameterize the fitting model. This important issue is discussed next.

3.1. Kernel, image, and input/output representations

As argued in the Introduction, the representation-free formulation is conceptually useful. For analysis, however, it is often more convenient to consider concrete representations of the model, which turn the abstract problem (TLS) into concrete parameter optimization problems, such as (TLS1). In this section, we present three representations of a linear static model: kernel, image, and input/output. They give different parameterizations of the model and are important in setting up algorithms for the solution of the problem.

3.1.1. Kernel representation

Let B ∈ L_n, i.e., B is an n-dimensional subspace of R^{n+d}. A kernel representation of B is given by a system of equations Rc = 0, such that

B = {c ∈ R^{n+d} | Rc = 0} =: ker(R).

The matrix R ∈ R^{g×(n+d)} is a parameter of the model B. The parameter R is not unique. There are two sources for the non-uniqueness:

1. R might have redundant rows, and
2. for a full rank matrix U, ker(R) = ker(UR).

The parameter R having redundant rows is related to the minimality of the representation. For a given linear static model B, the representation Rc = 0 of B is minimal if R has the minimal number of rows among all parameters R that define a kernel representation of B. The kernel representation, defined by R, is minimal if and only if R is full row rank.

Because of item 2, a minimal kernel representation is still not unique. All minimal representations, however, are related to a given one via a pre-multiplication of the parameter R with a non-singular matrix U. In a minimal kernel representation, the rows of R are a basis for B^⊥, the orthogonal complement of B, i.e.,

B^⊥ = row span(R).

The choice of R is non-unique due to the non-uniqueness in the choice of basis of B^⊥.

The minimal number of independent linear equations necessary to define a linear static model B is d, i.e., in a minimal representation B = ker(R) with row dim(R) = d.
3.1.2. Image representation

The dual of the kernel representation B = ker(R) is the image representation

B = {c ∈ R^{n+d} | c = Pl, l ∈ R^n} =: col span(P).

Again, for a given B ∈ L_n an image representation B = col span(P) is not unique because of possible non-minimality of P and the choice of basis. The representation is minimal if and only if P is a full column rank matrix. In a minimal image representation, col dim(P) = dim(B) and the columns of P form a basis for B. Clearly col span(P) = col span(PU), for any non-singular matrix U ∈ R^{n×n}. Note that

ker(R) = col span(P) = B ∈ L_n  ⇒  RP = 0,

which gives a link between the parameters P and R.

3.1.3. Input/output representation

Both the kernel and the image representations treat all variables on an equal footing. In contrast, the more classical input/output representation

B_{i/o}(X) := {c =: col(a, b) ∈ R^{n+d} | X⊤a = b} (I/O repr)

distinguishes free variables a ∈ R^n, called inputs, and dependent variables b ∈ R^d, called outputs. In an input/output representation, a can be chosen freely, while b is fixed by a and the model. Note that for repeated observations C⊤ = [c_1 ⋯ c_m], the statement C ∈ B_{i/o}(X) is equivalent to the linear system of equations AX = B, where [A B] := C with A ∈ R^{m×n} and B ∈ R^{m×d}.

The partitioning c = col(a, b) gives an input/output partitioning of the variables: the first n := dim(a) variables are inputs and the remaining d := dim(b) variables are outputs. An input/output partitioning is not unique. Given a kernel or image representation, finding an input/output partitioning is equivalent to selecting a d × d full rank submatrix of R or an n × n full rank submatrix of P. In fact, generically, any splitting of the variables into a group of d variables (outputs) and a group of n remaining variables (inputs) defines a valid input/output partitioning. In non-generic cases, certain partitionings of the variables into inputs and outputs are not possible.

Note that in (I/O repr) the first n variables are fixed to be inputs, so that given X, the input/output representation B_{i/o}(X) is fixed and vice versa: given B ∈ L_n, the parameter X (if it exists) is unique. Thus, as opposed to the parameters R and P in the kernel and the image representations, the parameter X in the input/output representation (I/O repr) is unique.

Consider the input/output B_{i/o}(X), kernel ker(R), and image col span(P) representations of B ∈ L_n and define the partitionings

R =: [R_i R_o], R_o ∈ R^{d×d}, and P =: [P_i; P_o], P_i ∈ R^{n×n}.

The links among the parameters X, R, and P are summarized in Fig. 2.

Fig. 2. Links among kernel, image, and input/output representations of B ∈ L_n.

3.2. Solution of the total least-squares problem

Approximation of the data matrix C with a model B̂ in the model class L_n is equivalent to finding a matrix Ĉ ∈ R^{m×(n+d)} with rank at most n. In the case when the approximation criterion is ||C − Ĉ||_F (total least-squares problem) or ||C − Ĉ||_2, the problem has a solution in terms of the singular value decomposition of C. The result is known as the Eckart–Young–Mirsky low-rank matrix approximation theorem [74]. We state it in the next lemma.

Lemma 4 (Matrix approximation lemma). Let C = UΣV⊤ be the singular value decomposition of C ∈ R^{m×(n+d)}, Σ =: diag(σ_1, ..., σ_{n+d}), and partition the matrices U, Σ, and V as follows:

U =: [U_1 U_2], Σ =: [Σ_1 0; 0 Σ_2], V =: [V_1 V_2], (SVD PRT)

where U_1 ∈ R^{m×n}, Σ_1 ∈ R^{n×n}, and V_1 ∈ R^{(n+d)×n}. Then the rank-n matrix Ĉ* = U_1 Σ_1 V_1⊤ is such that

||C − Ĉ*||_F = min_{rank(Ĉ) ≤ n} ||C − Ĉ||_F = √(σ²_{n+1} + ⋯ + σ²_{n+d}).

The solution Ĉ* is unique if and only if σ_{n+1} ≠ σ_n.

The solution of the total least-squares problem (TLS) trivially follows from Lemma 4.

Theorem 5 (Solution of the total least-squares problem). Let C = UΣV⊤ be the singular value decomposition of C and partition the matrices U, Σ, and V as in (SVD PRT). Then a total least-squares approximation of C in L_n is

Ĉ_tls = U_1 Σ_1 V_1⊤, B̂_tls = ker(V_2⊤) = col span(V_1),

and the total least-squares misfit is

M_tls(C, B̂_tls) = ||Σ_2||_F = √(σ²_{n+1} + ⋯ + σ²_{n+d}), where Σ_2 =: diag(σ_{n+1}, ..., σ_{n+d}).

A total least-squares approximation always exists. It is unique if and only if σ_n ≠ σ_{n+1}.

Note 6 (Non-generic total least-squares problems). The optimal approximating model B̂_tls might have no input/output representation (I/O repr). In this case, the optimization problem (TLS1) has no solution. By a suitable permutation of the variables, however, (TLS1) can be made solvable, so that X̂_tls exists and B̂_tls = B_{i/o}(X̂_tls).

The issue of whether the total least-squares problem is generic or not is not related to the approximation of the data per se but to the possibility of representing the optimal model B̂_tls in the form (I/O repr), i.e., to the possibility of imposing a given input/output partition on B̂_tls.
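The following hedged NumPy sketch shows how the three parameterizations of B̂_tls follow from the SVD in Theorem 5: V_1 gives an image representation, V_2⊤ a kernel representation, and, when the bottom d×d block of V_2 is invertible (the generic case), X gives an input/output representation. Names are illustrative.

```python
# Sketch: total least-squares model and its kernel/image/input-output parameters.
import numpy as np

def tls_model(C, n):
    m, nd = C.shape
    d = nd - n
    U, s, Vt = np.linalg.svd(C)
    V1, V2 = Vt.T[:, :n], Vt.T[:, n:]
    P = V1                                           # image repr.:  B = col span(P)
    R = V2.T                                         # kernel repr.: B = ker(R), R R' = I_d
    C_tls = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]    # best rank-n approximation of C
    V22 = V2[n:, :]                                  # d-by-d block, invertible generically
    X = -V2[:n, :] @ np.linalg.inv(V22) if np.linalg.matrix_rank(V22) == d else None
    return C_tls, R, P, X
```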
3.3. Generalized total least-squares problem

Let W_ℓ ∈ R^{m×m} and W_r ∈ R^{(n+d)×(n+d)} be given positive definite matrices and define the following generalized total least-squares misfit function

M_gtls(C, B) := min_{Ĉ ∈ B} ||√W_ℓ (C − Ĉ) √W_r||_F. (Mgtls)

(W_ℓ allows for a row weighting and W_r for a column weighting in the cost function.) The resulting approximation problem is called the generalized total least-squares problem.

Problem 7 (Generalized total least squares). Given a data matrix C ∈ R^{m×(n+d)}, positive definite weight matrices W_ℓ and W_r, and a complexity specification n, solve the optimization problem

{B̂_gtls, Ĉ_gtls} := arg min_{B ∈ L_n} M_gtls(C, B). (GTLS)

The solution of the generalized total least-squares problem can be obtained from the solution of a total least-squares problem for a modified data matrix.

Theorem 8 (Solution of the generalized total least-squares problem). Define the modified data matrix C_m := √W_ℓ C √W_r, and let Ĉ_m,tls, B̂_m,tls = ker(R_m,tls) = col span(P_m,tls) be a total least-squares approximation of C_m in L_n. Then a solution of the generalized total least-squares problem (GTLS) is

Ĉ_gtls = (√W_ℓ)^{-1} Ĉ_m,tls (√W_r)^{-1},
B̂_gtls = ker(R_m,tls √W_r) = col span((√W_r)^{-1} P_m,tls),

and the corresponding generalized total least-squares misfit is M_gtls(C, B̂_gtls) = M_tls(C_m, B̂_m,tls). A generalized total least-squares solution always exists. It is unique if and only if B̂_m,tls is unique.

Robust algorithms for solving the generalized total least-squares problem without explicitly computing the inverses (√W_ℓ)^{-1} and (√W_r)^{-1} are proposed in [29,30,75]. These algorithms give better accuracy when the weight matrices are nearly rank deficient. In addition, they can treat the singular case, which implies that some rows and/or columns of C are considered exact and are not modified in the solution Ĉ.

If the matrices W_ℓ and W_r are diagonal, i.e., W_ℓ = diag(w_{ℓ,1}, ..., w_{ℓ,m}), where w_ℓ ∈ R^m_+, and W_r = diag(w_{r,1}, ..., w_{r,n+d}), where w_r ∈ R^{n+d}_+, the generalized total least-squares problem is called a scaled total least-squares problem.
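A hedged sketch of the transformation in Theorem 8 follows: form the modified data matrix, solve an ordinary total least-squares problem, and map the approximation and the kernel parameter back. Forming explicit inverses, as done here for brevity, is exactly what the robust algorithms of [29,30,75] avoid; all names are illustrative.

```python
# Sketch: generalized total least squares via a standard total least-squares problem.
import numpy as np

def psd_sqrt(W):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(W)
    return V @ np.diag(np.sqrt(w)) @ V.T

def gtls(C, n, W_l, W_r):
    Wl_h, Wr_h = psd_sqrt(W_l), psd_sqrt(W_r)
    Cm = Wl_h @ C @ Wr_h                              # modified data matrix
    U, s, Vt = np.linalg.svd(Cm)
    Cm_tls = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]    # TLS approximation of Cm
    R_m = Vt[n:, :]                                   # kernel parameter: B_m = ker(R_m)
    C_gtls = np.linalg.solve(Wl_h, Cm_tls) @ np.linalg.inv(Wr_h)
    R_gtls = R_m @ Wr_h                               # B_gtls = ker(R_m sqrt(W_r))
    return C_gtls, R_gtls
```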
4. Weighted total least squares

For a given positive definite weight matrix W ∈ R^{m(n+d)×m(n+d)}, define the weighted matrix norm

||C||_W := √(vec⊤(C⊤) W vec(C⊤))

and the weighted total least-squares misfit function

M_wtls(C, B) := min_{Ĉ ∈ B} ||C − Ĉ||_W. (Mwtls)

The approximation problem with weighted total least-squares misfit function is called the weighted total least-squares problem.

Problem 9 (Weighted total least squares). Given a data matrix C ∈ R^{m×(n+d)}, a positive definite weight matrix W, and a complexity specification n, solve the optimization problem

{B̂_wtls, Ĉ_wtls} := arg min_{B ∈ L_n} M_wtls(C, B). (WTLS)
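As a small hedged illustration of the norm used in (Mwtls): with the column-wise vec convention, vec(C⊤) stacks the rows of C, which in NumPy is the row-major flattening of C.

```python
# Sketch of the weighted norm ||C||_W with W of size m(n+d) x m(n+d).
import numpy as np

def weighted_norm(C, W):
    c = C.ravel()                  # equals vec(C') for the column-wise vec convention
    return np.sqrt(c @ W @ c)
```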

The motivation for considering the weighted total least-squares problem is that it defines the maximum likelihood estimator for the EIV model when the measurement noise C̃ = [Ã B̃] is zero mean, normally distributed, with a covariance matrix

cov(vec(C̃⊤)) = σ² W^{-1}, (**)

i.e., the weight matrix W is, up to a scaling factor σ², the inverse of the measurement noise covariance matrix.

Note 10 (Element-wise weighted total least squares). The special case when the weight matrix W is diagonal is called element-wise weighted total least squares. It corresponds to an EIV problem with uncorrelated measurement errors. Let W = diag(w_1, ..., w_{m(n+d)}) and define the m × (n+d) weight matrix Σ by Σ_ij := √(w_{(i−1)(n+d)+j}). Denote by ⊙ the element-wise product, A ⊙ B = [a_ij b_ij]. Then

||ΔC||_W = ||Σ ⊙ ΔC||_F.

Note 11 (Total least squares as an unweighted weighted total least squares). The extreme special case when W = I is called unweighted. Then the weighted total least-squares problem reduces to the total least-squares problem. The total least-squares misfit M_tls weights all elements of the correction matrix ΔC equally. It is a natural choice when there is no prior knowledge about the data. In addition, the unweighted case is computationally easier to solve than the general weighted case.

Special structure of the weight matrix W results in special weighted total least-squares problems. Fig. 3 shows a hierarchical classification of various problems considered in the literature. From top to bottom the generality of the problems decreases: on the top is the weighted total least-squares problem for a general positive semi-definite weight matrix and on the bottom is the classical total least-squares problem. In between are weighted total least-squares problems with (using the stochastic terminology) uncorrelated errors among the rows, among the columns, and among all elements (element-wise weighted total least-squares case). Row-wise and column-wise uncorrelated weighted total least-squares problems, in which the row or column weight matrices are equal, are generalized total least-squares problems with, respectively, W_ℓ = I and W_r = I. In order to express easily the structure of the weight matrix in the case of column-wise uncorrelated errors, we introduce the weight matrix W̄ defined by cov(vec(C̃)) = σ² W̄^{-1}; compare with (**), where C̃ is transposed.

With W = I, (WTLS) coincides with the total least-squares problem (TLS). Except for the special case of generalized total least squares, however, the weighted total least-squares problem has no closed form solution in terms of the singular value decomposition. As an optimization problem it is non-convex, so that the currently available solution methods do not guarantee convergence to a global optimum solution. In the rest of this section, we give an overview of solution methods for the weighted total least-squares problem, with emphasis on the row-wise weighted total least-squares case, i.e., when the weight matrix W is block diagonal:

W = diag(W_1, ..., W_m), W_i ∈ R^{(n+d)×(n+d)}, W_i > 0.

In the EIV setting, this assumption implies that the measurement errors c̃_i and c̃_j are uncorrelated for all i, j = 1, ..., m, i ≠ j, which is a reasonable assumption for most applications.

Similarly to the total least-squares and generalized total least-squares problems, the weighted total least-squares problem is a double minimization problem. The inner minimization is the search for the best approximation of the data in a given model and the outer minimization is the search for the model. First, we solve the inner minimization problem—the misfit computation.

4.1. Best approximation of the data by a given model

Since the model is linear, (Mwtls) is a convex optimization problem with an analytic solution. In order to give explicit formulas for the optimal approximation Ĉ_wtls and misfit M_wtls(C, B), however, we need to choose a particular parameterization of the given model B. We state the results for the kernel and the image representations. The results for the input/output representation follow from the given ones by the substitutions R ↦ [X⊤ −I] and P ↦ col(I, X⊤).

Theorem 12 (Weighted total least-squares misfit computation, kernel representation version). Let ker(R) be a minimal kernel representation of B ∈ L_n. The best weighted total least-squares approximation of C in B, i.e., the solution of (Mwtls), is

ĉ_wtls,i = (I − W_i^{-1} R⊤ (R W_i^{-1} R⊤)^{-1} R) c_i for i = 1, ..., m,

with the corresponding misfit

M_wtls(C, ker(R)) = √( Σ_{i=1}^{m} c_i⊤ R⊤ (R W_i^{-1} R⊤)^{-1} R c_i ). (MwtlsR)
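A hedged sketch of the row-wise weighted misfit computation of Theorem 12 follows; C is m×(n+d), R is a full row rank d×(n+d) kernel parameter, and W_blocks is a list of the m positive definite blocks W_i. Names are illustrative.

```python
# Sketch of Theorem 12: best approximation of C in ker(R) and the weighted misfit.
import numpy as np

def wtls_misfit_kernel(C, R, W_blocks):
    misfit_sq = 0.0
    C_hat = np.empty_like(C, dtype=float)
    for i, (c, Wi) in enumerate(zip(C, W_blocks)):
        Wi_inv = np.linalg.inv(Wi)
        G = R @ Wi_inv @ R.T                          # d-by-d, invertible for minimal R
        correction = Wi_inv @ R.T @ np.linalg.solve(G, R @ c)
        C_hat[i] = c - correction                     # row-wise projection onto ker(R)
        misfit_sq += c @ R.T @ np.linalg.solve(G, R @ c)
    return np.sqrt(misfit_sq), C_hat
```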
The a general positive semi-definite weight matrix and results for the input/output representation follow on the bottom is the classical total least-squares from the given ones by the substitutions R!½7 X > I problem. In between are weighted total least- I and P7!½ >. squares problems with (using the stochastic termi- X nology) uncorrelated errors among the rows, among Theorem 12 (Weighted total least-squares misfit the columns, and among all elements (element-wise computation, kernel representation version). Let weighted total least-squares case). Row-wise and kerðRÞ be a minimal kernel representation of column-wise uncorrelated weighted total least- B 2 Ln. The best weighted total least-squares squares problems, in which the row or column approximation of C in B, i.e., the solution of (Mwtls), weight matrices are equal are generalized total least- is squares problems with, respectively, W ‘ ¼ I and b 1 > 1 > 1 cwtls;i ¼ðI W i R ðRW i R Þ RÞci W r ¼ I. In order to express easily the structure of the weight matrix in the case of column-wise for i ¼ 1; ...; m ARTICLE IN PRESS 2294 I. Markovsky, S. Van Huffel / Signal Processing 87 (2007) 2283–2302

WTLS W ≥ 0

Row-wise WTLS Column-wise WTLS ¯ ¯¯ W = diag (W1,...,Wm) W = diag (W1,..., Wn+d)

Row-wise GTLS Column-wise GTLS EWTLS W = diag (W ,...,W ) ¯ r r W = diag(w) m n+d

Row-wises caled TLS Column-wises caled TLS

¯ W = diag (col (wr,...,wr)) m n+d

TLS W = 2I m (n+d)

TLS — total least squares GTLS — generalized total least squares WTLS — weighted total least squares EWTLS — element-wise weighted total least squares

Fig. 3. Hierarchy of weighted total least-squares problems according to the structure of the weight matrix W. On the left side are weighted total least-squares problems with row-wise uncorrelated measurement errors and on the right side are weighted total least-squares problems with column-wise uncorrelated measurement errors.

The image representation is dual to the kernel representation. Correspondingly, the misfit computations with kernel and with image representations of the model are dual problems. The kernel representation leads to a least norm problem and the image representation leads to a least-squares problem.

Theorem 13 (Weighted total least-squares misfit computation, image representation version). Let col span(P) be a minimal image representation of B ∈ L_n. The best weighted total least-squares approximation of C in B is

ĉ_wtls,i = P (P⊤ W_i P)^{-1} P⊤ W_i c_i for i = 1, ..., m,

with the corresponding misfit

M_wtls(C, col span(P)) = √( Σ_{i=1}^{m} c_i⊤ W_i (I − P (P⊤ W_i P)^{-1} P⊤ W_i) c_i ). (MwtlsP)

4.2. Optimization over the model parameters

The remaining problem—the minimization with respect to the model parameters—is a non-convex optimization problem that in general has no closed form solution. For this reason numerical optimization methods are employed for its solution.

Special optimization methods for the weighted total least-squares problem are proposed in [21,42,76–78]. The Riemannian singular value decomposition framework of De Moor [42] is derived for the structured total least-squares problem but includes the weighted total least-squares problem with diagonal weight matrix and d = 1 as a special case. The restriction to more general weighted total least-squares problems comes from the fact that the Riemannian singular value decomposition framework is derived for matrix approximation problems with rank reduction by one. De Moor proposed an algorithm resembling the inverse power iteration algorithm for computing the solution. The method, however, has no proven convergence properties.

The maximum likelihood principal component analysis method of Wentzell et al. [21] is an alternating least-squares algorithm. It applies to the general weighted total least-squares problem and is globally convergent, with linear convergence rate. The method of Premoli and Rastello [76] is a heuristic for solving the first order optimality condition of (WTLS). A solution of a nonlinear equation is sought instead of a minimum point of the original optimization problem. The method is locally convergent with superlinear convergence rate. The region of convergence around a minimum point could be rather small in practice.
The weighted low rank approximation framework of Manton et al. [78] proposes specialized optimization methods on a Grassman manifold. The least-squares nature of the problem is not exploited by the algorithms proposed in [78].

The Riemannian singular value decomposition, maximum likelihood principal component analysis, Premoli–Rastello, and weighted low rank approximation methods differ in the parameterization of the model and the optimization algorithm used, see Table 1.

Table 1. Model representations and optimization algorithms used in the methods of [21,22,42,76,78].

Method                                            Representation   Algorithm
Riemannian singular value decomposition           Kernel           Inverse power iteration
Maximum likelihood principal component analysis   Image            Alternating projections
Premoli–Rastello                                  Input/output     Iteration based on linearization
Weighted low rank approximation                   Kernel           Newton method

5. Structured total least squares

The total least-squares problem is a tool for approximate modeling by a static linear model. Similarly, the structured total least-squares problem with block-Hankel structured data matrix is a tool for approximate modeling by a linear time-invariant dynamic model. In order to show how the block-Hankel structure occurs, consider a difference equation represented by a linear time-invariant model

R_0 w_t + R_1 w_{t+1} + ⋯ + R_l w_{t+l} = 0. (KER)

Here R_0, ..., R_l are the model parameters and the integer l is the lag of the equation. For t = 1, ..., T − l, the difference equation (KER) is equivalent to the block-Hankel structured system of equations

[R_0 R_1 ⋯ R_l] [w_1 w_2 ⋯ w_{T−l}; w_2 w_3 ⋯ w_{T−l+1}; ⋮; w_{l+1} w_{l+2} ⋯ w_T] = 0, (Hank eqn)

where the second factor is the block-Hankel matrix H_{l+1}(w). Thus the constraint that a time series w = (w(1), ..., w(T)) is a trajectory of the linear time-invariant model implies rank deficiency of the block-Hankel matrix H_{l+1}(w).

Next we show three typical examples that illustrate the occurrence of structured systems of equations in approximate modeling problems.

5.1. Examples

5.1.1. Deconvolution

The convolution of the (scalar) sequences (..., a_{−1}, a_0, a_1, ...) and (..., x_{−1}, x_0, x_1, ...) is the sequence (..., b_{−1}, b_0, b_1, ...) defined as follows:

b_i = Σ_{j=−∞}^{∞} x_j a_{i−j}. (CONV)

Assume that x_j = 0 for all j < 1 and for all j > n. Then (CONV) for i = 1, ..., m can be written as the following structured system of equations:

[a_0 a_{−1} ⋯ a_{1−n}; a_1 a_0 ⋯ a_{2−n}; ⋮; a_{m−1} a_{m−2} ⋯ a_{m−n}] [x_1; x_2; ⋮; x_n] = [b_1; b_2; ⋮; b_m], i.e., Ax = b. (CONV′)

Note that the matrix A is Toeplitz structured and is parameterized by the vector a = col(a_{1−n}, ..., a_{m−1}) ∈ R^{m+n−1}.

The aim of the deconvolution problem is to find x, given a and b. With exact data the problem boils down to solving the system of equations (CONV′). By construction it has an exact solution. Moreover, the solution is unique whenever A is of full column rank, which can be translated to a persistency of excitation condition on a, see [79].

The deconvolution problem is more realistic and more challenging when the data a, b are perturbed. We assume that m > n, so that the system of equations (CONV′) is overdetermined. Because both a and b are perturbed and the A matrix is structured, the deconvolution problem is a total least-squares problem with structured data matrix C = [A b], A Toeplitz and b unstructured.
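A hedged sketch of the construction of the structured data matrix C = [A b] in (CONV′) follows; the index conventions match the display above and the function name is illustrative. An unstructured total least-squares solution of this system would ignore the Toeplitz structure, which is why a structured solver is needed when a and b are noisy.

```python
# Sketch: build the Toeplitz-plus-unstructured-column data matrix of (CONV').
import numpy as np

def deconvolution_data_matrix(a, b, n):
    """a stores a_{1-n}, ..., a_{m-1} (length m + n - 1); b stores b_1, ..., b_m."""
    m = len(b)
    assert len(a) == m + n - 1
    A = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            A[i, j] = a[i - j + n - 1]      # A[i, j] = a_{i-j} in the paper's indexing
    return np.column_stack([A, b])          # structured data matrix C = [A b]
```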

5.1.2. Linear prediction

In many signal processing applications the sum of damped exponentials model

ŷ_t = Σ_{i=1}^{l} c_i e^{d_i t} e^{i(ω_i t + φ_i)}, where i := √(−1), (SDE)

is considered. Given an observed sequence (y_{d,1}, ..., y_{d,T}) ("d" stands for data), the aim is to find parameters {c_i, d_i, ω_i, φ_i}_{i=1}^{l} of a sum of damped exponentials model, such that the signal ŷ given by (SDE) is close to the observed one, e.g.,

min ||col(y_{d,1}, ..., y_{d,T}) − col(ŷ_1, ..., ŷ_T)||.

Note that the sum of damped exponentials model is just an autonomous linear time-invariant model, i.e., ŷ is a free response of a linear time-invariant system. Therefore ŷ satisfies a homogeneous linear difference equation

ŷ_t + Σ_{τ=1}^{l} a_τ ŷ_{t+τ} = 0. (LP)

Approximating y_d by a signal ŷ that satisfies (LP) is a linear prediction problem, so modeling y_d as a sum of damped exponentials is equivalent to the linear prediction problem. Of course, there is a one-to-one relation between the initial conditions ŷ_0, ..., ŷ_{−l+1} and parameters {a_i}_{i=1}^{l} of (LP) and the parameters {c_i, d_i, ω_i, φ_i}_{i=1}^{l} of (SDE).

For a time horizon t = 1, ..., T, with T > l + 1, (LP) can be written as the structured system of equations

[ŷ_1 ŷ_2 ⋯ ŷ_l; ŷ_2 ŷ_3 ⋯ ŷ_{l+1}; ⋮; ŷ_m ŷ_{m+1} ⋯ ŷ_{T−1}] [a_1; a_2; ⋮; a_l] = [ŷ_{l+1}; ŷ_{l+2}; ⋮; ŷ_T],

where m := T − l. Therefore, the Hankel matrix H_{l+1}(ŷ) with l + 1 columns, constructed from ŷ, is rank deficient. Conversely, if H_{l+1}(ŷ) has a one-dimensional left kernel, then ŷ satisfies the linear recursion (LP). Therefore, the linear prediction problem is the problem of finding the smallest in some sense (e.g., 2-norm) correction Δy on the given sequence y_d that makes a block-Hankel matrix H_{l+1}(ŷ) constructed from the corrected sequence ŷ := y_d − Δy rank deficient. This is a structured total least-squares problem Ax ≈ b with Hankel structured data matrix C = [A b].
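The hedged sketch below illustrates the rank deficiency statement on synthetic data (not data from the paper): for an exact damped sinusoid, which is a sum of damped exponentials of order l = 2, the m × (l + 1) Hankel matrix of the (LP) system is rank deficient; noisy data would call for a Hankel structured total least-squares correction Δy.

```python
# Sketch: Hankel matrix of the linear prediction system and its rank.
import numpy as np

def lp_hankel(y, l):
    """m-by-(l+1) matrix with rows (y_t, ..., y_{t+l}), t = 1, ..., T - l."""
    T = len(y)
    return np.array([y[t : t + l + 1] for t in range(T - l)], dtype=float)

t = np.arange(50)
y_exact = 0.9**t * np.cos(0.5 * t)       # damped cosine: satisfies (LP) with l = 2
H = lp_hankel(y_exact, l=2)
print(np.linalg.matrix_rank(H))          # 2 < l + 1: rank deficient for exact data
```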

5.1.3. EIV identification

Consider the linear time-invariant system described by the difference equation

ŷ_t + Σ_{τ=1}^{l} a_τ ŷ_{t+τ} = Σ_{τ=0}^{l} b_τ û_{t+τ} (DE)

and define the parameter vector

x := col(b_0, ..., b_l, a_0, ..., a_{l−1}) ∈ R^{2l+1}.

Given a set of input/output data (u_{d,1}, y_{d,1}), ..., (u_{d,T}, y_{d,T}) and an order specification l, we want to find the parameter x of a system that fits the data.
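A hedged sketch of the structured data matrix arising in the EIV identification problem follows (helper names are illustrative): C stacks, row by row, windows of the input and output data; a structured total least-squares solver then perturbs u_d and y_d as little as possible so that C becomes rank deficient.

```python
# Sketch: data matrix C = [H_{l+1}(u_d)' H_{l+1}(y_d)'] for EIV identification.
import numpy as np

def eiv_data_matrix(u, y, l):
    u, y = np.asarray(u, float), np.asarray(y, float)
    m = len(u) - l                                   # number of equations, m = T - l
    Hu = np.array([u[t : t + l + 1] for t in range(m)])
    Hy = np.array([y[t : t + l + 1] for t in range(m)])
    return np.hstack([Hu, Hy])                       # m-by-(2l+2) structured data matrix
```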

For a time horizon t = 1, ..., T, (DE) can be written as a structured system of equations (DE′) with m := T − l equations, in which the coefficient matrix is built from Hankel matrices of the input and output data. We assume that the time horizon is large enough to ensure m > 2l + 1. The system (DE′) is satisfied for exact data and a solution is the true value of the parameter x. Moreover, under an additional assumption on the input (persistency of excitation) the solution is unique.

For perturbed data an approximate solution is sought, and the fact that the system of equations (DE′) is structured suggests the use of the structured total least-squares method. Under appropriate conditions for the data generating mechanism a structured total least-squares solution provides a maximum likelihood estimator. The structure arising in the EIV identification problem is

C = [H_{l+1}⊤(u_d)  H_{l+1}⊤(y_d)].

5.2. History of the structured total least-squares problem

The origin of the structured total least-squares problem dates back to the work of Aoki and Yue [80], although the name "structured total least squares" appeared only 23 years later in the literature [42]. Aoki and Yue consider a single input single output system identification problem, where both the input and the output are noisy (EIV setting), and derive a maximum likelihood solution. Under the normality assumption for the measurement errors, a maximum likelihood estimate turns out to be a solution of the structured total least-squares problem. Aoki and Yue approach the optimization problem in a similar way to the one presented in Section 5.3: they use classical nonlinear least-squares minimization methods for solving an equivalent unconstrained problem.
Abatzoglou et al. [41] are considered to be the first who formulated a structured total least-squares problem. They called their approach constrained total least squares and motivate the problem as an extension of the total least-squares method to matrices with structure. The solution approach adopted by Abatzoglou et al. is closely related to the one of Aoki and Yue: again an equivalent optimization problem is derived, but it is solved numerically using a Newton-type optimization method.

Shortly after the publication of the work on the constrained total least-squares problem, De Moor [42] lists many applications of the structured total least-squares problem and outlines a new framework for deriving analytical properties and numerical methods. His approach is based on Lagrange multipliers, and the basic result is an equivalent problem, called Riemannian singular value decomposition, which can be considered as a "nonlinear" extension of the classical singular value decomposition. As an outcome of the new problem formulation, an iterative solution method based on the inverse power iteration is proposed.

Another algorithm for solving the structured total least-squares problem (even with ℓ1 and ℓ∞ norms in the cost function), called structured total least norm, is proposed by Rosen et al. [86]. In contrast to the approaches of Aoki and Yue and of Abatzoglou et al., Rosen et al. solve the problem in its original formulation: the constraint is linearized around the current iteration point, which results in a linearly constrained least-squares problem. In the algorithm of Rosen et al., the constraint is incorporated in the cost function by adding a multiple of its residual norm.

The weighted low rank approximation framework of Manton et al. [78] has been extended in [87,88] to structured low rank approximation problems. All problem formulations and solution methods cited above, except for the ones in the structured low rank approximation framework, aim at rank reduction of the data matrix C by one. A generalization of the algorithm of Rosen et al. to problems with rank reduction by more than one is proposed by Van Huffel et al. [89]. It involves, however, Kronecker products that unnecessarily inflate the dimension of the involved matrices.

When dealing with a general affine structure, the constrained total least squares, Riemannian singular value decomposition, and structured total least norm methods have cubic computational complexity per iteration in the number of measurements. Fast algorithms with linear computational complexity are proposed by Mastronardi et al. [90–92] for special structured total least-squares problems with data matrix C = [A b] that is Hankel or composed of a Hankel block A and an unstructured column b. They use the structured total least norm approach but recognize that a matrix appearing in the kernel subproblem of the algorithm has low displacement rank. This structure is exploited using the Schur algorithm.
The structured total least-squares solution methods outlined above point out the following issues:

- Structure: the structure specification for the data matrix C varies from general affine to specific affine, like Hankel/Toeplitz, or Hankel/Toeplitz block augmented with an unstructured column.
- Rank reduction: all methods, except for [87–89], reduce the rank of the data matrix by one.
- Computational efficiency: the efficiency varies from cubic for the methods that use a general affine structure to linear for the efficient methods of Lemmerling et al. [90] and Mastronardi et al. [91] that use a Hankel/Toeplitz type structure.

Efficient algorithms for problems with block-Hankel/Toeplitz structure and rank reduction by more than one are proposed by Markovsky et al. [93–95]. In addition, a numerically reliable and robust software implementation is available [96].

5.3. Structured total least-squares problem formulation and solution method

Let S : R^{n_p} → R^{m×(n+d)} be an injective function. A matrix C ∈ R^{m×(n+d)} is said to be S-structured if C ∈ image(S). The vector p for which C = S(p) is called the parameter vector of the structured matrix C; correspondingly, R^{n_p} is called the parameter space of the structure S.

The aim of the structured total least-squares problem is to perturb as little as possible a given parameter vector p by a vector Δp, so that the perturbed structured matrix S(p + Δp) becomes rank deficient with rank at most n.

Problem 14 (Structured total least squares). Given a data vector p ∈ R^{n_p}, a structure specification S : R^{n_p} → R^{m×(n+d)}, and a rank specification n, solve the optimization problem

    Δp_stls = arg min_{Δp} ‖Δp‖  subject to  rank(S(p − Δp)) ≤ n.

In what follows, we will use the input/output representation

    S(p − Δp) X_ext = 0,   X_ext := [X; −I],

of the rank constraint, so that the structured total least-squares problem becomes the following parameter optimization problem:

    X̂_stls = arg min_{X, Δp} ‖Δp‖  subject to  S(p − Δp) [X; −I] = 0.   (STLS_X)

The structured total least-squares problem is said to be affine structured if the function S is affine, i.e.,

    S(p) = S_0 + ∑_{i=1}^{n_p} S_i p_i  for all p ∈ R^{n_p} and for some S_i, i = 1, ..., n_p.   (AFF)

In an affine structured total least-squares problem, the constraint S(p − Δp) X_ext = 0 is bilinear in the decision variables X and Δp.

Lemma 15. Let S : R^{n_p} → R^{m×(n+d)} be an affine function. Then

    S(p − Δp) X_ext = 0  ⟺  G(X) Δp = r(X),

where

    G(X) := [vec((S_1 X_ext)ᵀ) ⋯ vec((S_{n_p} X_ext)ᵀ)] ∈ R^{md×n_p}   (G)

and

    r(X) := vec((S(p) X_ext)ᵀ) ∈ R^{md}.
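The following sketch (our own numerical check, not code from the paper; all identifiers are ours) instantiates Lemma 15 for a small Hankel structure S(p) = ∑_i S_i p_i, forms G(X) and r(X) as in (G), and verifies that the residual of the bilinear constraint equals r(X) − G(X)Δp, so that S(p − Δp)X_ext = 0 holds exactly when G(X)Δp = r(X).

```python
import numpy as np

m, n, d = 5, 2, 1                      # S(p) maps R^{n_p} into R^{m x (n+d)}
n_p = m + n + d - 1                    # number of parameters of an m x (n+d) Hankel matrix

def vec_t(M):
    """vec(M^T): stack the rows of M into one long vector."""
    return M.flatten()

# Basis matrices S_i of the Hankel structure: entry (t, j) of S(p) equals p_{t+j}
S = []
for i in range(n_p):
    Si = np.zeros((m, n + d))
    for t in range(m):
        for j in range(n + d):
            if t + j == i:
                Si[t, j] = 1.0
    S.append(Si)

def S_of(p):
    """Affine structure with S_0 = 0: S(p) = sum_i S_i p_i."""
    return sum(pi * Si for pi, Si in zip(p, S))

rng = np.random.default_rng(1)
p, dp = rng.standard_normal(n_p), rng.standard_normal(n_p)
X = rng.standard_normal((n, d))
X_ext = np.vstack([X, -np.eye(d)])

G = np.column_stack([vec_t(Si @ X_ext) for Si in S])   # definition (G), size md x n_p
r = vec_t(S_of(p) @ X_ext)                             # r(X), length md

residual = vec_t(S_of(p - dp) @ X_ext)                 # residual of the bilinear constraint
print(np.allclose(residual, r - G @ dp))               # True, as Lemma 15 states
```

The check relies only on the linearity of the structure and of the vectorization, which is exactly the content of the lemma.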
Using Lemma 15, we rewrite the affine structured total least-squares problem as follows:

    min_X ( min_{Δp} ‖Δp‖  subject to  G(X) Δp = r(X) ).   (STLS′_X)

The inner minimization problem has an analytic solution, which allows us to derive an equivalent optimization problem.

Theorem 16 (Equivalent optimization problem for affine structured total least squares). Assuming that n_p ≥ md, the affine structured total least-squares problem (STLS_X) is equivalent to

    min_X rᵀ(X) Γ†(X) r(X),  where Γ(X) := G(X) Gᵀ(X),   (STLS″_X)

and Γ† is the pseudoinverse of Γ.

The significance of Theorem 16 is that the constraint and the decision variable Δp in problem (STLS_X) are eliminated. Typically the number nd of elements of X is much smaller than the number n_p of elements of the correction Δp. Thus the reduction in the complexity is significant.

The equivalent optimization problem (STLS″_X) is a nonlinear least-squares problem, so that classical optimization methods can be used for its solution. The optimization methods require cost function and first derivative evaluations. In order to evaluate the cost function for a given value of the argument X, we need to form the weight matrix Γ(X) and to solve the system of equations Γ(X) y(X) = r(X). This straightforward implementation requires O(m³) floating point operations (flops). For large m (the applications that we aim at) this computational complexity becomes prohibitive.

It turns out, however, that for the special case of affine structures

    S(p) = [C¹ ⋯ C^q]  for all p ∈ R^{n_p},  where C^l, for l = 1, ..., q, is block-Toeplitz, block-Hankel, unstructured, or exact,   (A)

the weight matrix Γ(X) has a block-Toeplitz and block-banded structure, which can be exploited for efficient cost function and first derivative evaluations. According to Assumption (A), S(p) is composed of blocks, each one of which is block-Toeplitz, block-Hankel, unstructured, or exact (an exact block C^l is not modified in the solution Ĉ := S(p − Δp), i.e., Ĉ^l = C^l).

Theorem 17 (Structure of the weight matrix Γ [93]). Consider the equivalent optimization problem (STLS″_X). If in addition to the assumptions of Theorem 16, the structure S is such that (A) holds, then the weight matrix Γ(X) has the block-Toeplitz and block-banded structure

    \Gamma(X) = \begin{bmatrix}
      \Gamma_0 & \Gamma_1^\top & \cdots & \Gamma_s^\top &        & 0 \\
      \Gamma_1 & \ddots        & \ddots &               & \ddots &   \\
      \vdots   & \ddots        & \ddots & \ddots        &        & \Gamma_s^\top \\
      \Gamma_s &               & \ddots & \ddots        & \ddots & \vdots \\
               & \ddots        &        & \ddots        & \ddots & \Gamma_1^\top \\
      0        &               & \Gamma_s & \cdots      & \Gamma_1 & \Gamma_0
    \end{bmatrix},

where s = max_{l=1,...,q} (n_l − 1) and n_l is the number of block columns in the block C^l.
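Building on the sketch after Lemma 15, the code below (ours; the names stls_cost and stls are hypothetical, and we assume Γ(X) is invertible so that the pseudoinverse in Theorem 16 reduces to an ordinary linear solve) evaluates the cost of (STLS″_X) in the straightforward O(m³) way just described and hands it to a general-purpose local optimizer. It does not exploit the block-Toeplitz, block-banded structure of Theorem 17, which is what the efficient methods of [93–96] are built on.

```python
import numpy as np
from scipy.optimize import minimize

def stls_cost(x, S0, S_list, p, n, d):
    """f(X) = r(X)^T Gamma(X)^{-1} r(X) for the affine structure S(p) = S0 + sum_i S_i p_i."""
    X = x.reshape(n, d)
    X_ext = np.vstack([X, -np.eye(d)])
    G = np.column_stack([(Si @ X_ext).flatten() for Si in S_list])  # (G): rows stacked = vec((.)^T)
    r = ((S0 + sum(pi * Si for pi, Si in zip(p, S_list))) @ X_ext).flatten()
    Gamma = G @ G.T                                                 # md x md weight matrix
    y = np.linalg.solve(Gamma, r)                                   # the O(m^3) step
    return r @ y

def stls(S0, S_list, p, n, d, X0):
    """Local minimization over X; the correction Delta p follows from the inner problem."""
    res = minimize(stls_cost, X0.flatten(), args=(S0, S_list, p, n, d))
    X = res.x.reshape(n, d)
    X_ext = np.vstack([X, -np.eye(d)])
    G = np.column_stack([(Si @ X_ext).flatten() for Si in S_list])
    r = ((S0 + sum(pi * Si for pi, Si in zip(p, S_list))) @ X_ext).flatten()
    dp = G.T @ np.linalg.solve(G @ G.T, r)      # minimum-norm solution of G(X) dp = r(X)
    return X, dp
```

A reasonable starting point X0 is, for instance, the ordinary least-squares solution computed from the unstructured data matrix S(p); like all local methods discussed here, the iteration then returns a locally optimal solution only.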

6. Conclusions

We reviewed the development and extensions of the classical total least-squares problem and presented a new total least-squares problem formulation. The new formulation is a matrix low rank approximation problem and allows for different representations of the rank constraint. Once a representation is fixed, the matrix low rank approximation problem becomes a parameter optimization problem. The classical total least-squares formulation results from the new one when an input/output representation is chosen. The input/output representation is a linear system of equations AX = B, which is the classical way of addressing approximation problems. However, the input/output representation is not equivalent to the low rank constraint, which leads to non-generic total least-squares problems. Using the representation-free formulation, we classified existing total least-squares solution methods. The existing methods differ in the representation and the optimization method used.

The basic and generalized total least-squares problems have an analytic solution in terms of the singular value decomposition of the data matrix, which allows fast and reliable computation of the solution. Moreover, all globally optimal solutions can be classified in terms of the singular value decomposition. In contrast, more general total least-squares problems, like the weighted and structured total least-squares problems, require numerical optimization methods, which at best find a single locally optimal solution. The separation between the global total least-squares problem and general weighted and structured total least-squares problems is an important dividing line in the total least-squares hierarchy.

We emphasized the double minimization structure of the total least-squares problems and showed how it can be used for deriving efficient solution methods. The key step in our approach is the elimination of the correction by analytically minimizing over it. Then the structure of the data and weight matrices is exploited for efficient cost function and first derivative evaluation.

Acknowledgments

I. Markovsky is a lecturer at the University of Southampton, UK, and S. Van Huffel is a full professor at the Katholieke Universiteit Leuven, Belgium. Our research is supported by Research Council KUL: GOA-AMBioRICS, GOA-Mefisto 666, Center of Excellence EF/05/006 "Optimization in engineering", several PhD/postdoc and fellow grants; Flemish Government: FWO PhD/postdoc grants and projects G.0360.05 (EEG signal processing), G.0321.06 (numerical tensor techniques), research communities (ICCoS, ANMMM); IWT: PhD grants; Belgian Federal Science Policy Office: IUAP P5/22 ("Dynamical Systems and Control: Computation, Identification and Modelling"); EU: BIOPATTERN, ETUMOUR, HEALTHagents, HPC-EUROPA (RII3-CT-2003-506079), with the support of the European Community – Research Infrastructure Action under the FP6 "Structuring the European Research Area" Program.

References

[1] G. Golub, Some modified matrix eigenvalue problems, SIAM Rev. 15 (1973) 318–344.
[2] G. Golub, C. Van Loan, An analysis of the total least squares problem, SIAM J. Numer. Anal. 17 (1980) 883–893.
[3] J.C. Willems, From time series to linear system – Part I. Finite dimensional linear time invariant systems, Part II. Exact modelling, Part III. Approximate modelling, Automatica 22 (1986) 561–580, 22 (1986) 675–694, 23 (1987) 87–115.
[4] I. Markovsky, J.C. Willems, S. Van Huffel, B. De Moor, Exact and Approximate Modeling of Linear Systems: A Behavioral Approach, Monographs on Mathematical Modeling and Computation, vol. 11, SIAM, Philadelphia, PA, 2006.
[5] R. Adcock, Note on the method of least squares, Analyst 4 (1877) 183–184.
[6] R. Adcock, A problem in least squares, Analyst 5 (1878) 53–54.
[7] K. Pearson, On lines and planes of closest fit to points in space, Philos. Mag. 2 (1901) 559–572.
[8] T. Koopmans, Linear Regression Analysis of Economic Time Series, De Erven F. Bohn, 1937.
[9] A. Madansky, The fitting of straight lines when both variables are subject to error, J. Amer. Statist. Assoc. 54 (1959) 173–205.
[10] D. York, Least squares fitting of a straight line, Can. J. Phys. 44 (1966) 1079–1086.
[11] P. Sprent, Models in Regression and Related Topics, Methuen & Co. Ltd., 1969.
[12] L. Gleser, Estimation in a multivariate "errors in variables" regression model: large sample results, Ann. Statist. 9 (1) (1981) 24–44.
[13] J. Staar, Concepts for reliable modelling of linear systems with application to on-line identification of multivariable state space descriptions, Ph.D. Thesis, Department EE, K.U. Leuven, Belgium, 1982.
[14] S. Van Huffel, J. Vandewalle, Analysis and solution of the nongeneric total least squares problem, SIAM J. Matrix Anal. Appl. 9 (1988) 360–372.
[15] J. Leuridan, D. De Vis, H. Van Der Auweraer, F. Lembregts, A comparison of some frequency response function measurement techniques, in: Proceedings of the Fourth International Modal Analysis Conference, 1986, pp. 908–918.
[16] M. Levin, Estimation of a system pulse transfer function in the presence of noise, IEEE Trans. Automat. Control 9 (1964) 229–235.
[17] K. Fernando, H. Nicholson, Identification of linear systems with input and output noise: the Koopmans–Levin method, IEE Proc. D 132 (1985) 30–36.
[18] P. Stoica, T. Söderström, Bias correction in least-squares identification, Int. J. Control 35 (3) (1982) 449–457.
[19] R. Kumaresan, D. Tufts, Estimating the angles of arrival of multiple plane waves, IEEE Trans. Aerospace Electronic Systems 19 (1) (1983) 134–139.
[20] E. Dowling, R. Degroat, The equivalence of the total least-squares and minimum norm methods, IEEE Trans. Signal Process. 39 (1991) 1891–1892.
[21] P. Wentzell, D. Andrews, D. Hamilton, K. Faber, B. Kowalski, Maximum likelihood principal component analysis, J. Chemometrics 11 (1997) 339–366.
[22] M. Schuermans, I. Markovsky, P. Wentzell, S. Van Huffel, On the equivalence between total least squares and maximum likelihood PCA, Anal. Chim. Acta 544 (2005) 254–267.
[23] S. Van Huffel, J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis, SIAM, Philadelphia, 1991.

[24] S. Van Huffel (Ed.), Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling, SIAM, Philadelphia, 1997.
[25] S. Van Huffel, P. Lemmerling (Eds.), Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications, Kluwer Academic Publishers, Dordrecht, 2002.
[26] C. Paige, Z. Strakos, Core problems in linear algebraic systems, SIAM J. Matrix Anal. Appl. 27 (2005) 861–875.
[27] W. Fuller, Measurement Error Models, Wiley, New York, 1987.
[28] R. Degroat, E. Dowling, The data least squares problem and channel equalization, IEEE Trans. Signal Process. 41 (1991) 407–411.
[29] S. Van Huffel, J. Vandewalle, Analysis and properties of the generalized total least squares problem AX ≈ B when some or all columns in A are subject to error, SIAM J. Matrix Anal. Appl. 10 (3) (1989) 294–315.
[30] S. Van Huffel, H. Zha, The restricted total least squares problem: formulation, algorithm and properties, SIAM J. Matrix Anal. Appl. 12 (2) (1991) 292–309.
[31] R. Fierro, G. Golub, P. Hansen, D. O'Leary, Regularization by truncated total least squares, SIAM J. Sci. Comput. 18 (1) (1997) 1223–1241.
[32] G. Golub, P. Hansen, D. O'Leary, Tikhonov regularization and total least squares, SIAM J. Matrix Anal. Appl. 21 (1) (1999) 185–194.
[33] D. Sima, S. Van Huffel, G. Golub, Regularized total least squares based on quadratic eigenvalue problem solvers, BIT Numer. Math. 44 (2004) 793–812.
[34] D. Sima, S. Van Huffel, Appropriate cross-validation for regularized errors-in-variables linear models, in: Proceedings of the COMPSTAT 2004 Symposium, Prague, Czech Republic, Physica-Verlag, Springer, Wurzburg, Berlin, August 2004.
[35] A. Beck, A. Ben-Tal, On the solution of the Tikhonov regularization of the total least squares, SIAM J. Optim. 17 (1) (2006) 98–118.
[36] L. El Ghaoui, H. Lebret, Robust solutions to least-squares problems with uncertain data, SIAM J. Matrix Anal. Appl. 18 (1997) 1035–1064.
[37] S. Chandrasekaran, G. Golub, M. Gu, A. Sayed, Parameter estimation in the presence of bounded data uncertainties, SIAM J. Matrix Anal. Appl. 19 (1998) 235–252.
[38] A. Kukush, S. Van Huffel, Consistency of elementwise-weighted total least squares estimator in a multivariate errors-in-variables model AX = B, Metrika 59 (1) (2004) 75–97.
[39] K. Arun, A unitarily constrained total least-squares problem in signal-processing, SIAM J. Matrix Anal. Appl. 13 (1992) 729–745.
[40] G. Golub, C. Van Loan, Matrix Computations, third ed., Johns Hopkins University Press, Baltimore, MD, 1996.
[41] T. Abatzoglou, J. Mendel, G. Harada, The constrained total least squares technique and its application to harmonic superresolution, IEEE Trans. Signal Process. 39 (1991) 1070–1087.
[42] B. De Moor, Structured total least squares and L2 approximation problems, Linear Algebra Appl. 188–189 (1993) 163–207.
[43] A. Kukush, I. Markovsky, S. Van Huffel, Consistency of the structured total least squares estimator in a multivariate errors-in-variables model, J. Statist. Plann. Inference 133 (2) (2005) 315–358.
[44] A. Beck, A. Ben-Tal, A global solution for the structured total least squares problem with block circulant matrices, SIAM J. Matrix Anal. Appl. 27 (1) (2006) 238–255.
[45] N. Younan, X. Fan, Signal restoration via the regularized constrained total least squares, Signal Processing 71 (1998) 85–93.
[46] N. Mastronardi, P. Lemmerling, S. Van Huffel, Fast regularized structured total least squares algorithm for solving the basic deconvolution problem, Numer. Linear Algebra Appl. 12 (2–3) (2005) 201–209.
[47] V. Mesarović, N. Galatsanos, A. Katsaggelos, Regularized constrained total least squares image restoration, IEEE Trans. Image Process. 4 (8) (1995) 1096–1108.
[48] M. Ng, R. Plemmons, F. Pimentel, A new approach to constrained total least squares image restoration, Linear Algebra Appl. 316 (1–3) (2000) 237–258.
[49] M. Ng, J. Koo, N. Bose, Constrained total least squares computations for high resolution image reconstruction with multisensors, Int. J. Imaging Systems Technol. 12 (2002) 35–42.
[50] J. Rosen, H. Park, J. Glick, Structured total least norm for nonlinear problems, SIAM J. Matrix Anal. Appl. 20 (1) (1998) 14–30.
[51] P. Lemmerling, S. Van Huffel, B. De Moor, The structured total least squares approach for nonlinearly structured matrices, Numer. Linear Algebra Appl. 9 (1–4) (2002) 321–332.
[52] M. Mühlich, R. Mester, The role of total least squares in motion analysis, in: H. Burkhardt (Ed.), Proceedings of the Fifth European Conference on Computer Vision, Springer, Berlin, 1998, pp. 305–321.
[53] A. Pruessner, D. O'Leary, Blind deconvolution using a regularized structured total least norm algorithm, SIAM J. Matrix Anal. Appl. 24 (4) (2003) 1018–1037.
[54] N. Mastronardi, P. Lemmerling, A. Kalsi, D. O'Leary, S. Van Huffel, Implementation of the regularized structured total least squares algorithms for blind image deblurring, Linear Algebra Appl. 391 (2004) 203–221.
[55] H. Fu, J. Barlow, A regularized structured total least squares algorithm for high-resolution image reconstruction, Linear Algebra Appl. 391 (1) (2004) 75–98.
[56] P. Lemmerling, N. Mastronardi, S. Van Huffel, Efficient implementation of a structured total least squares based speech compression method, Linear Algebra Appl. 366 (2003) 295–315.
[57] K. Hermus, W. Verhelst, P. Lemmerling, P. Wambacq, S. Van Huffel, Perceptual audio modeling with exponentially damped sinusoids, Signal Processing 85 (2005) 163–176.
[58] P. Verboven, P. Guillaume, B. Cauberghe, E. Parloo, S. Vanlanduit, Frequency-domain generalized total least-squares identification for modal analysis, J. Sound Vib. 278 (1–2) (2004) 21–38.
[59] A. Yeredor, Multiple delays estimation for chirp signals using structured total least squares, Linear Algebra Appl. 391 (2004) 261–286.
[60] B. De Moor, J. David, Total linear least squares and the algebraic Riccati equation, Control Lett. 18 (5) (1992) 329–337.
[61] B. De Moor, Total least squares for affinely structured matrices and the noisy realization problem, IEEE Trans. Signal Process. 42 (11) (1994) 3104–3113.

[62] B. Roorda, C. Heij, Global total least squares modeling of multivariate time series, IEEE Trans. Automat. Control 40 (1) (1995) 50–63.
[63] P. Lemmerling, B. De Moor, Misfit versus latency, Automatica 37 (2001) 2057–2067.
[64] R. Pintelon, P. Guillaume, G. Vandersteen, Y. Rolain, Analyses, development, and applications of TLS algorithms in frequency domain system identification, SIAM J. Matrix Anal. Appl. 19 (4) (1998) 983–1004.
[65] I. Markovsky, J.C. Willems, S. Van Huffel, B. De Moor, R. Pintelon, Application of structured total least squares for system identification and model reduction, IEEE Trans. Automat. Control 50 (10) (2005) 1490–1500.
[66] R. Branham, Multivariate orthogonal regression in astronomy, Celestial Mech. Dyn. Astron. 61 (3) (1995) 239–251.
[67] T. Söderström, Errors-in-variables methods in system identification, in: 14th IFAC Symposium on System Identification, Newcastle, Australia, March 2006, pp. 29–41.
[68] T. Laudadio, N. Mastronardi, L. Vanhamme, P. Van Hecke, S. Van Huffel, Improved Lanczos algorithms for blackbox MRS data quantitation, J. Magn. Res. 157 (2002) 292–297.
[69] T. Laudadio, Y. Selen, L. Vanhamme, P. Stoica, P. Van Hecke, S. Van Huffel, Subspace-based MRS data quantitation of multiplets using prior knowledge, J. Magn. Res. 168 (2004) 53–65.
[70] R. Fierro, E. Jiang, Lanczos and the Riemannian SVD in information retrieval applications, Numer. Linear Algebra Appl. 12 (2005) 355–372.
[71] M. Schuermans, P. Lemmerling, L. De Lathauwer, S. Van Huffel, The use of total least squares data fitting in the shape from moments problem, Signal Process. 86 (2006) 1109–1115.
[72] L. Zhi, Z. Yang, Computing approximate GCD of univariate polynomials by structure total least norm, in: MM Research Preprints, vol. 24, Acad. Sin., 2004, pp. 375–387.
[73] I. Markovsky, S. Van Huffel, An algorithm for approximate common divisor computation, in: Proceedings of the 17th Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, 2006, pp. 274–279.
[74] C. Eckart, G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1 (1936) 211–218.
[75] S. Van Huffel, J. Vandewalle, The total least squares technique: computation, properties and applications, in: E. Deprettere (Ed.), SVD and Signal Processing: Algorithms, Applications and Architectures, Elsevier, Amsterdam, 1988, pp. 189–207.
[76] A. Premoli, M.-L. Rastello, The parametric quadratic form method for solving TLS problems with elementwise weighting, in: S. Van Huffel, P. Lemmerling (Eds.), Total Least Squares Techniques and Errors-in-Variables Modeling: Analysis, Algorithms and Applications, Kluwer Academic Publishers, Dordrecht, 2002, pp. 67–76.
[77] I. Markovsky, M.-L. Rastello, A. Premoli, A. Kukush, S. Van Huffel, The element-wise weighted total least squares problem, Comput. Statist. Data Anal. 50 (1) (2005) 181–209.
[78] J. Manton, R. Mahony, Y. Hua, The geometry of weighted low-rank approximations, IEEE Trans. Signal Process. 51 (2) (2003) 500–514.
[79] J.C. Willems, P. Rapisarda, I. Markovsky, B. De Moor, A note on persistency of excitation, Control Lett. 54 (4) (2005) 325–329.
[80] M. Aoki, P. Yue, On a priori error estimates of some identification methods, IEEE Trans. Automat. Control 15 (5) (1970) 541–548.
[81] J. Cadzow, Signal enhancement – a composite property mapping algorithm, IEEE Trans. Signal Process. 36 (1988) 49–62.
[82] Y. Bresler, A. Macovski, Exact maximum likelihood parameter estimation of superimposed exponential signals in noise, IEEE Trans. Acoust. Speech Signal Process. 34 (1986) 1081–1089.
[83] D. Tufts, A. Shah, Estimation of a signal waveform from noisy data using low-rank approximation to a data matrix, IEEE Trans. Signal Process. 41 (4) (1993) 1716–1721.
[84] D. Tufts, R. Kumaresan, Estimation of frequencies of multiple sinusoids: making linear prediction perform like maximum likelihood, Proc. IEEE 70 (9) (1982) 975–989.
[85] R. Kumaresan, D. Tufts, Estimating the parameters of exponentially damped sinusoids and pole-zero modeling in noise, IEEE Trans. Acoust. Speech Signal Process. 30 (6) (1982) 833–840.
[86] J. Rosen, H. Park, J. Glick, Total least norm formulation and solution of structured problems, SIAM J. Matrix Anal. Appl. 17 (1996) 110–126.
[87] M. Schuermans, P. Lemmerling, S. Van Huffel, Structured weighted low rank approximation, Numer. Linear Algebra Appl. 11 (2004) 609–618.
[88] M. Schuermans, P. Lemmerling, S. Van Huffel, Block-row Hankel weighted low rank approximation, Numer. Linear Algebra Appl. 13 (2006) 293–302.
[89] S. Van Huffel, H. Park, J. Rosen, Formulation and solution of structured total least norm problems for parameter estimation, IEEE Trans. Signal Process. 44 (10) (1996) 2464–2474.
[90] P. Lemmerling, N. Mastronardi, S. Van Huffel, Fast algorithm for solving the Hankel/Toeplitz structured total least squares problem, Numer. Algorithms 23 (2000) 371–392.
[91] N. Mastronardi, P. Lemmerling, S. Van Huffel, Fast structured total least squares algorithm for solving the basic deconvolution problem, SIAM J. Matrix Anal. Appl. 22 (2000) 533–553.
[92] N. Mastronardi, Fast and reliable algorithms for structured total least squares and related matrix problems, Ph.D. Thesis, ESAT/SISTA, K.U. Leuven, 2001.
[93] I. Markovsky, S. Van Huffel, R. Pintelon, Block-Toeplitz/Hankel structured total least squares, SIAM J. Matrix Anal. Appl. 26 (4) (2005) 1083–1099.
[94] I. Markovsky, S. Van Huffel, A. Kukush, On the computation of the structured total least squares estimator, Numer. Linear Algebra Appl. 11 (2004) 591–608.
[95] I. Markovsky, S. Van Huffel, On weighted structured total least squares, in: I. Lirkov, S. Margenov, J. Waśniewski (Eds.), Large-Scale Scientific Computing, Lecture Notes in Computer Science, vol. 3743, Springer, Berlin, 2006, pp. 695–702.
[96] I. Markovsky, S. Van Huffel, High-performance numerical algorithms and software for structured total least squares, J. Comput. Appl. Math. 180 (2) (2005) 311–331.