Total Least Squares Approach in Regression Methods
WDS'08 Proceedings of Contributed Papers, Part I, 88–93, 2008. ISBN 978-80-7378-065-4 © MATFYZPRESS

M. Pešta
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic.

Abstract. Total least squares (TLS) is a data modelling technique which can be used for many types of statistical analysis, e.g., regression. In the regression setup, both the dependent and the independent variables are considered to be measured with errors. Thereby, the TLS approach in statistics is sometimes called errors-in-variables (EIV) modelling and, moreover, this type of regression is usually known as orthogonal regression. We consider an EIV regression model. Necessary algebraic tools are introduced in order to construct the TLS estimator. A comparison with the classical ordinary least squares estimator is illustrated. Consequently, the existence and uniqueness of the TLS estimator are discussed. Finally, we show the large sample properties of the TLS estimator, i.e., strong and weak consistency and an asymptotic distribution.

Introduction

Observing several characteristics (which may be thought of as variables) naturally raises the question: "What is the relationship between these measured characteristics?" One of many possible attitudes is that some of the characteristics might be explained by a (functional) dependence on the other characteristics. Therefore, we regard the first-mentioned variables as dependent, or response, variables and the second ones as independent, or explanatory, variables. Our proposed model of dependence contains errors in the response variable (we think only of one dependent variable) as well as in the explanatory variables. First, however, we simply try to find an appropriate fit for some points in Euclidean space using a hyperplane, i.e., we approximate several incompatible linear relations. Afterwards, some assumptions on the measurement errors are added and, hence, several asymptotic statistical properties are developed.

Overdetermined System

Let us consider the overdetermined system of linear relations

$$ y \approx X\beta, \qquad y \in \mathbb{R}^n, \quad X \in \mathbb{R}^{n \times m}, \quad n > m. $$ (1)

The relations in (1) are deliberately not denoted as equations because, in many cases, an exact solution need not exist. Thereby, only an approximation can be found. Hence, one can speak about the "best" solution of the overdetermined system (1). But "best" in which way?

Singular Value Decomposition

Before inquiring into an appropriate solution of (1), we should introduce some very important tools for further exploration.

Theorem (Singular Value Decomposition, SVD). If $A \in \mathbb{R}^{n \times m}$, then there exist orthonormal matrices $U = [u_1, \ldots, u_n] \in \mathbb{R}^{n \times n}$ and $V = [v_1, \ldots, v_m] \in \mathbb{R}^{m \times m}$ such that

$$ U^\top A V = \Sigma = \operatorname{diag}\{\sigma_1, \ldots, \sigma_p\} \in \mathbb{R}^{n \times m}, \qquad \sigma_1 \geq \cdots \geq \sigma_p \geq 0, \quad p = \min\{n, m\}. $$ (2)

Proof. See Golub and Van Loan [1996].

In the SVD, the diagonal matrix $\Sigma$ is uniquely determined by $A$ (though the matrices $U$ and $V$ are not). This powerful matrix decomposition allows us to define a cutting point $r$ for a given matrix $A \in \mathbb{R}^{n \times m}$ using its singular values $\sigma_i$:

$$ \sigma_1 \geq \cdots \geq \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0, \qquad p = \min\{n, m\}. $$

Since the matrices $U$ and $V$ in (2) are orthonormal, it follows that $\operatorname{rank}(A) = r$, and one may obtain a dyadic decomposition (expansion) of the matrix $A$:

$$ A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top. $$ (3)
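The decomposition (2) and the expansion (3) can be checked numerically. The following minimal numpy sketch (with arbitrary illustrative data, not taken from the paper) computes the SVD, determines the cutting point $r$ from the singular values, and rebuilds the matrix from its rank-one terms:

```python
import numpy as np

# Illustrative 6x3 matrix; any real matrix works here.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

# SVD (2): A = U diag(sigma) V^T with sigma_1 >= ... >= sigma_p >= 0.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
p = min(A.shape)

# Cutting point r: the number of (numerically) nonzero singular values.
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))

# Dyadic expansion (3): A = sum_{i=1}^r sigma_i u_i v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print("rank r =", r, " reconstruction error =", np.abs(A - A_rebuilt).max())
```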
A suitable matrix norm is also required; hence, the Frobenius norm of a matrix $A \equiv (a_{ij})_{i,j=1}^{n,m}$ is defined as follows:

$$ \|A\|_F := \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2} = \sqrt{\operatorname{tr}(A^\top A)} = \sqrt{\sum_{i=1}^{p} \sigma_i^2} = \sqrt{\sum_{i=1}^{r} \sigma_i^2}, \qquad p = \min\{n, m\}. $$ (4)

Furthermore, the following approximation theorem plays the main role in the forthcoming derivation, where a matrix is approximated by another matrix of lower rank.

Theorem (Eckart-Young-Mirsky Matrix Approximation). Let the SVD of $A \in \mathbb{R}^{n \times m}$ be given by $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$ with $\operatorname{rank}(A) = r$. If $k < r$ and $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top$, then

$$ \min_{\operatorname{rank}(B) = k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}. $$ (5)

Proof. See Eckart and Young [1936] and Mirsky [1960].

Above all, one more technical property needs to be incorporated.

Theorem (Sturm Interlacing Property). Let $n \geq m$ and let the singular values of $A \in \mathbb{R}^{n \times m}$ be $\sigma_1 \geq \cdots \geq \sigma_m$. If $B$ results from $A$ by deleting one column of $A$ and $B$ has singular values $\sigma_1' \geq \cdots \geq \sigma_{m-1}'$, then

$$ \sigma_1 \geq \sigma_1' \geq \sigma_2 \geq \sigma_2' \geq \cdots \geq \sigma_{m-1}' \geq \sigma_m \geq 0. $$ (6)

Proof. See Thompson [1972].

Total Least Squares Solution

Now, three basic ways of approximating the overdetermined system (1) are suggested. The traditional approach penalizes only the misfit in the dependent variable,

$$ \min_{\varepsilon \in \mathbb{R}^n,\, \beta \in \mathbb{R}^m} \|\varepsilon\|_2 \quad \text{s.t.} \quad y + \varepsilon = X\beta, $$ (7)

and is called ordinary least squares (OLS). Here, the data matrix $X$ is regarded as exactly known and errors occur only in the vector $y$. The opposite case to OLS is represented by data least squares (DLS), which allows corrections only in the explanatory variables (the independent input data):

$$ \min_{\Theta \in \mathbb{R}^{n \times m},\, \beta \in \mathbb{R}^m} \|\Theta\|_F \quad \text{s.t.} \quad y = (X + \Theta)\beta. $$ (8)

Finally, we concentrate on the total least squares approach, which minimizes the squared errors in the values of both the dependent and the independent variables:

$$ \min_{[\varepsilon, \Xi] \in \mathbb{R}^{n \times (m+1)},\, \beta \in \mathbb{R}^m} \|[\varepsilon, \Xi]\|_F \quad \text{s.t.} \quad y + \varepsilon = (X + \Xi)\beta. $$ (9)

A graphical illustration of the three previous cases can be found in Figure 1. One may notice that TLS "searches" for the orthogonal projection of the observed data onto the unknown approximation corresponding to a TLS solution. Once a minimizer $[\hat\varepsilon, \hat\Xi]$ of the TLS problem (9) is found, any $\beta$ satisfying $y + \hat\varepsilon = (X + \hat\Xi)\beta$ is called a TLS solution. The "basic" form of the TLS solution was investigated for the first time by Golub and Van Loan [1980].

[Figure 1: four panels titled OLS, DLS, TLS, and Various Least Squares Fit, each showing three data points and the corresponding fitted line on axes ranging from 0 to 5.]

Figure 1. Various least squares fits (ordinary, data, and total LS) for the same three data points in the two-dimensional plane, corresponding to the regression setup of one response and one explanatory variable.
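As a numerical counterpart of Figure 1, the sketch below fits a no-intercept line through three illustrative points (not necessarily those used in the figure) by OLS (7) and by TLS (9); the TLS solution is taken from the last right singular vector of $[y, X]$, a construction made precise in the theorem that follows.

```python
import numpy as np

# Three illustrative data points; one explanatory variable, no intercept.
x = np.array([1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 4.0])
X = x.reshape(-1, 1)                      # n x m design matrix with m = 1

# OLS (7): corrections are allowed only in y.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# TLS (9): corrections in both y and X; use the last right singular
# vector of the augmented matrix [y, X].
_, _, Vt = np.linalg.svd(np.column_stack([y, X]))
v = Vt[-1, :]                             # v_{m+1}
beta_tls = -v[1:] / v[0]

print("OLS slope:", float(beta_ols[0]))
print("TLS slope:", float(beta_tls[0]))
```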
Theorem (TLS Solution of $y \approx X\beta$). Let the SVD of $X \in \mathbb{R}^{n \times m}$ be given by $X = \sum_{i=1}^{m} \sigma_i' u_i' v_i'^\top$ and the SVD of $[y, X]$ by $[y, X] = \sum_{i=1}^{m+1} \sigma_i u_i v_i^\top$. If $\sigma_m' > \sigma_{m+1}$, then

$$ [\hat y, \hat X] := [y + \hat\varepsilon, X + \hat\Xi] = U \hat\Sigma V^\top, \qquad \hat\Sigma = \operatorname{diag}\{\sigma_1, \ldots, \sigma_m, 0\}, $$ (10)

with the corresponding TLS correction matrix

$$ [\hat\varepsilon, \hat\Xi] = -\sigma_{m+1} u_{m+1} v_{m+1}^\top, $$ (11)

solves the TLS problem, and

$$ \hat\beta = \frac{-1}{e_1^\top v_{m+1}} \left[ v_{2,m+1}, \ldots, v_{m+1,m+1} \right]^\top $$ (12)

exists and is the unique solution to $\hat y = \hat X \beta$.

Proof. We first show by contradiction that $e_1^\top v_{m+1} \neq 0$. Suppose $v_{1,m+1} = 0$; then there exists a unit-norm vector $0 \neq w \in \mathbb{R}^m$ such that

$$ [y, X]^\top [y, X] \begin{pmatrix} 0 \\ w \end{pmatrix} = \sigma_{m+1}^2 \begin{pmatrix} 0 \\ w \end{pmatrix}, $$

which yields $w^\top X^\top X w = \sigma_{m+1}^2$. But this contradicts the assumption $\sigma_m' > \sigma_{m+1}$, since $\sigma_m'^2$ is the smallest eigenvalue of $X^\top X$.

The Sturm interlacing theorem (6) and the assumption $\sigma_m' > \sigma_{m+1}$ yield $\sigma_m > \sigma_{m+1}$. Therefore, $\sigma_{m+1}$ is not a repeated singular value of $[y, X]$ and $\sigma_m > 0$.

If $\sigma_{m+1} \neq 0$, then $\operatorname{rank}[y, X] = m + 1$. We want to find $[\hat y, \hat X]$ such that $\|[y, X] - [\hat y, \hat X]\|_F$ is minimal and $[\hat y, \hat X][-1, \beta^\top]^\top = 0$ for some $\beta$. Therefore, $\operatorname{rank}([\hat y, \hat X]) = m$ and, applying the Eckart-Young-Mirsky theorem (5), one easily obtains the SVD of $[\hat y, \hat X]$ in (10) and the TLS correction matrix (11), which must have rank one. Now it is clear that the TLS solution is given by the last column of $V$. Finally, since $\dim \operatorname{Ker}([\hat y, \hat X]) = 1$, the TLS solution (12) must be unique.

If $\sigma_{m+1} = 0$, then $v_{m+1} \in \operatorname{Ker}([y, X])$ and $[y, X][-1, \beta^\top]^\top = 0$. Hence, no approximation is needed, the overdetermined system (1) is compatible, and the exact TLS solution is given by (12). Uniqueness of this TLS solution follows from the fact that $[-1, \beta^\top]^\top \perp \operatorname{Range}([y, X]^\top)$.

A closed-form expression for the TLS solution (12) can be derived. If $\sigma_m' > \sigma_{m+1}$, the existence and uniqueness of the TLS solution have already been shown. Thereby, since the singular vectors $v_i$ appearing in (10) are eigenvectors of $[y, X]^\top [y, X]$, the estimator $\hat\beta$ also satisfies

$$ [y, X]^\top [y, X] \begin{pmatrix} -1 \\ \hat\beta \end{pmatrix} = \begin{pmatrix} y^\top y & y^\top X \\ X^\top y & X^\top X \end{pmatrix} \begin{pmatrix} -1 \\ \hat\beta \end{pmatrix} = \sigma_{m+1}^2 \begin{pmatrix} -1 \\ \hat\beta \end{pmatrix} $$

and, hence,

$$ \hat\beta = \left( X^\top X - \sigma_{m+1}^2 I_m \right)^{-1} X^\top y. $$ (13)

The previous equation reminds us of the form of an estimator in the ridge regression setup. Therefore, one may expect to avoid the multicollinearity problems of classical OLS regression (7), due to the correspondence between ridge regression and the TLS "orthogonal" regression.
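The equivalence of (12) and (13) is easy to verify numerically. The following sketch (with simulated data used only for illustration) computes the TLS estimator both from the last right singular vector of $[y, X]$ and from the closed form (13), and checks that the two agree:

```python
import numpy as np

# Simulated illustrative data: n observations, m explanatory variables.
rng = np.random.default_rng(1)
n, m = 50, 3
X = rng.standard_normal((n, m))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# TLS estimate via the last right singular vector of [y, X], cf. (12).
_, s, Vt = np.linalg.svd(np.column_stack([y, X]))
v = Vt[-1, :]
beta_svd = -v[1:] / v[0]

# Closed form (13): (X^T X - sigma_{m+1}^2 I_m)^{-1} X^T y,
# where sigma_{m+1} is the smallest singular value of [y, X].
sigma_last = s[-1]
beta_closed = np.linalg.solve(X.T @ X - sigma_last**2 * np.eye(m), X.T @ y)

print("agreement of (12) and (13):", np.allclose(beta_svd, beta_closed))
```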