Chapter 8 Linear Least Squares Problems

Of all the principles that can be proposed, I think there is none more general, more exact, and more easy of application than that which consists of rendering the sum of squares of the errors a minimum. —Adrien-Marie Legendre, Nouvelles méthodes pour la détermination des orbites des comètes. Paris 1805

8.1 Preliminaries

8.1.1 The Least Squares Principle

A fundamental task in scientific computing is to estimate parameters in a mathematical model from collected data which are subject to errors. The influence of the errors can be reduced by using a greater number of data than the number of unknowns. If the model is linear, the resulting problem is then to "solve" an in general inconsistent linear system $Ax = b$, where $A \in \mathbb{R}^{m\times n}$ and $m \ge n$. In other words, we want to find a vector $x \in \mathbb{R}^n$ such that $Ax$ is in some sense the "best" approximation to the known vector $b \in \mathbb{R}^m$.

There are many possible ways of defining the "best" solution to an inconsistent linear system. A choice which can often be motivated for statistical reasons (see Theorem 8.1.6) and also leads to a simple computational problem is the following: Let $x$ be a vector which minimizes the Euclidean length of the residual vector $r = b - Ax$; i.e., a solution to the minimization problem

  $\min_x \|Ax - b\|_2$,   (8.1.1)

where $\|\cdot\|_2$ denotes the Euclidean vector norm. Note that this problem is equivalent to minimizing the sum of squares of the residuals $\sum_{i=1}^m r_i^2$. Hence, we call (8.1.1) a linear least squares problem and any minimizer $x$ a least squares solution of the system $Ax = b$.
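A minimal numerical illustration of (8.1.1), assuming NumPy is available; the data below are invented, and `np.linalg.lstsq` is used as a black-box least squares solver.

```python
import numpy as np

# A small overdetermined system Ax = b with m = 4 equations, n = 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 2.9, 4.2, 4.8])

# Minimize ||Ax - b||_2; lstsq returns a least squares solution.
x, res, rank, sigma = np.linalg.lstsq(A, b, rcond=None)

r = b - A @ x                      # residual vector
print("x =", x)
print("||r||_2 =", np.linalg.norm(r))
print("A^T r =", A.T @ r)          # close to zero: r is orthogonal to range(A)
```

The last line anticipates the characterization of Theorem 8.1.1: at the minimizer the residual is orthogonal to the column space of $A$.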


Example 8.1.1.
Consider a model described by a scalar function $y(t) = f(x,t)$, where $x \in \mathbb{R}^n$ is a parameter vector to be determined from measurements $(y_i, t_i)$, $i = 1:m$, $m > n$. In particular, let $f(x,t)$ be linear in $x$,

  $f(x,t) = \sum_{j=1}^{n} x_j \phi_j(t)$.

Then the equations $y_i = \sum_{j=1}^{n} x_j \phi_j(t_i)$, $i = 1:m$, form an overdetermined system, which can be written in the form $Ax = b$, where $a_{ij} = \phi_j(t_i)$ and $b_i = y_i$.
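A short sketch of this construction, with an invented choice of basis functions $\phi_j$ and synthetic measurements:

```python
import numpy as np

# Basis functions phi_j(t); the choice (1, t, t^2) is just an example.
phi = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

t = np.linspace(0.0, 1.0, 11)                  # measurement points t_i, m = 11
y = 1.0 + 2.0 * t - 0.5 * t**2                 # synthetic "measurements" y_i

# Build A with a_ij = phi_j(t_i) and solve the overdetermined system Ax = y.
A = np.column_stack([p(t) for p in phi])
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)                                       # recovers (1, 2, -0.5)
```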


Figure 8.1.1. Geometric characterization of the least squares solution.

We shall see that a least squares solution $x$ is characterized by $r \perp \mathcal{R}(A)$, where $\mathcal{R}(A)$ is the range space of $A$. The residual vector $r$ is always uniquely determined and the solution $x$ is unique if and only if $\mathrm{rank}(A) = n$, i.e., when $A$ has linearly independent columns. If $\mathrm{rank}(A) < n$, we seek the unique least squares solution of minimum Euclidean norm.
We now show a necessary and sufficient condition for a vector $x$ to minimize $\|b - Ax\|_2$.

Theorem 8.1.1.
Given the matrix $A \in \mathbb{R}^{m\times n}$ and a vector $b \in \mathbb{R}^m$. The vector $x$ minimizes $\|b - Ax\|_2$ if and only if the residual vector $r = b - Ax$ is orthogonal to $\mathcal{R}(A)$, i.e. $A^T(b - Ax) = 0$, or equivalently $x$ satisfies the normal equations

  $A^TAx = A^Tb$.   (8.1.2)

Proof. Let $x$ be a vector for which $A^T(b - Ax) = 0$. Then for any $y \in \mathbb{R}^n$, $b - Ay = (b - Ax) + A(x - y)$. Squaring this and using $A^T(b - Ax) = 0$ we obtain

  $\|b - Ay\|_2^2 = \|b - Ax\|_2^2 + \|A(x - y)\|_2^2 \ge \|b - Ax\|_2^2$.

On the other hand, assume that $A^T(b - Ax) = z \ne 0$. Then if $x - y = -\epsilon z$ we have for sufficiently small $\epsilon > 0$,

  $\|b - Ay\|_2^2 = \|b - Ax\|_2^2 - 2\epsilon\|z\|_2^2 + \epsilon^2\|Az\|_2^2 < \|b - Ax\|_2^2$,

so $x$ does not minimize $\|b - Ax\|_2$.

The matrix $A^TA \in \mathbb{R}^{n\times n}$ is symmetric and positive semidefinite since

  $x^TA^TAx = \|Ax\|_2^2 \ge 0$.

The normal equations $A^TAx = A^Tb$ are always consistent since $A^Tb \in \mathcal{R}(A^T) = \mathcal{R}(A^TA)$ and therefore a least squares solution always exists. Any solution to the normal equations is a least squares solution. By Theorem 8.1.1 any least squares solution $x$ will decompose the right hand side $b$ into two orthogonal components

  $b = Ax + r, \qquad r \perp Ax$.   (8.1.3)

Here $Ax = b - r = P_{\mathcal{R}(A)}b$ is the orthogonal projection of $b$ (see Section 8.1.3) onto $\mathcal{R}(A)$ and $r \in \mathcal{N}(A^T)$ (cf. Figure 8.1.1). Note that although the least squares solution $x$ may not be unique, the decomposition in (8.1.3) always is unique.
We now introduce a related problem. Suppose that the vector $y \in \mathbb{R}^m$ is required to satisfy exactly the $n$ equations $A^Ty = c$, and consider the problem

  $\min \|y\|_2$ subject to $A^Ty = c$.   (8.1.4)

Let $y$ be any solution of $A^Ty = c$, and write $y = y_1 + y_2$, where $y_1 \in \mathcal{R}(A)$, $y_2 \in \mathcal{N}(A^T)$. Then $A^Ty_2 = 0$ and hence $y_1$ is also a solution. Since $y_1 \perp y_2$ we have

  $\|y_1\|_2^2 = \|y\|_2^2 - \|y_2\|_2^2 \le \|y\|_2^2$,

with equality only if $y_2 = 0$. This shows that the minimum norm solution lies in $\mathcal{R}(A)$, i.e., $y = Az$ for some $z \in \mathbb{R}^n$. Substituting this in (8.1.4) gives the normal equations $A^TAz = c$. Since $A$ has full column rank the matrix $A^TA$ is nonsingular and the solution is given by

  $y = A(A^TA)^{-1}c$.   (8.1.5)

A slightly more general problem is the conditional least squares problem

  $\min_y \|y - b\|_2$ subject to $A^Ty = c$.   (8.1.6)

By a similar argument as used above the solution satisfies $y - b \in \mathcal{R}(A)$. Setting $y = b - Az$ and substituting in $A^Ty = c$ we find that $z$ satisfies the equation

  $A^TAz = A^Tb - c$.   (8.1.7)

Hence, the unique solution to problem (8.1.6) is

  $y = (I - A(A^TA)^{-1}A^T)b + A(A^TA)^{-1}c$,   (8.1.8)

where $P_{\mathcal{N}(A^T)} = I - A(A^TA)^{-1}A^T$ is the orthogonal projection onto $\mathcal{N}(A^T)$.
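A minimal sketch of (8.1.5)–(8.1.8), assuming NumPy and an invented full-column-rank matrix $A$; the normal-equations form (8.1.7) is used directly rather than an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))     # assumed to have full column rank
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# Conditional least squares (8.1.6): z solves A^T A z = A^T b - c, cf. (8.1.7).
z = np.linalg.solve(A.T @ A, A.T @ b - c)
y = b - A @ z
print("constraint residual:", np.linalg.norm(A.T @ y - c))

# With b = 0 this reduces to the minimum-norm solution (8.1.5) of A^T y = c.
y_min = A @ np.linalg.solve(A.T @ A, c)
print("constraint residual:", np.linalg.norm(A.T @ y_min - c))
```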

Example 8.1.2.
The height $h_k = h(t_k)$ of a falling body is measured at times $t_k = t_0 + k\Delta t$, $k = 1:m$. The adjusted values $\hat h_k = h_k - y_k$ should lie on a parabola, that is, the third differences must vanish. This leads to the problem of minimizing

  $\min_y \|y - h\|_2$ subject to $A^Ty = 0$,

where ($m = 7$)

  $A^T = \begin{pmatrix} -1 & 3 & -3 & 1 & 0 & 0 & 0 \\ 0 & -1 & 3 & -3 & 1 & 0 & 0 \\ 0 & 0 & -1 & 3 & -3 & 1 & 0 \\ 0 & 0 & 0 & -1 & 3 & -3 & 1 \end{pmatrix},$

which is a conditional least squares problem.

The solution to the standard linear least squares problem $\min_x \|Ax - b\|_2$ is characterized by the two conditions $A^Tr = 0$ and $r = b - Ax$. These are $n + m$ equations

  $\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}\begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}$   (8.1.9)

for the unknowns $x$ and the residual $r$. This system is often called the augmented system for the least squares problem. The following theorem gives a unified formulation of the least squares and conditional least squares problems in terms of an augmented system.
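As an illustration, the augmented system (8.1.9) can be assembled and solved directly; a sketch with invented data, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Assemble the (m+n)-by-(m+n) augmented system (8.1.9) and solve it.
K = np.block([[np.eye(m), A],
              [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])
sol = np.linalg.solve(K, rhs)
r, x = sol[:m], sol[m:]

# The x-part agrees with the ordinary least squares solution.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ls), np.allclose(r, b - A @ x))
```

This is only a demonstration of the structure; in practice the augmented system is used for analysis and iterative refinement rather than as a primary solution method.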

Theorem 8.1.2.
Let the matrix $A \in \mathbb{R}^{m\times n}$ have full column rank and consider the symmetric linear system

  $\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}.$   (8.1.10)

Then the system is nonsingular and gives the first order conditions for the following two optimization problems:

1. Linear least squares problem

  $\min_x \tfrac12\|Ax - b\|_2^2 + c^Tx$.   (8.1.11)

2. Conditional least squares problem

  $\min_y \tfrac12\|y - b\|_2^2$, subject to $A^Ty = c$.   (8.1.12)

Proof. The system (8.1.10) can be obtained by differentiating (8.1.11) to give

  $A^T(Ax - b) + c = 0$,

and setting $y = r = b - Ax$.
The system can also be obtained by differentiating the Lagrangian

  $L(x,y) = \tfrac12 y^Ty - y^Tb + x^T(A^Ty - c)$

of (8.1.12), and equating to zero. Here $x$ is the vector of Lagrange multipliers.

The augmented system plays a key role in the perturbation analysis of least squares problems (Section 8.2.3) as well as in the iterative refinement of least squares solutions (Section 8.3.7).

8.1.2 The Gauss–Markov Model

Gauss claims he discovered the method of least squares in 1795. He used it for analyzing surveying data and for astronomical calculations. A famous example is when Gauss successfully predicted the orbit of the asteroid Ceres in 1801. Gauss [232] in 1821 put the method of least squares on a sound theoretical basis.
To describe his results we first need to introduce some concepts from statistics. Let the probability that the random variable $y \le x$ be equal to $F(x)$, where $F(x)$ is nondecreasing, right continuous, and satisfies

  $0 \le F(x) \le 1, \qquad F(-\infty) = 0, \qquad F(\infty) = 1.$

Then $F(x)$ is called the distribution function for $y$.
The expected value and the variance of $y$ are defined as the Stieltjes integrals

  $\mathcal{E}(y) = \mu = \int_{-\infty}^{\infty} y\,dF(y), \qquad \mathcal{E}(y-\mu)^2 = \sigma^2 = \int_{-\infty}^{\infty} (y-\mu)^2\,dF(y).$

If $y = (y_1,\ldots,y_n)^T$ is a vector of random variables and $\mu = (\mu_1,\ldots,\mu_n)^T$, $\mu_i = \mathcal{E}(y_i)$, then we write $\mu = \mathcal{E}(y)$. If $y_i$ and $y_j$ have the joint distribution $F(y_i,y_j)$, the covariance $\sigma_{ij}$ between $y_i$ and $y_j$ is

  $\sigma_{ij} = \mathcal{E}[(y_i-\mu_i)(y_j-\mu_j)] = \int_{-\infty}^{\infty} (y_i-\mu_i)(y_j-\mu_j)\,dF(y_i,y_j) = \mathcal{E}(y_iy_j) - \mu_i\mu_j.$

The covariance matrix $V \in \mathbb{R}^{n\times n}$ of $y$ is defined by

  $V = \mathcal{V}(y) = \mathcal{E}[(y-\mu)(y-\mu)^T] = \mathcal{E}(yy^T) - \mu\mu^T,$

where the diagonal element $\sigma_{ii}$ is the variance of $y_i$.

Definition 8.1.3.
Let $A \in \mathbb{R}^{m\times n}$ be a known matrix, $b \in \mathbb{R}^m$ a vector of observations, and $x \in \mathbb{R}^n$ an unknown parameter vector. The Gauss–Markov model is a linear statistical model, where it is assumed that a linear relationship

  $Ax = b + e, \qquad \mathcal{E}(e) = 0, \qquad \mathcal{V}(e) = \sigma^2 W,$   (8.1.13)

holds.^{28} Here $e$ is a vector of random errors, $W \in \mathbb{R}^{m\times m}$ a symmetric nonnegative definite matrix, and $\sigma^2$ an unknown constant. In the standard case the errors are assumed to be independently and identically distributed, i.e., $W = I_m$.

^{28} In statistical literature the Gauss–Markov model is written $X\beta = y + e$. We choose another notation in order to be consistent throughout the book.

We now prove some properties which will be useful in the following.

Lemma 8.1.4.
Let $B \in \mathbb{R}^{r\times n}$ be a matrix and $y$ a random vector with $\mathcal{E}(y) = \mu$ and covariance matrix $V$. Then the expected value and covariance matrix of $By$ are

  $\mathcal{E}(By) = B\mu, \qquad \mathcal{V}(By) = BVB^T.$   (8.1.14)

In the special case that $B = b^T$ is a row vector and $V = \sigma^2 I$, $\mathcal{V}(b^Ty) = \sigma^2\|b\|_2^2$.

Proof. The first property follows directly from the definition of expected value. The second follows from the relation

  $\mathcal{V}(By) = \mathcal{E}[B(y-\mu)(y-\mu)^TB^T] = B\,\mathcal{E}[(y-\mu)(y-\mu)^T]\,B^T = BVB^T.$

We make the following definitions:

Definition 8.1.5.
Let $g = c^Ty$, where $c$ is a constant vector, be a linear function of the random vector $y$. Then $c^Ty$ is an unbiased estimate of the parameter $\theta$ if $\mathcal{E}(c^Ty) = \theta$. When such a function exists, $\theta$ is called an estimable parameter. Further, it is a minimum variance (best) linear unbiased estimate of $\theta$ if $\mathcal{V}(g)$ is minimized over all such linear estimators.

Theorem 8.1.6 (The Gauss–Markov Theorem).
Consider the linear model (8.1.13) with covariance matrix $\sigma^2 I$. Let $\hat x$ be the least squares estimator, obtained by minimizing over $x$ the sum of squares $\|Ax - b\|_2^2$. Then the best linear unbiased estimator of any linear functional $\theta = c^Tx$ is $c^T\hat x$. The covariance matrix of the estimate $\hat x$ equals

  $\mathcal{V}(\hat x) = V = \sigma^2(A^TA)^{-1}.$   (8.1.15)

Furthermore, the residual vector $\hat r = b - A\hat x$ is uncorrelated with $\hat x$ and the quadratic form

  $s^2 = \frac{1}{m-n}\,\hat r^T\hat r$   (8.1.16)

is an unbiased estimate of $\sigma^2$, that is, $\mathcal{E}(s^2) = \sigma^2$.

Proof. If we set $\hat b = b + e$, then $\mathcal{E}(\hat b) = b = Ax$ and $\mathcal{V}(\hat b) = \sigma^2 I$. Consider the estimate $\hat\theta = d^T\hat b$ of the linear functional $\theta = c^Tx$. Since $\hat\theta$ is unbiased, we have

  $\mathcal{E}(\hat\theta) = d^T\mathcal{E}(\hat b) = d^TAx = c^Tx,$

which shows that $A^Td = c$. From Lemma 8.1.4 it follows that $\mathcal{V}(\hat\theta) = \sigma^2 d^Td$. Thus, we wish to minimize $d^Td$ subject to $A^Td = c$. Using the method of Lagrange multipliers, let

  $Q = d^Td - 2\lambda^T(A^Td - c),$

where $\lambda$ is a vector of Lagrange multipliers. A necessary condition for $Q$ to be a minimum is that

  $\frac{\partial Q}{\partial d} = 2(d^T - \lambda^TA^T) = 0,$

or $d = A\lambda$. Premultiplying this by $A^T$ results in $A^TA\lambda = A^Td = c$, and since $A^TA$ is nonsingular this gives $d = A\lambda = A(A^TA)^{-1}c$. This gives

  $\hat\theta = d^T\hat b = c^T(A^TA)^{-1}A^T\hat b = c^T\hat x,$

where $\hat x$ is the solution to the normal equations. But the solution of the normal equations minimizes the sum of squares $(b - Ax)^T(b - Ax)$ with respect to $x$.

Remark 8.1.1. In the literature the Gauss–Markov theorem is sometimes stated in less general forms. It is important to note that in the theorem the errors are not assumed to be normally distributed, nor are they assumed to be independent (only uncorrelated, a weaker condition). They are also not assumed to be identically distributed, but only to have zero mean and the same variance.

Remark 8.1.2. It is fairly straightforward to generalize the Gauss–Markov theorem to the complex case. The normal equations then become

  $A^HAx = A^Hb.$

This has applications in complex stochastic processes; see Miller [438].

If $\mathrm{rank}(A) < n$, then the normal equations are singular but consistent. In this case a linear functional $c^Tx$ is estimable if and only if

  $c \in \mathcal{R}(A^T).$

The residual vector $\hat r = \hat b - Ax$ of the least squares solution satisfies $A^T\hat r = 0$, i.e., $\hat r$ is orthogonal to the column space of $A$. This condition gives $n$ linear relations among the $m$ components of $\hat r$. It can be shown that the residuals $\hat r$, and therefore also $s^2$, are uncorrelated with $\hat x$, i.e.,

  $\mathcal{V}(\hat r,\hat x) = 0, \qquad \mathcal{V}(s^2,\hat x) = 0.$

An estimate of the variance of the linear functional $c^Tx$ is given by $s^2\,(c^T(A^TA)^{-1}c)$. In particular, for the components $x_i = e_i^Tx$,

  $s^2\,(e_i^T(A^TA)^{-1}e_i) = s^2\,(A^TA)^{-1}_{ii},$

the $i$th diagonal element of $(A^TA)^{-1}$.
It is often the case that the errors have a positive definite covariance matrix different from $\sigma^2 I$. The above results are easily modified to cover this case.

Theorem 8.1.7.
Consider a linear model with the error covariance matrix equal to the symmetric positive definite matrix $\mathcal{V}(e) = \sigma^2 V$. If $A$ has full column rank, then the best unbiased linear estimate is given by the solution to

  $\min_x (Ax - b)^TV^{-1}(Ax - b).$   (8.1.17)

The covariance matrix of the estimate $\hat x$ is

  $\mathcal{V}(\hat x) = \sigma^2(A^TV^{-1}A)^{-1}$   (8.1.18)

and

  $s^2 = \frac{1}{m-n}(b - A\hat x)^TV^{-1}(b - A\hat x)$   (8.1.19)

is an unbiased estimate of $\sigma^2$.

A situation that often occurs is that the error covariance matrix is a diagonal matrix,

  $V = \mathrm{diag}(v_{11}, v_{22}, \ldots, v_{mm}).$

Then (8.1.17) is called a weighted least squares problem. It can easily be transformed to the standard case by scaling the $i$th equation by $1/\sqrt{v_{ii}}$. Note that the smaller the variance, the larger weight should be given to a particular equation. It is important to note that different scalings will give different solutions, unless the system $Ax = b$ is consistent.
In the general Gauss–Markov model no assumption is made on the dimension or rank of $A$ or the rank of the covariance matrix, except that $A$ and $W$ have the same number of rows. Assume that $\mathrm{rank}(W) = k \le n$ and that $W$ is given in factored form

  $W = BB^T, \qquad B \in \mathbb{R}^{m\times k}.$   (8.1.20)

If $W$ is initially given, then $B$ can be computed as the Cholesky factor of $W$. Then the Gauss–Markov model can be replaced by the equivalent model

  $Ax = b + Bu, \qquad \mathcal{V}(u) = \sigma^2 I.$   (8.1.21)

The best linear estimate of $x$ is a solution to the constrained linear least squares problem

  $\min_{x,u} u^Tu$ subject to $b = Ax + Bu.$   (8.1.22)

Here we must require that the consistency condition

  $b \in \mathcal{R}\bigl((A,\;B)\bigr)$

is satisfied. If this does not hold, then $b$ could not have come from the linear model (8.1.21). The solution $\hat x$ to (8.1.22) may not be unique. In this case we should take $\hat x$ to be the solution of minimum norm. For a full analysis of the general models we refer to Kourouklis and Paige [385]. Solution methods for constrained linear least squares problems are treated in Section 8.6.3.
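Returning to the weighted problem with diagonal $V$ above, a minimal sketch of the row-scaling transformation, assuming NumPy and invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
v = rng.uniform(0.1, 2.0, m)       # invented error variances v_ii, V = diag(v)

# Scale row i by 1/sqrt(v_ii): rows with small variance get large weight.
d = 1.0 / np.sqrt(v)
x_w, *_ = np.linalg.lstsq(d[:, None] * A, d * b, rcond=None)

# x_w minimizes (Ax - b)^T V^{-1} (Ax - b), cf. (8.1.17) with diagonal V.
print("weighted solution:", x_w)
```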

8.1.3 Orthogonal and Oblique Projections

We have seen that the least squares solution is characterized by the property that its residual is orthogonal to its projection onto $\mathcal{R}(A)$. In this section we make a systematic study of both orthogonal and more general projection matrices.
Any matrix $P \in \mathbb{C}^{n\times n}$ such that $P^2 = P$ is called idempotent and is a projector. An arbitrary vector $v \in \mathbb{C}^n$ is decomposed in a unique way as

  $v = Pv + (I - P)v = v_1 + v_2.$   (8.1.23)

Here $v_1 = Pv$ is a projection of $v$ onto $\mathcal{R}(P)$, the column space of $P$. Since $Pv_2 = (P - P^2)v = 0$ it follows that $I - P$ is a projection onto $\mathcal{N}(P)$, the null space of $P$.
If $\lambda$ is an eigenvalue of a projector $P$, then from $P^2 = P$ it follows that $\lambda^2 = \lambda$. Hence, the eigenvalues of $P$ are either 1 or 0 and $k = \mathrm{trace}(P)$ is the rank of $P$. If $P$ is Hermitian, $P^H = P$, then

  $v_1^Hv_2 = (Pv)^H(I - P)v = v^HP(I - P)v = v^H(P - P^2)v = 0.$

In this case $v_2$ lies in the orthogonal complement of $\mathcal{R}(P)$, and $P$ is an orthogonal projector.
It can be shown that the orthogonal projector $P$ onto a given subspace $\mathcal{S}$ is unique, see Problem 8.1.4. The following property follows immediately from the Pythagorean theorem.

Lemma 8.1.8.
Let the orthogonal projection of a vector $x \in \mathbb{C}^n$ onto a subspace $\mathcal{S} \subset \mathbb{C}^n$ be $z = Px \in \mathcal{S}$. Then $z$ is the point in $\mathcal{S}$ closest to $x$.

Let $P$ be an orthogonal projector onto $\mathcal{S}$ and $U_1$ a unitary basis for $\mathcal{S}$. Then we can always find a unitary basis $U_2$ for the orthogonal complement of $\mathcal{S}$. Then $U = (U_1\;U_2)$ is unitary, $U^HU = UU^H = I$, and $P$ can be expressed in the form

  $P = U_1U_1^H, \qquad I - P = U_2U_2^H.$   (8.1.24)

For an orthogonal projector we have

  $\|Pv\|_2 = \|U_1^Hv\|_2 \le \|v\|_2 \quad \forall\, v \in \mathbb{C}^m,$   (8.1.25)

where equality holds for all vectors in $\mathcal{R}(U_1)$ and thus $\|P\|_2 = 1$. The converse is also true; a projector $P$ is orthogonal if (8.1.25) holds.
A projector $P$ such that $P \ne P^H$ is called an oblique projector. We can write the spectral decomposition

  $P = (X_1\;X_2)\begin{pmatrix} I_k & 0 \\ 0 & 0_{n-k} \end{pmatrix}\begin{pmatrix} \hat Y_1^H \\ \hat Y_2^H \end{pmatrix} = X_1\hat Y_1^H,$   (8.1.26)

where

  $\begin{pmatrix} \hat Y_1^H \\ \hat Y_2^H \end{pmatrix}(X_1\;X_2) = \begin{pmatrix} \hat Y_1^HX_1 & \hat Y_1^HX_2 \\ \hat Y_2^HX_1 & \hat Y_2^HX_2 \end{pmatrix} = \begin{pmatrix} I_k & 0 \\ 0 & I_{n-k} \end{pmatrix}.$   (8.1.27)

In particular, $\hat Y_1^HX_2 = 0$ and $\hat Y_2^HX_1 = 0$. Hence, the columns of $X_2$ form a basis for the orthogonal complement of $\mathcal{R}(\hat Y_1)$ and, similarly, the columns of $\hat Y_2$ form a basis for the orthogonal complement of $\mathcal{R}(X_1)$.
In terms of this spectral decomposition $I - P = X_2\hat Y_2^H$ and the splitting (8.1.23) can be written

  $v = X_1(\hat Y_1^Hv) + X_2(\hat Y_2^Hv) = v_1 + v_2.$   (8.1.28)

Here $v_1$ is the oblique projection of $v$ onto $\mathcal{R}(P)$ along $\mathcal{N}(P)$.
Let $Y_1$ be a unitary matrix whose columns span $\mathcal{R}(\hat Y_1)$. Then there is a nonsingular matrix $G$ such that $\hat Y_1 = Y_1G$. From (8.1.27) it follows that $G^HY_1^HX_1 = I_k$, and hence $G^H = (Y_1^HX_1)^{-1}$. Similarly, $\hat Y_2^H = (Y_2^HX_2)^{-1}Y_2^H$, where $Y_2$ is a unitary matrix whose columns span $\mathcal{R}(\hat Y_2)$. Hence, we can write

  $P = X_1(Y_1^HX_1)^{-1}Y_1^H, \qquad I - P = X_2(Y_2^HX_2)^{-1}Y_2^H.$   (8.1.29)

This shows that $\|P\|$ can be large when the matrix $Y_1^HX_1$ is ill-conditioned.

Example 8.1.3.
We illustrate the case when $n = 2$ and $k = 1$. Let the vectors $x_1$ and $y_1$ be normalized so that $\|x_1\|_2 = \|y_1\|_2 = 1$ and let $y_1^Hx_1 = \cos\theta$, where $\theta$ is the angle between $x_1$ and $y_1$. Then

  $P = x_1(y_1^Hx_1)^{-1}y_1^H = \frac{1}{\cos\theta}\,x_1y_1^H.$

Hence, $\|P\|_2 = 1/\cos\theta \ge 1$, and $\|P\|_2$ becomes very large when $y_1$ is almost orthogonal to $x_1$. When $y_1 = x_1$ we have $\theta = 0$ and $P$ is an orthogonal projection.
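A small numerical check of this example, assuming NumPy; the vectors are invented:

```python
import numpy as np

theta = 0.1                                 # angle between x1 and y1
x1 = np.array([1.0, 0.0])
y1 = np.array([np.cos(theta), np.sin(theta)])

# Oblique projector onto span(x1) along the orthogonal complement of span(y1).
P = np.outer(x1, y1) / (y1 @ x1)

print(np.allclose(P @ P, P))                # P is idempotent
print(np.linalg.norm(P, 2), 1.0/np.cos(theta))   # ||P||_2 = 1/cos(theta)
```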

8.1.4 Generalized Inverses and the SVD

The SVD introduced in Section 7.1.5 is a powerful tool both for analyzing and solving linear least squares problems. The reason for this is that the orthogonal matrices that transform $A$ to diagonal form do not change the $l_2$-norm. We have the following fundamental result.

Theorem 8.1.9.
Let $A \in \mathbb{R}^{m\times n}$ have the singular value decomposition

  $A = (U_1\;U_2)\begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix},$   (8.1.30)

where $U_1$ and $V_1$ have $r = \mathrm{rank}(A)$ columns. The least squares problem

  $\min_{x\in S}\|x\|_2, \qquad S = \{x \in \mathbb{R}^n \mid \|b - Ax\|_2 = \min\}$   (8.1.31)

always has a unique solution, which can be written as

  $x = V_1\Sigma_1^{-1}U_1^Tb.$   (8.1.32)

Proof. Using the orthogonal invariance of the $l_2$ norm we have

  $\|b - Ax\|_2 = \|U^T(b - AVV^Tx)\|_2 = \left\|\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} - \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\right\|_2 = \left\|\begin{pmatrix} c_1 - \Sigma_1z_1 \\ c_2 \end{pmatrix}\right\|_2,$

where $z_1, c_1 \in \mathbb{R}^r$ and

  $c = U^Tb = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}, \qquad z = V^Tx = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}.$

The residual norm will attain its minimum value, equal to $\|c_2\|_2$, for $z_1 = \Sigma_1^{-1}c_1$, $z_2$ arbitrary. Obviously the choice $z_2 = 0$ minimizes $\|x\|_2 = \|Vz\|_2 = \|z\|_2$.

Note that problem (8.1.31) includes as special cases the solution of both overdetermined and underdetermined linear systems. We set

  $A^\dagger = (V_1\;V_2)\begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} = V_1\Sigma_1^{-1}U_1^T \in \mathbb{R}^{n\times m}.$   (8.1.33)

We call A† the pseudo-inverse of A and x = A†b the pseudo-inverse solution of Ax = b. The pseudo-inverse solution (8.1.33) can also be written

  $x = \sum_{i=1}^{r} \frac{u_i^Tb}{\sigma_i}\,v_i.$   (8.1.34)

The following two cases are important special cases:

• In an overdetermined case where $A$ has full column rank ($r = n$) the submatrix $V_2$ is empty and the pseudo-inverse becomes

  $A^\dagger = V\Sigma_1^{-1}U_1^T.$   (8.1.35)

• In an underdetermined case where $A$ has full row rank ($r = m$) the submatrix $U_2$ is empty and the pseudo-inverse becomes

  $A^\dagger = V_1\Sigma_1^{-1}U^T.$   (8.1.36)
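A minimal sketch of the pseudo-inverse solution (8.1.34) computed from the SVD, assuming NumPy; the rank tolerance and the helper name `pinv_solve` are invented for illustration.

```python
import numpy as np

def pinv_solve(A, b, tol=1e-12):
    """Pseudo-inverse solution x = A^+ b via the SVD, cf. (8.1.34)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = np.sum(s > tol * s[0])             # numerical rank
    # x = sum_{i<=r} (u_i^T b / sigma_i) v_i
    return Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])

# A rank-deficient example: the third column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = pinv_solve(A, b)
print(x, np.allclose(x, np.linalg.pinv(A) @ b))
```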

Note that for computing the pseudo-inverse solution we only need to compute the "thin" SVD, i.e. the nonzero singular values, the matrix $V_1$ and the vector $U_1^Tb$. Methods for computing the SVD are described in Section 10.5.3 and Section 10.6.4.
The matrix $A^\dagger$ is often called the Moore–Penrose inverse. Moore [443] developed the concept of the general reciprocal, which was rediscovered by Bjerhammar [55]. Penrose [1955] gave an elegant algebraic characterization and showed that $X = A^\dagger$ is uniquely determined by the four Penrose conditions:

  (1) $AXA = A$,   (2) $XAX = X$,   (8.1.37)
  (3) $(AX)^T = AX$,   (4) $(XA)^T = XA$.   (8.1.38)

It can be directly verified that $X = A^\dagger$ given by (8.1.33) satisfies these four conditions. In particular, this shows that $A^\dagger$ does not depend on the particular choices of $U$ and $V$ in the SVD. (See also Problem 8.1.2.) The following properties of the pseudo-inverse follow easily from (8.1.33).

Theorem 8.1.10.
1. $(A^\dagger)^\dagger = A$;   2. $(A^\dagger)^H = (A^H)^\dagger$;
3. $(\alpha A)^\dagger = \alpha^\dagger A^\dagger$;   4. $(A^HA)^\dagger = A^\dagger(A^\dagger)^H$;
5. if $U$ and $V$ are unitary, $(UAV)^\dagger = V^HA^\dagger U^H$;
6. if $A = \sum_i A_i$, where $A_i^HA_j = 0$, $A_iA_j^H = 0$, $i \ne j$, then $A^\dagger = \sum_i A_i^\dagger$;
7. if $A$ is normal ($AA^H = A^HA$) then $A^\dagger A = AA^\dagger$ and $(A^n)^\dagger = (A^\dagger)^n$;
8. $A$, $A^H$, $A^\dagger$, and $A^\dagger A$ all have rank equal to $\mathrm{trace}(A^\dagger A)$.
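A quick numerical verification of the four Penrose conditions and of property 8, assuming NumPy and an invented random rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
X = np.linalg.pinv(A)

checks = [
    np.allclose(A @ X @ A, A),          # (1)
    np.allclose(X @ A @ X, X),          # (2)
    np.allclose((A @ X).T, A @ X),      # (3)
    np.allclose((X @ A).T, X @ A),      # (4)
]
print(checks)
print(np.isclose(np.trace(X @ A), np.linalg.matrix_rank(A)))   # property 8
```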

The orthogonal projections onto the four fundamental subspaces of A have the following simple expressions in terms of the pseudo-inverse :

  $P_{\mathcal{R}(A)} = AA^\dagger = U_1U_1^T, \qquad P_{\mathcal{N}(A^T)} = I - AA^\dagger = U_2U_2^T,$   (8.1.39)
  $P_{\mathcal{R}(A^T)} = A^\dagger A = V_1V_1^T, \qquad P_{\mathcal{N}(A)} = I - A^\dagger A = V_2V_2^T.$

These expressions are easily verified using the definition of an orthogonal projection and the Penrose conditions. Another useful characterization of the pseudo-inverse solution is the following:

Theorem 8.1.11. The pseudo-inverse solution x = A†b is uniquely characterized by the two geometrical conditions

  $x \in \mathcal{R}(A^T), \qquad r = b - Ax \in \mathcal{N}(A^T).$   (8.1.40)

Proof. These conditions are easily verified from (8.1.34).

In the special case that $A \in \mathbb{R}^{m\times n}$ and $\mathrm{rank}(A) = n$ it holds that

  $A^\dagger = (A^TA)^{-1}A^T, \qquad (A^T)^\dagger = A(A^TA)^{-1}.$   (8.1.41)

These expressions follow from the normal equations (8.2.2) and (8.1.5). Some prop- erties of the usual inverse can be extended to the pseudo-inverse, e.g., the relations

  $(A^\dagger)^\dagger = A, \qquad (A^T)^\dagger = (A^\dagger)^T,$

easily follow from (8.1.33). In general $(AB)^\dagger \ne B^\dagger A^\dagger$. The following theorem gives a useful sufficient condition for the relation $(AB)^\dagger = B^\dagger A^\dagger$ to hold.

Theorem 8.1.12.
If $A \in \mathbb{R}^{m\times r}$, $B \in \mathbb{R}^{r\times n}$, and $\mathrm{rank}(A) = \mathrm{rank}(B) = r$, then

  $(AB)^\dagger = B^\dagger A^\dagger = B^T(BB^T)^{-1}(A^TA)^{-1}A^T.$   (8.1.42)

Proof. The last equality follows from (8.1.41). The first equality is verified by showing that the four Penrose conditions are satisfied.

Any matrix $A^-$ satisfying the first Penrose condition

  $AA^-A = A$   (8.1.43)

is called a generalized inverse of $A$. It is also called an inner inverse or $\{1\}$-inverse. If it satisfies the second condition $A^-AA^- = A^-$ it is called an outer inverse or a $\{2\}$-inverse.
Let $A^-$ be a $\{1\}$-inverse of $A$. Then for all $b$ such that the system $Ax = b$ is consistent, $x = A^-b$ is a solution. The general solution can be written

  $x = A^-b + (I - A^-A)y, \qquad y \in \mathbb{C}^n.$

We have also

  $(AA^-)^2 = AA^-AA^- = AA^-, \qquad (A^-A)^2 = A^-AA^-A = A^-A.$

This shows that $AA^-$ and $A^-A$ are idempotent and therefore (in general oblique) projectors,

  $AA^- = P_{\mathcal{R}(A),S}, \qquad A^-A = P_{T,\mathcal{N}(A)},$

where $S$ and $T$ are some subspaces complementary to $\mathcal{R}(A)$ and $\mathcal{N}(A)$, respectively.
Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. Then $\|Ax - b\|_2$ is minimized when $x$ satisfies the normal equations $A^TAx = A^Tb$. Suppose now that a generalized inverse $A^-$ satisfies

  $(AA^-)^T = AA^-.$   (8.1.44)

Then $AA^-$ is the orthogonal projector onto $\mathcal{R}(A)$ and $A^-$ is called a least squares inverse. We have

  $A^T = (AA^-A)^T = A^TAA^-,$

which shows that $x = A^-b$ satisfies the normal equations and therefore is a least squares solution. Conversely, if $A^- \in \mathbb{R}^{n\times m}$ has the property that for all $b$, $\|Ax - b\|_2$ is smallest when $x = A^-b$, then $A^-$ is a least squares inverse.
The following dual result also holds: If $A^-$ is a generalized inverse, and

  $(A^-A)^T = A^-A,$

then $A^-A$ is the orthogonal projector orthogonal to $\mathcal{N}(A)$ and $A^-$ is called a minimum norm inverse. If $Ax = b$ is consistent, then the unique solution for which $\|x\|_2$ is smallest satisfies the normal equations

  $x = A^Tz, \qquad AA^Tz = b.$

For a minimum norm inverse we have

  $A^T = (AA^-A)^T = A^-AA^T,$

and hence $x = A^Tz = A^-AA^Tz = A^-b$, which shows that $x = A^-b$ is the solution of smallest norm. Conversely, if $A^- \in \mathbb{R}^{n\times m}$ is such that, whenever $Ax = b$ has a solution, then $x = A^-b$ is a minimum norm solution, then $A^-$ is a minimum norm inverse.
We now derive some perturbation bounds for the pseudo-inverse of a matrix $A \in \mathbb{R}^{m\times n}$. Let $B = A + E$ be the perturbed matrix. The theory is complicated by the fact that when the rank changes the perturbation in $A^\dagger$ may be unbounded when the perturbation $\|E\|_2 \to 0$. A trivial example of this is obtained by taking

  $A = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}, \qquad E = \begin{pmatrix} 0 & 0 \\ 0 & \epsilon \end{pmatrix},$

where $\sigma > 0$, $\epsilon \ne 0$. Then $1 = \mathrm{rank}(A) \ne \mathrm{rank}(A + E) = 2$,

  $A^\dagger = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}, \qquad (A + E)^\dagger = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & \epsilon^{-1} \end{pmatrix},$

and $\|(A + E)^\dagger - A^\dagger\|_2 = |\epsilon|^{-1} = 1/\|E\|_2$. This example shows that formulas derived by operating formally with pseudo-inverses may have no meaning numerically.
The perturbations for which the pseudo-inverse is well behaved can be characterized by the condition

  $\mathrm{rank}(A) = \mathrm{rank}(B) = \mathrm{rank}(P_{\mathcal{R}(A)}BP_{\mathcal{R}(A^T)}).$   (8.1.45)

The matrix $B$ is said to be an acute perturbation of $A$ if this condition holds; see Stewart [545]. In particular, we have the following result.

Theorem 8.1.13.
If $\mathrm{rank}(A + E) = \mathrm{rank}(A) = r$, and $\eta = \|A^\dagger\|_2\|E\|_2 < 1$, then

  $\|(A + E)^\dagger\|_2 \le \frac{1}{1-\eta}\,\|A^\dagger\|_2.$   (8.1.46)

Proof. From the assumption and Theorem 1.2.7 it follows that

  $1/\|(A + E)^\dagger\|_2 = \sigma_r(A + E) \ge \sigma_r(A) - \|E\|_2 = 1/\|A^\dagger\|_2 - \|E\|_2 > 0,$

which implies (8.1.46).

Let $A, B \in \mathbb{R}^{m\times n}$, and $E = B - A$. If $A$ and $B = A + E$ are square nonsingular matrices, then we have the well-known identity

  $B^{-1} - A^{-1} = -B^{-1}EA^{-1}.$

In the general case Wedin's pseudo-inverse identity (see [605]) holds:

  $B^\dagger - A^\dagger = -B^\dagger EA^\dagger + (B^TB)^\dagger E^TP_{\mathcal{N}(A^T)} + P_{\mathcal{N}(B)}E^T(AA^T)^\dagger.$   (8.1.47)

This identity can be proved by expressing the projections in terms of pseudo-inverses using the relations in (8.1.39).
Let $A = A(\alpha)$ be a matrix, where $\alpha$ is a scalar parameter. Under the assumption that $A(\alpha)$ has locally constant rank the following formula for the derivative of the pseudo-inverse $A^\dagger(\alpha)$ follows from (8.1.47):

  $\frac{dA^\dagger}{d\alpha} = -A^\dagger\frac{dA}{d\alpha}A^\dagger + (A^TA)^\dagger\frac{dA^T}{d\alpha}P_{\mathcal{N}(A^T)} + P_{\mathcal{N}(A)}\frac{dA^T}{d\alpha}(AA^T)^\dagger.$   (8.1.48)

This formula is due to Wedin [605, p. 21]. We observe that if $A$ has full row rank, then the second term vanishes; if $A$ has full column rank, then it is the third term that vanishes. The variable projection algorithm for separable nonlinear least squares is based on a related formula for the derivative of the orthogonal projector $P_{\mathcal{R}(A)}$; see Section 11.2.5.
For the case when $\mathrm{rank}(B) = \mathrm{rank}(A)$ the following theorem applies.

Theorem 8.1.14.
If $B = A + E$ and $\mathrm{rank}(B) = \mathrm{rank}(A)$, then

  $\|B^\dagger - A^\dagger\| \le \mu\,\|B^\dagger\|\,\|A^\dagger\|\,\|E\|,$   (8.1.49)

where $\mu = 1$ for the Frobenius norm $\|\cdot\|_F$, and for the spectral norm $\|\cdot\|_2$,

  $\mu = \begin{cases} \tfrac12(1 + \sqrt5) & \text{if } \mathrm{rank}(A) < \min(m,n), \\ \sqrt2 & \text{if } \mathrm{rank}(A) = \min(m,n). \end{cases}$

Proof. For the 2-norm, see Wedin [606]. The result that $\mu = 1$ for the Frobenius norm is due to van der Sluis and Veltkamp [583].

8.1.5 Matrix Approximation and the SVD

The singular value decomposition (SVD) plays a very important role in a number of least squares matrix approximation problems. In this section we have collected a number of results that will be used extensively in the following.
In the proof of Theorem 7.1.16 we showed that the largest singular value of $A$ can be characterized by

  $\sigma_1 = \max_{\|x\|_2=1}\|Ax\|_2.$

The other singular values can also be characterized by an extremal property, the minimax characterization.

Theorem 8.1.15.
Let $A \in \mathbb{R}^{m\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$, $p = \min(m,n)$, and let $S$ be a linear subspace of $\mathbb{R}^n$ of dimension $\dim(S)$. Then

  $\sigma_i = \max_{\dim(S)=i}\;\min_{x\in S,\;\|x\|_2=1}\|Ax\|_2, \qquad i = 1:p,$   (8.1.50)

and

  $\sigma_i = \min_{\dim(S)=p-i+1}\;\max_{x\in S,\;\|x\|_2=1}\|Ax\|_2, \qquad i = 1:p.$   (8.1.51)

Proof. The result follows from the relationship shown in Theorem 10.5.2 and the corresponding result for the Hermitian eigenvalue problem in Theorem 10.2.8 (Fischer’s theorem).

The minimax characterization of the singular values may be used to establish the following relations between the singular values of two matrices A and B.

Theorem 8.1.16.
Let $A, B \in \mathbb{R}^{m\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p$ and $\tau_1 \ge \tau_2 \ge \ldots \ge \tau_p$, respectively, where $p = \min(m,n)$. Then

  $\max_i |\sigma_i - \tau_i| \le \|A - B\|_2,$   (8.1.52)

  $\sum_{i=1}^{p}|\sigma_i - \tau_i|^2 \le \|A - B\|_F^2.$   (8.1.53)

Proof. See Stewart [543, pp. 321–322].

By the inequality (8.1.52) no singular value of a matrix can be perturbed more than the 2-norm of the perturbation matrix. In particular, perturbation of a single element of a matrix $A$ results in perturbations of the same, or smaller, magnitude in the singular values. This result is important for the use of the SVD to determine the "numerical rank" of a matrix; see below.

If a matrix A is modified by appending a row or a column, the singular values of the modified matrix can be shown to interlace those of A.

Theorem 8.1.17.
Let

  $\hat A = (A,\;u) \in \mathbb{R}^{m\times n}, \qquad m \ge n, \qquad u \in \mathbb{R}^m.$

Then the ordered singular values $\sigma_i$ of $A$ interlace the ordered singular values $\hat\sigma_i$ of $\hat A$ as follows:

  $\hat\sigma_1 \ge \sigma_1 \ge \hat\sigma_2 \ge \sigma_2 \ge \ldots \ge \hat\sigma_{n-1} \ge \sigma_{n-1} \ge \hat\sigma_n.$

Similarly, if $A$ is bordered by a row,

  $\hat A = \begin{pmatrix} A \\ v^T \end{pmatrix} \in \mathbb{R}^{m\times n}, \qquad m > n, \qquad v \in \mathbb{R}^n,$

then

  $\hat\sigma_1 \ge \sigma_1 \ge \hat\sigma_2 \ge \sigma_2 \ge \ldots \ge \hat\sigma_{n-1} \ge \sigma_{n-1} \ge \hat\sigma_n \ge \sigma_n.$

Proof. The theorem is a consequence of the Cauchy interlacing theorem for Hermitian matrices to be proved in Chapter 9; see Theorem 10.2.11. This says that the eigenvalues of the leading principal submatrix of order $n-1$ of a Hermitian matrix $B$ interlace those of $B$. Since

  $\begin{pmatrix} A^T \\ u^T \end{pmatrix}(A\;u) = \begin{pmatrix} A^TA & A^Tu \\ u^TA & u^Tu \end{pmatrix}, \qquad \begin{pmatrix} A \\ v^T \end{pmatrix}(A^T\;v) = \begin{pmatrix} AA^T & Av \\ v^TA^T & v^Tv \end{pmatrix},$

the result now follows from the observation that the singular values of $A$ are the positive square roots of the eigenvalues of $A^TA$ and $AA^T$.

The best approximation of a matrix A by another matrix of lower rank can be expressed in terms of the SVD of A.

Theorem 8.1.18.
Let $\mathcal{M}_k^{m\times n}$ denote the set of matrices in $\mathbb{R}^{m\times n}$ of rank $k$. Assume that $A \in \mathcal{M}_r^{m\times n}$ and consider the problem

  $\min_{X \in \mathcal{M}_k^{m\times n}}\|A - X\|, \qquad k < r.$

Then the SVD expansion of $A$ truncated to $k$ terms, $X = B = \sum_{i=1}^{k}\sigma_iu_iv_i^T$, solves this problem both for the $l_2$ norm and the Frobenius norm. Further, the minimum distance is given by

  $\|A - B\|_2 = \sigma_{k+1}, \qquad \|A - B\|_F = (\sigma_{k+1}^2 + \ldots + \sigma_r^2)^{1/2}.$

The solution is unique for the Frobenius norm but not always for the l2 norm.

Proof. Eckart and Young [183] proved it for the Frobenius norm. Mirsky [439] generalized it to unitarily invariant norms, which include the $l_2$-norm.
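A short sketch of the truncated SVD approximation and the minimum distances above, assuming NumPy and an invented random matrix; the helper name `best_rank_k` is ours.

```python
import numpy as np

def best_rank_k(A, k):
    """Truncated SVD: the nearest rank-k matrix in the 2- and Frobenius norms."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
k = 2
B = best_rank_k(A, k)
s = np.linalg.svd(A, compute_uv=False)

print(np.isclose(np.linalg.norm(A - B, 2), s[k]))               # sigma_{k+1}
print(np.isclose(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(s[k:]**2))))
```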

Let $A \in \mathbb{C}^{m\times n}$ be a matrix of rank $n$ with the "thin" SVD $A = U_1\Sigma V^H$. Since $A = U_1\Sigma V^H = (U_1V^H)(V\Sigma V^H)$, we have

  $A = PH, \qquad P = U_1V^H, \qquad H = V\Sigma V^H,$   (8.1.54)

where $P$ is unitary, $P^HP = I$, and $H \in \mathbb{C}^{n\times n}$ is Hermitian positive semidefinite. The decomposition (8.1.54) is called the polar decomposition of $A$. If $\mathrm{rank}(A) = n$, then $H$ is positive definite and the polar decomposition is unique. If the polar decomposition $A = PH$ is given, then from a spectral decomposition $H = V\Sigma V^H$ one can construct the singular value decomposition $A = (PV)\Sigma V^H$. The polar decomposition is also related to the matrix square root and sign functions; see Section 10.8.4.
The significance of the factor $P$ in the polar decomposition is that it is the unitary matrix closest to $A$.

Theorem 8.1.19.
Let $A \in \mathbb{C}^{m\times n}$ be a given matrix and $A = PH$ its polar decomposition. Then for any unitary matrix $U \in \mathbb{C}^{m\times n}$,

  $\|A - U\|_F \ge \|A - P\|_F.$

Proof. This theorem was proved for m = n and general unitarily invariant norms by Fan and Hoffman [197]. The generalization to m > n follows from the additive property of the Frobenius norm.
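A sketch of computing the polar factors from the thin SVD, assuming NumPy; the helper `polar` and the comparison matrix are invented for illustration.

```python
import numpy as np

def polar(A):
    """Polar decomposition A = P H computed from the thin SVD, cf. (8.1.54)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    P = U @ Vt                        # orthonormal factor, P^T P = I
    H = Vt.T @ np.diag(s) @ Vt        # symmetric positive semidefinite factor
    return P, H

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
P, H = polar(A)
print(np.allclose(P @ H, A), np.allclose(P.T @ P, np.eye(3)))

# P is at least as close to A as any other matrix with orthonormal columns:
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(np.linalg.norm(A - P, 'fro') <= np.linalg.norm(A - Q, 'fro'))
```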

Less well known are the optimal properties of the Hermitian polar factor $H$. Let $A \in \mathbb{C}^{n\times n}$ be a Hermitian matrix with at least one negative eigenvalue. Consider the problem of finding a perturbation $E$ such that $A + E$ is positive semidefinite.

Theorem 8.1.20.
Let $A \in \mathbb{C}^{n\times n}$ be Hermitian and $A = UH$ its polar decomposition. Set

  $B = A + E = \tfrac12(H + A), \qquad E = \tfrac12(H - A).$

Then for any positive semidefinite Hermitian matrix $X$ it holds that

  $\|A - B\|_2 \le \|A - X\|_2.$

Proof. See Higham [323].

8.1.6 Principal Angles and Distance Between Subspaces

In many applications the relationship between two given subspaces needs to be investigated. For example, in statistical models canonical correlations measure how "close" two sets of observations are. This and similar questions can be answered by computing angles between subspaces.
Let $\mathcal{F}$ and $\mathcal{G}$ be subspaces of $\mathbb{C}^n$ and assume that

  $p = \dim(\mathcal{F}) \ge \dim(\mathcal{G}) = q \ge 1.$

The smallest angle $\theta_1 = \theta_1(\mathcal{F},\mathcal{G}) \in [0,\pi/2]$ between $\mathcal{F}$ and $\mathcal{G}$ is defined by

  $\theta_1 = \min_{u\in\mathcal{F},\;\|u\|_2=1}\;\min_{v\in\mathcal{G},\;\|v\|_2=1}\theta(u,v),$

where $\theta(u,v)$ is the acute angle between $u$ and $v$. Assume that the minimum is attained for $u = u_1$ and $v = v_1$. Then $\theta_2$ is defined as the smallest angle between the orthogonal complement of $\mathcal{F}$ with respect to $u_1$ and that of $\mathcal{G}$ with respect to $v_1$. Continuing in this way until one of the subspaces is empty, we are led to the following definition:

Definition 8.1.21.
The principal angles $\theta_k \in [0,\pi/2]$ between two subspaces of $\mathbb{C}^n$ are recursively defined for $k = 1:q$ by

  $\theta_k = \min_{u\in\mathcal{F},\;\|u\|_2=1}\;\min_{v\in\mathcal{G},\;\|v\|_2=1}\theta(u,v) = \theta(u_k,v_k),$   (8.1.55)

subject to the constraints $u^Hu_j = 0$, $v^Hv_j = 0$, $j = 1:k-1$.
The vectors $u_k$ and $v_k$, $k = 1:q$, are called principal vectors of the pair of spaces.

The principal vectors are not always uniquely defined, but the principal angles are. The vectors $V = (v_1,\ldots,v_q)$ form a unitary basis for $\mathcal{G}$ and the vectors $U = (u_1,\ldots,u_q)$ can be complemented with $p - q$ unitary vectors so that $(u_1,\ldots,u_p)$ form a unitary basis for $\mathcal{F}$. It will be shown that it also holds that

  $u_j^Hv_k = 0, \qquad j \ne k, \quad j = 1:p, \quad k = 1:q.$

Assume that the subspaces $\mathcal{F}$ and $\mathcal{G}$ are defined as the ranges of the unitary matrices $Q_F \in \mathbb{C}^{n\times p}$ and $Q_G \in \mathbb{C}^{n\times q}$. The following theorem shows the relationship between the principal angles and the SVD of the matrix $Q_F^HQ_G$.

Theorem 8.1.22.
Assume that the columns of $Q_F \in \mathbb{C}^{n\times p}$ and $Q_G \in \mathbb{C}^{n\times q}$, $p \ge q$, form unitary bases for two subspaces of $\mathbb{C}^n$. Let the thin SVD of the matrix $M = Q_F^HQ_G \in \mathbb{C}^{p\times q}$ be

  $M = YCZ^H, \qquad C = \mathrm{diag}(\sigma_1,\ldots,\sigma_q),$   (8.1.56)

where $Y^HY = Z^HZ = ZZ^H = I_q$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_q$. Then the principal angles are $\theta_k = \arccos(\sigma_k)$ and the associated principal vectors are given by

  $U = Q_FY, \qquad V = Q_GZ.$   (8.1.57)

Proof. The singular values and vectors of $M$ can be characterized by the property

  $\sigma_k = \max_{\|y\|_2=\|z\|_2=1} y^HMz = y_k^HMz_k,$   (8.1.58)

subject to $y^Hy_j = z^Hz_j = 0$, $j = 1:k-1$. If we put $u = Q_Fy \in \mathcal{F}$ and $v = Q_Gz \in \mathcal{G}$, then it follows that $\|u\|_2 = \|y\|_2$, $\|v\|_2 = \|z\|_2$, and

  $u^Hu_j = y^Hy_j, \qquad v^Hv_j = z^Hz_j.$

Since $y^HMz = y^HQ_F^HQ_Gz = u^Hv$, (8.1.58) is equivalent to

  $\sigma_k = \max_{\|u\|_2=\|v\|_2=1} u^Hv = u_k^Hv_k,$

subject to $u^Hu_j = 0$, $v^Hv_j = 0$, $j = 1:k-1$. Now (8.1.57) follows directly from Definition 8.1.21.
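A minimal sketch of computing principal angles via the SVD of $Q_F^HQ_G$, assuming NumPy; the helper `principal_angles` and the test subspaces are invented.

```python
import numpy as np

def principal_angles(F, G):
    """Principal angles between range(F) and range(G), cf. Theorem 8.1.22."""
    QF, _ = np.linalg.qr(F)           # orthonormal bases
    QG, _ = np.linalg.qr(G)
    sigma = np.linalg.svd(QF.T @ QG, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

rng = np.random.default_rng(6)
F = rng.standard_normal((10, 3))
G = np.column_stack([F[:, 0], rng.standard_normal((10, 2))])  # shares one direction
print(principal_angles(F, G))         # the smallest angle is (numerically) zero
```

As the text notes below, very small angles are better computed from $\sin\theta_k$ than from the cosines returned here.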

The principal angles can be used to define the distance between two subspaces of the same dimension.

Definition 8.1.23.
The distance between two subspaces $\mathcal{F}$ and $\mathcal{G}$ of $\mathbb{C}^n$, both of dimension $p$, equals

  $\mathrm{dist}(\mathcal{F},\mathcal{G}) = \sin\theta_{\max}(\mathcal{F},\mathcal{G}),$

where $\theta_{\max}(\mathcal{F},\mathcal{G})$ is the largest principal angle between $\mathcal{F}$ and $\mathcal{G}$. Equivalently,

  $\theta_{\max}(\mathcal{F},\mathcal{G}) = \max_{u\in\mathcal{F},\;\|u\|_2=1}\;\min_{v\in\mathcal{G},\;\|v\|_2=1}\theta(u,v),$   (8.1.59)

where $\theta(u,v) = \arccos(u^Hv)$ is the acute angle between $u$ and $v$.

Clearly $0 \le \mathrm{dist}(\mathcal{F},\mathcal{G}) \le 1$, and $\mathrm{dist}(\mathcal{F},\mathcal{G}) = 0$ if and only if $\mathcal{F} = \mathcal{G}$. The distance can also be expressed using the orthogonal projectors $P_{\mathcal{F}}$ and $P_{\mathcal{G}}$ onto $\mathcal{F}$ and $\mathcal{G}$,

  $\mathrm{dist}(\mathcal{F},\mathcal{G}) = \|P_{\mathcal{F}} - P_{\mathcal{G}}\|_2;$   (8.1.60)

see Golub and Van Loan [277, Theorem 2.6.1].
In principle, a unitary basis for the intersection of two subspaces is obtained by taking the vectors $u_k$ that correspond to $\theta_k = 0$ or $\sigma_k = 1$. However, numerically small angles $\theta_k$ are well defined from $\sin\theta_k$ but not from $\cos\theta_k$. We now show how to compute $\sin\theta_k$.

We now change the notation slightly and write the SVD in (8.1.56) and the principal vectors as

  $M = Y_FCY_G^H, \qquad U_F = Q_FY_F, \qquad U_G = Q_GY_G.$

Since $Q_F$ is unitary it follows that $P_F = Q_FQ_F^H$ is the orthogonal projector onto $\mathcal{F}$. Then we have

  $P_FQ_G = Q_FQ_F^HQ_G = Q_FM = U_FCY_G^H.$   (8.1.61)

Squaring $Q_G = P_FQ_G + (I - P_F)Q_G$, using (8.1.61) and $P_F(I - P_F) = 0$ gives

  $Q_G^H(I - P_F)^2Q_G = Y_G(I - C^2)Y_G^H,$

which shows that the SVD of $(I - P_F)Q_G = Q_G - Q_FM$ can be written

  $(I - P_F)Q_G = W_FSY_G^H, \qquad S^2 = I - C^2,$

and thus $S = \pm\,\mathrm{diag}(\sin\theta_k)$. We assume for convenience in the following that $p + q \le n$. Then the matrix $W_F \in \mathbb{R}^{n\times q}$ can be chosen so that $W_F^HU_F = 0$ and

  $(I - P_F)Q_G = Q_G - Q_FM = W_FSY_G^H.$   (8.1.62)

Combining this with $P_FQ_G = U_FCY_G^H$ we can write

  $U_G = Q_GY_G = U_FC + W_FS = (U_F\;W_F)\begin{pmatrix} C \\ S \end{pmatrix}.$

If we put

  $P_{A,B} = U_GU_F^H = (U_F\;W_F)\begin{pmatrix} C \\ S \end{pmatrix}U_F^H,$

then the transformation $y = P_{A,B}x$ rotates a vector $x \in \mathcal{R}(A)$ into a vector $y \in \mathcal{R}(B)$, and $\|y\|_2 = \|x\|_2$. By analogy we have also the decomposition

  $(I - P_G)Q_F = Q_F - Q_GM^H = W_GSY_F^H.$   (8.1.63)

8.1.7 The CS Decomposition

More information about the relationship between two subspaces can be obtained from the CS decomposition. This is a special case of a decomposition of a partitioned orthogonal matrix related to the SVD.

Theorem 8.1.24 (Thin CS Decomposition).
Let $Q_1 \in \mathbb{R}^{m\times n}$ have orthonormal columns, that is $Q_1^TQ_1 = I$, and be partitioned as

  $Q_1 = \begin{pmatrix} Q_{11} \\ Q_{21} \end{pmatrix},$   (8.1.64)

where $Q_{11}$ has $m_1 \ge n$ rows and $Q_{21}$ has $m_2 \ge n$ rows. Then there are orthogonal matrices $U_1 \in \mathbb{R}^{m_1\times m_1}$, $U_2 \in \mathbb{R}^{m_2\times m_2}$, and $V_1 \in \mathbb{R}^{n\times n}$ such that

  $\begin{pmatrix} U_1^T & 0 \\ 0 & U_2^T \end{pmatrix}\begin{pmatrix} Q_{11} \\ Q_{21} \end{pmatrix}V_1 = \begin{pmatrix} C \\ 0 \\ S \\ 0 \end{pmatrix},$   (8.1.65)

where

  $C = \mathrm{diag}(c_1,\ldots,c_n), \qquad S = \mathrm{diag}(s_1,\ldots,s_n)$   (8.1.66)

are square nonnegative diagonal matrices satisfying $C^2 + S^2 = I_n$. The diagonal elements in $C$ and $S$ are

  $c_i = \cos(\theta_i), \qquad s_i = \sin(\theta_i), \qquad i = 1:n,$

where without loss of generality, we may assume that

  $0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_n \le \pi/2.$

Proof. To construct $U_1$, $V_1$, and $C$, note that since $U_1$ and $V_1$ are orthogonal and $C$ is a nonnegative diagonal matrix, $Q_{11} = U_1CV_1^T$ is the SVD of $Q_{11}$. Hence, the elements $c_i$ are the singular values of $Q_{11}$, and since $\|Q_{11}\|_2 \le \|Q_1\|_2 = 1$, we have $c_i \in [0,1]$.
If we put $\widetilde Q_{21} = Q_{21}V_1$, then the matrix

  $\begin{pmatrix} U_1^T & 0 \\ 0 & I_{m_2} \end{pmatrix}\begin{pmatrix} Q_{11} \\ Q_{21} \end{pmatrix}V_1 = \begin{pmatrix} C \\ 0 \\ \widetilde Q_{21} \end{pmatrix}$

has orthonormal columns. Thus, $C^2 + \widetilde Q_{21}^T\widetilde Q_{21} = I_n$, which implies that $\widetilde Q_{21}^T\widetilde Q_{21} = I_n - C^2$ is diagonal and hence the matrix $\widetilde Q_{21} = (\tilde q_1^{(2)},\ldots,\tilde q_n^{(2)})$ has orthogonal columns.
We assume that the singular values $c_i = \cos(\theta_i)$ of $Q_{11}$ have been ordered according to (8.1.24) and that $c_r < c_{r+1} = 1$. Then the matrix $U_2 = (u_1^{(2)},\ldots,u_p^{(2)})$ is constructed as follows. Since $\|\tilde q_j^{(2)}\|_2^2 = 1 - c_j^2 \ne 0$, $j \le r$, we take

  $u_j^{(2)} = \tilde q_j^{(2)}/\|\tilde q_j^{(2)}\|_2, \qquad j = 1,\ldots,r,$

and fill the possibly remaining columns of $U_2$ with orthonormal vectors in the orthogonal complement of $\mathcal{R}(\widetilde Q_{21})$. From the construction it follows that $U_2 \in \mathbb{R}^{m_2\times m_2}$ is orthogonal and that

  $U_2^T\widetilde Q_{21} = U_2^TQ_{21}V_1 = \begin{pmatrix} S \\ 0 \end{pmatrix}, \qquad S = \mathrm{diag}(s_1,\ldots,s_n),$

with $s_j = (1 - c_j^2)^{1/2} > 0$ if $j = 1:r$, and $s_j = 0$ if $j = r+1:n$.

In the theorem above we assumed that $n \le m/2$. The general case gives rise to four different forms corresponding to cases where $Q_{11}$ and/or $Q_{21}$ have too few rows to accommodate a full diagonal matrix of order $n$.
The proof of the CS decomposition is constructive. In particular, $U_1$, $V_1$, and $C$ can be computed by a standard SVD algorithm. However, the above algorithm for computing $U_2$ is unstable when some singular values $c_i$ are close to 1 and needs to be modified.
Using the same technique the following CS decomposition of a square partitioned orthogonal matrix can be shown.

Theorem 8.1.25 (Full CS Decomposition).
Let

  $Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \in \mathbb{R}^{m\times m}$   (8.1.67)

be an arbitrary partitioning of the orthogonal matrix $Q$. Then there are orthogonal matrices

  $\begin{pmatrix} U_1 & 0 \\ 0 & U_2 \end{pmatrix}$ and $\begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}$

such that

  $U^TQV = \begin{pmatrix} U_1^TQ_{11}V_1 & U_1^TQ_{12}V_2 \\ U_2^TQ_{21}V_1 & U_2^TQ_{22}V_2 \end{pmatrix} = \begin{pmatrix} I & 0 & 0 & 0 & 0 & 0 \\ 0 & C & 0 & 0 & S & 0 \\ 0 & 0 & 0 & 0 & 0 & I \\ 0 & 0 & 0 & I & 0 & 0 \\ 0 & -S & 0 & 0 & C & 0 \\ 0 & 0 & I & 0 & 0 & 0 \end{pmatrix},$   (8.1.68)

where $C = \mathrm{diag}(c_1,\ldots,c_n)$ and $S = \mathrm{diag}(s_1,\ldots,s_n)$ are square diagonal matrices satisfying $C^2 + S^2 = I_n$, and $0 < c_i, s_i < 1$, $i = 1:n$.

Proof. For a proof, see Paige and Saunders [468].

The history of the CS decomposition and its many applications are surveyed in Paige and Wei [473].

Review Questions

1.1 State the Gauss–Markov theorem.

1.2 Assume that $A$ has full column rank. Show that the matrix $P = A(A^TA)^{-1}A^T$ is symmetric and satisfies the condition $P^2 = P$.

1.3 (a) Give conditions for a matrix $P$ to be the orthogonal projector onto a subspace $S \subseteq \mathbb{R}^n$.
(b) Define the orthogonal complement of $S$ in $\mathbb{R}^n$.

1.4 (a) Which are the four fundamental subspaces of a matrix? Which relations hold between them? Express the orthogonal projections onto the fundamental subspaces in terms of the SVD.
(b) Give two geometric conditions which are necessary and sufficient conditions for $x$ to be the pseudo-inverse solution of $Ax = b$.

1.5 Which of the following relations are universally correct?
(a) $\mathcal{N}(B) \subseteq \mathcal{N}(AB)$.  (b) $\mathcal{N}(A) \subseteq \mathcal{N}(AB)$.  (c) $\mathcal{N}(AB) \subseteq \mathcal{N}(A)$.
(d) $\mathcal{R}(AB) \subseteq \mathcal{R}(B)$.  (e) $\mathcal{R}(AB) \subseteq \mathcal{R}(A)$.  (f) $\mathcal{R}(B) \subseteq \mathcal{R}(AB)$.

1.6 (a) What are the four Penrose conditions for $X$ to be the pseudo-inverse of $A$?
(b) A matrix $X$ is said to be a left-inverse if $XA = I$. Show that a left-inverse is a $\{1,2,3\}$-inverse, i.e. satisfies the Penrose conditions (1), (2), and (3). Similarly, show that a right-inverse is a $\{1,2,4\}$-inverse.

1.7 Let the singular values of $A \in \mathbb{R}^{m\times n}$ be $\sigma_1 \ge \cdots \ge \sigma_n$. What relations are satisfied between these and the singular values of

  $\widetilde A = (A,\;u), \qquad \hat A = \begin{pmatrix} A \\ v^T \end{pmatrix}?$

1.8 (a) Show that $A^\dagger = A^{-1}$ when $A$ is a nonsingular matrix.
(b) Construct an example where $G \ne A^\dagger$ despite the fact that $GA = I$.

Problems

1.1 (a) Compute the pseudo-inverse $x^\dagger$ of a column vector $x$.
(b) Take $A = (1\ \ 0)$, $B = (1\ \ 1)^T$, and show that $1 = (AB)^\dagger \ne B^\dagger A^\dagger = 1/2$.

1.2 (a) Verify that the Penrose conditions uniquely define the matrix $X$. Do it first for $A = \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_n)$, and then transform the result to a general matrix $A$.

1.3 (a) Show that if $w \in \mathbb{R}^n$ and $w^Tw = 1$, then the matrix $P(w) = I - 2ww^T$ is both symmetric and orthogonal.
(b) Given two vectors $x, y \in \mathbb{R}^n$, $x \ne y$, $\|x\|_2 = \|y\|_2$, show that

  $P(w)x = y, \qquad w = (y - x)/\|y - x\|_2.$

1.4 Let $S \subseteq \mathbb{R}^n$ be a subspace, and let $P_1$ and $P_2$ be orthogonal projections onto $S = \mathcal{R}(P_1) = \mathcal{R}(P_2)$. Show that $P_1 = P_2$, i.e., the orthogonal projection onto $S$ is unique.
Hint: Show that for any $z \in \mathbb{R}^n$

  $\|(P_1 - P_2)z\|_2^2 = (P_1z)^T(I - P_2)z + (P_2z)^T(I - P_1)z = 0.$

1.5 (R. E. Cline) Let $A$ and $B$ be any matrices for which the product $AB$ is defined, and set

  $B_1 = A^\dagger AB, \qquad A_1 = AB_1B_1^\dagger.$

Show that $AB = AB_1 = A_1B_1$ and that $(AB)^\dagger = B_1^\dagger A_1^\dagger$.
Hint: Use the Penrose conditions.

1.6 (a) Show that the matrix $A \in \mathbb{R}^{m\times n}$ has a left inverse $A^L \in \mathbb{R}^{n\times m}$, i.e., $A^LA = I$, if and only if $\mathrm{rank}(A) = n$. Although in this case $Ax = b \in \mathcal{R}(A)$ has a unique solution, the left inverse is not unique. Find the general form of $\Sigma^L$ and generalize the result to $A^L$.
(b) Discuss the right inverse $A^R$ in a similar way.
(c) Show the relation $\mathrm{rank}(A) = \mathrm{trace}(A^\dagger A)$.

1.7 Show that $A^\dagger$ minimizes $\|AX - I\|_F$.

1.8 Prove Bjerhammar's characterization: Let $A$ have full column rank and let $B$ be any matrix such that $A^TB = 0$ and $(A\ \ B)$ is nonsingular. Then $A^\dagger = X$ where

  $\begin{pmatrix} X \\ Y^T \end{pmatrix} = (A\ \ B)^{-1}.$

8.2 The Method of Normal Equations

8.2.1 Forming and Solving the Normal Equations

Consider the linear Gauss–Markov model

  $Ax = b + \epsilon, \qquad A \in \mathbb{R}^{m\times n},$   (8.2.1)

where $\epsilon$ has zero mean and variance-covariance matrix equal to $\sigma^2 I$. By the Gauss–Markov theorem $x$ is a least squares estimate if and only if it satisfies the normal equations $A^TAx = A^Tb$.

Theorem 8.2.1.
The matrix $A^TA$ is positive definite if and only if the columns of $A$ are linearly independent, i.e., when $\mathrm{rank}(A) = n$. In this case the least squares solution is unique and given by

  $x = (A^TA)^{-1}A^Tb, \qquad r = (I - A(A^TA)^{-1}A^T)b.$   (8.2.2)

Proof. If the columns of $A$ are linearly independent, then $x \ne 0 \Rightarrow Ax \ne 0$. Therefore, $x \ne 0 \Rightarrow x^TA^TAx = \|Ax\|_2^2 > 0$, and hence $A^TA$ is positive definite. On the other hand, if the columns are linearly dependent, then for some $x_0 \ne 0$ we have $Ax_0 = 0$. Then $x_0^TA^TAx_0 = 0$, and therefore $A^TA$ is not positive definite. When $A^TA$ is positive definite it is also nonsingular and (8.2.2) follows.

After forming $A^TA$ and $A^Tb$ the normal equations can be solved by symmetric Gaussian elimination (which Gauss did), or by computing the Cholesky factorization (due to [45]) $A^TA = R^TR$, where $R$ is upper triangular. We now discuss some details in the numerical implementation of the method of normal equations. We defer treatment of rank deficient problems to later and assume throughout this section that the numerical rank of $A$ equals $n$.
The first step is to compute the elements of the matrix $C = A^TA$ and the vector $d = A^Tb$. If $A = (a_1, a_2,\ldots,a_n)$ has been partitioned by columns, we can use the inner product formulation

  $c_{jk} = (A^TA)_{jk} = a_j^Ta_k, \qquad d_j = (A^Tb)_j = a_j^Tb, \qquad 1 \le j \le k \le n.$   (8.2.3)

Since $C$ is symmetric it is only necessary to compute and store its lower (or upper) triangular part, which requires $\tfrac12 mn(n+1)$ multiplications. Note that if $m \gg n$, then the number of elements $\tfrac12 n(n+1)$ in the upper triangular part of $A^TA$ is much smaller than the number $mn$ of elements in $A$. Hence, in this case the formation of $A^TA$ and $A^Tb$ can be viewed as a data compression!
In the inner product formulation (8.2.3) the data $A$ and $b$ are accessed columnwise. This may not always be suitable since each column needs to be accessed many times. For example, if $A$ is so large that it is held in secondary storage, this will be expensive. Alternatively, a row oriented algorithm can be used, where outer products of the rows are accumulated. Denoting the $i$th row of $A$ by $\tilde a_i^T$, $i = 1:m$, we have

  $C = A^TA = \sum_{i=1}^{m}\tilde a_i\tilde a_i^T, \qquad d = A^Tb = \sum_{i=1}^{m}b_i\tilde a_i.$   (8.2.4)

Here $A^TA$ is expressed as the sum of $m$ matrices of rank one and $A^Tb$ as a linear combination of the transposed rows of $A$. This approach has the advantage that just one pass through the rows of $A$ is required, each row being fetched (possibly from auxiliary storage) or formed by computation when needed. No more storage is needed than that for $A^TA$ and $A^Tb$. This outer product form is also preferable if the matrix $A$ is sparse; see the hint to Problem 7.7.1. Note that both formulas can be combined if we adjoin $b$ as the $(n+1)$st column to $A$ and form

  $(A,\ b)^T(A,\ b) = \begin{pmatrix} A^TA & A^Tb \\ b^TA & b^Tb \end{pmatrix}.$

The matrix $C = A^TA$ is symmetric, and if $\mathrm{rank}(A) = n$ it is also positive definite. Gauss solved the normal equations by symmetric Gaussian elimination. Computing the Cholesky factorization

  $C = A^TA = R^TR, \qquad R \in \mathbb{R}^{n\times n},$   (8.2.5)

is now the standard approach. The Cholesky factor $R$ is upper triangular and nonsingular and can be computed by one of the algorithms given in Section 7.4.2. The least squares solution is then obtained by solving the two triangular systems

  $R^Tz = d, \qquad Rx = z.$   (8.2.6)

Forming the matrix $A^TA$ and computing its Cholesky factorization requires (neglecting lower order terms) $mn^2 + \tfrac13 n^3$ flops. If we have several right hand sides $b_i$, $i = 1:p$, then the Cholesky factorization need only be computed once. Forming $A^Tb_i$ and solving the two triangular systems requires $mn + n^2$ additional flops for each right hand side.
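A minimal sketch of the method of normal equations with a Cholesky factorization, assuming NumPy and invented random data (the triangular solves below do not exploit triangularity, for brevity):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

C = A.T @ A                         # normal equations matrix A^T A
d = A.T @ b
L = np.linalg.cholesky(C)           # C = L L^T, so R = L^T in (8.2.5)

z = np.linalg.solve(L, d)           # forward substitution: R^T z = d
x = np.linalg.solve(L.T, z)         # back substitution:    R x = z

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```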

Example 8.2.1.
In statistics, the multiple linear regression problem is the problem of fitting a given linear model to a set of data points $(x_i,y_i)$, $i = 1:m$. Assume that the model to be fitted is $y = \alpha + \beta x$. Substituting the data gives $m$ linear equations for the unknown parameters $\alpha$ and $\beta$,

  $\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}.$

The least squares solution is obtained by forming the normal equations

  $\begin{pmatrix} m & m\bar x \\ m\bar x & x^Tx \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} m\bar y \\ x^Ty \end{pmatrix},$   (8.2.7)

where

  $\bar x = \frac1m\sum_{i=1}^m x_i, \qquad \bar y = \frac1m\sum_{i=1}^m y_i$

are the mean values. Eliminating $\alpha$ we obtain the "classical" formula

  $\beta = \bigl(x^Ty - m\bar y\bar x\bigr)\big/\bigl(x^Tx - m\bar x^2\bigr).$   (8.2.8)

The first equation in (8.2.7) gives $\bar y = \alpha + \beta\bar x$. This shows that $(\bar x,\bar y)$ lies on the fitted line and we have

  $\alpha = \bar y - \beta\bar x.$   (8.2.9)

A more accurate formula for $\beta$ is obtained by first subtracting out the mean values from the data and writing the model as $y - \bar y = \beta(x - \bar x)$. In the new variables the matrix of normal equations is diagonal, and we find

  $\beta = \sum_{i=1}^m (y_i - \bar y)(x_i - \bar x)\Big/\sum_{i=1}^m (x_i - \bar x)^2.$   (8.2.10)

A drawback of this formula is that it requires two passes through the data.
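A short sketch of the centered formulas (8.2.9)–(8.2.10), assuming NumPy; the data and the helper name `fit_line` are invented.

```python
import numpy as np

def fit_line(x, y):
    """Fit y = alpha + beta*x by least squares using (8.2.9)-(8.2.10)."""
    xbar, ybar = x.mean(), y.mean()
    beta = np.sum((y - ybar) * (x - xbar)) / np.sum((x - xbar) ** 2)
    alpha = ybar - beta * xbar
    return alpha, beta

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])        # invented data
print(fit_line(x, y))
```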

This is easily generalized to fitting a general linear model with $n$ parameters.

In many least squares problems the matrix A has the property that in each row all nonzero elements in A are contained in a narrow band. For such a matrix we define:

Definition 8.2.2.
Let $f_i$ and $l_i$ be the column subscripts of the first and last nonzero element in the $i$th row of the matrix $A \in \mathbb{R}^{m\times n}$, i.e.,

  $f_i = \min\{j \mid a_{ij} \ne 0\}, \qquad l_i = \max\{j \mid a_{ij} \ne 0\}.$   (8.2.11)

Then $A$ is said to have row bandwidth $w$, where

  $w = \max_{1\le i\le m} w_i, \qquad w_i = l_i - f_i + 1.$   (8.2.12)

For this structure to have practical significance we need to have $w \ll n$. Matrices of small row bandwidth often occur naturally, since they correspond to a situation where only variables "close" to each other are coupled by observations. We now prove a relation between the row bandwidth of the matrix $A$ and the bandwidth of the corresponding matrix of normal equations $A^TA$.

Theorem 8.2.3.
Assume that the matrix $A \in \mathbb{R}^{m\times n}$ has row bandwidth $w$. Then the symmetric matrix $A^TA$ has bandwidth $r \le w - 1$.

Proof. From Definition 8.2.2 it follows that $a_{ij}a_{ik} \ne 0$ implies that $|j - k| < w$ and thus

  $|j - k| \ge w \;\Rightarrow\; a_{ij}a_{ik} = 0 \quad \forall\, i = 1:m,$   (8.2.13)

and hence

  $(A^TA)_{jk} = \sum_{i=1}^m a_{ij}a_{ik} = 0.$

If the matrix $A$ also has full column rank it follows that we can use the band Cholesky Algorithm 7.7 to solve the normal equations.

8.2.2 Computing the Covariance Matrix

Consider a general univariate linear model with error covariance matrix equal to a symmetric positive definite matrix $\sigma^2 V$. By Theorem 8.1.7 the least squares estimate is the solution to

  $\min_x (Ax - b)^TV^{-1}(Ax - b),$

and the covariance matrix of the solution $x$ is $V_x = \sigma^2 C_x$, where

  $C_x = (A^TV^{-1}A)^{-1} = (R^TR)^{-1} = R^{-1}R^{-T}.$   (8.2.14)

Here $R$ is the Cholesky factor of $A^TV^{-1}A$. In the special case of a weighted least squares problem $V^{-1} = D^2$ is a diagonal matrix with positive elements. An unbiased estimate of $\sigma^2$ is given by

  $s^2 = \frac{1}{m-n}(b - Ax)^TV^{-1}(b - Ax).$   (8.2.15)

The least squares residual vector $r = b - A\hat x$ has variance-covariance matrix equal to $\sigma^2 V_r$, where

  $V_r = V^{-1/2}A(A^TV^{-1}A)^{-1}A^TV^{-1/2} = BB^T, \qquad B = V^{-1/2}AR^{-1}.$   (8.2.16)

The normalized residuals

  $\tilde r = \frac1s\,(\mathrm{diag}\,V_r)^{-1/2}\hat r$

are often used to detect and identify bad data, which is assumed to correspond to large components in $\tilde r$. If the error covariance matrix is correct, then the components of $\tilde r$ should be uniformly distributed random quantities. In particular, the histogram of the entries of the residual should look like a bell curve.
In order to assess the accuracy of the computed estimate of $x$ it is often required to compute the matrix $C_x$ or part of it. If, for simplicity, we assume that $V = I$, then $C_x$ is obtained from

  $C_x = SS^T, \qquad S = R^{-1}.$

The upper triangular matrix $S$ is computed by solving the matrix equation $RS = I$. Using the algorithm given in (7.2.41) this requires $n^3/3$ flops. The upper triangular part of the symmetric matrix $SS^T$ can then be formed. The computation of $C_x$ can be sequenced so that the elements of $C_x$ overwrite those of $R$ and requires a total of $2n^3/3$ flops.
In many situations the variance of a linear functional $\varphi = f^T\hat x$ of the least squares solution is of interest. This equals

  $\mathcal{V}(\varphi) = f^TV_xf = \sigma^2 f^TR^{-1}R^{-T}f = \sigma^2 z^Tz,$   (8.2.17)

where $z = R^{-T}f$. Here the matrix $C_x$ occurs only as an intermediate quantity and the variance can be computed by solving the lower triangular system $R^Tz = f$ by forward substitution and then forming $\sigma^2 z^Tz = \sigma^2\|z\|_2^2$. This is a more stable and efficient approach than using the expression involving $V_x$. In particular, the variance of a single component $x_i = e_i^Tx$ of the solution is obtained from

  $\mathcal{V}(x_i) = \sigma^2 z^Tz, \qquad R^Tz = e_i, \qquad i = 1:n.$   (8.2.18)

Note that since $R^T$ is lower triangular, $z$ will have $i-1$ leading zeros. For $i = n$ only the last component is nonzero and equals $r_{nn}^{-1}$.
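A minimal sketch of (8.2.18) with $V = I$, assuming NumPy and invented data; $s^2$ replaces the unknown $\sigma^2$, and the result is checked against the diagonal of $s^2(A^TA)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 50, 4
A = rng.standard_normal((m, n))
b = A @ np.ones(n) + 0.1 * rng.standard_normal(m)

L = np.linalg.cholesky(A.T @ A)         # A^T A = L L^T = R^T R with R = L^T
x = np.linalg.solve(L.T, np.linalg.solve(L, A.T @ b))
s2 = np.sum((b - A @ x) ** 2) / (m - n)     # unbiased estimate of sigma^2

# Variance of x_i from (8.2.18): solve R^T z = e_i, then V(x_i) = s^2 * z^T z.
variances = np.empty(n)
for i in range(n):
    z = np.linalg.solve(L, np.eye(n)[:, i])   # R^T = L
    variances[i] = s2 * (z @ z)

print(variances)
print(np.allclose(variances, s2 * np.diag(np.linalg.inv(A.T @ A))))
```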

Example 8.2.2.
In an early application of least squares Laplace [394] estimated the mass of Jupiter and Uranus and assessed the accuracy of the results by computing the corresponding variances. He made use of 129 observations of the movements of Jupiter and Saturn collected by Alexis Bouvard (1767–1843), French astronomer and director of the Paris Observatory. The final normal equations are $A^TAx = A^Tb$, where

  $A^TA = \begin{pmatrix}
     795938 & -12729398 & 6788.2 & -1959.0 & 696.13 & 2602 \\
  -12729398 & 424865729 & -153106.5 & -39749.1 & -5459 & 5722 \\
     6788.2 & -153106.5 & 71.8720 & -3.2252 & 1.2484 & 1.3371 \\
    -1959.0 & -39749.1 & -3.2252 & 57.1911 & 3.6213 & 1.1128 \\
     696.13 & -5459 & 1.2484 & 3.6213 & 21.543 & 46.310 \\
       2602 & 5722 & 1.3371 & 1.1128 & 46.310 & 129
  \end{pmatrix},$

  $A^Tb = \begin{pmatrix} 7212.600 \\ -738297.800 \\ 237.782 \\ -40.335 \\ -343.455 \\ -1002.900 \end{pmatrix}.$

In these equations the mass of Uranus is $(1 + x_1)/19504$ and the mass of Jupiter is $(1 + x_2)/1067.09$ if the mass of the sun is taken as unity. Bouvard also gave the square residual norm as $\|b - A\hat x\|_2^2 = 31096$.
Working from these normal equations Laplace computed the least squares solution and obtained

  $x_1 = 0.0895435, \qquad x_2 = -0.0030431.$

From this we can calculate the mass of Jupiter and Uranus as a fraction of the mass of the sun. We obtain $1/1070.3$ for Jupiter and $1/17918$ for Uranus. The covariance matrix of the solution is $V_x = s^2(R^TR)^{-1}$, where $s^2 = 31096/(129-6)$ is an unbiased estimate of $\sigma^2$. We get

  $v_{11} = 0.5245452\cdot 10^{-2}, \qquad v_{22} = 4.383233\cdot 10^{-6}.$

This shows that the computed mass of Jupiter is very reliable, while the computed mass of Uranus is not. Laplace concludes that with a probability of $1 - 10^{-6}$ the error in the computed mass of Jupiter is less than one per cent.

There is an alternative way of computing $C_x$ without inverting $R$. We have from (8.2.14), multiplying by $R$ from the left,

  $RC_x = R^{-T}.$   (8.2.19)

The diagonal elements of $R^{-T}$ are simply $r_{kk}^{-1}$, $k = n,\ldots,1$, and since $R^{-T}$ is lower triangular it has $\tfrac12 n(n-1)$ zero elements. Hence, $\tfrac12 n(n+1)$ elements of $R^{-T}$ are known and the corresponding equations in (8.2.19) suffice to determine the elements in the upper triangular part of the symmetric matrix $C_x$.
To compute the elements in the last column $c_n$ of $C_x$ we solve the system

  $Rc_n = r_{nn}^{-1}e_n, \qquad e_n = (0,\ldots,0,1)^T,$

by back-substitution. This gives

  $c_{nn} = r_{nn}^{-2}, \qquad c_{in} = -r_{ii}^{-1}\sum_{j=i+1}^{n} r_{ij}c_{jn}, \quad i = n-1,\ldots,1.$   (8.2.20)

By symmetry $c_{ni} = c_{in}$, $i = n-1,\ldots,1$, so we know also the last row of $C_x$. Now assume that we have computed the elements $c_{ij} = c_{ji}$, $j = n,\ldots,k+1$, $i \le j$. We next determine the elements $c_{ik}$, $i \le k$. We have

  $c_{kk}r_{kk} + \sum_{j=k+1}^{n} r_{kj}c_{jk} = r_{kk}^{-1},$

and since the elements $c_{kj} = c_{jk}$, $j = k+1:n$, have already been computed,

  $c_{kk} = r_{kk}^{-1}\Bigl(r_{kk}^{-1} - \sum_{j=k+1}^{n} r_{kj}c_{kj}\Bigr).$   (8.2.21)

Similarly, for $i = k-1:(-1):1$,

  $c_{ik} = -r_{ii}^{-1}\Bigl(\sum_{j=i+1}^{k} r_{ij}c_{jk} + \sum_{j=k+1}^{n} r_{ij}c_{kj}\Bigr).$   (8.2.22)

Using the formulas (8.2.20)–(8.2.22) all the elements of $C_x$ can be computed in about $2n^3/3$ flops.
When the matrix $R$ is sparse, Golub and Plemmons [260] have shown that the same algorithm can be used very efficiently to compute all elements in $C_x$ associated with nonzero elements in $R$. Since $R$ has a nonzero diagonal this includes the diagonal elements of $C_x$, giving the variances of the components $x_i$, $i = 1:n$. If $R$ has bandwidth $w$, then the corresponding elements in $C_x$ can be computed in only $2nw^2$ flops; see Björck [61, Sec. 6.7.4].

8.2.3 Perturbation Bounds for Least Squares Problems

We now consider the effect of perturbations of $A$ and $b$ on the least squares solution $x$. In this analysis the condition number of the matrix $A \in \mathbb{R}^{m\times n}$ will play a significant role. The following definition generalizes the condition number (6.6.3) of a square nonsingular matrix.

Definition 8.2.4.
Let $A \in \mathbb{R}^{m\times n}$ have rank $r > 0$ and singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$. Then the condition number of $A$ is

  $\kappa(A) = \|A\|_2\|A^\dagger\|_2 = \sigma_1/\sigma_r,$

where the last equality follows from the relations $\|A\|_2 = \sigma_1$, $\|A^\dagger\|_2 = \sigma_r^{-1}$.

Using the SVD $A = U\Sigma V^T$ we obtain

Σ2 0 ATA = V ΣT (U T U)ΣV T = V r V T . (8.2.23) 0 0   T 2 T 2 Hence, σi(A A) = σi (A), and it follows that κ(A A) = κ (A). This shows that the matrix of the normal equations has a condition number which is the square of the condition number of A. We now give a first order perturbation analysis for the least squares problem when rank (A) = n. Denote the perturbed data A + δA and b + δb and assume that δA sufficiently small so that rank (A + δA) = n. Let the perturbed solution be x + δx and r + δr, where r = b Ax is the residual vector. The perturbed solution satisfies the augmented system−

[ I  A + δA; (A + δA)^T  0 ] [ s̃; x̃ ] = [ b + δb; 0 ].   (8.2.24)

Subtracting the unperturbed equations and neglecting second order quantities, the perturbations δs = δr and δx satisfy the linear system

[ I  A; A^T  0 ] [ δs; δx ] = [ δb − δA x; −δA^T s ].   (8.2.25)

From the Schur–Banachiewicz formula (see Section 7.1.5) it follows that the inverse of the matrix in this system equals

[ I  A; A^T  0 ]^{-1} = [ I − A(A^T A)^{-1}A^T   A(A^T A)^{-1};  (A^T A)^{-1}A^T   −(A^T A)^{-1} ]
                      = [ P_{N(A^T)}   (A†)^T;  A†   −(A^T A)^{-1} ].   (8.2.26)

We find that the perturbed solution satisfies

δx = A†(δb − δA x) + (A^T A)^{-1} δA^T r,   (8.2.27)
δr = P_{N(A^T)}(δb − δA x) − (A†)^T δA^T r.   (8.2.28)

Assuming that the perturbations δA and δb satisfy

|δA| ≤ ωE,   |δb| ≤ ωf,   (8.2.29)

and substituting in (8.2.27)–(8.2.28) yields the componentwise bounds

|δx| ≲ ω ( |A†|(f + E|x|) + |(A^T A)^{-1}| E^T |r| ),   (8.2.30)
|δr| ≲ ω ( |I − AA†|(f + E|x|) + |(A†)^T| E^T |r| ),   (8.2.31)

where terms of order O(ω^2) have been neglected. Note that if the system Ax = b is consistent, then r = 0 and the bound for |δx| is identical to that obtained for a square nonsingular linear system.
Taking norms in (8.2.27) and (8.2.28) and using

‖A†‖_2 = ‖(A†)^T‖_2 = 1/σ_n,   ‖(A^T A)^{-1}‖_2 = 1/σ_n^2,   ‖P_{N(A^T)}‖_2 = 1,

it follows that

‖δx‖_2 ≲ (1/σ_n) ( ‖δb‖_2 + ‖δA‖_2 ( ‖x‖_2 + (1/σ_n)‖r‖_2 ) ),   (8.2.32)
‖δr‖_2 ≲ ‖δb‖_2 + ‖δA‖_2 ( ‖x‖_2 + (1/σ_n)‖r‖_2 ).   (8.2.33)

If r ≠ 0 there is a term proportional to σ_n^{-2} present in the bound for ‖δx‖_2. A more refined perturbation analysis (see Wedin [606]) shows that if

η = ‖A†‖_2 ‖δA‖_2 ≪ 1,

then rank(A + δA) = n, and there are perturbations δA and δb such that these upper bounds are almost attained. Assuming that x ≠ 0 and setting δb = 0, we get an upper bound for the normwise relative perturbation

‖δx‖_2/‖x‖_2 ≤ κ_LS ‖δA‖_2/‖A‖_2,   κ_LS = κ(A) ( 1 + ‖r‖_2/(σ_n ‖x‖_2) ).   (8.2.34)

Hence, κ_LS is the condition number for the least squares problem. The following two important facts should be noted:

• κ_LS depends not only on A but also on r = P_{N(A^T)} b.

• If ‖r‖_2 ≪ σ_n ‖x‖_2, then κ_LS ≈ κ(A), but if ‖r‖_2 > σ_n ‖x‖_2 the second term in (8.2.34) will dominate, as the sketch below illustrates.
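The behavior of κ_LS is easy to check numerically. The following MATLAB sketch (our own illustration, not part of the text; the test matrix generated by gallery is an arbitrary choice) compares the bound (8.2.34) with the observed change of the solution for a random small perturbation of A.

% Compare the perturbation bound (8.2.34) with an observed perturbation.
m = 20; n = 5;
A = gallery('randsvd',[m n],1e6);     % test matrix with cond(A) = 1e6
x0 = ones(n,1);
r0 = null(A')*rand(m-n,1);            % residual component in N(A')
b  = A*x0 + r0;
x  = A\b;  r = b - A*x;
sn = min(svd(A));
kappaLS = cond(A)*(1 + norm(r)/(sn*norm(x)));
dA = 1e-10*randn(m,n);                % small perturbation of A
dx = (A+dA)\b - x;
disp([norm(dx)/norm(x), kappaLS*norm(dA)/norm(A)])   % observed vs. bound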

Example 8.2.3. The following simple example illustrates the perturbation analysis above. Consider a least squares problem with

A = [ 1  0; 0  δ; 0  0 ],   b = [ 1; 0; α ],   δA = [ 0  0; 0  0; 0  δ/2 ],

and κ(A) = 1/δ ≫ 1. If α = 1, then

x = [ 1; 0 ],   δx = (2/(5δ)) [ 0; 1 ],   r = [ 0; 0; 1 ],   δr = −(1/5) [ 0; 2; 1 ].

For this right hand side ‖x‖_2 = ‖r‖_2 and κ_LS = 1/δ + 1/δ^2 ≈ κ^2(A). This is reflected in the size of δx.
If instead we take α = δ, then a short calculation shows that ‖r‖_2/‖x‖_2 = δ and κ_LS = 2/δ. The same perturbation δA now gives

δx = (2/5) [ 0; 1 ],   δr = −(δ/5) [ 0; 2; 1 ].

It should be stressed that in order for the perturbation analysis above to be useful, the matrix A and vector b should be scaled so that perturbations are "well defined" by bounds on ‖δA‖_2 and ‖δb‖_2. It is not uncommon that the columns in A = (a_1, a_2, ..., a_n) have widely differing norms. Then a much better estimate may often be obtained by applying (8.2.34) to the scaled problem min_x̃ ‖Ãx̃ − b‖_2, chosen so that Ã has columns of unit length, i.e.,

Ã = AD^{-1},   x̃ = Dx,   D = diag(‖a_1‖_2, ..., ‖a_n‖_2).

By Theorem 8.2.5 this column scaling approximately minimizes κ(AD^{-1}) over D > 0. Note that scaling the columns also changes the norm in which the error in the original variables x is measured.
If the rows in A differ widely in norm, then (8.2.34) may also considerably overestimate the perturbation in x. As remarked above, we cannot scale the rows in A without changing the least squares solution.
Perturbation bounds with better scaling properties can be obtained by considering componentwise perturbations; see Section 8.2.4.

8.2.4 Stability and Accuracy with Normal Equations
We now turn to a discussion of the accuracy of the method of normal equations for least squares problems. First we consider rounding errors in the formation of the system of normal equations. Using the standard model for floating point computation, the computed matrix C̄ = fl(A^T A) satisfies

C̄ = fl(A^T A) = C + E,   |E| < γ_m |A|^T |A|,   (8.2.35)

where (see Lemma 7.1.21) γ_m = mu/(1 − mu) and u is the unit roundoff. A similar estimate holds for the rounding errors in the computed vector A^T b.
It should be emphasized that it is not in general possible to show that C̄ = (A + E)^T (A + E) for some small error matrix E. That is, the rounding errors performed in forming the matrix A^T A are not in general equivalent to small perturbations of the initial data matrix A. This means that it is not in general true that the solution computed from the normal equations equals the exact solution to a problem where the data A and b have been perturbed by small amounts. In other words, the method of normal equations is not backwards stable. In spite of this it often gives satisfactory results. However, as the following example illustrates, when A^T A is ill-conditioned, it might be necessary to use extra precision in forming and solving the normal equations in order to avoid loss of significant information in the data.

Example 8.2.4. Läuchli [398]: Consider the system Ax = b, where

A = [ 1  1  1
      ǫ  0  0
      0  ǫ  0
      0  0  ǫ ],   b = [ 1; 0; 0; 0 ],   |ǫ| ≪ 1.

We have, exactly,

A^T A = [ 1+ǫ^2  1  1; 1  1+ǫ^2  1; 1  1  1+ǫ^2 ],   A^T b = [ 1; 1; 1 ],

x = (1/(3+ǫ^2)) (1, 1, 1)^T,   r = (1/(3+ǫ^2)) (ǫ^2, −ǫ, −ǫ, −ǫ)^T.

Now assume that ǫ = 10^{-4}, and that we use eight-digit decimal floating point arithmetic. Then 1 + ǫ^2 = 1.00000001 rounds to 1, and the computed matrix A^T A will be singular. We have lost all information contained in the last three rows of A.
Note that the residual in the first equation is O(ǫ^2) but O(ǫ) in the others. Least squares problems of this form are called stiff. They occur when the error in some equations (here x_1 + x_2 + x_3 = 1) has a much smaller variance than in the others; see Section 8.6.1.
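The loss of information described above is easy to reproduce in IEEE double precision by taking ǫ close to the square root of the unit roundoff. The script below (our own illustration) contrasts the explicitly formed normal equations with a QR-based solve (MATLAB's backslash).

% Laeuchli example: normal equations versus a QR-based solve.
e = 1e-8;                            % about sqrt(u) in double precision
A = [1 1 1; e 0 0; 0 e 0; 0 0 e];
b = [1; 0; 0; 0];
C = A'*A;                            % fl(1 + e^2) rounds to 1
[Rc, p] = chol(C);                   % p > 0: C is not numerically positive definite
xqr = A\b;                           % backslash uses a backward stable QR solve
xexact = ones(3,1)/(3 + e^2);
disp([p, norm(xqr - xexact)/norm(xexact)])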

To assess the error in the least squares solution x̄ computed by the method of normal equations, we must also account for rounding errors in the Cholesky factorization and in solving the triangular systems. Using the backward error bound given in Theorem 7.5.16, a perturbation analysis shows that provided 2n^{3/2} u κ(A^T A) < 0.1, an upper bound for the error in the computed solution x̄ is

‖x̄ − x‖_2 ≤ 2.5 n^{3/2} u κ(A^T A) ‖x‖_2.   (8.2.36)

As seen in Section 8.2.3, for "small" residual least squares problems the true condition number is approximately κ(A) = κ^{1/2}(A^T A). In this case, the system of normal equations can be much worse conditioned than the least squares problem from which it originated.
Sometimes ill-conditioning is caused by an unsuitable formulation of the problem. Then a different choice of parameterization can significantly reduce the condition number. In approximation problems one should try to use orthogonal, or nearly orthogonal, basis functions. In case the elements in A and b are the original data, the ill-conditioning cannot be avoided in this way. In Secs. 8.3 and 8.4 we consider methods for solving least squares problems based on orthogonalization. These methods work directly with A and b and are backwards stable.
In Section 7.7.7 we discussed how the scaling of rows and columns of a linear system Ax = b influenced the solution computed by Gaussian elimination. For a least squares problem min_x ‖Ax − b‖_2 a row scaling of (A, b) is not allowed since such a scaling would change the exact solution. However, we can scale the columns of A. If we take x = Dx′, the normal equations will change into

(AD)^T (AD) x′ = D(A^T A)D x′ = DA^T b.

Hence, this corresponds to a symmetric scaling of rows and columns in A^T A. It is important to note that if the Cholesky algorithm is carried out without pivoting, the computed solution is not affected by such a scaling, cf. Theorem 7.5.6. This means that even if no explicit scaling is carried out, the rounding error estimate (8.2.36) for the computed solution x̄ holds for all D,

‖D(x̄ − x)‖_2 ≤ 2.5 n^{3/2} u κ(DA^T AD) ‖Dx‖_2.

(Note, however, that scaling the columns changes the norm in which the error in x is measured.) Denote the minimum condition number under a symmetric scaling with a positive diagonal matrix by

κ′(A^T A) = min_{D>0} κ(DA^T AD).   (8.2.37)

The following result by van der Sluis [1969] shows that the scaling where D is chosen so that in AD all column norms are equal, i.e., D = diag(‖a_1‖_2, ..., ‖a_n‖_2)^{-1}, comes within a factor of n of the minimum value.

Theorem 8.2.5. Let C ∈ R^{n×n} be a symmetric and positive definite matrix, and denote by 𝒟 the set of n×n nonsingular diagonal matrices. Then if in C all diagonal elements are equal, and C has at most q nonzero elements in any row, it holds that

κ(C) ≤ q min_{D∈𝒟} κ(DCD).

As the following example shows, this scaling can reduce the condition number considerably. In cases where the method of normal equations gives a surprisingly accurate solution to a seemingly very ill-conditioned problem, the explanation often is that the condition number of the scaled problem is quite small!

Example 8.2.5. The matrix A ∈ R^{21×6} with elements

a_{ij} = (i − 1)^{j−1},   1 ≤ i ≤ 21,   1 ≤ j ≤ 6,

arises when fitting a fifth degree polynomial p(t) = x_0 + x_1 t + x_2 t^2 + ... + x_5 t^5 to observations at points t_i = 0, 1, ..., 20. The condition numbers are

κ(A^T A) = 4.10·10^{13},   κ(DA^T AD) = 4.93·10^{6},

where D is the column scaling in Theorem 8.2.5. Thus, the condition number of the matrix of normal equations is reduced by about seven orders of magnitude by this scaling!
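The numbers quoted in the example can be reproduced, up to rounding in the last digits, with a few lines of MATLAB (our own illustration):

% Condition of the normal equations for a fifth degree polynomial fit.
i = (1:21)'; j = 1:6;
A = (i-1).^(j-1);                 % a_ij = (i-1)^(j-1), with 0^0 = 1
C = A'*A;
D = diag(1./sqrt(diag(C)));       % scale to unit diagonal, i.e., unit column norms
disp([cond(C), cond(D*C*D)])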

8.2.5 Backward Error Analysis
An algorithm for solving the linear least squares problem is said to be numerically stable if for any data A and b there exist small perturbation matrices and vectors δA and δb such that the computed solution x̄ is the exact solution to

min_x ‖(A + δA)x − (b + δb)‖_2,   (8.2.38)

where ‖δA‖ ≤ τ‖A‖, ‖δb‖ ≤ τ‖b‖, with τ a small multiple of the unit roundoff u. We shall see that methods in which the normal equations are explicitly formed cannot be backward stable. On the other hand, many methods based on orthogonal factorizations have been proved to be backward stable.
Any computed solution x̄ is called a stable solution if it satisfies (8.2.38). This does not mean that x̄ is close to the exact solution x. If the least squares problem is ill-conditioned then a stable solution can be very different from x. For a stable solution the error ‖x − x̄‖ can be estimated using the perturbation results given in Section 8.2.3.
Many special fast methods exist for solving structured least squares problems, e.g., where A is a Toeplitz matrix. These methods cannot be proved to be backward stable, which is one reason why a solution to the following problem is of interest: For a consistent linear system we derived in Section 7.5.2 a simple a posteriori bound for the smallest backward error of a computed solution x̄. The situation is more difficult for the least squares problem. Given an alleged solution x̃, we want to find a perturbation δA of smallest norm such that x̃ is the exact solution to the perturbed problem

min_x ‖(b + δb) − (A + δA)x‖_2.   (8.2.39)

If we could find the backward error of smallest norm, this could be used to verify numerically the stability properties of an algorithm.
There is not much loss in assuming that δb = 0 in (8.2.39). Then the optimal backward error in the Frobenius norm is

η_F(x̃) = min { ‖δA‖_F  |  x̃ solves min_x ‖b − (A + δA)x‖_2 }.   (8.2.40)

Thus, the optimal backward error can be found by characterizing the set of all backward perturbations and then finding an optimal bound, which minimizes the Frobenius norm.

Theorem 8.2.6.
Let x̃ be an alleged solution and r̃ = b − Ax̃ ≠ 0. The optimal backward error in the Frobenius norm is

η_F(x̃) = ‖A^T r̃‖_2/‖r̃‖_2,   if x̃ = 0,
η_F(x̃) = min{ η, σ_min([A  C]) },   otherwise,   (8.2.41)

where

η = ‖r̃‖_2/‖x̃‖_2,   C = I − (r̃ r̃^T)/‖r̃‖_2^2,

and σ_min([A C]) denotes the smallest (nonzero) singular value of the matrix [A C] ∈ R^{m×(n+m)}.

The task of computing η_F(x̃) is thus reduced to that of computing σ_min([A C]). Since this is expensive, approximations that are accurate and less costly have been derived. If a QR factorization of A is available, lower and upper bounds for η_F(x̃) can be computed in only O(mn) operations. Let r̃_1 = P_{R(A)} r̃ be the orthogonal projection of r̃ onto the range of A. If ‖r̃_1‖_2 ≤ α‖r̃‖_2 it holds that

((√5 − 1)/2) σ_1 ≤ η_F(x̃) ≤ √(1 + α^2) σ_1,   (8.2.42)

where

σ_1 = ‖(A^T A + ηI)^{-1/2} A^T r̃‖_2 / ‖x̃‖_2.   (8.2.43)

Since α → 0 for small perturbations, σ_1 is an asymptotic upper bound.
A simple way to improve the accuracy of a solution x̄ computed by the method of normal equations is by fixed precision iterative refinement; see Section 7.7.8. This requires that the data matrix A is saved and used to compute the residual vector b − Ax̄. In this way information lost when A^T A was formed can be recovered. If also the corrections are computed from the normal equations, we obtain the following algorithm:

Iterative Refinement with Normal Equations: Set x_1 = x̄, and for s = 1, 2, ... until convergence do

r_s := b − Ax_s,   R^T R δx_s = A^T r_s,
x_{s+1} := x_s + δx_s.

Here R is computed by Cholesky factorization of the matrix of normal equations A^T A. This algorithm only requires one matrix–vector multiplication each with A and A^T and the solution of two triangular systems. Note that the first step is identical to solving the normal equations. It can be shown that initially the errors will be reduced with rate of convergence equal to

ρ̄ = c u κ′(A^T A),   (8.2.44)

where c is a constant depending on the dimensions m, n. Several steps of the refinement may be needed to get good accuracy. (Note that ρ̄ is proportional to κ′(A^T A) even when no scaling of the normal equations has been performed!)
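A minimal MATLAB sketch of the refinement iteration above (our own code, with a fixed number of steps rather than a convergence test):

function x = nerefine(A, b, nsteps)
% NEREFINE solves min ||A*x - b||_2 by the normal equations
% followed by fixed precision iterative refinement.
R = chol(A'*A);                 % Cholesky factor of A'A
x = R\(R'\(A'*b));              % x_1: normal equations solution
for s = 1:nsteps
   r  = b - A*x;                % residual computed with the original data A
   dx = R\(R'\(A'*r));          % correction from R'R*dx = A'r
   x  = x + dx;
end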

Example 8.2.6.
If κ′(A^T A) = κ(A^T A) and c ≈ 1 the error will be reduced to a backward stable level in p steps if κ^{1/2}(A^T A) ≤ u^{−p/(2p+1)}. (As remarked before, κ^{1/2}(A^T A) is the condition number for a small residual problem.) For example, with u = 10^{-16}, the maximum values of κ^{1/2}(A^T A) for different values of p are:

10^{5.3},   10^{6.4},   10^{8},   for p = 1, 2, ∞.

For moderately ill-conditioned problems the normal equations combined with iterative refinement can give very good accuracy. For more ill-conditioned problems the methods based on QR factorization described in Sections 8.3 and 8.4 are usually to be preferred.

8.2.6 The Peters–Wilkinson method
Standard algorithms for solving nonsymmetric linear systems Ax = b are usually based on LU factorization with partial pivoting. Therefore it seems natural to consider such factorizations, in particular for least squares problems which are only mildly over- or under-determined, i.e., where |m − n| ≪ n.
A rectangular matrix A ∈ R^{m×n}, m ≥ n, can be reduced by Gaussian elimination to an upper trapezoidal form U. In general, column interchanges are needed to ensure numerical stability. Usually it will be sufficient to use partial pivoting combined with a check for linear dependence. Let a_{q,p+1} be the element of largest magnitude in column p+1. If |a_{q,p+1}| < tol, column p+1 is considered to be linearly dependent and is placed last. We then look for a pivot element in column p+2, etc. In the full column rank case, rank(A) = n, the resulting LDU factorization becomes

Π_1 A Π_2 = [ A_1; A_2 ] = LDU = [ L_1; L_2 ] D U,   (8.2.45)

where L_1 ∈ R^{n×n} is unit lower triangular, D diagonal, and U ∈ R^{n×n} is unit upper triangular and nonsingular. Thus, the matrix L has the same dimensions as A and a lower trapezoidal structure. Computing this factorization requires (1/2)n^2(m − n/3) flops.
Using the LU factorization (8.2.45) and setting x̃ = Π_2^T x, b̃ = Π_1 b, the least squares problem min_x ‖Ax − b‖_2 is reduced to

min_y ‖Ly − b̃‖_2,   DU x̃ = y.   (8.2.46)

If partial pivoting by rows is used in the factorization (8.2.45), then L is usually a well-conditioned matrix. In this case the solution to the least squares problem (8.2.46) can be computed from the normal equations

L^T L y = L^T b̃,

without substantial loss of accuracy. This is the approach taken by Peters and Wilkinson [484].

Forming the symmetric matrix L^T L requires (1/2)n^2(m − 2n/3) flops, and computing its Cholesky factorization takes n^3/6 flops. Hence, neglecting terms of order n^2, the total number of flops to compute the least squares solution by the Peters–Wilkinson method is n^2(m − n/3). Although this is always more expensive than the standard method of normal equations, it is a more stable method, as the following example shows. The method is particularly suitable for weighted least squares problems; see Section 8.6.1.
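A compact sketch of the Peters–Wilkinson method (our own code; it relies on MATLAB's lu with row pivoting, assumes A has full column rank, and absorbs the diagonal factor D into U):

function x = peterswilkinson(A, b)
% PETERSWILKINSON solves min ||A*x - b||_2 via an LU factorization
% P*A = L*U with L m-by-n unit lower trapezoidal (row pivoting),
% followed by normal equations for the well-conditioned factor L.
[L, U, P] = lu(A);              % L is m-by-n, U is n-by-n upper triangular
bt = P*b;
y  = (L'*L)\(L'*bt);            % least squares problem min ||L*y - P*b||_2
x  = U\y;                       % back substitution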

Example 8.2.7. (Noble [451, ]) Consider the matrix A and its pseudo-inverse

1 1 1 1 1 1 2 2 3ǫ− 2 + 3ǫ− A = 1 1 + ǫ− , A† = − 1 1 .  1  6 0 3ǫ− 3ǫ− 1 1 ǫ−  −  −   The (exact) matrix of normal equations is

3 3 ATA = . 3 3+2ǫ2   If ǫ √u, then in floating point computation fl(3 + 2ǫ2) = 3, and the computed matrix≤ fl(ATA) has rank one. The LU factorization is

1 0 1 0 1 1 A = LDU = 1 1 ,   0 ǫ 0 1 1 1    −   where L and U are well-conditioned. The correct pseudo-inverse is now obtained from 1 ǫ 1/3 0 1 1 1 A = U 1D 1(LT L) 1LT = . † − − − 0 −ǫ 0 1/2 0 1 1    −  and there is no cancellation.

As seen in Example 8.2.4, weighted least squares problems of the form

min_x ‖ [ γA_1; A_2 ] x − [ γb_1; b_2 ] ‖_2,   (8.2.47)

where γ ≫ 1, are not suited to a direct application of the method of normal equations. If p = rank(A_1) steps of Gaussian elimination with pivoting are applied, the resulting factorization can be written

Π_1 [ γA_1; A_2 ] Π_2 = LDU,   (8.2.48)

where Π_1 and Π_2 are permutation matrices, and

L = [ L_{11}  0; L_{21}  L_{22} ] ∈ R^{m×n},   U = [ U_{11}  U_{12}; 0  I ] ∈ R^{n×n}.

Here L_{11} ∈ R^{p×p} is unit lower triangular, and U_{11} ∈ R^{p×p} is unit upper triangular. Assuming that A has full rank, D is nonsingular. Then (8.2.47) is equivalent to

min_y ‖Ly − Π_1 b‖_2,   DUΠ_2^T x = y.

The least squares problem in y is usually well-conditioned, since any ill-conditioning from the weights is usually reflected in D. Therefore, it can be solved by forming the normal equations.

Review Questions
2.1 Give a necessary and sufficient condition for x to be a solution to min_x ‖Ax − b‖_2, and interpret this geometrically. When is the least squares solution x unique? When is r = b − Ax unique?
2.2 What are the advantages and drawbacks of the method of normal equations for computing the least squares solution of Ax = b? Give a simple example which shows that loss of information can occur in forming the normal equations.
2.3 Discuss how the accuracy of the method of normal equations can be improved by (a) scaling the columns of A, (b) iterative refinement.
2.4 Show that the more accurate formula in Example 8.2.1 can be interpreted as a special case of the method (8.5.7)–(8.5.8) for partitioned least squares problems.
2.5 (a) Let A ∈ R^{m×n} with m < n. Show that A^T A is singular.
(b) Show, using the SVD, that rank(A^T A) = rank(AA^T) = rank(A).
2.6 Define the condition number κ(A) of a rectangular matrix A. What terms in the perturbation of a least squares solution depend on κ and κ^2, respectively?

Problems
2.1 In order to estimate the height above sea level for three points, A, B, and C, the difference in altitude was measured between these points and points D, E, and F at sea level. The measurements obtained form a linear system in the heights x_A, x_B, and x_C of A, B, and C,

[  1   0   0            [ 1
   0   1   0              2
   0   0   1   [ x_A      3
  −1   1   0     x_B   =  1
   0  −1   1     x_C ]    2
  −1   0   1 ]            1 ].

Show that the least squares solution and residual vector are

x = (1/4)(5, 7, 12)^T,   r = (1/4)(−1, 1, 0, 2, 3, −3)^T,

and verify that the residual vector is orthogonal to all columns in A.
2.2 (a) Consider the linear regression problem of fitting y(t) = α + β(t − c) by the method of least squares to the data

t      1      3      4      6     7
f(t)  −2.1  −0.9   −0.6    0.6   0.9

With the (unsuitable) choice c = 1,000 the normal equations

[ 5      4979    ] [ x_0 ]   [    2.1 ]
[ 4979   4958111 ] [ x_1 ] = [ −2097.3 ]

become very ill-conditioned. Show that if the element 4958111 is rounded to 4958·10^3 then β is perturbed from its correct value 0.5053 to 0.1306!
(b) As shown in Example 8.2.1, a much better choice of basis functions is obtained by shifting with the mean value of t, i.e., taking c = 4.2. It is not necessary to shift with the exact mean. Show that shifting with 4, the midpoint of the interval (1, 7), leads to a very well-conditioned system of normal equations.

2.3 Denote by x_V the solution to the weighted least squares problem with covariance matrix V. Let x be the solution to the corresponding unweighted problem (V = I). Using the normal equations, show that

x_V − x = (A^T V^{-1} A)^{-1} A^T (V^{-1} − I)(b − Ax).   (8.2.49)

Conclude that weighting the rows affects the solution if b ∉ R(A).
2.4 Assume that rank(A) = n, and put Ā = (A, b) ∈ R^{m×(n+1)}. Let the corresponding cross product matrix and its Cholesky factor be

C̄ = Ā^T Ā = [ C  d; d^T  b^T b ],   R̄ = [ R  z; 0  ρ ].

Show that the solution x and the residual norm ρ of the linear least squares problem min_x ‖b − Ax‖_2 are given by

Rx = z,   ‖b − Ax‖_2 = ρ.

2.5 Let A ∈ R^{m×n} and rank(A) = n. Show that the minimum norm solution of the underdetermined system A^T y = c can be computed as follows:
(i) Form the matrix A^T A, and compute its Cholesky factorization A^T A = R^T R.
(ii) Solve the two triangular systems R^T z = c, Rx = z, and compute y = Ax.

2.6 (a) Let A = (A_1  A_2) ∈ R^{m×n} be partitioned so that the columns in A_1 are linearly independent. Show that for the matrix of normal equations

A^T A = [ A_1^T A_1   A_1^T A_2;  A_2^T A_1   A_2^T A_2 ]

the Schur complement of A_1^T A_1 in A^T A can be written in factored form as

S = A_2^T ( I − A_1 (A_1^T A_1)^{-1} A_1^T ) A_2,

where P_1 = A_1 (A_1^T A_1)^{-1} A_1^T is the orthogonal projection onto R(A_1).
(b) Consider the partitioned least squares problem

min_{x_1, x_2} ‖ (A_1  A_2) [ x_1; x_2 ] − b ‖_2^2.

Show that the solution can be obtained by first solving

min_{x_2} ‖ (I − P_1)(A_2 x_2 − b) ‖_2^2

for x_2 and then computing x_1 as the solution to the problem

min_{x_1} ‖ A_1 x_1 − (b − A_2 x_2) ‖_2^2.

2.7 (S. M. Stigler [555].) In 1793 the French decided to base the new metric system upon a unit, the meter, equal to one 10,000,000th part of the distance from the north pole to the equator along a meridian arc through Paris. The following famous data obtained in a 1795 survey consist of four measured subsections of an arc from Dunkirk to Barcelona. For each subsection the length of the arc S (in modules), the degrees d of latitude and the latitude L of the midpoint (determined by astronomical observations) are given.

Segment                    Arc length S   degrees d    Midpoint latitude L
Dunkirk to Pantheon        62472.59       2.18910◦     49◦ 56′ 30′′
Pantheon to Evaux          76145.74       2.66868◦     47◦ 30′ 46′′
Evaux to Carcassone        84424.55       2.96336◦     44◦ 41′ 48′′
Carcassone to Barcelona    52749.48       1.85266◦     42◦ 17′ 20′′

If the earth is ellipsoidal, then to a good approximation it holds

z + y sin^2(L) = S/d,

where z and y are unknown parameters. The meridian quadrant then equals M = 90(z + y/2) and the eccentricity e is found from 1/e = 3(z/y + 1/2). Use least squares to determine z and y and then M and 1/e.

2.8 Show that if A, B ∈ R^{m×n} and rank(B) ≠ rank(A), then it is not possible to bound the difference between A† and B† in terms of the difference B − A.
Hint: Use the following example. Let ǫ ≠ 0, σ ≠ 0, take

A = [ σ  0; 0  0 ],   B = [ σ  ǫ; ǫ  0 ],

and show that ‖B − A‖_2 = ǫ, ‖B† − A†‖_2 > 1/ǫ.
2.9 Show that for any matrix A it holds

A† = lim_{µ→0} (A^T A + µ^2 I)^{-1} A^T = lim_{µ→0} A^T (AA^T + µ^2 I)^{-1}.   (8.2.50)

2.10 (a) Let A = (a_1, a_2), where a_1^T a_2 = cos γ, ‖a_1‖_2 = ‖a_2‖_2 = 1. Hence, γ is the angle between the vectors a_1 and a_2. Determine the singular values and right singular vectors v_1, v_2 of A by solving the eigenvalue problem for

A^T A = [ 1  cos γ; cos γ  1 ].

Then determine the left singular vectors u_1, u_2 from (7.1.33).
(b) Show that if γ ≪ 1, then σ_1 ≈ √2 and σ_2 ≈ γ/√2 and

u_1 ≈ (a_1 + a_2)/2,   u_2 ≈ (a_1 − a_2)/γ.

2.11 The least squares problem min_x ‖Ax − b‖_2, where

A = [ 1  1  1
      ǫ  0  0
      0  ǫ  0
      0  0  ǫ ],   b = [ 1; 0; 0; 0 ],

is of the form (8.2.47). Compute the factorization A = LDU that is obtained after one step of Gaussian elimination and show that L and U are well-conditioned. Compute the solution from

L^T L y = L^T b,   Ux = D^{-1} y,

by solving the system of normal equations for y and solving for x by back-substitution.

8.3 Orthogonal Factorizations
8.3.1 Elementary Orthogonal Matrices
When solving linear equations we made use of elementary elimination matrices of the form L_j = I + l_j e_j^T, see (7.2.17). These were used to describe the elementary steps in Gaussian elimination and LU factorization. We now introduce elementary orthogonal matrices, which are equal to the unit matrix modified by a matrix of rank one. Such matrices are a flexible and very useful tool for constructing matrix algorithms for a variety of problems in linear algebra. This stems from the fact that, because they preserve the Euclidean norm, their use leads to numerically stable algorithms.

Figure 8.3.1. A Householder reflection of the vector a.

Elementary orthogonal matrices of the form

P = I − 2 (uu^T)/(u^T u),   u ≠ 0,   (8.3.1)

are called Householder reflections. It is easily seen that P is symmetric and, setting β = 2/(u^T u), we have

P^T P = P^2 = I − 2βuu^T + β^2 u(u^T u)u^T = I.

It follows that P is orthogonal and P^{-1} = P. For any vector x,

Px = (I − βuu^T)x = x − β(u^T x)u ∈ span[x, u].

In particular, Pu = −u, that is, P reverses u, and if x ⊥ u then Px = x. The effect of the transformation Px for a general vector x is to reflect x in the (m − 1)-dimensional hyperplane with normal vector u; see Figure 8.3.1. This is equivalent to subtracting twice the orthogonal projection of x onto u. Note that the normal u is parallel to the vector (x − Px).
The use of elementary reflectors in numerical linear algebra was made popular in matrix computation by Householder [341]. Matrices of the form (8.3.1) are therefore often called Householder reflectors and the vector u is called a Householder vector.
In applications of Householder reflectors the following construction is central. Given a nonzero vector x ∈ R^m we want to construct a plane reflection such that multiplication by P zeros all components except the first in x, i.e.,

Px = ±σ e_1,   σ = ‖x‖_2.   (8.3.2)

Here e_1 is the first unit vector and the second equation is a consequence of the fact that P is orthogonal. Multiplying (8.3.2) from the left by P and using P^2 = I, we find that

P e_1 = ±x/‖x‖_2,

i.e., this task is equivalent to finding a square orthogonal matrix P with its first column proportional to x. It is easily seen that (8.3.2) is satisfied if we take

u = [ u_1; x_2 ] = x ∓ σ e_1 = [ ξ_1 ∓ σ; x_2 ],   x = [ ξ_1; x_2 ].   (8.3.3)

Note that the Householder vector u differs from x only in its first component. A short calculation shows that

1/β = (1/2) u^T u = (1/2)(x ∓ σ e_1)^T (x ∓ σ e_1) = (1/2)(σ^2 ∓ 2σξ_1 + σ^2) = σ(σ ∓ ξ_1).

If x is close to a multiple of e_1, then σ ≈ |ξ_1| and cancellation may lead to a large relative error in β.29 To avoid this we take

u_1 = sign(ξ_1)(|ξ_1| + σ),   β = 1/(σ(σ + |ξ_1|)),   (8.3.4)

which gives

Px = −sign(ξ_1) σ e_1 = σ̂ e_1.

Note that with this choice of sign the vector x = e_1 will be mapped onto −e_1.

Algorithm 8.1. Construct Householder reflection.

function [u,beta,sigma] = house(x)
% HOUSE computes a Householder reflection P = I - beta*u*u'
% such that P*x = sigma*e_1. The vector u agrees with x except
% in its first component, cf. (8.3.3)-(8.3.4).
u = x;
sigma = norm(x);
if sigma == 0
   u(1) = 1; beta = 0;          % x = 0: take P = I, i.e., skip the transformation
else
   u1 = abs(x(1)) + sigma;
   beta = 1/(sigma*u1);
   s1 = sign(x(1));
   if s1 == 0, s1 = 1; end      % treat a zero first component as positive
   u(1) = s1*u1;
   sigma = -s1*sigma;
end
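The reflection is applied without ever forming P explicitly, as in (8.3.6) below. A short usage sketch (our own) that zeros the subdiagonal of the first column of a matrix A:

A = [4 2; 3 1; 0 5];
[u, beta, sigma] = house(A(:,1));   % P*A(:,1) = sigma*e_1
A = A - beta*u*(u'*A);              % apply P = I - beta*u*u' to all columns
disp(A)                             % first column is now (sigma, 0, 0)'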

The Householder reflection in (8.3.1) does not depend on the scaling of u. It is often more convenient to scale u so that its first component equals 1. Then P = I − βuu^T, where

u = [ 1; u_2 ],   u_2 = (sign(ξ_1)/(σ + |ξ_1|)) x_2,   β = 1 + |ξ_1|/σ.   (8.3.5)

29 It is possible to rewrite the formula in (8.3.4) for β so that the other choice of sign does not give rise to numerical cancellation; see Parlett [478, pp. 91].

(Check this!) This scaling has the advantage that we can stably reconstruct β from u_2 using

β = 2/(u^T u) = 2/(1 + u_2^T u_2).

The following algorithm constructs a Householder reflection P = I − βuu^T, where u^T e_1 = 1, such that

Px = −sign(ξ_1) σ e_1,   σ = ‖x‖_2,

where ‖x‖_2 = σ ≠ 0 and x^T e_1 = ξ_1. Note that if x = 0, then we can just take P = I, i.e., skip the transformation.
If a matrix A = (a_1, ..., a_n) ∈ R^{m×n} is premultiplied by P the product is PA = (Pa_1, ..., Pa_n), where

P a_j = a_j − β(u^T a_j) u.   (8.3.6)

An analogous formula exists for postmultiplying A with P, where P now acts on the rows of A. Hence, such products can be computed without explicitly forming P itself. The products

PA = A − βu(u^T A),   AP = A − β(Au)u^T,

can be computed in 4mn flops using one matrix–vector product followed by a rank one update.
Householder reflectors can be generalized to the complex case. A unitary Householder matrix has the form

P = I − βuu^H,   β = 2/(u^H u),   u ∈ C^n.   (8.3.7)

It is easy to check that P is Hermitian and unitary,

P^H = P = P^{-1}.

Let z ∈ C^n and u be such that

Pz = σ e_1.

Then |σ| = ‖z‖_2, but it is in general not possible to have σ real. Since P is Hermitian, z^H Pz = σ z^H e_1 must be real. If we denote the first component of z by z_1 = e^{iθ_1}|z_1|, we can take

σ = ‖z‖_2 e^{iθ_1}.   (8.3.8)

Then in (8.3.7) u = z + σ e_1 and

1/β = (1/2)( ‖z‖_2^2 + 2|σ||z_1| + |σ|^2 ) = ‖z‖_2 ( ‖z‖_2 + |z_1| ).   (8.3.9)

Note that with unitary matrices of the form e^{iθ_1}P we can reduce an arbitrary vector to the form k e_1 with k real.

Another useful class of elementary orthogonal transformations are plane rotations, also called Givens rotations. A rotation clockwise through an angle θ in R^2 is represented by the matrix

G = [ cos θ   sin θ; −sin θ   cos θ ] = [ c  s; −s  c ].   (8.3.10)

Note that G^{-1}(θ) = G(−θ), and det G(θ) = +1. Such transformations are also known as Givens rotations.30

30 Named after Wallace Givens, who used them in [257] to reduce matrices to a simpler form.

In R^n the matrix G_{ij}(θ) representing a rotation in the plane spanned by the unit vectors e_i and e_j, i < j, equals the identity except for the elements (i,i) = (j,j) = c and (i,j) = s, (j,i) = −s. Premultiplication by G_{ij} affects only the components α_i and α_j of a vector:

G_{ij} [ α_i; α_j ] = [ cα_i + sα_j; −sα_i + cα_j ].   (8.3.12)

A plane rotation may be multiplied into a vector at a cost of two additions and four multiplications. We can determine the rotation G_{ij}(θ) so that the jth component becomes zero by taking

c = α_i/σ,   s = α_j/σ.   (8.3.13)

Then

G_{ij} [ α_i; α_j ] = [ σ; 0 ],   σ = (α_i^2 + α_j^2)^{1/2} ≥ 0.

The following procedure computes the Givens rotation G in a way that guards against possible overflow. Note that c and s are only determined up to a common factor ±1. If a nonnegative σ is required we can just use −G if needed.

Algorithm 8.2. Construct Givens rotation.

function [c,s,r] = givens(a,b)
% GIVENS computes c and s in a Givens rotation such that
% 0 = -s*a + c*b, and r = c*a + s*b
if b == 0
   c = 1.0; s = 0.0; r = a;
elseif abs(b) > abs(a)
   t = a/b; tt = sqrt(1+t*t);
   s = 1/tt; c = t*s; r = tt*b;
else
   t = b/a; tt = sqrt(1+t*t);
   c = 1/tt; s = t*c; r = tt*a;
end

This algorithm requires 5 flops and one square root. No trigonometric functions are involved.

Example 8.3.1. The polar representation of a complex number z = x + iy can be computed by a function call [c,s,r] = givens(x,y). This gives z = |r| e^{iθ}, where e^{iθ} = z/|r|, and

z = |r|(c + is)         if r ≥ 0,
z = |r|(−c + i(−s))     if r < 0.

Premultiplication of a matrix A ∈ R^{m×n} with a Givens rotation G_{ij} will only affect the two rows i and j in A, which are transformed according to

a_{ik} := c a_{ik} + s a_{jk},   (8.3.14)
a_{jk} := −s a_{ik} + c a_{jk},   k = 1 : n.   (8.3.15)

The product requires 4n multiplications and 2n additions, or 6n flops. An analogous algorithm, which only affects columns i and j, exists for postmultiplying A with G_{ij}.
An arbitrary vector x ∈ R^n can be transformed into σ e_1 with σ = ‖x‖_2 ≥ 0 using a sequence of Givens rotations. Let G_{1k}, k = 2 : n, be a sequence of Givens rotations, where G_{1k} is determined to zero the kth component in the vector x. Then

G_{1n} ··· G_{13} G_{12} x = σ e_1.

Note that G_{1k} will not destroy previously introduced zeros. Another possible sequence is G_{k−1,k}, k = n : −1 : 2, with G_{k−1,k} chosen to zero the kth component.
The matrix G in (8.3.10) has determinant equal to +1. We could equally well work with Givens reflectors, which have the form

G = [ cos θ   sin θ; sin θ   −cos θ ].   (8.3.16)

Using trigonometric identities,

G = I − 2uu^T,   u = [ sin(θ/2); −cos(θ/2) ],

which shows the relation to 2×2 Householder matrices.
For complex transformations we can use unitary Givens rotations of the form

G = [ c̄  s̄; −s  c ],   c = e^{iγ} cos θ,   s = e^{iδ} sin θ.   (8.3.17)

From c̄c + s̄s = cos^2 θ + sin^2 θ = 1 it follows that G^H G = I, i.e., G is unitary. Given an arbitrary complex vector (z_1  z_2)^T ∈ C^2 we have

G [ z_1; z_2 ] = [ c̄z_1 + s̄z_2; −sz_1 + cz_2 ] = [ σ; 0 ],   σ^2 = |z_1|^2 + |z_2|^2 > 0,   (8.3.18)

provided that c = z_1/σ, s = z_2/σ.
A vector x ∈ C^n can be transformed into σ e_1 by successive premultiplication with (n − 1) unitary plane rotations in planes (1,2), (1,3), ..., (1,n). The rotations may be chosen so that

σ = (x^H x)^{1/2} = ‖x‖_2

is real and nonnegative.

Example 8.3.2. Often one needs to represent an orthogonal rotation Q ∈ R^{3×3}, det(Q) = 1, as three successive plane rotations or by the angles of these rotations. The classical choice corresponds to a product of three Givens rotations

G23(φ)G12(θ)G23(ψ)Q = I.

The three angles φ,θ, and ψ are called the Euler angles.

[ 1   0   0  ] [ c_2  s_2  0 ] [ 1   0   0  ] [ a_{11} a_{12} a_{13} ]   [ 1 0 0 ]
[ 0   c_3  s_3 ] [ −s_2 c_2  0 ] [ 0   c_1  s_1 ] [ a_{21} a_{22} a_{23} ] = [ 0 1 0 ].
[ 0  −s_3  c_3 ] [ 0    0   1 ] [ 0  −s_1  c_1 ] [ a_{31} a_{32} a_{33} ]   [ 0 0 1 ]

Here the first Givens rotation G_{23}(ψ) is used to zero the element a_{31}. Next G_{12}(θ) is used to zero the element a_{21}. Finally, G_{23}(φ) is used to zero the element a_{32}. The final product is orthogonal and lower triangular and thus must equal the unit matrix.
A problem with this representation is that the Euler angles may not depend continuously on the data. If Q equals the unit matrix plus small terms, then a small perturbation may change an angle by as much as 2π. A different set of angles based on zeroing the elements in the order a_{21}, a_{31}, a_{32} is to be preferred. This corresponds to a product of three Givens rotations

G23(φ)G13(θ)G12(ψ)Q = I.

This yields a continuous representation of Q. 8.3. Orthogonal Factorizations 239
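The second, continuous parametrization is easily computed with the givens function of Algorithm 8.2. The sketch below (ours; the function name rotangles is not from the text) zeroes the elements q_{21}, q_{31}, q_{32} in turn and returns the three cosine–sine pairs; the reduced matrix is diagonal with entries ±1.

function G = rotangles(Q)
% ROTANGLES reduces an orthogonal 3x3 matrix Q to a diagonal matrix
% with entries +-1 by three Givens rotations zeroing q21, q31, q32.
% Row k of G holds the pair (c, s) of the kth rotation.
G = zeros(3,2);
[c,s,~] = givens(Q(1,1), Q(2,1));        % zero the (2,1) element
Q([1 2],:) = [c s; -s c]*Q([1 2],:);  G(1,:) = [c s];
[c,s,~] = givens(Q(1,1), Q(3,1));        % zero the (3,1) element
Q([1 3],:) = [c s; -s c]*Q([1 3],:);  G(2,:) = [c s];
[c,s,~] = givens(Q(2,2), Q(3,2));        % zero the (3,2) element
Q([2 3],:) = [c s; -s c]*Q([2 3],:);  G(3,:) = [c s];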

Givens rotations are ubiquitous in matrix algorithms and are used to transform a matrix to a more compact form. To illustrate the rotation pattern it is convenient to use a Wilkinson diagram. The diagram below shows the zeroing of the (4,2) element in A by a rotation of rows 2 and 4.

    [ × × × ×
 →    ⊗ × × ×
      ⊗ × × ×
 →    ⊗ ⊗ × ×
      ⊗ × × ×
      ⊗ × × × ]

In a Wilkinson diagram × stands for a (potential) nonzero element and ⊗ for a nonzero element that has been zeroed out. The arrows point to the rows that took part in the last rotation.
It is essential to note that the matrix G_{ij} is never explicitly formed, but is represented by (i, j) and the two numbers c and s. When a large number of rotations need to be stored it is more economical to store just a single number, from which c and s can be retrieved in a numerically stable way. Since the formula √(1 − x^2) is poor if |x| is close to unity, a slightly more complicated method than storing just c or s is needed. In a scheme devised by Stewart [544] one stores the number c or s of smallest magnitude. To distinguish between the two cases one stores the reciprocal of c. More precisely, if c ≠ 0 we store

ρ = s,     if |s| < |c|;
ρ = 1/c,   if |c| ≤ |s|.

In case c = 0 we put ρ = 1, a value that cannot appear otherwise. To reconstruct the Givens rotation, if ρ = 1, we take s = 1, c = 0, and

s = ρ,    c = √(1 − s^2),   if |ρ| < 1;
c = 1/ρ,  s = √(1 − c^2),   if |ρ| > 1.

It is possible to rearrange the Givens rotations so that they use only two instead of four multiplications per element and no square root. These modified transformations, called "fast" Givens transformations, are described in Golub and Van Loan [277, Sec. 5.1.13].
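Stewart's compact storage is only a few lines of code; the two helper functions below (our own names) encode a rotation as the single number ρ and recover (c, s) from it. In the decoding step the element that was not stored is recovered with a nonnegative sign.

function rho = grot_encode(c, s)
% GROT_ENCODE stores a Givens rotation (c, s) as a single number rho.
if c == 0
   rho = 1;
elseif abs(s) < abs(c)
   rho = s;
else
   rho = 1/c;
end

function [c, s] = grot_decode(rho)
% GROT_DECODE recovers (c, s) from rho as described in the text.
if rho == 1
   c = 0; s = 1;
elseif abs(rho) < 1
   s = rho;  c = sqrt(1 - s^2);
else
   c = 1/rho; s = sqrt(1 - c^2);
end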

8.3.2 Householder QR Factorization
Orthogonality plays a key role in least squares problems; see Theorem 8.2.1. By using methods directly based on orthogonality the squaring of the condition number that results from forming the normal equations can be avoided.
We first show that any matrix A ∈ R^{m×n} (m ≥ n) can be factored into the product of a square orthogonal matrix Q ∈ R^{m×m} and an upper triangular matrix R ∈ R^{n×n} with positive diagonal elements.

Theorem 8.3.1. (The Full QR Factorization)
Let A ∈ R^{m×n} with rank(A) = n. Then there is a square orthogonal matrix Q ∈ R^{m×m} and an upper triangular matrix R with positive diagonal elements such that

A = Q [ R; 0 ] = ( Q_1  Q_2 ) [ R; 0 ].   (8.3.19)

Proof. The proof is by induction on n. Let A be partitioned in the form A = (a_1, A_2), a_1 ∈ R^m, where ρ = ‖a_1‖_2 > 0. Put y = a_1/ρ, and let U = (y, U_1) be an orthogonal matrix. Then since U_1^T a_1 = 0 the matrix U^T A must have the form

U^T A = [ ρ   y^T A_2; 0   U_1^T A_2 ] = [ ρ   r^T; 0   B ],

where B ∈ R^{(m−1)×(n−1)}. For n = 1, A_2 is empty and the theorem holds with Q = U and R = ρ, a scalar. If n > 1 then rank(B) = n − 1 > 0, and by the induction hypothesis there is an orthogonal matrix Q̃ such that Q̃^T B = [ R̃; 0 ]. Then (8.3.19) will hold if we define

Q = U [ 1  0; 0  Q̃ ],   R = [ ρ  r^T; 0  R̃ ].

The proof of this theorem gives a way to compute Q and R, provided we can construct an orthogonal matrix U = (y, U_1) given its first column. Several ways to perform this construction using elementary orthogonal transformations were given in Section 8.3.1.
Note that from the form of the decomposition (8.3.19) it follows that R has the same singular values and right singular vectors as A. A relationship between the Cholesky factorization of A^T A and the QR factorization of A is given next.

Lemma 8.3.2.
Let A ∈ R^{m×n} have rank n. Then if the R factor in the QR factorization of A has positive diagonal elements it equals the Cholesky factor of A^T A.

Proof. If rank(A) = n then the matrix A^T A is nonsingular. Then its Cholesky factor R_C is uniquely determined, provided that R_C is normalized to have a positive diagonal. From (8.3.54) we have A^T A = R^T Q_1^T Q_1 R = R^T R, and hence R = R_C. Since Q_1 = AR^{-1} the matrix Q_1 is also uniquely determined.

The QR factorization can be written

A = ( Q_1  Q_2 ) [ R; 0 ] = Q_1 R,   (8.3.20)

where the square orthogonal matrix Q ∈ R^{m×m} has been partitioned as

Q = (Q_1, Q_2),   Q_1 ∈ R^{m×n},   Q_2 ∈ R^{m×(m−n)}.

Here A = Q1R is called the thin QR factorization. From (8.3.20) it follows that the columns of Q1 and Q2 form orthonormal bases for the range space of A and its orthogonal complement,

R(A) = R(Q_1),   N(A^T) = R(Q_2),   (8.3.21)

and the corresponding orthogonal projections are

P_{R(A)} = Q_1 Q_1^T,   P_{N(A^T)} = Q_2 Q_2^T.   (8.3.22)

Note that although the matrix Q_1 in (8.3.20) is uniquely determined, Q_2 can be any orthogonal matrix with range N(A^T). The matrix Q is implicitly defined as a product of Householder or Givens matrices.
The QR factorization of a matrix A ∈ R^{m×n} of rank n can be computed using a sequence of n Householder reflectors. Let A = (a_1, a_2, ..., a_n), σ_1 = ‖a_1‖_2, and choose P_1 = I − β_1 u_1 u_1^T so that

P_1 a_1 = P_1 [ α_1; â_1 ] = [ r_{11}; 0 ],   r_{11} = −sign(α_1) σ_1.

By (8.3.4) we achieve this by choosing β_1 = 1 + |α_1|/σ_1,

u_1 = [ 1; û_1 ],   û_1 = sign(α_1) â_1/ρ_1,   ρ_1 = σ_1 β_1.

P_1 is then applied to the remaining columns a_2, ..., a_n, giving

A^(2) = P_1 A = [ r_{11}  r_{12}  ...  r_{1n}
                  0       ã_{22}  ...  ã_{2n}
                  ...
                  0       ã_{m2}  ...  ã_{mn} ].

Here the first column has the desired form and, as indicated by the notation, the first row is the final first row in R. In the next step the (m−1)×(n−1) block in the lower right corner is transformed. All remaining steps, k = 2 : n, are similar to the first. Before the kth step we have computed a matrix of the form

A^(k) = [ R_{11}^(k)   R_{12}^(k); 0   Â^(k) ],   (8.3.23)

where the first k−1 rows of A^(k) are rows in the final matrix R, and R_{11}^(k) is upper triangular. In step k the matrix Â^(k) is transformed,

A^(k+1) = P_k A^(k),   P_k = [ I_{k−1}  0; 0  P̃_k ].   (8.3.24)

Here P̃_k = I − β_k u_k u_k^T is chosen to zero the elements below the main diagonal in the first column of the submatrix

Â^(k) = (a_k^(k), ..., a_n^(k)) ∈ R^{(m−k+1)×(n−k+1)},

i.e., P̃_k a_k^(k) = r_{kk} e_1. With σ_k = ‖a_k^(k)‖_2, using (8.3.3), we get r_{kk} = −sign(a_{kk}^(k)) σ_k, and

û_k = sign(α_k) â_k^(k)/ρ_k,   β_k = 1 + |a_{kk}^(k)|/σ_k,   (8.3.25)

where ρ_k = σ_k β_k. After n steps we have obtained the QR factorization of A, where

R = R_{11}^(n+1),   Q = P_1 P_2 ··· P_n.   (8.3.26)

Note that the diagonal element r_{kk} will be positive if a_{kk}^(k) is negative and negative otherwise. Negative diagonal elements may be removed by multiplying the corresponding rows of R and columns of Q by −1.

Algorithm 8.3. Householder QR Factorization.
The algorithm computes the full QR factorization of a matrix A ∈ R^{m×n} of rank n such that Q = P_1 P_2 ··· P_n, where

P_k = diag(I_{k−1}, P̃_k),   P̃_k = I − β_k u_k u_k^T,   k = 1 : n,   (8.3.27)

are Householder matrices.

for k = 1 : n
   [u_k, β_k, r_{kk}] = house(a_k^(k));
   for j = k+1 : n
      γ_{jk} = β_k u_k^T a_j^(k);
      r_{kj} = a_{kj}^(k) − γ_{jk};
      â_j^(k+1) = â_j^(k) − γ_{jk} û_k;
   end
end
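Algorithm 8.3 translates almost line by line into MATLAB. The sketch below (ours) stores the Householder vectors in a cell array U and the betas in a vector, uses the house function of Algorithm 8.1, and indicates how Q^T is applied to a right hand side.

function [R, U, beta] = houseqr(A)
% HOUSEQR computes the QR factorization of an m-by-n matrix A (m >= n)
% by Householder reflections. Q is returned in factored form:
% U{k} and beta(k) define P_k = diag(I_{k-1}, I - beta(k)*U{k}*U{k}').
[m, n] = size(A);
U = cell(n,1); beta = zeros(n,1);
for k = 1:n
   [u, bk, ~] = house(A(k:m,k));
   A(k:m,k:n) = A(k:m,k:n) - bk*u*(u'*A(k:m,k:n));
   U{k} = u; beta(k) = bk;
end
R = triu(A(1:n,1:n));

% To form d = Q'*b = P_n*...*P_1*b, apply P_1 first, then P_2, and so on:
% for k = 1:n, b(k:m) = b(k:m) - beta(k)*(U{k}'*b(k:m))*U{k}; end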

If m = n the last step can be skipped. The vectors û_k can overwrite the elements in the strictly lower trapezoidal part of A. Thus, all information associated with the factors Q and R can overwrite A. The vector (β_1, ..., β_n) of length n can be recomputed from

β_k = 2/(1 + ‖û_k‖_2^2),

and therefore need not be saved. In step k the application of the Householder transformation to the active part of the matrix requires 4(m−k+1)(n−k) flops. Hence, the total flop count is

4 Σ_{k=1}^{n−1} (m−k+1)(n−k) = 4 Σ_{p=1}^{n−1} ((m−n)p + p(p+1)) ≈ 2(mn^2 − n^3/3).

If m = n this is 4n^3/3 flops.

Theorem 8.3.3.
Let R̄ denote the upper triangular matrix R computed by the Householder QR algorithm. Then there exists an exactly orthogonal matrix Q̂ ∈ R^{m×m} (not the matrix corresponding to exact computation throughout) such that

A + E = Q̂ [ R̄; 0 ],   ‖e_j‖_2 ≤ γ̄_n ‖a_j‖_2,   j = 1 : n,   (8.3.28)

where γ̄_n is defined in (7.1.79).

As has been stressed before, it is usually not advisable to compute the matrix Q in the QR factorization explicitly, even when it is to be used later for computing matrix–vector products. In case

Q = Q^(0) = P_1 P_2 ··· P_n

from the Householder algorithm is explicitly required, it can be accumulated using the backward recurrence

Q^(n) = I_m,   Q^(k−1) = P_k Q^(k),   k = n : −1 : 1,   (8.3.29)

which requires 4(mn(m−n) + n^3/3) flops. (Note that this is more efficient than the corresponding forward recurrence.) By setting

Q^(n) = [ I_n; 0 ],   or   Q^(n) = [ 0; I_{m−n} ],

the matrices Q_1 and Q_2, whose columns span the range space and nullspace of A, respectively, can be similarly computed in 2(mn^2 − n^3/3) and 2m^2 n − 3mn^2 + n^3 flops, respectively; see Problem 8.3.7 (b). For m = n the accumulation of the full Q requires 4n^3/3 flops.

Example 8.3.3.
In structured problems the greater flexibility of Givens rotations is an advantage. An important example is the QR factorization of a Hessenberg matrix

H_n = [ h_{11}  h_{12}  ···  h_{1,n−1}    h_{1,n}
        h_{21}  h_{22}  ···  h_{2,n−1}    h_{2,n}
                h_{32}  ···     ⋮            ⋮
                        ⋱    h_{n−1,n−1}  h_{n−1,n}
                              h_{n,n−1}   h_{n,n}  ] ∈ R^{n×n},

which can be computed efficiently using Givens rotations. We illustrate below the first two steps of the algorithm for n = 5 in a Wilkinson diagram. In the first step a rotation G_{12} in rows (1,2) is applied to zero out the element h_{21}; in the second step

a rotation G_{23} in rows (2,3) is applied to zero out the next subdiagonal element h_{32}, etc.

 →  [ × × × × ×          [ × × × × ×
 →    ⊗ × × × ×       →    ⊗ × × × ×
        × × × ×       →      ⊗ × × ×
          × × ×                × × ×
            × × ]                × × ]

The arrows point to the rows that took part in the last rotation. After n−1 steps all subdiagonal elements have been zeroed out and we have obtained the QR factorization

Q^T H_n = R,   Q^T = G_{n−1,n} ··· G_{23} G_{12}.   (8.3.30)

The first step in the QR factorization takes 6n flops and the total work of this QR factorization is only about 3n^2 flops.
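Using the givens function of Algorithm 8.2 the factorization (8.3.30) takes only a few lines. The code below (ours) overwrites H with R and saves the rotations.

function [H, G] = hessqr(H)
% HESSQR computes the QR factorization of an upper Hessenberg matrix
% by n-1 Givens rotations; on return H holds the triangular factor R and
% row k of G holds the pair (c, s) of the rotation G_{k,k+1}.
n = size(H,1);
G = zeros(n-1,2);
for k = 1:n-1
   [c, s, r] = givens(H(k,k), H(k+1,k));
   H(k,k) = r;  H(k+1,k) = 0;
   H([k k+1],k+1:n) = [c s; -s c]*H([k k+1],k+1:n);
   G(k,:) = [c s];
end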

It is often advisable to use column pivoting in Householder QR factorization. The standard procedure is to choose a pivot column that maximizes the diagonal element r_{kk} in the kth step. Assume that after k−1 steps we have computed the partial QR factorization

A^(k) = (P_{k−1} ··· P_1) A (Π_1 ··· Π_{k−1}) = [ R_{11}^(k)   R_{12}^(k); 0   Ã^(k) ].   (8.3.31)

Then the pivot column in the next step is chosen as a column of largest norm in the submatrix

Ã^(k) = (ã_k^(k), ..., ã_n^(k)).

If Ã^(k) = 0 the algorithm terminates. Otherwise, let

s_j^(k) = ‖ã_j^(k)‖_2^2,   j = k : n,   (8.3.32)

and interchange columns p and k, where p is the smallest index such that s_p^(k) = max_{j=k}^{n} s_j^(k). This pivoting strategy ensures that the computed triangular factor has the property stated in Theorem 8.3.4. Since the column lengths are invariant under orthogonal transformations, the quantities s_j^(k) can be updated by

s_j^(k+1) = s_j^(k) − r_{kj}^2,   j = k+1 : n.   (8.3.33)

We remark that the pivoting rule (8.3.32) is equivalent to maximizing the diagonal element r_{kk}, k = 1 : r. Therefore, in exact arithmetic it computes the Cholesky factor that corresponds to using the pivoting (7.3.11) in the Cholesky factorization.

If the column norms in Ã^(k) were recomputed at each stage, then column pivoting would increase the operation count by 50%. Instead the norms of the columns of A can be computed initially, and recursively updated as the factorization proceeds. This reduces the overhead of column pivoting to O(mn) operations. This pivoting strategy can also be implemented in the Cholesky and modified Gram–Schmidt algorithms.
Since column norms are preserved by orthogonal transformations, the factor R has the following important property:

Theorem 8.3.4.
Suppose that R is computed by QR factorization with column pivoting. Then the elements in R satisfy the inequalities

r_{kk}^2 ≥ Σ_{i=k}^{j} r_{ij}^2,   j = k+1, ..., n.   (8.3.34)

In particular, |r_{kk}| ≥ |r_{kj}|, j > k, and the diagonal elements form a non-increasing sequence

|r_{11}| ≥ |r_{22}| ≥ ··· ≥ |r_{nn}|.   (8.3.35)

For any QR factorization it holds that

σ_1 = max_{‖x‖=1} ‖Rx‖_2 ≥ ‖Re_1‖_2 = |r_{11}|,

and thus |r_{11}| is a lower bound for the largest singular value σ_1 of A. Since R and R^T have the same singular values, we also have

σ_n = min_{‖x‖=1} ‖R^T x‖_2 ≤ ‖R^T e_n‖_2 = |r_{nn}|,

which gives an upper bound for σ_n. For a triangular matrix satisfying (8.3.34) we also have the upper bound

σ_1(R) = ‖R‖_2 ≤ ‖R‖_F = ( Σ_{i≤j} r_{ij}^2 )^{1/2} ≤ √n |r_{11}|,

i.e., σ_1 ≤ n^{1/2} |r_{11}|. Using the interlacing property of singular values (Theorem 8.1.17), a similar argument gives the upper bounds

σ_k(R) ≤ (n − k + 1)^{1/2} |r_{k,k}|,   1 ≤ k ≤ n.   (8.3.36)

If after k steps in the pivoted QR factorization it holds that

|r_{k,k}| ≤ (n − k + 1)^{-1/2} δ,

then σ_k(A) = σ_k(R) ≤ δ, and A has numerical rank at most equal to k − 1, and we should terminate the algorithm. Unfortunately, the converse is not true, i.e., the rank is not always revealed by a small element |r_{kk}|, k ≤ n. Let R be an upper triangular matrix whose elements satisfy (8.3.34). The best known lower bound for the smallest singular value is

σ_n ≥ 3|r_{nn}|/√(4^n + 6n − 1) ≥ 2^{1−n} |r_{nn}|.   (8.3.37)

(For a proof of this see Lawson and Hanson [399, Ch. 6].) The lower bound in (8.3.37) can almost be attained as shown in the example below due to Kahan. Then the pivoted QR factorization may not reveal the rank of A.

Example 8.3.4. Consider the upper triangular matrix

R_n = diag(1, s, s^2, ..., s^{n−1}) [ 1  −c  −c  ...  −c
                                          1  −c  ...  −c
                                              1   ⋱    ⋮
                                                   ⋱  −c
                                                       1 ],   s^2 + c^2 = 1.

It can be verified that the elements in R_n satisfy the inequalities in (8.3.34), and that R_n is invariant under QR factorization with column pivoting. For n = 100, c = 0.2 the last diagonal element of R is r_{nn} = s^{n−1} = 0.820. This can be compared with the smallest singular value, which is σ_n = 0.368·10^{-8}. If the columns are reordered as (n, 1, 2, ..., n−1), then the rank is revealed by the pivoted QR factorization!
The above example has inspired research into alternative column permutation strategies. We return to this problem in Section 8.4.3, where algorithms for the subset selection problem are given.

8.3.3 Least Squares Problems by QR Factorization
We now show how to use the QR factorization to solve the linear least squares problem (8.1.1).

Theorem 8.3.5.
Let A ∈ R^{m×n}, rank(A) = n, have the full QR factorization

A = Q [ R; 0 ],   Q = ( Q_1  Q_2 ).

Then the unique solution x to min_x ‖Ax − b‖_2 and the corresponding residual vector r = b − Ax are given by

d = [ d_1; d_2 ] = Q^T b,   x = R^{-1} d_1,   r = Q [ 0; d_2 ],   (8.3.38)

and hence ‖r‖_2 = ‖d_2‖_2.

Proof. Since Q is orthogonal we have

‖Ax − b‖_2^2 = ‖Q^T(Ax − b)‖_2^2 = ‖ [ Rx; 0 ] − [ d_1; d_2 ] ‖_2^2 = ‖Rx − d_1‖_2^2 + ‖d_2‖_2^2.

Obviously the right hand side is minimized for x = R^{-1} d_1. With d defined by (8.3.38) we have, using the orthogonality of Q, that

b = QQ^T b = Q_1 d_1 + Q_2 d_2 = Ax + r.

Since Q_1 d_1 = Q_1 Rx = Ax it follows that r = Q_2 d_2.

By Theorem 8.3.5, when R and the Householder reflections P_1, P_2, ..., P_n have been computed by Algorithm 8.3.2, the least squares solution x and residual r can be computed as follows:

d = [ d_1; d_2 ] = P_n ··· P_2 P_1 b,   (8.3.39)

Rx = d_1,   r = P_1 ··· P_{n−1} P_n [ 0; d_2 ],   (8.3.40)

and ‖r‖_2 = ‖d_2‖_2. Note that the matrix Q should not be explicitly formed.
The operation count for the Householder QR factorization is 2n^2(m − n/3) flops. For m = n both methods require the same work, but for m ≫ n the Householder QR method is twice as expensive. To compute Q^T b and solve Rx = d_1 requires 4n(m − n/4) flops. In some cases only ‖r‖_2 is required. If r is needed, this requires another 4n(m − n/2) flops. This can be compared to the operation count for the method of normal equations, which requires (mn^2 + n^3/3) flops for the factorization and 2n(m + n) for each right hand side.
The Householder algorithm is backward stable for computing both the solution x and the residual r = b − Ax. This means that the computed residual r̄ satisfies

(A + E)^T r̄ = 0,   ‖E‖_2 ≤ cu‖A‖_2,   (8.3.41)

for some constant c = c(m, n). Hence, A^T r̄ = −E^T r̄, and

‖A^T r̄‖_2 ≤ cu ‖r̄‖_2 ‖A‖_2.   (8.3.42)

Note that this is a much better result than if the residual is computed as

r̃ = fl(b − fl(Ax)) = fl( ( b  A ) [ 1; −x ] ),

even when x is the exact least squares solution. Using (2.3.13) and A^T r = 0,

|A^T r̃| < γ_{n+1} |A|^T ( |b| + |A||x| ),

we obtain the normwise bound

‖A^T r̃‖_2 ≤ n^{1/2} γ_{n+1} ‖A‖_2 ( ‖b‖_2 + n^{1/2} ‖A‖_2 ‖x‖_2 ).

This is much weaker than (8.3.42), in particular when ‖r̄‖_2 ≪ ‖b‖_2.
The method in Theorem 8.3.5 generalizes easily to a method for solving the augmented system (8.1.10). Recall that the solution to the augmented system gives the solution to the two optimization problems (8.1.11)–(8.1.12).

Theorem 8.3.6.
Let the full QR factorization of A ∈ R^{m×n} with rank(A) = n be given by A = ( Q_1  Q_2 ) [ R; 0 ]. Then the solution to the augmented system

[ I  A; A^T  0 ] [ y; x ] = [ b; c ]

can be computed as

z = R^{-T} c,   [ d_1; d_2 ] = Q^T b,   (8.3.43)

x = R^{-1}(d_1 − z),   y = Q [ z; d_2 ].   (8.3.44)

Further, we have ‖b − y‖_2 = ‖Q^T b − Q^T y‖_2 = ‖d_1 − z‖_2.

Proof. Using the QR factorization, the subsystems y + Ax = b and A^T y = c of the augmented system can be written

y + Q [ R; 0 ] x = b,   ( R^T  0 ) Q^T y = c.

Multiplying the first system by Q^T and the second by R^{-T} gives

Q^T y + [ R; 0 ] x = Q^T b,   ( I_n  0 ) Q^T y = R^{-T} c = z.

Using the second equation to eliminate the first n components of Q^T y in the first equation, we can then solve for x. The last m − n components of Q^T y are obtained from the last m − n equations in the first block.
Note that either x or y can be computed without the other. Thus the algorithm (8.3.43)–(8.3.44) can be used to solve either the linear least squares problem (8.1.11) or the conditional least squares problem (8.1.12). In the special case that b = 0 the algorithm simplifies to

z = R^{-T} c,   y = Q [ z; 0 ],   (8.3.45)

and solves the minimum norm problem min ‖y‖_2 subject to A^T y = c.
The Householder QR algorithm, and the resulting method for solving the least squares problem, are backwards stable both for x and r, and the following result holds.

Theorem 8.3.7.
Let R̄ denote the computed R. Then there exists an exactly orthogonal matrix Q̃ ∈ R^{m×m} (not the matrix corresponding to exact computation throughout) such that

A + ∆A = Q̃ [ R̄; 0 ],   ‖∆A‖_F ≤ cγ_{mn} ‖A‖_F,   (8.3.46)

where c is a small constant. Further, the computed solution x̄ is the exact solution of a slightly perturbed least squares problem

min_x ‖(A + δA)x − (b + δb)‖_2,

where the perturbation can be bounded in norm by

‖δA‖_F ≤ cγ_{mn} ‖A‖_F,   ‖δb‖_2 ≤ cγ_{mn} ‖b‖_2.   (8.3.47)

Proof. See Higham [328, Theorem 19.5].

The column pivoting strategy suggested earlier may not be appropriate, when we are given one vector b, which is to be approximated by a linear combination of as few columns of the matrix A as possible. This occurs, e.g., in regression analysis, where each column in A corresponds to one factor. Even when the given b is parallel to one column in A it could happen that we had to complete the full QR factorization of A before this fact was discovered. Instead we would like at each stage to select the column that maximally reduces the sum of squares of the residuals.

8.3.4 Gram–Schmidt QR Factorization
We start by considering the case of orthogonalizing two given linearly independent vectors a_1 and a_2 in R^n. We set

q_1 = a_1/r_{11},   r_{11} = ‖a_1‖_2,

and seek a unit vector q_2 ∈ span[a_1, a_2] such that q_1^T q_2 = 0. By subtracting from a_2 its orthogonal projection onto q_1, we get

q̂_2 = (I − q_1 q_1^T) a_2 = a_2 − r_{12} q_1,   r_{12} = q_1^T a_2.   (8.3.48)

We have q̂_2 ≠ 0, since otherwise a_2 would not be linearly independent of a_1. It is easily verified that q_1^T q̂_2 = q_1^T a_2 − r_{12} q_1^T q_1 = 0. Thus, we can set

q_2 = q̂_2/r_{22},   r_{22} = ‖q̂_2‖_2.

Since q_1 and q_2 both are linear combinations of a_1 and a_2, they span the same subspace of R^n. Further, we have the relation

( a_1  a_2 ) = ( q_1  q_2 ) [ r_{11}  r_{12}; 0  r_{22} ].

m Clearly, the process works also for complex vectors ak C , The above algorithm can be extended to orthogonalizing∈ any sequence of lin- m early independent vectors a1,...,an in R (m n). This process, known as Gram–Schmidt orthogonalization,31 32 uses≥ elementary orthogonal projec- tions. After step k, k = 1 : n, orthonormal vectors q1,...,qk have been computed such that span [q1,...,qk]. = span [a1,...,ak] (8.3.49) We give two variants of the algorithm. Although these only differ in which order the operations are carried out, they have greatly different numerical stability. Classical Gram–Schmidt (CGS):

For $k = 1 : n$ orthogonalize $a_k$ against $q_1,\ldots,q_{k-1}$:
\[ \hat q_k = a_k - \sum_{i=1}^{k-1} r_{ik}q_i, \qquad r_{ik} = q_i^T a_k, \quad i = 1 : k-1, \qquad (8.3.50) \]
and normalize
\[ r_{kk} = \|\hat q_k\|_2, \qquad q_k = \hat q_k/r_{kk}. \qquad (8.3.51) \]
The unnormalized vector $\hat q_k$ is just the orthogonal projection of $a_k$ onto the complement of $\mathrm{span}[a_1,\ldots,a_{k-1}]$.
Modified Gram–Schmidt (MGS):
Set $a_j^{(1)} = a_j$, $j = 1 : n$. For $k = 1 : n$, normalize

\[ r_{kk} = \|a_k^{(k)}\|_2, \qquad q_k = a_k^{(k)}/r_{kk}. \]
Orthogonalize $a_j^{(k)}$, $j = k+1 : n$, against $q_k$:

\[ a_j^{(k+1)} = a_j^{(k)} - r_{kj}q_k, \qquad r_{kj} = q_k^T a_j^{(k)}, \quad j = k+1 : n. \qquad (8.3.52) \]
At the beginning of step $k$ in the MGS algorithm we have computed

\[ q_1,\ldots,q_{k-1},\; a_k^{(k)},\ldots,a_n^{(k)}, \]
where $a_k^{(k)},\ldots,a_n^{(k)}$ are orthogonal to $q_1,\ldots,q_{k-1}$. In CGS the orthogonalization of $a_k$ can be written

\[ \hat q_k = (I - Q_{k-1}Q_{k-1}^T)a_k, \qquad Q_{k-1} = (q_1,\ldots,q_{k-1}). \]
What makes the difference in numerical stability is that in MGS the projections $r_{ik}q_i$ are subtracted from $a_k$ as soon as they are computed, which corresponds to computing
\[ \hat q_k = (I - q_{k-1}q_{k-1}^T)\cdots(I - q_1q_1^T)a_k. \qquad (8.3.53) \]
31 Jørgen Pedersen Gram (1850–1916), Danish mathematician. Gram worked for the Hafnia Insurance Company and made contributions to probability theory and numerical analysis.
32 Erhard Schmidt (1876–1959), German mathematician, obtained his Ph.D. from the University of Göttingen in 1905 under Hilbert's supervision. In 1917 Schmidt was appointed Director of the Institute of Applied Mathematics at the University of Berlin.

Clearly, for $n = 2$ MGS and CGS are identical. For $k > 2$ these two expressions are identical only if $q_1,\ldots,q_{k-1}$ are exactly orthogonal. As described, in CGS the matrix $R$ is determined column by column, in MGS row by row. The rowwise order used in MGS is convenient when column pivoting is to be performed. However, a columnwise MGS version is also possible for applications where the columns $a_k$ are generated one at a time.
In matrix terms the Gram–Schmidt process uses elementary column operations to transform the matrix $A$ into a matrix $Q_1$ with orthonormal columns, giving $R$ and $Q_1$ (explicitly) in the thin QR factorization. This can be contrasted with the Householder QR factorization, where the matrix $A$ is premultiplied by a sequence of elementary orthogonal transformations to produce $R$ and $Q$ (in product form) in the full QR factorization.
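For comparison with the MGS code given as Algorithm 8.4 below, a minimal columnwise sketch of CGS based on (8.3.50)–(8.3.51) might look as follows; the function name cgs is our own choice and not part of the text.

function [Q,R] = cgs(A)
% Classical Gram-Schmidt: compute the thin QR factorization A = Q*R
% column by column.  For ill-conditioned A the computed Q may lose
% orthogonality much faster than with MGS; cf. Table 8.3.1.
[m,n] = size(A);
Q = zeros(m,n); R = zeros(n);
for k = 1:n
    R(1:k-1,k) = Q(:,1:k-1)'*A(:,k);      % r_{ik} = q_i'*a_k, i = 1:k-1
    v = A(:,k) - Q(:,1:k-1)*R(1:k-1,k);   % subtract all projections at once
    R(k,k) = norm(v);
    Q(:,k) = v/R(k,k);
end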

Theorem 8.3.8.

Let the matrix $A = (a_1, a_2,\ldots,a_n) \in \mathbb{C}^{m\times n}$ have linearly independent columns. Then the Gram–Schmidt algorithm computes a matrix $Q_1 \in \mathbb{C}^{m\times n}$ with orthonormal columns, $Q_1^H Q_1 = I_n$, and an upper triangular matrix $R \in \mathbb{C}^{n\times n}$ with real positive diagonal elements such that

\[ A = (q_1, q_2,\ldots,q_n)\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{pmatrix} \equiv Q_1 R. \qquad (8.3.54) \]

Proof. Note that $\hat q_k \neq 0$, since otherwise $a_k$ would be a linear combination of the vectors $a_1,\ldots,a_{k-1}$, which contradicts the assumption. Combining (8.3.50) and (8.3.51) we obtain
\[ a_k = r_{kk}q_k + \sum_{i=1}^{k-1} r_{ik}q_i = \sum_{i=1}^{k} r_{ik}q_i, \qquad k = 1 : n, \]
which is equivalent to (8.3.54). Since the vectors $q_k$ are mutually orthogonal by construction, the theorem follows.

Although mathematically equivalent, MGS has greatly superior numerical properties compared to CGS, and is therefore usually to be preferred. (We say that two formulas or algorithms are mathematically equivalent if they produce the same result in exact arithmetic.) In fact, because it is more convenient, MGS had almost always been used in practice long before its superior numerical stability was appreciated.

Algorithm 8.4. Modified Gram–Schmidt.

Given $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = n$, the following algorithm computes the factorization $A = Q_1R$:

function [Q,R] = mgs(A);
[m,n] = size(A); Q = A;
for k = 1:n
    R(k,k) = norm(Q(:,k));
    Q(:,k) = Q(:,k)/R(k,k);
    for j = k+1:n
        R(k,j) = Q(:,k)'*Q(:,j);
        Q(:,j) = Q(:,j) - Q(:,k)*R(k,j);
    end
end

The operations in Algorithm 8.4 can be sequenced so that the elements in $R$ are computed in a columnwise fashion. However, the rowwise version given above is more suitable if column pivoting is to be performed. Gram–Schmidt orthogonalization requires approximately $2mn^2$ flops, which is slightly more than the $2(mn^2 - n^3/3)$ needed for the Householder QR factorization.
It can be shown by an elementary error analysis that if $\bar Q_1$ and $\bar R$ are computed by MGS, then it holds that

\[ A + E = \bar Q_1\bar R, \qquad \|E\|_2 \le c_0u\|A\|_2. \qquad (8.3.55) \]

Thus, the product $\bar Q_1\bar R$ accurately represents $A$.
Assume that the columns $a_1,\ldots,a_{k-1}$ are linearly independent but $a_k$ is a linear combination of $a_1,\ldots,a_{k-1}$. Then $a_k^{(k)} = 0$ and the Gram–Schmidt process breaks down. But if $\mathrm{rank}(A) > k$ there must be a vector $a_j$, $j > k$, which is linearly independent of $a_1,\ldots,a_{k-1}$. We can then interchange columns $k$ and $j$ and proceed until all remaining columns are linearly dependent on the computed $q$ vectors. This suggests that we augment the MGS method with column pivoting as follows. Compute
\[ \|a_j^{(k)}\|_2, \qquad j = k : n, \qquad (8.3.56) \]
and let the maximum be $\|a_p^{(k)}\|_2$. (If there is more than one maximum, choose the first.) If this maximum is zero, then all columns $a_k,\ldots,a_n$ are linearly dependent and the reduction is complete. Otherwise interchange columns $p$ and $k$ and proceed. After the $k$th step we can cheaply update the column norms using

\[ \|a_j^{(k+1)}\|_2^2 = \|a_j^{(k)}\|_2^2 - r_{kj}^2, \qquad j = k+1 : n. \]
With column pivoting MGS can be used also when the matrix $A$ has linearly dependent columns. If $\mathrm{rank}(A) = r$ it will compute a factorization of the form

\[ A\Pi = QR, \qquad Q \in \mathbb{R}^{m\times r}, \qquad R = (\,R_{11} \;\; R_{12}\,) \in \mathbb{R}^{r\times n}, \qquad (8.3.57) \]
where $\Pi$ is a permutation matrix, $Q^TQ = I$, and $R_{11}$ is upper triangular and nonsingular. Indeed, MGS with column pivoting is a good algorithm for determining the rank of a given matrix $A$.
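A sketch of how the pivoting strategy and the norm update above can be combined with MGS is given below; the function name mgsp, the tolerance argument tol, and the simplistic rank test are our own choices and not part of the text.

function [Q,R,p] = mgsp(A,tol)
% MGS with column pivoting (a sketch).  Returns Q, R and a permutation
% vector p such that A(:,p) ~ Q*R; the reduction stops when the largest
% remaining column norm falls below tol.
[m,n] = size(A); Q = A; R = zeros(n); p = 1:n;
s = sum(Q.^2,1);                          % squared column norms ||a_j^{(k)}||_2^2
for k = 1:n
    [smax,j] = max(s(k:n)); j = j + k - 1;
    if sqrt(smax) <= tol                  % remaining columns numerically dependent
        Q = Q(:,1:k-1); R = R(1:k-1,:); return;
    end
    Q(:,[k j]) = Q(:,[j k]); R(:,[k j]) = R(:,[j k]);
    s([k j]) = s([j k]);  p([k j]) = p([j k]);
    R(k,k) = norm(Q(:,k)); Q(:,k) = Q(:,k)/R(k,k);
    for i = k+1:n
        R(k,i) = Q(:,k)'*Q(:,i);
        Q(:,i) = Q(:,i) - R(k,i)*Q(:,k);
        s(i) = s(i) - R(k,i)^2;           % cheap norm update as above
    end
end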

8.3.5 Loss of Orthogonality in GS and MGS
In some applications it may be essential that the computed columns of $Q$ are orthogonal to working accuracy. For example, this is the case when MGS is used in algorithms for computing eigenvectors of symmetric matrices.
We will now use the standard model for floating point computation, and the basic results in Section 2.3.2, to analyze the rounding errors when the Gram–Schmidt process is applied to orthogonalize two linearly independent vectors $a_1$ and $a_2$ in $\mathbb{R}^n$. For the computed scalar product $\bar r_{12} = fl(q_1^Ta_2)$ we get
\[ |\bar r_{12} - r_{12}| < \gamma_m\|a_2\|_2, \qquad \gamma_m = \frac{mu}{1 - mu/2}, \]
where $u$ is the unit roundoff. Using $|r_{12}| \le \|a_2\|_2$ we obtain for $\bar q_2 = fl(a_2 - fl(\bar r_{12}q_1))$
\[ \|\bar q_2 - \hat q_2\|_2 < \gamma_{m+2}\|a_2\|_2. \]
Since $q_1^T\hat q_2 = 0$, it follows that $|q_1^T\bar q_2| < \gamma_{m+2}\|a_2\|_2$ and the loss of orthogonality
\[ \frac{|q_1^T\bar q_2|}{\|\bar q_2\|_2} \approx \frac{|q_1^T\bar q_2|}{\|\hat q_2\|_2} < \gamma_{m+2}\frac{\|a_2\|_2}{\|\hat q_2\|_2} = \frac{\gamma_{m+2}}{\sin\phi(q_1, a_2)}, \qquad (8.3.58) \]
is inversely proportional to $\sin\phi(q_1, a_2)$, where $\phi(q_1, a_2)$ is the angle between $q_1$ and $a_2$.
We conclude that in the Gram–Schmidt process a severe loss of orthogonality may occur whenever cancellation takes place in subtracting the orthogonal projection on $q_i$ from $a_k^{(i)}$, that is when

\[ a_j^{(k+1)} = (I - q_kq_k^T)a_j^{(k)}, \qquad \|a_k^{(i+1)}\|_2 \ll \alpha\|a_k^{(i)}\|_2. \qquad (8.3.59) \]

Example 8.3.5. For the extremely ill-conditioned matrix

\[ A = (\,a_1 \;\; a_2\,) = \begin{pmatrix} 1.2969 & 0.8648 \\ 0.2161 & 0.1441 \end{pmatrix} \]
in Example 7.5.4, the Gram–Schmidt algorithm in IEEE double precision gives

\[ q_1 = \begin{pmatrix} 0.98640009002732 \\ 0.16436198585466 \end{pmatrix}, \qquad r_{12} = q_1^Ta_2 = 0.87672336001729. \]
Subtracting the orthogonal projection onto $q_1$ we get
\[ \hat q_2 = a_2 - r_{12}q_1 = \begin{pmatrix} -0.12501091273265 \\ \phantom{-}0.75023914025696 \end{pmatrix}\cdot 10^{-8}. \]
Normalizing this vector gives

\[ q_2 = \begin{pmatrix} -0.16436196071471 \\ \phantom{-}0.98640009421635 \end{pmatrix} \]

Table 8.3.1. Loss of orthogonality in CGS and MGS.

 k    κ(A_k)       ‖I − Q_C^T Q_C‖_2   ‖I − Q_M^T Q_M‖_2
 1    1.000e+00    1.110e-16           1.110e-16
 2    1.335e+01    2.880e-16           2.880e-16
 3    1.676e+02    7.295e-15           8.108e-15
 4    1.126e+03    2.835e-13           4.411e-14
 5    4.853e+05    1.973e-09           2.911e-11
 6    5.070e+05    5.951e-08           3.087e-11
 7    1.713e+06    2.002e-07           1.084e-10
 8    1.158e+07    1.682e-04           6.367e-10
 9    1.013e+08    3.330e-02           8.779e-09
10    1.000e+09    5.446e-01           4.563e-08

and
\[ R = \begin{pmatrix} 1.31478090189963 & 0.87672336001729 \\ 0 & 0.00000000760583 \end{pmatrix}. \]
Severe cancellation has taken place when computing $\hat q_2$. This leads to a serious loss of orthogonality between $q_1$ and $q_2$:
\[ q_1^Tq_2 = 2.5486557\cdot 10^{-8}. \]
The loss of orthogonality is roughly equal to a factor $u\,\kappa(A) \approx 10^{-8}$.
Due to round-off there will be a gradual (sometimes catastrophic) loss of orthogonality in Gram–Schmidt orthogonalization. In this respect CGS and MGS behave very differently for $n > 2$. (Recall that for $n = 2$ MGS and CGS are the same.) For MGS the loss of orthogonality occurs in a predictable manner and is proportional to the condition number $\kappa(A)$.
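The few MATLAB lines below reproduce the computation in Example 8.3.5; they are not part of the original text and merely illustrate the cancellation in forming $\hat q_2$.

% Reproduce Example 8.3.5: Gram-Schmidt on an ill-conditioned 2x2 matrix.
A  = [1.2969 0.8648; 0.2161 0.1441];
q1 = A(:,1)/norm(A(:,1));
r12 = q1'*A(:,2);
q2hat = A(:,2) - r12*q1;       % severe cancellation occurs here
q2 = q2hat/norm(q2hat);
loss = abs(q1'*q2)             % of the order u*kappa(A), cf. (8.3.58)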

Theorem 8.3.9.
Let $\bar Q_1$ and $\bar R$ denote the factors computed by the MGS algorithm. Then there are constants $c_i = c_i(m, n)$, $i = 1, 2$, such that if $c_1\kappa(A)u < 1$,
\[ \|I - \bar Q_1^T\bar Q_1\|_2 \le \frac{c_1}{1 - c_1u\kappa_2(A)}\,u\kappa_2(A). \qquad (8.3.60) \]

Proof. See Bj¨orck [57].

The computed vectors $q_k$ from CGS depart from orthogonality much more rapidly. After a small modification of the algorithm, an upper bound for the loss of orthogonality proportional to $\kappa^2$ has been proved. Even computing $Q_1 = AR^{-1}$, where $R$ is determined by the Cholesky factorization, often gives better orthogonality than CGS; see the example below.

Example 8.3.6.
A matrix $A \in \mathbb{R}^{50\times 10}$ was generated by computing
\[ A = UDV^T, \qquad D = \mathrm{diag}(1, 10^{-1},\ldots,10^{-9}), \]

with $U$ and $V$ orthonormal matrices. Hence, $A$ has singular values $\sigma_i = 10^{-i+1}$, $i = 1 : 10$, and $\kappa(A) = 10^9$. Table 8.3.1 shows the condition number of $A_k = (a_1,\ldots,a_k)$ and the loss of orthogonality in CGS and MGS after $k$ steps as measured by $\|I - Q_k^TQ_k\|_2$. Note that for MGS the loss of orthogonality is more gradual than for CGS and proportional to $\kappa(A_k)$.
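An experiment of the kind reported in Table 8.3.1 can be set up along the following lines. The sketch assumes the functions cgs (given earlier) and mgs (Algorithm 8.4); since the orthogonal factors are generated at random, the measured values can only be expected to agree with the table in order of magnitude.

% Test matrix with singular values 1, 1e-1, ..., 1e-9 as in Example 8.3.6.
m = 50; n = 10;
[U,~] = qr(randn(m,n),0);          % random m-by-n matrix with orthonormal columns
[V,~] = qr(randn(n));              % random n-by-n orthogonal matrix
A = U*diag(10.^(0:-1:1-n))*V';
[Qc,~] = cgs(A);  [Qm,~] = mgs(A);
[norm(eye(n)-Qc'*Qc)  norm(eye(n)-Qm'*Qm)]   % loss of orthogonality, cf. Table 8.3.1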

In several applications it is important that the computed $\bar Q_1$ is orthogonal to working accuracy. We shall call this the orthogonal basis problem. To achieve this we can reorthogonalize the computed vectors in the Gram–Schmidt algorithm. This must be carried out whenever substantial cancellation occurs when subtracting a projection. Assume that we are given a matrix

\[ Q_1 = (q_1,\ldots,q_{k-1}), \qquad \|q_j\|_2 = 1, \quad j = 1 : k-1. \]

Adding the new vector $a_k$, we want to compute a vector $\hat q_k$ such that

\[ \hat q_k \perp \mathrm{span}(Q_1), \qquad \hat q_k \in \mathrm{span}(Q_1, a_k). \]

If $(Q_1, a_k)$ has full numerical rank and $Q_1$ is accurately orthogonal, then it has been proved that one reorthogonalization will always suffice. For $k = 2$ this result is due to Kahan and Parlett; see Parlett [478, Sec. 6.9]. As an example, reorthogonalizing the computed vector $a_2^{(2)}$ in Example 8.3.5 against $q_1$ gives

\[ q_1^Tq_2 = 2.5486557\cdot 10^{-8}, \qquad \tilde q_2 = \begin{pmatrix} -0.16436198585466 \\ \phantom{-}0.98640009002732 \end{pmatrix}. \]

The vector $\tilde q_2$ is exactly orthogonal to $q_1$.
If $A$ has full numerical rank and you reorthogonalize as you go along, then for both CGS and MGS one reorthogonalization suffices to produce vectors that are orthonormal to machine precision. Hence, reorthogonalization will double the cost. An MGS algorithm with reorthogonalization is given below. Here a columnwise version is used and the reorthogonalization against previously computed vectors is performed in backward order.

Algorithm 8.5. MGS with reorthogonalization.

function [Q,R] = mgs2(A);
[m,n] = size(A); Q = A; R = zeros(n);
for k = 1:n
    for i = [1:k-1,k-1:-1:1]

V = Q(:,i)’*Q(:,k); Q(:,k) = Q(:,k) - V*Q(:,i); R(i,k) = R(i,k) + V; end R(k,k) = norm(Q(:,k)); Q(:,k) = Q(:,k)/R(k,k); end

In selective reorthogonalization a test is performed at each step to see whether or not it is necessary to reorthogonalize. Usually reorthogonalization is carried out whenever
\[ \|\hat q_k\|_2 < \alpha\|a_k\|_2 \qquad (8.3.61) \]
for some chosen tolerance $\alpha$. When $\alpha$ is large, reorthogonalization will occur more frequently and the orthogonality will be good. If $\alpha$ is small, reorthogonalization will be rarer, but the orthogonality less good. Typically $\alpha$ is chosen in the range $0.1 \le \alpha \le 1/\sqrt 2$; in practice, the value $\alpha = 1/\sqrt 2$ is often used. In selective reorthogonalization the columns of $Q_1$ may not be orthogonal to working accuracy. Then it might be necessary to reorthogonalize more than once.
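As an illustration, one Gram–Schmidt step with the test (8.3.61) might be organized as in the sketch below; the function name and the choice $\alpha = 1/\sqrt 2$ are our own.

function [q,r] = gs_step_selective(Q,a)
% One column step of Gram-Schmidt with the selective reorthogonalization
% test (8.3.61).  Q holds the previously computed orthonormal columns.
alpha = 1/sqrt(2);
r = Q'*a;  q = a - Q*r;                 % first orthogonalization pass
if norm(q) < alpha*norm(a)              % substantial cancellation detected
    s = Q'*q;  q = q - Q*s;  r = r + s; % reorthogonalize once
end
rkk = norm(q);  q = q/rkk;  r = [r; rkk];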

8.3.6 MGS as a Householder Method
A key observation for understanding the numerical stability of MGS algorithms is the surprising result that it can be interpreted as a Householder QR factorization of the matrix $A$ augmented with an $n\times n$ block of zero elements on top.33 This is true not only in theory, but in the presence of rounding errors as well. We first look at the theoretical result.
Let $A \in \mathbb{R}^{m\times n}$ have rank $n$ and consider the two QR factorizations
\[ A = (\,Q_1 \;\; Q_2\,)\begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad \tilde A = \begin{pmatrix} O \\ A \end{pmatrix} = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}\begin{pmatrix} \tilde R \\ 0 \end{pmatrix}, \qquad (8.3.62) \]
where $Q \in \mathbb{R}^{m\times m}$ and $P \in \mathbb{R}^{(n+m)\times(n+m)}$ are orthogonal matrices. If the upper triangular matrices $R$ and $\tilde R$ are both chosen to have positive diagonal elements, then by uniqueness $R = \tilde R$. In exact computation $P_{21} = Q_1$. The last $m$ columns of $P$ are arbitrary up to an $m\times m$ multiplier.
The important result is that the MGS QR factorization is also numerically equivalent to Householder QR applied to the augmented matrix. To see this, recall that the Householder transformation $Px = \rho e_1$ uses

\[ P = I - 2vv^T/\|v\|_2^2, \qquad v = x - \rho e_1, \qquad \rho = \pm\|x\|_2. \]
If the second factorization in (8.3.62) is obtained using Householder transformations, then
\[ P^T = P_n\cdots P_2P_1, \qquad P_k = I - 2\hat v_k\hat v_k^T/\|\hat v_k\|_2^2, \quad k = 1 : n, \qquad (8.3.63) \]
33 This observation was made by Charles Sheffield, apparently when comparing Fortran code for Householder and MGS QR factorization.

where the vectors $\hat v_k$ are described below. Now, from MGS applied to $A^{(1)} = A$, $r_{11} = \|a_1^{(1)}\|_2$, and $a_1^{(1)} = q_1' = q_1r_{11}$. The first Householder transformation applied to the augmented matrix

\[ \tilde A^{(1)} \equiv \begin{pmatrix} O_n \\ A^{(1)} \end{pmatrix}, \qquad \tilde a_1^{(1)} = \begin{pmatrix} 0 \\ a_1^{(1)} \end{pmatrix}, \]
uses
\[ \hat v_1 \equiv \begin{pmatrix} -e_1r_{11} \\ q_1' \end{pmatrix} = r_{11}v_1, \qquad v_1 = \begin{pmatrix} -e_1 \\ q_1 \end{pmatrix}. \]
But $\|v_1\|_2^2 = 2$, giving
\[ P_1 = I - 2\hat v_1\hat v_1^T/\|\hat v_1\|_2^2 = I - 2v_1v_1^T/\|v_1\|_2^2 = I - v_1v_1^T, \]
and
\[ P_1\tilde a_j^{(1)} = \tilde a_j^{(1)} - v_1v_1^T\tilde a_j^{(1)} = \begin{pmatrix} 0 \\ a_j^{(1)} \end{pmatrix} - \begin{pmatrix} -e_1 \\ q_1 \end{pmatrix}q_1^Ta_j^{(1)} = \begin{pmatrix} e_1r_{1j} \\ a_j^{(2)} \end{pmatrix}, \]
so
\[ P_1\tilde A^{(1)} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \\ 0 & a_2^{(2)} & \cdots & a_n^{(2)} \end{pmatrix}. \]
Clearly the first row is numerically the same as the first row in $R$ produced in the first step of MGS on $A$. Also the vectors $a_j^{(2)}$ are the same. We see that the next Householder transformation produces the second row of $R$ and $a_j^{(3)}$, $j = 3 : n$, just as in MGS. Carrying on this way we can conclude that this Householder QR is numerically equivalent to MGS applied to $A$, and that every $P_k$ is effectively defined by $Q_1$, since
\[ P_k = I - v_kv_k^T, \qquad v_k = \begin{pmatrix} -e_k \\ q_k \end{pmatrix}, \quad k = 1 : n. \qquad (8.3.64) \]
From the numerical equivalence it follows that the backward error analysis for the Householder QR factorization of the augmented matrix can also be applied to the modified Gram–Schmidt algorithm on $A$. Let $\bar Q_1 = (\bar q_1,\ldots,\bar q_n)$ be the matrix of vectors computed by MGS, and for $k = 1,\ldots,n$ define

ek ¯ T ¯ ¯ ¯ ¯ v¯k = − , Pk = I v¯kv¯k , P = P1P2 . . . Pn, (8.3.65) q¯k −   ek T qk =q ¯k/ q¯k 2, vk = − , Pk = I vkvk , P = P1P2 . . . Pn. k k qk −   ¯ Then Pek is the computede version of thee Householdere e matrixe appliede e ine the kth e step of the Householder QR factorization of the augmented matrix and Pk is its T orthonormal equivalent, so that Pk Pk = I. From the error analysis for Householder QR (see Theorem 8.3.7) it follows that for R¯ computed by MGS, e e e E R¯ 1 = P , P¯ = P + E , A + E 0 ′  2    e e 258 Chapter 8. Linear Least Squares Problems where E c u A , i = 1, 2; E′ c u, (8.3.66) k ik2 ≤ i k k2 k k2 ≤ 3 and ci are constants depending on m, n and the details of the arithmetic. Using this result it can be shown (see Bj¨orck and Paige [65]) that there exist an exactly orthonormal matrix Qˆ1 and E such that

\[ A + E = \hat Q_1\bar R, \qquad \hat Q_1^T\hat Q_1 = I, \qquad \|E\|_2 \le cu\|A\|_2. \qquad (8.3.67) \]

If $\bar\sigma_1 \ge \cdots \ge \bar\sigma_n$ are the singular values of $\bar R$ and $\sigma_1 \ge \cdots \ge \sigma_n$ the singular values of $A$, then it follows that
\[ |\bar\sigma_i - \sigma_i| \le c_2u\sigma_1. \]
The result (8.3.67) shows that $\bar R$ computed by MGS is comparable in accuracy to the upper triangular matrix from the Householder QR factorization applied to $A$. Using this relationship, algorithms for least squares problems based on the MGS factorization can be developed that give results comparable in accuracy with those obtained by the corresponding Householder QR algorithm.
We first derive the MGS algorithm for solving least squares problems. Clearly, the two problems

\[ \min_x\|Ax - b\|_2 \qquad\text{and}\qquad \min_x\left\|\begin{pmatrix} 0 \\ A \end{pmatrix}x - \begin{pmatrix} 0 \\ b \end{pmatrix}\right\|_2 \]

have the same solution. As above, the $q_k$ from MGS correspond to the Householder transformations $P_k$, $k = 1 : n$, in (8.3.64). To compute $d$ and $h$ in

d 0 = P ...P , h n 1 b     define d 0 d 0 d 1 = , k+1 = P ...P = P k . h b h k 1 b k h  1     k+1     k  Now using induction we see d has all but its first k 1 elements zero, and k − T dk+1 dk ek T T dk dk + ek(qk hk) = − ( ek qk ) = T . hk+1 hk − qk − hk hk qk(q hk)          − k 

Starting with $h_1 := b$, this gives the computation

\[ \delta_k := q_k^Th_k; \qquad h_{k+1} := h_k - q_k\delta_k, \qquad k = 1 : n, \]
so $h = h^{(n+1)}$, $d = d^{(n+1)} = (\delta_1,\ldots,\delta_n)^T$. The computation for $d$ and $h$ is exactly the same as the one that would arise if MGS had been applied to $(A, b)$ instead of just $A$. This leads to a backward stable MGS algorithm for computing $x$. Using the analogy with Householder QR (cf. (8.3.40)), a backward stable approximation of $r$ is obtained by setting

\[ \begin{pmatrix} 0 \\ r \end{pmatrix} = P\begin{pmatrix} 0 \\ h_{n+1} \end{pmatrix}, \qquad P = P_1\cdots P_{n-1}P_n, \]

z where P is given by (8.3.64). More generally, to compute an expression P , k h define   wn z wk 1 z wk = , − = Pk ...Pn = Pk . yn h yk 1 h yk      −      This gives the recursion

wk 1 wk ek T (k) T − = − ( ek w + qk yk ) , yk 1 yk − qk −  −      T which shows in this step only the kth element of wk is changed from ζk = ek z to T ωk = qk yk. This gives

\[ y_{k-1} := y_k - q_k(\omega_k - \zeta_k), \qquad \omega_k := q_k^Ty_k, \qquad k = n : -1 : 1, \qquad (8.3.68) \]
so $y = y_0$, $w = (\omega_1,\ldots,\omega_n)^T$. In particular, setting $z = 0$ and $h = h_{n+1}$,

\[ y_{k-1} = y_k - q_k\omega_k, \qquad \omega_k = q_k^Ty_k, \qquad k = n : -1 : 1. \]
Note that $w = (\omega_1,\ldots,\omega_n)^T$ is ideally zero, but can be significant when $\kappa(A)$ is large. The computation of $y$ can be seen as a reorthogonalization of $h_{n+1}$ against the vectors $q_k$ in backward order. The derivation is summarized in the following algorithm:

Algorithm 8.6. Linear Least Squares Solution by MGS.

Assume that MGS has been applied to $A \in \mathbb{R}^{m\times n}$, $\mathrm{rank}(A) = n$, to give $Q$ and $R$. The following function computes the solution $x$ to the linear least squares problem $\min_x\|Ax - b\|_2$, the residual $r$, and its Euclidean norm.

function [x,r,rho] = mgss(Q,R,b);
[m,n] = size(Q);
d = zeros(n,1);
for k = 1:n              % MGS on b
    d(k) = Q(:,k)'*b;
    b = b - d(k)*Q(:,k);
end
x = R\d;
r = b;
for k = n:-1:1           % reorthogonalize r
    w = Q(:,k)'*r;
    r = r - w*Q(:,k);
end
rho = norm(r);

The MGS algorithm is backward stable also for computing the residual $r = b - Ax$. This means that the computed residual $\bar r$ satisfies
\[ \|A^T\bar r\|_2 \le cu\|\bar r\|_2\|A\|_2 \qquad (8.3.69) \]
for some constant $c = c(m, n)$. As remarked for the Householder algorithm, this is much better than if the residual is computed from its definition $r = b - Ax$. If only $x$ and the residual norm $\|r\|_2$ are needed, then the reorthogonalization of $r$ can be skipped. In that case only $2n(m+n)$ flops are needed for each right hand side.
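A small usage example of Algorithms 8.4 and 8.6 (the data below are arbitrary):

A = [1 1; 1 2; 1 3; 1 4];  b = [2.1; 3.9; 6.2; 7.8];
[Q,R] = mgs(A);                % Algorithm 8.4
[x,r,rho] = mgss(Q,R,b);       % Algorithm 8.6
norm(x - A\b)                  % agreement with backslash, up to roundoff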

Remark 8.3.1. The algorithm above is the same as that obtained if the MGS algorithm is applied to the right hand side $b$, i.e., when $b$ is treated as an extra $(n+1)$st column appended to $A$, except that the normalization is skipped. This gives the factorization

\[ (\,A \;\; b\,) = (\,Q_1 \;\; r\,)\begin{pmatrix} R & z \\ 0 & 1 \end{pmatrix}, \qquad (8.3.70) \]
and hence $r = b - Q_1z$. By (8.3.55) the product of the computed factors accurately reproduces the matrix $(A, b)$. It follows that

\[ \|Ax - b\|_2 = \left\|(\,A \;\; b\,)\begin{pmatrix} x \\ -1 \end{pmatrix}\right\|_2 = \|Q_1(Rx - z) - r\|_2. \]

If $Q_1^Tr = 0$ the minimum of the last expression occurs when $Rx - z = 0$, and it is not necessary to require that $Q_1$ is accurately orthogonal for this conclusion to hold.

An MGS algorithm for solving the minimum norm problem

\[ \min\|y\|_2 \quad\text{subject to}\quad A^Ty = c \]
can be developed using the technique above. Using the Householder QR factorization the solution is obtained from
\[ z = R^{-T}c, \qquad y = Q\begin{pmatrix} z \\ 0 \end{pmatrix}. \]

Suppose that MGS has been applied to $A$, giving $R$ and $Q_1 = (q_1,\ldots,q_n)$. Then $R^Tz = c$ is solved for $z = (\zeta_1,\ldots,\zeta_n)^T$. Next, set $y_n = 0$, and use the recursion (8.3.68)

\[ y_{k-1} = y_k - q_k(\omega_k - \zeta_k), \qquad \omega_k = q_k^Ty_k, \qquad k = n : -1 : 1, \]
to compute $y = y_0$.

Algorithm 8.7. Minimum Norm Solution by MGS.

Assume that MGS has been applied to $A \in \mathbb{R}^{m\times n}$, $\mathrm{rank}(A) = n$, to give $Q$ and $R$. The following function computes the solution $y$ to the linear system $A^Ty = c$ which minimizes $\|y\|_2$.

function [y,rho] = mgsmn(Q,R,c)

[m,n] = size(Q); z = R’\c; y = zeros(m,1); for k = n:-1:1 w = Q(:,k)’*y; y = y - (w - z(k))*Q(:,k); end rho = norm(y);

The two MGS algorithms above can be generalized further. Based on the Householder QR algorithm given in Theorem 8.3.6, a backward stable MGS algorithm can also be developed for solving the augmented system.

8.3.7 Error Estimation and Iterative Refinement
Using the QR factorization with column pivoting, a lower bound for $\kappa(A) = \kappa(R)$ can be obtained from the diagonal elements of $R$. We have $|r_{11}| \le \sigma_1 = \|R\|_2$, and since the diagonal elements of $R^{-1}$ equal $r_{ii}^{-1}$, $i = 1 : n$, it follows that $|r_{nn}^{-1}| \le \sigma_n^{-1} = \|R^{-1}\|_2$, provided $r_{nn} \neq 0$. Combining these estimates we obtain the lower bound
\[ \kappa(A) = \sigma_1/\sigma_n \ge |r_{11}/r_{nn}|. \qquad (8.3.71) \]
Although this may considerably underestimate $\kappa(A)$, it has proved to give a fairly reliable estimate in practice. Extensive numerical testing has shown that (8.3.71) usually underestimates $\kappa(A)$ only by a factor of 2–3, and seldom by more than 10.
When column pivoting has not been performed, the above estimate of $\kappa(A)$ is not reliable. Then a condition estimator similar to that described in Section 7.5.3 can be used. Let $u$ be a given vector, and define $v$ and $w$ from

RT v = u, Rw = v.

We have $w = R^{-1}(R^{-T}u) = (A^TA)^{-1}u$, so this is equivalent to one step of inverse iteration with $A^TA$, and requires about $O(n^2)$ multiplications. Provided that $u$ is suitably chosen (cf. Section 7.5.3),

the quantity $\|w\|_2/\|v\|_2$ will usually be a good estimate of $\sigma_n^{-1}$. We can also take $u$ as a random vector and perform 2–3 steps of inverse iteration. This condition estimator will usually detect near rank deficiency even in the case when this is not revealed by a small diagonal element in $R$.
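A sketch of this condition estimator, assuming only the triangular factor R is available, is given below; the function name and the fixed number of inverse iteration steps are our own choices.

function sigmin = condest_r(R,steps)
% Estimate the smallest singular value of A from its R factor by a few
% steps of inverse iteration with A'*A, starting from a random vector.
n = size(R,1);  u = randn(n,1);
for i = 1:steps
    v = R'\u;  w = R\v;            % w = (A'*A)\u, cf. R'*v = u, R*w = v
    sigmin = norm(v)/norm(w);      % current estimate of sigma_n
    u = w/norm(w);                 % normalize for the next step
end
% kappa(A) can then be estimated from below by norm(R)/sigmin.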

More reliable estimates can be based on the componentwise error bounds (8.2.30)–(8.2.31). In particular, if $E = |A|$, $f = |b|$, we obtain, taking norms, the estimates
\[ \|\delta x\| \lesssim \omega\,\big\|\,|A^\dagger|(|b| + |A||x|) + |(A^TA)^{-1}||A^T||r|\,\big\|, \qquad (8.3.72) \]
\[ \|\delta r\| \lesssim \omega\,\big\|\,|I - AA^\dagger|(|A||x| + |b|) + |(A^\dagger)^T||A^T||r|\,\big\|. \qquad (8.3.73) \]

For maximum norm the estimate for δx can be written k k

δx ω( B1 g1 + B2 g2 ), (8.3.74) k k∞ ≤ k | | k∞ k | | k∞ where

T 1 T B = A†, g = b + A x , B = (A A)− , g = A r . (8.3.75) 1 1 | | | || | 2 2 | || | The estimate for δr has a similar form. k k∞ 1 Consider now a general expression of the form B− d , where d > 0 is a known nonnegative vector. Writing D = diag (d) andke | = (1|, 1k,...,∞ 1), we have34

1 1 1 1 1 B− d = B− De = B− D e = B− D = B− D . k | | k∞ k | | k∞ k | | k∞ k | |k∞ k k∞ T There are algorithms that produce reliable order-of-magnitude estimates of C 1 = 1 k k C , where C = B− D, using only a few matrix-vector products of the form Cx kandk∞CT y for some carefully selected vectors x and y. If A has full rank and a QR factorization of A is known, then we have

1 T T T A† = R− Q , (A†) = QR− .

Hence, the required products can be computed inexpensively. For details we refer to Higham [328, Chapter 15].
Let $\bar x$ be a computed least squares solution and $\bar r = b - A\bar x$ the computed residual vector. Denote by $x = \bar x + e$ the exact solution. Then, since

\[ \|b - A\bar x\|_2 = \|\bar r - Ae\|_2, \]
the correction $e$ is itself the solution to a least squares problem. If a QR factorization of $A$ has been computed then it is cheap to solve for the correction vector $e$. This observation can be used to devise an algorithm for the iterative refinement of a least squares solution similar to that outlined for linear systems in Section 7.5.6. However, it turns out that this is only satisfactory provided the residual vector $r$ is sufficiently small. Otherwise the solution and the residual vector both have to be refined simultaneously, as we now describe.
Let $x$ be the solution and $r$ be the residual vector of the least squares problem $\min_x\|Ax - b\|_2$, $A \in \mathbb{R}^{m\times n}$. Then $x$ and the scaled residuals $y = r/\alpha$, $\alpha > 0$, satisfy the symmetric indefinite system of equations

αI A y b = (8.3.76) AT 0 x c      T with c = 0. Let λ be an eigenvalues of the system matrix Mα and (x,y) the corresponding eigenvector. Then

αx + Ay = λx, AT x = λy,

34This clever observation is due to Arioli, Demmel, and Duff [12]. 8.3. Orthogonal Factorizations 263 and it follows that αλy + ATAy = λ2y.

2 2 If y = 0 then y is an eigenvector and (λ λα) = σi , where σi, i = 1 : n are the singular6 values of ATA. On the other hand−y = 0, implies that AT x = 0, and λ = α. Thus, the eigenvalues equal

\[ \lambda_i = \frac{\alpha}{2} \pm \left(\frac{\alpha^2}{4} + \sigma_i^2\right)^{1/2}, \qquad i = 1 : n. \]
Further, there are $m - n$ eigenvalues equal to $\alpha$.
If we use a solution algorithm for (8.3.76) which is numerically invariant under a scaling $\alpha$ of the (1,1)-block, then the relevant condition number is the smallest condition number of $M(\alpha)$. It can be shown that the minimum occurs for $\alpha = 2^{-1/2}\sigma_n$ and
\[ \min_\alpha \kappa(M_\alpha) = \frac12 + \left(\frac14 + 2\kappa(A)^2\right)^{1/2} \le 2\kappa(A), \qquad (8.3.77) \]
where $\kappa(A) = \sigma_1/\sigma_n$.
We now show how to use the QR factorization to solve the augmented system (8.3.76). Let
\[ A = Q\begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad Q \in \mathbb{R}^{m\times m}, \quad R \in \mathbb{R}^{n\times n}. \]
Using this we can transform the system (8.3.76) into

R I QT y QT b 0 = .     x c ( RT 0 ) 0       It is easily verified that this gives the solution method

\[ z = R^{-T}c, \qquad \begin{pmatrix} d \\ e \end{pmatrix} = Q^Tb, \qquad y = Q\begin{pmatrix} z \\ e \end{pmatrix}, \qquad x = R^{-1}(d - z). \qquad (8.3.78) \]
Here, if $c = 0$ then $z = 0$ and we retrieve the Householder QR algorithm for linear least squares problems. If $b = 0$, then $d = e = 0$, and (8.3.78) gives the QR algorithm for the minimum norm solution of $A^Ty = c$. Clearly the algorithm in (8.3.78) is numerically invariant under a scaling $\alpha$ of the (1,1)-block.
In Section 7.5.6 we considered mixed precision iterative refinement to compute an accurate solution $\bar x$ to a linear system $Ax = b$. In this scheme the residual vector $\bar r = b - A\bar x$ is computed in high precision. Then the system $A\delta = \bar r$ is solved for a correction $\delta$ to $\bar x$ using a lower precision LU factorization of $A$. If this refinement process is iterated we obtain a solution with an accuracy comparable to that obtained by doing all computations in high precision. Moreover, the overhead cost of the refinement is small.
We would like to have a similar process to compute highly accurate solutions to the linear least squares problem $\min_x\|Ax - b\|_2$. In the naive approach, we would then compute the residual $\bar r = b - A\bar x$ in high precision and then solve $\min_x\|A\,\delta x - \bar r\|_2$ for a correction $\delta x$. If a QR factorization in low precision of $A$ is known we compute
\[ \delta x = R^{-1}Q_1^T\bar r, \]
and then iterate the process. The accuracy of this refinement scheme is not satisfactory unless the true residual $r = b - Ax$ of the least squares problem equals zero. The key to an efficient algorithm for iterative refinement is to apply the refinement to the augmented system and refine both the solution $x$ and the residual $r$ in each step. In floating-point arithmetic with base $\beta$ this process of iterative refinement can be described as follows:

s := 0;  x^(0) := 0;  r^(0) := b;
repeat
    f^(s) := b - r^(s) - A x^(s);
    g^(s) := c - A^T r^(s);           (residuals computed in precision u_2 = β^{-t_2})
    (solve the augmented system for δx^(s), δr^(s) in precision u_1 = β^{-t_1})
    x^(s+1) := x^(s) + δx^(s);
    r^(s+1) := r^(s) + δr^(s);
    s := s + 1;
end

Using the Householder QR factorization with the computed factors Q¯ and R¯ the method in (8.3.78) can be used to solve for the corrections giving

\[ z^{(s)} = \bar R^{-T}g^{(s)}, \qquad \begin{pmatrix} d^{(s)} \\ e^{(s)} \end{pmatrix} = \bar Q^Tf^{(s)}, \qquad (8.3.79) \]
\[ \delta r^{(s)} = \bar Q\begin{pmatrix} z^{(s)} \\ e^{(s)} \end{pmatrix}, \qquad \delta x^{(s)} = \bar R^{-1}(d^{(s)} - z^{(s)}). \qquad (8.3.80) \]
Recall that if $\bar Q = P_n\cdots P_2P_1$, where the $P_i$ are Householder reflections, then $\bar Q^T = P_1P_2\cdots P_n$. The computation of the residuals and corrections takes $4mn$ flops in high precision. Computing the solution from (8.3.79)–(8.3.80) takes $2n^2$ flops for operations with $\bar R$ and $8mn - 4n^2$ flops for operations with $\bar Q$. The total work for a refinement step is an order of magnitude less than the $4n^3/3$ flops required for the QR factorization.
A portable and parallelizable implementation of this algorithm using the extended precision BLAS is available and described in [148].
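A sketch of one refinement step (8.3.79)–(8.3.80) for the case c = 0 is given below; for simplicity it assumes that Q is stored explicitly (in practice Q̄ is applied as a product of Householder reflections, as noted above), and it does not show the higher precision accumulation of the residuals.

function [x,r] = refine_step(Q,R,A,b,x,r)
% One iterative refinement step for the least squares problem via the
% augmented system, given the QR factors Q, R of A (c = 0 assumed).
n = size(R,1);
f = b - r - A*x;                 % residual of the first block equation
g = -A'*r;                       % residual of the second block equation
z = R'\g;                        % corrections via (8.3.79)-(8.3.80)
t = Q'*f;  d = t(1:n);  e = t(n+1:end);
dr = Q*[z; e];
dx = R\(d - z);
x = x + dx;  r = r + dr;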

Review Questions
3.1 Let $w \in \mathbb{R}^n$, $\|w\|_2 = 1$. Show that $I - 2ww^T$ is orthogonal. What is the geometrical significance of the matrix $I - 2ww^T$? Give the eigenvalues and eigenvectors of these matrices.

3.2 Define a Givens transformation $G_{ij}(\phi) \in \mathbb{R}^{n\times n}$. Give a geometrical interpretation for the case $n = 2$.
3.3 Describe the difference between the classical and modified Gram–Schmidt methods for computing the factorization $A = Q_1R$. What can be said about the orthogonality of the computed matrix $Q_1$ for these two algorithms?
3.4 Define the QR factorization of a matrix $A \in \mathbb{R}^{m\times n}$, in the case that $\mathrm{rank}(A) = n \le m$. What is its relation to the Cholesky factorization of $A^TA$?

Problems 3.1 Show that if H is a reflector, that is H is Hermitian and H2 = I, then P = (I H)/2 is an orthogonal projector. Conversely, if P is an orthogonal projector− show that H = I 2P is a projector. − 3.2 Compute using Householder reflectors P1,P2, the factorization 1 5 R 2 6 QT A = P P A = , A = (a , a ) = , 2 1 0 1 2  3 7     4 8    to four decimal places 3.3 (a) Specialize the formulas in (8.3.5) for a Householder reflection P to the case n = 2. What is the relation between this and the corresponding Givens rotations? (b) How many flops are needed to apply the reflector P to a matrix of dimen- sion 2 n? × 3.4 Solve the least squares problem min Ax b , where x k − k2 √2 0 1 A = 1 1 , b = 2 .     1− 1 3     using a QR factorization computed with Givens transformation; 3.5 (a) Derive a square root free version of the modified Gram–Schmidt orthog- onalization method, by omitting the normalization of the vectors qk. Show that this version computes a factorization e A = Q1R,

where R is unit upper triangular. e e (b) Suppose the square root free version of modified Gram–Schmidt is used. Modifye Algorithm 8.3.6 for computing the least squares solution and residual from this factorization. Comment: There is no square root free version of Householder QR factoriza- tion! 266 Chapter 8. Linear Least Squares Problems

3.6 For any c and s such that c2 + s2 = 1 we have

0 0 c s 0 s A = = = QR. 0 1 s− c 0 c      Here rank (A) = 1 < 2 = n. Show that the columns of Q do not provide any information about an orthogonal basis for (A). R 3.7 (a) If the matrix Q in the QR factorization is explicitly required in the House- (n) holder algorithm it can be computed by setting Q = Im, and computing Q = Q(0) by the backward recursion

(k 1) (k) Q − = P Q , k = n : 1 : 1. k −

Show that if advantage is taken of the property that Pk = diag (Ik 1, Pk) this accumulation requires 4(mn(m n) + n3/3) flops. What is the corresponding− − operation count if forward recursion is used? e (b) Show how we can compute

In 0 Q1 = Q , Q2 = Q 0 Im n    −  separately in 2(mn2 n3/3) and 2(2m2n 3mn2 + n3) multiplications, re- spectively. − − n n 3.8 Let Q = Q = (q , q ,...,q ) R × be a real orthogonal matrix. 1 1 2 n ∈ (a) Determine a reflector P = I 2v vT such that P q = e = (1, 0,..., 0)T , 1 − 1 1 1 1 1 and show that P1Q1 = Q2 has the form 1 0 0 0 · · · Q2 =  .  , . Q2    0   e  (n 1) (n 1) where Q = (q , q ,..., q ) R − × − is a real orthogonal matrix. 2 1 2 n ∈ (b) Show, using the result in (a), that Q can be transformed to diagonal form with ae sequencee e of orthogonale transformations

Pn 1 P2P1Q = diag (1,..., 1, 1). − · · · ± 3.9 In several applications one needs to compute the QR factorization of a matrix

R A = 1 R  2 

where $R_1$ and $R_2$ are square upper triangular matrices. Show how to compute the QR factorization of $A$ in $2n^3/3$ flops using suitably chosen Householder transformations which do not introduce any nonzero elements outside the triangular structures.
Hint: In step $k$ a full submatrix of size $(k+1)\times(n-k+1)$ consisting of

selected rows is operated on. Below is a picture of the reduction when n = 4 and the two first columns have been processed

∗ ∗ ∗ ∗ 0 ∗ ∗ ∗ 1 B × × C B C B × C B C B ⊗ ⊗ ∗ ∗ C B C B ⊗ ∗ ∗ C B C B × × C B C @ × A Here denotes a modified element and an element that has been zeroed out. In∗ the next step the submatrix consisting⊗ of rows 3,5,6, and 7 will be operated on. 3.10 Show how to compute the QR factorization of a lower triangular matrix L Rnn in n3 flops using Givens rotations. ∈ Hint: In the first step elements in the first column are zeroed out by rotating rows (1, 2), (1, 3),..., (1, n) in this order. Then the first row and column are finished. The reduction continues in a similar way on the remaining lower triangular matrix of order (n 1). − 3.11 Show how to compute the QR factorization of the product A = Ap A2A1 without explicitly forming the product matrix A. · · · T Hint: For p = 2 first determine Q1 such that Q1 A1 = R1, and form A2Q1. T T Then, if Q2 is such that Q2 A2Q1 = R2 it follows that Q2 A2A1 = R2R1. 3.12 Show that if the column operations in MGS are carried out also on a second block row in the augmented matrix, the result can be written

\[ \begin{pmatrix} A & b \\ I & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} Q_1 & r \\ R^{-1} & -x \end{pmatrix}. \]
Hence, the MGS algorithm can be made to provide in a single sweep operation the solution $x$, the residual $r$, as well as the matrix $R^{-1}$, which is required for computing the covariance matrix $\sigma^2R^{-1}R^{-T}$.

8.4 Rank Deficient Problems 8.4.1 Rank Revealing QR Factorizations

Let $A \in \mathbb{R}^{m\times n}$ be a matrix with $\mathrm{rank}(A) = r \le \min(m, n)$, and $\Pi$ a permutation matrix such that the first $r$ columns in $A\Pi$ are linearly independent. Then the QR factorization of $A\Pi$ will have the form
\[ A\Pi = (\,Q_1 \;\; Q_2\,)\begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}, \qquad (8.4.1) \]
where $R_{11} \in \mathbb{R}^{r\times r}$ is upper triangular with positive diagonal elements and $R_{12} \in \mathbb{R}^{r\times(n-r)}$. Note that it is not required that $m \ge n$. The matrices $Q_1$ and $Q_2$ form orthogonal bases for the two fundamental subspaces $\mathcal{R}(A)$ and $\mathcal{N}(A^T)$, respectively.
The factorization (8.4.1) is not unique since there are many ways to select $r$ linearly independent columns of $A$. To simplify notation we assume in the following that $\Pi = I$. (This is no restriction since the column permutation of $A$ can always be assumed to have been carried out in advance.) Using (8.4.1) the least squares problem $\min_x\|Ax - b\|_2$ is reduced to
\[ \min_x \left\|\begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\right\|_2, \qquad (8.4.2) \]

where $c = Q^Tb$. Since $R_{11}$ is nonsingular the first $r$ equations can be satisfied exactly for any $x_2$. Hence, the general least squares solution becomes

\[ x_1 = R_{11}^{-1}(c_1 - R_{12}x_2), \qquad (8.4.3) \]
where $x_2$ can be chosen arbitrarily. By setting $x_2 = 0$ a particular solution $x_1 = R_{11}^{-1}c_1$ is obtained, for which at most $r = \mathrm{rank}(A)$ components are nonzero. This solution is appropriate in applications where it is required to fit the vector $b$ using as few columns of $A$ as possible. It is not unique and depends on the initial column permutation. Any solution $x$ such that $Ax$ involves at most $r$ columns of $A$ is called a basic solution. Letting $x_2$ be an arbitrary vector we have

\[ x_1 = d - Cx_2, \qquad d = R_{11}^{-1}c_1, \qquad C = R_{11}^{-1}R_{12}, \qquad (8.4.4) \]
where $C \in \mathbb{R}^{r\times(n-r)}$ can be computed by solving the triangular systems $R_{11}C = R_{12}$. In particular, the solution of minimum norm, i.e., the pseudo-inverse solution, is obtained by solving

\[ \min_x \left\|\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right\|_2 = \min_{x_2}\left\|\begin{pmatrix} d \\ 0 \end{pmatrix} - \begin{pmatrix} C \\ -I_{n-r} \end{pmatrix}x_2\right\|_2, \qquad (8.4.5) \]

which is a linear least squares problem for $x_2$. Note that this problem always has full rank and hence has a unique solution. To compute $x_2$ we could form and solve the normal equations

\[ (I_{n-r} + C^TC)x_2 = C^Td. \]

It is usually preferable to compute the Householder QR factorization

\[ Q_C^T\begin{pmatrix} C \\ -I_{n-r} \end{pmatrix} = \begin{pmatrix} R_C \\ 0 \end{pmatrix}, \qquad Q_C^T\begin{pmatrix} d \\ 0 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}, \]
taking into account the special structure of the matrix. Noticing that the pseudo-inverse solution $x = A^\dagger b$ equals the residual of the problem (8.4.5), we get

\[ x = A^\dagger b = Q_C\begin{pmatrix} 0 \\ d_2 \end{pmatrix}. \]

We further note that for any $z \in \mathbb{R}^{n-r}$ it holds that
\[ A\begin{pmatrix} -C \\ I_{n-r} \end{pmatrix}z = Q\begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix}\begin{pmatrix} -R_{11}^{-1}R_{12} \\ I_{n-r} \end{pmatrix}z = 0. \]
It follows that a basis for the null space of $A$ is given by the columns of the matrix $W$, where
\[ \mathcal{N}(A) = \mathcal{R}(W), \qquad W = \begin{pmatrix} -C \\ I_{n-r} \end{pmatrix}. \qquad (8.4.6) \]
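A sketch of how a basic and a minimum norm solution can be obtained from MATLAB's pivoted QR factorization, following (8.4.3)–(8.4.5), is given below; the function name and the tolerance-based rank decision are our own simplifications.

function [xbasic,xmin] = rankdef_ls(A,b,tol)
% Basic and minimum norm solutions of a rank deficient least squares
% problem min ||A*x - b||_2, using QR with column pivoting A*P = Q*R.
[m,n] = size(A);
[Q,R,P] = qr(A);                          % full QR with column pivoting
r = sum(abs(diag(R)) > tol);              % numerical rank estimate
R11 = R(1:r,1:r);  R12 = R(1:r,r+1:n);
c1 = Q(:,1:r)'*b;
d  = R11\c1;  C = R11\R12;                % cf. (8.4.4)
xbasic = P*[d; zeros(n-r,1)];             % basic solution, x2 = 0
x2 = [C; -eye(n-r)] \ [d; zeros(n-r,1)];  % solve (8.4.5) for x2
xmin = P*[d - C*x2; x2];                  % minimum norm (pseudo-inverse) solution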

8.4.2 Complete QR Factorizations
In signal processing problems it is often the case that one wants to determine the rank of $A$ as well as the range (signal subspace) and null space of $A$. Since the data analyzed arrive in real time, it is necessary to update an appropriate matrix decomposition at each time step. For such applications the SVD has the disadvantage that it cannot be updated in less than $\mathcal{O}(n^3)$ operations when rows and/or columns are added to or deleted from $A$. Although the RRQR factorization can be updated, it is less suitable in applications where a basis for the approximate null space of $A$ is needed. The matrix $W$ in (8.4.6) may be ill-conditioned and cannot easily be updated.
This motivates the introduction of the complete QR factorization of a matrix $A \in \mathbb{R}^{m\times n}$. This is a factorization of the form
\[ A = U\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix}V^T = U_1TV_1^T, \qquad (8.4.7) \]
where $T \in \mathbb{R}^{r\times r}$ is an upper or lower triangular matrix with positive diagonal elements and

\[ U = (\,U_1 \;\; U_2\,) \in \mathbb{R}^{m\times m}, \qquad V = (\,V_1 \;\; V_2\,) \in \mathbb{R}^{n\times n} \]
are orthogonal matrices partitioned so that $U_1 \in \mathbb{R}^{m\times r}$ and $V_1 \in \mathbb{R}^{n\times r}$. An advantage of the complete QR factorization of $A$ is that, like the SVD, it gives orthogonal bases for all four fundamental subspaces of $A$. In particular, $V_2$ gives an orthogonal basis for the null space $\mathcal{N}(A)$. This is often useful in signal processing applications, where one wants to determine the part of the signal that corresponds to noise.
From the orthogonal invariance of the 2-norm it follows that the minimum norm solution of the least squares problem $\min_x\|Ax - b\|_2$ is $x = V_1T^{-1}U_1^Tb$ and the pseudo-inverse of $A$ equals
\[ A^\dagger = V\begin{pmatrix} T^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^T = V_1T^{-1}U_1^T. \qquad (8.4.8) \]
To compute the factorization (8.4.7) we can start from a rank revealing QR factorization
\[ A\Pi = Q\begin{pmatrix} R_{11} & R_{12} \\ 0 & 0 \end{pmatrix} \]

and then construct a sequence of Householder reflections $P_k$, $k = r : (-1) : 1$, such that
\[ (\,R_{11} \;\; R_{12}\,)P_r\cdots P_1 = (\,T \;\; 0\,). \]
Here $P_k$ zeros elements in row $k$ and affects only columns $k$, $r+1 : n$. Then (8.4.7) holds with $T$ upper triangular and

\[ V = \Pi P_1\cdots P_r. \]
The diagram below shows the reduction when $r = 4$ and $n = 6$ and the two last rows of $R_{12}$ have been annihilated.

× × ∗ ∗ ∗ ∗
  × ∗ ∗ ∗ ∗
    ∗ ∗ ⊗ ⊗
      ∗ ⊗ ⊗

Here ∗ denotes a modified element and ⊗ an element that has been zeroed out. In the next step the submatrix consisting of columns 2, 5, and 6 will be transformed by a Householder reflection $P_2$ to zero the elements in positions (2,5) and (2,6). In general, in step $k$ a full matrix of size $k\times(n-r+1)$ is transformed by a Householder reflection. The transformations require a total of $2r^2(n-r+1)$ flops.
Even for a rank deficient matrix $A$ of rank $r < n$, a rank revealing QR factorization will yield a factorization

R R AΠ = Q 11 12 . 0 R  22  where R22 is nonzero but R22 ǫ, where ǫ is small. In a similar way the complete QR factorization can bek generalizedk ≤ to the case when A is only numerically rank deficient. We will consider URV factorizations, which have of the form

R R V T A = URV T = ( U U ) 11 12 1 , (8.4.9) 1 2 0 R V T  22  2  r r where U and V are orthogonal matrices, R R × , and satisfy 11 ∈ 1 1/2 σ (R ) σ , R 2 + R 2 cσ . (8.4.10) r 11 ≥ c r k 12kF k 22kF ≤ r+1  Note that here both submatrices R12 and R22 are required to have small elements. From (8.4.9) we have

R12 AV2 2 = cσr+1, k k R22 ≤   F

and hence the orthogonal matrix V2 can be taken as an approximation to the nu- merical null space r. Algorithms forN computing an URV decomposition start with an initial QR factorization with column pivoting. Then a rank revealing stage follows, in which 8.4. Rank Deficient Problems 271 singular vectors corresponding to the smallest singular values of R are estimated. Assume that w is a unit vector such that Rw = σn. Let P and Q be a orthogonal T T k k matrices such that Q w = en and P RQ = Rˆ where Rˆ is upper triangular. Then

Reˆ = P T RQQT w = P T Rw = σ , k nk k k k k n which shows that the entire last column in Rˆ is small. Given w the matrices P and Q can be constructed as a sequence of Givens rotations, Efficient algorithms can be given for updating an URV decomposition when a new row is appended to A. Like the RRQR factorizations the URV decomposition yield approximations to the singular values. In [428] the following bounds are derived

fσ σ (R ) σ , i = 1 : r, i ≤ i 11 ≤ i and σi σi k(R22) σi/f, i = r + 1 : n, ≤ − ≤ where R 2 1/2 f = 1 k 12k2 . − σ (R )2 R 2  min 11 − k 22k2  Hence, the smaller the norm of the off-diagonal block R12, the better the bounds will be. Similar bounds can be given for the angle between the range of V2 and the right singular subspace corresponding to the smallest n r singular values of A. An alternative decomposition that is more satisfactory− for applications where an accurate approximate null space is needed, is the rank-revealing ULV decom- position L 0 A = U 11 V T . (8.4.11) L L  21 22  where the middle matrix has lower triangular form. For this decomposition

AV = L , V = (V , V ), k 2k2 k 22kF 1 2 and hence the size of L21 does not adversely affect the null space approximation. On the other hand thek URVk decomposition usually gives a superior approximation for the numerical range space and the updating algorithm for URV is much simpler. For recent work see also citebaes:05,baer:08 We finally mention that rank-revealing QR factorizations can be effectively computed only if the numerical rank r is either high, r n or low, r n. The low rank case is discussed in [103]. Matlab templates for rank-revealing≈ ≪ UTV decom- positions are described in [203].

8.4.3 Column Subset Selection Problem

In the column subset selection problem we are given a matrix $A \in \mathbb{R}^{m\times n}$ and want to determine a subset $A_1$ of $k < n$ columns such that

\[ \|A - (A_1A_1^\dagger)A\| \]

is minimized over all $\binom{n}{k}$ possible choices. In other words, we want to find a permutation $P$ such that the smallest singular value of the first $k$ columns of $AP$ is maximized.
The column subset selection problem is closely related to rank-revealing QR factorization. The following theorem, which we state without proof, shows that a column permutation $\Pi$ always exists such that the numerical rank of $A$ is revealed by the QR factorization of $A\Pi$.

Theorem 8.4.1 (H. P. Hong and C. T. Pan [337]).

m n Let A R × , (m n), and r be a given integer 0

R R QT AΠ = 11 12 , (8.4.12) r 0 R  22  r r with R R × upper triangular, and 11 ∈ 1 σ (R ) σ (A), σ (R ) cσ (A), (8.4.13) min 11 ≥ c r max 22 ≤ r+1 where c = r(n r) + min(r, n r)). − − p If A has a well defined numerical rank r < n, i.e.,

\[ \sigma_1 \ge \cdots \ge \sigma_r \gg \delta \ge \sigma_{r+1} \ge \cdots \ge \sigma_n, \]
then the above theorem says that if the ratio $\sigma_r/\sigma_{r+1}$ is sufficiently large then there is a permutation of the columns of $A$ such that the rank of $A$ is revealed by the QR factorization. Unfortunately, to find such a permutation is an NP hard problem. There are heuristic algorithms which almost always succeed in practice.
The following SVD-based algorithm for finding a good approximation to the solution of the subset selection problem is due to Golub, Klema, and Stewart [266]; see also Golub and Van Loan [277, Sec. 12.2]. Let $k$ be the numerical rank of $A$ determined from the SVD $A = U\Sigma V^T$. If $P$ is any permutation matrix, then $U^T(AP)(P^TV) = \Sigma$. Thus, permutation of the columns of $A$ corresponds to a permutation of the rows of the orthogonal matrix $V$ of right singular vectors. The theoretical basis for this column selection strategy is the following result.

Theorem 8.4.2 (Theorem 6.1, [266]).

T m n Let A = UΣV R × be the SVD of A. Partition A and V conformally as ∈ A = ( A1 A2 ) and V = ( V1 V2 ), where

k n k − k V V ( V V ) = 11 12 , (8.4.14) 1 2 n k V V −  21 22  8.4. Rank Deficient Problems 273

and Vˆ11 is nonsingular. Then

σk(A) 1 σk(A1) σk(A). (8.4.15) V − ≤ ≤ k 11 k2

Proof. The upper bound in (8.4.15) follows directly from the interlacing property of the singular values of A1 and A; see Theorem 8.1.17. If we set A ( V1 V2 ) = T T ( S1 S2 ), then S1 S2 = 0. Now since A = SV , we have

T T A1 = S1V11 + S2V21.

If we define inf(A) = inf x =1 Ax 2, then k k2 k k inf(A ) inf(S V ) σ (A) inf(V ). 1 ≥ 1 11 ≥ k 11 1 Since inf(V ) = V − the lower bound follows. 11 k 11 k2 The above result suggests that we choose a permutation P such that the submatrix Vˆ11 consisting of the first k columns os Vˆ = VP is as well-conditioned as possible. This can be achieved by using QR factorization with column pivoting to compute T T T Q ( V11 V21 ) P = ( R11 R12 ) .

The least squares problem minz A1z b 2 can then be solved by QR factorization. From the interlacing propertiesk − of singulark values (Theorem 8.1.17) it follows by induction that for any factorization of the form (8.4.12), we have the inequalities

\[ \sigma_{\min}(R_{11}) \le \sigma_r(A), \qquad \sigma_{\max}(R_{22}) \ge \sigma_{r+1}(A). \qquad (8.4.16) \]
Hence, to achieve (8.4.13) we want to choose the permutation $\Pi$ to maximize $\sigma_{\min}(R_{11})$ and simultaneously minimize $\sigma_{\max}(R_{22})$. These two problems are in a certain sense dual; cf. Problem 8.4.3.
From Theorem 8.1.25 it follows that in (8.4.14) $\|V_{11}\|_2 = \|V_{22}\|_2$. When $k > n/2$ it is more economical to use the QR factorization with column pivoting applied to the smaller submatrix $V_2^T = (V_{12}^T, V_{22}^T)$. This will then indicate which columns of $A$ to drop.
In the case $k = n-1$, the column permutation $P$ corresponds to the largest component in the right singular vector $V_2 = v_n$ belonging to the smallest singular value $\sigma_n$. The index indicates which column to drop. In a related strategy due to T. F. Chan [99] repeated use is made of this, by dropping one column at a time. The right singular vectors needed are determined by inverse iteration (see Section 9.4.3).
A comparison between the RRQR and the above SVD-based algorithms is given by Chan and Hansen [102]. Although in general the methods will not necessarily compute equivalent solutions, the subspaces spanned by the two sets of selected columns are still almost identical whenever the ratio $\sigma_{k+1}/\sigma_k$ is small. The best bounds are those given by Gu and Eisenstat [292] and Chandrasekaran and Ipsen [105].

8.4.4 Modified Least Squares Problems

Suppose that we have solved the least squares problem minx Ax b 2, where A m n k − k ∈ R × , m > n. It may often be required to later solve a related least squares problem where a simple modification of the data (A, b) has been performed. For example, one may want to add new observations or discard observations with unacceptably large residuals. In various time-series problems data are arriving sequentially and a least squares solution has to be updated at each time step. Such modifications are usually referred to as updating when (new) data is added and downdating when (old) data is removed. In time-series problems a sliding window method is often used. At each time step a new row of data is added and the oldest row of data deleted. Applications in signal processing often require real-time solutions so efficiency is critical. Other applications arise in optimization and statistics. Indeed, the first sys- tematic use of such algorithms seems to have been in optimization. In linear re- gression efficient and stable procedure for adding and/or deleting observations is needed. In stepwise regression one wants to examine different models by adding and/or deleting variables in each step. Another important application occurs in active set methods for solving least squares problems with inequality constraints. We will consider the following type of modifications:

1. A general rank-one change to $(\,A \;\; b\,)$.

2. Adding or deleting a row of ( A b ).

3. Adding or deleting a column of $A$.

If a row is added to $A$, then by the interlacing property (Theorem 8.1.17) the smallest singular value of the modified matrix will not decrease. On the other hand, when a row is deleted the rank can decrease. Similarly, when a column is deleted the smallest singular value will not decrease. When a column is added the modified matrix may become singular. This indicates that deleting a column and adding a row are "easy" operations, whereas adding a column and deleting a row can be more delicate operations. In general, we cannot expect modification methods to be stable when the unmodified problem is much worse conditioned than the modified problem.
Another aspect that has to be considered is what factorization is available of the original matrix $A$. The simplest case is when the full QR factorization with $Q \in \mathbb{R}^{m\times m}$ is explicitly known. In cases when $n \ll m$ it may be more desirable to update the thin QR factorization. Finally, sometimes only the factor $R$ may be known. This is the case, e.g., when the initial problem has been solved by the normal equations.

Recursive Least Squares. We first describe a simple algorithm for adding and deleting rows based on the Cholesky factorization and normal equations. The solution to the least squares 8.4. Rank Deficient Problems 275 problem satisfies the normal equations ATAx = AT b. If the equation wT x = β is added, then the updated solution x satisfies the modified normal equations

T T T (A A +eww )x = A b + βw, (8.4.17) T where we assume that rank (A) = n. Lete R is the Cholesky factor of A A. We would like to avoid computing the Cholesky factorization of the modified problem from scratch. In the covariance matrix method. the covariance matrix

\[ C = (A^TA)^{-1} = R^{-1}R^{-T}, \]

is updated. Since $\tilde C^{-1} = C^{-1} + ww^T$, we have by the Sherman–Morrison formula (7.1.26)
\[ \tilde C = C - \frac{1}{1 + w^Tu}uu^T, \qquad u = Cw. \qquad (8.4.18) \]
Adding the term $ww^Tx = (w^Tx)w$ to both sides of the unmodified normal equations and subtracting from (8.4.17) gives

\[ (A^TA + ww^T)(\tilde x - x) = (\beta - w^Tx)w. \]
Solving for the updated solution gives the following basic formula:
\[ \tilde x = x + (\beta - w^Tx)\tilde u, \qquad \tilde u = \tilde Cw. \qquad (8.4.19) \]
Equations (8.4.18)–(8.4.19) define a recursive least squares algorithm associated with the Kalman filter. The vector $\tilde u = \tilde Cw$, which weights the predicted residual $\beta - w^Tx$ of the new observation, is called the Kalman gain vector.
The equations (8.4.18)–(8.4.19) can, with slight modifications, be used also for deleting an observation $w^Tx = \beta$. We have
\[ \tilde C = C + \frac{1}{1 - w^Tu}uu^T, \qquad \tilde x = x - (\beta - w^Tx)\tilde u, \qquad (8.4.20) \]
provided that $1 - w^Tu \neq 0$.
The simplicity and recursive nature of this updating algorithm has made it popular for many applications. The main disadvantage of the algorithm is its serious sensitivity to roundoff errors. The updating algorithms based on orthogonal transformations developed in Section 8.4.4 are therefore generally to be preferred. The method can also be used together with updating schemes for the Cholesky factor $R$ or its inverse $R^{-1}$, where $A^TA = R^TR$. The Kalman gain vector can then be computed from

\[ z = R^{-T}w, \qquad p = R^{-1}z. \]

Such methods are often referred to as "square root methods" in the signal processing literature. Schemes for updating $R^{-1}$ are described in [61, Sec. 3.3]. Since no back-substitution is needed in these schemes, they are easier to parallelize.
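A sketch of one recursive least squares step based on (8.4.18)–(8.4.19) is given below; the function name is ours, and as noted above the orthogonal updating methods of the next subsection are normally preferred for numerical reasons.

function [x,C] = rls_update(x,C,w,beta)
% Add the observation w'*x = beta to a least squares solution x with
% covariance-type matrix C = inv(A'*A), cf. (8.4.18)-(8.4.19).
u = C*w;                                 % note: Ctilde*w = u/(1 + w'*u)
x = x + (beta - w'*x)/(1 + w'*u) * u;    % updated solution, (8.4.19)
C = C - u*u'/(1 + w'*u);                 % updated covariance matrix, (8.4.18)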

Updating the QR Factorization

Algorithms for updating the thin QR factorization when $Q \in \mathbb{R}^{m\times n}$ and $R \in \mathbb{R}^{n\times n}$ are explicitly known were first described by Daniel, Gragg, Kaufman, and Stewart [134]. These algorithms use Gram–Schmidt with reorthogonalization combined with Givens rotations. Fortran subroutines that were modifications of the Algol procedures given there appeared in Reichel and Gragg [497].
We first consider updating the QR factorization of a matrix $A \in \mathbb{R}^{m\times n}$ of full column rank. A first observation is that it is not feasible to update a QR factorization where $Q$ is stored implicitly as a product of Householder transformations. Therefore, we assume that
\[ A = Q\begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad (8.4.21) \]
where the orthogonal factor $Q \in \mathbb{R}^{m\times m}$ is known explicitly. These updating algorithms require $O(m^2)$ multiplications, and are (almost) normwise backward stable. We want to compute the factorization when $A$ is subject to a rank one change

\[ \tilde A = A + uv^T = \tilde Q\begin{pmatrix} \tilde R \\ 0 \end{pmatrix}, \qquad (8.4.22) \]
where $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. We assume that $\mathrm{rank}(A) = \mathrm{rank}(\tilde A) = n$, so that $R$ and $\tilde R$ are uniquely determined. The algorithm proceeds as follows:
1. Compute the vector $w = Q^Tu \in \mathbb{R}^m$, so that
\[ A + uv^T = Q\left(\begin{pmatrix} R \\ 0 \end{pmatrix} + wv^T\right). \qquad (8.4.23) \]

2. Determine a sequence of Givens rotations Jk = Gk,k+1(θk), k = m 1 : ( 1) : 1 such that − − T T J1 Jm 1w = αe1, α = w 2. · · · − ±k k These transformations zero the last m 1 components of w from bottom up. (For − details on how to compute Jk see Section 8.3.1.) Apply these transformations to R to obtain T T R T T H = J1 Jm 1 + wv = H + αe1v . (8.4.24) · · · − 0    (Note that Jn+1,...,Je m 1 have no effect on R.) Because of the structure of the Givens rotations the matrix− H will be an upper Hessenberg matrix, i.e., H is trian- gular except for extra nonzero elements hk+1,k, k = 1, 2,...,n (e.g., m = 6, n = 4),

\[ H = \begin{pmatrix}
\times & \times & \times & \times \\
\times & \times & \times & \times \\
0 & \times & \times & \times \\
0 & 0 & \times & \times \\
0 & 0 & 0 & \times \\
0 & 0 & 0 & 0
\end{pmatrix}. \]

Since only the first row of $H$ is modified by the term $\alpha e_1v^T$, $\tilde H$ is also upper Hessenberg.
3. Determine Givens rotations $\tilde J_k = G_{k,k+1}(\phi_k)$, $k = 1 : n$, to zero the element in position $(k+1, k)$, so that
\[ \tilde J_n^T\cdots\tilde J_1^T\tilde H = \begin{pmatrix} \tilde R \\ 0 \end{pmatrix} \qquad (8.4.25) \]
is upper triangular.
4. Accumulate the transformations into $Q$ to get

\[ \tilde Q = QU, \qquad U = J_{m-1}\cdots J_1\tilde J_1\cdots\tilde J_n. \]
This gives the desired factorization (8.4.22).
The work needed for this update is as follows: computing $w = Q^Tu$ takes $m^2$ flops. Computing $H$ and $\tilde R$ takes $4n^2$ flops, and accumulating the transformations $J_k$ and $\tilde J_k$ into $Q$ takes $4(m^2 + mn)$ flops, for a total of $5m^2 + 4mn + 4n^2$ flops. Hence, the work has been decreased from $O(mn^2)$ to $O(m^2)$. However, if $n$ is small, updating may still be more expensive than computing the factorization from scratch.
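A sketch of the rank one update, steps 1–4, using MATLAB's planerot for the Givens rotations is given below; it stores $[R;0]$ as a full m-by-n array and makes no attempt at the operation-count savings discussed above.

function [Q,R] = qr_rank1_update(Q,R,u,v)
% Update the full QR factorization A = Q*[R; 0] (Q m-by-m, R n-by-n upper
% triangular) to that of A + u*v', following steps 1-4 above.
m = size(Q,1);  n = size(R,2);
w = Q'*u;
S = [R; zeros(m-n,n)];
for k = m-1:-1:1                       % step 2: reduce w to alpha*e_1
    [G,w(k:k+1)] = planerot(w(k:k+1));
    S(k:k+1,:) = G*S(k:k+1,:);         % S becomes upper Hessenberg
    Q(:,k:k+1) = Q(:,k:k+1)*G';
end
S(1,:) = S(1,:) + w(1)*v';             % add alpha*e_1*v', cf. (8.4.24)
for k = 1:min(n,m-1)                   % step 3: restore triangular form
    [G,S(k:k+1,k)] = planerot(S(k:k+1,k));
    S(k:k+1,k+1:n) = G*S(k:k+1,k+1:n);
    Q(:,k:k+1) = Q(:,k:k+1)*G';        % step 4: accumulate into Q
end
R = triu(S(1:n,:));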

There is a simple relationship between the problem of updating matrix factorizations and that of updating least squares solutions. Recall that if $A$ has full column rank and the R-factor of the matrix $(A, b)$ is
\[ \begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix}, \qquad (8.4.26) \]

( A + uvT b ) = ( A b ) + u ( vT 0 ) . where the right hand side has been appended. [164, , Chap. 10]. Algorithms for modifying the full QR factorization of (A, b) when a column is deleted or added are special cases of the rank one change; see Bj¨orck [61, Sec- tion . 3.2]. Adding a row is simple, since the QR factorization is easily organized to treat a row at a time. A more subtle problem is to modify the QR factorization when a row is deleted, which is called the downdating problem. This corresponds to the problem of deleting an observation in a least squares problem. In the sliding window method one row is added and simultaneously one row is deleted. Another 278 Chapter 8. Linear Least Squares Problems instance when a row has to be removed is when it has somehow been identified as faulty. There is no loss of generality in assuming that it is the first row of A that is (m 1) n to be deleted. We wish to obtain the QR factorization of the matrix A R − × when ∈ aT R e A = 1 = Q (8.4.28) A 0     is known. We now show that this is equivalent to finding the QR factorization of e T (e1, A), where a dummy column e1 = (1, 0,..., 0) has been added. We have

q R QT (e , A) = 1 , 1 q 0  2  T T T m where q = (q1 , q2 ) R is the first row of Q. We now determine Givens rotations J = G , k = m ∈ 1 : 1 : 1, so that k k,k+1 − − T T J1 Jm 1q = αe1, α = 1. (8.4.29) · · · − ± Then we have α vT T T q1 R J1 Jm 1 = 0 R , (8.4.30) · · · − q2 0     0 0  e  where the matrix R is upper triangular. Note that the transformations Jn+1,...,Jm 1 will not affect R. Further, if we compute − e Q¯ = QJm 1 J1, − · · · ¯ T ¯ it follows from (8.4.29) that the first row of Q equals αe1 . Since Q is orthogonal it must have the form α 0 Q¯ = , 0 Q   (m 1) (m 1) with α = 1 and Q R − × − orthogonal.e From (8.4.29), | | ∈ e vT aT α 0 1 = R , A 0 Q       0  e  and hence the desired factorizatione is e

R A = Q . 0   e This algorithm for downdating is ae speciale case of the rank one change algorithm, and is obtained by taking u = e , vT = aT in (8.4.22). − 1 1 8.4. Rank Deficient Problems 279

8.4.5 Golub–Kahan Bidiagonalization So far we have considered methods for solving least squares problems based on m n reducing the matrix A R × to upper triangular (or trapezoidal) form using unitary operations. It is∈ possible to carry this reduction further and obtain a lower (m n) ≥ α1 β2 α2   T B .. (n+1) n U AV = , B = β3 . R × , (8.4.31) 0   ∈    ..   . αn     βn+1    where U and V are square orthogonal matrices. The bidiagonal form is the closest that can be achieved by a finite process to the diagonal form in SVD. The bidiagonal decomposition (8.4.31) therefore is usually the first step in computing the SVD of A; see Section 9.7. It is also powerful tool for solving various least squares problems. Note that from (8.4.31) it follows that AT = VBT U T , where BT is upper bidiagonal. Thus, if we apply the same process to AT we obtain an upper bidiagonal m n form. A complex matrix A C × can be reduced by a similar process to real bidiagonal form using unitary∈ transformations U and V . We consider here only the real case, since the generalization to the complex case is straightforward. The simplest method for constructing the bidiagonal decomposition is the Golub–Kahan algorithm in which the reduction is achieved by applying a se- quence of Householder reflections alternately from left and right. We set A = A(1) and in the first double step compute α 0 0 1 · · · β2 a22 a2n  · · ·  A(2) = Q (AP ) = 0 a32 a3n . 1 1 . . · · · .  . e. .. e.   . . . .   e e   0 am2 amn   · · ·  Here the Householder reflection P1 is chosen to zero the last n 1 elements in the first row of A. Next Q is chosen to zero thee last m e2 elements− in the the first 1 − column of AP1. This key thing to observe is that Q1 will leave the first row in AP1 unchanged and thus not affect the zeros introduced by P1. All later steps are similar. In the kth step, k = 1 : min(m, n), we compute

(k+1) (k) A = Qk(A Pk), where Qk and Pk are Householder reflections. Here Pk is chosen to zero the last (k) n k elements in the kth row of A . Then Qk is chosen to zero the last m (k +1) − (k) − elements in the kth column of A Pk. The process is continued until either the rows or columns are exhausted- When m > n the process ends with the factorization

U = Q1Q2 Qn, V = P1P2 Pn 1. (8.4.32) · · · · · · − 280 Chapter 8. Linear Least Squares Problems

Note that since Qk, only works on rows k + 1 : m, and Pk, only works on columns k : m. It follows that u = e . u = Ue = Q Q e , k = 2 : n, (8.4.33) 1 1 k k 1 · · · k k v = V e = P P e , k = 1 : n 1, v = e . (8.4.34) k k 1 · · · k k − n n If m n then we obtain ≤ α1 β2 α2   T .. m m U AV = ( B 0 ) , B = β3 . R × .   ∈  ..   . αm 1   −   βm αm    U = Q1Q2 Qm 2, V = P1P2 Pm 1. · · · − · · · − The above process can always be carried through although some elements in B may vanish. Note that the singular values of B equal those of A; in particu- lar, rank (A) = rank (B). Using complex Householder transformations (see Sec- tion sec8.3.2) a complex matrix A can be reduced to real bidiagonal form by the algorithm above. In the nondegenerate case when all elements in B are nonzero, the bidiagonal decomposition is uniquely determined first column u1 := Ue1, which can be chosen arbitrarily. The Householder reduction to bidiagonal form is backward stable in the fol- lowing sense. The computed B¯ can be shown to be the exact result of an orthogonal transformation from left and right of a matrix A + E, where E cn2u A , (8.4.35) k kF ≤ k kF and c is a constant of order unity. Moreover, if we use the information stored to gen- erate the products U = Q1 Qn and V = P1 Pn 2, then the computed matrices are close to the exact matrices· · · U and V which· · · reduce− A + E. This will guaran- tee that the singular values and transformed singular vectors of B¯ are accurate approximations to those of a matrix close to A. The bidiagonal reduction algorithm as described above requires approximately 4(mn2 n3/3) flops − when m n, which is twice the work for a Householder QR factorization. The Householder≥ vectors associated with U can be stored in the lower triangular part of A and those associated with V in the upper triangular part of A. Normally U and V 2 2 1 3 are not explicitly required. They can be accumulated at a cost of 4(m n mn + 3 n ) 4 3 − and 3 n flops respectively. When m n it is more efficient to use a two-step procedure as originally suggested by Lawson≫ and Hanson [399] and later analyzed by T. Chan. In the first step the QR factorization of A is computed (possibly using column pivoting)

R n n AP = Q , R R × , 0 ∈   8.4. Rank Deficient Problems 281

2 2 3 which requires 4mn 3 n flops. In the second step the upper triangular matrix R is transformed to bidiagonal− form using the algorithm described above. Note that no advantage can be taken of the triangular structure of R in the Householder algorithm. Already the first postmultiplication of R with P1 will cause the lower triangular part of R to fill in. Hence, the Householder reduction of R to bidiagonal 4 3 form will require 3 n flops. The complete reduction to bidiagonal form then takes a total of 2(mn2 + n3) flops. This is less than the original Golub–Kahan algorithm when m/n > 5/3.

8.4.6 Regularization by TSVD and Partial Least Squares In Example 7.1.5 we considered an integral equation of the first kind.

K(s,t)f(t)dt = g(s), (8.4.36) Z where the operator K is compact. This is an ill-posed problem in the sense that the solution f does not depend continuously on the data g. This is because there are rapidly oscillating functions f(t), which come arbitrarily close to being annihilated by the integral operator. A shown in In Example 7.1.5, when the the integral equation is discretized, this gives rise to a linear system Ax = b, or, more generally, a linear least squares problem

m n min Ax b 2, A R × , (8.4.37) x k − k ∈

The singular values σi of A will cluster at zero leading to a huge condition number of A. However, the exact right hand side b will be such that in the expansion of the solution in terms of the SVD

n c x = i v , c = uT b, (8.4.38) i σ i i i i=1 i X the coefficients ci decrease faster than σi. This is a consequence of the Picard condition for the continuous problem. In spite the large condition number of A, the problem is in some sense quite well-conditioned. However, in practice, the exact right hand side is cantaminated by noise in a way that affects all coefficients ci in (8.4.38) more or less equally. Therefore, unless some sort of regularization is introduced, the computed solution may be useless. The solution to the linear least squares problem (8.4.37) can be expressed in terms of the SVD expansion r uT b x = i v . σ · i i=1 i X If the matrix A is ill-conditioned and possibly rank deficient. with numerical rank k < n, then a more stable approximate solution is obtained by discarding terms 282 Chapter 8. Linear Least Squares Problems corresponding to small singular values in this expansion. This gives the truncated SVD (TSVD) solution r uT b x = i v . (8.4.39) σ · i i=1 i X The TSVD solution is a reliable way to determine an approximate pseudo-inverse solution of a numerically rank deficient least squares problems. However, it is also an expensive method since it requires the computation of the SVD. Often a truncated QR factorization works well, provided that some form of column pivoting is carried out. An alternative to truncated SVD is to constrain the solution by adding a quadratic constraint; see Section 8.6.4.

Example 8.4.1. n n The linear system Ax = b, A R × in Example 7.1.5 was derived from a discretization of an integral equation∈ of the first kind. For n = 100 the singular values σk of A are close to the roundoff level for k > 30. Hence, the linear system is highly ill-conditioned. Let b = Ax + η be a perturbed right-hand side, where x is a smooth solution and η a random noise vector with normally distributed components 3 of mean zero and variance 10− . Figure 8.4.1 shows that the smallest error occurs

3 10

2 10

1 10

0 10

−1 10

−2 10

−3 10

−4 10 0 2 4 6 8 10 12 14 16 18

Figure 8.4.1. Relative error xk x 2/ x 2 and residual Axk b 2/ b 2 for TSVD solutions truncated after k ksteps.− k k k k − k k k for k = 10. For larger values of k the error increases very rapidly. In practice, the error is unlnown and we can only observe the residual. If the ’noise level in the right hand side is known, the expansion can be truncated as soon as the residual norm has dropped below this level. This is known as the discrepacy principle.

In scientific computing one often has one set of variables called the factors that is used to control, explain, or predict another variable. In multiple linear regression 8.4. Rank Deficient Problems 283 the set of factors may be large and possible highly collinear. Then one wants to express the solution by restricting it to lie in a lower dimensional subspace. One way to achieve this is to use a TSVD solution, in which a subspace spanned by the right singular vectors corresponding to the large singular values is used. An alternative is to use a solution computed from a partial bidiagonalization of ( b A ). This is equivalent to the partial least squares (PLS) method. It has been observed that compared to TSVD, PLS gives the same accuracy often with a lower dimensional subspace We first derive an algorithm for solving the linear least squares problem m n min Ax b 2, where A R × , m n, by bidiagonal decomposition. Let Q1 be a Householderk − k reflection∈ such that ≥

Q1b = β1e1. (8.4.40)

Using the Golub–Kahan algorithm to transform P1A to lower triangular form, we obtain 1 0 β e B U T ( b A ) = U T ( b AV ) = 1 1 n , (8.4.41) n 0 V n n 0 0  n    (n+1) n where e is the first unit vector, B R × is lower bidiagonal, and 1 n ∈∈ U = (u ,u ,...,u ) = Q Q Q , (8.4.42) n 1 2 n 1 2 · · · n+1 Vn = (v1,v2,...,vn) = P1P2 Pn 1. (8.4.43) · · · −

(Note the minor difference in notation in that Qk+1 now zeros elements in the kth column of A.) Setting x = Vny and using the invariance of the l2-norm it follows that

1 T 1 b Ax 2 = ( b A ) − = U ( b AVn ) − k − k x n y   2   2

= β1e1 Bny 2. k − k Hence, if y is the solution to the bidiagonal least squares problem

min Bny β1e1 2, (8.4.44) y k − k then x = V y solves the least squares problem Ax b . n k − k2 To solve (8.4.44) stably we need the QR factorization of (Bk β1e1). This can be constructed by premultiplying with a sequence of Givens rotations,| where Gk,k+1, k = 1, 2,... is used to zero the element βk+1

ρ1 θ2 φ1 ..  ρ2 . φ2  T Rn fk . Q (Bn β1e1) = ¯ = .. . (8.4.45) | φn+1  . θ .     n   ρ φ   n n   φ¯   n+1    284 Chapter 8. Linear Least Squares Problems where Q is a product of n Givens rotations. The solution y is then obtained by back-substitution from Rny = dn. The norm of the corresponding residual vector equals φ¯ . | n+1| To zero out the element β2 we premultiply rows (1,2) with a rotation G12, giving c s α 0 β ρ θ φ 1 1 1 1 = 1 2 1 . (8.4.46) s c β α 0 0ρ ¯ φ¯  − 1 1  2 2   2 2  (Here and in the following only elements affected by the rotation are shown.) Here the elements ρ1, θ2 and φ1 in the first row are final, butρ ¯2 and φ¯2 will be transformed into ρ2 and φ2 in the next step. Continuing in this way in step j the rotation Gj,j+1 is used to zero the element β . In steps, j = 2 : n 1, the rows (j,j + 1) are transformed j+1 − c s ρ¯ 0 φ¯ ρ θ φ j j j j = j j+1 j . (8.4.47) s c β α 0 0ρ ¯ φ¯  − j j  j+1 j+1   j+1 j+1  where

φ = c φ¯ , φ¯ = s φ¯ , ρ = ρ¯2 + β2 , j j j j+1 − j j j j j+1 θj+1 = sj αj+1, ρ¯n+1 = cjαj+1. q

Note that by construction φ¯ φ¯ . Finally, in step n we obtain | j+1| ≤ j c s ρ¯ φ¯ ρ φ n n n n = n n . (8.4.48) s c β 0 0 φ¯  − n n  n+1   n+1  After n steps, we have obtained the factorization (8.4.45) with

G = G G G . n n,n+1 · · · 23 12 Now consider the result after k < n steps of the above bidiagonalization process have been carried out. At this point we have computed Q1, Q2,...,Qk+1, P1,P2,...,Pk such that the first k columns of A are in lower bidiagonal form, i.e. I B I Q Q Q AP P P k = k = k+1 B , k+1 · · · 2 1 1 2 · · · k 0 0 0 k       (k+1) k where Bk R × is a leading submatrix of Bn. Multiplying both sides with Q Q Q∈ and using orthogonality we obtain the relation 1 2 · · · k+1 ˆ T AVk = Uk+1Bk = Bk + βk+1vk+1ek , k = 1 : n, (8.4.49) where I P P P k = V = (v ,...,v ), 1 2 · · · k 0 k 1 k   I Q Q Q k+1 = U = (u ,...,u ). 1 2 · · · k+1 0 k+1 1 k+1   8.4. Rank Deficient Problems 285

If we consider the intermediate result after applying also Pk+1 the first k + 1 rows have been transformed into bidiagonal form, i.e.

( I 0 ) Q Q Q AP P P = ( B α e ) ( I 0 ) . k+1 k+1 · · · 2 1 1 2 · · · k+1 k k+1 k+1 k+1 Transposing this gives a second relation

T T T Uk+1A = BkVk + αk+1ek+1vk+1, (8.4.50) An nice feature of PLS is that if a zero element occurs in B, then the algorithm stops with an exact least squares solution. As an example, consider the case below, where the first two columns of B have been computed (m = 7, n = 5).

β1 α1 β2 α2  β  3 × ⊗ ⊗ Q Q Q ( b A ) P P =   , 3 2 1 1 2  × × ×     ⊗ × ×     ⊗ × ×     ⊗ × ×    In the next step elements in the third row are zeroed out and α3 determined. If α3 = 0, then the problem decomposes and an overdetermined core system of dimension 3 2 can be extracted. If α = 0, then elements in the third column × 3 6 of the transformed matrix A are zeroed and β3 determined. If β4 = 0, then the problem decomposes and a square core system of dimension 3 3 can be extracted. × In general, assume that the first zero element to occur is αk+1 = 0. Then we have obtained the decomposition

B 0 U T AV = k , k+1 k 0 A  k  (m k 1) (n k) e e where A R − − × − , and k ∈ U = Q Q Q , V = P P P , k+1 k+1 · · · 2 1 k 1 2 · · · k are square orthogonale matrices. Then, settinge x = Vky, the transformed least squares problem takes the form e B 0 y β e y min k 1 1 1 , y = 1 , (8.4.51) y 0 Ak y2 − 0 y2      2  

k n k y1 R , y2 R − . This problem is separable and decomposes into two indepen- dent∈ subproblems∈

min Bky1 β1e1 2 , min Aky2 2. (8.4.52) y1 k − k y2 k k

By construction Bk has nonzero elements in its two diagonals. Thus, it has full column rank and the solution y1 to the first subproblem is unique. Further, the 286 Chapter 8. Linear Least Squares Problems

minimum norm solution of the initial problem is obtained simply by taking y2 = 0. The first subproblem (8.4.52) is called a core subproblem. It can be solved by QR factorization exactly as outlined for the full system when k = n. When βk+1 = 0 is the first zero element to occur, then the reduced problem has a separable form similar to (8.4.52). The core subproblem is

α1 β α ˆ ˆ 2 2 k k Bky1 = β1e1, Bk =  . .  R × . (8.4.53) .. .. ∈    βk αk    In the kth step the kth column in L is computed and the kth columns in U and V determined u = Q Q e , v = P P e , k 1 · · · k k k 1 · · · k k If αkβk+1 = 0, the problem decomposes and the reduction can be terminated Paige and Strakoˇs[471] have shown the following important properties of the core subproblem obtained by the bidiagonalization algorithm:

Theorem 8.4.3. Assume that the bidiagonalization of ( b A ) terminates prematurely with αk = 0 or βk+1 = 0. Then the core corresponding subproblem (8.4.52) or (8.4.53) is minimally dimensioned. Further, the singular values of the core matrix Bk or Bˆk, are simple and the right hand side βe1 has nonzero components in along each left singular vector.

Proof. Sketch: The minimal dimension is a consequence of the uniqueness of the decomposition (8.4.41), as long as no zero element in B appears. That the matrix Bˆk has simple singular values follows from the fact that all subdiagonal elements are nonzero. The same is true for the square bidiagonal matrix (Bk 0) and therefore also for Bk. Finally, if βe1 did not have nonzero components along a left singular vector, then the reduction must have terminated earlier. For a complete proof we refer to [471].)

We remark that the solution steps can be interleaved with the reduction to bidiagonal form. This makes it possible to compute a sequence of approximate solutions x = P P P y , where y Rk solves k 1 2 · · · k k k ∈ min β1e1 Bky 2, k = 1, 2, 3,.... (8.4.54) y k − k After each (double) step in the bidiagonalization we advance the QR decomposition of Bk. The norm of the least squares residual corresponding to xk is then given by b Ax = φ¯ . k − kk2 | k+1| The sequence of residual norms is nonincreasing. We stop and accept x = Vkyk as an approximate solution of the original least squares problem if this residual is sufficiently small. 8.4. Rank Deficient Problems 287

The PLS algorithm often requires a lower dimensional subspace than TSVD to achieve similar accuracy. A heuristic explanation for this observation is that the TSVD subspaces do not depend on b and are equally suitable for any right- hand side. The Krylov subspaces, on the other hand, are tailored to the particular right-hand side b of the problem; see Hanke [306]. The PLS method can also be implemented using a Lanczos-type algorithm for bidiagonalization. This algorithm, called LSQR, Paige and Saunders [470, 469] is suitable for solving sparse linear least squares. A number of important properties of the successive approximations xk in PLS are best discussed in connection with LSQR¡, see Algorithm11.3.4.

3 10

2 10

1 10

0 10

−1 10

−2 10

−3 10

−4 10 0 2 4 6 8 10 12 14 16 18

Figure 8.4.2. Relative error xk x 2/ x 2 and residual Axk b 2/ b 2 for PLS solutions after k steps. k − k k k k − k k k

Example 8.4.2. n n Consider the ill-posed linear system Ax = b, A R × in Example 8.4.1. ∈ Figure 8.4.2 shows the relative error xk x 2/ x 2 and residual Axk b 2/ b 2 after k steps of PLS. For this problemk the− resultsk k arek almost identicalk for− PLSk k andk TSVD. Note that in PLS only a partial bidiagonalization of A and generation of the matrix Vk is needed.

The PLS method can also be implemented using a Lanczos-type algorithm for bidiagonalization. This algorithm, called LSQR, is suitable for solving sparse linear least squares. A number of important properties of the successive approxi- mations xk in PLS are best discussed in connection with LSQR; see Section 10.7.4. The Householder algorithm described above is to be preferred for dense problems, because of its superior numerical stability. 288 Chapter 8. Linear Least Squares Problems

Review Questions 4.1 When and why should column pivoting be used in computing the QR factor- ization of a matrix? What inequalities will be satisfied by the elements of R if the standard column pivoting strategy is used? 4.2 Show that the singular values and condition number of R equal those of A. Give a simple lower bound for the condition number of A in terms of its diagonal elements. Is it advisable to use this bound when no column pivoting has been performed? 4.3 Give a simple lower bound for the condition number of A in terms of the diagonal elements of R. Is it advisable to use this bound when no column pivoting has been performed? 4.4 What is meant by a Rank-revealing QR factorization? Does such a factoriza- tion always exist? 4.5 How is the numerical rank of a matrix A defined? Give an example where the numerical rank is not well determined.

Problems 4.1 (a) Describe how the QR factorizations of a matrix of the form

A m n , A R × , µD ∈   n n where D R × is diagonal, can be computed using Householder transfor- mations in∈mn2 flops. (b) Estimate the number of flops that are needed for the reduction using Householder transformations in the special case that A = R is upper triangu- lar? Devise a method using Givens rotations for this special case! Hint: In the Givens method zero one diagonal at a time in R working from the main diagonal inwards.

4.2 Let the vector v, v 2 = 1, satisfy Av 2 = ǫ, and let Π be a permutation such that k k k k T wn = w , Π v = w. | | k k∞ (a) Show that if R is the R factor of AΠ, then r n1/2ǫ. | nn| ≤ Hint: Show that ǫ = Rw 2 rnnwn and then use the inequality wn = 1/2 k k ≥ | | | | w n− w 2. k k∞ ≥ k k (b) Show using (a) that if v = vn, the right singular vector corresponding to the smallest singular value σn(A), then

1/2 σ (A) n− r . n ≥ | nn| Problems 289

4.3 Consider a nonsingular 2 2 upper triangular matrix and its inverse × a b a 1 a 1bc 1 R = , R 1 = − − − . 0 c − 0 c 1    − 

(a) Suppose we want to choose Π to maximize the (1, 1) element in the QR factorization of RΠ. Show that this is achieved by taking Π = I if a 2 2 | | ≥ √b + c , else Π = Π12, where Π12 interchanges columns 1 and 2. (b) Unless b = 0 the permutation chosen in (a) may not minimize the (2,2) element in the QR factorization of RΠ. Show that this is achieved by taking 1 2 2 2 Π = I if c− a− + b (ac)− else Π = Π12. Hence, the test compares row norms| in |R ≥1 instead of column norms in R. − p 4.6 To minimize x 2 is not always a good way to resolve rank deficiency, and therefore the followingk k generalization of problem (8.4.5) is often useful: For a p n given matrix B R × consider the problem ∈ n min Bx 2, S = x R Ax b 2 = min . x S k k { ∈ | k − k } ∈ (a) Show that this problem is equivalent to

min (BC)x2 (Bd) 2, x2 k − k where C and d are defined by (8.4.4).

(b) Often one wants to choose B so that Bx 2 is a measure of the smoothness of the solution x. For example one can takek Bk to be a discrete approximation to the second derivative operator, 1 2 1 −1 2 1 (n 2) n B =  −. . .  R − × ...... ∈    1 2 1   −  Show that provided that (A) (B) = this problem has a unique solution, and give a basis for (BN). ∩N ∅ m n N 4.5 Let A R × with rank(A) = r. A rank revealing LU factorizations of the form ∈ L Π AΠ = 11 ( U U ) , 1 2 L 11 12  21  r r where Π and Π are permutation matrices and L ,U R × are triangular 1 2 11 11 ∈ and nonsingular can also be used to compute pseudo-inverse solutions x = A†b. Show, using Theorem 8.1.12 that

1 1 Ir † A = Π ( I S )† U − L− Π , † 2 r 11 11 T 1   1 1 where T = L21L11− , S = U11− U12. (Note that S is empty if r = n, and T empty if r = m.) 290 Chapter 8. Linear Least Squares Problems

4.6 Consider the block upper-bidiagonal matrix

B1 C1 A = B C  2 2  B3   Outline an algorithm for computing the QR factorization of A, which treats one block row at a time. (It can be assumed that A has full column rank.) Generalize the algorithm to an arbitrary number of block rows! 4.7 (a) Suppose that we have computed the pivoted QR factorization of A,

T R m n Q AΠ = R × , 0 ∈   m n of a matrix A R × . Show that by postmultiplying the upper triangular matrix R by a∈ sequence of Householder transformations we can transform R n n into a lower triangular matrix L = RP R × and that by combining these two factorizations we obtain ∈ L QT AΠP = . (8.4.55) 0   Comment: This factorization, introduced by G. W. Stewart, is equivalent to one step of the basic unshifted QR–SVD algorithm; see Section 10.4.1. (b) Show that the total cost for computing the QLP decomposition is roughly 2mn2 + 2n3/3 flops. How does that compare with the cost for computing the bidiagonal decomposition of A? (c) Show that the two factorizations can be interleaved. What is the cost for performing the first k steps? m n 4.8 Work out the details of an algorithm for transforming a matrix A R × to lower bidiagonal form. Consider both cases when m > n and m ∈n. ≤ Hint: It can be derived by applying the algorithm for transformation to upper bidiagonal form to AT . 4.9 Trefethen and Bau [575, pp. 237–238] have suggested a blend of the Golub– Kahan and Chan methods for bidiagonal reduction when n > m > 2n. They note that after k steps of the Golub–Kahan reduction the aspect ratio of the reduced matrix is (m k)/(n k). and thus increases with k. Show that to minimize− the total− operation count one should switch to the Chan algorithm when (m k)/(n k) = 2. What is the resulting operation count ? − −

8.5 Structured and Sparse Least Squares Problems 8.5.1 Banded Least Squares Problems We now consider orthogonalization methods for the special case when A is a banded matrix of row bandwidth w, see Definition 8.2.2. From Theorem 8.2.3 we know that 8.5. Structured and Sparse Least Squares Problems 291

ATA will be a symmetric with only the first r = w 1 super-diagonals nonzero. Since the factor R in the QR factorization equals the− unique Cholesky factor of ATA it will have only w nonzeros in each row. Even though the final factor R is independent of the row ordering in A, the intermediate fill will vary. For banded rectangular matrices the QR factorization can be obtained efficiently by sorting the rows of A and suitably subdividing the Householder transformations. The rows of A should be sorted by leading entry order (i.e., increasing minimum column subscript order) That is, if fi, i = 1 : m denotes the column indices of the first nonzero element in row i we should have, i k f f . ≤ ⇒ i ≤ k Such a band matrix can then be written as

A1 A2 A =  .  , q n, . ≤    Aq    is said to be in standard form. where in block Ai the first nonzero element of each row is in column i. The Householder QR process is then applied to the matrix in q major steps. In the first step a QR factorization of the first block A1 is computed, yielding R1. Next at step k, k = 2 : q, Rk 1 will be merged with Ak yielding − T Rk 1 Q − = R . k A k  k  Since the rows of block Ak has their first nonzero elements in column k, the first k 1 rows of Rk 1 will not be affected. The matrix Q can be implicitly represented in− terms of the− Householder vectors of the factorization of the subblocks. This sequential Householder algorithm, which is also described in [399, Ch. 27], requires (m + 3n/2)w(w + 1) multiplications or about twice the work of the less stable Cholesky approach. For a detailed description of this algorithm, see Lawson and Hanson [399, Ch. 11]. In Volume I, Section 4.4.4 we considered the interpolation of a function f with a linear combination of m + k cubic B-splines k on the grid ∆ = x

xj k 1 <τj

Example 8.5.1. The least squares approximation of a discrete set of data by a linear combi- nation of cubic B-splines gives rise to a banded linear least squares problem. Let

n s(t) = xjBj(t), j=1 X where Bj(t), j = 1 : n are the normalized cubic B-splines, and let (yi,ti), i = 1 : m be given data points. If we determine x to minimize

m (s(t ) y )2 = Ax y 2, i − i k − k2 i=1 X then A will be a banded matrix with w = 4. In particular, if m = 13, n = 8 the T matrix may have the form shown in Figure 8.5.1. Here A consists of blocks Ak , k = 1 : 7. In the Figure 8.5.1 we also show the matrix after the first three blocks have been reduced by Householder transformations P1,...,P9. Elements which have been zeroed by Pj are denoted by j and fill elements by +. In step k = 4 only the indicated part of the matrix is involved.

× × × × 0 1 × × × + 1 B 1 2 × × + + C B C B 3 4 × × + C B C B 3 4 5 × + C B C B 6 7 8 × C B C B 6 7 8 9 C B C B 6 7 8 9 C B C B × × × × C B C B × × × × C B C B × × × × C B C B × × × × C B C @ × × × × A

Figure 8.5.1. The QR factorization of a banded rectangular matrix A.

In the algorithm the Householder transformations can also be applied to one or several right hand sides b to produce

T c1 n c = Q b = , c1 R . c2 ∈   The least squares solution is then obtained from Rx = c1 by back-substitution. The vector c2 is not stored but used to accumulate the residual sum of squares 2 2 r 2 = c2 2. k k Itk isk also possible to perform the QR factorization by treating one row at a time using Givens’ rotations. Each step then is equivalent to updating a full 8.5. Structured and Sparse Least Squares Problems 293

triangular matrix formed by columns fi(A) to li(A). Further, if the matrix A is in standard form the first fi(A) rows of R are already finished at this stage. The reader is encouraged to work through Example 8.5.1 below in order to understand how the algorithm proceeds! A special form of banded system that occurs in many applications is the block-bidiagonal form,

A B 1 1 x1 b1 A2 B2 x b   2 2 ..  x   .  A3 . 3 = . , (8.5.3)   . ..  .   b   . Bn 1     n 1   −  −    xn   bn   An            Assume that by orthogonal transformations the k first block rows have been reduced to the upper triangular form

R1 S1 R2 S2  . .  .. .. . (8.5.4)    Rk 1 Sk 1   − ¯−   Rk    The last diagonal block here is the only one to be altered in the next step. In step k + 1 the blocks Ak+1 and Bk+1 are added to the matrix together with observations bk+1. We can now choose an orthogonal transformations that performs the reduction to upper triangular form

¯ Rk Sk T Rk 0 ¯ Qk = Rk+1 . Ak+1 Bk+1     0 0   The above structure arises in a least squares formulation of the Kalman filter, which is a popular method used to estimate the behavior of certain linear dynamic systems; see Paige and Saunders [466] for details.

8.5.2 Blocked Form of QR Factorization In many least squares problems the unknowns x can be naturally partitioned into two groups with n1 and n2 components, respectively. Then the problem has the form x1 min ( A1 A2 ) b , (8.5.5) x1,x2 x2 − 2   m n where A = ( A1 A2 ) R × and n = n1+n2. For example, in separable nonlinear least squares problems∈ subproblems of the form (8.5.5) arise, see Section 9.2.4.

Assume that the matrix A has full column rank and let P (A1) be the orthog- onal projection onto (A ). For any x we can split the vectorRb A x = r + r R 1 2 − 2 2 1 2 294 Chapter 8. Linear Least Squares Problems into two orthogonal components

r1 = P (A )(b A2x2), r2 = (I P (A ))(b A2x2). R 1 − − R 1 − Then the problem (8.5.5) takes the form

min (A1x1 r1) P (AT )(b A2x2) . (8.5.6) x1,x2 − − N 1 − 2

Here, since r1 (A1) the variables x1 can always be chosen so that A1x1 r1 = 0. ∈ R − It follows that in the least squares solution to (8.5.5) x2 is the solution to the reduced least squares problem min P (AT )(A2x2 b) 2. (8.5.7) x2 k N 1 − k where the variables x1 have been eliminated. When this reduced problem has been solved for x2 then x1 can be computed from the least squares problem

min A1x1 (b A2x2) 2. (8.5.8) x1 k − − k Sometimes it may be advantageous to carry out a partial QR factorization, where only the k = n1 < n columns of A are reduced. Using Householder QR

R A b QT (A b) = 11 12 1 , 1 0 A b  22 2  e e with R11 nonsingular. Then x2 is the solutione to thee reduced least squares problem

min b2 A22)x2 2, x2 k − k If this is again solved by Householdere QRe nothing new comes out—we have only emphasized a new aspect of this algorithm. However, it may be advantageous to use another method for the reduced problem. In some applications, e.g., when A has block angular structure (see Sec- tion 8.5.3), it may be preferable instead not to save R11 and R12 and instead to refactorize A1 and solve (8.5.8) for x1. If we use perform n1 steps of MGS on the full problem, this yields the partial factorization R11 R12 z1 (A b) = Q A b 0 I 0 . 1 2   0 0 1  e e   where R11 is nonsingular. The residual is decomposed as as r = r1 + r2, r1 r2, where ⊥ r = Q (z R x R x ), r = b A x . 1 1 1 − 12 2 − 11 1 2 1 − 2 2 Then x is the solution to the reduced least squares problem min b Ax . With 2 e e x2 2 2 x known x can been computed by back-substitution from R x k=−z kR x . 2 1 11 1 k − 12 2 To obtain near-peak performance for large dense matrix computationse e on cur- rent computing architectures requires code that is dominated by matrix-matrix 8.5. Structured and Sparse Least Squares Problems 295 operations since these involve less data movement per floating point computation. The QR factorization should therefore be organized in partitioned or blocked form, where the operations have been reordered and grouped into matrix operations. m n For the QR factorization A R × (m n) is partitioned as ∈ ≥ m nb A = (A , A ), A R × , (8.5.9) 1 2 1 ∈ where nb is a suitable block size and the QR factorization R QT A = 1 , Q = P P P , (8.5.10) 1 1 0 1 1 2 · · · nb   is computed, where P = I u uT are Householder reflections. Then the remaining i − i i columns A2 are are updated A R QT A = QT 12 = 12 . (8.5.11) 1 2 1 A A  22   22  In the next step we partition A22 = (B1,B2), and computee the QR factorization of (m r) r B R − × . Then B is updated as above, and we continue in this way until 1 ∈ 2 the columns in A are exhausted.e A major part of the computation in spent in the updating step (8.5.11). As written this step cannot use BLAS-3, which slows down the execution. To achieve better performance it is essential that this part is sped up. The solution is to aggre- gate the Householder transformations so that their application can be expressed as matrix operations. For use in the next subsection, we give a slightly more general result.

Lemma 8.5.1. Let P1,P2,...,Pr be a sequence of Householder transformations. Set r = r1 + r2, and assume that Q = P P = I Y T Y T , Q = P P = I Y T Y T , 1 1 · · · r1 − 1 1 1 2 r1+1 · · · r − 2 2 2 r r where T1,T2 R × are upper triangular matrices. Then for the product Q1Q2 we have ∈ Q = Q Q = (I Y T Y T )(I Y T Y T ) = (I YTY T ) (8.5.12) 1 2 − 1 1 1 − 2 2 2 − where T (T Y T )(Y T ) Yˆ = (Y , Y ), Tˆ = 1 1 1 2 2 . (8.5.13) 1 2 0 − T  2  Note that Y is formed by concatenation, but computing the off-diagonal block in T requires extra operations.

For the partitioned algorithm we use the special case when r2 = 1 to aggregate the Householder transformations for each processed block. Starting with Q1 = I τ u uT , we set Y = u , T = τ and update − 1 1 1 1 1 T τ TY T u Y := (Y, u ), T := k k , k = 2 : nb. (8.5.14) k+1 0 − τ  k  296 Chapter 8. Linear Least Squares Problems

Note that Y will have a trapezoidal form and thus the matrices Y and R can overwrite the matrix A. With the representation Q = (I YTY T ) the updating of − A2 becomes B = QT A = (I YT T Y T )A = A YT T Y T A , 1 − 2 2 − 2 which now involves only matrix operations. This partitioned algorithm requires more storage and operations than the point algorithm, namely those needed to produce and store the T matrices. However, for large matrices this is more than offset by the increased rate of execution. As mentioned in Chapter 7 recursive algorithms can be developed into highly efficient algorithms for high performance computers and are an alternative to the currently more used partitioned algorithms by LAPACK. The reason for this is that recursion leads to automatic variable blocking that dynamically adjusts to an arbitrary number of levels of memory hierarchy. Consider the partitioned QR factorization

R R A = ( A A ) = Q 11 12 1 2 0 R  22  where Let A consist of the first n/2 columns of A. To develop a recursive 1 ⌊ ⌋ algorithm we start with a QR factorization of A1 and update the remaining part A2 of the matrix,

R A R QT A = 11 , QT A = QT 12 = 12 . 1 1 0 1 2 1 A A    22   22 

Next A22 is recursively QR decomposed giving Q2, R22, and Qe= Q1Q2. As an illustration we give below a simple implementation in Matlab, which is conveniente to use since it allows for the definition of recursive functions. function [Y,T,R] = recqr(A) % % RECQR computes the QR factorization of the m by n matrix A, % (m >= n). Output is the n by n triangular factor R, and % Q = (I - YTY’) represented in aggregated form, where Y is % m by n and unit lower trapezoidal, and T is n by n upper % triangular [m,n] = size(A); if n == 1 [Y,T,R] = house(A); else n1 = floor(n/2); n2 = n - n1; j = n1+1; [Y1,T1,R1]= recqr(A(1:m,1:n1)); B = A(1:m,j:n) - (Y1*T1’)*(Y1’*A(1:m,j:n)); [Y2,T2,R2] = recqr(B(j:m,1:n2)); 8.5. Structured and Sparse Least Squares Problems 297

R = [R1, B(1:n1,1:n2); zeros(n-n1,n1), R2]; Y2 = [zeros(n1,n2); Y2]; Y = [Y1, Y2]; T = [T1, -T1*(Y1’*Y2)*T2; zeros(n2,n1), T2]; end %

The algorithm uses the function house(a) to compute a Householder transformation T P = I +τ uu such that P a = σ e1, σ = sign (a1) a 2. A serious defect of this algorithm− is the overhead in storage and operations− causedk k by the T matrices. In the partitioned algorithm n/nb T -matrices of size we formed and stored, giving a 1 storage overhead of 2 n nb. In the recursive QR algorithm in the end a T -matrix of size n n is formed and· stored, leading to a much too large storage and operation overhead.× Therefore, a better solution is to use a hybrid between the partitioned and the recursive algorithm, where the recursive QR algorithm is used to factorize the blocks in the partitioned algorithm.

8.5.3 Block Angular Least Squares Problems

There is often a substantial similarity in the structure of many large scale sparse least squares problems. In particular, the problem can often be put in the following bordered block diagonal or block angular form:

x1 A1 B1 b1 x2 A2 B2  .  b2 A =   , x = . , b = , (8.5.15) .. . .  .  . .   .    xM     AM BM     bM     xM+1        where

mi ni mi nM A R × , B R × +1 , i = 1 : M, i ∈ i ∈ and m = m + m + + m , n = n + n + + n . 1 2 · · · M 1 2 · · · M+1

Note that the variables x1,...,xM are coupled only to the variables xM+1, which reflects a “local connection” structure in the underlying physical problem. Appli- cations where the form (8.5.15) arises naturally include photogrammetry, Doppler radar and GPS positioning, and geodetic survey problems. The block matrices Ai, Bi i = 1 : M, may also have some structure that can be taken advantage of, but in the following we ignore this; see Cox [122]. The of (8.5.15) has a doubly bordered block diagonal, or ar- 298 Chapter 8. Linear Least Squares Problems rowhead, form,

T T A1A1 A1B1 T T  A2A2 A2B2  T . . A A =  .. .  ,    T T   AMAM AMBM   T T T   B A1 B A2 B AM C   1 2 · · · M    where M T C = Bk Bk. kX=1 We assume in the following that rank (A) = n, which implies that the Cholesky factor R of ATA is nonsingular. It will have a block structure similar to that of A,

R1 S1 R2 S2  . .  R = .. . , (8.5.16)    R S   M M   R .   M+1    Identifying the blocks in the relation RT R = ATA we get

T T T T Ri Ri = Ai Ai, Ri Si = Ai Bi, i = 1 : M, M T T RM+1RM+1 + Si Si. = C. i=1 X ni ni T Hence, R R × , i = 1 : M, are the Cholesky factors of A A and i ∈ i i T T Si = Ri− (Ai Bi), i = 1 : M.

Finally, RM+1 is obtained as the Cholesky factor of

M C = C ST S . − i i i=1 X e The matrix R in (8.5.16) can also be computed by a sequence of QR factoriza- tions. The following algorithm solves least squares problems of block angular form based using QR factorization. It proceeds in three steps:

1. Initialize an upper triangular RM+1 of dimension nM+1, a vector cM+1 and a scalar ρ to zero. 2. For i = 1 : M

(a) Reduce the blocks (Ai,Bi) and the right-hand side bi by orthogonal transfor- mations, yielding 8.5. Structured and Sparse Least Squares Problems 299

R S c QT (A , B ) = i i , QT b = i . i i i 0 T i i d  i   i  where Ri is upper triangular. (b) Apply orthogonal transformations to update

R c R c M+1 M+1 to yield M+1 M+1 T d 0 e  i i   i 

(c) Update the residual norm ρ = (ρ2 + e 2)1/2. k ik2 3. Compute xM+1 from the triangular system

RM+1xM+1 = cM+1,

4. For i = 1 : M compute xi by back-substitution in the triangular systems

R x = c S x . i i i − i M+1 In steps 2 (a) and 4 the computations can be performed in parallel on the M subsystems. There are alternative ways to organize this algorithm. When xM+1 has been computed, then xi, solves the least squares problem

min Aixi gi 2, gi = bi BixM+1, i = 1 : M. xi k − k −

Hence, it is possible to discard the Ri,Si and ci in step 1 if the QR factorizations of Ai are recomputed in step 3. In some practical problems this modification can reduce the storage requirement by an order of magnitude, while the recomputation of Ri may only increase the operation count by a few percent. Using the structure of the R-factor in (8.5.15), the diagonal blocks of the T 1 1 T variance-covariance matrix C = (R R)− = R− R− can be written

1 T CM+1,M+1 = RM− +1RM− +1, 1 T T T 1 Ci,i = Ri− (I + Wi Wi)Ri− , Wi = SiRM− +1, i = 1,...,M. (8.5.17) If we compute the QR factorizations

W U Q i = i , i = 1,...,M, i I 0     T T we have I + Wi Wi = Ui Ui and then

T T T Ci,i = (UiRi− ) (UiRi− ), i = 1,...,M.

This assumes that all the matrices Ri and Si have been retained. A procedure, called substructuring or dissection can often be used for ob- taining a block angular form of a problem. The idea is similar but preceeded the 300 Chapter 8. Linear Least Squares Problems

A1 A3 A1 B A2 CB D A2 A4

Figure 8.5.2. One and two levels of dissection of a region. nested dissection method presented in Section 7.7.5. Such a technique dates back to Helmert [316, ], who used it for breaking down geodetic problems into ge- ographically defined subproblems. Consider a geodetic position network consisting of geodetic stations connected through observations. To each station corresponds a set of unknown coordinates to be determined. The idea is to choose a set of stations , which separates the other stations into two regional blocks and so that B A1 A2 station variables in 1 are not connected by observations to station variables in . The variables areA then ordered so that those in appear first, those in A2 A1 A2 second, and those in last. The equations are ordered so that those including 1 come first, those includingB next, and those only involving variables in comeA A2 B last. The dissection can be continued by dissecting the regions 1 and 2 each into two subregions, and so on in a recursive fashion. The blockingA of the regionA for one and two levels of dissection is pictured in Figure 8.5.1. The corresponding block structure induced induced in the matrix are

A1 C1 B1 A B A C B A = 1 1 , A = 2 2 2 , A B  A D B   2 2  3 3 3  A D B   4 4 4    The block of rows corresponding to i, i = 1, 2,..., can be processed independently. The remaining variables are then eliminated,A etc.; compare the block angular struc- ture in (8.5.15). There is a finer structure in A not shown. For example, in one level of dissection most of the equations involve variables in 1 or 2 only, but not in . A A B It is advantageous to perform the dissection in such a way that in each stage of the dissection the number of variables in the two partitions are roughly the same. Also, the number of variables in the separator nodes should be as small as possible. If each dissection is done so that the variables contained in the two partitions are at least halved, then after at most log2 n levels each partition contains only one variable. Of course, it is usually preferable to stop before this point.

8.5.4 Block Triangular Form In Section 7.7.7 we showed that before solving a sparse linear system Ax = b it was advantageous to permute the matrix into block triangular form. A similar m n preprocessing can be done for an arbitrary rectangular matrix A R × , m n. By row and column permutations the matrix can be brought∈ into the block≥ 8.5. Structured and Sparse Least Squares Problems 301 triangular form Ah Uhs Uhv P AQ = A U . (8.5.18)  s sv  Av   Here the diagonal block Ah is underdetermined (i.e., has more columns than rows), As is square and Av is overdetermined (has more rows than columns), and all three blocks have a nonzero diagonal; see the example in Figure 8.5.2. The submatrices T Av and Ah both have the strong Hall property. The off-diagonal blocks are possibly nonzero matrices of appropriate dimensions. The block decomposition (8.5.18) of A is called the coarse decomposition of A. One or two of the diagonal blocks may be absent in the coarse decomposition. It may be possible to further decompose the

××⊗×× × ⊗ × × × × × × ⊗ ⊗ × × × ⊗ × ⊗ × × ⊗ × ⊗ × ⊗ × × × ×

Figure 8.5.3. The coarse block triangular decomposition of A. diagonal blocks in (8.5.18) to obtain the fine decompositions of these submatrices. Each of the blocks Ah and Av may be further decomposable into block diagonal form, Ah1 Av1 . . Ah = .. , Av = .. ,     Ahp Avq     where each Ah1,...,Ahp is underdetermined and each Av1,...,Avq is overdeter- mined. The square submatrix As may be decomposable in block upper triangular form As1 U12 ... U1,t As2 ... U2,t As =  . .  (8.5.19) .. .    Ast    where the square diagonal blocks As1,...,Ast have nonzero diagonal elements. The resulting Dulmage–Mendelsohn decomposition can be shown to be essentially 302 Chapter 8. Linear Least Squares Problems unique. Any one block triangular form can be obtained from any other by applying row permutations that involve the rows of a single block row, column permutations that involve the columns of a single block column, and symmetric permutations that reorder the blocks. An algorithm for the more general block triangular form described above due to Pothen and Fan [490], uses the concept of m atchings in bipartite graphs. The algorithm consists of the following steps:

1. Find a maximum matching in the bipartite graph G(A) with row set R and column set C.

2. According to the matching, partition R into the sets VR,SR,HR and C into the sets VC,SC,HC corresponding to the horizontal, square, and vertical blocks.

3. Find the diagonal blocks of the submatrix Av and Ah from the connected components in the subgraphs G(Av) and G(Ah). Find the block upper trian- gular form of the submatrix As from the strongly connected components in the associated directed subgraph G(As), with edges directed from columns to rows.

The reordering to block triangular form in a preprocessing phase can save work and intermediate storage in solving least squares problems. If A has structural rank equal to n, then the first block row in (8.5.18) must be empty, and the original least squares problem can after reordering be solved by a form of block back-substitution. First compute the solution of

min Avxv bv 2, (8.5.20) xev k − k e e where x = QT x and b = P b have been partitioned conformally with P AQ in (8.5.18). The remaining part of the solution xk,..., x1 is then determined by e e k e e A x = b U x , i = k : 1 : 1. (8.5.21) si i i − ij j − j=i+1 X e e e Finally, we have x = Qx. We can solve the subproblems in (8.5.20) and (8.5.21) by computing the QR factorizations of Av and As,i, i = 1 : k. Since As1,...,Ask and Av have the strong Halle property, the structures of the matrices Ri are correctly predicted by the structures of the corresponding normal matrices. If the matrix A has structural rank less than n, then we have an underdeter- mined block Ah. In this case we can still obtain the form (8.5.19) with a square block A11 by permuting the extra columns in the first block to the end. The least squares solution is then not unique, but a unique solution of minimum length can be found as outlined in Section 2.7. 8.5. Structured and Sparse Least Squares Problems 303

8.5.5 Sparse Least Squares Problems For well-conditioned least squares problems solving the normal equations ATAx = AT b (see Section 8.2) can give quite sufficiently accurate results. In this approach the matrix C = ATA is formed and its Cholesky factorization C = LLT computed. The least squares solution is then obtained by solving two triangular systems L(LT x) = AT b. Much of the well developed techniques for solving sparse symmetric positive definite systems can be used; see Section 7.7.5. Let the rows and columns of A be reordered as B = QAP , where P and Q are permutation matrices. Since QT Q = I, it follows that BT B = P T AT QT QAP = (AP )T AP. This shows that a reordering of the rows will not affect the Cholesky factor L. The first step in computing the Cholesky factorization of a C the is a symbolic analyze phase. In this, using the nonzero structure of C, a fill reducing column permutation P of of C is determined. Simultaneously a storage scheme for the Cholesky factor of PCP T is set up. To compute the structure of the T T T symmetric matrix A A, we partition A by rows. If we let ai = ei A be the ith row of A, then m T T A A = aiai , (8.5.22) i=1 X This expresses ATA as the sum of m rank one matrices. Invoking the no-cancellation assumption this shows that the nonzero structure of ATA is the direct sum of the T T structures of aiai , i = 1 : m. Note that the nonzeros in any row ai will generate a a full submatrix in ATA. In the graph G(ATA) this corresponds to asubgraph where all pairs of nodes are connected. Such a subgraph is called a clique. Also note that the structure of ATA is not changed by dropping any row of A whose nonzero structure is a subset of another row. This observation can often speed up the algorithm considerably. There is an efficient algorithm due to George and Ng [244] to perform the symbolic factorization of ATA directly using the structure of A. This removes the step of determining the structure of ATA. For ill-conditioned or stiff problems methods based on the QR factorization should be preferred. As is well known, the factor R in the QR factorization of A is mathematically equivalent to the upper triangular Cholesky factor LT of ATA. In particular, the nonzero structure must be the same. However, this approach may be too generous in allocating space for nonzeros in R. To see this, consider a matrix with the structure

× × × × × ×  ×  × (8.5.23)    ×     ×     ×  304 Chapter 8. Linear Least Squares Problems

For this matrix R = A since A is already upper triangular. But, since ATA is full the algorithm above will predict R to be full. This overestimate occurs because we start with the structure of ATA, and not with the structure of A. The elements in ATA are not independent, and cancellation can occur in the Cholesky factorization irrespective of the numerical values of the nonzero elements in A. We call this structural cancellation,in contrast to numerical cancellation, which occurs only for certain values of the nonzero elements in A. Another approach to predicting the structure of R is to perform the Givens or Householder algorithm symbolically working on the structure of A. It can be shown that the structure of R as predicted by symbolic factorization of ATA includes the structure of R as predicted by the symbolic Givens method, which includes the structure of R.

Definition 8.5.2. m n A matrix A R × , m n, is said to have the strong Hall property if for every subset of k ∈columns, 0 n, every subset of k n has the required property, and when m = n, every subset of k < n columns has≤ the property.)

For matrices with strong Hall propery it can be proved that structural can- cellation will not occur. Then the structure of ATA will correctly predict that of R, excluding numerical cancellations. (If A is orthogonal then ATA = I is sparse, but this is caused by numerical cancellation.) Obviously the matrix in (8.5.23) does not have the strong Hall property since the first column has only one nonzero element. However, the matrix with the nonzero structure

× × × × ×  ×  × (8.5.24)    ×     ×     ×  obtained by deleting the first column has this property. The next step before performing the numerical phase of the sparse QR factor- ization is to find a suitable row permutation Pr. Since

T T T T (PrA) (PrA) = A (Pr Pr)A = A A, it follows that the resulting factor R is independent of the row ordering. However, the number of intermediate fill and the operation count will depend on the row ordering. This fact was stressed already in the discussion of QR factorization of banded matrices; see Section 8.5.1. Provided the rows of A do not have widely dif- fering norms, a reordering of the rows will not affect the numerical stability. Hence, the ordering can be chosen based on sparsity consideration only. The following heuristic row ordering algorithm is an extension of that recommended for banded sparse matrices. 8.5. Structured and Sparse Least Squares Problems 305

Algorithm 8.8. Row Ordering Algorithm. Denote the column index for the first and last nonzero elements in the ith row of A by fi(A) and li(A), respectively. First sort the rows after increasing fi(A), so that f (A) f (A) if i < k. Then for each group of rows with f (A) = k, i ≤ k i k = 1,..., maxi fi(A), sort all the rows after increasing li(A).

An alternative row ordering has been found to work well is obtained by order- ing the rows after increasing values of li(A). With this ordering only the columns T fi(A) to li(A) of Ri 1 when row ai is being processed will be involved, since all − the previous rows only have nonzeros in columns up to at most li(A). Hence, Ri 1 T − will have zeros in column li+1(A),...,n, and no fill will be generated in row ai in these columns. We now discuss the numerical phase of sparse QR factorization. For dense problems the most effective serial method for computing is to use a sequence of (1) (k+1) (k) Householder reflections. In this we put A = A, and compute A = PkA , k = 1 : n, where Pk is chosen to annihilate the subdiagonal elements in the kth column of A(k). In the sparse case this method will cause each column in the remaining unreduced part of the matrix, which has a nonzero inner product with the column being reduced, to take on the sparsity pattern of their union. Hence, even though the final R may be sparse, a lot of intermediate fill may take place with consequent cost in operations and storage. However, as shown in Section 8.5.1, the Householder method can be modified to work efficiently for sparse banded problems, by applying the Householder reductions to a sequence of small dense subproblems. The problem of intermediate fill in the factorization can be avoided by using instead a row sequential QR algorithm employing Givens rotations. Initialize R0 to bhave the structure of the final factor R with all elements equal to zero. The T rows ak of A are processed sequentially, k = 1 : m, and we denote by Rk 1 the upper triangular matrix obtained after processing the first (k 1) rows. Let− the kth row be − T ak = (ak1, ak2,...,akn). This is processed as follows: we uncompress this row into a full vector and scan the nonzeros from left to right. For each a = 0 a Givens rotation involving row j in kj 6 Rk 1 is used to annihilate akj. This may create new nonzeros both in Rk 1 and in − T T − the row ak . We continue until the whole row ak has been annihilated. Note that if rjj = 0, this means that this row in Rk 1 has not yet been touched by any rotation and hence the entire jth row must be zero.− When this occurs the remaining part of row k is inserted as the jth row in R. To illustrate this algorithm we use an example taken from George and Ng [242]. Assume that the first k rows of A have been processed to generate R(k). (k 1) In Figure 8.5.3 nonzero elements of R − are denoted by , nonzero elements (k) T T × introduced into R and ak during the elimination of ak are denoted by +; all the T elements involved in the elimination of ak are circled. Nonzero elements created in T ak during the elimination are of course ultimately annihilated. The sequence of row indices involved in the elimination are 2, 4, 5, 7, 8 , where 2 is the column index of T { } the first nonzero in ak . 306 Chapter 8. Linear Least Squares Problems

0 0 0 0 0 0 0 × ×0 ×00000  ⊗ ⊕0 ⊗ 0 0 0 0  × × ×  0 0 0 0   ⊗ ⊕ ⊗   0 0 0 0  Rk 1  ⊗ ⊕  − =  0 0 0  aT  × ×   k   0 0   ⊗ ⊗   0 0   ⊗     × ×   0   ×     0 0 0 0 0   ⊗ ⊗ ⊕ ⊕ ⊕ 

Figure 8.5.4. Circled elements in Rk 1 are involved in the elimination of aT ; fill elements are denoted by . ⊗ − k ⊕

Note that unlike the Householder method intermediate fill now only takes place in the row that is being processed. It can be shown that if the structure of T R has been predicted from that of A A, then any intermediate matrix Ri 1 will fit into the predicted structure. − For simplicity we have not included the right-hand side in Figure 8.5.3, but the Givens rotations should be applied simultaneously to b to form QT b. In the implementation by George and Heath [236] the Givens rotations are not stored but discarded after use. Hence, only enough storage to hold the final R and a few extra vectors for the current row and right-hand side(s) is needed in main memory. The orthogonal factor Q is often much less sparse than R. For example, in a grid problem with n unknowns it is known that the number of nonzero elements in R is of the order O(n log n), whereas the number of nonzeros in Q is O(n3/2; see Gilbert, Ng and Peyton [248]. In general, one should if possible avoid computing Q explicitly. If Q is required, then the Givens rotations could be saved separately. This in general requires far less storage and fewer operations than computing and storing Q itself. However, discarding Q creates a problem if we later wish to solve additional problems having the same matrix A but a different right-hand side b, since we cannot form QT b. One could recompute the factorization, but in most cases a satisfactory method to deal with this problem is to use the corrected seminormal equations; see [61, Sec. 6.6.5].

Example 8.5.2 (T. Elfving and I. Skoglund [191]).

To illustrate the effect of different column orderings we consider a model used for substance transport in rivers. In this model time series data L_{ij}, i = 1 : n, j = 1 : m, are observed. Here n is the length of the study period expressed in years and m is the number of samples from each year. The unknown parameters are split in two sets x = (x_1, x_2)^T and a regularization matrix is added. Figures 8.5.5 and 8.5.6 show the location of the nonzero elements in the matrix AΠ and in its R-factor after two different column orderings available in Matlab have been applied.

[Spy plots: nnz(A) = 3090, nnz(R) = 32355.]

Figure 8.5.5. Nonzero pattern of A and its R factor after reordering using Matlab's colperm.

The first ordering (colperm) reorders the columns according to increasing count of nonzero entries. The second (colamd in Matlab) is an approximate minimum degree ordering on A^T A; see Section 7.7.6.
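A small Matlab sketch of this kind of experiment is given below; A is assumed to be a sparse matrix. For sparse input, qr with a single output returns only the (Q-less) R factor.

```matlab
p1 = colperm(A);  R1 = qr(A(:, p1));   % columns by increasing nonzero count
p2 = colamd(A);   R2 = qr(A(:, p2));   % approximate minimum degree on A'*A
[nnz(A), nnz(R1), nnz(R2)]             % compare the fill in the two R factors
spy(R2)                                % nonzero pattern, as in Figure 8.5.6
```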

8.5.6 Multifrontal QR Decomposition

A significant advance in direct methods for sparse matrix factorization is the multifrontal method by Duff and Reid [179]. This method reorganizes the factorization of a sparse matrix into a sequence of partial factorizations of small dense matrices, and is well suited for parallelism. A multifrontal algorithm for the QR factorization was first developed by Liu [412]. Liu generalized the row-oriented scheme of George and Heath by using submatrix rotations and remarked that this scheme is essentially equivalent to a multifrontal method. He showed that his algorithm can give a significant reduction in QR decomposition time at a modest increase in working storage. George and Liu [240] presented a modified version of Liu's algorithm which uses Householder transformations instead of Givens rotations.

There are several advantages with the multifrontal approach. The solution of the dense subproblems can be handled more efficiently by vector machines. Also, it leads to independent subproblems which can be solved in parallel. The good data locality of the multifrontal method gives fewer page faults on paging systems, and out-of-core versions can be developed.

[Spy plots: nnz(A) = 3090, nnz(R) = 15903.]

Figure 8.5.6. Nonzero pattern of A and its R factor after reordering using Matlab's colamd.

Multifrontal methods for sparse QR decomposition have been studied extensively.

[Sparsity pattern of the 12 × 9 matrix A: four block rows of three rows each, with four nonzeros in every row.]

Figure 8.5.7. A matrix A corresponding to a 3 × 3 mesh.

We first describe the multiple front idea on the small 12 × 9 example in Figure 8.5.7, adopted from Liu [412]. This matrix arises from a 3 × 3 mesh problem using a nested dissection ordering, and its graph G(A^T A) is given in Figure 8.5.8. We first perform a QR decomposition of rows 1–3. Since these rows have nonzeros only in columns {1, 5, 7, 8} this operation can be carried out as a QR decomposition of a small dense matrix of size 3 × 4 by leaving out the zero columns. The first row equals the first row of the final R of the complete matrix and can be stored away.

1 5 2

7 8 9

3 6 4

Figure 8.5.8. The graph G(A^T A) and a nested dissection ordering.

The remaining two rows form an update matrix F_1 and will be processed later. The other three block rows 4–6, 7–9, and 10–12 can be reduced in a similar way. Moreover, these tasks are independent and can be done in parallel. After this first stage the matrix has the form shown in Figure 8.5.9. The first row in each of the four blocks is a final row in R and can be removed, which leaves four upper trapezoidal update matrices, F_1–F_4.

[Sparsity pattern of the reduced 12 × 9 matrix: four upper trapezoidal blocks of three rows each, with 4, 3, and 2 nonzeros in their successive rows.]

Figure 8.5.9. The reduced matrix after the first elimination stage.

In the second stage we can simultaneously merge F_1, F_2 and F_3, F_4 into two upper trapezoidal matrices by eliminating columns 5 and 6. In merging F_1 and F_2 we need to consider only the set of columns {5, 7, 8, 9}. We first reorder the rows after the index of the first nonzero element, and then perform a QR decomposition:

Q^T \begin{pmatrix} \times & \times & \times &        \\ \times &        & \times & \times \\        & \times & \times &        \\        &        & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times & \times \\        & \times & \times & \times \\        &        & \times & \times \\        &        &        & \times \end{pmatrix}.

The merging of F_3 and F_4 is performed similarly. Again, the first row in each reduced matrix is a final row in R, and is removed. In the final stage we merge the remaining two upper trapezoidal (in this example triangular) matrices and produce the final factor R. This corresponds to eliminating columns 7, 8, and 9.

The scheme described here can also be viewed as a special type of variable row pivoting method. This pivoting scheme has never become very popular because of the difficulty of generating good orderings for the rotations and because these schemes are complicated to implement. Also, the dynamic storage structure needed tends to reduce the efficiency.

The organization of the multifrontal method is based on the elimination tree, and the nodes in the tree are visited in the order given by the node ordering. Each node x_j in the tree is associated with a frontal matrix F_j, which consists of the set of rows A_j in A with the first nonzero in location j, together with one update matrix contributed by each child node of x_j. After eliminating the variable j in the frontal matrix, the first row in the reduced matrix is the jth row of the upper triangular factor R. The remaining rows form a new update matrix U_j, which is stored in a stack until needed. Hence, a formal description of the method is as follows.

Algorithm 8.9. Multifrontal Sparse QR Algorithm. For j := 1 to n do

1. Form the frontal matrix F_j by combining the set of rows A_j and the update matrix U_s for each child x_s of the node x_j in the elimination tree T(A^T A);

2. By an orthogonal transformation, eliminate variable x_j in F_j to get Ū_j. Remove the first row in Ū_j, which is the jth row in the final matrix R. The rest of the matrix is the update matrix U_j;
end.
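The following Matlab function is a minimal dense sketch of how Algorithm 8.9 organizes the computation; it is not a practical sparse code, since every frontal and update matrix is carried with all n columns. It assumes A has full column rank and no zero rows, and it uses Matlab's etree for the elimination tree of A^T A.

```matlab
function R = multifrontal_qr_sketch(A)
  A = full(A);  [m, n] = size(A);
  parent = etree(sparse(A)' * sparse(A));    % elimination tree of A'*A
  R = zeros(n);  U = cell(1, n);             % U{j}: update matrix of node j
  first = zeros(m, 1);
  for i = 1:m, first(i) = find(A(i, :), 1); end
  for j = 1:n
     Fj = A(first == j, :);                  % rows of A starting in column j
     for s = find(parent == j)               % children of node j in the tree
        Fj = [Fj; U{s}];                     %#ok<AGROW>
     end
     if isempty(Fj), U{j} = zeros(0, n); continue; end
     T = triu(qr(Fj, 0));                    % dense QR of the frontal matrix
     R(j, :) = T(1, :);                      % row j of the final factor R
     U{j} = T(2:end, :);                     % remaining rows: update matrix
  end
end
```

For a full-rank A the result agrees, up to the signs of its rows, with the R factor from a QR factorization of A.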

The node ordering of an elimination tree is such that children nodes are numbered before their parent node. Such orderings are called topological orderings. All topological orderings of the elimination tree are equivalent in the sense that they give the same triangular factor R. A postordering is a topological ordering in which a parent node j always has node j − 1 as one of its children. Postorderings are particularly suitable for the multifrontal method, and can be determined by a depth-first search; see Liu [413]. For example, the ordering of the nodes in the tree in Figure 6.6.6 can be made into a postordering by exchanging labels 3 and 5. The important advantage of using a postordering in the multifrontal method is that data management is simplified, since the update matrices can be managed in a stack on a last-in–first-out basis. This also reduces the storage requirement.

The frontal matrices in the multifrontal method are often too small to make it possible to efficiently utilize vector processors and matrix-vector operations in the solution of the subproblems. A useful modification of the multifrontal method, therefore, is to amalgamate several nodes into one supernode. Instead of eliminating one column in each node, the decomposition of the frontal matrices now involves the elimination of several columns, and it may be possible to use Level 2 or even Level 3 BLAS.

In general, nodes can be grouped together to form a supernode if they correspond to a block of contiguous columns in the Cholesky factor, where the diagonal block is full triangular and these rows all have identical off-block-diagonal column structures. Because of the computational advantages of having large supernodes, it is advantageous to relax this condition and also amalgamate nodes which satisfy this condition if some local zeros are treated as nonzeros. A practical restriction is that if too many nodes are amalgamated then the frontal matrices become sparse. (In the extreme case when all nodes are amalgamated into one supernode, the frontal matrix becomes equal to the original sparse matrix!) Note also that non-numerical operations often make up a large part of the total decomposition time, which limits the possible gain.

In many implementations of the multifrontal algorithms the orthogonal transformations are not stored, and the seminormal equations are used for treating additional right-hand sides. If Q is needed then it should not be stored explicitly, but instead be represented by the Householder vectors of the frontal orthogonal transformations. For a K by K grid problem with n = K^2, m = s(K − 1)^2 it is known that nnz(R) = O(n log n) but Q has O(n√n) nonzeros. Lu and Barlow [418] show that storing the frontal Householder matrices only requires O(n log n) storage.

8.5.7 Kronecker Product Problems Sometimes least squares problems occur which have a highly regular block structure. Here we consider least squares problems of the form

min (A B)x c 2, c = vec C, (8.5.25) x k ⊗ − k

where A ⊗ B is the Kronecker product of A ∈ R^{m×n} and B ∈ R^{p×q}. This product is the mp × nq block matrix

A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}.

Problems of Kronecker structure arise in several application areas including signal and image processing, photogrammetry, and multidimensional approximation. It applies to least squares fitting of multivariate data on a rectangular grid. Such problems can be solved with great savings in storage and operations. Since often the size of the matrices A and B is large, resulting in models involving several hundred thousand equations and unknowns, such savings may be essential.

We recall from Section 7.7.3 the important rule (7.7.14) for the inverse of a Kronecker product,

(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.

In particular, if P and Q are orthogonal n × n matrices then

(P ⊗ Q)^{-1} = P^{-1} ⊗ Q^{-1} = P^T ⊗ Q^T = (P ⊗ Q)^T,

where the last equality follows from the definition. Hence, P ⊗ Q is an orthogonal n^2 × n^2 matrix. The rule for the inverse holds also for pseudo-inverses.

Lemma 8.5.3. Let A† and B† be the pseudo-inverses of A and B. Then

(A ⊗ B)^† = A^† ⊗ B^†.

Proof. The theorem follows by verifying that X = A^† ⊗ B^† satisfies the four Penrose conditions in (8.1.37)–(8.1.38).

Using Lemmas 7.7.6 and 8.5.3 the solution to the Kronecker least squares problem (8.5.25) can be written

x = (A ⊗ B)^† c = (A^† ⊗ B^†) vec C = vec (B^† C (A^†)^T).     (8.5.26)

This formula leads to a great reduction in the cost of solving Kronecker least squares problems. For example, if A and B are both m × n matrices, the cost of computing the solution is reduced from O(m^2 n^4) to O(mn^2).

In some areas the most common approach to computing the least squares solution to (8.5.25) is to use the normal equations. If we assume that both A and B have full column rank, then we can use the expressions

A^† = (A^T A)^{-1} A^T,   B^† = (B^T B)^{-1} B^T.

However, because of the instability associated with the explicit formation of A^T A and B^T B, an approach based on orthogonal factorizations should generally be preferred. If we have computed the complete QR factorizations of A and B,

AΠ_1 = Q_1 \begin{pmatrix} R_1 & 0 \\ 0 & 0 \end{pmatrix} V_1^T,   BΠ_2 = Q_2 \begin{pmatrix} R_2 & 0 \\ 0 & 0 \end{pmatrix} V_2^T,

with R_1, R_2 upper triangular and nonsingular, then from Section 2.7.3 we have

A^† = Π_1 V_1 \begin{pmatrix} R_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} Q_1^T,   B^† = Π_2 V_2 \begin{pmatrix} R_2^{-1} & 0 \\ 0 & 0 \end{pmatrix} Q_2^T.

These expressions can be used in (8.5.26) to compute the pseudo-inverse solution of problem (8.5.25) even in the rank deficient case.

The SVD of a Kronecker product A ⊗ B can be simply expressed in terms of the SVDs of A and B. If A and B have the singular value decompositions

A = U_1 Σ_1 V_1^T,   B = U_2 Σ_2 V_2^T,

then using Lemma 8.5.3 it follows that

A ⊗ B = (U_1 ⊗ U_2)(Σ_1 ⊗ Σ_2)(V_1 ⊗ V_2)^T.     (8.5.27)

When using this to solve the least squares problem (8.5.25) we can work directly with the two small SVDs of A and B. The solution x = (A ⊗ B)^† c can be written as x = vec (X), where

X = B^† C (A^†)^T = V_2 Σ_2^† (U_2^T C U_1) (Σ_1^T)^† V_1^T.     (8.5.28)
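The following Matlab fragment is a small sketch of formula (8.5.26) on a random example, assuming A and B have full column rank; kron is used only to check the result against the assembled problem.

```matlab
A = randn(6, 3);  B = randn(5, 2);
C = randn(5, 6);               % right-hand side c = vec(C)
c = C(:);
X = pinv(B) * C * pinv(A)';    % X = B^+ C (A^+)^T, cf. (8.5.26)
x = X(:);                      % solves min || kron(A,B)*x - c ||_2
x_ref = kron(A, B) \ c;        % reference solution (small example only)
norm(x - x_ref)                % should be of the order of roundoff
```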

8.5.8 Tensor Computations

A tensor is a multidimensional array. For example, a third-order tensor A has elements with three indices a_{ijk}. When one or more indices are held constant in a tensor, subarrays are formed. In a third-order tensor fixing one index gives a matrix and fixing two indices gives a vector. Thus, a third-order tensor can be thought of as built up of slices of matrices in three different ways, depending on which index is fixed.

Tensor decompositions have been used for some time in psychometrics and chemometrics. Lately they have been taken up in other fields such as signal processing and linear algebra. Both with respect to notation and theory this field is still in its infancy.

In matrix computations an important role is played by different matrix decompositions. An important example is the SVD

A = UΣV^T = \sum_{i=1}^{n} σ_i u_i v_i^T,

of a matrix A ∈ R^{m×n}. This expresses a matrix as a weighted sum of rank one matrices u_i v_i^T, with ||u_i||_2 = ||v_i||_2 = 1. Truncating the SVD expansion gives the best approximation of A by a matrix of rank r < n; see the Eckart–Young Theorem 8.1.18. A natural question is if the SVD can somehow be generalized to tensors. One approach is to seek an expansion of the form

A = \sum_{i=1}^{n} σ_i T_i,

where T_i are tensors of rank one. A rank one matrix X = (x_{ij}) can always be written as an outer product ab^T of two vectors, i.e., x_{ij} = α_i β_j. Similarly, a three-dimensional tensor of rank one has the form

T = (tijk), tijk = αiβj γk. The Frobenius norm of a three-dimensional tensor is defined as

||A||_F^2 = \sum_{i,j,k} a_{ijk}^2.

The tensor A is said to have rank r if there is an expansion A = \sum_{i=1}^{r} σ_i T_i, where T_i are rank one tensors. In many applications we would like to approximate a tensor A with a tensor of lower rank, i.e., to find rank one tensors T_i so that

||A − \sum_{i=1}^{r} σ_i T_i||_F

is minimized. This is the so-called parafac (parallel factorization) decomposition of A. For tensor decompositions, see [381, 380, 396, 397, 189].

Review Questions 5.1 What is meant by the standard form of a banded rectangular matrix A? Why is it important that a banded matrix is permuted into standard form before its orthogonal factorization is computed? 5.2 In least squares linear regression the first column of A often equals the vector T a1 = e = (1, 1,..., 1) (cf. Example 8.2.1). Setting A = ( e A2 ), show that performing one step in MGS is equivalent to “subtracting out the means”.

Problems 5.1 Consider the two-block least squares problem (8.5.5). Work out an algorithm to solve the reduced least squares problem minx P T (A2x2 b) 2 using 2 (A1 ) the method of normal equations. k N − k T 1 T Hint: First show that P T (A) = I A1(R R1)− A , where R1 is the (A1 ) 1 1 T N − Cholesky factor of A1A1. 5.2 (a) Write a MATLAB program for fitting a straight line c1x+c2y = d to given points (x ,y ) R2, i = 1 : m. The program should handle all exceptional i i ∈ cases, e.g., c1 = 0 and/or c2 = 0. 2 (b) Suppose we want to fit two set of points (xi,yi) R , i = 1,...,p, and i = p + 1,...,m, to two parallel lines ∈ 2 2 cx + sy = h1, cx + sy = h2, c + s = 1, so that the sum of orthogonal distances are minimized. Generalize the ap- proach in (a) to write an algorithm for solving this problem. (c) Modify the algorithm in (a) to fit two orthogonal lines. 5.3 Use the Penrose conditions to prove the formula

(A B)† = A† B†, ⊗ ⊗ where denotes the Kronecker product ⊗ 5.4 Test the recursive QR algorithm recqr(A) given in Section sec8.2.rec on some matrices. Check that you obtain the same result as from the built-in function qr(A).

8.6 Some Generalized Least Squares Problems

8.6.1 Least Squares for General Linear Models

We consider a general univariate linear model where the error covariance matrix equals σ^2 V, where V is a positive definite symmetric matrix. Then the best unbiased linear estimate equals the minimizer of

(Ax − b)^T V^{-1} (Ax − b).     (8.6.1)

For the case of a general positive definite covariance matrix we assume that the Cholesky factorization V = RT R of the covariance matrix can be computed. Then the least squares problem (8.6.1) becomes

(Ax − b)^T V^{-1} (Ax − b) = ||R^{-T}(Ax − b)||_2^2.     (8.6.2)

This could be solved in a straightforward manner by forming Ã = R^{-T}A, b̃ = R^{-T}b and solving the standard least squares problem

min_x ||Ãx − b̃||_2.

A simple special case of (8.6.1) is when the covariance matrix V is a positive diagonal matrix,

V = σ^2 diag (v_1, v_2, ..., v_m) > 0.

This leads to the weighted linear least squares problem (8.1.17)
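A minimal Matlab sketch of this straightforward approach is given below, assuming V is symmetric positive definite; chol returns the upper triangular factor R with V = R^T R.

```matlab
R  = chol(V);       % V = R'*R, R upper triangular
At = R' \ A;        % Atilde = R^{-T} A
bt = R' \ b;        % btilde = R^{-T} b
x  = At \ bt;       % solves min || Atilde*x - btilde ||_2
```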

min_x ||D(Ax − b)||_2 = min_x ||(DA)x − Db||_2,     (8.6.3)

1/2 where D = diag (vii− ). Note that the weights will be large when the correspond- ing error component in the linear model has a small variance. We assume in the following that the matrix A is row equilibrated, that is,

max aij = 1, i = 1 : m. 1 j n | | ≤ ≤ and that the weights D = diag (d1, d2,...,dn) have been ordered so that

∞ > d_1 ≥ d_2 ≥ ⋯ ≥ d_m > 0.     (8.6.4)

Note that only the relative size of the weights influences the solution. If the ratio γ = d_1/d_m ≫ 1 we call the weighted problem stiff. Stiff problems arise, e.g., in electrical networks, certain classes of finite element problems, and in interior point methods for constrained optimization.

Special care is needed in solving stiff weighted linear least squares problems. The method of normal equations is not well suited for solving such problems. To illustrate this, we consider the special case where only the first p < n equations are weighted,

2 γA γb min 1 x 1 , (8.6.5) x A2 − b2     2

p n (m p) n A1 R × and A2 R − × . Such problems occur, for example, when the method∈ of weighting is∈ used to solve a least squares problem with the linear equality constraints A1x = b1; see Section 5.1.4. For this problem the matrix of normal equations becomes

γA B = ( γAT AT ) 1 = γ2ATA + ATA . 1 2 A 1 1 2 2  2  316 Chapter 8. Linear Least Squares Problems

1/2 T T If γ>u− (u is the unit roundoff) and A1A1 is dense, then B = A A will be completely dominated by the first term and the data contained in A2 may be lost. If the number p of very accurate observations is less than n, then this is a disaster, since the solution depends critically on the less precise data in A2. (The matrix in Example 2.2.1 is of this type.) Clearly, if the problem is stiff the condition number κ(DA) will be large. An upper bound of the condition number is given by

κ(DA) κ(D)κ(A) = γκ(A). ≤ It is important to note that this does not mean that the problem of computing x from given data D, A, b is ill-conditioned when γ 1. We now examine{ the} use of methods based on≫ the QR factorization of A for solving weighted problems. We first note that it is essential that column pivoting is performed when QR factorization is used for weighted problems. To illustrate the need for column pivoting, consider an example of the form (8.6.5), where

1 1 1 A = , 1 1 1 1  −  Then stability is lost without column pivoting because the first two columns of the matrix A1 are linearly dependent. When column pivoting is introduced this difficulty disappears. The following example shows that the Householder QR method can give poor accuracy for weighted problems even if column pivoting is used.

Example 8.6.1 (Powell and Reid [492]). Consider the least squares problem with

Consider the least squares problem with

DA = \begin{pmatrix} 0 & 2 & 1 \\ γ & γ & 0 \\ γ & 0 & γ \\ 0 & 1 & 1 \end{pmatrix},   Db = \begin{pmatrix} 3 \\ 2γ \\ 2γ \\ 2 \end{pmatrix}.

The exact solution is equal to x = (1, 1, 1)^T. Using exact arithmetic we obtain after the first step of QR factorization of A by Householder transformations the reduced matrix

Ã^{(2)} = \begin{pmatrix} \tfrac12 γ − 2^{1/2} & −\tfrac12 γ − 2^{-1/2} \\ −\tfrac12 γ − 2^{1/2} & \tfrac12 γ − 2^{-1/2} \\ 1 & 1 \end{pmatrix}.

If γ > u^{-1} the terms 2^{1/2} and 2^{-1/2} in the first and second rows are lost. However, this is equivalent to the loss of all information present in the first row of A. This loss is disastrous because the number of rows containing large elements is less than the number of components in x, so there is a substantial dependence of the solution x on the first row of A. (However, compared to the method of normal equations, which fails already when γ > u^{-1/2}, this is an improvement!)

As shown by Cox and Higham [120], provided that an initial row sorting is performed, the Householder QR method with column pivoting has very good stability properties for weighted problems. The rows of Ã = DA and b̃ = Db should be sorted after decreasing ∞-norm,

max_j |ã_{1j}| ≥ max_j |ã_{2j}| ≥ ⋯ ≥ max_j |ã_{mj}|.     (8.6.6)

(In Example 8.6.1 this will permute the two large rows to the top.)
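The following Matlab fragment is a sketch of this strategy for a stiff weighted problem min ||D(Ax − b)||_2, with the weights supplied as a column vector d (an assumed input): the scaled rows are sorted by decreasing ∞-norm, after which a library QR with column pivoting is applied.

```matlab
At = diag(d) * A;   bt = d .* b;              % scaled data (DA, Db)
[~, p] = sort(max(abs(At), [], 2), 'descend');
At = At(p, :);      bt = bt(p);               % rows in decreasing inf-norm
[Q, R, E] = qr(At);                           % Householder QR, column pivoting
y = Q' * bt;
n = size(A, 2);
x = E * (R(1:n, 1:n) \ y(1:n));               % weighted least squares solution
```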

Row pivoting could also be used, but row sorting has the advantage that after sorting the rows, any library routine for QR with column pivoting can be used.

If the Cholesky factor R is ill-conditioned there could be a loss of accuracy in forming Ã and b̃. But assume that column pivoting has been used in the Cholesky factorization and set R = DR̂, where R̂ is unit upper triangular and D diagonal. Then any ill-conditioning is usually reflected in D. If necessary a presorting of the rows of Ã = D^{-1}R̂^{-T}A can be performed before applying the Householder QR method to the transformed problem.

In Paige's method [463] the general linear model with covariance matrix V = BB^T is reformulated as

min_{v,x} ||v||_2  subject to  Ax + Bv = b.     (8.6.7)

This formulation has the advantage that it is less sensitive to an ill-conditioned V and allows for rank deficiency in both A and V. For simplicity we consider here only the case when V is positive definite and A ∈ R^{m×n} has full column rank n. The first step is to compute the QR factorization

R n c QT A = , QT b = 1 , (8.6.8) 0 } m n c   } −  2  where R is nonsingular. The same orthogonal transformation is applied also to B, giving C n QT B = 1 , C } m n  2  } − where by the nonsingularity of B it follows that rank (C2) = m n. The constraints in (8.6.7) can now be written in the partitioned form −

\begin{pmatrix} C_1 \\ C_2 \end{pmatrix} v + \begin{pmatrix} R \\ 0 \end{pmatrix} x = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix},

or

Rx = c_1 − C_1 v,   C_2 v = c_2,     (8.6.9)

where C_2 ∈ R^{(m−n)×m}. For any vector v we can determine x so that the first block of these equations is satisfied. An orthogonal matrix P ∈ R^{m×m} can then be determined such that

C2P = ( 0 S ) , (8.6.10) 318 Chapter 8. Linear Least Squares Problems where S is upper triangular and nonsingular. Setting u = P T v the second block of the constraints in (8.6.9) becomes

T C2P (P v)=(0 S ) u = c2.

Since P is orthogonal ||v||_2 = ||u||_2 and so the minimum in (8.6.7) is found by taking

v = P \begin{pmatrix} 0 \\ u_2 \end{pmatrix},   u_2 = S^{-1} c_2.     (8.6.11)

Then x is obtained from the triangular system Rx = c1 C1v. It can be shown that the computed solution is an unbiased estimate of x for− the model (8.6.7) with covariance matrix σ2C, where

C = R^{-1} L L^T R^{-T},   L = C_1 P_1.     (8.6.12)
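A compact Matlab sketch of Paige's method (8.6.8)–(8.6.11) is given below, assuming A (m × n) has full column rank and B is nonsingular. For the minimum-norm step it uses a QR factorization of C_2^T instead of the factorization C_2 P = (0  S) used in the text; the two are equivalent up to a reordering.

```matlab
[m, n] = size(A);
[QA, RA] = qr(A);                      % full QR of A, cf. (8.6.8)
c  = QA' * b;   c1 = c(1:n);   c2 = c(n+1:m);
CB = QA' * B;   C1 = CB(1:n, :);  C2 = CB(n+1:m, :);
[P2, T] = qr(C2');                     % C2' = P2*T, T upper triangular
w1 = T(1:m-n, :)' \ c2;                % lower triangular solve
v  = P2(:, 1:m-n) * w1;                % minimum-norm solution of C2*v = c2
x  = RA(1:n, :) \ (c1 - C1 * v);       % from the first block of (8.6.9)
```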

Paige’s algorithm (8.6.8)–(8.6.11) as described above does not take advantage of any special structure the matrix B may have. It requires a total of about 4m3/3+ 2 2m n flops. If m n the work in the QR factorization of C2 dominates. It is not suitable for problems≫ where B is sparse, e.g., diagonal as in the case of a weighted least squares problem. Several generalized least squares problems can be solved by using a generalized SVD (GSVD) or QR (GQR) factorization involving a pair of matrices A, B. One motivation for using this approach is to avoid the explicit computation of products and quotients of matrices. For example, let A and B be square and nonsingular ma- 1 trices and assume we need the SVD of AB− (or AB). Then the explicit calculation 1 of AB− (or AB) may result in a loss of precision and should be avoided. The factorization used in Paige’s method is a special case of the generalized m n m p QR (GQR) factorization. Let A R × and B R × be a pair of matrices with the same number of rows. The GQR∈ factorization∈ of A and B is a factorization of the form A = QR, B = QT Z, (8.6.13)

m m p p where Q R × and Z R × are orthogonal matrices and R and T have one of the forms∈ ∈

R R = 11 (m n), R = ( R R ) (m < n), (8.6.14) 0 ≥ 11 12   and T11 T = ( 0 T12 ) (m p), T = (m>p). (8.6.15) ≤ T21   1 If B is square and nonsingular GQR implicitly gives the QR factorization of B− A. There is also a similar generalized RQ factorization related to the QR factorization 1 of AB− . Routines for computing a GQR factorization of are included in LAPACK. These factorizations allow the solution of very general formulations of several least squares problems. 8.6. Some Generalized Least Squares Problems 319

8.6.2 Indefinite Least Squares

n n A matrix Q R × is said to be J-orthogonal if ∈ QT JQ = J, (8.6.16) where J is a , i.e. a diagonal matrix with elements equal to 1. This implies that Q is nonsingular and QJQT = J. Such matrices are useful in± the treatment of problems where there is an underlying indefinite inner product. In order to construct J-orthogonal matrices we introduce the exchange op- erator. Consider the block 2 2 system × Q Q x y Qx = 11 12 1 = 1 . (8.6.17) Q Q x y  21 22  2   2 

Solving the first equation for x1 and substituting in the second equation gives

y x exc (Q) 1 = 1 , x y  2   2  where where x1 and y1 have been exchanged and

1 1 Q11− Q11− Q12 exc (Q) = 1 − 1 (8.6.18) Q Q− Q Q Q− Q  21 11 22 − 21 11 12 

Here the (2, 2) block is the Schur complement of Q11 in Q. From the definition it follows that the exchange operator satisfies

exc (exc (Q)) = Q, that is it is involutary.

Theorem 8.6.1. n n Let Q R × be partitioned as in (8.6.17). If Q is orthogonal and Q11 nonsingular∈ then exc (Q) is J-orthogonal. If Q is J-orthogonal then exc (Q) is orthogonal.

Proof. A proof due to Chris Paige is given in Higham [330].

The indefinite least squares problem (ILS) has the form

min(b Ax)T J(b Ax), (8.6.19) x − −

m n m where A R × , m n, and b R are given. In the following we assume for simplicity∈ that ≥ ∈

A b I 0 A = 1 , b = 1 , J = m1 , (8.6.20) A b 0 I  2   2   − m2  320 Chapter 8. Linear Least Squares Problems

where m_1 + m_2 = m, m_1 m_2 ≠ 0. If the matrix A^T JA is positive definite this problem has a unique solution which satisfies the normal equations A^T JAx = A^T Jb, or

(A_1^T A_1 − A_2^T A_2) x = A_1^T b_1 − A_2^T b_2.     (8.6.21)

Hence, if we form the normal equations and compute the Cholesky factorization A^T JA = R^T R the solution is given by x = R^{-1}R^{-T}c, where c = A^T Jb. However, by numerical stability considerations we should avoid explicit formation of A_1^T A_1 and A_2^T A_2. We now show how J-orthogonal transformations can be used to solve this problem more directly. Consider the orthogonal plane rotation

c s G = , c2 + s2 = 1. s c  −  where c = 0. As a special case of Theorem 8.6.1 it follows that 6 1 1 s G˘ = exc (G) = − , (8.6.22) c s 1  −  is J-orthogonal G˘T JG˘ = I, J = diag (1, 1). − The matrix G˘ is called a hyperbolic plane rotation since it can be written as

cosh θ sinh θ G˘ = − , c˘2 s˘2 = 1. sinh θ cosh θ −  −  A hyperbolic rotation G˘ can be used to zero a selected component in a vector. Provided that α > β we have | | | | α 1 1 s α σ G˘ = − = , β c s 1 β 0    −     provided that

σ = \sqrt{α^2 − β^2} = \sqrt{(α + β)(α − β)},   c = σ/α,   s = β/α.     (8.6.23)

Note that the elements of a hyperbolic rotation Ğ are unbounded. Therefore, such transformations must be used with care. The direct application of Ğ to a vector does not lead to a stable algorithm. Instead, as first suggested by Chambers [95], we note the equivalence of

Ğ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}  ⟺  G \begin{pmatrix} y_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ y_2 \end{pmatrix},

where Ğ = exc (G). We first determine y_1 from the hyperbolic rotation and then y_2 from the Givens rotation, giving

y_1 = (x_1 − s x_2)/c,   y_2 = c x_2 − s y_1.     (8.6.24)
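A small Matlab sketch of Chambers' mixed scheme is given below. The rotation is determined from a pivot pair (alpha, beta) with |alpha| > |beta| via (8.6.23) and then applied to a pair of row vectors u and v using (8.6.24); all names are illustrative.

```matlab
function [u, v] = hyp_apply(alpha, beta, u, v)
  % hyperbolic rotation zeroing beta against alpha; |alpha| > |beta| assumed
  sigma = sqrt((alpha + beta) * (alpha - beta));
  c = sigma / alpha;   s = beta / alpha;      % cf. (8.6.23)
  for j = 1:length(u)
     y1 = (u(j) - s * v(j)) / c;              % hyperbolic part, cf. (8.6.24)
     y2 = c * v(j) - s * y1;                  % Givens part, cf. (8.6.24)
     u(j) = y1;   v(j) = y2;
  end
end
```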

We now describe an alternative algorithm for solving the indefinite least squares problem (8.6.19), which combines Householder transformations and hyper- bolic rotations. In the first step we use two Householder transformations. The first transforms rows 1 : m1 in A1 and zeros the elements 2 : m1. The second transforms rows 1 : m2 in A2 and zeros elements 2 : m2. we now zero out the element left in the first column of A2 using a hyperbolic rotation in the plane (1,m1 + 1). In the next step we proceed similarly. We first zero elements 3 : m1 in the second column of A1 and elements 1 : m2 in the second column of A2. A hyperbolic rotation in the plane (2 : m1 + 1) is then used to zero the remaining element in the second column of A2. After the first two steps we have reduced the matrix A to the form This method uses n hyperbolic rotations and n Householder transformations for the reduction. Since the cost for the hyperbolic rotations is = (n2) flops the total cost is about the same as for the usual Householder QR factorization. Note that this can be combined with column pivoting so that at each step we maximize the diagonal element in R. It suffices to consider the first step; all remaining steps are similar. Changing notations to A = (a ,...,a ) A , B = 1 n ≡ 1 (b1,...,bn) A2, we do a modified Golub pivoting. Let p be the smallest index for which ≡ s s , s = a 2 b 2, j = 1 : n, p ≥ j j k j k2 − k jk2 ∀ and interchange columns 1 and p in A and B. A special case of particular interest is when A2 consists of a single row. In this case only hyperbolic transformations on A2 occur.

8.6.3 Linear Equality Constraints In some least squares problems in which the unknowns are required to satisfy a system of linear equations exactly. One source of such problems is in curve and surface fitting, where the curve is required to interpolate certain data points. Such problems can be considered as the limit of a sequence of weighted problems when some weights tend to infinity. m n p n Given matrices A R × and B R × we consider the problem LSE to find a vector x Rn which∈ solves ∈ ∈

min Ax b 2 subject to Bx = d. (8.6.25) x k − k A solution to problem (8.6.25) exists if and only if the linear system Bx = d is consistent. If rank (B) = p then B has linearly independent rows, and Bx = d is consistent for any right hand side d. A solution to problem (8.6.25) is unique if and only if the null spaces of A and B intersect only trivially, i.e., if (A) (B) = 0 , or equivalently N ∩N { } A rank = n. (8.6.26) B   If (8.6.26) is not satisfied then there is a vector z = 0 such that Az = Bz = 0. Hence, if x solves (8.6.25) then x + z is a different6 solution. In the following we therefore assume that rank (B) = p and that (8.6.26) is satisfied. 322 Chapter 8. Linear Least Squares Problems

A robust algorithm for problem LSE should check for possible inconsistency of the constraints Bx = d. If it is not known a priori that the constraints are consistent, then problem LSE may be reformulated as a sequential least squares problem min Ax b 2, S = x Bx d 2 = min . (8.6.27) x S k − k { | k − k } ∈ The most efficient way to solve problem LSE is to derive an equivalent uncon- strained least squares problem of lower dimension. There are basically two different ways to perform this reduction: direct elimination and the null space method. We describe both these methods below. In the method of direct elimination we start by reducing the matrix B to upper trapezoidal form. It is essential that column pivoting is used in this step. In order to be able to solve also the more general problem (8.6.27) we compute a QR factorization of B such that

T R11 R12 r r Q BΠ = , R R × , (8.6.28) B B 0 0 11 ∈   where r = rank(B) p and R11 is upper triangular and nonsingular. Using this ≤ T factorization, and settingx ¯ = ΠBx, the constraints become d¯ (R ,R )¯x = R x¯ + R x¯ = d¯ , d¯= QT d = 1 , (8.6.29) 11 12 11 1 12 2 1 B d¯  2  where d¯2 = 0 if and only if the constraints are consistent. If we apply the permu- tation ΠB also to the columns of A and partition the resulting matrix conformally with (8.6.28), A¯ΠB = (A1, A2), then Ax b = A1x¯1 + A2x¯2 b. Solving (8.6.29) 1 ¯ − − forx ¯1 = R11− (d1 R12x¯2), and substituting, we find that the unconstrained least squares problem −

m (n r) min Aˆ2x¯2 ˆb 2, Aˆ2 R × − (8.6.30) x¯2 k − k ∈ 1 1 Aˆ = A¯ A¯ R− R , ˆb = b A¯ R− d¯ . 2 2 − 1 11 12 − 1 11 1 is equivalent to the original problem LSE. Here Aˆ2 is the Schur complement of R11 in R R 11 12 . A¯ A¯  1 2  It can be shown that if the condition in (8.6.26) is satisfied, then rank (A2) = r. Hence, the unconstrained problem has a unique solution, which can be computed from the QR factorization of Aˆ2. The coding of this algorithm can be kept re- markably compact as exemplified by the Algol program of Bj¨orck and Golub [63, ]. In the null space method we postmultiply B with an orthogonal matrix Q to transform B to lower triangular form. We also apply Q to the matrix A, which gives

B B L 0 p p Q = ( Q1 Q2 ) = , L R × . (8.6.31) A A AQ1 AQ2 ∈       8.6. Some Generalized Least Squares Problems 323

where the columns of Q_2 form an orthogonal basis for the null space of B. Note that this is equivalent to computing the QR factorization of B^T. The matrix Q can be constructed as a product of Householder transformations. The solution is now split into the sum of two orthogonal components by setting

p (n p) x = Qy = x + x = Q y + Q y , y R , y R − , (8.6.32) 1 2 1 1 2 2 1 ∈ 2 ∈ where Bx2 = BQ2y2 = 0. From the assumption that rank (B) = p it follows that 1 L is nonsingular and the constraints equivalent to y1 = L− d and b Ax = b AQy = c AQ y , c = b (AQ )y . − − − 2 2 − 1 1 Hence, y2 is the solution to the unconstrained least squares problem

min (AQ2)y2 c 2. (8.6.33) y2 k − k

This can be solved, for example, by computing the QR factorization of AQ_2. If (8.6.26) is satisfied then rank (AQ_2) = n − p, and the solution to (8.6.33) is unique. If y_2 = (AQ_2)^†(b − AQ_1 y_1) is the minimum length solution to (8.6.33), then since

||x||_2^2 = ||x_1||_2^2 + ||Q_2 y_2||_2^2 = ||x_1||_2^2 + ||y_2||_2^2,

x = Qy is the minimum norm solution to problem LSE.

The representation in (8.6.32) of the solution x can be used as a basis for a perturbation theory for problem LSE. A strict analysis is given by Eldén [187], but the result is too complicated to be given here. If the matrix B is well conditioned, then the sensitivity is governed by κ(AQ_2), for which κ(A) is an upper bound.

The method of direct elimination and the null space method both have good numerical stability. If Gaussian elimination is used to derive the reduced unconstrained problem the operation count for the method of direct elimination is slightly lower.
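A minimal Matlab sketch of the null space method described above is given below; it assumes B (p × n) has full row rank and that condition (8.6.26) holds.

```matlab
[p, n] = size(B);
[Q, Lt] = qr(B');                 % B' = Q*Lt, so that B*Q = (L 0)
Q1 = Q(:, 1:p);   Q2 = Q(:, p+1:n);
L  = Lt(1:p, :)';                 % p-by-p lower triangular
y1 = L \ d;                       % enforces the constraint B*x = d
y2 = (A*Q2) \ (b - A*(Q1*y1));    % unconstrained problem (8.6.33)
x  = Q1*y1 + Q2*y2;               % cf. (8.6.32)
```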

8.6.4 Quadratic Constraints and Tikhonov Regularization Least squares problems with quadratic constraints arise, e.g., when one wants to balance a good fit to the data points and a smooth solution. Such problems arise naturally from inverse problems where one tries to determine the structure of a physical system from its behavior. An example in the form of an integral equation of the first kind was given in Example 7.1.5. Typically, the singular values of the discretized problem will decay exponentially to zero. As shown in Section 8.4.6, any attempt to solve such a problem without restricting x will lead to a meaningless solution of very large norm, or even to failure of the algorithm. One of the most successful methods for solving ill-conditioned problems is Tikhonov regularization35 (see [564, ]). In this method the solution space is

35Andrei Nicholaevich Tikhonov (1906–1993), Russian mathematician. He made deep contribu- tions in topology and function analysis, but was also interested in applications to mathematical physics. In the 1960’s he introduced the concept of “regularizing operator” for ill-posed problems, for which he was awarded the Lenin medal. 324 Chapter 8. Linear Least Squares Problems

restricted by imposing an a priori bound on ||Lx||_2 for a suitably chosen matrix L ∈ R^{p×n}. Typically L is taken to be the identity matrix I or a discrete approximation to some derivative operator, e.g.,

L = \begin{pmatrix} 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{pmatrix} ∈ R^{(n-1)×n},     (8.6.34)

which, except for a scaling factor, approximates the first derivative operator.

The above approach leads us to take x as the solution to the problem

min_x ||Ax − b||_2  subject to  ||Lx||_2 ≤ γ.     (8.6.35)

Here the parameter γ governs the balance between a small residual and a smooth solution. The determination of a suitable γ is often a major difficulty in the solution process. Alternatively, we could consider the related problem

min Lx 2 subject to Ax b 2 ρ. (8.6.36) f k k k − k ≤ In the statistical literature the solution of problem (8.6.35) is called a ridge es- timate. Problems (8.6.35) and (8.6.36) are special cases of the general problem LSQI. Problem LSQI: Least Squares with Quadratic Inequality Constraint.

min_x ||Ax − b||_2  subject to  ||Lx − d||_2 ≤ γ,     (8.6.37)

where A ∈ R^{m×n}, L ∈ R^{p×n}, γ > 0.

Conditions for existence and uniqueness and properties of solutions to problem LSQI have been given by Gander [222]. Clearly, problem LSQI has a solution if and only if

min_x ||Lx − d||_2 ≤ γ,     (8.6.38)

and in the following we assume that this condition is satisfied. We define an L-generalized solution x_{A,L} to the problem min_x ||Ax − b||_2 to be a solution to the problem (cf. Section 2.7.4)

n min Lx d 2, S = x R Ax b 2 = min . (8.6.39) x S k − k { ∈ k k − k } ∈ These observation gives rise to the following theorem.

Theorem 8.6.2.
Assume that problem LSQI has a solution. Then either x_{A,L} is a solution or (8.6.46) holds and the solution occurs on the boundary of the constraint region. In the latter case the solution x = x(λ) satisfies the generalized normal equations

(A^T A + λL^T L) x(λ) = A^T b + λL^T d,     (8.6.40)

where λ ≥ 0 satisfies the secular equation

||Lx(λ) − d||_2 = γ.     (8.6.41)

Proof. Since the solution occurs on the boundary we can use the method of La- grange multipliers and minimize ψ(x,λ), where

ψ(x, λ) = \tfrac12 ||Ax − b||_2^2 + \tfrac12 λ (||Lx − d||_2^2 − γ^2).     (8.6.42)

A necessary condition for a minimum is that the gradient of ψ(x, λ) with respect to x equals zero, which gives (8.6.40).

As we shall see, only positive values of λ are of interest. Note that (8.6.40) are the normal equations for the least squares problem

min_x \left\| \begin{pmatrix} A \\ μL \end{pmatrix} x − \begin{pmatrix} b \\ μd \end{pmatrix} \right\|_2,   μ = \sqrt{λ}.     (8.6.43)

Hence, to solve (8.6.40) for a given value of λ, it is not necessary to form the cross-product matrices A^T A and L^T L. Instead we can solve (8.6.43) by QR factorization. In the following we assume that the constraint ||Lx(λ) − d||_2 ≤ γ is binding, which is the case if (8.6.46) holds. Then there is a unique solution to problem LSQI if and only if the null spaces of A and L intersect only trivially, i.e., N(A) ∩ N(L) = {0}, or equivalently

rank \begin{pmatrix} A \\ L \end{pmatrix} = n.     (8.6.44)

A particularly simple but important case is when L = I_n and d = 0, i.e.,

min_x ||Ax − b||_2  subject to  ||x||_2 ≤ γ.     (8.6.45)

We call this the standard form of LSQI. Notice that for this case we have x_{A,I} = A^† b and the constraint is binding only if

||Lx_{A,L} − d||_2 > γ.     (8.6.46)

[Schematic of the matrix in (8.6.43) after k = 2 Householder steps (m = n = 5): the first two columns are reduced, the first two rows of the lower block have filled in (marked +), and its remaining rows are untouched.]

Notice that the first two rows of D have filled in, but the remaining rows of D are still not touched. For each step k = 1 : n there are m elements in the current column to be annihilated. Therefore, the operation count for the Householder QR factorization increases by 2n^3/3, to 2mn^2 flops. If A = R already is in upper triangular form then the flop count for the reduction is reduced to approximately 2n^3/3 (cf. Problem 8.1b).

If L = I the singular values of the modified matrix in (8.6.43) are equal to

σ̃_i = (σ_i^2 + λ)^{1/2},   i = 1 : n.

In this case the solution can be expressed in terms of the SVD as

x(λ) = \sum_{i=1}^{n} f_i \frac{c_i}{σ_i} v_i,   f_i = \frac{σ_i^2}{σ_i^2 + λ}.     (8.6.47)

The quantities f_i are often called filter factors. Notice that as long as √λ ≪ σ_i we have f_i ≈ 1, and if √λ ≫ σ_i then f_i ≪ 1. This establishes a relation to the truncated SVD solution (8.4.39), which corresponds to a filter factor which is a step function, f_i = 1 if σ_i > δ and f_i = 0 otherwise. Rules based on the discrepancy principle are perhaps the most reliable for choosing the regularization parameter. However, these require knowledge of the noise level in the data.
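The following Matlab fragment is a small sketch of the standard-form case L = I for a given value of lambda: the regularized solution is computed from the augmented least squares problem (8.6.43) and, as a check, from the SVD with the filter factors (8.6.47).

```matlab
[m, n] = size(A);
x = [A; sqrt(lambda)*eye(n)] \ [b; zeros(n, 1)];   % augmented problem (8.6.43)
[U, S, V] = svd(A, 'econ');   s = diag(S);
f = s.^2 ./ (s.^2 + lambda);                       % filter factors (8.6.47)
x_svd = V * (f .* (U'*b) ./ s);                    % same solution via the SVD
% norm(x - x_svd) should be of the order of roundoff
```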

Transformation to Standard Form A Problem LSQI with L = I can be transformed to standard form as we now 6 1 describe. If L is nonsingular we can achieve this by the change of variables x = L− y. This gives the problem AL 1 b min − y . (8.6.48) y µI − 0 2     1 The matrix A = AL− can be formed by solving the matrix equation AL = A. The general case with no restrictions on L has been treated by Eld´en[187]. (n t) n For simplicity,e we assume here that L R − × has full row rank, whiche is often the case in practice. The transformation∈ to standard form can then be achieved using the pseudo-inverse of L. For example, with L as in (8.6.34) the rank deficiency is t = 1. Let the QR factorization of LT be

T RL n (n t) L = ( V V ) R × − , 1 2 0 ∈   where R2 nonsingular and the orthogonal columns of V2 span the null space of L. We split x into two orthogonal components T x = L†y + V2w = V1RL− y + V2w, (8.6.49) T where L† = V1RL− is the pseudo-inverse of L. Form AV = ( AV1 AV2 ) and compute T AL† = AV1RL− 8.6. Some Generalized Least Squares Problems 327

T T by solving the upper triangular system RLA = (AV1) . We further need the Householder QR factorization e U m t AV = Q R × , Q = ( Q Q ) . 2 0 ∈ 1 2   If A and L have no null space in common, then AV2 has rank t and U is nonsingular. Combining these results gives

T T T Q (Ax b) = Q (AV R− y + AV w b) − 1 L 2 − T Q1 (AL†y b) + Uw r1 = T − = . Q (AL†y b) r  2 −   2  Thus, we can always determine w so that r1 = 0. Hence, the problem decouples and y is the solution to

A b T T T min y , A = Q2 AV1R− , b = Q2 b, (8.6.50) y µI − 0 L     2 e e e T e which is of standard form. We then solve Uw = Q1 (AL†y b) and retrieve x from (8.6.49). − An important special case is when in LSQI we have A = K, L = L, and both K and L are upper triangular Toeplitz matrices, i.e.,

k1 k2 ... kn 1 kn − k1 k2 kn 1  . . .−  K = .. .. .    k1 k2     k1    and L is as in (8.6.34). Such systems arise when convolution-type Volterra integral equations of the first kind,

t K(t s)f(t)dt = g(s), 0 t T, − ≤ ≤ Z0 are discretized. Eld´en[188] has developed a method for solving problems of this kind 9 2 which only uses 2 n flops for each value of µ. It can be modified to handle the case when K and L also have a few nonzero diagonals below the main diagonal. Although K can be represented by n numbers this method uses n2/2 storage locations. A modification of this algorithm which uses only O(n) storage locations is given in Bojanczyk and Brent [69, ].

Solving the Secular Equation. Methods for solving problem LSQI are usually based on solving the secular equation (8.6.41) using Newton’s method. The secular equation can be written in the form

f_p(λ) = ||x(λ)||^p − γ^p = 0,     (8.6.51)

where p = ±1 and x(λ) is the solution to the least squares problem (8.6.43). From ||x(λ)||_2^p = (x(λ)^T x(λ))^{p/2}, taking derivatives with respect to λ, we find

f_p'(λ) = p \frac{x(λ)^T x'(λ)}{||x(λ)||_2^{2−p}},   x'(λ) = −(A^T A + λI)^{-1} x(λ).     (8.6.52)

Since x(λ) = (A^T A + λI)^{-1} A^T b, we obtain

x(λ)^T x'(λ) = −x(λ)^T (A^T A + λI)^{-1} x(λ) = −||z(λ)||_2^2.

The choice p = +1 gives rise to the iteration

λ_{k+1} = λ_k + \left(1 − \frac{γ}{||x(λ_k)||_2}\right) \frac{||x(λ_k)||_2^2}{||z(λ_k)||_2^2},     (8.6.53)

whereas p = −1 gives

λ_{k+1} = λ_k − \left(1 − \frac{||x(λ_k)||_2}{γ}\right) \frac{||x(λ_k)||_2^2}{||z(λ_k)||_2^2},     (8.6.54)

due to Hebden [314] and Reinsch [501]. The asymptotic rate of convergence for Newton's method is quadratic. For p = ±1 convergence is monotonic, provided that the initial approximation satisfies 0 ≤ λ^{(0)} < λ. Therefore, λ^{(0)} = 0 is often used as a starting approximation. Close to the solution ||x(λ_k)|| ≈ γ, and then the Newton correction is almost the same independent of p. However, for small λ we can have ||x(λ_k)|| ≫ γ, and then p = −1 gives much more rapid convergence than p = 1. It has been shown (Reinsch [501]) that h(λ) is convex, and hence that the iteration (8.6.54) is monotonically convergent to the solution λ* if started within [0, λ*]. Note that the correction to λ_k in (8.6.54) equals the Newton correction in (8.6.53) multiplied by the factor ||x(λ)||/γ. Using the QR factorization

Q(λ)^T \begin{pmatrix} A \\ \sqrt{λ} I \end{pmatrix} = \begin{pmatrix} R(λ) \\ 0 \end{pmatrix},   c_1(λ) = ( I_n  0 ) Q(λ)^T \begin{pmatrix} b \\ 0 \end{pmatrix},     (8.6.55)

we have

x(λ) = R(λ)^{-1} c_1(λ),   z(λ) = R(λ)^{-T} x(λ).     (8.6.56)

The main cost in this method is for computing the QR factorization (8.6.55) in each iteration step. On the other hand, computing the derivative costs only one triangular solve. Assume that the function qr computes the "thin" QR factorization, with Q having orthonormal columns. Then Hebden's algorithm is:

Algorithm 8.10. Hebden's method.

The algorithm performs p steps of Hebden's iteration for the constrained least squares problem min ||Ax − b||_2 subject to ||x||_2 = γ > 0, using QR factorization and starting from a user supplied value λ_0.

lambda = lambda0;
for k = 1:p
    [Q, R] = qr([A; sqrt(lambda)*eye(n)], 0);   % thin QR factorization
    c = Q(1:m, :)'*b;                           % c1(lambda) = (I_n 0)*Q'*[b; 0]
    x = R\c;
    if k < p                                    % update lambda
        z = R'\x;
        nx = norm(x);   nz = norm(z);
        lambda = lambda + (nx/gamma - 1)*(nx/nz)^2;
    end
end

It is slightly more efficient to initially compute the QR factorization of A in mn^2 − n^3/3 multiplications. Then for each new value of λ the QR factorization

Q^T(λ) \begin{pmatrix} R(0) \\ \sqrt{λ} I \end{pmatrix} = \begin{pmatrix} R(λ) \\ 0 \end{pmatrix}

can be computed in just n^3/3 multiplications. Then p Newton iterations will require a total of mn^2 + (p − 1)n^3/3 multiplications. In practice p ≈ 6 iterations usually suffice to achieve full accuracy. Further savings are possible by initially transforming A to bidiagonal form; see Eldén [188].

8.6.5 Linear Orthogonal Regression

n Let Pi, i = 1 : m, be a set of given points in R . In the linear orthogonal regression problem we want to fit a hyper plane M to the points in such a way that the sum of squares of the orthogonal distances from the given points to M is minimized. We first consider the special case of fitting a straight line to points in the plane. Let the coordinates of the points be (xi,yi) and let the line have the equation

c1x + c2y + d = 0, (8.6.57)

2 2 where c1 + c2 = 1. Then the orthogonal distance from the point Pi = (xi,yi) to the line equals ri = c1xi + c2yi + d. Thus, we want to minimize

m 2 (c1xi + c2yi + d) , (8.6.58) i=1 X 330 Chapter 8. Linear Least Squares Problems

2 2 subject to the constraint c1 + c2 = 1. This problem can be written in matrix form d min ( e Y ) , subject to c1 + c2 = 1, c,d c   2

T where c = (c1 c2) and

1 x1 y1 1 x2 y2 ( e Y ) =  . .  . . .    1 xm ym    By computing the QR factorization of the matrix ( e Y ) and using the invariance of the Euclidean norm this problem is reduced to

min_{d,c} \left\| R \begin{pmatrix} d \\ c \end{pmatrix} \right\|_2,   R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{pmatrix}.

For any values of c_1 and c_2 we can always determine d so that r_{11}d + r_{12}c_1 + r_{13}c_2 = 0. Thus, it remains to determine c so that ||Bc||_2 is minimized, subject to ||c||_2 = 1, where

Bc = \begin{pmatrix} r_{22} & r_{23} \\ 0 & r_{33} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}.

By the min-max characterization of the singular values (Theorem 8.1.15) the solution equals the right singular vector corresponding to the smallest singular value of the matrix B. Let the SVD be

\begin{pmatrix} r_{22} & r_{23} \\ 0 & r_{33} \end{pmatrix} = ( u_1  u_2 ) \begin{pmatrix} σ_1 & 0 \\ 0 & σ_2 \end{pmatrix} \begin{pmatrix} v_1^T \\ v_2^T \end{pmatrix},

where σ_1 ≥ σ_2 ≥ 0. (A stable algorithm for computing the SVD of an upper triangular matrix is given in Algorithm 9.4.2; see also Problem 9.4.5.) Then the coefficients in the equation of the straight line are given by

( c_1  c_2 ) = v_2^T.
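The following Matlab fragment is a small sketch of this procedure for given coordinate vectors x and y (assumed to be column vectors with at least three points).

```matlab
m = length(x);
[~, R] = qr([ones(m,1), x, y], 0);      % 3-by-3 upper triangular factor
[~, ~, V] = svd(R(2:3, 2:3));           % SVD of the 2-by-2 block B
c = V(:, 2);                            % right singular vector for sigma_2
d = -(R(1,2)*c(1) + R(1,3)*c(2)) / R(1,1);
% the fitted line is c(1)*x + c(2)*y + d = 0
```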

If σ_2 = 0 but σ_1 > 0 the matrix B has rank one. In this case the given points lie on a straight line. If σ_1 = σ_2 = 0, then B = 0 and all points coincide, i.e., x_i = x̄, y_i = ȳ for all i = 1 : m. Note that v_2 is uniquely determined if and only if σ_1 ≠ σ_2. It is left to the reader to discuss the case σ_1 = σ_2 ≠ 0! In [226, Chapter 6] a similar approach is used to solve various other problems, such as fitting two parallel or orthogonal lines or fitting a rectangle or square.

We now consider the general problem of fitting m > n points P_i ∈ R^n to a hyperplane M so that the sum of squares of the orthogonal distances is minimized. The equation for the hyperplane can be written

cT z = d, z, c Rn, c = 1, ∈ k k2 8.6. Some Generalized Least Squares Problems 331 where c Rn is the normal vector of M, and d is the orthogonal distance form ∈ | | the origin to the plane. Then the orthogonal projections of the points yi onto M are given by z = y (cT y d)c. (8.6.59) i i − i − It is readily verified that the point zi lies on M and the residual (zi yi) is parallel to c and hence orthogonal to M. It follows that the problem is equivalent− to minimizing

m (cT y d)2, subject to c = 1. i − k k2 i=1 X T n m T m If we put Y = (y1,...,ym) R × and e = (1,..., 1) R , this problem can be written in matrix form ∈ ∈ d min ( e Y ) , subject to c 2 = 1. (8.6.60) c,d − c k k   2

For a fixed c, this expression is minimized when the residual vector (Y c de) is orthogonal to e, that is eT (Y c de) = eT Y c deT e = 0. Since eT e = m it− follows that − − 1 1 d = cT Y T e = cT y,¯ y¯ = Y T e, (8.6.61) m m wherey ¯ is the mean value of the given points yi. Hence, d is determined by the condition that the mean valuey ¯ lies on the optimal plane M. Note that this property is shared by the usual linear regression. We now subtract the mean valuey ¯ from each given point, and form the matrix

T Y = (¯y ,..., y¯ ), y¯ = y y,¯ i = 1 : m. 1 m i i − Since by (8.6.61)

d ( e Y ) = Y c ey¯T c = (Y ey¯T )c = Y¯ c, − c − −   problem (8.6.60) is equivalent to

min Y¯ c 2, c 2 = 1 (8.6.62) n k k k k By the min-max characterization of the singular values a solution to (8.6.62) is given by c = vn, where vn is a right singular vector of Y¯ corresponding to the smallest singular value σn. We further have

m c = v , d = vT y,¯ (vT y d)2 = σ2 , n n n i − n i=1 X The fitted points z M are obtained from i ∈ z =y ¯ (vT y¯ )v +y, ¯ i i − n i n 332 Chapter 8. Linear Least Squares Problems

i.e., by first orthogonalizing the shifted points ȳ_i against v_n, and then adding the mean value back.

Note that the orthogonal regression problem always has a solution. The solution is unique when σ_{n−1} > σ_n, and the minimum sum of squares equals σ_n^2. Further, σ_n = 0 if and only if the given points y_i, i = 1 : m, all lie on the hyperplane M. In the extreme case all points coincide; then Ȳ = 0, and any plane going through ȳ is a solution.

The above method solves the problem of fitting an (n − 1)-dimensional linear manifold to a given set of points in R^n. It is readily generalized to the fitting of an (n − p)-dimensional manifold by orthogonalizing the shifted points ȳ_i against the p right singular vectors of Ȳ corresponding to the p smallest singular values.

A least squares problem that often arises is to fit to given data points a geometrical element, which may be defined in implicit form. For example, the problem of fitting circles, ellipses, spheres, and cylinders arises in applications such as computer graphics, coordinate metrology, and statistics. Such problems are nonlinear and will be discussed in Section 9.2.6.

8.6.6 The Orthogonal Procrustes Problem

Let A and B be given matrices in R^{m×n}. A problem that arises, e.g., in factor analysis in statistics is the orthogonal Procrustes problem^{36}

min_Q ||A − BQ||_F  subject to  Q^T Q = I.     (8.6.63)

36 Procrustes was a giant of Attica in Greece who seized travelers, tied them to a bedstead, and either stretched them or chopped off their legs to make them fit it.

Another example is in determining rigid body movements. Suppose that a_1, a_2, ..., a_m are measured positions of m ≥ n landmarks in a rigid body in R^n. Let b_1, b_2, ..., b_m be the measured positions after the body has been rotated. We seek an orthogonal matrix Q ∈ R^{n×n} representing the rotation of the body.

The solution can be computed from the polar decomposition of B^T A as shown by the following generalization of Theorem 8.1.19 by P. Schönemann [522]:

Theorem 8.6.3.
Let M_{m×n} denote the set of all matrices in R^{m×n} with orthogonal columns. Let A and B be given matrices in R^{m×n}. If B^T A = PH is the polar decomposition then

||A − BQ||_F ≥ ||A − BP||_F

for any matrix Q ∈ M_{m×n}.

Proof. Recall from (7.1.61) that ||A||_F^2 = trace (A^T A) and that trace (X^T Y) = trace (Y X^T). Using this and the orthogonality of Q, we find that

||A − BQ||_F^2 = trace (A^T A) + trace (B^T B) − 2 trace (Q^T B^T A).

It follows that the problem (8.6.63) is equivalent to maximizing trace (Q^T B^T A). Let the SVD of B^T A be B^T A = UΣV^T, where Σ = diag (σ_1, ..., σ_n). Set Q = UZV^T,

where Z is orthogonal. Then we have ||Z||_2 = 1 and hence the diagonal elements of Z must satisfy |z_{ii}| ≤ 1, i = 1 : n. Hence,

trace (Q^T B^T A) = trace (V Z^T U^T B^T A) = trace (Z^T U^T B^T A V) = trace (Z^T Σ) = \sum_{i=1}^{n} z_{ii} σ_i ≤ \sum_{i=1}^{n} σ_i.

The upper bound is obtained for Q = UV^T. This solution is not unique unless rank (A) = n.

In many applications it is important that Q corresponds to a pure rotation, that is det(Q) = 1. If det(UV^T) = 1, the optimal solution is Q = UV^T as before. If det(UV^T) = −1, the optimal solution can be shown to be (see [311])

Q = UZV^T,   Z = diag (1, ..., 1, −1),

which has determinant equal to 1. For this choice

\sum_{i=1}^{n} z_{ii} σ_i = trace (Σ) − 2σ_n.

Thus, in both cases the optimal solution can be written

Q = UZV^T,   Z = diag (1, ..., 1, det(UV^T)).
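A small Matlab sketch of this rotation-constrained solution (det(Q) = +1) is:

```matlab
[U, ~, V] = svd(B' * A);
Z = eye(size(U, 2));
Z(end, end) = det(U * V');     % equals +1 or -1
Q = U * Z * V';                % optimal rotation, cf. the formula above
```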

Perturbation bounds for the polar factorization are derived in Barrlund [33]. A perturbation analysis of the orthogonal Procrustes problem is given by S¨oderlind [537]. In analysis of rigid body movements there is also a translation c Rn involved. Then we have the model ∈

A = BQ + ecT , e = (1, 1,..., 1)T Rm, ∈ where we now want to estimate also the translation vector c Rn. The problem now is ∈ T T min A BQ ec F subject toQ Q = I (8.6.64) Q,c k − − k and det(Q) = 1. For any Q including the optimal Q we do not yet know, the best least squares estimate of c is characterized by the condition that the residual is orthogonal to e. Multiplying by eT we obtain

0 = e^T (A − BQ − ec^T) = e^T A − (e^T B)Q − mc^T,

where e^T A/m and e^T B/m are the mean values of the rows in A and B, respectively. Hence, the optimal translation satisfies

c = \frac{1}{m} (A^T e − Q^T B^T e).     (8.6.65)

Substituting this expression into (8.6.64) we can eliminate c and the problem becomes

min_Q ||Ã − B̃Q||_F,

where

Ã = A − \frac{1}{m} ee^T A,   B̃ = B − \frac{1}{m} ee^T B.

This is now a standard orthogonal Procrustes problem and the solution is obtained from the SVD of B̃^T Ã. If this matrix is close to an orthogonal matrix, then an iterative method for computing the polar decomposition can be used. Such methods are developed in Section 10.8.4.

Review Questions 7.1 What is meant by a saddle-point system? Which two optimization problems give rise to saddle-point systems?

Problems 7.1 Consider the overdetermined linear system Ax = b in Example 8.2.4. Assume that ǫ2 u, where u is the unit roundoff, so that fl(1 + ǫ2) = 1. ≤ 1 2 1 (a) Show that the condition number of A is κ = ǫ− √3 + ǫ ǫ− √3. ≈ (b) Show that if no other rounding errors are made then the maximum devia- tion from orthogonality of the columns computed by CGS and MGS, respec- tively, are

T T ǫ κ CGS : q q2 = 1/2, MGS : q q1 = u. | 3 | | 3 | √6 ≤ 3√3 Note that for CGS orthogonality has been completely lost! 7.2 (Stewart and Stewart [553]). If properly implemented, hyperbolic Householder transformations have the same good stability as the mixed scheme of hyperbolic rotations. (a) Show that the hyperbolic rotations G˘ can be rewritten as

1 1 s 1 0 1 t G˘ = − = + ( t s/t ) , t = √1 c, c s 1 0 1 c s/t − −  −   −   −  which now has the form of a hyperbolic Householder transformation. If G˘ is J-orthogonal so is

1 0 1 t JG˘ = + ( t s/t ) , t = √1 c, 0 1 c s/t − −     8.7. The Total Least Squares Problem 335

(b) Show that the transformation can be computed from

$$J\breve{G}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \gamma\begin{pmatrix} 1 \\ s/(1 - c) \end{pmatrix}, \qquad \gamma = \frac{1}{c}\bigl((1 - c)x - sy\bigr).$$

7.3 Assume that $A \in \mathbb{R}^{m\times m}$ is symmetric and positive definite and $B \in \mathbb{R}^{m\times n}$ is a matrix with full column rank. Show that

$$M = \begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ B^TA^{-1} & I \end{pmatrix}\begin{pmatrix} A & 0 \\ 0 & -S \end{pmatrix}\begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix},$$
where $S = B^TA^{-1}B$ is the Schur complement (cf. (7.1.19)). Conclude that $M$ is indefinite! ($M$ is called a saddle point matrix.)

8.7 The Total Least Squares Problem

8.7.1 Total Least Squares and the SVD

In the standard linear model (8.1.13) it is assumed that the vector $b \in \mathbb{R}^m$ is related to the unknown parameter vector $x \in \mathbb{R}^n$ by a linear relation $Ax = b + e$, where $A \in \mathbb{R}^{m\times n}$ is an exactly known matrix and $e$ a vector of random errors. If the components of $e$ are uncorrelated, have zero means and the same variance, then by the Gauss–Markov theorem (Theorem 8.1.6) the best unbiased estimate of $x$ is obtained by solving the least squares problem

$$\min_x \|r\|_2, \quad Ax = b + r. \qquad (8.7.1)$$
The assumption in the least squares problem that all errors are confined to the right hand side $b$ is frequently unrealistic, and sampling or modeling errors often will affect also the matrix $A$. In the errors-in-variables model it is assumed that a linear relation $(A + E)x = b + r$ holds, where the rows of the errors $(E, r)$ are independently and identically distributed with zero mean and the same variance. If this assumption is not satisfied it might be possible to find scaling matrices $D = \mathrm{diag}(d_1,\dots,d_m)$ and $T = \mathrm{diag}(t_1,\dots,t_{n+1})$ such that $D(A, b)T$ satisfies this assumption. Estimates of the unknown parameters $x$ in this model can be obtained from the solution of the total least squares (TLS) problem. The term "total least squares problem" was coined by Golub and Van Loan in [276]. The concept has been independently developed in statistics, where it is known as "latent root regression".

$$\min_{E,\,r} \|(r, E)\|_F, \quad (A + E)x = b + r, \qquad (8.7.2)$$
where $\|\cdot\|_F$ denotes the Frobenius norm defined by
$$\|A\|_F^2 = \sum_{i,j} a_{ij}^2 = \mathrm{trace}(A^TA).$$

The constraint in (8.7.2) implies that $b + r \in \mathcal{R}(A + E)$. Thus, the total least squares problem is equivalent to the problem of finding the "nearest" compatible linear system, where the distance is measured by the Frobenius norm. If a minimizing perturbation $(E, r)$ has been found for the problem (8.7.2) then any $x$ satisfying $(A + E)x = b + r$ is said to solve the TLS problem.
The TLS solution will depend on the scaling of the data $(A, b)$. In the following we assume that this scaling has been carried out in advance, so that any statistical knowledge of the perturbations has been taken into account. In particular, the TLS solution depends on the relative scaling of $A$ and $b$. If we scale $x$ and $b$ by a factor $\gamma$ we obtain the scaled TLS problem

$$\min_{E,\,r} \|(E, \gamma r)\|_F, \quad (A + E)x = b + r.$$
Clearly, when $\gamma$ is small perturbations in $b$ will be favored. In the limit when $\gamma \to 0$ we get the ordinary least squares problem. Similarly, when $\gamma$ is large perturbations in $A$ will be favored. In the limit when $1/\gamma \to 0$, this leads to the data least squares (DLS) problem

$$\min_E \|E\|_F, \quad (A + E)x = b, \qquad (8.7.3)$$
where it is assumed that the errors in the data are confined to the matrix $A$.
In the following we assume that $b \notin \mathcal{R}(A)$, for otherwise the system is consistent. The constraint in (8.7.2) can be written

$$( b + r \quad A + E )\begin{pmatrix} -1 \\ x \end{pmatrix} = 0.$$
This constraint is satisfied if the matrix $( b + r \quad A + E )$ is rank deficient and $(-1,\ x^T)^T$ lies in its null space. Hence, the TLS problem involves finding a perturbation matrix having minimal Frobenius norm which lowers the rank of the matrix $( b \ \ A )$. The total least squares problem can be analyzed in terms of the SVD

$$( b \ \ A ) = U\Sigma V^T = \sum_{i=1}^{n+1} \sigma_i u_i v_i^T, \qquad (8.7.4)$$
where $\sigma_1 \ge \dots \ge \sigma_n \ge \sigma_{n+1} \ge 0$ are the singular values of $( b \ \ A )$. By the minimax characterization of singular values (Theorem 8.1.17) the singular values $\hat{\sigma}_i$ of $A$ interlace those of $( b \ \ A )$, that is

$$\sigma_1 \ge \hat{\sigma}_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge \hat{\sigma}_n \ge \sigma_{n+1}. \qquad (8.7.5)$$

Assume first that $\hat{\sigma}_n > \sigma_{n+1}$. Then it follows that $\mathrm{rank}(A) = n$ and, by (8.7.5), $\sigma_n > \sigma_{n+1}$. If $\sigma_{n+1} = 0$, then $Ax = b$ is consistent; otherwise, by Theorem 8.1.18, the unique perturbation of minimum norm $\|( r \ \ E )\|_F$ that makes $(A + E)x = b + r$ consistent is the rank one perturbation

$$( r \ \ E ) = -\sigma_{n+1} u_{n+1} v_{n+1}^T, \qquad (8.7.6)$$

for which $\min_{E,\,r} \|( r \ \ E )\|_F = \sigma_{n+1}$. Multiplying (8.7.6) from the right with $v_{n+1}$ gives
$$( b \ \ A )\, v_{n+1} = -( r \ \ E )\, v_{n+1}. \qquad (8.7.7)$$
Writing the relation $(A + E)x = b + r$ in the form

$$( b \ \ A )\begin{pmatrix} 1 \\ -x \end{pmatrix} = -( r \ \ E )\begin{pmatrix} 1 \\ -x \end{pmatrix}$$
and comparing with (8.7.7), it is easily seen that the TLS solution can be written in terms of the right singular vector $v_{n+1}$ as

$$x = -\frac{1}{\omega}\, y, \qquad v_{n+1} = \begin{pmatrix} \omega \\ y \end{pmatrix}. \qquad (8.7.8)$$
If $\omega = 0$ then the TLS problem has no solution. From (8.7.4) it follows that $( b \ \ A )^TU = V\Sigma^T$, and taking the $(n+1)$st column of both sides gives

$$\begin{pmatrix} b^T \\ A^T \end{pmatrix} u_{n+1} = \sigma_{n+1} v_{n+1}. \qquad (8.7.9)$$
Hence, if $\sigma_{n+1} > 0$, then $\omega = 0$ if and only if $b \perp u_{n+1}$. (This case can only occur when $\hat{\sigma}_n = \sigma_{n+1}$, since otherwise the TLS problem has a unique solution.) The case when $b \perp u_{n+1}$ is called nongeneric. It can be treated by adding constraints on the solution; see the discussion in [587].
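In the generic case the TLS solution is thus read off directly from the last right singular vector of the compound matrix $( b \ \ A )$. A minimal NumPy sketch of this computation follows; the function name, the tolerance used to flag a nongeneric problem, and the random test data are assumptions of ours.

```python
import numpy as np

def tls_solve(A, b):
    """Generic total least squares solution via the SVD of (b, A),
    cf. (8.7.4) and (8.7.8).  Minimal sketch; the nongeneric case is
    only detected, not treated."""
    C = np.column_stack((b, A))          # the compound matrix ( b  A )
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1, :]                        # right singular vector v_{n+1}
    omega, y = v[0], v[1:]
    if abs(omega) < 1e-12 * np.linalg.norm(v):
        raise ValueError("nongeneric TLS problem: omega is (nearly) zero")
    return -y / omega                    # x_TLS = -(1/omega) y

# Example usage on random data (assumed test setup):
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
x_exact = np.array([1.0, -2.0, 0.5])
b = A @ x_exact + 1e-3 * rng.standard_normal(10)
print(tls_solve(A, b))                   # close to x_exact
```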

Example 8.7.1. For
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad E = \begin{pmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & 0 \end{pmatrix}, \qquad (8.7.10)$$
the system $(A + E)x = b$ is consistent for any $\epsilon > 0$. There is no smallest value of $\epsilon$, and $\|x\|_2 \to \infty$ when $\epsilon \to 0$; the TLS problem fails to have a finite solution. Here $A$ is singular, $\hat{\sigma}_2 = \sigma_3 = 0$ and $b \perp u_3 = e_3$.

Suppose now that $\sigma_{n+1}$ is a repeated singular value,

$$\sigma_p > \sigma_{p+1} = \cdots = \sigma_{n+1}, \qquad p < n.$$

Set $V_2 = (v_{p+1},\dots,v_{n+1})$, and let $V_2z$ be any unit vector in the subspace spanned by the right singular vectors corresponding to the multiple minimal singular value. Then any vector $x$ such that

$$x = -\frac{1}{\omega}\, y, \qquad V_2z = \begin{pmatrix} \omega \\ y \end{pmatrix},$$
is a TLS solution. A unique TLS solution of minimum norm can be obtained as follows. Since $V_2z$ has unit length, minimizing $\|x\|_2$ is equivalent to choosing the

unit vector $z$ to maximize $\omega = e_1^TV_2z$. We take $z = Qe_1$, where $Q$ is a Householder transformation such that
$$V_2Q = \begin{pmatrix} \omega & 0 \\ y & V_2' \end{pmatrix}.$$
Then a TLS solution of minimum norm is given by (8.7.8). If $\omega = 0$ there is no solution and the problem is nongeneric. By an argument similar to the case when $p = n$ this can only happen if $b \perp u_j$, $j = p : n$.

8.7.2 Conditioning of the TLS Problem

We now consider the conditioning of the total least squares problem and its relation to the least squares problem. We denote these solutions by $x_{TLS}$ and $x_{LS}$, respectively.
In Section 7.1.6 we showed that the SVD of a matrix $A$ is related to the eigenvalue problem for the symmetric matrix $A^TA$. From this it follows that in the generic case the TLS solution can also be characterized by

$$\begin{pmatrix} b^T \\ A^T \end{pmatrix}( b \ \ A )\begin{pmatrix} 1 \\ -x \end{pmatrix} = \sigma_{n+1}^2\begin{pmatrix} 1 \\ -x \end{pmatrix}, \qquad (8.7.11)$$
i.e., $\begin{pmatrix} 1 \\ -x \end{pmatrix}$ is an eigenvector corresponding to the smallest eigenvalue $\lambda_{n+1} = \sigma_{n+1}^2$ of the matrix obtained by "squaring" $( b \ \ A )$. From the properties of the Rayleigh quotient of symmetric matrices (see Section 9.3.4) it follows that $x_{TLS}$ is characterized by minimizing

$$\rho(x) = \frac{(b - Ax)^T(b - Ax)}{x^Tx + 1} = \frac{\|b - Ax\|_2^2}{\|x\|_2^2 + 1}. \qquad (8.7.12)$$
Thus, whereas the LS solution minimizes $\|b - Ax\|_2^2$, the TLS solution minimizes the "orthogonal distance" function $\rho(x)$ in (8.7.12). From the last block row of (8.7.11) it follows that

$$(A^TA - \sigma_{n+1}^2 I)\, x_{TLS} = A^Tb. \qquad (8.7.13)$$
Note that if $\hat{\sigma}_n > \sigma_{n+1}$ then the matrix $(A^TA - \sigma_{n+1}^2 I)$ is symmetric positive definite, which ensures that the TLS problem has a unique solution. This can be compared with the corresponding normal equations for the least squares solution,

$$A^TA\, x_{LS} = A^Tb. \qquad (8.7.14)$$

In (8.7.13) a positive multiple of the unit matrix is subtracted from the matrix $A^TA$ of the normal equations. Thus, TLS can be considered as a deregularizing procedure. (Compare Section 8.1.5, where a multiple of the unit matrix was added to improve the conditioning.) Hence, the TLS solution is always worse conditioned than the corresponding LS problem. From a statistical point of view this can be interpreted as removing the bias by subtracting the error covariance matrix (estimated by $\sigma_{n+1}^2 I$) from the data covariance matrix $A^TA$.
Subtracting (8.7.14) from (8.7.13) we get
$$x_{TLS} - x_{LS} = \sigma_{n+1}^2\,(A^TA - \sigma_{n+1}^2 I)^{-1} x_{LS}.$$
Taking norms we obtain
$$\frac{\|x_{TLS} - x_{LS}\|_2}{\|x_{LS}\|_2} \le \frac{\sigma_{n+1}^2}{\hat{\sigma}_n^2 - \sigma_{n+1}^2}.$$
It can be shown that an approximate condition number for the TLS solution is
$$\kappa_{TLS} \approx \frac{\hat{\sigma}_1}{\hat{\sigma}_n - \sigma_{n+1}} = \kappa(A)\,\frac{\hat{\sigma}_n}{\hat{\sigma}_n - \sigma_{n+1}}. \qquad (8.7.15)$$
When $\hat{\sigma}_n - \sigma_{n+1} \ll \hat{\sigma}_n$ the TLS condition number can be much worse than for the LS problem.
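The relations (8.7.8), (8.7.13), and (8.7.14) are easy to check numerically. The following sketch builds a small random problem (our own test data), computes $x_{TLS}$ from the SVD of $( b \ \ A )$, and verifies that it satisfies the deregularized normal equations while differing slightly from $x_{LS}$.

```python
import numpy as np

# Numerical check of the deregularized normal equations (8.7.13):
# the SVD-based TLS solution satisfies (A^T A - sigma_{n+1}^2 I) x = A^T b,
# while the LS solution satisfies A^T A x = A^T b.  Test data are assumed.
rng = np.random.default_rng(2)
m, n = 30, 4
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 1e-2 * rng.standard_normal(m)

U, s, Vt = np.linalg.svd(np.column_stack((b, A)))
sig = s[-1]                                   # sigma_{n+1}
v = Vt[-1, :]
x_tls = -v[1:] / v[0]                         # (8.7.8)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

lhs = (A.T @ A - sig**2 * np.eye(n)) @ x_tls  # left-hand side of (8.7.13)
print(np.allclose(lhs, A.T @ b))              # True, up to roundoff
print(np.linalg.norm(x_tls - x_ls))           # TLS and LS differ slightly
```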

Example 8.7.2. Consider the overdetermined system

$$\begin{pmatrix} \hat{\sigma}_1 & 0 \\ 0 & \hat{\sigma}_2 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \beta \end{pmatrix}. \qquad (8.7.16)$$
Trivially, the LS solution is $x_{LS} = (c_1/\hat{\sigma}_1,\ c_2/\hat{\sigma}_2)^T$, $\|r_{LS}\|_2 = |\beta|$. If we take $\hat{\sigma}_1 = c_1 = 1$, $\hat{\sigma}_2 = c_2 = 10^{-6}$, then $x_{LS} = (1, 1)^T$ is independent of $\beta$, and hence does not reflect the ill-conditioning of $A$. However,
$$\kappa_{LS}(A, b) = \kappa(A)\Bigl(1 + \frac{\|r_{LS}\|_2}{\hat{\sigma}_1\|x_{LS}\|_2}\Bigr)$$
will increase proportionally to $\beta$. The TLS solution is of similar size as the LS solution as long as $|\beta| \le \hat{\sigma}_2$. However, when $|\beta| \gg \hat{\sigma}_2$ then $\|x_{TLS}\|_2$ becomes large. In Figure 8.7.1 the two condition numbers are plotted as a function of $\beta \in [10^{-8}, 10^{-4}]$. For $\beta > \hat{\sigma}_2$ the condition number $\kappa_{TLS}$ grows proportionally to $\beta^2$. It can be verified that $\|x_{TLS}\|_2$ also grows proportionally to $\beta^2$.
Setting $c_1 = c_2 = 0$ gives $x_{LS} = 0$. If $|\beta| \ge \sigma_2(A)$, then $\sigma_2(A) = \sigma_3(A, b)$ and the TLS problem is nongeneric.
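The growth of the two condition numbers in Example 8.7.2 can be observed with a few lines of code. The sketch below evaluates $\kappa_{LS}(A, b)$ and the approximation (8.7.15) to $\kappa_{TLS}$ for a few values of $\beta$; the particular $\beta$ values and the printout format are our own choices.

```python
import numpy as np

# Compare kappa_LS(A, b) and the approximation (8.7.15) to kappa_TLS
# for the 3-by-2 system (8.7.16) as beta grows.
sig1, sig2 = 1.0, 1e-6
c1, c2 = 1.0, 1e-6
A = np.array([[sig1, 0.0], [0.0, sig2], [0.0, 0.0]])

for beta in [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]:
    b = np.array([c1, c2, beta])
    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
    r_ls = np.linalg.norm(b - A @ x_ls)
    kappa_A = sig1 / sig2
    kappa_ls = kappa_A * (1.0 + r_ls / (sig1 * np.linalg.norm(x_ls)))
    s = np.linalg.svd(np.column_stack((b, A)), compute_uv=False)
    kappa_tls = sig1 / (sig2 - s[-1])          # approximation (8.7.15)
    print(f"beta = {beta:.0e}:  kappa_LS = {kappa_ls:.2e},  kappa_TLS = {kappa_tls:.2e}")
```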

8.7.3 Some Generalized TLS Problems

We now consider the more general TLS problem with $d > 1$ right-hand sides,

$$\min_{E,\,F} \|( E \ \ F )\|_F, \quad (A + E)X = B + F, \qquad (8.7.17)$$

where $B \in \mathbb{R}^{m\times d}$. The consistency relations can be written
$$( B + F \quad A + E )\begin{pmatrix} I_d \\ -X \end{pmatrix} = 0.$$

Thus, we now seek perturbations $(E, F)$ that reduce the rank of the matrix $( B \ \ A )$ by $d$. We call this a multidimensional TLS problem. As remarked before, for this problem to be meaningful the rows of the error matrix $( B + F \quad A + E )$ should be independently and identically distributed with zero mean and the same variance.
In contrast to the usual least squares problem, the multidimensional TLS problem is different from separately solving $d$ one-dimensional TLS problems with right-hand sides $b_1,\dots,b_d$. This is because in the multidimensional problem we require that the matrix $A$ be similarly perturbed for all right-hand sides. This should give an improved predictive power of the TLS solution.
The solution to the TLS problem with multiple right-hand sides can be expressed in terms of the SVD

$$( B \ \ A ) = U\Sigma V^T = U_1\Sigma_1V_1^T + U_2\Sigma_2V_2^T, \qquad (8.7.18)$$
where $\Sigma_1 = \mathrm{diag}(\sigma_1,\dots,\sigma_n)$, $\Sigma_2 = \mathrm{diag}(\sigma_{n+1},\dots,\sigma_{n+d})$, and $U$ and $V$ are partitioned conformally with $( B \ \ A )$. Assuming that $\sigma_n > \sigma_{n+1}$, the minimizing perturbation is unique and given by the rank $d$ matrix
$$( F \ \ E ) = -U_2\Sigma_2V_2^T = -( B \ \ A )\, V_2V_2^T,$$
for which $\|( F \ \ E )\|_F^2 = \sum_{j=1}^d \sigma_{n+j}^2$ and $( B + F \quad A + E )\, V_2 = 0$. Assume that
$$V_2 = \begin{pmatrix} V_{12} \\ V_{22} \end{pmatrix},$$
with $V_{12} \in \mathbb{R}^{d\times d}$ nonsingular. Then the solution to the TLS problem is unique and given by
$$X = -V_{22}V_{12}^{-1} \in \mathbb{R}^{n\times d}.$$
We show that if $\sigma_n(A) > \sigma_{n+1}( B \ \ A )$, then $V_{12}$ is nonsingular. From (8.7.18) it follows that $BV_{12} + AV_{22} = U_2\Sigma_2$. Now, suppose that $V_{12}$ is singular. Then $V_{12}x = 0$ for some unit vector $x$, and it follows that $U_2\Sigma_2x = AV_{22}x$. From $V_2^TV_2 = V_{12}^TV_{12} + V_{22}^TV_{22} = I$ it follows that $V_{22}^TV_{22}x = x$ and hence $\|V_{22}x\|_2 = 1$. But then
$$\sigma_{n+1}( B \ \ A ) \ge \|U_2\Sigma_2x\|_2 = \|AV_{22}x\|_2 \ge \sigma_n(A),$$
a contradiction. Hence, $V_{12}$ is nonsingular.
From the above characterization it follows that the TLS solution satisfies
$$\begin{pmatrix} B^T \\ A^T \end{pmatrix}( B \ \ A )\begin{pmatrix} I_d \\ -X \end{pmatrix} = \begin{pmatrix} I_d \\ -X \end{pmatrix} C, \qquad (8.7.19)$$
where
$$C = V_{22}\Sigma_2^2V_{22}^{-1} \in \mathbb{R}^{d\times d}. \qquad (8.7.20)$$
Note that $C$ is symmetrizable but not symmetric! Multiplying (8.7.19) from the left with $( I_d \quad -X^T )$ gives
$$(B - AX)^T(B - AX) = (X^TX + I_d)\,C,$$

and if $(X^TX + I_d)$ is nonsingular,

$$C = (X^TX + I_d)^{-1}(B - AX)^T(B - AX). \qquad (8.7.21)$$

The multidimensional TLS solution $X_{TLS}$ minimizes $\|C\|_F$, which generalizes the result for $d = 1$.
The last block component of (8.7.19) reads

$$A^TAX - XC = A^TB,$$
which is a Sylvester equation for $X$. This has a unique solution if and only if $A^TA$ and $C$ have no common eigenvalues, which is the case if $\hat{\sigma}_n > \sigma_{n+1}$.
Now assume that $\sigma_k > \sigma_{k+1} = \cdots = \sigma_{n+1}$, $k < n$, and set $V_2 = (v_{k+1},\dots,v_{n+d})$. Let $Q$ be a product of Householder transformations such that

$$V_2Q = \begin{pmatrix} \Gamma & 0 \\ Z & Y \end{pmatrix},$$
where $\Gamma \in \mathbb{R}^{d\times d}$ is lower triangular. If $\Gamma$ is nonsingular, then the TLS solution of minimum norm is given by
$$X = -Z\Gamma^{-1}.$$
In many parameter estimation problems, some of the columns are known exactly. It is no restriction to assume that the error-free columns are in leading positions in $A$. In the multivariate version of this mixed LS-TLS problem one has a linear relation

$$(A_1, \ A_2 + E_2)X = B + F, \qquad A_1 \in \mathbb{R}^{m\times n_1},$$
where $A = (A_1, A_2) \in \mathbb{R}^{m\times n}$, $n = n_1 + n_2$. It is assumed that the rows of the errors $(E_2, F)$ are independently and identically distributed with zero mean and the same variance. The mixed LS-TLS problem can then be expressed as

$$\min_{E_2,\,F} \|(E_2, F)\|_F, \quad (A_1, \ A_2 + E_2)X = B + F. \qquad (8.7.22)$$

When $A_2$ is empty, this reduces to solving an ordinary least squares problem. When $A_1$ is empty this is the standard TLS problem. Hence, this mixed problem includes both extreme cases. The solution of the mixed LS-TLS problem can be obtained by first computing a QR factorization of $A$ and then solving a TLS problem of reduced dimension.

Algorithm 8.11. Mixed LS-TLS problem.

Let $A = (A_1, A_2) \in \mathbb{R}^{m\times n}$, $n = n_1 + n_2$, $m \ge n$, and $B \in \mathbb{R}^{m\times d}$. Assume that the columns of $A_1$ are linearly independent. Then the following algorithm solves the mixed LS-TLS problem (8.7.22).

Step 1. Compute the QR factorization
$$(A_1, \ A_2, \ B) = Q\begin{pmatrix} R \\ 0 \end{pmatrix}, \qquad R = \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix},$$
where $Q$ is orthogonal, and $R_{11} \in \mathbb{R}^{n_1\times n_1}$, $R_{22} \in \mathbb{R}^{(n_2+d)\times(n_2+d)}$ are upper triangular. If $n_1 = n$, then the solution $X$ is obtained by solving $R_{11}X = R_{12}$ (usual least squares); otherwise continue (solve a reduced TLS problem).

Step 2. Compute the SVD of $R_{22}$,
$$R_{22} = U\Sigma V^T, \qquad \Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_{n_2+d}),$$
where the singular values are ordered in decreasing order of magnitude.

Step 3a. Determine $k \le n_2$ such that
$$\sigma_k > \sigma_{k+1} = \cdots = \sigma_{n_2+d} \ge 0,$$
and set $V_{22} = (v_{k+1},\dots,v_{n_2+d})$. If $n_1 > 0$ then compute $V_2$ by back-substitution from
$$R_{11}V_{12} = -R_{12}V_{22}, \qquad V_2 = \begin{pmatrix} V_{12} \\ V_{22} \end{pmatrix};$$
else set $V_2 = V_{22}$.

Step 3b. Perform Householder transformations such that
$$V_2Q = \begin{pmatrix} \Gamma & 0 \\ Z & Y \end{pmatrix},$$
where $\Gamma \in \mathbb{R}^{d\times d}$ is upper triangular. If $\Gamma$ is nonsingular then the solution is
$$X = -Z\Gamma^{-1}.$$
Otherwise the TLS problem is nongeneric and has no solution.

Note that the QR factorization in the first step would be the first step in computing the SVD of A.
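A compact NumPy sketch of Algorithm 8.11 is given below for the generic case in which the $d$ smallest singular values of $R_{22}$ are strictly separated from the rest (so $k = n_2$ and the Householder step 3b reduces to splitting off the trailing $d$ rows). The function name and interface are ours, and no attempt is made to detect nongeneric or rank-deficient cases.

```python
import numpy as np

def mixed_ls_tls(A1, A2, B):
    """Sketch of Algorithm 8.11 (mixed LS-TLS), generic case only.
    Assumes m >= n + d and that A1 has full column rank."""
    m, n1 = A1.shape
    n2, d = A2.shape[1], B.shape[1]
    # Step 1: QR factorization of the compound matrix (A1, A2, B).
    Q, R = np.linalg.qr(np.hstack((A1, A2, B)))
    R11, R12 = R[:n1, :n1], R[:n1, n1:]
    R22 = R[n1:, n1:]
    if n2 == 0:
        # Pure least squares: R11 X = R12 (= Q1^T B).
        return np.linalg.solve(R11, R12)
    # Step 2: right singular vectors of R22 for the d smallest singular values.
    _, _, Vt = np.linalg.svd(R22)
    V22 = Vt[n2:, :].T                      # (n2 + d) x d
    # Step 3a: recover the A1-part of the null-space basis by back substitution.
    if n1 > 0:
        V12 = np.linalg.solve(R11, -R12 @ V22)
        V2 = np.vstack((V12, V22))          # (n + d) x d
    else:
        V2 = V22
    # Step 3b (generic case): the trailing d rows correspond to the B-block.
    Z, Gamma = V2[: n1 + n2, :], V2[n1 + n2 :, :]
    return -Z @ np.linalg.inv(Gamma)        # X = -Z Gamma^{-1}

# With A1 empty this reduces to a standard multi-RHS TLS solve (assumed data):
rng = np.random.default_rng(3)
A = rng.standard_normal((15, 4))
X_true = rng.standard_normal((4, 2))
B = A @ X_true + 1e-3 * rng.standard_normal((15, 2))
X = mixed_ls_tls(np.empty((15, 0)), A, B)
print(np.linalg.norm(X - X_true))           # small
```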

8.7.4 Bidiagonalization and TLS Problems

One way to avoid the complications of nongeneric problems is to compute a regular core TLS problem by bidiagonalization of the matrix $( b \ \ A )$. Consider the TLS problem
$$\min_{E,\,r} \|(E, r)\|_F, \quad (A + E)x = b + r.$$
It was shown in Section 8.4.5 that we can always find square orthogonal matrices $\widetilde{U}_{k+1}$ and $\widetilde{V}_k = P_1P_2\cdots P_k$ such that
$$\widetilde{U}_{k+1}^T\,( b \quad A\widetilde{V}_k ) = \begin{pmatrix} \beta_1e_1 & B_k & 0 \\ 0 & 0 & A_k \end{pmatrix}, \qquad (8.7.23)$$
where
$$B_k = \begin{pmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \ddots & \ddots & \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{pmatrix} \in \mathbb{R}^{(k+1)\times k},$$
and
$$\beta_j\alpha_j \ne 0, \qquad j = 1 : k. \qquad (8.7.24)$$
Setting $x = \widetilde{V}_k\begin{pmatrix} y \\ z \end{pmatrix}$, the approximation problem $Ax \approx b$ then decomposes into the two subproblems
$$B_ky \approx \beta_1e_1, \qquad A_kz \approx 0.$$
It seems reasonable to simply take $z = 0$, and separately solve the first subproblem, which is the minimally dimensioned core subproblem. Setting
$$V_k = \widetilde{V}_k\begin{pmatrix} I_k \\ 0 \end{pmatrix}, \qquad U_{k+1} = \widetilde{U}_{k+1}\begin{pmatrix} I_{k+1} \\ 0 \end{pmatrix},$$
it follows that
$$( b \quad AV_k ) = U_{k+1}\,( \beta_1e_1 \quad B_k ).$$
If $x = V_ky \in \mathcal{R}(V_k)$ then
$$(A + E)x = (A + E)V_ky = (U_{k+1}B_k + EV_k)y = \beta_1U_{k+1}e_1 + r.$$

Hence, the consistency relation $(A + E)x = b + r$ becomes
$$(B_k + F)y = \beta_1e_1 + s, \qquad F = U_{k+1}^TEV_k, \quad s = U_{k+1}^Tr. \qquad (8.7.25)$$

Using the orthogonality of $U_{k+1}$ and $V_k$ it follows that
$$\|(E, r)\|_F = \|(F, s)\|_F. \qquad (8.7.26)$$
Hence, to minimize $\|(E, r)\|_F$ we should take $y_k$ to be the solution to the TLS core subproblem
$$\min_{F,\,s} \|(F, s)\|_F, \quad (B_k + F)y = \beta_1e_1 + s. \qquad (8.7.27)$$
From (8.7.24) and Theorem 8.4.3 it follows that the singular values of the matrix $B_k$ are simple and that the right hand side $\beta_1e_1$ has nonzero components along each left singular vector. This TLS problem therefore must have a unique solution. Note that we can assume that $\beta_{k+1} \ne 0$, since otherwise the system is compatible.
To solve this subproblem we need to compute the SVD of the bidiagonal matrix

$$(\beta_1e_1,\ B_k) = \begin{pmatrix} \beta_1 & \alpha_1 & & & \\ & \beta_2 & \alpha_2 & & \\ & & \beta_3 & \ddots & \\ & & & \ddots & \alpha_k \\ & & & & \beta_{k+1} \end{pmatrix} \in \mathbb{R}^{(k+1)\times(k+1)}. \qquad (8.7.28)$$

The SVD of this matrix

$$(\beta_1e_1,\ B_k) = P\,\mathrm{diag}(\sigma_1,\dots,\sigma_{k+1})\,Q^T, \qquad P, Q \in \mathbb{R}^{(k+1)\times(k+1)},$$
can be computed, e.g., by the implicit QR-SVD algorithm; see Section 9.7.6. (Note that the first stage in this is a transformation to bidiagonal form, so the work in performing the reduction (8.7.23) has not been wasted!) Then, with

$$q_{k+1} = Qe_{k+1} = \begin{pmatrix} \omega \\ z \end{pmatrix},$$
it is always the case that $\omega \ne 0$, and the solution to the original TLS problem equals
$$x_{TLS} = V_ky = -\omega^{-1}V_kz.$$
Further, the norm of the perturbation equals

$$\min_{E,\,r} \|(E, r)\|_F = \sigma_{k+1}.$$
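Assuming the bidiagonal data $\beta_1, \alpha_j, \beta_j$ of the reduction (8.7.23) have already been computed (that reduction is not implemented here), the core subproblem (8.7.27) can be solved from the SVD of (8.7.28) as sketched below. A dense SVD is used in place of the implicit QR-SVD algorithm mentioned above, and the function name and sample coefficients are our own.

```python
import numpy as np

def core_tls_solve(beta1, alphas, betas):
    """Solve the core TLS subproblem (8.7.27) given the bidiagonal data of
    (8.7.23): beta1 = beta_1, alphas = (alpha_1,...,alpha_k),
    betas = (beta_2,...,beta_{k+1}).  Illustrative sketch only."""
    k = len(alphas)
    # Assemble the (k+1) x (k+1) upper bidiagonal matrix (beta_1 e_1, B_k)
    # of (8.7.28): diagonal (beta_1,...,beta_{k+1}), superdiagonal (alpha_1,...,alpha_k).
    C = np.zeros((k + 1, k + 1))
    C[0, 0] = beta1
    for j in range(k):
        C[j, j + 1] = alphas[j]       # alpha_{j+1} on the superdiagonal
        C[j + 1, j + 1] = betas[j]    # beta_{j+2} on the diagonal
    _, s, Vt = np.linalg.svd(C)
    q = Vt[-1, :]                     # q_{k+1} = Q e_{k+1}
    omega, z = q[0], q[1:]
    y = -z / omega                    # core solution; omega != 0 by (8.7.24)
    return y, s[-1]                   # s[-1] = sigma_{k+1}, the perturbation norm

# Example with made-up bidiagonal coefficients (k = 3):
y, sigma = core_tls_solve(1.0, alphas=[2.0, 1.5, 1.0], betas=[0.5, 0.3, 0.1])
print(y, sigma)
```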

8.7.5 Solving Overdetermined Systems in the $l_p$ Sense

In some applications it might be more adequate to minimize some $l_p$-norm of the residual vector other than the $l_2$ norm. We consider the problem of minimizing
$$\|r\|_p = \Bigl(\sum_{i=1}^m |r_i|^p\Bigr)^{1/p}, \qquad 1 \le p \le \infty. \qquad (8.7.29)$$
For $1 < p < \infty$ this problem is strictly convex and hence has exactly one solution. For $p = 1$ the solution may not be unique. Minimization in the $l_1$-norm or $l_\infty$-norm is complicated by the fact that the function $f(x) = \|Ax - b\|_p$ is only piecewise differentiable when $p = 1$ or $p = \infty$.

Example 8.7.3. To illustrate the effect of using a different norm we consider the problem of estimating the scalar $x$ from $m$ observations $b \in \mathbb{R}^m$. This is equivalent to minimizing $\|Ax - b\|_p$, with $A = e = (1, 1,\dots,1)^T$. It is easily verified that if $b_1 \ge b_2 \ge \dots \ge b_m$, then the solutions $x_p$ for some different values of $p$ are

$$x_1 = b_{(m+1)/2} \ (m \text{ odd}), \qquad x_2 = \frac{1}{m}(b_1 + b_2 + \dots + b_m), \qquad x_\infty = \frac{1}{2}(b_1 + b_m).$$
These estimates correspond to the median, mean, and midrange, respectively. Note that the estimate $x_1$ is insensitive to the extreme values of $b_i$, while $x_\infty$ depends only on the extreme values. The $l_\infty$ solution has the property that the absolute error in at least $n + 1$ equations equals the maximum error.
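These three estimates are immediate to compute; the snippet below (with made-up data containing one gross error) shows how the $l_1$ estimate resists the outlier while the $l_\infty$ estimate is dominated by it.

```python
import numpy as np

# Example 8.7.3 numerically: for the model x*e ~ b the l1, l2 and l_infinity
# estimates are the median, the mean, and the midrange of the data.
b = np.array([5.0, 3.0, 2.5, 2.0, 1.0, 0.5, -4.0])   # one outlier at -4
x1 = np.median(b)                    # l1 estimate
x2 = np.mean(b)                      # l2 (least squares) estimate
xinf = 0.5 * (b.max() + b.min())     # l_infinity estimate
print(x1, x2, xinf)                  # the l1 estimate is least affected by the outlier
```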

A similar effect is also achieved with $p$ greater than but close to 1. Approximation in the maximum norm is often called Chebyshev approximation. We avoid this terminology, since Chebyshev's name is also naturally associated with two entirely different approximation methods: interpolation in the Chebyshev abscissae, and expansion in a series of Chebyshev polynomials.
To determine the $l_1$ solution of an overdetermined linear system $Ax = b$, $A \in \mathbb{R}^{m\times n}$, we want to find a vector $x$ that minimizes
$$\phi(x) = \|Ax - b\|_1 = \sum_{i=1}^m |a_i^Tx - b_i|. \qquad (8.7.30)$$
Here $a_i^T$ denotes the $i$th row of $A$. This problem can be cast in the form

$$\text{minimize } e^Tw = \sum_{i=1}^m w_i \quad\text{subject to}\quad w_i \ge a_i^Tx - b_i \ \text{ and } \ w_i \ge b_i - a_i^Tx, \quad i = 1 : m,$$
where $e = (1,\dots,1)^T$. This is a linear programming problem with $2m$ linear inequalities. In principle this problem can be solved by the simplex method; see Section 9.3.3.
The simple example above illustrates that the $l_1$ norm of the residual vector has the advantage of giving a robust solution, i.e., a small number of isolated large errors will usually not have a big effect on the solution. More generally, in robust linear regression possible outliers among the data points are identified and given less weight. The most popular among the robust estimators is Huber's M-estimator. This uses the least squares estimator for "normal" data but the $l_1$ norm estimator for data points that disagree with the normal picture. More precisely, the Huber M-estimate minimizes the objective function

$$\psi(x) = \sum_{i=1}^m \rho\bigl(r_i(x)/\sigma\bigr), \qquad (8.7.31)$$
where
$$\rho(t) = \begin{cases} \tfrac{1}{2}t^2, & \text{if } |t| \le \gamma; \\ \gamma|t| - \tfrac{1}{2}\gamma^2, & \text{if } |t| > \gamma, \end{cases} \qquad (8.7.32)$$
$\gamma$ is a tuning parameter, and $\sigma$ a scaling factor which depends on the data to be estimated. In the following we assume that $\sigma$ is a constant, and then it is no restriction to take $\sigma = 1$. Like the $l_p$ estimator for $1 < p < 2$, the Huber estimator can be viewed as a compromise between $l_2$ and $l_1$ approximation. For large values of $\gamma$ it will be close to the least squares estimator; for small values of $\gamma$ it is close to the $l_1$ estimator.
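A direct transcription of (8.7.31) and (8.7.32) into NumPy is shown below; the function names and the choice of residual $r(x) = b - Ax$ are ours.

```python
import numpy as np

def huber_rho(t, gamma):
    """The Huber function rho of (8.7.32), vectorized over t."""
    t = np.asarray(t, dtype=float)
    quad = 0.5 * t**2                          # quadratic part, |t| <= gamma
    lin = gamma * np.abs(t) - 0.5 * gamma**2   # linear (l1-like) part, |t| > gamma
    return np.where(np.abs(t) <= gamma, quad, lin)

def huber_objective(x, A, b, gamma, sigma=1.0):
    """psi(x) of (8.7.31) for a linear model with residual r(x) = b - A x."""
    r = b - A @ x
    return huber_rho(r / sigma, gamma).sum()
```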

An efficient continuation algorithm for solving the $l_1$ problem, which uses a finite sequence of Huber estimates with decreasing parameter $\gamma$, has been developed by Madsen and Nielsen [422].
O'Leary [454] has studied different implementations of Newton-like methods. The Newton step $s$ for minimizing $\psi(x)$ in (8.7.31) (with $\sigma = 1$) is given by the solution of
$$A^TDAs = A^Ty,$$
where $y_i = \rho'(r_i)$, $D = \mathrm{diag}(\rho''(r_i))$, $i = 1 : m$. O'Leary recommends that in the initial iterations the cutoff value $\gamma$ in the Huber function (8.7.32) is decreased from a very large number to the desired value. This has the effect of starting the iteration from the least squares solution and helps prevent the occurrence of rank deficient Hessian matrices $H$.
Another approach to solve the $l_p$ norm problem when $1 < p < 3$ is the method of iteratively reweighted least squares (IRLS); see Osborne [458]. This amounts to solving a certain sequence of weighted least squares problems. We start by noting that, provided that $|r_i(x)| = |b - Ax|_i > 0$, $i = 1 : m$, the problem (8.7.29) can be restated in the form $\min_x \psi(x)$, where
$$\psi(x) = \sum_{i=1}^m |r_i(x)|^p = \sum_{i=1}^m |r_i(x)|^{p-2}\, r_i(x)^2. \qquad (8.7.33)$$
This can be interpreted as a weighted least squares problem
$$\min_x \|D(r)^{(p-2)/2}(b - Ax)\|_2, \qquad D(r) = \mathrm{diag}(|r|), \qquad (8.7.34)$$
where $\mathrm{diag}(|r|)$ denotes the diagonal matrix with $i$th component $|r_i|$. The diagonal weight matrix $D(r)^{(p-2)/2}$ in (8.7.34) depends on the unknown solution $x$, but we can attempt to use the following iterative method.

Algorithm 8.12. IRLS for $l_p$ Approximation, $1 < p < 2$.

Let $x^{(0)}$ be an initial approximation such that $r_i^{(0)} = (b - Ax^{(0)})_i \ne 0$, $i = 1,\dots,m$.

for $k = 0, 1, 2,\dots$
    $r_i^{(k)} = (b - Ax^{(k)})_i$;
    $D_k = \mathrm{diag}\bigl(|r_i^{(k)}|^{(p-2)/2}\bigr)$;
    solve $\delta x^{(k)}$ from
        $\min_{\delta x} \|D_k(r^{(k)} - A\,\delta x)\|_2$;
    $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$;
end

Since $D_kb = D_k(r^{(k)} + Ax^{(k)})$, it follows that $x^{(k+1)}$ in IRLS solves $\min_x \|D_k(b - Ax)\|_2$, but the implementation above is to be preferred. It has been assumed that

in the IRLS algorithm, at each iteration $r_i^{(k)} \ne 0$, $i = 1,\dots,m$. In practice this cannot be guaranteed, and it is customary to modify the algorithm so that

$$D_k = \mathrm{diag}\Bigl(\bigl(100\,u\,e + |r^{(k)}|\bigr)^{(p-2)/2}\Bigr),$$
where $u$ is the machine precision and $e^T = (1,\dots,1)$ is the vector of all ones. Because the weight matrix $D_k$ is not constant, the simplest implementations of IRLS recompute, e.g., the QR factorization of $D_kA$ in each step. It should be pointed out that the iterations can be carried out entirely in the $r$ space without the $x$ variables. Upon convergence to a residual vector $r_{opt}$ the corresponding solution can be found by solving the consistent linear system $Ax = b - r_{opt}$.
It can be shown that in the $l_p$ case any fixed point of the IRLS iteration satisfies the necessary conditions for a minimum of $\psi(x)$. The IRLS method is convergent for $1 < p < 3$, and also for $p = 1$ provided that the $l_1$ approximation problem has a unique nondegenerate solution. However, the IRLS method can be extremely slow when $p$ is close to unity.
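The following sketch implements Algorithm 8.12 with the safeguarded weights just described. The starting point (the least squares solution), the stopping test, and the test data with a single gross error are our own choices.

```python
import numpy as np

def irls_lp(A, b, p, maxiter=50, tol=1e-10):
    """IRLS for the l_p approximation problem (Algorithm 8.12), 1 < p < 2.
    Minimal sketch: dense least squares subproblems, safeguarded weights,
    and a simple step-norm stopping test."""
    u = np.finfo(float).eps
    x = np.linalg.lstsq(A, b, rcond=None)[0]         # start from the l2 solution
    for _ in range(maxiter):
        r = b - A @ x
        d = (100 * u + np.abs(r)) ** ((p - 2) / 2)   # safeguarded weights
        dx = np.linalg.lstsq(d[:, None] * A, d * r, rcond=None)[0]
        x = x + dx
        if np.linalg.norm(dx) <= tol * np.linalg.norm(x):
            break
    return x

# Usage on a small problem with an outlier (assumed data); p near 1 damps it:
rng = np.random.default_rng(4)
A = np.column_stack((np.ones(20), np.linspace(0, 1, 20)))
b = A @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(20)
b[3] += 5.0                                          # gross error in one component
print(np.linalg.lstsq(A, b, rcond=None)[0])          # l2 fit, pulled by the outlier
print(irls_lp(A, b, p=1.1))                          # close to the robust l1 fit
```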

Review Questions

8.1 Formulate the total least squares (TLS) problem. The solution of the TLS problem is related to a theorem on matrix approximation. Which?

Problems and Computer Exercises

8.1 Consider a TLS problem where $n = 1$ and

$$C = (A, b) = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$
Show that the unique minimizing $\Delta C$ gives

$$C + \Delta C = (A + E, \ b + r) = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix},$$
so the perturbed system is not compatible, but that an arbitrarily small perturbation $\epsilon$ in the (2,1) element will give a compatible system with solution $x = 2/\epsilon$.

8.2 (a) Let $A \in \mathbb{R}^{m\times n}$, $m \ge n$, $b \in \mathbb{R}^m$, and consider the total least squares (TLS) problem $\min_{E,\,r} \|(E, r)\|_F$, where $(A + E)x = b + r$. If we have the QR factorization
$$Q^T(A, b) = \begin{pmatrix} S \\ 0 \end{pmatrix}, \qquad S = \begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix},$$
then the ordinary least squares solution is $x_{LS} = R^{-1}z$, $\|r_{LS}\|_2 = \rho$.

Show that if a TLS solution $x_{TLS}$ exists, then it satisfies
$$\begin{pmatrix} R^T & 0 \\ z^T & \rho \end{pmatrix}\begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix}\begin{pmatrix} x_{TLS} \\ -1 \end{pmatrix} = \sigma_{n+1}^2\begin{pmatrix} x_{TLS} \\ -1 \end{pmatrix},$$
where $\sigma_{n+1}$ is the smallest singular value of $(A, b)$.

(b) Write a program using inverse iteration to compute $x_{TLS}$, i.e., for $k = 0, 1, 2,\dots$, compute a sequence of vectors $x^{(k+1)}$ by
$$\begin{pmatrix} R^T & 0 \\ z^T & \rho \end{pmatrix}\begin{pmatrix} R & z \\ 0 & \rho \end{pmatrix}\begin{pmatrix} y^{(k+1)} \\ \alpha \end{pmatrix} = \begin{pmatrix} x^{(k)} \\ -1 \end{pmatrix}, \qquad x^{(k+1)} = -y^{(k+1)}/\alpha.$$
As starting vector use $x^{(0)} = x_{LS}$, on the assumption that $x_{TLS}$ is a good approximation to $x_{LS}$. Will the above iteration always converge? Try to make it fail!
(c) Study the effect of scaling the right hand side in the TLS problem by making the substitution $z := \theta z$, $\rho := \theta\rho$. Plot $\|x_{TLS}(\theta) - x_{LS}\|_2$ as a function of $\theta$ and verify that when $\theta \to 0$, then $x_{TLS} \to x_{LS}$.
Hint: For generating test problems it is suggested that you use the function qmult(A) from the MATLAB collection of test matrices by N. Higham to generate a matrix $C = (A, b) = Q_1 * D * Q_2$, where $Q_1$ and $Q_2$ are random real orthogonal matrices and $D$ is a given diagonal matrix. This allows you to generate problems where $C$ has known singular values and vectors.

Notes and Further References

Modern numerical methods for solving least squares problems are surveyed in the two comprehensive monographs [399] and [61]. The latter contains a bibliography of 860 references, indicating the considerable research interest in these problems. Hansen [310] gives an excellent survey of numerical methods for the treatment of numerically rank deficient linear systems arising, for example, from discrete ill-posed problems.
Several of the great mathematicians at the turn of the 19th century worked on methods for solving overdetermined linear systems. Laplace in 1799 used the principle of minimizing the sum of absolute errors $|r_i|$, with the added condition that the errors sum to zero. This leads to a solution $x$ that satisfies at least $n$ equations exactly. The method of least squares was first published as an algebraic procedure by Legendre in 1805 [401]. Gauss justified the least squares principle as a statistical procedure in [233], where he claimed to have used the method since 1795. This led to one of the most famous priority disputes in the history of mathematics. Gauss further developed the statistical aspects in 1821–1823. For an interesting account of the history of the invention of least squares, see Stigler [555].
Because of its success in analyzing astronomical data the method of least squares rapidly became the method of choice when analyzing observations. Geodetic calculations were another early area of application of the least squares principle. In the last decade applications in control and signal processing have been a source of inspiration for developments in least squares calculations.

Section 8.1

Gauss [232] in 1821 put the method of least squares on a sound theoretical basis. In his textbook from 1912 Markov [424] refers to Gauss' work and may have clarified some implicit assumptions but proves nothing new; see Plackett [487, 488].
A detailed account of the interesting early history of the SVD is given by Stewart [547]. Beltrami [42]${}^{37}$ in 1873 derived the SVD for a real, square, nonsingular matrix having distinct singular values. A year later Jordan [356] independently published a derivation of the SVD which handled multiple singular values. Jordan also stated the variational characterization of the largest singular value as the maximum of a function. Autonne [16] extended the SVD to complex matrices and Eckart and Young [183] to rectangular matrices. The relation to the polar decomposition is due to Autonne [16]. Picard [486] seems to have been the first to call the numbers $\sigma_k$ singular values. Both Beltrami and Jordan were concerned with diagonalizing a finite bilinear form $P = x^TAy$. Schmidt [521] developed in 1907 the infinite-dimensional analogue of the SVD as a tool for solving integral equations. He used it to obtain a low-rank approximation to the integral operator. Weyl [607] developed a general perturbation theory and gave a more elegant proof of Schmidt's approximation theorem. The first stable algorithm for computing the SVD was developed by Golub, Reinsch, and Wilkinson in the late 1960's. Several applications of the SVD to matrix approximation can be found in Golub and Van Loan [277, Sec. 12.4].

${}^{37}$Eugenio Beltrami (1835–1900) was born in Cremona, Italy, which then belonged to Austria. He studied applied mathematics in Pavia and Milan. In 1864 he was appointed to the chair of geodesy at the University of Pisa and from 1866 he was professor of rational mechanics in Bologna. In 1873 he moved to Rome, which in 1870 had become the new capital of Italy. After three years there he moved back to Pavia. Beltrami contributed to work in differential geometry on curves and surfaces and gave a concrete realization of the non-Euclidean geometry.

Theorem 8.1.2 can be generalized to the semidefinite case; see Gulliksson and Wedin [294, Theorem 3.2]. A case when $B$ is indefinite and nonsingular is considered in Section 8.6.2.
A good introduction to generalized inverses is given by Ben-Israel and Greville [43]. Generalized inverses should be used with caution since the notation tends to hide intrinsic computational difficulties associated with rank deficient matrices. A more complete and thorough treatment is given in the monograph by the same authors [44]. The use of generalized inverses in geodetic calculations is treated in Bjerhammar [56].

Section 8.2

Peters and Wilkinson [484] developed methods based on Gaussian elimination from a uniform standpoint; see also the excellent survey by Noble [451]. Sautter [520] gives a detailed analysis of stability and rounding errors of the LU algorithm for computing pseudo-inverse solutions. For a fuller treatment of weighted least squares and the general linear model, see Bj&ouml;rck [61, Chap. 3].
How to find the optimal backward error for the linear least squares problem was an open problem for many years, until it was elegantly answered by Karlsson et al. [598]; see also [368]. Gu [287] gives several approximations to it that are optimal up to a factor less than 2. Optimal backward perturbation bounds for underdetermined systems are derived in [560]. The extension of backward error bounds to the case of constrained least squares problems is discussed by Cox and Higham [121].

Section 8.3

Jacobi [350] used Givens rotations already in 1845 to achieve diagonal dominance in systems of normal equations, and then applied a simple iterative scheme that became known as Jacobi's method; see Section 10.1. Complex Givens rotations and complex Householder transformations are treated in detail by Wilkinson [611, pp. 47–50]. Lehoucq [402] gives a comparison of different implementations of complex Householder transformations. The reliable construction of real and complex Givens rotations is considered in great detail in Bindel, Demmel and Kahan [53]. Wilkinson [611] showed the backward stability of algorithms based on a sequence of Householder transformations.
The different computational variants of Gram–Schmidt have an interesting history. The "modified" Gram–Schmidt (MGS) algorithm was in fact already derived by Laplace in 1816 as an elimination method using weighted row sums. Laplace did not interpret his algorithm in terms of orthogonalization, nor did he use it for computing least squares solutions! Bienaym&eacute; in 1853 gave a similar derivation of a slightly more general algorithm; see Bj&ouml;rck [60]. What is now called the "classical" Gram–Schmidt (CGS) algorithm first appeared explicitly in papers by Gram 1883 and Schmidt 1908. Schmidt treats the solution of linear systems with infinitely many unknowns and uses the orthogonalization as a theoretical tool rather than a computational procedure.
In the 1950's algorithms based on Gram–Schmidt orthogonalization were frequently used, although their numerical properties were not well understood at the time. Bj&ouml;rck [57] analyzed the modified Gram–Schmidt algorithm and showed its stability for solving linear least squares problems.
The systematic use of orthogonal transformations to reduce matrices to simpler form was initiated by Givens [257] and Householder [341]. The application of these transformations to linear least squares is due to Golub [261], where it was shown how to compute a QR factorization of $A$ using Householder transformations. An Algol implementation of this method was given in [87].

Section 8.4

An efficient algorithm for solving rank-deficient least squares problems is given by Foster and Kommu [216]. This uses a truncated pivoted QR factorization where the rank of the trailing diagonal block is estimated by a condition estimator. A different approach to the subset selection problem has been given by de Hoog and Mattheij [141]. They consider choosing the square subset $A_1$ of rows (or columns) which maximizes $\det(A_1)$.
Matlab templates for computing UTV decompositions have been given by Fierro, Hansen, and Hansen [202].
Partial least squares originated in statistical applications, specifically in economics with Herman Wold [615]. It became very popular in chemometrics, where it was introduced by Svante Wold; see [616]. Now it is becoming a method of choice also in a large number of applications in the social sciences. Unfortunately the statisticians use a different implementation, the numerical stability of which has not been proved.

Section 8.5

The QR algorithm for banded rectangular matrices was first given by Reid [498]. Rank-revealing QR (RRQR) factorizations have been studied by a number of authors. A good survey can be found in Hansen [310]. The URV and ULV decompositions were introduced by Stewart [546, 548].
For a detailed discussion of dissection and orthogonal factorizations in geodetic survey problems, see Golub and Plemmons [260]. The block triangular form (8.5.18) of a sparse matrix is based on a canonical decomposition of bipartite graphs discovered by Dulmage, Mendelsohn, and Johnson in a series of papers; see [180].
The row sequential sparse QR algorithm employing Givens rotations is due to George and Heath [236]. Liu [412] introduces the notion of a row merge tree for sparse QR factorization by Givens rotations. This row merging scheme can be viewed as implicitly using the multifrontal method on $A^TA$. This algorithm can give a significant reduction in QR factorization time at a modest increase in working storage. George and Liu [240] give a modified version of Liu's algorithm, which uses Householder transformations instead of Givens rotations. For other multifrontal codes developed for sparse QR factorizations, see Matstoms [430]. Supernodes and other modifications of the multifrontal method are discussed by Liu [413] and Matstoms [429].

Section 8.6

An early reference to the exchange operator is in network analysis; see the survey of Tsatsomeros [579]. J-orthogonal matrices also play a role in the solution of the generalized eigenvalue problem $Ax = \lambda Bx$; see Section 10.7. For a systematic study of J-orthogonal matrices and their many applications we refer to Higham [330]. An error analysis of Chambers' algorithm is given by Bojanczyk et al. [70].
The systematic use of GQR as a basic conceptual and computational tool is explored in [464]. These generalized decompositions and their applications are discussed in [11]. Algorithms for computing the bidiagonal decomposition are due to Golub and Kahan [265]. The partial least squares (PLS) method, which has become a standard tool in chemometrics, goes back to Wold et al. [616].
The term "total least squares problem", which was coined by Golub and Van Loan [276], renewed the interest in the "errors-in-variables model". A thorough and rigorous treatment of the TLS problem is found in Van Huffel and Vandewalle [587]. The important role of the core problem for weighted TLS problems was discovered by Paige and Strako&scaron; [472].

Section 8.7

For the $l_1$ problem, algorithms which use linear programming techniques have been developed by Barrodale and Roberts [34]. The Harwell Subroutine Library uses a version of their algorithm. Other algorithms for this problem, based on projected gradient techniques, are given by Bartels, Conn, and Sinclair [35].
A general treatment of robust statistical procedures is given by Huber [344]. A great deal of work has been devoted to developing methods for computing the Huber M-estimator; see, e.g., Clark and Osborne [112], Ekblom [186], and Madsen and Nielsen [421].