Chapter 3

LEAST SQUARES PROBLEMS

[Figure: survey points A, B, C, D, E, F at various elevations above the sea.]

One application is geodesy & surveying. Let z = elevation, and suppose we have the measurements:

$z_A \approx 1.,\quad z_B \approx 2.,\quad z_C \approx 3.,\quad z_B - z_A \approx 1.,\quad z_C - z_B \approx 2.,\quad z_C - z_A \approx 1.$

This is overdetermined and inconsistent:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} z_A \\ z_B \\ z_C \end{bmatrix} \approx
\begin{bmatrix} 1. \\ 2. \\ 3. \\ 1. \\ 2. \\ 1. \end{bmatrix}.$$

Another application is fitting a curve to given data:

[Figure: data points $(x_i, y_i)$ and the fitted line $y = ax + b$.]

$$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} \approx
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

More generally, $A_{m\times n}x_n \approx b_m$ where A is known exactly, $m \ge n$, and b is subject to independent random errors of equal variance (which can be achieved by scaling the equations). Gauss (1821) proved that the "best" solution x

minimizes $\|b - Ax\|_2$, i.e., the sum of the squares of the residual components. Hence, we might write $Ax \approx_2 b$. Even if the 2-norm is inappropriate, it is the easiest to work with.

3.1 The Normal Equations
3.2 QR Factorization
3.3 Householder Reflections
3.4 Givens Rotations
3.5 Gram-Schmidt Orthogonalization
3.6 Singular Value Decomposition

3.1 The Normal Equations

Recall the inner product $x^Ty$ for $x, y \in \mathbb{R}^m$. (What is the geometric interpretation of the inner product in $\mathbb{R}^3$?) A sequence of vectors $x_1, x_2, \ldots, x_n$ in $\mathbb{R}^m$ is orthogonal if $x_i^Tx_j = 0$ if and only if $i \neq j$, and orthonormal if $x_i^Tx_j = \delta_{ij}$. Two subspaces S, T are orthogonal if $x \in S,\ y \in T \Rightarrow x^Ty = 0$. If $X = [x_1, x_2, \ldots, x_n]$, then orthonormal means $X^TX = I$.

Exercise. Define what it means for a set (rather than a multiset) of vectors to be orthogonal.

An orthogonal matrix Q satisfies $Q^T = Q^{-1}$. In 2D or in 3D it represents a reflection and/or rotation. The problem

$$\min_x \|b - Ax\|_2 \iff \min_{x_1,\ldots,x_n} \|b - (x_1a_1 + \cdots + x_na_n)\|_2$$

where $A = [a_1, \ldots, a_n]$: find a linear combination of the columns of A which is nearest b.

[Figure: b, its orthogonal projection Ax onto $\mathcal{R}(A)$, and the residual r.]

Here $\mathcal{R}(A)$ = column space of A. The best approximation is when $r \perp \mathcal{R}(A)$. Hence the best approximation Ax = orthogonal projection of b onto $\mathcal{R}(A)$, i.e., $r \perp a_j$, $j = 1, 2, \ldots, n$:
$$A^Tr = 0 \iff A^T(b - Ax) = 0 \iff (A^TA)x = A^Tb \qquad \text{(normal equations)}.$$
Clearly x is unique $\iff$ the columns of A are linearly independent. Otherwise, there are infinitely many solutions. The least squares problem is sometimes written
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.$$
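To make this concrete, here is a small NumPy sketch (an illustration, not part of the notes) applying the normal equations and the augmented system to the surveying example from the start of the chapter:

```python
import numpy as np

# Surveying example from the start of the chapter.
A = np.array([[ 1,  0,  0],
              [ 0,  1,  0],
              [ 0,  0,  1],
              [-1,  1,  0],
              [ 0, -1,  1],
              [-1,  0,  1]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 1.0])

# Normal equations: (A^T A) x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
print("x =", x)
print("A^T r =", A.T @ (b - A @ x))           # ~ 0: residual is orthogonal to R(A)

# Equivalent augmented system  [I A; A^T 0] [r; x] = [b; 0].
m, n = A.shape
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
rx = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
print("x from augmented system =", rx[m:])
```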

analytical argument
Let rank(A) = n. Therefore $A^TA$ is nonsingular. Let x satisfy $A^Tr = 0$ where $r = b - Ax$. Then for any other value $x + w$,
$$\|b - A(x+w)\|_2^2 = \|r - Aw\|_2^2 = r^Tr - 2w^TA^Tr + w^TA^TAw = \|b - Ax\|_2^2 + \|Aw\|_2^2.$$
Hence, a solution of the normal equations is a solution of the least squares problem. The solution x is a unique minimum if rank(A) = n because then $\|Aw\|_2 = 0 \Rightarrow w = 0$.

[Figure: b, Ax, Aw, r, and $r - Aw$ relative to $\mathcal{R}(A)$.]

the pseudo-inverse
Assume rank(A) = n. Then $x = (A^TA)^{-1}A^Tb$. We call $(A^TA)^{-1}A^T = A^\dagger$ the pseudo-inverse, and we write $x = A^\dagger b$. (The definition can be extended to the case rank(A) < n.) Note that $A^\dagger A = I$.

What about AA†? This is an orthogonal projector.
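A quick NumPy check (a sketch, not from the notes) that $AA^\dagger$ projects onto $\mathcal{R}(A)$:

```python
import numpy as np

# A sketch: for full-column-rank A, A_dagger = (A^T A)^{-1} A^T, and A @ A_dagger
# is the orthogonal projector onto R(A).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
A_dag = np.linalg.solve(A.T @ A, A.T)            # (A^T A)^{-1} A^T
print(np.allclose(A_dag, np.linalg.pinv(A)))     # agrees with NumPy's pseudo-inverse

P = A @ A_dag
b = np.array([1.0, 0.0, 2.0])
print(np.allclose(A.T @ (b - P @ b), 0.0))       # b - A A^dagger b is orthogonal to R(A)
```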

orthogonal projector
The product $\dfrac{vv^T}{v^Tv} = vv^\dagger$ produces an orthogonal projection of a vector onto $\operatorname{span}\{v\}$ because

1. $vv^\dagger x \in \operatorname{span}\{v\}$,

2. $x - vv^\dagger x \perp \operatorname{span}\{v\}$.

Alternatively, $\dfrac{vv^T}{v^Tv}x$ is the closest vector to x that is some multiple of v.

[Figure: x, v, and the projection $\frac{vv^T}{v^Tv}x$ onto span{v}.]

Similarly $AA^\dagger = A(A^TA)^{-1}A^T$, rank(A) = n ≤ m, produces an orthogonal projection onto $\mathcal{R}(A)$, because $AA^\dagger b \in \mathcal{R}(A)$ (since $AA^\dagger b = A(A^\dagger b)$). For any b, $b - AA^\dagger b \perp \mathcal{R}(A)$ since $A^T(b - AA^\dagger b) = 0$.

DEFN A matrix P is an orthogonal projector if for any x

$$x - Px \perp \mathcal{R}(P) \iff \forall x,\ (x - Px)^TP = 0 \iff P^TP = P \iff \underbrace{P^T = P}_{\text{symmetry}}\ \text{and}\ \underbrace{P^2 = P}_{\text{idempotence}}.$$
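A small numerical check of these two conditions for the rank-one projector introduced above (a NumPy sketch, assuming nothing beyond the formulas in the text):

```python
import numpy as np

# Check that P = v v^T / (v^T v) is symmetric, idempotent, and that x - Px
# is orthogonal to span{v}.
v = np.array([1.0, 2.0, 2.0])
P = np.outer(v, v) / (v @ v)

x = np.array([3.0, -1.0, 4.0])
print(np.allclose(P, P.T))                 # symmetry
print(np.allclose(P @ P, P))               # idempotence
print(np.isclose(v @ (x - P @ x), 0.0))    # x - Px is orthogonal to v
```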

(If S is a subspace of $\mathbb{R}^n$, then its orthogonal complement is $S^\perp := \{x \mid x^Ty = 0 \text{ for all } y \in S\}$.) Every $v \in \mathbb{R}^n$ has a unique decomposition $v = x + y$, $x \in S$, $y \in S^\perp$.

solving the normal equations
The direct approach to solving $(A^TA)x = A^Tb$ is to

1. form $A^TA$,

2. use Cholesky to get the factorization $GG^T$,

3. set $x = G^{-T}(G^{-1}(A^Tb))$.

Example. With 4-digit round-to-even floating-point arithmetic
$$A = \begin{bmatrix} 1 & 1.02 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
A^TA = \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.0404 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.04 \end{bmatrix}.$$

The result is not even positive definite. The problem is that (it can be shown that)

$$\kappa_2(A^TA) = \kappa_2(A)^2 \quad\text{where}\quad \kappa_2(A) = \|A^\dagger\|_2\|A\|_2.$$

Any roundoff made in forming $A^TA$ will have a very significant effect if A is ill-conditioned. The solution is to do the entire computation in double precision or to look for another algorithm.
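A NumPy sketch (illustration only) of the Cholesky route and of the squared condition number; in double precision the small example above is still positive definite:

```python
import numpy as np

# Solve the normal equations (A^T A) x = A^T b by Cholesky, and compare conditioning.
A = np.array([[1.0, 1.02],
              [1.0, 1.00],
              [1.0, 1.00]])
b = np.array([1.0, 2.0, 3.0])

G = np.linalg.cholesky(A.T @ A)           # A^T A = G G^T
x = np.linalg.solve(G.T, np.linalg.solve(G, A.T @ b))
print("x =", x)
print("kappa2(A)    =", np.linalg.cond(A))
print("kappa2(A^TA) =", np.linalg.cond(A.T @ A))   # approximately kappa2(A)**2
```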

Review questions

1. Define a least squares solution to an overdetermined system Ax ≈ b using matrix notation.

2. For a least squares solution to an overdetermined system Ax ≈ b, how should the "equations" be scaled?

3. Define an orthogonal sequence; an orthonormal sequence.

4. How is a subspace represented computationally?

5. What does it mean for two subspaces to be orthogonal?

6. What is the orthogonal complement of a subspace?

7. What is the null space of a matrix?

8. Give an alternative expression for R(A)⊥ which is more useful computationally.

9. Give a geometric interpretation of a linear least squares problem Ax ≈ b.

10. Give a necessary and sufficient condition for existence of a solution to a linear least squares problem Ax ≈ b; for existence of a unique solution.

11. What are the normal equations for a linear least squares problem Ax ≈ b?

12. Express the linear least squares problem Ax ≈ b as a system of m + n equations in m + n unknowns where m and n are the dimensions of A.

13. If x satisfies the normal equations, show that no other vector can be a better solution to the least squares problem.

14. If $y_0$ is the orthogonal projection of y onto a subspace S, what two conditions does $y_0$ satisfy?

15. What does $\dfrac{vv^T}{v^Tv}$ do?

16. Give a formula for the orthogonal projector onto a subspace S in terms of a basis a1, a2, ..., an for S.

17. What two simple conditions must a matrix P satisfy for it to be an orthogonal projector?

18. What is an oblique projector?

19. What does idempotent mean?

20. Why is the explicit use of the normal equations undesirable computationally?

21. If we do use the normal equations, what method is used for the matrix factorization?

Exercises

1. What can you say about a nonsingular orthogonal projector?

2. (a) What is the orthogonal complement of $\mathcal{R}(A)$ where
$$A = \begin{bmatrix} 1 & 1.001 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}?$$

(b) What is the projector onto the orthogonal complement of $\mathcal{R}(A)$?

(c) What is the projector onto $\mathcal{R}(A)$?

3. Construct the orthogonal projector onto the plane through (0,0,0), (1,1,0), (1,0,1).

4. Assuming u and v are nonzero column vectors, show that I + uvT is orthogonal only if u = −2v/(vTv).

5. Assume that $A \in \mathbb{R}^{m\times n}$ has linearly independent columns. Hence $A^TA$ has a (unique) Cholesky factorization. Prove that there exists a factorization $A = Q_1R_1$ where $Q_1 \in \mathbb{R}^{m\times n}$ has columns forming an orthonormal set and $R_1 \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of $Q_1$, $R_1$, and b?

6. Let x∗ be a least squares solution to an overdetermined system Ax ≈ b. What is the geometric interpretation of Ax∗? When is Ax∗ unique? When is x∗ unique?

7. Show that an upper triangular orthogonal matrix is diagonal. What are the diagonal elements?

8. Suppose that the matrix
$$\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}$$
is orthogonal where A and C are square submatrices. Prove in a logical, deductive fashion that B = 0. (Hint: there is no need to have a formula for the inverse of a 2 × 2 block upper triangular matrix, and there is no need to consider individual elements, columns, or rows of A, B, or C.)

9. Show that $\|Qx\|_2 = \|x\|_2$ if Q is orthogonal.

10. Show that if Q is an m by m orthogonal matrix and A is an m by n matrix, then $\|QA\|_2 = \|A\|_2$.

11. Assume
$$\begin{bmatrix} \hat{Q} & q \\ 0^T & \rho \end{bmatrix}$$
is an orthogonal matrix where $\hat{Q}$ is square and ρ is a scalar. What can we say about $\hat{Q}$, q, and ρ? (The answer should be simplified.)

12. (a) What is the orthogonal projector for $\operatorname{span}\{v_1, v_2\}$ where $v_1$, $v_2$ are linearly independent real vectors?

(b) How does this simplify if $v_2$ is orthogonal to $v_1$? In particular, what is the usual way of expressing an orthogonal projector in this case?

3.2 QR Factorization

The least squares problem $\min_x \|b - Ax\|_2$ is simplified if A is reduced to upper triangular form by means of orthogonal transformations:

$$Q^TA = R \quad \text{(right triangular)}$$

Then
$$\|b - Ax\|_2 = \|Q^T(b - Ax)\|_2 \qquad \text{(see exercise 3.1.9)}$$
$$= \|Q^Tb - Rx\|_2, \qquad \text{partition } Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}, \quad R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}$$
$$= \left\|\begin{bmatrix} c \\ d \end{bmatrix} - \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}x\right\|_2
= \left\|\begin{bmatrix} c - \hat{R}x \\ d \end{bmatrix}\right\|_2
= \sqrt{\|c - \hat{R}x\|_2^2 + \|d\|_2^2}.$$

Obviously this is minimized for $x = \hat{R}^{-1}c$, which is computed by back substitution. A special case is m = n; i.e., A is square. This method is numerically very stable because there is no growth in the elements of the reduced matrix:
$$\|R\|_2 = \|Q^TA\|_2 = \cdots = \|A\|_2.$$
(Show this for matrices.) It is twice as much work as Gaussian elimination: $\frac{2}{3}n^3$ multiplications using Householder reflections.
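A minimal NumPy sketch of this procedure, assuming the reduced factorization returned by np.linalg.qr (illustration only):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q1, Rhat = np.linalg.qr(A)            # reduced factorization: A = Q1 @ Rhat
c = Q1.T @ b
x = np.linalg.solve(Rhat, c)          # triangular solve (back substitution in principle)
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same solution
```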

Review questions

1. What is an orthogonal matrix?

2. What is the effect of multiplication by an orthogonal matrix on the 2-norm of a vector? 2-norm of a matrix? Prove it.

3. What is a QR factorization?

4. Show how the QR factorization of a matrix A can be used to solve the linear least squares problem.

Exercises

1. Consider the problem of solving an overdetermined system Ax ≈ b in the least squares sense. Suppose A is such that it is possible to compute an accurate factorization LU where L is a square lower triangular matrix and U is upper triangular with the same dimensions as A. Why would this be of little use in solving the least squares problem; that is, what is special about a QR factorization?

2. Assume that $A \in \mathbb{R}^{m\times n}$ has linearly independent columns. Hence $A^TA$ has a (unique) Cholesky factorization. Prove that there exists a factorization $A = Q_1R_1$ where $Q_1 \in \mathbb{R}^{m\times n}$ has columns forming an orthonormal set and $R_1 \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of $Q_1$, $R_1$, and b?

3.3 Householder Reflections

Definition (Householder) An elementary matrix has the form

I + rank one matrix.

It is easy to show that an elementary matrix has the form

$$I + uv^T, \qquad u \neq 0,\ v \neq 0.$$

An example of an elementary matrix is a Gauss transformation $M_k = I - m_ke_k^T$. An elementary matrix is efficient computationally; it is easy to invert. It is not difficult to show that an orthogonal elementary matrix has the form
$$I - 2\frac{vv^T}{v^Tv} =: P.$$

P is symmetric and $P^2 = I$. (Exercise. Prove this.)

Recall that $\dfrac{vv^T}{v^Tv}x$ is the orthogonal projection of x onto v.

[Figure: x, v, and the projection $\frac{vv^T}{v^Tv}x$.]

What does $P = I - 2\dfrac{vv^T}{v^Tv}$ do?

[Figure: x, its projection $\frac{vv^T}{v^Tv}x$ onto v, the reflected vector $x - 2\frac{vv^T}{v^Tv}x$, and the hyperplane orthogonal to v.]

It reflects in the hyperplane through the origin orthogonal to v. (How should one store P? How should one multiply by P?) In sum:

[Figure: the (Householder) reflection $P = I - 2\frac{vv^T}{v^Tv}$ maps x to Px, the mirror image of x in the hyperplane orthogonal to v.]

Note that the length of v is irrelevant—only its direction matters. In practice we want to determine v so that for some given x
$$Px = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad \text{for some } \alpha \in \mathbb{R}.$$

Recalling that orthogonal matrices preserve Euclidean length,

$$\|x\|_2 = \|Px\|_2 = |\alpha| \implies \alpha = \pm\|x\|_2.$$

To get v to point in the right direction, choose v = x − Px.

[Figure: two choices, $Px = \operatorname{sign}(x_1)\|x\|_2e_1$ (first components of x and Px have the same sign) and $Px = -\operatorname{sign}(x_1)\|x\|_2e_1$ (first components of x and Px have opposite sign), each with $v = x - Px$.]

Which do we choose?

Note that for very acute angles between x and Px, the direction of v becomes very sensitive to perturbations in x or Px—there is a lot of cancellation in x − Px if they point in similar directions. Therefore,

$$v = x + \operatorname{sign}(x_1)\|x\|_2e_1, \quad \operatorname{sign}(0) = 1, \qquad Px = -\operatorname{sign}(x_1)\|x\|_2e_1.$$
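A NumPy sketch of this choice of v and the resulting reflection (illustration only, with the sign convention above):

```python
import numpy as np

# Choose v = x + sign(x1) ||x||_2 e1, so that Px = -sign(x1) ||x||_2 e1,
# where P = I - beta v v^T and beta = 2 / (v^T v).
def householder_vector(x):
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0        # sign(0) = 1, as in the text
    v[0] += sign * np.linalg.norm(x)
    beta = 2.0 / (v @ v)
    return v, beta

x = np.array([3.0, 4.0, 0.0])
v, beta = householder_vector(x)
P = np.eye(3) - beta * np.outer(v, v)
print(P @ x)                                  # approximately [-5, 0, 0]
```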

computing P
LAPACK normalizes v so that $v_1 = 1$, and hence uses
$$v^{\text{new}} = v/v_1$$
instead. Also, it is careful to avoid underflow/overflow in the computation of $\|x\|_2$. It computes
$$\beta := \frac{2}{v^Tv},$$
so $P = I - \beta vv^T$.

computing PA for $A \in \mathbb{R}^{m\times n}$
We are given v and β where $P = I - \beta vv^T$. The product $P \cdot A$ would require $m^2n$ multiplications. However, $A - v(\beta(v^TA))$ requires mn multiplications followed by n multiplications followed by mn multiplications. Hence, $m^2n$ multiplications vs. 2mn multiplications.
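A short NumPy sketch of this rank-one update (illustration only):

```python
import numpy as np

# Apply P = I - beta*v*v^T to A as a rank-1 update, ~2mn multiplications.
def apply_householder(A, v, beta):
    w = beta * (v @ A)           # beta * (v^T A)
    return A - np.outer(v, w)    # rank-1 update

# check against forming P explicitly (m^2 n work)
m, n = 5, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
v = rng.standard_normal(m)
beta = 2.0 / (v @ v)
P = np.eye(m) - beta * np.outer(v, v)
print(np.allclose(P @ A, apply_householder(A, v, beta)))   # True
```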

reduction to triangular form
As in Gaussian elimination, we reduce A column by column.

Recursive description. The case $m \ge n = 1$ is an exercise. Consider the case $m \ge n \ge 2$. Determine orthogonal P such that
$$PA = \begin{bmatrix} \alpha & a^T \\ 0 & \tilde{A} \end{bmatrix}.$$
P is constructed as described above. Then
$$A = P\begin{bmatrix} \alpha & a^T \\ 0 & \tilde{A} \end{bmatrix}$$
and recursion gives $\tilde{A} = \tilde{Q}\tilde{R}$ so that
$$A = QR \quad\text{where}\quad Q = P\begin{bmatrix} 1 & 0^T \\ 0 & \tilde{Q} \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} \alpha & a^T \\ 0 & \tilde{R} \end{bmatrix}.$$

Nonrecursive description. First, determine orthogonal $P_1$ such that
$$P_1A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & & & \\ 0 & & \tilde{A}_1 & \\ 0 & & & \\ 0 & & & \\ 0 & & & \end{bmatrix}.$$
Similarly,
$$\tilde{P}_2\tilde{A}_1 = \begin{bmatrix} \times & \times & \times \\ 0 & & \\ 0 & \tilde{A}_2 & \\ 0 & & \\ 0 & & \end{bmatrix}, \qquad
\tilde{P}_3\tilde{A}_2 = \begin{bmatrix} \times & \times \\ 0 & \\ 0 & \tilde{A}_3 \\ 0 & \end{bmatrix}, \qquad
\tilde{P}_4\tilde{A}_3 = \begin{bmatrix} \times \\ 0 \\ 0 \end{bmatrix}.$$
Then
$$\underbrace{\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & \tilde{P}_4 \end{bmatrix}}_{P_4}
\underbrace{\begin{bmatrix} 1 & & \\ & 1 & \\ & & \tilde{P}_3 \end{bmatrix}}_{P_3}
\underbrace{\begin{bmatrix} 1 & \\ & \tilde{P}_2 \end{bmatrix}}_{P_2}
P_1A =
\underbrace{\begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{R}.$$
Thus $P_4P_3P_2P_1 = Q^T$, but we do not need to form this product. Rather we compute $Q^Tb = P_4(P_3(P_2(P_1b)))$.

Note. If m = n, then n − 1 reductions suffice.

Householder orthogonalization
At the beginning of the kth stage we have

[Figure: the first k−1 rows hold the computed entries $r_{11}, \ldots, r_{1n}$ through $r_{k-1,k-1}, \ldots, r_{k-1,n}$ of R; the trailing block $\tilde{A}_{k-1}$ remains to be reduced.]

We want to find $\tilde{P}_k = I - \beta_k\tilde{v}_k\tilde{v}_k^T$ such that
$$\tilde{P}_k\tilde{A}_{k-1} = \begin{bmatrix} r_{kk} & r_{k,k+1} & \cdots & r_{kn} \\ 0 & & & \\ \vdots & & \tilde{A}_k & \\ 0 & & & \end{bmatrix}.$$

the algorithm

Householder orthogonalization
for k = 1,2,..., n do {
    determine a Householder reflection $\tilde{P}_k = I - \beta_k\tilde{v}_k\tilde{v}_k^T$ such that
    $$\tilde{P}_k\begin{bmatrix} a_{kk} \\ a_{k+1,k} \\ \vdots \\ a_{mk} \end{bmatrix} = \begin{bmatrix} r_{kk} \\ 0 \\ \vdots \\ 0 \end{bmatrix};$$
    for j = k+1, k+2,..., n do
    $$\begin{bmatrix} r_{kj} \\ a_{k+1,j} \\ \vdots \\ a_{mj} \end{bmatrix} = \tilde{P}_k\begin{bmatrix} a_{kj} \\ a_{k+1,j} \\ \vdots \\ a_{mj} \end{bmatrix};$$
}

The storage scheme is as follows: after the kth step we have

[Figure: storage scheme after the kth step: the computed part of R occupies the upper triangle of the first k rows, the Householder vectors $\tilde{v}_1, \ldots, \tilde{v}_k$ occupy the subdiagonal parts of the first k columns, $\tilde{A}_k$ occupies the trailing block, and $\beta_1, \ldots, \beta_k$ are stored in a separate array.]
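A compact NumPy sketch of the orthogonalization loop above; it keeps the vectors $\tilde{v}_k$ and scalars $\beta_k$ in Python lists rather than in the packed storage scheme just described:

```python
import numpy as np

def householder_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    V, betas = [], []
    for k in range(n):
        x = A[k:, k]
        v = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0          # sign(0) = 1
        v[0] += sign * np.linalg.norm(x)
        beta = 2.0 / (v @ v)
        # apply P~_k = I - beta v v^T to the trailing submatrix (rank-1 update)
        A[k:, k:] -= np.outer(v, beta * (v @ A[k:, k:]))
        V.append(v)
        betas.append(beta)
    return np.triu(A[:n, :]), V, betas             # R, Householder vectors, betas

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
R, V, betas = householder_qr(A)
print(R)                                           # agrees with np.linalg.qr up to signs
```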

Review questions

1. What is an elementary matrix? Give an explicit form for an elementary matrix.

2. Give an explicit form for an elementary orthogonal matrix. What do we call such a matrix?

3. Describe geometrically the effect of multiplying a vector x by a matrix $H = I - 2v(v^Tv)^{-1}v^T$.

4. In practice H is constructed so that y = Hx where x and y are given. What condition must y satisfy for H to exist? If all but the first element of y are to vanish, what are the choices for y?

5. Assuming the ability to construct a Householder transformation that maps a given vector to another of the same length, give a recursive algorithm for QR factorization using Householder reflections. As always, use partitioned matrices to describe the algorithm.

6. How much storage is required for the matrix Q in a QR factorization using Householder reflections?

Exercises

1. Suppose that $P_2P_1A = R$ where $P_i = I - \beta_iv_iv_i^T$ and
$$\beta_1 = \frac{1}{3},\quad v_1 = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 1 \end{bmatrix},\quad
\beta_2 = \frac{1}{9},\quad v_2 = \begin{bmatrix} 0 \\ 4 \\ 1 \\ -1 \end{bmatrix},
\quad\text{and}\quad R = \begin{bmatrix} 3/2 & 0 \\ 0 & -9/4 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Determine the least squares solution of Ax ≈ b where

$$b = \left[-\frac{3}{2},\ -\frac{5}{2},\ 0,\ \frac{11}{4}\right]^T.$$
Do an efficient computation and show every step.

2. Determine a reflection $P = I - 2v(v^Tv)^{-1}v^T$ such that Px is a multiple of y where $x \neq 0$ and $y \neq 0$. Of the two possibilities, which uses for v the sum of two vectors separated by an angle of at most π/2?

3. Prove or disprove: a Householder reflection is positive definite.

4. Write an algorithm for overwriting b with $P_n \cdots P_2P_1b$ where $P_k = I - \beta_kv_kv_k^T$ and $v_k = [0\ \cdots\ 0\ v_{kk}\ \cdots\ v_{mk}]^T$. Only the following array values should be referenced by your algorithm: $b_i$, 1 ≤ i ≤ m; $v_{ik}$, k ≤ i ≤ m, 1 ≤ k ≤ n; $\beta_k$, 1 ≤ k ≤ n.

5. Let
$$A = \begin{bmatrix} 7/8 & -21/25 \\ -1 & 23/50 \\ 2 & -48/25 \\ 2 & 102/25 \end{bmatrix}.$$

Determine $\beta_1, \beta_2, v_1, v_2$ and right triangular matrix R such that $P_2P_1A = R$ where $P_i = I - \beta_iv_iv_i^T$. (As a check confirm that $A = P_1P_2R$.)

6. Assuming u and v are nonzero column vectors, show that $I + uv^T$ is orthogonal only if $u = -2v/(v^Tv)$.

7. Count the number of multiplications for the Householder orthogonalization algorithm as described in this section.

8. Recall that for a Householder reflection $P = I - 2\dfrac{vv^T}{v^Tv}$ the vector Px is the mirror reflection of a vector x in the hyperplane through the origin normal to v. Let x and y be linearly independent vectors of equal Euclidean length. By means of a pictorial argument determine a formula for v such that Px = y.

9. Apply Householder orthogonalization to compute the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

Normalize the Householder vectors $v_k$ so that their first (nonzero) element is 1, as does LAPACK, and calculate the $\beta_k$ values. (Do not actually form the matrices $P_k$ or, worse yet, the matrix Q.)

10. The application of a Householder reduction to an m by n matrix A, n < m,

$$P_n \cdots P_2P_1A = R$$

yields an upper triangular matrix R. This can be used to create a reduced QR factorization $A = \hat{Q}\hat{R}$ where $\hat{Q}$ is m by n and $\hat{R}$ is a square upper triangular matrix. What is $\hat{R}$ in terms of $P_1, P_2, \ldots, P_n$, R? What is $\hat{Q}$ in terms of $P_1, P_2, \ldots, P_n$, R?

11. Given on the graph below is a vector x:

[Graph: coordinate axes with the vector x drawn in the first quadrant.]

Construct the vector $x_0 \overset{\text{def}}{=} -\operatorname{sign}(x_1)\|x\|_2e_1$ on this graph. Also construct on the graph a vector v in terms of x and $x_0$ such that $Px = x_0$ where $P = I - 2\dfrac{vv^T}{v^Tv}$.

12. Simplify $\|a - \|a\|_2e_1\|_2^2$ where $a, e_1 \in \mathbb{R}^n$ and $e_1^T = [1, 0, \ldots, 0]$.

3.4 Givens Rotations

The goal is to use one row of a matrix such as
$$\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times \\
 & & a & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & b & \times & \times & \times \\
 & & \times & \times & \times & \times \\
 & & \times & \times & \times & \times
\end{bmatrix}$$
to zero out an element in another row by recombining the two rows. By using
$$\begin{bmatrix}
1 & & & & & & & \\
 & 1 & & & & & & \\
 & & c & & & s & & \\
 & & & 1 & & & & \\
 & & & & 1 & & & \\
 & & -s & & & c & & \\
 & & & & & & 1 & \\
 & & & & & & & 1
\end{bmatrix}$$

where $c^2 + s^2 = 1$, we can accomplish this with an orthogonal matrix. How should we choose c and s? So that
$$-sa + cb = 0;$$
that is,
$$s = b/\sqrt{a^2 + b^2}, \qquad c = a/\sqrt{a^2 + b^2}.$$
The cost is about 6(n−k) operations per eliminated element vs. 4(n−k) operations per eliminated element for a Householder reflection, where k is the column index of the eliminated element.

Note. The matrix
$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
denotes a 2-dimensional clockwise rotation by an angle of θ radians.
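A NumPy sketch of computing and applying one such rotation (illustration, not from the notes):

```python
import numpy as np

# Compute c, s that zero out b against a, then apply the 2-by-2 rotation.
def givens(a, b):
    r = np.hypot(a, b)              # sqrt(a^2 + b^2), avoids overflow
    return a / r, b / r             # c, s with c^2 + s^2 = 1

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])
c, s = givens(A[0, 0], A[1, 0])
G = np.array([[ c, s],
              [-s, c]])
print(G @ A)                        # the (2, 1) entry is now zero
```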

Review questions

1. What is the form of a Givens rotation? What is its geometric interpretation?

2. Show how a typical Givens rotation is determined in the course of computing a QR factorization.

Exercises

1. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

2. Calculate a Givens rotation that zeros out the (4, 1) entry of
$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ -1 & -1 \\ -1 & 1 \end{bmatrix}$$
without changing the 1st and 3rd rows.

3.5 Gram-Schmidt Orthogonalization

Recall that if P is an orthogonal projector for a subspace S, then $y - Py \perp S$. The classical Gram-Schmidt process produces an orthonormal set $q_1, q_2, \ldots, q_n$ from a linearly independent set $a_1, a_2, \ldots, a_n$, and, in particular, $q_1, q_2, \ldots, q_k$ are formed from $a_1, a_2, \ldots, a_k$, $k = 1, 2, \ldots, n$:

[Figure: $a_2$, $q_1$, and $v_2$.]

$$q_1 = a_1/\|a_1\|_2,$$
$$v_2 = a_2 - q_1q_1^Ta_2, \qquad q_2 = v_2/\|v_2\|_2,$$

[Figure: $a_3$, $q_1$, $q_2$, and $v_3$.]

$$v_3 = a_3 - q_1q_1^Ta_3 - q_2q_2^Ta_3, \qquad q_3 = v_3/\|v_3\|_2.$$
More generally,
$$v_j = a_j - \sum_{k=1}^{j-1} q_kq_k^Ta_j, \qquad q_j = v_j/\|v_j\|_2.$$
Gram-Schmidt orthogonalization computes a reduced QR factorization:
$$a_j = \sum_{k=1}^{j-1} q_k\underbrace{(q_k^Ta_j)}_{r_{kj}} + \underbrace{\|v_j\|_2}_{r_{jj}}q_j = \sum_{k=1}^{j} q_kr_{kj},$$
so
$$\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} =
\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}$$
or
$$A = \hat{Q}\hat{R}, \qquad \hat{R} = \begin{bmatrix} \|v_1\|_2 & q_1^Ta_2 & q_1^Ta_3 & \cdots \\ & \|v_2\|_2 & q_2^Ta_3 & \cdots \\ & & \|v_3\|_2 & \cdots \\ & & & \ddots \end{bmatrix}.$$
This is a reduced QR factorization. The algorithm is

for k = 1,2,..., n do {
    v_k = a_k;
    for j = 1,2,..., k−1 do {
        r_jk = q_j^T a_k;
        v_k = v_k − q_j r_jk;
    }
    r_kk = ||v_k||_2;
    q_k = v_k/r_kk;
}

$\hat{Q}\hat{R}$ vs. QR.

The classical Gram-Schmidt process is not very satisfactory numerically. A mathematically equivalent but numerically superior alternative is given by the modified Gram-Schmidt process. Instead of
$$v_3 = a_3 - q_1q_1^Ta_3 - q_2q_2^Ta_3,$$
use
$$v_3 = (I - q_2q_2^T)(I - q_1q_1^T)a_3.$$

[Figure: $a_3$, $q_1$, $q_2$, the intermediate vector $(I - q_1q_1^T)a_3$, and $v_3$.]

Computationally,

v_3 = a_3;
v_3 = v_3 − q_1 q_1^T v_3;
v_3 = v_3 − q_2 q_2^T v_3.

An additional change is also desirable: compute the elements of R row by row instead of column by column. A row-oriented version of MGS is preferable because it lends itself readily to the use of column pivoting to deal with possible rank deficiency, whereas the column-oriented version does not. Both versions compute Q column by column, but after computing each new column $q_k$, the row-oriented version immediately orthogonalizes the remaining vectors, $a_{k+1}, \ldots, a_n$, against $q_k$:

for k = 1,2,..., n do
    v_k = a_k;
for k = 1,2,..., n do {
    r_kk = ||v_k||_2;
    q_k = v_k/r_kk;
    for j = k+1, k+2,..., n do {
        r_kj = q_k^T v_j;
        v_j = v_j − q_k r_kj;
    }
}
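A NumPy sketch of the row-oriented MGS loop above (illustration only):

```python
import numpy as np

def mgs(A):
    V = A.astype(float)               # v_k = a_k
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        for j in range(k + 1, n):     # orthogonalize remaining vectors against q_k
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] -= Q[:, k] * R[k, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
Q, R = mgs(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(2)))   # True True
```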

Review questions

1. Let q1, q2, ..., qn be a sequence obtained from a linearly independent sequence a1, a2, ..., an by Gram-Schmidt orthogonalization. The sequence q1, q2, ..., qn is uniquely determined by what two properties? One involves relationships among the terms of the generated sequence and the other involves relationships between the two sequences. Express the second of these properties as a matrix equation.

2. Reproduce the algorithm for Gram-Schmidt orthogonalization.

3. What is the “defining” idea of the row-oriented version of MGS?

Exercises

1. (a) Give a high-level recursive algorithm for the reduced QR factorization $A = \hat{Q}\hat{R}$ of an m by n matrix by working in terms of a (1, n−1) partitioning of the columns of A and $\hat{Q}$ and a (1, n−1) by (1, n−1) partitioning of $\hat{R}$. (You will have to use the orthogonality of $\hat{Q}$, not to mention the triangularity of $\hat{R}$.)

(b) Give the nonrecursive equivalent of part (a).

(c) How is this algorithm related to the classical or modified Gram-Schmidt process?

2. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

3.6 Singular Value Decomposition

THEOREM (SVD) Let $A \in \mathbb{R}^{m\times n}$, $m \ge n$. There exist orthogonal matrices $U \in \mathbb{R}^{m\times m}$, $V \in \mathbb{R}^{n\times n}$ such that
$$A = U\Sigma V^T, \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \\ & & 0 & \end{bmatrix},$$
where the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. If m < n, one can use the SVD of $A^T$ to get an SVD of A. The SVD provides a geometric interpretation for A:

$V^T$: rotations and/or reflections

$\Sigma$: differential scaling along coordinate axes

$U$: rotations and/or reflections

[Figure: the unit sphere B is mapped to $V^TB$ (rotated/reflected), then to $\Sigma V^TB$ (an ellipse with semi-axes $\sigma_1$, $\sigma_2$), then to $U\Sigma V^TB$.]

Hence $\|A\|_2 = \sigma_1$. The condition number $\kappa_2(A) = \sigma_1/\sigma_n$. Computing the SVD is discussed in Section 4.7.

3.6.1 Application to linear least squares problem

$$\|b - U\Sigma V^Tx\|_2 = \|U^Tb - \Sigma V^Tx\|_2.$$
Let $c = U^Tb$, $y = V^Tx$. The problem is now to minimize
$$\|c - \Sigma y\|_2 = \sqrt{(c_1 - \sigma_1y_1)^2 + \cdots + (c_n - \sigma_ny_n)^2 + c_{n+1}^2 + \cdots + c_m^2},$$
with solution $y_i = c_i/\sigma_i$.

What if $\sigma_{r+1} = \cdots = \sigma_n = 0$ but $\sigma_r > 0$ (rank deficient)? Then $y_{r+1}, \ldots, y_n$ can be anything, and we have a family of solutions $x = Vy$. Suppose we ask for x having minimum $\|x\|_2$?

$$\|x\|_2 = \|Vy\|_2 = \|y\|_2$$

For smallest $\|y\|_2$ choose
$$y_i = \begin{cases} c_i/\sigma_i, & i = 1, 2, \ldots, r, \\ 0, & i = r+1, r+2, \ldots, n. \end{cases}$$

That is,
$$y = \Sigma^\dagger c \quad\text{where}\quad \Sigma^\dagger = \begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & 0 \\ & & \sigma_r^{-1} & \\ & 0 & & 0 \end{bmatrix}.$$

Moore–Penrose pseudoinverse:
$$x = A^\dagger b, \qquad A^\dagger = V\Sigma^\dagger U^T.$$
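A NumPy sketch of the minimum-norm solution via the SVD; the tolerance for declaring a singular value zero is an assumption of this sketch:

```python
import numpy as np

# x = V Sigma^+ U^T b, inverting only the nonzero singular values.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                     # rank deficient: column 2 = 2 * column 1
b = np.array([1.0, 1.0, 1.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]              # Sigma^+
x = Vt.T @ (s_inv * (U.T @ b))
print(x)
print(np.linalg.pinv(A) @ b)                   # same minimum-norm solution
```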

3.6.2 Data Compression

THEOREM Let $A \in \mathbb{R}^{m\times n}$, $n \le m$, and let B be a matrix of rank $k \le n$ for which $\|B - A\|_F$ is smallest. Then
$$B = \sum_{i=1}^{k} u_i\sigma_iv_i^T$$
where
$$A = \sum_{i=1}^{n} u_i\sigma_iv_i^T = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}
\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$
is the SVD of A. The SVD is a good way to compute the rank of a matrix.
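A NumPy sketch of the best rank-k approximation described by the theorem above (illustration only):

```python
import numpy as np

# B = sum_{i=1}^k u_i sigma_i v_i^T, the truncated SVD.
def best_rank_k(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = best_rank_k(A, 2)
print(np.linalg.matrix_rank(B))                # 2
print(np.linalg.norm(A - B, 'fro'))            # smallest over all rank-2 matrices B
```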

Review questions

1. Under what conditions does a matrix $A \in \mathbb{R}^{m\times n}$, m ≥ n, have a singular value decomposition?

2. Specify precisely the form of a singular value decomposition of a matrix $A \in \mathbb{R}^{m\times n}$ for n < m.

3. What is the 2-norm of a matrix in terms of its SVD?

4. What is the 2-norm condition number of a nonsingular matrix in terms of its SVD?

5. What additional requirement makes the solution of any linear least squares problem unique?

6. What is the Moore-Penrose pseudoinverse of a rank deficient matrix $A \in \mathbb{R}^{m\times n}$, n < m?

7. What is the solution of a linear least squares problem with a rank deficient coefficient matrix?

8. What is the matrix B of rank k ≤ n for which $\|B - A\|_F$ is smallest, where $A \in \mathbb{R}^{m\times n}$, n ≤ m?
