Chapter 3

LEAST SQUARES PROBLEMS

[Figure: survey points A, B, C, D, E, F at various elevations above the sea.]

One application is geodesy & surveying. Let z = elevation, and suppose we have the measurements:

$z_A \approx 1.,\quad z_B \approx 2.,\quad z_C \approx 3.,\quad z_B - z_A \approx 1.,\quad z_C - z_B \approx 2.,\quad z_C - z_A \approx 1.$

This is overdetermined and inconsistent:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} z_A \\ z_B \\ z_C \end{bmatrix} \approx
\begin{bmatrix} 1. \\ 2. \\ 3. \\ 1. \\ 2. \\ 1. \end{bmatrix}.$$

Another application is fitting a curve to given data:

[Figure: data points $(x_i, y_i)$ and the fitted line $y = ax + b$.]

$$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} \approx
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

More generally, $A_{m\times n}x_n \approx b_m$ where A is known exactly, $m \ge n$, and b is subject to independent random errors of equal variance (which can be achieved by scaling the equations). Gauss (1821) proved that the "best" solution x

minimizes $\|b - Ax\|_2$, i.e., the sum of the squares of the residual components. Hence, we might write $Ax \approx_2 b$. Even if the 2-norm is inappropriate, it is the easiest to work with.

3.1 The Normal Equations
3.2 QR Factorization
3.3 Householder Reflections
3.4 Givens Rotations
3.5 Gram-Schmidt Orthogonalization
3.6 Singular Value Decomposition

3.1 The Normal Equations

Recall the inner product $x^Ty$ for $x, y \in \mathbb{R}^m$. (What is the geometric interpretation of the inner product in $\mathbb{R}^3$?) A sequence of vectors $x_1, x_2, \ldots, x_n$ in $\mathbb{R}^m$ is orthogonal if $x_i^Tx_j = 0$ if and only if $i \neq j$, and orthonormal if $x_i^Tx_j = \delta_{ij}$. Two subspaces S, T are orthogonal if $x \in S,\ y \in T \Rightarrow x^Ty = 0$. If $X = [x_1, x_2, \ldots, x_n]$, then orthonormal means $X^TX = I$.

Exercise. Define what it means for a set (rather than a multiset) of vectors to be orthogonal.

An orthogonal matrix Q satisfies $Q^T = Q^{-1}$. In 2D or in 3D it represents a reflection and/or rotation. The problem

$$\min_x \|b - Ax\|_2 \iff \min_{x_1,\ldots,x_n} \|b - (x_1a_1 + \cdots + x_na_n)\|_2$$

where $A = [a_1, \ldots, a_n]$: find a linear combination of the columns of A which is nearest b.

[Figure: b, its orthogonal projection Ax onto $\mathcal{R}(A)$, and the residual r.]

Here $\mathcal{R}(A)$ = column space of A. The best approximation is when $r \perp \mathcal{R}(A)$. Hence the best approximation Ax = orthogonal projection of b onto $\mathcal{R}(A)$, i.e., $r \perp a_j$, $j = 1, 2, \ldots, n$:
$$A^Tr = 0 \iff A^T(b - Ax) = 0 \iff (A^TA)x = A^Tb \qquad \text{(normal equations)}.$$
Clearly x is unique $\iff$ the columns of A are linearly independent. Otherwise, there are infinitely many solutions. The least squares problem is sometimes written
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}.$$
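To make this concrete, here is a small NumPy sketch (an illustration, not part of the notes) applying the normal equations and the augmented system to the surveying example from the start of the chapter:

```python
import numpy as np

# Surveying example from the start of the chapter.
A = np.array([[ 1,  0,  0],
              [ 0,  1,  0],
              [ 0,  0,  1],
              [-1,  1,  0],
              [ 0, -1,  1],
              [-1,  0,  1]], dtype=float)
b = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 1.0])

# Normal equations: (A^T A) x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
print("x =", x)
print("A^T r =", A.T @ (b - A @ x))           # ~ 0: residual is orthogonal to R(A)

# Equivalent augmented system  [I A; A^T 0] [r; x] = [b; 0].
m, n = A.shape
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
rx = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
print("x from augmented system =", rx[m:])
```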

analytical argument
Let rank(A) = n. Therefore $A^TA$ is nonsingular. Let x satisfy $A^Tr = 0$ where $r = b - Ax$. Then for any other value $x + w$,
$$\|b - A(x+w)\|_2^2 = \|r - Aw\|_2^2 = r^Tr - 2w^TA^Tr + w^TA^TAw = \|b - Ax\|_2^2 + \|Aw\|_2^2.$$
Hence, a solution of the normal equations is a solution of the least squares problem. The solution x is a unique minimum if rank(A) = n because then $\|Aw\|_2 = 0 \Rightarrow w = 0$.

[Figure: b, Ax, Aw, r, and $r - Aw$ relative to $\mathcal{R}(A)$.]

the pseudo-inverse
Assume rank(A) = n. Then $x = (A^TA)^{-1}A^Tb$. We call $(A^TA)^{-1}A^T = A^\dagger$ the pseudo-inverse, and we write $x = A^\dagger b$. (The definition can be extended to the case rank(A) < n.) Note that $A^\dagger A = I$.

What about AA†? This is an orthogonal projector.
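A quick NumPy check (a sketch, not from the notes) that $AA^\dagger$ projects onto $\mathcal{R}(A)$:

```python
import numpy as np

# A sketch: for full-column-rank A, A_dagger = (A^T A)^{-1} A^T, and A @ A_dagger
# is the orthogonal projector onto R(A).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
A_dag = np.linalg.solve(A.T @ A, A.T)            # (A^T A)^{-1} A^T
print(np.allclose(A_dag, np.linalg.pinv(A)))     # agrees with NumPy's pseudo-inverse

P = A @ A_dag
b = np.array([1.0, 0.0, 2.0])
print(np.allclose(A.T @ (b - P @ b), 0.0))       # b - A A^dagger b is orthogonal to R(A)
```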

orthogonal projector
The product $\dfrac{vv^T}{v^Tv} = vv^\dagger$ produces an orthogonal projection of a vector onto $\operatorname{span}\{v\}$ because

1. $vv^\dagger x \in \operatorname{span}\{v\}$,

2. $x - vv^\dagger x \perp \operatorname{span}\{v\}$.

Alternatively, $\dfrac{vv^T}{v^Tv}x$ is the closest vector to x that is some multiple of v.

[Figure: x, v, and the projection $\frac{vv^T}{v^Tv}x$ onto span{v}.]

Similarly $AA^\dagger = A(A^TA)^{-1}A^T$, rank(A) = n ≤ m, produces an orthogonal projection onto $\mathcal{R}(A)$, because $AA^\dagger b \in \mathcal{R}(A)$ (since $AA^\dagger b = A(A^\dagger b)$). For any b, $b - AA^\dagger b \perp \mathcal{R}(A)$ since $A^T(b - AA^\dagger b) = 0$.

DEFN A matrix P is an orthogonal projector if for any x

$$x - Px \perp \mathcal{R}(P) \iff \forall x,\ (x - Px)^TP = 0 \iff P^TP = P \iff \underbrace{P^T = P}_{\text{symmetry}}\ \text{and}\ \underbrace{P^2 = P}_{\text{idempotence}}.$$
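A small numerical check of these two conditions for the rank-one projector introduced above (a NumPy sketch, assuming nothing beyond the formulas in the text):

```python
import numpy as np

# Check that P = v v^T / (v^T v) is symmetric, idempotent, and that x - Px
# is orthogonal to span{v}.
v = np.array([1.0, 2.0, 2.0])
P = np.outer(v, v) / (v @ v)

x = np.array([3.0, -1.0, 4.0])
print(np.allclose(P, P.T))                 # symmetry
print(np.allclose(P @ P, P))               # idempotence
print(np.isclose(v @ (x - P @ x), 0.0))    # x - Px is orthogonal to v
```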

(If S is a subspace of $\mathbb{R}^n$, then its orthogonal complement is $S^\perp := \{x \mid x^Ty = 0 \text{ for all } y \in S\}$.) Every $v \in \mathbb{R}^n$ has a unique decomposition $v = x + y$, $x \in S$, $y \in S^\perp$.

solving the normal equations
The direct approach to solving $(A^TA)x = A^Tb$ is to

1. form $A^TA$,

2. use Cholesky to get the factorization $GG^T$,

3. set $x = G^{-T}(G^{-1}(A^Tb))$.

Example. With 4-digit round-to-even floating-point arithmetic
$$A = \begin{bmatrix} 1 & 1.02 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
A^TA = \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.0404 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 3.02 \\ 3.02 & 3.04 \end{bmatrix}.$$

The result is not even positive definite. The problem is that (it can be shown that)

$$\kappa_2(A^TA) = \kappa_2(A)^2 \quad\text{where}\quad \kappa_2(A) = \|A^\dagger\|_2\|A\|_2.$$

Any roundoff made in forming $A^TA$ will have a very significant effect if A is ill-conditioned. The solution is to do the entire computation in double precision or to look for another algorithm.
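A NumPy sketch (illustration only) of the Cholesky route and of the squared condition number; in double precision the small example above is still positive definite:

```python
import numpy as np

# Solve the normal equations (A^T A) x = A^T b by Cholesky, and compare conditioning.
A = np.array([[1.0, 1.02],
              [1.0, 1.00],
              [1.0, 1.00]])
b = np.array([1.0, 2.0, 3.0])

G = np.linalg.cholesky(A.T @ A)           # A^T A = G G^T
x = np.linalg.solve(G.T, np.linalg.solve(G, A.T @ b))
print("x =", x)
print("kappa2(A)    =", np.linalg.cond(A))
print("kappa2(A^TA) =", np.linalg.cond(A.T @ A))   # approximately kappa2(A)**2
```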

Review questions

1. Define a least squares solution to an overdetermined system Ax ≈ b using matrix notation.

2. For a least squares solution to an overdetermined system Ax ≈ b, how should the "equations" be scaled?

3. Define an orthogonal sequence; an orthonormal sequence.

4. How is a subspace represented computationally?

5. What does it mean for two subspaces to be orthogonal?

6. What is the orthogonal complement of a subspace?

7. What is the null space of a matrix?

8. Give an alternative expression for R(A)⊥ which is more useful computationally.

9. Give a geometric interpretation of a linear least squares problem Ax ≈ b.

10. Give a necessary and sufficient condition for existence of a solution to a linear least squares problem Ax ≈ b; for existence of a unique solution.

11. What are the normal equations for a linear least squares problem Ax ≈ b?

12. Express the linear least squares problem Ax ≈ b as a system of m + n equations in m + n unknowns where m and n are the dimensions of A.

13. If x satisfies the normal equations, show that no other vector can be a better solution to the least squares problem.

14. If $y_0$ is the orthogonal projection of y onto a subspace S, what two conditions does $y_0$ satisfy?

15. What does $\dfrac{vv^T}{v^Tv}$ do?

16. Give a formula for the orthogonal projector onto a subspace S in terms of a basis a1, a2, ..., an for S.

17. What two simple conditions must a matrix P satisfy for it to be an orthogonal projector?

18. What is an oblique projector?

19. What does idempotent mean?

20. Why is the explicit use of the normal equations undesirable computationally?

21. If we do use the normal equations, what method is used for the matrix factorization?

Exercises

1. What can you say about a nonsingular orthogonal projector?

2. (a) What is the orthogonal complement of $\mathcal{R}(A)$ where
$$A = \begin{bmatrix} 1 & 1.001 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}?$$

(b) What is the projector onto the orthogonal complement of $\mathcal{R}(A)$?

(c) What is the projector onto $\mathcal{R}(A)$?

3. Construct the orthogonal projector onto the plane through (0,0,0), (1,1,0), (1,0,1).

4. Assuming u and v are nonzero column vectors, show that I + uvT is orthogonal only if u = −2v/(vTv).

5. Assume that $A \in \mathbb{R}^{m\times n}$ has linearly independent columns. Hence $A^TA$ has a (unique) Cholesky factorization. Prove that there exists a factorization $A = Q_1R_1$ where $Q_1 \in \mathbb{R}^{m\times n}$ has columns forming an orthonormal set and $R_1 \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of $Q_1$, $R_1$, and b?

6. Let x∗ be a least squares solution to an overdetermined system Ax ≈ b. What is the geometric interpretation of Ax∗? When is Ax∗ unique? When is x∗ unique?

7. Show that an upper triangular orthogonal matrix is diagonal. What are the diagonal elements?

8. Suppose that the matrix
$$\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}$$
is orthogonal where A and C are square submatrices. Prove in a logical, deductive fashion that B = 0. (Hint: there is no need to have a formula for the inverse of a 2 × 2 block upper triangular matrix, and there is no need to consider individual elements, columns, or rows of A, B, or C.)

9. Show that $\|Qx\|_2 = \|x\|_2$ if Q is orthogonal.

10. Show that if Q is an m by m orthogonal matrix and A is an m by n matrix, then $\|QA\|_2 = \|A\|_2$.

11. Assume
$$\begin{bmatrix} \hat{Q} & q \\ 0^T & \rho \end{bmatrix}$$
is an orthogonal matrix where $\hat{Q}$ is square and ρ is a scalar. What can we say about $\hat{Q}$, q, and ρ? (The answer should be simplified.)

12. (a) What is the orthogonal projector for $\operatorname{span}\{v_1, v_2\}$ where $v_1$, $v_2$ are linearly independent real vectors?

(b) How does this simplify if $v_2$ is orthogonal to $v_1$? In particular, what is the usual way of expressing an orthogonal projector in this case?

3.2 QR Factorization

The least squares problem $\min_x \|b - Ax\|_2$ is simplified if A is reduced to upper triangular form by means of orthogonal transformations:

$$Q^TA = R \quad \text{(right triangular)}$$

Then
$$\|b - Ax\|_2 = \|Q^T(b - Ax)\|_2 \qquad \text{(see exercise 3.1.9)}$$
$$= \|Q^Tb - Rx\|_2, \qquad \text{partition } Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}, \quad R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}$$
$$= \left\|\begin{bmatrix} c \\ d \end{bmatrix} - \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}x\right\|_2
= \left\|\begin{bmatrix} c - \hat{R}x \\ d \end{bmatrix}\right\|_2
= \sqrt{\|c - \hat{R}x\|_2^2 + \|d\|_2^2}.$$

Obviously this is minimized for $x = \hat{R}^{-1}c$, which is computed by back substitution. A special case is m = n; i.e., A is square. This method is numerically very stable because there is no growth in the elements of the reduced matrix:
$$\|R\|_2 = \|Q^TA\|_2 = \cdots = \|A\|_2.$$
(Show this for matrices.) It is twice as much work as Gaussian elimination: $\frac{2}{3}n^3$ multiplications using Householder reflections.
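A minimal NumPy sketch of this procedure, assuming the reduced factorization returned by np.linalg.qr (illustration only):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q1, Rhat = np.linalg.qr(A)            # reduced factorization: A = Q1 @ Rhat
c = Q1.T @ b
x = np.linalg.solve(Rhat, c)          # triangular solve (back substitution in principle)
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same solution
```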

Review questions

1. What is an orthogonal matrix?

2. What is the effect of multiplication by an orthogonal matrix on the 2-norm of a vector? 2-norm of a matrix? Prove it.

3. What is a QR factorization?

4. Show how the QR factorization of a matrix A can be used to solve the linear least squares problem.

Exercises

1. Consider the problem of solving an overdetermined system Ax ≈ b in the least squares sense. Suppose A is such that it is possible to compute an accurate factorization LU where L is a square lower triangular matrix and U is upper triangular with the same dimensions as A. Why would this be of little use in solving the least squares problem; that is, what is special about a QR factorization?

2. Assume that $A \in \mathbb{R}^{m\times n}$ has linearly independent columns. Hence $A^TA$ has a (unique) Cholesky factorization. Prove that there exists a factorization $A = Q_1R_1$ where $Q_1 \in \mathbb{R}^{m\times n}$ has columns forming an orthonormal set and $R_1 \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. What is the solution of the least squares problem Ax ≈ b in terms of $Q_1$, $R_1$, and b?

3.3 Householder Reflections

Definition (Householder) An elementary matrix has the form

I + rank one matrix.

It is easy to show that an elementary matrix has the form

$$I + uv^T, \qquad u \neq 0,\ v \neq 0.$$

An example of an elementary matrix is a Gauss transformation $M_k = I - m_ke_k^T$. An elementary matrix is efficient computationally; it is easy to invert. It is not difficult to show that an orthogonal elementary matrix has the form
$$I - 2\frac{vv^T}{v^Tv} =: P.$$

P is symmetric and $P^2 = I$. (Exercise. Prove this.)

Recall that $\dfrac{vv^T}{v^Tv}x$ is the orthogonal projection of x onto v.

[Figure: x, v, and the projection $\frac{vv^T}{v^Tv}x$.]

What does $P = I - 2\dfrac{vv^T}{v^Tv}$ do?

[Figure: x, its projection $\frac{vv^T}{v^Tv}x$ onto v, the reflected vector $x - 2\frac{vv^T}{v^Tv}x$, and the hyperplane orthogonal to v.]

It reflects in the hyperplane through the origin orthogonal to v. (How should one store P? How should one multiply by P?) In sum:

[Figure: the (Householder) reflection $P = I - 2\frac{vv^T}{v^Tv}$ maps x to Px, the mirror image of x in the hyperplane orthogonal to v.]

Note that the length of v is irrelevant—only its direction matters. In practice we want to determine v so that for some given x
$$Px = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad \text{for some } \alpha \in \mathbb{R}.$$

Recalling that orthogonal matrices preserve Euclidean length,

$$\|x\|_2 = \|Px\|_2 = |\alpha| \implies \alpha = \pm\|x\|_2.$$

To get v to point in the right direction, choose v = x − Px.

[Figure: two choices, $Px = \operatorname{sign}(x_1)\|x\|_2e_1$ (first components of x and Px have the same sign) and $Px = -\operatorname{sign}(x_1)\|x\|_2e_1$ (first components of x and Px have opposite sign), each with $v = x - Px$.]

Which do we choose?

Note that for very acute angles between x and Px, the direction of v becomes very sensitive to perturbations in x or Px—there is a lot of cancellation in x − Px if they point in similar directions. Therefore,

$$v = x + \operatorname{sign}(x_1)\|x\|_2e_1, \quad \operatorname{sign}(0) = 1, \qquad Px = -\operatorname{sign}(x_1)\|x\|_2e_1.$$
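A NumPy sketch of this choice of v and the resulting reflection (illustration only, with the sign convention above):

```python
import numpy as np

# Choose v = x + sign(x1) ||x||_2 e1, so that Px = -sign(x1) ||x||_2 e1,
# where P = I - beta v v^T and beta = 2 / (v^T v).
def householder_vector(x):
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0        # sign(0) = 1, as in the text
    v[0] += sign * np.linalg.norm(x)
    beta = 2.0 / (v @ v)
    return v, beta

x = np.array([3.0, 4.0, 0.0])
v, beta = householder_vector(x)
P = np.eye(3) - beta * np.outer(v, v)
print(P @ x)                                  # approximately [-5, 0, 0]
```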

computing P
LAPACK normalizes v so that $v_1 = 1$, and hence uses
$$v^{\text{new}} = v/v_1$$
instead. Also, it is careful to avoid underflow/overflow in the computation of $\|x\|_2$. It computes
$$\beta := \frac{2}{v^Tv},$$
so $P = I - \beta vv^T$.

computing PA for $A \in \mathbb{R}^{m\times n}$
We are given v and β where $P = I - \beta vv^T$. The product $P \cdot A$ would require $m^2n$ multiplications. However, $A - v(\beta(v^TA))$ requires mn multiplications followed by n multiplications followed by mn multiplications. Hence, $m^2n$ multiplications vs. 2mn multiplications.
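A short NumPy sketch of this rank-one update (illustration only):

```python
import numpy as np

# Apply P = I - beta*v*v^T to A as a rank-1 update, ~2mn multiplications.
def apply_householder(A, v, beta):
    w = beta * (v @ A)           # beta * (v^T A)
    return A - np.outer(v, w)    # rank-1 update

# check against forming P explicitly (m^2 n work)
m, n = 5, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
v = rng.standard_normal(m)
beta = 2.0 / (v @ v)
P = np.eye(m) - beta * np.outer(v, v)
print(np.allclose(P @ A, apply_householder(A, v, beta)))   # True
```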

reduction to triangular form
As in Gaussian elimination, we reduce A column by column.

Recursive description. The case $m \ge n = 1$ is an exercise. Consider the case $m \ge n \ge 2$. Determine orthogonal P such that
$$PA = \begin{bmatrix} \alpha & a^T \\ 0 & \tilde{A} \end{bmatrix}.$$
P is constructed as described above. Then
$$A = P\begin{bmatrix} \alpha & a^T \\ 0 & \tilde{A} \end{bmatrix}$$
and recursion gives $\tilde{A} = \tilde{Q}\tilde{R}$ so that
$$A = QR \quad\text{where}\quad Q = P\begin{bmatrix} 1 & 0^T \\ 0 & \tilde{Q} \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} \alpha & a^T \\ 0 & \tilde{R} \end{bmatrix}.$$

Nonrecursive description. First, determine orthogonal $P_1$ such that
$$P_1A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & & & \\ 0 & & \tilde{A}_1 & \\ 0 & & & \\ 0 & & & \\ 0 & & & \end{bmatrix}.$$
Similarly,
$$\tilde{P}_2\tilde{A}_1 = \begin{bmatrix} \times & \times & \times \\ 0 & & \\ 0 & \tilde{A}_2 & \\ 0 & & \\ 0 & & \end{bmatrix}, \qquad
\tilde{P}_3\tilde{A}_2 = \begin{bmatrix} \times & \times \\ 0 & \\ 0 & \tilde{A}_3 \\ 0 & \end{bmatrix}, \qquad
\tilde{P}_4\tilde{A}_3 = \begin{bmatrix} \times \\ 0 \\ 0 \end{bmatrix}.$$
Then
$$\underbrace{\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & \tilde{P}_4 \end{bmatrix}}_{P_4}
\underbrace{\begin{bmatrix} 1 & & \\ & 1 & \\ & & \tilde{P}_3 \end{bmatrix}}_{P_3}
\underbrace{\begin{bmatrix} 1 & \\ & \tilde{P}_2 \end{bmatrix}}_{P_2}
P_1A =
\underbrace{\begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \times & \times \\ 0 & 0 & 0 & \times \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}}_{R}.$$
Thus $P_4P_3P_2P_1 = Q^T$, but we do not need to form this product. Rather we compute $Q^Tb = P_4(P_3(P_2(P_1b)))$.

Note. If m = n, then n − 1 reductions suffice.

Householder orthogonalization
At the beginning of the kth stage we have

[Figure: the first k−1 rows hold the computed entries $r_{11}, \ldots, r_{1n}$ through $r_{k-1,k-1}, \ldots, r_{k-1,n}$ of R; the trailing block $\tilde{A}_{k-1}$ remains to be reduced.]

We want to find $\tilde{P}_k = I - \beta_k\tilde{v}_k\tilde{v}_k^T$ such that
$$\tilde{P}_k\tilde{A}_{k-1} = \begin{bmatrix} r_{kk} & r_{k,k+1} & \cdots & r_{kn} \\ 0 & & & \\ \vdots & & \tilde{A}_k & \\ 0 & & & \end{bmatrix}.$$

the algorithm

Householder orthogonalization
for k = 1,2,..., n do {
    determine a Householder reflection $\tilde{P}_k = I - \beta_k\tilde{v}_k\tilde{v}_k^T$ such that
    $$\tilde{P}_k\begin{bmatrix} a_{kk} \\ a_{k+1,k} \\ \vdots \\ a_{mk} \end{bmatrix} = \begin{bmatrix} r_{kk} \\ 0 \\ \vdots \\ 0 \end{bmatrix};$$
    for j = k+1, k+2,..., n do
    $$\begin{bmatrix} r_{kj} \\ a_{k+1,j} \\ \vdots \\ a_{mj} \end{bmatrix} = \tilde{P}_k\begin{bmatrix} a_{kj} \\ a_{k+1,j} \\ \vdots \\ a_{mj} \end{bmatrix};$$
}

The storage scheme is as follows: after the kth step we have

[Figure: storage scheme after the kth step: the computed part of R occupies the upper triangle of the first k rows, the Householder vectors $\tilde{v}_1, \ldots, \tilde{v}_k$ occupy the subdiagonal parts of the first k columns, $\tilde{A}_k$ occupies the trailing block, and $\beta_1, \ldots, \beta_k$ are stored in a separate array.]
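A compact NumPy sketch of the orthogonalization loop above; it keeps the vectors $\tilde{v}_k$ and scalars $\beta_k$ in Python lists rather than in the packed storage scheme just described:

```python
import numpy as np

def householder_qr(A):
    A = A.astype(float).copy()
    m, n = A.shape
    V, betas = [], []
    for k in range(n):
        x = A[k:, k]
        v = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0          # sign(0) = 1
        v[0] += sign * np.linalg.norm(x)
        beta = 2.0 / (v @ v)
        # apply P~_k = I - beta v v^T to the trailing submatrix (rank-1 update)
        A[k:, k:] -= np.outer(v, beta * (v @ A[k:, k:]))
        V.append(v)
        betas.append(beta)
    return np.triu(A[:n, :]), V, betas             # R, Householder vectors, betas

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
R, V, betas = householder_qr(A)
print(R)                                           # agrees with np.linalg.qr up to signs
```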

Review questions

1. What is an elementary matrix? Give an explicit form for an elementary matrix.

2. Give an explicit form for an elementary orthogonal matrix. What do we call such a matrix?

3. Describe geometrically the effect of multiplying a vector x by a matrix $H = I - 2v(v^Tv)^{-1}v^T$.

4. In practice H is constructed so that y = Hx where x and y are given. What condition must y satisfy for H to exist? If all but the first element of y are to vanish, what are the choices for y?

5. Assuming the ability to construct a Householder transformation that maps a given vector to another of the same length, give a recursive algorithm for QR factorization using Householder reflections. As always, use partitioned matrices to describe the algorithm.

6. How much storage is required for the matrix Q in a QR factorization using Householder reflections?

Exercises

1. Suppose that $P_2P_1A = R$ where $P_i = I - \beta_iv_iv_i^T$ and
$$\beta_1 = \frac{1}{3},\quad v_1 = \begin{bmatrix} 2 \\ -1 \\ 0 \\ 1 \end{bmatrix},\quad
\beta_2 = \frac{1}{9},\quad v_2 = \begin{bmatrix} 0 \\ 4 \\ 1 \\ -1 \end{bmatrix},
\quad\text{and}\quad R = \begin{bmatrix} 3/2 & 0 \\ 0 & -9/4 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Determine the least squares solution of Ax ≈ b where

$$b = \left[-\frac{3}{2},\ -\frac{5}{2},\ 0,\ \frac{11}{4}\right]^T.$$
Do an efficient computation and show every step.

2. Determine a reflection $P = I - 2v(v^Tv)^{-1}v^T$ such that Px is a multiple of y where $x \neq 0$ and $y \neq 0$. Of the two possibilities, which uses for v the sum of two vectors separated by an angle of at most π/2?

3. Prove or disprove: a Householder reflection is positive definite.

4. Write an algorithm for overwriting b with $P_n \cdots P_2P_1b$ where $P_k = I - \beta_kv_kv_k^T$ and $v_k = [0\ \cdots\ 0\ v_{kk}\ \cdots\ v_{mk}]^T$. Only the following array values should be referenced by your algorithm: $b_i$, 1 ≤ i ≤ m; $v_{ik}$, k ≤ i ≤ m, 1 ≤ k ≤ n; $\beta_k$, 1 ≤ k ≤ n.

5. Let
$$A = \begin{bmatrix} 7/8 & -21/25 \\ -1 & 23/50 \\ 2 & -48/25 \\ 2 & 102/25 \end{bmatrix}.$$

Determine $\beta_1, \beta_2, v_1, v_2$ and right triangular matrix R such that $P_2P_1A = R$ where $P_i = I - \beta_iv_iv_i^T$. (As a check confirm that $A = P_1P_2R$.)

6. Assuming u and v are nonzero column vectors, show that $I + uv^T$ is orthogonal only if $u = -2v/(v^Tv)$.

7. Count the number of multiplications for the Householder orthogonalization algorithm as described in this section.

8. Recall that for a Householder reflection $P = I - 2\dfrac{vv^T}{v^Tv}$ the vector Px is the mirror reflection of a vector x in the hyperplane through the origin normal to v. Let x and y be linearly independent vectors of equal Euclidean length. By means of a pictorial argument determine a formula for v such that Px = y.

9. Apply Householder orthogonalization to compute the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

Normalize the Householder vectors $v_k$ so that their first (nonzero) element is 1, as does LAPACK, and calculate the $\beta_k$ values. (Do not actually form the matrices $P_k$ or, worse yet, the matrix Q.)

10. The application of a Householder reduction to an m by n matrix A, n < m,

$$P_n \cdots P_2P_1A = R$$

yields an upper triangular matrix R. This can be used to create a reduced QR factorization $A = \hat{Q}\hat{R}$ where $\hat{Q}$ is m by n and $\hat{R}$ is a square upper triangular matrix. What is $\hat{R}$ in terms of $P_1, P_2, \ldots, P_n$, R? What is $\hat{Q}$ in terms of $P_1, P_2, \ldots, P_n$, R?

11. Given on the graph below is a vector x:

[Graph: coordinate axes with the vector x drawn in the first quadrant.]

Construct the vector $x_0 \overset{\text{def}}{=} -\operatorname{sign}(x_1)\|x\|_2e_1$ on this graph. Also construct on the graph a vector v in terms of x and $x_0$ such that $Px = x_0$ where $P = I - 2\dfrac{vv^T}{v^Tv}$.

12. Simplify $\|a - \|a\|_2e_1\|_2^2$ where $a, e_1 \in \mathbb{R}^n$ and $e_1^T = [1, 0, \ldots, 0]$.

3.4 Givens Rotations

The goal is to use one row of a matrix such as
$$\begin{bmatrix}
\times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times \\
 & & a & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & & \times & \times & \times \\
 & & b & \times & \times & \times \\
 & & \times & \times & \times & \times \\
 & & \times & \times & \times & \times
\end{bmatrix}$$
to zero out an element in another row by recombining the two rows. By using
$$\begin{bmatrix}
1 & & & & & & & \\
 & 1 & & & & & & \\
 & & c & & & s & & \\
 & & & 1 & & & & \\
 & & & & 1 & & & \\
 & & -s & & & c & & \\
 & & & & & & 1 & \\
 & & & & & & & 1
\end{bmatrix}$$

where $c^2 + s^2 = 1$, we can accomplish this with an orthogonal matrix. How should we choose c and s? So that
$$-sa + cb = 0;$$
that is,
$$s = b/\sqrt{a^2 + b^2}, \qquad c = a/\sqrt{a^2 + b^2}.$$
The cost is about 6(n−k) operations per eliminated element vs. 4(n−k) operations per eliminated element for a Householder reflection, where k is the column index of the eliminated element.

Note. The matrix
$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
denotes a 2-dimensional clockwise rotation by an angle of θ radians.
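A NumPy sketch of computing and applying one such rotation (illustration, not from the notes):

```python
import numpy as np

# Compute c, s that zero out b against a, then apply the 2-by-2 rotation.
def givens(a, b):
    r = np.hypot(a, b)              # sqrt(a^2 + b^2), avoids overflow
    return a / r, b / r             # c, s with c^2 + s^2 = 1

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])
c, s = givens(A[0, 0], A[1, 0])
G = np.array([[ c, s],
              [-s, c]])
print(G @ A)                        # the (2, 1) entry is now zero
```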

Review questions

1. What is the form of a Givens rotation? What is its geometric interpretation?

2. Show how a typical Givens rotation is determined in the course of computing a QR factorization.

Exercises

1. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

2. Calculate a Givens rotation that zeros out the (4, 1) entry of
$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ -1 & -1 \\ -1 & 1 \end{bmatrix}$$
without changing the 1st and 3rd rows.

3.5 Gram-Schmidt Orthogonalization

Recall that if P is an orthogonal projector for a subspace S, then $y - Py \perp S$. The classical Gram-Schmidt process produces an orthonormal set $q_1, q_2, \ldots, q_n$ from a linearly independent set $a_1, a_2, \ldots, a_n$, and, in particular, $q_1, q_2, \ldots, q_k$ are formed from $a_1, a_2, \ldots, a_k$, $k = 1, 2, \ldots, n$:

[Figure: $a_2$, $q_1$, and $v_2$.]

$$q_1 = a_1/\|a_1\|_2,$$
$$v_2 = a_2 - q_1q_1^Ta_2, \qquad q_2 = v_2/\|v_2\|_2,$$

[Figure: $a_3$, $q_1$, $q_2$, and $v_3$.]

$$v_3 = a_3 - q_1q_1^Ta_3 - q_2q_2^Ta_3, \qquad q_3 = v_3/\|v_3\|_2.$$
More generally,
$$v_j = a_j - \sum_{k=1}^{j-1} q_kq_k^Ta_j, \qquad q_j = v_j/\|v_j\|_2.$$
Gram-Schmidt orthogonalization computes a reduced QR factorization:
$$a_j = \sum_{k=1}^{j-1} q_k\underbrace{(q_k^Ta_j)}_{r_{kj}} + \underbrace{\|v_j\|_2}_{r_{jj}}q_j = \sum_{k=1}^{j} q_kr_{kj},$$
so
$$\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} =
\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ & r_{22} & \cdots & r_{2n} \\ & & \ddots & \vdots \\ & & & r_{nn} \end{bmatrix}$$
or
$$A = \hat{Q}\hat{R}, \qquad \hat{R} = \begin{bmatrix} \|v_1\|_2 & q_1^Ta_2 & q_1^Ta_3 & \cdots \\ & \|v_2\|_2 & q_2^Ta_3 & \cdots \\ & & \|v_3\|_2 & \cdots \\ & & & \ddots \end{bmatrix}.$$
This is a reduced QR factorization. The algorithm is

for k = 1,2,..., n do {
    v_k = a_k;
    for j = 1,2,..., k−1 do {
        r_jk = q_j^T a_k;
        v_k = v_k − q_j r_jk;
    }
    r_kk = ||v_k||_2;
    q_k = v_k/r_kk;
}

$\hat{Q}\hat{R}$ vs. QR.

The classical Gram-Schmidt process is not very satisfactory numerically. A mathematically equivalent but numerically superior alternative is given by the modified Gram-Schmidt process. Instead of
$$v_3 = a_3 - q_1q_1^Ta_3 - q_2q_2^Ta_3,$$
use
$$v_3 = (I - q_2q_2^T)(I - q_1q_1^T)a_3.$$

[Figure: $a_3$, $q_1$, $q_2$, the intermediate vector $(I - q_1q_1^T)a_3$, and $v_3$.]

Computationally,

v_3 = a_3;
v_3 = v_3 − q_1 q_1^T v_3;
v_3 = v_3 − q_2 q_2^T v_3.

An additional change is also desirable: compute the elements of R row by row instead of column by column. A row-oriented version of MGS is preferable because it lends itself readily to the use of column pivoting to deal with possible rank deficiency, whereas the column-oriented version does not. Both versions compute Q column by column, but after computing each new column $q_k$, the row-oriented version immediately orthogonalizes the remaining vectors, $a_{k+1}, \ldots, a_n$, against $q_k$:

for k = 1,2,..., n do
    v_k = a_k;
for k = 1,2,..., n do {
    r_kk = ||v_k||_2;
    q_k = v_k/r_kk;
    for j = k+1, k+2,..., n do {
        r_kj = q_k^T v_j;
        v_j = v_j − q_k r_kj;
    }
}
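A NumPy sketch of the row-oriented MGS loop above (illustration only):

```python
import numpy as np

def mgs(A):
    V = A.astype(float)               # v_k = a_k
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        for j in range(k + 1, n):     # orthogonalize remaining vectors against q_k
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] -= Q[:, k] * R[k, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
Q, R = mgs(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(2)))   # True True
```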

Review questions

1. Let q1, q2, ..., qn be a sequence obtained from a linearly independent sequence a1, a2, ..., an by Gram-Schmidt orthogonalization. The sequence q1, q2, ..., qn is uniquely determined by what two properties? One involves relationships among the terms of the generated sequence and the other involves relationships between the two sequences. Express the second of these properties as a matrix equation.

2. Reproduce the algorithm for Gram-Schmidt orthogonalization.

3. What is the “defining” idea of the row-oriented version of MGS?

Exercises

1. (a) Give a high-level recursive algorithm for the reduced QR factorization $A = \hat{Q}\hat{R}$ of an m by n matrix by working in terms of a (1, n−1) partitioning of the columns of A and $\hat{Q}$ and a (1, n−1) by (1, n−1) partitioning of $\hat{R}$. (You will have to use the orthogonality of $\hat{Q}$, not to mention the triangularity of $\hat{R}$.)

(b) Give the nonrecursive equivalent of part (a).

(c) How is this algorithm related to the classical or modified Gram-Schmidt process?

2. Apply the first 2 (out of 5) Givens rotations in the QR decomposition of the matrix
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}.$$

3.6 Singular Value Decomposition

THEOREM (SVD) Let $A \in \mathbb{R}^{m\times n}$, $m \ge n$. There exist orthogonal matrices $U \in \mathbb{R}^{m\times m}$, $V \in \mathbb{R}^{n\times n}$ such that
$$A = U\Sigma V^T, \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \\ & & 0 & \end{bmatrix},$$
where the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$. If m < n, one can use the SVD of $A^T$ to get an SVD of A. The SVD provides a geometric interpretation for A:

$V^T$: rotations and/or reflections

$\Sigma$: differential scaling along coordinate axes

$U$: rotations and/or reflections

[Figure: the unit sphere B is mapped to $V^TB$ (rotated/reflected), then to $\Sigma V^TB$ (an ellipse with semi-axes $\sigma_1$, $\sigma_2$), then to $U\Sigma V^TB$.]

Hence $\|A\|_2 = \sigma_1$. The condition number $\kappa_2(A) = \sigma_1/\sigma_n$. Computing the SVD is discussed in Section 4.7.

3.6.1 Application to linear least squares problem

$$\|b - U\Sigma V^Tx\|_2 = \|U^Tb - \Sigma V^Tx\|_2.$$
Let $c = U^Tb$, $y = V^Tx$. The problem is now to minimize
$$\|c - \Sigma y\|_2 = \sqrt{(c_1 - \sigma_1y_1)^2 + \cdots + (c_n - \sigma_ny_n)^2 + c_{n+1}^2 + \cdots + c_m^2},$$
with solution $y_i = c_i/\sigma_i$.

What if $\sigma_{r+1} = \cdots = \sigma_n = 0$ but $\sigma_r > 0$ (rank deficient)? Then $y_{r+1}, \ldots, y_n$ can be anything, and we have a family of solutions $x = Vy$. Suppose we ask for x having minimum $\|x\|_2$?

$$\|x\|_2 = \|Vy\|_2 = \|y\|_2$$

For smallest $\|y\|_2$ choose
$$y_i = \begin{cases} c_i/\sigma_i, & i = 1, 2, \ldots, r, \\ 0, & i = r+1, r+2, \ldots, n. \end{cases}$$

That is,
$$y = \Sigma^\dagger c \quad\text{where}\quad \Sigma^\dagger = \begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & 0 \\ & & \sigma_r^{-1} & \\ & 0 & & 0 \end{bmatrix}.$$

Moore–Penrose pseudoinverse:
$$x = A^\dagger b, \qquad A^\dagger = V\Sigma^\dagger U^T.$$
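A NumPy sketch of the minimum-norm solution via the SVD; the tolerance for declaring a singular value zero is an assumption of this sketch:

```python
import numpy as np

# x = V Sigma^+ U^T b, inverting only the nonzero singular values.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                     # rank deficient: column 2 = 2 * column 1
b = np.array([1.0, 1.0, 1.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]              # Sigma^+
x = Vt.T @ (s_inv * (U.T @ b))
print(x)
print(np.linalg.pinv(A) @ b)                   # same minimum-norm solution
```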

3.6.2 Data Compression

THEOREM Let $A \in \mathbb{R}^{m\times n}$, $n \le m$, and let B be a matrix of rank $k \le n$ for which $\|B - A\|_F$ is smallest. Then
$$B = \sum_{i=1}^{k} u_i\sigma_iv_i^T$$
where
$$A = \sum_{i=1}^{n} u_i\sigma_iv_i^T = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}
\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$
is the SVD of A. The SVD is a good way to compute the rank of a matrix.
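A NumPy sketch of the best rank-k approximation described by the theorem above (illustration only):

```python
import numpy as np

# B = sum_{i=1}^k u_i sigma_i v_i^T, the truncated SVD.
def best_rank_k(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = best_rank_k(A, 2)
print(np.linalg.matrix_rank(B))                # 2
print(np.linalg.norm(A - B, 'fro'))            # smallest over all rank-2 matrices B
```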

Review questions

1. Under what conditions does a matrix $A \in \mathbb{R}^{m\times n}$, m ≥ n, have a singular value decomposition?

2. Specify precisely the form of a singular value decomposition of a matrix $A \in \mathbb{R}^{m\times n}$ for n < m.

3. What is the 2-norm of a matrix in terms of its SVD?

4. What is the 2-norm condition number of a nonsingular matrix in terms of its SVD?

5. What additional requirement makes the solution of any linear least squares problem unique?

6. What is the Moore-Penrose pseudoinverse of a rank deficient matrix $A \in \mathbb{R}^{m\times n}$, n < m?

7. What is the solution of a linear least squares problem with a rank deficient coefficient matrix?

8. What is the matrix B of rank k ≤ n for which $\|B - A\|_F$ is smallest, where $A \in \mathbb{R}^{m\times n}$, n ≤ m?
