
Jim Lambers
MAT 610 Summer Session 2009-10
Lecture 3 Notes

These notes correspond to Sections 2.5-2.7 in the text.

Orthogonality

A set of vectors {x1, x2, . . . , xk} in ℝn is said to be orthogonal if xiT xj = 0 whenever i ≠ j. If, in addition, xiT xi = 1 for i = 1, 2, . . . , k, then we say that the set is orthonormal. In this case, we can also write xiT xj = δij, where

δij = { 1,  i = j
      { 0,  i ≠ j

is called the Kronecker delta.

Example The vectors x = [ 1 −3 2 ]T and y = [ 4 2 1 ]T are orthogonal, as xT y = 1(4) − 3(2) + 2(1) = 4 − 6 + 2 = 0. Using the fact that ∥x∥2 = √(xT x), we can normalize these vectors to obtain the orthonormal vectors

x̃ = x/∥x∥2 = (1/√14) [ 1 −3 2 ]T ,   ỹ = y/∥y∥2 = (1/√21) [ 4 2 1 ]T ,

which satisfy x̃T x̃ = ỹT ỹ = 1 and x̃T ỹ = 0. □
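A minimal NumPy sketch of this example (the variable names are illustrative):

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])
y = np.array([4.0, 2.0, 1.0])

print(x @ y)                      # 0.0: x and y are orthogonal

# Normalize to obtain orthonormal vectors
x_tilde = x / np.linalg.norm(x)   # x / sqrt(14)
y_tilde = y / np.linalg.norm(y)   # y / sqrt(21)

print(x_tilde @ x_tilde, y_tilde @ y_tilde)   # both 1.0
print(x_tilde @ y_tilde)                      # 0.0 (up to rounding)
```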

The concept of orthogonality extends to subspaces. If S1, S2, . . . , Sk is a set of subspaces of ℝn, we say that these subspaces are mutually orthogonal if xT y = 0 whenever x ∈ Si, y ∈ Sj, for i ≠ j. Furthermore, we define the orthogonal complement of a subspace S ⊆ ℝn to be the set S⊥ defined by

S⊥ = {y ∈ ℝn ∣ xT y = 0 for all x ∈ S}.

Example Let S be the subspace of ℝ3 defined by S = span{v1, v2}, where

v1 = ⎡ 1 ⎤      v2 = ⎡  1 ⎤
     ⎢ 1 ⎥ ,         ⎢ −1 ⎥ .
     ⎣ 1 ⎦           ⎣  1 ⎦

Then S has the orthogonal complement S⊥ = span{ [ 1 0 −1 ]T }. It can be verified directly, by computing inner products, that the single basis vector of S⊥, v3 = [ 1 0 −1 ]T , is orthogonal to either basis vector of S, v1 or v2. It follows that any

multiple of this vector, which describes any vector in the span of v3, is orthogonal to any linear combination of these basis vectors, and therefore to any vector in S, due to the linearity of the inner product:

(c3v3)T (c1v1 + c2v2) = c3c1 v3T v1 + c3c2 v3T v2 = c3c1(0) + c3c2(0) = 0.

Therefore, any vector in S⊥ is in fact orthogonal to any vector in S. □

It is helpful to note that given a basis {v1, v2} for any two-dimensional subspace S ⊆ ℝ3, a basis {v3} for S⊥ can be obtained by computing the cross-product of the basis vectors of S,

v3 = v1 × v2 = ⎡ (v1)2(v2)3 − (v1)3(v2)2 ⎤
               ⎢ (v1)3(v2)1 − (v1)1(v2)3 ⎥ .
               ⎣ (v1)1(v2)2 − (v1)2(v2)1 ⎦
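A quick numerical check of this construction, using NumPy's cross product on the basis vectors from the preceding example:

```python
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, -1.0, 1.0])

# A basis vector for the orthogonal complement of span{v1, v2}
v3 = np.cross(v1, v2)
print(v3)                 # [ 2.  0. -2.], a multiple of [1, 0, -1]

# Verify orthogonality against both basis vectors of S
print(v3 @ v1, v3 @ v2)   # 0.0 0.0
```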

A similar process can be used to compute the orthogonal complement of any (n − 1)-dimensional subspace of ℝn. This orthogonal complement will always have dimension 1, because the sum of the dimension of any subspace of a finite-dimensional vector space, and the dimension of its orthogonal complement, is equal to the dimension of the space itself. That is, if dim(V ) = n, and S ⊆ V is a subspace of V , then

dim(S) + dim(S⊥) = n.

Let S = range(A) for an m × n matrix A. By the definition of the range of a matrix, if y ∈ S, then y = Ax for some vector x ∈ ℝn. If z ∈ ℝm is such that zT y = 0, then we have

zT Ax = zT (AT )T x = (AT z)T x = 0.

If this is true for all y ∈ S, that is, if z ∈ S⊥, then we must have AT z = 0. Therefore, z ∈ null(AT ), and we conclude that S⊥ = null(AT ). That is, the orthogonal complement of the range of A is the null space of AT .

Example Let

A = ⎡ 1  1 ⎤
    ⎢ 2 −1 ⎥ .
    ⎣ 3  2 ⎦

If S = range(A) = span{a1, a2}, where a1 and a2 are the columns of A, then S⊥ = span{b}, where b = a1 × a2 = [ 7 1 −3 ]T . We then note that

AT b = ⎡ 1  2  3 ⎤ ⎡  7 ⎤ = ⎡ 0 ⎤ .
       ⎣ 1 −1  2 ⎦ ⎢  1 ⎥   ⎣ 0 ⎦
                   ⎣ −3 ⎦

That is, b is in the null space of AT . In fact, because of the relationships

rank(AT ) + dim(null(AT )) = 3,

rank(A) = rank(AT ) = 2,

we have dim(null(AT )) = 1, so null(AT ) = span{b}. □
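The same computation can be sketched in NumPy, confirming that b = a1 × a2 lies in null(AT ):

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [2.0, -1.0],
              [3.0,  2.0]])
a1, a2 = A[:, 0], A[:, 1]

# Orthogonal complement of range(A): the cross product of the columns
b = np.cross(a1, a2)
print(b)            # [ 7.  1. -3.]

# b spans null(A^T): A^T b = 0
print(A.T @ b)      # [0. 0.]
```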

A basis for a subspace S ⊆ ℝm is an orthonormal basis if the vectors in the basis form an orthonormal set. For example, let Q1 be an m × r matrix with orthonormal columns. Then these columns form an orthonormal basis for range(Q1). If r < m, then this basis can be extended to a basis of all of ℝm by obtaining an orthonormal basis of range(Q1)⊥ = null(Q1T ).

If we define Q2 to be a matrix whose columns form an orthonormal basis of range(Q1)⊥, then the columns of the m × m matrix Q = [ Q1 Q2 ] are also orthonormal. It follows that QT Q = I. That is, QT = Q−1. Furthermore, because the inverse of a matrix is unique, QQT = I as well. We say that Q is an orthogonal matrix.

One interesting property of orthogonal matrices is that they preserve certain norms. For example, if Q is an orthogonal n × n matrix, then ∥Qx∥2² = xT QT Qx = xT x = ∥x∥2². That is, orthogonal matrices preserve ℓ2-norms. Geometrically, multiplication by an n × n orthogonal matrix preserves the length of a vector, and performs a rotation in n-dimensional space.

Example The matrix

Q = ⎡  cos θ   0   sin θ ⎤
    ⎢    0     1     0   ⎥
    ⎣ − sin θ  0   cos θ ⎦

is an orthogonal matrix, for any angle θ. For any vector x ∈ ℝ3, the vector Qx can be obtained by rotating x clockwise by the angle θ in the xz-plane, while QT = Q−1 performs a counterclockwise rotation by θ in the xz-plane. □

In addition, orthogonal matrices preserve the matrix ℓ2 and Frobenius norms. That is, if A is an m × n matrix, and Q and Z are m × m and n × n orthogonal matrices, respectively, then

∥QAZ∥F = ∥A∥F , ∥QAZ∥2 = ∥A∥2.
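A short NumPy sketch illustrating these norm-preservation properties; the angle and the test matrices below are arbitrary choices:

```python
import numpy as np

theta = 0.7
Q = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

# Q is orthogonal: Q^T Q = I, so Q^T = Q^{-1}
print(np.allclose(Q.T @ Q, np.eye(3)))            # True

# Multiplication by Q preserves the vector 2-norm
x = np.array([1.0, -3.0, 2.0])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # equal

# Orthogonal matrices also preserve the matrix Frobenius and 2-norms
A = np.random.randn(3, 4)
Z, _ = np.linalg.qr(np.random.randn(4, 4))        # a 4x4 orthogonal matrix
print(np.allclose(np.linalg.norm(Q @ A @ Z), np.linalg.norm(A)))          # Frobenius
print(np.allclose(np.linalg.norm(Q @ A @ Z, 2), np.linalg.norm(A, 2)))    # spectral
```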

The Singular Value Decomposition

The matrix ℓ2-norm can be used to obtain a highly useful decomposition of any m × n matrix A. Let x be a unit ℓ2-norm vector (that is, ∥x∥2 = 1) such that ∥Ax∥2 = ∥A∥2. If z = Ax, and we define y = z/∥z∥2, then we have Ax = σy, with σ = ∥A∥2, and both x and y are unit vectors.

We can extend x and y, separately, to orthonormal bases of ℝn and ℝm, respectively, to obtain orthogonal matrices V1 = [ x V2 ] ∈ ℝn×n and U1 = [ y U2 ] ∈ ℝm×m. We then have

U1T AV1 = ⎡ yT  ⎤ A [ x  V2 ]
          ⎣ U2T ⎦

        = ⎡ yT Ax    yT AV2  ⎤
          ⎣ U2T Ax   U2T AV2 ⎦

        = ⎡ σyT y    yT AV2  ⎤
          ⎣ σU2T y   U2T AV2 ⎦

        = ⎡ σ   wT ⎤
          ⎣ 0   B  ⎦

        ≡ A1,

where B = U2T AV2 and w = V2T AT y. Now, let

s = ⎡ σ ⎤ .
    ⎣ w ⎦

Then ∥s∥2² = σ² + ∥w∥2². Furthermore, the first component of A1s is σ² + ∥w∥2². It follows from the invariance of the matrix ℓ2-norm under orthogonal transformations that

σ² = ∥A∥2² = ∥A1∥2² ≥ ∥A1s∥2² / ∥s∥2² ≥ (σ² + ∥w∥2²)² / (σ² + ∥w∥2²) = σ² + ∥w∥2²,

and therefore we must have w = 0. We then have

U1T AV1 = A1 = ⎡ σ  0 ⎤ .
               ⎣ 0  B ⎦

Continuing this process on B, and keeping in mind that ∥B∥2 ≤ ∥A∥2, we obtain the decomposition

U T AV = Σ

where

U = [ u1 ⋅ ⋅ ⋅ um ] ∈ ℝm×m,   V = [ v1 ⋅ ⋅ ⋅ vn ] ∈ ℝn×n

are both orthogonal matrices, and

Σ = diag(σ1, . . . , σp) ∈ ℝm×n,   p = min{m, n}

is a diagonal matrix, in which Σii = σi for i = 1, 2, . . . , p, and Σij = 0 for i ≠ j. Furthermore, we have

∥A∥2 = σ1 ≥ σ2 ≥ ⋅ ⋅ ⋅ ≥ σp ≥ 0.

This decomposition of A is called the singular value decomposition, or SVD. It is more commonly written as a factorization of A,

A = UΣV T .

The diagonal entries of Σ are the singular values of A. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. It follows from the SVD itself that the singular values and vectors satisfy the relations

Avi = σiui,   AT ui = σivi,   i = 1, 2, . . . , min{m, n}.
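In NumPy, np.linalg.svd returns U, the singular values, and V T ; a small sketch verifying these relations on an arbitrary matrix:

```python
import numpy as np

A = np.random.randn(5, 3)

# NumPy returns U, the singular values, and V^T (not V)
U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Singular values are nonnegative and sorted in decreasing order
print(s)

# Verify A v_i = sigma_i u_i and A^T u_i = sigma_i v_i
for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]),
          np.allclose(A.T @ U[:, i], s[i] * V[:, i]))

# The largest singular value is the matrix 2-norm
print(np.isclose(s[0], np.linalg.norm(A, 2)))
```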

For convenience, we denote the ith largest singular value of A by σi(A), and the largest and smallest singular values are commonly denoted by σmax(A) and σmin(A), respectively. The SVD conveys much useful information about the structure of a matrix, particularly with regard to systems of linear equations involving the matrix. Let r be the number of nonzero singular values. Then r is the rank of A, and

range(A) = span{u1, . . . , ur},   null(A) = span{vr+1, . . . , vn}.

That is, the SVD yields orthonormal bases of the range and null space of A. It follows that we can write

A = ∑_{i=1}^{r} σi uiviT .

This is called the SVD expansion of A. If m ≥ n, then this expansion yields the "economy-size" SVD

A = U1Σ1V T ,

where

U1 = [ u1 ⋅ ⋅ ⋅ un ] ∈ ℝm×n,   Σ1 = diag(σ1, . . . , σn) ∈ ℝn×n.

Example The matrix

A = ⎡ 11 19 11 ⎤
    ⎢  9 21  9 ⎥
    ⎣ 10 20 10 ⎦

has the SVD A = UΣV T , where

U = ⎡ 1/√3  −1/√2    1/√6   ⎤        V = ⎡ 1/√6    −1/√3    1/√2 ⎤
    ⎢ 1/√3   1/√2    1/√6   ⎥ ,          ⎢ √(2/3)   1/√3     0   ⎥ ,
    ⎣ 1/√3   0      −√(2/3) ⎦            ⎣ 1/√6    −1/√3   −1/√2 ⎦

and

Σ = ⎡ 42.4264   0       0 ⎤
    ⎢  0       2.4495   0 ⎥ .
    ⎣  0        0       0 ⎦

Let U = [ u1 u2 u3 ] and V = [ v1 v2 v3 ] be column partitions of U and V , respectively. Because there are only two nonzero singular values, we have rank(A) = 2. Furthermore, range(A) = span{u1, u2}, and null(A) = span{v3}. We also have

A = 42.4264 u1v1T + 2.4495 u2v2T . □
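A NumPy sketch reproducing this example (printed values are rounded):

```python
import numpy as np

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

U, s, Vt = np.linalg.svd(A)
print(np.round(s, 4))                    # [42.4264  2.4495  0.    ]

# Two nonzero singular values, so rank(A) = 2
print(np.linalg.matrix_rank(A))          # 2

# range(A) = span{u1, u2}; null(A) = span{v3}
v3 = Vt[2, :]
print(np.round(A @ v3, 8))               # [0. 0. 0.]

# The SVD expansion with the two nonzero terms reproduces A
A_rebuilt = s[0] * np.outer(U[:, 0], Vt[0, :]) + s[1] * np.outer(U[:, 1], Vt[1, :])
print(np.allclose(A, A_rebuilt))         # True
```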

The SVD is also closely related to the ℓ2-norm and Frobenius norm. We have

∥A∥2 = σ1,   ∥A∥F² = σ1² + σ2² + ⋅ ⋅ ⋅ + σr²,

and

min_{x≠0} ∥Ax∥2 / ∥x∥2 = σp,   p = min{m, n}.

These relationships follow directly from the invariance of these norms under orthogonal transformations. The SVD has many useful applications, but one of particular interest is that the truncated SVD expansion

Ak = ∑_{i=1}^{k} σi uiviT ,

where k < r = rank(A), is the best approximation of A by a rank-k matrix. It follows that the distance between A and the set of matrices of rank k is

min_{rank(B)=k} ∥A − B∥2 = ∥A − Ak∥2 = ∥ ∑_{i=k+1}^{r} σi uiviT ∥2 = σk+1,

because the ℓ2-norm of a matrix is its largest singular value, and σk+1 is the largest singular value of the matrix obtained by removing the first k terms of the SVD expansion. Therefore, σp, where p = min{m, n}, is the distance between A and the set of all rank-deficient matrices (which is zero when A is already rank-deficient). Because a matrix of full rank can have arbitrarily small, but still positive, singular values, it follows that the set of all full-rank matrices in ℝm×n is both open and dense.

Example The best approximation of the matrix A from the previous example, which has rank two, by a matrix of rank one is obtained by taking only the first term of its SVD expansion,

A1 = 42.4264 u1v1T = ⎡ 10 20 10 ⎤
                     ⎢ 10 20 10 ⎥ .
                     ⎣ 10 20 10 ⎦

The absolute error in this approximation is

∥A − A1∥2 = σ2∥u2v2T ∥2 = σ2 = 2.4495,

while the relative error is

∥A − A1∥2 / ∥A∥2 = 2.4495 / 42.4264 = 0.0577.

That is, the rank-one approximation of A is off by a little less than six percent. □

Suppose that the entries of a matrix A have been determined experimentally, for example by measurements, and the error in the entries is determined to be bounded by some value ε. If σk > ε ≥ σk+1 for some k < min{m, n}, then ε is at least as large as the distance between A and the set of all matrices of rank k. Therefore, due to the uncertainty of the measurement, A could very well be a matrix of rank k, but it cannot be of lower rank, because σk > ε. We say that the ε-rank of A, defined by

rank(A, ε) = min_{∥A−B∥2≤ε} rank(B),

is equal to k. If k < min{m, n}, then we say that A is numerically rank-deficient.
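A sketch of the truncated SVD as a best rank-k approximation, using the matrix from the example above; the helper function truncated_svd is introduced here for illustration:

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A in the 2-norm and Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

A1 = truncated_svd(A, 1)
print(np.round(A1, 4))                      # approximately the all-[10 20 10] matrix

# The approximation error equals the first discarded singular value
U, s, Vt = np.linalg.svd(A)
print(np.linalg.norm(A - A1, 2), s[1])      # both approximately 2.4495
print(np.linalg.norm(A - A1, 2) / s[0])     # relative error, approximately 0.0577
```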

Projections

Given a vector z ∈ ℝn, and a subspace S ⊆ ℝn, it is often useful to obtain the vector w ∈ S that best approximates z, in the sense that z − w ∈ S⊥. In other words, the error in w is orthogonal to the subspace in which w is sought. To that end, we define the orthogonal projection onto S to be an n × n matrix P such that if w = P z, then w ∈ S, and wT (z − w) = 0. Substituting w = P z into wT (z − w) = 0, and noting the commutativity of the inner product, we see that P must satisfy the relations

zT (P T − P T P )z = 0, zT (P − P T P )z = 0,

for any vector z ∈ ℝn. It follows from these relations that P must have the following properties: range(P ) = S, and P T P = P = P T , which yields P 2 = P . Using these properties, it can be shown that the orthogonal projection P onto a given subspace S is unique. Furthermore, it can be shown that P = V V T , where V is a matrix whose columns are an orthonormal basis for S.
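A small NumPy sketch of an orthogonal projection, built from an orthonormal basis obtained by a QR factorization of the basis vectors of the subspace S from the earlier example:

```python
import numpy as np

# Basis vectors of S = span{v1, v2} as the columns of B
B = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  1.0]])
V, _ = np.linalg.qr(B)          # columns of V are an orthonormal basis for S

P = V @ V.T                     # orthogonal projection onto S
z = np.array([3.0, 1.0, -2.0])  # an arbitrary vector to project
w = P @ z

# The projection satisfies P^2 = P = P^T, and the error z - w is orthogonal to S
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # True True
print(np.round(B.T @ (z - w), 10))                  # [0. 0.]
```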

Projections and the SVD

Let A ∈ ℝm×n have the SVD A = UΣV T , where

U = [ Ur  Ũr ],   V = [ Vr  Ṽr ],

where r = rank(A) and

Ur = [ u1 ⋅ ⋅ ⋅ ur ],   Vr = [ v1 ⋅ ⋅ ⋅ vr ].

Then, we obtain the following projections related to the SVD:

∙ UrUrT is the projection onto the range of A

∙ ṼrṼrT is the projection onto the null space of A

∙ ŨrŨrT is the projection onto the orthogonal complement of the range of A, which is the null space of AT

∙ VrVrT is the projection onto the orthogonal complement of the null space of A, which is the range of AT .

These projections can also be determined by noting that the SVD of AT is AT = V ΣT U T .

Example For the matrix

A = ⎡ 11 19 11 ⎤
    ⎢  9 21  9 ⎥ ,
    ⎣ 10 20 10 ⎦

that has rank two, the orthogonal projection onto range(A) is

U2U2T = [ u1 u2 ] ⎡ u1T ⎤ = u1u1T + u2u2T ,
                  ⎣ u2T ⎦

where U = [ u1 u2 u3 ] is the previously defined column partition of the matrix U from the SVD of A. □
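These projections can be formed directly from the computed SVD; a brief NumPy sketch for the matrix above:

```python
import numpy as np

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)            # numerical rank; here r = 2
Ur, Vr = U[:, :r], Vt[:r, :].T

P_range = Ur @ Ur.T              # projection onto range(A)
P_null  = np.eye(3) - Vr @ Vr.T  # projection onto null(A)

# Projecting a column of A onto range(A) leaves it unchanged
print(np.allclose(P_range @ A[:, 0], A[:, 0]))   # True

# Vectors projected onto null(A) are annihilated by A
z = np.random.randn(3)
print(np.round(A @ (P_null @ z), 8))             # [0. 0. 0.]
```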

Sensitivity of Linear Systems

Let A ∈ ℝn×n be nonsingular, and let A = UΣV T be the SVD of A. Then the solution x of the system of linear equations Ax = b can be expressed as

x = A−1b = V Σ−1U T b = ∑_{i=1}^{n} (uiT b / σi) vi .

This formula for x suggests that if σn is small relative to the other singular values, then the system Ax = b can be sensitive to perturbations in A or b. This makes sense, considering that σn is the distance between A and the set of all singular n × n matrices.

In an attempt to measure the sensitivity of this system, we consider the parameterized system

(A + εE)x(ε) = b + εe,

where E ∈ ℝn×n and e ∈ ℝn are perturbations of A and b, respectively. Taking the Taylor expansion of x(ε) around ε = 0 yields

x(ε) = x + εx′(0) + O(ε²),

where

x′(ε) = (A + εE)−1(e − Ex(ε)),

which yields x′(0) = A−1(e − Ex). Using norms to measure the relative error in x, we obtain

∥x(ε) − x∥ / ∥x∥ = ∣ε∣ ∥A−1(e − Ex)∥ / ∥x∥ + O(ε²) ≤ ∣ε∣ ∥A−1∥ ( ∥e∥/∥x∥ + ∥E∥ ) + O(ε²).

Multiplying and dividing by ∥A∥, and using Ax = b to obtain ∥b∥ ≤ ∥A∥∥x∥, yields

∥x(ε) − x∥ / ∥x∥ ≤ κ(A) ∣ε∣ ( ∥e∥/∥b∥ + ∥E∥/∥A∥ ) + O(ε²),

where κ(A) = ∥A∥∥A−1∥ is called the condition number of A. We conclude that the relative errors in A and b can be amplified by κ(A) in the solution. Therefore, if κ(A) is large, the problem Ax = b can be quite sensitive to perturbations in A and b. In this case, we say that A is ill-conditioned; otherwise, we say that A is well-conditioned.

The definition of the condition number depends on the matrix norm that is used. Using the ℓ2-norm, we obtain

κ2(A) = ∥A∥2∥A−1∥2 = σ1(A) / σn(A).

It can readily be seen from this formula that κ2(A) is large if σn is small relative to σ1. We also note that because the singular values are the lengths of the semi-axes of the hyperellipsoid {Ax ∣ ∥x∥2 = 1}, the condition number in the ℓ2-norm measures the elongation of this hyperellipsoid.

Example The matrices

A1 = ⎡ 0.7674  0.0477 ⎤        A2 = ⎡ 0.7581  0.1113 ⎤
     ⎣ 0.6247  0.1691 ⎦ ,           ⎣ 0.6358  0.0933 ⎦

do not appear to be very different from one another, but κ2(A1) = 10 while κ2(A2) = 10¹⁰. That is, A1 is well-conditioned while A2 is ill-conditioned.

To illustrate the ill-conditioned nature of A2, we solve the systems A2x1 = b1 and A2x2 = b2 for the unknown vectors x1 and x2, where

b1 = ⎡ 0.7662 ⎤        b2 = ⎡ 0.7019 ⎤
     ⎣ 0.6426 ⎦ ,           ⎣ 0.7192 ⎦ .

These vectors differ from one another by roughly 10%, but the solutions

x1 = ⎡ 0.9894 ⎤        x2 = ⎡ −1.4522 × 10⁸ ⎤
     ⎣ 0.1452 ⎦ ,           ⎣  9.8940 × 10⁸ ⎦

differ by several orders of magnitude, because of the sensitivity of A2 to perturbations. □
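A NumPy sketch of this example; np.linalg.cond computes the condition number, and the exact printed values depend on the rounding of the given entries:

```python
import numpy as np

A1 = np.array([[0.7674, 0.0477],
               [0.6247, 0.1691]])
A2 = np.array([[0.7581, 0.1113],
               [0.6358, 0.0933]])

# 2-norm condition numbers: ratio of largest to smallest singular value
print(np.linalg.cond(A1, 2))     # about 10 (well-conditioned)
print(np.linalg.cond(A2, 2))     # about 1e10 (ill-conditioned)

# Nearby right-hand sides produce wildly different solutions with A2
b1 = np.array([0.7662, 0.6426])
b2 = np.array([0.7019, 0.7192])
print(np.linalg.solve(A2, b1))
print(np.linalg.solve(A2, b2))   # entries on the order of 1e8
```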

Just as the largest singular value of A is the ℓ2-norm of A, and the smallest singular value is the distance from A to the nearest singular matrix in the ℓ2-norm, we have, for any ℓp-norm,

1 / κp(A) = min_{A+ΔA singular} ∥ΔA∥p / ∥A∥p .

That is, in any ℓp-norm, κp(A) measures the relative distance in that norm from A to the set of singular matrices.

Because det(A) = 0 if and only if A is singular, it would appear that the determinant could be used to measure the distance from A to the nearest singular matrix. However, this is generally not the case. It is possible for a matrix to have a relatively large determinant, but be very close to a singular matrix, or for a matrix to have a relatively small determinant, but not be nearly singular. In other words, there is very little correlation between det(A) and the condition number of A.

Example Let

A = ⎡ 1 −1 −1 −1 −1 −1 −1 −1 −1 −1 ⎤
    ⎢ 0  1 −1 −1 −1 −1 −1 −1 −1 −1 ⎥
    ⎢ 0  0  1 −1 −1 −1 −1 −1 −1 −1 ⎥
    ⎢ 0  0  0  1 −1 −1 −1 −1 −1 −1 ⎥
    ⎢ 0  0  0  0  1 −1 −1 −1 −1 −1 ⎥
    ⎢ 0  0  0  0  0  1 −1 −1 −1 −1 ⎥
    ⎢ 0  0  0  0  0  0  1 −1 −1 −1 ⎥
    ⎢ 0  0  0  0  0  0  0  1 −1 −1 ⎥
    ⎢ 0  0  0  0  0  0  0  0  1 −1 ⎥
    ⎣ 0  0  0  0  0  0  0  0  0  1 ⎦

Then det(A) = 1, but κ2(A) ≈ 1,918, and σ10 ≈ 0.0029. That is, A is quite close to a singular matrix, even though det(A) is not near zero. For example, the nearby matrix Ã = A − σ10u10v10T ,

which has entries

Ã ≈ ⎡  1       −1       −1       −1       −1       −1       −1  −1  −1  −1 ⎤
    ⎢ −0        1       −1       −1       −1       −1       −1  −1  −1  −1 ⎥
    ⎢ −0       −0        1       −1       −1       −1       −1  −1  −1  −1 ⎥
    ⎢ −0       −0       −0        1       −1       −1       −1  −1  −1  −1 ⎥
    ⎢ −0.0001  −0       −0       −0        1       −1       −1  −1  −1  −1 ⎥
    ⎢ −0.0001  −0.0001  −0       −0       −0        1       −1  −1  −1  −1 ⎥
    ⎢ −0.0003  −0.0001  −0.0001  −0       −0       −0        1  −1  −1  −1 ⎥
    ⎢ −0.0005  −0.0003  −0.0001  −0.0001  −0       −0       −0   1  −1  −1 ⎥
    ⎢ −0.0011  −0.0005  −0.0003  −0.0001  −0.0001  −0       −0  −0   1  −1 ⎥
    ⎣ −0.0022  −0.0011  −0.0005  −0.0003  −0.0001  −0.0001  −0  −0  −0   1 ⎦

is singular. □
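A NumPy sketch of this example, constructing A, comparing its determinant with its condition number, and forming the nearby singular matrix Ã:

```python
import numpy as np

# Upper triangular matrix with 1 on the diagonal and -1 above it
n = 10
A = np.triu(-np.ones((n, n)), k=1) + np.eye(n)

print(np.linalg.det(A))          # 1.0
print(np.linalg.cond(A, 2))      # about 1918

# The smallest singular value is the 2-norm distance to the nearest singular matrix
U, s, Vt = np.linalg.svd(A)
print(s[-1])                     # about 0.0029

# Subtracting sigma_10 u_10 v_10^T produces a (numerically) singular matrix
A_tilde = A - s[-1] * np.outer(U[:, -1], Vt[-1, :])
print(np.linalg.matrix_rank(A_tilde))    # 9
```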
