10 Orthogonality
10.1 Orthogonal subspaces

In the plane $\mathbb{R}^2$ we think of the coordinate axes as being orthogonal (perpendicular) to each other. We can express this in terms of vectors by saying that every vector in one axis is orthogonal to every vector in the other.

Let $V$ be an inner product space and let $S$ and $T$ be subsets of $V$. We say that $S$ and $T$ are orthogonal, written $S \perp T$, if every vector in $S$ is orthogonal to every vector in $T$:
$$S \perp T \iff s \perp t \text{ for all } s \in S,\ t \in T.$$

10.1.1 Example. In $\mathbb{R}^3$ let $S$ be the $x_1x_2$-plane and let $T$ be the $x_3$-axis. Show that $S \perp T$.

Solution. Let $s \in S$ and let $t \in T$. We can write $s = [x_1, x_2, 0]^T$ and $t = [0, 0, x_3]^T$, so that
$$\langle s, t \rangle = s^T t = \begin{bmatrix} x_1 & x_2 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ x_3 \end{bmatrix} = 0.$$
Therefore $s \perp t$, and we conclude that $S \perp T$.

10.1.2 Example. In $\mathbb{R}^3$ let $S$ be the $x_1x_2$-plane and let $T$ be the $x_1x_3$-plane. Is it true that $S \perp T$?

Solution. Although the planes $S$ and $T$ appear to be perpendicular in the informal sense, they are not so in the technical sense. For instance, the vector $e_1 = [1, 0, 0]^T$ is in both $S$ and $T$, yet
$$\langle e_1, e_1 \rangle = e_1^T e_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = 1 \neq 0,$$
so $e_1 \not\perp e_1$. Viewing the first $e_1$ as a vector in $S$ and the second as a vector in $T$, we see that $S \not\perp T$.

Orthogonal complement. Let $S$ be a subspace of $V$. The orthogonal complement of $S$ (in $V$), written $S^\perp$, is the set of all vectors in $V$ that are orthogonal to every vector in $S$:
$$S^\perp = \{ v \in V \mid v \perp s \text{ for all } s \in S \}.$$

For instance, in $\mathbb{R}^3$ the orthogonal complement of the $x_1x_2$-plane is the $x_3$-axis. On the other hand, the orthogonal complement of the $x_3$-axis is the $x_1x_2$-plane.

It follows, using the inner product axioms, that if $S$ is a subspace of $V$, then so is its orthogonal complement $S^\perp$.

The next theorem says that in order to show that a vector is orthogonal to a subspace, it is enough to check that it is orthogonal to each vector in a spanning set for the subspace.

Theorem. Let $\{b_1, b_2, \ldots, b_n\}$ be a set of vectors in $V$ and let $S = \operatorname{Span}\{b_1, b_2, \ldots, b_n\}$. A vector $v$ in $V$ is in $S^\perp$ if and only if $v \perp b_i$ for each $i$.

Proof. For simplicity of notation, we prove only the special case $n = 2$, so that $S = \operatorname{Span}\{b_1, b_2\}$. If $v$ is in $S^\perp$, then it is orthogonal to every vector in $S$, so in particular $v \perp b_1$ and $v \perp b_2$. Now assume that $v \in V$ and $v \perp b_1$ and $v \perp b_2$. If $s \in S$, then we can write $s = \alpha_1 b_1 + \alpha_2 b_2$ for some $\alpha_1, \alpha_2 \in \mathbb{R}$. So, using properties of the inner product, we get
$$\begin{aligned}
\langle v, s \rangle &= \langle v, \alpha_1 b_1 + \alpha_2 b_2 \rangle \\
&= \alpha_1 \langle v, b_1 \rangle + \alpha_2 \langle v, b_2 \rangle && \text{(inner product axioms (ii), (iii), and (iv))} \\
&= \alpha_1 \cdot 0 + \alpha_2 \cdot 0 && (v \perp b_1 \text{ and } v \perp b_2) \\
&= 0.
\end{aligned}$$
Therefore $v \perp s$. This shows that $v$ is orthogonal to every vector in $S$, so that $v \in S^\perp$.

Theorem. If $A$ is a matrix, then the orthogonal complement of the row space of $A$ is the null space of $A$:
$$(\operatorname{Row} A)^\perp = \operatorname{Null} A.$$

Proof. Let $A$ be an $m \times n$ matrix and let $b_1, b_2, \ldots, b_m$ be the rows of $A$ written as columns, so that $\operatorname{Row} A = \operatorname{Span}\{b_1, b_2, \ldots, b_m\}$. Let $x$ be a vector in $\mathbb{R}^n$. The product $Ax$ is the $m \times 1$ matrix obtained by taking the dot products of $x$ with the rows of $A$:
$$Ax = \begin{bmatrix} b_1^T \\ \vdots \\ b_m^T \end{bmatrix} x = \begin{bmatrix} b_1^T x \\ \vdots \\ b_m^T x \end{bmatrix},$$
so, in view of the previous theorem, $x$ is in $(\operatorname{Row} A)^\perp$ if and only if $b_i^T x = 0$ for each $i$, and this holds if and only if $x$ is in $\operatorname{Null} A$.
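To make the theorem concrete, here is a small computational sketch (an illustration added here, not part of the original notes; it assumes SymPy is available). It computes a basis for the null space of the matrix used in Example 10.1.3 below and checks that each basis vector is orthogonal to every row of the matrix, exactly as $(\operatorname{Row} A)^\perp = \operatorname{Null} A$ predicts.

```python
from sympy import Matrix

# The matrix from Example 10.1.3 below: S = Row A, so S-perp = Null A.
A = Matrix([[1, 0, -2, 1],
            [0, 1, 3, -2]])

# nullspace() returns a list of column vectors forming a basis for Null A.
null_basis = A.nullspace()

# Every basis vector of Null A should be orthogonal to every row of A.
for v in null_basis:
    for i in range(A.rows):
        assert A.row(i).dot(v) == 0  # exact arithmetic, so == 0 is safe

print([list(v) for v in null_basis])
# Expected output (up to scaling): [[2, -3, 1, 0], [-1, 2, 0, 1]]
```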
10.1.3 Example. Let $S$ be the subspace of $\mathbb{R}^4$ spanned by the vectors $[1, 0, -2, 1]^T$ and $[0, 1, 3, -2]^T$. Find a basis for $S^\perp$.

Solution. We first note that $S = \operatorname{Row} A$, where
$$A = \begin{bmatrix} 1 & 0 & -2 & 1 \\ 0 & 1 & 3 & -2 \end{bmatrix}.$$
According to the theorem, $S^\perp = (\operatorname{Row} A)^\perp = \operatorname{Null} A$, so we need only find a basis for the null space of $A$.

The augmented matrix we write to solve $Ax = 0$ is already in reduced row echelon form:
$$\left[\begin{array}{cccc|c} 1 & 0 & -2 & 1 & 0 \\ 0 & 1 & 3 & -2 & 0 \end{array}\right],$$
so $\operatorname{Null} A = \{ [2t - s, -3t + 2s, t, s]^T \mid t, s \in \mathbb{R} \}$. We have
$$\begin{bmatrix} 2t - s \\ -3t + 2s \\ t \\ s \end{bmatrix} = t \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix},$$
and therefore $\{ [2, -3, 1, 0]^T,\ [-1, 2, 0, 1]^T \}$ is a basis for $\operatorname{Null} A$ (and hence for $S^\perp$).

The next theorem generalizes the notion that the shortest distance from a point to a plane is achieved by dropping a perpendicular.

Theorem. Let $S$ be a subspace of $V$, let $b$ be a vector in $V$, and assume that $b = s_0 + r$ with $s_0 \in S$ and $r \in S^\perp$. For every $s \in S$ we have
$$\operatorname{dist}(b, s_0) \le \operatorname{dist}(b, s).$$

Proof. Let $s \in S$. We have
$$\begin{aligned}
\|b - s\|^2 &= \|(b - s_0) + (s_0 - s)\|^2 \\
&= \|b - s_0\|^2 + \|s_0 - s\|^2 && \text{(Pythagorean theorem)} \\
&\ge \|b - s_0\|^2
\end{aligned}$$
(note that the Pythagorean theorem applies since $b - s_0 = r \in S^\perp$ and $s_0 - s \in S$, so that $(b - s_0) \perp (s_0 - s)$). Therefore,
$$\operatorname{dist}(b, s_0) = \|b - s_0\| \le \|b - s\| = \operatorname{dist}(b, s).$$

10.2 Least squares

Suppose that the matrix equation $Ax = b$ has no solution. In terms of distance, this means that $\operatorname{dist}(b, Ax)$ is never zero, no matter what $x$ is.

Instead of leaving the equation unsolved, it is sometimes useful to find an $x_0$ that is as close to being a solution as possible, that is, for which the distance from $b$ to $Ax_0$ is less than or equal to the distance from $b$ to $Ax$ for every other $x$. This is called a "least squares solution." A least squares solution can be used, for instance, to find the line that best fits some data points (see Example 10.2.1).

Least squares. Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. If $x = x_0$ is a solution to
$$A^T A x = A^T b,$$
then, for every $x \in \mathbb{R}^n$,
$$\operatorname{dist}(b, Ax_0) \le \operatorname{dist}(b, Ax).$$
Such an $x_0$ is called a least squares solution to the equation $Ax = b$.

Proof. Let $x_0 \in \mathbb{R}^n$ and assume that $A^T A x_0 = A^T b$. Letting $s_0 = Ax_0$, we have
$$A^T(b - s_0) = A^T(b - Ax_0) = 0,$$
so that
$$b - s_0 \in \operatorname{Null} A^T = (\operatorname{Row} A^T)^\perp = (\operatorname{Col} A)^\perp = S^\perp,$$
where $S = \{ Ax \mid x \in \mathbb{R}^n \}$. (The first equality is the theorem in Section 10.1; for the remaining equalities, the product $Ax$ can be interpreted as the linear combination of the columns of $A$ with the entries of $x$ as scalar factors, so $S$ is the set of all linear combinations of the columns of $A$, which is $\operatorname{Col} A$.) This shows that the vector $r = b - s_0$ is in $S^\perp$, so that $b = s_0 + r$ with $s_0 \in S$ and $r \in S^\perp$. By the last theorem of Section 10.1, for every $x \in \mathbb{R}^n$ we have
$$\operatorname{dist}(b, Ax_0) = \operatorname{dist}(b, s_0) \le \operatorname{dist}(b, Ax).$$

10.2.1 Example. Use a least squares solution to find a line that best fits the data points $(1, 2)$, $(3, 2)$, and $(4, 5)$.

Solution. [Figure: plot of the three data points and the best-fitting line.]

If we write the desired line as $c + dx = y$, then ideally the line would go through all three points, giving the system
$$\begin{aligned} c + d &= 2 \\ c + 3d &= 2 \\ c + 4d &= 5, \end{aligned}$$
which can be written as the matrix equation $Ax = b$, where
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \qquad x = \begin{bmatrix} c \\ d \end{bmatrix}, \qquad b = \begin{bmatrix} 2 \\ 2 \\ 5 \end{bmatrix}.$$
The least squares solution is obtained by solving the equation $A^T A x = A^T b$. We have
$$A^T A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 8 \\ 8 & 26 \end{bmatrix}$$
and
$$A^T b = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 9 \\ 28 \end{bmatrix},$$
so the matrix equation has corresponding augmented matrix
$$\left[\begin{array}{cc|c} 3 & 8 & 9 \\ 8 & 26 & 28 \end{array}\right] \sim \left[\begin{array}{cc|c} 3 & 8 & 9 \\ 0 & 14 & 12 \end{array}\right] \sim \left[\begin{array}{cc|c} 3 & 8 & 9 \\ 0 & 7 & 6 \end{array}\right] \sim \left[\begin{array}{cc|c} 21 & 0 & 15 \\ 0 & 7 & 6 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 0 & \frac{5}{7} \\ 0 & 1 & \frac{6}{7} \end{array}\right].$$
Therefore $c = \frac{5}{7}$ and $d = \frac{6}{7}$, and the best-fitting line is $y = \frac{5}{7} + \frac{6}{7}x$, which is the line shown in the figure.

(We could tell in advance that the matrix equation $Ax = b$ has no solution, since the points are not collinear. In general, however, it is not necessary to first check that an equation has no solution before applying the least squares method since, if it has a solution, then that is what the method will produce.)
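The same computation can be carried out numerically. The following NumPy sketch (an illustration added here, not part of the original notes) solves the normal equations $A^T A x = A^T b$ for Example 10.2.1 and checks the answer against NumPy's built-in least squares routine.

```python
import numpy as np

# Data points from Example 10.2.1: (1, 2), (3, 2), (4, 5).
# Fitting y = c + d*x leads to Ax = b with:
A = np.array([[1.0, 1.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.0, 2.0, 5.0])

# Solve the normal equations A^T A x = A^T b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)
c, d = x0
print(c, d)  # approximately 0.7142857 (= 5/7) and 0.8571429 (= 6/7)

# np.linalg.lstsq computes the same least squares solution directly.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x0, x_lstsq)
```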
We can use the last example to explain the terminology "least squares." The method gives a line that minimizes the sum of the squares of the vertical distances from the points to the line. For the line $y = c + dx$ and the data of Example 10.2.1, this sum is
$$\|b - Ax\|^2 = (2 - (c + d))^2 + (2 - (c + 3d))^2 + (5 - (c + 4d))^2,$$
and the least squares solution $c = \frac{5}{7}$, $d = \frac{6}{7}$ makes it as small as possible.
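The minimizing property can be observed numerically as well. This short sketch (a hypothetical check, not from the original notes) evaluates the sum of squared vertical distances at the fitted parameters and at many randomly perturbed parameter choices; the fitted line always gives the smallest value.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.0, 2.0, 5.0])

def rss(c, d):
    """Sum of squared vertical distances from the points to y = c + d*x."""
    return np.sum((b - A @ np.array([c, d])) ** 2)

c0, d0 = 5 / 7, 6 / 7  # least squares solution from Example 10.2.1
best = rss(c0, d0)

# Perturb the parameters; no perturbation decreases the sum of squares.
rng = np.random.default_rng(0)
for _ in range(1000):
    dc, dd = rng.normal(scale=0.5, size=2)
    assert rss(c0 + dc, d0 + dd) >= best

print(best)  # approximately 2.5714286 (= 18/7)
```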