
10 Orthogonality

10.1 Orthogonal subspaces

In $\mathbb{R}^2$ we think of the coordinate axes as being orthogonal (perpendicular) to each other. We can express this in terms of vectors by saying that every vector in one axis is orthogonal to every vector in the other. Let $V$ be an inner product space and let $S$ and $T$ be subsets of $V$. We say that $S$ and $T$ are orthogonal, written $S \perp T$, if every vector in $S$ is orthogonal to every vector in $T$:

$$S \perp T \iff s \perp t \text{ for all } s \in S,\ t \in T.$$

10.1.1 Example In $\mathbb{R}^3$ let $S$ be the $x_1x_2$-plane and let $T$ be the $x_3$-axis. Show that $S \perp T$.

Solution Let $s \in S$ and let $t \in T$. We can write $s = [x_1, x_2, 0]^T$ and $t = [0, 0, x_3]^T$ so that

$$\langle s, t \rangle = s^T t = \begin{bmatrix} x_1 & x_2 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ x_3 \end{bmatrix} = 0.$$

Therefore, $s \perp t$ and we conclude that $S \perp T$.

10.1.2 Example In $\mathbb{R}^3$ let $S$ be the $x_1x_2$-plane and let $T$ be the $x_1x_3$-plane. Is it true that $S \perp T$?

Solution Although the planes $S$ and $T$ appear to be perpendicular in the informal sense, they are not orthogonal in the technical sense. For instance, the vector $e_1 = [1, 0, 0]^T$ is in both $S$ and $T$, yet

$$\langle e_1, e_1 \rangle = e_1^T e_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = 1 \ne 0,$$

so $e_1 \not\perp e_1$. Viewing the first $e_1$ as a vector in $S$ and the second as a vector in $T$, we see that $S \not\perp T$.


Orthogonal complement. Let $S$ be a subspace of $V$. The orthogonal complement of $S$ (in $V$), written $S^\perp$, is the set of all vectors in $V$ that are orthogonal to every vector in $S$:

$$S^\perp = \{ v \in V \mid v \perp s \text{ for all } s \in S \}.$$

For instance, in $\mathbb{R}^3$ the orthogonal complement of the $x_1x_2$-plane is the $x_3$-axis. On the other hand, the orthogonal complement of the $x_3$-axis is the $x_1x_2$-plane.

It follows from the inner product axioms that if $S$ is a subspace of $V$, then so is its orthogonal complement $S^\perp$. The next theorem says that in order to show that a vector is orthogonal to a subspace, it is enough to check that it is orthogonal to each vector in a spanning set for the subspace.

Theorem. Let $\{b_1, b_2, \ldots, b_n\}$ be a set of vectors in $V$ and let $S = \operatorname{Span}\{b_1, b_2, \ldots, b_n\}$. A vector $v$ in $V$ is in $S^\perp$ if and only if $v \perp b_i$ for each $i$.

Proof. For simplicity of notation, we prove only the special case $n = 2$, so that $S = \operatorname{Span}\{b_1, b_2\}$. If $v$ is in $S^\perp$, then it is orthogonal to every vector in $S$, so, in particular, $v \perp b_1$ and $v \perp b_2$. Now assume that $v \in V$ with $v \perp b_1$ and $v \perp b_2$. If $s \in S$, then we can write $s = \alpha_1 b_1 + \alpha_2 b_2$ for some $\alpha_1, \alpha_2 \in \mathbb{R}$.

So, using properties of the inner product, we get
$$\langle v, s \rangle = \langle v, \alpha_1 b_1 + \alpha_2 b_2 \rangle$$
$$= \alpha_1 \langle v, b_1 \rangle + \alpha_2 \langle v, b_2 \rangle \qquad \text{(axioms (ii), (iii), and (iv))}$$
$$= \alpha_1 \cdot 0 + \alpha_2 \cdot 0 \qquad (v \perp b_1 \text{ and } v \perp b_2)$$
$$= 0.$$
Therefore, $v \perp s$. This shows that $v$ is orthogonal to every vector in $S$, so that $v \in S^\perp$.

Theorem. If $A$ is a matrix, then the orthogonal complement of the row space of $A$ is the null space of $A$:

$$(\operatorname{Row} A)^\perp = \operatorname{Null} A.$$

Proof. Let $A$ be an $m \times n$ matrix and let $b_1, b_2, \ldots, b_m$ be the rows of $A$ written as columns, so that $\operatorname{Row} A = \operatorname{Span}\{b_1, b_2, \ldots, b_m\}$. Let $x$ be a vector in $\mathbb{R}^n$. The product $Ax$ is the $m \times 1$ matrix obtained by taking the dot products of $x$ with the rows of $A$:
$$Ax = \begin{bmatrix} b_1^T \\ \vdots \\ b_m^T \end{bmatrix} x = \begin{bmatrix} b_1^T x \\ \vdots \\ b_m^T x \end{bmatrix},$$
so, in view of the previous theorem, $x$ is in $(\operatorname{Row} A)^\perp$ if and only if $b_i^T x = 0$ for each $i$, and this holds if and only if $x$ is in $\operatorname{Null} A$.

10.1.3 Example Let $S$ be the subspace of $\mathbb{R}^4$ spanned by the vectors $[1, 0, -2, 1]^T$ and $[0, 1, 3, -2]^T$. Find a basis for $S^\perp$.

Solution We first note that $S = \operatorname{Row} A$, where
$$A = \begin{bmatrix} 1 & 0 & -2 & 1 \\ 0 & 1 & 3 & -2 \end{bmatrix}.$$
According to the theorem, $S^\perp = (\operatorname{Row} A)^\perp = \operatorname{Null} A$, so we need only find a basis for the null space of $A$. The augmented matrix we write to solve $Ax = 0$ is already in reduced row echelon form (RREF):
$$\begin{bmatrix} 1 & 0 & -2 & 1 & 0 \\ 0 & 1 & 3 & -2 & 0 \end{bmatrix},$$
so $\operatorname{Null} A = \{ [2t - s, -3t + 2s, t, s]^T \mid t, s \in \mathbb{R} \}$. We have
$$\begin{bmatrix} 2t - s \\ -3t + 2s \\ t \\ s \end{bmatrix} = t \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix},$$

and therefore $\{ [2, -3, 1, 0]^T, [-1, 2, 0, 1]^T \}$ is a basis for $\operatorname{Null} A$ (and hence for $S^\perp$).
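The hand computation in this example is easy to double-check numerically. The sketch below (assuming NumPy is available; the text itself works entirely by hand) verifies that both null-space basis vectors are orthogonal to the rows of $A$.

```python
import numpy as np

# Matrix whose rows span S (Example 10.1.3)
A = np.array([[1, 0, -2, 1],
              [0, 1, 3, -2]])

# Basis for Null A found above: x = t[2,-3,1,0]^T + s[-1,2,0,1]^T
n1 = np.array([2, -3, 1, 0])
n2 = np.array([-1, 2, 0, 1])

# A @ n1 and A @ n2 are the dot products of n1, n2 with the rows of A;
# both should be zero vectors, confirming (Row A)-perp = Null A here.
print(A @ n1)  # [0 0]
print(A @ n2)  # [0 0]
```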

The next theorem generalizes the fact that the shortest distance from a point to a plane is achieved by dropping a perpendicular.

Theorem. Let $S$ be a subspace of $V$, let $b$ be a vector in $V$, and assume that $b = s_0 + r$ with $s_0 \in S$ and $r \in S^\perp$. For every $s \in S$ we have
$$\operatorname{dist}(b, s_0) \le \operatorname{dist}(b, s).$$

Proof. Let $s \in S$. We have
$$\|b - s\|^2 = \|(b - s_0) + (s_0 - s)\|^2 = \|b - s_0\|^2 + \|s_0 - s\|^2 \ge \|b - s_0\|^2$$

(note that the Pythagorean theorem applies since $b - s_0 = r \in S^\perp$ and $s_0 - s \in S$, so that $(b - s_0) \perp (s_0 - s)$). Therefore,
$$\operatorname{dist}(b, s_0) = \|b - s_0\| \le \|b - s\| = \operatorname{dist}(b, s).$$

10.2 Least squares

Suppose that the matrix equation $Ax = b$ has no solution. In terms of distance, this means that $\operatorname{dist}(b, Ax)$ is never zero, no matter what $x$ is.

Instead of leaving the equation unsolved, it is sometimes useful to find an $x_0$ that is as close to being a solution as possible, that is, for which the distance from $b$ to $Ax_0$ is less than or equal to the distance from $b$ to $Ax$ for every other $x$. This is called a "least squares solution." A least squares solution can be used, for instance, to find the line that best fits some data points (see Example 10.2.1).

Least squares.

Let $A$ be an $m \times n$ matrix and let $b$ be a vector in $\mathbb{R}^m$. If $x = x_0$ is a solution to $A^T A x = A^T b$, then, for every $x \in \mathbb{R}^n$,
$$\operatorname{dist}(b, Ax_0) \le \operatorname{dist}(b, Ax).$$

Such an $x_0$ is called a least squares solution to the equation $Ax = b$.

Proof. Let $x_0 \in \mathbb{R}^n$ and assume that $A^T A x_0 = A^T b$. Letting $s_0 = Ax_0$, we have
$$A^T(b - s_0) = A^T(b - Ax_0) = A^T b - A^T A x_0 = 0,$$
so that
$$b - s_0 \in \operatorname{Null} A^T = (\operatorname{Row} A^T)^\perp \qquad \text{(Theorem in Section 10.1)}$$
$$= (\operatorname{Col} A)^\perp$$
$$= S^\perp,$$
where $S = \{ Ax \mid x \in \mathbb{R}^n \}$ (the product $Ax$ can be interpreted as the linear combination of the columns of $A$ with the entries of $x$ as coefficients, so $S$ is the set of all linear combinations of the columns of $A$, which is $\operatorname{Col} A$). This shows that the vector $r = b - s_0$ is in $S^\perp$, so that $b = s_0 + r$ with $s_0 \in S$ and $r \in S^\perp$. By the last theorem of Section 10.1, for every $x \in \mathbb{R}^n$, we have
$$\operatorname{dist}(b, Ax_0) = \operatorname{dist}(b, s_0) \le \operatorname{dist}(b, Ax).$$

10.2.1 Example Use a least squares solution to find a line that best fits the data points $(1, 2)$, $(3, 2)$, and $(4, 5)$.

Solution Here is the graph:

If we write the desired line as $c + dx = y$, then ideally the line would go through all three points, giving the system
$$c + d = 2$$
$$c + 3d = 2$$
$$c + 4d = 5,$$
which can be written as the matrix equation $Ax = b$, where
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \qquad x = \begin{bmatrix} c \\ d \end{bmatrix}, \qquad b = \begin{bmatrix} 2 \\ 2 \\ 5 \end{bmatrix}.$$
The least squares solution is obtained by solving the equation $A^T A x = A^T b$. We have
$$A^T A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 8 \\ 8 & 26 \end{bmatrix}$$
and
$$A^T b = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 9 \\ 28 \end{bmatrix},$$
so the matrix equation has corresponding augmented matrix
$$\begin{bmatrix} 3 & 8 & 9 \\ 8 & 26 & 28 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & \frac{5}{7} \\ 0 & 1 & \frac{6}{7} \end{bmatrix}.$$

Therefore, $c = \frac{5}{7}$ and $d = \frac{6}{7}$, and the best-fitting line is $y = \frac{5}{7} + \frac{6}{7}x$, which is the line shown in the graph. (We could tell in advance that the matrix equation $Ax = b$ has no solution since the points are not collinear. In general, however, it is not necessary to first check that an equation has no solution before applying the least squares method since, if it has a solution, then that is what the method will produce.)
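A quick numerical cross-check of this example, solving the same normal equations (a sketch assuming NumPy; `np.linalg.lstsq` would give the same answer directly):

```python
import numpy as np

# Example 10.2.1: fit y = c + d*x to (1,2), (3,2), (4,5)
A = np.array([[1.0, 1.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.0, 2.0, 5.0])

# Solve the normal equations A^T A x = A^T b
c, d = np.linalg.solve(A.T @ A, A.T @ b)
print(c, d)  # c = 5/7 ≈ 0.714..., d = 6/7 ≈ 0.857...
```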

We can use the last example to explain the terminology "least squares." The method gives a line that minimizes the sum of the squares of the vertical distances from the points to the line. This sum of squares is $s_1^2 + s_2^2 + s_3^2$, with $s_1$, $s_2$, and $s_3$ as indicated:

We have

$$s_1^2 + s_2^2 + s_3^2 = \big(2 - (c + d)\big)^2 + \big(2 - (c + 3d)\big)^2 + \big(5 - (c + 4d)\big)^2 = \|b - Ax\|^2 = \operatorname{dist}(b, Ax)^2.$$

A least squares solution $x = [c, d]^T$ makes $\operatorname{dist}(b, Ax)$ as small as possible and therefore makes $s_1^2 + s_2^2 + s_3^2$ as small as possible as well.

10.2.2 Example Use a least squares solution to find a parabola that best fits the data points $(-1, 8)$, $(0, 8)$, $(1, 4)$, and $(2, 16)$.

Solution Here is the graph:

Ideally, a parabola would go through all four points, so if we write its equation as $c + dx + ex^2 = y$, then

$$c - d + e = 8$$
$$c = 8$$
$$c + d + e = 4$$
$$c + 2d + 4e = 16,$$
which can be written as the matrix equation $Ax = b$, where

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}, \qquad x = \begin{bmatrix} c \\ d \\ e \end{bmatrix}, \qquad b = \begin{bmatrix} 8 \\ 8 \\ 4 \\ 16 \end{bmatrix}.$$
The least squares solution is obtained by solving the equation $A^T A x = A^T b$. We have
$$A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \\ 1 & 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 2 & 6 \\ 2 & 6 & 8 \\ 6 & 8 & 18 \end{bmatrix}$$
and
$$A^T b = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1 & 0 & 1 & 2 \\ 1 & 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} 8 \\ 8 \\ 4 \\ 16 \end{bmatrix} = \begin{bmatrix} 36 \\ 28 \\ 76 \end{bmatrix},$$

so the matrix equation has corresponding augmented matrix

$$\begin{bmatrix} 4 & 2 & 6 & 36 \\ 2 & 6 & 8 & 28 \\ 6 & 8 & 18 & 76 \end{bmatrix} \sim \begin{bmatrix} 2 & 1 & 3 & 18 \\ 1 & 3 & 4 & 14 \\ 3 & 4 & 9 & 38 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$
Therefore, $c = 5$, $d = -1$, and $e = 3$, and the best-fitting parabola has equation $y = 5 - x + 3x^2$, which is the parabola shown in the graph.
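The same kind of check works for the parabola, this time letting `np.linalg.lstsq` solve the least squares problem in one step (a sketch assuming NumPy):

```python
import numpy as np

# Example 10.2.2: fit y = c + d*x + e*x^2 to (-1,8), (0,8), (1,4), (2,16)
xs = np.array([-1.0, 0.0, 1.0, 2.0])
b = np.array([8.0, 8.0, 4.0, 16.0])

# Design matrix with columns 1, x, x^2
A = np.column_stack([np.ones_like(xs), xs, xs**2])

# lstsq minimizes ||b - Ax||, i.e. it solves the normal equations for us
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
print(coef)  # approximately [ 5. -1.  3.]
```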

10.3 Gram-Schmidt process

The vectors $e_1, e_2, \ldots, e_n$ in $\mathbb{R}^n$ have two useful properties:

• they are pairwise orthogonal (any two are orthogonal),

• each is a unit vector (a vector of norm one).

If we have a basis for an inner product space $V$ that does not already have these properties, then we can change it into a basis that does by using the Gram-Schmidt process.

Orthonormal set.

Let $\{b_1, b_2, \ldots, b_s\}$ be a set of vectors in the inner product space $V$. The set is orthogonal if $\langle b_i, b_j \rangle = 0$ for all $i \ne j$ (the vectors are pairwise orthogonal). The set is orthonormal if it is orthogonal and each vector is a unit vector.

Any orthogonal set of nonzero vectors can be changed into an orthonormal set by dividing each vector by its norm. For instance, $\{[1, 1]^T, [-1, 1]^T\}$ is an orthogonal set in $\mathbb{R}^2$ and each of these vectors has norm $\sqrt{2}$, so $\{[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^T, [-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}]^T\}$ is an orthonormal set:
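In $\mathbb{R}^n$ this normalization is a one-liner. The sketch below (NumPy assumed) normalizes the orthogonal set above and confirms the result is orthonormal:

```python
import numpy as np

# The orthogonal set {[1,1]^T, [-1,1]^T} from the text
v1 = np.array([1.0, 1.0])
v2 = np.array([-1.0, 1.0])

# Divide each vector by its norm to get an orthonormal set
u1 = v1 / np.linalg.norm(v1)
u2 = v2 / np.linalg.norm(v2)

print(u1 @ u2)                                 # ≈ 0 (still orthogonal)
print(np.linalg.norm(u1), np.linalg.norm(u2))  # ≈ 1 each (unit vectors)
```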

Let $V$ be an inner product space. The Gram-Schmidt process requires a formula for the projection of a vector on a subspace $S$.

Theorem. Let $S$ be a subspace of $V$ and let $\{u_1, u_2, \ldots, u_s\}$ be an orthonormal basis for $S$. Let $b$ be a vector in $V$ and let

$$p = \sum_{i=1}^{s} \langle b, u_i \rangle u_i.$$

Then $p \in S$ and $b - p \in S^\perp$:

The vector $p$ is the projection of $b$ on $S$.

Proof. We prove only the special case $s = 2$, so that $\{u_1, u_2\}$ is an orthonormal basis for $S$ and
$$p = \langle b, u_1 \rangle u_1 + \langle b, u_2 \rangle u_2.$$
Since $p$ is a linear combination of $u_1$ and $u_2$, it is in $S$. We have

$$\langle b - p, u_1 \rangle = \big\langle b - \big(\langle b, u_1 \rangle u_1 + \langle b, u_2 \rangle u_2\big),\, u_1 \big\rangle$$
$$= \langle b, u_1 \rangle - \langle b, u_1 \rangle \langle u_1, u_1 \rangle - \langle b, u_2 \rangle \langle u_2, u_1 \rangle \qquad \text{(axioms (iii) and (iv))}$$
$$= \langle b, u_1 \rangle - \langle b, u_1 \rangle (1) - \langle b, u_2 \rangle (0) \qquad (u_1 \text{ is a unit vector and } u_1 \perp u_2)$$
$$= 0,$$
so $b - p$ is orthogonal to $u_1$. Similarly, $b - p$ is orthogonal to $u_2$. Since $S = \operatorname{Span}\{u_1, u_2\}$, it follows from the theorem in Section 10.1 that $b - p$ is in $S^\perp$.
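Here is the projection formula in action on a small example of our own (a sketch assuming NumPy; the particular vectors $u_1$, $u_2$, and $b$ below are illustrative choices, not from the text):

```python
import numpy as np

# An orthonormal basis for a plane S in R^3 (example vectors chosen here)
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])

b = np.array([3.0, 1.0, 2.0])

# Projection of b on S: p = <b,u1> u1 + <b,u2> u2
p = (b @ u1) * u1 + (b @ u2) * u2
print(p)  # ≈ [2, 2, 2]

# b - p lies in S-perp: it is orthogonal to both basis vectors of S
print((b - p) @ u1, (b - p) @ u2)  # both ≈ 0
```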

We now describe the Gram-Schmidt process in the case of three initial vectors. Suppose that $\{b_1, b_2, b_3\}$ is a basis for an inner product space $V$. The Gram-Schmidt process uses these vectors to produce an orthonormal basis $\{u_1, u_2, u_3\}$ for $V$:

Here are the first two steps of the process (the view is looking straight down at the plane spanned by $b_1$ and $b_2$, and we write $\frac{v}{\|v\|}$ to mean $\frac{1}{\|v\|} v$):

$$u_1 = \frac{b_1}{\|b_1\|}$$

$$u_2 = \frac{b_2 - p_1}{\|b_2 - p_1\|}$$

where $p_1 = \langle b_2, u_1 \rangle u_1$.

Here is the third step of the process:

$$u_3 = \frac{b_3 - p_2}{\|b_3 - p_2\|}$$

where $p_2 = \langle b_3, u_1 \rangle u_1 + \langle b_3, u_2 \rangle u_2$.

The general statement is as follows:

Gram-Schmidt process.

Let $\{b_1, b_2, \ldots, b_n\}$ be a basis for the inner product space $V$. Define vectors $u_1, u_2, \ldots, u_n$ recursively by
$$u_1 = \frac{b_1}{\|b_1\|},$$
$$u_k = \frac{b_k - p_{k-1}}{\|b_k - p_{k-1}\|}, \quad \text{where } p_{k-1} = \sum_{i=1}^{k-1} \langle b_k, u_i \rangle u_i \quad (k > 1).$$

Then $\{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $V$. Moreover, $\operatorname{Span}\{u_1, u_2, \ldots, u_k\} = \operatorname{Span}\{b_1, b_2, \ldots, b_k\}$ for each $k$.

We will not give a detailed proof. However, we note that for each $k$ the vector $b_k - p_{k-1}$ is in the orthogonal complement of $\operatorname{Span}\{u_1, u_2, \ldots, u_{k-1}\}$ by the preceding theorem, so that $u_k \perp u_i$ for $1 \le i < k$.

10.3.1 Example Let $b_1 = [1, 2, 2, 4]^T$, $b_2 = [-2, 0, -4, 0]^T$, and $b_3 = [-1, 1, 2, 0]^T$, and let $S$ be the span of these vectors. Apply the Gram-Schmidt process to $\{b_1, b_2, b_3\}$ to obtain an orthonormal basis $\{u_1, u_2, u_3\}$ for $S$.

Solution Since the given vectors are in $\mathbb{R}^4$, the inner product is the dot product and the norm of a vector $x$ is given by the formula

$$\|x\| = \sqrt{x^T x} = \sqrt{x_1^2 + x_2^2 + x_3^2 + x_4^2}.$$
We use the formula $\|\alpha x\| = |\alpha|\,\|x\|$ to simplify computations.

First,

$$u_1 = \frac{b_1}{\|b_1\|} = \frac{[1, 2, 2, 4]^T}{\|[1, 2, 2, 4]^T\|} = \tfrac{1}{5}[1, 2, 2, 4]^T.$$
Next,

$$p_1 = \langle b_2, u_1 \rangle u_1 = \big\langle [-2, 0, -4, 0]^T,\ \tfrac{1}{5}[1, 2, 2, 4]^T \big\rangle u_1 = -2u_1 = -\tfrac{2}{5}[1, 2, 2, 4]^T$$
and

$$b_2 - p_1 = [-2, 0, -4, 0]^T + \tfrac{2}{5}[1, 2, 2, 4]^T = \tfrac{4}{5}[-2, 1, -4, 2]^T,$$
so
$$u_2 = \frac{b_2 - p_1}{\|b_2 - p_1\|} = \frac{\tfrac{4}{5}[-2, 1, -4, 2]^T}{\|\tfrac{4}{5}[-2, 1, -4, 2]^T\|} = \tfrac{1}{5}[-2, 1, -4, 2]^T.$$

Finally,
$$p_2 = \langle b_3, u_1 \rangle u_1 + \langle b_3, u_2 \rangle u_2 = \big\langle [-1, 1, 2, 0]^T,\ \tfrac{1}{5}[1, 2, 2, 4]^T \big\rangle u_1 + \big\langle [-1, 1, 2, 0]^T,\ \tfrac{1}{5}[-2, 1, -4, 2]^T \big\rangle u_2$$
$$= u_1 - u_2 = \tfrac{1}{5}[1, 2, 2, 4]^T - \tfrac{1}{5}[-2, 1, -4, 2]^T = \tfrac{1}{5}[3, 1, 6, 2]^T$$
and
$$b_3 - p_2 = [-1, 1, 2, 0]^T - \tfrac{1}{5}[3, 1, 6, 2]^T = \tfrac{2}{5}[-4, 2, 2, -1]^T,$$
so
$$u_3 = \frac{b_3 - p_2}{\|b_3 - p_2\|} = \frac{\tfrac{2}{5}[-4, 2, 2, -1]^T}{\|\tfrac{2}{5}[-4, 2, 2, -1]^T\|} = \tfrac{1}{5}[-4, 2, 2, -1]^T.$$

Therefore, $\{\tfrac{1}{5}[1, 2, 2, 4]^T,\ \tfrac{1}{5}[-2, 1, -4, 2]^T,\ \tfrac{1}{5}[-4, 2, 2, -1]^T\}$ is an orthonormal basis for $S$.
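The recursive process translates directly into code. This sketch (NumPy assumed) implements Gram-Schmidt for linearly independent vectors in $\mathbb{R}^n$ and reproduces Example 10.3.1:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors:
    u_k = (b_k - p_{k-1}) / ||b_k - p_{k-1}||."""
    basis = []
    for b in vectors:
        # p is the projection of b on the span of the u's found so far
        p = sum((b @ u) * u for u in basis)
        w = b - p
        basis.append(w / np.linalg.norm(w))
    return basis

# Vectors from Example 10.3.1
b1 = np.array([1.0, 2.0, 2.0, 4.0])
b2 = np.array([-2.0, 0.0, -4.0, 0.0])
b3 = np.array([-1.0, 1.0, 2.0, 0.0])

u1, u2, u3 = gram_schmidt([b1, b2, b3])
print(5 * u1)  # ≈ [ 1  2  2  4]
print(5 * u2)  # ≈ [-2  1 -4  2]
print(5 * u3)  # ≈ [-4  2  2 -1]
```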

10.3.2 Example Let $S$ be the span of the functions $1$, $x$, and $x^2$ in $C[0, 1]$. Apply the Gram-Schmidt process to $\{1, x, x^2\}$ to obtain an orthonormal basis $\{u_1, u_2, u_3\}$ for $S$.

Solution The inner product in $C[0, 1]$ is given by $\langle f, g \rangle = \int_0^1 f(x)g(x)\, dx$. Let $b_1 = 1$, $b_2 = x$, and $b_3 = x^2$.

First,

$$\|b_1\| = \sqrt{\langle b_1, b_1 \rangle} = \sqrt{\langle 1, 1 \rangle} = \sqrt{\int_0^1 1\, dx} = 1,$$
so
$$u_1 = \frac{b_1}{\|b_1\|} = 1.$$
Next,

$$p_1 = \langle b_2, u_1 \rangle u_1 = \langle x, 1 \rangle u_1 = \left( \int_0^1 x\, dx \right) u_1 = \tfrac{1}{2} u_1 = \tfrac{1}{2}$$
and
$$b_2 - p_1 = x - \tfrac{1}{2},$$
so
$$u_2 = \frac{b_2 - p_1}{\|b_2 - p_1\|} = \frac{x - \tfrac{1}{2}}{\sqrt{\int_0^1 (x - \tfrac{1}{2})^2\, dx}} = 2\sqrt{3}\,(x - \tfrac{1}{2}) = \sqrt{3}\,(2x - 1).$$

Finally,

$$p_2 = \langle b_3, u_1 \rangle u_1 + \langle b_3, u_2 \rangle u_2 = \langle x^2, 1 \rangle u_1 + \langle x^2, \sqrt{3}(2x - 1) \rangle u_2$$
$$= \left( \int_0^1 x^2\, dx \right) u_1 + \sqrt{3} \left( \int_0^1 x^2(2x - 1)\, dx \right) u_2 = \tfrac{1}{3} u_1 + \tfrac{\sqrt{3}}{6} u_2$$
$$= \tfrac{1}{3}(1) + \tfrac{\sqrt{3}}{6}\sqrt{3}\,(2x - 1) = x - \tfrac{1}{6}$$
and

$$b_3 - p_2 = x^2 - (x - \tfrac{1}{6}) = x^2 - x + \tfrac{1}{6},$$
so
$$u_3 = \frac{b_3 - p_2}{\|b_3 - p_2\|} = \frac{x^2 - x + \tfrac{1}{6}}{\sqrt{\int_0^1 (x^2 - x + \tfrac{1}{6})^2\, dx}} = 6\sqrt{5}\,(x^2 - x + \tfrac{1}{6}) = \sqrt{5}\,(6x^2 - 6x + 1).$$

Therefore $\{1,\ \sqrt{3}\,(2x - 1),\ \sqrt{5}\,(6x^2 - 6x + 1)\}$ is an orthonormal basis for $S$.
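This function-space result can also be checked numerically by approximating the integral inner product on a fine grid (a sketch; the trapezoidal-rule approximation below is an assumption of the check, not part of the text's exact calculation):

```python
import numpy as np

# Orthonormal basis for Span{1, x, x^2} in C[0,1] from Example 10.3.2
u1 = lambda x: np.ones_like(x)
u2 = lambda x: np.sqrt(3) * (2 * x - 1)
u3 = lambda x: np.sqrt(5) * (6 * x**2 - 6 * x + 1)

x = np.linspace(0.0, 1.0, 100001)

def inner(f, g):
    """Approximate <f,g> = integral of f*g over [0,1] by the trapezoidal rule."""
    y = f(x) * g(x)
    return float(np.sum(y[:-1] + y[1:]) * (x[1] - x[0]) / 2)

# Each <u_i, u_i> should come out close to 1,
# and each <u_i, u_j> with i != j close to 0.
print(inner(u1, u1), inner(u2, u2), inner(u3, u3))
print(inner(u1, u2), inner(u1, u3), inner(u2, u3))
```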

10 – Exercises

10–1 Let $x_1 = [-1, 0, 1]^T$, $x_2 = [0, 1, -1]^T$, and $x_3 = [1, 1, 1]^T$. Show that $S \perp T$, where $S = \operatorname{Span}\{x_1, x_2\}$ and $T = \operatorname{Span}\{x_3\}$.

10–2 Let $S$ be the subspace of $P_3$ spanned by $x$ and $x^2$. Find $S^\perp$, where the inner product on $P_3$ is as in Section 9.5 with $x_1 = -1$, $x_2 = 0$, and $x_3 = 1$.

Hint: By the first theorem of Section 10.1, a polynomial $p = ax^2 + bx + c$ is in $S^\perp$ if and only if it is orthogonal to both $x$ and $x^2$.

10–3 Let $S$ be the subspace of $\mathbb{R}^4$ spanned by the vectors $[1, -3, 4, 0]^T$ and $[-2, 6, -8, 1]^T$.

(a) Find a basis for $S^\perp$.

(b) Verify that the basis vectors you found in (a) are orthogonal to the given vectors.

10–4 Use a least squares solution to find a line that best fits the data points (1, 2), (2, 4), and (3, 3).

10–5 Use a least squares solution to find a curve of the form y = a cos x+b sin x that best fits the data points (0, 1), (π/2, 2), and (π, 0).

Answer: $y = \tfrac{1}{2}\cos x + 2\sin x$.

10–6 Let $\{u_1, u_2, u_3\}$ be an orthonormal set in an inner product space $V$. Prove that $u_1$, $u_2$, and $u_3$ are linearly independent.

Hint: Suppose that $\alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 = 0$. Apply $\langle\,\cdot\,, u_1 \rangle$ to both sides of this equation and use properties of the inner product to conclude that $\alpha_1 = 0$. Note that similarly $\alpha_2 = 0$ and $\alpha_3 = 0$. You may use the fact that $\langle 0, v \rangle = 0$ for every vector $v \in V$.

10–7 Let $v$ and $w$ be vectors in an inner product space $V$ and assume that $w$ is nonzero. Let

$$p = \frac{\langle v, w \rangle}{\langle w, w \rangle}\, w.$$

The vector $p$ is the projection of $v$ on $w$.

(a) Show that $(v - p) \perp w$.

(b) Find $p$ when $v = [1, 2]^T$ and $w = [3, 1]^T$ and sketch.

10–8 Let $b_1 = [0, 0, 1, -1]^T$, $b_2 = [1, 0, 0, 1]^T$, and $b_3 = [1, 0, -1, 0]^T$, and let $S$ be the span of these vectors in $\mathbb{R}^4$. Apply the Gram-Schmidt process to $\{b_1, b_2, b_3\}$ to obtain an orthonormal basis $\{u_1, u_2, u_3\}$ for $S$.

Answer: $\{\tfrac{1}{\sqrt{2}}[0, 0, 1, -1]^T,\ \tfrac{1}{\sqrt{6}}[2, 0, 1, 1]^T,\ \tfrac{1}{\sqrt{3}}[1, 0, -1, -1]^T\}$.

10–9 Let $S$ be the span of the functions $x$, $x + 1$, and $x^2 - 1$ in $C[0, 1]$. Apply the Gram-Schmidt process to $\{x, x + 1, x^2 - 1\}$ to obtain an orthonormal basis $\{u_1, u_2, u_3\}$ for $S$.

Answer: $\{\sqrt{3}\,x,\ -3x + 2,\ \sqrt{5}\,(6x^2 - 6x + 1)\}$.