
The Gram-Schmidt Procedure, Orthogonal Complements, and Orthogonal Projections

1 Orthogonal Vectors and Gram-Schmidt

In this section, we will develop the standard algorithm for producing orthonormal sets of vectors and explore some related matters. We present the results in a general, real inner product space $V$ rather than just in $\mathbb{R}^n$. We will make use of this level of generality later on when we discuss the topic of conjugate direction methods and the related conjugate gradient methods for optimization. There, once again, we will meet the Gram-Schmidt process.

We begin by recalling that a set of non-zero vectors $\{v_1, \dots, v_k\}$ is called an orthogonal set provided that for any indices $i, j$ with $i \neq j$, the inner products $\langle v_i, v_j \rangle = 0$. It is called an orthonormal set provided that, in addition, $\langle v_i, v_i \rangle = \|v_i\|^2 = 1$. It should be clear that any orthogonal set of vectors must be a linearly independent set since, if

$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$ then, for any $i = 1, \dots, k$, taking the inner product of the sum with $v_i$, and using linearity of the inner product and the orthogonality of the vectors,

$$\langle v_i, \alpha_1 v_1 + \cdots + \alpha_k v_k \rangle = \alpha_i \langle v_i, v_i \rangle = 0\,.$$

But since $\langle v_i, v_i \rangle \neq 0$ we must have $\alpha_i = 0$. This means, in particular, that in any $n$-dimensional space any set of $n$ orthogonal vectors forms a basis.

The Gram-Schmidt Process is a constructive method, valid in any finite-dimensional inner product space, which will replace any basis $U = \{u_1, u_2, \dots, u_n\}$ with an orthonormal basis $V = \{v_1, v_2, \dots, v_n\}$. Moreover, the replacement is made in such a way that for all $k = 1, 2, \dots, n$, the subspace spanned by the first $k$ original vectors $\{u_1, \dots, u_k\}$ and that spanned by the new vectors $\{v_1, \dots, v_k\}$ are the same.

To do this we proceed inductively. First observe that $u_1 \neq 0$ since $U$ is a linearly independent set. We take $v_1 = u_1 / \|u_1\|$. Suppose now that the vectors $v_1, \dots, v_k$ have been chosen so that they form an orthonormal set and so that each $v_j$, $j = 1, \dots, k$, is a linear combination of the vectors $u_1, \dots, u_k$. We write

$$w = u_{k+1} - (\alpha_1 v_1 + \cdots + \alpha_k v_k)\,,$$
where the values of the scalars $\alpha_1, \dots, \alpha_k$ are still to be determined. Since

$$\langle w, v_j \rangle = \langle u_{k+1} - (\alpha_1 v_1 + \cdots + \alpha_k v_k),\, v_j \rangle = \langle u_{k+1}, v_j \rangle - \alpha_j\,, \quad \text{for } j = 1, \dots, k\,,$$

it follows that if we choose $\alpha_j = \langle u_{k+1}, v_j \rangle$ then $\langle w, v_j \rangle = 0$ for $j = 1, \dots, k$.

Since, moreover, $w$ is a linear combination of $u_{k+1}$ and $v_1, \dots, v_k$, it is also a linear combination of $u_{k+1}$ and $u_1, \dots, u_k$. Finally, the vector $w \neq 0$ since $u_1, \dots, u_k, u_{k+1}$ are linearly independent and the coefficient of $u_{k+1}$ in the expression for $w$ is not zero. We may now define $v_{k+1} = w / \|w\|$. The set $\{v_1, \dots, v_k, v_{k+1}\}$ is certainly an orthonormal set with the required properties and the proof by induction is complete.

We can summarize the procedure by listing a series of steps. It is really irrelevant whether we normalize at each step. We do not do it here, preferring to do so, if necessary, at the end of the procedure.

The Gram-Schmidt Procedure

Step 1: $v_1 = u_1$. Compute $\|v_1\|^2$;

Step 2: $v_2 = u_2 - \dfrac{\langle u_2, v_1 \rangle}{\|v_1\|^2}\, v_1$. Compute $\|v_2\|^2$;

Step 3: $v_3 = u_3 - \dfrac{\langle u_3, v_1 \rangle}{\|v_1\|^2}\, v_1 - \dfrac{\langle u_3, v_2 \rangle}{\|v_2\|^2}\, v_2$. Compute $\|v_3\|^2$;

$\vdots$

Step k: $v_k = u_k - \displaystyle\sum_{i=1}^{k-1} \frac{\langle u_k, v_i \rangle}{\|v_i\|^2}\, v_i$. Compute $\|v_k\|^2$;

$\vdots$
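As a computational illustration of these steps (our own addition, not part of the procedure as stated), here is a minimal sketch in Python with NumPy, assuming the standard dot product on $\mathbb{R}^n$; the helper names `gram_schmidt` and `normalize_columns` are our own choices.

```python
import numpy as np

def gram_schmidt(U):
    """Apply the (unnormalized) Gram-Schmidt steps to the columns of U.

    Assumes the columns of U are linearly independent and that the inner
    product is the standard dot product on R^n.  Returns a matrix V whose
    k-th column is the vector v_k produced at Step k above.
    """
    U = np.asarray(U, dtype=float)
    V = np.zeros_like(U)
    for k in range(U.shape[1]):
        w = U[:, k].copy()
        # Subtract the components of u_k along the previously built v_i.
        for i in range(k):
            w -= (U[:, k] @ V[:, i]) / (V[:, i] @ V[:, i]) * V[:, i]
        V[:, k] = w
    return V

def normalize_columns(V):
    """Divide each column by its norm (the normalization deferred to the end)."""
    return V / np.linalg.norm(V, axis=0)
```

Here the normalization is kept in a separate helper, mirroring the remark above that one may normalize at the end rather than at each step.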

2 Examples

Let us give some examples.

Example 2.1 Let

 1   1   1     U =  −1  ,  0  ,  1  = {u1, u2, u3} .  1 1 2 

Then $v_1 = (1, -1, 1)^\top$ and $\|v_1\|^2 = 3$. Then, we compute $v_2$:

 1  * 1   1 +  1  hu , v i 1 v = u − 2 1 v = 1 − 0 , −1 −1 2 2 kv k2 1   3       1 1 1 1 1

     1  1   1 3 2 2 2 2 =  0  −  −1  =  3  and kv2k = . 3 1 3 1 1 3

Finally,

$$v_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{\|v_1\|^2}\, v_1 - \frac{\langle u_3, v_2 \rangle}{\|v_2\|^2}\, v_2$$

$$= \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} - \frac{1}{3} \left\langle \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} - \frac{3}{2} \left\langle \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ \tfrac{1}{3} \end{pmatrix} \right\rangle \begin{pmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \\ \tfrac{1}{3} \end{pmatrix}$$

$$= \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} - \frac{2}{3} \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} - \frac{5}{6} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \end{pmatrix} \quad \text{and} \quad \|v_3\|^2 = \frac{1}{2}\,.$$

The normalized set is
$$\hat{v}_1 = \begin{pmatrix} \tfrac{1}{\sqrt{3}} \\ -\tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{3}} \end{pmatrix}, \quad \hat{v}_2 = \begin{pmatrix} \tfrac{1}{\sqrt{6}} \\ \tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{6}} \end{pmatrix}, \quad \hat{v}_3 = \begin{pmatrix} -\tfrac{1}{\sqrt{2}} \\ 0 \\ \tfrac{1}{\sqrt{2}} \end{pmatrix}\,.$$
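Purely as a check on the arithmetic (our own addition, reusing the illustrative `gram_schmidt` and `normalize_columns` helpers sketched earlier), the same vectors can be reproduced numerically:

```python
import numpy as np

U = np.column_stack([(1, -1, 1), (1, 0, 1), (1, 1, 2)])   # u_1, u_2, u_3 as columns
V = gram_schmidt(U)             # columns (1,-1,1), (1/3,2/3,1/3), (-1/2,0,1/2)
V_hat = normalize_columns(V)    # the orthonormal vectors computed above

# Their Gram matrix should be the 3x3 identity.
print(np.allclose(V_hat.T @ V_hat, np.eye(3)))   # True
```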

In a more geometric vein, we consider the next example.

Example 2.2 Let $H$ be the plane in $\mathbb{R}^3$ spanned by the vectors $u_1 = (1, 2, 2)^\top$ and $u_2 = (-1, 0, 2)^\top$. These vectors are clearly linearly independent and so form a basis for the plane. We wish to find an orthonormal basis for the plane and extend it to an orthonormal basis for all of $\mathbb{R}^3$. We add one linearly independent vector to the original set of two to form a basis for all of $\mathbb{R}^3$ by adding the vector $u_3 = (0, 0, 1)^\top$. Then the set of vectors $\{u_1, u_2, u_3\}$ is a linearly independent set in $\mathbb{R}^3$ and so forms a basis for the entire space. If one has any doubt about the linear independence of this set, just compute $\det\left(\mathrm{col}\,[u_1, u_2, u_3]\right) = 2 \neq 0$.

Now, we could have orthogonalized the set consisting of the two given vectors, and then added a third, but since the Gram-Schmidt procedure preserves the span at each stage, it is simpler to add the additional linearly independent vector now. The process then proceeds as usual:

$$v_1 = u_1 \quad \text{and} \quad \|v_1\|^2 = 1^2 + 2^2 + 2^2 = 9\,,$$

 −1  * −1   1 +  1  hu , v i 1 v = u − 2 1 v = 0 − 0 , 2 2 2 2 9 1   9       2 2 2 2

 4   −1   1  − 3 3 = 0 − 2 =  − 2    9    3  2 2 4 3

Note that $\|v_2\|^2 = 36/9 = 4$. Finally,

$$v_3 = u_3 - \frac{\langle u_3, v_1 \rangle}{9}\, v_1 - \frac{\langle u_3, v_2 \rangle}{4}\, v_2$$

$$= \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{9} \left\langle \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \right\rangle \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} - \frac{1}{4} \left\langle \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -\tfrac{4}{3} \\ -\tfrac{2}{3} \\ \tfrac{4}{3} \end{pmatrix} \right\rangle \begin{pmatrix} -\tfrac{4}{3} \\ -\tfrac{2}{3} \\ \tfrac{4}{3} \end{pmatrix}$$

$$= \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} - \frac{2}{9} \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} - \frac{1}{3} \begin{pmatrix} -\tfrac{4}{3} \\ -\tfrac{2}{3} \\ \tfrac{4}{3} \end{pmatrix} = \begin{pmatrix} \tfrac{2}{9} \\ -\tfrac{2}{9} \\ \tfrac{1}{9} \end{pmatrix}\,.$$

Now $v_1$ and $v_2$ are an orthogonal basis for the plane $H$, and, together with $v_3$, form an orthogonal basis for all of $\mathbb{R}^3$. In order to get the orthonormal basis, we merely divide each by its norm. Since, as we have seen, $\|v_1\| = 3$ and $\|v_2\| = 2$, we need only compute the norm $\|v_3\| = \sqrt{4/81 + 4/81 + 1/81} = \sqrt{1/9} = 1/3$. Hence the vectors of the required orthonormal basis are

 1   2   2  3 − 3 3 vˆ =  2  , vˆ =  − 1  , and vˆ =  − 2  . 1  3  2  3  3  3  2 2 1 3 3 3

As another example, we leave the space $\mathbb{R}^n$.

Example 2.3 Here we look at the space of polynomials of degree at most 3, defined on the interval $[-1, 1]$ and having real coefficients. This is the vector space we denote by $P_3([-1, 1])$. We take, as a basis, the monomials $\{1, t, t^2, t^3\}$. These polynomials clearly span the vector space and are linearly independent since, if $\alpha_0 \cdot 1 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3 = 0$ for all $t \in [-1, 1]$ then all the $\alpha_i = 0$, because such a polynomial, if not the zero polynomial, can have at most three real roots according to the Fundamental Theorem of Algebra.

In this vector space we introduce the form
$$\langle p_1, p_2 \rangle = \int_{-1}^{1} p_1(t)\, p_2(t)\, dt\,.$$
We claim that this form is an inner product on $P_3([-1, 1])$. To verify that the claim is true, we must show that the form is a positive definite, symmetric, bilinear form.

First, since $p_1(t)\,p_2(t) = p_2(t)\,p_1(t)$, the form is clearly symmetric. Moreover, since $p^2(t) \geq 0$ for any $p \in P_3([-1, 1])$ we certainly know that
$$\int_{-1}^{1} p^2(t)\, dt \geq 0$$
and is equal to $0$ if and only if $p(t) \equiv 0$ on $[-1, 1]$. So the form is positive definite.

Since we already know that the form is symmetric, it suffices to show that the form is linear in the first argument. To this end, let $p_1, p_2, p_3 \in P_3([-1, 1])$ and $\alpha, \beta \in \mathbb{R}$. Then

$$\langle \alpha p_1 + \beta p_2, p_3 \rangle = \int_{-1}^{1} \big(\alpha p_1(t) + \beta p_2(t)\big)\, p_3(t)\, dt = \int_{-1}^{1} \big[\alpha p_1(t)\, p_3(t) + \beta p_2(t)\, p_3(t)\big]\, dt$$

$$= \alpha \int_{-1}^{1} p_1(t)\, p_3(t)\, dt + \beta \int_{-1}^{1} p_2(t)\, p_3(t)\, dt = \alpha \langle p_1, p_3 \rangle + \beta \langle p_2, p_3 \rangle\,.$$

So the given form is, by definition, an inner product.

Now the given basis vectors certainly do not form an orthogonal set, as can be seen by computing
$$\langle 1, t^2 \rangle = \int_{-1}^{1} t^2\, dt = \left. \frac{t^3}{3} \right|_{-1}^{1} = \frac{2}{3}\,.$$

To find an orthogonal basis, we apply the Gram-Schmidt procedure to replace $\{1, t, t^2, t^3\}$ with an orthogonal set $\{\ell_0(t), \ell_1(t), \ell_2(t), \ell_3(t)\}$.

$$\ell_0(t) = 1\,, \qquad \|\ell_0\|^2 = \int_{-1}^{1} dt = \left. t \right|_{-1}^{1} = 1 - (-1) = 2\,.$$

$$\ell_1(t) = t - \frac{\langle \ell_0, t \rangle}{\|\ell_0\|^2}\, \ell_0(t) = t - \frac{1}{2}\left( \int_{-1}^{1} t\, dt \right) \cdot 1 = t - 0 = t\,,$$

$$\|\ell_1\|^2 = \int_{-1}^{1} t^2\, dt = \left. \frac{t^3}{3} \right|_{-1}^{1} = \frac{2}{3}\,.$$
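The remaining two steps of this example proceed in the same way. As an illustrative sketch only (our own addition, using SymPy for the integrals; the helper name `inner` is ours), the full orthogonalization of $\{1, t, t^2, t^3\}$ under this inner product can be carried out symbolically:

```python
import sympy as sp

t = sp.symbols('t')

def inner(p, q):
    # The inner product <p, q> = integral of p(t) q(t) over [-1, 1].
    return sp.integrate(p * q, (t, -1, 1))

monomials = [sp.Integer(1), t, t**2, t**3]
ell = []
for u in monomials:
    # Gram-Schmidt step: subtract the components along the earlier ell_i.
    w = u - sum(inner(u, v) / inner(v, v) * v for v in ell)
    ell.append(sp.expand(w))

print(ell)   # expected: [1, t, t**2 - 1/3, t**3 - 3*t/5]
```

Carrying the computation through by hand gives $\ell_2(t) = t^2 - \tfrac{1}{3}$ and $\ell_3(t) = t^3 - \tfrac{3}{5}t$, which are scalar multiples of the classical Legendre polynomials.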

These examples show that we have both a way of replacing a given basis with an orthonormal one, and a way of extending a given set of vectors to an orthonormal basis for the entire space. Now we look at some aspects of the structure of inner product spaces that use the notion of orthogonality.

3 Direct Sum Decompositions and Projections

We begin with a finite-dimensional inner product space, $V$, and a set $U \subset V$, $U \neq \emptyset$. Define the set $U^\perp$ by

$$U^\perp = \{ v \in V \mid \langle v, u \rangle = 0 \ \text{for all } u \in U \}\,.$$

This set is called the orthogonal complement of $U$ in $V$. The first observation we make is that, whatever the nature of the non-empty set $U$, the set $U^\perp$ is a subspace of $V$. This fact follows immediately from the bilinearity of the inner product. We leave that verification to the reader. Here is a list of properties that are also almost trivial to check:

(1) $U \subset W$ implies $W^\perp \subset U^\perp$.

(2) $U^\perp = [\,\mathrm{span}(U)\,]^\perp$.

(3) If $U \cap U^\perp \neq \emptyset$ then $U \cap U^\perp = \{0\}$. Clearly $0 \in U$ implies $U \cap U^\perp = \{0\}$.

(4) $V^\perp = \{0\}$.

(5) $U \subset \mathrm{span}(U) \subset [U^\perp]^\perp$.

Our first result is to show that if $U$ is in fact a subspace of $V$, then every vector can be decomposed into a sum of an element of $U$ and a vector in $U^\perp$.

Theorem 3.1 If $U$ is a subspace of $V$ then every vector $v \in V$ can be written uniquely as a sum $v = u + w$ where $u \in U$ and $w \in U^\perp$.

Proof: Let $\{u_1, u_2, \dots, u_k\}$ be an orthonormal basis for $U$. Then for any $v$ we can write
$$v = \left( \sum_{i=1}^{k} \langle v, u_i \rangle\, u_i \right) + \left( v - \sum_{i=1}^{k} \langle v, u_i \rangle\, u_i \right) = u + (v - u)\,.$$
From this simple expression, we can see that, for any index $j$,

$$\langle v - u, u_j \rangle = \langle v, u_j \rangle - \langle u, u_j \rangle = \langle v, u_j \rangle - \left\langle \sum_{i=1}^{k} \langle v, u_i \rangle\, u_i,\ u_j \right\rangle$$

$$= \langle v, u_j \rangle - \sum_{i=1}^{k} \langle v, u_i \rangle \langle u_i, u_j \rangle = \langle v, u_j \rangle - \langle v, u_j \rangle = 0\,.$$

Thus $v - u \in [\mathrm{span}\{u_1, \dots, u_k\}]^\perp = U^\perp$. So every vector $v \in V$ can be written as a sum of a vector in $U$ and a vector in $U^\perp$, and we indicate this fact by writing $V = U + U^\perp$. This is, of course, a familiar fact in $\mathbb{R}^3$ where we can write every vector in the $(x, y)$-plane as a combination of the orthogonal vectors $\hat{\imath}$ and $\hat{\jmath}$ and every vector in $\mathbb{R}^3$ in the form $v = \alpha \hat{\imath} + \beta \hat{\jmath} + \gamma \hat{k}$, the vectors of the form $\gamma \hat{k}$ being orthogonal to the $(x, y)$-plane.

In fact this decomposition of a vector into a component lying in $U$ and one lying in $U^\perp$ can be done in only one way. Suppose, on the contrary, that for some vector $v \in V$ we have $v = u_1 + w_1$ with $u_1 \in U$ and $w_1 \in U^\perp$, as well as $v = u_2 + w_2$, $u_2 \in U$, $w_2 \in U^\perp$. Now we have assumed that $U$ is a subspace and we know that $U^\perp$ is always a subspace of $V$. Then $v = u_1 + w_1 = u_2 + w_2$ implies that $u_1 - u_2 \in U$ and $w_2 - w_1 \in U^\perp$. But $u_1 + w_1 = u_2 + w_2$ implies that $u_1 - u_2 = w_2 - w_1$ and so both of these vectors must lie in $U \cap U^\perp = \{0\}$. So both differences vanish and hence $u_1 = u_2$ and $w_1 = w_2$; the decomposition is indeed unique. To reflect this unique decomposition, we say that $V$ is a direct sum of the subspaces $U$ and $U^\perp$ and we write $V = U \oplus U^\perp$. So, for example, $\mathbb{R}^3 = \mathrm{span}\{\hat{\imath}, \hat{\jmath}\} \oplus \mathrm{span}\{\hat{k}\}$.

Now that we have the unique decomposition we can define a map $P_U : V \longrightarrow V$ by $P_U(v) = P_U(u + w) = u$. The uniqueness of the decomposition guarantees that this map is well-defined. This map, $P_U$, is called the projection operator onto the subspace $U$. Notice that it is an idempotent map: $P_U^2 = P_U \circ P_U = P_U$ since

$$P_U(P_U(v)) = P_U(P_U(u + w)) = P_U(u) = u\,.$$

It is also true that this mapping is a linear transformation, as can be easily checked. These operators are particularly easy to write down if we work with an orthonormal basis of the subspace $U$, $\{u_1, \dots, u_k\}$, and extend it to a basis for the entire vector space $V$. Indeed, by the linearity we have

$$P_U(v) = \sum_{i=1}^{k} \langle v, u_i \rangle\, u_i\,, \quad \text{for all } v \in V.$$
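In $\mathbb{R}^n$ with the standard inner product this formula is easy to evaluate directly. The following is a minimal illustrative sketch (the helper name `project_onto` is ours), reusing the orthonormal basis of the plane $H$ found in Example 2.2:

```python
import numpy as np

def project_onto(Q, v):
    """Orthogonal projection of v onto the column span of Q.

    Assumes the columns of Q form an orthonormal basis of the subspace U
    and that the inner product is the standard dot product, so that
    P_U(v) = sum_i <v, u_i> u_i = Q (Q^T v).
    """
    Q = np.asarray(Q, dtype=float)
    v = np.asarray(v, dtype=float)
    return Q @ (Q.T @ v)

# Project (1, 1, 1) onto the plane H of Example 2.2, using the
# orthonormal basis {v_hat_1, v_hat_2} found there.
Q = np.column_stack([(1/3, 2/3, 2/3), (-2/3, -1/3, 2/3)])
v = np.array([1.0, 1.0, 1.0])
u = project_onto(Q, v)
w = v - u                        # the component in the orthogonal complement
print(np.allclose(Q.T @ w, 0))   # True: w is orthogonal to the plane H
```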

We claim also that the map $I - P_U$ is the projection onto the subspace $U^\perp$. This is easy to check since $(I - P_U)(v) = Iv - P_U v = v - u = w \in U^\perp$. It is also clear that $P_U (I - P_U) = P_U - P_U^2 = P_U - P_U = 0$, the zero operator. Also note that, for any two vectors $v_1, v_2 \in V$, and writing $v_1 = u_1 + w_1$, $v_2 = u_2 + w_2$, we have both

$$\langle P_U v_1, v_2 \rangle = \langle u_1, u_2 + w_2 \rangle = \langle u_1, u_2 \rangle + \langle u_1, w_2 \rangle = \langle u_1, u_2 \rangle\,,$$

and

$$\langle v_1, P_U v_2 \rangle = \langle u_1 + w_1, u_2 \rangle = \langle u_1, u_2 \rangle + \langle w_1, u_2 \rangle = \langle u_1, u_2 \rangle\,,$$

since $\langle u_1, w_2 \rangle = \langle w_1, u_2 \rangle = 0$.

This means that the orthogonal projector satisfies $\langle P_U v_1, v_2 \rangle = \langle v_1, P_U v_2 \rangle$ for all $v_1, v_2 \in V$, so that $P_U$ is a self-adjoint operator. In fact, the class of orthogonal projection operators is characterized by linearity, idempotence, and symmetry.

Theorem 3.2 Let $V$ be a finite-dimensional inner product space. A necessary and sufficient condition for a linear operator to be a projection operator onto a subspace $U$ of $V$ is that it be self-adjoint and idempotent.

Proof: We have already seen that if $P$ is the projection operator onto $U$ then it is both idempotent and self-adjoint. So we need only deal with the converse statement. So we suppose that $P$ is a linear, idempotent, self-adjoint operator on $V$ and let $U = \{ v \in V \mid P v = v \}$. Since $I - P$ is also linear, $U$ is just the null space of $I - P$ and so is certainly a subspace. Now, for any $v \in V$ we use the identity $v = P v + (I - P)v$. Since $P$ is idempotent, $P(P v) = P^2 v = P v$, so $P v \in U$. Moreover $(I - P)v \in U^\perp$.

Indeed, since $P u = u$ for all $u \in U$ and since $\langle P v, u \rangle = \langle v, P u \rangle$ because $P$ is self-adjoint, we have, for $u \in U$,

$$\langle (I - P)v, u \rangle = \langle v, u \rangle - \langle P v, u \rangle = \langle v, u \rangle - \langle v, P u \rangle = \langle v, u \rangle - \langle v, u \rangle = 0\,.$$

Thus the decomposition $v = P v + (I - P)v$ is the unique decomposition of $v$ into components in $U$ and $U^\perp$, and hence $P$ is the projection operator onto $U$.
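As a small numerical illustration of this characterization (our own sketch, not part of the proof), one can check that a symmetric, idempotent matrix behaves exactly as the theorem describes, again using the plane $H$ from Example 2.2:

```python
import numpy as np

# A symmetric, idempotent matrix: the projector onto the plane H of
# Example 2.2, built from an orthonormal basis of H.
Q = np.column_stack([(1/3, 2/3, 2/3), (-2/3, -1/3, 2/3)])
P = Q @ Q.T

print(np.allclose(P, P.T))     # self-adjoint (symmetric): True
print(np.allclose(P @ P, P))   # idempotent: True

# For any v, the two pieces P v and (I - P) v are orthogonal, as in the proof.
v = np.array([2.0, -1.0, 5.0])
print(np.isclose((P @ v) @ ((np.eye(3) - P) @ v), 0.0))   # True
```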

It is sometimes useful to have a concrete matrix representation of a projection operator. Suppose $U$ is a subspace of $V$ and write $V = U \oplus U^\perp$. Let us take an orthonormal basis for the subspace $U$, say $\{u_1, \dots, u_k\}$, and extend this basis to an orthonormal basis for $V$. Then $\{u_{k+1}, \dots, u_n\}$ is an orthonormal basis for $U^\perp$. If we consider $P : V \longrightarrow V$ and take this orthonormal basis in both the domain and the range, then

$$\begin{aligned}
P u_1 &= 1\,u_1 + 0\,u_2 + \cdots + 0\,u_k + 0\,u_{k+1} + \cdots + 0\,u_n \\
P u_2 &= 0\,u_1 + 1\,u_2 + \cdots + 0\,u_k + 0\,u_{k+1} + \cdots + 0\,u_n \\
&\ \,\vdots \\
P u_k &= 0\,u_1 + 0\,u_2 + \cdots + 1\,u_k + 0\,u_{k+1} + \cdots + 0\,u_n \\
&\ \,\vdots \\
P u_n &= 0\,u_1 + 0\,u_2 + \cdots + 0\,u_k + 0\,u_{k+1} + \cdots + 0\,u_n
\end{aligned}$$

and so we see that the matrix representation in block form is
$$P = \begin{pmatrix} I_k & 0 \\ 0 & 0_{n-k} \end{pmatrix},$$
while $I - P$ has the representation
$$I - P = \begin{pmatrix} 0_k & 0 \\ 0 & I_{n-k} \end{pmatrix}.$$
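This block structure can also be seen numerically. The following brief sketch (our own illustration, assuming $\mathbb{R}^3$ with the standard inner product and the orthonormal basis from Example 2.2) expresses the projector onto the plane $H$ in a basis adapted to $U \oplus U^\perp$:

```python
import numpy as np

# Orthonormal basis of R^3 adapted to the splitting: the first two columns
# span the plane H of Example 2.2, the third column spans its orthogonal
# complement.
B = np.column_stack([(1/3, 2/3, 2/3), (-2/3, -1/3, 2/3), (2/3, -2/3, 1/3)])

# Projector onto H expressed in the standard basis, via P v = sum <v, u_i> u_i.
Q = B[:, :2]
P = Q @ Q.T

# Expressed in the adapted basis B, the matrix of P is block diagonal:
# the identity I_2 in the upper-left block and zeros elsewhere.
P_in_B = B.T @ P @ B
print(np.round(P_in_B, 10))    # ~ diag(1, 1, 0)
```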
