Chapter 5

Orthogonality

Orthogonality is the mathematical formalization of the geometrical property of perpendicularity, as adapted to general inner product spaces. In finite-dimensional spaces, bases that consist of mutually orthogonal elements play an essential role in theoretical development, in applications, and in practical numerical algorithms. Many computations become dramatically simpler and less prone to numerical instabilities when performed in orthogonal systems. In infinite-dimensional function space, orthogonality unlocks the secrets of Fourier analysis and its extensions, and underlies basic series solution methods for partial differential equations. Indeed, many large scale modern applications, including signal processing and computer vision, would be impractical, if not completely infeasible, were it not for the dramatic simplifying power of orthogonality. As we will later discover, orthogonal systems naturally arise as eigenvector and eigenfunction bases for symmetric matrices and self-adjoint boundary value problems for both ordinary and partial differential equations, and so play a major role in both finite-dimensional and infinite-dimensional analysis.

Orthogonality is motivated by geometry, and the resulting constructions have significant geometrical consequences. Orthogonal matrices play an essential role in the geometry of Euclidean space, in computer graphics and animation, and in three-dimensional image analysis; details will appear in Chapter 7. The orthogonal projection of a point onto a subspace turns out to be the closest point, that is, the least squares minimizer. Moreover, when written in terms of an orthogonal basis for the subspace, the normal equations underlying data estimation have an elegant explicit solution formula that offers numerical advantages over the direct approach to least squares minimization. Yet another important fact is that the four fundamental subspaces of a matrix form mutually orthogonal pairs. This observation leads directly to a new characterization of the compatibility conditions for linear systems known as the Fredholm alternative, of further importance in the analysis of boundary value problems and differential equations.

The duly famous Gram–Schmidt process will convert an arbitrary basis of an inner product space into an orthogonal basis. As such, it forms one of the premier algorithms of linear analysis, in both finite-dimensional vector spaces and also in function space, where it is used to construct orthogonal polynomials and other systems of orthogonal functions. In Euclidean space, the Gram–Schmidt process can be re-interpreted as a new kind of matrix factorization, in which a nonsingular matrix $A = QR$ is written as the product of an orthogonal matrix $Q$ and an upper triangular matrix $R$. This turns out to have significant applications in the numerical computation of eigenvalues, cf. Section 10.6.

Figure 5.1. Orthogonal Bases in $\mathbb{R}^2$ and $\mathbb{R}^3$.

5.1. Orthogonal Bases.

Let $V$ be a fixed real† inner product space. Recall that two elements $v, w \in V$ are called orthogonal if their inner product vanishes: $\langle v ; w \rangle = 0$. In the case of vectors in Euclidean space, orthogonality under the dot product means that they meet at a right angle. A particularly important configuration is when $V$ admits a basis consisting of mutually orthogonal elements.

Definition 5.1. A basis $u_1, \dots, u_n$ of $V$ is called orthogonal if $\langle u_i ; u_j \rangle = 0$ for all $i \ne j$. The basis is called orthonormal if, in addition, each vector has unit length: $\| u_i \| = 1$ for all $i = 1, \dots, n$.

For the Euclidean space $\mathbb{R}^n$ equipped with the standard dot product, the simplest example of an orthonormal basis is the standard basis
\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \dots \qquad e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]
Orthogonality follows because $e_i \cdot e_j = 0$ for $i \ne j$, while $\| e_i \| = 1$ implies normality.

Since a basis cannot contain the zero vector, there is an easy way to convert an orthogonal basis to an orthonormal basis. Namely, one replaces each basis vector by a unit vector pointing in the same direction, as in Lemma 3.16.

† The methods can be adapted more or less straightforwardly to complex inner product spaces. The main complication, as noted in Section 3.6, is that we need to be careful with the order of vectors appearing in the non-symmetric complex inner products. In this chapter, we will write all inner product formulas in the proper order so that they retain their validity in complex vector spaces.

Lemma 5.2. If $v_1, \dots, v_n$ is an orthogonal basis of a vector space $V$, then the normalized vectors $u_i = v_i / \| v_i \|$ form an orthonormal basis.

Example 5.3. The vectors
\[ v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix}, \]
are easily seen to form a basis of $\mathbb{R}^3$. Moreover, they are mutually perpendicular, $v_1 \cdot v_2 = v_1 \cdot v_3 = v_2 \cdot v_3 = 0$, and so form an orthogonal basis with respect to the standard dot product on $\mathbb{R}^3$. When we divide each orthogonal basis vector by its length, the result is the orthonormal basis
\[ u_1 = \frac{1}{\sqrt6} \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} \frac1{\sqrt6} \\ \frac2{\sqrt6} \\ -\frac1{\sqrt6} \end{pmatrix}, \qquad u_2 = \frac{1}{\sqrt5} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ \frac1{\sqrt5} \\ \frac2{\sqrt5} \end{pmatrix}, \qquad u_3 = \frac{1}{\sqrt{30}} \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac5{\sqrt{30}} \\ -\frac2{\sqrt{30}} \\ \frac1{\sqrt{30}} \end{pmatrix}, \]
satisfying $u_1 \cdot u_2 = u_1 \cdot u_3 = u_2 \cdot u_3 = 0$ and $\| u_1 \| = \| u_2 \| = \| u_3 \| = 1$. The appearance of square roots in the elements of an orthonormal basis is fairly typical.
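The checks in Example 5.3 are easy to reproduce numerically. The following minimal sketch (our own illustration, assuming NumPy) verifies the pairwise orthogonality and then normalizes the vectors as in Lemma 5.2.

```python
import numpy as np

v1, v2, v3 = np.array([1., 2., -1.]), np.array([0., 1., 2.]), np.array([5., -2., 1.])
print(v1 @ v2, v1 @ v3, v2 @ v3)       # all zero: the basis is orthogonal
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
print(np.linalg.norm(u1), u1 @ u2)     # 1.0 and 0.0: the normalized basis is orthonormal
```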

A useful observation is that any orthogonal collection of nonzero vectors is automatically linearly independent.

Proposition 5.4. If $v_1, \dots, v_k \in V$ are nonzero and mutually orthogonal, so $\langle v_i ; v_j \rangle = 0$ for all $i \ne j$, then they are linearly independent.

Proof: Suppose
\[ c_1 v_1 + \cdots + c_k v_k = 0. \]
Let us take the inner product of this equation with any $v_i$. Using linearity of the inner product and orthogonality, we compute
\[ 0 = \langle c_1 v_1 + \cdots + c_k v_k ; v_i \rangle = c_1 \langle v_1 ; v_i \rangle + \cdots + c_k \langle v_k ; v_i \rangle = c_i \langle v_i ; v_i \rangle = c_i \| v_i \|^2. \]
Therefore, provided $v_i \ne 0$, we conclude that the coefficient $c_i = 0$. Since this holds for all $i = 1, \dots, k$, linear independence of $v_1, \dots, v_k$ follows. Q.E.D.

As a direct corollary, we infer that any collection of nonzero orthogonal vectors forms a basis for its span.

Proposition 5.5. Suppose $v_1, \dots, v_n \in V$ are nonzero, mutually orthogonal elements of an inner product space $V$. Then $v_1, \dots, v_n$ form an orthogonal basis for their span $W = \operatorname{span}\{ v_1, \dots, v_n \} \subset V$, which is therefore a subspace of dimension $n = \dim W$. In particular, if $\dim V = n$, then $v_1, \dots, v_n$ form an orthogonal basis for $V$.

Orthogonality is also of profound significance for function spaces. Here is a relatively simple example.

Example 5.6. Consider the vector space $\mathcal{P}^{(2)}$ consisting of all quadratic polynomials $p(x) = \alpha + \beta x + \gamma x^2$, equipped with the $L^2$ inner product and norm
\[ \langle p ; q \rangle = \int_0^1 p(x)\, q(x)\, dx, \qquad \| p \| = \sqrt{\langle p ; p \rangle} = \sqrt{\int_0^1 p(x)^2\, dx}. \]
The standard monomials $1, x, x^2$ do not form an orthogonal basis. Indeed,
\[ \langle 1 ; x \rangle = \tfrac12, \qquad \langle 1 ; x^2 \rangle = \tfrac13, \qquad \langle x ; x^2 \rangle = \tfrac14. \]
One orthogonal basis of $\mathcal{P}^{(2)}$ is provided by the following polynomials:
\[ p_1(x) = 1, \qquad p_2(x) = x - \tfrac12, \qquad p_3(x) = x^2 - x + \tfrac16. \tag{5.1} \]
Indeed, one easily verifies that $\langle p_1 ; p_2 \rangle = \langle p_1 ; p_3 \rangle = \langle p_2 ; p_3 \rangle = 0$, while
\[ \| p_1 \| = 1, \qquad \| p_2 \| = \frac1{\sqrt{12}} = \frac1{2\sqrt3}, \qquad \| p_3 \| = \frac1{\sqrt{180}} = \frac1{6\sqrt5}. \]
The corresponding orthonormal basis is found by dividing each orthogonal basis element by its norm:
\[ u_1(x) = 1, \qquad u_2(x) = \sqrt3\, (2x - 1), \qquad u_3(x) = \sqrt5\, (6x^2 - 6x + 1). \]
In Section 5.4 below, we will learn how to systematically construct such orthogonal systems of polynomials.
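The orthogonality relations and norms claimed in Example 5.6 can be confirmed symbolically. Here is a short sketch (our own, assuming SymPy) using the $L^2$ inner product on $[0, 1]$:

```python
import sympy as sp

x = sp.symbols('x')
p1, p2, p3 = 1, x - sp.Rational(1, 2), x**2 - x + sp.Rational(1, 6)   # the basis (5.1)
inner = lambda p, q: sp.integrate(p * q, (x, 0, 1))                   # L^2 inner product

print(inner(p1, p2), inner(p1, p3), inner(p2, p3))   # 0 0 0
print(inner(p2, p2), inner(p3, p3))                  # 1/12 and 1/180, matching the stated norms
```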

Computations in Orthogonal Bases

What are the advantages of orthogonal and orthonormal bases? Once one has a basis of a vector space, a key issue is how to express other elements as linear combinations of the basis elements — that is, to find their coordinates in the prescribed basis. In general, this is not so easy, since it requires solving a system of linear equations, (2.22). In the high dimensional situations arising in applications, computing the solution may require a considerable, if not infeasible, amount of time and effort. However, if the basis is orthogonal, or, even better, orthonormal, then the change of basis computation requires almost no work. This is the crucial insight underlying the efficacy of both discrete and continuous Fourier analysis, least squares approximations and statistical analysis of large data sets, signal, image and video processing, and a multitude of other essential applications.

Theorem 5.7. Let $u_1, \dots, u_n$ be an orthonormal basis for an inner product space $V$. Then one can write any element $v \in V$ as a linear combination
\[ v = c_1 u_1 + \cdots + c_n u_n, \tag{5.2} \]
in which its coordinates
\[ c_i = \langle v ; u_i \rangle, \qquad i = 1, \dots, n, \tag{5.3} \]
are explicitly given as inner products. Moreover, the norm
\[ \| v \| = \sqrt{c_1^2 + \cdots + c_n^2} = \sqrt{\sum_{i=1}^n \langle v ; u_i \rangle^2} \tag{5.4} \]
is the square root of the sum of the squares of its orthonormal basis coordinates.

Proof: Let us compute the inner product of (5.2) with one of the basis vectors. Using the orthonormality conditions
\[ \langle u_i ; u_j \rangle = \begin{cases} 0, & i \ne j, \\ 1, & i = j, \end{cases} \tag{5.5} \]
and bilinearity of the inner product, we find
\[ \langle v ; u_i \rangle = \Bigl\langle \sum_{j=1}^n c_j u_j \,;\, u_i \Bigr\rangle = \sum_{j=1}^n c_j \langle u_j ; u_i \rangle = c_i \| u_i \|^2 = c_i. \]
To prove formula (5.4), we similarly expand
\[ \| v \|^2 = \langle v ; v \rangle = \sum_{i,j=1}^n c_i c_j \langle u_i ; u_j \rangle = \sum_{i=1}^n c_i^2, \]
again making use of the orthonormality of the basis elements. Q.E.D.

Example 5.8. Let us rewrite the vector $v = (1, 1, 1)^T$ in terms of the orthonormal basis
\[ u_1 = \begin{pmatrix} \frac1{\sqrt6} \\ \frac2{\sqrt6} \\ -\frac1{\sqrt6} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} 0 \\ \frac1{\sqrt5} \\ \frac2{\sqrt5} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac5{\sqrt{30}} \\ -\frac2{\sqrt{30}} \\ \frac1{\sqrt{30}} \end{pmatrix}, \]
constructed in Example 5.3. Computing the dot products
\[ v \cdot u_1 = \tfrac2{\sqrt6}, \qquad v \cdot u_2 = \tfrac3{\sqrt5}, \qquad v \cdot u_3 = \tfrac4{\sqrt{30}}, \]
we immediately conclude that

\[ v = \tfrac2{\sqrt6}\, u_1 + \tfrac3{\sqrt5}\, u_2 + \tfrac4{\sqrt{30}}\, u_3. \]
Needless to say, a direct computation based on solving the associated linear system, as in Chapter 2, is more tedious.

While passage from an orthogonal basis to its orthonormal version is elementary — one simply divides each basis element by its norm — we shall often find it more convenient to work directly with the unnormalized version. The next result provides the corresponding formula expressing a vector in terms of an orthogonal, but not necessarily orthonormal, basis. The proof proceeds exactly as in the orthonormal case, and details are left to the reader.

Theorem 5.9. If $v_1, \dots, v_n$ form an orthogonal basis, then the corresponding coordinates of a vector
\[ v = a_1 v_1 + \cdots + a_n v_n \qquad \text{are given by} \qquad a_i = \frac{\langle v ; v_i \rangle}{\| v_i \|^2}. \tag{5.6} \]
In this case, its norm can be computed using the formula
\[ \| v \|^2 = \sum_{i=1}^n a_i^2\, \| v_i \|^2 = \sum_{i=1}^n \left( \frac{\langle v ; v_i \rangle}{\| v_i \|} \right)^2. \tag{5.7} \]

Equation (5.6), along with its orthonormal simplification (5.3), is one of the most important and useful formulas we shall establish. Applications will appear repeatedly throughout the remainder of the text.

Example 5.10. The wavelet basis
\[ v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad v_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \tag{5.8} \]
introduced in Example 2.33 is, in fact, an orthogonal basis of $\mathbb{R}^4$. The norms are
\[ \| v_1 \| = 2, \qquad \| v_2 \| = 2, \qquad \| v_3 \| = \sqrt2, \qquad \| v_4 \| = \sqrt2. \]
Therefore, using (5.6), we can readily express any vector as a linear combination of the wavelet basis vectors. For example,
\[ v = \begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2\, v_1 - v_2 + 3\, v_3 - 2\, v_4, \]
where the wavelet basis coordinates are computed directly by
\[ \frac{\langle v ; v_1 \rangle}{\| v_1 \|^2} = \frac84 = 2, \qquad \frac{\langle v ; v_2 \rangle}{\| v_2 \|^2} = \frac{-4}4 = -1, \qquad \frac{\langle v ; v_3 \rangle}{\| v_3 \|^2} = \frac62 = 3, \qquad \frac{\langle v ; v_4 \rangle}{\| v_4 \|^2} = \frac{-4}2 = -2. \]
This is clearly quicker than solving the linear system, as we did in Example 2.33. Finally, we note that
\[ 46 = \| v \|^2 = 2^2 \| v_1 \|^2 + (-1)^2 \| v_2 \|^2 + 3^2 \| v_3 \|^2 + (-2)^2 \| v_4 \|^2 = 4 \cdot 4 + 1 \cdot 4 + 9 \cdot 2 + 4 \cdot 2, \]
in conformity with (5.7).
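The wavelet coordinates of Example 5.10 come straight from formula (5.6); a minimal numerical sketch (our own, assuming NumPy):

```python
import numpy as np

v = np.array([4., -2., 1., 5.])
basis = [np.array([1., 1., 1., 1.]), np.array([1., 1., -1., -1.]),
         np.array([1., -1., 0., 0.]), np.array([0., 0., 1., -1.])]   # wavelet basis (5.8)

coords = [(v @ vi) / (vi @ vi) for vi in basis]        # formula (5.6)
print(coords)                                           # [2.0, -1.0, 3.0, -2.0]
print(sum(a * vi for a, vi in zip(coords, basis)))      # reconstructs [ 4. -2.  1.  5.]
```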

Example 5.11. The same formulae are equally valid for orthogonal bases in function spaces. For example, to express a quadratic polynomial
\[ p(x) = c_1\, p_1(x) + c_2\, p_2(x) + c_3\, p_3(x) = c_1 + c_2 \left( x - \tfrac12 \right) + c_3 \left( x^2 - x + \tfrac16 \right) \]
in terms of the orthogonal basis (5.1), we merely compute the $L^2$ inner products
\[ c_1 = \frac{\langle p ; p_1 \rangle}{\| p_1 \|^2} = \int_0^1 p(x)\, dx, \qquad c_2 = \frac{\langle p ; p_2 \rangle}{\| p_2 \|^2} = 12 \int_0^1 p(x) \left( x - \tfrac12 \right) dx, \qquad c_3 = \frac{\langle p ; p_3 \rangle}{\| p_3 \|^2} = 180 \int_0^1 p(x) \left( x^2 - x + \tfrac16 \right) dx. \]
Thus, for example,
\[ p(x) = x^2 + x + 1 = \tfrac{11}6 + 2 \left( x - \tfrac12 \right) + \left( x^2 - x + \tfrac16 \right), \]
as is easily checked.

Example 5.12. Perhaps the most important example of an orthogonal basis is provided by the basic trigonometric functions. Let $\mathcal{T}^{(n)}$ denote the vector space consisting of all trigonometric polynomials

\[ T(x) = \sum_{0 \le j + k \le n} a_{jk}\, (\sin x)^j (\cos x)^k \tag{5.9} \]
of degree $\le n$. The individual monomials $(\sin x)^j (\cos x)^k$ span $\mathcal{T}^{(n)}$, but they do not form a basis owing to identities stemming from the basic trigonometric formula $\cos^2 x + \sin^2 x = 1$; see Example 2.19 for additional details. Exercise  introduced a more convenient spanning set consisting of the $2n + 1$ functions
\[ 1, \quad \cos x, \quad \sin x, \quad \cos 2x, \quad \sin 2x, \quad \dots \quad \cos nx, \quad \sin nx. \tag{5.10} \]
Let us prove that these functions form an orthogonal basis of $\mathcal{T}^{(n)}$ with respect to the $L^2$ inner product and norm
\[ \langle f ; g \rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx, \qquad \| f \|^2 = \int_{-\pi}^{\pi} f(x)^2\, dx. \tag{5.11} \]
The elementary integration formulae
\[ \int_{-\pi}^{\pi} \sin kx\, \sin lx\, dx = \begin{cases} 0, & k \ne l, \\ \pi, & k = l \ne 0, \end{cases} \qquad \int_{-\pi}^{\pi} \cos kx\, \cos lx\, dx = \begin{cases} 0, & k \ne l, \\ \pi, & k = l \ne 0, \\ 2\pi, & k = l = 0, \end{cases} \qquad \int_{-\pi}^{\pi} \cos kx\, \sin lx\, dx = 0, \tag{5.12} \]
which are valid for all nonnegative integers $k, l \ge 0$, imply the orthogonality relations
\[ \langle \cos kx ; \cos lx \rangle = \langle \sin kx ; \sin lx \rangle = 0, \quad k \ne l, \qquad \langle \cos kx ; \sin lx \rangle = 0, \qquad \| \cos kx \| = \| \sin kx \| = \sqrt\pi, \quad k \ne 0, \qquad \| 1 \| = \sqrt{2\pi}. \tag{5.13} \]
Proposition 5.5 now assures us that the functions (5.10) form a basis for $\mathcal{T}^{(n)}$. One key consequence is that $\dim \mathcal{T}^{(n)} = 2n + 1$ — a fact that is not so easy to establish directly.

Orthogonality of the trigonometric functions (5.10) means that we can compute the coefficients $a_0, \dots, a_n, b_1, \dots, b_n$ of any trigonometric polynomial

\[ p(x) = a_0 + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr) \tag{5.14} \]
by an explicit integration formula. Namely,
\[ a_0 = \frac{\langle f ; 1 \rangle}{\| 1 \|^2} = \frac1{2\pi} \int_{-\pi}^{\pi} f(x)\, dx, \qquad a_k = \frac{\langle f ; \cos kx \rangle}{\| \cos kx \|^2} = \frac1\pi \int_{-\pi}^{\pi} f(x) \cos kx\, dx, \qquad b_k = \frac{\langle f ; \sin kx \rangle}{\| \sin kx \|^2} = \frac1\pi \int_{-\pi}^{\pi} f(x) \sin kx\, dx, \qquad k \ge 1. \tag{5.15} \]
These formulae will play an essential role in the theory and applications of Fourier series; see Chapter 12.
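For readers who wish to experiment, the integrals (5.15) are easy to approximate numerically. The sketch below is our own illustration (function name and sample polynomial are not part of the text); it uses a simple Riemann sum over one period, which is essentially exact for trigonometric polynomials.

```python
import numpy as np

def fourier_coefficients(f, n, m=4096):
    """Approximate the coefficients (5.15) by a Riemann sum over [-pi, pi]."""
    x = np.linspace(-np.pi, np.pi, m, endpoint=False)
    fx = f(x)
    a0 = np.mean(fx)                                         # (1/2pi) * integral
    a = [2 * np.mean(fx * np.cos(k * x)) for k in range(1, n + 1)]
    b = [2 * np.mean(fx * np.sin(k * x)) for k in range(1, n + 1)]
    return a0, a, b

# Recover the coefficients of p(x) = 1 + 2 cos x - 3 sin 2x.
p = lambda x: 1 + 2 * np.cos(x) - 3 * np.sin(2 * x)
print(fourier_coefficients(p, 2))   # approximately (1.0, [2.0, 0.0], [0.0, -3.0])
```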

5.2. The Gram–Schmidt Process.

Once one becomes convinced of the utility of orthogonal and orthonormal bases, a natural question arises: How can we construct them? A practical algorithm was first discovered by Laplace in the eighteenth century. Today the algorithm is known as the Gram–Schmidt process, after its rediscovery by Jørgen Gram, whom we already met in Chapter 3, and Erhard Schmidt, an early twentieth century German mathematician. It forms one of the premier algorithms of applied and computational linear algebra.

Let $V$ denote a finite-dimensional inner product space. (To begin with, you might wish to think of $V$ as a subspace of $\mathbb{R}^m$, equipped with the standard Euclidean dot product, although the algorithm will be formulated in complete generality.) We assume that we already know some basis $w_1, \dots, w_n$ of $V$, where $n = \dim V$. Our goal is to use this information to construct an orthogonal basis $v_1, \dots, v_n$.

We will construct the orthogonal basis elements one by one. Since initially we are not worrying about normality, there are no conditions on the first orthogonal basis element $v_1$, and so there is no harm in choosing
\[ v_1 = w_1. \]
Note that $v_1 \ne 0$, since $w_1$ appears in the original basis. The second basis vector must be orthogonal to the first: $\langle v_2 ; v_1 \rangle = 0$. Let us try to arrange this by subtracting a suitable multiple of $v_1$, and set
\[ v_2 = w_2 - c\, v_1, \]
where $c$ is a scalar to be determined. The orthogonality condition
\[ 0 = \langle v_2 ; v_1 \rangle = \langle w_2 ; v_1 \rangle - c\, \langle v_1 ; v_1 \rangle = \langle w_2 ; v_1 \rangle - c\, \| v_1 \|^2 \]
requires that $c = \dfrac{\langle w_2 ; v_1 \rangle}{\| v_1 \|^2}$, and therefore
\[ v_2 = w_2 - \frac{\langle w_2 ; v_1 \rangle}{\| v_1 \|^2}\, v_1. \tag{5.16} \]

Linear independence of $v_1 = w_1$ and $w_2$ ensures that $v_2 \ne 0$. (Check!)

Next, we construct
\[ v_3 = w_3 - c_1 v_1 - c_2 v_2 \]
by subtracting suitable multiples of the first two orthogonal basis elements from $w_3$. We want $v_3$ to be orthogonal to both $v_1$ and $v_2$. Since we already arranged that $\langle v_1 ; v_2 \rangle = 0$, this requires
\[ 0 = \langle v_3 ; v_1 \rangle = \langle w_3 ; v_1 \rangle - c_1 \langle v_1 ; v_1 \rangle, \qquad 0 = \langle v_3 ; v_2 \rangle = \langle w_3 ; v_2 \rangle - c_2 \langle v_2 ; v_2 \rangle, \]
and hence
\[ c_1 = \frac{\langle w_3 ; v_1 \rangle}{\| v_1 \|^2}, \qquad c_2 = \frac{\langle w_3 ; v_2 \rangle}{\| v_2 \|^2}. \]
Therefore, the next orthogonal basis vector is given by the formula
\[ v_3 = w_3 - \frac{\langle w_3 ; v_1 \rangle}{\| v_1 \|^2}\, v_1 - \frac{\langle w_3 ; v_2 \rangle}{\| v_2 \|^2}\, v_2. \]
Continuing in the same manner, suppose we have already constructed the mutually orthogonal vectors $v_1, \dots, v_{k-1}$ as linear combinations of $w_1, \dots, w_{k-1}$. The next orthogonal basis element $v_k$ will be obtained from $w_k$ by subtracting off a suitable linear combination of the previous orthogonal basis elements:
\[ v_k = w_k - c_1 v_1 - \cdots - c_{k-1} v_{k-1}. \]
Since $v_1, \dots, v_{k-1}$ are already orthogonal, the orthogonality constraint
\[ 0 = \langle v_k ; v_j \rangle = \langle w_k ; v_j \rangle - c_j \langle v_j ; v_j \rangle \]
requires
\[ c_j = \frac{\langle w_k ; v_j \rangle}{\| v_j \|^2} \qquad \text{for} \qquad j = 1, \dots, k - 1. \tag{5.17} \]
In this fashion, we establish the general Gram–Schmidt formula
\[ v_k = w_k - \sum_{j=1}^{k-1} \frac{\langle w_k ; v_j \rangle}{\| v_j \|^2}\, v_j, \qquad k = 1, \dots, n. \tag{5.18} \]
The Gram–Schmidt process (5.18) defines a recursive procedure for constructing the orthogonal basis vectors $v_1, \dots, v_n$. If we are actually after an orthonormal basis $u_1, \dots, u_n$, we merely normalize the resulting orthogonal basis vectors, setting $u_k = v_k / \| v_k \|$ for $k = 1, \dots, n$.
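Here is a minimal numerical sketch of the recursion (5.18) for vectors in $\mathbb{R}^m$ under the dot product; the function name is ours, NumPy is assumed, and the input columns are assumed to be linearly independent.

```python
import numpy as np

def gram_schmidt(w):
    """Classical Gram-Schmidt (5.18): orthogonalize the columns of w."""
    w = np.asarray(w, dtype=float)
    v = np.zeros_like(w)
    for k in range(w.shape[1]):
        v[:, k] = w[:, k]
        for j in range(k):
            # subtract the component of w_k along the already-built v_j
            v[:, k] -= (w[:, k] @ v[:, j]) / (v[:, j] @ v[:, j]) * v[:, j]
    return v

# Columns are the basis w_1, w_2, w_3 of Example 5.13 below.
W = np.array([[1., 1., 2.], [1., 0., -2.], [-1., 2., 3.]])
V = gram_schmidt(W)
print(np.round(V.T @ V, 10))   # off-diagonal entries should vanish
```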

Example 5.13. The vectors
\[ w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, \tag{5.19} \]
are readily seen to form a basis† of $\mathbb{R}^3$. To construct an orthogonal basis (with respect to the standard dot product) using the Gram–Schmidt procedure, we begin by setting $v_1 = w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$. The next basis vector is
\[ v_2 = w_2 - \frac{w_2 \cdot v_1}{\| v_1 \|^2}\, v_1 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{-1}3 \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix}. \]
The last orthogonal basis vector is
\[ v_3 = w_3 - \frac{w_3 \cdot v_1}{\| v_1 \|^2}\, v_1 - \frac{w_3 \cdot v_2}{\| v_2 \|^2}\, v_2 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix} - \frac{-3}3 \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} - \frac7{14/3} \begin{pmatrix} \frac43 \\ \frac13 \\ \frac53 \end{pmatrix} = \begin{pmatrix} 1 \\ -\frac32 \\ -\frac12 \end{pmatrix}. \]
The reader can easily validate the orthogonality of $v_1, v_2, v_3$.

An orthonormal basis is obtained by dividing each vector by its length. Since
\[ \| v_1 \| = \sqrt3, \qquad \| v_2 \| = \sqrt{\tfrac{14}3}, \qquad \| v_3 \| = \sqrt{\tfrac72}, \]
we produce the corresponding orthonormal basis vectors
\[ u_1 = \begin{pmatrix} \frac1{\sqrt3} \\ \frac1{\sqrt3} \\ -\frac1{\sqrt3} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} \frac4{\sqrt{42}} \\ \frac1{\sqrt{42}} \\ \frac5{\sqrt{42}} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac2{\sqrt{14}} \\ -\frac3{\sqrt{14}} \\ -\frac1{\sqrt{14}} \end{pmatrix}. \tag{5.20} \]

Example 5.14. Here is a typical sort of problem: Find an orthonormal basis (with respect to the dot product) for the subspace $V \subset \mathbb{R}^4$ consisting of all vectors which are orthogonal to the given vector $a = (1, 2, -1, -3)^T$. Now, a vector $x = (x_1, x_2, x_3, x_4)^T$ is orthogonal to $a$ if and only if
\[ x \cdot a = x_1 + 2x_2 - x_3 - 3x_4 = 0. \]
Solving this homogeneous linear system by the usual method, we find that the free variables are $x_2, x_3, x_4$, and so a (non-orthogonal) basis for the subspace is
\[ w_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \]

† This will, in fact, be a consequence of the successful completion of the Gram–Schmidt algorithm and does not need to be checked in advance. If the given vectors were not linearly independent, then eventually one of the Gram–Schmidt vectors would vanish, $v_k = 0$, and the process will break down.

To obtain an orthogonal basis, we apply the Gram–Schmidt process. First, $v_1 = w_1 = (-2, 1, 0, 0)^T$. The next element is
\[ v_2 = w_2 - \frac{w_2 \cdot v_1}{\| v_1 \|^2}\, v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{-2}5 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix}. \]
The last element of our orthogonal basis is
\[ v_3 = w_3 - \frac{w_3 \cdot v_1}{\| v_1 \|^2}\, v_1 - \frac{w_3 \cdot v_2}{\| v_2 \|^2}\, v_2 = \begin{pmatrix} 3 \\ 0 \\ 0 \\ 1 \end{pmatrix} - \frac{-6}5 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} - \frac{3/5}{6/5} \begin{pmatrix} \frac15 \\ \frac25 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac12 \\ 1 \\ -\frac12 \\ 1 \end{pmatrix}. \]
An orthonormal basis can then be obtained by dividing each $v_i$ by its length:
\[ u_1 = \begin{pmatrix} -\frac2{\sqrt5} \\ \frac1{\sqrt5} \\ 0 \\ 0 \end{pmatrix}, \qquad u_2 = \begin{pmatrix} \frac1{\sqrt{30}} \\ \frac2{\sqrt{30}} \\ \frac5{\sqrt{30}} \\ 0 \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac1{\sqrt{10}} \\ \frac2{\sqrt{10}} \\ -\frac1{\sqrt{10}} \\ \frac2{\sqrt{10}} \end{pmatrix}. \tag{5.21} \]

The Gram–Schmidt procedure has one final important consequence. By definition, every finite-dimensional vector space admits a basis. Given an inner product, the Gram–Schmidt process enables one to construct an orthogonal and even orthonormal basis of the space. Therefore, we have, in fact, implemented a constructive proof of the existence of orthogonal and orthonormal bases of finite-dimensional inner product spaces.

Theorem 5.15. A (non-zero) finite-dimensional inner product space has an orthonormal basis.

In fact, if the dimension is greater than 1, then the inner product space has infinitely many orthonormal bases.

Modifications of the Gram–Schmidt Process

With the basic Gram–Schmidt algorithm now in hand, it is worth looking at a couple of reformulations that have both practical and theoretical uses. The first is an alternative approach that can be used to directly construct the orthonormal basis vectors $u_1, \dots, u_n$ from the basis $w_1, \dots, w_n$.

We begin by replacing each orthogonal basis vector in the basic Gram–Schmidt formula (5.18) by its normalized version $u_j = v_j / \| v_j \|$. As a result, we find that the original

basis vectors can be expressed in terms of the orthonormal basis via a "triangular" system
\[ \begin{aligned}
 w_1 &= r_{11} u_1, \\
 w_2 &= r_{12} u_1 + r_{22} u_2, \\
 w_3 &= r_{13} u_1 + r_{23} u_2 + r_{33} u_3, \\
 &\;\;\vdots \\
 w_n &= r_{1n} u_1 + r_{2n} u_2 + \cdots + r_{nn} u_n.
\end{aligned} \tag{5.22} \]
The coefficients $r_{ij}$ can, in fact, be directly computed without using the intermediate derivation. Indeed, taking the inner product of the equation for $w_j$ with the orthonormal basis vector $u_i$ for $i \le j$, we find, in view of the orthonormality constraints (5.5),
\[ \langle w_j ; u_i \rangle = \langle r_{1j} u_1 + \cdots + r_{jj} u_j \,;\, u_i \rangle = r_{1j} \langle u_1 ; u_i \rangle + \cdots + r_{jj} \langle u_j ; u_i \rangle = r_{ij}, \]
and hence
\[ r_{ij} = \langle w_j ; u_i \rangle. \tag{5.23} \]
On the other hand, according to (5.4),
\[ \| w_j \|^2 = \| r_{1j} u_1 + \cdots + r_{jj} u_j \|^2 = r_{1j}^2 + \cdots + r_{j-1,j}^2 + r_{jj}^2. \tag{5.24} \]
The pair of equations (5.23), (5.24) can be rearranged to devise a recursive procedure to compute the orthonormal basis. At stage $j$, we assume that we have already constructed $u_1, \dots, u_{j-1}$. We then compute†
\[ r_{ij} = \langle w_j ; u_i \rangle \qquad \text{for each} \qquad i = 1, \dots, j - 1. \tag{5.25} \]
We obtain the next orthonormal basis vector $u_j$ by computing
\[ r_{jj} = \sqrt{\| w_j \|^2 - r_{1j}^2 - \cdots - r_{j-1,j}^2}, \qquad u_j = \frac{w_j - r_{1j} u_1 - \cdots - r_{j-1,j} u_{j-1}}{r_{jj}}. \tag{5.26} \]
Running through the formulae (5.25), (5.26) for $j = 1, \dots, n$ leads to the same orthonormal basis $u_1, \dots, u_n$ as the previous version of the Gram–Schmidt procedure.

Example 5.16. Let us apply the revised algorithm to the vectors
\[ w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, \]
of Example 5.13. To begin, we set

\[ r_{11} = \| w_1 \| = \sqrt3, \qquad u_1 = \frac{w_1}{r_{11}} = \begin{pmatrix} \frac1{\sqrt3} \\ \frac1{\sqrt3} \\ -\frac1{\sqrt3} \end{pmatrix}. \]

† When j = 1, there is nothing to do.

The next step is to compute
\[ r_{12} = \langle w_2 ; u_1 \rangle = -\frac1{\sqrt3}, \qquad r_{22} = \sqrt{\| w_2 \|^2 - r_{12}^2} = \sqrt{\frac{14}3}, \qquad u_2 = \frac{w_2 - r_{12} u_1}{r_{22}} = \begin{pmatrix} \frac4{\sqrt{42}} \\ \frac1{\sqrt{42}} \\ \frac5{\sqrt{42}} \end{pmatrix}. \]
The final step yields
\[ r_{13} = \langle w_3 ; u_1 \rangle = -\sqrt3, \qquad r_{23} = \langle w_3 ; u_2 \rangle = \sqrt{\frac{21}2}, \qquad r_{33} = \sqrt{\| w_3 \|^2 - r_{13}^2 - r_{23}^2} = \sqrt{\frac72}, \qquad u_3 = \frac{w_3 - r_{13} u_1 - r_{23} u_2}{r_{33}} = \begin{pmatrix} \frac2{\sqrt{14}} \\ -\frac3{\sqrt{14}} \\ -\frac1{\sqrt{14}} \end{pmatrix}. \]
As advertised, the result is the same orthonormal basis vectors that we found in Example 5.13.

For hand computations, the original version (5.18) of the Gram–Schmidt process is slightly easier — even if one does ultimately want an orthonormal basis — since it avoids the square roots that are ubiquitous in the orthonormal version (5.25), (5.26). On the other hand, for numerical implementation on a computer, the orthonormal version is a bit faster, as it involves fewer arithmetic operations.

However, in practical, large scale computations, both versions of the Gram–Schmidt process suffer from a serious flaw. They are subject to numerical instabilities, and so round-off errors may seriously corrupt the computations, producing inaccurate, non-orthogonal vectors. Fortunately, there is a simple rearrangement of the calculation that obviates this difficulty and leads to the numerically robust algorithm that is most often used in practice.

The idea is to treat the vectors simultaneously rather than sequentially, making full use of the orthonormal basis vectors as they arise. More specifically, the algorithm begins as before — we take $u_1 = w_1 / \| w_1 \|$. We then subtract off the appropriate multiples of $u_1$ from all of the remaining basis vectors so as to arrange their orthogonality to $u_1$. This is accomplished by setting
\[ w_k^{(2)} = w_k - \langle w_k ; u_1 \rangle\, u_1 \qquad \text{for} \qquad k = 2, \dots, n. \]
The second orthonormal basis vector $u_2 = w_2^{(2)} / \| w_2^{(2)} \|$ is then obtained by normalizing. We next modify the remaining $w_3^{(2)}, \dots, w_n^{(2)}$ to produce vectors
\[ w_k^{(3)} = w_k^{(2)} - \langle w_k^{(2)} ; u_2 \rangle\, u_2, \qquad k = 3, \dots, n, \]
that are orthogonal to both $u_1$ and $u_2$. Then $u_3 = w_3^{(3)} / \| w_3^{(3)} \|$ is the next orthonormal basis element, and the process continues. The full algorithm starts with the initial basis vectors $w_k^{(1)} = w_k$, $k = 1, \dots, n$, and then recursively computes

\[ u_j = \frac{w_j^{(j)}}{\| w_j^{(j)} \|}, \qquad w_k^{(j+1)} = w_k^{(j)} - \langle w_k^{(j)} ; u_j \rangle\, u_j, \qquad j = 1, \dots, n, \quad k = j+1, \dots, n. \tag{5.27} \]

(In the final phase, when $j = n$, the second formula is no longer relevant.) The result is a numerically stable computation of the same orthonormal basis vectors $u_1, \dots, u_n$.

Example 5.17. Let us apply the stable Gram–Schmidt process (5.27) to the basis vectors
\[ w_1^{(1)} = w_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, \qquad w_2^{(1)} = w_2 = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}, \qquad w_3^{(1)} = w_3 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}. \]
The first orthonormal basis vector is $u_1 = \dfrac{w_1^{(1)}}{\| w_1^{(1)} \|} = \begin{pmatrix} \frac23 \\ \frac23 \\ -\frac13 \end{pmatrix}$. Next, we compute
\[ w_2^{(2)} = w_2^{(1)} - \langle w_2^{(1)} ; u_1 \rangle\, u_1 = \begin{pmatrix} -2 \\ 2 \\ 0 \end{pmatrix}, \qquad w_3^{(2)} = w_3^{(1)} - \langle w_3^{(1)} ; u_1 \rangle\, u_1 = \begin{pmatrix} -1 \\ 0 \\ -2 \end{pmatrix}. \]
The second orthonormal basis vector is $u_2 = \dfrac{w_2^{(2)}}{\| w_2^{(2)} \|} = \begin{pmatrix} -\frac1{\sqrt2} \\ \frac1{\sqrt2} \\ 0 \end{pmatrix}$. Finally,
\[ w_3^{(3)} = w_3^{(2)} - \langle w_3^{(2)} ; u_2 \rangle\, u_2 = \begin{pmatrix} -\frac12 \\ -\frac12 \\ -2 \end{pmatrix}, \qquad u_3 = \frac{w_3^{(3)}}{\| w_3^{(3)} \|} = \begin{pmatrix} -\frac{\sqrt2}6 \\ -\frac{\sqrt2}6 \\ -\frac{2\sqrt2}3 \end{pmatrix}. \]
The resulting vectors $u_1, u_2, u_3$ form the desired orthonormal basis.
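A minimal numerical sketch of the stable recursion (5.27) follows; the function name is our own, NumPy is assumed, and the input columns are assumed to be linearly independent.

```python
import numpy as np

def modified_gram_schmidt(w):
    """Numerically stable Gram-Schmidt (5.27): orthonormalize the columns of w."""
    u = np.array(w, dtype=float)
    n = u.shape[1]
    for j in range(n):
        u[:, j] /= np.linalg.norm(u[:, j])            # normalize w_j^{(j)}
        for k in range(j + 1, n):
            u[:, k] -= (u[:, k] @ u[:, j]) * u[:, j]  # orthogonalize the remaining columns
    return u

W = np.array([[2., 0., 1.], [2., 4., 2.], [-1., -1., -3.]])  # columns of Example 5.17
U = modified_gram_schmidt(W)
print(np.round(U.T @ U, 12))   # should print the 3x3 identity matrix
```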

5.3. Orthogonal Matrices.

Matrices whose columns form an orthonormal basis of Rn relative to the standard Euclidean dot product have a distinguished role. Such “orthogonal matrices” appear in a wide range of applications in geometry, physics, quantum mechanics, crystallography, partial differential equations, symmetry theory, and special functions. Rotational motions of bodies in three-dimensional space are described by orthogonal matrices, and hence they lie at the foundations of rigid body mechanics, including satellite and underwater vehicle motions, as well as three-dimensional computer graphics and animation. Furthermore, orthogonal matrices are an essential ingredient in one of the most important methods of numerical linear algebra: the QR algorithm for computing eigenvalues of matrices, to be presented in Section 10.6.

Definition 5.18. A square matrix Q is called an orthogonal matrix if it satisfies

\[ Q^T Q = I. \tag{5.28} \]

The orthogonality condition implies that one can easily invert an orthogonal matrix:
\[ Q^{-1} = Q^T. \tag{5.29} \]
In fact, the two conditions are equivalent, and hence a matrix is orthogonal if and only if its inverse is equal to its transpose. The second important characterization of orthogonal matrices relates them directly to orthonormal bases.

Proposition 5.19. A matrix $Q$ is orthogonal if and only if its columns form an orthonormal basis with respect to the Euclidean dot product on $\mathbb{R}^n$.

Proof: Let $u_1, \dots, u_n$ be the columns of $Q$. Then $u_1^T, \dots, u_n^T$ are the rows of the transposed matrix $Q^T$. The $(i, j)$th entry of the product $Q^T Q = I$ is given as the product of the $i$th row of $Q^T$ times the $j$th column of $Q$. Thus,
\[ u_i \cdot u_j = u_i^T u_j = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases} \]
which are precisely the conditions (5.5) for $u_1, \dots, u_n$ to form an orthonormal basis. Q.E.D.

Warning: Technically, we should be referring to an "orthonormal" matrix, not an "orthogonal" matrix. But the terminology is so standard throughout mathematics that we have no choice but to adopt it here. There is no commonly accepted name for a matrix whose columns form an orthogonal but not orthonormal basis.

Example 5.20. A $2 \times 2$ matrix $Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is orthogonal if and only if its columns $u_1 = \begin{pmatrix} a \\ c \end{pmatrix}$, $u_2 = \begin{pmatrix} b \\ d \end{pmatrix}$ form an orthonormal basis of $\mathbb{R}^2$. Equivalently, the requirement
\[ Q^T Q = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + c^2 & ac + bd \\ ac + bd & b^2 + d^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
implies that its entries must satisfy the algebraic equations
\[ a^2 + c^2 = 1, \qquad ac + bd = 0, \qquad b^2 + d^2 = 1. \]
The first and last equations say the points $(a, c)^T$ and $(b, d)^T$ lie on the unit circle in $\mathbb{R}^2$, and so
\[ a = \cos\theta, \qquad c = \sin\theta, \qquad b = \cos\psi, \qquad d = \sin\psi, \]
for some choice of angles $\theta, \psi$. The remaining orthogonality condition is
\[ 0 = ac + bd = \cos\theta \cos\psi + \sin\theta \sin\psi = \cos(\theta - \psi). \]
This implies that $\theta$ and $\psi$ differ by a right angle: $\psi = \theta \pm \frac\pi2$. The $\pm$ sign leads to two cases:
\[ b = -\sin\theta, \quad d = \cos\theta, \qquad \text{or} \qquad b = \sin\theta, \quad d = -\cos\theta. \]
As a result, every $2 \times 2$ orthogonal matrix has one of two possible forms
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}, \qquad \text{where} \quad 0 \le \theta < 2\pi. \tag{5.30} \]


Figure 5.2. Orthonormal Bases in $\mathbb{R}^2$.

The corresponding orthonormal bases are illustrated in Figure 5.2. Note that the former is a right-handed basis which can be obtained from the standard basis $e_1, e_2$ by a rotation through angle $\theta$, while the latter has the opposite, reflected orientation.

Example 5.21. A $3 \times 3$ orthogonal matrix $Q = (\, u_1 \ u_2 \ u_3 \,)$ is prescribed by 3 mutually perpendicular vectors of unit length in $\mathbb{R}^3$. For instance, the orthonormal basis constructed in (5.20) corresponds to the orthogonal matrix
\[ Q = \begin{pmatrix} \frac1{\sqrt3} & \frac4{\sqrt{42}} & \frac2{\sqrt{14}} \\ \frac1{\sqrt3} & \frac1{\sqrt{42}} & -\frac3{\sqrt{14}} \\ -\frac1{\sqrt3} & \frac5{\sqrt{42}} & -\frac1{\sqrt{14}} \end{pmatrix}. \]
A complete list of $3 \times 3$ orthogonal matrices can be found in Exercises  and .

Lemma 5.22. An orthogonal matrix has determinant $\det Q = \pm 1$.

Proof: Taking the determinant of (5.28) gives
\[ 1 = \det I = \det(Q^T Q) = \det Q^T \det Q = (\det Q)^2, \]
which immediately proves the lemma. Q.E.D.

An orthogonal matrix is called proper if it has determinant $+1$. Geometrically, the columns of a proper orthogonal matrix form a right-handed basis of $\mathbb{R}^n$, as defined in Exercise . An improper orthogonal matrix, with determinant $-1$, corresponds to a left-handed basis that lives in a mirror image world.

Proposition 5.23. The product of two orthogonal matrices is also orthogonal.

Proof: If $Q_1^T Q_1 = I = Q_2^T Q_2$, then $(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I$, and so $Q_1 Q_2$ is also orthogonal. Q.E.D.

This property says that the set of all orthogonal matrices forms a group†, known as the orthogonal group. The orthogonal group lies at the foundation of everyday Euclidean geometry.

† The precise mathematical definition of a group can be found in Exercise . Although they will not play a significant role in this text, groups underlie the mathematical formalization of symmetry and, as such, form one of the most fundamental concepts in advanced mathematics and its applications, particularly quantum mechanics and modern theoretical physics. Indeed, according to the mathematician Felix Klein, cf. [152], all geometry is based on group theory.

The QR Factorization

The Gram–Schmidt procedure for orthonormalizing bases of $\mathbb{R}^n$ can be reinterpreted as a matrix factorization. This is more subtle than the LU factorization that resulted from Gaussian elimination, but is of comparable significance, and is used in a broad range of applications in mathematics, physics, engineering and numerical analysis.

Let $w_1, \dots, w_n$ be a basis of $\mathbb{R}^n$, and let $u_1, \dots, u_n$ be the corresponding orthonormal basis that results from any one of the three implementations of the Gram–Schmidt process. We assemble both sets of column vectors to form nonsingular $n \times n$ matrices
\[ A = (\, w_1 \ w_2 \ \dots \ w_n \,), \qquad Q = (\, u_1 \ u_2 \ \dots \ u_n \,). \]
Since the $u_i$ form an orthonormal basis, $Q$ is an orthogonal matrix. In view of the matrix multiplication formula (2.14), the Gram–Schmidt equations (5.22) can be recast into an equivalent matrix form:
\[ A = QR, \qquad \text{where} \qquad R = \begin{pmatrix} r_{11} & r_{12} & \dots & r_{1n} \\ 0 & r_{22} & \dots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & r_{nn} \end{pmatrix} \tag{5.31} \]
is an upper triangular matrix, whose entries are the coefficients in (5.25), (5.26). Since the Gram–Schmidt process works on any basis, the only requirement on the matrix $A$ is that its columns form a basis of $\mathbb{R}^n$, and hence $A$ can be any nonsingular matrix. We have therefore established the celebrated QR factorization of nonsingular matrices.

Theorem 5.24. Any nonsingular matrix $A$ can be factorized, $A = QR$, into the product of an orthogonal matrix $Q$ and an upper triangular matrix $R$. The factorization is unique if all the diagonal entries of $R$ are assumed to be positive.

The proof of uniqueness is left to Exercise .

Example 5.25. The columns of the matrix $A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}$ are the same as the basis vectors considered in Example 5.16. The orthonormal basis (5.20) constructed using the Gram–Schmidt algorithm leads to the orthogonal and upper triangular matrices
\[ Q = \begin{pmatrix} \frac1{\sqrt3} & \frac4{\sqrt{42}} & \frac2{\sqrt{14}} \\ \frac1{\sqrt3} & \frac1{\sqrt{42}} & -\frac3{\sqrt{14}} \\ -\frac1{\sqrt3} & \frac5{\sqrt{42}} & -\frac1{\sqrt{14}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt3 & -\frac1{\sqrt3} & -\sqrt3 \\ 0 & \sqrt{\frac{14}3} & \sqrt{\frac{21}2} \\ 0 & 0 & \sqrt{\frac72} \end{pmatrix}. \]
The reader may wish to verify that, indeed, $A = QR$.


QR Factorization of a Matrix A

start
for j = 1 to n
    set $r_{jj} = \sqrt{a_{1j}^2 + \cdots + a_{nj}^2}$
    if $r_{jj} = 0$, stop; print "A has linearly dependent columns"
    else for i = 1 to n
        set $a_{ij} = a_{ij} / r_{jj}$
    next i
    for k = j + 1 to n
        set $r_{jk} = a_{1j} a_{1k} + \cdots + a_{nj} a_{nk}$
        for i = 1 to n
            set $a_{ik} = a_{ik} - a_{ij} r_{jk}$
        next i
    next k
next j
end

While any of the three implementations of the Gram–Schmidt algorithm will produce the QR factorization of a given matrix $A = (\, w_1 \ w_2 \ \dots \ w_n \,)$, the stable version, as encoded in equations (5.27), is the one to use in practical computations, as it is the least likely to fail due to numerical artifacts stemming from round-off errors. The accompanying pseudocode program reformulates the algorithm purely in terms of the matrix entries $a_{ij}$ of $A$. During the course of the algorithm, the entries of the matrix $A$ are successively overwritten; the final result is the orthogonal matrix $Q$ appearing in place of $A$. The entries $r_{ij}$ of $R$ must be stored separately.
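The pseudocode above translates almost line for line into the following sketch (our own function name, NumPy assumed), which overwrites a copy of $A$ with $Q$ and accumulates $R$ separately.

```python
import numpy as np

def qr_factorization(A):
    """QR factorization via the stable Gram-Schmidt pseudocode above."""
    Q = np.array(A, dtype=float)
    n = Q.shape[1]
    R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        if R[j, j] == 0:
            raise ValueError("A has linearly dependent columns")
        Q[:, j] /= R[j, j]                      # normalize the j-th column
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ Q[:, k]         # coefficient r_jk
            Q[:, k] -= R[j, k] * Q[:, j]        # subtract off the projection
    return Q, R

A = np.array([[1., 1., 2.], [1., 0., -2.], [-1., 2., 3.]])   # the matrix of Example 5.25
Q, R = qr_factorization(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
```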

Example 5.26. Let us factorize the matrix

\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} \]
using the numerically stable QR algorithm. As in the program, we work directly on the matrix $A$, gradually changing it into orthogonal form. In the first loop, we set $r_{11} = \sqrt5$ to be the norm of the first column vector of $A$. We then normalize the first column

by dividing by $r_{11}$; the resulting matrix is
\[ \begin{pmatrix} \frac2{\sqrt5} & 1 & 0 & 0 \\ \frac1{\sqrt5} & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \]
The next entries $r_{12} = \frac4{\sqrt5}$, $r_{13} = \frac1{\sqrt5}$, $r_{14} = 0$, are obtained by taking the dot products of the first column with the other three columns. For $j = 2, 3, 4$, we subtract $r_{1j}$ times the first column from the $j$th column; the result
\[ \begin{pmatrix} \frac2{\sqrt5} & -\frac35 & -\frac25 & 0 \\ \frac1{\sqrt5} & \frac65 & \frac45 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} \]
is a matrix whose first column is normalized to have unit length, and whose second, third and fourth columns are orthogonal to it. In the next loop, we normalize the second column by dividing by its norm $r_{22} = \sqrt{\frac{14}5}$, and so obtain the matrix
\[ \begin{pmatrix} \frac2{\sqrt5} & -\frac3{\sqrt{70}} & -\frac25 & 0 \\ \frac1{\sqrt5} & \frac6{\sqrt{70}} & \frac45 & 0 \\ 0 & \frac5{\sqrt{70}} & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. \]
We then take dot products of the second column with the remaining two columns to produce $r_{23} = \frac{16}{\sqrt{70}}$, $r_{24} = \sqrt{\frac5{14}}$. Subtracting these multiples of the second column from the third and fourth columns, we obtain
\[ \begin{pmatrix} \frac2{\sqrt5} & -\frac3{\sqrt{70}} & \frac27 & \frac3{14} \\ \frac1{\sqrt5} & \frac6{\sqrt{70}} & -\frac47 & -\frac37 \\ 0 & \frac5{\sqrt{70}} & \frac67 & \frac9{14} \\ 0 & 0 & 1 & 2 \end{pmatrix}, \]
which now has its first two columns orthonormalized, and orthogonal to the last two columns. We then normalize the third column by dividing by $r_{33} = \sqrt{\frac{15}7}$, and so obtain
\[ \begin{pmatrix} \frac2{\sqrt5} & -\frac3{\sqrt{70}} & \frac2{\sqrt{105}} & \frac3{14} \\ \frac1{\sqrt5} & \frac6{\sqrt{70}} & -\frac4{\sqrt{105}} & -\frac37 \\ 0 & \frac5{\sqrt{70}} & \frac6{\sqrt{105}} & \frac9{14} \\ 0 & 0 & \frac7{\sqrt{105}} & 2 \end{pmatrix}. \]
Finally, we subtract $r_{34} = \frac{20}{\sqrt{105}}$ times the third column from the fourth column. Dividing the resulting fourth column by its norm $r_{44} = \sqrt{\frac56}$ results in the final formulas
\[ Q = \begin{pmatrix} \frac2{\sqrt5} & -\frac3{\sqrt{70}} & \frac2{\sqrt{105}} & -\frac1{\sqrt{30}} \\ \frac1{\sqrt5} & \frac6{\sqrt{70}} & -\frac4{\sqrt{105}} & \frac2{\sqrt{30}} \\ 0 & \frac5{\sqrt{70}} & \frac6{\sqrt{105}} & -\frac3{\sqrt{30}} \\ 0 & 0 & \frac7{\sqrt{105}} & \frac4{\sqrt{30}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt5 & \frac4{\sqrt5} & \frac1{\sqrt5} & 0 \\ 0 & \sqrt{\frac{14}5} & \frac{16}{\sqrt{70}} & \sqrt{\frac5{14}} \\ 0 & 0 & \sqrt{\frac{15}7} & \frac{20}{\sqrt{105}} \\ 0 & 0 & 0 & \sqrt{\frac56} \end{pmatrix}, \]
for the $A = QR$ factorization.

The QR factorization can be used as an alternative to Gaussian elimination to solve linear systems. Indeed, the system
\[ A x = b \qquad \text{becomes} \qquad QR\, x = b, \qquad \text{and hence} \qquad R\, x = Q^T b, \tag{5.32} \]
since $Q^{-1} = Q^T$ is an orthogonal matrix. Since $R$ is upper triangular, the latter system can be solved for $x$ by back substitution. The resulting algorithm, while more expensive to compute, does offer some numerical advantages over traditional Gaussian elimination, as it is less prone to inaccuracies resulting from ill-conditioned coefficient matrices.
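As a quick numerical illustration of (5.32), using library routines rather than the hand computation that follows (our own sketch; the matrix is the one from Example 5.25):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 1., 2.], [1., 0., -2.], [-1., 2., 3.]])
b = np.array([0., -4., 5.])

Q, R = np.linalg.qr(A)               # library QR factorization
x = solve_triangular(R, Q.T @ b)     # back substitution on R x = Q^T b, as in (5.32)
print(x)                             # [-2.  0.  1.]
```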

Example 5.27. Let us apply the $A = QR$ factorization
\[ \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix} = \begin{pmatrix} \frac1{\sqrt3} & \frac4{\sqrt{42}} & \frac2{\sqrt{14}} \\ \frac1{\sqrt3} & \frac1{\sqrt{42}} & -\frac3{\sqrt{14}} \\ -\frac1{\sqrt3} & \frac5{\sqrt{42}} & -\frac1{\sqrt{14}} \end{pmatrix} \begin{pmatrix} \sqrt3 & -\frac1{\sqrt3} & -\sqrt3 \\ 0 & \sqrt{\frac{14}3} & \sqrt{\frac{21}2} \\ 0 & 0 & \sqrt{\frac72} \end{pmatrix}, \]
that we found in Example 5.25 to solve the linear system $A x = (0, -4, 5)^T$. We first compute
\[ Q^T b = \begin{pmatrix} \frac1{\sqrt3} & \frac1{\sqrt3} & -\frac1{\sqrt3} \\ \frac4{\sqrt{42}} & \frac1{\sqrt{42}} & \frac5{\sqrt{42}} \\ \frac2{\sqrt{14}} & -\frac3{\sqrt{14}} & -\frac1{\sqrt{14}} \end{pmatrix} \begin{pmatrix} 0 \\ -4 \\ 5 \end{pmatrix} = \begin{pmatrix} -3\sqrt3 \\ \sqrt{\frac{21}2} \\ \sqrt{\frac72} \end{pmatrix}. \]

We then solve the upper triangular system
\[ R\, x = \begin{pmatrix} \sqrt3 & -\frac1{\sqrt3} & -\sqrt3 \\ 0 & \sqrt{\frac{14}3} & \sqrt{\frac{21}2} \\ 0 & 0 & \sqrt{\frac72} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -3\sqrt3 \\ \sqrt{\frac{21}2} \\ \sqrt{\frac72} \end{pmatrix} \]
by back substitution, leading to the solution $x = (-2, 0, 1)^T$.

5.4. Orthogonal Polynomials.

Orthogonal and orthonormal bases play, if anything, an even more essential role in function space. Unlike the Euclidean space $\mathbb{R}^n$, most of the obvious bases of a typical (finite-dimensional) function space are not orthogonal with respect to a natural inner product. Thus, the computation of an orthonormal basis of functions is a critical step towards simplification of the analysis. The Gram–Schmidt algorithm, in any of the above formulations, can be successfully applied to construct suitably orthogonal functions. The most important examples are the classical orthogonal polynomials that arise in approximation and interpolation theory. Other orthogonal systems of functions play starring roles in Fourier analysis and its generalizations, in quantum mechanics, in the solution of partial differential equations by separation of variables, and a host of further applications in mathematics, physics, and elsewhere.

In this section, we concentrate on orthogonal polynomials. Orthogonal systems of trigonometric functions will be the focus of Chapters 12 and 13. Orthogonal systems of

special functions, including Bessel functions and spherical harmonics, used in the solution to linear partial differential equations, will be covered in Chapters 17 and 18.

The Legendre Polynomials

We shall construct an orthonormal basis for the vector space $\mathcal{P}^{(n)}$ of polynomials of degree $\le n$. For definiteness, the construction will be based on the particular $L^2$ inner product
\[ \langle p ; q \rangle = \int_{-1}^1 p(t)\, q(t)\, dt \tag{5.33} \]
on the interval $[-1, 1]$. The underlying method will work on any other bounded interval, as well as for weighted inner products, but (5.33) is of particular importance. We shall apply the Gram–Schmidt orthogonalization process to the elementary, but non-orthogonal, monomial basis $1, t, t^2, \dots, t^n$. Because
\[ \langle t^k ; t^l \rangle = \int_{-1}^1 t^{k+l}\, dt = \begin{cases} \dfrac2{k + l + 1}, & k + l \ \text{even}, \\ 0, & k + l \ \text{odd}, \end{cases} \tag{5.34} \]
odd degree monomials are orthogonal to those of even degree, but that is all. We will use $q_0(t), q_1(t), \dots, q_n(t)$ to denote the resulting orthogonal polynomials. We begin by setting

\[ q_0(t) = 1, \qquad \text{with} \qquad \| q_0 \|^2 = \int_{-1}^1 q_0(t)^2\, dt = 2. \]
According to formula (5.16), the next orthogonal basis polynomial is
\[ q_1(t) = t - \frac{\langle t ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t, \qquad \text{with} \qquad \| q_1 \|^2 = \tfrac23. \]
In general, the Gram–Schmidt formula (5.18) says we should define
\[ q_k(t) = t^k - \sum_{j=0}^{k-1} \frac{\langle t^k ; q_j \rangle}{\| q_j \|^2}\, q_j(t) \qquad \text{for} \qquad k = 1, 2, \dots. \]
We can thus recursively compute the next few orthogonal polynomials:
\[ \begin{aligned}
 q_2(t) &= t^2 - \tfrac13, & \| q_2 \|^2 &= \tfrac8{45}, \\
 q_3(t) &= t^3 - \tfrac35\, t, & \| q_3 \|^2 &= \tfrac8{175}, \\
 q_4(t) &= t^4 - \tfrac67\, t^2 + \tfrac3{35}, & \| q_4 \|^2 &= \tfrac{128}{11025},
\end{aligned} \tag{5.35} \]
and so on. The reader can verify that they satisfy the orthogonality conditions
\[ \langle q_i ; q_j \rangle = \int_{-1}^1 q_i(t)\, q_j(t)\, dt = 0, \qquad i \ne j. \]
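These recursions are easy to reproduce symbolically. The following sketch (our own, using SymPy) runs the Gram–Schmidt recursion above on the monomials and reproduces (5.35):

```python
import sympy as sp

t = sp.symbols('t')
inner = lambda p, q: sp.integrate(p * q, (t, -1, 1))   # the L^2 inner product (5.33)

q = []
for k in range(5):
    qk = t**k - sum(inner(t**k, qj) / inner(qj, qj) * qj for qj in q)
    q.append(sp.expand(qk))

print(q)                            # [1, t, t**2 - 1/3, t**3 - 3*t/5, t**4 - 6*t**2/7 + 3/35]
print([inner(p, p) for p in q])     # squared norms: 2, 2/3, 8/45, 8/175, 128/11025
```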

The resulting polynomials $q_0, q_1, q_2, \dots$ are known as the monic† Legendre polynomials, in honor of the 18th century French mathematician Adrien–Marie Legendre, who first used them for studying Newtonian gravitation. Since the first $n$, namely $q_0, \dots, q_{n-1}$, span the subspace $\mathcal{P}^{(n-1)}$ of polynomials of degree $\le n - 1$, the next one, $q_n$, can be characterized as the unique monic polynomial that is orthogonal to every polynomial of degree $\le n - 1$:
\[ \langle t^k ; q_n \rangle = 0, \qquad k = 0, \dots, n - 1. \tag{5.36} \]
Since the monic Legendre polynomials form a basis for the space of polynomials, one can uniquely rewrite any polynomial of degree $n$ as a linear combination:
\[ p(t) = c_0\, q_0(t) + c_1\, q_1(t) + \cdots + c_n\, q_n(t). \tag{5.37} \]
In view of the general orthogonality formula (5.6), the coefficients are simply given by inner products
\[ c_k = \frac{\langle p ; q_k \rangle}{\| q_k \|^2} = \frac1{\| q_k \|^2} \int_{-1}^1 p(t)\, q_k(t)\, dt, \qquad k = 0, \dots, n. \tag{5.38} \]
For example,
\[ t^4 = q_4(t) + \tfrac67\, q_2(t) + \tfrac15\, q_0(t) = \left( t^4 - \tfrac67\, t^2 + \tfrac3{35} \right) + \tfrac67 \left( t^2 - \tfrac13 \right) + \tfrac15, \]
where the coefficients can either be obtained directly, or via (5.38); for example,
\[ c_4 = \frac{11025}{128} \int_{-1}^1 t^4\, q_4(t)\, dt = 1, \qquad c_3 = \frac{175}8 \int_{-1}^1 t^4\, q_3(t)\, dt = 0, \]
and so on.

The classical Legendre polynomials are certain scalar multiples, namely
\[ P_k(t) = \frac{(2k)!}{2^k\, (k!)^2}\, q_k(t), \qquad k = 0, 1, 2, \dots, \tag{5.39} \]
and so also define a system of orthogonal polynomials. The multiple is fixed by the requirement that
\[ P_k(1) = 1, \tag{5.40} \]
which is not so important here, but does play a role in other applications. The first few classical Legendre polynomials are
\[ \begin{aligned}
 P_0(t) &= 1, & \| P_0 \|^2 &= 2, \\
 P_1(t) &= t, & \| P_1 \|^2 &= \tfrac23, \\
 P_2(t) &= \tfrac32\, t^2 - \tfrac12, & \| P_2 \|^2 &= \tfrac25, \\
 P_3(t) &= \tfrac52\, t^3 - \tfrac32\, t, & \| P_3 \|^2 &= \tfrac27, \\
 P_4(t) &= \tfrac{35}8\, t^4 - \tfrac{15}4\, t^2 + \tfrac38, & \| P_4 \|^2 &= \tfrac29, \\
 P_5(t) &= \tfrac{63}8\, t^5 - \tfrac{35}4\, t^3 + \tfrac{15}8\, t, & \| P_5 \|^2 &= \tfrac2{11}, \\
 P_6(t) &= \tfrac{231}{16}\, t^6 - \tfrac{315}{16}\, t^4 + \tfrac{105}{16}\, t^2 - \tfrac5{16}, & \| P_6 \|^2 &= \tfrac2{13},
\end{aligned} \]
and they are graphed in Figure 5.3.

† A polynomial is called monic if its leading coefficient is equal to 1.

Figure 5.3. The Legendre Polynomials $P_0(t), \dots, P_5(t)$.

There is, in fact, an explicit formula for the Legendre polynomials, due to the early nineteenth century Portuguese mathematician Olinde Rodrigues.

Theorem 5.28. The Rodrigues formula for the classical Legendre polynomials is
\[ P_k(t) = \frac1{2^k\, k!} \frac{d^k}{dt^k} (t^2 - 1)^k, \qquad \| P_k \| = \sqrt{\frac2{2k + 1}}, \qquad k = 0, 1, 2, \dots. \tag{5.41} \]
Thus, for example,
\[ P_4(t) = \frac1{16 \cdot 4!} \frac{d^4}{dt^4} (t^2 - 1)^4 = \frac1{384} \frac{d^4}{dt^4} (t^2 - 1)^4 = \tfrac{35}8\, t^4 - \tfrac{15}4\, t^2 + \tfrac38. \]
Proof of Theorem 5.28: Let
\[ R_{j,k}(t) = \frac{d^j}{dt^j} (t^2 - 1)^k, \tag{5.42} \]
which is evidently a polynomial of degree $2k - j$. In particular, the Rodrigues formula (5.41) claims that $P_k(t)$ is a multiple of $R_{k,k}(t)$. Note that
\[ \frac{d}{dt} R_{j,k}(t) = R_{j+1,k}(t). \tag{5.43} \]
Moreover,
\[ R_{j,k}(1) = 0 = R_{j,k}(-1) \qquad \text{whenever} \qquad j < k, \tag{5.44} \]
since, by the product rule, differentiating $(t^2 - 1)^k$ a total of $j < k$ times still leaves at least one factor of $t^2 - 1$ in each summand, which therefore vanishes at $t = \pm 1$.

Lemma 5.29. If $j \le k$, then the polynomial $R_{j,k}(t)$ is orthogonal to all polynomials of degree $\le j - 1$.

Proof: In other words,
\[ \langle t^i ; R_{j,k} \rangle = \int_{-1}^1 t^i\, R_{j,k}(t)\, dt = 0, \qquad \text{for all} \qquad 0 \le i < j \le k. \tag{5.45} \]
Since $j > 0$, we use (5.43) to write $R_{j,k}(t) = R_{j-1,k}'(t)$. Integrating by parts,
\[ \langle t^i ; R_{j,k} \rangle = \int_{-1}^1 t^i\, R_{j-1,k}'(t)\, dt = \Bigl. t^i\, R_{j-1,k}(t) \Bigr|_{t=-1}^1 - i \int_{-1}^1 t^{i-1}\, R_{j-1,k}(t)\, dt = -\,i\, \langle t^{i-1} ; R_{j-1,k} \rangle, \]
where the boundary terms vanish owing to (5.44). We then repeat the process, and eventually
\[ \langle t^i ; R_{j,k} \rangle = -\,i\, \langle t^{i-1} ; R_{j-1,k} \rangle = i(i-1)\, \langle t^{i-2} ; R_{j-2,k} \rangle = \cdots = (-1)^i\, i(i-1) \cdots 3 \cdot 2 \cdot 1\, \langle 1 ; R_{j-i,k} \rangle = (-1)^i\, i! \int_{-1}^1 R_{j-i,k}(t)\, dt = (-1)^i\, i!\, \Bigl. R_{j-i-1,k}(t) \Bigr|_{t=-1}^1 = 0, \]
using (5.44) and the restriction that $j > i$. Q.E.D.

In particular, $R_{k,k}(t)$ is a polynomial of degree $k$ which is orthogonal to every polynomial of degree $\le k - 1$. By our earlier remarks, this implies that it is a constant multiple,
\[ R_{k,k}(t) = c_k\, P_k(t), \]
of the $k$th Legendre polynomial. To determine $c_k$, we need only compare the leading terms:
\[ R_{k,k}(t) = \frac{d^k}{dt^k} (t^2 - 1)^k = \frac{d^k}{dt^k} \bigl( t^{2k} + \cdots \bigr) = \frac{(2k)!}{k!}\, t^k + \cdots, \qquad \text{while} \qquad P_k(t) = \frac{(2k)!}{2^k\, (k!)^2}\, t^k + \cdots. \]
We conclude that $c_k = 2^k\, k!$, which proves (5.41). The proof of the formula for $\| P_k \|$ can be found in Exercise . Q.E.D.

The Legendre polynomials play an important role in many aspects of applied mathematics, including numerical analysis, least squares approximation of functions, and solution of partial differential equations, and we shall encounter them again later in the book.
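If desired, the Rodrigues formula is easy to check symbolically; the short SymPy sketch below (our own illustration, not part of the text) compares (5.41) with the tabulated polynomials and norms.

```python
import sympy as sp

t = sp.symbols('t')

def legendre_rodrigues(k):
    """Classical Legendre polynomial via the Rodrigues formula (5.41)."""
    return sp.expand(sp.diff((t**2 - 1)**k, t, k) / (2**k * sp.factorial(k)))

print(legendre_rodrigues(4))                                  # 35*t**4/8 - 15*t**2/4 + 3/8
print(sp.integrate(legendre_rodrigues(4)**2, (t, -1, 1)))     # 2/9, in agreement with ||P_4||^2
```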

Other Systems of Orthogonal Polynomials

The standard Legendre polynomials form an orthogonal system with respect to the $L^2$ inner product on the interval $[-1, 1]$. Dealing with any other interval, or, more generally, a weighted inner product between functions on an interval, leads to a different, suitably adapted collection of orthogonal polynomials. In all cases, applying the Gram–Schmidt process to the standard monomials $1, t, t^2, t^3, \dots$ will produce the desired orthogonal system.

Example 5.30. In this example, we construct orthogonal polynomials for the weighted inner product
\[ \langle f ; g \rangle = \int_0^\infty f(t)\, g(t)\, e^{-t}\, dt \tag{5.46} \]
on the interval $[0, \infty)$. A straightforward integration by parts proves that
\[ \int_0^\infty t^k\, e^{-t}\, dt = k!, \qquad \text{and hence} \qquad \langle t^i ; t^j \rangle = (i + j)!, \qquad \| t^i \|^2 = (2i)!. \tag{5.47} \]
We apply the Gram–Schmidt process to construct a system of orthogonal polynomials for this inner product. The first few are
\[ \begin{aligned}
 q_0(t) &= 1, & \| q_0 \|^2 &= 1, \\
 q_1(t) &= t - \frac{\langle t ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) = t - 1, & \| q_1 \|^2 &= 1, \\
 q_2(t) &= t^2 - \frac{\langle t^2 ; q_0 \rangle}{\| q_0 \|^2}\, q_0(t) - \frac{\langle t^2 ; q_1 \rangle}{\| q_1 \|^2}\, q_1(t) = t^2 - 4t + 2, & \| q_2 \|^2 &= 4, \\
 q_3(t) &= t^3 - 9t^2 + 18t - 6, & \| q_3 \|^2 &= 36.
\end{aligned} \tag{5.48} \]
The resulting orthogonal polynomials are known as the (monic) Laguerre polynomials, named after the nineteenth century French mathematician Edmond Laguerre.
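The same Gram–Schmidt loop used above for the Legendre case works here as well, once the inner product is replaced by the weighted integral (5.46); a small SymPy sketch of our own:

```python
import sympy as sp

t = sp.symbols('t')
inner = lambda p, q: sp.integrate(p * q * sp.exp(-t), (t, 0, sp.oo))   # inner product (5.46)

q = []
for k in range(4):
    qk = t**k - sum(inner(t**k, qj) / inner(qj, qj) * qj for qj in q)
    q.append(sp.expand(qk))

print(q)   # [1, t - 1, t**2 - 4*t + 2, t**3 - 9*t**2 + 18*t - 6], as in (5.48)
```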

In some cases, a change of variables may be used to relate systems of orthogonal polynomials and thereby circumvent the Gram–Schmidt computation. Suppose, for instance, that our goal is to construct an orthogonal system of polynomials for the $L^2$ inner product $\langle\!\langle f ; g \rangle\!\rangle = \int_a^b f(t)\, g(t)\, dt$ on the interval $[a, b]$. The key remark is that we can map the interval $[-1, 1]$ to $[a, b]$ by a simple linear change of variables of the form $s = \alpha + \beta t$. Specifically,
\[ s = \frac{2t - b - a}{b - a} \qquad \text{will change} \qquad a \le t \le b \qquad \text{to} \qquad -1 \le s \le 1. \tag{5.49} \]
The map changes functions $F(s), G(s)$, defined for $-1 \le s \le 1$, into the functions
\[ f(t) = F\!\left( \frac{2t - b - a}{b - a} \right), \qquad g(t) = G\!\left( \frac{2t - b - a}{b - a} \right), \tag{5.50} \]
defined for $a \le t \le b$. Moreover, interpreting (5.49) as a change of variables for the integrals, we have $ds = \frac2{b - a}\, dt$, and so the inner products are related by
\[ \langle\!\langle f ; g \rangle\!\rangle = \int_a^b f(t)\, g(t)\, dt = \int_a^b F\!\left( \frac{2t - b - a}{b - a} \right) G\!\left( \frac{2t - b - a}{b - a} \right) dt = \int_{-1}^1 F(s)\, G(s)\, \frac{b - a}2\, ds = \frac{b - a}2\, \langle F ; G \rangle, \tag{5.51} \]
where the final $L^2$ inner product is over the interval $[-1, 1]$. In particular, the change of variables maintains orthogonality, while rescaling the norms:
\[ \langle\!\langle f ; g \rangle\!\rangle = 0 \quad \text{if and only if} \quad \langle F ; G \rangle = 0, \qquad \text{while} \qquad \| f \| = \sqrt{\frac{b - a}2}\; \| F \|. \tag{5.52} \]
Moreover, if $F(s)$ is a polynomial of degree $n$ in $s$, then $f(t)$ is a polynomial of degree $n$ in $t$, and vice versa.

Let us apply these observations to the Legendre polynomials:

Proposition 5.31. The transformed Legendre polynomials

\[ \widetilde P_k(t) = P_k\!\left( \frac{2t - b - a}{b - a} \right), \qquad k = 0, 1, 2, \dots, \qquad \| \widetilde P_k \| = \sqrt{\frac{b - a}{2k + 1}}, \tag{5.53} \]
form an orthogonal system of polynomials with respect to the $L^2$ inner product on the interval $[a, b]$.

Example 5.32. Consider the $L^2$ inner product $\langle\!\langle f ; g \rangle\!\rangle = \int_0^1 f(t)\, g(t)\, dt$. The map $s = 2t - 1$ will change $0 \le t \le 1$ to $-1 \le s \le 1$. According to Proposition 5.31, this change of variables will convert the Legendre polynomials $P_k(s)$ into an orthogonal system of polynomials on $[0, 1]$, namely
\[ \widetilde P_k(t) = P_k(2t - 1), \qquad \text{with corresponding } L^2 \text{ norms} \qquad \| \widetilde P_k \| = \frac1{\sqrt{2k + 1}}. \]
The first few are
\[ \begin{aligned}
 \widetilde P_0(t) &= 1, & \widetilde P_3(t) &= 20t^3 - 30t^2 + 12t - 1, \\
 \widetilde P_1(t) &= 2t - 1, & \widetilde P_4(t) &= 70t^4 - 140t^3 + 90t^2 - 20t + 1, \\
 \widetilde P_2(t) &= 6t^2 - 6t + 1, & \widetilde P_5(t) &= 252t^5 - 630t^4 + 560t^3 - 210t^2 + 30t - 1.
\end{aligned} \tag{5.54} \]
One can, as an alternative, derive these formulae through a direct application of the Gram–Schmidt process.
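A quick symbolic check of (5.54) (our own sketch), substituting $s = 2t - 1$ into the Rodrigues formula (5.41):

```python
import sympy as sp

t, s = sp.symbols('t s')
P = lambda k: sp.diff((s**2 - 1)**k, s, k) / (2**k * sp.factorial(k))   # Rodrigues formula (5.41)
shifted = [sp.expand(P(k).subs(s, 2*t - 1)) for k in range(6)]
print(shifted[5])   # 252*t**5 - 630*t**4 + 560*t**3 - 210*t**2 + 30*t - 1
```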

5.5. Orthogonal Projections and Least Squares.

In Chapter 4, we introduced, solved, and learned the significance of the problem of finding the point on a prescribed subspace that lies closest to a given point. In this section, we shall discover an important geometrical interpretation of our solution: the closest point is the orthogonal projection of the point onto the subspace. Furthermore, if we adopt an orthogonal, or, even better, orthonormal basis for the subspace, then the closest point can be constructed through a very elegant, explicit formula. In this manner, orthogonality provides a very efficient shortcut that completely bypasses tedious solving of the normal equations. The resulting orthogonal projection formulae have important practical consequences for the solution of a wide range of least squares minimization problems.


Figure 5.4. The Orthogonal Projection of a Vector onto a Subspace.

Orthogonal Projection

We begin by characterizing the orthogonal projection of a vector onto a subspace. Throughout this section, we will consider a prescribed finite-dimensional subspace $W \subset V$ of a real inner product space. While the subspace $W$ is necessarily finite-dimensional, the inner product space $V$ may be infinite-dimensional. Initially, though, you may wish to think of $W$ as a subspace of Euclidean space $V = \mathbb{R}^m$ equipped with the ordinary dot product, which is the easiest case to visualize as it coincides with our geometric intuition.

Definition 5.33. A vector $z \in V$ is said to be orthogonal to the subspace $W \subset V$ if it is orthogonal to every vector in $W$, so $\langle z ; w \rangle = 0$ for all $w \in W$.

Given a basis $w_1, \dots, w_n$ for $W$, we note that $z$ is orthogonal to $W$ if and only if it is orthogonal to every basis vector: $\langle z ; w_i \rangle = 0$ for $i = 1, \dots, n$. Indeed, any other vector in $W$ has the form $w = c_1 w_1 + \cdots + c_n w_n$ and hence, by linearity, $\langle z ; w \rangle = c_1 \langle z ; w_1 \rangle + \cdots + c_n \langle z ; w_n \rangle = 0$, as required.

Definition 5.34. The orthogonal projection of $v$ onto the subspace $W$ is the element $w \in W$ that makes the difference $z = v - w$ orthogonal to $W$.

The geometric configuration underlying orthogonal projection is illustrated in Figure 5.4. As we shall see, the orthogonal projection is unique. The explicit construction is greatly simplified by taking an orthonormal basis of the subspace, which, if necessary, can be arranged by applying the Gram–Schmidt process to a known basis. (The direct construction of the orthogonal projection in terms of a general basis appears in Exercise .)

Theorem 5.35. Let $u_1, \dots, u_n$ be an orthonormal basis for the subspace $W \subset V$. Then the orthogonal projection of a vector $v \in V$ onto $W$ is
\[ w = c_1 u_1 + \cdots + c_n u_n, \qquad \text{where} \qquad c_i = \langle v ; u_i \rangle, \qquad i = 1, \dots, n. \tag{5.55} \]

Proof: First, since $u_1, \dots, u_n$ form a basis of the subspace, the orthogonal projection element must be some linear combination $w = c_1 u_1 + \cdots + c_n u_n$. Definition 5.34 requires that the difference $z = v - w$ be orthogonal to $W$, and, to this end, it suffices to check orthogonality to the basis vectors. By our orthonormality assumption,
\[ 0 = \langle z ; u_i \rangle = \langle v ; u_i \rangle - \langle w ; u_i \rangle = \langle v ; u_i \rangle - \langle c_1 u_1 + \cdots + c_n u_n ; u_i \rangle = \langle v ; u_i \rangle - c_1 \langle u_1 ; u_i \rangle - \cdots - c_n \langle u_n ; u_i \rangle = \langle v ; u_i \rangle - c_i. \]
We deduce that the coefficients $c_i = \langle v ; u_i \rangle$ of the orthogonal projection $w$ are uniquely prescribed by the orthogonality requirement, which also proves its uniqueness. Q.E.D.

More generally, if we employ an orthogonal basis $v_1, \dots, v_n$ for the subspace $W$, then the same argument demonstrates that the orthogonal projection of $v$ onto $W$ is given by
\[ w = a_1 v_1 + \cdots + a_n v_n, \qquad \text{where} \qquad a_i = \frac{\langle v ; v_i \rangle}{\| v_i \|^2}, \qquad i = 1, \dots, n. \tag{5.56} \]
Of course, we could equally well replace the orthogonal basis by the orthonormal basis obtained by dividing each vector by its length: $u_i = v_i / \| v_i \|$. The reader should be able to prove that the two formulae (5.55), (5.56) for the orthogonal projection yield the same vector $w$.

Example 5.36. Consider the plane $W \subset \mathbb{R}^3$ spanned by the orthogonal vectors
\[ v_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}. \]
According to formula (5.56), the orthogonal projection of $v = (1, 0, 0)^T$ onto $W$ is
\[ w = \frac{\langle v ; v_1 \rangle}{\| v_1 \|^2}\, v_1 + \frac{\langle v ; v_2 \rangle}{\| v_2 \|^2}\, v_2 = \frac16 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + \frac13 \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac12 \\ 0 \\ \frac12 \end{pmatrix}. \]
Alternatively, we can replace $v_1, v_2$ by the orthonormal basis
\[ u_1 = \frac{v_1}{\| v_1 \|} = \begin{pmatrix} \frac1{\sqrt6} \\ \frac2{\sqrt6} \\ \frac1{\sqrt6} \end{pmatrix}, \qquad u_2 = \frac{v_2}{\| v_2 \|} = \begin{pmatrix} \frac1{\sqrt3} \\ -\frac1{\sqrt3} \\ \frac1{\sqrt3} \end{pmatrix}. \]
Then, using the orthonormal version (5.55),
\[ w = \langle v ; u_1 \rangle\, u_1 + \langle v ; u_2 \rangle\, u_2 = \frac1{\sqrt6} \begin{pmatrix} \frac1{\sqrt6} \\ \frac2{\sqrt6} \\ \frac1{\sqrt6} \end{pmatrix} + \frac1{\sqrt3} \begin{pmatrix} \frac1{\sqrt3} \\ -\frac1{\sqrt3} \\ \frac1{\sqrt3} \end{pmatrix} = \begin{pmatrix} \frac12 \\ 0 \\ \frac12 \end{pmatrix}. \]
The answer is, of course, the same. As the reader may notice, while the theoretical formula is simpler when written in an orthonormal basis, for hand computations the orthogonal basis version avoids dealing with square roots. (Of course, when performing the computation on a computer, this is not a significant issue.)
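A two-line numerical confirmation of this projection (our own sketch, assuming NumPy), using formula (5.56):

```python
import numpy as np

v = np.array([1., 0., 0.])
v1, v2 = np.array([1., 2., 1.]), np.array([1., -1., 1.])        # orthogonal basis of W
w = (v @ v1) / (v1 @ v1) * v1 + (v @ v2) / (v2 @ v2) * v2       # formula (5.56)
print(w)                             # [0.5 0.  0.5]
print(v1 @ (v - w), v2 @ (v - w))    # both zero: v - w is orthogonal to W
```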

An intriguing observation is that the coefficients in the orthogonal projection formulae (5.55), (5.56) coincide with the formulae (5.3), (5.6) for writing a vector in terms of an orthonormal or orthogonal basis. Indeed, if $v$ were an element of $W$, then it would coincide with its orthogonal projection, $w = v$. (Why?) As a result, the orthogonal projection formulae include the orthogonal basis formulae as a special case.

It is also worth noting that the same formulae occur in the Gram–Schmidt algorithm, (5.17). This observation leads to a useful geometric interpretation for the Gram–Schmidt construction. For each $k = 1, \dots, n$, let
\[ V_k = \operatorname{span}\{ w_1, \dots, w_k \} = \operatorname{span}\{ v_1, \dots, v_k \} = \operatorname{span}\{ u_1, \dots, u_k \} \tag{5.57} \]
denote the $k$-dimensional subspace spanned by the first $k$ basis elements. The basic Gram–Schmidt formula (5.18) can be rewritten in the form $v_k = w_k - y_k$, where $y_k$ is the orthogonal projection of $w_k$ onto the subspace $V_{k-1}$. The resulting vector $v_k$ is, by construction, orthogonal to the subspace, and hence orthogonal to all of the previous basis elements, which serves to rejustify the Gram–Schmidt procedure.

Orthogonal Least Squares

Now we make an important connection: The orthogonal projection of a vector onto a subspace is also the least squares vector — the closest point in the subspace!

Theorem 5.37. Let $W \subset V$ be a finite-dimensional subspace of an inner product space. Given a vector $v \in V$, the closest point or least squares minimizer $w \in W$ is the same as the orthogonal projection of $v$ onto $W$.

Proof: Let $w \in W$ be the orthogonal projection of $v$ onto the subspace, which requires that the difference $z = v - w$ be orthogonal to $W$. Suppose $\widetilde w \in W$ is any other vector in the subspace. Then,
\[ \| v - \widetilde w \|^2 = \| w + z - \widetilde w \|^2 = \| w - \widetilde w \|^2 + 2\, \langle w - \widetilde w ; z \rangle + \| z \|^2 = \| w - \widetilde w \|^2 + \| z \|^2. \]
The inner product term $\langle w - \widetilde w ; z \rangle = 0$ vanishes because $z$ is orthogonal to every vector in $W$, including $w - \widetilde w$. Since $z = v - w$ is uniquely prescribed by the vector $v$, the second term $\| z \|^2$ does not change with the choice of the point $\widetilde w \in W$. Therefore, $\| v - \widetilde w \|^2$ will be minimized if and only if $\| w - \widetilde w \|^2$ is minimized. Since $\widetilde w \in W$ is allowed to be any element of the subspace $W$, the minimal value $\| w - \widetilde w \|^2 = 0$ occurs when $\widetilde w = w$. Thus, the closest point $\widetilde w$ coincides with the orthogonal projection $w$. Q.E.D.

In particular, if we are supplied with an orthonormal or orthogonal basis of our subspace, then we can easily compute the closest least squares point $w \in W$ to $v$ using our orthogonal projection formulae (5.55) or (5.56). In this way, orthogonal bases have a very dramatic simplifying effect on the least squares approximation formulae. They completely avoid the construction of and solution to the much more complicated normal equations.

Example 5.38. Consider the least squares problem of finding the closest point $w$ to the vector $v = (1, 2, 2, 1)^T$ in the three-dimensional subspace spanned by the orthogonal†

4 † We use the ordinary Euclidean norm on R throughout this example.

2/25/04 178 c 2004 Peter J. Olver ° T T T vectors v1 = ( 1, 1, 2, 0 ) , v2 = ( 0, 2, 1, 2 ) , v3 = ( 1, 1, 0, 1 ) . Since the spanning vectors are orthogonal− (but not orthonormal),− we can use the orthogonal projection for- mula (5.56) to find w = a1 v1 + a2 v2 + a3 v3: v ; v 3 1 v ; v 4 v ; v 4 a = h 1 i = = , a = h 2 i = , a = h 3 i = . 1 v 2 6 2 2 v 2 9 3 v 2 3 k 1 k k 2 k k 3 k 1 4 4 11 31 13 4 T Thus, w = 2 v1 + 9 v2 + 3 v3 = 6 , 18 , 9 , 9 is the closest point to v in the subspace. Even when we only know a¡non-orthogonal¢ basis for the subspace, it may still be a good strategy to first utilize Gram–Schmidt in order to replace it by an orthogonal or even orthonormal basis, and then apply the orthogonal projection formulae (5.55), (5.56) to calculate the least squares point. Not only does this simplify the final computation, it will often avoid the numerical inaccuracies due to ill-conditioning that can afflict the direct solution to the normal equations (4.28). The following example illustrates this alternative procedure. Example 5.39. Let us return to the problem, solved in Example 4.6, of finding the closest point on the plane V spanned by w = ( 1, 2, 1 )T , w = ( 2, 3, 1 )T to the point 1 − 2 − − b = ( 1, 0, 0 )T . We proceed now by first using the Gram–Schmidt process to compute an orthogonal basis 5 1 2 w2 v1 v1 = w1 = 2 , v2 = w2 · w1 = 2 ,   − v 2  −  1 k 1 k 3 −    − 2  for our subspace. As a result, we can use the orthogonal projection form ula (5.56) to produce the closest point 2 b v b v 3 v? = · 1 v + · 2 v = 1 , v 2 1 v 2 2  − 15  k 1 k k 2 k 7  − 15  reconfirming our earlier result. By this device, we have managed to completely circumvent the tedious solving of linear equations. Let us revisit the problem, described in Section 4.4, of approximating experimental data by a least squares minimization procedure. The required calculations are significantly simplified by the introduction of an orthogonal basis of the least squares subspace. Given sample points t1, . . . , tm, let

k k k T tk = t1 , t2 , . . . , tm , k = 0, 1, 2, . . . , be the vector obtained by sampling¡ the monomial¢ tk. More generally, sampling a polyno- mial y = p(t) = α + α t + + α tn (5.58) 0 1 · · · n results in the self-same linear combination

p = ( p(t ), . . . , p(t ) )T = α t + α t + + α t (5.59) 1 n 0 0 1 1 · · · n n

2/25/04 179 c 2004 Peter J. Olver ° of monomial sample vectors. We conclude that the sampled polynomial vectors form a m subspace W = span t0, . . . , tn R spanned by the monomial sample vectors. T ⊂ Let y = ( y1, y2,©. . . , ym ) indicateª data that has been measured at the sample points. The polynomial least squares approximation is, by definition, the polynomial y = p(t) whose sample vector p W is the closest point or, equivalently, the orthogonal projection ∈ of the data vector y onto the subspace W . The sample vectors t0, . . . , tn are not orthogonal, and so a direct approach requires solving the normal equations (4.35) for the least squares coefficients α0, . . . , αn. An alternative is to first use the Gram–Schmidt procedure to construct an orthogonal basis for the subspace W , from which the least squares coefficients are then found by simply taking inner products. Let us adopt the rescaled version m 1 v ; w = v w = v w (5.60) h i m i i i = 1 X m of the standard dot product† on R . If v, w represent the sample vectors corresponding to the functions v(t), w(t), then their inner product v ; w is equal to the average value of the product function v(t) w(t) on the m sample poinh ts. Ini particular, the inner product between our “monomial” basis vectors corresponding to sampling tk and tl is m m 1 1 t ; t = tk tl = tk+l = tk+l, (5.61) h k l i m i i m i i = 1 i = 1 X X which is the averaged sample value of the monomial tk+l. To keep the formulae reasonably simple, let us further assume‡ that the sample points are evenly spaced and symmetric about 0. The second requirement means that if ti is a sample point, so is ti. An example would be the seven sample points 3, 2, 1, 0, 1, 2, 3. As a consequence of−these two assumptions, the averaged sample values− of−the−odd powers 2i+1 of t vanish: t = 0. Hence, by (5.61), the sample vectors tk and tl are orthogonal whenever k + l is odd.

Applying the Gram–Schmidt algorithm to t0, t1, t2, . . . produces the orthogonal basis T vectors q0, q1, q2, . . . . Each qk = ( qk(t1), . . . , qk(tm) ) can be interpreted as the sample vector for a certain interpolating polynomial qk(t) of degree k. The first few polynomials along with their corresponding orthogonal sample vectors follow: q (t) = 1, q = t , q 2 = 1, 0 0 0 k 0k 2 2 q1(t) = t, q1 = t1, q1 = t , k k 2 q (t) = t2 t2 , q = t t2 t , q 2 = t4 t2 , (5.62) 2 − 2 2 − 0 k 2k − 2 t4 t4 ¡ t4 ¢ q (t) = t3 t , q = t t , q 2 = t6 . 3 2 3 3 2 1 3 2 − t − t k k − ¡ t ¢

† For weighted least squares, we would use an appropriately weighted inner product.

‡ The method works without these particular assumptions, but the formulas become more unwieldy. See Exercise for details.

2/25/04 180 c 2004 Peter J. Olver ° 3 3 3

2 2 2

1 1 1

-3 -2 -1 1 2 3 -3 -2 -1 1 2 3 -3 -2 -1 1 2 3

-1 -1 -1

Linear Quadratic Cubic Figure 5.5. Least Squares Data Approximations.

With these in hand, the least squares approximating polynomial of degree n to the given data vector y is given by a linear combination

p(t) = a q (t) + a q (t) + a q (t) + + a q (t). (5.63) 0 0 1 1 2 2 · · · n n The required coefficients are obtained directly through the orthogonality formulae (5.56), and so q ; y q y a = h k i = k . (5.64) k 2 2 qk q k k k Thus, once we have arranged the orthogonal basis, we no longer need to solve a linear system to construct the least squares approximation. An additional advantage of the orthogonal basis is that, unlike the direct method, the formulae (5.64) for the least squares coefficients do not depend on the degree of the approx- imating polynomial. As a result, one can readily increase the degree, and, presumably, the accuracy, of the approximant without having to recompute any of the lower degree terms. For instance, if a quadratic polynomial p2(t) = a0 + a1 q1(t) + a2 q2(t) is insufficiently ac- curate, the cubic least squares approximant p3(t) = p2(t) + a3 q3(t) can be constructed without having to recompute the quadratic coefficients a0, a1, a2. This doesn’t work when using non-orthogonal monomials, whose coefficients will be affected by an increase in the degree of the approximating polynomial. .

Example 5.40. Consider the following tabulated sample values:

t 3 2 1 0 1 2 3 i − − − y 1.4 1.3 .6 .1 .9 1.8 2.9 i − − − To compute polynomial least squares fits of degrees 1, 2 and 3, we begin by computing the polynomials (5.62), which for the given sample points ti are

q (t) = 1, q (t) = t, q (t) = t2 4, q (t) = t3 7t , 0 1 2 − 3 − q 2 = 1, q 2 = 4, q 2 = 12, q 2 = 216 . k 0k k 1k k 2k k 3k 7

2/25/04 181 c 2004 Peter J. Olver ° Thus, to four decimal places, the coefficients for the least squares approximation (5.63) are

a = q ; y = 0.3429, a = 1 q ; y = 0.7357, 0 h 0 i 1 4 h 1 i a = 1 q ; y = 0.0738, a = 7 q ; y = 0.0083. 2 12 h 2 i 3 216 h 3 i − To obtain the best linear approximation, we use

p1(t) = a0 q0(t) + a1 q1(t) = 0.3429 + 0.7357 t, with a least squares error of 0.7081. Similarly, the quadratic and cubic least squares approximations are

p (t) = 0.3429 + 0.7357 t + 0.0738 (t2 4), 2 − p (t) = 0.3429 + 0.7357 t + 0.0738 (t2 4) 0.0083 (t3 7t), 3 − − − with respective least squares errors 0.2093 and 0.1697 at the sample points. Note partic- ularly that the lower order coefficients do not change as we increase the degree. A plot of the three approximations appears in Figure 5.5. The cubic term does not significantly in- crease the accuracy of the approximation, and so this data probably comes from sampling a quadratic function.

Orthogonal Polynomials and Least Squares

In a similar fashion, the orthogonality of Legendre polynomials and their relatives serves to simplify the construction of least squares approximants in function space. Sup- pose, for instance, that our goal is to approximate the exponential function et by a poly- nomial on the interval 1 t 1, where the least squares error is measured using the standard L2 norm. We −will≤write≤the best least squares approximant as a linear combina- tion of the Legendre polynomials,

p(t) = a P (t) + a P (t) + + a P (t) = a + a t + a 3 t2 1 + . (5.65) 0 0 1 1 · · · n n 0 1 2 2 − 2 · · · By orthogonality, the least squares coefficients can be immediately¡ computed¢ by the inner product formula (5.56), so

t 1 e ; Pk 2k + 1 t ak = h 2i = e Pk(t) dt. (5.66) Pk 2 1 k k Z− For example, the quadratic approximation is given by the first three terms in (5.65), whose coefficients are

1 1 1 t 1 1 3 t 3 a0 = e dt = e 1.175201, a1 = te dt = 1.103638, 2 1 2 − e ' 2 1 e ' Z− µ ¶ Z− 1 5 3 2 1 t 5 7 a2 = 2 t 2 e dt = e .357814. 2 1 − 2 − e ' Z− µ ¶ ¡ ¢ 2/25/04 182 c 2004 Peter J. Olver ° 2.5 2.5 2.5

2 2 2

1.5 1.5 1.5

1 1 1

0.5 0.5 0.5

-1 -0.5 0.5 1 -1 -0.5 0.5 1 -1 -0.5 0.5 1

Figure 5.6. Quadratic Least Squares Approximation to et.

Graphs appear in Figure 5.6; the first shows et, the second its quadratic approximant

et 1.175201 + 1.103638 t + .357814 3 t2 1 , (5.67) ≈ 2 − 2 ¡ ¢ and the third compares the two by laying them on top of each other. As in the discrete case, there are two major advantages of the orthogonal Legendre polynomials over the direct approach presented in Example 4.21. First, we do not need to solve any linear systems of equations since the required coefficients (5.66) are found by direct integration. Indeed, the coefficient matrix for polynomial least squares approx- imation based on the monomial basis is some variant of the notoriously ill-conditioned Hilbert matrix, (1.67), and the computation of an accurate solution can be tricky. Our precomputation of an orthogonal system of polynomials has successfully circumvented the ill-conditioned normal system.

The second advantage is that the coefficients ak do not depend on the degree of the approximating polynomial. Thus, if the quadratic approximation (5.67) is insufficiently accurate, we can add in the cubic correction

1 5 3 3 7 5 3 3 t 7 5 a3 P3(t) = a3 2 t 2 t , where a3 = 2 t 2 t e dt = 37e .070456. − 2 1 − 2 − e ' Z− µ ¶ ¡ ¢ ¡ ¢ There is no need to recompute the coefficients a0, a1, a2. And, if we desire yet further accuracy, we need only compute the next few coefficients. To exploit orthogonality, each interval and norm requires the construction of a cor- responding system of orthogonal polynomials. Let us reconsider Example 4.21, in which we used the method of least squares to approximate et based on the L2 norm on [0, 1]. Now, the ordinary Legendre polynomials are no longer orthogonal, and so we must use the rescaled Legendre polynomials (5.54) instead. Thus, the quadratic least squares approxi- mant can be written as

p(t) = a0 + a1 P 1(t) + a2 P 2(t) = 1.718282 + .845155 (2t 1) + .139864 (6t2 6t + 1) e e − − = 1.012991 + .851125 t + .839184 t2,

2/25/04 183 c 2004 Peter J. Olver ° Figure 5.7. Orthogonal Complement to a Line.

2 t where the coefficients a = P − e ; P are found by direct integration: k k k k h k i 1 1 t e e t a0 = e dt = e 1 1.718282, a1 = 3 (2t 1)e dt = 3(3 e) .845155, 0 − ' 0 − − ' Z 1 Z a = 5 (6t2 6t + 1)et dt = 5(7e 19) .139864. 2 − − ' Z0 It is worth emphasizing that this is the same approximating polynomial as we computed in (4.63). The use of an orthogonal system of polynomials merely streamlines the compu- tation.

5.6. Orthogonal Subspaces. We now extend the notion of orthogonality from individual elements to subspaces of an inner product space V . Definition 5.41. Two subspaces W, Z V are called orthogonal if every vector in W is orthogonal to every vector in Z. ⊂ In other words, W and Z are orthogonal subspaces if and only if w ; z = 0 for every w W and z Z. In practice, one only needs to check orthogonalityh of basisi elements: ∈ ∈

Lemma 5.42. If w1, . . . , wk is a basis for W and z1, . . . , zl a basis for Z, then W and Z are orthogonal subspaces if and only if w ; z = 0 for all i = 1, . . . , k and j = 1, . . . , l. h i j i Example 5.43. Let V = R3 have the ordinary dot product. Then the plane W R3 defined by the equation 2x y + 3z = 0 is orthogonal to the line Z spanned by its normal⊂ vector n = ( 2, 1, 3 )T . Indeed,− every w = ( x, y, z )T W satisfies the orthogonality condition w n =− 2x y + 3z = 0, which is just the equation∈ for the plane. · − Example 5.44. Let W be the span of w = ( 1, 2, 0, 1 )T , w = ( 3, 1, 2, 1 )T , and 1 − 2 Z the span of z = ( 3, 2, 0, 1 )T , z = ( 1, 0, 1, 1 )T . Then all w z = 0 for all i, j, and 1 2 − − i · j hence W and Z are orthogonal subspaces of R4 under the Euclidean dot product.

2/25/04 184 c 2004 Peter J. Olver ° Definition 5.45. The orthogonal complement to a subspace W V , denoted W ⊥, is defined as the set of all vectors which are orthogonal to W : ⊂

W ⊥ = v V v ; w = 0 for all w W . (5.68) { ∈ | h i ∈ } One easily checks that the orthogonal complement W ⊥ is also a subspace. Moreover, W W ⊥ = 0 . (Why?) Note that the orthogonal complement will depend upon which inner∩ product{ is} being used. T Example 5.46. Let W = ( t, 2t, 3t ) t R be the line (one-dimensional sub- { | ∈ T} space) in the direction of the vector w = ( 1, 2, 3 ) R3. For the dot product, the 1 ∈ orthogonal complement W ⊥ is the plane passing through the origin having normal vector T w , as sketched in Figure 5.7. In other words, z = ( x, y, z ) W ⊥ if and only if 1 ∈ z w = x + 2y + 3z = 0. (5.69) · 1 Thus, W ⊥ is characterized as the solution space to the homogeneous linear equation (5.69), T or, equivalently, the kernel of the 1 3 matrix A = w1 = ( 1 2 3 ). We can write the general solution in the form × 2y 3z 2 3 z = − y− = y −1 + z −0 = y z + z z ,       1 2 z 0 1       where y, z are the free variables. The indicated vectors z = ( 2, 1, 0 )T , z = ( 3, 0, 1 )T , 1 − 2 − form a (non-orthogonal) basis for the orthogonal complement W ⊥. Proposition 5.47. Suppose that W V is a finite-dimensional subspace of an inner product space. Then every vector v V ⊂can be uniquely decomposed into v = w + z ∈ where w W and z W ⊥. ∈ ∈ Proof : We let w W be the orthogonal projection of v onto W . Then z = v w ∈ − is, by definition, orthogonal to W and hence belongs to W ⊥. Note that z can be viewed as the orthogonal projection of v onto the complementary subspace W ⊥. If we are given two such decompositions, v = w + z = w + z, then w w = z z. The left hand side − − of this equation lies in W while the right hand side belongs to W ⊥. But, as we already noted, the only vector that belongs to bothe We and W ⊥ isethe zeroe vector. Thus, w = w and z = z, which proves uniqueness. Q.E.D. As a direct consequence of Exercise , we conclude that a subspace and its orthogonale complemene t have complementary dimensions:

Proposition 5.48. If dim W = m and dim V = n, then dim W ⊥ = n m. − Example 5.49. Return to the situation described in Example 5.46. Let us decom- T pose the vector v = ( 1, 0, 0 ) R3 into a sum v = w +z of a vector w lying in the line W ∈ and a vector z belonging to its orthogonal plane W ⊥, defined by (5.69). Each is obtained by an orthogonal projection onto the subspace in question, but we only need to compute one of the two directly since the second can be obtained by subtracting the first from v.

2/25/04 185 c 2004 Peter J. Olver ° v

z

w

W

⊥ W Figure 5.8. Orthogonal Decomposition of a Vector.

Orthogonal projection onto a one-dimensional subspace is easy since any basis is, trivially, an orthogonal basis. Thus, the projection of v onto the line spanned by w1 = T 2 1 2 3 T ( 1, 2, 3 ) is w = w1 − v ; w1 w1 = 14 , 14 , 14 . The component in W ⊥ is then k k h i T z v w 13 2 3 obtained by subtraction: = = 14¡ , 14 , 14¢ . Alternatively, one can obtain z directly by orthogonal projection− onto the plane− −W . But you need to be careful: the ¡ ⊥¢ basis found in Example 5.46 is not orthogonal, and so you will need to set up and solve the normal equations to find the closest point z. Or, you can first convert to an orthogonal basis by a single Gram–Schmidt step, and then use the orthogonal projection formula (5.56). All three methods lead to the same vector z W ⊥. ∈ Example 5.50. Let W R4 be the two-dimensional subspace spanned by the ⊂ T T orthogonal vectors w1 = ( 1, 1, 0, 1 ) and w2 = ( 1, 1, 1, 2 ) . Its orthogonal complement − T W ⊥ (with respect to the Euclidean dot product) is the set of all vectors v = ( x, y, z, w ) that satisfy the linear system v w = x + y + w = 0, v w = x + y + z 2w = 0. · 1 · 2 − Applying the usual algorithm — the free variables are y and w — we find that the so- T T lution space is spanned by z1 = ( 1, 1, 0, 0 ) , z2 = ( 1, 0, 3, 1 ) , which form a non- − − T 1 orthogonal basis for W ⊥. An orthogonal basis y1 = z1 = ( 1, 1, 0, 0 ) , y2 = z2 2 z1 = 1 1 T − − , , 3, 1 , for W ⊥ is obtained by a single Gram–Schmidt step. To decompose the − 2 − 2 v T w z ¡vector = ( 1¢, 0, 0, 0 ) = + , say, we compute the two orthogonal projections: T w = 1 w + 1 w = 10 , 10 , 1 , 1 W, 3 1 7 2 21 21 7 21 ∈ 1 1 11 10 1 1 T z = v w = y ¡ y = ¢ , , , W ⊥. − − 2 1 − 21 2 21 − 21 − 7 − 21 ∈ Proposition 5.51. If W is a finite-dimensional¡ subspace of¢an inner product space, then (W ⊥)⊥ = W . This result is an immediate corollary of the orthogonal decomposition in Proposi- tion 5.47. Warning: Propositions 5.47 and 5.51 are not necessarily true for infinite- dimensional subspaces. If dim W = , one can only assert that W (W ⊥)⊥. For ∞ ⊆

2/25/04 186 c 2004 Peter J. Olver ° ker A A rng A cokerA

corngA

Figure 5.9. The Fundamental Matrix Subspaces.

example, it can be shown that, [125], on any bounded interval [a, b] the orthogonal com- ( ) 0 2 plement to the subspace of all polynomials ∞ C [a, b] with respect to the L inner ( ) P ⊂ product is trivial: ( ∞ )⊥ = 0 . This means that the only continuous function which satisfies the momentPequations { } b xn ; f(x) = xnf(x) dx = 0, for all n = 0, 1, 2, . . . h i Za is the zero function f(x) 0. But the orthogonal complement of 0 is the entire space, ( ) 0 ≡ ( ) { } and so (( ∞ )⊥)⊥ = C [a, b] = ∞ . P 6 P The difference is that, in infinite-dimensional function space, a proper subspace W (V can be dense†, whereas in finite dimensions, every proper subspace is a “thin” subset that only occupies an infinitesimal fraction of the entire vector space. This seeming paradox is the reason for the success of numerical approximation schemes in function space, such as the finite element method. Orthogonality of the Fundamental Matrix Subspaces and the Fredholm Alternative In Chapter 2, we introduced the four fundamental subspaces associated with an m n matrix A. According to the fundamental Theorem 2.47, the first two, the kernel (null space)× and the corange (row space), are subspaces of Rn having complementary dimensions. The second two, the cokernel (left null space) and the range (column space), are subspaces of Rm, also of complementary dimensions. In fact, more than this is true — the paired subspace are orthogonal complements with respect to the standard Euclidean dot product! Theorem 5.52. Let A be an m n matrix. Then its kernel and corange are or- × thogonal complements as subspaces of Rn, while its cokernel and range are orthogonal complements in Rm:

n m ker A = (corng A)⊥ R , coker A = (rng A)⊥ R . (5.70) ⊂ ⊂

† In general, a subset W ⊂ V of a normed vector space is dense if, for every v ∈ V , and every ε > 0, one can find w ∈ W with k v − w k < ε. The Weierstrass approximation theorem, [126], tells us that the polynomials form a dense subspace of the space of continuous functions, and underlies the proof of the result mentioned in the preceding paragraph.

2/25/04 187 c 2004 Peter J. Olver ° Proof : A vector x Rn lies in ker A if and only if A x = 0. According to the rules ∈ th th T of matrix multiplication, the i entry of A x equals the vector product of the i row ri T of A and x. But this product vanishes, ri x = ri x = 0, if and only if x is orthogonal to r . Therefore, x ker A if and only if x is orthogonal· to all the rows of A. Since the i ∈ rows span corng A, this is equivalent to x lying in its orthogonal complement (corng A)⊥, which proves the first statement. Orthogonality of the range and cokernel follows by the same argument applied to the transposed matrix AT . Q.E.D. Combining Theorems 2.47 and 5.52, we deduce the following important characteriza- tion of compatible linear systems. Theorem 5.53. The linear system A x = b has a solution if and only if b is orthog- onal to the cokernel of A. Indeed, the system has a solution if and only if the right hand side b rng A belongs to the range of A, which, by (5.70), requires that b be orthogonal to the cok∈ ernel coker A. As a result, the compatibility conditions for the linear system A x = b can be written in the form y b = 0 for every y satisfying AT y = 0. (5.71) ·

In practice, one only needs to check orthogonality of b with respect to a basis y1, . . . , ym r of the cokernel, leading to a system of m r compatibility constraints − − y b = 0, i = 1, . . . , m r. (5.72) i · − Here r = rank A denotes the rank of the coefficient matrix, and so m r is also the number of all zero rows in the row echelon form of A. Hence, (5.72) contains− precisely the same number of constraints as would be found by Gaussian elimination. Theorem 5.53 is known as the Fredholm alternative, named after the Swedish mathe- matician Ivar Fredholm. Fredholm’s primary interest was in solving linear equa- tions, but his compatibility criterion was recognized to be a general property of linear systems, including linear matrix systems, linear differential equations, linear boundary value problems, and so on. Example 5.54. In Example 2.40, we analyzed the linear system A x = b with 1 0 1 coefficient matrix A = 0 1 −2 . Using direct Gaussian elimination, we were led to   1 2 −3 − a single compatibility condition, namely b1 +2b2 +b3 = 0, required for the system to have a solution. We now understand the meaning− behind this equation: it is telling us that the right hand side b must be orthogonal to the cokernel of A. The cokernel is determined by solving the homogeneous adjoint system AT y = 0, and is the line spanned by the vector T y1 = ( 1, 2, 1) . Thus, the compatibility condition requires that b be orthogonal to y1, in accordance− with the Fredholm alternative (5.72). Example 5.55. Let us determine the compatibility conditions for the linear system

x x + 3x = b , x + 2x 4x = b , 2x + 3x + x = b , x + 2x = b , 1 − 2 3 1 − 1 2 − 3 2 1 2 3 3 1 3 4

2/25/04 188 c 2004 Peter J. Olver ° 1 1 3 1 −2 4 by computing the cokernel of its coefficient matrix A = . To this end,  −2 3 −1   1 0 2  we need to solve the homogeneous adjoint system AT y =0, namely 

y y + 2y + y = 0, y + 2y + 3y = 0, 3y 4y + y + 2y = 0. 1 − 2 3 4 − 1 2 3 1 − 2 3 4 Applying Gaussian elimination, we deduce the general solution

y = y ( 7, 5, 1, 0 )T + y ( 2, 1, 0, 1 )T 3 − − 4 − − is a linear combination (whose coefficients are the free variables) of the two basis vectors for coker A. Thus, the Fredholm compatibility conditions (5.72) are obtained by taking their dot products with the right hand side of the original system:

7b 5b + b = 0, 2b b + b = 0. − 1 − 2 3 − 1 − 2 4 The reader can check that these are indeed the same compatibility conditions that result from a direct Gaussian elimination on the augmented matrix A b . | We are now very close to a full understanding of the fascinating¡ ¢geometry that lurks behind the simple algebraic operation of multiplying a vector x Rn by an m n matrix, ∈ × resulting in a vector b = A x Rm. Since the kernel and corange of A are orthogonal ∈ complements in the domain space Rn, Proposition 5.48 tells us that we can uniquely decompose x = w + z where w corng A, while z ker A. Since A z = 0, we have ∈ ∈ b = A x = A(w + z) = Aw. Therefore, we can regard multiplication by A as a combination of two operations: (i) The first is an orthogonal projection onto the corange taking x to w. (ii) The second maps a vector in corng A Rn to a vector in rng A Rm, taking the orthogonal projection w to the image⊂ vector b = Aw = A x. ⊂ Moreover, if A has rank r then, according to Theorem 2.47, both rng A and corng A are r- dimensional subspaces, albeit of different vector spaces. Each vector b rng A corresponds to a unique vector w corng A. Indeed, if w, w corng A satisfy b ∈= Aw = A w, then A(w w) = 0 and hence∈ w w ker A. But, since∈ they are complementary subspaces, − − ∈ the only vector that belongs to both the kernele and the corange is the zero vector,e and hence we= w. In this manner,ewe have proved the first part of the following result; the second is left as Exercise . Propositione 5.56. Multiplication by an m n matrix A of rank r defines a one-to- × one correspondence between the r-dimensional subspaces corng A Rn and rng A Rm. ⊂ ⊂ Moreover, if v1, . . . , vr forms a basis of corng A then their images Av1, . . . , Avr form a basis for rng A. In summary, the linear system A x = b has a solution if and only if b rng A, or, equivalently, is orthogonal to every vector y coker A. If the compatibility∈ conditions hold, then the system has a unique solution w∈ corng A that, by the definition of the ∈

2/25/04 189 c 2004 Peter J. Olver ° corange, is a linear combination of the rows of A. The general solution to the system is x = w + z where w is the particular solution belonging to the corange, while z ker A is an arbitrary element of the kernel. ∈

Theorem 5.57. A compatible linear system A x = b with b rng A = (coker A)⊥ has a unique solution w corng A satisfying Aw = b. The general ∈solution is x = w + z where z ker A. The particular∈ solution w is distinguished by the fact that it has smallest Euclidean∈ norm of all possible solutions: w x whenever A x = b. k k ≤ k k Proof : We have already established all bu the last statement. Since the corange and kernel are orthogonal subspaces, the norm of a general solution x = w + z is

x 2 = w + z 2 = w 2 + 2 w z + z 2 = w 2 + z 2 w 2, k k k k k k · k k k k k k ≥ k k with equality if and only if z = 0. Q.E.D. Example 5.58. Consider the linear system 1 1 2 2 x 1 0 −1 2 −1 y −1 = .  1 3 −5 2  z   4  −  5 1 9 6  w   6   − −     Applying the usual Gaussian elimination algorithm, we discover that the coefficient matrix T has rank 3, and its kernel is spanned by the single vector z1 = ( 1, 1, 0, 1 ) . The system itself is compatible; indeed, the right hand side is orthogonal to the− basis cokernel vector ( 2, 24, 7, 1 )T , and so satisfies the Fredholm condition (5.72). − The general solution to the linear system is x = ( t, 3 t, 1, t )T where t = w is the free variable. We decompose the solution x = w + z into a−vector w in the corange and an element z in the kernel. The easiest way to do this is to first compute its orthogonal 2 T projection z = z − (x z )z = ( t 1, 1 t, 0, t 1 ) onto the one-dimensional kernel. k 1 k · 1 1 − − − We conclude that w = x z = ( 1, 2, 1, 1 )T corng A is the unique solution belonging to the corange of the coefficien− t matrix, i.e., the∈only solution that can be written as a linear combination of its row vectors, or, equivalently, the only solution which is orthogonal to the kernel. The reader should check this by finding the coefficients in the linear combination, or, equivalently, writing w = AT v for some v R4. Moreover, its norm is the smallest among all the solutions: w = √7 x =∈ ( t, 3 t, 1, t )T = √3t2 6t + 10 , as you can readily check. k k ≤ k k k − k − In this example, the analysis was simplified by the fact that the kernel was one- dimensional, and hence the orthogonal projection was relatively easy to compute. In more complicated situations, to determine the decomposition x = w + z one needs to solve the normal equations (4.28) in order to find the orthogonal projection or least squares point in the corange. Alternatively, one can first determine an orthogonal basis for the corange by applying the Gram–Schmidt process to a basis, and then use the orthogonal (or orthonormal) projection formula (5.56). Of course, once one of the constituents w, z has been found, the other can be simply obtained by subtraction from x.

2/25/04 190 c 2004 Peter J. Olver °