
Math 240 TA: Shuyi Weng Winter 2017 March 10, 2017

Inner Product and Orthogonality

Inner Product

The notion of inner product is important in linear algebra in the sense that it provides a sensible notion of length and angle in a vector space. This seems very natural in the Euclidean space $\mathbb{R}^n$ through the concept of the dot product. However, the inner product is much more general and can be extended to other non-Euclidean spaces. For this course, you are not required to understand the non-Euclidean examples. I just want to show you a glimpse of the subject in a more general setting.

Definition. Let $V$ be a vector space. An inner product on $V$ is a function
$$\langle -, - \rangle \colon V \times V \to \mathbb{R}$$
such that for all vectors $u, v, w \in V$ and $c \in \mathbb{R}$, it satisfies the axioms

1. $\langle u, v \rangle = \langle v, u \rangle$;
2. $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$, and $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$;
3. $\langle cu, v \rangle = c\langle u, v \rangle = \langle u, cv \rangle$;
4. $\langle u, u \rangle \geq 0$, and $\langle u, u \rangle = 0$ if and only if $u = 0$.

Definition. In the Euclidean space $\mathbb{R}^n$, the dot product of two vectors $u$ and $v$ is defined to be $u \cdot v = u^T v$.

In coordinates, if we write
$$u = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}, \quad\text{and}\quad v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix},$$
then
$$u \cdot v = u^T v = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + \cdots + u_n v_n.$$
The definitions in the remainder of this note will assume the Euclidean vector space $\mathbb{R}^n$, with the dot product as the natural inner product.
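As a quick computational aside (not part of the original notes), the dot product can be checked numerically; the sketch below assumes NumPy and uses arbitrary example vectors.

```python
# A minimal sketch (assuming NumPy) of the dot product u . v = u^T v,
# computed both with np.dot and as the explicit coordinate sum u1*v1 + ... + un*vn.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

dot_builtin = np.dot(u, v)
dot_by_hand = sum(ui * vi for ui, vi in zip(u, v))

print(dot_builtin, dot_by_hand)  # both print 32.0
```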

Lemma. The dot product on $\mathbb{R}^n$ is an inner product.

Exercise. Verify that the dot product satisfies the four axioms of inner products.

Example 1. Let $A = \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix}$, and define the function
$$\langle u, v \rangle = u^T A v.$$
We will show that this function defines an inner product on $\mathbb{R}^2$. Write the vectors
$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad\text{and}\quad w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},$$
then
$$\langle u, v \rangle = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 7u_1v_1 + 2u_1v_2 + 2u_2v_1 + 4u_2v_2.$$
To verify axiom 1, we compute

$$\langle v, u \rangle = 7v_1u_1 + 2v_1u_2 + 2v_2u_1 + 4v_2u_2.$$

Hence, we can conclude that $\langle u, v \rangle = \langle v, u \rangle$. To verify axiom 2, we see that

$$\langle u, v + w \rangle = 7u_1(v_1 + w_1) + 2u_1(v_2 + w_2) + 2u_2(v_1 + w_1) + 4u_2(v_2 + w_2)$$
$$= (7u_1v_1 + 2u_1v_2 + 2u_2v_1 + 4u_2v_2) + (7u_1w_1 + 2u_1w_2 + 2u_2w_1 + 4u_2w_2) = \langle u, v \rangle + \langle u, w \rangle.$$

The other equality follows from axiom 1. Thus axiom 2 is verified. To verify axiom 3, we see that

$$c\langle u, v \rangle = c(7u_1v_1 + 2u_1v_2 + 2u_2v_1 + 4u_2v_2)$$
$$= 7(cu_1)v_1 + 2(cu_1)v_2 + 2(cu_2)v_1 + 4(cu_2)v_2 = \langle cu, v \rangle.$$

The other equality also follows from axiom 1. Thus axiom 3 is verified. To verify axiom 4, notice that

$$\langle u, u \rangle = 7u_1u_1 + 2u_1u_2 + 2u_2u_1 + 4u_2u_2 = 7u_1^2 + 4u_1u_2 + 4u_2^2 = 6u_1^2 + (u_1 + 2u_2)^2.$$

If $\langle u, u \rangle = 0$, both squares must vanish, so $u_1 = 0$ and $u_1 + 2u_2 = 0$, which means $u_1 = u_2 = 0$. The converse is clear. Hence, axiom 4 is verified.

Remark (Symmetric bilinear forms). Let $A \in \mathbb{R}^{n \times n}$ be a symmetric, positive definite matrix (that is, $u^T A u > 0$ for every nonzero $u$). Then the function given by
$$\langle u, v \rangle = u^T A v$$
for any vectors $u, v \in \mathbb{R}^n$ defines an inner product on $\mathbb{R}^n$. Inner products on $\mathbb{R}^n$ defined in this way are called symmetric bilinear forms. In fact, every inner product on $\mathbb{R}^n$ is such a symmetric bilinear form. In particular, the standard dot product is defined with the identity matrix $I$, which is symmetric and positive definite.
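As a computational aside, one can check numerically that the matrix $A$ of Example 1 is symmetric and positive definite, which is what axioms 1 and 4 require of the form $\langle u, v \rangle = u^T A v$. This is a minimal sketch assuming NumPy; the vectors $u$ and $v$ are arbitrary test values.

```python
# A small numerical check (assuming NumPy) that the matrix A of Example 1 is symmetric
# and positive definite, so <u, v> = u^T A v is symmetric and <u, u> > 0 for u != 0.
import numpy as np

A = np.array([[7.0, 2.0],
              [2.0, 4.0]])

print(np.allclose(A, A.T))     # True: A is symmetric
print(np.linalg.eigvalsh(A))   # [3. 8.]: both eigenvalues positive, so A is positive definite

u = np.array([1.0, -2.0])      # arbitrary test vectors
v = np.array([3.0, 0.5])
print(u @ A @ v, v @ A @ u)    # 6.0 6.0, illustrating axiom 1
```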

Definition. The length (or norm) of a vector $v \in \mathbb{R}^n$, denoted by $\|v\|$, is defined by
$$\|v\| = \sqrt{v \cdot v} = \sqrt{v_1^2 + \cdots + v_n^2}.$$

Remark. By the last axiom of the inner product, $v \cdot v \geq 0$, thus the length of $v$ is always a non-negative real number, and the length is 0 if and only if $v$ is the zero vector.

Definition. A vector with length 1 is called a unit vector. If $v \neq 0$, then the vector
$$u = \frac{1}{\|v\|}\, v = \frac{1}{\sqrt{v \cdot v}}\, v$$
is the normalization of $v$.

Example 2. Consider the vector
$$v = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}.$$
Its length is
$$\|v\| = \sqrt{v \cdot v} = \sqrt{2^2 + 2^2 + 1^2} = \sqrt{9} = 3,$$
and its normalization is
$$u = \frac{1}{\|v\|}\, v = \frac{1}{3}\, v = \begin{bmatrix} 2/3 \\ 2/3 \\ 1/3 \end{bmatrix}.$$
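A minimal computational sketch of Example 2, assuming NumPy:

```python
# A sketch of Example 2 (assuming NumPy): the length of v = (2, 2, 1) and its normalization.
import numpy as np

v = np.array([2.0, 2.0, 1.0])
length = np.linalg.norm(v)   # sqrt(v . v) = 3.0
u = v / length               # normalization [2/3, 2/3, 1/3]

print(length)                  # 3.0
print(u, np.linalg.norm(u))    # [0.667 0.667 0.333] (approximately), with length 1.0
```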

Definition. For vectors $u, v \in \mathbb{R}^n$, we can define the distance between them by
$$\mathrm{dist}(u, v) = \|u - v\|.$$

Example 3. Let $u = \begin{bmatrix} -4 \\ 3 \end{bmatrix}$ and $v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, then the distance between them is
$$\|u - v\| = \sqrt{(-4 - 1)^2 + (3 - 1)^2} = \sqrt{29}.$$
This distance is demonstrated in the following figure.

[Figure: the vectors $u$ and $v$ in the plane, with the segment between them showing $\mathrm{dist}(u, v)$.]
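A one-line check of Example 3, assuming NumPy:

```python
# A sketch of Example 3 (assuming NumPy): dist(u, v) = ||u - v||.
import numpy as np

u = np.array([-4.0, 3.0])
v = np.array([1.0, 1.0])
print(np.linalg.norm(u - v))   # 5.3851... = sqrt(29)
```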

Orthogonality

The notion of inner product allows us to introduce the notion of orthogonality, together with a rich family of properties in linear algebra.

Definition. Two vectors u, v ∈ Rn are orthogonal if u · v = 0.

Theorem 1 (Pythagorean). Two vectors $u$ and $v$ are orthogonal if and only if $\|u + v\|^2 = \|u\|^2 + \|v\|^2$.

Proof. This well-known theorem has numerous different proofs. The linear-algebraic version looks like this. Notice that
$$\|u + v\|^2 = (u + v) \cdot (u + v) = u \cdot u + u \cdot v + v \cdot u + v \cdot v = \|u\|^2 + \|v\|^2 + 2\, u \cdot v.$$

The theorem follows from the fact that u and v are orthogonal if and only if u · v = 0.

The following is an important concept involving orthogonality.

Definition. Let $W \subseteq \mathbb{R}^n$ be a subspace. If a vector $x$ is orthogonal to every vector $w \in W$, we say that $x$ is orthogonal to $W$. The orthogonal complement of $W$, denoted by $W^\perp$, is the collection of all vectors orthogonal to $W$, i.e.,
$$W^\perp = \{x \in \mathbb{R}^n \mid x \cdot w = 0 \text{ for all } w \in W\}.$$

Lemma. $W^\perp$ is a subspace of $\mathbb{R}^n$.

Exercise. Verify that $W^\perp$ satisfies the axioms of a subspace.

Exercise. Let $W \subseteq \mathbb{R}^n$ be a subspace. Prove that $\dim W + \dim W^\perp = n$. [Hint: Use the Rank-Nullity Theorem.]

Theorem 2. If $\{w_1, \ldots, w_k\}$ forms a basis of $W$, then $x \in W^\perp$ if and only if $x \cdot w_i = 0$ for all integers $1 \leq i \leq k$.

Proof. Let $\{w_1, \ldots, w_k\}$ be a basis of $W$. Assume that $x \cdot w_i = 0$ for all $i$. Let $w \in W$ be arbitrary. Then $w$ can be written as a linear combination
$$w = c_1 w_1 + \cdots + c_k w_k.$$

By the linearity of the dot product, we have
$$x \cdot w = c_1\, x \cdot w_1 + \cdots + c_k\, x \cdot w_k = 0 + \cdots + 0 = 0.$$
Thus $x \in W^\perp$. The converse is clear.

Example 4. Find the orthogonal complement of $W = \mathrm{span}\{w_1, w_2\}$, where
$$w_1 = \begin{bmatrix} 3 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad\text{and}\quad w_2 = \begin{bmatrix} 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}.$$

By the theorem above, to find all vectors $x \in W^\perp$, we only need to find all $x \in \mathbb{R}^4$ that satisfy
$$x \cdot w_1 = x \cdot w_2 = 0.$$
If we write
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix},$$
we immediately obtain a system of linear equations

$$3x_1 + x_3 + x_4 = 0,$$
$$2x_2 + 5x_3 + x_4 = 0.$$

Row reduction gives all solutions to this system:
$$x = \begin{bmatrix} -x_3/3 - x_4/3 \\ -5x_3/2 - x_4/2 \\ x_3 \\ x_4 \end{bmatrix} = s \begin{bmatrix} 2 \\ 15 \\ -6 \\ 0 \end{bmatrix} + t \begin{bmatrix} 2 \\ 3 \\ 0 \\ -6 \end{bmatrix}.$$

Exercise. Let $A$ be an $m \times n$ matrix. Prove that $(\mathrm{Col}\, A)^\perp = \mathrm{Nul}\, A^T$. [Hint: Use a similar method as demonstrated in the previous example.]
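As a computational aside on Example 4, $W^\perp$ is the null space of the matrix whose rows are $w_1$ and $w_2$. The sketch below assumes NumPy and uses an SVD-based null-space computation, which is one standard numerical approach rather than the row reduction used in the notes.

```python
# A sketch of Example 4 (assuming NumPy): W^perp is the null space of the matrix whose
# rows are w1 and w2, and an orthonormal basis of it can be read off the SVD.
import numpy as np

W_rows = np.array([[3.0, 0.0, 1.0, 1.0],    # w1
                   [0.0, 2.0, 5.0, 1.0]])   # w2

_, _, Vt = np.linalg.svd(W_rows)
null_basis = Vt[2:]   # W_rows has rank 2, so its last 2 right singular vectors span W^perp

print(np.allclose(W_rows @ null_basis.T, 0.0))   # True: both vectors are orthogonal to w1 and w2

# The particular solutions found above also lie in W^perp:
s_vec = np.array([2.0, 15.0, -6.0, 0.0])
t_vec = np.array([2.0, 3.0, 0.0, -6.0])
print(W_rows @ s_vec, W_rows @ t_vec)            # [0. 0.] [0. 0.]
```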

Definition. A set of vectors $\{u_1, \ldots, u_k\}$ in $\mathbb{R}^n$ is an orthogonal set if each pair of distinct vectors from the set is orthogonal, i.e., $u_i \cdot u_j = 0$ whenever $i \neq j$. An orthogonal basis for a subspace $W$ is a basis for $W$ that is also an orthogonal set. An orthonormal basis for a subspace $W$ is an orthogonal basis for $W$ where each vector has length 1.

Example 5. The standard basis $\{e_1, \ldots, e_n\}$ forms an orthonormal basis for $\mathbb{R}^n$.

Example 6. Consider the following vectors in $\mathbb{R}^4$:
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}.$$

One can easily verify that $\{v_1, v_2, v_3, v_4\}$ forms an orthogonal set. Furthermore, the matrix
$$\begin{bmatrix} v_1 & v_2 & v_3 & v_4 \end{bmatrix}$$
is row equivalent to the identity matrix. Thus the four vectors are linearly independent. It follows that $\{v_1, v_2, v_3, v_4\}$ forms an orthogonal basis for $\mathbb{R}^4$. Furthermore, if we normalize the vectors and obtain
$$u_1 = \begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ -1/2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 1/2 \\ -1/2 \\ -1/2 \\ 1/2 \end{bmatrix},$$
then $\{u_1, u_2, u_3, u_4\}$ forms an orthonormal basis for $\mathbb{R}^4$.
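A quick numerical check of Example 6, assuming NumPy: stacking $v_1, \ldots, v_4$ as the columns of a matrix $V$, the product $V^T V$ is diagonal precisely because the vectors are pairwise orthogonal.

```python
# A check of Example 6 (assuming NumPy): with v1, ..., v4 as the columns of V,
# V^T V is diagonal exactly because the vectors are pairwise orthogonal.
import numpy as np

V = np.array([[1.0,  1.0,  1.0,  1.0],
              [1.0,  1.0, -1.0, -1.0],
              [1.0, -1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0,  1.0]]).T   # columns are v1, v2, v3, v4

print(V.T @ V)                              # 4 * identity: vi . vj = 0 whenever i != j

U = V / np.linalg.norm(V, axis=0)           # normalize each column
print(np.allclose(U.T @ U, np.eye(4)))      # True: u1, ..., u4 are orthonormal
```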

Exercise. Verify that $\{v_1, v_2, v_3, v_4\}$ in Example 6 forms an orthogonal basis for $\mathbb{R}^4$.

Definition. Let $A$ be an $n \times n$ matrix. We say $A$ is an orthogonal matrix if $A^T A = I$.

Theorem 3. Let $A$ be an $n \times n$ matrix. Then the following are equivalent.

1. $A$ is an orthogonal matrix.
2. The column vectors of $A$ are orthonormal.
3. The row vectors of $A$ are orthonormal.
4. $A$ preserves length, that is, $\|Ax\| = \|x\|$ for all $x \in \mathbb{R}^n$.
5. $A$ preserves the dot product, which means $(Ax) \cdot (Ay) = x \cdot y$ for all $x, y \in \mathbb{R}^n$.

Orthogonal Projection

The idea of orthogonal projection is best depicted in the following figure.

[Figure: a vector $v$, a vector $u$, and the orthogonal projection $\mathrm{Proj}_u v$ of $v$ onto $u$.]

The orthogonal projection of $v$ onto $u$ gives the component vector $\mathrm{Proj}_u v$ of $v$ in the direction of $u$. This fact is best demonstrated in the case that $u$ is one of the standard basis vectors.

[Figure: a vector $v$ with its orthogonal projections $\mathrm{Proj}_{e_1} v$ and $\mathrm{Proj}_{e_2} v$ onto the standard basis vectors $e_1$ and $e_2$.]

As shown in the figure above, the lengths of the orthogonal projections in the $e_1$ and $e_2$ directions, respectively, give the coordinates of the vector $v$ in the standard basis. On the other hand, each coordinate can be obtained by computing the dot product of $v$ and the corresponding standard basis vector, i.e.,

$$\|\mathrm{Proj}_{e_1} v\| = v \cdot e_1, \quad\text{and}\quad \|\mathrm{Proj}_{e_2} v\| = v \cdot e_2.$$

However, the orthogonal projection of v in the e1 direction should not depend on the length of the vector we use to specify the direction. Hence, the validity of the observation

above is based on the fact that e1 and e2 are “special” in some sense. The observation holds

true precisely because the vectors $e_1$ and $e_2$ are unit vectors. To obtain a similar conclusion in the general setting, consider vectors $u$ and $v$ in the first figure. We first normalize $u$ to get
$$\hat{u} = \frac{u}{\sqrt{u \cdot u}}.$$
Now, this unit vector $\hat{u}$ satisfies
$$\|\mathrm{Proj}_{\hat{u}} v\| = v \cdot \hat{u}.$$
Because $u$ and $\hat{u}$ are in the same direction, we have $\mathrm{Proj}_{\hat{u}} v = \mathrm{Proj}_u v$. Thus
$$\mathrm{Proj}_u v = \|\mathrm{Proj}_{\hat{u}} v\|\, \hat{u} = (v \cdot \hat{u})\, \hat{u} = \left(\frac{v \cdot u}{\sqrt{u \cdot u}}\right) \frac{u}{\sqrt{u \cdot u}} = \frac{v \cdot u}{u \cdot u}\, u.$$

Definition. Given vectors $u, v \in \mathbb{R}^n$, where $u \neq 0$, the orthogonal projection of $v$ onto $u$ is defined to be
$$\mathrm{Proj}_u v = \frac{v \cdot u}{u \cdot u}\, u.$$

Exercise. Find the orthogonal projection of $v$ onto $u$ in each case.
1. $u = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$, $v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
2. $u = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $v = \begin{bmatrix} -3 \\ 9 \end{bmatrix}$

Exercise. Find the distance from the point $(-3, 9)$ to the line $y = 2x$.
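As a computational aside, the projection formula above is easy to implement; the sketch below assumes NumPy, `proj_onto` is a hypothetical helper name, and the vectors are arbitrary test values.

```python
# A sketch (assuming NumPy) of the projection formula Proj_u v = ((v . u) / (u . u)) u;
# proj_onto is a hypothetical helper name.
import numpy as np

def proj_onto(u, v):
    """Orthogonal projection of v onto the nonzero vector u."""
    return (np.dot(v, u) / np.dot(u, u)) * u

u = np.array([2.0, 0.0])
v = np.array([1.0, 3.0])

p = proj_onto(u, v)
print(p)                 # [1. 0.]
print(np.dot(v - p, u))  # 0.0: the residual v - p is orthogonal to u
```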

It is also useful to consider the orthogonal projection of a vector onto a subspace (not necessarily 1-dimensional). Let $W \subseteq \mathbb{R}^n$ be a subspace, and let $\{u_1, \ldots, u_k\}$ be an orthonormal basis for the subspace $W$. Given a vector $v \in \mathbb{R}^n$, its projections onto the orthonormal basis vectors are
$$\mathrm{Proj}_{u_i} v = \frac{v \cdot u_i}{u_i \cdot u_i}\, u_i.$$
So the orthogonal projection of $v$ onto the subspace $W$ is the linear combination
$$\mathrm{Proj}_W v = \frac{v \cdot u_1}{u_1 \cdot u_1}\, u_1 + \cdots + \frac{v \cdot u_k}{u_k \cdot u_k}\, u_k.$$
Notice that the orthogonal projection of $v$ onto $u$ is the same as the orthogonal projection of $v$ onto the 1-dimensional subspace $W$ spanned by the vector $u$, since $W$ contains a unit vector, namely $u / \|u\|$, which forms an orthonormal basis for $W$.
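A computational sketch of this formula, assuming NumPy; `proj_subspace` is a hypothetical helper, and the orthogonal basis used below (the first two standard basis vectors of $\mathbb{R}^3$, spanning the $xy$-plane) is just an illustrative choice.

```python
# A sketch (assuming NumPy) of Proj_W v as the sum of projections onto an orthogonal basis;
# proj_subspace is a hypothetical helper name.
import numpy as np

def proj_subspace(basis, v):
    """Orthogonal projection of v onto span(basis); the basis vectors must be orthogonal."""
    return sum((np.dot(v, u) / np.dot(u, u)) * u for u in basis)

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
v = np.array([3.0, -2.0, 7.0])

print(proj_subspace([e1, e2], v))   # [ 3. -2.  0.]
```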

Example 7. Consider the subspace $W \subseteq \mathbb{R}^n$ given by
$$W = \{(x_1, \ldots, x_n) \in \mathbb{R}^n \mid x_{k+1} = \cdots = x_n = 0\}.$$

Clearly, $\{e_1, \ldots, e_k\}$ gives an orthonormal basis for $W$. A straightforward computation will show that the orthogonal projection of any vector $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ is
$$\mathrm{Proj}_W x = (x_1, \ldots, x_k, 0, \ldots, 0).$$

In the case that $n = 3$ and $k = 2$, this is the orthogonal projection onto the $xy$-plane.

Exercise. The following vectors in $\mathbb{R}^4$ form an orthogonal set.
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}.$$
Find the orthogonal projection of the vector
$$x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$$
onto the subspace $W = \mathrm{span}\{v_1, v_2, v_3\}$.

The Gram-Schmidt Process

When we compute orthogonal projection onto a subspace $W$, we need an orthonormal basis of this subspace. The Gram-Schmidt process provides an algorithm to find an orthonormal basis of a subspace.

Algorithm (Gram-Schmidt). Given a subspace $W \subseteq \mathbb{R}^n$ of dimension $k$, the following procedure will provide an orthonormal basis for $W$ (a short computational sketch follows the algorithm).

• Find a basis $\{x_1, \ldots, x_k\}$ of $W$;

• Let $v_1 = x_1$;

• Let $v_2 = x_2 - \mathrm{Proj}_{v_1}(x_2)$;

• Let $v_3 = x_3 - \mathrm{Proj}_{v_1}(x_3) - \mathrm{Proj}_{v_2}(x_3)$;

• $\cdots$

• Let $v_k = x_k - \mathrm{Proj}_{v_1}(x_k) - \mathrm{Proj}_{v_2}(x_k) - \cdots - \mathrm{Proj}_{v_{k-1}}(x_k)$;

• Now let $u_i = \dfrac{v_i}{\|v_i\|}$.

• Then $\{v_1, \ldots, v_k\}$ forms an orthogonal basis for $W$, and $\{u_1, \ldots, u_k\}$ forms an orthonormal basis for $W$.
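The following is a minimal computational sketch of the algorithm above, assuming NumPy; `gram_schmidt` is a hypothetical helper name, not code from the notes, and it is run on the data of Example 8 below.

```python
# A minimal Gram-Schmidt sketch (assuming NumPy); gram_schmidt is a hypothetical helper.
import numpy as np

def gram_schmidt(xs):
    """Return an orthonormal basis of span(xs), given linearly independent vectors xs."""
    vs = []
    for x in xs:
        v = x.astype(float)
        for w in vs:
            v = v - (np.dot(x, w) / np.dot(w, w)) * w   # subtract Proj_w(x)
        vs.append(v)
    return [v / np.linalg.norm(v) for v in vs]          # normalize at the end

us = gram_schmidt([np.array([2.0, 0.0]), np.array([1.0, 3.0])])
print(us)   # [array([1., 0.]), array([0., 1.])]
```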

Example 8. Let $x_1$ and $x_2$ be vectors in $\mathbb{R}^2$ given by
$$x_1 = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad\text{and}\quad x_2 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$
By the Gram-Schmidt process, let
$$v_1 = x_1 = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \quad\text{and}\quad v_2 = x_2 - \frac{x_2 \cdot v_1}{v_1 \cdot v_1}\, v_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - \frac{2}{4} \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}.$$

Then $\{v_1, v_2\}$ forms an orthogonal basis for $\mathbb{R}^2$. The geometric meaning of this process is illustrated in the following figure.

[Figure: $x_1 = v_1$ along one axis, the vector $x_2$, its projection $\mathrm{Proj}_{v_1} x_2$ onto $v_1$, and $v_2 = x_2 - \mathrm{Proj}_{v_1} x_2$.]

To get an orthonormal basis for $\mathbb{R}^2$, we normalize the vectors $v_1$ and $v_2$, and get
$$u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad\text{and}\quad u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Then $\{u_1, u_2\}$ forms an orthonormal basis for $\mathbb{R}^2$. Notice that this orthonormal basis coincides with the standard basis of $\mathbb{R}^2$.

Exercise. Use the Gram-Schmidt process to find an orthonormal basis of the column space of the following matrix.
$$\begin{bmatrix} 3 & -5 & 1 \\ 1 & 1 & 1 \\ -1 & 5 & -2 \\ 3 & -7 & 8 \end{bmatrix}$$

Least-Squares Problems

When we solve a linear system, sometimes we run into the trouble that the system may not be consistent, which means there is no solution exactly solving the system. However, we sometimes only need an approximate solution. The least-squares method provides a way to find the best approximation of a solution.

Example 9. Given three points $(2, 3)$, $(3, 4)$, and $(4, 5)$, find a line $y = mx + b$ that best approximates these three points. We can set up a system of linear equations to solve for $m$ and $b$:

2 · m + b = 3; 3 · m + b = 4; 4 · m + b = 5.

This system corresponds to the augmented matrix and reduced echelon form

2 1 3 1 0 1     3 1 4 ∼ 0 1 1 4 1 5 0 0 0

Thus, when m = 1, b = 1, the line y = x + 1 passes through these three given points.

Example 10. Given three points (2, 3), (3, 2), and (4, 5), find a line y = mx + b that best approximates these three points. Once again, we set up a system of linear equations

2 · m + b = 3; 3 · m + b = 2; 4 · m + b = 5.

The augmented matrix and reduced echelon form would be

2 1 3 1 0 0     3 1 4 ∼ 0 1 0 4 1 5 0 0 1

Notice that the system is inconsistent. This makes sense because the three points given are not colinear. Now let’s try to find a best to these three points. Let     2 1 3 " # m A = 3 1 , b = 2 , x∗ =     b 4 1 5

We know that $Ax^*$ gives a linear combination of the column vectors of $A$, so $Ax^* \in \mathrm{Col}(A)$. However, $b \notin \mathrm{Col}(A)$, as the linear system $Ax = b$ is inconsistent. The best approximation of $b$ in the subspace $\mathrm{Col}(A) \subseteq \mathbb{R}^3$ would be the orthogonal projection
$$b^* = \mathrm{Proj}_{\mathrm{Col}(A)}\, b.$$
This projection lands in the column space $\mathrm{Col}(A)$. Thus the system $Ax^* = b^*$ is consistent. Now we need to solve for the least-squares solution $x^*$. Because $b - b^*$ is orthogonal to $\mathrm{Col}(A)$, we may set
$$A^T(b - Ax^*) = A^T(b - b^*) = 0.$$

Hence, we obtain the normal equation
$$A^T A x^* = A^T b.$$
By the above argument, the normal equation is always consistent, and we can always find a (not necessarily unique) least-squares solution $x^*$ to the system. Now we go back to the example we are given.
$$A^T A = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 29 & 9 \\ 9 & 3 \end{bmatrix}, \quad\text{and}\quad A^T b = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 32 \\ 10 \end{bmatrix}.$$

The least-squares system has augmented matrix and reduced echelon form
$$\begin{bmatrix} 29 & 9 & 32 \\ 9 & 3 & 10 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1/3 \end{bmatrix}$$

Thus, when m = 1, b = 1/3, the line y = x + 1/3 is the best approximation to these three points.
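As a computational cross-check of Example 10, assuming NumPy, one can solve the normal equation directly or call the built-in least-squares routine:

```python
# A cross-check of Example 10 (assuming NumPy): solve the normal equation A^T A x* = A^T b,
# or call NumPy's built-in least-squares solver.
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
b = np.array([3.0, 2.0, 5.0])

x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(x_star)                                  # [1.         0.33333333], i.e. m = 1, b = 1/3

print(np.linalg.lstsq(A, b, rcond=None)[0])    # same least-squares solution
```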

Exercise. Find the trigonometric function of the form $f(x) = c_0 + c_1 \sin^2(x) + c_2 \cos^3(x)$ that best fits the data points $(0, 0)$, $(\pi/2, 1)$, $(\pi, 2)$, $(3\pi/2, 3)$.