
Notes for Chapter 5 – Orthogonality

5.1 The Scalar Product in R^n

The Euclidean space R^n is the set R^n = {x = (x_1, x_2, ···, x_n) : x_j ∈ R for j = 1, ···, n}. On R^n we define vector addition and scalar multiplication as follows.

1. Given vectors x = (x1, x2, ··· , xn) and y = (y1, y2, ··· , yn) we define x + y by

x + y = (x1 + y1, x2 + y2, ··· , xn + yn).

2. Given a scalar α and a vector x = (x1, x2, ··· , xn) we define αx by

αx = (αx1, αx2, ··· , αxn).

The scalar product (or dot product) of x and y ∈ R^n is defined by

x · y = x^T y = \sum_{j=1}^{n} x_j y_j.   (1)

We use the scalar product to define a length function on R^n called the Euclidean norm.

Definition 1. 1. The length of a vector x ∈ R^n is defined by

\|x\| = (x^T x)^{1/2} = \Big( \sum_{j=1}^{n} x_j^2 \Big)^{1/2}.   (2)

2. The distance between two vectors x and y ∈ R^n is

\|x − y\| = \Big( \sum_{j=1}^{n} (x_j − y_j)^2 \Big)^{1/2}.   (3)

3. If θ is the angle between two vectors x and y ∈ R^n then

\cos(θ) = \frac{x^T y}{\|x\| \, \|y\|}.   (4)

4. A vector u is called a unit vector if \|u\| = 1. For any nonzero vector x ∈ R^n the direction vector u pointing in the direction of x is the unit vector u = x/\|x\|.

5. Two vectors x and y ∈ R^n are said to be orthogonal if x^T y = 0, i.e., if the angle between them is 90°.

6. The scalar projection of a vector x onto a vector y is \frac{x^T y}{\|y\|}.

7. The vector projection of a vector x onto a vector y is P_y(x) = \frac{x^T y}{\|y\|^2} \, y.

8. If vectors x and y are orthogonal then \|x + y\|^2 = \|x\|^2 + \|y\|^2.

9. In general, for all vectors x and y we have \|x + y\| ≤ \|x\| + \|y\|, which is often called the triangle inequality.

10. (Cauchy-Schwarz Inequality) For all vectors x and y we have

|x^T y| ≤ \|x\| \, \|y\|.   (5)
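The quantities above are easy to compute numerically. The following is a minimal sketch using NumPy (the vectors x and y are arbitrary example data, not taken from the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = x @ y                                   # scalar product x^T y
norm_x = np.sqrt(x @ x)                       # length ||x||, same as np.linalg.norm(x)
theta = np.arccos(dot / (norm_x * np.linalg.norm(y)))   # angle between x and y

scalar_proj = dot / np.linalg.norm(y)         # scalar projection of x onto y
vector_proj = (dot / (y @ y)) * y             # vector projection P_y(x)

# Cauchy-Schwarz: |x^T y| <= ||x|| ||y||
assert abs(dot) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
print(theta, scalar_proj, vector_proj)
```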

If P_1 and P_2 are points in R^n, then the vector from P_1 = (x_1, x_2, ···, x_n) to P_2 = (y_1, y_2, ···, y_n) is denoted by \overrightarrow{P_1P_2}. This vector can be written as

\overrightarrow{P_1P_2} = < y_1 − x_1, y_2 − x_2, ···, y_n − x_n >.

If N is a nonzero vector and P_0 is a point in R^n, then the set of points P = (x_1, x_2, ···, x_n) ∈ R^n satisfying N^T \overrightarrow{P_0P} = 0 forms a hyperplane in R^n that passes through the point P_0 with normal vector N. For example, if N = < a_1, a_2, ···, a_n > and P_0 = (x_1^0, x_2^0, ···, x_n^0) then the equation of the hyperplane is

N^T \overrightarrow{P_0P} = a_1(x_1 − x_1^0) + a_2(x_2 − x_2^0) + ··· + a_n(x_n − x_n^0) = 0.

In R^3 a hyperplane is called a plane, and the above formula is the one you learned in calculus III for the equation of a plane. In R^2 a hyperplane is called a line, and the above formula is the familiar point-normal equation of a line. We can use this information to answer some simple basic problems. For example, to find the distance d from a point P_1 = (x_1, y_1, z_1) in R^3 to the plane N^T \overrightarrow{P_0P} = 0, we can use the scalar projection to find

d = \frac{|N^T \overrightarrow{P_0P_1}|}{\|N\|} = \frac{|a_1(x_1 − x_0) + a_2(y_1 − y_0) + a_3(z_1 − z_0)|}{\sqrt{a_1^2 + a_2^2 + a_3^2}}.

The analog of this in R^2 is to find the distance d from a point to a line. In this case the line may be written as follows: the line passing through a point P_0 = (x_0, y_0) and orthogonal to a vector N = < a_1, a_2 > is given by

N^T \overrightarrow{P_0P} = a_1(x − x_0) + a_2(y − y_0) = 0.

The distance from a point P_1 = (x_1, y_1) in R^2 to the line N^T \overrightarrow{P_0P} = 0 is

d = \frac{|N^T \overrightarrow{P_0P_1}|}{\|N\|} = \frac{|a_1(x_1 − x_0) + a_2(y_1 − y_0)|}{\sqrt{a_1^2 + a_2^2}}.
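A short numerical check of the distance formula (a sketch; the normal vector, the point P0 on the plane, and the point P1 below are made-up example data):

```python
import numpy as np

# Plane through P0 with normal N:  N^T (P - P0) = 0
N  = np.array([1.0, 2.0, 2.0])    # normal vector <a1, a2, a3>
P0 = np.array([0.0, 0.0, 1.0])    # a point on the plane
P1 = np.array([3.0, 1.0, 2.0])    # the point whose distance we want

# d = |N^T (P1 - P0)| / ||N||  (scalar projection of the vector P0P1 onto N)
d = abs(N @ (P1 - P0)) / np.linalg.norm(N)
print(d)    # |3 + 2 + 2| / 3 = 7/3 for this data
```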

5.2 Orthogonal Subspaces

Definition 2. 1. Two subspaces X and Y of R^n are called orthogonal if x^T y = 0 for every x ∈ X and y ∈ Y. In this case we write X ⊥ Y.

2. If Y is a subspace of R^n then the orthogonal complement of Y, denoted Y^⊥, is defined by

Y^⊥ = {x ∈ R^n : x^T y = 0 for all y ∈ Y}.   (6)

Example 1. Let A be an m × n matrix. Then A generates a linear mapping from R^n to R^m by T(x) = Ax. The column space of A is the same as the image of T. We denote this range space by R(A):

R(A) = {b ∈ R^m : b = Ax for some x ∈ R^n}.   (7)

The column space of A^T, i.e., R(A^T) ⊂ R^n, is

R(A^T) = {y ∈ R^n : y = A^T x for some x ∈ R^m}.   (8)

If y ∈ R(A^T) ⊂ R^n and x ∈ N(A) then we claim x and y are orthogonal, i.e., R(A^T) ⊥ N(A). Recall that y ∈ R(A^T) means there is a vector x_0 ∈ R^m such that y = A^T x_0. So we have

x^T y = x^T A^T x_0 = (Ax)^T x_0 = 0

since x ∈ N(A).

This gives us the so-called Fundamental Subspace Theorem:

Theorem 1. Let A be an m × n matrix. Then we have

N(A) = R(A^T)^⊥ and N(A^T) = R(A)^⊥.   (9)
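A quick numerical sanity check of Theorem 1 using SymPy (the matrix A is an arbitrary example): every basis vector of N(A) should be orthogonal to every vector spanning R(A^T).

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6]])            # rank 1, so N(A) is 2-dimensional

for n in A.nullspace():               # basis vectors of N(A)
    for r in A.T.columnspace():       # vectors spanning R(A^T)
        assert (n.T * r)[0, 0] == 0   # each pair is orthogonal: N(A) = R(A^T)^perp
```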

Definition 3. Suppose U and V are subspaces of a vector space W and each w ∈ W can be written uniquely in the form w = u + v with u ∈ U and v ∈ V. Notice that this is really two statements: 1) every w can be written in this form, and 2) the representation is unique. When this is the case we say that W is the direct sum of U and V, and we write W = U ⊕ V.

We also have the following general results for subspaces of the vector space R^n:

Theorem 2. 1. Let S be a subspace of R^n. Then we have dim(S) + dim(S^⊥) = n. Further, if {x_j}_{j=1}^{r} is a basis for S and {x_j}_{j=r+1}^{n} is a basis for S^⊥, then {x_j}_{j=1}^{n} is a basis for R^n.

2. If S is a subspace of R^n then S ⊕ S^⊥ = R^n.

3. If S is a subspace of R^n then (S^⊥)^⊥ = S.

4. If A is an m × n matrix and b ∈ R^m, then one of the following alternatives must hold:

(a) There is an x ∈ R^n so that Ax = b, or,

(b) There is a vector y ∈ R^m so that A^T y = 0 (i.e., y ∈ N(A^T)) and y^T b ≠ 0.

5. In other words, b ∈ R(A) ⇔ b ⊥ N(A^T). N.B. This is a very important result.

(a) When we want to solve Ax = b it can be very important to have a test to decide if the problem is solvable, that is, whether b ∈ R(A). This result tells us that if we find a basis for N(A^T) we can check whether b ∈ R(A) by simply checking whether b ⊥ y for all y in a basis for N(A^T).

(b) You will also find that this is the basis for the method of least squares in the next section.

6. We can write R^n = R(A^T) ⊕ N(A).

Remark 1. 1. To find a basis for R(A) we find the row echelon form of A to determine the pivot columns and then take exactly those columns of A.

3 1 2 3 1  2. We can also proceed as in the following example. Given A = 1 3 5 −2. We note 3 8 13 −3 that a basis for the R(A) is the same thing as a basis for the column space of A. Now we also know that the column space of A is the same as the row space of AT so we find the row echelon form U of AT and take the of the pivot rows. Here

1 1 3  1 1 3 T 2 3 8  0 1 2 A =   ⇒ U =   . 3 5 13  0 0 0 1 −2 −3 0 0 0

Therefore < 1, 1, 3 > and < 0, 1, 2 > form a basis for R(A).
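The computation in this example can be checked with SymPy (an illustrative sketch, not part of the notes). Note that SymPy's rref() returns the reduced row echelon form, so the nonzero rows it produces may differ from the U above, but they span the same space.

```python
import sympy as sp

A = sp.Matrix([[1, 2,  3,  1],
               [1, 3,  5, -2],
               [3, 8, 13, -3]])

R, pivots = A.T.rref()        # reduced row echelon form of A^T
print(R)                      # its nonzero rows span the column space of A
print(A.columnspace())        # SymPy's own basis for R(A), for comparison
```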

5.3 Least Squares

When we have an overdetermined linear system it is unlikely there will be a solution. Nevertheless, this is exactly the type of problem very often considered in applications. What happens in these applications is that one seeks a so-called least squares solution, something that is close to being a solution in a certain sense. In particular, we have an overdetermined system Ax = b where A is m × n with m > n (usually much larger).

Definition 4 (Least Squares Problem). Given a system Ax = b, where A is m × n with m > n and b ∈ R^m, for each x ∈ R^n we can form the residual r(x) = b − Ax.

The distance from b to Ax is \|b − Ax\| = \|r(x)\|.

1. The least squares problem is to find a vector x ∈ R^n for which \|r(x)\| is minimum.

2. A vector \hat{x} for which \|r(\hat{x})\| = \min_{x ∈ R^n} \|r(x)\| is called a least squares solution of Ax = b.

3. If A is m × n with m > n and Rank(A) = n, then

P = A(A^T A)^{-1} A^T

is called the projection matrix mapping R^m onto the column space of A.

Theorem 3. If A is m × n with m > n and Rank(A) = n, then for each b ∈ R^m the normal equations A^T A x = A^T b have a unique solution given by

\hat{x} = (A^T A)^{-1} A^T b,

which is the unique least squares solution of Ax = b.
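A minimal numerical sketch of Theorem 3 (the data below is an assumed example): solve the normal equations directly and compare with NumPy's built-in least squares routine.

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns, Rank(A) = 2
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with NumPy's built-in least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal, x_lstsq)      # the two answers agree up to roundoff
```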

Example 2. Given a collection of points {(x_j, y_j)}_{j=1}^{n} in R^2, let x = (x_1, x_2, ···, x_n)^T and y = (y_1, y_2, ···, y_n)^T. Let

\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad \bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j,

and let y = c_0 + c_1 x be a line that gives the least squares fit to the points. To find the best fit by a linear equation means we must solve

\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.

Let

A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad x = \begin{pmatrix} c_0 \\ c_1 \end{pmatrix}, \qquad b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.

Then the normal equations A^T A x = A^T b become

\begin{pmatrix} n & \sum_{j=1}^{n} x_j \\ \sum_{j=1}^{n} x_j & \sum_{j=1}^{n} x_j^2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{n} y_j \\ \sum_{j=1}^{n} x_j y_j \end{pmatrix}.

Dividing both sides of the equation by n gives

\begin{pmatrix} 1 & \bar{x} \\ \bar{x} & \frac{1}{n}\|x\|^2 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \frac{1}{n} x^T y \end{pmatrix}.

We could solve the problem by analyzing the augmented matrix

\left( \begin{array}{cc|c} 1 & \bar{x} & \bar{y} \\ \bar{x} & \frac{1}{n}\|x\|^2 & \frac{1}{n} x^T y \end{array} \right).

But we can also try to compute (A^T A)^{-1} and multiply both sides by it. In order to do that we must compute the determinant:

\frac{1}{n^2} \det(A^T A) = \frac{1}{n}\|x\|^2 − (\bar{x})^2.

Under the assumption that this is not zero we can readily solve the normal equations. Consider the special case in which the data satisfies the condition \bar{x} = 0 (the x-data has mean zero); then the system of equations becomes diagonal and we have

c_0 = \bar{y}, \qquad c_1 = \frac{x^T y}{x^T x}.
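A sketch of this special case in NumPy (example data assumed): center the x-data so that \bar{x} = 0, apply the diagonal formulas, and then shift back to obtain the intercept for the original data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 2.9, 4.2, 4.8])

xbar, ybar = x.mean(), y.mean()
xc = x - xbar                      # centered x-data has mean zero

# With mean-zero x-data the normal equations are diagonal:
c1 = (xc @ y) / (xc @ xc)          # slope
c0 = ybar - c1 * xbar              # intercept for the original (uncentered) data

print(c0, c1)                      # compare with np.polyfit(x, y, 1)
```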

5.4 Inner Product Spaces

Definition 5. Let X be a (real) vector space.

1. A real inner product ⟨·, ·⟩ is a function from X × X to R satisfying:

(a) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ X.

(b) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ X and scalars α ∈ R.

(c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X.

(d) ⟨x, x⟩ ≥ 0 for all x ∈ X, and ⟨x, x⟩ = 0 if and only if x = 0.

2. A vector space with a real inner product is called a real inner product space.

Definition 6 (Chapter 6). Let X be a (complex) vector space.

1. A complex inner product ⟨·, ·⟩ is a function from X × X to C satisfying:

(a) ⟨x, y⟩ = \overline{⟨y, x⟩} for all x, y ∈ X.

(b) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ X and scalars α ∈ C.

(c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X.

(d) ⟨x, x⟩ ≥ 0 for all x ∈ X, and ⟨x, x⟩ = 0 if and only if x = 0.

2. A vector space with a complex inner product is called a complex inner product space.

Note that for a complex vector space we have

⟨x, αy⟩ = \overline{⟨αy, x⟩} = \overline{α}\,\overline{⟨y, x⟩} = \overline{α}\,⟨x, y⟩, and therefore ⟨x, αy⟩ = \overline{α}\,⟨x, y⟩.   (10)

Example 3. 1. X = R^n is a real inner product space with ⟨x, y⟩ = \sum_{j=1}^{n} x_j y_j.

2. L^2(a, b) is a real inner product space with ⟨f, g⟩ = \int_a^b f(x)\, g(x)\, dx.

3. R^{m×n} is a real vector space consisting of all m × n matrices A. An inner product and norm on R^{m×n} are

A · B = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}, \qquad \|A\| = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \Big)^{1/2}.

Theorem 4 (Schwarz Inequality). For any x, y in a complex (or real) inner product space X we have

|⟨x, y⟩| ≤ ⟨x, x⟩^{1/2} ⟨y, y⟩^{1/2}.   (11)

Equality holds if and only if y is a multiple of x.
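A quick numerical check of the matrix inner product in item 3 and of the Schwarz inequality (the matrices A and B are arbitrary examples):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

inner = np.sum(A * B)                  # A . B = sum_ij a_ij b_ij
normA = np.sqrt(np.sum(A * A))         # ||A||, the Frobenius norm
normB = np.linalg.norm(B, 'fro')       # same norm computed by NumPy

# Schwarz inequality: |<A, B>| <= ||A|| ||B||
assert abs(inner) <= normA * normB + 1e-12
print(inner, normA, normB)
```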

Remark 2. Every inner product space has a norm (and hence a distance function) defined by

\|x\| = ⟨x, x⟩^{1/2}.   (12)

Lemma 1. The norm induced from the inner product satisfies the parallelogram law

\|x + y\|^2 + \|x − y\|^2 = 2\|x\|^2 + 2\|y\|^2.

By analogy with the special inner product space R^n we make the following definitions for general inner product spaces.

Definition 7. 1. Two vectors x and y in X are orthogonal if ⟨x, y⟩ = 0.

2. A subset S ⊂ X is an orthogonal set if each pair of distinct elements of S is orthogonal.

3. A set S is orthonormal if it is orthogonal and every element x ∈ S satisfies \|x\| = 1.

Remark 3. If ⟨x, y⟩ = 0 then the parallelogram law reduces to the Pythagorean theorem

\|x − y\|^2 = \|x\|^2 + \|y\|^2 \quad and \quad \|x + y\|^2 = \|x\|^2 + \|y\|^2.

By definition, an inner product space X is a vector space, so a subset M ⊂ X is a subspace if αx + βy ∈ M for all x, y ∈ M and scalars α and β. Thus if x_0 ∈ X, then M = {αx_0 : α a scalar} = Span{x_0} is a subspace.

Definition 8. The (orthogonal) projection of x ∈ X on M = Span{x_0} is defined by

P x = \frac{⟨x, x_0⟩}{\|x_0\|^2}\, x_0.

Note that the following properties of P hold.

1. P^2 = P. Namely, we have P^2 x = P x for every x.

2. ⟨P x, y⟩ = ⟨x, P y⟩.

3. ⟨P x, (I − P)y⟩ = 0.

4. For every x ∈ X we have x = P x + (I − P)x and P x ⊥ (I − P)x.
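The properties of P listed above can be verified numerically for a rank-one projection in R^n (a sketch; the vectors x0 and x are arbitrary examples):

```python
import numpy as np

x0 = np.array([1.0, 2.0, 2.0])
P = np.outer(x0, x0) / (x0 @ x0)     # matrix of P x = (<x, x0> / ||x0||^2) x0

x = np.array([3.0, -1.0, 4.0])

assert np.allclose(P @ P, P)                      # 1. P^2 = P
assert np.allclose(P, P.T)                        # 2. <Px, y> = <x, Py> (P is symmetric)
assert np.isclose((P @ x) @ (x - P @ x), 0.0)     # 3./4. Px is orthogonal to (I - P)x
```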

Definition 9. 1. Given x and y in X we define the angle θ between x and y by the formula

\cos(θ) = \frac{⟨x, y⟩}{\|x\| \, \|y\|}.

2. If M ⊂ X then the orthogonal complement of M, denoted M^⊥, is the subspace

M^⊥ = {x ∈ X : ⟨x, m⟩ = 0 for all m ∈ M}.

Theorem 5 (Projection Theorem). Let M be a subspace of X. For every x ∈ X there exist unique x_0 ∈ M and y_0 ∈ M^⊥ such that x = x_0 + y_0. We call x_0 the orthogonal projection of x onto M and denote it by x_0 = P_M(x). We note that P_M is an orthogonal projection and (I − P_M) is also an orthogonal projection, namely the projection onto M^⊥.

More generally we have

Theorem 6 (Projection Theorem). Let S be a subspace of an inner product space V and let x ∈ V. Let {u_j}_{j=1}^{n} be an orthonormal basis for S. Then

p = \sum_{j=1}^{n} ⟨x, u_j⟩\, u_j

is the orthogonal projection of x onto S. We have p − x ∈ S⊥ and

\|p − x\| < \|y − x\| for all y ∈ S with y ≠ p.

Further, if we define the matrix U with columns {u_j}_{j=1}^{n}, i.e., U = [u_1, u_2, ···, u_n], then

p = U U^T x.
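A numerical sketch of Theorem 6 in R^4 (assumed example data): build an orthonormal basis for S with a QR factorization, form p = U U^T x, and check that p − x is orthogonal to S.

```python
import numpy as np

# S = span of two (non-orthonormal) vectors in R^4, stored as columns of X
X = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0]])

U, _ = np.linalg.qr(X)        # columns of U form an orthonormal basis for S
x = np.array([1.0, 2.0, 3.0, 4.0])

p = U @ (U.T @ x)             # orthogonal projection of x onto S
assert np.allclose(U.T @ (p - x), 0.0)   # p - x is orthogonal to S
print(p)
```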

5.5 Gram-Schmidt

Given a basis {x_j}_{j=1}^{n} for an inner product space V we want to compute from it an orthonormal basis {u_j}_{j=1}^{n}. The process for doing this is called Gram-Schmidt orthogonalization.

Theorem 7 (Gram-Schmidt Process). Let {x_j}_{j=1}^{n} be a basis for an inner product space V. Let

u_1 = \frac{1}{\|x_1\|}\, x_1.

Now define u_2, u_3, ···, u_n recursively by

u_{k+1} = \frac{1}{\|x_{k+1} − p_k\|}\, (x_{k+1} − p_k) \quad for k = 1, ···, n − 1,

where

p_k = \sum_{j=1}^{k} ⟨x_{k+1}, u_j⟩\, u_j.
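A direct implementation of this recursion for the standard inner product on R^n (a sketch, not from the notes):

```python
import numpy as np

def gram_schmidt(xs):
    """Orthonormalize a list of linearly independent vectors in R^n."""
    us = []
    for x in xs:
        # p_k = sum of <x_{k+1}, u_j> u_j over the u_j found so far
        p = sum(((x @ u) * u for u in us), np.zeros_like(x))
        v = x - p
        us.append(v / np.linalg.norm(v))
    return us

basis = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                      np.array([1.0, 0.0, 1.0]),
                      np.array([0.0, 1.0, 1.0])])
Q = np.column_stack(basis)
print(np.round(Q.T @ Q, 10))   # should be the 3 x 3 identity matrix
```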
