Notes for Chapter 5 – Orthogonality

5.1 The Scalar Product in R^n

The vector space R^n = { x = (x_1, x_2, ..., x_n) : x_j ∈ R for j = 1, ..., n }. Here we define vector addition and scalar multiplication as follows.
1. Given vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) we define x + y by

   x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n).

2. Given a scalar α and a vector x = (x_1, x_2, ..., x_n) we define αx by

   αx = (αx_1, αx_2, ..., αx_n).
The scalar product (or dot product) of x and y ∈ R^n is defined by

   x · y = x^T y = \sum_{j=1}^n x_j y_j.   (1)
We use the scalar product to define a length function, the norm, which in turn gives a distance function.
Definition 1. 1. The length of a vector x ∈ R^n is defined by

   \|x\| = (x^T x)^{1/2} = \Big( \sum_{j=1}^n x_j^2 \Big)^{1/2}.   (2)
2. The distance between two vectors x and y ∈ R^n is

   \|x - y\| = \Big( \sum_{j=1}^n (x_j - y_j)^2 \Big)^{1/2}.   (3)
3. If θ is the angle between two vectors x and y ∈ R^n then

   cos(θ) = \frac{x^T y}{\|x\| \, \|y\|}.   (4)
4. A vector u is called a unit vector if \|u\| = 1. For any nonzero vector x ∈ R^n, a direction vector u pointing in the direction of x is the unit vector u = x/\|x\|.
5. Two vectors x and y ∈ R^n are said to be orthogonal if x^T y = 0, i.e., if the angle between them is 90°.

6. The scalar projection of a vector x onto a vector y is \frac{x^T y}{\|y\|}.

7. The vector projection of a vector x onto a vector y is P_y(x) = \frac{x^T y}{\|y\|^2} \, y.

8. If vectors x and y are orthogonal then \|x + y\|^2 = \|x\|^2 + \|y\|^2.
9. In general, for all vectors x and y we have \|x + y\| ≤ \|x\| + \|y\|, which is often called the triangle inequality.

10. (Cauchy–Schwarz Inequality) For all vectors x and y we have

   |x^T y| ≤ \|x\| \, \|y\|.   (5)
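The quantities in Definition 1 are easy to check numerically. The following sketch, assuming NumPy is available and using made-up example vectors, computes the norm, angle, and projections and verifies the Cauchy–Schwarz inequality:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])   # example vectors (arbitrary choices)
y = np.array([3.0, 0.0, 4.0])

norm_x = np.sqrt(x @ x)                              # ||x|| = (x^T x)^{1/2}, here 3.0
cos_theta = (x @ y) / (norm_x * np.linalg.norm(y))   # equation (4)

scalar_proj = (x @ y) / np.linalg.norm(y)            # scalar projection of x onto y
vector_proj = ((x @ y) / (y @ y)) * y                # vector projection P_y(x)

# Cauchy-Schwarz: |x^T y| <= ||x|| ||y||
assert abs(x @ y) <= norm_x * np.linalg.norm(y)

# x - P_y(x) is orthogonal to y
assert abs((x - vector_proj) @ y) < 1e-12
print(norm_x, cos_theta)
```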
If P_1 = (x_1, x_2, ..., x_n) and P_2 = (y_1, y_2, ..., y_n) are points in R^n, then the vector from P_1 to P_2 is denoted by \vec{P_1 P_2}. This vector can be written as

   \vec{P_1 P_2} = < y_1 - x_1, y_2 - x_2, ..., y_n - x_n >.
If N is a nonzero vector and P_0 is a point in R^n, then the set of points P = (x_1, x_2, ..., x_n) ∈ R^n satisfying N^T \vec{P_0 P} = 0 forms a hyperplane in R^n that passes through the point P_0 with normal vector N. For example, if N = < a_1, a_2, ..., a_n > and P_0 = (x_1^0, x_2^0, ..., x_n^0) then the equation of the hyperplane is

   N^T \vec{P_0 P} = a_1(x_1 - x_1^0) + a_2(x_2 - x_2^0) + ··· + a_n(x_n - x_n^0) = 0.
In R^3 a hyperplane is called a plane, and the above formula is the one you learned in Calculus III for the equation of a plane. In R^2 a hyperplane is called a line, and the above formula gives the familiar equation of a line.

We can use this information to answer some simple basic problems. For example, to find the distance d from a point P_1 = (x_1, y_1, z_1) in R^3 to the plane N^T \vec{P_0 P} = 0, we can use the scalar projection to find

   d = \frac{|N^T \vec{P_0 P_1}|}{\|N\|} = \frac{|a_1(x_1 - x_0) + a_2(y_1 - y_0) + a_3(z_1 - z_0)|}{\sqrt{a_1^2 + a_2^2 + a_3^2}}.

The analog of this in R^2 is to find the distance d from a point to a line. In this case the line may be written as follows: the line passing through a point P_0 = (x_0, y_0) and orthogonal to a vector N = < a_1, a_2 > is given by

   N^T \vec{P_0 P} = a_1(x - x_0) + a_2(y - y_0) = 0.
The distance from a point P_1 = (x_1, y_1) in R^2 to the line N^T \vec{P_0 P} = 0 is

   d = \frac{|N^T \vec{P_0 P_1}|}{\|N\|} = \frac{|a_1(x_1 - x_0) + a_2(y_1 - y_0)|}{\sqrt{a_1^2 + a_2^2}}.
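As a quick numerical check of the distance formula, here is a sketch (the normal vector and points are made-up examples) computing the distance from a point to a plane in R^3:

```python
import numpy as np

# Distance from P1 to the plane through P0 with normal N,
# using the scalar projection formula d = |N^T (P1 - P0)| / ||N||.
N  = np.array([1.0, 2.0, 2.0])
P0 = np.array([0.0, 0.0, 1.0])   # a point on the plane
P1 = np.array([3.0, 1.0, 2.0])   # the point whose distance we want

d = abs(N @ (P1 - P0)) / np.linalg.norm(N)
print(d)   # |1*3 + 2*1 + 2*1| / 3 = 7/3
```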
5.2 Orthogonal Subspaces
Definition 2. 1. Two subspaces X and Y of R^n are called orthogonal if x^T y = 0 for every x ∈ X and y ∈ Y. In this case we write X ⊥ Y.
2. If Y is a subspace of R^n then the orthogonal complement of Y, denoted Y^⊥, is defined by

   Y^⊥ = { x ∈ R^n : x^T y = 0 for all y ∈ Y }.   (6)
Example 1. Let A be an m × n matrix. Then A generates a linear mapping from R^n to R^m by T(x) = Ax. The column space of A is the same as the image of T. We denote the range space by R(A):

   R(A) = { b ∈ R^m : b = Ax for some x ∈ R^n }.   (7)

The column space of A^T, i.e., R(A^T) ⊂ R^n, is

   R(A^T) = { y ∈ R^n : y = A^T x for some x ∈ R^m }.   (8)
If y ∈ R(A^T) ⊂ R^n and x ∈ N(A), then we claim x and y are orthogonal, i.e., R(A^T) ⊥ N(A). Recall that y ∈ R(A^T) means there is a vector x_0 ∈ R^m such that y = A^T x_0. So we have

   x^T y = x^T A^T x_0 = (Ax)^T x_0 = 0

since x ∈ N(A).
This gives us the so-called Fundamental Subspace Theorem:

Theorem 1. Let A be an m × n matrix. Then we have

   N(A) = R(A^T)^⊥  and  N(A^T) = R(A)^⊥.   (9)
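The claim R(A^T) ⊥ N(A) can be verified on a concrete matrix; the sketch below uses a made-up rank-1 matrix A and a hand-picked null space vector:

```python
import numpy as np

# Check R(A^T) ⊥ N(A) on a small example.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so N(A) is 2-dimensional

x = np.array([3.0, 0.0, -1.0])   # x ∈ N(A): check A x = 0
assert np.allclose(A @ x, 0)

x0 = np.array([5.0, -2.0])       # any x0 ∈ R^m
y  = A.T @ x0                    # y ∈ R(A^T)

# x^T y = x^T A^T x0 = (A x)^T x0 = 0
assert abs(x @ y) < 1e-12
```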
Definition 3. If U and V are subspaces of a vector space W and each w ∈ W can be written uniquely in the form w = u + v with u ∈ U and v ∈ V, we say that W is the direct sum of U and V and we write W = U ⊕ V. Notice that this is really two statements: 1) every w can be written in this form, and 2) the representation is unique.
We also have the following general results for subspaces of the vector space R^n:

Theorem 2. 1. Let S be a subspace of R^n. Then we have dim(S) + dim(S^⊥) = n. Further, if {x_j}_{j=1}^r is a basis for S and {x_j}_{j=r+1}^n is a basis for S^⊥, then {x_j}_{j=1}^n is a basis for R^n.
2. If S is a subspace of R^n then S ⊕ S^⊥ = R^n.

3. If S is a subspace of R^n then (S^⊥)^⊥ = S.
4. If A is an m × n matrix and b ∈ R^m, then one of the following alternatives must hold:
(a) There is an x ∈ R^n so that Ax = b, or,

(b) There is a vector y ∈ R^m so that A^T y = 0 (i.e., y ∈ N(A^T)) and y^T b ≠ 0.

5. In other words, b ∈ R(A) ⇔ b ⊥ N(A^T). N.B. This is a very important result.
(a) When we want to solve Ax = b it can be very important to have a test to decide if the problem is solvable, that is, whether b ∈ R(A). This result tells us that if we find a basis for N(A^T), we can check whether b ∈ R(A) by simply checking whether b ⊥ y for all y in a basis for N(A^T).

(b) You will also find that this is the basis for the method of least squares in the next section.
6. We can write

   R^n = R(A^T) ⊕ N(A).

Remark 1. 1. To find a basis for R(A), we find the row echelon form of A to determine the pivot columns, then take exactly those columns of A.
2. We can also proceed as in the following example. Given

   A = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 3 & 5 & -2 \\ 3 & 8 & 13 & -3 \end{bmatrix},

we note that a basis for R(A) is the same thing as a basis for the column space of A. Now we also know that the column space of A is the same as the row space of A^T, so we find the row echelon form U of A^T and take the transpose of the pivot rows. Here

   A^T = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 3 & 8 \\ 3 & 5 & 13 \\ 1 & -2 & -3 \end{bmatrix}  ⇒  U = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
Therefore < 1, 1, 3 > and < 0, 1, 2 > form a basis for R(A).
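The same computation can be checked with a computer algebra system. The sketch below, assuming SymPy is available, row reduces A^T; note that SymPy's `rref` continues to the fully reduced form, so it returns the basis < 1, 0, 1 >, < 0, 1, 2 >, which differs from the one found by hand but spans the same column space:

```python
from sympy import Matrix

# The matrix from the example above.
A = Matrix([[1, 2,  3,  1],
            [1, 3,  5, -2],
            [3, 8, 13, -3]])

# Row reduce A^T; the nonzero rows of the fully reduced form,
# transposed, give a basis for the column space R(A).
R, pivots = A.T.rref()
print(R)
print(pivots)   # (0, 1): A^T has rank 2
```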
5.3 Least Squares

When we have an overdetermined linear system it is unlikely there will be a solution. Nevertheless, this is exactly the type of problem very often considered in applications. What happens in these applications is that one seeks a so-called least squares solution: something that is close to being a solution in a certain sense. In particular, we have an overdetermined system Ax = b where A is m × n with m > n (usually much larger).
Definition 4 (Least Squares Problem). Given a system Ax = b, where A is m × n with m > n and b ∈ R^m, for each x ∈ R^n we can form the residual r(x) = b − Ax.

The distance from b to Ax is \|b − Ax\| = \|r(x)\|.

1. The least squares problem is to find a vector x ∈ R^n for which \|r(x)\| is minimum.
2. A vector \hat{x} for which

   \|r(\hat{x})\| = \min_{x ∈ R^n} \|r(x)\|

is called a least squares solution of Ax = b.
3. If A is m × n with m > n and rank(A) = n then

   P = A(A^T A)^{-1} A^T

is called the projection matrix mapping R^m onto the column space of A.
Theorem 3. If A is m × n with m > n and rank(A) = n, then for each b ∈ R^m the normal equations

   A^T A x = A^T b

have a unique solution given by

   \hat{x} = (A^T A)^{-1} A^T b,

which is the unique least squares solution of Ax = b.
Example 2. Given a collection of points {(x_j, y_j)}_{j=1}^n in R^2, let x = (x_1, x_2, ..., x_n)^T and y = (y_1, y_2, ..., y_n)^T. Let

   \bar{x} = \frac{1}{n} \sum_{j=1}^n x_j,   \bar{y} = \frac{1}{n} \sum_{j=1}^n y_j,

and let y = c_0 + c_1 x be a linear function that gives the least squares fit to the points. To find the best fit by a linear equation means we must solve

   \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.

Let

   A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},   c = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix},   b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.

Then the normal equations A^T A c = A^T b become

   \begin{bmatrix} n & \sum_{j=1}^n x_j \\ \sum_{j=1}^n x_j & \sum_{j=1}^n x_j^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^n y_j \\ \sum_{j=1}^n x_j y_j \end{bmatrix}.

Dividing both sides of the equation by n gives

   \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \frac{1}{n}\|x\|^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} \bar{y} \\ \frac{1}{n} x^T y \end{bmatrix}.

We could solve the problem by analyzing the augmented matrix

   \left[ \begin{array}{cc|c} 1 & \bar{x} & \bar{y} \\ \bar{x} & \frac{1}{n}\|x\|^2 & \frac{1}{n} x^T y \end{array} \right].

But we can also try to compute (A^T A)^{-1} and multiply both sides by it. In order to do that we must compute the determinant:

   \frac{1}{n^2} \det(A^T A) = \frac{1}{n}\|x\|^2 - (\bar{x})^2.

Under the assumption that this is not zero we can readily solve the normal equations. Consider the special case in which the data satisfies the condition \bar{x} = 0 (the x-data has mean zero); then the system of equations becomes diagonal and we have

   c_0 = \bar{y}   and   c_1 = \frac{x^T y}{x^T x}.
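The mean-zero simplification suggests a standard computational trick: center the x-data first, compute c_1 from the diagonal system, then translate back via c_0 = \bar{y} − c_1 \bar{x}. A sketch with made-up data (the translation step is an assumption beyond the special case treated above):

```python
import numpy as np

# Least squares line fit using the mean-zero simplification: after
# centering the x-data, the normal equations are diagonal, so
# c1 = xc^T y / xc^T xc, and then c0 = ybar - c1 * xbar.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 2.0, 4.0])

xbar, ybar = xs.mean(), ys.mean()
xc = xs - xbar                       # centered data: mean zero

c1 = (xc @ ys) / (xc @ xc)
c0 = ybar - c1 * xbar

# Agrees with solving the (uncentered) normal equations directly.
A = np.column_stack([np.ones_like(xs), xs])
c = np.linalg.solve(A.T @ A, A.T @ ys)
assert np.allclose([c0, c1], c)
```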
5.4 Inner Product Spaces

Definition 5. Let X be a (real) vector space.

1. A real inner product ⟨·, ·⟩ is a function from X × X to R satisfying:

   (a) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ X.
   (b) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ X and scalars α ∈ R.
   (c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X.
   (d) ⟨x, x⟩ ≥ 0 for all x ∈ X, and ⟨x, x⟩ = 0 if and only if x = 0.
2. A vector space with a real inner product is called a real inner product space.
Definition 6 (Chapter 6). Let X be a (complex) vector space.
1. A complex inner product ⟨·, ·⟩ is a function from X × X to C satisfying:

   (a) ⟨x, y⟩ = \overline{⟨y, x⟩} for all x, y ∈ X.
   (b) ⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ X and scalars α ∈ C.
   (c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X.
   (d) ⟨x, x⟩ ≥ 0 for all x ∈ X, and ⟨x, x⟩ = 0 if and only if x = 0.
2. A vector space with a complex inner product is called a complex inner product space.
Note that for a complex vector space we have
   ⟨x, αy⟩ = \overline{⟨αy, x⟩} = \overline{α ⟨y, x⟩} = \overline{α} \, \overline{⟨y, x⟩} = \overline{α} ⟨x, y⟩,

and therefore ⟨x, αy⟩ = \overline{α} ⟨x, y⟩.   (10)
Example 3. 1. X = R^n is a real inner product space with ⟨x, y⟩ = \sum_{j=1}^n x_j y_j.
2. L^2(a, b) is a real inner product space with ⟨f, g⟩ = \int_a^b f(x) g(x) \, dx.

3. R^{m×n} is a real vector space consisting of all m × n matrices A. An inner product and norm on R^{m×n} are

   A · B = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij},   \|A\| = \Big( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \Big)^{1/2}.

Theorem 4 (Schwarz Inequality). For any x, y in a complex (or real) inner product space X we have

   |⟨x, y⟩| ≤ ⟨x, x⟩^{1/2} ⟨y, y⟩^{1/2}.   (11)

Equality holds if and only if y is a multiple of x.
Remark 2. Every inner product space has a norm induced by the inner product, defined by

   \|x\| = ⟨x, x⟩^{1/2},   (12)

which in turn gives a distance function d(x, y) = \|x − y\|.
Lemma 1. The norm induced from the inner product satisfies the parallelogram law

   \|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.
By analogy with the special inner product space R^n we make the following definitions for general inner product spaces.
Definition 7. 1. Two vectors x and y in X are orthogonal if ⟨x, y⟩ = 0.
2. A subset S ⊂ X is an orthogonal set if each pair of distinct elements of S is orthogonal.
3. A set S is orthonormal if it is orthogonal and every element x ∈ S satisfies \|x\| = 1.
Remark 3. If ⟨x, y⟩ = 0 then expanding the norms gives the Pythagorean Theorem

   \|x - y\|^2 = \|x\|^2 + \|y\|^2  and  \|x + y\|^2 = \|x\|^2 + \|y\|^2.
By definition, an inner product space X is a vector space, so a subset M ⊂ X is a subspace if αx + βy ∈ M for all x, y ∈ M and scalars α and β. Thus if x_0 ∈ X, then M = {αx_0 : α a scalar} = Span{x_0} is a subspace.
Definition 8. The (orthogonal) projection of x ∈ X onto M = Span{x_0} is defined by

   P x = \frac{⟨x, x_0⟩}{\|x_0\|^2} x_0.

Note that the following properties of P hold.
1. P^2 = P. Namely, we have P^2 x = P x for every x.

2. ⟨P x, y⟩ = ⟨x, P y⟩.

3. ⟨P x, (I − P)y⟩ = 0.

4. For every x ∈ X we have x = P x + (I − P)x and P x ⊥ (I − P)x.
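For M = Span{x_0} in R^n the projection P has the matrix x_0 x_0^T / \|x_0\|^2, and the four properties above can be checked directly; the vectors below are made-up examples:

```python
import numpy as np

# Projection onto M = Span{x0} in R^3, and a check of the listed
# properties of P.
x0 = np.array([1.0, 2.0, 2.0])
P  = np.outer(x0, x0) / (x0 @ x0)    # matrix of P x = (<x, x0>/||x0||^2) x0

x = np.array([3.0, -1.0, 4.0])

assert np.allclose(P @ (P @ x), P @ x)       # property 1: P^2 = P
assert np.allclose(P, P.T)                   # property 2: <Px, y> = <x, Py>
assert abs((P @ x) @ (x - P @ x)) < 1e-12    # properties 3-4: Px ⊥ (I - P)x
```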
Definition 9. 1. Given x and y in X we define the angle θ between x and y by the formula

   cos(θ) = \frac{⟨x, y⟩}{\|x\| \, \|y\|}.
2. If M ⊂ X then the orthogonal complement of M, denoted M^⊥, is the subspace

   M^⊥ = { x ∈ X : ⟨x, m⟩ = 0 for all m ∈ M }.
Theorem 5 (Projection Theorem). Let M be a subspace of X. For every x ∈ X there exist unique x_0 ∈ M and y_0 ∈ M^⊥ such that x = x_0 + y_0. We call x_0 the orthogonal projection of x onto M and denote it by x_0 = P_M(x). We note that P_M is an orthogonal projection and (I − P_M) is also an orthogonal projection, namely the projection onto M^⊥.
More generally we have
Theorem 6 (Projection Theorem). Let S be a subspace of an inner product space V and let x ∈ V. Let {u_j}_{j=1}^n be an orthonormal basis for S. Then

   p = \sum_{j=1}^n ⟨x, u_j⟩ u_j

is the orthogonal projection of x onto S. We have p − x ∈ S^⊥ and

   \|p − x\| ≤ \|y − x\|  for all y ∈ S with y ≠ p.
Further, if we define the matrix U with columns {u_j}_{j=1}^n, i.e., U = [u_1, u_2, ..., u_n], then

   p = U U^T x.
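The formula p = U U^T x is easy to test; the subspace below is a made-up two-dimensional subspace of R^3 whose orthonormal basis is obvious:

```python
import numpy as np

# Projection onto a subspace S via p = U U^T x, where the columns of U
# form an orthonormal basis for S.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
U  = np.column_stack([u1, u2])

x = np.array([2.0, -3.0, 5.0])
p = U @ U.T @ x

# Same result as summing <x, u_j> u_j, and p - x ∈ S^perp.
assert np.allclose(p, (x @ u1) * u1 + (x @ u2) * u2)
assert np.allclose(U.T @ (p - x), 0)
print(p)   # [ 2. -3.  0.]
```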
5.5 Gram-Schmidt Orthogonalization
Given a basis {x_j}_{j=1}^n for an inner product space V we want to compute from it an orthonormal basis {u_j}_{j=1}^n. The process for doing this is called Gram–Schmidt orthogonalization.
Theorem 7 (Gram–Schmidt Process). Let {x_j}_{j=1}^n be a basis for an inner product space V. Let

   u_1 = \frac{1}{\|x_1\|} x_1.

Now define u_2, u_3, ..., u_n recursively by

   u_{k+1} = \frac{1}{\|x_{k+1} − p_k\|} (x_{k+1} − p_k)  for k = 1, ..., n − 1,

where

   p_k = \sum_{j=1}^k ⟨x_{k+1}, u_j⟩ u_j.
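A minimal sketch of the Gram–Schmidt process in NumPy (the helper name `gram_schmidt` and the example basis are my own choices, not from the notes):

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalize the columns of X (assumed linearly independent)
    by the Gram-Schmidt process of Theorem 7."""
    U = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        # p_k: projection of x_{k+1} onto the span of u_1, ..., u_k
        p = U[:, :k] @ (U[:, :k].T @ X[:, k])
        v = X[:, k] - p
        U[:, k] = v / np.linalg.norm(v)
    return U

# A made-up basis of R^3 (columns are the basis vectors x_1, x_2, x_3).
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U = gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))   # the columns are orthonormal
```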