
5. Orthogonality

5.1 The Scalar Product in $\mathbb{R}^n$

Thus far we have restricted ourselves to vector spaces and the operations of addition and scalar multiplication: how else might we combine vectors? You should know about the scalar (dot) and cross products of vectors in $\mathbb{R}^3$. The scalar product extends nicely to other vector spaces, while the cross product is another story.[1] The basic purpose of scalar products is to define and analyze the lengths of, and angles between, vectors.[2]

[1] But not for this class! You may see 'wedge' products in later classes...
[2] It is important to note that everything in this chapter, until mentioned otherwise, only applies to real vector spaces (where $\mathbb{F} = \mathbb{R}$).

Euclidean Space

Definition 5.1.1. Suppose $x, y \in \mathbb{R}^n$ are written with respect to the standard basis $\{e_1, \dots, e_n\}$.

1. The scalar product of $x, y$ is the real number[a] $(x, y) := x^T y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.
2. $x, y$ are orthogonal or perpendicular if $(x, y) = 0$.
3. $n$-dimensional Euclidean Space $\mathbb{R}^n$ is the set of $n \times 1$ column vectors $\mathbb{R}^{n \times 1}$ together with the scalar product.

[a] Other notations include $x \cdot y$ and $\langle x, y \rangle$.

Euclidean Space is more than just a collection of co-ordinate vectors: it implicitly comes with notions of angle and length.[3]

Important Fact: $(y, x) = y^T x = (x^T y)^T = x^T y = (x, y)$, so the scalar product is symmetric.

[3] To be seen in $\mathbb{R}^2$ and $\mathbb{R}^3$ shortly.

Angles and Lengths

Definition 5.1.2. The length of a vector $x \in \mathbb{R}^n$ is its norm $\|x\| = \sqrt{(x, x)}$. The distance between two vectors $x, y$ is given by $\|y - x\|$.

Theorem 5.1.3. The angle $\theta \in [0, \pi]$ between two vectors $x, y$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ satisfies the equation
$$(x, y) = \|x\| \, \|y\| \cos\theta$$

Definition 5.1.4. We define the angle $\theta$ between $x, y \in \mathbb{R}^n$ to be
$$\theta = \cos^{-1} \frac{(x, y)}{\|x\| \, \|y\|}$$
$\theta$ is the smaller of the two possible angles, since $\cos^{-1}$ has range $[0, \pi]$.

Proof of Theorem.

If $x, y$ are parallel then $\theta = 0$ or $\pi$ and the Theorem is trivial. Otherwise, in $\mathbb{R}^2$ (or in the plane $\mathrm{Span}(x, y) \leq \mathbb{R}^3$), the cosine rule holds:
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta$$
Applying the definition of norm and scalar product, we obtain
$$2\|x\|\,\|y\|\cos\theta = \|x\|^2 + \|y\|^2 - \|x - y\|^2$$
$$= x^T x + y^T y - (x - y)^T(x - y)$$
$$= x^T x + y^T y - (x^T x + y^T y - x^T y - y^T x)$$
$$= x^T y + y^T x = 2(x, y)$$
as required.

Basic results & inequalities

Several results that you will have used without thinking in elementary geometry follow directly from the definitions.

Theorem 5.1.5 (Cauchy–Schwarz). If $x, y$ are vectors in $\mathbb{R}^n$ then
$$|(x, y)| \leq \|x\| \, \|y\|$$
with equality iff $x, y$ are parallel.

Proof.

$$|(x, y)| = \bigl| \|x\|\,\|y\|\cos\theta \bigr| = \|x\|\,\|y\|\,|\cos\theta| \leq \|x\|\,\|y\|$$
Equality is satisfied precisely when $\cos\theta = \pm 1$: that is when $\theta = 0, \pi$, and so $x, y$ are parallel.

Theorem 5.1.6 (Triangle inequality). If $x, y \in \mathbb{R}^n$ then $\|x + y\| \leq \|x\| + \|y\|$. I.e. any side of a triangle is shorter than the sum of the others.

Proof.
$$\|x + y\|^2 = (x + y, x + y) = (x + y)^T(x + y) = \|x\|^2 + 2(x, y) + \|y\|^2$$
$$\leq \|x\|^2 + 2|(x, y)| + \|y\|^2 \leq \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2$$

If $(x, y) = 0$, the second line in the proof of the triangle inequality immediately yields

Theorem 5.1.7 (Pythagoras'). If $x, y \in \mathbb{R}^n$ are orthogonal then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

Example. Let $x = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$ and $y = \begin{pmatrix} -3 \\ 1 \\ 3 \end{pmatrix}$, then
$$\|x\| = \sqrt{(x, x)} = \sqrt{1 + 4 + 1} = \sqrt{6}, \qquad \|y\| = \sqrt{9 + 1 + 9} = \sqrt{19}$$
$$(x, y) = -3 + 2 - 3 = -4$$
$$\theta = \cos^{-1}\frac{(x, y)}{\|x\|\,\|y\|} = \cos^{-1}\frac{-4}{\sqrt{6}\sqrt{19}} \approx 1.955 \text{ rad} \approx 112^\circ$$
$$\|y - x\| = \left\| \begin{pmatrix} -4 \\ -1 \\ 4 \end{pmatrix} \right\| = \sqrt{33} \leq \sqrt{6} + \sqrt{19}$$
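These computations are easy to check numerically. A minimal sketch, assuming NumPy is available (the arrays simply reproduce the example above):

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([-3.0, 1.0, 3.0])

dot = x @ y                                    # scalar product, here -4
theta = np.arccos(dot / (np.linalg.norm(x) * np.linalg.norm(y)))

print(np.linalg.norm(x), np.linalg.norm(y))    # sqrt(6), sqrt(19)
print(theta, np.degrees(theta))                # approx 1.955 rad, 112 degrees
```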

Projections

Scalar products are useful for calculating how much of one vector points in the direction of another.

Definition 5.1.8.
1. The unit vector in the direction of $v \in \mathbb{R}^n$ is the vector $\frac{v}{\|v\|}$.
2. The scalar projection of $x$ onto $v \neq 0$ in $\mathbb{R}^n$ is the scalar product
$$\alpha_v(x) = \left( \frac{1}{\|v\|}v, \ x \right) = \frac{(v, x)}{\|v\|}$$
3. The orthogonal (or vector) projection of $x$ onto $v \neq 0$ in $\mathbb{R}^n$ is
$$\pi_v(x) = \alpha_v(x)\frac{v}{\|v\|} = \frac{(v, x)}{\|v\|^2}v$$

Note: $\alpha_v(x) \neq \|\pi_v(x)\|$ in general: if $\alpha_v(x) < 0$ then the projection of $x$ onto $v$ points in the opposite direction to $v$.

Orthogonal projection means several things:

1. $\pi_v \in \mathcal{L}(\mathbb{R}^n)$ ($\pi_v$ is a linear map)
2. $\pi_v(\mathbb{R}^n) = \mathrm{Span}(v)$ (Projection onto $\mathrm{Span}(v)$)
3. $\pi_v(v) = v$ (Identity on $\mathrm{Span}(v)$)
4. $\ker \pi_v = v^\perp = \{ y \in \mathbb{R}^n : (y, v) = 0 \}$ (Orthogonality)

1, 2, 3 say that $\pi_v$ is a projection. 4 makes the projection orthogonal: anything orthogonal to $v$ is mapped to zero. Similarly $\alpha_v \in \mathcal{L}(\mathbb{R}^n, \mathbb{R})$.

The matrix of a projection

Since $\pi_v$ is a linear map, it has a standard matrix representation. Indeed

$$\pi_v(x) = \frac{(v, x)}{\|v\|^2}v = v\frac{(v, x)}{\|v\|^2} = v\frac{v^T x}{\|v\|^2} = \frac{v v^T}{\|v\|^2}x$$
whence the matrix of $\pi_v$ is the $n \times n$ matrix $\dfrac{v v^T}{\|v\|^2}$.

Example. In $\mathbb{R}^2$, orthogonal projection onto $v = \begin{pmatrix} x \\ y \end{pmatrix}$ has matrix
$$A_v = \frac{1}{x^2 + y^2}\begin{pmatrix} x \\ y \end{pmatrix}(x \ \ y) = \frac{1}{x^2 + y^2}\begin{pmatrix} x^2 & xy \\ xy & y^2 \end{pmatrix}$$
Projection onto the $x$-axis is therefore $A_{\mathbf{i}} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, while projection onto the line $y = x$ is $A_{\mathbf{i}+\mathbf{j}} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
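A quick numerical check of this formula; a sketch assuming NumPy, where `projection_matrix` is just an illustrative helper name:

```python
import numpy as np

def projection_matrix(v):
    """Matrix v v^T / ||v||^2 of the orthogonal projection onto Span(v)."""
    v = np.asarray(v, dtype=float)
    return np.outer(v, v) / (v @ v)

print(projection_matrix([1, 0]))   # [[1, 0], [0, 0]]            (x-axis)
print(projection_matrix([1, 1]))   # [[0.5, 0.5], [0.5, 0.5]]    (line y = x)
```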

Planes in $\mathbb{R}^3$

Projections are useful for describing planes[4] in $\mathbb{R}^3$. Let $P$ be the plane normal to $n = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \in \mathbb{R}^3$ and which passes through the point with position vector $x_0 = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}$.

The distance $d$ of the plane from the origin is the scalar projection of any vector in the plane onto $n$: thus
$$d = \alpha_n(x_0) = \frac{(x_0, n)}{\|n\|} = \frac{a x_0 + b y_0 + c z_0}{\sqrt{a^2 + b^2 + c^2}}$$

[4] And more general affine spaces in arbitrary dimension.

The plane $P$ is the set of points whose scalar projection onto $n$ is $d$: otherwise said
$$P = \{ x : \alpha_n(x) = d = \alpha_n(x_0) \}$$
However
$$\alpha_n(x) = \alpha_n(x_0) \iff \alpha_n(x - x_0) = 0 \iff (x - x_0, n) = 0 \iff a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$
which is an alternative description of the plane.

Example. If $x_0 = \begin{pmatrix} 7 \\ 9 \\ 3 \end{pmatrix}$ and $n = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}$, then $P$ has equation $3(y - 9) + (z - 3) = 0$, or $3y + z = 30$. The distance to $P$ is
$$d = \alpha_n(x_0) = \frac{(x_0, n)}{\|n\|} = \frac{30}{\sqrt{10}}$$

The distance of a vector $y$ from $P$ is the scalar projection of $y$ onto the normal $n$ with $d$ subtracted:

$$\mathrm{dist}(y, P) = \alpha_n(y) - d = \frac{(n, y)}{\|n\|} - \alpha_n(x_0) = \frac{(n, y - x_0)}{\|n\|}$$

Example. If $y = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}$, $x_0 = \begin{pmatrix} 7 \\ 9 \\ 3 \end{pmatrix}$, and $n = \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix}$, then
$$\mathrm{dist}(y, P) = \frac{-23}{\sqrt{10}}$$
The negative sign means that $y$ is 'below' $P$ (on the opposite side of $P$ to the direction of $n$).
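A sketch of the same computation in NumPy (the arrays are the data of the example above):

```python
import numpy as np

n  = np.array([0.0, 3.0, 1.0])      # normal to the plane P
x0 = np.array([7.0, 9.0, 3.0])      # a point on P
y  = np.array([3.0, 2.0, 1.0])

dist = n @ (y - x0) / np.linalg.norm(n)
print(dist)                          # approx -7.273 = -23/sqrt(10)
```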

5.2 Orthogonal subspaces

Recall that $x, y \in \mathbb{R}^n$ are orthogonal if $(x, y) = x^T y = 0$.

Definition 5.2.1. Two subspaces $U, V \leq \mathbb{R}^n$ are orthogonal, written $U \perp V$, iff
$$(u, v) = 0 \text{ for all } u \in U, \ v \in V$$
The orthogonal complement to $U$ in $\mathbb{R}^n$ is the subspace
$$U^\perp := \{ x \in \mathbb{R}^n : (x, u) = 0, \ \forall u \in U \}$$

E.g. A plane and its normal line intersecting at the origin are orthogonal complements.

The previous example suggests the following

Lemma 5.2.2.
1. $U^\perp$ as defined really is a subspace of $\mathbb{R}^n$.
2. $U \cap U^\perp = \{0\}$.

Proof.
1. $(0, u) = 0$ for all $u \in U$, hence $0 \in U^\perp$, which is therefore non-empty. Now let $u \in U$, $x, y \in U^\perp$, and $\alpha \in \mathbb{R}$, then
$$(\alpha x, u) = \alpha(x, u) = 0 \implies \alpha x \in U^\perp$$
$$(x + y, u) = (x, u) + (y, u) = 0 \implies x + y \in U^\perp$$
hence $U^\perp$ is closed under scalar multiplication and addition and is therefore a subspace.
2. Let $x \in U \cap U^\perp$, then $(x, x) = 0$, whence $x = 0$.

Examples

1. Suppose $U = \mathrm{Span}(u_1, u_2) = \mathrm{Span}\left( \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right) \leq \mathbb{R}^3$.
$U^\perp$ is spanned by all vectors orthogonal to $u_1$ and $u_2$. Multiples of the cross-product $u_1 \times u_2$ are the only such vectors, whence
$$U^\perp = \mathrm{Span}\left( \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \times \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \right) = \mathrm{Span}\begin{pmatrix} 3 \\ 1 \\ 3 \end{pmatrix}$$
In general the orthogonal complement $U^\perp$ to a plane $U \leq \mathbb{R}^3$ is spanned by the cross-product of any two spanning vectors in $U$: hence $U^\perp$ is always a line.
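A quick check of the cross-product computation; a NumPy sketch with the two spanning vectors above:

```python
import numpy as np

u1 = np.array([1, 3, -2])
u2 = np.array([-1, 0, 1])

n = np.cross(u1, u2)        # spans the orthogonal complement of the plane
print(n)                    # [3 1 3]
print(n @ u1, n @ u2)       # 0 0
```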

2. Suppose $U = \mathrm{Span}(u) = \mathrm{Span}\begin{pmatrix} -2 \\ 2 \\ 5 \end{pmatrix}$. Then $(x, u) = 0 \iff u^T x = 0 \iff (-2 \ \ 2 \ \ 5)x = 0$, whence we find the nullspace:
$$U^\perp = N\bigl( (-2 \ \ 2 \ \ 5) \bigr) = \mathrm{Span}\left( \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 5 \\ 0 \\ 2 \end{pmatrix} \right)$$
In general, the orthogonal complement to a line $U \leq \mathbb{R}^3$ is the nullspace of a $1 \times 3$ matrix $u^T \in \mathbb{R}^{1 \times 3}$. $u^T$ has nullity $3 - 1 = 2$ and so $\dim U^\perp = 2$: hence $U^\perp$ is always a plane.

We will see shortly that orthogonal complements are naturally thought of as nullspaces of particular matrices.

Non-degeneracy

The scalar product is said to be non-degenerate in the sense that

$$(x, y) = 0 \ \forall y \in \mathbb{R}^n \implies x = 0$$
Alternatively said, the only vector which is orthogonal to everything is the zero-vector $0$:
$$(\mathbb{R}^n)^\perp = \{0\}$$
We can check this: if $x$ is orthogonal to all $y \in \mathbb{R}^n$, then
$$(x, e_i) = 0 \text{ for every basis vector } e_1, \dots, e_n$$
But $(x, e_i) = x_i \implies x_i = 0$ for all $i$ and so $x = 0$. Similarly $\{0\}^\perp = \mathbb{R}^n$.

Orthogonality and matrices

For a general matrix $A$, we consider how the nullspace $N(A)$ and column space $C(A)$ are related to orthogonality. First we need to see how multiplication by a matrix interacts with the scalar product.

Lemma 5.2.3. If $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$, then
$$(Ax, y) = (x, A^T y)$$

Proof. $(Ax, y) = (Ax)^T y = x^T A^T y = (x, A^T y)$.

Note that the scalar product on the left is of vectors in $\mathbb{R}^m$, while the product on the right is of vectors in $\mathbb{R}^n$.

Theorem 5.2.4 (Fundamental subspaces). If $A \in \mathbb{R}^{m \times n}$ then[a]
$$N(A) = C(A^T)^\perp \quad \text{and} \quad N(A^T) = C(A)^\perp$$

[a] Warning: the book uses the strange notation $R(A) = \mathrm{Range}(A)$ for the column space of $A$ here, rather than our $C(A)$.

Proof. Using the definition we see that

$$C(A^T)^\perp = \{ x \in \mathbb{R}^n : (x, z) = 0, \ \forall z \in C(A^T) \}$$
$$= \{ x \in \mathbb{R}^n : (x, A^T y) = 0, \ \forall y \in \mathbb{R}^m \}$$
$$= \{ x \in \mathbb{R}^n : (Ax, y) = 0, \ \forall y \in \mathbb{R}^m \} \quad \text{(Lemma 5.2.3)}$$
$$= \{ x \in \mathbb{R}^n : Ax = 0 \} \quad \text{(Non-degeneracy)}$$
$$= N(A)$$

The second formula comes from replacing $A \leftrightarrow A^T$.

The Theorem tells us how to find the orthogonal complement to a general subspace $U \leq \mathbb{R}^n$:
1. Take a basis $\{u_1, \dots, u_r\}$ of $U$.
2. Build the rank $r$ matrix $A \in \mathbb{R}^{n \times r}$ with columns $u_1, \dots, u_r$.
3. $U = C(A) \implies U^\perp = N(A^T)$.

Example. If $U = \mathrm{Span}\left( \begin{pmatrix} 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 5 \\ 2 \\ 0 \\ -1 \end{pmatrix} \right) \leq \mathbb{R}^4$, then
$$A = \begin{pmatrix} 1 & 5 \\ 0 & 2 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} \implies A^T = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 5 & 2 & 0 & -1 \end{pmatrix}$$
from which we find $U^\perp$ as the nullspace
$$U^\perp = N(A^T) = \mathrm{Span}\left( \begin{pmatrix} 0 \\ 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \\ 5 \end{pmatrix} \right)$$
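The same computation can be checked numerically. A sketch assuming SciPy is available; `null_space` returns an orthonormal basis, which spans the same plane as the vectors above even though the individual basis vectors differ:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1.0,  5.0],
              [ 0.0,  2.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])

B = null_space(A.T)              # orthonormal basis of N(A^T) = U-perp
print(B.shape)                   # (4, 2)
print(np.allclose(A.T @ B, 0))   # True
```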

Theorem 5.2.5.
1. If $S \leq \mathbb{R}^n$ then $\dim S + \dim S^\perp = n$.
2. If $B = \{s_1, \dots, s_r\}$ is a basis of $S$, then we may form a basis $C = \{s_{r+1}, \dots, s_n\}$ of $S^\perp$ such that $B \cup C$ is a basis of $\mathbb{R}^n$.

The Theorem clears up what we've already seen: e.g. the orthogonal complement to a line in $\mathbb{R}^3$ is always a plane, etc.

Proof. Suppose $S \neq \{0\}$, otherwise $S^\perp = \mathbb{R}^n$ and the Theorem is trivial. Otherwise let
$$A = \begin{pmatrix} | & & | \\ s_1 & \cdots & s_r \\ | & & | \end{pmatrix} \in \mathbb{R}^{n \times r}$$
Since $B$ is a basis we have $S = C(A)$, whence Theorem 5.2.4 yields $S^\perp = C(A)^\perp = N(A^T)$. The Rank–Nullity Theorem gives us 1:
$$\dim S^\perp = \operatorname{null} A^T = n - \operatorname{rank} A^T = n - r = n - \dim S$$

Proof (cont). Now choose a basis $C = \{s_{r+1}, \dots, s_n\}$ of $S^\perp$ and suppose that
$$\underbrace{\alpha_1 s_1 + \cdots + \alpha_r s_r}_{s \in S} + \underbrace{\alpha_{r+1} s_{r+1} + \cdots + \alpha_n s_n}_{s^\perp \in S^\perp} = 0$$
Lemma 5.2.2 $\implies s = -s^\perp \in S \cap S^\perp = \{0\} \implies s = s^\perp = 0$. Since $B$, $C$ are bases it follows that all $\alpha_i = 0$, whence $s_1, \dots, s_n$ are linearly independent. Since $\dim \mathbb{R}^n = n$ we necessarily have a basis of $\mathbb{R}^n$.

Direct Sums of Subspaces

Definition 5.2.6. Suppose that $U$, $V$ are subspaces of $W$. Moreover suppose that each $w \in W$ can be written uniquely as a sum $w = u + v$ for some $u \in U$, $v \in V$. Then $W$ is the direct sum of $U$ and $V$ and we write $W = U \oplus V$.

$W = U \oplus V$ is equivalent to both of the following holding simultaneously:
1. $W = U + V$; everything in $W$ can be written as a combination $u + v$.
2. $U \cap V = \{0\}$; the decomposition is unique.

Orthogonal complements are always direct sums

Theorem 5.2.7. If $S$ is a subspace of $\mathbb{R}^n$ then $S \oplus S^\perp = \mathbb{R}^n$.

Proof. We must prove $S \cap S^\perp = \{0\}$ and $\mathbb{R}^n = S + S^\perp$. The first is Lemma 5.2.2, part 2. For the second we use Theorem 5.2.5 and the homework:
$$\dim(S + S^\perp) = \dim S + \dim S^\perp - \dim(S \cap S^\perp) = n$$
from which $S + S^\perp = \mathbb{R}^n$. Thus $S \oplus S^\perp = \mathbb{R}^n$.

Theorem 5.2.8. If $S$ is a subspace of $\mathbb{R}^n$ then $(S^\perp)^\perp = S$.

Proof. If $s \in S$ then $(s, y) = 0$ for all $y \in S^\perp$. Thus $s \in (S^\perp)^\perp$, hence $S \leq (S^\perp)^\perp$.

Conversely, let $z \in (S^\perp)^\perp$. Since $\mathbb{R}^n = S \oplus S^\perp$ there exist unique $s \in S$, $s^\perp \in S^\perp$ such that $z = s + s^\perp$. Now take scalar products with $s^\perp$:
$$0 = (z, s^\perp) = (s, s^\perp) + (s^\perp, s^\perp) = \|s^\perp\|^2 \implies s^\perp = 0$$
Hence $z = s \in S$ and we have $(S^\perp)^\perp \leq S$. Putting both halves together gives the Theorem.

The Fundamental Subspaces Theorem has a bearing on whether linear systems have solutions.

Corollary 5.2.9. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then exactly one of the following holds:
1. There is a vector $x \in \mathbb{R}^n$ such that $Ax = b$, or
2. There exists some $y \in N(A^T) \leq \mathbb{R}^m$ such that $(y, b) \neq 0$.

For example, with $m = n = 3$ and $\operatorname{rank} A = 2$, a suitable (but unnecessary) choice satisfying 2 is $y = \pi_{N(A^T)}(b)$.

Proof. $N(A^T) = C(A)^\perp \implies \mathbb{R}^m = C(A) \oplus N(A^T)$. Write $b = p + y$ according to the direct sum, then $(b, y) = \|y\|^2$. This is zero iff $b \in C(A)$ iff $Ax = b$ has a solution.

5.3 Least squares problems

In applications, one often has more equations than unknowns and cannot find a solution to all of them simultaneously: what do we do? Idea: find a combination of variables that comes as close as possible to solving all the equations. Many methods exist: these depend on the type of problem, the definition of 'close as possible', etc.[5] We consider a method for approaching overdetermined linear systems, first championed by Gauss.

[5] Take a Numerical Analysis class for more!

Suppose $Ax = b$ is an overdetermined system: i.e.
- $A \in \mathbb{R}^{m \times n}$ with $m > n$ (more rows than columns)
- $b \in \mathbb{R}^m$ is given
- $x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n$ is the column vector of variables

The picture from Corollary 5.2.9 gives us an approach: in general $b \notin C(A)$ and there is no solution. The closest we can get to a solution would be to choose $\hat{x}$ so that $A\hat{x}$ is as close as possible to $b$. Since $\mathbb{R}^m = C(A) \oplus N(A^T)$, we decompose $b = p + y$ and instead solve $A\hat{x} = p$.

Least Squares?

Suppose $Ax = b$ is our $m \times n$ overdetermined system. Any vector $x \in \mathbb{R}^n$ creates a residual $r(x) = Ax - b \in \mathbb{R}^m$: either
1. We can solve $Ax = b$ and thus make $r(x) = 0$, or
2. We want to minimize the residual; equivalent to minimizing the length $\|r(x)\|$.

Definition 5.3.1. If $\hat{x} \in \mathbb{R}^n$ is such that $\|A\hat{x} - b\| \leq \|Ax - b\|$ for all $x \in \mathbb{R}^n$ then we say that $\hat{x}$ is a least squares solution to the system $Ax = b$.

Minimizing $\|r(x)\|$ is equivalent to minimizing $\|r(x)\|^2$, a sum of squares: no square-roots! In general there will be many least squares solutions to a given system: if $\hat{x}$ is such, then $\hat{x} + n$ is another for any $n \in N(A)$.

Theorem 5.3.2. Let $S \leq \mathbb{R}^m$ and $b \in \mathbb{R}^m$, then:
1. There exists a unique $p \in S$ which is closest to $b$.
2. $p \in S$ is closest to $b$ iff $p - b \in S^\perp$.

Proof. Since $\mathbb{R}^m = S \oplus S^\perp$ we may write $b = p + s^\perp$ for some $p \in S$ and $s^\perp \in S^\perp$. Let $s \in S$, then
$$\|b - s\|^2 = \|b - p + p - s\|^2 = \|b - p\|^2 + \|p - s\|^2 \geq \|b - p\|^2$$
using Pythagoras' (since $b - p = s^\perp \in S^\perp$ and $p - s \in S$), with equality iff $p = s$.

The closest point in $S$ to $b$ is therefore the orthogonal projection of $b$ onto $S$.

By Theorem 5.3.2, it follows that $\hat{x}$ is a least squares solution to $Ax = b$ iff $A\hat{x} = p = \pi_{C(A)}(b)$.

We don't yet have a formula for calculating the orthogonal projection $\pi_S$ for a general subspace $S$, but we can calculate when $S$ is 1-dimensional.

Example. Find the vector $p \in S = \mathrm{Span}\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$ which is closest to $b = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}$. We want the projection onto $S = \mathrm{Span}(s)$:
$$p = \pi_S(b) = \frac{(s, b)}{\|s\|^2}s = \frac{1}{14}\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$

Unique Least Squares Solutions

We address the simplest situation of least squares solutions $\hat{x}$ to $Ax = b$: when the solution $\hat{x}$ is unique.

Theorem 5.3.3. If $A \in \mathbb{R}^{m \times n}$ has $\operatorname{rank} A = n$, then the equations $A^T A x = A^T b$ have a unique solution
$$\hat{x} = (A^T A)^{-1} A^T b$$
which is the unique least squares solution to the system $Ax = b$.

Proof. We must prove three things:
1. $A^T A$ is invertible.
2. $\hat{x} = (A^T A)^{-1} A^T b$ is a least squares solution to $Ax = b$.
3. $\hat{x}$ is the only least squares solution.

Proof (cont).
1. Suppose that $z \in \mathbb{R}^n$ solves $A^T A z = 0$. Then
$$Az \in N(A^T) = C(A)^\perp \quad \text{(Fundamental Subspaces)}$$
But $Az \in C(A)$, whence
$$Az \in C(A) \cap C(A)^\perp = \{0\} \implies Az = 0$$
To finish, $\operatorname{null} A = n - \operatorname{rank} A = 0$, from which $Az = 0$ has only the solution $z = 0$. Hence $A^T A z = 0 \implies z = 0$, whence $A^T A$ is invertible.
2. $\hat{x} = (A^T A)^{-1} A^T b$ certainly solves $A^T A x = A^T b$. However, for any $y \in \mathbb{R}^n$,
$$(A\hat{x} - b, Ay) = (A^T A (A^T A)^{-1} A^T b - A^T b, \ y) = 0$$
hence $A\hat{x} - b \in C(A)^\perp$. $\hat{x}$ is therefore a least squares solution to $Ax = b$.

Proof (cont).
3. Now suppose that $\hat{y}$ is another least squares solution. Then
$$\underbrace{A(\hat{y} - \hat{x})}_{\in C(A)} = \underbrace{(A\hat{x} - b) - (A\hat{y} - b)}_{\in C(A)^\perp}$$
Since $C(A) \cap C(A)^\perp = \{0\}$ we have $A(\hat{y} - \hat{x}) = 0$. Since $\operatorname{rank} A = n$ we necessarily have $\hat{y} - \hat{x} = 0$ and so the least squares solution is unique.

Note how often the fact that $\operatorname{rank} A = n$ is required: the Theorem is false without it! Example to come...

General Orthogonal Projections (non-examinable)

Corollary 5.3.4. Suppose $S \leq \mathbb{R}^m$ is a subspace with $\dim S = n$. Let $A \in \mathbb{R}^{m \times n}$ be any matrix[a] with $C(A) = S$. Then
$$\pi_S = A(A^T A)^{-1} A^T$$
is the orthogonal projection onto $S$.

[a] Necessarily the columns of $A$ form a basis of $S$.

It is easy to see that if $A = v$ is a column vector, then we recover the original definition of orthogonal projection onto a vector:
$$\pi_v = v(v^T v)^{-1} v^T = \frac{1}{\|v\|^2} v v^T$$

Example. Find the unique least-squares solution to the system of equations
$$\begin{cases} x_1 + 2x_2 = 0 \\ 3x_1 + 3x_2 = 1 \\ x_2 = 4 \end{cases}$$
We have $Ax = b$ where $A = \begin{pmatrix} 1 & 2 \\ 3 & 3 \\ 0 & 1 \end{pmatrix}$ and $b = \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix}$. Since $\operatorname{rank} A = 2$, the Theorem says that the unique solution is
$$\hat{x} = (A^T A)^{-1} A^T b = \begin{pmatrix} 10 & 11 \\ 11 & 14 \end{pmatrix}^{-1}\begin{pmatrix} 3 \\ 7 \end{pmatrix} = \frac{1}{19}\begin{pmatrix} 14 & -11 \\ -11 & 10 \end{pmatrix}\begin{pmatrix} 3 \\ 7 \end{pmatrix} = \frac{1}{19}\begin{pmatrix} -35 \\ 37 \end{pmatrix}$$
$\hat{x} \in \mathbb{R}^2$ is closest to a solution to $Ax = b$ in the sense that $A\hat{x} \in \mathbb{R}^3$ is as close as possible to $b$: we are minimizing distance in $\mathbb{R}^3$, not in $\mathbb{R}^2$. Should check using multivariable calculus that $f(x_1, x_2) = (x_1 + 2x_2)^2 + (3x_1 + 3x_2 - 1)^2 + (x_2 - 4)^2$ has an absolute minimum at $(x_1, x_2) = \left( \frac{-35}{19}, \frac{37}{19} \right)$.

Example. Find all the least-squares solutions $\hat{x}$ when $A = \begin{pmatrix} 3 & -6 \\ 1 & -2 \\ -1 & 2 \end{pmatrix}$ and $b = \begin{pmatrix} -2 \\ 1 \\ 4 \end{pmatrix}$.

$\operatorname{rank} A = 1 < 2$ and so $A^T A = \begin{pmatrix} 11 & -22 \\ -22 & 44 \end{pmatrix}$ is non-invertible and we are obliged to solve $A^T A \hat{x} = A^T b$ directly. This reads
$$\begin{pmatrix} 11 & -22 \\ -22 & 44 \end{pmatrix}\hat{x} = \begin{pmatrix} -9 \\ 18 \end{pmatrix}$$
whence $\hat{x} = \begin{pmatrix} -9/11 \\ 0 \end{pmatrix} + \lambda\begin{pmatrix} 2 \\ 1 \end{pmatrix}$, where $\lambda$ is any scalar. There is a one-parameter set of least-squares solutions.
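For comparison, a hedged NumPy sketch: since $A^T A$ is singular we cannot use `np.linalg.solve`, but `np.linalg.lstsq` still returns one least-squares solution, namely the one of minimum norm (here $(-9/55, 18/55)$, which is the solution above with $\lambda = 18/55$):

```python
import numpy as np

A = np.array([[ 3.0, -6.0],
              [ 1.0, -2.0],
              [-1.0,  2.0]])
b = np.array([-2.0, 1.0, 4.0])

# lstsq handles the rank-deficient case; it returns the minimum-norm solution
x_min, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_min, rank)                   # approx [-0.1636, 0.3273], rank 1
```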

Best-fitting curves in Statistics

Least-squares solutions are often used in statistics when one wants to find a best-fitting polynomial to a set of data points.

Example

Find the equation of the line $y = \alpha_0 + \alpha_1 t$ which minimizes the sum of the squares of the vertical distances to the data points $(1, 3)$, $(2, 6)$, and $(3, 7)$.

Observe how different choices of line affect the sum of the squared distances $d_1^2 + d_2^2 + d_3^2$.

Example (cont). The sum of the squared errors, as a function of $\alpha_0, \alpha_1$, is
$$(y(1) - 3)^2 + (y(2) - 6)^2 + (y(3) - 7)^2 = \left\| \begin{pmatrix} \alpha_0 + \alpha_1 \\ \alpha_0 + 2\alpha_1 \\ \alpha_0 + 3\alpha_1 \end{pmatrix} - \begin{pmatrix} 3 \\ 6 \\ 7 \end{pmatrix} \right\|^2 = \left\| \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} - \begin{pmatrix} 3 \\ 6 \\ 7 \end{pmatrix} \right\|^2 = \|A\alpha - b\|^2$$
Therefore $\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}$ is the least-squares solution
$$\begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = (A^T A)^{-1} A^T b = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}\begin{pmatrix} 3 \\ 6 \\ 7 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 14 & -6 \\ -6 & 3 \end{pmatrix}\begin{pmatrix} 16 \\ 36 \end{pmatrix} = \begin{pmatrix} 4/3 \\ 2 \end{pmatrix}$$
We therefore get the line $y = \frac{4}{3} + 2t$.

This is the "best-fitting least-squares" line to the data.

Best-fitting least-squares polynomials

Suppose $\{(t_i, b_i) : i = 1, \dots, n\}$ is a set of data points[6] where $t_1, \dots, t_n$ are distinct.

Question: If $t$ is given, what do we expect $b$ to be?

We look for a polynomial $p(t)$ of degree $k < n$ which minimizes the sum of the squares of the errors in the dependent variable $b$. $p(t)$ is then a prediction of the value $b$ if $t$ is given.

Example. Try plugging in the data "1 1; 2 2; 3 1; 4 3; 5 7; 6 2; 7 3;" to the applet for degrees 1–5.

[6] The $t_i$ are often time-values and the $b_i$ the values of some output at time $t_i$.

Let $p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_k t^k$ be a polynomial of degree $k < n$. The predictive error[7] at $t = t_i$ is the distance $|p(t_i) - b_i|$. Choose coefficients $\alpha_0, \dots, \alpha_k$ to minimize the sum of the squared errors
$$\sum_{i=1}^n (p(t_i) - b_i)^2$$

We sum squares of errors for three reasons:
1. Positive and negative errors are treated the same (both positive).
2. Large errors are penalized much more than small ones.
3. The calculations are much easier than other methods!

[7] If $k = n - 1$ then there is a unique polynomial through the $n$ data points, so we have a formula $b = f(t)$ and thus no predictive error for any $t_i$.

Since $p(t)$ is linear in the coefficients, we can write

$$\begin{pmatrix} p(t_1) \\ \vdots \\ p(t_n) \end{pmatrix} = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^k \\ 1 & t_2 & t_2^2 & \cdots & t_2^k \\ \vdots & & & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^k \end{pmatrix}\begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{pmatrix} =: Pa$$
defining the matrix $P \in \mathbb{R}^{n \times (k+1)}$. Setting $b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$, we are trying to minimize
$$\sum_{i=1}^n (p(t_i) - b_i)^2 = \|Pa - b\|^2$$
This is a least squares problem. Moreover, $\operatorname{rank} P = k + 1$ is maximal iff the $t_i$ are distinct. The unique least squares solution is therefore $\hat{a} = (P^T P)^{-1} P^T b$, which returns us the coefficients $\alpha_0, \dots, \alpha_k$ of the best-fitting least-squares polynomial of degree $\leq k$.
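As a sketch of how this is done in practice (assuming NumPy; the data is that of the example below), one builds the matrix $P$ and solves the normal equations:

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])     # data from the example below
b = np.array([1.0, 2.0, 1.0, 3.0])

def best_fit_coeffs(t, b, k):
    """Coefficients (alpha_0, ..., alpha_k) of the least squares polynomial."""
    P = np.vander(t, k + 1, increasing=True)   # rows (1, t_i, ..., t_i^k)
    return np.linalg.solve(P.T @ P, P.T @ b)   # normal equations

print(best_fit_coeffs(t, b, 1))   # [0.5, 0.5]           -> p(t) = (1 + t)/2
print(best_fit_coeffs(t, b, 2))   # [1.75, -0.75, 0.25]  -> p(t) = (7 - 3t + t^2)/4
```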

Example. Find the best-fitting line and quadratic to the data
$t_i$: 1, 2, 3, 4
$b_i$: 1, 2, 1, 3

For the straight line we have $P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}$ and $b = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 3 \end{pmatrix}$, thus
$$\hat{a} = (P^T P)^{-1} P^T b = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix}^{-1}\begin{pmatrix} 7 \\ 20 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}$$
hence $p(t) = \frac{1}{2}(1 + t)$ is the best-fitting straight line.

[Figure: the data points and the best-fitting line $y = 0.5t + 0.5$; here $\sum_i d_i^2 = 1.5$.]

Example (cont). For the same data the best-fitting quadratic requires $P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{pmatrix}$, thus
$$\hat{a} = (P^T P)^{-1} P^T b = \begin{pmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \end{pmatrix}^{-1}\begin{pmatrix} 7 \\ 20 \\ 66 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 7 \\ -3 \\ 1 \end{pmatrix}$$
The best-fitting quadratic polynomial is therefore $p(t) = \frac{1}{4}(7 - 3t + t^2)$.

Note that $p$ fits the data better than the straight line, otherwise the best-fitting quadratic would be a straight line!

[Figure: the data points and the best-fitting quadratic $y = 0.25t^2 - 0.75t + 1.75$; here $\sum_i d_i^2 = 1.25$.]

5.4 Inner Product Spaces

Inner products generalize the scalar product on Rn

Definition 5.4.1. An inner product $( \ , \ )$ on a real vector space $V$ is a function $( \ , \ ) : V \times V \to \mathbb{R}$ which satisfies the following axioms:
I. $(x, x) \geq 0, \ \forall x \in V$, with equality iff $x = 0$
II. $(x, y) = (y, x), \ \forall x, y \in V$
III. $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z), \ \forall x, y, z \in V, \ \forall \alpha, \beta \in \mathbb{R}$

$(V, ( \ , \ ))$ is an inner product space.

$( \ , \ )$ is also called a positive definite (I), symmetric (II), (bi)linear[8] (III) form. III says that each map $L_z : V \to \mathbb{R}$ defined by $L_z(x) = (x, z)$ is linear.[9]

[8] Linear in both arguments.
[9] When $\dim V < \infty$ it is a fact (beyond this course) that all linear maps $V \to \mathbb{R}$ are of the form $L_z$ for some $z \in V$.

Inner Products on $\mathbb{R}^n$

If $w_1, \dots, w_n > 0$, then
$$(x, y) := \sum_{i=1}^n w_i x_i y_i$$
is an inner product: the $w_i$ are called weights.[10] Indeed if $A \in \mathbb{R}^{n \times n}$ is any symmetric ($A^T = A$), positive-definite ($x^T A x > 0, \ \forall x \neq 0$) matrix, then
$$(x, y) := x^T A y$$
is an inner product[11] on $\mathbb{R}^n$. Two examples on $\mathbb{R}^3$ are
$$(x, y) = x^T\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}y \qquad (x, y) = x^T\begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix}y$$

[10] If $w_1 = w_2 = \cdots = w_n = 1$ we get the standard scalar product.
[11] Check each of I, II, III.

Inner Products on $P_n$

The standard basis $\{1, x, x^2, \dots, x^{n-1}\}$ identifies $P_n$ with $\mathbb{R}^n$ and we can use any of the inner products above, e.g.
$$(a_1 + b_1 x + c_1 x^2, \ a_2 + b_2 x + c_2 x^2) = a_1 a_2 + b_1 b_2 + c_1 c_2 \text{ in } P_3$$

Alternatively, let $x_1, \dots, x_n$ be distinct real numbers and define
$$(p, q) := \sum_{i=1}^n p(x_i) q(x_i)$$
Conditions II and III clearly hold, but I needs a little work:
$$(p, p) = \sum_{i=1}^n p(x_i)^2 = 0 \iff p(x_i) = 0, \ \forall i = 1, \dots, n$$
This says that $p(x)$ has at least $n$ distinct roots. However a polynomial of degree $\leq n - 1$ has at most $n - 1$ roots, unless it is identically zero: hence I holds.

Can also have weights: if $w(x)$ is a positive function, then
$$(p, q) := \sum_{i=1}^n w(x_i) p(x_i) q(x_i)$$
is an inner product.

Inner Products on $C[a, b]$

Undoubtedly the most important example for future courses is the $L^2$ inner product on $C[a, b]$:
$$(f, g) := \int_a^b f(x) g(x) \, dx$$
We check:
I. $(f, f) = \int_a^b f(x)^2 \, dx \geq 0$ with equality iff $f(x) \equiv 0$ (since $f$ is continuous)
II. $(f, g) = \int_a^b f(x) g(x) \, dx = \int_a^b g(x) f(x) \, dx = (g, f)$
III. $(\alpha f + \beta g, h) = \int_a^b (\alpha f(x) + \beta g(x)) h(x) \, dx = \alpha\int_a^b f(x) h(x) \, dx + \beta\int_a^b g(x) h(x) \, dx = \alpha(f, h) + \beta(g, h)$

Can similarly define a weighted inner product
$$(f, g) = \int_a^b w(x) f(x) g(x) \, dx$$
where $w(x)$ is any positive function.

Basic Properties

Definition 5.4.2. If $(V, ( \ , \ ))$ is an inner product space then the norm or length of a vector $v \in V$ is
$$\|v\| := \sqrt{(v, v)}$$
$v, w \in V$ are orthogonal iff $(v, w) = 0$.

Observe: $\|v\| = 0 \iff (v, v) = 0 \iff v = 0$, by property I.

Theorem 5.4.3 (Pythagoras'). If $v, w$ are orthogonal then $\|v + w\|^2 = \|v\|^2 + \|w\|^2$. The proof is identical to that given in $\mathbb{R}^n$.

Example. Find the norms and inner products of the three vectors $\{1, \sin x, \cos x\}$ with respect to the $L^2$ inner product on $C[0, 2\pi]$.
$$\|1\| = \sqrt{\int_0^{2\pi} 1^2 \, dx} = \sqrt{2\pi}$$
$$\|\sin x\| = \sqrt{\int_0^{2\pi} \sin^2 x \, dx} = \sqrt{\int_0^{2\pi} \tfrac{1}{2}(1 - \cos 2x) \, dx} = \sqrt{\pi}, \qquad \|\cos x\| = \sqrt{\pi}$$
$$(1, \sin x) = \int_0^{2\pi} 1 \cdot \sin x \, dx = 0, \qquad (1, \cos x) = 0, \qquad (\sin x, \cos x) = \int_0^{2\pi} \sin x \cos x \, dx = 0$$
$1, \sin x, \cos x$ are therefore orthogonal vectors in $C[0, 2\pi]$. Dividing by the norms we see that
$$\frac{1}{\sqrt{2\pi}}, \ \frac{1}{\sqrt{\pi}}\sin x, \ \frac{1}{\sqrt{\pi}}\cos x$$
are orthonormal vectors[a] in $C[0, 2\pi]$.

[a] Important for Fourier Series.

Orthogonal Projections

Can define orthogonal projections exactly as in Rn

Definition 5.4.4. If $v \neq 0$ in an inner product space $(V, ( \ , \ ))$ then the orthogonal projection of $x \in V$ onto $v$ is
$$\pi_v(x) = \frac{(v, x)}{\|v\|^2}v$$
In particular
$$(\pi_v(x), \ x - \pi_v(x)) = \left( \frac{(v, x)}{\|v\|^2}v, \ x - \frac{(v, x)}{\|v\|^2}v \right) = \frac{(v, x)}{\|v\|^2}(v, x) - \frac{(v, x)^2}{\|v\|^4}\|v\|^2 = 0$$
whence $\pi_v(x)$ and $x - \pi_v(x)$ are orthogonal.

Example. Calculate the orthogonal projection of $\sin x$ onto $\mathrm{Span}\{x\} \leq C[0, 2\pi]$ with the $L^2$ inner product.
$$\pi_x(\sin x) = \frac{(x, \sin x)}{\|x\|^2}x = \frac{\int_0^{2\pi} x\sin x \, dx}{\int_0^{2\pi} x^2 \, dx}x = \frac{\bigl[-x\cos x\bigr]_0^{2\pi} + \int_0^{2\pi}\cos x \, dx}{\bigl[\tfrac{1}{3}x^3\bigr]_0^{2\pi}}x = \frac{-2\pi}{\tfrac{8\pi^3}{3}}x = -\frac{3}{4\pi^2}x$$

[Figure: $\sin x$ and its projection $-\frac{3}{4\pi^2}x$ on the interval $[0, 2\pi]$.]

Theorem 5.4.5 (Cauchy–Schwarz inequality). $|(v, w)| \leq \|v\|\,\|w\|$, with equality iff $v, w$ are parallel.

We can't rely on the cosine rule as in $\mathbb{R}^n$, since we currently have no notion of angle.

Proof. Suppose $v \neq 0$, otherwise the Theorem is trivial. $\pi_v(w)$ and $w - \pi_v(w)$ are orthogonal, so by Pythagoras'
$$\|w\|^2 = \|\pi_v(w)\|^2 + \|w - \pi_v(w)\|^2 \geq \|\pi_v(w)\|^2 = \frac{(v, w)^2}{\|v\|^2}$$
Rearranging gives the Theorem: equality holds iff $w = \pi_v(w)$ and so iff $v, w$ are parallel.

Angles in Inner Product Spaces

Cauchy–Schwarz allows us to define the notion of angle.

Definition 5.4.6. The angle $\theta$ between two non-zero vectors $v, w$ in an inner product space is given by
$$\cos\theta = \frac{(v, w)}{\|v\|\,\|w\|}$$
One can now check that the cosine rule holds:
$$\|v - w\|^2 = \|v\|^2 + \|w\|^2 - 2\|v\|\,\|w\|\cos\theta$$
and, more painfully, that the sine rule holds also!

Norms

Definition 5.4.7. A norm on a real vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ which satisfies the following axioms:
I. $\|v\| \geq 0, \ \forall v \in V$, with equality iff $v = 0$
II. $\|\alpha v\| = |\alpha|\,\|v\|, \ \forall \alpha \in \mathbb{R}, \ \forall v \in V$
III. $\|v + w\| \leq \|v\| + \|w\|, \ \forall v, w \in V$

We call $(V, \|\cdot\|)$ a normed linear space.

Condition III is the triangle inequality: the length of one side of a triangle is at most the sum of the lengths of the other two sides.

Theorem 5.4.8. If $(V, ( \ , \ ))$ is an inner product space, then $\|v\| = \sqrt{(v, v)}$ is a norm.

Proof. I is the identical condition for an inner product. For II,
$$\|\alpha v\| = \sqrt{(\alpha v, \alpha v)} = \sqrt{\alpha^2(v, v)} = |\alpha|\,\|v\|$$
For III we need the Cauchy–Schwarz inequality:
$$\|v + w\|^2 = \|v\|^2 + 2(v, w) + \|w\|^2 \leq \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2$$

The $p$-norms

These generalize the standard norm on Rn

Definition 5.4.9. Given $p \geq 1$, the $p$-norm on $\mathbb{R}^n$ is the norm
$$\|x\|_p := \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$$
The uniform or $\infty$-norm on $\mathbb{R}^n$ is the norm
$$\|x\|_\infty := \max_{i=1,\dots,n} |x_i|$$
The 2-norm is the usual notion of length in $\mathbb{R}^n$. Only the 2-norm comes from an inner product on $\mathbb{R}^n$: a normed linear space in general has no idea of what the angle between vectors means, only their lengths.

The three most common norms are the 1-, 2-, and $\infty$-norms.

Example. If $x = \begin{pmatrix} 1 \\ 3 \\ -1 \end{pmatrix}$ then
$$\|x\|_1 = |1| + |3| + |-1| = 5$$
$$\|x\|_2 = \sqrt{1^2 + 3^2 + (-1)^2} = \sqrt{11}$$
$$\|x\|_\infty = \max\{|1|, |3|, |-1|\} = 3$$
Note that $\|x\|_1 \geq \|x\|_2 \geq \|x\|_\infty$: this is true in general.[a]

[a] See the homework...
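These norms are available directly in NumPy; a quick sketch checking the example above:

```python
import numpy as np

x = np.array([1.0, 3.0, -1.0])
print(np.linalg.norm(x, 1))        # 5.0
print(np.linalg.norm(x))           # sqrt(11), approx 3.3166
print(np.linalg.norm(x, np.inf))   # 3.0
```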

$L^p$ norms on $C[a, b]$ (non-examinable)

There are also analogues of the $p$-norms on function spaces.

Definition 5.4.10. On $C[a, b]$, the $L^p$-norm ($p \geq 1$) is given by
$$\|f\|_p := \left( \int_a^b |f(x)|^p \, dx \right)^{1/p}$$
The uniform or $\infty$-norm is defined by
$$\|f\|_\infty := \max_{x \in [a, b]} |f(x)|$$
Again only the $L^2$ norm comes from an inner product, in this case the $L^2$ inner product defined earlier.

5.5 Orthonormal sets

Definition 5.5.1.
- $v_1, \dots, v_n$ in an inner product space $V$ are orthogonal iff $(v_i, v_j) = 0, \ \forall i \neq j$.
- $v_1, \dots, v_n$ are orthonormal iff
$$(v_i, v_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

Can turn an orthogonal set into an orthonormal set by dividing by the norms:
$$\{v_1, \dots, v_n\} \mapsto \left\{ \frac{v_1}{\|v_1\|}, \dots, \frac{v_n}{\|v_n\|} \right\}$$

Example. Recall $\frac{1}{\sqrt{2\pi}}, \ \frac{1}{\sqrt{\pi}}\sin x, \ \frac{1}{\sqrt{\pi}}\cos x$ are orthonormal in $(C[0, 2\pi], L^2)$.

Theorem 5.5.2. An orthogonal set of non-zero vectors $\{v_1, \dots, v_n\}$ is linearly independent.

Proof. Suppose that $\alpha_1 v_1 + \cdots + \alpha_n v_n = 0$. Then, for each $i$,
$$0 = (v_i, 0) = (v_i, \ \alpha_1 v_1 + \cdots + \alpha_n v_n) = \alpha_1(v_i, v_1) + \cdots + \alpha_i(v_i, v_i) + \cdots + \alpha_n(v_i, v_n) = \alpha_i\|v_i\|^2$$
Since $v_i \neq 0$ this forces $\alpha_i = 0$ for every $i$, whence $v_1, \dots, v_n$ are linearly independent.

Calculating in orthonormal bases

Theorem 5.5.3. Let $\mathcal{U} = \{u_1, \dots, u_n\}$ be an orthonormal basis of an inner product space $(V, ( \ , \ ))$. Then:
1. $v = \sum_{i=1}^n (v, u_i)u_i, \ \forall v \in V$: i.e. $[v]_{\mathcal{U}} = \begin{pmatrix} (v, u_1) \\ \vdots \\ (v, u_n) \end{pmatrix}$
2. $\left( \sum_{i=1}^n a_i u_i, \ \sum_{i=1}^n b_i u_i \right) = \sum_{i=1}^n a_i b_i$
3. $\left\| \sum_{i=1}^n c_i u_i \right\|^2 = \sum_{i=1}^n c_i^2$ (Parseval's formula)

Everything works as if you are in $\mathbb{R}^n$ with the basis $e_1, \dots, e_n$![12]

[12] Essentially...

Proof. Since $\{u_1, \dots, u_n\}$ is a basis there exist unique $\alpha_i \in \mathbb{R}$ such that $v = \alpha_1 u_1 + \cdots + \alpha_n u_n$. $\therefore (v, u_i) = \alpha_i$, which proves 1. 2 and 3 are straightforward by linearity from 1.

With careful caveats, the above formulæ are valid when $\dim V = \infty$: which leads to the example...

Theorem 5.5.4. In $C[-\pi, \pi]$ with the scaled $L^2$ inner product $(f, g) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx$, the following infinite set is orthonormal:
$$\left\{ \frac{1}{\sqrt{2}}, \ \sin x, \ \cos x, \ \sin 2x, \ \cos 2x, \ \dots \right\}$$

Proof. Just compute: use identities such as $2\sin nx\sin mx = \cos(n - m)x - \cos(n + m)x$, i.e.
$$(\sin nx, \sin mx) = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin nx\sin mx\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos(n - m)x - \cos(n + m)x\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos(n - m)x\,dx = \delta_{mn}$$

Parseval’s formula makes some calculations extremely easy

Example. $\frac{1}{\sqrt{2}}, \ \cos 2x$ are orthonormal with respect to the previous inner product, and
$$\cos^2 x = \frac{1}{2}(1 + \cos 2x) = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{2}\cos 2x$$
Hence
$$\frac{1}{\pi}\int_{-\pi}^{\pi}\cos^4 x\,dx = \left\|\cos^2 x\right\|^2 = \left(\frac{1}{\sqrt{2}}\right)^2 + \left(\frac{1}{2}\right)^2 = \frac{3}{4}$$
$$\therefore \int_{-\pi}^{\pi}\cos^4 x\,dx = \frac{3\pi}{4}$$
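A quick numerical sanity check of the integral; a sketch assuming SciPy:

```python
import numpy as np
from scipy.integrate import quad

value, _ = quad(lambda t: np.cos(t)**4, -np.pi, np.pi)
print(value, 3 * np.pi / 4)        # both approx 2.3562
```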

Least squares approximations

Orthogonal projections are least squares approximations.

Theorem 5.5.5. Let $u_1, \dots, u_n$ be orthonormal in $V$ and let $S = \mathrm{Span}(u_1, \dots, u_n)$. Then the orthogonal projection $\pi_S : V \to S$ onto $S$ is
$$\pi_S(v) = \sum_{i=1}^n (v, u_i)u_i, \quad \forall v \in V$$

Proof. $\pi_S$ is certainly linear (property III of the inner product). Moreover, for each $i$,
$$(v - \pi_S(v), \ u_i) = (v, u_i) - (v, u_i) = 0 \implies v - \pi_S(v) \in S^\perp$$
$\therefore \pi_S(v) + (v - \pi_S(v))$ is the unique decomposition of $v$ into $S$, $S^\perp$ parts.

Corollary 5.5.6

πS(v) is the closest element of S to v

The proof is exactly the same as Theorem 5.3.2

Definition 5.5.7. Given $v \in V$ we call $\pi_S(v)$ the least-squares approximation of $v$ by $S$.

Least squares approximations are often used to find approximations to complicated functions by simpler ones...

Example. $\frac{1}{\sqrt{2}}, \ \sqrt{\frac{3}{2}}x$ are orthonormal in $(C[-1, 1], L^2)$. The least-squares approximation to $f(x) = e^x$ by a linear polynomial on the interval $[-1, 1]$ is therefore
$$e^x \approx \left( e^x, \frac{1}{\sqrt{2}} \right)\frac{1}{\sqrt{2}} + \left( e^x, \sqrt{\frac{3}{2}}x \right)\sqrt{\frac{3}{2}}x = \frac{1}{2}\int_{-1}^1 e^x\,dx + \frac{3}{2}x\int_{-1}^1 xe^x\,dx = \frac{1}{2}(e - e^{-1}) + 3e^{-1}x$$

[Figure: $e^x$ and its least-squares linear approximation on $[-1, 1]$.]

A different interval gives a different linear approximation...
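The projection coefficients can be computed numerically; a sketch assuming SciPy's `quad`:

```python
import numpy as np
from scipy.integrate import quad

# Orthonormal pair in (C[-1, 1], L^2)
u0 = lambda t: 1 / np.sqrt(2)
u1 = lambda t: np.sqrt(3 / 2) * t

c0, _ = quad(lambda t: np.exp(t) * u0(t), -1, 1)   # (e^x, u0)
c1, _ = quad(lambda t: np.exp(t) * u1(t), -1, 1)   # (e^x, u1)

# Projection c0*u0 + c1*u1 = 0.5*(e - 1/e) + 3*exp(-1)*x
print(c0 / np.sqrt(2), c1 * np.sqrt(3 / 2))        # approx 1.1752, 1.1036
```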

Fourier Series

Least-squares approximation mostly works for projection onto infinite orthonormal sets.

Recall: $\mathcal{U} = \left\{ \frac{1}{\sqrt{2}}, \sin x, \cos x, \sin 2x, \cos 2x, \dots \right\}$ is orthonormal with respect to $(f, g) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx$.

Definition 5.5.8. Suppose $f$ has period $2\pi$. The Fourier Series of $f$ is its orthogonal projection onto $\mathrm{Span}\,\mathcal{U}$:
$$\mathcal{F}(f)(x) = \left( \frac{1}{\sqrt{2}}, f(x) \right)\frac{1}{\sqrt{2}} + \sum_{n=1}^{\infty}(\sin nx, f(x))\sin nx + \sum_{n=1}^{\infty}(\cos nx, f(x))\cos nx$$

if the infinite sum converges.[a]

[a] Beyond this course.

Example. Let $f(x) = x$ on $[-\pi, \pi]$, extended periodically. Then $\left( \frac{1}{\sqrt{2}}, x \right) = 0 = (\cos nx, x)$ for all $n$, since $x$ is odd. Moreover
$$(\sin nx, x) = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin nx\,dx = -\frac{1}{n\pi}\bigl[x\cos nx\bigr]_{-\pi}^{\pi} = (-1)^{n+1}\frac{2}{n}$$
Thus
$$\mathcal{F}(f)(x) = \sum_{n=1}^{\infty}\frac{2}{n}(-1)^{n+1}\sin nx = 2\sin x - \sin 2x + \frac{2}{3}\sin 3x - \cdots$$

Example. Similarly the Fourier series of $f(x) = x^2$ on $[-\pi, \pi]$ is
$$\mathcal{F}(f)(x) = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos nx = \frac{\pi^2}{3} - 4\cos x + \cos 2x - \frac{4}{9}\cos 3x + \cdots$$
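A sketch of how these coefficients could be checked numerically (assuming SciPy; `fourier_sin_coeff` is just an illustrative helper name):

```python
import numpy as np
from scipy.integrate import quad

def fourier_sin_coeff(f, n):
    """(sin nx, f) with respect to (f, g) = (1/pi) * integral over [-pi, pi]."""
    val, _ = quad(lambda t: f(t) * np.sin(n * t), -np.pi, np.pi)
    return val / np.pi

print([round(fourier_sin_coeff(lambda t: t, n), 4) for n in range(1, 5)])
# [2.0, -1.0, 0.6667, -0.5]  =  2*(-1)^(n+1)/n
```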

5.6 Gram–Schmidt Orthogonalisation

Orthogonal[13] bases are useful: how do we find them? Answer: use projections.

Example

Let $\{x_1, x_2\}$ be a basis of $\mathbb{R}^2$. Linear independence $\implies x_2 \neq \pi_{x_1}(x_2)$. Moreover
$$x_2 - \pi_{x_1}(x_2) \perp x_1$$
$\therefore \{x_1, \ x_2 - \pi_{x_1}(x_2)\}$ is an orthogonal basis of $\mathbb{R}^2$. We have orthogonalized the basis $\{x_1, x_2\}$.

The Gram–Schmidt algorithm does this in general, in any inner product space.

[13] And orthonormal.

Theorem 5.6.1 (Gram–Schmidt). Let $\{x_1, \dots, x_n\}$ be a basis of an inner product space $(V, ( \ , \ ))$. Define vectors $v_i$ recursively by $v_1 = x_1$ and
$$v_{k+1} = x_{k+1} - \sum_{i=1}^k \pi_{v_i}(x_{k+1})$$
Then $\{v_1, \dots, v_n\}$ is an orthogonal basis of $V$. If desired, one can easily form an orthonormal basis
$$\{u_1, \dots, u_n\} = \left\{ \frac{v_1}{\|v_1\|}, \dots, \frac{v_n}{\|v_n\|} \right\}$$

Proof. Fix $k < n$ and suppose that $\{v_1, \dots, v_k\}$ is an orthogonal basis of $\mathrm{Span}(x_1, \dots, x_k)$. Observe:
- $v_{k+1} = x_{k+1} - \sum_{i=1}^k \pi_{v_i}(x_{k+1}) \in \mathrm{Span}(x_1, \dots, x_{k+1})$
- If $i \leq k$, then $(v_{k+1}, v_i) = 0$

Hence $\{v_1, \dots, v_{k+1}\}$ is an orthogonal (hence linearly independent) spanning set of $\mathrm{Span}(x_1, \dots, x_{k+1})$. I.e. $\{v_1, \dots, v_{k+1}\}$ is an orthogonal basis of $\mathrm{Span}(x_1, \dots, x_{k+1})$. The result follows by induction.

Example. Orthonormalize the basis $\left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} \right\}$ of $\mathbb{R}^3$.

Label the vectors in order $x_1, x_2, x_3$ and apply Gram–Schmidt:
$$v_1 = x_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$
$$v_2 = x_2 - \pi_{v_1}(x_2) = \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} - \frac{\left( \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} \right)}{2}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$$
$$v_3 = x_3 - \pi_{v_1}(x_3) - \pi_{v_2}(x_3) = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{-1}{2}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$
$\{v_1, v_2, v_3\}$ is an orthogonal basis, whence
$$\left\{ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right\}$$
is an orthonormal basis.
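A minimal NumPy sketch of the algorithm (the function name `gram_schmidt` is just illustrative), applied to the example above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors."""
    basis = []
    for xk in vectors:
        vk = xk.astype(float)
        for vi in basis:
            vk = vk - (vi @ xk) / (vi @ vi) * vi   # subtract projection onto vi
        basis.append(vk)
    return basis

xs = [np.array([1, 0, 1]), np.array([3, 0, 1]), np.array([0, 2, 1])]
vs = gram_schmidt(xs)
print(vs)                                   # [1,0,1], [1,0,-1], [0,2,0]
us = [v / np.linalg.norm(v) for v in vs]    # orthonormal basis
```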

Gram–Schmidt is an algorithm which depends on the order of the inputs $x_1, \dots, x_n$.

Example. The Gram–Schmidt orthonormalization of $\left\{ \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} \right\}$ is
$$\left\{ \frac{1}{\sqrt{5}}\begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}, \ \frac{1}{3\sqrt{5}}\begin{pmatrix} 5 \\ -2 \\ 4 \end{pmatrix}, \ \frac{1}{3}\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix} \right\}$$
completely different from the previous example.

Example. Find an orthonormal basis of $\mathrm{Span}(1, x, x^2)$ in $(C[-1, 1], L^2)$.
$$v_1 = 1, \qquad v_2 = x - \frac{(1, x)}{\|1\|^2}\cdot 1 = x - \frac{\int_{-1}^1 x\,dx}{\int_{-1}^1 1\,dx} = x$$
$$v_3 = x^2 - \frac{(1, x^2)}{\|1\|^2}\cdot 1 - \frac{(x, x^2)}{\|x\|^2}\cdot x = x^2 - \frac{\int_{-1}^1 x^2\,dx}{\int_{-1}^1 1\,dx} - \frac{\int_{-1}^1 x^3\,dx}{\int_{-1}^1 x^2\,dx}x = x^2 - \frac{1}{3}$$
To normalize, divide through by the norms:
$$\left\{ \frac{1}{\sqrt{2}}, \ \sqrt{\frac{3}{2}}\,x, \ \sqrt{\frac{45}{8}}\left( x^2 - \frac{1}{3} \right) \right\}$$
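The same calculation can be reproduced symbolically; a sketch assuming SymPy, reusing the Gram–Schmidt recursion with the $L^2$ inner product on $[-1, 1]$:

```python
import sympy as sp

x = sp.symbols('x')

def ip(f, g):
    # L^2 inner product on C[-1, 1]
    return sp.integrate(f * g, (x, -1, 1))

def gram_schmidt(funcs):
    ortho = []
    for f in funcs:
        v = f - sum(ip(u, f) / ip(u, u) * u for u in ortho)
        ortho.append(sp.expand(v))
    return ortho

vs = gram_schmidt([sp.Integer(1), x, x**2])
print(vs)                                                # [1, x, x**2 - 1/3]
print([sp.simplify(v / sp.sqrt(ip(v, v))) for v in vs])
# normalized versions, equal to 1/sqrt(2), sqrt(3/2)*x, sqrt(45/8)*(x^2 - 1/3)
```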