MATH 532: Chapter 5: Norms, Inner Products and Orthogonality

Greg Fasshauer

Department of Applied Mathematics, Illinois Institute of Technology

Spring 2015

Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Vector Norms

Definition
Let x, y ∈ Rⁿ (Cⁿ). Then

xᵀy = Σ_{i=1}^n x_i y_i ∈ R,
x*y = Σ_{i=1}^n x̄_i y_i ∈ C,

is called the standard inner product for Rⁿ (Cⁿ).

Definition

Let V be a vector space. A function ‖·‖ : V → R≥0 is called a norm provided for any x, y ∈ V and α ∈ R

1 ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0,

2 ‖αx‖ = |α| ‖x‖,

3 ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Remark The inequality in (3) is known as the triangle inequality.

[email protected] MATH 532 5 Vector Norms

Remark Any inner product h·, ·i induces a norm via (more later) p kxk = hx, xi.

We will show that the standard inner product induces the Euclidean norm (cf. length of a vector).

Remark Inner products let us define angles via

cos θ = xᵀy / (‖x‖ ‖y‖).

In particular, x, y are orthogonal if and only if xᵀy = 0.

[email protected] MATH 532 6 Vector Norms

Example
Let x ∈ Rⁿ and consider the Euclidean norm

‖x‖_2 = √(xᵀx) = ( Σ_{i=1}^n x_i² )^{1/2}.

We show that ‖·‖_2 is a norm. We do this for the real case, but the complex case goes analogously.

1 Clearly, ‖x‖_2 ≥ 0. Also,

‖x‖_2 = 0 ⟺ ‖x‖_2² = 0 ⟺ Σ_{i=1}^n x_i² = 0 ⟺ x_i = 0, i = 1, ..., n ⟺ x = 0.

Example (cont.)

2 We have

‖αx‖_2 = ( Σ_{i=1}^n (αx_i)² )^{1/2} = |α| ( Σ_{i=1}^n x_i² )^{1/2} = |α| ‖x‖_2.

3 To establish (3) we need the following lemma.

Lemma
Let x, y ∈ Rⁿ. Then

|xᵀy| ≤ ‖x‖_2 ‖y‖_2.   (Cauchy–Schwarz–Bunyakovsky)

Moreover, equality holds if and only if y = αx with

α = xᵀy / ‖x‖_2².

Motivation for Proof of Cauchy–Schwarz–Bunyakovsky

As already alluded to above, the angle θ between two vectors a and b is related to the inner product by

cos θ = aᵀb / (‖a‖ ‖b‖).

Using trigonometry as in the figure, the projection of a onto b is then

‖a‖ cos θ · b/‖b‖ = ‖a‖ (aᵀb / (‖a‖ ‖b‖)) b/‖b‖ = (aᵀb / ‖b‖²) b.

Now, we let y = a and x = b, so that the projection of y onto x is given by αx, where α = xᵀy / ‖x‖_2².

Proof of Cauchy–Schwarz–Bunyakovsky

We know that ‖y − αx‖_2² ≥ 0 since it’s (the square of) a norm. Therefore,

0 ≤ ‖y − αx‖_2² = (y − αx)ᵀ(y − αx)
               = yᵀy − 2α xᵀy + α² xᵀx
               = yᵀy − 2 (xᵀy / ‖x‖_2²) xᵀy + ((xᵀy)² / ‖x‖_2⁴) ‖x‖_2²
               = ‖y‖_2² − (xᵀy)² / ‖x‖_2².

This implies (xᵀy)² ≤ ‖x‖_2² ‖y‖_2², and the Cauchy–Schwarz–Bunyakovsky inequality follows by taking square roots.

Proof (cont.)
Now we look at the equality claim.

“⟹”: Assume that |xᵀy| = ‖x‖_2 ‖y‖_2. But then the first part of the proof shows that ‖y − αx‖_2 = 0, so that y = αx.

“⟸”: Assume y = αx. Then

|xᵀy| = |xᵀ(αx)| = |α| ‖x‖_2²,
‖x‖_2 ‖y‖_2 = ‖x‖_2 ‖αx‖_2 = |α| ‖x‖_2²,

so that we have equality. ∎

[email protected] MATH 532 11 Vector Norms

Example (cont.)

3 Now we can show that ‖·‖_2 satisfies the triangle inequality:

‖x + y‖_2² = (x + y)ᵀ(x + y) = xᵀx + xᵀy + yᵀx + yᵀy
           = ‖x‖_2² + 2xᵀy + ‖y‖_2²
           ≤ ‖x‖_2² + 2|xᵀy| + ‖y‖_2²
           ≤ ‖x‖_2² + 2‖x‖_2‖y‖_2 + ‖y‖_2²   (by CSB)
           = (‖x‖_2 + ‖y‖_2)².

Now we just need to take square roots to have the triangle inequality.

[email protected] MATH 532 12 Vector Norms

Lemma
Let x, y ∈ Rⁿ. Then we have the backward triangle inequality

| ‖x‖ − ‖y‖ | ≤ ‖x − y‖.

Proof
We write, by the triangle inequality,

‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖.

But this implies ‖x‖ − ‖y‖ ≤ ‖x − y‖.

[email protected] MATH 532 13 Vector Norms

Proof (cont.) Switch the roles of x and y to get

kyk − kxk ≤ ky − xk ⇐⇒ − (kxk − kyk) ≤ kx − yk.

Together with the previous inequality we have

|kxk − kyk| ≤ kx − yk.



[email protected] MATH 532 14 Vector Norms Other common norms

ℓ1-norm (or taxicab norm, Manhattan norm):

‖x‖_1 = Σ_{i=1}^n |x_i|

ℓ∞-norm (or maximum norm, Chebyshev norm):

‖x‖_∞ = max_{1≤i≤n} |x_i|

ℓp-norm:

‖x‖_p = ( Σ_{i=1}^n |x_i|^p )^{1/p}

Remark
In the homework you will use Hölder’s and Minkowski’s inequalities to show that the p-norm is a norm.
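To make these definitions concrete, here is a minimal NumPy check (the vector x, the value of p, and the comparison against numpy.linalg.norm are my own illustration, not part of the course notes):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# Direct implementations of the definitions above
norm_1 = np.sum(np.abs(x))              # l1-norm
norm_2 = np.sqrt(np.sum(x**2))          # Euclidean norm
norm_inf = np.max(np.abs(x))            # maximum norm
p = 3
norm_p = np.sum(np.abs(x)**p)**(1.0/p)  # general p-norm

# Compare with NumPy's built-in norms
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
assert np.isclose(norm_p, np.linalg.norm(x, p))
print(norm_1, norm_2, norm_inf, norm_p)
```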

Remark
We now show that ‖x‖_∞ = lim_{p→∞} ‖x‖_p.

Let’s use tildes to mark all components of x that are maximal in absolute value, i.e.,

x̃_1 = x̃_2 = ... = x̃_k = max_{1≤i≤n} |x_i|.

The remaining components are then x̃_{k+1}, ..., x̃_n. This implies that

| x̃_i / x̃_1 | < 1,   for i = k + 1, ..., n.

[email protected] MATH 532 16 Vector Norms

Remark (cont.) Now

‖x‖_p = ( Σ_{i=1}^n |x̃_i|^p )^{1/p}
      = |x̃_1| ( k + |x̃_{k+1}/x̃_1|^p + ... + |x̃_n/x̃_1|^p )^{1/p},

where each ratio inside the parentheses is less than 1 in absolute value.

Since all terms inside the parentheses except for k go to 0 as p → ∞, and k^{1/p} → 1, the whole factor (·)^{1/p} → 1 as p → ∞. And so

‖x‖_p → |x̃_1| = max_{1≤i≤n} |x_i| = ‖x‖_∞.

[email protected] MATH 532 17 Vector Norms

Figure: Unit “balls” in R² for the ℓ1, ℓ2 and ℓ∞ norms.

Note that B_1 ⊆ B_2 ⊆ B_∞ since, e.g., for the vector (√2/2, √2/2)ᵀ,

‖(√2/2, √2/2)ᵀ‖_1 = √2,   ‖(√2/2, √2/2)ᵀ‖_2 = √(1/2 + 1/2) = 1,   ‖(√2/2, √2/2)ᵀ‖_∞ = √2/2,

so that

‖(√2/2, √2/2)ᵀ‖_1 ≥ ‖(√2/2, √2/2)ᵀ‖_2 ≥ ‖(√2/2, √2/2)ᵀ‖_∞.

In fact, we have in general (similar to HW)

‖x‖_1 ≥ ‖x‖_2 ≥ ‖x‖_∞,   for any x ∈ Rⁿ.

Norm equivalence

Definition
Two norms ‖·‖ and ‖·‖′ on a vector space V are called equivalent if there exist constants α, β such that

α ≤ ‖x‖ / ‖x‖′ ≤ β   for all x (≠ 0) ∈ V.

Example

‖·‖_1 and ‖·‖_2 are equivalent since from above ‖x‖_1 ≥ ‖x‖_2 and also ‖x‖_1 ≤ √n ‖x‖_2 (see HW), so that

α = 1 ≤ ‖x‖_1 / ‖x‖_2 ≤ √n = β.

Remark In fact, all norms on finite-dimensional vector spaces are equivalent.

Matrix Norms

Matrix norms are special norms: they will satisfy one additional property. This property should help us measure ‖AB‖ for two matrices A, B of appropriate sizes.

We look at the simplest matrix norm, the Frobenius norm, defined for A ∈ R^{m×n}:

‖A‖_F = ( Σ_{i=1}^m Σ_{j=1}^n |a_ij|² )^{1/2} = ( Σ_{i=1}^m ‖A_{i*}‖_2² )^{1/2} = ( Σ_{j=1}^n ‖A_{*j}‖_2² )^{1/2} = √(trace(AᵀA)),

i.e., the Frobenius norm is just the 2-norm of the vector that contains all elements of the matrix.

Now

‖Ax‖_2² = Σ_{i=1}^m |A_{i*}x|² ≤ Σ_{i=1}^m ‖A_{i*}‖_2² ‖x‖_2² = ‖A‖_F² ‖x‖_2²   (by CSB),

so that ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2.

We can generalize this to matrices, i.e., we have

‖AB‖_F ≤ ‖A‖_F ‖B‖_F,

which motivates us to require this submultiplicativity for any matrix norm.

Definition
A matrix norm is a function ‖·‖ from the set of all real (or complex) matrices of finite size into R≥0 that satisfies
1 ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A = O (a matrix of all zeros).
2 ‖αA‖ = |α| ‖A‖ for all α ∈ R.
3 ‖A + B‖ ≤ ‖A‖ + ‖B‖ (requires A, B to be of the same size).
4 ‖AB‖ ≤ ‖A‖ ‖B‖ (requires A, B to have appropriate sizes).

Remark
This definition is usually too general. In addition to the Frobenius norm, most useful matrix norms are induced by a vector norm.

Induced matrix norms

Theorem
Let ‖·‖_(m) and ‖·‖_(n) be vector norms on R^m and Rⁿ, respectively, and let A be an m × n matrix. Then

‖A‖ = max_{‖x‖_(n)=1} ‖Ax‖_(m)

is a matrix norm called the induced matrix norm.

Remark
Here the vector norm could be any vector norm. In particular, any p-norm. For example, we could have

‖A‖_2 = max_{‖x‖_{2,(n)}=1} ‖Ax‖_{2,(m)}.

To keep notation simple we often drop indices.

[email protected] MATH 532 24 Matrix Norms

Proof
1 ‖A‖ ≥ 0 is obvious since this holds for the vector norm. It remains to show that ‖A‖ = 0 if and only if A = O. Assume A = O; then

‖A‖ = max_{‖x‖=1} ‖Ax‖ = 0   (since Ax = 0 for every x).

So now consider A ≠ O. We need to show that ‖A‖ > 0. There must exist a column of A that is not 0. We call this column A_{*k} and take x = e_k (with ‖e_k‖ = 1). Then

‖A‖ = max_{‖x‖=1} ‖Ax‖ ≥ ‖Ae_k‖ = ‖A_{*k}‖ > 0

since A_{*k} ≠ 0.

[email protected] MATH 532 25 Matrix Norms

Proof (cont.) 2 Using the corresponding property for the vector norm we have

‖αA‖ = max_{‖x‖=1} ‖αAx‖ = |α| max_{‖x‖=1} ‖Ax‖ = |α| ‖A‖.

3 Also straightforward (based on the triangle inequality for the vector norm):

‖A + B‖ = max_{‖x‖=1} ‖(A + B)x‖ = max_{‖x‖=1} ‖Ax + Bx‖ ≤ max_{‖x‖=1} (‖Ax‖ + ‖Bx‖) ≤ max_{‖x‖=1} ‖Ax‖ + max_{‖x‖=1} ‖Bx‖ = ‖A‖ + ‖B‖.

[email protected] MATH 532 26 Matrix Norms Proof (cont.) 4 First note that kAxk max kAxk = max kxk=1 x6=0 kxk and so kAxk kAxk kAk = max kAxk = max ≥ . kxk=1 x6=0 kxk kxk Therefore kAxk ≤ kAkkxk. (1) But then we also have kABk ≤ kAkkBk since

kABk = max kABxk = kAByk (for some y with kyk = 1) kxk=1 (1) (1) ≤ kAkkByk ≤ kAkkBk kyk . |{z} =1

Remark
One can show (see HW) that, if A is invertible,

min_{‖x‖=1} ‖Ax‖ = 1 / ‖A⁻¹‖.

The induced matrix norm can be interpreted geometrically:

‖A‖: the most a vector on the unit sphere can be stretched when transformed by A.
1/‖A⁻¹‖: the most a vector on the unit sphere can be shrunk when transformed by A.

[email protected] MATH 532 28

Figure: induced matrix 2-norm, from [Mey00].

Matrix 2-norm

Theorem
Let A be an m × n matrix. Then

1 ‖A‖_2 = max_{‖x‖_2=1} ‖Ax‖_2 = √λ_max,

2 ‖A⁻¹‖_2 = 1 / min_{‖x‖_2=1} ‖Ax‖_2 = 1 / √λ_min,

where λ_max and λ_min are the largest and smallest eigenvalues of AᵀA, respectively.

Remark
We also have √λ_max = σ_1, the largest singular value of A, and √λ_min = σ_n, the smallest singular value of A.
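A small numerical illustration of this theorem (the example matrix is my own choice; numpy.linalg.norm(A, 2) returns the induced 2-norm):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

eigvals = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of A^T A, ascending
lam_min, lam_max = eigvals[0], eigvals[-1]

assert np.isclose(np.sqrt(lam_max), np.linalg.norm(A, 2))                    # ||A||_2 = sqrt(lam_max)
assert np.isclose(1/np.sqrt(lam_min), np.linalg.norm(np.linalg.inv(A), 2))   # ||A^-1||_2 = 1/sqrt(lam_min)
```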

[email protected] MATH 532 29 Matrix Norms

Proof We will show only (1), the largest singular value ((2) goes similarly).

The idea is to solve a constrained optimization problem (as in calculus), i.e.,

maximize f(x) = ‖Ax‖_2² = (Ax)ᵀAx
subject to g(x) = ‖x‖_2² = xᵀx = 1.

We do this by introducing a Lagrange multiplier λ and defining

h(x, λ) = f(x) − λ g(x) = xᵀAᵀAx − λ xᵀx.

[email protected] MATH 532 30 Matrix Norms

Proof (cont.)
Necessary and sufficient (since quadratic) condition for a maximum:

∂h/∂x_i = 0, i = 1, ..., n,   g(x) = 1.

∂/∂x_i ( xᵀAᵀAx − λ xᵀx ) = (∂xᵀ/∂x_i) AᵀAx + xᵀAᵀA (∂x/∂x_i) − λ (∂xᵀ/∂x_i) x − λ xᵀ (∂x/∂x_i)
                          = 2 e_iᵀAᵀAx − 2λ e_iᵀx
                          = 2 [ (AᵀAx)_i − (λx)_i ],   i = 1, ..., n.

Together this yields

AᵀAx − λx = 0 ⟺ (AᵀA − λI) x = 0,

so that λ must be an eigenvalue of AᵀA (since g(x) = xᵀx = 1 ensures x ≠ 0).

[email protected] MATH 532 31 Matrix Norms

Proof (cont.)
In fact, as we now show, λ is the maximal eigenvalue. First,

AᵀAx = λx ⟹ xᵀAᵀAx = λ xᵀx = λ,

so that ‖Ax‖_2 = √(xᵀAᵀAx) = √λ. And then

‖A‖_2 = max_{‖x‖_2=1} ‖Ax‖_2 = max √λ = √λ_max.



Special properties of the 2-norm

1 ‖A‖_2 = max_{‖x‖_2=1} max_{‖y‖_2=1} |yᵀAx|
2 ‖A‖_2 = ‖Aᵀ‖_2
3 ‖AᵀA‖_2 = ‖A‖_2² = ‖AAᵀ‖_2
4 ‖ [ A O; O B ] ‖_2 = max{ ‖A‖_2, ‖B‖_2 }
5 ‖UᵀAV‖_2 = ‖A‖_2 provided UᵀU = I and VᵀV = I (orthogonal matrices).

Remark The proof is a HW problem.

Matrix 1-norm and ∞-norm

Theorem
Let A be an m × n matrix. Then we have
1 the column sum norm

‖A‖_1 = max_{‖x‖_1=1} ‖Ax‖_1 = max_{j=1,...,n} Σ_{i=1}^m |a_ij|,

2 and the row sum norm

‖A‖_∞ = max_{‖x‖_∞=1} ‖Ax‖_∞ = max_{i=1,...,m} Σ_{j=1}^n |a_ij|.

Remark We know these are norms, so what we need to do is verify that the formulas hold. We will show (1).
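The column-sum and row-sum formulas are also easy to check numerically; a minimal sketch with an arbitrary test matrix of my own:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0]])

col_sum_norm = np.max(np.sum(np.abs(A), axis=0))  # max over columns of column sums
row_sum_norm = np.max(np.sum(np.abs(A), axis=1))  # max over rows of row sums

assert np.isclose(col_sum_norm, np.linalg.norm(A, 1))       # ||A||_1
assert np.isclose(row_sum_norm, np.linalg.norm(A, np.inf))  # ||A||_inf
```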

Proof

First we look at ‖Ax‖_1.

‖Ax‖_1 = Σ_{i=1}^m |(Ax)_i| = Σ_{i=1}^m |A_{i*}x| = Σ_{i=1}^m | Σ_{j=1}^n a_ij x_j |
       ≤ Σ_{i=1}^m Σ_{j=1}^n |a_ij| |x_j|   (triangle inequality)
       = Σ_{j=1}^n |x_j| [ Σ_{i=1}^m |a_ij| ]
       ≤ [ max_{j=1,...,n} Σ_{i=1}^m |a_ij| ] Σ_{j=1}^n |x_j|.

Since we actually need to look at ‖Ax‖_1 for ‖x‖_1 = 1, we note that ‖x‖_1 = Σ_{j=1}^n |x_j| and therefore have

‖Ax‖_1 ≤ max_{j=1,...,n} Σ_{i=1}^m |a_ij|.

[email protected] MATH 532 35 Matrix Norms

Proof (cont.)

We even have equality since for x = e_k, where k is the index such that A_{*k} has maximum column sum, we get

‖Ax‖_1 = ‖Ae_k‖_1 = ‖A_{*k}‖_1 = Σ_{i=1}^m |a_ik| = max_{j=1,...,n} Σ_{i=1}^m |a_ij|

due to our choice of k. Since ‖e_k‖_1 = 1 we indeed have the desired formula. ∎

Inner Product Spaces

Definition
A general inner product on a real (complex) vector space V is a symmetric (Hermitian) function ⟨·, ·⟩ : V × V → R (C), i.e.,
1 ⟨x, x⟩ ∈ R≥0 with ⟨x, x⟩ = 0 if and only if x = 0.
2 ⟨x, αy⟩ = α⟨x, y⟩ for all scalars α.
3 ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
4 ⟨x, y⟩ = ⟨y, x⟩ (or ⟨x, y⟩ equal to the complex conjugate of ⟨y, x⟩ if complex).

Remark
The following two properties (providing bilinearity) are implied (see HW):

⟨αx, y⟩ = α⟨x, y⟩,   ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

As before, any inner product induces a norm via

k · k = ph·, ·i.

One can show (analogous to the Euclidean case) that k · k is a norm.

In particular, we have a general Cauchy–Schwarz–Bunyakovsky inequality |hx, yi| ≤ kxk kyk.

Example
1 ⟨x, y⟩ = xᵀy (or x*y), the standard inner product for Rⁿ (Cⁿ).
2 For nonsingular matrices A we get the A-inner product on Rⁿ, i.e.,

⟨x, y⟩ = xᵀAᵀAy

with

‖x‖_A = √⟨x, x⟩ = √(xᵀAᵀAx) = ‖Ax‖_2.

3 If V = R^{m×n} (or C^{m×n}) then we get the standard inner product for matrices, i.e.,

⟨A, B⟩ = trace(AᵀB)   (or trace(A*B))

with

‖A‖ = √⟨A, A⟩ = √(trace(AᵀA)) = ‖A‖_F.

[email protected] MATH 532 40 Inner Product Spaces

Remark
In the infinite-dimensional setting we have, e.g., for f, g continuous functions on (a, b),

⟨f, g⟩ = ∫_a^b f(t) g(t) dt

with

‖f‖ = √⟨f, f⟩ = ( ∫_a^b (f(t))² dt )^{1/2}.

Parallelogram identity

In any inner product space the so-called parallelogram identity holds, i.e.,

‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² ). (2)

This is true since

‖x + y‖² + ‖x − y‖² = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
                    = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ + ⟨x, x⟩ − ⟨x, y⟩ − ⟨y, x⟩ + ⟨y, y⟩
                    = 2⟨x, x⟩ + 2⟨y, y⟩ = 2( ‖x‖² + ‖y‖² ).

Polarization identity

The following theorem shows that we not only get a norm from an inner product (i.e., every Hilbert space is a Banach space), but, if the parallelogram identity holds, then we can get an inner product from a norm (i.e., a Banach space becomes a Hilbert space).

Theorem
Let V be a real vector space with norm ‖·‖. If the parallelogram identity (2) holds then

⟨x, y⟩ = (1/4)( ‖x + y‖² − ‖x − y‖² ) (3)

is an inner product on V.

[email protected] MATH 532 43 Inner Product Spaces

Proof We need to show that all four properties of a general inner product hold. 1 Nonnegativity:

⟨x, x⟩ = (1/4)( ‖x + x‖² − ‖x − x‖² ) = (1/4) ‖2x‖² = ‖x‖² ≥ 0.

Moreover, ⟨x, x⟩ = 0 if and only if x = 0 since ⟨x, x⟩ = ‖x‖².

4 Symmetry: ⟨x, y⟩ = ⟨y, x⟩ is clear since ‖x − y‖ = ‖y − x‖.

[email protected] MATH 532 44 Inner Product Spaces

Proof (cont.) 3 Additivity: The parallelogram identity implies

‖x + y‖² + ‖x + z‖² = (1/2)( ‖x + y + x + z‖² + ‖y − z‖² ) (4)

and

‖x − y‖² + ‖x − z‖² = (1/2)( ‖x − y + x − z‖² + ‖z − y‖² ). (5)

Subtracting (5) from (4) we get

‖x + y‖² − ‖x − y‖² + ‖x + z‖² − ‖x − z‖² = (1/2)( ‖2x + y + z‖² − ‖2x − y − z‖² ). (6)

[email protected] MATH 532 45 Inner Product Spaces

Proof (cont.) The specific form of the polarized inner product implies

⟨x, y⟩ + ⟨x, z⟩ = (1/4)( ‖x + y‖² − ‖x − y‖² + ‖x + z‖² − ‖x − z‖² )
               = (1/8)( ‖2x + y + z‖² − ‖2x − y − z‖² )   (by (6))
               = (1/2)( ‖x + (y + z)/2‖² − ‖x − (y + z)/2‖² )
               = 2 ⟨x, (y + z)/2⟩   (by polarization). (7)

Setting z = 0 in (7) yields

⟨x, y⟩ = 2 ⟨x, y/2⟩ (8)

since ⟨x, z⟩ = ⟨x, 0⟩ = 0.

[email protected] MATH 532 46 Inner Product Spaces

Proof (cont.) To summarize, we have y + z hx, yi + hx, zi = 2hx, i. (7) 2 and y hx, yi = 2hx, i. (8) 2 Since (8) is true for any y ∈ V we can, in particular, set y = y + z so that we have y + z hx, y + zi = 2hx, i. 2 This, however, is the right-hand side of (7) so that we end up with

hx, y + zi = hx, yi + hx, zi, as desired.

[email protected] MATH 532 47 Inner Product Spaces

Proof (cont.)
2 Homogeneity: To show ⟨x, αy⟩ = α⟨x, y⟩ for integer α we can just repeatedly apply the additivity property just proved.

From this we can get the property for rational α as follows. We let α = β/γ with integers β, γ ≠ 0, so that

βγ ⟨x, y⟩ = ⟨γx, βy⟩ = γ² ⟨x, (β/γ) y⟩.

Dividing by γ² we get

⟨x, (β/γ) y⟩ = (β/γ) ⟨x, y⟩.

[email protected] MATH 532 48 Inner Product Spaces

Proof (cont.)
Finally, for real α we use the continuity of the norm function (see HW), which implies that our inner product ⟨·, ·⟩ is also continuous.

Now we take a sequence {α_n} of rational numbers such that α_n → α for n → ∞ and have, by continuity,

⟨x, α_n y⟩ → ⟨x, αy⟩ as n → ∞, while also ⟨x, α_n y⟩ = α_n ⟨x, y⟩ → α⟨x, y⟩.



[email protected] MATH 532 49 Inner Product Spaces

Theorem The only vector p-norm induced by an inner product is the 2-norm.

Remark Since many problems are more easily dealt with in inner product spaces (since we then have lengths and angles, see next section) the 2-norm has a clear advantage over other p-norms.

[email protected] MATH 532 50 Inner Product Spaces

Proof
We know that the 2-norm is induced by an inner product, i.e., ‖x‖_2 = √(xᵀx).

Therefore we need to show that it doesn’t work for p ≠ 2. We do this by showing that the parallelogram identity

‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² )

fails for p ≠ 2. We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.

[email protected] MATH 532 51 Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e_1 and y = e_2, so that

‖x + y‖_p² = ‖e_1 + e_2‖_p² = ( Σ_{i=1}^n |[e_1 + e_2]_i|^p )^{2/p} = 2^{2/p}

and, similarly,

‖x − y‖_p² = ‖e_1 − e_2‖_p² = 2^{2/p}.

Together, the left-hand side of the parallelogram identity is 2 · 2^{2/p} = 2^{2/p + 1}.

[email protected] MATH 532 52 Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

‖x‖_p² = ‖e_1‖_p² = 1 = ‖e_2‖_p² = ‖y‖_p²,

so that the right-hand side comes out to 4. Finally, we have

2^{2/p + 1} = 4 ⟺ 2/p + 1 = 2 ⟺ 2/p = 1, i.e., p = 2.



[email protected] MATH 532 53 Orthogonal Vectors

Orthogonal Vectors

We will now work in a general inner product space V with induced norm ‖·‖ = √⟨·, ·⟩.

Definition
Two vectors x, y ∈ V are called orthogonal if

⟨x, y⟩ = 0.

We often use the notation x ⊥ y.

In the HW you will prove the Pythagorean theorem for the 2-norm and standard inner product xᵀy, i.e.,

‖x‖² + ‖y‖² = ‖x − y‖² ⟺ xᵀy = 0.

Moreover, the law of cosines states

‖x − y‖² = ‖x‖² + ‖y‖² − 2 ‖x‖ ‖y‖ cos θ,

so that

cos θ = ( ‖x‖² + ‖y‖² − ‖x − y‖² ) / (2 ‖x‖ ‖y‖) = 2xᵀy / (2 ‖x‖ ‖y‖)

(using ‖x − y‖² = ‖x‖² + ‖y‖² − 2xᵀy).

This motivates our general definition of angles:

Definition
Let x, y ∈ V. The angle between x and y is defined via

cos θ = ⟨x, y⟩ / ( ‖x‖ ‖y‖ ),   θ ∈ [0, π].

Orthonormal sets

Definition
A set {u_1, u_2, ..., u_n} ⊆ V is called orthonormal if

⟨u_i, u_j⟩ = δ_ij   (the Kronecker delta).

Theorem Every orthonormal set is linearly independent.

Corollary Every orthonormal set of n vectors from an n-dimensional vector space V is an orthonormal basis for V.

[email protected] MATH 532 57 Orthogonal Vectors

Proof (of the theorem)
We want to show linear independence, i.e., that

Σ_{j=1}^n α_j u_j = 0 ⟹ α_j = 0, j = 1, ..., n.

To see this is true we take the inner product with u_i:

⟨u_i, Σ_{j=1}^n α_j u_j⟩ = ⟨u_i, 0⟩ ⟺ Σ_{j=1}^n α_j ⟨u_i, u_j⟩ = 0 ⟺ α_i = 0,

using ⟨u_i, u_j⟩ = δ_ij.

Since i was arbitrary this holds for all i = 1,..., n, and we have linear independence. 

[email protected] MATH 532 58 Orthogonal Vectors

Example
The standard basis of Rⁿ is given by

{e_1, e_2, ..., e_n}.

Using this basis we can express any x ∈ Rⁿ as

x = x_1 e_1 + x_2 e_2 + ... + x_n e_n,

i.e., we get a coordinate expansion of x.

[email protected] MATH 532 59 Orthogonal Vectors

In fact, any other orthonormal basis provides just as simple a representation of x;

Consider the orthonormal basis B = {u1, u2,..., un} and assume

x = Σ_{j=1}^n α_j u_j

for some appropriate scalars α_j. To find these expansion coefficients α_j we take the inner product with u_i, i.e.,

⟨u_i, x⟩ = Σ_{j=1}^n α_j ⟨u_i, u_j⟩ = α_i,

using ⟨u_i, u_j⟩ = δ_ij.

[email protected] MATH 532 60 Orthogonal Vectors

We therefore have proved Theorem

Let {u1, u2,..., un} be an orthonormal basis for an inner product space V. Then any x ∈ V can be written as

x = Σ_{i=1}^n ⟨x, u_i⟩ u_i.

This is a (finite) Fourier expansion with Fourier coefficients ⟨x, u_i⟩.

[email protected] MATH 532 61 Orthogonal Vectors

Remark
The classical (infinite-dimensional) Fourier series for continuous functions on (−π, π) uses the orthogonal (but not yet orthonormal) basis {1, sin t, cos t, sin 2t, cos 2t, ...} and the inner product

⟨f, g⟩ = ∫_{−π}^{π} f(t) g(t) dt.

[email protected] MATH 532 62 Orthogonal Vectors

Example
Consider the basis

B = {u_1, u_2, u_3} = { (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 0, −1)ᵀ }.

It is clear by inspection that B is an orthogonal subset of R³, i.e., using the Euclidean inner product, we have u_iᵀu_j = 0, i, j = 1, 2, 3, i ≠ j.

We can obtain an orthonormal basis by normalizing the vectors, i.e., by computing v_i = u_i / ‖u_i‖_2, i = 1, 2, 3. This yields

v_1 = (1/√2)(1, 0, 1)ᵀ,   v_2 = (0, 1, 0)ᵀ,   v_3 = (1/√2)(1, 0, −1)ᵀ.

[email protected] MATH 532 63 Orthogonal Vectors

Example (cont.)
The Fourier expansion of x = (1, 2, 3)ᵀ is given by

x = Σ_{i=1}^3 (xᵀv_i) v_i
  = (4/√2) (1/√2)(1, 0, 1)ᵀ + 2 (0, 1, 0)ᵀ − (2/√2) (1/√2)(1, 0, −1)ᵀ
  = (2, 0, 2)ᵀ + (0, 2, 0)ᵀ + (−1, 0, 1)ᵀ.
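A quick NumPy verification of this Fourier expansion (the variable names are mine):

```python
import numpy as np

# Orthonormal basis from the example
v1 = np.array([1.0, 0.0,  1.0]) / np.sqrt(2)
v2 = np.array([0.0, 1.0,  0.0])
v3 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)

x = np.array([1.0, 2.0, 3.0])

# Fourier expansion: x = sum_i (x^T v_i) v_i
expansion = sum((x @ v) * v for v in (v1, v2, v3))
assert np.allclose(expansion, x)
```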

Gram–Schmidt Orthogonalization & QR Factorization

We want to convert an arbitrary basis {x_1, x_2, ..., x_n} of V to an orthonormal basis {u_1, u_2, ..., u_n}.

Idea: construct u_1, u_2, ..., u_n successively so that {u_1, u_2, ..., u_k} is an ON basis for span{x_1, x_2, ..., x_k}, k = 1, ..., n.

Construction

k = 1:   u_1 = x_1 / ‖x_1‖.

k = 2:   Consider the projection of x_2 onto u_1, i.e.,

⟨u_1, x_2⟩ u_1.

Then

v_2 = x_2 − ⟨u_1, x_2⟩ u_1   and   u_2 = v_2 / ‖v_2‖.

[email protected] MATH 532 67 Gram–Schmidt Orthogonalization & QR Factorization

In general, consider {u_1, ..., u_k} as a given ON basis for span{x_1, ..., x_k}. Use the Fourier expansion to express x_{k+1} with respect to {u_1, ..., u_{k+1}}:

x_{k+1} = Σ_{i=1}^{k+1} ⟨u_i, x_{k+1}⟩ u_i
⟺ x_{k+1} = Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i + ⟨u_{k+1}, x_{k+1}⟩ u_{k+1}
⟺ u_{k+1} = ( x_{k+1} − Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i ) / ⟨u_{k+1}, x_{k+1}⟩ = v_{k+1} / ⟨u_{k+1}, x_{k+1}⟩.

This vector, however, is not yet normalized.

[email protected] MATH 532 68 Gram–Schmidt Orthogonalization & QR Factorization

We now want ‖u_{k+1}‖ = 1, i.e.,

√⟨ v_{k+1}/⟨u_{k+1}, x_{k+1}⟩, v_{k+1}/⟨u_{k+1}, x_{k+1}⟩ ⟩ = ‖v_{k+1}‖ / |⟨u_{k+1}, x_{k+1}⟩| = 1

⟹ |⟨u_{k+1}, x_{k+1}⟩| = ‖v_{k+1}‖ = ‖ x_{k+1} − Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i ‖.

Therefore

⟨u_{k+1}, x_{k+1}⟩ = ± ‖ x_{k+1} − Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i ‖.

Since the factor ±1 changes neither the span, nor orthogonality, nor normalization, we can pick the positive sign.

Gram–Schmidt algorithm

Summarizing, we have

u_1 = x_1 / ‖x_1‖,
v_k = x_k − Σ_{i=1}^{k−1} ⟨u_i, x_k⟩ u_i,   k = 2, ..., n,
u_k = v_k / ‖v_k‖.
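Here is a minimal NumPy sketch of the classical Gram–Schmidt algorithm just summarized (the function name and test matrix are my own; as discussed later, the modified version is preferred numerically):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns of X."""
    m, n = X.shape
    U = np.zeros((m, n))
    U[:, 0] = X[:, 0] / np.linalg.norm(X[:, 0])
    for k in range(1, n):
        # subtract projections onto the previously computed u_1, ..., u_{k-1}
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
        U[:, k] = v / np.linalg.norm(v)
    return U

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U = gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))   # columns are orthonormal
```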

Using matrix notation to describe Gram–Schmidt

We will assume V ⊆ R^m (but this also works in the complex case). Let

U_1 = (0, ..., 0)ᵀ ∈ R^m   (the zero vector)

and for k = 2, 3, ..., n let

U_k = [ u_1 u_2 ··· u_{k−1} ] ∈ R^{m×(k−1)}.

[email protected] MATH 532 71 Gram–Schmidt Orthogonalization & QR Factorization

Then

U_kᵀ x_k = ( u_1ᵀx_k, u_2ᵀx_k, ..., u_{k−1}ᵀx_k )ᵀ

and

U_k U_kᵀ x_k = [ u_1 u_2 ··· u_{k−1} ] ( u_1ᵀx_k, ..., u_{k−1}ᵀx_k )ᵀ = Σ_{i=1}^{k−1} u_i (u_iᵀx_k) = Σ_{i=1}^{k−1} (u_iᵀx_k) u_i.

[email protected] MATH 532 72 Gram–Schmidt Orthogonalization & QR Factorization

Now, Gram–Schmidt says

v_k = x_k − Σ_{i=1}^{k−1} (u_iᵀx_k) u_i = x_k − U_k U_kᵀ x_k = ( I − U_k U_kᵀ ) x_k,   k = 1, 2, ..., n,

where the case k = 1 is also covered by the special definition of U_1.

Remark
U_k U_kᵀ is a projection matrix, and I − U_k U_kᵀ is a complementary projection. We will cover these later.

QR Factorization (via Gram–Schmidt)

Consider an m × n matrix A with rank(A) = n.

We want to convert the set of columns of A, {a_1, a_2, ..., a_n}, to an ON basis {q_1, q_2, ..., q_n} of R(A). From our discussion of Gram–Schmidt we know

q_1 = a_1 / ‖a_1‖,
v_k = a_k − Σ_{i=1}^{k−1} ⟨q_i, a_k⟩ q_i,   k = 2, ..., n,
q_k = v_k / ‖v_k‖.

We now rewrite as follows:

a_1 = ‖a_1‖ q_1,
a_k = ⟨q_1, a_k⟩ q_1 + ... + ⟨q_{k−1}, a_k⟩ q_{k−1} + ‖v_k‖ q_k,   k = 2, ..., n.

We also introduce the new notation

r_11 = ‖a_1‖,   r_kk = ‖v_k‖, k = 2, ..., n.

Then

A = [ a_1 a_2 ··· a_n ]
  = [ q_1 q_2 ··· q_n ] [ r_11  ⟨q_1, a_2⟩  ···  ⟨q_1, a_n⟩
                                 r_22       ···  ⟨q_2, a_n⟩
                                              ⋱       ⋮
                          O                       r_nn      ]
  = QR,

and we have the reduced QR factorization of A.

Remark The matrix Q is m × n with orthonormal columns

The matrix R is n × n upper triangular with positive diagonal entries.

The reduced QR factorization is unique (see HW).

[email protected] MATH 532 76 Gram–Schmidt Orthogonalization & QR Factorization

Example 1 2 0 Find the QR factorization of the matrix A = 0 1 1. 1 0 1   1 √ a1 1 q = = √ 0 , r11 = ka1k = 2 1 ka k 1 2 1 √  T  T 2 v 2 = a2 − q a2 q1, q a2 = √ = 2 = r12 1 1 2 2 √ 1  1  2 √ = 1 − √ 0 =  1  , kv 2k = 3 = r22 0 2 1 −1  1  v 2 1 =⇒ q = = √  1  2 kv k 2 3 −1

[email protected] MATH 532 77 Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.)

v_3 = a_3 − (q_1ᵀa_3) q_1 − (q_2ᵀa_3) q_2

with q_1ᵀa_3 = 1/√2 = r_13 and q_2ᵀa_3 = 0 = r_23. Thus

v_3 = (0, 1, 1)ᵀ − (1/√2)(1/√2)(1, 0, 1)ᵀ − 0 = (1/2)(−1, 2, 1)ᵀ,   ‖v_3‖ = √6/2 = r_33.

So

q_3 = v_3 / ‖v_3‖ = (1/√6)(−1, 2, 1)ᵀ.

[email protected] MATH 532 78 Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.) Together we have

 1 1 1  √ √  √ √ − √ 2 2 √1 2 3 6 √ 2 Q =  0 √1 √2  , R =    3 6   0 3√ 0  √1 − √1 √1 0 0 6 2 3 6

[email protected] MATH 532 79 Gram–Schmidt Orthogonalization & QR Factorization Solving linear systems with the QR factorization

Recall the use of the LU factorization to solve Ax = b. Now, A = QR implies

Ax = b ⇐⇒ QRx = b.

In the special case of a nonsingular n × n matrix A the matrix Q is also n × n with ON columns so that

Q−1 = QT (since QT Q = I)

and QRx = b ⇐⇒ Rx = QT b.

[email protected] MATH 532 80 Gram–Schmidt Orthogonalization & QR Factorization

Therefore we solve Ax = b by the following steps:
1 Compute A = QR.
2 Compute y = Qᵀb.
3 Solve the upper triangular system Rx = y.

Remark This procedure is comparable to the three-step LU solution procedure.
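A sketch of these three steps in NumPy/SciPy (my own example system; scipy.linalg.solve_triangular performs the back substitution):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)        # step 1: A = QR
y = Q.T @ b                   # step 2: y = Q^T b
x = solve_triangular(R, y)    # step 3: back substitution for Rx = y

assert np.allclose(A @ x, b)
```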

The real advantage of the QR factorization lies in the solution of least squares problems.

Consider Ax = b with A ∈ R^{m×n} and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations

AᵀAx = Aᵀb.

Using the QR factorization of A this becomes

(QR)ᵀQRx = (QR)ᵀb ⟺ Rᵀ(QᵀQ)Rx = RᵀQᵀb ⟺ RᵀRx = RᵀQᵀb   (since QᵀQ = I).

Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion)

Rx = Qᵀb.

[email protected] MATH 532 82 Gram–Schmidt Orthogonalization & QR Factorization

Remark This is the same as the QR factorization applied to a square and consistent system Ax = b.

Summary The QR factorization provides a simple and efficient way to solve least squares problems.

The ill-conditioned matrix AᵀA is never computed.

If it is required, then it can be computed from R as RᵀR (in fact, this is the Cholesky factorization of AᵀA).
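A least squares sketch along these lines, assuming NumPy and a small overdetermined example of my own; note that AᵀA is never formed:

```python
import numpy as np

# Overdetermined system: A is m x n with rank n
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])

Q, R = np.linalg.qr(A)               # reduced QR: Q is 4x2, R is 2x2
x_qr = np.linalg.solve(R, Q.T @ b)   # solve Rx = Q^T b

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_qr, x_ref)
```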

[email protected] MATH 532 83 Gram–Schmidt Orthogonalization & QR Factorization Modified Gram–Schmidt

There is still a problem with the QR factorization via Gram–Schmidt:

it is not numerically stable (see HW).

A better — but still not ideal — approach is provided by the modified Gram–Schmidt algorithm.

Idea: rearrange the order of calculation, i.e., write the projection matrices

U_k U_kᵀ = Σ_{i=1}^{k−1} u_i u_iᵀ

as a sum of rank-1 projections.

MGS Algorithm

k = 1:   u_1 ← x_1 / ‖x_1‖,   u_j ← x_j, j = 2, ..., n

for k = 2 : n

    E_k = I − u_{k−1} u_{k−1}ᵀ

    for j = k, ..., n
        u_j ← E_k u_j

    u_k ← u_k / ‖u_k‖
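A minimal NumPy sketch of the MGS algorithm above (the function name and test matrix are mine):

```python
import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt: apply the rank-1 projection E_k = I - u_{k-1} u_{k-1}^T
    to all remaining columns before normalizing the next one."""
    U = X.astype(float)
    n = U.shape[1]
    U[:, 0] /= np.linalg.norm(U[:, 0])
    for k in range(1, n):
        u_prev = U[:, k-1]
        # apply E_k to u_k, ..., u_n
        U[:, k:] -= np.outer(u_prev, u_prev @ U[:, k:])
        U[:, k] /= np.linalg.norm(U[:, k])
    return U

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U = modified_gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))
```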

[email protected] MATH 532 85 Gram–Schmidt Orthogonalization & QR Factorization

Remark The MGS algorithm is theoretically equivalent to the GS algorithm, i.e., in exact arithmetic, but in practice it preserves orthogonality better.

Most stable implementations of the QR factorization use Householder reflections or Givens rotations (more later).

Householder reflections are also more efficient than MGS.

[email protected] MATH 532 86 Unitary and Orthogonal Matrices

Unitary and Orthogonal Matrices

Definition
A real (complex) n × n matrix is called orthogonal (unitary) if its columns form an orthonormal basis for Rⁿ (Cⁿ).

Theorem
Let U be an orthogonal n × n matrix. Then

1 U has orthonormal rows.
2 U⁻¹ = Uᵀ.
3 ‖Ux‖_2 = ‖x‖_2 for all x ∈ Rⁿ, i.e., U is an isometry.

Remark
Analogous properties for unitary matrices are formulated and proved in [Mey00].

11 Orthogonal Projections [email protected] MATH 532 88 Unitary and Orthogonal Matrices

Proof
2 By definition U = [ u_1 ··· u_n ] has orthonormal columns, i.e.,

u_i ⊥ u_j ⟺ u_iᵀu_j = δ_ij ⟺ (UᵀU)_ij = δ_ij ⟺ UᵀU = I.

But UᵀU = I implies Uᵀ = U⁻¹.

1 Therefore the statement about orthonormal rows follows from

UU⁻¹ = UUᵀ = I.

[email protected] MATH 532 89 Unitary and Orthogonal Matrices

Proof (cont.) 3 To show that U is an isometry we assume U is orthogonal. Then, for any x ∈ Rn

2 T kUxk2 = (Ux) (Ux) = xT UT U x |{z} =I 2 = kxk2.



[email protected] MATH 532 90 Unitary and Orthogonal Matrices

Remark
The converse of (3) is also true, i.e., if ‖Ux‖_2 = ‖x‖_2 for all x ∈ Rⁿ then U must be orthogonal. Consider x = e_i. Then

‖Ue_i‖_2² = u_iᵀu_i = ‖e_i‖_2² = 1   (by (3)),

so the columns of U have norm 1. Moreover, for x = e_i + e_j (i ≠ j) we get

‖U(e_i + e_j)‖_2² = u_iᵀu_i + u_iᵀu_j + u_jᵀu_i + u_jᵀu_j = ‖e_i + e_j‖_2² = 2   (by (3)),

and since u_iᵀu_i = u_jᵀu_j = 1, it follows that u_iᵀu_j = 0 for i ≠ j and the columns of U are orthogonal.

[email protected] MATH 532 91 Unitary and Orthogonal Matrices

Example
The simplest orthogonal matrix is the identity matrix I. Permutation matrices are orthogonal, e.g.,

P = [ 1 0 0
      0 0 1
      0 1 0 ].

In fact, for this permutation matrix we even have Pᵀ = P, so that PᵀP = P² = I. Such matrices are called involutory (see pretest).

An orthogonal matrix can be viewed as a unitary matrix, but a unitary matrix may not be orthogonal. For example, for

A = (1/√2) [ 1  i
            −1  i ]

we have A*A = AA* = I, but AᵀA ≠ I ≠ AAᵀ.

[email protected] MATH 532 92 Unitary and Orthogonal Matrices Elementary Orthogonal Projectors

Definition A matrix Q of the form

Definition
A matrix Q of the form

Q = I − uuᵀ,   u ∈ Rⁿ, ‖u‖_2 = 1,

is called an elementary orthogonal projector.

Remark
Note that Q is not an orthogonal matrix; it is, however, symmetric:

Qᵀ = (I − uuᵀ)ᵀ = I − uuᵀ = Q.

[email protected] MATH 532 93 Unitary and Orthogonal Matrices

All projectors are idempotent, i.e., Q² = Q:

QᵀQ = Q² = (I − uuᵀ)(I − uuᵀ) = I − 2uuᵀ + u(uᵀu)uᵀ = I − uuᵀ = Q   (using uᵀu = 1).

Geometric interpretation

Consider

x = (I − Q)x + Qx

and observe that (I − Q)x ⊥ Qx:

((I − Q)x)ᵀQx = xᵀ(I − Qᵀ)Qx = xᵀ(Q − QᵀQ)x = 0,   since QᵀQ = Q.

Also,

(I − Q)x = (uuᵀ)x = u(uᵀx) ∈ span{u}.

Therefore Qx ∈ u⊥, the orthogonal complement of u.

Also note that ‖(uᵀx)u‖ = |uᵀx| ‖u‖_2 = |uᵀx|, so that |uᵀx| is the length of the orthogonal projection of x onto span{u}.

[email protected] MATH 532 95 Unitary and Orthogonal Matrices Summary

(I − Q)x ∈ span{u}, so

T I − Q = uu = Pu

is a projection onto span{u}. Qx ∈ u⊥, so T Q = I − uu = Pu⊥ is a projection onto u⊥.

[email protected] MATH 532 96 Unitary and Orthogonal Matrices

Remark

Above we assumed that ‖u‖_2 = 1.

For an arbitrary vector v we get a unit vector u = v/‖v‖_2 = v/√(vᵀv). Therefore, for general v,

P_v = vvᵀ/(vᵀv) is a projection onto span{v},

P_{v⊥} = I − P_v = I − vvᵀ/(vᵀv) is a projection onto v⊥.

Elementary Reflections

Definition
Let v (≠ 0) ∈ Rⁿ. Then

R = I − 2 vvᵀ/(vᵀv)

is called the elementary (or Householder) reflector about v⊥.

Remark
For u ∈ Rⁿ with ‖u‖_2 = 1 we have

R = I − 2uuᵀ.

[email protected] MATH 532 98 Unitary and Orthogonal Matrices Geometric interpretation

Consider ‖u‖_2 = 1, and note that Qx = (I − uuᵀ)x is the orthogonal projection of x onto u⊥ as above. Also,

Q(Rx) = Q(I − 2uuᵀ)x = Q(I − 2(I − Q))x = (Q − 2Q + 2Q²)x = Qx   (using Q² = Q),

so that Qx is also the orthogonal projection of Rx onto u⊥. Moreover,

‖x − Qx‖ = ‖x − (I − uuᵀ)x‖ = |uᵀx| ‖u‖ = |uᵀx|

and

‖Qx − Rx‖ = ‖(Q − R)x‖ = ‖(I − uuᵀ − (I − 2uuᵀ))x‖ = ‖uuᵀx‖ = |uᵀx|.

Together, Rx is the reflection of x about u⊥.

[email protected] MATH 532 99 Unitary and Orthogonal Matrices Properties of elementary reflections

Theorem Let R be an elementary reflector. Then

R⁻¹ = Rᵀ = R,

i.e., R is orthogonal, symmetric, and involutory.

Remark
However, these properties do not characterize a reflection, i.e., an orthogonal, symmetric and involutory matrix is not necessarily a reflection (see HW).

[email protected] MATH 532 100 Unitary and Orthogonal Matrices

Proof.
Rᵀ = (I − 2uuᵀ)ᵀ = I − 2uuᵀ = R. Also,

R² = (I − 2uuᵀ)(I − 2uuᵀ) = I − 4uuᵀ + 4u(uᵀu)uᵀ = I   (using uᵀu = 1),

so that R⁻¹ = R. ∎

[email protected] MATH 532 101 Unitary and Orthogonal Matrices

Reflection of x onto e_1

If we can construct a matrix R such that Rx = αe_1, then we can use R to zero out entries in (the first column of) a matrix. To this end consider

v = x ± µ‖x‖_2 e_1,   where µ = 1 if x is real, and µ = x_1/|x_1| if x_1 is complex,

and note

vᵀv = (x ± µ‖x‖_2 e_1)ᵀ(x ± µ‖x‖_2 e_1)
    = xᵀx ± 2µ‖x‖_2 e_1ᵀx + µ²‖x‖_2²   (with µ² = 1 in the real case)
    = 2(xᵀx ± µ‖x‖_2 e_1ᵀx) = 2vᵀx. (9)

Our Householder reflector was defined as R = I − 2 vvᵀ/(vᵀv), so that

Rx = x − 2 (vvᵀx)/(vᵀv) = x − (2vᵀx)/(vᵀv) v = x − v   (by (9))
   = ∓µ‖x‖_2 e_1 = αe_1.

Remark
These special reflections are used in the Householder variant of the QR factorization. For optimal numerical stability with real matrices one chooses the sign in v = x ± ‖x‖_2 e_1 as sign(x_1), so that no cancellation occurs in the first component of v.
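A small sketch of such a reflector in NumPy (the function name, test vector, and explicit sign handling are my own illustration):

```python
import numpy as np

def householder_reflector(x):
    """Return R = I - 2 v v^T / (v^T v) with v = x + sign(x_1)||x|| e_1,
    so that R x = -sign(x_1) ||x|| e_1 (real case)."""
    v = x.astype(float)
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * np.linalg.norm(x)
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

x = np.array([3.0, 1.0, 2.0])
R = householder_reflector(x)
Rx = R @ x
assert np.allclose(Rx[1:], 0.0)                      # entries below the first are zeroed
assert np.isclose(abs(Rx[0]), np.linalg.norm(x))     # |alpha| = ||x||_2
assert np.allclose(R @ R, np.eye(3))                 # R is involutory
```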

[email protected] MATH 532 103 Unitary and Orthogonal Matrices

Remark
Since R² = I (R⁻¹ = R) we have, whenever ‖x‖_2 = 1,

Rx = ∓µe_1 ⟹ R²x = ∓µRe_1 ⟺ x = ∓µR_{*1}.

Therefore the matrix U = ∓R (taking |µ| = 1) is orthogonal (since R is) and contains x as its first column.

Thus, this allows us to construct an ON basis for Rⁿ that contains x (see the example in [Mey00]).

[email protected] MATH 532 104 Unitary and Orthogonal Matrices Rotations

We give only a brief overview (more details can be found in [Mey00]).

We begin in R² and look for a matrix representation of the rotation of a vector u into another vector v, counterclockwise by an angle θ:

Here

u = (u_1, u_2)ᵀ = (‖u‖ cos φ, ‖u‖ sin φ)ᵀ, (10)
v = (v_1, v_2)ᵀ = (‖v‖ cos(φ + θ), ‖v‖ sin(φ + θ))ᵀ. (11)

[email protected] MATH 532 105 Unitary and Orthogonal Matrices

We use the trigonometric identities

cos(A + B) = cos A cos B − sin A sin B,
sin(A + B) = sin A cos B + sin B cos A,

with A = φ, B = θ and ‖v‖ = ‖u‖ to get

v = (‖v‖ cos(φ + θ), ‖v‖ sin(φ + θ))ᵀ   (by (11))
  = (‖u‖ (cos φ cos θ − sin φ sin θ), ‖u‖ (sin φ cos θ + sin θ cos φ))ᵀ
  = (u_1 cos θ − u_2 sin θ, u_2 cos θ + u_1 sin θ)ᵀ   (by (10))
  = [ cos θ  −sin θ
      sin θ   cos θ ] u = Pu,

where P is the rotation matrix.

[email protected] MATH 532 106 Unitary and Orthogonal Matrices

Remark
Note that

PᵀP = [  cos θ  sin θ     [ cos θ  −sin θ
        −sin θ  cos θ ]     sin θ   cos θ ]
    = [ cos²θ + sin²θ               −cos θ sin θ + cos θ sin θ
        −cos θ sin θ + cos θ sin θ   sin²θ + cos²θ             ] = I,

so that P is an orthogonal matrix.

Pᵀ is also a rotation matrix (by an angle −θ).

[email protected] MATH 532 107 Unitary and Orthogonal Matrices

Rotations about a coordinate axis in R³ are very similar. Such rotations are referred to as plane rotations.

For example, rotation about the x-axis (in the yz-plane) is accomplished with

P_yz = [ 1    0       0
         0   cos θ  −sin θ
         0   sin θ   cos θ ].

Rotation about the y and z-axes is done analogously.

[email protected] MATH 532 108 Unitary and Orthogonal Matrices We can use the same ideas for plane rotations in higher . Definition An orthogonal matrix of the form

1   ..   .     c s    ← i  1     .  Pij =  .   .    ← j  −s c     1     .   ..  1 ↑ ↑ i j with c2 + s2 = 1 is called a plane rotation (or Givens rotation).

Note that the orientation is reversed from the earlier discussion. [email protected] MATH 532 109 Unitary and Orthogonal Matrices

Usually we set

c = x_i / √(x_i² + x_j²),   s = x_j / √(x_i² + x_j²),

since then, for x = (x_1, ..., x_n)ᵀ, the vector P_ij x agrees with x except in components i and j, where

(P_ij x)_i = c x_i + s x_j = √(x_i² + x_j²)   and   (P_ij x)_j = −s x_i + c x_j = 0.

This shows that P_ij zeros the j-th component of x.
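A minimal NumPy sketch of such a Givens rotation (the function name and test vector are mine):

```python
import numpy as np

def givens(n, i, j, x):
    """Plane rotation P_ij (as defined above) chosen so that (P_ij x)_j = 0."""
    r = np.hypot(x[i], x[j])            # sqrt(x_i^2 + x_j^2), computed stably
    c, s = x[i] / r, x[j] / r
    P = np.eye(n)
    P[i, i] = c; P[i, j] = s
    P[j, i] = -s; P[j, j] = c
    return P

x = np.array([3.0, 4.0, 12.0])
P01 = givens(3, 0, 1, x)
y = P01 @ x
assert np.isclose(y[1], 0.0) and np.isclose(y[0], 5.0)   # component j zeroed

# zeroing all remaining components accumulates ||x||_2 in component i
P02 = givens(3, 0, 2, y)
z = P02 @ y
assert np.isclose(z[0], np.linalg.norm(x)) and np.allclose(z[1:], 0.0)
```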

[email protected] MATH 532 110 Unitary and Orthogonal Matrices

Note that (x_i² + x_j²)/√(x_i² + x_j²) = √(x_i² + x_j²), so that repeatedly applying Givens rotations P_ij with the same i, but different values of j, will zero out all but the i-th component of x, and that component will become √(x_1² + ... + x_n²) = ‖x‖_2.

Therefore, the sequence

P = P_in ··· P_{i,i+1} P_{i,i−1} ··· P_{i1}

of Givens rotations rotates the vector x ∈ Rⁿ onto e_i, i.e.,

Px = ‖x‖_2 e_i.

Moreover, the matrix P is orthogonal.

[email protected] MATH 532 111 Unitary and Orthogonal Matrices

Remark Givens rotations can be used as an alternative to Householder reflections to construct a QR factorization.

Householder reflections are in general more efficient, but for sparse matrices Givens rotations are more efficient because they can be applied more selectively.

[email protected] MATH 532 112 Orthogonal Reduction

Orthogonal Reduction

Recall the form of the LU factorization (Gaussian elimination):

T_{n−1} ··· T_2 T_1 A = U,

where the T_k are lower triangular and U is upper triangular, i.e., we have a triangular reduction.

For the QR factorization we will use orthogonal Householder reflectors R_k to get

R_{n−1} ··· R_2 R_1 A = T,

where T is upper triangular, i.e., we have an orthogonal reduction.

Recall Householder reflectors

R = I − 2 vvᵀ/(vᵀv),   with v = x ± µ‖x‖e_1,

so that Rx = ∓µ‖x‖e_1, and µ = 1 for x real.

Now we explain how to use these Householder reflectors to convert an m × n matrix A to an upper triangular matrix of the same size, i.e., how to do a full QR factorization.

[email protected] MATH 532 115 Orthogonal Reduction

Apply a Householder reflector to the first column of A:

R_1 A_{*1} = ( I − 2 vvᵀ/(vᵀv) ) A_{*1},   with v = A_{*1} ± ‖A_{*1}‖ e_1,
           = ∓‖A_{*1}‖ e_1 = (t_11, 0, ..., 0)ᵀ.

Then R_1 applied to all of A yields

R_1 A = [ t_11  t_1ᵀ
            0   A_2 ],

where the first column is (t_11, 0, ..., 0)ᵀ and A_2 is the remaining (m−1) × (n−1) block.

[email protected] MATH 532 116 Orthogonal Reduction

Next, we apply the same idea to A2, i.e., we let

R_2 = [ 1   0ᵀ
        0   R̂_2 ].

Then

R_2 R_1 A = [ t_11  t_1ᵀ          [ t_11   *    *
                0   R̂_2 A_2 ]  =     0   t_22  t_2ᵀ
                                      0    0    A_3 ].

[email protected] MATH 532 117 Orthogonal Reduction We continue the process until we get an upper triangular matrix, i.e.,   t11 ∗  .. .   . .  Rn ··· R2R1 A =   whenever m > n | {z }  O tnn =P O or   t11 ∗ . . Rm ··· R2R1 A =  .. . ∗ whenever n > m | {z }   =P O tmm

Since each Rk is orthogonal (unitary for complex A) we have

PA = T withP m × m orthogonal and T m × n upper triangular, i.e.,

A = QR (Q = PT , R = T)

[email protected] MATH 532 118 Orthogonal Reduction

Remark This is similar to obtaining the QR factorization via MGS, but now Q is orthogonal (square) and R is rectangular.

This gives us the full QR factorization, whereas MGS gave us the reduced QR factorization (with m × n Q and n × n R).

[email protected] MATH 532 119 Orthogonal Reduction

Example
We use Householder reflections to find the QR factorization (where R has positive diagonal elements) of

A = [ 1 2 0
      0 1 1
      1 0 1 ].

R_1 = I − 2 v_1 v_1ᵀ/(v_1ᵀv_1),   with v_1 = A_{*1} ± ‖A_{*1}‖ e_1,

so that

R_1 A_{*1} = ∓‖A_{*1}‖ e_1 = ∓(√2, 0, 0)ᵀ.

Thus we take the ± sign as “−”, so that t_11 = √2 > 0.

[email protected] MATH 532 120 Orthogonal Reduction

Example ((cont.))

To find R_1 A we can either compute R_1 using the formula above and then compute the matrix-matrix product, or, more cheaply, note that

R_1 x = ( I − 2 v_1 v_1ᵀ/(v_1ᵀv_1) ) x = x − ( 2 v_1ᵀx/(v_1ᵀv_1) ) v_1,

so that we can compute v_1ᵀA_{*j}, j = 2, 3, instead of the full R_1:

v_1ᵀA_{*2} = (1 − √2)·2 + 0·1 + 1·0 = 2 − 2√2,
v_1ᵀA_{*3} = (1 − √2)·0 + 0·1 + 1·1 = 1.

Also

2 v_1/(v_1ᵀv_1) = (1/(2 − √2)) (1 − √2, 0, 1)ᵀ.

[email protected] MATH 532 121 Orthogonal Reduction

Example (cont.)
Therefore

R_1 A_{*2} = (2, 1, 0)ᵀ − ((2 − 2√2)/(2 − √2)) (1 − √2, 0, 1)ᵀ = (√2, 1, √2)ᵀ   (the coefficient equals −√2),
R_1 A_{*3} = (0, 1, 1)ᵀ − (1/(2 − √2)) (1 − √2, 0, 1)ᵀ = (√2/2, 1, −√2/2)ᵀ,

so that

R_1 A = [ √2   √2   √2/2
           0    1    1
           0   √2  −√2/2 ],   A_2 = [ 1    1
                                      √2  −√2/2 ].

Example (cont.)
Next,

R̂_2 x = x − ( 2 v_2ᵀx/(v_2ᵀv_2) ) v_2,   with v_2 = (A_2)_{*1} − ‖(A_2)_{*1}‖ e_1 = (1 − √3, √2)ᵀ,

v_2ᵀ(A_2)_{*1} = 3 − √3,   v_2ᵀ(A_2)_{*2} = −√3,   2 v_2/(v_2ᵀv_2) = (1/(3 − √3)) (1 − √3, √2)ᵀ,

R̂_2 (A_2)_{*1} = (√3, 0)ᵀ,   R̂_2 (A_2)_{*2} = (0, √6/2)ᵀ.

Using R_2 = [ 1  0ᵀ; 0  R̂_2 ] we get

R_2 R_1 A = [ √2   √2   √2/2
               0   √3    0
               0    0   √6/2 ] = T   (with P = R_2 R_1).

[email protected] MATH 532 123 Orthogonal Reduction

Remark As mentioned earlier, the factor R of the QR factorization is given by the matrix T.

The factor Q = PT is not explicitly given in the example.

One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).

[email protected] MATH 532 124 Orthogonal Reduction

Theorem Let A be an n × n nonsingular real matrix. Then the factorization

A = QR with n × n orthogonal matrix Q and n × n upper triangular matrix R with positive diagonal entries is unique.

Remark In this n × n case the reduced and full QR factorizations coincide, i.e., the results obtained via Gram–Schmidt, Householder and Givens should be identical.

[email protected] MATH 532 125 Orthogonal Reduction

Proof Assume we have two QR factorizations

A = Q_1R_1 = Q_2R_2 ⟺ Q_2ᵀQ_1 = R_2R_1⁻¹ = U.

Now, R_2R_1⁻¹ is upper triangular with positive diagonal (since each factor is) and Q_2ᵀQ_1 is orthogonal. Therefore U has all of these properties. Since U is upper triangular,

U_{*1} = (u_11, 0, ..., 0)ᵀ.

Moreover, since U is orthogonal (with positive diagonal), u_11 = 1.

[email protected] MATH 532 126 Orthogonal Reduction

Proof (cont.)
Next,

U_{*1}ᵀU_{*2} = (1, 0, ..., 0) (u_12, u_22, 0, ..., 0)ᵀ = u_12 = 0

since the columns of U are orthogonal, and the fact that ‖U_{*2}‖ = 1 implies u_22 = 1.

Comparing all the other pairs of columns of U shows that U = I, and therefore Q_1 = Q_2 and R_1 = R_2. ∎

Recommendations (so far) for solution of Ax = b

1 If A is square and nonsingular, then use LU factorization with partial pivoting. This is stable for most practical problems and requires O(n³/3) operations.

2 To find a least squares solution, use the QR factorization:

Ax = b ⟺ QRx = b ⟺ Rx = Qᵀb.

Usually the reduced QR factorization is all that’s needed.

[email protected] MATH 532 128 Orthogonal Reduction

Even though (for square nonsingular A) the Gram–Schmidt, Householder and Givens versions of the QR factorization are equivalent (due to the uniqueness theorem), we have, for general A, that
classical GS is not stable,
modified GS is stable for least squares, but unstable for QR (since it has problems maintaining orthogonality),
Householder and Givens are stable, both for least squares and QR.

Computational cost (for n × n matrices)

LU with partial pivoting: O(n³/3)
Gram–Schmidt: O(n³)
Householder: O(2n³/3)
Givens: O(4n³/3)

Householder reflections are often the preferred method since they provide both stability and also decent efficiency.

[email protected] MATH 532 130 Complementary Subspaces

Complementary Subspaces

Definition
Let V be a vector space and X, Y ⊆ V be subspaces. X and Y are called complementary provided

V = X + Y and X ∩ Y = {0}.

In this case, V is also called the direct sum of X and Y, and we write

V = X ⊕ Y.

Example
Any two distinct lines through the origin in R² are complementary.
Any plane through the origin in R³ is complementary to any line through the origin not contained in the plane.
Two planes through the origin in R³ are not complementary since they must intersect in a line.

Let V be a vector space, and X , Y ⊆ V be subspaces with bases BX and BY . The following are equivalent: 1 V = X ⊕ Y. 2 For every v ∈ V there exist unique x ∈ X and y ∈ Y such that v = x + y.

3 BX ∩ BY = {} and BX ∪ BY is a basis for V.

Proof. See [Mey00].

Definition Suppose V = X ⊕ Y, i.e., any v ∈ V can be uniquely decomposed as v = x + y. Then 1 x is called the projection of v onto X along Y. 2 y is called the projection of v onto Y along X .

Properties of projectors

Theorem
Let X, Y be complementary subspaces of V. Let P, defined by Pv = x, be the projector onto X along Y. Then
1 P is unique.
2 P² = P, i.e., P is idempotent.
3 I − P is the complementary projector (onto Y along X).
4 R(P) = {x : Px = x} = X (“fixed points” for P).
5 N(I − P) = X = R(P) and R(I − P) = N(P) = Y.
6 If V = Rⁿ (or Cⁿ), then

P = [X O] [X Y]⁻¹ = [X Y] [ I  O
                            O  O ] [X Y]⁻¹,

where the columns of the matrices X and Y are bases for the subspaces X and Y, respectively.

[email protected] MATH 532 134 Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y = Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

[email protected] MATH 532 135 Complementary Subspaces Proof (cont.) 4 Note that x ∈ R(P) if and only if x = Px. This is true since if x = Px then x obviously in R(P). On the other hand, if x ∈ R(P) then x = Pv for some v ∈ V and so

(2) Px = P2v = Pv = x.

Therefore

R(P) = {x : x = Pv, v ∈ V} = X = {x : Px = x}.

5 Since N(I − P) = {x :(I − P)x = 0}, and

(I − P)x = 0 ⇐⇒ Px = x

we have N(I − P) = X = R(P). The claim R(I − P) = Y = N(P) is shown similarly.

[email protected] MATH 532 136 Complementary Subspaces Proof (cont.)  6 Take B = XY , where the columns of X and Y form a basis for X and Y, respectively. Then the columns of B form a basis for V and B is nonsingular. From above we haveP x = x, where x can be any column of X. Also,P y = 0, where y is any column of Y. So PB = P XY = XO or −1 P = XO B−1 = XY . This establishes the first part of (6). The second part follows by noting that

 IO  IO B = XY = XO . OO OO

 [email protected] MATH 532 137 Complementary Subspaces

We just saw that any projector is idempotent, i.e., P2 = P. In fact, Theorem A matrix P is a projector if and only if P2 = P.

Proof. One direction is given above. For the other see [Mey00].

Remark This theorem is sometimes used to define projectors.

[email protected] MATH 532 138 Complementary Subspaces Angle between subspaces

In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces.

If R, N are complementary then

sin θ = 1/‖P‖_2 = 1/√λ_max = 1/σ_1,

where P is the projector onto R along N, λ_max is the largest eigenvalue of PᵀP, and σ_1 is the largest singular value of P.

See [Mey00, Example 5.9.2] for more details.

[email protected] MATH 532 139 Complementary Subspaces

Remark We will skip [Mey00, Section 5.10] on the range–nullspace decomposition.

While the range–nullspace decomposition is theoretically important, its practical usefulness is limited because computation is very unstable due to lack of orthogonality.

This also means we will not discuss nilpotent matrices and — later on — the Jordan normal form.

Orthogonal Decomposition

Definition
Let V be an inner product space and M ⊆ V. The orthogonal complement M⊥ of M is

M⊥ = {x ∈ V : ⟨m, x⟩ = 0 for all m ∈ M}.

Remark
Even if M is not a subspace of V (i.e., only a subset), M⊥ is (see HW).

Theorem Let V be an inner product space and M ⊆ V. If M is a subspace of V, then V = M ⊕ M⊥.

Proof According to the definition of complementary subspaces we need to show 1 M ∩ M⊥ = {0}, 2 M + M⊥ = V.

[email protected] MATH 532 143 Orthogonal Decomposition

Proof (cont.)

1 Let’s assume there exists an x ∈ M ∩ M⊥, i.e., x ∈ M and x ∈ M⊥.

The definition of M⊥ implies

hx, xi = 0.

But then the definition of an inner product implies x = 0.

This is true for any x ∈ M ∩ M⊥, so x = 0 is the only such vector.

[email protected] MATH 532 144 Orthogonal Decomposition

Proof (cont.)

2 ⊥ We let BM and BM⊥ be ON bases for M and M , respectively.

⊥ Since M ∩ M = {0} we know that BM ∪ BM⊥ is an ON basis for some S ⊆ V.

In fact, S = V since otherwise we could extend BM ∪ BM⊥ to an ON basis of V (using the extension theorem and GS).

However, any vector in the extension must be orthogonal to M, i.e., in M⊥, but this is not possible since the extended basis must be linearly independent.

Therefore, the extension set is empty.



[email protected] MATH 532 145 Orthogonal Decomposition

Theorem
Let V be an inner product space with dim(V) = n and M be a subspace of V. Then
1 dim M⊥ = n − dim M,
2 (M⊥)⊥ = M.

Proof For (1) recall our formula from Chapter 4

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Here M ∩ M⊥ = {0}, so that dim(M ∩ M⊥) = 0. Also, since M is a subspace of V we have V = M + M⊥ and the dimension formula implies (1).

[email protected] MATH 532 146 Orthogonal Decomposition

Proof (cont.)
2 Instead of directly establishing equality we first show that (M⊥)⊥ ⊆ M. Since M ⊕ M⊥ = V, any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

Now we take x ∈ (M⊥)⊥ so that ⟨x, n⟩ = 0 for all n ∈ M⊥, and therefore

0 = ⟨x, n⟩ = ⟨m + n, n⟩ = ⟨m, n⟩ + ⟨n, n⟩ = ⟨n, n⟩   (since ⟨m, n⟩ = 0).

But ⟨n, n⟩ = 0 ⟺ n = 0, and therefore x = m is in M.

[email protected] MATH 532 147 Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

We take X = (M⊥)⊥ and Y = M (and know from the work just performed that (M⊥)⊥ is a subspace of M). From (1) we know

dim M⊥ = n − dim M,
dim (M⊥)⊥ = n − dim M⊥ = n − (n − dim M) = dim M.

But then (M⊥)⊥ = M. ∎

Back to Fundamental Subspaces

Theorem
Let A be a real m × n matrix. Then
1 R(A)⊥ = N(Aᵀ),
2 N(A)⊥ = R(Aᵀ).

Corollary

R^m = R(A) ⊕ R(A)⊥ = R(A) ⊕ N(Aᵀ),
Rⁿ = N(A) ⊕ N(A)⊥ = N(A) ⊕ R(Aᵀ).

Proof (of Theorem)

1 We show that x ∈ R(A)⊥ implies x ∈ N(Aᵀ) and vice versa.

x ∈ R(A)⊥ ⟺ ⟨Ay, x⟩ = 0 for any y ∈ Rⁿ
          ⟺ yᵀAᵀx = 0 for any y ∈ Rⁿ
          ⟺ ⟨y, Aᵀx⟩ = 0 for any y ∈ Rⁿ
          ⟺ Aᵀx = 0 ⟺ x ∈ N(Aᵀ),

by the definitions of these subspaces and of an inner product.

2 Using (1), we have

R(A)⊥ = N(Aᵀ) ⟺ R(A) = N(Aᵀ)⊥   (by (1))
               ⟺ R(Aᵀ) = N(A)⊥   (replacing A by Aᵀ).

Starting to think about the SVD

The decompositions of R^m and Rⁿ from the corollary help prepare for the SVD of an m × n matrix A. Assume rank(A) = r and let

B_{R(A)}  = {u_1, ..., u_r}       an ON basis for R(A) ⊆ R^m,
B_{N(Aᵀ)} = {u_{r+1}, ..., u_m}   an ON basis for N(Aᵀ) ⊆ R^m,
B_{R(Aᵀ)} = {v_1, ..., v_r}       an ON basis for R(Aᵀ) ⊆ Rⁿ,
B_{N(A)}  = {v_{r+1}, ..., v_n}   an ON basis for N(A) ⊆ Rⁿ.

By the corollary

B_{R(A)} ∪ B_{N(Aᵀ)} is an ON basis for R^m,
B_{R(Aᵀ)} ∪ B_{N(A)} is an ON basis for Rⁿ,

and therefore the following are orthogonal matrices:

U = [ u_1 u_2 ··· u_m ],   V = [ v_1 v_2 ··· v_n ].

Consider

R = UᵀAV = [ u_iᵀA v_j ]_{i,j=1}^{m,n}.

Note that

A v_j = 0, j = r + 1, ..., n,
u_iᵀA = 0ᵀ ⟺ Aᵀu_i = 0, i = r + 1, ..., m,

so

R = [ u_1ᵀAv_1  ···  u_1ᵀAv_r   O
        ⋮              ⋮
      u_rᵀAv_1  ···  u_rᵀAv_r   O
        O              O        O ] = [ C_{r×r}  O
                                          O      O ].

[email protected] MATH 532 152 Orthogonal Decomposition

Thus

R = UᵀAV = [ C_{r×r}  O
               O      O ]  ⟺  A = URVᵀ = U [ C_{r×r}  O
                                               O      O ] Vᵀ,

the URV factorization of A.

Remark

The matrix Cr×r is nonsingular since

rank(C) = rank(UT AV) = rank(A) = r because multiplication by the orthogonal (and therefore nonsingular) matrices UT and V does not change the rank of A.

[email protected] MATH 532 153 Orthogonal Decomposition

We have now shown that the ON bases for the fundamental subspaces of A yield the URV factorization.

As we show next, the converse is also true, i.e., any URV factorization of A yields a ON bases for the fundamental subspaces of A.

However, the URV factorization is not unique. Different ON bases result in different factorizations.

[email protected] MATH 532 154 Orthogonal Decomposition

Consider A = URVᵀ with U, V orthogonal m × m and n × n matrices, respectively, and R = [ C O; O O ] with C nonsingular. We partition

U = [ U_1  U_2 ],   V = [ V_1  V_2 ],
     (m×r) (m×(m−r))    (n×r) (n×(n−r))

Then V (and therefore also Vᵀ) is nonsingular and we see that

R(A) = R(URVᵀ) = R(UR) = R([ U_1C  O ]) = R(U_1C) = R(U_1)   (12)

(using rank(C) = r), so that the columns of U_1 are an ON basis for R(A).

[email protected] MATH 532 155 Orthogonal Decomposition

Moreover,

N(Aᵀ) = R(A)⊥ (previous theorem) = R(U_1)⊥ (by (12)) = R(U_2),

since U is orthogonal and R^m = R(U_1) ⊕ R(U_2).

This implies that the columns of U_2 are an ON basis for N(Aᵀ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

[email protected] MATH 532 156 Orthogonal Decomposition

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix Σ with the r nonzero singular values, while R contains the full r × r block C.

As a first step in this direction, we can easily obtain a URV factorization of A with a lower triangular matrix C.

Idea: use Householder reflections (or Givens rotations)

[email protected] MATH 532 157 Orthogonal Decomposition

Consider an m × n matrix A. We apply an m × m orthogonal (Householder reflection) matrix P so that

A ⟶ PA = [ B
           O ],   with B an r × n matrix, rank(B) = r.

Next, use an n × n orthogonal Q as follows:

Bᵀ ⟶ QBᵀ = [ T
             O ],   with T r × r upper triangular, rank(T) = r.

Then

BQᵀ = [ Tᵀ  O ] ⟺ B = [ Tᵀ  O ] Q,   and therefore   [ B; O ] = [ Tᵀ  O; O  O ] Q.

[email protected] MATH 532 158 Orthogonal Decomposition

Together,

  PA = [ T^T  O ] Q
       [  O   O ]

  ⇐⇒  A = P^T [ T^T  O ] Q,
               [  O   O ]

a URV factorization with lower triangular block T^T.

Remark See HW for an example of this process with numbers.
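The HW example is not reproduced here, but a minimal NumPy sketch of the two-step construction may help; the matrix A below is a made-up rank-2 example, and NumPy's (Householder-based) QR stands in for the explicit reflections:

import numpy as np

# Hypothetical m x n example with rank r = 2 (not the HW matrix).
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [2., 2., 4.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Step 1: orthogonal P with P A = [B; O], B of size r x n and rank r
# (for a general A one would use QR with column pivoting; plain QR suffices here).
Pq, _ = np.linalg.qr(A, mode='complete')
P = Pq.T                               # m x m orthogonal
B = (P @ A)[:r, :]

# Step 2: orthogonal Q with Q B^T = [T; O], T upper triangular r x r.
Qq, _ = np.linalg.qr(B.T, mode='complete')
Q = Qq.T                               # n x n orthogonal
T = (Q @ B.T)[:r, :r]

# Assemble R with lower triangular block T^T and verify A = P^T R Q.
R = np.zeros((m, n))
R[:r, :r] = T.T
print(np.allclose(A, P.T @ R @ Q))     # True up to rounding

In exact arithmetic PA = [B; O]; numerically the discarded rows are only zero up to rounding, which is why the rank is determined with a tolerance.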

[email protected] MATH 532 159 Singular Value Decomposition

Singular Value Decomposition

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

We know

  A = U R V^T = U [ C  O ] V^T,
                  [ O  O ]

where C is triangular (lower triangular in the construction above) and U, V are orthogonal.

Now we want to establish that C can even be made diagonal.

[email protected] MATH 532 161 Singular Value Decomposition

Note that

  ‖A‖_2 = ‖C‖_2 =: σ_1,

since multiplication by an orthogonal matrix does not change the 2-norm (see HW).

Also,

  ‖C‖_2 = max_{‖z‖_2 = 1} ‖Cz‖_2,

so that ‖C‖_2 = ‖Cx‖_2 for some x with ‖x‖_2 = 1.

In fact (see Sect. 5.2), x is such that (C^T C − λI)x = 0, i.e., x is an eigenvector of C^T C, so that

  ‖C‖_2 = σ_1 = √λ = √(x^T C^T C x).   (13)

[email protected] MATH 532 162 Singular Value Decomposition

Since x is a unit vector we can extend it to an orthogonal matrix R_x = [x X], e.g., using Householder reflectors as discussed at the end of Sect. 5.6.

Similarly, let

  y = Cx / ‖Cx‖_2 = Cx / σ_1.   (14)

Then R_y = [y Y] is also orthogonal (and Hermitian/symmetric) since it's a Householder reflector.

[email protected] MATH 532 163 Singular Value Decomposition

Now

  R_y^T C R_x = [ y^T ] C [ x  X ] = [ y^T C x   y^T C X ]
                [ Y^T ]              [ Y^T C x   Y^T C X ]

(recall R_y^T = R_y since R_y is symmetric).

From above,

  σ_1^2 = λ = x^T C^T C x = σ_1 y^T C x   (using (13) and (14))
  =⇒ y^T C x = σ_1.

Also,

  Y^T C x = Y^T (σ_1 y) = 0   (by (14)),

since R_y is orthogonal, i.e., y is orthogonal to the columns of Y.

[email protected] MATH 532 164 Singular Value Decomposition T T T Let Y CX = C2 and y CX = c so that  T  σ1 c Ry CRx = . 0 C2

To show that c^T = 0^T consider

  c^T = y^T C X = (Cx / σ_1)^T C X = x^T C^T C X / σ_1.   (15)

From (13), x is an eigenvector of C^T C, i.e.,

  C^T C x = λ x = σ_1^2 x   ⇐⇒   x^T C^T C = σ_1^2 x^T.

Plugging this into (15) yields

  c^T = σ_1 x^T X = 0^T

since R_x = [x X] is orthogonal.

[email protected] MATH 532 165 Singular Value Decomposition

Moreover, σ_1 ≥ ‖C_2‖_2 since

  σ_1 = ‖C‖_2 = ‖R_y C R_x‖_2   (HW)
      = max{σ_1, ‖C_2‖_2}.

Next, we repeat this process for C_2, i.e.,

  S_y C_2 S_x = [ σ_2  0^T ]   with σ_2 ≥ ‖C_3‖_2.
                [  0   C_3 ]

Let

  P_2 = [ 1   0^T ] R_y,        Q_2 = R_x [ 1   0^T ].
        [ 0   S_y ]                       [ 0   S_x ]

Then

  P_2 C Q_2 = [ σ_1   0    0  ]
              [  0   σ_2   0  ]   with σ_1 ≥ σ_2 ≥ ‖C_3‖_2.
              [  0    0   C_3 ]

[email protected] MATH 532 166 Singular Value Decomposition We continue this until   σ1 O  σ2  P CQ =   = D, σ ≥ σ ≥ ... ≥ σ . r−1 r−1  ..  1 2 r  .  O σr Finally, let P O Q O UeT = r−1 UT , and Ve = r−1 . OI OI Together, DO UeT AVe = OO or — without the tildes — the singular value decomposition (SVD) of A DO A = U VT , OO where A is m × n, U is m × m,D = r × r and V = n × n. [email protected] MATH 532 167 Singular Value Decomposition

We use the following terminology:

  singular values:         σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0,
  left singular vectors:   the columns of U,
  right singular vectors:  the columns of V.
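In NumPy the same objects can be read off from np.linalg.svd (a small sketch with a made-up matrix):

import numpy as np

A = np.random.rand(5, 3)                  # hypothetical m x n example
U, s, Vt = np.linalg.svd(A)               # full SVD: U is 5x5, Vt is 3x3
# s holds the singular values sigma_1 >= sigma_2 >= ... >= 0
# columns of U are the left singular vectors
# rows of Vt (i.e., columns of V) are the right singular vectors
D_full = np.zeros((5, 3))
D_full[:3, :3] = np.diag(s)               # the block structure [D O; O O]
print(np.allclose(A, U @ D_full @ Vt))    # True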

Remark In Chapter 7 we will see that the columns of V and U are also special eigenvectors, namely of A^T A and of A A^T, respectively.

[email protected] MATH 532 168 Singular Value Decomposition Geometric interpretation of SVD

For the following we assume A ∈ R^{n×n}, n = 2.

[Figure: the unit circle spanned by the right singular vectors v_1, v_2 (left) is mapped by A to an ellipse with semi-axes σ_1 u_1 and σ_2 u_2 (right).]

This picture is true since

  A = U D V^T ⇐⇒ A V = U D

and σ_1, σ_2 are the lengths of the semi-axes of the ellipse because ‖u_1‖ = ‖u_2‖ = 1.

Remark
See [Mey00] for more details.

[email protected] MATH 532 169 Singular Value Decomposition

For general n, A transforms the 2-norm unit sphere to an ellipsoid whose semi-axes have lengths

σ1 ≥ σ2 ≥ ... ≥ σn.

Therefore,

  κ_2(A) = σ_1 / σ_n

is the distortion ratio of the transformation A.

Moreover,

  σ_1 = ‖A‖_2,      σ_n = 1 / ‖A^{−1}‖_2,

so that

  κ_2(A) = ‖A‖_2 ‖A^{−1}‖_2

is the 2-norm condition number of A (∈ R^{n×n}).
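A quick numerical check of these relations (a sketch with a hypothetical 2 × 2 matrix):

import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])                        # hypothetical nonsingular matrix
s = np.linalg.svd(A, compute_uv=False)          # sigma_1 >= sigma_2
kappa = s[0] / s[-1]                            # distortion ratio sigma_1 / sigma_n
print(np.isclose(kappa,
                 np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)))
print(np.isclose(kappa, np.linalg.cond(A, 2)))  # NumPy's 2-norm condition number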

[email protected] MATH 532 170 Singular Value Decomposition

Remark

The relations for σ_1 and σ_n hold because

  ‖A‖_2 = ‖U D V^T‖_2 = ‖D‖_2 = σ_1   (HW)

and

  ‖A^{−1}‖_2 = ‖V D^{−1} U^T‖_2 = ‖D^{−1}‖_2 = 1/σ_n.   (HW)

Remark

We always have κ2(A) ≥ 1, and κ2(A) = 1 if and only if A is a multiple of an orthogonal matrix (typo in [Mey00], see proof on next slide).

[email protected] MATH 532 171 Singular Value Decomposition

Proof
“⇐=”: Assume A = αQ with α > 0, Q orthogonal, i.e.,

  ‖A‖_2 = α‖Q‖_2 = α max_{‖x‖_2=1} ‖Qx‖_2 = α max_{‖x‖_2=1} ‖x‖_2 = α   (by invariance).

Also,

  A^T A = α^2 Q^T Q = α^2 I   =⇒   A^{−1} = (1/α^2) A^T   and   ‖A^T‖_2 = ‖A‖_2,

so that ‖A^{−1}‖_2 = 1/α and

  κ_2(A) = ‖A‖_2 ‖A^{−1}‖_2 = α (1/α) = 1.

[email protected] MATH 532 172 Singular Value Decomposition

Proof (cont.)
“=⇒”: Assume κ_2(A) = σ_1/σ_n = 1, so that σ_1 = σ_n and therefore

  D = σ_1 I.

Thus

  A = U D V^T = σ_1 U V^T

and

  A^T A = σ_1^2 (U V^T)^T U V^T = σ_1^2 V U^T U V^T = σ_1^2 I,

i.e., (1/σ_1) A is orthogonal, so A is a multiple of an orthogonal matrix.



[email protected] MATH 532 173 Singular Value Decomposition Applications of the Condition Number

Let x̃ be the answer obtained by solving Ax = b with A ∈ R^{n×n}. Is a small residual r = b − Ax̃ a good indicator for the accuracy of x̃?

Since x is the exact answer and x̃ the computed answer, we consider the relative error

  ‖x − x̃‖ / ‖x‖.

[email protected] MATH 532 174 Singular Value Decomposition

Now

  ‖r‖ = ‖b − Ax̃‖ = ‖Ax − Ax̃‖ = ‖A(x − x̃)‖ ≤ ‖A‖ ‖x − x̃‖.

To get the relative error we multiply by ‖A^{−1}b‖/‖x‖ = 1 (recall x = A^{−1}b). Then

  ‖r‖ ≤ ‖A‖ ‖A^{−1}b‖ (‖x − x̃‖ / ‖x‖) ≤ ‖A‖ ‖A^{−1}‖ ‖b‖ (‖x − x̃‖ / ‖x‖),

so that

  ‖r‖ / ‖b‖ ≤ κ(A) (‖x − x̃‖ / ‖x‖).   (16)

[email protected] MATH 532 175 Singular Value Decomposition

Moreover, using r = b − Ax̃ = b − b̃,

  ‖x − x̃‖ = ‖A^{−1}(b − b̃)‖ ≤ ‖A^{−1}‖ ‖r‖.

Multiplying by ‖Ax‖/‖b‖ = 1 we have

  ‖x − x̃‖ / ‖x‖ ≤ κ(A) (‖r‖ / ‖b‖).   (17)

Combining (16) and (17) yields

  (1/κ(A)) (‖r‖/‖b‖) ≤ ‖x − x̃‖/‖x‖ ≤ κ(A) (‖r‖/‖b‖).

Therefore, the relative residual ‖r‖/‖b‖ is a good indicator of the relative error if and only if A is well conditioned, i.e., κ(A) is small (close to 1).
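A sketch illustrating this with a well-conditioned matrix and the notoriously ill-conditioned Hilbert matrix (both are made-up test cases):

import numpy as np

def residual_vs_error(A, x_exact):
    b = A @ x_exact
    x_tilde = np.linalg.solve(A, b)                       # computed solution
    rel_res = np.linalg.norm(b - A @ x_tilde) / np.linalg.norm(b)
    rel_err = np.linalg.norm(x_exact - x_tilde) / np.linalg.norm(x_exact)
    return np.linalg.cond(A), rel_res, rel_err

n = 10
x = np.ones(n)
well = np.eye(n) + 0.1 * np.random.rand(n, n)             # kappa close to 1
hilb = 1.0 / (np.arange(n)[:, None] + np.arange(n) + 1)   # Hilbert matrix, kappa ~ 1e13
for A in (well, hilb):
    print(residual_vs_error(A, x))
# For the Hilbert matrix the relative residual is tiny even though the
# relative error in x_tilde can be many orders of magnitude larger.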

[email protected] MATH 532 176 Singular Value Decomposition Applications of the SVD

1 Determination of the “numerical rank” of A: rank(A) ≈ the index of the smallest singular value greater than or equal to a desired threshold.

2 Low-rank approximation of A: the Eckart–Young theorem states that

  A_k = Σ_{i=1}^{k} σ_i u_i v_i^T

is the best rank-k approximation to A in the 2-norm (and also in the Frobenius norm), i.e.,

  ‖A − A_k‖_2 = min_{rank(B)=k} ‖A − B‖_2.

Moreover, ‖A − A_k‖_2 = σ_{k+1}.

Run SVD_movie.m (a code sketch of both ideas follows below).
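SVD_movie.m itself is not reproduced here, but both applications can be sketched in a few NumPy lines (the matrix is a made-up stand-in):

import numpy as np

A = np.random.rand(50, 30)                     # hypothetical data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# 1) numerical rank: count singular values at or above a threshold
tol = 1e-10
numerical_rank = np.sum(s >= tol)

# 2) best rank-k approximation A_k = sum_{i<=k} sigma_i u_i v_i^T
k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(numerical_rank)
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))   # error equals sigma_{k+1}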

[email protected] MATH 532 177 Singular Value Decomposition

3 Stable solution of least squares problems: Use Moore–Penrose pseudoinverse

Definition
Let A ∈ R^{m×n} and

  A = U [ D  O ] V^T
        [ O  O ]

be the SVD of A. Then

  A† = V [ D^{−1}  O ] U^T
         [   O     O ]

is called the Moore–Penrose pseudoinverse of A.

Remark
Note that A† ∈ R^{n×m} and

  A† = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T,   r = rank(A).
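A sketch comparing the block formula and the outer-product sum with NumPy's pinv (A is a hypothetical rank-1 example):

import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])                      # rank 1, so r = 1
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))

# A_dagger = sum_{i=1}^r (1/sigma_i) v_i u_i^T  (an n x m matrix)
A_dag = sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(r))
print(np.allclose(A_dag, np.linalg.pinv(A)))  # True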

[email protected] MATH 532 178 Singular Value Decomposition

We now show that the least squares solution of

Ax = b is given by x = A†b.

[email protected] MATH 532 179 Singular Value Decomposition

Start with the normal equations and use

  A = U [ D  O ] V^T = Ũ D Ṽ^T,
        [ O  O ]

the reduced SVD of A, i.e., Ũ ∈ R^{m×r}, Ṽ ∈ R^{n×r}.

  A^T A x = A^T b  ⇐⇒  Ṽ D Ũ^T Ũ D Ṽ^T x = Ṽ D Ũ^T b      (Ũ^T Ũ = I)
                   ⇐⇒  Ṽ D^2 Ṽ^T x = Ṽ D Ũ^T b

Multiplication by D^{−1} Ṽ^T (using Ṽ^T Ṽ = I) yields

  D Ṽ^T x = Ũ^T b.

[email protected] MATH 532 180 Singular Value Decomposition

Thus D Ṽ^T x = Ũ^T b implies

  x = Ṽ D^{−1} Ũ^T b

  ⇐⇒  x = V [ D^{−1}  O ] U^T b
             [   O     O ]

  ⇐⇒  x = A†b.

[email protected] MATH 532 181 Singular Value Decomposition

Remark If A is nonsingular then A† = A−1 (see HW).

If rank(A) < n (i.e., the least squares solution is not unique), then x = A†b provides the unique solution with minimum 2-norm (see justification on following slide).

[email protected] MATH 532 182 Singular Value Decomposition Minimum norm solution of underdetermined systems

Note that the general solution of Ax = b is given by

z = A†b + n, n ∈ N(A).

Then

  ‖z‖_2^2 = ‖A†b + n‖_2^2 = ‖A†b‖_2^2 + ‖n‖_2^2 ≥ ‖A†b‖_2^2   (Pythagorean theorem).

The Pythagorean theorem applies since (see HW)

  A†b ∈ R(A†) = R(A^T)

so that, using R(A^T) = N(A)^⊥,

A†b ⊥ n.
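A small numerical illustration of this minimum-norm property for an underdetermined system (made-up data):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])          # 2 x 3, rank 2: infinitely many solutions
b = np.array([1., 1.])

x_min = np.linalg.pinv(A) @ b         # x = A^dagger b

_, _, Vt = np.linalg.svd(A)
n_vec = Vt[-1]                        # third right singular vector spans N(A) here
z = x_min + 0.7 * n_vec               # another solution of Ax = b

print(np.allclose(A @ z, b))                        # True
print(np.linalg.norm(x_min) < np.linalg.norm(z))    # True: x_min has minimum norm
print(abs(x_min @ n_vec) < 1e-12)                   # A^dagger b is orthogonal to N(A)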

[email protected] MATH 532 183 Singular Value Decomposition

Remark
Explicit use of the pseudoinverse is usually not recommended. Instead we solve Ax = b, A ∈ R^{m×n}, by

1 computing A = Ũ D Ṽ^T (reduced SVD),

2 using Ax = b ⇐⇒ D Ṽ^T x = Ũ^T b, so
   1 solve D y = Ũ^T b for y,
   2 compute x = Ṽ y.
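A sketch of this recommended procedure (the function name is mine; A and b are hypothetical):

import numpy as np

def lstsq_via_reduced_svd(A, b, tol=1e-12):
    """Least squares solution of Ax = b via the reduced SVD (a sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol))                  # numerical rank
    Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r, :]
    y = (Ur.T @ b) / sr                       # step 1: solve D y = U~^T b
    return Vtr.T @ y                          # step 2: x = V~ y

A = np.random.rand(8, 3)
b = np.random.rand(8)
x = lstsq_via_reduced_svd(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True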

[email protected] MATH 532 184 Singular Value Decomposition Other Applications

Also known as principal component analysis (PCA), (discrete) Karhunen–Loève (KL) transformation, Hotelling transform, or proper orthogonal decomposition (POD)

Data compression
Noise filtering
Regularization of inverse problems
Tomography
Image deblurring
Seismology
Information retrieval and data mining (latent semantic analysis)
Bioinformatics and computational biology
Immunology
Molecular dynamics
Microarray data analysis

[email protected] MATH 532 185 Orthogonal Projections

Orthogonal Projections

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Earlier we discussed orthogonal complementary subspaces of an inner product space V, i.e.,

  V = M ⊕ M^⊥.

Definition
Consider V = M ⊕ M^⊥, so that for every v ∈ V there exist unique vectors m ∈ M, n ∈ M^⊥ such that

  v = m + n.

Then m is called the orthogonal projection of v onto M. The matrix P_M such that P_M v = m is the orthogonal projector onto M along M^⊥.

[email protected] MATH 532 187 Orthogonal Projections

For arbitrary complementary subspaces X , Y we showed earlier that the projector onto X along Y is given by

  P = [ X  O ] [ X  Y ]^{−1} = [ X  Y ] [ I  O ] [ X  Y ]^{−1},
                                        [ O  O ]

where the columns of X and Y are bases for X and Y.

[email protected] MATH 532 188 Orthogonal Projections

Now we let X = M and Y = M^⊥ be orthogonal complementary subspaces, where M and N contain the basis vectors of M and M^⊥ in their columns. Then

  P = [ M  O ] [ M  N ]^{−1}.   (18)

To find [ M  N ]^{−1} we note that

  M^T N = N^T M = O,

and if N is an orthogonal matrix (i.e., contains an ON basis), then

  [ (M^T M)^{−1} M^T ] [ M  N ] = [ I  O ]
  [        N^T       ]           [ O  I ]

(note that M^T M is invertible since M has full rank because its columns form a basis of M).

[email protected] MATH 532 189 Orthogonal Projections

Thus

  [ M  N ]^{−1} = [ (M^T M)^{−1} M^T ].   (19)
                  [        N^T       ]

Inserting (19) into (18) yields

  P = [ M  O ] [ (M^T M)^{−1} M^T ] = M (M^T M)^{−1} M^T.
               [        N^T       ]

Remark

Note that P_M is unique, so this formula holds for an arbitrary basis of M (collected in M). In particular, if M contains an ON basis for M, then

  P_M = M M^T.
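A sketch of both projector formulas for a hypothetical 2-dimensional subspace M ⊆ R^4 (the basis below is made up):

import numpy as np

# Columns of M_mat: an arbitrary (non-orthonormal) basis of M.
M_mat = np.array([[1., 0.],
                  [1., 1.],
                  [0., 1.],
                  [2., 1.]])

# General formula: P = M (M^T M)^{-1} M^T
P = M_mat @ np.linalg.solve(M_mat.T @ M_mat, M_mat.T)

# Same projector from an ON basis of M (here obtained via QR): P = M M^T
Q, _ = np.linalg.qr(M_mat)
P_on = Q @ Q.T

print(np.allclose(P, P_on))                            # P_M is basis independent
print(np.allclose(P @ P, P), np.allclose(P.T, P))      # idempotent and symmetric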

[email protected] MATH 532 190 Orthogonal Projections

Similarly,

  P_{M^⊥} = N (N^T N)^{−1} N^T   (arbitrary basis for M^⊥ in N),
  P_{M^⊥} = N N^T                 (ON basis in N).

As before, P_M = I − P_{M^⊥}.

Example
If M = span{u}, ‖u‖ = 1, then

  P_M = P_u = u u^T

and

  P_{u^⊥} = I − P_u = I − u u^T

(cf. the elementary orthogonal projectors discussed earlier).

[email protected] MATH 532 191 Orthogonal Projections Properties of orthogonal projectors

Theorem
Let P ∈ R^{n×n} be a projector, i.e., P^2 = P. Then P is an orthogonal projector if and only if any one of the following equivalent conditions holds:

1 R(P) ⊥ N(P),

2 P^T = P,

3 ‖P‖_2 = 1.

[email protected] MATH 532 192 Orthogonal Projections

Proof 1 Follows directly from the definition.

2 “=⇒”: Assume P is an orthogonal projector, i.e.,

  P = M (M^T M)^{−1} M^T   and   P^T = M (M^T M)^{−T} M^T = P,

since (M^T M)^{−T} = (M^T M)^{−1}.

“⇐=”: Assume P = PT . Then

  R(P) = R(P^T) = N(P)^⊥   (orthogonal decomposition)

so that P is an orthogonal projector via (1).

[email protected] MATH 532 193 Orthogonal Projections

Proof (cont.)
3 For complementary subspaces X, Y we know that the projector onto X along Y satisfies

  ‖P‖_2 = 1 / sin θ,   θ ∈ (0, π/2],

where θ is the angle between X and Y.

Assume P is an orthogonal projector; then θ = π/2, so that ‖P‖_2 = 1.

Conversely, if ‖P‖_2 = 1, then θ = π/2 and X, Y are orthogonal complements, i.e., P is an orthogonal projector.



[email protected] MATH 532 194 Orthogonal Projections Why is orthogonal projection so important?

Theorem
Let V be an inner product space with subspace M, and let b ∈ V. Then

  dist(b, M) = min_{m∈M} ‖b − m‖_2 = ‖b − P_M b‖_2,

i.e., P_M b is the unique vector in M closest to b. The quantity dist(b, M) is called the (orthogonal) distance from b to M.
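A quick numerical check of the theorem (hypothetical M and b):

import numpy as np

M_mat = np.random.rand(5, 2)          # columns: a basis for M in R^5
b = np.random.rand(5)

# Orthogonal projector onto M and the distance from b to M.
P = M_mat @ np.linalg.solve(M_mat.T @ M_mat, M_mat.T)
dist = np.linalg.norm(b - P @ b)

# Minimizing ||b - M c|| over all coefficient vectors c gives the same value.
c = np.linalg.lstsq(M_mat, b, rcond=None)[0]
print(np.isclose(dist, np.linalg.norm(b - M_mat @ c)))   # True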

[email protected] MATH 532 195 Orthogonal Projections

Proof

Let p = PMb. Then p ∈ M and p − m ∈ M for every m ∈ M.

Moreover,

  b − p = (I − P_M) b ∈ M^⊥,

so that (p − m) ⊥ (b − p).

Then

  ‖b − m‖_2^2 = ‖b − p + p − m‖_2^2 = ‖b − p‖_2^2 + ‖p − m‖_2^2 ≥ ‖b − p‖_2^2   (Pythagorean theorem).

Therefore min_{m∈M} ‖b − m‖_2 = ‖b − p‖_2.

[email protected] MATH 532 196 Orthogonal Projections

Proof (cont.) Uniqueness: Assume there exists a q ∈ M such that

  ‖b − q‖_2 = ‖b − p‖_2.   (20)

Then

  ‖b − q‖_2^2 = ‖(b − p) + (p − q)‖_2^2,   where b − p ∈ M^⊥ and p − q ∈ M,
              = ‖b − p‖_2^2 + ‖p − q‖_2^2   (Pythagorean theorem).

But then (20) implies that ‖p − q‖_2^2 = 0 and therefore p = q. 

[email protected] MATH 532 197 Orthogonal Projections Least squares approximation revisited

Now we give a “modern” derivation of the normal equations (without calculus), and note that much of this remains true for best L_2 approximation.

Goal of least squares: for A ∈ R^{m×n}, find

  min_{x∈R^n} √( Σ_{i=1}^{m} ((Ax)_i − b_i)^2 )   ⇐⇒   min_{x∈R^n} ‖Ax − b‖_2.

Now Ax ∈ R(A), so the least squares error is

  dist(b, R(A)) = min_{Ax∈R(A)} ‖b − Ax‖_2

                = ‖b − P_{R(A)} b‖_2

with P_{R(A)} the orthogonal projector onto R(A).

[email protected] MATH 532 198 Orthogonal Projections

[email protected] MATH 532 199 Orthogonal Projections Moreover, the least squares solution of Ax = b is given by that x for which Ax = PR(A)b. The following argument shows that this is equivalent to the normal equations:

  Ax = P_{R(A)} b
  ⇐⇒ P_{R(A)} A x = P_{R(A)}^2 b = P_{R(A)} b
  ⇐⇒ P_{R(A)} (Ax − b) = 0
  ⇐⇒ Ax − b ∈ N(P_{R(A)}) = R(A)^⊥   (P_{R(A)} orth. proj. onto R(A))
  ⇐⇒ Ax − b ∈ N(A^T)   (orthogonal decomposition)
  ⇐⇒ A^T (Ax − b) = 0
  ⇐⇒ A^T A x = A^T b.

Remark
Note that we are no longer limited to the real case.
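Returning to the equivalence above, a sketch verifying it numerically (A and b are made up):

import numpy as np

A = np.random.rand(7, 3)
b = np.random.rand(7)

# Solution of the normal equations ...
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# ... satisfies A x = P_{R(A)} b, with P the orthogonal projector onto R(A).
Q, _ = np.linalg.qr(A)        # ON basis for R(A) (A has full column rank here)
P = Q @ Q.T
print(np.allclose(A @ x_ne, P @ b))   # True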

[email protected] MATH 532 200 Appendix References ReferencesI

[Mey00] Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.

[email protected] MATH 532 201