MATH 532: Linear Algebra
Chapter 5: Norms, Inner Products and Orthogonality

Greg Fasshauer

Department of Applied Mathematics, Illinois Institute of Technology

Spring 2015

Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections


Vector Norms

Definition: Let $x, y \in \mathbb{R}^n$ ($\mathbb{C}^n$). Then

$x^T y = \sum_{i=1}^n x_i y_i \in \mathbb{R}, \qquad x^* y = \sum_{i=1}^n \bar{x}_i y_i \in \mathbb{C}$

is called the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$).

Vector Norms

Definition: Let $V$ be a vector space. A function $\|\cdot\| : V \to \mathbb{R}_{\ge 0}$ is called a norm provided for any $x, y \in V$ and $\alpha \in \mathbb{R}$:

1. $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$,

2. $\|\alpha x\| = |\alpha|\,\|x\|$,

3. $\|x + y\| \le \|x\| + \|y\|$.

Remark: The inequality in (3) is known as the triangle inequality.

Vector Norms

Remark: Any inner product $\langle\cdot,\cdot\rangle$ induces a norm via (more later)

$\|x\| = \sqrt{\langle x, x\rangle}.$

We will show that the standard inner product induces the Euclidean norm (cf. length of a vector).

Remark: Inner products let us define angles via

$\cos\theta = \frac{x^T y}{\|x\|\,\|y\|}.$

In particular, $x, y$ are orthogonal if and only if $x^T y = 0$.
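A quick numerical illustration of these remarks (a small NumPy sketch; the vectors are arbitrary examples, not from the lecture):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

# Norm induced by the standard inner product: ||x|| = sqrt(<x, x>)
norm_x = np.sqrt(x @ x)          # 5.0, same as np.linalg.norm(x)

# Angle defined via the inner product: cos(theta) = x^T y / (||x|| ||y||)
cos_theta = (x @ y) / (np.sqrt(x @ x) * np.sqrt(y @ y))
theta = np.arccos(cos_theta)     # pi/2 here, since x^T y = 0

print(norm_x, np.degrees(theta), x @ y == 0)  # 5.0 90.0 True
```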

Vector Norms

Example: Let $x \in \mathbb{R}^n$ and consider the Euclidean norm

$\|x\|_2 = \sqrt{x^T x} = \left(\sum_{i=1}^n x_i^2\right)^{1/2}.$

We show that $\|\cdot\|_2$ is a norm. We do this for the real case, but the complex case goes analogously.

1. Clearly, $\|x\|_2 \ge 0$. Also,

$\|x\|_2 = 0 \iff \|x\|_2^2 = 0 \iff \sum_{i=1}^n x_i^2 = 0 \iff x_i = 0,\ i = 1,\dots,n \iff x = 0.$

Vector Norms

Example (cont.)

2. We have

$\|\alpha x\|_2 = \left(\sum_{i=1}^n (\alpha x_i)^2\right)^{1/2} = |\alpha| \left(\sum_{i=1}^n x_i^2\right)^{1/2} = |\alpha|\,\|x\|_2.$

3. To establish (3) we need:

Lemma (Cauchy–Schwarz–Bunyakovsky): Let $x, y \in \mathbb{R}^n$. Then

$|x^T y| \le \|x\|_2\,\|y\|_2.$

Moreover, equality holds if and only if $y = \alpha x$ with

$\alpha = \frac{x^T y}{\|x\|_2^2}.$

Vector Norms

Motivation for Proof of Cauchy–Schwarz–Bunyakovsky

As already alluded to above, the angle $\theta$ between two vectors $a$ and $b$ is related to the inner product by

$\cos\theta = \frac{a^T b}{\|a\|\,\|b\|}.$

Using trigonometry as in the figure, the projection of $a$ onto $b$ is then

$\|a\|\cos\theta\,\frac{b}{\|b\|} = \|a\|\,\frac{a^T b}{\|a\|\,\|b\|}\,\frac{b}{\|b\|} = \frac{a^T b}{\|b\|^2}\,b.$

Now we let $y = a$ and $x = b$, so that the projection of $y$ onto $x$ is given by

$\alpha x, \quad \text{where } \alpha = \frac{x^T y}{\|x\|^2}.$
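The following NumPy sketch (with arbitrarily chosen vectors, not from the notes) illustrates this projection formula and the fact that the residual $y - \alpha x$ is orthogonal to $x$:

```python
import numpy as np

x = np.array([2.0, 0.0, 1.0])
y = np.array([1.0, 3.0, -1.0])

alpha = (x @ y) / (x @ x)   # alpha = x^T y / ||x||^2
proj = alpha * x            # projection of y onto x

residual = y - proj
print(proj)                            # [0.4 0.  0.2]
print(np.isclose(residual @ x, 0.0))   # True: residual is orthogonal to x
```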

Vector Norms

Proof of Cauchy–Schwarz–Bunyakovsky

We know that $\|y - \alpha x\|_2^2 \ge 0$ since it's (the square of) a norm. Therefore,

$0 \le \|y - \alpha x\|_2^2 = (y - \alpha x)^T (y - \alpha x) = y^T y - 2\alpha\, x^T y + \alpha^2 x^T x$

$= y^T y - 2\frac{x^T y}{\|x\|_2^2}\, x^T y + \frac{(x^T y)^2}{\|x\|_2^4}\, \underbrace{x^T x}_{=\|x\|_2^2} = \|y\|_2^2 - \frac{(x^T y)^2}{\|x\|_2^2}.$

This implies

$(x^T y)^2 \le \|x\|_2^2\,\|y\|_2^2,$

and the Cauchy–Schwarz–Bunyakovsky inequality follows by taking square roots.

Vector Norms

Proof (cont.)

Now we look at the equality claim.

"$\Longrightarrow$": Assume that $x^T y = \|x\|_2\|y\|_2$. But then the first part of the proof shows that $\|y - \alpha x\|_2 = 0$, so that $y = \alpha x$.

"$\Longleftarrow$": Assume $y = \alpha x$. Then

$|x^T y| = |x^T (\alpha x)| = |\alpha|\,\|x\|_2^2, \qquad \|x\|_2\|y\|_2 = \|x\|_2\|\alpha x\|_2 = |\alpha|\,\|x\|_2^2,$

so that we have equality. $\square$

Vector Norms

Example (cont.)

3. Now we can show that $\|\cdot\|_2$ satisfies the triangle inequality:

$\|x+y\|_2^2 = (x+y)^T (x+y) = \underbrace{x^T x}_{=\|x\|_2^2} + x^T y + \underbrace{y^T x}_{=x^T y} + \underbrace{y^T y}_{=\|y\|_2^2}$

$= \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \le \|x\|_2^2 + 2|x^T y| + \|y\|_2^2 \overset{\text{CSB}}{\le} \|x\|_2^2 + 2\|x\|_2\|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2.$

Now we just need to take square roots to have the triangle inequality.

Vector Norms

Lemma: Let $x, y \in \mathbb{R}^n$. Then we have the backward triangle inequality

$\big|\,\|x\| - \|y\|\,\big| \le \|x - y\|.$

Proof: We write

$\|x\| = \|x - y + y\| \le \|x - y\| + \|y\|$

by the triangle inequality. But this implies $\|x\| - \|y\| \le \|x - y\|$.

Vector Norms

Proof (cont.)

Switch the roles of $x$ and $y$ to get

$\|y\| - \|x\| \le \|y - x\| \iff -(\|x\| - \|y\|) \le \|x - y\|.$

Together with the previous inequality we have

$\big|\,\|x\| - \|y\|\,\big| \le \|x - y\|. \qquad \square$

Vector Norms

Other common norms

$\ell_1$-norm (or taxi-cab norm, Manhattan norm):

$\|x\|_1 = \sum_{i=1}^n |x_i|$

$\ell_\infty$-norm (or maximum norm, Chebyshev norm):

$\|x\|_\infty = \max_{1\le i\le n} |x_i|$

$\ell_p$-norm:

$\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$

Remark: In the homework you will use Hölder's and Minkowski's inequalities to show that the $p$-norm is a norm.
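A quick computational check of these norms (a sketch using NumPy with an arbitrary example vector; numpy.linalg.norm supports all three):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

ell1   = np.sum(np.abs(x))        # l1-norm: 8.0
ell2   = np.sqrt(np.sum(x**2))    # l2-norm: sqrt(26)
ellinf = np.max(np.abs(x))        # l_infinity-norm: 4.0

# Same values via the built-in routine
assert np.isclose(ell1,   np.linalg.norm(x, 1))
assert np.isclose(ell2,   np.linalg.norm(x, 2))
assert np.isclose(ellinf, np.linalg.norm(x, np.inf))

# General p-norm, e.g. p = 3
p = 3
print(np.sum(np.abs(x)**p)**(1/p), np.linalg.norm(x, p))
```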

Vector Norms

Remark: We now show that $\|x\|_\infty = \lim_{p\to\infty} \|x\|_p$.

Let's use tildes to mark all components of $x$ that are maximal, i.e.,

$\tilde{x}_1 = \tilde{x}_2 = \dots = \tilde{x}_k = \max_{1\le i\le n} |x_i|.$

The remaining components are then $\tilde{x}_{k+1}, \dots, \tilde{x}_n$. This implies that

$\left|\frac{\tilde{x}_i}{\tilde{x}_1}\right| < 1, \quad \text{for } i = k+1, \dots, n.$

Vector Norms

Remark (cont.)

Now

$\|x\|_p = \left(\sum_{i=1}^n |\tilde{x}_i|^p\right)^{1/p} = |\tilde{x}_1| \left(k + \underbrace{\left|\frac{\tilde{x}_{k+1}}{\tilde{x}_1}\right|^p}_{<1} + \dots + \underbrace{\left|\frac{\tilde{x}_n}{\tilde{x}_1}\right|^p}_{<1}\right)^{1/p}.$

Since the terms inside the parentheses, except for $k$, go to 0 as $p \to \infty$, we have $(\cdot)^{1/p} \to 1$ as $p \to \infty$. And so

$\|x\|_p \to |\tilde{x}_1| = \max_{1\le i\le n} |x_i| = \|x\|_\infty.$
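This limit is easy to observe numerically (a sketch with an arbitrary example vector):

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0, 3.0])   # max |x_i| = 3, attained twice (k = 2)

for p in [1, 2, 5, 10, 50, 100]:
    print(p, np.linalg.norm(x, p))    # p-norms decrease toward the max-norm

print("inf", np.linalg.norm(x, np.inf))  # 3.0
```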

Vector Norms

Figure: Unit "balls" in $\mathbb{R}^2$ for the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms.

Note that $B_1 \subseteq B_2 \subseteq B_\infty$ since, e.g., for $x = \left(\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\right)^T$,

$\|x\|_1 = \sqrt{2}, \qquad \|x\|_2 = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1, \qquad \|x\|_\infty = \tfrac{\sqrt{2}}{2},$

so that $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$.

In fact, we have in general (similar to HW)

$\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty, \quad \text{for any } x \in \mathbb{R}^n.$

Vector Norms

Norm equivalence

Definition: Two norms $\|\cdot\|$ and $\|\cdot\|'$ on a vector space $V$ are called equivalent if there exist constants $\alpha, \beta$ such that

$\alpha \le \frac{\|x\|}{\|x\|'} \le \beta \quad \text{for all } x (\ne 0) \in V.$

Example: $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent since from above $\|x\|_1 \ge \|x\|_2$ and also $\|x\|_1 \le \sqrt{n}\,\|x\|_2$ (see HW), so that

$\alpha = 1 \le \frac{\|x\|_1}{\|x\|_2} \le \sqrt{n} = \beta.$

Remark: In fact, all norms on finite-dimensional vector spaces are equivalent.
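These equivalence constants are easy to check numerically (a sketch; the random test vectors are not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
for _ in range(5):
    x = rng.standard_normal(n)
    ratio = np.linalg.norm(x, 1) / np.linalg.norm(x, 2)
    # alpha = 1 <= ||x||_1 / ||x||_2 <= sqrt(n) = beta
    print(round(ratio, 3), 1.0 <= ratio <= np.sqrt(n))
```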


Matrix Norms

Matrix norms are special norms: they will satisfy one additional property. This property should help us measure $\|AB\|$ for two matrices $A, B$ of appropriate sizes.

We look at the simplest matrix norm, the Frobenius norm, defined for $A \in \mathbb{R}^{m \times n}$:

$\|A\|_F = \left(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2} = \left(\sum_{i=1}^m \|A_{i*}\|_2^2\right)^{1/2} = \left(\sum_{j=1}^n \|A_{*j}\|_2^2\right)^{1/2} = \sqrt{\mathrm{trace}(A^T A)},$

i.e., the Frobenius norm is just a 2-norm for the vector that contains all elements of the matrix.

Matrix Norms

Now

$\|Ax\|_2^2 = \sum_{i=1}^m |A_{i*} x|^2 \overset{\text{CSB}}{\le} \sum_{i=1}^m \|A_{i*}\|_2^2\, \|x\|_2^2 = \|A\|_F^2\, \|x\|_2^2,$

so that $\|Ax\|_2 \le \|A\|_F \|x\|_2$.

We can generalize this to matrices, i.e., we have

$\|AB\|_F \le \|A\|_F \|B\|_F,$

which motivates us to require this submultiplicativity for any matrix norm.
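Both identities for the Frobenius norm, and the submultiplicativity just derived, can be verified directly (a sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

fro_entries = np.sqrt(np.sum(A**2))        # entrywise definition
fro_trace   = np.sqrt(np.trace(A.T @ A))   # sqrt(trace(A^T A))
assert np.isclose(fro_entries, np.linalg.norm(A, 'fro'))
assert np.isclose(fro_trace,   np.linalg.norm(A, 'fro'))

# ||AB||_F <= ||A||_F ||B||_F
lhs = np.linalg.norm(A @ B, 'fro')
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
print(lhs <= rhs + 1e-12)   # True
```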

Matrix Norms

Definition: A matrix norm is a function $\|\cdot\|$ from the set of all real (or complex) matrices of finite size into $\mathbb{R}_{\ge 0}$ that satisfies

1. $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = O$ (a matrix of all zeros).

2. $\|\alpha A\| = |\alpha|\,\|A\|$ for all $\alpha \in \mathbb{R}$.

3. $\|A + B\| \le \|A\| + \|B\|$ (requires $A, B$ to be of the same size).

4. $\|AB\| \le \|A\|\,\|B\|$ (requires $A, B$ to have appropriate sizes).

Remark: This definition is usually too general. In addition to the Frobenius norm, most useful matrix norms are induced by a vector norm.

Matrix Norms

Induced matrix norms

Theorem: Let $\|\cdot\|_{(m)}$ and $\|\cdot\|_{(n)}$ be vector norms on $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, and let $A$ be an $m \times n$ matrix. Then

$\|A\| = \max_{\|x\|_{(n)}=1} \|Ax\|_{(m)}$

is a matrix norm called the induced matrix norm.

Remark: Here the vector norm could be any vector norm, in particular any $p$-norm. For example, we could have

$\|A\|_2 = \max_{\|x\|_{2,(n)}=1} \|Ax\|_{2,(m)}.$

To keep notation simple we often drop indices.
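The definition can be explored directly by sampling unit vectors and comparing with the exact induced 2-norm (a rough numerical sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

# Crude approximation: maximize ||Ax||_2 over random unit vectors x
best = 0.0
for _ in range(20000):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)
    best = max(best, np.linalg.norm(A @ x))

print(best, np.linalg.norm(A, 2))   # best is close to (and never exceeds) ||A||_2
```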

Matrix Norms

Proof

1. $\|A\| \ge 0$ is obvious since this holds for the vector norm. It remains to show that $\|A\| = 0$ if and only if $A = O$. Assume $A = O$; then

$\|A\| = \max_{\|x\|=1} \|\underbrace{Ax}_{=0}\| = 0.$

So now consider $A \ne O$. We need to show that $\|A\| > 0$. There must exist a column of $A$ that is not $0$. We call this column $A_{*k}$ and take $x = e_k$. Then

$\|A\| = \max_{\|x\|=1} \|Ax\| \ge \|Ae_k\| = \|A_{*k}\| > 0$

(using $\|e_k\| = 1$) since $A_{*k} \ne 0$.

Matrix Norms

Proof (cont.)

2. Using the corresponding property for the vector norm we have

$\|\alpha A\| = \max_{\|x\|=1} \|\alpha A x\| = |\alpha| \max_{\|x\|=1} \|Ax\| = |\alpha|\,\|A\|.$

3. Also straightforward (based on the triangle inequality for the vector norm):

$\|A + B\| = \max_{\|x\|=1} \|(A+B)x\| = \max_{\|x\|=1} \|Ax + Bx\| \le \max_{\|x\|=1} \big(\|Ax\| + \|Bx\|\big) \le \max_{\|x\|=1} \|Ax\| + \max_{\|x\|=1} \|Bx\| = \|A\| + \|B\|.$

Matrix Norms

Proof (cont.)

4. First note that

$\max_{\|x\|=1} \|Ax\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|},$

and so

$\|A\| = \max_{\|x\|=1} \|Ax\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} \ge \frac{\|Ax\|}{\|x\|}.$

Therefore

$\|Ax\| \le \|A\|\,\|x\|. \qquad (1)$

But then we also have $\|AB\| \le \|A\|\,\|B\|$ since

$\|AB\| = \max_{\|x\|=1} \|ABx\| = \|ABy\| \quad \text{(for some } y \text{ with } \|y\| = 1\text{)}$

$\overset{(1)}{\le} \|A\|\,\|By\| \overset{(1)}{\le} \|A\|\,\|B\|\,\underbrace{\|y\|}_{=1}. \qquad \square$

Matrix Norms

Remark: One can show (see HW) that, if $A$ is invertible,

$\min_{\|x\|=1} \|Ax\| = \frac{1}{\|A^{-1}\|}.$

The induced matrix norm can be interpreted geometrically:

$\|A\|$: the most a vector on the unit sphere can be stretched when transformed by $A$.

$\frac{1}{\|A^{-1}\|}$: the most a vector on the unit sphere can be shrunk when transformed by $A$.

Figure: induced matrix 2-norm from [Mey00].


Matrix Norms

Matrix 2-norm

Theorem: Let $A$ be an $m \times n$ matrix. Then

1. $\|A\|_2 = \max_{\|x\|_2=1} \|Ax\|_2 = \sqrt{\lambda_{\max}}$,

2. $\|A^{-1}\|_2 = \dfrac{1}{\min_{\|x\|_2=1} \|Ax\|_2} = \dfrac{1}{\sqrt{\lambda_{\min}}}$,

where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $A^T A$, respectively.

Remark: We also have $\sqrt{\lambda_{\max}} = \sigma_1$, the largest singular value of $A$, and $\sqrt{\lambda_{\min}} = \sigma_n$, the smallest singular value of $A$.
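The characterization of $\|A\|_2$ via $A^T A$ and the singular values can be checked numerically (a sketch; the matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

lam = np.linalg.eigvalsh(A.T @ A)            # eigenvalues of A^T A (all >= 0)
sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A

print(np.sqrt(lam.max()), sigma[0], np.linalg.norm(A, 2))   # all equal
print(np.sqrt(lam.min()), sigma[-1])                        # sigma_n
```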

Matrix Norms

Proof: We will show only (1), the largest singular value ((2) goes similarly).

The idea is to solve a constrained optimization problem (as in calculus), i.e.,

maximize $f(x) = \|Ax\|_2^2 = (Ax)^T Ax$ subject to $g(x) = \|x\|_2^2 = x^T x = 1$.

We do this by introducing a Lagrange multiplier $\lambda$ and define

$h(x, \lambda) = f(x) - \lambda g(x) = x^T A^T A x - \lambda x^T x.$

Matrix Norms

Proof (cont.)

Necessary and sufficient (since quadratic) condition for a maximum:

$\frac{\partial h}{\partial x_i} = 0,\ i = 1,\dots,n, \qquad g(x) = 1.$

Now

$\frac{\partial}{\partial x_i}\left(x^T A^T A x - \lambda x^T x\right) = \frac{\partial x^T}{\partial x_i} A^T A x + x^T A^T A \frac{\partial x}{\partial x_i} - \lambda \frac{\partial x^T}{\partial x_i} x - \lambda x^T \frac{\partial x}{\partial x_i} = 2 e_i^T A^T A x - 2\lambda e_i^T x = 2\left((A^T A x)_i - (\lambda x)_i\right), \quad i = 1,\dots,n.$

Together this yields

$A^T A x - \lambda x = 0 \iff \left(A^T A - \lambda I\right) x = 0,$

so that $\lambda$ must be an eigenvalue of $A^T A$ (since $g(x) = x^T x = 1$ ensures $x \ne 0$).

Matrix Norms

Proof (cont.)

In fact, as we now show, $\lambda$ is the maximal eigenvalue. First,

$A^T A x = \lambda x \implies x^T A^T A x = \lambda x^T x = \lambda,$

so that

$\|Ax\|_2 = \sqrt{x^T A^T A x} = \sqrt{\lambda}.$

And then

$\|A\|_2 = \max_{\|x\|_2=1} \|Ax\|_2 = \max \sqrt{\lambda} = \sqrt{\lambda_{\max}}. \qquad \square$

Matrix Norms

Special properties of the 2-norm

1. $\|A\|_2 = \max_{\|x\|_2=1} \max_{\|y\|_2=1} |y^T A x|$

2. $\|A\|_2 = \|A^T\|_2$

3. $\|A^T A\|_2 = \|A\|_2^2 = \|A A^T\|_2$

4. $\left\|\begin{pmatrix} A & O \\ O & B \end{pmatrix}\right\|_2 = \max\{\|A\|_2, \|B\|_2\}$

5. $\|U^T A V\|_2 = \|A\|_2$ provided $U U^T = I$ and $V^T V = I$ (orthogonal matrices).

Remark: The proof is a HW problem.

Matrix Norms

Matrix 1-norm and ∞-norm

Theorem: Let $A$ be an $m \times n$ matrix. Then we have

1. the column sum norm

$\|A\|_1 = \max_{\|x\|_1=1} \|Ax\|_1 = \max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|,$

2. and the row sum norm

$\|A\|_\infty = \max_{\|x\|_\infty=1} \|Ax\|_\infty = \max_{i=1,\dots,m} \sum_{j=1}^n |a_{ij}|.$

Remark: We know these are norms, so what we need to do is verify that the formulas hold. We will show (1).
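The column-sum and row-sum formulas can be confirmed against the built-in induced norms (a sketch with an arbitrary matrix):

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0]])

col_sums = np.sum(np.abs(A), axis=0)   # one sum per column
row_sums = np.sum(np.abs(A), axis=1)   # one sum per row

print(col_sums.max(), np.linalg.norm(A, 1))       # 5.0 5.0
print(row_sums.max(), np.linalg.norm(A, np.inf))  # 6.0 6.0
```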

Matrix Norms

Proof

First we look at $\|Ax\|_1$:

$\|Ax\|_1 = \sum_{i=1}^m |(Ax)_i| = \sum_{i=1}^m |A_{i*} x| = \sum_{i=1}^m \Big|\sum_{j=1}^n a_{ij} x_j\Big| \le \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|\,|x_j| = \sum_{j=1}^n |x_j| \left[\sum_{i=1}^m |a_{ij}|\right] \le \left[\max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|\right] \sum_{j=1}^n |x_j|,$

using the ordinary triangle inequality in the first estimate.

Since we actually need to look at $\|Ax\|_1$ for $\|x\|_1 = 1$ we note that $\|x\|_1 = \sum_{j=1}^n |x_j|$ and therefore have

$\|Ax\|_1 \le \max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|.$

Matrix Norms Proof (cont.)

We even have equality, since for $x = e_k$, where $k$ is the index such that $A_{*k}$ has maximum column sum, we get

\[
\|Ax\|_1 = \|Ae_k\|_1 = \|A_{*k}\|_1 = \sum_{i=1}^m |a_{ik}| = \max_{j=1,\dots,n} \sum_{i=1}^m |a_{ij}|
\]

due to our choice of $k$. Since $\|e_k\|_1 = 1$ we indeed have the desired formula. □
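The formula is also easy to confirm numerically. The following is a minimal NumPy sketch (the matrix A is an arbitrary illustration, not from the slides); by the proof, the supremum of $\|Ax\|_1$ over $\|x\|_1 = 1$ is attained at a standard basis vector, so it suffices to compare $\max_j \|Ae_j\|_1$ with the maximum absolute column sum.

    import numpy as np

    # Arbitrary illustrative matrix (any A works).
    A = np.array([[1.0, -2.0, 3.0],
                  [0.0,  4.0, -1.0],
                  [2.0,  1.0,  0.5]])
    m, n = A.shape

    # Maximum absolute column sum (the claimed induced 1-norm).
    col_sum_max = max(np.sum(np.abs(A[:, j])) for j in range(n))

    # ||A e_j||_1 over the standard basis vectors, where the max is attained.
    induced = max(np.linalg.norm(A @ np.eye(n)[:, j], 1) for j in range(n))

    print(col_sum_max, induced)            # both equal 7.0 for this A
    assert np.isclose(col_sum_max, induced)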

Inner Product Spaces Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Inner Product Spaces

Definition
A general inner product on a real (complex) vector space V is a symmetric (Hermitian) bilinear form $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ ($\mathbb{C}$), i.e.,
1. $\langle x, x\rangle \in \mathbb{R}_{\ge 0}$ with $\langle x, x\rangle = 0$ if and only if $x = 0$.
2. $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle$ for all scalars $\alpha$.
3. $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$.
4. $\langle x, y\rangle = \langle y, x\rangle$ (or $\langle x, y\rangle = \overline{\langle y, x\rangle}$ if complex).

Remark
The following two properties (providing bilinearity) are implied (see HW):
\[ \langle \alpha x, y\rangle = \alpha\langle x, y\rangle, \qquad \langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle. \]

Inner Product Spaces

As before, any inner product induces a norm via

\[ \|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}. \]

One can show (analogous to the Euclidean case) that $\|\cdot\|$ is a norm.

In particular, we have a general Cauchy–Schwarz–Bunyakovsky inequality

\[ |\langle x, y\rangle| \le \|x\|\,\|y\|. \]

Inner Product Spaces

Example
1. $\langle x, y\rangle = x^T y$ (or $x^* y$), the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$).
2. For nonsingular matrices A we get the A-inner product on $\mathbb{R}^n$, i.e.,
\[ \langle x, y\rangle = x^T A^T A y \]
with
\[ \|x\|_A = \sqrt{\langle x, x\rangle} = \sqrt{x^T A^T A x} = \|Ax\|_2. \]
3. If $V = \mathbb{R}^{m\times n}$ (or $\mathbb{C}^{m\times n}$) then we get the standard inner product for matrices, i.e.,
\[ \langle A, B\rangle = \operatorname{trace}(A^T B) \quad (\text{or } \operatorname{trace}(A^* B)) \]
with
\[ \|A\| = \sqrt{\langle A, A\rangle} = \sqrt{\operatorname{trace}(A^T A)} = \|A\|_F. \]
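Examples 2 and 3 can be checked directly in code. Below is a minimal NumPy sketch (the random matrices and vectors are purely illustrative, and the diagonal shift is only there to make A nonsingular).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # nonsingular for this example
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)

    # Example 2: the A-inner product <u, v> = u^T A^T A v and its induced norm.
    ip = lambda u, v: u @ A.T @ A @ v
    assert np.isclose(ip(x, y), ip(y, x))                         # symmetry
    assert np.isclose(np.sqrt(ip(x, x)), np.linalg.norm(A @ x))   # ||x||_A = ||Ax||_2

    # Example 3: the trace inner product induces the Frobenius norm.
    B = rng.standard_normal((3, 4))
    assert np.isclose(np.sqrt(np.trace(B.T @ B)), np.linalg.norm(B, 'fro'))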

Inner Product Spaces

Remark
In the infinite-dimensional setting we have, e.g., for f, g continuous functions on (a, b),

\[ \langle f, g\rangle = \int_a^b f(t)\,g(t)\,dt \]

with

\[ \|f\| = \sqrt{\langle f, f\rangle} = \left( \int_a^b (f(t))^2\,dt \right)^{1/2}. \]

Inner Product Spaces Parallelogram identity

In any inner product space the so-called parallelogram identity holds, i.e.,

\[ \|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right). \tag{2} \]

This is true since

\[
\|x + y\|^2 + \|x - y\|^2 = \langle x + y, x + y\rangle + \langle x - y, x - y\rangle
= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle + \langle x, x\rangle - \langle x, y\rangle - \langle y, x\rangle + \langle y, y\rangle
= 2\langle x, x\rangle + 2\langle y, y\rangle = 2\left(\|x\|^2 + \|y\|^2\right).
\]

Inner Product Spaces Polarization identity

The following theorem shows that we not only get a norm from an inner product (i.e., every Hilbert space is a Banach space), but — if the parallelogram identity holds — then we can also get an inner product from a norm (i.e., a Banach space becomes a Hilbert space).

Theorem
Let V be a real vector space with norm $\|\cdot\|$. If the parallelogram identity (2) holds, then

\[ \langle x, y\rangle = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right) \tag{3} \]

is an inner product on V.
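Both (2) and (3) are easy to check numerically for the Euclidean norm; here is a minimal NumPy sketch (the random vectors are purely illustrative) verifying the parallelogram identity and that the polarized expression recovers the standard inner product $x^T y$.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    nrm = np.linalg.norm                     # Euclidean (2-)norm

    # Parallelogram identity (2).
    assert np.isclose(nrm(x + y)**2 + nrm(x - y)**2,
                      2 * (nrm(x)**2 + nrm(y)**2))

    # Polarization identity (3) recovers x^T y.
    polarized = 0.25 * (nrm(x + y)**2 - nrm(x - y)**2)
    assert np.isclose(polarized, x @ y)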

Inner Product Spaces

Proof
We need to show that all four properties of a general inner product hold.

1. Nonnegativity:
\[ \langle x, x\rangle = \frac{1}{4}\left(\|x + x\|^2 - \|x - x\|^2\right) = \frac{1}{4}\|2x\|^2 = \|x\|^2 \ge 0. \]
Moreover, $\langle x, x\rangle = 0$ if and only if $x = 0$, since $\langle x, x\rangle = \|x\|^2$.

4. Symmetry: $\langle x, y\rangle = \langle y, x\rangle$ is clear since $\|x - y\| = \|y - x\|$.

3. Additivity: The parallelogram identity implies
\[ \|x + y\|^2 + \|x + z\|^2 = \frac{1}{2}\left(\|x + y + x + z\|^2 + \|y - z\|^2\right) \tag{4} \]
and
\[ \|x - y\|^2 + \|x - z\|^2 = \frac{1}{2}\left(\|x - y + x - z\|^2 + \|z - y\|^2\right). \tag{5} \]
Subtracting (5) from (4) we get
\[ \|x + y\|^2 - \|x - y\|^2 + \|x + z\|^2 - \|x - z\|^2 = \frac{1}{2}\left(\|2x + y + z\|^2 - \|2x - y - z\|^2\right). \tag{6} \]

The specific form of the polarized inner product implies
\[
\langle x, y\rangle + \langle x, z\rangle
= \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2 + \|x + z\|^2 - \|x - z\|^2\right)
\overset{(6)}{=} \frac{1}{8}\left(\|2x + y + z\|^2 - \|2x - y - z\|^2\right)
= \frac{1}{2}\left(\left\|x + \tfrac{y+z}{2}\right\|^2 - \left\|x - \tfrac{y+z}{2}\right\|^2\right)
\overset{\text{polarization}}{=} 2\left\langle x, \tfrac{y+z}{2}\right\rangle. \tag{7}
\]
Setting $z = 0$ in (7) yields
\[ \langle x, y\rangle = 2\left\langle x, \tfrac{y}{2}\right\rangle \tag{8} \]
since $\langle x, z\rangle = \langle x, 0\rangle = 0$.

To summarize, we have
\[ \langle x, y\rangle + \langle x, z\rangle = 2\left\langle x, \tfrac{y+z}{2}\right\rangle \tag{7} \]
and
\[ \langle x, y\rangle = 2\left\langle x, \tfrac{y}{2}\right\rangle. \tag{8} \]
Since (8) is true for any $y \in V$ we can, in particular, replace y by y + z so that we have
\[ \langle x, y + z\rangle = 2\left\langle x, \tfrac{y+z}{2}\right\rangle. \]
This, however, is the right-hand side of (7), so that we end up with
\[ \langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle, \]
as desired.

2. Scalar multiplication: To show $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle$ for integer $\alpha$ we can just repeatedly apply the additivity property just proved.

From this we can get the property for rational $\alpha$ as follows. We let $\alpha = \frac{\beta}{\gamma}$ with integers $\beta, \gamma \ne 0$, so that
\[ \beta\gamma\langle x, y\rangle = \langle \gamma x, \beta y\rangle = \gamma^2\left\langle x, \tfrac{\beta}{\gamma} y\right\rangle. \]
Dividing by $\gamma^2$ we get
\[ \tfrac{\beta}{\gamma}\langle x, y\rangle = \left\langle x, \tfrac{\beta}{\gamma} y\right\rangle. \]

Finally, for real $\alpha$ we use the continuity of the norm function (see HW), which implies that our inner product $\langle\cdot,\cdot\rangle$ is also continuous. Now we take a sequence $\{\alpha_n\}$ of rational numbers such that $\alpha_n \to \alpha$ for $n \to \infty$ and have — by continuity —
\[ \langle x, \alpha_n y\rangle \to \langle x, \alpha y\rangle \quad \text{as } n \to \infty, \]
while at the same time $\langle x, \alpha_n y\rangle = \alpha_n\langle x, y\rangle \to \alpha\langle x, y\rangle$, so that $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle$. □

[email protected] MATH 532 49 Remark Since many problems are more easily dealt with in inner product spaces (since we then have lengths and angles, see next section) the 2-norm has a clear advantage over other p-norms.

Inner Product Spaces

Theorem The only vector p-norm induced by an inner product is the 2-norm.

[email protected] MATH 532 50 Inner Product Spaces

Theorem The only vector p-norm induced by an inner product is the 2-norm.

Remark Since many problems are more easily dealt with in inner product spaces (since we then have lengths and angles, see next section) the 2-norm has a clear advantage over other p-norms.
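The theorem is easy to illustrate numerically via the counterexample used in the proof below; this is a short NumPy sketch (the chosen values of p are arbitrary) showing that the parallelogram identity for the p-norm with $x = e_1$, $y = e_2$ holds only for p = 2.

    import numpy as np

    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])

    for p in (1, 1.5, 2, 3):
        lhs = np.linalg.norm(e1 + e2, p)**2 + np.linalg.norm(e1 - e2, p)**2  # 2^(2/p + 1)
        rhs = 2 * (np.linalg.norm(e1, p)**2 + np.linalg.norm(e2, p)**2)      # 4
        print(p, lhs, rhs, np.isclose(lhs, rhs))   # equality only for p = 2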

[email protected] MATH 532 50 Therefore we need to show that it doesn’t work for p 6= 2. We do this by showing that the parallelogram identity   kx + yk2 + kx − yk2 = 2 kxk2 + kyk2

fails for p 6= 2. We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.

Inner Product Spaces

Proof We know that the 2-norm does induce an inner product, i.e., √ T kxk2 = x x.

[email protected] MATH 532 51 We do this by showing that the parallelogram identity   kx + yk2 + kx − yk2 = 2 kxk2 + kyk2

fails for p 6= 2. We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.

Inner Product Spaces

Proof We know that the 2-norm does induce an inner product, i.e., √ T kxk2 = x x.

Therefore we need to show that it doesn’t work for p 6= 2.

[email protected] MATH 532 51 We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.

Inner Product Spaces

Proof We know that the 2-norm does induce an inner product, i.e., √ T kxk2 = x x.

Therefore we need to show that it doesn’t work for p 6= 2. We do this by showing that the parallelogram identity   kx + yk2 + kx − yk2 = 2 kxk2 + kyk2 fails for p 6= 2.

[email protected] MATH 532 51 Inner Product Spaces

Proof We know that the 2-norm does induce an inner product, i.e., √ T kxk2 = x x.

Therefore we need to show that it doesn’t work for p 6= 2. We do this by showing that the parallelogram identity   kx + yk2 + kx − yk2 = 2 kxk2 + kyk2 fails for p 6= 2. We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.

[email protected] MATH 532 51 n !2/p X p 2/p = |[e1 + e2]i | = 2 i=1 and, similarly 2 2 2/p kx − ykp = ke1 − e2kp = 2 . Together, the left-hand side of the parallelogram identity is 2 22/p = 22/p+1.

Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e1 and y = e2 so that

2 2 kx + ykp = ke1 + e2kp

[email protected] MATH 532 52 = 22/p

and, similarly 2 2 2/p kx − ykp = ke1 − e2kp = 2 . Together, the left-hand side of the parallelogram identity is 2 22/p = 22/p+1.

Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e1 and y = e2 so that

n !2/p 2 2 X p kx + ykp = ke1 + e2kp = |[e1 + e2]i | i=1

[email protected] MATH 532 52 and, similarly 2 2 2/p kx − ykp = ke1 − e2kp = 2 . Together, the left-hand side of the parallelogram identity is 2 22/p = 22/p+1.

Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e1 and y = e2 so that

n !2/p 2 2 X p 2/p kx + ykp = ke1 + e2kp = |[e1 + e2]i | = 2 i=1

[email protected] MATH 532 52 Together, the left-hand side of the parallelogram identity is 2 22/p = 22/p+1.

Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e1 and y = e2 so that

n !2/p 2 2 X p 2/p kx + ykp = ke1 + e2kp = |[e1 + e2]i | = 2 i=1 and, similarly 2 2 2/p kx − ykp = ke1 − e2kp = 2 .

[email protected] MATH 532 52 Inner Product Spaces

Proof (cont.)

All we need is a counterexample, so we take x = e1 and y = e2 so that

n !2/p 2 2 X p 2/p kx + ykp = ke1 + e2kp = |[e1 + e2]i | = 2 i=1 and, similarly 2 2 2/p kx − ykp = ke1 − e2kp = 2 . Together, the left-hand side of the parallelogram identity is 2 22/p = 22/p+1.

[email protected] MATH 532 52 2 2 = 1 = ke2kp = kykp,

so that the right-hand side comes out to 4. Finally, we have

2 2 22/p+1 = 4 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 kxkp = ke1kp

[email protected] MATH 532 53 2 2 = ke2kp = kykp,

so that the right-hand side comes out to 4. Finally, we have

2 2 22/p+1 = 4 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 kxkp = ke1kp = 1

[email protected] MATH 532 53 4. Finally, we have

2 2 22/p+1 = 4 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 2 2 kxkp = ke1kp = 1 = ke2kp = kykp, so that the right-hand side comes out to

[email protected] MATH 532 53 Finally, we have

2 2 22/p+1 = 4 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 2 2 kxkp = ke1kp = 1 = ke2kp = kykp, so that the right-hand side comes out to 4.

[email protected] MATH 532 53 2 2 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 2 2 kxkp = ke1kp = 1 = ke2kp = kykp, so that the right-hand side comes out to 4. Finally, we have

22/p+1 = 4

[email protected] MATH 532 53 Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate

2 2 2 2 kxkp = ke1kp = 1 = ke2kp = kykp, so that the right-hand side comes out to 4. Finally, we have

2 2 22/p+1 = 4 ⇐⇒ + 1 = 2 ⇐⇒ = 1 or p = 2. p p



Orthogonal Vectors Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Orthogonal Vectors

We will now work in a general inner product space V with induced norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$.

Definition
Two vectors x, y ∈ V are called orthogonal if
\[ \langle x, y\rangle = 0. \]
We often use the notation x ⊥ y.

Orthogonal Vectors

In the HW you will prove the Pythagorean theorem for the 2-norm and standard inner product $x^T y$, i.e.,
\[ \|x\|^2 + \|y\|^2 = \|x - y\|^2 \iff x^T y = 0. \]
Moreover, the law of cosines states
\[ \|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\theta, \]
so that
\[ \cos\theta = \frac{\|x\|^2 + \|y\|^2 - \|x - y\|^2}{2\|x\|\|y\|} \overset{\text{Pythagoras}}{=} \frac{2x^T y}{2\|x\|\|y\|}. \]
This motivates our general definition of angles:

Definition
Let x, y ∈ V. The angle θ between x and y is defined via
\[ \cos\theta = \frac{\langle x, y\rangle}{\|x\|\|y\|}, \qquad \theta \in [0, \pi]. \]
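With the standard inner product the definition above is immediate to evaluate; here is a minimal NumPy sketch (the two vectors are arbitrary illustrations, and the clip call only guards against round-off outside [-1, 1]).

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([3.0, -1.0, 0.5])

    cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    print(np.degrees(theta))

    # Orthogonality means the inner product is zero, i.e. theta = pi/2.
    assert np.isclose(np.array([1.0, 0.0, 1.0]) @ np.array([0.0, 1.0, 0.0]), 0.0)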

Orthogonal Vectors Orthonormal sets

Definition
A set $\{u_1, u_2, \dots, u_n\} \subseteq V$ is called orthonormal if
\[ \langle u_i, u_j\rangle = \delta_{ij} \quad \text{(Kronecker delta)}. \]

Theorem
Every orthonormal set is linearly independent.

Corollary
Every orthonormal set of n vectors from an n-dimensional vector space V is an orthonormal basis for V.

Orthogonal Vectors

Proof (of the theorem)
We want to show linear independence, i.e., that
\[ \sum_{j=1}^n \alpha_j u_j = 0 \implies \alpha_j = 0, \quad j = 1, \dots, n. \]
To see this is true we take the inner product with $u_i$:
\[
\Big\langle u_i, \sum_{j=1}^n \alpha_j u_j \Big\rangle = \langle u_i, 0\rangle
\iff \sum_{j=1}^n \alpha_j \underbrace{\langle u_i, u_j\rangle}_{=\delta_{ij}} = 0
\iff \alpha_i = 0.
\]
Since i was arbitrary this holds for all i = 1, ..., n, and we have linear independence. □

Orthogonal Vectors

Example
The standard orthonormal basis of $\mathbb{R}^n$ is given by
\[ \{e_1, e_2, \dots, e_n\}. \]
Using this basis we can express any $x \in \mathbb{R}^n$ as
\[ x = x_1 e_1 + x_2 e_2 + \dots + x_n e_n, \]
i.e., we get a coordinate expansion of x.

Orthogonal Vectors

In fact, any other orthonormal basis provides just as simple a representation of x.

Consider the orthonormal basis $B = \{u_1, u_2, \dots, u_n\}$ and assume
\[ x = \sum_{j=1}^n \alpha_j u_j \]
for some appropriate scalars $\alpha_j$. To find these expansion coefficients $\alpha_j$ we take the inner product with $u_i$, i.e.,
\[ \langle u_i, x\rangle = \sum_{j=1}^n \alpha_j \underbrace{\langle u_i, u_j\rangle}_{=\delta_{ij}} = \alpha_i. \]

Orthogonal Vectors

We therefore have proved

Theorem
Let $\{u_1, u_2, \dots, u_n\}$ be an orthonormal basis for an inner product space V. Then any x ∈ V can be written as
\[ x = \sum_{i=1}^n \langle x, u_i\rangle u_i. \]
This is a (finite) Fourier expansion with Fourier coefficients $\langle x, u_i\rangle$.

Orthogonal Vectors

Remark
The classical (infinite-dimensional) Fourier series for continuous functions on (−π, π) uses the orthogonal (but not yet orthonormal) basis $\{1, \sin t, \cos t, \sin 2t, \cos 2t, \dots\}$ and the inner product
\[ \langle f, g\rangle = \int_{-\pi}^{\pi} f(t)\,g(t)\,dt. \]

Orthogonal Vectors

Example
Consider the basis
\[ B = \{u_1, u_2, u_3\} = \left\{ \begin{bmatrix}1\\0\\1\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\-1\end{bmatrix} \right\}. \]
It is clear by inspection that B is an orthogonal subset of $\mathbb{R}^3$, i.e., using the Euclidean inner product, we have $u_i^T u_j = 0$, $i, j = 1, 2, 3$, $i \ne j$. We can obtain an orthonormal basis by normalizing the vectors, i.e., by computing $v_i = \frac{u_i}{\|u_i\|_2}$, $i = 1, 2, 3$. This yields
\[ v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\1\end{bmatrix}, \qquad v_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}, \qquad v_3 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\-1\end{bmatrix}. \]

Orthogonal Vectors

Example (cont.)
The Fourier expansion of $x = \begin{bmatrix}1 & 2 & 3\end{bmatrix}^T$ is given by
\[
x = \sum_{i=1}^3 \left(x^T v_i\right) v_i
= \frac{4}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\1\end{bmatrix} + 2\begin{bmatrix}0\\1\\0\end{bmatrix} - \frac{2}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\-1\end{bmatrix}
= \begin{bmatrix}2\\0\\2\end{bmatrix} + \begin{bmatrix}0\\2\\0\end{bmatrix} + \begin{bmatrix}-1\\0\\1\end{bmatrix}.
\]
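The expansion is easy to reproduce in code; this is a minimal NumPy sketch using the orthonormal basis $\{v_1, v_2, v_3\}$ from the example above.

    import numpy as np

    v1 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)
    v2 = np.array([0.0, 1.0, 0.0])
    v3 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)
    x = np.array([1.0, 2.0, 3.0])

    # Fourier coefficients <x, v_i> and the resulting terms of the expansion.
    terms = [(x @ v) * v for v in (v1, v2, v3)]
    print(terms)                        # [2, 0, 2], [0, 2, 0], [-1, 0, 1]
    assert np.allclose(sum(terms), x)   # the terms sum back to x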

Gram–Schmidt Orthogonalization & QR Factorization Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Gram–Schmidt Orthogonalization & QR Factorization

We want to convert an arbitrary basis $\{x_1, x_2, \dots, x_n\}$ of V to an orthonormal basis $\{u_1, u_2, \dots, u_n\}$.

Idea: construct $u_1, u_2, \dots, u_n$ successively so that $\{u_1, u_2, \dots, u_k\}$ is an ON basis for $\operatorname{span}\{x_1, x_2, \dots, x_k\}$, k = 1, ..., n.

Gram–Schmidt Orthogonalization & QR Factorization Construction

k = 1:
\[ u_1 = \frac{x_1}{\|x_1\|}. \]

k = 2: Consider the projection of $x_2$ onto $u_1$, i.e., $\langle u_1, x_2\rangle u_1$. Then
\[ v_2 = x_2 - \langle u_1, x_2\rangle u_1 \qquad \text{and} \qquad u_2 = \frac{v_2}{\|v_2\|}. \]

Gram–Schmidt Orthogonalization & QR Factorization

In general, consider $\{u_1, \dots, u_k\}$ as a given ON basis for $\operatorname{span}\{x_1, \dots, x_k\}$. Use the Fourier expansion to express $x_{k+1}$ with respect to $\{u_1, \dots, u_{k+1}\}$:
\[
x_{k+1} = \sum_{i=1}^{k+1} \langle u_i, x_{k+1}\rangle u_i
\iff x_{k+1} = \sum_{i=1}^{k} \langle u_i, x_{k+1}\rangle u_i + \langle u_{k+1}, x_{k+1}\rangle u_{k+1}
\iff u_{k+1} = \frac{x_{k+1} - \sum_{i=1}^{k} \langle u_i, x_{k+1}\rangle u_i}{\langle u_{k+1}, x_{k+1}\rangle} = \frac{v_{k+1}}{\langle u_{k+1}, x_{k+1}\rangle}.
\]
This vector, however, is not yet normalized.

Gram–Schmidt Orthogonalization & QR Factorization

We now want $\|u_{k+1}\| = 1$, i.e.,
\[ \sqrt{\Big\langle \frac{v_{k+1}}{\langle u_{k+1}, x_{k+1}\rangle}, \frac{v_{k+1}}{\langle u_{k+1}, x_{k+1}\rangle} \Big\rangle} = \frac{1}{|\langle u_{k+1}, x_{k+1}\rangle|}\,\|v_{k+1}\| = 1 \]
\[ \implies \|v_{k+1}\| = \Big\| x_{k+1} - \sum_{i=1}^{k} \langle u_i, x_{k+1}\rangle u_i \Big\| = |\langle u_{k+1}, x_{k+1}\rangle|. \]
Therefore
\[ \langle u_{k+1}, x_{k+1}\rangle = \pm\Big\| x_{k+1} - \sum_{i=1}^{k} \langle u_i, x_{k+1}\rangle u_i \Big\|. \]
Since the factor ±1 does not change the span, nor orthogonality, nor normalization we can pick the positive sign.

Gram–Schmidt Orthogonalization & QR Factorization Gram–Schmidt algorithm

Summarizing, we have
\[
u_1 = \frac{x_1}{\|x_1\|}, \qquad
v_k = x_k - \sum_{i=1}^{k-1} \langle u_i, x_k\rangle u_i, \quad k = 2, \dots, n, \qquad
u_k = \frac{v_k}{\|v_k\|}.
\]
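In code the summarized algorithm reads as follows; this is a minimal NumPy sketch of classical Gram–Schmidt (the function name gram_schmidt is only illustrative), applied to the columns of a matrix with linearly independent columns.

    import numpy as np

    def gram_schmidt(X):
        """Orthonormalize the columns of X (assumed linearly independent)."""
        m, n = X.shape
        U = np.zeros((m, n))
        for k in range(n):
            # v_k = x_k - sum_{i<k} <u_i, x_k> u_i
            v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
            U[:, k] = v / np.linalg.norm(v)
        return U

    X = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])
    U = gram_schmidt(X)
    print(np.round(U.T @ U, 12))   # identity: the columns are orthonormal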

Gram–Schmidt Orthogonalization & QR Factorization Using matrix notation to describe Gram–Schmidt

We will assume $V \subseteq \mathbb{R}^m$ (but this also works in the complex case). Let
\[ U_1 = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{R}^m \]
and for k = 2, 3, ..., n let
\[ U_k = \begin{bmatrix} u_1 & u_2 & \cdots & u_{k-1} \end{bmatrix} \in \mathbb{R}^{m \times (k-1)}. \]

Gram–Schmidt Orthogonalization & QR Factorization

Then
\[ U_k^T x_k = \begin{bmatrix} u_1^T x_k \\ u_2^T x_k \\ \vdots \\ u_{k-1}^T x_k \end{bmatrix} \]
and
\[ U_k U_k^T x_k = \begin{bmatrix} u_1 & u_2 & \cdots & u_{k-1} \end{bmatrix} \begin{bmatrix} u_1^T x_k \\ u_2^T x_k \\ \vdots \\ u_{k-1}^T x_k \end{bmatrix} = \sum_{i=1}^{k-1} u_i \left(u_i^T x_k\right) = \sum_{i=1}^{k-1} \left(u_i^T x_k\right) u_i. \]

Gram–Schmidt Orthogonalization & QR Factorization

Now, Gram–Schmidt says
\[ v_k = x_k - \sum_{i=1}^{k-1} \left(u_i^T x_k\right) u_i = x_k - U_k U_k^T x_k = \left(I - U_k U_k^T\right) x_k, \quad k = 1, 2, \dots, n, \]
where the case k = 1 is also covered by the special definition of $U_1$.

Remark
$U_k U_k^T$ is a projection matrix, and $I - U_k U_k^T$ is a complementary projection. We will cover these later.

Gram–Schmidt Orthogonalization & QR Factorization QR Factorization (via Gram–Schmidt)

Consider an m × n matrix A with rank(A) = n.

We want to convert the set of columns of A, $\{a_1, a_2, \dots, a_n\}$, to an ON basis $\{q_1, q_2, \dots, q_n\}$ of R(A). From our discussion of Gram–Schmidt we know
\[
q_1 = \frac{a_1}{\|a_1\|}, \qquad
v_k = a_k - \sum_{i=1}^{k-1} \langle q_i, a_k\rangle q_i, \quad k = 2, \dots, n, \qquad
q_k = \frac{v_k}{\|v_k\|}.
\]

Gram–Schmidt Orthogonalization & QR Factorization

We now rewrite as follows:
\[ a_1 = \|a_1\|\, q_1, \qquad a_k = \langle q_1, a_k\rangle q_1 + \dots + \langle q_{k-1}, a_k\rangle q_{k-1} + \|v_k\|\, q_k, \quad k = 2, \dots, n. \]
We also introduce the new notation
\[ r_{11} = \|a_1\|, \qquad r_{kk} = \|v_k\|, \quad k = 2, \dots, n. \]
Then
\[
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
= \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}}_{=Q}
\underbrace{\begin{bmatrix} r_{11} & \langle q_1, a_2\rangle & \cdots & \langle q_1, a_n\rangle \\ & r_{22} & \cdots & \langle q_2, a_n\rangle \\ & & \ddots & \vdots \\ O & & & r_{nn} \end{bmatrix}}_{=R}
\]
and we have the reduced QR factorization of A.

Gram–Schmidt Orthogonalization & QR Factorization

Remark
The matrix Q is m × n with orthonormal columns.
The matrix R is n × n upper triangular with positive diagonal entries.
The reduced QR factorization is unique (see HW).

Gram–Schmidt Orthogonalization & QR Factorization

Example
Find the QR factorization of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}. \]

\[ q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\1\end{bmatrix}, \qquad r_{11} = \|a_1\| = \sqrt{2}. \]

\[ v_2 = a_2 - \left(q_1^T a_2\right) q_1, \qquad q_1^T a_2 = \frac{2}{\sqrt{2}} = \sqrt{2} = r_{12}, \]
\[ v_2 = \begin{bmatrix}2\\1\\0\end{bmatrix} - \frac{2}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\1\end{bmatrix} = \begin{bmatrix}1\\1\\-1\end{bmatrix}, \qquad \|v_2\| = \sqrt{3} = r_{22} \]
\[ \implies q_2 = \frac{v_2}{\|v_2\|} = \frac{1}{\sqrt{3}}\begin{bmatrix}1\\1\\-1\end{bmatrix}. \]

Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.)

\[ v_3 = a_3 - \left(q_1^T a_3\right) q_1 - \left(q_2^T a_3\right) q_2 \qquad \text{with} \qquad q_1^T a_3 = \frac{1}{\sqrt{2}} = r_{13}, \quad q_2^T a_3 = 0 = r_{23}. \]
Thus
\[ v_3 = \begin{bmatrix}0\\1\\1\end{bmatrix} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}1\\0\\1\end{bmatrix} - 0 = \frac{1}{2}\begin{bmatrix}-1\\2\\1\end{bmatrix}, \qquad \|v_3\| = \frac{\sqrt{6}}{2} = r_{33}. \]
So
\[ q_3 = \frac{v_3}{\|v_3\|} = \frac{1}{\sqrt{6}}\begin{bmatrix}-1\\2\\1\end{bmatrix}. \]

Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.)
Together we have
\[
Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \end{bmatrix}, \qquad
R = \begin{bmatrix} \sqrt{2} & \sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & \frac{\sqrt{6}}{2} \end{bmatrix}.
\]

Gram–Schmidt Orthogonalization & QR Factorization Solving linear systems with the QR factorization

Recall the use of the LU factorization to solve Ax = b. Now, A = QR implies
\[ Ax = b \iff QRx = b. \]
In the special case of a nonsingular n × n matrix A, the matrix Q is also n × n with ON columns, so that
\[ Q^{-1} = Q^T \quad (\text{since } Q^T Q = I) \]
and
\[ QRx = b \iff Rx = Q^T b. \]

Gram–Schmidt Orthogonalization & QR Factorization

Therefore we solve Ax = b by the following steps:
1. Compute A = QR.
2. Compute $y = Q^T b$.
3. Solve the upper triangular system Rx = y.

Remark
This procedure is comparable to the three-step LU solution procedure.
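The three steps translate directly into code; this is a minimal NumPy sketch (A and b are arbitrary illustrations; np.linalg.solve is used here for step 3, although back substitution would suffice since R is upper triangular).

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0]])
    b = np.array([3.0, 2.0, 1.0])

    Q, R = np.linalg.qr(A)       # step 1: A = QR
    y = Q.T @ b                  # step 2: y = Q^T b
    x = np.linalg.solve(R, y)    # step 3: solve the triangular system Rx = y

    assert np.allclose(A @ x, b)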

[email protected] MATH 532 81 ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the equations corresponds to solving (cf. the previous discussion) Rx = QT b.

× Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems.

[email protected] MATH 532 82 ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

(so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n

[email protected] MATH 532 82 ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists).

[email protected] MATH 532 82 ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

Using the QR factorization of A this becomes (QR)T QRx = (QR)T b

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b.

[email protected] MATH 532 82 ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b

[email protected] MATH 532 82 ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b ⇐⇒ RT QT Q Rx = RT QT b | {z } =I

[email protected] MATH 532 82 Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b.

[email protected] MATH 532 82 Therefore solving the normal equations corresponds to solving (cf. the previous discussion) Rx = QT b.

Gram–Schmidt Orthogonalization & QR Factorization The real advantage of the QR factorization lies in the solution of least squares problems. × Consider Ax = b with A ∈ Rm n and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations AT Ax = AT b. Using the QR factorization of A this becomes (QR)T QRx = (QR)T b ⇐⇒ RT QT Q Rx = RT QT b | {z } =I ⇐⇒ RT Rx = RT QT b. Now R is upper triangular with positive diagonal and therefore invertible.

Gram–Schmidt Orthogonalization & QR Factorization

The real advantage of the QR factorization lies in the solution of least squares problems.

Consider Ax = b with A ∈ R^{m×n} and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations A^T A x = A^T b.

Using the QR factorization of A this becomes

(QR)^T QR x = (QR)^T b
⇐⇒ R^T (Q^T Q) R x = R^T Q^T b      (using Q^T Q = I)
⇐⇒ R^T R x = R^T Q^T b.

Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion)

Rx = Q^T b.


Gram–Schmidt Orthogonalization & QR Factorization

Remark
This is the same as the QR factorization applied to a square and consistent system Ax = b.

Summary
The QR factorization provides a simple and efficient way to solve least squares problems.

The ill-conditioned matrix A^T A is never computed.

If it is required, it can be computed from R as R^T R (this is in fact the Cholesky factorization of A^T A).
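To make this concrete, here is a small NumPy sketch (the data A and b are made-up illustration values, not from the lecture): it computes the reduced QR factorization and solves R x = Q^T b, then compares with the normal-equations solution that the QR approach avoids forming.

```python
import numpy as np

# Made-up overdetermined system: 5 equations, 3 unknowns, rank(A) = 3.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

# Reduced QR factorization: Q is 5x3 with orthonormal columns, R is 3x3 upper triangular.
Q, R = np.linalg.qr(A)

# Least squares solution from R x = Q^T b (R is invertible since rank(A) = n).
x = np.linalg.solve(R, Q.T @ b)

# For comparison only: the normal-equations solution (A^T A is never formed above).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
print(np.allclose(x, x_normal))  # True
```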


Gram–Schmidt Orthogonalization & QR Factorization Modified Gram–Schmidt

There is still a problem with the QR factorization via Gram–Schmidt:

it is not numerically stable (see HW).

A better — but still not ideal — approach is provided by the modified Gram–Schmidt algorithm.

Idea: rearrange the order of calculation, i.e., write the projection matrices

U_k U_k^T = Σ_{i=1}^{k−1} u_i u_i^T

as a sum of rank-1 projections.

Gram–Schmidt Orthogonalization & QR Factorization MGS Algorithm

k = 1:  u_1 ← x_1/‖x_1‖,   u_j ← x_j for j = 2, ..., n

for k = 2 : n
    E_k = I − u_{k−1} u_{k−1}^T
    for j = k, ..., n
        u_j ← E_k u_j
    u_k ← u_k/‖u_k‖
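A minimal NumPy sketch of the algorithm above (my own implementation, for a real matrix with linearly independent columns): at step k every remaining column is immediately orthogonalized against u_{k−1}, rather than subtracting all projections from one vector at once as in classical Gram–Schmidt.

```python
import numpy as np

def modified_gram_schmidt(X):
    """Return U with orthonormal columns spanning the columns of X (assumed full rank)."""
    U = np.array(X, dtype=float)
    n = U.shape[1]
    U[:, 0] /= np.linalg.norm(U[:, 0])
    for k in range(1, n):
        u_prev = U[:, k - 1]
        # Apply the rank-1 projector E_k = I - u_{k-1} u_{k-1}^T to all remaining columns.
        for j in range(k, n):
            U[:, j] -= (u_prev @ U[:, j]) * u_prev
        U[:, k] /= np.linalg.norm(U[:, k])
    return U

X = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
U = modified_gram_schmidt(X)
print(np.allclose(U.T @ U, np.eye(3)))  # True (up to rounding)
```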


Gram–Schmidt Orthogonalization & QR Factorization

Remark The MGS algorithm is theoretically equivalent to the GS algorithm, i.e., in exact arithmetic, but in practice it preserves orthogonality better.

Most stable implementations of the QR factorization use Householder reflections or Givens rotations (more later).

Householder reflections are also more efficient than MGS.

Unitary and Orthogonal Matrices Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections


Unitary and Orthogonal Matrices

Definition
A real (complex) n × n matrix is called orthogonal (unitary) if its columns form an orthonormal basis for R^n (C^n).

Theorem
Let U be an orthogonal n × n matrix. Then
1. U has orthonormal rows.
2. U^{−1} = U^T.
3. ‖Ux‖_2 = ‖x‖_2 for all x ∈ R^n, i.e., U is an isometry.

Remark Analogous properties for unitary matrices are formulated and proved in [Mey00].


Unitary and Orthogonal Matrices

Proof.
(2) By definition U = [u_1 ··· u_n] has orthonormal columns, i.e.,

u_i ⊥ u_j ⇐⇒ u_i^T u_j = δ_ij
⇐⇒ [U^T U]_ij = δ_ij
⇐⇒ U^T U = I.

But U^T U = I implies U^T = U^{−1}.

(1) Therefore the statement about orthonormal rows follows from

U U^{−1} = U U^T = I.


Unitary and Orthogonal Matrices

Proof (cont.)
(3) To show that U is an isometry we assume U is orthogonal. Then, for any x ∈ R^n,

‖Ux‖_2^2 = (Ux)^T (Ux) = x^T (U^T U) x = x^T x = ‖x‖_2^2,

since U^T U = I. □


Unitary and Orthogonal Matrices

Remark
The converse of (3) is also true, i.e., if ‖Ux‖_2 = ‖x‖_2 for all x ∈ R^n, then U must be orthogonal.

Consider x = e_i. Then

‖U e_i‖_2^2 = u_i^T u_i = ‖e_i‖_2^2 = 1      (by (3)),

so the columns of U have norm 1. Moreover, for x = e_i + e_j (i ≠ j) we get

‖U(e_i + e_j)‖_2^2 = u_i^T u_i + u_i^T u_j + u_j^T u_i + u_j^T u_j = ‖e_i + e_j‖_2^2 = 2      (by (3)),

so that, since u_i^T u_i = u_j^T u_j = 1, we get u_i^T u_j = 0 for i ≠ j and the columns of U are orthogonal.


Unitary and Orthogonal Matrices

Example
The simplest orthogonal matrix is the identity matrix I. Permutation matrices are orthogonal, e.g.,

P = [1 0 0; 0 0 1; 0 1 0].

In fact, for this particular permutation matrix we even have P^T = P, so that P^T P = P^2 = I. Such matrices are called involutory (see pretest).

An orthogonal matrix can be viewed as a unitary matrix, but a unitary matrix may not be orthogonal. For example, for

A = (1/√2) [1 i; −1 i]

we have A*A = AA* = I, but A^T A ≠ I ≠ A A^T.
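A quick NumPy check of this example (a short sketch):

```python
import numpy as np

A = np.array([[1, 1j], [-1, 1j]]) / np.sqrt(2)

print(np.allclose(A.conj().T @ A, np.eye(2)))  # True:  A*A = I, so A is unitary
print(np.allclose(A.T @ A, np.eye(2)))         # False: A^T A != I, so A is not orthogonal
```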


Unitary and Orthogonal Matrices Elementary Orthogonal Projectors

Definition
A matrix Q of the form

Q = I − u u^T,      u ∈ R^n, ‖u‖_2 = 1,

is called an elementary orthogonal projector.

Remark
Note that Q is not an orthogonal matrix (it is, however, symmetric):

Q^T = (I − u u^T)^T = I − u u^T = Q.


Unitary and Orthogonal Matrices

All projectors are idempotent, i.e., Q^2 = Q:

Q^T Q = Q^2 = (I − u u^T)(I − u u^T) = I − 2 u u^T + u (u^T u) u^T = I − u u^T = Q,

using Q^T = Q from above and u^T u = 1.


Unitary and Orthogonal Matrices Geometric interpretation

Consider

x = (I − Q)x + Qx

and observe that (I − Q)x ⊥ Qx:

((I − Q)x)^T Qx = x^T (I − Q^T) Q x = x^T (Q − Q^T Q) x = 0,

since Q^T Q = Q. Also,

(I − Q)x = (u u^T) x = u (u^T x) ∈ span{u}.

Therefore Qx ∈ u^⊥, the orthogonal complement of u.

Also note that ‖(u^T x) u‖ = |u^T x| ‖u‖_2 = |u^T x|, so that |u^T x| is the length of the orthogonal projection of x onto span{u}.


Unitary and Orthogonal Matrices Summary

(I − Q)x ∈ span{u}, so

I − Q = u u^T = P_u

is a projection onto span{u}.

Qx ∈ u^⊥, so

Q = I − u u^T = P_{u^⊥}

is a projection onto u^⊥.


Unitary and Orthogonal Matrices

Remark
Above we assumed that ‖u‖_2 = 1. For an arbitrary vector v we get a unit vector

u = v/‖v‖_2 = v/√(v^T v).

Therefore, for general v,

P_v = v v^T / (v^T v)

is a projection onto span{v}, and

P_{v^⊥} = I − P_v = I − v v^T / (v^T v)

is a projection onto v^⊥.
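A short NumPy sketch of these two projectors (v and x are arbitrary illustration vectors):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
x = np.array([3.0, -1.0, 4.0])

P_v = np.outer(v, v) / (v @ v)      # projection onto span{v}
P_vperp = np.eye(3) - P_v           # projection onto the orthogonal complement of v

print(np.allclose(P_v @ P_v, P_v))                  # idempotent
print(np.isclose((P_v @ x) @ (P_vperp @ x), 0.0))   # the two pieces of x are orthogonal
print(np.allclose(P_v @ x + P_vperp @ x, x))        # and they sum back to x
```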

Unitary and Orthogonal Matrices Elementary Reflections

Definition
Let v (≠ 0) ∈ R^n. Then

R = I − 2 v v^T / (v^T v)

is called the elementary (or Householder) reflector about v^⊥.

Remark
For u ∈ R^n with ‖u‖_2 = 1 we have

R = I − 2 u u^T.


Unitary and Orthogonal Matrices Geometric interpretation

Consider ‖u‖_2 = 1, and note that Qx = (I − u u^T)x is the orthogonal projection of x onto u^⊥ as above. Also,

Q(Rx) = Q(I − 2 u u^T)x = Q(I − 2(I − Q))x = (Q − 2Q + 2Q^2)x = Qx,

using Q^2 = Q, so that Qx is also the orthogonal projection of Rx onto u^⊥. Moreover,

‖x − Qx‖ = ‖x − (I − u u^T)x‖ = |u^T x| ‖u‖ = |u^T x|

and

‖Qx − Rx‖ = ‖(Q − R)x‖ = ‖(I − u u^T − (I − 2 u u^T))x‖ = ‖u u^T x‖ = |u^T x|.

Together, Rx is the reflection of x about u^⊥.


Unitary and Orthogonal Matrices Properties of elementary reflections

Theorem
Let R be an elementary reflector. Then

R^{−1} = R^T = R,

i.e., R is orthogonal, symmetric, and involutory.

Remark
However, these properties do not characterize a reflection, i.e., an orthogonal, symmetric and involutory matrix is not necessarily a reflection (see HW).


Unitary and Orthogonal Matrices

Proof.
R^T = (I − 2 u u^T)^T = I − 2 u u^T = R. Also,

R^2 = (I − 2 u u^T)(I − 2 u u^T) = I − 4 u u^T + 4 u (u^T u) u^T = I,

since u^T u = 1, so that R^{−1} = R. □


Unitary and Orthogonal Matrices

Reflection of x onto e_1

If we can construct a matrix R such that Rx = α e_1, then we can use R to zero out entries in (the first column of) a matrix.

To this end consider

v = x ± µ ‖x‖_2 e_1,      where µ = 1 if x is real, and µ = x_1/|x_1| if x_1 is complex,

and note that

v^T v = (x ± µ ‖x‖_2 e_1)^T (x ± µ ‖x‖_2 e_1)
      = x^T x ± 2 µ ‖x‖_2 e_1^T x + µ^2 ‖x‖_2^2          (using µ^2 = 1)
      = 2 (x^T x ± µ ‖x‖ e_1^T x) = 2 v^T x.              (9)


Unitary and Orthogonal Matrices

Our Householder reflection was defined as

R = I − 2 v v^T / (v^T v),

so that

Rx = x − 2 v v^T x / (v^T v) = x − [2 v^T x / (v^T v)] v = x − v      (since 2 v^T x / (v^T v) = 1 by (9))
   = ∓ µ ‖x‖_2 e_1 = α e_1.

Remark
These special reflections are used in the Householder variant of the QR factorization. For optimal numerical stability of real matrices one lets ∓µ = sign(x_1).
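A NumPy sketch of this construction (real case only; the helper name `householder_vector` is my own, and the sign is chosen as in the remark):

```python
import numpy as np

def householder_vector(x):
    """Return v with (I - 2 v v^T / (v^T v)) x = -sign(x[0]) ||x||_2 e_1 (real x, x[0] != 0)."""
    v = x.astype(float).copy()
    v[0] += np.sign(x[0]) * np.linalg.norm(x)   # v = x + sign(x_1) ||x||_2 e_1
    return v

x = np.array([3.0, 1.0, 4.0, 1.0])
v = householder_vector(x)
Rx = x - 2.0 * (v @ x) / (v @ v) * v
print(Rx)                         # approximately [-||x||_2, 0, 0, 0] since sign(x_1) = +1
print(np.linalg.norm(x))          # compare with the magnitude of the first entry
```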


Unitary and Orthogonal Matrices

Remark
Since R^2 = I (R^{−1} = R) we have, whenever ‖x‖_2 = 1,

Rx = ∓ µ e_1   =⇒   R^2 x = ∓ µ R e_1   ⇐⇒   x = ∓ µ R_{*1}.

Therefore the matrix U = ∓R (taking |µ| = 1) is orthogonal (since R is) and contains x as its first column.

Thus, this allows us to construct an ON basis for R^n that contains x (see the example in [Mey00]).

Unitary and Orthogonal Matrices Rotations

We give only a brief overview (more details can be found in [Mey00]).

We begin in R^2 and look for a matrix representation of the rotation of a vector u into another vector v, counterclockwise by an angle θ.

Here

u = (u_1, u_2)^T = (‖u‖ cos φ, ‖u‖ sin φ)^T,                    (10)
v = (v_1, v_2)^T = (‖v‖ cos(φ + θ), ‖v‖ sin(φ + θ))^T.          (11)


Unitary and Orthogonal Matrices

We use the trigonometric identities

cos(A + B) = cos A cos B − sin A sin B
sin(A + B) = sin A cos B + sin B cos A

with A = φ, B = θ and ‖v‖ = ‖u‖ to get

v = (‖v‖ cos(φ + θ), ‖v‖ sin(φ + θ))^T                                       (by (11))
  = (‖u‖ (cos φ cos θ − sin φ sin θ), ‖u‖ (sin φ cos θ + sin θ cos φ))^T
  = (u_1 cos θ − u_2 sin θ, u_2 cos θ + u_1 sin θ)^T                          (by (10))
  = [cos θ  −sin θ; sin θ  cos θ] u = P u,

where P is the rotation matrix.


Unitary and Orthogonal Matrices

Remark
Note that

P^T P = [cos θ  sin θ; −sin θ  cos θ] [cos θ  −sin θ; sin θ  cos θ]
      = [cos²θ + sin²θ,  −cos θ sin θ + cos θ sin θ;  −cos θ sin θ + cos θ sin θ,  sin²θ + cos²θ] = I,

so that P is an orthogonal matrix.

P^T is also a rotation matrix (by an angle −θ).
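A small NumPy sketch of this remark (θ is an arbitrary test angle):

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
P = np.array([[c, -s], [s, c]])                 # counterclockwise rotation by theta

P_minus = np.array([[np.cos(-theta), -np.sin(-theta)],
                    [np.sin(-theta),  np.cos(-theta)]])

print(np.allclose(P.T @ P, np.eye(2)))          # True: P is orthogonal
print(np.allclose(P.T, P_minus))                # True: P^T rotates by -theta
```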


Unitary and Orthogonal Matrices

Rotations about a coordinate axis in R^3 are very similar. Such rotations are referred to as plane rotations.

For example, rotation about the x-axis (in the yz-plane) is accomplished with

P_yz = [1  0  0;  0  cos θ  −sin θ;  0  sin θ  cos θ].

Rotation about the y- and z-axes is done analogously.


Unitary and Orthogonal Matrices

We can use the same ideas for plane rotations in higher dimensions.

Definition
An orthogonal matrix P_ij that agrees with the identity matrix except for the four entries

[P_ij]_ii = c,   [P_ij]_ij = s,   [P_ij]_ji = −s,   [P_ij]_jj = c,

with c² + s² = 1 is called a plane rotation (or Givens rotation).

Note that the orientation is reversed from the earlier discussion.


Unitary and Orthogonal Matrices

Usually we set

c = x_i / √(x_i² + x_j²),      s = x_j / √(x_i² + x_j²),

since then, for x = (x_1 ··· x_n)^T, the vector P_ij x agrees with x except that its i-th entry becomes c x_i + s x_j = √(x_i² + x_j²) and its j-th entry becomes −s x_i + c x_j = 0.

This shows that P_ij zeros the j-th component of x.


Unitary and Orthogonal Matrices

Note that (x_i² + x_j²)/√(x_i² + x_j²) = √(x_i² + x_j²), so that repeatedly applying Givens rotations P_ij with the same i, but different values of j, will zero out all but the i-th component of x, and that component will become √(x_1² + ··· + x_n²) = ‖x‖_2.

Therefore, the sequence

P = P_{in} ··· P_{i,i+1} P_{i,i−1} ··· P_{i1}

of Givens rotations rotates the vector x ∈ R^n onto e_i, i.e.,

Px = ‖x‖_2 e_i.

Moreover, the matrix P is orthogonal.
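A NumPy sketch of this process (the helper `givens` is my own; it builds P_ij with the c, s choice above and applies P_{12}, P_{13}, ..., P_{1n} to rotate x onto e_1):

```python
import numpy as np

def givens(n, i, j, xi, xj):
    """Plane rotation P_ij that zeros the j-th entry of a vector whose
    i-th and j-th entries are xi and xj."""
    r = np.hypot(xi, xj)
    c, s = xi / r, xj / r
    P = np.eye(n)
    P[i, i], P[i, j], P[j, i], P[j, j] = c, s, -s, c
    return P

x = np.array([2.0, -1.0, 3.0, 5.0])
P = np.eye(4)
y = x.copy()
for j in range(1, 4):                   # zero components 2, 3, 4 one at a time
    G = givens(4, 0, j, y[0], y[j])
    y = G @ y
    P = G @ P

print(np.allclose(P @ x, np.linalg.norm(x) * np.eye(4)[:, 0]))  # Px = ||x||_2 e_1
print(np.allclose(P.T @ P, np.eye(4)))                          # P is orthogonal
```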


Unitary and Orthogonal Matrices

Remark Givens rotations can be used as an alternative to Householder reflections to construct a QR factorization.

Householder reflections are in general more efficient, but for sparse matrices Givens rotations are more efficient because they can be applied more selectively.

Orthogonal Reduction Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections


Orthogonal Reduction

Recall the form of LU factorization (Gaussian elimination):

T_{n−1} ··· T_2 T_1 A = U,

where the T_k are lower triangular and U is upper triangular, i.e., we have a triangular reduction.

For the QR factorization we will use orthogonal Householder reflectors R_k to get

R_{n−1} ··· R_2 R_1 A = T,

where T is upper triangular, i.e., we have an orthogonal reduction.


Orthogonal Reduction

Recall Householder reflectors

R = I − 2 v v^T / (v^T v),      with v = x ± µ ‖x‖ e_1,

so that Rx = ∓ µ ‖x‖ e_1, and µ = 1 for x real.

Now we explain how to use these Householder reflectors to convert an m × n matrix A to an upper triangular matrix of the same size, i.e., how to do a full QR factorization.


Orthogonal Reduction

Apply a Householder reflector to the first column of A:

R_1 A_{*1} = (I − 2 v v^T / (v^T v)) A_{*1},      with v = A_{*1} ± ‖A_{*1}‖ e_1,
           = ∓ ‖A_{*1}‖ e_1 = (t_11, 0, ..., 0)^T.

Then R_1 applied to all of A yields

R_1 A = [t_11  t_12 ··· t_1n;  0  ∗ ··· ∗;  ⋮  ⋮  ⋮;  0  ∗ ··· ∗] = [t_11  t_1^T;  0  A_2].


Orthogonal Reduction

Next, we apply the same idea to A_2, i.e., we let

R_2 = [1  0^T;  0  R̂_2].

Then

R_2 R_1 A = [t_11  t_1^T;  0  R̂_2 A_2] = [t_11  t_12 ··· t_1n;  0  t_22  t_2^T;  0  0  A_3].


Orthogonal Reduction

We continue the process until we get an upper triangular matrix, i.e.,

R_n ··· R_2 R_1 A = [T̃; O]      whenever m > n,

with T̃ the n × n upper triangular block with diagonal t_11, ..., t_nn, or

R_m ··· R_2 R_1 A = [T̃  ∗]      whenever n > m,

with T̃ the m × m upper triangular block with diagonal t_11, ..., t_mm; in either case we call the product of reflectors P.

Since each R_k is orthogonal (unitary for complex A) we have

PA = T

with P m × m orthogonal and T m × n upper triangular, i.e.,

A = QR      (Q = P^T, R = T).
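A compact NumPy sketch of this orthogonal reduction (my own implementation, real matrices only; it accumulates P = R_k ··· R_1 and returns Q = P^T and R = T; with the sign(x_1) convention the diagonal of R need not be positive):

```python
import numpy as np

def householder_qr(A):
    """Full QR factorization A = Q R via Householder reflectors (real A, m >= n)."""
    R = A.astype(float).copy()
    m, n = R.shape
    P = np.eye(m)
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        sign = np.sign(x[0]) if x[0] != 0 else 1.0
        v[0] += sign * np.linalg.norm(x)               # v = x + sign(x_1) ||x|| e_1
        if v @ v == 0:                                 # column already zero below the diagonal
            continue
        Rk = np.eye(m)
        Rk[k:, k:] -= 2.0 * np.outer(v, v) / (v @ v)   # embedded reflector [I 0; 0 R^_k]
        R = Rk @ R
        P = Rk @ P
    return P.T, R                                      # Q = P^T, R = T

A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
print(np.allclose(np.tril(R, -1), 0.0))                         # R is upper triangular
```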


Orthogonal Reduction

Remark This is similar to obtaining the QR factorization via MGS, but now Q is orthogonal (square) and R is rectangular.

This gives us the full QR factorization, whereas MGS gave us the reduced QR factorization (with m × n Q and n × n R).


Orthogonal Reduction

Example
We use Householder reflections to find the QR factorization (where R has positive diagonal elements) of

A = [1 2 0; 0 1 1; 1 0 1].

Take

R_1 = I − 2 v_1 v_1^T / (v_1^T v_1),      with v_1 = A_{*1} ± ‖A_{*1}‖ e_1,

so that

R_1 A_{*1} = ∓ ‖A_{*1}‖ e_1 = ∓ (√2, 0, 0)^T.

Thus we take the ± sign as "−" so that t_11 = √2 > 0.


Orthogonal Reduction

Example (cont.)
To find R_1 A we can either compute R_1 using the formula above and then compute the matrix-matrix product, or, more cheaply, note that

R_1 x = (I − 2 v_1 v_1^T / (v_1^T v_1)) x = x − [2 v_1^T x / (v_1^T v_1)] v_1,

so that we can compute v_1^T A_{*j}, j = 2, 3, instead of the full R_1. With v_1 = A_{*1} − ‖A_{*1}‖ e_1 = (1 − √2, 0, 1)^T:

v_1^T A_{*2} = (1 − √2)·2 + 0·1 + 1·0 = 2 − 2√2,
v_1^T A_{*3} = (1 − √2)·0 + 0·1 + 1·1 = 1.

Also

2 v_1 / (v_1^T v_1) = (1/(2 − √2)) (1 − √2, 0, 1)^T.


[email protected] MATH 532 122 Orthogonal Reduction

Example ((cont.)) Therefore √ 2 √ 1 − 2 √ 2 − 2 2  2  R1A∗2 = 1 − √  0  = √ 1 2 0 2 − 2 1 | {z√ } =− 2 √  2  2 R A =  1  1 ∗3  √  2 − 2 so that √ √ √  2 2 2 ! 2 1 1 R A =  0 1 1  , A = √ √ 1  √ √  2 2 2 2 − 2 0 2 − 2

[email protected] MATH 532 122 √ 1 − 3 = √ 2 √ √ √ v 1 1 − 3 T ( ) = , T ( ) = − , 2 = √ √ v 2 A2 ∗1 3 3 v 2 A2 ∗2 3 2 T v 2 v 2 3 − 3 2 so √ !   0 ˆ 3 ˆ √ R2(A2)∗1 = , R2(A2)∗2 = 6 0 2 1 0T  Using R = we get 2 ˆ √ √ √ 0 R2  2  2 √2 2 R R A =  0 3 0  = T 2 1  √  | {z } 6 =P 0 0 2

Orthogonal Reduction

Example (cont.)

Next, R̂2 x = x − 2 v2 (v2^T x)/(v2^T v2) with

v2 = (A2)_{*1} − ‖(A2)_{*1}‖ e1 = (1 − √3, √2)^T,

v2^T (A2)_{*1} = 3 − √3,    v2^T (A2)_{*2} = −√3,    2 v2/(v2^T v2) = 1/(3 − √3) (1 − √3, √2)^T,

so

R̂2 (A2)_{*1} = (√3, 0)^T,    R̂2 (A2)_{*2} = (0, √6/2)^T.

Using

R2 = [ 1   0^T
       0   R̂2 ]

we get

R2 R1 A = [ √2   √2   √2/2
             0   √3    0
             0    0   √6/2 ] = T,    where P = R2 R1.

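As a quick numerical check of the two reflection steps above, here is a small NumPy sketch. The matrix A is an assumption reconstructed from the entries used in the inner products above (columns (1, 0, 1)^T, (2, 1, 0)^T and (0, 1, 1)^T); it is not stated explicitly on this slide.

import numpy as np

# Assumed matrix, reconstructed from the entries used in the example above.
A = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])

def householder(x):
    # Reflector R = I - 2 v v^T / (v^T v) with v = x - ||x|| e1, so that R x = ||x|| e1.
    v = x.astype(float).copy()
    v[0] -= np.linalg.norm(x)
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

R1 = householder(A[:, 0])
A1 = R1 @ A                     # first column reduced to (sqrt(2), 0, 0)^T
R2 = np.eye(3)
R2[1:, 1:] = householder(A1[1:, 1])
T = R2 @ A1                     # should match the triangular matrix T above
print(np.round(T, 4))           # approx [[1.4142 1.4142 0.7071], [0 1.7321 0], [0 0 1.2247]]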
[email protected] MATH 532 123 One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).

Orthogonal Reduction

Remark
As mentioned earlier, the factor R of the QR factorization is given by the matrix T.

The factor Q = P^T is not explicitly given in the example.

One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).

Orthogonal Reduction

Theorem
Let A be an n × n nonsingular real matrix. Then the factorization A = QR with n × n orthogonal matrix Q and n × n upper triangular matrix R with positive diagonal entries is unique.

Remark
In this n × n case the reduced and full QR factorizations coincide, i.e., the results obtained via Gram–Schmidt, Householder and Givens should be identical.

Orthogonal Reduction

Proof
Assume we have two QR factorizations

A = Q1 R1 = Q2 R2   ⇐⇒   Q2^T Q1 = R2 R1^{-1} = U.

Now R2 R1^{-1} is upper triangular with positive diagonal (since each factor is) and Q2^T Q1 is orthogonal. Therefore U has all of these properties. Since U is upper triangular,

U_{*1} = (u11, 0, ..., 0)^T.

Moreover, since U is orthogonal (and its diagonal is positive), u11 = 1.

Orthogonal Reduction

Proof (cont.)
Next,

U_{*1}^T U_{*2} = (1  0  ···  0) (u12, u22, 0, ..., 0)^T = u12 = 0

since the columns of U are orthogonal, and the fact that ‖U_{*2}‖ = 1 (together with u22 > 0) implies u22 = 1.

Comparing all the other pairs of columns of U shows that U = I, and therefore Q1 = Q2 and R1 = R2.  □

Orthogonal Reduction

Recommendations (so far) for the solution of Ax = b

1  If A is square and nonsingular, then use LU factorization with partial pivoting. This is stable for most practical problems and requires O(n³/3) operations.

2  To find a least squares solution, use the QR factorization:

   Ax = b   ⇐⇒   QRx = b   ⇐⇒   Rx = Q^T b.

   Usually the reduced QR factorization is all that's needed.

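A minimal NumPy/SciPy sketch of recommendation (2), with assumed random data, solving an overdetermined least squares problem via the reduced QR factorization:

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))          # tall matrix: overdetermined system
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A, mode='reduced')    # reduced QR: Q is 20 x 3, R is 3 x 3
x = solve_triangular(R, Q.T @ b)          # back substitution for R x = Q^T b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with lstsq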
[email protected] MATH 532 128 Orthogonal Reduction

Even though (for square nonsingular A) the Gram–Schmidt, Householder and Givens versions of the QR factorization are equivalent (due to the uniqueness theorem), we have — for general A — that
   classical GS is not stable,
   modified GS is stable for least squares, but unstable for QR (since it has problems maintaining orthogonality),
   Householder and Givens are stable, both for least squares and QR.

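The difference in stability between classical and modified Gram–Schmidt is easy to observe numerically. The sketch below (with an assumed ill-conditioned test matrix) compares the loss of orthogonality ‖Q^T Q − I‖ for the two variants:

import numpy as np

def gram_schmidt(A, modified=False):
    # QR factorization of A by classical or modified Gram-Schmidt.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # CGS projects the original column, MGS projects the updated vector v.
            R[i, j] = Q[:, i] @ (v if modified else A[:, j])
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

n = 10
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix (ill-conditioned)

for modified, name in [(False, 'classical GS'), (True, 'modified GS')]:
    Q, _ = gram_schmidt(A, modified)
    print(name, np.linalg.norm(Q.T @ Q - np.eye(n)))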
[email protected] MATH 532 129 Orthogonal Reduction Computational cost (for n × n matrices)

   LU with partial pivoting: O(n³/3)
   Gram–Schmidt: O(n³)
   Householder: O(2n³/3)
   Givens: O(4n³/3)

Householder reflections are often the preferred method since they provide both stability and decent efficiency.

[email protected] MATH 532 130 Complementary Subspaces Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Complementary Subspaces

Definition
Let V be a vector space and X, Y ⊆ V be subspaces. X and Y are called complementary provided

V = X + Y   and   X ∩ Y = {0}.

In this case, V is also called the direct sum of X and Y, and we write V = X ⊕ Y.

Example
Any two distinct lines through the origin in R^2 are complementary.
Any plane through the origin in R^3 is complementary to any line through the origin not contained in the plane.
Two planes through the origin in R^3 are not complementary since they must intersect in a line.

Complementary Subspaces

Theorem
Let V be a vector space, and X, Y ⊆ V be subspaces with bases B_X and B_Y. The following are equivalent:
1  V = X ⊕ Y.
2  For every v ∈ V there exist unique x ∈ X and y ∈ Y such that v = x + y.
3  B_X ∩ B_Y = {} and B_X ∪ B_Y is a basis for V.

Proof.
See [Mey00].

Definition
Suppose V = X ⊕ Y, i.e., any v ∈ V can be uniquely decomposed as v = x + y. Then
1  x is called the projection of v onto X along Y.
2  y is called the projection of v onto Y along X.

[email protected] MATH 532 133 Complementary Subspaces Properties of projectors Theorem Let X , Y be complementary subspaces of V. Let P, defined by Pv = x, be the projector onto X along Y. Then 1 P is unique. 2 P2 = P, i.e., P is idempotent. 3 I − P is the complementary projector (onto Y along X ). 4 R(P) = {x : Px = x} = X (“fixed points” for P). 5 N(I − P) = X = R(P) and R(I − P) = N(P) = Y. 6 If V = Rn (or Cn), then −1 P = XO XY   IO −1 = XY XY , OO where the columns of X and Y are bases for X and Y.

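A small NumPy sketch of item (6), with assumed bases for two complementary subspaces of R^3; it also confirms the idempotence from item (2):

import numpy as np

# Assumed example: X = span{(1,1,0)}, Y = span{(1,0,1), (0,1,1)} are complementary in R^3.
X = np.array([[1.], [1.], [0.]])
Y = np.array([[1., 0.], [0., 1.], [1., 1.]])

B = np.hstack([X, Y])                                      # B = [X Y] is nonsingular
P = np.hstack([X, np.zeros_like(Y)]) @ np.linalg.inv(B)    # P = [X O][X Y]^{-1}

v = np.array([1., 2., 3.])
x, y = P @ v, (np.eye(3) - P) @ v       # projections onto X along Y and onto Y along X
print(np.allclose(P @ P, P), np.allclose(x + y, v))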
[email protected] MATH 532 134 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y =

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2.

[email protected] MATH 532 135 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y =

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v =

[email protected] MATH 532 135 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y =

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) =

[email protected] MATH 532 135 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y =

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x.

[email protected] MATH 532 135 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

3 Using the unique decomposition of v we can write

v = x + y =

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P.

[email protected] MATH 532 135 Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y =

[email protected] MATH 532 135 ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y = Pv + y

[email protected] MATH 532 135 Complementary Subspaces

Proof

1 Assume P1v = x = P2v for all v ∈ V. But then P1 = P2. 2 We know Pv = x for every v ∈ V so that P2v = P(Pv) = Px = x. Together we therefore have P2 = P. 3 Using the unique decomposition of v we can write

v = x + y = Pv + y ⇐⇒ (I − P)v = y,

the projection of v onto Y along X .

[email protected] MATH 532 135 if x = Px then x obviously in R(P). On the other hand, if x ∈ R(P) then x = Pv for some v ∈ V and so

(2) Px = P2v = Pv = x.

Therefore

R(P) = {x : x = Pv, v ∈ V} = X = {x : Px = x}.

5 Since N(I − P) = {x :(I − P)x = 0}, and

(I − P)x = 0 ⇐⇒ Px = x

we have N(I − P) = X = R(P). The claim R(I − P) = Y = N(P) is shown similarly.

Complementary Subspaces

Proof (cont.)
4  Note that x ∈ R(P) if and only if x = Px. This is true since if x = Px then x is obviously in R(P). On the other hand, if x ∈ R(P) then x = Pv for some v ∈ V, and so, using (2),

   Px = P²v = Pv = x.

   Therefore

   R(P) = {x : x = Pv, v ∈ V} = X = {x : Px = x}.

5  Since N(I − P) = {x : (I − P)x = 0}, and

   (I − P)x = 0   ⇐⇒   Px = x,

   we have N(I − P) = X = R(P). The claim R(I − P) = Y = N(P) is shown similarly.

Complementary Subspaces

Proof (cont.)
6  Take B = [X Y], where the columns of X and Y form a basis for X and Y, respectively. Then the columns of B form a basis for V and B is nonsingular. From above we have Px = x, where x can be any column of X. Also, Py = 0, where y is any column of Y. So

   PB = P [X  Y] = [X  O]   or   P = [X  O] B^{-1} = [X  O] [X  Y]^{-1}.

   This establishes the first part of (6). The second part follows by noting that

   B [ I  O ]  =  [X  Y] [ I  O ]  =  [X  O].
     [ O  O ]            [ O  O ]

□
 [email protected] MATH 532 137 Proof. One direction is given above. For the other see [Mey00].

Remark This theorem is sometimes used to define projectors.

Complementary Subspaces

We just saw that any projector is idempotent, i.e., P2 = P. In fact, Theorem A matrix P is a projector if and only if P2 = P.

[email protected] MATH 532 138 Remark This theorem is sometimes used to define projectors.

Complementary Subspaces

We just saw that any projector is idempotent, i.e., P2 = P. In fact, Theorem A matrix P is a projector if and only if P2 = P.

Proof. One direction is given above. For the other see [Mey00].

[email protected] MATH 532 138 Complementary Subspaces

We just saw that any projector is idempotent, i.e., P2 = P. In fact, Theorem A matrix P is a projector if and only if P2 = P.

Proof. One direction is given above. For the other see [Mey00].

Remark This theorem is sometimes used to define projectors.

[email protected] MATH 532 138 If R, N are complementary then

1 1 1 sin θ = = = , kPk2 λmax σ1 where P is the projector onto R along N , λmax is the largest T eigenvalue of P P and σ1 is the largest singular value of P.

See [Mey00, Example 5.9.2] for more details.

Complementary Subspaces Angle between subspaces

In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces.

[email protected] MATH 532 139 Complementary Subspaces Angle between subspaces

In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces.

If R, N are complementary then

1 1 1 sin θ = = = , kPk2 λmax σ1 where P is the projector onto R along N , λmax is the largest T eigenvalue of P P and σ1 is the largest singular value of P.

See [Mey00, Example 5.9.2] for more details.

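For a concrete (assumed) example in R^2 where the angle is known, the formula can be checked numerically:

import numpy as np

t = 0.3                                   # assumed angle between the two lines
R_basis = np.array([[1.], [0.]])          # R = x-axis
N_basis = np.array([[np.cos(t)], [np.sin(t)]])

B = np.hstack([R_basis, N_basis])
P = np.hstack([R_basis, np.zeros_like(N_basis)]) @ np.linalg.inv(B)  # projector onto R along N

sigma1 = np.linalg.norm(P, 2)             # largest singular value of P
print(np.arcsin(1.0 / sigma1), t)         # both print 0.3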
[email protected] MATH 532 139 While the range–nullspace decomposition is theoretically important, its practical usefulness is limited because computation is very unstable due to lack of orthogonality.

This also means we will not discuss nilpotent matrices and — later on — the Jordan normal form.

Complementary Subspaces

Remark We will skip [Mey00, Section 5.10] on the range–nullspace decomposition.

[email protected] MATH 532 140 This also means we will not discuss nilpotent matrices and — later on — the Jordan normal form.

Complementary Subspaces

Remark We will skip [Mey00, Section 5.10] on the range–nullspace decomposition.

While the range–nullspace decomposition is theoretically important, its practical usefulness is limited because computation is very unstable due to lack of orthogonality.

[email protected] MATH 532 140 Complementary Subspaces

Remark We will skip [Mey00, Section 5.10] on the range–nullspace decomposition.

While the range–nullspace decomposition is theoretically important, its practical usefulness is limited because computation is very unstable due to lack of orthogonality.

This also means we will not discuss nilpotent matrices and — later on — the Jordan normal form.

[email protected] MATH 532 140 Orthogonal Decomposition Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

[email protected] MATH 532 141 Remark Even if M is not a subspace of V (i.e., only a subset), M⊥ is (see HW).

Orthogonal Decomposition

Definition Let V be an inner product space and M ⊆ V. The orthogonal complement M⊥ of M is

M⊥ = {x ∈ V : hm, xi = 0 for all m ∈ M}.

[email protected] MATH 532 142 Orthogonal Decomposition

Definition Let V be an inner product space and M ⊆ V. The orthogonal complement M⊥ of M is

M⊥ = {x ∈ V : hm, xi = 0 for all m ∈ M}.

Remark Even if M is not a subspace of V (i.e., only a subset), M⊥ is (see HW).

[email protected] MATH 532 142 Proof According to the definition of complementary subspaces we need to show 1 M ∩ M⊥ = {0}, 2 M + M⊥ = V.

Orthogonal Decomposition

Theorem Let V be an inner product space and M ⊆ V. If M is a subspace of V, then V = M ⊕ M⊥.

[email protected] MATH 532 143 Orthogonal Decomposition

Theorem Let V be an inner product space and M ⊆ V. If M is a subspace of V, then V = M ⊕ M⊥.

Proof According to the definition of complementary subspaces we need to show 1 M ∩ M⊥ = {0}, 2 M + M⊥ = V.

[email protected] MATH 532 143 The definition of M⊥ implies

hx, xi = 0.

But then the definition of an inner product implies x = 0.

This is true for any x ∈ M ∩ M⊥, so x = 0 is the only such vector.

Orthogonal Decomposition

Proof (cont.)

1 Let’s assume there exists an x ∈ M ∩ M⊥, i.e., x ∈ M and x ∈ M⊥.

[email protected] MATH 532 144 But then the definition of an inner product implies x = 0.

This is true for any x ∈ M ∩ M⊥, so x = 0 is the only such vector.

Orthogonal Decomposition

Proof (cont.)

1 Let’s assume there exists an x ∈ M ∩ M⊥, i.e., x ∈ M and x ∈ M⊥.

The definition of M⊥ implies

hx, xi = 0.

[email protected] MATH 532 144 This is true for any x ∈ M ∩ M⊥, so x = 0 is the only such vector.

Orthogonal Decomposition

Proof (cont.)

1 Let’s assume there exists an x ∈ M ∩ M⊥, i.e., x ∈ M and x ∈ M⊥.

The definition of M⊥ implies

hx, xi = 0.

But then the definition of an inner product implies x = 0.

[email protected] MATH 532 144 Orthogonal Decomposition

Proof (cont.)

1 Let’s assume there exists an x ∈ M ∩ M⊥, i.e., x ∈ M and x ∈ M⊥.

The definition of M⊥ implies

hx, xi = 0.

But then the definition of an inner product implies x = 0.

This is true for any x ∈ M ∩ M⊥, so x = 0 is the only such vector.

[email protected] MATH 532 144 ⊥ Since M ∩ M = {0} we know that BM ∪ BM⊥ is an ON basis for some S ⊆ V.

In fact, S = V since otherwise we could extend BM ∪ BM⊥ to an ON basis of V (using the extension theorem and GS).

However, any vector in the extension must be orthogonal to M, i.e., in M⊥, but this is not possible since the extended basis must be linearly independent.

Therefore, the extension set is empty.



Orthogonal Decomposition

Proof (cont.)

2 ⊥ We let BM and BM⊥ be ON bases for M and M , respectively.

[email protected] MATH 532 145 In fact, S = V since otherwise we could extend BM ∪ BM⊥ to an ON basis of V (using the extension theorem and GS).

However, any vector in the extension must be orthogonal to M, i.e., in M⊥, but this is not possible since the extended basis must be linearly independent.

Therefore, the extension set is empty.



Orthogonal Decomposition

Proof (cont.)

2 ⊥ We let BM and BM⊥ be ON bases for M and M , respectively.

⊥ Since M ∩ M = {0} we know that BM ∪ BM⊥ is an ON basis for some S ⊆ V.

[email protected] MATH 532 145 However, any vector in the extension must be orthogonal to M, i.e., in M⊥, but this is not possible since the extended basis must be linearly independent.

Therefore, the extension set is empty.



Orthogonal Decomposition

Proof (cont.)

2 ⊥ We let BM and BM⊥ be ON bases for M and M , respectively.

⊥ Since M ∩ M = {0} we know that BM ∪ BM⊥ is an ON basis for some S ⊆ V.

In fact, S = V since otherwise we could extend BM ∪ BM⊥ to an ON basis of V (using the extension theorem and GS).

[email protected] MATH 532 145 Orthogonal Decomposition

Proof (cont.)

2 ⊥ We let BM and BM⊥ be ON bases for M and M , respectively.

⊥ Since M ∩ M = {0} we know that BM ∪ BM⊥ is an ON basis for some S ⊆ V.

In fact, S = V since otherwise we could extend BM ∪ BM⊥ to an ON basis of V (using the extension theorem and GS).

However, any vector in the extension must be orthogonal to M, i.e., in M⊥, but this is not possible since the extended basis must be linearly independent.

Therefore, the extension set is empty.



[email protected] MATH 532 145 Proof For (1) recall our formula from Chapter 4

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Here M ∩ M⊥ = {0}, so that dim(M ∩ M⊥) = 0. Also, since M is a subspace of V we have V = M + M⊥ and the dimension formula implies (1).

Orthogonal Decomposition

Theorem Let V be an inner product space with dim(V) = n and M be a subspace of V. Then 1 dim M⊥ = n − dim M, ⊥ 2 M⊥ = M.

[email protected] MATH 532 146 Here M ∩ M⊥ = {0}, so that dim(M ∩ M⊥) = 0. Also, since M is a subspace of V we have V = M + M⊥ and the dimension formula implies (1).

Orthogonal Decomposition

Theorem Let V be an inner product space with dim(V) = n and M be a subspace of V. Then 1 dim M⊥ = n − dim M, ⊥ 2 M⊥ = M.

Proof For (1) recall our dimension formula from Chapter 4

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

[email protected] MATH 532 146 Also, since M is a subspace of V we have V = M + M⊥ and the dimension formula implies (1).

Orthogonal Decomposition

Theorem Let V be an inner product space with dim(V) = n and M be a subspace of V. Then 1 dim M⊥ = n − dim M, ⊥ 2 M⊥ = M.

Proof For (1) recall our dimension formula from Chapter 4

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Here M ∩ M⊥ = {0}, so that dim(M ∩ M⊥) = 0.

[email protected] MATH 532 146 Orthogonal Decomposition

Theorem Let V be an inner product space with dim(V) = n and M be a subspace of V. Then 1 dim M⊥ = n − dim M, ⊥ 2 M⊥ = M.

Proof For (1) recall our dimension formula from Chapter 4

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Here M ∩ M⊥ = {0}, so that dim(M ∩ M⊥) = 0. Also, since M is a subspace of V we have V = M + M⊥ and the dimension formula implies (1).

[email protected] MATH 532 146 Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni = hm + n, ni = hm, ni +hn, ni. | {z } =0 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M.

[email protected] MATH 532 147 ⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni = hm + n, ni = hm, ni +hn, ni. | {z } =0 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M. Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

[email protected] MATH 532 147 hm + n, ni = hm, ni +hn, ni. | {z } =0 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M. Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni =

[email protected] MATH 532 147 hm, ni +hn, ni. | {z } =0 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M. Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni = hm + n, ni =

[email protected] MATH 532 147 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M. Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni = hm + n, ni = hm, ni +hn, ni. | {z } =0

[email protected] MATH 532 147 Orthogonal Decomposition

Proof (cont.) 2 Instead of directly establishing equality we first show that ⊥ M⊥ ⊆ M. Since M ⊕ M⊥ = V any x ∈ V can be uniquely decomposed into

x = m + n with m ∈ M, n ∈ M⊥.

⊥ Now we take x ∈ M⊥ so that hx, ni = 0 for all n ∈ M⊥, and therefore

0 = hx, ni = hm + n, ni = hm, ni +hn, ni. | {z } =0 But hn, ni = 0 ⇐⇒ n = 0, and therefore x = m is in M.

[email protected] MATH 532 147 = n − (n − dim M) = dim M.

⊥⊥ But then M = M. 

⊥ We take X = M⊥ and Y = M (and know from the work just ⊥ performed that M⊥ is a subspace of ⊆ M). From (1) we know

dim M⊥ = n − dim M ⊥ dim M⊥ = n − dim M⊥

Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

[email protected] MATH 532 148 = n − (n − dim M) = dim M.

⊥⊥ But then M = M. 

From (1) we know

dim M⊥ = n − dim M ⊥ dim M⊥ = n − dim M⊥

Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

⊥ We take X = M⊥ and Y = M (and know from the work just ⊥ performed that M⊥ is a subspace of ⊆ M).

[email protected] MATH 532 148 = n − (n − dim M) = dim M.

⊥⊥ But then M = M. 

Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

⊥ We take X = M⊥ and Y = M (and know from the work just ⊥ performed that M⊥ is a subspace of ⊆ M). From (1) we know

dim M⊥ = n − dim M ⊥ dim M⊥ = n − dim M⊥

[email protected] MATH 532 148 ⊥⊥ But then M = M. 

Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

⊥ We take X = M⊥ and Y = M (and know from the work just ⊥ performed that M⊥ is a subspace of ⊆ M). From (1) we know

dim M⊥ = n − dim M ⊥ dim M⊥ = n − dim M⊥ = n − (n − dim M) = dim M.

[email protected] MATH 532 148 Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces X ⊆ Y

dim X = dim Y =⇒ X = Y.

⊥ We take X = M⊥ and Y = M (and know from the work just ⊥ performed that M⊥ is a subspace of ⊆ M). From (1) we know

dim M⊥ = n − dim M ⊥ dim M⊥ = n − dim M⊥ = n − (n − dim M) = dim M.

⊥⊥ But then M = M. 

[email protected] MATH 532 148 Corollary

m ⊥ T R = R(A) ⊕R(A) = R(A) ⊕ N(A ), | {z } ⊆Rm n ⊥ T R = N(A) ⊕N(A) = N(A) ⊕ R(A ). | {z } ⊆Rn

Orthogonal Decomposition Back to Fundamental Subspaces

Theorem Let A be a real m × n matrix. Then 1 R(A)⊥ = N(AT ), 2 N(A)⊥ = R(AT ).

[email protected] MATH 532 149 Orthogonal Decomposition Back to Fundamental Subspaces

Theorem Let A be a real m × n matrix. Then 1 R(A)⊥ = N(AT ), 2 N(A)⊥ = R(AT ).

Corollary

m ⊥ T R = R(A) ⊕R(A) = R(A) ⊕ N(A ), | {z } ⊆Rm n ⊥ T R = N(A) ⊕N(A) = N(A) ⊕ R(A ). | {z } ⊆Rn

[email protected] MATH 532 149 ⊥ ⇐⇒ R(A) = N(AT )⊥ → T A⇐⇒A R(AT ) = N(A)⊥.

n hAy, xi = 0 for any y ∈ R T T n ⇐⇒ y A x = 0 for any y ∈ R T n ⇐⇒ hy, A xi = 0 for any y ∈ R ⇐⇒ AT x = 0 ⇐⇒ x ∈ N(AT )

by the definitions of these subspaces and of an inner product. 2 Using (1), we have

(1) R(A)⊥ = N(AT )



Orthogonal Decomposition

Proof (of Theorem)
1  We show that x ∈ R(A)⊥ implies x ∈ N(A^T) and vice versa:

   x ∈ R(A)⊥   ⇐⇒   ⟨Ay, x⟩ = 0 for any y ∈ R^n
               ⇐⇒   y^T A^T x = 0 for any y ∈ R^n
               ⇐⇒   ⟨y, A^T x⟩ = 0 for any y ∈ R^n
               ⇐⇒   A^T x = 0
               ⇐⇒   x ∈ N(A^T)

   by the definitions of these subspaces and of an inner product.

2  Using (1), we have

   R(A)⊥ = N(A^T)   ⇐⇒   R(A) = N(A^T)⊥      (taking orthogonal complements)
                     ⇐⇒   R(A^T) = N(A)⊥      (replacing A by A^T).

□

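The theorem and its corollary are easy to verify numerically. A minimal sketch (assumed random low-rank matrix), using the SVD purely as a numerical tool to produce orthonormal bases of the four fundamental subspaces:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # 5 x 7, rank 3
r = np.linalg.matrix_rank(A)

U, s, Vt = np.linalg.svd(A)
RA, NAt = U[:, :r], U[:, r:]        # ON bases of R(A) and N(A^T) in R^5
RAt, NA = Vt[:r, :].T, Vt[r:, :].T  # ON bases of R(A^T) and N(A) in R^7

print(np.allclose(A.T @ NAt, 0))    # columns of NAt lie in N(A^T) = R(A)^perp
print(np.allclose(A @ NA, 0))       # columns of NA lie in N(A) = R(A^T)^perp
print(np.allclose(RA.T @ NAt, 0), np.allclose(RAt.T @ NA, 0))   # the two complements are orthogonal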
 [email protected] MATH 532 150 Assume rank(A) = r and let m BR(A) = {u1,..., ur } ON basis for R(A) ⊆ R , T m BN(AT ) = {ur+1,..., um} ON basis for N(A ) ⊆ R , T n BR(AT ) = {v 1,..., v r } ON basis for R(A ) ⊆ R , n BN(A) = {v r+1,..., v n} ON basis for N(A) ⊆ R . By the corollary m BR(A) ∪ BN(AT ) ON basis for R , n BR(AT ) ∪ BN(A) ON basis for R , and therefore the following are orthogonal matrices  U = u1 u2 ··· um  V = v 1 v 2 ··· v n .

Orthogonal Decomposition Starting to think about the SVD The decompositions of Rm and Rn from the corollary help prepare for the SVD of an m × n matrix A.

[email protected] MATH 532 151 By the corollary m BR(A) ∪ BN(AT ) ON basis for R , n BR(AT ) ∪ BN(A) ON basis for R , and therefore the following are orthogonal matrices  U = u1 u2 ··· um  V = v 1 v 2 ··· v n .

Orthogonal Decomposition Starting to think about the SVD The decompositions of Rm and Rn from the corollary help prepare for the SVD of an m × n matrix A. Assume rank(A) = r and let m BR(A) = {u1,..., ur } ON basis for R(A) ⊆ R , T m BN(AT ) = {ur+1,..., um} ON basis for N(A ) ⊆ R , T n BR(AT ) = {v 1,..., v r } ON basis for R(A ) ⊆ R , n BN(A) = {v r+1,..., v n} ON basis for N(A) ⊆ R .

[email protected] MATH 532 151 and therefore the following are orthogonal matrices  U = u1 u2 ··· um  V = v 1 v 2 ··· v n .

Orthogonal Decomposition Starting to think about the SVD The decompositions of Rm and Rn from the corollary help prepare for the SVD of an m × n matrix A. Assume rank(A) = r and let m BR(A) = {u1,..., ur } ON basis for R(A) ⊆ R , T m BN(AT ) = {ur+1,..., um} ON basis for N(A ) ⊆ R , T n BR(AT ) = {v 1,..., v r } ON basis for R(A ) ⊆ R , n BN(A) = {v r+1,..., v n} ON basis for N(A) ⊆ R . By the corollary m BR(A) ∪ BN(AT ) ON basis for R , n BR(AT ) ∪ BN(A) ON basis for R ,

[email protected] MATH 532 151 Orthogonal Decomposition Starting to think about the SVD The decompositions of Rm and Rn from the corollary help prepare for the SVD of an m × n matrix A. Assume rank(A) = r and let m BR(A) = {u1,..., ur } ON basis for R(A) ⊆ R , T m BN(AT ) = {ur+1,..., um} ON basis for N(A ) ⊆ R , T n BR(AT ) = {v 1,..., v r } ON basis for R(A ) ⊆ R , n BN(A) = {v r+1,..., v n} ON basis for N(A) ⊆ R . By the corollary m BR(A) ∪ BN(AT ) ON basis for R , n BR(AT ) ∪ BN(A) ON basis for R , and therefore the following are orthogonal matrices  U = u1 u2 ··· um  V = v 1 v 2 ··· v n . [email protected] MATH 532 151 r + 1,..., n, T T ui A = 0 ⇐⇒ A ui = 0, i = r + 1,..., m,

so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j =

[email protected] MATH 532 152 T T ui A = 0 ⇐⇒ A ui = 0, i = r + 1,..., m,

so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j = r + 1,..., n,

[email protected] MATH 532 152 T ⇐⇒ A ui = 0, i = r + 1,..., m,

so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j = r + 1,..., n, T ui A = 0

[email protected] MATH 532 152 r + 1,..., m,

so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j = r + 1,..., n, T T ui A = 0 ⇐⇒ A ui = 0, i =

[email protected] MATH 532 152 so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j = r + 1,..., n, T T ui A = 0 ⇐⇒ A ui = 0, i = r + 1,..., m,

[email protected] MATH 532 152 Orthogonal Decomposition

Consider m,n T  T  R = U AV = ui Av j . i,j=1 Note that

Av j = 0, j = r + 1,..., n, T T ui A = 0 ⇐⇒ A ui = 0, i = r + 1,..., m, so  T T  u1 Av 1 ··· u1 Av r  . .  C O R =  . .O = r×r .  T T  OO ur Av 1 ··· ur Av r  OO

[email protected] MATH 532 152 Remark

The matrix Cr×r is nonsingular since

rank(C) = rank(UT AV) = rank(A) = r

because multiplication by the orthogonal (and therefore nonsingular) matrices UT and V does not change the rank of A.

Orthogonal Decomposition

Thus C O R = UT AV = r×r OO C O ⇐⇒ A = URVT = U r×r VT , OO the URV factorization of A.

[email protected] MATH 532 153 Orthogonal Decomposition

Thus C O R = UT AV = r×r OO C O ⇐⇒ A = URVT = U r×r VT , OO the URV factorization of A.

Remark

The matrix Cr×r is nonsingular since

rank(C) = rank(UT AV) = rank(A) = r because multiplication by the orthogonal (and therefore nonsingular) matrices UT and V does not change the rank of A.

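A minimal NumPy sketch (assumed random low-rank matrix) that assembles a URV factorization from ON bases of the four fundamental subspaces; the extra rotations G and H illustrate that different choices of bases give different, equally valid factorizations:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6 x 5, rank 3
r = np.linalg.matrix_rank(A)

W, s, Zt = np.linalg.svd(A)
G, _ = np.linalg.qr(rng.standard_normal((r, r)))   # rotate the bases inside the subspaces
H, _ = np.linalg.qr(rng.standard_normal((r, r)))

U = np.hstack([W[:, :r] @ G, W[:, r:]])        # ON basis of R(A), then of N(A^T): orthogonal 6 x 6
V = np.hstack([Zt[:r, :].T @ H, Zt[r:, :].T])  # ON basis of R(A^T), then of N(A): orthogonal 5 x 5

R = U.T @ A @ V
C = R[:r, :r]                                  # nonsingular r x r block; the rest of R is (numerically) zero
print(np.allclose(R[r:, :], 0), np.allclose(R[:r, r:], 0))
print(np.allclose(U @ R @ V.T, A), np.linalg.matrix_rank(C) == r)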
[email protected] MATH 532 153 As we show next, the converse is also true, i.e., any URV factorization of A yields a ON bases for the fundamental subspaces of A.

However, the URV factorization is not unique. Different ON bases result in different factorizations.

Orthogonal Decomposition

We have now shown that the ON bases for the fundamental subspaces of A yield the URV factorization.

[email protected] MATH 532 154 However, the URV factorization is not unique. Different ON bases result in different factorizations.

Orthogonal Decomposition

We have now shown that the ON bases for the fundamental subspaces of A yield the URV factorization.

As we show next, the converse is also true, i.e., any URV factorization of A yields a ON bases for the fundamental subspaces of A.

[email protected] MATH 532 154 Orthogonal Decomposition

We have now shown that the ON bases for the fundamental subspaces of A yield the URV factorization.

As we show next, the converse is also true, i.e., any URV factorization of A yields a ON bases for the fundamental subspaces of A.

However, the URV factorization is not unique. Different ON bases result in different factorizations.

[email protected] MATH 532 154 = R(UR)  rank(C)=r = R U1CO = R(U1C) = R(U1) |{z} m×r

so that the columns of U1 are an ON basis for R(A).

We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT )

(12)

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO

[email protected] MATH 532 155 = R(UR)  rank(C)=r = R U1CO = R(U1C) = R(U1) |{z} m×r

so that the columns of U1 are an ON basis for R(A).

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT )

(12)

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

[email protected] MATH 532 155 = R(UR)  rank(C)=r = R U1CO = R(U1C) = R(U1) |{z} m×r

so that the columns of U1 are an ON basis for R(A).

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT )

(12)

[email protected] MATH 532 155  rank(C)=r = R U1CO = R(U1C) = R(U1) |{z} m×r

so that the columns of U1 are an ON basis for R(A).

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT ) = R(UR)

(12)

[email protected] MATH 532 155 rank(C)=r = R(U1C) = R(U1) |{z} m×r

so that the columns of U1 are an ON basis for R(A).

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT ) = R(UR)  = R U1CO (12)

[email protected] MATH 532 155 rank(C)=r = R(U1)

so that the columns of U1 are an ON basis for R(A).

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT ) = R(UR)  = R U1CO = R(U1C) (12) |{z} m×r

[email protected] MATH 532 155 so that the columns of U1 are an ON basis for R(A).

Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT ) = R(UR)  rank(C)=r = R U1CO = R(U1C) = R(U1) (12) |{z} m×r

[email protected] MATH 532 155 Orthogonal Decomposition

Consider A = URVT with U, V orthogonal m × m and n × n matrices, CO respectively, and R = with C nonsingular. OO We partition     U1 U2 V1 V2 U = |{z} |{z} , V = |{z} |{z} m×r m×m−r n×r n×n−r

Then V (and therefore also VT ) is nonsingular and we see that

R(A) = R(URVT ) = R(UR)  rank(C)=r = R U1CO = R(U1C) = R(U1) (12) |{z} m×r so that the columns of U1 are an ON basis for R(A).

[email protected] MATH 532 155 prev. thm ⊥ (12) ⊥ = R(A) = R(U1) = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

T This implies that the columns of U2 are an ON basis for N(A ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

Orthogonal Decomposition

Moreover,

N(AT )

[email protected] MATH 532 156 (12) ⊥ = R(U1) = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

T This implies that the columns of U2 are an ON basis for N(A ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

Orthogonal Decomposition

Moreover,

prev. thm N(AT ) = R(A)⊥

[email protected] MATH 532 156 = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

T This implies that the columns of U2 are an ON basis for N(A ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

Orthogonal Decomposition

Moreover,

T prev. thm ⊥ (12) ⊥ N(A ) = R(A) = R(U1)

[email protected] MATH 532 156 T This implies that the columns of U2 are an ON basis for N(A ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

Orthogonal Decomposition

Moreover,

T prev. thm ⊥ (12) ⊥ N(A ) = R(A) = R(U1) = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

[email protected] MATH 532 156 The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

Orthogonal Decomposition

Moreover,

T prev. thm ⊥ (12) ⊥ N(A ) = R(A) = R(U1) = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

T This implies that the columns of U2 are an ON basis for N(A ).

[email protected] MATH 532 156 Orthogonal Decomposition

Moreover,

T prev. thm ⊥ (12) ⊥ N(A ) = R(A) = R(U1) = R(U2)

m since U is orthogonal and R = R(U1) ⊕ R(U2).

T This implies that the columns of U2 are an ON basis for N(A ).

The other two cases can be argued similarly using N(AB) = N(B) provided rank(A) = n.

[email protected] MATH 532 156 As a first step in this direction, we can easily obtain a URV factorization of A with a lower triangular matrix C.

Idea: use Householder reflections (or Givens rotations)

Orthogonal Decomposition

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix Σ with r nonzero singular values, while R contains the full r × r block C.

[email protected] MATH 532 157 Idea: use Householder reflections (or Givens rotations)

Orthogonal Decomposition

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix Σ with r nonzero singular values, while R contains the full r × r block C.

As a first step in this direction, we can easily obtain a URV factorization of A with a lower triangular matrix C.

[email protected] MATH 532 157 Orthogonal Decomposition

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix Σ with r nonzero singular values, while R contains the full r × r block C.

As a first step in this direction, we can easily obtain a URV factorization of A with a lower triangular matrix C.

Idea: use Householder reflections (or Givens rotations)

[email protected] MATH 532 157 Next, use n × n orthogonal Q as follows:

T BT −→ QBT = , with r × r upper triangularT , rank(T) = r. O

Then BQT = TT O ⇐⇒ B = TT O Q and B TT O = Q. O OO

Orthogonal Decomposition

Consider an m × n matrix A. We apply an m × m orthogonal (Householder reflection) matrix P so that B A −→ PA = , with r × m matrix B, rank(B) = r. O

[email protected] MATH 532 158 Then BQT = TT O ⇐⇒ B = TT O Q and B TT O = Q. O OO

Orthogonal Decomposition

Consider an m × n matrix A. We apply an m × m orthogonal (Householder reflection) matrix P so that B A −→ PA = , with r × m matrix B, rank(B) = r. O

Next, use n × n orthogonal Q as follows:

T BT −→ QBT = , with r × r upper triangularT , rank(T) = r. O

[email protected] MATH 532 158 and B TT O = Q. O OO

Orthogonal Decomposition

Consider an m × n matrix A. We apply an m × m orthogonal (Householder reflection) matrix P so that B A −→ PA = , with r × m matrix B, rank(B) = r. O

Next, use n × n orthogonal Q as follows:

T BT −→ QBT = , with r × r upper triangularT , rank(T) = r. O

Then BQT = TT O ⇐⇒ B = TT O Q

[email protected] MATH 532 158 Orthogonal Decomposition

Consider an m × n matrix A. We apply an m × m orthogonal (Householder reflection) matrix P so that B A −→ PA = , with r × m matrix B, rank(B) = r. O

Next, use n × n orthogonal Q as follows:

T BT −→ QBT = , with r × r upper triangularT , rank(T) = r. O

Then BQT = TT O ⇐⇒ B = TT O Q and B TT O = Q. O OO

[email protected] MATH 532 158 Remark See HW for an example of this process with numbers.

Orthogonal Decomposition

Together,

TT O PA = Q OO TT O ⇐⇒ A = PT Q, OO a URV factorization with lower triangular block TT .

[email protected] MATH 532 159 Orthogonal Decomposition

Together,

TT O PA = Q OO TT O ⇐⇒ A = PT Q, OO a URV factorization with lower triangular block TT .

Remark See HW for an example of this process with numbers.

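A sketch of the two-step construction above, using library QR factorizations as stand-ins for the Householder reductions (assumed random low-rank matrix):

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6 x 5, rank 3
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Step 1: orthogonal P with P A = [B; O], B of size r x n and rank r
# (column-pivoted QR is used here as a rank-revealing stand-in for the Householder step).
Q1, _, _ = qr(A, pivoting=True)
P = Q1.T
B = (P @ A)[:r, :]

# Step 2: orthogonal Q with Q B^T = [T; O], T upper triangular r x r.
Q2, _ = qr(B.T)
Q = Q2.T
T = (Q @ B.T)[:r, :]

# Assemble A = P^T [[T^T, O], [O, O]] Q: a URV factorization with lower triangular block T^T.
R = np.zeros((m, n))
R[:r, :r] = T.T
print(np.allclose(P.T @ R @ Q, A))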
[email protected] MATH 532 159 Singular Value Decomposition Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

[email protected] MATH 532 160 Now we want to establish that C can even be made diagonal.

Singular Value Decomposition Singular Value Decomposition

We know CO A = URVT = U VT , OO where C is upper triangular and U, V are orthogonal.

[email protected] MATH 532 161 Singular Value Decomposition Singular Value Decomposition

We know CO A = URVT = U VT , OO where C is upper triangular and U, V are orthogonal.

Now we want to establish that C can even be made diagonal.


Singular Value Decomposition

Note that $\|A\|_2 = \|C\|_2 =: \sigma_1$ since multiplication by an orthogonal matrix does not change the 2-norm (see HW). Also,
$$ \|C\|_2 = \max_{\|z\|_2 = 1} \|Cz\|_2, $$
so that $\|C\|_2 = \|Cx\|_2$ for some $x$ with $\|x\|_2 = 1$. In fact (see Sect. 5.2), $x$ is such that $(C^T C - \lambda I)x = 0$, i.e., $x$ is an eigenvector of $C^T C$, so that
$$ \|C\|_2 = \sigma_1 = \sqrt{\lambda} = \sqrt{x^T C^T C x}. \qquad (13) $$
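As a quick numerical sanity check of (13), the following NumPy sketch compares the 2-norm of an arbitrary (randomly chosen) upper triangular C with the square root of the largest eigenvalue of C^T C; the matrix is only a placeholder example.

```python
import numpy as np

# Sanity check for (13): ||C||_2 equals sqrt of the largest eigenvalue of C^T C,
# attained at the corresponding unit eigenvector x.
rng = np.random.default_rng(1)
C = np.triu(rng.standard_normal((5, 5)))    # arbitrary upper triangular example

lam, X = np.linalg.eigh(C.T @ C)            # eigenvalues ascending, orthonormal eigenvectors
x = X[:, -1]                                # unit eigenvector for the largest eigenvalue
sigma1 = np.sqrt(lam[-1])

print(np.isclose(np.linalg.norm(C, 2), sigma1))   # True
print(np.isclose(np.linalg.norm(C @ x), sigma1))  # ||C x||_2 = sigma_1
```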


Singular Value Decomposition

Since x is a unit vector we can extend it to an orthogonal matrix $R_x = \begin{pmatrix} x & X \end{pmatrix}$, e.g., using Householder reflectors as discussed at the end of Sect. 5.6.

Similarly, let
$$ y = \frac{Cx}{\|Cx\|_2} = \frac{Cx}{\sigma_1}. \qquad (14) $$

Then $R_y = \begin{pmatrix} y & Y \end{pmatrix}$ is also orthogonal (and Hermitian/symmetric) since it's a Householder reflector.


Singular Value Decomposition

Now
$$ R_y^T C R_x = \underbrace{\begin{pmatrix} y^T \\ Y^T \end{pmatrix}}_{=R_y^T} C \begin{pmatrix} x & X \end{pmatrix} = \begin{pmatrix} y^T C x & y^T C X \\ Y^T C x & Y^T C X \end{pmatrix}. $$

From above,
$$ \sigma_1^2 \overset{(13)}{=} \lambda = x^T C^T C x \overset{(14)}{=} \sigma_1 y^T C x \implies y^T C x = \sigma_1. $$

Also,
$$ Y^T C x \overset{(14)}{=} Y^T (\sigma_1 y) = 0 $$
since $R_y$ is orthogonal, i.e., $y$ is orthogonal to the columns of $Y$.


Singular Value Decomposition

Let $Y^T C X = C_2$ and $y^T C X = c^T$ so that
$$ R_y^T C R_x = \begin{pmatrix} \sigma_1 & c^T \\ 0 & C_2 \end{pmatrix}. $$

To show that $c^T = 0^T$, consider
$$ c^T = y^T C X \overset{(14)}{=} \left( \frac{Cx}{\sigma_1} \right)^T C X = \frac{x^T C^T C X}{\sigma_1}. \qquad (15) $$

From (13), x is an eigenvector of $C^T C$, i.e.,
$$ C^T C x = \lambda x = \sigma_1^2 x \iff x^T C^T C = \sigma_1^2 x^T. $$

Plugging this into (15) yields
$$ c^T = \sigma_1 x^T X = 0^T $$
since $R_x = \begin{pmatrix} x & X \end{pmatrix}$ is orthogonal.


Singular Value Decomposition

Moreover, $\sigma_1 \ge \|C_2\|_2$ since
$$ \sigma_1 = \|C\|_2 = \|R_y^T C R_x\|_2 \overset{\text{HW}}{=} \max\{\sigma_1, \|C_2\|_2\}. $$

Next, we repeat this process for $C_2$, i.e.,
$$ S_y^T C_2 S_x = \begin{pmatrix} \sigma_2 & 0^T \\ 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_2 \ge \|C_3\|_2. $$

Let
$$ P_2 = \begin{pmatrix} 1 & 0^T \\ 0 & S_y^T \end{pmatrix} R_y^T, \qquad Q_2 = R_x \begin{pmatrix} 1 & 0^T \\ 0 & S_x \end{pmatrix}. $$
Then
$$ P_2 C Q_2 = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0^T \\ 0 & 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_1 \ge \sigma_2 \ge \|C_3\|_2. $$


Singular Value Decomposition

We continue this until
$$ P_{r-1} C Q_{r-1} = \begin{pmatrix} \sigma_1 & & & O \\ & \sigma_2 & & \\ & & \ddots & \\ O & & & \sigma_r \end{pmatrix} = D, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r. $$

Finally, let
$$ \widetilde{U}^T = \begin{pmatrix} P_{r-1} & O \\ O & I \end{pmatrix} U^T, \quad \text{and} \quad \widetilde{V} = V \begin{pmatrix} Q_{r-1} & O \\ O & I \end{pmatrix}. $$

Together,
$$ \widetilde{U}^T A \widetilde{V} = \begin{pmatrix} D & O \\ O & O \end{pmatrix}, $$
or, without the tildes, the singular value decomposition (SVD) of A:
$$ A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T, $$
where A is m × n, U is m × m, D is r × r and V is n × n.
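In practice the SVD is computed with library routines rather than by the construction above; the following NumPy sketch simply checks that numpy.linalg.svd returns exactly this shape of factorization. The rank-2 test matrix is an arbitrary example, not from the slides.

```python
import numpy as np

# Check the SVD structure A = U [D O; O O] V^T on an arbitrary rank-2 example.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5 x 4, rank 2

U, s, Vt = np.linalg.svd(A)            # full SVD: U is 5x5, Vt is 4x4
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)             # [D O; O O] with D = diag(sigma_1, ..., sigma_r)

print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T
print(np.round(s, 12))                 # sigma_3, sigma_4 are numerically zero
```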


Singular Value Decomposition

We use the following terminology:
singular values: σ1 ≥ σ2 ≥ ... ≥ σr > 0,
left singular vectors: the columns of U,
right singular vectors: the columns of V.

Remark
In Chapter 7 we will see that the columns of V and U are also special eigenvectors of $A^T A$ and $A A^T$, respectively.

Singular Value Decomposition Geometric interpretation of SVD

For the following we assume A ∈ R^{n×n}, n = 2.

[Figure: the unit circle, spanned by the right singular vectors v1 and v2, is mapped by A onto an ellipse whose semi-axes have lengths σ1 and σ2 in the directions u1 and u2.]

This picture is true since
$$ A = U D V^T \iff AV = UD $$

and σ1, σ2 are the lengths of the semi-axes of the ellipse because $\|u_1\|_2 = \|u_2\|_2 = 1$.

Remark
See [Mey00] for more details.


Singular Value Decomposition

For general n, A transforms the 2-norm unit sphere to an ellipsoid whose semi-axes have lengths
$$ \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n. $$
Therefore,
$$ \kappa_2(A) = \frac{\sigma_1}{\sigma_n} $$
is the distortion ratio of the transformation A. Moreover,
$$ \sigma_1 = \|A\|_2, \qquad \sigma_n = \frac{1}{\|A^{-1}\|_2}, $$
so that
$$ \kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2 $$
is the 2-norm condition number of A ($\in \mathbb{R}^{n\times n}$).
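A quick NumPy sketch of these relations: the 2-norm condition number computed from the singular values agrees with numpy.linalg.cond and with the product of the two norms; the test matrix is an arbitrary example.

```python
import numpy as np

# kappa_2(A) = sigma_1 / sigma_n = ||A||_2 * ||A^{-1}||_2 for nonsingular A.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
kappa = s[0] / s[-1]

print(np.isclose(kappa, np.linalg.cond(A, 2)))
print(np.isclose(kappa, np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)))
```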


Singular Value Decomposition

Remark
The relations for σ1 and σn hold because
$$ \|A\|_2 = \|U D V^T\|_2 \overset{\text{HW}}{=} \|D\|_2 = \sigma_1, \qquad \|A^{-1}\|_2 = \|V D^{-1} U^T\|_2 \overset{\text{HW}}{=} \|D^{-1}\|_2 = \frac{1}{\sigma_n}. $$

Remark

We always have κ2(A) ≥ 1, and κ2(A) = 1 if and only if A is a multiple of an orthogonal matrix (typo in [Mey00], see proof on next slide).


Singular Value Decomposition

Proof
“⇐=”: Assume A = αQ with α > 0, Q orthogonal. Then
$$ \|A\|_2 = \alpha \|Q\|_2 = \alpha \max_{\|x\|_2 = 1} \|Qx\|_2 \overset{\text{invariance}}{=} \alpha \max_{\|x\|_2 = 1} \|x\|_2 = \alpha. $$
Also,
$$ A^T A = \alpha^2 Q^T Q = \alpha^2 I \implies A^{-1} = \frac{1}{\alpha^2} A^T \quad \text{and} \quad \|A^T\|_2 = \|A\|_2, $$
so that $\|A^{-1}\|_2 = \frac{1}{\alpha}$ and
$$ \kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2 = \alpha \, \frac{1}{\alpha} = 1. $$


Singular Value Decomposition

Proof (cont.)
“=⇒”: Assume $\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = 1$, so that $\sigma_1 = \sigma_n$ and therefore
$$ D = \sigma_1 I. $$
Thus
$$ A = U D V^T = \sigma_1 U V^T $$
and
$$ A^T A = \sigma_1^2 (U V^T)^T U V^T = \sigma_1^2 V U^T U V^T = \sigma_1^2 I, $$
i.e., $\frac{1}{\sigma_1} A$ is orthogonal, so A is a multiple of an orthogonal matrix. □


Singular Value Decomposition Applications of the Condition Number

Let $\tilde{x}$ be the answer obtained by solving Ax = b with A ∈ R^{n×n}. Is a small residual $r = b - A\tilde{x}$ a good indicator for the accuracy of $\tilde{x}$?

Since x is the exact answer and $\tilde{x}$ the computed answer, we have the relative error
$$ \frac{\|x - \tilde{x}\|}{\|x\|}. $$


Singular Value Decomposition

Now
$$ \|r\| = \|b - A\tilde{x}\| = \|Ax - A\tilde{x}\| = \|A(x - \tilde{x})\| \le \|A\| \, \|x - \tilde{x}\|. $$

To get the relative error we multiply by $\frac{\|A^{-1}b\|}{\|x\|} = 1$. Then, since $\|A^{-1}b\| \le \|A^{-1}\| \, \|b\|$,
$$ \|r\| \le \|A\| \, \|A^{-1}b\| \, \frac{\|x - \tilde{x}\|}{\|x\|} \implies \frac{\|r\|}{\|b\|} \le \kappa(A) \, \frac{\|x - \tilde{x}\|}{\|x\|}. \qquad (16) $$


Singular Value Decomposition

Moreover, using $r = b - A\tilde{x} = b - \tilde{b}$,
$$ \|x - \tilde{x}\| = \|A^{-1}(b - \tilde{b})\| \le \|A^{-1}\| \, \|r\|. $$
Multiplying by $\frac{\|Ax\|}{\|b\|} = 1$ we have
$$ \frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A) \, \frac{\|r\|}{\|b\|}. \qquad (17) $$
Combining (16) and (17) yields
$$ \frac{1}{\kappa(A)} \frac{\|r\|}{\|b\|} \le \frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A) \, \frac{\|r\|}{\|b\|}. $$
Therefore, the relative residual $\frac{\|r\|}{\|b\|}$ is a good indicator of the relative error if and only if A is well conditioned, i.e., κ(A) is small (close to 1).
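The following sketch illustrates the two-sided bound numerically. The Hilbert matrix is a standard ill-conditioned example (an assumption for this illustration, not taken from the slides): there a tiny relative residual coexists with a much larger relative error, while for a well-conditioned matrix the two quantities track each other.

```python
import numpy as np
from scipy.linalg import hilbert

def rel_residual_and_error(A, x_exact):
    """Solve A x = b in floating point and return (relative residual, relative error)."""
    b = A @ x_exact
    x_tilde = np.linalg.solve(A, b)
    rel_res = np.linalg.norm(b - A @ x_tilde) / np.linalg.norm(b)
    rel_err = np.linalg.norm(x_exact - x_tilde) / np.linalg.norm(x_exact)
    return rel_res, rel_err

x = np.ones(12)
A_good = np.eye(12) + 0.1 * np.random.default_rng(0).standard_normal((12, 12))
for A, name in [(A_good, "well conditioned"), (hilbert(12), "ill conditioned")]:
    res, err = rel_residual_and_error(A, x)
    print(f"{name}: kappa_2 = {np.linalg.cond(A, 2):.2e}, "
          f"rel. residual = {res:.2e}, rel. error = {err:.2e}")
```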


Singular Value Decomposition Applications of the SVD

1 Determination of “numerical rank(A)”:
rank(A) ≈ index of the smallest singular value greater than or equal to a desired threshold.

2 Low-rank approximation of A: The Eckart–Young theorem states that
$$ A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T $$
is the best rank-k approximation to A in the 2-norm (and also in the Frobenius norm), i.e.,
$$ \|A - A_k\|_2 = \min_{\operatorname{rank}(B) = k} \|A - B\|_2. $$
Moreover, $\|A - A_k\|_2 = \sigma_{k+1}$.
Run SVD_movie.m
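A minimal NumPy illustration of the Eckart–Young statement; the matrix and the truncation level k are arbitrary choices for the example.

```python
import numpy as np

# Truncated SVD A_k = sum_{i<=k} sigma_i u_i v_i^T and the error ||A - A_k||_2 = sigma_{k+1}.
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # best rank-k approximation

print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))   # error equals sigma_{k+1}
print(np.linalg.matrix_rank(A_k) == k)                # A_k has rank k
```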


Singular Value Decomposition

3 Stable solution of least squares problems: use the Moore–Penrose pseudoinverse.

Definition
Let A ∈ R^{m×n} and
$$ A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T $$
be the SVD of A. Then
$$ A^{\dagger} = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T $$
is called the Moore–Penrose pseudoinverse of A.

Remark
Note that $A^{\dagger} \in \mathbb{R}^{n\times m}$ and
$$ A^{\dagger} = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i}, \qquad r = \operatorname{rank}(A). $$
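The outer-product form of A† translates directly into code. The sketch below builds the pseudoinverse from the SVD of an arbitrary rank-deficient example (the matrix and the rank tolerance are placeholders) and compares it with numpy.linalg.pinv.

```python
import numpy as np

# Build A^dagger = sum_{i=1}^r v_i u_i^T / sigma_i from the SVD.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-12 * s[0])                                    # numerical rank
A_dag = sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(r))

print(np.allclose(A_dag, np.linalg.pinv(A)))   # True
```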

Singular Value Decomposition

We now show that the least squares solution of

Ax = b is given by x = A†b.


Singular Value Decomposition

Start with the normal equations and use
$$ A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T = \widetilde{U} D \widetilde{V}^T, $$
the reduced SVD of A, i.e., $\widetilde{U} \in \mathbb{R}^{m\times r}$, $\widetilde{V} \in \mathbb{R}^{n\times r}$.

$$ A^T A x = A^T b \iff \widetilde{V} D \underbrace{\widetilde{U}^T \widetilde{U}}_{=I} D \widetilde{V}^T x = \widetilde{V} D \widetilde{U}^T b \iff \widetilde{V} D^2 \widetilde{V}^T x = \widetilde{V} D \widetilde{U}^T b. $$

Multiplication by $D^{-1} \widetilde{V}^T$ yields
$$ D \widetilde{V}^T x = \widetilde{U}^T b. $$

Singular Value Decomposition

Thus $D \widetilde{V}^T x = \widetilde{U}^T b$ implies
$$ x = \widetilde{V} D^{-1} \widetilde{U}^T b \iff x = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T b \iff x = A^{\dagger} b. $$


Singular Value Decomposition

Remark
If A is nonsingular, then $A^{\dagger} = A^{-1}$ (see HW).

If rank(A) < n (i.e., the least squares solution is not unique), then x = A†b provides the unique solution with minimum 2-norm (see justification on following slide).


Singular Value Decomposition Minimum norm solution of underdetermined systems

Note that the general solution of Ax = b is given by
$$ z = A^{\dagger} b + n, \qquad n \in N(A). $$
Then
$$ \|z\|_2^2 = \|A^{\dagger} b + n\|_2^2 \overset{\text{Pythag. thm}}{=} \|A^{\dagger} b\|_2^2 + \|n\|_2^2 \ge \|A^{\dagger} b\|_2^2. $$
The Pythagorean theorem applies since (see HW)
$$ A^{\dagger} b \in R(A^{\dagger}) = R(A^T), $$
so that, using $R(A^T) = N(A)^{\perp}$,
$$ A^{\dagger} b \perp n. $$
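A small NumPy/SciPy check of this minimum-norm property on an arbitrary underdetermined example (the random data and the use of scipy.linalg.null_space are choices made only for the illustration): the pseudoinverse solution has the smallest 2-norm among solutions z = A†b + n with n ∈ N(A).

```python
import numpy as np
from scipy.linalg import null_space

# Minimum 2-norm solution of an underdetermined consistent system A x = b.
rng = np.random.default_rng(6)
A = rng.standard_normal((3, 6))            # full row rank, so A x = b is consistent
b = rng.standard_normal(3)

x_min = np.linalg.pinv(A) @ b              # A^dagger b
N = null_space(A)                          # orthonormal basis of N(A)

# Any other solution z = x_min + N c is at least as long.
for _ in range(5):
    z = x_min + N @ rng.standard_normal(N.shape[1])
    assert np.allclose(A @ z, b)
    assert np.linalg.norm(z) >= np.linalg.norm(x_min) - 1e-12
print("pinv(A) @ b has minimum norm among the sampled solutions")
```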


Singular Value Decomposition

Remark
Explicit use of the pseudoinverse is usually not recommended. Instead we solve Ax = b, A ∈ R^{m×n}, by
1 computing the reduced SVD $A = \widetilde{U} D \widetilde{V}^T$;
2 using $Ax = b \iff D \widetilde{V}^T x = \widetilde{U}^T b$, so
  1 solve $D y = \widetilde{U}^T b$ for y,
  2 compute $x = \widetilde{V} y$.
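These two steps look as follows in NumPy; this is a sketch for a full-column-rank least squares example, and the random data are placeholders.

```python
import numpy as np

# Solve the least squares problem min ||A x - b||_2 via the reduced SVD,
# following the two steps above (no explicit pseudoinverse).
rng = np.random.default_rng(7)
A = rng.standard_normal((10, 4))           # full column rank, r = 4
b = rng.standard_normal(10)

U_t, s, Vt_t = np.linalg.svd(A, full_matrices=False)   # reduced SVD: A = U~ D V~^T
y = (U_t.T @ b) / s                        # step 1: solve D y = U~^T b (D is diagonal)
x = Vt_t.T @ y                             # step 2: x = V~ y

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches lstsq
```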

Singular Value Decomposition Other Applications

Also known as principal component analysis (PCA), (discrete) Karhunen-Loève (KL) transformation, Hotelling transform, or proper orthogonal decomposition (POD)
Data compression
Noise filtering
Regularization of inverse problems
Tomography
Image deblurring
Seismology
Information retrieval and data mining (latent semantic analysis)
Bioinformatics and computational biology
Immunology
Molecular dynamics
Microarray data analysis

Orthogonal Projections Outline

1 Vector Norms

2 Matrix Norms

3 Inner Product Spaces

4 Orthogonal Vectors

5 Gram–Schmidt Orthogonalization & QR Factorization

6 Unitary and Orthogonal Matrices

7 Orthogonal Reduction

8 Complementary Subspaces

9 Orthogonal Decomposition

10 Singular Value Decomposition

11 Orthogonal Projections

Orthogonal Projections Orthogonal Projections

Earlier we discussed orthogonal complementary subspaces of an inner product space V, i.e.,

V = M ⊕ M⊥.

Definition Consider V = M ⊕ M⊥ so that for every v ∈ V there exist unique vectors m ∈ M, n ∈ M⊥ such that

v = m + n.

Then m is called the orthogonal projection of v onto M. The matrix PM such that PMv = m is the orthogonal projector onto M along M⊥.

Orthogonal Projections

For arbitrary complementary subspaces X , Y we showed earlier that the projector onto X along Y is given by

$$ P = \begin{pmatrix} X & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} I & O \\ O & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1}, $$
where the columns of X and Y are bases for $\mathcal{X}$ and $\mathcal{Y}$.


Orthogonal Projections

Now we let X = M and Y = M⊥ be orthogonal complementary subspaces, where M and N contain the basis vectors of M and M⊥ in their columns. Then
$$ P = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix}^{-1}. \qquad (18) $$

To find $\begin{pmatrix} M & N \end{pmatrix}^{-1}$ we note that
$$ M^T N = N^T M = O, $$
and if N is an orthogonal matrix (i.e., contains an ON basis), then
$$ \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix} = \begin{pmatrix} I & O \\ O & I \end{pmatrix} $$
(note that $M^T M$ is invertible since M is full rank because its columns form a basis of M).


Orthogonal Projections

Thus
$$ \begin{pmatrix} M & N \end{pmatrix}^{-1} = \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix}. \qquad (19) $$
Inserting (19) into (18) yields
$$ P = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} = M (M^T M)^{-1} M^T. $$

Remark
Note that $P_M$ is unique, so this formula holds for an arbitrary basis of M (collected in M). In particular, if M contains an ON basis for M, then
$$ P_M = M M^T. $$
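A short NumPy sketch of these formulas (the basis matrix M below is an arbitrary example): the projector built from an arbitrary basis coincides with the one built from an orthonormal basis of the same subspace, and it is idempotent and symmetric.

```python
import numpy as np

# Orthogonal projector onto M = R(M): P_M = M (M^T M)^{-1} M^T.
rng = np.random.default_rng(8)
M = rng.standard_normal((5, 2))                        # arbitrary basis of a 2D subspace

P = M @ np.linalg.inv(M.T @ M) @ M.T                   # arbitrary-basis formula
Q, _ = np.linalg.qr(M)                                 # orthonormal basis of the same subspace
P_on = Q @ Q.T                                         # ON-basis formula P_M = M M^T

print(np.allclose(P, P_on))          # same projector, independent of the basis
print(np.allclose(P @ P, P))         # idempotent
print(np.allclose(P, P.T))           # symmetric
```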


Orthogonal Projections

Similarly,
$$ P_{M^{\perp}} = N (N^T N)^{-1} N^T \quad \text{(arbitrary basis for } M^{\perp}\text{)}, \qquad P_{M^{\perp}} = N N^T \quad \text{(ON basis)}. $$
As before, $P_M = I - P_{M^{\perp}}$.

Example
If M = span{u}, ‖u‖ = 1, then
$$ P_M = P_u = u u^T $$
and
$$ P_{u^{\perp}} = I - P_u = I - u u^T $$
(cf. elementary orthogonal projectors earlier).

Orthogonal Projections Properties of orthogonal projectors

Theorem
Let P ∈ R^{n×n} be a projector, i.e., P² = P. Then P is an orthogonal projector if and only if any one of the following equivalent conditions holds:
1 R(P) ⊥ N(P),
2 P^T = P,
3 ‖P‖₂ = 1.


Orthogonal Projections

Proof
1 Follows directly from the definition.

2 “=⇒”: Assume P is an orthogonal projector, i.e., $P = M(M^T M)^{-1} M^T$. Then
$$ P^T = M \underbrace{(M^T M)^{-T}}_{=(M^T M)^{-1}} M^T = P. $$
“⇐=”: Assume $P = P^T$. Then
$$ R(P) = R(P^T) \overset{\text{Orth. decomp.}}{=} N(P)^{\perp}, $$
so that P is an orthogonal projector via (1).


Orthogonal Projections

Proof (cont.)
3 For complementary subspaces X, Y we know the angle θ between X and Y is given by
$$ \|P\|_2 = \frac{1}{\sin\theta}, \qquad \theta \in \left[0, \frac{\pi}{2}\right]. $$
Assume P is an orthogonal projector; then θ = π/2, so that ‖P‖₂ = 1.
Conversely, if ‖P‖₂ = 1, then θ = π/2 and X, Y are orthogonal complements, i.e., P is an orthogonal projector. □

Orthogonal Projections Why is orthogonal projection so important?

Theorem
Let V be an inner product space with subspace M, and let b ∈ V. Then
$$ \operatorname{dist}(b, M) = \min_{m \in M} \|b - m\|_2 = \|b - P_M b\|_2, $$
i.e., $P_M b$ is the unique vector in M closest to b. The quantity dist(b, M) is called the (orthogonal) distance from b to M.


Orthogonal Projections

Proof
Let $p = P_M b$. Then p ∈ M and p − m ∈ M for every m ∈ M.
Moreover,
$$ b - p = (I - P_M) b \in M^{\perp}, $$
so that (p − m) ⊥ (b − p). Then
$$ \|b - m\|_2^2 = \|b - p + p - m\|_2^2 \overset{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - m\|_2^2 \ge \|b - p\|_2^2. $$
Therefore $\min_{m \in M} \|b - m\|_2 = \|b - p\|_2$.


Orthogonal Projections

Proof (cont.)
Uniqueness: Assume there exists a q ∈ M such that
$$ \|b - q\|_2 = \|b - p\|_2. \qquad (20) $$
Then
$$ \|b - q\|_2^2 = \| \underbrace{b - p}_{\in M^{\perp}} + \underbrace{p - q}_{\in M} \|_2^2 \overset{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - q\|_2^2. $$
But then (20) implies that $\|p - q\|_2^2 = 0$ and therefore p = q. □


Orthogonal Projections Least squares approximation revisited

Now we give a “modern” derivation of the normal equations (without calculus), and note that much of this remains true for best L2 approximation.

Goal of least squares: For A ∈ R^{m×n}, find
$$ \min_{x \in \mathbb{R}^n} \sqrt{\sum_{i=1}^{m} \left( (Ax)_i - b_i \right)^2} \iff \min_{x \in \mathbb{R}^n} \|Ax - b\|_2. $$

Now Ax ∈ R(A), so the least squares error is
$$ \operatorname{dist}(b, R(A)) = \min_{Ax \in R(A)} \|b - Ax\|_2 = \|b - P_{R(A)} b\|_2, $$
with $P_{R(A)}$ the orthogonal projector onto R(A).


Orthogonal Projections

Moreover, the least squares solution of Ax = b is given by that x for which $Ax = P_{R(A)} b$. The following argument shows that this is equivalent to the normal equations:
$$ Ax = P_{R(A)} b \iff P_{R(A)} A x = P_{R(A)}^2 b = P_{R(A)} b \iff P_{R(A)} (Ax - b) = 0 $$
$$ \iff Ax - b \in N(P_{R(A)}) = R(A)^{\perp} \quad (P_{R(A)} \text{ orth. proj. onto } R(A)) $$
$$ \overset{\text{Orth. decomp.}}{\iff} Ax - b \in N(A^T) \iff A^T (Ax - b) = 0 \iff A^T A x = A^T b. $$

Remark
Now we are no longer limited to the real case.
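Numerically, the x with $Ax = P_{R(A)} b$ is exactly the least squares solution; the following sketch checks the equivalence on an arbitrary overdetermined example (the data are placeholders, and the projector is built from an orthonormal basis of R(A) obtained by QR).

```python
import numpy as np

# The least squares solution satisfies A x = P_{R(A)} b and A^T A x = A^T b.
rng = np.random.default_rng(9)
A = rng.standard_normal((8, 3))            # full column rank
b = rng.standard_normal(8)

Q, _ = np.linalg.qr(A)                     # orthonormal basis of R(A)
P_RA = Q @ Q.T                             # orthogonal projector onto R(A)

x = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ x, P_RA @ b))        # A x equals the projection of b onto R(A)
print(np.allclose(A.T @ A @ x, A.T @ b))   # normal equations hold
```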

Appendix References

[Mey00] Carl D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.