Introduction, Part II

Petros Drineas

These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA

Overview

We will cover this material in relatively more detail, but will still skip a lot ...

1. Norms {measuring the length/mass of mathematical quantities}: general norms, vector p-norms, matrix norms induced by vector p-norms, the Frobenius norm

2. The Singular Value Decomposition (SVD): the most important tool in Numerical Linear Algebra

3. Least Squares problems: linear systems that do not have a solution

Norms

General Norms

How to measure the mass of a matrix or length of a vector

A norm $\|\cdot\|$ is a function $\mathbb{R}^{m\times n} \to \mathbb{R}$ with

1. Non-negativity: $\|A\| \ge 0$, and $\|A\| = 0 \iff A = 0$
2. Triangle inequality: $\|A + B\| \le \|A\| + \|B\|$
3. Scalar multiplication: $\|\alpha A\| = |\alpha|\,\|A\|$ for all $\alpha \in \mathbb{R}$

Properties

Minus signs: $\|-A\| = \|A\|$

Reverse triangle inequality: $\big|\,\|A\| - \|B\|\,\big| \le \|A - B\|$

New norms: for a norm $\|\cdot\|$ on $\mathbb{R}^{m\times n}$ and nonsingular $M \in \mathbb{R}^{m\times m}$, $\|A\|_M \stackrel{\text{def}}{=} \|MA\|$ is also a norm

Vector p Norms

For $x \in \mathbb{R}^n$ and integer $p \ge 1$

$\|x\|_p \stackrel{\text{def}}{=} \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$

One norm: $\|x\|_1 = \sum_{j=1}^{n} |x_j|$

Euclidean (two) norm: $\|x\|_2 = \sqrt{\sum_{j=1}^{n} |x_j|^2} = \sqrt{x^T x}$

Infinity (max) norm: $\|x\|_\infty = \max_{1\le j\le n} |x_j|$
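A quick numerical sanity check of these definitions, as a sketch assuming NumPy (the slides themselves do not prescribe any software); the direct formulas are compared against `numpy.linalg.norm`.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

one_norm = np.sum(np.abs(x))                # sum of absolute values
two_norm = np.sqrt(np.sum(np.abs(x) ** 2))  # Euclidean length, sqrt(x^T x)
inf_norm = np.max(np.abs(x))                # largest absolute entry

# Compare against the library implementation
assert np.isclose(one_norm, np.linalg.norm(x, 1))
assert np.isclose(two_norm, np.linalg.norm(x, 2))
assert np.isclose(inf_norm, np.linalg.norm(x, np.inf))
print(one_norm, two_norm, inf_norm)         # 8.0  5.0990...  4.0
```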

Exercises

1. Determine $\|\mathbf{1}\|_p$ for the all-ones vector $\mathbf{1} \in \mathbb{R}^n$ and $p = 1, 2, \infty$

2. For $x \in \mathbb{R}^n$ with $x_j = j$, $1 \le j \le n$, determine closed-form expressions for $\|x\|_p$ for $p = 1, 2, \infty$

Inner Products and Norm Relations

For $x, y \in \mathbb{R}^n$

Cauchy-Schwarz inequality

$|x^T y| \le \|x\|_2\, \|y\|_2$

Hölder inequality

$|x^T y| \le \|x\|_1\, \|y\|_\infty \qquad |x^T y| \le \|x\|_\infty\, \|y\|_1$

Relations

$\|x\|_\infty \le \|x\|_1 \le n\, \|x\|_\infty$
$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2$
$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty$
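These inequalities are easy to probe numerically; a small sketch (again assuming NumPy) checks Cauchy-Schwarz, Hölder, and the three norm relations on a random vector pair.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x, y = rng.standard_normal(n), rng.standard_normal(n)
norm = np.linalg.norm

# Cauchy-Schwarz and Hoelder
assert abs(x @ y) <= norm(x, 2) * norm(y, 2)
assert abs(x @ y) <= norm(x, 1) * norm(y, np.inf)
assert abs(x @ y) <= norm(x, np.inf) * norm(y, 1)

# Norm relations
assert norm(x, np.inf) <= norm(x, 1) <= n * norm(x, np.inf)
assert norm(x, 2) <= norm(x, 1) <= np.sqrt(n) * norm(x, 2)
assert norm(x, np.inf) <= norm(x, 2) <= np.sqrt(n) * norm(x, np.inf)
```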

Exercises

1. Prove the norm relations on the previous slide

2. For $x \in \mathbb{R}^n$ show $\|x\|_2 \le \sqrt{\|x\|_\infty\, \|x\|_1}$

Vector Two Norm

Theorem of Pythagoras: for $x, y \in \mathbb{R}^n$

$x^T y = 0 \iff \|x \pm y\|_2^2 = \|x\|_2^2 + \|y\|_2^2$

The two norm does not care about orthonormal matrices: for $x \in \mathbb{R}^n$ and $V \in \mathbb{R}^{m\times n}$ with $V^T V = I_n$

$\|Vx\|_2 = \|x\|_2$

$\|Vx\|_2^2 = (Vx)^T (Vx) = x^T V^T V x = x^T x = \|x\|_2^2$

Exercises

For $x, y \in \mathbb{R}^n$ show

1. Parallelogram equality

$\|x + y\|_2^2 + \|x - y\|_2^2 = 2\left( \|x\|_2^2 + \|y\|_2^2 \right)$

2. Polarization identity

$x^T y = \tfrac{1}{4}\left( \|x + y\|_2^2 - \|x - y\|_2^2 \right)$

Matrix Norms (Induced by Vector p Norms)

For $A \in \mathbb{R}^{m\times n}$ and integer $p \ge 1$

$\|A\|_p \stackrel{\text{def}}{=} \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|y\|_p = 1} \|Ay\|_p$

One norm: maximum absolute column sum

$\|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{m} |a_{ij}| = \max_{1\le j\le n} \|A e_j\|_1$

Infinity norm: maximum absolute row sum

$\|A\|_\infty = \max_{1\le i\le m} \sum_{j=1}^{n} |a_{ij}| = \max_{1\le i\le m} \|A^T e_i\|_1$
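The column-sum and row-sum formulas make these two induced norms cheap to evaluate; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, -4.0,  2.0],
              [0.0,  2.0, -3.0]])

one_norm = np.max(np.sum(np.abs(A), axis=0))  # maximum absolute column sum
inf_norm = np.max(np.sum(np.abs(A), axis=1))  # maximum absolute row sum

assert np.isclose(one_norm, np.linalg.norm(A, 1))
assert np.isclose(inf_norm, np.linalg.norm(A, np.inf))
print(one_norm, inf_norm)                     # 6.0  7.0
```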

Matrix Norm Properties

Every norm is realized by some vector $y \ne 0$

$\|A\|_p = \frac{\|Ay\|_p}{\|y\|_p} = \|Az\|_p \quad \text{where} \quad z \equiv \frac{y}{\|y\|_p}, \quad \|z\|_p = 1$

{Vector $y$ is different for every $A$ and every $p$}

Submultiplicativity

Matrix-vector product: for $A \in \mathbb{R}^{m\times n}$ and $y \in \mathbb{R}^n$

$\|Ay\|_p \le \|A\|_p\, \|y\|_p$

Matrix product: for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times l}$

$\|AB\|_p \le \|A\|_p\, \|B\|_p$

More Matrix Norm Properties

For $A \in \mathbb{R}^{m\times n}$ and permutation matrices $P \in \mathbb{R}^{m\times m}$, $Q \in \mathbb{R}^{n\times n}$

Permutation matrices do not matter

$\|PAQ\|_p = \|A\|_p$

Submatrices have smaller norms than the parent matrix

If $PAQ = \begin{pmatrix} B & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ then $\|B\|_p \le \|A\|_p$

Exercises

1. Different norms are realized by different vectors

Find $y$ and $z$ so that $\|A\|_1 = \|Ay\|_1$ and $\|A\|_\infty = \|Az\|_\infty$ when

$A = \begin{pmatrix} 1 & 4 \\ 0 & 2 \end{pmatrix}$

2. If $Q \in \mathbb{R}^{n\times n}$ is a permutation matrix then $\|Q\|_p = 1$

3. Prove the two types of submultiplicativity

4. If $D = \operatorname{diag}(d_{11}, \ldots, d_{nn})$ then

$\|D\|_p = \max_{1\le j\le n} |d_{jj}|$

5. If $A \in \mathbb{R}^{n\times n}$ is nonsingular then $\|A\|_p\, \|A^{-1}\|_p \ge 1$

Norm Relations and Transposes

For $A \in \mathbb{R}^{m\times n}$

Relations between different norms

$\tfrac{1}{\sqrt{n}}\, \|A\|_\infty \le \|A\|_2 \le \sqrt{m}\, \|A\|_\infty$
$\tfrac{1}{\sqrt{m}}\, \|A\|_1 \le \|A\|_2 \le \sqrt{n}\, \|A\|_1$

Transposes

$\|A^T\|_1 = \|A\|_\infty \qquad \|A^T\|_\infty = \|A\|_1 \qquad \|A^T\|_2 = \|A\|_2$

Proof: Two Norm of Transpose

{True for $A = 0$, so assume $A \ne 0$}

Let $\|A\|_2 = \|Az\|_2$ for some $z \in \mathbb{R}^n$ with $\|z\|_2 = 1$

Reduce to a vector norm

$\|A\|_2^2 = \|Az\|_2^2 = (Az)^T (Az) = z^T A^T A z = z^T \underbrace{\left( A^T A z \right)}_{\text{vector}}$

The Cauchy-Schwarz inequality and submultiplicativity imply

$z^T \left( A^T A z \right) \le \|z\|_2\, \|A^T A z\|_2 \le \|A^T\|_2\, \|A\|_2$

Thus $\|A\|_2^2 \le \|A^T\|_2\, \|A\|_2$ and $\|A\|_2 \le \|A^T\|_2$

Reversing the roles of $A$ and $A^T$ gives $\|A^T\|_2 \le \|A\|_2$

Matrix Two Norm

$A \in \mathbb{R}^{m\times n}$

Orthonormal matrices do not change anything: if $U \in \mathbb{R}^{k\times m}$ with $U^T U = I_m$ and $V \in \mathbb{R}^{l\times n}$ with $V^T V = I_n$, then

$\|U A V^T\|_2 = \|A\|_2$

Gram matrices: for $A \in \mathbb{R}^{m\times n}$

$\|A^T A\|_2 = \|A\|_2^2 = \|A A^T\|_2$

Outer products: for $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$

$\|x y^T\|_2 = \|x\|_2\, \|y\|_2$
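A quick numerical illustration of the Gram-matrix and outer-product identities (a sketch assuming NumPy; for a matrix argument `np.linalg.norm(·, 2)` returns the largest singular value, i.e. the two norm):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
x, y = rng.standard_normal(6), rng.standard_normal(4)

def norm2(M):
    return np.linalg.norm(M, 2)   # spectral (two) norm of a matrix

assert np.isclose(norm2(A.T @ A), norm2(A) ** 2)
assert np.isclose(norm2(A @ A.T), norm2(A) ** 2)
assert np.isclose(norm2(np.outer(x, y)), np.linalg.norm(x) * np.linalg.norm(y))
```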

Proof: Norm of Gram Matrix

Submultiplicativity and transpose

$\|A^T A\|_2 \le \|A^T\|_2\, \|A\|_2 = \|A\|_2\, \|A\|_2 = \|A\|_2^2$

Thus $\|A^T A\|_2 \le \|A\|_2^2$

Let $\|A\|_2 = \|Az\|_2$ for some $z \in \mathbb{R}^n$ with $\|z\|_2 = 1$

The Cauchy-Schwarz inequality and submultiplicativity imply

$\|A\|_2^2 = \|Az\|_2^2 = (Az)^T (Az) = z^T \underbrace{\left( A^T A z \right)}_{\text{vector}} \le \|z\|_2\, \|A^T A z\|_2 \le \|A^T A\|_2$

Thus $\|A\|_2^2 \le \|A^T A\|_2$

Exercises

1. Infinity norm of outer products: for $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, show $\|x y^T\|_\infty = \|x\|_\infty\, \|y\|_1$

2. If $A \in \mathbb{R}^{n\times n}$ with $A \ne 0$ is idempotent then

(i) $\|A\|_p \ge 1$

(ii) $\|A\|_2 = 1$ if $A$ is also symmetric

3. Given $A \in \mathbb{R}^{n\times n}$: among all symmetric matrices, $\tfrac{1}{2}(A + A^T)$ is a (the?) matrix that is closest to $A$ in the two norm

Frobenius Norm

The true mass of a matrix

For $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix} \in \mathbb{R}^{m\times n}$

$\|A\|_F \stackrel{\text{def}}{=} \sqrt{\sum_{j=1}^{n} \|a_j\|_2^2} = \sqrt{\sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2} = \sqrt{\operatorname{trace}(A^T A)}$

Vector: if $x \in \mathbb{R}^n$ then $\|x\|_F = \|x\|_2$

Transpose: $\|A^T\|_F = \|A\|_F$

Identity: $\|I_n\|_F = \sqrt{n}$
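The three equivalent expressions (column norms, all entries, trace of the Gram matrix) can be checked directly; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

via_columns = np.sqrt(np.sum(np.linalg.norm(A, axis=0) ** 2))  # two norms of the columns
via_entries = np.sqrt(np.sum(np.abs(A) ** 2))                  # all entries squared
via_trace   = np.sqrt(np.trace(A.T @ A))                       # trace of the Gram matrix

assert np.allclose([via_columns, via_entries, via_trace],
                   np.linalg.norm(A, 'fro'))
```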

More Frobenius Norm Properties

$A \in \mathbb{R}^{m\times n}$

Orthonormal invariance: if $U \in \mathbb{R}^{k\times m}$ with $U^T U = I_m$ and $V \in \mathbb{R}^{l\times n}$ with $V^T V = I_n$, then

$\|U A V^T\|_F = \|A\|_F$

Relation to the two norm

$\|A\|_2 \le \|A\|_F \le \sqrt{\operatorname{rank}(A)}\, \|A\|_2 \le \sqrt{\min\{m, n\}}\, \|A\|_2$

Submultiplicativity

$\|AB\|_F \le \|A\|_2\, \|B\|_F \le \|A\|_F\, \|B\|_F$

Exercises

1. Show the orthonormal invariance of the Frobenius norm

2. Show the submultiplicativity of the Frobenius norm

3. Frobenius norm of outer products: for $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$ show

$\|x y^T\|_F = \|x\|_2\, \|y\|_2$

Singular Value Decomposition (SVD)

Full SVD

Given: $A \in \mathbb{R}^{m\times n}$

Tall and skinny ($m \ge n$):

$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T \quad \text{where} \quad \Sigma \equiv \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix} \ge 0$

Short and fat ($m \le n$):

$A = U \begin{pmatrix} \Sigma & 0 \end{pmatrix} V^T \quad \text{where} \quad \Sigma \equiv \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_m \end{pmatrix} \ge 0$

$U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices
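In NumPy the full factorization corresponds to `numpy.linalg.svd` with `full_matrices=True` (a sketch; the slides themselves do not tie the SVD to any particular software). For a tall and skinny $A$ the singular value matrix must be padded with a zero block to recover $A$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))                   # tall and skinny: m > n

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U is 7x7, Vt is 4x4, s holds sigma_1..sigma_4
Sigma = np.zeros((7, 4))                          # the [Sigma; 0] block from the slide
Sigma[:4, :4] = np.diag(s)

assert np.allclose(A, U @ Sigma @ Vt)
assert np.allclose(U.T @ U, np.eye(7))            # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(4))          # V orthogonal
```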

Names and Conventions

$A \in \mathbb{R}^{m\times n}$ with SVD

$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T \quad \text{or} \quad A = U \begin{pmatrix} \Sigma & 0 \end{pmatrix} V^T$

Singular values (svalues): diagonal elements $\sigma_j$ of $\Sigma$

Left singular vector matrix: $U$

Right singular vector matrix: $V$

Svalue ordering: $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min\{m,n\}} \ge 0$

Svalues of matrices $B$, $C$: $\sigma_j(B)$, $\sigma_j(C)$

SVD Properties

$A \in \mathbb{R}^{m\times n}$

Number of svalues equals the small dimension: $A \in \mathbb{R}^{m\times n}$ has $\min\{m, n\}$ singular values $\sigma_j \ge 0$

Orthogonal invariance: if $P \in \mathbb{R}^{m\times m}$ and $Q \in \mathbb{R}^{n\times n}$ are orthogonal matrices then $PAQ$ has the same svalues as $A$

Gram product: the nonzero svalues (= eigenvalues) of $A^T A$ are the squares of the svalues of $A$

Inverse: $A \in \mathbb{R}^{n\times n}$ is nonsingular $\iff \sigma_j > 0$ for $1 \le j \le n$

If $A = U \Sigma V^T$ then $A^{-1} = V \Sigma^{-1} U^T$ {SVD of the inverse}

Exercises

1. Transpose: $A^T$ has the same singular values as $A$

2. Orthogonal matrices: all singular values of $A \in \mathbb{R}^{n\times n}$ are equal to 1 $\iff$ $A$ is orthogonal

3. If $A \in \mathbb{R}^{n\times n}$ is symmetric and idempotent then all singular values are 0 and/or 1

4. For $A \in \mathbb{R}^{m\times n}$ and $\alpha > 0$ express the svalues of $(A^T A + \alpha I)^{-1} A^T$ in terms of $\alpha$ and the svalues of $A$

5. If $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ then the singular values of $\begin{pmatrix} I_n \\ A \end{pmatrix}$ are equal to $\sqrt{1 + \sigma_j^2}$ for $1 \le j \le n$

Singular Values

$A \in \mathbb{R}^{m\times n}$ with svalues $\sigma_1 \ge \cdots \ge \sigma_p$, where $p \equiv \min\{m, n\}$

Two norm: $\|A\|_2 = \sigma_1$

Frobenius norm: $\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_p^2}$

Well conditioned in the absolute sense

$|\sigma_j(A) - \sigma_j(B)| \le \|A - B\|_2, \quad 1 \le j \le p$

Product

$\sigma_j(AB) \le \sigma_1(A)\, \sigma_j(B), \quad 1 \le j \le p$

Matrix Schatten Norms

For $A \in \mathbb{R}^{m\times n}$ with singular values $\sigma_1 \ge \cdots \ge \sigma_\rho > 0$ and integer $p \ge 0$, the family of Schatten p-norms is defined as

$\|A\|_p \stackrel{\text{def}}{=} \left( \sum_{i=1}^{\rho} \sigma_i^p \right)^{1/p}.$

These are different from the vector-induced matrix p-norms (the notation is, unfortunately, confusing).

Schatten zero norm: equal to the matrix rank (not really a norm...)

Schatten one norm: the sum of the singular values of the matrix, also called the nuclear norm

Schatten two norm: the Frobenius norm

Schatten infinity norm: the spectral (or two) norm

Schatten p-norms are unitarily invariant, submultiplicative, satisfy Hölder's inequality, etc.
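Since Schatten norms are defined through the singular values, they are straightforward to evaluate once an SVD is available; a sketch assuming NumPy (the helper `schatten` is just for illustration), covering the nuclear, Frobenius, and spectral cases:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)        # singular values, in descending order

def schatten(s, p):
    """Schatten p-norm from the singular values (p >= 1)."""
    return np.sum(s ** p) ** (1.0 / p)

assert np.isclose(schatten(s, 1), np.sum(s))                  # nuclear norm
assert np.isclose(schatten(s, 2), np.linalg.norm(A, 'fro'))   # Frobenius norm
assert np.isclose(np.max(s), np.linalg.norm(A, 2))            # spectral norm (p -> infinity)
```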

Exercises

1. Norm of the inverse: if $A \in \mathbb{R}^{n\times n}$ is nonsingular with svalues $\sigma_1 \ge \cdots \ge \sigma_n$ then $\|A^{-1}\|_2 = 1/\sigma_n$

2. Appending a column to a tall and skinny matrix: if $A \in \mathbb{R}^{m\times n}$ with $m > n$, $z \in \mathbb{R}^m$, and $B = \begin{pmatrix} A & z \end{pmatrix}$ then

$\sigma_{n+1}(B) \le \sigma_n(A) \qquad \sigma_1(B) \ge \sigma_1(A)$

3. Appending a row to a tall and skinny matrix: if $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, $z \in \mathbb{R}^n$, and $B = \begin{pmatrix} A^T & z \end{pmatrix}^T$ then

$\sigma_n(B) \ge \sigma_n(A) \qquad \sigma_1(A) \le \sigma_1(B) \le \sqrt{\sigma_1(A)^2 + \|z\|_2^2}$

Rank

$\operatorname{rank}(A)$ = number of nonzero (positive) svalues of $A$

Zero matrix: $\operatorname{rank}(0) = 0$

Rank bounded by the small dimension: if $A \in \mathbb{R}^{m\times n}$ then $\operatorname{rank}(A) \le \min\{m, n\}$

Transpose: $\operatorname{rank}(A^T) = \operatorname{rank}(A)$

Gram product: $\operatorname{rank}(A^T A) = \operatorname{rank}(A) = \operatorname{rank}(A A^T)$

General product: $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$

If $A$ is nonsingular then $\operatorname{rank}(AB) = \operatorname{rank}(B)$

SVDs of Full Rank Matrices

All svalues of $A \in \mathbb{R}^{m\times n}$ are nonzero

Full column rank: $\operatorname{rank}(A) = n$ {linearly independent columns}

$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T, \quad \Sigma \in \mathbb{R}^{n\times n}$ nonsingular

Full row rank: $\operatorname{rank}(A) = m$ {linearly independent rows}

$A = U \begin{pmatrix} \Sigma & 0 \end{pmatrix} V^T, \quad \Sigma \in \mathbb{R}^{m\times m}$ nonsingular

Nonsingular: $\operatorname{rank}(A) = n = m$ {linearly independent rows & columns}

$A = U \Sigma V^T, \quad \Sigma \in \mathbb{R}^{n\times n}$ nonsingular

Exercises

1. Rank of an outer product: if $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$ then $\operatorname{rank}(x y^T) \le 1$

2. If $A \in \mathbb{R}^{n\times n}$ is nonsingular then $\begin{pmatrix} A & B \end{pmatrix}$ has full row rank for any $B \in \mathbb{R}^{n\times k}$

3. Orthonormal matrices: if $A \in \mathbb{R}^{m\times n}$ has orthonormal columns then $\operatorname{rank}(A) = n$ and all svalues of $A$ are equal to 1

4. Gram products: for $A \in \mathbb{R}^{m\times n}$

(i) $\operatorname{rank}(A) = n \iff A^T A$ is nonsingular

(ii) $\operatorname{rank}(A) = m \iff A A^T$ is nonsingular

Thin SVD

$A \in \mathbb{R}^{m\times n}$ with $A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T$ or $A = U \begin{pmatrix} \Sigma & 0 \end{pmatrix} V^T$

Singular values $\sigma_1 \ge \cdots \ge \sigma_p \ge 0$, $p \equiv \min\{m, n\}$

Singular vectors $U = \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}$, $V = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}$

If $\operatorname{rank}(A) = r$ then the thin (reduced) SVD {only nonzero svalues}

$A = \begin{pmatrix} u_1 & \cdots & u_r \end{pmatrix} \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{pmatrix} \begin{pmatrix} v_1^T \\ \vdots \\ v_r^T \end{pmatrix} = \sum_{j=1}^{r} \sigma_j\, u_j v_j^T$

Optimality of SVD

Given $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = r$ and thin SVD $A = \sum_{j=1}^{r} \sigma_j\, u_j v_j^T$

Approximation from the $k$ dominant svalues

$A_k \equiv \sum_{j=1}^{k} \sigma_j\, u_j v_j^T, \quad 1 \le k < r$

Absolute distance of $A$ to the set of rank-$k$ matrices

$\min_{\operatorname{rank}(B)=k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}$

$\min_{\operatorname{rank}(B)=k} \|A - B\|_F = \|A - A_k\|_F = \sqrt{\sum_{j=k+1}^{r} \sigma_j^2}$
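The optimal rank-$k$ approximation is obtained simply by truncating the SVD; a sketch assuming NumPy that also checks the two error formulas above:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep the k dominant singular triplets

# Error formulas from the slide: sigma_{k+1} and the tail of the singular values
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```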

Exercises

1. Find an example to illustrate that a closest matrix of rank $k$ is not unique in the two norm

Moore-Penrose Inverse

Given $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = r \ge 1$ and SVD

$A = U \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} V^T = \sum_{j=1}^{r} \sigma_j\, u_j v_j^T$

Moore-Penrose inverse

$A^\dagger \stackrel{\text{def}}{=} V \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = \sum_{j=1}^{r} \frac{1}{\sigma_j}\, v_j u_j^T$

Zero matrix: $0_{m\times n}^\dagger = 0_{n\times m}$
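The SVD formula can be compared directly against a library pseudoinverse; a sketch assuming NumPy (`numpy.linalg.pinv` is also SVD-based, with a tolerance deciding which svalues count as zero):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6x5 matrix of rank 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12 * s[0]))                # numerical rank
A_pinv = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

assert np.allclose(A_pinv, np.linalg.pinv(A))
assert np.allclose(A @ A_pinv @ A, A)            # Moore-Penrose condition 1
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)  # Moore-Penrose condition 2
```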

Special Cases of the Moore-Penrose Inverse

Nonsingular: if $A \in \mathbb{R}^{n\times n}$ with $\operatorname{rank}(A) = n$ then $A^\dagger = A^{-1}$

Inverse: $A^{-1} A = I_n = A A^{-1}$

Full column rank: if $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = n$ then $A^\dagger = (A^T A)^{-1} A^T$

Left inverse: $A^\dagger A = I_n$

Full row rank: if $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = m$ then $A^\dagger = A^T (A A^T)^{-1}$

Right inverse: $A A^\dagger = I_m$

Necessary and Sufficient Conditions

$A^\dagger$ is the Moore-Penrose inverse of $A$ $\iff$ $A^\dagger$ satisfies

1. $A A^\dagger A = A$

2. $A^\dagger A A^\dagger = A^\dagger$

3. $(A A^\dagger)^T = A A^\dagger$ {symmetric}

4. $(A^\dagger A)^T = A^\dagger A$ {symmetric}

Exercises

1. If $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$ with $y \ne 0$ then $\|x\, y^\dagger\|_2 = \|x\|_2 / \|y\|_2$

2. For $A \in \mathbb{R}^{m\times n}$ the following matrices are idempotent: $A A^\dagger$, $A^\dagger A$, $I_m - A A^\dagger$, $I_n - A^\dagger A$

3. If $A \in \mathbb{R}^{m\times n}$ and $A \ne 0$ then $\|A A^\dagger\|_2 = \|A^\dagger A\|_2 = 1$

4. If $A \in \mathbb{R}^{m\times n}$ then $(I_m - A A^\dagger)\, A = 0_{m\times n}$ and $A\, (I_n - A^\dagger A) = 0_{m\times n}$

5. If $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = n$ then $\|(A^T A)^{-1}\|_2 = \|A^\dagger\|_2^2$

6. If $A = BC$ where $B \in \mathbb{R}^{m\times n}$ has $\operatorname{rank}(B) = n$ and $C \in \mathbb{R}^{n\times n}$ is nonsingular, then $A^\dagger = C^{-1} B^\dagger$

Matrix Spaces and Singular Vectors: $A$

$A \in \mathbb{R}^{m\times n}$

Column space

$\operatorname{range}(A) = \{ b : b = Ax \text{ for some } x \in \mathbb{R}^n \} \subset \mathbb{R}^m$

Null space (kernel)

$\operatorname{null}(A) = \{ x : Ax = 0 \} \subset \mathbb{R}^n$

If $\operatorname{rank}(A) = r \ge 1$

$A = \begin{pmatrix} U_r & U_{m-r} \end{pmatrix} \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_r^T \\ V_{n-r}^T \end{pmatrix}$

$\operatorname{range}(A) = \operatorname{range}(U_r) \qquad \operatorname{null}(A) = \operatorname{range}(V_{n-r})$

Matrix Spaces and Singular Vectors: $A^T$

$A \in \mathbb{R}^{m\times n}$

Row space

$\operatorname{range}(A^T) = \{ d : d = A^T y \text{ for some } y \in \mathbb{R}^m \} \subset \mathbb{R}^n$

Left null space

$\operatorname{null}(A^T) = \{ y : A^T y = 0 \} \subset \mathbb{R}^m$

If $\operatorname{rank}(A) = r \ge 1$

$A = \begin{pmatrix} U_r & U_{m-r} \end{pmatrix} \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_r^T \\ V_{n-r}^T \end{pmatrix}$

$\operatorname{range}(A^T) = \operatorname{range}(V_r) \qquad \operatorname{null}(A^T) = \operatorname{range}(U_{m-r})$

Fundamental Theorem of Linear Algebra

$A \in \mathbb{R}^{m\times n}$

$\operatorname{range}(A) \oplus \operatorname{null}(A^T) = \mathbb{R}^m$ implies $m = \operatorname{rank}(A) + \dim \operatorname{null}(A^T)$, with $\operatorname{range}(A) \perp \operatorname{null}(A^T)$

$\operatorname{range}(A^T) \oplus \operatorname{null}(A) = \mathbb{R}^n$ implies $n = \operatorname{rank}(A) + \dim \operatorname{null}(A)$, with $\operatorname{range}(A^T) \perp \operatorname{null}(A)$

Spaces of the Moore-Penrose Inverse

{Need this for least squares}

Column space

$\operatorname{range}(A^\dagger) = \operatorname{range}(A^T A) = \operatorname{range}(A^T) \perp \operatorname{null}(A)$

Null space

$\operatorname{null}(A^\dagger) = \operatorname{null}(A A^T) = \operatorname{null}(A^T) \perp \operatorname{range}(A)$

Least Squares (LS) Problems

General LS Problems

Given $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$

$\min_x \|Ax - b\|_2$

General LS solution: $y = A^\dagger b + q$ for any $q \in \operatorname{null}(A)$

All solutions have the same LS residual $r \equiv b - Ay$ {$Ay = A A^\dagger b$ since $q \in \operatorname{null}(A)$}

LS residual is orthogonal to the column space: $A^T r = 0$

LS solution of minimal two norm: $y = A^\dagger b$

Computation: SVD
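For a rank-deficient $A$ the SVD-based routine `numpy.linalg.lstsq` returns the minimal-norm LS solution, which coincides with $A^\dagger b$; a sketch assuming NumPy that also checks the residual orthogonality:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))  # rank 2, nontrivial null space
b = rng.standard_normal(8)

y_min, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimal-norm LS solution (SVD-based)
assert np.allclose(y_min, np.linalg.pinv(A) @ b)

r = b - A @ y_min
assert np.allclose(A.T @ r, 0)                  # residual orthogonal to the column space
```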

Exercises

1. Use properties of the Moore-Penrose inverse to show that the LS residual is orthogonal to the column space of $A$

2. Determine the minimal norm solution for $\min_x \|Ax - b\|_2$ if $A = 0_{m\times n}$

3. If $y$ is the minimal norm solution to $\min_x \|Ax - b\|_2$ and $A^T b = 0$, then what can you say about $y$?

4. Determine the minimal norm solution for $\min_x \|Ax - b\|_2$ if $A = c\, d^T$ where $c \in \mathbb{R}^m$ and $d \in \mathbb{R}^n$

Full Column Rank LS Problems

Given $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = n$ and $b \in \mathbb{R}^m$

$\min_x \|Ax - b\|_2$

Unique LS solution: $y = A^\dagger b$

Computation: QR decomposition

1. Factor $A = QR$ where $Q^T Q = I_n$ and $R$ is upper triangular

2. Multiply $c = Q^T b$

3. Solve $Ry = c$

Do NOT solve $A^T A\, y = A^T b$
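A sketch of this QR route assuming NumPy; it never forms the Gram matrix $A^T A$, and the result matches the library LS solver:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((9, 4))     # full column rank (with probability 1)
b = rng.standard_normal(9)

Q, R = np.linalg.qr(A)              # thin QR: Q has orthonormal columns, R is upper triangular
c = Q.T @ b                         # step 2
y = np.linalg.solve(R, c)           # step 3: solve R y = c (a triangular solver would exploit the structure)

assert np.allclose(y, np.linalg.lstsq(A, b, rcond=None)[0])
```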

Exercises

1. If $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) = n$ has the thin QR factorization $A = QR$, where $Q^T Q = I_n$ and $R$ is upper triangular, then

$A^\dagger = R^{-1} Q^T$

2. If $A \in \mathbb{R}^{m\times n}$ has orthonormal columns then $A^\dagger = A^T$

3. If $A \in \mathbb{R}^{m\times n}$ has $\operatorname{rank}(A) = n$ then

$\|I_m - A A^\dagger\|_2 = \min\{1, m - n\}$
