Numerical Linear Algebra (Continued)

CS 740 Ajit Rajwade

1 Eigenvectors and Eigenvalues

• Given an n x n matrix A, a non-zero vector v is said to be an eigenvector of A with eigenvalue λ if Av = λv.

• In other words, when transformed by matrix A, the vector v shrinks (|λ| < 1) or expands (|λ| > 1) in magnitude, but its direction either remains the same or flips over (when λ is negative). Of course, when |λ| = 1, the magnitude of v undergoes no change.

2 Eigenvectors and Eigenvalues

• The preservation of the direction of the vector v when transformed by A gives rise to the name “eigen” – which is German for “self”.

• Eigenvectors and eigenvalues come in useful in several applications in physics, image and signal processing and machine learning.

• Here we stick to real matrices, though almost all the theory is applicable to complex matrices as well.

3 Examples

4 Non-uniqueness

• Eigenvectors and eigenvalues are not unique. An n x n matrix has n eigenvectors (not necessarily all distinct) and n eigenvalues (not necessarily all distinct).
• An eigenvector when scaled by an arbitrary non-zero constant remains an eigenvector with the same eigenvalue. By convention, eigenvectors are normalized to unit magnitude.

5 Characteristic equation

• The eigenvalues of A can be obtained by finding the roots of the following equation, called the characteristic equation:

Av = λv = λIv  ⟹  (A − λI)v = 0

This is a homogeneous equation, and it has a non-trivial solution if and only if (A − λI) is a singular matrix:

det(A − λI) = 0

• This equation has n (not necessarily distinct or real) roots – the eigenvalues of A. The equation can be written as follows:

p()  ( 1)( 2 )...( n )  0 6 Characteristic equation • The characteristic equation of an n x n matrix A has degree n. • It always has n roots – this is as per a result called the fundamental theorem of algebra. • The roots may not be distinct. • The roots may not be real even if A is real.

7 Matrix of eigenvectors and eigenvalues

A ∈ R^(n x n)

Av1 = λ1 v1
Av2 = λ2 v2
.
.
Avn = λn vn

Stacking these equations column-wise:

A (v1 | v2 | ... | vn) = (λ1 v1 | λ2 v2 | ... | λn vn) = (v1 | v2 | ... | vn) Λ

where Λ is the diagonal matrix with λ1, λ2, ..., λn on its diagonal, i.e.

AV = VΛ
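
The relation AV = VΛ is easy to check numerically (a sketch using a hypothetical random matrix; note that for a general real matrix the eigenvalues and eigenvectors may be complex):

A = randn(4);                 % arbitrary 4 x 4 example matrix
[V, D] = eig(A);              % columns of V are eigenvectors, diagonal of D holds eigenvalues
disp(norm(A*V - V*D))         % should be close to 0 (up to floating-point error)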

8 Physical example 1

http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

In this transformation, the red arrow changes direction but the blue arrow does not. The blue arrow is an eigenvector of this shear mapping, and since its length is unchanged its eigenvalue is 1. The red direction in the second figure is also an eigenvector – it is the direction in which the cloth is stretched and it has an eigenvalue greater than 1.

9 Physical example 1

• This example helps us design transformation matrices that accomplish a certain task.
• Let's say you wanted to stretch by a factor of 2 in direction [1,1] and apply no stretch or shrinking in direction [1,0].
• We treat these directions as eigenvectors with eigenvalues 2 and 1 respectively.

• This gives us

V = [1 1; 1 0],   Λ = [2 0; 0 1],   AV = VΛ

A = VΛV^(-1) = [1 1; 0 2]

(A numerical check of this construction appears after the next example.)

10 Physical example 2

• Consider the following matrix which you will often see in computer graphics:

[cos θ   −sin θ   0]
[sin θ    cos θ   0]
[  0        0     1]

• It is a matrix that rotates a 3D point about the Z axis – the Z coordinate remains unchanged, but the X and Y coordinates change by a rotation through angle θ in the XY plane.

• In this case, the Z axis, i.e. the vector (0, 0, 1)^T, is an eigenvector with eigenvalue 1.

• If the axis of rotation changes, the rotation matrix changes. But the axis of rotation will always be an eigenvector of the rotation matrix with eigenvalue 1.
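
Both physical examples above can be verified numerically. The sketch below is my own illustration (not part of the slides): it builds A = VΛV^(-1) for Physical example 1 and checks the rotation-axis eigenvector of Physical example 2.

% Physical example 1: design A with eigenvectors [1;1] (eigenvalue 2) and [1;0] (eigenvalue 1)
V = [1 1; 1 0];  L = diag([2 1]);
A = V * L / V;                    % A = V*Lambda*inv(V); should equal [1 1; 0 2]
disp(A)
disp(A*[1;1])                     % stretched by 2: gives [2;2]
disp(A*[1;0])                     % unchanged: gives [1;0]

% Physical example 2: rotation about the Z axis by an angle theta
theta = pi/6;
R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
disp(R*[0;0;1])                   % the axis [0;0;1] is unchanged (eigenvalue 1)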

11 Properties of Eigenvalues and Eigenvectors

• Trace of a matrix = sum of its eigenvalues.
• Determinant of a matrix = product of its eigenvalues (hence a singular matrix has at least one 0 eigenvalue).
• A matrix is invertible iff (= if and only if) all its eigenvalues are non-zero.
• The eigenvalues of A^(-1) are reciprocals of the eigenvalues of A (can you prove this?).
• The eigenvectors of A and A^(-1) are the same (can you reason why?).
• Every eigenvalue of an orthonormal/unitary matrix has absolute value 1 (why?).
• Complex eigenvalues always occur in conjugate pairs, i.e. if a+ib is an eigenvalue of real matrix A for eigenvector v, then a−ib will also be an eigenvalue of A for eigenvector v* (can you prove this?).
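
A short numerical check of some of these properties (an illustrative sketch with an arbitrary invertible matrix of my own choosing):

A = [4 1 0; 1 3 1; 0 1 2];        % arbitrary invertible example
lam = eig(A);
disp([trace(A), sum(lam)])        % trace = sum of eigenvalues
disp([det(A), prod(lam)])         % determinant = product of eigenvalues
disp(sort(eig(inv(A))))           % eigenvalues of inv(A) ...
disp(sort(1./lam))                % ... are reciprocals of the eigenvalues of A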

12 Properties of Eigenvalues and Eigenvectors

• Given the eigenvalues and eigenvectors of A, what can you say about the eigenvalues and eigenvectors of A − σI for some scalar σ?
• Given the eigenvalues and eigenvectors of A, what can you say about the eigenvalues and eigenvectors of A^2 or A^n for some n > 0?
• Note that A^n = A*A*A*...*A (A multiplied with itself n−1 times). This is different from the elementwise power A.^n, for which A.^n(i,j) = A(i,j)^n.
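
For reference, A − σI has eigenvalues λi − σ (with the same eigenvectors), and A^n has eigenvalues λi^n (again with the same eigenvectors). A small numerical sketch with the same illustrative matrix as above:

A = [4 1 0; 1 3 1; 0 1 2];  sigma = 0.5;
disp(sort(eig(A - sigma*eye(3))))   % equals sort(eig(A)) - sigma
disp(sort(eig(A)) - sigma)
disp(sort(eig(A^3)))                % matrix power: eigenvalues are cubed
disp(sort(eig(A).^3))               % (compare; note A^3 is not the same as A.^3)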

13 Properties of Eigenvalues and Eigenvectors

• Trace of a matrix = sum of its eigenvalues.
• Determinant of a matrix = product of its eigenvalues.

p()  (  1)(  2 )...(  n )  det(I  A) n n p(0)  (1) 12...n  det(A)  (1) det(A)

det(A)  12...n

n1 The coefficien t of  in p() is  (1  2  ...  n ). The coefficien t of the term containing n1 in det(I  A) can be proved to be the negative sum of the diagonal entries of A, i.e. - trace(A). 14 Hence trace(A)  (1  2  ...  n ). Eigenvectors and eigenvalues of special matrices • Diagonal matrix: eigenvalues are elements along the diagonal, eigenvectors are the column vectors normalized to unit magnitude. • Symmetric matrix: eigenvalues are real (proof), eigenvectors corresponding to distinct eigenvalues are orthogonal (proof). AV V, A VV T • The converse is also true, i.e. a matrix with orthonormal eigenvectors and real eigenvalues is symmetric. A VV T AT  (VV T )T VV T  A 15 Computation of Eigenvectors and Eigenvalues • The following MATLAB commands compute eigenvectors and eigenvalues: [V,D] = eig(A) returns two outputs. D is a diagonal matrix containing the eigenvalues. V is a matrix whose columns are the corresponding right eigenvectors.

[V,D] = eigs(A) returns a diagonal matrix D of A's six largest magnitude eigenvalues and a matrix V whose columns are the corresponding eigenvectors.

[V,D] = eigs(A,k) returns a diagonal matrix D of A's k largest magnitude eigenvalues and a matrix V whose columns are the corresponding eigenvectors.

16 Computation of Eigenvectors and Eigenvalues

• Eigenvalues can be computed by computing the roots of the characteristic equation.
• This is easy for 2 x 2 matrices, but not for larger matrices, since polynomials of degree 5 or higher have, in general, no closed-form expression for their roots.
• So how do we compute eigenvectors and eigenvalues in practice?

17 Computing eigenvectors

• Assume n x n matrix A has a unique eigenvalue λ1 of largest magnitude. Let the corresponding eigenvector be v1. • Consider vector x0 which is expressible as a linear combination of eigenvectors of A, i.e.

x0 = Σ_{i=1}^{n} αi vi

• Repeated multiplication of A with x0 converges to a multiple of v1. Why?

x0 = Σ_{i=1}^{n} αi vi

xk = A^k x0 = Σ_{i=1}^{n} αi A^k vi = Σ_{i=1}^{n} αi λi^k vi = λ1^k ( α1 v1 + Σ_{i=2}^{n} αi (λi / λ1)^k vi ) ≈ λ1^k α1 v1

Each term (λi / λ1)^k, for i ≥ 2, tends to 0 as k tends to ∞, because λ1 is the largest-magnitude eigenvalue of A.

This method of eigenvector computation is called power iteration. It converges to the eigenvector with the largest (magnitude) eigenvalue.

Caution: The possibility that the starting vector x0 has no component in the direction v1 (i.e. if α1 = 0) is very small, and can be eliminated by slightly and randomly perturbing the starting vector.

Caution: If two or more eigenvalues are tied for the largest magnitude, the iteration will converge to some linear combination of the corresponding eigenvectors.

Caution: If the initial vector is real and A is real, it will not converge to a complex eigenvector!

19 Computation of Eigenvectors and Eigenvalues

• How do you compute the largest eigenvalue given the eigenvector?

Av1 = λ1 v1  ⟹  v1^T A v1 = λ1 v1^T v1  ⟹  λ1 = (v1^T A v1) / (v1^T v1)

• What if you wanted to compute the eigenvector corresponding to the smallest eigenvalue? Work with A^(-1) instead of A.
• What about other eigenvectors? There exist more advanced algorithms for those, which we skip in this course. But we handle one special case.
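
A minimal MATLAB sketch of power iteration together with the Rayleigh-quotient eigenvalue estimate described above (illustrative only: a fixed number of iterations is used rather than a proper convergence test, and the example matrix is my own):

A = [4 1 0; 1 3 1; 0 1 2];          % arbitrary symmetric example matrix
x = randn(3,1);                     % random start makes alpha_1 = 0 extremely unlikely
for k = 1:200
    x = A*x;
    x = x / norm(x);                % re-normalize to avoid overflow/underflow
end
lambda1 = (x'*A*x) / (x'*x);        % Rayleigh quotient estimate of the largest eigenvalue
disp(lambda1)
disp(max(abs(eig(A))))              % compare with MATLAB's eig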

20 Computation of Eigenvectors and Eigenvalues

• Let A be a symmetric matrix. So its eigenvalues are real and its eigenvectors are orthonormal.

AV = VΛ,  A = VΛV^T = Σ_{i=1}^{n} λi vi vi^T

• To compute the second eigenvector, work with the following matrix instead of A:

A^(-v1) = A − λ1 v1 v1^T

where v1 and λ1 are the estimated first eigenvector and first eigenvalue.

• The eigenvector of A^(-v1) corresponding to its largest eigenvalue is the eigenvector of A corresponding to its second largest eigenvalue.
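
Continuing the power-iteration sketch above for a symmetric matrix: once v1 and λ1 have been estimated, deflation gives the second eigenvector (again an illustration with an assumed example matrix, not production code):

A = [4 1 0; 1 3 1; 0 1 2];           % symmetric example matrix
v1 = randn(3,1);
for k = 1:200, v1 = A*v1; v1 = v1/norm(v1); end
lambda1 = v1'*A*v1;                  % Rayleigh quotient (v1 has unit norm)
B = A - lambda1*(v1*v1');            % deflated matrix A^(-v1)
v2 = randn(3,1);
for k = 1:200, v2 = B*v2; v2 = v2/norm(v2); end
lambda2 = v2'*A*v2;
disp([lambda1 lambda2])
disp(sort(eig(A), 'descend')')       % compare with the two largest eigenvalues from eig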

21 Computation of Eigenvectors and Eigenvalues

• Several more advanced algorithms exist – we skip the algorithms in this course.
• These algorithms have ready implementations in a famous library called LAPACK – written in Fortran 90, with C/C++/MATLAB interfaces.

22 Singular Value Decomposition (SVD)

23 Singular value Decomposition

• For any m x n matrix A, the following decomposition always exists:

A  USVT , A  Rmn , T T mm U U  UU  Im,U R , T T nn Diagonal matrix with non- V V  VV  In,V  R , negative entries on the mn diagonal – called singular S R values.

Columns of U are the eigenvecto rs of AAT (called the left singular vectors). Columns of V are the eigenvecto rs of AT A (called the right singular vectors). The non - zero singular values are the positive square roots of 24 non - zero eigenvalue s of AAT or AT A. Singular value Decomposition
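
A quick numerical illustration of the definition (sketch with a hypothetical small matrix):

A = randn(5,3);                     % arbitrary 5 x 3 example
[U, S, V] = svd(A);                 % full SVD: U is 5x5, S is 5x3, V is 3x3
disp(norm(A - U*S*V'))              % should be close to 0
disp(norm(U'*U - eye(5)))           % columns of U are orthonormal
disp(norm(V'*V - eye(3)))           % columns of V are orthonormal
disp(diag(S)')                      % singular values: non-negative, in decreasing order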

24 Singular value Decomposition

• For any m x n real matrix A, the SVD consists of matrices U, S, V which are always real – this is unlike the eigenvectors and eigenvalues of A, which may be complex even if A is real.

• The singular values are always non-negative, even though the eigenvalues may be negative.

• While writing the SVD, the following convention is assumed, and the left and right singular vectors are also arranged accordingly:

1  2  ...  m1  m 25 Singular value Decomposition

• If only r < min(m,n) singular values are non-zero, the SVD can be represented in reduced form as follows:

A = USV^T,  A ∈ R^(m x n),  U ∈ R^(m x r),  V ∈ R^(n x r),  S ∈ R^(r x r)

26 Singular value Decomposition

A = USV^T = Σ_{i=1}^{r} Sii ui vi^T

This m by n matrix ui vi^T is the outer product of the column vector ui and the transpose of the column vector vi. It has rank 1. Thus A is a weighted summation of r rank-1 matrices.

Note: ui and vi are the i-th column of matrix U and V respectively.
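
The rank-1 expansion can be checked directly in MATLAB (an illustrative sketch with a random matrix):

A = randn(5,3);
[U, S, V] = svd(A);
B = zeros(size(A));
r = rank(A);                             % for a random 5 x 3 matrix, r = 3 with probability 1
for i = 1:r
    B = B + S(i,i) * U(:,i) * V(:,i)';   % add the i-th rank-1 term
end
disp(norm(A - B))                        % should be close to 0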

27 Singular value decomposition

A  USVT AAT  (USVT )(USVT )T  USVT VSUT  US2UT Thus, the left singular vectors of A (i.e. columns of U) are the eigenvecto rs of AAT . The singular values of A (i.e. diagonal elements of S) are square - roots of the eigenvalue s of AAT . AT A  (USVT )T (USVT )  VSUTUSVT  VS2VT Thus, the right singular vectors of A (i.e. columns of V) are the eigenvecto rs of AT A. The singular values of A (i.e. diagonal elements of S) are square - roots of the eigenvalue s of AT A.

28 Application: SVD of Natural Images

• An image is a 2D array – each entry contains a grayscale value. The image can be treated as a matrix.
• It has been observed that for many image matrices, the singular values undergo rapid decay (note: they are always non-negative).
• An image can be approximated with the k largest singular values and their corresponding singular vectors:

A ≈ Σ_{i=1}^{k} Sii ui vi^T,  k << min(m,n)

29 Left to right, top to bottom: Reconstructed image using the first i = 1, 2, 3, 5, 10, 25, 50, 100, 150 singular values and singular vectors. Last image: original.

30 Left to right, top to bottom, we display ui vi^T where i = 1, 2, 3, 5, 10, 25, 50, 100, 150. Note: the spatial frequencies increase as the singular values decrease. Note each image is independently re-scaled to the 0-1 range for display purposes.
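
A sketch of such a rank-k image approximation in MATLAB. The file name 'cameraman.tif' is only an example (it ships with the Image Processing Toolbox, which imshow also requires); substitute any grayscale image matrix:

I = double(imread('cameraman.tif'));     % grayscale image treated as a matrix
[U, S, V] = svd(I);
k = 25;                                  % number of singular values/vectors to keep
Ik = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';  % rank-k approximation of the image
imshow([I, Ik], []);                     % original and approximation side by side
title(sprintf('Rank-%d approximation', k));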

31 SVD: Use in Image Compression

• Instead of storing mn intensity values, we store (n+m+1)r intensity values, where r is the number of stored singular values (or singular vectors). The remaining min(m,n) − r singular values (and hence their singular vectors) are effectively set to 0.

• This is called storing a low-rank (rank-r) approximation of the image.

32 Properties of SVD: Best low-rank reconstruction

• SVD gives us the best possible rank-r approximation to any matrix (it may or may not be a natural image matrix).
• In other words, the solution to the following optimization problem:

min over Â:  ||A − Â||_F^2  such that rank(Â) = r,  r < min(m,n)

is given using the SVD of A as follows:

Â = Σ_{i=1}^{r} Sii ui vi^T,  where A = USV^T

Note: We are using the singular vectors corresponding to the r largest singular values.

This property of the SVD is called the Eckart-Young theorem.

33 Properties of SVD: Best low-rank reconstruction

min over Â:  ||A − Â||_F^2  such that rank(Â) = r,  r < min(m,n)

||A||_F = sqrt( Σ_{i=1}^{m} Σ_{j=1}^{n} Aij^2 )

Frobenius norm of the matrix (fancy way of saying you square all matrix values, add them up, and then take the square root!)

Why? Note:  ||A − Â||_F^2 = σ_{r+1}^2 + σ_{r+2}^2 + ... + σ_n^2

34 Geometric interpretation: Eckart-Young theorem

• The best linear (1D) approximation to an ellipse is its longest axis.
• The best 2D approximation to an ellipsoid in 3D is the ellipse spanned by the longest and second-longest axes.
• And so on!
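
The error formula above is easy to check numerically (sketch with an arbitrary random matrix):

A = randn(6,4);  r = 2;
[U, S, V] = svd(A);
Ar = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';    % best rank-r approximation of A
s = diag(S);
disp(norm(A - Ar, 'fro'))                  % Frobenius norm of the approximation error
disp(sqrt(sum(s(r+1:end).^2)))             % sqrt(sigma_{r+1}^2 + ...), should match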

35 Properties of SVD: Singularity

• A square matrix A is non-singular (i.e. invertible or full-rank) if and only if all its singular values are non-zero.

• The ratio σ1/σn tells you how close A is to being singular. This ratio is called the condition number of A. The larger the condition number, the closer the matrix is to being singular.

36 Properties of SVD: Rank, Inverse, Determinant

• The rank of a rectangular matrix A is equal to the number of non-zero singular values. Note that rank(A) = rank(S).
• SVD can be used to compute the inverse of a square matrix:

A = USV^T,  A ∈ R^(n x n)  ⟹  A^(-1) = VS^(-1)U^T

• The absolute value of the determinant of square matrix A is equal to the product of its singular values:

|det(A)| = |det(USV^T)| = |det(U) det(S) det(V^T)| = det(S) = Π_{i=1}^{n} σi

37 Properties of SVD: Pseudo-inverse

• SVD can be used to compute the pseudo-inverse of a rectangular matrix:

A = USV^T,  A ∈ R^(m x n),
A^+ = V S0^(-1) U^T,  where S0^(-1)(i,i) = 1/S(i,i) for all non-zero singular values and S0^(-1)(i,i) = 0 otherwise.
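
A sketch of this pseudo-inverse construction, compared against MATLAB's built-in pinv (the tolerance value is my own arbitrary choice):

A = randn(5,3);
[U, S, V] = svd(A);
tol = 1e-10;                                % threshold below which a singular value is treated as zero
Sinv = zeros(size(S'));                     % n x m matrix
for i = 1:min(size(A))
    if S(i,i) > tol
        Sinv(i,i) = 1 / S(i,i);
    end
end
Aplus = V * Sinv * U';                      % pseudo-inverse built from the SVD
disp(norm(Aplus - pinv(A)))                 % should be close to 0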

38 Properties of SVD: Frobenius norm

• The Frobenius norm of a matrix is equal to the square-root of the sum of the squares of its singular values:

||A||_F^2 = Σ_{i=1}^{m} Σ_{j=1}^{n} Aij^2 = trace(A^T A) = trace((USV^T)^T (USV^T))
= trace(VS^2 V^T) = trace(V^T V S^2) = trace(S^2) = Σ_i σi^2

39 Solution to equations of the form Av = 0

• Remember A is an m by n matrix.
• The equation Av = 0 may have a solution apart from the zero vector (also called the trivial solution).
• To obtain the solution, we first compute the SVD of A, giving A = USV^T. The solution is proportional to the column of V corresponding to the zero singular value.

40 Solution to equations of the form Av = 0

• To obtain the solution, we first compute the SVD of A, giving A = USV^T. The solution is proportional to the column of V corresponding to the zero singular value.

Let vk be the right singular vector of A corresponding to the 0 singular value.

A vk = USV^T vk = Σ_{i=1}^{r} σi ui vi^T vk = Σ_{i=1, i≠k}^{r} σi ui (vi^T vk) + σk uk (vk^T vk) = 0 + 0 = 0

(vi^T vk = 0 by orthonormality of V;  σk = 0)

41 Solution to equations of the form Av = 0

• If more than one singular value is 0, then the corresponding right singular vectors (and their scaled versions) are also solutions.
• In fact, all such right singular vectors with zero singular values span the null-space of A.
• Equations of this form arise in several applications – for example, in computer vision (camera calibration, image alignment).
• In some cases, Av = 0 has no exact solution. Instead, we seek to find a solution w such that Aw is as small as possible in magnitude. In this case, w is given by the right singular vector of A corresponding to the smallest singular value.
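
A sketch of both cases in MATLAB (with synthetic matrices of my own construction): an exact null vector when A is rank-deficient, and the unit vector minimizing ||Aw|| in the general case.

% Case 1: A has a non-trivial null space (rank-deficient by construction)
A = randn(4,3);  A(:,3) = A(:,1) + A(:,2);   % third column is dependent, so rank(A) = 2
[U, S, V] = svd(A);
v = V(:,end);                                % right singular vector for the (near-)zero singular value
disp(norm(A*v))                              % close to 0

% Case 2: no exact solution; the unit vector minimizing ||B*w|| is the last right singular vector
B = randn(4,3);
[U2, S2, V2] = svd(B);
w = V2(:,end);
disp(norm(B*w))                              % equals the smallest singular value ...
disp(S2(3,3))                                % ... S2(3,3)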

42 Orthogonal Procrustes problem

• Read the notes from: http://www.cse.iitb.ac.in/~ajitvr/CS740_Fall2014/procrustes.pdf

43 Geometric interpretation of the SVD

• Any m x n matrix A transforms a sphere Q of unit radius (called the unit sphere) in R^n into a hyperellipse in R^m (assume m >= n).

[Figure: the unit sphere Q and its image AQ, a hyperellipse]

44 Geometric interpretation of the SVD

• Assume A has full rank for now.
• The singular values of A are the lengths of the n principal semi-axes of the hyperellipse. The lengths are thus σ1, σ2, ..., σn.
• The n left singular vectors of A are the directions u1, u2, ..., un (all unit-vectors) aligned with the n semi-axes of the hyperellipse.
• The n right singular vectors of A are the directions v1, v2, ..., vn (all unit-vectors) in the sphere Q, which the matrix A transforms into the semi-axes of the hyperellipse, i.e.

∀i,  A vi = σi ui

45 Geometric interpretation of the SVD

• Expanding on the previous equations, we get the reduced form of the SVD:

A = USV^T, where U is an m x n matrix with orthonormal columns, S is an n x n diagonal matrix, and V is an n x n orthonormal matrix.

46 Geometric interpretation of the SVD

• But why does A transform the sphere into a hyperellipse?
• This is because A = USV^T.
• V^T transforms the sphere into another (rotated/reflected) sphere.
• S stretches the sphere into a hyperellipse whose semi-axes coincide with the coordinate axes.
• U rotates/reflects the hyperellipse without affecting its shape.
• As any matrix A has an SVD decomposition, it will always transform the sphere into a hyperellipse.
• If A does not have full rank, then some of the semi-axes of the hyperellipse will have length 0!

47 Computation of the SVD

• We will not explore algorithms to compute the SVD of a matrix in this course.
• SVD routines exist in the LAPACK library and are interfaced through the following MATLAB functions:

s = svd(X) returns a vector of singular values.

[U,S,V] = svd(X) produces a diagonal matrix S of the same dimension as X, with nonnegative diagonal elements in decreasing order, and unitary matrices U and V so that X = U*S*V'.

[U,S,V] = svd(X,0) produces the "economy size" decomposition. If X is m-by-n with m > n, then svd computes only the first n columns of U and S is n-by-n.

[U,S,V] = svd(X,'econ') also produces the "economy size" decomposition. If X is m- by-n with m >= n, it is equivalent to svd(X,0). For m < n, only the first m columns of V are computed and S is m-by-m.

s = svds(A,k) computes the k largest singular values and associated singular vectors of matrix A.

48 SVD Uniqueness

• If the singular values of a matrix are all distinct, the SVD is unique – up to a multiplication of the corresponding columns of U and V by a sign factor (or a phase factor e^(iθ) for complex matrices).
• Why?

A = Σ_{i=1}^{r} Sii ui vi^T = S11 u1 v1^T + S22 u2 v2^T + ... + Srr ur vr^T
  = S11 (−u1)(−v1)^T + S22 u2 v2^T + ... + Srr (−ur)(−vr)^T

49 SVD Uniqueness

• A matrix is said to have degenerate singular values if it has the same singular value for 2 or more pairs of left and right singular vectors.
• In such a case, any normalized linear combination of the left (right) singular vectors is a valid left (right) singular vector for that singular value.

50 Any other applications of SVD?

• Face recognition – the popular eigenfaces algorithm. • Point matching: Consider two sets of points, such that one point set is obtained by an unknown rotation of the other. Determine the rotation! • Structure from motion: given a sequence of images of a object undergoing rotational motion, determine the 3D shape of the object as well as the rotation at every time instant!
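
As an illustration of the point-matching application, here is a sketch of the orthogonal Procrustes solution on hypothetical synthetic data (see the linked notes for the derivation): given point sets P and Q = R*P, the rotation can be recovered from the SVD of the cross-covariance matrix Q*P'.

theta = pi/5;  R_true = [cos(theta) -sin(theta); sin(theta) cos(theta)];
P = randn(2, 20);                      % 20 random 2D points (as columns)
Q = R_true * P;                        % rotated copy of the same points
[U, S, V] = svd(Q * P');               % SVD of the cross-covariance matrix
R_est = U * V';                        % orthogonal Procrustes estimate of the rotation
disp(norm(R_est - R_true))             % should be close to 0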

51 References

• Scientific Computing, Michael Heath
• Numerical Linear Algebra, Trefethen and Bau
• http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
• http://en.wikipedia.org/wiki/Power_iteration
• http://en.wikipedia.org/wiki/Singular_value_decomposition
