MSc in Systems and Control Engineering

Lab component for the Mathematical Techniques module: Numerical Decompositions

by

G.D. Halikias and J. Kiskiras

London, October 2006

1 Introduction

The purpose of this lab is twofold: The first aim is to familiarize you with the main linear algebraic Matlab functions related to the MSc module “Mathematical Techniques”, with special emphasis on eigenvalues and eigenvectors, spectral decomposition of square matrices, the Jordan and Schur form of a matrix, the singular value decomposition and some of its applications (computation of rank and nullity of a matrix, generation of orthogonal bases of its range and kernel, solution of linear equations, reduced-rank matrix approximations). This material, along with some basic theory and simple computational exercises, is contained in sections 2 and 3. You can go through these exercises at your own pace (during or outside the formal lab sessions); for additional theory background consult your notes and the suggested textbooks. For difficulties with Matlab programming consult the lab demonstrators and use liberally Matlab’s on-line help functions.

The exercises included in section 4 are more demanding, both analytically and computationally. Do not attempt these before you have become familiar with the basics of Matlab programming and the exercises in sections 2 and 3. The exercises include:

1. Programming a function that reduces a matrix into row echelon form using a sequence of elementary transformations, and using your programme to solve linear equations and calculate the numerical rank and row-span of a matrix.

2. Verifying “Sylvester’s law of inertia” via simple numerical experiments.

3. Verifying the Perron/Frobenius theorem for non-negative matrices and calculating numerically the dominant eigenvalue and eigenvector via an iterative procedure (“Power method”).

4. Implementing least-squares estimation of linear models by fitting the “best” straight line to a set of points lying on the plane.

Note that all exercises form part of the formal assessment of the module. Submit your report by the set deadline; this should include answers to all questions, Matlab code, plots and analytical calculations, whenever needed, to support your arguments. Do not include lengthy pieces of theory in your report!

2 Eigenvalues and eigenvectors

The term “eigenvalue” is a partial translation of the German “Eigenwert”. A complete translation would be something like “own value” or “characteristic value”, but these are rarely used. As an application, eigenvalues can correspond to frequencies of vibration, or critical values of stability parameters, or energy levels of atoms. To motivate the definitions of eigenvalue and eigenvector and to give a simple example of one of their applications, we begin by considering the 2×2 matrix

A = [3 3; 1 5]

If we set x1 = [3 −1]′ and x2 = [1 1]′, then it is easy to verify that

Ax1 = [3 3; 1 5][3 −1]′ = [6 −2]′ = 2[3 −1]′    and    Ax2 = [3 3; 1 5][1 1]′ = [6 6]′ = 6[1 1]′

In other words the linear transformation A simply multiplies the vector x1 by a factor of two and the vector x2 by a factor of six. We call the number two an eigenvalue of A corresponding to the eigenvector x1 and the number six an eigenvalue of the matrix A corresponding to the eigenvector x2, respectively.

Notice that in this case x1 and x2 are linearly independent and hence they form a basis of R2. Then it follows that every x ∈ R2 can be written uniquely in the form x = c1x1 + c2x2, where c1, c2 ∈ R, so that

Ax = A(c1x1 + c2x2) = c1Ax1 + c2Ax2 = 2c1x1 + 6c2x2

and more generally

A^k x = 2^k c1x1 + 6^k c2x2

Thus, it is obvious that the knowledge of the eigenvalues and eigenvectors of a matrix can be used to simplify computations with the matrix.
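These calculations are easy to reproduce numerically; the following short MATLAB fragment (the test vector x and the power k are arbitrary illustrative choices) checks the two eigenpairs and the formula for A^k x:

A  = [3 3; 1 5];
x1 = [3; -1];                 % eigenvector for eigenvalue 2
x2 = [1;  1];                 % eigenvector for eigenvalue 6
disp(A*x1 - 2*x1)             % zero vector
disp(A*x2 - 6*x2)             % zero vector
x = [5; 7];                   % an arbitrary vector
c = [x1 x2]\x;                % coordinates c1, c2 of x in the basis {x1, x2}
k = 4;
disp(A^k*x - (2^k*c(1)*x1 + 6^k*c(2)*x2))   % again (numerically) zero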

Problem 2.1. The fundamental algebraic eigen-problem is the determination of those values of λ for which the set of n homogeneous linear equations in n unknowns Ax = λx (1) has a non-trivial solution.

Equation (1) may be written in the form

(λI − A)x = 0 and for almost all λ this set of equations has the unique solution x = 0. Non-trivial solutions arise if and only if the matrix (λI −A) is singular, that is det(λI − A) = 0

Definition 2.1. The equation det(λI − A) = 0 is called the characteristic equation of A, while the polynomial p(λ) := det(λI − A) is called the characteristic polynomial of A.

Remark 2.1. Expanding the determinant in the above definition, the characteristic polynomial can be written as

det(λI − A) = α0 + α1λ + · · · + αn−1λ^(n−1) + λ^n = (λ − λ1)(λ − λ2) · · · (λ − λn)      (2)

Since the coefficient of λ^n is not zero, the characteristic equation always has n roots (i.e. the eigenvalues of A), say λ1, λ2, . . . , λn, which may be real or complex, of any multiplicities up to n.
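For illustration, the coefficients of the characteristic polynomial and its roots can be obtained directly in MATLAB with the functions poly and roots (the matrix below is an arbitrary random example):

n = 4;
A = randn(n);                 % arbitrary test matrix
p = poly(A);                  % coefficients of det(lambda*I - A), highest power first
disp(sort(roots(p)))          % roots of the characteristic polynomial ...
disp(sort(eig(A)))            % ... agree with the eigenvalues of A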

In general, if the matrix (λI − A) is of rank less than (n − 1) then there will be more than one linearly independent vector satisfying (λI − A)x = 0. It is evident that if x is a solution to (λI − A)x = 0, then kx is also a solution for every scalar k, so that even when (λI − A) is of rank n − 1, the eigenvector corresponding to λ is arbitrary to the extent of a constant multiplier. This gives rise to the following definition.

Definition 2.2. For any eigenvalue λ of an n-by-n matrix A, the set

ker(λI − A) = {x ∈ Rn : (λI − A)x = 0}

is the nullspace of the matrix λI − A, a subspace of Rn, also called the eigenspace corresponding to the eigenvalue λ.

Example 2.1. Find the eigenvalues of A = [3 3; 1 5] and find all eigenvectors.

The characteristic equation of A is

det(λI − A) = 0 ⇒ det([λ − 3, −3; −1, λ − 5]) = 0 ⇒ (λ − 3)(λ − 5) − 3 = 0 ⇒ λ² − 8λ + 12 = 0 ⇒ (λ − 2)(λ − 6) = 0

Hence the eigenvalues of A are λ1 = 2 and λ2 = 6, with corresponding eigenvectors

x1 = [3 −1]′ and x2 = [1 1]′

respectively. The eigenspace corresponding to the eigenvalue 2 is

{x ∈ R2 : [1 3; 1 3]x = 0} = {c[3 −1]′ : c ∈ R}

The eigenspace corresponding to the eigenvalue 6 is

{x ∈ R2 : [−3 3; 1 −1]x = 0} = {c[1 1]′ : c ∈ R}.

According to definition 2.1 and remark 2.1, if λi is an eigenvalue of a matrix A, then it is a root of the characteristic polynomial p(λ) := det(λI − A) = (λ − λ1)(λ − λ2) · · · (λ − λn). The multiplicity of this root, i.e. the number of times the factor (λ − λi) appears in equation (2), is called the algebraic multiplicity of the eigenvalue λi. If the algebraic multiplicity of an eigenvalue λi is equal to 1 then λi is said to be simple.

Proposition 2.1. Any n × n matrix of rank n − r (1 ≤ r ≤ n) has a zero eigenvalue (λi = 0) of multiplicity m ≥ r.

Further, there is another notion of multiplicity of an eigenvalue.

Definition 2.3. The dimension of the eigenspace ker(λI − A) is called the geometric multiplicity of the eigenvalue λ.

Essentially, the above definition suggests that the geometric multiplicity of an eigenvalue, say λi, is equal to the number of linearly independent eigenvectors associated with λi.

Proposition 2.2. Geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.

In other words, if the n × n matrix A has an eigenvalue λi and λiI − A has rank n − r, then λi has algebraic multiplicity m ≥ r.

Definition 2.4. If the algebraic multiplicity of an eigenvalue λi exceeds its geometric multiplicity, then λi is said to be a defective eigenvalue. Further, a matrix with a defective eigenvalue is referred to as a defective matrix.

Example 2.2. Consider the matrix

A = [17 −10 −5; 45 −28 −15; −30 20 12]

In order to find the eigenvalues of A, we need to find the roots of

det([λ − 17, 10, 5; −45, λ + 28, 15; 30, −20, λ − 12]) = 0

or equivalently, (λ + 3)(λ − 2)² = 0. The eigenvalues are therefore λ1 = −3 and λ2 = 2. An eigenvector corresponding to the eigenvalue −3 is a solution of the system (−3I − A)x = 0, i.e.

(3I + A)x = 0 ⇒ [20 −10 −5; 45 −25 −15; −30 20 15]x = 0, with solution x1 = [1 3 −2]′

An eigenvector corresponding to the eigenvalue 2 is a solution of the system (2I − A)x = 0, i.e.

[−15 10 5; −45 30 15; 30 −20 −10]x = 0, with solutions x2 = [1 0 3]′ and x3 = [2 3 0]′

Note that the eigenspace corresponding to the eigenvalue -3 is a line passing through the origin, while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin. Note also that the eigenvectors x1, x2, x3 are linearly independent and so form a basis for R3.

Example 2.3. Consider the matrix

A = [2 −1 0; 1 0 0; 0 0 3]

In order to find the eigenvalues of A, we need to find the roots of

det([λ − 2, 1, 0; −1, λ, 0; 0, 0, λ − 3]) = 0

in other words, (λ − 3)(λ − 1)² = 0. The eigenvalues are therefore λ1 = 3 and λ2 = 1. An eigenvector corresponding to the eigenvalue 3 is a solution of

(3I − A)x = 0 ⇒ [1 1 0; −1 3 0; 0 0 0]x = 0, with solution x1 = [0 0 1]′

An eigenvector corresponding to the eigenvalue 1 is a solution of

(I − A)x = 0 ⇒ [−1 1 0; −1 1 0; 0 0 −2]x = 0, with solution x2 = [1 1 0]′

Note that the eigenvalue 1 has algebraic multiplicity 2 but only one linearly independent eigenvector (geometric multiplicity 1), so this eigenvalue, and hence the matrix A, is defective.
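For illustration, the algebraic and geometric multiplicities in examples 2.2 and 2.3 can be checked with the MATLAB functions eig and null:

A1 = [17 -10 -5; 45 -28 -15; -30 20 12];    % the matrix of example 2.2
A2 = [2 -1 0; 1 0 0; 0 0 3];                % the matrix of example 2.3
disp(eig(A1)')                              % eigenvalues -3, 2, 2
disp(size(null(A1 - 2*eye(3)),2))           % geometric multiplicity of 2 is 2 (non-defective)
disp(eig(A2)')                              % eigenvalues 3, 1, 1
disp(size(null(A2 - eye(3)),2))             % geometric multiplicity of 1 is only 1 (defective)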

Corollary 2.1. In the light of definition 2.4, we say that a matrix is non-defective if and only if it is diagonalisable.

2.1 Spectral decomposition

If T is an arbitrary nonsingular (i.e. det(T) ≠ 0) matrix of compatible dimensions with A, the transformations of the form A = TĀT⁻¹, Ā = T⁻¹AT are known as similarity transformations. It is a fact that the eigenvalues remain invariant under such transformations and that if x is an eigenvector of A, then the corresponding eigenvector of Ā will be T⁻¹x. When A has a complete set of eigenvectors (i.e. when A is n × n and has n linearly independent eigenvectors) we can choose T to be the so-called modal matrix formed by these n eigenvectors, so that Λ = T⁻¹AT is diagonal. Further, Λ is called the spectral matrix. Unfortunately, as stated in the corollary of the previous section, not all matrices can be diagonalised by similarity transformations: Matrices of this type do not have a complete set of eigenvectors and are said to be defective (see definition 2.4).

Diagonalisation process. Suppose that A ∈ Rn×n.

1. Determine whether we can find n linearly independent eigenvectors.

2. If not, then A is not diagonalisable. If we can, then write

V = [x1 x2 · · · xn] and Λ = diag(λ1, λ2, . . . , λn)

where λ1, λ2, . . . , λn ∈ R are the eigenvalues of A and x1, x2, . . . , xn ∈ Rn are their corresponding eigenvectors. Then, V⁻¹AV = Λ. Formally, we say that A = VΛV⁻¹ is the eigenvalue (or spectral) decomposition of A.

Example 2.4. Consider the matrix

A = [17 −10 −5; 45 −28 −15; −30 20 12]

as in example 2.2. We have V⁻¹AV = Λ, where

V = [1 1 2; 3 0 3; −2 3 0] and Λ = [−3 0 0; 0 2 0; 0 0 2]

Exercise 2.1. Verify this result using the Matlab function [V,D]=eig(A).

Exercise 2.2. Compute now R := T⁻¹VΛV⁻¹T, where T is a random non-singular matrix. Find the eigenvalues and eigenvectors of R and determine how they are related to the eigenpairs of A. Hint: Use the MATLAB function randn.

Exercise 2.3. Generate the eigenvalues of several square real random matrices. Observe that if the matrix has complex eigenvalues, these always appear in conjugate pairs. Can you explain this? (Hint: If a polynomial with real coefficients has complex roots these appear in conjugate pairs).

Exercise 2.4. Find the eigenvalues and eigenvectors of the following n × n matrices:

1. A = [s 1 0 ··· 0; 0 s 1 ··· 0; ... ; 0 ··· 0 s 1; 0 ··· 0 0 s]  (s on the diagonal, ones on the superdiagonal, zeros elsewhere)

2. A = [s 0 0 ··· 0; 1 s 0 ··· 0; ... ; 0 ··· 1 s 0; 0 ··· 0 1 s]  (s on the diagonal, ones on the subdiagonal, zeros elsewhere)

and the 4 × 4 matrices

3. A = [s 1 0 0; 0 s 0 0; 0 0 s 1; 0 0 0 s]

4. A = [s 0 0 0; 0 s 0 0; 0 0 s 1; 0 0 0 s]

Briefly discuss your results and state whether any of the above matrices is defective. Hint: The matrix in (1) can be constructed in MATLAB as follows:

s=sym('s')
n=7;
for i=1:n
    for j=1:n
        if i==j
            A(i,j)=s;            % diagonal entries
        elseif j>i & j<i+2
            A(i,j)=1;            % superdiagonal entries
        else
            A(i,j)=0;            % all remaining entries
        end
    end
end

Another way of constructing A, in a more compact form, is the following:

A=diag(s*ones(n,1)) +diag(ones(n-1,1),1)

Exercise 2.5. The trace of a matrix A is defined as the sum of its diagonal elements. Verify numerically that the trace of a matrix is equal to the sum of its eigenvalues. Can you prove this? (Hint: Prove this first for a 2 × 2 matrix and then try to generalize your result). Also verify (and then prove) that if

p(λ) = α0 + α1λ + · · · + αn−1λ^(n−1) + λ^n

is the characteristic polynomial of an n × n matrix A, then αn−1 = −trace(A) and α0 = (−1)^n det(A). Again work first with a 2 × 2 matrix and then generalise your result.

Exercise 2.6. A real square matrix A is called symmetric if A = A′. Verify (and show) that (i) every symmetric matrix has real eigenvalues; (ii) if the eigenvalues are simple, the eigenvectors of a symmetric matrix are mutually orthogonal (i.e. xi′xj = 0 for i ≠ j).

2.2 Jordan and Schur decompositions

As previously discussed, the eigenvalue decomposition attempts to find a diagonal matrix Λ and a nonsingular matrix V so that

A = VΛV⁻¹

A theoretical difficulty is that the above decomposition does not always exist. A more general decomposition is the Jordan decomposition

A = XJX⁻¹

If A is not defective, then it is the same as the eigenvalue decomposition: the columns of X are the eigenvectors and J = Λ is diagonal. But if A is defective, then X consists of eigenvectors and generalised eigenvectors. The matrix J has the eigenvalues on the diagonal and ones on the superdiagonal in positions corresponding to the columns of X that are not ordinary eigenvectors. The rest of the elements of J are zero.

Theorem 2.1 (Jordan Decomposition). If A ∈ Cn×n, then there exists a nonsingular X ∈ Cn×n such that X⁻¹AX = diag(J1, J2, . . . , Jt), where each

Ji = [λi 1 0 ··· 0; 0 λi 1 ··· 0; ... ; 0 ··· 0 λi 1; 0 ··· 0 0 λi]

is an mi-by-mi Jordan block (λi on the diagonal, ones on the superdiagonal, zeros elsewhere) and m1 + m2 + · · · + mt = n.

The Ji are referred to as Jordan blocks. The number and dimensions of the Jordan blocks associated with each distinct eigenvalue is unique, although their ordering along the diagonal is not.

Example 2.5 (Jordan decomposition). Typing in MATLAB

A=gallery(5);
[V,J]=jordan(A);

produces

V =

      0     -4     11     -9      1
    -84    243   -230     70      0
    568  -1710   1717   -575      0
  -3892  11675 -11674   3891      0
  -1024   3072  -3072   1024      0

J =

      0      1      0      0      0
      0      0      1      0      0
      0      0      0      1      0
      0      0      0      0      1
      0      0      0      0      0

where VJV⁻¹ is the Jordan decomposition of A, involving only one Jordan block.

A numerically satisfactory alternative to the Jordan decomposition is provided by the Schur form. Any matrix can be transformed to upper triangular form by a unitary similarity transformation

B = T ∗AT

The eigenvalues of A are on the diagonal of its Schur form B.

Example 2.6 (Schur decomposition). Typing in MATLAB

A=gallery(3);
[T,B]=schur(A);

produces

A =

   -149    -50   -154
    537    180    546
    -27     -9    -25

T =

    0.3162   -0.6529    0.6882
   -0.9487   -0.2176    0.2294
    0.0000    0.7255    0.6882

B =

    1.0000   -7.1119 -815.8706
         0    2.0000  -55.0236
         0         0    3.0000

where TBT′ is the Schur decomposition of A and T′T = TT′ = I.

2.3 Summary

In abstract linear algebra terms, eigenvalues are relevant if a square, n-by-n matrix A is thought of as the representation of a mapping of an n-dimensional space onto itself. We try to find a basis for the space so that the matrix becomes diagonal. This basis might be complex even if A is real. In fact, if the eigenvectors are not linearly independent, such a basis does not even exist. If this is the case, then it is always possible to construct a similarity transformation so that A takes a pseudo-diagonal (Jordan) form.

3 Singular value decomposition

For non-square matrices, the analogue of the eigenvalue (spectral) decomposition is the singular value decomposition. Both decompositions are widely used in order to solve linear systems of the form Ax = b. However, the singular value decomposition (SVD) reveals the rank of the original matrix A (i.e. the number of linearly independent rows or columns of A) and hence it is used as a computational tool to determine whether a matrix is rank deficient or has full rank.

Theorem 3.1 (SVD). If A ∈ Fm×n (F = R or C), then there exist unitary matrices

U = [u1, . . . , um] ∈ Fm×m and V = [v1, . . . , vn] ∈ Fn×n

such that

A = UΣV∗ = U [Σr 0; 0 0] V∗ = σ1u1v1∗ + σ2u2v2∗ + · · · + σrurvr∗

Further,

Im(A) = Im([u1, . . . , ur]) and Ker(A) = Im([vr+1, . . . , vn])

where Σr = diag(σ1, . . . , σr), with σ1 ≥ σ2 ≥ · · · ≥ σr > 0 and r = rank(A) ≤ min(m, n). Here, ui and vi denote the i-th columns of the matrices U and V, respectively.

The above theorem essentially states that the rank of A is the number of nonzero singular values (note that Σ is an m × n diagonal matrix with non-negative entries). Further, if the rank of A is r, then the first r columns of U form a basis for the range of A and the last n − r columns of V form a basis for the null space of A.

Example 3.1. Typing in MATLAB

m=4; n=5;
A = randn(m,n)
[U,S,V] = svd(A)

produces the random 4-by-5 matrix

A =

   -0.5062   -1.1245   -1.2559   -0.5723    2.3726
    1.6197    1.7357   -0.2135   -0.9776    0.2293
    0.0809    1.9375   -0.1989   -0.4468   -0.2666
   -1.0811    1.6351    0.3075    1.0821    0.7017

with singular value decomposition

U =

    0.6435   -0.7552    0.0413    0.1178
   -0.5203   -0.5420   -0.4622   -0.4710
   -0.5036   -0.2983    0.0964    0.8050
   -0.2481   -0.2165    0.8805   -0.3409

S =

    3.5025         0         0         0         0
         0    2.6348         0         0         0
         0         0    2.4005         0         0
         0         0         0    0.6662         0

V =

   -0.2687   -0.1085   -0.7139   -0.5837    0.2564
   -0.8589   -0.3884    0.3241    0.0786    0.0169
   -0.1922    0.4012    0.1243   -0.4688   -0.7529
    0.0277    0.3269    0.5574   -0.5036    0.5728
    0.3905   -0.7547    0.2433   -0.4239   -0.1977

where U′U = UU′ = I4 and V′V = VV′ = I5 (check!). From matrix S we conclude that A has four nonzero singular values and hence the rank of A must be equal to four. Indeed, we verify in MATLAB that

>> r=rank(A)

r =

4

Further, a basis of the null space of A is given by the last column of V (n − r = 1), i.e.

>> V_perp = V(:,5)

V_perp =

    0.2564
    0.0169
   -0.7529
    0.5728
   -0.1977

>> A*V_perp

ans =

1.0e-015 *

    0.1110
   -0.5135
   -0.0139
    0.1110

which is (numerically) equal to zero.

Remark 3.1. The (non-zero) singular values of a matrix A are the square roots of the (non-zero) eigenvalues of A′A (or AA′). Hence multiplication by an orthogonal (or unitary) matrix, either from the left and/or from the right, preserves the singular values.
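For illustration, the following fragment checks both facts for an arbitrary random matrix (a random orthogonal matrix Q is obtained here from a QR factorisation):

A = randn(4,3);                             % arbitrary test matrix
disp(svd(A)')                               % singular values of A
disp(sqrt(sort(eig(A'*A),'descend'))')      % square roots of the eigenvalues of A'A
[Q,R] = qr(randn(4));                       % Q: random 4-by-4 orthogonal matrix
disp(svd(Q*A)')                             % same singular values as A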

Remark 3.2. The matrix A is considered as the representation of a linear mapping from the vector space Fn to the vector space Fm. Having in mind the dyadic form from the theorem above and the fact that vi∗vj = δij (since V is unitary), it follows that

Avj = (σ1u1v1∗ + σ2u2v2∗ + · · · + σrurvr∗)vj = σjuj

So, vj is mapped into σjuj by A. Moreover,

Avj = σjuj ⇒ A∗Avj = σj²vj and AA∗uj = σj²uj

which reveals that σj² is an eigenvalue of AA∗ (or A∗A), that vj is an eigenvector of A∗A and that uj is an eigenvector of AA∗. In a more compact form:

AA∗ = U(ΣΣ∗)U∗ and A∗A = V(Σ∗Σ)V∗
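Again, these relations are easy to verify numerically; a short illustrative fragment for an arbitrary random matrix:

A = randn(3,5);                             % arbitrary test matrix
[U,S,V] = svd(A);
j = 2;                                      % any index between 1 and rank(A)
disp(norm(A*V(:,j) - S(j,j)*U(:,j)))        % (numerically) zero
disp(norm(A*A' - U*(S*S')*U'))              % (numerically) zero
disp(norm(A'*A - V*(S'*S)*V'))              % (numerically) zero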

Geometrically, the singular values of A are the lengths of the semi-axes of the hyperellipsoid which is the image of the unit sphere under A; in the case of two dimensions this is described in figure 1.

Figure 1: Singular values of A as a gain factor. (The figure shows A = UΣV∗ acting on the unit circle: v1 and v2 are mapped to σ1u1 and σ2u2, where V∗ = V⁻¹ and U are orthogonal and Σ = [σ1 0; 0 σ2].)
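For illustration, the two-dimensional picture of figure 1 can be reproduced approximately with a few MATLAB commands (the 2 × 2 matrix below is an arbitrary example):

A = [3 1; 1 2];                             % arbitrary 2-by-2 example
t = linspace(0,2*pi,500);
circle  = [cos(t); sin(t)];                 % points on the unit circle
ellipse = A*circle;                         % their images under A
plot(circle(1,:),circle(2,:),ellipse(1,:),ellipse(2,:)), axis equal
radii = sqrt(sum(ellipse.^2));              % distances of the image points from the origin
disp([max(radii) min(radii)])               % approximately the singular values of A
disp(svd(A)')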

Exercise 3.1. A real symmetric square matrix A is called positive-definite if the quadratic form x′Ax > 0 for all x ≠ 0. You can generate positive-definite matrices A by setting A = BB′ where B is an arbitrary non-singular matrix (in fact every positive definite matrix can be written in this form). Verify and show that: (i) Every positive definite matrix has positive eigenvalues;

(ii) Deleting the same set of rows and columns (i.e. rows and columns with the same indices) of a positive-definite matrix results in a positive-definite matrix (of reduced dimensions!). In particular, the diagonal elements of a positive-definite matrix are positive.

Exercise 3.2. We can use the largest singular value to measure the “size” of a matrix. Formally, σmax(A) defines an “induced norm”, i.e.

σmax(A) := ‖A‖2 = max_{x≠0} ‖Ax‖/‖x‖

where ‖x‖ = sqrt(x1² + x2² + · · · + xn²) is the “Euclidean norm” of the vector x. Establish the above formula, first by working with a diagonal matrix of non-negative elements and then using the SVD, while noting that orthogonal transformations do not affect the Euclidean norm of a vector, i.e. that ‖Ux‖ = ‖x‖ for every orthogonal matrix U - why?

Exercise 3.3. The singular values can be used as a measure of optimal rank approximation: It may be shown that

min_{rank(B)=m} ‖A − B‖2 = σm+1(A)

i.e. the smallest possible approximation error is given by the (m + 1)-th largest singular value of A. If A is an arbitrary matrix, construct the optimal approximation matrix B of rank m which achieves the minimum. Hint: If A = UΣV′ is an SVD of A, set B = UΣ̃V′ and choose an appropriate matrix Σ̃.

3.1 A digital image processing example

Example 3.2. The statements

load clown
figure(1)
subplot(2,2,1)
image(X)
colormap(gray(64))
axis image, axis off
title('rank=200')

produces the first subplot in figure 3. The matrix X obtained with the load statement is 200 × 320 and is numerically of full rank. Its elements lie between 1 and 64 and serve as indices into a gray-scale color map. The resulting picture is a clown. The statements

r=rank(X)
[U,S,V]=svd(X,0);
sigma=diag(S);
figure(2)
semilogy(sigma,'.')

produce the logarithmic plot of the singular values of X in figure 2. We see that the singular values decrease rapidly. There is one almost equal to 10⁴ and only five greater than 10³.

Figure 2: Singular values of full image (semi-logarithmic plot).

The other three subplots in figure 3 show the images obtained from low-rank approximations of X with r = 1, r = 20, and r = 100. The rank-one approximation shows the horizontal and vertical lines that result from a single outer product E1 = σ1u1v1′. In the r = 20 approximation the image is blurred compared to the full rank image, while in the r = 100 approximation there is hardly any visible difference from the original full rank image.

r1=1;r2=20;r3=100;
figure(1)
E1=U(:,1)*sigma(1)*V(:,1)';
subplot(2,2,2)
image(E1)
axis image, axis off
title('rank = 1')

E2=U(:,1:r2)*diag(sigma(1:r2))*V(:,1:r2)';
subplot(2,2,3)
image(E2)
axis image, axis off
title('rank = 20')

E3=U(:,1:r3)*diag(sigma(1:r3))*V(:,1:r3)';
subplot(2,2,4)
image(E3)
axis image, axis off
title('rank = 100')

Exercise 3.4. Load the image “detail” in MATLAB by typing load detail and apply the same procedure in order to check for a reasonable low order rank approximation.

Figure 3: Rank approximation (subplots: rank = 200, rank = 1, rank = 20, rank = 100).

Remark 3.3 (Generalised inverses). The problem we are interested in is to solve the linear equation Ax = b under the assumption that A is a non-square matrix. If we define the pseudo-inverse of A by

A† = V [Σr⁻¹ 0; 0 0] U∗

then x = A†b solves the linear problem Ax = b (in the least-squares sense, if an exact solution does not exist), where x is the unknown. It should be observed that this definition of A† does not depend on the particular choice of U and V in the singular value decomposition of A; for whatever U and V are chosen, x = A†b is uniquely determined. It is easily verified that if null(A) = {0} (i.e. the only vector satisfying Ax = 0 is x = 0, so that A has full column rank), then A† is equal to (A∗A)⁻¹A∗, which is known in the literature as the Moore-Penrose generalised inverse of A.
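For illustration, the following fragment forms A† from the SVD and compares it with MATLAB's built-in function pinv (the test matrix below is an arbitrary random matrix, which almost surely has full column rank):

A = randn(5,3);                             % arbitrary test matrix (full column rank)
[U,S,V] = svd(A);
r = rank(A);
A_dagger = V(:,1:r)*diag(1./diag(S(1:r,1:r)))*U(:,1:r)';   % V*[inv(Sigma_r) 0; 0 0]*U'
disp(norm(A_dagger - pinv(A)))              % (numerically) zero
disp(norm(A_dagger - inv(A'*A)*A'))         % agrees with (A'A)^(-1)A' since null(A) = {0}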

Remark 3.4 (Matrix Annihilator). In the light of Theorem 3.1 we can always construct left and right annihilators of a matrix A. In particular, if the matrix admits a singular value decomposition

A = [U U⊥] [Σ 0; 0 0] [V V⊥]′

then we can select K1U⊥′ and V⊥K2 (where K1, K2 are square non-singular) to be left and right annihilators of A, respectively (since U⊥′A = 0 and AV⊥ = 0).

Exercise 3.5. Using SVD, compute at least two different left and right annihilators of the following matrix:

A = [2 1 4 8 3; 3 7 6 12 2; 2 5 4 8 1; 4 9 8 16 1; 1 3 2 4 1]

Further, compute the generalised inverse of A.

Exercise 3.6. Consider matrix A as given in exercise 3.5. Diagonalise A, using the eigenvalue decomposition (observe that A is square) and construct right and left annihilators from the modal matrix (formed by the eigenvectors).

Exercise 3.7 (Eigenvalues vs Singular values). Construct in MATLAB the following n × n lower triangular matrix (for arbitrary values of n):

A = [−1 0 0 ··· 0; 1 −1 0 ··· 0; 1 1 −1 ··· 0; ... ; 1 1 1 ··· −1]

i.e. A = [aij] with aij = 1 if i > j, aij = −1 if i = j, and aij = 0 if i < j. Show that the smallest singular value behaves as 2⁻ⁿ, so that for large n the matrix is almost “singular” even though all its eigenvalues are clearly nonzero. Try n = {5, 10, 100, 150}.

Exercise 3.8. Compare these three ways to compute the singular values of a real matrix:

svd(A)
sqrt(eig(A'*A))
Z=zeros(size(A)); s=eig([Z A; A' Z]); s=s(s>0)

4 Additional Computational exercises

Exercise 4.1. (a) Write three separate Matlab functions “add_mult_row.m”, “scale_row.m” and “permute_rows.m” to perform the three standard elementary transformations on the rows of an arbitrary matrix. The input/output data of the three functions should be as follows:

function [B,R]=add_mult_row(A,i,j,factor)

Input/Output variables:

• A: arbitrary real matrix

• i: row index, integer

• j: row index, integer

• factor: real non-zero scalar

• B: transformed matrix of the same size as A, whose j-th row is equal to the j-th row of A plus factor times the i-th row of A

• R: square matrix such that B=R*A

The other two functions should have the format:

function [B,R]=scale_row(A,i,factor)

and

function [B,R]=permute_rows(A,i,j)

Make sure that indices i and j always fall within the range of the number of rows of the input matrix A (otherwise output variables should be empty and an error message printed). Look at Matlab’s functions: “:”, “size”, “for”.

(b) Test your three functions for various small-size matrices. Check in each case the matrix R and calculate its inverse (use function “inv”).

Is R always invertible? (check using function “det”). Can you spot (and explain!) the structure of R⁻¹ for each of the three transformations?

(c) Check Matlab’s function “rref” which reduces a matrix into row echelon form for a few small and medium-size matrices. If you create your test matrices randomly, these will generically be full rank (i.e. “almost surely” their rank will be equal to the smallest row-column dimension). Think of ways to generate reduced-rank (“pathological”) matrices with which to test the function. For each test, determine the rank of the matrix and a basis of its row-span.

(d) (Optional:) Write your own version of rref.m using as subroutines your three functions in part (a). Add an additional output argument to your programme, consisting of a structure which accumulates the sequence of elementary transformations. To make your programme more readable, write two preliminary functions, one that “clears” the column of a matrix using an arbitrary “pivot” element, and a second that checks whether the transformed matrix is in echelon form (so that you know that you can stop). It is important that you carry your calculations inside your function within a certain numerical tolerance (e.g. 10⁻¹⁴) which you can control, e.g. by defining it as an input variable.

(e) The following function can be used to calculate the inverse of a square non-singular matrix A:

function B=inv_mat(A,tol)
m=size(A,1);   % no of rows
n=size(A,2);   % no of columns
%
if m ~= n
   disp('Error: Input matrix must be square ...');
   B=[];
   return;
end
%
if abs(det(A)) < tol
   disp('Error: Input matrix is almost singular ...');
   B=[];
   return;
end
%
C=[A eye(n)];
[R,jb] = rref(C);
% it is assumed that R(:,1:n)=eye(n) on exit!
B=R(:,n+1:end);

Test the programme and convince yourself that it inverts the input matrix A. Explain why the algorithm works. If you have written your own version of “rref” in part (d), explain how B = A⁻¹ can be decomposed as a product of elementary matrices.

Exercise 4.2. For a square matrix A, the “inertia” of A is defined as the triple (i+, i−, i0) where i+(A), i−(A) and i0(A) denote, respectively, the number of eigenvalues of A with positive, negative and zero real parts. “Sylvester’s law of inertia” says that the inertia of every square matrix is invariant to non-singular congruent transformations, i.e. transformations of the form A → PAP′ with det(P) ≠ 0. Check Sylvester’s law for a few small-size real matrices A and P. (Note: If you want to include i0 in your tests you need to define it numerically, i.e. with respect to a small tolerance). Does the law apply if A and P are complex and P′ denotes the complex-conjugate transpose? Make a guess based on several numerical examples!

Exercise 4.3. For a square n × n matrix A, the spectral radius of A, ρ(A), is defined as:

ρ(A) = max_{i=1,2,...,n} |λi(A)|

A square matrix A is called positive (denoted as A > 0) if aij > 0 for every i, j ∈ {1, 2, . . . , n}. The following theorem (“Perron’s theorem”) applies for positive matrices: Theorem: If A is a positive square matrix, then

(i) ρ(A) > 0.

(ii) ρ(A) is an eigenvalue of A (called the “Perron-Frobenius eigenvalue”).

(iii) There exists a vector x > 0 such that Ax = ρ(A)x.

(iv) ρ(A) is algebraically a simple eigenvalue of A.

(v) |λ| < ρ(A) for every eigenvalue λ 6= ρ(A), i.e. ρ(A) is the unique eigenvalue of A of maximum modulus.

(vi) [ρ(A)⁻¹A]^m → L as m → ∞, where L = xy′, Ax = ρ(A)x, y′A = ρ(A)y′, and x′y = 1.

(a) Verify Perron’s theorem numerically using a few positive matrices of small/medium dimensions. You can generate random square positive matrices of dimension n using the command abs(randn(n)).

(b) Consider the following iterative method (“Power algorithm”) for estimating the Perron-Frobenius eigenvalue and corresponding right eigenvector of a positive n × n matrix A (actually the method is not limited to positive matrices):

1. Let z0 be an arbitrary positive vector, e.g. z0 = (1 1 . . . 1)′.

2. For k=1,2,. . . , set wk = Azk−1, zk = wk/γk, where γk is the component of wk of greatest modulus.

Then, as k → ∞, zk → x/‖x‖∞ (i.e. the largest element of the right eigenvector will be scaled to one) and γk → ρ(A). Program this algorithm via Matlab and verify that it converges as stated. Can you explain the behaviour of the algorithm? (Hint: Start by considering the eigenvalue decomposition of A).

Exercise 4.4. Least-squares estimation involves the estimation of a vector of unknown parameters θ by “fitting” the (linear) model y = Xθ + e, where y and X are known data and e is the vector of “residuals”. This can be written in full as

[y1; y2; . . . ; yn] = [x11 x12 . . . x1q; x21 x22 . . . x2q; . . . ; xn1 xn2 . . . xnq][θ1; θ2; . . . ; θq] + [e1; e2; . . . ; en]

It can be shown that the unique optimal solution (i.e. the one that minimises the sum of squares of the residuals) is given as θ̂ = (X′X)⁻¹X′y, provided X has full column rank (this is easily satisfied in practice as typically n ≫ q). (Check the formula for θ̂ by differentiating e1² + e2² + · · · + en² with respect to θ.)

(a) Show that Xθ̂ is the projection of y onto the Range (column-span) of X. Show also that ê = y − Xθ̂ = [I − X(X′X)⁻¹X′]y is the projection of y onto Range(X)⊥ = Ker(X′).

(b) Illustrate least-squares estimation for a straight-line fit via a short Matlab program. First select the two (fixed) parameters of the straight line y = mx + c, m (slope) and c (intercept). Generate random points xi (i = 1, 2, . . . , n) and the corresponding yi’s. To make the problem more interesting assume that the yi data are corrupted by noise, i.e. define ỹi = yi + ei (use randn.m to generate the ei’s). The problem now is to select the “best” estimates of (m, c) to fit the data. Write the model in the standard matrix form ỹ = Xθ + e. Here θ = (m c)′ is the vector of “unknowns” that need to be estimated, and you need to determine what the matrix X is in this case. To get the optimal (least-squares) estimates use the formula θ̂ = (X′X)⁻¹X′ỹ, from which you can obtain m̂ and ĉ. Compare these with the “true” parameters m and c for different noise levels and plot the data and the “best” straight line to get a clear picture of the success of the fit. In this case you can actually get closed-form formulae for m̂ and ĉ by working on the formula (X′X)⁻¹X′ỹ (this involves the inverse of a 2 × 2 symmetric matrix which is easy to do analytically). How would you modify the method if you wanted to fit a parabola? a cubic? an n-th order polynomial?
