Manchester University Press Series in Algorithms and
Architectures for Advanced Scientific Computing

NUMERICAL METHODS FOR LARGE
EIGENVALUE PROBLEMS

Y. Saad

Contents

Preface

I  Background in Matrix Theory and Linear Algebra
   Matrices
   Square Matrices and Eigenvalues
   Types of Matrices
   Vector Inner Products and Norms
   Matrix Norms
   Subspaces
   Orthogonal Vectors and Subspaces
   Canonical Forms of Matrices
   Reduction to the Diagonal Form
   The Jordan Canonical Form
   The Schur Canonical Form
   Normal and Hermitian Matrices
   Normal Matrices
   Hermitian Matrices
   Nonnegative Matrices

II  Sparse Matrices
   Introduction
   Storage Schemes
   Basic Sparse Matrix Operations
   Sparse Direct Solution Methods
   Test Problems
   Random Walk Problem
   Chemical Reactions
   The Harwell-Boeing Collection
   SPARSKIT

III  Perturbation Theory and Error Analysis
   Projectors and their Properties
   Orthogonal Projectors
   Oblique Projectors
   Resolvent and Spectral Projector
   Relations with the Jordan Form
   Linear Perturbations of A
   A-Posteriori Error Bounds
   General Error Bounds
   The Hermitian Case
   The Kahan-Parlett-Jiang Theorem
   Conditioning of Problems
   Conditioning of Eigenvalues
   Conditioning of Eigenvectors
   Conditioning of Invariant Subspaces
   Localization Theorems

IV  The Tools of Spectral Approximation
   Single Vector Iterations
   The Power Method
   The Shifted Power Method
   Deflation Techniques
   Wielandt Deflation with One Vector
   Optimality in Wielandt's Deflation
   Deflation with Several Vectors
   Partial Schur Decomposition
   Practical Deflation Procedures
   General Projection Methods
   Orthogonal Projection Methods
   The Hermitian Case
   Oblique Projection Methods
   Chebyshev Polynomials
   Real Chebyshev Polynomials
   Complex Chebyshev Polynomials

V  Subspace Iteration
   Simple Subspace Iteration
   Subspace Iteration with Projection
   Practical Implementations
   Locking
   Linear Shifts
   Preconditionings

VI  Krylov Subspace Methods
   Krylov Subspaces
   Arnoldi's Method
   The Basic Algorithm
   Practical Implementations
   Incorporation of Implicit Deflation
   The Hermitian Lanczos Algorithm
   The Algorithm
   Relation with Orthogonal Polynomials
   Non-Hermitian Lanczos Algorithm
   The Algorithm
   Practical Implementations
   Block Krylov Methods
   Convergence of the Lanczos Process
   Distance between K_m and an Eigenvector
   Convergence of the Eigenvalues
   Convergence of the Eigenvectors
   Convergence of the Arnoldi Process

VII  Acceleration Techniques and Hybrid Methods
   The Basic Chebyshev Iteration
   Convergence Properties
   Arnoldi-Chebyshev Iteration
   Purification by Arnoldi's Method
   The Enhanced Initial Vector Approach
   Computing an Optimal Ellipse
   Starting the Chebyshev Iteration
   Choosing the Parameters m and k
   Deflated Arnoldi-Chebyshev
   Chebyshev Subspace Iteration
   Getting the Best Ellipse
   Parameters k and m
   Deflation
   Least Squares Arnoldi
   The Least Squares Polynomial
   Use of Chebyshev Bases
   The Gram Matrix
   Computing the Best Polynomial
   Least Squares Arnoldi Algorithms

VIII  Preconditioning Techniques
   Shift-and-Invert Preconditioning
   General Concepts
   Dealing with Complex Arithmetic
   Shift-and-Invert Arnoldi
   Polynomial Preconditioning
   Davidson's Method
   Generalized Arnoldi Algorithms

IX  Non-Standard Eigenvalue Problems
   Introduction
   Generalized Eigenvalue Problems
   General Results
   Reduction to Standard Form
   Deflation
   Shift-and-Invert
   Projection Methods
   The Hermitian Definite Case
   Quadratic Problems
   From Quadratic to Generalized Problems

X  Origins of Matrix Eigenvalue Problems
   Introduction
   Mechanical Vibrations
   Electrical Networks
   Quantum Chemistry
   Stability of Dynamical Systems
   Bifurcation Analysis
   Chemical Reactions
   Macroeconomics
   Markov Chain Models

References

Index

Preface

Matrix eigenvalue problems arise in a large number of disciplines of science and engineering. They constitute the basic tool used in designing buildings, bridges, and turbines that are resistant to vibrations. They allow one to model queueing networks and to analyze the stability of electrical networks or fluid flow. They also allow the scientist to understand local physical phenomena or to study bifurcation patterns in dynamical systems. In fact, the writing of this book was motivated mostly by the second class of problems.

Several books dealing with numerical methods for solving eigenvalue problems involving symmetric (or Hermitian) matrices have been written, and there are a few software packages, both public and commercial, available. The book by Parlett is an excellent treatise of the problem. Despite a rather strong demand by engineers and scientists, there is little written on nonsymmetric problems, and even less is available in terms of software. The book by Wilkinson still constitutes an important reference. Certainly, science has evolved since the writing of Wilkinson's book, and so has the computational environment and the demand for solving large matrix problems. Problems are becoming larger and more complicated, while at the same time computers are able to deliver ever higher performance. This means in particular that methods that were deemed too demanding yesterday are now in the realm of the achievable. I hope that this book will be a small step in bridging the gap between the literature on what is available in the symmetric case and the nonsymmetric case. Both the Hermitian and the non-Hermitian cases are covered, although non-Hermitian problems are given more emphasis.

This book attempts to achieve a good balance between theory and practice. I should comment that the theory is especially important in the nonsymmetric case. In essence, what differentiates the Hermitian from the non-Hermitian eigenvalue problem is that in the first case we can always manage to compute an approximation, whereas there are nonsymmetric problems that can be arbitrarily difficult to solve and can essentially make any algorithm fail. Stated more rigorously, the eigenvalue of a Hermitian matrix is always well-conditioned, whereas this is not true for nonsymmetric matrices. On the practical side, I tried to give a general view of algorithms and tools that have proved efficient. Many of the algorithms described correspond to actual implementations of research software and have been tested on realistic problems. I have tried to convey our experience from the practice in using these techniques.

As a result of the partial emphasis on theory, there are a few chapters that may be found hard to digest for readers inexperienced with linear algebra. These are Chapter III and, to some extent, a small part of Chapter IV. Fortunately, Chapter III is basically independent of the rest of the book. The minimal background needed to use the algorithmic part of the book, namely Chapters IV through VIII, is calculus and linear algebra at the undergraduate level. The book has been used twice to teach a special topics course, once in a Mathematics department and once in a Computer Science department. In a quarter period, Chapters I, III, and IV through VI have been covered without much difficulty. In a semester period, all chapters can be covered with various degrees of depth. Chapters II and X need not be treated in class and can be given as remedial reading.

Finally, I would like to extend my appreciation to a number of people to whom I am indebted. Francoise Chatelin, who was my thesis adviser, introduced me to numerical methods for eigenvalue problems. Her influence on my way of thinking is certainly reflected in this book. Beresford Parlett has been encouraging throughout my career and has always been a real inspiration. Part of the motivation in getting this book completed, rather than never finished, is owed to L. E. Scriven from the Chemical Engineering department and to many others in applied sciences who expressed interest in my work. I am indebted to Roland Freund, who has read this manuscript with great care and has pointed out numerous mistakes.

Minneapolis, December

Youcef Saad

Chapter I

Background in Matrix Theory and Linear Algebra

This chapter reviews basic matrix theory and introduces some of the elementary notation used throughout the book. Matrices are objects that represent linear mappings between vector spaces. The notions that will be predominantly used in this book are very intimately related to these linear mappings, and it is possible to discuss eigenvalues of linear operators without ever mentioning their matrix representations. However, to the numerical analyst, or the engineer, any theory that would be developed in this manner would be insufficient in that it will not be of much help in developing or understanding computational algorithms. The abstraction of linear mappings on vector spaces does, however, provide very concise definitions and some important theorems.

Matrices

When dealing with eigenvalues it is more convenient, if not more relevant, to manipulate complex matrices rather than real matrices. A complex n x m matrix A is an n x m array of complex numbers

    a_{ij},  i = 1, ..., n,  j = 1, ..., m.

The set of all n x m matrices is a complex vector space denoted by C^{n x m}. The main operations with matrices are the following.

- Addition: C = A + B, where A, B and C are matrices of size n x m and

    c_{ij} = a_{ij} + b_{ij},  i = 1, ..., n,  j = 1, ..., m.

- Multiplication by a scalar: C = \alpha A, where c_{ij} = \alpha a_{ij}.

- Multiplication by another matrix:

    C = AB,

  where A \in C^{n x m}, B \in C^{m x p}, C \in C^{n x p}, and

    c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}.

A notation that is often used is that of column vectors and row vectors. The column vector a_j is the vector consisting of the j-th column of A, i.e., a_j = (a_{ij})_{i=1,...,n}. Similarly, we will use the notation a_i to denote the i-th row of the matrix A. For example, we may write that

    A = (a_1, a_2, ..., a_m)

or

    A = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.

The transpose of a matrix A in C^{n x m} is a matrix C in C^{m x n} whose elements are defined by c_{ij} = a_{ji}, i = 1, ..., m, j = 1, ..., n. The transpose of a matrix A is denoted by A^T. It is more relevant in eigenvalue problems to use the transpose conjugate matrix, denoted by A^H and defined by

    A^H = \bar{A}^T = \overline{A^T},

in which the bar denotes the (elementwise) complex conjugation.

Finally, we should recall that matrices are strongly related to linear mappings between vector spaces of finite dimension. They are in fact representations of these transformations with respect to two given bases: one for the initial vector space and the other for the image vector space.

Square Matrices and Eigenvalues

A matrix belonging to C^{n x n} is said to be square. Some notions are only defined for square matrices. A square matrix which is very important is the identity matrix

    I = \{\delta_{ij}\}_{i,j=1,...,n},

where \delta_{ij} is the Kronecker symbol. The identity matrix satisfies the equality AI = IA = A for every matrix A of size n. The inverse of a matrix, when it exists, is a matrix C such that CA = AC = I. The inverse of A is denoted by A^{-1}.

The determinant of a matrix may be defined in several ways. For simplicity we adopt here the following recursive definition. The determinant of a 1 x 1 matrix (a) is defined as the scalar a. Then the determinant of an n x n matrix is given by

    det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} det(A_{1j}),

where A_{1j} is an (n-1) x (n-1) matrix obtained by deleting the first row and the j-th column of A. The determinant of a matrix determines whether or not a matrix is singular, since A is singular if and only if its determinant is zero. We have the following simple properties:

    det(AB) = det(BA),
    det(A^T) = det(A),
    det(\alpha A) = \alpha^n det(A),
    det(\bar{A}) = \overline{det(A)},
    det(I) = 1.

From the above definition of the determinant it can be shown by induction that the function that maps a given complex value \lambda to the value p_A(\lambda) = det(A - \lambda I) is a polynomial of degree n (see the problems at the end of this chapter). This is referred to as the characteristic polynomial of the matrix A.

Definition. A complex scalar \lambda is called an eigenvalue of the square matrix A if there exists a nonzero vector u of C^n such that Au = \lambda u. The vector u is called an eigenvector of A associated with \lambda. The set of all the eigenvalues of A is referred to as the spectrum of A and is denoted by \sigma(A).

An eigenvalue of A is a root of the characteristic polynomial. Indeed, \lambda is an eigenvalue of A if and only if det(A - \lambda I) \equiv p_A(\lambda) = 0. So there are at most n distinct eigenvalues. The maximum modulus of the eigenvalues is called the spectral radius and is denoted by \rho(A):

    \rho(A) = \max_{\lambda \in \sigma(A)} |\lambda|.

The trace of a matrix is equal to the sum of all its diagonal elements,

    tr(A) = \sum_{i=1}^{n} a_{ii}.

It can be easily shown that this is also equal to the sum of its eigenvalues counted with their multiplicities as roots of the characteristic polynomial.
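As a small numerical illustration (added here; it is not an excerpt from any software discussed in this book), the following Python sketch checks these facts on an arbitrary example matrix.

    # Minimal sketch: eigenvalues, characteristic polynomial, spectral radius and trace.
    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [1.0, 0.0, 3.0]])

    eigvals = np.linalg.eigvals(A)         # roots of the characteristic polynomial
    char_poly = np.poly(A)                 # coefficients of the monic polynomial det(lambda*I - A)
    spectral_radius = max(abs(eigvals))    # rho(A) = max |lambda_i|

    print(eigvals)
    print(char_poly)
    print(spectral_radius)
    print(np.trace(A), eigvals.sum().real) # trace equals the sum of the eigenvalues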

Proposition. If \lambda is an eigenvalue of A, then \bar{\lambda} is an eigenvalue of A^H. An eigenvector v of A^H associated with the eigenvalue \bar{\lambda} is called a left eigenvector of A.

When a distinction is necessary, an eigenvector of A is often called a right eigenvector. Thus the eigenvalue \lambda and the right and left eigenvectors, u and v, satisfy the relations

    Au = \lambda u,    v^H A = \lambda v^H,

or, equivalently,

    u^H A^H = \bar{\lambda} u^H,    A^H v = \bar{\lambda} v.

Types of Matrices

The properties of eigenvalues and eigenvectors of square matrices will sometimes depend on special properties of the matrix A. For example, the eigenvalues or eigenvectors of the following types of matrices will all have some special properties.

- Symmetric matrices: A^T = A.
- Hermitian matrices: A^H = A.
- Skew-symmetric matrices: A^T = -A.
- Skew-Hermitian matrices: A^H = -A.
- Normal matrices: A^H A = A A^H.
- Nonnegative matrices: a_{ij} >= 0, i, j = 1, ..., n (similar definition for nonpositive, positive, and negative matrices).
- Unitary matrices: Q^H Q = I.

Often, a matrix Q such that Q^H Q is diagonal is called orthogonal. It is worth noting that a unitary matrix Q is a matrix whose inverse is its transpose conjugate Q^H.

Some matrices have particular structures that are often convenient for computational purposes and play important roles in numerical analysis. The following list, though incomplete, gives an idea of the most important special matrices arising in applications and algorithms.

- Diagonal matrices: a_{ij} = 0 for j != i. Notation: A = diag(a_{11}, a_{22}, ..., a_{nn}).
- Upper triangular matrices: a_{ij} = 0 for i > j.
- Lower triangular matrices: a_{ij} = 0 for i < j.
- Upper bidiagonal matrices: a_{ij} = 0 for j != i and j != i+1.
- Lower bidiagonal matrices: a_{ij} = 0 for j != i and j != i-1.
- Tridiagonal matrices: a_{ij} = 0 for any pair i, j such that |j - i| > 1. Notation: A = tridiag(a_{i,i-1}, a_{ii}, a_{i,i+1}).
- Banded matrices: there exist two integers m_l and m_u such that a_{ij} != 0 only if i - m_l <= j <= i + m_u. The number m_l + m_u + 1 is called the bandwidth of A.
- Upper Hessenberg matrices: a_{ij} = 0 for any pair i, j such that i > j + 1. One can define lower Hessenberg matrices similarly.
- Outer product matrices: A = u v^H, where both u and v are vectors.
- Permutation matrices: the columns of A are a permutation of the columns of the identity matrix.
- Block diagonal matrices: generalizes the diagonal matrix by replacing each diagonal entry by a matrix. Notation: A = diag(A_{11}, A_{22}, ..., A_{nn}).
- Block tridiagonal matrices: generalizes the tridiagonal matrix by replacing each nonzero entry by a square matrix. Notation: A = tridiag(A_{i,i-1}, A_{ii}, A_{i,i+1}).

The above properties emphasize structure, i.e., positions of the nonzero elements with respect to the zeros, and assume that there are many zero elements or that the matrix is of low rank. No such assumption is made for, say, orthogonal or symmetric matrices.

Vector Inner Products and Norms

We define the Hermitian inner product of the two vectors x = (x_i)_{i=1,...,n} and y = (y_i)_{i=1,...,n} of C^n as the complex number

    (x, y) = \sum_{i=1}^{n} x_i \bar{y}_i,

which is often rewritten in matrix notation as

    (x, y) = y^H x.

A vector norm on C^n is a real-valued function on C^n which satisfies the following three conditions:

    ||x|| >= 0 for all x, and ||x|| = 0 iff x = 0;
    ||\alpha x|| = |\alpha| ||x||, for all x in C^n and all \alpha in C;
    ||x + y|| <= ||x|| + ||y||, for all x, y in C^n.

Associated with the inner product is the Euclidean norm of a complex vector, defined by

    ||x||_2 = (x, x)^{1/2}.

A fundamental additional property in matrix computations is the simple relation

    (Ax, y) = (x, A^H y),  for all x, y in C^n,

the proof of which is straightforward. The following proposition is a consequence of the above equality.

Proposition. Unitary matrices preserve the Hermitian inner product, i.e., (Qx, Qy) = (x, y) for any unitary matrix Q.

Proof. Indeed, (Qx, Qy) = (x, Q^H Q y) = (x, y).

In particular a unitary matrix preserves the 2-norm metric, i.e., it is isometric with respect to the 2-norm.

The most commonly used vector norms in numerical linear algebra are special cases of the Holder norms

    ||x||_p = ( \sum_{i=1}^{n} |x_i|^p )^{1/p}.

Note that the limit of ||x||_p when p tends to infinity exists and is equal to the maximum modulus of the x_i's. This defines a norm denoted by ||.||_\infty. The cases p = 1, p = 2, and p = \infty lead to the most important norms in practice,

    ||x||_1 = |x_1| + |x_2| + ... + |x_n|,
    ||x||_2 = [ |x_1|^2 + |x_2|^2 + ... + |x_n|^2 ]^{1/2},
    ||x||_\infty = \max_{i=1,...,n} |x_i|.

A useful relation concerning the 2-norm is the so-called Cauchy-Schwarz inequality:

    |(x, y)| <= ||x||_2 ||y||_2.
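The following short Python sketch (an illustration added here, not part of the original text) evaluates these norms and checks the Cauchy-Schwarz inequality on an arbitrary pair of complex vectors.

    # Minimal sketch of the Holder norms for p = 1, 2, infinity and Cauchy-Schwarz.
    import numpy as np

    x = np.array([1.0 + 2.0j, -3.0, 0.5j])
    y = np.array([2.0, 1.0j, -1.0])

    norm1   = np.sum(np.abs(x))              # ||x||_1
    norm2   = np.sqrt(np.vdot(x, x).real)    # ||x||_2 = (x, x)^{1/2}
    norminf = np.max(np.abs(x))              # ||x||_infinity

    inner = np.vdot(y, x)                    # (x, y) = y^H x in the notation of the text
    assert abs(inner) <= norm2 * np.linalg.norm(y) + 1e-12   # Cauchy-Schwarz
    print(norm1, norm2, norminf)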

Matrix Norms

For a general matrix A in C^{n x m}, we define a special set of norms of matrices as follows:

    ||A||_{pq} = \max_{x \in C^m, x \ne 0} ||Ax||_p / ||x||_q.

We say that the norms ||.||_{pq} are induced by the two norms ||.||_p and ||.||_q. These norms satisfy the usual properties of norms, i.e.,

    ||A|| >= 0 for all A in C^{n x m}, and ||A|| = 0 iff A = 0;
    ||\alpha A|| = |\alpha| ||A||, for all A in C^{n x m} and all \alpha in C;
    ||A + B|| <= ||A|| + ||B||, for all A, B in C^{n x m}.

Again, the most important cases are the ones associated with the cases p, q = 1, 2, \infty. The case q = p is of particular interest and the associated norm ||.||_{pq} is simply denoted by ||.||_p.

A fundamental property of these norms is that

    ||AB||_p <= ||A||_p ||B||_p,

which is an immediate consequence of the definition. Matrix norms that satisfy the above property are sometimes called consistent. As a result of the above inequality, for example, we have that for any square matrix A,

    ||A^n||_p <= ||A||_p^n,

which implies in particular that the powers of the matrix A converge to zero if any of its p-norms is less than 1.

The Frobenius norm of a matrix is defined by

    ||A||_F = ( \sum_{j=1}^{m} \sum_{i=1}^{n} |a_{ij}|^2 )^{1/2}.

This can be viewed as the 2-norm of the column (or row) vector in C^{nm} consisting of all the columns (resp. rows) of A listed from 1 to m (resp. 1 to n). It can easily be shown that this norm is also consistent, in spite of the fact that it is not induced by a pair of vector norms, i.e., it is not derived from a formula of the form given above (see the exercises). However, it does not satisfy some of the other properties of the p-norms. For example, the Frobenius norm of the identity matrix is not unity. To avoid these difficulties, we will only use the term matrix norm for a norm that is induced by two norms as in the definition above. Thus, we will not consider the Frobenius norm to be a proper matrix norm according to our conventions, even though it is consistent.

It can be shown that the norms of matrices defined above satisfy the following equalities, which lead to alternative definitions that are often easier to work with:

    ||A||_1 = \max_{j=1,...,m} \sum_{i=1}^{n} |a_{ij}|,
    ||A||_\infty = \max_{i=1,...,n} \sum_{j=1}^{m} |a_{ij}|,
    ||A||_2 = [ \rho(A^H A) ]^{1/2} = [ \rho(A A^H) ]^{1/2},
    ||A||_F = [ tr(A^H A) ]^{1/2} = [ tr(A A^H) ]^{1/2}.

As will be shown later in this chapter, the eigenvalues of A^H A are nonnegative. Their square roots are called the singular values of A and are denoted by \sigma_i, i = 1, ..., m. Thus, the relation above shows that ||A||_2 is equal to \sigma_1, the largest singular value of A.

Example. From the above properties, it is clear that the spectral radius \rho(A) is equal to the 2-norm of a matrix when the matrix is Hermitian. However, it is not a matrix norm in general. For example, the first property of norms is not satisfied: for a nonzero strictly upper triangular matrix A (the original example is a small matrix of this kind) we have \rho(A) = 0 while A \ne 0. The triangle inequality is also not satisfied for the pair A and B = A^T, where A is as above: indeed, \rho(A + B) > 0 while \rho(A) + \rho(B) = 0.
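The following Python sketch (added here) evaluates these matrix norms; the 2 x 2 nilpotent matrix used below is an assumed concrete instance of the kind of example just described, not necessarily the matrix of the original text.

    # Minimal sketch: induced 1-, 2-, infinity-norms, Frobenius norm, spectral radius.
    import numpy as np

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])               # strictly upper triangular, so rho(A) = 0

    norm1   = np.linalg.norm(A, 1)           # max column sum of |a_ij|
    norminf = np.linalg.norm(A, np.inf)      # max row sum of |a_ij|
    norm2   = np.linalg.norm(A, 2)           # largest singular value
    normF   = np.linalg.norm(A, 'fro')       # Frobenius norm
    rho     = max(abs(np.linalg.eigvals(A))) # spectral radius, here 0 although A != 0

    print(norm1, norminf, norm2, normF, rho)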

Subspaces

A subspace of C^n is a subset of C^n that is also a complex vector space. The set of all linear combinations of a set of vectors G = {a_1, a_2, ..., a_q} of C^n is a vector subspace called the linear span of G,

    span{G} = span{a_1, a_2, ..., a_q}
            = { z \in C^n | z = \sum_{i=1}^{q} \alpha_i a_i, \{\alpha_i\}_{i=1,...,q} \in C^q }.

If the a_i's are linearly independent, then each vector of span{G} admits a unique expression as a linear combination of the a_i's. The set G is then called a basis of the subspace span{G}.

Given two vector subspaces S_1 and S_2, their sum S is a subspace defined as the set of all vectors that are equal to the sum of a vector of S_1 and a vector of S_2. The intersection of two subspaces is also a subspace. If the intersection of S_1 and S_2 is reduced to {0}, then the sum of S_1 and S_2 is called their direct sum and is denoted by S = S_1 \oplus S_2. When S is equal to C^n, then every vector x of C^n can be decomposed in a unique way as the sum of an element x_1 of S_1 and an element x_2 of S_2. The transformation P that maps x into x_1 is a linear transformation that is idempotent (P^2 = P). It is called a projector onto S_1 along S_2.

Two subspaces of importance that are associated with a matrix A of C^{n x m} are its range, defined by

    Ran(A) = { Ax | x \in C^m },

and its kernel or null space,

    Ker(A) = { x \in C^m | Ax = 0 }.

The range of A is clearly equal to the linear span of its columns. The rank of a matrix is equal to the dimension of the range of A.

A subspace S is said to be invariant under a (square) matrix A whenever AS \subseteq S. In particular, for any eigenvalue \lambda of A, the subspace Ker(A - \lambda I) is invariant under A. The subspace Ker(A - \lambda I) is called the eigenspace associated with \lambda and consists of all the eigenvectors of A associated with \lambda, together with the zero vector.

Orthogonal Vectors and Subspaces

A set of vectors G = {a_1, a_2, ..., a_r} is said to be orthogonal if

    (a_i, a_j) = 0 when i != j.

It is orthonormal if, in addition, every vector of G has a 2-norm equal to unity. Every subspace admits an orthonormal basis, which is obtained by taking any basis and "orthonormalizing" it. The orthonormalization can be achieved by an algorithm referred to as the Gram-Schmidt process, which we now describe. Given a set of linearly independent vectors {x_1, x_2, ..., x_r}, we first normalize the vector x_1, i.e., we divide it by its 2-norm, to obtain the scaled vector q_1. Then x_2 is orthogonalized against the vector q_1 by subtracting from x_2 a multiple of q_1 to make the resulting vector orthogonal to q_1, i.e.,

    x_2 <- x_2 - (x_2, q_1) q_1.

The resulting vector is again normalized to yield the second vector q_2. The i-th step of the Gram-Schmidt process consists of orthogonalizing the vector x_i against all previous vectors q_j.

Algorithm (Gram-Schmidt)

1. Start: Compute r_{11} = ||x_1||_2. If r_{11} = 0 stop, else q_1 = x_1 / r_{11}.
2. Loop: For j = 2, ..., r do:
   (a) Compute r_{ij} = (x_j, q_i) for i = 1, 2, ..., j-1;
   (b) qhat = x_j - \sum_{i=1}^{j-1} r_{ij} q_i;
   (c) r_{jj} = ||qhat||_2;
   (d) If r_{jj} = 0 then stop, else q_j = qhat / r_{jj}.

It is easy to prove that the above algorithm will not break down, i.e., all r steps will be completed, if and only if the family of vectors x_1, x_2, ..., x_r is linearly independent. From (b) and (c) it is clear that at every step of the algorithm the following relation holds:

    x_j = \sum_{i=1}^{j} r_{ij} q_i.

If we let X = [x_1, x_2, ..., x_r], Q = [q_1, q_2, ..., q_r], and if R denotes the r x r upper triangular matrix whose nonzero elements are the r_{ij} defined in the algorithm, then the above relation can be written as

    X = QR.

This is called the QR decomposition of the n x r matrix X. Thus, from what was said above, the QR decomposition of a matrix exists whenever the column vectors of X form a linearly independent set of vectors.
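A direct transcription of the classical Gram-Schmidt process into Python is sketched below (this code is an illustration added here, not the book's software); it returns the factors Q and R of the decomposition X = QR.

    # Minimal sketch of classical Gram-Schmidt producing X = QR.
    import numpy as np

    def classical_gram_schmidt(X):
        n, r = X.shape
        Q = np.zeros((n, r), dtype=complex)
        R = np.zeros((r, r), dtype=complex)
        R[0, 0] = np.linalg.norm(X[:, 0])
        if R[0, 0] == 0:
            raise ValueError("first vector is zero")
        Q[:, 0] = X[:, 0] / R[0, 0]
        for j in range(1, r):
            for i in range(j):
                R[i, j] = np.vdot(Q[:, i], X[:, j])   # r_ij = (x_j, q_i)
            q = X[:, j] - Q[:, :j] @ R[:j, j]         # orthogonalize x_j
            R[j, j] = np.linalg.norm(q)
            if R[j, j] == 0:
                raise ValueError("vectors are linearly dependent")
            Q[:, j] = q / R[j, j]
        return Q, R

    X = np.random.rand(6, 3)
    Q, R = classical_gram_schmidt(X)
    print(np.allclose(Q.conj().T @ Q, np.eye(3)))     # orthonormal columns
    print(np.allclose(Q @ R, X))                      # X = QR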

The algorithm just described is the standard Gram-Schmidt process. There are other formulations of the same algorithm which are mathematically equivalent but have better numerical properties. The Modified Gram-Schmidt algorithm (MGSA) is one such alternative.

Algorithm (Modified Gram-Schmidt)

1. Start: Define r_{11} = ||x_1||_2. If r_{11} = 0 stop, else q_1 = x_1 / r_{11}.
2. Loop: For j = 2, ..., r do:
   (a) Define qhat = x_j;
   (b) For i = 1, ..., j-1, do
          r_{ij} = (qhat, q_i),
          qhat = qhat - r_{ij} q_i;
   (c) Compute r_{jj} = ||qhat||_2;
   (d) If r_{jj} = 0 then stop, else q_j = qhat / r_{jj}.
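A Python transcription of the modified variant is sketched below (again an added illustration); the only difference from the classical version is that each inner product is taken with the current, partially orthogonalized vector.

    # Minimal sketch of modified Gram-Schmidt.
    import numpy as np

    def modified_gram_schmidt(X):
        n, r = X.shape
        Q = np.array(X, dtype=complex)
        R = np.zeros((r, r), dtype=complex)
        for j in range(r):
            for i in range(j):
                R[i, j] = np.vdot(Q[:, i], Q[:, j])   # r_ij = (qhat, q_i) with the current qhat
                Q[:, j] = Q[:, j] - R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(Q[:, j])
            if R[j, j] == 0:
                raise ValueError("vectors are linearly dependent")
            Q[:, j] = Q[:, j] / R[j, j]
        return Q, R

    # On a mildly ill-conditioned set, MGS typically loses less orthogonality than CGS.
    X = np.vander(np.linspace(0.0, 1.0, 8), 5, increasing=True)
    Q, R = modified_gram_schmidt(X)
    print(np.linalg.norm(Q.conj().T @ Q - np.eye(5)))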

A vector that is orthogonal to all the vectors of a subspace S is said to be orthogonal to that subspace. The set of all the vectors that are orthogonal to S is a vector subspace called the orthogonal complement of S and denoted by S^{\perp}. The space C^n is the direct sum of S and its orthogonal complement. The projector onto S along its orthogonal complement is called an orthogonal projector onto S. If V = [v_1, v_2, ..., v_r] is an orthonormal matrix, then V^H V = I, i.e., V is orthogonal. However, V V^H is not the identity matrix but represents the orthogonal projector onto span{V}; see Chapter III for details.

Canonical Forms of Matrices

In this section we will be concerned with the reduction of square matrices into matrices that have simpler forms, such as diagonal, bidiagonal, or triangular. By reduction we mean a transformation that preserves the eigenvalues of a matrix.

Definition. Two matrices A and B are said to be similar if there is a nonsingular matrix X such that

    A = X B X^{-1}.

The mapping B -> A is called a similarity transformation.

It is clear that similarity is an equivalence relation. Similarity transformations preserve the eigenvalues of a matrix. An eigenvector u_B of B is transformed into the eigenvector u_A = X u_B of A. In effect, a similarity transformation amounts to representing the matrix B in a different basis.

We now need to define some terminology.

- An eigenvalue \lambda of A is said to have algebraic multiplicity \mu if it is a root of multiplicity \mu of the characteristic polynomial.
- If an eigenvalue is of algebraic multiplicity one it is said to be simple. A nonsimple eigenvalue is said to be multiple.
- An eigenvalue \lambda of A is said to have geometric multiplicity \gamma if the maximum number of independent eigenvectors associated with it is \gamma. In other words, the geometric multiplicity \gamma is the dimension of the eigenspace Ker(A - \lambda I).
- A matrix is said to be derogatory if the geometric multiplicity of at least one of its eigenvalues is larger than one.
- An eigenvalue is said to be semi-simple if its algebraic multiplicity is equal to its geometric multiplicity. An eigenvalue that is not semi-simple is called defective.

We will often denote by \lambda_1, ..., \lambda_p (p <= n) all the distinct eigenvalues of A. It is a simple exercise to show that the characteristic polynomials of two similar matrices are identical (see the exercises). Therefore, the eigenvalues of two similar matrices are equal and so are their algebraic multiplicities. Moreover, if v is an eigenvector of B then Xv is an eigenvector of A and, conversely, if y is an eigenvector of A then X^{-1} y is an eigenvector of B. As a result, the number of independent eigenvectors associated with a given eigenvalue is the same for two similar matrices, i.e., their geometric multiplicity is also the same.

The possible desired forms are numerous but they all have the common goal of attempting to simplify the original eigenvalue problem. Here are some possibilities, with comments as to their usefulness.

- Diagonal: the simplest and certainly most desirable choice, but it is not always achievable.
- Jordan: this is an upper bidiagonal matrix with ones or zeroes on the super-diagonal. Always possible, but not numerically trustworthy.
- Upper triangular: in practice this is the most reasonable compromise, as the similarity from the original matrix to a triangular form can be chosen to be isometric and therefore the transformation can be achieved via a sequence of elementary unitary transformations, which are numerically stable.

Reduction to the Diagonal Form

The simplest form to which a matrix can be reduced is undoubtedly the diagonal form, but this reduction is unfortunately not always possible. A matrix that can be reduced to the diagonal form is called diagonalizable. The following theorem characterizes such matrices.

Theorem. A matrix of dimension n is diagonalizable if and only if it has n linearly independent eigenvectors.

Proof. A matrix A is diagonalizable if and only if there exists a nonsingular matrix X and a diagonal matrix D such that A = X D X^{-1} or, equivalently, A X = X D, where D is a diagonal matrix. This is equivalent to saying that there exist n linearly independent vectors -- the n column vectors of X -- such that A x_i = d_{ii} x_i, i.e., each of these column vectors is an eigenvector of A.

A matrix that is diagonalizable has only semi-simple eigenvalues. Conversely, if all the eigenvalues of a matrix are semi-simple then there exist n eigenvectors of the matrix A. It can be easily shown that these eigenvectors are linearly independent (see the exercises). As a result we have the following proposition.

Proposition. A matrix is diagonalizable if and only if all its eigenvalues are semi-simple.

Since every simple eigenvalue is semi-simple, an immediate corollary of the above result is that when A has n distinct eigenvalues then it is diagonalizable.

The Jordan Canonical Form

From the theoretical viewpoint, one of the most important canonical forms of matrices is the well-known Jordan form. In what follows, the main constructive steps that lead to the Jordan canonical decomposition are outlined. For details, the reader is referred to a standard book on matrix theory or linear algebra.

- For every integer l and each eigenvalue \lambda_i, it is true that

    Ker(A - \lambda_i I)^{l+1} \supseteq Ker(A - \lambda_i I)^{l}.

- Because we are in a finite dimensional space, the above property implies that there is a first integer l_i such that

    Ker(A - \lambda_i I)^{l_i + 1} = Ker(A - \lambda_i I)^{l_i},

  and, in fact, Ker(A - \lambda_i I)^{l} = Ker(A - \lambda_i I)^{l_i} for all l >= l_i. The integer l_i is called the index of \lambda_i.

- The subspace M_i = Ker(A - \lambda_i I)^{l_i} is invariant under A. Moreover, the space C^n is the direct sum of the subspaces M_i, i = 1, ..., p. Let m_i = dim(M_i).

- In each invariant subspace M_i there are \gamma_i independent eigenvectors, i.e., elements of Ker(A - \lambda_i I), with \gamma_i <= m_i. It turns out that this set of vectors can be completed to form a basis of M_i by adding to it elements of Ker(A - \lambda_i I)^2, then elements of Ker(A - \lambda_i I)^3, and so on. These elements are generated by starting separately from each eigenvector u, i.e., an element of Ker(A - \lambda_i I), and then seeking an element that satisfies (A - \lambda_i I) z_1 = u. Then, more generally, we construct z_{i+1} by solving the equation (A - \lambda_i I) z_{i+1} = z_i when possible. The vector z_i belongs to Ker(A - \lambda_i I)^{i+1} and is sometimes called a principal vector. The process is continued until no more principal vectors are found. There are at most l_i principal vectors for each of the \gamma_i eigenvectors.

The final step is to represent the original matrix A with respect to the basis made up of the p bases of the invariant subspaces M_i defined in the previous step.

The matrix representation J of A in the new basis described above has the block diagonal structure

    X^{-1} A X = J = \begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_p \end{pmatrix},

where each J_i corresponds to the subspace M_i associated with the eigenvalue \lambda_i. It is of size m_i and it has itself the following structure,

    J_i = \begin{pmatrix} J_{i1} & & \\ & J_{i2} & \\ & & \ddots \\ & & & J_{i\gamma_i} \end{pmatrix}
    \quad with \quad
    J_{ik} = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}.

Each of the blocks J_{ik} corresponds to a different eigenvector associated with the eigenvalue \lambda_i. Its size is equal to the number of principal vectors found for the eigenvector to which the block is associated and does not exceed l_i.

Theorem. Any matrix A can be reduced to a block diagonal matrix consisting of p diagonal blocks, each associated with a distinct eigenvalue. Each diagonal block number i has itself a block diagonal structure consisting of \gamma_i sub-blocks, where \gamma_i is the geometric multiplicity of the eigenvalue \lambda_i. Each of the sub-blocks, referred to as a Jordan block, is an upper bidiagonal matrix of size not exceeding l_i, with the constant \lambda_i on the diagonal and the constant one on the super-diagonal.

We refer to the i-th diagonal block, i = 1, ..., p, as the i-th Jordan submatrix (sometimes "Jordan box"). The Jordan submatrix number i starts in column

    j_i = m_1 + m_2 + ... + m_{i-1} + 1.

From the above form it is not difficult to see that M_i = Ker(A - \lambda_i I)^{l_i} is merely the span of the columns j_i, j_i + 1, ..., j_i + m_i - 1 of the matrix X. These vectors are all the eigenvectors and the principal vectors associated with the eigenvalue \lambda_i.

Since A and J are similar matrices, their characteristic polynomials are identical. Hence, it is clear that the algebraic multiplicity of an eigenvalue \lambda_i is equal to the dimension of M_i:

    \mu_i = m_i = dim(M_i).

As a result, \mu_i >= \gamma_i.

Because C^n is the direct sum of the subspaces M_i, i = 1, ..., p, each vector x can be written in a unique way as

    x = x_1 + x_2 + ... + x_i + ... + x_p,

where x_i is a member of the subspace M_i. The linear transformation defined by

    P_i x = x_i

is a projector onto M_i along the direct sum of the subspaces M_j, j != i. The family of projectors P_i, i = 1, ..., p, satisfies the following properties:

    Ran(P_i) = M_i;
    P_i P_j = P_j P_i = 0, if i != j;
    \sum_{i=1}^{p} P_i = I.

In fact, it is easy to see that the above three properties define a decomposition of C^n into a direct sum of the images of the projectors P_i in a unique way. More precisely, any family of projectors that satisfies the above three properties is uniquely determined and is associated with the decomposition of C^n into the direct sum of the images of the P_i's.

It is helpful for the understanding of the Jordan canonical form to determine the matrix representation of the projectors P_i. Consider the matrix Jhat_i which is obtained from the Jordan matrix by replacing all the diagonal submatrices by zero blocks except the i-th submatrix, which is replaced by the identity matrix:

    Jhat_i = diag(0, ..., 0, I, 0, ..., 0).

In other words, if each i-th Jordan submatrix starts at the column number j_i, then the columns of Jhat_i will be zero columns except columns j_i, ..., j_i + m_i - 1, which are the corresponding columns of the identity matrix. Let Phat_i = X Jhat_i X^{-1}. Then it is not difficult to verify that Phat_i is a projector and that:

1. The range of Phat_i is the span of columns j_i, ..., j_i + m_i - 1 of the matrix X. This is the same subspace as M_i.
2. Phat_i Phat_j = Phat_j Phat_i = 0 whenever i != j.
3. Phat_1 + Phat_2 + ... + Phat_p = I.

According to our observation concerning the uniqueness of a family of projectors that satisfy these three properties, this implies that

    Phat_i = P_i,  i = 1, ..., p.

Example. Let us assume that the eigenvalue \lambda_i is simple. Then,

    P_i = X e_i e_i^H X^{-1} = u_i w_i^H,

in which we have defined u_i = X e_i and w_i = X^{-H} e_i. It is easy to show that u_i and w_i are right and left eigenvectors, respectively, associated with \lambda_i and normalized so that w_i^H u_i = 1.

Consider now the matrix Dhat_i obtained from the Jordan form of A by replacing each Jordan submatrix by a zero block, except the i-th submatrix, which is obtained by zeroing its diagonal elements, i.e.,

    Dhat_i = diag(0, ..., 0, J_i - \lambda_i I, 0, ..., 0).

Define D_i = X Dhat_i X^{-1}. Then it is a simple exercise to show, by means of the explicit expression for P_i, that

    D_i = (A - \lambda_i I) P_i.

Moreover, D_i^{l_i} = 0, i.e., D_i is a nilpotent matrix of index l_i. We are now ready to state the following important theorem, which can be viewed as an alternative mathematical formulation of the theorem on Jordan forms.

Theorem. Every square matrix A admits the decomposition

    A = \sum_{i=1}^{p} (\lambda_i P_i + D_i),

where the family of projectors {P_i}_{i=1,...,p} satisfies the three conditions stated above, and where D_i = (A - \lambda_i I) P_i is a nilpotent operator of index l_i.

Proof. From the relations above we have

    A P_i = \lambda_i P_i + D_i,  i = 1, ..., p.

Summing up the above equalities for i = 1, ..., p we get

    A \sum_{i=1}^{p} P_i = \sum_{i=1}^{p} (\lambda_i P_i + D_i).

The proof follows by substituting \sum_{i=1}^{p} P_i = I into the left-hand side.

The projector P_i is called the spectral projector associated with the eigenvalue \lambda_i. The linear operator D_i is called the nilpotent associated with \lambda_i. The decomposition above is referred to as the spectral decomposition of A. Additional properties that are easy to prove from the various expressions of P_i and D_i are the following:

    P_i D_j = D_j P_i = \delta_{ij} D_i,
    A P_i = P_i A = P_i A P_i = \lambda_i P_i + D_i,
    A^k P_i = P_i A^k = P_i A^k P_i = P_i (\lambda_i I + D_i)^k = (\lambda_i I + D_i)^k P_i,
    A P_i = [x_{j_i}, ..., x_{j_i + m_i - 1}] B_i [y_{j_i}, ..., y_{j_i + m_i - 1}]^H,

where B_i is the i-th Jordan submatrix and where the columns y_j are the corresponding columns of the matrix X^{-H}.

Corollary. For any matrix norm ||.||, the following relation holds:

    \lim_{k \to \infty} ||A^k||^{1/k} = \rho(A).

Proof. The proof of this corollary is the subject of one of the exercises.

Another way of stating the above corollary is that there is a sequence \epsilon_k such that

    ||A^k|| = ( \rho(A) + \epsilon_k )^k,

where \lim_{k \to \infty} \epsilon_k = 0.
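The corollary is easy to observe numerically. The following Python sketch (an added illustration) shows ||A^k||^{1/k} slowly approaching the spectral radius for a small non-normal matrix.

    # Minimal sketch: numerical check of lim ||A^k||^{1/k} = rho(A).
    import numpy as np

    A = np.array([[0.9, 5.0],
                  [0.0, 0.5]])
    rho = max(abs(np.linalg.eigvals(A)))   # rho(A) = 0.9

    for k in (1, 5, 20, 100):
        norm_k = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
        print(k, norm_k, rho)              # norm_k tends to rho(A) as k grows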

The Schur Canonical Form

We will now show that any matrix is unitarily similar to an upper triangular matrix. The only result needed to prove the following theorem is that any vector of 2-norm one can be completed by n-1 additional vectors to form an orthonormal basis of C^n.

Theorem. For any given matrix A there exists a unitary matrix Q such that Q^H A Q = R is upper triangular.

Proof. The proof is by induction over the dimension n. The result is trivial for n = 1. Let us assume that it is true for n - 1 and consider any matrix A of size n. The matrix admits at least one eigenvector u that is associated with an eigenvalue \lambda. We assume without loss of generality that ||u||_2 = 1. We can complete the vector u into an orthonormal set, i.e., we can find an n x (n-1) matrix V such that the n x n matrix U = [u, V] is unitary. Then we have AU = [\lambda u, AV] and hence,

    U^H A U = \begin{pmatrix} u^H \\ V^H \end{pmatrix} [\lambda u, AV]
            = \begin{pmatrix} \lambda & u^H A V \\ 0 & V^H A V \end{pmatrix}.

We now use our induction hypothesis for the (n-1) x (n-1) matrix B = V^H A V: there exists an (n-1) x (n-1) unitary matrix Q_1 such that Q_1^H B Q_1 = R_1 is upper triangular. Let us define the n x n matrix

    Qhat_1 = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}

and multiply both members of the above equality by Qhat_1^H from the left and Qhat_1 from the right. The resulting matrix is clearly upper triangular, and this shows that the result is true for A, with Q = U Qhat_1, which is a unitary n x n matrix.

A simpler proof that uses the Jordan canonical form and the QR decomposition is the subject of one of the exercises. Since the matrix R is triangular and similar to A, its diagonal elements are equal to the eigenvalues of A, ordered in a certain manner. In fact, it is easy to extend the proof of the theorem to show that we can obtain this factorization with any order we want for the eigenvalues. One might ask the question as to which order might be best numerically, but the answer to the question goes beyond the scope of this book. Despite its simplicity, the above theorem has far reaching consequences, some of which will be examined in the next section.

It is important to note that for any k <= n the subspace spanned by the first k columns of Q is invariant under A. This is because from the Schur decomposition we have, for 1 <= j <= k,

    A q_j = \sum_{i=1}^{j} r_{ij} q_i.

In fact, letting Q_k = [q_1, q_2, ..., q_k] and R_k be the principal leading submatrix of dimension k of R, the above relation can be rewritten as

    A Q_k = Q_k R_k,

which we refer to as the partial Schur decomposition of A. The simplest case of this decomposition is when k = 1, in which case q_1 is an eigenvector. The vectors q_i are usually referred to as Schur vectors. Note that the Schur vectors are not unique and, in fact, they depend on the order chosen for the eigenvalues.

A slight variation on the Schur canonical form is the quasi-Schur form, also referred to as the real Schur form. Here, diagonal blocks of size 2 x 2 are allowed in the upper triangular matrix R. The reason for this is to avoid complex arithmetic when the original matrix is real. A 2 x 2 block is associated with each complex conjugate pair of eigenvalues of the matrix.

Example. Consider a small real matrix A (the numerical entries of this example are omitted here). The matrix A has a pair of complex conjugate eigenvalues and one real eigenvalue. The standard (complex) Schur form is given by a pair of matrices V and S, where V is unitary and S is upper triangular with the three eigenvalues on its diagonal. It is possible to avoid complex arithmetic by using the quasi-Schur form, which consists of a pair of real matrices U and R, where U is orthogonal and R is block upper triangular, with a 2 x 2 diagonal block corresponding to the complex conjugate pair of eigenvalues.
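Since the numerical values of the example did not survive, the following Python sketch (an added illustration, using an assumed 3 x 3 matrix with eigenvalues 1 +/- 4i and 3, not the book's example) shows both forms computed with scipy.linalg.schur.

    # Minimal sketch: complex Schur form and real (quasi-Schur) form.
    import numpy as np
    from scipy.linalg import schur

    A = np.array([[ 1.0, -4.0, 0.0],
                  [ 4.0,  1.0, 0.0],
                  [ 0.0,  0.0, 3.0]])        # eigenvalues 1 +/- 4i and 3

    R, Q = schur(A, output='complex')        # Q^H A Q = R, R upper triangular
    print(np.allclose(Q @ R @ Q.conj().T, A))
    print(np.diag(R))                        # the eigenvalues, in some order

    T, U = schur(A, output='real')           # quasi-Schur form: a 2x2 block for 1 +/- 4i
    print(T)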

We would like to conclude this section by pointing out that the Schur and the quasi-Schur forms of a given matrix are in no way unique. In addition to the dependence on the ordering of the eigenvalues, any column of Q can be multiplied by a complex sign e^{i\theta} and a new corresponding R can be found. For the quasi-Schur form there are infinitely many ways of selecting the 2 x 2 blocks, corresponding to applying arbitrary rotations to the columns of Q associated with these blocks.

Normal and Hermitian Matrices

In this section we look at the specific properties of normal matrices and Hermitian matrices regarding, among other things, their spectra and some important optimality properties of their eigenvalues. The most common normal matrices that arise in practice are Hermitian or skew-Hermitian. In fact, symmetric real matrices form a large part of the matrices that arise in practical eigenvalue problems.

Normal Matrices

By definition a matrix is said to be normal if it satisfies the relation

    A^H A = A A^H.

An immediate property of normal matrices is stated in the following proposition.

Proposition. If a normal matrix is triangular then it is necessarily a diagonal matrix.

Proof. Assume, for example, that A is upper triangular and normal, and let us compare the first diagonal element of the left-hand side matrix of the above relation with the corresponding element of the matrix on the right-hand side. We obtain that

    |a_{11}|^2 = \sum_{j=1}^{n} |a_{1j}|^2,

which shows that the elements of the first row are zeros except for the diagonal one. The same argument can now be used for the second row, the third row, and so on to the last row, to show that a_{ij} = 0 for i != j.

As a consequence of this, we have the following important result.

Theorem. A matrix is normal if and only if it is unitarily similar to a diagonal matrix.

Proof. It is straightforward to verify that a matrix which is unitarily similar to a diagonal matrix is normal. Let us now show that any normal matrix A is unitarily similar to a diagonal matrix. Let A = Q R Q^H be the Schur canonical form of A, where we recall that Q is unitary and R is upper triangular. By the normality of A we have

    Q R^H Q^H Q R Q^H = Q R Q^H Q R^H Q^H,

or

    Q R^H R Q^H = Q R R^H Q^H.

Upon multiplication by Q^H on the left and Q on the right, this leads to the equality R^H R = R R^H, which means that R is normal, and according to the previous proposition this is only possible if R is diagonal.

Thus, any normal matrix is diagonalizable and admits an orthonormal basis of eigenvectors, namely the column vectors of Q.

Clearly, Hermitian matrices are just a particular case of normal matrices. Since a normal matrix satisfies the relation A = Q D Q^H, with D diagonal and Q unitary, the eigenvalues of A are the diagonal entries of D. Therefore, if these entries are real it is clear that we will have A^H = A. This is restated in the following corollary.

Corollary. A normal matrix whose eigenvalues are real is Hermitian.

As will be seen shortly, the converse is also true, in that a Hermitian matrix has real eigenvalues.

An eigenvalue \lambda of any matrix satisfies the relation

    \lambda = (Au, u) / (u, u),

where u is an associated eigenvector. More generally one might consider the complex scalars

    \mu(x) = (Ax, x) / (x, x),

defined for any nonzero vector in C^n. These ratios are referred to as Rayleigh quotients and are important for both theoretical and practical purposes. The set of all possible Rayleigh quotients as x runs over C^n is called the field of values of A. This set is clearly bounded since each |\mu(x)| is bounded by the 2-norm of A, i.e., |\mu(x)| <= ||A||_2 for all x.

If a matrix is normal, then any vector x in C^n can be expressed as

    x = \sum_{i=1}^{n} \xi_i q_i,

where the vectors q_i form an orthogonal basis of eigenvectors, and the expression for \mu(x) becomes

    \mu(x) = (Ax, x) / (x, x)
           = \sum_{k=1}^{n} \lambda_k |\xi_k|^2 / \sum_{k=1}^{n} |\xi_k|^2
           = \sum_{k=1}^{n} \beta_k \lambda_k,

where the coefficients

    \beta_i = |\xi_i|^2 / \sum_{k=1}^{n} |\xi_k|^2

are nonnegative and satisfy \sum_{i=1}^{n} \beta_i = 1. From a well-known characterization of convex hulls due to Hausdorff (Hausdorff's convex hull theorem), this means that the set of all possible Rayleigh quotients as x runs over all of C^n is equal to the convex hull of the \lambda_i's. This leads to the following theorem.

Theorem. The field of values of a normal matrix is equal to the convex hull of its spectrum.

The question that arises next is whether or not this is also true for nonnormal matrices, and the answer is no, i.e., the convex hull of the eigenvalues and the field of values of a nonnormal matrix are different in general (see the exercises for an example). As a generic example, one can take any nonsymmetric real matrix that has real eigenvalues only; its field of values will contain imaginary values. It has been shown (Hausdorff) that the field of values of a matrix is a convex set. Since the eigenvalues are members of the field of values, their convex hull is contained in the field of values. This is summarized in the following proposition.

Proposition. The field of values of an arbitrary matrix is a convex set which contains the convex hull of its spectrum. It is equal to the convex hull of the spectrum when the matrix is normal.
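The contrast between the normal and nonnormal cases is easy to see experimentally. The following Python sketch (an added illustration on assumed example matrices) samples Rayleigh quotients: for a normal matrix with real eigenvalues they stay real, whereas for a nonnormal matrix with real eigenvalues they acquire imaginary parts.

    # Minimal sketch: sampling the field of values via random Rayleigh quotients.
    import numpy as np

    def sample_field_of_values(A, nsamples=2000, seed=0):
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        vals = []
        for _ in range(nsamples):
            x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
            vals.append(np.vdot(x, A @ x) / np.vdot(x, x))   # mu(x) = (Ax, x)/(x, x)
        return np.array(vals)

    N = np.diag([1.0, 2.0, 3.0])                  # normal: field of values = [1, 3] on the real axis
    J = np.array([[1.0, 10.0], [0.0, 1.0]])       # nonnormal, real eigenvalues only

    print(np.max(np.abs(sample_field_of_values(N).imag)))   # essentially zero
    print(np.max(np.abs(sample_field_of_values(J).imag)))   # clearly nonzero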

Hermitian Matrices

A first and important result on Hermitian matrices is the following.

Theorem. The eigenvalues of a Hermitian matrix are real, i.e., \sigma(A) \subset R.

Proof. Let \lambda be an eigenvalue of A and u an associated eigenvector of 2-norm unity. Then

    \lambda = (Au, u) = (u, Au) = \overline{(Au, u)} = \bar{\lambda},

so that \lambda is real.

Moreover, it is not difficult to see that if, in addition, the matrix is real, then the eigenvectors can be chosen to be real (see the exercises). Since a Hermitian matrix is normal, an immediate consequence of the theorem on normal matrices is the following result.

Theorem. Any Hermitian matrix is unitarily similar to a real diagonal matrix.

In particular, a Hermitian matrix admits a set of orthonormal eigenvectors that form a basis of C^n.

In the proof above we used the fact that the inner products (Au, u) are real. More generally it is clear that any Hermitian matrix is such that (Ax, x) is real for any vector x in C^n. It turns out that the converse is also true, i.e., it can be shown that if (Az, z) is real for all vectors z in C^n, then the matrix A is Hermitian (see the exercises).

Eigenvalues of Hermitian matrices can be characterized by optimality properties of the Rayleigh quotients. The best known of these is the Min-Max principle. Let us order all the eigenvalues of A in descending order:

    \lambda_1 >= \lambda_2 >= ... >= \lambda_n.

Here the eigenvalues are not necessarily distinct and they are repeated, each according to its multiplicity. In what follows, we denote by S a generic subspace of C^n. Then we have the following theorem.

Theorem (Min-Max theorem). The eigenvalues of a Hermitian matrix A are characterized by the relation

    \lambda_k = \min_{S, dim(S) = n-k+1} \ \max_{x \in S, x \ne 0} \ (Ax, x) / (x, x).

Proof. Let {q_i}_{i=1,...,n} be an orthonormal basis of C^n consisting of eigenvectors of A associated with \lambda_1, ..., \lambda_n, respectively. Let S_k be the subspace spanned by the first k of these vectors, and denote by \mu(S) the maximum of (Ax, x)/(x, x) over all nonzero vectors of a subspace S. Since the dimension of S_k is k, a well-known theorem of linear algebra shows that its intersection with any subspace S of dimension n-k+1 is not reduced to {0}, i.e., there is a nonzero vector x in S \cap S_k. For this x = \sum_{i=1}^{k} \xi_i q_i we have

    (Ax, x) / (x, x) = \sum_{i=1}^{k} \lambda_i |\xi_i|^2 / \sum_{i=1}^{k} |\xi_i|^2 >= \lambda_k,

so that \mu(S) >= \lambda_k.

Consider, on the other hand, the particular subspace S_0 of dimension n-k+1 which is spanned by q_k, ..., q_n. For each vector x in this subspace we have

    (Ax, x) / (x, x) = \sum_{i=k}^{n} \lambda_i |\xi_i|^2 / \sum_{i=k}^{n} |\xi_i|^2 <= \lambda_k,

so that \mu(S_0) <= \lambda_k. In other words, as S runs over all (n-k+1)-dimensional subspaces, \mu(S) is always >= \lambda_k, and there is at least one subspace S_0 for which \mu(S_0) <= \lambda_k, which shows the result.

This result is attributed to Courant and Fischer, and to Poincare and Weyl. It is often referred to as the Courant-Fischer min-max principle or theorem. As a particular case, the largest eigenvalue of A satisfies

    \lambda_1 = \max_{x \ne 0} (Ax, x) / (x, x).

Actually, there are four different ways of rewriting the above characterization. The second formulation is

    \lambda_k = \max_{S, dim(S) = k} \ \min_{x \in S, x \ne 0} \ (Ax, x) / (x, x),

and the two other ones can be obtained from the above two formulations by simply relabeling the eigenvalues increasingly instead of decreasingly. Thus, with our labeling of the eigenvalues in descending order, the min-max characterization tells us that the smallest eigenvalue satisfies

    \lambda_n = \min_{x \ne 0} (Ax, x) / (x, x),

with min replaced by max if the eigenvalues are relabeled increasingly.

In order for all the eigenvalues of a Hermitian matrix to be positive, it is necessary and sufficient that

    (Ax, x) > 0,  for all x in C^n, x \ne 0.

Such a matrix is called positive definite. A matrix that satisfies (Ax, x) >= 0 for any x is said to be positive semidefinite. In particular, the matrix A^H A is semipositive definite for any rectangular matrix, since

    (A^H A x, x) = (Ax, Ax) >= 0,  for all x.

Similarly, A A^H is also a Hermitian semipositive definite matrix. The square roots of the eigenvalues of A^H A for a general rectangular matrix A are called the singular values of A and are denoted by \sigma_i. In an earlier section we stated without proof that the 2-norm of any matrix A is equal to the largest singular value \sigma_1 of A. This is now an obvious fact, because

    ||A||_2^2 = \max_{x \ne 0} ||Ax||_2^2 / ||x||_2^2
              = \max_{x \ne 0} (Ax, Ax) / (x, x)
              = \max_{x \ne 0} (A^H A x, x) / (x, x)
              = \sigma_1^2,

which results from the characterization of the largest eigenvalue given above.
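These facts about A^H A are simple to verify numerically. The following Python sketch (an added illustration) checks that the eigenvalues of A^H A are nonnegative, that their square roots are the singular values, and that ||A||_2 equals the largest of them.

    # Minimal sketch: A^H A is positive semidefinite and ||A||_2 = sigma_1.
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 3))

    eig = np.linalg.eigvalsh(A.T @ A)            # real and nonnegative
    sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A

    print(np.all(eig >= -1e-12))
    print(np.allclose(np.sort(np.sqrt(eig)), np.sort(sigma)))
    print(np.isclose(np.linalg.norm(A, 2), sigma.max()))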

Another characterization of eigenvalues, known as the Courant characterization, is stated in the next theorem. In contrast with the min-max theorem, this property is recursive in nature.

Theorem. The eigenvalue \lambda_i and the corresponding eigenvector q_i of a Hermitian matrix are such that

    \lambda_1 = (A q_1, q_1) / (q_1, q_1) = \max_{x \in C^n, x \ne 0} (Ax, x) / (x, x),

and, for k > 1,

    \lambda_k = (A q_k, q_k) / (q_k, q_k)
              = \max_{x \ne 0, q_1^H x = ... = q_{k-1}^H x = 0} (Ax, x) / (x, x).

In other words, the maximum of the Rayleigh quotient over a subspace that is orthogonal to the first k-1 eigenvectors is equal to \lambda_k and is achieved for the eigenvector q_k associated with \lambda_k. The proof follows easily from the expansion of the Rayleigh quotient given earlier.

Nonnegative Matrices

A nonnegative matrix is a matrix whose entries are nonnegative,

    a_{ij} >= 0.

Nonnegative matrices arise in many applications and play a crucial role in the theory of matrices. They play, for example, a key role in the analysis of convergence of iterative methods for partial differential equations. They also arise in economics, queuing theory, chemical engineering, etc.

A matrix is said to be reducible if there is a permutation matrix P such that P A P^T is block upper triangular; otherwise it is irreducible. An important result concerning nonnegative matrices is the following theorem, known as the Perron-Frobenius theorem.

Theorem. Let A be a real n x n nonnegative irreducible matrix. Then \lambda = \rho(A), the spectral radius of A, is a simple eigenvalue of A. Moreover, there exists an eigenvector u with positive elements associated with this eigenvalue.
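The theorem is easy to observe on a small example. The following Python sketch (an added illustration on an assumed matrix) computes the eigenvalues of a nonnegative irreducible matrix and exhibits the positive Perron eigenvector.

    # Minimal sketch: the Perron-Frobenius eigenpair of a nonnegative irreducible matrix.
    import numpy as np

    A = np.array([[0.0, 2.0, 1.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 1.0, 0.0]])         # nonnegative and irreducible

    vals, vecs = np.linalg.eig(A)
    k = np.argmax(abs(vals))                # index of the dominant eigenvalue
    u = vecs[:, k].real
    u = u / u[np.argmax(abs(u))]            # scale so the dominant entry is +1

    print(vals[k].real)                     # rho(A), a simple real eigenvalue
    print(u)                                # all entries positive after scaling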

Problems

P-1.1 Show that two eigenvectors associated with two distinct eigenvalues are linearly independent. More generally, show that a family of eigenvectors associated with distinct eigenvalues forms a linearly independent family.

P-1.2 Show that if \lambda \ne 0 is any eigenvalue of the matrix AB, then it is also an eigenvalue of the matrix BA. Start with the particular case where A and B are square and B is nonsingular, then consider the more general case where A, B may be singular or even rectangular (but such that AB and BA are square).

P-1.3 Show that the Frobenius norm is consistent. Can this norm be associated to two vector norms via the induced-norm formula? What is the Frobenius norm of a diagonal matrix? What is the p-norm of a diagonal matrix (for any p)?

P-1.4 Find the Jordan canonical form of the matrix given in the text (its entries are omitted here). Same question for the matrix obtained by changing one of its entries.

P-1.5 Give an alternative proof of the theorem on the Schur form by starting from the Jordan canonical form. [Hint: write A = X J X^{-1} and use the QR decomposition of X.]

P-1.6 Show, from the definition of the determinant used in this chapter, that the characteristic polynomial is a polynomial of degree n for an n x n matrix.

P-1.7 Show that the characteristic polynomials of two similar matrices are equal.

P-1.8 Show that

    \lim_{k \to \infty} ||A^k||^{1/k} = \rho(A)

for any matrix norm. [Hint: use the Jordan canonical form or the spectral decomposition theorem.]

P-1.9 Let X be a nonsingular matrix and, for any matrix norm ||.||, define ||A||_X = ||AX||. Show that this is indeed a matrix norm. Is this matrix norm consistent? Similar questions for ||XA|| and ||YAX||, where Y is also a nonsingular matrix. These norms are not, in general, associated with any vector norms, i.e., they cannot be defined by a formula of the induced-norm form. Why? What about the particular case ||A||' = ||X A X^{-1}||?

P-1.10 Find the field of values of the matrix given in the text (its entries are omitted here) and verify that it is not equal to the convex hull of its eigenvalues.

P-1.11 Show that any matrix can be written as the sum of a Hermitian and a skew-Hermitian matrix (or the sum of a symmetric and a skew-symmetric matrix).

P-1.12 Show that for a skew-Hermitian matrix S we have

    Re (Sx, x) = 0 for any x in C^n.

P-1.13 Given an arbitrary matrix S, show that if (Sx, x) = 0 for all x in C^n, then we must have

    (Sy, z) + (Sz, y) = 0  for all y, z in C^n.

[Hint: expand (S(y + z), y + z).]

P-1.14 Using the result of the previous two problems, show that if (Ax, x) is real for all x in C^n, then A must be Hermitian. Would this result be true if we were to replace the assumption by: (Ax, x) is real for all real x? Explain.

P-1.15 The definition of a positive definite matrix is that (Ax, x) be real and positive for all real vectors x. Show that this is equivalent to requiring that the Hermitian part of A, namely (1/2)(A + A^H), be Hermitian positive definite.

P-1.16 Let A be a real symmetric matrix and \lambda an eigenvalue of A. Show that if u is an eigenvector associated with \lambda, then so is \bar{u}. As a result, prove that for any eigenvalue of a real symmetric matrix there is an associated eigenvector which is real.

P-1.17 Show that a Hessenberg matrix H such that h_{j+1,j} \ne 0, j = 1, 2, ..., n-1, cannot be derogatory.

Notes and References. For additional reading on the material presented in this chapter, see Golub and Van Loan and Stewart. More details on matrix eigenvalue problems can be found in Gantmacher's book and in Wilkinson. Stewart and Sun's recent book devotes a separate chapter to matrix norms and contains a wealth of information. Some of the terminology we used is borrowed from Chatelin and Kato. For a good overview of the linear algebra aspects of matrix theory and a complete proof of Jordan's canonical form we recommend Halmos' book.

Chapter II

Sparse Matrices

The eigenvalue problems that arise in practice often involve very large matrices. The meaning of "large" is relative, and it is changing rapidly with the progress of computer technology. A matrix of size a few hundreds can be considered large if one is working on a workstation, while, similarly, a matrix whose size is in the millions can be considered large if one is using a supercomputer. Fortunately, many of these matrices are also sparse, i.e., they have very few nonzeros. Again, it is not clear how few nonzeros a matrix must have before it can be called sparse. A commonly used definition due to Wilkinson is to say that a matrix is sparse whenever it is possible to take advantage of the number and location of its nonzero entries. By this definition a tridiagonal matrix is sparse, but so would also be a triangular matrix, which may not be as convincing. It is probably best to leave this notion somewhat vague, since the decision as to whether or not a matrix should be considered sparse is a practical one that is ultimately made by the user.

Introduction

The natural idea of taking advantage of the zeros of a matrix and their location has been exploited for a long time. In the simplest situations, such as for banded or tridiagonal matrices, special techniques are straightforward to develop. However, the notion of exploiting sparsity for general sparse matrices, i.e., sparse matrices with irregular structure, has become popular only in the last few decades. The main issue, and the first one to be addressed by sparse matrix technology, is to devise direct solution methods for linear systems that are economical both in terms of storage and computational effort. These sparse direct solvers allow one to handle very large problems that could not be tackled by the usual dense solvers. We will briefly discuss the solution of large sparse linear systems later in this chapter.

Figure: A finite element grid model.

There are basically two broad types of sparse matrices: structured and unstructured. A structured sparse matrix is one whose nonzero entries, or square blocks of nonzero entries, form a regular pattern, often along a small number of diagonals. A matrix with irregularly located entries is said to be irregularly structured. The best example of a regularly structured matrix is that of a matrix that consists only of a few diagonals. The next figure shows a small irregularly structured sparse matrix associated with the finite element grid problem shown in the figure above.

Figure: Sparse matrix associated with the finite element grid of the previous figure.

Although the difference between the two types of matrices may not matter that much for direct solvers, it may be important for eigenvalue methods or iterative methods for solving linear systems. In these methods, one of the essential operations is the matrix-by-vector product. The performance of these operations on supercomputers can differ significantly from one data structure to another. For example, diagonal storage schemes are ideal for vector machines, whereas more general schemes may suffer on such machines because of the need to use indirect addressing.

In the next section we will discuss some of the storage schemes used for sparse matrices. Then we will see how some of the simplest matrix operations with sparse matrices can be performed. We will then give an overview of sparse linear system solution methods. The last two sections discuss test matrices and a set of tools for working with sparse matrices called SPARSKIT.

Storage Schemes

In order to take advantage of the large number of zero elements, special schemes are required to store sparse matrices. Clearly, the main goal is to represent only the nonzero elements and, at the same time, to be able to perform the commonly needed matrix operations. In the following we will denote by Nz the total number of nonzero elements. We describe only the most popular schemes, but additional details can be found in the book by Duff, Erisman, and Reid.

The simplest storage scheme for sparse matrices is the so-called coordinate format. The data structure consists of three arrays: a real array containing all the real (or complex) values of the nonzero elements of A in any order, an integer array containing their row indices, and a second integer array containing their column indices. All three arrays are of length Nz. Thus the matrix

    A =
        |  1.   0.   0.   2.   0. |
        |  3.   4.   0.   5.   0. |
        |  6.   0.   7.   8.   9. |
        |  0.   0.  10.  11.   0. |
        |  0.   0.   0.   0.  12. |

will be represented, for example, by

    AA = ( 12.  9.  7.  5.  1.  2.  11.  3.  6.  4.  8.  10. )
    JR = (  5   3   3   2   1   1    4   2   3   2   3    4  )
    JC = (  5   5   3   4   1   4    4   1   1   2   4    3  )

In the above example we have on purpose listed the elements in an arbitrary order. In fact, it would have been more natural to list the elements by row or by column. If we listed the elements row-wise, we would notice that the array JC contains redundant information and may be replaced by an array that points to the beginning of each row instead. This would entail non-negligible savings in storage. The new data structure consists of three arrays with the following functions.

- A real array AA contains the real values a_ij stored row by row, from row 1 to n. The length of AA is Nz.

- An integer array JA contains the column indices of the elements a_ij as stored in the array AA. The length of JA is Nz.

- An integer array IA contains the pointers to the beginning of each row in the arrays AA and JA. Thus the content of IA(i) is the position in the arrays AA and JA where the i-th row starts. The length of IA is n+1, with IA(n+1) containing the number IA(1) + Nz, i.e., the address in AA and JA of the beginning of a fictitious row n+1.

Thus, the above matrix could be stored as follows:

    AA = ( 1.  2.  3.  4.  5.  6.  7.  8.  9.  10.  11.  12. )
    JA = ( 1   4   1   2   4   1   3   4   5    3    4    5  )
    IA = ( 1   3   6  10  12  13 )

This format is probably the most commonly used to store general sparse matrices. We will refer to it as the Compressed Sparse Row (CSR) format. An advantage of this scheme over the coordinate scheme is that it is often more amenable to performing typical computations. On the other hand, the coordinate scheme is attractive because of its simplicity and its flexibility. For this reason it is used as the entry format in software packages such as the Harwell library.
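Since matrices are often assembled in coordinate format and then converted to CSR for computation, a conversion routine is a natural companion to these data structures. The following Fortran sketch is an illustration only (it is not taken from the text): it converts the coordinate arrays AA, JR, JC into CSR arrays AO, JAO, IAO by first counting the elements of each row. SPARSKIT's FORMATS module, described later in this chapter, provides conversion routines of this kind.

C     Sketch of a coordinate-to-CSR conversion (illustrative only).
C     Input : NROW, NNZ, AA(NNZ), JR(NNZ), JC(NNZ)  -- COO format
C     Output: AO(NNZ), JAO(NNZ), IAO(NROW+1)        -- CSR format
      SUBROUTINE COO2CSR (NROW, NNZ, AA, JR, JC, AO, JAO, IAO)
      INTEGER NROW, NNZ, JR(NNZ), JC(NNZ), JAO(NNZ), IAO(NROW+1)
      REAL*8  AA(NNZ), AO(NNZ)
      INTEGER I, K, IPOS
C     Count the number of elements in each row.
      DO I = 1, NROW+1
         IAO(I) = 0
      ENDDO
      DO K = 1, NNZ
         IAO(JR(K)) = IAO(JR(K)) + 1
      ENDDO
C     Turn the counts into pointers to the start of each row.
      IPOS = 1
      DO I = 1, NROW
         K = IAO(I)
         IAO(I) = IPOS
         IPOS = IPOS + K
      ENDDO
      IAO(NROW+1) = IPOS
C     Scatter the entries into their rows.
      DO K = 1, NNZ
         I = JR(K)
         IPOS = IAO(I)
         AO(IPOS)  = AA(K)
         JAO(IPOS) = JC(K)
         IAO(I) = IPOS + 1
      ENDDO
C     Shift the row pointers back down by one position.
      DO I = NROW, 1, -1
         IAO(I+1) = IAO(I)
      ENDDO
      IAO(1) = 1
      END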

There are a number of variations to the Compressed Sparse Row format. The most obvious variation is to store the columns instead of the rows. The corresponding scheme is called the Compressed Sparse Column (CSC) scheme. Another common variation exploits the fact that the diagonal elements of many matrices are usually all nonzero and/or that they are accessed more often than the rest of the elements. As a result they can be stored separately. In fact, what we refer to as the Modified Sparse Row (MSR) format consists of only two arrays: a real array AA and an integer array JA. The first n positions in AA contain the diagonal elements of the matrix, in order. The position n+1 of the array AA is not used, or may sometimes be used to carry some other information concerning the matrix. Starting at position n+2, the nonzero elements of AA, excluding the diagonal elements, are stored row-wise. Corresponding to each element AA(k), the integer JA(k) is the column index of the element AA(k) in the matrix A. The n+1 first positions of JA contain the pointers to the beginning of each row in AA and JA. Thus, for the above example, the two arrays are as follows:

    AA = ( 1.  4.  7.  11.  12.   *   2.  3.  5.  6.  8.  9.  10. )
    JA = ( 7   8  10  13   14   14   4   1   4   1   4   5    3  )

The star denotes an unused location. Notice that JA(n) = JA(n+1) = 14, indicating that the last row is a zero row once the diagonal element has been removed.
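To make the role of the two MSR arrays concrete, the following Fortran sketch (not from the text; an illustration under the conventions just described) performs the product y = Ax for a matrix stored in MSR format. The diagonal part is handled first, then the off-diagonal entries of each row are accumulated.

C     Sketch of y = A*x for a matrix in Modified Sparse Row format.
C     AA, JA follow the MSR conventions described above.
      SUBROUTINE AMUXMS (N, X, Y, AA, JA)
      INTEGER N, JA(*)
      REAL*8  X(N), Y(N), AA(*)
      INTEGER I, K
C     Contribution of the diagonal, stored in AA(1:N).
      DO I = 1, N
         Y(I) = AA(I)*X(I)
      ENDDO
C     Contribution of the off-diagonal elements of row I, which
C     occupy positions JA(I) to JA(I+1)-1 of AA and JA.
      DO I = 1, N
         DO K = JA(I), JA(I+1)-1
            Y(I) = Y(I) + AA(K)*X(JA(K))
         ENDDO
      ENDDO
      END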

There are a number of applications that lead to regularly structured matrices. Among these matrices one can distinguish two different types: block matrices and diagonally structured matrices. Here we discuss only diagonally structured matrices, which are matrices whose nonzero elements are located along a small number of diagonals. To store such matrices we may store the diagonals in a rectangular array DIAG(n, Nd), where Nd is the number of diagonals. We also need to know the offsets of each of the diagonals with respect to the main diagonal; these are stored in an array IOFF(Nd). Thus, in position (i, j) of the array DIAG is located the element a_{i, i+IOFF(j)} of the original matrix, i.e.,

    DIAG(i, j) = a_{i, i+ioff(j)}.

The order in which the diagonals are stored in the columns of DIAG is unimportant in general. If many more operations are performed with the main diagonal, there may be a slight advantage in storing it in the first column. Note also that all the diagonals except the main diagonal have fewer than n elements, so there are positions in DIAG that will not be used.

For example, the following matrix, which has three diagonals,

    A =
        |  1.   0.   2.   0.   0. |
        |  3.   4.   0.   5.   0. |
        |  0.   6.   7.   0.   8. |
        |  0.   0.   9.  10.   0. |
        |  0.   0.   0.  11.  12. |

will be represented by the two arrays

    DIAG =                       IOFF = ( -1  0  2 )
        |   *   1.   2. |
        |  3.   4.   5. |
        |  6.   7.   8. |
        |  9.  10.   *  |
        | 11.  12.   *  |
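As a concrete illustration (not taken from the text), the following Fortran sketch fills DIAG and IOFF for the classical difference matrix tridiag(1, -2, 1) of order n, whose three diagonals have offsets -1, 0, and 1; the unused positions of DIAG are simply set to zero.

C     Sketch: diagonal-format storage of the n x n matrix
C     tridiag(1,-2,1); offsets of the three diagonals are -1, 0, 1.
      SUBROUTINE TRIDIA (N, DIAG, IOFF)
      INTEGER N, IOFF(3)
      REAL*8  DIAG(N,3)
      INTEGER I
      IOFF(1) = -1
      IOFF(2) =  0
      IOFF(3) =  1
      DO I = 1, N
         DIAG(I,1) =  1.0D0
         DIAG(I,2) = -2.0D0
         DIAG(I,3) =  1.0D0
      ENDDO
C     Positions falling outside the matrix are not used; zero them.
      DIAG(1,1) = 0.0D0
      DIAG(N,3) = 0.0D0
      END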

A more general scheme that has been popular on vector machines is the so-called Ellpack-Itpack format. The assumption in this scheme is that we have at most Nd nonzero elements per row, where Nd is small. Then two rectangular arrays of dimension n x Nd each are required, one real and one integer. The first, COEF, is similar to DIAG and contains the nonzero elements of A. We can store the nonzero elements of each row of the matrix in a row of the array COEF(n, Nd), completing the row by zeros if necessary. Together with COEF we need to store an integer array JCOEF(n, Nd), which contains the column positions of each entry in COEF. Thus, for the above matrix, we would have

    COEF =                       JCOEF =
        |  1.   2.   0. |            |  1   3   1 |
        |  3.   4.   5. |            |  1   2   4 |
        |  6.   7.   8. |            |  2   3   5 |
        |  9.  10.   0. |            |  3   4   4 |
        | 11.  12.   0. |            |  4   5   5 |

Note that in the above JCOEF array we have put a column number equal to the row number for the zero elements that have been added to pad the rows of COEF that correspond to shorter rows in the matrix A. This is somewhat arbitrary, and in fact any integer between 1 and n would be acceptable, except that there may be good reasons for not putting the same integers too often, for performance considerations.
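The Ellpack-Itpack arrays are easily built from a CSR representation. The following Fortran sketch (an illustration, not from the text) copies each CSR row into a row of COEF/JCOEF and pads short rows with zero values and with the row number as column index, following the convention discussed above.

C     Sketch: conversion from CSR (AA, JA, IA) to Ellpack-Itpack
C     (COEF, JCOEF) with ND columns. Assumes no row of A has more
C     than ND nonzero elements.
      SUBROUTINE CSR2EL (N, ND, AA, JA, IA, COEF, JCOEF)
      INTEGER N, ND, JA(*), IA(N+1), JCOEF(N,ND)
      REAL*8  AA(*), COEF(N,ND)
      INTEGER I, J, K, NROW
      DO I = 1, N
         NROW = IA(I+1) - IA(I)
C        Copy the nonzero entries of row I.
         DO J = 1, NROW
            K = IA(I) + J - 1
            COEF(I,J)  = AA(K)
            JCOEF(I,J) = JA(K)
         ENDDO
C        Pad the remainder of the row; the column index is set to I.
         DO J = NROW+1, ND
            COEF(I,J)  = 0.0D0
            JCOEF(I,J) = I
         ENDDO
      ENDDO
      END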

Basic Sparse Matrix Operations

One of the most important operations required in many of the algorithms for computing eigenvalues of sparse matrices is the matrix-by-vector product. We do not intend to show how these are performed for each of the storage schemes considered earlier, but only for a few important ones.

The following Fortran 8X segment shows the main loop of the matrix-by-vector operation for matrices stored in the Compressed Sparse Row format.

      DO I = 1, N
         K1 = IA(I)
         K2 = IA(I+1) - 1
         Y(I) = DOTPRODUCT(A(K1:K2), X(JA(K1:K2)))
      ENDDO

Notice that each iteration of the loop computes a different component of the resulting vector. This has the obvious advantage that each of these iterations can be performed independently. If the matrix is stored column-wise, then we would use the following code instead.

      DO J = 1, N
         K1 = IA(J)
         K2 = IA(J+1) - 1
         Y(JA(K1:K2)) = Y(JA(K1:K2)) + X(J)*A(K1:K2)
      ENDDO

In each iteration of the loop, a multiple of the j-th column is added to the result, which is assumed to have been set initially to zero. Notice now that the outer loop is no longer parallelizable. Barring the use of a different data structure, the only alternative left to improve parallelization is to attempt to split the vector operation in each inner loop, which has few operations in general. The point of this comparison is that we may have to change data structures to improve performance when dealing with supercomputers.

We now consider the matrix-by-vector product in diagonal storage:

      DO J = 1, NDIAG
         JOFF = IOFF(J)
         DO I = 1, N
            Y(I) = Y(I) + DIAG(I,J)*X(JOFF+I)
         ENDDO
      ENDDO

Here each of the diagonals is multiplied by the vector x and the result is added to the vector y. It is again assumed that the vector y has been filled with zeros before the start of the loop. From the point of view of parallelization and/or vectorization, the above code is probably the one that has the most to offer. On the other hand, its drawback is that it is not general enough.

Another important kernel in sparse matrix computations is that of solving a lower or upper triangular system. The following segment shows a simple routine for solving a unit lower triangular system.

      X(1) = Y(1)
      DO K = 2, N
         K1 = IAL(K)
         K2 = IAL(K+1) - 1
         X(K) = Y(K) - DOTPRODUCT(AL(K1:K2), X(JAL(K1:K2)))
      ENDDO

Sparse Direct Solution Methods

Solution methods for large sparse linear systems of equations are important in eigenvalue calculations mainly because they are needed in the context of the shift-and-invert techniques described in Chapter IV. In these techniques the matrix that is used in the iteration process is (A - sigma*I)^(-1), or (A - sigma*B)^(-1) B for the generalized eigenvalue problem. In this section we give a brief overview of sparse matrix techniques for solving linear systems. The difficulty here is that we must deal with problems that are not only complex, since complex shifts are likely to occur, but also indefinite. There are two broad classes of methods that are commonly used: direct and iterative. Direct methods are more commonly used in the context of shift-and-invert techniques because of their robustness when dealing with indefinite problems.

Most direct methods for sparse linear systems perform an LU factorization of the original matrix and try to reduce cost by minimizing fill-ins, i.e., nonzero elements introduced during the elimination process in positions which were initially zeros. Typical codes in this category include MA28 from the Harwell library and the Yale Sparse Matrix Package (YSMP); see the references. For a detailed view of sparse matrix techniques we refer to the book by Duff, Erisman, and Reid.

Currently, the most popular iterative methods are the preconditioned conjugate gradient-type techniques. In these techniques an approximate factorization A = LU + E of the original matrix is obtained and then the conjugate gradient method is applied to a preconditioned system, a form of which is U^(-1) L^(-1) A x = U^(-1) L^(-1) b. The conjugate gradient method is a projection method related to the Lanczos algorithm, which will be described in Chapter VI. One difficulty with conjugate gradient-type methods is that they are designed for matrices that are positive real, i.e., matrices whose symmetric parts are positive definite, and as a result they will not perform well for the types of problems that arise in the context of shift-and-invert.

Test Problems

When developing algorithms for sparse matrix computations it is desirable to be able to use test matrices that are well documented and often used by other researchers. There are many different ways in which these test matrices can be useful, but their most common use is for comparison purposes.

Two different ways of providing data sets consisting of large sparse matrices for test purposes have been used in the past. The first one is to collect sparse matrices in a well-specified format from various applications. This approach is used in the well-known Harwell-Boeing collection of test matrices. The second approach is to collect subroutines or programs that generate such matrices. This approach is taken in the SPARSKIT package, which we briefly describe in the next section.

In the course of the book we will often use two test problems in the examples. These are described in detail next. While these two examples are far from being representative of all the problems that occur, they have the advantage of being easy to reproduce. They have also been used extensively in the literature.

Figure: Random walk on a triangular grid.

Random Walk Problem

The first test problem comes from a Markov model of a random walk on a triangular grid. It was proposed by G. W. Stewart and has been used in several papers for testing eigenvalue algorithms. The problem models a random walk on a (k+1) x (k+1) triangular grid, as shown in the figure above. We label by (i, j) the node of the grid with coordinates (ih, jh), where h is the grid spacing, for i, j >= 0 with i + j <= k. A particle moves

randomly on the grid by jumping from a node (i, j) into either of its (at most 4) neighbors. The probability of jumping from node (i, j) to either node (i-1, j) or node (i, j-1) (down transition) is given by

    pd(i, j) = (i + j)/(2k),

this probability being doubled when either i or j is equal to zero. The probability of jumping from node (i, j) to either node (i+1, j) or node (i, j+1) (up transition) is given by

    pu(i, j) = 1/2 - pd(i, j).

Note that there cannot be an up transition when i + j = k, i.e., for nodes on the oblique boundary of the grid. This is reflected by the fact that in this situation pu(i, j) = 0.

The problem is to compute the steady state probability distribution of the chain, i.e., the probabilities that the particle be located in each grid cell after a very long period of time. We number the nodes from the bottom up and from left to right. The matrix P of transition probabilities is the matrix whose generic element p_kq is the probability that the particle jumps from node k to node q. This is a stochastic matrix, i.e., its elements are nonnegative and the sum of the elements in any given row is equal to one. The vector (1, 1, ..., 1)^T is an eigenvector of P associated with the eigenvalue unity. As is known, the steady state probability distribution vector is the appropriately scaled eigenvector of the transpose of P associated with the eigenvalue one. Note that the number of different states is (k+1)(k+2)/2, which is the dimension of the matrix. We will denote by Mark(k) the corresponding matrix. The figure below shows the sparsity pattern of one such matrix.

Figure: Sparsity pattern of the matrix Mark(k).
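Since the steady state vector is the eigenvector of the transpose of P associated with the eigenvalue one, a simple way to approximate it, anticipating the single-vector iterations of Chapter IV, is the power method applied to P-transpose. The Fortran sketch below is illustrative only (it is not from the text): it assumes P is held in CSR format in the arrays AA, JA, IA, performs a fixed number of iterations, and normalizes the iterate at each step so that its entries sum to one.

C     Sketch: power iteration with P-transpose to approximate the
C     steady state distribution. P is stored in CSR format.
      SUBROUTINE STEADY (N, AA, JA, IA, X, Y, NITER)
      INTEGER N, JA(*), IA(N+1), NITER
      REAL*8  AA(*), X(N), Y(N)
      INTEGER I, K, IT
      REAL*8  S
      DO IT = 1, NITER
C        y = P'*x, accumulated column-wise from the rows of P.
         DO I = 1, N
            Y(I) = 0.0D0
         ENDDO
         DO I = 1, N
            DO K = IA(I), IA(I+1)-1
               Y(JA(K)) = Y(JA(K)) + AA(K)*X(I)
            ENDDO
         ENDDO
C        Normalize so that the entries sum to one and copy back.
         S = 0.0D0
         DO I = 1, N
            S = S + Y(I)
         ENDDO
         DO I = 1, N
            X(I) = Y(I)/S
         ENDDO
      ENDDO
      END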

Chemical Reactions

The second test example models concentration waves in reaction and transport interaction of some chemical solutions in a tubular reactor. The concentrations x(tau, z), y(tau, z) of two reacting and diffusing components, where 0 <= z <= 1 represents a coordinate along the tube and tau is the time, are modeled by the system

    dx/dtau = (D_x/L^2) d2x/dz2 + f(x, y),
    dy/dtau = (D_y/L^2) d2y/dz2 + g(x, y),

with the initial conditions

    x(0, z) = x0(z),   y(0, z) = y0(z),   0 <= z <= 1,

and the Dirichlet boundary conditions

    x(tau, 0) = x(tau, 1) = x-bar,   y(tau, 0) = y(tau, 1) = y-bar.

The linear stability of the above system is traditionally studied around the steady state solution obtained by setting the partial derivatives of x and y with respect to time to zero. More precisely, the stability of the system is the same as that of the Jacobian of the right-hand side evaluated at the steady state solution. In many problems one is primarily interested in the existence of limit cycles, or equivalently the existence of periodic solutions of the system. This translates into the problem of determining whether the Jacobian of the right-hand side, evaluated at the steady state solution, admits a pair of purely imaginary eigenvalues.

We consider in particular the so-called Brusselator wave model, in which

    f(x, y) = A - (B + 1)x + x^2 y,
    g(x, y) = Bx - x^2 y.

Then the above system admits the trivial stationary solution x-bar = A, y-bar = B/A. A stable periodic solution of the system exists if the eigenvalue of largest real part of the Jacobian of the right-hand side has real part exactly zero. To verify this numerically, we first need to discretize the equations with respect to the variable z and compute the eigenvalues with largest real parts of the resulting discrete Jacobian.

For this example the exact eigenvalues are known and the problem is analytically solvable. The set of values of the parameters D_x, D_y, x-bar, y-bar, A, and B that has been commonly used in previous articles is used here as well. The bifurcation parameter is L. For small L the Jacobian has only eigenvalues with negative real parts. At a critical value of L a purely imaginary eigenvalue appears.

We discretize the interval [0, 1] using n interior points and define the mesh size h = 1/(n+1). The discrete unknown is the vector (x, y)^T, where x and y are n-dimensional vectors. Denoting by f_h and g_h the corresponding discretized functions f and g, the Jacobian of the right-hand side is a 2 x 2 block matrix in which the (1,1) and (2,2) diagonal blocks are the matrices

    (D_x/(h^2 L^2)) tridiag(1, -2, 1) + (df_h/dx)(x, y)

and

    (D_y/(h^2 L^2)) tridiag(1, -2, 1) + (dg_h/dy)(x, y),

respectively, while the (1,2) and (2,1) blocks are

    (df_h/dy)(x, y)   and   (dg_h/dx)(x, y),

respectively. Note that because the steady state solution is a constant with respect to the variable z, the Jacobians of either f_h or g_h with respect to either x or y are scaled identity matrices. We denote by A the resulting 2n x 2n Jacobian matrix. The matrix A has the structure

    A = | alpha T + f_x I       f_y I          |
        |       g_x I      beta T + g_y I      |,

in which T = tridiag(1, -2, 1), alpha = D_x/(h^2 L^2), beta = D_y/(h^2 L^2), and f_x, f_y, g_x, g_y are the scalars representing the partial derivatives of f and g at the steady state. The exact eigenvalues of A are readily computable, since there exists a quadratic relation between the eigenvalues of the matrix A and those of the classical difference matrix T.
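For reference (this formula is a standard fact added here for convenience rather than reproduced from the text), the eigenvalues of the order-n difference matrix T = tridiag(1, -2, 1) are known in closed form, which is what makes the eigenvalues of A analytically computable:

% Eigenvalues of T = tridiag(1,-2,1) of order n (standard result):
\[
  \lambda_j(T) \;=\; -4\,\sin^2\!\Big(\frac{j\pi}{2(n+1)}\Big),
  \qquad j = 1,\dots,n .
\]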

The Harwell-Boeing Collection

This large collection of test matrices has been gathered over several years by I. Duff (Harwell) and R. Grimes and J. Lewis (Boeing). At the time of this writing the collection contains several hundred matrices, contributed by researchers and engineers from many different areas. The sizes of the matrices vary from very small, such as counter-example matrices, to very large. One drawback of the collection is that it contains few non-Hermitian eigenvalue problems. Many of the eigenvalue problems in the collection are from structural engineering, and these are generalized eigenvalue problems. On the other hand, the collection provides a data structure which constitutes an excellent medium for exchanging matrices.

The matrices are stored as ASCII files with a very specific format, consisting of a four- or five-line header followed by the data containing the matrix stored in CSC format, together with any right-hand sides, initial guesses, or exact solutions.

The collection is available for public distribution from the authors.

SPARSKIT

SPARSKIT is a package aimed at providing subroutines and utilities for working with general sparse matrices. Its purpose is not so much to solve particular problems involving sparse matrices (linear systems, eigenvalue problems), but rather to make available the little tools needed to manipulate and perform simple operations with sparse matrices. For example, there are tools for exchanging data structures, e.g., passing from the Compressed Sparse Row format to the diagonal format and vice versa. There are various tools for extracting submatrices or performing other similar manipulations. SPARSKIT also provides matrix generation subroutines as well as basic linear algebra routines for sparse matrices (such as addition, multiplication, etc.).

A short description of the contents of SPARSKIT follows. The package is divided up into six modules, each having a different function. To refer to these six parts we will use the names of the subdirectories where they are held in the current version of the package.

FORMATS This module contains essentially two sets of routines. The first set, contained in the file formats.f, consists of the routines needed to translate data structures. Translation from the basic Compressed Sparse Row format to any of the other supported formats is provided, together with a routine for the reverse transformation. This way one can translate from any of the supported data structures to any other one with at most two transformations. The formats currently supported are the following.

DNS   Dense format
BND   Linpack banded format
CSR   Compressed Sparse Row format
CSC   Compressed Sparse Column format
COO   Coordinate format
ELL   Ellpack-Itpack generalized diagonal format
DIA   Diagonal format
BSR   Block Sparse Row format
MSR   Modified Compressed Sparse Row format
SSK   Symmetric Skyline format
NSK   Nonsymmetric Skyline format
JAD   The Jagged Diagonal scheme

The second set of routines, currently called unary, performs simple manipulation functions on sparse matrices, such as extracting a particular diagonal, permuting a matrix, or filtering out small elements. For reasons of space we cannot list these routines here.

BLASSM This module contains a number of routines for doing basic linear algebra with sparse matrices. It is comprised of essentially two sets of routines. Basically, the first one consists of matrix-matrix operations (e.g., multiplication of matrices) and the second consists of matrix-vector operations. The first set allows one to perform the following operations with sparse matrices, where A, B, C are sparse matrices, D is a diagonal matrix, and sigma is a scalar:

    C = AB,  C = A + B,  C = A + sigma B,  C = A +/- B^T,  C = A + sigma B^T,
    A := A + sigma I,  C = A + D.

The second set contains various routines for performing matrix-by-vector products and solving sparse triangular linear systems in different storage formats.

INOUT This module consists of routines to read and write matrices in the Harwell-Boeing format. For more information on this format and the Harwell-Boeing collection, see the references. It also provides routines for plotting the pattern of the matrix or simply dumping it in a nice format.

INFO There is currently only one subroutine in this module. Its purpose is to provide as many statistics as possible on a matrix with little cost; a report of a few dozen lines is written. For example, the code analyzes the diagonal dominance of the matrix (by rows and by columns), its degree of symmetry (structural as well as numerical), its block structure, its diagonal structure, etc.

MATGEN The set of routines in this module allows one to generate test matrices. For now there are generators for the following types of matrices:

- Five-point and seven-point matrices on rectangular regions discretizing a general elliptic partial differential equation;

- Same as above, but providing block matrices (several degrees of freedom per grid point in the PDE);

- Finite element matrices for the heat conduction problem, using various domains (including user-provided ones);

- Test matrices from the paper by Z. Zlatev, K. Schaumburg, and J. Wasniewski;

- Markov chain matrices arising from a random walk on a triangular grid, as described earlier in this chapter.

UNSUPP As is suggested by its name, this module contains various unsupported software tools that are not necessarily portable or that do not fit in any of the previous modules. For example, software for viewing matrix patterns on some workstations will be found here. For now, UNSUPP contains subroutines for visualizing matrices and a preconditioned GMRES package (with a robust preconditioner based on incomplete LU factorization with controlled fill-in).

Problems

P-2.1 Write a FORTRAN code segment to perform the matrix-by-vector product for matrices stored in Ellpack-Itpack format.

P-2.2 Write a small subroutine to perform the following operations on a sparse matrix in coordinate format, diagonal format, and CSR format: (a) count the number of nonzero elements in the main diagonal; (b) extract the diagonal whose offset is k (which may be negative); (c) add a nonzero element in position (i, j) of the matrix (assume that this position may contain a zero or a nonzero element); (d) add a given diagonal to the matrix. What is the most convenient storage scheme for each of these operations?

P-2.3 Generate explicitly the matrix Mark(k) for a small value of k. Verify that it is a stochastic matrix. Verify that 1 and -1 are eigenvalues.

Notes and References. Two good sources of reading on sparse matrix computations are the books by George and Liu and by Duff, Erisman, and Reid; the early survey by Duff is also of interest. References on applications related to eigenvalue problems and on Markov chain modeling are given in the bibliography. The SPARSKIT package is part of an ongoing project; write to the author for information. Some documentation is available in a technical report. Another manipulation package for sparse matrices, similar to SPARSKIT in spirit, is SMMS, developed by Alvarado.

Chapter III

Perturbation Theory and Error Analysis

This chapter introduces some elementary spectral theory for linear operators on finite dimensional spaces, as well as some elements of perturbation analysis. The main question that perturbation theory addresses is: how do an eigenvalue and its associated eigenvectors, spectral projector, etc., vary when the original matrix undergoes a small perturbation? This information is important for both theoretical and practical purposes. The spectral theory introduced in this chapter is the main tool used to extend what is known about spectra of matrices to general operators on infinite dimensional spaces. However, it also has some consequences in analyzing the behavior of eigenvalues and eigenvectors of matrices. The material discussed in this chapter is probably the most theoretical of the book. Fortunately, most of it is independent of the rest and may be skipped in a first reading. The notions of condition numbers and some of the results concerning error bounds are crucial in understanding the difficulties that eigenvalue routines may encounter.

Projectors and their Properties

A projector P is a linear transformation from C^n to itself which is idempotent, i.e., such that

    P^2 = P.

When P is a projector then so is I - P, and we have Ker(P) = Ran(I - P). The two subspaces Ker(P) and Ran(P) have only the element zero in common. This is because if a vector x is in Ran(P), then Px = x, and if it is also in Ker(P), then Px = 0, so that x = 0 and the intersection of the two subspaces reduces to {0}. Moreover, every element x of C^n can be written as x = Px + (I - P)x. As a result the space C^n can be decomposed as the direct sum

    C^n = Ker(P) + Ran(P).

Conversely, every pair of subspaces M and S that form a direct sum of C^n define a unique projector such that Ran(P) = M and Ker(P) = S. The corresponding transformation P is the linear mapping that maps any element x of C^n into the component x1, where x1 is the M-component in the unique decomposition x = x1 + x2 associated with the direct sum. In fact, this association is unique in that a projector is uniquely determined by its kernel and range, two subspaces that form a direct sum of C^n.

Orthogonal Projectors

An important particular case is when the subspace S is the orthogonal complement of M, i.e., when

    Ker(P) = Ran(P)-perp.

In this case the projector P is said to be the orthogonal projector onto M. Since Ran(P) and Ker(P) form a direct sum of C^n, the decomposition x = Px + (I - P)x is unique and the vector Px is uniquely defined by the set of equations

    Px in M   and   (I - P)x perp M,

or, equivalently,

    Px in M   and   ((I - P)x, y) = 0 for all y in M.

Proposition. A projector is orthogonal if and only if it is Hermitian.

Proof. As a consequence of the equality

    (P^H x, y) = (x, P y)   for all x, y,

we conclude that

    Ker(P^H) = Ran(P)-perp   and   Ker(P) = Ran(P^H)-perp.

By definition an orthogonal projector is one for which Ker(P) = Ran(P)-perp. Therefore, by the first of the above relations, if P is Hermitian then it is orthogonal.

To show that the converse is true, we first note that P^H is also a projector, since (P^H)^2 = (P^2)^H = P^H. We then observe that if P is orthogonal then the first relation implies that Ker(P) = Ker(P^H), while the second implies that Ran(P) = Ran(P^H). Since P^H is a projector, this implies that P = P^H, because a projector is uniquely determined by its range and its kernel.

Given any unitary n x m matrix V whose columns form an orthonormal basis of M = Ran(P), we can represent P by the matrix P = V V^H. Indeed, in addition to being idempotent, the linear mapping associated with this matrix satisfies the characterization given above, i.e.,

    V V^H x in M   and   (I - V V^H)x perp M.

It is important to note that this representation of the orthogonal projector P is not unique. In fact, any orthonormal basis V will give a different representation of P in the above form. As a consequence, for any two orthonormal bases V1, V2 of M, we must have V1 V1^H = V2 V2^H, an equality which can also be verified independently (see the exercises).

From the above representation it is clear that when P is an orthogonal projector then we have ||Px||_2 <= ||x||_2 for any x. As a result the maximum of ||Px||_2 / ||x||_2 over all x in C^n does not exceed one. On the other hand the value one is reached for any element in Ran(P), and therefore

    ||P||_2 = 1

for any orthogonal projector P.
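In computations one rarely forms the n x n matrix V V^H explicitly; the projection is applied as two successive products, with V^H and then with V. The following Fortran sketch (an illustration, not from the text, and written for real arithmetic so that V^H reduces to V^T) applies the orthogonal projector onto span{V} in this way.

C     Sketch: apply the orthogonal projector P = V*V^T to a real
C     vector X, where the N x M array V has orthonormal columns.
C     The result is returned in Y; W is an M-vector of workspace.
      SUBROUTINE PROJEC (N, M, V, X, Y, W)
      INTEGER N, M
      REAL*8  V(N,M), X(N), Y(N), W(M)
      INTEGER I, J
C     W = V^T * X
      DO J = 1, M
         W(J) = 0.0D0
         DO I = 1, N
            W(J) = W(J) + V(I,J)*X(I)
         ENDDO
      ENDDO
C     Y = V * W
      DO I = 1, N
         Y(I) = 0.0D0
         DO J = 1, M
            Y(I) = Y(I) + V(I,J)*W(J)
         ENDDO
      ENDDO
      END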

Recall that the acute angle between two nonzero vectors x and y of C^n is defined by

    cos theta(x, y) = |(x, y)| / (||x||_2 ||y||_2),   0 <= theta(x, y) <= pi/2.

We define the acute angle between a vector x and a subspace S as the smallest acute angle made between x and all vectors y of S,

    theta(x, S) = min over y in S of theta(x, y).

An optimality property of orthogonal projectors is the following.

Theorem. Let P be an orthogonal projector onto the subspace S. Then, given any vector x in C^n, we have

    min over y in S of ||x - y||_2 = ||x - Px||_2,

or, equivalently,

    theta(x, S) = theta(x, Px).

Proof. Let y be any vector of S and consider the square of its distance from x. We have

    ||x - y||_2^2 = ||x - Px + (Px - y)||_2^2 = ||x - Px||_2^2 + ||Px - y||_2^2,

because x - Px is orthogonal to S, to which Px - y belongs. Therefore ||x - y||_2 >= ||x - Px||_2 for all y in S, and this establishes the first result by noticing that the minimum is reached for y = Px. The second equality is a simple reformulation of the first.

It is sometimes important to be able to measure distances between two subspaces. If P_i represents the orthogonal projector onto M_i, for i = 1, 2, a natural measure of the distance between M_1 and M_2 is provided by their gap, defined by

    omega(M_1, M_2) = max { max over x in M_2, ||x||_2 = 1 of ||x - P_1 x||_2 ,
                            max over x in M_1, ||x||_2 = 1 of ||x - P_2 x||_2 }.

We can also redefine omega(M_1, M_2) as

    omega(M_1, M_2) = max { ||(I - P_1)P_2||_2 , ||(I - P_2)P_1||_2 },

and it can even be shown that

    omega(M_1, M_2) = ||P_1 - P_2||_2.

Oblique Projectors

A projector that is not orthogonal is said to be oblique. It is sometimes useful to have a definition of oblique projectors that resembles that of orthogonal projectors, i.e., a definition similar to the characterization given earlier. If we call L the subspace that is the orthogonal complement of S = Ker(P), it is clear that L will have the same dimension as M. Moreover, to say that (I - P)x belongs to Ker(P) is equivalent to saying that it is in the orthogonal complement of L. Therefore, from the definitions seen at the beginning of this section, the projector P can be characterized by the defining equation

    Px in M   and   (I - P)x perp L.

We say that P is a projector onto M and orthogonal to L, or along the orthogonal complement of L. This is illustrated in the figure below.

Figure: Orthogonal and oblique projectors P and Q of a vector x: Px in M with x - Px perp M, while Qx in M with x - Qx perp L.

Matrix representations of oblique projectors require two bases: a basis V = [v_1, ..., v_m] of the subspace M = Ran(P), and another, W = [w_1, ..., w_m], of the subspace L, the orthogonal complement of Ker(P). We will say that these two bases are biorthogonal if

    (v_i, w_j) = delta_ij.

Given any pair of biorthogonal bases V, W, the projector P can be represented by

    P = V W^H.

In contrast with orthogonal projectors, the norm of P is larger than one in general. It can in fact be arbitrarily large, which implies that the norm of P - Q, for two oblique projectors P and Q, will not in general be a good measure of the distance between the two subspaces Ran(P) and Ran(Q). On the other hand, it may give an idea of the difference between their ranks, as is stated in the next theorem.

Theorem. Let ||.|| be any matrix norm, and assume that two projectors P and Q are such that ||P - Q|| < 1. Then

    rank(P) = rank(Q).

Proof. First let us show that rank(Q) <= rank(P). Given a basis {x_i}, i = 1, ..., q, of Ran(Q), we consider the family of vectors G = {Px_i}, i = 1, ..., q, in Ran(P) and show that it is linearly independent. Assume that

    sum over i = 1, ..., q of alpha_i P x_i = 0.

Then the vector y = sum of alpha_i x_i is such that Py = 0, and therefore (Q - P)y = Qy = y, and ||(Q - P)y|| = ||y||. Since ||Q - P|| < 1, this implies that y = 0. As a result the family G is linearly independent, and so rank(P) >= q = rank(Q). It can be shown similarly that rank(P) <= rank(Q).

The above theorem indicates that no norm of P - Q can be less than one if the two subspaces have different dimensions. Moreover, if we have a family of projectors P(t) that depends continuously on t, then the rank of P(t) remains constant. In addition, an immediate corollary is that if the gap between two subspaces is less than one, then they must have the same dimension.

Resolvent and Spectral Projector

For any given complex z not in the spectrum of a matrix A, we define the resolvent operator of A at z as the linear transformation

    R(A, z) = (A - zI)^(-1).

The notation R(z) is often used instead of R(A, z) if there is no ambiguity. This notion can be defined for operators on infinite dimensional spaces, in which case the spectrum is defined as the set of all complex scalars z such that the inverse of A - zI does not exist; see the references for details.

The resolvent, regarded as a function of z, admits singularities at the eigenvalues of A. Away from any eigenvalue, the resolvent R(z) is analytic with respect to z. Indeed, we can write, for any z around an element z0 not equal to an eigenvalue,

    R(z) = (A - zI)^(-1) = (A - z0 I - (z - z0)I)^(-1)
         = (I - (z - z0)R(z0))^(-1) R(z0).

The term (I - (z - z0)R(z0))^(-1) can be expanded into the Neumann series whenever the spectral radius of (z - z0)R(z0) is less than unity. Therefore, the Taylor expansion of R(z) in the open disk |z - z0| < 1/rho(R(z0)) exists and takes the form

    R(z) = sum over k >= 0 of (z - z0)^k R(z0)^(k+1).

It is important to determine the nature of the singularity of R(z) at the eigenvalues lambda_i, i = 1, ..., p. By a simple application of Cramer's rule it is easy to see that these singularities are not essential. In other words, the Laurent expansion of R(z),

    R(z) = sum over all integers k of (z - lambda_i)^k C_k,

around each pole lambda_i has only a finite number of negative powers. Thus, R(z) is a meromorphic function.

The resolvent satisfies the following immediate properties.

First resolvent equality:

    R(z1) - R(z2) = (z1 - z2) R(z1) R(z2).

Second resolvent equality:

    R(A1, z) - R(A2, z) = R(A1, z)(A2 - A1)R(A2, z).

In what follows we will need to integrate the resolvent over Jordan curves in the complex plane. A Jordan curve is a simple closed curve that is piecewise smooth, and the integration will always be counterclockwise unless otherwise stated. There is not much difference between integrating complex valued functions or functions with values in C^(n x n); in fact, such integrals can be defined over functions taking their values in Banach spaces in the same way.

Consider any Jordan curve Gamma_i that encloses the eigenvalue lambda_i and no other eigenvalue of A, and let

    P_i = -(1/(2 i pi)) * integral over Gamma_i of R(z) dz.

The above integral is often referred to as the Taylor-Dunford integral.

Relations with the Jordan form

The purpose of this subsection is to show that the operator P_i defined by the above integral is identical with the spectral projector defined in Chapter I by using the Jordan canonical form.

Theorem. The linear transformations P_i, i = 1, ..., p, associated with the distinct eigenvalues lambda_i, i = 1, ..., p, are such that

(1) P_i^2 = P_i, i.e., each P_i is a projector;

(2) P_i P_j = P_j P_i = 0 if i is not equal to j;

(3) sum over i = 1, ..., p of P_i = I.

Proof. (1) Let Gamma and Gamma' be two curves enclosing lambda_i, with Gamma' enclosing Gamma. Then

    (2 i pi)^2 P_i^2 = int_Gamma int_Gamma' R(z)R(z') dz' dz
                     = int_Gamma int_Gamma' (R(z') - R(z))/(z' - z) dz' dz,

because of the first resolvent equality. We observe that

    int_Gamma' dz'/(z' - z) = 2 i pi   and   int_Gamma dz/(z' - z) = 0,

so that

    int_Gamma int_Gamma' R(z)/(z' - z) dz' dz = 2 i pi int_Gamma R(z) dz

and

    int_Gamma int_Gamma' R(z')/(z' - z) dz' dz = int_Gamma' R(z') ( int_Gamma dz/(z' - z) ) dz' = 0,

from which we get (2 i pi)^2 P_i^2 = -2 i pi int_Gamma R(z) dz = (2 i pi)^2 P_i, i.e., P_i^2 = P_i.

(2) The proof is similar to (1) and is left as an exercise.

(3) Consider the sum

    sum over i = 1, ..., p of P_i = -(1/(2 i pi)) sum over i of int_Gamma_i R(z) dz.

Since there are no poles of R(z) outside of the p Jordan curves, we can replace the sum of the integrals by an integral over any curve that contains all of the eigenvalues of A. If we choose this curve to be a circle C of radius r and center the origin, we get

    sum of P_i = -(1/(2 i pi)) int_C R(z) dz.

Making the change of variables t = 1/z, we find that

    sum of P_i = (1/(2 i pi)) int_C' (tA - I)^(-1) dt/t = (1/(2 i pi)) int_C'' (I - tA)^(-1) dt/t,

where C' (resp. C'') is the circle of center the origin and radius 1/r run clockwise (resp. counterclockwise). Moreover, because r must be larger than rho(A), we have rho(tA) < 1 on C'' and the inverse of I - tA is expandable into its Neumann series, i.e., the series

    (I - tA)^(-1) = sum over k >= 0 of (tA)^k

converges. Therefore

    sum of P_i = (1/(2 i pi)) int_C'' sum over k >= 0 of t^(k-1) A^k dt = I,

by the residue theorem.

The above theorem shows that the projectors P_i satisfy the same properties as the spectral projectors defined in the previous chapter using the Jordan canonical form. However, to show that these projectors are identical we still need to prove that they have the same range. Note that since A and R(z) commute, we get by integration that AP_i = P_iA, and this implies that the range of P_i is invariant under A. We must show that this invariant subspace is the invariant subspace M_i associated with the eigenvalue lambda_i, as defined in Chapter I. The next lemma establishes the desired result.

Lemma. Let M~_i = Ran(P_i) and let M_i = Ker(A - lambda_i I)^(l_i) be the invariant subspace associated with the eigenvalue lambda_i. Then M~_i = M_i for i = 1, ..., p.

Proof. We first prove that M_i is contained in M~_i. This follows from the fact that when x belongs to Ker(A - lambda_i I)^(l_i) we can expand R(z)x as follows:

    R(z)x = (A - zI)^(-1)x = [(A - lambda_i I) - (z - lambda_i)I]^(-1)x
          = -(1/(z - lambda_i)) [ I - (A - lambda_i I)/(z - lambda_i) ]^(-1) x
          = -(1/(z - lambda_i)) sum over j = 0, ..., l_i - 1 of (z - lambda_i)^(-j) (A - lambda_i I)^j x.

The integral of this expression over Gamma_i is simply -2 i pi x, by the residue theorem, so that P_i x = x, hence the result.

We now show that M~_i is contained in M_i. From

    (z - lambda_i)R(z) = -I + (A - lambda_i I)R(z),

it is easy to see that

    -(1/(2 i pi)) int_Gamma_i (z - lambda_i)R(z) dz
        = (A - lambda_i I) [ -(1/(2 i pi)) int_Gamma_i R(z) dz ] = (A - lambda_i I)P_i,

and, more generally,

    -(1/(2 i pi)) int_Gamma_i (z - lambda_i)^k R(z) dz = (A - lambda_i I)^k P_i.

Notice that the left-hand side of this relation is, up to a constant, the coefficient of (z - lambda_i)^(-k-1) in the Laurent expansion of R(z), which has no essential singularity at lambda_i. Therefore there is some integer l after which all these left-hand sides vanish, i.e., (A - lambda_i I)^k P_i = 0 for k >= l. This proves that for every x = P_i x in M~_i we have (A - lambda_i I)^k x = 0 for k >= l. It follows that x belongs to M_i.

This finally establishes that the projectors P_i are identical with those defined with the Jordan canonical form and seen in Chapter I. Each projector P_i is associated with an eigenvalue lambda_i. However, it is important to note that, more generally, one can define a projector associated with a group of eigenvalues, which will be the sum of the individual projectors associated with the different eigenvalues in the group. This can also be defined by an integral similar to the one above, where Gamma is a curve that encloses all the eigenvalues of the group and no other ones. Note that the rank of the projector P thus defined is simply the sum of the algebraic multiplicities of the eigenvalues enclosed; in other words, the dimension of the range of such a P is the sum of the algebraic multiplicities of the distinct eigenvalues enclosed by Gamma.

Linear Perturbations of A

In this section we consider the family of matrices defined by

    A(t) = A + tH,

where t belongs to the complex plane. We are interested in the behavior of the eigenelements of A(t) when t varies around the origin. Consider first the parameterized resolvent

    R(t, z) = (A + tH - zI)^(-1).

Noting that R(t, z) = (I + tR(z)H)^(-1) R(z), it is clear that if the spectral radius of tR(z)H is less than one, then R(t, z) will be analytic with respect to t. More precisely, we have the following.

Proposition. The resolvent R(t, z) is analytic with respect to t in the open disk |t| < 1/rho(R(z)H).

We wish to show, by integration over a Jordan curve Gamma, that a similar result holds for the spectral projector P(t), i.e., that P(t) is analytic for t small enough. The result will be true if the resolvent R(t, z) is analytic with respect to t for each z on Gamma. To ensure this we must require that

    |t| < inf over z in Gamma of 1/rho(R(z)H).

The question that arises next is whether or not the disk of all t's defined above is empty. The answer is no, as the following argument shows. We have

    rho(R(z)H) <= ||R(z)H||_2 <= ||R(z)||_2 ||H||_2.

The function ||R(z)||_2 is continuous with respect to z on Gamma and therefore it reaches its maximum at some point z0 of the closed curve Gamma, and we obtain

    rho(R(z)H) <= ||R(z)H||_2 <= ||R(z0)||_2 ||H||_2.

Hence,

    inf over z in Gamma of 1/rho(R(z)H) >= 1/(||R(z0)||_2 ||H||_2) > 0.

Theorem. Let Gamma be a Jordan curve around one or a few eigenvalues of A, and let

    rho_a = inf over z in Gamma of 1/rho(R(z)H).

Then rho_a > 0, and the spectral projector

    P(t) = -(1/(2 i pi)) int_Gamma R(t, z) dz

is analytic in the disk |t| < rho_a.

We have already proved that rho_a > 0. The rest of the proof is straightforward. As an immediate corollary of the theorem, we know that the rank of P(t) will stay constant as long as t stays in the disk |t| < rho_a.

Corollary. The number m of eigenvalues of A(t), counted with their algebraic multiplicities, located inside the curve Gamma is constant provided that |t| < rho_a.

In fact the condition on t is only a sufficient condition, and it may be too restrictive, since the real condition required is that P(t) be continuous with respect to t.

While individual eigenvalues may not have an analytic behavior, their average usually does. Consider the average

    lambda-hat(t) = (1/m) sum over i = 1, ..., m of lambda_i(t)

of the eigenvalues lambda_1(t), lambda_2(t), ..., lambda_m(t) of A(t) that are inside Gamma, where we assume that the eigenvalues are counted with their multiplicities. Let B(t) be a matrix representation of the restriction of A(t) to the invariant subspace M(t) = Ran(P(t)). Note that since M(t) is invariant under A(t), B(t) is the matrix representation of the rank-m transformation

    A(t)P(t) restricted to M(t) = P(t)A(t) restricted to M(t) = P(t)A(t)P(t) restricted to M(t),

and we have

    lambda-hat(t) = (1/m) tr( B(t) ) = (1/m) tr( A(t)P(t) restricted to M(t) ) = (1/m) tr( A(t)P(t) ).

The last equality in the above equation is due to the fact that for any x not in M(t) we have P(t)x = 0, and therefore the extension of A(t)P(t) to the whole space can only bring zero eigenvalues in addition to the eigenvalues lambda_i(t), i = 1, ..., m.

Theorem. The linear transformation A(t)P(t) and its weighted trace lambda-hat(t) are analytic in the disk |t| < rho_a.

Proof. That A(t)P(t) is analytic is a consequence of the previous theorem. That lambda-hat(t) is analytic comes from the equivalent expression above and the fact that the trace of an operator X(t) that is analytic with respect to t is analytic.

Therefore, a simple eigenvalue lambda_i(t) of A(t) not only stays simple in a neighborhood of t = 0, but it is also analytic with respect to t. Moreover, the vector u_i(t) = P(t)u_i is an eigenvector of A(t) associated with this simple eigenvalue, where u_i = u_i(0) is an eigenvector of A associated with the eigenvalue lambda_i. Clearly, the eigenvector u_i(t) is analytic with respect to the variable t. However, the situation is more complex for the case of a multiple eigenvalue. If an eigenvalue is of multiplicity m, then after a small perturbation it will split into at most m distinct small branches lambda_i(t). These branches, taken individually, are not analytic in general. On the other hand, their arithmetic average is analytic. For this reason it is critical in practice to try to recognize groups of eigenvalues that are likely to originate from the splitting of a perturbed multiple eigenvalue.

Example. That an individual branch of the m branches of eigenvalues lambda_i(t) is not analytic can be easily illustrated by the example

    A = | 0  1 |,       H = | 0  0 |.
        | 0  0 |            | 1  0 |

The matrix A(t) = A + tH has the eigenvalues +sqrt(t) and -sqrt(t), which degenerate into the double eigenvalue 0 as t tends to 0. The individual eigenvalues are not analytic, but their average remains constant and equal to zero.

In the above example each of the individual eigenvalues behaves like the square root of t around the origin. One may wonder whether this type of behavior can be generalized. The answer is stated in the next proposition.

Proposition. Any eigenvalue lambda_i(t) of A(t) inside the Jordan curve Gamma_i satisfies

    |lambda_i(t) - lambda_i| = O( |t|^(1/l_i) ),

where l_i is the index of lambda_i.

Proof. Let f(z) = (z - lambda_i)^(l_i). We have seen earlier (proof of the Lemma above) that f(A)P_i = 0. For an eigenvector u(t) of norm unity associated with the eigenvalue lambda_i(t), we have

    f(A(t))P(t)u(t) = f(A(t))u(t) = (A(t) - lambda_i I)^(l_i) u(t)
                    = (lambda_i(t) - lambda_i)^(l_i) u(t).

Taking the norms of both members of the above equation and using the fact that f(A)P(0) = 0, we get

    |lambda_i(t) - lambda_i|^(l_i) = ||f(A(t))P(t)u(t)||_2
                                  <= ||f(A(t))P(t)||_2 = ||f(A(t))P(t) - f(A)P(0)||_2.

Since f(A(t)) and P(t) are analytic with respect to t and f(A(0))P(0) = 0, the right-hand side in the above inequality is O(t), and therefore

    |lambda_i(t) - lambda_i|^(l_i) = O( |t| ),

which shows the result.

Example. A standard illustration of the above result is provided by taking A to be a Jordan block and H to be the rank one matrix H = e_n e_1^T. The matrix A has nonzero elements only in positions (i, i+1), where they are equal to one. The matrix H has its elements equal to zero, except for the element in position (n, 1), which is equal to one. For t = 0 the matrix A + tH admits only the eigenvalue lambda = 0. The characteristic polynomial of A + tH is equal to

    p_t(z) = det(A + tH - zI) = (-1)^n (z^n - t),

and its roots are lambda_j(t) = t^(1/n) e^(2 i j pi / n), j = 1, ..., n. Thus, if n is moderately large, a perturbation on A of the order of the machine precision (about 10^(-16) in double precision arithmetic) can perturb the eigenvalue lambda = 0 by as much as 10^(-16/n), e.g., by 0.1 when n = 16.

A Posteriori Error Bounds

In this section we consider the problem of predicting the error made on an eigenvalue/eigenvector pair from some a posteriori knowledge on their approximations. The simplest criterion used to determine the accuracy of an approximate eigenpair (mu, u~) is to compute the norm of the so-called residual vector

    r = A u~ - mu u~.

The aim is to derive error bounds that relate some norm of r, typically its 2-norm, to the errors on the eigenpair. Such error bounds are referred to as a posteriori error bounds. Such bounds may help determine how accurate the approximations provided by some algorithm may be. This information can in turn be helpful in choosing a stopping criterion in iterative algorithms, in order to ensure that the answer delivered by the numerical method is within a desired tolerance.
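As a simple illustration of how such a criterion is used in practice (a sketch, not a routine from the text, written for real arithmetic), the following Fortran fragment computes the Rayleigh quotient mu = (Au~, u~)/(u~, u~) and the residual norm ||A u~ - mu u~||_2 for a matrix stored in CSR format; an iteration would be stopped once this norm falls below a prescribed tolerance.

C     Sketch: residual-based accuracy estimate for an approximate
C     eigenvector U of a matrix in CSR format (AA, JA, IA).
C     Returns the Rayleigh quotient RQ and residual norm RESNRM.
      SUBROUTINE RESID (N, AA, JA, IA, U, W, RQ, RESNRM)
      INTEGER N, JA(*), IA(N+1)
      REAL*8  AA(*), U(N), W(N), RQ, RESNRM
      INTEGER I, K
      REAL*8  UNRM2
C     W = A*U  (CSR matrix-by-vector product)
      DO I = 1, N
         W(I) = 0.0D0
         DO K = IA(I), IA(I+1)-1
            W(I) = W(I) + AA(K)*U(JA(K))
         ENDDO
      ENDDO
C     Rayleigh quotient  RQ = (W,U)/(U,U)
      RQ    = 0.0D0
      UNRM2 = 0.0D0
      DO I = 1, N
         RQ    = RQ    + W(I)*U(I)
         UNRM2 = UNRM2 + U(I)*U(I)
      ENDDO
      RQ = RQ/UNRM2
C     Residual norm  RESNRM = || A*U - RQ*U ||_2
      RESNRM = 0.0D0
      DO I = 1, N
         RESNRM = RESNRM + (W(I) - RQ*U(I))**2
      ENDDO
      RESNRM = SQRT(RESNRM)
      END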

General Error Bounds

In the non-Hermitian case there does not exist any a posteriori error bound in the strict sense of the definition. The error bounds that exist are in general weaker and not as easy to use as those known in the Hermitian case. The first error bound which we consider is known as the Bauer-Fike theorem. We recall that the condition number of a matrix X relative to the p-norm is defined by Cond_p(X) = ||X||_p ||X^(-1)||_p.

Theorem (Bauer-Fike). Let (mu, u~) be an approximate eigenpair of A, with residual vector r = A u~ - mu u~, where u~ is of 2-norm unity. Moreover, assume that the matrix A is diagonalizable, and let X be the matrix that transforms it into diagonal form. Then there exists an eigenvalue lambda of A such that

    |lambda - mu| <= Cond_2(X) ||r||_2.

Proof. If mu is an eigenvalue of A the result is trivially true. Assume that mu is not an eigenvalue. From A = X D X^(-1), where D is the diagonal matrix of eigenvalues, and since mu is not an eigenvalue of A, we can write

    u~ = (A - mu I)^(-1) r = X (D - mu I)^(-1) X^(-1) r,

and hence

    1 = ||X (D - mu I)^(-1) X^(-1) r||_2
      <= ||X||_2 ||X^(-1)||_2 ||(D - mu I)^(-1)||_2 ||r||_2.

The matrix (D - mu I)^(-1) is a diagonal matrix and, as a result, its 2-norm is the maximum of the absolute values of its diagonal entries. Therefore,

    1 <= Cond_2(X) ||r||_2 max over lambda_i in the spectrum of A of 1/|lambda_i - mu|,

from which the result follows.

In case the matrix is not diagonalizable, the previous result can be generalized as follows.

Theorem. Let (mu, u~) be an approximate eigenpair with residual vector r = A u~ - mu u~, where u~ is of 2-norm unity. Let X be the matrix that transforms A into its Jordan canonical form, A = X J X^(-1). Then there exists an eigenvalue lambda of A such that

    |lambda - mu|^l / ( 1 + |lambda - mu| + ... + |lambda - mu|^(l-1) ) <= Cond_2(X) ||r||_2,

where l is the index of lambda.

Proof. The proof starts as in the previous case, but here the diagonal matrix D is replaced by the Jordan matrix J. Because the matrix J - mu I is block diagonal, its 2-norm is the maximum of the 2-norms of its blocks (a consequence of the alternative formulation for 2-norms seen in Chapter I). For each of these blocks we have

    J_i - mu I = (lambda_i - mu) I + E_i,

where E_i is the nilpotent matrix having ones in positions (i, i+1) and zeros elsewhere. Therefore,

    (J_i - mu I)^(-1) = sum over j = 0, ..., l_i - 1 of (-1)^j (lambda_i - mu)^(-j-1) E_i^j,

and as a result, setting delta_i = |lambda_i - mu| and noting that ||E_i||_2 <= 1, we get

    ||(J_i - mu I)^(-1)||_2 <= sum over j = 1, ..., l_i of delta_i^(-j)
                            = ( sum over j = 0, ..., l_i - 1 of delta_i^j ) / delta_i^(l_i).

The analogue of the inequality in the previous proof is

    1 <= Cond_2(X) ||(J - mu I)^(-1)||_2 ||r||_2.

Since

    ||(J - mu I)^(-1)||_2 = max over i = 1, ..., p of ||(J_i - mu I)^(-1)||_2
                          <= max over i of ( sum over j = 0, ..., l_i - 1 of delta_i^j ) / delta_i^(l_i),

we get

    min over i = 1, ..., p of delta_i^(l_i) / ( sum over j = 0, ..., l_i - 1 of delta_i^j ) <= Cond_2(X) ||r||_2,

which is essentially the desired result.

Corollary (Kahan, Parlett, and Jiang). Under the same assumptions as those of the previous theorem, there exists an eigenvalue lambda of A such that

    |lambda - mu|^l / (1 + |lambda - mu|)^(l-1) <= Cond_2(X) ||r||_2,

where l is the index of lambda.

Proof. Follows immediately from the previous theorem and the inequality

    sum over j = 0, ..., l-1 of delta_i^j <= (1 + delta_i)^(l-1).

For an alternative proof, see the references. Unfortunately, the bounds of the type shown in the previous two theorems are not practical because of the presence of the condition number of X. The second result even requires the knowledge of the index of lambda_i, which is not numerically viable. The situation is much improved in the particular case where A is Hermitian, because in this case Cond_2(X) = 1. This is taken up next.

The Hermitian Case

In the Hermitian case, the Bauer-Fike theorem leads to the following corollary.

Corollary. Let (mu, u~) be an approximate eigenpair of a Hermitian matrix A, with ||u~||_2 = 1, and let r be the corresponding residual vector. Then there exists an eigenvalue lambda of A such that

    |lambda - mu| <= ||r||_2.

This is a remarkable result, because it constitutes a simple yet general error bound. On the other hand, it is not sharp, as the next a posteriori error bound, due to Kato and Temple, shows. We start by proving a lemma that will be used to prove the Kato-Temple theorem. In the next results it is assumed that the approximate eigenvalue is the Rayleigh quotient of the approximate eigenvector.

Lemma. Let u~ be an approximate eigenvector of norm unity of A, and mu = (A u~, u~). Let (alpha, beta) be an interval that contains mu and no eigenvalue of A. Then

    (beta - mu)(mu - alpha) <= ||r||_2^2.

Proof. This lemma uses the observation that the residual vector r is orthogonal to u~. Then we have

    ((A - alpha I)u~, (A - beta I)u~)
        = ((A - mu I)u~ + (mu - alpha)u~, (A - mu I)u~ + (mu - beta)u~)
        = ||r||_2^2 + (mu - alpha)(mu - beta),

because of the orthogonality property mentioned above. On the other hand, one can expand u~ in the orthonormal eigenbasis of A as

    u~ = xi_1 u_1 + xi_2 u_2 + ... + xi_n u_n,

to transform the left-hand side of the above expression into

    ((A - alpha I)u~, (A - beta I)u~) = sum over i = 1, ..., n of |xi_i|^2 (lambda_i - alpha)(lambda_i - beta).

Each term in the above sum is nonnegative because of the assumptions on alpha and beta. Therefore ||r||_2^2 + (mu - alpha)(mu - beta) >= 0, which is the desired result.

Theorem (Kato and Temple). Let u~ be an approximate eigenvector of norm unity of A, and mu = (A u~, u~). Assume that we know an interval (a, b) that contains mu and one and only one eigenvalue lambda of A. Then

    - ||r||_2^2 / (mu - a) <= mu - lambda <= ||r||_2^2 / (b - mu).

Proof. Let lambda be the closest eigenvalue to mu. In the case where lambda is located at the left of mu, take alpha = lambda and beta = b in the lemma to get

    0 <= mu - lambda <= ||r||_2^2 / (b - mu).

In the opposite case, where lambda > mu, use alpha = a and beta = lambda to get

    0 <= lambda - mu <= ||r||_2^2 / (mu - a).

This completes the proof.

A simplification of the Kato-Temple theorem consists of using a particular interval that is symmetric about the approximation mu, as is stated in the next corollary.

Corollary. Let u~ be an approximate eigenvector of norm unity of A, and mu = (A u~, u~). Let lambda be the eigenvalue closest to mu and delta the distance from mu to the rest of the spectrum, i.e.,

    delta = min over i of { |lambda_i - mu| : lambda_i not equal to lambda }.

Then

    |lambda - mu| <= ||r||_2^2 / delta.

Proof. This is a particular case of the previous theorem with a = mu - delta and b = mu + delta.

It is also possible to show a similar result for the angle between the exact and approximate eigenvectors.

Theorem. Let u~ be an approximate eigenvector of norm unity of A, mu = (A u~, u~), and r = (A - mu I)u~. Let lambda be the eigenvalue closest to mu and delta the distance from mu to the rest of the spectrum, i.e., delta = min over i of { |lambda_i - mu| : lambda_i not equal to lambda }. Then, if u is an eigenvector of A associated with lambda, we have

    sin theta(u, u~) <= ||r||_2 / delta.

Proof. Let us write the approximate eigenvector u~ as u~ = u cos theta + z sin theta, where z is a vector orthogonal to u. We have

    (A - mu I)u~ = cos theta (A - mu I)u + sin theta (A - mu I)z
                 = cos theta (lambda - mu)u + sin theta (A - mu I)z.

The two vectors on the right-hand side are orthogonal to each other, because

    ((A - mu I)z, u) = (z, (A - mu I)u) = (lambda - mu)(z, u) = 0.

Therefore,

    ||r||_2^2 = ||(A - mu I)u~||_2^2 = sin^2 theta ||(A - mu I)z||_2^2 + cos^2 theta |lambda - mu|^2.

Hence,

    sin^2 theta ||(A - mu I)z||_2^2 <= ||r||_2^2.

The proof follows by observing that, since z is orthogonal to u, ||(A - mu I)z||_2 is larger than the smallest eigenvalue in modulus of A - mu I restricted to the subspace orthogonal to u, which is precisely delta.

Although the above bounds for the Hermitian case are sharp, they are still not computable, since delta involves the distance from mu to eigenvalues of A other than lambda, which are not readily available. In order to be able to use these bounds in practical situations, one must provide a lower bound for the distance delta. One might simply approximate delta by |mu - mu_j|, where mu_j is some approximation to the next closest eigenvalue to mu. The result would no longer be an actual upper bound on the error, but rather an estimate of the error. This may not be safe, however. To ensure that the computed error bound is rigorous, it is preferable to exploit the simpler inequality provided by the corollary of the Bauer-Fike theorem in order to find a lower bound for the distance delta, for example

    delta = |lambda_j - mu_j + mu_j - mu|
          >= |mu_j - mu| - |lambda_j - mu_j|
          >= |mu_j - mu| - ||r_j||_2,

where ||r_j||_2 is the residual norm associated with the approximate eigenvalue mu_j. The above lower bound for delta is computable. In order for the resulting error bound to have a meaning, ||r_j||_2 must be small enough to ensure that there are no other potential eigenvalues lambda_k that might be closer to mu than mu_j is. The above error bounds, when used cautiously, can be quite useful.

Example. The first illustration uses a small symmetric tridiagonal matrix A whose eigenvalues are known exactly, an exact eigenvector u associated with one of these eigenvalues, and a nearby approximate eigenvector u~. The Rayleigh quotient mu = (A u~, u~) is computed, the closest eigenvalue lambda is identified, and the actual error |lambda - mu| is recorded. The residual norm ||(A - mu I)u~||_2 and the distance delta from mu to the rest of the spectrum are then evaluated, and the corollary above yields a computable bound on |lambda - mu|. For the eigenvector, the angle between the exact and approximate eigenvectors is obtained from cos theta = |(u, u~)|, and the error as estimated by the bound sin theta(u, u~) <= ||r||_2 / delta turns out to be about twice as large as the actual error.

We now consider a slightly more realistic situation. There are instances in which the off-diagonal elements of a matrix are small. Then the diagonal elements can be considered approximations to the eigenvalues of A, and the question is how good an accuracy one can expect. We illustrate this with an example.

Example. Let A be a matrix whose off-diagonal elements are small compared with its diagonal elements, so that the eigenvalues of A are close to, but not equal to, the diagonal entries. A natural question is how accurate each diagonal element a_ii is as an approximate eigenvalue, assuming that we know nothing about the exact spectrum. We can take as approximate eigenvectors the canonical basis vectors e_i, and the corresponding residual norms ||r_i||_2 are readily computed. The simplest residual bound, |lambda - a_ii| <= ||r_i||_2, then produces an interval around each diagonal entry that must contain an eigenvalue. In this example the intervals so defined are all disjoint. As a result, we can get a reasonable lower bound for the distance delta_i of each of the approximations a_ii from the eigenvalues not in its interval, for instance

    delta_1 >= min over j not equal to 1 of { |a_11 - a_jj| - ||r_j||_2 },

and similarly for the other delta_i. The sharper bounds |lambda_i - a_ii| <= ||r_i||_2^2 / delta_i that follow from the corollary are then compared with the actual errors.

The Kahan-Parlett-Jiang Theorem

We now return to the general non-Hermitian case. The results seen for the Hermitian case in the previous section can be very useful in practical situations. For example, they can help develop efficient stopping criteria in iterative algorithms. In contrast, those seen earlier in this chapter for the general non-Hermitian case are not too easy to exploit in practice. The question that one might ask is whether or not any residual bounds can be established that will provide information similar to that provided in the Hermitian case. There does not seem to exist any such result in the literature. A result established by Kahan, Parlett, and Jiang, which we now discuss, seems to be the best compromise between generality and sharpness. However, the theorem is of a different type. It does not guarantee the existence of, say, an eigenvalue in a given interval whose size depends on the residual norm. It only gives us the size of the smallest perturbation that must be applied to the original data (the matrix), in order to transform the approximate eigenpair into an exact one for the perturbed problem.

To explain the nature of the theorem, we begin with a very simple result which can be regarded as a one-sided version of the one proved by Kahan, Parlett, and Jiang, in that it only considers the right eigenvalue-eigenvector pair instead of the eigentriplet consisting of the eigenvalue and the right and left eigenvectors.

Proposition. Let a square matrix A and a unit vector u be given. For any scalar mu define the residual vector

    r = Au - mu u,

and let E = { E : (A - E)u = mu u }. Then

    min over E in E of ||E||_2 = ||r||_2.

Proof. From the assumptions we see that each E is in E if and only if it satisfies the equality

    Eu = r.

Since ||u||_2 = 1, the above equation implies that for any such E,

    ||E||_2 >= ||r||_2,

which in turn implies that

    min over E in E of ||E||_2 >= ||r||_2.

Now consider the matrix E0 = r u^H, which is a member of E since it satisfies E0 u = r. The norm of E0 is such that

    ||E0||_2^2 = lambda_max( r u^H u r^H ) = lambda_max( r r^H ) = ||r||_2^2.

As a result the minimum in the left-hand side is reached for E = E0, and the value of the minimum is equal to ||r||_2.

We now state a simple version of the Kahan-Parlett-Jiang theorem.

Theorem (Kahan, Parlett, and Jiang). Let a square matrix A and two unit vectors u, w with (u, w) not equal to 0 be given. For any scalar mu define the residual vectors

    r = Au - mu u,    s = A^H w - mu-bar w,

and let E = { E : (A - E)u = mu u, (A - E)^H w = mu-bar w }. Then

    min over E in E of ||E||_2 = max { ||r||_2, ||s||_2 }.

Chapter III

Pro of We pro ceed in the same way as for the pro of of the

simpler result of the previous prop osition The two conditions

that a matrix E must satisfy in order to b elong to E translate

into

H

Eu r and E w s

By the same argument used in the pro of of Prop osition any

such E satises

kE k kr k and kE k ksk

which proves the inequality

min kE k maxfkr k ksk g

E E

We now dene

H H

s u w r

x r w

y s u

and consider the particular set of matrices of the form

H H H H

E ru ws wu xy

where is a parameter It is easy to verify that these matrices

satisfy the constraints for any

We distinguish two dierent cases dep ending on whether ksk

is larger or smaller than kr k When ksk kr k we rewrite E

in the form

H H

E xu y ws

and select in such away that

H

s u y

which leads to

ksk j j

Perturbation Theory

We note that the above expression is not valid when $\|s\|_2 = |\delta|$, which occurs only when $y = 0$. In this situation $E(\beta) = r u^H$ for any $\beta$, and the following special treatment is necessary. As in the proof of the previous proposition, $\|E(\beta)\|_2 = \|r\|_2$. On the other hand we have

$$ \|s\|_2 = |\delta| = |w^H r| \leq \|r\|_2 , $$

which shows that $\max\{\|r\|_2, \|s\|_2\} = \|r\|_2$ and establishes the result that the minimum in the theorem is reached for $E(\beta)$ in this very special case.

Going back to the general case where $\|s\|_2 \neq |\delta|$, with the above choice of $\beta$ the two vectors $x$ and $w$ in the range of $E(\beta)$ are orthogonal and, similarly, the vectors $u - \bar{\beta} y$ and $s$ are also orthogonal. In this situation the norm of $E(\beta)$ equals the larger of the norms of its two rank-one terms (see the problems of this chapter), i.e.,

$$ \|E(\beta)\|_2 = \max\{\, \|s\|_2 , \; \|x\|_2 \, \|u - \bar{\beta} y\|_2 \,\} . $$

Because of the orthogonality of $x$ and $w$, we have

$$ \|x\|_2^2 = \|r\|_2^2 - |\delta|^2 . $$

Similarly, exploiting the orthogonality of the pair $u, y$ and using the definition of $\beta$, we get

$$ \|u - \bar{\beta} y\|_2^2 = 1 + |\beta|^2 \|y\|_2^2 = 1 + \frac{|\delta|^2}{\|s\|_2^2 - |\delta|^2} = \frac{\|s\|_2^2}{\|s\|_2^2 - |\delta|^2} . $$

The above results yield

$$ \|E(\beta)\|_2^2 = \max\left\{ \|s\|_2^2 , \; \|s\|_2^2 \, \frac{\|r\|_2^2 - |\delta|^2}{\|s\|_2^2 - |\delta|^2} \right\} \leq \|s\|_2^2 . $$

This shows, in view of the inequality established at the beginning of the proof, that the equality of the theorem is satisfied in the case $\|s\|_2 \geq \|r\|_2$.


To prove the result for the case $\|s\|_2 < \|r\|_2$, we proceed in the same manner, writing this time $E(\beta)$ as

$$ E(\beta) = r u^H + (w - \beta x)\, y^H $$

and choosing $\beta$ in such a way that $r^H(w - \beta x) = 0$. A special treatment will also be necessary for the case $\|r\|_2 = |\delta|$, which only occurs when $x = 0$.

The actual result proved by Kahan, Parlett, and Jiang is essentially a block version of the above theorem, and it includes results with other norms, such as the Frobenius norm.
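The minimizing perturbation of the theorem can also be checked numerically. The following sketch (NumPy assumed; all data are arbitrary illustrative choices and the arithmetic is real, so $A^H = A^T$) builds $E(\beta)$ with the value of $\beta$ derived in the proof and verifies the two constraints as well as the predicted norm $\max\{\|r\|_2, \|s\|_2\}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
u = rng.standard_normal(n); u /= np.linalg.norm(u)
w = rng.standard_normal(n); w /= np.linalg.norm(w)
gamma = u @ A @ u                                   # any scalar; here a Rayleigh quotient

r = A @ u - gamma * u                               # right residual
s = A.T @ w - gamma * w                             # left residual (real arithmetic)
delta = s @ u                                       # delta = s^H u = w^H r
x = r - delta * w
y = s - delta * u

# optimal parameter from the proof (generic case; assumes the denominator is nonzero)
if np.linalg.norm(s) >= np.linalg.norm(r):
    beta = delta / (np.linalg.norm(s)**2 - delta**2)
else:
    beta = delta / (np.linalg.norm(r)**2 - delta**2)

E = np.outer(r, u) + np.outer(w, s) - delta * np.outer(w, u) - beta * np.outer(x, y)

print(np.linalg.norm(E @ u - r), np.linalg.norm(E.T @ w - s))          # both ~ 0 (round-off)
print(np.linalg.norm(E, 2), max(np.linalg.norm(r), np.linalg.norm(s))) # equal values
```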

Example. Consider the matrix obtained by perturbing the symmetric tridiagonal matrix of the earlier example, together with an approximate eigenpair $(\gamma, v)$ in which $v$ is a unit vector:

[matrix, scalar $\gamma$, and vector $v$ not recoverable from the source]

Then we have

$$ \|r\|_2 = \|(A - \gamma I)\, v\|_2 , $$

and this residual norm tells us, using the one-sided result of the proposition, that we need to perturb $A$ by a matrix $E$ of norm $\|r\|_2$ to make the pair $(\gamma, v)$ an exact eigenpair of $A$.

Consider now $v$ as defined above and a second vector

$$ w = \alpha\,(\,\cdots\,)^T , $$

where $\alpha$ is chosen to normalize $w$ so that its norm is unity. Then, still with the same $\gamma$, the left residual norm $\|s\|_2$ is found to be much larger than $\|r\|_2$. As a result of the theorem, we now need a perturbation $E$ whose norm is roughly $\|s\|_2$ to make the triplet $(\gamma, v, w)$ an exact eigen-triplet of $A$, a much stricter requirement than with the one-sided result.

The outcome of the above example was to be expected. If one of the left or right approximate eigenpairs, for example the left pair $(\gamma, w)$, is a poor approximation, then it will take a larger perturbation on $A$ to make the triplet $(\gamma, v, w)$ exact than it would to make the pair $(\gamma, v)$ exact. Whether one needs to use the one-sided or the two-sided result depends on whether one is interested in the left and right eigenvectors simultaneously, or in the right (or left) eigenvector only.

Conditioning of Eigen-problems

When solving a linear system $Ax = b$, an important question that arises is how sensitive the solution $x$ is to small variations of the initial data, namely to the matrix $A$ and the right-hand side $b$. A measure of this sensitivity is called the condition number of $A$, defined by

$$ \mathrm{Cond}(A) = \|A\| \, \|A^{-1}\| $$

relative to some norm.

For the eigenvalue problem we raise a similar question, but we must now define similar measures for the eigenvalues as well as for the eigenvectors and the invariant subspaces.

Conditioning of Eigenvalues

Let us assume that $\lambda$ is a simple eigenvalue and consider the family of matrices $A(t) = A + tE$. We know from the previous sections that there exists a branch of eigenvalues $\lambda(t)$ of $A(t)$ that is analytic with respect to $t$ when $t$ belongs to a small enough disk centered at the origin. It is natural to call conditioning of the eigenvalue $\lambda$ of $A$ relative to the perturbation $E$ the modulus of the derivative of $\lambda(t)$ at the origin $t = 0$. Let us write

$$ A(t)\, u(t) = \lambda(t)\, u(t) $$

and take the inner product of both members with a left eigenvector $w$ of $A$ associated with $\lambda$, to get

$$ \big( (A + tE)\, u(t), \, w \big) = \lambda(t)\, \big( u(t), w \big) $$

or

$$ \lambda(t)\,\big( u(t), w \big) = \big( A u(t), w \big) + t\,\big( E u(t), w \big) = \big( u(t), A^H w \big) + t\,\big( E u(t), w \big) = \lambda\,\big( u(t), w \big) + t\,\big( E u(t), w \big) . $$

Hence,

$$ \frac{\lambda(t) - \lambda}{t}\, \big( u(t), w \big) = \big( E u(t), w \big) , $$

and therefore, taking the limit as $t \to 0$,

$$ \lambda'(0) = \frac{(E u, w)}{(u, w)} . $$

Here we should recall that the left and right eigenvectors associated with a simple eigenvalue cannot be orthogonal to each other. The actual conditioning of an eigenvalue, given a perturbation "in the direction of $E$," is the modulus of the above quantity. In practical situations one often does not know the actual perturbation $E$ but only its magnitude, e.g., as measured by some matrix norm $\|E\|$. Using the Cauchy-Schwarz inequality and the 2-norm, we can derive the following upper bound:

$$ |\lambda'(0)| \;\leq\; \frac{\|E u\|_2 \, \|w\|_2}{|(u, w)|} \;\leq\; \|E\|_2 \, \frac{\|u\|_2 \, \|w\|_2}{|(u, w)|} . $$


In other words, the actual condition number of the eigenvalue $\lambda$ is bounded from above by the norm of $E$ divided by the cosine of the acute angle between the left and the right eigenvectors associated with $\lambda$. Hence the following definition.

Definition. The condition number of a simple eigenvalue $\lambda$ of an arbitrary matrix $A$ is defined by

$$ \mathrm{Cond}(\lambda) = \frac{1}{\cos \theta(u, w)} , $$

in which $u$ and $w$ are the right and left eigenvectors, respectively, associated with $\lambda$.
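The definition can be exercised directly. In the sketch below (NumPy assumed; the matrices are arbitrary illustrations), the rows of $U^{-1}$ provide left eigenvectors scaled so that $(u_i, w_i) = 1$, hence $\mathrm{Cond}(\lambda_i) = \|u_i\|_2\,\|w_i\|_2$; the observed first-order change $|\lambda_i(t) - \lambda_i|/t$ is then compared with the bound $\mathrm{Cond}(\lambda_i)\,\|E\|_2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))                     # arbitrary nonsymmetric test matrix

lam, U = np.linalg.eig(A)                           # right eigenvectors: unit columns of U
Winv = np.linalg.inv(U)                             # row i of U^{-1} is w_i^H with w_i^H u_i = 1
conds = np.linalg.norm(U, axis=0) * np.linalg.norm(Winv, axis=1)
# conds[i] = ||u_i|| ||w_i|| / |(u_i, w_i)| = 1 / cos(theta(u_i, w_i))

t = 1e-6
E = rng.standard_normal((n, n))                     # perturbation direction
lam_t = np.linalg.eig(A + t * E)[0]
for i in range(n):
    shift = np.min(np.abs(lam_t - lam[i]))          # nearest perturbed eigenvalue
    print(shift / t, conds[i] * np.linalg.norm(E, 2))   # observed rate vs. upper bound
```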

Example. Consider the matrix

[$3 \times 3$ matrix, its eigenvalues, and the displayed eigenvectors not recoverable from the source]

The right and left eigenvectors $u$ and $w$ associated with one of its eigenvalues are nearly orthogonal to each other, so the corresponding condition number is large. A perturbation of small norm in $A$ may therefore cause perturbations of that eigenvalue which are larger by a factor of roughly $\mathrm{Cond}(\lambda)$; perturbing the entry $a_{11}$ slightly indeed scatters the computed spectrum noticeably.

For Hermitian, or more generally normal, matrices every simple eigenvalue is well-conditioned, since $\mathrm{Cond}(\lambda) = 1$. On the other hand, the condition number of a nonnormal matrix can be excessively high, in fact arbitrarily high.


Example. As an example, simply consider an $n \times n$ upper bidiagonal matrix

[matrix display not recoverable from the source]

whose diagonal entries are its eigenvalues, with $\lambda_1$ simple. A right eigenvector associated with the eigenvalue $\lambda_1$ is the vector $e_1$. A left eigenvector is a vector $w$ whose $i$-th component grows geometrically in magnitude with $i$, for $i = 1, \ldots, n$. A little calculation then shows that $\mathrm{Cond}(\lambda_1)$ grows exponentially with $n$. Thus, this example shows that the condition number can be quite large even for modestly sized matrices.
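The entries of this example are not legible in the source, but the phenomenon is easy to reproduce with a hypothetical stand-in of the same flavor (assumption: an $n \times n$ upper bidiagonal matrix with $\lambda_1 = 0$, the remaining diagonal entries equal to $1$, and superdiagonal entries equal to $2$). Its right eigenvector for $\lambda_1$ is $e_1$, a left eigenvector has components $(-2)^{i-1}$, and $\mathrm{Cond}(\lambda_1)$ grows roughly like $2^{n}$.

```python
import numpy as np

def cond_lambda1(n):
    """Condition number of the simple eigenvalue lambda_1 = 0 of the illustrative matrix."""
    A = np.diag(np.r_[0.0, np.ones(n - 1)]) + 2.0 * np.diag(np.ones(n - 1), k=1)
    u = np.zeros(n); u[0] = 1.0                     # right eigenvector: A e_1 = 0
    w = (-2.0) ** np.arange(n)                      # left eigenvector: w^T A = 0
    assert np.allclose(w @ A, np.zeros(n))          # check that w is indeed a left eigenvector
    return np.linalg.norm(u) * np.linalg.norm(w) / abs(w @ u)

for n in (5, 10, 15, 20):
    print(n, cond_lambda1(n))                       # grows roughly like 2**n / sqrt(3)
```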

An important comment should be made concerning the above example. The eigenvalues of $A$ are explicitly known in terms of the diagonal entries of the matrix, whenever the structure of $A$ stays the same. One may wonder whether it is sensible to discuss the concept of condition number in such cases. For example, if we perturb a diagonal entry by $\epsilon$, we know exactly that the corresponding eigenvalue will be perturbed likewise. Is the notion of condition number useless in such situations? The answer is no. First, the argument is only true if perturbations are applied in specific positions of the matrix, namely its upper triangular part. If perturbations take place elsewhere, then some or all of the eigenvalues of the perturbed matrix may not be explicitly known. Second, one can think of applying an orthogonal similarity transformation to $A$. If $Q$ is orthogonal, then the eigenvalues of the matrix $B = Q^H A Q$ have the same condition numbers as those of the original matrix $A$ (see the problems of this chapter). The resulting matrix $B$ may be dense, and the dependence of its eigenvalues on its entries is no longer explicit.
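The invariance of these condition numbers under orthogonal similarity is easy to test. A brief sketch (NumPy assumed; random illustrative matrices), using the same recipe as in the previous sketch to evaluate $1/\cos\theta(u_i, w_i)$:

```python
import numpy as np

def eig_conds(A):
    """Condition numbers of the (assumed simple) eigenvalues of A."""
    lam, U = np.linalg.eig(A)
    Winv = np.linalg.inv(U)                         # row i of U^{-1} is w_i^H with w_i^H u_i = 1
    return np.linalg.norm(U, axis=0) * np.linalg.norm(Winv, axis=1)

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
Q = np.linalg.qr(rng.standard_normal((6, 6)))[0]    # random orthogonal matrix
B = Q.T @ A @ Q                                     # orthogonal similarity transform of A

print(np.sort(eig_conds(A)))
print(np.sort(eig_conds(B)))                        # same values up to round-off
```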


Conditioning of Eigenvectors

To properly define the condition number of an eigenvector, we need to use the notion of reduced resolvent. Although the resolvent operator $R(z)$ has a singularity at an eigenvalue $\lambda$, it can still be defined on the restriction to the invariant subspace $\mathrm{Ker}\, P$. More precisely, consider the restriction of the mapping $A - \lambda I$ to the subspace $(I - P)\,\mathbb{C}^n = \mathrm{Ker}\, P$, where $P$ is the spectral projector associated with the eigenvalue $\lambda$. This mapping is invertible because, if $x$ is an element of $\mathrm{Ker}\, P$ with $(A - \lambda I)x = 0$, i.e., $x$ is in $\mathrm{Ker}(A - \lambda I)$, which is included in $\mathrm{Ran}\, P$, then this is only possible when $x = 0$. We will call reduced resolvent at $\lambda$ the inverse of this linear mapping, and we will denote it by $S(\lambda)$. Thus,

$$ S(\lambda) = \Big[ (A - \lambda I)_{\big|\, \mathrm{Ker}\, P} \Big]^{-1} . $$

The reduced resolvent satisfies the relation

$$ S(\lambda)(A - \lambda I)\, x = S(\lambda)(A - \lambda I)(I - P)\, x = (I - P)\, x \qquad \forall\, x , $$

which can be viewed as an alternative definition of $S(\lambda)$.

We now consider a simple eigenvalue $\lambda$ of a matrix $A$ with an associated eigenvector $u$, and write that a pair $\lambda(t), u(t)$ is an eigenpair of the matrix $A + tE$:

$$ (A + tE)\, u(t) = \lambda(t)\, u(t) . $$

Subtracting $Au = \lambda u$ from both sides, we have

$$ A\,(u(t) - u) + tE\, u(t) = \lambda(t)\, u(t) - \lambda u = \lambda\,(u(t) - u) + (\lambda(t) - \lambda)\, u(t) , $$

or

$$ (A - \lambda I)(u(t) - u) = -\,tE\, u(t) + (\lambda(t) - \lambda)\, u(t) . $$

We then multiply both sides by the projector $I - P$ to obtain

$$ (I - P)(A - \lambda I)(u(t) - u) = -\,t\,(I - P) E u(t) + (\lambda(t) - \lambda)(I - P)\, u(t) = -\,t\,(I - P) E u(t) + (\lambda(t) - \lambda)(I - P)(u(t) - u) . $$

The last equality holds because $(I - P)u = 0$, since $u$ is in $\mathrm{Ran}\, P$. Hence, recalling that the spectral projector $P$ commutes with $A$,

$$ (A - \lambda I)(I - P)(u(t) - u) = (I - P)\left[ -\,tE\, u(t) + (\lambda(t) - \lambda)(u(t) - u) \right] . $$

We now multiply both sides by $S(\lambda)$ and use the relation above to get

$$ (I - P)(u(t) - u) = S(\lambda)(I - P)\left[ -\,tE\, u(t) + (\lambda(t) - \lambda)(u(t) - u) \right] . $$

In the above development we have not scaled $u(t)$ in any way. We now do so by requiring that its projection onto the eigenvector $u$ be exactly $u$, i.e., $P u(t) = u$ for all $t$. With this scaling we have

$$ (I - P)(u(t) - u) = u(t) - u . $$

As a result, the above equality becomes

$$ u(t) - u = S(\lambda)\left[ -\,t\,(I - P) E u(t) + (\lambda(t) - \lambda)(u(t) - u) \right] , $$

from which we finally get, after dividing by $t$ and taking the limit,

$$ u'(0) = -\,S(\lambda)(I - P)\, E u . $$

Using the same argument as before, we arrive at the following general definition of the condition number of an eigenvector.

Definition. The condition number of an eigenvector $u$ associated with an eigenvalue $\lambda$ of an arbitrary matrix $A$ is defined by

$$ \mathrm{Cond}(u) = \| S(\lambda)(I - P) \|_2 , $$

in which $S(\lambda)$ is the reduced resolvent of $A$ at $\lambda$.

In the case where the matrix $A$ is Hermitian, it is easy to verify that the condition number simplifies to

$$ \mathrm{Cond}(u) = \frac{1}{\mathrm{dist}\big[ \lambda , \; \sigma(A) \setminus \{\lambda\} \big]} . $$


In the general non-Hermitian case it is difficult to assess the size of $\mathrm{Cond}(u)$.
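For a diagonalizable matrix the operator $S(\lambda)(I - P)$ can be assembled explicitly from an eigendecomposition, which gives a direct way of evaluating $\mathrm{Cond}(u)$. The sketch below (NumPy assumed; illustrative matrices only) does this for a nonsymmetric matrix and then checks the Hermitian simplification on a real symmetric one.

```python
import numpy as np

def cond_eigvec(A, i):
    """Cond(u_i) = ||S(lambda_i)(I - P_i)||_2 for a diagonalizable A with simple lambda_i."""
    lam, X = np.linalg.eig(A)
    Y = np.linalg.inv(X)                            # rows of X^{-1} carry the left eigenvectors
    mask = np.arange(len(lam)) != i
    d = np.zeros(len(lam), dtype=complex)
    d[mask] = 1.0 / (lam[mask] - lam[i])            # invert the nonzero entries of D - lambda_i I
    S = X @ np.diag(d) @ Y                          # spectral form of S(lambda_i)(I - P_i)
    return np.linalg.norm(S, 2), lam

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))                     # nonsymmetric: Cond(u) may be large
print(cond_eigvec(A, 0)[0])

H = rng.standard_normal((5, 5)); H = H + H.T        # real symmetric (Hermitian) case
c, lam = cond_eigvec(H, 0)
print(c, 1.0 / np.min(np.abs(np.delete(lam, 0) - lam[0])))   # the two values agree
```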

To better understand the nature of the operator $S(\lambda)(I - P)$, consider its spectral expansion in the particular case where $A$ is diagonalizable and the eigenvalue $\lambda_i$ of interest is simple:

$$ S(\lambda_i)(I - P_i) = \sum_{\substack{j = 1 \\ j \neq i}}^{p} \frac{1}{\lambda_j - \lambda_i}\, P_j . $$

Since we can write each projector as a sum of outer product matrices $P_j = \sum_k u_k w_k^H$, where the right and left eigenvectors $u_k$ and $w_k$ are normalized so that $(u_j, w_j) = 1$, the expansion above applied to the vector $E u_i$ can be rewritten as

$$ S(\lambda_i)(I - P_i)\, E u_i \;=\; \sum_{\substack{j = 1 \\ j \neq i}}^{n} \frac{u_j w_j^H}{\lambda_j - \lambda_i}\, E u_i \;=\; \sum_{\substack{j = 1 \\ j \neq i}}^{n} \frac{w_j^H E u_i}{\lambda_j - \lambda_i}\, u_j , $$

which, up to its sign, is the first-order perturbation $u'(0)$ of the eigenvector and is the standard expression developed in Wilkinson's book.

What the above expression reveals is that when eigenvalues get close to one another, then the eigenvectors are not too well defined. This is predictable, since a multiple eigenvalue typically has several independent eigenvectors associated with it, and we can rotate an eigenvector arbitrarily within the eigenspace while keeping it an eigenvector of $A$. As an eigenvalue gets close to being multiple, the condition number of its associated eigenvector deteriorates. In fact, one question that follows naturally is whether or not one can define the notion of condition number for eigenvectors associated with multiple eigenvalues. The above observation suggests that a more realistic alternative is to try to analyze the sensitivity of the invariant subspace. This is taken up in the next section.
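The first-order formula $u'(0) = -S(\lambda)(I - P)Eu$ can be checked by finite differences. In the sketch below (NumPy assumed; illustrative data), the perturbed eigenvector is rescaled so that $Pu(t) = u$, i.e. $w^H u(t) = 1$ with $w$ the left eigenvector normalized against $u$, and $(u(t) - u)/t$ is compared with the predicted derivative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, i, t = 5, 0, 1e-7
A = rng.standard_normal((n, n))
E = rng.standard_normal((n, n))                     # perturbation direction

lam, X = np.linalg.eig(A)
Y = np.linalg.inv(X)                                # row j of Y is w_j^H with w_j^H u_j = 1
u, wH = X[:, i], Y[i, :]

# predicted derivative u'(0) = -S(lambda_i)(I - P_i) E u
mask = np.arange(n) != i
d = np.zeros(n, dtype=complex)
d[mask] = 1.0 / (lam[mask] - lam[i])
du_pred = -(X @ np.diag(d) @ Y) @ (E @ u)

# finite difference with the scaling P u(t) = u, i.e. w_i^H u(t) = 1
lam_t, X_t = np.linalg.eig(A + t * E)
j = np.argmin(np.abs(lam_t - lam[i]))               # branch closest to lambda_i
u_t = X_t[:, j] / (wH @ X_t[:, j])                  # enforce w_i^H u(t) = 1
print(np.linalg.norm((u_t - u) / t - du_pred))      # small, of order t
```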

Example. Consider again the matrix of the earlier example on the conditioning of eigenvalues. The matrix is diagonalizable since it has three distinct eigenvalues, and we can write $A = X D X^{-1}$ with $D$ diagonal. One way to compute the reduced resolvent associated with $\lambda_1$ is to replace, in this equality, the diagonal matrix $D$ by the inverse of $D - \lambda_1 I$ obtained by inverting its nonzero entries and placing a zero in the entry corresponding to $\lambda_1$, i.e.,

$$ S(\lambda_1) = X \, \mathrm{diag}\!\left( 0, \; (\lambda_2 - \lambda_1)^{-1}, \; (\lambda_3 - \lambda_1)^{-1} \right) X^{-1} . $$

The norm $\|S(\lambda_1)\|_2$ turns out to be large, so a perturbation of small norm may, according to the bound, cause changes on the eigenvector that are larger by a factor of up to $\|S(\lambda_1)\|_2$. This turns out to be a pessimistic overestimate: if we perturb the entry $a_{11}$ slightly, the eigenvector $u_1$ associated with $\lambda_1$ is perturbed by far less than the bound predicts. A clue as to why we have a poor estimate is provided by looking at the norms of $X$ and $X^{-1}$: their product $\|X\|_2\,\|X^{-1}\|_2$ is large, which reveals that the eigenvectors are poorly conditioned.

Conditioning of Invariant Subspaces

Often, one is interested in the invariant subspace rather than in the individual eigenvectors associated with a given eigenvalue. In these situations the condition number for eigenvectors, as defined before, is not sufficient. We would like to have an idea of how the whole subspace behaves under a given perturbation.

We start with the simple case where the multiplicity of the eigenvalue under consideration is one, and we define some notation. Referring to the perturbed eigenpair $(\lambda(t), u(t))$ of $A + tE$ used above, let $Q(t)$ be the orthogonal projector onto the invariant subspace associated with the simple eigenvalue $\lambda(t)$, and let $Q \equiv Q(0)$ be the orthogonal projector onto the invariant subspace of $A$ associated with $\lambda$. The orthogonal projector $Q$ onto the invariant subspace associated with $\lambda$ has different properties from those of the spectral projector $P$. For example, $A$ and $Q$ do not commute. All we can say is that

$$ AQ = QAQ \qquad \text{or} \qquad (I - Q) A Q = 0 , $$

leading to

$$ (I - Q) A = (I - Q) A (I - Q) , \qquad (I - Q)(A - \lambda I) = (I - Q)(A - \lambda I)(I - Q) . $$

Note that the linear operator $A - \lambda I$, when restricted to the range of $I - Q$, is invertible. This is because if $(A - \lambda I)x = 0$, then $x$ belongs to $\mathrm{Ran}(Q)$, whose intersection with $\mathrm{Ran}(I - Q)$ is reduced to $\{0\}$. We denote by $\hat{S}(\lambda)$ the inverse of $A - \lambda I$ restricted to $\mathrm{Ran}(I - Q)$. Note that although both $S(\lambda)$ and $\hat{S}(\lambda)$ are inverses of $A - \lambda I$ restricted to complements of $\mathrm{Ker}(A - \lambda I)$, these two inverses are quite different.
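The two restricted inverses can be compared numerically. In the sketch below (NumPy assumed; illustrative matrix), $Q$ is built from the normalized eigenvector, $\hat{S}(\lambda)$ is obtained as the pseudo-inverse of the compressed operator $(I - Q)(A - \lambda I)(I - Q)$, and the reduced resolvent $S(\lambda)(I - P)$ is formed from the eigendecomposition as before; the two operators indeed differ in general.

```python
import numpy as np

rng = np.random.default_rng(6)
n, i = 6, 0
A = rng.standard_normal((n, n))

lam, X = np.linalg.eig(A)
Y = np.linalg.inv(X)
u = X[:, i] / np.linalg.norm(X[:, i])               # unit eigenvector for lambda_i
lam_i = lam[i]
I = np.eye(n)

Q = np.outer(u, u.conj())                           # orthogonal projector onto span{u}
M = (I - Q) @ (A - lam_i * I) @ (I - Q)             # compressed operator on Ran(I - Q)
S_hat = np.linalg.pinv(M)                           # its inverse on Ran(I - Q), zero on Ran(Q)

mask = np.arange(n) != i
d = np.zeros(n, dtype=complex)
d[mask] = 1.0 / (lam[mask] - lam_i)
S_red = X @ np.diag(d) @ Y                          # reduced resolvent S(lambda_i)(I - P_i)

print(np.linalg.norm(M @ S_hat - (I - Q)))          # ~ 0: S_hat inverts the compressed operator
print(np.linalg.norm(S_hat - S_red))                # nonzero: the two inverses are different
```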

Starting from the eigenvalue equation of the perturbed matrix, we subtract $\lambda u(t)$ from each side to get

$$ (A - \lambda I)\, u(t) = -\,tE\, u(t) + (\lambda(t) - \lambda)\, u(t) . $$

Now multiply both sides by the orthogonal projector $I - Q$,

$$ (I - Q)(A - \lambda I)\, u(t) = -\,t\,(I - Q) E u(t) + (\lambda(t) - \lambda)(I - Q)\, u(t) , $$

to obtain, using the relation $(I - Q)(A - \lambda I) = (I - Q)(A - \lambda I)(I - Q)$,

$$ (I - Q)(A - \lambda I)(I - Q)\,\big[ (I - Q)\, u(t) \big] = -\,t\,(I - Q) E u(t) + (\lambda(t) - \lambda)(I - Q)\, u(t) . $$

Therefore,

$$ (I - Q)\, u(t) = \hat{S}(\lambda)\left[ -\,t\,(I - Q) E u(t) + (\lambda(t) - \lambda)(I - Q)\, u(t) \right] . $$