EE240A - MAE 270A - Fall 2002

Review of some elements of linear algebra

Prof. Fernando Paganini

September 27, 2002

1 Linear spaces and mappings

In this section we will introduce some of the basic ideas in linear algebra. Our treatment is primarily intended as a review for the reader's convenience, with some additional focus on the geometric aspects of the subject. References are given at the end of the chapter for more details at both introductory and advanced levels.

1.1 Vector spaces

The structure introduced now will pervade our course, that of a vector space, also called a linear space. This is a set that has a natural addition operation defined on it, together with scalar multiplication. Because this is such an important concept, and arises in a number of different ways, it is worth defining it precisely below.

Before proceeding we set some basic notation. The real numbers will be denoted by $\mathbb{R}$, and the complex numbers by $\mathbb{C}$; we will use $j := \sqrt{-1}$ for the imaginary unit. Also, given a complex number $z = x + jy$ with $x, y \in \mathbb{R}$:

- $\bar{z} = x - jy$ is the complex conjugate;

- $|z| = \sqrt{x^2 + y^2}$ is the complex magnitude;

- $x = \mathrm{Re}(z)$ is the real part.

We use $\mathbb{C}^{+}$ to denote the open right half-plane formed from the complex numbers with positive real part; $\bar{\mathbb{C}}^{+}$ is the corresponding closed half-plane, and the left half-planes $\mathbb{C}^{-}$ and $\bar{\mathbb{C}}^{-}$ are analogously defined. Finally, $j\mathbb{R}$ denotes the imaginary axis.

We now define a vector space. In the definition, the field $F$ can be taken here to be the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$. The terminology real vector space, or complex vector space, is used to specify these alternatives.

Definition 1. Suppose $V$ is a nonempty set and $F$ is a field, and that operations of vector addition and scalar multiplication are defined in the following way.

(a) For every pair $u, v \in V$ a unique element $u + v \in V$ is assigned, called their sum;

(b) for each $\alpha \in F$ and $v \in V$, there is a unique element $\alpha v \in V$ called their product.

Then $V$ is a vector space if the following properties hold for all $u, v, w \in V$, and all $\alpha, \beta \in F$:

(i) there exists a zero element in $V$, denoted by 0, such that $v + 0 = v$;

(ii) there exists a vector $-v$ in $V$, such that $v + (-v) = 0$;

(iii) the associative rule $(u + v) + w = u + (v + w)$ is satisfied;

(iv) the commutativity relationship $u + v = v + u$ holds;

(v) scalar distributivity $\alpha(u + v) = \alpha u + \alpha v$ holds;

(vi) vector distributivity $(\alpha + \beta)v = \alpha v + \beta v$ is satisfied;

(vii) the associative rule $(\alpha\beta)v = \alpha(\beta v)$ for scalar multiplication holds;

(viii) for the unit scalar $1 \in F$ the equality $1v = v$ holds.

Formally, a vector space is an additive group together with a scalar multiplication operation defined over a field $F$, which must satisfy the usual rules (v)-(viii) of distributivity and associativity. Notice that both $V$ and $F$ contain the zero element, which we will denote by "0" regardless of the instance.

Given two vector spaces $V_1$ and $V_2$, with the same associated scalar field, we use $V_1 \times V_2$ to denote the vector space formed by their Cartesian product. Thus every element of $V_1 \times V_2$ is of the form
\[
(v_1, v_2), \quad \text{where } v_1 \in V_1 \text{ and } v_2 \in V_2.
\]

Having defined a vector space we now consider a number of examples.

Examples:

Both $\mathbb{R}$ and $\mathbb{C}$ can be considered as real vector spaces, although $\mathbb{C}$ is more commonly regarded as a complex vector space. The most common example of a real vector space is $\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$, namely $n$ copies of $\mathbb{R}$. We represent elements of $\mathbb{R}^n$ in a column vector notation
\[
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n, \quad \text{where each } x_k \in \mathbb{R}.
\]

Addition and scalar multiplication in $\mathbb{R}^n$ are defined componentwise:
\[
x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}, \qquad
\alpha x = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}, \qquad \text{for } \alpha \in \mathbb{R},\ x, y \in \mathbb{R}^n.
\]
Identical definitions apply to the complex space $\mathbb{C}^n$. As a further step, consider the space $\mathbb{C}^{m \times n}$ of complex $m \times n$ matrices of the form
\[
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.
\]
Using once again componentwise addition and scalar multiplication, $\mathbb{C}^{m \times n}$ is a (real or complex) vector space.

We now define two vector spaces of matrices which will be central in our course. First, we define the transpose of the above $A \in \mathbb{C}^{m \times n}$ by
\[
A' = \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{C}^{n \times m},
\]
and its Hermitian conjugate or adjoint by
\[
A^* = \begin{bmatrix} \bar{a}_{11} & \cdots & \bar{a}_{m1} \\ \vdots & \ddots & \vdots \\ \bar{a}_{1n} & \cdots & \bar{a}_{mn} \end{bmatrix} \in \mathbb{C}^{n \times m}.
\]
In both cases the indices have been transposed, but in the latter we also take the complex conjugate of each element. Clearly both operations coincide if the matrix is real; we thus favor the notation $A^*$, which will serve to indicate both the transpose of a real matrix and the adjoint of a complex matrix. (The transpose, without conjugation, of a complex matrix $A$ will be denoted by $A'$; it is seldom required.)

The square matrix $A \in \mathbb{C}^{n \times n}$ is Hermitian or self-adjoint if
\[
A = A^*.
\]
The space of Hermitian matrices is denoted $\mathbb{H}^n$, and is a real vector space. If a Hermitian matrix $A$ is in $\mathbb{R}^{n \times n}$ it is more specifically referred to as symmetric. The set of symmetric matrices is also a real vector space and will be written $\mathbb{S}^n$.

The set $F(\mathbb{R}^m, \mathbb{R}^n)$ of functions mapping $m$ real variables to $\mathbb{R}^n$ is a vector space. Addition between two functions $f_1$ and $f_2$ is defined by
\[
(f_1 + f_2)(x_1, \ldots, x_m) = f_1(x_1, \ldots, x_m) + f_2(x_1, \ldots, x_m)
\]
for any variables $x_1, \ldots, x_m$; this is called pointwise addition. Scalar multiplication by a real number $\alpha$ is defined by
\[
(\alpha f)(x_1, \ldots, x_m) = \alpha f(x_1, \ldots, x_m).
\]

An example of a less standard vector space is given by the set composed of multinomials in $m$ variables that have homogeneous order $n$. We denote this set by $P_m^{[n]}$. To illustrate the elements of this set consider
\[
p_1(x_1, x_2, x_3) = x_1^2 x_2 x_3, \qquad p_2(x_1, x_2, x_3) = x_1^3 x_2, \qquad p_3(x_1, x_2, x_3) = x_1 x_2 x_3.
\]
Each of these is a multinomial in three variables; however, $p_1$ and $p_2$ have order four, whereas the order of $p_3$ is three. Thus only $p_1$ and $p_2$ are in $P_3^{[4]}$. Similarly, of
\[
p_4(x_1, x_2, x_3) = x_1^4 + x_2 x_3^3 \quad \text{and} \quad p_5(x_1, x_2, x_3) = x_2 x_3 x_1^2 + x_1,
\]
only $p_4$ is in $P_3^{[4]}$, whereas $p_5$ is not in any $P_3^{[n]}$ space since its terms are not homogeneous in order. Some thought will convince you that $P_m^{[n]}$ is a vector space under pointwise addition.

1.2 Subspaces

A subspace of a vector space $V$ is a subset of $V$ which is also a vector space with respect to the same field and operations; equivalently, it is a subset which is closed under the operations on $V$.

Examples:

A vector space can have many subspaces, and the simplest of these is the zero subspace, denoted by $\{0\}$. This is a subspace of any vector space and contains only the zero element. Excepting the zero subspace and the entire space, the simplest type of subspace in $V$ is of the form
\[
S_v = \{ s \in V : s = \alpha v, \text{ for some } \alpha \in \mathbb{R} \},
\]
given $v$ in $V$. That is, each element in $V$ generates a subspace by multiplying it by all possible scalars. In $\mathbb{R}^2$ or $\mathbb{R}^3$, such subspaces correspond to lines going through the origin.

Going back to our earlier examples of vector spaces we see that the multinomials $P_m^{[n]}$ are subspaces of $F(\mathbb{R}^m, \mathbb{R})$, for any $n$.

Now $\mathbb{R}^n$ has many subspaces, and an important set is those associated with the natural insertion of $\mathbb{R}^m$ into $\mathbb{R}^n$, when $m < n$. These subspaces consist of the elements of the form
\[
x = \begin{bmatrix} \tilde{x} \\ 0 \end{bmatrix}, \quad \text{where } \tilde{x} \in \mathbb{R}^m \text{ and } 0 \in \mathbb{R}^{n-m}.
\]

Given two subspaces $S_1$ and $S_2$ we can define the addition
\[
S_1 + S_2 = \{ s \in V : s = s_1 + s_2 \text{ for some } s_1 \in S_1 \text{ and } s_2 \in S_2 \},
\]
which is easily verified to be a subspace.

1.3 Bases, spans, and linear independence

We now define some key vector space concepts. Given elements $v_1, \ldots, v_m$ in a vector space we denote their span by
\[
\mathrm{span}\{v_1, \ldots, v_m\},
\]
which is the set of all vectors $v$ that can be written as
\[
v = \alpha_1 v_1 + \cdots + \alpha_m v_m
\]
for some scalars $\alpha_k \in F$; the above expression is called a linear combination of the vectors $v_1, \ldots, v_m$. It is straightforward to verify that the span always defines a subspace. If for some vectors we have
\[
\mathrm{span}\{v_1, \ldots, v_m\} = V,
\]
we say that the vector space $V$ is finite dimensional. If no such finite set of vectors exists we say the vector space is infinite dimensional. Our focus for the remainder of the chapter is exclusively finite dimensional vector spaces. We will pursue the study of some infinite dimensional spaces in Chapter 3.

If a vector space $V$ is finite dimensional we define its dimension, denoted $\dim V$, to be the smallest number $n$ such that there exist vectors $v_1, \ldots, v_n$ satisfying
\[
\mathrm{span}\{v_1, \ldots, v_n\} = V.
\]
In that case we say that the set $\{v_1, \ldots, v_n\}$ is a basis for $V$. Notice that a basis will automatically satisfy the linear independence property, which means that the only solution to the equation
\[
\alpha_1 v_1 + \cdots + \alpha_n v_n = 0
\]
is $\alpha_1 = \cdots = \alpha_n = 0$. Otherwise, one of the elements $v_i$ could be expressed as a linear combination of the others and $V$ would be spanned by fewer than $n$ vectors. Given this observation, it follows easily that for a given $v \in V$, the scalars $\alpha_1, \ldots, \alpha_n$ satisfying
\[
\alpha_1 v_1 + \cdots + \alpha_n v_n = v
\]
are unique; they are termed the coordinates of $v$ in the basis $\{v_1, \ldots, v_n\}$.

Linear independence is defined analogously for any set of vectors $\{v_1, \ldots, v_m\}$; it is equivalent to saying the vectors are a basis for their span. The maximal number of linearly independent vectors is $n$, the dimension of the space; in fact any linearly independent set can be extended with additional vectors to form a basis.

Examples:

From our examples so far $\mathbb{R}^n$, $\mathbb{C}^{m \times n}$, and $P_m^{[n]}$ are all finite dimensional vector spaces; however, $F(\mathbb{R}^m, \mathbb{R}^n)$ is infinite dimensional. The real vector space $\mathbb{R}^n$ and complex vector space $\mathbb{C}^{m \times n}$ are $n$ and $mn$ dimensional, respectively. The dimension of $P_m^{[n]}$ is more challenging to compute and its determination is an exercise at the end of the chapter.

An important computational concept in vector space analysis is associating a general $k$ dimensional vector space $V$ with the vector space $F^k$. This is done by taking a basis $\{v_1, \ldots, v_k\}$ for $V$, and associating each vector $v$ in $V$ with the vector of coordinates in the given basis,
\[
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix} \in F^k.
\]
Equivalently, each vector $v_i$ in the basis is associated with the vector
\[
e_i = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in F^k.
\]
That is, $e_i$ is the vector with zeros everywhere except its $i$th entry, which is equal to one. Thus we are identifying the basis $\{v_1, \ldots, v_k\}$ in $V$ with the set $\{e_1, \ldots, e_k\}$, which is in fact a basis of $F^k$, called the canonical basis.

To see how this type of identification is made, suppose we are dealing with $\mathbb{R}^{n \times m}$, which has dimension $k = nm$. Then a basis for this vector space is given by the matrices
\[
E_{ir} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & 1 & \vdots \\ 0 & \cdots & 0 \end{bmatrix},
\]
which are zero everywhere but their $(i, r)$th entry, which is one. Then we identify each of these with the vector $e_{n(r-1)+i} \in \mathbb{R}^k$. Thus addition or scalar multiplication on $\mathbb{R}^{n \times m}$ can be translated to equivalent operations on $\mathbb{R}^k$.
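This identification is exactly what column-major vectorization of a matrix does. The following NumPy sketch (an illustration, not part of the notes) shows the correspondence, assuming Fortran-order flattening so that the $(i, r)$ entry lands in coordinate $n(r-1)+i$ as above.

```python
import numpy as np

n, m = 3, 2
A = np.arange(1.0, 7.0).reshape(n, m)        # an element of R^(3x2), identified with R^6

# Column-major ("Fortran"-order) flattening sends the (i, r) entry to
# coordinate n*(r-1) + i, matching the identification E_ir <-> e_{n(r-1)+i}.
x = A.flatten(order="F")

# The identification turns matrix addition and scaling into vector operations.
Bmat = np.ones((n, m))
assert np.allclose((2.0 * A + Bmat).flatten(order="F"),
                   2.0 * x + Bmat.flatten(order="F"))
```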

1.4 Mappings and matrix representations

We now introduce the concept of a linear mapping between vector spaces. The mapping $A : V \to W$ is linear if
\[
A(\alpha v_1 + \beta v_2) = \alpha A v_1 + \beta A v_2
\]
for all $v_1, v_2$ in $V$, and all scalars $\alpha$ and $\beta$. Here $V$ and $W$ are vector spaces with the same associated field $F$. The space $V$ is called the domain of the mapping, and $W$ its codomain.

Given bases $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_m\}$ for $V$ and $W$, respectively, we associate scalars $a_{ik}$ with the mapping $A$, defining them such that they satisfy
\[
A v_k = a_{1k} w_1 + a_{2k} w_2 + \cdots + a_{mk} w_m,
\]
for each $1 \le k \le n$. Namely, given any basis vector $v_k$, the coefficients $a_{ik}$ are the coordinates of $A v_k$ in the chosen basis for $W$. It turns out that these $mn$ numbers $a_{ik}$ completely specify the linear mapping $A$. To see this is true consider any vector $v \in V$, and let $w = Av$. We can express both vectors in their respective bases as $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$ and $w = \beta_1 w_1 + \cdots + \beta_m w_m$. Now we have
\[
w = Av = A(\alpha_1 v_1 + \cdots + \alpha_n v_n)
= \alpha_1 A v_1 + \cdots + \alpha_n A v_n
= \sum_{k=1}^{n} \alpha_k \sum_{i=1}^{m} a_{ik} w_i
= \sum_{i=1}^{m} \left( \sum_{k=1}^{n} a_{ik} \alpha_k \right) w_i,
\]
and therefore by uniqueness of the coordinates we must have
\[
\beta_i = \sum_{k=1}^{n} a_{ik} \alpha_k, \quad i = 1, \ldots, m.
\]
To express this relationship in a more convenient form, we can write the set of numbers $a_{ik}$ as the $m \times n$ matrix
\[
[A] = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.
\]
Then via the standard matrix product we have
\[
\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}
= \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}.
\]
In summary, any linear mapping $A$ between vector spaces can be regarded as a matrix $[A]$ mapping $F^n$ to $F^m$ via matrix multiplication.

Notice that the numbers $a_{ik}$ depend intimately on the bases $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_m\}$. Frequently we use only one basis for $V$ and one for $W$, and thus there is no need to distinguish between the map $A$ and the basis dependent matrix $[A]$. Therefore after this section we will simply write $A$ to denote either the map or the matrix, making which is meant context dependent.

We now give two examples to illustrate the above discussion more clearly.

Examples:

Given matrices $B \in \mathbb{C}^{k \times k}$ and $D \in \mathbb{C}^{l \times l}$ we define the map $\Phi : \mathbb{C}^{k \times l} \to \mathbb{C}^{k \times l}$ by
\[
\Phi(X) = BX - XD,
\]
where the right-hand side is in terms of matrix addition and multiplication. Clearly $\Phi$ is a linear mapping since
\[
\Phi(\alpha X_1 + \beta X_2) = B(\alpha X_1 + \beta X_2) - (\alpha X_1 + \beta X_2)D
= \alpha(B X_1 - X_1 D) + \beta(B X_2 - X_2 D)
= \alpha \Phi(X_1) + \beta \Phi(X_2).
\]
If we now consider the identification between the matrix space $\mathbb{C}^{k \times l}$ and the product space $\mathbb{C}^{kl}$, then $\Phi$ can be thought of as a map from $\mathbb{C}^{kl}$ to $\mathbb{C}^{kl}$, and can accordingly be represented by a complex matrix, which is $kl \times kl$. We now do an explicit $2 \times 2$ example for illustration. Suppose $k = l = 2$ and that
\[
B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 0 \end{bmatrix}.
\]
We would like to find a matrix representation for $\Phi$. Since the domain and codomain of $\Phi$ are equal, we will use the standard basis for $\mathbb{C}^{2 \times 2}$ for each. This basis is given by the matrices $E_{ir}$ defined earlier. We have
\[
\Phi(E_{11}) = \begin{bmatrix} -4 & 0 \\ 3 & 0 \end{bmatrix} = -4 E_{11} + 3 E_{21}, \qquad
\Phi(E_{12}) = \begin{bmatrix} 0 & 1 \\ 0 & 3 \end{bmatrix} = E_{12} + 3 E_{22},
\]
\[
\Phi(E_{21}) = \begin{bmatrix} 2 & 0 \\ -1 & 0 \end{bmatrix} = 2 E_{11} - E_{21}, \qquad
\Phi(E_{22}) = \begin{bmatrix} 0 & 2 \\ 0 & 4 \end{bmatrix} = 2 E_{12} + 4 E_{22}.
\]
Now we identify the basis $\{E_{11}, E_{12}, E_{21}, E_{22}\}$ with the standard basis $\{e_1, e_2, e_3, e_4\}$ for $\mathbb{C}^4$. Therefore we get that
\[
[\Phi] = \begin{bmatrix} -4 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \\ 3 & 0 & -1 & 0 \\ 0 & 3 & 0 & 4 \end{bmatrix}
\]
in this basis.
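The computation above is easy to verify numerically. The NumPy sketch below (an illustration, not part of the original notes) rebuilds $[\Phi]$ column by column by applying $\Phi$ to each basis matrix; the last line checks a standard Kronecker-product formula for this row-major vectorization ordering.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
D = np.array([[5.0, 0.0],
              [0.0, 0.0]])

def phi(X):
    """The linear map Phi(X) = B X - X D on 2x2 matrices."""
    return B @ X - X @ D

# Column j of [Phi] holds the coordinates of Phi(E_j) in the ordered basis
# {E11, E12, E21, E22}; row-major flattening matches that ordering.
M = np.zeros((4, 4))
for j, (i, r) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    E = np.zeros((2, 2))
    E[i, r] = 1.0
    M[:, j] = phi(E).flatten()

print(M)
# [[-4.  0.  2.  0.]
#  [ 0.  1.  0.  2.]
#  [ 3.  0. -1.  0.]
#  [ 0.  3.  0.  4.]]

# The same matrix via Kronecker products (for this row-major vectorization).
I = np.eye(2)
assert np.allclose(M, np.kron(B, I) - np.kron(I, D.T))
```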

Another linear operator involves the multinomial space $P_m^{[n]}$ defined earlier in this section. Given an element $a \in P_m^{[k]}$ we can define the mapping $\Psi : P_m^{[n]} \to P_m^{[n+k]}$ by function multiplication
\[
(\Psi p)(x_1, x_2, \ldots, x_m) := a(x_1, x_2, \ldots, x_m)\, p(x_1, x_2, \ldots, x_m).
\]
Again $\Psi$ can be regarded as a matrix, which maps $\mathbb{R}^{d_1}$ to $\mathbb{R}^{d_2}$, where $d_1$ and $d_2$ are the dimensions of $P_m^{[n]}$ and $P_m^{[n+k]}$, respectively.

Associated with any linear map $A : V \to W$ is its image space, which is defined by
\[
\mathrm{Im}\, A = \{ w \in W : \text{there exists } v \in V \text{ satisfying } Av = w \}.
\]
This set contains all the elements of $W$ which are the image of some point in $V$. Clearly if $\{v_1, \ldots, v_n\}$ is a basis for $V$, then
\[
\mathrm{Im}\, A = \mathrm{span}\{A v_1, \ldots, A v_n\}
\]
and is thus a subspace. The map $A$ is called surjective when $\mathrm{Im}\, A = W$. The dimension of the image space is called the rank of the linear mapping $A$, and the concept is applied as well to the associated matrix $[A]$. Namely,
\[
\mathrm{rank}\, [A] = \dim \mathrm{Im}\, A.
\]
If $S$ is a subspace of $V$, then the image of $S$ under the mapping $A$ is denoted $AS$. That is,
\[
AS = \{ w \in W : \text{there exists } s \in S \text{ satisfying } As = w \}.
\]
In particular, this means that $AV = \mathrm{Im}\, A$.

Another important set related to $A$ is its kernel, or null space, defined by
\[
\mathrm{Ker}\, A = \{ v \in V : Av = 0 \}.
\]
In words, $\mathrm{Ker}\, A$ is the set of vectors in $V$ which get mapped by $A$ to the zero element in $W$, and is easily verified to be a subspace of $V$.

If we consider the equation $Av = w$, suppose $v_a$ and $v_b$ are both solutions; then
\[
A(v_a - v_b) = 0.
\]
Plainly, the difference between any two solutions is in the kernel of $A$. Thus given any solution $v_a$ to the equation, all solutions are parametrized by
\[
v_a + v_0,
\]
where $v_0$ is any element in $\mathrm{Ker}\, A$.

In particular, when $\mathrm{Ker}\, A$ is the zero subspace, there is at most a unique solution to the equation $Av = w$. This means $A v_a = A v_b$ only when $v_a = v_b$; a mapping with this property is called injective.

In summary, a solution to the equation $Av = w$ will exist if and only if $w \in \mathrm{Im}\, A$; it will be unique only when $\mathrm{Ker}\, A$ is the zero subspace.

The dimensions of the image and kernel of $A$ are linked by the relationship
\[
\dim V = \dim \mathrm{Im}\, A + \dim \mathrm{Ker}\, A,
\]
proved in the exercises at the end of the chapter.
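As a quick numerical illustration of this dimension identity, the following sketch (assuming NumPy and SciPy are available; the matrix is an arbitrary example) computes the rank and nullity of a matrix and checks that they sum to the dimension of the domain.

```python
import numpy as np
from scipy.linalg import null_space

# A 4x3 real matrix, viewed as a map from R^3 to R^4.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0],
              [2.0, 4.0, 6.0]])

rank = np.linalg.matrix_rank(A)         # dim Im A
nullity = null_space(A).shape[1]        # dim Ker A
assert rank + nullity == A.shape[1]     # dim V = dim Im A + dim Ker A
print(rank, nullity)                    # 2 1
```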

A mapping is called bijective when it is both injective and surjective; that is, for every $w \in W$ there exists a unique $v$ satisfying $Av = w$. In this case there is a well-defined inverse mapping $A^{-1} : W \to V$, such that
\[
A^{-1} A = I_V, \qquad A A^{-1} = I_W.
\]
In the above, $I$ denotes the identity mapping in each space, that is, the map that leaves elements unchanged. For instance, $I_V : v \mapsto v$ for every $v \in V$.

From the above property on dimensions we see that if there exists a bijective linear mapping between two spaces $V$ and $W$, then the spaces must have the same dimension. Also, if a mapping $A$ is from $V$ back to itself, namely $A : V \to V$, then one of the two properties injectivity or surjectivity suffices to guarantee the other.

We will also use the terms nonsingular or invertible to describe bijective mappings, and apply these terms as well to their associated matrices. Notice that invertibility of the mapping $A$ is equivalent to invertibility of $[A]$ in terms of the standard matrix product; this holds true regardless of the chosen bases.

Examples:

To illustrate these notions let us return to the mappings $\Phi$ and $\Psi$ defined above. For the $2 \times 2$ numerical example given, $\Phi$ maps $\mathbb{C}^{2 \times 2}$ back to itself. It is easily checked that it is invertible by showing either
\[
\mathrm{Im}\, \Phi = \mathbb{C}^{2 \times 2}, \quad \text{or equivalently} \quad \mathrm{Ker}\, \Phi = \{0\}.
\]
In contrast $\Psi$ is not a map on the same space, instead taking $P_m^{[n]}$ to the larger space $P_m^{[n+k]}$. And we see that the dimension of the image of $\Psi$ is at most $d_1$, which is smaller than $d_2$ whenever $k > 0$. Thus there are at least some elements $w \in P_m^{[n+k]}$ for which
\[
\Psi v = w
\]
cannot be solved. These are exactly the values of $w$ that are not in $\mathrm{Im}\, \Psi$.

1.5 Change of basis and invariance

We have already discussed the idea of choosing a basis $\{v_1, \ldots, v_n\}$ for the vector space $V$, and then associating every vector $x$ in $V$ with its coordinates
\[
x_v = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} \in F^n,
\]

which are the unique scalars satisfying $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$. This raises the question: suppose we choose another basis $\{u_1, \ldots, u_n\}$ for $V$; how can we effectively move between these basis representations? That is, given $x \in V$, how are the coordinate vectors $x_v, x_u \in F^n$ related?

The answer is as follows. Suppose that each basis vector $u_k$ is expressed by
\[
u_k = t_{1k} v_1 + \cdots + t_{nk} v_n
\]
in the basis $\{v_1, \ldots, v_n\}$. Then the coefficients $t_{ik}$ define the matrix
\[
T = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & \ddots & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix}.
\]
Notice that such a matrix is nonsingular, since it represents the identity mapping $I_V$ in the bases $\{v_1, \ldots, v_n\}$ and $\{u_1, \ldots, u_n\}$. Then the relationship between the two coordinate vectors is
\[
T x_u = x_v.
\]

Now suppose $A : V \to V$, and that $A_v : F^n \to F^n$ is the representation of $A$ in the basis $\{v_1, \ldots, v_n\}$, and $A_u$ is the representation of $A$ using the basis $\{u_1, \ldots, u_n\}$. How is $A_u$ related to $A_v$?

To study this, take any $x \in V$ and let $x_v$, $x_u$ be its coordinates in the respective bases, and $z_v$, $z_u$ be the coordinates of $Ax$. Then we have
\[
z_u = T^{-1} z_v = T^{-1} A_v x_v = T^{-1} A_v T x_u.
\]
Since the above identity and
\[
z_u = A_u x_u
\]
both hold for every $x_u$, we conclude that
\[
A_u = T^{-1} A_v T.
\]

The above relationship is called a similarity transformation. This discussion can be summarized in the following commutative diagram. Let $E : V \to F^n$ be the map that takes elements of $V$ to their representation in $F^n$ with respect to the basis $\{v_1, \ldots, v_n\}$. Then the diagram

    V --E--> F^n --T^{-1}--> F^n
    |A       |A_v            |A_u
    V --E--> F^n --T^{-1}--> F^n

commutes, which again expresses the identity $A_u = T^{-1} A_v T$.
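A small NumPy check of the similarity transformation just derived (the matrices are illustrative values, not from the notes):

```python
import numpy as np

# A_v: representation of a map A in the basis {v1, v2}; the columns of T hold
# the coordinates of the new basis vectors u_k in terms of {v1, v2}.
A_v = np.array([[2.0, 1.0],
                [0.0, 3.0]])
T = np.array([[1.0, 1.0],
              [1.0, 2.0]])

A_u = np.linalg.solve(T, A_v @ T)       # A_u = T^{-1} A_v T (similarity transformation)

# Consistency check: with T x_u = x_v, the two representations give the same Ax.
x_u = np.array([1.0, -2.0])
x_v = T @ x_u
assert np.allclose(T @ (A_u @ x_u), A_v @ x_v)
```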

Next we examine mappings when viewed with respect to a subspace. Suppose that $S \subset V$ is a $k$-dimensional subspace of $V$, and that $\{v_1, \ldots, v_n\}$ is a basis for $V$ with
\[
\mathrm{span}\{v_1, \ldots, v_k\} = S.
\]
That is, the first $k$ vectors of this basis form a basis for $S$. If $E : V \to F^n$ is the associated map which takes the basis vectors in $V$ to the standard basis on $F^n$, then
\[
E(S) = F^k \times \{0\} \subset F^n.
\]


Thus in $F^n$ we can view $S$ as the elements of the form
\[
\begin{bmatrix} x \\ 0 \end{bmatrix}, \quad \text{where } x \in F^k.
\]
From the point of view of a linear mapping $A : V \to V$, this partitioning of $F^n$ gives a useful decomposition of the corresponding matrix $[A]$. Namely, we can regard $[A]$ as
\[
[A] = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix},
\]
where $A_1 : F^k \to F^k$, $A_2 : F^{n-k} \to F^k$, $A_3 : F^k \to F^{n-k}$, and $A_4 : F^{n-k} \to F^{n-k}$. We have that
\[
E(AS) = \mathrm{Im} \begin{bmatrix} A_1 \\ A_3 \end{bmatrix}.
\]

Finally, to end this section we have the notion of invariance of a subspace with respect to a mapping. We say that a subspace $S \subset V$ is $A$-invariant if $A : V \to V$ and
\[
AS \subset S.
\]
Clearly every map has at least two invariant subspaces, the zero subspace and the entire domain $V$. For subspaces $S$ of intermediate dimension, the invariance property is expressed most clearly by saying the associated matrix has the form
\[
[A] = \begin{bmatrix} A_1 & A_2 \\ 0 & A_4 \end{bmatrix}.
\]
Here we are assuming, as above, that our basis for $V$ is obtained by extending a basis for $S$. Similarly, if a matrix has this form then the subspace $F^k \times \{0\}$ is $[A]$-invariant.

We will revisit the question of finding non-trivial invariant subspaces later in the chapter, when studying eigenvectors and the Jordan decomposition.

2 Matrix theory

The material of this section is aimed directly at both analysis and computation. Our goals will be to review some basic facts about matrices, and present some additional results for later reference, including two matrix decompositions which have tremendous application, the Jordan form and the singular value decomposition. Both are extremely useful for analytical purposes, and the singular value decomposition is also very important in computations. We will also present some results about self-adjoint and positive definite matrices.

2.1 Eigenvalues and Jordan form

In this section we are concerned exclusively with complex square matrices. We begin with a definition: if $A \in \mathbb{C}^{n \times n}$, we say that $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if
\[
A x = \lambda x \tag{1}
\]
can be satisfied for some nonzero vector $x$ in $\mathbb{C}^n$. Such a vector $x$ is called an eigenvector. Equivalently, this means that $\mathrm{Ker}(\lambda I - A) \ne \{0\}$, or that $\lambda I - A$ is singular. A matrix is singular exactly when its determinant is zero, and therefore we have that $\lambda$ is an eigenvalue if and only if
\[
\det(\lambda I - A) = 0,
\]
where $\det(\cdot)$ denotes determinant. Regarding $\lambda$ as a variable we call the polynomial
\[
\det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0
\]
the characteristic polynomial of $A$. If $A$ is a real matrix then the coefficients $a_k$ will be real as well. The characteristic polynomial can be factored as
\[
\det(\lambda I - A) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n).
\]
The $n$ complex roots $\lambda_k$, which need not be distinct, are the eigenvalues of $A$, and are collectively denoted by $\mathrm{eig}(A)$. Furthermore, if $A$ is a real matrix, then any nonreal eigenvalues must appear in conjugate pairs. Also, a matrix has the eigenvalue zero if and only if it is singular.
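A quick way to see these objects numerically: the sketch below (illustrative matrix, assuming NumPy) computes the characteristic polynomial coefficients and the eigenvalues, and checks that the eigenvalues are the roots of that polynomial.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

coeffs = np.poly(A)                   # det(lambda I - A) coefficients, highest power first: [1, 3, 2]
eigs = np.linalg.eigvals(A)           # eigenvalues of A: -1 and -2
assert np.allclose(np.sort(np.roots(coeffs)), np.sort(eigs))
```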

Associated with every eigenvalue $\lambda_k$ is the subspace
\[
E_{\lambda_k} = \mathrm{Ker}(\lambda_k I - A);
\]
every nonzero element in $E_{\lambda_k}$ is an eigenvector corresponding to the eigenvalue $\lambda_k$. Now suppose that a set of eigenvectors satisfies
\[
\mathrm{span}\{x_1, \ldots, x_n\} = \mathbb{C}^n.
\]
Then we can define the invertible matrix $X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$, and from the matrix product we find
\[
AX = \begin{bmatrix} A x_1 & \cdots & A x_n \end{bmatrix} = \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix} = X\Lambda,
\]
where $\Lambda$ is the diagonal matrix
\[
\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.
\]
Thus in this case we have a similarity transformation $X$ such that $X^{-1} A X = \Lambda$ is diagonal, and we say that the matrix $A$ is diagonalizable.

Summarizing, we have the following result.

Proposition 2. A matrix $A$ is diagonalizable if and only if
\[
E_{\lambda_1} + E_{\lambda_2} + \cdots + E_{\lambda_n} = \mathbb{C}^n
\]
holds.

The following example shows that not all matrices can be diagonalized. Consider the $2 \times 2$ matrix
\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]
It has a repeated eigenvalue at zero, but only one linearly independent eigenvector. Thus it cannot be diagonalized. Matrices of this form have a special role in the decomposition we are about to introduce: define the $n \times n$ matrix $N$ by
\[
N = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & & 1 \\ 0 & & & 0 \end{bmatrix},
\]
where $N = 0$ if the dimension $n = 1$. Such matrices are called nilpotent because $N^n = 0$. Using these we define a matrix to be a Jordan block if it is of the form
\[
J = \lambda I + N = \begin{bmatrix} \lambda & 1 & & 0 \\ & \ddots & \ddots & \\ & & & 1 \\ 0 & & & \lambda \end{bmatrix}.
\]
Notice all scalars are $1 \times 1$ Jordan blocks. A Jordan block has one eigenvalue $\lambda$ of multiplicity $n$. However, it has only one linearly independent eigenvector. A key feature of a Jordan block is that it has precisely $n$ subspaces which are $J$-invariant. They are given by
\[
\mathbb{C}^k \times \{0\},
\]
for $1 \le k \le n$. When $k = 1$ this corresponds exactly to the subspace associated with its eigenvector. We can now state the Jordan decomposition theorem.

Theorem 3. Suppose $A \in \mathbb{C}^{n \times n}$. Then there exists a nonsingular matrix $T \in \mathbb{C}^{n \times n}$, and an integer $1 \le p \le n$, such that
\[
T^{-1} A T = J = \begin{bmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_p \end{bmatrix},
\]
where the matrices $J_k$ are Jordan blocks.

This theorem states that a matrix can be transformed to one that is block-diagonal, where each of the diagonal matrices is a Jordan block. Clearly if a matrix is diagonalizable each Jordan block $J_k$ will simply be a scalar equal to an eigenvalue of $A$. In general each block $J_k$ has a single eigenvalue of $A$ in all its diagonal entries; however, a given eigenvalue of $A$ may occur in several blocks.

The relevance of the Jordan decomposition is that it provides a canonical form to characterize similarity of matrices; namely, two matrices are similar if and only if they share the same Jordan form. Another related feature is that the Jordan form exhibits the structure of invariant subspaces of a given matrix. This is best seen by writing the above equation as
\[
AT = TJ.
\]
Now suppose we denote by $T_1$ the submatrix of $T$ formed by its first $n_1$ columns, where $n_1$ is the dimension of the block $J_1$. Then the first $n_1$ columns of the preceding equation give
\[
A T_1 = T_1 J_1,
\]
which implies that $S_1 = \mathrm{Im}\, T_1$ is invariant under $A$. Furthermore, we can use this formula to study the linear mapping on $S_1$ obtained by restriction of $A$. In fact we find that in the basis defined by the columns of $T_1$, this linear mapping has the associated matrix $J_1$; in particular, the only eigenvalue of $A$ restricted to $S_1$ is $\lambda_1$.

The preceding idea can be extended by selecting $T_1$ to contain the columns corresponding to more than one Jordan block. The resulting invariant subspace will be such that the restriction of $A$ to it has only the eigenvalues of the chosen blocks. Even more generally, we can pick any invariant subspace of $J$ and generate from it invariant subspaces of $A$. Indeed there are exactly $n_k$ invariant subspaces of $A$ associated with the $n_k \times n_k$ Jordan block $J_k$, and all invariant subspaces of $A$ can be constructed from this collection.

We will not explicitly require a constructive method for transforming a matrix to Jordan form, and will use this result solely for analysis.
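Although we will not need a constructive procedure, symbolic tools can compute the Jordan form exactly. The following SymPy sketch (assuming SymPy is available; the $4 \times 4$ matrix is an arbitrary illustration, not from the notes) returns $T$ and $J$ with $A = T J T^{-1}$.

```python
import sympy as sp

# The 2x2 example from the text: a single Jordan block at eigenvalue 0.
A = sp.Matrix([[0, 1],
               [0, 0]])
T, J = A.jordan_form()                 # returns T, J with A = T * J * T**-1
print(J)                               # Matrix([[0, 1], [0, 0]])

# A larger illustrative matrix (chosen arbitrarily).
B = sp.Matrix([[5, 4, 2, 1],
               [0, 1, -1, -1],
               [-1, -1, 3, 0],
               [1, 1, -1, 2]])
T, J = B.jordan_form()
assert sp.simplify(T * J * T.inv() - B) == sp.zeros(4, 4)
print(J)                               # block-diagonal, one Jordan block per chain
```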

2.2 Self-adjoint, unitary, and positive definite matrices

We have already introduced the adjoint $A^*$ of a complex matrix $A$; in this section we study in more detail the structure given to the space of matrices by this operation. A first observation, which will be used extensively below, is that
\[
(AB)^* = B^* A^*
\]
for matrices $A$ and $B$ of compatible dimensions; this follows directly by definition.

Another basic concept closely related to the adjoint is the Euclidean length of a vector $x \in \mathbb{C}^n$, defined by
\[
|x| = \sqrt{x^* x}.
\]
This extends the usual definition of magnitude of a complex number, so our notation will not cause any ambiguity. In particular,
\[
|x|^2 = x^* x = \sum_{i=1}^{n} |x_i|^2.
\]
Clearly $|x|$ is never negative, and is zero only when the vector $x = 0$. Later in the course we will discuss generalizations of this concept in more general vector spaces.

We have already encountered the notion of a Hermitian matrix, characterized by the self-adjoint property $Q^* = Q$. Recall the notation $\mathbb{H}^n$ for the real vector space of complex Hermitian matrices. We now collect some properties and introduce some new definitions, for later use. Everything we will state will apply as well to the set $\mathbb{S}^n$ of real, symmetric matrices.

Our first result about self-adjoint matrices is that their eigenvalues are always real. Suppose $Ax = \lambda x$ for nonzero $x$. Then we have
\[
\lambda\, x^* x = x^* A x = (A x)^* x = \bar{\lambda}\, x^* x.
\]
Since $x^* x > 0$ we conclude that $\lambda = \bar{\lambda}$.

We say that two vectors $x, y \in \mathbb{C}^n$ are orthogonal if
\[
y^* x = 0.
\]
Given a set of vectors $\{v_1, \ldots, v_k\}$ in $\mathbb{C}^n$ we say the vectors are orthonormal if
\[
v_i^* v_r = \begin{cases} 1, & \text{if } i = r; \\ 0, & \text{if } i \ne r. \end{cases}
\]
The vectors are orthonormal if each has unit length and is orthogonal to all the others. It is easy to show that orthonormal vectors are linearly independent, so such a set can have at most $n$ members. If $k < n$, it is always possible to find a vector $v_{k+1}$ such that $\{v_1, \ldots, v_{k+1}\}$ is an orthonormal set. To see this, form the $k \times n$ matrix
\[
V_k = \begin{bmatrix} v_1^* \\ \vdots \\ v_k^* \end{bmatrix}.
\]
The kernel of $V_k$ has the nonzero dimension $n - k$, and therefore any element of the kernel is orthogonal to the vectors $\{v_1, \ldots, v_k\}$. We conclude that any element of unit length in $\mathrm{Ker}\, V_k$ is a suitable candidate for $v_{k+1}$. Applying this procedure repeatedly we can generate an orthonormal basis $\{v_1, \ldots, v_n\}$ for $\mathbb{C}^n$.
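The completion step described above is easy to carry out numerically; a minimal sketch, assuming SciPy's null_space routine is available (the vectors are illustrative):

```python
import numpy as np
from scipy.linalg import null_space

# Two orthonormal vectors in R^4 (the complex case works the same way).
v1 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
v2 = np.array([0.0, 0.0, 1.0, 0.0])
Vk = np.vstack([v1, v2])                 # the k x n matrix whose rows are v_i*

# null_space returns an orthonormal basis of Ker Vk; any of its columns
# has unit length and is orthogonal to v1, ..., vk.
v3 = null_space(Vk)[:, 0]
assert np.allclose(Vk @ v3, 0.0)
assert np.isclose(np.linalg.norm(v3), 1.0)
```

In practice such completions are usually produced with a QR factorization, but the kernel-based construction mirrors the argument above.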

A square matrix $U \in \mathbb{C}^{n \times n}$ is called unitary if it satisfies
\[
U^* U = I.
\]
From this definition we see that the columns of any unitary matrix form an orthonormal basis for $\mathbb{C}^n$. Further, since $U$ is square it must be that $U^{-1} = U^*$, and therefore $U U^* = I$. So the columns of $U^*$ also form an orthonormal basis. A key property of unitary matrices is that if $y = Ux$, for some $x \in \mathbb{C}^n$, then the length of $y$ is equal to that of $x$:
\[
|y| = \sqrt{y^* y} = \sqrt{(Ux)^*(Ux)} = \sqrt{x^* U^* U x} = |x|.
\]
Unitary matrices are the only matrices that leave the length of every vector unchanged. We are now ready to state the spectral theorem for Hermitian matrices.

Theorem 4. Suppose $H$ is a matrix in $\mathbb{H}^n$. Then there exist a unitary matrix $U$ and a real diagonal matrix $\Lambda$ such that
\[
H = U \Lambda U^*.
\]
Notice that since $U^* = U^{-1}$ for a unitary $U$, the above expression is a similarity transformation. Therefore the theorem says that a self-adjoint matrix can be diagonalized by a unitary similarity transformation. Thus the columns of $U$ are all eigenvectors of $H$. Since the proof of this result assembles a number of concepts from this chapter we provide it below.

Proof. We will use an induction argument. Clearly the result is true if $H$ is simply a scalar, and it is therefore sufficient to show that if the result holds for matrices in $\mathbb{H}^{n-1}$ then it holds for $H \in \mathbb{H}^n$. We proceed with the assumption that the decomposition result holds for $(n-1) \times (n-1)$ Hermitian matrices.

The matrix $H$ has at least one eigenvalue $\lambda_1$, and $\lambda_1$ is real since $H$ is Hermitian. Let $x_1$ be an eigenvector associated with this eigenvalue, and without loss of generality we assume it to have length one. Define $X$ to be any unitary matrix with $x_1$ as its first column, namely,
\[
X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}.
\]
Now consider the product $X^* H X$. Its first column is given by $X^* H x_1 = \lambda_1 X^* x_1 = \lambda_1 e_1$, where $e_1$ is the first element of the canonical basis. Its first row is described by $x_1^* H X$, which is equal to $\lambda_1 x_1^* X = \lambda_1 e_1^*$, since $x_1^* H = \lambda_1 x_1^*$ because $H$ is self-adjoint. Thus we have
\[
X^* H X = \begin{bmatrix} \lambda_1 & 0 \\ 0 & H_2 \end{bmatrix},
\]
where $H_2$ is a Hermitian matrix in $\mathbb{H}^{n-1}$. By the inductive hypothesis there exists a unitary matrix $X_2$ in $\mathbb{C}^{(n-1) \times (n-1)}$ such that $H_2 = X_2 \Lambda_2 X_2^*$, where $\Lambda_2$ is both diagonal and real. We conclude that
\[
H = X \begin{bmatrix} I & 0 \\ 0 & X_2 \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix}
\begin{bmatrix} I & 0 \\ 0 & X_2^* \end{bmatrix} X^*.
\]
The right-hand side gives the desired decomposition.

We remark, additionally, that the eigenvalues of $H$ can be arranged in decreasing order in the diagonal of $\Lambda$. This follows directly from the above induction argument: just take $\lambda_1$ to be the largest eigenvalue.
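This decomposition is what numerical eigensolvers for Hermitian matrices return; a small NumPy check (illustrative matrix, not from the notes) of $H = U \Lambda U^*$ with unitary $U$ and real $\Lambda$:

```python
import numpy as np

H = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])       # H = H*, a Hermitian matrix

lam, U = np.linalg.eigh(H)               # real eigenvalues (ascending) and unitary U
assert np.allclose(U.conj().T @ U, np.eye(2))           # U is unitary
assert np.allclose(U @ np.diag(lam) @ U.conj().T, H)    # H = U Lambda U*
print(lam)                                              # both eigenvalues are real
```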

We now focus on the case where these eigenvalues have a definite sign. Given $Q \in \mathbb{H}^n$, we say it is positive definite, denoted $Q > 0$, if
\[
x^* Q x > 0
\]
for all nonzero $x \in \mathbb{C}^n$. Similarly $Q$ is positive semidefinite, denoted $Q \ge 0$, if the inequality is nonstrict; and negative definite and negative semidefinite are similarly defined. If a matrix is not positive or negative semidefinite, then it is indefinite.

The following properties of positive matrices follow directly from the definition, and are left as exercises:

- If $Q > 0$ and $A \in \mathbb{C}^{n \times m}$, then $A^* Q A \ge 0$. If $\mathrm{Ker}\, A = \{0\}$, then $A^* Q A > 0$.

- If $Q_1 > 0$, $Q_2 > 0$, then $\alpha_1 Q_1 + \alpha_2 Q_2 > 0$ whenever $\alpha_1 > 0$, $\alpha_2 \ge 0$. In particular, the set of positive definite matrices is a convex cone in $\mathbb{H}^n$, as defined in the previous section.

At this point we may well ask, how can we check whether a matrix is positive definite? The following answer is derived from Theorem 4:

If $Q \in \mathbb{H}^n$, then $Q > 0$ if and only if the eigenvalues of $Q$ are all positive.

Notice in particular that a positive definite matrix is always invertible, and its inverse is also positive definite. Also, a matrix is positive semidefinite exactly when none of its eigenvalues are negative; in that case the number of strictly positive eigenvalues is equal to the rank of the matrix.

An additional useful property of positive matrices is the existence of a square root. Let $Q = U \Lambda U^* \ge 0$; in other words the diagonal elements of $\Lambda$ are non-negative. Then we can define $\Lambda^{\frac12}$ to be the diagonal matrix with diagonal elements $\lambda_k^{\frac12}$, and set
\[
Q^{\frac12} := U \Lambda^{\frac12} U^*.
\]
Then $Q^{\frac12} \ge 0$ (also $Q^{\frac12} > 0$ when $Q > 0$) and it is easily verified that $Q^{\frac12} Q^{\frac12} = Q$.
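A brief numerical illustration of the eigenvalue test and the square root (assuming NumPy; the matrix is arbitrary):

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])                        # symmetric, hence Hermitian

lam, U = np.linalg.eigh(Q)
assert np.all(lam > 0)                            # all eigenvalues positive, so Q > 0

# Square root from the spectral decomposition: Q^(1/2) = U diag(sqrt(lambda_k)) U*.
Q_half = U @ np.diag(np.sqrt(lam)) @ U.conj().T
assert np.allclose(Q_half @ Q_half, Q)

# A Cholesky factorization exists exactly when Q > 0 (np.linalg.cholesky
# raises LinAlgError otherwise), giving another practical test.
L = np.linalg.cholesky(Q)
assert np.allclose(L @ L.conj().T, Q)
```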

Having defined a notion of positivity, our next aim is to generalize the idea of ordering to matrices: namely, what does it mean for a matrix to be larger than another matrix? We write
\[
Q > S
\]
for matrices $Q, S \in \mathbb{H}^n$ to denote that $Q - S > 0$. We refer to such expressions generally as matrix inequalities. Note that it may be that neither $Q \ge S$ nor $Q \le S$ holds; that is, not all matrices are comparable.

We conclude our discussion by establishing a very useful result, known as the Schur complement formula.

Theorem 5. Suppose that $Q$, $M$, and $R$ are matrices and that $M$ and $Q$ are self-adjoint. Then the following are equivalent:

(a) The matrix inequalities $Q > 0$ and
\[
M - R Q^{-1} R^* > 0 \quad \text{both hold};
\]

(b) the matrix inequality
\[
\begin{bmatrix} M & R \\ R^* & Q \end{bmatrix} > 0 \quad \text{is satisfied}.
\]

Proof. The two inequalities listed in (a) are equivalent to the single block inequality
\[
\begin{bmatrix} M - R Q^{-1} R^* & 0 \\ 0 & Q \end{bmatrix} > 0.
\]
Now left- and right-multiply this inequality by the nonsingular matrix
\[
\begin{bmatrix} I & R Q^{-1} \\ 0 & I \end{bmatrix}
\]
and its adjoint, respectively, to get
\[
\begin{bmatrix} M & R \\ R^* & Q \end{bmatrix}
= \begin{bmatrix} I & R Q^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} M - R Q^{-1} R^* & 0 \\ 0 & Q \end{bmatrix}
\begin{bmatrix} I & 0 \\ Q^{-1} R^* & I \end{bmatrix}
> 0.
\]
Therefore inequality (b) holds if and only if (a) holds.
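A numerical spot-check of this equivalence (illustrative matrices, assuming NumPy) can be done by testing both conditions with an eigenvalue-based positive definiteness test:

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
Q = np.array([[2.0, 0.0],
              [0.0, 1.0]])
R = np.array([[1.0, 0.5],
              [0.0, 1.0]])

def is_pos_def(X):
    """Positive definiteness test via eigenvalues (X assumed Hermitian)."""
    return bool(np.all(np.linalg.eigvalsh(X) > 0))

# (a): Q > 0 and the Schur complement M - R Q^{-1} R* > 0.
schur = M - R @ np.linalg.solve(Q, R.T)
cond_a = is_pos_def(Q) and is_pos_def(schur)

# (b): the block matrix [[M, R], [R*, Q]] > 0.
cond_b = is_pos_def(np.block([[M, R], [R.T, Q]]))

assert cond_a == cond_b                 # the two conditions agree, as Theorem 5 states
print(cond_a, cond_b)                   # True True for this choice of M, Q, R
```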

We remark that an identical result holds in the negative definite case, replacing all ">" by "<".

Having assembled some facts about self-adjoint matrices, we move on to our final matrix theory topic.

2.3 Singular value decomposition

Here we introduce the singular value decomposition of a rectangular matrix, which will have many applications in our analysis, and is of very significant computational value. The term singular value decomposition, or SVD, refers to the product $U \Sigma V^*$ in the statement of the theorem below.

Theorem 6. Suppose $A \in \mathbb{C}^{m \times n}$ and that $p = \min\{m, n\}$. Then there exist unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that
\[
A = U \Sigma V^*,
\]
where $\Sigma \in \mathbb{R}^{m \times n}$ and its scalar entries $\sigma_{ir}$ satisfy

(a) the condition $\sigma_{ir} = 0$, for $i \ne r$;

(b) the ordering $\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0$.

Proof. Since the result holds for $A$ if and only if it holds for $A^*$, we assume without loss of generality that $n \le m$. To start, let $r$ be the rank of $A^* A$, which is Hermitian, and therefore by Theorem 4 we have
\[
A^* A = V \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} V^*,
\quad \text{where } \Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_r \end{bmatrix} > 0
\text{ and } V \text{ is unitary}.
\]
We also assume that the nonstrict ordering $\lambda_1 \ge \cdots \ge \lambda_r$ holds. Now define
\[
J = \begin{bmatrix} \Lambda^{\frac12} & 0 \\ 0 & I \end{bmatrix}
\]
and we have
\[
J^{-1} V^* A^* A V J^{-1} = (A V J^{-1})^* (A V J^{-1}) = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix},
\]
where $I_r$ denotes the $r \times r$ identity matrix. From the right-hand side we see that the first $r$ columns of $A V J^{-1}$ form an orthonormal set, and the remaining columns must be zero. Thus
\[
A V J^{-1} = \begin{bmatrix} U_1 & 0 \end{bmatrix},
\]
where $U_1 \in \mathbb{C}^{m \times r}$. This leads to
\[
A = \begin{bmatrix} U_1 & 0 \end{bmatrix}
\begin{bmatrix} \Lambda^{\frac12} & 0 \\ 0 & I \end{bmatrix} V^*
= \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Lambda^{\frac12} & 0 \\ 0 & 0 \end{bmatrix} V^*,
\]
where the right-hand side is valid for any $U_2 \in \mathbb{C}^{m \times (m-r)}$. So choose $U_2$ such that $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$ is unitary.

When $n = m$ the matrix $\Sigma$ in the SVD is diagonal. When these dimensions are not equal $\Sigma$ has the form of either
\[
\begin{bmatrix} \sigma_{11} & & 0 \\ & \ddots & \\ 0 & & \sigma_{nn} \\ 0 & \cdots & 0 \end{bmatrix}
\ \text{when } n < m, \quad \text{or} \quad
\begin{bmatrix} \sigma_{11} & & 0 & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ 0 & & \sigma_{mm} & 0 & \cdots & 0 \end{bmatrix}
\ \text{when } n > m.
\]

The first $p$ non-negative scalars $\sigma_{kk}$ are called the singular values of the matrix $A$, and are denoted by the ordered set $\sigma_1, \ldots, \sigma_p$, where $\sigma_k = \sigma_{kk}$. As we already saw in the proof, the decomposition of the theorem immediately gives us that
\[
A^* A = V (\Sigma^* \Sigma) V^* \quad \text{and} \quad A A^* = U (\Sigma \Sigma^*) U^*,
\]
which are singular value decompositions of $A^* A$ and $A A^*$. But since $V^* = V^{-1}$ and $U^* = U^{-1}$ it follows that these are also the diagonalizations of the matrices. Thus
\[
\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_p^2 \ge 0
\]
are exactly the $p$ largest eigenvalues of $A^* A$ and $A A^*$; the remaining eigenvalues of either matrix are all necessarily equal to zero. This observation provides a straightforward method to obtain the singular value decomposition of any matrix $A$, by diagonalizing the Hermitian matrices $A^* A$ and $A A^*$.

The SVD of a matrix has many useful properties. We use $\bar{\sigma}(A)$ to denote the largest singular value $\sigma_1$, which from the SVD has the following property:
\[
\bar{\sigma}(A) = \max\{ |Av| : v \in \mathbb{C}^n \text{ and } |v| = 1 \}.
\]
Namely, it gives the maximum magnification of length a vector $v$ can experience when acted upon by $A$.

Finally, partition $U = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$, and suppose that $A$ has $r$ nonzero singular values. Then
\[
\mathrm{Im}\, A = \mathrm{Im}\begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}
\quad \text{and} \quad
\mathrm{Ker}\, A = \mathrm{Im}\begin{bmatrix} v_{r+1} & \cdots & v_n \end{bmatrix}.
\]
That is, the SVD provides an orthonormal basis for both the image and kernel of $A$. Furthermore, notice that the rank of $A$ is equal to $r$, precisely the number of nonzero singular values.

Given its ubiquitous presence in analytical sub jects, intro ductory linear algebra

is the sub ject of many excellent books; one choice is [5]. For an advanced

treatment from a geometric p ersp ective the reader is referred to [2].

Two excellent sources for matrix theory are [3] and the companion work [4].

For information and algorithms for computing with matrices see [1]. 21

References

[1] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.

[2] W.H. Greub. Linear Algebra. Springer, 1981.

[3] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1991.

[4] R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1995.

[5] G. Strang. Linear Algebra and its Applications. Academic Press, 1980.