
Down with Determinants!

Sheldon Axler

21 December 1994

1. Introduction

Ask anyone why a square matrix of complex numbers has an eigenvalue, and you'll probably get the wrong answer, which goes something like this: The characteristic polynomial of the matrix, which is defined via determinants, has a root (by the fundamental theorem of algebra); this root is an eigenvalue of the matrix.

What's wrong with that answer? It depends upon determinants, that's what. Determinants are difficult, non-intuitive, and often defined without motivation. As we'll see, there is a better proof, one that is simpler, clearer, provides more insight, and avoids determinants.

This paper will show how linear algebra can be done better without determinants. Without using determinants, we will define the multiplicity of an eigenvalue and prove that the number of eigenvalues, counting multiplicities, equals the dimension of the underlying space. Without determinants, we'll define the characteristic and minimal polynomials and then prove that they behave as expected. Next, we will easily prove that every matrix is similar to a nice upper-triangular one. Turning to inner product spaces, and still without mentioning determinants, we'll have a simple proof of the finite-dimensional Spectral Theorem.

Determinants are needed in one place in the undergraduate mathematics curriculum: the change of variables formula for multi-variable integrals. Thus at the end of this paper we'll revive determinants, but not with any of the usual abstruse definitions. We'll define the determinant of a matrix to be the product of its eigenvalues (counting multiplicities). This easy-to-remember definition leads to the usual formulas for computing determinants. We'll derive the change of variables formula for multi-variable integrals in a fashion that makes the appearance of the determinant there seem natural.

This work was partially supported by the National Science Foundation. Many people made comments that helped improve this paper. I especially thank Marilyn Brouwer, William Brown, Jonathan Hall, Paul Halmos, Richard Hill, Ben Lotto, and Wade Ramey.

 


A few friends who use determinants in their research have expressed unease at the title of this paper. I know that determinants play an honorable role in some areas of research, and I do not mean to belittle their importance when they are indispensable. But most mathematicians and most students of mathematics will have a clearer understanding of linear algebra if they use the determinant-free approach to the basic structure theorems.

The theorems in this paper are not new; they will already be familiar to most readers. Some of the proofs and definitions are new, although many parts of this approach have been around in bits and pieces, but without the attention they deserved. For example, at a recent annual meeting of the AMS and MAA, I looked through every linear algebra text on display. Out of over fifty linear algebra texts offered for sale, only one obscure book gave a determinant-free proof that eigenvalues exist, and that book did not manage to develop other key parts of linear algebra without determinants. The anti-determinant philosophy advocated in this paper is an attempt to counter the undeserved dominance of determinant-dependent methods.

This paper focuses on showing that determinants should be banished from much of the theoretical part of linear algebra. Determinants are also useless in the computational part of linear algebra. For example, Cramer's rule for solving systems of linear equations is already worthless for $10 \times 10$ systems, not to mention the much larger systems often encountered in the real world. Many computer programs efficiently calculate eigenvalues numerically; none of them uses determinants. To emphasize the point, let me quote a numerical analyst. Henry Thacher, in a review (SIAM News, September 1988) of the Turbo Pascal Numerical Methods Toolbox, writes,

I find it hard to conceive of a situation in which the numerical value of a determinant is needed: Cramer's rule, because of its inefficiency, is completely impractical, while the magnitude of the determinant is an indication of neither the condition of the matrix nor the accuracy of the solution.

2. Eigenvalues and Eigenvectors

The basic objects of study in linear algebra can be thought of as either linear transformations or matrices. Because a basis-free approach seems more natural, this paper will mostly use the language of linear transformations; readers who prefer the language of matrices should have no trouble making the appropriate translation. The term linear operator will mean a linear transformation from a vector space to itself; thus a linear operator corresponds to a square matrix (assuming some choice of basis).

Notation used throughout the paper: $n$ denotes a positive integer, $V$ denotes an $n$-dimensional complex vector space, $T$ denotes a linear operator on $V$, and $I$ denotes the identity operator.

A complex number $\lambda$ is called an eigenvalue of $T$ if $T - \lambda I$ is not injective. Here is the central result about eigenvalues, with a simple proof that avoids determinants.

 


Theorem 2.1. Every linear operator on a finite-dimensional complex vector space has an eigenvalue.

Proof. To show that $T$ (our linear operator on $V$) has an eigenvalue, fix any non-zero vector $v \in V$. The vectors $v, Tv, T^2v, \ldots, T^nv$ cannot be linearly independent, because $V$ has dimension $n$ and we have $n+1$ vectors. Thus there exist complex numbers $a_0, \ldots, a_n$, not all 0, such that

$$a_0 v + a_1 Tv + \cdots + a_n T^n v = 0.$$

Make the $a$'s the coefficients of a polynomial, which can be written in factored form as

$$a_0 + a_1 z + \cdots + a_n z^n = c(z - r_1) \cdots (z - r_m),$$

where $c$ is a non-zero complex number, each $r_j$ is complex, and the equation holds for all complex $z$. We then have

$$0 = (a_0 I + a_1 T + \cdots + a_n T^n)v = c(T - r_1 I) \cdots (T - r_m I)v,$$

which means that $T - r_j I$ is not injective for at least one $j$. In other words, $T$ has an eigenvalue.
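
To see the proof's method at work in a small case, take the operator $T$ on $\mathbf{C}^2$ with matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $v = (1,0)$. Then $Tv = (0,1)$ and $T^2v = (1,0) = v$, so $v - T^2v = 0$. The corresponding polynomial factors as $1 - z^2 = -(z-1)(z+1)$, and hence $0 = -(T - I)(T + I)v$. Thus $T - I$ or $T + I$ fails to be injective; indeed both $1$ and $-1$ are eigenvalues here, with eigenvectors $(1,1)$ and $(1,-1)$.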

Recall that a vector $v \in V$ is called an eigenvector of $T$ if $Tv = \lambda v$ for some eigenvalue $\lambda$. The next proposition, which has a simple, determinant-free proof, obviously implies that the number of distinct eigenvalues of $T$ cannot exceed the dimension of $V$.

Proposition 2.2. Non-zero eigenvectors corresponding to distinct eigenvalues of $T$ are linearly independent.

Proof. Suppose that $v_1, \ldots, v_m$ are non-zero eigenvectors of $T$ corresponding to distinct eigenvalues $\lambda_1, \ldots, \lambda_m$. We need to prove that $v_1, \ldots, v_m$ are linearly independent. To do this, suppose $a_1, \ldots, a_m$ are complex numbers such that

$$a_1 v_1 + \cdots + a_m v_m = 0.$$

Apply the linear operator $(T - \lambda_2 I)(T - \lambda_3 I) \cdots (T - \lambda_m I)$ to both sides of the equation above, getting

$$a_1 (\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3) \cdots (\lambda_1 - \lambda_m) v_1 = 0.$$

Thus $a_1 = 0$. In a similar fashion, $a_j = 0$ for each $j$, as desired.
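
In the simplest case the mechanism is transparent: for the operator on $\mathbf{C}^2$ with matrix $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and eigenvectors $v_1 = e_1$, $v_2 = e_2$, applying $T - 2I$ to $a_1 e_1 + a_2 e_2 = 0$ annihilates the $e_2$ term and leaves $a_1(1-2)e_1 = 0$, forcing $a_1 = 0$.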

 


3. Generalized Eigenvectors

Unfortunately, the eigenvectors of $T$ need not span $V$. For example, the linear operator on $\mathbf{C}^2$ whose matrix is

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

has only one eigenvalue, namely 0, and its eigenvectors form a one-dimensional subspace of $\mathbf{C}^2$. We will see, however, that the generalized eigenvectors (defined below) of $T$ always span $V$.

A vector $v \in V$ is called a generalized eigenvector of $T$ if

$$(T - \lambda I)^k v = 0$$

for some eigenvalue $\lambda$ of $T$ and some positive integer $k$. Obviously, the set of generalized eigenvectors of $T$ corresponding to an eigenvalue $\lambda$ is a subspace of $V$. The following lemma shows that in the definition of generalized eigenvector, instead of allowing an arbitrary power of $T - \lambda I$ to annihilate $v$, we could have restricted attention to the $n^{\text{th}}$ power, where $n$ equals the dimension of $V$. As usual, $\ker$ is an abbreviation for kernel (the set of vectors that get mapped to 0).

Lemma 3.1. The set of generalized eigenvectors of $T$ corresponding to an eigenvalue $\lambda$ equals $\ker(T - \lambda I)^n$.

Proof. Obviously, every element of $\ker(T - \lambda I)^n$ is a generalized eigenvector of $T$ corresponding to $\lambda$. To prove the inclusion in the other direction, let $v$ be a generalized eigenvector of $T$ corresponding to $\lambda$. We need to prove that $(T - \lambda I)^n v = 0$. Clearly, we can assume that $v \neq 0$, so there is a smallest non-negative integer $k$ such that $(T - \lambda I)^k v = 0$. We will be done if we show that $k \leq n$. This will be proved by showing that

$$v,\ (T - \lambda I)v,\ (T - \lambda I)^2 v,\ \ldots,\ (T - \lambda I)^{k-1} v \tag{3.2}$$

are linearly independent vectors; we will then have $k$ linearly independent elements in an $n$-dimensional space, which implies that $k \leq n$.

To prove the vectors in (3.2) are linearly independent, suppose $a_0, \ldots, a_{k-1}$ are complex numbers such that

$$a_0 v + a_1 (T - \lambda I)v + \cdots + a_{k-1} (T - \lambda I)^{k-1} v = 0. \tag{3.3}$$

Apply $(T - \lambda I)^{k-1}$ to both sides of the equation above, getting $a_0 (T - \lambda I)^{k-1} v = 0$, which implies that $a_0 = 0$. Now apply $(T - \lambda I)^{k-2}$ to both sides of (3.3), getting $a_1 (T - \lambda I)^{k-1} v = 0$, which implies that $a_1 = 0$. Continuing in this fashion, we see that $a_j = 0$ for each $j$, as desired.
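
For example, for the operator on $\mathbf{C}^2$ with matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ considered above (so $\lambda = 0$ and $n = 2$), we have $\ker T = \{(z, 0) : z \in \mathbf{C}\}$ while $\ker T^2 = \mathbf{C}^2$: every vector of $\mathbf{C}^2$ is a generalized eigenvector corresponding to 0, even though the eigenvectors form only a one-dimensional subspace.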

The next result is the key tool we'll use to give a description of the structure of a linear operator.

 


Proposition 3.4. The generalized eigenvectors of $T$ span $V$.

Proof. The proof will be by induction on $n$, the dimension of $V$. Obviously, the result holds when $n = 1$.

Suppose that $n > 1$ and that the result holds for all vector spaces of dimension less than $n$. Let $\lambda$ be any eigenvalue of $T$ (one exists by Theorem 2.1). We first show that

$$V = \underbrace{\ker(T - \lambda I)^n}_{V_1} \oplus \underbrace{\operatorname{ran}(T - \lambda I)^n}_{V_2}; \tag{3.5}$$

here, as usual, $\operatorname{ran}$ is an abbreviation for range. To prove (3.5), suppose $v \in V_1 \cap V_2$. Then $(T - \lambda I)^n v = 0$ and there exists $u \in V$ such that $(T - \lambda I)^n u = v$. Applying $(T - \lambda I)^n$ to both sides of the last equation, we have $(T - \lambda I)^{2n} u = 0$. This implies that $(T - \lambda I)^n u = 0$ (by Lemma 3.1), which implies that $v = 0$. Thus

$$V_1 \cap V_2 = \{0\}. \tag{3.6}$$

Because $V_1$ and $V_2$ are the kernel and range of a linear operator on $V$, we have

$$\dim V = \dim V_1 + \dim V_2. \tag{3.7}$$

Equations (3.6) and (3.7) imply (3.5).

Note that $V_1 \neq \{0\}$ (because $\lambda$ is an eigenvalue of $T$), and thus $\dim V_2 < n$. Furthermore, because $T$ commutes with $(T - \lambda I)^n$, we easily see that $T$ maps $V_2$ into $V_2$. By our induction hypothesis, $V_2$ is spanned by the generalized eigenvectors of $T|_{V_2}$, each of which is obviously also a generalized eigenvector of $T$. Everything in $V_1$ is a generalized eigenvector of $T$, and hence (3.5) gives the desired result.

A nice corollary of the last proposition is that if 0 is the only eigenvalue of $T$, then $T$ is nilpotent (recall that an operator is called nilpotent if some power of it equals 0). Proof: If 0 is the only eigenvalue of $T$, then every vector in $V$ is a generalized eigenvector of $T$ corresponding to the eigenvalue 0 (by Proposition 3.4); Lemma 3.1 then implies that $T^n = 0$.

Non-zero eigenvectors corresponding to distinct eigenvalues are linearly independent (Proposition 2.2). We need an analogous result with generalized eigenvectors replacing eigenvectors. This can be proved by following the basic pattern of the proof of Proposition 2.2, as we now do.

Proposition 3.8. Non-zero generalized eigenvectors corresponding to distinct eigenvalues of $T$ are linearly independent.

Proof. Suppose that $v_1, \ldots, v_m$ are non-zero generalized eigenvectors of $T$ corresponding to distinct eigenvalues $\lambda_1, \ldots, \lambda_m$. We need to prove that $v_1, \ldots, v_m$ are linearly independent. To do this, suppose $a_1, \ldots, a_m$ are complex numbers such that

$$a_1 v_1 + \cdots + a_m v_m = 0. \tag{3.9}$$

 


Let $k$ be the smallest positive integer such that $(T - \lambda_1 I)^k v_1 = 0$. Apply the linear operator

$$(T - \lambda_1 I)^{k-1} (T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n$$

to both sides of (3.9), getting

$$a_1 (T - \lambda_1 I)^{k-1} (T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n v_1 = 0, \tag{3.10}$$

where we have used Lemma 3.1. If we rewrite $(T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n$ in (3.10) as

$$\bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_2)I\bigr)^n \cdots \bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_m)I\bigr)^n,$$

expand each $\bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_j)I\bigr)^n$ using the binomial theorem and then multiply everything together, we get a sum of terms. Except for the term

$$(\lambda_1 - \lambda_2)^n \cdots (\lambda_1 - \lambda_m)^n I,$$

each term in this sum includes a power of $(T - \lambda_1 I)$, which when combined with the $(T - \lambda_1 I)^{k-1}$ on the left and the $v_1$ on the right in (3.10) gives 0. Hence (3.10) becomes the equation

$$a_1 (\lambda_1 - \lambda_2)^n \cdots (\lambda_1 - \lambda_m)^n (T - \lambda_1 I)^{k-1} v_1 = 0.$$

Thus $a_1 = 0$. In a similar fashion, $a_j = 0$ for each $j$, as desired.

Now we can pull everything together into the following structure theorem. Part (b) allows us to interpret each linear transformation appearing in parts (c) and (d) as a linear operator from $U_j$ to itself.

Theorem 3.11. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$, with $U_1, \ldots, U_m$ denoting the corresponding sets of generalized eigenvectors. Then

(a) $V = U_1 \oplus \cdots \oplus U_m$;

(b) $T$ maps each $U_j$ into itself;

(c) each $(T - \lambda_j I)|_{U_j}$ is nilpotent;

(d) each $T|_{U_j}$ has only one eigenvalue, namely $\lambda_j$.

Proof. The proof of (a) follows immediately from Propositions 3.8 and 3.4.

To prove (b), suppose $v \in U_j$. Then $(T - \lambda_j I)^k v = 0$ for some positive integer $k$. We have

$$(T - \lambda_j I)^k Tv = T(T - \lambda_j I)^k v = T(0) = 0.$$

Thus $Tv \in U_j$, as desired.

The proof of (c) follows immediately from the definition of a generalized eigenvector and Lemma 3.1.

To prove (d), let $\lambda'$ be an eigenvalue of $T|_{U_j}$ with corresponding non-zero eigenvector $v \in U_j$. Then $(T - \lambda_j I)v = (\lambda' - \lambda_j)v$, and hence

$$(T - \lambda_j I)^k v = (\lambda' - \lambda_j)^k v$$

for each positive integer $k$. Because $v$ is a generalized eigenvector of $T$ corresponding to $\lambda_j$, the left-hand side of this equation is 0 for some $k$. Thus $\lambda' = \lambda_j$, as desired.
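
For a concrete illustration of this decomposition, consider the operator $T$ on $\mathbf{C}^3$ with matrix

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$

Its distinct eigenvalues are 1 and 2. Here $U_1 = \ker(T - I)^3$ is the span of $e_1$ and $e_2$, and $U_2 = \ker(T - 2I)^3$ is the span of $e_3$, so $\mathbf{C}^3 = U_1 \oplus U_2$. The operator $T$ maps each $U_j$ into itself; $(T - I)|_{U_1}$ has matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and is nilpotent; and $T|_{U_2}$ is multiplication by 2.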

 


4. The Minimal Polynomial

Because the space of linear operators on $V$ is finite dimensional, there is a smallest positive integer $k$ such that

$$I,\ T,\ T^2,\ \ldots,\ T^k$$

are not linearly independent. Thus there exist unique complex numbers $a_0, \ldots, a_{k-1}$ such that

$$a_0 I + a_1 T + a_2 T^2 + \cdots + a_{k-1} T^{k-1} + T^k = 0.$$

The polynomial

$$a_0 + a_1 z + a_2 z^2 + \cdots + a_{k-1} z^{k-1} + z^k$$

is called the minimal polynomial of $T$. It is the monic polynomial $p$ of smallest degree such that $p(T) = 0$ (a monic polynomial is one whose term of highest degree has coefficient 1).

The next theorem connects the minimal polynomial to the decomposition of $V$ as a direct sum of generalized eigenvectors.

Theorem 4.1. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$, let $U_j$ denote the set of generalized eigenvectors corresponding to $\lambda_j$, and let $\alpha_j$ be the smallest positive integer such that $(T - \lambda_j I)^{\alpha_j} v = 0$ for every $v \in U_j$. Let

$$p(z) = (z - \lambda_1)^{\alpha_1} \cdots (z - \lambda_m)^{\alpha_m}. \tag{4.2}$$

Then

(a) $p$ is the minimal polynomial of $T$;

(b) $p$ has degree at most $\dim V$;

(c) if $q$ is a polynomial such that $q(T) = 0$, then $q$ is a polynomial multiple of $p$.

Proof. We will prove first (b), then (c), then (a).

To prove (b), note that each $\alpha_j$ is at most the dimension of $U_j$ (by Lemma 3.1 applied to $T|_{U_j}$). Because $V = U_1 \oplus \cdots \oplus U_m$ (by Theorem 3.11(a)), the $\alpha_j$'s can add up to at most the dimension of $V$. Thus (b) holds.

To prove (c), suppose $q$ is a polynomial such that $q(T) = 0$. If we show that $q$ is a polynomial multiple of each $(z - \lambda_j)^{\alpha_j}$, then (c) will hold. To do this, fix $j$. The polynomial $q$ has the form

$$q(z) = c(z - r_1)^{\beta_1} \cdots (z - r_M)^{\beta_M} (z - \lambda_j)^{\beta},$$

where $c \in \mathbf{C}$, the $r_k$'s are complex numbers all different from $\lambda_j$, the $\beta_k$'s are positive integers, and $\beta$ is a non-negative integer. If $c = 0$, we are done, so assume that $c \neq 0$.

Suppose $v \in U_j$. Then $(T - \lambda_j I)^{\beta} v$ is also in $U_j$ (by Theorem 3.11(b)). Now

$$c(T - r_1 I)^{\beta_1} \cdots (T - r_M I)^{\beta_M} (T - \lambda_j I)^{\beta} v = q(T)v = 0,$$

and $(T - r_1 I)^{\beta_1} \cdots (T - r_M I)^{\beta_M}$ is injective on $U_j$ (by Theorem 3.11(d)). Thus $(T - \lambda_j I)^{\beta} v = 0$. Because $v$ was an arbitrary element of $U_j$, this implies that $\beta \geq \alpha_j$. Thus $q$ is a polynomial multiple of $(z - \lambda_j)^{\alpha_j}$, and (c) holds.

To prove (a), suppose $v$ is a vector in some $U_j$.

 


If we commute the terms of $(T - \lambda_1 I)^{\alpha_1} \cdots (T - \lambda_m I)^{\alpha_m}$ (which equals $p(T)$) so that $(T - \lambda_j I)^{\alpha_j}$ is on the right, we see that $p(T)v = 0$. Because $U_1, \ldots, U_m$ span $V$ (Theorem 3.11(a)), we conclude that $p(T) = 0$. In other words, $p$ is a monic polynomial that annihilates $T$. We know from (c) that no monic polynomial of lower degree has this property. Thus $p$ must be the minimal polynomial of $T$, completing the proof.

Note that by avoiding determinants we have been naturally led to the description of the minimal polynomial in terms of generalized eigenvectors.
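
For the operator on $\mathbf{C}^3$ with matrix $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$ discussed after Theorem 3.11, we have $\alpha_1 = 2$ (because $(T - I)|_{U_1}$ is non-zero but its square is 0) and $\alpha_2 = 1$, so the minimal polynomial is $p(z) = (z-1)^2(z-2)$, of degree $3 = \dim V$, consistent with (b).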

5. Multiplicity and the Characteristic Polynomial

The multiplicity of an eigenvalue $\lambda$ of $T$ is defined to be the dimension of the set of generalized eigenvectors of $T$ corresponding to $\lambda$. We see immediately that the sum of the multiplicities of all eigenvalues of $T$ equals $n$, the dimension of $V$ (from Theorem 3.11(a)). Note that the definition of multiplicity given here has a clear connection with the geometric behavior of $T$, whereas the usual definition (as the multiplicity of a root of the polynomial $\det(zI - T)$) describes an object without obvious meaning.

Let $\lambda_1, \ldots, \lambda_m$ denote the distinct eigenvalues of $T$, with corresponding multiplicities $\beta_1, \ldots, \beta_m$. The polynomial

$$(z - \lambda_1)^{\beta_1} \cdots (z - \lambda_m)^{\beta_m} \tag{5.1}$$

is called the characteristic polynomial of $T$. Clearly, it is a polynomial of degree $n$. Of course the usual definition of the characteristic polynomial involves a determinant; the characteristic polynomial is then used to prove the existence of eigenvalues. Without mentioning determinants, we have reversed that procedure. We first showed that $T$ has $n$ eigenvalues, counting multiplicities, and then used that to give a more natural definition of the characteristic polynomial ("counting multiplicities" means that each eigenvalue is repeated as many times as its multiplicity).
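
For example, the operator on $\mathbf{C}^3$ with matrix $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$ has multiplicities $\dim U_1 = 2$ and $\dim U_2 = 1$, so its characteristic polynomial is $(z-1)^2(z-2)$, which here coincides with its minimal polynomial. The two polynomials can differ: the diagonal operator with matrix $\operatorname{diag}(1,1,2)$ has the same characteristic polynomial $(z-1)^2(z-2)$, but its minimal polynomial is $(z-1)(z-2)$.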

The next result is called the Cayley-Hamilton Theorem. With the approach taken here, its proof is easy.

Theorem 5.2. Let $q$ denote the characteristic polynomial of $T$. Then $q(T) = 0$.

Proof. Let $U_j$ and $\alpha_j$ be as in Theorem 4.1, and let $\beta_j$ equal the dimension of $U_j$. As we noted earlier, $\alpha_j \leq \beta_j$ (by Lemma 3.1 applied to $T|_{U_j}$). Hence the characteristic polynomial (5.1) is a polynomial multiple of the minimal polynomial (4.2). Thus the characteristic polynomial must annihilate $T$.

6. Upper-Triangular Form

A square matrix is called upper-triangular if all the entries below the main diagonal are 0. Our next goal is to show that each linear operator has an upper-triangular matrix for some choice of basis. We'll begin with nilpotent operators; our main structure theorem will then easily give the result for arbitrary linear operators.

Lemma 6.1. Suppose $T$ is nilpotent. Then there is a basis of $V$ with respect to which the matrix of $T$ contains only 0's on and below the main diagonal.

 


Proof. First choose a basis of $\ker T$. Then extend this to a basis of $\ker T^2$. Then extend to a basis of $\ker T^3$. Continue in this fashion, eventually getting a basis of $V$. The matrix of $T$ with respect to this basis clearly has the desired form: $T$ maps each basis vector chosen from $\ker T^{j+1}$ into $\ker T^j$, which is spanned by the earlier basis vectors.
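
For instance, let $T$ be the nilpotent operator on $\mathbf{C}^3$ defined by $Te_1 = 0$, $Te_2 = e_1$, $Te_3 = e_2$. Then $\ker T = \operatorname{span}\{e_1\}$, $\ker T^2 = \operatorname{span}\{e_1, e_2\}$, and $\ker T^3 = \mathbf{C}^3$, so the procedure above produces the basis $\{e_1, e_2, e_3\}$, with respect to which the matrix of $T$ has only 0's on and below the main diagonal.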

By avoiding determinants and focusing on generalized eigenvectors, we can now give a simple proof that every linear operator can be put in upper-triangular form. We actually get a better result, because the matrix in the next theorem has many more 0's than required for upper-triangular form.

Theorem 6.2. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$. Then there is a basis of $V$ with respect to which the matrix of $T$ has the block diagonal form

$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix}, \qquad A_j = \begin{pmatrix} \lambda_j & & * \\ & \ddots & \\ 0 & & \lambda_j \end{pmatrix},$$

where each block $A_j$ is upper-triangular with $\lambda_j$ repeated along its main diagonal.

Proof. This follows immediately from Theorem 3.11 and Lemma 6.1.

For many traditional uses of the Jordan form, the theorem above can be used instead. If Jordan form really is needed, then many standard proofs show (without determinants) that every nilpotent operator can be put in Jordan form. The result for general linear operators then follows from Theorem 3.11.

7. The Spectral Theorem

In this section we assume that $\langle \cdot, \cdot \rangle$ is an inner product on $V$. The nicest linear operators on $V$ are those for which there is an orthonormal basis of $V$ consisting of eigenvectors. With respect to any such basis, the matrix of the linear operator is diagonal, meaning that it is 0 everywhere except along the main diagonal, which must contain the eigenvalues. The Spectral Theorem, which we'll prove in this section, describes precisely those linear operators for which there is an orthonormal basis of $V$ consisting of eigenvectors.

Recall that the adjoint of $T$ is the unique linear operator $T^*$ on $V$ such that

$$\langle Tu, v \rangle = \langle u, T^* v \rangle$$

for all $u, v \in V$. The linear operator $T$ is called normal if $T$ commutes with its adjoint; in other words, $T$ is normal if $TT^* = T^* T$.

 




The linear operator $T$ is called self-adjoint if $T^* = T$. Obviously, every self-adjoint operator is normal. We'll see that the normal operators are precisely the ones that can be diagonalized by an orthonormal basis. Before proving that, we need a few preliminary results. Note that the next lemma is trivial if $T$ is self-adjoint.



Lemma 7.1 If T is normal, then ker T =ker T .

Pro of. If T is normal and v 2 V , then

   

hTv;Tvi = hT Tv;vi = hTT v; vi = hT v; T v i:



Thus Tv = 0 if and only if T v =0.

The next proposition, combined with our result that the generalized eigenvectors of a linear operator span the domain (Proposition 3.4), shows that the eigenvectors of a normal operator span the domain.

Proposition 7.2. Every generalized eigenvector of a normal operator is an eigenvector of the operator.

Proof. Suppose $T$ is normal. We will prove that

$$\ker T^k = \ker T \tag{7.3}$$

for every positive integer $k$. This will complete the proof of the proposition, because we can replace $T$ in (7.3) by $T - \lambda I$ for arbitrary $\lambda \in \mathbf{C}$.

We prove (7.3) by induction on $k$. Clearly, the result holds for $k = 1$. Suppose now that $k$ is a positive integer such that (7.3) holds. Let $v \in \ker T^{k+1}$. Then $T(T^k v) = T^{k+1} v = 0$. In other words, $T^k v \in \ker T$, and so $T^*(T^k v) = 0$ (by Lemma 7.1). Thus

$$0 = \langle T^*(T^k v), T^{k-1} v \rangle = \langle T^k v, T^k v \rangle.$$

Hence $v \in \ker T^k$, which implies that $v \in \ker T$ (by our induction hypothesis). Thus $\ker T^{k+1} = \ker T$, completing the induction.

The last proposition, together with Proposition 3.4, implies that a normal operator can be diagonalized by some basis. The next proposition will be used to show that this can be done by an orthonormal basis.

Proposition 7.4. Eigenvectors of a normal operator corresponding to distinct eigenvalues are orthogonal.

Proof. Suppose $T$ is normal and $\alpha$, $\beta$ are distinct eigenvalues of $T$, with corresponding eigenvectors $u$, $v$. Thus $(T - \beta I)v = 0$, and so $(T^* - \bar{\beta} I)v = 0$ (by Lemma 7.1). In other words, $v$ is also an eigenvector of $T^*$, with eigenvalue $\bar{\beta}$. Now

$$(\alpha - \beta)\langle u, v \rangle = \langle \alpha u, v \rangle - \langle u, \bar{\beta} v \rangle = \langle Tu, v \rangle - \langle u, T^* v \rangle = \langle Tu, v \rangle - \langle Tu, v \rangle = 0.$$

 


Thus $\langle u, v \rangle = 0$, as desired.

Now we can put everything together, getting the finite-dimensional Spectral Theorem for complex inner product spaces.

Theorem 7.5. There is an orthonormal basis of $V$ consisting of eigenvectors of $T$ if and only if $T$ is normal.

Proof. To prove the easy direction, first suppose that there is an orthonormal basis of $V$ consisting of eigenvectors of $T$. With respect to that basis, $T$ has a diagonal matrix. The matrix of $T^*$ (with respect to the same basis) is obtained by taking the conjugate transpose of the matrix of $T$; hence $T^*$ also has a diagonal matrix. Any two diagonal matrices commute. Thus $T$ commutes with $T^*$, which means that $T$ is normal.

To prove the other direction, now suppose that $T$ is normal. For each eigenvalue of $T$, choose an orthonormal basis of the associated set of eigenvectors. The union of these bases (one for each eigenvalue) is still an orthonormal set, because eigenvectors corresponding to distinct eigenvalues are orthogonal (by Proposition 7.4). The span of this union includes every eigenvector of $T$ (by construction), and hence every generalized eigenvector of $T$ (by Proposition 7.2). But the generalized eigenvectors of $T$ span $V$ (by Proposition 3.4), and so we have an orthonormal basis of $V$ consisting of eigenvectors of $T$.
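
For example, the operator on $\mathbf{C}^2$ with matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is normal (it commutes with its adjoint $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$) but not self-adjoint. Its eigenvalues are $i$ and $-i$, with orthonormal eigenvectors $\frac{1}{\sqrt{2}}(1, -i)$ and $\frac{1}{\sqrt{2}}(1, i)$, so the theorem applies even though the matrix is not diagonalizable by any real orthonormal basis.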

The proposition below will be needed in the next section, when we prove the Spectral Theorem for real inner product spaces.

Proposition 7.6. Every eigenvalue of a self-adjoint operator is real.

Proof. Suppose $T$ is self-adjoint. Let $\lambda$ be an eigenvalue of $T$, and let $v$ be a non-zero vector in $V$ such that $Tv = \lambda v$. Then

$$\lambda \|v\|^2 = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, T^* v \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \|v\|^2.$$

Thus $\lambda = \bar{\lambda}$, which means that $\lambda$ is real, as desired.

8. Getting Real

So far we have been dealing only with complex vector spaces. As we'll see, a real vector space $U$ can be embedded, in a natural way, in a complex vector space called the complexification of $U$. Each linear operator on $U$ can be extended to a linear operator on the complexification of $U$. Our results about linear operators on complex vector spaces can then be translated to information about linear operators on real vector spaces. Let's see how this process works.

Suppose that $U$ is a real vector space. As a set, the complexification of $U$, denoted $U_{\mathbf{C}}$, equals $U \times U$. Formally, a typical element of $U_{\mathbf{C}}$ is an ordered pair $(u, v)$, where $u, v \in U$, but we will write this as $u + iv$, for obvious reasons. We define addition on $U_{\mathbf{C}}$ by

$$(u_1 + iv_1) + (u_2 + iv_2) = (u_1 + u_2) + i(v_1 + v_2).$$

 


The notation shows how we should define multiplication by complex scalars on $U_{\mathbf{C}}$:

$$(a + ib)(u + iv) = (au - bv) + i(av + bu)$$

for $a, b \in \mathbf{R}$ and $u, v \in U$. With these definitions of addition and multiplication, $U_{\mathbf{C}}$ becomes a complex vector space. We can think of $U$ as a subset of $U_{\mathbf{C}}$ by identifying $u \in U$ with $u + i0$. Clearly, any basis of the real vector space $U$ is also a basis of the complex vector space $U_{\mathbf{C}}$. Hence the dimension of $U$ as a real vector space equals the dimension of $U_{\mathbf{C}}$ as a complex vector space.

For $S$ a linear operator on a real vector space $U$, the complexification of $S$, denoted $S_{\mathbf{C}}$, is the linear operator on $U_{\mathbf{C}}$ defined by

$$S_{\mathbf{C}}(u + iv) = Su + iSv$$

for $u, v \in U$. If we choose a basis of $U$ and also think of it as a basis of $U_{\mathbf{C}}$, then clearly $S$ and $S_{\mathbf{C}}$ have the same matrix with respect to this basis.

Note that any real eigenvalue of $S_{\mathbf{C}}$ is also an eigenvalue of $S$ (because if $a \in \mathbf{R}$ and $S_{\mathbf{C}}(u + iv) = a(u + iv)$, then $Su = au$ and $Sv = av$). Non-real eigenvalues of $S_{\mathbf{C}}$ come in pairs. More precisely,

$$(S_{\mathbf{C}} - \lambda I)^j (u + iv) = 0 \iff (S_{\mathbf{C}} - \bar{\lambda} I)^j (u - iv) = 0 \tag{8.1}$$

for $j$ a positive integer, $\lambda \in \mathbf{C}$, and $u, v \in U$, as is easily proved by induction on $j$. In particular, if $\lambda \in \mathbf{C}$ is an eigenvalue of $S_{\mathbf{C}}$, then so is $\bar{\lambda}$, and the multiplicity of $\lambda$ (recall that this is defined as the dimension of the set of generalized eigenvectors of $S_{\mathbf{C}}$ corresponding to $\lambda$) is the same as the multiplicity of $\bar{\lambda}$. Because the sum of the multiplicities of all the eigenvalues of $S_{\mathbf{C}}$ equals the (complex) dimension of $U_{\mathbf{C}}$ (by Theorem 3.11(a)), we see that if $U_{\mathbf{C}}$ has odd (complex) dimension, then $S_{\mathbf{C}}$ must have a real eigenvalue. Putting all this together, we have proved the following theorem. Once again, a proof without determinants offers more insight into why the result holds than the standard proof using determinants.

Theorem 8.2. Every linear operator on an odd-dimensional real vector space has a real eigenvalue.
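
The hypothesis of odd dimension cannot be dropped: rotation of $\mathbf{R}^2$ by a quarter turn, with matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, has no real eigenvalue; its complexification has only the conjugate pair of eigenvalues $\pm i$, exactly as (8.1) predicts.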

The minimal and characteristic polynomials of a linear operator $S$ on a real vector space are defined to be the corresponding polynomials of the complexification $S_{\mathbf{C}}$. Both these polynomials have real coefficients; this follows from our definitions of minimal and characteristic polynomials and (8.1). The reader should be able to derive the properties of these polynomials easily from the corresponding results on complex vector spaces (Theorems 4.1 and 5.2).

Our procedure for transferring results from complex vector spaces to real vector spaces can also be used to prove the real Spectral Theorem. To see how that works, suppose now that $U$ is a real inner product space with inner product $\langle \cdot, \cdot \rangle$. We make the complexification $U_{\mathbf{C}}$ into a complex inner product space by defining an inner product on $U_{\mathbf{C}}$ in the obvious way:

$$\langle u_1 + iv_1, u_2 + iv_2 \rangle = \langle u_1, u_2 \rangle + \langle v_1, v_2 \rangle + i\langle v_1, u_2 \rangle - i\langle u_1, v_2 \rangle.$$

 


Note that any orthonormal basis of the real inner product space $U$ is also an orthonormal basis of the complex inner product space $U_{\mathbf{C}}$.

If $S$ is a self-adjoint operator on $U$, then obviously $S_{\mathbf{C}}$ is self-adjoint on $U_{\mathbf{C}}$. We can then apply the complex Spectral Theorem (Theorem 7.5) to $S_{\mathbf{C}}$ and transfer to $U$, getting the real Spectral Theorem. The next theorem gives the formal statement of the result and the details of the proof.

Theorem 8.3. Suppose $U$ is a real inner product space and $S$ is a linear operator on $U$. Then there is an orthonormal basis of $U$ consisting of eigenvectors of $S$ if and only if $S$ is self-adjoint.

Proof. To prove the easy direction, first suppose that there is an orthonormal basis of $U$ consisting of eigenvectors of $S$. With respect to that basis, $S$ has a diagonal matrix. Clearly, the matrix of $S^*$ (with respect to the same basis) equals the matrix of $S$. Thus $S$ is self-adjoint.

To prove the other direction, now suppose that $S$ is self-adjoint. As noted above, this implies that $S_{\mathbf{C}}$ is self-adjoint on $U_{\mathbf{C}}$. Thus there is a basis

$$\{u_1 + iv_1, \ldots, u_n + iv_n\} \tag{8.4}$$

of $U_{\mathbf{C}}$ consisting of eigenvectors of $S_{\mathbf{C}}$ (by the complex Spectral Theorem, which is Theorem 7.5); here each $u_j$ and $v_j$ is in $U$. Each eigenvalue of $S_{\mathbf{C}}$ is real (Proposition 7.6), and thus each $u_j$ and each $v_j$ is an eigenvector of $S$. Clearly, $\{u_1, v_1, \ldots, u_n, v_n\}$ spans $U$ (because (8.4) is a basis of $U_{\mathbf{C}}$). Conclusion: The eigenvectors of $S$ span $U$.

For each eigenvalue of $S$, choose an orthonormal basis of the associated set of eigenvectors in $U$. The union of these bases (one for each eigenvalue) is still orthonormal, because eigenvectors corresponding to distinct eigenvalues are orthogonal (Proposition 7.4). The span of this union includes every eigenvector of $S$ (by construction). We have just seen that the eigenvectors of $S$ span $U$, and so we have an orthonormal basis of $U$ consisting of eigenvectors of $S$, as desired.

9. Determinants

At this stage we have proved most of the major structure theorems of linear algebra without even defining determinants. In this section we will give a simple definition of determinants, whose main reasonable use in undergraduate mathematics is in the change of variables formula for multi-variable integrals.

The constant term of the characteristic polynomial of $T$ is plus or minus the product of the eigenvalues of $T$, counting multiplicity (this is obvious from our definition of the characteristic polynomial). Let's look at some additional motivation for studying the product of the eigenvalues.

Suppose we want to know how to make a change of variables in a multi-variable integral over some subset of $\mathbf{R}^n$. After linearization, this reduces to the question of how a linear operator $S$ on $\mathbf{R}^n$ changes volumes. Let's consider the special case where $S$ is self-adjoint. Then there is an orthonormal basis of $\mathbf{R}^n$ consisting of eigenvectors of $S$ (by the real Spectral Theorem, which is Theorem 8.3).

 


A moment's thought about the geometry of an orthonormal basis of eigenvectors shows that if $E$ is a subset of $\mathbf{R}^n$, then the volume (whatever that means) of $S(E)$ must equal the volume of $E$ multiplied by the absolute value of the product of the eigenvalues of $S$, counting multiplicity. We'll prove later that a similar result holds even for non-self-adjoint operators. At any rate, we see that the product of the eigenvalues seems to be an interesting object. An arbitrary linear operator on a real vector space need not have any eigenvalues, so we will return to our familiar setting of a linear operator $T$ on a complex vector space $V$. After getting the basic results on complex vector spaces, we'll deal with real vector spaces by using the notion of complexification discussed earlier.

Now we are ready for the formal definition. The determinant of $T$, denoted $\det T$, is defined to be the product of the eigenvalues of $T$, counting multiplicity. This definition would not be possible with the traditional approach to eigenvalues, because that method uses determinants to prove that eigenvalues exist. With the techniques used here, we already know (by Theorem 3.11(a)) that $T$ has $\dim V$ eigenvalues, counting multiplicity. Thus our simple definition makes sense.

In addition to simplicity, our definition also makes transparent the following result, which is not at all obvious from the standard definition.

Theorem 9.1. An operator is invertible if and only if its determinant is non-zero.

Proof. Clearly, $T$ is invertible if and only if 0 is not an eigenvalue of $T$, and this happens if and only if $\det T \neq 0$.

With our definition of determinant and characteristic polynomial, we see immediately that the constant term of the characteristic polynomial of $T$ equals $(-1)^n \det T$, where $n = \dim V$. The next result shows that even more is true: our definitions are consistent with the usual ones.

Proposition 9.2. The characteristic polynomial of $T$ equals $\det(zI - T)$.

Proof. Let $\lambda_1, \ldots, \lambda_m$ denote the eigenvalues of $T$, with multiplicities $\beta_1, \ldots, \beta_m$. Thus for $z \in \mathbf{C}$, the eigenvalues of $zI - T$ are $z - \lambda_1, \ldots, z - \lambda_m$, with multiplicities $\beta_1, \ldots, \beta_m$. Hence the determinant of $zI - T$ is the product

$$(z - \lambda_1)^{\beta_1} \cdots (z - \lambda_m)^{\beta_m},$$

which equals the characteristic polynomial of $T$.

Note that determinant is a similarity invariant. In other words, if $S$ is an invertible linear operator on $V$, then $T$ and $STS^{-1}$ have the same determinant (because they have the same eigenvalues, counting multiplicity).

We define the determinant of a square matrix of complex numbers to be the determinant of the corresponding linear operator (with respect to some choice of basis, which doesn't matter, because two different bases give rise to two linear operators that are similar and hence have the same determinant). Fix a basis of $V$, and for the rest of this section let's identify linear operators on $V$ with matrices with respect to that basis. How can we find the determinant of $T$ from its matrix, without finding all the eigenvalues?

 


Although getting the answer to that question will be hard, the method used below will show how someone might have discovered the formula for the determinant of a matrix. Even with the derivation that follows, determinants are difficult, which is precisely why they should be avoided.

We begin our search for a formula for the determinant by considering matrices of a special form. Let $a_1, \ldots, a_n \in \mathbf{C}$. Consider a linear operator $T$ whose matrix is

$$\begin{pmatrix}
0 & & & & a_n \\
a_1 & 0 & & & \\
& a_2 & 0 & & \\
& & \ddots & \ddots & \\
& & & a_{n-1} & 0
\end{pmatrix}; \tag{9.3}$$

here all entries of the matrix are 0 except for the upper right-hand corner and along the line just below the main diagonal. Let's find the determinant of $T$. Note that $T^n = a_1 \cdots a_n I$. Because the first columns of $\{I, T, \ldots, T^{n-1}\}$ are linearly independent (assuming that none of the $a_j$ is 0), no polynomial of degree less than $n$ can annihilate $T$. Thus $z^n - a_1 \cdots a_n$ is the minimal polynomial of $T$. Hence $z^n - a_1 \cdots a_n$ is also the characteristic polynomial of $T$. Thus

$$\det T = (-1)^{n-1} a_1 \cdots a_n.$$

(If some $a_j$ is 0, then clearly $T$ is not invertible, so $\det T = 0$, and the same formula holds.)
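
As a small check, when $n = 2$ the matrix (9.3) is $\begin{pmatrix} 0 & a_2 \\ a_1 & 0 \end{pmatrix}$; here $T^2 = a_1 a_2 I$, the characteristic polynomial is $z^2 - a_1 a_2$, whose two roots are $\pm\sqrt{a_1 a_2}$, and their product is $\det T = -a_1 a_2$, agreeing with the formula above.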

Now let $\pi$ be a permutation of $\{1, \ldots, n\}$, and consider a matrix $T$ whose $j^{\text{th}}$ column consists of all zeroes except for $a_j$ in the $\pi(j)^{\text{th}}$ row. The permutation $\pi$ is a product of cyclic permutations. Thus $T$ is similar to (and so has the same determinant as) a block diagonal matrix where each block of size greater than one has the form of (9.3). The determinant of a block diagonal matrix is obviously the product of the determinants of the blocks, and we know from the last paragraph how to compute those. Thus we see that $\det T = (\operatorname{sign} \pi)\, a_1 \cdots a_n$. To put this into a form that does not depend upon the particular permutation $\pi$, let $t_{i,j}$ denote the entry in row $i$, column $j$, of $T$ (so $t_{i,j} = 0$ unless $i = \pi(j)$), and let $P(n)$ denote the set of all permutations of $\{1, \ldots, n\}$. Then

$$\det T = \sum_{\sigma \in P(n)} (\operatorname{sign} \sigma)\, t_{\sigma(1),1} \cdots t_{\sigma(n),n}, \tag{9.4}$$

because each summand is 0 except the one corresponding to the permutation $\pi$.

Consider now an arbitrary matrix $T$ with entries $t_{i,j}$. Using the paragraph above as motivation, we guess that the formula for $\det T$ is given by (9.4). The next proposition shows that this guess is correct and gives the usual formula for the determinant of a matrix.

Proposition 9.5. $\displaystyle \det T = \sum_{\sigma \in P(n)} (\operatorname{sign} \sigma)\, t_{\sigma(1),1} \cdots t_{\sigma(n),n}.$

 


Proof. Define a function $d$ on the set of $n \times n$ matrices by

$$d(T) = \sum_{\sigma \in P(n)} (\operatorname{sign} \sigma)\, t_{\sigma(1),1} \cdots t_{\sigma(n),n}.$$

We want to prove that $\det T = d(T)$. To do this, choose $S$ so that $STS^{-1}$ is in the upper-triangular form given by Theorem 6.2. Now $d(STS^{-1})$ equals the product of the entries on the main diagonal of $STS^{-1}$ (because only the identity permutation makes a non-zero contribution to the sum defining $d(STS^{-1})$). But the entries on the main diagonal of $STS^{-1}$ are precisely the eigenvalues of $T$, counting multiplicity, so $\det T = d(STS^{-1})$. Thus to complete the proof, we need only show that $d$ is a similarity invariant; then we will have $\det T = d(STS^{-1}) = d(T)$.

To show that $d$ is a similarity invariant, first prove that $d$ is multiplicative, meaning that $d(AB) = d(A)d(B)$ for all $n \times n$ matrices $A$ and $B$. The proof that $d$ is multiplicative, which will not be given here, consists of a straightforward rearrangement of terms appearing in the formula defining $d(AB)$ (see any text that defines $\det T$ to be $d(T)$ and then proves that $\det AB = (\det A)(\det B)$). The multiplicativity of $d$ now leads to a proof that $d$ is a similarity invariant, as follows:

$$d(STS^{-1}) = d(ST)\, d(S^{-1}) = d(S^{-1})\, d(ST) = d(S^{-1}ST) = d(T).$$

Thus $\det T = d(T)$, as claimed.

All the usual properties of determinants can be proved either from the (new) definition or from Proposition 9.5. In particular, the last proof shows that det is multiplicative.
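
For a $2 \times 2$ matrix, Proposition 9.5 reads

$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = t_{1,1} t_{2,2} - t_{2,1} t_{1,2} = ad - bc,$$

since $P(2)$ contains only the identity and the transposition. This agrees with the eigenvalue definition: the characteristic polynomial of this matrix is $z^2 - (a+d)z + (ad - bc)$, and the product of its two roots is $ad - bc$.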

The determinant of a linear operator on a real vector space is defined to be the determinant (product of the eigenvalues) of its complexification. Proposition 9.5 holds on real as well as complex vector spaces. To see this, suppose that $U$ is a real vector space and $S$ is a linear operator on $U$. If we choose a basis of $U$ and also think of it as a basis of the complexification $U_{\mathbf{C}}$, then $S$ and its complexification $S_{\mathbf{C}}$ have the same matrix with respect to this basis. Thus the formula for $\det S$, which by definition equals $\det S_{\mathbf{C}}$, is given by Proposition 9.5. In particular, $\det S$ is real. The multiplicativity of det on linear operators on a real vector space follows from the corresponding property on complex vector spaces and the multiplicativity of complexification: $(AB)_{\mathbf{C}} = A_{\mathbf{C}} B_{\mathbf{C}}$ whenever $A$ and $B$ are linear operators on a real vector space.

The tools we've developed provide a natural connection between determinants and volumes in $\mathbf{R}^n$. To understand that connection, first we need to explain what is meant by the square root of an operator times its adjoint. Suppose $S$ is a linear operator on a real inner product space $U$. If $\lambda$ is an eigenvalue of $S^* S$ and $u \in U$ is a corresponding non-zero eigenvector, then

$$\lambda \langle u, u \rangle = \langle \lambda u, u \rangle = \langle S^* S u, u \rangle = \langle Su, Su \rangle,$$

and thus $\lambda$ must be a non-negative number. Clearly, $S^* S$ is self-adjoint, and so there is a basis of $U$ consisting of eigenvectors of $S^* S$ (by the real Spectral Theorem, which is Theorem 8.3). We can think of $S^* S$ as a diagonal matrix with respect to this basis.

 




The entries on the diagonal, namely the eigenvalues of $S^* S$, are all non-negative, as we have just seen. The square root of $S^* S$, denoted $\sqrt{S^* S}$, is the linear operator on $U$ corresponding to the diagonal matrix obtained by taking the non-negative square root of each entry of the matrix of $S^* S$. Obviously, $\sqrt{S^* S}$ is self-adjoint, and its square equals $S^* S$. Also, the multiplicativity of det shows that

$$\bigl(\det \sqrt{S^* S}\bigr)^2 = \det(S^* S) = (\det S^*)(\det S) = (\det S)^2.$$

Thus $\det \sqrt{S^* S} = |\det S|$ (because $\det \sqrt{S^* S}$ must be non-negative).

The next lemma provides the tool we will use to reduce the question of volume change by a linear operator to the self-adjoint case. It is called the polar decomposition of an operator $S$, because it resembles the polar decomposition of a complex number $z = e^{i\theta} r$. Here $r$ equals $\sqrt{\bar{z} z}$ (analogous to $\sqrt{S^* S}$ in the lemma), and multiplication by $e^{i\theta}$ is an isometry on $\mathbf{C}$ (analogous to the isometric property of $A$ in the lemma).

Lemma 9.6. Let $S$ be a linear operator on a real inner product space $U$. Then there exists a linear isometry $A$ on $U$ such that $S = A\sqrt{S^* S}$.

Proof. For $u \in U$ we have

$$\|\sqrt{S^* S}\, u\|^2 = \langle \sqrt{S^* S}\, u, \sqrt{S^* S}\, u \rangle = \langle S^* S u, u \rangle = \langle Su, Su \rangle = \|Su\|^2.$$

In other words, $\|\sqrt{S^* S}\, u\| = \|Su\|$. Thus the function $A$ defined on $\operatorname{ran} \sqrt{S^* S}$ by $A(\sqrt{S^* S}\, u) = Su$ is well defined and is a linear isometry from $\operatorname{ran} \sqrt{S^* S}$ onto $\operatorname{ran} S$. Extend $A$ to a linear isometry of $U$ onto $U$ by first extending $A$ to be any isometry of $(\operatorname{ran} \sqrt{S^* S})^{\perp}$ onto $(\operatorname{ran} S)^{\perp}$ (these two spaces have the same dimension, because we have just seen that there is a linear isometry of $\operatorname{ran} \sqrt{S^* S}$ onto $\operatorname{ran} S$), and then extend $A$ to all of $U$ by linearity (with the Pythagorean Theorem showing that $A$ is an isometry on all of $U$). The construction of $A$ shows that $S = A\sqrt{S^* S}$, as desired.

Now we are ready to give a clean, illuminating proof that a linear operator changes volumes by a factor of the absolute value of the determinant. We will not formally define volume, but only use the obvious properties that volume should satisfy. In particular, the subsets $E$ of $\mathbf{R}^n$ considered in the theorem below should be restricted to whatever class the reader uses most comfortably (polyhedrons, open sets, or measurable sets).

measurable sets).

n

Theorem 9.7 Let S b e a linear op erator on R .Then

vol S (E )= jdet S j vol E

n

for E  R .

p



S S b e the p olar decomp osition of S as given by Lemma 9.6. Pro of. Let S = A

n

Let E  R . Because A is an isometry,itdoesnotchange volumes. Thus

p p



 

vol S (E )= vol A S S (E ) =vol S S (E ):

 


But $\sqrt{S^* S}$ is self-adjoint, and we already noted at the beginning of this section that each self-adjoint operator changes volume by a factor equal to the absolute value of the determinant. Thus we have

$$\operatorname{vol} S(E) = \operatorname{vol} \sqrt{S^* S}(E) = |\det \sqrt{S^* S}| \operatorname{vol} E = |\det S| \operatorname{vol} E,$$

as desired.
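
As a quick check, the shear $S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ on $\mathbf{R}^2$ has 1 as its only eigenvalue, with multiplicity 2, so $\det S = 1$; and indeed a shear preserves areas, even though it is neither self-adjoint nor an isometry.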

10. Conclusion

As mathematicians, we often read a nice new proof of a known theorem, enjoy the different approach, but continue to derive our internal understanding from the method we originally learned. This paper aims to change drastically the way mathematicians think about and teach crucial aspects of linear algebra. The simple proof of the existence of eigenvalues given in Theorem 2.1 should be the one imprinted in our minds, written on our blackboards, and published in our textbooks. Generalized eigenvectors should become a central tool for the understanding of linear operators. As we have seen, their use leads to natural definitions of multiplicity and the characteristic polynomial. Every mathematician and every linear algebra student should at least remember that the generalized eigenvectors of an operator always span the domain (Proposition 3.4); this crucial result leads to easy proofs of upper-triangular form (Theorem 6.2) and the Spectral Theorem (Theorems 7.5 and 8.3).

Determinants appear in many proofs not discussed here. If you scrutinize such proofs, you'll often discover better alternatives without determinants. Down with Determinants!

Department of Mathematics

Michigan State University

East Lansing, MI 48824 USA

e-mail address: [email protected]