THE JORDAN-FORM PROOF MADE EASY∗

LEO LIVSHITS†, GORDON MACDONALD‡, BEN MATHES†, AND HEYDAR RADJAVI§

Abstract. A derivation of the Jordan Canonical Form for linear transformations acting on finite-dimensional vector spaces over $\mathbb{C}$ is given. The proof is constructive and elementary, using only basic concepts from introductory linear algebra and relying on repeated application of similarities.

Key words. Jordan Canonical Form, block decomposition, similarity

AMS subject classifications. 15A21, 00-02

∗Received by the editors on ????. Accepted for publication on ????. Handling Editor: ???.
†Department of Mathematics, Colby College, Waterville, Maine 04901, USA ([email protected], [email protected]).
‡Department of Mathematics and Computer Science, University of Prince Edward Island, Charlottetown, P.E.I., C1A 4P3, Canada ([email protected]). Supported by a grant from NSERC Canada.
§Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J3, Canada ([email protected]). Supported by a grant from NSERC Canada.

1. Introduction. The Jordan Canonical Form (JCF) is undoubtedly the most useful representation for illuminating the structure of a single linear transformation acting on a finite-dimensional vector space over $\mathbb{C}$ (or a general algebraically closed field).

Theorem 1.1. [The Jordan Canonical Form Theorem] Any linear transformation $T : \mathbb{C}^n \to \mathbb{C}^n$ has a block matrix (with respect to a direct-sum decomposition of $\mathbb{C}^n$) of the form
$$\begin{bmatrix}
J_1 & 0 & 0 & \cdots & 0 \\
0 & J_2 & 0 & \cdots & 0 \\
0 & 0 & J_3 & \cdots & 0 \\
\vdots & & & \ddots & 0 \\
0 & 0 & 0 & 0 & J_p
\end{bmatrix}$$
where each $J_i$ (called a Jordan block) has a matrix representation (with respect to some basis) of the form
$$\begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
0 & 0 & \lambda & \ddots & 0 \\
\vdots & & & \ddots & 1 \\
0 & 0 & 0 & 0 & \lambda
\end{bmatrix}.$$

Of course, the Jordan Canonical Form Theorem tells you more: that the above decomposition is unique (up to permuting the blocks), and that the sizes and number of blocks can be given in terms of generalized multiplicities of the eigenvalues.

With the JCF theorem in hand, all the mysteries of single linear transformations are exposed, and many of the other theorems of linear algebra (for example, the Rank-Nullity Theorem and the Cayley-Hamilton Theorem) become immediately obvious.

The JCF also has many practical applications. The one to which most students of mathematics are exposed is that of linear systems of differential equations with constant coefficients. With the JCF of the coefficient matrix in hand, solving such systems is a routine exercise, as the following sketch indicates.
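As a brief illustration (a standard computation, restricted here to a single $k \times k$ Jordan block $J = \lambda I + N$, where $N$ has ones on the superdiagonal and zeros elsewhere): the system $x'(t) = Jx(t)$ has solution $x(t) = e^{Jt}x(0)$, and since $\lambda I$ commutes with $N$ and $N^k = 0$, the matrix exponential truncates to a finite sum,
$$e^{Jt} = e^{\lambda t}\Bigl(I + tN + \frac{t^2}{2!}N^2 + \cdots + \frac{t^{k-1}}{(k-1)!}N^{k-1}\Bigr).$$
For a general coefficient matrix one applies this formula to each Jordan block separately.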
Unfortunately, the proof of the JCF is a mystery to many students. In many linear algebra courses the JCF is given short shrift, since the traditional proofs require not only knowledge of minimal polynomials and block decompositions but also advanced knowledge regarding nilpotents, generalized eigenvalues, esoteric subspaces (usually called admissible subspaces) and polynomial ideals (usually called stuffers or conductors). The proof does not vary much from one linear algebra text to another, and the exposition in Hoffman and Kunze's text [4] is representative.

A number of papers have given alternative proofs of the JCF: for example, Fletcher and Sorensen [1], Galperin and Waksman [2], and Väliaho [5]. Of these proofs, ours is most similar in flavor to that of Fletcher and Sorensen, in that our proof is given in terms of matrices and we take an algorithmic approach. But our proof is more elementary, and relies on repeated similarities to gradually bring the matrix representation to the required form.

The proof presented here requires only basic facility with row reduction, matrix representations, block decompositions, kernels and ranges of transformations, similarity, subspaces, minimal polynomials and eigenvalues: in short, the usual contents of a one-semester introductory course in linear algebra. As far as pedagogy is concerned, this makes the JCF a natural culmination of such a course, bringing all the basic concepts together and ending the course with a bang.

We have chosen to take a comprehensive approach and have included all the details of the proof. This makes the article longer, but, we hope, also makes it more accessible to students with only the aforementioned introductory course in linear algebra.

In this paper, $V$ and $W$ are finite-dimensional vector spaces over $\mathbb{C}$ (the field of complex numbers) and $T : V \to W$ is a linear transformation mapping $V$ to $W$ (although all definitions and results hold for linear transformations acting on finite-dimensional vector spaces over any algebraically closed field). Some standard terminology and notation that will prove useful:

• $L(V, W)$ will denote the space of all linear transformations (also called operators) from the vector space $V$ to the vector space $W$. We also write $L(V)$ for $L(V, V)$.
• $\dim(V)$ denotes the dimension of $V$.
• The kernel (or nullspace) of a linear transformation $T \in L(V, W)$ is $\ker(T) = \{v \in V : T(v) = 0\}$.
• The minimal polynomial of $T$ is $m_T(x)$ (the unique monic polynomial of minimal degree such that $m_T(T) = 0$).
• A direct-sum decomposition $V = V_1 \oplus V_2 \oplus \cdots \oplus V_n$ of a vector space $V$ is an ordered list of subspaces $\{V_j\}_{j=1}^{n}$ of $V$ such that $V_j \cap (V_1 + \cdots + V_{j-1}) = \{0\}$ for all $2 \le j \le n$ (this is one way of saying that the subspaces are independent) and such that for all $v \in V$ there exist $v_i \in V_i$ such that $v = \sum_{i=1}^{n} v_i$.
• A block matrix of a linear transformation $T$ with respect to a direct-sum decomposition such as the one above is a matrix of linear transformations $T_{ij} \in L(V_j, V_i)$ such that, with the usual definition of matrix multiplication and the identification of a vector $v = \sum_{i=1}^{n} v_i$ (where $v_i \in V_i$) with
$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix},$$
we have
$$T(v) = \begin{bmatrix}
T_{11} & T_{12} & T_{13} & \cdots & T_{1n} \\
T_{21} & T_{22} & T_{23} & \cdots & T_{2n} \\
T_{31} & T_{32} & T_{33} & \cdots & T_{3n} \\
\vdots & & & \ddots & \vdots \\
T_{n1} & T_{n2} & T_{n3} & \cdots & T_{nn}
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_n \end{bmatrix}.$$
• A linear transformation $T : V \to V$ is said to be nilpotent if its only eigenvalue is the zero eigenvalue. By consideration of the minimal polynomial it is clear that $T^n = 0$ for some $n \in \mathbb{N}$. The smallest value of $n$ such that $T^n = 0$ is called the index (or index of nilpotency) of $T$.

2. Primary decomposition. Before getting into the technicalities of the proof of the JCF, a special case of the Primary Decomposition Theorem allows us to obtain a block-diagonal matrix for a linear transformation $T$ in which the eigenvalue structure of each block is as simple as possible: there is only one eigenvalue for each block.

Theorem 2.1. Let $T \in L(\mathbb{C}^n)$ (or $L(F^n)$ for $F$ an algebraically closed field) with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then there is a direct-sum decomposition $\mathbb{C}^n = K_{\lambda_1} \oplus K_{\lambda_2} \oplus \cdots \oplus K_{\lambda_m}$ with respect to which $T$ has block matrix
$$\begin{bmatrix}
\lambda_1 I + N_1 & 0 & \cdots & 0 \\
0 & \lambda_2 I + N_2 & \cdots & 0 \\
\vdots & & \ddots & 0 \\
0 & 0 & \cdots & \lambda_m I + N_m
\end{bmatrix}$$
where each $N_j$ is nilpotent.
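To fix ideas, here is a small instance (with the numbers chosen purely for illustration): if $T \in L(\mathbb{C}^3)$ has, with respect to a decomposition $\mathbb{C}^3 = K_2 \oplus K_5$, the matrix
$$\begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix},$$
then the two diagonal blocks are $2I + N_1$ and $5I + N_2$, where
$$N_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad N_2 = \begin{bmatrix} 0 \end{bmatrix}$$
are nilpotent, and each block has exactly one eigenvalue ($2$ and $5$ respectively).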
We prove this using a sequence of lemmas.

Lemma 2.2. Let $N$ be a nilpotent $r \times r$ matrix, $M$ an invertible $s \times s$ matrix, and $F$ an arbitrary $r \times s$ matrix. Then the matrix equation $X - NXM = F$ is solvable.

Proof. Let $k$ be such that $N^k \ne 0$ and $N^{k+1} = 0$ (of course, $k$ can be $0$). Check that
$$X = F + NFM + N^2FM^2 + \cdots + N^kFM^k$$
is a solution: in $X - NXM$ the sum telescopes, leaving $F - N^{k+1}FM^{k+1} = F$. The solution is actually unique, but we don't need that fact here.

Lemma 2.3. If $\lambda$ is not an eigenvalue of the $s \times s$ matrix $A$ and $N$ is a nilpotent $r \times r$ matrix, then the two matrices
$$\begin{bmatrix} \lambda I + N & E \\ 0 & A \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} \lambda I + N & 0 \\ 0 & A \end{bmatrix}$$
are similar (for any choice of the $r \times s$ matrix $E$).

Proof. It is sufficient to show that there exists a matrix $X$ such that
$$\begin{bmatrix} \lambda I + N & E \\ 0 & A \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} \lambda I + N & 0 \\ 0 & A \end{bmatrix}$$
or, equivalently, $X(A - \lambda I) - NX = E$. Since $A - \lambda I$ is invertible by hypothesis, we can rewrite this as
$$X - NX(A - \lambda I)^{-1} = E(A - \lambda I)^{-1},$$
which is solvable by Lemma 2.2.

Lemma 2.4. Every operator $T \in L(\mathbb{C}^n)$ (or $L(F^n)$ for $F$ an algebraically closed field) with eigenvalue $\lambda$ has a matrix of the form
$$\begin{bmatrix} \lambda I + N & 0 \\ 0 & A \end{bmatrix}$$
where $N$ is nilpotent and $\lambda$ is not an eigenvalue of $A$.

Proof. Use induction on $n$, the dimension of the space. Clearly the lemma is true if $n = 1$. Assuming it is true for all dimensions less than $n$, choose a basis $\{e_1, e_2, \ldots, e_n\}$ such that $T(e_1) = \lambda e_1$. Then $T$ has block matrix
$$\begin{bmatrix} \lambda & E \\ 0 & B \end{bmatrix}.$$
If $\lambda$ is not an eigenvalue of $B$, then Lemma 2.3 gives the result. If $\lambda$ is an eigenvalue of $B$, then, since $B$ acts on a space of dimension less than $n$, by the induction hypothesis we have that $T$ has a matrix of the form
$$\begin{bmatrix} \lambda & E_1 & E_2 \\ 0 & \lambda I & 0 \\ 0 & 0 & A \end{bmatrix}$$
where $\lambda$ is not an eigenvalue of $A$. Again by Lemma 2.3, this matrix is similar to
$$\begin{bmatrix} \lambda & E_1 & 0 \\ 0 & \lambda I & 0 \\ 0 & 0 & A \end{bmatrix} = \begin{bmatrix} \lambda I + N & 0 \\ 0 & A \end{bmatrix}$$
where
$$N = \begin{bmatrix} 0 & E_1 \\ 0 & 0 \end{bmatrix}.$$

Theorem 2.1 now follows by applying Lemma 2.4 inductively.
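In outline (a sketch of the induction, with the block labels $A_1, A_2, \ldots$ introduced here for illustration): applying Lemma 2.4 to $T$ with the eigenvalue $\lambda_1$ yields a block matrix
$$\begin{bmatrix} \lambda_1 I + N_1 & 0 \\ 0 & A_1 \end{bmatrix}$$
in which $N_1$ is nilpotent and $\lambda_1$ is not an eigenvalue of $A_1$, so the distinct eigenvalues of $A_1$ are $\lambda_2, \ldots, \lambda_m$. Applying Lemma 2.4 to $A_1$ with the eigenvalue $\lambda_2$ splits off a block $\lambda_2 I + N_2$ in the same way, leaving a block $A_2$ whose distinct eigenvalues are $\lambda_3, \ldots, \lambda_m$. After $m$ steps every eigenvalue has been split off, and $T$ has the block-diagonal form asserted in the theorem.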