

Chapter 4

Eigenvalues

1 The Spectral Theorem

Hermitian Spaces

Given a complex vector space V, an Hermitian inner product in V is defined as a Hermitian symmetric sesquilinear form ⟨·, ·⟩ such that the corresponding Hermitian quadratic form is positive definite. A space equipped with an Hermitian inner product ⟨·, ·⟩ is called a Hermitian space.¹

The inner square ⟨z, z⟩ is interpreted as the square of the length |z| of the vector z. Respectively, the distance between two points z and w in an Hermitian space is defined as |z − w|. Since the Hermitian inner product is positive, distance is well-defined, symmetric, and positive (unless z = w). In fact it satisfies the triangle inequality²:

|z − w| ≤ |z| + |w|.

This follows from the Cauchy–Schwarz inequality:

|⟨z, w⟩|² ≤ ⟨z, z⟩ ⟨w, w⟩,

where the equality holds if and only if z and w are linearly dependent. To derive the triangle inequality, write:

|z − w|² = ⟨z − w, z − w⟩ = ⟨z, z⟩ − ⟨z, w⟩ − ⟨w, z⟩ + ⟨w, w⟩ ≤ |z|² + 2|z||w| + |w|² = (|z| + |w|)².

¹Other terms used are unitary space and finite dimensional Hilbert space.

²This makes a Hermitian space a metric space.

To prove the Cauchy–Schwarz inequality, note that it suffices to consider the case |w| = 1. Indeed, when w = 0, both sides vanish, and when w ≠ 0, both sides scale the same way when w is normalized to the unit length. So, assuming |w| = 1, we put λ := ⟨w, z⟩ and consider the projection λw of the vector z to the line spanned by w. The difference z − λw is orthogonal to w: ⟨w, z − λw⟩ = ⟨w, z⟩ − λ⟨w, w⟩ = 0. From positivity of inner squares, we have:

0 ≤ ⟨z − λw, z − λw⟩ = ⟨z, z − λw⟩ = ⟨z, z⟩ − λ⟨z, w⟩.

Since ⟨z, w⟩ is the conjugate of ⟨w, z⟩, i.e. ⟨z, w⟩ = λ̄, we conclude that |z|² ≥ |⟨z, w⟩|², as required. Notice that the equality holds true only when z = λw.

All Hermitian spaces of the same dimension are isometric (or Hermitian isomorphic), i.e. isomorphic through isomorphisms respecting Hermitian inner products. Namely, as it follows from the Inertia Theorem for Hermitian forms, every Hermitian space has an orthonormal basis, i.e. a basis e₁, ..., eₙ such that ⟨eᵢ, eⱼ⟩ = 0 for i ≠ j and = 1 for i = j. In the coordinate system corresponding to an orthonormal basis, the Hermitian inner product takes on the standard form:

⟨z, w⟩ = z̄₁w₁ + ··· + z̄ₙwₙ.

An orthonormal basis is not unique. Moreover, as it follows from the proof of Sylvester's rule, one can start with any basis f₁, ..., fₙ in V and then construct an orthonormal basis e₁, ..., eₙ such that eₖ ∈ Span(f₁, ..., fₖ). This is done inductively; namely, when e₁, ..., e_{k−1} have already been constructed, one subtracts from fₖ its projection to the space Span(e₁, ..., e_{k−1}):

f̃ₖ = fₖ − ⟨e₁, fₖ⟩e₁ − ··· − ⟨e_{k−1}, fₖ⟩e_{k−1}.

The resulting vector f̃ₖ lies in Span(f₁, ..., f_{k−1}, fₖ) and is orthogonal to Span(f₁, ..., f_{k−1}) = Span(e₁, ..., e_{k−1}). Indeed,

⟨eᵢ, f̃ₖ⟩ = ⟨eᵢ, fₖ⟩ − Σⱼ₌₁^{k−1} ⟨eⱼ, fₖ⟩⟨eᵢ, eⱼ⟩ = 0

for all i = 1, ..., k−1. To construct eₖ, one normalizes f̃ₖ to the unit length:

eₖ := f̃ₖ / |f̃ₖ|.

The above algorithm of replacing a given basis with an orthonormal one is known as Gram–Schmidt orthogonalization.
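For readers who wish to experiment numerically, here is a minimal Python/NumPy sketch of the algorithm (the helper name gram_schmidt is ours, and np.vdot is used for the standard Hermitian product); it is tested on the basis from Exercise 318 below.

import numpy as np

def gram_schmidt(f):
    # Orthonormalize the sequence of vectors f with respect to the
    # standard Hermitian inner product <z, w> = conj(z) . w.
    e = []
    for fk in f:
        # Subtract from f_k its projection to Span(e_1, ..., e_{k-1}).
        fk_tilde = fk - sum(np.vdot(ei, fk) * ei for ei in e)
        e.append(fk_tilde / np.linalg.norm(fk_tilde))
    return np.array(e)

# f1 = e1 + 2i e2 + 2i e3, f2 = e1 + 2i e2, f3 = e1 in C^3 (Exercise 318).
f = np.array([[1, 2j, 2j], [1, 2j, 0], [1, 0, 0]], dtype=complex)
e = gram_schmidt(f)
print(np.round(e @ e.conj().T, 10))  # Gram matrix of the result: the identity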

EXERCISES

316. Prove that if two vectors u and v in an Hermitian space are orthogonal, then |u|² + |v|² = |u − v|². Is the converse true?

317. Prove that for any vectors u, v in an Hermitian space,

|u + v|² + |u − v|² = 2|u|² + 2|v|².

Find a geometric interpretation of this fact.

318. Apply Gram–Schmidt orthogonalization to the basis f₁ = e₁ + 2ie₂ + 2ie₃, f₂ = e₁ + 2ie₂, f₃ = e₁ in the coordinate Hermitian space C³.

319. Apply Gram–Schmidt orthogonalization to the standard basis e₁, e₂ of C² to construct an orthonormal basis of the Hermitian inner product ⟨z, w⟩ = z̄₁w₁ + 2z̄₁w₂ + 2z̄₂w₁ + 5z̄₂w₂.

320. Let v ∈ V be a vector in an Hermitian space, and e₁, ..., eₖ an orthonormal basis in a subspace W. Prove that u = Σᵢ ⟨eᵢ, v⟩eᵢ is the point of W closest to v, and that v − u is orthogonal to W. (The point u ∈ W is called the orthogonal projection of v to W.)

321.⋆ Let f₁, ..., f_N be a finite sequence of vectors in an Hermitian space. The Hermitian N×N-matrix ⟨fᵢ, fⱼ⟩ is called the Gram matrix of the sequence. Show that two finite sequences of vectors are isometric, i.e. obtained from each other by a unitary transformation, if and only if their Gram matrices are the same.

Normal Operators

Our next point is that an Hermitian inner product on a complex vector space allows one to identify sesquilinear forms on it with linear transformations. Let V be an Hermitian vector space, and T : V → V a C-linear transformation. Then the function V × V → C

T(w, z) := ⟨w, Tz⟩

is C-linear in z and anti-linear in w, i.e. it is sesquilinear.

In coordinates, w = Σᵢ wᵢeᵢ, z = Σⱼ zⱼeⱼ, and

⟨w, Tz⟩ = Σᵢ,ⱼ w̄ᵢ ⟨eᵢ, Teⱼ⟩ zⱼ,

i.e. T(eᵢ, eⱼ) = ⟨eᵢ, Teⱼ⟩ form the coefficient matrix of the sesquilinear form. On the other hand, Teⱼ = Σᵢ tᵢⱼeᵢ, where [tᵢⱼ] is the matrix of the linear transformation T with respect to the basis (e₁, ..., eₙ). Note that if the basis is orthonormal, then ⟨eᵢ, Teⱼ⟩ = tᵢⱼ, i.e. the two matrices coincide. Since the tᵢⱼ could be arbitrary, it follows that every sesquilinear form on V is uniquely represented by a linear transformation.

Earlier we have associated with a sesquilinear form its Hermitian adjoint (by changing the order of the arguments and conjugating the value). When the sesquilinear form is obtained from a linear transformation T, the adjoint corresponds to another linear transformation denoted T† and called Hermitian adjoint to T. Thus, by definition,

⟨w, Tz⟩ = ⟨T†w, z⟩ (the conjugate of ⟨z, T†w⟩) for all z, w ∈ V.

Of course, we also have ⟨Tw, z⟩ = ⟨w, T†z⟩ (check this!), and either identity completely characterizes T† in terms of T. The matrix of T† in an orthonormal basis is obtained from that of T by complex conjugation and transposition:

t†ᵢⱼ := ⟨eᵢ, T†eⱼ⟩ = ⟨Teᵢ, eⱼ⟩ = t̄ⱼᵢ (the conjugate of ⟨eⱼ, Teᵢ⟩ = tⱼᵢ).

Definition. A linear transformation on an Hermitian vector space is called normal (or a normal operator) if it commutes with its adjoint: T†T = TT†.

Example 1. A scalar operator λI is normal. Indeed, (λI)† = λ̄I, which is also scalar, and scalars commute.

Example 2. A linear transformation on an Hermitian space is called Hermitian if it coincides with its Hermitian adjoint: S† = S. A Hermitian operator³ is normal.

Example 3. A linear transformation is called anti-Hermitian if it is opposite to its adjoint: A† = −A. Multiplying an Hermitian operator by √−1 yields an anti-Hermitian one, and vice versa (because (√−1 I)† = −√−1 I). Anti-Hermitian operators are normal.

Example 4. Every linear transformation T : V → V can be uniquely written as the sum of Hermitian and anti-Hermitian operators: T = S + Q, where S = (T + T†)/2 = S† and Q = (T − T†)/2 = −Q†. We claim that an operator is normal whenever its Hermitian and anti-Hermitian parts commute. Indeed, T† = S − Q, and

TT† − T†T = (S + Q)(S − Q) − (S − Q)(S + Q) = 2(QS − SQ).

³The term operator in Hermitian geometry is synonymous to linear transformation.

Example 5. An invertible linear transformation U : V → V is called unitary if it preserves inner products:

⟨Uw, Uz⟩ = ⟨w, z⟩ for all w, z ∈ V.

Equivalently, ⟨w, (U†U − I)z⟩ = 0 for all w, z ∈ V. Taking w = (U†U − I)z, we conclude that (U†U − I)z = 0 for all z ∈ V, and hence U†U = I. Thus, for a unitary map U, U⁻¹ = U†. The converse statement is also true (and easy to check by starting from U⁻¹ = U† and reversing our computation). Since every invertible transformation commutes with its own inverse, we conclude that unitary transformations are normal.
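These definitions are easy to probe numerically. The following sketch (ours, not part of the text) verifies the defining identity of the adjoint and the relation U⁻¹ = U† for a unitary matrix obtained by orthonormalizing random columns:

import numpy as np

rng = np.random.default_rng(0)
n = 4
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
T_dag = T.conj().T   # matrix of T† in an orthonormal basis
w = rng.normal(size=n) + 1j * rng.normal(size=n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)

# Defining identity of the adjoint: <Tw, z> = <w, T† z>.
print(np.isclose(np.vdot(T @ w, z), np.vdot(w, T_dag @ z)))   # True

# A matrix with orthonormal columns is unitary: U† U = I, i.e. U^{-1} = U†.
U, _ = np.linalg.qr(T)
print(np.allclose(U.conj().T @ U, np.eye(n)))                 # True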

EXERCISES

322. Generalize the construction of Hermitian adjoint operators to the case of operators A : V → W between two different Hermitian spaces. Namely, show that A† : W → V is uniquely determined by the identity ⟨A†w, v⟩_V = ⟨w, Av⟩_W for all v ∈ V and w ∈ W.

323. Show that the matrices of A : V → W and A† : W → V in orthonormal bases of V and W are obtained from each other by transposition and complex conjugation.

324. The trace of a square matrix A is defined as the sum of its diagonal entries, and is denoted tr A. Prove that ⟨A, B⟩ := tr(A†B) defines an Hermitian inner product on the space Hom(Cⁿ, Cᵐ) of m×n-matrices.

325. Let A₁, ..., Aₖ : V → W be linear maps between Hermitian spaces. Prove that if Σᵢ Aᵢ†Aᵢ = 0, then A₁ = ··· = Aₖ = 0.

326. Let A : V → W be a linear map between Hermitian spaces. Show that B := A†A and C := AA† are Hermitian, and that the corresponding Hermitian forms B(x, x) := ⟨x, Bx⟩ in V and C(y, y) := ⟨y, Cy⟩ in W are non-negative. Under what hypothesis about A is the 1st of them positive? the 2nd one? both?

327. Let W ⊂ V be a subspace in an Hermitian space, and let P : V → V be the map that to each vector v ∈ V assigns its orthogonal projection to W. Prove that P is an Hermitian operator, that P² = P, and that Ker P = W⊥. (It is called the orthogonal projector to W.)

328. Prove that an n×n-matrix is unitary if and only if its rows (or columns) form an orthonormal basis in the coordinate Hermitian space Cⁿ.

329. Prove that the determinant of a unitary matrix is a complex number of absolute value 1.

330. Prove that the Cayley transform: C ↦ (I − C)/(I + C), well-defined for linear transformations C such that I + C is invertible, transforms unitary operators into anti-Hermitian ones and vice versa. Compute the square of the Cayley transform.

331. Prove that the commutator AB − BA of anti-Hermitian operators A and B is anti-Hermitian.

332. Give an example of a normal 2×2-matrix which is not Hermitian, anti-Hermitian, unitary, or diagonal.

333. Prove that for any n×n-matrix A and any complex numbers α, β of absolute value 1, the matrix αA + βA† is normal.

334. Prove that A : V → V is normal if and only if |Ax| = |A†x| for all x ∈ V.

The Spectral Theorem for Normal Operators

Let A : V → V be a linear transformation, v ∈ V a vector, and λ ∈ C a scalar. The vector v is called an eigenvector of A with the eigenvalue λ, if v ≠ 0 and Av = λv. In other words, A preserves the line spanned by the vector v and acts on this line as the multiplication by λ.

Theorem. A linear transformation A : V → V on a finite dimensional Hermitian vector space is normal if and only if V has an orthonormal basis of eigenvectors of A.

Proof. In one direction, the statement is almost obvious: If a basis consists of eigenvectors of A, then the matrix of A in this basis is diagonal. When the basis is orthonormal, the matrix of the Hermitian adjoint operator A† in this basis is Hermitian adjoint to the matrix of A and is also diagonal. Since all diagonal matrices commute, we conclude that A is normal.

Thus, it remains to prove that, conversely, every normal operator has an orthonormal basis of eigenvectors. We will prove this in four steps.

Step 1. Existence of eigenvalues. We need to show that there exists a scalar λ ∈ C such that the system of linear equations Ax = λx has a non-trivial solution. Equivalently, this means that the linear transformation λI − A has a non-trivial kernel. Since V is finite dimensional, this can be re-stated in terms of the determinant of the matrix of A (in any basis) as

det(λI − A) = 0.

This relation, understood as an equation for λ, is called the characteristic equation of the operator A. When A = 0, it becomes λⁿ = 0, where n = dim V. In general, it is a degree-n polynomial equation

λⁿ + p₁λⁿ⁻¹ + ··· + p_{n−1}λ + pₙ = 0,

where the coefficients p₁, ..., pₙ are certain algebraic expressions of matrix entries of A (and hence are complex numbers). According to the Fundamental Theorem of Algebra, this equation has a complex solution, say λ₀. Then det(λ₀I − A) = 0, and hence the system (λ₀I − A)x = 0 has a non-trivial solution, v ≠ 0, which is therefore an eigenvector of A with the eigenvalue λ₀.

Remark. Solutions to the system Ax = λ₀x form a linear subspace W in V, namely the kernel of λ₀I − A, and eigenvectors of A with the eigenvalue λ₀ are exactly all non-zero vectors in W. Slightly abusing terminology, W is called the eigenspace of A corresponding to the eigenvalue λ₀. Obviously, A(W) ⊂ W. Subspaces with such property are called A-invariant. Thus eigenspaces of a linear transformation A are A-invariant.

Step 2. A†-invariance of eigenspaces of A. Let W ≠ {0} be the eigenspace of a normal operator A, corresponding to an eigenvalue λ. Then for every w ∈ W,

A(A†w) = A†(Aw) = A†(λw) = λ(A†w).

Therefore A†w ∈ W, i.e. the eigenspace W is A†-invariant.

Step 3. Invariance of orthogonal complements. Let W ⊂ V be a subspace. Denote by W⊥ the orthogonal complement of the subspace W with respect to the Hermitian inner product:

W⊥ := {v ∈ V | ⟨w, v⟩ = 0 for all w ∈ W}.

Note that if e₁, ..., eₖ is a basis in W, then W⊥ is given by k linear equations ⟨eᵢ, v⟩ = 0, i = 1, ..., k, and thus has dimension ≥ n − k. On the other hand, W ∩ W⊥ = {0}, because no vector w ≠ 0 can be orthogonal to itself: ⟨w, w⟩ > 0. It follows from dimension counting formulas that dim W⊥ = n − k. Moreover, this implies that V = W ⊕ W⊥, i.e. the whole space is represented as the direct sum of two orthogonal subspaces.

We claim that if a subspace W is both A- and A†-invariant, then its orthogonal complement is also A- and A†-invariant. Indeed, suppose that A†(W) ⊂ W, and v ∈ W⊥. Then for any w ∈ W, we have: ⟨w, Av⟩ = ⟨A†w, v⟩ = 0, since A†w ∈ W. Therefore Av ∈ W⊥, i.e. W⊥ is A-invariant. By the same token, if W is A-invariant, then W⊥ is A†-invariant.

Step 4. Induction on dim V. When dim V = 1, the theorem is obvious. Assume that the theorem is proved for normal operators in spaces of dimension < n, and let dim V = n. By Step 1, A has an eigenvalue λ₀ and a non-zero eigenspace W. If W = V, any orthonormal basis of V consists of eigenvectors of A. Otherwise, by Steps 2 and 3, both W and W⊥ are A- and A†-invariant, and the restrictions of A to them are normal. On W the operator A acts as multiplication by λ₀, so any orthonormal basis of W consists of eigenvectors, while by the induction hypothesis W⊥ has an orthonormal basis of eigenvectors of A. Since V = W ⊕ W⊥, together these bases form an orthonormal basis of V consisting of eigenvectors of A. □

In the matrix form, the theorem means that a square matrix A is normal if and only if it can be transformed into a diagonal one by transformations A ↦ U†AU defined by unitary matrices U. Note that for unitary matrices, U† = U⁻¹, and therefore the above transformations coincide with similarity transformations A ↦ U⁻¹AU. This is how the matrix A of a linear transformation changes under a change of basis. When both the old and new bases are orthonormal, the transition matrix U must be unitary (because in both old and new coordinates the Hermitian inner product has the same standard form: ⟨x, y⟩ = x†y). The result follows.

EXERCISES

335. Prove that the characteristic polynomial det(λI − A) of a square matrix A does not change under similarity transformations A ↦ C⁻¹AC and thus depends only on the linear operator defined by the matrix.

336. Show that if λⁿ + p₁λⁿ⁻¹ + ··· + pₙ is the characteristic polynomial of a matrix A, then pₙ = (−1)ⁿ det A and p₁ = −tr A, and conclude that the trace is invariant under similarity transformations.

337. Prove that tr A = Σᵢ λᵢ, where λᵢ are the roots of det(λI − A) = 0.

338. Prove that if A and B are normal and AB = 0, then BA = 0. Does this remain true without the normality assumption?

339.⋆ Let operator A be normal. Prove that the set of complex numbers {⟨x, Ax⟩ : |x| = 1} is a convex polygon whose vertices are the eigenvalues of A.

340. Prove that two (or several) commuting normal operators have a common orthonormal basis of eigenvectors.

341. Prove that if A is normal and AB = BA, then AB† = B†A, A†B = BA†, and A†B† = B†A†.
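As a numerical illustration of the Spectral Theorem, here is a sketch of ours: the cyclic shift of coordinates from Exercise 368 below is unitary, hence normal, and since its eigenvalues (the nth roots of unity) are distinct, the unit eigenvectors returned by np.linalg.eig are automatically pairwise orthogonal, so that T = VDV†.

import numpy as np

# Cyclic shift T(z1, z2, z3, z4) = (z2, z3, z4, z1): unitary, hence normal.
T = np.roll(np.eye(4), -1, axis=0)
assert np.allclose(T @ T.conj().T, T.conj().T @ T)

lam, V = np.linalg.eig(T)
print(np.round(lam, 6))                                # the 4th roots of unity
print(np.allclose(V.conj().T @ V, np.eye(4)))          # eigenbasis orthonormal
print(np.allclose(V @ np.diag(lam) @ V.conj().T, T))   # T = V D V†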

Unitary Transformations

Note that if λ is an eigenvalue of a unitary operator U then |λ| = 1. Indeed, if x ≠ 0 is a corresponding eigenvector, then ⟨x, x⟩ = ⟨Ux, Ux⟩ = λλ̄⟨x, x⟩, and since ⟨x, x⟩ ≠ 0, it implies λλ̄ = 1.

Corollary 5. A transformation is unitary if and only if in some orthonormal basis its matrix is diagonal, and the diagonal entries are complex numbers of absolute value 1.

On the complex line C, multiplication by λ with |λ| = 1 and arg λ = θ defines the rotation through the angle θ. We will call this transformation on the complex line a unitary rotation. We arrive therefore at the following geometric characterization of unitary transformations.

Corollary 6. Unitary transformations in an Hermitian space of dimension n are exactly unitary rotations (through possibly different angles) in n mutually perpendicular complex directions.

Orthogonal Diagonalization

Corollary 7. A linear operator is Hermitian (respectively anti-Hermitian) if and only if in some orthonormal basis its matrix is diagonal with all real (respectively imaginary) diagonal entries.

Indeed, if Ax = λx and A† = ±A, we have:

λ⟨x, x⟩ = ⟨x, Ax⟩ = ⟨A†x, x⟩ = ±λ̄⟨x, x⟩.

Therefore λ = ±λ̄ provided that x ≠ 0, i.e. eigenvalues of an Hermitian operator are real, and those of an anti-Hermitian one imaginary. Vice versa, a real diagonal matrix is obviously Hermitian, and an imaginary one anti-Hermitian.

Recall that (anti-)Hermitian operators correspond to (anti-)Hermitian forms A(x, y) := ⟨x, Ay⟩. Applying the Spectral Theorem and reordering the basis eigenvectors in the monotonic order of the corresponding eigenvalues, we obtain the following classification results for forms.

Corollary 8. In a Hermitian space of dimension n, an Hermitian form can be transformed by unitary changes of coordinates to exactly one of the normal forms

λ₁|z₁|² + ··· + λₙ|zₙ|², λ₁ ≥ ··· ≥ λₙ.

Corollary 9. In a Hermitian space of dimension n, an anti-Hermitian form can be transformed by unitary changes of coordinates to exactly one of the normal forms

iω₁|z₁|² + ··· + iωₙ|zₙ|², ω₁ ≥ ··· ≥ ωₙ.

Uniqueness follows from the fact that eigenvalues and dimensions of eigenspaces are determined by the operators in a coordinate-less fashion.

Corollary 10. In a complex vector space of dimension n, a pair of Hermitian forms, of which the first one is positive definite, can be transformed by a choice of a coordinate system to exactly one of the normal forms:

|z₁|² + ··· + |zₙ|², λ₁|z₁|² + ··· + λₙ|zₙ|², λ₁ ≥ ··· ≥ λₙ.

This is the Orthogonal Diagonalization Theorem for Hermitian forms. It is proved in two stages. First, applying the Inertia Theorem to the positive definite form, one transforms it to the standard form; the 2nd Hermitian form changes accordingly, but remains arbitrary at this stage. Then, applying Corollary 8 of the Spectral Theorem, one transforms the 2nd Hermitian form to its normal form by transformations preserving the 1st one.

Note that one can take the positive definite sesquilinear form corresponding to the 1st Hermitian form for the Hermitian inner product, and describe the 2nd form as ⟨z, Az⟩, where A is an operator Hermitian with respect to this inner product. The operator, its eigenvalues, and their multiplicities are thus defined by the given pair of forms in a coordinate-less fashion. This guarantees that pairs with different collections λ₁ ≥ ··· ≥ λₙ of eigenvalues are non-equivalent to each other.
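Numerically, bringing a pair of Hermitian forms to the normal form of Corollary 10 is a generalized Hermitian eigenproblem. A sketch of ours with SciPy (eigh accepts the matrix of the positive definite form as its second argument; it returns the λᵢ in increasing rather than decreasing order):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 3
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Q = B.conj().T @ B + n * np.eye(n)        # positive definite Hermitian form
S = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
S = (S + S.conj().T) / 2                  # Hermitian form

lam, C = eigh(S, Q)                       # solves S c = lambda Q c
# In the new coordinates x = Cz the pair takes the normal form:
print(np.allclose(C.conj().T @ Q @ C, np.eye(n)))   # |z1|^2 + ... + |zn|^2
print(np.round(C.conj().T @ S @ C, 8))              # diag(lambda_1, ..., lambda_n)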

EXERCISES

342. Prove that all roots of characteristic polynomials of Hermitian matrices are real.

343. Find eigenspaces and eigenvalues of an orthogonal projector to a subspace W ⊂ V in an Hermitian space.

344. Prove that every Hermitian operator P satisfying P² = P is an orthogonal projector. Does this remain true if P is not Hermitian?

345. Prove directly, i.e. not referring to the Spectral Theorem, that every Hermitian operator has an orthonormal basis of eigenvectors.

346. Prove that if (λ − λ₁) ··· (λ − λₙ) is the characteristic polynomial of a normal operator A, then Σᵢ |λᵢ|² = tr(A†A).

347. Classify up to linear changes of coordinates pairs (S, A) of forms, where S is positive definite Hermitian, and A anti-Hermitian.

348. An Hermitian operator S is called positive (written: S ≥ 0) if ⟨x, Sx⟩ ≥ 0 for all x. Prove that for every positive operator S there is a unique positive square root (denoted by √S), i.e. a positive operator whose square is S.

349.⋆ Prove the Singular Value Decomposition Theorem: For a rank r linear map A : V → W between Hermitian spaces, there exist orthonormal bases v₁, ..., vₙ in V and w₁, ..., wₘ in W, and reals μ₁ ≥ ··· ≥ μᵣ > 0, such that Av₁ = μ₁w₁, ..., Avᵣ = μᵣwᵣ, Av_{r+1} = ··· = Avₙ = 0.

350. Prove that for every complex m×n-matrix A of rank r, there exist unitary m×m- and n×n-matrices U and V, and a diagonal r×r-matrix M with positive diagonal entries, such that

A = U† [[M, 0], [0, 0]] V.

351. Using the Singular Value Decomposition Theorem with m = n, prove that every linear transformation A of an Hermitian space has a polar decomposition A = SU, where S is positive, and U is unitary.

352. Prove that the polar decomposition A = SU is unique when A is invertible; namely S = √(AA†), and U = S⁻¹A. What are polar decompositions of non-zero 1×1-matrices?
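Exercises 349 through 352 can be explored numerically. Here is a sketch of ours: NumPy's SVD A = WΣV† immediately yields a polar decomposition A = SU with S = WΣW† = √(AA†) positive and U = WV† unitary (one standard construction; the exercises ask you to justify it).

import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

W, sigma, Vh = np.linalg.svd(A)          # A = W diag(sigma) V†
S = W @ np.diag(sigma) @ W.conj().T      # Hermitian with eigenvalues >= 0
U = W @ Vh                               # unitary

print(np.allclose(S @ U, A))                        # A = S U
print(np.allclose(U @ U.conj().T, np.eye(n)))       # U unitary
print(np.allclose(S, S.conj().T))                   # S Hermitian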

2 Euclidean Geometry

Euclidean Spaces

Let V be a real vector space. A Euclidean inner product (or Euclidean structure) on V is defined as a positive definite symmetric bilinear form ⟨·, ·⟩. A real vector space equipped with a Euclidean inner product is called a Euclidean space. A Euclidean inner product allows one to talk about distances between points and angles between directions:

|x − y| = √⟨x − y, x − y⟩,  cos θ(x, y) := ⟨x, y⟩ / (|x| |y|).

It follows from the Inertia Theorem that every finite dimensional Euclidean vector space has an orthonormal basis. In coordinates corresponding to an orthonormal basis e₁, ..., eₙ the inner product is given by the standard formula:

⟨x, y⟩ = Σᵢ,ⱼ xᵢyⱼ⟨eᵢ, eⱼ⟩ = x₁y₁ + ··· + xₙyₙ.

Thus, every Euclidean space V of dimension n can be identified with the coordinate Euclidean space Rⁿ by an isomorphism Rⁿ → V respecting inner products. Such an isomorphism is not unique, but can be composed with any invertible linear transformation U : V → V preserving the Euclidean structure:

⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ V.

Such transformations are called orthogonal.

A Euclidean structure on a vector space V allows one to identify the space with its dual V* by the rule that to a vector v ∈ V assigns the linear function on V whose value at a point x ∈ V is equal to the inner product ⟨v, x⟩. Respectively, given a linear map A : V → W between Euclidean spaces, the adjoint map Aᵗ : W* → V* can be considered as a map between the spaces themselves: Aᵗ : W → V. The defining property of the adjoint map reads:

⟨Aᵗw, v⟩ = ⟨w, Av⟩ for all v ∈ V and w ∈ W.

Consequently, matrices of adjoint maps A and Aᵗ with respect to orthonormal bases of the Euclidean spaces V and W are transposed to each other.

As in the case of Hermitian spaces, one easily derives that a linear transformation U : V → V is orthogonal if and only if U⁻¹ = Uᵗ. In the matrix form, the relation UᵗU = I means that the columns of U form an orthonormal set in the coordinate Euclidean space.

Our goal here is to develop the spectral theory for real normal operators, i.e. linear transformations A : V → V on a Euclidean space commuting with their transposed operators: AᵗA = AAᵗ. Symmetric (Aᵗ = A), anti-symmetric (Aᵗ = −A), and orthogonal transformations are examples of normal operators in Euclidean geometry. The right way to proceed is to consider Euclidean geometry as Hermitian geometry equipped with an additional, real structure, and apply the Spectral Theorem of Hermitian geometry to real normal operators extended to the complex space.

EXERCISES

353. Prove the Cauchy–Schwarz inequality for Euclidean inner products: ⟨x, y⟩² ≤ ⟨x, x⟩⟨y, y⟩, strictly, unless x and y are proportional, and derive from this that the angle between non-zero vectors is well-defined.

354. For x, y ∈ Rⁿ, put ⟨x, x⟩ = 2Σᵢ₌₁ⁿ xᵢ² − 2Σᵢ₌₁ⁿ⁻¹ xᵢx_{i+1}. Show that the corresponding symmetric bilinear form defines on Rⁿ a Euclidean structure, and find the angles between the standard coordinate axes in Rⁿ.

355. Prove that ⟨x, x⟩ := 2Σᵢ≤ⱼ xᵢxⱼ defines in Rⁿ a Euclidean structure, find pairwise angles between the standard coordinate axes, and show that permutations of coordinates define orthogonal transformations.

356. In the standard Euclidean space Rⁿ⁺¹ with coordinates x₀, ..., xₙ, consider the hyperplane H given by the equation x₀ + ··· + xₙ = 0. Find explicitly a basis {fᵢ} in H, in which the Euclidean structure has the same form as in the previous exercise, and then yet another basis {hᵢ} in which it has the same form as in the exercise preceding it.

357. Prove that if U is orthogonal, then det U = ±1.

358. Provide a geometric description of orthogonal transformations of the Euclidean plane. Which of them have determinant 1, and which −1?

359. Prove that an n×n-matrix U defines an orthogonal transformation in the standard Euclidean space Rⁿ if and only if the columns of U form an orthonormal basis.

360. Show that the rows of an orthogonal matrix form an orthonormal basis.

Complexification

Since R ⊂ C, every complex vector space can be considered as a real vector space simply by "forgetting" that one can multiply by non-real scalars. This operation is called realification; applied to a C-vector space V, it produces an R-vector space, denoted V^R, of real dimension twice the complex dimension of V.

In the reverse direction, to a real vector space V one can associate a complex vector space, V^C, called the complexification of V. As a real vector space, it is the direct sum of two copies of V:

V^C := {(x, y) | x, y ∈ V}.

Thus the addition is performed componentwise, while the multiplication by complex scalars α + iβ is introduced with the thought in mind that (x, y) stands for x + iy:

(α + iβ)(x, y) := (αx − βy, βx + αy).

This results in a C-vector space V^C whose complex dimension equals the real dimension of V.

Example. (Rⁿ)^C = Cⁿ = {x + iy | x, y ∈ Rⁿ}.

A productive point of view on complexification is that it is a complex vector space with an additional structure that "remembers" that the space was constructed from a real one. This additional structure is the operation of complex conjugation (x, y) ↦ (x, −y).

The operation in itself is a map σ : V^C → V^C, satisfying σ² = id, which is anti-linear over C. The latter means that σ(λz) = λ̄σ(z) for all λ ∈ C and all z ∈ V^C. In other words, σ is R-linear, but anti-commutes with multiplication by i: σ(iz) = −iσ(z).

Conversely, let W be a complex vector space equipped with an anti-linear operator whose square is the identity⁴:

σ : W → W,  σ² = id,  σ(λz) = λ̄σ(z) for all λ ∈ C, z ∈ W.

Let V denote the real subspace in W that consists of all σ-invariant vectors. We claim that W is canonically identified with the complexification of V: W = V^C. Indeed, every vector z ∈ W is uniquely written as the sum of σ-invariant and σ-anti-invariant vectors:

z = (z + σz)/2 + (z − σz)/2.

⁴A transformation whose square is the identity is called an involution.
Since σi = −iσ, multiplication by i transforms σ-invariant vectors to σ-anti-invariant ones, and vice versa. Thus, as a real space, W is the direct sum V ⊕ (iV) = {x + iy | x, y ∈ V}, where multiplication by i acts in the fashion required for the complexification: i(x + iy) = −y + ix.

The construction of complexification and its abstract description in terms of the complex conjugation operator σ are the tools that allow one to carry over results about complex vector spaces to real vector spaces. The idea is to consider real objects as complex ones invariant under the complex conjugation σ, and apply (or improve) theorems of complex linear algebra in a way that would respect σ.

Example. A real matrix can be considered as a complex one. This way an R-linear map defines a C-linear map (on the complexified space). More abstractly, given an R-linear map A : V → V, one can associate to it a C-linear map A^C : V^C → V^C by A^C(x, y) := (Ax, Ay). This map is real in the sense that it commutes with the complex conjugation: A^Cσ = σA^C. Vice versa, let B : V^C → V^C be a C-linear map that commutes with σ: σ(Bz) = Bσ(z) for all z ∈ V^C. When σ(z) = ±z, we find σ(Bz) = ±Bz, i.e. the subspaces V and iV of real and imaginary vectors are B-invariant. Moreover, since B is C-linear, we find that for x, y ∈ V, B(x + iy) = Bx + iBy. Thus B = A^C, where the linear operator A : V → V is obtained by restricting B to V.

Our nearest goal is to obtain real analogues of the Spectral Theorem and its corollaries. One way to do it is to combine corresponding complex results with complexification. Let V be a Euclidean space. We extend the inner product to the complexification V^C in such a way that it becomes an Hermitian inner product. Namely, for all x, y, x′, y′ ∈ V, put

⟨x + iy, x′ + iy′⟩ = ⟨x, x′⟩ + ⟨y, y′⟩ + i⟨x, y′⟩ − i⟨y, x′⟩.

It is straightforward to check that this form on V^C is sesquilinear and Hermitian symmetric. It is also positive definite, since ⟨x + iy, x + iy⟩ = |x|² + |y|². Note that changing the signs of y and y′ preserves the real part and reverses the imaginary part of the form. In other words, for all z, w ∈ V^C, we have:

⟨σ(z), σ(w)⟩ = ⟨w, z⟩, the conjugate of ⟨z, w⟩.

This identity expresses the fact that the Hermitian structure of V^C came from a Euclidean structure on V. When A : V^C → V^C is a real operator, i.e. σAσ = A, the Hermitian adjoint operator A† is also real.⁵ Indeed, since σ² = id, we find that for all z, w ∈ V^C

⟨σA†σz, w⟩ = ⟨σw, A†σz⟩ = ⟨Aσw, σz⟩ = ⟨σAw, σz⟩ = ⟨z, Aw⟩,

i.e. σA†σ = A†. In particular, complexifications of orthogonal (U⁻¹ = Uᵗ), symmetric (Aᵗ = A), anti-symmetric (Aᵗ = −A), and normal (AᵗA = AAᵗ) operators in a Euclidean space are respectively unitary, Hermitian, anti-Hermitian, and normal operators on the complexified space, commuting with the complex conjugation.

⁵This is obvious in the matrix form: in a real orthonormal basis of V (which is a complex orthonormal basis of V^C) A has a real matrix, so that A† = Aᵗ. Here we argue the "hard way" in order to illustrate how various aspects of σ-invariance fit together.

EXERCISES

361. Consider Cⁿ as a real vector space, and describe its complexification.

362. Let σ be the complex conjugation operator on Cⁿ. Consider Cⁿ as a real vector space. Show that σ is symmetric and orthogonal.

363. On the complex line C¹, find all involutions σ anti-commuting with the multiplication by i: σi = −iσ.

364. Let σ be an involution on a complex vector space W. Considering W as a real vector space, find the eigenvalues of σ and describe the corresponding eigenspaces.

The Real Spectral Theorem

Theorem. Let V be a Euclidean space, and A : V → V a normal operator. Then in the complexification V^C, there exists an orthonormal basis of eigenvectors of A^C which is invariant under complex conjugation and such that the eigenvalues corresponding to conjugated eigenvectors are conjugated.

Proof. Applying the complex Spectral Theorem to the normal operator B = A^C, we obtain a decomposition of the complexified space V^C into a direct orthogonal sum of eigenspaces W₁, ..., Wᵣ of B corresponding to distinct complex eigenvalues λ₁, ..., λᵣ. Note that if v is an eigenvector of B with an eigenvalue μ, then Bσv = σBv = σ(μv) = μ̄σv, i.e. σv is an eigenvector of B with the conjugate eigenvalue μ̄. This shows that if λᵢ is a non-real eigenvalue, then its conjugate λ̄ᵢ is also one of the eigenvalues of B (say, λⱼ), and the corresponding eigenspaces are conjugated: σ(Wᵢ) = Wⱼ. By the same token, if λₖ is real, then σ(Wₖ) = Wₖ. This last equality means that Wₖ itself is the complexification of a real space, namely of the σ-invariant part of Wₖ. It coincides with the space Ker(λₖI − A) ⊂ V of real eigenvectors of A with the eigenvalue λₖ. Thus, to construct a required orthonormal basis, we take: for each real eigenvalue λₖ, a Euclidean orthonormal basis in the corresponding real eigenspace, and for each pair Wᵢ, Wⱼ of conjugate eigenspaces, an Hermitian orthonormal basis {f_α} in Wᵢ and the conjugate basis {σ(f_α)} in Wⱼ = σ(Wᵢ). The vectors of all these bases altogether form an orthonormal basis of V^C satisfying our requirements. □

Example 1. Identify C with the Euclidean plane R² in the usual way, and consider the operator (x + iy) ↦ (α + iβ)(x + iy) of multiplication by a given complex number α + iβ. In the basis 1, i, it has the matrix

A = [[α, −β], [β, α]].

Since Aᵗ represents multiplication by α − iβ, it commutes with A. Therefore A is normal. It is straightforward to check that

z = (1/√2)(1, −i)ᵗ and z̄ = (1/√2)(1, i)ᵗ

are complex eigenvectors of A with the eigenvalues α + iβ and α − iβ respectively, and form an Hermitian orthonormal basis in (R²)^C.

Example 2. If A is a linear transformation in Rⁿ, and λ₀ is a non-real root of its characteristic polynomial det(λI − A), then the system of linear equations Az = λ₀z has non-trivial solutions, which cannot be real though. Let z = u + iv be a complex eigenvector of A with the eigenvalue λ₀ = α + iβ. Then σz = u − iv is an eigenvector of A with the eigenvalue λ̄₀ = α − iβ. Since λ₀ ≠ λ̄₀, the vectors z and σz are linearly independent over C, and hence the real vectors u and v must be linearly independent over R. Consider the plane Span(u, v) ⊂ Rⁿ. Since

A(u − iv) = (α − iβ)(u − iv) = (αu − βv) − i(βu + αv),

we conclude that A preserves this plane and in the basis u, v in it acts by the matrix [[α, β], [−β, α]] (note the sign change!). If we assume in addition that A is normal (with respect to the standard Euclidean structure in Rⁿ), then the eigenvectors z and σz must be Hermitian orthogonal, i.e.

⟨u − iv, u + iv⟩ = ⟨u, u⟩ − ⟨v, v⟩ + 2i⟨u, v⟩ = 0.

We conclude that ⟨u, v⟩ = 0 and |u|² − |v|² = 0, i.e. u and v are orthogonal and have the same length. Normalizing the length to 1, we obtain an orthonormal basis of the A-invariant plane, in which the transformation A acts as in Example 1. The geometry of this transformation is known to us from studying geometry of complex numbers: It is the composition of the rotation through the angle arg(λ₀) with the expansion by the factor |λ₀|. We will call such a transformation of the Euclidean plane a complex multiplication or multiplication by a complex scalar, λ₀.

Corollary 1. Given a normal operator on a Euclidean space, the space can be represented as a direct orthogonal sum of invariant lines and planes, on each of which the transformation acts as multiplication by a real or complex scalar respectively.

Corollary 2. A transformation in a Euclidean space is orthogonal if and only if the space can be represented as the direct orthogonal sum of invariant lines and planes on each of which the transformation acts as multiplication by ±1 and rotation respectively.

Corollary 3. In a Euclidean space, every symmetric operator has an orthonormal basis of eigenvectors.

Corollary 4. Every quadratic form in a Euclidean space of dimension n can be transformed by an orthogonal change of coordinates to exactly one of the normal forms:

λ₁x₁² + ··· + λₙxₙ², λ₁ ≥ ··· ≥ λₙ.

Corollary 5. In a Euclidean space of dimension n, every anti-symmetric bilinear form can be transformed by an orthogonal change of coordinates to exactly one of the normal forms

⟨x, y⟩ = Σᵢ₌₁ʳ ωᵢ(x_{2i−1}y_{2i} − x_{2i}y_{2i−1}), ω₁ ≥ ··· ≥ ωᵣ > 0, 2r ≤ n.

Corollary 6. Every real normal matrix A can be written in the form A = UᵗMU, where U is an orthogonal matrix, and M is a block-diagonal matrix with each block either of size 1, or of size 2 of the form [[α, −β], [β, α]], where α² + β² ≠ 0.

If A is symmetric, then only blocks of size 1 are present (i.e. M is diagonal). If A is anti-symmetric, then blocks of size 1 are zero, and those of size 2 are of the form [[0, −ω], [ω, 0]], where ω > 0. If A is orthogonal, then all blocks of size 1 are equal to ±1, and blocks of size 2 have the form [[cos θ, −sin θ], [sin θ, cos θ]], where 0 < θ < π.
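Corollary 6 is what the real Schur decomposition computes for a normal matrix: SciPy returns an orthogonal matrix and a quasi-triangular one, which in the normal case is block diagonal. A sketch of ours, tested on an anti-symmetric matrix (expect 2×2 anti-symmetric blocks, up to signs and ordering):

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
A = B - B.T                      # anti-symmetric, hence normal

M, U = schur(A, output='real')   # A = U M U^t with U orthogonal
print(np.round(M, 6))            # 2x2 blocks [[0, w], [-w, 0]] on the diagonal
print(np.allclose(U @ M @ U.T, A))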

EXERCISES

365. Prove that an operator on a Euclidean vector space is normal if and only if it is the sum of commuting symmetric and anti-symmetric operators.

366. Prove that in the complexification (R²)^C of a Euclidean plane, all rotations of R² have a common basis of eigenvectors, and find these eigenvectors.

367. Prove that an orthogonal transformation in R³ is either the rotation through an angle θ, 0 ≤ θ ≤ π, about some axis, or the composition of such a rotation with the reflection about the plane perpendicular to the axis.

368. Find an orthonormal basis in Cⁿ in which the transformation defined by the cyclic permutation of coordinates (z₁, z₂, ..., zₙ) ↦ (z₂, ..., zₙ, z₁) is diagonal.

369. In the coordinate Euclidean space Rⁿ with n ≤ 4, find real and complex normal forms of orthogonal transformations defined by various permutations of coordinates.

370. Transform to normal forms by orthogonal transformations: (a) x₁x₂ + x₃x₄, (b) 2x₁² − 4x₁x₂ + x₂² − 4x₂x₃, (c) 5x₁² + 6x₂² + 4x₃² − 4x₁x₂ − 4x₁x₃.

371. Show that any anti-symmetric bilinear form on R² is proportional to

det[x, y] = |x₁ x₂; y₁ y₂| = x₁y₂ − x₂y₁.

Find the operator corresponding to this form, its complex eigenvalues and eigenvectors.

372. In Euclidean spaces, classify all operators which are both orthogonal and anti-symmetric.

373. Recall that a bilinear form on V is called non-degenerate if the corresponding linear map V → V* is an isomorphism, and degenerate otherwise. Prove that all non-degenerate anti-symmetric bilinear forms on R²ⁿ are equivalent to each other, and that all anti-symmetric bilinear forms on R^{2n+1} are degenerate.

374. Derive Corollaries 1 through 6 from the Real Spectral Theorem.

375. Let U and V be two subspaces of dimension 2 in the Euclidean 4-space. Consider the map T : V → V defined as the composition V ⊂ R⁴ → U ⊂ R⁴ → V, where the arrows are the orthogonal projections to U and V respectively. Prove that T is positive, and that its eigenvalues have the form cos²φ, cos²ψ, where φ, ψ are certain angles, 0 ≤ φ, ψ ≤ π/2.

376. Solve Gelfand's problem: In the Euclidean 4-space, classify pairs of planes passing through the origin up to orthogonal transformations of the space.

Courant–Fischer's Minimax Principle

One of the consequences (equivalent to Corollary 4) of the Spectral Theorem is that a pair (Q, S) of quadratic forms in Rⁿ, of which the first one is positive definite, can be transformed by a linear change of coordinates to the normal form:

Q = x₁² + ··· + xₙ², S = λ₁x₁² + ··· + λₙxₙ², λ₁ ≥ ··· ≥ λₙ.

The eigenvalues λ₁ ≥ ··· ≥ λₙ form the spectrum of the pair (Q, S). The following result gives a coordinate-less, geometric description of the spectrum (and thus implies the Orthogonal Diagonalization Theorem as it was stated in the Introduction).

Theorem. The k-th greatest spectral number is given by

λₖ = max_{W : dim W = k}  min_{x ∈ W − 0}  S(x)/Q(x),

where the maximum is taken over all k-dimensional subspaces W ⊂ Rⁿ, and the minimum over all non-zero vectors in the subspace.

Proof. When W is given by the equations x_{k+1} = ··· = xₙ = 0, the minimal ratio S(x)/Q(x) (achieved on vectors proportional to eₖ) is equal to λₖ, because

λ₁x₁² + ··· + λₖxₖ² ≥ λₖ(x₁² + ··· + xₖ²) when λ₁ ≥ ··· ≥ λₖ.

Therefore it suffices to prove that for every other k-dimensional subspace W the minimal ratio cannot be greater than λₖ. For this, denote by V the subspace of dimension n − k + 1 given by the equations x₁ = ··· = x_{k−1} = 0. Since λₖ ≥ ··· ≥ λₙ, we have:

λₖxₖ² + ··· + λₙxₙ² ≤ λₖ(xₖ² + ··· + xₙ²),

i.e. for all non-zero vectors x in V the ratio S(x)/Q(x) ≤ λₖ. Now we invoke the dimension counting argument: dim W + dim V = k + (n − k + 1) = n + 1 > dim Rⁿ, and conclude that W has a non-trivial intersection with V. Let x be a non-zero vector in W ∩ V. Then S(x)/Q(x) ≤ λₖ, and hence the minimum of the ratio S/Q on W − 0 cannot exceed λₖ. □

Applying the Theorem to the pair (Q, −S), we obtain yet another characterization of the spectrum:

λₖ = min_{W : dim W = n−k+1}  max_{x ∈ W − 0}  S(x)/Q(x).

Formulating some applications, we assume that the space Rⁿ is Euclidean, and refer to the spectrum of the pair (Q, S), where Q = |x|², simply as the spectrum of S.

Corollary 1. When a quadratic form increases, its spectral numbers do not decrease: If S ≤ S′, then λₖ ≤ λₖ′ for all k = 1, ..., n.

Proof. Indeed, since S/Q ≤ S′/Q, the minimum of the ratio S/Q on every k-dimensional subspace W cannot exceed that of S′/Q, which in particular remains true for that W on which the maximum of S/Q equal to λₖ is achieved. □

The following result is called Cauchy's interlacing theorem.

Corollary 2. Let λ₁ ≥ ··· ≥ λₙ be the spectrum of a quadratic form S, and λ₁′ ≥ ··· ≥ λ′_{n−1} be the spectrum of the quadratic form S′ obtained by restricting S to a given hyperplane Rⁿ⁻¹ ⊂ Rⁿ passing through the origin. Then:

λ₁ ≥ λ₁′ ≥ λ₂ ≥ λ₂′ ≥ ··· ≥ λ_{n−1} ≥ λ′_{n−1} ≥ λₙ.

Proof. The maximum over all k-dimensional subspaces W cannot be smaller than the maximum (of the same quantities) over subspaces lying inside the hyperplane. This proves that λₖ ≥ λₖ′. Applying the same argument to −S and subspaces of dimension n − k − 1, we conclude that −λ_{k+1} ≥ −λₖ′, i.e. λₖ′ ≥ λ_{k+1}. □

An ellipsoid in a Euclidean space is defined as the level-1 set E = {x | S(x) = 1} of a positive definite quadratic form S. It follows from the Spectral Theorem that every ellipsoid can be transformed by an orthogonal transformation to principal axes: a normal form

x₁²/α₁² + ··· + xₙ²/αₙ² = 1, 0 < α₁ ≤ ··· ≤ αₙ.

The vectors x = ±αₖeₖ lie on the ellipsoid, and their lengths αₖ are called semiaxes of E. They are related to the spectral numbers λ₁ ≥ ··· ≥ λₙ > 0 of the quadratic form by αₖ⁻¹ = √λₖ. From Corollaries 1 and 2 respectively, we obtain:

Given two concentric ellipsoids enclosing one another, the semiaxes of the inner ellipsoid do not exceed the corresponding semiaxes of the outer: If E′ ⊂ E, then αₖ′ ≤ αₖ for all k = 1, ..., n.

Semiaxes of a given ellipsoid are interlaced by semiaxes of any section of it by a hyperplane passing through the center: If E′ = E ∩ Rⁿ⁻¹, then αₖ ≤ αₖ′ ≤ α_{k+1} for k = 1, ..., n − 1.

EXERCISES

377. Prove that every ellipsoid in Rⁿ has n pairwise perpendicular hyperplanes of bilateral symmetry.

378. Given an ellipsoid E ⊂ R³, find a plane passing through its center and intersecting E in a circle.

379. Formulate and prove counterparts of Courant–Fischer's minimax principle and Cauchy's interlacing theorem for Hermitian forms.

380. Prove that the semiaxes α₁ ≤ α₂ ≤ ... of an ellipsoid in Rⁿ and the semiaxes α₁′ ≤ α₂′ ≤ ... of its section by a linear subspace of codimension k are related by the inequalities: αᵢ ≤ αᵢ′ ≤ α_{i+k}, i = 1, ..., n − k.

381. From the Real Spectral Theorem, derive the Orthogonal Diagonalization Theorem as it is formulated in the Introduction, i.e. for pairs of quadratic forms on Rⁿ, one of which is positive definite.
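Cauchy's interlacing theorem (Corollary 2) is easy to test numerically: restricting S to the coordinate hyperplane xₙ = 0 amounts to deleting the last row and column of its matrix. A sketch of ours:

import numpy as np

rng = np.random.default_rng(4)
n = 5
B = rng.normal(size=(n, n))
S = (B + B.T) / 2                                 # a quadratic form on R^n

lam = np.linalg.eigvalsh(S)[::-1]                 # spectrum of S, decreasing
lam_p = np.linalg.eigvalsh(S[:-1, :-1])[::-1]     # spectrum of the restriction

# lambda_1 >= lambda'_1 >= lambda_2 >= ... >= lambda'_{n-1} >= lambda_n
print(bool(np.all(lam[:-1] >= lam_p) and np.all(lam_p >= lam[1:])))  # True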

Small Oscillations

Let us consider the system of n identical masses m positioned at the vertices of a regular n-gon, which are cyclically connected by n identical elastic springs, and can oscillate in the direction perpendicular to the plane of the n-gon.

Assuming that the amplitudes of the oscillation are small, we can describe the motion of the masses as solutions to the following system of n second-order Ordinary Differential Equations (ODE for short) expressing Newton's law of motion (mass × acceleration = force):

mẍ₁ = −k(x₁ − xₙ) − k(x₁ − x₂),
mẍ₂ = −k(x₂ − x₁) − k(x₂ − x₃),
···
mẍ_{n−1} = −k(x_{n−1} − x_{n−2}) − k(x_{n−1} − xₙ),
mẍₙ = −k(xₙ − x_{n−1}) − k(xₙ − x₁).

Here x₁, ..., xₙ are the displacements of the n masses in the direction perpendicular to the plane, and k characterizes the rigidity of the springs.⁶

Figure 43

In fact the above ODE system can be read off a pair of quadratic forms: the kinetic energy

K(ẋ) = mẋ₁²/2 + mẋ₂²/2 + ··· + mẋₙ²/2,

and the potential energy

P(x) = k(x₁ − x₂)²/2 + k(x₂ − x₃)²/2 + ··· + k(xₙ − x₁)²/2.

⁶More precisely (see Figure 43, where n = 4), we may assume that the springs are stretched, but the masses are confined to the vertical rods and can only slide along them without friction. When a spring of length L is horizontal (Δx = 0), the stretching force T is compensated by the reactions of the rods. When Δx ≠ 0, the horizontal component of the stretching force is still compensated, but the vertical component contributes to the right hand side of Newton's equations. When Δx is small, the contribution equals approximately −T(Δx)/L (so that k = T/L).

Namely, for any conservative mechanical system with quadratic kinetic and potential energy functions

K(ẋ) = ⟨ẋ, Mẋ⟩/2, P(x) = ⟨x, Qx⟩/2,

the equations of motion assume the form

Mẍ = −Qx.

A linear change of variables x = Cy transforms the kinetic and potential energy functions to a new form with the matrices M′ = CᵗMC and Q′ = CᵗQC. On the other hand, the same change of variables transforms the ODE system Mẍ = −Qx to MCÿ = −QCy. Multiplying by Cᵗ, we get M′ÿ = −Q′y and see that the relationship between K, P and the ODE system is preserved. The relationship is therefore intrinsic, i.e. independent of the choice of coordinates.

Since the kinetic energy is positive, we can apply the Orthogonal Diagonalization Theorem in order to transform K and P simultaneously to

(Ẋ₁² + ··· + Ẋₙ²)/2 and (λ₁X₁² + ··· + λₙXₙ²)/2.

The corresponding ODE system splits into unlinked 2nd-order ODEs

Ẍ₁ = −λ₁X₁, ..., Ẍₙ = −λₙXₙ.

When the potential energy is also positive, we obtain a system of n unlinked harmonic oscillators with frequencies ω = √λ₁, ..., √λₙ.

Example 1: Harmonic oscillators. The equation Ẍ = −ω²X has solutions

X(t) = A cos ωt + B sin ωt,

where A = X(0) and B = Ẋ(0)/ω are arbitrary real constants. It is convenient to plot the solutions on the phase plane with coordinates (X, Y) = (X, Ẋ/ω). In such coordinates, the equations of motion assume the form

Ẋ = ωY, Ẏ = −ωX,

and the solutions

(X(t), Y(t))ᵗ = [[cos ωt, sin ωt], [−sin ωt, cos ωt]] (X(0), Y(0))ᵗ.

In other words (see Figure 44), the motion on the phase plane is described as clockwise rotation with the angular velocity ω. Since there is one trajectory through each point of the phase plane, the general theory of Ordinary Differential Equations (namely, the theorem about uniqueness and existence of solutions with given initial conditions) guarantees that these are all the solutions to the ODE Ẍ = −ω²X.
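The rotation description of the phase flow can be checked against direct numerical integration; a sketch of ours using SciPy's ODE solver (the parameter values are arbitrary):

import numpy as np
from scipy.integrate import solve_ivp

omega, X0, V0 = 2.0, 1.0, 0.5        # X'' = -omega^2 X, X(0) = X0, X'(0) = V0

sol = solve_ivp(lambda t, s: [s[1], -omega**2 * s[0]], (0.0, 5.0), [X0, V0],
                dense_output=True, rtol=1e-9, atol=1e-9)

t = np.linspace(0.0, 5.0, 7)
closed_form = X0 * np.cos(omega * t) + (V0 / omega) * np.sin(omega * t)
print(np.allclose(sol.sol(t)[0], closed_form, atol=1e-6))   # True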


Figure 44

Let us now examine the behavior of our system of n masses cyclically connected by the springs. To find the common orthogonal basis of the pair of quadratic forms K and P, we first note that, since K is proportional to the standard Euclidean structure, it suffices to find an orthogonal basis of eigenvectors of the matrix Q.

In order to give a concise description of the ODE system mẍ = Qx, introduce the operator T : Rⁿ → Rⁿ which cyclically shifts the coordinates: T(x₁, x₂, ..., xₙ)ᵗ = (x₂, ..., xₙ, x₁)ᵗ. Then Q = k(T + T⁻¹ − 2I). Note that the operator T is obviously orthogonal, and hence unitary in the complexification Cⁿ of the space Rⁿ. We will now construct its basis of eigenvectors, which should be called the Fourier basis.⁷ Namely, let xₖ = ζᵏ, where ζⁿ = 1. Then the sequence {xₖ} is repeating every n terms, and x_{k+1} = ζxₖ for all k ∈ Z. Thus Tx = ζx, where x = (ζ, ζ², ..., ζⁿ)ᵗ. When ζ = e^{2π√−1 l/n}, l = 1, 2, ..., n, runs through the various nth roots of unity, we obtain n eigenvectors of the operator T, which correspond to different eigenvalues, and hence are linearly independent. They are automatically pairwise Hermitian orthogonal (since T is unitary), and happen to have the same Hermitian inner square, equal to n.

⁷After the French mathematician Joseph Fourier (1768–1830), and by analogy with the theory of Fourier series.

Thus, when divided by √n, these vectors form an orthonormal basis in Cⁿ. Besides, this basis is invariant under complex conjugation (because replacing the eigenvalue ζ with ζ̄ also conjugates the corresponding eigenvector). Now, applying this to Q = k(T + T⁻¹ − 2I), we conclude that Q is diagonal in the Fourier basis with the eigenvalues

k(ζ + ζ⁻¹ − 2) = 2k(cos(2πl/n) − 1) = −4k sin²(πl/n), l = 1, 2, ..., n.

When ζ ≠ ζ̄, this pair of roots of unity yields the same eigenvalue of Q, and the real and imaginary parts of the Fourier eigenvector x = (ζ, ..., ζⁿ)ᵗ span in Rⁿ the 2-dimensional eigenplane of the operator Q. When ζ = 1 or −1 (the latter happens only when n is even), the corresponding eigenvalue of Q is 0 or −4k respectively, and the eigenspace is 1-dimensional (spanned by the respective Fourier vectors (1, ..., 1)ᵗ and (−1, 1, ..., −1, 1)ᵗ). The whole system decomposes into a superposition of independent "modes of oscillation" (patterns) described by the equations

Ẍₗ = −ωₗ²Xₗ, where ωₗ = 2√(k/m) sin(πl/n), l = 1, ..., n.

Figure 45

Example 2: n = 4. Here ζ = ±1, ±i. The value ζ = 1 corresponds to the eigenvector (1, 1, 1, 1) and the eigenvalue 0. This "mode of oscillation" is described by the ODE Ẍ = 0, and actually corresponds to the steady translation of the chain as a whole with constant speed (Figure 45a). The value ζ = −1 corresponds to the eigenvector (−1, 1, −1, 1) (Figure 45b) with the frequency of oscillation 2√(k/m). The values ζ = ±i correspond to the eigenvectors (±i, −1, ∓i, 1). Their real and imaginary parts (0, −1, 0, 1) and (1, 0, −1, 0) (Figures 45c,d) span the plane of modes of oscillation with the same frequency √(2k/m). The general motion of the system is the superposition of these four patterns.

Remark. In fact the oscillatory system we've just studied can be considered as a model of sound propagation in a one-dimensional crystal. One can similarly analyze propagation of sound waves in 2-dimensional membranes of rectangular or periodic (toroidal) shape, or in similar 3-dimensional regions. Physicists often call the resulting picture, the superposition of independent sinusoidal waves, an ideal gas of phonons. Here "ideal gas" refers to the independence of the eigenmodes of oscillation (therefore behaving as non-interacting particles of a sparse gas), and "phonons" emphasizes that the "particles" are rather bells producing sound waves of various frequencies. The mathematical aspect of this theory is even more general: the Orthogonal Diagonalization Theorem guarantees that small oscillations in any conservative mechanical system near a local minimum of potential energy are described as superpositions of independent harmonic oscillations.
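The whole analysis is easy to confirm numerically (a sketch of ours): build Q = k(T + T⁻¹ − 2I) for the cyclic shift T, diagonalize, and compare the resulting frequencies with ωₗ = 2√(k/m) sin(πl/n). For n = 4 this reproduces the spectrum 0, √(2k/m), √(2k/m), 2√(k/m) of Example 2.

import numpy as np

n, k, m = 4, 1.0, 1.0
T = np.roll(np.eye(n), -1, axis=0)        # cyclic shift of coordinates
Q = k * (T + T.T - 2 * np.eye(n))         # T^{-1} = T^t for the shift

# m X'' = Q X: frequencies are sqrt(-eigenvalue / m).
eigs = np.linalg.eigvalsh(Q)
omega_num = np.sort(np.sqrt(np.maximum(-eigs, 0.0) / m))
l = np.arange(1, n + 1)
omega_formula = np.sort(2 * np.sqrt(k / m) * np.sin(np.pi * l / n))
print(np.round(omega_num, 6))                             # [0, 1.414..., 1.414..., 2]
print(np.allclose(omega_num, omega_formula, atol=1e-9))   # True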

EXERCISES

382. A mass m is suspended on a weightless rod of length l (as a clock pendulum), and is swinging without friction under the action of the force of gravity mg (where g is the gravitational constant). Show that the Newton equation of motion of the pendulum has the form lẍ = −g sin x, where x is the angle the rod makes with the downward vertical direction, and show that the frequency of small oscillations of the pendulum near the lower equilibrium (x = 0) is equal to √(g/l).

383. In the mass-spring chain (studied in the text) with n = 3, find the frequencies and describe explicitly the modes of oscillation.

384. The same, for 6 masses positioned at the vertices of the regular hexagon (like the 6 carbon atoms in benzene molecules).

385.⋆ Given n numbers C₁, ..., Cₙ (real or complex), we form from them an infinite periodic sequence {Cₖ}, ..., C₋₁, C₀, C₁, ..., Cₙ, C_{n+1}, ..., where C_{k+n} = Cₖ. Let C denote the n×n-matrix whose entries are cᵢⱼ = C_{j−i}. Prove that all such matrices (corresponding to different n-periodic sequences) are normal, that they commute, and find their common eigenvectors.

386.⋆ Study small oscillations of a 2-dimensional crystal of toroidal shape consisting of m×n identical masses (positioned in m circular "rows" and n circular "columns", each interacting only with its four neighbors).

387. Using Courant–Fischer's minimax principle, explain why a cracked bell sounds lower than the intact one.

3 Jordan Canonical Forms

Characteristic Polynomials and Root Spaces

Let V be a finite dimensional K-vector space. We do not assume that V is equipped with any structure in addition to the structure of a K-vector space. In this section, we study the geometry of linear operators on V. In other words, we study the problem of classification of linear operators A : V → V up to similarity transformations A ↦ C⁻¹AC, where C stands for arbitrary invertible linear transformations of V.

Let n = dim V, and let A be the matrix of a linear operator with respect to some basis of V. Recall that

det(λI − A) = λⁿ + p₁λⁿ⁻¹ + ··· + p_{n−1}λ + pₙ

is called the characteristic polynomial of A. In fact it does not depend on the choice of a basis. Indeed, under a change x = Cx′ of coordinates, the matrix of a linear operator x ↦ Ax is transformed into the matrix C⁻¹AC similar to A. We have:

det(λI − C⁻¹AC) = det[C⁻¹(λI − A)C] = (det C⁻¹) det(λI − A)(det C) = det(λI − A).

Therefore, the characteristic polynomial of a linear operator is well-defined (by the geometry of A). In particular, the coefficients of the characteristic polynomial do not change under similarity transformations.

Let λ₀ ∈ K be a root of the characteristic polynomial. Then det(λ₀I − A) = 0, and hence the system of homogeneous linear equations Ax = λ₀x has a non-trivial solution, x ≠ 0. As before, we call any such solution an eigenvector of A, and call λ₀ the corresponding eigenvalue. All solutions to Ax = λ₀x (including x = 0) form a linear subspace in V, called the eigenspace of A corresponding to the eigenvalue λ₀.

Let us change slightly our point of view on the eigenspace. It is the null space of the operator A − λ₀I. Consider powers of this operator and their null spaces. If (A − λ₀I)ᵏx = 0 for some k > 0, then (A − λ₀I)ˡx = 0 for all l ≥ k. Thus the null spaces are nested:

Ker(A − λ₀I) ⊂ Ker(A − λ₀I)² ⊂ ··· ⊂ Ker(A − λ₀I)ᵏ ⊂ ···

On the other hand, since dim V < ∞, nested subspaces must stabilize, i.e. starting from some m > 0, we have:

W_{λ₀} := Ker(A − λ₀I)ᵐ = Ker(A − λ₀I)^{m+1} = ···

We call the subspace W_{λ₀} a root space of the operator A, namely the root space corresponding to the root λ₀ of the characteristic polynomial.

Note that if x ∈ W_{λ₀}, then Ax ∈ W_{λ₀}, because (A − λ₀I)ᵐAx = A(A − λ₀I)ᵐx = A0 = 0. Thus a root space is A-invariant. Denote by U_{λ₀} the range of (A − λ₀I)ᵐ. It is also A-invariant, since if x = (A − λ₀I)ᵐy, then Ax = A(A − λ₀I)ᵐy = (A − λ₀I)ᵐ(Ay).

Lemma. V = W_{λ₀} ⊕ U_{λ₀}.

Proof. Put B := (A − λ₀I)ᵐ, so that W_{λ₀} = Ker B and U_{λ₀} = B(V). Let x = By ∈ Ker B. Then Bx = 0, i.e. y ∈ Ker B². But Ker B² = Ker B by the assumption that Ker B = W_{λ₀} is the root space. Thus y ∈ Ker B, and hence x = By = 0. This proves that Ker B ∩ B(V) = {0}. Therefore the subspace in V spanned by Ker B and B(V) is their direct sum. On the other hand, for any operator, dim Ker B + dim B(V) = dim V. Thus, the subspace spanned by Ker B and B(V) is the whole space V. □

Corollary 1. For any λ ≠ λ₀, the root space W_λ ⊂ U_{λ₀}.

Proof. Indeed, W_λ is invariant with respect to A − λ₀I, but contains no eigenvectors of A with eigenvalue λ₀. Therefore A − λ₀I and all powers of it are invertible on W_λ. Thus W_λ lies in the range U_{λ₀} of B = (A − λ₀I)ᵐ. □

Corollary 2. Suppose that (λ − λ₁)^{m₁} ··· (λ − λᵣ)^{mᵣ} is the characteristic polynomial of A : V → V, where λ₁, ..., λᵣ ∈ K are pairwise distinct roots. Then V is the direct sum of root spaces:

V = W_{λ₁} ⊕ ··· ⊕ W_{λᵣ}.
From Corollary 1, it follows by induction on r, that = λ1 λr , where = λ1 λr . In particular, V is WA-invariant⊕···⊕W as the⊕U intersectionU ofU A-invariant∩···∩U subspaces, but U contains no eigenvectors of A with eigenvalues λ1, . . . , λr. Picking bases in each of the direct summands λi and in , we obtain a basis in , in which the matrix of A is block-diagonal.W U Therefore the characteristicV polynomial of A is the product of the characteristic polynomials of A restricted to the summands. So far we haven’t used the hypothesis that the characteristic polynomial of A factors into a product of λ λi. Invoking this hypothesis, we see that the factor of the characteristic− polynomial, corresponding to must have degree 0, and hence dim = 0. U U 3. Jordan Canonical Forms 165 Remarks. (1) We will see later that dimensions of the root spaces coincide with multiplicities of the roots: dim = m . Wλi i (2) The restriction of A to has the property that some power Wλi of A λiI vanishes. A linear operator some power of which vanishes is called− nilpotent. Our next task will be to study the geometry of nilpotent operators. (3) Our assumption that the characteristic polynomial factors completely over K is automatically satisfied in the case K = C due to the Fundamental Theorem of Algebra. Thus, we have proved for every linear operator on a finite dimensional complex vector space, that the space decomposes in a canonical fashion into the direct sum of invariant subspaces on each of which the operator differs from a nilpotent one by scalar summand.

EXERCISES

388. Let $A, B : V \to V$ be two commuting linear operators, and $p$ and $q$ two polynomials in one variable. Show that the operators $p(A)$ and $q(B)$ commute.

389. Prove that if $A$ commutes with $B$, then root spaces of $A$ are $B$-invariant.

390. Let $\lambda_0$ be a root of the characteristic polynomial of an operator $A$, and $m$ its multiplicity. What are the possible values for the dimension of the eigenspace corresponding to $\lambda_0$?

391. Let $v \in V$ be a non-zero vector, and $a : V \to K$ a non-zero linear function. Find the eigenvalues and eigenspaces of the operator $x \mapsto a(x)v$.

Nilpotent Operators

Example. Introduce a nilpotent linear operator $N : K^n \to K^n$ by describing its action on vectors of the standard basis:

$$Ne_n = e_{n-1},\ Ne_{n-1} = e_{n-2},\ \dots,\ Ne_2 = e_1,\ Ne_1 = 0.$$

Then $N^n = 0$ but $N^{n-1} \ne 0$. We will call $N$, as well as any operator similar to it, a regular nilpotent operator. The matrix of $N$ in the standard basis has the form

$$\begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ & & & \ddots & \\ 0 & 0 & \dots & 0 & 1 \\ 0 & 0 & \dots & 0 & 0 \end{bmatrix}.$$

It has the range of dimension $n-1$ spanned by $e_1, \dots, e_{n-1}$ and the null space of dimension $1$ spanned by $e_1$.

Proposition. Let $N : V \to V$ be a nilpotent operator on a $K$-vector space of finite dimension. Then the space can be decomposed into the direct sum of $N$-invariant subspaces, on each of which $N$ is regular.

Proof. We use induction on $\dim V$. When $\dim V = 0$, there is nothing to prove. Now consider the case when $\dim V > 0$.

The range $N(V)$ is $N$-invariant, and $\dim N(V) < \dim V$ (since otherwise $N$ could not be nilpotent). By the induction hypothesis, the space $N(V)$ can be decomposed into the direct sum of $N$-invariant subspaces, on each of which $N$ is regular. Let $l$ be the number of these subspaces, $n_1, \dots, n_l$ their dimensions, and $e^{(i)}_1, \dots, e^{(i)}_{n_i}$ a basis in the $i$th subspace such that $N$ acts on the basis vectors as in the Example:

$$e^{(i)}_{n_i} \mapsto \dots \mapsto e^{(i)}_1 \mapsto 0.$$

Since each $e^{(i)}_{n_i}$ lies in the range of $N$, we can pick a vector $e^{(i)}_{n_i+1} \in V$ such that $Ne^{(i)}_{n_i+1} = e^{(i)}_{n_i}$. Note that $e^{(1)}_1, \dots, e^{(l)}_1$ form a basis in $(\operatorname{Ker} N) \cap N(V)$. We complete it to a basis

$$e^{(1)}_1, \dots, e^{(l)}_1, e^{(l+1)}_1, \dots, e^{(r)}_1$$

of the whole null space $\operatorname{Ker} N$. We claim that all the vectors $e^{(i)}_j$ form a basis in $V$, and therefore the $r$ subspaces

$$\operatorname{Span}(e^{(i)}_1, \dots, e^{(i)}_{n_i}, e^{(i)}_{n_i+1}), \quad i = 1, \dots, l, l+1, \dots, r$$

(of which the last $r - l$ are 1-dimensional) form a decomposition of $V$ into a direct sum with the required properties.

To justify the claim, notice that the subspace $U \subset V$ spanned by the $n_1 + \dots + n_l = \dim N(V)$ vectors $e^{(i)}_j$ with $j > 1$ is mapped by $N$ onto the space $N(V)$. Therefore: (a) those vectors form a basis of $U$, (b) $\dim U = \dim N(V)$, and (c) $\operatorname{Ker} N \cap U = \{0\}$. On the other hand, the vectors $e^{(i)}_j$ with $j = 1$ form a basis of $\operatorname{Ker} N$, and since $\dim \operatorname{Ker} N + \dim N(V) = \dim V$, together with the above basis of $U$ they form a basis of $V$. $\square$

Corollary 1. The matrix of a nilpotent operator in a suitable basis is block diagonal with regular diagonal blocks (as in the Example) of certain sizes $n_1 \ge \dots \ge n_r > 0$.

The basis in which the matrix has this form, as well as the decomposition into the direct sum of invariant subspaces as described in the Proposition, are not canonical, since choices are involved at each step of the induction. However, the dimensions $n_1 \ge \dots \ge n_r > 0$ of the subspaces turn out to be uniquely determined by the geometry of the operator. To see why, introduce the following Young tableau (Figure 46). It consists of $r$ rows of identical square cells. The lengths of the rows represent the dimensions $n_1 \ge n_2 \ge \dots \ge n_r > 0$ of the invariant subspaces, and in the cells of each row we place the basis vectors of the corresponding subspace, so that the operator $N$ sends each vector to its left neighbor (and those of the leftmost column to $0$).

Figure 46. A Young tableau for the partition $n = n_1 + \dots + n_r$: the $i$th row contains the basis vectors $e^{(i)}_1, e^{(i)}_2, \dots, e^{(i)}_{n_i}$, and $N$ sends each vector to its left neighbor.

The format of the tableau is determined by the partition of the total number $n$ of cells (equal to $\dim V$) into the sum $n_1 + \dots + n_r$ of positive integers. Reading the same format by columns, we obtain another partition $n = m_1 + \dots + m_d$, called transposed to the first one, where $m_1 \ge \dots \ge m_d > 0$ are the heights of the columns. Obviously, two transposed partitions determine each other.

It follows from the way the cells are filled with the vectors $e^{(i)}_j$ that the vectors in the columns $1$ through $k$ form a basis of the space $\operatorname{Ker} N^k$. Therefore

$$m_k = \dim \operatorname{Ker} N^k - \dim \operatorname{Ker} N^{k-1}, \quad k = 1, \dots, d.$$

Corollary 2. Consider the flag of subspaces defined by a nilpotent operator $N : V \to V$:

$$\operatorname{Ker} N \subset \operatorname{Ker} N^2 \subset \dots \subset \operatorname{Ker} N^d = V,$$

and the partition of $n = \dim V$ into the summands $m_k = \dim \operatorname{Ker} N^k - \dim \operatorname{Ker} N^{k-1}$. The summands of the transposed partition $n = n_1 + \dots + n_r$ are the dimensions of the regular nilpotent blocks of $N$ (described in the Proposition and Corollary 1).

Corollary 3. The number of equivalence classes of nilpotent operators on a vector space of dimension $n$ is equal to the number of partitions of $n$.
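The recipe of Corollary 2 is easy to run numerically. The following sketch (an added illustration, not from the original text) recovers the block sizes $n_j$ of a nilpotent matrix by transposing the partition formed by the kernel-dimension increments, with ranks standing in for kernel dimensions:

    import numpy as np

    def nilpotent_block_sizes(N):
        # m_k = dim Ker N^k - dim Ker N^(k-1), where dim Ker = n - rank.
        # Assumes N is nilpotent, so the kernels grow until they fill K^n.
        n = N.shape[0]
        dims, k = [0], 1
        while dims[-1] < n:
            dims.append(n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k)))
            k += 1
        m = [b - a for a, b in zip(dims, dims[1:])]  # column heights m_1 >= ...
        # Transposed partition: n_j = number of columns of height >= j.
        return [sum(1 for mk in m if mk >= j) for j in range(1, max(m) + 1)]

    # One regular 3x3 block plus one 1x1 block:
    N = np.zeros((4, 4))
    N[0, 1] = N[1, 2] = 1.0
    print(nilpotent_block_sizes(N))  # [3, 1]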

EXERCISES

392. Let $V_n \subset K[x]$ be the space of all polynomials of degree $< n$. Prove that the differentiation $\frac{d}{dx} : V_n \to V_n$ is a regular nilpotent operator.

393. Find all matrices commuting with a regular nilpotent one.

394. Is there an $n \times n$-matrix $A$ such that $A^2 \ne 0$ but $A^3 = 0$: (a) if $n = 2$? (b) if $n = 3$?

395. Classify the similarity classes of nilpotent $4 \times 4$-matrices.

396. An operator is called unipotent if it differs from the identity by a nilpotent operator. Prove that the number of similarity classes of unipotent $n \times n$-matrices is equal to the number of partitions of $n$, and find this number for $n = 5$.

The Jordan Theorem

We proceed to the classification of linear operators $A : \mathbb{C}^n \to \mathbb{C}^n$ up to similarity transformations.

Theorem. Every complex matrix is similar to a block-diagonal normal form with each diagonal block of the form:

$$\begin{bmatrix} \lambda_0 & 1 & 0 & \dots & 0 \\ 0 & \lambda_0 & 1 & \dots & 0 \\ & & & \ddots & \\ 0 & 0 & \dots & \lambda_0 & 1 \\ 0 & 0 & \dots & 0 & \lambda_0 \end{bmatrix}, \quad \lambda_0 \in \mathbb{C},$$

and such a normal form is unique up to permutations of the blocks.

The block-diagonal matrices described in the theorem are called Jordan canonical forms (or Jordan normal forms). Their diagonal blocks are called Jordan cells.

It is instructive to analyze a Jordan canonical form before going into the proof of the theorem. The characteristic polynomial of a Jordan cell is $(\lambda - \lambda_0)^m$, where $m$ is the size of the cell. The characteristic polynomial of a block-diagonal matrix is equal to the product of the characteristic polynomials of the diagonal blocks. Therefore the characteristic polynomial of the whole Jordan canonical form is the product of the factors $(\lambda - \lambda_i)^{m_i}$, one per Jordan cell. Thus the diagonal entries of Jordan cells are roots of the characteristic polynomial. After subtracting the scalar matrix $\lambda_0 I$, the Jordan cells with $\lambda_i = \lambda_0$ (and only these cells) become nilpotent. Therefore the root space $W_{\lambda_0}$ is exactly the direct sum of those subspaces on which the Jordan cells with $\lambda_i = \lambda_0$ operate.

Proof of Theorem. Everything we need has been already established in the previous two subsections.

Thanks to the Fundamental Theorem of Algebra, the characteristic polynomial $\det(\lambda I - A)$ of a complex $n \times n$-matrix $A$ factors into the product of powers of distinct linear factors: $(\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_r)^{m_r}$. According to Corollary 2 of the Lemma, the space $\mathbb{C}^n$ is decomposed in a canonical fashion into the direct sum $W_{\lambda_1} \oplus \dots \oplus W_{\lambda_r}$ of $A$-invariant root subspaces. On each root subspace $W_{\lambda_i}$, the operator $A - \lambda_i I$ is nilpotent. According to the Proposition, the root space is represented (in a non-canonical fashion) as the direct sum of invariant subspaces on each of which $A - \lambda_i I$ acts as a regular nilpotent operator. Since the scalar operator $\lambda_i I$ leaves every subspace invariant, this means that $W_{\lambda_i}$ is decomposed into the direct sum of $A$-invariant subspaces, on each of which $A$ acts as a Jordan cell with the eigenvalue $\lambda_0 = \lambda_i$. Thus, the existence of a basis in which $A$ is described by a Jordan canonical form is established.
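Computer algebra systems carry out exactly this decomposition. A minimal sketch using SymPy's jordan_form (the 4×4 matrix is an arbitrary example chosen to have a repeated eigenvalue; the Jordan structure stated in the comment is what the computation reports, not an assumption of the text):

    import sympy as sp

    A = sp.Matrix([[ 5,  4,  2,  1],
                   [ 0,  1, -1, -1],
                   [-1, -1,  3,  0],
                   [ 1,  1, -1,  2]])

    # jordan_form returns C and J with A = C * J * C^(-1).
    C, J = A.jordan_form()
    sp.pprint(J)  # cells for eigenvalues 1 and 2, and a 2x2 cell for 4
    assert sp.simplify(C * J * C.inv() - A) == sp.zeros(4, 4)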

To prove uniqueness, note that the root spaces $W_{\lambda_i}$ are intrinsically determined by the operator $A$, and the partition of $\dim W_{\lambda_i}$ into the sizes of the Jordan cells with the eigenvalue $\lambda_i$ is uniquely determined, according to Corollary 2 of the Proposition, by the geometry of the operator $A - \lambda_i I$ nilpotent on $W_{\lambda_i}$. Therefore the exact structure of the Jordan normal form of $A$ (i.e. the numbers and sizes of Jordan cells for each of the eigenvalues $\lambda_i$) is uniquely determined by $A$, and only the ordering of the diagonal blocks remains ambiguous. $\square$

Corollary 1. The dimensions of the root spaces $W_{\lambda_i}$ coincide with the multiplicities of $\lambda_i$ as roots of the characteristic polynomial.

Corollary 2. If the characteristic polynomial of a complex matrix has only simple roots, then the matrix is diagonalizable, i.e. is similar to a diagonal matrix.

Corollary 3. Every operator $A : \mathbb{C}^n \to \mathbb{C}^n$ in a suitable basis is described by the sum $D + N$ of two commuting matrices, of which $D$ is diagonal, and $N$ strictly upper triangular.

Corollary 4. Every operator on a complex vector space of finite dimension can be represented as the sum $D + N$ of two commuting operators, of which $D$ is diagonalizable and $N$ nilpotent.

Remark. We used that $K = \mathbb{C}$ only to factor the characteristic polynomial of the matrix $A$ into linear factors. Therefore the same results hold true over any field $K$ such that all non-constant polynomials from $K[\lambda]$ factor into linear factors. Such fields are called algebraically closed. In fact (see [8]) every field $K$ is contained in an algebraically closed field. Thus every linear operator $A : K^n \to K^n$ can be brought to a Jordan normal form by transformations $A \mapsto C^{-1}AC$, where however the entries of $C$ and the scalars $\lambda_0$ in the Jordan cells may belong to a larger field $F \supset K$.$^8$ We will see how this works when $K = \mathbb{R}$ and $F = \mathbb{C}$.

EXERCISES

397. Find the Jordan normal forms of the following matrices:

(a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 2 & 0 \\ -2 & -2 & 1 \end{bmatrix}$, (b) $\begin{bmatrix} 4 & 6 & 0 \\ -3 & -5 & 0 \\ -3 & -6 & 1 \end{bmatrix}$, (c) $\begin{bmatrix} 13 & 16 & 16 \\ -5 & -7 & -6 \\ -6 & -8 & -7 \end{bmatrix}$,

(d) $\begin{bmatrix} 3 & 0 & 8 \\ 3 & -1 & 6 \\ -2 & 0 & -5 \end{bmatrix}$, (e) $\begin{bmatrix} -4 & 2 & 10 \\ -4 & 3 & 7 \\ -3 & 1 & 7 \end{bmatrix}$, (f) $\begin{bmatrix} -7 & -12 & -2 \\ 3 & 4 & 0 \\ -2 & 0 & 2 \end{bmatrix}$,

(g) $\begin{bmatrix} -2 & 8 & 6 \\ -4 & 10 & 6 \\ -4 & 8 & 4 \end{bmatrix}$, (h) $\begin{bmatrix} 0 & 3 & 3 \\ -1 & 8 & 6 \\ 2 & -14 & -10 \end{bmatrix}$, (i) $\begin{bmatrix} 1 & 1 & -1 \\ 3 & 3 & -3 \\ -2 & -2 & 2 \end{bmatrix}$,

(j) $\begin{bmatrix} 1 & -1 & 2 \\ 3 & -3 & 6 \\ 2 & -2 & 4 \end{bmatrix}$, (k) $\begin{bmatrix} 1 & 1 & -1 \\ -5 & -21 & 17 \\ -6 & -26 & 21 \end{bmatrix}$, (l) $\begin{bmatrix} 3 & -7 & 3 \\ 2 & -5 & 2 \\ 4 & -10 & 3 \end{bmatrix}$,

(m) $\begin{bmatrix} 8 & 30 & 14 \\ -6 & -19 & -9 \\ -6 & -23 & 11 \end{bmatrix}$, (n) $\begin{bmatrix} 9 & 22 & 6 \\ -1 & -4 & -1 \\ -8 & -16 & -5 \end{bmatrix}$, (o) $\begin{bmatrix} 4 & -5 & 2 \\ 2 & -2 & 1 \\ 1 & -1 & 1 \end{bmatrix}$.

398. Compute the powers of Jordan cells.

399. Prove that if some power of a complex matrix is the identity, then the matrix is diagonalizable.

$^8$For this, $F$ does not have to be algebraically closed, but only needs to contain all roots of $\det(\lambda I - A)$.

400. Prove that transposed square matrices are similar.

401. Prove that $\operatorname{tr} A = \sum \lambda_i$ and $\det A = \prod \lambda_i$, where $\lambda_1, \dots, \lambda_n$ are all the roots of the characteristic polynomial (repeated according to their multiplicities).

402. Prove that a square matrix satisfies its own characteristic equation; namely, if $p$ denotes the characteristic polynomial of a matrix $A$, then $p(A) = 0$. (This identity is called the Hamilton–Cayley equation.)

The Real Case

Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be an $\mathbb{R}$-linear operator. It acts$^9$ on the complexification $\mathbb{C}^n$ of the real space, and commutes with the complex conjugation operator $\sigma : x + iy \mapsto x - iy$.

The characteristic polynomial $\det(\lambda I - A)$ has real coefficients, but its roots $\lambda_i$ can be either real or come in pairs of complex conjugate roots (of the same multiplicity). Consequently, the complex root spaces $W_{\lambda_i}$, which are defined as null spaces in $\mathbb{C}^n$ of sufficiently high powers of $A - \lambda_i I$, come in two types. If $\lambda_i$ is real, then the root space is real in the sense that it is $\sigma$-invariant, and thus is the complexification of the real root space $W_{\lambda_i} \cap \mathbb{R}^n$. If $\lambda_i$ is not real, and $\bar\lambda_i$ is its complex conjugate, then $W_{\lambda_i}$ and $W_{\bar\lambda_i}$ are different root spaces of $A$, but they are transformed into each other by $\sigma$. Indeed, $\sigma A = A\sigma$, and $\sigma\lambda_i = \bar\lambda_i\sigma$. Therefore, if $z \in W_{\lambda_i}$, i.e. $(A - \lambda_i I)^d z = 0$ for some $d$, then $0 = \sigma(A - \lambda_i I)^d z = (A - \bar\lambda_i I)^d \sigma z$, and hence $\sigma z \in W_{\bar\lambda_i}$. This allows one to obtain the following improvement of the Jordan Canonical Form Theorem applied to real matrices.

Theorem. A real linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ can be represented by a matrix in Jordan normal form with respect to a basis of the complexified space $\mathbb{C}^n$ invariant under complex conjugation.

Proof. In the process of constructing bases in $W_{\lambda_i}$ in which $A$ has a Jordan normal form, we can use the following procedure. When $\lambda_i$ is real, we take the real root space $W_{\lambda_i} \cap \mathbb{R}^n$ and take in it a real basis in which the matrix of $A - \lambda_i I$ is block-diagonal with regular nilpotent blocks. This is possible due to the Proposition applied to the case $K = \mathbb{R}$. This real basis then serves as a $\sigma$-invariant complex basis in the complex root space $W_{\lambda_i}$. When $W_{\lambda_i}$ and $W_{\bar\lambda_i}$ is a pair of complex conjugate root spaces, we take a required basis in

$^9$Strictly speaking, it is the complexification $A^{\mathbb{C}}$ of $A$ that acts on $\mathbb{C}^n = (\mathbb{R}^n)^{\mathbb{C}}$, but we will denote it by the same letter $A$.

one of them, and then apply $\sigma$ to obtain such a basis in the other. Taken together, the bases form a $\sigma$-invariant set of vectors. $\square$

Of course, for each Jordan cell with a non-real eigenvalue $\lambda_0$, there is another Jordan cell of the same size with the eigenvalue $\bar\lambda_0$. Moreover, if $e_1, \dots, e_m$ is the basis in the $A$-invariant subspace of the first cell, i.e. $Ae_k = \lambda_0 e_k + e_{k-1}$, $k = 2, \dots, m$, and $Ae_1 = \lambda_0 e_1$, then the $A$-invariant subspace corresponding to the other cell comes with the complex conjugate basis $\bar e_1, \dots, \bar e_m$, where $\bar e_k = \sigma e_k$. The direct sum $U := \operatorname{Span}(e_1, \dots, e_m, \bar e_1, \dots, \bar e_m)$ of the two subspaces is both $A$- and $\sigma$-invariant and thus is a complexification of the real $A$-invariant subspace $U \cap \mathbb{R}^n$. We use this to describe a real normal form for the action of $A$ on this subspace. Namely, let $\lambda_0 = \alpha - i\beta$, and write each basis vector $e_k$ in terms of its real and imaginary parts: $e_k = u_k - iv_k$. Then the real vectors $u_k$ and $v_k$ form a real basis in the real part of the complex 2-dimensional space spanned by $e_k$ and $\bar e_k = u_k + iv_k$. Thus we obtain a basis $u_1, v_1, u_2, v_2, \dots, u_m, v_m$ in the subspace $U \cap \mathbb{R}^n$. The action of $A$ on this basis is found from the formulas:

$$Au_1 + iAv_1 = (\alpha + i\beta)(u_1 + iv_1) = (\alpha u_1 - \beta v_1) + i(\beta u_1 + \alpha v_1),$$
$$Au_k + iAv_k = (\alpha u_k - \beta v_k + u_{k-1}) + i(\beta u_k + \alpha v_k + v_{k-1}), \quad k > 1,$$

since $A\bar e_k = \bar\lambda_0 \bar e_k + \bar e_{k-1}$ and $\bar\lambda_0 = \alpha + i\beta$.

Corollary 1. A linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ is represented in a suitable basis by a block-diagonal matrix with the diagonal blocks that are either Jordan cells with real eigenvalues, or have the form ($\beta \ne 0$):

$$\begin{bmatrix}
\alpha & \beta & 1 & 0 & 0 & \dots & 0 & 0 \\
-\beta & \alpha & 0 & 1 & 0 & \dots & 0 & 0 \\
0 & 0 & \alpha & \beta & 1 & 0 & \dots & 0 \\
0 & 0 & -\beta & \alpha & 0 & 1 & \dots & 0 \\
 & & & & \ddots & & & \\
0 & \dots & 0 & & & \alpha & \beta & 1 & 0 \\
0 & \dots & 0 & & & -\beta & \alpha & 0 & 1 \\
0 & 0 & \dots & 0 & & & & \alpha & \beta \\
0 & 0 & \dots & 0 & & & & -\beta & \alpha
\end{bmatrix}$$

Corollary 2. If two real matrices are related by a complex similarity transformation, then they are related by a real similarity transformation.

Remark. The proof meant here is that if two real matrices are similar over $\mathbb{C}$, then they have the same Jordan normal form, and thus they are similar over $\mathbb{R}$ to the same real matrix, as described in Corollary 1. However, Corollary 2 can be proved directly, without a reference to the Jordan Canonical Form Theorem. Namely, if two real matrices $A$ and $A'$ are related by a complex similarity transformation $A' = C^{-1}AC$, we can rewrite this as $CA' = AC$, and, taking $C = B + iD$ where $B$ and $D$ are real, obtain: $BA' = AB$ and $DA' = AD$. The problem now is that neither $B$ nor $D$ is guaranteed to be invertible. Yet there must exist an invertible linear combination $E = \lambda B + \mu D$, for if the polynomial $\det(\lambda B + \mu D)$ of $\lambda$ and $\mu$ vanished identically, then $\det(B + iD) = 0$ too. For invertible $E$, we have $EA' = AE$ and hence $A' = E^{-1}AE$.
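To see the 2×2 building block of Corollary 1 emerge numerically, here is a minimal sketch (the matrix is a made-up example with a single simple complex pair, so no cells of size greater than 1 are involved):

    import numpy as np

    A = np.array([[3.0, -2.0],
                  [4.0, -1.0]])      # eigenvalues 1 +/- 2i
    lam, vecs = np.linalg.eig(A)
    lam0, e = lam[0], vecs[:, 0]     # lam0 = alpha + i*beta
    u, v = e.real, e.imag            # e = u + i*v
    C = np.column_stack([u, v])      # real basis u, v of R^2
    B = np.linalg.inv(C) @ A @ C     # = [[alpha, beta], [-beta, alpha]]
    print(lam0)
    print(np.round(B, 10))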

EXERCISES

403. Classify all linear operators in $\mathbb{R}^2$ up to linear changes of coordinates.

404. Consider real traceless $2 \times 2$-matrices $\begin{bmatrix} a & b \\ c & -a \end{bmatrix}$ as points in the 3-dimensional space with coordinates $a, b, c$. Sketch the partition of this space into similarity classes.

4 Linear Dynamical Systems

We consider here applications of Jordan Canonical Forms to dynamical systems, i.e. models of evolutionary processes. An evolutionary system can be described mathematically as a set $X$ of states of the system, and a collection of maps $g^{t_2}_{t_1} : X \to X$ which describe the evolution of the states from the time moment $t_1$ to the time moment $t_2$, whereas the composition of $g^{t_3}_{t_2}$ with $g^{t_2}_{t_1}$ is required to coincide with $g^{t_3}_{t_1}$. One says that the dynamical system is time-independent (or stationary) if the maps $g^{t_2}_{t_1}$ depend only on the time increment $t = t_2 - t_1$, i.e. $g^{t_2}_{t_1} = g^{t_2 - t_1}_0$. Omitting the subscript $0$, we then have a family of maps $g^t : X \to X$ satisfying $g^tg^u = g^{t+u}$.

We will consider time-independent linear dynamical systems, i.e. systems where the states form a vector space $V$ and the evolution maps are linear, and will examine both cases: that of discrete and that of continuous time. In the former case, time takes on integer values $t \in \mathbb{Z}$, and the evolution is described by iterations of an invertible linear map $G : V \to V$. Namely, if the state of the system at $t = 0$ is $x(0) \in V$, then the state of the system at the moment $n \in \mathbb{Z}$ is $x(n) = G^nx(0)$. In the case of continuous time, time-independent linear dynamical systems are described by a system of constant coefficient linear ordinary differential equations (ODE for short): $\dot{x} = Ax$, and the evolution maps are found by solving the system. We begin with the case of discrete time.

Iterations of Linear Maps

Example 1: Fibonacci numbers. The sequence of integers

$$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \dots,$$

formed by adding the previous two terms to obtain the next one, is well-known under the name of the Fibonacci sequence.$^{10}$

$^{10}$After Leonardo Fibonacci (c. 1170 – c. 1250).

It is the solution of the 2nd order linear recursion relation $f_{n+1} = f_n + f_{n-1}$ satisfying the initial conditions $f_0 = 0$, $f_1 = 1$. The sequence can be recast as a trajectory of a linear dynamical system on the plane as follows. Append the pair $f_n, f_{n+1}$ of consecutive terms of the sequence into a 2-column $(x_n, y_n)^t$ and express the next pair as a function of the previous one:

$$\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_n \\ y_n \end{bmatrix}, \quad \text{where } \begin{bmatrix} x_n \\ y_n \end{bmatrix} := \begin{bmatrix} f_n \\ f_{n+1} \end{bmatrix}.$$

We will now derive the general formula for the numbers $f_n$ (and for all solutions to the recursion relation).

The characteristic polynomial of the matrix $G := \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$ is $\lambda^2 - \lambda - 1$. It has two roots $\lambda_\pm = (1 \pm \sqrt{5})/2$. The corresponding eigenvectors can be taken in the form $v_\pm = \begin{bmatrix} 1 \\ \lambda_\pm \end{bmatrix}$. Let $v(0)$ be the vector of initial conditions (it is $(0,1)^t$ for the Fibonacci sequence), and let it be written as a linear combination of the eigenvectors: $v(0) = C_+v_+ + C_-v_-$. Then $v(n) := G^nv(0) = C_+\lambda_+^nv_+ + C_-\lambda_-^nv_-$. Taking the first component of this vector equality, we obtain the general formula, depending on two arbitrary constants $C_\pm$, for the solutions of the recursion relation $x_{n+1} = x_n + x_{n-1}$:

$$x_n = C_+\left(\frac{1+\sqrt{5}}{2}\right)^n + C_-\left(\frac{1-\sqrt{5}}{2}\right)^n.$$

For the Fibonacci sequence per se, from $\lambda_+C_+ + \lambda_-C_- = 1$, $C_+ + C_- = 0$, we find $C_\pm = \pm 1/\sqrt{5}$, and hence

$$f_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right].$$
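As a quick sanity check (an added illustration, not from the text), one can confirm numerically that the matrix iteration $v(n) = G^nv(0)$ and the closed formula produce the same numbers:

    import numpy as np

    G = np.array([[0, 1], [1, 1]])
    v0 = np.array([0, 1])                       # (f_0, f_1)

    def fib_matrix(n):
        # first component of v(n) = G^n v(0)
        return (np.linalg.matrix_power(G, n) @ v0)[0]

    def fib_binet(n):
        s = 5 ** 0.5
        return round((((1 + s) / 2) ** n - ((1 - s) / 2) ** n) / s)

    print([fib_matrix(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    print(all(fib_matrix(n) == fib_binet(n) for n in range(40)))  # True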

Note that $|\lambda_-| < 1$ and $\lambda_+ > 1$. Consequently, as $n$ increases indefinitely, the second summand tends to $0$, and the first to infinity. Thus asymptotically (for large $n$) $f_n \approx [(1+\sqrt{5})/2]^n/\sqrt{5}$, while the consecutive ratios $f_{n+1}/f_n$ tend to $\lambda_+ = (1+\sqrt{5})/2$, known as the golden ratio. In fact the fractions $3/2, 5/3, 8/5, 13/8, 21/13, 34/21, \dots$, i.e. $f_{n+1}/f_n$, are the best possible rational approximations to the golden ratio with denominators not exceeding $f_n$.$^{11}$

$^{11}$The golden ratio often occurs in Nature. For example, on a pine cone or a pineapple fruit, count the numbers of spirals (formed by the scales near the butt) going in the clockwise and in the counter-clockwise directions. Most likely you'll find two consecutive Fibonacci numbers: 5 and 8, or 8 and 13. It is not surprising that such observations led to the mystical beliefs of our ancestors in magical properties of the golden ratio. In fact we shouldn't judge them too harshly: in spite of some plausible scientific models, the causes of the occurrences of Fibonacci numbers and the golden ratio in various plants are still not entirely clear.

In general, an (invertible) linear map $G : \mathbb{R}^m \to \mathbb{R}^m$ defines a linear dynamical system: given the initial state $v(0)$, the state at the discrete time moment $n \in \mathbb{Z}$ is defined by $v(n) = G^nv(0)$. In order to compute the state $v(n)$ efficiently, we therefore need to compute powers of a linear map.

According to the general theory, there exists an invertible complex matrix $C$ such that $G = CJC^{-1}$, where $J$ is one of the Jordan canonical forms. Therefore

$$G^n = (CJC^{-1})(CJC^{-1})\cdots(CJC^{-1}) = CJ^nC^{-1},$$

and the problem reduces to that for $J$. Recall that $J = D + N$, the sum of a diagonal matrix $D$ (whose diagonal entries are the eigenvalues of $G$) and a nilpotent $N$ (in particular, $N^m = 0$) commuting with $D$. Thus, by the binomial formula, we have a finite sum:

$$J^n = (D+N)^n = D^n + nD^{n-1}N + \binom{n}{2}D^{n-2}N^2 + \binom{n}{3}D^{n-3}N^3 + \cdots.$$
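In code, this finite binomial sum is easy to evaluate symbolically. A sketch (an added illustration) for a 3×3 Jordan cell, where the sum stops after the $N^2$ term because $N^3 = 0$:

    import sympy as sp

    lam, n = sp.symbols('lambda n')
    I3 = sp.eye(3)
    N = sp.Matrix([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])   # nilpotent part, N**3 == 0

    Jn = (lam**n * I3
          + n * lam**(n - 1) * N
          + sp.binomial(n, 2) * lam**(n - 2) * N**2)
    sp.pprint(Jn)   # upper triangular, matching Example 2 below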

Example 2: Powers of Jordan cells. In fact, to find the powers $J^n$ explicitly, it suffices to do this for each Jordan cell separately:

$$\begin{bmatrix} \lambda & 1 & 0 & 0 & \dots \\ 0 & \lambda & 1 & 0 & \dots \\ 0 & 0 & \lambda & 1 & \dots \\ 0 & 0 & 0 & \lambda & \dots \\ & & & \ddots & \end{bmatrix}^n = \begin{bmatrix} \lambda^n & n\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \binom{n}{3}\lambda^{n-3} & \dots \\ 0 & \lambda^n & n\lambda^{n-1} & \binom{n}{2}\lambda^{n-2} & \dots \\ 0 & 0 & \lambda^n & n\lambda^{n-1} & \dots \\ 0 & 0 & 0 & \lambda^n & \dots \\ & & & \ddots & \end{bmatrix}$$

Example 3. Let us find the general formula for the solutions of the recursive sequence $a_{n+1} = 4a_n - 6a_{n-1} + 4a_{n-2} - a_{n-3}$. Put $v(n) = (a_{n-3}, a_{n-2}, a_{n-1}, a_n)^t$. Then $v(n+1) = Gv(n)$, where

$$G = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 4 & -6 & 4 \end{bmatrix}.$$

The reader is asked to check that

$$\det(\lambda I - G) = \lambda^4 - 4\lambda^3 + 6\lambda^2 - 4\lambda + 1 = (\lambda - 1)^4.$$

Therefore $G$ has only one eigenvalue, $\lambda = 1$, of multiplicity 4. Besides, the rank of $I - G$ is at least 3 due to the $3 \times 3$ identity submatrix which occurs in the right upper corner of $G$. This shows that the eigenspace has dimension 1, and thus the Jordan canonical form of $G$ has only one Jordan cell, $J$, of size 4, with the eigenvalue 1. Examining the matrix entries of $J^n$ from Example 2 (with $\lambda = 1$, $m = 4$), we see that they all have the form $\binom{n}{k}$, where $k = 0, 1, 2, 3$. Note that as functions of $n$ these binomial coefficients form a basis in the space of polynomials of the form $C_0 + C_1n + C_2n^2 + C_3n^3$. We conclude that the matrix entries of $G^n = CJ^nC^{-1}$ must be, as functions of $n$, polynomials of degree $\le 3$. In particular, $a_n$ must be a polynomial in $n$ of degree $\le 3$. Since the solution sequences $\{a_n\}$ depend on 4 arbitrary parameters $a_0, a_1, a_2, a_3$, and thus form a 4-dimensional space (the same as the space of polynomials of degree $\le 3$), we finally conclude that the general solution to the recursion equation has the form $a_n = C_0 + C_1n + C_2n^2 + C_3n^3$, where the $C_i$ are arbitrary constants. Given specific values of $a_0, a_1, a_2, a_3$, one can find the corresponding values of $C_0, C_1, C_2, C_3$ by plugging in $n = 0, 1, 2, 3$ and solving the resulting system of 4 linear equations.
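A tiny numerical check of this conclusion (added here for illustration): a cubic sequence such as $a_n = n^3$ should reproduce itself under the recursion.

    import numpy as np

    G = np.array([[ 0, 1,  0, 0],
                  [ 0, 0,  1, 0],
                  [ 0, 0,  0, 1],
                  [-1, 4, -6, 4]])

    a = [0, 1, 8, 27]                 # a_n = n^3 for n = 0..3
    v = np.array(a)
    for n in range(4, 10):
        v = G @ v                     # v(n+1) = G v(n)
        a.append(int(v[-1]))
    print(a)                          # 0, 1, 8, 27, 64, 125, 216, ... = n^3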

As it follows from Example 2 (and is illustrated by Examples 1 and 3), for any discrete dynamical system $v(n+1) = Gv(n)$, the components of the vector trajectories $v(n) = G^nv(0)$, as functions of $n$, are linear combinations of the monomials $\lambda_i^n n^{k-1}$, where $\lambda_i$ is a root of the characteristic polynomial of the operator $G$, and $k$ does not exceed the multiplicity of this root.

In particular, if all the roots are simple, the general solution has the form $v(n) = \sum_{i=1}^m C_i\lambda_i^nv_i$, where $v_i$ is an eigenvector corresponding to the eigenvalue $\lambda_i$, and the $C_i$ are arbitrary constants.

EXERCISES

405. Find the general formula for the sequence $a_n$ such that $a_0 = 2$, $a_1 = 1$, and $a_{n+1} = a_n + a_{n-1}$ for all $n \ge 1$.

406. Find all solutions to the recursion equation $a_{n+1} = 5a_n + 6a_{n-1}$.

407. For the recursion equation $a_{n+1} = 6a_n - 9a_{n-1}$, express the general solution as a function of $n$, $a_0$ and $a_1$.

408. Given two sequences $\{x_n\}$, $\{y_n\}$ satisfying $x_{n+1} = 4x_n + 3y_n$, $y_{n+1} = 3x_n + 2y_n$, and such that $x_0 = y_0 = 13$, find the limit of the ratio $x_n/y_n$ as $n$ tends to: (a) $+\infty$; (b) $-\infty$.

409. Prove that, for a given $G : V \to V$, all trajectories $\{x(n)\}$ of the dynamical system $x(n+1) = Gx(n)$ form a linear subspace of dimension $\dim V$ in the space of all vector-valued sequences.

410.⋆ For integer $n > 0$, prove that $\left(\frac{3+\sqrt{17}}{2}\right)^n + \left(\frac{3-\sqrt{17}}{2}\right)^n$ is an odd integer.

Linear ODE Systems

Let

$$\dot{x}_1 = a_{11}x_1 + \dots + a_{1n}x_n$$
$$\dots$$
$$\dot{x}_n = a_{n1}x_1 + \dots + a_{nn}x_n$$

be a linear homogeneous system of ordinary differential equations with constant (possibly complex) coefficients $a_{ij}$. It can be written in the matrix form as $\dot{x} = Ax$. Consider the infinite matrix series

$$e^{tA} := I + tA + \frac{t^2A^2}{2} + \frac{t^3A^3}{6} + \dots + \frac{t^kA^k}{k!} + \dots$$

If $M$ is an upper bound for the absolute values of the entries of $A$, then the matrix entries of $t^kA^k/k!$ are bounded by $n^kt^kM^k/k!$. It is easy to deduce from this that the series converges (at least as fast as the series for $e^{ntM}$).
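The convergence is fast enough that summing the series directly is practical. The sketch below (an added illustration; the matrix is an arbitrary example) compares a truncated sum with SciPy's matrix exponential:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])     # generator of rotations
    t = 1.0

    S = np.eye(2)
    term = np.eye(2)
    for k in range(1, 30):          # partial sums of sum_k t^k A^k / k!
        term = term @ (t * A) / k
        S = S + term

    print(np.allclose(S, expm(t * A)))   # True
    print(np.round(S, 6))                # [[cos 1, sin 1], [-sin 1, cos 1]]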

Proposition. The solution to the system $\dot{x} = Ax$ with the initial condition $x(0)$ is given by the formula $x(t) = e^{tA}x(0)$.

Proof. Differentiating the series $\sum_{k \ge 0} t^kA^k/k!$, we find

$$\frac{d}{dt}e^{tA} = \sum_{k=1}^\infty \frac{t^{k-1}A^k}{(k-1)!} = \sum_{k=0}^\infty \frac{t^kA^{k+1}}{k!} = Ae^{tA},$$

and hence $\frac{d}{dt}\left(e^{tA}x(0)\right) = A\left(e^{tA}x(0)\right)$. Thus $x(t)$ satisfies the ODE system. At $t = 0$ we have $e^{0A}x(0) = Ix(0) = x(0)$, and therefore the initial condition is also satisfied. $\square$

Remark. This Proposition defines the exponential function $A \mapsto e^A$ of a matrix (or of a linear operator). The exponential function satisfies $e^Ae^B = e^{A+B}$ provided that $A$ and $B$ commute. This can be proved the same way as it is done in the supplement Complex Numbers for the exponential function of a complex argument.

The Proposition reduces the problem of solving the ODE system $\dot{x} = Ax$ to the computation of the exponential function $e^{tA}$ of a matrix. Notice that if $A = CBC^{-1}$, then

$$A^k = (CBC^{-1})(CBC^{-1})(CBC^{-1})\cdots = CBB B \cdots C^{-1} = CB^kC^{-1},$$

and therefore $\exp(tA) = C\exp(tB)C^{-1}$. This observation reduces the computation of $e^{tA}$ to that of $e^{tB}$, where the Jordan normal form of $A$ can be taken in the role of $B$.

Example 4. Let $\Lambda$ be a diagonal matrix with the diagonal entries $\lambda_1, \dots, \lambda_n$. Then $\Lambda^k$ is a diagonal matrix with the diagonal entries $\lambda_1^k, \dots, \lambda_n^k$, and hence

$$e^{t\Lambda} = I + t\Lambda + \frac{t^2}{2}\Lambda^2 + \dots = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}.$$

Example 5. Let $N$ be a nilpotent Jordan cell of size $m$. We have (for $m = 4$):

$$N = \begin{bmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{bmatrix}, \quad N^2 = \begin{bmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}, \quad N^3 = \begin{bmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix},$$

and $N^4 = 0$. Generalizing to arbitrary $m$, we find:

$$e^{tN} = I + tN + \frac{t^2}{2}N^2 + \frac{t^3}{6}N^3 + \dots = \begin{bmatrix} 1 & t & \frac{t^2}{2} & \dots & \frac{t^{m-1}}{(m-1)!} \\ 0 & 1 & t & \frac{t^2}{2} & \dots \\ 0 & 0 & 1 & t & \dots \\ & & & \ddots & \\ & \dots & & 0 & 1 \end{bmatrix}.$$

Let $\lambda I + N$ be the Jordan cell of size $m$ with the eigenvalue $\lambda$. Then $e^{t(\lambda I + N)} = e^{t\lambda I}e^{tN} = e^{\lambda t}e^{tN}$. Here we use the multiplicative property of the exponential function, valid since $I$ and $N$ commute.

Finally, let $A = \begin{bmatrix} B & 0 \\ 0 & D \end{bmatrix}$ be a block-diagonal square matrix. Then $A^k = \begin{bmatrix} B^k & 0 \\ 0 & D^k \end{bmatrix}$ and respectively $e^{tA} = \begin{bmatrix} e^{tB} & 0 \\ 0 & e^{tD} \end{bmatrix}$. Together with Examples 4 and 5, this shows how to compute the exponential function $e^{tJ}$ for any Jordan normal form $J$: each Jordan cell has the form $\lambda I + N$ and should be replaced by $e^{\lambda t}e^{tN}$. Since any square matrix $A$ can be reduced to one of the Jordan normal matrices $J$ by similarity transformations, we conclude that $e^{tA} = Ce^{tJ}C^{-1}$ with a suitable invertible $C$.
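Because $N^m = 0$, the series for $e^{tN}$ is a finite sum, which a symbolic system reproduces exactly. A short sketch (added illustration) for $m = 4$:

    import sympy as sp

    t = sp.symbols('t')
    N = sp.zeros(4, 4)
    for i in range(3):
        N[i, i + 1] = 1          # the regular nilpotent cell

    sp.pprint((t * N).exp())     # entries 1, t, t^2/2, t^3/6, as in Example 5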

Applying Example 4 to the ODE system $\dot{x} = Ax$, we arrive at the following conclusion. Suppose that the characteristic polynomial of $A$ has $n$ distinct roots $\lambda_1, \dots, \lambda_n$. Let $C$ be the matrix whose columns are the corresponding $n$ complex eigenvectors $v_1, \dots, v_n$. Then the solution with the initial condition $x(0)$ is given by the formula

$$x(t) = C\begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}C^{-1}x(0).$$

Notice that $C^{-1}x(0)$ here (as well as $x(0)$) is a column $c = (c_1, \dots, c_n)^t$ of arbitrary constants, while the columns of $Ce^{t\Lambda}$ are $e^{\lambda_i t}v_i$. We conclude that the general solution formula reads

$$x(t) = c_1e^{\lambda_1 t}v_1 + \dots + c_ne^{\lambda_n t}v_n.$$

This formula involves the eigenvalues $\lambda_i$ of $A$, the corresponding eigenvectors $v_i$, and the arbitrary complex constants $c_i$, to be found from the system of linear equations $x(0) = c_1v_1 + \dots + c_nv_n$.

Example 6. The ODE system

$$\dot{x}_1 = 2x_1 + x_2, \quad \dot{x}_2 = x_1 + 3x_2 - x_3, \quad \dot{x}_3 = -x_1 + 2x_2 + 3x_3$$

has the matrix $A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix}$.

The characteristic polynomial of $A$ is $\lambda^3 - 8\lambda^2 + 22\lambda - 20$. It has a real root $\lambda_0 = 2$, and factors as $(\lambda - 2)(\lambda^2 - 6\lambda + 10)$. Thus it has two complex conjugate roots $\lambda_\pm = 3 \pm i$. The eigenvectors $v_0 = (1, 0, 1)^t$ and $v_\pm = (1, 1 \pm i, 2 \mp i)^t$ are found from the systems of linear equations:

$$\begin{aligned} 2x_1 + x_2 &= 2x_1 \\ x_1 + 3x_2 - x_3 &= 2x_2 \\ -x_1 + 2x_2 + 3x_3 &= 2x_3 \end{aligned} \qquad\text{and}\qquad \begin{aligned} 2x_1 + x_2 &= (3 \pm i)x_1 \\ x_1 + 3x_2 - x_3 &= (3 \pm i)x_2 \\ -x_1 + 2x_2 + 3x_3 &= (3 \pm i)x_3. \end{aligned}$$

Thus the general complex solution is a linear combination of the solutions

$$e^{2t}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad e^{(3+i)t}\begin{bmatrix} 1 \\ 1+i \\ 2-i \end{bmatrix}, \quad e^{(3-i)t}\begin{bmatrix} 1 \\ 1-i \\ 2+i \end{bmatrix}$$

with arbitrary complex coefficients. Real solutions can be extracted from the complex ones by taking their real and imaginary parts:

$$e^{3t}\begin{bmatrix} \cos t \\ \cos t - \sin t \\ 2\cos t + \sin t \end{bmatrix}, \quad e^{3t}\begin{bmatrix} \sin t \\ \cos t + \sin t \\ 2\sin t - \cos t \end{bmatrix}.$$

Thus the general real solution is described by the formulas

$$\begin{aligned} x_1(t) &= c_1e^{2t} + c_2e^{3t}\cos t + c_3e^{3t}\sin t, \\ x_2(t) &= c_2e^{3t}(\cos t - \sin t) + c_3e^{3t}(\cos t + \sin t), \\ x_3(t) &= c_1e^{2t} + c_2e^{3t}(2\cos t + \sin t) + c_3e^{3t}(2\sin t - \cos t), \end{aligned}$$

where $c_1, c_2, c_3$ are arbitrary real constants. At $t = 0$ we have

$$x_1(0) = c_1 + c_2, \quad x_2(0) = c_2 + c_3, \quad x_3(0) = c_1 + 2c_2 - c_3.$$

Given the initial values $(x_1(0), x_2(0), x_3(0))$, the corresponding constants $c_1, c_2, c_3$ can be found from this system of linear algebraic equations.

In general, even if the characteristic polynomial of $A$ has multiple roots, our theory (and Examples 4, 5) shows that all components of solutions to the ODE system $\dot{x} = Ax$ are expressible as linear combinations of the functions

$$t^ke^{\lambda t}, \quad t^ke^{at}\cos bt, \quad t^ke^{at}\sin bt,$$

where $\lambda$ are real eigenvalues, $a \pm ib$ are complex eigenvalues, and $k = 0, 1, 2, \dots$ is smaller than the multiplicity of the corresponding eigenvalue. This observation suggests the following way of approaching ODE systems with multiple eigenvalues, avoiding an explicit similarity transformation to the Jordan normal form: look for the general solution in the form of linear combinations of these functions with arbitrary coefficients, substitute them into the equations, and find the relations between the arbitrary constants from the resulting system of linear algebraic equations.
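Before moving on, a numeric spot-check of Example 6 (an added sketch): the formula solution must agree with $e^{tA}x(0)$ for any initial vector.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, -1.0],
                  [-1.0, 2.0, 3.0]])
    x0 = np.array([1.0, 0.0, 0.0])
    # Solve c1 + c2 = x1(0), c2 + c3 = x2(0), c1 + 2c2 - c3 = x3(0).
    c1, c2, c3 = np.linalg.solve(np.array([[1, 1, 0],
                                           [0, 1, 1],
                                           [1, 2, -1]]), x0)

    def x(t):
        e2, e3 = np.exp(2 * t), np.exp(3 * t)
        return np.array([
            c1 * e2 + e3 * (c2 * np.cos(t) + c3 * np.sin(t)),
            e3 * (c2 * (np.cos(t) - np.sin(t)) + c3 * (np.cos(t) + np.sin(t))),
            c1 * e2 + e3 * (c2 * (2 * np.cos(t) + np.sin(t))
                            + c3 * (2 * np.sin(t) - np.cos(t))),
        ])

    print(np.allclose(x(0.7), expm(0.7 * A) @ x0))   # True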

Example 7. The ODE system

$$\dot{x}_1 = 2x_1 + x_2 + x_3, \quad \dot{x}_2 = -3x_1 - 2x_2 - 3x_3, \quad \dot{x}_3 = 2x_1 + 2x_2 + 3x_3$$

has the matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ -3 & -2 & -3 \\ 2 & 2 & 3 \end{bmatrix}$ with the characteristic polynomial $\lambda^3 - 3\lambda^2 + 3\lambda - 1 = (\lambda - 1)^3$. Thus we can look for solutions in the form

$$x_1 = e^t(a_1 + b_1t + c_1t^2), \quad x_2 = e^t(a_2 + b_2t + c_2t^2), \quad x_3 = e^t(a_3 + b_3t + c_3t^2).$$

Substituting into the ODE system, and introducing the notation $\sum a_i =: A$, $\sum b_i =: B$, $\sum c_i =: C$, we get (omitting the factor $e^t$):

$$\begin{aligned} (a_1 + b_1) + (b_1 + 2c_1)t + c_1t^2 &= (a_1 + A) + (b_1 + B)t + (c_1 + C)t^2, \\ (a_2 + b_2) + (b_2 + 2c_2)t + c_2t^2 &= -(3A - a_2) - (3B - b_2)t - (3C - c_2)t^2, \\ (a_3 + b_3) + (b_3 + 2c_3)t + c_3t^2 &= (2A + a_3) + (2B + b_3)t + (2C + c_3)t^2. \end{aligned}$$

Introducing the notation $P := A + Bt + Ct^2$, we rewrite this system of 9 linear equations in 9 unknowns in the form

$$b_1 + 2c_1t = P, \quad b_2 + 2c_2t = -3P, \quad b_3 + 2c_3t = 2P.$$

This yields $b_1 = A$, $b_2 = -3A$, $b_3 = 2A$, and (since $B = A - 3A + 2A = 0$) we find $c_1 = c_2 = c_3 = 0$. Thus the general solution, depending on 3 arbitrary constants $a_1, a_2, a_3$, reads:

$$x_1 = e^t\big(a_1 + (a_1 + a_2 + a_3)t\big), \quad x_2 = e^t\big(a_2 - 3(a_1 + a_2 + a_3)t\big), \quad x_3 = e^t\big(a_3 + 2(a_1 + a_2 + a_3)t\big).$$
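This answer can be checked mechanically by substituting it back into $\dot{x} = Ax$; a short symbolic sketch (added illustration):

    import sympy as sp

    t, a1, a2, a3 = sp.symbols('t a1 a2 a3')
    A = sp.Matrix([[2, 1, 1], [-3, -2, -3], [2, 2, 3]])
    S = a1 + a2 + a3
    x = sp.Matrix([sp.exp(t) * (a1 + S * t),
                   sp.exp(t) * (a2 - 3 * S * t),
                   sp.exp(t) * (a3 + 2 * S * t)])
    print(sp.simplify(x.diff(t) - A * x))   # the zero vector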

In particular, since $t^2$ does not occur in the formulas, we can conclude that the Jordan form of our matrix has only Jordan cells of size 1 or 2 (and hence one of each, for the total size must be 3):

$$J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

EXERCISES

411.⋆ Give an example of (non-commuting) $2 \times 2$ matrices $A$ and $B$ for which $e^Ae^B \ne e^{A+B}$.

412. Solve the following ODE systems. Find the solution satisfying the initial condition $x_1(0) = 1$, $x_2(0) = 0$, $x_3(0) = 0$:

(a) ($\lambda_1 = 1$): $\dot{x}_1 = 3x_1 - x_2 + x_3$, $\dot{x}_2 = x_1 + x_2 + x_3$, $\dot{x}_3 = 4x_1 - x_2 + 4x_3$;

(b) ($\lambda_1 = 1$): $\dot{x}_1 = -3x_1 + 4x_2 - 2x_3$, $\dot{x}_2 = x_1 + x_3$, $\dot{x}_3 = 6x_1 - 6x_2 + 5x_3$;

(c) ($\lambda_1 = 1$): $\dot{x}_1 = x_1 - x_2 - x_3$, $\dot{x}_2 = x_1 + x_2$, $\dot{x}_3 = 3x_1 + x_3$;

(d) ($\lambda_1 = 2$): $\dot{x}_1 = 4x_1 - x_2 - x_3$, $\dot{x}_2 = x_1 + 2x_2 - x_3$, $\dot{x}_3 = x_1 - x_2 + 2x_3$;

(e) ($\lambda_1 = 1$): $\dot{x}_1 = -x_1 + x_2 - 2x_3$, $\dot{x}_2 = 4x_1 + x_2$, $\dot{x}_3 = 2x_1 + x_2 - x_3$;

(f) ($\lambda_1 = 2$): $\dot{x}_1 = 4x_1 - x_2$, $\dot{x}_2 = 3x_1 + x_2 - x_3$, $\dot{x}_3 = x_1 + x_3$;

(g) ($\lambda_1 = 1$): $\dot{x}_1 = 2x_1 - x_2 - x_3$, $\dot{x}_2 = 2x_1 - x_2 - 2x_3$, $\dot{x}_3 = -x_1 + x_2 + 2x_3$.

413. Find the Jordan canonical forms of the $3 \times 3$-matrices of the ODE systems (a–g) from the previous exercise.

414.⋆ Prove the following identity: $\det e^A = e^{\operatorname{tr} A}$.

Higher Order Linear ODE

A linear homogeneous constant coefficient $n$-th order ODE

$$\frac{d^n}{dt^n}x + a_1\frac{d^{n-1}}{dt^{n-1}}x + \dots + a_{n-1}\frac{d}{dt}x + a_nx = 0$$

can be rewritten, by introducing the notations

$$x_1 := x, \quad x_2 := \dot{x}, \quad x_3 := \ddot{x}, \quad \dots, \quad x_n := d^{n-1}x/dt^{n-1},$$

as a system $\dot{x} = Ax$ of $n$ first order ODEs with the matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & 0 & \dots \\ & & & \ddots & \\ 0 & \dots & 0 & 0 & 1 \\ -a_n & -a_{n-1} & \dots & -a_2 & -a_1 \end{bmatrix}.$$

Then our theory of linear ODE systems applies. There are, however, some simplifications due to the special form of the matrix $A$. First, computing the characteristic polynomial of $A$, we find

$$\det(\lambda I - A) = \lambda^n + a_1\lambda^{n-1} + \dots + a_{n-1}\lambda + a_n.$$

Thus the polynomial can be easily read off the higher order differential equation. Next, let $\lambda_1, \dots, \lambda_r$ be the roots of the characteristic polynomial, and $m_1, \dots, m_r$ their multiplicities ($m_1 + \dots + m_r = n$). Then it is clear that the solutions $x(t) = x_1(t)$ must have the form

$$e^{\lambda_1 t}P_1(t) + \dots + e^{\lambda_r t}P_r(t),$$

where $P_i = a_0 + a_1t + \dots + a_{m_i-1}t^{m_i-1}$ is a polynomial of degree $< m_i$. The total number of arbitrary coefficients in these polynomials equals $m_1 + \dots + m_r = n$. On the other hand, the general solution to the $n$-th order ODE must depend on $n$ arbitrary initial values $(x(0), \dot{x}(0), \dots, x^{(n-1)}(0))$. This can be justified by the general Existence and Uniqueness Theorem for solutions of ODE systems. We conclude that each of the functions

$$e^{\lambda_1 t},\ e^{\lambda_1 t}t,\ \dots,\ e^{\lambda_1 t}t^{m_1-1},\ \dots,\ e^{\lambda_r t},\ e^{\lambda_r t}t,\ \dots,\ e^{\lambda_r t}t^{m_r-1}$$

must satisfy our differential equation, and any (complex) solution is uniquely written as a linear combination of these functions with suitable (complex) coefficients. (In other words, these functions form a basis of the space of complex solutions to the differential equation.)

Example 8: $x^{(xii)} - 3x^{(viii)} + 3x^{(iv)} - x = 0$ has the characteristic polynomial $\lambda^{12} - 3\lambda^8 + 3\lambda^4 - 1 = (\lambda - 1)^3(\lambda + 1)^3(\lambda - i)^3(\lambda + i)^3$. The following 12 functions form therefore a complex basis of the solution space:

$$e^t,\ te^t,\ t^2e^t,\ e^{-t},\ te^{-t},\ t^2e^{-t},\ e^{it},\ te^{it},\ t^2e^{it},\ e^{-it},\ te^{-it},\ t^2e^{-it}.$$
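Membership in this basis can be verified mechanically; for instance, the sketch below (an added illustration) checks that $t^2e^t$ satisfies the equation of Example 8:

    import sympy as sp

    t = sp.symbols('t')
    x = t**2 * sp.exp(t)
    residual = x.diff(t, 12) - 3 * x.diff(t, 8) + 3 * x.diff(t, 4) - x
    print(sp.simplify(residual))   # 0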

Of course, a basis of the space of real solutions is obtained by taking real and imaginary parts of the complex basis:

$$e^t,\ te^t,\ t^2e^t,\ e^{-t},\ te^{-t},\ t^2e^{-t},\ \cos t,\ \sin t,\ t\cos t,\ t\sin t,\ t^2\cos t,\ t^2\sin t.$$

Remark. The fact that the functions $e^{\lambda_i t}t^k$, $k < m_i$, …

EXERCISES

415. Solve the following ODEs, and find the solution satisfying the initial condition $x(0) = 1$, $\dot{x}(0) = 0$, ..., $x^{(n-1)}(0) = 0$:
(a) $x^{(3)} - 8x = 0$, (b) $x^{(4)} + 4x = 0$, (c) $x^{(6)} + 64x = 0$, (d) $x^{(5)} - 10x^{(3)} + 9\dot{x} = 0$,
(e) $x^{(3)} - 3\dot{x} - 2x = 0$, (f) $x^{(5)} - 6x^{(4)} + x^{(3)} = 0$,
(g) $x^{(5)} + 8x^{(3)} + 16\dot{x} = 0$, (h) $x^{(4)} + 4\ddot{x} + 3x = 0$.

416. Rewrite the ODE system

$$\ddot{x}_1 + 4\dot{x}_1 - 2x_1 - 2\dot{x}_2 - x_2 = 0, \qquad \ddot{x}_1 - 4\dot{x}_1 - \ddot{x}_2 + 2\dot{x}_2 + 2x_2 = 0$$

of two 2nd order equations in the form of a linear ODE system $\dot{x} = Ax$ of four 1st order equations, and solve it.

417. Show that every polynomial $\lambda^n + a_1\lambda^{n-1} + \dots + a_n$ is the characteristic polynomial of an $n \times n$-matrix.