Short notes on linear algebra

by Sanand D

————————————————————————————————————————-

1 Vector spaces, independence, basis, linear operators

1.1 Introduction

Definition 1.1. Let V be a set whose elements are called vectors, with an operation of vector addition (+) such that + is commutative and associative and the following properties are satisfied

• There exists 0 ∈ V such that 0 + v = v ∀v ∈ V.

• For all v ∈ V, −v ∈ V such that v + (−v) = 0.

• For all v1, v2 ∈ V, v1 + v2 ∈ V.

Let F be a field such that there is an operation of scalar multiplication (.) between elements of F and V satisfying the following properties

• 1.v = v ∀v ∈ V, 1 ∈ F

• (a1a2)v = a1(a2v), ∀a1, a2 ∈ F

• a(v + w) = av + aw, ∀a ∈ F and ∀v, w ∈ V

• (a1 + a2)v = a1v + a2v, ∀a1, a2 ∈ F.

A set V satisfying the properties above is said to be a vector space over a field F.

Example 1.2. Cn, Rn, Cn×m, Rn×m, solutions of homogeneous linear equations, solutions of homogeneous ODEs, and the set of real/complex valued continuous/differentiable/analytic functions are examples of vector spaces. The set of polynomials with real/complex coefficients forms a vector space over the real/complex numbers.

Definition 1.3. A subset W of V satisfying the properties above is called a subspace of V.

Example 1.4. Copies of R and R2 (for instance, coordinate lines and coordinate planes through the origin) form subspaces of R3. The set of differentiable functions forms a subspace of the vector space of continuous functions. The set of complex polynomials of degree at most n forms a subspace of C[x].

(Geometrically, one can think of vector spaces as Euclidean spaces and of subspaces as planes passing through the origin.) Suppose v1, . . . , vk ∈ V. Then < v1, . . . , vk > denotes the span of {v1, . . . , vk}, which is the collection of all linear combinations of {v1, . . . , vk}. The span of any set forms a subspace.

Definition 1.5. A set of non-zero vectors v1, . . . , vk is said to be independent if α1v1 + ... + αkvk = 0 implies that all αis (1 ≤ i ≤ k) are equal to zero. A set of vectors which is not independent is said to be dependent. (Geometrically, if v depends on v1, . . . , vk, then v lies in the space spanned by v1, . . . , vk. Whereas if v is independent of v1, . . . , vk, then v lies outside the space spanned by v1, . . . , vk.)

Definition 1.6. A maximal linearly independent set is called a basis.

A basis is not unique. By the definition of basis and linear independence, each v ∈ V can be represented as a unique linear combination of basis vectors. Any linearly independent set can be extended to a basis by adjoining linearly independent vectors to the previous set.

Example 1.7. e1 = [1 0 . . . 0]^T, e2 = [0 1 . . . 0]^T, . . . , en = [0 0 . . . 1]^T form a basis for Cn. This is called the standard basis.

Definition 1.8. Dimension of a vector space is the cardinality of its basis.

Example 1.9. The dimension of Cn over C is n. The dimension of C[x] over C is infinite, with basis 1, x, x², . . .. The dimension of the subspace of polynomials of degree at most n is n + 1, with basis 1, x, . . . , xn.

If V1 and V2 are two subspaces of V, then V1 + V2 and V1 ∩ V2 are subspaces of V. By choosing a basis for V1 ∩ V2 and extending it to bases of V1 and of V2, the union of these bases is a basis of V1 + V2, and it follows that

dim(V1 + V2) = dim(V1) + dim(V2) − dim(V1 ∩ V2).

Definition 1.10. If V1 + V2 = V and V1 ∩ V2 = {0}, then we say that V is a direct sum of V1 and V2. It is denoted by V = V1 ⊕ V2.

Example 1.11. Cn = C ⊕ C ⊕ ... ⊕ C, i.e. the direct sum of n copies of C.

1.2 Co-ordinates and linear maps on vector spaces

Co-ordinates: Let {v1, . . . , vn} be a basis of V. Let v ∈ V. Therefore, v can be expressed uniquely as a linear combination of the vis, v = α1v1 + ... + αnvn. Thus, w.r.t. this basis, v has co-ordinates [α1 α2 . . . αn]^T.

Matrix representation of linear operators: Let A : V → V. A is said to be linear if for all v, w ∈ V and c1, c2 ∈ F, A(c1v + c2w) = c1Av + c2Aw. Let {v1, . . . , vn} be a basis of V. Then for any v ∈ V, Av = α1Av1 + ... + αnAvn. Thus, it is enough to define the action of A on basis vectors:

Av = [Av1 Av2 . . . Avn] [α1 α2 . . . αn]^T.     (1)

Let Avi = a1iv1 + ... + anivn for 1 ≤ i ≤ n. Then, the matrix representation of A w.r.t. the basis above is

      [ a11  a12  . . .  a1n ]
A  =  [ a21  a22  . . .  a2n ]     (2)
      [  .    .           .  ]
      [ an1  an2  . . .  ann ]

and the co-ordinates of Av w.r.t. the given basis are

      [ a11  a12  . . .  a1n ] [ α1 ]
Av =  [ a21  a22  . . .  a2n ] [ α2 ]     (3)
      [  .    .           .  ] [  . ]
      [ an1  an2  . . .  ann ] [ αn ]
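As an aside, the representation in (2)–(3) is easy to check numerically. The following is a minimal NumPy sketch; the operator A and the basis matrix V below are arbitrary illustrations, not taken from the notes.

```python
import numpy as np

# A acts on standard coordinates of R^3; the columns of V are a chosen basis {v1, v2, v3}.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

A_basis = np.linalg.solve(V, A @ V)      # [a_ij]: representation of A w.r.t. {v1, v2, v3}

alpha = np.array([2.0, 1.0, -3.0])       # coordinates of v in the chosen basis
v = V @ alpha                            # v = 2 v1 + 1 v2 - 3 v3

# Coordinates of Av w.r.t. the basis agree with A_basis @ alpha, as in (3).
print(np.allclose(np.linalg.solve(V, A @ v), A_basis @ alpha))   # True
```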

Definition 1.12. Kernel of an operator A : V → W is the set of vectors v ∈ V such that Av = 0.

Kernel of an operator A is denoted by ker(A) and it is a subspace of V. The image of A is the set of all w ∈ W such that w = Av for some v ∈ V. Im(A) is also a subspace. When A is represented by a matrix w.r.t. some basis, any vector in the image space of A is a linear combination of the columns of A. Thus, Im(A) is spanned by the columns of A.

Definition 1.13. Rank of a matrix is equal to the number of its linearly independent columns. Consequently, it is equal to the dimension of the image space of A.

Thus, Ax = b has a solution iff b lies in the column span of A iff the augmented matrix [A b] has the same rank as that of A. Note that A is onto ⇔ the rank of A is equal to the dimension of the codomain space and A is 1 − 1 ⇔ the dimension of ker(A) is equal to zero. It is 1 − 1 and onto ⇔ the dimension of ker(A) is equal to zero and the rank of A is equal to the dimension of the codomain space. (Note that Im(A) is a subspace of the codomain. If rank(A) = dimension of Im(A) is equal to the dimension of the codomain, then Im(A) = codomain space.)
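The rank criterion for solvability can be checked numerically; a hedged NumPy sketch with an arbitrary rank-deficient example follows.

```python
import numpy as np

# Ax = b is solvable iff rank([A b]) == rank(A), i.e. b lies in the column span of A.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])           # rank 1
b_in = np.array([1.0, 2.0, 3.0])      # in the column span of A
b_out = np.array([1.0, 0.0, 0.0])     # not in the column span of A

for b in (b_in, b_out):
    aug = np.column_stack([A, b])
    print(np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A))   # True, then False
```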

Theorem 1.14. (Rank-Nullity) Let A : V → V where V is an n−dimensional vector space. Then rank(A)+dim(ker(A)) = n.

Proof. Let v1, . . . , vk form a basis for ker(A). Extend this set to a full basis of V, say v1, . . . , vn. Claim: {Avk+1, . . . , Avn} are linearly independent. Suppose not; then αk+1Avk+1 + ... + αnAvn = 0 with not all αi equal to zero. This implies that A(αk+1vk+1 + ... + αnvn) = 0, hence αk+1vk+1 + ... + αnvn ∈ ker(A). Therefore, αk+1vk+1 + ... + αnvn = β1v1 + ... + βkvk. But this contradicts the linear independence of {v1, . . . , vn}. Therefore, {Avk+1, . . . , Avn} are linearly independent. Since rank(A) = dim(Im(A)), rank(A) = n − k and dim(ker(A)) = k.

Suppose A : V → W. Let {v1, . . . , vn} be a basis of V and {w1, . . . , wm} be a basis for W. Then to find a matrix representation for A, it is enough to define Avi (1 ≤ i ≤ n): Avi = a1iw1 + ... + amiwm (1 ≤ i ≤ n). Thus, A is represented by an m × n matrix A = [aij]. The Rank-Nullity theorem holds for these linear maps as well, where n is the dimension of the domain.

Dual spaces: For a vector space V, its dual V∗ consists of all linear maps from V to F (the underlying field). If V is an n dimensional vector space whose elements are represented by column vectors w.r.t. some basis, then elements of V∗ are represented by 1 × n matrices which can be thought of as row vectors. Thus, the dual space of column vectors is row vectors and vice versa. For any basis {v1, . . . , vn} of V, there exists a dual basis {v1∗, . . . , vn∗} of V∗ such that vi∗(vj) = 1 if i = j and vi∗(vj) = 0 if i ≠ j. A linear map A : V → W induces a map A∗ : W∗ → V∗. The action of A∗ is defined as A∗w∗(v) := w∗(Av). Let A = [aij] be a matrix representation of A w.r.t. bases v1, . . . , vn and w1, . . . , wm of V and W respectively, and let [bij] be a representation of A∗ w.r.t. the dual bases. Consider the action of A∗ on the basis vectors w1∗, . . . , wm∗ of W∗. Since A∗ = [bij] is a matrix representation of A∗,

A∗wi∗ = b1iv1∗ + ... + bnivn∗     (4)
⇒ A∗wi∗(vj) = (b1iv1∗ + ... + bnivn∗)(vj) = bji.     (5)

Observe that since A∗w∗(v) = w∗(Av) from the definition of A∗,

A∗wi∗(vj) = wi∗(Avj) = wi∗(a1jw1 + ... + amjwm) = aij.     (6)

Thus, aij = bji and the matrix representation of the induced map A∗ is the transpose of A.

Observe that (V∗)∗ = V i.e. the double dual of V is V itself for finite dimensional vector spaces.
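A quick numerical sanity check of the duality statement, under the identification of V with column vectors and W∗ with row vectors; the random matrix and vectors below are illustrative only.

```python
import numpy as np

# With column vectors for V and row vectors for W*, the induced map A* acts by
# w* -> w* A, i.e. its matrix w.r.t. the dual bases is the transpose of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))       # A : V -> W, dim V = 4, dim W = 3
v = rng.standard_normal(4)            # v in V
w_star = rng.standard_normal(3)       # a functional w* in W*

lhs = w_star @ (A @ v)                # w*(Av)
rhs = (A.T @ w_star) @ v              # (A* w*)(v) with A* represented by A^T
print(np.isclose(lhs, rhs))           # True
```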

2 Change of basis

Let e1, . . . , en be the standard basis of a vector space V. Let this be an old basis and let Aold be a matrix representation of a linear operator A : V → V. Let v ∈ V and v = α1e1 + ... + αnen.

v = [e1 e2 . . . en] [α1 α2 . . . αn]^T     (8)

Therefore, the coordinates of v w.r.t. the old basis are given by vold = [α1 α2 . . . αn]^T. Since A is a linear operator, to find the image of any vector in V, it is enough to define the action of A on the basis vectors, in this case the eis. Thus,

Av = α1Ae1 + ... + αnAen = [Ae1 Ae2 . . . Aen] [α1 α2 . . . αn]^T.     (9)

Let

Aei = a1ie1 + ... + anien for 1 ≤ i ≤ n.

Therefore, the coordinate representation for A in the standard basis is

        [ a11  a12  . . .  a1n ]
Aold =  [ a21  a22  . . .  a2n ]     (10)
        [  .    .           .  ]
        [ an1  an2  . . .  ann ]

Let w = Av. Then

        [ a11  a12  . . .  a1n ] [ α1 ]
wold =  [ a21  a22  . . .  a2n ] [ α2 ]     (11)
        [  .    .           .  ] [  . ]
        [ an1  an2  . . .  ann ] [ αn ]

Let {v1, . . . vn} be a new basis of V. Let

vi = t1ie1 + ... + tnien for 1 ≤ i ≤ n.

Let

[v1 v2 . . . vn] = [e1 e2 . . . en] [tij],     (12)

where [tij] is the n × n matrix whose ith column contains the coefficients t1i, . . . , tni.

Let T = [tij]. Thus,

[v1 v2 . . . vn] T^{-1} = [e1 e2 . . . en]     (13)

⇒ [v1 v2 . . . vn] T^{-1} [α1 α2 . . . αn]^T = [e1 e2 . . . en] [α1 α2 . . . αn]^T.     (14)

Let T^{-1} [α1 . . . αn]^T = [β1 . . . βn]^T. Then v = β1v1 + ... + βnvn = α1e1 + ... + αnen and vnew = [β1 β2 . . . βn]^T is the new coordinate vector.

Let Anew be the matrix representation of A w.r.t. the new basis. Let w = Av.

wnew = T^{-1} wold,  vnew = T^{-1} vold     (15)
wnew = T^{-1} wold = T^{-1} Aold vold     (16)
wnew = Anew vnew = Anew T^{-1} vold     (17)
⇒ Anew T^{-1} vold = T^{-1} Aold vold     (18)
⇒ T Anew T^{-1} vold = Aold vold  for all vold.     (19)

This implies that T Anew T^{-1} = Aold, hence, Anew = T^{-1} Aold T. Finally, let A : V → W where {e1, . . . , en} is a basis for V and {e1, . . . , em} is a basis for W. By the same arguments used earlier, taking the action of A on the basis vectors, one obtains a matrix representation of A

        [ a11  a12  . . .  a1n ]
Aold =  [ a21  a22  . . .  a2n ]     (20)
        [  .    .           .  ]
        [ am1  am2  . . .  amn ]

Let {v1, . . . , vn} be a new basis for V and {w1, . . . , wm} be a new basis for W. Let

[v1 v2 . . . vn] = [e1 e2 . . . en] T  and     (21)
[w1 w2 . . . wm] = [e1 e2 . . . em] S.     (22)

Thus, if v ∈ V and w ∈ W, vnew = T^{-1} vold and wnew = S^{-1} wold. Let Anew be the matrix representation of A w.r.t. the new bases and let w = Av. Therefore,

wnew = S^{-1} wold,  vnew = T^{-1} vold     (23)
wnew = S^{-1} wold = S^{-1} Aold vold     (24)
wnew = Anew vnew = Anew T^{-1} vold     (25)
⇒ Anew T^{-1} vold = S^{-1} Aold vold     (26)
⇒ S Anew T^{-1} vold = Aold vold  for all vold.     (27)

This implies that S Anew T^{-1} = Aold, hence, Anew = S^{-1} Aold T.

Exercise: Suppose vectors in a vector space are represented as rows instead of columns. Find the effect of a change of basis on coordinates and matrix representations.
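The change-of-basis formulas above can be verified numerically; a small NumPy sketch for the square case Anew = T^{-1} Aold T with an arbitrary invertible T follows (the rectangular case with S and T is analogous).

```python
import numpy as np

rng = np.random.default_rng(1)
A_old = rng.standard_normal((3, 3))       # operator in the old (standard) basis
T = rng.standard_normal((3, 3))           # columns = new basis vectors in old coordinates

A_new = np.linalg.solve(T, A_old @ T)     # T^{-1} A_old T

v_old = rng.standard_normal(3)
v_new = np.linalg.solve(T, v_old)         # v_new = T^{-1} v_old
w_old = A_old @ v_old                     # w = Av in old coordinates

# New coordinates of w computed either way agree.
print(np.allclose(np.linalg.solve(T, w_old), A_new @ v_new))   # True
```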

3 Invariant subspaces and canonical forms

Definition 3.1. A subspace W ⊂ V is said to be A−invariant if AW ⊆ W i.e. for all w ∈ W, Aw ∈ W.

Let W ⊂ V be an A−invariant subspace. Then choosing a basis for W and extending it to a basis for V, A can be represented in the following matrix form due to the invariance of W under the action of A:

A = [ A1  A12 ]
    [ 0   A2  ]

If W1 and W2 are two A−invariant subspaces such that V = W1 ⊕ W2, then choosing bases for W1 and W2,

A = [ A1  0  ]
    [ 0   A2 ]

3.1 Eigenvalues and eigenvectors

1. Characteristic polynomial and Cayley-Hamilton theorem.

Definition 3.2. The characteristic polynomial of a linear operator A : V → V is defined by pA(s) = det(sI − A) for any matrix representation of A.

Theorem 3.3. (Cayley-Hamilton) pA(A) = 0.

Proof. (sI − A).Adj(sI − A) = det(sI − A).I = pA(s).I.
Let B = Adj(sI − A) = Σ_{i=0}^{n−1} s^i Bi. Therefore,

(sI − A).Adj(sI − A) = (sI − A). Σ_{i=0}^{n−1} s^i Bi = Σ_{i=0}^{n−1} s^{i+1} Bi − Σ_{i=0}^{n−1} s^i ABi
                     = s^n B_{n−1} + Σ_{i=1}^{n−1} s^i (B_{i−1} − ABi) − AB0.

Let pA(s) = s^n + a_{n−1}s^{n−1} + ... + a0. Thus, equating coefficients of pA(s).I and (sI − A).Adj(sI − A),

B_{n−1} = I,   B_{i−1} − ABi = ai.I for all 1 ≤ i ≤ n − 1,   −AB0 = a0.I.     (28)

Multiplying the coefficient of s^i by A^i and summing up, one obtains

A^n B_{n−1} + Σ_{i=1}^{n−1} (A^i B_{i−1} − A^{i+1} Bi) − AB0 = A^n + a_{n−1}A^{n−1} + ... + a1A + a0I.

Note that the terms on the left hand side cancel each other (telescoping sum), hence the left hand side is zero, while the right hand side is pA(A), which proves the theorem.

The characteristic polynomial does not change after a change of basis since det(sI − A) = det(T −1(sI − A)T ) = det(sI − T −1AT ). Matrices A and T −1AT are said to be similar under the similarity transform T .
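A numerical illustration of the Cayley-Hamilton theorem (a sketch, not part of the proof): np.poly returns the coefficients of det(sI − A), and a Horner loop evaluates that polynomial at the matrix; the random matrix is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
coeffs = np.poly(A)                    # [1, a_{n-1}, ..., a_0] of det(sI - A)

P = np.zeros_like(A)
for c in coeffs:                       # Horner evaluation with matrix powers
    P = P @ A + c * np.eye(4)

print(np.allclose(P, 0, atol=1e-8))    # p_A(A) = 0 up to round-off
```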

2. The minimal polynomial divides the characteristic polynomial.

Definition 3.4. The minimal polynomial of a linear operator A : V → V is the monic polynomial mA(s) of least degree such that mA(A) = 0.

By definition, degree of pA ≥ degree of mA. Suppose pA(s) = mA(s).q(s) + r(s) where r ≠ 0. But pA(A) = mA(A) = 0 implies r(A) = 0. But r(s) has degree strictly less than the degree of mA(s) by the division algorithm, which is a contradiction to the minimality of mA. Therefore, r(s) = 0 and mA divides pA.

3. The minimal polynomial of a vector divides the minimal polynomial of A.

Definition 3.5. The minimal polynomial of a vector v ∈ V w.r.t. a linear operator A : V → V is the monic polynomial pv(s) of least degree such that pv(A).v = 0.

By definition, degree of mA ≥ degree of pv. Suppose mA(s) = pv(s).q(s) + r(s). Then mA(A).v = pv(A).q(A).v + r(A).v = 0. But this implies r(A).v = 0 with degree of r < degree of pv, a contradiction. Therefore, r = 0 and pv|mA.
One can similarly define the minimal polynomial pW(s) of a subspace W ⊂ V w.r.t. a linear operator A as the monic polynomial of least degree such that pW(A) vanishes on the subspace W1 = W + AW + ... + A^{n−1}W.

4. Eigenvalues and eigenvectors

Definition 3.6. Eigenvalues of a linear operator are roots of the characteristic polynomial pA(s).

Suppose λ is a root of the characteristic polynomial pA(s). Therefore, λ is an eigenvalue of A and pA(λ) = 0. This implies that the matrix λI − A is singular since the determinant of (λI − A) is pA(λ) = 0. Therefore, there exists v ∈ ker(λI − A) such that (λI − A)v = 0. Hence, Av = λv. This is called an eigenvector associated with eigenvalue λ. All vectors in the kernel of (λI − A) are eigenvectors associated with an eigenvalue λ.
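For a concrete example, np.linalg.eig returns eigenvalues (roots of pA) and a matrix whose columns are corresponding eigenvectors; the 2 × 2 matrix below is an arbitrary illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)           # columns of eigvecs are eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    print(lam, np.allclose(A @ v, lam * v))   # Av = lambda v for each pair
```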

Definition 3.7. Let pA(s) = (s − λ1)^{n1} . . . (s − λl)^{nl}. Then n1, . . . , nl are called the algebraic multiplicities of the eigenvalues λ1, . . . , λl respectively. Let gi = dim(ker(λiI − A)) (i = 1, . . . , l). Then g1, . . . , gl are called the geometric multiplicities of the eigenvalues λ1, . . . , λl respectively.

Lemma 3.8. For every eigenvalue λi of A, ni ≥ gi for 1 ≤ i ≤ l.

Proof. Since gi is the geometric multiplicity of λi, there are gi linearly independent eigenvectors associated with the eigenvalue λi. Extend this set to a basis. W.r.t. this basis,

A = [ λi I_{gi×gi}  ∗ ]
    [ 0             C ]

Thus, pA(s) = (s − λi)^{gi} pC(s). This implies that (s − λi)^{gi} is a factor of pA(s), and this proves the lemma.

Lemma 3.9. Suppose A is a linear operator. If λ1, λ2, . . . , λk are distinct eigenvalues of A and v1, v2, . . . , vk are corresponding eigenvectors, then v1, v2, . . . , vk are linearly independent.

Proof. We proceed by induction. The statement is true for k = 1. Assume that it is true for k − 1. Suppose v1, . . . , vk is not a linearly independent set, say vk = α1v1 + ... + α_{k−1}v_{k−1}. Thus, Avk = λkvk = α1λ1v1 + ... + α_{k−1}λ_{k−1}v_{k−1}. If λk = 0, then α1λ1v1 + ... + α_{k−1}λ_{k−1}v_{k−1} = 0 with not all coefficients zero, a contradiction to the induction hypothesis.
Suppose λk ≠ 0. Then vk = α1(λ1/λk)v1 + ... + α_{k−1}(λ_{k−1}/λk)v_{k−1}. But also vk = α1v1 + ... + α_{k−1}v_{k−1}. Subtracting, β1v1 + ... + β_{k−1}v_{k−1} = 0 with not all βi zero since the λi are distinct. Therefore, again we get a contradiction to the induction hypothesis. Thus, v1, v2, . . . , vk are linearly independent.

Lemma 3.10. Let g be the sum of the geometric multiplicities of the eigenvalues of A. Then A has g linearly independent eigenvectors.

Proof. We proceed by induction. The statement is true for g = 1. Assume that it is true for g − 1. Let v1, . . . , vg be eigenvectors of A chosen from bases of the eigenspaces. Suppose this is not a linearly independent set, say vg = α1v1 + ... + αkvk such that k ≤ g − 1 and all αis are non-zero. Thus, Avg = λgvg = α1λ1v1 + ... + αkλkvk. If λg = 0, then α1λ1v1 + ... + αkλkvk = 0. Since the αis are non-zero and v1, . . . , vk (k ≤ g − 1) are linearly independent by induction, all λis must be zero. Thus, all eigenvalues of A must be zero. Hence, if g is the geometric multiplicity, then there are g linearly independent eigenvectors associated with the zero eigenvalue by the definition of geometric multiplicity. Suppose λg ≠ 0. Then vg = α1(λ1/λg)v1 + ... + αk(λk/λg)vk = α1v1 + ... + αkvk. Case (i): If λg ≠ λi for at least one i (1 ≤ i ≤ k), then we get a contradiction to the induction hypothesis since v1, . . . , v_{g−1} are linearly independent. Thus, vg is linearly independent of v1, . . . , v_{g−1}. Case (ii): λg = λ1 = ... = λk. Again we have only one eigenvalue involved and, by the same arguments used above for the zero eigenvalue, there exists a linearly independent set of g eigenvectors by the definition of geometric multiplicity.

Theorem 3.11. A linear operator A is diagonalizable iff geometric multiplicity of all eigenvalues of A is equal to their algebraic multiplicities.

Proof. If A is diagonalizable, then there exists a matrix V such that V −1AV = D and columns of V form eigenvectors of A and geometric multiplicity of all eigenvalues of A is equal to their algebraic multiplicities. Conversely, if geometric multiplicity of all eigenvalues of A is equal to their algebraic multiplicities, then a set of eigenvectors can be selected which forms a basis of V by the previous lemma. Using this basis, A is diagonalizable.

3.2 Rational and Jordan canonical forms: Existence and uniqueness

Lemma 3.12. (minimal polynomial of A = characteristic polynomial of A) ⇒ (geometric multiplicity of every eigenvalue of A is equal to one).

Proof. Suppose A has an eigenvalue λ with geometric multiplicity greater than or equal to 2. Consider a subspace W spanned by two linearly independent eigenvectors associated with λ and extend this set to a basis. W.r.t. this basis, A is represented as

A = [ λ I_{2×2}  ∗  ]
    [ 0          A2 ]

Suppose q(s) = (s − λ) mA2(s) where mA2(s) is the minimal polynomial of A2. It is clear that mA(s)|q(s), since

q(A) = (A − λI) mA2(A) = [ 0  ∗ ] [ ∗  ∗ ] = [ 0  0 ]
                         [ 0  ∗ ] [ 0  0 ]   [ 0  0 ]

Note that pA(s) = (s − λ)² pA2(s) and the degree of q(s) is strictly less than the degree of pA(s). Hence, mA(s) ≠ pA(s) since mA(s)|q(s).

Definition 3.13. A subspace is said to be cyclic w.r.t. A if it is given by the span of {b, Ab, . . . , A^k b} where b ∈ V.

Definition 3.14. A subspace W is said to be indecomposable w.r.t. A; if W¯ ⊆ W is any subspace of W, then W¯ ≠ W1 ⊕ W2 where both W1 and W2 are A−invariant and non-zero. In the following, we prove the existence of Rational and Jordan canonical forms and uniqueness under similarity transforms.

1. There exists a cyclic vector of A ⇒ the minimal polynomial of A is equal to the characteristic polynomial of A.

Proof. If a cyclic vector v ∈ V exists such that < v, Av, . . . , A^{n−1}v > = V, then pv = mA = pA.

Suppose {v, Av, . . . , A^{n−1}v} is a basis for V. Then with respect to this basis, A has the matrix representation

    [ 0 0 ... 0 −α0       ]
    [ 1 0 ... 0 −α1       ]
A = [ 0 1 ... 0 −α2       ]     (29)
    [ . .     .  .        ]
    [ 0 0 ... 1 −α_{n−1}  ]

Changing the order of the basis vectors, i.e. choosing {A^{n−1}v, A^{n−2}v, . . . , Av, v} as a basis,

    [ −α_{n−1}  1 ... 0 0 ]
    [ −α_{n−2}  0 ... 0 0 ]
A = [    .      .     . . ]
    [ −α1       0 ... 0 1 ]
    [ −α0       0 ... 0 0 ]

Now suppose vectors in V are represented by row vectors and A acts on these vectors from the right (i.e. considering v as a row vector instead of a column vector). Suppose < v^T, v^T A, . . . , v^T A^{n−1} > = V. Then with respect to this basis,

    [ 0    1    ...  0         0        ]
    [ 0    0    ...  0         0        ]
A = [ .    .         .         .        ]
    [ 0    0    ...  0         1        ]
    [ −α0  −α1  ...  −α_{n−2}  −α_{n−1} ]

And finally, again changing the order of the basis to {v^T A^{n−1}, v^T A^{n−2}, . . . , v^T A, v^T},

    [ −α_{n−1}  −α_{n−2}  ...  −α1  −α0 ]
    [ 1         0         ...  0    0   ]
A = [ 0         1         ...  0    0   ]
    [ .         .               .    .  ]
    [ 0         0         ...  1    0   ]

All four representations of A above are called companion form representations of a linear map.

2. Claim (i): V indecomposable ⇒ A has only one eigenvalue with geometric multiplicity equal to one.

Proof. If the geometric multiplicity is greater than one, with two linearly independent eigenvectors v1, v2, then the subspace < v1 > ⊕ < v2 > ⊆ V decomposes into two A−invariant subspaces.

Claim (ii): V is A−indecomposable ⇒ V is A−cyclic (converse not true).

Proof. Suppose V is indecomposable but not cyclic w.r.t. A. Therefore, V does not have a cyclic vector. Let V1 = < v1, Av1, . . . , A^{k1} v1 > be the maximal cyclic subspace of V such that V1 ⊂ V and V1 ≠ V. Consider v2 ∈ V with v2 ∈/ V1, and consider the cyclic subspace V2 generated by v2 under the action of A. Since V is indecomposable, V1 ∩ V2 ≠ {0}. Therefore, there exists k2 > 1 such that A^{k2} v2 ∈ V1 and A^i v2 ∈/ V1 for i = 0, . . . , k2 − 1. Since V1 is the maximal A−invariant cyclic subspace, k2 < k1. We know that {v1, Av1, . . . , A^{k1} v1, v2, Av2, . . . , A^{k2−1} v2} is a linearly independent set of vectors. Consider the subspace W generated by this linearly independent set of vectors. It is clear from the construction that this subspace is A−invariant. Observe that V1 ⊂ W. Suppose A^{k1+1} v1 = α1 v1 + ... + α_{k1+1} A^{k1} v1 and A^{k2} v2 = β1 v1 + ... + β_{k1+1} A^{k1} v1. Thus, w.r.t. the chosen basis for W, a matrix representation of A|W is as follows:

      [ 0 0 ... 0 α1        | 0 0 ... 0 β1        ]
      [ 1 0 ... 0 α2        | 0 0 ... 0 β2        ]
      [ . .     . .         | .           .       ]
      [ 0 0 ... 1 α_{k1+1}  | 0 0 ... 0 β_{k1+1}  ]
A|W = [ 0 0 ... 0 0         | 0 0 ... 0 0         ]     (30)
      [ 0 0 ... 0 0         | 1 0 ... 0 0         ]
      [ 0 0 ... 0 0         | 0 1 ... 0 0         ]
      [ . .     . .         | .           .       ]
      [ 0 0 ... 0 0         | 0 0 ... 1 0         ]

Since the above matrix has a row of zeros, it has a zero eigenvalue. Observe from the matrix above that Av1, . . . , A^{k1+1} v1, A^{k2} v2 (i.e. the first k1 + 1 columns and the last column) form a linearly dependent set of vectors. Thus, there exist γ1, . . . , γ_{k1+1}, γ such that

γ1 Av1 + ... + γ_{k1+1} A^{k1+1} v1 + γ A^{k2} v2 = 0
⇒ A(γ1 v1 + ... + γ_{k1+1} A^{k1} v1 + γ A^{k2−1} v2) = 0.     (31)

Thus, γ1 v1 + ... + γ_{k1+1} A^{k1} v1 + γ A^{k2−1} v2 is an eigenvector associated with the zero eigenvalue and it lies outside V1 (since A^{k2−1} v2 ∈/ V1), which implies that W is decomposable, hence V is decomposable, a contradiction. Therefore, V must be cyclic.

Since A has only one eigenvalue and V is A−cyclic, pA(s) = mA(s) = (s − λ)^n. Moreover, there exists v such that Av = λv. Clearly, (A − λI)^n = 0 and (A − λI)^{n−1} ≠ 0. Thus, there exists v1 such that (A − λI)^{n−1} v1 ≠ 0 and (A − λI)(A − λI)^{n−1} v1 = 0. Hence, (A − λI)^{n−1} v1 is an eigenvector. Consider the set {v1, (A − λI)v1, . . . , (A − λI)^{n−1} v1}. If this set is dependent, then α0 v1 + ... + α_{n−1}(A − λI)^{n−1} v1 = 0, i.e. q(A)v1 = 0 for a non-zero polynomial q(s) of degree strictly less than n. The minimal polynomial of v1 then divides both q(s) and (s − λ)^n, so it equals (s − λ)^{n1} for some n1 ≤ n − 1. But (A − λI)^{n−1} v1 ≠ 0, a contradiction, so the set {v1, (A − λI)v1, . . . , (A − λI)^{n−1} v1} is linearly independent. Choosing {(A − λI)^{n−1} v1, (A − λI)^{n−2} v1, . . . , (A − λI)v1, v1} as a basis,

         [ λ 1 ... 0 0 ]
         [ 0 λ ... 0 0 ]
A = Jλ = [ . .     . . ]     (32)
         [ 0 0 ... λ 1 ]
         [ 0 0 ... 0 λ ]

3. Suppose A is a linear map on a vector space V. If V is indecomposable, we are in the previous item. Suppose it is decomposable. If it is cyclic, we are in item 1. Hence, suppose it is not cyclic. Let V1 be the maximal cyclic subspace of V. Then there exists V̂2 such that V = V1 ⊕ V̂2. Now follow the same procedure to decompose V̂2 = V2 ⊕ V̂3, where V2 is the maximal A−cyclic subspace of V̂2, and so on. Thus, V = V1 ⊕ V2 ⊕ ... ⊕ Vl where each Vi is A−cyclic. By item 1, there exists a basis obtained from cyclic vectors such that A can be decomposed into a block companion form

    [ A1 0  ... 0  ]
A = [ 0  A2 ... 0  ]     (33)
    [ .  .      .  ]
    [ 0  0  ... Al ]

where each Ai is in companion form (item 1). Moreover, from item 1, for each Ai, the minimal polynomial of Ai is equal to the characteristic polynomial of Ai and hence the geometric multiplicity of any eigenvalue of Ai is equal to one.
Claim: mA_{i+1} | mA_i, where i = 1, . . . , l − 1 (existence of the Rational canonical form).

Proof. We will show that mA2 | mA1; the remaining divisibilities can be proved using the exact same arguments. Suppose mA2 does not divide mA1. Therefore, mA2(s) has a factor, say (s − λ̂), which does not divide mA1(s).
Let V1 = < v1, Av1, . . . , A^{k1} v1 > be the maximal A−invariant cyclic subspace of V and let Av2 = λ̂v2 where v2 ∈ V2 and v2 ∈/ V1. Suppose A^{k1+1} v1 = α1 v1 + ... + α_{k1+1} A^{k1} v1. Thus, mA1(s) = s^{k1+1} − α_{k1+1} s^{k1} − ... − α1. Let v = v1 + v2. Note that {v1, Av1, . . . , A^{k1} v1, v2} is linearly independent. Consider the subspace spanned by these vectors and consider the matrix [v Av . . . A^{k1+1} v]. W.r.t. the chosen basis, it is represented as the following (k1 + 2) × (k1 + 2) matrix:

[ 1  0   ...  0          α1          ]
[ 0  1   ...  0          α2          ]
[ .  .        .          .           ]     (34)
[ 0  0   ...  1          α_{k1+1}    ]
[ 1  λ̂   ...  (λ̂)^{k1}   (λ̂)^{k1+1}  ]

Now the first k1 + 1 columns are linearly independent. The last column is dependent on the previous columns ⇔ (λ̂)^{k1+1} − α_{k1+1}(λ̂)^{k1} − ... − α1 = 0, i.e. mA1(λ̂) = 0. But λ̂ is not a root of mA1(s). Hence, the columns must be independent. Therefore, {v, Av, . . . , A^{k1+1} v} is independent. But this implies that there exists a cyclic subspace of dimension greater than the dimension of V1, contradicting the maximality of V1. Therefore, mA2 | mA1.

4. Consider V1. If it is indecomposable, then A1 has only one eigenvalue and we are in item 2. If it is decomposable, let W1 be its maximal indecomposable subspace and let V1 = W1 ⊕Wˆ 2. Now decompose Wˆ 2 = W2 ⊕ Wˆ 3 where W2 is the maximal indecomposable subspace of Wˆ 2 and so on. Thus, V1 = W1 ⊕ ... ⊕ Wm where each Wj is indecomposable. Therefore, using item 2, there exists a basis for V1 obtained from bases of Wjs such that A1 is in Jordan form where each eigenvalue of A1 has geometric multiplicity equal to one.

Similarly, other Ais can be decomposed into Jordan blocks and one obtains a full Jordan canonical form of A (existence of Jordan canonical form).

     [ Jλ1(1)  0       ...  0      ]
A1 = [ 0       Jλ2(1)  ...  0      ]
     [ .       .            .      ]
     [ 0       0       ...  Jλk(1) ]

where each Jλi(1) is of the form given in (32). Similarly, A2, . . . , Al are decomposed, giving a full Jordan canonical form

    [ Jλ1(1)  0       ...  0            0      ]
    [ 0       Jλ2(1)  ...  0            0      ]
A = [ .       .            .            .      ]
    [ 0       0       ...  Jλ_{k−1}(l)  0      ]
    [ 0       0       ...  0            Jλk(l) ]

5. The minimal polynomial of A is equal to its characteristic polynomial ⇒ there exists a cyclic vector.

Lemma 3.15. Let v, v1, v2 ∈ V where v = v1 ⊕ v2, and let pv, pv1, pv2 be their minimal polynomials respectively w.r.t. a linear operator A : V → V. Then pv = lcm(pv1, pv2).

Proof. 0 = pv(A)v = pv(A)v1 ⊕ pv(A)v2. Therefore pv(A)v1 = 0 and pv(A)v2 = 0. Hence, pv1 | pv and pv2 | pv. Consequently, lcm(pv1, pv2) | pv.
Let qv = lcm(pv1, pv2). Then qv(A)v = qv(A)v1 ⊕ qv(A)v2 = 0. Therefore, pv | qv. Thus, pv = qv.

Suppose the minimal polynomial of A is equal to its characteristic polynomial. Therefore, the geometric multiplicity of each eigenvalue is one. Let vi be the generators of the indecomposable subspaces Wi. Let v = v1 ⊕ ... ⊕ vm. Since pvi(s) = (s − λi)^{ki}, i = 1, . . . , m, from the above lemma, lcm(pv1, . . . , pvm) = pv1 . pv2 . . . pvm is an n−th degree polynomial. Thus, v = v1 ⊕ ... ⊕ vm is a generator for the entire vector space. Therefore, from item 1 and item 5, there exists a cyclic vector of A ⇔ mA(s) = pA(s).

Note that from item 5, we know how to generate cyclic subspaces using the Jordan canonical form of A. Moreover, from item 5, if the geometric multiplicity of every eigenvalue is one, then there exists a cyclic vector, which implies mA(s) = pA(s). Thus, combining with Lemma 3.12, mA(s) = pA(s) ⇔ the geometric multiplicity of every eigenvalue of A is equal to one.

6. (Uniqueness) Suppose two matrices A and B have the same Rational/Jordan canonical forms. Therefore, there exist S and T such that T^{-1}AT = S^{-1}BS. Hence, A and B are similar. Conversely, if A and B are similar, then they are related by a similarity transform B = T^{-1}AT. Using the Rational and Jordan canonical forms of B and the similarity transform T, A also has the same Rational and Jordan canonical forms.

To summarize, we have proved the following

1. algebraic multiplicity = geometric multiplicity for all eigenvalues of A ⇔ A is diagonalizable. If A is diagonalizable, then mA(s) is a product of distinct linear factors. Conversely, if mA(s) is a product of distinct linear factors, then all Jordan blocks of A must be trivial. Hence, A is diagonalizable.

2. algebraic multiplicity > geometric multiplicity for at least one eigenvalue of A ⇔ A is not diagonalizable.

3. algebraic multiplicity = geometric multiplicity ⇒ mA(s) ≠ pA(s) (converse not true in general).

4. geometric multiplicity for all eigenvalues of A is equal to one ⇔ mA(s) = pA(s) ⇔ there exists a cyclic vector.

5. There exists a Rational canonical form and a Jordan canonical form for each n × n matrix which does not change under similarity transforms (existence and uniqueness). We refer the reader to Appendix on how to compute a similarity transformation matrix to obtain the Jordan canonical form.

4 Projection operators and Quotient spaces

4.1 Projection

Let V = V1 ⊕ V2. Then P : V → V1 is said to be a projection onto V1 along V2 if Pv1 = v1 for all v1 ∈ V1 and Pv2 = 0 for all v2 ∈ V2. Note that P² = P and P has the matrix representation

P = [ I 0 ]
    [ 0 0 ]

w.r.t. bases of V1 and V2. If v = v1 ⊕ v2, then Pv = Pv1 ⊕ Pv2 = v1. Note that I − P is a projection onto V2 along V1.
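A small NumPy sketch of a projection onto V1 along V2 in R3; the spanning sets B1 and B2 below are arbitrary choices, not from the notes.

```python
import numpy as np

B1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])              # columns span V1
B2 = np.array([[0.0],
               [0.0],
               [1.0]])                   # column spans V2, with V1 + V2 = R^3
B = np.column_stack([B1, B2])            # basis of R^3 adapted to V1 (+) V2

D = np.diag([1.0, 1.0, 0.0])             # identity on V1, zero on V2
P = B @ D @ np.linalg.inv(B)             # the projection in standard coordinates

print(np.allclose(P @ P, P))                               # P^2 = P
print(np.allclose(P @ B1, B1), np.allclose(P @ B2, 0.0))   # fixes V1, annihilates V2
```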

4.2 Quotient spaces, invariant spaces and induced maps

Let W be a subspace of V. Define an equivalence relation on V as follows. Suppose v1, v2 ∈ V. Then v1 ∼ v2 ⇔ v1 − v2 ∈ W. This partitions V into equivalence classes and each equivalence class is represented by an element in that class. If v ∈ V, then v + W represents the set of all vectors which are equivalent to v. (The equivalence classes are translates of the subspace W.) Let {w1, . . . , wm, v1, . . . , vk} be a basis for V and {w1, . . . , wm} be a basis for W. Let v̄i be a representative of vi + W for 1 ≤ i ≤ k. These equivalence classes form a vector space V/W spanned by the v̄i (1 ≤ i ≤ k), which also form a basis for V/W. There is a natural projection map P : V → V/W where P : vi 7→ v̄i (1 ≤ i ≤ k) and P : wi 7→ 0.

Example 4.1. Let p(x) = x^n + a_{n−1}x^{n−1} + ... + a1x + a0 ∈ C[x]. Let λ1, . . . , λn be the roots of this polynomial. Consider C[x] modulo p(x). This is an n dimensional vector space with basis vectors {1̄, x̄, . . . , x̄^{n−1}}. Consider the following linear map on this vector space, which is multiplication by x̄:

x̄ : 1̄ 7→ x̄
x̄ : x̄ 7→ x̄²
.
.
x̄ : x̄^{n−1} 7→ −a0.1̄ − a1x̄ − ... − a_{n−1}x̄^{n−1}

The matrix representation of the linear map x̄ is

    [ 0 0 ... 0 −a0       ]
    [ 1 0 ... 0 −a1       ]
A = [ 0 1 ... 0 −a2       ]
    [ . .     .  .        ]
    [ 0 0 ... 1 −a_{n−1}  ]

Moreover, x̄^n = −a0.1̄ − a1x̄ − ... − a_{n−1}x̄^{n−1}. Therefore, x̄^n + a_{n−1}x̄^{n−1} + ... + a0 = 0. Since this is an n−th degree polynomial and the matrix representation of A is of size n × n, this is the characteristic polynomial of A; hence, the eigenvalues of A are λ1, . . . , λn.
Note that the vector representation of x̄ is [0 1 ... 0]^T, which is a representation of x̄ as a basis vector; whereas multiplication by x̄ is a linear map on C[x] modulo p(x) and, as a linear map, the representation of x̄ is given above as an n × n matrix in companion form.
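Example 4.1 is easy to reproduce numerically: the eigenvalues of the companion matrix of p are the roots of p. A short NumPy sketch with an arbitrary cubic p(x) = (x − 1)(x − 2)(x − 3) follows.

```python
import numpy as np

a = np.array([-6.0, 11.0, -6.0])          # a_0, a_1, a_2 of p(x) = x^3 - 6x^2 + 11x - 6
n = a.size

A = np.zeros((n, n))                      # companion form as displayed above
A[1:, :-1] = np.eye(n - 1)                # ones on the subdiagonal
A[:, -1] = -a                             # last column: -a_0, ..., -a_{n-1}

print(np.sort(np.linalg.eigvals(A).real)) # approximately [1. 2. 3.], the roots of p
```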

Let A : V → V and let V1 be an A−invariant subspace. Let P : V → V/V1 be the natural projection map. Then A induces a natural map Ā : V/V1 → V/V1 such that ĀP = PA, i.e. for all v ∈ V, ĀPv = PAv. (Observe that PA : V → V/V1 and ĀP : V → V/V1.) Let v1, . . . , vn be a basis for V and let A : V → V. Let W = < v1, . . . , vk > = ker(A). Thus, from the rank-nullity theorem, Avi (k + 1 ≤ i ≤ n) form a basis for Im(A). Observe that v̄i (k + 1 ≤ i ≤ n) form a basis for V/W, and the linear map C : V/W → Im(A) such that Cv̄i = Avi (k + 1 ≤ i ≤ n) is 1 − 1 and onto. Thus, Im(A) is isomorphic to V/ker(A).

5 Inner products, norms, Cauchy-Schwartz, least squares

Definition 5.1. An inner product on a vector space V is a function ⟨., .⟩ : V × V → F (F = R or C) such that the following properties are satisfied.

• ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ > 0 if v ≠ 0 (positivity).

• ⟨c1v1 + c2v2, w⟩ = c1⟨v1, w⟩ + c2⟨v2, w⟩ (linearity in the 1st variable).

• ⟨w, v⟩ is the complex conjugate of ⟨v, w⟩ (conjugate symmetry).

The second and the third properties imply that ⟨v, c1w1 + c2w2⟩ = c̄1⟨v, w1⟩ + c̄2⟨v, w2⟩ (conjugate linearity in the 2nd variable).

Definition 5.2. A norm on a vector space V is a function ∥.∥ : V → R such that the following properties are satisfied for all v, w ∈ V:

• ∥αv∥ = |α|∥v∥.

• ∥v∥ ≥ 0 and ∥v∥ = 0 ⇔ v = 0. (positivity)

• ∥v + w∥ ≤ ∥v∥ + ∥w∥. (triangle inequality)

We denote ⟨v, v⟩ by ∥v∥² by abuse of notation. We will see later that inner products indeed define a norm. It is clear that ⟨av, av⟩ = |a|²⟨v, v⟩ and ⟨v, v⟩ ≥ 0 by the properties of inner products. We need to show that the triangle inequality is satisfied by inner products. This will be proved using the Cauchy-Schwartz inequality.

5.1 Orthogonal Projection, C-S, G-S

Let v ∈ V and let v̂ = v/∥v∥ be a unit vector in the direction of v. Let w ∈ V. Consider ⟨w, v̂⟩v̂. This is the component of w along the unit vector v̂.

⟨w − ⟨w, v̂⟩v̂, v̂⟩ = ⟨w, v̂⟩ − ⟨w, v̂⟩∥v̂∥² = 0.     (35)

Thus w − ⟨w, v̂⟩v̂ is orthogonal to v̂, hence orthogonal to ⟨w, v̂⟩v̂. This is called orthogonal projection along a vector. By orthogonality, ∥w∥² = ∥⟨w, v̂⟩v̂∥² + ∥w − ⟨w, v̂⟩v̂∥² (Pythagoras theorem). This gives rise to the Cauchy-Schwartz inequality as follows. Consider the vector w − ⟨w, v̂⟩v̂:

0 ≤ ∥w − ⟨w, v̂⟩v̂∥² = ⟨w − ⟨w, v̂⟩v̂, w − ⟨w, v̂⟩v̂⟩     (36)

From Equation (35), w − ⟨w, v̂⟩v̂ is orthogonal to v̂, so

0 ≤ ⟨w − ⟨w, v̂⟩v̂, w⟩ = ⟨w, w⟩ − ⟨w, v̂⟩⟨v̂, w⟩ = ∥w∥² − |⟨w, v⟩|²/∥v∥².     (37)

Hence |⟨w, v⟩| ≤ ∥w∥∥v∥. Note that if w = αv, then |⟨w, v⟩| = |⟨αv, v⟩| = ∥w∥∥v∥.

Observe that the Cauchy-Schwartz inequality is derived from the Pythagorean principle, c² = b² + a² for right angled triangles, and b² = c² − a² ≥ 0, which is used in the arguments above. As a consequence of Cauchy-Schwartz, we have the following triangle inequality.

∥v + w∥² = ⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨w, w⟩ + 2Re{⟨v, w⟩} ≤ ∥v∥² + ∥w∥² + 2|⟨v, w⟩| ≤ ∥v∥² + ∥w∥² + 2∥v∥∥w∥ = (∥v∥ + ∥w∥)²     (38)

Thus, ∥v +w∥ ≤ ∥v∥+∥w∥ (triangle inequality). Therefore, inner products define a norm on vector spaces.

The orthogonal projection leads to the Gram-Schmidt orthonormalization procedure, which produces an orthonormal basis as its output given any arbitrary basis as its input. Suppose the columns v1, . . . , vn of a matrix A form a basis of V.

1. Let q1 = v1/∥v1∥.

2. Let q̂2 = v2 − ⟨v2, q1⟩q1. By the previous arguments, q̂2 ⊥ q1. Let q2 = q̂2/∥q̂2∥.

3. Let q̂3 = v3 − ⟨v3, q1⟩q1 − ⟨v3, q2⟩q2 and q3 = q̂3/∥q̂3∥. Similarly, obtain q4, . . . , qn.

Thus A = QR where q1, . . . , qn are columns of Q and R is an upper triangular matrix obtained by the Gram-Schmidt procedure.
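A hedged sketch of the Gram-Schmidt steps above in NumPy, verifying that the resulting Q has orthonormal columns and that A = QR with R = QᵀA upper triangular; the matrix A is a random illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))              # columns v1, v2, v3 (a basis of a subspace)

Q = np.zeros_like(A)
for j in range(A.shape[1]):
    q_hat = A[:, j].copy()
    for i in range(j):                       # subtract projections on earlier q_i
        q_hat -= (A[:, j] @ Q[:, i]) * Q[:, i]
    Q[:, j] = q_hat / np.linalg.norm(q_hat)

R = Q.T @ A                                  # upper triangular since <q_i, v_j> = 0 for i > j
print(np.allclose(Q.T @ Q, np.eye(3)))       # orthonormal columns
print(np.allclose(Q @ R, A))                 # A = QR
```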

For any subspace W ⊂ V, there exists its orthogonal complement W⊥ such that all vectors in W⊥ are orthogonal to all vectors in W and W ⊕ W⊥ = V.

Lemma 5.3. Im(AT)⊥ = ker(A) and Im(A)⊥ = ker(AT).

Proof. For a linear operator A, ker(A) is a subspace orthogonal to row span of A which is range of AT. Hence ker(A) ⊂ Im(AT)⊥. Conversely, if y ∈ Im(AT)⊥, Ay = 0 and Im(AT)⊥ ⊂ ker(A). Thus, Im(AT)⊥ = ker(A). Similarly, Im(A)⊥ = ker(AT).

Lemma 5.4. Rank(AT) = Rank(A).

Proof. Suppose A : V → W where dim(V) = n and dim(W) = m. Thus AT : W∗ → V∗. By Rank-Nullity, dim(ker(A))+ rank(A) = n and dim(ker(AT))+ rank(AT) = m. Column rank of A is n − k where k is the dimension of ker(A). Hence, the dimension of Im(AT)⊥ is also k. But Im(AT) lies in V∗ which has dimension n. Hence, the dimension of Im(AT) is n− dim(Im(AT)⊥) = n − k. Therefore, the column rank of AT = n − k.

Thus, the row rank of A (i.e. the number of independent rows of A) is equal to the column rank of A.

5.2 Hermitian matrices, orthogonal and unitary matrices, positive definite matrices

If O is orthogonal, then OO^T = O^T O = I. Orthogonal matrices preserve the norm of a vector and the inner product between two vectors for real inner product spaces, i.e. ⟨Ov, Ow⟩ = ⟨v, w⟩. A unitary matrix U satisfies UU∗ = U∗U = I and it preserves complex inner products, i.e. ⟨Uv, Uw⟩ = ⟨v, w⟩.

Theorem 5.5. If A is Hermitian/symmetric, then A satisfies the following properties:

• Eigenvalues of A are real.

• Eigenvectors associated with two distinct eigenvalues are orthogonal.

• A is unitarily/orthogonally diagonalizable.

Proof. Let Av = λv; hence v∗A∗ = λ̄v∗ and, since A is Hermitian, v∗A = λ̄v∗. Thus, v∗Av = λv∗v = λ̄v∗v. Hence λ = λ̄, so λ is real. Since real symmetric matrices are Hermitian too, their eigenvalues are real.
Let Av1 = λ1v1 and Av2 = λ2v2 (λ1 ≠ λ2). Therefore, v2∗Av1 = λ1v2∗v1 = λ2v2∗v1. Since λ1 ≠ λ2, v2∗v1 = 0. Thus, eigenvectors associated with two distinct eigenvalues are orthogonal for both Hermitian and symmetric matrices.
Let Av1 = λ1v1 and extend v1 to an orthonormal basis of V. Let the columns of V be these basis vectors. Therefore, V∗ = V^{-1}. Consider V∗AV = A1. Clearly, A1 is Hermitian. Moreover, V∗AV e1 = V∗Av1 = λ1V∗v1 = λ1e1. Therefore, λ1e1 is the first column of A1. Since A1 is Hermitian, the first row of A1 is λ1e1∗. Thus,

     [ λ1 0 ... 0 0 ]
     [ 0  ∗ ... ∗ ∗ ]
A1 = [ 0  ∗ ... ∗ ∗ ]
     [ .  .     . . ]
     [ 0  ∗ ... ∗ ∗ ]

Since A1 is Hermitian, the (n − 1) × (n − 1) Hermitian block is diagonalizable by induction. The proof for symmetric matrices is exactly similar.

Theorem 5.6. Every A ∈ Cn×n is unitarily similar to an upper triangular matrix.

Proof. We proceed by induction. For n = 1, the case is trivial. Assume true for n − 1. Let v1 be an eigenvector of A and extend it to an orthonormal basis {v1, . . . , vn}. Suppose the columns of V are given by {v1, . . . , vn}. Therefore,

V∗AV = [ λ1  ∗ ]
       [ 0   B ]

Now applying induction on B, the theorem follows.

Now applying induction on B, the theorem follows.

Definition 5.7. A matrix P is said to be positive definite if x∗P x > 0 (xTP x > 0) for all non-zero x ∈ Cn(x ∈ Rn).

As a consequence, all eigenvalues of P and all principal minors are positive. Positive definite matrices are invertible since all eigenvalues are positive. Positive semidefinite matrices are those which satisfy x∗P x ≥ 0. Their eigenvalues are greater than or equal to zero.

Lemma 5.8. P > 0 ⇒ P is Hermitian.

Proof. Let P = (1/2)(P + P∗) + (1/2)(P − P∗). Let A = (1/2)(P + P∗) and B = (1/2i)(P − P∗). Clearly, P = A + iB where both A, B are Hermitian. Thus, x∗Px = x∗Ax + i x∗Bx > 0. Since both A and B are Hermitian, x∗Ax and x∗Bx are real. Thus, for x∗Px to be real and positive, x∗Bx = 0 for all x ∈ Cn. Hence, B = 0. Therefore, P = A, hence, P is Hermitian.
Likewise, P ≥ 0 also implies that P is Hermitian. Let P be a real matrix. Let P = (1/2)(P + P^T) + (1/2)(P − P^T). Let A = (1/2)(P + P^T) and B = (1/2)(P − P^T). Observe that B = −B^T, so x^T B x = (x^T B x)^T = −x^T B x. Hence, x^T B x = 0. Thus, x^T P x > 0 ⇔ x^T A x > 0. Thus, there are positive definite real matrices which are not symmetric. However, they are not positive definite as complex matrices. For example, the non-symmetric matrix

P = [ 1   10 ]
    [ −10  2 ]

is positive definite as a real matrix but not as a complex matrix. In general, it is always assumed that real positive definite matrices are symmetric.

Rayleigh quotient: For Hermitian matrices, we define the Rayleigh quotient function as follows:

R(P, x) := (x∗Px)/(x∗x)     (39)

It is clear that the following inequality is satisfied:

min_{x∈Cn} (x∗Px)/(x∗x) ≤ (x∗Px)/(x∗x) ≤ max_{x∈Cn} (x∗Px)/(x∗x).     (40)

Let v1, . . . , vn be eigenvectors of P, i.e. Pvi = λivi. Let x = Σ_{i=1}^n αivi. Therefore, Px = Σ_{i=1}^n αiλivi and x∗Px = Σ_{i=1}^n λi|αi|². Thus, (x∗Px)/(x∗x) = (Σ_{i=1}^n λi|αi|²)/(Σ_{i=1}^n |αi|²). Since λmin(P) ≤ λi ≤ λmax(P) for 1 ≤ i ≤ n, this implies that λmin(P) ≤ (Σ λi|αi|²)/(Σ |αi|²) ≤ λmax(P). Therefore, we have the following inequality:

λmin(P) ≤ (x∗Px)/(x∗x) ≤ λmax(P).     (41)

We saw that Hermitian matrices are unitarily diagonalizable. Suppose A is unitarily diagonalizable, i.e. U∗AU = D. Then A = UDU∗ and A∗ = UD∗U∗. Thus, AA∗ = UDD∗U∗ = UD∗DU∗ = UD∗U∗UDU∗ = A∗A.

Definition 5.9. A is said to be normal if AA∗ = A∗A.

Theorem 5.10. A is normal ⇔ A is unitarily diagonalizable.

Proof. (⇐) Obvious from the statement above the definition of normal matrices.
(⇒) We proceed by induction. True for n = 1; assume true for n − 1. Let v1 be an eigenvector of A and extend it to an orthonormal basis {v1, . . . , vn} which forms the columns of a matrix V. Thus

V∗AV = [ λ1  w∗ ] = A1.
       [ 0   B  ]

Hence, A = VA1V∗ and A∗ = VA1∗V∗. Since AA∗ = A∗A, VA1A1∗V∗ = VA1∗A1V∗. Thus A1 is normal.

A1A1∗ = [ λ1  w∗ ] [ λ̄1  0  ] = [ |λ1|² + w∗w   w∗B∗ ]     (42)
        [ 0   B  ] [ w    B∗ ]   [ Bw            BB∗  ]

A1∗A1 = [ λ̄1  0  ] [ λ1  w∗ ] = [ |λ1|²   λ̄1 w∗        ]     (43)
        [ w   B∗ ] [ 0   B  ]   [ λ1 w    w w∗ + B∗B   ]

Equating the (1, 1) entries, w∗w = 0. This implies that w = 0 and BB∗ = B∗B. Hence,

A1 = [ λ1  0 ]
     [ 0   B ]

Now since B is normal, it is unitarily diagonalizable by induction.
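A quick numerical check of the Rayleigh quotient bounds (41) for a randomly generated Hermitian matrix (illustrative only).

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
P = B + B.conj().T                           # Hermitian
lam = np.linalg.eigvalsh(P)                  # real eigenvalues, ascending

x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
R = (x.conj() @ P @ x).real / (x.conj() @ x).real
print(lam[0] <= R <= lam[-1])                # lambda_min <= R(P, x) <= lambda_max
```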

5.3 Least squares solution and overdetermined systems

Analytical solution: Suppose A ∈ Rm×n (m > n) is a full column rank matrix. We want to solve the following unconstrained optimization problem:

min_{x∈Rn} (1/2)∥Ax − b∥²     (44)

Let Φ(x) = (1/2)(Ax − b)^T(Ax − b). From the first order optimality conditions,

∇x(Φ(x)) = A^T A x − A^T b = 0 ⇒ A^T A x = A^T b     (45)

Since A has full column rank, A^T A is positive definite, hence invertible. Therefore, x = (A^T A)^{-1} A^T b is a candidate for a minimizing solution. Moreover, the Hessian of Φ(x) is equal to A^T A which is positive definite. Therefore, by the second order sufficient conditions, xls = (A^T A)^{-1} A^T b is the least squares solution.
Let Axls = b̂. Then

(b − b̂)^T b̂ = b^T A(A^T A)^{-1} A^T b − b^T A(A^T A)^{-1} A^T A(A^T A)^{-1} A^T b = 0.     (46)

Thus, (b − ˆb) ⊥ ˆb and ˆb is an orthogonal projection of b. Therefore, the vector closest to b in the column span of A is obtained by orthogonal projection of b onto the column span of A. Geometric solution: Let A = QR be the decomposition of A when Gram-Schmidt procedure is applied to the columns of A. If b is not in the column span of A, take orthogonal projection of b onto the column span of A. Columns of Q are orthogonal and span the column span of A.

Q^T Q R x = Q^T b ⇒ Rx = Q^T b.     (47)

Since R is upper triangular, xls is obtained by back-substitution. Note that the analytical method does not work when A does not have full column rank. But the geometric method still works as it does not involve inverse of any matrix.
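Both routes to the least squares solution are easy to compare numerically; a sketch with a random full-column-rank A follows (np.linalg.lstsq is used only as a reference).

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))              # m > n, full column rank (generically)
b = rng.standard_normal(6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b) # normal equations (A^T A) x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))        # True

b_hat = A @ x_normal                          # orthogonal projection of b onto Im(A)
print(np.isclose((b - b_hat) @ b_hat, 0.0))   # residual is orthogonal to b_hat
```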

5.4 Least norm solution and under-determined systems

Suppose A ∈ Rm×n (m < n) is a full row rank matrix. We want to solve the following constrained optimization problem:

min (1/2)∥x∥²  subject to  Ax = b     (48)

Using Lagrange multipliers, convert this to an unconstrained optimization problem with cost function L(x, λ) = (1/2)∥x∥² − λ^T(Ax − b). The problem now is to minimize L(x, λ) over x ∈ Rn, λ ∈ Rm. From the first order optimality conditions,

∇x(L(x, λ)) = x − A^T λ = 0     (49)

∇λ(L(x, λ)) = Ax − b = 0.     (50)

Substituting Equation (49) into Equation (50), A A^T λ = b. Since A has full row rank, A A^T is positive definite, hence invertible. Thus, λ = (A A^T)^{-1} b and hence xln = A^T (A A^T)^{-1} b is a candidate for an optimal solution. Let x̂ be any other solution of Ax = b. We will show that x̂ − xln is orthogonal to xln:

xln^T(x̂ − xln) = b^T (A A^T)^{-1} A x̂ − b^T (A A^T)^{-1} A xln
              = b^T (A A^T)^{-1} b − b^T (A A^T)^{-1} b = 0.

Thus, xln ⊥ x̂ − xln. Therefore, using x̂ = (x̂ − xln) + xln and the orthogonality relation above, ∥x̂∥² = ∥x̂ − xln∥² + ∥xln∥². Thus, ∥x̂∥ ≥ ∥xln∥. Therefore, xln is the least norm solution.
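A numerical sketch of the least norm solution for a random full-row-rank A, comparing against the pseudoinverse and against another feasible solution.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 6))                  # m < n, full row rank (generically)
b = rng.standard_normal(3)

x_ln = A.T @ np.linalg.solve(A @ A.T, b)         # x_ln = A^T (A A^T)^{-1} b
print(np.allclose(A @ x_ln, b))                  # x_ln solves Ax = b
print(np.allclose(x_ln, np.linalg.pinv(A) @ b))  # agrees with the pseudoinverse solution

x_other = x_ln + null_space(A)[:, 0]             # another solution: add a kernel vector
print(np.linalg.norm(x_other) >= np.linalg.norm(x_ln))   # True: x_ln has least norm
```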

5.5 Operator norms and SVD

Definition 5.11. A matrix norm is a function ∥.∥ : Fm×n → R which satisfies the following properties: For all α ∈ F and A, B ∈ Fm×n,

• ∥αA∥ = |α|∥A∥.

• ∥A∥ ≥ 0 and ∥A∥ = 0 ⇔ A = 0. (positivity)

• ∥A + B∥ ≤ ∥A∥ + ∥B∥. (triangle inequality)

Moreover, if m = n, then

• ∥AB∥ ≤ ∥A∥∥B∥. (sub-multiplicativity)

A norm on the vector space V induces a norm on linear operators A : V → V:

∥A∥ = sup_{x∈V, x≠0} ∥Ax∥/∥x∥.     (51)

If A : V → W, then

∥A∥ := sup_{x∈V, x≠0} ∥Ax∥W/∥x∥V     (52)

where ∥.∥W is a norm on W and ∥.∥V is a norm on V. Equation (51) can be re-written as follows:

∥A∥ = max_{x∈V, x≠0, ∥x∥=1} ∥Ax∥.     (53)

Since the norm is a continuous function and {x : ∥x∥ = 1} is a compact set (closed and bounded), the maximum is attained. Hence, there exists v ∈ V such that ∥v∥ = 1 and ∥A∥ = ∥Av∥ = max_{x∈V, x≠0, ∥x∥=1} ∥Ax∥. It follows from Equation (51) that

∥A∥ ≥ ∥Ax∥/∥x∥ ⇒ ∥Ax∥ ≤ ∥A∥∥x∥.     (54)

We now check that the induced norm satisfies the properties mentioned in the definition of the matrix norm. Clearly, the first and second are obvious. To check the triangle inequality, let ∥A + B∥ = ∥(A + B)y∥ where ∥y∥ = 1. Due to the triangle inequality of the vector norm and Equation (54),

∥A + B∥ = ∥(A + B)y∥ = ∥Ay + By∥ ≤ ∥Ay∥ + ∥By∥ ≤ ∥A∥ + ∥B∥.

Let ∥AB∥ = ∥(AB)z∥ where ∥z∥ = 1.

∥AB∥ = ∥(AB)z∥ = ∥A(Bz)∥ ≤ ∥A∥∥Bz∥ ≤ ∥A∥∥B∥.

Thus, the induced norm is indeed a matrix norm. We now find an SVD for a given matrix. Suppose we have a standard inner product on V which gives a 2−norm.

Theorem 5.12 (Existence and uniqueness of SVD). Let A ∈ Cm×n. Then there exist unitary matrices U ∈ Cm×m and V ∈ Cn×n such that

        [ σ1 0  ... 0  0 ... 0 ]
        [ 0  σ2 ... 0  0 ... 0 ]
        [ .  .      .  .     . ]
U∗AV =  [ 0  0  ... σk 0 ... 0 ]     (55)
        [ 0  0  ... 0  0 ... 0 ]
        [ .  .      .  .     . ]
        [ 0  0  ... 0  0 ... 0 ]

where σi ≥ σi+1 for 1 ≤ i ≤ k − 1 and σi ∈ R for 1 ≤ i ≤ k. These σis are called the singular values of A and the above decomposition is called the singular value decomposition (SVD) of A. Furthermore, the singular values σi are uniquely determined and, if A is square and the σi are distinct, then the columns of U and V are uniquely determined up to multiplication by unit modulus complex numbers e^{iθ}.

Proof. Note that ∥A∥2 = ∥Ax∥2 for some x ∈ V such that ∥x∥2 = 1. Let Ax = σ1y1 where σ1 = ∥A∥2 and ∥y1∥2 = 1. Extend {x} to an orthonormal basis and let the columns of V = [x V1] denote this orthonormal basis. Similarly, extend y1 to an orthonormal basis and let the columns of U = [y1 U1] denote this orthonormal basis. Consider U∗AV = A1:

U∗AV = [ y1∗ ] A [ x V1 ] = [ y1∗ ] [ σ1y1  AV1 ] = [ σ1 y1∗y1   y1∗AV1 ] = [ σ1  w∗ ]     (56)
       [ U1∗ ]              [ U1∗ ]                 [ σ1 U1∗y1   U1∗AV1 ]   [ 0   B  ]

Note that unitary matrices preserve vector norms. Therefore, ∥A1∥2 = ∥U∗AV∥2 = ∥A∥2 = σ1. Observe that

A1 [ σ1 ] = [ σ1² + w∗w ]
   [ w  ]   [ Bw        ]

⇒ σ1² = ∥A1∥2² ≥ ∥A1 [σ1; w]∥2² / ∥[σ1; w]∥2² = ((σ1² + ∥w∥²)² + ∥Bw∥²) / (σ1² + ∥w∥²) ≥ σ1² + ∥w∥².

Therefore, w = 0 and

A1 = [ σ1  0 ]
     [ 0   B ]

Now by induction, B can be transformed into the canonical form. Let σ2 = ∥B∥2. Since ∥A∥2 ≥ ∥B∥2, σ1 ≥ σ2 and σi ≥ σi+1 for 1 ≤ i ≤ k − 1.
Uniqueness: Observe that AA∗ = U(ΣΣ^T)U∗ and A∗A = V(Σ^TΣ)V∗. Thus, the singular values of A can be obtained from the square roots of the eigenvalues of AA∗ or A∗A, implying uniqueness of the singular values. (However, this is not the way singular values are computed numerically.) The statement about the uniqueness of U and V follows from the corresponding result on eigenvectors.

Remark 5.13. Notice that UΣ = AV. Thus, UΣei = AVei ⇒ σiui = Avi. These ui, vi are called the pair of singular vectors associated with the singular value σi. Geometrically, when A is full rank, it maps the sphere formed by the unit vectors associated with the columns of V into an ellipse whose axes are given by the columns of U scaled by the singular values of A.
For Hermitian matrices, U∗AU = D where D is a diagonal matrix containing the eigenvalues of A. We now obtain an SVD for A. Choose the columns of the matrix V as follows: let vi = ui if λi > 0 and vi = −ui if λi < 0 for 1 ≤ i ≤ n. Thus, U∗AV = Σ where Σ contains the moduli of the eigenvalues of A on its diagonal. Therefore, the singular values of a Hermitian matrix are given by the moduli of its eigenvalues. For positive definite matrices P, U∗PU = D = Σ and the eigenvalues and singular values are one and the same.
Geometrically, one can think of SVDs as follows. Consider the action of A on unit vectors, i.e. the action of A on the unit sphere Sn. ASn = UΣV∗Sn. Since V∗ is unitary, V∗Sn ⊆ Sn. Now Σ maps Sn into an ellipse and U changes the orientation of this ellipse. The maximum elongation is along u1, which forms the major axis of the ellipse.
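The main facts about the SVD proved above can be checked numerically; the random matrix is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vh, A))                            # A = U Sigma V*
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))  # sigma_i^2 = eig(A^T A)
print(np.isclose(s[0], np.linalg.norm(A, 2)))                         # ||A||_2 = sigma_1
```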

6 Smith form

Definition 6.1. A polynomial matrix U(s) ∈ Cn×n[s] is said to be unimodular if U −1(s) ∈ Cn×n[s]. (Entries of a matrix U(s) ∈ Cn×n[s] are polynomials i.e., U(s)(i, j) ∈ C[s] for 1 ≤ i, j ≤ n.)

We show that for all matrices in Cn×n[s], there exists a Smith normal form. Consider a matrix   s − a11 a12 . . . a1n    a21 s − a22 . . . a2n    sI − A =  ......   ......  an1 an2 . . . s − ann

20 1. Using row and column permutations, bring any element which has the lowest polynomial degree to (1, 1) position. Thus, A1(s) = U1(s)(sI − A)V1(s) where U1(s) and V1(s) are unimodular.

2. Using this entry, do elementary row and column operations such that   ∗ 0 ... 0    0 ∗ ... ∗    A2(s) = U2(s)A1(s)V2(s) =  ......   ......  0 ∗ ... ∗

3. Repeat step 1 if the (1, 1)th entry is not the lowest degree polynomial among all non-zero entries of A2(s) and again follow step 2.

4. Let d1(s) be the lowest degree polynomial at the (1, 1) position which divides all other entries:

                              [ d1(s) 0 ... 0 ]
Ak(s) = Uk(s)Ak−1(s)Vk(s) =   [ 0     ∗ ... ∗ ]
                              [ .     .     . ]
                              [ 0     ∗ ... ∗ ]

5. Now follow the same steps for the lower dimensional block. Hence,   d1(s) 0 ... 0    0 d2(s) ... 0    Σ(s) = U(s)A(s)V (s) =  ......   ......  0 0 . . . dn(s)

where di|di+1 (by construction). Σ(s) is called the Smith-form of sI −A and pA(s) = det(sI − −1 −1 A) = det(U (s)Σ(s)V (s)) = d1(s).d2(s). . . . .dn(s). Smith-form is well defined for non- square polynomial matrices A(s) too as follows.   d1(s) 0 ... 0 0 ... 0    0 d2(s) ... 0 0 ... 0     ......     ......  Σ(s) = U(s)A(s)V (s) =    0 0 . . . dk(s) 0 ....     0 0 ... 0 0 ... 0    ...... 0 0 ... 0 0 ... 0

Note that the blocks of zeros may or may not be there depending on the rank of A(s).

Theorem 6.2. A ∈ Cn×n and B ∈ Cn×n are similar ⇔ (sI − A) and (sI − B) have the same Smith form.

Proof. (⇒) Suppose A = T^{-1}BT. Let U1(s)(sI − A)V1(s) = Σ(s). Let U2(s) = U1(s)T^{-1} and V2(s) = TV1(s). Therefore, U2(s)(sI − B)V2(s) = sU1(s)V1(s) − U1(s)AV1(s) = U1(s)(sI − A)V1(s) = Σ(s). Thus, (sI − A) and (sI − B) have the same Smith form.

21 −1 (⇐) Suppose U1(s)(sI − A)V1(s) = U2(s)(sI − B)V2(s) = Σ(s). Let U(s) = U2(s) U1(s) and −1 V (s) = V1(s)V2(s) . Therefore,

U(s)(sI − A)V (s) = sI − B ⇒ s(U(s)V (s) − I) = U(s)AV (s) − B (57)

Since lhs is a matrix polynomial in s of degree one, and rhs involves a constant matrix B, this is true iff both lhs and rhs are zero i.e., U(s)V (s) = I and U(s)AU(s)−1 = B. Thus, A and B are similar under a unimodular transformation. m −1 k Let U(s) = s Um + ... + sU1 + U0 and U (s) = s Vk + ... + sV1 + V0. Let T = U(0) obtained by substituting s = 0 in the polynomial expression for U(s). Thus, T AT −1 = B.

Corollary 6.3. A, B are similar ⇔ A and B have the same Rational and Jordan canonical forms ⇔ (sI − A) and (sI − B) have the same Smith form.

6.1 Smith form and invariant factors Suppose A has only one Jordan block with one eigenvalue λ. We want to find the Smith form of sI − A. We perform unimodular operations on sI − A to convert it into the Smith form. Let Ci denote columns and Ri denote rows of sI − A and its iterative versions.     s − λ −1 0 ... 0 −1 s − λ 0 ... 0      0 s − λ −1 ... 0   s − λ 0 −1 ... 0       ......   ......  sI − A =   C2 ↔ C1    ......   ......   0 0 0 ... −1   0 0 0 ... −1  0 0 0 . . . s − λ 0 0 0 . . . s − λ [ ] −1 s − λ Consider the leading principal 2 × 2 sub-matrix s − λ 0 [ ][ ][ ] [ ] 1 0 −1 s − λ 1 s − λ 1 0 = s − λ 1 s − λ 0 0 1 0 (s − λ)2

Thus, after the unimodular operations mentioned above,   1 0 0 ... 0  2   0 (s − λ) −1 ... 0     ......  U1(s)(sI − A)V1(s) =    ......   0 0 0 ... −1  0 0 0 . . . s − λ

Now working in the same manner with the second and the third column, we obtain   1 0 0 ... 0    0 1 0 ... 0   3   0 0 (s − λ) ... 0    U2(s)(sI − A)V2(s) =  ......     ......   0 0 0 ... −1  0 0 0 . . . s − λ

22 Continuing this way, finally we obtain   1 0 ... 0 0    0 1 ... 0 0     ......  U(s)(sI − A)V (s) =    ......   0 0 ... 1 0  0 0 ... 0 (s − λ)n

Thus, dn(s) = mA(s) = pA(s). Now we consider the case when A is cyclic i.e. mA = pA. For simplicity, assume that A has only two eigenvalues. The general case follows in the same spirit. [ ] sI − J(λ ) 0 sI − A = 1 . 0 sI − J(λ2) From the previous case, there exists unimodular transformations such that   1 0 ... 0 0 ... 0    0 1 ... 0 0 ... 0     ......     ......   n  U1(s)(sI − A)V1(s) =  0 0 ... (s − λ1) 1 0 ... 0  .    0 0 ... 0 1 ... 0     ......    ...... n 0 0 ... 0 0 ... (s − λ2) 2 Now to get the Smith form, we need to work with the block [ ] n (s − λ1) 1 0 n . 0 (s − λ2) 2 n n Since (s − λ1) 1 and (s − λ2) 2 don’t have common factors, their gcd is 1. Thus, there exists n n a(s), b(s) such that a(s)(s − λ1) 1 + b(s)(s − λ2) 2 = 1. [ ][ ][ ] [ ] 1 0 (s − λ )n1 0 1 0 (s − λ )n1 0 1 = 1 R ↔ R a(s) 1 0 (s − λ )n2 b(s) 1 1 (s − λ )n2 1 2 [ ] 2 2 n 1 (s − λ2) 2 n . (s − λ1) 1 0

[Now the above matrix][ can be reduced to the][ Smith form as follows:] [ ] n n 1 0 1 (s − λ2) 2 1 −(s − λ2) 2 1 0 n n = n n . −(s − λ1) 1 1 (s − λ1) 1 0 b(s) −1 0 (s − λ1) 1 (s − λ2) 2 Thus, in general when A is cyclic, the Smith form of A is   1 0 ... 0 0    0 1 ... 0 0     ......    .  ......   0 0 ... 1 0  0 0 ... 0 mA(s) Now consider a general A. We know that one can do a cyclic decomposition of A and the Smith form for each cyclic block is as mentioned above. Thus, using permutation matrices, we can get a standard Smith form where dn(s) = mA(s) = m1(s) (w.r.t. the notation in the Rational canonical form), dn−1(s) = m2(s) and so on. Therefore, invariant factors obtained by the Rational canonical form can be identified with the invariant factors obtained using the Smith form.

7 Appendix

Definition 7.1. An equivalence relation on a set A is a binary relation ∼ defined on A × A which is reflexive, symmetric and transitive i.e. for a, b, c ∈ A,

• a ∼ a (reflexive)

• if a ∼ b, then b ∼ a (symmetric)

• if a ∼ b and b ∼ c, then a ∼ c (transitive).

All elements which are related to each other by this equivalence relation form an equivalence class.

An equivalence relation on a set partitions the set into equivalence classes.

Example 7.2. Consider an equivalence relation defined on Z as follows. Let n ∈ Z. We say that n1 ∼ n2 if n|(n1 − n2). Thus all multiples of n are equivalent to each other and the set {..., −2n, −n, 0, n, 2n, . . .} forms an equivalence class represented by one representative, say 0̄. Similarly, {..., −2n + 1, −n + 1, 1, n + 1, 2n + 1, . . .} forms an equivalence class represented by 1̄ and so on. These equivalence classes are together represented as Z/n = {0̄, 1̄, . . . , (n − 1)‾}.

Definition 7.3. Group (G, ∗) is a set with a binary operation ∗ such that the following properties are satisfied

• There exists an identity element e ∈ G such that for all g ∈ G, g ∗ e = e ∗ g = g.

• For each g ∈ G, there exists g−1 ∈ G such that g ∗ g−1 = g−1 ∗ g = e.

• If g1, g2 ∈ G, then g1 ∗ g2 ∈ G. If the operation ∗ is commutative, then G is called an abelian or commutative group.

Example 7.4. Set of integers Z form a commutative group under addition. Set of invertible matrices form a non-commutative group under multiplication.

Definition 7.5. A commutative ring (A, +, .) is a set with two commutative operations, addition and multiplication, such that the following properties are satisfied

• A is an abelian group w.r.t. +.

• There exists 1 ∈ A such that a.1 = 1.a = a.

• If a1, a2 ∈ A, then a1.a2 = a2.a1 ∈ A.

Example 7.6. Set of integers Z forms a commutative ring. Set of univariate/multivariate polyno- mials with real or complex coefficients form a commutative ring.

Definition 7.7. A field (F, +, .) is a set with two operations such that it is an abelian group w.r.t. the + operation and F − {0} is a group w.r.t. the multiplication operation.

Example 7.8. C, R, Q, univariate/multivariate rational functions are examples of fields. Z/p where p is a prime forms a finite field i.e. a field with finitely many elements.

Definition 7.9. A linear map A : V → W is said to be 1 − 1 or injective if Ax = Ay ⇒ x = y.

If A is 1 − 1, then a solution of Ax = b may or may not exist, but if it exists, then it is unique.

24 Definition 7.10. A linear map A : V → W is said to be onto or surjective if for all y ∈ W, there exists x ∈ V such that Ax = y.

If A is onto, then a solution of Ax = b always exists but it need not be unique.

Definition 7.11. A linear map A : V → W is said to be bijective if it is both 1 − 1 and onto.

If A is bijective, then a solution of Ax = b always exists and it is unique.

Definition 7.12. Let A : V → W, then inverse image of w ∈ W is the set of all v ∈ V such that Av = w.

If A is bijective, A^{-1} : W → V is well defined and V and W are isomorphic. If A is not 1 − 1 or onto, a solution of Ax = b may or may not exist.

On computing the Jordan form: We need a similarity transformation matrix T to bring a matrix A into the Jordan canonical form. We have seen the existence and uniqueness of Jordan forms. Now we will see how to construct a similarity transform T which brings A into the Jordan canonical form. Let λ1 be an eigenvalue of A. Consider A − λ1I. Due to the Jordan structure associated with λ1, dim(ker(A − λ1I)) ≤ dim(ker(A − λ1I)²) ≤ .... Let n1 be the multiplicity of λ1 as a factor of the minimal polynomial mA of A. Therefore, due to the Jordan blocks of λ1, dim(ker(A − λ1I)^{n1}) = dim(ker(A − λ1I)^{n1+i}) for i ≥ 0. Let v ∈ ker(A − λ1I)^{n1} with (A − λ1I)^{n1−1} v ≠ 0. Now, as seen in the proof of existence and uniqueness of Jordan forms, {(A − λ1I)^{n1−1}v, . . . , (A − λ1I)v, v} spans the largest indecomposable subspace V1_{λ1} associated with λ1. The set {(A − λ1I)^{n1−1}v, . . . , (A − λ1I)v, v} is called a Jordan chain. We know that V = V1_{λ1} ⊕ V̂. Extend {(A − λ1I)^{n1−1}v, . . . , (A − λ1I)v, v} to a basis of V and consider the restriction of A to the subspace V̂. Again consider (A − λ1I) and construct a largest indecomposable subspace V2_{λ1} ⊂ V̂ associated with λ1, with a Jordan chain {(A − λ1I)^{n2−1}v1, . . . , (A − λ1I)v1, v1}. Continuing this process until we reach a subspace where λ1 is not an eigenvalue of A, we obtain all Jordan chains associated with λ1. Repeat this process for all other eigenvalues of A. This process gives a basis, obtained by putting together all Jordan chains, which forms the columns of a similarity transform T that brings A to the Jordan canonical form.
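For exact computations, SymPy's jordan_form returns both J and a transformation matrix whose columns are Jordan chains; this is a library call, not an implementation of the chain construction above, and the integer matrix is an arbitrary example.

```python
from sympy import Matrix

A = Matrix([[5, 4, 2, 1],
            [0, 1, -1, -1],
            [-1, -1, 3, 0],
            [1, 1, -1, 2]])

P, J = A.jordan_form()            # A = P J P^{-1}; columns of P are Jordan chains
print(J)                          # Jordan canonical form of A
print(P * J * P.inv() == A)       # True
```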

Theorem 7.13. If AB = BA, then both A and B are simultaneously upper triangularizable. More- over, if AB = BA and both A and B are diagonalizable, then they are simultaneously diagonalizable.   λ1I1 0 ... 0    0 λ2I2 ... 0    Proof. Suppose A is diagonalizable and assume that by a change of basis, A =  ......   ......    0 0 . . . λkIk B11 B12 ...B1k    B21 B22 ...B2k    such that λi ≠ λj for 1 ≤ i, j ≤ k. Let B =  ...... . Thus, AB = BA implies  ......  Bk1 Bk2 ...Bkk that     λ1B11 λ1B12 . . . λ1B1k λ1B11 λ2B12 . . . λkB1k      λ2B21 λ2B22 . . . λ2B2k   λ1B21 λ2B22 . . . λkB2k      AB =  ......  =  ......  = BA. (58)  ......   ......  λkBk1 λkBk2 . . . λkBkk λ1Bk1 λ2Bk2 . . . λkBkk

25   B11 0 ... 0    0 B22 ... 0  ̸ ̸   −1 Since λi = λj, Bij = 0 for i = j. Thus, B =  ...... . Now suppose Vi BiiVi =  ......    0 0 ...Bkk V1 0 ... 0    0 V2 ... 0    JBii . Thus, choosing V =  ......  as a similarity transform, it is clear that A and B  ......  0 0 ...Vk are simultaneously upper triangularizable. If B is diagonalizable, then A and B can be simultane- ously diagonalizable by choosing columns of Vi as eigenvectors of Bii.  Jλ 0 ... 0  1   0 Jλ2 ... 0    Now suppose A is not diagonalizable and by change of basis, A =  ......  such  ...... 

0 0 ...Jλk ̸ that λi = λj and Jλi contains all Jordan blocks associated with eigenvalue λ. Again partitioning B into block form,     Jλ B11 Jλ B12 ...Jλ B1k B11Jλ B12Jλ ...B1kJλ  1 1 1   1 2 k   Jλ2 B21 Jλ2 B22 ...Jλ2 B2k   B21Jλ1 B22Jλ2 ...B2kJλ     k  AB =  ......  =  ......  = BA. (59)  ......   ...... 

Jλk Bk1 Jλk Bk2 ...Jλk Bkk Bk1Jλ1 Bk2Jλ2 ...BkkJλk ̸ Thus, Jλi Bij = BijJλj for i = j. Since Jλi and Jλj have no common eigenvalues, this system of equations has a unique solution (Please refer Horn n Johnson [2], Thm. 4.4.5). Hence, Bij = 0 ̸ for i = j. Thus, B is again in block diagonal form. Note that Jλi Bii = BiiJλi . Since Jλi is upper triangular, Bii must be upper triangular (It can be checked that if two matrices commute and one of them strictly is upper/lower triangular, then the other matrix must be upper/lower triangular). Thus, B is upper triangular.

Consider AB and BA where A and B are general square matrices. Suppose λ is an eigenvalue of AB.

(AB − λI)v = 0 ⇒ B(AB − λI)v = 0 ⇒ (BA − λI)Bv = 0 (60)

Thus, λ is an eigenvalue of BA with eigenvector Bv. Using similar arguments, if γ is an eigenvalue of BA with an eigenvector w, then it is also an eigenvalue of AB with an eigenvector Aw. Thus, eigenvalues of AB and BA are the same.

Acknowledgements

Thanks to Shauvik Das for a careful reading of the document and comments. Thanks also to P. Shivramakrishna, Kewal Bajaj, Lt. CDR. Santhan Vamsi and Moduga Vijay Babu.

References

[1] K. Hoffman and R. Kunze, Linear Algebra, Second Edition, Prentice Hall, 1971.

[2] R. A. Horn, C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, 1991.
