
Lectures 16,17,18

6 Inner Product Spaces

6.1 Basic Definition

The parallelogram law, the ability to measure the angle between two vectors and, in particular, the concept of perpendicularity make the Euclidean space quite a special type of vector space. Essentially all of these are consequences of the dot product. Thus it makes sense to look for operations which share the basic properties of the dot product. In this section we shall briefly discuss this.

Definition 6.1 Let $V$ be a vector space. By an inner product on $V$ we mean a binary operation which associates a scalar, denoted $\langle u, v\rangle$, to each pair of vectors $(u, v)$ in $V$, satisfying the following properties for all $u, v, w$ in $V$ and all scalars $\alpha, \beta$. (Let $\bar{c}$ denote the complex conjugate of a scalar $c$.)

(1) $\langle u, v\rangle = \overline{\langle v, u\rangle}$ (Hermitian property or conjugate symmetry);
(2) $\langle u, \alpha v + \beta w\rangle = \alpha\langle u, v\rangle + \beta\langle u, w\rangle$ (sesquilinearity: linearity in the second slot);
(3) $\langle v, v\rangle > 0$ if $v \neq 0$ (positivity).

A vector space with an inner product is called an inner product space.

Remark 6.1 (i) Observe that we have not mentioned whether $V$ is a real vector space or a complex vector space. The above definition includes both cases. The only difference is that if $K = \mathbb{R}$ then the conjugation is just the identity. Thus for real vector spaces, (1) becomes the symmetry property, since for a real scalar $c$ we have $\bar{c} = c$.
(ii) Combining (1) and (2) we obtain
(2') $\langle \alpha u + \beta v, w\rangle = \bar{\alpha}\langle u, w\rangle + \bar{\beta}\langle v, w\rangle$.
In the special case when $K = \mathbb{R}$, (2) and (2') together are called bilinearity.

Example 6.1 (i) The dot product in $\mathbb{R}^n$ is an inner product. This is often called the standard inner product.
(ii) Let $x = (x_1, x_2, \ldots, x_n)^t$ and $y = (y_1, y_2, \ldots, y_n)^t \in \mathbb{C}^n$. Define $\langle x, y\rangle = \sum_{i=1}^{n} \overline{x_i}\, y_i = x^* y$. This is an inner product on the complex vector space $\mathbb{C}^n$.
(iii) Let $V = C[0, 1]$. For $f, g \in V$, put
$$\langle f, g\rangle = \int_0^1 f(t)g(t)\, dt.$$
It is easy to see that $\langle f, g\rangle$ is an inner product.
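The computations in (ii) and (iii) are easy to try numerically. The following sketch (assuming NumPy is available; the vectors and functions are made up for illustration) evaluates the complex inner product $x^*y$ with np.vdot, which conjugates its first argument exactly as in (ii), checks conjugate symmetry, and approximates the integral inner product of (iii) by a trapezoidal sum.

import numpy as np

# (ii) standard inner product on C^n: <x, y> = sum_i conj(x_i) y_i = x* y
x = np.array([1 + 2j, 3j])
y = np.array([2 + 0j, 1 - 1j])
print(np.vdot(x, y))                      # np.vdot conjugates its first argument
print(np.conj(np.vdot(y, x)))             # conjugate symmetry: <x, y> = conj(<y, x>)

# (iii) integral inner product on C[0, 1], approximated on a grid
t = np.linspace(0.0, 1.0, 10001)
f, g = t**2, np.sin(t)
h = f * g
dt = t[1] - t[0]
print(np.sum((h[:-1] + h[1:]) / 2) * dt)  # trapezoidal approximation of the integral of f(t) g(t)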

6.2 Norm Associated to an Inner Product

Definition 6.2 Let $V$ be an inner product space. For any $v \in V$, the norm of $v$, denoted by $\|v\|$, is the positive square root of $\langle v, v\rangle$:
$$\|v\| = \sqrt{\langle v, v\rangle}.$$
For the standard inner product in $\mathbb{R}^n$, $\|v\|$ is the usual length of the vector $v$.

Proposition 6.1 Let $V$ be an inner product space. Let $u, v \in V$ and let $c$ be a scalar. Then
(i) $\|cu\| = |c|\,\|u\|$;
(ii) $\|u\| > 0$ for $u \neq 0$;
(iii) $|\langle u, v\rangle| \leq \|u\|\,\|v\|$;
(iv) $\|u + v\| \leq \|u\| + \|v\|$; equality holds in this triangle inequality iff $u, v$ are linearly dependent over $\mathbb{R}$.

Proof: (i) and (ii) follow from Definitions 6.1 and 6.2.
(iii) This is called the Cauchy-Schwarz inequality. If either $u$ or $v$ is zero, it is clear. So let $u$ be nonzero. Consider the vector
$$w = v - au,$$
where $a$ is a scalar to be specified later. Then
$$0 \leq \langle w, w\rangle = \langle v - au, v - au\rangle = \langle v, v\rangle - \langle v, au\rangle - \langle au, v\rangle + \langle au, au\rangle = \langle v, v\rangle - a\langle v, u\rangle - \bar{a}\langle u, v\rangle + a\bar{a}\langle u, u\rangle.$$
Substitute $a = \langle u, v\rangle/\langle u, u\rangle$ and multiply by $\langle u, u\rangle$ to get $|\langle u, v\rangle|^2 \leq \langle u, u\rangle\langle v, v\rangle$. This proves (iii).
(iv) Using (iii), we get $\Re\langle u, v\rangle \leq |\langle u, v\rangle| \leq \|u\|\,\|v\|$. Therefore
$$\|u + v\|^2 = \|u\|^2 + \langle u, v\rangle + \langle v, u\rangle + \|v\|^2 = \|u\|^2 + 2\Re\langle u, v\rangle + \|v\|^2 \leq \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.$$
Thus $\|u + v\| \leq \|u\| + \|v\|$.
Now equality holds in (iv) iff $\Re\langle u, v\rangle = |\langle u, v\rangle|$ and equality holds in (iii). This is the same as saying that $\langle u, v\rangle$ is real and $w = 0$, i.e., that $v = au$ with $a = \langle u, v\rangle/\langle u, u\rangle$ real, which is the same as saying that $v$ is a real multiple of $u$. ♠

6.3 Distance and Angle

Definition 6.3 In any inner product space $V$, the distance between $u$ and $v$ is defined as

$$d(u, v) = \|u - v\|.$$

Definition 6.4 Let $V$ be a real inner product space. The angle between vectors $u, v$ is defined to be the angle $\theta$, $0 \leq \theta \leq \pi$, such that
$$\cos\theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}.$$

Proposition 6.2 (i) $d(u, v) \geq 0$ and $d(u, v) = 0$ iff $u = v$;
(ii) $d(u, v) = d(v, u)$;
(iii) $d(u, w) \leq d(u, v) + d(v, w)$.

6.4 Gram-Schmidt Orthonormalization

The standard basis $E = \{e_1, e_2, \ldots, e_n\}$ of $\mathbb{R}^n$ has two properties:
(i) each $e_i$ has length one;
(ii) any two distinct elements of $E$ are mutually perpendicular.
When vectors are expressed in terms of the standard basis, calculations become easy due to the above properties. Now let $V$ be any inner product space of dimension $d$. We wish to construct an analogue of $E$ for $V$.

Definition 6.5 Two vectors $u, v$ of an inner product space $V$ are called orthogonal (or perpendicular) to each other if $\langle u, v\rangle = 0$. We express this symbolically by $u \perp v$. A set $S$ consisting of non-zero elements of $V$ is called orthogonal if $\langle u, v\rangle = 0$ for every pair of distinct elements $u, v$ of $S$. Further, if every element of $S$ is of unit norm then $S$ is called an orthonormal set. Further, if $S$ is a basis for $V$ then it is called an orthonormal basis.

One of the biggest advantages of an orthogonal set is that the Pythagoras theorem and its general form are valid, which makes many computations easy.

Theorem 6.1 (Pythagoras Theorem) Given $w, u \in V$ such that $w \perp u$, we have
$$\|w + u\|^2 = \|w\|^2 + \|u\|^2. \qquad (47)$$
Proof: We have

$$\|w + u\|^2 = \langle w + u, w + u\rangle = \langle w, w\rangle + \langle u, w\rangle + \langle w, u\rangle + \langle u, u\rangle = \|w\|^2 + \|u\|^2,$$
since $\langle u, w\rangle = \langle w, u\rangle = 0$. ♠

Proposition 6.3 Let $S = \{u_1, u_2, \ldots, u_n\}$ be an orthogonal set. Then $S$ is linearly independent.

Proof: Suppose $c_1, c_2, \ldots, c_n$ are scalars with
$$c_1u_1 + c_2u_2 + \cdots + c_nu_n = 0.$$
Take the inner product with $u_i$ on both sides to get
$$c_i\langle u_i, u_i\rangle = 0.$$

Since $u_i \neq 0$, we get $c_i = 0$. Thus $S$ is linearly independent. ♠

Definition 6.6 If $u$ and $v$ are vectors in an inner product space $V$ and $v \neq 0$, then the vector
$$\frac{\langle v, u\rangle}{\|v\|^2}\, v$$
is called the orthogonal projection of $u$ along $v$.

Remark 6.2 The order in which you take $u$ and $v$ matters here when you are working over the complex numbers, because the inner product is linear in the second slot and conjugate linear in the first slot. Over the real numbers, however, this is not a problem.
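To see the remark concretely, here is a small sketch (assuming NumPy; the two complex vectors are made up for illustration). np.vdot conjugates its first argument, so np.vdot(v, u) matches the convention $\langle v, u\rangle = v^*u$ used above; swapping the arguments conjugates the coefficient of the projection.

import numpy as np

u = np.array([1 + 1j, 2 + 0j])
v = np.array([1j, 1 + 0j])

proj = (np.vdot(v, u) / np.vdot(v, v)) * v   # orthogonal projection of u along v
print(proj)
print(np.vdot(v, u - proj))                  # ~0: the residual u - proj is orthogonal to v
print((np.vdot(u, v) / np.vdot(v, v)) * v)   # arguments swapped: the coefficient is conjugated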

Theorem 6.2 (Gram-Schmidt Orthonormalization Process) Let $V$ be any inner product space. Given a linearly independent subset $\{u_1, \ldots, u_n\}$, there exists an orthonormal set of vectors $\{v_1, \ldots, v_n\}$ such that $L(\{u_1, \ldots, u_k\}) = L(\{v_1, \ldots, v_k\})$ for all $1 \leq k \leq n$. In particular, if $V$ is finite dimensional, then it has an orthonormal basis.

Proof: The construction of the orthonormal set is algorithmic and, of course, inductive. First we construct an intermediate orthogonal set $\{w_1, \ldots, w_n\}$ with the same property for each $k$. Then we simply normalize each element, viz., take

$$v_i = \frac{w_i}{\|w_i\|}.$$
That will give us an orthonormal set as required. The last part of the theorem follows if we begin with any basis $\{u_1, \ldots, u_m\}$ of $V$.
Take $w_1 := u_1$. So the construction is over for $k = 1$. To construct $w_2$, subtract from $u_2$ its orthogonal projection along $w_1$. Thus

$$w_2 = u_2 - \langle w_1, u_2\rangle\, \frac{w_1}{\|w_1\|^2}.$$

(Figure: $u_2$ decomposed into its projection along $w_1$ and the orthogonal part $w_2$.)

Check that $\langle w_2, w_1\rangle = 0$. Thus $\{w_1, w_2\}$ is an orthogonal set. Check also that $L(\{u_1, u_2\}) = L(\{w_1, w_2\})$. Now suppose we have constructed $w_i$ for $i \leq k < n$ as required. Put
$$w_{k+1} = u_{k+1} - \sum_{j=1}^{k} \langle w_j, u_{k+1}\rangle\, \frac{w_j}{\|w_j\|^2}.$$

Now check that $w_{k+1}$ is orthogonal to $w_j$ for $j \leq k$, and that $L(\{w_1, w_2, \ldots, w_{k+1}\}) = L(\{u_1, \ldots, u_{k+1}\})$. By induction, this completes the proof. ♠

Example 6.2 Let $V = P_3[-1, 1]$ denote the real vector space of polynomials of degree at most 3 defined on $[-1, 1]$, together with the zero polynomial. $V$ is an inner product space under the inner product
$$\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\, dt.$$
To find an orthonormal basis, we begin with the basis $\{1, x, x^2, x^3\}$. Set $w_1 = 1$. Then
$$w_2 = x - \langle x, 1\rangle\,\frac{1}{\|1\|^2} = x - \frac{1}{2}\int_{-1}^{1} t\, dt = x,$$
$$w_3 = x^2 - \langle x^2, 1\rangle\,\frac{1}{2} - \langle x^2, x\rangle\,\frac{x}{(2/3)} = x^2 - \frac{1}{2}\int_{-1}^{1} t^2\, dt - \frac{3x}{2}\int_{-1}^{1} t^3\, dt = x^2 - \frac{1}{3},$$
$$w_4 = x^3 - \langle x^3, 1\rangle\,\frac{1}{2} - \langle x^3, x\rangle\,\frac{x}{(2/3)} - \langle x^3, x^2 - \tfrac{1}{3}\rangle\,\frac{x^2 - \tfrac{1}{3}}{\|x^2 - \tfrac{1}{3}\|^2} = x^3 - \frac{3}{5}x,$$
since $\langle x^3, 1\rangle = \langle x^3, x^2 - \tfrac{1}{3}\rangle = 0$ and $\langle x^3, x\rangle = 2/5$. Thus $\{1, x, x^2 - \tfrac{1}{3}, x^3 - \tfrac{3}{5}x\}$ is an orthogonal basis. We divide these by their respective norms to get an orthonormal basis
$$\left\{\ \frac{1}{\sqrt{2}},\ \sqrt{\frac{3}{2}}\,x,\ \frac{3\sqrt{5}}{2\sqrt{2}}\left(x^2 - \frac{1}{3}\right),\ \frac{5\sqrt{7}}{2\sqrt{2}}\left(x^3 - \frac{3}{5}x\right)\ \right\}.$$
You will meet these polynomials while studying differential equations.
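The recursion for $w_{k+1}$ translates directly into code. The following sketch (assuming SymPy is available) runs Gram-Schmidt on $\{1, x, x^2, x^3\}$ with the integral inner product of Example 6.2 and reproduces the orthogonal polynomials found above.

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # the inner product of Example 6.2: integral of f(t) g(t) over [-1, 1]
    return sp.integrate(f * g, (x, -1, 1))

basis = [sp.Integer(1), x, x**2, x**3]
ortho = []
for u in basis:
    # subtract the orthogonal projections of u along the w_j constructed so far
    w = u - sum(inner(w_j, u) / inner(w_j, w_j) * w_j for w_j in ortho)
    ortho.append(sp.expand(w))

print(ortho)                                   # [1, x, x**2 - 1/3, x**3 - 3*x/5]
print([sp.sqrt(inner(w, w)) for w in ortho])   # the norms used for normalization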

6.5 Orthogonal Complement

Definition 6.7 Let $V$ be an inner product space. Given $v, w \in V$ we say $v$ is perpendicular to $w$, and write $v \perp w$, if $\langle v, w\rangle = 0$. Given a subspace $W$ of $V$, we define
$$W^\perp := \{v \in V : v \perp w \ \ \forall\, w \in W\}$$
and call it the orthogonal complement of $W$.

Proposition 6.4 Let $W$ be a subspace of an inner product space $V$. Then
(i) $W^\perp$ is a subspace of $V$;
(ii) $W \cap W^\perp = (0)$;
(iii) $W \subset (W^\perp)^\perp$.
Proof: All the proofs are easy.

Proposition 6.5 Let $W$ be a finite dimensional subspace of an inner product space $V$. Then every element $v \in V$ is expressible as a unique sum
$$v = w + w^\perp$$
where $w \in W$ and $w^\perp \in W^\perp$. Further, $w$ is characterised by the property that it is the nearest point in $W$ to $v$.

Proof: Fix an orthonormal basis $\{w_1, \ldots, w_k\}$ of $W$. Define $w = \sum_{i=1}^{k} \langle w_i, v\rangle\, w_i$. Take $w^\perp = v - w$ and verify that $w^\perp \in W^\perp$. This proves the existence of such an expression.
Now suppose $v = u + u^\perp$ is another such expression. Then it follows that $w - u = u^\perp - w^\perp \in W \cap W^\perp = (0)$. Therefore $w = u$ and $w^\perp = u^\perp$. This proves the uniqueness.
Finally, let $u \in W$ be any vector. Then $v - u = (v - w) + (w - u)$, and since $w - u \in W$ and $v - w \in W^\perp$, we can apply the Pythagoras theorem to conclude that $\|v - u\|^2 \geq \|v - w\|^2$, with equality iff $u = w$. This proves that $w$ is the (only) nearest point to $v$ in $W$. ♠

Remark 6.3 (i) The element $w$ so obtained is called the orthogonal projection of $v$ on $W$. One can easily check that the assignment $v \mapsto w$ itself defines a linear map $P_W : V \longrightarrow W$. (Use the formula $P_W(v) = \sum_i \langle w_i, v\rangle\, w_i$.) Also observe that $P_W$ is the identity on $W$ and hence $P_W^2 = P_W$.
(ii) Observe that if $V$ itself is finite dimensional, then the above proposition is applicable to all subspaces of $V$. In this case it easily follows that $\dim W + \dim W^\perp = \dim V$ (see the exercise at the end of the section on Linear Dependence).
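Remark 6.3(i) is easy to check numerically. A minimal sketch (assuming NumPy; the subspace and the vector are made up for illustration): an orthonormal basis of $W$ is obtained from a QR factorization of a spanning set, the projector is then $P_W = QQ^t$, and one can verify that $P_W^2 = P_W$ and that $v - P_W v$ is orthogonal to $W$.

import numpy as np

# W = span{(1, 1, 0), (0, 1, 1)} in R^3, with the standard inner product
S = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(S)          # columns of Q: an orthonormal basis of W
P = Q @ Q.T                     # the orthogonal projection P_W

v = np.array([1.0, 2.0, 3.0])
w = P @ v                       # component in W
w_perp = v - w                  # component in W-perp

print(np.allclose(P @ P, P))    # P_W is idempotent
print(S.T @ w_perp)             # ~0: w_perp is orthogonal to every vector of W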

6.6 Least Square Problem

As an application of orthogonal projection, let us consider the following problem. Suppose we are given a system of linear equations and a likely candidate for the solution. If this candidate is actually a solution, well and good. Otherwise we would like to find a solution which is nearest to the candidate. This problem is a typical example of finding 'best approximations'. (It can also be discussed using Lagrange multipliers.)
Thus if the system is a homogeneous set of $m$ linear equations in $n$ variables, we get a linear transformation $f : \mathbb{R}^n \longrightarrow \mathbb{R}^m$ associated to it, and the space of solutions is nothing but the subspace $W = N(f)$, the null space. Given a point $v \in \mathbb{R}^n$, the nearest point in $W$ to $v$ is nothing but the orthogonal projection of $v$ on $W$. In case the system is inhomogeneous, we still consider the linear map $f$ defined by the homogeneous part and put $W = N(f)$. Then we must first of all have one particular solution $p$ of the inhomogeneous system. We then take $v - p$, take its projection on $W$ as above, and then add back $p$ to it to get the solution nearest to $v$.

Problem (Method of Least Squares) Given a linear system of $m$ equations in $n$ variables, $Ax = b$, and an $n$-vector $v$, find the solution $y$ such that $\|y - v\|$ is minimal.
Answer:
Step 1. Apply GJEM to find a basis for the null space $W$ of $A$ and a particular solution $p$ of the system.
Step 2. Apply the Gram-Schmidt process to the basis of $W$ to get an orthonormal basis

$\{v_1, \ldots, v_k\}$ for $W$.
Step 3. Let $P_W(v - p) = \sum_{j=1}^{k} \langle v - p, v_j\rangle\, v_j$ denote the orthogonal projection of $v - p$ on the space $W$. Take $y = P_W(v - p) + p$. (Verify that this is the required solution.)
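Here is a compact sketch of Steps 1-3 (assuming NumPy). For convenience it replaces GJEM and Gram-Schmidt by numerical equivalents: for a consistent system, numpy.linalg.lstsq returns a particular solution, and the SVD directly supplies an orthonormal basis of the null space; the projection formula of Step 3 is then applied verbatim.

import numpy as np

def nearest_solution(A, b, v, tol=1e-10):
    """Solution y of Ax = b (assumed consistent) minimizing ||y - v||."""
    # Step 1: a particular solution p of Ax = b
    p = np.linalg.lstsq(A, b, rcond=None)[0]
    # Step 2: an orthonormal basis of the null space W of A (rows of Vt beyond the rank)
    _, s, Vt = np.linalg.svd(A)
    r = int((s > tol).sum())
    N = Vt[r:].T                       # columns form an orthonormal basis of W
    # Step 3: project v - p onto W and translate back by p
    return p + N @ (N.T @ (v - p))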

Example 6.3 Consider a system of three linear equations in the 5 variables $x_1, \ldots, x_5$ given by

$$\begin{aligned} 2x_3 - 2x_4 + x_5 &= 2\\ 2x_2 - 8x_3 + 14x_4 - 5x_5 &= 2\\ x_2 + 3x_3 + x_5 &= 8 \end{aligned}$$

(a) Write down a basis for the space of solutions of the associated homogeneous system.
(b) Given the vector $v = (1, 6, 2, 1, 1)^t$, find the solution to the system which is nearest to $v$.

Solution: The augmented matrix of the system is

$$\left[\begin{array}{ccccc|c} 0 & 0 & 2 & -2 & 1 & 2\\ 0 & 2 & -8 & 14 & -5 & 2\\ 0 & 1 & 3 & 0 & 1 & 8 \end{array}\right]$$

Applying GJEM, we see that the Gauss-Jordan form of this system is

$$\left[\begin{array}{ccccc|c} 0 & 1 & 0 & 3 & -1/2 & 5\\ 0 & 0 & 1 & -1 & 1/2 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

Therefore the general solution is of the form

$$\begin{bmatrix} x_2\\ x_3\end{bmatrix} = \begin{bmatrix} 5\\ 1\end{bmatrix} - \begin{bmatrix} 0 & 3 & -1/2\\ 0 & -1 & 1/2\end{bmatrix}\begin{bmatrix} x_1\\ x_4\\ x_5\end{bmatrix},$$
with $x_1, x_4, x_5$ free. A particular solution of the system is obtained by taking $x_1 = x_4 = x_5 = 0$, and hence $x_2 = 5$, $x_3 = 1$, i.e., $p = (0, 5, 1, 0, 0)^t$. Also the space $W$ of solutions of the homogeneous system is given by
$$\begin{bmatrix} x_2\\ x_3\end{bmatrix} = -\begin{bmatrix} 0 & 3 & -1/2\\ 0 & -1 & 1/2\end{bmatrix}\begin{bmatrix} x_1\\ x_4\\ x_5\end{bmatrix}.$$
Therefore a basis for this space $W$ (which is the null space of the un-augmented matrix) is
$$\{u_1, u_2, u_3\} = \{(1, 0, 0, 0, 0)^t,\ (0, -3, 1, 1, 0)^t,\ (0, 1/2, -1/2, 0, 1)^t\}.$$
This basis is not orthogonal ($\langle u_2, u_3\rangle = -2$), so we first apply Gram-Schmidt (Step 2): $w_1 = u_1$, $w_2 = u_2$ (already orthogonal to $w_1$), and
$$w_3 = u_3 - \frac{\langle w_2, u_3\rangle}{\|w_2\|^2}\, w_2 = u_3 + \frac{2}{11}\, w_2 = \left(0, -\tfrac{1}{22}, -\tfrac{7}{22}, \tfrac{2}{11}, 1\right)^t.$$
Now we take the orthogonal projection of $v - p = (1, 1, 1, 1, 1)^t$ on the space $W$:
$$w = P_W(v - p) = \frac{\langle w_1, v - p\rangle}{\|w_1\|^2}\, w_1 + \frac{\langle w_2, v - p\rangle}{\|w_2\|^2}\, w_2 + \frac{\langle w_3, v - p\rangle}{\|w_3\|^2}\, w_3 = w_1 - \frac{1}{11}\, w_2 + \frac{18}{25}\, w_3 = \left(1, \tfrac{6}{25}, -\tfrac{8}{25}, \tfrac{1}{25}, \tfrac{18}{25}\right)^t.$$
The solution to the original system nearest to the given vector $v$ is obtained by adding $p$ to $w$, viz.
Answer: $y = p + w = \left(1, \tfrac{131}{25}, \tfrac{17}{25}, \tfrac{1}{25}, \tfrac{18}{25}\right)^t.$
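As a cross-check (and as an instance of the Lagrange-multiplier viewpoint mentioned above), the nearest solution can also be written as $y = v - A^{+}(Av - b)$, where $A^{+}$ is the Moore-Penrose pseudoinverse: the correction $A^{+}(Av - b)$ lies in the row space of $A$, i.e., in $W^\perp$. A short numerical sketch, assuming NumPy:

import numpy as np

A = np.array([[0, 0,  2, -2,  1],
              [0, 2, -8, 14, -5],
              [0, 1,  3,  0,  1]], dtype=float)
b = np.array([2, 2, 8], dtype=float)
v = np.array([1, 6, 2, 1, 1], dtype=float)

y = v - np.linalg.pinv(A) @ (A @ v - b)   # orthogonal projection of v onto the solution set
print(y)                                  # approx [1, 5.24, 0.68, 0.04, 0.72] = (1, 131/25, 17/25, 1/25, 18/25)
print(np.allclose(A @ y, b))              # y is indeed a solution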

7 Eigenvalues and Eigenvectors

7.1 Introduction

The simplest of matrices are the diagonal ones. Thus a linear map will also be easy to handle if its associated matrix is a diagonal matrix. Then again, we have seen that the associated matrix depends upon the choice of the bases to some extent. This naturally leads us to the problem of investigating the existence and construction of a suitable basis with respect to which the matrix associated to a given linear transformation is diagonal.

Definition 7.1 An $n \times n$ matrix $A$ is called diagonalizable if there exists an invertible $n \times n$ matrix $M$ such that $M^{-1}AM$ is a diagonal matrix. A linear map $f : V \longrightarrow V$ is called diagonalizable if the matrix associated to $f$ with respect to some basis is diagonal.

Remark 7.1 (i) Clearly, f is diagonalizable iff the matrix associated to f with respect to some basis (any basis) is diagonalizable.

(ii) Let $\{v_1, \ldots, v_n\}$ be a basis. The matrix $M_f$ of a linear transformation $f$ w.r.t. this basis is diagonal iff $f(v_i) = \lambda_i v_i$, $1 \leq i \leq n$, for some scalars $\lambda_i$. Naturally a subquestion here is: does there exist such a basis for a given linear transformation?

Definition 7.2 Given a linear map $f : V \longrightarrow V$, we say $v \in V$ is an eigenvector for $f$ if $v \neq 0$ and $f(v) = \lambda v$ for some $\lambda \in K$. In that case $\lambda$ is called an eigenvalue of $f$. For a square matrix $A$ we say $\lambda$ is an eigenvalue if there exists a non-zero column vector $v$ such that $Av = \lambda v$. Of course $v$ is then called an eigenvector of $A$ corresponding to $\lambda$.

Remark 7.2 (i) It is easy to see that the eigenvalues and eigenvectors of a linear transformation are the same as those of the associated matrix.
(ii) Even if a linear map is not diagonalizable, the existence of eigenvectors and eigenvalues itself throws some light on the nature of the linear map. Thus the study of eigenvalues becomes extremely important. They arise naturally in the study of differential equations. Here we shall use them to address the problem of diagonalization and then see some geometric applications of diagonalization itself.

7.2 Characteristic Polynomial

Proposition 7.1 (1) The eigenvalues of a square matrix $A$ are the solutions of the equation
$$\chi_A(\lambda) = \det(A - \lambda I) = 0.$$
(2) The null space of $A - \lambda I$ is equal to the eigenspace
$$E_A(\lambda) := \{v : Av = \lambda v\} = N(A - \lambda I).$$

Proof: (1) If $v$ is an eigenvector of $A$ then $v \neq 0$ and $Av = \lambda v$ for some scalar $\lambda$. Hence $(A - \lambda I)v = 0$. Thus the nullity of $A - \lambda I$ is positive. Hence the rank of $A - \lambda I$ is less than $n$. Hence $\det(A - \lambda I) = 0$.
(2) $E_A(\lambda) = \{v \in V : Av = \lambda v\} = \{v \in V : (A - \lambda I)v = 0\} = N(A - \lambda I)$. ♠

Definition 7.3 For any square matrix $A$, the polynomial $\chi_A(\lambda) = \det(A - \lambda I)$ in $\lambda$ is called the characteristic polynomial of $A$.

Example 7.1 (1) $A = \begin{bmatrix} 1 & 2\\ 0 & 3\end{bmatrix}$. To find the eigenvalues of $A$, we solve the equation
$$\det(A - \lambda I) = \det\begin{bmatrix} 1 - \lambda & 2\\ 0 & 3 - \lambda\end{bmatrix} = (1 - \lambda)(3 - \lambda) = 0.$$
Hence the eigenvalues of $A$ are 1 and 3. Let us calculate the eigenspaces $E(1)$ and $E(3)$. By definition $E(1) = \{v \mid (A - I)v = 0\}$ and $E(3) = \{v \mid (A - 3I)v = 0\}$.
$A - I = \begin{bmatrix} 0 & 2\\ 0 & 2\end{bmatrix}$. Hence $(x, y)^t \in E(1)$ iff $\begin{bmatrix} 0 & 2\\ 0 & 2\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} 2y\\ 2y\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix}$, i.e., iff $y = 0$. Hence $E(1) = L(\{(1, 0)\})$.

$A - 3I = \begin{bmatrix} 1 - 3 & 2\\ 0 & 3 - 3\end{bmatrix} = \begin{bmatrix} -2 & 2\\ 0 & 0\end{bmatrix}$. Suppose $\begin{bmatrix} -2 & 2\\ 0 & 0\end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix}$. Then $\begin{bmatrix} -2x + 2y\\ 0\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix}$. This is possible iff $x = y$. Thus $E(3) = L(\{(1, 1)\})$.

(2) Let $A = \begin{bmatrix} 3 & 0 & 0\\ 2 & 4 & 2\\ 2 & 1 & 5\end{bmatrix}$. Then $\det(A - \lambda I) = (3 - \lambda)^2(6 - \lambda)$.
Hence the eigenvalues of $A$ are 3 and 6. The eigenvalue $\lambda = 3$ is a double root of the characteristic polynomial of $A$. We say that $\lambda = 3$ has algebraic multiplicity 2. Let us find the eigenspaces $E(3)$ and $E(6)$.
$\lambda = 3$: $A - 3I = \begin{bmatrix} 0 & 0 & 0\\ 2 & 1 & 2\\ 2 & 1 & 2\end{bmatrix}$. Hence rank$(A - 3I) = 1$. Thus nullity$(A - 3I) = 2$. By solving the system $(A - 3I)v = 0$, we find that

$$N(A - 3I) = E_A(3) = L(\{(1, 0, -1),\ (1, -2, 0)\}).$$

The dimension of $E_A(\lambda)$ is called the geometric multiplicity of $\lambda$. Hence the geometric multiplicity of $\lambda = 3$ is 2.
$\lambda = 6$: $A - 6I = \begin{bmatrix} -3 & 0 & 0\\ 2 & -2 & 2\\ 2 & 1 & -1\end{bmatrix}$. Hence rank$(A - 6I) = 2$. Thus $\dim E_A(6) = 1$. (It can be shown that $\{(0, 1, 1)\}$ is a basis of $E_A(6)$.) Thus both the algebraic and geometric multiplicities of the eigenvalue 6 are equal to 1.
(3) $A = \begin{bmatrix} 1 & 1\\ 0 & 1\end{bmatrix}$. Then $\det(A - \lambda I) = (1 - \lambda)^2$. Thus $\lambda = 1$ has algebraic multiplicity 2.

$A - I = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}$. Hence nullity$(A - I) = 1$ and $E_A(1) = L(\{e_1\})$. In this case the geometric multiplicity is less than the algebraic multiplicity of the eigenvalue 1.
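These computations are easy to confirm numerically. A small sketch for Example 7.1(2), assuming NumPy: numpy.linalg.eig returns the eigenvalues, and the geometric multiplicity of an eigenvalue $\mu$ is $n - \mathrm{rank}(A - \mu I)$.

import numpy as np

A = np.array([[3, 0, 0],
              [2, 4, 2],
              [2, 1, 5]], dtype=float)

eigenvalues, _ = np.linalg.eig(A)
print(np.round(np.sort(eigenvalues), 6))                 # [3. 3. 6.]

for mu in (3, 6):
    geometric = 3 - np.linalg.matrix_rank(A - mu * np.eye(3))
    print(mu, geometric)                                 # geometric multiplicities: 2 and 1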

Remark 7.3

(i) Observe that $\chi_A(\lambda) = \chi_{M^{-1}AM}(\lambda)$. Thus the characteristic polynomial is an invariant of similarity. Thus the characteristic polynomial of any linear map $f : V \longrightarrow V$ is also defined (where $V$ is finite dimensional) by choosing some basis for $V$ and then taking the characteristic polynomial of the associated matrix $\mathcal{M}(f)$ of $f$. This definition does not depend upon the choice of the basis.
(ii) If we expand $\det(A - \lambda I)$ we see that there is a term

$$(a_{11} - \lambda)(a_{22} - \lambda)\cdots(a_{nn} - \lambda).$$
This is the only term which contributes to $\lambda^n$ and $\lambda^{n-1}$. It follows that the degree of the characteristic polynomial is exactly equal to $n$, the size of the matrix; moreover, the coefficient of the top degree term is equal to $(-1)^n$. Thus in general, it has $n$ complex roots, some of which may be repeated, some of them real, and so on. All these patterns are going to influence the geometry of the linear map.

(iii) If $A$ is a real matrix then of course $\chi_A(\lambda)$ is a real polynomial. That, however, does not allow us to conclude that it has real roots. So while discussing eigenvalues we should consider even a real matrix as a complex matrix and keep in mind the associated linear map $\mathbb{C}^n \longrightarrow \mathbb{C}^n$. The problem of existence of real eigenvalues and real eigenvectors will be discussed soon.
(iv) Next, the above observation also shows that the coefficient of $\lambda^{n-1}$ is equal to

$$(-1)^{n-1}(a_{11} + \cdots + a_{nn}) = (-1)^{n-1}\,\mathrm{tr}\, A.$$

Lemma 7.1 Suppose $A$ is a real matrix with a real eigenvalue $\lambda$. Then there exists a real column vector $v \neq 0$ such that $Av = \lambda v$.

Proof: Start with $Aw = \lambda w$ where $w$ is a non-zero column vector with complex entries. Write $w = v + \imath v'$ where both $v, v'$ are real vectors. We then have

$$Av + \imath Av' = \lambda(v + \imath v').$$

Compare the real and imaginary parts: since $A$ and $\lambda$ are real, this gives $Av = \lambda v$ and $Av' = \lambda v'$. Since $w \neq 0$, at least one of the two vectors $v, v'$ must be non-zero, and we are done. ♠

Proposition 7.2 Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then
(i) $\mathrm{tr}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$;
(ii) $\det A = \lambda_1\lambda_2\cdots\lambda_n$.

Proof: The characteristic polynomial of A is

$$\det(A - \lambda I) = \det\begin{bmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda\end{bmatrix} = (-1)^n\lambda^n + (-1)^{n-1}\lambda^{n-1}(a_{11} + \cdots + a_{nn}) + \cdots \qquad (48)$$
Put $\lambda = 0$ to get $\det A$ = the constant term of $\det(A - \lambda I)$.
Since $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the roots of $\det(A - \lambda I) = 0$, we have
$$\det(A - \lambda I) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n) \qquad (49)$$
$$= (-1)^n[\lambda^n - (\lambda_1 + \lambda_2 + \cdots + \lambda_n)\lambda^{n-1} + \cdots + (-1)^n\lambda_1\lambda_2\cdots\lambda_n]. \qquad (51)$$
Comparing the constant terms and the coefficients of $\lambda^{n-1}$ in (48), (49) and (51), we get that the constant term of $\det(A - \lambda I)$ is equal to $\lambda_1\lambda_2\cdots\lambda_n = \det A$, and $\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. ♠
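Proposition 7.2 is easy to check numerically for any concrete matrix. A minimal sketch, assuming NumPy (the test matrix below is made up):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 5.0]])

lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))        # tr(A) = sum of the eigenvalues
print(np.isclose(lam.prod(), np.linalg.det(A)))  # det(A) = product of the eigenvalues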

Proposition 7.3 Let $v_1, v_2, \ldots, v_k$ be eigenvectors of a matrix $A$ associated to distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Then $v_1, v_2, \ldots, v_k$ are linearly independent.

Proof: Apply induction on $k$. It is clear for $k = 1$. Suppose $k \geq 2$ and
$$c_1v_1 + c_2v_2 + \cdots + c_kv_k = 0$$
for some scalars $c_1, c_2, \ldots, c_k$. Applying $A$, we get $c_1Av_1 + c_2Av_2 + \cdots + c_kAv_k = 0$, hence
$$c_1\lambda_1v_1 + c_2\lambda_2v_2 + \cdots + c_k\lambda_kv_k = 0.$$
Hence
$$\lambda_1(c_1v_1 + c_2v_2 + \cdots + c_kv_k) - (\lambda_1c_1v_1 + \lambda_2c_2v_2 + \cdots + \lambda_kc_kv_k) = (\lambda_1 - \lambda_2)c_2v_2 + (\lambda_1 - \lambda_3)c_3v_3 + \cdots + (\lambda_1 - \lambda_k)c_kv_k = 0.$$
By induction, $v_2, v_3, \ldots, v_k$ are linearly independent. Hence $(\lambda_1 - \lambda_j)c_j = 0$ for $j = 2, 3, \ldots, k$. Since $\lambda_1 \neq \lambda_j$ for $j = 2, 3, \ldots, k$, we get $c_j = 0$ for $j = 2, 3, \ldots, k$. Hence $c_1$ is also zero. Thus $v_1, v_2, \ldots, v_k$ are linearly independent. ♠

Proposition 7.4 Suppose $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let $C$ be the matrix whose column vectors are respectively $v_1, v_2, \ldots, v_n$, where $v_i$ is an eigenvector for $\lambda_i$, $i = 1, 2, \ldots, n$. Then

$$C^{-1}AC = D(\lambda_1, \ldots, \lambda_n) = D, \ \text{the diagonal matrix}.$$

Proof: Note that $C$ is invertible, since its columns are linearly independent by Proposition 7.3. So it is enough to prove $AC = CD$. For $i = 1, 2, \ldots, n$, let $C^i$ ($= v_i$) denote the $i$-th column of $C$, etc. Then
$$(AC)^i = AC^i = Av_i = \lambda_iv_i.$$
Similarly,
$$(CD)^i = CD^i = \lambda_iv_i.$$
Hence $AC = CD$ as required. ♠
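In code, Proposition 7.4 says that the matrix of eigenvectors returned by an eigensolver diagonalizes $A$ whenever the eigenvalues are distinct. A sketch, assuming NumPy, using the matrix of Example 7.1(1):

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])              # eigenvalues 1 and 3 (distinct)

lam, C = np.linalg.eig(A)               # columns of C are eigenvectors v_1, v_2
D = np.linalg.inv(C) @ A @ C
print(np.round(D, 10))                  # diagonal matrix with entries lam
print(np.allclose(D, np.diag(lam)))     # C^{-1} A C = D(lambda_1, ..., lambda_n)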

7.3 Relation Between Algebraic and Geometric Multiplicities

Recall that

Definition 7.4 The algebraic multiplicity $a_A(\mu)$ of an eigenvalue $\mu$ of a matrix $A$ is defined to be the multiplicity $k$ of the root $\mu$ of the polynomial $\chi_A(\lambda)$. This means that $(\lambda - \mu)^k$ divides $\chi_A(\lambda)$ whereas $(\lambda - \mu)^{k+1}$ does not.

Definition 7.5 The geometric multiplicity of an eigenvalue $\mu$ of $A$ is defined to be the dimension of the eigenspace $E_A(\mu)$:
$$g_A(\mu) := \dim E_A(\mu).$$

Proposition 7.5 Both the algebraic and the geometric multiplicity are invariants of similarity.

Proof: We have already seen that for any invertible matrix $C$, $\chi_A(\lambda) = \chi_{C^{-1}AC}(\lambda)$. Thus the invariance of the algebraic multiplicity is clear. On the other hand, check that $E_{C^{-1}AC}(\lambda) = C^{-1}(E_A(\lambda))$. Therefore $\dim(E_{C^{-1}AC}(\lambda)) = \dim C^{-1}(E_A(\lambda)) = \dim(E_A(\lambda))$, the last equality being a consequence of the invertibility of $C$. ♠

We have observed in a few examples that the geometric multiplicity of an eigenvalue is at most its algebraic multiplicity. This is true in general.

Proposition 7.6 Let $A$ be an $n \times n$ matrix. Then the geometric multiplicity of an eigenvalue $\mu$ of $A$ is less than or equal to the algebraic multiplicity of $\mu$.

Proof: Put $a_A(\mu) = k$. Then $(\lambda - \mu)^k$ divides $\det(A - \lambda I)$ but $(\lambda - \mu)^{k+1}$ does not. Let $g_A(\mu) = g$ be the geometric multiplicity of $\mu$. Then $E_A(\mu)$ has a basis consisting of $g$ eigenvectors $v_1, v_2, \ldots, v_g$. We can extend this basis of $E_A(\mu)$ to a basis of $\mathbb{C}^n$, say $\{v_1, v_2, \ldots, v_g, \ldots, v_n\}$. Let $B$ be the matrix such that $B^j = v_j$. Then $B$ is an invertible matrix and
$$B^{-1}AB = \begin{bmatrix} \mu I_g & X\\ 0 & Y\end{bmatrix}$$
where $X$ is a $g \times (n - g)$ matrix and $Y$ is an $(n - g) \times (n - g)$ matrix. Therefore,
$$\det(A - \lambda I) = \det[B^{-1}(A - \lambda I)B] = \det(B^{-1}AB - \lambda I) = \det((\mu - \lambda)I_g)\,\det(Y - \lambda I_{n-g}) = (\mu - \lambda)^g\,\det(Y - \lambda I_{n-g}).$$
Thus $(\lambda - \mu)^g$ divides $\det(A - \lambda I)$, and hence $g \leq k$. ♠

Remark 7.4 We will now be able to say something about the diagonalizability of a given matrix $A$. Assuming that there exists $B$ such that $B^{-1}AB = D(\lambda_1, \ldots, \lambda_n)$, then, as in the proof of Proposition 7.4, it follows that $AB = BD$, i.e., $AB^i = \lambda_iB^i$ where $B^i$ denotes the $i$-th column vector of $B$. Thus we need not hunt for $B$ anywhere, but should look for eigenvectors of $A$. Of course the $B^i$ are linearly independent, since $B$ is invertible. Now the problem turns to

the question of whether we have $n$ linearly independent eigenvectors of $A$, so that they can be chosen as the columns of $B$. Proposition 7.4 took care of one such case, viz., when the eigenvalues are distinct. In general, this condition is not forced on us. Observe that the geometric multiplicity and the algebraic multiplicity of an eigenvalue coincide for a diagonal matrix. Since these concepts are similarity invariants, it is necessary that the same is true for any matrix which is diagonalizable. This turns out to be sufficient also. The following theorem gives the correct condition for diagonalizability.

Theorem 7.1 An $n \times n$ matrix $A$ is diagonalizable if and only if for each eigenvalue $\mu$ of $A$ the algebraic and geometric multiplicities are equal: $a_A(\mu) = g_A(\mu)$.

Proof: We have already seen the necessity of the condition. To prove the converse, suppose that the two multiplicities coincide for each eigenvalue. Suppose that $\lambda_1, \lambda_2, \ldots, \lambda_k$ are all the distinct eigenvalues of $A$, with algebraic multiplicities $n_1, n_2, \ldots, n_k$. Let

$B_1 = \{v_{11}, v_{12}, \ldots, v_{1n_1}\}$ = a basis of $E(\lambda_1)$,
$B_2 = \{v_{21}, v_{22}, \ldots, v_{2n_2}\}$ = a basis of $E(\lambda_2)$,
$\vdots$
$B_k = \{v_{k1}, v_{k2}, \ldots, v_{kn_k}\}$ = a basis of $E(\lambda_k)$.

Use induction on $k$ to show that $B = B_1 \cup B_2 \cup \cdots \cup B_k$ is a linearly independent set. (The proof is exactly similar to the proof of Proposition 7.3.) Denote the matrix whose columns are the elements of the basis $B$ also by $B$ itself. Then check that $B^{-1}AB$ is a diagonal matrix. Hence $A$ is diagonalizable. ♠
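Theorem 7.1 suggests a simple computational test: for each eigenvalue, compare the number of times it occurs as a root of $\chi_A$ with the dimension of its eigenspace. Below is a rough numerical sketch of this test, assuming NumPy; clustering the eigenvalues by a tolerance is only a heuristic, so it is meant for illustration rather than as a robust routine.

import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Compare algebraic and geometric multiplicities of each eigenvalue of A."""
    n = A.shape[0]
    lam = np.linalg.eigvals(A)
    seen = []
    for mu in lam:
        if any(abs(mu - m) < tol for m in seen):
            continue                                   # eigenvalue already handled
        seen.append(mu)
        algebraic = int((abs(lam - mu) < tol).sum())   # a_A(mu)
        geometric = n - np.linalg.matrix_rank(A - mu * np.eye(n), tol=tol)  # g_A(mu)
        if algebraic != geometric:
            return False
    return True

print(is_diagonalizable(np.array([[1.0, 1.0], [0.0, 1.0]])))             # False: a(1)=2, g(1)=1
print(is_diagonalizable(np.array([[3.0, 0, 0], [2, 4, 2], [2, 1, 5]])))  # True (Example 7.1(2))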
