
T.K.SUBRAHMONIAN MOOTHATHU

Contents

1. Vector spaces and subspaces
2. Linear independence
3. Span, basis, and dimension
4. Linear operators
5. Linear functionals and the dual space
6. Linear operators = matrices
7. Multiplying a matrix left and right
8. Solving linear equations
9. Eigenvalues, eigenvectors, and the triangular form
10. Inner product spaces
11. Orthogonal projections and least square solutions
12. Unitary operators: they preserve distances and angles
13. Orthogonal diagonalization of normal and self-adjoint operators
14. Singular value decomposition
15. Determinant of a square matrix: properties
16. Determinant: existence and expressions
17. Minimal and characteristic polynomials
18. Primary decomposition theorem
19. The Jordan canonical form (JCF)
20. Trace of a square matrix
21. Quadratic forms

Developing effective techniques to solve systems of linear equations is one important goal of Linear Algebra. Our study of Linear Algebra will proceed through two parallel roads: (i) the theory of linear operators, and (ii) the theory of matrices. Why do we say parallel roads? Well, here is an illustrative example. Consider the following system of linear equations:

2x1 + 3x2 = 7

4x1 − x2 = 5.

We are interested in two questions: is there a solution (x1, x2) ∈ R^2 to the above? and if there is a solution, is the solution unique? This problem can be formulated in two other ways:

First reformulation: Let T : R^2 → R^2 be T(x1, x2) = (2x1 + 3x2, 4x1 − x2). Does (7, 5) belong to the range of T? and if so, is the pre-image T^{-1}(7, 5) a singleton?

Second reformulation: Let A = [2 3 ; 4 −1] and Y = [7 ; 5] (a column). Does there exist a real column matrix X = [x1 ; x2] such that AX = Y? and if so, is X unique?
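As a quick numerical aside (a minimal numpy sketch, not part of the notes), the second reformulation can be checked on a computer: the coefficient matrix has rank 2, so AX = Y has exactly one solution.

import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, -1.0]])
Y = np.array([7.0, 5.0])

print(np.linalg.matrix_rank(A))    # 2, so the columns of A span R^2 and the solution is unique
X = np.linalg.solve(A, Y)
print(X)                           # approximately [1.5714  1.2857], i.e. (11/7, 9/7)
print(np.allclose(A @ X, Y))       # True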

The first reformulation is in terms of a linear operator T and the second reformulation is in terms of a matrix A. Often, a statement in terms of linear operators has an equivalent version in terms of matrices, and vice versa. That is why we say the two theories are parallel roads.

A student may keep the following challenge in mind throughout the course. When you encounter a result about linear operators, see whether you can formulate a corresponding result about matrices, and vice versa. If you are successful in translating the result, try to provide a proof also.

1. Vector spaces and subspaces

We assume that the student has an elementary knowledge of groups and fields, and some familiarity with addition and multiplication of matrices (excluding the theory of determinants).

Definition: Let F be a field. An abelian group (V, +) is said to be a vector space over F if there is a map ∗ : F × V → V called scalar multiplication satisfying the following: (i) 1 ∗ v = v for every v ∈ V , where 1 is the multiplicative identity of the field F . (ii) c ∗ (d ∗ v) = (cd) ∗ v for every v ∈ V and c, d ∈ F . (iii) (c + d) ∗ v = c ∗ v + d ∗ v for every v ∈ V and c, d ∈ F . (iv) c ∗ (u + v) = c ∗ u + c ∗ v for every u, v ∈ V and c ∈ F . Elements of V are called vectors and elements of F are called scalars.

Remark: In the above, if F = R, then V is said to be a real vector space; and if F = C, then V is said to be a complex vector space.

Example: (i) If F is a field and n ∈ N, then F^n is a vector space over F with respect to the coordinatewise operations given as (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn) and c(x1, . . . , xn) = (cx1, . . . , cxn). In particular, R^n is a real vector space, and C^n is a complex vector space. Also F^0 := {0} is a vector space over F . (ii) If F is a subfield of a field K, then K is a vector space over F under the natural operations. For instance, R is a vector space over Q, and C is a vector space over R. (iii) If F is a field, then the ring {all polynomials with coefficients in F } is a vector space over F under the operations (f + g)(x) = f(x) + g(x) and (cf)(x) = cf(x).

Remark: Linear Algebra is mainly concerned with finite dimensional vector spaces. Later we will see that any finite dimensional vector space over a field F is isomorphic to F^n for some n ≥ 0.

If (G, +) is an abelian group and c ∈ G, then we can define nc ∈ G for any n ∈ Z by the following rules: 0 · c := 0, 1 · c = c,(n + 1) · c := n · c + c and (−n) · c := n · (−c) for n ∈ N.

Exercise-1: Let V be a vector space over a field F , and let v ∈ V , c ∈ F . Then, (i) 0 ∗ v = 0 ∈ V and (−1) ∗ v = −v. More generally, (nc) ∗ v = c ∗ (nv) for every n ∈ Z. (ii) If v ≠ 0 and c ∗ v = 0 ∈ V , then c = 0 ∈ F . [Hint: (i) 0 ∗ v = (0 + 0) ∗ v = 0 ∗ v + 0 ∗ v and use the cancellation law in the group (V, +). Next, 0 = 0 ∗ v = (1 + (−1)) ∗ v. (ii) Multiply by c^{-1}.]

Remark: From now onwards we drop the ∗ symbol and just write cv instead of c ∗ v. Also it is convenient to use the following notations: if V is a vector space over a field F , v ∈ V , and W, Z ⊂ V , let F v = {cv : c ∈ F }, v + W = {v + w : w ∈ W }, FW = {cw : c ∈ F, w ∈ W }, and

W + Z = {w + z : w ∈ W, z ∈ Z}. In particular, we have W + W = {w1 + w2 : w1, w2 ∈ W }.

Next we talk about ways to form new vector spaces. In Group Theory, three easy methods to produce new groups are: take subgroups, form quotient groups, and form the product of some groups. Similarly, in Linear Algebra, three easy methods to produce new vector spaces are: take vector subspaces, form quotient vector spaces, and form the product of some vector spaces.

Definition: Let (V, +) be a vector space over a field F . A subgroup W of (V, +) is said to be a vector subspace of V if FW ⊂ W . Generally, we will use the equivalent condition (ii) given below.

Exercise-2: Let (V, +) be a vector space over a field F and W ⊂ V be a nonempty set. Then the following are equivalent: (i) W is a vector subspace of V .

(ii) W + W ⊂ W and FW ⊂ W , i.e., w1 + w2 ∈ W and cw1 ∈ W for every w1, w2 ∈ W and c ∈ F .

(iii) FW + W ⊂ W , i.e., cw1 + w2 ∈ W for every w1, w2 ∈ W and c ∈ F .

(iv) FW + FW ⊂ W , i.e., cw1 + dw2 ∈ W for every w1, w2 ∈ W and c, d ∈ F .

Example: Any line passing through the origin and any plane containing the origin are proper vector subspaces of R^3. The remaining vector subspaces of R^3 are the trivial ones, {0} and R^3.

Exercise-3: Let V be a vector space over a field F . Then,

(i) If v ∈ V , then F v is a vector subspace of V . (ii) The intersection of any collection of vector subspaces of V is again a vector subspace of V . What about the union?

(iii) If W1,...,Wn ⊂ V are vector subspaces of V , then W1 + ··· + Wn is a vector subspace of V .

In finite dimension, vector subspaces are related to homogeneous systems of linear equations, whereas various function spaces have infinite dimensional vector subspaces; see below.

Definition: Let F be a field and consider a system of m linear equations in n variables over F (here,

‘over F ’ means the constants aij, yi below are from F ):

a11x1 + ··· + a1nxn = y1
a21x1 + ··· + a2nxn = y2
. . .
am1x1 + ··· + amnxn = ym.

In matrix form, this is AX = Y , where A = [aij]_{m×n}, X is the column with entries x1, . . . , xn, and Y is the column with entries y1, . . . , ym. Often it is convenient to make the identifications X ∼ x = (x1, . . . , xn) ∈ F^n and Y ∼ y = (y1, . . . , ym) ∈ F^m. Then we may write Ax = y for AX = Y . The set {x ∈ F^n : Ax = y} is the solution space of the system AX = Y . The system AX = Y is homogeneous if Y = 0, and non-homogeneous if Y ≠ 0.

Example: [Finite dimensional] Let F be a field and AX = 0 be a homogeneous system of m linear equations in n variables over F . Verify that the solution space W := {x ∈ F^n : Ax = 0} is a vector subspace of F^n. Later we will see that every vector subspace of F^n is of this form.
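Concretely (a small numpy/scipy sketch, not part of the notes), the solution space W of a homogeneous system over R can be computed as the null space of A; the columns of the returned matrix form a basis of W.

import numpy as np
from scipy.linalg import null_space

# one homogeneous equation in three unknowns: x1 + 2*x2 + 3*x3 = 0
A = np.array([[1.0, 2.0, 3.0]])

W = null_space(A)               # columns form an (orthonormal) basis of the solution space
print(W.shape)                  # (3, 2): a 2-dimensional subspace of R^3, i.e. a plane
print(np.allclose(A @ W, 0))    # True: every column solves Ax = 0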

A general principle: We are yet to define dimension, but the following remark can be kept in mind. Typically (but not always) each equation reduces the dimension of the solution space by 1. If AX = 0 is a homogeneous system of m linear equations in n variables over a field F , and if m ≤ n, then typically (but not always) the solution space W will have dimension n − m. For example, in R^3, the solution space of a single linear equation ax1 + bx2 + cx3 = 0 is typically a plane (and not a line), and the solution space in R^3 of a system of two linear equations is typically a line (being the intersection of two planes). Let F be a field and 0 ≤ k < n. We will show later that any k-dimensional vector subspace W of F^n can be realized as the solution space of a homogeneous system AX = 0 of n − k linear equations in n variables over F .

Example:[Infinite dimensional] Let a < b be reals and V = {all functions f :[a, b] → R}. Then V is a real vector space with respect to pointwise operations, i.e., (f + g)(x) := f(x) + g(x) and

(cf)(x) := cf(x) for f, g ∈ V , c ∈ R, and x ∈ [a, b]. Let W1 = {f ∈ V : f is a polynomial},

W2 = {f ∈ V : f is differentiable}, W3 = {f ∈ V : f is continuous},

W4 = {f ∈ V : f is Riemann integrable}, and W5 = {f ∈ V : f is bounded}. Then,

W1 ⊂ W2 ⊂ W3 ⊂ W4 ⊂ W5, and each Wj is a vector subspace (in fact an infinite dimensional vector subspace) of V . With a little Analysis, try to show Wj ≠ Wj+1 for 1 ≤ j ≤ 4.

Remark: [Convex sets] A vector space can have interesting structured subsets other than vector subspaces. Let V be a real vector space. A subset W ⊂ V is said to be convex if for any two w, z ∈ W , the line segment joining w and z is contained in W , i.e., if (1 − c)w + cz ∈ W for every w, z ∈ W and c ∈ [0, 1]. The empty set and any vector subspace of V are clearly convex. Verify that any translate of a convex set is convex, and in particular, v + W is convex for any v ∈ V and any vector subspace W of V . A special example of a convex set is a line. If w, z ∈ V are distinct points, then the line passing through w and z is given by {(1 − c)w + cz : c ∈ R} = {w + c(z − w) : c ∈ R} = w + R(z − w). The union of two convex subsets of V may not be convex. However, the intersection of any collection of convex subsets of V is convex. In particular, if Y ⊂ V is a subset, then the intersection of all convex subsets of V containing Y is convex, it is the smallest convex set containing Y , and is called the convex hull of Y . If Y = {v1, . . . , vk} ⊂ V , then it may be checked that the convex hull of Y is the set {∑_{j=1}^{k} cjvj : cj ∈ [0, 1] and ∑_{j=1}^{k} cj = 1}.

We conclude this section by defining quotient vector spaces and product of vector spaces.

Definition: Let F be a field. (i) If V is a vector space over F and W ⊂ V is a vector subspace, then we know from group theory that the coset space V/W = {v + W : v ∈ V } is an abelian quotient group of (V, +) with respect to the operation (v1 + W ) + (v2 + W ) := v1 + v2 + W . Moreover, V/W is a vector space over F with respect to the scalar multiplication defined as c(v + W ) := cv + W (see that this is well-defined). We call V/W the quotient vector space of V by W .

(ii) Let {Vj : j ∈ J} be a collection of vector spaces over F and let V = Πj∈J Vj. Then V is a vector space over F with respect to coordinatewise operations. We say V is the product of Vj’s.

2. Linear independence

Definition: Let V be a vector space over a field F . By a linear combination of a finite sequence of vectors v1, . . . , vn ∈ V we mean an element of the form ∑_{j=1}^{n} cjvj ∈ V with cj ∈ F . A linear combination ∑_{j=1}^{n} cjvj is non-trivial if at least one cj is non-zero.

(i) A finite sequence v1, . . . , vn of vectors in V is said to be linearly dependent if some non-trivial linear combination ∑_{j=1}^{n} cjvj of v1, . . . , vn is equal to 0; otherwise (i.e., if only the trivial linear combination 0v1 + ··· + 0vn is equal to 0), then we say v1, . . . , vn are linearly independent. (ii) A nonempty subset U ⊂ V is linearly independent if every finite sequence of distinct vectors in U is linearly independent. Otherwise, we say U is linearly dependent. (iii) By convention, we take the empty set to be linearly independent.

Remark: By the above definition, every subset of a linearly independent set is linearly independent; equivalently, every superset of a linearly dependent set is linearly dependent.

Exercise-4: Let V be a vector space over a field F . (i) Let v ∈ V . Then {v} is linearly dependent iff v = 0. (ii) Let v ∈ V \ {0} and w ∈ V . Then {v, w} is linearly dependent iff w ∈ F v. (iii) Let U ⊂ V and |U| ≥ 2. Then U is linearly dependent iff some u ∈ U is a finite linear combination of other members of U. [Hint: ⇒: If ∑_{j=1}^{n} cjvj = 0 and ck ≠ 0, then vk = ∑_{j≠k} (−cj/ck)vj.] (iv) If U ⊂ V is linearly independent and v ∈ V is not a finite linear combination of members of U, then U ∪ {v} is linearly independent. (v) Let U, Z ⊂ V be linearly independent subsets and suppose that no member of Z is a finite linear combination of members of U. Then should U ∪ Z be linearly independent? [Hint: No.]

Example: If F is a field and n ∈ N, then the vectors e1, . . . , en ∈ F^n are linearly independent in F^n, where ej denotes the vector whose jth coordinate is 1 and all other coordinates are 0.
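Numerically (a small numpy sketch, not part of the notes), a finite list of vectors in R^n is linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors.

import numpy as np

def independent(vectors):
    # True iff the given equal-length vectors are linearly independent
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

print(independent([np.array([1., 0., 0.]), np.array([1., 1., 0.]), np.array([1., 1., 1.])]))  # True
print(independent([np.array([1., 2., 3.]), np.array([2., 4., 6.])]))                          # False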

Example: [A little digression to Number Theory] A real number r is said to be an algebraic number if there is a non-zero polynomial p with rational coefficients such that p(r) = 0. Otherwise r is said to be a transcendental number. For example √2 is an algebraic number since p(√2) = 0 for p(x) = x^2 − 2. It can be shown (try!) that there are only countably many polynomials with rational coefficients. Since any non-zero polynomial has only finitely many roots, it follows that the set of algebraic numbers is countable, and consequently the set of transcendental numbers is uncountable as R is uncountable. The numbers e, π are known to be transcendental, but it is not known whether e + π, eπ are algebraic or transcendental. Observe from the definition that r ∈ R is transcendental iff {1, r, r^2, . . . , r^n} is linearly independent over Q for every n ∈ N. Also, r ∈ R is irrational iff {1, r} is linearly independent over Q. Note here that the notion of linear independence depends on the underlying field: {1, √2} is linearly independent over Q and linearly dependent over R.

Example: Consider the real vector space V = {f : [0, 1] → R : f is a polynomial}, and let fk ∈ V be fk(x) = x^k for k ≥ 0 so that every f ∈ V is a finite linear combination of fk's. Suppose k1 < ··· < kn and ∑_{j=1}^{n} cjfkj = 0 ∈ V . Then the polynomial ∑_{j=1}^{n} cjx^{kj} is identically zero on [0, 1], which is possible only if all the cj's are zero. This proves {fk : k ≥ 0} is linearly independent.

Example: Consider the complex vector space V = {f : [0, 2π] → C : f is continuous} and let fk ∈ V be fk(x) = e^{ikx} for k ∈ Z. Observe that fkfm = fk+m and ∫_0^{2π} e^{ikx} dx = 0 if k ≠ 0, and = 2π if k = 0. Now suppose k1 < ··· < kn and c1fk1 + ··· + cnfkn = 0 ∈ V , where cj ∈ C. Fixing j ∈ {1, . . . , n}, multiply the equation c1fk1 + ··· + cnfkn = 0 by f−kj and integrate from 0 to 2π. By the above observation, we get 2πcj = 0, or cj = 0. We conclude {fk : k ∈ Z} is linearly independent.
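The integral fact used above is easy to sanity-check numerically (a rough numpy sketch, not part of the notes): averaging e^{ikx} over equally spaced points of [0, 2π) approximates (1/2π) times the integral.

import numpy as np

x = np.linspace(0.0, 2*np.pi, 100000, endpoint=False)
for k in (0, 1, 2, -3):
    integral = np.exp(1j*k*x).mean() * 2*np.pi   # Riemann sum for the integral of e^{ikx} over [0, 2*pi]
    print(k, np.round(integral, 6))              # about 2*pi for k = 0, about 0 otherwise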

Exercise-5: (i) Let F be a field. For i = 1, 2, let Vi be a vector space over F , and Ui ⊂ Vi be a linearly independent set. Check whether U1 × U2 must be a linearly independent subset of V1 × V2. (ii) Consider the real vector space R^3. Find all v ∈ R^3 such that {(1, 0, 0), (1, 1, 0), v} is linearly independent. Also find all w ∈ R^3 such that {(1, 0, 0), (0, 1, 1), w} is linearly independent.

3. Span, basis, and dimension

The intersection of a collection of subgroups of a group G is again a subgroup of G so that one may talk about the subgroup generated by a subset of G. In the last section we saw that the intersection of a collection of convex sets is again convex so that we may talk about the convex set generated by (i.e., the convex hull of) a given set. A similar phenomenon is true for vector subspaces, where we use the word span in the sense of generate.

Definition: Let V be a vector space over a field F . If U ⊂ V is a subset, we define span(U) as the vector subspace of V generated by U, i.e., the smallest vector subspace of V containing U. Because of Exercise-3(ii), span(U) is well-defined and is equal to the intersection of all vector subspaces of V containing U. Note that if U = ∅, then span(U) = {0}.

Exercise-6: Let V be a vector space over a field F , and let U ⊂ V be nonempty. Then, (i) span(U) = {∑_{j=1}^{n} cjvj : n ∈ N, cj ∈ F, and vj ∈ U}. (ii) If U ⊂ Z ⊂ span(U), then span(Z) = span(U). [Hint: (i) Verify that U ⊂ RHS ⊂ span(U) and RHS is a vector subspace of V .]

Definition: Let V be a vector space over a field F . We say a subset U ⊂ V is a basis for V if U is linearly independent over F , and span(U) = V . 8 T.K.SUBRAHMONIAN MOOTHATHU

Example: (i) ∅ is a basis for the trivial vector space {0}. (ii) If F is a field, then {e1, . . . , en} is a basis for F^n over F . (iii) If fk(x) = x^k, then {fk : k ≥ 0} is a basis for V = {all polynomials p : [0, 1] → R} over R. (iv) Let w = (a, b) ∈ R^2. Then {e1, w} is a basis for R^2 over R iff b ≠ 0.

Zorn’s lemma: Let (F, ≤) be a partially ordered set. If every chain in F has an upper bound in F, then F has a maximal element. (Here, a chain in F means a totally ordered subset of F).

[101] [Existence of basis] Let V be a vector space over a field F and Z1 ⊂ Z2 ⊂ V be subsets such that Z1 is linearly independent and span(Z2) = V . Then V has a basis U with Z1 ⊂ U ⊂ Z2.

Proof. Let F be the collection of all linearly independent subsets U of V with Z1 ⊂ U ⊂ Z2, partially ordered by inclusion. Then F ≠ ∅ since Z1 ∈ F. If Z2 is a finite set, then F is also finite, and therefore F has a maximal element. Even if Z2 is not finite, the existence of a maximal element of F is guaranteed by Zorn's lemma because, if {Uα : α ∈ J} is a chain in F, then it may be checked that ∪_{α∈J} Uα ∈ F and is an upper bound to the chain. Let U denote a maximal element of F. By the definition of F, we have that U is linearly independent and Z1 ⊂ U ⊂ Z2. It remains to show span(U) = V = span(Z2). For this, it suffices to show Z2 ⊂ span(U). If Z2 is not contained in span(U), pick v ∈ Z2 \ span(U) and show U ∪ {v} ∈ F to contradict the maximality of U.

[101′] [Corollary] Let V be a vector space over a field F . Then, (i) V has a basis.

(ii) If Z1 ⊂ V is linearly independent, then V has a basis U with Z1 ⊂ U.

(iii) If Z2 ⊂ V is such that span(Z2) = V , then V has a basis U with U ⊂ Z2.

Proof. For (i), take Z1 = ∅ and Z2 = V in [101]. For (ii), take Z2 = V . For (iii), take Z1 = ∅.
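In F^n, part (iii) of the corollary can be imitated mechanically (a minimal numpy sketch, not from the notes): scan a spanning list and keep a vector only when it raises the rank; what remains is a basis of the span contained in the original list.

import numpy as np

def sift_basis(vectors):
    # return a maximal linearly independent sublist of the given vectors
    basis = []
    for v in vectors:
        candidate = basis + [v]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            basis = candidate
    return basis

spanning = [np.array([1., 0., 0.]), np.array([2., 0., 0.]),
            np.array([0., 1., 0.]), np.array([1., 1., 0.])]
print(sift_basis(spanning))   # keeps the 1st and 3rd vectors; they span the same plane as all four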

Remark: If U = {v1, . . . , vn} is a basis of a vector space V and if W ⊂ V is a vector subspace, then it may not be possible to find a basis of W as a subset of U; in fact, U can be disjoint from W! For example, consider V = R^2, U = {e1, e2} and W = {(x1, x2) ∈ R^2 : x1 = x2}.

[102] [Equivalent descriptions of a basis] Let V be a vector space over a field F , and U ⊂ V . Then the following are equivalent: (i) U is a basis for V . (ii) Every v ∈ V can be written uniquely as a finite linear combination of members of U. (iii) U is a minimal spanning set for V . That is, if Z ⊂ U and span(Z) = V , then Z = U. (iv) U is a maximal linearly independent set in V . That is, if U ⊂ Z ⊂ V and Z is linearly independent, then Z = U. LINEAR ALGEBRA 9

Proof. (i) ⇒ (ii): Suppose v ∈ V has two different representations as linear combinations of members of U. Adding terms of the form 0u to the expressions if necessary, we may assume the two expressions are v = ∑_{j=1}^{n} cjvj = ∑_{j=1}^{n} djvj, where v1, . . . , vn ∈ U are distinct and cj0 ≠ dj0 for at least one j0. Then 0 = v − v = ∑_{j=1}^{n} (cj − dj)vj, contradicting the linear independence of vj's.

(ii) ⇒ (iii): If Z ≠ U, any u ∈ U \ Z has two distinct representations as finite linear combinations of members of U, one just as u and another as ∑_{j=1}^{n} cjzj with zj ∈ Z ⊂ U, a contradiction.

(iii) ⇒ (iv): If U is linearly dependent, then by Exercise-4(iii), there is u ∈ U with u ∈ span(U\{u}). Then span(U \{u}) = V , which contradicts the minimality of U. Hence U must be linearly independent. If v ∈ V = span(U), then U ∪ {v} is linearly dependent again by Exercise-4(iii), and this proves U is maximal as a linearly independent set.

(iv) ⇒ (i): If span(U) ≠ V , then for any v ∈ V \ span(U), the set U ∪ {v} is a linearly independent set strictly bigger than U by Exercise-4(iv), a contradiction to the maximality of U. 

Remark: Let V be a vector space over a field F and U ⊂ V . Then we have the following corollary to [102]: U is linearly independent ⇔ U is a basis for span(U) ⇔ every v ∈ span(U) can be written uniquely as a finite linear combination of members of U.

[103] [Dimension theorem] Let V be a vector space over a field F and let U, Z ⊂ V be subsets. (i) If U is linearly independent and Z is a finite set with span(Z) = V , then |U| ≤ |Z|. (ii) If U, Z are bases for V and |Z| < ∞, then |U| = |Z|.

Proof. (i) Towards a contradiction, assume there are distinct members v1, . . . , vn+1 ∈ U, where n = |Z|. Let Up = {v1, . . . , vp} for 1 ≤ p ≤ n. We will show that for each p ≤ n, a p-element subset of Z can be replaced by Up in such a way that the resulting set also spans V . This will imply after n steps that vn+1 ∈ V = span(Un), a contradiction.

Let q = max{p : ∃ W ⊂ Z with |W| = p and span((Z ∪ Up) \ W) = V }. We claim q = n. If q < n, consider W ⊂ Z with |W| = q and span((Z ∪ Uq) \ W) = V . We may write Z \ W = {zq+1, . . . , zn} since |Z| = n. Now let cj ∈ F be chosen so that vq+1 = ∑_{j=1}^{q} cjvj + ∑_{j=q+1}^{n} cjzj.

Since vq+1 ∉ span(Uq), there is k ≥ q + 1 such that ck ≠ 0. From the expression for vq+1, we deduce zk = ck^{-1}(−∑_{j=1}^{q} cjvj + vq+1 − ∑_{j≥q+1, j≠k} cjzj) ∈ span([Z ∪ Uq+1] \ [W ∪ {zk}]), and consequently span([Z ∪ Uq+1] \ [W ∪ {zk}]) = V , a contradiction to the maximality of q. Hence our claim that q = n must be true, completing the proof.

(ii) Apply part (i) first to the pair U, Z and next to the interchanged pair Z, U.

Definition: Let V be a vector space over a field F . If V has a finite basis U, then we say V is finite dimensional; and the dimension of V , denoted as dim(V ) or dimF (V ), is defined as the number |U|. This definition is meaningful because of [103]. If V has no finite basis, then we say V is infinite dimensional, and we write dim(V ) = ∞.

Example: (i) dim({0}) = 0. (ii) If F is a field, then dim(F^n) = n. (iii) dimR(C) = 2 since {1, i} is a basis for C over R. (iv) If V = {all polynomials p : [0, 1] → R}, then dim(V ) = ∞. (v) If U ⊂ R is a countable set, then it can be verified that spanQ(U) is also countable, and hence spanQ(U) ≠ R.

Therefore any basis of R over Q must be uncountable, and in particular dimQ(R) = ∞.

Example: Let W, Z be finite dimensional vector subspaces of a vector space V . Then dim(W + Z) = dim(W) + dim(Z) − dim(W ∩ Z). Proof: Let U = {v1, . . . , vr} be a basis of W ∩ Z. Choose linearly independent sets W1 = {wr+1, . . . , wk} ⊂ W \ (W ∩ Z) and Z1 = {zr+1, . . . , zm} ⊂ Z \ (W ∩ Z) such that U ∪ W1 is a basis of W and U ∪ Z1 is a basis of Z. Clearly W + Z = span(U ∪ W1 ∪ Z1). It remains to show U ∪ W1 ∪ Z1 is linearly independent. Suppose ∑_{j=1}^{r} ajvj + ∑_{j=r+1}^{k} bjwj + ∑_{j=r+1}^{m} cjzj = 0. Then −(∑_{j=r+1}^{m} cjzj) = ∑_{j=1}^{r} ajvj + ∑_{j=r+1}^{k} bjwj ∈ W ∩ Z, so there are scalars dj with −(∑_{j=r+1}^{m} cjzj) = ∑_{j=1}^{r} djvj, or ∑_{j=1}^{r} djvj + ∑_{j=r+1}^{m} cjzj = 0. This gives that the dj's and cj's are zero since U ∪ Z1 is linearly independent. Going back, now we have ∑_{j=1}^{r} ajvj + ∑_{j=r+1}^{k} bjwj = −(∑_{j=r+1}^{m} cjzj) = 0, and this gives that the aj's and bj's are zero since U ∪ W1 is linearly independent.
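A numerical confirmation of the formula (a numpy/scipy sketch, not from the notes), for the xy-plane and the yz-plane inside R^3: dim(W + Z) is the rank of the combined columns, while dim(W ∩ Z) is the nullity of the block matrix [Bw | −Bz], since a pair (a, b) with Bw a = Bz b parametrizes W ∩ Z (injectively here, because each block has independent columns).

import numpy as np
from scipy.linalg import null_space

Bw = np.array([[1., 0.],
               [0., 1.],
               [0., 0.]])      # columns: a basis of W, the xy-plane in R^3
Bz = np.array([[0., 0.],
               [1., 0.],
               [0., 1.]])      # columns: a basis of Z, the yz-plane in R^3

dim_sum = np.linalg.matrix_rank(np.hstack([Bw, Bz]))   # dim(W + Z)
dim_int = null_space(np.hstack([Bw, -Bz])).shape[1]    # dim(W ∩ Z)
print(dim_sum, dim_int)                                # 3 1
print(dim_sum == Bw.shape[1] + Bz.shape[1] - dim_int)  # True: 3 = 2 + 2 - 1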

Definition: Let V be a vector space and V1, . . . , Vk ⊂ V be vector subspaces. We say V is a direct sum of V1, . . . , Vk and write V = V1 ⊕ ··· ⊕ Vk if any v ∈ V can be written uniquely as v = v1 + ··· + vk with vj ∈ Vj. For example, R^2 = (x-axis) ⊕ (y-axis).

Remarks: (i) When k = 2, observe that V = V1 ⊕ V2 iff V = V1 + V2 and V1 ∩ V2 = {0}. Thus for example, if V1, V2 are any two distinct lines passing through the origin of R^2, then R^2 = V1 ⊕ V2.

This is no longer true if k ≥ 3. If V1 is the xy-plane, V2 is the z-axis, and V3 is the line in R^3 passing through (0, 0, 0), (1, 1, 1), then R^3 = V1 + V2 + V3 and Vi ∩ Vj = {0} for i ≠ j, but R^3 is not a direct sum of V1, V2, V3 since the element (1, 1, 1) has two different representations

(1, 1, 1) = (0, 0, 0) + (0, 0, 0) + (1, 1, 1) = (1, 1, 0) + (0, 0, 1) + (0, 0, 0) ∈ V1 + V2 + V3.

(ii) If V = V1 ⊕ ··· ⊕ Vk, then it may be shown by induction on k that dim(V ) = ∑_{j=1}^{k} dim(Vj).

(iii) If V1 ⊂ V is a vector subspace of a vector space V , then ∃ a vector subspace V2 ⊂ V with V = V1 ⊕ V2. To see this, take a basis Z1 of V1, extend Z1 to a basis U of V and put V2 = span(U \ Z1).

4. Linear operators

Linear operators are to vector spaces what group homomorphisms are to groups. Some results about linear operators in this section have obvious analogues in Group Theory - try to recall them.

Definition: Let V, W be vector spaces over a field F . A map T : V → W is a linear operator if (i) T : (V, +) → (W, +) is a group homomorphism, i.e., T (u + v) = T (u) + T (v) ∀ u, v ∈ V , (ii) T (cv) = cT (v) for every v ∈ V and c ∈ F . Let L(V, W ) = {T : V → W : T is a linear operator}. If T ∈ L(V, W ) is a bijection, then we say that T is an isomorphism and that the vector spaces V, W are isomorphic.

Remark: For T ∈ L(V,W ), observe that: (i) T (0) = 0, (ii) if T is an isomorphism, then T −1 : W → V is also a linear operator (and an isomorphism). In some other categories the analogue is not true: for example, the inverse of a continuous bijection may not be continuous.

Remark: Let V,W be vector spaces over the same field F , and let T : V → W be a map. Then it may be shown that the following are equivalent: (i) T is a linear operator. (ii) T (cu+dv) = cT (u)+dT (v) for every u, v ∈ V and c, d ∈ F . (iii) T (cu + v) = cT (u) + T (v) for every u, v ∈ V and c ∈ F .

Examples: (i) Let F be a field. Then T : F → F is linear iff ∃ a ∈ F such that T (x) = ax for every x ∈ F , where a = T (1). Similarly, T : F^2 → F^2 is linear iff ∃ a, b, c, d ∈ F such that T (x1, x2) = (ax1 + bx2, cx1 + dx2) for every (x1, x2) ∈ F^2, where (a, c) = T (1, 0) and (b, d) = T (0, 1).

(ii) Consider the vector space V = {f : [0, 1] → R : f is a polynomial} over R. Then T1, T2 : V → V and T3 : V → R defined as T1(f) = f′, T2(f)(y) = ∫_0^y f(x)dx, T3(f) = f(1/4) are linear operators.

(iii) Let V be a vector space and suppose V = W ⊕ Z, a direct sum of two vector subspaces. Then T : V → W given by T (w + z) = w is linear, and is called a projection onto W . In this case, I − T is a projection onto Z. A projection from V onto W depends on the complementary vector subspace Z. We have R^2 = W ⊕ Z = W ⊕ Z̃ if W = Re1, Z = Re2 and Z̃ = R(e1 + e2). Any (x1, x2) ∈ R^2 decomposes as (x1, 0) + (0, x2) in W ⊕ Z, and as (x1 − x2, 0) + (x2, x2) in W ⊕ Z̃. So the projection to W w.r.to Z is (x1, x2) 7→ (x1, 0), and the projection to W w.r.to Z̃ is (x1, x2) 7→ (x1 − x2, 0); a numerical check of these two projections is given after the next example.

(iv) If V is the real vector space V = {f : [0, 1] → R : f is a polynomial of degree ≤ 3}, then T : R^4 → V sending (a0, a1, a2, a3) to the polynomial a0 + a1x + a2x^2 + a3x^3 is an isomorphism.
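Here is the promised check of example (iii) (a small numpy sketch, not part of the notes): the projection onto W = Re1 genuinely changes when the complement changes from Z = Re2 to Z̃ = R(e1 + e2), though both maps are idempotent.

import numpy as np

P_Z = np.array([[1., 0.],
                [0., 0.]])     # projection onto W along Z: (x1, x2) -> (x1, 0)
P_Zt = np.array([[1., -1.],
                 [0., 0.]])    # projection onto W along Z~: (x1, x2) -> (x1 - x2, 0)

v = np.array([5., 2.])
print(P_Z @ v, P_Zt @ v)                                             # [5. 0.] [3. 0.]
print(np.allclose(P_Z @ P_Z, P_Z), np.allclose(P_Zt @ P_Zt, P_Zt))   # True True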

Exercise-7: Let T : V → W be a linear operator of vector spaces. (i) If U ⊂ V is a vector subspace, then T (U) is a vector subspace of W . In particular, range(T ) := T (V ) is a vector subspace of W .

(ii) If Z ⊂ W is a vector subspace, then T −1(Z) := {v ∈ V : T (v) ∈ Z} is a vector subspace of V . In particular, by taking Z = {0}, we get ker(T ) := {v ∈ V : T (v) = 0} is a vector subspace of V . (iii) T is injective iff ker(T ) = {0} (remember to use this later!). [Hint: (iii) ⇐: If T (u) = T (v), then T (u − v) = 0 implying u − v ∈ ker(T ) = {0}.]

Exercise-8: Let V,W be real vector spaces and T ∈ L(V,W ). Then,

(i) If Y is a line segment joining v1, v2 ∈ V , then T (Y ) is a line segment joining T (v1) and T (v2). (ii) If Y ⊂ V is convex, then T (Y ) is convex. (iii) If Y ⊂ V is a line, then T (Y ) is either a singleton or a line in W .

[104] Let T : V → W be a linear operator of vector spaces and U ⊂ V .

(i) If U is linearly independent, and T |span(U) is injective, then T (U) is linearly independent. (ii) T (span(U)) = span(T (U)). (iii) dim(T (V )) ≤ dim(V ).

Proof. (i) Suppose z1, . . . , zn ∈ T (U) are distinct and ∑_{j=1}^{n} cjzj = 0. Let vj ∈ U be with T (vj) = zj. We have 0 = ∑_{j=1}^{n} cjzj = ∑_{j=1}^{n} cjT (vj) = T (∑_{j=1}^{n} cjvj), implying ∑_{j=1}^{n} cjvj ∈ ker(T |span(U)) = {0} by Exercise-7(iii). Hence cj = 0 for every j since {v1, . . . , vn} is linearly independent.

(ii) This is clear since T (∑_{j=1}^{n} cjvj) = ∑_{j=1}^{n} cjT (vj).

(iii) If U is a basis for V , then span(T (U)) = T (V ). So T (U) contains a basis of T (V ) by [101′]. 

Exercise-9: [A linear operator is completely specified by its action on a basis - this will be used at many places] Let V,W be vector spaces over the same field F and let U ⊂ V be a basis for V .

(i) If T0 : U → W is a map, then there exists a unique linear operator T : V → W extending T0.

(ii) If T1, T2 : V → W are linear operators such that T1|U = T2|U , then T1 = T2. [Hint: (i) v ∈ V has a unique expression ∑_{j=1}^{n} cjvj with vj ∈ U by [102]. Put T (v) = ∑_{j=1}^{n} cjT0(vj).]

[105] [Basis to basis by isomorphism] Let V,W be finite dimensional vector spaces over the same

field. Let {v1, . . . , vn} be a basis of V , and let w1, . . . , wn ∈ W be distinct. Then there is an isomorphism T : V → W with T (vj) = wj for 1 ≤ j ≤ n ⇔ {w1, . . . , wn} is a basis for W .

Proof. If T : V → W is an isomorphism with T (vj) = wj for 1 ≤ j ≤ n, then {w1, . . . , wn} is a basis of W by [104]. Conversely suppose {w1, . . . , wn} is a basis of W . If we define T on {v1, . . . , vn} as

T (vj) = wj for 1 ≤ j ≤ n, then T extends to a unique operator T ∈ L(V,W ) by Exercise-9. By

[104](ii), we have W = span{w1, . . . , wn} = span(T ({v1, . . . , vn})) = T (span{v1, . . . , vn}) = T (V ), showing T is surjective. To show T is injective, we have to verify ker(T ) = {0}. Suppose v = ∑_{j=1}^{n} cjvj ∈ ker(T ). Then, 0 = T (v) = ∑_{j=1}^{n} cjT (vj) = ∑_{j=1}^{n} cjwj, which gives cj = 0 for every j since w1, . . . , wn are linearly independent. Hence v = 0, and thus ker(T ) = {0}.

Remark: In particular, by taking W = V in [105] we see that every basis of a finite dimensional vector space V is obtained by applying a self-isomorphism of V to some fixed basis of V .

Observe the following facts from Set Theory: (i) If V,W are nonempty finite sets with |V | = |W |, then there is a bijection from V to W . (ii) Let V,W be sets with |V | < ∞, T : V → W be a map, and let U ⊂ V be a maximal set such that T |U is injective. Then |V | = |V \ U| + |T (V )|. (iii) If V,W are finite sets with |V | = |W |, then a map T : V → W is injective iff T is surjective. Analogous to (i), (ii), (iii), we have results [106], [107], [108] below with dimension playing the role of cardinality.

[106] [Dimension determines a vector space] Let V,W be finite dimensional vector spaces over the same field. Then V is isomorphic to W ⇔ dim(V ) = dim(W ).

Proof. This is just a corollary of [105]. We indicate also a direct (essentially the same) argument for the implication '⇐'. Since being isomorphic is a transitive relation, it suffices to establish the following: if V is an n-dimensional vector space over a field F , then F^n is isomorphic to V . Let U = {v1, . . . , vn} be a basis for V . Define T : F^n → V as T (c1, . . . , cn) = ∑_{j=1}^{n} cjvj. Check that T is a linear operator. Observe that T is surjective since span(U) = V , and T is injective (equivalently, ker(T ) = {0}) since U is linearly independent.

Definition: Let T : V → W be a linear operator of vector spaces. The nullity and rank of T are defined respectively as null(T ) = dim(ker(T )) and rank(T ) = dim(range(T )). For example, if T : R^3 → R^3 is T (x1, x2, x3) = (0, x2, x3), then null(T ) = 1 and rank(T ) = 2.
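Numerically (a numpy/scipy sketch, not from the notes), the rank and nullity of this operator can be read off from its standard matrix, and they add up to dim(V ) = 3 as the next theorem asserts.

import numpy as np
from scipy.linalg import null_space

A = np.array([[0., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])   # matrix of T(x1, x2, x3) = (0, x2, x3) in the standard basis

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
print(rank, nullity, rank + nullity)   # 2 1 3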

[107] [Nullity-rank theorem] Let T : V → W be a linear operator of vector spaces V,W over the same field. If dim(V ) < ∞, then dim(V ) = null(T ) + rank(T ).

Proof. Let {v1, . . . , vk} be a basis for ker(T ). Now two slightly different proofs are possible. The idea of the first proof is to extend {v1, . . . , vk} to a basis {v1, . . . , vn} of V by [101′], and then to show that T (vk+1), . . . , T (vn) are linearly independent vectors spanning T (V ) (try this).

The second proof is as follows. By [104](iii), rank(T ) ≤ dim(V ) < ∞. Let {w1, . . . , wm} be a basis for T (V ). Choose vk+1, . . . , vk+m ∈ V with T (vk+j) = wj. We will show U := {v1, . . . , vk+m} is a basis for V . If ∑_{j=1}^{k+m} cjvj = 0, then 0 = T (0) = T (∑_{j=1}^{k} cjvj) + T (∑_{j=k+1}^{k+m} cjvj) = 0 + ∑_{j=k+1}^{k+m} cjT (vj) = ∑_{j=1}^{m} ck+jwj, which implies ck+j = 0 for 1 ≤ j ≤ m since {w1, . . . , wm} is linearly independent. Hence from the starting equation, ∑_{j=1}^{k} cjvj = 0. Since {v1, . . . , vk} is a basis for ker(T ), we conclude cj = 0 for 1 ≤ j ≤ k also. Thus U is linearly independent. To show span(U) = V , consider v ∈ V , suppose T (v) = ∑_{j=1}^{m} djwj and let u = ∑_{j=1}^{m} djvk+j. Since T (v − u) = 0, we may write v − u = ∑_{j=1}^{k} cjvj. Thus v = ∑_{j=1}^{k} cjvj + ∑_{j=1}^{m} djvk+j ∈ span(U).

Exercise-10: Let V be a finite dimensional vector space and W ⊂ V be a vector subspace. Then, (i) dim(V/W ) = dim(V ) − dim(W ). (ii) If dim(W ) = dim(V ), then W = V . [Hint: (i) Let T : V → V/W be the quotient map, T (v) = v + W . Note that T is linear and surjective with ker(T ) = W , and apply [107]. (ii) Let U be a basis for W . If span(U) ≠ V , extend U to a basis of V to obtain a contradiction.]

[108] Let V, W be finite dimensional vector spaces over the same field with dim(V ) = dim(W ), and T : V → W be a linear operator. Then T is injective iff T is surjective.

Proof. We have dim(V ) = null(T ) + rank(T ) by [107]. Using this and Exercise-10(ii), note that T is injective ⇔ null(T ) = 0 ⇔ rank(T ) = dim(V ) = dim(W ) ⇔ T is surjective. 

Exercise-11: [First isomorphism theorem] Let V, W be vector spaces over the same field, and let T : V → W be a surjective linear operator. If Z = ker(T ), then T̃ : V/Z → W given by T̃(v + Z) = T (v) is an isomorphism of vector spaces. [Hint: By the First isomorphism theorem of groups, T̃ is a well-defined group isomorphism. It remains to check cT̃(v + Z) = T̃(cv + Z).]

Exercise-12: Let F be a field. (i) If V,W are vector spaces over F , then L(V,W ) is a vector space over F with respect to pointwise operations, i.e., (T1 +T2)(v) := T1(v)+T2(v) and (cT )(v) := cT (v). If dim(V ) = n and dim(W ) = m, then dim(L(V,W )) = nm.

(ii) If V1,V2,W are vector spaces over F , then Ψ : L(V1,W ) × L(V2,W ) → L(V1 × V2,W ) defined as Ψ(T1,T2) = T , where T (v1, v2) := T1(v1) + T2(v2), is an isomorphism.

[Hint: (i) Let {v1, . . . , vn} be a basis for V and {w1, . . . , wm} be a basis for W . Let Tij : V → W be the linear operator specified by the conditions Tij(vj) = wi, and Tij(vk) = 0 for k ≠ j. Verify that {Tij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for L(V, W ). For instance, if T ∈ L(V, W ) and if cij's are chosen so that T (vj) = ∑_{i=1}^{m} cijwi, then we may see that T = ∑_{i=1}^{m} ∑_{j=1}^{n} cijTij since both sides agree on the basis vectors v1, . . . , vn. (ii) Surjectivity of Ψ is seen as follows. Given

T ∈ L(V1 × V2,W ), define T1(v1) = T (v1, 0) and T2(v2) = T (0, v2). Then Ψ(T1,T2) = T .]

Exercise-13: Let T : V → W be a linear operator of vector spaces. Then, there is a vector subspace U ⊂ V such that V = ker(T ) ⊕ U. And T |U : U → T (V ) is an isomorphism for any such U.

[109] Let V, W be finite dimensional vector spaces and T, T̃ ∈ L(V, W ) be such that range(T ) = range(T̃). Then there is an isomorphism S : V → V satisfying T̃ = T ◦ S.

Proof. Let k = rank(T ) = rank(T̃), and choose a basis {w1, . . . , wk} for range(T ) = range(T̃). By Exercise-13, write V = ker(T ) ⊕ U = ker(T̃) ⊕ Ũ, and observe that null(T ) = null(T̃) = n − k.

Choose vj ∈ U for 1 ≤ j ≤ k with T (vj) = wj. Since T |U : U → T (V ) is an isomorphism, we see that {vj : 1 ≤ j ≤ k} is a basis for U. Also choose a basis {vj : k < j ≤ n} for ker(T ).

Then {vj : 1 ≤ j ≤ n} is a basis for V . Similarly, we may find a basis {ṽ1, . . . , ṽn} of V so that T̃(ṽj) = wj for 1 ≤ j ≤ k, and {ṽj : k < j ≤ n} is a basis for ker(T̃). Define an isomorphism S : V → V by the condition S(ṽj) = vj. Then, T ◦ S(ṽj) = T (vj) = wj = T̃(ṽj) for 1 ≤ j ≤ k and T ◦ S(ṽj) = 0 = T̃(ṽj) for k < j ≤ n. Hence T̃ = T ◦ S by Exercise-9.

5. Linear functionals and the dual space

Roughly speaking, linear functionals on a vector space V are one-dimensional photographs of V from various angles. These photographs help us to have a better understanding of the vector subspaces of V , and linear operators on V .

Definition: Let V be a vector space over a field F . Then V ∗ := L(V,F ) = {ϕ : V → F : ϕ is linear} is called the dual space of V . Any ϕ ∈ V ∗ is called a linear functional on V .

Example: Consider the real vector space V = {f : [0, 1] → R : f is continuous}. Then ϕ : V → R given by ϕ(f) = ∫_0^1 f(x)dx is a linear functional. Moreover, for each a ∈ [0, 1], the evaluation map

ϕa : V → R given by ϕa(f) = f(a) is a linear functional.

Exercise-14: [Describing all linear functionals in finite dimension] Let F be a field. (i) ϕ : F^n → F is a linear functional iff there is (a1, . . . , an) ∈ F^n such that ϕ(c1, . . . , cn) = ∑_{j=1}^{n} cjaj = ∑_{j=1}^{n} ajcj for every (c1, . . . , cn) ∈ F^n. (ii) If V is a vector space over F with basis {v1, . . . , vn}, then ϕ : V → F is a linear functional iff there is (a1, . . . , an) ∈ F^n such that ϕ(∑_{j=1}^{n} cjvj) = ∑_{j=1}^{n} cjaj = ∑_{j=1}^{n} ajcj for every ∑_{j=1}^{n} cjvj ∈ V . [Hint: For '⇒', take aj = ϕ(ej) in (i), and aj = ϕ(vj) in (ii).]

Exercise-15: [Linear operators in terms of linear functionals] Let T : V → W be a linear operator of vector spaces with rank(T ) < ∞. If {w1, . . . , wm} is a basis for T (V ), then there are ϕ1, . . . , ϕm ∈ V∗ such that T (v) = ∑_{i=1}^{m} ϕi(v)wi for every v ∈ V . [Hint: If T (v) = ∑_{i=1}^{m} diwi, put ϕi(v) = di.]

Exercise-16: [Dual vector space] Let V be a vector space over a field F . Then, (i) V∗ separates points of V . That is, if u, v ∈ V are distinct, then ∃ ϕ ∈ V∗ with ϕ(u) ≠ ϕ(v). (ii) V∗ = L(V, F ) is a vector space over F . And dim(V∗) = dim(V ) when dim(V ) < ∞.

[Hint: (i) Let Z be a basis for V with u − v ∈ Z. Define ϕ : Z → F as ϕ(u − v) = 1 and ϕ(z) = 0 for z ≠ u − v. Then ϕ extends to a linear functional on V by Exercise-9. (ii) Apply Exercise-12(i).]

It follows by Exercise-16(ii) and [106] that V ≅ V∗ ≅ V∗∗ when V is a finite dimensional vector space. Natural isomorphisms from V to V∗ and from V to V∗∗ are described below.

[110] [Dual basis] Let V be a finite dimensional vector space over a field F .
(i) If {vj : 1 ≤ j ≤ n} is a basis for V , then there is a unique basis {ϕi : 1 ≤ i ≤ n} for V∗ such that ϕi(vj) = δij for every i, j, where δij = 1 if i = j and = 0 if i ≠ j. Here {ϕi : 1 ≤ i ≤ n} is called the dual basis to {vj : 1 ≤ j ≤ n}.
(ii) Let {v1, . . . , vn}, {ϕ1, . . . , ϕn} be as above. Then v = ∑_{j=1}^{n} ϕj(v)vj for every v ∈ V , and vj 7→ ϕj defines an isomorphism from V to V∗.
(iii) For v ∈ V , let λv : V∗ → F be the evaluation map, λv(ϕ) = ϕ(v). Then λv ∈ V∗∗, and Ψ : V → V∗∗ given by Ψ(v) = λv is an isomorphism from V to V∗∗.
(iv) Every basis {ϕi : 1 ≤ i ≤ n} for V∗ is dual to some basis {v1, . . . , vn} of V .

Proof. (i) Fix 1 ≤ i ≤ n and define ϕi : {vj : 1 ≤ j ≤ n} → F as required and apply Exercise-9 to extend ϕi to a linear functional on V . Suppose ∑_{i=1}^{n} ciϕi = 0 ∈ V∗. Applying ∑_{i=1}^{n} ciϕi to vj, we get cj = 0, and this shows {ϕ1, . . . , ϕn} is linearly independent. Since dim(V∗) = dim(V ), it follows that {ϕi : 1 ≤ i ≤ n} is a basis for V∗. We may also prove directly that span{ϕi : 1 ≤ i ≤ n} = V∗ as follows. Consider ψ ∈ V∗, let dj = ψ(vj), and see that ψ = ∑_{i=1}^{n} diϕi by observing that both sides agree on each vj. Uniqueness of the dual basis is also clear by Exercise-9.

(ii) The first assertion is easy, and the second is evident by [105].

(iii) It is easy to verify that λv ∈ V∗∗ and Ψ ∈ L(V, V∗∗). Exercise-16(i) gives that Ψ is injective. Since dim(V∗∗) = dim(V∗) = dim(V ) by Exercise-16(ii), we conclude that Ψ is surjective by [108].

(iv) By part (i), V∗∗ has a basis B dual to {ϕi : 1 ≤ i ≤ n}. By part (iii), there is a basis {vj : 1 ≤ j ≤ n} of V such that B = {λv1, . . . , λvn}. Clearly, δij = λvj(ϕi) = ϕi(vj).
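In R^n the dual basis can be computed concretely (a numpy sketch under the assumption V = R^n, not part of the notes): if v1, . . . , vn are the columns of an invertible matrix B, and ϕi(x) = ai · x with ai the ith row of a matrix D, then ϕi(vj) = δij says exactly that DB = I, i.e. D = B^{-1}.

import numpy as np

B = np.array([[1., 1., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])            # columns v1, v2, v3 form a basis of R^3
D = np.linalg.inv(B)                    # row i of D holds the coefficients of the dual functional phi_i
print(np.allclose(D @ B, np.eye(3)))    # True: phi_i(v_j) = delta_ij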

Remark: The student may use the idea in the proof of [110](i) to show that dim(V ∗) = ∞ when V is an infinite dimensional vector space.

[111] [Representation of vector subspaces] Let V be an n-dimensional vector space over a field F . (i) If ϕ ∈ V∗ \ {0}, then ϕ is surjective and dim(ker(ϕ)) = n − 1. (ii) If W ⊂ V is a k-dimensional vector subspace of V , then there are n − k linearly independent functionals ϕk+1, . . . , ϕn ∈ V∗ such that W = ∩_{i=k+1}^{n} ker(ϕi).

(iii) If Γ ⊂ V∗ is a k-dimensional vector subspace of V∗, then there are n − k linearly independent vectors vk+1, . . . , vn ∈ V such that Γ = {ϕ ∈ V∗ : ϕ(vj) = 0 for k + 1 ≤ j ≤ n}. (iv) Suppose V = F^n. If W ⊂ F^n is a k-dimensional vector subspace, then W can be realized as the solution space of a homogeneous system of n − k linear equations in n variables over F .

Proof. (i) To see ϕ is surjective, note that ϕ(V ) is a vector subspace of F , and the only vector subspaces of F are {0} and F . We obtain dim(ker(ϕ)) = n − 1 by the Nullity-rank theorem.

(ii) Extend a basis {v1, . . . , vk} of W to a basis {vj : 1 ≤ j ≤ n} of V and let {ϕi : 1 ≤ i ≤ n} be the corresponding dual basis of V∗ given by [110]. Verify that W = ∩_{i=k+1}^{n} ker(ϕi).

(iii) Extend a basis {ϕ1, . . . , ϕk} of Γ to a basis {ϕi : 1 ≤ i ≤ n} of V∗, and let {λvj : 1 ≤ j ≤ n} be the dual basis of V∗∗. Then, as in (ii), Γ = ∩_{j=k+1}^{n} ker(λvj) = {ϕ ∈ V∗ : ϕ(vj) = 0 for k + 1 ≤ j ≤ n}.

(iv) Write W = ∩_{i=k+1}^{n} ker(ϕi) with ϕi ∈ (F^n)∗. By Exercise-14, for each i, there is (ai1, . . . , ain) ∈ F^n such that ϕi(c1, . . . , cn) = ∑_{j=1}^{n} aijcj. Hence W is the solution space of the following homogeneous system of n − k linear equations in n variables: ai1x1 + ··· + ainxn = 0 for k + 1 ≤ i ≤ n.
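Over R, statement (iv) can be realized concretely using the inner product as a shortcut for the dual-basis argument (a numpy/scipy sketch, not part of the notes): the rows of a basis of the orthogonal complement of W supply the required homogeneous equations.

import numpy as np
from scipy.linalg import null_space

Bw = np.array([[1., 0.],
               [1., 1.],
               [1., 2.]])          # columns span a 2-dimensional subspace W of R^3
A = null_space(Bw.T).T             # each row a of A satisfies a . w = 0 for every w in W
print(A.shape)                     # (1, 3): n - k = 3 - 2 = 1 equation
print(np.allclose(A @ Bw, 0))      # True: W lies in the solution space of Ax = 0
print(np.linalg.matrix_rank(A))    # 1, so that solution space is 2-dimensional and hence equals W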

[112] [Span of linear functionals] (i) Let V be a finite dimensional vector space and let ψ, ϕ1, . . . , ϕk ∈ V∗. Then we have ψ ∈ span{ϕi : 1 ≤ i ≤ k} iff ∩_{i=1}^{k} ker(ϕi) ⊂ ker(ψ). (ii) The assertion in part (i) is true even if V is infinite dimensional.

Proof. Easy to see that ‘⇒’ is true irrespective of the dimension of V .

(i) ⇐: Let n = dim(V ). Assume without loss of generality that the ϕi's are linearly independent, and let Γ = span{ϕi : 1 ≤ i ≤ k}. By [111](iii), there are vk+1, . . . , vn ∈ V such that Γ = {ϕ ∈ V∗ : ϕ(vj) = 0 for k + 1 ≤ j ≤ n}. Now {vj : k + 1 ≤ j ≤ n} ⊂ ∩_{i=1}^{k} ker(ϕi) ⊂ ker(ψ) and hence ψ ∈ Γ.

(ii) ⇐: The following proof applies to the finite dimensional case also, but may not be as intuitive as the one given above. We argue by induction on k. Let k = 1. If ϕ1 = 0, then ψ = 0 and we are done. If ϕ1 ≠ 0, choose u ∈ V with ϕ1(u) = 1, and note v − ϕ1(v)u ∈ ker(ϕ1) ⊂ ker(ψ) so that

ψ(v) = cϕ1(v) for every v ∈ V , where c = ψ(u). Now assume the result for values up to k and consider ψ, ϕ1, . . . , ϕk+1 ∈ V∗. Let W = ker(ϕk+1). Then ∩_{i=1}^{k} ker(ϕi|W ) ⊂ ker(ψ|W ) and hence ψ = ∑_{i=1}^{k} ciϕi on W by the induction assumption. Therefore W = ker(ϕk+1) ⊂ ker(ψ − ∑_{i=1}^{k} ciϕi). By the initial step of induction, there is a scalar ck+1 such that ψ − ∑_{i=1}^{k} ciϕi = ck+1ϕk+1.

Definition: (i) If A is an m × n matrix over a field F , then the matrix A^t of order n × m is defined by the condition that the ijth entry of A^t is the jith entry of A. If T : V → W is a linear operator of vector spaces, then the transpose linear operator T^t : W∗ → V∗ is defined as

T^t(ϕ) = ϕ ◦ T for every ϕ ∈ W∗. Later, when we represent T by a matrix A, we will show that T^t is represented by the matrix A^t. Note the reversal rule regarding composition, (S ◦ T )^t = T^t ◦ S^t.

(ii) If W is a subset of a finite dimensional vector space V , then the annihilator of W in V∗ is defined as the vector subspace W^(0) = {ϕ ∈ V∗ : ϕ(v) = 0 for every v ∈ W } = ∩_{v∈W} ker(λv), where λv ∈ V∗∗ is the evaluation map λv(ϕ) = ϕ(v).

(iii) If V is a finite dimensional vector space and Γ ⊂ V∗, the annihilator of Γ in V is defined as the vector subspace Γ_(0) = {v ∈ V : ϕ(v) = 0 for every ϕ ∈ Γ} = ∩_{ϕ∈Γ} ker(ϕ). Note that Γ_(0) is the isomorphic image of Γ^(0) ⊂ V∗∗ under the natural isomorphism between V∗∗ and V .

Exercise-17: Let V be an n-dimensional vector space, and W ⊂ V ,Γ ⊂ V ∗ be vector subspaces.

(i) Extend a basis {v1, . . . , vk} of W to a basis {v1, . . . , vn} of V . Let {ϕ1, . . . , ϕn} be the dual basis of V∗. Then W^(0) = span{ϕk+1, . . . , ϕn}. Hence n = dim(W ) + dim(W^(0)).
(ii) Extend a basis {ϕ1, . . . , ϕk} of Γ to a basis {ϕ1, . . . , ϕn} of V∗. Let {v1, . . . , vn} be the basis of

V satisfying ϕi(vj) = δij. Then Γ_(0) = span{vk+1, . . . , vn}. Hence n = dim(Γ) + dim(Γ_(0)).
(iii) (W^(0))_(0) = W and (Γ_(0))^(0) = Γ.
(iv) Let T ∈ L(V, V ). Then T (W ) ⊂ W iff T^t(W^(0)) ⊂ W^(0); and T^t(Γ) ⊂ Γ iff T (Γ_(0)) ⊂ Γ_(0).
[Hint: (i) ψ ∈ W^(0) ⇔ ∩_{i=k+1}^{n} ker(ϕi) = W ⊂ ker(ψ) ⇔ ψ ∈ span{ϕk+1, . . . , ϕn} by [111], [112].]

[113] Let T : V → W be a linear operator of finite dimensional vector spaces. Then, (i) range(T )^(0) = ker(T^t). (ii) rank(T^t) = rank(T ). (iii) ker(T )^(0) = range(T^t).

Proof. (i) For ψ ∈ W∗, ψ ∈ range(T )^(0) ⇔ T^t(ψ)(V ) = ψ(T (V )) = {0} ⇔ ψ ∈ ker(T^t).

(ii) n − rank(T^t) = dim(ker(T^t)) = dim(range(T )^(0)) = n − dim(range(T )) = n − rank(T ), where n = dim(W ) and we have used the Nullity-rank theorem, [113](i), and Exercise-17(i) respectively.

(iii) It is easy to see that range(T^t) ⊂ ker(T )^(0). We get equality by the following dimension argument: dim(range(T^t)) = rank(T^t) = rank(T ) = dim(V ) − dim(ker(T )) = dim(ker(T )^(0)).
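Part (ii) is the matrix statement 'row-rank = column-rank' (see Section 6); a one-line numerical check (numpy, not from the notes):

import numpy as np

A = np.arange(12.0).reshape(3, 4)                                # an arbitrary 3 x 4 matrix
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))    # True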

6. Linear operators = matrices

We will see that linear operators between finite dimensional vector spaces are represented by matrices, and vice versa. Roughly speaking, the required matrix is found as follows: the jth column of the matrix is made up of the coefficients of the image of the jth basis vector of the domain. If F is a field, let M(m × n, F ) denote the collection of all m × n matrices over F . Note that M(m × n, F ) is a vector space over F with respect to matrix addition and scalar multiplication defined as c[aij] = [caij] for A = [aij] ∈ M(m × n, F ). Identifying an m × n matrix with an element of F^{mn}, we see that dim(M(m × n, F )) = mn. We will write M(n, F ) for M(n × n, F ).

Definition: Let V,W be finite dimensional vector spaces over a field F , and T ∈ L(V,W ). Let

U = {v1, . . . , vn} be a basis of V and Z = {w1, . . . , wm} be a basis of W . Let aij ∈ F be chosen so that T (vj) = ∑_{i=1}^{m} aijwi. Then we say A = [aij] ∈ M(m × n, F ) is the matrix of T w.r.to the bases U of V and Z of W . Occasionally we may write this briefly as: A is the matrix of T : (V, U) → (W, Z). Especially note that the order of A is m × n and not n × m. When V = W and U = Z, we will just say A is the matrix of T w.r.to the basis U.

Example: Fix θ ∈ R and let T : R^2 → R^2 be the anticlockwise rotation by angle θ around the origin. Verify geometrically that T is indeed a linear operator. Analytically, T (1, 0) = (cos θ, sin θ) and T (0, 1) = (cos(θ + π/2), sin(θ + π/2)) = (− sin θ, cos θ). Hence the matrix of T with respect to the standard basis {(1, 0), (0, 1)} is [cos θ  − sin θ ; sin θ  cos θ].

Example: Let T ∈ L(R^3, R^2) be T (x, y, z) = (2x + 4y + 8z, 6x − 9z). Then the matrix of T with respect to the standard bases on R^3 and R^2 is [2 4 8 ; 6 0 −9]. On the other hand, if we use the basis {(1, 0, 0), (1, 1, 0), (1, 1, 1)} on R^3 and the basis {(2, 0), (0, 3)} on R^2, then T (1, 0, 0) = (2, 6) = (2, 0) + 2(0, 3), T (1, 1, 0) = (6, 6) = 3(2, 0) + 2(0, 3), and T (1, 1, 1) = (14, −3) = 7(2, 0) − (0, 3) so that the matrix of T becomes [1 3 7 ; 2 2 −1].
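The second example can be reproduced mechanically (a numpy sketch, not from the notes): the jth column of the new matrix consists of the coordinates of T(vj) in the basis {(2, 0), (0, 3)}, obtained by solving a linear system.

import numpy as np

S = np.array([[2., 4., 8.],
              [6., 0., -9.]])          # matrix of T in the standard bases
V_basis = np.array([[1., 1., 1.],
                    [0., 1., 1.],
                    [0., 0., 1.]])     # columns: (1,0,0), (1,1,0), (1,1,1)
W_basis = np.array([[2., 0.],
                    [0., 3.]])         # columns: (2,0), (0,3)

# column j of B holds the coordinates of T(v_j) w.r.to the W-basis: W_basis @ B = S @ V_basis
B = np.linalg.solve(W_basis, S @ V_basis)
print(B)                               # [[ 1.  3.  7.]
                                       #  [ 2.  2. -1.]]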

Example: Let {v1, . . . , vn} be a basis for a vector space V and let T ∈ L(V,V ) be an isomorphism.

If wj = T (vj), then {w1, . . . , wn} is a basis for V , and the matrix of T with respect to the bases

{v1, . . . , vn} and {w1, . . . , wn}, on the domain and range of T respectively, is the identity matrix.

[114] [Linear operators = matrices] Let F be a field. Then, (i) β : L(F^n, F^m) → M(m × n, F ) given by β(T ) = A = [aij] ∈ M(m × n, F ), where the aij's are chosen with T (ej) = (a1j, . . . , amj) ∈ F^m for 1 ≤ j ≤ n, is an isomorphism of vector spaces.

(ii) More generally, consider finite dimensional vector spaces V, W over F . Let {v1, . . . , vn} be a basis for V and {w1, . . . , wm} be a basis for W . Then, β : L(V, W ) → M(m × n, F ) given by β(T ) = A = [aij] ∈ M(m × n, F ), where the aij's are defined by the condition T (vj) = ∑_{i=1}^{m} aijwi ∈ W for 1 ≤ j ≤ n, is an isomorphism of vector spaces.

Proof. (i) Easy to check β is linear. Now, T ∈ ker(β) iff T (ej) = 0 for 1 ≤ j ≤ n iff T = 0, and thus β is injective. Since dim(L(F^n, F^m)) = mn = dim(M(m × n, F )), β is surjective by [108].

[115] [Matrix of transpose is transpose of matrix] Let V,W be finite dimensional vector spaces over a field F , and let T ∈ L(V,W ). Suppose A ∈ M(m × n, F ) is the matrix of T w.r.to the bases

{vj : 1 ≤ j ≤ n} of V and {wi : 1 ≤ i ≤ m} of W . Also suppose B ∈ M(n × m, F ) is the matrix of the transpose operator T^t : W∗ → V∗ w.r.to the corresponding dual bases {ϕr : 1 ≤ r ≤ n} of V∗ and {ψs : 1 ≤ s ≤ m} of W∗. Then B = A^t.

Proof. Let A = [aij], B = [brs]. We have T^t(ψs)(vj) = ψs(T (vj)) = ψs(∑_{i=1}^{m} aijwi) = asj. Hence for an arbitrary v = ∑_{r=1}^{n} ϕr(v)vr ∈ V , we get T^t(ψs)(v) = ∑_{r=1}^{n} ϕr(v)asr = ∑_{r=1}^{n} asrϕr(v), and therefore T^t(ψs) = ∑_{r=1}^{n} asrϕr. We also have T^t(ψs) = ∑_{r=1}^{n} brsϕr so that brs = asr.

Recall that if A = [aij] ∈ M(m × n, F ) and B = [bjk] ∈ M(n × q, F ), then the ikth entry of the product matrix AB ∈ M(m × q, F ) is given by (AB)ik = ∑_{j=1}^{n} aijbjk.

[116] [Matrix of product is product of matrices] Consider (U, U1) −(T,B)→ (V, V1) −(S,A)→ (W, W1). Assume that U, V, W are finite dimensional vector spaces over a field F with respective bases U1 = {uk :

1 ≤ k ≤ q}, V1 = {vj : 1 ≤ j ≤ n}, and W1 = {wi : 1 ≤ i ≤ m}. Let B ∈ M(n × q, F ),

A ∈ M(m × n, F ) respectively be the matrices of the linear operators T :(U, U1) → (V,V1) and

S : (V, V1) → (W, W1). Then the matrix of ST : (U, U1) → (W, W1) is AB.

Proof. Let B = [bjk], A = [aij]. Let C = [cik] be the matrix of ST . Then ST (uk) = ∑_{i=1}^{m} cikwi. Also, ST (uk) = S(∑_{j=1}^{n} bjkvj) = ∑_{j=1}^{n} bjkS(vj) = ∑_{j=1}^{n} bjk(∑_{i=1}^{m} aijwi) = ∑_{i=1}^{m} (∑_{j=1}^{n} aijbjk)wi = ∑_{i=1}^{m} (AB)ikwi. Thus C = AB by the uniqueness of the matrix representing ST .

Remark: Let V be a finite dimensional vector space, and A be the matrix of T ∈ L(V,V ) with respect to a basis U of V . Then it follows easily from [116] that T is an isomorphism (i.e., T is invertible) iff A is invertible (i.e., AB = I = BA for some matrix B).

Definitions and Remarks: Consider A ∈ M(m × n, F ). (i) Then, A induces a linear operator TA : F^n → F^m by the condition that the matrix of TA w.r.to the standard bases of F^n and F^m is A. In other words, if uj ∈ F^m is the jth column of A, then TA(ej) := uj for 1 ≤ j ≤ n.

(ii) Let Av := TA(v) ∈ F^m, when v ∈ F^n. If v = (c1, . . . , cn) ∈ F^n, and u1, . . . , un ∈ F^m are the columns of A, then Av = TA(∑_{j=1}^{n} cjej) = ∑_{j=1}^{n} cjuj, which is a linear combination of the columns of A. So, the columns of A are linearly dependent ⇔ there is v ∈ F^n \ {0} with Av = 0.

(iii) Let ker(A) := {v ∈ F^n : Av = 0} ⊂ F^n and range(A) := {Av : v ∈ F^n} ⊂ F^m.

(iv) The column-space of A is the vector subspace of F^m spanned by the n columns of A, and is equal to range(A) ⊂ F^m. The column-rank of A is the number rank(TA) = dim(range(A)).

(v) The row-space of A is the vector subspace of F^n spanned by the m rows of A, and is equal to range(A^t) = range(TA^t) ⊂ F^n since the ith row of A is TA^t(ei) = A^t ei. The row-rank of A is the number rank(TA^t) = dim(range(A^t)).

(vi) The column-rank of A coincides with the row-rank of A by [113] and [115], and this common number is the rank of A, denoted as rank(A). A direct argument showing that the column-rank of A is equal to the row-rank of A is as follows. Assume A ≠ 0 and let w1, . . . , wn ∈ F^m be the columns of A. Let r be the column-rank of A and let 1 ≤ j(1) < ··· < j(r) ≤ n be so that wj(1), . . . , wj(r) are linearly independent. Let P ∈ M(m × r, F ) be the matrix with columns wj(1), . . . , wj(r). Since any column of A is a linear combination of wj(1), . . . , wj(r), there are u1, . . . , un ∈ F^r such that wj = P uj for 1 ≤ j ≤ n. Let Q ∈ M(r × n, F ) be the matrix with columns u1, . . . , un so that now we have A = PQ. Observe that (ai1, . . . , ain) = (∑_{k=1}^{r} pikqk1, . . . , ∑_{k=1}^{r} pikqkn) = ∑_{k=1}^{r} pik(qk1, . . . , qkn), which says that the ith row of A is a linear combination of the rows of Q. Hence the row-rank of A is ≤ the number of rows of Q, which is r. Replacing A with A^t, we get the other inequality also.

(vii) Suppose A = PQ. Then the ith row of A is (as noted above) a linear combination of the rows of Q with the coefficients coming from the ith row of P , and similarly the jth column of A is a linear combination of the columns of P with the coefficients coming from the jth column of Q.

(viii) Let null(A) := dim(ker(A)). If A = [1 3 ; 2 6] ∈ M(2, R), then the column-space of A is the line y = 2x, the row-space of A is the line y = 3x, ker(A) is the line y = −(1/3)x (all in R^2), and rank(A) = 1 = null(A).
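The factorization A = PQ used in (vi) is easy to produce numerically (a numpy sketch, not from the notes): pick a maximal independent set of columns of A to form P, then solve for the coefficient matrix Q.

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 1., 1.]])
r = np.linalg.matrix_rank(A)                         # 2

cols = []                                            # greedily select r independent columns of A
for j in range(A.shape[1]):
    if np.linalg.matrix_rank(A[:, cols + [j]]) > len(cols):
        cols.append(j)
P = A[:, cols]                                       # 3 x r matrix of chosen columns
Q = np.linalg.lstsq(P, A, rcond=None)[0]             # r x 3 coefficients expressing all columns of A
print(cols)                                          # [0, 1]
print(np.allclose(P @ Q, A))                         # True: A = PQ with P of full column-rank r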

Exercise-18: Let F be a field, k ≥ 2, and A1, . . . , Ak ∈ M(n, F ). (i) [One-sided inverse is full inverse] If A1A2 = I, then A1, A2 are invertible and A1^{-1} = A2.

(ii) If the product A1 ··· Ak is invertible, then each Aj is invertible. [Hint: (i) Considering the corresponding linear operators, Id = TA1A2 = TA1 ◦ TA2, which shows

TA1 is surjective and TA2 is injective. Now use [108]. For (ii), use part (i) and induction.]

Exercise-19: Let V, W be vector spaces of dimensions n, m respectively, over a field F . Let T ∈ L(V, W ), and assume A ∈ M(m × n, F ) represents T with respect to some bases for V, W . Then, (i) T is injective ⇔ the columns of A are linearly independent. (ii) T is surjective ⇔ the column space of A is equal to F^m. (iii) T^t : W∗ → V∗ is injective ⇔ the rows of A are linearly independent. (iv) T^t : W∗ → V∗ is surjective ⇔ the row space of A is equal to F^n. (v) T is injective ⇔ T^t is surjective; and T is surjective ⇔ T^t is injective.

(vi) Suppose m = n. Then, T is an isomorphism ⇔ the columns of A form a basis for F^n ⇔ the rows of A form a basis for F^n ⇔ A is invertible ⇔ rank(A) = n. [Hint: For (v), use [113](ii) and [107]; or (i)-(iv) and the fact that column-rank = row-rank.]

Example: Let F be a field and let w1, . . . , wn ∈ F^n be the columns of an invertible matrix A ∈ M(n, F ). Then the matrix of the identity operator I : F^n → F^n w.r.to the bases {w1, . . . , wn},

{e1, . . . , en} on the domain and range respectively, is A.

7. Multiplying a matrix left and right

Consider A ∈ M(m × n, F ). The underlying theme of this section is the following: operations done on the Rows of A, or equivalently on the m-dimensional space F^m, will result in multiplying A on the left by a matrix R ∈ M(m, F ), and operations done on the Columns of A, or equivalently on the n-dimensional space F^n, will result in multiplying A on the right by a matrix C ∈ M(n, F ). A mnemonic for this is 'RAC', which should be familiar to anybody using Indian railways.

[117] [Change of basis vs change of matrix] Consider (V, U2) −(I,C)→ (V, U1) −(T,A)→ (W, Z1) −(I,R)→ (W, Z2).

Let V,W be finite dimensional vector spaces over a field F , and T ∈ L(V,W ). Let U1,U2 be bases of V , and let Z1,Z2 be bases of W . Let A ∈ M(m × n, F ) be the matrix of T :(V,U1) → (W, Z1), let C ∈ M(n, F ) be the matrix of I :(V,U2) → (V,U1), and let R ∈ M(m, F ) be the matrix of

I : (W, Z1) → (W, Z2). If B ∈ M(m × n, F ) is the matrix of T : (V, U2) → (W, Z2), then B = RAC. Moreover, if V = W , U1 = Z1, and U2 = Z2, then R = C^{-1} so that B = C^{-1}AC.

Proof. Direct application of [116]; the matrix of a product is the product of matrices. 

Remark: Given A, U1,U2,Z1,Z2 as above, we can determine R,C and hence B. The student may construct and work out some examples to grasp this (see the examples preceding [114]).

Remark: Let U = {v1, . . . , vn}, Z = {z1, . . . , zn} be bases of a vector space V . Fix v ∈ V and suppose v = ∑_{j=1}^{n} cjvj = ∑_{j=1}^{n} djzj are the expansions. If we know the cj's, can we determine the dj's? Choose aij's with vj = ∑_{i=1}^{n} aijzi. Then v = ∑_{j=1}^{n} cjvj = ∑_{j=1}^{n} cj(∑_{i=1}^{n} aijzi) = ∑_{i=1}^{n} (∑_{j=1}^{n} aijcj)zi so that di = ∑_{j=1}^{n} aijcj. In matrix language, if A = [aij] is the matrix of the identity operator I : (V, U) → (V, Z), then the column with entries d1, . . . , dn equals A times the column with entries c1, . . . , cn.

Exercise-20: Let F be a field and A, B ∈ M(m × n, F ). Then, (i) The column-spaces of A, B are the same iff ∃ an invertible matrix C ∈ M(n, F ) with B = AC. (ii) The row-spaces of A, B are the same iff ∃ an invertible matrix R ∈ M(m, F ) with B = RA.

[Hint: For (i), Use [109] and [114]. For (ii), apply part (i) to At,Bt, and take transpose.]
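To see the change-of-coordinates rule of the Remark above in action, here is a small numerical sketch (numpy; the two bases of R^2 and the vector are hypothetical choices, not from the text):

import numpy as np

# Hypothetical bases of R^2, written as columns: U = {v1, v2}, Z = {z1, z2}.
Umat = np.array([[1.0, 1.0],
                 [0.0, 1.0]])          # columns v1, v2
Zmat = np.array([[1.0, 0.0],
                 [1.0, 1.0]])          # columns z1, z2

# A = [a_ij] with v_j = sum_i a_ij z_i, i.e. Umat = Zmat A.
A = np.linalg.solve(Zmat, Umat)

c = np.array([2.0, -3.0])              # coefficients of v w.r.to U
v = Umat @ c                           # the vector v itself
d = A @ c                              # predicted coefficients w.r.to Z

print(np.allclose(Zmat @ d, v))        # True: d_i = sum_j a_ij c_j recovers v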

The next two exercises are about the rank of a product.

Exercise-21: Let F be a field, A ∈ M(m × n, F ) and B ∈ M(n × q, F ). Then, (i) [Upper bound for the rank of a product] rank(AB) ≤ min{rank(A), rank(B)}. (ii) rank(AB) = rank(A) ⇔ AB and A have the same column-space ⇔ there is C ∈ M(q × n, F ) with ABC = A. (iii) rank(AB) = rank(B) ⇔ AB and B have the same row-space ⇔ there is R ∈ M(n × m, F ) with RAB = B.

(iv) rank(B) = n ⇔ ∃ C ∈ M(q × n, F ) with BC = In (here C is called a right inverse of B).

(v) rank(A) = n ⇔ ∃ R ∈ M(n × m, F) with RA = In (here R is called a left inverse of A).

[Hint: (i) Interpret in terms of T_A, T_B and use the fact [104](iii) saying that a linear operator cannot increase dimension. (ii) If the middle assertion holds, then the columns w1, . . . , wn ∈ F^m of A belong to range(AB). Let vj ∈ F^q be with ABvj = wj, and C be the q × n matrix with columns v1, . . . , vn. Conversely if ABC = A, then range(AB) ⊂ range(A) = range(ABC) ⊂ range(AB), which implies the middle assertion. For (iii), take transpose and use part (ii). For (iv), apply (ii) with m = n and A = In. For (v), apply (iii) with n = q and B = In.]

Exercise-22: [Lower bound for the rank of a product] (i) Let U, V, W be finite dimensional vector spaces over the same field, and S ∈ L(V,W ), T ∈ L(U, V ). Then null(ST ) ≤ null(S) + null(T ), or equivalently, rank(ST ) ≥ rank(S) + rank(T ) − dim(V ). (ii) [Matrix version] If A ∈ M(m×n, F ) and B ∈ M(n×q, F ), then null(AB) ≤ null(A)+null(B); equivalently, rank(AB) ≥ rank(A) + rank(B) − n. [Hint: (i) Note that ker(ST ) = {u ∈ U : T (u) ∈ ker(S)}. Applying Nullity-rank theorem to

T |ker(ST ) : ker(ST ) → ker(S), we get null(ST ) ≤ null(T ) + null(S).]

Various notions of equivalence can be defined on the space of matrices. The student is encouraged to construct and work out some examples pertaining to the following notions.

Definitions: Let F be a field and A ∈ M(m × n, F). (i) An elementary row operation on A is one of the following: (type-1) interchange two rows of A, (type-2) multiply a row of A with a non-zero scalar, or (type-3) add a non-zero scalar multiple of a row of A to another row. Note that for j = 1, 2, 3, the inverse operation of a type-j row operation is again a type-j row operation. Matrices A, B ∈ M(m × n, F) are row equivalent if A can be transformed to B by a finite sequence of elementary row operations. This is an equivalence relation on M(m × n, F).

(ii) Analogously we can define elementary column operations. Consider A, B ∈ M(m × n, F ). We say A, B are column equivalent if A can be transformed to B by a finite sequence of elementary column operations. We say A, B are matrix equivalent if A can be transformed to B by a finite sequence of elementary row and column operations. These are equivalence relations on M(m×n, F ).

(iii) Let m = n. We say A, B ∈ M(n, F) are similar or conjugate if there is an invertible matrix C ∈ M(n, F) with B = C^{-1}AC. If A = [1 1; 0 1], B = [1 b; 0 1] (where b ≠ 0), and C = [1 0; 0 b], then C^{-1}AC = B. Being conjugate is an equivalence relation on M(n, F).

(iv) Consider I ∈ M(n, F). For j = 1, 2, 3, a matrix obtained by applying a type-j row operation once on I is called a type-j elementary matrix, denoted respectively as E(i, k), E(c ∗ i) and E(i + c ∗ k), where i, k are the row numbers and c is a non-zero scalar. A finite product of elementary matrices will be called a Π-elementary matrix.

Exercise-23: (i) Elementary matrices are invertible. More precisely, E(i, k)−1 = E(i, k), E(c∗i)−1 = E(c−1 ∗ i), and E(i + c ∗ k)−1 = E(i − c ∗ k). Hence Π-elementary matrices are also invertible. (ii) If E is an elementary matrix, and if A is a matrix for which EA is defined, then multiplying A on the left by E implements the corresponding row operation on A.
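As a quick numerical illustration of Exercise-23, the following sketch (numpy; the 3 × 3 matrix is a hypothetical example) checks that left multiplication by a type-3 elementary matrix performs the corresponding row operation, and that its inverse is the elementary matrix predicted in part (i):

import numpy as np

A = np.array([[2.0, 4.0, 4.0],
              [3.0, 8.0, 0.0],
              [4.0, 9.0, 4.0]])

# E(2 + (-3)*1): add (-3) times row 1 to row 2, applied once to the identity.
E = np.eye(3)
E[1, 0] = -3.0

B = E @ A                          # multiply on the left by E
C = A.copy()
C[1, :] += -3.0 * C[0, :]          # perform the row operation directly on A
print(np.allclose(B, C))           # True: both give the same matrix

E_inv = np.eye(3)
E_inv[1, 0] = 3.0                  # E(2 + 3*1)
print(np.allclose(np.linalg.inv(E), E_inv))   # True, as in Exercise-23(i)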

For future use, we note down the following fact about block matrices.

Exercise-24: For matrices over the same field, when the products are defined, we have (writing [A B] for a horizontal splitting into blocks and [P; Q] for a vertical one, P stacked above Q):
P[A B] = [PA PB];  [P; Q]A = [PA; QA];  [P; Q][A B] = [PA PB; QA QB];  and [A B][P; Q] = AP + BQ.
[Hint: In a horizontal splitting into two blocks, the number of rows does not change. In a vertical splitting into two blocks, the number of columns does not change.]

Consider AX = Y, a system of m linear equations in n variables over a field F. Observe that if R ∈ M(m, F) is an invertible matrix, then X satisfies AX = Y iff X satisfies RAX = RY. Also, a system of linear equations is easy to solve if its form is “somewhat triangular”. Hence, to solve AX = Y, we look for an invertible matrix R such that (i) the products RA and RY can be computed easily, and (ii) RA is “somewhat triangular”; or more precisely, in an echelon form, as defined below (an approximate pronunciation of the French word échelon is esh-lon).

Definition: Let F be a field. We say H = [hij] ∈ M(m × n, F) is in row echelon form if either H = 0 or if there is r ∈ {1, . . . , m} with the following properties:

(i) There are integers 1 ≤ j(1) < ··· < j(r) ≤ n with hij(i) = 1 for 1 ≤ i ≤ r.

(ii) hij = 0 for 1 ≤ i ≤ r and 1 ≤ j < j(i).

(iii) hij = 0 for r < i ≤ m and 1 ≤ j ≤ n. If in addition, the following holds, then H is said to be in reduced row echelon form.

(iv) hkj(i) = 0 for 1 ≤ i ≤ r and k ≠ i. (Here, observe that rank(H) = r.)

Examples (the first matrix is in reduced row echelon form; the second is in row echelon form, but not in reduced row echelon form; the third is in neither):
[0 1 3 0 0 −4; 0 0 0 0 1 2; 0 0 0 0 0 0],   [0 1 3 0 4; 0 0 1 0 −2; 0 0 0 0 0],   [0 0 0 1 2; 0 1 3 0 4; 0 0 0 0 0].

[118] Let F be a field and A ∈ M(m × n, F). Then there are R, R̃ ∈ M(m, F) such that
(i) R is a Π-elementary matrix, RA is in row echelon form, and rank(RA) = rank(A).
(ii) R̃ is a Π-elementary matrix, R̃A is in reduced row echelon form, and rank(R̃A) = rank(A).

Proof. The assertion about the rank follows automatically from the invertibility of R, R̃.

Step-1: Let A = [a_ij]. Fix i, j. We claim that if a_ij ≠ 0, then there is a Π-elementary matrix R ∈ M(m, F) such that the jth column of RA is the vector e_i. Proof: Multiply the ith row by a_ij^{-1}. Then subtract a_kj times the ith row from the kth row for each k ∈ {1, . . . , m} \ {i} with a_kj ≠ 0. Encode these elementary row operations by elementary matrices and take R to be their product.

Step-2 : Now we consider proving [118]. In view of step-1, it suffices to prove assertion (i), and for this we use induction on n, the number of columns of A. The case n = 1 can be managed by an interchange of two rows (if necessary) and step-1. Next, assume the result for values up to n, and consider a matrix A ∈ M(m × (n + 1),F ). There are two cases to consider. Case-1 : The first column of A is zero. Then write A = [0 B] where B ∈ M(m × n, F ). By induction assumption there is a Π-elementary matrix R ∈ M(m, F ) such that RB is in row echelon form. Then RA = [0 RB] is also in row echelon form.

Case-2: The first column of A is non-zero. Let i be the smallest index with a_i1 ≠ 0. Interchanging the first and the ith row and applying Step-1, we get a Π-elementary matrix R1 ∈ M(m, F) such that R1A = [1 ∗; 0 B], where B ∈ M((m − 1) × n, F). By the induction assumption there is a Π-elementary matrix R2 ∈ M(m − 1, F) such that R2B is in row echelon form. Let R3 = [1 0; 0 R2] ∈ M(m, F) and R = R3R1. Then R is Π-elementary, and RA = [1 ∗; 0 R2B] is in row echelon form. □

Exercise-25: [Uniqueness] Reduced row echelon form of a matrix is unique, and hence we may say the reduced row echelon form. [Hint: For an elementary proof by Thomas Yuster (by induction on the number of columns), see p.93-94 of Mathematics Magazine, 57(2), (1984).]

Remark: The reduced row echelon form H = RA of a matrix A is not conjugate to A in general since we perform only the left multiplication to get H from A.

[119] [Invertible matrices are Π-elementary] Let F be a field, and A ∈ M(n, F) be invertible. Then,
(i) A is Π-elementary.
(ii) [Procedure to find A^{-1}] By elementary row operations, transform [A I] ∈ M(n × 2n, F) to its reduced row echelon form [Q R], where Q, R ∈ M(n, F). Then Q = I and R = A^{-1}.

Proof. By [118](ii), there is a Π-elementary matrix R ∈ M(n, F) such that R[A I] = [RA R] is in reduced row echelon form. Then RA ∈ M(n, F) is also in reduced row echelon form. Since A is invertible, we also have rank(RA) = n, and hence RA = I. Thus R = A^{-1}, so that A = R^{-1} is Π-elementary and R[A I] = [I A^{-1}], proving both assertions. □
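A sketch of the procedure in [119](ii) using sympy (rref and row_join are real sympy methods; the 3 × 3 matrix is just an example):

from sympy import Matrix, eye

A = Matrix([[2, 4, 4], [3, 8, 0], [4, 9, 4]])
aug = A.row_join(eye(3))          # the block matrix [A I]
R, pivots = aug.rref()            # reduced row echelon form, which is [I A^{-1}]

Ainv = R[:, 3:]
print(Ainv == A.inv())            # True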

We clarify below the relations among some of the concepts introduced; in what follows, F denotes a field.

[120] [Fundamental theorem of rows] For A, B ∈ M(m × n, F), the following are equivalent:
(i) There is an invertible matrix R ∈ M(m, F) with B = RA.
(ii) The reduced row echelon forms of A and B are the same.
(iii) ker(A) = ker(B).
(iv) A and B have the same row-space, i.e., range(A^t) = range(B^t).
(v) A is row equivalent to B.
(vi) There are T ∈ L(F^n, F^m), a basis U of F^n, and bases Z1, Z2 of F^m such that A is the matrix of T : (F^n, U) → (F^m, Z1) and B is the matrix of T : (F^n, U) → (F^m, Z2).

Proof. (i) ⇒ (ii): Note that R is Π-elementary by [119]. Now, if H = R1B, then H = R1RA.

(ii) ⇒ (iii): Suppose H = R1A = R2B is the reduced row echelon form. Since R1,R2 are invertible, we have ker(A) = ker(R1A) = ker(H) = ker(R2B) = ker(B).

(iii) ⇒ (iv): Let {ϕ1, . . . , ϕn} be the basis of (F^n)^∗ dual to {e1, . . . , en} and α : (F^n)^∗ → F^n be the isomorphism sending ϕj to ej. Note by [115] that α(range(T_A^t)) = range(A^t) and α(range(T_B^t)) = range(B^t). Now range(T_A^t) = ker(A)^0 = ker(B)^0 = range(T_B^t) by [113].

(iv) ⇒ (v): Apply Exercise-20(ii) and [119].

(v) ⇒ (vi): By assumption, we get a Π-elementary (hence invertible) matrix R ∈ M(m, F) with B = RA. Let U, Z1 respectively be the standard bases of F^n, F^m, so that the matrix of T_A w.r.to U, Z1 is A. If Z2 = R^{-1}(Z1) = {R^{-1}e_i : 1 ≤ i ≤ m} ⊂ F^m, then R is the matrix of I : (F^m, Z1) → (F^m, Z2) (see the Example at the end of section 6), so that B = RA is also the matrix of I ◦ T_A : (F^n, U) → (F^m, Z2) by [116].

(vi) ⇒ (i): Let R ∈ M(m, F) be the matrix of I : (F^m, Z1) → (F^m, Z2). Now, T : (F^n, U) → (F^m, Z2) is equal to the composition of T : (F^n, U) → (F^m, Z1) followed by I : (F^m, Z1) → (F^m, Z2), and therefore B = RA by [116]. □

A matrix is in (reduced) column echelon form if its transpose is in (reduced) row echelon form. Analogous to [120], we have the following two results, whose proofs are left to the student.

[120′] [Fundamental theorem of columns] For A, B ∈ M(m × n, F), the following are equivalent:
(i) There is an invertible matrix C ∈ M(n, F) with B = AC.
(ii) The reduced column echelon forms of A and B are the same.
(iii) ker(A^t) = ker(B^t).
(iv) A and B have the same column-space, i.e., range(A) = range(B).
(v) A is column equivalent to B.
(vi) There are T ∈ L(F^n, F^m), bases U1, U2 of F^n, and a basis Z of F^m such that A is the matrix of T : (F^n, U1) → (F^m, Z) and B is the matrix of T : (F^n, U2) → (F^m, Z).

[120′′] [Fundamental theorem of rows and columns] For A, B ∈ M(m × n, F), the following are equivalent:
(i) There are invertible matrices R ∈ M(m, F) and C ∈ M(n, F) with B = RAC.
(ii) ∃ r ≤ min{m, n} so that each of A, B is matrix equivalent to the m × n matrix [Ir 0; 0 0].
(iii) null(A) = null(B).
(iv) rank(A) = rank(B).
(v) null(A^t) = null(B^t).
(vi) rank(A^t) = rank(B^t).
(vii) A is matrix equivalent to B.
(viii) There are T ∈ L(F^n, F^m), bases U1, U2 of F^n, and bases Z1, Z2 of F^m such that A is the matrix of T : (F^n, U1) → (F^m, Z1) and B is the matrix of T : (F^n, U2) → (F^m, Z2).

Remark: From condition (iv) above, we deduce that M(m × n, F) has min{m, n} + 1 (and in particular, only finitely many) equivalence classes w.r.to matrix equivalence. On the other hand, when the field F is infinite, in general M(m × n, F) has infinitely many equivalence classes w.r.to the following notions of equivalence: row equivalence, column equivalence, and conjugacy. To see this, consider distinct elements a, b ∈ F and check that (i) [1 a] and [1 b] are not row equivalent, (ii) the column vectors [1; a] and [1; b] are not column equivalent, (iii) the 1 × 1 matrices [a] and [b] are not conjugate.

8. Solving linear equations

A system of linear equations will be written in matrix format as AX = Y, or in vector format as Ax = y, depending on the context. In this section, we discuss three related techniques to solve AX = Y: (i) the reduced row echelon form, (ii) the LU decomposition, (iii) generalized inverses. Based on the theory developed so far, some elementary observations can be made about AX = Y:

Exercise-26: Let AX = Y be a system of m linear equations in n variables over a field F. Then,
(i) Ax = y has a solution ⇔ y belongs to the column space of A ⇔ rank(A) = rank([A Y]).
(ii) The solution space of the homogeneous system Ax = 0 is ker(A). Hence AX = 0 has a non-trivial solution iff rank(A) < n by the Nullity-rank theorem. In particular, AX = 0 has a non-trivial solution if m < n, or if m = n and A is non-invertible. On the other hand, if m = n and A is invertible, then AX = 0 has only the trivial solution X = 0.
(iii) If x̃ is a solution to Ax = y, then the solution space of Ax = y is x̃ + ker(A), which is a translate of a vector subspace (such subsets are called affine spaces).
(iv) If m = n and A is invertible, then Ax = y has a unique solution, namely x = A^{-1}y.

Remark: We see from above that there are three possibilities for the solutions of Ax = y: (i) there is no solution when y∈ / range(TA), (ii) there is a unique solution when y ∈ range(TA) and TA is injective, (iii) there are many solutions when y ∈ range(TA) and TA is not injective.

Remark: Finding A−1 for A ∈ M(n, F ) is computationally difficult when n is large. Hence other techniques are needed to solve AX = Y in practice even when A ∈ M(n, F ) is invertible.

The echelon form is useful in solving AX = Y in the following way. We know by [118] that A can be transformed to a reduced row echelon matrix H by a finite sequence of elementary row operations. If R is the product of the elementary matrices encoding these elementary row operations, then R is invertible and H = RA. Since R[A Y] = [RA RY] = [H RY], the same sequence of elementary row operations performed on [A Y] gives [H RY]. And the systems AX = Y and HX = RY have exactly the same solutions. Since HX = RY is easier to tackle, we try to solve HX = RY.

Assume F = R for the examples below. (In the computations, matrices are written with rows separated by semicolons, and the bar marks the augmented column.)

Example: Consider AX = Y, where A = [2 4 4; 3 8 0; 4 9 4], X = (x1, x2, x3)^t and Y = (6, 1, 7)^t. Then
[A Y] = [2 4 4 | 6; 3 8 0 | 1; 4 9 4 | 7]
→ ((1/2)R1) [1 2 2 | 3; 3 8 0 | 1; 4 9 4 | 7]
→ (R2 − 3R1, R3 − 4R1) [1 2 2 | 3; 0 2 −6 | −8; 0 1 −4 | −5]
→ (R2 ↔ R3) [1 2 2 | 3; 0 1 −4 | −5; 0 2 −6 | −8]
→ (R1 − 2R2, R3 − 2R2) [1 0 10 | 13; 0 1 −4 | −5; 0 0 2 | 2]
→ ((1/2)R3) [1 0 10 | 13; 0 1 −4 | −5; 0 0 1 | 1]
→ (R1 − 10R3, R2 + 4R3) [1 0 0 | 3; 0 1 0 | −1; 0 0 1 | 1].
Hence the unique solution is x1 = 3, x2 = −1, x3 = 1.

Example: Consider AX = Y, where A = [1 2 −1; 3 8 1; 2 7 4], X = (x1, x2, x3)^t and Y = (0, 0, 1)^t. Then
[A Y] = [1 2 −1 | 0; 3 8 1 | 0; 2 7 4 | 1]
→ (R2 − 3R1, R3 − 2R1) [1 2 −1 | 0; 0 2 4 | 0; 0 3 6 | 1]
→ ((1/2)R2) [1 2 −1 | 0; 0 1 2 | 0; 0 3 6 | 1]
→ (R3 − 3R2) [1 2 −1 | 0; 0 1 2 | 0; 0 0 0 | 1].
No need to proceed further. The last row reads as 0x1 + 0x2 + 0x3 = 1, and so there is no solution.

Example: Consider AX = Y, where A = [1 −2 5; 3 2 7], X = (x1, x2, x3)^t and Y = (−3, −1)^t. Then
[A Y] = [1 −2 5 | −3; 3 2 7 | −1]
→ (R2 − 3R1) [1 −2 5 | −3; 0 8 −8 | 8]
→ ((1/8)R2) [1 −2 5 | −3; 0 1 −1 | 1]
→ (R1 + 2R2) [1 0 3 | −1; 0 1 −1 | 1].
That is, x1 + 3x3 = −1 and x2 − x3 = 1. We can give an arbitrary value to x3, and then take x1 = −1 − 3x3, x2 = 1 + x3. Thus there are infinitely many solutions.
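The three examples above can be cross-checked numerically. A sketch with numpy (the rank test for the second system is Exercise-26(i)):

import numpy as np

# Unique solution: invertible coefficient matrix.
A = np.array([[2., 4., 4.], [3., 8., 0.], [4., 9., 4.]]); y = np.array([6., 1., 7.])
print(np.linalg.solve(A, y))                              # [ 3. -1.  1.]

# No solution: rank([B z]) is larger than rank(B).
B = np.array([[1., 2., -1.], [3., 8., 1.], [2., 7., 4.]]); z = np.array([0., 0., 1.])
print(np.linalg.matrix_rank(B), np.linalg.matrix_rank(np.column_stack([B, z])))   # 2 3

# Infinitely many solutions: any value of x3 works in x1 = -1 - 3*x3, x2 = 1 + x3.
C = np.array([[1., -2., 5.], [3., 2., 7.]]); w = np.array([-3., -1.])
x3 = 7.0
print(np.allclose(C @ np.array([-1 - 3*x3, 1 + x3, x3]), w))    # True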

Definition: Let F be a field. We say P ∈ M(n, F) is a permutation matrix if P is obtained from the identity matrix In by a permutation of the rows of In.

Remark: Recall that a type-1 elementary matrix is obtained by interchanging two rows of the identity matrix. It is a fact that any permutation can be written as a finite product of transpositions, where a transposition is a permutation that interchanges two entries while fixing the others. Hence P ∈ M(n, F) is a permutation matrix iff P is a finite product of type-1 elementary matrices, P = E1 ··· Ek. If this happens, P^{-1} = E_k^{-1} ··· E_1^{-1} = E_k ··· E_1 is also a permutation matrix.

[121] [LU decomposition] Let F be a field and A ∈ M(m × n, F). Then,
(i) ∃ a permutation matrix P ∈ M(m, F), an invertible lower-triangular matrix L ∈ M(m, F) and an upper-triangular matrix U ∈ M(m × n, F) (i.e., uij = 0 for j < i) such that PA = LU.
(ii) If PA = LU is as above, then AX = Y ⇔ LZ = PY and UX = Z.

Proof. (i) Find elementary matrices E1, . . . , Ek ∈ M(m, F) such that U := Ek ··· E1A is in row echelon form, and hence ‘upper-triangular’. A type-1 elementary matrix can be moved past a type-2 or type-3 elementary matrix at the cost of renaming the rows involved: if P is type-1 and E is type-2 or type-3, then PE = E′P for an elementary matrix E′ of the same type as E. Hence by reordering the Ej's we may assume that U = L1PA, where L1 is a product of elementary matrices of type-2 and type-3, and P is the product of the type-1 Ej's. Then P is a permutation matrix and L1 is invertible, and we have PA = L1^{-1}U. To see that L := L1^{-1} is lower-triangular, observe the following. A type-2 elementary matrix is diagonal and its inverse is also of type-2 and diagonal. A type-3 elementary matrix needed in transforming A to a row echelon matrix is always of the form E(i + c ∗ k) with k < i (a multiple of a higher row is added to a lower row). Hence E(i + c ∗ k) as well as E(i + c ∗ k)^{-1} = E(i − c ∗ k) are lower-triangular. And a product of lower-triangular matrices is again lower-triangular.

(ii) AX = Y ⇔ P AX = PY ⇔ LUX = PY ⇔ LZ = PY and UX = Z. 

Remark: (i) The practical advantage of the LU decomposition is that solving AX = Y is converted into solving two ‘triangular’, and hence simple, systems LZ = PY and UX = Z. (ii) There are other similar decompositions (e.g., the LDU decomposition); the student is urged to search the literature and learn more about them. (iii) For a fixed matrix A, if we have to solve AX = Y for many different Y's, it may be economical to find the LU decomposition of A first. Finding a generalized inverse (also known as a pseudo-inverse) of A first is also economical; this is discussed below.
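In practice the decomposition of [121] is available in scipy. A sketch follows, reusing the matrix of the first example above; note that scipy.linalg.lu returns factors with A = P·L·U, so the P of [121] corresponds to the transpose of the returned permutation matrix:

import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[2., 4., 4.], [3., 8., 0.], [4., 9., 4.]])
y = np.array([6., 1., 7.])

p, L, U = lu(A)                 # scipy convention: A = p @ L @ U
P = p.T                         # so (P)(A) = L U, as in [121]

# Solve A x = y via the two triangular systems L z = P y and U x = z.
z = solve_triangular(L, P @ y, lower=True)
x = solve_triangular(U, z)      # upper-triangular by default
print(x)                        # [ 3. -1.  1.]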

Definition: Let F be a field. (i) Let V,W be finite dimensional vector spaces over F and T ∈ L(V,W ). We say S ∈ L(W, V ) is a generalized inverse of T if TS(w) = w for every w ∈ T (V ), or equivalently if TST (v) = T (v) for every v ∈ V , i.e., if TST = T . (ii) Let A ∈ M(m × n, F ). We say G ∈ M(n × m, F ) is a generalized inverse of A if AGA = A.

Example: Let T ∈ L(R^2, R^2) be T(x1, x2) = (x1, 0) (projection to the x-axis). See that S ∈ L(R^2, R^2) given by S(y1, y2) = (y1, cy1 + dy2) satisfies TST = T for any c, d ∈ R. In matrix language, [1 0; c d] is a generalized inverse of A = [1 0; 0 0] for any c, d ∈ R. This says that generalized inverse is not unique in general. It is also not a symmetric notion since for the matrix A as above and G = I, we have AGA = A but GAG ≠ G. On the other hand, if AGA = A and rank(GA) = rank(G), then it can be shown that GAG = G.

[122] [Existence and utility of generalized inverse] (i) Every matrix over a field, and equivalently, every linear operator between finite dimensional vector spaces, has a generalized inverse. Let F be a field, and A ∈ M(m × n, F ), G ∈ M(n × m, F ) be such that AGA = A. Then, (ii) [Test for the existence of solution] The system AX = Y has a solution iff AGY = Y . (iii) ker(A) = range(I − GA) = {(I − GA)z : z ∈ F n}. (iv) [Explicit description of all solutions] If Ax = y has a solution, then the solution space of Ax = y is Gy + {(I − GA)z : z ∈ F n}.

Proof. (i) A proof for matrices can be translated to a proof for linear operators and vice versa. But we will give separate proofs to present two different approaches.

Proof for matrices: Let F be a field, A ∈ M(m × n, F). We may assume A ≠ 0 so that r := rank(A) > 0. By [120′′], transform A to the m × n matrix D := [Ir 0; 0 0] by a finite sequence of elementary row and column operations. This means there are invertible matrices R ∈ M(m, F) and C ∈ M(n, F) such that RAC = D, or A = R^{-1}DC^{-1}. Let G = CD^tR ∈ M(n × m, F). Now AGA = (R^{-1}DC^{-1})(CD^tR)(R^{-1}DC^{-1}) = R^{-1}DD^tDC^{-1} = R^{-1}DC^{-1} = A since DD^tD = D.

Proof for operators: Let T ∈ L(V, W). Find vector subspaces V2 ⊂ V and W1 ⊂ W so that V = ker(T) ⊕ V2 and W = W1 ⊕ T(V). Let T0 be the isomorphism T|_{V2} : V2 → T(V). Define S : W → V as S(w1 + w2) = T0^{-1}(w2) for w1 + w2 ∈ W1 ⊕ T(V). Check that S ∈ L(W, V) and TST = T.

(ii) If AGY = Y , then X = GY is a solution to AX = Y . Conversely, if X satisfies AX = Y , then Y = AX = (AGA)X = AG(AX) = AGY .

(iii) Clearly A(I − GA)z = (A − AGA)z = 0. Conversely, if Az = 0, then z = (I − GA)z.

(iv) x = Gy is one solution of Ax = y by part (ii). Hence the solution space is Gy + ker(A). 
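A small numerical sketch of [122], taking for G the Moore-Penrose inverse computed by numpy (any G with AGA = A would do; the rank-one matrix below is a hypothetical example):

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # rank 1, neither injective nor surjective
G = np.linalg.pinv(A)                   # one particular generalized inverse
print(np.allclose(A @ G @ A, A))        # True: AGA = A

y_good = np.array([2., 4.])             # in the column space of A
y_bad  = np.array([1., 0.])             # not in the column space
print(np.allclose(A @ G @ y_good, y_good))   # True  -> Ax = y_good is solvable
print(np.allclose(A @ G @ y_bad,  y_bad))    # False -> Ax = y_bad has no solution

# All solutions of Ax = y_good have the form Gy + (I - GA)z; z below is arbitrary.
z = np.random.default_rng(0).standard_normal(3)
x = G @ y_good + (np.eye(3) - G @ A) @ z
print(np.allclose(A @ x, y_good))       # True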

With a brief description, we leave to the student two topics that are important in applications, for self-study: full-rank factorization and the Moore-Penrose generalized inverse.

Definition: Let F be a field, A ∈ M(m × n, F ) and suppose r = rank(A) > 0. We say A = PQ is a full-rank factorization of A if P ∈ M(m × r, F ) and Q ∈ M(r × n, F ). If this happens, then necessarily rank(P ) = r = rank(Q), which means that the columns of P form a basis for the column-space of A, and the rows of Q form a basis for the row-space of A. Such a factorization is often useful since matrices of full rank have various nice properties (see for instance, Exercise-21).

[123] Let F be a field, A ∈ M(m×n, F ), and r := rank(A) > 0. Then A has full-rank factorization. 32 T.K.SUBRAHMONIAN MOOTHATHU

First proof. Let W = T_A(F^n) ⊂ F^m. By hypothesis dim(W) = r. Let U be a basis for W. Let T : F^n → W be T(v) = T_A(v) and S : W → F^m be S(w) = w. Then T_A = ST. If Q ∈ M(r × n, F) is the matrix of T : (F^n, {e1, . . . , en}) → (W, U) and P ∈ M(m × r, F) is the matrix of S : (W, U) → (F^m, {e1, . . . , em}), then A = PQ by [116].

Second proof: Let w1, . . . , wn ∈ F^m be the columns of A. Choose 1 ≤ j(1) < ··· < j(r) ≤ n so that w_{j(1)}, . . . , w_{j(r)} are linearly independent. Let P ∈ M(m × r, F) be the matrix with columns w_{j(1)}, . . . , w_{j(r)}. Since these columns span the column space of A, there are u1, . . . , un ∈ F^r such that wj = Puj for 1 ≤ j ≤ n. Let Q ∈ M(r × n, F) be the matrix with columns u1, . . . , un. Then A = PQ.

Third proof: Using elementary row operations, find an invertible R ∈ M(m, F) such that H = RA is in row echelon form. Write H = [Q; 0], where Q ∈ M(r × n, F). Also write R^{-1} = [P P1], where P consists of the first r columns of R^{-1}. Then A = R^{-1}H = [P P1][Q; 0] = PQ + P1·0 = PQ. □

Remark: (When F = R) The Moore-Penrose generalized inverse of a matrix A ∈ M(m × n, R) is the unique matrix G ∈ M(n × m, R) satisfying the following four properties: AGA = A, GAG = G, (AG)^t = AG, (GA)^t = GA. Let us denote the Moore-Penrose generalized inverse of A as A^µ. Suppose A = PQ is a full-rank factorization. Then P^tP and QQ^t are invertible (see [130](v) later). It can be checked that P^µ = (P^tP)^{-1}P^t, Q^µ = Q^t(QQ^t)^{-1} and A^µ = Q^µP^µ.
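A sketch tying the last two topics together (sympy's rref and numpy's pinv are real APIs; the rank-2 matrix is a hypothetical example). The pivot columns of A give P and the nonzero rows of the reduced row echelon form give Q, essentially as in the second and third proofs of [123]; the formulas above then recover the Moore-Penrose inverse:

import numpy as np
from sympy import Matrix

rows = [[1, 2, 0], [2, 4, 1], [3, 6, 1]]        # an example matrix of rank 2
A = np.array(rows, dtype=float)

H, pivots = Matrix(rows).rref()                 # reduced row echelon form of A
r = len(pivots)
P = A[:, list(pivots)]                          # m x r: the pivot columns of A
Q = np.array(H.tolist(), dtype=float)[:r, :]    # r x n: the nonzero rows of H
print(np.allclose(P @ Q, A))                    # True: a full-rank factorization A = PQ

# Moore-Penrose inverse from the factorization (K = R, so * is just transpose).
Pmu = np.linalg.inv(P.T @ P) @ P.T
Qmu = Q.T @ np.linalg.inv(Q @ Q.T)
print(np.allclose(Qmu @ Pmu, np.linalg.pinv(A)))  # True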

More techniques for solving linear equations will be discussed later.

9. Eigenvalues, eigenvectors, and the triangular form

From now onwards, we will concentrate more on linear operators T from a vector space V to V itself. We may ask whether we can choose a basis U of V so that the matrix of T w.r.to U is somewhat simple. We know from [117] that two matrices of T w.r.to two bases of V are conjugate to each other. Hence, the question is, given a square matrix A, can we find an invertible matrix C such that C−1AC is somewhat simple. The simplest matrix that we can ask for is a ; and the next best thing is an upper-triangular (or lower-triangular) matrix.

Definition: Let F be a field. (i) We say A ∈ M(n, F) is diagonalizable over F if there is an invertible matrix C ∈ M(n, F) so that C^{-1}AC is a diagonal matrix. (ii) Let V be a finite dimensional vector space over F. We say T ∈ L(V, V) is diagonalizable if there is a basis U of V such that the matrix of T w.r.to U is diagonal. (iii) An n × n diagonal matrix whose ith diagonal entry is di will be denoted as diag(d1, . . . , dn).

One common mistake is to think that every complex square matrix is diagonalizable.

Example: [Non-diagonalizable matrices exist over every field] Let F be a field. Then, 0 1 (i) A =   is not diagonalizable over F . (ii) More generally, if A ∈ M(n, F ) is a non-zero, 0 0 (i.e., Ak = 0 for some k ∈ N), then A is not diagonalizable over F .

Proof. Since A^2 = 0 in (i), it suffices to prove (ii). Suppose A^k = 0 and D := C^{-1}AC = diag(d1, . . . , dn). Then diag(d1^k, . . . , dn^k) = D^k = (C^{-1}AC)^k = C^{-1}A^kC = 0. This implies di^k = 0 and hence di = 0 for 1 ≤ i ≤ n. That is, D = 0. So A = CDC^{-1} = 0, a contradiction. □

The matrix of T ∈ L(V,V ) w.r.to a basis {v1, . . . , vn} of V is diag(d1, . . . , dn) iff T vj = djvj for 1 ≤ j ≤ n. This leads to the definition of two important concepts: eigenvalues and eigenvectors.

Definition: Let F be a field. (i) Let V be a vector space over F and T ∈ L(V,V ). We say c ∈ F is an eigenvalue of T if ker(T − cI) ≠ {0}, i.e., if there is v ∈ V \{0} with T v = cv. If c is an eigenvalue of T , then any v ∈ ker(T − cI) is said to be an eigenvector of T corresponding to the eigenvalue c. Thus the set of all eigenvectors of T corresponding to an eigenvalue c is a vector subspace of V . (ii) Let A ∈ M(n, F ). We say c ∈ F is an eigenvalue of A over F if there is v ∈ F n \{0} with Av = cv. When c is an eigenvalue of A, members of ker(A − cI) are called eigenvectors of A corresponding n to the eigenvalue c. For instance, if A = diag(d1, . . . , dn), then Aej = djej so that ej ∈ F is an eigenvector of A corresponding to the eigenvalue dj for each j.

Exercise-27: Let F be a field. (i) Consider T ∈ L(V,V ), where V is a vector space over F . Then, c ∈ F is an eigenvalue of T ⇔ T − cI is not injective. (ii) [Conjugate matrices have the same eigenvalues] Let A, B ∈ M(n, F ), and suppose B = C−1AC for some invertible matrix C ∈ M(n, F ). Then Bv = cv iff A(Cv) = c(Cv) for c ∈ F and v ∈ F n.

(iii) If A ∈ M(n, F) is triangular with diagonal entries d1, . . . , dn, then A is invertible iff dj ≠ 0 ∀ j.
(iv) If A ∈ M(n, F) is triangular, then the eigenvalues of A are precisely the diagonal entries of A.
[Hint: (iii) Assume A is upper-triangular. If dk = 0, then {Ae1, . . . , Aek} ⊂ span{e1, . . . , e_{k−1}}, so A is not injective. Conversely, if v = ∑_{j=1}^k a_j e_j ∈ ker(A) with a_k ≠ 0, then Ae_k = −a_k^{-1} ∑_{j=1}^{k−1} a_j Ae_j ∈ span{e1, . . . , e_{k−1}}, so d_k = 0. For (iv), apply (iii) to A − cI.]

Remark: If A, C ∈ M(n, F ) and C−1AC is triangular, then the eigenvalues of A are precisely the diagonal entries of C−1AC by Exercise-27.

Example: [All scalars as eigenvalues] Consider V = {f : R → R : f is infinitely often differentiable}, which is a real vector space, and let T : V → V be T(f) = f′. For any c ∈ R, the function fc ∈ V given by fc(x) = exp(cx) is an eigenvector of T with eigenvalue c. By considering the polynomials x^k for k ∈ N, it can be observed that dim(V) = ∞. In contrast, we will see in Exercise-28 below that a linear operator on a finite dimensional vector space can have only finitely many eigenvalues.

Example: [No eigenvalues] Fix θ ∈ (0, π) and let T ∈ L(R^2, R^2) be the anticlockwise rotation around the origin by angle θ. The matrix of T w.r.to the standard basis is A = [cos θ −sin θ; sin θ cos θ]. If v ∈ R^2 \ {0} were an eigenvector of T, then the one-dimensional space W := span{v} would satisfy T(W) ⊂ W. But no line passing through the origin is invariant under T. Therefore, T (equivalently, A) has no eigenvalues over R. It follows that A is not diagonalizable. In fact, there does not exist any invertible matrix C ∈ M(2, R) such that C^{-1}AC is upper-triangular or lower-triangular.

Example: [Invertible and triangular, but not diagonalizable] Consider A = [1 1; 0 1] over any field. Then A^{-1} = [1 −1; 0 1]. Now, 1 is the only eigenvalue of A by Exercise-27. If D = C^{-1}AC were diagonal, then 1 would be the only eigenvalue of D. This implies D = I, and then A = CDC^{-1} = CIC^{-1} = I, a contradiction. This argument shows A is not diagonalizable.

In the rest of this section, we look for some sufficient conditions for a square matrix to be conjugate to a diagonal or triangular matrix.

Exercise-28: Let V be a vector space over a field F and T ∈ L(V, V). Then,
(i) Non-zero eigenvectors corresponding to distinct eigenvalues are linearly independent. That is, if c1, . . . , cn ∈ F are distinct, and if v1, . . . , vn ∈ V \ {0} are such that T(vj) = cjvj for 1 ≤ j ≤ n, then {v1, . . . , vn} is linearly independent.
(ii) Suppose dim(V) = n. Then by (i), it follows that T has at most n distinct eigenvalues. If T happens to have n distinct eigenvalues c1, . . . , cn, and if v1, . . . , vn ∈ V \ {0} are such that Tvj = cjvj, then U := {vj : 1 ≤ j ≤ n} is a basis for V and the matrix of T w.r.to U is diag(c1, . . . , cn).
(iii) The number of distinct eigenvalues of T is at most 1 + rank(T).
[Hint: (i) Use induction on n. Suppose ∑_{j=1}^n d_j v_j = 0 and apply T − c_n I. (iii) If T(v) = cv and c ≠ 0, then v = T(c^{-1}v) ∈ range(T).]

Exercise-29: Let V be a vector space over a field F, and T, S ∈ L(V, V) be commuting operators. Then ker(S) and range(S) are T-invariant. In particular, for any polynomial p(x) = ∑_{j=0}^n d_j x^j, defining p[T] := ∑_{j=0}^n d_j T^j ∈ L(V, V), we see that ker(p[T]) and range(p[T]) are T-invariant.

Definition: A field F is said to be algebraically closed if every polynomial of degree ≥ 1 over F has a root in F . For example, C is algebraically closed, but R, Q are not (consider x2 + 1). LINEAR ALGEBRA 35

[124] [Existence of eigenvalues] Let F be an algebraically closed field. (i) If V ≠ {0} is a finite dimensional vector space over F , then every T ∈ L(V,V ) has an eigenvalue. (ii) Every A ∈ M(n, F ) has an eigenvalue.

Proof. (See the book Linear Algebra Done Right, where Axler advocates a determinant-free approach.) Let n = dim(V), and consider v ∈ V \ {0}. Since v ≠ 0 and since the n + 1 vectors v, T(v), . . . , T^n(v) ∈ V cannot be linearly independent, there is k ∈ {1, . . . , n} with T^k v = ∑_{j=0}^{k−1} d_j T^j v. As F is algebraically closed, we may factorize p(x) := x^k − ∑_{j=0}^{k−1} d_j x^j as p(x) = ∏_{j=1}^k (x − c_j). Then p[T]v = (T − c_1 I) ··· (T − c_k I)v = 0. Therefore T − c_j I is not injective for some j since v ≠ 0. We conclude by Exercise-27(i) that c_j is an eigenvalue of T for this j. □

[125] [Triangular form over algebraically closed fields] Let F be an algebraically closed field. (i) If V ≠ {0} is an n-dimensional vector space over F and T ∈ L(V,V ), then the matrix of T w.r.to some basis of V is upper-triangular. (ii) If A ∈ M(n, F ), there is an invertible matrix C ∈ M(n, F ) such that C−1AC is upper-triangular.

Proof. (i) We use induction on n. Nothing to prove for n = 1. Assume the result for values < n.

Claim: There is a vector subspace W ⊂ V such that dim(W ) = n − 1 and T (W ) ⊂ W .

Suppose the claim is true. By induction assumption, W has a basis {v1, . . . , vn−1} w.r.to which the matrix of T |W is upper-triangular. Note that there is no condition on the last column of a matrix for the matrix to be upper-triangular. Hence for any vn ∈ V \ W , we have that {v1, . . . , vn} is a basis of V and the matrix of T is upper-triangular w.r.to this basis.

First proof of the claim: By [124], T has an eigenvalue c ∈ F . Then T − cI is not injective and hence not surjective. Therefore dim(range(T − cI)) ≤ n − 1. Let W be any (n − 1)-dimensional vector subspace of V containing range(T − cI). For w ∈ W , see that T (w) = (T − cI)(w) + cw ∈ range(T − cI) + W ⊂ W + W ⊂ W .

Second proof of the claim: Applying [124] to T t ∈ L(V ∗,V ∗), find ϕ ∈ V ∗ \{0} and c ∈ F such that ϕ ◦ T = T t(ϕ) = cϕ. Then, W := ker(ϕ) is an (n − 1)-dimensional vector subspace by [111](i), and T (W ) ⊂ W since ϕ ◦ T = cϕ.

Another proof of [125] by induction is as follows. By [124], T v = cv for some v ∈ V \{0} and

some c∈ F . Choose a basis U = {u1, . . . , un} of V with u1 = v. Then the matrix of T w.r.to U is c ∗  , where B ∈ M(n − 1,F ). By induction assumption, Q−1BQ is upper-triangular for some 0 B 36 T.K.SUBRAHMONIAN MOOTHATHU     1 0 1 0 Q ∈ M(n−1,F ). Let C =   ∈ M(n, F ). Then C−1 =  , and the matrix of T w.r.to 0 Q 0 Q−1     c ∗ c ∗ −1     the basis {Cuj : 1 ≤ j ≤ n} of V is C C = , which is upper-triangular.  0 B 0 Q−1BQ

Remark: If the matrix of T ∈ L(V,V ) w.r.to a basis {v1, . . . , vn} of V is upper-triangular, then the matrix of T w.r.to the basis {vn, . . . , v1} of V is lower-triangular.

Questions: (i) Can we get something simpler than a triangular form with the hypothesis of [125]? [Answer: Yes, the Jordan canonical form; this will be discussed later.] (ii) What can we say in the place of [125] if the underlying field is not algebraically closed?

Now we turn our attention to vector spaces possessing analytic and geometric structures.

10. Inner product spaces

Roughly speaking, inner product spaces are vector spaces possessing an analytic and geometric structure in which one can talk about the distance between two vectors and the angle between them.

Convention: Let K = R or C throughout this section.

Definition: Let V be a vector space over K. An inner product on V is a pairing ⟨·, ·⟩ : V × V → K satisfying the following: (i) ⟨v, v⟩ ≥ 0 for every v ∈ V; and ⟨v, v⟩ = 0 iff v = 0 for v ∈ V. (ii) ⟨u, v⟩ = \overline{⟨v, u⟩} (the complex conjugate of ⟨v, u⟩) for every u, v ∈ V.

(iii) [Linearity in the first variable] ⟨c1u1 + c2u2, v⟩ = c1⟨u1, v⟩ + c2⟨u2, v⟩ for every u1, u2, v ∈ V and c1, c2 ∈ K. If these conditions are true, (V, ⟨·, ·⟩) is called an inner product space. Moreover, we say V is a real inner product space or a complex inner product space depending upon whether K = R or C respectively.

Exercise-30: Let (V, ⟨·, ·⟩) be an inner product space over K = R or C. Then,

(i) ⟨·, ·⟩ is conjugate-linear in the second variable, i.e., ⟨u, c1v1 + c2v2⟩ = c̄1⟨u, v1⟩ + c̄2⟨u, v2⟩ for every u, v1, v2 ∈ V and c1, c2 ∈ K.
(ii) ⟨0, v⟩ = 0 = ⟨u, 0⟩ for every u, v ∈ V.
(iii) If K = R, then ⟨u, v⟩ = ⟨v, u⟩ for every u, v ∈ V, and ⟨·, ·⟩ becomes linear in each variable.
[Hint: (i) Use the second and the third conditions defining ⟨·, ·⟩.]

Definition: If (V, ⟨·, ·⟩) is an inner product space, define ∥ · ∥ : V → [0, ∞) as ∥v∥ := √⟨v, v⟩ for v ∈ V. We call ∥ · ∥ the norm on V induced by the inner product.

Example: [Standard inner products on R^n and C^n] (i) ⟨(x1, . . . , xn), (y1, . . . , yn)⟩ := ∑_{j=1}^n x_j y_j is the standard inner product on R^n, and the induced norm is ∥(x1, . . . , xn)∥ = √(∑_{j=1}^n x_j^2). (ii) ⟨(x1, . . . , xn), (y1, . . . , yn)⟩ := ∑_{j=1}^n x_j ȳ_j is the standard inner product on C^n, and the induced norm is ∥(x1, . . . , xn)∥ = √(∑_{j=1}^n |x_j|^2). From now onwards, when we consider R^n and C^n, the inner product is assumed to be the standard one, unless specified otherwise.

Remark: The geometric meaning of the standard ⟨·, ·⟩ and ∥ · ∥ on Rn is described below. (i) ∥v∥ is the length of the vector v for v ∈ Rn.

(ii) ∥u − v∥ is the distance between vectors u and v for u, v ∈ Rn.

(iii) ⟨u, v⟩ = ∥u∥∥v∥ cos θ for u, v ∈ Rn, where θ ∈ [0, π] is the angle between u and v. Proof : Consider the triangle formed by the three vectors u, v, v − u. If a = ∥u∥, b = ∥v∥, c = ∥v − u∥, then by drawing the perpendicular from the tip of u to the side v, we get a right-angled triangle with side-lengths c, a sin θ and b − a cos θ. Hence c2 = a2 sin2 θ + (b − a cos θ)2 = a2 + b2 − 2ab cos θ. On the other hand, c2 = ∥v − u∥2 = ⟨v − u, v − u⟩ = ∥u∥2 + ∥v∥2 − 2⟨u, v⟩ since ⟨u, v⟩ = ⟨v, u⟩ over R.

(iv) Consider u, v ∈ R^n and let θ ∈ [0, π] be the angle between them. Note that ⟨u, v⟩ ∼ 0 if θ ∼ π/2, and ⟨u, v⟩ = 0 iff u, v are perpendicular; |⟨u, v⟩| ∼ ∥u∥∥v∥ if θ ∼ 0 or θ ∼ π.

[126] Let (V, ⟨·, ·⟩) be an inner product space, and ∥v∥ = √⟨v, v⟩ for v ∈ V. Then,
(i) ∥v∥ = 0 iff v = 0 for v ∈ V.
(ii) ∥cv∥ = |c|∥v∥ for every v ∈ V and c ∈ K.
(iii) [Pythagoras theorem] If ⟨u, v⟩ = 0, then ∥u + v∥^2 = ∥u∥^2 + ∥v∥^2.
(iv) [Parallelogram law] ∥u + v∥^2 + ∥u − v∥^2 = 2(∥u∥^2 + ∥v∥^2) for every u, v ∈ V (for a parallelogram, the sum of the squares of the diagonals is equal to the sum of the squares of the sides).
(v) [Cauchy-Schwarz inequality] |⟨u, v⟩| ≤ ∥u∥∥v∥ for every u, v ∈ V. Moreover, equality holds iff one of u, v is a scalar multiple of the other, i.e., iff {u, v} is linearly dependent.
(vi) [Triangle inequality] ∥u + v∥ ≤ ∥u∥ + ∥v∥ for every u, v ∈ V. Moreover, equality holds iff one of u, v is a nonnegative real scalar multiple of the other (in particular, equality fails whenever {u, v} is linearly independent).

Proof. Statements (i)-(iv) can be verified directly from the definition of ∥ · ∥.

(v) We may assume v ≠ 0. Let c = ⟨u, v⟩/∥v∥^2, u1 = cv, and u2 = u − u1. Geometrically, u1 is the projection of u onto the vector v, and u2 is the component of u perpendicular to v. In fact, it is easy to verify analytically that ⟨u2, v⟩ = 0. Therefore, ⟨u2, u1⟩ = 0 and hence ∥u∥^2 = ∥u1∥^2 + ∥u2∥^2 by part (iii). Thus 0 ≤ ∥u2∥^2 = ∥u∥^2 − ∥u1∥^2 = ∥u∥^2 − |⟨u, v⟩|^2/∥v∥^2. Moreover, equality holds in Cauchy-Schwarz ⇔ v = 0 or u2 = 0 ⇔ v = 0 or u = u1 = cv.

(vi) ∥u + v∥^2 = ⟨u + v, u + v⟩ = ∥u∥^2 + ⟨u, v⟩ + ⟨v, u⟩ + ∥v∥^2 = ∥u∥^2 + 2Re⟨u, v⟩ + ∥v∥^2 ≤ ∥u∥^2 + 2|⟨u, v⟩| + ∥v∥^2 ≤ ∥u∥^2 + 2∥u∥∥v∥ + ∥v∥^2 = (∥u∥ + ∥v∥)^2, where we have used (v) for the last inequality. Moreover, equality holds ⇔ Re⟨u, v⟩ = |⟨u, v⟩| = ∥u∥∥v∥ ⇔ v = 0 or u = cv for some c ≥ 0. □

Remark: Let V be a real inner product space. We may define the angle between u, v ∈ V \ {0} as the unique θ ∈ [0, π] satisfying cos θ = ⟨u, v⟩/(∥u∥∥v∥), since −1 ≤ ⟨u, v⟩/(∥u∥∥v∥) ≤ 1 by Cauchy-Schwarz.

Remark: Let V be a vector space over K. We say ∥ · ∥ : V → [0, ∞) is a norm on V if it satisfies assertions (i), (ii), and (vi) of [126]. For example, we have the norm ∥ · ∥1 on R^2, defined as ∥(x1, x2)∥1 := |x1| + |x2|. Considering e1, e2 ∈ R^2, we see that the parallelogram law fails for ∥ · ∥1. Moreover, ∥e1 + e2∥1 = ∥e1∥1 + ∥e2∥1, even though {e1, e2} is linearly independent. Therefore, the norm ∥ · ∥1 is not induced by any inner product on R^2. Henceforth we will be interested only in inner products, and norms induced by inner products. Also, we will stick to finite dimensions.

Exercise-31: Let V be a finite dimensional inner product space over K, and let ∥ · ∥ be the induced norm. For v, vn ∈ V , we say (vn) → v in V if limn→∞ ∥v − vn∥ = 0. (i) |∥u∥ − ∥v∥| ≤ ∥u − v∥ for every u, v ∈ V .

(ii) [Continuity of the norm] If v, vn ∈ V and (vn) → v in V , then (∥vn∥) → ∥v∥ in R.

(iii) [Continuity of ⟨·, ·⟩] If (un) → u and (vn) → v in V , then (⟨un, vn⟩) → ⟨u, v⟩ in K.

[hint: (i) ∥u∥ ≤ ∥u − v∥ + ∥v∥ and ∥v∥ ≤ ∥v − u∥ + ∥u∥. (iii) |⟨u, v⟩ − ⟨un, vn⟩| ≤ |⟨u, v⟩ − ⟨u, vn⟩| +

|⟨u, vn⟩ − ⟨un, vn⟩| ≤ ∥u∥∥v − vn∥ + ∥u − un∥∥vn∥ and (∥vn∥) is bounded by part (ii).]

Definition: Let (V, ⟨·, ·⟩) be a finite dimensional inner product space.
(i) ∅ ≠ U ⊂ V is an orthogonal set if ⟨u, v⟩ = 0 for any two distinct u, v ∈ U. Any orthogonal set in V \ {0} is linearly independent: if ∑_{j=1}^n c_j v_j = 0 for distinct v1, . . . , vn in the set, take the inner product with v_k to see c_k = 0.
(ii) If U ⊂ V is orthogonal and ∥u∥^2 = ⟨u, u⟩ = 1 for every u ∈ U, then we say U is orthonormal. If U is both orthonormal and a basis for V, then we say U is an orthonormal basis for V. For example, {e1, . . . , en} is an orthonormal basis for R^n.
(iii) For W ⊂ V, the orthogonal complement of W is defined as W⊥ := {z ∈ V : ⟨w, z⟩ = 0 for every w ∈ W} (notice the similarity with the annihilator). If V = R^3 with the standard inner product, and W = {(x1, x2, x3) ∈ R^3 : x1 = x2 = x3}, then W⊥ = {(y1, y2, y3) ∈ R^3 : ∑_{j=1}^3 y_j = 0}.

[127] Let V ≠ {0} be a finite dimensional inner product space. Then,

(i) [Gram-Schmidt construction] If {u1, . . . , un} ⊂ V is a linearly independent set, then there is an orthonormal set {v1, . . . , vn} ⊂ V with span{v1, . . . , vk} = span{u1, . . . , uk} for 1 ≤ k ≤ n. (ii) V has an orthonormal basis. LINEAR ALGEBRA 39

(iii) If W ⊂ V is a vector subspace and if {vj : 1 ≤ j ≤ k} is an orthonormal basis for W , then there are vk+1, . . . , vn ∈ V such that {vj : 1 ≤ j ≤ n} is an orthonormal basis for V .

Proof. (i) We use induction on n. The case n = 1 is easy since we may take v1 = u1/∥u1∥. Now assume by induction that we have chosen an orthonormal set {v1, . . . , v_{n−1}} with span{v1, . . . , vk} = span{u1, . . . , uk} =: V_k for 1 ≤ k ≤ n − 1. Let w_n = ∑_{j=1}^{n−1} ⟨u_n, v_j⟩ v_j ∈ V_{n−1} and v_n = (u_n − w_n)/∥u_n − w_n∥. Since u_n ∉ V_{n−1}, we have u_n − w_n ≠ 0 so that v_n is well-defined. Direct calculation yields ⟨u_n − w_n, v_k⟩ = 0 for 1 ≤ k ≤ n − 1, and thus ⟨v_n, v_k⟩ = δ_{nk} for 1 ≤ k ≤ n.

Geometrically, wn is to be considered as the projection of un onto Vn−1, and un − wn is the component of un perpendicular to Vn−1. Clearly, vn ∈ span{un, wn} ⊂ span({un} ∪ Vn−1) and un ∈ span{vn, wn} ⊂ span({vn} ∪ Vn−1). Hence span{v1, . . . , vn} = span{u1, . . . , un} also.

(ii) Take any basis of V and transform it to an orthonormal basis by part (i).

(iii) Choose linearly independent vectors u_{k+1}, . . . , u_n ∈ V \ W so that {v1, . . . , vk, u_{k+1}, . . . , un} is a basis for V. Applying the Gram-Schmidt process to this basis from the (k + 1)th vector onwards, replace u_{k+1}, . . . , un by orthonormal vectors v_{k+1}, . . . , vn. □
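A direct transcription of the Gram-Schmidt construction in the proof of [127](i), as a sketch (numpy, real scalars; the input vectors are hypothetical and assumed linearly independent):

import numpy as np

def gram_schmidt(us):
    # Orthonormalize a list of linearly independent vectors, as in [127](i).
    vs = []
    for u in us:
        w = sum((np.dot(u, v) * v for v in vs), np.zeros_like(u))  # projection onto span(vs)
        diff = u - w                        # component of u perpendicular to span(vs)
        vs.append(diff / np.linalg.norm(diff))
    return vs

us = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
vs = gram_schmidt(us)
print(np.allclose([[np.dot(a, b) for b in vs] for a in vs], np.eye(3)))   # True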

The advantages of an orthonormal basis over an ordinary basis are that the scalar coefficients in the expansion of a vector and the norm of a vector can easily be determined.

[128] [Advantages of an orthonormal basis] Let V be an inner product space with an orthonormal basis {v1, . . . , vn}. Then v = ∑_{j=1}^n ⟨v, vj⟩vj and ∥v∥^2 = ∑_{j=1}^n |⟨v, vj⟩|^2 for every v ∈ V.

Proof. If v = ∑_{j=1}^n c_j v_j, then ⟨v, v_k⟩ = ∑_{j=1}^n c_j ⟨v_j, v_k⟩ = c_k. Hence v = ∑_{j=1}^n ⟨v, v_j⟩ v_j. Now ∥v∥^2 = ⟨∑_{j=1}^n ⟨v, v_j⟩v_j, ∑_{k=1}^n ⟨v, v_k⟩v_k⟩ = ∑_{j=1}^n ∑_{k=1}^n ⟨v, v_j⟩ \overline{⟨v, v_k⟩} ⟨v_j, v_k⟩ = ∑_{j=1}^n ⟨v, v_j⟩ \overline{⟨v, v_j⟩} = ∑_{j=1}^n |⟨v, v_j⟩|^2. □

Exercise-32: Let W ⊂ V be a vector subspace of a finite dimensional inner product space V . Then, (i) W ⊥ is a vector subspace of V .

(ii) Let {v1, . . . , vk} be an orthonormal basis for W. Then, for v_{k+1}, . . . , vn ∈ V \ W, we have that {v1, . . . , vn} is an orthonormal basis for V ⇔ {v_{k+1}, . . . , vn} is an orthonormal basis for W⊥.
(iii) (W⊥)⊥ = W, and V = W ⊕ W⊥.
[Hint: (ii) ⇒: Show v_{k+1}, . . . , vn ∈ W⊥. If v = ∑_{j=1}^n c_j v_j ∈ W⊥, also show c_j = 0 for 1 ≤ j ≤ k.]

11. Orthogonal projections and least square solutions

Suppose a system of linear equations AX = Y over K = R or C has no solution. Still, we may like to find X so that AX is at a ‘shortest distance’ from Y. We may restate this in the language of inner product spaces. Suppose W ⊂ V is a vector subspace of an inner product space V and v ∈ V. Can we find w0 ∈ W at a shortest distance from v, i.e., such that ∥v − w0∥ ≤ ∥v − w∥ for every w ∈ W? The answer is yes; the required w0 is found by projecting v orthogonally onto W.

Definition: Let V be a finite dimensional inner product space and W ⊂ V be a vector subspace. The orthogonal projection P : V → W is defined as follows: (i) if W = {0}, then P ≡ 0, (ii) if W ≠ {0} and {v1, . . . , vk} is an orthonormal basis for W, then P(v) = ∑_{j=1}^k ⟨v, vj⟩vj. That the definition of P is independent of the choice of an orthonormal basis of W follows from [129] below.

Let us try to understand why P (v) has the special expression given above. Assume V = R3 and

W ⊂ V is a plane spanned by the orthonormal basis {v1, v2}. For v ∈ V , observe geometrically that the orthogonal projection P (v) of v onto W must be the sum of two vectors, P (v) = w1 +w2, where w1 is the orthogonal projection of v onto the line span{v1} and w2 is the orthogonal projection of v onto the line span{v2}. If θ is the angle between v and w1, we have ∥w1∥ = ∥v∥ cos θ. Also cos θ = ⟨v/∥v∥, v1⟩. Hence w1 = ∥w1∥v1 = ∥v∥⟨v/∥v∥, v1⟩v1 = ⟨v, v1⟩v1. Similarly, w2 = ⟨v, v2⟩v2.

[129] [Orthogonal projection theorem] Let V be a finite dimensional inner product space, W ⊂ V be a vector subspace, and P ∈ L(V, V). Then the following are equivalent.
(i) P is the orthogonal projection of V onto W.
(ii) For every v ∈ V, P(v) is the unique element in W at a shortest distance from v.
(iii) If v ∈ V = W ⊕ W⊥ is written as v = w + z with w ∈ W and z ∈ W⊥, then P(v) = w.
(iv) P^2 = P, ⟨P(x), y⟩ = ⟨x, P(y)⟩ for every x, y ∈ V, and P(V) = W.
(v) I − P is the orthogonal projection of V onto W⊥.

Proof. We may assume W ≠ {0}. Choose an orthonormal basis {vj : 1 ≤ j ≤ n} of V so that {vj : 1 ≤ j ≤ k}, {vj : k < j ≤ n} are orthonormal bases for W, W⊥ respectively.

(i) ⇒ (ii): By definition, P(v) = ∑_{j=1}^k ⟨v, vj⟩vj and hence v − P(v) = ∑_{j=k+1}^n ⟨v, vj⟩vj ∈ W⊥. Now for any w ∈ W, we have P(v) − w ∈ W since W is a vector subspace, and therefore by applying the Pythagoras theorem to the orthogonal pair v − P(v), P(v) − w, we get ∥v − w∥^2 = ∥(v − P(v)) + (P(v) − w)∥^2 = ∥v − P(v)∥^2 + ∥P(v) − w∥^2 ≥ ∥v − P(v)∥^2. Here, we have equality iff ∥P(v) − w∥^2 = 0 iff w = P(v), and this establishes uniqueness also.

(ii) ⇒ (iii): Consider v = w + z ∈ W ⊕ W⊥. Then for any w1 ∈ W, we have ⟨w − w1, z⟩ = 0, so that by the Pythagoras theorem, ∥v − w1∥^2 = ∥z + (w − w1)∥^2 = ∥z∥^2 + ∥w − w1∥^2 ≥ ∥z∥^2 = ∥v − w∥^2. Hence the unique element in W at a shortest distance from v is w.

(iii) ⇒ (iv): Clearly P^2 = P and P(V) = W. Writing x = x1 + x2 ∈ W ⊕ W⊥ and y = y1 + y2 ∈ W ⊕ W⊥, check that ⟨P(x), y⟩ = ⟨x1, y1⟩ = ⟨x, P(y)⟩.

(iv) ⇒ (i): We claim that vj − P(vj) ∈ W⊥ for 1 ≤ j ≤ n. Consider w = P(v) ∈ W. We have ⟨w, vj − P(vj)⟩ = ⟨P(v), vj − P(vj)⟩ = ⟨v, P(vj) − P^2(vj)⟩ = ⟨v, P(vj) − P(vj)⟩ = ⟨v, 0⟩ = 0, and this proves the claim. Since P(vj) ∈ W for 1 ≤ j ≤ n and W ∩ W⊥ = {0}, we deduce that P(vj) = vj for 1 ≤ j ≤ k and P(vj) = 0 for k < j ≤ n. Hence P(v) = P(∑_{j=1}^n ⟨v, vj⟩vj) = ∑_{j=1}^k ⟨v, vj⟩vj + 0.

Finally, the implications (i) ⇒ (v) ⇒ (iii) are easy. 

Remark: If P : V → W is the orthogonal projection, then ker(P ) = W ⊥ by [129](iii).

Definition: If A = [aij] ∈ M(m × n, K), let A∗ ∈ M(n × m, K) be the conjugate-transpose of A, i.e., the ijth entry of A∗ is ā_ji (the complex conjugate of a_ji). If K = R, then A∗ = A^t.

Remark: (i) Note that (A∗)∗ = A, (cA + dB)∗ = c̄A∗ + d̄B∗, and (AB)∗ = B∗A∗. (ii) If we think of v, u ∈ K^n as n × 1 matrices, then u∗ is of order 1 × n, so that we have ⟨v, u⟩ = u∗v, and vu∗ ∈ M(n, K). In particular, ∥v∥^2 = v∗v, and ⟨v, u⟩u = u⟨v, u⟩ = (uu∗)v.

Example: Let V be an n-dimensional inner product space, let u ∈ V be a unit vector, and W ⊂ V be the (n − 1)-dimensional hyperplane W = span{u}⊥. We would like to find expressions for the orthogonal projection P from V onto W and the reflection T ∈ L(V, V) about W in terms of the vector u. For v ∈ V, first note geometrically that P(v) = v − cu for some scalar c, and then T(v) = v − 2cu. Now, cu must be the orthogonal projection of v onto span{u}, and this gives c = ⟨v, u⟩. Hence P(v) = v − ⟨v, u⟩u and T(v) = v − 2⟨v, u⟩u. We may write these also as P(v) = v − u⟨v, u⟩ = v − uu∗v = (I − uu∗)v and T(v) = (I − 2uu∗)v, where uu∗ ∈ M(n, K). The operator T (i.e., the reflection about a hyperplane) is called a Householder transformation. Note that (uu∗)v = ⟨v, u⟩u = 0 iff v ∈ W. Hence ker(uu∗) = W, and therefore rank(uu∗) = 1. So the matrices I − uu∗, I − 2uu∗ of P, T obtained here are rank-1 perturbations of the identity.
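A quick numerical check of these formulas (numpy, real case; u below is a hypothetical unit vector):

import numpy as np

u = np.array([1., 2., 2.]) / 3.0                 # a unit vector in R^3
P = np.eye(3) - np.outer(u, u)                   # orthogonal projection onto span{u}^perp
T = np.eye(3) - 2 * np.outer(u, u)               # Householder reflection about span{u}^perp

print(np.allclose(P @ u, 0))                     # True: u is sent to 0 by P
print(np.allclose(T @ T, np.eye(3)))             # True: reflecting twice is the identity
print(np.allclose(T.T @ T, np.eye(3)))           # True: T is orthogonal (unitary over R)
print(np.linalg.matrix_rank(np.outer(u, u)))     # 1: a rank-1 perturbation of I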

[130] Let A ∈ M(m × n, K). (i) If {w1, . . . , wk} ⊂ K^n is linearly independent, then {w̄1, . . . , w̄k} is also linearly independent, where w̄j ∈ K^n is the vector obtained by taking the complex conjugate of the entries of wj. Applying this to the columns of A, we have rank(A∗) = rank(A^t) = rank(A) (however, range(A∗) can be different from range(A^t)).
(ii) ⟨Av, w⟩ = ⟨v, A∗w⟩ for every v ∈ K^n and w ∈ K^m.
(iii) range(A)⊥ = ker(A∗).
(iv) ker(A∗A) = ker(A) and rank(A∗A) = rank(A).
(v) The columns of A are linearly independent ⇔ A∗A ∈ M(n, K) is invertible.
(vi) The rows of A are linearly independent ⇔ AA∗ ∈ M(m, K) is invertible.

Proof. (i) For the assertion about linear independence, observe that ∑_{j=1}^k c_j w̄_j = \overline{∑_{j=1}^k c̄_j w_j}.

(ii) ⟨Av, w⟩ = w∗Av = w∗(A∗)∗v = (A∗w)∗v = ⟨v, A∗w⟩.

(iii) For w ∈ Km, we have w ∈ range(A)⊥ ⇔ ⟨v, A∗w⟩ = ⟨Av, w⟩ = 0 for every v ∈ Kn ⇔ A∗w = 0.

(iv) Clearly ker(A) ⊂ ker(A∗A). To see ker(A∗A) ⊂ ker(A), note that ∥Av∥2 = (Av)∗Av = v∗A∗Av. Since A∗A ∈ M(n, K) and ker(A∗A) = ker(A), we obtain rank(A∗A) = n − null(A∗A) = n − null(A) = rank(A) by Nullity-rank theorem.

(v) We know that the columns of A are linearly independent ⇔ ker(A) = {0}. By (iv), {0} = ker(A) = ker(A∗A) ⇔ rank(A∗A) = n ⇔ A∗A is invertible. Consider At to prove (vi). 

How do we express an orthogonal projection P : V → W given a basis U of W , where U may not be orthonormal? One approach is to first transform U to an orthonormal basis of W by Gram-Schmidt. There is another way, stated as [131] below for V = Kn, where K = R or C.

[131] [Matrix of an orthogonal projection] Let W ⊂ K^n be a vector subspace and P be the orthogonal projection from K^n onto W. Let {w1, . . . , wk} be a basis of W, and let A ∈ M(n × k, K) be the matrix whose jth column is wj. Then P(v) = A(A∗A)^{-1}A∗v for every v ∈ K^n.

Proof. Fix v ∈ K^n and choose z = (c1, . . . , ck) ∈ K^k so that P(v) = ∑_{j=1}^k c_j w_j = Az. Since v − P(v) ∈ W⊥ = ker(A∗) by [130](iii), we get A∗v = A∗(P(v)) = A∗Az, or z = (A∗A)^{-1}A∗v, where the invertibility of A∗A is assured by [130](v). Therefore, P(v) = Az = A(A∗A)^{-1}A∗v. □

Remark: It is tempting to make an attempt to simplify A(A∗A)−1A∗ by writing (A∗A)−1 as A−1(A∗)−1 but this is meaningless if A is not a square matrix. If A happens to be a square matrix, then k = n so that W = Kn. In this case (A∗A)−1 = A−1(A∗)−1 indeed, and consequently P = I.

Exercise-33: Find an expression for the orthogonal projection P : V → W in the following cases, by using the original definition, or [129], or [131], or Gram-Schmidt together with [129]: (i) V = R^2 and W = span{(1, 3)}. (ii) V = R^3 and W = span{(1, 1, 0)}. (iii) V = R^3 and W = span{(1, 1, 0), (0, 0, 1)}. (iv) V = R^3 and W = span{(1, 1, 0), (0, 1, 1)}.
[Hint: (ii) Let w = (1, 1, 0). Then {w/∥w∥} = {w/√2} is an orthonormal basis for W. Hence for x = (x1, x2, x3) ∈ R^3, we get P(x) = ⟨x, w/√2⟩ w/√2 = (1/2)⟨x, w⟩w = (1/2)(x1 + x2, x1 + x2, 0). Or, setting A = [1; 1; 0] (the 3 × 1 matrix with column w), note that A(A^tA)^{-1}A^t = [1/2 1/2 0; 1/2 1/2 0; 0 0 0], and apply [131] to get the same answer.]
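For case (iv) of Exercise-33, where the given basis of W is not orthogonal, formula [131] is the convenient route. A sketch (numpy; the projection matrix shown in the comment is computed, not taken from the text):

import numpy as np

# Case (iv): W = span{(1,1,0), (0,1,1)} in R^3, basis not orthogonal.
A = np.array([[1., 0.],
              [1., 1.],
              [0., 1.]])                      # columns form a basis of W

P = A @ np.linalg.inv(A.T @ A) @ A.T          # the matrix of [131]
print(P)   # entries are 2/3 on the diagonal, 1/3 and -1/3 off the diagonal
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # True True: P^2 = P, P self-adjoint
print(np.allclose(P @ np.array([1., 1., 0.]), [1., 1., 0.]))   # True: P fixes W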

Next we give a practical application of orthogonal projections. We will write Ax = y for AX = Y.

Definition: Let A ∈ M(m × n, K) and y ∈ Km. We say z ∈ Kn is a least square solution for the system Ax = y if ∥y − Az∥ ≤ ∥y − Ax∥, or equivalently if ∥y − Az∥2 ≤ ∥y − Ax∥2 for every x ∈ Kn.

[132] [Least square theorem] Let A ∈ M(m × n, K) and y ∈ Km. Then z ∈ Kn is a least square solution for the linear system Ax = y ⇔ A∗Az = A∗y.

Proof. Let W = range(A). By [129], note that z is a least square solution for Ax = y ⇔ Az is the orthogonal projection of y to W ⇔ y − Az ∈ W ⊥ = ker(A∗) (by [130](iii)) ⇔ A∗y = A∗Az. 

Remark: If z is a least square solution to Ax = y, then Az is unique (being the orthogonal projection of y to range(A)), but z may not be unique since there may exist z̃ ≠ z with Az̃ = Az.

In various applications, given finitely many points (a1, y1), . . . , (am, ym) ∈ R^2, one would like to find a simple curve (for example, a line) that fits the data as well as possible.

Definition: Let K = R or C. Let Pn denote the collection of all polynomials of degree at most n over K. We say p ∈ Pn is a least square polynomial in Pn for the finite data (a1, y1), . . . , (am, ym) ∈ K^2 if ∑_{j=1}^m |yj − p(aj)|^2 ≤ ∑_{j=1}^m |yj − q(aj)|^2 for every q ∈ Pn; equivalently, if ∥(y1, . . . , ym) − (p(a1), . . . , p(am))∥ ≤ ∥(y1, . . . , ym) − (q(a1), . . . , q(am))∥ for every q ∈ Pn.

We may use [132] to find least square polynomials because of the following:

[133] Let A ∈ M(m × (n + 1), K) be the matrix whose ith row is (1, a_i, a_i^2, . . . , a_i^n). Let y = (y1, . . . , ym) ∈ K^m, p(x) = ∑_{j=0}^n c_j x^j, and z = (c0, c1, . . . , cn) ∈ K^{n+1}. Then Az = (p(a1), . . . , p(am)) ∈ K^m. Hence, p is a least square polynomial in Pn for the finite data (a1, y1), . . . , (am, ym) ∈ K^2 ⇔ z is a least square solution to Ax = y ⇔ A∗Az = A∗y.

Example: Let the data points (0, −1), (2, 0), (4, 5) ∈ R^2 be given. We wish to find a line that best fits this data. Proceeding as in [133] with m = 3 and n = 1, we have A = [1 0; 1 2; 1 4], Z = (c0, c1)^t and Y = (−1, 0, 5)^t. Now A∗AZ = A∗Y simplifies to [3 6; 6 20] (c0, c1)^t = (4, 20)^t. This gives c0 = −5/3 and c1 = 3/2. So the required line is y = (3/2)x − 5/3. Plot the line to see whether it is a good fit to the data.

12. Unitary operators: they preserve distances and angles

Let V be a finite dimensional inner product space over K = R or C throughout this section. 44 T.K.SUBRAHMONIAN MOOTHATHU

Suppose T ∈ L(V,V ). When can we say that T takes orthonormal sets to orthonormal sets? The answer is: when T is a unitary operator. To define a unitary operator, first we need to introduce the concept of the adjoint operator.

Exercise-34: [Riesz representation of linear functionals] ϕ : V → K is a linear functional ⇔ there is z ∈ V such that ϕ(v) = ⟨v, z⟩ for every v ∈ V. [Hint: Let {v1, . . . , vn} be an orthonormal basis for V. We know that if aj = ϕ(vj), then ϕ(∑_{j=1}^n c_j v_j) = ∑_{j=1}^n c_j a_j. Take z = ∑_{j=1}^n ā_j v_j.]

[134] [Adjoint operator] Let W be a finite dimensional inner product space and T ∈ L(V,W ). Then there exists a unique operator T ∗ ∈ L(W, V ), called the adjoint of T , with the defining property that ⟨T (v), w⟩ = ⟨v, T ∗(w)⟩ for every v ∈ V and w ∈ W .

Proof. Fix w ∈ W; we wish to define T∗(w). Since ϕ : V → K given by ϕ(v) = ⟨T(v), w⟩ is linear, there is z ∈ V such that ϕ(v) = ⟨v, z⟩ by Exercise-34. Define T∗(w) = z and check that T∗ is linear. If T̃ also satisfies the defining property of T∗, then verify ⟨v, (T∗ − T̃)(w)⟩ = 0. Taking v = (T∗ − T̃)(w), we see ∥(T∗ − T̃)(w)∥^2 = 0 for every w ∈ W, and thus T̃ = T∗. □

Remark: Let S, T ∈ L(V, V). Then, we have (cS + dT)∗ = c̄S∗ + d̄T∗, (ST)∗ = T∗S∗, and (T∗)∗ = T. To obtain these, verify the defining condition of the adjoint involving the inner product.

[135] Let T ∈ L(V, V). Then,
(i) If A is the matrix of T w.r.to an orthonormal basis U = {v1, . . . , vn} of V, then A∗ is the matrix of T∗ w.r.to U.
(ii) ker(T) = range(T∗)⊥, and hence V = ker(T) ⊕ range(T∗).
(iii) c ∈ K is an eigenvalue of T iff c̄ is an eigenvalue of T∗ (but the eigenvectors can be different).
(iv) Let W ⊂ V be a vector subspace. Then T(W) ⊂ W ⇔ T∗(W⊥) ⊂ W⊥.

Proof. (i) Write A = [aij], and let B = [bij] be the matrix of T∗. Recall that the ijth entry of the matrix of an operator is the ith coefficient of the image of the jth basis vector. Thus, bij = ⟨T∗vj, vi⟩ = ⟨vj, T(vi)⟩ = \overline{⟨T(vi), vj⟩} = ā_ji.

(ii) For v ∈ V, we have v ∈ range(T∗)⊥ ⇔ ⟨v, T∗(w)⟩ = ⟨T(v), w⟩ = 0 ∀ w ⇔ T(v) = 0.
(iii) V = ker(T − cI) ⊕ range(T∗ − c̄I) by (ii). Hence, c is an eigenvalue of T ⇔ ker(T − cI) ≠ {0} ⇔ T∗ − c̄I is not surjective ⇔ T∗ − c̄I is not injective (∵ dim(V) < ∞) ⇔ c̄ is an eigenvalue of T∗.

(iv) T (W ) ⊂ W ⇔ ⟨T (w), z⟩ = ⟨w, T ∗(z)⟩ = 0 for every w ∈ W and z ∈ W ⊥ ⇔ T ∗(W ⊥) ⊂ W ⊥. 

Definition: (i) Let T ∈ L(V,V ). We say T is unitary if T ∗T = I, or equivalently if T −1 = T ∗. (ii) A matrix A ∈ M(n, K) is said to be unitary if A−1 = A∗.

Exercise-35: [Polarization identity - this is useful to recover the inner product from the norm] Consider our inner product space V over K.
(i) If K = R, then ⟨v, w⟩ = (1/4)(∥v + w∥^2 − ∥v − w∥^2) for every v, w ∈ V.
(ii) If K = C, then ⟨v, w⟩ = (1/4) ∑_{k=0}^3 i^k ∥v + i^k w∥^2 for every v, w ∈ V.

[136] [Fundamental theorem of unitary operators] Let T ∈ L(V,V ), and A ∈ M(n, K) be the matrix of T w.r.to an orthonormal basis {v1, . . . , vn} of V . Then the following are equivalent: (i) T is unitary. (ii) A is unitary. (iii) (T preserves distances) T is an isometry, i.e., ∥T (v)∥ = ∥v∥ for every v ∈ V . (iv) (T preserves angles) ⟨T (v),T (w)⟩ = ⟨v, w⟩ for every v, w ∈ V . (v) T maps orthonormal bases to orthonormal bases. (vi) The columns of A form an orthonormal basis for Kn. (vii) The rows of A form an orthonormal basis for Kn.

Proof. We have (i) ⇔ (ii) by [135](i). Next, (i) ⇒ (iii) is direct since ∥T (v)∥2 = ⟨T (v),T (v)⟩ = ⟨v, T ∗T (v)⟩ and T ∗T = I. For (iii) ⇒ (iv), use Exercise-35. And (iv) ⇒ (v) is easy.

Let A = [aij] and let wj be the jth column of A. To get (v) ⇒ (vi) ⇒ (ii), note that ⟨T (vj), T (vk)⟩ = ∑_{i=1}^{n} aij āik = ⟨wj, wk⟩ = wk∗wj = (A∗A)kj. Finally, the equivalence of (ii) and (vii) follows from the already established facts since A is unitary iff A^t is unitary.

Remark: (i) Unitary matrices A are computationally useful since the inverse A^{−1} can be found quickly as A^{−1} = A∗. For A ∈ M(n, R), the condition A^{−1} = A∗ reduces to A^{−1} = A^t, and matrices satisfying this are called orthogonal matrices. The columns of an orthogonal matrix form an orthonormal basis and not just an orthogonal set. The rotation matrix [[cos θ, −sin θ], [sin θ, cos θ]] ∈ M(2, R) is orthogonal since its columns form an orthonormal basis for R^2. (ii) We saw in [136] that T is unitary ⇔ T is a linear isometry when dim(V ) < ∞. However, ‘⇐’ can fail when dim(V ) = ∞ since a linear isometry may not be surjective in this case. (iii) If T is unitary, then T is an isometry and hence |c| = 1 for any eigenvalue c of T . This is because T (v) = cv for some v ≠ 0, and we also have ∥T (v)∥ = ∥v∥.
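To make the equivalences in [136] concrete, here is a minimal numerical sketch (assuming Python with NumPy is available; the angle and the test vectors are arbitrary choices, not taken from the notes). It checks that the rotation matrix above is orthogonal and that it preserves norms and inner products.

    import numpy as np

    theta = 0.7
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation matrix

    # Columns form an orthonormal basis of R^2, i.e., A^t A = I (A is orthogonal).
    print(np.allclose(A.T @ A, np.eye(2)))                          # True
    # A preserves distances and angles: ||Av|| = ||v|| and <Av, Aw> = <v, w>.
    v, w = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
    print(np.isclose(np.linalg.norm(A @ v), np.linalg.norm(v)))     # True
    print(np.isclose((A @ v) @ (A @ w), v @ w))                     # True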

Earlier we saw that the LU decomposition comes from row operations leading to the row echelon form. Similarly, ortho-upper decomposition comes from the Gram-Schmidt process.

[137] [Ortho-upper decomposition (also known as QR decomposition)] Suppose A ∈ M(m × n, K) has linearly independent columns. Then there exist Q ∈ M(m × n, K) with orthonormal columns and an invertible upper-triangular matrix C ∈ M(n, K) such that A = QC.

Proof. Let u1, . . . , un ∈ K^m be the columns of A. By Gram-Schmidt, we obtain orthonormal vectors v1, . . . , vn ∈ K^m such that span{u1, . . . , uk} = span{v1, . . . , vk} for 1 ≤ k ≤ n. Let Q ∈ M(m × n, K) be the matrix with columns v1, . . . , vn. Let cjk be so that uk = ∑_{j=1}^{k} cjkvj, and put cjk = 0 for j > k and C = [cjk] ∈ M(n, K). In fact, the cjk's can be determined explicitly: vk = (uk − ∑_{j=1}^{k−1}⟨uk, vj⟩vj)/∥uk − ∑_{j=1}^{k−1}⟨uk, vj⟩vj∥ by Gram-Schmidt, and hence uk = ∑_{j=1}^{k} cjkvj, where cjk = ⟨uk, vj⟩ for 1 ≤ j ≤ k − 1 and ckk = ∥uk − ∑_{j=1}^{k−1}⟨uk, vj⟩vj∥ ≠ 0.

Since ckk ≠ 0 for every k, the upper-triangular matrix C is invertible by Exercise-27. 

Remark: Suppose we have an ortho-upper decomposition A = QC, and assume m = n. Then Q is unitary by [136] so that Q−1 = Q∗. Now, solving a linear system AX = Y is equivalent to solving CX = Q∗Y , and the latter is easier to solve since C is upper-triangular.
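The remark above can be tried out numerically. The sketch below (a hypothetical 2 × 2 system, assuming Python with NumPy) computes the ortho-upper decomposition with np.linalg.qr (NumPy calls the upper-triangular factor R rather than C) and then solves AX = Y through the triangular system CX = Q∗Y.

    import numpy as np

    A = np.array([[2., 3.], [4., -1.]])
    y = np.array([7., 5.])
    Q, C = np.linalg.qr(A)                # A = QC, Q has orthonormal columns, C upper-triangular
    x = np.linalg.solve(C, Q.T @ y)       # solve the triangular system C x = Q* y
    print(np.allclose(A @ x, y))          # True

A dedicated triangular solver (back substitution) would exploit the structure of C better than the general-purpose np.linalg.solve used here; the point is only that C is easier to solve against than A itself.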

13. Orthogonal diagonalization of normal and self-adjoint operators

Let T : V → V be a linear operator on a finite dimensional inner product space V . When can we say that the matrix of T is diagonal w.r.to some orthonormal basis of V ? The answer depends on whether K = C or R. The two classes of operators (according to whether K = C or R) for which such a diagonalization is possible are the classes of normal and self-adjoint operators.

Definition: (i) Let V be a finite dimensional inner product space and T ∈ L(V,V ). We say T is normal if T ∗T = TT ∗, and self-adjoint if T ∗ = T . A matrix A ∈ M(n, K) is said to be normal if A∗A = AA∗, and self-adjoint (or Hermitian) if A∗ = A.

We present a table indicating a rough analogy between complex numbers and operators:

z ∈ C; z z̄ = z̄ z | normal operator; TT∗ = T∗T
z ∈ unit circle; z z̄ = 1 = z̄ z | unitary operator; TT∗ = I = T∗T
z ∈ R; z = z̄ | self-adjoint operator; T∗ = T

Example: (i) Unitary operators and self-adjoint operators are clearly normal, but there are no other implications among the three notions. This can be seen by observing that for a ∈ C, the operator T ∈ L(C, C) given by T (v) = av is always normal, is unitary iff |a| = 1, and is self-adjoint iff a ∈ R.

(ii) Diagonal matrices are normal. If A = diag(d1, . . . , dn) ∈ M(n, K), then A∗ = diag(d̄1, . . . , d̄n), and AA∗ = A∗A since any two diagonal matrices of the same order commute.

(iii) Orthogonal projections are self-adjoint by [129](iv).

First let us work out the diagonalization process over the field K = C.

[138] (i) Let V ≠ {0} be a finite dimensional inner product space over C and T ∈ L(V,V ). Then the matrix of T is upper-triangular w.r.to some orthonormal basis of V .

(ii) If A ∈ M(n, C), then there exists a unitary matrix C ∈ M(n, C) such that C∗AC is upper-triangular.

Proof. (i) Imitate the proof of [125] by using induction on n = dim(V ). (ii) Applying part (i) to TA : C^n → C^n, we get that the matrix B of TA is upper-triangular w.r.to some orthonormal basis {v1, . . . , vn} of C^n. Let C be the matrix of I : C^n → C^n w.r.to the orthonormal bases {v1, . . . , vn} on the domain and {e1, . . . , en} on the range. Then B = C^{−1}AC. Moreover C^{−1} = C∗ by the equivalence (ii) ⇔ (vi) in [136] since the jth column of C is vj.

Exercise-36: Let V be a finite dimensional inner product space and T ∈ L(V,V ) be normal. Then, (i) ∥T (v)∥ = ∥T∗(v)∥ for every v ∈ V . (ii) T v = cv ⇔ T∗v = c̄v for v ∈ V and c ∈ K. (iii) Let v, w ∈ V and c, d ∈ K. If T v = cv, T w = dw and c ≠ d, then ⟨v, w⟩ = 0. [Hint: (i) ∥T v∥^2 = ⟨T (v), T (v)⟩ = ⟨T∗T (v), v⟩ = ··· . (ii) Since T − cI is normal, ∥(T − cI)(v)∥ = ∥(T∗ − c̄I)(v)∥ by the first part. (iii) T∗w = d̄w by the second part. Now show c⟨v, w⟩ = d⟨v, w⟩.]

[139] [Spectral theorem - complex version] (i) Let A ∈ M(n, C). Then there exists a unitary matrix C ∈ M(n, C) such that C∗AC is diagonal ⇔ A is a normal matrix, i.e., A∗A = AA∗. (ii) Let V ≠ {0} be a finite dimensional inner product space over C and T ∈ L(V,V ). Then the matrix of T is diagonal w.r.to some orthonormal basis of V ⇔ T is normal.

Proof. We will prove only (ii), for then (i) can be deduced by considering TA : C^n → C^n. If the matrix of T is diag(d1, . . . , dn) w.r.to an orthonormal basis U of V , then the matrix of T∗ w.r.to U is diag(d̄1, . . . , d̄n). Since any two diagonal matrices of the same order commute, it follows that T∗T = TT∗. Conversely suppose T is normal, and choose by [138] an orthonormal basis U = {v1, . . . , vn} of V so that the matrix A = [aij] of T w.r.to U is upper-triangular. Let B = [bij] be the matrix of T∗ w.r.to U, and keep in mind that bij = āji.

Since T (v1) = a11v1, we have T∗(v1) = ā11v1 by Exercise-36. But T∗(v1) = ∑_{k=1}^{n} bk1vk = ∑_{k=1}^{n} ā1kvk, and hence a1k = 0 for 2 ≤ k ≤ n. This implies T (v2) = a12v1 + a22v2 = 0 + a22v2 so that T∗(v2) = ā22v2 by Exercise-36. But T∗(v2) = ∑_{k=1}^{n} bk2vk = ∑_{k=1}^{n} ā2kvk, and hence a2k = 0 for 3 ≤ k ≤ n. Proceed in this manner to see A is indeed a diagonal matrix.

Remark: If (a1, y1), . . . , (am, ym) ∈ K^2, and if ai ≠ aj for i ≠ j, then there is a polynomial p over K, called Lagrange's interpolation polynomial, such that deg(p) ≤ m − 1 and p(aj) = yj for 1 ≤ j ≤ m. Indeed, let p(x) = ∑_{j=1}^{m} pj(x), where pj(x) = yj ∏_{i≠j}(x − ai)(aj − ai)^{−1}.
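For readers who want to experiment, here is a small Python sketch (the data points are made up for illustration) that builds Lagrange's interpolation polynomial exactly as in the formula above, as a sum of the terms pj(x) = yj ∏_{i≠j}(x − ai)/(aj − ai).

    def lagrange_poly(points):
        """Return a function p with deg(p) <= m-1 and p(a_j) = y_j,
        built as p(x) = sum_j y_j * prod_{i != j} (x - a_i)/(a_j - a_i)."""
        a = [pt[0] for pt in points]
        y = [pt[1] for pt in points]
        def p(x):
            total = 0.0
            for j in range(len(a)):
                term = y[j]
                for i in range(len(a)):
                    if i != j:
                        term *= (x - a[i]) / (a[j] - a[i])
                total += term
            return total
        return p

    p = lagrange_poly([(1, 2), (2, 3), (4, -1)])
    print([p(t) for t in (1, 2, 4)])   # [2.0, 3.0, -1.0]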

[140] [Some more properties of normal matrices/operators] Let V be a finite dimensional inner product space over C and T ∈ L(V,V ). Then the following are equivalent. (i) T is normal. (ii) T∗ is a polynomial in T (i.e., T∗ = ∑_{j=0}^{k} ajT^j); equivalently, T∗ ∈ span{T^j : j ≥ 0}. (iii) If W ⊂ V is a vector subspace and T (W ) ⊂ W , then T∗(W ) ⊂ W and hence T (W ⊥) ⊂ W ⊥. (iv) ∥T v∥ = ∥T∗v∥ for every v ∈ V .

Proof. (i) ⇒ (ii): By [139], we can assume that the matrix of T w.r.to some orthonormal basis U of V is A = diag(c1, . . . , cn). Then, w.r.to U, the matrix of T∗ is A∗ and the matrix of p[T ] is p[A] for any polynomial p. Hence it suffices to find a polynomial p over C such that p(cj) = c̄j for

1 ≤ j ≤ n. Choose p to be the Lagrange’s interpolation polynomial by considering the distinct cj’s. Note that this proof says in fact that T ∗ = p[T ] for some polynomial p of degree ≤ n − 1.

(ii) ⇒ (iii): T^j(W ) ⊂ W for every j, and hence p[T ](W ) ⊂ W for every polynomial p; in particular T∗(W ) ⊂ W since T∗ = p[T ] for some p, and then T (W ⊥) ⊂ W ⊥ by [135](iv).

We already know (i) ⇒ (iv) by Exercise-36. For the implications (iii) ⇒ (i) and (iv) ⇒ (i), consider by [138] an orthonormal basis w.r.to which the matrix of T is upper-triangular, and then imitate the proof of [139] to show that this matrix must be diagonal. 

Remarks: (i) Let V be a finite dimensional inner product space over C. If T ∈ L(V,V ) is normal and W ⊂ V is a T -invariant vector subspace, then it follows from [140] that (T |W )∗ = T∗|W , and the operators T |W , T |W⊥ are normal. (ii) Consider T ∈ L(C^2, C^2) given by the matrix [[1, 1], [0, 1]]. Then [[1, 0], [1, 1]] represents T∗. Now, e1 is an eigenvector of T , but not of T∗; equivalently, W := span{e1} is T -invariant, but not T∗-invariant and hence W ⊥ = span{e2} is not T -invariant. These troubles are due to the fact that T is not normal. (iii) Let A, B ∈ M(n, C) be normal. In general, A + B and AB may not be normal (find examples). But, if AB = BA, then A∗B = BA∗ by [140](ii) and we can verify that A + B, AB are normal. (iv) Several conditions equivalent to normality for a matrix can be found in the article Normal matrices, Linear Algebra and its Applications, 87, (1987), 213-225.

Exercise-37: Let A ∈ M(n, C) be a normal matrix. Then, (i) [Existence of kth root] For each k ∈ N, there is B ∈ M(n, C) with B^k = A. (ii) ker(A^k) = ker(A) for every k ∈ N. In particular, A = 0 if A is also nilpotent. [Hint: By [139], there is a unitary C with D := C^{−1}AC = diag(d1, . . . , dn). (i) Let bj ∈ C be so that bj^k = dj, put B̃ = diag(b1, . . . , bn) and take B = CB̃C^{−1}. (ii) Check that ker(A^k) = ker(D^k) = ker(D) = ker(A).]
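Exercise-37(i) can be illustrated numerically along the lines of the hint: diagonalize a normal matrix by a unitary and take kth roots of the diagonal entries. In the sketch below (Python with NumPy; the unitary U and the diagonal entries d are arbitrary choices), the normal matrix is built from a known unitary so that its unitary diagonalization is available by construction.

    import numpy as np

    rng = np.random.default_rng(0)
    # Build a normal matrix A = U diag(d) U* from a known unitary U.
    U, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
    d = np.array([2.0, -1.0 + 1.0j, 3.0j])
    A = U @ np.diag(d) @ U.conj().T
    print(np.allclose(A @ A.conj().T, A.conj().T @ A))    # A is normal: True

    k = 4
    B = U @ np.diag(d ** (1.0 / k)) @ U.conj().T          # kth roots of the d_j
    print(np.allclose(np.linalg.matrix_power(B, k), A))   # B^k = A: True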

Invariance of vector subspaces can be expressed algebraically in terms of orthogonal projections:

Exercise-38: Let V be a finite dimensional inner product space, T ∈ L(V,V ), W ⊂ V be a vector subspace and P ∈ L(V,V ) be the orthogonal projection onto W . Then, (i) T (W ) ⊂ W ⇔ TP = PTP ⇔ (I − P )TP = 0.

(ii) T (W ⊥) ⊂ W ⊥ ⇔ PT (I − P ) = 0 ⇔ PT = PTP . (iii) Both W and W ⊥ are T -invariant ⇔ TP = PT . (iv) Can we use the above to give an algebraic proof of “[140](i) ⇒ [140](iii)”? (we do not know).

Next we consider the diagonalization problem over the field K = R.

Exercise-39: Let A ∈ M(n, R) be self-adjoint (i.e., symmetric). Let c ∈ C be an eigenvalue of A over C. Then c ∈ R and there is v ∈ R^n \ {0} such that Av = cv. [Hint: Suppose v ∈ C^n \ {0} and Av = cv. Then c⟨v, v⟩ = ⟨Av, v⟩ = ⟨v, Av⟩ = c̄⟨v, v⟩ so that c = c̄. If v = x + iy with x, y ∈ R^n, then Ax = cx and Ay = cy; and at least one of x, y is non-zero.]

Recall that a real unitary matrix is called an orthogonal matrix, i.e., C ∈ M(n, R) is orthogonal if C^{−1} = C^t, or equivalently C^tC = I. Also, A ∈ M(n, R) is self-adjoint iff A is symmetric.

[141] [Spectral theorem - real version] (i) Let A ∈ M(n, R). Then there exists an orthogonal matrix C ∈ M(n, R) such that CtAC is diagonal ⇔ A is symmetric/self-adjoint, i.e., At = A. (ii) Let V ≠ {0} be a finite dimensional inner product space over R and T ∈ L(V,V ). Then the matrix of T is diagonal w.r.to some orthonormal basis of V ⇔ T is self-adjoint.

Proof. The implication ‘⇒’ is easy in both (i) and (ii) since a real diagonal matrix is self-adjoint.

(ii) Suppose T is self-adjoint. We will use induction on n = dim(V ). The case n = 1 is trivial. Now assuming the result for values up to n − 1, consider V with dim(V ) = n. Let A ∈ M(n, R) be the matrix of T w.r.to an orthonormal basis {u1, . . . , un} of V . Considering A as a member of M(n, C), A has an eigenvalue d1 ∈ C by [124]. By Exercise-39, d1 ∈ R and there is a unit vector x ∈ R^n with Ax = d1x. Correspondingly, there is a unit vector v1 ∈ V with T (v1) = d1v1 (if x = (c1, . . . , cn), then v1 = ∑_{j=1}^{n} cjuj). Let W = span{v1}. Clearly T (W ) ⊂ W . Since T∗ = T , we deduce by [135](iv) that W ⊥ is T -invariant and T |W⊥ is self-adjoint. By induction assumption, W ⊥ has an orthonormal basis {v2, . . . , vn} such that the matrix of T |W⊥ w.r.to {v2, . . . , vn} is diag(d2, . . . , dn). Then {v1, . . . , vn} is an orthonormal basis for V , and the matrix of T w.r.to {v1, . . . , vn} is diag(d1, . . . , dn).

(i) Suppose A^t = A. By part (ii), the matrix D of TA : R^n → R^n is diagonal w.r.to some orthonormal basis {v1, . . . , vn} of R^n. Let C ∈ M(n, R) be the matrix of I : R^n → R^n w.r.to the bases {v1, . . . , vn} on the domain and {e1, . . . , en} on the range. Then C^{−1}AC = D by the change of basis rule, [117]. Also, C is orthogonal by [136](vi) since the jth column of C is vj.
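Numerically, the real spectral theorem [141] is available through np.linalg.eigh, which returns the eigenvalues of a symmetric matrix together with an orthogonal matrix of eigenvectors. A minimal check (the symmetric matrix below is an arbitrary example, assuming Python with NumPy):

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])          # symmetric, hence self-adjoint over R
    d, C = np.linalg.eigh(A)              # columns of C: orthonormal eigenvectors
    print(np.allclose(C.T @ C, np.eye(3)))        # C is orthogonal: True
    print(np.allclose(C.T @ A @ C, np.diag(d)))   # C^t A C is diagonal: True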

Topic for self-study: Structure of normal matrices over R.

14. Singular value decomposition

Let V be a finite dimensional inner product space over K = R or C, and T ∈ L(V,V ). In the previous section we saw that if T is normal and K = C, or if T is self-adjoint and K = R, then the matrix of T is diagonal w.r.to some orthonormal basis of V . Here we will see that the matrix of T can be made diagonal for any T provided we choose one orthonormal basis for the domain of T and possibly another orthonormal basis for the range of T . This is useful in applications.

Remarks and Definition: Let V,W be finite dimensional inner product spaces over K and let T ∈ L(V,W ). Then T∗ ∈ L(W, V ). Note that T∗T ∈ L(V,V ) is self-adjoint and in particular normal. Therefore, by [139] or [141], the matrix of T∗T is diagonal, say diag(c1, . . . , cn), w.r.to some orthonormal basis {v1, . . . , vn} of V . According to our analogy between operators and complex numbers, T∗T is like z z̄ and hence ≥ 0. To be precise, ⟨T∗T (v), v⟩ = ⟨T (v), T (v)⟩ = ∥T (v)∥^2 ≥ 0 for every v ∈ V . Hence cj = cj⟨vj, vj⟩ = ⟨T∗T (vj), vj⟩ ≥ 0 for 1 ≤ j ≤ n. Any permutation of the vj's will induce the corresponding permutation on the cj's. Replacing {v1, . . . , vn} by {vσ(1), . . . , vσ(n)} for a suitable permutation σ, we may assume that c1 ≥ · · · ≥ cr > 0 = cr+1 = · · · = cn, where r = rank(T∗T ) = rank(T ) (see [130](v)). Then we say (√c1, . . . , √cn) is the singular value sequence of T . That is, the singular value sequence of T is the sequence of square roots of eigenvalues of T∗T arranged in decreasing order with suitable multiplicity (we take square root because T∗T is like T^2). If A ∈ M(m × n, K), then the singular value sequence of A is that of TA : K^n → K^m.

[142] [Singular value decomposition] Let V,W be finite dimensional inner product spaces over K and T ∈ L(V,W ). Let r = rank(T ) > 0 and let (d1, . . . , dn) be the singular value sequence of T .

Then there exist orthonormal bases {v1, . . . , vn} of V and {w1, . . . , wm} of W such that (i) {v1, . . . , vr} is an orthonormal basis for range(T∗) = ker(T )⊥.

(ii) {w1, . . . , wr} is an orthonormal basis for range(T ). (iii) T (v) = ∑_{j=1}^{r} dj⟨v, vj⟩wj for every v ∈ V . This means the matrix B = [bij] ∈ M(m × n, K) of

T w.r.to {v1, . . . , vn} and {w1, . . . , wm} is ‘diagonal’ in the sense that bii = di and bij = 0 for i ≠ j.

Proof. Choose an orthonormal basis {v1, . . . , vn} of V so that the matrix of T∗T w.r.to this basis is diag(c1, . . . , cn), where cj = dj^2. Since vj = T∗(T (vj)/cj) for 1 ≤ j ≤ r, we see {v1, . . . , vr} is an orthonormal basis for range(T∗). Next observe that ⟨T (vj), T (vk)⟩ = ⟨T∗T (vj), vk⟩ = cj⟨vj, vk⟩.

This implies two things. First, ⟨T (vj),T (vj)⟩ = 0 and hence T (vj) = 0 for j > r since cj = 0 for j > r. Second, if we put wj = T (vj)/dj, then {w1, . . . , wr} is an orthonormal basis for range(T ).

Choose wr+1, . . . , wm so that {w1, . . . , wm} becomes an orthonormal basis for W . For any v = ∑_{j=1}^{n}⟨v, vj⟩vj, we now have T (v) = ∑_{j=1}^{n}⟨v, vj⟩T (vj) = ∑_{j=1}^{r}⟨v, vj⟩T (vj) = ∑_{j=1}^{r}⟨v, vj⟩djwj.

[142′] [Singular value decomposition - matrix version] Let A ∈ M(m × n, K), let r = rank(A) > 0, and (d1, . . . , dn) be the singular value sequence of A. Then there exist B ∈ M(m × n, K), and two unitary matrices R ∈ M(m, K), C ∈ M(n, K) with the following properties: (i) If vj ∈ K^n is the jth column of C, then {v1, . . . , vr}, {v1, . . . , vn} respectively are orthonormal bases for range(A∗) and K^n. (ii) If wk ∈ K^m is the kth column of R, then {w1, . . . , wr}, {w1, . . . , wm} respectively are orthonormal bases for range(A) and K^m.

(iii) B = [bij] is ‘diagonal’ with d1, . . . , dn as the diagonal entries in the following sense: bii = di for 1 ≤ i ≤ r, bii = 0 = di for r < i ≤ n, and bij = 0 for i ≠ j. (iv) B = R^{−1}AC so that A = RBC^{−1} = RBC∗.

Proof. Choose {v1, . . . , vn}, {w1, . . . , wm} for TA by [142] so that we get the expression TA(v) = ∑_{j=1}^{r} dj⟨v, vj⟩wj. Let C be the matrix with columns v1, . . . , vn, and R be the matrix with columns w1, . . . , wm. Define B as required in (iii). Now we have Avk = TA(vk) = dkwk. Since Cek = vk and C^{−1} = C∗, we see C∗vk = ek. Hence RBC∗vk = RBek = R(dkek) = dkRek = dkwk. Thus Avk = RBC∗vk for 1 ≤ k ≤ n and hence A = RBC∗. Other properties follow by [142].
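The matrix version [142′] corresponds to np.linalg.svd, which returns the factor R, the singular values, and C∗ directly. A small sketch with a made-up 2 × 3 matrix (assuming Python with NumPy):

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 1.]])                   # 2 x 3, rank 2
    R, s, Ch = np.linalg.svd(A)                    # A = R B C*, NumPy returns C* as the last factor
    B = np.zeros_like(A)
    B[:len(s), :len(s)] = np.diag(s)               # 'diagonal' m x n matrix of singular values
    print(s)                                       # singular value sequence, in decreasing order
    print(np.allclose(A, R @ B @ Ch))              # A = R B C*: True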

15. Determinant of a square matrix: properties

Let us leave the realm of inner product spaces and go back to the setting of matrices over an arbitrary field F . We are going to define the determinant of a square matrix. Roughly speaking, the determinant of a square matrix A ∈ M(n, R) gives the signed volume of the parallelepiped in R^n specified by the n row vectors (equivalently, the n column vectors) of A. To motivate the definition of determinant, let us look at the areas of parallelograms in R^2.

Let P (u, v) denote a parallelogram in R2 whose sides are the vectors u, v ∈ R2, and let |P (u, v)| denote its area. By drawing figures, one may verify geometrically the following: (i) If c > 0, then |P (cu, v)| = |P (u, cv)| = c|P (u, v)|. (ii) |P (u + v, v)| = |P (u, u + v)| = |P (u, v)|. Using (i) and (ii), we may deduce: (iii) If c > 0, then |P (u + cv, v)| = (1/c)|P (u + cv, cv)| = (1/c)|P (u, cv)| = |P (u, v)|, and similarly |P (u, cu + v)| = |P (u, v)|.

Definition: Let F be a field. A function det : M(n, F ) → F is called a determinant function if it satisfies the following three axioms for every A, B ∈ M(n, F ): (i) If B is obtained by multiplying a row of A by c ∈ F , then det(B) = c · det(A). (ii) If B is obtained by adding one row of A to another row of A, then det(B) = det(A).

(iii) det(I) = 1.

The uniqueness and existence of the determinant function will be proved shortly. First we deduce some of its properties, by examining how a determinant function changes when a matrix is multiplied by an elementary matrix. Recall the three types of elementary matrices E(i, k), E(c ∗ i), E(i + c ∗ k) defined in section 7 of these notes. For convenience, add E(0 ∗ i) and E(i + 0 ∗ k) to the list, keeping in mind that E(0 ∗ i) is not invertible and is not an elementary matrix.

[143] Let F be a field, det : M(n, F ) → F be a determinant function, c ∈ F , and A ∈ M(n, F ). (i) [Type-2 effect] det(E(c ∗ i)A) = c · det(A); in particular (by taking A = I), det(E(c ∗ i)) = c. (ii) If some row of A is identically zero, then det(A) = 0. (iii) [Type-3 effect] det(E(i + c ∗ k)A) = det(A); in particular det(E(i + c ∗ k)) = 1 for i ≠ k. (iv) If two rows of A are the same, then det(A) = 0. (v) [Type-1 effect] det(E(i, k)A) = −det(A); in particular det(E(i, k)) = −1 for i ≠ k. (vi) det(cA) = cn · det(A). (vii) det(E) ≠ 0 and det(EA) = det(E)det(A) for every elementary matrix E ∈ M(n, F ).

Proof. Let u1, . . . , un ∈ F^n be the rows of A. Note that (i) is just the first axiom.

(ii) If ui = 0, then A = E(0 ∗ i)A and hence det(A) = det(E(0 ∗ i)A) = 0 · det(A) = 0 by (i).

(iii) We may assume c ≠ 0 and i < k. Writing det as a function of the rows and displaying only the ith and kth rows, we have det(E(i + c ∗ k)A) = det(..., ui + cuk, ..., uk, ...) = −c^{−1} · det(..., ui + cuk, ..., −cuk, ...) = −c^{−1} · det(..., ui, ..., −cuk, ...) = det(..., ui, ..., uk, ...) = det(A), where we used part (i), the second axiom, and part (i) respectively.

(iv) If ui = uk, then 0 = det(E(i + (−1) ∗ k)A) = det(A) by (ii) and (iii).

(v) Assume i < k. By parts (iii) and (i), we have det (E(i, k)A) =   ......                     u  u + u  u + u  u + u   u   k   i k  i k  i k  i            det ... = det  ...  = det  ...  = −det  ...  = −det ... = −det(A).                      ui   ui   −uk   uk  uk  ...... (vi) Each of the n rows of A gets multiplied by c. Apply part (i) repeatedly. LINEAR ALGEBRA 53

(vii) This follows from (i), (iii), and (v). 

[144] Let F be a field, det : M(n, F ) → F be a determinant function, and A ∈ M(n, F ). (i) If R ∈ M(n, F ) is invertible, then det(RA) = det(R)det(A). (ii) [Invertibility criterion] A is invertible ⇔ det(A) ≠ 0. (iii) [Product rule] If B ∈ M(n, F ), then det(BA) = det(B)det(A). (iv) If A is invertible, then det(A^{−1}) = (det(A))^{−1}. (v) [Conjugacy invariance] If C ∈ M(n, F ) is invertible, then det(C^{−1}AC) = det(A). (vi) [Uniqueness] If det′ : M(n, F ) → F is any determinant function, then det′(A) = det(A).

Proof. (i) R is a finite product of elementary matrices by [119]. Now use [143](vii) and induction.

(ii) If A is invertible, I = A−1A and hence 1 = det(I) = det(A−1A) = det(A−1)det(A) by (i), and thus det(A) ≠ 0. If A is not invertible, then rank(A) < n so that by elementary row operations we can find a Π-elementary (hence invertible) matrix R such that the last row of RA is zero. Hence 0 = det(RA) = det(R)det(A) by [143](ii) and part (i). Moreover, by what is proved above, det(R) ≠ 0 since R is invertible. Thus det(A) = 0.

(iii) If B is invertible, use (i). If B is not invertible, then BA is also not invertible by Exercise-18(ii) or by Exercise-21(i), and therefore det(BA) = 0 = det(B)det(A) by (ii).

(iv) We have 1 = det(I) = det(A^{−1}A) = det(A^{−1})det(A) and hence det(A^{−1}) = (det(A))^{−1}.

(v) det(C−1AC) = det(C−1)det(A)det(C) and det(C−1) = (det(C))−1.

(vi) If A is not invertible, then det′(A) = 0 = det(A) by (ii). If A is invertible, write A as a product of elementary matrices, A = E1 · · · Em, by [119]. Since det′(Ej) = det(Ej) for every j by [143], we conclude det′(A) = det(A) by the product rule.

[145] Let F be a field, det : M(n, F ) → F be a determinant function, and A ∈ M(n, F ).

(i) If A = diag(d1, . . . , dn), then det(A) = d1 ··· dn.

(ii) If A is triangular with diagonal entries d1, . . . , dn, then det(A) = d1 ··· dn.

Proof. (i) diag(d1, . . . , dn) = E(d1 ∗ 1) · · · E(dn ∗ n) and det(E(dj ∗ j)) = dj. Use the product rule.

(ii) If some dj = 0, then A is not invertible by Exercise-27 so that det(A) = 0 by [144](ii). If d1, . . . , dn are non-zero, then there exist type-3 elementary matrices Ej such that Ek · · · E1A = diag(d1, . . . , dn). Also det(Ej) = 1 by [143](iii). Again use product rule.

16. Determinant: existence and expressions

To prove the existence of the determinant function, it is convenient to show that there is another set of axioms specifying a determinant function:

[146] [Another approach to the determinant] Let F be a field and det : M(n, F ) → F be a function with det(I) = 1. Then det is a determinant function iff the following two properties hold: (i) If two rows of A ∈ M(n, F ) are equal, then det(A) = 0. (ii) det is multilinear. That is, if det is thought of as an n-variable function with the rows of the matrix as variables, then det is linear in each variable when the other variables are kept fixed.

Proof. Suppose det : M(n, F ) → F is a determinant function. Then (i) holds by [143](iv). To establish (ii), in view of the first axiom of the determinant, it suffices to prove the additivity part of linearity. Moreover, it suffices to consider the nth row only, since we can interchange any other row with the nth row as per [143](v). Consider A, P, Q ∈ M(n, F ). Assume that the ith rows of A, P, Q are the same for 1 ≤ i ≤ n − 1, and the nth row of A is the sum of the nth rows of P and Q. We have to show det(A) = det(P ) + det(Q). First consider the case where A is lower-triangular. Since there is no condition on the last row for a matrix to be lower-triangular, we see that P, Q are also lower-triangular. By assumption, aii = pii = qii for 1 ≤ i < n and ann = pnn + qnn. Hence det(A) = det(P ) + det(Q) by [145](ii). In the general case, by elementary column operations find an invertible matrix C ∈ M(n, F ) so that AC is lower-triangular. By the way matrix multiplication acts on rows, the ith rows of AC, PC, QC are the same for 1 ≤ i ≤ n − 1 (in particular, PC, QC are lower-triangular), and the nth row of AC is the sum of the nth rows of PC and QC. By the first case, det(AC) = det(PC) + det(QC). Now use the product rule and the fact that det(C) ≠ 0.

Conversely assume det : M(n, F ) → F is any function satisfying det(I) = 1 and properties (i) and (ii). The first axiom of the determinant is clear by (ii). To prove the second axiom, consider A ∈ M(n, F ), and let B be the matrix obtained by adding the kth row of A to the ith row of A. If Ã is the matrix obtained by replacing the ith row of A by the kth row of A, then det(B) = det(A) + det(Ã) by (ii) and det(Ã) = 0 by (i) so that det(B) = det(A).

Definition: Let Sn be the permutation group on {1, . . . , n}. As mentioned earlier, any permutation

σ ∈ Sn can be written as a finite product of transpositions, σ = β1 ··· βk (recall that a transposition is a permutation that interchanges just two entries while keeping the others fixed). We say σ is an even or odd permutation according to whether k is even or odd. It can be proved that if

σ = δ1 · · · δm is another way of writing σ as a product of transpositions, then m is even iff k is even so that our definition of even and odd permutations is meaningful (see some textbook to learn more about permutations). The sign of σ, denoted as sign(σ), is defined as 1 if σ is even, and −1 if σ is odd. That is, sign(σ) = (−1)^k if σ = β1 · · · βk is a product of transpositions.

Exercise-40: Let F be a field, det : M(n, F ) → F be a determinant function, and A ∈ M(n, F ).

If Aσ is the matrix obtained by permuting the rows of A according to σ ∈ Sn, then det(Aσ) = sign(σ)det(A). In particular, det(Iσ) = sign(σ). [Hint: σ is a product of transpositions.]

[147] [Existence and uniqueness of the determinant] Let F be a field and det : M(n, F ) → F be an arbitrary function. Then the following are equivalent: (i) det(A) = ∑_{σ∈Sn} sign(σ) a_{1σ(1)} · · · a_{nσ(n)} for every A = [aij] ∈ M(n, F ). (ii) det is a determinant function.

Proof. (i) ⇒ (ii): We will use [146] to show det is a determinant function. Easy to see det(I) = 1 since a_{1σ(1)} · · · a_{nσ(n)} = 0 if σ ∈ Sn is not the identity permutation. Each term a_{1σ(1)} · · · a_{nσ(n)} appearing in the expression for det is multilinear in the rows, and hence det is multilinear. Now suppose the ith and kth rows of A ∈ M(n, F ) are equal. Let β ∈ Sn be the transposition interchanging i and k. Since the map σ ↦ σβ is a bijection of Sn, we have det(A) = ∑_{σ∈Sn} sign(σβ) a_{1σβ(1)} · · · a_{nσβ(n)}.

But a_{lσβ(l)} = a_{lσ(l)} for l ∉ {i, k}. Also, a_{kσβ(k)} = a_{kσ(i)} = a_{iσ(i)} and a_{iσβ(i)} = a_{iσ(k)} = a_{kσ(k)} since the ith and kth rows of A are equal. Hence a_{1σβ(1)} · · · a_{nσβ(n)} = a_{1σ(1)} · · · a_{nσ(n)}. Moreover, sign(σβ) = −sign(σ). Thus, grouping the permutations into the pairs {σ, σβ}, the two terms of each pair cancel, and therefore det(A) = 0, proving [146](i).

(ii) ⇒ (i): Let det′(A) = ∑_{σ∈Sn} sign(σ) a_{1σ(1)} · · · a_{nσ(n)}, which is a determinant function by the above paragraph. By the uniqueness proved in [144](vi), we have det = det′, completing the proof. We also give another argument to clarify why the particular expression in (i) arises. Let det : M(n, F ) → F be a determinant function. Suppose B ∈ M(n, F ) has rows e_{q1}, . . . , e_{qn}. If qi = qk for some i ≠ k, then det(B) = det(e_{q1}, . . . , e_{qn}) = 0 by [146](i). If the qi's are distinct, then we may write qi = σ(i) for some σ ∈ Sn. Then by Exercise-40, det(B) = det(e_{σ(1)}, . . . , e_{σ(n)}) = det(Iσ) = sign(σ). Now for a general A = [aij] ∈ M(n, F ) with rows u1, . . . , un, writing ui = ∑_{qi=1}^{n} a_{iqi}e_{qi} and using multilinearity, we can derive the expression in (i) for det(A).

Remark: Now onwards, we write det(A) for the determinant of a square matrix A over a field. Note that in the product a_{1σ(1)} · · · a_{nσ(n)}, there is one entry from each row and similarly, one entry from each column. The following may be verified using the expression for the determinant from [147]. If A = [a]_{1×1}, then det(A) = a. If A = [[a11, a12], [a21, a22]], then det(A) = a11a22 − a12a21. If A = [aij]_{3×3}, then det(A) = a11(a22a33 − a23a32) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31).
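The permutation-sum expression in [147](i) can be evaluated literally for small matrices; this is hopeless for large n (there are n! terms) but useful for checking the formula. Below is a short Python sketch (the matrix is an arbitrary example) that computes sign(σ) by counting inversions and compares the resulting value with np.linalg.det.

    import numpy as np
    from itertools import permutations

    def sign(sigma):
        """Sign of a permutation given as a tuple (sigma(0), ..., sigma(n-1))."""
        n, s = len(sigma), 1
        for i in range(n):
            for j in range(i + 1, n):
                if sigma[i] > sigma[j]:
                    s = -s          # each inversion flips the sign
        return s

    def det_by_formula(A):
        """det(A) = sum over permutations of sign(sigma) * a_{1 sigma(1)} ... a_{n sigma(n)}."""
        n = len(A)
        return sum(sign(sigma) * np.prod([A[i][sigma[i]] for i in range(n)])
                   for sigma in permutations(range(n)))

    A = np.array([[2., 3., 1.], [0., -1., 4.], [5., 2., 2.]])
    print(det_by_formula(A), np.linalg.det(A))   # the two values agree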

Remark: Let F be a field. Then det : M(n, F ) → F is not linear, and in particular det(A + B) ≠ det(A) + det(B) in general. What we have is, det is multilinear when considered as a function of n variables, the variables being the rows of the matrix. Now, if det is considered as a function of n^2 variables with the n^2 entries of the matrix as variables, then det is a homogeneous polynomial of degree n over F . Assume F = R or C, and identify M(n, F ) with F^{n^2}. Then, being a polynomial, det is continuous. This implies that {A ∈ M(n, F ) : A is invertible} is open in M(n, F ), being the pre-image of the open set F \ {0} under the continuous map det. This means that if A = [aij] ∈

M(n, F ) is invertible and if δ > 0 is sufficiently small, then for any B = [bij] ∈ M(n, F ) with

|aij − bij| < δ for every i, j, we have that B is invertible. For example, given A ∈ M(n, F ), there is δ > 0 such that I + cA is invertible for every c ∈ F with |c| < δ. It can also be shown (using, say [148](ii) below) that {A ∈ M(n, F ): A is invertible} is dense in M(n, F ), which means the entries of any B ∈ M(n, F ) can be perturbed slightly to get an invertible matrix A.

Exercise-41: Let A, B ∈ M(n, R). Then A is conjugate to B over R ⇔ A is conjugate to B over C. [Hint: ⇐: Suppose C = P + iQ ∈ M(n, C) is invertible and C^{−1}AC = B, or AC = CB. Then AP = PB and AQ = QB by comparing real and imaginary parts. Hence A(P + zQ) = (P + zQ)B for every z ∈ C. The polynomial p(z) := det(P + zQ) is not identically zero since p(i) = det(C) ≠ 0. So there is d ∈ R such that p(d) ≠ 0. Then D := P + dQ ∈ M(n, R) is invertible and AD = DB.]

Exercise-42: (i) det(A^t) = det(A) over any field. (ii) det(A∗) is the complex conjugate of det(A), det(A∗A) ≥ 0, and det(AA∗) ≥ 0 over R and C. [Hint: (i) det(A^t) = ∑_{σ∈Sn} sign(σ) a_{σ(1)1} · · · a_{σ(n)n}. Now a_{σ(i)i} = a_{σ(i)σ^{−1}(σ(i))} so that a_{σ(1)1} · · · a_{σ(n)n} = a_{1σ^{−1}(1)} · · · a_{nσ^{−1}(n)}. Also σ ↦ σ^{−1} is a bijection of Sn with sign(σ^{−1}) = sign(σ).]

Remark: We know from [143] how the elementary row operations affect the determinant. Thanks to Exercise-42(i), now we also know how the elementary column operations affect the determinant. For instance, we can see that det(A) = 0 if two columns of A are the same.

Example: det may be used to check linear independence in certain cases. Let F be a field. Then v1, . . . , vn ∈ F^n are linearly independent iff the matrix A ∈ M(n, F ) whose rows (or columns) are v1, . . . , vn is invertible. Consider u = (1, 2, −4), v = (4, 1, 5), w = (3, 2, 0) in R^3. Note that none of u, v, w is a scalar multiple of another. Does it imply the vectors are linearly independent? Definitely not. Let A = [[1, 2, −4], [4, 1, 5], [3, 2, 0]] be the matrix with rows u, v, w. We have det(A) = 1 × (0 − 10) − 2 × (0 − 15) + (−4) × (8 − 3) = −10 + 30 − 20 = 0 so that A is not invertible by [144](ii). Hence u, v, w are linearly dependent!

Example: [Gram matrix] Let F = R or C. The Gram matrix G ∈ M(n, F ) of vectors v1, . . . , vn ∈ F^n is defined as G = [⟨vi, vj⟩]. This matrix turns up in certain applications. If A ∈ M(n, F ) has rows

v1, . . . , vn, then G = AA∗ and hence det(G) ≥ 0 by Exercise-42(ii). We have det(G) ≠ 0 ⇔ G = AA∗ is invertible ⇔ v1, . . . , vn (the rows of A) are linearly independent, by [130](vi). A direct argument is also possible for this. If det(G) = 0, then G is not invertible. Let w = (c1, . . . , cn) ∈ F^n \ {0} be with Gw = 0. Now ∥∑_{i=1}^{n} civi∥^2 = ⟨∑_{i=1}^{n} civi, ∑_{j=1}^{n} cjvj⟩ = ∑_{j=1}^{n}(∑_{i=1}^{n} ci⟨vi, vj⟩)cj = ⟨Gw, w⟩ = 0, showing v1, . . . , vn are linearly dependent. Conversely if v1, . . . , vn are linearly dependent, there is w = (c1, . . . , cn) ∈ F^n \ {0} with ∑_{i=1}^{n} civi = 0. Then Gw = 0 since the jth entry of Gw is ∑_{i=1}^{n} ci⟨vi, vj⟩. So, G is not invertible and det(G) = 0.

Example: [Vandermonde matrix] Let F be a field. The matrix A ∈ M(n, F ) whose ith row is (1, ai, ai^2, . . . , ai^{n−1}) is called a Vandermonde matrix if a1, . . . , an ∈ F are distinct. Recall the appearance of such matrices in our discussion on least square polynomials. Using induction on n, we are going to show det(A) = ∏_{1≤i<j≤n}(aj − ai) ≠ 0 (so a Vandermonde matrix is invertible). Performing the elementary operations R2 − R1, R3 − R1, . . . , Rn − R1, Cn − a1Cn−1, . . . , C3 − a1C2, C2 − a1C1 in that order on A, we see det(A) = c · det(A′), where c = ∏_{1<i≤n}(ai − a1) and A′ ∈ M(n, F ) is the matrix whose first row is (1, 0, . . . , 0), whose first column is (1, 0, . . . , 0)^t, and whose remaining (n − 1) × (n − 1) block is the Vandermonde matrix of a2, . . . , an (the factor (ai − a1) has been pulled out of the ith row for 2 ≤ i ≤ n using [143](i)). Hence det(A′) equals the Vandermonde determinant of a2, . . . , an, and the induction hypothesis gives det(A) = ∏_{1<i≤n}(ai − a1) · ∏_{2≤i<j≤n}(aj − ai) = ∏_{1≤i<j≤n}(aj − ai).
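The Vandermonde formula just derived is easy to test numerically: np.vander builds the matrix whose ith row is (1, ai, ai^2, . . .), and the product ∏_{i<j}(aj − ai) can be formed directly. A quick check with arbitrary distinct real numbers (assuming Python with NumPy):

    import numpy as np
    from itertools import combinations

    a = np.array([2.0, -1.0, 3.0, 0.5])
    A = np.vander(a, increasing=True)        # ith row is (1, a_i, a_i^2, a_i^3)
    lhs = np.linalg.det(A)
    rhs = np.prod([a[j] - a[i] for i, j in combinations(range(len(a)), 2)])
    print(np.isclose(lhs, rhs))              # det(A) = prod_{i<j} (a_j - a_i): True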

The determinant can also be expressed in terms of cofactors and minors. We indicate the main points below. The student may read the proofs and other details from some textbook (for instance, Chapter 3 of Linear Algebra by H.E.Rose).

Definition: Let F be a field and A ∈ M(n, F ). For any i ∈ {1, . . . , n}, we may write det(A) = ∑_{j=1}^{n} aijcij by grouping together the terms involving aij in the expression for det. Here cij ∈ F is called the ijth cofactor of A. The (n − 1) × (n − 1) matrix obtained by deleting the ith row and jth column of A is called the ijth minor of A and is denoted as Âij.

[148] Let F be a field and A ∈ M(n, F ). Let cij and Âij be as above. Then, (i) cij = (−1)^{i+j} det(Âij). (ii) [Expansion along the ith row] If 1 ≤ i ≤ n, then det(A) = ∑_{j=1}^{n} (−1)^{i+j} aij det(Âij). (iii) [Expansion along the jth column] If 1 ≤ j ≤ n, then det(A) = ∑_{i=1}^{n} (−1)^{i+j} aij det(Âij). (iv) [Expression for A^{−1}] We have A[cij]^t = [cij]^t A = det(A)I so that A^{−1} = det(A)^{−1}[cij]^t.

Proof. Left as a reading assignment.
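Although the proof of [148] is left as reading, the statements themselves are easy to test. The sketch below (Python with NumPy, with an arbitrary 3 × 3 example) forms the cofactors cij = (−1)^{i+j}det(Âij) by deleting a row and a column, and then checks the identity A[cij]^t = det(A)I of [148](iv). This is only a check of the formulas, not an efficient way to invert a matrix.

    import numpy as np

    def cofactor_matrix(A):
        """Matrix of cofactors c_ij = (-1)^(i+j) det(A-hat_ij), as in [148](i)."""
        n = A.shape[0]
        C = np.zeros_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)  # delete row i, column j
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
    C = cofactor_matrix(A)
    print(np.allclose(A @ C.T, np.linalg.det(A) * np.eye(3)))      # A [c_ij]^t = det(A) I: True
    print(np.allclose(np.linalg.inv(A), C.T / np.linalg.det(A)))   # A^{-1} = det(A)^{-1} [c_ij]^t: True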

Seminar topic: Cramer's rule (the importance of this rule in solving linear equations is only theoretical since calculating the determinant is practically difficult for large matrices).

Exercise-43: Let F be a field and Q ∈ M(n, F ). Suppose Q = [[A, B], [C, D]] in block form. (i) If A, D are square matrices and B (or C) is 0, then det(Q) = det(A)det(D). (ii) If B, C ∈ M(m, F ) are square (so n = 2m) and A (or D) is 0, then det(Q) = (−1)^m det(B)det(C). (iii) Find an example where A, B, C, D are square, but det(Q) ≠ det(A)det(D) − det(B)det(C).

[Hint: Let Q = [qij]. If σ ∈ Sn, we know q1σ(1) ··· qnσ(n) has exactly one element from each row of

Q, and similarly exactly one element from each column of Q. (i) Deduce that q1σ(1) ··· qnσ(n) has an element from B iff q1σ(1) ··· qnσ(n) has an element from C. (iii) Elements from both A and C may appear in q1σ(1) ··· qnσ(n) for some σ, but this is not accounted for in det(A)det(D)−det(B)det(C).]

17. Minimal and characteristic polynomials

Two polynomials, called the minimal and characteristic polynomial, can be associated to a square matrix A. Analyzing these polynomials gives valuable information about A, and about the linear operator specified by A. We will need a few facts about the polynomial ring F [x]. The ring F [x] behaves in many respects like the ring Z. So the student may first learn the proofs of various properties of the ring Z to get a better understanding of the proofs about F [x].

A monic polynomial is a polynomial whose leading coefficient is 1.

[149] Let F be a field and let F [x] denote the commutative ring of all polynomials over F . Then F [x] is a principal ideal domain. This means the following. Suppose J ⊂ F [x] is a non-zero proper ideal, i.e., a proper subset with J ≠ {0} such that J + J ⊂ J and F [x]J = {fg : f ∈ F [x], g ∈ J} ⊂ J. Then there is a monic polynomial g ∈ J generating J, i.e., J = {fg : f ∈ F [x]}.

Proof. If J contains a non-zero constant polynomial, say c, then the constant polynomial 1 = c^{−1}c ∈ F [x]J ⊂ J. Hence f = f · 1 ∈ J for every f ∈ F [x], or J = F [x], a contradiction. Hence J does not contain non-zero constant polynomials. Let g be a monic polynomial in J of smallest degree. If p ∈ J, then by division, write p = fg + r, where f, r ∈ F [x] and either r = 0 or deg(r) < deg(g). Then r = p − fg ∈ J since J is an ideal. By the choice of g (a suitable scalar multiple of a non-constant r would be a monic member of J of degree less than deg(g)), r must be a constant polynomial. By the first paragraph, r ≡ 0, and thus p = fg.

Exercise-44: Let F be a field and A ∈ M(n, F ). Then J := {p ∈ F [x] : p[A] = 0 ∈ M(n, F )} is an ideal of F [x] different from {0} and F [x], where p[A] = ∑_{j=0}^{m} cjA^j if p(x) = ∑_{j=0}^{m} cjx^j. [Hint: To see J ≠ {0}, note dim(M(n, F )) = n^2 so that I, A, A^2, . . . , A^{n^2} are linearly dependent.]

Definition: Let F be a field and A ∈ M(n, F ). By Exercise-44 and [149], there is a monic polynomial g ∈ F [x] generating the ideal J = {p ∈ F [x] : p[A] = 0}. Clearly g is unique. We call g the minimal polynomial of A. In other words, the minimal polynomial of A is the unique monic polynomial g of minimum degree over F for which g[A] = 0. The polynomial h(x) := det(xI − A) is called the characteristic polynomial of A.

Example: (i) Consider A = [[0, 1], [0, 0]] over C. The characteristic polynomial of A is det(xI − A) = x^2. Since A^2 = 0, the polynomial x^2 kills A. Hence x^2 belongs to J = {p ∈ C[x] : p[A] = 0}. Therefore, the minimal polynomial of A should divide x^2. As A ≠ 0, the polynomial x does not kill A. We conclude that the minimal polynomial of A is also x^2. Even though C is algebraically closed, the minimal polynomial does not split into distinct linear factors.

(ii) Consider A = [[0, 0, 0], [0, 0, 0], [0, 0, 1]] over C. The characteristic polynomial of A is det(xI − A) = x^2(x − 1). Since A^2 = A, the polynomial x^2 − x = x(x − 1) kills A. Therefore, the minimal polynomial of A should divide x(x − 1). Since neither x nor x − 1 kills A, x(x − 1) is the minimal polynomial of A. The minimal and characteristic polynomials of A are different but they have the same roots.
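The characterization of the minimal polynomial through the ideal J of Exercise-44 suggests a direct (if naive) computation: find the smallest k for which A^k lies in the span of I, A, . . . , A^{k−1} and read off the coefficients. The Python sketch below does this with least squares over R or C (a floating-point approximation, not exact arithmetic over a general field), and reproduces the answer x(x − 1) for the matrix of Example (ii).

    import numpy as np

    def minimal_polynomial_coeffs(A, tol=1e-9):
        """Coefficients (constant term first, leading 1 last) of the monic minimal
        polynomial of A: find the smallest k with A^k in span{I, A, ..., A^{k-1}}."""
        n = A.shape[0]
        powers = [np.eye(n).flatten()]
        Ak = np.eye(n)
        for k in range(1, n * n + 1):
            Ak = Ak @ A
            M = np.column_stack(powers)                      # columns vec(I), ..., vec(A^{k-1})
            c = np.linalg.lstsq(M, Ak.flatten(), rcond=None)[0]
            if np.linalg.norm(M @ c - Ak.flatten()) < tol:   # A^k = sum_j c_j A^j
                return np.append(-c, 1.0)                    # x^k - sum_j c_j x^j
            powers.append(Ak.flatten())

    A = np.array([[0., 0., 0.], [0., 0., 0.], [0., 0., 1.]])
    print(minimal_polynomial_coeffs(A))    # coefficients of x^2 - x, constant term first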

[150] [Eigenvalues are roots] Let F be a field, A ∈ M(n, F ) and let g, h respectively be the minimal and characteristic polynomials of A. Then, the following are equivalent for c ∈ F . (i) g(c) = 0. (ii) h(c) = 0. (iii) c is an eigenvalue of A over F .

Proof. (i) ⇒ (ii): If g(c) = 0, then we have g(x) = (x − c)p(x) over F , and p[A] ≠ 0 by the minimality of g. Let v ∈ F^n be with p[A]v ≠ 0. From 0 = g[A]v = (A − cI)p[A]v, we conclude A − cI is not invertible. Hence h(c) = det(cI − A) = 0 by [144](ii).

(ii) ⇒ (iii): h(c) = det(cI − A) = 0 ⇒ cI − A is not invertible ⇒ c is an eigenvalue of A.

(iii) ⇒ (i): Let Av = cv, v ≠ 0. Then g[A]v = g(c)v. Hence g(c) = 0 as g[A] = 0 and v ≠ 0. 

Remark: (i) From the last line of the above proof we note that if c is an eigenvalue of A ∈ M(n, F ) and g is any polynomial over F , then g(c) is an eigenvalue of g[A]. (ii) Let F be an algebraically closed field and suppose 0 is the only eigenvalue of A ∈ M(n, F ). Then the minimal polynomial of A must be x^k for some k ∈ N by [150], and hence A is nilpotent.

Exercise-45: Let F be a field. Let gA and hA denote respectively the minimal and characteristic polynomials of A ∈ M(n, F ). Then,

(i) g_{A^t} = gA and h_{A^t} = hA. When F = R or C, we also have g_{A∗} = ḡA and h_{A∗} = h̄A, where ḡA denotes the polynomial whose coefficients are the complex conjugates of the coefficients of gA. (ii) When A is invertible, h_{A^{−1}}(x) = (det(A))^{−1}(−x)^n hA(1/x), and g_{A^{−1}}(x) = gA(0)^{−1} x^k gA(1/x), where k = deg(gA). In particular, deg(g_{A^{−1}}) = deg(gA). [Hint: (i) (f[A])^t = f[A^t] and (f[A])∗ = f̄[A∗] for any polynomial f. Use the property det(B^t) = det(B) to get h_{A^t} = hA. (ii) det(A)h_{A^{−1}}(x) = det(A)det(xI − A^{−1}) = det(xA − I) = (−x)^n det(x^{−1}I − A) = (−x)^n hA(1/x). Suppose gA(x) = x^k + c_{k−1}x^{k−1} + · · · + c1x + c0. Note by [150] that c0 = gA(0) ≠ 0 as 0 is not an eigenvalue of the invertible matrix A. Now, 0 = c0^{−1}A^{−k} · 0 = c0^{−1}A^{−k}gA[A] = c0^{−1}I + c0^{−1}c_{k−1}A^{−1} + · · · + c0^{−1}c1A^{−(k−1)} + A^{−k}. Since the roles of A and A^{−1} may be interchanged, deduce that g_{A^{−1}}(x) = x^k + c0^{−1}c1x^{k−1} + · · · + c0^{−1}c_{k−1}x + c0^{−1} = gA(0)^{−1} x^k gA(1/x).]

Exercise-46: Let F be a field, A, C ∈ M(n, F ) and assume C is invertible. Then, (i) The minimal polynomials of A and C^{−1}AC are the same. (ii) The characteristic polynomials of A and C^{−1}AC are the same. [Hint: (i) C^{−1}(∑_j bjA^j)C = ∑_j bj(C^{−1}AC)^j. (ii) det(xI − C^{−1}AC) = det(C^{−1}xIC − C^{−1}AC) = det(C^{−1}(xI − A)C).]

Definition: Let V be a finite dimensional vector space, T ∈ L(V,V ), and A ∈ M(n, F ) be the matrix of T w.r.to some basis of V . We define the minimal/characteristic polynomial of T as that of A. This is meaningful in view of Exercise-46 since a change of basis produces conjugate matrices.

Exercise-47: Let F1 ⊂ F2 be fields. Then, (i) v1, . . . , vm ∈ F1^n are linearly independent over F1 ⇔ v1, . . . , vm (thought of as members of F2^n) are linearly independent over F2.

(ii) Let f, g ∈ F1[x]. Then f divides g in F1[x] ⇔ f divides g in F2[x]. [Hint: (i) ⇒: Let ∑_{i=1}^{m} civi = 0, ci ∈ F2. Let {dj : j ∈ J} be a basis of F2 over F1. There is a finite set J0 ⊂ J and aij ∈ F1 such that ci = ∑_{j∈J0} aijdj. Hence ∑_{i=1}^{m} civi = ∑_{j∈J0} dj(∑_{i=1}^{m} aijvi) = 0 ∈ F2^n. Considering the kth coordinate, ∑_{j∈J0}(∑_{i=1}^{m} aijvi)_k dj = 0 ∈ F2 for 1 ≤ k ≤ n, implying ∑_{i=1}^{m} aijvi = 0 for each j ∈ J0 as the dj's are linearly independent over F1. So aij = 0, and hence ci = 0.

(ii) ⇐: Let q ∈ F2[x] be with qf = g, and let p, r ∈ F1[x] be such that g = pf + r, where r = 0 or deg(r) < deg(f). Then r = (q − p)f, implying p = q and r = 0.]

[151] Let F1 ⊂ F2 be fields and A ∈ M(n, F1). Let gi and hi respectively be the minimal and characteristic polynomials of A over Fi for i = 1, 2. Then, g1 = g2 and h1 = h2.

Proof. We have h2(x) = det(xI − A) = h1(x) by definition. Now identify M(n, Fi) with Fi^{n^2} for i = 1, 2, and let k = deg(g1). Then k ∈ N is the smallest such that {I, A, . . . , A^k} is linearly dependent over F1. By Exercise-47(i), k ∈ N is the smallest with {I, A, . . . , A^k} linearly dependent over F2, and hence deg(g2) = k = deg(g1). Since g1, g2 are monic, we see g1 − g2 is a polynomial of degree < k over F2 with (g1 − g2)[A] = 0. Hence g1 − g2 = 0 by the minimality of g2.

Remark: If A is a square matrix with integer entries, then we can deduce by [151] that the minimal polynomial of A over C has rational coefficients since the smallest subfield of C containing Z is Q.

If A ∈ M(n, F ), then we know that the degree of the minimal polynomial of A is at most n^2 = dim(L(F^n, F^n)). This upper bound can be improved to n:

[152] [Cayley-Hamilton theorem] Let F be a field and A ∈ M(n, F ). If h is the characteristic polynomial of A, then h[A] = 0 ∈ M(n, F ); equivalently, the minimal polynomial g of A divides h.

Proof. It is a fact from Field Theory that every field is contained in some algebraically closed field. In view of [151] and Exercise-47(ii), we may therefore assume F itself is algebraically closed.

First suppose A is upper-triangular with diagonal entries c1, . . . , cn ∈ F . Then h(x) = det(xI −

A) = (x − c1) ··· (x − cn). To show h[A] = 0, it suffices to show h[A]ej = 0 for every j. Since the factors A − cjI of h[A] commute with each other it suffices to prove the following claim.

Claim:(A − c1I) ··· (A − cjI)ej = 0 for 1 ≤ j ≤ n. Clearly (A − c1I)e1 = 0. For j > 1,

(A − cjI)ej ∈ span{e1, . . . , ej−1} and hence the claim holds by induction.

In the general case, there is an invertible C ∈ M(n, F ) such that C−1AC is upper-triangular by [125] since F is algebraically closed. Now by Exercise-46 and the first case, h[C−1AC] = 0. But h[C−1AC] = C−1h[A]C which implies h[A] = 0 as C is invertible. 
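The Cayley-Hamilton theorem is easy to check numerically for a given matrix: np.poly returns the coefficients of det(xI − A), and evaluating that polynomial at A by Horner's rule should give the zero matrix (up to rounding). A small sketch with an arbitrary 3 × 3 real matrix (assuming Python with NumPy):

    import numpy as np

    A = np.array([[2., 1., 0.], [0., 3., -1.], [4., 0., 1.]])
    h = np.poly(A)            # coefficients of det(xI - A), highest degree first
    # Evaluate h at the matrix A by Horner's rule: h[A] should be the zero matrix.
    H = np.zeros_like(A)
    for coeff in h:
        H = H @ A + coeff * np.eye(3)
    print(np.allclose(H, np.zeros((3, 3))))   # True, as asserted by [152]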

Assignment: Collect other proofs of Cayley-Hamilton theorem from the literature.

18. Primary decomposition theorem

Over an algebraically closed field, every square matrix is conjugate to a triangular matrix by [125]. Over an arbitrary field, which matrices are conjugate to triangular matrices? This can be answered in terms of the minimal polynomial. The crucial step is the Primary decomposition theorem. This theorem is also a stepping stone to our treatment of the Jordan canonical form later.

[153] [Primary decomposition theorem - general version] Let V be a finite dimensional vector space over a field F , and let g be the minimal polynomial of T ∈ L(V,V ). Suppose g = g1 · · · gk, where the gj's are relatively prime, monic polynomials over F (we do not assume the gj's are irreducible). (i) If we put Vj = ker(gj[T ]), then V = V1 ⊕ · · · ⊕ Vk and T (Vj) ⊂ Vj for 1 ≤ j ≤ k. (ii) The minimal polynomial of T |Vj is gj for 1 ≤ j ≤ k.

Proof. It suffices to consider the case where g = g1g2, for then we may argue inductively. (i) Keep in mind throughout that polynomial expressions in T commute with each other. In particular, this implies T (Vj) ⊂ Vj since T commutes with gj[T ]. As g1, g2 are relatively prime, there exist f1, f2 ∈ F [x] such that 1 = g1f1 + g2f2 (as in the case of integers). Hence any v ∈ V can be written as v = I(v) = g1[T ]f1[T ](v) + g2[T ]f2[T ](v) = v2 + v1, say (note the reversal of suffixes). Now g1[T ](v1) = g1[T ]g2[T ]f2[T ](v) = g[T ]f2[T ](v) = 0 since g[T ] = 0, and therefore v1 ∈ V1. Similarly, v2 ∈ V2. This proves V = V1 + V2. If v ∈ V1 ∩ V2, then the monic generator gv of the ideal {f ∈ F [x] : f [T ](v) = 0} should divide both g1 and g2, implying gv = 1. So v = 0.

(ii) Let g̃j be the minimal polynomial of T |Vj . Since gj annihilates T |Vj , we have g̃j divides gj. Since (g̃1g̃2)[T ] = 0, we also have g divides g̃1g̃2. Therefore, we should have g̃j = gj for j = 1, 2.

Exercise-48: Let T ∈ L(V,V ), where V is a finite dimensional vector space. If T is nilpotent, then the matrix of T is upper-triangular w.r.to some basis of V . Equivalently, every nilpotent matrix over an arbitrary field is conjugate to an upper-triangular matrix. [Hint: Use induction on n = dim(V ). Since T cannot be injective, T is not surjective. Let W ⊂ V be any vector subspace of dimension n − 1 containing range(T ). By induction assumption, there is a basis {v1, . . . , vn−1} of W w.r.to which the matrix of T |W is upper-triangular. Extend {v1, . . . , vn−1} to a basis of V .]

A general strategy: We mention briefly a general strategy of attack that we use here, and later for the Jordan canonical form. Suppose T ∈ L(V,V ), where V is a finite dimensional vector space.

Also suppose c1, . . . , ck are the distinct eigenvalues of T . Now, in order to prove some property for T , we follow a two-step procedure: Step-1: Try to get a decomposition V = V1 ⊕ · · · ⊕ Vk into vector subspaces with T (Vj) ⊂ Vj so that it will suffice to prove the required property for T |Vj . Step-2: The Vj's are chosen in such a way that (T − cjI)|Vj becomes nilpotent. A nilpotent operator may be easy to deal with, and finally the required property for T |Vj is easily deduced from that of (T − cjI)|Vj since T = cjI + (T − cjI) on Vj.

[154] [Necessary and sufficient condition to get triangular form] Let F be a field. Then, (i) A ∈ M(n, F ) is conjugate to an upper-triangular matrix over F iff the minimal polynomial g of A splits into (possibly repeated) linear factors over F . (ii) Let T ∈ L(V,V ), where V is a finite dimensional vector space over F . Then the matrix of T is upper-triangular w.r.to some basis of V iff the minimal polynomial g of T splits into (possibly repeated) linear factors over F .

Proof. We prove only (ii). Suppose the matrix A of T is upper-triangular w.r.to some basis of

V . If d1, . . . , dn are the diagonal entries of A, then the characteristic polynomial of T is h(x) = det(xI − A) = (x − d1) ··· (x − dn). By [152], g divides h, and hence g splits into linear factors.

Conversely, suppose g splits into linear factors, say g(x) = (x − c1)^{p1} · · · (x − ck)^{pk}, where the cj's are distinct. Let gj(x) = (x − cj)^{pj} and Vj = ker(gj[T ]) = ker((T − cjI)^{pj}) for 1 ≤ j ≤ k. Since the gj's are relatively prime, by [153] and induction we get that V = V1 ⊕ · · · ⊕ Vk, T (Vj) ⊂ Vj and the minimal polynomial of T |Vj is gj. Now it suffices to show Vj has a basis w.r.to which the matrix of T |Vj is upper-triangular. Note that T = cjI + (T − cjI) on Vj. By Exercise-48, there is a basis Uj of Vj w.r.to which the matrix Bj of the nilpotent operator (T − cjI)|Vj is upper-triangular. Hence the matrix of T |Vj w.r.to Uj is cjI + Bj, which is upper-triangular.

A linear operator on a finite dimensional vector space may not have a basis consisting of eigenvectors, as seen by T ∈ L(C^2, C^2) given by the matrix [[0, 1], [0, 0]]. However, when the underlying field is algebraically closed, it will be shown that one can find a basis consisting of what is known as generalized eigenvectors. Let us make a few observations before defining a generalized eigenvector.

Exercise-49: Let F be a field and A ∈ M(n, F ). Then, (i) ker(A^j) ⊂ ker(A^{j+1}) and range(A^j) ⊃ range(A^{j+1}) for every j ∈ N. (ii) ker(A^n) = ker(A^{n+j}) and range(A^n) = range(A^{n+j}) for every j ∈ N. (iii) F^n = ker(A^n) ⊕ range(A^n). (iv) If A is nilpotent, then A^n = 0. [Hint: (ii) At least two are equal in {0} ⊂ ker(A) ⊂ ker(A^2) ⊂ · · · ⊂ ker(A^n) ⊂ F^n by considering dimensions. Also, if ker(A^k) = ker(A^{k+1}), then ker(A^{k+1}) = ker(A^{k+2}). (iii) If u = A^n v ∈ ker(A^n) ∩ range(A^n), then v ∈ ker(A^{2n}) = ker(A^n) so that u = 0. Also use the Nullity-rank theorem.]

Definition: Let F be a field and let c ∈ F be an eigenvalue of A ∈ M(n, F ). We say v ∈ F^n is a generalized eigenvector of A corresponding to the eigenvalue c if (A − cI)^k v = 0 for some k ∈ N; equivalently, if (A − cI)^n v = 0 by Exercise-49. Similarly we may define generalized eigenvectors of linear operators. Clearly any eigenvector is a generalized eigenvector, but the converse is not true. If A = [[0, 1], [0, 0]], then Ae2 = e1 ≠ 0 but A^2 e2 = 0. Hence e2 is a generalized eigenvector of A but not an eigenvector, corresponding to the eigenvalue c = 0.

Definition: Let F be a field and let c ∈ F be an eigenvalue of A ∈ M(n, F ). We define three numbers pertaining to the eigenvalue c (similar definition if A is replaced by an operator T ): (i) The stabilizing number of c is the smallest p ∈ N such that ker((A − cI)^p) = ker((A − cI)^{p+1}).

(ii) The eigendimension (or geometric multiplicity) of c is q := dim(ker(A − cI)), which is the dimension of the space of eigenvectors corresponding to the eigenvalue c. (iii) The generalized eigendimension (or algebraic multiplicity) of c is r := dim(ker((A − cI)^n)), which is the dimension of the space of generalized eigenvectors corresponding to the eigenvalue c.

Remark: Clearly, eigendimension ≤ generalized eigendimension. We will see in [155] below that stabilizing number ≤ generalized eigendimension. However, the eigendimension and stabilizing number are not comparable. For A = [[0, 1], [0, 0]] and c = 0, the eigendimension is 1 and the stabilizing number is 2; for A = [[3, 0], [0, 3]] and c = 3, the eigendimension is 2 and the stabilizing number is 1.

Exercise-50: [Generalized eigenvectors corresponding to distinct eigenvalues are linearly independent] Let V be an n-dimensional vector space over a field F , let T ∈ L(V,V ), and c1, . . . , ck ∈ F be the distinct eigenvalues of T . If vj ∈ ker((T − cjI)^n) \ {0}, then v1, . . . , vk are linearly independent. [Hint: Else, after a renaming assume v1 ∈ span{v2, . . . , vk}. Let g1(x) = (x − c1)^n and g2(x) = ∏_{j=2}^{k}(x − cj)^n. Then g1, g2 are relatively prime. Putting V1 = ker(g1[T ]) and V2 = ker(g2[T ]), argue as in the proof of [153] to show V1 ∩ V2 = {0}. But v1 ∈ V1 ∩ V2, a contradiction.]

It is convenient to state [155] below in terms of linear operators. The student may translate [155] into the language of matrices.

[155] [Primary decomposition theorem - specialized version] Let V be an n-dimensional vector space over an algebraically closed field F , let T ∈ L(V,V ), and let c1, . . . , ck ∈ F be the distinct

eigenvalues of T . Let g(x) = (x − c1)^{p1} · · · (x − ck)^{pk} be the minimal polynomial of T and h(x) = (x − c1)^{r1} · · · (x − ck)^{rk} be the characteristic polynomial of T . Then, (i) If Vj = ker((T − cjI)^{pj}), then V = V1 ⊕ · · · ⊕ Vk, and T (Vj) ⊂ Vj for 1 ≤ j ≤ k. (ii) (T − cjI)|Vj is nilpotent for 1 ≤ j ≤ k. (iii) The only eigenvalue of T |Vj is cj for 1 ≤ j ≤ k. (iv) pj is the stabilizing number of cj, and in particular Vj = ker((T − cjI)^n) for 1 ≤ j ≤ k.

(v) rj is the generalized eigendimension of cj for 1 ≤ j ≤ k.

(vi) 1 ≤ pj ≤ rj for 1 ≤ j ≤ k.

Proof. Our assumption that g and h have the form as stated in the hypothesis is justified by [150].

We have (i) by [153]. Also, the minimal polynomial of T |Vj is (x − cj)^{pj} by [153]. Hence (T − cjI)|Vj is nilpotent and its minimal polynomial is x^{pj}, which says precisely that pj is the stabilizing number of the eigenvalue cj. This proves (ii) and (iv). Since 0 is the only eigenvalue of (T − cjI)|Vj , it also follows that cj is the only eigenvalue of T |Vj as stated in (iii).

(v) and (vi): If sj is the generalized eigendimension of cj, i.e., if sj = dim(Vj) = dim(ker((T − cjI)^n)), then the characteristic polynomial of the nilpotent operator (T − cjI)|Vj is x^{sj}. The observation det(xI − T ) = det((x − cj)I − (T − cjI)) yields that the characteristic polynomial of T |Vj is (x − cj)^{sj}. Since V = V1 ⊕ · · · ⊕ Vk, the characteristic polynomial of T must be the product of the

characteristic polynomials of T |V1 , . . . , T |Vk . Hence h(x) = (x − c1)^{s1} · · · (x − ck)^{sk} and thus sj = rj for 1 ≤ j ≤ k. We have pj ≤ rj since g must divide h by the Cayley-Hamilton theorem.

19. The Jordan canonical form (JCF)

Consider the statement of [155] and let Sj be the nilpotent operator (T − cjI)|Vj . Suppose the matrix Bj of Sj has a simple form w.r.to some basis of Vj. Then the matrix of T |Vj is Bj + cjI, so that T has a block-diagonal matrix with the jth block being equal to Bj + cjI. So now the question is how simple the matrix of a nilpotent operator can be.

Definition: Let F be a field. The Jordan block Jm(c) ∈ M(m, F ) for m > 1 and eigenvalue c ∈ F is ··· c 1 0   ......  . . . . defined as Jm(c) =  . That is , each entry of the main diagonal is c, each entry of   0 . . . c 1 0 ··· 0 c the diagonal immediately above the main diagonal is 1, and all other entries of Jm(c) are 0. Also we define J1(c) = [c].

Example: If T ∈ L(C^m, C^m) is the nilpotent operator given by T e1 = 0 and T ej = ej−1 for 1 < j ≤ m, then the matrix of T is Jm(0). This nilpotent operator will turn out to be a basic building block for a general nilpotent operator: we will show below that every nilpotent operator is built from finitely many such T .

If T^p = 0 and T^{p−1} ≠ 0 for p ∈ N, we say T is nilpotent of index p.

[156] [JCF of a nilpotent operator] Let V be an n-dimensional vector space, let T ∈ L(V,V ) be nilpotent of index p ∈ N, and let q = null(T ) = eigendimension of the eigenvalue 0. Then, there are v1, . . . , vq ∈ V and integers p = m1 ≥ m2 ≥ · · · ≥ mq ≥ 1 with ∑_{i=1}^{q} mi = n such that (i) {T^{mi−1}(vi) : 1 ≤ i ≤ q} is a basis for ker(T ); and in particular T^{mi}(vi) = 0 for 1 ≤ i ≤ q. (ii) U := ∪_{i=1}^{q}{T^k(vi) : 0 ≤ k ≤ mi − 1} is a basis for V .

(iii) The block-diagonal matrix diag(Jm1 (0),...,Jmq (0)) represents T w.r.to the above basis.

Proof. We use induction on n. The case n = 1 is trivial. Assume the result for values up to n − 1 and consider V with dim(V ) = n. The case T = 0 is easily managed. Now suppose T ≠ 0 so that p > 1. Since T^p = 0 is not surjective, T is not surjective. Hence the inclusions {0} ⊂ T (V ) ⊂ V are proper. By the induction assumption applied to the nilpotent operator T |T (V ) of index p − 1 and nullity s (say), we find w1, . . . , ws ∈ T (V ) and integers p − 1 = m′1 ≥ m′2 ≥ · · · ≥ m′s ≥ 1 with ∑_{i=1}^{s} m′i = dim(T (V )) = n − q so that {T^{m′i−1}(wi) : 1 ≤ i ≤ s} is a basis for ker(T |T (V )), and ∪_{i=1}^{s}{T^k(wi) : 0 ≤ k ≤ m′i − 1} is a basis for T (V ).

Let vi ∈ V be so that T (vi) = wi and let mi = m′i + 1. Then, p = m1 ≥ m2 ≥ · · · ≥ ms ≥ 2, ∑_{i=1}^{s} mi = n − (q − s), and we have that {T^{mi−1}(vi) : 1 ≤ i ≤ s} is a basis for ker(T |T (V )), and T (U1) is a basis for T (V ), where U1 = ∪_{i=1}^{s}{T^k(vi) : 0 ≤ k ≤ mi − 2}. Choose vs+1, . . . , vq ∈ ker(T )

so that U2 := {T^{m1−1}(v1), . . . , T^{ms−1}(vs), vs+1, . . . , vq} becomes a basis for ker(T ).

We claim that U := {v1, T v1, . . . , T^{m1−1}(v1), . . . , vs, T (vs), . . . , T^{ms−1}(vs), vs+1, . . . , vq} = U1 ∪ U2 is linearly independent. Suppose a linear combination of the members of U is 0. Applying T and noting that T (U2) = {0}, we see that a linear combination of the members of T (U1) is 0, and this linear combination must be trivial since T (U1) is linearly independent, being a basis of T (V ). Hence the original linear combination of members of U reduces to a linear combination of the members of U2. This linear combination should also be trivial since U2 is linearly independent, being a basis of ker(T ). This proves our claim. Now, |U| = |U1| + |U2| = |T (U1)| + |U2| = rank(T ) + null(T ) = dim(V ) by the Nullity-rank theorem, and thus U is a basis of V . Finally taking ms+1 = · · · = mq = 1 we see that ∑_{i=1}^{q} mi = n, and U is in the required form specified by [156](ii).

[156′] [JCF of a nilpotent matrix] Let F be a field, let A ∈ M(n, F) be a nilpotent matrix of index p ∈ N (i.e., p ∈ N is the smallest with A^p = 0), and let q = null(A) = eigendimension of the eigenvalue 0. Then there exist an invertible matrix C ∈ M(n, F) and integers p = m1 ≥ m2 ≥ · · · ≥ mq ≥ 1 with ∑_{i=1}^{q} mi = n such that C^{−1}AC = diag(Jm1(0), . . . , Jmq(0)).

The numbers p and q appearing in the statement of [156′] are clearly invariant under conjugacy.

The Exercise below establishes that the integers m1, . . . , mq are also invariant under conjugacy so that the JCF of a nilpotent matrix becomes invariant under conjugacy.

Exercise-51: Let F be a field. (i) Let Q = Jm(0) ∈ M(m, F). Then Q^s ej = 0 for 1 ≤ j ≤ s and Q^s ej = ej−s for s < j ≤ m. Hence null(Q^s) = s for s ≤ m, and null(Q^s) = m for s > m.
(ii) null(Q^s) = ∑_{k=1}^{s} |{1 ≤ i ≤ q : mi ≥ k}| if Q = diag(Jm1(0), . . . , Jmq(0)) ∈ M(n, F). Hence the mi's are uniquely determined by the numbers null(Q^s), s = 1, 2, . . ..
(iii) Let A, B ∈ M(n, F) be nilpotent. Then A is conjugate to B iff the Jordan canonical forms of A and B are the same. [Hint: Use (i) to prove (ii) by induction on s. Then deduce (iii) from (ii) with the help of [156′] after noting that the numbers null(A^s) are also invariant under conjugacy.]
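As an illustrative sketch (not part of the notes), the counting identity in Exercise-51(ii) can be turned into a small numpy routine that recovers the block sizes mi of a nilpotent matrix from the nullities null(A^s). It relies on a numerical rank with a tolerance, so it is meant only for small, well-behaved examples (recall the warning below that the JCF is delicate numerically).

```python
import numpy as np

def nullity(M, tol=1e-9):
    """null(M) = number of columns minus the (numerical) rank."""
    return M.shape[1] - np.linalg.matrix_rank(M, tol=tol)

def jordan_block_sizes(A):
    """Block sizes m1 >= m2 >= ... of the JCF of a nilpotent A.
    By Exercise-51(ii), null(A^s) - null(A^(s-1)) = #{i : m_i >= s}."""
    n = A.shape[0]
    nulls = [0] + [nullity(np.linalg.matrix_power(A, s)) for s in range(1, n + 1)]
    counts = [nulls[s] - nulls[s - 1] for s in range(1, n + 1)]  # counts[s-1] = #{i : m_i >= s}
    q = counts[0]                                                # q = null(A) = number of blocks
    return sorted((sum(1 for c in counts if c > i) for i in range(q)), reverse=True)

# Example: A is (conjugate to) diag(J_3(0), J_1(0))
A = np.zeros((4, 4))
A[0, 1] = A[1, 2] = 1.0
print(jordan_block_sizes(A))  # [3, 1]
```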

Combining [155] and [156] (or [156′]), and noting that cI + Jm(0) = Jm(c), we obtain the Jordan canonical form of a general operator/matrix:

[157] [JCF - operator version] Let V be an n-dimensional vector space over an algebraically closed field F, and let c1, . . . , ck be the distinct eigenvalues of T ∈ L(V, V). Let ∏_{j=1}^{k} (x − cj)^{pj} be the minimal polynomial of T, let qj = null(T − cjI) be the eigendimension of cj, and let ∏_{j=1}^{k} (x − cj)^{rj} be the characteristic polynomial of T. Then, the matrix of T w.r.to some basis of V has the form diag(B1, . . . , Bk), where Bj ∈ M(rj, F); and for each j, there are integers pj = m1j ≥ m2j ≥ · · · ≥ mqj j ≥ 1 such that ∑_{i=1}^{qj} mij = rj and Bj = diag(Jm1j(cj), . . . , Jmqj j(cj)).

[157′] [JCF - matrix version] Let F be an algebraically closed field, and let c1, . . . , ck be the distinct eigenvalues of A ∈ M(n, F). Let ∏_{j=1}^{k} (x − cj)^{pj} be the minimal polynomial of A, let qj = null(A − cjI) be the eigendimension of cj, and let ∏_{j=1}^{k} (x − cj)^{rj} be the characteristic polynomial of A. Then, there exists an invertible C ∈ M(n, F) such that C^{−1}AC = diag(B1, . . . , Bk), where Bj ∈ M(rj, F); and for each j, there are integers pj = m1j ≥ m2j ≥ · · · ≥ mqj j ≥ 1 such that ∑_{i=1}^{qj} mij = rj and Bj = diag(Jm1j(cj), . . . , Jmqj j(cj)).

Example: Let F be an algebraically closed field and A ∈ M(n, F). For n ≤ 3, the minimal and characteristic polynomials of A uniquely determine the Jordan canonical form of A:

n | characteristic polynomial | minimal polynomial | JCF
1 | (x − c) | (x − c) | [c]
2 | (x − c1)(x − c2) | (x − c1)(x − c2) | diag(c1, c2)
2 | (x − c)^2 | (x − c)^2 | J2(c)
2 | (x − c)^2 | (x − c) | diag(c, c)
3 | (x − c1)(x − c2)(x − c3) | (x − c1)(x − c2)(x − c3) | diag(c1, c2, c3)
3 | (x − c1)^2(x − c2) | (x − c1)^2(x − c2) | diag(J2(c1), c2)
3 | (x − c1)^2(x − c2) | (x − c1)(x − c2) | diag(c1, c1, c2)
3 | (x − c)^3 | (x − c)^3 | J3(c)
3 | (x − c)^3 | (x − c)^2 | diag(J2(c), c)
3 | (x − c)^3 | (x − c) | diag(c, c, c)

For n ≥ 4, it can happen that the Jordan canonical forms of A, B ∈ M(n, F) are different even if A and B have the same minimal and characteristic polynomials. This is because the finite sequence of numbers mij appearing in [157′] can be different for A and B. (For instance, diag(J2(c), J2(c)) and diag(J2(c), c, c) both have characteristic polynomial (x − c)^4 and minimal polynomial (x − c)^2.)

Remark: [Nilpotent + diagonal] Consider [157]. If we put T1 := ⊕_{j=1}^{k} (T − cjIrj)|Vj and T2 := ⊕_{j=1}^{k} (cjIrj)|Vj, then we see that T1 is nilpotent, T2 is diagonalizable, T = T1 + T2, and T1T2 = T2T1.

A disadvantage of JCF: The characteristic polynomial h(x) = det(xI − A) of a square matrix A is a continuous function of A in the sense that if we change the entries of A a little bit, then the change in h is also small. But the minimal polynomial g of a matrix A does not vary continuously w.r.to A. For example, if A = diag(c, 1), then g(x) = x − 1 when c = 1, whereas g(x) = (x − c)(x − 1) when c ≠ 1. Consequently, the JCF of a matrix A does not vary continuously w.r.to A. Hence the minimal polynomial and the JCF are not suitable in applications involving numerical computations.

Advantages/uses of the JCF: (i) The JCF presents all the important data about a matrix - the list of eigenvalues, the three numbers (stabilizing number, eigendimension, generalized eigendimension) associated to each eigenvalue, and the minimal and characteristic polynomials - in a readable form.

(ii) Over an algebraically closed field F, the JCF is a complete invariant for conjugacy. This means the following: for A, B ∈ M(n, F), we have that A is conjugate to B iff the JCFs of A and B are the same (up to a permutation of the blocks Bj appearing in the statement of [157′]). This follows from [155] and Exercise-51. The fact that the JCF is a complete invariant for conjugacy is all the more interesting since the minimal and the characteristic polynomial together do not form a complete conjugacy invariant. Also, the JCF over C is a complete conjugacy invariant for square matrices over R by Exercise-41 even though R is not algebraically closed.

(iii) Using the JCF, it is easy to show that any square matrix A over an algebraically closed field is conjugate to its transpose A^t; see Exercise-52 below.

(iv) Over an algebraically closed field, which matrices are diagonalizable? This can be completely answered with the help of JCF; see [158] below.

(v) When does the minimal polynomial coincide with the characteristic polynomial? This can also be answered with the help of JCF; see [159] below.

Exercise-52: Let F be an algebraically closed field and A ∈ M(n, F). Then A^t is conjugate to A. [Hint: In view of the JCF, we may assume A = Jm(c). We get Jm(c)^t when we reverse the order of the basis. That is, if C is the matrix of ej ↦ em+1−j for every j, then C^{−1}Jm(c)C = Jm(c)^t.]

[158] Let F be an algebraically closed field and A ∈ M(n, F). Then the following are equivalent.
(i) A is diagonalizable, i.e., ∃ invertible C ∈ M(n, F) such that C^{−1}AC is a diagonal matrix.
(ii) The minimal polynomial of A splits into distinct linear factors as ∏_{j=1}^{k} (x − cj), where the cj's are the distinct eigenvalues of A. Here, k can be strictly less than n.
(iii) The stabilizing number is 1 for every eigenvalue of A.
(iv) Any generalized eigenvector of A is an eigenvector of A.

Proof. We have (ii) ⇔ (iii) ⇔ (iv) by [155].

(i) ⇒ (ii): Let c1, . . . , ck be the distinct eigenvalues of A and let rj be the generalized eigendimension of cj. Then the diagonal entries of the diagonal matrix D := C^{−1}AC are the cj's, where each cj is repeated rj times overall (not necessarily in consecutive positions). Conjugating D with a permutation matrix, we may assume D = diag(c1Ir1, . . . , ckIrk). Since the JCF and minimal polynomial are invariant under conjugacy, it suffices to show that the minimal polynomial of D is (x − c1) · · · (x − ck). Check this.

(ii) ⇒ (i): In [157′], we have pj = 1 for every j, and hence mij = 1 for all i, j. This means that the Jordan canonical form of A is a diagonal matrix (which is conjugate to A by [157′]). □

Example: Let A ∈ M(n, C) and suppose A^k = I for some k ∈ N. Then the polynomial x^k − 1 annihilates A. Since x^k − 1 has distinct roots over C, and since the minimal polynomial g of A should divide x^k − 1, we conclude that g splits into distinct linear factors so that A is diagonalizable by [158]. This is no longer true if we replace C with a finite field. We know that A = [[1, 1], [0, 1]] (written row by row) is not diagonalizable over any field. But if A is considered as a member of M(2, Z2), then A^2 = I.

Example: Consider A = [[0, 4], [−1, 0]] ∈ M(2, R). The characteristic polynomial of A is h(x) = x^2 + 4. The roots 2i, −2i of h are in C \ R. Hence A is not diagonalizable over R. Since the roots are distinct, the minimal polynomial of A is also h so that A is diagonalizable over C by [158]. We may also find the conjugating matrix by making use of the change of basis procedure. Note that (2, i) and (2i, 1) are eigenvectors of A corresponding to the eigenvalues 2i and −2i respectively. Therefore, if we take C = [[2, 2i], [i, 1]], then C^{−1}AC = diag(2i, −2i) (verify).

Remark: Diagonalizing helps to compute the powers of a matrix easily. If C^{−1}AC = D = diag(d1, . . . , dn), then D^k = diag(d1^k, . . . , dn^k) and therefore A^k = CD^kC^{−1} can be computed quickly.
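To illustrate numerically (this sketch is not part of the notes), the matrix A = [[0, 4], [−1, 0]] from the example above can be diagonalized over C with numpy, and its powers recovered via A^k = CD^kC^{−1}:

```python
import numpy as np

A = np.array([[0.0, 4.0], [-1.0, 0.0]])

# Eigendecomposition over C: columns of C are eigenvectors, eigvals the eigenvalues (about 2j and -2j).
eigvals, C = np.linalg.eig(A)

k = 10
Ak = C @ np.diag(eigvals ** k) @ np.linalg.inv(C)     # A^k = C D^k C^{-1} with D = diag(eigvals)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True (the imaginary parts are ~0)
```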

Exercise-53: Let V be a finite dimensional vector space over an infinite field F. Then it is not possible to write V as a finite union of proper vector subspaces. The assumption that F is infinite is necessary. [Hint: Let V = ∪_{j=1}^{m} Vj, v ∈ V1 and w ∈ V \ V1. Since F is infinite, there are c ≠ d in F and 2 ≤ k ≤ m such that v + cw, v + dw ∈ Vk. Then w, and hence v, is in Vk. Thus V1 ⊂ ∪_{j=2}^{m} Vj so that V = ∪_{j=2}^{m} Vj. Repeat the argument. If F is finite, then V is finite and V = ∪_{v∈V} span{v}.]

[159] Let F be an algebraically closed field and A ∈ M(n, F). Then the following are equivalent.
(i) The minimal and characteristic polynomials of A are the same.
(ii) The eigendimension is 1 for each eigenvalue of A.
Moreover, if F is also infinite, then we have one more equivalent statement:
(iii) There is v ∈ F^n such that {v, Av, . . . , A^{n−1}v} is a basis for F^n (such a vector v is called a cyclic vector).

Proof. Let g, h respectively be the minimal and characteristic polynomials of A.
(i) ⇔ (ii): According to [157′], we have pj = m1j ≥ m2j ≥ · · · ≥ mqj j ≥ 1 and ∑_{i=1}^{qj} mij = rj.

Hence g = h ⇔ pj = rj for every j ⇔ qj = 1 for every j.

(iii) ⇒ (i): [This holds even if F is finite and even if F is not algebraically closed.] By hypothesis, {v, Av, . . . , A^{n−1}v} is linearly independent. In other words, f[A]v ≠ 0 for any non-zero polynomial f with deg(f) ≤ n − 1. Since g[A]v = 0, we have deg(g) ≥ n. Since g divides h and since both g, h are monic, we get g = h.

(i) ⇒ (iii): For v ∈ F^n, Jv := {f ∈ F[x] : f[A]v = 0} is an ideal with g ∈ Jv, and hence there is a monic fv ∈ F[x] generating Jv and dividing g. Since g has only finitely many monic factors, there can be only finitely many distinct Jv's, say Jv1, . . . , Jvm. Let Vj = ker(fvj[A]). Then F^n = ∪_{j=1}^{m} Vj since for each v ∈ F^n there is j with Jv = Jvj. Hence F^n = Vj = ker(fvj[A]) for some j by Exercise-53. For this j, we have fvj[A] = 0. Since fvj is a monic factor of g, we conclude fvj = g. By assumption g = h, and hence deg(fvj) = deg(h) = n. Therefore, f[A]vj ≠ 0 for every non-zero polynomial f of degree ≤ n − 1. In other words, {vj, Avj, . . . , A^{n−1}vj} is linearly independent. □

Exercise-54: Let F be an algebraically closed field, A ∈ M(n, F), and suppose that the minimal and characteristic polynomials of A are the same. If B ∈ M(n, F) is such that AB = BA, then B = f[A] for some polynomial f over F. [Hint: By [159], there is v ∈ F^n such that U := {v, Av, . . . , A^{n−1}v} is a basis for F^n. Writing Bv as a linear combination of members of U, we get Bv = f[A]v for some polynomial f. Check that Bu = f[A]u for every u ∈ U.]

Remark: Let V be an n-dimensional vector space, and assume T ∈ L(V, V) has a cyclic vector v ∈ V, i.e., U := {v, T(v), T^2(v), . . . , T^{n−1}(v)} is a basis of V. Let the minimal (= characteristic) polynomial of T be g(x) = x^n + a_{n−1}x^{n−1} + · · · + a1x + a0. Then T^n = −(a_{n−1}T^{n−1} + · · · + a1T + a0I). Hence the matrix of T w.r.to the basis U is
[ 0 0 ··· 0 0 −a0 ]
[ 1 0 ··· 0 0 −a1 ]
[ 0 1 ··· 0 0 −a2 ]
[ . .  ···  . .  . ]
[ 0 0 ··· 1 0 −a_{n−2} ]
[ 0 0 ··· 0 1 −a_{n−1} ].
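This matrix is usually called the companion matrix of g. As an aside (not part of the notes), here is a small numpy sketch that builds it from the coefficients a0, . . . , a_{n−1}; the function name companion is ours.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of x^n + a_{n-1}x^{n-1} + ... + a_1 x + a_0, with coeffs = [a_0, ..., a_{n-1}].
    Matches the matrix displayed above: 1's below the diagonal, last column -a_0, ..., -a_{n-1}."""
    n = len(coeffs)
    M = np.zeros((n, n))
    M[1:, :-1] = np.eye(n - 1)
    M[:, -1] = -np.asarray(coeffs)
    return M

# For g(x) = x^3 - 2x^2 + 3x - 4, the matrix below has g as both its characteristic and minimal polynomial.
print(companion([-4.0, 3.0, -2.0]))
```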

Reading assignment: Cyclic decomposition theorem and Rational canonical form (see section 7.2 of Linear Algebra by Hoffman and Kunze). Roughly speaking, the cyclic decomposition theorem says the following: if V is a finite dimensional vector space and T ∈ L(V, V), then we can write V = V1 ⊕ · · · ⊕ Vk, where each Vj is T-invariant and T|Vj has a cyclic vector. If Uj is the basis of Vj generated by a cyclic vector, then w.r.to the basis U = ∪_{j=1}^{k} Uj of V, the matrix of T is a block-diagonal matrix diag(B1, . . . , Bk), where each Bj has the form as in the Remark above.

20. Trace of a square matrix

Definition: Let F be a field. We define trace : M(n, F) → F as trace(A) = ∑_{i=1}^{n} aii = sum of the diagonal entries of A, where A = [aij].

[160] Let F be a field. Then,
(i) trace : M(n, F) → F is linear.
(ii) trace(AB) = trace(BA) for every A, B ∈ M(n, F).
(ii′) More generally, trace(A1 · · · Am) = trace(Aσ(1) · · · Aσ(m)) for A1, . . . , Am ∈ M(n, F) and any cyclic permutation σ ∈ Sm. However, trace(ABC) ≠ trace(ACB) in general.
(iii) Conjugate matrices have the same trace.

(iv) Let F be algebraically closed and suppose h(x) = (x − c1) · · · (x − cn) is the characteristic polynomial of A ∈ M(n, F), i.e., c1, . . . , cn ∈ F are the (not necessarily distinct) eigenvalues of A repeated according to generalized eigendimension. Then det(A) = ∏_{j=1}^{n} cj and trace(A) = ∑_{j=1}^{n} cj. More generally, det(A^k) = ∏_{j=1}^{n} cj^k and trace(A^k) = ∑_{j=1}^{n} cj^k.
(v) Let F be algebraically closed and let h(x) = x^n + a_{n−1}x^{n−1} + · · · + a0 be the characteristic polynomial of A ∈ M(n, F). Then trace(A) = −a_{n−1} and det(A) = (−1)^n a0. In particular, the characteristic polynomial of A ∈ M(2, F) is x^2 − trace(A)x + det(A).

Proof. (i) This is clear.
(ii) trace(AB) = ∑_{i=1}^{n} (AB)ii = ∑_{i=1}^{n} ∑_{j=1}^{n} aij bji = ∑_{j=1}^{n} ∑_{i=1}^{n} bji aij = ∑_{j=1}^{n} (BA)jj = trace(BA).

(ii′) The first assertion follows from (ii) by induction. For the second, note that
[[0, 1], [0, 1]] [[1, 1], [0, 0]] [[1, 0], [1, 0]] = [[0, 0], [0, 0]], whereas
[[0, 1], [0, 1]] [[1, 0], [1, 0]] [[1, 1], [0, 0]] = [[1, 1], [1, 1]] (matrices written row by row).
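A quick numerical check of (ii) and (ii′) with the matrices above (an illustrative sketch, not part of the notes):

```python
import numpy as np

A = np.array([[0, 1], [0, 1]])
B = np.array([[1, 1], [0, 0]])
C = np.array([[1, 0], [1, 0]])

print(np.trace(A @ B), np.trace(B @ A))          # 0 0 : trace(AB) = trace(BA)
print(np.trace(A @ B @ C), np.trace(A @ C @ B))  # 0 2 : trace(ABC) != trace(ACB) in general
```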

(iii) If B = C^{−1}AC, then trace(B) = trace(C^{−1}AC) = trace(ACC^{−1}) = trace(A).

(iv) By [125], A is conjugate to some upper-triangular matrix B ∈ M(n, F). Since conjugate matrices have the same characteristic polynomial, A and B have the same list of eigenvalues when counted with generalized eigendimension. Moreover, det(A) = det(B) and trace(A) = trace(B). Hence it suffices to prove the result for B. But the case of B is clear since B is upper-triangular.

(v) This follows from (iv) and the relation between the roots and coefficients of a polynomial. □

Remark: (i) Let V be a finite dimensional vector space and T ∈ L(V, V). In view of [160](iii), we may define trace(T) := trace(A), where A is the matrix of T w.r.to any basis of V.

Example: [Circulant matrices] This example illustrates the use of the expression for the determinant from [160](iv). Perhaps this does not fit with this section, but anyway. If a1, . . . , an ∈ C, then
A =
[ a1 a2 a3 ··· an ]
[ an a1 a2 ··· a_{n−1} ]
[ a_{n−1} an a1 ··· a_{n−2} ]
[ ... ]
[ a2 a3 ··· an a1 ]
∈ M(n, C) is called a circulant matrix. Note that each diagonal of A is constant. Let B ∈ M(n, C) be the special circulant matrix with rows e2, e3, . . . , en, e1 respectively. Then Be1 = en and Bej = ej−1 for 1 < j ≤ n. If c = e^{2πi/n}, then Bvj = c^j vj for vj = (1, c^j, c^{2j}, . . . , c^{(n−1)j}). Hence 1, c, c^2, . . . , c^{n−1} are the eigenvalues of B. Therefore, det(B) = ∏_{j=0}^{n−1} c^j ≠ 0. Letting f(x) = ∑_{j=1}^{n} aj x^{j−1}, we see that A = f[B]. Hence the eigenvalues of the circulant matrix A are f(1), f(c), . . . , f(c^{n−1}); and det(A) = ∏_{j=0}^{n−1} f(c^j).
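A numerical illustration of this computation (not part of the notes; the helper name circulant is ours):

```python
import numpy as np

def circulant(a):
    """Circulant matrix whose first row is a and whose ith row is a shifted i steps to the right."""
    a = np.asarray(a, dtype=complex)
    return np.array([np.roll(a, i) for i in range(len(a))])

a = [1.0, 2.0, 3.0, 4.0]
n = len(a)
A = circulant(a)
c = np.exp(2j * np.pi / n)                        # c = e^(2*pi*i/n)
f = lambda x: sum(a[k] * x**k for k in range(n))  # f(x) = a1 + a2*x + ... + an*x^(n-1)

for j in range(n):
    vj = np.array([(c**j) ** k for k in range(n)])
    print(np.allclose(A @ vj, f(c**j) * vj))      # True: v_j is an eigenvector with eigenvalue f(c^j)

print(np.allclose(np.linalg.det(A), np.prod([f(c**j) for j in range(n)])))  # True
```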

Example: Let A = [aij] be the adjacency matrix of a finite graph G on the vertex set {v1, . . . , vn}, i.e., aij = 1 or 0 according to whether vi and vj are joined by an edge or not. By induction on k, check that the ijth entry of A^k is the number of paths of length k from vi to vj in G. Taking into account possible repetitions, we see that trace(A^k)/k is an upper bound for the number of k-cycles in G. As A is a real symmetric matrix, A is diagonalizable. If A is conjugate to diag(c1, . . . , cn), then by [160](iv) we have that (∑_{j=1}^{n} cj^k)/k is an upper bound for the number of k-cycles in G.
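A small numerical illustration with the 4-cycle graph (not part of the notes): trace(A^3)/3 = 0 reflects that there are no triangles, and trace(A^4)/4 = 8 is indeed an upper bound for the single 4-cycle.

```python
import numpy as np

# Adjacency matrix of the 4-cycle v1 - v2 - v3 - v4 - v1
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

print(np.trace(np.linalg.matrix_power(A, 3)) / 3)  # 0.0
print(np.trace(np.linalg.matrix_power(A, 4)) / 4)  # 8.0

# Same numbers via the eigenvalues, as in [160](iv); eigvalsh is used since A is real symmetric.
eig = np.linalg.eigvalsh(A)
print(sum(eig**3) / 3, sum(eig**4) / 4)            # approximately 0.0 and 8.0
```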

Example: The equation AB − BA = I has no solution in M(n, F) when n · 1 ≠ 0 in F (for instance, when F = R or C). To see this, apply trace to both sides: the left side has trace 0 by [160](i) and (ii), whereas trace(I) = n ≠ 0.

Example: Let A ∈ M(n, C). If A is nilpotent, then we know 0 is the only eigenvalue of A, and hence det(A) = 0 = trace(A). Somewhat conversely, suppose trace(A^k) = 0 for 1 ≤ k ≤ n. If c1, . . . , cn ∈ C are the eigenvalues of A repeated according to multiplicity, then we have ∑_{j=1}^{n} cj^k = 0 for 1 ≤ k ≤ n by [160](iv), and it is a property of complex numbers that this implies cj = 0 for every j. Thus 0 is the only eigenvalue of A. So A must be nilpotent by [150].

21. Quadratic forms

Definition: A function f : R^n → R is called a quadratic form if there is A = [aij] ∈ M(n, R) such that f(v) = ⟨Av, v⟩ = v^t Av = ∑_{i=1}^{n} ∑_{j=1}^{n} aij xi xj for every v = (x1, . . . , xn) ∈ R^n. The term quadratic comes because the expression ∑_{i=1}^{n} ∑_{j=1}^{n} aij xi xj is a quadratic polynomial in the n variables x1, . . . , xn.

[161] Let f : R^n → R be a quadratic form. Then,
(i) There exists a symmetric matrix A ∈ M(n, R) such that f(v) = v^t Av for every v ∈ R^n.
(ii) There exist a diagonal matrix D = diag(d1, . . . , dn) ∈ M(n, R) and an orthonormal basis {v1, . . . , vn} of R^n such that f(v) = ∑_{j=1}^{n} dj yj^2 for every v = ∑_{j=1}^{n} yj vj ∈ R^n.
(iii) Suppose f is represented by the symmetric matrix A ∈ M(n, R). Then f is positive definite, i.e., f(v) > 0 for every v ∈ R^n \ {0}, ⇔ all eigenvalues of A are > 0. Similar statements are true with ≥ 0, < 0 and ≤ 0 in the place of > 0.

Proof. (i) Let B ∈ M(n, R) be so that f(v) = v^t Bv for every v ∈ R^n. Being a scalar, v^t Bv = (v^t Bv)^t = v^t B^t v, and hence f(v) = (1/2)[v^t Bv + v^t B^t v] = v^t Av, where A = (B + B^t)/2. We remark that quadratic forms can also be considered over a general field, but the division by 2 in the last step cannot be performed if we consider a field of characteristic 2 in the place of R.

(ii) By part (i), there is a symmetric A ∈ M(n, R) with f(v) = v^t Av for every v ∈ R^n. By [141], there is an orthogonal matrix C ∈ M(n, R) (i.e., C^{−1} = C^t) such that D = C^t AC is diagonal, say D = diag(d1, . . . , dn). If vj is the jth column of C, then {v1, . . . , vn} is an orthonormal basis of R^n by [136] (and Avj = dj vj since AC = CD). Consider v = ∑_{j=1}^{n} yj vj ∈ R^n. By orthonormality, we have (C^t v)i = ⟨vi, ∑_{j=1}^{n} yj vj⟩ = yi so that f(v) = v^t Av = v^t CDC^t v = (C^t v)^t D(C^t v) = ∑_{j=1}^{n} dj yj^2.

(iii) This follows from (ii), since A and D have the same list of eigenvalues (repeated according to generalized eigendimension), and from the expression f(v) = ∑_{j=1}^{n} dj yj^2 in (ii). □
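A numerical illustration of [161](ii) and (iii) (a sketch, not part of the notes), using numpy's symmetric eigendecomposition; the matrix A below is an arbitrary example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # symmetric matrix representing f(v) = v^t A v

d, C = np.linalg.eigh(A)             # C is orthogonal and C^t A C = diag(d)
v = np.array([1.0, -2.0])
y = C.T @ v                          # coordinates of v w.r.to the orthonormal eigenbasis

print(np.isclose(v @ A @ v, np.sum(d * y**2)))  # True: f(v) = sum_j d_j * y_j^2
print(np.all(d > 0))                            # True: all eigenvalues > 0, so f is positive definite
```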

Remark: Part (ii) of [161] is useful in analyzing conic sections. Consider the general equation b1x1^2 + 2b2x1x2 + b3x2^2 + b4x1 + b5x2 + b6 = 0 of a conic section in R^2. Collecting the second degree terms, we see that f(x1, x2) := b1x1^2 + 2b2x1x2 + b3x2^2 is a quadratic form represented by the symmetric matrix A = [[b1, b2], [b2, b3]]. Let d1, d2 be the eigenvalues of A and assume d1 ≠ d2. Find unit vectors v1, v2 ∈ R^2 with Av1 = d1v1 and Av2 = d2v2. Then {v1, v2} is an orthonormal basis for R^2, and if we take C = [cij] ∈ M(2, R) to be the matrix with columns v1, v2, then C is orthogonal and D := C^t AC = diag(d1, d2). Any v ∈ R^2 can be expanded w.r.to the orthonormal basis {v1, v2} as v = y1v1 + y2v2, and then f(v) = d1y1^2 + d2y2^2. Consider v = x1e1 + x2e2 = y1v1 + y2v2 ∈ R^2. Since v1 = c11e1 + c21e2 and v2 = c12e1 + c22e2, we get x1 = c11y1 + c12y2 and x2 = c21y1 + c22y2. In short, C(y1, y2)^t = (x1, x2)^t. So the equation of the conic section w.r.to the new orthonormal basis {v1, v2} becomes d1y1^2 + d2y2^2 + d3y1 + d4y2 + d5 = 0, where the advantage is that the y1y2 term is absent. For instance, we may now decide by completing the square whether the conic section is a parabola, an ellipse or a hyperbola; see p.231 of Apostol, Linear Algebra, for further discussion.

Remark: This is a comment about something that you are going to learn later in Multivariable Calculus. Suppose a function ψ : R^n → R has continuous partial derivatives of second order. To decide whether a ∈ R^n is a local maximum or a local minimum of ψ, one has to examine the quadratic form v ↦ v^t Av, where the ijth entry of A ∈ M(n, R) is ∂²ψ/∂xi∂xj evaluated at a.

Exercise-55: Suppose a quadratic form f : R^2 → R is represented by the symmetric matrix A = [aij] ∈ M(2, R) as f(v) = v^t Av. Then,

(i) f(v) > 0 for every v ≠ 0 ⇔ a11 > 0 and det(A) > 0.

(ii) f(v) < 0 for every v ≠ 0 ⇔ a11 < 0 and det(A) > 0.
(iii) f takes both positive and negative values ⇔ det(A) < 0.
(iv) We cannot say anything if det(A) = 0.
[Hint: Note that f(x1, x2) = a11x1^2 + 2a12x1x2 + a22x2^2 = a11(x1 + a11^{−1}a12x2)^2 + a11^{−1}det(A)x2^2, assuming a11 ≠ 0. Also, this gives f(1, 0) = a11 and f(a12, −a11) = a11 det(A).]
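As a quick sanity check of criterion (i) (an illustrative sketch, not part of the notes), one can compare the a11/det(A) test with the eigenvalue test of [161](iii) on random symmetric 2 × 2 matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    a11, a12, a22 = rng.normal(size=3)
    A = np.array([[a11, a12], [a12, a22]])
    by_eigenvalues = np.all(np.linalg.eigvalsh(A) > 0)   # [161](iii)
    by_minors = (a11 > 0) and (np.linalg.det(A) > 0)     # Exercise-55(i)
    assert by_eigenvalues == by_minors
print("Exercise-55(i) agrees with the eigenvalue criterion")
```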

*****