Infosys Science Foundation Series in Mathematical Sciences

Ramji Lal

Algebra 2: Linear Algebra, Galois Theory, Representation Theory, Group Extensions and Schur Multiplier

Infosys Science Foundation Series

Infosys Science Foundation Series in Mathematical Sciences

Series editors , University of Michigan, USA Irene Fonseca, Mellon College of Science, USA

Editorial Board Chandrasekhar Khare, University of California, USA , Tata Institute of Fundamental Research, , , Indian Institute of Technology Kanpur, India S.R.S. Varadhan, Courant Institute of Mathematical Sciences, USA Weinan E, Princeton University, USA

The Infosys Science Foundation Series in Mathematical Sciences is a sub-series of The Infosys Science Foundation Series. This sub-series focuses on high-quality content in the domain of mathematical sciences and various disciplines of mathematics, statistics, bio-mathematics, financial mathematics, applied mathematics, operations research, applied statistics, and computer science. All content published in the sub-series is written, edited, or vetted by the laureates or jury members of the . With the Series, Springer and the Infosys Science Foundation hope to provide readers with monographs, handbooks, professional books, and textbooks of the highest academic quality on current topics in relevant disciplines. Literature in this sub-series will appeal to a wide audience of researchers, students, educators, and professionals across mathematics, applied mathematics, statistics, and computer science disciplines.

More information about this series at http://www.springer.com/series/13817


Ramji Lal
Harish-Chandra Research Institute (HRI)
Allahabad, Uttar Pradesh
India

ISSN 2363-6149 ISSN 2363-6157 (electronic)
Infosys Science Foundation Series
ISSN 2364-4036 ISSN 2364-4044 (electronic)
Infosys Science Foundation Series in Mathematical Sciences
ISBN 978-981-10-4255-3 ISBN 978-981-10-4256-0 (eBook)
DOI 10.1007/978-981-10-4256-0

Library of Congress Control Number: 2017935547

© Springer Nature Singapore Pte Ltd. 2017 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Dedicated to the memory of my mother (Late) Smt Murti Devi, my father (Late) Sri Sankatha Prasad Lal, and my father-like brother (Late) Sri Gopal Lal

Preface

Algebra has played a central and decisive role in all branches of mathematics and, in turn, in all branches of science and engineering. It is not possible for a lecturer to cover, physically in a classroom, the amount of algebra which a graduate student (irrespective of the branch of science, engineering, or mathematics in which he prefers to specialize) needs to master. In addition, there is a variety of students in a class. Some of them grasp the material very fast and do not need much assistance. At the same time, there are serious students who can do equally well by putting in a little more effort. They need more illustrations and more exercises to develop their skill and confidence in the subject by solving problems on their own. Again, it is not possible for a lecturer to do sufficiently many illustrations and exercises in the classroom for this purpose. This is one of the considerations which prompted me to write a series of three volumes on the subject, starting from the undergraduate level and going up to the advanced postgraduate level. Each volume is rich in illustrations and examples, together with numerous exercises. These volumes also cater to the needs of talented students with difficult, challenging, and motivating exercises which were responsible for further developments in mathematics. Occasionally, exercises demonstrating applications in different disciplines are also included. The books may also act as a guide to teachers giving the courses. Researchers working in the field may also find them useful. The first volume consists of 11 chapters; it starts with the language of mathematics (logic and set theory) and centers around an introduction to the basic algebraic structures, viz., groups, rings, polynomial rings, and fields, together with the fundamentals of arithmetic. This volume serves as a basic text for the first-year course in algebra at the undergraduate level.
Since that volume is the first introduction to abstract algebraic structures, we proceed rather leisurely in it as compared with the other volumes. The present (second) volume contains 10 chapters, which include the fundamentals of linear algebra, the structure theory of fields and Galois theory, representation theory of groups, and the theory of group extensions. Needless to say, linear algebra is the most applicable branch of mathematics, and it is essential for students of any discipline to develop expertise in it. As such, linear algebra is an integral part of the syllabus at the undergraduate level. Indeed, a very significant and essential part (Chaps. 1–5) of the linear algebra covered in this volume does not require any background material from Volume 1 except some set theory. General linear algebra over rings, Galois theory, representation theory of groups, and the theory of group extensions follow linear algebra, and these are part of the syllabus for second- and third-year students at most universities. As such, this volume together with the first volume may serve as a basic text for the first-, second-, and third-year courses in algebra. The third volume of the book contains 10 chapters, and it can act as a text for graduate and advanced graduate students specializing in mathematics. It includes commutative algebra, basics of algebraic geometry, semi-simple Lie algebras, advanced representation theory, and Chevalley groups. The table of contents gives an idea of the subject matter covered in the book. No prerequisites are essential for the book, except that some calculus, geometry, or topology may occasionally be needed in illustrations and exercises. An attempt has been made to follow a logical ordering throughout the book. My teacher (Late) Prof. B.L. Sharma, my colleague at the University of Allahabad, my friend Dr. H.S. Tripathi, my students Prof. R.P. Shukla, Prof. Shivdatt, Dr. Brajesh Kumar Sharma, Mr. Swapnil Srivastava, Dr. Akhilesh Yadav, Dr. Vivek Jain, Dr. Vipul Kakkar, and above all, the mathematics students of the University of Allahabad have always been the motivating force for me to write a book. Without their continuous insistence, it would not have come in its present form. I wish to express my warmest thanks to all of them.
Harish-Chandra Research Institute (HRI), Allahabad, has always been a great source of learning mathematics for me. I wish to express my deep sense of appreciation and thanks to HRI for providing me all the infrastructural facilities to write these volumes. Last but not least, I wish to express my thanks to my wife Veena Srivastava, who has always been helpful in this endeavor. In spite of all care, some mistakes and misprints might have crept in and escaped my attention. I shall be grateful to anyone who brings them to my attention. Criticisms and suggestions for the improvement of the book will be appreciated and gratefully acknowledged.

Allahabad, India
April 2017
Ramji Lal

Contents

1 Vector Spaces, p. 1
1.1 Concept of a Field, p. 1
1.2 Concept of a Vector Space (Linear Space), p. 7
1.3 Subspaces, p. 11
1.4 Basis and Dimension, p. 16
1.5 Direct Sum of Vector Spaces, Quotient of a Vector Space, p. 23
2 Matrices and Linear Equations, p. 31
2.1 Matrices and Their Algebra, p. 31
2.2 Types of Matrices, p. 35
2.3 System of Linear Equations, p. 40
2.4 Gauss Elimination, Elementary Operations, Rank, and Nullity, p. 43
2.5 LU Factorization, p. 58
2.6 Equivalence of Matrices, Normal Form, p. 60
2.7 Congruent Reduction of Symmetric Matrices, p. 65
3 Linear Transformations, p. 73
3.1 Definition and Examples, p. 73
3.2 Isomorphism Theorems, p. 75
3.3 Space of Linear Transformations, Dual Spaces, p. 79
3.4 Rank and Nullity, p. 83
3.5 Matrix Representations of Linear Transformations, p. 85
3.6 Effect of Change of Bases on Matrix Representation, p. 88
4 Inner Product Spaces, p. 97
4.1 Definition, Examples, and Basic Properties, p. 97
4.2 Gram–Schmidt Process, p. 107
4.3 Orthogonal Projection, Shortest Distance, p. 112
4.4 Isometries and Rigid Motions, p. 120
5 Determinants and Forms, p. 131
5.1 Determinant of a Matrix, p. 131
5.2 Permutations, p. 135
5.3 Alternating Forms, Determinant of an Endomorphism, p. 139
5.4 Invariant Subspaces, Eigenvalues, p. 150
5.5 Spectral Theorem, and Orthogonal Reduction, p. 159
5.6 Bilinear and Quadratic Forms, p. 176
6 Canonical Forms, Jordan and Rational Forms, p. 195
6.1 Concept of a Module over a Ring, p. 195
6.2 Modules over P.I.D., p. 203
6.3 Rational and Jordan Forms, p. 214
7 General Linear Algebra, p. 229
7.1 Noetherian Rings and Modules, p. 229
7.2 Free, Projective, and Injective Modules, p. 234
7.3 Tensor Product and Exterior Power, p. 250
7.4 Lower K-theory, p. 258
8 Field Theory, Galois Theory, p. 265
8.1 Field Extensions, p. 265
8.2 Galois Extensions, p. 275
8.3 Splitting Field, Normal Extensions, p. 284
8.4 Separable Extensions, p. 294
8.5 Fundamental Theorem of Galois Theory, p. 305
8.6 Cyclotomic Extensions, p. 311
8.7 Geometric Constructions, p. 318
8.8 Galois Theory of Equation, p. 324
9 Representation Theory of Finite Groups, p. 331
9.1 Semi-simple Rings and Modules, p. 331
9.2 Representations and Group Algebras, p. 346
9.3 Characters, Orthogonality Relations, p. 351
9.4 Induced Representations, p. 361
10 Group Extensions and Schur Multiplier, p. 367
10.1 Schreier Group Extensions, p. 368
10.2 Obstructions and Extensions, p. 391
10.3 Central Extensions, Schur Multiplier, p. 398
10.4 Lower K-Theory Revisited, p. 418
Bibliography, p. 427
Index, p. 429

About the Author

Ramji Lal is Adjunct Professor at the Harish-Chandra Research Institute (HRI), Allahabad, Uttar Pradesh. He started his research career at the Tata Institute of Fundamental Research (TIFR), Mumbai, and served at the University of Allahabad in different capacities for over 43 years: as a Professor, Head of the Department, and the Coordinator of the DSA Program. He was associated with HRI, where he initiated a postgraduate (PG) program in mathematics and coordinated the Nurture Program of National Board for Higher Mathematics (NBHM) from 1996 to 2000. After his retirement from the University of Allahabad, he was Advisor cum Adjunct Professor at the Indian Institute of Information Technology (IIIT), Allahabad, for over 3 years. His areas of interest include group theory, algebraic K-theory, and representation theory.

Notations from Algebra 1

$\langle a \rangle$ Cyclic subgroup generated by a, p. 122
$a/b$ a divides b, p. 57
$a \sim b$ a is an associate of b, p. 57
$A^t$ The transpose of a matrix A, p. 200
$A^H$ The Hermitian conjugate of a matrix A, p. 215
Aut(G) The automorphism group of G, p. 105
$A_n$ The alternating group of degree n, p. 175
$B(n, R)$ Borel subgroup, p. 187
$C_G(H)$ The centralizer of H in G, p. 159
$\mathbb{C}$ The field of complex numbers, p. 78
$D_n$ The dihedral group of order 2n, p. 90
det Determinant map, p. 191
End(G) Semigroup of endomorphisms of G, p. 105
f(A) Image of A under the map f, p. 34
$f^{-1}(B)$ Inverse image of B under the map f, p. 34
$f|_Y$ Restriction of the map f to Y, p. 30
$E^{\lambda}_{ij}$ Transvections, p. 200
Fit(G) Fitting subgroup, p. 353
g.c.d. Greatest common divisor, p. 58
g.l.b. Greatest lower bound, or inf, p. 40
$G/^{l}H$ ($G/^{r}H$) The set of left (right) cosets of G mod H, p. 135
G/H The quotient group of G modulo H, p. 151
$[G : H]$ The index of H in G, p. 135
$|G|$ Order of G, p. 331
$G' = [G, G]$ Commutator subgroup of G, p. 403
$G^{(n)}$ nth term of the derived series of G, p. 345
$GL(n, R)$ General linear group, p. 186
$I_X$ Identity map on X, p. 30
$i_Y$ Inclusion map from Y, p. 30
Inn(G) The group of inner automorphisms, p. 407
ker f The kernel of the map f, p. 35
$L_n(G)$ nth term of the lower central series of G, p. 281
l.c.m. Least common multiple, p. 58
l.u.b. Least upper bound, or sup, p. 40
$M_n(R)$ The ring of $n \times n$ matrices with entries in R, p. 350
$\mathbb{N}$ Natural number system, p. 21
$N_G(H)$ Normalizer of H in G, p. 159
O(n) Orthogonal group, p. 197
O(1, n) Lorentz orthogonal group, p. 201
PSO(1, n) Positive special Lorentz orthogonal group, p. 201
$\mathbb{Q}$ The field of rational numbers, p. 74
$Q_8$ The quaternion group, p. 88
$\mathbb{R}$ The field of real numbers, p. 75
R(G) Radical of G, p. 346
$S_n$ Symmetric group of degree n, p. 88
Sym(X) Symmetric group on X, p. 88
$S^3$ The group of unit quaternions, p. 92
$\langle S \rangle$ Subgroup generated by a subset S, p. 116
$SL(n, R)$ Special linear group, p. 196
SO(n) Special orthogonal group, p. 197
SO(1, n) Special Lorentz orthogonal group, p. 201
$Sp(2n, R)$ Symplectic group, p. 202
SU(n) Special unitary group, p. 202
U(n) Unitary group, p. 202
$U_m$ Group of prime residue classes modulo m, p. 100
$V_4$ Klein's four group, p. 102
X/R The quotient set of X modulo R, p. 36
$R_x$ Equivalence class modulo R determined by x, p. 27
$X^+$ Successor of X, p. 20
$X^Y$ The set of maps from Y to X, p. 34
$\subsetneq$ Proper subset, p. 14
$\wp(X)$ Power set of X, p. 19
$\prod_{k=1}^{n} G_k$ Direct product of groups $G_k$, $1 \le k \le n$, p. 142
$\trianglelefteq$ Normal subgroup, p. 147
$\trianglelefteq\trianglelefteq$ Subnormal subgroup, p. 332
Z(G) Center of G, p. 108
$\mathbb{Z}_m$ The ring of residue classes modulo m, p. 256
p(n) The number of partitions of n, p. 172
$H \rtimes K$ Semidirect product of H with K, p. 204
$\sqrt{A}$ Radical of an ideal A, p. 286
R(G) Semigroup ring of a ring R over a semigroup G, p. 238
R[X] Polynomial ring over the ring R in one variable, p. 240
$R[X_1, X_2, \ldots, X_n]$ Polynomial ring in several variables, p. 247
$\mu$ The Möbius function, p. 256
$\sigma$ Sum of divisors function, p. 256
$\left(\frac{a}{p}\right)$ Legendre symbol, p. 280
Stab(G, X) Stabilizer of an action of G on X, p. 295
$G_x$ Isotropy subgroup of an action of G at x, p. 295
$X^G$ Fixed point set of an action of G on X, p. 296
$Z_n(G)$ nth term of the upper central series of G, p. 351
$\Phi(G)$ The Frattini subgroup of G, p. 355

Notations from Algebra 2

$B^2_{\sigma}(K, H)$ Group of 2-coboundaries with given σ, p. 385
C(A) Column space of A, p. 42
Ch(G, K) Set of characters from G to K, p. 278
Ch(G) Character ring of G, p. 350
dim(V) Dimension of V, p. 18
EXT Category of Schreier group extensions, p. 368
E(H, K) The set of equivalence classes of extensions of H by K, p. 376
E1 ] E2 Baer sum of extensions, p. 388
$EXT_{\psi}(H, K)$ Set of equivalence classes of extensions associated to abstract kernel ψ, p. 384
E(V) Exterior algebra of V, p. 257
FACS Category of factor systems, p. 375
F(X) The fixed field of a set X of automorphisms of a field, p. 275
G(L/K) The Galois group of the field extension L of K, p. 275
$G \wedge G$ Non-abelian exterior square of a group G, p. 413
$\overline{K}$ Algebraic closure of K, p. 289
$H^2_{\sigma}(K, H)$ Second cohomology with given σ, p. 385
$K_0(R)$ Grothendieck group of the ring R, p. 257
$K_1(R)$ Whitehead group of the ring R, p. 260
$K^L_S$ Separable closure of K in L, p. 295
L/K Field extension L of K, p. 262
$m_T(X)$ Minimum polynomial of linear transformation T, p. 212
$\min_K(\alpha)(X)$ Minimum polynomial of α over the field K, p. 265
M(V) Group of rigid motions on V, p. 122
$M \otimes_R N$ Tensor product of R-modules M and N, p. 250
$N_{L/K}$ Norm map from L to K, p. 279
N(A) Null space of A, p. 41
Obs(ψ) Obstruction of the abstract kernel ψ, p. 393
R(A) Row space of A, p. 42
St(R) Steinberg group, p. 422
$Sym^r(V)$ rth symmetric power of V, p. 345
$T_{L/K}$ Trace map from L to K, p. 314
T(V) Tensor algebra of V, p. 257
$T_S$ Semi-simple part of T, p. 219
$T_n$ Nilpotent part of T, p. 220
$Z^2_{\sigma}(K, H)$ Group of 2-cocycles with given σ, p. 385
$\bigwedge^r V$ rth exterior power of V, p. 255
$\Psi_E$ Abstract kernel associated to the extension E, p. 377
$\rho \oplus \eta$ Direct sum of representations ρ and η, p. 345
$\rho \otimes \eta$ Tensor product of representations ρ and η, p. 345
$Sym^r \rho$ rth symmetric power of the representation ρ, p. 345
SFV(L/K) Set of all intermediary fields of L/K, p. 275
$\bigwedge^r \rho$ rth exterior power of the representation ρ, p. 345
$\chi_{\rho}$ Character afforded by the representation ρ, p. 350
$\Phi_n(X)$ nth cyclotomic polynomial, p. 311
$\Phi_A(X)$ Characteristic polynomial of A, p. 149

Chapter 1 Vector Spaces

This chapter is devoted to the structure theory of vector spaces over arbitrary fields. In essence, a vector space is a structure in which we can perform all the basic operations of vector algebra, and in which we can talk of lines, planes, and linear equations. The basic motivating examples on which we shall dwell are the Euclidean 3-space $\mathbb{R}^3$ over $\mathbb{R}$ in which we live, the Minkowski space $\mathbb{R}^4$ of events (in which the first three coordinates represent the place and the fourth coordinate represents the time of the occurrence of the event), and the space of matrices.

1.1 Concept of a Field

Rings and fields have been introduced and studied in Algebra 1. However, to make the linear algebra part (Chaps. 1–5) of this volume independent of Algebra 1, we quickly recall the concept of a field and its basic properties. A field is an algebraic structure in which we can perform all the arithmetical operations, viz., addition, subtraction, multiplication, and division by nonzero members. The basic motivating examples are the structure $\mathbb{Q}$ of rational numbers, the structure $\mathbb{R}$ of real numbers, and the structure $\mathbb{C}$ of complex numbers with the usual operations. The precise definition of a field is as follows:

Definition 1.1.1 A field is a triple $(F, +, \cdot)$, where F is a set, and + and · are two internal binary operations on F, called the addition and the multiplication, such that the following hold:
1. $(F, +)$ is an abelian group in the following sense:
(i) The operation + is associative in the sense that $(a + b) + c = a + (b + c)$ for all $a, b, c \in F$.
(ii) The operation + is commutative in the sense that $a + b = b + a$ for all $a, b \in F$.

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_1

(iii) There is a unique element $0 \in F$, called the zero of F, such that $a + 0 = a = 0 + a$ for all $a \in F$.
(iv) For all $a \in F$, there is a unique element $-a \in F$, called the negative of a, such that $a + (-a) = 0 = -a + a$.
2. (i) The operation · is associative in the sense that $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in F$.
(ii) The operation · is commutative in the sense that $a \cdot b = b \cdot a$ for all $a, b \in F$.
3. The operation · distributes over + in the sense that (i) $a \cdot (b + c) = a \cdot b + a \cdot c$, and (ii) $(a + b) \cdot c = a \cdot c + b \cdot c$ for all $a, b, c \in F$.
4. (i) There is a unique element $1 \in F - \{0\}$, called the one of F, such that $1 \cdot a = a = a \cdot 1$ for all $a \in F$.
(ii) For all $a \in F - \{0\}$, there is a unique element $a^{-1} \in F$, called the multiplicative inverse of a, such that $a \cdot a^{-1} = 1 = a^{-1} \cdot a$.

Before giving examples, let us observe some simple facts:

Proposition 1.1.2 Let $(F, +, \cdot)$ be a field.
(i) The cancellation law holds for the addition + in F in the sense that $a + b = a + c$ implies $b = c$. Similarly, $b + a = c + a$ implies $b = c$.
(ii) $a \cdot 0 = 0 = 0 \cdot a$ for all $a \in F$.
(iii) $a \cdot (-b) = -(a \cdot b) = (-a) \cdot b$ for all $a, b \in F$.
(iv) The restricted cancellation law for the multiplication in F holds in the sense that ($a \neq 0$ and $a \cdot b = a \cdot c$) implies $b = c$. Similarly, ($a \neq 0$ and $b \cdot a = c \cdot a$) implies $b = c$.
(v) $a \cdot b = 0$ implies that $a = 0$ or $b = 0$.

Proof (i) Suppose that $a + b = a + c$. Then $b = 0 + b = (-a + a) + b = -a + (a + b) = -a + (a + c) = (-a + a) + c = 0 + c = c$. (ii) $0 + a \cdot 0 = a \cdot 0 = a \cdot (0 + 0) = a \cdot 0 + a \cdot 0$. Using the cancellation law for +, we get that $0 = a \cdot 0$. Similarly, $0 = 0 \cdot a$. (iii) $0 = a \cdot 0 = a \cdot (b + (-b)) = a \cdot b + a \cdot (-b)$. It follows that $a \cdot (-b) = -(a \cdot b)$. Similarly, the other part follows. (iv) Suppose that $a \neq 0$ and $a \cdot b = a \cdot c$. Then $b = 1 \cdot b = (a^{-1} \cdot a) \cdot b = a^{-1} \cdot (a \cdot b) = a^{-1} \cdot (a \cdot c) = (a^{-1} \cdot a) \cdot c = 1 \cdot c = c$. Similarly, the other part follows. (v) Suppose that $a \cdot b = 0$. If $a = 0$, there is nothing to do. Suppose that $a \neq 0$. Then $a \cdot b = 0 = a \cdot 0$. From (iv), it follows that $b = 0$. □

Integral Multiples and the Integral Powers of Elements of a Field

Let $a \in F$. For each natural number n, we define the multiple na inductively as follows: Define $1a = a$. Assuming that na is defined, define $(n + 1)a = na + a$.

Thus, for a natural number n, $na = a + a + \cdots + a$ (n times). We define $0a = 0$. Further, if $m = -n$ is a negative integer, then we define $ma = n(-a)$. Thus, for a negative integer $m = -n$, $ma = (-a) + (-a) + \cdots + (-a)$ (n times). This defines the integral multiple na for each integer n. Similarly, we define all integral powers of a nonzero element a of F as follows: Define $a^1 = a$. Assuming that $a^n$ has already been defined, define $a^{n+1} = a^n \cdot a$. This defines all positive integral powers of a. Define $a^0 = 1$, and for a negative integer $n = -m$, define $a^n = (a^{-1})^m$. The following laws of exponents follow immediately by induction:
(i) $(n + m)a = na + ma$ for all $n, m \in \mathbb{Z}$.
(ii) $(nm)a = n(ma)$ for all $n, m \in \mathbb{Z}$.
(iii) $a^{n+m} = a^n \cdot a^m$ for all $a \in F - \{0\}$ and $n, m \in \mathbb{Z}$.
(iv) $a^{nm} = (a^n)^m$ for all $a \in F - \{0\}$ and $n, m \in \mathbb{Z}$.

Examples of Fields

Example 1.1.3 The rational number system $\mathbb{Q}$, the real number system $\mathbb{R}$, and the complex number system $\mathbb{C}$ with the usual addition and multiplication are basic examples of fields.

Example 1.1.4 Consider $F = \mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} \mid a, b \in \mathbb{Q}\}$. The addition and multiplication in $\mathbb{R}$ induce the corresponding operations in $\mathbb{Q}(\sqrt{2})$. We claim that $\mathbb{Q}(\sqrt{2})$ is a field with respect to the induced operations. All the defining properties of a field are consequences of the corresponding properties in $\mathbb{R}$ except, perhaps, 4(ii), which we verify. Let $a, b \in \mathbb{Q}$ be such that $a + b\sqrt{2} \neq 0$. We claim that $a^2 - 2b^2 \neq 0$. Suppose not. Then $a^2 - 2b^2 = 0$. In turn, $b = 0$ (and so also $a = 0$), for otherwise $(a/b)^2 = 2$, a contradiction to the fact that $\sqrt{2}$ is not a rational number. But $a = b = 0$ contradicts $a + b\sqrt{2} \neq 0$. Thus,
$$\frac{1}{a + b\sqrt{2}} = \frac{a - b\sqrt{2}}{a^2 - 2b^2} = \frac{a}{a^2 - 2b^2} + \frac{-b}{a^2 - 2b^2}\sqrt{2}$$
is in $\mathbb{Q}(\sqrt{2})$.

Remark 1.1.5 There is nothing special about 2 in the above example; indeed, we can take any prime, or for that matter any rational number which is not the square of a rational number, in place of 2.
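The inverse formula of Example 1.1.4, together with the inductive definition of integral powers, can be checked with exact rational arithmetic. The following Python sketch is only an illustration: the class name `QSqrt2` is hypothetical, and an element $a + b\sqrt{2}$ is stored as the pair (a, b) of `fractions.Fraction` values, so no floating-point approximation of $\sqrt{2}$ is involved.

```python
from fractions import Fraction


class QSqrt2:
    """An element a + b*sqrt(2) of Q(sqrt 2), stored as two exact rationals."""

    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __neg__(self):
        return QSqrt2(-self.a, -self.b)

    def __mul__(self, other):
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # Example 1.1.4: a² − 2b² ≠ 0 whenever a + b√2 ≠ 0, so the formula applies.
        norm = self.a ** 2 - 2 * self.b ** 2
        if norm == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return QSqrt2(self.a / norm, -self.b / norm)

    def __pow__(self, n):
        # a^0 = 1, a^(k+1) = a^k · a, and a^n = (a^(-1))^m for n = −m < 0.
        base = self if n >= 0 else self.inverse()
        result = QSqrt2(1)
        for _ in range(abs(n)):
            result = result * base
        return result

    def __repr__(self):
        return f"QSqrt2({self.a}, {self.b})"


x = QSqrt2(1, 1)            # 1 + √2
print(x.inverse())          # QSqrt2(-1, 1), i.e. −1 + √2
```

For instance, $(1 + \sqrt{2})^{-1} = -1 + \sqrt{2}$, since $(1 + \sqrt{2})(-1 + \sqrt{2}) = 2 - 1 = 1$.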

So far, all the examples of fields are infinite. Now, we give an example of a finite field. Let p be a positive prime integer. Consider the set $\mathbb{Z}_p = \{\overline{0}, \overline{1}, \overline{2}, \ldots, \overline{p-1}\}$ of residue classes modulo p. Clearly, $\overline{a} = \overline{r}$, where r is the remainder obtained when a is divided by p. The usual addition ⊕ modulo p and the multiplication ⊙ modulo p are given by
$$\overline{i} \oplus \overline{j} = \overline{i + j}, \qquad \overline{i} \odot \overline{j} = \overline{i \cdot j}, \qquad i, j \in \mathbb{Z}.$$
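Representing each class $\overline{a}$ by its remainder r with $0 \le r < p$, the operations ⊕ and ⊙ reduce to ordinary integer arithmetic followed by taking a remainder. A minimal Python sketch (the function names `residue`, `add_mod`, and `mul_mod` are hypothetical), illustrated in $\mathbb{Z}_{11}$:

```python
def residue(a: int, p: int) -> int:
    """The remainder r, 0 <= r < p, that represents the class of a in Z_p."""
    return a % p  # for p > 0, Python's % is never negative: -1 % 11 == 10


def add_mod(i: int, j: int, p: int) -> int:
    """The class i ⊕ j, i.e. the class of i + j."""
    return residue(i + j, p)


def mul_mod(i: int, j: int, p: int) -> int:
    """The class i ⊙ j, i.e. the class of i · j."""
    return residue(i * j, p)


print(add_mod(6, 7, 11), mul_mod(6, 7, 11))      # 2 9
# Changing representatives (6 -> 17, 7 -> -4) does not change the classes.
print(add_mod(17, -4, 11), mul_mod(17, -4, 11))  # 2 9
```

The second line of output illustrates why ⊕ and ⊙ are well defined: the result depends only on the classes, not on the chosen representatives.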

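For example, in $\mathbb{Z}_{11}$, $\overline{6} \oplus \overline{7} = \overline{13} = \overline{2}$. Similarly, the product $\overline{6} \odot \overline{7} = \overline{42} = \overline{9}$. We have the following proposition.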

Proposition 1.1.6 For any prime p, the triple $(\mathbb{Z}_p, \oplus, \odot)$ introduced above is a field containing p elements.

Proof Clearly, $\overline{1}$ is the identity with respect to ⊙. We verify only the postulate 4(ii) in the definition of a field. The rest of the postulates are almost evident and can be verified easily. In fact, we give an algorithm (using the Euclidean algorithm) to find the multiplicative inverse of a nonzero element $\overline{i} \in \mathbb{Z}_p$. Let $\overline{i} \in \mathbb{Z}_p - \{\overline{0}\}$. Then p does not divide i. Since p is prime, the greatest common divisor of i and p is 1. Using the Euclidean algorithm, we can find integers b and c such that

$$1 = i \cdot b + p \cdot c.$$

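Since $\overline{p \cdot c} = \overline{0}$, it follows that $\overline{1} = \overline{i \cdot b} = \overline{i} \odot \overline{b}$. Thus, $\overline{b}$ is the inverse of $\overline{i}$ with respect to ⊙. □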
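The proof of Proposition 1.1.6 translates directly into a procedure. The sketch below assumes the standard iterative form of the extended Euclidean algorithm; the names `extended_gcd` and `inverse_mod` are hypothetical. It returns the representative b with $0 \le b < p$, so that $\overline{b}$ is the inverse of $\overline{i}$.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) and g = a*x + b*y."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # invariants: old_r = a*old_x + b*old_y and r = a*x + b*y
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def inverse_mod(i: int, p: int) -> int:
    """Representative b, 0 <= b < p, of the inverse of the class of i in Z_p."""
    g, b, _c = extended_gcd(i % p, p)   # g = (i mod p)*b + p*c
    if g != 1:
        raise ValueError(f"the class of {i} is not invertible modulo {p}")
    return b % p


print(inverse_mod(6, 11))  # 2, since 6 * 2 = 12 ≡ 1 (mod 11)
```

Python 3.8 and later also provide the built-in `pow(i, -1, p)`, which computes the same modular inverse; the explicit version above simply follows the steps of the proof.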

The above proof is constructive: it gives an algorithm to find the multiplicative inverse of nonzero elements in $\mathbb{Z}_p$.

Definition 1.1.7 Let $(F, +, \cdot)$ be a field. A subset L of F is called a subfield of F if the following hold:
(i) $0 \in L$.
(ii) If $a, b \in L$, then $a + b \in L$ and $a \cdot b \in L$.
(iii) $1 \in L$.
(iv) For all $a \in L$, $-a \in L$.
(v) For all $a \in L - \{0\}$, $a^{-1} \in L$.
Thus, a subfield L of a field F is also a field in its own right with respect to the induced operations. The field F is a subfield of itself; this subfield is called the improper subfield of F. Other subfields are called proper subfields. The set $\mathbb{Q}$ of rational numbers and the set $\mathbb{Q}(\sqrt{2})$ described in Example 1.1.4 are proper subfields of the field $\mathbb{R}$ of real numbers. The field $\mathbb{R}$ of real numbers is a subfield of the field $\mathbb{C}$ of complex numbers.

Proposition 1.1.8 The field $\mathbb{Q}$ of rational numbers and the field $\mathbb{Z}_p$ have no proper subfields.

Proof We first show that $\mathbb{Q}$ has no proper subfields. Let L be a subfield of $\mathbb{Q}$. Then, by Definition 1.1.7(iii), $1 \in L$. Again, by (ii), $n = 1 + 1 + \cdots + 1$ (n times) belongs to L for all natural numbers n. Thus, by (i) and (iv), all integers are in L. By (v), $\frac{1}{n} \in L$ for all nonzero integers n. By (ii), $\frac{m}{n} \in L$ for all integers m, n with $n \neq 0$. This shows that $L = \mathbb{Q}$.

Next, let L be a subfield of $\mathbb{Z}_p$. Then, by Definition 1.1.7(iii), $\overline{1} \in L$. By (ii), $\overline{i} = \overline{1} \oplus \overline{1} \oplus \cdots \oplus \overline{1}$ (i times) belongs to L for $1 \le i \le p - 1$, and $\overline{0} \in L$ by (i). This shows that $L = \mathbb{Z}_p$. □

We shall see that, essentially, these are the only fields which have no proper subfields. Such fields are called prime fields.

Homomorphisms and Isomorphisms Between Fields

Definition 1.1.9 Let $F_1$ and $F_2$ be fields. A map f from $F_1$ to $F_2$ is called a field homomorphism if the following conditions hold:

(i) $f(a + b) = f(a) + f(b)$ for all $a, b \in F_1$ (note that + on the left-hand side is the addition of $F_1$, and that on the right-hand side is the addition of $F_2$).
(ii) $f(a \cdot b) = f(a) \cdot f(b)$ for all $a, b \in F_1$ (again, · on the left-hand side is the multiplication of $F_1$, and that on the right-hand side is the multiplication of $F_2$).
(iii) $f(1) = 1$, where 1 on the left-hand side denotes the multiplicative identity of $F_1$, and 1 on the right-hand side denotes the multiplicative identity of $F_2$.
A bijective homomorphism is called an isomorphism. A field $F_1$ is said to be isomorphic to a field $F_2$ if there is an isomorphism from $F_1$ to $F_2$. We do not distinguish isomorphic fields.

Proposition 1.1.10 Let f be a homomorphism from a field $F_1$ to a field $F_2$. Then the following hold.

(i) $f(0) = 0$, where 0 on the left-hand side is the zero of $F_1$, and 0 on the right-hand side is the zero of $F_2$.
(ii) $f(-a) = -f(a)$ for all $a \in F_1$.
(iii) $f(na) = nf(a)$ for all $a \in F_1$ and all integers n.
(iv) $f(a^n) = (f(a))^n$ for all $a \in F_1 - \{0\}$ and all integers n.
(v) f is injective, and the image of $F_1$ under f is a subfield of $F_2$ which is isomorphic to $F_1$.

Proof (i) $0 + f(0) = f(0) = f(0 + 0) = f(0) + f(0)$. Using the cancellation law for addition in $F_2$, we get that $f(0) = 0$. (ii) $0 = f(0) = f(a + (-a)) = f(a) + f(-a)$. This shows that $f(-a) = -f(a)$. (iii) Suppose that $n = 0$. Then $0f(a) = 0 = f(0) = f(0a)$. Clearly, $f(1a) = f(a) = 1f(a)$. Assume that $f(na) = nf(a)$ for a natural number n. Then $f((n + 1)a) = f(na + a) = f(na) + f(a) = nf(a) + f(a) = (n + 1)f(a)$. By induction, it follows that $f(na) = nf(a)$ for all $a \in F_1$ and all natural numbers n. Suppose that $n = -m$ is a negative integer. Then $f(na) = f((-m)a) = f(-(ma)) = -f(ma) = -(mf(a)) = (-m)f(a) = nf(a)$. (iv) Replacing na by $a^n$, imitate the proof of (iii). (v) Suppose that $a \neq b$. Then $a - b \neq 0$. Now, $1 = f(1) = f((a - b)(a - b)^{-1}) = f(a - b)f((a - b)^{-1})$. Since $1 \neq 0$, it follows that $f(a) - f(b) = f(a - b) \neq 0$. This shows that $f(a) \neq f(b)$. Thus, f is injective, and it can be realized as a bijective map from $F_1$ to $f(F_1)$. It is sufficient, therefore, to show that $f(F_1)$ is a subfield of $F_2$. Clearly, $0 = f(0)$ and $1 = f(1)$ belong to $f(F_1)$. Let $f(a), f(b) \in f(F_1)$, where $a, b \in F_1$. Then $f(a) + f(b) = f(a + b) \in f(F_1)$, $f(a)f(b) = f(ab) \in f(F_1)$, and, by (ii), $-f(a) = f(-a) \in f(F_1)$. Finally, if $f(a) \neq 0$, then $a \in F_1 - \{0\}$ by (i), and so $(f(a))^{-1} = f(a^{-1}) \in f(F_1)$. □

Characteristic of a Field

Let F be a field. Consider the multiplicative identity 1 of F. There are two cases:
(i) The integral multiples of 1 are pairwise distinct; equivalently, $n1 = m1$ implies that $n = m$. This is equivalent to saying that $n1 = 0$ if and only if $n = 0$. In this case, we say that F is of characteristic 0.
Thus, for example, the field $\mathbb{R}$ of real numbers, the field $\mathbb{Q}$ of rational numbers, and the field $\mathbb{C}$ of complex numbers are fields of characteristic 0.
(ii) Not all integral multiples of 1 are distinct. In this case, there exists a pair n, m of distinct integers such that $n1 = m1$. But then $(n - m)1 = 0 = (m - n)1$. In turn, there is a natural number l such that $l1 = 0$. In this case, the smallest natural number l such that $l1 = 0$ is called the characteristic of F. Thus, for example, the characteristic of $\mathbb{Z}_p$ is p.

Proposition 1.1.11 The characteristic of a field is either 0 or a prime number p. A field of characteristic 0 contains a subfield isomorphic to the field $\mathbb{Q}$ of rational numbers, and a field of characteristic p contains a subfield isomorphic to the field $\mathbb{Z}_p$.

Proof Suppose that F is a field of characteristic 0. Then $n1 = m1$ implies that $n = m$. Also, $m1 = 0$ if and only if $m = 0$. Suppose that $\frac{m}{n} = \frac{r}{s}$ (with $n, s \neq 0$). Then $(m1)(s1) = ms1 = nr1 = (n1)(r1)$. In turn, $(m1)(n1)^{-1} = (r1)(s1)^{-1}$. Thus, we have a map f from $\mathbb{Q}$ to F given by $f(\frac{m}{n}) = (m1)(n1)^{-1}$. Next, suppose that $(m1)(n1)^{-1} = (r1)(s1)^{-1}$. Then $ms1 = (m1)(s1) = (n1)(r1) = nr1$. This means that $ms = nr$, or equivalently, $\frac{m}{n} = \frac{r}{s}$. This shows that f is an injective map. It is also straightforward to verify that f is a field homomorphism. Thus, $L = \{(m1)(n1)^{-1} \mid m \in \mathbb{Z}, n \in \mathbb{Z} - \{0\}\}$ is a subfield of F which is isomorphic to $\mathbb{Q}$.
Next, suppose that the characteristic of F is $l \neq 0$. Then l is the smallest natural number such that $l1 = 0$. We show that l is a prime p. Suppose not. Then $l = l_1 l_2$ with $1 < l_1 < l$ and $1 < l_2 < l$. But then $0 = l1 = (l_1 l_2)1 = (l_1 1)(l_2 1)$. In turn, by Proposition 1.1.2(v), $l_1 1 = 0$ or $l_2 1 = 0$. This contradicts the choice of l. Thus, the characteristic of F is a prime p. Suppose that $\overline{i} = \overline{j}$ in $\mathbb{Z}_p$. Then p divides $i - j$. In turn, $(i - j)1 = 0$, and so $i1 = j1$. Thus, we have a well-defined map f from $\mathbb{Z}_p$ to F given by $f(\overline{i}) = i1$. Clearly, this is an injective field homomorphism. □
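Case (ii) above and Proposition 1.1.11 suggest a direct computation for a finite field: add 1 to itself until 0 appears, collecting the multiples of 1 along the way. The Python sketch below assumes the field is handed over as its zero, its one, and an addition function (all hypothetical parameters; the names `characteristic` and `prime_subfield` are illustrative only). For an infinite field of characteristic 0 the loop can only give up after a fixed number of steps, so a returned 0 is a heuristic, not a proof.

```python
from fractions import Fraction
import operator


def characteristic(zero, one, add, max_steps=10_000):
    """Smallest natural number l with l·1 = 0; returns 0 if none is found
    within max_steps (only a heuristic indication of characteristic 0)."""
    multiple = one                      # invariant: multiple == l·1
    for l in range(1, max_steps + 1):
        if multiple == zero:
            return l
        multiple = add(multiple, one)
    return 0


def prime_subfield(zero, one, add, max_steps=10_000):
    """Elements 0, 1, 1+1, ..., (p−1)·1 of the prime subfield (characteristic p > 0)."""
    p = characteristic(zero, one, add, max_steps)
    if p == 0:
        raise ValueError("no finite characteristic found")
    elements, multiple = [zero], one
    for _ in range(p - 1):
        elements.append(multiple)
        multiple = add(multiple, one)
    return elements


print(characteristic(0, 1, lambda x, y: (x + y) % 7))              # 7
print(characteristic(Fraction(0), Fraction(1), operator.add, 100))  # 0 (Q)
```

With the encoding 0, 1, α, α² ↦ 0, 1, 2, 3, the addition table of Exercise 1.1.5 coincides with bitwise XOR, so `characteristic(0, 1, operator.xor)` returns 2 and the prime subfield is {0, 1}.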
Exercises

1.1.1 Show that $\mathbb{Q}(\omega) = \{a + b\omega \mid a, b \in \mathbb{Q}\}$, where ω is a primitive cube root of 1, is a subfield of the field $\mathbb{C}$ of complex numbers.

√ √ Q( ) 1.1.2 Show that√ 2√ is not a member√ of 2 . Use√ the method√ of Example 1.1.4 to show that Q( 2)( 2) ={a + b 2 + (c + d 2)( 2) | a, b, c, d ∈ Q} is a field with respect to the addition and multiplication induced by those in R. Generalize the assertion.

1.1.3 Show that $\mathbb{Q}(\sqrt{2})(\sqrt{3}) = \{a + b\sqrt{2} + (c + d\sqrt{2})\sqrt{3} \mid a, b, c, d \in \mathbb{Q}\}$ is a field with respect to the addition and multiplication induced by those in $\mathbb{R}$.

1.1.4 Show that $\mathbb{Q}(2^{1/3}) = \{a + b2^{1/3} + c2^{2/3} \mid a, b, c \in \mathbb{Q}\}$ is also a field with respect to the addition and multiplication induced by those in $\mathbb{R}$. Express $\frac{1}{1 + 2^{1/3}}$ as $a + b2^{1/3} + c2^{2/3}$, where $a, b, c \in \mathbb{Q}$.

1.1.5 Show that $F = \{0, 1, \alpha, \alpha^2\}$ is a field of characteristic 2 with respect to the addition + and multiplication · given by the following tables:

 +  |  0   1   α   α²
----+----------------
 0  |  0   1   α   α²
 1  |  1   0   α²  α
 α  |  α   α²  0   1
 α² |  α²  α   1   0

 ·  |  0   1   α   α²
----+----------------
 0  |  0   0   0   0
 1  |  0   1   α   α²
 α  |  0   α   α²  1
 α² |  0   α²  1   α
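The field axioms for tables this small can be checked exhaustively. The following C++ sketch (written in the language of Exercise 1.1.7) encodes 0, 1, α, α² as 0, 1, 2, 3. Both that encoding and the function names are our own choices. The sketch tests every axiom on all triples of elements; this is a brute-force check, not the proof technique the exercise intends.

```cpp
#include <array>
#include <cassert>

// Elements are encoded as 0 -> 0, 1 -> 1, 2 -> alpha, 3 -> alpha^2 (an encoding
// chosen for this sketch). The tables are copied from Exercise 1.1.5.
using Table = std::array<std::array<int, 4>, 4>;

const Table ADD = {{{0, 1, 2, 3},
                    {1, 0, 3, 2},
                    {2, 3, 0, 1},
                    {3, 2, 1, 0}}};

const Table MUL = {{{0, 0, 0, 0},
                    {0, 1, 2, 3},
                    {0, 2, 3, 1},
                    {0, 3, 1, 2}}};

// Brute-force check of the field axioms over all elements and triples.
bool tables_define_a_field(const Table& add, const Table& mul) {
    for (int a = 0; a < 4; ++a) {
        for (int b = 0; b < 4; ++b) {
            // commutativity of + and ·
            if (add[a][b] != add[b][a] || mul[a][b] != mul[b][a]) return false;
            for (int c = 0; c < 4; ++c) {
                if (add[add[a][b]][c] != add[a][add[b][c]]) return false;  // + associative
                if (mul[mul[a][b]][c] != mul[a][mul[b][c]]) return false;  // · associative
                if (mul[a][add[b][c]] != add[mul[a][b]][mul[a][c]]) return false;  // distributive
            }
        }
        if (add[a][0] != a || mul[a][1] != a) return false;  // identities 0 and 1
        bool has_negative = false;
        bool has_inverse = (a == 0);                          // 0 needs no inverse
        for (int b = 0; b < 4; ++b) {
            if (add[a][b] == 0) has_negative = true;
            if (a != 0 && mul[a][b] == 1) has_inverse = true;
        }
        if (!has_negative || !has_inverse) return false;
    }
    return true;
}

// Characteristic: smallest l >= 1 with 1 + 1 + ... + 1 (l terms) == 0.
// Assumes the table has finite characteristic (true for any finite field).
int characteristic(const Table& add) {
    int sum = 1, l = 1;
    while (sum != 0) {
        sum = add[sum][1];
        ++l;
    }
    return l;
}
```

Under this encoding the addition table coincides with bitwise XOR, which is one way to see at a glance that $1 + 1 = 0$.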

1.1.6 Find the multiplicative inverse of 20 in $\mathbb{Z}_{257}$, and also find the solution of $10x \oplus 2 = 3$ in $\mathbb{Z}_{257}$.

1.1.7 Write a C++ program to check whether a natural number $n$ is prime and, if so, to find the multiplicative inverse of a nonzero element $m$ in $\mathbb{Z}_n$. Find the output with $n = 2^{2^4} + 1$ and $m = 641$.
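Here is a minimal sketch for Exercises 1.1.6 and 1.1.7, assuming 64-bit integers are large enough for the inputs. Primality is tested by trial division, and inverses in $\mathbb{Z}_n$ are computed by the extended Euclidean algorithm. That algorithm is one standard method; the exercise does not prescribe one. The names `is_prime` and `inverse_mod` are our own.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Trial division by every d with d * d <= n; fast enough for n of size ~2^24.
bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Extended Euclidean algorithm: returns x in {0, ..., n-1} with m * x == 1 (mod n),
// or std::nullopt when gcd(m, n) != 1, in which case no inverse exists.
std::optional<std::int64_t> inverse_mod(std::int64_t m, std::int64_t n) {
    std::int64_t old_r = ((m % n) + n) % n, r = n;  // successive remainders
    std::int64_t old_s = 1, s = 0;                  // invariant: old_r == old_s * m (mod n)
    while (r != 0) {
        const std::int64_t q = old_r / r;
        std::int64_t tmp = old_r - q * r;
        old_r = r;
        r = tmp;
        tmp = old_s - q * s;
        old_s = s;
        s = tmp;
    }
    if (old_r != 1) return std::nullopt;            // old_r is gcd(m, n)
    return ((old_s % n) + n) % n;
}
```

For example, `inverse_mod(20, 257)` gives the inverse asked for in Exercise 1.1.6. The equation $10x \oplus 2 = 3$ there reduces to $x = 10^{-1}$ in $\mathbb{Z}_{257}$. Calling `is_prime(n)` and then `inverse_mod(m, n)` with the values of Exercise 1.1.7 answers that exercise.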

1.2 Concept of a Vector Space (Linear Space)

Consider the space (called the Euclidean 3-space) in which we live. Fix a point (place) in the 3-space as the origin, together with three mutually perpendicular lines (directions) through the origin as the axes of reference, and a line segment as the unit of length. Then every point in the 3-space determines, and is uniquely determined by, an ordered triple $(\alpha, \beta, \gamma)$ of real numbers.

[Figure: a point P(α, β, γ) in 3-space, with origin O and coordinate axes X, Y, Z]

Thus, with the choice of origin and axes made above, the space in which we live can be represented faithfully by

$\mathbb{R}^3 = \{x = (x_1, x_2, x_3) \mid x_1, x_2, x_3 \in \mathbb{R}\}$, and it is called the Euclidean 3-space. The members of $\mathbb{R}^3$ are called the usual 3-vectors. It is also evident that physical quantities which have magnitude as well as direction (e.g., force, velocity, or displacement) can be represented by vectors. More generally, for a fixed natural number $n$,

$\mathbb{R}^n = \{x = (x_1, x_2, \ldots, x_n) \mid x_1, x_2, \ldots, x_n \in \mathbb{R}\}$ is called the Euclidean n-space, and its members are called the Euclidean n-vectors. We call $x_1, x_2, \ldots, x_n$ the components, or coordinates, of the vector $x = (x_1, x_2, \ldots, x_n)$. Thus, $\mathbb{R}^2$ represents the Euclidean plane, and $\mathbb{R}^4$ represents the Minkowski space of events, in which the first three coordinates represent the place and the fourth coordinate represents the time at which the event occurs. $\mathbb{R}^1$ is identified with $\mathbb{R}$. By convention, $\mathbb{R}^0 = \{0\}$ is a single point. We have the addition + in $\mathbb{R}^n$, called the addition of vectors, defined by

$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$, where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. We also have the external multiplication · by members of $\mathbb{R}$, called the multiplication by scalars, given by $\alpha \cdot x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)$, $\alpha \in \mathbb{R}$.

Remark 1.2.1 The addition + of vectors in the 3-space $\mathbb{R}^3$ is the usual addition of vectors, which obeys the parallelogram law of addition.

The Euclidean 3-space $(\mathbb{R}^3, +, \cdot)$ introduced above is a vector space in the sense of the following definition.

Definition 1.2.2 A Vector Space (also called a Linear Space) over a field F (called the field of Scalars) is a triple (V, +, ·) with the following ingredients. V is a set. + is an internal binary operation on V, called the addition of vectors. $\cdot : F \times V \to V$ is an external multiplication, called the multiplication by scalars. These must satisfy:

A. (V, +) is an abelian group in the sense that:
1. + is associative, i.e., (x + y) + z = x + (y + z) for all x, y, z in V.
2. + is commutative, i.e., x + y = y + x for all x, y in V.
3. We have a unique vector 0 in V, called the null vector, such that x + 0 = x = 0 + x for all x in V.
4. For each x in V, we have a unique vector −x in V, called the negative of x, such that x + (−x) = 0 = (−x) + x.

B. The external multiplication · by scalars satisfies the following conditions:
1. It distributes over the vector addition + in the sense that α · (x + y) = α · x + α · y for all α ∈ F and x, y in V.
2. It also distributes over the addition of scalars in the sense that (α + β) · x = α · x + β · x for all α, β ∈ F and x in V.
3. (αβ) · x = α · (β · x) for all α, β ∈ F and x in V.
4. 1 · x = x for all x in V.

Example 1.2.3 Let F be a field, and n be a natural number. Consider the set

$V = F^n = \{x = (x_1, x_2, \ldots, x_n) \mid x_1, x_2, \ldots, x_n \in F\}$

of row vectors with n columns and with entries in F. We have the addition + in $F^n$ defined by

$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$,

where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. We also have the external multiplication · by members of F defined by

$\alpha \cdot x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)$, $\alpha \in F$.

The field properties of F ensure that the triple $(F^n, +, \cdot)$ is a vector space over F. The zero of the vector space is the zero row $0 = (0, 0, \ldots, 0)$, and the negative of $x = (x_1, x_2, \ldots, x_n)$ is $-x = (-x_1, -x_2, \ldots, -x_n)$. We can also treat the members of $F^n$ as column vectors.
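Here is a minimal sketch of the operations of Example 1.2.3. It assumes F = $\mathbb{Z}_p$ with p prime, and stores vectors as `std::vector<long long>` with residues in {0, …, p − 1}. The names `add`, `scale` and `negate` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A row vector of F^n for F = Z_p (p prime), entries stored as residues 0..p-1.
using Vec = std::vector<long long>;

// Componentwise addition x + y.
Vec add(const Vec& x, const Vec& y, long long p) {
    assert(x.size() == y.size());
    Vec z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) z[i] = (x[i] + y[i]) % p;
    return z;
}

// Multiplication by a scalar alpha, which is first reduced into {0, ..., p-1}.
Vec scale(long long alpha, const Vec& x, long long p) {
    const long long a = ((alpha % p) + p) % p;
    Vec z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) z[i] = (a * x[i]) % p;
    return z;
}

// The negative -x = (-x_1, ..., -x_n), again written as residues.
Vec negate(const Vec& x, long long p) {
    Vec z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) z[i] = (p - x[i]) % p;
    return z;
}
```

With p = 5, the call `add(scale(2, x, p), scale(3, x, p), p)` returns the zero row. This is axiom B.2 combined with $0 \cdot x = 0$, in the instance $2 + 3 = 5 = 0$ in $\mathbb{Z}_5$.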

Example 1.2.4 Let L be a subfield of a field F. Consider (F, +, ·), where + is the addition of the field F, and · is the restriction of the multiplication in F to L × F. Then it is evident that (F, +, ·) is a vector space over L. Thus, every field can be considered as a vector space over each of its subfields.

Example 1.2.5 Let C[0, 1] denote the set of all real-valued continuous functions on the closed interval [0, 1]. Since the sum of any two continuous functions is a continuous function, we have an addition on C[0, 1] with respect to which it is an abelian group. Define the external multiplication · by (a · f)(x) = a f(x). The function a · f is again continuous. Then C[0, 1] is a vector space over the field $\mathbb{R}$ of reals. Note that the set D[0, 1] of differentiable functions on [0, 1] is also a vector space over $\mathbb{R}$, with respect to the addition of functions and the multiplication by scalars defined above.

Example 1.2.6 Let $P_n(F)$ denote the set of all polynomials of degree at most n over a field F. Then $P_n(F)$ is an abelian group with respect to the addition of polynomials. Further, if a ∈ F and $f(X) \in P_n(F)$, then $af(X) \in P_n(F)$. Thus, $P_n(F)$ is also a vector space over F.

Proposition 1.2.7 Let V be a vector space over a field F. Then the following hold:
(i) The cancellation law holds in (V, +) in the sense that x + y = x + z implies y = z (in turn, y + x = z + x implies y = z).
(ii) 0 · x = 0 for x ∈ V, where the 0 on the left side is the 0 of F and the 0 on the right side is that of V.
(iii) α · 0 = 0 for α ∈ F, where both 0s are that of V.
(iv) (−α) · x = −(α · x) for all α ∈ F and x ∈ V. In particular, (−1) · x = −x.
(v) α · x = 0 implies that α = 0 or x = 0.

Proof (i) Suppose that x + y = x + z. Then y = 0 + y = (−x + x) + y = −x + (x + y) = −x + (x + z) = (−x + x) + z = 0 + z = z. (ii) 0 + 0 · x = 0 · x = (0 + 0) · x = 0 · x + 0 · x. By the cancellation in (V, +), 0 = 0 · x. (iii) 0 + α · 0 = α · 0 = α · (0 + 0) = α · 0 + α · 0. By the cancellation in (V, +), 0 = α · 0. (iv) 0 = 0 · x = (−α + α) · x = (−α) · x + α · x. This shows that (−α) · x = −(α · x). (v) Suppose that α · x = 0 and α ≠ 0. Then $x = 1 \cdot x = (\alpha^{-1}\alpha) \cdot x = \alpha^{-1} \cdot (\alpha \cdot x) = \alpha^{-1} \cdot 0 = 0$. □

1.3 Subspaces

Definition 1.3.1 Let V be a vector space over a field F. A subset W of V is called a subspace, or a linear subspace, of V if (i) 0 ∈ W, (ii) x + y ∈ W for all x, y ∈ W, and (iii) α · x ∈ W for all α ∈ F and x ∈ W.

Thus, a subspace is also a vector space over the same field in its own right.

Proposition 1.3.2 Let V be a vector space over a field F. Then a nonempty subset W of V is a subspace if and only if ax + by ∈ W for all a, b ∈ F and x, y ∈ W.

Proof Suppose that W is a subspace of V. Let a, b ∈ F and x, y ∈ W. From Definition 1.3.1(iii), ax, by ∈ W. In turn, by Definition 1.3.1(ii), ax + by ∈ W. Conversely, suppose that W is a nonempty subset of V such that ax + by ∈ W for all a, b ∈ F and all x, y ∈ W. Let x, y ∈ W. Then x + y = 1x + 1y belongs to W. Further, since W is nonempty, there is an element x ∈ W, and then 0 = 0x + 0x belongs to W. Also, for x ∈ W and a ∈ F, ax = ax + 0x ∈ W. This shows that W is a subspace of V. □
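For a finite subset of $\mathbb{Z}_p^n$, the criterion of Proposition 1.3.2 can be tested directly by running over all a, b ∈ $\mathbb{Z}_p$ and x, y ∈ W. The brute-force sketch below assumes entries are stored as residues 0, …, p − 1. The name `is_subspace` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Criterion of Proposition 1.3.2 for a finite subset W of Z_p^n (entries in
// 0..p-1): W is a subspace iff W is nonempty and a*x + b*y lies in W for all
// scalars a, b in Z_p and all x, y in W.
bool is_subspace(const std::vector<Vec>& W, int p) {
    if (W.empty()) return false;
    auto contains = [&W](const Vec& v) {
        return std::find(W.begin(), W.end(), v) != W.end();
    };
    for (const Vec& x : W)
        for (const Vec& y : W)
            for (int a = 0; a < p; ++a)
                for (int b = 0; b < p; ++b) {
                    Vec z(x.size());
                    for (std::size_t i = 0; i < x.size(); ++i)
                        z[i] = (a * x[i] + b * y[i]) % p;
                    if (!contains(z)) return false;   // closure fails
                }
    return true;
}
```

The check costs $|W|^2 p^2$ vector operations, so it is meant only for very small examples. An example is the line $\{0, (1, 2), (2, 1)\}$ through $(1, 2)$ in $\mathbb{Z}_3^2$.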

Example 1.3.3 Let V be a vector space over a field F. Then V is clearly a subspace of V, and it is called the improper subspace of V. The singleton {0} is also a subspace of V, and it is called the trivial subspace of V. Other subspaces of V are called proper subspaces of V.

Example 1.3.4 (Subspaces of $\mathbb{R}^2$ over $\mathbb{R}$) Let W be a nontrivial subspace of $\mathbb{R}^2$. Then there is a nonzero element $(l, m) \in W$. Since W is a subspace, $\alpha \cdot (l, m) = (\alpha l, \alpha m) \in W$ for all $\alpha \in \mathbb{R}$. Thus, $W_{lm} = \{(\alpha l, \alpha m) \mid \alpha \in \mathbb{R}\} \subseteq W$. $W_{lm}$ is easily seen to be a subspace of $\mathbb{R}^2$. Indeed, $W_{lm}$ is the line in the plane $\mathbb{R}^2$ passing through the origin and the point $(l, m)$. Note that all lines in $\mathbb{R}^2$ passing through the origin are of this type. Suppose that $W \neq W_{lm}$. Then there is a nonzero element $(p, q)$ in $W - W_{lm}$. We claim that $ql - pm \neq 0$. Suppose that $ql - pm = 0$. Since $(l, m) \neq (0, 0)$, $l \neq 0$ or $m \neq 0$. Suppose that $l \neq 0$. Then $(p, q) = \left(\frac{p}{l} l, \frac{p}{l} m\right)$ turns out to be in $W_{lm}$, a contradiction to the choice of $(p, q)$. Similarly, if $m \neq 0$, then $(p, q) = \left(\frac{q}{m} l, \frac{q}{m} m\right) \in W_{lm}$, a contradiction. Now, let $(a, b)$ be an arbitrary member of $\mathbb{R}^2$. Since $ql - pm \neq 0$, we can solve the pair of equations $\alpha l + \beta p = a$ and $\alpha m + \beta q = b$. In other words, $(a, b) = \alpha(l, m) + \beta(p, q)$ belongs to W, and so $W = \mathbb{R}^2$. This shows that the only proper subspaces of $\mathbb{R}^2$ are the lines passing through the origin.
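The last step of Example 1.3.4 solves a 2 × 2 linear system with determinant $ql - pm$. One way to do this (our choice of method) is Cramer's rule, which gives $\alpha = \frac{aq - pb}{ql - pm}$ and $\beta = \frac{lb - am}{ql - pm}$. The sketch uses doubles and an exact zero test. That test is only appropriate for small exact inputs such as those in the usage below. `coordinates` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <utility>

// Solves alpha*l + beta*p = a and alpha*m + beta*q = b by Cramer's rule.
// Returns std::nullopt when q*l - p*m == 0, i.e. when (l, m) and (p, q) lie on
// a common line through the origin. The exact comparison with 0.0 is a
// simplification; general floating-point input would need a tolerance.
std::optional<std::pair<double, double>> coordinates(double l, double m,
                                                     double p, double q,
                                                     double a, double b) {
    const double det = q * l - p * m;
    if (det == 0.0) return std::nullopt;
    const double alpha = (a * q - p * b) / det;
    const double beta = (l * b - a * m) / det;
    return std::make_pair(alpha, beta);
}
```

For $(l, m) = (1, 2)$, $(p, q) = (3, 1)$ and $(a, b) = (5, 5)$, the call returns $\alpha = 2$, $\beta = 1$, since $2(1, 2) + 1(3, 1) = (5, 5)$.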

Example 1.3.5 (Subspaces of $\mathbb{R}^3$ over $\mathbb{R}$) As in the above example, lines and planes passing through the origin are proper subspaces of $\mathbb{R}^3$ over $\mathbb{R}$. Indeed, they are the only proper subspaces.

Proposition 1.3.6 The intersection of a family of subspaces is a subspace.

Proof Let $\{W_\alpha \mid \alpha \in \Lambda\}$ be a family of subspaces of a vector space V over F. Then $0 \in W_\alpha$ for all α, and so 0 belongs to the intersection of the family. Thus, the intersection of the given family is nonempty. Let $x, y \in \bigcap_{\alpha \in \Lambda} W_\alpha$ and $a, b \in F$. Then $x, y \in W_\alpha$ for all α. Since each $W_\alpha$ is a subspace, $ax + by \in W_\alpha$ for all α. Hence ax + by belongs to the intersection. This shows that the intersection of the family is a subspace. □

Proposition 1.3.7 The union of subspaces need not be a subspace. Indeed, the union $W_1 \cup W_2$ of two subspaces is a subspace if and only if $W_1 \subseteq W_2$ or $W_2 \subseteq W_1$.

Proof If $W_1 \subseteq W_2$, then $W_1 \cup W_2 = W_2$, a subspace. Similarly, if $W_2 \subseteq W_1$, then the union is also a subspace. Conversely, suppose that $W_1 \cup W_2$ is a subspace and $W_1$ is not a subset of $W_2$. Then there is an element $x \in W_1$ which is not in $W_2$. Let $y \in W_2$. Then, since $W_1 \cup W_2$ is a subspace, $x + y \in W_1 \cup W_2$. Now x + y does not belong to $W_2$, for otherwise x = (x + y) − y would be in $W_2$, a contradiction to the supposition. Hence $x + y \in W_1$. Since $x \in W_1$ and $W_1$ is a subspace, y = −x + (x + y) belongs to $W_1$. This shows that $W_2 \subseteq W_1$. □

Proposition 1.3.8 Let $W_1$ and $W_2$ be subspaces of a vector space V over a field F. Then $W_1 + W_2 = \{x + y \mid x \in W_1, y \in W_2\}$ is also a subspace (called the sum of $W_1$ and $W_2$), which is the smallest subspace containing $W_1 \cup W_2$.

Proof Since $0 \in W_2$, $x \in W_1$ implies that $x = x + 0 \in W_1 + W_2$. Thus, $W_1 \subseteq W_1 + W_2$. Similarly, $W_2 \subseteq W_1 + W_2$. Also, if L is a subspace containing $W_1 \cup W_2$, then $x + y \in L$ for all $x \in W_1$ and $y \in W_2$, so $W_1 + W_2 \subseteq L$. Therefore, it is sufficient to show that $W_1 + W_2$ is a subspace. Clearly, $W_1 + W_2 \neq \emptyset$. Let x + y and u + v belong to $W_1 + W_2$, where $x, u \in W_1$ and $y, v \in W_2$, and let $\alpha, \beta \in F$. Since $W_1$ and $W_2$ are subspaces, $\alpha x + \beta u \in W_1$ and $\alpha y + \beta v \in W_2$. But then $\alpha(x + y) + \beta(u + v) = (\alpha x + \beta u) + (\alpha y + \beta v)$ belongs to $W_1 + W_2$. □

Definition 1.3.9 A family $\{W_\alpha \mid \alpha \in \Lambda\}$ of subspaces of a vector space V over a field F is called a chain if, for any given pair $\alpha, \beta \in \Lambda$, $W_\alpha \subseteq W_\beta$ or $W_\beta \subseteq W_\alpha$.

Proposition 1.3.10 The union of a chain of subspaces is a subspace.

Proof Let $\{W_\alpha \mid \alpha \in \Lambda\}$ be a chain of subspaces of a vector space V over a field F. Clearly, $0 \in \bigcup_{\alpha \in \Lambda} W_\alpha$. Let $x, y \in \bigcup_{\alpha \in \Lambda} W_\alpha$ and $a, b \in F$. Then $x \in W_\alpha$ and $y \in W_\beta$ for some $\alpha, \beta \in \Lambda$. Since the family is a chain, $W_\alpha \subseteq W_\beta$ or $W_\beta \subseteq W_\alpha$. This means that $x, y \in W_\beta$ or $x, y \in W_\alpha$. Since $W_\alpha$ and $W_\beta$ are subspaces, ax + by belongs to $W_\alpha$ or to $W_\beta$. It follows that $ax + by \in \bigcup_{\alpha \in \Lambda} W_\alpha$. This shows that $\bigcup_{\alpha \in \Lambda} W_\alpha$ is a subspace. □

Subspace Generated (Spanned) by a Subset

Definition 1.3.11 A subset S of a vector space V over a field F need not be a subspace; for example, it may not contain 0. The intersection of all subspaces of V containing S is the smallest subspace of V containing S. This subspace is called the subspace generated (spanned) by S, and it is denoted by $\langle S \rangle$. If $\langle S \rangle = V$, then we say that S generates V, or that S is a set of generators of V. A vector space V is said to be finitely generated if it has a finite set of generators. Clearly, $\langle \emptyset \rangle = \{0\}$.

Remark 1.3.12 The subspace $\langle S \rangle$ of V generated by S is completely characterized by the following 3 properties: (i) $\langle S \rangle$ is a subspace. (ii) $\langle S \rangle$ contains S. (iii) If W is a subspace containing S, then $\langle S \rangle \subseteq W$.

Definition 1.3.13 Let S be a nonempty subset of a vector space V over a field F. An element x ∈ V is called a linear combination of members of S if

$x = a_1x_1 + a_2x_2 + \cdots + a_nx_n$ for some $a_1, a_2, \ldots, a_n \in F$ and $x_1, x_2, \ldots, x_n \in S$. We also say that x depends linearly on S.

Remark 1.3.14 If S is a nonempty set, then 0 is always a linear combination of the members of S, for 0 = 0x for any x ∈ S. All the members of S are linear combinations of members of S, for any x ∈ S is 1x. Further, if x is a linear combination of members of S, and S ⊆ T, then x is also a linear combination of members of T. A linear combination of linear combinations of members of S is again a linear combination of members of S.

Proposition 1.3.15 Let S be a nonempty subset of a vector space V over a field F. Then $\langle S \rangle$ is the set of all linear combinations of members of S.

Proof Let W denote the set of all linear combinations of members of S. Since members of S are also linear combinations of members of S, it follows that S ⊆ W. Thus, W is a nonempty set. Let $x = a_1x_1 + a_2x_2 + \cdots + a_nx_n$ and $y = b_1y_1 + b_2y_2 + \cdots + b_my_m$ be members of W, and let $a, b \in F$. Then

$ax + by = (aa_1)x_1 + (aa_2)x_2 + \cdots + (aa_n)x_n + (bb_1)y_1 + (bb_2)y_2 + \cdots + (bb_m)y_m$, being a linear combination of members of S, is again a member of W, and so W is a subspace of V. Let L be a subspace of V containing S. It follows, by induction on r, that any linear combination $a_1x_1 + a_2x_2 + \cdots + a_rx_r$ of members of S belongs to L. Thus, W ⊆ L, and W is the smallest subspace of V containing S. □

In particular, S is a set of generators of a vector space V over a field F if and only if every element of V is a linear combination of members of S.
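Over a small finite field, Proposition 1.3.15 can be turned into a (very inefficient) enumeration of $\langle S \rangle$: evaluate $a_1s_1 + \cdots + a_ks_k$ for all $p^k$ choices of coefficients. The sketch assumes F = $\mathbb{Z}_p$ and a finite list S of vectors with entries in 0, …, p − 1. The function name `span` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Vec = std::vector<int>;

// <S> over Z_p as the set of all linear combinations of the members of S
// (Proposition 1.3.15). All p^k coefficient tuples are enumerated with an
// odometer, so this is feasible only for tiny p and k = |S|.
std::set<Vec> span(const std::vector<Vec>& S, int p, std::size_t n) {
    std::set<Vec> result;
    std::vector<int> coeff(S.size(), 0);          // current coefficient tuple
    while (true) {
        Vec v(n, 0);
        for (std::size_t j = 0; j < S.size(); ++j)
            for (std::size_t i = 0; i < n; ++i)
                v[i] = (v[i] + coeff[j] * S[j][i]) % p;
        result.insert(v);
        std::size_t j = 0;                        // advance the odometer
        while (j < coeff.size() && ++coeff[j] == p) coeff[j++] = 0;
        if (j == coeff.size()) break;             // wrapped around: all tuples done
    }
    return result;
}
```

Applied to $\{e_1 - e_2, e_2 - e_3, e_3 - e_1\} = \{(1, 2, 0), (0, 1, 2), (2, 0, 1)\}$ in $\mathbb{Z}_3^3$, it returns 9 vectors, all satisfying $x_1 + x_2 + x_3 = 0$. This is an analogue over $\mathbb{Z}_3$ of Example 1.3.17. With S empty it returns $\{0\}$, in agreement with $\langle \emptyset \rangle = \{0\}$.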

Example 1.3.16 The set $E = \{e_1, e_2, \ldots, e_n\}$, where

$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ (with 1 in the i-th place),

is a set of generators of the vector space $F^n$. Indeed, any member $x = (x_1, x_2, \ldots, x_n)$ of $F^n$ is the linear combination $x = x_1e_1 + x_2e_2 + \cdots + x_ne_n$ of members of E. If the characteristic of F is not 2, the subset $S = \{e_1 + e_2, e_2 + e_3, e_3 + e_1\}$ is also a set of generators of $F^3$, for $x = (x_1, x_2, x_3) = \alpha_1(e_1 + e_2) + \alpha_2(e_2 + e_3) + \alpha_3(e_3 + e_1)$, where

$\alpha_1 = \frac{x_1 + x_2 - x_3}{2}$, $\alpha_2 = \frac{x_2 + x_3 - x_1}{2}$, $\alpha_3 = \frac{x_1 + x_3 - x_2}{2}$ (verify).

Example 1.3.17 Consider the subset $S = \{e_1 - e_2, e_2 - e_3, e_3 - e_1\}$ of $\mathbb{R}^3$. It is easy to verify that $x = (x_1, x_2, x_3)$ is a linear combination of members of S if and only if $x_1 + x_2 + x_3 = 0$. Thus, the subspace $\langle S \rangle$ of $\mathbb{R}^3$ generated by S is the plane $\{x = (x_1, x_2, x_3) \mid x_1 + x_2 + x_3 = 0\}$.

Linear Independence

Definition 1.3.18 A subset S of a vector space V over a field F is called linearly independent if the following holds for every finite subset $\{x_1, x_2, \ldots, x_n\}$ of S with $x_i \neq x_j$ for $i \neq j$:

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$ implies that $a_i = 0$ for all i.

A subset S which is not linearly independent is called a linearly dependent subset. Thus, a subset S of a vector space V over a field F is linearly dependent if there is a subset $\{x_1, x_2, \ldots, x_n\}$ of distinct members of S, and $a_1, a_2, \ldots, a_n$ not all zero in F, such that $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$.

Vacuously, the empty set ∅ is linearly independent. The observations in the following proposition are easy but crucial, and they will be used often.

Proposition 1.3.19 Let V be a vector space over a field F. Then
(i) any subset of V containing 0 is linearly dependent,
(ii) every subset of a linearly independent subset of V is linearly independent,
(iii) every subset containing a linearly dependent set is linearly dependent,
(iv) if S is a subset of V and $x \in \langle S \rangle - S$, then $S \cup \{x\}$ is linearly dependent, and
(v) if S is linearly independent and $x \notin \langle S \rangle$, then $S \cup \{x\}$ is linearly independent.

Proof (i) If 0 ∈ S, then 1 · 0 = 0 but 1 ≠ 0. It follows from the definition that S is linearly dependent. The assertions (ii) and (iii) are immediate from the definition itself. (iv) Suppose that x ∉ S and $x \in \langle S \rangle$. Then there are distinct members $x_1, x_2, \ldots, x_n \in S$ and $a_1, a_2, \ldots, a_n \in F$ such that

$x = a_1x_1 + a_2x_2 + \cdots + a_nx_n$.

But then $(-1)x + a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$. Since −1 ≠ 0 and x is distinct from each $x_i$, it follows that $S \cup \{x\}$ is linearly dependent. (v) Suppose that S is linearly independent and $x \notin \langle S \rangle$. Suppose that $x_1, x_2, \ldots, x_n$ are distinct members of $S \cup \{x\}$ such that

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$.

If $x_i \neq x$ for all i, then, since S is linearly independent, $a_i = 0$ for all i. Suppose that $x_i = x$ for some i. Without any loss, we may suppose that $x_1 = x$. Then $a_1 = 0$; otherwise

$x = (-a_1)^{-1}a_2x_2 + (-a_1)^{-1}a_3x_3 + \cdots + (-a_1)^{-1}a_nx_n$

belongs to $\langle S \rangle$, a contradiction to the supposition. Thus, $a_1 = 0$. Hence

$a_2x_2 + a_3x_3 + \cdots + a_nx_n = 0$.

Since S is linearly independent, $a_i = 0$ for all i. □

Proposition 1.3.20 A subset S of a vector space V over a field F is linearly independent if and only if, for all distinct members $x_1, x_2, \ldots, x_n \in S$ and all $a_1, \ldots, a_n, b_1, \ldots, b_n \in F$,

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b_1x_1 + b_2x_2 + \cdots + b_nx_n$

implies that $a_i = b_i$ for all i.

Proof Suppose that S is linearly independent, and

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b_1x_1 + b_2x_2 + \cdots + b_nx_n$,

where $x_1, x_2, \ldots, x_n$ are distinct members of S. Then

$(a_1 - b_1)x_1 + (a_2 - b_2)x_2 + \cdots + (a_n - b_n)x_n = 0$.

Since S is linearly independent, $a_i - b_i = 0$ for all i, and so $a_i = b_i$ for all i. Conversely, suppose that the condition is satisfied, and

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$,

where $x_1, x_2, \ldots, x_n$ are distinct members of S. Then

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0x_1 + 0x_2 + \cdots + 0x_n$.

From the given condition, $a_i = 0$ for all i. This shows that S is linearly independent. □

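Example 1.3.21 The set $E = \{e_1, e_2, \ldots, e_n\}$ described in Example 1.3.16 is a linearly independent subset of $F^n$, for $(x_1, x_2, \ldots, x_n) = x_1e_1 + x_2e_2 + \cdots + x_ne_n = 0 = (0, 0, \ldots, 0)$ implies that each $x_i = 0$. Also, if the characteristic of F is not 2, the subset $S = \{e_1 + e_2, e_2 + e_3, e_3 + e_1\}$ of $F^3$ is linearly independent. Indeed, $a_1(e_1 + e_2) + a_2(e_2 + e_3) + a_3(e_3 + e_1) = 0 = (0, 0, 0)$ implies that $a_1 + a_3 = 0 = a_1 + a_2 = a_2 + a_3$. But then $a_2 = a_3 = -a_1$ and $2a_1 = 0$, so $a_1 = a_2 = a_3 = 0$. (If the characteristic of F is 2, then $(e_1 + e_2) + (e_2 + e_3) + (e_3 + e_1) = 0$, and S is linearly dependent.) However, the subset $\{e_1 - e_2, e_2 - e_3, e_3 - e_1\}$ of $F^3$ is linearly dependent, for $1(e_1 - e_2) + 1(e_2 - e_3) + 1(e_3 - e_1) = 0$.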
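Linear independence of finitely many vectors in $\mathbb{Z}_p^n$ can be decided by computing their rank with Gaussian elimination. The text defers a systematic algorithm to a later chapter; the version below is our own sketch. It assumes p is prime, so that a nonzero pivot a can be inverted as $a^{p-2}$ by Fermat's little theorem. It also shows the role of the characteristic in Example 1.3.21: $\{e_1 + e_2, e_2 + e_3, e_3 + e_1\}$ has rank 3 over $\mathbb{Z}_3$ but only rank 2 over $\mathbb{Z}_2$. The names `power_mod`, `rank_mod_p` and `independent_mod_p` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;

// a^e mod p by repeated squaring; for p prime and a != 0, a^(p-2) is a^(-1).
long long power_mod(long long a, long long e, long long p) {
    long long result = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) result = result * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return result;
}

// Rank over Z_p (p prime) of the given vectors, by Gaussian elimination.
std::size_t rank_mod_p(std::vector<Vec> rows, long long p) {
    for (Vec& row : rows)
        for (long long& entry : row) entry = ((entry % p) + p) % p;  // into 0..p-1
    const std::size_t n = rows.empty() ? 0 : rows[0].size();
    std::size_t rank = 0;
    for (std::size_t col = 0; col < n && rank < rows.size(); ++col) {
        std::size_t pivot = rank;
        while (pivot < rows.size() && rows[pivot][col] == 0) ++pivot;
        if (pivot == rows.size()) continue;               // no pivot in this column
        std::swap(rows[rank], rows[pivot]);
        const long long inv = power_mod(rows[rank][col], p - 2, p);
        for (std::size_t r = 0; r < rows.size(); ++r) {   // clear the column elsewhere
            if (r == rank || rows[r][col] == 0) continue;
            const long long factor = rows[r][col] * inv % p;
            for (std::size_t c = 0; c < n; ++c)
                rows[r][c] = ((rows[r][c] - factor * rows[rank][c]) % p + p) % p;
        }
        ++rank;
    }
    return rank;
}

// Finitely many vectors are linearly independent iff their rank equals their number.
bool independent_mod_p(const std::vector<Vec>& vs, long long p) {
    return rank_mod_p(vs, p) == vs.size();
}
```

The elimination uses on the order of $k^2 n$ field operations for k vectors of length n, in contrast with the exponential enumeration of a span.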

1.4 Basis and Dimension

Definition 1.4.1 A subset S of a vector space V over a field F is said to be a minimal set of generators, or an irreducible set of generators, if (i) S generates V, i.e., $\langle S \rangle = V$, and (ii) no proper subset of S generates V. More precisely, $\langle S \rangle = V$, and $\langle S - \{x\} \rangle \neq V$ for all x ∈ S.

Definition 1.4.2 A subset B of a vector space V over a field F is said to be a maximal linearly independent set if (i) B is linearly independent, and (ii) B ⊂ S (proper inclusion) implies that S is linearly dependent. More precisely, a linearly independent subset B is maximal linearly independent if, for all x ∉ B, $B \cup \{x\}$ is linearly dependent.

The following two propositions say that maximal linearly independent sets and minimal sets of generators are the same.

Proposition 1.4.3 Every minimal set of generators is also a maximal linearly independent set.

Proof Let S be a minimal set of generators of a vector space V over a field F. Suppose that S is not linearly independent. Then there exist a set $\{x_1, x_2, \ldots, x_n\}$ of distinct members of S and $a_1, a_2, \ldots, a_n$, not all 0 in F, such that

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$.

Since the addition is commutative, without any loss, we may assume that $a_1 \neq 0$. But then

$x_1 = (-a_1)^{-1}a_2x_2 + (-a_1)^{-1}a_3x_3 + \cdots + (-a_1)^{-1}a_nx_n$.

This shows that $x_1$ is a linear combination of members of $S - \{x_1\}$, or equivalently, $x_1 \in \langle S - \{x_1\} \rangle$. Thus, $S \subseteq \langle S - \{x_1\} \rangle$. Since $\langle S \rangle$ is the smallest subspace containing S, $V = \langle S \rangle \subseteq \langle S - \{x_1\} \rangle$. It follows that $\langle S - \{x_1\} \rangle = V$. This is a contradiction to the supposition that S is a minimal set of generators of V. Thus, S is linearly independent. Next, suppose that x ∉ S. Since S is also a set of generators, $x \in \langle S \rangle - S$, and it follows from Proposition 1.3.19(iv) that $S \cup \{x\}$ is linearly dependent. This completes the proof that S is maximal linearly independent. □

Conversely, we have the following proposition:

Proposition 1.4.4 A maximal linearly independent subset is also a minimal set of generators.

Proof Let B be a maximal linearly independent subset of a vector space V over a field F. Let x ∈ V. If x ∈ B, then $x \in \langle B \rangle$. Suppose that x ∉ B. Since B is a maximal linearly independent subset of V, $B \cup \{x\}$ is linearly dependent. Hence there exist a set $\{x_1, x_2, \ldots, x_n\}$ of distinct members of $B \cup \{x\}$ and $a_1, a_2, \ldots, a_n$, not all 0 in F, such that $a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$.

One of the $x_i$ is x, and the corresponding $a_i \neq 0$; otherwise B would turn out to be linearly dependent, a contradiction to the supposition that B is linearly independent. We may assume, without loss of generality, that $x_1 = x$ and $a_1 \neq 0$. But then

$x = x_1 = (-a_1)^{-1}a_2x_2 + (-a_1)^{-1}a_3x_3 + \cdots + (-a_1)^{-1}a_nx_n$.

Hence $x \in \langle B \rangle$. This shows that B is a set of generators of V. Finally, for each x ∈ B, $x \notin \langle B - \{x\} \rangle$; otherwise, by Proposition 1.3.19(iv), $B = (B - \{x\}) \cup \{x\}$ would turn out to be linearly dependent. This shows that B is a minimal set of generators. □

Most of the implications in the following theorem are already established.

Theorem 1.4.5 Let B be a subset of a vector space V over a field F. Then the following conditions are equivalent:
1. B is a maximal linearly independent subset of V.
2. B is a minimal set of generators of V.
3. B is linearly independent as well as a set of generators of V.
4. Every nonzero element x ∈ V can be expressed uniquely (up to order) as

$x = a_1x_1 + a_2x_2 + \cdots + a_nx_n$, where $x_1, x_2, \ldots, x_n$ are distinct members of B, and $a_1, a_2, \ldots, a_n$ are all nonzero members of F.

Proof The equivalence of 1 and 2 follows from Proposition 1.4.3 and Proposition 1.4.4. The implication 2 ⇒ 3 follows from Proposition 1.4.3. (3 ⇒ 4) Assume 3. Since B is a set of generators and also linearly independent, 4 follows from Proposition 1.3.20. (4 ⇒ 1) Assume 4. It follows again from Proposition 1.3.20 that B is linearly independent. Suppose that x ∉ B. By (4), x is a linear combination of members of B, and so $B \cup \{x\}$ is linearly dependent. This shows that B is a maximal linearly independent subset. □

Definition 1.4.6 A subset B of a vector space V over a field F is called a basis of V if it satisfies any one, and hence all, of the conditions in Theorem 1.4.5.

Example 1.4.7 The set $E = \{e_1, e_2, \ldots, e_n\}$ described in Example 1.3.16 is a linearly independent subset (Example 1.3.21) as well as a set of generators of $F^n$ (Example 1.3.16), and hence it is a basis of $F^n$. This basis is called the standard basis of $F^n$. Similarly, if the characteristic of F is not 2, $S = \{e_1 + e_2, e_2 + e_3, e_3 + e_1\}$ is another basis of $F^3$.

Proposition 1.4.8 Let V be a finitely generated vector space over a field F. Then V has a finite basis. Indeed, any finite set of generators contains a basis.

Proof Let S be a finite set of generators of V. It may be a minimal set of generators, and so a basis. If not, $\langle S - \{x_1\} \rangle = V$ for some $x_1 \in S$. $S - \{x_1\}$ may be a minimal set of generators, and so a basis. If not, then $\langle S - \{x_1, x_2\} \rangle = V$ for some $x_2 \in S - \{x_1\}$. $S - \{x_1, x_2\}$ may be a minimal set of generators, and so a basis. If not, proceed. Since S is finite, this process stops after finitely many steps, giving us a basis contained in S. □
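The procedure in the proof of Proposition 1.4.8 can be carried out mechanically in $\mathbb{Z}_p^n$. Since $\langle S - \{x\} \rangle \subseteq \langle S \rangle$, the two spans are equal exactly when their dimensions (ranks) agree. So we delete elements that do not lower the rank until none can be deleted. The sketch assumes p is prime and repeats a Gaussian-elimination rank so that it is self-contained. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;

long long power_mod(long long a, long long e, long long p) {
    long long result = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) result = result * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return result;
}

// Rank over Z_p (p prime) by Gaussian elimination.
std::size_t rank_mod_p(std::vector<Vec> rows, long long p) {
    for (Vec& row : rows)
        for (long long& entry : row) entry = ((entry % p) + p) % p;
    const std::size_t n = rows.empty() ? 0 : rows[0].size();
    std::size_t rank = 0;
    for (std::size_t col = 0; col < n && rank < rows.size(); ++col) {
        std::size_t pivot = rank;
        while (pivot < rows.size() && rows[pivot][col] == 0) ++pivot;
        if (pivot == rows.size()) continue;
        std::swap(rows[rank], rows[pivot]);
        const long long inv = power_mod(rows[rank][col], p - 2, p);
        for (std::size_t r = 0; r < rows.size(); ++r) {
            if (r == rank || rows[r][col] == 0) continue;
            const long long factor = rows[r][col] * inv % p;
            for (std::size_t c = 0; c < n; ++c)
                rows[r][c] = ((rows[r][c] - factor * rows[rank][c]) % p + p) % p;
        }
        ++rank;
    }
    return rank;
}

// Proof of Proposition 1.4.8, mechanised: while some element can be removed
// without lowering the rank (i.e. <S - {x}> = <S>), remove it.
std::vector<Vec> basis_inside(std::vector<Vec> S, long long p) {
    const std::size_t target = rank_mod_p(S, p);               // dim <S>
    bool removed = true;
    while (removed) {
        removed = false;
        for (std::size_t i = 0; i < S.size(); ++i) {
            std::vector<Vec> smaller = S;
            smaller.erase(smaller.begin() + static_cast<std::ptrdiff_t>(i));
            if (rank_mod_p(smaller, p) == target) {            // x_i was redundant
                S = std::move(smaller);
                removed = true;
                break;
            }
        }
    }
    return S;   // a minimal generating set of <S>, hence a basis of <S>
}
```

The result is always a subset of the input, as the proposition asserts. Which basis is produced depends on the order in which elements are tried.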

Theorem 1.4.9 Let V be a finitely generated vector space over a field F. Then every basis of V is finite, and any two bases of V contain the same number of elements.

Proof From the above proposition, V has a finite basis

$B_1 = \{x_1, x_2, \ldots, x_n\}$ (say).

Let $B_2$ be another basis of V. If $B_1 - B_2 = \emptyset$, then $B_1 \subseteq B_2$. Since $B_1$ and $B_2$ are both maximal linearly independent sets (being bases), $B_1 = B_2$, and we are done. Suppose that $B_1 \neq B_2$. Then $B_2 - B_1 \neq \emptyset$; otherwise $B_2 \subseteq B_1$, and again $B_2 = B_1$. Let $y_1 \in B_2 - B_1$. Since $B_1$, being a basis, is maximal linearly independent, $B_1 \cup \{y_1\}$ is linearly dependent. Thus, there exist $a_1, a_2, \ldots, a_n, b_1$, not all 0 in the field F, such that $a_1x_1 + a_2x_2 + \cdots + a_nx_n + b_1y_1 = 0$.

Indeed, $b_1 \neq 0$; otherwise

$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$, and then all $a_i = 0$. Further, since $y_1 \neq 0$, $b_1y_1 \neq 0$. Hence $a_i \neq 0$ for some i. We may assume that $a_1 \neq 0$. But then

$x_1 = (-a_1)^{-1}a_2x_2 + (-a_1)^{-1}a_3x_3 + \cdots + (-a_1)^{-1}a_nx_n + (-a_1)^{-1}b_1y_1$.

Hence $x_1 \in \langle (B_1 - \{x_1\}) \cup \{y_1\} \rangle$. This shows that $B_1 \subseteq \langle (B_1 - \{x_1\}) \cup \{y_1\} \rangle$, and so $(B_1 - \{x_1\}) \cup \{y_1\}$ generates V. We also show that $(B_1 - \{x_1\}) \cup \{y_1\}$ is linearly independent. Suppose that

$a_2x_2 + a_3x_3 + \cdots + a_nx_n + b_1y_1 = 0$.

If $b_1 = 0$, then $a_2x_2 + a_3x_3 + \cdots + a_nx_n = 0$.

Since $B_1$ is linearly independent, $a_i = 0$ for all $i \geq 2$. Suppose that $b_1 \neq 0$. Then

$y_1 = (-b_1)^{-1}a_2x_2 + (-b_1)^{-1}a_3x_3 + \cdots + (-b_1)^{-1}a_nx_n$,

and so $y_1 \in \langle B_1 - \{x_1\} \rangle$. Since $(B_1 - \{x_1\}) \cup \{y_1\}$ is already seen to be a set of generators of V, $\langle B_1 - \{x_1\} \rangle = V$. This is a contradiction to the supposition that $B_1$ is a basis (a minimal set of generators). This shows that $(B_1 - \{x_1\}) \cup \{y_1\}$ is also a basis containing n elements. If $((B_1 - \{x_1\}) \cup \{y_1\}) - B_2 = \emptyset$, then, as before, $(B_1 - \{x_1\}) \cup \{y_1\} = B_2$, and so $B_2$ contains n elements. If not, then, as before, $B_2 - ((B_1 - \{x_1\}) \cup \{y_1\})$ is nonempty, and we proceed as above. The process stops after finitely many steps, at most at the nth step, showing that $B_2$ is finite and contains exactly n elements. □

Definition 1.4.10 The number of elements in a basis of a finitely generated vector space V over a field F is called the dimension of V , and it is denoted by dim(V ).

It follows from Example 1.4.7 that the dimension of $F^n$ is n. The dimension of the plane $W = \{x = (x_1, x_2, x_3) \mid x_1 + x_2 + x_3 = 0\}$ is 2, for $\{e_1 - e_2, e_2 - e_3\}$ is a basis of W (verify). To determine the dimension of a vector space, one needs to find a basis of the vector space and then count the number of elements in the basis. In the next chapter we shall have an algorithm to find a basis, and hence the dimension, of a subspace of $F^n$ generated by a finite set of elements of $F^n$.

Proposition 1.4.11 Every set of generators of a finite dimensional vector space contains a basis.

Proof Let $B = \{x_1, x_2, \ldots, x_n\}$ be a finite basis of a finite dimensional vector space V over a field F. Let S be a set of generators of V. Then each member $x_i$ is a linear combination of a finite subset $A_i$ (say) of S. In turn, each member of B is a linear combination of members of the finite subset $A = A_1 \cup A_2 \cup \cdots \cup A_n$ of S. Since B generates V, A also generates V. Since A is finite, we can reduce A to a minimal set of generators (Proposition 1.4.8), which is a basis of V contained in S. □

Proposition 1.4.12 Every linearly independent subset of a finite dimensional vector space V over a field F can be enlarged to a basis of V .

Proof Let $B = \{x_1, x_2, \ldots, x_n\}$ be a finite basis of a finite dimensional vector space V over a field F, and let S be a linearly independent subset of V. If $B \subseteq \langle S \rangle$, then $\langle S \rangle = V$, and so S is a basis, and there is nothing to do. If not, then some $x_i \in B - \langle S \rangle$. We may assume that $x_1 \in B - \langle S \rangle$. Then, by Proposition 1.3.19(v), $S \cup \{x_1\}$ is linearly independent. If $B \subseteq \langle S \cup \{x_1\} \rangle$, then $V = \langle B \rangle \subseteq \langle S \cup \{x_1\} \rangle$, and so $S \cup \{x_1\}$ turns out to be a basis. If not, proceed. This process stops at most at the nth step, enlarging S to a basis. □
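The proof of Proposition 1.4.12 can likewise be sketched for V = $\mathbb{Z}_p^n$, taking for B the standard basis of Example 1.4.7. Each $e_i$ that is not in the current span is adjoined; this is detected by a rank increase. By Proposition 1.3.19(v), adjoining it keeps the set linearly independent. The assumption that p is prime and the function names are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;

long long power_mod(long long a, long long e, long long p) {
    long long result = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) result = result * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return result;
}

// Rank over Z_p (p prime) by Gaussian elimination.
std::size_t rank_mod_p(std::vector<Vec> rows, long long p) {
    for (Vec& row : rows)
        for (long long& entry : row) entry = ((entry % p) + p) % p;
    const std::size_t n = rows.empty() ? 0 : rows[0].size();
    std::size_t rank = 0;
    for (std::size_t col = 0; col < n && rank < rows.size(); ++col) {
        std::size_t pivot = rank;
        while (pivot < rows.size() && rows[pivot][col] == 0) ++pivot;
        if (pivot == rows.size()) continue;
        std::swap(rows[rank], rows[pivot]);
        const long long inv = power_mod(rows[rank][col], p - 2, p);
        for (std::size_t r = 0; r < rows.size(); ++r) {
            if (r == rank || rows[r][col] == 0) continue;
            const long long factor = rows[r][col] * inv % p;
            for (std::size_t c = 0; c < n; ++c)
                rows[r][c] = ((rows[r][c] - factor * rows[rank][c]) % p + p) % p;
        }
        ++rank;
    }
    return rank;
}

// Proof of Proposition 1.4.12 with V = Z_p^n and B = {e_1, ..., e_n}:
// adjoin e_i whenever e_i is not in <S> (the rank goes up).
std::vector<Vec> extend_to_basis(std::vector<Vec> S, std::size_t n, long long p) {
    assert(rank_mod_p(S, p) == S.size());              // S must be independent
    for (std::size_t i = 0; i < n; ++i) {
        Vec e(n, 0);
        e[i] = 1;
        std::vector<Vec> candidate = S;
        candidate.push_back(e);
        if (rank_mod_p(candidate, p) > S.size())       // e_i not in <S>
            S = std::move(candidate);
    }
    return S;
}
```

For example, starting from $\{(1, 1, 0)\}$ in $\mathbb{Z}_2^3$, the sketch adjoins $e_1$. It skips $e_2 = (1, 1, 0) + e_1$, then adjoins $e_3$, ending with a basis of three elements.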

Corollary 1.4.13 If dim V = n, then (i) every linearly independent subset contains at most n elements, and (ii) every set of generators contains at least n elements. □

Proposition 1.4.14 Let F be a finite field containing q elements, and V a vector space over F of dimension n. Then V contains exactly $q^n$ elements.

Proof Since dim V = n, there is a basis $\{x_1, x_2, \ldots, x_n\}$ of V containing n elements. Hence every element v of V can be expressed uniquely as

$v = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n$,

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ belong to F. This says that we have a bijective map η from $F^n$ to V defined by

$\eta(\alpha_1, \alpha_2, \ldots, \alpha_n) = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n$.

Since F contains q elements, $F^n$, and hence V, contains $q^n$ elements. □

Corollary 1.4.15 Let F be a finite field of characteristic p (note that p is prime). Then F contains $p^n$ elements for some $n \in \mathbb{N}$.

Proof Since F is finite, it cannot contain a subfield isomorphic to the infinite field $\mathbb{Q}$, and so its characteristic is a prime p. By Proposition 1.1.11, F has a subfield isomorphic to the field $\mathbb{Z}_p$ of residue classes modulo p. Thus, F is a vector space over a field containing p elements. Since F is finite, its dimension is finite, say n. The result now follows from the previous proposition. □

Corollary 1.4.16 Let L be a field containing $p^n$ elements, where p is a prime and n ≥ 1. Let F be a subfield of L. Then F contains $p^m$ elements for some divisor m of n.

Proof Since F is a subfield of L, char L = char F = p. Thus, F contains $p^m$ elements for some m. Since L is a vector space over F, it follows that L contains $(p^m)^r = p^{mr}$ elements for some r. Hence n = mr. □

Remark 1.4.17 We shall see in a later chapter that for every prime p and every n ≥ 1, there is a unique (up to isomorphism) field of order $p^n$. Further, corresponding to each divisor m of n, there is a unique subfield of order $p^m$.

Definition 1.4.18 Let V be a vector space over a field F. An ordered n-tuple $(x_1, x_2, \ldots, x_n)$ is called an ordered basis of V if the set $\{x_1, x_2, \ldots, x_n\}$ is a basis of V. Thus, each basis with n elements gives rise to exactly n! distinct ordered bases.

Proposition 1.4.19 Let V be a vector space of dimension n over a finite field F containing q elements. Then the number of ordered bases of V is

$(q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$, and the number of bases of V is

$\frac{(q^n - 1)(q^n - q) \cdots (q^n - q^{n-1})}{n!}$.

Proof We find the number of ordered n-tuples $(x_1, x_2, \ldots, x_n)$ such that the set $\{x_1, x_2, \ldots, x_n\}$ is a basis. Since $x_1$ can be any nonzero element of the vector space, the number of ways in which $x_1$ can be selected is $q^n - 1$. Having chosen $x_1$, we have to select $x_2$ such that $\{x_1, x_2\}$ is linearly independent. Clearly, $\{x_1, x_2\}$ is linearly independent if and only if $x_2 \neq \alpha x_1$ for all $\alpha \in F$. These q multiples of $x_1$ are distinct, since $x_1 \neq 0$. Thus, the number of ways in which the ordered pair $(x_1, x_2)$ can be chosen so that the set $\{x_1, x_2\}$ is linearly independent is $(q^n - 1)(q^n - q)$. Again, having chosen $(x_1, x_2)$, we have to find $x_3$ so that $\{x_1, x_2, x_3\}$ is linearly independent. Now, $\{x_1, x_2, x_3\}$ is linearly independent if and only if $x_3 \neq \alpha_1x_1 + \alpha_2x_2$ for every pair $\alpha_1, \alpha_2 \in F$. These $q^2$ combinations are distinct, since $\{x_1, x_2\}$ is linearly independent. Thus, the number of choices for $x_3$ is $q^n - q^2$. Hence the number of choices for the ordered triple $(x_1, x_2, x_3)$ so that the set $\{x_1, x_2, x_3\}$ is linearly independent is $(q^n - 1)(q^n - q)(q^n - q^2)$. Proceeding inductively, the number of choices for the ordered n-tuple $(x_1, x_2, \ldots, x_n)$ so that $\{x_1, x_2, \ldots, x_n\}$ is linearly independent (and so a basis) is

$(q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$.

In turn, since each basis gives rise to exactly n! ordered bases, the number of bases of V is

$\frac{(q^n - 1)(q^n - q) \cdots (q^n - q^{n-1})}{n!}$. □
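For small q and n, the count in Proposition 1.4.19 can be compared with a brute-force enumeration. The sketch runs through all ordered n-tuples of vectors in $\mathbb{Z}_p^n$ and keeps those of rank n. Taking q = p prime is an assumption of the sketch. The enumeration visits $p^{n^2}$ tuples, so it serves only as a sanity check. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;

long long power_mod(long long a, long long e, long long p) {
    long long result = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) result = result * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return result;
}

// Rank over Z_p (p prime) by Gaussian elimination; entries assumed in 0..p-1.
std::size_t rank_mod_p(std::vector<Vec> rows, long long p) {
    const std::size_t n = rows.empty() ? 0 : rows[0].size();
    std::size_t rank = 0;
    for (std::size_t col = 0; col < n && rank < rows.size(); ++col) {
        std::size_t pivot = rank;
        while (pivot < rows.size() && rows[pivot][col] == 0) ++pivot;
        if (pivot == rows.size()) continue;
        std::swap(rows[rank], rows[pivot]);
        const long long inv = power_mod(rows[rank][col], p - 2, p);
        for (std::size_t r = 0; r < rows.size(); ++r) {
            if (r == rank || rows[r][col] == 0) continue;
            const long long factor = rows[r][col] * inv % p;
            for (std::size_t c = 0; c < n; ++c)
                rows[r][c] = ((rows[r][c] - factor * rows[rank][c]) % p + p) % p;
        }
        ++rank;
    }
    return rank;
}

// Proposition 1.4.19: (q^n - 1)(q^n - q)...(q^n - q^(n-1)).
long long ordered_bases_formula(long long q, int n) {
    long long qn = 1;
    for (int i = 0; i < n; ++i) qn *= q;
    long long count = 1, qi = 1;
    for (int i = 0; i < n; ++i) {
        count *= (qn - qi);
        qi *= q;
    }
    return count;
}

// Counts ordered n-tuples (x_1, ..., x_n) in Z_p^n of rank n by enumerating
// all p^(n*n) tuples with an odometer over their n*n entries.
long long ordered_bases_brute_force(long long p, int n) {
    const int cells = n * n;
    std::vector<long long> digits(cells, 0);
    long long count = 0;
    while (true) {
        std::vector<Vec> rows(n, Vec(n));
        for (int k = 0; k < cells; ++k) rows[k / n][k % n] = digits[k];
        if (rank_mod_p(rows, p) == static_cast<std::size_t>(n)) ++count;
        int k = 0;
        while (k < cells && ++digits[k] == p) digits[k++] = 0;
        if (k == cells) break;
    }
    return count;
}
```

For q = 2 and n = 3, both functions give $7 \cdot 6 \cdot 4 = 168$ ordered bases, and hence $168/3! = 28$ bases.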

Remark 1.4.20 If W is a subspace of a vector space V, then, since a basis of W is a linearly independent subset of V, dim W ≤ dim V. Further, if n is the dimension of V and m ≤ n, then there is a subspace of dimension m, for if $\{x_1, x_2, \ldots, x_n\}$ is a basis of V, then $\langle \{x_1, x_2, \ldots, x_m\} \rangle$ is a subspace of dimension m.

Proposition 1.4.21 Let V be a vector space of dimension n over a field F containing q elements. Then the number $b_r$ of r-dimensional subspaces, r ≥ 1, is

$b_r = \frac{(q^n - 1)(q^n - q) \cdots (q^n - q^{r-1})}{(q^r - 1)(q^r - q) \cdots (q^r - q^{r-1})}$.

The total number of subspaces is

$1 + b_1 + b_2 + \cdots + b_n$, where $b_r$ is given above.

Proof Let $1 \le r \le n$. Any subspace of dimension r is determined by a linearly independent subset $\{x_1, x_2, \ldots, x_r\}$ of V, namely a basis of it. From the proof of the above proposition, it follows that the number of linearly independent subsets containing r elements is

$\frac{(q^n - 1)(q^n - q) \cdots (q^n - q^{r-1})}{r!}$.

Further, a linearly independent subset $\{y_1, y_2, \ldots, y_r\}$ determines the same subspace as the set $\{x_1, x_2, \ldots, x_r\}$ if and only if $\{y_1, y_2, \ldots, y_r\}$ is a basis of the subspace $\langle \{x_1, x_2, \ldots, x_r\} \rangle$, which is of dimension r. The number of bases of a vector space of dimension r is

$\frac{(q^r - 1)(q^r - q) \cdots (q^r - q^{r-1})}{r!}$.

Thus, the number of r-dimensional subspaces of V is $b_r$, as given in the statement of the proposition. Adding 1 for the trivial subspace {0} gives the total number of subspaces. □

Proposition 1.4.22 Let V be a vector space of finite dimension over a field F. Let $W_1$ and $W_2$ be subspaces of V. Then $W_1 + W_2 = \{x + y \mid x \in W_1, y \in W_2\}$ is a subspace, and

dim(W1 + W2) = dimW1 + dimW2 − dim(W1 W2).

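Proof We have already seen that $W_1 + W_2$ is a subspace. Let $\{x_1, x_2, \ldots, x_r\}$ be a basis of $W_1 \cap W_2$. Then it is a linearly independent subset of $W_1$ as well as of $W_2$. Thus, it can be enlarged to a basis

$$\{x_1, x_2, \ldots, x_r, y_1, y_2, \ldots, y_m\}$$

of $W_1$, and to a basis

$$\{x_1, x_2, \ldots, x_r, z_1, z_2, \ldots, z_n\}$$

of $W_2$, where $\dim W_1 = r + m$ and $\dim W_2 = r + n$. We show that

$$S = \{x_1, x_2, \ldots, x_r, y_1, y_2, \ldots, y_m, z_1, z_2, \ldots, z_n\}$$

is a basis of $W_1 + W_2$. Clearly, $W_1 \subseteq \langle S \rangle$ and $W_2 \subseteq \langle S \rangle$. Hence $W_1 + W_2 \subseteq \langle S \rangle$. Also $S \subseteq W_1 \cup W_2$. Hence, $\langle S \rangle \subseteq W_1 + W_2$. Now, we show that $S$ is linearly independent. Suppose that

$$\alpha_1 x_1 + \cdots + \alpha_r x_r + \beta_1 y_1 + \cdots + \beta_m y_m + \delta_1 z_1 + \cdots + \delta_n z_n = 0.$$

Then

$$\alpha_1 x_1 + \cdots + \alpha_r x_r + \beta_1 y_1 + \cdots + \beta_m y_m = -\delta_1 z_1 - \cdots - \delta_n z_n.$$

The left-hand side lies in $W_1$ and the right-hand side lies in $W_2$, so this element belongs to $W_1 \cap W_2$. Since $\{x_1, x_2, \ldots, x_r\}$ is a basis of $W_1 \cap W_2$,

$$-\delta_1 z_1 - \delta_2 z_2 - \cdots - \delta_n z_n = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_r x_r$$

for some $\gamma_1, \gamma_2, \ldots, \gamma_r$ in $F$. Since $\{x_1, x_2, \ldots, x_r, z_1, z_2, \ldots, z_n\}$ is linearly independent (being a basis of $W_2$), it follows that $\delta_1, \delta_2, \ldots, \delta_n$ are all zero. Similarly, using the basis $\{x_1, \ldots, x_r, y_1, \ldots, y_m\}$ of $W_1$, $\beta_1, \beta_2, \ldots, \beta_m$ are all zero. But then $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_r x_r = 0$. Since $\{x_1, x_2, \ldots, x_r\}$ is linearly independent (being a basis of $W_1 \cap W_2$), we see that $\alpha_1, \alpha_2, \ldots, \alpha_r$ are all zero. This shows that $S$ is also linearly independent and so a basis of $W_1 + W_2$. In turn,

$$\dim(W_1 + W_2) = r + m + n = (r + m) + (r + n) - r = \dim W_1 + \dim W_2 - \dim(W_1 \cap W_2).$$

□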

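Example 1.4.23 Let $V$ be a vector space of dimension $n$. Let $W_1$ and $W_2$ be distinct subspaces of dimension $n - 1$ each. Since $W_1 \ne W_2$ and they have the same dimension, $W_2 \not\subseteq W_1$. Hence $W_1 + W_2$ is a subspace of $V$ containing $W_1$ properly, and so, having dimension greater than $n - 1$, it is $V$. Thus, $\dim(W_1 + W_2) = n = \dim W_1 + \dim W_2 - \dim(W_1 \cap W_2)$. Hence, from the above proposition, $\dim(W_1 \cap W_2) = (n - 1) + (n - 1) - n = n - 2$.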
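For $F = \mathbb{Z}_2$ and $n = 3$, every subspace of $F^3$ can be listed explicitly. This gives a quick sanity check of Propositions 1.4.21 and 1.4.22 and of Example 1.4.23. The Python sketch below is our own illustration. It is restricted to $\mathbb{Z}_2$, so that a subspace of dimension $d$ has exactly $2^d$ elements, and the function names are hypothetical.

```python
from itertools import combinations, product

N = 3  # we work in Z_2^N; the code relies on the field having 2 elements


def add(u, v):
    """Vector addition in Z_2^N."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def span(vectors):
    """All Z_2-linear combinations of the given vectors (a frozenset)."""
    result = {tuple([0] * N)}
    for v in vectors:
        # over Z_2 the only scalars are 0 and 1
        result |= {add(w, v) for w in result}
    return frozenset(result)


def dim(subspace):
    """A subspace of dimension d over Z_2 has 2**d elements."""
    return len(subspace).bit_length() - 1


def all_subspaces():
    vectors = list(product(range(2), repeat=N))
    # every subspace of Z_2^N is spanned by at most N vectors
    return {span(vs) for k in range(N + 1) for vs in combinations(vectors, k)}


def gaussian_binomial(n, r, q):
    """b_r from Proposition 1.4.21 (and b_0 = 1 for the zero subspace)."""
    num = den = 1
    for i in range(r):
        num *= q**n - q**i
        den *= q**r - q**i
    return num // den


def counts_by_dimension(subspaces):
    counts = [0] * (N + 1)
    for W in subspaces:
        counts[dim(W)] += 1
    return counts


if __name__ == "__main__":
    subs = all_subspaces()
    print(counts_by_dimension(subs))                           # [1, 7, 7, 1]
    print([gaussian_binomial(N, r, 2) for r in range(N + 1)])  # [1, 7, 7, 1]
    # Proposition 1.4.22: W1 + W2 is the span of W1 ∪ W2
    assert all(dim(span(W1 | W2)) == dim(W1) + dim(W2) - dim(W1 & W2)
               for W1 in subs for W2 in subs)
    # Example 1.4.23: distinct hyperplanes meet in dimension N - 2
    planes = [W for W in subs if dim(W) == N - 1]
    assert all(dim(A & B) == N - 2 for A, B in combinations(planes, 2))
```

So $\mathbb{Z}_2^3$ has $1 + 7 + 7 + 1 = 16$ subspaces in all.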

1.5 Direct Sum of Vector Spaces, Quotient of a Vector Space

Let $V_1, V_2, \ldots, V_r$ be vector spaces over a field $F$. Consider the Cartesian product $V = V_1 \times V_2 \times \cdots \times V_r$. Define addition using coordinate-wise addition, and the external multiplication $\cdot$ by $\alpha \cdot (x_1, x_2, \ldots, x_r) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_r)$. It is straightforward to verify that $V$ is a vector space over $F$ with respect to these operations. This vector space is called the external direct sum of $V_1, V_2, \ldots, V_r$. A vector space $V$ over a field $F$ is said to be the internal direct sum of its subspaces $V_1, V_2, \ldots, V_r$ if every element $x$ of $V$ has a unique representation as

$$x = x_1 + x_2 + \cdots + x_r,$$

where $x_i \in V_i$ for all $i$. The notation

$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_r$$

asserts that $V$ is the direct sum of its subspaces $V_1, V_2, \ldots, V_r$.

Proposition 1.5.1 Let $V_1, V_2, \ldots, V_r$ be subspaces of a vector space $V$ over a field $F$. Then the following conditions are equivalent.
(1) $V = V_1 \oplus V_2 \oplus \cdots \oplus V_r$.
(2) $V = V_1 + V_2 + \cdots + V_r$, and $V_i \cap V^i = \{0\}$ for all $i$, where $V^i = V_1 + \cdots + V_{i-1} + V_{i+1} + \cdots + V_r$.

Proof 1 ⟹ 2. Assume 1. Since every element $x$ of $V$ has a unique representation as $x = x_1 + x_2 + \cdots + x_r$, $V = V_1 + V_2 + \cdots + V_r$. Let $x \in V_i \cap V^i$. Then $x = x_1 + \cdots + x_{i-1} + x_{i+1} + \cdots + x_r$, where $x_j \in V_j$ for $j \ne i$. Thus,

$$x_1 + \cdots + x_{i-1} + (-x) + x_{i+1} + \cdots + x_r = 0 = 0 + 0 + \cdots + 0.$$

From the uniqueness of the representation of 0, it follows that $x_j = 0$ for all $j \ne i$ and $-x = 0$. Hence $x = 0$.

2 ⟹ 1. Assume 2. Clearly, every element $x$ of $V$ has a representation $x = x_1 + x_2 + \cdots + x_r$, where $x_i \in V_i$ for all $i$. Now, we prove the uniqueness of the representation. Suppose that

$$x = x_1 + x_2 + \cdots + x_r = y_1 + y_2 + \cdots + y_r,$$

where $x_i, y_i \in V_i$ for all $i$. Then $x_i - y_i = \sum_{j \ne i} (y_j - x_j)$ belongs to $V_i \cap V^i = \{0\}$. This shows that $x_i = y_i$ for all $i$. □

Remark 1.5.2 If $V$ is the direct sum of $V_1, V_2, \ldots, V_r$, then $V^i$, as defined in the above proposition, is the direct sum of $V_1, \ldots, V_{i-1}, V_{i+1}, \ldots, V_r$.

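Proposition 1.5.3 Let $W_1, W_2, \ldots, W_r$ be subspaces of a finite dimensional vector space $V$ such that $V = W_1 + W_2 + \cdots + W_r$. Then $V$ is the direct sum of $W_1, W_2, \ldots, W_r$ if and only if $\dim V = \dim W_1 + \dim W_2 + \cdots + \dim W_r$.

Proof Suppose that $V$ is the direct sum of $W_1, W_2, \ldots, W_r$. We show that $\dim V = \dim W_1 + \dim W_2 + \cdots + \dim W_r$. The proof is by induction on $r$. If $r = 1$, then there is nothing to do. Assume that the result is true for $r$, and let $V = W_1 \oplus W_2 \oplus \cdots \oplus W_{r+1}$. Since $W_1 \cap W^1 = \{0\}$, it follows from Proposition 1.4.22 that

$$\dim V = \dim(W_1 + W^1) = \dim W_1 + \dim W^1 - \dim\{0\} = \dim W_1 + \dim W^1.$$

Further, by Remark 1.5.2, $W^1$ is the direct sum of $W_2, W_3, \ldots, W_{r+1}$, and so by the induction assumption $\dim W^1 = \dim W_2 + \dim W_3 + \cdots + \dim W_{r+1}$. Hence $\dim V = \dim W_1 + \dim W_2 + \cdots + \dim W_{r+1}$.

Conversely, suppose that $\dim V = \dim W_1 + \dim W_2 + \cdots + \dim W_r$. Fix $i$. Since $V = W_i + W^i$ and $\dim W^i \le \sum_{j \ne i} \dim W_j$, Proposition 1.4.22 gives

$$\dim V = \dim W_i + \dim W^i - \dim(W_i \cap W^i) \le \dim W_1 + \cdots + \dim W_r - \dim(W_i \cap W^i) = \dim V - \dim(W_i \cap W^i).$$

Hence $W_i \cap W^i = \{0\}$ for all $i$, and the result follows from Proposition 1.5.1. □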

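Example 1.5.4 The plane $W = \{(x, y, z) \mid lx + my + nz = 0\}$, where $(l, m, n) \ne (0, 0, 0)$, is a subspace of $\mathbb{R}^3$ of dimension 2 (verify). Let $(a, b, c) \notin W$. The line $L = \{(a\alpha, b\alpha, c\alpha) \mid \alpha \in \mathbb{R}\}$ is also a subspace of $\mathbb{R}^3$, of dimension 1, such that $W + L = \mathbb{R}^3$ (since $L \not\subseteq W$, $W + L$ properly contains $W$). Since $\dim W + \dim L = 3$, Proposition 1.5.3 gives $\mathbb{R}^3 = W \oplus L$.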
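Proposition 1.5.3 turns the direct-sum question into a dimension count. This is easy to mechanize when each $W_i$ is given by a basis of vectors with rational coordinates, because $\dim(W_1 + \cdots + W_r)$ is then the rank of all the basis vectors taken together. The Python sketch below is an illustration under that assumption. It uses exact arithmetic via `fractions.Fraction`, the helper names are hypothetical, and it is applied to Example 1.5.4 with $l = m = n = 1$ and $(a, b, c) = (1, 1, 1)$.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def is_direct_sum(bases):
    """bases[i] is a basis of W_i.  By Proposition 1.5.3 (applied to
    V = W_1 + ... + W_r), the sum is direct iff its dimension, the rank of
    all basis vectors together, equals dim W_1 + ... + dim W_r."""
    everything = [v for basis in bases for v in basis]
    return rank(everything) == sum(len(basis) for basis in bases)


# Example 1.5.4 with l = m = n = 1: W is the plane x + y + z = 0.
W_basis = [(1, -1, 0), (0, 1, -1)]
L_basis = [(1, 1, 1)]          # (1, 1, 1) is not in W, since 1 + 1 + 1 != 0
print(is_direct_sum([W_basis, L_basis]))        # True: R^3 = W ⊕ L
print(is_direct_sum([W_basis, [(2, -2, 0)]]))   # False: this line lies in W
```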

Let $V$ be a vector space, and $W$ a subspace of $V$. Let $x \in V$. The subset $x + W = \{x + w \mid w \in W\}$ is called the coset of $V$ modulo $W$ determined by $x$. This is also called a plane in $V$ passing through $x$ and parallel to $W$. The set $\{x + W \mid x \in V\}$ of cosets of $V$ modulo $W$ is denoted by $V/W$.

Proposition 1.5.5 Let $W$ be a subspace of a vector space $V$ over a field $F$. Then the following hold:
(i) $x \in x + W$ for all $x \in V$.
(ii) $x + W = y + W$ if and only if $x - y \in W$.
(iii) $x + W = y + W$ if and only if $(x + W) \cap (y + W) \ne \emptyset$.
In particular, $V/W$ is a partition of the vector space $V$.

Proof (i) Since $0 \in W$, $x = x + 0 \in x + W$ for all $x \in V$.
(ii) Suppose that $x + W = y + W$. Then, by (i), $x \in y + W$. Hence $x = y + w$ for some $w \in W$. In turn, $x - y = w$ belongs to $W$. Conversely, suppose that $x - y \in W$. Then $x + w = y + ((x - y) + w)$ belongs to $y + W$ for all $w \in W$. This shows that $x + W \subseteq y + W$. Similarly, since $y - x$ also belongs to $W$, it follows that $y + W \subseteq x + W$. Thus, $x + W = y + W$.
(iii) If $x + W = y + W$, then $x \in (x + W) \cap (y + W)$ by (i). Conversely, suppose that $(x + W) \cap (y + W) \ne \emptyset$. Let $z \in (x + W) \cap (y + W)$. Then $z = x + u = y + v$ for some $u, v \in W$. But then $x - y = v - u$ belongs to $W$. It follows from (ii) that $x + W = y + W$. □

Corollary 1.5.6 Let $W$ be a subspace of a vector space $V$ over a field $F$. Then the following hold:
(i) If $x + W = x' + W$ and $y + W = y' + W$, then $(x + y) + W = (x' + y') + W$.
(ii) If $x + W = x' + W$, then $ax + W = ax' + W$ for all $a \in F$.
In turn, we have an internal binary operation $+$ on $V/W$, and an external multiplication $\cdot$ on $V/W$ by scalars, given by

$$(x + W) + (y + W) = (x + y) + W,$$

and

$$a \cdot (x + W) = a \cdot x + W.$$

Further, $V/W$ is a vector space with respect to these operations.

Proof (i) Suppose that $x + W = x' + W$ and $y + W = y' + W$. Then $x - x'$ and $y - y'$ belong to $W$. Since $W$ is a subspace, $(x + y) - (x' + y') = (x - x') + (y - y')$ belongs to $W$. This shows that $(x + y) + W = (x' + y') + W$.
(ii) Suppose that $x + W = x' + W$. Then $x - x' \in W$. Since $W$ is a subspace, $ax - ax' = a(x - x')$ belongs to $W$. This shows that $ax + W = ax' + W$ for all $a \in F$.
(i) and (ii) ensure that we have the addition $+$ on $V/W$ and the multiplication $\cdot$ by scalars as described in the corollary. The verification of the fact that $V/W$ is a vector space with respect to these operations is straightforward. The zero of the vector space is the coset $0 + W = W$, and the negative of $x + W$ is $-x + W$. □

Definition 1.5.7 The vector space $V/W$ described above is called the quotient space of $V$ modulo $W$.

Proposition 1.5.8 Let $V$ be a finite dimensional vector space over a field $F$, and $W$ a subspace of $V$. Then $\dim V/W = \dim V - \dim W$.

Proof Let {x1, x2,...,xr} be a basis of W, and {y1 + W, y2 + W,...,ys + W} a basis of V/W. We show that {x1, x2,...,xr, y1, y2,...,ys} is a basis of V. Let x ∈ V. Then x + W ∈ V/W. Since {y1 + W, y2 + W, ..., ys + W} is a basis of V/W,

x + W = α1(y1 + W) + α2(y2 + W) + ··· + αs(ys + W) = (α1y1 + α2y2 +···+αsys) + W for some α1, α2,...,αs in F. Hence,

x − (α1y1 + α2y2 + ··· + αsys) belongs to W for some α1, α2,...,αs in F. Since {x1, x2,...,xr} is a basis of W,

x − (α1y1 + α2y2 +···+αsys) = β1x1 + β2x2 +···+βrxr for some β1, β2,...,βr in F. Thus,

x = β1x1 + β2x2 +···+βrxr + α1y1 + α2y2 +···+αsys for some βi, αj ∈ F. This shows that {x1, x2,...,xr, y1, y2,...,ys} generates V . Next, suppose that

β1x1 + β2x2 +···+βrxr + α1y1 + α2y2 +···+αsys = 0.

Since β1x1 + β2x2 +···+βrxr belongs to W,

α1(y1 + W) + α2(y2 + W) +···+αs(ys + W) = (α1y1 + α2y2 +···+αsys) + W = W.

Since W is the zero of V/W, and {y1 + W, y2 + W, ..., ys + W} (being a basis of V/W) is linearly independent, αj = 0 for all j. But then β1x1 + β2x2 + ··· + βrxr = 0. Since {x1, x2,...,xr} (being a basis of W) is linearly independent, βi = 0 for all i. This shows that {x1, x2,...,xr, y1, y2,...,ys} is linearly independent, and so it is also a basis. Thus,

dim V = r + s = dim W + dim V/W. □
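In a finite example the cosets can be listed directly. The Python sketch below is an illustration that is not part of the text. It takes $V = \mathbb{Z}_2^3$ and the one-dimensional subspace $W = \{0, (1, 1, 0)\}$ (our choice) and checks three things: $V/W$ is a partition of $V$ (Proposition 1.5.5), coset addition is well defined (Corollary 1.5.6), and $V/W$ has $2^{3-1}$ elements, i.e. $\dim V/W = \dim V - \dim W$ (Proposition 1.5.8). The helper names are hypothetical.

```python
from itertools import product

P = 2  # the field Z_2


def add(u, v):
    return tuple((a + b) % P for a, b in zip(u, v))


def sub(u, v):
    return tuple((a - b) % P for a, b in zip(u, v))


def coset(x, W):
    """The coset x + W, as a frozenset of vectors."""
    return frozenset(add(x, w) for w in W)


V = list(product(range(P), repeat=3))
W = frozenset({(0, 0, 0), (1, 1, 0)})   # a subspace of dimension 1
quotient = {coset(x, W) for x in V}     # the set V/W


def is_partition(cosets, V):
    # the cosets cover V, and their sizes add up to |V| exactly when they are disjoint
    covers = set().union(*cosets) == set(V)
    return covers and sum(len(c) for c in cosets) == len(V)


def criterion_holds(V, W):
    # Proposition 1.5.5(ii): x + W = y + W  iff  x - y is in W
    return all((coset(x, W) == coset(y, W)) == (sub(x, y) in W) for x in V for y in V)


def addition_well_defined(V, W):
    # Corollary 1.5.6(i): the sum of cosets does not depend on representatives
    return all(coset(add(x2, y2), W) == coset(add(x, y), W)
               for x in V for y in V for x2 in coset(x, W) for y2 in coset(y, W))


print(len(quotient))   # 4 = 2**(3 - 1), so dim V/W = 2
```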

In this book, we shall be mainly interested in finite dimensional vector spaces. However, a vector space need not be finite dimensional. To develop the analogous theory for infinitely generated vector spaces, one uses some equivalent of the axiom of choice (for example, Zorn's Lemma).

Proposition 1.5.9 The union of a chain of linearly independent subsets is linearly independent.

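Proof Let $\{S_\alpha \mid \alpha \in \Lambda\}$ be a chain of linearly independent subsets. Then $0 \notin S_\alpha$ for all $\alpha$, and hence $0 \notin \bigcup_{\alpha \in \Lambda} S_\alpha$. Let $x_1, x_2, \ldots, x_n$ be distinct elements of $\bigcup_{\alpha \in \Lambda} S_\alpha$. Suppose that $x_i \in S_{\alpha_i}$. Since $\{S_\alpha \mid \alpha \in \Lambda\}$ is a chain, there exists $r$ such that $S_{\alpha_i} \subseteq S_{\alpha_r}$ for all $i$. Thus $x_1, x_2, \ldots, x_n$ all belong to $S_{\alpha_r}$. Since $S_{\alpha_r}$ is linearly independent,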

a1x1 + a2x2 +···+anxn = 0 implies that ai = 0 for all i.

This shows that the union is linearly independent. 

Proposition 1.5.10 Every linearly independent subset can be embedded into a basis.

Proof All that we need to show is that every linearly independent subset can be embedded in a maximal linearly independent subset. Let $S$ be a linearly independent subset of a vector space $V$ over a field $F$. Let $X$ be the set of all linearly independent subsets which contain $S$. Then $X \ne \emptyset$, for $S \in X$. Thus, $(X, \subseteq)$ is a nonempty partially ordered set. From the previous proposition, it follows that every chain in $X$ has an upper bound. By Zorn's Lemma, $X$ has a maximal element $T$ (say). Clearly, $T$ is also a maximal linearly independent subset of $V$, since every linearly independent subset containing $T$ contains $S$ and so lies in $X$. □

Proposition 1.5.11 Every set S of generators contains a basis.

Proof Let $S$ be a set of generators of $V$. If $S = \{0\}$ or $S = \emptyset$, then $V = \{0\}$, and then $\emptyset \subseteq S$ is a basis of $V$. Suppose that $S \ne \emptyset$ and $S \ne \{0\}$. Let $X$ be the set of all linearly independent subsets of $S$. If $x$ is a nonzero element in $S$, then $\{x\} \in X$, and so $X \ne \emptyset$. Thus, $(X, \subseteq)$ is a nonempty partially ordered set. Since the union of a chain of linearly independent subsets is a linearly independent subset, every chain in $X$ has an upper bound. By Zorn's Lemma, $X$ has a maximal element $B$ (say). Then $S \subseteq \langle B \rangle$, for if not, there exists an element $x \in S \setminus \langle B \rangle$. Consider the subset $B' = B \cup \{x\}$. Suppose that

$$\alpha x + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = 0,$$

where $x_1, x_2, \ldots, x_n$ are all in $B$, and $x_i \ne x_j$ for $i \ne j$. But then $\alpha x \in \langle B \rangle$. Since $x \notin \langle B \rangle$, $\alpha = 0$. Since $B$ is linearly independent, $\alpha_i = 0$ for all $i$. This shows that $B'$ is a linearly independent subset of $S$. This contradicts the supposition that $B$ is a maximal linearly independent subset of $S$. Thus, $S \subseteq \langle B \rangle$, and so $\langle B \rangle = V$. This shows that $B$ is a basis of $V$ contained in $S$. □
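The Zorn's Lemma arguments above are needed for arbitrary vector spaces. For a finite list of vectors in $\mathbb{Q}^n$, a maximal independent subset can instead be found greedily, by keeping a vector only when it raises the rank. The Python sketch below illustrates this finite-dimensional special case; it is not the Zorn argument itself. `extract_basis` and `extend_to_basis` are hypothetical names. `extend_to_basis` embeds an independent list into a basis by appending the standard basis vectors, as one might do in Exercise 1.5.15.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def extract_basis(generators):
    """Greedy version of Proposition 1.5.11: keep a generator only if it is
    not in the span of the vectors kept so far."""
    basis = []
    for v in generators:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis


def extend_to_basis(independent, n):
    """Finite-dimensional version of Proposition 1.5.10: embed a linearly
    independent list in Q^n into a basis, using e_1, ..., e_n as fillers."""
    standard = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    # the given vectors come first, so they are all kept
    return extract_basis(list(independent) + standard)


print(extend_to_basis([(1, 1, 0, 2)], 4))
# [(1, 1, 0, 2), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
```

Every discarded vector lies in the span of the vectors kept before it, so the kept vectors span the same space as all the generators and are linearly independent by construction.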

Exercises

1.5.1 Test the following for being a subspace of $F^3$. Find the subspace generated by those which are not subspaces.
(i) $W = \{x = (x_1, x_2, x_3) \mid x_1 + x_2 = x_3\}$.
(ii) $W = \{x = (x_1, x_2, x_3) \mid x_1 + 2x_2 + x_3 = 0 = 2x_1 + x_2 + 3x_3\}$.
(iii) $W = \{x = (x_1, x_2, x_3) \mid x_1 + 2x_2 + x_3 = 1\}$.
(iv) $W = \{x = (x_1, x_2, x_3) \mid x_1 + 2x_2 = x_3^2\}$.
(v) $W = \{x = (x_1, x_2, x_3) \mid x_1^2 + x_2^2 + x_3^2 = 1\}$.
(vi) $W = \{(x, \sin x, \cos x) \mid x \in \mathbb{R}\}$.
(vii) $W = \{x = (x_1, x_2, x_3) \mid x_1^2 + x_2^2 + x_3^2 = 1 = x_1 + 2x_2 + x_3\}$.

1.5.2 Which of the following subsets are linearly independent, and why?
(i) The subset $\{(1, 1, 0), (0, 1, 1), (1, 0, 1)\}$ of $F^3$.
(ii) The subset $\{(1, 1, 1, 0), (2, 3, 4, 0), (4, 9, 16, 0), (2, 3, \pi, 0)\}$ of $F^4$.
(iii) The subset $\{(1, 1, 1, 0), (1, 3, 4, 0), (1, 9, 16, 0)\}$ of $F^4$.
(iv) The sphere $S^2 = \{(x, y, z) \mid x^2 + y^2 + z^2 = 1\}$ in $F^3$.
(v) The subset $\{(x, x^2) \mid x \in \mathbb{R}\}$ of $\mathbb{R}^2$.

1.5.3 Let V be a finite dimensional vector space, and W a subspace such that dimW = dimV . Show that W = V . Is this result true for infinite dimensional vector spaces? Support.

1.5.4 Show that a nontrivial proper subspace of $\mathbb{R}^3$ is either a line passing through the origin or a plane passing through the origin.

1.5.5 Show that the intersection of two distinct planes passing through the origin is a line passing through the origin.

1.5.6 Let $W_1$ and $W_2$ be two distinct subspaces of dimension $n - 1$ of a vector space $V$ of dimension $n$. Show that the dimension of $W_1 \cap W_2$ is $n - 2$.

1.5.7 A subspace $W$ of dimension $n - 1$ of a vector space $V$ of dimension $n$ is called a hyperplane in $V$. Show that for every hyperplane $W$ of the vector space $F^n$, there exists $a = (a_1, a_2, \ldots, a_n) \in F^n \setminus \{0\}$ such that

W ={x = (x1, x2,...,xn) | a1x1 + a2x2 +···+anxn = 0}.

1.5.8 Show that a subspace (also called a plane) of dimension r of a vector space V of dimension n is an intersection of n − r distinct hyperplanes.

1.5.9∗ Which of the results in the above section remain true when a field is replaced by the set Z of integers with the usual addition and multiplication? What are the best possible modifications of the results which are not true for Z, so that they hold for Z also?

1.5.10 Show that {e1 + e2, e2 + e3,...,en−1 + en, en}, where {e1, e2,...,en} is the standard basis of the vector space Fn, is another basis of Fn.

1.5.11 Characterize a vector space with unique basis. Can we have a vector space with exactly 2 bases? Support.

1.5.12 Find the number of bases of $\mathbb{Z}_p^3$ over $\mathbb{Z}_p$.

1.5.13 Find the number of subspaces of $\mathbb{Z}_p^3$ over $\mathbb{Z}_p$.

1.5.14 Show that a nonzero vector space V has no nonzero proper subspace if and only if it is of dimension 1.

1.5.15 Embed (1, 1, 0, 2) into a basis of R4.

1.5.16 Show that {(1, 2, 1), (3, 1, 2), (1, 1, 1)} forms a basis of R3 over R. Express (3, 5, 2) as linear combination of members of the above basis.

1.5.17 Can we have a nontrivial finite vector space over a field of characteristic 0, or over any infinite field? Support.

1.5.18 Let F be a field of order 32. Find the number of subfields of F.

1.5.19 Let Pn(F) denote the set of polynomials of degree at most n over the field F. Show that Pn(F) is a vector space over F with respect to the usual addition of polynomials and multiplication by scalars. Find a basis of this vector space and so also the dimension.

1.5.20 Let $W_1$ be a subspace of dimension $n - 1$ of a vector space $V$ of dimension $n$. Let $W_2$ be a subspace of dimension $r$ such that $W_2$ is not contained in $W_1$. Find the dimension of $W_1 \cap W_2$.

1.5.21 Let V be a vector space over an infinite field. Can we express V as a finite union of proper subspaces? Support.

1.5.22 Show that {1,(X − 1), (X − 1)2,...,(X − 1)n} is a basis of the vector space Pn(F) of polynomials of degree at most n.

1.5.23 Let $W = \{(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \mid x_1 + 2x_2 + 3x_3 + \cdots + nx_n = 0\}$. Show that $W$ is a subspace of $\mathbb{R}^n$. Find the dimension of $W$.

1.5.24 Find a vector space having exactly 3 bases. Is such a vector space unique? Support.

1.5.25 Let V be a vector space over a field F and W a subspace. Show that there is a subspace W′ such that V = W ⊕ W′.

1.5.26 Let W be a nontrivial proper subspace of the Euclidean vector space R3 of dimension 3. Show that W is a line passing through origin, or it is a plane passing through origin.

1.5.27 Let $W_1$ be a line passing through the origin, and $W_2$ a plane passing through the origin which does not contain the line $W_1$. Show that $\mathbb{R}^3 = W_1 \oplus W_2$.

1.5.28 Let $(l, m, n) \ne (0, 0, 0)$. Show that $W = \{(\lambda l, \lambda m, \lambda n) \mid \lambda \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3$, and that the quotient space $\mathbb{R}^3/W$ is the set of all lines having the direction ratios $(l, m, n)$.

1.5.29 Show that R considered as a vector space over the field Q of rational numbers is infinite dimensional.

1.5.30 Show that in the real vector space C[0, 1] of all real valued continuous functions on [0, 1], the set {sin x, sin 2x, ..., sin nx} is a linearly independent subset for all n ≥ 1. Show also that the set of all Legendre polynomials forms a linearly independent set. Deduce that C[0, 1] is infinite dimensional.

Chapter 2 Matrices and Linear Equations

Matrices play a pivotal role in mathematics, and in turn, in all branches of science, social science, and engineering. This chapter is devoted to the interplay between matrices and systems of linear equations.

2.1 Matrices and Their Algebra

By definition, an $m \times n$ matrix $A$ with entries in a field $F$ is an arrangement

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

of $m$ rows and $n$ columns of elements of $F$. In short, $A$ is denoted by $[a_{ij}]$, where $a_{ij}$ is the entry in the $i$th row and $j$th column of $A$. The $i$th row $(a_{i1}, a_{i2}, \ldots, a_{in})$ of the matrix $A$ is a vector in $F^n$, called the $i$th row vector of $A$, and it will be denoted by $R_i(A)$. Thus, the matrix $A$ can also be expressed as a column

$$A = \begin{bmatrix} R_1(A) \\ R_2(A) \\ \vdots \\ R_m(A) \end{bmatrix}$$

of $m$ rows with entries in $F^n$. Similarly, if we treat the members of $F^m$ as column vectors, then the $j$th column

$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$

of the matrix $A$ is a column vector in $F^m$, called the $j$th column vector of $A$, and it will be denoted by $C_j(A)$. As such, the matrix $A$ can also be expressed as a row

$$A = [C_1(A), C_2(A), \ldots, C_n(A)].$$

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_2

Thus,

$$A = \begin{bmatrix} 2 & 0 & 1+i & 1-i & 3 \\ 4 & 1+2i & 0 & 1 & i \\ 0 & 8 & 1 & 2 & i \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}$$

is a $4 \times 5$ matrix with entries in the field $\mathbb{C}$ of complex numbers. A matrix $A$ is called a square matrix if its number of rows equals its number of columns. The matrix

$$A = \begin{bmatrix} 2 & 0 & 1 \\ 4 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix}$$

is a square $3 \times 3$ matrix with entries in the field $\mathbb{R}$ of real numbers. The set of all $m \times n$ matrices with entries in a field $F$ is denoted by $M_{mn}(F)$. The set of all square $n \times n$ matrices is denoted by $M_n(F)$. We have a binary operation $+$ on $M_{mn}(F)$, called matrix addition, which is defined by

$$[a_{ij}] + [b_{ij}] = [c_{ij}], \quad \text{where } c_{ij} = a_{ij} + b_{ij}.$$

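For example,

$$\begin{bmatrix} 2 & 0 & 1 \\ 4 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 2 \\ 3 & 1 & 0 \\ 5 & 8 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 3 \\ 7 & 2 & 0 \\ 5 & 16 & 2 \end{bmatrix}.$$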

The $m \times n$ matrix $0_{m \times n}$ all of whose entries are 0 is called the zero $m \times n$ matrix. Clearly, the matrix $0_{m \times n}$ is characterized by the property that $A + 0_{m \times n} = A = 0_{m \times n} + A$ for every $m \times n$ matrix $A$. Further, if $A = [a_{ij}]$ is an $m \times n$ matrix, then the matrix $-A = [-a_{ij}]$, all of whose entries are the negatives of the corresponding entries of $A$, is called the negative of $A$. It is characterized by the property that $A + (-A) = 0_{m \times n} = (-A) + A$. The following proposition is an immediate consequence of the corresponding properties of the addition $+$ in $F$.

Proposition 2.1.1 The set $M_{mn}(F)$ of $m \times n$ matrices with entries in $F$ is an abelian group with respect to matrix addition, in the sense that it satisfies the following properties:
(i) Matrix addition is associative: $(A + B) + C = A + (B + C)$ for all $A, B, C$ in $M_{mn}(F)$.
(ii) Matrix addition is commutative: $A + B = B + A$ for all $A, B$ in $M_{mn}(F)$.
(iii) There is a unique matrix $0_{m \times n}$ in $M_{mn}(F)$ such that $A + 0_{m \times n} = A = 0_{m \times n} + A$ for all $A$ in $M_{mn}(F)$.
(iv) For each matrix $A$ in $M_{mn}(F)$, there is a unique matrix $-A$ in $M_{mn}(F)$ such that $A + (-A) = 0_{m \times n} = (-A) + A$. □

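We have an external multiplication $\cdot$ on $M_{mn}(F)$ by scalars in $F$, defined by $a \cdot [a_{ij}] = [b_{ij}]$, where $b_{ij} = a \cdot a_{ij}$. Thus, for example,

$$2 \cdot \begin{bmatrix} 2 & 0 & 1 \\ 4 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 2 \\ 8 & 2 & 0 \\ 0 & 16 & 2 \end{bmatrix}.$$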
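If a matrix is stored as a list of its rows, the entrywise definitions translate directly into code. The Python sketch below is an illustration that reproduces the two numerical examples above; `mat_add` and `scalar_mul` are hypothetical helper names.

```python
def mat_add(A, B):
    """[a_ij] + [b_ij] = [a_ij + b_ij]; A and B must have the same shape."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scalar_mul(c, A):
    """c . [a_ij] = [c a_ij]."""
    return [[c * a for a in row] for row in A]


A = [[2, 0, 1], [4, 1, 0], [0, 8, 1]]
B = [[0, 1, 2], [3, 1, 0], [5, 8, 1]]
print(mat_add(A, B))     # [[2, 1, 3], [7, 2, 0], [5, 16, 2]]
print(scalar_mul(2, A))  # [[4, 0, 2], [8, 2, 0], [0, 16, 2]]
```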

It can be further observed that the triple $(M_{mn}(F), +, \cdot)$ is a vector space over $F$. Indeed, $(M_{mn}(F), +, \cdot)$ can be identified with the triple $(F^{mn}, +, \cdot)$ under the correspondence $A \longleftrightarrow (R_1(A), R_2(A), \ldots, R_m(A))$, which respects all the operations. Let $e_{ij}$ denote the matrix in which the $i$th row $j$th column entry is 1 and the rest of the entries are 0. For example, the $3 \times 3$ matrix $e_{23}$ is given by

$$e_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

It follows that the set $\{e_{ij} \mid 1 \le i \le m, 1 \le j \le n\}$ corresponds to the standard basis of $F^{mn}$ under the above correspondence. Clearly,

$$[a_{ij}] = \sum_{i,j} a_{ij} e_{ij},$$

and $\sum_{i,j} a_{ij} e_{ij} = 0_{m \times n}$ if and only if $a_{ij} = 0$ for all $i, j$. Thus, $\{e_{ij} \mid 1 \le i \le m, 1 \le j \le n\}$ is a basis, called the standard basis, of the vector space $M_{mn}(F)$. Thus, the dimension of $M_{mn}(F)$ is $m \cdot n$. In particular, $M_n(F)$ is of dimension $n^2$. Apart from the above operations, we have an external operation $\cdot$ from $M_{mn}(F) \times M_{np}(F)$ to $M_{mp}(F)$, called matrix multiplication, defined as follows: Let $A = [a_{ij}]$, $1 \le i \le m$, $1 \le j \le n$, and $B = [b_{jk}]$, $1 \le j \le n$, $1 \le k \le p$. Then $A \cdot B = [c_{ik}]$, where $c_{ik} = \sum_j a_{ij} b_{jk}$. Thus, for example,

$$\begin{bmatrix} 2 & 0 & 1 \\ 4 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 & 2 \\ 3 & 1 & 0 \\ 5 & 8 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 10 & 5 \\ 3 & 5 & 8 \\ 29 & 16 & 1 \end{bmatrix}.$$

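It can be observed easily that matrix multiplication distributes over addition from the left as well as from the right, in the sense that $(A + B) \cdot C = A \cdot C + B \cdot C$ and $A \cdot (B + C) = A \cdot B + A \cdot C$. Evidently, $A \cdot 0_{n \times p} = 0_{m \times p}$ and $0_{p \times m} \cdot A = 0_{p \times n}$ for every $m \times n$ matrix $A$. Again, since $\sum_k \left(\sum_j a_{ij} b_{jk}\right) c_{kl} = \sum_j a_{ij} \left(\sum_k b_{jk} c_{kl}\right)$, it follows that matrix multiplication is associative, in the sense that $(A \cdot B) \cdot C = A \cdot (B \cdot C)$ whenever the products are defined. In particular, we have a multiplication $\cdot$ in $M_n(F)$. Note that matrix multiplication is not commutative; for example,

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},$$

whereas

$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$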
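The defining formula $c_{ik} = \sum_j a_{ij} b_{jk}$ can be written out directly in the same way. The Python sketch below is an illustration that uses a hypothetical helper `mat_mul` and stores matrices as lists of rows. It reproduces the product example and the non-commuting pair above.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B:
    (A . B)_ik = sum over j of a_ij * b_jk."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(p)] for row in A]


A = [[2, 0, 1], [4, 1, 0], [0, 8, 1]]
B = [[0, 1, 2], [3, 1, 0], [5, 8, 1]]
print(mat_mul(A, B))   # [[5, 10, 5], [3, 5, 8], [29, 16, 1]]

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
print(mat_mul(X, Y))   # [[1, 0], [0, 0]]
print(mat_mul(Y, X))   # [[0, 0], [0, 1]], so X . Y != Y . X
```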

Thus, the set $M_n(F)$ of $n \times n$ matrices with entries in $F$, together with matrix addition $+$, multiplication by scalars, and matrix multiplication $\cdot$, is an algebra in the sense of the following definition.

Definition 2.1.2 A vector space $V$ over a field $F$ together with an internal multiplication $\cdot$ on $V$ is called an algebra over $F$ if the following conditions hold:
1. The internal multiplication $\cdot$ is associative, i.e., $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for all $x, y, z \in V$.
2. $\cdot$ distributes over addition $+$, i.e., $(x + y) \cdot z = x \cdot z + y \cdot z$ and $x \cdot (y + z) = x \cdot y + x \cdot z$ for all $x, y, z \in V$.
3. $\alpha(x \cdot y) = (\alpha x) \cdot y = x \cdot (\alpha y)$ for all $\alpha \in F$ and $x, y \in V$.

Let $A$ be an $n \times m$ matrix. The $m \times n$ matrix $A^t$ obtained by interchanging the rows and columns of $A$ is called the transpose of $A$. More precisely, if $A = [a_{ij}]$ is an $n \times m$ matrix, then the $m \times n$ matrix $A^t = [b_{ji}]$, where $b_{ji} = a_{ij}$, is called the transpose of $A$. Let $A = [a_{ij}]$ be an $n \times m$ matrix with entries in the field $\mathbb{C}$ of complex numbers. The matrix $\bar{A} = [b_{ij}]$, where $b_{ij} = \bar{a}_{ij}$ (the complex conjugate of $a_{ij}$), is called the conjugate of the matrix $A$. The matrix $A^\star = \bar{A}^t$ is called the tranjugate, also called the Hermitian conjugate, of $A$. Thus, for example,

$$\begin{bmatrix} 2 & 0 & 1 \\ 4 & 1 & 0 \\ 0 & 8 & 1 \end{bmatrix}^t = \begin{bmatrix} 2 & 4 & 0 \\ 0 & 1 & 8 \\ 1 & 0 & 1 \end{bmatrix}$$

and

$$\begin{bmatrix} 2+i & i & 1+i \\ 4+i & i & 0 \\ 1-i & 8 & 1+i \end{bmatrix}^\star = \begin{bmatrix} 2-i & 4-i & 1+i \\ -i & -i & 8 \\ 1-i & 0 & 1-i \end{bmatrix}.$$

Proposition 2.1.3 Let $A, B$ be matrices with entries in a field $F$, and let $a \in F$. Then
(i) $(A + B)^t = A^t + B^t$;
(ii) $(A^t)^t = A$;
(iii) $(a \cdot A)^t = a \cdot A^t$;
(iv) $(A \cdot B)^t = B^t \cdot A^t$,
provided the relevant sums and products are defined. Further, if $A, B$ are matrices with entries in the field $\mathbb{C}$ of complex numbers and $a \in \mathbb{C}$, then
(v) $(A + B)^\star = A^\star + B^\star$;
(vi) $(A^\star)^\star = A$;
(vii) $(a \cdot A)^\star = \bar{a} \cdot A^\star$;
(viii) $(A \cdot B)^\star = B^\star \cdot A^\star$,
provided the relevant sums and products are defined.

Proof The identities (i), (ii), and (iii) are evident from the definition. We prove (iv). Suppose that $A = [a_{ij}]$ is an $n \times m$ matrix and $B = [b_{jk}]$ is an $m \times p$ matrix. Then, by definition, $A \cdot B = [c_{ik}]$, where $c_{ik} = \sum_j a_{ij} b_{jk} = \sum_j v_{kj} u_{ji}$, with $v_{kj} = b_{jk}$ and $u_{ji} = a_{ij}$. By definition, $B^t = [v_{kj}]$, $A^t = [u_{ji}]$, and $(A \cdot B)^t = [w_{ki}]$, where $w_{ki} = c_{ik}$. This shows that the $k$th row $i$th column entries of both sides are the same. This proves (iv). The proofs of the rest of the identities are similar. □
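Identities (iv) and (viii) are easy to spot-check numerically. The Python sketch below is an illustration, not a proof. Its helper names are hypothetical, matrices are stored as lists of rows, and Python's built-in `complex` type stands in for entries of $\mathbb{C}$. It recomputes the two transposition examples given earlier and checks $(A \cdot B)^t = B^t \cdot A^t$ on the example matrices.

```python
def transpose(A):
    """A^t: the (j, i) entry of A^t is the (i, j) entry of A."""
    return [list(col) for col in zip(*A)]


def conj_transpose(A):
    """A^star: transpose of the entrywise complex conjugate."""
    return [[complex(z).conjugate() for z in col] for col in zip(*A)]


def mat_mul(A, B):
    n = len(B)
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
            for row in A]


A = [[2, 0, 1], [4, 1, 0], [0, 8, 1]]
B = [[0, 1, 2], [3, 1, 0], [5, 8, 1]]
print(transpose(A))   # [[2, 4, 0], [0, 1, 8], [1, 0, 1]]
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # True, as in (iv)

M = [[2 + 1j, 1j, 1 + 1j], [4 + 1j, 1j, 0], [1 - 1j, 8, 1 + 1j]]
print(conj_transpose(M))  # the Hermitian conjugate computed in the text, as Python complex numbers
```

All entries here are small Gaussian integers, so the floating-point complex arithmetic involved is exact and the equality tests are reliable.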

2.2 Types of Matrices

1. Identity matrix. The $n \times n$ matrix all of whose diagonal entries are 1 and off-diagonal entries are 0 is called the identity matrix of order $n$, and it is denoted by $I_n$. For example,

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

It can be checked that $I_n \cdot A = A = A \cdot I_m$ for every $n \times m$ matrix $A$. Indeed, if $C$ is an $n \times n$ matrix such that $C \cdot A = A$ for every $n \times m$ matrix $A$, then $C = I_n$.

2. Diagonal matrix. A matrix $A = [a_{ij}]$ is called a diagonal matrix if all its off-diagonal entries are 0. Thus, $[a_{ij}]$ is a diagonal matrix if $a_{ij} = 0$ for all $i \ne j$. The diagonal matrix whose $i$th row $i$th column entry is $\alpha_i$ is denoted by $\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$. For example,

$$\mathrm{Diag}(1, 2, 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

The effect of multiplying an $n \times m$ matrix $A$ on the left by the diagonal matrix $\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$ is to multiply the $i$th row of $A$ by $\alpha_i$. Thus $\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_n) \cdot [a_{ij}] = [b_{ij}]$, where $b_{ij} = \alpha_i a_{ij}$. Similarly, the effect of multiplying an $m \times n$ matrix $A$ on the right by this matrix is to multiply the $j$th column of $A$ by $\alpha_j$. In particular, $\mathrm{Diag}(\alpha_1, \ldots, \alpha_n) \cdot \mathrm{Diag}(\beta_1, \ldots, \beta_n) = \mathrm{Diag}(\alpha_1\beta_1, \ldots, \alpha_n\beta_n)$.

3. Scalar matrix. An $n \times n$ diagonal matrix all of whose diagonal entries are the same is called a scalar matrix. Thus, a scalar matrix is of the form $\alpha I_n$, and multiplying a matrix $A$ by it (on either side) gives $\alpha A$.

4. Symmetric matrix. A matrix $A$ is called a symmetric matrix if $A^t = A$. Thus, a diagonal matrix is a symmetric matrix. The matrix

$$\begin{bmatrix} 1 & 3 & 2 \\ 3 & 2 & 0 \\ 2 & 0 & 3 \end{bmatrix}$$

is a symmetric matrix. It follows from Proposition 2.1.3 that the sum of two symmetric matrices is symmetric, and a scalar multiple of a symmetric matrix is symmetric. Thus, the set $S_n(F)$ of all $n \times n$ symmetric matrices forms a subspace of $M_n(F)$. For every matrix $A$, $AA^t$ is a symmetric matrix. For a square matrix $A$, $A + A^t$ is a symmetric matrix. The product of two symmetric matrices is symmetric if and only if they commute.

5. Skew symmetric matrix. A matrix $A$ is called a skew symmetric matrix if $A^t = -A$. For example, the matrix

$$\begin{bmatrix} 0 & 3 & 2 \\ -3 & 0 & 0 \\ -2 & 0 & 0 \end{bmatrix}$$

is a skew symmetric matrix. It follows from Proposition 2.1.3 that the sum of two skew symmetric matrices is skew symmetric, and a scalar multiple of a skew symmetric matrix is skew symmetric. Thus, the set $SS_n(F)$ of all $n \times n$ skew symmetric matrices forms a subspace of $M_n(F)$. $A - A^t$ is skew symmetric for every square matrix $A$. The product of two skew symmetric matrices is skew symmetric if and only if they anticommute, in the sense that $A \cdot B = -B \cdot A$. Also observe that, when the characteristic of $F$ is not 2, the diagonal entries of a skew symmetric matrix are 0. In that case, every square matrix $A$ with entries in $F$ can be uniquely represented as the sum

$$A = \frac{A + A^t}{2} + \frac{A - A^t}{2}$$

of the symmetric matrix $\frac{A + A^t}{2}$ and the skew symmetric matrix $\frac{A - A^t}{2}$ (prove the uniqueness of the representation).

6. Hermitian matrix. A matrix $A$ with entries in the field $\mathbb{C}$ of complex numbers is called a Hermitian matrix (also termed self-adjoint) if $A^\star = A$. Thus, a matrix $A$ with real entries is Hermitian if and only if it is symmetric. The matrix

$$\begin{bmatrix} 1 & 3+i & 2 \\ 3-i & 2 & i \\ 2 & -i & 3 \end{bmatrix}$$

is a Hermitian matrix. Evidently, all diagonal entries of a Hermitian matrix are real. It follows from Proposition 2.1.3 that the sum of two Hermitian matrices is Hermitian. However, only real scalar multiples of a nonzero Hermitian matrix are Hermitian. For every matrix $A$, $AA^\star$ is a Hermitian matrix. For a square matrix $A$, $A + A^\star$ is also a Hermitian matrix. The product of two Hermitian matrices is Hermitian if and only if they commute.

7. Skew-Hermitian matrix. A matrix $A$ with entries in the field $\mathbb{C}$ of complex numbers is called a skew-Hermitian matrix if $A^\star = -A$. Thus, a matrix $A$ with real entries is skew-Hermitian if and only if it is skew symmetric. The matrix

$$\begin{bmatrix} i & 3i-1 & 2 \\ 3i+1 & 2i & -1 \\ -2 & 1 & 3i \end{bmatrix}$$

is a skew-Hermitian matrix. Evidently, all diagonal entries of a skew-Hermitian matrix are purely imaginary. It follows from Proposition 2.1.3 that the sum of two skew-Hermitian matrices is skew-Hermitian. However, only real scalar multiples of a nonzero skew-Hermitian matrix are skew-Hermitian. Observe that a matrix $A$ is skew-Hermitian if and only if $iA$ is a Hermitian matrix. For every matrix $A$, $iAA^\star$ is a skew-Hermitian matrix. For a square matrix $A$, $A - A^\star$ is also a skew-Hermitian matrix.
The product of two skew-Hermitian matrices is skew-Hermitian if and only if they anticommute, in the sense that $AB = -BA$. Every square matrix $A$ with entries in the field $\mathbb{C}$ of complex numbers can be uniquely represented as the sum

$$A = \frac{A + A^\star}{2} + \frac{A - A^\star}{2}$$

of the Hermitian matrix $\frac{A + A^\star}{2}$ and the skew-Hermitian matrix $\frac{A - A^\star}{2}$ (prove the uniqueness of the representation). In turn, it follows that every square matrix $A$ with entries in $\mathbb{C}$ can be uniquely represented as $A = B + iC$, where $B$ and $C$ are Hermitian matrices.

8. Nonsingular matrices. An $n \times n$ matrix $A$ is called a nonsingular matrix (also called an invertible matrix) if there is an $n \times n$ matrix $B$ such that $A \cdot B = I_n = B \cdot A$.

Note that such a $B$, if it exists, is unique, for if $B_1$ and $B_2$ are such matrices, then $B_1 = B_1 \cdot I_n = B_1 \cdot (A \cdot B_2) = (B_1 \cdot A) \cdot B_2 = I_n \cdot B_2 = B_2$. If $A$ is an invertible matrix, then the unique $B$ such that $A \cdot B = I_n = B \cdot A$ is called the inverse of $A$, and it is denoted by $A^{-1}$. Following are some simple observations:
(i) The identity matrix $I_n$ is invertible, and $I_n^{-1} = I_n$.
(ii) Consider a diagonal matrix $\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$. As already observed in 2, $\mathrm{Diag}(\alpha_1, \ldots, \alpha_n) \cdot [a_{ij}] = [b_{ij}]$, where $b_{ij} = \alpha_i a_{ij}$. Thus, $\mathrm{Diag}(\alpha_1, \ldots, \alpha_n) \cdot [a_{ij}] = I_n$ if and only if $\alpha_i a_{ij} = 1$ for $j = i$, and 0 otherwise. This is so if and only if $\alpha_i \ne 0$ and $a_{ii} = \alpha_i^{-1}$ for each $i$, and $a_{ij} = 0$ for all $i \ne j$. This shows that $\mathrm{Diag}(\alpha_1, \alpha_2, \ldots, \alpha_n)$ is invertible if and only if each $\alpha_i \ne 0$, and then its inverse is $\mathrm{Diag}(\alpha_1^{-1}, \alpha_2^{-1}, \ldots, \alpha_n^{-1})$.
(iii) Let $A$ and $B$ be invertible $n \times n$ matrices. Then $(AB)(B^{-1}A^{-1}) = I_n = (B^{-1}A^{-1})(AB)$. This shows that $AB$ is also invertible and $(AB)^{-1} = B^{-1}A^{-1}$.
In due course, we shall describe an algorithm to check whether a matrix is invertible, and then to find its inverse.

9. Triangular matrices. A square matrix $A$ is said to be an upper (lower) triangular matrix if all its entries below (above) the diagonal are 0. More precisely, an $n \times n$ matrix $A = [a_{ij}]$ is called an upper (lower) triangular matrix if $a_{ij} = 0$ for all $i > j$ ($i < j$). It is called strictly upper (lower) triangular if, in addition, all its diagonal entries are 0. For example,

$$\begin{bmatrix} 1 & 4 & 6 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

is an upper triangular matrix. Clearly, the sum of two upper (lower) triangular matrices is an upper (lower) triangular matrix. Also, a scalar multiple of an upper (lower) triangular matrix is an upper (lower) triangular matrix. Thus, the set $T^+(n, F)$ ($T^-(n, F)$) of upper (lower) triangular matrices forms a subspace of $M_n(F)$. Further, $T^+(n, F)$ ($T^-(n, F)$) is closed under matrix multiplication. For, let $A = [a_{ij}]$ and $B = [b_{jk}]$ be upper triangular matrices, so that $a_{ij} = 0$ whenever $i > j$ and $b_{jk} = 0$ whenever $j > k$. Let $A \cdot B = [c_{ik}]$.
Then $c_{ik} = \sum_j a_{ij} b_{jk} = 0$ for all $i > k$, since for each $j$ either $j < i$ (so $a_{ij} = 0$) or $j \ge i > k$ (so $b_{jk} = 0$). Next, let $A = [a_{ij}] \in T^+(n, F)$ be a nonsingular matrix. Then there is a matrix $B = [b_{ij}]$ such that $B \cdot A = I_n$. Equating the first row first column entries of both sides, we get $b_{11} a_{11} = 1$. But then $a_{11} \ne 0$ and $b_{11} = a_{11}^{-1}$. Equating the second row first column entries, we obtain $b_{21} a_{11} = 0$. Hence $b_{21} = 0$. Similarly, equating the $i$th row first column entries, we obtain $b_{i1} a_{11} = 0$, and so $b_{i1} = 0$ for all $i > 1$. Equating the first row second column entries, we get $b_{11} a_{12} + b_{12} a_{22} = 0$. Equating the second row second column entries, we get $b_{22} a_{22} = 1$ (as $b_{21} = 0$). Thus $a_{22} \ne 0$, $b_{22} = a_{22}^{-1}$, and $b_{12} = -a_{22}^{-1} a_{11}^{-1} a_{12}$. Proceeding in this way, we obtain that all the diagonal entries $a_{ii}$ of $A$ are nonzero, and then we can solve for the $b_{ij}$ to get the inverse of $A$. We also observe that the inverse of $A$ is again a member of $T^+(n, F)$. For example, consider the upper triangular matrix

$$\begin{bmatrix} 2 & 4 & 6 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix},$$

all of whose diagonal entries are nonzero. We find its inverse. Suppose that

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \cdot \begin{bmatrix} 2 & 4 & 6 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

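Then we have the following equations:
$$\begin{aligned} 2a_{11} &= 1, & 4a_{11} + 2a_{12} &= 0, & 6a_{11} + 3a_{13} &= 0, \\ 2a_{21} &= 0, & 4a_{21} + 2a_{22} &= 1, & 6a_{21} + 3a_{23} &= 0, \\ 2a_{31} &= 0, & 4a_{31} + 2a_{32} &= 0, & 6a_{31} + 3a_{33} &= 1. \end{aligned}$$
Solving, we get $a_{11} = \frac{1}{2}$, $a_{12} = -1 = a_{13}$, $a_{21} = a_{31} = a_{32} = 0 = a_{23}$, $a_{22} = \frac{1}{2}$, and $a_{33} = \frac{1}{3}$. Thus, the inverse of the said matrix is
$$\begin{bmatrix} \frac{1}{2} & -1 & -1 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix}.$$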
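The elimination above is back substitution in disguise. As a minimal sketch, assuming exact rational arithmetic through Python's standard `fractions` module (the function names `upper_triangular_inverse` and `matmul` are illustrative, not from the text), one can solve $Ty = e_k$ for each standard basis column $e_k$; since a one-sided inverse of a square matrix is two-sided, this yields the same matrix as solving $X \cdot T = I_n$ row by row:

```python
from fractions import Fraction


def upper_triangular_inverse(T):
    """Inverse of an upper triangular matrix T (a list of rows).

    For each k, solve T y = e_k by back substitution (last row first);
    the solution y is the k-th column of T^{-1}.
    """
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    if any(T[i][i] == 0 for i in range(n)):
        raise ValueError("a diagonal entry is 0, so T is singular")
    inv = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        for i in range(n - 1, -1, -1):
            # Row i of T y = e_k:  T[i][i] y_i + sum_{j > i} T[i][j] y_j = [i == k]
            s = sum(T[i][j] * inv[j][k] for j in range(i + 1, n))
            inv[i][k] = (Fraction(int(i == k)) - s) / T[i][i]
    return inv


def matmul(A, B):
    """Ordinary matrix product of lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


T = [[2, 4, 6],
     [0, 2, 0],
     [0, 0, 3]]
T_inv = upper_triangular_inverse(T)
print(T_inv)
```

The computed inverse is again upper triangular, in line with the observation that the inverse of a nonsingular member of $T^+(n, F)$ lies in $T^+(n, F)$.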

Block multiplication. We can multiply two matrices by using suitable blocks of their submatrices. More explicitly, let $A$ be an $m \times n$ matrix, and $B$ an $n \times p$ matrix. Suppose that $m = m_1 + m_2 + \cdots + m_r$, $n = n_1 + n_2 + \cdots + n_s$, and $p = p_1 + p_2 + \cdots + p_t$, where the $m_i$, $n_j$ and $p_k$ are positive integers. Then $A$ and $B$ can be expressed uniquely as
$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1s} \\ A_{21} & A_{22} & \cdots & A_{2s} \\ \vdots & \vdots & & \vdots \\ A_{r1} & A_{r2} & \cdots & A_{rs} \end{bmatrix},$$
where $A_{ij}$ is an $m_i \times n_j$ matrix, and
$$B = \begin{bmatrix} B_{11} & B_{12} & \cdots & B_{1t} \\ B_{21} & B_{22} & \cdots & B_{2t} \\ \vdots & \vdots & & \vdots \\ B_{s1} & B_{s2} & \cdots & B_{st} \end{bmatrix},$$
where $B_{jk}$ is an $n_j \times p_k$ matrix. Further,
$$A \cdot B = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1t} \\ C_{21} & C_{22} & \cdots & C_{2t} \\ \vdots & \vdots & & \vdots \\ C_{r1} & C_{r2} & \cdots & C_{rt} \end{bmatrix},$$
where $C_{ik} = \sum_{j=1}^{s} A_{ij}B_{jk}$.

2.3 System of Linear Equations

A system of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$ over a field $F$ is given by
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \tag{2.1}$$
where $a_{ij}, b_i \in F$.

Example 2.3.1 The following is a system of two linear equations in three unknowns over the field of real numbers:

$3x_1 + 2x_2 + x_3 = 1$,

$x_1 + x_2 + x_3 = 2$.

We say that an $n$-tuple $(a_1, a_2, \ldots, a_n)$ in $F^n$ is a solution of the system (2.1) of linear equations if $x_1 = a_1, x_2 = a_2, \ldots, x_n = a_n$ satisfies all the equations in the system (2.1). Thus, $(-2, 3, 1)$ is a solution of the system of linear equations in the above example. $(-3, 5, 0)$ is also a solution of this system. Indeed, there are infinitely many solutions, which can be parametrized in terms of $x_3$ as $(x_3 - 3, 5 - 2x_3, x_3)$. Geometrically, this set of solutions is a line in $\mathbb{R}^3$.

Example 2.3.2 The system

$x_1 + 2x_2 + 3x_3 = 1$,

$x_1 + x_2 + 3x_3 = 2$,

$4x_1 + 6x_2 + 12x_3 = 5$

of linear equations has no solution (why?).

Example 2.3.3 The system

$x_1 + 2x_2 = 1$,

$2x_1 + 2x_2 = a$

has a unique solution for every $a$ (why?).

Definition 2.3.4 A system of linear equations is said to be consistent if it has a solution. It is said to be inconsistent otherwise.

Example 2.3.1 is consistent with infinitely many solutions, Example 2.3.2 is inconsistent, whereas Example 2.3.3 is consistent with a unique solution. Many problems arising in engineering, industry, the social sciences, and medical science can be modeled in terms of systems of linear equations. As such, describing and interpreting the solutions of a system of linear equations is one of the main themes of linear algebra. In the following few sections, we shall concentrate on this. The system (2.1) of $m$ linear equations in $n$ unknowns can be expressed as a single matrix equation

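$$Ax^t = b^t, \tag{2.2}$$
where $A = [a_{ij}]$ is the $m \times n$ matrix whose $i$th row $j$th column entry is $a_{ij}$, $x = [x_1, x_2, \ldots, x_n] \in F^n$ is the $1 \times n$ row matrix of unknowns, and $b = [b_1, b_2, \ldots, b_m] \in F^m$ is a $1 \times m$ row matrix. Thus, the system of linear equations in Example 2.3.1 can be expressed as
$$\begin{bmatrix} 3 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.$$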
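As a quick sanity check of the matrix form, the following Python sketch (the helpers `apply`, computing the entries of $Ax^t$, and `solution` are illustrative names) verifies that members of the one-parameter family $(x_3 - 3, 5 - 2x_3, x_3)$ found for Example 2.3.1 satisfy $Ax^t = b^t$ for a few sample values of $x_3$:

```python
from fractions import Fraction

A = [[3, 2, 1],
     [1, 1, 1]]          # coefficient matrix of Example 2.3.1
b = [1, 2]               # right-hand side, i.e. b^t written as a list


def apply(A, x):
    """Return the entries of A x^t as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def solution(t):
    """The solution (x3 - 3, 5 - 2 x3, x3) with the parameter x3 = t."""
    return [t - 3, 5 - 2 * t, t]


for t in [0, 1, Fraction(1, 2), -7]:
    assert apply(A, solution(t)) == b
```

The values $x_3 = 1$ and $x_3 = 0$ give back the solutions $(-2, 3, 1)$ and $(-3, 5, 0)$ mentioned above.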

The matrix $A$ in (2.2) is called the coefficient matrix of the system (2.1) of linear equations, and the $m \times (n + 1)$ matrix $A^+ = [A \; b^t]$, whose first $n$ columns are those of $A$ and whose last, $(n + 1)$th, column is $b^t$, is called the augmented matrix of the system of linear equations. Thus, the coefficient matrix of Example 2.3.2 is
$$\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 3 \\ 4 & 6 & 12 \end{bmatrix},$$
and the augmented matrix of that example is
$$\begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 1 & 3 & 2 \\ 4 & 6 & 12 & 5 \end{bmatrix}.$$

Definition 2.3.5 A system of linear equations given by the matrix equation

$$Ax^t = 0^t \tag{2.3}$$
is called a homogeneous system of linear equations. It is also called the homogeneous part of the system of linear equations given by

$$Ax^t = b^t.$$

Proposition 2.3.6 A homogeneous system of linear equations given by the matrix equation $Ax^t = 0^t$ is always consistent, and the set of solutions of the homogeneous system is a subspace of $F^n$.

Proof Let $N(A)$ denote the set of all solutions of $Ax^t = 0^t$. Since $A0^t = 0^t$, it follows that $0 \in N(A)$. Let $u, v \in N(A)$, and $a, b \in F$. Then $A(au + bv)^t = aAu^t + bAv^t = 0^t$. This shows that $au + bv \in N(A)$. It follows that $N(A)$ is a subspace of $F^n$. □

Definition 2.3.7 The subspace $N(A)$ described in the above proposition is called the solution space of the system (2.3) of linear equations, and it is also called the null space of the matrix $A$. The dimension of the null space $N(A)$ is called the nullity of $A$, and it is denoted by $n(A)$. If $\{u_1, u_2, \ldots, u_{n(A)}\}$ is a basis of $N(A)$, then any solution of (2.3) is uniquely expressed as $c_1u_1 + c_2u_2 + \cdots + c_{n(A)}u_{n(A)}$, where $c_1, c_2, \ldots, c_{n(A)}$ are constants in $F$. As such, $c_1u_1 + c_2u_2 + \cdots + c_{n(A)}u_{n(A)}$ is called a general solution of the homogeneous system (2.3).

A little later, we shall give an algorithm to find N(A), indeed a basis of N(A), and so also a general solution of the system (2.3) of linear equations.

Proposition 2.3.8 Suppose that the system of linear equations given by the matrix equation $Ax^t = b^t$ is consistent, and $a = [a_1, a_2, \ldots, a_n]$ is a solution of this equation. Then the coset $a + N(A) = \{a + u \mid u \in N(A)\}$ is the complete set of solutions of the system of linear equations. In turn, if $\{u_1, u_2, \ldots, u_{n(A)}\}$ is a basis of $N(A)$, then $a + c_1u_1 + c_2u_2 + \cdots + c_{n(A)}u_{n(A)}$ is a general solution of the system of linear equations, where $c_1, c_2, \ldots, c_{n(A)}$ are arbitrary constants.

Proof Since $a$ is a solution of $Ax^t = b^t$, we have $Aa^t = b^t$. If $u \in N(A)$, then $Au^t = 0^t$. But then $A(a + u)^t = Aa^t + Au^t = b^t + 0^t = b^t$. This shows that $a + u$ is also a solution of $Ax^t = b^t$. Conversely, let $c$ be a solution of $Ax^t = b^t$. Then $Ac^t = b^t$. Hence $A(c - a)^t = Ac^t - Aa^t = 0^t$. It follows that $c - a \in N(A)$. This shows that $c \in a + N(A)$. The rest is an immediate observation. □

Definition 2.3.9 The subspace $R(A)$ of $F^n$ generated by the set $\{R_1(A), R_2(A), \ldots, R_m(A)\}$ of the rows of $A$ is called the row space of $A$, and the dimension of $R(A)$ is called the row rank of $A$. Thus, the row rank of $A$ is the maximum number of linearly independent rows of $A$. Similarly, the subspace $C(A)$ of $F^m$ (the elements of $F^m$ treated as columns) generated by the columns of $A$ is called the column space of $A$, and the dimension of $C(A)$ is called the column rank of $A$. Again, the column rank of $A$ is the maximum number of linearly independent columns of $A$. We shall see, in due course, that the row rank is the same as the column rank; this common value is called the rank of $A$, and it is denoted by $r(A)$.

Proposition 2.3.10 The system of linear equations given by the matrix equation

$$Ax^t = b^t$$
is consistent if and only if the column rank of $A$ is the same as that of the augmented matrix $A^+$.

Proof The system of linear equations given by the matrix equation $Ax^t = b^t$ is also expressible as
$$x_1C_1(A) + x_2C_2(A) + \cdots + x_nC_n(A) = b^t,$$
where $x = [x_1, x_2, \ldots, x_n]$, and $C_i(A)$ denotes the $i$th column of $A$. Thus, the equation has a solution if and only if $b^t$ is a linear combination of the columns of $A$. This is equivalent to saying that the column space $C(A)$ of $A$ is the same as the column space $C(A^+)$ of the augmented matrix $A^+$. Since $C(A) \subseteq C(A^+)$, this is equivalent to the column rank of $A$ being the same as that of $A^+$. □
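Proposition 2.3.10 suggests a mechanical consistency test. A minimal sketch, assuming exact arithmetic with Python's `fractions` module; the names `rank` and `is_consistent` are illustrative. The `rank` function counts the pivots produced by forward elimination, which is the row rank; using it for the column rank relies on the equality of the two, established later in Corollary 2.4.16:

```python
from fractions import Fraction


def rank(M):
    """Number of pivots found by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                    # pivots found so far
    for c in range(cols):
        if r == rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):         # eliminate below the pivot
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
    return r


def is_consistent(A, b):
    """A x^t = b^t has a solution iff rank(A) == rank([A | b^t])."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


print(is_consistent([[3, 2, 1], [1, 1, 1]], [1, 2]))                  # True (Example 2.3.1)
print(is_consistent([[1, 2, 3], [1, 1, 3], [4, 6, 12]], [1, 2, 5]))   # False (Example 2.3.2)
```

For Example 2.3.2 the coefficient matrix has rank 2 while the augmented matrix has rank 3, which answers the "why?" posed there.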

We shall look at an algorithm to find the rank of a matrix, and also an algorithm to find a general solution of $Ax^t = b^t$, provided it is consistent.

2.4 Gauss Elimination, Elementary Operations, Rank, and Nullity

Definition 2.4.1 Two systems of $m$ linear equations in $n$ unknowns are said to be equivalent if they have the same set of solutions.

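Example 2.4.2 The system

$x_1 + 2x_2 = 1$,

$2x_1 + 2x_2 = a$

of two linear equations in two unknowns is equivalent to the system

$x_1 + 2x_2 = 1$,

$3x_1 + 4x_2 = a + 1$,

for they have the same set of solutions (the second system is obtained from the first by adding the first equation to the second). On the other hand, for $a \neq 2$ the first system is not equivalent to

$x_1 + 2x_2 = 1$,

$2x_1 + 3x_2 = a$.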

In what follows, we shall introduce an algorithm, called Gaussian elimination, to reduce a system of linear equations to an equivalent system of linear equations from which the solutions become apparent.

Definition 2.4.3 The following operations on a system of linear equations are called the elementary operations on the system of linear equations, and the corresponding operations on the coefficient and augmented matrices are called the elementary row operations on the matrices:
1. Interchange any two equations in the system.
2. Multiply an equation in the system by a nonzero member of the field.
3. Add a nonzero multiple of an equation in the system to another equation in the system.
In turn, the corresponding elementary row operations on matrices are:
1. Interchange any two rows of the matrix.
2. Multiply a row of the matrix by a nonzero element of the field.
3. Add a nonzero multiple of a row of the matrix to another row.

The following proposition is an immediate observation, since every solution of a system also satisfies the system obtained from it by an elementary operation, and each elementary operation can be undone by an elementary operation of the same type (interchange the same two equations again, multiply by the inverse of the scalar, or subtract the same multiple).

Proposition 2.4.4 Any two systems of linear equations, one of which is obtained from the other by a finite sequence of elementary operations, are equivalent. □

We shall first discuss an algorithm to find the space of solutions of a homogeneous system of linear equations given by the matrix equation $Ax^t = 0^t$. More precisely, we derive an algorithm to find a basis of the null space $N(A)$ of $A$, so that every solution of the system is a unique linear combination of the basis members.

Proposition 2.4.5 The null space $N(A)$, and so also the nullity $n(A)$, of a matrix $A$ remains invariant under elementary row operations.

Proof Follows from Proposition 2.4.4. □

Proposition 2.4.6 The row space $R(A)$, and so also the row rank, of a matrix $A$ remains invariant under elementary row operations.

Proof Interchanging two rows of a matrix does not change the row space, as the set of rows does not change. Since the subspace of $F^n$ generated by the set $\{R_1(A), R_2(A), \ldots, R_m(A)\}$ of rows of $A$ is the same as the subspace of $F^n$ generated by $\{R_1(A), R_2(A), \ldots, aR_j(A), \ldots, R_m(A)\}$ for each nonzero $a \in F$ and $j \leq m$, the row space of a matrix remains the same if we multiply a row of the matrix by a nonzero member of the field. Finally, since the subspace of $F^n$ generated by the set $\{R_1(A), R_2(A), \ldots, R_m(A)\}$ of rows of $A$ is the same as the subspace of $F^n$ generated by $\{R_1(A), R_2(A), \ldots, R_k(A) + aR_j(A), \ldots, R_m(A)\}$ for each nonzero $a \in F$ and $j \neq k$, the row space of a matrix remains the same if we add a nonzero multiple of a row to another row. □

The column space of a matrix, in general, is not invariant under elementary row operations. However:

Proposition 2.4.7 The column rank of a matrix remains invariant under elementary row operations.

Proof Let $A$ be a matrix, and $A'$ a matrix obtained by applying an elementary row operation to $A$. Applying that operation to $A$ applies the same operation to the submatrix formed by any chosen columns $C_{i_1}(A), C_{i_2}(A), \ldots, C_{i_r}(A)$. Hence, by Proposition 2.4.5 applied to this submatrix,
$$x_1C_{i_1}(A) + x_2C_{i_2}(A) + \cdots + x_rC_{i_r}(A) = 0^t$$
if and only if
$$x_1C_{i_1}(A') + x_2C_{i_2}(A') + \cdots + x_rC_{i_r}(A') = 0^t.$$
This means that the maximum number of linearly independent columns of $A$ is the same as that of $A'$. Thus, the column rank of $A$ is the same as that of $A'$. □

We shall describe an algorithm to transform a matrix, by using elementary row operations, into a special form called its reduced row echelon form, from which a basis of the null space of the matrix and a basis of the row space of the matrix can easily be obtained.

Definition 2.4.8 An $m \times n$ matrix $A = [a_{ij}]$ is said to be in reduced row (column) echelon form, or is said to be a reduced row (column) echelon matrix, if the following hold:
(i) The first nonzero entry in each nonzero row (column) is 1. This entry is called a pivot entry, and the corresponding columns (rows) are called the pivot columns (rows) of the matrix. The columns (rows) which are not pivot columns (rows) are called the free columns (rows). The unknowns corresponding to pivot columns are called pivot variables, and those corresponding to free columns are called free variables.

(ii) The pivot entry in any row (column) lies to the right of (below) the pivot entry in the previous row (column).
(iii) All the other entries in a pivot column (row) are 0.
(iv) All the zero rows (columns) are at the bottom (right).

Example 2.4.9 The matrix
$$A = \begin{bmatrix} 1 & 2 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
is in reduced row echelon form. The 1st row 1st column, the 2nd row 3rd column, and the 3rd row 4th column entries are pivot entries, and the 2nd and 5th columns are free columns. $x_1$, $x_3$ and $x_4$ are pivot variables; $x_2$ and $x_5$ are free variables.
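Definition 2.4.8 translates directly into a check. The Python sketch below (the function name `rref_pivots` is hypothetical) tests conditions (i) to (iv) of the row version and, for a matrix that passes, reports the pivot and free columns using the 1-based numbering of the text:

```python
def rref_pivots(R):
    """Return (pivot_columns, free_columns), 1-based, if R (a list of rows) is
    in reduced row echelon form per Definition 2.4.8; otherwise return None."""
    pivots = []                          # 0-based pivot column of each nonzero row
    seen_zero_row = False
    for row in R:
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if not nonzero:
            seen_zero_row = True
            continue
        j = nonzero[0]
        if seen_zero_row:                # (iv) zero rows must be at the bottom
            return None
        if row[j] != 1:                  # (i) the first nonzero entry must be 1
            return None
        if pivots and j <= pivots[-1]:   # (ii) pivots move strictly to the right
            return None
        pivots.append(j)
    for t, j in enumerate(pivots):       # (iii) pivot columns are otherwise zero
        if any(R[i][j] != 0 for i in range(len(R)) if i != t):
            return None
    n = len(R[0])
    free = [j + 1 for j in range(n) if j not in pivots]
    return [j + 1 for j in pivots], free


A = [[1, 2, 0, 0, 2],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 2],
     [0, 0, 0, 0, 0]]      # the matrix of Example 2.4.9
print(rref_pivots(A))      # ([1, 3, 4], [2, 5])
```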

Proposition 2.4.10 Let $A$ be an $m \times n$ matrix with entries in a field $F$ which is in reduced row echelon form. Suppose that the columns $C_{i_1}(A), C_{i_2}(A), \ldots, C_{i_r}(A)$, with $i_1 < i_2 < \cdots < i_r$, are pivot columns, and the columns $C_{j_1}(A), C_{j_2}(A), \ldots, C_{j_s}(A)$, with $j_1 < j_2 < \cdots < j_s$, are free columns. Then,

(i) the first $r$ rows $R_1(A), R_2(A), \ldots, R_r(A)$ are nonzero rows, and they form a basis of the row space $R(A)$ of $A$,
(ii) the number of pivots is the row rank of $A$,
(iii) the pivot columns form a basis of the column space of $A$,
(iv) the row rank of $A$ is the same as the column rank of $A$; indeed, both equal the number of pivots.

Proof (i) Since each nonzero row contains a unique pivot entry, and the zero rows are at the bottom, it follows that $R_1(A), R_2(A), \ldots, R_r(A)$ are precisely the nonzero rows of the matrix. The pivot entries 1 of $R_1(A), \ldots, R_r(A)$ appear in the distinct columns $i_1, \ldots, i_r$, and all other entries of these columns are 0; hence the $i_k$th coordinate of $c_1R_1(A) + \cdots + c_rR_r(A)$ is $c_k$, and this combination vanishes only if every $c_k = 0$. Thus the set $\{R_1(A), R_2(A), \ldots, R_r(A)\}$ of nonzero rows of $A$ is linearly independent, and so it forms a basis of the row space $R(A)$ of $A$.
(ii) Follows from (i).
(iii) Clearly, the set $\{C_{i_1}(A), C_{i_2}(A), \ldots, C_{i_r}(A)\}$ of pivot columns is linearly independent, for the $k$th entry of the pivot column $C_{i_k}(A)$ is 1 and the rest of the entries in this column are 0. Since the last $m - r$ rows of $A$ are zero, it is also evident that all the free columns are linear combinations of the pivot columns. Indeed,
$$C_{j_l}(A) = a_{1j_l}C_{i_1}(A) + a_{2j_l}C_{i_2}(A) + \cdots + a_{rj_l}C_{i_r}(A).$$
(iv) Follows from (ii) and (iii). □

Proposition 2.4.11 Consider the homogeneous system of linear equations given by the matrix equation $Ax^t = 0^t$, where $A$ is an $m \times n$ matrix in reduced row echelon form with entries in a field $F$. Suppose that the columns $C_{i_1}(A), C_{i_2}(A), \ldots, C_{i_r}(A)$, with $i_1 < i_2 < \cdots < i_r$, are pivot columns, and the columns $C_{j_1}(A), C_{j_2}(A), \ldots, C_{j_s}(A)$, with $j_1 < j_2 < \cdots < j_s$, are free columns. Then the pivot variables $x_{i_1}, x_{i_2}, \ldots, x_{i_r}$ in the homogeneous system of linear equations are uniquely expressible in terms of the free variables $x_{j_1}, x_{j_2}, \ldots, x_{j_s}$ as
$$x_{i_t} = -\sum_{k=1}^{s} a_{tj_k}x_{j_k}, \quad 1 \leq t \leq r.$$
The set $\{u^1, u^2, \ldots, u^s\}$ is a basis of the space $N(A)$ of solutions of the homogeneous system, where $u^k = (u^k_1, u^k_2, \ldots, u^k_n)$ is the unique solution of the homogeneous system corresponding to the choice $x_{j_l} = 0$ for $l \neq k$, and $x_{j_k} = 1$, of the free variables. Indeed, $u^k_{j_l} = 0$ for $l \neq k$, $u^k_{j_k} = 1$, and $u^k_{i_t} = -a_{tj_k}$. The nullity $n(A) = s$, the number of free variables.

Proof Under the assumption, for all $t \leq r$, $a_{ti_t} = 1$ and $a_{li_t} = 0$ for $l \neq t$. The corresponding homogeneous system of linear equations is given by
$$\begin{aligned} a_{1i_1}x_{i_1} + a_{1j_1}x_{j_1} + a_{1j_2}x_{j_2} + \cdots + a_{1j_s}x_{j_s} &= 0, \\ a_{2i_2}x_{i_2} + a_{2j_1}x_{j_1} + a_{2j_2}x_{j_2} + \cdots + a_{2j_s}x_{j_s} &= 0, \\ &\;\;\vdots \\ a_{ri_r}x_{i_r} + a_{rj_1}x_{j_1} + a_{rj_2}x_{j_2} + \cdots + a_{rj_s}x_{j_s} &= 0, \end{aligned}$$
and the rest of the equations, if any, are the identities
$$0x_1 + 0x_2 + \cdots + 0x_n = 0.$$
Evidently, since each $a_{ti_t} = 1$, each pivot variable is uniquely expressible in terms of the free variables as described in the proposition. Further, the set $S = \{u^1, u^2, \ldots, u^s\}$ of solutions is a basis of the space $N(A)$ of solutions, for any solution with values $\alpha_1, \alpha_2, \ldots, \alpha_s$ of the free variables $x_{j_1}, x_{j_2}, \ldots, x_{j_s}$ is uniquely expressible as the linear combination $\alpha_1u^1 + \alpha_2u^2 + \cdots + \alpha_su^s$. The rest is evident. □
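Proposition 2.4.11 is an explicit recipe for a basis of $N(A)$. A minimal Python sketch of it (the name `null_space_basis` is hypothetical, and the input is assumed, not checked, to be in reduced row echelon form): for each free column $j_k$ it sets $x_{j_k} = 1$, the other free variables to 0, and reads off $x_{i_t} = -a_{tj_k}$:

```python
from fractions import Fraction


def null_space_basis(R):
    """Basis {u^1, ..., u^s} of N(R) for R in reduced row echelon form
    (Proposition 2.4.11). The input is assumed, not checked, to be in RREF."""
    n = len(R[0])
    pivots = []                               # pivots[t] = column i_t of row t
    for row in R:
        nz = [j for j, x in enumerate(row) if x != 0]
        if nz:
            pivots.append(nz[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for jk in free:
        # x_{j_k} = 1, the other free variables 0, and x_{i_t} = -a_{t j_k}.
        u = [Fraction(0)] * n
        u[jk] = Fraction(1)
        for t, it in enumerate(pivots):
            u[it] = -Fraction(R[t][jk])
        basis.append(u)
    return basis


R = [[1, 2, 0, 0, 2],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 2],
     [0, 0, 0, 0, 0]]                         # the matrix of Example 2.4.9
for u in null_space_basis(R):
    # Each basis vector must satisfy R u^t = 0^t.
    assert all(sum(a * x for a, x in zip(row, u)) == 0 for row in R)
print(len(null_space_basis(R)))               # nullity n(R) = 2
```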

Proposition 2.4.12 Consider the system of linear equations given by the matrix equation $Ax^t = b^t$, where $A$ is an $m \times n$ matrix in reduced row echelon form with entries in a field $F$. Suppose that the columns $C_{i_1}(A), C_{i_2}(A), \ldots, C_{i_r}(A)$, with $i_1 < i_2 < \cdots < i_r$, are pivot columns, and the columns $C_{j_1}(A), C_{j_2}(A), \ldots, C_{j_s}(A)$, with $j_1 < j_2 < \cdots < j_s$, are free columns. Then the system of linear equations is consistent if and only if $b_k = 0$ for all $k \geq r + 1$, or equivalently, $r(A) = r(A^+)$. Further, in that case, $v = (v_1, v_2, \ldots, v_n)$, where
$$v_{i_t} = -a_{tj_1} + b_t \;\; (1 \leq t \leq r), \quad v_{j_1} = 1, \quad v_{j_l} = 0 \;\; (2 \leq l \leq s),$$
is a particular solution of the above nonhomogeneous system (if there are no free variables, take $v_{i_t} = b_t$). Finally, a general solution $x$ of the system of linear equations is given by
$$x = v + c_1u^1 + c_2u^2 + \cdots + c_su^s,$$
where $c_1, c_2, \ldots, c_s$ are arbitrary constants.

Proof From the previous proposition, a general solution of the homogeneous part of the above nonhomogeneous system of linear equations is given by
$$c_1u^1 + c_2u^2 + \cdots + c_su^s.$$
Further, the system of linear equations is given by
$$\begin{aligned} a_{1i_1}x_{i_1} + a_{1j_1}x_{j_1} + a_{1j_2}x_{j_2} + \cdots + a_{1j_s}x_{j_s} &= b_1, \\ a_{2i_2}x_{i_2} + a_{2j_1}x_{j_1} + a_{2j_2}x_{j_2} + \cdots + a_{2j_s}x_{j_s} &= b_2, \\ &\;\;\vdots \\ a_{ri_r}x_{i_r} + a_{rj_1}x_{j_1} + a_{rj_2}x_{j_2} + \cdots + a_{rj_s}x_{j_s} &= b_r. \end{aligned}$$
The rest of the equations, if any, are
$$0x_1 + 0x_2 + \cdots + 0x_n = b_k, \quad k \geq r + 1.$$
Clearly, the system is inconsistent if $b_k \neq 0$ for some $k \geq r + 1$. Now, suppose that $b_k = 0$ for all $k \geq r + 1$. Putting the free variable $x_{j_1} = 1$, and $x_{j_k} = 0$ for $2 \leq k \leq s$, we get a particular solution $v = (v_1, v_2, \ldots, v_n)$ of the system, where $v_{i_t} = -a_{tj_1} + b_t$ for $1 \leq t \leq r$, $v_{j_1} = 1$, and $v_{j_l} = 0$ for $2 \leq l \leq s$. From Proposition 2.3.8, we get the general solution
$$x = v + c_1u^1 + c_2u^2 + \cdots + c_su^s$$
of the system, where $c_1, c_2, \ldots, c_s$ are arbitrary constants. □
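Proposition 2.4.12 can be sketched the same way. The illustrative function `solve_rref` below (not from the text) assumes its input $R$ is already in reduced row echelon form, returns `None` when some $b_k \neq 0$ with $k > r$, and by default follows the text's choice $x_{j_1} = 1$ with all other free variables 0; the optional argument lets one choose any other values for the free variables:

```python
from fractions import Fraction


def solve_rref(R, b, free_values=None):
    """Solve R x^t = b^t for R in reduced row echelon form (Prop. 2.4.12).

    Returns None if the system is inconsistent. Otherwise the free variables
    get the values in free_values (default: x_{j1} = 1, the others 0), and the
    pivot variables are read off as x_{i_t} = b_t - sum_j a_{tj} x_j.
    """
    n = len(R[0])
    pivots = []                               # pivot column of each nonzero row
    for row in R:
        nz = [j for j, x in enumerate(row) if x != 0]
        if nz:
            pivots.append(nz[0])
    r = len(pivots)
    if any(bk != 0 for bk in b[r:]):          # some equation reads 0 = b_k != 0
        return None
    free = [j for j in range(n) if j not in pivots]
    if free_values is None:
        free_values = ([1] + [0] * (len(free) - 1)) if free else []
    x = [Fraction(0)] * n
    for j, val in zip(free, free_values):
        x[j] = Fraction(val)
    for t, it in enumerate(pivots):
        x[it] = Fraction(b[t]) - sum(Fraction(R[t][j]) * x[j] for j in free)
    return x


R = [[1, 0, 1, 0, 2],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0]]                         # the matrix of Example 2.4.13
print(solve_rref(R, [5, 7, 11, 0]))           # v = (-1 + b1, -1 + b2, 1, b3, 0)
print(solve_rref(R, [5, 7, 11, 1]))           # None, since b4 != 0
```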

Example 2.4.13 Consider the system of linear equations given by the matrix equation

$$Ax^t = b^t,$$
where $A$ is the matrix given by
$$A = \begin{bmatrix} 1 & 0 & 1 & 0 & 2 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The corresponding system of linear equations is given by

$$\begin{aligned} x_1 + 0x_2 + x_3 + 0x_4 + 2x_5 &= b_1, \\ 0x_1 + x_2 + x_3 + 0x_4 + x_5 &= b_2, \\ 0x_1 + 0x_2 + 0x_3 + x_4 + x_5 &= b_3, \\ 0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 &= b_4. \end{aligned}$$

The matrix $A$ is in reduced row echelon form with the pivot columns $C_1, C_2, C_4$, and the free columns $C_3$ and $C_5$. The nonzero rows $R_1, R_2, R_3$ form a basis of the row space, and the pivot columns $C_1, C_2, C_4$ of $A$ form a basis of the column space of $A$. Row rank $= 3 =$ column rank of $A$. For the system to be consistent, we need $b_4 = 0$. Assuming that $b_4 = 0$, we find a general solution of the system. We first find a basis of the solution space $N(A)$ of the homogeneous part $Ax^t = 0^t$ of the given system of linear equations. Here $x_3$ and $x_5$ are the free variables. Putting $x_3 = 1$ and $x_5 = 0$, we get a solution $u^1 = (-1, -1, 1, 0, 0)$ of the homogeneous part of the system. Further, putting $x_3 = 0$ and $x_5 = 1$, we get a solution $u^2 = (-2, -1, 0, -1, 1)$ of the homogeneous part of the system. The set $\{u^1, u^2\}$ is a basis of the space $N(A)$ of solutions of the homogeneous part, and the nullity of $A$ is 2. Finally, putting $x_3 = 1$ and $x_5 = 0$ in the nonhomogeneous system, we get a particular solution $v = (-1 + b_1, -1 + b_2, 1, b_3, 0)$ of the given nonhomogeneous system of linear equations. In turn, a general solution of the given nonhomogeneous system of linear equations is $v + c_1u^1 + c_2u^2$.

Observe that a square matrix in reduced row echelon form has no zero row if and only if every row, and hence every column, contains a pivot; equivalently, it is the identity matrix. Since a matrix with a zero row is singular, we have the following:

Proposition 2.4.14 A square matrix in reduced row echelon form is nonsingular if and only if it is the identity matrix. □

Elementary operations on a system of linear equations, or equivalently, elementary row operations on the coefficient and augmented matrices, transform the system into an equivalent system of linear equations. Further, if the coefficient matrix of the system of linear equations is in reduced row echelon form, then, as observed above, a general solution of the system is easily obtained. It is therefore natural to look for an algorithm that reduces an arbitrary matrix to a matrix in reduced row echelon form by using elementary row operations. The following theorem gives such an algorithm.

Theorem 2.4.15 Using elementary row operations, every matrix can be reduced to a matrix in reduced row echelon form.

Proof Let $A$ be an $m \times n$ matrix. If $A$ is the zero matrix, then it is already in reduced row echelon form. Suppose that $A$ is a nonzero matrix. Let $j_1$ be the least number such that the column $C_{j_1}(A)$ is a nonzero column. Further, let $i_1$ be the smallest number such that $a_{i_1j_1} \neq 0$. Interchanging the $i_1$th row and the first row, we may assume that $a_{1j_1} \neq 0$, and $a_{ik} = 0$ for all $i$ and all $k < j_1$. Multiplying the first row by $a_{1j_1}^{-1}$, we may assume that $a_{1j_1} = 1$. Next, adding $-a_{ij_1}$ times the first row to the $i$th row for each $i \geq 2$, we reduce $A$ to a matrix $[a_{ij}]$, where $a_{1j_1} = 1$, $a_{ij_1} = 0$ for all $i \geq 2$, and $a_{ik} = 0$ for all $i$ and all $k \leq j_1 - 1$. If in this reduced matrix $a_{ij} = 0$ for all $i \geq 2$ and all $j$, then it is already in reduced row echelon form. If not, let $j_2$ be the smallest number such that $a_{ij_2} \neq 0$ for some $i \geq 2$. Further, let $i_2$ be the smallest number greater than or equal to 2 such that $a_{i_2j_2} \neq 0$. Note that $j_2 > j_1$. Interchanging the $i_2$th row and the second row, we may assume that $a_{2j_2} \neq 0$. Then, multiplying the second row by $a_{2j_2}^{-1}$, we may assume that $a_{2j_2} = 1$. In turn, adding $-a_{ij_2}$ times the second row to the $i$th row for each $i \neq 2$, we make all the other entries of the $j_2$th column 0; this does not disturb the $j_1$th column, since $a_{2j_1} = 0$. The matrix may now be in reduced row echelon form; if not, proceed as before. Since each step produces a pivot in a new column, this process reduces $A$ to reduced row echelon form after at most $n$ steps. □
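The proof of Theorem 2.4.15 is constructive. A minimal Python sketch of it, assuming exact arithmetic over $\mathbb{Q}$ via `fractions.Fraction` (the name `rref` and the 0-based column indices are our conventions, not the text's). It scans the columns from left to right and, for each column, uses the three elementary row operations as in the proof; by Corollary 2.4.18 below, the nullity can then be read off as $n$ minus the number of pivots:

```python
from fractions import Fraction


def rref(A):
    """Return (R, pivot_cols): R is the reduced row echelon form of A and
    pivot_cols lists its pivot columns (0-based)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivot_cols = []
    r = 0                                       # row that receives the next pivot
    for c in range(n):
        if r == m:
            break
        # Smallest row index >= r with a nonzero entry in column c.
        i = next((k for k in range(r, m) if R[k][c] != 0), None)
        if i is None:
            continue                            # no pivot here: a free column
        R[r], R[i] = R[i], R[r]                 # interchange two rows
        p = R[r][c]
        R[r] = [x / p for x in R[r]]            # scale so the pivot entry is 1
        for k in range(m):                      # clear the rest of column c
            if k != r and R[k][c] != 0:
                f = R[k][c]
                R[k] = [a - f * b for a, b in zip(R[k], R[r])]
        pivot_cols.append(c)
        r += 1
    return R, pivot_cols


A = [[0, 0, 2, 3, 8],
     [2, 4, 1, 0, 5],
     [1, 2, 1, 1, 5],
     [5, 10, 6, 6, 28]]                         # coefficient matrix of Example 2.4.19
R, pivots = rref(A)
rank = len(pivots)
print(rank, len(A[0]) - rank)                   # rank and nullity: 3 2
```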

Corollary 2.4.16 Row rank of a matrix is the same as the column rank of the matrix.

Proof By Propositions 2.4.6 and 2.4.7, the row rank and the column rank of a matrix are invariant under elementary row operations. By Proposition 2.4.10(iv), the row rank of a matrix in reduced row echelon form is the same as its column rank (both equal the number of pivots). Combining this with Theorem 2.4.15, the result follows. □

Definition 2.4.17 Row rank of a matrix A, or equivalently, the column rank of a matrix is called the rank of the matrix. The rank of a matrix A is denoted by r(A).

Corollary 2.4.18 Let $A$ be an $m \times n$ matrix. Then $r(A) + n(A) = n$.

Proof Since the rank and the nullity remain invariant under elementary row operations, by Theorem 2.4.15 it is sufficient to prove the result for matrices in reduced row echelon form. For a matrix $A$ in reduced row echelon form, $r(A)$ is the number of pivot columns (Proposition 2.4.10) and $n(A)$ is the number of free columns (Proposition 2.4.11). Clearly, each of the $n$ columns is either a pivot column or a free column. □

Example 2.4.19 Consider the system of linear equations

$$\begin{aligned} 2x_3 + 3x_4 + 8x_5 &= 1, \\ 2x_1 + 4x_2 + x_3 + 5x_5 &= 0, \\ x_1 + 2x_2 + x_3 + x_4 + 5x_5 &= 2, \\ 5x_1 + 10x_2 + 6x_3 + 6x_4 + 28x_5 &= a. \end{aligned}$$

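The corresponding coefficient matrix $A$ is
$$A = \begin{bmatrix} 0 & 0 & 2 & 3 & 8 \\ 2 & 4 & 1 & 0 & 5 \\ 1 & 2 & 1 & 1 & 5 \\ 5 & 10 & 6 & 6 & 28 \end{bmatrix},$$
and the augmented matrix $A^+$ is
$$A^+ = \begin{bmatrix} 0 & 0 & 2 & 3 & 8 & 1 \\ 2 & 4 & 1 & 0 & 5 & 0 \\ 1 & 2 & 1 & 1 & 5 & 2 \\ 5 & 10 & 6 & 6 & 28 & a \end{bmatrix}.$$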

We discuss the consistency of the above system of linear equations and, if it is consistent, we determine a general solution. For this purpose, we reduce the coefficient matrix $A$ and the augmented matrix $A^+$ to reduced row echelon form simultaneously, by using the algorithm described in the above theorem. The 1st column of $A$ is nonzero, and the smallest number $i$ for which $a_{i1} \neq 0$ is 2. Thus, interchanging the 1st and the 2nd rows of $A$ and of $A^+$, $A$ is transformed to
$$\begin{bmatrix} 2 & 4 & 1 & 0 & 5 \\ 0 & 0 & 2 & 3 & 8 \\ 1 & 2 & 1 & 1 & 5 \\ 5 & 10 & 6 & 6 & 28 \end{bmatrix},$$
and $A^+$ is transformed to
$$\begin{bmatrix} 2 & 4 & 1 & 0 & 5 & 0 \\ 0 & 0 & 2 & 3 & 8 & 1 \\ 1 & 2 & 1 & 1 & 5 & 2 \\ 5 & 10 & 6 & 6 & 28 & a \end{bmatrix}.$$

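Now, multiplying the 1st row by $\frac{1}{2}$, the matrices are transformed to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} \\ 0 & 0 & 2 & 3 & 8 \\ 1 & 2 & 1 & 1 & 5 \\ 5 & 10 & 6 & 6 & 28 \end{bmatrix},$$
and to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} & 0 \\ 0 & 0 & 2 & 3 & 8 & 1 \\ 1 & 2 & 1 & 1 & 5 & 2 \\ 5 & 10 & 6 & 6 & 28 & a \end{bmatrix}.$$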

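Further, adding $-1$ times the 1st row to the 3rd row, and adding $-5$ times the 1st row to the 4th row, the matrices are transformed to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} \\ 0 & 0 & 2 & 3 & 8 \\ 0 & 0 & \frac{1}{2} & 1 & \frac{5}{2} \\ 0 & 0 & \frac{7}{2} & 6 & \frac{31}{2} \end{bmatrix},$$
and to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} & 0 \\ 0 & 0 & 2 & 3 & 8 & 1 \\ 0 & 0 & \frac{1}{2} & 1 & \frac{5}{2} & 2 \\ 0 & 0 & \frac{7}{2} & 6 & \frac{31}{2} & a \end{bmatrix}.$$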

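Here, in this transformed matrix, $a_{i2} = 0$ for all $i \geq 2$. Thus, the 2nd column is a free column. We look at the 3rd column. The 2nd row 3rd column entry is $a_{23} = 2 \neq 0$. We divide the 2nd row by 2 to get the pivot entry 1 in the 2nd row 3rd column. The matrices, thus, reduce to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} \\ 0 & 0 & 1 & \frac{3}{2} & 4 \\ 0 & 0 & \frac{1}{2} & 1 & \frac{5}{2} \\ 0 & 0 & \frac{7}{2} & 6 & \frac{31}{2} \end{bmatrix},$$
and to
$$\begin{bmatrix} 1 & 2 & \frac{1}{2} & 0 & \frac{5}{2} & 0 \\ 0 & 0 & 1 & \frac{3}{2} & 4 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} & 1 & \frac{5}{2} & 2 \\ 0 & 0 & \frac{7}{2} & 6 & \frac{31}{2} & a \end{bmatrix}.$$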

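In turn, to make all other entries in this pivot column 0, we add $-\frac{1}{2}$ times the 2nd row to the 1st row, $-\frac{1}{2}$ times the 2nd row to the 3rd row, and $-\frac{7}{2}$ times the 2nd row to the 4th row. The matrices reduce to
$$\begin{bmatrix} 1 & 2 & 0 & -\frac{3}{4} & \frac{1}{2} \\ 0 & 0 & 1 & \frac{3}{2} & 4 \\ 0 & 0 & 0 & \frac{1}{4} & \frac{1}{2} \\ 0 & 0 & 0 & \frac{3}{4} & \frac{3}{2} \end{bmatrix},$$
and to
$$\begin{bmatrix} 1 & 2 & 0 & -\frac{3}{4} & \frac{1}{2} & -\frac{1}{4} \\ 0 & 0 & 1 & \frac{3}{2} & 4 & \frac{1}{2} \\ 0 & 0 & 0 & \frac{1}{4} & \frac{1}{2} & \frac{7}{4} \\ 0 & 0 & 0 & \frac{3}{4} & \frac{3}{2} & a - \frac{7}{4} \end{bmatrix}.$$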

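The 3rd row 4th column entry is $a_{34} = \frac{1}{4} \neq 0$. We multiply the 3rd row by 4 to get the pivot entry 1 in the 3rd row 4th column. Thus, the matrices further reduce to
$$\begin{bmatrix} 1 & 2 & 0 & -\frac{3}{4} & \frac{1}{2} \\ 0 & 0 & 1 & \frac{3}{2} & 4 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & \frac{3}{4} & \frac{3}{2} \end{bmatrix},$$
and to
$$\begin{bmatrix} 1 & 2 & 0 & -\frac{3}{4} & \frac{1}{2} & -\frac{1}{4} \\ 0 & 0 & 1 & \frac{3}{2} & 4 & \frac{1}{2} \\ 0 & 0 & 0 & 1 & 2 & 7 \\ 0 & 0 & 0 & \frac{3}{4} & \frac{3}{2} & a - \frac{7}{4} \end{bmatrix}.$$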

In turn, we add $\frac{3}{4}$ times the 3rd row to the 1st row, $-\frac{3}{2}$ times the 3rd row to the 2nd row, and $-\frac{3}{4}$ times the 3rd row to the 4th row to make the rest of the entries in this pivot column 0. The coefficient matrix $A$ reduces to the matrix
$$\begin{bmatrix} 1 & 2 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$
which is in reduced row echelon form, and the augmented matrix $A^+$ is transformed to
$$\begin{bmatrix} 1 & 2 & 0 & 0 & 2 & 5 \\ 0 & 0 & 1 & 0 & 1 & -10 \\ 0 & 0 & 0 & 1 & 2 & 7 \\ 0 & 0 & 0 & 0 & 0 & a - 7 \end{bmatrix}.$$

Thus, the given system of linear equations is equivalent to a system of linear equations whose coefficient matrix is
$$\begin{bmatrix} 1 & 2 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$
and whose augmented matrix is
$$\begin{bmatrix} 1 & 2 & 0 & 0 & 2 & 5 \\ 0 & 0 & 1 & 0 & 1 & -10 \\ 0 & 0 & 0 & 1 & 2 & 7 \\ 0 & 0 & 0 & 0 & 0 & a - 7 \end{bmatrix}.$$

In turn, using the discussion and the results above, we have the following:
(i) A basis of the row space of $A$ is $\{(1, 2, 0, 0, 2), (0, 0, 1, 0, 1), (0, 0, 0, 1, 2)\}$. The rank $r(A) = 3$.
(ii) Putting the free variable $x_2 = 1$ and the free variable $x_5 = 0$, we get a solution $(-2, 1, 0, 0, 0)$ of the homogeneous part of the system. Further, putting the free variable $x_2 = 0$ and the free variable $x_5 = 1$, we get another solution $(-2, 0, -1, -2, 1)$ of the homogeneous part of the system. The set $\{(-2, 1, 0, 0, 0), (-2, 0, -1, -2, 1)\}$ is a basis of the solution space $N(A)$ of the homogeneous part. A general solution of the homogeneous part of the system is

$$c_1(-2, 1, 0, 0, 0) + c_2(-2, 0, -1, -2, 1),$$
where $c_1, c_2$ are arbitrary constants.
(iii) The nonhomogeneous system is consistent if and only if $3 = r(A) = r(A^+)$, or equivalently, $a = 7$. Then, giving the values $x_2 = 1$ and $x_5 = 0$ to the free variables in the nonhomogeneous system, we get a particular solution $(3, 1, -10, 7, 0)$. Thus, a general solution of the nonhomogeneous system is
$$(3, 1, -10, 7, 0) + c_1(-2, 1, 0, 0, 0) + c_2(-2, 0, -1, -2, 1),$$
where $c_1, c_2$ are arbitrary constants.
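The final answer can be checked directly against the original equations. A short Python sketch (the helper names `apply` and `general_solution` are illustrative) confirming, for $a = 7$, that both basis vectors of $N(A)$ solve the homogeneous part and that the general solution satisfies the given system for several sample choices of $c_1, c_2$:

```python
from fractions import Fraction
from itertools import product

# Coefficient matrix and right-hand side of Example 2.4.19 with a = 7.
A = [[0, 0, 2, 3, 8],
     [2, 4, 1, 0, 5],
     [1, 2, 1, 1, 5],
     [5, 10, 6, 6, 28]]
b = [1, 0, 2, 7]

v = [3, 1, -10, 7, 0]           # particular solution
u1 = [-2, 1, 0, 0, 0]           # basis of N(A)
u2 = [-2, 0, -1, -2, 1]


def apply(A, x):
    """Return the entries of A x^t as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def general_solution(c1, c2):
    return [vi + c1 * p + c2 * q for vi, p, q in zip(v, u1, u2)]


assert apply(A, u1) == [0, 0, 0, 0]
assert apply(A, u2) == [0, 0, 0, 0]
for c1, c2 in product([0, 1, -3, Fraction(1, 2)], repeat=2):
    assert apply(A, general_solution(c1, c2)) == b
```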

Definition 2.4.20 A square matrix $E$ obtained by applying an elementary row operation to the identity matrix is called an elementary matrix.

Example 2.4.21 The matrix
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
is an elementary matrix, obtained by multiplying the 3rd row of the identity matrix $I_4$ by 3. The matrix
$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
is an elementary matrix, obtained by interchanging the 1st row and the 3rd row of the identity matrix $I_4$. Again, the matrix
$$\begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
is also an elementary matrix, obtained by adding 3 times the 3rd row of the identity matrix $I_4$ to its 1st row.

$\tau_{ij}$ denotes the elementary matrix obtained by interchanging the $i$th row and the $j$th row of the identity matrix. Thus,
$$\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \tau_{13}.$$

The elementary matrix obtained by adding $\lambda$ times the $j$th row of the identity matrix to its $i$th row ($i \neq j$) is denoted by $E_{ij}^{\lambda}$. Indeed, $E_{ij}^{\lambda}$ is the matrix all of whose diagonal entries are 1, whose $i$th row $j$th column entry is $\lambda$, and whose remaining entries are 0. Thus,
$$\begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = E_{13}^{3}.$$

The matrices $E_{ij}^{\lambda}$ are called transvections. It can easily be observed that multiplying a matrix $A$ from the left (right) by an elementary matrix $E$ has the effect of applying to $A$ the elementary row (column) operation which was used to obtain $E$ from the identity matrix. Thus, $\tau_{ij}A$ is the matrix obtained by interchanging the $i$th row and the $j$th row of $A$, and $E_{ij}^{\lambda}A$ is the matrix obtained by adding $\lambda$ times the $j$th row of $A$ to its $i$th row. In particular, it is straightforward to verify the following relations, called the Steinberg relations, among the transvections in $M_n(F)$:
(i) $E_{ij}^{\lambda} \cdot E_{ij}^{\mu} = E_{ij}^{\lambda+\mu}$, $i \neq j$. In particular, $E_{ij}^{\lambda} \cdot E_{ij}^{-\lambda} = E_{ij}^{0} = I_n$. Thus, $E_{ij}^{\lambda}$ is invertible, and its inverse is $E_{ij}^{-\lambda}$.
(ii) For $i \neq l$ and $j \neq k$, $E_{ij}^{\lambda}$ and $E_{kl}^{\mu}$ commute.
(iii) For $i \neq l$, $E_{ij}^{\lambda}E_{jl}^{\mu}E_{ij}^{-\lambda}E_{jl}^{-\mu} = E_{il}^{\lambda\mu}$.
(iv) For $j \neq k$, $E_{ij}^{\lambda}E_{ki}^{\mu}E_{ij}^{-\lambda}E_{ki}^{-\mu} = E_{kj}^{-\mu\lambda}$.

Proposition 2.4.22 Let $A$ be an $m \times n$ matrix. Then we can find a nonsingular $m \times m$ matrix $P$ such that $PA$ is a matrix in reduced row echelon form. In particular, a square matrix $A$ is nonsingular if and only if its reduced row echelon form $PA$ is the identity matrix.

Proof Applying an elementary row operation to $A$ is equivalent to multiplying $A$ from the left by an elementary matrix. Since every matrix can be reduced to a matrix in reduced row echelon form (Theorem 2.4.15), multiplying $A$ successively from the left by elementary matrices, we arrive at a matrix in reduced row echelon form. Since elementary matrices are nonsingular, and products of nonsingular matrices are nonsingular, we get a nonsingular matrix $P$ such that $PA$ is a matrix in reduced row echelon form. Since $P$ is nonsingular, $A$ is nonsingular if and only if $PA$ is nonsingular. By Proposition 2.4.14, $A$ is nonsingular if and only if $PA$ is the identity matrix. □
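The transvection facts stated before Proposition 2.4.22 lend themselves to a numerical spot check. A minimal Python sketch, with 0-based indices and the illustrative helper names `tau`, `E`, and `check_steinberg` (not from the text), verifies the effect of left multiplication on rows and relations (i) to (iv) for all admissible index choices; a check for particular $n$, $\lambda$, $\mu$ over $\mathbb{Q}$ is evidence, not a proof:

```python
from fractions import Fraction
from itertools import permutations, product


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def tau(n, i, j):
    """Interchange rows i and j of I_n (indices are 0-based here)."""
    T = identity(n)
    T[i], T[j] = T[j], T[i]
    return T


def E(n, i, j, lam):
    """Transvection E_ij^lam: I_n with lam in row i, column j, where i != j."""
    M = identity(n)
    M[i][j] = Fraction(lam)
    return M


def check_steinberg(n, lam, mu):
    """Verify relations (i)-(iv) for all admissible indices; True if all hold."""
    I = identity(n)

    def comm(X, Y, X_inv, Y_inv):
        return matmul(matmul(X, Y), matmul(X_inv, Y_inv))

    for i, j in permutations(range(n), 2):
        # (i) additivity in the parameter, hence E_ij^(-lam) is the inverse
        if matmul(E(n, i, j, lam), E(n, i, j, mu)) != E(n, i, j, lam + mu):
            return False
        if matmul(E(n, i, j, lam), E(n, i, j, -lam)) != I:
            return False
    for i, j, k, l in product(range(n), repeat=4):
        if i == j or k == l or i == l or j == k:
            continue
        # (ii) E_ij^lam and E_kl^mu commute when i != l and j != k
        X, Y = E(n, i, j, lam), E(n, k, l, mu)
        if matmul(X, Y) != matmul(Y, X):
            return False
    for i, j, l in permutations(range(n), 3):
        # (iii) [E_ij^lam, E_jl^mu] = E_il^(lam*mu)
        if comm(E(n, i, j, lam), E(n, j, l, mu), E(n, i, j, -lam),
                E(n, j, l, -mu)) != E(n, i, l, lam * mu):
            return False
        # (iv) with k = l: [E_ij^lam, E_ki^mu] = E_kj^(-mu*lam)
        k = l
        if comm(E(n, i, j, lam), E(n, k, i, mu), E(n, i, j, -lam),
                E(n, k, i, -mu)) != E(n, k, j, -mu * lam):
            return False
    return True


print(check_steinberg(4, 3, Fraction(-2, 5)))
```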

The above discussion and the results give an algorithm to determine a nonsingular matrix P such that PA is a reduced row echelon matrix. In particular, it gives an algorithm to check if a square matrix A is invertible, and if so, to find the inverse of A. We further illustrate the algorithm by means of examples.
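A minimal sketch of this procedure in Python, assuming exact arithmetic over $\mathbb{Q}$ (the names `rref_with_transform` and `inverse` are illustrative). It row-reduces the block matrix $[A \mid I_m]$, so that every row operation applied to $A$ is applied to $I_m$ as well, as with the pairs in the examples that follow. Pivots are chosen by the smallest-row-index rule of Theorem 2.4.15; when $A$ is not invertible, the $P$ returned is one valid choice among many:

```python
from fractions import Fraction


def rref_with_transform(A):
    """Return (P, R) with R the reduced row echelon form of A and P a
    nonsingular matrix (a product of elementary matrices) such that P A = R."""
    m, n = len(A), len(A[0])
    # Row-reduce the block matrix [A | I_m]; every operation acts on both parts.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == k)) for k in range(m)]
         for i, row in enumerate(A)]
    r = 0
    for c in range(n):                    # pivots are sought only in A's columns
        if r == m:
            break
        i = next((k for k in range(r, m) if M[k][c] != 0), None)
        if i is None:
            continue
        M[r], M[i] = M[i], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for k in range(m):
            if k != r and M[k][c] != 0:
                f = M[k][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[r])]
        r += 1
    P = [row[n:] for row in M]
    R = [row[:n] for row in M]
    return P, R


def inverse(A):
    """By Proposition 2.4.22, a square A is invertible iff its reduced row
    echelon form is I; in that case P = A^{-1}. Returns None otherwise."""
    P, R = rref_with_transform(A)
    n = len(A)
    if R != [[int(i == j) for j in range(n)] for i in range(n)]:
        return None
    return P


print(inverse([[1, 1, 1], [1, 2, 3], [1, 4, 9]]))
print(inverse([[1, 2], [2, 4]]))                  # None: not invertible
```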

Example 2.4.23 Consider the matrix
$$A = \begin{bmatrix} 0 & 0 & 3 & 1 & 2 \\ 0 & 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & -1 & 1 \\ 0 & 1 & 1 & -\frac{1}{3} & 0 \end{bmatrix}.$$

Using elementary row operations, we transform the matrix $A$ into a matrix in reduced row echelon form, and simultaneously find a nonsingular matrix $P$ such that $PA$ is a matrix in reduced row echelon form. We start with the pair
$$\left( I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \; A = \begin{bmatrix} 0 & 0 & 3 & 1 & 2 \\ 0 & 1 & 2 & 0 & 0 \\ 0 & 2 & 1 & -1 & 1 \\ 0 & 1 & 1 & -\frac{1}{3} & 0 \end{bmatrix} \right).$$

There is no nonzero entry in the first column of $A$, and so no pivot will appear in the first column. We leave it and move to the second column. The first nonzero entry in the second column of $A$ is 1, and it is in the second row. We interchange the first row $R_1$ and the second row $R_2$ in both matrices of the pair. The pair, thus, gets transformed to the pair $(E_1, A_1)$ given by
$$E_1 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad A_1 = \begin{bmatrix} 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 3 & 1 & 2 \\ 0 & 2 & 1 & -1 & 1 \\ 0 & 1 & 1 & -\frac{1}{3} & 0 \end{bmatrix}$$

(note that $E_1A = A_1$). The entry 1 in the first row and second column of $A_1$ is the pivot entry. To make the rest of the entries in this pivot column 0, we replace $R_3$ by $R_3 - 2R_1$, and then $R_4$ by $R_4 - R_1$. In turn, the pair $(E_1, A_1)$ gets transformed to the pair $(E_2, A_2)$ given by
$$E_2 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 3 & 1 & 2 \\ 0 & 0 & -3 & -1 & 1 \\ 0 & 0 & -1 & -\frac{1}{3} & 0 \end{bmatrix}$$

(Again, note that $E_2A = A_2$.) The second row third column entry of $A_2$ is 3, which is nonzero. We replace $R_2$ by $\frac{1}{3}R_2$ to make it a pivot entry 1. In turn, we replace $R_1$ by $R_1 - 2R_2$, $R_3$ by $R_3 + 3R_2$, and $R_4$ by $R_4 + R_2$ to make all the rest of the entries in this pivot column 0. Thus, the pair $(E_2, A_2)$ is transformed to the pair $(E_3, A_3)$ given by
$$E_3 = \begin{bmatrix} -\frac{2}{3} & 1 & 0 & 0 \\ \frac{1}{3} & 0 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ \frac{1}{3} & -1 & 0 & 1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 1 & 0 & -\frac{2}{3} & -\frac{4}{3} \\ 0 & 0 & 1 & \frac{1}{3} & \frac{2}{3} \\ 0 & 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 & \frac{2}{3} \end{bmatrix}$$

(Again, note that $E_3A = A_3$.) Since the 3rd row 4th column and 4th row 4th column entries are 0, there is no pivot in the 4th column; it is a free column. We go to the 5th column. The 3rd row 5th column entry is 3, which is nonzero. We replace $R_3$ by $\frac{1}{3}R_3$ to make the 3rd row 5th column entry a pivot entry 1. We then replace $R_1$ by $R_1 + \frac{4}{3}R_3$, $R_2$ by $R_2 - \frac{2}{3}R_3$, and $R_4$ by $R_4 - \frac{2}{3}R_3$. Thus, the pair $(E_3, A_3)$ is transformed to the pair $(E_4, A_4)$ given by
$$E_4 = \begin{bmatrix} -\frac{2}{9} & \frac{1}{9} & \frac{4}{9} & 0 \\ \frac{1}{9} & \frac{4}{9} & -\frac{2}{9} & 0 \\ \frac{1}{3} & -\frac{2}{3} & \frac{1}{3} & 0 \\ \frac{1}{9} & -\frac{5}{9} & -\frac{2}{9} & 1 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 0 & 1 & 0 & -\frac{2}{3} & 0 \\ 0 & 0 & 1 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$
where $A_4$ is in reduced row echelon form, and $P = E_4$ is an invertible matrix such that $PA = A_4$ is in reduced row echelon form.
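The pair technique of Examples 2.4.23 to 2.4.25 can be sketched in Python. This is an illustrative sketch, not the author's code. The names `rref_with_transform` and `matmul` are hypothetical, matrices are assumed to be lists of rows, and exact arithmetic with `fractions.Fraction` is assumed, so the field is the rationals. The pivot rule is the one used in the examples: take the first nonzero entry at or below the current row. With this rule, on Example 2.4.23 the sketch reproduces $E_4$ and $A_4$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def rref_with_transform(A):
    """Return (P, R) with R the reduced row echelon form of A and P A = R.

    Every elementary row operation is applied both to A and to an identity
    matrix, as with the pairs (E_k, A_k) in the examples; the identity matrix
    thereby accumulates the nonsingular matrix P.
    """
    m, n = len(A), len(A[0])
    R = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    r = 0  # row in which the next pivot will be placed
    for c in range(n):
        # first row at or below r with a nonzero entry in column c
        k = next((i for i in range(r, m) if R[i][c] != 0), None)
        if k is None:
            continue  # free column: no pivot here
        R[r], R[k] = R[k], R[r]  # interchange rows r and k
        P[r], P[k] = P[k], P[r]
        s = 1 / R[r][c]  # scale row r so that the pivot entry becomes 1
        R[r] = [s * x for x in R[r]]
        P[r] = [s * x for x in P[r]]
        for i in range(m):  # make the other entries of the pivot column 0
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
                P[i] = [x - f * y for x, y in zip(P[i], P[r])]
        r += 1
        if r == m:
            break
    return P, R
```

For a square matrix A, the returned R is the identity exactly when A is invertible, and then P is $A^{-1}$. For the matrix of Example 2.4.25, this gives the P displayed there.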

Example 2.4.24 Consider the matrix A given by
$$A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

We apply the following elementary row operations in succession, on A and also on $I_4$: (i) interchange $R_1$ and $R_2$; (ii) replace $R_4$ by $R_4 - R_1$, $R_3$ by $R_3 - 2R_2$, and $R_4$ by $R_4 - R_2$; (iii) replace $R_3$ by $-\frac{1}{3}R_3$, $R_1$ by $R_1 - 2R_3$, $R_2$ by $R_2 - 2R_3$, and $R_4$ by $R_4 + 3R_3$. Then A reduces to the reduced row echelon form
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
and $I_4$ reduces to
$$P = \begin{bmatrix} -\frac{4}{3} & 1 & \frac{2}{3} & 0 \\ -\frac{1}{3} & 0 & \frac{2}{3} & 0 \\ \frac{2}{3} & 0 & -\frac{1}{3} & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix}.$$

Thus, PA is the reduced row echelon form given above.

Example 2.4.25 Consider the 3 × 3 matrix A given by
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{bmatrix}.$$

If we use the method of the above example, then A reduces to the identity matrix $I_3$, and $I_3$ reduces to
$$P = \begin{bmatrix} 3 & -\frac{5}{2} & \frac{1}{2} \\ -3 & 4 & -1 \\ 1 & -\frac{3}{2} & \frac{1}{2} \end{bmatrix}.$$

Thus, A is invertible, and PA = I3. Hence P is the inverse of A.

2.5 LU Factorization

If the coefficient matrix of a system of linear equations is an upper triangular square matrix U with nonzero diagonal entries, then the solution is easily obtained by inspection (back substitution). For example, suppose a system of linear equations is given by the matrix equation $Ux^t = b^t$, where
$$U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & 9 \end{bmatrix}$$
and $b = (b_1, b_2, b_3)$. Then, evidently, the solution is
$$x = \left( \frac{18b_1 - 9b_2 + b_3}{18}, \; \frac{3b_2 - b_3}{6}, \; \frac{b_3}{9} \right).$$
Similarly, it is also easy to solve a system of linear equations whose coefficient matrix is a lower triangular square matrix with nonzero diagonal entries. Further, suppose the coefficient matrix A is invertible, and it is expressed as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix. Then we first find the solution v of $Ly^t = b^t$, and then the solution u of $Ux^t = v^t$. Clearly, u is the solution of $Ax^t = b^t$, since $Au^t = L(Uu^t) = Lv^t = b^t$.

The above discussion prompts us to look at the problem of factorizing an invertible matrix A as a product LU of a lower triangular matrix L and an upper triangular matrix U. This, in general, is not possible.

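Example 2.5.1 Suppose that
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} a & 0 \\ b & c \end{bmatrix} \cdot \begin{bmatrix} u & v \\ 0 & w \end{bmatrix}.$$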

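Then au = 0, av = 1, and bu = 1. This, however, is impossible: av = 1 forces a ≠ 0, so au = 0 forces u = 0, contradicting bu = 1. This shows that the invertible matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ cannot be expressed as a product of a lower triangular and an upper triangular matrix. Observe that the matrix
$$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 3 \\ 0 & 2 & 9 \end{bmatrix}$$
is also not expressible as a product of a lower triangular and an upper triangular matrix.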

The above matrices cannot be expressed as products of lower and upper triangular matrices for the following reason. While reducing them to upper triangular (row echelon) form, we are forced either to interchange rows, or to add a nonzero multiple of the kth row to the lth row for some k > l. Equivalently, we need to multiply from the left by a corresponding elementary matrix $\tau_{ij}$, or by a corresponding matrix $E_{kl}^{\lambda}$. These matrices are not lower triangular. Indeed, suppose that, while reducing A, no elementary row operations of the above types are needed. Then we can find a lower triangular matrix P with diagonal entries 1 such that PA is upper triangular. In turn, A = LU, where $L = P^{-1}$. We illustrate this by means of an example.

Example 2.5.2 Consider the matrix A given by
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{bmatrix},$$
and the system of linear equations given by the matrix equation

$$Ax^t = [1, 2, 3]^t.$$

Adding −1 times the 1st row of A to the 2nd row, and then adding −1 times the 1st row to the 3rd row, or equivalently, multiplying A from the left by the matrix $E_{13}^{-1}E_{12}^{-1}$, we obtain that
$$E_{13}^{-1}E_{12}^{-1}A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 3 & 8 \end{bmatrix}.$$

Again, adding −3 times the 2nd row of the above matrix to its 3rd row, or equivalently, multiplying $E_{13}^{-1}E_{12}^{-1}A$ from the left by $E_{23}^{-3}$, we obtain that $E_{23}^{-3}E_{13}^{-1}E_{12}^{-1}A$ is the upper triangular matrix U given by
$$U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{bmatrix}.$$

Thus, A = LU, where $L = E_{12}^{1}E_{13}^{1}E_{23}^{3}$ is the lower triangular matrix given by
$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 3 & 1 \end{bmatrix}.$$

Now, to find the solution of $Ax^t = [1, 2, 3]^t$, we first find the solution of $Ly^t = [1, 2, 3]^t$. Equating the corresponding entries of both sides, $y_1 = 1$, $y_1 + y_2 = 2$, and $y_1 + 3y_2 + y_3 = 3$. This gives the solution $[1, 1, -1]^t$ of $Ly^t = [1, 2, 3]^t$. Finally, we find the solution of $Ux^t = [1, 1, -1]^t$ to get the solution of the original equation $Ax^t = [1, 2, 3]^t$. Equating the entries of both sides in the equation $Ux^t = [1, 1, -1]^t$, we get $2x_3 = -1$, $x_2 + 2x_3 = 1$, and $x_1 + x_2 + x_3 = 1$. Evidently, $x_3 = -\frac{1}{2}$, $x_2 = 2$, and $x_1 = -\frac{1}{2}$.
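The procedure of this section can be sketched in Python. It eliminates below each pivot without interchanges and records the multipliers in L, then solves $Ly^t = b^t$ and $Ux^t = y^t$. This is a minimal sketch under stated assumptions. The helper names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is assumed. The routine simply reports failure (returns `None`) when a zero pivot has a nonzero entry below it, that is, when it would need a row interchange, as for the matrix of Example 2.5.1.

```python
from fractions import Fraction


def lu_without_interchanges(A):
    """Return (L, U) with A = L U, L unit lower triangular, U upper triangular.

    Only operations 'subtract f times row k from a lower row i > k' are used;
    the multipliers f are exactly the below-diagonal entries of L.
    Returns None if a row interchange would be required.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            if any(U[i][k] != 0 for i in range(k + 1, n)):
                return None
            continue  # column k is already zero below the diagonal
        for i in range(k + 1, n):
            f = U[i][k] / U[k][k]  # R_i -> R_i - f R_k
            U[i] = [x - f * y for x, y in zip(U[i], U[k])]
            L[i][k] = f  # the inverse operation adds f R_k back
    return L, U


def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with nonzero diagonal."""
    y = []
    for i, row in enumerate(L):
        y.append((Fraction(b[i]) - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y


def back_substitution(U, y):
    """Solve U x = y for upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x
```

Applied to Example 2.5.2, `lu_without_interchanges` returns the L and U above. `forward_substitution(L, [1, 2, 3])` gives $[1, 1, -1]$, and `back_substitution` then gives $[-\frac{1}{2}, 2, -\frac{1}{2}]$.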

2.6 Equivalence of Matrices, Normal Form

Definition 2.6.1 Two m × n matrices A and B with entries in a field F are said to be equivalent if there exists a nonsingular m × m matrix P, and a nonsingular n × n matrix Q such that A = PBQ.

Clearly, the relation of being equivalent is an equivalence relation on $M_{mn}(F)$. We determine a unique representative of each equivalence class of equivalent matrices.

Definition 2.6.2 An m × n matrix A is said to be in normal form if there is an r with 0 ≤ r ≤ min(m, n) such that
$$A = \begin{bmatrix} I_r & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix},$$
where $O_{mn}$ denotes the zero m × n matrix.

Theorem 2.6.3 Every m × n matrix is equivalent to a unique matrix in normal form.

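Proof Applying an elementary row operation to a matrix A is equivalent to multiplying A from the left by an elementary matrix. Applying an elementary column operation is equivalent to multiplying A from the right by an elementary matrix. Since all elementary matrices are nonsingular, and products of nonsingular matrices are nonsingular, it is sufficient to show that every matrix can be reduced to a matrix in normal form with the help of elementary row and elementary column operations. The proof of this fact is by induction on max(m, n), where m is the number of rows and n the number of columns. If max(m, n) = 1, then m = 1 = n, and $A = [a_{11}]$ is a 1 × 1 matrix. If A = [0], then it is already in normal form. If $a_{11} \neq 0$, then multiplying the row by $a_{11}^{-1}$, we reduce it to the normal form [1]. Assume that the result is true for all r × s matrices with max(r, s) < max(m, n), and let A be an m × n matrix. If A = 0, it is already in normal form. Otherwise, interchanging rows and columns brings a nonzero entry to the first row and first column. Multiplying the first row by the inverse of this entry makes it 1. Subtracting suitable multiples of the first row from the other rows, and of the first column from the other columns, then clears the rest of the first column and first row. Thus, there are nonsingular matrices C″ and D″ (products of elementary matrices) such that
$$C''AD'' = \begin{bmatrix} I_1 & O_{1,n-1} \\ O_{m-1,1} & B \end{bmatrix},$$
where B is an (m − 1) × (n − 1) matrix.

By the induction hypothesis, there is an (m − 1) × (m − 1) nonsingular matrix C′ and an (n − 1) × (n − 1) nonsingular matrix D′ such that
$$C'BD' = \begin{bmatrix} I_{r-1} & O_{r-1,n-r} \\ O_{m-r,r-1} & O_{m-r,n-r} \end{bmatrix}.$$

Take
$$C = \begin{bmatrix} I_1 & O_{1,m-1} \\ O_{m-1,1} & C' \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} I_1 & O_{1,n-1} \\ O_{n-1,1} & D' \end{bmatrix}.$$

Then C and D are nonsingular. In fact,
$$C^{-1} = \begin{bmatrix} I_1 & O_{1,m-1} \\ O_{m-1,1} & (C')^{-1} \end{bmatrix}$$

(use block multiplication to show this), and similarly for D. Again, using block multiplication, we find that
$$C \cdot \begin{bmatrix} I_1 & O_{1,n-1} \\ O_{m-1,1} & B \end{bmatrix} \cdot D = \begin{bmatrix} I_1 & O_{1,n-1} \\ O_{m-1,1} & C'BD' \end{bmatrix} = \begin{bmatrix} I_r & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}.$$

Take P = C · C″ and Q = D″ · D. Then P is a nonsingular m × m matrix and Q a nonsingular n × n matrix such that
$$PAQ = \begin{bmatrix} I_r & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}$$
is in normal form. Finally, the normal form with $I_r$ is equivalent to the normal form with $I_s$ if and only if r = s. This is because elementary operations do not change the rank, and the rank of the normal form with $I_r$ is r. □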

Corollary 2.6.4 There are min(m, n) + 1 equivalence classes of equivalent matrices in $M_{mn}(F)$.

Proof There are min(m, n) + 1 matrices in $M_{mn}(F)$ which are in normal form, one for each r = 0, 1, ..., min(m, n). □

Corollary 2.6.5 Two matrices A and B in $M_{mn}(F)$ are equivalent if and only if they have the same rank.

Proof The rank of a matrix does not change under elementary operations, and the rank of the matrix
$$\begin{bmatrix} I_r & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}$$
is r. The result follows. □

Corollary 2.6.6 All nonsingular matrices in $M_n(F)$ are equivalent to $I_n$. The group GL(n, F) is a single complete equivalence class of equivalent matrices.

Proof Let A be an n × n matrix which is nonsingular. Then there are nonsingular matrices P and Q such that PAQ is in normal form. Clearly, PAQ is then also nonsingular. The result follows if we observe that a matrix in normal form is nonsingular if and only if it is the identity matrix. □

Corollary 2.6.7 The group GL(n, F) is generated by elementary matrices. Indeed, every element of GL(n, F) is a product of elementary matrices.

Proof All elementary matrices are nonsingular, and so they belong to GL(n, F). Further, given any matrix A ∈ GL(n, F), there are nonsingular matrices P and Q which are products of elementary matrices such that $PAQ = I_n$. But then $A = P^{-1}Q^{-1}$. Since the inverse of an elementary matrix is an elementary matrix, $P^{-1}$ and $Q^{-1}$ are products of elementary matrices. This shows that A is a product of elementary matrices. □

Remark 2.6.8 The matrices $E_{ij}^{\lambda}$, $i \neq j$, $\lambda \in F$, do not generate GL(n, F) (verify).

Remark 2.6.9 The proof of Theorem 2.6.3 gives us a method by which (i) we can reduce a matrix A to normal form, (ii) we can find nonsingular matrices P and Q such that PAQ is in normal form, and (iii) we can determine whether A is nonsingular and, if so, find its inverse: if $PAQ = I_n$, then $A^{-1} = QP$.

The following example illustrates the algorithm.

Example 2.6.10 Let A be an m × n matrix. To find nonsingular matrices P and Q such that PAQ is in normal form, we proceed as follows. We start with a row with three columns: the first column $I_m$, the second A, and the third $I_n$. Then we try to reduce the matrix A to normal form by successive elementary row and elementary column operations. Whenever we perform a row operation on A, we apply the same operation to the matrix in the first column and keep the matrix in the third column as it is. Whenever we perform a column operation on A, we perform the same operation on the matrix in the third column and keep the matrix in the first column as it is. Then, as the matrix A reduces to a matrix in normal form, the matrix in the first column reduces to the required matrix P, and the matrix in the third column reduces to the required matrix Q. Consider, for example, the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}.$$

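Let $R_i$ denote the ith row, and $C_j$ the jth column. We start with the row
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \; \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right).$$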

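Replacing $R_2$ by $R_2 - 2R_1$, and $R_3$ by $R_3 - R_1$, we transform the above row to the row
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \; \begin{bmatrix} 1 & 1 & 1 \\ 0 & -2 & -1 \\ 0 & 0 & -1 \\ 0 & 1 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right).$$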

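Next, replacing $C_2$ by $C_2 - C_1$, and $C_3$ by $C_3 - C_1$, we get the transformed row
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & -1 \\ 0 & 0 & -1 \\ 0 & 1 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right).$$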

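Interchanging $R_2$ and $R_4$, and then replacing $R_4$ by $R_4 + 2R_2$, it reduces to
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ -2 & 1 & 0 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 3 \end{bmatrix}, \; \begin{bmatrix} 1 & -1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right).$$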

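Replacing $C_3$ by $C_3 - 2C_2$, we transform it to
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ -2 & 1 & 0 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 3 \end{bmatrix}, \; \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \right).$$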

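Finally, replacing $R_3$ by $-R_3$, and then $R_4$ by $R_4 - 3R_3$, we transform it to
$$\left( \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ -5 & 1 & 3 & 2 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \; \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \right).$$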

Thus, A reduces to the normal form $\begin{bmatrix} I_3 \\ O_{1,3} \end{bmatrix}$.

Further, the required nonsingular matrices P and Q are given by
$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ -5 & 1 & 3 & 2 \end{bmatrix}$$
and
$$Q = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}.$$
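The row-of-three-matrices algorithm of Example 2.6.10 can be sketched in Python. This is a hypothetical sketch, not the author's code. The function name `normal_form` is for illustration, and exact arithmetic with `fractions.Fraction` is assumed. The sketch chooses pivots in its own order (the first nonzero entry of the remaining lower-right block), so the P and Q it returns need not coincide with those above. What is guaranteed is that PAQ is the normal form.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normal_form(A):
    """Return (P, N, Q) with P, Q nonsingular and N = P A Q in normal form.

    Row operations are mirrored on P (started at I_m) and column operations
    on Q (started at I_n), so the invariant N = P A Q holds throughout.
    """
    m, n = len(A), len(A[0])
    N = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    Q = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for t in range(min(m, n)):
        # any nonzero entry in the untouched lower-right block
        pos = next(((i, j) for i in range(t, m) for j in range(t, n)
                    if N[i][j] != 0), None)
        if pos is None:
            break  # the remaining block is zero: N is in normal form
        i, j = pos
        N[t], N[i] = N[i], N[t]  # row interchange, mirrored on P
        P[t], P[i] = P[i], P[t]
        for M in (N, Q):  # column interchange, mirrored on Q
            for row in M:
                row[t], row[j] = row[j], row[t]
        s = 1 / N[t][t]  # scale row t so that the pivot is 1
        N[t] = [s * x for x in N[t]]
        P[t] = [s * x for x in P[t]]
        for i in range(m):  # clear column t with row operations
            if i != t and N[i][t] != 0:
                f = N[i][t]
                N[i] = [x - f * y for x, y in zip(N[i], N[t])]
                P[i] = [x - f * y for x, y in zip(P[i], P[t])]
        for j in range(n):  # clear row t with column operations
            if j != t and N[t][j] != 0:
                f = N[t][j]
                for M in (N, Q):
                    for row in M:
                        row[j] -= f * row[t]
    return P, N, Q
```

For the 4 × 3 matrix of Example 2.6.10, the returned N is $\begin{bmatrix} I_3 \\ O_{1,3} \end{bmatrix}$. One can also check with `matmul` that the P and Q found above give the same product PAQ.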

2.7 Congruent Reduction of Symmetric Matrices

Definition 2.7.1 A square matrix A is said to be congruent to a matrix B if there is an invertible matrix P such that $PAP^t = B$. Observe that if A is symmetric, then $PAP^t$ is also symmetric.

Theorem 2.7.2 Every symmetric matrix A with entries in a field F of characteristic different from 2 is congruent to a diagonal matrix.

Proof The proof is algorithmic. Recall that applying an elementary row operation to A is equivalent to multiplying A from the left by the corresponding elementary matrix E. Applying the same type of elementary column operation to A is equivalent to multiplying A from the right by $E^t$. (Note that applying an elementary row operation to the identity matrix and taking the transpose gives the same matrix as applying the corresponding elementary column operation to the identity matrix.) Thus, it is sufficient to show the following: a symmetric matrix with entries in a field F of characteristic different from 2 can be reduced to a diagonal matrix by successively applying an elementary row operation followed by the same type of elementary column operation.

Let A be a symmetric matrix with entries in F, where the characteristic of F is different from 2. If A = 0, then there is nothing to do. Suppose that A ≠ 0. We may suppose that $a_{11} \neq 0$. Indeed, if some diagonal entry $a_{ii} \neq 0$, interchanging $R_1$ and $R_i$, and then $C_1$ and $C_i$, brings it to the first row first column position. If all diagonal entries are 0, choose $a_{ij} = a_{ji} \neq 0$ with $i \neq j$. Adding the jth row to the ith row, and then the jth column to the ith column, makes the ith row ith column entry $a_{ii} + 2a_{ij} + a_{jj} = 2a_{ij} \neq 0$ (note that the characteristic of F is not 2), and we are back in the previous case. Then, for each $i \neq 1$, we add $-a_{i1}a_{11}^{-1}$ times the first row to the ith row, and $-a_{i1}a_{11}^{-1}$ times the first column to the ith column. This reduces the matrix to a symmetric matrix in which all entries in the first row (and so also in the first column), except $a_{11}$, are 0. Now, if $a_{ij} = 0$ for all $i, j \geq 2$, we have reduced it to a diagonal matrix. If not, we apply the previous argument to the rows and columns with indices at least 2, which does not disturb the first row and column. Thus we may take $a_{22} \neq 0$, and then, for $i > 2$, reduce the entries $a_{i2} = a_{2i}$ to 0.
Proceeding inductively, we reduce the matrix A to a diagonal matrix. □

Taking $Q = P^{-1}$, we get the following corollary.

Corollary 2.7.3 Every symmetric matrix A with entries in a field of characteristic different from 2 can be decomposed as $A = QDQ^t$, where Q is an invertible matrix and D is a diagonal matrix. □

Remark 2.7.4 The theorem does not hold over a field of characteristic 2. Consider the matrix
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

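Suppose that
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a & c \\ b & d \end{bmatrix} = \begin{bmatrix} p & 0 \\ 0 & q \end{bmatrix}.$$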

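Equating the corresponding entries, p = ba + ab, q = dc + cd, and da + cb = 0 = bc + ad. Since the field is of characteristic 2, p = 0 = q. In turn,
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a & c \\ b & d \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$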

But then $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is singular, since in characteristic 2 we have $ad - bc = ad + bc = 0$. We illustrate the algorithm of congruent reduction by means of an example.

Example 2.7.5 Let A be a symmetric n × n matrix. To find a nonsingular matrix P such that $PAP^t$ is a diagonal matrix, we proceed as follows. We start with a row with 3 columns: the first column $I_n$, the second column A, and the third column $I_n$. We reduce the matrix A to a diagonal form by successive elementary row operations and corresponding elementary column operations, as described in the above theorem. Whenever we apply an elementary row operation to A, we apply the same operation to the matrix in the first column and keep the matrix in the third column as it is. Whenever we apply an elementary column operation, we apply the same operation to the matrix in the third column and keep the first column as it is. In this process, as soon as A reduces to a diagonal matrix, the first column reduces to P, and the third column to $P^t$. Further, $PAP^t$ is a diagonal matrix. Consider, for example, the matrix
$$A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}$$
and the triple $(I_3, A, I_3)$.

If we apply the following elementary operations
1. $R_1 \to R_1 + R_2$,
2. $C_1 \to C_1 + C_2$,
3. $R_2 \to R_2 - \frac{1}{2}R_1$,
4. $C_2 \to C_2 - \frac{1}{2}C_1$,
5. $R_3 \to R_3 - \frac{3}{2}R_1$,
6. $C_3 \to C_3 - \frac{3}{2}C_1$,
7. $R_3 \to R_3 - R_2$,
8. $C_3 \to C_3 - C_2$,

successively, to the triple $(I_3, A, I_3)$, then the triple of matrices reduces to the triple
$$\left( \begin{bmatrix} 1 & 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} & 0 \\ -1 & -2 & 1 \end{bmatrix}, \; \begin{bmatrix} 2 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & -4 \end{bmatrix}, \; \begin{bmatrix} 1 & -\frac{1}{2} & -1 \\ 1 & \frac{1}{2} & -2 \\ 0 & 0 & 1 \end{bmatrix} \right).$$

Thus, A is congruent to $\mathrm{diag}(2, -\frac{1}{2}, -4)$, and P is the matrix
$$P = \begin{bmatrix} 1 & 1 & 0 \\ -\frac{1}{2} & \frac{1}{2} & 0 \\ -1 & -2 & 1 \end{bmatrix}.$$
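The congruent reduction in the proof of Theorem 2.7.2 can be sketched in Python. This is an illustrative sketch, not the author's code. The function name `congruent_diagonalize` is hypothetical. Exact rational arithmetic via `fractions.Fraction` is assumed, so the field is the rationals, of characteristic 0. The pivot rule is one reasonable choice: prefer a later nonzero diagonal entry; otherwise add a row and column carrying a nonzero off-diagonal entry. On the matrix of this example it makes the same choices as the text and returns the same P and diagonal matrix.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def congruent_diagonalize(A):
    """Return (P, D) with P nonsingular, D diagonal and P A P^t = D.

    A is a symmetric matrix with rational entries. Each row operation is
    followed by the same column operation, keeping the working matrix
    symmetric; only the row operations are accumulated in P.
    """
    n = len(A)
    a = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_row(dst, src, c):  # R_dst -> R_dst + c R_src, on a and on P
        for M in (a, P):
            M[dst] = [x + c * y for x, y in zip(M[dst], M[src])]

    def add_col(dst, src, c):  # C_dst -> C_dst + c C_src, on a only
        for row in a:
            row[dst] += c * row[src]

    def swap(i, j):  # R_i <-> R_j (on a and P), then C_i <-> C_j (on a)
        for M in (a, P):
            M[i], M[j] = M[j], M[i]
        for row in a:
            row[i], row[j] = row[j], row[i]

    for k in range(n):
        if a[k][k] == 0:
            j = next((j for j in range(k + 1, n) if a[j][j] != 0), None)
            if j is not None:
                swap(k, j)  # bring a nonzero diagonal entry to (k, k)
            else:
                j = next((j for j in range(k + 1, n) if a[k][j] != 0), None)
                if j is None:
                    continue  # row k and column k are already zero
                add_row(k, j, 1)  # new (k, k) entry is 2 a_kj, nonzero
                add_col(k, j, 1)
        for i in range(k + 1, n):
            c = -a[i][k] / a[k][k]
            add_row(i, k, c)
            add_col(i, k, c)
    return P, a
```

The check $PAP^t = D$ is then `matmul(matmul(P, A), transpose(P)) == D`. Run on the second matrix below and on Example 2.7.6, the same routine gives lower triangular P, since no interchange or off-diagonal fix-up is needed there.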

Further, take $L = P^{-1}$ and $D = \mathrm{diag}(2, -\frac{1}{2}, -4)$; then $A = LDL^t$. Note that L is not a lower triangular matrix. However, consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}$$
with the triple $(I_3, A, I_3)$ of matrices, and apply the following elementary operations, as described above, to reduce A to a diagonal matrix:
1. $R_2 \to R_2 - R_1$ and $R_3 \to R_3 - 2R_1$,
2. $C_2 \to C_2 - C_1$ and $C_3 \to C_3 - 2C_1$,
3. $R_3 \to R_3 - R_2$,
4. $C_3 \to C_3 - C_2$.

Then the triple of matrices reduces to the triple
$$\left( \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -3 \end{bmatrix}, \; \begin{bmatrix} 1 & -1 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \right).$$

Thus, A is congruent to diag(1, −1, −3), and P is the matrix
$$P = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix}.$$

Further, take $L = P^{-1}$ and $D = \mathrm{diag}(1, -1, -3)$; then $A = LDL^t$. Note that in this case P and L are lower triangular matrices.

Example 2.7.6 Consider the symmetric matrix
$$A = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 3 \end{bmatrix}$$
with the triple $(I_3, A, I_3)$ of matrices, and apply the following elementary operations to reduce A to a diagonal matrix:
1. $R_3 \to R_3 + \frac{1}{3}R_1$ and
2. $C_3 \to C_3 + \frac{1}{3}C_1$.

Then the triple of matrices reduces to the triple
$$\left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \frac{1}{3} & 0 & 1 \end{bmatrix}, \; \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{8}{3} \end{bmatrix}, \; \begin{bmatrix} 1 & 0 & \frac{1}{3} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right).$$

Here again, P is a lower triangular matrix, and the diagonal matrix D has all diagonal entries positive. As such, if we take $L = P^{-1}\sqrt{D}$, where $\sqrt{D} = \mathrm{diag}(\sqrt{3}, 1, \sqrt{8/3})$, then $A = LL^t$. Later we shall describe those symmetric matrices which can be expressed as $LL^t$, where L is a lower triangular matrix.

Exercises

2.7.1 Give two bases of the vector space $M_{nm}(F)$ of n × m matrices with entries in a field F over the field F.

2.7.2 Find a basis, and so also the dimension, of the vector space $S_n(F)$ of n × n symmetric matrices with entries in a field F.

2.7.3 Let F be a field of characteristic different from 2. Find a basis, and so also the dimension, of the vector space $SS_n(F)$ of n × n skew symmetric matrices with entries in F. Do the same for fields of characteristic 2. Are they the same?

2.7.4 Let A be an n × m matrix. Consider the subset $W = \{B \in M_{mp} \mid AB = 0_{np}\}$ of $M_{mp}$. Show that W is a subspace of $M_{mp}$. Further, show that the dimension of W is $p \cdot n(A)$, where n(A) denotes the nullity of A.

2.7.5 Show that every square matrix A with entries in a field F of characteristic different from 2 is uniquely expressible as the sum of a symmetric matrix and a skew symmetric matrix. Deduce that the vector space $M_n(F)$ is the direct sum $S_n(F) \oplus SS_n(F)$. Hint. $A = \frac{A + A^t}{2} + \frac{A - A^t}{2}$.

2.7.6 Find a basis, and so also the dimension of the vector space UTn(F) of upper triangular matrices over F.

2.7.7 The sum of the diagonal entries of a square matrix A is called the trace of A, and it is denoted by Tr(A). Let sl(n, F) denote the set of n × n matrices with trace 0. Show that sl(n, F) is a vector space with respect to the addition of matrices and multiplication by scalars. Find a basis of sl(n, F), and so also its dimension.

2.7.8 Let A and B be square n × n matrices. Show that Tr(AB − BA) = 0. Deduce that, when the characteristic of F does not divide n, AB − BA is never the identity matrix. Show by means of an example that it may be a nonsingular diagonal matrix.

2.7.9 Show by means of an example that $AA^t$ need not be the same as $A^tA$.

2.7.10 Consider the co-diagonal n × n matrix $J_n = [a_{ij}]$, where $a_{ij} = 1$ if $i + j = n + 1$, and $a_{ij} = 0$ otherwise. Show that $J_n$ is symmetric and $J_n^2 = I_n$. What is the matrix $J_nAJ_n$?

2.7.11 Describe all 2 × 2 matrices A such that $A^2 = 0_2$.

2.7.12 Let A be a strictly upper (lower) triangular n × n matrix. Show that $A^n = 0_n$.

2.7.13 Let A be a square n × n matrix which is nilpotent in the sense that $A^m = 0_n$ for some m. Show that $I_n + A$ is invertible. Show that

$$I_n - A + A^2 - \cdots + (-1)^{m-1}A^{m-1}$$

is the inverse of $I_n + A$. Is the converse of this statement true? Support your answer.

2.7.14 Let $A = [a_{ij}]$ be a square n × n matrix which commutes with $e_{12}$ and $e_{21}$. Show that $a_{12} = 0 = a_{21}$ and $a_{11} = a_{22}$. Show that a matrix commutes with all $e_{ij}$ if and only if it is a scalar matrix. Show also that the matrices which commute with all transvections are precisely the scalar matrices. Deduce that the center Z(GL(n, F)) is precisely $\{aI_n \mid a \in F^*\}$.

2.7.15 Find a basis, and so also the dimension, of the subspaces of $\mathbb{R}^4$ generated by the following subsets:

(i) {(1, 0, 2, 1), (2, 1, 3, 2), (7, 4, 9, 5), (1, 5, 6, 1)},

(ii) {(1, 1, 1, 1), (1, 0, 2, 3), (1, 0, 4, 9), (1, 0, 8, 27)}.

2.7.16 Reduce the following matrices to reduced row echelon form. Find bases of their row spaces, column spaces, and null spaces. Find their ranks and nullities. Further, for each of the matrices A, find an invertible matrix P such that PA is the reduced row echelon form of A.
$$\begin{bmatrix} 0 & 0 & 3 & -3 & -3 \\ 2 & 4 & 3 & 3 & 1 \\ 2 & 4 & 3 & 3 & 3 \\ 1 & 2 & 2 & 1 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 1 & 0 & -1 & 0 \\ -5 & 1 & 3 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 7 & 11 \\ 3 & 7 & 14 & 25 \\ 4 & 11 & 25 & 50 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix}$$

2.7.17 Check if the following systems of linear equations are consistent, and if so find their general solutions.

1. x1 + 3x2 + 4x3 = 1. 2x1 − x2 + x3 = 2. 4x1 + x2 − x3 = 0. 8x1 − 3x2 + x3 = 3.

2. x1 + 2x2 + x3 + 2x4 + x5 = 2. 2x1 + 4x2 + 3x3 + 3x4 + x5 = 8. 2x1 + 4x2 + 4x3 + 2x4 + 2x5 = 8. x1 + 2x2 + 2x3 + x4 + 2x5 = 2.

3. 4x1 − 15x2 − 2x3 − 32x4 = −40. x1 − 2x2 − 3x4 = −4. −3x1 + 16x2 + 3x3 + 38x4 = 46. x1 − 6x2 − x3 − 14x4 = −17.

2.7.18 Find the value of a, if possible, for which the following system of linear equations is consistent.

4x1 − 15x2 − 2x3 − 32x4 = −40. x1 − 2x2 − 3x4 = −4. −3x1 + 16x2 + 3x3 + 38x4 = 46. x1 − 6x2 − x3 − 14x4 = a.

2.7.19 Check if the matrices in Exercise 2.7.16 have LU decompositions and, if so, find their LU decompositions.

2.7.20 Express each of the following symmetric matrices as $PDP^t$, where P is a nonsingular matrix and D a diagonal matrix. Which of the matrices are expressible as $LDL^t$, where L is a lower triangular matrix? Also express them, if possible, as $LL^t$, where L is a lower triangular matrix.
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 3 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 1 & 3 \\ 1 & 3 & 3 & 2 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 6 & 7 & 8 \\ 3 & 7 & 11 & 12 \\ 4 & 8 & 12 & 16 \end{bmatrix}$$

2.7.21 Find the maximum number of arithmetic operations needed to reduce a 3 × 3 matrix into reduced row echelon form. Generalize it to n × n matrices.

2.7.22 Write a program in C-Language to check if a system of linear equations is consistent, and if so to find a general solution.

2.7.23 Write a program in C-Language to check if a matrix A admits an LU decomposition, and if so to find it.

2.7.24 Write a program in C-Language to check if a symmetric matrix A admits an $LL^t$ decomposition, and if so to find it.

Chapter 3 Linear Transformations

This chapter centers around the study of linear transformations and their matrix representations.

3.1 Definition and Examples

Definition 3.1.1 Let $V_1$ and $V_2$ be vector spaces over a field F. A map T from $V_1$ to $V_2$ is called a linear transformation or a homomorphism if

T (ax + by) = aT(x) + bT(y) for all a, b ∈ F, and x, y ∈ V1. A bijective linear transformation is called an isomorphism.

Proposition 3.1.2 Let T be a linear transformation from a vector space $V_1$ to a vector space $V_2$. Then the following hold: (i) T(0) = 0; (ii) T(−x) = −T(x) for all $x \in V_1$; (iii) if $W_1$ is a subspace of $V_1$, then $T(W_1)$ is a subspace of $V_2$; (iv) if $W_2$ is a subspace of $V_2$, then the inverse image $T^{-1}(W_2)$ of $W_2$ under T is a subspace of $V_1$.

Proof (i) Since T is a linear transformation, T(0) = T(0 · x + 0 · y) = 0 · T(x) + 0 · T(y) = 0. (ii) T(−x) = T((−1) · x + 0 · x) = (−1) · T(x) + 0 · T(x) = −T(x). (iii) Let $W_1$ be a subspace of $V_1$. Then $0 \in W_1$, and so $0 = T(0) \in T(W_1)$. Let $T(x), T(y) \in T(W_1)$, where $x, y \in W_1$. Since $W_1$ is a subspace, $ax + by \in W_1$ for all $a, b \in F$. It follows that aT(x) + bT(y) = T(ax + by) belongs to $T(W_1)$. This shows that $T(W_1)$ is a subspace of $V_2$. (iv) Since $T(0) = 0 \in W_2$, it follows that $0 \in T^{-1}(W_2)$. Let $x, y \in T^{-1}(W_2)$. Then $T(x), T(y) \in W_2$. Since $W_2$ is a subspace, $T(ax + by) = aT(x) + bT(y) \in W_2$ for all $a, b \in F$. It follows that $ax + by \in T^{-1}(W_2)$. This shows that $T^{-1}(W_2)$ is a subspace of $V_1$. □

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_3

The inverse image $T^{-1}(\{0\})$ of the trivial subspace {0} under a linear transformation T is a subspace, called the null space of T, and it is denoted by N(T). N(T) is also called the kernel of T, and then it is denoted by ker T.

Proposition 3.1.3 A linear transformation T is injective if and only if N(T) = {0}.

Proof Suppose that T is injective, and x ∈ N(T). Then T(x) = 0 = T(0). Since T is assumed to be injective, x = 0. Hence N(T) = {0}. Suppose, conversely, that N(T) = {0}, and suppose that T(x) = T(y). Since T is a linear transformation, T(x − y) = T(x) − T(y) = 0. Hence x − y ∈ N(T) = {0}, and so x = y. Thus, T is injective. □

Example 3.1.4 Let $a = (a_1, a_2, a_3)$ be a vector in the Euclidean vector space $\mathbb{R}^3$ over $\mathbb{R}$. Define a map T from $\mathbb{R}^3$ to itself by T(r) = r × a, where × is the vector product in $\mathbb{R}^3$. Then T is a linear transformation (this follows from the bilinearity of the vector product). The null space of T is given by $N(T) = \{x \mid x \times a = 0\} = \{\alpha a \mid \alpha \in \mathbb{R}\}$, provided that a ≠ 0. What is the image of T?

Example 3.1.5 Let F be a field. Let $V = F^n$ and $W = F^m$ be standard vector spaces over the field F. Let T be a linear transformation from $F^n$ to $F^m$. Let $\{e_1, e_2, \ldots, e_n\}$ denote the standard basis of $F^n$. Let $T(e_i) = r_i = (a_{i1}, a_{i2}, \ldots, a_{im})$. Then T determines an n × m matrix A = M(T) whose ith row is $r_i$. It is evident that T(x) = xM(T), where x is treated as a 1 × n matrix. Thus, a linear transformation from $F^n$ to $F^m$ is precisely multiplication from the right by an n × m matrix (note that elements of $F^n$ are considered as 1 × n matrices). The null space N(T) is precisely the null space N(M(T)) of the corresponding matrix M(T).

Example 3.1.6 Let V be a vector space over a field F, and W a subspace of V. Consider the quotient space V/W. The quotient map ν from V to V/W given by ν(x) = x + W is a linear transformation (this follows from the definition of the operations on V/W). Clearly, ν is surjective, and the null space N(ν) is given by N(ν) = {x ∈ V | x + W = ν(x) = W} (note that the zero of V/W is the coset W). Since x + W = W if and only if x ∈ W, it follows that N(ν) = W. In turn, it also follows that every subspace of a vector space is the null space of a linear transformation.

Example 3.1.7 Let $\wp_n$ denote the vector space of polynomials over the field $\mathbb{R}$ of real numbers of degree at most n. Let D denote the derivative. Thus,

$$D(a_0 + a_1X + a_2X^2 + \cdots + a_nX^n) = a_1 + 2a_2X + 3a_3X^2 + \cdots + na_nX^{n-1}.$$

Then D is a linear transformation (verify). The null space of D is the space of constant polynomials. Find its rank and nullity. Note that D is nilpotent. Indeed, $D^{n+1}$ is the zero linear transformation. Further, I + D is an isomorphism. In fact, $I - D + D^2 - D^3 + \cdots + (-1)^nD^n$ is the inverse of I + D (verify).
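Example 3.1.7 can be checked numerically on coefficient vectors. This is a minimal sketch with hypothetical function names. It assumes that a polynomial $a_0 + a_1X + \cdots + a_nX^n$ in $\wp_n$ is stored as the list $[a_0, \ldots, a_n]$, so that D becomes a map on lists of the fixed length n + 1. The sketch applies $I - D + D^2 - \cdots + (-1)^nD^n$ and confirms, on a sample polynomial, that I + D sends the result back and that $D^{n+1} = 0$.

```python
def derivative(p):
    """D on the coefficient list [a0, a1, ..., an] of a polynomial in P_n.

    The output has the same length n + 1 (its last coefficient is 0), so D
    is a linear map from P_n to itself.
    """
    return [k * p[k] for k in range(1, len(p))] + [0]


def inverse_of_I_plus_D(p):
    """Apply I - D + D^2 - ... + (-1)^n D^n to p, where n + 1 = len(p)."""
    result, term, sign = [0] * len(p), list(p), 1
    for _ in range(len(p)):  # the n + 1 terms D^0, D^1, ..., D^n
        result = [r + sign * t for r, t in zip(result, term)]
        term, sign = derivative(term), -sign
    return result


p = [1, 2, 0, 5]                      # 1 + 2X + 5X^3 in P_3
q = inverse_of_I_plus_D(p)            # [-31, 32, -15, 5]
print([a + b for a, b in zip(q, derivative(q))] == p)  # (I + D) q == p: True
```

Because each pass of `derivative` lowers the degree by one, n + 1 passes send every coefficient list of length n + 1 to zero. This is the nilpotency $D^{n+1} = 0$, and it is why the alternating sum stops after n + 1 terms.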

Example 3.1.8 Let $C^\infty(\mathbb{R})$ denote the vector space of real-valued functions on $\mathbb{R}$ which are r times continuously differentiable for every r. The differential operator $D^2 - 3D + 2$ from $C^\infty(\mathbb{R})$ to itself, given by
$$(D^2 - 3D + 2)(f(X)) = \frac{d^2f(X)}{dX^2} - 3\frac{df(X)}{dX} + 2f(X),$$
is a linear transformation. The null space of this differential operator is precisely $\{\alpha e^{X} + \beta e^{2X} \mid \alpha, \beta \in \mathbb{R}\}$, which is of dimension 2.
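As a quick check of one inclusion in Example 3.1.8 (this does not prove that the null space is no larger), note how $D^2 - 3D + 2$ acts on an exponential $e^{\mu X}$:

```latex
\begin{aligned}
(D^2 - 3D + 2)\,e^{\mu X} &= (\mu^2 - 3\mu + 2)\,e^{\mu X} = (\mu - 1)(\mu - 2)\,e^{\mu X},\\
(D^2 - 3D + 2)\,(\alpha e^{X} + \beta e^{2X}) &= \alpha\,(1-1)(1-2)\,e^{X} + \beta\,(2-1)(2-2)\,e^{2X} = 0.
\end{aligned}
```

Moreover, $e^{X}$ and $e^{2X}$ are linearly independent. If $\alpha e^{X} + \beta e^{2X} = 0$ for all X, dividing by $e^{X}$ gives $\alpha + \beta e^{X} = 0$ for all X. This forces $\beta = 0$, and then $\alpha = 0$.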

3.2 Isomorphism Theorems

Theorem 3.2.1 (Fundamental Theorem of Homomorphism). Let T be a linear transformation from a vector space V over a field F to a vector space V′ over the same field F. Let W be a subspace of V. Then there exists a linear transformation $\overline{T}$ from V/W to V′ such that $\overline{T} \circ \nu = T$ if and only if W ⊆ N(T). Also, if such a linear transformation exists, it is unique. Further, $\overline{T}$ is injective if and only if W = N(T). Finally, $\overline{T}$ is an isomorphism if and only if T is surjective and W = N(T).

Proof Suppose that there is a linear transformation $\overline{T}$ from V/W to V′ such that $\overline{T} \circ \nu = T$. Let x ∈ W. Then x + W = W, the zero of V/W. Now,

$$T(x) = \overline{T}(\nu(x)) = \overline{T}(x + W) = \overline{T}(W) = 0,$$
for $\overline{T}$ is a linear transformation. Thus, x ∈ N(T). This shows that W ⊆ N(T). Conversely, suppose that W ⊆ N(T) and x + W = y + W. Then x − y ∈ W. Since W ⊆ N(T), T(x − y) = 0. Since T is a linear transformation, T(x) = T(y). Thus, we have a well-defined map $\overline{T}$ from V/W to V′ given by $\overline{T}(x + W) = T(x)$. It is easily observed that $\overline{T}$ is a linear transformation such that $\overline{T} \circ \nu = T$. Further, suppose that T′ is a linear transformation such that $T' \circ \nu = T$. Then $T'(x + W) = T'(\nu(x)) = (T' \circ \nu)(x) = T(x) = \overline{T}(x + W)$. This shows that $T' = \overline{T}$. Next, suppose that such a $\overline{T}$ exists, and it is injective. Then already W ⊆ N(T). Let x ∈ N(T). Then $\overline{T}(x + W) = T(x) = 0 = \overline{T}(W)$. Since $\overline{T}$ is supposed to be injective, x + W = W, and so x ∈ W. This shows that W = N(T). Conversely, suppose that N(T) = W, and suppose that $\overline{T}(x + W) = \overline{T}(y + W)$. Then T(x) = T(y). Hence x − y ∈ N(T) = W. This means that x + W = y + W. This shows that $\overline{T}$ is injective. Finally, since $\overline{T} \circ \nu = T$ and ν is surjective, $\overline{T}$ is surjective if and only if T is surjective. □

Corollary 3.2.2 Let T from V to V′ be a surjective linear transformation. Then V/N(T) ≈ V′. □

Theorem 3.2.3 (Noether Isomorphism Theorem). Let $V_1$ and $V_2$ be subspaces of a vector space V over a field F. Then $(V_1 + V_2)/V_2$ is isomorphic to $V_1/(V_1 \cap V_2)$.

Proof Define a map η from $V_1$ to $(V_1 + V_2)/V_2$ by $\eta(x) = x + V_2$. Clearly, η is a linear transformation. Any element of $(V_1 + V_2)/V_2$ is of the form $(x + y) + V_2$, where $x \in V_1$ and $y \in V_2$. But then $(x + y) + V_2 = x + V_2 = \eta(x)$. This shows that η is surjective. Further,
$$N(\eta) = \{x \in V_1 \mid x + V_2 = V_2\} = \{x \in V_1 \mid x \in V_2\} = V_1 \cap V_2.$$

The result follows from the fundamental theorem of homomorphism. □

Proposition 3.2.4 Let V and V′ be vector spaces over a field F. Then, (i) a linear transformation T from V to V′ is surjective if and only if it takes a set of generators to a set of generators; (ii) a linear transformation T from V to V′ is injective if and only if it takes a linearly independent set to a linearly independent set; (iii) a linear transformation T from V to V′ is an isomorphism if and only if it takes a basis to a basis.

Proof (i) Let T be a linear transformation, and S a subset of V. Since T(⟨S⟩) is a subspace containing T(S), it follows that ⟨T(S)⟩ ⊆ T(⟨S⟩). Further, since the image of a linear combination of members of S is a linear combination of members of T(S), it follows that T(⟨S⟩) ⊆ ⟨T(S)⟩. Thus ⟨T(S)⟩ = T(⟨S⟩). In particular, if S generates V, then T(V) = ⟨T(S)⟩, so T is surjective if and only if T(S) generates V′. (ii) Let T be an injective linear transformation from V to V′. Let S be a linearly independent subset of V. Let $y_1 = T(x_1), y_2 = T(x_2), \ldots, y_r = T(x_r)$ be distinct elements of T(S). Then $x_1, x_2, \ldots, x_r$ are distinct elements of S. Suppose that

α1T (x1) + α2T (x2) +···+αr T (xr ) = 0.

Since T is a linear transformation,

T (α1x1 + α2x2 +···+αr xr ) = 0.

Since T is injective,

α1x1 + α2x2 +···+αr xr = 0.

Since S is linearly independent, $\alpha_i = 0$ for all i. This shows that T(S) is linearly independent. Conversely, suppose that T takes linearly independent subsets to linearly independent subsets. Let x ∈ V, x ≠ 0. Then {x} is linearly independent, and so {T(x)} is linearly independent. Thus, T(x) ≠ 0. This shows that N(T) = {0}, and so T is injective (Proposition 3.1.3). (iii) Follows from (i) and (ii). □

Corollary 3.2.5 Let V be a finite dimensional vector space over a field F, and $S = \{x_1, x_2, \ldots, x_r\}$ an ordered basis of V. Then a linear transformation T from V to W is an isomorphism if and only if $\{T(x_1), T(x_2), \ldots, T(x_r)\}$ is an ordered basis of W. □

Proposition 3.2.6 Let V and W be vector spaces over a field F. Let S be a basis of V. Then any map f from S to W has a unique extension to a linear transformation $T_f$ from V to W. More precisely, we have a bijective map η from the set $\mathrm{Hom}_F(V, W)$ of all linear transformations from V to W to the set Map(S, W) of all maps from S to W, given by $\eta(T) = T|_S$, the restriction of T to S.

Proof Let f be a map from S to W. Since S is a basis of V, every nonzero element x ∈ V has a unique representation as

x = α1x1 + α2x2 + ··· + αr xr, where x1, x2, ..., xr are distinct elements of S, and all αi are nonzero. Thus, setting T(0) = 0, we have a well-defined map T given by

T(α1x1 + α2x2 + ··· + αr xr) = α1 f(x1) + α2 f(x2) + ··· + αr f(xr).

Clearly, T extends f, and it is a linear transformation. If T′ is also a linear transformation which extends f, then

T′(α1x1 + ··· + αr xr) = α1T′(x1) + ··· + αr T′(xr) = α1 f(x1) + ··· + αr f(xr) = T(α1x1 + ··· + αr xr).

Hence T′ = T. □

Remark 3.2.7 The above proposition says that two linear transformations from V to W are the same if and only if they agree on a basis of V.

Corollary 3.2.8 Two finite-dimensional vector spaces over a field F are isomorphic if and only if they have the same dimension. In particular, any n-dimensional vector space over F is isomorphic to the standard vector space $F^n$.

Proof Suppose that V and W are isomorphic, and T is an isomorphism from V to W. Then, by Corollary 3.2.5, T takes an ordered basis of V to an ordered basis of W. Hence dim V = dim W. Conversely, suppose that dim V = dim W. Then there is a bijective map from a basis of V to a basis of W, which, by Proposition 3.2.6, extends to a linear transformation taking a basis to a basis. By Proposition 3.2.4(iii), this extended linear transformation is an isomorphism. □

Corollary 3.2.9 Let V and W be vector spaces of dimensions n and m, respectively, over a field F containing q elements. Then the number of linear transformations from V to W is $(q^m)^n = q^{mn}$.

Proof Since the dimension of W is m and F contains q elements, W contains $q^m$ elements. A basis of V contains n elements. By Proposition 3.2.6, there are as many linear transformations from V to W as there are maps from a basis of V to W, namely $(q^m)^n$. □

Proposition 3.2.10 Let V and W be finite-dimensional vector spaces of the same dimension (in particular, V may be the same as W). Let T be a linear transformation from V to W. Then (i) T is an isomorphism if and only if it is injective; (ii) T is an isomorphism if and only if it is surjective.

Proof (i) If T is an isomorphism, then it is bijective, and so it is injective. Conversely, suppose that T is injective. Let {x1, x2, ..., xn} be an ordered basis of V. Then it is linearly independent. By Proposition 3.2.4(ii), {T(x1), T(x2), ..., T(xn)} is an ordered linearly independent subset of W. Since n = dim V = dim W, {T(x1), T(x2), ..., T(xn)} is a basis of W. By Proposition 3.2.4(iii), T is an isomorphism. (ii) Again, if T is an isomorphism, then it is bijective, and so it is surjective. Suppose that T is surjective, and {x1, x2, ..., xn} is an ordered basis of V. Since T is surjective, {T(x1), T(x2), ..., T(xn)} is an ordered set of generators of W. Since n = dim V = dim W, {T(x1), T(x2), ..., T(xn)} is an ordered basis of W. By Proposition 3.2.4(iii), T is an isomorphism. □

Let V be a vector space over a field F. An isomorphism from V to itself is called an automorphism of V. The set of all automorphisms of V is denoted by GL(V); sometimes it is also denoted by Aut(V). GL(V) is a group with respect to the composition of maps, called the general linear group on V.

Proposition 3.2.11 Let V be a vector space of dimension n over a field F, and B(V) the set of all ordered bases of V. Let {x1, x2, ..., xn} be a fixed member of B(V). Then we have a bijective map η from GL(V) to B(V) defined by

η(T ) ={T (x1), T (x2), . . . , T (xn)}.

Proof Since an isomorphism takes an ordered basis to an ordered basis, η is indeed a map from GL(V) to B(V). Since a linear transformation is uniquely determined by its effect on an ordered basis, η is injective. Also, given an ordered basis {y1, y2, ..., yn} of V, the map x1 ↦ y1, x2 ↦ y2, ..., xn ↦ yn extends to an automorphism T of V such that η(T) = {y1, y2, ..., yn}. Hence η is also surjective. □

The following corollary follows from the above proposition and Proposition 1.4.19.

Corollary 3.2.12 Let V be a vector space of dimension n over a field F containing q elements. Then the group GL(V) is finite of order

$$(q^n - 1)(q^n - q) \cdots (q^n - q^{n-1}). \quad \square$$
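The counts in Corollaries 3.2.9 and 3.2.12 can be checked by brute force in very small cases. The following Python sketch assumes that q is a prime p, so that arithmetic modulo p realizes the field with p elements (prime powers would need genuine finite-field arithmetic, which is not implemented here). It enumerates every map $F_p^n \to F_p^m$ and keeps the linear ones, and it counts ordered bases of $F_p^n$, which by Proposition 3.2.11 equals the order of GL(V). All function names are illustrative.

```python
from itertools import product


def vectors(p, n):
    """All vectors of F_p^n, as tuples of residues mod p (p assumed prime)."""
    return list(product(range(p), repeat=n))


def count_linear_maps(p, n, m):
    """Brute-force count of linear maps F_p^n -> F_p^m (feasible only for tiny p, n, m)."""
    dom, cod = vectors(p, n), vectors(p, m)
    add = lambda u, v: tuple((a + b) % p for a, b in zip(u, v))
    scale = lambda c, u: tuple((c * a) % p for a in u)
    count = 0
    for values in product(cod, repeat=len(dom)):  # every map, as a table of values
        f = dict(zip(dom, values))
        additive = all(f[add(u, v)] == add(f[u], f[v]) for u in dom for v in dom)
        homogeneous = all(f[scale(c, u)] == scale(c, f[u]) for c in range(p) for u in dom)
        if additive and homogeneous:
            count += 1
    return count


def is_linearly_independent(vecs, p):
    """Gaussian elimination mod p; True if the rank equals the number of vectors."""
    rows = [list(v) for v in vecs]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)  # inverse mod p (Fermat's little theorem)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(a - c * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(rows)


def count_ordered_bases(p, n):
    """Number of ordered bases of F_p^n, i.e. |GL(V)| by Proposition 3.2.11."""
    return sum(is_linearly_independent(b, p) for b in product(vectors(p, n), repeat=n))


def gl_order(q, n):
    """The formula (q^n - 1)(q^n - q)...(q^n - q^(n-1))."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


print(count_linear_maps(2, 2, 2), 2 ** (2 * 2))   # 16 16
print(count_ordered_bases(2, 3), gl_order(2, 3))  # 168 168
print(count_ordered_bases(3, 2), gl_order(3, 2))  # 48 48
```

Brute force grows like $q^{m q^n}$, so this is only a sanity check of the formulas, not a practical way to count.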

3.3 Space of Linear Transformations, Dual Spaces

Let V and W be vector spaces over a field F. Let $\mathrm{Hom}_F(V, W)$ denote the set of all linear transformations from V to W. Let $f, g \in \mathrm{Hom}_F(V, W)$. Define f + g by (f + g)(x) = f(x) + g(x). It can be easily verified that $f + g \in \mathrm{Hom}_F(V, W)$. This defines an operation + on $\mathrm{Hom}_F(V, W)$, and $\mathrm{Hom}_F(V, W)$ is an abelian group with respect to this addition. Let $f \in \mathrm{Hom}_F(V, W)$ and α ∈ F. We define αf by (αf)(x) = α · f(x). Then αf also belongs to $\mathrm{Hom}_F(V, W)$. This defines a multiplication by scalars on $\mathrm{Hom}_F(V, W)$. Indeed, $\mathrm{Hom}_F(V, W)$ is a vector space over F under these operations (verify). In particular, $\mathrm{End}(V) = \mathrm{Hom}_F(V, V)$ is also a vector space over F. In fact, End(V) is an algebra over F, the internal multiplication being the composition of maps.

Theorem 3.3.1 Let V and W be finite-dimensional vector spaces over a field F. Then

$$\dim \mathrm{Hom}_F(V, W) = \dim V \cdot \dim W.$$

Proof Suppose that dim V = n and dim W = m. Let $\{x_1, x_2, \ldots, x_n\}$ be an ordered basis of V, and $\{y_1, y_2, \ldots, y_m\}$ an ordered basis of W. Fix a pair (i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m. Since every map from a basis of a vector space to a vector space extends uniquely to a linear transformation, there is a unique $T_{ij} \in \mathrm{Hom}_F(V, W)$ whose restriction to the basis $\{x_1, x_2, \ldots, x_n\}$ is given by $T_{ij}(x_i) = y_j$, and $T_{ij}(x_k) = 0$ for $k \neq i$. We show that $B = \{T_{ij} \mid 1 \le i \le n, 1 \le j \le m\}$ is a basis of $\mathrm{Hom}_F(V, W)$. Let $T \in \mathrm{Hom}_F(V, W)$. Then $T(x_i) \in W$. Since $\{y_1, y_2, \ldots, y_m\}$ is a basis of W,

$$T(x_i) = \sum_{j=1}^{m} \alpha_{ji} y_j$$

for unique $\alpha_{ji} \in F$, 1 ≤ j ≤ m, 1 ≤ i ≤ n. It follows that the linear transformations T and $\sum_{i,j} \alpha_{ji} T_{ij}$ agree on each $x_i$, and so they agree on a basis. This means that $T = \sum_{i,j} \alpha_{ji} T_{ij}$. Thus, B is a set of generators for $\mathrm{Hom}_F(V, W)$. Next, we show that B is linearly independent. Suppose that

$$\sum_{i,j} \alpha_{ji} T_{ij} = 0.$$

Then

$$\Big(\sum_{i,j} \alpha_{ji} T_{ij}\Big)(x_k) = 0 \quad \text{for all } k.$$

Hence,

$$\sum_{i,j} \alpha_{ji} T_{ij}(x_k) = 0 \quad \text{for all } k.$$

Thus,

$$\sum_{j=1}^{m} \alpha_{jk} y_j = 0 \quad \text{for all } k.$$

Since $\{y_1, y_2, \ldots, y_m\}$ is linearly independent, $\alpha_{jk} = 0$ for all j, k. This shows that B is linearly independent, and so it is a basis of $\mathrm{Hom}_F(V, W)$. Further, it follows that

$$\dim \mathrm{Hom}_F(V, W) = n \cdot m = \dim V \cdot \dim W. \quad \square$$
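In coordinates, taking V = $F^n$ and W = $F^m$ with their standard bases, $T_{ij}$ is the m × n matrix whose only nonzero entry is a 1 in row j, column i, and the identity $T = \sum_{i,j} \alpha_{ji} T_{ij}$ says that a matrix is the sum of its entries times these "matrix units". A minimal Python sketch under that assumption (indices start at 0 here, and the names `unit_map` and `expand` are illustrative):

```python
def unit_map(i, j, n, m):
    """m x n matrix of T_ij: T_ij(x_i) = y_j and T_ij(x_k) = 0 for k != i.

    Column k lists the coordinates of T_ij(x_k) in the basis y_0, ..., y_{m-1}."""
    return [[1 if (row == j and col == i) else 0 for col in range(n)] for row in range(m)]


def expand(A):
    """Rebuild A = [alpha_ji] as the sum over (i, j) of alpha_ji * T_ij."""
    m, n = len(A), len(A[0])
    total = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            E = unit_map(i, j, n, m)
            for r in range(m):
                for c in range(n):
                    total[r][c] += A[j][i] * E[r][c]
    return total


A = [[1, 2, 3],
     [4, 5, 6]]                 # T: F^3 -> F^2 with T(x_i) = sum_j A[j][i] y_j
assert expand(A) == A           # the T_ij generate Hom_F(V, W)
units = [unit_map(i, j, 3, 2) for i in range(3) for j in range(2)]
print(len(units))               # 6
```

The six units are, after flattening, the standard basis vectors of $F^6$, which is the linear independence half of the proof.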

Definition 3.3.2 Let V be a vector space over a field F. Treat F as a vector space over F. The members of $\mathrm{Hom}_F(V, F)$ are called the linear functionals on V. The vector space $\mathrm{Hom}_F(V, F)$, denoted by $V^*$, is called the dual space of V.

If V is finite dimensional, then $\dim V^* = \dim \mathrm{Hom}_F(V, F) = \dim V \cdot \dim F = \dim V$. Thus V and $V^*$ have the same dimension, and so they are isomorphic as vector spaces. This is not true for infinite-dimensional spaces. For example, the vector space $\mathbb{R}$ of real numbers over $\mathbb{Q}$ is of infinite dimension, and it has a basis whose cardinality is the same as the cardinality of $\mathbb{R}$. Since a linear functional is determined by an arbitrary map from a basis to $\mathbb{Q}$, the cardinality of $\mathbb{R}^*$ is the same as that of the set $\mathbb{Q}^{\mathbb{R}}$ of all maps from $\mathbb{R}$ to $\mathbb{Q}$. By Cantor's theorem, $|\mathbb{Q}^{\mathbb{R}}| \ge 2^{|\mathbb{R}|} > |\mathbb{R}|$, so there is no bijective map from $\mathbb{R}$ to $\mathbb{Q}^{\mathbb{R}}$, and hence $\mathbb{R}$ and $\mathbb{R}^*$ are not isomorphic as vector spaces over $\mathbb{Q}$.

Definition 3.3.3 Let V and W be vector spaces over a field F, and let T be a linear transformation from V to W. Define a map $T^t$ from $W^*$ to $V^*$ by $T^t(f) = f \circ T$. Then $T^t$ is a linear transformation (verify), and it is called the transpose of T.

Proposition 3.3.4 Let V1, V2 and V3 be vector spaces over a field F. Let T1 : V1 −→ V2 and T2 : V2 −→ V3 be linear transformations. Then,

$$(T_2 \circ T_1)^t = T_1^t \circ T_2^t.$$

Proof $(T_2 \circ T_1)^t$ is a linear transformation from $V_3^*$ to $V_1^*$ given by

$$(T_2 \circ T_1)^t(f) = f \circ (T_2 \circ T_1) = (f \circ T_2) \circ T_1 = T_1^t(f \circ T_2) = T_1^t(T_2^t(f)) = (T_1^t \circ T_2^t)(f)$$

for all $f \in V_3^*$. □

Let V be a vector space over a field F. The dual $(V^*)^*$ of $V^*$ is called the double dual of V, and it is denoted by $V^{**}$. Let $x \in V$. Define a map $\hat{x}$ from $V^*$ to F by

$$\hat{x}(f) = f(x).$$

It can be checked easily that $\hat{x}$ is a linear functional on $V^*$. Thus $\hat{x} \in V^{**}$. This gives us a map $x \mapsto \hat{x}$ from V to $V^{**}$.

Proposition 3.3.5 Let V be a vector space over a field F. Then the map $x \mapsto \hat{x}$ from V to $V^{**}$ is an injective homomorphism. If V is finite dimensional, then it is also an isomorphism.

Proof For x, y ∈ V and α, β ∈ F,

$$\widehat{\alpha x + \beta y}(f) = f(\alpha x + \beta y) = \alpha f(x) + \beta f(y) = \alpha \hat{x}(f) + \beta \hat{y}(f)$$

for all $f \in V^*$. Thus $\widehat{\alpha x + \beta y} = \alpha \hat{x} + \beta \hat{y}$, and so the map $x \mapsto \hat{x}$ is a linear transformation. To show that it is injective, it is sufficient to show that $\hat{x} = 0$ implies $x = 0$. Suppose that $x \neq 0$. Then {x}, being linearly independent, can be enlarged to a basis of V. We have a map from this basis to F which is 1 on x and zero at all other members of the basis. This extends to a linear functional f on V with f(x) = 1. Thus $\hat{x}(f) = f(x) = 1 \neq 0$, so $\hat{x} \neq 0$. Finally, if V is finite dimensional, then $\dim V^{**} = \dim V^* = \dim V$. Hence, by Proposition 3.2.10, any injective linear transformation from V to $V^{**}$, in particular $x \mapsto \hat{x}$, is an isomorphism. □

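Remark 3.3.6 A vector space V is said to be reflexive if the map $x \mapsto \hat{x}$ is an isomorphism from V to $V^{**}$. Thus, every finite-dimensional vector space is reflexive. The vector space $\mathbb{R}$ over $\mathbb{Q}$ is not reflexive: applying the cardinality argument following Definition 3.3.2 twice gives $|\mathbb{R}^{**}| > |\mathbb{R}^*| > |\mathbb{R}|$.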

We know that every finite-dimensional vector space is isomorphic to its dual (being of the same dimension). The question is whether we have a natural isomorphism. More precisely, do we have isomorphisms $f_V$ from V to $V^*$ for all vector spaces V such that, given a linear transformation T from V to W, the following diagram commutes, that is, $f_V = T^t \circ f_W \circ T$?

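$$\begin{array}{ccc} V & \xrightarrow{\;f_V\;} & V^* \\ {\scriptstyle T}\big\downarrow & & \big\uparrow{\scriptstyle T^t} \\ W & \xrightarrow{\;f_W\;} & W^* \end{array}$$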

The answer to this question is negative. Suppose that we have a family

$$\{f_V : V \longrightarrow V^* \mid V \text{ is a vector space over } F\}$$

of isomorphisms. Let V be a one-dimensional vector space with {x} as a basis, x ≠ 0. Define a linear functional $x^* \in V^*$ by $x^*(\alpha x) = \alpha$. Then $x^* \neq 0$. Since $V^*$ is also one dimensional, $\{x^*\}$ is a basis of $V^*$. Hence there is a λ ∈ F such that $f_V(x) = \lambda x^*$, and λ ≠ 0 since $f_V$ is injective. Take a μ ∈ F such that $\mu^2 \neq 1$ (for example, μ = 0). Define a linear transformation T on V by T(x) = μx. Then

$$(T^t \circ f_V \circ T)(x) = T^t(f_V(T(x))) = T^t(\lambda\mu x^*) = \lambda\mu\, x^* \circ T.$$

Also, $f_V(x) = \lambda x^*$. Now

$$(\lambda\mu\, x^* \circ T)(x) = \lambda\mu\, x^*(T(x)) = \lambda\mu\, x^*(\mu x) = \lambda\mu^2 x^*(x) = \lambda\mu^2,$$

and $f_V(x)(x) = \lambda x^*(x) = \lambda$. Since $\mu^2 \neq 1$ and λ ≠ 0, $f_V \neq T^t \circ f_V \circ T$. Thus, the above diagram (with W = V) is not commutative. However, the following result says that every finite-dimensional vector space is naturally isomorphic to its double dual.

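Proposition 3.3.7 The family

$$\{f_V : V \longrightarrow V^{**} \mid V \text{ is finite dimensional},\ f_V(x) = \hat{x}\}$$

defines natural isomorphisms from finite-dimensional vector spaces to their double duals, in the sense that, given any linear transformation $T : V \longrightarrow W$, the following diagram is commutative, that is, $f_W \circ T = (T^t)^t \circ f_V$:

$$\begin{array}{ccc} V & \xrightarrow{\;f_V\;} & V^{**} \\ {\scriptstyle T}\big\downarrow & & \big\downarrow{\scriptstyle (T^t)^t} \\ W & \xrightarrow{\;f_W\;} & W^{**} \end{array}$$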

Proof We have already seen that $f_V$ defined above is an isomorphism. Let x ∈ V. Then

$$(f_W \circ T)(x) = f_W(T(x)) = \widehat{T(x)}.$$

Also,

$$((T^t)^t \circ f_V)(x) = (T^t)^t(\hat{x}) = \hat{x} \circ T^t.$$

Now,

$$\widehat{T(x)}(g) = g(T(x)) \quad \text{for all } g \in W^*.$$

Further,

$$(\hat{x} \circ T^t)(g) = \hat{x}(T^t(g)) = \hat{x}(g \circ T) = (g \circ T)(x) = g(T(x))$$

for all $g \in W^*$. This shows that $\widehat{T(x)} = \hat{x} \circ T^t$ for all x ∈ V. Hence $f_W \circ T = (T^t)^t \circ f_V$. □
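The identity $\widehat{T(x)} = \hat{x} \circ T^t$ is pure function composition, so it can be mirrored with Python closures. In this sketch (an illustration on finitely many samples, not a proof) vectors are tuples of rationals, functionals are Python functions, and `transpose`, `hat` and `functional` are hypothetical helper names; T is an arbitrary sample linear map from $\mathbb{Q}^3$ to $\mathbb{Q}^2$.

```python
from fractions import Fraction


def transpose(T):
    """T^t(f) = f o T: sends a functional on the codomain to one on the domain."""
    return lambda f: (lambda v: f(T(v)))


def hat(x):
    """The element x^ of the double dual: x^(f) = f(x)."""
    return lambda f: f(x)


def functional(coeffs):
    """The linear functional v -> sum_k coeffs[k] * v[k]."""
    return lambda v: sum(Fraction(c) * a for c, a in zip(coeffs, v))


def T(v):
    """A sample linear map Q^3 -> Q^2."""
    a, b, c = v
    return (a + b + c, a - b + c)


TTt = transpose(transpose(T))  # (T^t)^t : V** -> W**
xs = [(1, 0, 0), (2, -1, 5), (Fraction(1, 2), 3, 0)]
gs = [functional((1, 0)), functional((3, -7)), functional((Fraction(2, 3), 1))]

# f_W o T versus (T^t)^t o f_V, evaluated at each sample g in W*.
natural = all(hat(T(x))(g) == TTt(hat(x))(g) for x in xs for g in gs)
print(natural)  # True
```

Note that no basis or dimension appears anywhere in `hat` or `transpose`; this is the informal sense in which the isomorphism $x \mapsto \hat{x}$ is "natural", in contrast with the isomorphisms $V \to V^*$ discussed above.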

Definition 3.3.8 Let V be a finite-dimensional vector space over a field F, and let $\{e_1, e_2, \ldots, e_n\}$ be a basis of V. Consider F as a vector space over F with {1} as a basis. Fix i, 1 ≤ i ≤ n. Define a linear functional $e_i^*$ on V by

$$e_i^*(e_j) = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{otherwise.} \end{cases}$$

Then, as in Theorem 3.3.1, $\{e_1^*, e_2^*, \ldots, e_n^*\}$ is a basis of $V^*$, called the dual basis of $\{e_1, e_2, \ldots, e_n\}$.

Let V and W be vector spaces over a field F. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of V, and $\{y_1, y_2, \ldots, y_m\}$ a basis of W. Let $\{x_1^*, x_2^*, \ldots, x_n^*\}$ and $\{y_1^*, y_2^*, \ldots, y_m^*\}$ be the corresponding dual bases. Let T be a linear transformation from V to W. Suppose that

$$T(x_i) = \sum_{j=1}^{m} \alpha_{ji} y_j.$$

Now, $T^t(y_k^*) = y_k^* \circ T$, and

$$(y_k^* \circ T)(x_i) = y_k^*(T(x_i)) = y_k^*\Big(\sum_{j=1}^{m} \alpha_{ji} y_j\Big) = \sum_{j=1}^{m} \alpha_{ji}\, y_k^*(y_j) = \alpha_{ki} = \alpha_{ki}\, x_i^*(x_i).$$

Thus,

$$T^t(y_k^*) = \sum_{i=1}^{n} \alpha_{ki} x_i^*.$$

This gives us an expression for $T^t$ in terms of the dual bases, provided we know the expression for T in terms of the given bases.
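A small illustrative case of this formula (the numbers are chosen here only for illustration), with n = m = 2: suppose that $T(x_1) = 2y_1 + 3y_2$ and $T(x_2) = -y_1 + 5y_2$, so that $\alpha_{11} = 2$, $\alpha_{21} = 3$, $\alpha_{12} = -1$, $\alpha_{22} = 5$. Then

$$T^t(y_1^*) = \alpha_{11} x_1^* + \alpha_{12} x_2^* = 2x_1^* - x_2^*, \qquad T^t(y_2^*) = \alpha_{21} x_1^* + \alpha_{22} x_2^* = 3x_1^* + 5x_2^*.$$

As a check, $(y_1^* \circ T)(x_2) = y_1^*(-y_1 + 5y_2) = -1$, which is indeed the value of $2x_1^* - x_2^*$ at $x_2$. Thus the coefficients of $T^t(y_k^*)$ are read along the k-th row of $[\alpha_{ji}]$, whereas those of $T(x_i)$ are read down the i-th column.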

3.4 Rank and Nullity

Definition 3.4.1 Let V and W be vector spaces of finite dimensions over a field F. Let T be a linear transformation from V to W. The dimension of the image T (V ) is called the rank of T , and it is denoted by r(T ). The dimension of the null space N(T ) of T is called the nullity of T , and it is denoted by n(T ).

Thus, T is injective if and only if n(T) = 0, and T is surjective if and only if r(T) = dim(W).

Theorem 3.4.2 Let V and W be vector spaces of finite dimensions over a field F. Let T be a linear transformation from V to W. Then,

r(T) + n(T) = dim(V).

Proof From the fundamental theorem of homomorphism, T(V) is isomorphic to V/N(T). Also dim(V/N(T)) = dim(V) − dim N(T). Hence

r(T) = dim(T(V)) = dim(V/N(T)) = dim V − dim(N(T)) = dim V − n(T). □
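Rank and nullity of a concrete map can be computed from its matrix in the standard bases by row reduction. The Python sketch below uses exact arithmetic over $\mathbb{Q}$ via `fractions.Fraction` (helper names are illustrative). It reads off a basis of N(T) from the reduced row echelon form, one vector per non-pivot column, and confirms r(T) + n(T) = dim(V) for the sample map T((a, b, c)) = (a + b + c, a − b + c).

```python
from fractions import Fraction


def rref(A):
    """Reduced row echelon form of A over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def rank_and_kernel(A):
    """Return r(T) and a basis of N(T) for the map v -> A v."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):  # solve for each pivot variable
            v[pc] = -R[row][free]
        basis.append(v)
    return len(pivots), basis


A = [[1, 1, 1],
     [1, -1, 1]]          # T((a, b, c)) = (a + b + c, a - b + c)
r, kernel = rank_and_kernel(A)
print(r, len(kernel), r + len(kernel))  # 2 1 3
```

Here the kernel is spanned by (−1, 0, 1); the number of pivot columns is r(T), the number of free columns is n(T), and together they exhaust the dim(V) columns, which is the content of Theorem 3.4.2 in coordinates.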

Proposition 3.4.3 Let T1 : V1 −→ V2 and T2 : V2 −→ V3 be linear transformations between finite-dimensional spaces over a field F. Then

r(T2oT1) ≤ min(r(T2), r(T1)).

Proof Since T2(T1(V1)) is a subspace of T2(V2),

r(T2oT1) = dim(T2(T1(V1))) ≤ dimT2(V2) = r(T2).

Next, applying Theorem 3.4.2 to the restriction $T_2|_{T_1(V_1)}$ of $T_2$ to $T_1(V_1)$, we obtain

$$\dim T_2(T_1(V_1)) = \dim T_1(V_1) - n(T_2|_{T_1(V_1)}) \le \dim T_1(V_1) = r(T_1). \quad \square$$

Corollary 3.4.4 Under the hypothesis of the above proposition, if $T_1$ is an isomorphism, then $r(T_2 \circ T_1) = r(T_2)$, and if $T_2$ is an isomorphism, then $r(T_2 \circ T_1) = r(T_1)$.

Proof Suppose that T1 is an isomorphism. Then from the previous proposition, it follows that

$$r(T_2) = r(T_2 \circ T_1 \circ T_1^{-1}) \le r(T_2 \circ T_1) \le r(T_2).$$

Thus, $r(T_2) = r(T_2 \circ T_1)$. The rest of the assertion follows similarly. □

Corollary 3.4.5 Let $T : V \longrightarrow W$ be a linear transformation between finite-dimensional vector spaces over a field F. Then

$$r(T) = r((T^t)^t).$$

Proof From Proposition 3.3.7, we have $f_W \circ T = (T^t)^t \circ f_V$. Since $f_V$ and $f_W$ are isomorphisms, it follows from Corollary 3.4.4 that

$$r(T) = r(f_W \circ T) = r((T^t)^t \circ f_V) = r((T^t)^t). \quad \square$$

Corollary 3.4.6 Let $T : V \longrightarrow W$ be a linear transformation between finite-dimensional vector spaces over a field F. Then

$$r(T) = r(T^t).$$

Proof Let r = r(T). Let $\{y_1 = T(x_1), y_2 = T(x_2), \ldots, y_r = T(x_r)\}$ be a basis of T(V). Enlarge the linearly independent subset $\{y_1, y_2, \ldots, y_r\}$ to a basis $\{y_1, \ldots, y_r, y_{r+1}, \ldots, y_m\}$ of W. Consider the corresponding dual basis $\{y_1^*, \ldots, y_r^*, y_{r+1}^*, \ldots, y_m^*\}$ of $W^*$. Then $y_s^*(y_i) = 0$ for all $i \le r$ and $s \ge r + 1$. This means that $y_s^*(T(V)) = \{0\}$ for all $s \ge r + 1$. Thus, $T^t(y_s^*) = y_s^* \circ T = 0$ for all $s \ge r + 1$. This shows that $T^t(W^*)$ is generated by $\{T^t(y_1^*), T^t(y_2^*), \ldots, T^t(y_r^*)\}$. It follows that $r(T^t) \le r(T)$. In turn, $r((T^t)^t) \le r(T^t) \le r(T)$. Already by Corollary 3.4.5, $r(T) = r((T^t)^t)$. Hence $r(T) = r(T^t)$. □
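In matrix terms (the matrix of $T^t$ with respect to dual bases is the transpose of the matrix of T, as shown in Section 3.6), Corollary 3.4.6 is the statement that a matrix and its transpose have the same rank, and Proposition 3.4.3 bounds the rank of a product. A minimal Python sketch checking both on sample matrices (exact arithmetic over $\mathbb{Q}$; the helper names are illustrative):

```python
from fractions import Fraction


def rank(A):
    """Rank of a matrix over Q by Gaussian elimination (exact arithmetic)."""
    R = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):  # clear the entries below the pivot
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(B, A):
    """Matrix of T2 o T1 when B represents T2 and A represents T1."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]


A1 = [[1, 2], [2, 4], [0, 1]]  # T1 : F^2 -> F^3
A2 = [[1, 0, 1], [2, 0, 2]]    # T2 : F^3 -> F^2
print(rank(A1), rank(transpose(A1)))             # 2 2
print(rank(matmul(A2, A1)), rank(A2), rank(A1))  # 1 1 2
```

The second line illustrates $r(T_2 \circ T_1) \le \min(r(T_2), r(T_1))$; the inequality can be strict in general, although here it happens to be attained by $r(T_2)$.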

3.5 Matrix Representations of Linear Transformations

Let $V_1$ and $V_2$ be vector spaces of dimensions m and n, respectively, over a field F. Let $\{x_1, x_2, \ldots, x_m\}$ be a basis of $V_1$, and $\{y_1, y_2, \ldots, y_n\}$ a basis of $V_2$. We have a map $M^{y_1, y_2, \ldots, y_n}_{x_1, x_2, \ldots, x_m}$ from $\mathrm{Hom}_F(V_1, V_2)$ to $M_{nm}(F)$ defined by

$$M^{y_1, y_2, \ldots, y_n}_{x_1, x_2, \ldots, x_m}(T) = [a_{ij}],$$ where

$$T(x_j) = \sum_{i=1}^{n} a_{ij} y_i.$$

This map $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}$ is called the matrix representation map of linear transformations with respect to the bases $\{x_1, x_2, \ldots, x_m\}$ and $\{y_1, y_2, \ldots, y_n\}$ of $V_1$ and $V_2$, respectively. Suppose that

$$M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T) = M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T') = [a_{ij}].$$

Then

$$T(x_j) = \sum_{i=1}^{n} a_{ij} y_i = T'(x_j)$$

for each j. But then T and T′ have the same effect on the basis $\{x_1, x_2, \ldots, x_m\}$. This means that T = T′. Hence $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}$ is an injective map. Further, given any $[a_{ij}] \in M_{nm}(F)$, there is a unique linear transformation T from $V_1$ to $V_2$ whose effect on the basis $\{x_1, x_2, \ldots, x_m\}$ is given by

$$T(x_j) = \sum_{i=1}^{n} a_{ij} y_i.$$

Clearly, $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T) = [a_{ij}]$. Thus, $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}$ is a bijective map. Let $T_1, T_2$ be members of $\mathrm{Hom}_F(V_1, V_2)$, and α, β ∈ F. Suppose that $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T_1) = [a_{ij}]$ and $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T_2) = [b_{ij}]$. Then $T_1(x_j) = \sum_{i=1}^{n} a_{ij} y_i$ and $T_2(x_j) = \sum_{i=1}^{n} b_{ij} y_i$. But then

$$(\alpha T_1 + \beta T_2)(x_j) = \sum_{i=1}^{n} (\alpha a_{ij} + \beta b_{ij})\, y_i.$$

Hence

$$M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(\alpha T_1 + \beta T_2) = \alpha[a_{ij}] + \beta[b_{ij}] = \alpha M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T_1) + \beta M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T_2).$$

This proves the following proposition.

Proposition 3.5.1 The matrix representation map $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}$ from $\mathrm{Hom}_F(V_1, V_2)$ to $M_{nm}(F)$ with respect to bases $\{x_1, x_2, \ldots, x_m\}$ of $V_1$ and $\{y_1, y_2, \ldots, y_n\}$ of $V_2$ is a vector space isomorphism. □

Next, let $V_1$ be a vector space with a basis $\{x_1, x_2, \ldots, x_m\}$, $V_2$ a vector space with a basis $\{y_1, y_2, \ldots, y_n\}$, and $V_3$ a vector space with a basis $\{z_1, z_2, \ldots, z_p\}$, all over the same field F. Let $T_1 : V_1 \longrightarrow V_2$ and $T_2 : V_2 \longrightarrow V_3$ be linear transformations given by $T_1(x_j) = \sum_{i=1}^{n} a_{ij} y_i$ and $T_2(y_i) = \sum_{k=1}^{p} b_{ki} z_k$. Then

$$(T_2 \circ T_1)(x_j) = T_2\Big(\sum_{i=1}^{n} a_{ij} y_i\Big) = \sum_{i=1}^{n} a_{ij} T_2(y_i) = \sum_{i=1}^{n} a_{ij} \sum_{k=1}^{p} b_{ki} z_k = \sum_{k=1}^{p} \Big(\sum_{i=1}^{n} b_{ki} a_{ij}\Big) z_k = \sum_{k=1}^{p} c_{kj} z_k,$$

where $c_{kj} = \sum_{i=1}^{n} b_{ki} a_{ij}$. Thus,

$$M^{z_1, \ldots, z_p}_{x_1, \ldots, x_m}(T_2 \circ T_1) = [c_{kj}] = [b_{ki}][a_{ij}] = M^{z_1, \ldots, z_p}_{y_1, \ldots, y_n}(T_2)\, M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T_1).$$

This shows that the matrix representation map, with respect to a fixed choice of bases, also preserves products. In particular, we have the following proposition.

Proposition 3.5.2 Let V be a vector space of dimension n over a field F with a basis $\{x_1, x_2, \ldots, x_n\}$. Then $M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}$ is an isomorphism from the algebra $\mathrm{End}_F(V)$ of endomorphisms of V to the algebra $M_n(F)$ of n × n matrices with entries in F. □
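With standard bases, the matrix of a map can be read off by applying it to the basis vectors (column j is the image of the j-th basis vector), and the identity $M(T_2 \circ T_1) = M(T_2)\,M(T_1)$ can then be spot-checked. A minimal Python sketch under that assumption (standard bases only; `matrix_of` and the two sample maps are illustrative):

```python
def matrix_of(T, n):
    """Matrix of T: F^n -> F^m in standard bases; column j holds T(e_j)."""
    cols = [T(tuple(1 if k == j else 0 for k in range(n))) for j in range(n)]
    return [list(row) for row in zip(*cols)]


def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]


def T1(v):  # a sample linear map F^3 -> F^2
    return (v[0] + v[1] + v[2], v[0] - v[1] + v[2])


def T2(w):  # a sample linear map F^2 -> F^3
    return (2 * w[0], w[0] + w[1], 3 * w[1])


lhs = matrix_of(lambda v: T2(T1(v)), 3)           # M(T2 o T1)
rhs = matmul(matrix_of(T2, 2), matrix_of(T1, 3))  # M(T2) M(T1)
print(lhs)         # [[2, 2, 2], [2, 0, 2], [3, -3, 3]]
print(lhs == rhs)  # True
```

The order of the factors matters: the matrix of the map applied first stands on the right, matching $[c_{kj}] = [b_{ki}][a_{ij}]$ above.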

Corollary 3.5.3 Let V be a vector space of dimension n over a field F, and $\{x_1, x_2, \ldots, x_n\}$ a basis of V. Then $M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}$ induces an isomorphism from GL(V) to GL(n, F).

Proof GL(V) is the group of units of $\mathrm{End}_F(V)$, and GL(n, F) is that of $M_n(F)$. The result follows from the above proposition if we observe that an isomorphism between algebras induces an isomorphism between their groups of units. □

The following corollary is a consequence of the above corollary and Corollary 3.2.12.

Corollary 3.5.4 Let Fq denote a finite field with q elements. Then the order of the group (the number of nonsingular n × n matrices) GL(n, Fq ) is

$$(q^n - 1)(q^n - q) \cdots (q^n - q^{n-1}). \quad \square$$

Corollary 3.5.5 Let $V_1$ and $V_2$ be vector spaces over a field F of the same dimension n. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of $V_1$ and $\{y_1, y_2, \ldots, y_n\}$ a basis of $V_2$. Then a linear transformation T from $V_1$ to $V_2$ is an isomorphism if and only if $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(T)$ is an invertible n × n matrix. Also, if $T^{-1}$ exists, then

$$\big(M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(T)\big)^{-1} = M^{x_1, \ldots, x_n}_{y_1, \ldots, y_n}(T^{-1}).$$

Proof Clearly,

$$M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}(I_{V_1}) = I_n = M^{y_1, \ldots, y_n}_{y_1, \ldots, y_n}(I_{V_2}).$$

Further, since

$$M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}(T' \circ T) = M^{x_1, \ldots, x_n}_{y_1, \ldots, y_n}(T') \cdot M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(T)$$ and

$$M^{y_1, \ldots, y_n}_{y_1, \ldots, y_n}(T \circ T') = M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(T) \cdot M^{x_1, \ldots, x_n}_{y_1, \ldots, y_n}(T')$$

for all linear transformations T from $V_1$ to $V_2$ and all linear transformations T′ from $V_2$ to $V_1$, and since the matrix representation maps are bijective, T is invertible if and only if its matrix is invertible. Evidently, if $T^{-1}$ exists, then

$$\big(M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(T)\big)^{-1} = M^{x_1, \ldots, x_n}_{y_1, \ldots, y_n}(T^{-1}). \quad \square$$

In particular, we have the following corollary.

Corollary 3.5.6 Let V be a vector space of dimension n. Let $\{x_1, x_2, \ldots, x_n\}$ and $\{y_1, y_2, \ldots, y_n\}$ be bases of V. Then $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(I_V)$ is invertible, and its inverse is $M^{x_1, \ldots, x_n}_{y_1, \ldots, y_n}(I_V)$. □

The matrix $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(I_V)$ is called the matrix of transformation from the basis $\{x_1, x_2, \ldots, x_n\}$ to the basis $\{y_1, y_2, \ldots, y_n\}$. Thus, $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_n}(I_V) = [a_{ij}]$, where $x_j = \sum_{i=1}^{n} a_{ij} y_i$.

Example 3.5.7 Define a map $T : \mathbb{R}^3 \longrightarrow \mathbb{R}^2$ by T((a, b, c)) = (a + b + c, a − b + c). Then T is a linear transformation (verify). Let $x_1 = (1, 1, 1)$, $x_2 = (1, 2, 1)$, $x_3 = (1, 2, 0)$, $y_1 = (1, 2)$, and $y_2 = (1, 1)$. Suppose that $a_1 x_1 + a_2 x_2 + a_3 x_3 = (0, 0, 0)$. Then $a_1 + a_2 + a_3 = 0$, $a_1 + 2a_2 + 2a_3 = 0$, and $a_1 + a_2 = 0$. Solving, we get $a_1 = a_2 = a_3 = 0$. This shows that $\{x_1, x_2, x_3\}$ is linearly independent. Since the dimension of $\mathbb{R}^3$ is 3, it follows that $\{x_1, x_2, x_3\}$ is a basis of $\mathbb{R}^3$. Similarly, $\{y_1, y_2\}$ is a basis of $\mathbb{R}^2$. Now, suppose that $M^{y_1, y_2}_{x_1, x_2, x_3}(T) = [a_{ij}]$. Then $(3, 1) = T((1, 1, 1)) = T(x_1) = a_{11} y_1 + a_{21} y_2 = (a_{11} + a_{21}, 2a_{11} + a_{21})$. This shows that $a_{11} + a_{21} = 3$ and $2a_{11} + a_{21} = 1$. Solving, we get $a_{11} = -2$, $a_{21} = 5$. Similarly, looking at $T(x_2) = (4, 0)$ and $T(x_3) = (3, -1)$, we find that $a_{12} = -4$, $a_{22} = 8$, $a_{13} = -4$, and $a_{23} = 7$. Thus,

$$M^{y_1, y_2}_{x_1, x_2, x_3}(T) = \begin{bmatrix} -2 & -4 & -4 \\ 5 & 8 & 7 \end{bmatrix}.$$
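The computation in Example 3.5.7 can be checked mechanically: each column of the matrix is the coordinate vector of $T(x_j)$ with respect to $\{y_1, y_2\}$, found here by Cramer's rule for a 2 × 2 system, and a nonzero determinant confirms that $\{x_1, x_2, x_3\}$ is a basis. A short Python sketch with exact arithmetic via `fractions.Fraction` (the helper names are illustrative):

```python
from fractions import Fraction


def T(v):
    a, b, c = v
    return (a + b + c, a - b + c)


x = [(1, 1, 1), (1, 2, 1), (1, 2, 0)]
y1, y2 = (1, 2), (1, 1)


def det3(u, v, w):
    """Determinant of the 3 x 3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def coords(w):
    """(s, t) with w = s*y1 + t*y2, by Cramer's rule."""
    d = Fraction(y1[0] * y2[1] - y2[0] * y1[1])
    return ((w[0] * y2[1] - y2[0] * w[1]) / d,
            (y1[0] * w[1] - w[0] * y1[1]) / d)


assert det3(*x) != 0              # {x1, x2, x3} is a basis of R^3
cols = [coords(T(v)) for v in x]  # column j = coordinates of T(x_j)
M = [[int(c[i]) for c in cols] for i in range(2)]  # entries are integers here
print(M)  # [[-2, -4, -4], [5, 8, 7]]
```

The conversion with `int` is safe only because every entry in this example happens to be an integer; in general one would keep the `Fraction` values.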

3.6 Effect of Change of Bases on Matrix Representation

Proposition 3.6.1 Matrix representations of a linear transformation with respect to different pairs of bases are equivalent to each other. Conversely, if A and B are equivalent matrices, then they represent the same linear transformation with respect to a suitable pair of bases.

Proof Let $V_1$ and $V_2$ be vector spaces over a field F of dimensions m and n, respectively. Let T from $V_1$ to $V_2$ be a linear transformation. Let $\{x_1, \ldots, x_m\}$ and $\{x'_1, \ldots, x'_m\}$ be bases of $V_1$, and $\{y_1, \ldots, y_n\}$ and $\{y'_1, \ldots, y'_n\}$ bases of $V_2$. Since $T = I_{V_2} \circ T \circ I_{V_1}$ and matrix representation preserves products, we have

$$M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T) = M^{y_1, \ldots, y_n}_{y'_1, \ldots, y'_n}(I_{V_2})\, M^{y'_1, \ldots, y'_n}_{x'_1, \ldots, x'_m}(T)\, M^{x'_1, \ldots, x'_m}_{x_1, \ldots, x_m}(I_{V_1}).$$

By Corollary 3.5.6, $M^{y_1, \ldots, y_n}_{y'_1, \ldots, y'_n}(I_{V_2})$ and $M^{x'_1, \ldots, x'_m}_{x_1, \ldots, x_m}(I_{V_1})$ are nonsingular. This shows that the two matrix representations are equivalent. Conversely, let $A = [a_{ij}]$ and $B = [b_{ij}]$ be n × m matrices which are equivalent. Let P be an n × n nonsingular matrix and Q an m × m nonsingular matrix such that A = PBQ. Let $\{x_1, \ldots, x_m\}$ be a basis of $V_1$ and $\{y_1, \ldots, y_n\}$ a basis of $V_2$. Define T from $V_1$ to $V_2$ by

$$T(x_j) = \sum_{i=1}^{n} a_{ij} y_i,$$

so that $M^{y_1, \ldots, y_n}_{x_1, \ldots, x_m}(T) = A$. Let $P = [\mu_{ki}]$ and $Q^{-1} = [\nu_{lj}]$. Take $y'_i = \sum_{k=1}^{n} \mu_{ki} y_k$ and $x'_j = \sum_{l=1}^{m} \nu_{lj} x_l$. Since P and $Q^{-1}$ are invertible, $\{y'_1, \ldots, y'_n\}$ is a basis of $V_2$, and $\{x'_1, \ldots, x'_m\}$ is a basis of $V_1$. Also $M^{y_1, \ldots, y_n}_{y'_1, \ldots, y'_n}(I_{V_2}) = P$ and $M^{x_1, \ldots, x_m}_{x'_1, \ldots, x'_m}(I_{V_1}) = Q^{-1}$. But then, by Corollary 3.5.6,

$$M^{y'_1, \ldots, y'_n}_{x'_1, \ldots, x'_m}(T) = M^{y'_1, \ldots, y'_n}_{y_1, \ldots, y_n}(I_{V_2})\, A\, M^{x_1, \ldots, x_m}_{x'_1, \ldots, x'_m}(I_{V_1}) = P^{-1} A Q^{-1} = B. \quad \square$$

Since every matrix is equivalent to a matrix in normal form, we have the following corollary.

Corollary 3.6.2 Let T be a linear transformation from $V_1$ to $V_2$. Then there exist a basis $\{x_1, \ldots, x_m\}$ of $V_1$, a basis $\{y_1, \ldots, y_n\}$ of $V_2$, and $r \le \min(m, n)$ such that $T(x_i) = y_i$ for all $i \le r$, and $T(x_i) = 0$ for all $i > r$. □

Recall that two n × n matrices A and B are said to be similar if there is a nonsingular n × n matrix P such that $PAP^{-1} = B$.

Corollary 3.6.3 Let V be a vector space of dimension n over a field F. Let $\{x_1, \ldots, x_n\}$ and $\{x'_1, \ldots, x'_n\}$ be bases of V, and let T be an endomorphism of V. Then $M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}(T)$ is similar to $M^{x'_1, \ldots, x'_n}_{x'_1, \ldots, x'_n}(T)$. Conversely, if A and B are similar n × n matrices, then there are bases $\{x_1, \ldots, x_n\}$ and $\{x'_1, \ldots, x'_n\}$ of V, and a linear transformation T on V, such that $M^{x_1, \ldots, x_n}_{x_1, \ldots, x_n}(T) = A$ and $M^{x'_1, \ldots, x'_n}_{x'_1, \ldots, x'_n}(T) = B$.

Proof The result follows from Corollary 3.5.5 (look at its proof), if we observe that the inverse of $M^{x_1, \ldots, x_n}_{x'_1, \ldots, x'_n}(I_V)$ is $M^{x'_1, \ldots, x'_n}_{x_1, \ldots, x_n}(I_V)$. □

Example 3.6.4 Consider the usual vector space $\mathbb{R}^3$ over $\mathbb{R}$. Consider the bases $\{x_1, x_2, x_3\}$ and $\{y_1, y_2, y_3\}$ of $\mathbb{R}^3$, where $x_1 = (1, 1, 1)$, $x_2 = (1, 2, 4)$, $x_3 = (1, 3, 9)$, $y_1 = (1, 2, 4)$, $y_2 = (1, 3, 9)$, and $y_3 = (1, 4, 16)$ (verify that these are bases). Let T be a linear transformation such that

$$M^{x_1, x_2, x_3}_{x_1, x_2, x_3}(T) = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.$$

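Suppose that $M^{y_1, y_2, y_3}_{x_1, x_2, x_3}(I_{\mathbb{R}^3}) = [a_{ij}]$. Then $x_1 = I_{\mathbb{R}^3}(x_1) = a_{11} y_1 + a_{21} y_2 + a_{31} y_3$. Hence $1 = a_{11} + a_{21} + a_{31}$, $1 = 2a_{11} + 3a_{21} + 4a_{31}$, $1 = 4a_{11} + 9a_{21} + 16a_{31}$. Solving, $a_{11} = 3$, $a_{21} = -3$, $a_{31} = 1$. Similarly, looking at the representations of $x_2$ and $x_3$ in terms of $y_1$, $y_2$ and $y_3$ (note that $x_2 = y_1$ and $x_3 = y_2$), we find that $a_{12} = 1$, $a_{22} = a_{32} = a_{13} = a_{33} = 0$, and $a_{23} = 1$. Thus,

$$M^{y_1, y_2, y_3}_{x_1, x_2, x_3}(I_{\mathbb{R}^3}) = \begin{bmatrix} 3 & 1 & 0 \\ -3 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

Similarly,

$$M^{x_1, x_2, x_3}_{y_1, y_2, y_3}(I_{\mathbb{R}^3}) = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 3 \end{bmatrix}.$$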

By Corollary 3.5.6, the above two matrices are inverses of each other. Further,

$$M^{y_1, y_2, y_3}_{y_1, y_2, y_3}(T) = M^{y_1, y_2, y_3}_{x_1, x_2, x_3}(I_{\mathbb{R}^3})\, M^{x_1, x_2, x_3}_{x_1, x_2, x_3}(T)\, M^{x_1, x_2, x_3}_{y_1, y_2, y_3}(I_{\mathbb{R}^3}).$$

Substituting the values and multiplying, we obtain

$$M^{y_1, y_2, y_3}_{y_1, y_2, y_3}(T) = \begin{bmatrix} 5 & 1 & -9 \\ -2 & 3 & 13 \\ 1 & 0 & -2 \end{bmatrix}.$$
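Example 3.6.4 can likewise be verified with exact arithmetic. The sketch below (Python with `fractions.Fraction`; `solve`, `change_matrix` and `matmul` are illustrative helpers, and `solve` assumes its input really is a basis) computes each change-of-basis matrix column by column by solving a linear system, checks that the two matrices are inverse to each other, and forms the product in the last display.

```python
from fractions import Fraction


def solve(basis, w):
    """Coordinates c with sum_i c[i] * basis[i] == w, by Gauss-Jordan elimination over Q."""
    n = len(basis)
    M = [[Fraction(basis[i][r]) for i in range(n)] + [Fraction(w[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # a basis gives a pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def change_matrix(new, old):
    """M^{new}_{old}(I): column j holds the coordinates of old[j] in the basis `new`."""
    cols = [solve(new, v) for v in old]
    return [[cols[j][i] for j in range(len(old))] for i in range(len(new))]


def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]


x = [(1, 1, 1), (1, 2, 4), (1, 3, 9)]
y = [(1, 2, 4), (1, 3, 9), (1, 4, 16)]
A = [[1, 1, 0], [0, 2, 1], [1, 1, 3]]  # M^{x}_{x}(T)

P = change_matrix(y, x)  # M^{y}_{x}(I)
Q = change_matrix(x, y)  # M^{x}_{y}(I)
print(matmul(P, Q) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
print([[int(e) for e in row] for row in matmul(matmul(P, A), Q)])
# [[5, 1, -9], [-2, 3, 13], [1, 0, -2]]
```

Because $Q = P^{-1}$, the last product is $PAP^{-1}$, which exhibits $M^{y_1, y_2, y_3}_{y_1, y_2, y_3}(T)$ as similar to $M^{x_1, x_2, x_3}_{x_1, x_2, x_3}(T)$, in line with Corollary 3.6.3.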

Proposition 3.6.5 Let $V_1$ and $V_2$ be vector spaces over a field F. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of $V_1$ and $\{y_1, y_2, \ldots, y_m\}$ a basis of $V_2$. Let $\{x_1^*, x_2^*, \ldots, x_n^*\}$ and $\{y_1^*, y_2^*, \ldots, y_m^*\}$ be the corresponding dual bases of $V_1^*$ and $V_2^*$, respectively. Let T be a linear transformation from $V_1$ to $V_2$. Then

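$$\big(M^{y_1, \ldots, y_m}_{x_1, \ldots, x_n}(T)\big)^t = M^{x_1^*, \ldots, x_n^*}_{y_1^*, \ldots, y_m^*}(T^t).$$

Proof Let

$$M^{y_1, \ldots, y_m}_{x_1, \ldots, x_n}(T) = [a_{ij}].$$

Then

$$T(x_j) = \sum_{i=1}^{m} a_{ij} y_i.$$

Suppose that

$$T^t(y_k^*) = \sum_{j=1}^{n} b_{jk} x_j^*.$$

By definition, $T^t(y_k^*) = y_k^* \circ T$. Hence

$$(y_k^* \circ T)(x_l) = \Big(\sum_{j=1}^{n} b_{jk} x_j^*\Big)(x_l),$$ i.e.,

$$y_k^*(T(x_l)) = b_{lk}.$$

Now

$$y_k^*(T(x_l)) = y_k^*\Big(\sum_{i=1}^{m} a_{il} y_i\Big) = \sum_{i=1}^{m} a_{il}\, y_k^*(y_i) = a_{kl}.$$

Thus $b_{lk} = a_{kl}$ for all k, l, that is, $M^{x_1^*, \ldots, x_n^*}_{y_1^*, \ldots, y_m^*}(T^t) = [b_{jk}] = [a_{ij}]^t$. □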

Exercises

3.6.1 Let F be a field. Show that the vector space $F^n$ is isomorphic to the vector space $F^m$ if and only if n = m.

3.6.2 Define a map T from $\mathbb{R}^3$ to $\mathbb{R}^3$ by T((x, y, z)) = (x − y, y − z, z − x). Show that T is a linear transformation. Find its matrix representation with respect to the standard bases. Also find its matrix representation with respect to the basis {(1, 1, 1), (1, 2, 3), (1, 4, 9)} of the domain, and the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)} of the range. Show that $T(\mathbb{R}^3)$ is the subspace of $\mathbb{R}^3$ represented by the plane x + y + z = 0. What is N(T)? Find the rank and the nullity of T.

3.6.3 Let $a = (a_1, a_2, a_3)$ be a fixed vector in $\mathbb{R}^3$. Define a map T from $\mathbb{R}^3$ to $\mathbb{R}$ by

T(r) = r · a (scalar product).

Show that T is a linear transformation. Interpret the kernel of T geometrically if a ≠ 0. Find its matrix representation with respect to the standard bases. What are the rank and the nullity of T?

3.6.4 Let (l, m, n) ≠ (0, 0, 0) be a fixed vector, and consider the subspace W = {(la, ma, na) | a ∈ ℝ} of $\mathbb{R}^3$. Show that the quotient space $\mathbb{R}^3/W$ is isomorphic to the subspace represented by the plane lx + my + nz = 0.

3.6.5 Let f be a linear functional on $\mathbb{R}^3$. Show that there is a vector a in $\mathbb{R}^3$ such that f(r) = r · a (the scalar product).

3.6.6 Determine a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^3$ whose kernel is the plane lx + my + nz = 0.

3.6.7 Find the number of linear transformations on a vector space V of dimension n over a field $F_q$ containing q elements.

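3.6.8 Let V be a vector space of dimension n over a field F. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of V, and $\{x_1^*, x_2^*, \ldots, x_n^*\}$ its dual basis. Let $T_1$ and $T_2$ be linear transformations on V. Show that $T_1 \circ T_2 - T_2 \circ T_1$ is also a linear transformation on V. Show that $\sum_{i=1}^{n} \big(x_i^* \circ (T_1 \circ T_2 - T_2 \circ T_1)\big)(x_i) = 0$. Deduce that, if the characteristic of F does not divide n, then $T_1 \circ T_2 - T_2 \circ T_1$ can never be the identity map.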

3.6.9 Let T be a linear transformation on V. Let us call T a nilpotent endomorphism if $T^m = 0$ for some m. Suppose that T is nilpotent. Show that $I_V + T$ is an automorphism of V. Find the inverse of $I_V + T$ if $T^m = 0$.

3.6.10 Let $T \in \mathrm{End}(V) = \mathrm{Hom}_F(V, V)$. Let $f(X) \in F[X]$. Define an element $f(T) \in \mathrm{End}(V)$ by

$$f(T) = a_0 I + a_1 T + \cdots + a_m T^m,$$

where $f(X) = a_0 + a_1 X + a_2 X^2 + \cdots + a_m X^m$. Suppose that V is finite dimensional. Show that there is a nonzero polynomial $f(X) \in F[X]$ such that f(T) = 0. Hint. If dim V = n, then $\dim \mathrm{End}(V) = n^2$, and so $\{I_V, T, T^2, \ldots, T^{n^2}\}$ is linearly dependent.

3.6.11 Let T ∈ End(V). Show that V is an F[X]-module with respect to the external operation · given by f(X) · v = f(T)(v).

3.6.12 Let $T : \mathbb{R}^2 \longrightarrow \mathbb{R}^2$ be a nonzero linear transformation which preserves angles between vectors, in the sense that if P and Q are points in $\mathbb{R}^2$, then the angle between OP and OQ, where O is the origin, is the same as the angle between OT(P) and OT(Q). Show that, up to a positive scalar multiple, either T is a reflection about a line passing through the origin, or it is a rotation in the plane through an angle α. Hint. Replacing T by a suitable positive scalar multiple of itself, we may assume that T((1, 0)) has length 1. Suppose that there is a point P different from the origin such that T(P) = P. Then show that T = I, or T is the reflection about the line passing through O and P. Next, if T fixes no point other than O, then show that T((1, 0)) = (cos α, sin α) for some α, and then show that T((x, y)) = (x cos α − y sin α, x sin α + y cos α).

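3.6.13 Show that any angle-preserving endomorphism of $\mathbb{R}^3$ is, up to a positive scalar multiple, either a rotation about a fixed axis, or a rotation about a fixed axis followed by the reflection about the plane through the origin perpendicular to that axis.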

3.6.14 Let $\wp_n$ denote the vector space of polynomials over the field $\mathbb{R}$ of real numbers of degree at most n. Define a map T from $\wp_n$ to itself by

$$T(a_0 + a_1 X + a_2 X^2 + \cdots + a_n X^n) = a_1 + 2a_2 X + 3a_3 X^2 + \cdots + n a_n X^{n-1}.$$

Show that T is a linear transformation. Find its rank and nullity. Is T invertible?

3.6.15 Let $C^\infty(\mathbb{R})$ denote the vector space of real-valued functions on $\mathbb{R}$ which are r-times continuously differentiable for all r. Define a linear transformation $D^2 - 2D + 1$ from $C^\infty(\mathbb{R})$ to itself by

$$(D^2 - 2D + 1)(f(X)) = \frac{d^2 f(X)}{dX^2} - 2\,\frac{df(X)}{dX} + f(X).$$

Find the nullity of D2 − 2D + 1, and also a basis of the kernel.

3.6.16 Let V be a vector space of dimension m over a field Fq of order q, and W a vector space of dimension n over Fq . Suppose that m ≤ n. Find the number of injective linear transformations from V to W.

3.6.17 Suppose that m ≥ n in the above exercise. Find the number of surjective linear transformations from V to W.

3.6.18 Let V be a vector space of dimension n over a field F. Let $\{e_1, e_2, \ldots, e_n\}$ be an ordered basis of V. Let p be a permutation in $S_n$. Then we have a map $T_p$ from the ordered basis $\{e_1, e_2, \ldots, e_n\}$ to V given by $T_p(e_i) = e_{p(i)}$; denote its unique extension to a linear transformation on V again by $T_p$. Show that $p \mapsto T_p$ defines an injective homomorphism from $S_n$ to the group GL(V) of all automorphisms of V. Deduce that every group of order n is isomorphic to a subgroup of GL(V).

3.6.19 Let V be a finite-dimensional vector space over a field F. Let T, S ∈ End(V) be such that $S \circ T = I_V$ (respectively, $T \circ S = I_V$). Show that $T \circ S = I_V$ (respectively, $S \circ T = I_V$).

3.6.20 Show by means of an example that the above result is not true for infinite-dimensional spaces. Hint. Let V be the vector space of all real-valued continuous functions on [1, ∞) over the field $\mathbb{R}$ of real numbers. Consider the map T given by

$$T(f)(x) = \int_1^x f(y)\, dy.$$

Use the fundamental theorem of integral calculus.

3.6.21 Let V be a vector space of dimension n over a field F, and let T be an element of the center of End(V). Then $T = \alpha I_V$ for some α ∈ F.

Proof Let x ∈ V, x ≠ 0. We show that there is a $\lambda_x \in F$ such that $T(x) = \lambda_x x$. Suppose not. Then {x, T(x)} is linearly independent, and so it can be embedded in a basis $\{x, T(x), x_3, \ldots, x_n\}$ of V. Define a linear transformation S by S(x) = x, S(T(x)) = 0, and $S(x_i) = 0$ for all i ≥ 3. Then ST(x) = 0, whereas TS(x) = T(x) ≠ 0, contradicting ST = TS. Thus, for every nonzero x ∈ V there exists a $\lambda_x \in F$ such that $T(x) = \lambda_x x$. Now we show that $\lambda_x = \lambda_y$ whenever x ≠ 0 ≠ y. Suppose that $\lambda_x \neq \lambda_y$. Then {x, y} is linearly independent (if y = cx, then $T(y) = cT(x) = \lambda_x y$, so that $\lambda_y = \lambda_x$); in particular x − y ≠ 0, and $\lambda_{x-y}(x - y) = T(x - y) = \lambda_x x - \lambda_y y$. But then $(\lambda_x - \lambda_{x-y})x = (\lambda_y - \lambda_{x-y})y$. Since {x, y} is linearly independent, $\lambda_x = \lambda_{x-y} = \lambda_y$, a contradiction. This shows that $T = \lambda I_V$ for some λ ∈ F. □

3.6.22 Let V be a vector space of dimension n over a field F. Let T be a linear transformation on V which commutes with each element of the group GL(V). Then

T = αIV for some α ∈ F. In particular, Z(GL(V )) ={αIV | α ∈ F }.

Proof We again show that for each x ∈ V, there is an element λx ∈ F such that T (x) = λx x. The rest will follow as in the above exercise. Suppose that there is a x ∈ V, x = 0 for which there is no λ ∈ F such that T (x) = λx. Then {x, T (x)} is linearly independent, and so it can be embedded in a basis {x, T (x), x3,...xn} of V .LetS be a linear transformation defined by S(x) = x, S(T (x)) =−T (x), and S(xi ) = xi for all i. Then, since S takes a basis to a basis, it is an element of GL(V ). Further, TS(x) = T (x) where as ST(x) =−T (x). Since T (x) = 0, TS= ST. 

3.6.23 Let V be a finite-dimensional vector space over a field F. Let 𝒜 be a nonzero subspace of EndF(V) such that T∘S and S∘T belong to 𝒜 for all S ∈ End(V) and T ∈ 𝒜. Then 𝒜 = End(V) (in the language of ring theory, this is expressed by saying that the ring End(V) has no proper nonzero two-sided ideals).

Proof Let T be a nonzero linear transformation in 𝒜, and let {x1, x2, ..., xn} be a basis of V. For 1 ≤ i, j ≤ n, let Tji denote the linear transformation with Tji(xj) = xi and Tji(xk) = 0 for k ≠ j. Since T ≠ 0, T(xi) ≠ 0 for some i. Without any loss of generality, we may assume that T(x1) ≠ 0. Take any i. There is a linear transformation S such that S(T(x1)) = xi (embed {T(x1)} into a basis). There is also a linear transformation S′ such that S′(x1) = x1 and S′(xj) = 0 for all j ≥ 2. Thus, STS′(x1) = xi, and STS′(xj) = 0 for all j ≥ 2. Hence STS′ = T1i. By the hypothesis, T1i ∈ 𝒜. Also Tji = T1i Tj1 ∈ 𝒜 for all j. Thus, 𝒜 is a subspace of EndF(V) containing all Tji. Since {Tji | 1 ≤ i ≤ n, 1 ≤ j ≤ n} is a basis of the vector space EndF(V), 𝒜 = EndF(V). □

3.6.24 Let V be a finite-dimensional vector space over F. Show that GL(V) generates EndF(V) as a vector space.

3.6.25 Let T ∈ EndF(V) be such that $T^2 = T$, where V is finite dimensional (such a transformation is called idempotent). Show that every element x of V can be uniquely expressed as x = y + z, where T(y) = 0 and T(z) = z.

3.6.26 Let T ∈ EndF (V ) be nilpotent, and f (X) ∈ F[X]. Show that f (T ) is an automorphism of V if and only if f (X) has a nonzero constant term.

3.6.27 Let T ∈ EndF(V), and dim(V) = n. Show that there is a monic polynomial f(X) such that f(T) = 0. The monic polynomial of smallest degree with this property is called the minimum polynomial of T. Show that T is an isomorphism if and only if the minimum polynomial of T has a nonzero constant term. Deduce that $T^{-1} = g(T)$ for some polynomial g(X).

3.6.28 Show that g(T ) = 0 if and only if the minimum polynomial of T divides g(X).

3.6.29 Let V be a finite-dimensional vector space over a field F. Let T be a nonzero element of EndF(V). Show that there is an S ∈ EndF(V) such that ST ≠ 0 and $(ST)^2 = ST$.

3.6.30 Let T ∈ EndF(V) − GL(V) − {0}. Show that there is an S ∈ EndF(V) such that ST = 0 but TS ≠ 0.

3.6.31 Find the minimum polynomial of the linear transformation in Exercise 3.6.2.

3.6.32 Show that Z(GL(V)) is isomorphic to the multiplicative group F* of nonzero elements of F.

3.6.33 Find the order of a Sylow p-subgroup of GL(V ), where V is a vector space of dimension n over Zp. Find also a Sylow p-subgroup.

3.6.34 Let V and W be vector spaces of dimensions n and m, respectively, over a finite field Fq of order q. Let r ≤ min(n, m). Find the number of linear transformations of rank r.

3.6.35 Let T1 and T2 be endomorphisms of a vector space V of dimension n over a field F. Show that

$$r(T_1 \circ T_2) \ge r(T_1) + r(T_2) - n.$$

3.6.36 Let V be a vector space of dimension 2 over Z2. Show that GL(V) is isomorphic to S3.
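For Exercise 3.6.36, one possible route (assumed here, not taken from the text) is to let GL(V) act on the three nonzero vectors of V = Z2 × Z2. The sketch below uses hypothetical helper names and checks by enumeration that GL(V) has six elements and that this action gives an injective homomorphism into the symmetric group on three letters. Since both groups have order 6, the homomorphism is an isomorphism.

```python
from itertools import product

NONZERO = [(0, 1), (1, 0), (1, 1)]  # the three nonzero vectors of Z_2^2

def matmul2(A, B):
    """Product of 2 x 2 matrices over Z_2."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2 for j in range(2)) for i in range(2))

def act(A, v):
    """A applied to the column vector v over Z_2."""
    return tuple(sum(A[i][k] * v[k] for k in range(2)) % 2 for i in range(2))

def as_permutation(A):
    """Permutation of the nonzero vectors induced by A, as a tuple of indices into NONZERO."""
    return tuple(NONZERO.index(act(A, v)) for v in NONZERO)

# All invertible 2 x 2 matrices over Z_2 (determinant 1 mod 2).
GL = [((a, b), (c, d)) for a, b, c, d in product(range(2), repeat=4) if (a * d - b * c) % 2 == 1]

images = {as_permutation(A) for A in GL}
# Homomorphism: the permutation of AB is the permutation of A composed with that of B.
is_hom = all(
    as_permutation(matmul2(A, B)) == tuple(as_permutation(A)[j] for j in as_permutation(B))
    for A in GL for B in GL
)
```

Six distinct images among the six permutations of three letters show that the map is bijective.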

3.6.37 Let T ∈ EndF(V), where V is a vector space of dimension n. Suppose that $T^r = 0$ for some r ≥ 1. Show that $T^n = 0$.

3.6.38 Show that every linear functional f on V defines a linear transformation Tf from EndF(V) to the dual space V* by Tf(T) = f∘T. Let V be a vector space over F, and {x1, x2, ..., xn} a basis of V. Consider p = p1 + p2 + ··· + pn, where pi is the ith projection linear functional on V with respect to the given basis. Show by an example that p, in general, depends on the choice of basis. Show, however, that the linear functional Tr on EndF(V) defined by Tr(T) = p1(T(x1)) + p2(T(x2)) + ··· + pn(T(xn)) is independent of the choice of basis of V. This is called the trace form on EndF(V). Show that TS − ST ∈ Ker Tr for all S, T ∈ EndF(V). Does Tr(T) = 0 imply that T = RS − SR for some R, S ∈ EndF(V)?
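The identity TS − ST ∈ Ker Tr of Exercise 3.6.38 reduces to Tr(TS) = Tr(ST). The same identity also yields basis independence, since $\mathrm{Tr}(P^{-1}TP) = \mathrm{Tr}(TPP^{-1}) = \mathrm{Tr}(T)$. The following Python sketch spot-checks the first identity on random integer matrices. For the experiment only, it represents endomorphisms of a 4-dimensional space by their matrices in a fixed basis, and the names are hypothetical. A numerical check of this kind is no substitute for the proof the exercise asks for.

```python
import random

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(A):
    """Sum of diagonal entries, i.e. p_1(T(x_1)) + ... + p_n(T(x_n)) for the matrix of T."""
    return sum(A[i][i] for i in range(len(A)))

def commutator_traces(n, trials, seed=0):
    """Traces of TS - ST for random integer n x n matrices S, T (exact integer arithmetic)."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        S = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        T = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        TS, ST = matmul(T, S), matmul(S, T)
        out.append(trace([[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(TS, ST)]))
    return out
```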

3.6.39 Let f be a linear functional on EndF (V ), where V is as in the above exercise. Suppose that f (AB − BA) = 0 for all A, B ∈ EndF (V ), and f (IV ) = n. Show that f = Tr.

3.6.40 Let V be a finite-dimensional vector space over a field F of characteristic 0 (for example, F = R or C), and let A, B ∈ EndF(V). Show that AB − BA can never be equal to IV. Deduce that IV + AB − BA can never be nilpotent.

3.6.41 Show that the subgroup of the additive group EndF (V ) generated by GL(V ) is EndF (V ).

3.6.42 Let T be a linear transformation from V to V. Show that the following conditions are equivalent: (i) N(T) ∩ image T = {0}. (ii) $T^2(x) = 0$ implies that T(x) = 0.

3.6.43 Let T be a linear transformation on V such that the rank of T is the same as the rank of $T^2$. Show that Ker T ∩ image T = {0}.

Chapter 4 Inner Product Spaces

In the vector space theory, we have talked about points, lines, and planes as translates of subspaces of a vector space. In this chapter, we shall talk about the concepts of the angle between lines (planes), the distance between points, shortest distances between planes, areas, volumes of parallelepipeds, etc. We also discuss rigid motions in an inner product space. For this purpose, we enrich the structure of a vector space by introducing the concept of an inner product. We have to consider vector spaces over particular types of fields: all fields F in this chapter will be either the field R of real numbers or the field C of complex numbers. We have a field automorphism α ↦ ᾱ from C to itself, called complex conjugation. The restriction of complex conjugation to R is the identity map.

4.1 Definition, Examples, and Basic Properties

Definition 4.1.1 Let F be the field R of real numbers or the field C of complex numbers, and V a vector space over F. A map $\langle\,,\rangle$ from V × V to F (the image of (x, y) under $\langle\,,\rangle$ is denoted by $\langle x, y\rangle$) is called an inner product (a real inner product if F = R, and a complex inner product if F = C) on V if the following hold:
1. $\langle \alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$ for all α, β ∈ F, and x, y, z ∈ V.
2. $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for all x, y ∈ V. In particular, $\langle x, x\rangle = \overline{\langle x, x\rangle}$ for all x ∈ V, and so $\langle x, x\rangle$ is a real number for all x ∈ V.
3. $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ if and only if x = 0.
A vector space V together with an inner product $\langle\,,\rangle$ on V is called an inner product space.

© Springer Nature Singapore Pte Ltd. 2017 R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_4

Putting α = 1 = β in 1, we obtain that

< x + y, z > = < x, z > + < y, z >, and putting β = 0 in 1, we obtain that

< αx, y > = α < x, y >.

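Next, using 2 and 1, we get

$$\langle x, \alpha y + \beta z\rangle = \overline{\langle \alpha y + \beta z, x\rangle} = \overline{\alpha\langle y, x\rangle + \beta\langle z, x\rangle} = \bar{\alpha}\,\overline{\langle y, x\rangle} + \bar{\beta}\,\overline{\langle z, x\rangle} = \bar{\alpha}\langle x, y\rangle + \bar{\beta}\langle x, z\rangle.$$

Thus, $\langle x, \alpha y + \beta z\rangle = \bar{\alpha}\langle x, y\rangle + \bar{\beta}\langle x, z\rangle$ for all x, y, z ∈ V, and α, β ∈ F. Putting α = 1 = β in the above equation, we obtain

$$\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle,$$

and putting β = 0, we get

$$\langle x, \alpha y\rangle = \bar{\alpha}\langle x, y\rangle.$$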

Further, $\langle 0, y\rangle = \langle 0\cdot 0, y\rangle = 0\cdot\langle 0, y\rangle = 0$, and similarly, $\langle x, 0\rangle = \overline{\langle 0, x\rangle} = 0$ for all x, y ∈ V.

Example 4.1.2 Let V = $\mathbb{R}^n$ be the Euclidean vector space of dimension n over R. Define $\langle x, y\rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n$, where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. This gives an inner product on $\mathbb{R}^n$ (verify). This inner product is called the usual (standard) Euclidean inner product on $\mathbb{R}^n$.

Example 4.1.3 We have another inner product $\langle\,,\rangle$ on $\mathbb{R}^3$ given by

$$\langle x, y\rangle = x_1y_1 + x_2y_2 + 2x_3y_3 + x_2y_3 + x_3y_2,$$

where $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ (verify).

Example 4.1.4 Let V = $\mathbb{C}^n$, the complex vector space of dimension n. Define $\langle\,,\rangle$ on $\mathbb{C}^n$ by

$$\langle x, y\rangle = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n.$$

Then it is a complex inner product (verify). This inner product space is called the standard unitary space.

Example 4.1.5 Let V = $\mathbb{C}^2$. Define $\langle\,,\rangle$ by

$$\langle x, y\rangle = 4(x_1\bar{y}_1 + x_2\bar{y}_2) + i(x_1\bar{y}_2 - x_2\bar{y}_1).$$

Then it is a complex inner product (verify).

Example 4.1.6 Let V denote the complex vector space of all complex-valued continuous functions on [0, 1]. Define $\langle\,,\rangle$ by

$$\langle f, g\rangle = \int_0^1 f(x)\overline{g(x)}\,dx.$$

Then $\langle\,,\rangle$ is a complex inner product (verify).

Example 4.1.7 Let $l^2$ denote the set of all real sequences {an} such that $\sum_{n=1}^{\infty} |a_n|^2 < \infty$. Then it is a vector space over R with respect to the usual addition of sequences and multiplication by scalars. We have an inner product on $l^2$ given by

$$\langle \{a_n\}, \{b_n\}\rangle = \sum_{n=1}^{\infty} a_nb_n.$$

Let $\langle\,,\rangle$ be a real (complex) inner product on $\mathbb{R}^n$ ($\mathbb{C}^n$). Let E = {e1, e2, ..., en} denote the standard basis of the vector space $\mathbb{R}^n$ ($\mathbb{C}^n$). The inner product $\langle\,,\rangle$ determines a matrix A = [aij], where $a_{ij} = \langle e_i, e_j\rangle$. Since $a_{ij} = \langle e_i, e_j\rangle = \langle e_j, e_i\rangle = a_{ji}$ (respectively, $a_{ij} = \overline{\langle e_j, e_i\rangle} = \overline{a_{ji}}$), A turns out to be symmetric (Hermitian). The matrix A, in turn, determines the inner product. Indeed, if $x = [x_1, x_2, \ldots, x_n] = x_1e_1 + x_2e_2 + \cdots + x_ne_n$ and $y = [y_1, y_2, \ldots, y_n] = y_1e_1 + y_2e_2 + \cdots + y_ne_n$, then $\langle x, y\rangle = xAy^t$ (respectively, $xA\bar{y}^t$). Not all symmetric (Hermitian) matrices determine inner products in the manner described above. For example, consider the real symmetric matrix A given by

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.$$

Then $[1, -1]A[1, -1]^t = -2 < 0$, although [1, −1] ≠ [0, 0]. As such, A does not determine an inner product (observe that A is also invertible). Indeed, as we shall see, a matrix A determines an inner product, as described, if and only if there is a nonsingular matrix B such that $A = BB^t$ ($A = B\bar{B}^t$).
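Whether a real symmetric matrix A determines an inner product through $\langle x, y\rangle = xAy^t$ can be tested mechanically. The sketch below does not use the criterion $A = BB^t$ quoted above. Instead, it swaps in Sylvester's criterion (all leading principal minors of a real symmetric matrix are positive), a standard equivalent test, computed with exact rational arithmetic from Python's `fractions` module. The helper names are ours. The sketch confirms that the matrix of Example 4.1.3 passes the test and that the matrix A above fails it.

```python
from fractions import Fraction

def det(M):
    """Exact determinant via Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return result

def is_positive_definite(M):
    """Sylvester's criterion for a real symmetric M: every leading principal minor is > 0."""
    n = len(M)
    return all(det([row[:k] for row in M[:k]]) > 0 for k in range(1, n + 1))

def form(x, M, y):
    """The bilinear form x M y^t determined by M (row-vector convention)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

A_413 = [[1, 0, 0], [0, 1, 1], [0, 1, 2]]  # matrix [<e_i, e_j>] of Example 4.1.3
A_bad = [[1, 2], [2, 1]]                   # the symmetric matrix discussed above
```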

Definition 4.1.8 Let $(V, \langle\,,\rangle)$ be an inner product space, and let x ∈ V. Then $\langle x, x\rangle$ is a nonnegative real number. Its nonnegative square root is called the length of the vector x, and it is denoted by ||x||. Thus, $\|x\| = +\sqrt{\langle x, x\rangle}$. Clearly, ||x|| = 0 if and only if x = 0.

Also, since $\langle \alpha x, \alpha x\rangle = \alpha\bar{\alpha}\langle x, x\rangle = |\alpha|^2\|x\|^2$, we have

$$\|\alpha x\| = |\alpha|\,\|x\|$$

for all α ∈ F and x ∈ V.

Theorem 4.1.9 (Cauchy–Schwarz inequality) Let $(V, \langle\,,\rangle)$ be an inner product space. Then $|\langle x, y\rangle| \le \|x\|\,\|y\|$ for all x, y ∈ V. The equality holds if and only if {x, y} is linearly dependent.

Proof If y = 0, then both sides are 0, and the equality holds. Assume that y ≠ 0. Then $\langle x - \alpha y, x - \alpha y\rangle \ge 0$ for all α ∈ F. Thus,

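$$\langle x, x\rangle - \bar{\alpha}\langle x, y\rangle - \alpha\langle y, x\rangle + \alpha\bar{\alpha}\langle y, y\rangle \ge 0.$$

Putting $\alpha = \frac{\langle x, y\rangle}{\langle y, y\rangle}$ in the above inequality, and noting that $\overline{\langle x, y\rangle} = \langle y, x\rangle$, we obtain

$$\langle x, x\rangle - \frac{\overline{\langle x, y\rangle}\langle x, y\rangle}{\langle y, y\rangle} - \frac{\langle x, y\rangle\overline{\langle x, y\rangle}}{\langle y, y\rangle} + \frac{\langle x, y\rangle\overline{\langle x, y\rangle}\langle y, y\rangle}{\langle y, y\rangle^2} \ge 0,$$

or $\langle x, y\rangle\overline{\langle x, y\rangle} \le \langle x, x\rangle\langle y, y\rangle$, or $|\langle x, y\rangle|^2 \le \|x\|^2\,\|y\|^2$.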

Taking square root, we obtain that

|< x, y >|≤||x || || y || .

If {x, y} is linearly independent, then x − αy ≠ 0 for all α ∈ F. Hence the inequality becomes a strict inequality, and so in this case

|< x, y >| < || x || || y || .

Conversely, if {x, y} is linearly dependent (and y ≠ 0), then x = αy for some α ∈ F. But then

$$|\langle x, y\rangle| = |\langle \alpha y, y\rangle| = |\alpha|\,|\langle y, y\rangle| = |\alpha|\,\|y\|^2 = |\alpha|\,\|y\|\,\|y\| = \|x\|\,\|y\|.$$ □
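The Cauchy–Schwarz inequality and its equality case can be spot-checked numerically in the standard unitary space of Example 4.1.4. The following Python sketch is an illustration under stated assumptions. It follows the convention of this chapter that the inner product is linear in the first argument, and it uses floating-point arithmetic, hence the small tolerance. Its function names are hypothetical.

```python
import math
import random

def inner(x, y):
    """Standard unitary inner product on C^n: <x, y> = sum_i x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def random_vector(rng, n):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

def cauchy_schwarz_holds(trials=1000, n=4, seed=1):
    """Check |<x, y>| <= ||x|| ||y|| on random complex vectors (small float tolerance)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = random_vector(rng, n), random_vector(rng, n)
        if abs(inner(x, y)) > norm(x) * norm(y) + 1e-12:
            return False
    return True

def equality_gap(alpha, y):
    """||x|| ||y|| - |<x, y>| for the dependent pair x = alpha * y; should be ~0."""
    x = [alpha * t for t in y]
    return norm(x) * norm(y) - abs(inner(x, y))
```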



Applying the Cauchy–Schwarz inequality for Examples 4.1.4, 4.1.6, and 4.1.7 respectively, we obtain the following corollaries.

Corollary 4.1.10 (Cauchy inequality). Let $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$ be complex numbers. Then

$$\Big|\sum_{i=1}^n x_i\bar{y}_i\Big| \le \sqrt{\sum_{i=1}^n |x_i|^2}\,\sqrt{\sum_{i=1}^n |y_i|^2}.$$

In particular, the inequality holds for real numbers also. □

Corollary 4.1.11 Let f and g be complex-valued continuous functions on [0, 1]. Then

$$\Big|\int_0^1 f(x)\overline{g(x)}\,dx\Big| \le \sqrt{\int_0^1 |f(x)|^2\,dx}\,\sqrt{\int_0^1 |g(x)|^2\,dx}.$$ □

Corollary 4.1.12 If {xn} and {yn} are sequences of real numbers such that $\sum_{n=1}^{\infty} |x_n|^2 < \infty$ and $\sum_{n=1}^{\infty} |y_n|^2 < \infty$, then

$$\Big|\sum_{n=1}^{\infty} x_ny_n\Big| \le \sqrt{\sum_{n=1}^{\infty} |x_n|^2}\,\sqrt{\sum_{n=1}^{\infty} |y_n|^2}.$$



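Find the inequalities arising from Examples 4.1.3 and 4.1.5.

Proposition 4.1.13 (Triangle inequality). Let $(V, \langle\,,\rangle)$ be an inner product space. Then ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V. Equality holds if and only if one of x, y is a nonnegative real multiple of the other.

Proof Writing Re for the real part, we have

$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$

the last inequality by Cauchy–Schwarz. Taking the square root, we get

$$\|x + y\| \le \|x\| + \|y\|.$$

Further, it is clear from the above that equality holds if and only if

$$\langle x, y\rangle + \langle y, x\rangle = 2\,\|x\|\cdot\|y\|, \quad\text{that is,}\quad \mathrm{Re}\langle x, y\rangle = \|x\|\cdot\|y\|.$$

Since $\mathrm{Re}\langle x, y\rangle \le |\langle x, y\rangle| \le \|x\|\,\|y\|$, this is so if and only if $|\langle x, y\rangle| = \|x\|\cdot\|y\|$ and $\langle x, y\rangle$ is a nonnegative real number. By the second part of the Cauchy–Schwarz inequality, the first condition means that {x, y} is linearly dependent. If y = 0, equality clearly holds; if y ≠ 0, write x = αy, so that $\langle x, y\rangle = \alpha\|y\|^2$ is a nonnegative real number if and only if α is. Hence equality holds if and only if one of x, y is a nonnegative real multiple of the other. □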

If we apply the above proposition to Examples 4.1.4, 4.1.6, and 4.1.7, we get the following corollaries:

Corollary 4.1.14 If $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$ are complex numbers (or, in particular, real numbers), then

$$\sqrt{\sum_{i=1}^n |x_i + y_i|^2} \le \sqrt{\sum_{i=1}^n |x_i|^2} + \sqrt{\sum_{i=1}^n |y_i|^2}.$$

Corollary 4.1.15 If f and g are two complex-valued continuous functions on [0, 1], then

$$\sqrt{\int_0^1 |f(x) + g(x)|^2\,dx} \le \sqrt{\int_0^1 |f(x)|^2\,dx} + \sqrt{\int_0^1 |g(x)|^2\,dx}.$$

Corollary 4.1.16 If {an} and {bn} are sequences in $l^2$, then

$$\sqrt{\sum_{n=1}^{\infty} |a_n + b_n|^2} \le \sqrt{\sum_{n=1}^{\infty} |a_n|^2} + \sqrt{\sum_{n=1}^{\infty} |b_n|^2}.$$

Notion of Distance in an Inner Product Space

We first introduce the notion of distance on a set by abstracting the fundamental properties of distance.

Definition 4.1.17 Let X be a set. A map d from X × X to the set $\mathbb{R}^+ \cup \{0\}$ of nonnegative real numbers (the image of (x, y) under d is denoted by d(x, y) instead of d((x, y))) is called a distance or a metric on X if
1. d(x, y) = 0 if and only if x = y.
2. d(x, y) = d(y, x).
3. (Triangle inequality) d(x, y) ≤ d(x, z) + d(y, z) for all x, y, z ∈ X.
The pair (X, d) is called a metric space.

Proposition 4.1.18 Let $(V, \langle\,,\rangle)$ be an inner product space. Then the inner product $\langle\,,\rangle$ induces a metric d on V defined by

d(x, y) =||x − y ||

Proof d(x, y) = ||x − y|| ≥ 0, and d(x, y) = 0 if and only if ||x − y|| = 0. This means that d(x, y) = 0 if and only if x − y = 0, or equivalently, x = y. Since

|| x − y || = || −1(y − x) || = | −1 |||y − x || = || y − x ||, it follows that d(x, y) = d(y, x). Also

d(x, y) =||x − y || = || x − z + z − y || ≤|| x − z || + || z − y || = d(x, z) + d(z, y). 

Remark 4.1.19 Let (V, d) be a metric space. It may not be possible to define a vector space structure on V and an inner product on V so that the induced metric is d. For example, take a set V with at least two elements, and define a metric d′ on V by d′(x, y) = 0 if x = y and 1 otherwise (verify that d′ is indeed a metric, called the discrete metric). Given any inner product space structure on V, an element x ≠ 0 in V, and the induced metric d, we have $d\big(\tfrac{2}{\|x\|}x, 0\big) = 2$, whereas d′ takes only the values 0 and 1, and so d′ cannot be induced by an inner product.

Notion of Angle, Orthogonality

Let $(V, \langle\,,\rangle)$ be an inner product space. Then, by the Cauchy–Schwarz inequality, $|\langle x, y\rangle| \le \|x\|\,\|y\|$ for all x, y ∈ V. If x ≠ 0 ≠ y, then

$$\frac{|\langle x, y\rangle|}{\|x\|\,\|y\|} \le 1.$$

If it is a real inner product space, then

$$-1 \le \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \le 1.$$

Thus, there is a unique θ, 0 ≤ θ ≤ π, such that

$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}.$$

This θ is called the angle between x and y. In case it is a complex inner product space, the above argument implies that there are a unique real number r, 0 ≤ r ≤ 1, and a θ, 0 ≤ θ < 2π, unique whenever $\langle x, y\rangle \ne 0$, such that

$$r(\cos\theta + i\sin\theta) = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}.$$

This θ may be termed the angle between x and y. Two vectors x and y in an inner product space are said to be orthogonal if $\langle x, y\rangle = 0$. This definition extends to the null vector 0 also; thus, the null vector 0 is orthogonal to each vector. The notation x ⊥ y is used to say that x and y are orthogonal. Thus, x ⊥ y if and only if $\langle x, y\rangle = 0$.

A vector x is called a unit vector if || x || = 1. Proposition 4.1.20 (Pythagoras Theorem). Let (V,<>) be a real inner product space, and x, y ∈ V . Then x ⊥ y if and only if

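$$\|x - y\|^2 = \|x\|^2 + \|y\|^2, \quad\text{or equivalently,}\quad \|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

Proof

$$\|x - y\|^2 = \langle x - y, x - y\rangle = \langle x, x\rangle - \langle x, y\rangle - \langle y, x\rangle + \langle y, y\rangle = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle,$$

since $\langle y, x\rangle = \langle x, y\rangle$ in a real inner product space. The result follows. □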

Remark 4.1.21 In a complex inner product space also, if x ⊥ y, then $\|x - y\|^2 = \|x\|^2 + \|y\|^2$ (verify). But the converse is not true. Consider, for example, the unitary space $\mathbb{C}^2$. The vectors x = (0, 1) and y = (1, i) are not orthogonal (indeed, $\langle x, y\rangle = -i$), but still $\|x - y\|^2 = \|x\|^2 + \|y\|^2 = 3$.

In a real inner product space $(V, \langle\,,\rangle)$, for nonzero x, y with angle θ between them, we have

$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta, \quad\text{and}\quad \|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\|\cos\theta.$$
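The angle θ and the two formulas above can be computed directly. The Python sketch below has hypothetical helper names, handles real vectors only, and uses floating-point arithmetic. It returns the unique θ in [0, π] with the stated cosine and measures how far the first formula is from holding exactly. The clamping step is a numerical safeguard we add; it is not part of the definition.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def angle(x, y):
    """The unique theta in [0, pi] with cos(theta) = <x, y> / (||x|| ||y||); x, y nonzero."""
    c = dot(x, y) / (norm(x) * norm(y))
    # Clamp: rounding can push c slightly outside [-1, 1], where acos raises ValueError.
    return math.acos(max(-1.0, min(1.0, c)))

def law_of_cosines_residual(x, y):
    """||x - y||^2 - (||x||^2 + ||y||^2 - 2 ||x|| ||y|| cos(theta)); should be ~0."""
    d = [a - b for a, b in zip(x, y)]
    theta = angle(x, y)
    return dot(d, d) - (dot(x, x) + dot(y, y) - 2 * norm(x) * norm(y) * math.cos(theta))
```

For instance, `angle([1, 0], [1, 1])` is π/4 up to rounding.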

These equations express the lengths of the diagonals of a parallelogram in terms of its sides and the angle between them.

Proposition 4.1.22 (Parallelogram Law). In an inner product space $(V, \langle\,,\rangle)$, we have

$$\|x - y\|^2 + \|x + y\|^2 = 2\|x\|^2 + 2\|y\|^2$$

for all x, y ∈ V.

Proof Adding the equations

$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - \langle x, y\rangle - \langle y, x\rangle \quad\text{and}\quad \|x + y\|^2 = \|x\|^2 + \|y\|^2 + \langle x, y\rangle + \langle y, x\rangle,$$

we get the result. □

The geometrical meaning of the above proposition is that the sum of the areas of the squares formed on the diagonals of a parallelogram equals the sum of the areas of the squares formed on its sides. The following identities, termed the polarization identities, express the inner product in terms of the norm.

Proposition 4.1.23 (Polarization identities)
1. If $(V, \langle\,,\rangle)$ is a real inner product space, then

$$\langle x, y\rangle = \frac{1}{4}\big[\|x + y\|^2 - \|x - y\|^2\big]$$

for all x, y ∈ V.
2. If $(V, \langle\,,\rangle)$ is a complex inner product space, then

$$\langle x, y\rangle = \frac{1}{2}\big[\|x + y\|^2 + i\|x + iy\|^2 - (1 + i)(\|x\|^2 + \|y\|^2)\big]$$

for all x, y ∈ V.

Proof 1. Let $(V, \langle\,,\rangle)$ be a real inner product space. Then

$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + \|y\|^2 + 2\langle x, y\rangle \qquad (4.1)$$

and

$$\|x - y\|^2 = \langle x - y, x - y\rangle = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle \qquad (4.2)$$

for all x, y ∈ V. Subtracting the second equation from the first, we get the desired identity.
2. Let $(V, \langle\,,\rangle)$ be a complex inner product space. Then

$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + \|y\|^2 + \langle x, y\rangle + \langle y, x\rangle \qquad (4.3)$$

and

$$\|x + iy\|^2 = \langle x + iy, x + iy\rangle = \|x\|^2 + \|y\|^2 - i\langle x, y\rangle + i\langle y, x\rangle \qquad (4.4)$$

for all x, y ∈ V. Adding i times Eq. (4.4) to Eq. (4.3), we get $\|x + y\|^2 + i\|x + iy\|^2 = (1 + i)(\|x\|^2 + \|y\|^2) + 2\langle x, y\rangle$, which is the desired result. □
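Both polarization identities, as well as the parallelogram law, can be checked numerically. The following sketch has hypothetical names and takes the inner product linear in the first argument, as in Definition 4.1.1. It recovers $\langle x, y\rangle$ from norms alone via the formulas of Proposition 4.1.23. The vectors x and y are arbitrary test data of our own.

```python
def inner(x, y):
    """<x, y> = sum_i x_i * conj(y_i), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def sqnorm(x):
    """||x||^2 = <x, x>, a real number."""
    return inner(x, x).real

def polarize_real(x, y):
    """(1/4)[||x + y||^2 - ||x - y||^2] (real case)."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (sqnorm(s) - sqnorm(d)) / 4

def polarize_complex(x, y):
    """(1/2)[||x + y||^2 + i ||x + iy||^2 - (1 + i)(||x||^2 + ||y||^2)] (complex case)."""
    s = [a + b for a, b in zip(x, y)]
    t = [a + 1j * b for a, b in zip(x, y)]
    return 0.5 * (sqnorm(s) + 1j * sqnorm(t) - (1 + 1j) * (sqnorm(x) + sqnorm(y)))

x, y = [1 + 2j, -1j, 3], [2 - 1j, 4, 1 + 1j]
```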

Let (V,<>) be an inner product space. A subset S of V is called an orthonormal set if

(i) ||x|| = 1 for all x ∈ S, and (ii) $\langle x, y\rangle = 0$ for all x, y ∈ S with x ≠ y.

Proposition 4.1.24 An orthonormal set in an inner product space is always linearly independent.

Proof Let S be an orthonormal set and

α1x1 + α2x2 + ··· + αn xn = 0, where x1, x2,...,xn are distinct elements of S. Then

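$$\alpha_m = \alpha_m\langle x_m, x_m\rangle = \langle \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n, x_m\rangle = \langle 0, x_m\rangle = 0$$

for each m, since $\langle x_i, x_m\rangle = 0$ for i ≠ m. Hence $\alpha_m = 0$ for all m. □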

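Proposition 4.1.25 (Bessel's inequality). Let $(V, \langle\,,\rangle)$ be an inner product space, and {x1, x2, ..., xr} an orthonormal set, xi ≠ xj for i ≠ j. Let x ∈ V. Then

$$\sum_{i=1}^r |\langle x, x_i\rangle|^2 \le \|x\|^2.$$

Proof We have

$$\Big\langle x - \sum_{i=1}^r \langle x, x_i\rangle x_i,\; x - \sum_{i=1}^r \langle x, x_i\rangle x_i\Big\rangle \ge 0.$$

Since {x1, x2, ..., xr} is an orthonormal set, expanding gives

$$\langle x, x\rangle - \sum_{i=1}^r \langle x, x_i\rangle\overline{\langle x, x_i\rangle} \ge 0, \quad\text{or}\quad \|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2 \ge 0.$$

Hence

$$\sum_{i=1}^r |\langle x, x_i\rangle|^2 \le \|x\|^2.$$ □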
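Bessel's inequality, and the equality case of Corollary 4.1.27 below, can be seen in a small example in the standard unitary space $\mathbb{C}^3$. In the sketch below, the vectors e1, e2, e3 and x are our own choices, not from the text. The set {e1, e2} is orthonormal but not a basis, so the inequality is strict for x = (1, 1, 1). Adding e3 gives an orthonormal basis, and the sum becomes $\|x\|^2$.

```python
import math

def inner(x, y):
    """<x, y> = sum_i x_i * conj(y_i) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def bessel_sum(x, orthonormal):
    """sum_i |<x, x_i>|^2 over the given orthonormal vectors x_i."""
    return sum(abs(inner(x, e)) ** 2 for e in orthonormal)

r = 1 / math.sqrt(2)
e1 = [1, 0, 0]
e2 = [0, r, 1j * r]
e3 = [0, r, -1j * r]           # completes {e1, e2} to an orthonormal basis of C^3
x = [1, 1, 1]
sq_norm_x = inner(x, x).real   # ||x||^2 = 3
```

Here `bessel_sum(x, [e1, e2])` is 2 < 3, while `bessel_sum(x, [e1, e2, e3])` equals 3 up to rounding.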

Definition 4.1.26 An orthonormal set which is also a basis is called an orthonormal basis.

Corollary 4.1.27 Let $(V, \langle\,,\rangle)$ be an inner product space. An orthonormal set {x1, x2, ..., xn} is an orthonormal basis if and only if

$$\|x\|^2 = \sum_{i=1}^n |\langle x, x_i\rangle|^2 \quad\text{for all } x \in V.$$

Proof Suppose that {x1, x2, ..., xn} is an orthonormal basis. Let x ∈ V. Then

$x = \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n$ for some α1, α2, ..., αn in F. But then $\langle x, x_i\rangle = \alpha_i$ for all i. Hence $x = \sum_{i=1}^n \langle x, x_i\rangle x_i$, and so

$$\Big\langle x - \sum_{i=1}^n \langle x, x_i\rangle x_i,\; x - \sum_{i=1}^n \langle x, x_i\rangle x_i\Big\rangle = 0.$$

Expanding as in the proof of Bessel's inequality, we get $\|x\|^2 = \sum_{i=1}^n |\langle x, x_i\rangle|^2$.

Conversely, suppose that $\|x\|^2 = \sum_{i=1}^n |\langle x, x_i\rangle|^2$ for all x ∈ V. Then, by the same expansion,

$$\Big\|x - \sum_{i=1}^n \langle x, x_i\rangle x_i\Big\|^2 = 0,$$

and so $x = \sum_{i=1}^n \langle x, x_i\rangle x_i$ for all x ∈ V. This shows that {x1, x2, ..., xn} is a set of generators. Since an orthonormal set is linearly independent, it is a basis. □

4.2 Gram–Schmidt Process

The proof of the following theorem gives an algorithm by which we can find an orthonormal basis of a finite-dimensional inner product space V starting from a finite set of generators for V. The process is called the Gram–Schmidt process.

Theorem 4.2.1 (Gram–Schmidt). Let $(V, \langle\,,\rangle)$ be an inner product space, and let {x1, x2, ..., xr} be a finite subset of V. Then there exists an orthonormal set {y1, y2, ..., ys}, s ≤ r, such that the subspace generated by {x1, x2, ..., xr} is the same as that generated by {y1, y2, ..., ys}.

Proof The proof is by induction on r. If r = 0, then the subset is the empty set, and since the empty set is (vacuously) orthonormal, there is nothing to do. Suppose that r = 1. If x1 = 0, then ⟨{x1}⟩ = {0}, and the empty set is again an orthonormal set which generates {0}. If x1 ≠ 0, take $y_1 = \frac{x_1}{\|x_1\|}$. Then {y1} is an orthonormal set which generates ⟨{x1}⟩. Assume the result for r. Consider a subset $\{x_1, x_2, \ldots, x_r, x_{r+1}\}$ of V. By the induction assumption, there is an orthonormal subset {y1, y2, ..., ys}, s ≤ r, of V such that $\langle\{y_1, \ldots, y_s\}\rangle = \langle\{x_1, \ldots, x_r\}\rangle$. If $x_{r+1}$ belongs to this subspace, then there is nothing to do. Suppose that $x_{r+1} \notin \langle\{x_1, \ldots, x_r\}\rangle = \langle\{y_1, \ldots, y_s\}\rangle$. Then

$$x_{r+1} - \sum_{i=1}^s \langle x_{r+1}, y_i\rangle y_i \ne 0.$$

Take

$$y_{s+1} = \frac{x_{r+1} - \sum_{i=1}^s \langle x_{r+1}, y_i\rangle y_i}{\big\|x_{r+1} - \sum_{i=1}^s \langle x_{r+1}, y_i\rangle y_i\big\|}.$$

Clearly, $y_{s+1}$ is a unit vector which is orthogonal to $y_i$ for each i ≤ s. Evidently, $\{y_1, y_2, \ldots, y_{s+1}\}$ is an orthonormal set. Since $y_{s+1}$ is a linear combination of $\{x_{r+1}, y_1, \ldots, y_s\}$, $\langle\{y_1, \ldots, y_{s+1}\}\rangle$ is contained in $\langle\{y_1, \ldots, y_s, x_{r+1}\}\rangle = \langle\{x_1, \ldots, x_{r+1}\}\rangle$. Also, $x_{r+1}$ is a linear combination of $\{y_1, \ldots, y_{s+1}\}$, and so $\langle\{x_1, \ldots, x_{r+1}\}\rangle$ is contained in $\langle\{y_1, \ldots, y_{s+1}\}\rangle$. Thus, the result holds for r + 1 also. □
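The induction in the proof of Theorem 4.2.1 translates directly into a loop. The following Python sketch is one way to write it, not the author's. Vectors are lists of real or complex numbers, with the inner product linear in the first argument. A residual shorter than the absolute tolerance `tol` (an assumption suited to vectors of moderate size) is treated as zero, meaning that the vector already lies in the span found so far and is discarded. Run on the data of Example 4.2.4 below, it returns the two vectors x1, x2 found there.

```python
import math

def inner(x, y):
    """<x, y> = sum_i x_i * conj(y_i); reduces to the dot product for real vectors."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal list spanning the same subspace as `vectors` (Theorem 4.2.1).

    Each x is replaced by x - sum_i <x, y_i> y_i; if that residual is (numerically)
    zero, x already lies in the span of the earlier vectors and is skipped, which is
    why the output may be shorter than the input (s <= r)."""
    basis = []
    for x in vectors:
        residual = list(x)
        for y in basis:
            c = inner(x, y)
            residual = [r - c * t for r, t in zip(residual, y)]
        length = math.sqrt(abs(inner(residual, residual)))
        if length > tol:
            basis.append([r / length for r in residual])
    return basis

ons = gram_schmidt([(1, 1, 1), (0, 1, 1), (2, 1, 1)])  # data of Example 4.2.4
```

This is the classical form of the process: it subtracts $\langle x, y_i\rangle y_i$ computed from the original x, exactly as in the proof. In floating-point arithmetic, the so-called modified Gram–Schmidt variant, which computes each coefficient from the partially reduced residual, is usually preferred for numerical stability.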

Corollary 4.2.2 Every finite-dimensional inner product space admits an orthonormal basis.

Proof Let (V,<>)be a finite-dimensional inner product space. Let {x1, x2,...,xn} be a basis of V . Then from the above theorem, there exists an orthonormal set {y1, y2,...,ym }, m ≤ n which generates V . Since an orthonormal set is linearly independent, it is a basis, and in turn n = m. 

Proposition 4.2.3 Every orthonormal set of a finite-dimensional inner product space can be enlarged to an orthonormal basis.

Proof Let {x1, x2, ..., xm} be an orthonormal set of an inner product space $(V, \langle\,,\rangle)$ of dimension n. Since an orthonormal set is linearly independent, m ≤ n. If m = n, it is already a basis, and so an orthonormal basis. Suppose that m < n. Then $\langle\{x_1, \ldots, x_m\}\rangle \ne V$. Let $y_{m+1}$ be a member of $V - \langle\{x_1, \ldots, x_m\}\rangle$. Then $y_{m+1} - \sum_{i=1}^m \langle y_{m+1}, x_i\rangle x_i \ne 0$. Take

$$x_{m+1} = \frac{y_{m+1} - \sum_{i=1}^m \langle y_{m+1}, x_i\rangle x_i}{\big\|y_{m+1} - \sum_{i=1}^m \langle y_{m+1}, x_i\rangle x_i\big\|}.$$

Then $\{x_1, x_2, \ldots, x_{m+1}\}$ is an orthonormal set. If m + 1 = n, then this is an orthonormal basis. If not, proceed as above. At the (n − m)th step, we shall arrive at an orthonormal basis containing {x1, x2, ..., xm}. □

Example 4.2.4 This example illustrates the Gram–Schmidt process. Consider the usual Euclidean inner product space $\mathbb{R}^3$ and the subset {(1, 1, 1), (0, 1, 1), (2, 1, 1)} of $\mathbb{R}^3$. We determine an orthonormal set which generates the same space as ⟨{(1, 1, 1), (0, 1, 1), (2, 1, 1)}⟩. Take $x_1 = \frac{(1, 1, 1)}{\|(1, 1, 1)\|} = \big(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\big)$ and $x_2 = \frac{y_2}{\|y_2\|}$, where

$$y_2 = (0, 1, 1) - \Big\langle (0, 1, 1), \big(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\big)\Big\rangle \big(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\big) = \big(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\big).$$

Thus, $x_2 = \sqrt{\tfrac{3}{2}}\,\big(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\big)$. Since

$$(2, 1, 1) - \langle (2, 1, 1), x_1\rangle x_1 - \langle (2, 1, 1), x_2\rangle x_2 = 0,$$

it follows that the orthonormal set {x1, x2} generates the same space as {(1, 1, 1), (0, 1, 1), (2, 1, 1)}. Note that {(1, 1, 1), (0, 1, 1), (2, 1, 1)} does not generate $\mathbb{R}^3$.

Let

$$A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix}$$

be an n × m matrix with rows r1, r2, ..., rn. Then $AA^t = [u_{ij}]$, where $u_{ij} = r_ir_j^t = \langle r_i, r_j\rangle$ (in the complex case, $A\bar{A}^t = [u_{ij}]$, where $u_{ij} = r_i\bar{r}_j^t = \langle r_i, r_j\rangle$). Thus, to say that the rows of A form an orthonormal set is to say that $AA^t = I_n$ ($A\bar{A}^t = I_n$). Dually, to say that the columns of A form an orthonormal set is to say that $A^tA = I_m$ ($\bar{A}^tA = I_m$). In particular, we have the following definition.

Definition 4.2.5 A square n × n matrix A with entries in the field R (C) of real (complex) numbers is called an orthogonal (unitary) matrix if the rows of A form an orthonormal basis of $\mathbb{R}^n$ ($\mathbb{C}^n$), or equivalently, $AA^t = I_n = A^tA$ ($A\bar{A}^t = I_n = \bar{A}^tA$). Alternatively, A is orthogonal if and only if $A^t = A^{-1}$.

Example 4.2.6 The 2 × 2 matrices

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

are orthogonal 2 × 2 matrices. Indeed, any 2 × 2 orthogonal matrix is of one of the above two types (prove it). Observe that the linear transformation

$$[x, y] \mapsto [x, y]\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = [x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta]$$

determined by the first matrix represents the rotation of the xy-plane through an angle θ. Interpret the linear transformation determined by the second matrix. Indeed, it represents the reflection about a line in the plane (find it).

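Example 4.2.7 The 3 × 3 matrix

$$\begin{pmatrix} \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt6} & -\frac{2}{\sqrt6} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \end{pmatrix}$$

is an orthogonal matrix.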
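The defining condition $AA^t = I_n$ of Definition 4.2.5 is easy to test numerically. The following sketch has hypothetical names and uses floating-point entries, so equality is checked up to a tolerance. It confirms three things: the matrix of Example 4.2.7 is orthogonal; a product of the two types of matrices in Example 4.2.6 is again orthogonal, in line with O(n) being a group; and the first matrix of Example 4.2.6 rotates [1, 0] to [0, 1] when θ = π/2.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_orthogonal(A, tol=1e-12):
    """True if A A^t is the identity up to `tol` (rows of A orthonormal)."""
    P = matmul(A, transpose(A))
    n = len(A)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol for i in range(n) for j in range(n))

def rotation(theta):
    """First matrix of Example 4.2.6 (acts on row vectors: [x, y] -> [x, y] R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def second_type(theta):
    """Second matrix of Example 4.2.6."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A_427 = [[1 / s3, 1 / s3, 1 / s3], [1 / s6, 1 / s6, -2 / s6], [1 / s2, -1 / s2, 0.0]]
```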

If $AA^t = I_n = BB^t$, then $AB(AB)^t = ABB^tA^t = I_n$. Thus, the product of two n × n orthogonal matrices is orthogonal. Also, since $A^t = A^{-1}$, the inverse of an orthogonal matrix is orthogonal. The identity matrix is obviously an orthogonal matrix. This shows that the set O(n) of orthogonal n × n matrices forms a group under matrix multiplication. This group O(n) is called the orthogonal group of n × n matrices. A subgroup of O(n) is called an orthogonal group. Similarly, the set U(n) of all n × n unitary matrices forms a group, called the unitary group. The proof of the following proposition is algorithmic, and it is essentially the Gram–Schmidt process.

Proposition 4.2.8 Let A be an n × m matrix with entries in the field R (C) of real (complex) numbers which is of rank n. Then we can find a lower triangular n × n matrix P with positive diagonal entries such that $PA(PA)^t = I_n$ ($PA\,\overline{(PA)}^t = I_n$). Also, if A is of rank m, then we can find an upper triangular m × m matrix Q with positive diagonal entries such that $(AQ)^tAQ = I_m$ ($\overline{(AQ)}^tAQ = I_m$).

Proof Since the rank of A is n, the rows r1, r2, ..., rn of A form a linearly independent subset of $\mathbb{R}^m$ ($\mathbb{C}^m$). Using the Gram–Schmidt process, we transform the rows of A to get an n × m matrix B with the orthonormal rows s1, s2, ..., sn. Clearly, then, $BB^t = I_n$ ($B\bar{B}^t = I_n$). Further, while transforming the rows of A to orthonormal rows, we use only the following types of elementary row operations in succession:
(i) Multiply a row by a positive number; for example, $s_1 = \frac{r_1}{\|r_1\|}$.
(ii) Add a linear combination of the rows preceding the jth row to the jth row, and then multiply it by a suitable positive number; for example, $s_2 = \frac{r_2 - \langle r_2, s_1\rangle s_1}{\|r_2 - \langle r_2, s_1\rangle s_1\|}$.
We further observe that the corresponding elementary matrices are lower triangular with positive diagonal entries. Thus, B = PA for some lower triangular n × n matrix P with positive diagonal entries. Finally, let A be an n × m matrix of rank m. Then $A^t$ is an m × n matrix of rank m. Applying the above result to $A^t$, we get a lower triangular m × m matrix P with positive diagonal entries such that $PA^t(PA^t)^t = I_m$. Take $Q = P^t$. Then Q is an upper triangular m × m matrix with positive diagonal entries such that $(AQ)^tAQ = I_m$. □

Corollary 4.2.9 Let A be an n × m matrix with entries in the field R (C) of real (complex) numbers which is of rank n. Then we can find a lower triangular n × n matrix L with positive diagonal entries and an n × m matrix Q with $QQ^t = I_n$ ($Q\bar{Q}^t = I_n$) such that A = LQ. Also, if A is of rank m, then we can find an upper triangular m × m matrix U with positive diagonal entries and an n × m matrix Q with $Q^tQ = I_m$ ($\bar{Q}^tQ = I_m$) such that A = QU.

Proof Follows from the above proposition if we take L = P−1, where PA= Q. 

Corollary 4.2.10 Let A be an n × n invertible matrix. Then we can find a lower triangular n × n matrix P with positive diagonal entries such that PA is an orthogonal matrix. □

Corollary 4.2.11 Every invertible matrix can be decomposed as product of a lower triangular matrix with positive diagonal entries and an orthogonal matrix. It can also be decomposed as product of an orthogonal matrix with an upper triangular matrix with positive diagonal entries. 

Now, we illustrate the algorithms described above by means of examples.

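Example 4.2.12 The set S = {r1 = (1, 1, 0), r2 = (0, 1, 1), r3 = (1, 0, 1)} is a basis of $\mathbb{R}^3$, and therefore the corresponding matrix

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$

is invertible. We transform S into an orthonormal basis using the Gram–Schmidt process, and, using the corresponding elementary row operations on A, we transform A into an orthogonal matrix O. Further, by applying the same elementary operations to the identity matrix I3, we obtain the lower triangular matrix P with positive diagonal entries such that PA = O. First, r1 is replaced by $s_1 = \frac{r_1}{\|r_1\|} = \big(\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}, 0\big)$, and correspondingly, we multiply the first row of A by $\tfrac{1}{\sqrt2}$ to transform it to

$$\begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$

Further, we replace r2 by $s_2 = \frac{r_2 - \langle r_2, s_1\rangle s_1}{\|r_2 - \langle r_2, s_1\rangle s_1\|} = \big(-\tfrac{1}{\sqrt6}, \tfrac{1}{\sqrt6}, \sqrt{\tfrac23}\big)$, and we apply the corresponding elementary operations on the matrix to transform it to

$$\begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ -\frac{1}{\sqrt6} & \frac{1}{\sqrt6} & \sqrt{\frac23} \\ 1 & 0 & 1 \end{pmatrix}.$$

Finally, we replace r3 by $s_3 = \frac{r_3 - \langle r_3, s_1\rangle s_1 - \langle r_3, s_2\rangle s_2}{\|r_3 - \langle r_3, s_1\rangle s_1 - \langle r_3, s_2\rangle s_2\|} = \big(\tfrac{1}{\sqrt3}, -\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\big)$, and we apply the corresponding elementary operations on the matrix to transform it to the orthogonal matrix

$$O = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ -\frac{1}{\sqrt6} & \frac{1}{\sqrt6} & \sqrt{\frac23} \\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt3} \end{pmatrix}.$$

To get the corresponding lower triangular matrix P so that PA = O, we apply the same elementary row operations in succession on the identity matrix I3 to get

$$P = \begin{pmatrix} \frac{1}{\sqrt2} & 0 & 0 \\ -\frac{1}{\sqrt6} & \sqrt{\frac23} & 0 \\ -\frac{1}{2\sqrt3} & -\frac{1}{2\sqrt3} & \frac{\sqrt3}{2} \end{pmatrix}.$$

Lastly, to find a lower triangular matrix $L = P^{-1}$ so that A = LO, we apply the inverses of the same elementary row operations on the identity matrix in reverse order to get

$$L = \begin{pmatrix} \sqrt2 & 0 & 0 \\ \frac{1}{\sqrt2} & \sqrt{\frac32} & 0 \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{2}{\sqrt3} \end{pmatrix}.$$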
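The computation of Example 4.2.12 can be automated. The following Python sketch is one possible implementation of the procedure in the proof of Proposition 4.2.8, for real matrices only; the function names are hypothetical. Instead of recording elementary matrices, it keeps, for each new orthonormal row, the row of P that expresses it as a combination of the rows of A. It then reads off $L = P^{-1}$ directly from the Gram–Schmidt coefficients, because $r_j = \sum_{i<j}\langle r_j, s_i\rangle s_i + \big\|r_j - \sum_{i<j}\langle r_j, s_i\rangle s_i\big\|\, s_j$.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def max_abs_diff(X, Y):
    """Largest entrywise |X - Y| (used to compare with the exact matrices in the text)."""
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def lower_orthogonal_factorization(A):
    """For a real n x m matrix A of rank n, return (P, O, L) such that
    O = P A has orthonormal rows, P is lower triangular with positive diagonal,
    and L = P^{-1} is lower triangular with A = L O.
    Rows are processed by classical Gram-Schmidt, as in Proposition 4.2.8."""
    n = len(A)
    P, O = [], []
    L = [[0.0] * n for _ in range(n)]
    for j, r in enumerate(A):
        residual = [float(v) for v in r]
        p_row = [1.0 if k == j else 0.0 for k in range(n)]  # invariant: residual = p_row . A
        for i in range(j):
            c = dot(r, O[i])  # <r_j, s_i>
            L[j][i] = c
            residual = [a - c * b for a, b in zip(residual, O[i])]
            p_row = [a - c * b for a, b in zip(p_row, P[i])]
        length = math.sqrt(dot(residual, residual))  # nonzero because rank A = n
        L[j][j] = length
        O.append([a / length for a in residual])
        P.append([a / length for a in p_row])
    return P, O, L

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # Example 4.2.12
P, O, L = lower_orthogonal_factorization(A)
```

The three returned matrices agree, up to rounding, with O, P and L computed by hand above.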

4.3 Orthogonal Projection, Shortest Distance

Let (X, d) be a metric space, and A a subset of X. Then

d(x, A) = g.l.b.{d(x, a) | a ∈ A} (the greatest lower bound of the set {d(x, a) | a ∈ A}) is called the shortest distance (or simply distance) between the point x and the set A. More precisely, d(x, A) is characterized by the following two properties: (i) d(x, A) ≤ d(x, a) for all a ∈ A (ii) For all real α > d(x, A), there is an element a ∈ A such that α > d(x, a).

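Proposition 4.3.1 Let $(V, \langle\,,\rangle)$ be a finite-dimensional inner product space. Let W be a subspace of V and {x1, x2, ..., xr} an orthonormal basis of W. Let x ∈ V. Then the shortest distance between x and W is

$$\sqrt{\|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2}.$$

Proof Since $\sum_{i=1}^r \langle x, x_i\rangle x_i \in W$, and

$$\Big\|x - \sum_{i=1}^r \langle x, x_i\rangle x_i\Big\|^2 = \|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2,$$

it follows that the shortest distance between x and W is at most $\sqrt{\|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2}$.

Further, given any $y = \sum_{i=1}^r \alpha_ix_i$ in W,

$$\|x - y\|^2 = \|x\|^2 + \sum_{i=1}^r |\alpha_i|^2 - \sum_{i=1}^r \bar{\alpha}_i\langle x, x_i\rangle - \sum_{i=1}^r \alpha_i\overline{\langle x, x_i\rangle}.$$

Since $a + \bar{a} \le 2|a|$ for every complex number a, using the Cauchy inequality, we get

$$\sum_{i=1}^r \bar{\alpha}_i\langle x, x_i\rangle + \sum_{i=1}^r \alpha_i\overline{\langle x, x_i\rangle} \le 2\Big|\sum_{i=1}^r \bar{\alpha}_i\langle x, x_i\rangle\Big| \le 2\sqrt{\sum_{i=1}^r |\alpha_i|^2}\,\sqrt{\sum_{i=1}^r |\langle x, x_i\rangle|^2}.$$

Hence, writing $u = \sqrt{\sum_{i=1}^r |\alpha_i|^2}$ and $v = \sqrt{\sum_{i=1}^r |\langle x, x_i\rangle|^2}$,

$$\|x - y\|^2 \ge \|x\|^2 + u^2 - 2uv = \|x\|^2 - v^2 + (u - v)^2 \ge \|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2.$$

Thus, $\sqrt{\|x\|^2 - \sum_{i=1}^r |\langle x, x_i\rangle|^2}$ is the shortest distance between x and W. □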
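Propositions 4.3.1 and 4.3.2 can be illustrated numerically. In the Python sketch below, the plane W ⊂ $\mathbb{R}^3$, its orthonormal basis and the point x are our own example, and the helper names are hypothetical. The sketch computes the nearest point $\sum_i \langle x, x_i\rangle x_i$ and the distance formula. It then checks, on randomly sampled points of W, that none is closer to x; this is a sanity check, not a proof.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def projection(x, onb):
    """sum_i <x, x_i> x_i for an orthonormal basis onb of W (real case)."""
    p = [0.0] * len(x)
    for e in onb:
        c = dot(x, e)
        p = [a + c * b for a, b in zip(p, e)]
    return p

def distance_to_subspace(x, onb):
    """sqrt(||x||^2 - sum_i <x, x_i>^2), the formula of Proposition 4.3.1."""
    return math.sqrt(max(0.0, dot(x, x) - sum(dot(x, e) ** 2 for e in onb)))

r = 1 / math.sqrt(2)
onb = [[1.0, 0.0, 0.0], [0.0, r, r]]  # orthonormal basis of a plane W in R^3
x = [1.0, 2.0, 3.0]

# Sample other points y = a*x_1 + b*x_2 of W; none should be closer to x than the projection.
rng = random.Random(0)
d_min = distance_to_subspace(x, onb)
samples_ok = all(
    math.dist(x, [a * u + b * v for u, v in zip(*onb)]) >= d_min - 1e-12
    for a, b in ((rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000))
)
```

Here the nearest point is (1, 2.5, 2.5) and the distance is $\sqrt{0.5}$.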

r < , > Proposition 4.3.2 Under the hypothesis of the Proposition 4.3.1, i=1 x xi xi is the unique point of W which is at the shortest distance from x.

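Proof It follows from the proof of Proposition 4.3.1 that this point is at the shortest distance from x. Suppose that $\sum_{i=1}^{r} \alpha_i x_i$ is a point of W which is also at the shortest distance from x. Then
$$\Big\|x - \sum_{i=1}^{r} \alpha_i x_i\Big\|^2 = \|x\|^2 - \sum_{i=1}^{r} |\langle x, x_i\rangle|^2,$$
and so, by the expansion in the proof of Proposition 4.3.1,
$$\sum_{i=1}^{r} |\alpha_i|^2 + \sum_{i=1}^{r} |\langle x, x_i\rangle|^2 = \sum_{i=1}^{r} \overline{\alpha_i}\langle x, x_i\rangle + \sum_{i=1}^{r} \alpha_i \overline{\langle x, x_i\rangle}. \qquad (4.1)$$
Consider the usual standard complex inner product on $\mathbb{C}^r$, and the points $u = (\alpha_1, \alpha_2, \ldots, \alpha_r)$ and $v = (\langle x, x_1\rangle, \langle x, x_2\rangle, \ldots, \langle x, x_r\rangle)$ of $\mathbb{C}^r$. Then the above equation means that
$$\|u\|^2 + \|v\|^2 = \langle u, v\rangle + \langle v, u\rangle.$$

Hence $\langle u, u\rangle + \langle v, v\rangle = \langle u, v\rangle + \langle v, u\rangle$.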

This shows that $\langle u - v, u - v\rangle = 0$, and so $u = v$. Thus, $\alpha_i = \langle x, x_i\rangle$ for all i. ∎

Let V be a vector space. The translates of one-dimensional subspaces are called lines or affine lines in V. Thus, a line in V is of the form $\{a + \lambda b \mid \lambda \in F\}$ with $b \neq 0$. This is the line passing through a and parallel to b (or to the subspace $\{\lambda b \mid \lambda \in F\}$). Translates of subspaces of dimension greater than 1 are called planes or affine planes. The subset $a + W$, where W is a subspace of dimension $r > 1$, is called a plane of dimension r passing through a and parallel to W. If $\dim W = \dim V - 1$, then it is said to be a hyperplane.

Corollary 4.3.3 Let $(V, \langle\,,\rangle)$ be a finite dimensional inner product space. Let W be a subspace of V with orthonormal basis $\{x_1, x_2, \ldots, x_r\}$. Then the distance of a point $a \in V$ from the affine plane $b + W$ (or affine line if $r = 1$) is the same as that of $a - b$ from W, and it is
$$\sqrt{\|a - b\|^2 - \sum_{i=1}^{r} |\langle a - b, x_i\rangle|^2}.$$

The line of shortest distance from a to $b + W$ is the same as the perpendicular from a to $b + W$, and it is
$$\Big\{a + \lambda\Big((a - b) - \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i\Big) \;\Big|\; \lambda \in F\Big\}.$$

The foot of the perpendicular from a to $b + W$ is $b + \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i$.

Proof The shortest distance of a from $b + W$ is $\text{g.l.b.}\{\|a - (b + w)\| \mid w \in W\}$, which is the same as $\text{g.l.b.}\{\|(a - b) - w\| \mid w \in W\}$. Thus, the shortest distance from a to $b + W$ is the same as the shortest distance between $a - b$ and W. By Proposition 4.3.1, it is $\sqrt{\|a - b\|^2 - \sum_{i=1}^{r} |\langle a - b, x_i\rangle|^2}$.

Since $a - b - \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i$ is orthogonal to each $x_i$, it is also orthogonal to each member of W. Thus, the line joining a and $b + \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i$ is orthogonal to W. Hence, the line passing through a and perpendicular to $b + W$ is given by
$$\Big\{a + \lambda\Big(a - b - \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i\Big) \;\Big|\; \lambda \in F\Big\}.$$

Clearly, $b + \sum_{i=1}^{r} \langle a - b, x_i\rangle x_i$ is the foot of the perpendicular from a onto $b + W$. ∎

Proposition 4.3.4 Let $(V, \langle\,,\rangle)$ be an inner product space, and W a subspace of V. Then $W^{\perp} = \{v \in V \mid \langle x, v\rangle = 0 \text{ for all } x \in W\}$ is a subspace of V.

Proof Clearly $0 \in W^{\perp}$, and so $W^{\perp} \neq \emptyset$. Let $y, z \in W^{\perp}$. Then $\langle x, y\rangle = 0 = \langle x, z\rangle$ for all $x \in W$. But then $\langle x, \alpha y + \beta z\rangle = \overline{\alpha}\langle x, y\rangle + \overline{\beta}\langle x, z\rangle = 0$. This shows that $\alpha y + \beta z \in W^{\perp}$. It follows that $W^{\perp}$ is a subspace. ∎

Definition 4.3.5 The subspace $W^{\perp}$ defined in the above proposition is called the orthogonal complement of W.

Proposition 4.3.6 Let V be a finite dimensional inner product space, and W a subspace of V. Then V is the direct sum of W and $W^{\perp}$.

Proof Let $\{x_1, x_2, \ldots, x_r\}$ be an orthonormal basis of W. Being an orthonormal set, it can be enlarged to an orthonormal basis $\{x_1, \ldots, x_r, x_{r+1}, \ldots, x_n\}$ of V, where $n = \dim V$. An element $x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_r x_r + \alpha_{r+1} x_{r+1} + \cdots + \alpha_n x_n$ is orthogonal to W if and only if it is orthogonal to each $x_i$, $i \le r$. This is so if and only if $\alpha_i = \langle x, x_i\rangle = 0$ for all $i \le r$. This shows that $W^{\perp} = \langle\{x_{r+1}, x_{r+2}, \ldots, x_n\}\rangle$. Since every element $x \in V$ can be uniquely expressed as $x = \alpha_1 x_1 + \cdots + \alpha_r x_r + \alpha_{r+1} x_{r+1} + \cdots + \alpha_n x_n$, it follows that every element x of V has a unique representation as $x = y + z$, where $y \in W$ and $z \in W^{\perp}$. ∎

Remark 4.3.7 Let $(V, \langle\,,\rangle)$ be a finite-dimensional inner product space, and W a subspace of V. Suppose that $x = y + z$, where $y \in W$ and $z \in W^{\perp}$. Then y is called the component of x along W, and z is called the component of x orthogonal to W. Clearly, y is the foot of the perpendicular from x to W.

Definition 4.3.8 Each subspace W of a finite-dimensional inner product space V defines the map $P_W$ from V to V given by $P_W(x) = y$, where y is the foot of the perpendicular from x to W. The map $P_W$ is a linear transformation called the orthogonal projection of V onto W.

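Example 4.3.9 Consider the subspace $W = \{x = (x_1, x_2, x_3) \mid x_1 + x_2 + x_3 = 0\}$ of $\mathbb{R}^3$. The subset $S = \{(1, -1, 0), (0, 1, -1)\}$ is a basis of W (verify). Using the Gram–Schmidt process, we get the orthonormal basis $\{(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 0), (\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, -\sqrt{\frac{2}{3}})\}$ of W. Therefore, the foot of the perpendicular from a point $x = (x_1, x_2, x_3)$ onto W is
$$\Big\langle x, \big(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0\big)\Big\rangle \big(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0\big) + \Big\langle x, \big(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\sqrt{\tfrac{2}{3}}\big)\Big\rangle \big(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\sqrt{\tfrac{2}{3}}\big) = \Big(\frac{2x_1 - x_2 - x_3}{3}, \frac{-x_1 + 2x_2 - x_3}{3}, \frac{-x_1 - x_2 + 2x_3}{3}\Big).$$
The matrix of the orthogonal projection $P_W$ is given by
$$P_W = \frac{1}{3}\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}.$$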
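The computation in Example 4.3.9 can be checked with exact rational arithmetic. The sketch below is our own illustration for the real case only. It runs Gram–Schmidt without normalising, so no square roots appear, and then uses the fact that for an orthogonal basis $b_1, \ldots, b_r$ of W the projection is $P_W(x) = \sum_k \frac{\langle x, b_k\rangle}{\langle b_k, b_k\rangle} b_k$, whose matrix is $\sum_k b_k b_k^t / \langle b_k, b_k\rangle$. It assumes the spanning vectors are linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_basis(vectors):
    """Gram-Schmidt without normalisation; assumes linearly independent input."""
    basis = []
    for v in vectors:
        w = [Fraction(c) for c in v]
        for b in basis:
            coef = dot(w, b) / dot(b, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis


def projection_matrix(vectors):
    """Matrix of P_W: the sum of b b^t / <b, b> over an orthogonal basis of W."""
    basis = orthogonal_basis(vectors)
    n = len(vectors[0])
    return [[sum(b[i] * b[j] / dot(b, b) for b in basis) for j in range(n)]
            for i in range(n)]


P = projection_matrix([(1, -1, 0), (0, 1, -1)])
print([[int(3 * e) for e in row] for row in P])  # [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
```

Multiplying by 3 recovers the integer matrix of the example. Working with an unnormalised orthogonal basis is a design choice that keeps every entry in $\mathbb{Q}$.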

Remark 4.3.10 The result of Proposition 4.3.6 is not true for infinite-dimensional inner product spaces. For example, consider the vector space C[0, 1] of real-valued continuous functions on the closed interval [0, 1] with the inner product given by
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$

Let W be the subspace of polynomial functions. Then W is a proper subspace, and $W^{\perp} = \{0\}$ (prove it). Thus, C[0, 1] is not the direct sum of W and $W^{\perp}$.

Let $(V, \langle\,,\rangle)$ be an inner product space, and let $y \in V$. It follows from the definition of the inner product that the map $f_y$ from V to F defined by $f_y(x) = \langle x, y\rangle$ is a linear functional on V, and the map $f^y$ defined by $f^y(x) = \langle y, x\rangle$ is an anti-linear functional in the sense that $f^y(\alpha x + \beta z) = \overline{\alpha} f^y(x) + \overline{\beta} f^y(z)$. Observe that the set of all anti-linear functionals also forms a vector space, which is anti-isomorphic to the dual space $V^{*}$ of V.

Proposition 4.3.11 Let $(V, \langle\,,\rangle)$ be a finite-dimensional inner product space. Then the map $y \mapsto f_y$ is an anti-linear isomorphism from V to $V^{*}$. Also the map $y \mapsto f^y$ is an isomorphism from V to the space of all anti-linear functionals on V.

Proof That the map $y \mapsto f_y$ is an anti-linear transformation follows from the definition of the inner product. We show that it is injective. Suppose that $f_y = f_z$. Then $f_y(x) = f_z(x)$ for all x. Hence $\langle x, y\rangle = \langle x, z\rangle$ for all x. This means that $\langle x, y - z\rangle = 0$ for all x. In particular, $\langle y - z, y - z\rangle = 0$, which implies that $y = z$. Next, it is easy to observe that an anti-linear transformation takes a subspace to a subspace, and an injective anti-linear transformation takes a linearly independent subset to a linearly independent subset. Thus, the image of V under the injective anti-linear transformation $y \mapsto f_y$ is a subspace of $V^{*}$ of dimension equal to the dimension of V. Since $\dim V^{*} = \dim V$, it follows that $y \mapsto f_y$ is also surjective. The rest of the proposition follows similarly. ∎

The following corollary is a restatement of the bijectivity of the map $y \mapsto f_y$.

Corollary 4.3.12 Let $(V, \langle\,,\rangle)$ be a finite-dimensional inner product space, and f a linear functional on V. Then there is a unique $y \in V$ such that $f(x) = \langle x, y\rangle$ for all $x \in V$. ∎
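In coordinates the corollary is very concrete. On $\mathbb{C}^n$ with the standard inner product (an assumption made for this sketch), a linear functional f is represented by $y = (\overline{f(e_1)}, \ldots, \overline{f(e_n)})$, since then $\langle x, y\rangle = \sum_i x_i f(e_i) = f(x)$. The functional `f` and the helper names below are hypothetical examples of ours.

```python
def inner(u, v):
    """Standard inner product on C^n: <u, v> = sum of u_i * conj(v_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def representing_vector(f, n):
    """Return y with f(x) = <x, y>, namely y_i = conj(f(e_i))."""
    y = []
    for i in range(n):
        e = [0] * n
        e[i] = 1
        y.append(complex(f(e)).conjugate())
    return y


def f(x):
    # a hypothetical linear functional on C^3
    return 2 * x[0] + 1j * x[1] - 3 * x[2]


y = representing_vector(f, 3)
x = [1 + 2j, -1j, 4]
print(f(x), inner(x, y))  # both are -9+4j
```

The conjugation in `representing_vector` is exactly the anti-linearity of $y \mapsto f_y$ in Proposition 4.3.11.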

Remark 4.3.13 The results of Proposition 4.3.11 and Corollary 4.3.12 are not true, in general, for infinite dimensional inner product spaces. Consider, for example, the space P[0, 1] of polynomial functions on [0, 1], with the inner product defined by
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$

Consider the linear functional φ on P[0, 1] defined by φ(f) = f(1). Check that there is no g in P[0, 1] such that
$$\int_0^1 f(x)g(x)\,dx = f(1)$$
for all $f \in P[0, 1]$.

Adjoint of a Linear Transformation

Let $(V, \langle\,,\rangle)$ be a finite dimensional inner product space, and let T be a linear transformation from V to V. For each $y \in V$, the map $x \mapsto \langle T(x), y\rangle$ is a linear functional on V (verify). Hence, by Corollary 4.3.12, there is a unique element of V, which we denote by $T^{*}(y)$, such that
$$\langle T(x), y\rangle = \langle x, T^{*}(y)\rangle$$
for each $x \in V$. This defines a map $T^{*}$ from V to V given by the equation
$$\langle T(x), y\rangle = \langle x, T^{*}(y)\rangle.$$

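Using the defining properties of an inner product, we see that
$$\langle x, T^{*}(\alpha y + \beta z)\rangle = \langle T(x), \alpha y + \beta z\rangle = \overline{\alpha}\langle T(x), y\rangle + \overline{\beta}\langle T(x), z\rangle = \overline{\alpha}\langle x, T^{*}(y)\rangle + \overline{\beta}\langle x, T^{*}(z)\rangle = \langle x, \alpha T^{*}(y) + \beta T^{*}(z)\rangle$$
for each $x \in V$. Thus,
$$\langle x, T^{*}(\alpha y + \beta z) - (\alpha T^{*}(y) + \beta T^{*}(z))\rangle = 0$$
for all $x \in V$. Putting $x = T^{*}(\alpha y + \beta z) - \alpha T^{*}(y) - \beta T^{*}(z)$, we get that
$$T^{*}(\alpha y + \beta z) = \alpha T^{*}(y) + \beta T^{*}(z)$$
for all $y, z \in V$ and $\alpha, \beta \in F$. This shows that $T^{*}$ is a linear transformation.

Definition 4.3.14 The linear transformation $T^{*}$ defined above is called the adjoint of T.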

Proposition 4.3.15 Let $(V, \langle\,,\rangle)$ be a finite dimensional inner product space, and T a linear transformation from V to V. Let $B = \{x_1, x_2, \ldots, x_n\}$ be an orthonormal basis of V, and let M(T) denote the matrix of T with respect to B. Then $M(T^{*}) = (M(T))^{*}$, the tranjugate (conjugate transpose) of the matrix M(T). Further, if $M(T_2) = (M(T_1))^{*}$, then $T_2 = T_1^{*}$.

Proof Suppose that $M(T) = [a_{ij}]$ and $M(T^{*}) = [b_{jl}]$. Then $T(x_i) = \sum_{k=1}^{n} a_{ki} x_k$ and $T^{*}(x_j) = \sum_{l=1}^{n} b_{lj} x_l$. Thus,
$$a_{ji} = \Big\langle \sum_{k=1}^{n} a_{ki} x_k, x_j\Big\rangle = \langle T(x_i), x_j\rangle = \langle x_i, T^{*}(x_j)\rangle = \Big\langle x_i, \sum_{l=1}^{n} b_{lj} x_l\Big\rangle = \overline{b_{ij}}.$$
This shows that $M(T^{*}) = (M(T))^{*}$. Suppose that $M(T_2) = (M(T_1))^{*}$. Then $M(T_2) = M(T_1^{*})$. Since the matrix representation map with respect to a fixed basis is bijective, the result follows. ∎
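Proposition 4.3.15 says that, relative to an orthonormal basis, the adjoint is computed by taking the conjugate transpose. The following minimal check works in $\mathbb{C}^2$ with its standard basis. The matrix, the vectors and the helper names are arbitrary choices of ours for illustration.

```python
def conj_transpose(m):
    """(M*)_{ij} = conj(M_{ji})."""
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def apply(m, x):
    """Matrix-vector product M x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def inner(u, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


M = [[1, 2j], [3 - 1j, 4]]         # matrix of T in the standard orthonormal basis
M_star = conj_transpose(M)         # matrix of the adjoint T*
x, y = [1 + 1j, 2], [-1j, 3 + 2j]
lhs = inner(apply(M, x), y)        # <T(x), y>
rhs = inner(x, apply(M_star, y))   # <x, T*(y)>
print(lhs, rhs)
```

The two printed numbers coincide. Taking the conjugate transpose twice also returns M, in line with $(T^{*})^{*} = T$ (Proposition 4.3.16).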

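Proposition 4.3.16 Let $(V, \langle\,,\rangle)$ be a finite-dimensional inner product space. The map η from End V to End V defined by $\eta(T) = T^{*}$ is an anti-isomorphism of algebras which is an involution, in the sense that $\eta^2$ is the identity map.

Proof Since
$$\langle x, (\alpha T_1 + \beta T_2)^{*}(y)\rangle = \langle (\alpha T_1 + \beta T_2)(x), y\rangle = \alpha\langle T_1(x), y\rangle + \beta\langle T_2(x), y\rangle = \alpha\langle x, T_1^{*}(y)\rangle + \beta\langle x, T_2^{*}(y)\rangle = \langle x, (\overline{\alpha} T_1^{*} + \overline{\beta} T_2^{*})(y)\rangle$$
for all $x \in V$, it follows that $(\alpha T_1 + \beta T_2)^{*}(y) = (\overline{\alpha} T_1^{*} + \overline{\beta} T_2^{*})(y)$ for each $y \in V$. Hence $(\alpha T_1 + \beta T_2)^{*} = \overline{\alpha} T_1^{*} + \overline{\beta} T_2^{*}$. Further,
$$\langle x, (T_1 \circ T_2)^{*}(y)\rangle = \langle (T_1 \circ T_2)(x), y\rangle = \langle T_2(x), T_1^{*}(y)\rangle = \langle x, T_2^{*}(T_1^{*}(y))\rangle$$
for all $x, y \in V$. Hence $(T_1 \circ T_2)^{*} = T_2^{*} \circ T_1^{*}$. Also
$$\langle x, T(y)\rangle = \overline{\langle T(y), x\rangle} = \overline{\langle y, T^{*}(x)\rangle} = \langle T^{*}(x), y\rangle = \langle x, (T^{*})^{*}(y)\rangle$$
for all $x, y \in V$. Hence $T = (T^{*})^{*}$. It is clear that $\eta^2$ is the identity map, and so η is bijective. ∎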

Definition 4.3.17 Let $(V, \langle\,,\rangle)$ be a finite-dimensional complex inner product space. A linear transformation T from V to V is called
(i) self-adjoint or Hermitian if $T^{*} = T$, i.e., $\langle T(x), y\rangle = \langle x, T(y)\rangle$ for all $x, y \in V$;
(ii) skew-Hermitian if $T^{*} = -T$, i.e., $\langle T(x), y\rangle = -\langle x, T(y)\rangle$ for all $x, y \in V$;
(iii) unitary if $T^{*} = T^{-1}$, or equivalently, $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$;
(iv) normal if $T^{*}T = TT^{*}$, or equivalently, $\langle T(x), T(y)\rangle = \langle T^{*}(x), T^{*}(y)\rangle$ for all $x, y \in V$.

It is clear from the definition that every Hermitian linear transformation, as well as every skew-Hermitian linear transformation, is normal. Also every unitary linear transformation is normal. The following corollary is immediate from Proposition 4.3.15.

Corollary 4.3.18 A linear transformation T on a finite-dimensional inner product space V is Hermitian (skew-Hermitian, unitary or normal) if and only if the matrix representation M(T) of T relative to an orthonormal basis is Hermitian (skew-Hermitian, unitary, or normal, respectively). ∎

Example 4.3.19 Consider the usual complex inner product space $\mathbb{C}^2$. Define a map T from $\mathbb{C}^2$ to itself by $T((x, y)) = (x + iy, -ix + y)$, $x, y \in \mathbb{C}$. Then T is Hermitian. Indeed, the matrix of T relative to the standard orthonormal basis $\{e_1, e_2\}$ is
$$\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix},$$
which is clearly a Hermitian matrix. T is not unitary, as the matrix is not unitary; in fact its determinant is 0, so T is not even invertible. On the other hand, the linear transformation U given by $U((x, y)) = \big(\frac{x + iy}{\sqrt{2}}, \frac{-ix - y}{\sqrt{2}}\big)$ is both Hermitian and unitary (check for the corresponding matrix). The linear transformation $(x, y) \mapsto (ix, y)$ from $\mathbb{C}^2$ to itself is unitary (check), but it is not Hermitian (verify). The linear transformation $(x, y) \mapsto (ix, 2y)$ is normal, but it is neither Hermitian nor unitary (check).
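The claims about T and about the maps $(x, y) \mapsto (ix, y)$ and $(x, y) \mapsto (ix, 2y)$ can be checked mechanically via Corollary 4.3.18: compute the conjugate transpose of each matrix and test the defining identities of Definition 4.3.17. Below is a small sketch in plain Python. The helper names are ours, and floating-point comparisons use a tolerance.

```python
def conj_transpose(m):
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))


def classify(m):
    """Test the conditions of Definition 4.3.17 for a square complex matrix."""
    n = len(m)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    ms = conj_transpose(m)
    return {
        "hermitian": close(ms, m),
        "unitary": close(matmul(ms, m), identity),
        "normal": close(matmul(ms, m), matmul(m, ms)),
    }


T = [[1, 1j], [-1j, 1]]  # (x, y) -> (x + iy, -ix + y)
A = [[1j, 0], [0, 1]]    # (x, y) -> (ix, y)
B = [[1j, 0], [0, 2]]    # (x, y) -> (ix, 2y)
for name, m in [("T", T), ("A", A), ("B", B)]:
    print(name, classify(m))
```

T comes out Hermitian and normal but not unitary, A comes out unitary and normal but not Hermitian, and B comes out normal only.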

Let $(V, \langle\,,\rangle)$ be a finite-dimensional real inner product space. A linear transformation T from V to V is called
(i) real symmetric (real skew-symmetric) if $T^{*} = T$ ($T^{*} = -T$);
(ii) orthogonal if $T^{*} = T^{-1}$.

Example 4.3.20 The linear transformation $T_\theta$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $T_\theta((x, y)) = (x\cos\theta + y\sin\theta, -x\sin\theta + y\cos\theta)$ is an orthogonal linear transformation (verify).
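Example 4.3.20 can be spot-checked numerically. With respect to the standard (orthonormal) basis, the matrix of $T_\theta$ is $\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$, and orthogonality means $M^tM = I$. The sketch samples θ on a grid, so it is a spot check and not a proof. The function names are ours.

```python
import math


def T_theta(theta):
    """Matrix of (x, y) -> (x cos t + y sin t, -x sin t + y cos t)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def is_orthogonal(m, tol=1e-12):
    """Check M^t M = I entrywise, i.e. T* = T^{-1} in the standard basis."""
    n = len(m)
    return all(
        abs(sum(m[k][i] * m[k][j] for k in range(n)) - (1.0 if i == j else 0.0)) < tol
        for i in range(n) for j in range(n)
    )


print(all(is_orthogonal(T_theta(t / 10)) for t in range(63)))  # True
```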

Let H(V) (SH(V)) denote the set of all Hermitian (skew-Hermitian) linear transformations on a finite-dimensional complex inner product space $(V, \langle\,,\rangle)$. If $T \in H(V) \cap SH(V)$, then $T = T^{*} = -T$, which is equivalent to saying that $T = 0$. Thus, $H(V) \cap SH(V) = \{0\}$. If $T_1$ and $T_2$ are in H(V) (SH(V)), then $(T_1 \pm T_2)^{*} = T_1^{*} \pm T_2^{*} = T_1 \pm T_2$ ($-(T_1 \pm T_2)$). This shows that H(V) and SH(V) are subgroups of End(V), and their intersection is zero. Further, if T is a nonzero Hermitian linear transformation and α a complex number, then αT is Hermitian if and only if α is real, and αT is skew-Hermitian if and only if α is purely imaginary. Thus, H(V) and SH(V) are not (complex) subspaces. Also, T is Hermitian if and only if iT is skew-Hermitian, and the map $T \mapsto iT$ defines an isomorphism from the group H(V) to SH(V). Given $T_1, T_2 \in H(V)$, $T_1T_2 \in H(V)$ if and only if $T_1T_2 = (T_1T_2)^{*} = T_2^{*}T_1^{*} = T_2T_1$. Similarly, given $T_1, T_2 \in SH(V)$, $T_1T_2 \in SH(V)$ if and only if $T_1T_2 = -T_2T_1$. Let T be any endomorphism of V. Then $\frac{T + T^{*}}{2}$ is Hermitian, for
$$\Big(\frac{T + T^{*}}{2}\Big)^{*} = \frac{T^{*} + (T^{*})^{*}}{2} = \frac{T^{*} + T}{2}.$$
Similarly, $\frac{T - T^{*}}{2}$ is skew-Hermitian, and $T = \frac{T + T^{*}}{2} + \frac{T - T^{*}}{2}$.

To summarize, we have proved the following proposition.

Proposition 4.3.21 Let $(V, \langle\,,\rangle)$ be a finite-dimensional complex inner product space. Then the group End V is the direct sum of its subgroups H(V) and SH(V). The subgroups H(V) and SH(V) are not (complex) subspaces. They are isomorphic as groups under the map $T \mapsto iT$. H(V) and SH(V) are not closed under product. Indeed, for $T_1, T_2 \in H(V)$, $T_1T_2 \in H(V)$ if and only if $T_1T_2 = T_2T_1$. Also, for $T_1, T_2 \in SH(V)$, $T_1T_2 \in SH(V)$ if and only if $T_1T_2 = -T_2T_1$. ∎

The following two propositions follow immediately from the corresponding results for matrices, provided we observe that the matrix representation map M relative to an orthonormal basis is an isomorphism from End(V) to $M_n(\mathbb{R})$ which maps S(V) onto $S_n(\mathbb{R})$ and SS(V) onto $SS_n(\mathbb{R})$.

Proposition 4.3.22 Let $(V, \langle\,,\rangle)$ be a finite-dimensional real inner product space. Then the sets S(V) (SS(V)) of symmetric (skew-symmetric) linear transformations form subspaces of End(V), and End(V) is the direct sum of these subspaces. Further, the product of two symmetric (skew-symmetric) linear transformations A and B is symmetric (skew-symmetric) if and only if AB = BA (AB = −BA). ∎
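At the level of matrices the decomposition behind Proposition 4.3.21 is easy to compute. By Corollary 4.3.18 it is enough to work with the matrix relative to an orthonormal basis. The following minimal sketch uses an arbitrary sample matrix, and the function name is ours. For a real matrix the same two formulas give the symmetric and skew-symmetric parts of Proposition 4.3.22.

```python
def conj_transpose(m):
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def hermitian_skew_parts(m):
    """Split M = H + S with H = (M + M*)/2 Hermitian and S = (M - M*)/2 skew-Hermitian."""
    ms = conj_transpose(m)
    n = len(m)
    h = [[(m[i][j] + ms[i][j]) / 2 for j in range(n)] for i in range(n)]
    s = [[(m[i][j] - ms[i][j]) / 2 for j in range(n)] for i in range(n)]
    return h, s


M = [[1 + 2j, 3], [1j, -4]]
H, S = hermitian_skew_parts(M)
print(H)
print(S)
```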

Proposition 4.3.23 Let $(V, \langle\,,\rangle)$ be a real inner product space of dimension n. Then the dimension of S(V) is $\frac{n^2 + n}{2}$ and that of SS(V) is $\frac{n^2 - n}{2}$. ∎

4.4 Isometries and Rigid Motions

Definition 4.4.1 Let (X, d) be a metric space. A bijective map f from X to itself is called an isometry of (X, d) if d( f (x), f (y)) = d(x, y) for all x, y ∈ X.

The set of all isometries of (X, d) is denoted by Iso(X); it is clearly a group under composition of maps, and it is a subgroup of Sym(X). If $(V, \langle\,,\rangle)$ is an inner product space, then it is already equipped with the metric induced by the inner product. We shall describe the isometries of V with respect to the induced metric, and also its group Iso(V) of isometries.

Theorem 4.4.2 Let $(V, \langle\,,\rangle)$ be a finite dimensional complex inner product space, and T a linear transformation from V to V. Then the following conditions are equivalent.
1. T is an isometry of V, i.e., $\|T(x) - T(y)\| = \|x - y\|$ for all $x, y \in V$.
2. $\|T(x)\| = \|x\|$ for all $x \in V$.
3. T is a unitary linear transformation, i.e., $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.
4. $T^{*} = T^{-1}$.

Proof 1 ⟹ 2. Assume 1. Then $\|T(x)\| = \|T(x) - T(0)\| = \|x - 0\| = \|x\|$ for all $x \in V$.

2 ⟹ 3. Assume 2. We have the following polarization identity (Proposition 4.1.23):
$$\langle x, y\rangle = \frac{1}{2}\big[\|x + y\|^2 + i\|x + iy\|^2 - (1 + i)(\|x\|^2 + \|y\|^2)\big]$$
for all $x, y \in V$. Thus,
$$\langle T(x), T(y)\rangle = \frac{1}{2}\big[\|T(x) + T(y)\|^2 + i\|T(x) + iT(y)\|^2 - (1 + i)(\|T(x)\|^2 + \|T(y)\|^2)\big]$$
for all $x, y \in V$. Since T is a linear transformation, we have
$$\langle T(x), T(y)\rangle = \frac{1}{2}\big[\|T(x + y)\|^2 + i\|T(x + iy)\|^2 - (1 + i)(\|T(x)\|^2 + \|T(y)\|^2)\big]$$
for all $x, y \in V$. Using 2, we see that
$$\langle T(x), T(y)\rangle = \frac{1}{2}\big[\|x + y\|^2 + i\|x + iy\|^2 - (1 + i)(\|x\|^2 + \|y\|^2)\big] = \langle x, y\rangle$$
for all $x, y \in V$. This shows that T is a unitary linear transformation.

3 ⟹ 4. Assume 3. Then $\langle x, y\rangle = \langle T(x), T(y)\rangle = \langle x, T^{*}(T(y))\rangle$ for all $x, y \in V$. Hence $\langle x, y - T^{*}T(y)\rangle = 0$ for all $x, y \in V$. Putting $x = y - T^{*}(T(y))$, we get that $T^{*}(T(y)) = y$ for all $y \in V$. This shows that $T^{*}T = I_V$. Since V is finite dimensional, $TT^{*} = I_V$ as well. This means that $T^{*} = T^{-1}$.

4 ⟹ 1. Assume 4. Then $T^{*}T = I_V$, and so $\langle T(x), T(y)\rangle = \langle x, T^{*}T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$. In particular, $\langle T(x), T(x)\rangle = \langle x, x\rangle$, that is, $\|T(x)\| = \|x\|$ for all $x \in V$. Since T is a linear transformation, $\|T(x) - T(y)\| = \|T(x - y)\| = \|x - y\|$ for all $x, y \in V$. Moreover, T is bijective, since $T^{-1} = T^{*}$ exists. ∎

Corollary 4.4.3 $GL(V) \cap \mathrm{Iso}(V) = U(V)$. ∎
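The polarization identity used in 2 ⟹ 3 depends on the convention that $\langle\,,\rangle$ is linear in the first argument and conjugate-linear in the second. Here is a quick numerical check on random vectors of $\mathbb{C}^4$. It is a sketch only, and the helpers are ours.

```python
import random


def inner(u, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def sq_norm(u):
    return inner(u, u).real


def polarization(x, y):
    """(1/2)[||x+y||^2 + i||x+iy||^2 - (1+i)(||x||^2 + ||y||^2)]."""
    s = [a + b for a, b in zip(x, y)]
    t = [a + 1j * b for a, b in zip(x, y)]
    return 0.5 * (sq_norm(s) + 1j * sq_norm(t) - (1 + 1j) * (sq_norm(x) + sq_norm(y)))


def rand_vec(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]


random.seed(1)
x, y = rand_vec(4), rand_vec(4)
print(abs(polarization(x, y) - inner(x, y)))  # a tiny rounding error
```

Since the right-hand side involves only norms, any linear map that preserves norms must also preserve the inner product. This is exactly the step 2 ⟹ 3.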

Proposition 4.4.4 Let $(V_1, \langle\,,\rangle_1)$ and $(V_2, \langle\,,\rangle_2)$ be complex inner product spaces, and f a vector space isomorphism from $V_1$ to $V_2$ which preserves inner product. Then f induces an isomorphism η(f) from $U(V_1)$ to $U(V_2)$ defined by $\eta(f)(T) = f \circ T \circ f^{-1}$.

Proof It is clear that η(f) defined above is an isomorphism from $GL(V_1)$ to $GL(V_2)$. It is sufficient, therefore, to prove that if f preserves inner product, then η(f) takes $U(V_1)$ onto $U(V_2)$. Suppose that f preserves inner product. Then, for $x, y \in V_2$,
$$\langle x, y\rangle_2 = \langle f(f^{-1}(x)), f(f^{-1}(y))\rangle_2 = \langle f^{-1}(x), f^{-1}(y)\rangle_1.$$
This shows that f preserves inner product if and only if $f^{-1}$ preserves inner product. It is also immediate that a composition of inner product preserving maps is inner product preserving. Hence, if f is inner product preserving and g an isomorphism, then $f \circ g$ ($g \circ f$) is inner product preserving if and only if $g = f^{-1} \circ (f \circ g)$ ($g = (g \circ f) \circ f^{-1}$) is inner product preserving. We know that $T \in U(V_1)$ if and only if T is an isomorphism which is inner product preserving. It follows that $T \in U(V_1)$ if and only if $f \circ T \circ f^{-1}$ is inner product preserving, that is, $f \circ T \circ f^{-1} \in U(V_2)$. Thus, η(f) induces an isomorphism from $U(V_1)$ to $U(V_2)$. ∎

Proposition 4.4.5 Any two n-dimensional complex inner product spaces are isomorphic as inner product spaces, i.e., there is an isomorphism between them which preserves inner product.

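Proof Let $(V_1, \langle\,,\rangle_1)$ and $(V_2, \langle\,,\rangle_2)$ be two complex inner product spaces, each of dimension n. Let $\{x_1, x_2, \ldots, x_n\}$ be an orthonormal basis of $V_1$, and $\{y_1, y_2, \ldots, y_n\}$ that of $V_2$. Then there is an isomorphism f from $V_1$ to $V_2$ which takes $x_i$ to $y_i$. But then
$$\Big\langle \sum_{i=1}^{n} \alpha_i x_i, \sum_{i=1}^{n} \beta_i x_i\Big\rangle_1 = \sum_{i=1}^{n} \alpha_i \overline{\beta_i} = \Big\langle \sum_{i=1}^{n} \alpha_i y_i, \sum_{i=1}^{n} \beta_i y_i\Big\rangle_2.$$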

This shows that f preserves inner product. 

Corollary 4.4.6 Every n-dimensional complex inner product space is isomorphic as inner product space to the standard complex inner product space Cn. 

Thus, if V is an n-dimensional complex inner product space, then the group U(V) is isomorphic to $U(\mathbb{C}^n)$. The group $U(\mathbb{C}^n)$ is denoted by U(n), and it is called the unitary group of degree n.

Proposition 4.4.7 Let $(V, \langle\,,\rangle)$ be a finite-dimensional complex inner product space, and H a linear transformation from V to itself. Then H is a Hermitian linear transformation if and only if $\langle H(x), x\rangle$ is real for all $x \in V$.

Proof Suppose that H is Hermitian. Then

$$\langle H(x), x\rangle = \langle x, H^{*}(x)\rangle = \langle x, H(x)\rangle = \overline{\langle H(x), x\rangle}.$$

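Hence $\langle H(x), x\rangle$ is real for all $x \in V$. Conversely, suppose that $\langle H(x), x\rangle$ is real for all $x \in V$. Then $\langle H(x + y), x + y\rangle$ is real for all $x, y \in V$. Expanding, and noting that $\langle H(x), x\rangle$ and $\langle H(y), y\rangle$ are real, we find that $\langle H(x), y\rangle + \langle H(y), x\rangle$ is real for all $x, y \in V$.

Again, expanding $\langle H(x + iy), x + iy\rangle = \langle H(x), x\rangle - i\langle H(x), y\rangle + i\langle H(y), x\rangle + \langle H(y), y\rangle$, we get that

$\langle H(x), y\rangle - \langle H(y), x\rangle$ is purely imaginary for all $x, y \in V$.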

It is an elementary fact that if $z_1$ and $z_2$ are complex numbers such that $z_1 + z_2$ is real and $z_1 - z_2$ is purely imaginary, then $z_1 = \overline{z_2}$. This shows that $\langle H(x), y\rangle = \overline{\langle H(y), x\rangle} = \langle x, H(y)\rangle$ for all $x, y \in V$, and so H is Hermitian. ∎

Corollary 4.4.8 Let H be a Hermitian linear transformation from a finite dimensional complex inner product space $(V, \langle\,,\rangle)$ to itself. Then $I - iH$ and $I + iH$ are isomorphisms. Also $(I + iH)(I - iH)^{-1}$ is a unitary linear transformation.

Proof Suppose that $(I - iH)(x) = 0$. Then $x = iH(x)$, and so $\langle x, x\rangle = i\langle H(x), x\rangle$. Since $\langle x, x\rangle$ is real and, by the above proposition, $\langle H(x), x\rangle$ is also real, the right-hand side is purely imaginary; hence $\langle x, x\rangle = 0$, and so $x = 0$. This shows that $I - iH$ is an injective linear transformation from V to itself. Since V is finite dimensional, it follows that $I - iH$ is an isomorphism. Similarly, $I + iH$ is also an isomorphism. Further, since $(T_1 \circ T_2)^{*} = T_2^{*} \circ T_1^{*}$ and $(T^{-1})^{*} = (T^{*})^{-1}$, and since $(I - iH)^{*} = I + iH$ and $(I + iH)^{*} = I - iH$, we have
$$\big((I + iH)(I - iH)^{-1}\big)^{*} = \big((I - iH)^{*}\big)^{-1}(I + iH)^{*} = (I + iH)^{-1}(I - iH) = (I - iH)(I + iH)^{-1} = \big((I + iH)(I - iH)^{-1}\big)^{-1}.$$

(Note that $I - iH$ and $I + iH$ commute, and so $I - iH$ and $(I + iH)^{-1}$ also commute.) This shows that $(I + iH)(I - iH)^{-1}$ is unitary. ∎
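For matrices, the corollary gives the Cayley transform $H \mapsto (I + iH)(I - iH)^{-1}$, which takes Hermitian matrices to unitary ones. Below is a 2 × 2 sketch in plain Python. It uses the explicit inverse formula for 2 × 2 matrices; the sample H and the helper names are arbitrary choices of ours.

```python
def conj_transpose(m):
    return [[complex(m[j][i]).conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c  # nonzero here, since I - iH is invertible
    return [[d / det, -b / det], [-c / det, a / det]]


def cayley(h):
    """(I + iH)(I - iH)^{-1} for a 2 x 2 Hermitian matrix H."""
    plus = [[(1 if i == j else 0) + 1j * h[i][j] for j in range(2)] for i in range(2)]
    minus = [[(1 if i == j else 0) - 1j * h[i][j] for j in range(2)] for i in range(2)]
    return matmul(plus, inverse_2x2(minus))


H = [[2, 1 - 1j], [1 + 1j, -1]]  # a Hermitian matrix
U = cayley(H)
UU = matmul(conj_transpose(U), U)
print(UU)  # numerically the identity matrix
```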

Rigid Motion

Definition 4.4.9 Let $(V, \langle\,,\rangle)$ be a finite-dimensional real inner product space, and let d be the metric induced by the inner product. An isometry of (V, d) is also called a rigid motion on V. Thus, T is a rigid motion if
$$\|T(x) - T(y)\| = \|x - y\|$$
for all $x, y \in V$.

The group Iso(V) of all rigid motions is called the group of rigid motions on V, and it is denoted by M(V). As in the case of complex inner product spaces, an inner product preserving isomorphism from a real inner product space $V_1$ to a real inner product space $V_2$ induces an isomorphism from $M(V_1)$ to $M(V_2)$. Thus, the group of motions on an n-dimensional real inner product space V is isomorphic to $M(\mathbb{R}^n)$. This group is called the group of Euclidean motions.

Theorem 4.4.10 Let $(V, \langle\,,\rangle)$ be a finite dimensional real inner product space, and T a map from V to V. Then the following conditions are equivalent.
1. T is a rigid motion which fixes the origin 0.
2. T preserves inner product, i.e., $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$. In particular, T preserves lengths of vectors and angles between vectors.
3. T is an orthogonal linear transformation.
4. T is a linear transformation such that $T^{*} = T^{-1}$.
5. T is a linear transformation which preserves lengths of vectors, i.e., $\|T(x)\| = \|x\|$ for all $x \in V$.

Proof 1 ⟹ 2. Assume 1. Then
$$\|T(x)\| = \|T(x) - 0\| = \|T(x) - T(0)\| = \|x - 0\| = \|x\|$$
for all $x \in V$. Also, then
$$\|x\|^2 + \|y\|^2 - 2\langle T(x), T(y)\rangle = \|T(x)\|^2 + \|T(y)\|^2 - 2\langle T(x), T(y)\rangle = \|T(x) - T(y)\|^2 = \|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle$$
for all $x, y \in V$. Hence $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.

2 ⟹ 3. Assume 2. It is sufficient to prove that T is a linear transformation. Using the fact that T preserves the inner product, we see that
$$\langle T(x + y) - T(x) - T(y), T(x + y) - T(x) - T(y)\rangle = \langle x + y - x - y, x + y - x - y\rangle = 0$$
for all $x, y \in V$. This shows that $T(x + y) = T(x) + T(y)$ for all $x, y \in V$. Similarly, it can be shown that $T(\alpha x) = \alpha T(x)$ for all $\alpha \in \mathbb{R}$ and $x \in V$.

The proof of 3 ⟹ 4 is similar to the proof of 3 ⟹ 4 in Theorem 4.4.2.

4 ⟹ 5. Assume 4. Then $T^{*} = T^{-1}$. Hence
$$\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^{*}T(x)\rangle = \langle x, x\rangle = \|x\|^2$$
for all $x \in V$.

5 ⟹ 1. Assume 5. Since T is a linear transformation, $T(0) = 0$ and
$$\|T(x) - T(y)\| = \|T(x - y)\| = \|x - y\|$$
for all $x, y \in V$. In particular T is injective, and hence bijective, since V is finite dimensional. ∎

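Remark 4.4.11 The analogue of 1 ⟹ 2 is not valid in the complex case. For example, the map T from the usual complex inner product space $\mathbb{C}^n$ to itself defined by
$$T((z_1, z_2, \ldots, z_n)) = (\overline{z_1}, \overline{z_2}, \ldots, \overline{z_n})$$
preserves distance and fixes the origin, but does not preserve inner product (indeed, $\langle T(z), T(w)\rangle = \overline{\langle z, w\rangle}$).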

Remark 4.4.12 A distance preserving map from a real inner product space to itself need not be an orthogonal transformation. Indeed, the translation map $L_a$ from the usual inner product space $\mathbb{R}^n$ to itself defined by
$$L_a(x) = x + a, \quad a \neq 0,$$
preserves distance, but it is not a linear transformation, since $L_a(0) = a \neq 0$.

Let $(V, \langle\,,\rangle)$ be a real inner product space and $a \in V$. The map $L_a$ from V to V defined by $L_a(x) = x + a$ is a rigid motion, called the translation by a. The set (V) of all translations on V is a subgroup of the group M(V) of rigid motions, and it is isomorphic to (V, +) (the map $a \mapsto L_a$ is an isomorphism). Also O(V), the group of orthogonal linear transformations, is a subgroup of M(V).

Theorem 4.4.13 Every rigid motion of a finite dimensional real inner product space $(V, \langle\,,\rangle)$ is uniquely expressible as a product of a translation and an orthogonal linear transformation.

Proof Let $\varphi \in M(V)$ and $\varphi(0) = a$. Then the map T from V to V defined by $T(x) = \varphi(x) - a$ is also a rigid motion such that $T(0) = 0$. By Theorem 4.4.10, T is an orthogonal linear transformation, and $\varphi = L_a \circ T$. Further, suppose that $L_a \circ T = L_b \circ T'$, where T and T′ are orthogonal linear transformations. Then $a = L_a \circ T(0) = L_b \circ T'(0) = b$. This shows that $a = b$, and then $T = T'$. ∎
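The proof of Theorem 4.4.13 is constructive. The translation part of φ is $a = \varphi(0)$, and the orthogonal part is $x \mapsto \varphi(x) - a$, whose matrix can be read off from the images of the standard basis vectors. The sketch below works in $\mathbb{R}^2$ with a made-up motion `phi`. The step that recovers the matrix relies on the theorem's conclusion that this orthogonal part is linear.

```python
import math


def phi(x):
    """A sample rigid motion of R^2: rotate by pi/6, then translate by (2, -1)."""
    c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
    return (c * x[0] - s * x[1] + 2, s * x[0] + c * x[1] - 1)


def decompose(motion, n):
    """Return (a, M) with motion(x) = M x + a, following the proof of Theorem 4.4.13."""
    a = motion(tuple([0.0] * n))

    def t(x):  # T(x) = motion(x) - a fixes the origin, hence is orthogonal
        return tuple(p - q for p, q in zip(motion(x), a))

    cols = [t(tuple(1.0 if i == j else 0.0 for i in range(n))) for j in range(n)]
    m = [[cols[j][i] for j in range(n)] for i in range(n)]
    return a, m


a, M = decompose(phi, 2)
print(a)  # the translation part, (2.0, -1.0)
print(M)  # the orthogonal part
```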

Recall that a group G is said to be the (internal) semi-direct product of a normal subgroup H by a subgroup K if

(i) G = HK, and (ii) $H \cap K = \{e\}$.

In this case, every element g of G has a unique representation as g = hk, where $h \in H$ and $k \in K$.

Corollary 4.4.14 M(V) is the semi-direct product of (V) by O(V).

Proof It follows from the above result that M(V) = (V)O(V). Also, if $L_a \in O(V)$, then $a = L_a(0) = 0$, and so $L_a = I_V$. Thus, (V) ∩ O(V) = $\{I_V\}$.

Now we show that (V) is a normal subgroup of M(V). Let $L_a \circ T \in M(V)$ and let $L_b$ be in (V), where $T \in O(V)$. Then
$$(L_a \circ T)^{-1} \circ L_b \circ (L_a \circ T)(x) = x + T^{-1}(b) = L_{T^{-1}(b)}(x).$$

Hence $(L_a \circ T)^{-1} \circ L_b \circ (L_a \circ T) = L_{T^{-1}(b)}$, which lies in (V). ∎

The following corollary follows from the second isomorphism theorem.

Corollary 4.4.15 M(V)/(V) is isomorphic to O(V). ∎

Exercises

4.4.1 Define a map $\langle\,,\rangle$ from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}$ by

<(x1, x2, x3), (y1, y2, y3)>= x1 y1 + x2 y1 + x1 y2 + 2x2 y2 + 3x3 y2 + 3x2 y3 + x3 y3.

Show that $\langle\,,\rangle$ is an inner product on $\mathbb{R}^3$. Deduce that
$$|x_1y_1 + x_2y_1 + x_1y_2 + 2x_2y_2 + 3x_3y_2 + 3x_2y_3 + x_3y_3| \le \sqrt{x_1^2 + 2x_1x_2 + 2x_2^2 + 6x_2x_3 + x_3^2}\,\sqrt{y_1^2 + 2y_1y_2 + 2y_2^2 + 6y_2y_3 + y_3^2}$$
for all real numbers $x_i, y_i$. Further, find an orthonormal basis of $(\mathbb{R}^3, \langle\,,\rangle)$, and a linear transformation T from $\mathbb{R}^3$ to itself such that

< T (x), T (y)> = < x, y > for all x, y ∈ R3, where <>is the standard Euclidean inner product.

4.4.2 Define a map <> from R2 × R2 to R by

<(x1, x2), (y1, y2)> = x1 y1 + x1 y2 + 2x2 y1 + y1 y2.

Is $\langle\,,\rangle$ an inner product? Justify your answer.

4.4.3 Let $P_3(x)$ denote the vector space of all polynomials of degree at most 3 with coefficients in the field $\mathbb{R}$ of real numbers. For $f(x), g(x) \in P_3(x)$, define $\langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx$. Show that $\langle\,,\rangle$ defines an inner product on $P_3(x)$. Find an orthonormal basis of $P_3(x)$.

4.4.4∗ Let V be the vector space of all real valued smooth functions on $\mathbb{R}$, and

T = D3 − 6D2 + 11D − 6I

where $D = \frac{d}{dx}$ is the standard differential operator. Show that ker T is a 3-dimensional inner product space with respect to the inner product
$$\langle f, g\rangle = \int_{-1}^{1} f(x)g(x)\,dx.$$

Find an orthonormal basis of kerT.

4.4.5 Let A be a nonsingular 3 × 3 matrix with real entries. Show, by means of an example, that $\langle x, y\rangle = xAy^t$ need not define an inner product on $\mathbb{R}^3$. Show that it defines an inner product if and only if $A = BB^t$ for some nonsingular matrix B. Such a matrix is called a positive definite symmetric matrix.

4.4.6 Let C[0, 1] denote the vector space of all continuous functions on the closed interval [0, 1]. Show that $\langle f(x), g(x)\rangle = \int_0^1 f(x)g(x)\,dx$ defines an inner product on C[0, 1]. Show that the set $\{1\} \cup \{\sqrt{2}\sin 2n\pi x, \sqrt{2}\cos 2n\pi x \mid n \in \mathbb{N}\}$ forms an orthonormal set.

4.4.7 Find the largest and the smallest values of 3x − 4y + 2z, if they exist, on the sphere $x^2 + y^2 + z^2 = 4$, and also on the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$. Do they exist on a paraboloid, or on a hyperboloid?

4.4.8 Consider the standard real inner product space R4. Use the Gram–Schmidt process to determine an orthonormal basis of the subspace W generated by {(1, 1, 1, 1), (1, 2, 2, 2), (1, 2, 3, 3), (1, 0, 0, 0)}. What is the dimension of W?

4.4.9 Find a lower triangular matrix P with positive diagonal entries, if possible, so that PA is an orthogonal matrix, where
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{bmatrix}.$$

Also express A as A = LO (A = UO), where L is a lower (U an upper) triangular matrix with positive diagonal entries, and O is an orthogonal matrix.

4.4.10 Let W be the subspace of the standard Euclidean inner product space $\mathbb{R}^4$ generated by {(1, −1, 1, −1), (1, 1, 1, −3), (1, 2, −3, 0)}. Find the distance of (1, 1, 1, 1) from W, and also the foot of the perpendicular from (1, 1, 1, 1) to W.

4.4.11 Consider the usual inner product space $\mathbb{R}^3$. Find the shortest distance between the lines
$$\frac{x}{1} = \frac{y - 2}{2} = \frac{z}{1} \quad\text{and}\quad \frac{x - 1}{1} = \frac{y - 2}{1} = \frac{z - 3}{1},$$
and also find the line of shortest distance, if it exists.

4.4.12 In the standard inner product space $\mathbb{R}^4$, find the shortest distance between {(x, y, z, w) | x + y + z + w = 0 = x − y = z − w} and {(x, y, z, w) | x + z + 1 = 0}.

4.4.13 Consider the subspace W = {(x, y, z, w) | x + 2y + z + w = 0 = x + y − 2z = w} of the standard inner product space $\mathbb{R}^4$. Find an orthonormal basis of W, and also of $W^{\perp}$. Find also the component of (1, 1, 1, 1) along W and its component perpendicular to W.

4.4.14 Let T be the linear transformation from the standard inner product space R3 to itself defined by

T ((x, y, z)) = (x + y, y + z, z + x).

Find $T^{*}$, and also its matrix representation with respect to the standard basis.

4.4.15 Let W be a subspace of a finite-dimensional inner product space V, and let $x \in V$ be such that $\langle x, y\rangle + \langle y, x\rangle \le \langle y, y\rangle$ for all $y \in W$. Show that $x \in W^{\perp}$.

4.4.16 Let T be a normal linear transformation from V to V such that T(x) = 0. Show that $T^{*}(x) = 0$.

4.4.17 Let A be a bounded subset of a finite-dimensional real inner product space V, in the sense that there is a real number M such that $\|x\| \le M$ for all $x \in A$. Show that $\{f \in M(V) \mid f(A) \subseteq A\}$ is a subgroup of O(V).

4.4.18 Let $T \in \mathrm{End}(V)$, where V is a complex inner product space. Suppose that $T^{*}T(x) = 0$. Show that T(x) = 0.

4.4.19 Let T be a Hermitian linear transformation on a finite dimensional complex inner product space V. Let $x \in V$ be such that $T^m(x) = 0$ for some $m \ge 1$. Show that T(x) = 0.

4.4.20 Let V be a real inner product space of dimension n, and {x1, x2,...,xn} an orthonormal basis. Define a map η from Sn to O(V ) as follows. Let p ∈ Sn. Then η(p) is the unique orthogonal linear transformation whose effect on the orthonormal basis is given by η(p)(xi ) = x p(i). Show that η is an injective homomorphism. Deduce that every group of order n is isomorphic to a subgroup of O(V ).

4.4.21 Show that every group of order n is isomorphic to a subgroup of U(n).

4.4.22 Show that every finite subgroup of M(V ) is a subgroup of O(V ).

4.4.23 Find all finite subgroups of O(2).

4.4.24 Show that there is no proper open subspace of an inner product space.

4.4.25 Show that every nonempty open subset of an inner product space generates the space.

4.4.26 Show that the unit sphere $\{x \mid \|x\| = 1\}$ generates the space.

4.4.27 Show that every subspace of an inner product space is closed.

4.4.28 Show that every linear transformation from a finite dimensional inner product space to an inner product space is continuous.

4.4.29 Show that inner product map is also continuous.

4.4.30 Let W be a subspace of a finite dimensional inner product space V. Show that V/W is an inner product space with respect to the inner product defined by $\langle x + W, y + W\rangle = \text{g.l.b.}\{\langle x + u, y + v\rangle \mid u, v \in W\}$.

4.4.31 Let A be a skew-Hermitian transformation. Show that I + A and I − A are isomorphisms.

4.4.32 Let V be a finite dimensional real inner product space of dimension n. Define an inner product <>on End(V ) by

$$\langle T, T'\rangle = \sum_{i,j} \alpha_{ij}\beta_{ij},$$

where $T = \sum_{i,j} \alpha_{ij}T_{ij}$ and $T' = \sum_{i,j} \beta_{ij}T_{ij}$. Find the distance between the subspaces S(V) and SS(V).

4.4.33 Check, if the following transformations from R3 to itself are orthogonal transformations.

(i) $T(x_1, x_2, x_3) = (x_2, x_3, x_1)$
(ii) $T(x_1, x_2, x_3) = \big(\frac{x_1 + x_2}{\sqrt{2}}, \frac{x_1 - x_2}{\sqrt{2}}, x_3\big)$
(iii) $T(x_1, x_2, x_3) = \big(\frac{x_1 + x_2}{\sqrt{2}}, \frac{x_1 - x_2 + x_3}{\sqrt{3}}, \frac{x_1 - x_2 - 2x_3}{\sqrt{6}}\big)$
(iv) $T(x_1, x_2, x_3) = (x_1 + x_2, x_3, x_1)$.

4.4.34 Check whether the following transformations from $\mathbb{R}^3$ to itself are rigid motions.

(i) $T(x_1, x_2, x_3) = (x_2 + 1, x_3 + 2, x_1)$
(ii) $T(x_1, x_2, x_3) = \left(\frac{x_1 + x_2}{\sqrt{2}}, \frac{x_1 - x_2}{\sqrt{2}}, x_3\right)$
(iii) $T(x_1, x_2, x_3) = \left(\frac{x_1 + x_2}{\sqrt{2}} + 1, \frac{x_1 - x_2 + x_3}{\sqrt{3}}, \frac{x_1 - x_2 - 2x_3}{\sqrt{6}}\right)$
(iv) $T(x_1, x_2, x_3) = (x_1 + x_2, x_3, x_1)$.

4.4.35 Let $V$ be a real inner product space of dimension $n$ and $x \in V$, $x \neq 0$. Show that $P_x = \{y \in V \mid \langle x, y\rangle = 0\}$ is a hyperplane in $V$. Show that $P_x = P_y$ if and only if $x = \alpha y$ for some $\alpha \neq 0$. Further, show that every hyperplane is of the form $P_x$ for some $x$. Determine a bijective correspondence between the set of lines through the origin and the set of hyperplanes.

4.4.36 Let $P_x$ be the hyperplane determined by a nonzero element $x$ in a real inner product space $V$ as described in the above exercise. Let $\sigma_x$ be a linear transformation on $V$ which fixes the elements of $P_x$ and maps a vector orthogonal to $P_x$ to its negative. Show that $\sigma_x$ is uniquely defined, and that it is given by

$$\sigma_x(y) = y - \frac{2\langle y, x\rangle}{\langle x, x\rangle}\,x.$$

This linear transformation is called the reflection about the hyperplane $P_x$. Observe that $P_x = P_y$ if and only if $\sigma_x = \sigma_y$. Deduce that every reflection is an orthogonal linear transformation whose square is the identity. Further, find the matrix representation of $\sigma_x$ with respect to the standard basis.

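4.4.37 Let $\Phi$ be a finite set of generators of a real inner product space $V$ such that $0 \notin \Phi$. Let $\sigma \in GL(V)$ be such that (i) $\sigma(\Phi) = \Phi$, (ii) $\sigma$ fixes elementwise a hyperplane $P$, and (iii) $\sigma(\alpha) = -\alpha$ for some $\alpha \in \Phi$.

Show that $\sigma = \sigma_\alpha$ and $P = P_\alpha$.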

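4.4.38∗ Let $\Phi$ be as in the above exercise such that (i) $\alpha \in \Phi$ implies that $-\alpha \in \Phi$, (ii) $\sigma_\alpha(\Phi) = \Phi$ for all $\alpha \in \Phi$, and (iii) $\frac{2\langle\beta, \alpha\rangle}{\langle\alpha, \alpha\rangle} \in \mathbb{Z}$ for all $\alpha, \beta \in \Phi$. Show that $\Phi \cap \langle\alpha\rangle \subseteq \{\frac{1}{2}\alpha, -\frac{1}{2}\alpha, \alpha, -\alpha, 2\alpha, -2\alpha\}$ for all $\alpha \in \Phi$.

4.4.39∗ In the above exercise, if, in addition, $\Phi \cap \langle\alpha\rangle = \{\alpha, -\alpha\}$ for all $\alpha \in \Phi$, then $\Phi$ is called a root system. Let $\Phi$ be a root system and $\alpha, \beta \in \Phi$. Show that the angle between $\alpha$ and $\beta$ is one of the following: $0, \frac{\pi}{2}, \frac{\pi}{3}, \frac{2\pi}{3}, \frac{\pi}{4}, \frac{3\pi}{4}, \frac{\pi}{6}, \frac{5\pi}{6}, \pi$. Determine the ratio of their lengths in each case.

4.4.40∗ Determine all root systems in $\mathbb{R}^2$.

Chapter 5 Determinants and Forms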

In this chapter, we introduce the concept of determinant in various ways. Invariant subspaces, eigenvalues, the spectral theorem, the geometry of orthogonal transformations, and the geometry of bilinear forms also constitute the subject matter of this chapter.

5.1 Determinant of a Matrix

We define the determinant $\det(A)$ of an $n \times n$ matrix $A$ by induction on $n$ as follows. Let $A = [a_{ij}]$ be an $n \times n$ matrix, and let $A_{ij}$ denote the $(n-1)\times(n-1)$ submatrix obtained by deleting the $i$th row and the $j$th column of $A$.

Example 5.1.1 For the matrix

$$A = \begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix},\quad A_{11} = \begin{bmatrix}5&6\\8&9\end{bmatrix}\quad\text{and}\quad A_{23} = \begin{bmatrix}1&2\\7&8\end{bmatrix}.$$

Definition 5.1.2 If $A = [a_{11}]$ is a $1\times 1$ matrix, then define $\det(A) = a_{11}$. Assuming that the determinant of all $m\times m$ matrices, $m < n$, has already been defined, define the determinant of an $n\times n$ matrix $A$ by

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_5

$$\det(A) = \sum_{i=1}^{n}(-1)^{i+1}a_{i1}\det(A_{i1}).$$

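Thus,

$$\det\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix} = (-1)^{1+1}a_{11}\det(A_{11}) + (-1)^{2+1}a_{21}\det(A_{21}) = a_{11}a_{22} - a_{21}a_{12},$$

and

$$\begin{aligned}\det\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} &= (-1)^{1+1}a_{11}\det\begin{bmatrix}a_{22}&a_{23}\\a_{32}&a_{33}\end{bmatrix} + (-1)^{2+1}a_{21}\det\begin{bmatrix}a_{12}&a_{13}\\a_{32}&a_{33}\end{bmatrix} + (-1)^{3+1}a_{31}\det\begin{bmatrix}a_{12}&a_{13}\\a_{22}&a_{23}\end{bmatrix}\\ &= a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{21}(a_{12}a_{33} - a_{32}a_{13}) + a_{31}(a_{12}a_{23} - a_{22}a_{13}).\end{aligned}$$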
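The inductive definition translates directly into a recursive procedure. The following Python sketch expands along the first column exactly as in Definition 5.1.2 and reproduces the submatrices of Example 5.1.1. The function names `minor` and `det` are our own, chosen for illustration. The recursion takes on the order of $n!$ steps, so it is meant only for small matrices.

```python
def minor(a, i, j):
    """Return A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by Definition 5.1.2: expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    # (-1)**i with 0-based i equals (-1)**(i+1) with 1-based i
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(minor(A, 0, 0))  # [[5, 6], [8, 9]]  (A_11 in the text's notation)
print(minor(A, 1, 2))  # [[1, 2], [7, 8]]  (A_23)
print(det(A))          # 0
```

For a $2\times 2$ input the recursion reduces to $a_{11}a_{22} - a_{21}a_{12}$, in agreement with the formula above.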

Theorem 5.1.3 The determinant map det satisfies the following properties:

(i) det is linear in each row of the matrix. More precisely,

$$\det\begin{bmatrix}r_1\\ r_2\\ \vdots\\ ar_i + br_i'\\ \vdots\\ r_n\end{bmatrix} = a\det\begin{bmatrix}r_1\\ r_2\\ \vdots\\ r_i\\ \vdots\\ r_n\end{bmatrix} + b\det\begin{bmatrix}r_1\\ r_2\\ \vdots\\ r_i'\\ \vdots\\ r_n\end{bmatrix}$$

for each $i$.

(ii) If two distinct rows of a matrix $A$ are the same, then $\det(A) = 0$.

(iii) $\det(I_n) = 1$.

Proof (i) We prove (i) by induction on $n$. For $n = 1$, $\det([aa_{11} + ba'_{11}]) = aa_{11} + ba'_{11} = a\det([a_{11}]) + b\det([a'_{11}])$, and there is nothing to do. Assume that the result is true for matrices of order less than $n$. Let $A$ be a matrix of order $n$, and fix a row index $k$. Consider a general term $(-1)^{i+1}a_{i1}\det(A_{i1})$ in the sum defining $\det(A)$. If $i \neq k$, then $a_{i1}$ does not depend on the $k$th row, whereas, by the induction hypothesis, $\det(A_{i1})$ depends linearly on the $k$th row. If $i = k$, then $a_{k1}$ depends linearly on the $k$th row, whereas $\det(A_{k1})$ is independent of the $k$th row. Thus every term, and hence $\det(A)$, is linear in the $k$th row. (ii) Suppose that the $k$th row $r_k$ is the same as the $l$th row $r_l$ of $A$, where $k < l$. If $i \neq k$ and $i \neq l$, then two rows of $A_{i1}$ are the same, and so, by the induction hypothesis, $\det(A_{i1}) = 0$. Thus, all terms in the sum for $\det(A)$ are zero except perhaps $(-1)^{k+1}a_{k1}\det(A_{k1})$ and $(-1)^{l+1}a_{l1}\det(A_{l1})$. Hence $\det(A)$ equals

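$$(-1)^{k+1}a_{k1}\det(A_{k1}) + (-1)^{l+1}a_{l1}\det(A_{l1}).$$

Observe that $a_{k1} = a_{l1}$. Next, $A_{k1}$ can be obtained from $A_{l1}$ by $l - k - 1$ interchanges of consecutive rows. Since, by (i) and (ii) for matrices of order $n - 1$, interchanging two rows changes the sign of the determinant, $\det(A_{k1}) = (-1)^{l-k-1}\det(A_{l1})$. Substituting, we see that $\det(A) = a_{l1}\det(A_{l1})\big((-1)^{l} + (-1)^{l+1}\big) = 0$. (iii) Finally, if $A = I_n$, then the only nonzero term in the expression for $\det(I_n)$ is $(-1)^{1+1}\cdot 1\cdot\det(I_{n-1}) = 1$, by induction. □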

Remark 5.1.4 Imitating the proof of the above theorem, one can easily observe that, for each fixed $j$ and each fixed $i$ respectively, the associations $A \mapsto \sum_{i=1}^{n}(-1)^{i+j}a_{ij}\det(A_{ij})$ and $A \mapsto \sum_{j=1}^{n}(-1)^{i+j}a_{ij}\det(A_{ij})$ from the set of square matrices to the field $F$ also satisfy the three properties listed in the theorem. We shall show (see Corollary 5.3.16) that a map from $M_n$ to $F$ satisfying the three properties listed in the above theorem is unique. As such, it will follow that

$$\det(A) = \sum_{i=1}^{n}(-1)^{i+j}a_{ij}\det(A_{ij}) = \sum_{j=1}^{n}(-1)^{i+j}a_{ij}\det(A_{ij}).$$

Thus, we can expand the determinant along any row or any column. In turn, $\det(A) = \det(A^t)$.

Corollary 5.1.5 $\sum_{j=1}^{n}(-1)^{k+j}a_{ij}\det(A_{kj}) = 0$ for all $i \neq k$.

Proof Suppose that $i \neq k$. Consider the matrix $B = [b_{pq}]$, where $b_{pq} = a_{pq}$ if $p \neq k$ and $b_{kq} = a_{iq}$. In other words, $B$ is obtained by replacing the $k$th row of $A$ by the $i$th row. Thus, deleting the $k$th row and $j$th column of $A$ is the same as doing the same thing on $B$. More precisely, $A_{kj} = B_{kj}$. The expression $\sum_{j=1}^{n}(-1)^{k+j}a_{ij}\det(A_{kj})$ becomes $\sum_{j=1}^{n}(-1)^{k+j}b_{kj}\det(B_{kj})$, which is the expansion of $\det(B)$ along the $k$th row. Since two distinct rows of $B$ are the same, it follows from the above remark that $\sum_{j=1}^{n}(-1)^{k+j}b_{kj}\det(B_{kj}) = 0$. The result follows. □

Definition 5.1.6 Let $A$ be a square matrix of order $n$. Then $(-1)^{i+j}\det(A_{ij})$ is called the $(i, j)$ co-factor of $A$. The matrix $A^{cof} = [(-1)^{i+j}\det(A_{ij})]$ is called the co-factor matrix of $A$. The transpose $(A^{cof})^t$ of the co-factor matrix is called the adjoint of $A$, and it is denoted by $A^{adj}$. Thus, $A^{adj} = [b_{ij}]$, where $b_{ij} = (-1)^{j+i}\det(A_{ji})$.

Corollary 5.1.7 $A\cdot A^{adj} = \det(A)I_n = A^{adj}\cdot A$.

Proof Suppose that $A\cdot A^{adj} = [c_{ij}]$. Then

$$c_{ij} = \sum_{k=1}^{n}a_{ik}b_{kj} = \sum_{k=1}^{n}a_{ik}(-1)^{j+k}\det(A_{jk}).$$

By Corollary 5.1.5, $c_{ij} = 0$ if $i \neq j$, and, by the row expansion of Remark 5.1.4, $c_{ii} = \det(A)$. This proves that $A\cdot A^{adj} = \det(A)I_n$. Applying this to $A^t$, we get $A^t\cdot(A^t)^{adj} = \det(A^t)I_n = \det(A)I_n$. Further, it is also clear that $(A^t)^{adj} = (A^{adj})^t$. Hence $(A^{adj}\cdot A)^t = A^t(A^{adj})^t = A^t(A^t)^{adj} = \det(A)I_n$, and taking transposes, $A^{adj}\cdot A = \det(A)I_n$. □

Remark 5.1.8 We shall see a little later that $\det(AB) = \det(A)\det(B)$. Since $A\cdot A^{adj} = \det(A)I_n$, it follows that if $\det(A) \neq 0$, then $\det(A^{adj}) = (\det(A))^{n-1}$.

Corollary 5.1.9 $A$ is invertible if and only if $\det(A) \neq 0$, and then $A^{-1} = (\det(A))^{-1}A^{adj}$.

Proof Suppose that $A$ is invertible, and $A^{-1}A = I_n$. Then $1 = \det(A^{-1}\cdot A) = \det(A^{-1})\cdot\det(A)$. Hence $\det(A) \neq 0$. Conversely, if $\det(A) \neq 0$, then since $A\cdot A^{adj} = \det(A)I_n = A^{adj}\cdot A$, we have $A\cdot(\det(A))^{-1}A^{adj} = I_n = (\det(A))^{-1}A^{adj}\cdot A$. □

The above result gives us another method to find the inverse of a matrix. We illustrate it by means of the following example.

Example 5.1.10 Consider the matrix $A$ given by

$$A = \begin{bmatrix}1&1&1\\0&1&1\\0&0&2\end{bmatrix}.$$

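Here

$$(-1)^{1+1}\det(A_{11}) = 2,\quad (-1)^{1+2}\det(A_{12}) = 0 = (-1)^{1+3}\det(A_{13}),$$
$$(-1)^{2+1}\det(A_{21}) = -2,\quad (-1)^{2+2}\det(A_{22}) = 2,\quad (-1)^{2+3}\det(A_{23}) = 0 = (-1)^{3+1}\det(A_{31}),$$
$$(-1)^{3+2}\det(A_{32}) = -1\quad\text{and}\quad (-1)^{3+3}\det(A_{33}) = 1.$$

Thus, the co-factor matrix is

$$\begin{bmatrix}2&0&0\\-2&2&0\\0&-1&1\end{bmatrix}.$$

The adjoint $A^{adj}$, being the transpose of the co-factor matrix, is

$$\begin{bmatrix}2&-2&0\\0&2&-1\\0&0&1\end{bmatrix}.$$

Clearly, $\det(A) = 2$, and so the inverse of $A$ is

$$A^{-1} = \frac{1}{2}A^{adj} = \begin{bmatrix}1&-1&0\\0&1&-\frac{1}{2}\\0&0&\frac{1}{2}\end{bmatrix}.$$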
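The computation in Example 5.1.10 can be checked mechanically. This Python sketch proceeds in four steps:

1. It builds the co-factor matrix.
2. It transposes it to obtain $A^{adj}$.
3. It confirms $A\cdot A^{adj} = \det(A)I_n = A^{adj}\cdot A$, as in Corollary 5.1.7.
4. It divides by $\det(A)$ using exact rational arithmetic from the standard `fractions` module.

The helper names are our own, not notation from the text.

```python
from fractions import Fraction


def minor(a, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    # expansion along the first column (Definition 5.1.2)
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cofactor_matrix(a):
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)] for i in range(n)]


def adjugate(a):
    # the adjoint is the transpose of the co-factor matrix
    return [list(col) for col in zip(*cofactor_matrix(a))]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inverse(a):
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0, so A is not invertible")
    return [[Fraction(x) / d for x in row] for row in adjugate(a)]


A = [[1, 1, 1], [0, 1, 1], [0, 0, 2]]
print(cofactor_matrix(A))      # [[2, 0, 0], [-2, 2, 0], [0, -1, 1]]
print(adjugate(A))             # [[2, -2, 0], [0, 2, -1], [0, 0, 1]]
print(matmul(A, adjugate(A)))  # [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
```

Using `Fraction` keeps the entry $-\frac{1}{2}$ exact instead of producing a floating-point approximation.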

Corollary 5.1.11 (Cramer's rule) Consider a system of $n$ linear equations in $n$ unknowns given by the matrix equation

$$A\cdot X = B,$$

where $A$ is an invertible matrix, $X = [x_1, \ldots, x_n]^t$ and $B = [b_1, \ldots, b_n]^t$. Then

$$x_i = \sum_{j=1}^{n}(-1)^{i+j}b_j\frac{\det(A_{ji})}{\det(A)}\quad\text{for all } i.$$

Proof Since $A^{-1} = (\det(A))^{-1}A^{adj}$, it follows that $X = \frac{1}{\det(A)}A^{adj}\cdot B$. The result follows if we equate the rows. □
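Cramer's rule as stated can be transcribed directly. The following Python sketch evaluates $x_i = \sum_j (-1)^{i+j} b_j \det(A_{ji})/\det(A)$ with exact fractions. The name `cramer` is our own. With 0-based indices, `minor(a, j, i)` plays the role of $A_{ji}$. The sketch is meant to illustrate the formula, not to serve as an efficient solver.

```python
from fractions import Fraction


def minor(a, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    # expansion along the first column (Definition 5.1.2)
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cramer(a, b):
    """Solve A X = B via x_i = sum_j (-1)^(i+j) b_j det(A_ji) / det(A)."""
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule needs an invertible matrix A")
    return [Fraction(sum((-1) ** (i + j) * b[j] * det(minor(a, j, i)) for j in range(n)), d)
            for i in range(n)]


print(cramer([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
```

For the system $2x_1 + x_2 = 3$, $x_1 + 3x_2 = 5$, substituting $x_1 = \frac{4}{5}$ and $x_2 = \frac{7}{5}$ back into both equations confirms the output.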

5.2 Permutations

This is a brief section on permutations with the aim of introducing even and odd permutations. For details, one may refer to Algebra 1. A bijective map $p$ from $\{1, 2, \ldots, n\}$ to itself is called a permutation on $\{1, 2, \ldots, n\}$. The set $S_n$ of all permutations on $\{1, 2, \ldots, n\}$ is a group with respect to the composition of maps (product of permutations). We may represent an element $p \in S_n$ (without any ambiguity) by

$$\begin{pmatrix}1 & 2 & \cdots & n\\ p(1) & p(2) & \cdots & p(n)\end{pmatrix}.$$

Since $p$ is a bijective map, the second row is just a rearrangement (permutation) of $1, 2, \ldots, n$. Thus, any $p \in S_n$ gives a unique rearrangement as described above. Conversely, a rearrangement of $1, 2, \ldots, n$ gives rise to a unique bijective map from $\{1, 2, \ldots, n\}$ to itself by placing the rearrangement below $1\,2\,\cdots\,n$ as above. For example, if $n = 4$, the rearrangement 2314 of 1234 gives rise to the bijective map $p$ from $\{1, 2, 3, 4\}$ to itself given by $p(1) = 2$, $p(2) = 3$, $p(3) = 1$ and $p(4) = 4$. In the above notation,

$$p = \begin{pmatrix}1&2&3&4\\2&3&1&4\end{pmatrix}.$$

Thus, the members of $S_n$ can be viewed as permutations. The product $gf$ of the permutations

$$f = \begin{pmatrix}1 & 2 & \cdots & n\\ f(1) & f(2) & \cdots & f(n)\end{pmatrix}\quad\text{and}\quad g = \begin{pmatrix}1 & 2 & \cdots & n\\ g(1) & g(2) & \cdots & g(n)\end{pmatrix}$$

is given by

$$gf = \begin{pmatrix}1 & 2 & \cdots & n\\ g(f(1)) & g(f(2)) & \cdots & g(f(n))\end{pmatrix}.$$

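Example 5.2.1 If

$$p = \begin{pmatrix}1&2&3&4\\3&4&1&2\end{pmatrix}\quad\text{and}\quad q = \begin{pmatrix}1&2&3&4\\2&1&3&4\end{pmatrix},$$

then

$$pq = \begin{pmatrix}1&2&3&4\\4&3&1&2\end{pmatrix}\quad\text{and}\quad qp = \begin{pmatrix}1&2&3&4\\3&4&2&1\end{pmatrix}.$$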

Thus, $pq \neq qp$.

Cycles and Transpositions

Now, we consider special types of permutations. Consider, for example, the permutation

$$p = \begin{pmatrix}1&2&3&4&5&6\\2&5&3&1&6&4\end{pmatrix}.$$

$p$ takes 1 to 2, 2 to 5, 5 to 6, 6 to 4 and 4 to 1. The remaining symbol 3 is fixed. We can faithfully represent the permutation $p$ by the row $(1\,2\,5\,6\,4)$, with the understanding that each symbol goes to the following symbol, the last symbol is mapped to the first symbol, and the symbols not appearing in the row are kept fixed. Thus, the permutation

$$\begin{pmatrix}1&2&3&4&5&6&7\\1&5&2&4&7&6&3\end{pmatrix}$$

can be represented by $(2\,5\,7\,3)$, whereas

$$\begin{pmatrix}1&2&3&4\\2&1&4&3\end{pmatrix}$$

cannot be represented in this form.

Definition 5.2.2 A permutation $p \in S_n$ is called a cycle of length $r \geq 1$ if there exists a subset $\{i_1, i_2, \ldots, i_r\}$ of $\{1, 2, \ldots, n\}$ containing $r$ distinct elements such that $p(i_1) = i_2$, $p(i_2) = i_3$, $\ldots$, $p(i_{r-1}) = i_r$, $p(i_r) = i_1$, and $p(j) = j$ for all $j \notin \{i_1, i_2, \ldots, i_r\}$. The cycle $p$ is denoted by $(i_1\,i_2\,\cdots\,i_r)$. A cycle of length 2 is called a transposition. Thus, a transposition is represented by $(i\,j)$, which interchanges $i$ and $j$ and keeps the rest of the symbols fixed.

Theorem 5.2.3 Every nonidentity permutation can be written as a product of disjoint cycles. Further, any two representations of a nonidentity permutation as a product of disjoint nonidentity cycles are the same up to a rearrangement of the cycles.

Proof Let $p$ be a nonidentity permutation in $S_n$. Then there exists $i_1$ such that $p(i_1) \neq i_1$. Clearly, the $n + 1$ elements $i_1, p(i_1), p^2(i_1), \ldots, p^n(i_1)$ of $\{1, 2, \ldots, n\}$ cannot all be distinct. Hence there exist $r, s$ with $0 \leq r < s \leq n$ such that $p^r(i_1) = p^s(i_1)$, and so $p^{s-r}(i_1) = i_1$. Thus, there exists $t$, $1 < t \leq n$, such that $p^t(i_1) = i_1$. Let $l_1$ be the least positive integer such that $p^{l_1}(i_1) = i_1$. Given $m \in \mathbb{Z}$, by the division algorithm, there exist $q, r$ such that $m = l_1q + r$, where $0 \leq r \leq l_1 - 1$. But then $p^m(i_1) = p^r(i_1)$. It is clear from the above observation that the effect of the permutation $p$ on the symbols in $\{i_1, p(i_1), p^2(i_1), \ldots, p^{l_1-1}(i_1)\}$ is the same as that of the cycle $C_1 = (i_1\,p(i_1)\,p^2(i_1)\,\cdots\,p^{l_1-1}(i_1))$. If $p = C_1$, then there is nothing to do. If not, there exists $i_2 \notin \{i_1, p(i_1), p^2(i_1), \ldots, p^{l_1-1}(i_1)\}$ such that $p(i_2) \neq i_2$. As before, consider the cycle $C_2 = (i_2\,p(i_2)\,p^2(i_2)\,\cdots\,p^{l_2-1}(i_2))$, where $l_2$ is the smallest positive integer such that $p^{l_2}(i_2) = i_2$. Clearly, $C_1$ and $C_2$ are disjoint cycles. If $p = C_1C_2$, then there is nothing to do. If not, proceed. This process stops after finitely many steps, giving $p$ as a product of disjoint cycles, because there are only finitely many symbols. Finally, we prove the uniqueness. Suppose that $p \neq I$ and

$$p = C_1C_2\cdots C_r = C'_1C'_2\cdots C'_s,$$

where $C_i$ and $C_j$ are disjoint nonidentity cycles for $i \neq j$, and also $C'_k$ and $C'_l$ are disjoint nonidentity cycles for $k \neq l$. Suppose that $p(t) \neq t$. Then there exist $i, k$ such that $C_i(t) \neq t$ and $C'_k(t) \neq t$. Since disjoint cycles commute, we may assume that $C_1(t) \neq t$ and $C'_1(t) \neq t$. But then, using the arguments of the previous paragraph, we find that $C_1 = C'_1$. Canceling $C_1$ and $C'_1$, using induction, and the fact that a product of nonidentity disjoint cycles can never be the identity, we find that $r = s$ and, after a rearrangement, $C_i = C'_i$ for all $i$. □

Remark 5.2.4 The proof of the above theorem is algorithmic: it gives an algorithm to express a permutation as a product of disjoint cycles.

Proposition 5.2.5 Every cycle is a product of transpositions.

Proof $(i_1\,i_2\,\cdots\,i_r) = (i_1\,i_r)(i_1\,i_{r-1})\cdots(i_1\,i_2)$. □

Since every permutation is a product of disjoint cycles, we have the following corollary.

Corollary 5.2.6 Every permutation is a product of transpositions. □

Remark 5.2.7 The representation of a permutation as a product of transpositions is not unique. For example,

(1234) = (14)(13)(12) = (14)(13)(12)(24)(24) = (14)(23)(13).
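Remark 5.2.4 observes that the proof of Theorem 5.2.3 is an algorithm. A Python sketch of it follows, together with the factorization of Proposition 5.2.5. As a representation chosen for illustration, a permutation of $\{1, \ldots, n\}$ is stored as a tuple whose $(i-1)$th entry is $p(i)$. Products follow the text's convention $gf = g\circ f$, so the rightmost factor is applied first. The helper names are our own.

```python
def disjoint_cycles(p):
    """Nontrivial cycles of p, where p[i - 1] = p(i) on {1, ..., n}."""
    seen, cycles = set(), []
    for start in range(1, len(p) + 1):
        if start in seen or p[start - 1] == start:
            continue  # already placed in a cycle, or a fixed symbol
        cycle, x = [start], p[start - 1]
        seen.add(start)
        while x != start:  # follow start -> p(start) -> p^2(start) -> ...
            cycle.append(x)
            seen.add(x)
            x = p[x - 1]
        cycles.append(tuple(cycle))
    return cycles


def cycle_to_transpositions(c):
    """(i1 i2 ... ir) = (i1 ir)(i1 i_{r-1}) ... (i1 i2), as in Proposition 5.2.5."""
    return [(c[0], c[k]) for k in range(len(c) - 1, 0, -1)]


def compose(g, f):
    """The product gf = g o f (apply f first), as in the text."""
    return tuple(g[f[i] - 1] for i in range(len(f)))


def product(perms, n):
    """The product p1 p2 ... pk of permutations of {1, ..., n}, written left to right."""
    result = tuple(range(1, n + 1))
    for q in perms:
        result = compose(result, q)
    return result


def transposition(n, i, j):
    t = list(range(1, n + 1))
    t[i - 1], t[j - 1] = j, i
    return tuple(t)


p = (2, 5, 3, 1, 6, 4)      # 1->2, 2->5, 3->3, 4->1, 5->6, 6->4
print(disjoint_cycles(p))  # [(1, 2, 5, 6, 4)]
```

Multiplying the transpositions returned by `cycle_to_transpositions` with `product` recovers the original permutation. This gives a quick check of Proposition 5.2.5 on small cases.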

Alternating Map

Let p ∈ Sn. Consider the following rational number:

$$\frac{p(1)-p(2)}{1-2}\cdot\frac{p(1)-p(3)}{1-3}\cdots\frac{p(1)-p(n)}{1-n}\cdot\frac{p(2)-p(3)}{2-3}\cdots\frac{p(2)-p(n)}{2-n}\cdots\frac{p(n-1)-p(n)}{(n-1)-n}.$$

The above expression in short is denoted by

$$\prod_{1\le i<j\le n}\frac{p(i)-p(j)}{i-j}.$$

Proposition 5.2.8 $\displaystyle\prod_{1\le i<j\le n}\frac{p(i)-p(j)}{i-j} = \pm 1$ for all $p \in S_n$.

Definition 5.2.9 The map χ from Sn to {1, −1} defined by

$$\chi(p) = \prod_{1\le i<j\le n}\frac{p(i)-p(j)}{i-j}$$

is called the alternating map.

Theorem 5.2.10 The alternating map χ : Sn −→ { 1, −1} is a surjective map which takes any transposition to −1. Further, it is a homomorphism in the sense that χ(pq) = χ(p)χ(q) for all p, q ∈ Sn.

Proof We first show that χ is a homomorphism.

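$$\chi(pq) = \prod_{1\le i<j\le n}\frac{pq(i)-pq(j)}{i-j} = \prod_{1\le i<j\le n}\frac{p(q(i))-p(q(j))}{q(i)-q(j)}\cdot\prod_{1\le i<j\le n}\frac{q(i)-q(j)}{i-j} = \prod_{1\le i<j\le n}\frac{p(q(i))-p(q(j))}{q(i)-q(j)}\cdot\chi(q).$$

Since $q$ is a permutation of $1, 2, \ldots, n$, as $(i, j)$ runs over the pairs with $i < j$, the unordered pair $\{q(i), q(j)\}$ runs over all two-element subsets of $\{1, 2, \ldots, n\}$. Moreover, each factor $\frac{p(a)-p(b)}{a-b}$ is unchanged when $a$ and $b$ are interchanged. It follows that

$$\prod_{1\le i<j\le n}\frac{p(q(i))-p(q(j))}{q(i)-q(j)} = \chi(p).$$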

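This shows that $\chi$ is a homomorphism. Hence $\chi(p)\chi(p^{-1}) = \chi(I) = 1$ for all permutations $p$. Consider the transposition $\tau = (1\,2)$. The factors with $\{i, j\} \cap \{1, 2\} = \emptyset$ are equal to 1, and the remaining factors other than the first cancel in pairs. Hence

$$\chi(\tau) = \frac{2-1}{1-2}\cdot\frac{2-3}{1-3}\cdots\frac{2-n}{1-n}\cdot\frac{1-3}{2-3}\cdot\frac{1-4}{2-4}\cdots\frac{1-n}{2-n} = -1.$$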

Consider a general transposition $\sigma = (k\,l)$. Take a permutation $p \in S_n$ for which $p(1) = k$ and $p(2) = l$; clearly, such a permutation exists. Then $p\tau p^{-1} = \sigma$. Hence $\chi(\sigma) = \chi(p)\chi(\tau)\chi(p^{-1}) = -1$. Thus, $\chi$ takes every transposition to $-1$; since also $\chi(I) = 1$, $\chi$ is surjective for $n \geq 2$. □
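The product formula of Definition 5.2.9 can be evaluated exactly with rational arithmetic. The Python sketch below stores a permutation $p$ as a tuple whose $(i-1)$th entry is $p(i)$; the helper names are our own. It checks three facts on $S_4$:

1. $\chi$ takes only the values $\pm 1$ (Proposition 5.2.8).
2. $\chi$ is a homomorphism.
3. Transpositions go to $-1$ (Theorem 5.2.10).

```python
from fractions import Fraction
from itertools import permutations


def chi(p):
    """Alternating map of Definition 5.2.9; p[i - 1] = p(i)."""
    n = len(p)
    value = Fraction(1)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            value *= Fraction(p[i - 1] - p[j - 1], i - j)
    return value


def compose(g, f):
    """The product gf = g o f (apply f first)."""
    return tuple(g[f[i] - 1] for i in range(len(f)))


S4 = list(permutations(range(1, 5)))
print(sorted({int(chi(p)) for p in S4}))  # [-1, 1]
print(chi((2, 1, 3, 4)) == -1)            # True: chi of the transposition (1 2)
```

Checking all $24 \times 24$ products in $S_4$ is only a finite sanity test. It does not replace the proof above.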

Corollary 5.2.11 Let $p \in S_n$. Suppose that

$$p = \sigma_1\sigma_2\cdots\sigma_r = \tau_1\tau_2\cdots\tau_s,$$

where the $\sigma_i$ and $\tau_j$ are transpositions. Then $r \equiv s \pmod 2$, i.e., 2 divides $r - s$ (equivalently, $r$ and $s$ are either both even or both odd).

Proof From the above theorem, it follows that

χ(p) = χ(σ1)χ(σ2) ···χ(σr) = χ(τ1)χ(τ2) ···χ(τs).

Since $\chi$ takes every transposition to $-1$, $(-1)^r = (-1)^s$. Hence $r - s$ is even. □

Remark 5.2.12 From the above corollary, it follows that if a permutation can be written as a product of an even number of transpositions, then it cannot be written as a product of an odd number of transpositions, and if it can be written as a product of an odd number of transpositions, then it cannot be written as a product of an even number of transpositions.

Definition 5.2.13 A permutation $p$ is called an even permutation if it can be expressed as a product of an even number of transpositions, or equivalently, $\chi(p) = 1$. It is said to be an odd permutation if it can be expressed as a product of an odd number of transpositions. We also write $\mathrm{sign}(p) = 1$ if $p$ is an even permutation, and $\mathrm{sign}(p) = -1$ if $p$ is an odd permutation. Thus, $\chi(p)$ is also written as $\mathrm{sign}(p)$. The set $A_n$ of all even permutations is a subgroup of $S_n$, called the alternating group.

Example 5.2.14 Consider the permutation $p$ given by

$$p = \begin{pmatrix}1&2&3&4&5\\2&4&5&1&3\end{pmatrix}.$$

Then $p = (1\,2\,4)(3\,5) = (1\,4)(1\,2)(3\,5)$ is a product of 3 transpositions. Hence $p$ is an odd permutation, and so $\mathrm{sign}(p) = \chi(p) = -1$.

Proposition 5.2.15 Let $\tau$ be a transposition in $S_n$. Then $A_n\tau = \{p\tau \mid p \in A_n\}$ and $A_n$ are disjoint, and $S_n = A_n \cup A_n\tau$.

Proof This follows from the fact that $A_n\tau$ is the set of all odd permutations, whereas $A_n$ is the set of all even permutations. □

5.3 Alternating Forms, Determinant of an Endomorphism

Let $V_1, V_2, \ldots, V_r$ and $W$ be vector spaces over a field $F$. A map $f$ from $V_1 \times V_2 \times \cdots \times V_r$ to $W$ is called a multilinear map if

$$f(x_1, \ldots, x_{i-1}, ax_i + bx'_i, x_{i+1}, \ldots, x_r) = af(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_r) + bf(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_r)$$

for all $a, b \in F$, $x_j \in V_j$, and $x'_i \in V_i$. Thus, a multilinear map is a map which is linear in each coordinate. If $V_i = V$ for each $i$, then it is called an $r$-linear map on $V$. If, in addition, $W = F$, then it is said to be an $r$-linear form on $V$. A 2-linear form on $V$ is also called a bilinear form on $V$. The vector product on $\mathbb{R}^3$ is a bilinear map from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}^3$, and the scalar product on $\mathbb{R}^3$, or in general, an inner product on a real vector space $V$, is a bilinear form on $V$.

Definition 5.3.1 An $r$-linear map $f$ on a vector space $V$ is called an $r$-alternating map if $f(x_1, x_2, \ldots, x_r) = 0$ whenever $x_i = x_j$ for some $i \neq j$.

The vector product on $\mathbb{R}^3$ is a 2-alternating map. The map $f$ from $\mathbb{R}^2 \times \mathbb{R}^2$ to $\mathbb{R}$ given by $f((a_1, a_2), (b_1, b_2)) = a_1b_2 - a_2b_1$ is a 2-alternating form on $\mathbb{R}^2$. The map $(a, b, c) \mapsto (a \times b)\cdot c$ defines a 3-alternating form (called the volume form) on $\mathbb{R}^3$. The sum of two $r$-alternating maps from $V$ to $W$ is an $r$-alternating map (verify), and a scalar multiple of an $r$-alternating map is also an $r$-alternating map. Thus, the set $A^r(V, W)$ of all $r$-alternating maps forms a vector space with respect to the above operations. Next, let $T$ be a linear transformation from $V$ to $W$. The map $T^r$ from $V^r$ to $W^r$ defined by $T^r(x_1, x_2, \ldots, x_r) = (T(x_1), T(x_2), \ldots, T(x_r))$ is a linear transformation. If $f$ is an $r$-alternating map from $W$ to $U$, and $T$ a linear transformation from $V$ to $W$, then $f \circ T^r$ defines an $r$-alternating map from $V$ to $U$ (verify). This defines a linear transformation $A^r(T)$ from $A^r(W, U)$ to $A^r(V, U)$. The following properties can be verified easily.

1. $A^r(I_V) = I_{A^r(V, U)}$.
2. $A^r(T_2 \circ T_1) = A^r(T_1) \circ A^r(T_2)$, where $T_1$ is a linear transformation from $W_1$ to $W_2$ and $T_2$ is a linear transformation from $W_2$ to $W_3$.

In particular, $A^r$ defines a map from $\mathrm{End}(V)$ to $\mathrm{End}(A^r(V))$.

Proposition 5.3.2 Let $f$ be an $r$-alternating form on $V$, and $\{x_1, x_2, \ldots, x_r\}$ a linearly dependent set. Then $f(x_1, x_2, \ldots, x_r) = 0$.

Proof Under the hypothesis, there is an $i$ such that $x_i$ is a linear combination of the remaining $x_j$. Substitute this linear combination in the $i$th coordinate and expand by multilinearity. Each resulting term has a repeated argument, so, since $f$ is alternating, every term vanishes. □

Corollary 5.3.3 Let $V$ be a vector space of dimension $n$. Then $A^r(V, W) = \{0\}$ for all $r > n$. □

Proposition 5.3.4 Let $f$ be an $r$-alternating map on $V$. Then

f (x1, x2,...,xi−1, xi, xi+1,...,xj,...,xr) = −f (x1, x2,...,xi−1, xj, xi+1,...,xj−1, xi, xj+1,...,xr).

In other words, if we interchange two coordinates, then the value of f changes its sign. Proof From the definition of alternating map, it follows that

f (x1, x2,...,xi−1, xi + xj, xi+1,...,xj−1, xi + xj, xj+1,...,xr) = 0.

Expanding, and observing that

f (x1, x2,...,xi−1, xi, xi+1,...,xj−1, xi, xj+1,...,xr) = 0 = f (x1, x2,...,xi−1, xj, xi+1,...,xj−1, xj, xj+1,...,xr),

we obtain f (x1,...,xi,...,xj,...,xr) + f (x1,...,xj,...,xi,...,xr) = 0, and the result follows. □

The above result can be restated as follows.

Proposition 5.3.5 Let $f$ be an $r$-alternating map on $V$, and $\tau$ a transposition in $S_r$. Then $f(x_{\tau(1)}, x_{\tau(2)}, \ldots, x_{\tau(r)}) = -f(x_1, x_2, \ldots, x_r)$.

In general, we have the following proposition.

Proposition 5.3.6 Let $f$ be an $r$-alternating map from $V$ to $W$, and $p \in S_r$. Then

$$f(x_{p(1)}, x_{p(2)}, \ldots, x_{p(r)}) = \mathrm{sign}(p)f(x_1, x_2, \ldots, x_r),$$

where $\mathrm{sign}(p) = 1$ if $p$ is an even permutation, and $\mathrm{sign}(p) = -1$ if $p$ is an odd permutation.

Proof Write $p = \tau_1\tau_2\cdots\tau_m$ as a product of transpositions. Applying the above result successively, we see that

$$f(x_{p(1)}, x_{p(2)}, \ldots, x_{p(r)}) = (-1)^m f(x_1, x_2, \ldots, x_r).$$

Since $(-1)^m = \mathrm{sign}(p)$, the result follows. □
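Proposition 5.3.6 can be tested on the volume form $f(a, b, c) = (a\times b)\cdot c$ on $\mathbb{R}^3$ introduced earlier in this section. In this Python sketch the sign of a permutation is obtained by counting inversions. This is a different route from the product formula of Definition 5.2.9, but it gives the same value, because the factor $\frac{p(i)-p(j)}{i-j}$ is negative exactly when $i < j$ and $p(i) > p(j)$. The sample vectors and helper names are arbitrary choices of ours.

```python
from itertools import permutations


def volume_form(a, b, c):
    """(a x b) . c : a 3-alternating form on R^3."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return sum(x * y for x, y in zip(cross, c))


def sign(p):
    # (-1)^(number of inversions); agrees with chi(p)
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


x = [(1, 0, 2), (3, 1, 1), (0, 2, 5)]
base = volume_form(*x)
print(base)  # 15
for p in permutations(range(3)):
    # f(x_p(1), x_p(2), x_p(3)) = sign(p) f(x_1, x_2, x_3)
    assert volume_form(x[p[0]], x[p[1]], x[p[2]]) == sign(p) * base
assert volume_form(x[0], x[0], x[2]) == 0  # a repeated argument gives 0
```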

Proposition 5.3.7 Let $f$ be an $r$-alternating map on $V$. Let $\{x_1, x_2, \ldots, x_r\}$ and $\{y_1, y_2, \ldots, y_r\}$ be subsets of $V$. Suppose that $y_j = \sum_{i=1}^{r}a_{ij}x_i$. Then

$$f(y_1, y_2, \ldots, y_r) = \sum_{p\in S_r}\mathrm{sign}(p)a_{p(1)1}a_{p(2)2}\cdots a_{p(r)r}\,f(x_1, x_2, \ldots, x_r) = \sum_{p\in S_r}\mathrm{sign}(p)\prod_{i=1}^{r}a_{p(i)i}\,f(x_1, x_2, \ldots, x_r).$$

Proof $f(y_1, y_2, \ldots, y_r) = f\big(\sum_{i=1}^{r}a_{i1}x_i, \sum_{i=1}^{r}a_{i2}x_i, \ldots, \sum_{i=1}^{r}a_{ir}x_i\big)$. Expanding by multilinearity, and keeping in mind that the value of $f$ is 0 whenever two arguments are the same, we see that

$$f(y_1, y_2, \ldots, y_r) = \sum_{p\in S_r}a_{p(1)1}a_{p(2)2}\cdots a_{p(r)r}\,f(x_{p(1)}, x_{p(2)}, \ldots, x_{p(r)}).$$

Using the above proposition we find that

$$f(y_1, y_2, \ldots, y_r) = \sum_{p\in S_r}\mathrm{sign}(p)\prod_{i=1}^{r}a_{p(i)i}\,f(x_1, x_2, \ldots, x_r). \quad\square$$

Corollary 5.3.8 Let $V$ be a vector space with a basis $\{x_1, x_2, \ldots, x_n\}$ and $f$ an $n$-alternating form on $V$. Then $f$ is uniquely determined by its value $f(x_1, x_2, \ldots, x_n)$. □

Theorem 5.3.9 Let $V$ be a vector space of dimension $n$ over a field $F$. Then the dimension of $A^n(V, F)$ is 1.

Proof Let $\{u_1, u_2, \ldots, u_n\}$ be a basis of $V$. Then every $n$-alternating form $f$ is uniquely determined by its value $f(u_1, u_2, \ldots, u_n)$. Indeed, given $(x_1, x_2, \ldots, x_n)$ with $x_j = \sum_{i=1}^{n}a_{ij}u_i$, Proposition 5.3.7 gives

$$f(x_1, x_2, \ldots, x_n) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}\,f(u_1, u_2, \ldots, u_n).$$

This shows that the dimension of $A^n(V, F)$ is at most 1 (any two $n$-alternating forms are proportional). It is sufficient, therefore, to show that $A^n(V, F) \neq \{0\}$. We show that the map $f$ defined by

$$f(x_1, x_2, \ldots, x_n) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i},$$

where $x_j = \sum_{i=1}^{n}a_{ij}u_i$, is a nonzero $n$-alternating form on $V$. Clearly, $f(u_1, u_2, \ldots, u_n) = 1$, for then the matrix $[a_{ij}]$ is the identity matrix, so that $a_{ii} = 1$ for all $i$ and $a_{ij} = 0$ for $i \neq j$. It is clearly an $n$-linear map. We show that it is alternating. Suppose that $x_j = x_k$ with $j \neq k$. Then $a_{ij} = a_{ik}$ for all $i$. Let $p$ be a permutation and $\tau = (j\,k)$. Then $a_{p(\tau(j))j} = a_{p(k)j} = a_{p(k)k}$ and $a_{p(\tau(k))k} = a_{p(j)k} = a_{p(j)j}$, and for $l \notin \{j, k\}$, $a_{p(\tau(l))l} = a_{p(l)l}$. It follows that $a_{p(1)1}a_{p(2)2}\cdots a_{p(n)n} = a_{p\tau(1)1}a_{p\tau(2)2}\cdots a_{p\tau(n)n}$, and so

$$\mathrm{sign}(p)a_{p(1)1}a_{p(2)2}\cdots a_{p(n)n} = -\mathrm{sign}(p\tau)a_{p\tau(1)1}a_{p\tau(2)2}\cdots a_{p\tau(n)n}.$$

Now $S_n$ is the disjoint union of $A_n$ and $A_n\tau = \{p\tau \mid p \in A_n\}$. Hence,

$$\sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i} = \sum_{p\in A_n}\Big(\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i} + \mathrm{sign}(p\tau)\prod_{i=1}^{n}a_{p\tau(i)i}\Big) = 0.$$

This shows that $f(x_1, x_2, \ldots, x_n) = 0$ whenever $x_j = x_k$ for some $j \neq k$. Thus, $f$ is alternating. □

Let $T$ be an endomorphism of $V$, where $V$ is a vector space of dimension $n$ over a field $F$. Then $T$ induces a linear transformation $A^n(T)$ from $A^n(V, F)$ to itself. Since $A^n(V, F)$ is of dimension 1, it follows that $A^n(T)$ is multiplication by a scalar. This scalar is denoted by $\det(T)$, and it is called the determinant of $T$. Thus, $A^n(T)(f) = \det(T)\cdot f$. This defines a map $\det$ from $\mathrm{End}(V)$ to $F$, called the determinant map on $\mathrm{End}(V)$. Since $A^n(T_1 \circ T_2) = A^n(T_2)\circ A^n(T_1)$, we have the following corollary.

Corollary 5.3.10 $\det(T_1 \circ T_2) = \det(T_2)\cdot\det(T_1) = \det(T_1)\cdot\det(T_2)$. □
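The definition of $\det(T)$ through $A^n(T)$ can be observed numerically on $V = \mathbb{R}^3$ with the volume form $f(a, b, c) = (a \times b)\cdot c$. Since $f(e_1, e_2, e_3) = 1$, the scalar $\det(T)$ with $f \circ T^3 = \det(T)\,f$ can be read off as $f(T(e_1), T(e_2), T(e_3))$. In the sketch below, matrices stand for $T$ acting on column vectors; the helper names and sample matrices are our own. It checks that this single scalar rescales $f$ on another triple, and that it is multiplicative as Corollary 5.3.10 states.

```python
def volume_form(a, b, c):
    """f(a, b, c) = (a x b) . c on R^3."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return sum(x * y for x, y in zip(cross, c))


def image(t, x):
    """T(x) for the matrix t acting on a column vector x."""
    return tuple(sum(t[i][k] * x[k] for k in range(3)) for i in range(3))


def matmul(s, t):
    """Matrix of the composite S o T."""
    return [[sum(s[i][k] * t[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def det_via_form(t):
    """The scalar det(T) with f o T^3 = det(T) f, read off at (e1, e2, e3)."""
    return volume_form(*(image(t, e) for e in E))  # uses f(e1, e2, e3) = 1


T1 = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
T2 = [[2, 1, 1], [0, 3, 0], [1, 0, 1]]
x, y, z = (1, 0, 2), (3, 1, 1), (0, 2, 5)
print(det_via_form(T1), det_via_form(T2))  # 25 3
```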

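Corollary 5.3.11 Let $T$ be a linear transformation from a vector space $V$ to itself. Let $\{u_1, u_2, \ldots, u_n\}$ be a basis of $V$. Suppose that $T(u_j) = \sum_{i=1}^{n}a_{ij}u_i$. Then $\det(T) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}$.

Proof Let $f$ be a nonzero $n$-alternating form on $V$. Such an $f$ exists by Theorem 5.3.9, and $f(u_1, u_2, \ldots, u_n) \neq 0$ by Corollary 5.3.8. Then, by definition, $A^n(T)(f) = f\circ T^n$. Now,

$$\begin{aligned}(f\circ T^n)(u_1, u_2, \ldots, u_n) &= f(T(u_1), T(u_2), \ldots, T(u_n))\\ &= f\Big(\sum_{i=1}^{n}a_{i1}u_i, \sum_{i=1}^{n}a_{i2}u_i, \ldots, \sum_{i=1}^{n}a_{in}u_i\Big)\\ &= \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}\,f(u_1, u_2, \ldots, u_n) = \det(T)f(u_1, u_2, \ldots, u_n).\end{aligned}$$

Dividing by $f(u_1, u_2, \ldots, u_n)$ shows that

$$\det(T) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}.$$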
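The formula of Corollary 5.3.11 (a sum over $S_n$) can be transcribed directly into Python. In this sketch, `sign` is computed by counting inversions. That is a different but equivalent route to $\chi(p)$, since the factor $\frac{p(i)-p(j)}{i-j}$ is negative exactly when $i<j$ and $p(i)>p(j)$. Indices are 0-based, and the function names are our own. The sum has $n!$ terms, so this is for illustration on small matrices only.

```python
from itertools import permutations
from math import prod


def sign(p):
    """(-1)^(number of inversions) of a permutation given as a sequence; equals chi(p)."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Sum over p in S_n of sign(p) * a[p(0)][0] * ... * a[p(n-1)][n-1] (0-based)."""
    n = len(a)
    return sum(sign(p) * prod(a[p[i]][i] for i in range(n))
               for p in permutations(range(n)))


def transpose(a):
    return [list(col) for col in zip(*a)]


A = [[1, 1, 1], [0, 1, 1], [0, 0, 2]]
print(det_leibniz(A))             # 2
print(det_leibniz(transpose(A)))  # 2, in line with det(T) = det(T^t)
```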

The following corollary is immediate from the above corollary.

Corollary 5.3.12 $\det(I_V) = 1$. □

Corollary 5.3.13 Let $T$ be a linear transformation from $V$ to $V$. Then $T$ is invertible if and only if $\det(T) \neq 0$.

Proof Suppose that $T$ is invertible. Then $T\circ T^{-1} = I_V$. Hence $1 = \det(T\circ T^{-1}) = \det(T)\cdot\det(T^{-1})$. This shows that $\det(T) \neq 0$. Conversely, suppose that $\det(T) \neq 0$. Let $\{u_1, u_2, \ldots, u_n\}$ be a basis of $V$. It is sufficient to show that $\{T(u_1), T(u_2), \ldots, T(u_n)\}$ is linearly independent. Suppose not. Then, for any $f \in A^n(V)$, Proposition 5.3.2 gives $A^n(T)(f)(u_1, u_2, \ldots, u_n) = f(T(u_1), T(u_2), \ldots, T(u_n)) = 0$, while $A^n(T)(f)(u_1, u_2, \ldots, u_n) = \det(T)f(u_1, u_2, \ldots, u_n)$. Since there is an $f \in A^n(V)$ such that $f(u_1, u_2, \ldots, u_n) \neq 0$, it follows that $\det(T) = 0$. This is a contradiction. □

Corollary 5.3.14 $\det(T) = \det(T^t)$.

Proof Let $T$ be a linear transformation on $V$ and $\{u_1, u_2, \ldots, u_n\}$ a basis of $V$. Consider the dual basis $\{u_1^*, u_2^*, \ldots, u_n^*\}$. Suppose that $T(u_j) = \sum_{i=1}^{n}a_{ij}u_i$. Then $T^t(u_j^*) = \sum_{i=1}^{n}b_{ij}u_i^*$, where $b_{ij} = a_{ji}$. Thus (see the above corollary),

$$\det(T^t) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}b_{p(i)i} = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{ip(i)}.$$

Since $\prod_{i=1}^{n}a_{p(i)i} = \prod_{i=1}^{n}a_{ip^{-1}(i)}$, $\mathrm{sign}(p) = \mathrm{sign}(p^{-1})$, and $p \mapsto p^{-1}$ is a bijective map on $S_n$, we have

$$\sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i} = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{ip(i)}.$$

The result follows. □

Corollary 5.3.15 Any map $f$ from the set $M_n(F)$ of $n\times n$ matrices with entries in $F$ to $F$ which is linear in each row and which is 0 on matrices with two equal rows is uniquely determined by its value $f(I_n)$ on the identity matrix $I_n$. In fact,

$$f(A) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}\,f(I_n).$$

Proof Take $V = F^n$ and realize an $n\times n$ matrix $A$ as an element $(r_1, r_2, \ldots, r_n)$ of $V^n$, where $r_i = [a_{i1}, a_{i2}, \ldots, a_{in}]$ is the $i$th row of $A$. Then $r_i = a_{i1}e_1 + a_{i2}e_2 + \cdots + a_{in}e_n$. Since $f$ is $n$-alternating, it follows from the above theorem (together with the identity established in the proof of Corollary 5.3.14) that

$$f(A) = f(r_1, r_2, \ldots, r_n) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}\,f(e_1, e_2, \ldots, e_n) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}\,f(I_n). \quad\square$$

In particular, we have the following corollary.

Corollary 5.3.16 The map from the set $M_n(F)$ of $n\times n$ matrices with entries in $F$ to $F$ given by $A \mapsto \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}$ is the unique map which is linear in each row, which is 0 on matrices with two equal rows, and which is 1 on $I_n$. □

The following corollary follows from Theorem 5.1.3 and the above corollary.

Corollary 5.3.17 $\det(A) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i}$.

Let $A = [a_{ij}]$ be an $n\times n$ matrix with entries in a field $F$. Then $A$ defines a linear transformation $L_A$ from the vector space $F^n$ to itself by

$$L_A\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} = A\cdot\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix}.$$

Let $\{e_1, e_2, \ldots, e_n\}$ be the standard basis of $F^n$ (we write the elements of $F^n$ as columns). Then $L_A(e_j) = \sum_{i=1}^{n}a_{ij}e_i$. In other words, $M^{e_1, e_2, \ldots, e_n}_{e_1, e_2, \ldots, e_n}(L_A) = A$. It follows that

$$\det(L_A) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i} = \det(A).$$

To summarize, we list some of the important properties of the determinant of matrices/linear transformations. The first 3 properties are the defining properties, and the rest are consequences which are useful in computations and other discussions.

1. $\det(I_n) = 1 = \det(I_{F^n})$.
2. The determinant is a multilinear map on the rows/columns of matrices.
3. The determinant of a matrix is zero whenever two distinct rows/columns are the same.
4. $\det(A) = \det(A^t) = \sum_{p\in S_n}\mathrm{sign}(p)\prod_{i=1}^{n}a_{ip(i)}$.

The following property, which is a consequence of 2, 3 and 4, is useful in computing the determinant of a matrix; see the example below.

5. The determinant of a matrix does not change if we add a multiple of a row (column) to another row (column).
6. $\det(A\cdot B) = \det(A)\cdot\det(B)$. This follows from the facts that $L_{A\cdot B} = L_A\circ L_B$ and $\det(L_A\circ L_B) = \det(L_A)\cdot\det(L_B) = \det(A)\cdot\det(B)$.
7. $A$ is invertible if and only if $\det(A) \neq 0$. This follows from the facts that (i) $A$ is invertible if and only if $L_A$ is invertible, and (ii) $L_A$ is invertible if and only if $\det(A) = \det(L_A) \neq 0$.
8. The determinant of an upper (lower) triangular matrix is the product of its diagonal entries. Indeed, let $A = [a_{ij}]$ be an upper triangular matrix, so that $a_{ij} = 0$ for $i > j$. Let $p \in S_n$. If $p$ is a nonidentity permutation, then $p(i) > i$ for at least one $i$. Hence the term $\mathrm{sign}(p)\prod_{i=1}^{n}a_{p(i)i} = 0$ for every nonidentity permutation $p$. This shows that $\det(A) = \prod_{i=1}^{n}a_{ii}$. In particular, the determinant of a diagonal matrix is the product of its diagonal entries: $\det(aI_n) = a^n$, $\det(E^{\lambda}_{ij}) = 1$.
9. The determinant of the permutation matrix $A_p$ determined by the permutation $p$ is $\mathrm{sign}(p)$ (verify).
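Properties 5 and 8, together with the sign change under a row interchange (Proposition 5.3.4 applied to rows), give the usual way of computing determinants by hand: reduce to triangular form and multiply the diagonal entries. The following Python sketch does this with exact rational arithmetic. The function name is our own. Choosing the first nonzero entry as pivot is just one possible strategy.

```python
from fractions import Fraction


def det_elimination(a):
    """Determinant via reduction to upper triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    d = Fraction(1)
    for col in range(n):
        # find a row at or below the diagonal with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no nonzero entry on or below the diagonal: det = 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]  # a row interchange flips the sign
            d = -d
        for r in range(col + 1, n):
            # adding a multiple of one row to another does not change det (property 5)
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
        d *= m[col][col]  # property 8: multiply the diagonal entries
    return d


print(det_elimination([[1, 1, 1, 1], [1, 2, 2, 2], [1, 2, 3, 3], [1, 2, 3, 4]]))  # 1
```

Elimination needs on the order of $n^3$ arithmetic operations. Both the sum over $S_n$ and recursive expansion need on the order of $n!$.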

Example 5.3.18 Consider the $n\times n$ matrix $A_n = [a_{ij}]$, where $a_{ij} = \min(i, j)$. For example,

$$A_4 = \begin{bmatrix}1&1&1&1\\1&2&2&2\\1&2&3&3\\1&2&3&4\end{bmatrix}.$$

Subtracting the first row of $A_n$ from each of the subsequent rows, $A_n$ reduces to

$$\begin{bmatrix}1 & \mathbf{1}_{1\times(n-1)}\\ \mathbf{0}_{(n-1)\times 1} & A_{n-1}\end{bmatrix},$$

where $\mathbf{1}_{1\times(n-1)}$ is the $1\times(n-1)$ matrix with each entry 1, and $\mathbf{0}_{(n-1)\times 1}$ is the $(n-1)\times 1$ matrix with each entry 0. Expanding along the first column, $\det(A_n) = \det(A_{n-1})$. Using induction and the fact that $\det(A_1) = 1$, we see that $\det(A_n) = 1$ for all $n$.
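The reduction step in Example 5.3.18 can be checked for small $n$. The Python sketch below verifies that subtracting the first row leaves the block form with $A_{n-1}$ in the lower right corner. It also computes $\det(A_n)$ by first-column expansion for $n \le 6$. The helper names are illustrative.

```python
def min_matrix(n):
    """A_n = [min(i, j)] with 1-based indices i, j."""
    return [[min(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]


def subtract_first_row(a):
    """Subtract row 1 from every later row (does not change the determinant)."""
    first = a[0]
    return [list(first)] + [[x - y for x, y in zip(row, first)] for row in a[1:]]


def det(a):
    # expansion along the first column, as in Definition 5.1.2
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det([row[1:] for k, row in enumerate(a) if k != i])
               for i in range(len(a)))


for n in range(2, 7):
    b = subtract_first_row(min_matrix(n))
    # first column becomes (1, 0, ..., 0)^t and the lower-right block is A_{n-1}
    assert [row[0] for row in b] == [1] + [0] * (n - 1)
    assert [row[1:] for row in b[1:]] == min_matrix(n - 1)
    assert det(min_matrix(n)) == 1
print("checked n = 2, ..., 6")
```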

Example 5.3.19 Vandermonde matrix and determinant. A matrix $V_n$ of the type

$$V_n = \begin{bmatrix}1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n\\ x_1^2 & x_2^2 & \cdots & x_n^2\\ \vdots & \vdots & & \vdots\\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1}\end{bmatrix}$$

is called the Vandermonde matrix, and $\det(V_n)$ is called a Vandermonde determinant. We show, by induction, that $\det(V_n) = \prod_{n\geq i>j\geq 1}(x_i - x_j)$. If $x_i = x_j$ for some $i \neq j$, then two columns of the Vandermonde matrix are the same, so $\det(V_n) = 0$, and there is nothing to do. Assume that $x_1, x_2, \ldots, x_n$ are distinct. For $n = 1$, there is nothing to do (the empty product is 1). For $n = 2$, $\det(V_2) = x_2 - x_1$. Thus, the result is true for $n = 1$ and $n = 2$. Let $n \geq 3$, and assume that the result is true for all $m < n$. Let $f(t) = \det(V_n(t))$, where the matrix $V_n(t)$ is obtained by replacing $x_n$ by $t$ in the Vandermonde matrix $V_n$. Clearly, $f(t)$ is a polynomial in $t$ of degree at most $n - 1$. Each $x_i$, $i \leq n - 1$, is a root of $f(t)$, for if we put $t = x_i$, $i \leq n - 1$, then two columns of $V_n(x_i)$ are the same and $\det(V_n(x_i)) = 0$. Thus $f(t) = a(t - x_1)(t - x_2)\cdots(t - x_{n-1})$ for some constant $a$, which is the coefficient of $t^{n-1}$ in $f(t)$. Expanding along the last column, the coefficient of $t^{n-1}$ is $\det(V_{n-1})$. By the induction hypothesis, $a = \det(V_{n-1}) = \prod_{n-1\geq i>j\geq 1}(x_i - x_j)$. Substituting the value of $a$, we find that $\det(V_n) = \prod_{n\geq i>j\geq 1}(x_i - x_j)$.

Determinant as Volume Form

Consider a parallelogram in $\mathbb{R}^2$ with coterminous edges $OP$ and $OQ$, where $P$ and $Q$ have position vectors $r_1 = [a_{11}, a_{12}]$ and $r_2 = [a_{21}, a_{22}]$ respectively. The area $\Delta$ of the parallelogram is given by $\Delta = \text{base}\times\text{height} = |r_1||r_2^{\perp}|$, where $r_2^{\perp} = r_2 - \langle r_2, \frac{r_1}{|r_1|}\rangle\frac{r_1}{|r_1|}$ is the component of $r_2$ orthogonal to $r_1$. Now,

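$$|r_2^{\perp}|^2 = \Big\langle r_2 - \Big\langle r_2, \frac{r_1}{|r_1|}\Big\rangle\frac{r_1}{|r_1|},\; r_2 - \Big\langle r_2, \frac{r_1}{|r_1|}\Big\rangle\frac{r_1}{|r_1|}\Big\rangle = |r_2|^2 - \frac{\langle r_2, r_1\rangle^2}{|r_1|^2}.$$

Thus,

$$\Delta^2 = |r_1|^2|r_2|^2 - \langle r_2, r_1\rangle^2 = \det\begin{bmatrix}r_1r_1^t & r_1r_2^t\\ r_2r_1^t & r_2r_2^t\end{bmatrix} = \det(AA^t),$$

where

$$A = \begin{bmatrix}r_1\\ r_2\end{bmatrix} = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}.$$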

Similarly, if we take a parallelogram in $\mathbb{R}^3$ with coterminous edges $OP$ and $OQ$, where $P$ and $Q$ have position vectors $r_1 = [a_{11}, a_{12}, a_{13}]$ and $r_2 = [a_{21}, a_{22}, a_{23}]$ respectively, then the area $\Delta$ of the parallelogram is $\text{base}\times\text{height} = |r_1||r_2^{\perp}|$, where $r_2^{\perp} = r_2 - \langle r_2, \frac{r_1}{|r_1|}\rangle\frac{r_1}{|r_1|}$ is the component of $r_2$ orthogonal to $r_1$. It turns out again that the area $\Delta = \sqrt{\det(AA^t)}$, where

$$A = \begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{bmatrix}.$$

More generally, it follows by induction that the volume $V$ of the parallelepiped in $\mathbb{R}^n$ whose coterminous edges are given by the vectors $\{r_1, r_2, \ldots, r_m\}$ is given by $V = \sqrt{\det(AA^t)}$, where

$$A = \begin{bmatrix}r_1\\ r_2\\ \vdots\\ r_m\end{bmatrix}.$$

In particular, since $\det(AA^t) = \det(A)^2$ when $A$ is square, the volume $V$ of the parallelepiped in $\mathbb{R}^n$ whose coterminous edges are given by the vectors $\{r_1, r_2, \ldots, r_n\}$ is given by $V = |\det(A)|$, where

$$A = \begin{bmatrix}r_1\\ r_2\\ \vdots\\ r_n\end{bmatrix}.$$

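Example 5.3.20 The area $\Delta$ of the parallelogram in $\mathbb{R}^3$ with coterminous edges given by the vectors $[1, 0, 1]$ and $[2, 1, 1]$ is
$$\Delta = \sqrt{\det\left(\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}\right)} = \sqrt{\det\begin{bmatrix} 2 & 3 \\ 3 & 6 \end{bmatrix}} = \sqrt{3}.$$
(This agrees with the length $\sqrt{3}$ of the vector product $[1, 0, 1] \times [2, 1, 1] = [-1, 1, 1]$.)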
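The formula $V = \sqrt{\det(AA^t)}$ is easy to evaluate mechanically. The Python sketch below (the helper names `det`, `gram_matrix`, and `volume` are hypothetical) forms $AA^t$ exactly with `Fraction` entries, so that its $(i, j)$ entry is $\langle r_i, r_j \rangle$, and takes the square root only at the end. It reproduces Example 5.3.20 and, for $n$ vectors in $\mathbb{R}^n$, agrees with $|\det(A)|$.

```python
from fractions import Fraction
from math import sqrt


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row interchange changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def gram_matrix(vectors):
    """A A^t, where the rows of A are the given vectors: entry (i, j) is <r_i, r_j>."""
    return [[sum(Fraction(x) * y for x, y in zip(u, v)) for v in vectors] for u in vectors]


def volume(vectors):
    """Volume of the parallelepiped with the given m coterminous edges in R^n."""
    return sqrt(det(gram_matrix(vectors)))


edges = [[1, 0, 1], [2, 1, 1]]  # Example 5.3.20
print(det(gram_matrix(edges)))  # 3, so the area is sqrt(3)
```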

Exercises

5.3.1 Let $V$ be a vector space of dimension $n$ and $W$ a vector space of dimension $m$. Find a basis and also the dimension of Ar(V, W). In particular, find a basis of Ar(V, F) and show that its dimension is $\binom{n}{r}$.

5.3.2 Find the determinant of the matrix
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \end{bmatrix}.$$

Find the cofactors, the adjoint, and also the inverse of this matrix.

5.3.3 Find the determinant of the matrix $A_n$ given by
$$A_n = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & & \vdots \\ x_1^n & x_2^n & \cdots & x_n^n \end{bmatrix}.$$

5.3.4 Show that the determinant of an orthogonal matrix is $\pm 1$. Let $A$ be a $2 \times 2$ orthogonal matrix whose determinant is 1. Show that it is a rotation matrix, in the sense that there exists $\theta$ such that
$$A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

Show further that if $\det(A) = -1$, then $A$ represents a reflection of the plane about a line passing through the origin.

5.3.5 Find the determinant of the $n \times n$ matrix $A = [a_{ij}]$, where $a_{ij} = \max(i, j)$.

5.3.6 Show that the determinant of the $n \times n$ matrix $A_n$ whose $i$th row is $[(i-1)n + 1, (i-1)n + 2, \ldots, in]$ is 0 for $n \ge 3$. What is the rank of $A_n$?

5.3.7 Show that the determinant of a unitary matrix is a complex number of modulus 1. Conversely, show that every complex number of modulus 1 is the determinant of some unitary matrix.

5.3.8 Let $R$ be a commutative integral domain. Then it can be considered as a subring of a field $F$. Let $A$ be a square matrix with entries in $R$, and so in $F$. Show that $A$ has an inverse with entries in $R$ if and only if $\det(A)$ is a unit in $R$, that is, $\det(A)$ has an inverse in $R$. Deduce that a matrix $A$ with entries in $\mathbb{Z}$ has an inverse with entries in $\mathbb{Z}$ if and only if $\det(A) = \pm 1$.

5.3.9 Let $p$ be a permutation of degree $n$, and let $A_p$ be the matrix obtained by permuting the rows of the identity matrix through the permutation $p$. Show that $A_p$ is an orthogonal matrix whose determinant is $\mathrm{sign}(p)$.

5.3.10 Suppose that $A$ is invertible. Show that $A^{\mathrm{adj}}$ is also invertible. Is the converse true? Support your answer.

5.3.11 Suppose that $A$ is a $3 \times 3$ invertible matrix with determinant 3. Find the determinant of $A^{\mathrm{adj}}$.

5.3.12 Suppose that $A$ is an invertible $4 \times 4$ matrix such that $\det(A^{\mathrm{adj}}) = 8$. Find the determinant of $A$.

5.3.13 Can we find an invertible $4 \times 4$ rational matrix $A$ such that $\det(A^{\mathrm{adj}}) = 2$? Support your answer.

5.3.14 Let A be a real skew-symmetric n × n matrix, where n is odd. Show that det(A) = 0. Deduce that A is not invertible.

5.3.15 Use Cramer's rule to solve the following system of linear equations:

x + y + z + t = 3

2x + 3y + 4z + t = 5

4x + 9y + 16z + t = 2

8x + 27y + 64z + t = 1.

5.3.16 Let $\{r_1, r_2, \ldots, r_{n-1}\}$ be an ordered set of $n - 1$ vectors in $\mathbb{R}^n$. Define a map $f$ from $\mathbb{R}^n$ to $\mathbb{R}$ by
$$f(x) = \det\begin{bmatrix} x \\ r_1 \\ r_2 \\ \vdots \\ r_{n-1} \end{bmatrix}.$$

Show that $f$ is a linear functional on $\mathbb{R}^n$. Deduce that there is a unique vector $u$ such that $f(x) = \langle x, u \rangle$ for all $x \in \mathbb{R}^n$. Let us call this vector $u$ the vector product of $\{r_1, r_2, \ldots, r_{n-1}\}$. Observe that on $\mathbb{R}^3$ the concept agrees with the usual vector product. Show that the vector product on $\mathbb{R}^n$ defined above is an alternating $(n-1)$-form on $\mathbb{R}^n$ (with values in $\mathbb{R}^n$).

5.3.17 Find the vector product in $\mathbb{R}^4$ of the ordered set of three vectors $\{(1, 1, 1, 1), (1, 2, 3, 4), (1, 4, 9, 16)\}$. Determine also the volume of the parallelepiped formed by the three given vectors as coterminous edges.

5.3.18 Check whether
$$\left|\det\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}\right| \le |r_1|\,|r_2| \cdots |r_n|.$$

Determine the condition under which equality holds.

5.4 Invariant Subspaces, Eigenvalues

Let $T$ be a linear transformation on $V$. A subspace $W$ of $V$ is called an invariant subspace if $T(W) \subseteq W$. Clearly, the zero space $\{0\}$ and the whole space $V$ are invariant subspaces. These are called improper invariant subspaces; other invariant subspaces are called proper invariant subspaces.

A linear transformation need not have any proper invariant subspace. For example, the rotation of $\mathbb{R}^2$ through the angle $\frac{\pi}{4}$ radians has no proper invariant subspace. We shall be mainly interested in one-dimensional invariant subspaces. Let $T$ be a linear transformation on a vector space $V$. An element $x \in V \setminus \{0\}$ is called an eigenvector (or a characteristic vector, or a proper vector) if there is a $\lambda \in F$ such that $T(x) = \lambda x$. Clearly, such a $\lambda$ is unique, and it is called the eigenvalue (or characteristic value, or proper value) corresponding to the eigenvector $x$. If $x \ne 0$ is an eigenvector corresponding to the eigenvalue $\lambda$, then $T(\alpha x) = \alpha T(x) = \alpha\lambda x = \lambda(\alpha x)$ for every $\alpha \in F$. This means that the subspace $\langle x \rangle$ generated by $x$ is a one-dimensional invariant subspace, every nonzero vector of which is an eigenvector corresponding to the same eigenvalue. Conversely, every nonzero element of a one-dimensional invariant subspace is an eigenvector, and all nonzero vectors of this invariant subspace correspond to the same eigenvalue.

The eigenvalues and eigenvectors of an $n \times n$ matrix $A$ are defined to be the eigenvalues and eigenvectors of the linear transformation $L_A$. Thus, a nonzero column vector $X^t$ in $F^n$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$ if $L_A(X^t) = A \cdot X^t = \lambda X^t$.

Theorem 5.4.1 Let $V$ be a finite-dimensional vector space over a field $F$. Then $\lambda \in F$ is an eigenvalue of a linear transformation $T$ on $V$ if and only if $\det(\lambda I - T) = 0$. In turn, $\lambda \in F$ is an eigenvalue of an $n \times n$ matrix $A$ with entries in $F$ if and only if $\det(\lambda I_n - A) = 0$.

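Proof By definition, $\lambda$ is an eigenvalue of a linear transformation $T$ on $V$ if and only if there is a nonzero element $x \in V$ such that $0 = \lambda x - T(x) = (\lambda I - T)(x)$, that is, if and only if $\lambda I - T$ has a nonzero kernel. Since $V$ is finite-dimensional, this is equivalent to saying that $\lambda I - T$ is singular, that is, $\det(\lambda I - T) = 0$. The rest of the statement follows if we apply this result to the linear transformation $L_A$ determined by the matrix $A$. □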

Let $A = [a_{ij}]$ be an $n \times n$ matrix with entries in $F$. Then $xI_n - A$ is a matrix with entries in the polynomial ring $F[x]$. The determinant of $xI_n - A = [x\delta_{ij} - a_{ij}]$, defined again by the formula
$$\sum_{p \in S_n} \mathrm{sign}(p) \prod_{i=1}^{n} (x\delta_{p(i)i} - a_{p(i)i}),$$
is a polynomial in $F[x]$. Only the identity permutation contributes a term in $x^n$ (every other permutation moves at least two indices and so gives degree at most $n - 2$), so it is a monic polynomial of degree $n$ in $F[x]$. This polynomial is called the characteristic polynomial of $A$ and is denoted by $\phi_A(x)$.

Example 5.4.2 Consider the matrix $A$ given by
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$

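The characteristic polynomial $\phi_A(x)$ is given (expanding along the second row) by
$$\phi_A(x) = \det(xI_3 - A) = \det\begin{bmatrix} x - 1 & 0 & -1 \\ 0 & x - 1 & 0 \\ -1 & -1 & x - 1 \end{bmatrix} = (x - 1)\big((x - 1)^2 - 1\big) = x^3 - 3x^2 + 2x.$$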
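The permutation formula for $\det(xI_n - A)$ can be carried out literally with polynomial arithmetic. The following Python sketch (all helper names are hypothetical) stores a polynomial as its list of coefficients, lowest degree first, sums $\mathrm{sign}(p)\prod_i (x\delta_{p(i)i} - a_{p(i)i})$ over all $p \in S_n$, and recovers $\phi_A(x) = x^3 - 3x^2 + 2x$ for the matrix of Example 5.4.2. It runs over all $n!$ permutations, so it is meant only for small matrices.

```python
from fractions import Fraction
from itertools import permutations


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def sign(perm):
    """Sign of a permutation of 0, ..., n-1, computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def characteristic_polynomial(a):
    """Coefficients of det(x I_n - A), lowest degree first, via the permutation formula."""
    n = len(a)

    def entry(r, c):
        # The (r, c) entry x*delta_rc - a_rc of x I_n - A, as a polynomial.
        return [Fraction(-a[r][c]), Fraction(1)] if r == c else [Fraction(-a[r][c])]

    total = [Fraction(0)]
    for p in permutations(range(n)):
        term = [Fraction(sign(p))]
        for i in range(n):
            term = poly_mul(term, entry(p[i], i))
        total = poly_add(total, term)
    return total


A = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # Example 5.4.2
print([str(c) for c in characteristic_polynomial(A)])  # ['0', '2', '-3', '1']
```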

Definition 5.4.3 The determinant of an $r \times r$ submatrix of $A$ all of whose diagonal entries are also diagonal entries of $A$ (that is, a submatrix formed by the same $r$ rows and columns of $A$) is called a principal $r$-minor of $A$.

Thus, the principal 1-minors are precisely the diagonal entries of $A$. A $3 \times 3$ matrix has 3 principal 2-minors, namely $\det(A_{11})$, $\det(A_{22})$, and $\det(A_{33})$. How many principal $r$-minors does an $n \times n$ matrix have? The following result follows immediately from the expansion rule of the determinant.

Proposition 5.4.4 Let $A$ be an $n \times n$ matrix. Then the characteristic polynomial $\phi_A(x)$ of $A$ is given by
$$\phi_A(x) = x^n - a_1x^{n-1} + a_2x^{n-2} - \cdots + (-1)^ra_rx^{n-r} + \cdots + (-1)^na_n,$$
where $a_r$ is the sum of the principal $r$-minors of $A$. □

In particular, $a_1$ is the sum of the diagonal entries of $A$, called the trace of $A$, and $a_n$ is the determinant of $A$.

Corollary 5.4.5 The eigenvalues of a matrix $A$ with entries in a field $F$ are precisely the roots of the characteristic polynomial $\phi_A(x)$ which lie in $F$.

Proof The roots of $\phi_A(x)$ in $F$ are precisely those $\lambda \in F$ for which $\det(\lambda I_n - A) = 0$, that is, for which $\lambda I_n - A$ is singular. This is further equivalent to saying that there is a nonzero vector $X^t$ such that $AX^t = \lambda X^t$. □
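Proposition 5.4.4 gives a second way to obtain the coefficients of $\phi_A(x)$. The sketch below (hypothetical helper names; exact `Fraction` arithmetic) sums the principal $r$-minors over all choices of $r$ row indices (with the same column indices) and then assembles $x^n - a_1x^{n-1} + \cdots + (-1)^na_n$. For the matrix of Example 5.4.2 it gives $a_1 = 3$ (the trace), $a_2 = 2$, and $a_3 = 0$ (the determinant), matching $x^3 - 3x^2 + 2x$.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def principal_minor_sums(a):
    """[a_0, a_1, ..., a_n], where a_r is the sum of the principal r-minors (a_0 = 1)."""
    n = len(a)
    sums = [Fraction(1)]
    for r in range(1, n + 1):
        sums.append(sum((det([[a[i][j] for j in idx] for i in idx])
                         for idx in combinations(range(n), r)), Fraction(0)))
    return sums


def char_poly_from_minors(a):
    """Coefficients (lowest degree first) of x^n - a_1 x^(n-1) + ... + (-1)^n a_n."""
    n = len(a)
    s = principal_minor_sums(a)
    coeffs = [Fraction(0)] * (n + 1)
    for r in range(n + 1):
        coeffs[n - r] = (-1) ** r * s[r]  # coefficient of x^(n-r)
    return coeffs


A = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # Example 5.4.2
print([str(c) for c in principal_minor_sums(A)])  # ['1', '3', '2', '0']
```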

Corollary 5.4.6 Let $A$ and $B$ be similar matrices with entries in a field $F$. Then (i) $\phi_A(x) = \phi_B(x)$; (ii) for each $r$, the sum of the principal $r$-minors of $A$ is the same as that of $B$; (iii) the trace of $A$ is the same as the trace of $B$, and the determinant of $A$ is the same as that of $B$; (iv) the eigenvalues of $A$ are the same as those of $B$.

Proof (ii), (iii), and (iv) are consequences of (i), so it suffices to prove (i). Suppose that $B = PAP^{-1}$. Then
$$\phi_B(x) = \det(xI_n - B) = \det(P(xI_n)P^{-1} - PAP^{-1}) = \det(P)\det(xI_n - A)\det(P^{-1}) = \det(xI_n - A) = \phi_A(x). \qquad \square$$

Remark 5.4.7 Matrices having the same characteristic polynomial (and so the same sums of principal $r$-minors, the same trace, the same determinant, and the same eigenvalues) need not be similar. Consider, for example, a non-identity upper unitriangular $n \times n$ matrix $A$. The characteristic polynomial of $A$ is clearly $(x - 1)^n$, which is the same as that of the identity matrix. But the identity matrix is similar only to itself.

Example 5.4.8 The characteristic polynomial of the matrix $A$ in Example 5.4.2 is $\phi_A(x) = x^3 - 3x^2 + 2x = x(x - 1)(x - 2)$. Thus, the eigenvalues of $A$ (which are the roots of the characteristic polynomial) are 0, 1, and 2. We also find the eigenvectors. Suppose that the column vector $[x_1, x_2, x_3]^t$ is an eigenvector corresponding to 0. Then
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 0 \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

Solving, we get $x_1 = -x_3$ and $x_2 = 0$. Thus, the eigenvectors of $A$ corresponding to the eigenvalue 0 are the nonzero vectors of the form $[a, 0, -a]^t$.

Similarly, the eigenvectors of $A$ corresponding to the eigenvalue 1 are the nonzero vectors of the form $[a, -a, 0]^t$, and those corresponding to the eigenvalue 2 are the nonzero vectors of the form $[a, 0, a]^t$.
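Finding the eigenvectors in Example 5.4.8 amounts to solving the homogeneous system $(A - \lambda I_3)X^t = 0$ for each eigenvalue. The sketch below (hypothetical names `null_space` and `eigenspace`) does this by row reduction to reduced row echelon form over the rationals and returns one basis vector per free variable; for Example 5.4.8 it returns the vectors of the stated forms with $a = -1$, $a = -1$, and $a = 1$, respectively.

```python
from fractions import Fraction


def null_space(m):
    """Basis of {X : m X = 0}, found from the reduced row echelon form of m."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []  # pivot column of each nonzero row, in order
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: x_c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        # Put x_f = 1 and the other free variables 0, then solve for the pivot variables.
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][f]
        basis.append(v)
    return basis


def eigenspace(a, lam):
    """Basis of the lam-eigenspace of the square matrix a (the null space of a - lam*I)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return null_space(shifted)


A = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # Examples 5.4.2 and 5.4.8
for lam in (0, 1, 2):
    print(lam, [[str(x) for x in v] for v in eigenspace(A, lam)])
# 0 [['-1', '0', '1']]
# 1 [['-1', '1', '0']]
# 2 [['1', '0', '1']]
```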

Remark 5.4.9 A square matrix $A$ with entries in a field $F$ need not have any eigenvalue in $F$. For example, the characteristic polynomial of the real matrix
$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$
is $x^2 + 1$, which has no real root, and so the matrix has no (real) eigenvalues.

Let $T$ be a linear transformation on a finite-dimensional vector space $V$. The matrix representations of $T$ with respect to different choices of bases are all similar. As such, we can define the characteristic polynomial of $T$ to be the characteristic polynomial of any matrix representing $T$ (by Corollary 5.4.6, this does not depend on the basis). The eigenvalues of $T$ are then the roots of this polynomial lying in $F$, and the trace and determinant of $T$ can be read off from its coefficients.

A linear transformation $T$ on a finite-dimensional vector space $V$ is said to be semisimple or diagonalisable if there is a basis of $V$ with respect to which the matrix of $T$ is a diagonal matrix. This is equivalent to saying that there is a basis $\{x_1, x_2, \ldots, x_n\}$ of $V$ such that $T(x_i) = \lambda_ix_i$ for some $\lambda_i \in F$. In other words, $T$ is diagonalisable if and only if there is a basis of $V$ consisting of eigenvectors of $T$. We know that the matrices of $T$ corresponding to different bases are similar, and that similar matrices represent the same linear transformation with respect to different choices of bases. It is also clear that if $T$ is diagonalisable, then any linear transformation similar to $T$ is diagonalisable. An $n \times n$ matrix $A$ with entries in $F$ is said to be diagonalisable or semisimple if $L_A$ is diagonalisable. This is equivalent to saying that $A$ is similar to a diagonal matrix. Thus, an $n \times n$ matrix $A$ is diagonalisable if and only if $F^n$ has a basis consisting of eigenvectors of $A$.

Theorem 5.4.10 Let $T$ be a linear transformation on a finite-dimensional vector space $V$. Let $\lambda_1, \lambda_2, \ldots, \lambda_r$ be distinct eigenvalues of $T$, and let $x_1, x_2, \ldots, x_r$ be corresponding eigenvectors. Then $\{x_1, x_2, \ldots, x_r\}$ is linearly independent.

Proof Suppose the contrary. Then $\{x_1, x_2, \ldots, x_r\}$ is linearly dependent. Since the eigenvectors are nonzero, there is a minimal linearly dependent subset of $\{x_1, x_2, \ldots, x_r\}$, which necessarily contains at least two elements. After rearranging, we may suppose that $\{x_1, x_2, \ldots, x_s\}$ is a minimal linearly dependent subset of $\{x_1, x_2, \ldots, x_r\}$. Then there exist $\alpha_1, \alpha_2, \ldots, \alpha_s$, not all zero, such that
$$\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_sx_s = 0. \qquad (5.1)$$

All $\alpha_i$ in Eq. 5.1 are nonzero, for otherwise we would contradict the assumption that $\{x_1, x_2, \ldots, x_s\}$ is a minimal linearly dependent subset. Applying the linear transformation $T$ to Eq. 5.1, we get
$$\alpha_1\lambda_1x_1 + \alpha_2\lambda_2x_2 + \cdots + \alpha_s\lambda_sx_s = 0. \qquad (5.2)$$

Multiplying Eq. 5.1 by $\lambda_1$, we get
$$\lambda_1\alpha_1x_1 + \lambda_1\alpha_2x_2 + \cdots + \lambda_1\alpha_sx_s = 0. \qquad (5.3)$$

Subtracting Eq. 5.3 from Eq. 5.2, we get
$$\alpha_2(\lambda_2 - \lambda_1)x_2 + \alpha_3(\lambda_3 - \lambda_1)x_3 + \cdots + \alpha_s(\lambda_s - \lambda_1)x_s = 0.$$

Since each $\alpha_i \ne 0$ and $\lambda_i \ne \lambda_1$ for $i \ge 2$, all the coefficients $\alpha_i(\lambda_i - \lambda_1)$ are nonzero, so $\{x_2, \ldots, x_s\}$ is linearly dependent. This contradicts the supposition that $\{x_1, x_2, \ldots, x_s\}$ is a minimal linearly dependent set. □

Corollary 5.4.11 Let $T$ be a linear transformation on a vector space $V$ of dimension $n$. Suppose that $T$ has $n$ distinct eigenvalues. Then $T$ is diagonalisable.

Proof Suppose that $T$ has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and let $\{x_1, x_2, \ldots, x_n\}$ be corresponding eigenvectors of $T$. By the above theorem, $\{x_1, x_2, \ldots, x_n\}$ is linearly independent. Since $\dim V = n$, it is a basis of $V$. Clearly, the matrix of $T$ relative to this basis is the diagonal matrix $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. □

Corollary 5.4.12 If an $n \times n$ matrix $A$ with entries in $F$ has $n$ distinct eigenvalues in $F$, then $A$ is similar to a diagonal matrix. Indeed, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the distinct eigenvalues of $A$ and $r_1^t, r_2^t, \ldots, r_n^t$ are corresponding column eigenvectors, then the matrix $P = [r_1^t, r_2^t, \ldots, r_n^t]$ is a nonsingular matrix such that $P^{-1}AP = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.

Proof By Theorem 5.4.10, the set $\{r_1^t, r_2^t, \ldots, r_n^t\}$ of column eigenvectors forms a basis of $F^n$ (here the elements of $F^n$ are treated as column vectors). Thus, $P$ is invertible. Suppose that the $i$th row of $P^{-1}$ is $s_i$. Since $P^{-1}P = I_n$, $s_ir_j^t = 1$ if $i = j$ and 0 otherwise. Further, since the columns $r_j^t$ are eigenvectors of $A$, $AP = [\lambda_1r_1^t, \lambda_2r_2^t, \ldots, \lambda_nr_n^t]$. Hence the entry in the $i$th row and $j$th column of $P^{-1}AP$ is $\lambda_js_ir_j^t$, which is $\lambda_j$ if $i = j$ and 0 otherwise. This confirms that $P^{-1}AP = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. □

Thus, any upper triangular matrix whose diagonal entries are all distinct is similar to a diagonal matrix, because the diagonal entries of a triangular matrix are precisely its eigenvalues. The above result need not hold if the eigenvalues are not all distinct. For example, a non-identity upper unitriangular matrix is not similar to any diagonal matrix. This is because all eigenvalues of a unitriangular matrix are 1, the only diagonal matrix all of whose eigenvalues are 1 is the identity matrix, and the identity matrix is similar only to itself. We illustrate the result by means of an example.

Example 5.4.13 Consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{bmatrix}.$$

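The eigenvalues of $A$ are 1, 2, and 3, which are all distinct. Hence $A$ is similar to a diagonal matrix. We find a nonsingular matrix $P$ such that $P^{-1}AP = \mathrm{diag}(1, 2, 3)$. We first find eigenvectors corresponding to these eigenvalues. Suppose that the vector $X = [x_1, x_2, x_3]^t$ is an eigenvector corresponding to the eigenvalue 1. Then $A \cdot X = X$. Equating rows, we find that $x_1 + x_2 = x_1$, $2x_2 + 3x_3 = x_2$, and $3x_3 = x_3$. This implies that $x_2 = 0 = x_3$, and $x_1$ is arbitrary. Thus, $e_1 = [1, 0, 0]^t$ is a typical eigenvector of $A$ corresponding to the eigenvalue 1. Using the same process, we see that $e_1 + e_2 = [1, 1, 0]^t$ is a typical eigenvector corresponding to the eigenvalue 2, and $3e_1 + 6e_2 + 2e_3 = [3, 6, 2]^t$ is an eigenvector corresponding to the eigenvalue 3. Thus, the transformation matrix
$$P = \begin{bmatrix} 1 & 1 & 3 \\ 0 & 1 & 6 \\ 0 & 0 & 2 \end{bmatrix}$$
is a nonsingular matrix such that $P^{-1}AP = \mathrm{diag}(1, 2, 3)$ (confirm it).

Let $T$ be a linear transformation on $V$, and $\lambda$ an eigenvalue of $T$. Consider $V_\lambda = \{v \in V \mid T(v) = \lambda v\}$. Then $V_\lambda$ is a subspace of $V$ (consisting of the eigenvectors corresponding to $\lambda$ together with 0). This subspace is called the $\lambda$-eigenspace of $T$.

Corollary 5.4.14 A linear transformation $T$ on a finite-dimensional vector space $V$ is diagonalisable if and only if $V$ is the direct sum of the eigenspaces of $T$.

Proof Suppose that
$$V = V_{\lambda_1} \oplus V_{\lambda_2} \oplus \cdots \oplus V_{\lambda_r},$$
where $V_{\lambda_i}$ is the $\lambda_i$-eigenspace and the $\lambda_i$ are distinct. Let $S_i$ be a basis of $V_{\lambda_i}$. Then $S = \bigcup_{i=1}^{r} S_i$ is a basis of $V$ consisting of eigenvectors of $T$. Conversely, if $T$ is diagonalisable, then each vector of a basis of eigenvectors lies in one of the eigenspaces, so $V$ is the sum of the eigenspaces of $T$; this sum is direct because, by Theorem 5.4.10, eigenvectors corresponding to distinct eigenvalues are linearly independent. □

Corollary 5.4.15 A linear transformation $T$ on a finite-dimensional vector space $V$ is diagonalisable if and only if there exists a set $\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ of distinct eigenvalues of $T$ such that $\dim V = \sum_{i=1}^{r} \dim(V_{\lambda_i})$.

Proof Since eigenvectors corresponding to distinct eigenvalues are linearly independent, the sum $V_{\lambda_1} + V_{\lambda_2} + \cdots + V_{\lambda_r}$ is direct, and so its dimension is $\sum_{i=1}^{r} \dim(V_{\lambda_i})$. Under the assumption, this sum is therefore all of $V$, and the result follows from Corollary 5.4.14. Conversely, if $T$ is diagonalisable, then $V$ is the direct sum of its eigenspaces, and their dimensions add up to $\dim V$. □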
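The "(confirm it)" in Example 5.4.13 can be carried out mechanically. The sketch below (hypothetical helpers `mat_mul` and `inverse`; exact `Fraction` arithmetic) inverts $P$ by Gauss–Jordan elimination on $[P \mid I_3]$ and then forms $P^{-1}AP$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(Fraction(a[i][k]) * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Inverse by Gauss–Jordan elimination on [M | I]; raises ValueError if M is singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        lead = aug[c][c]
        aug[c] = [x / lead for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


A = [[1, 1, 0], [0, 2, 3], [0, 0, 3]]
P = [[1, 1, 3], [0, 1, 6], [0, 0, 2]]  # columns: eigenvectors for 1, 2, 3
D = mat_mul(inverse(P), mat_mul(A, P))
print([[str(x) for x in row] for row in D])  # [['1', '0', '0'], ['0', '2', '0'], ['0', '0', '3']]
```

Checking $AP = P\,\mathrm{diag}(1, 2, 3)$ would avoid the inversion altogether; computing $P^{-1}AP$ is shown here only because it matches the statement in the example.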

Let $F[x]$ denote the set of all polynomials with coefficients in the field $F$. With respect to the usual addition and multiplication of polynomials, $F[x]$ is a commutative integral domain: it satisfies all the postulates of a field except the existence of a multiplicative inverse of every nonzero element (indeed, there is no polynomial $f(x)$ such that $xf(x) = 1$). Let $T$ be a fixed linear transformation on a vector space $V$ over a field $F$, and let
$$f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$$
be a polynomial in $F[x]$. Then $f(T)$ is the linear transformation on $V$ defined by
$$f(T) = a_0I + a_1T + a_2T^2 + \cdots + a_nT^n.$$

We can extend the multiplication of vectors in $V$ by members of $F$ to multiplication by members of $F[x]$ by defining $f(x) \cdot v = f(T)(v)$. Then $V$ becomes an $F[x]$-module, in the sense that it satisfies all the postulates of a vector space with the field $F$ replaced by the polynomial ring $F[x]$. Similarly, if $A$ is an $n \times n$ matrix with entries in a field $F$, and $f(x)$ is a polynomial in $F[x]$, then we have the matrix $f(A)$ defined by
$$f(A) = a_0I_n + a_1A + a_2A^2 + \cdots + a_nA^n.$$

It may be observed that if $A$ is the matrix of $T$ with respect to a certain basis, then $f(A)$ is the matrix of $f(T)$ with respect to the same basis. It may also be observed that if $\lambda$ is an eigenvalue of $T$ with eigenvector $v$, then $f(T)(v) = f(\lambda)v$, and so $f(\lambda)$ is an eigenvalue of $f(T)$. If $A$ is an $n \times n$ matrix, then $F^n$ becomes an $F[x]$-module with respect to the external product defined by $f(x) \cdot X^t = f(A) \cdot X^t$, where $X^t$ is a column vector in $F^n$. Since $x$ acts on $F^n$ as multiplication by $A$, it is clear that the product $(xI_n - A) \cdot X^t = 0$ for all $X^t \in F^n$. Matrix theory with entries in the polynomial ring $F[x]$ can be developed on the same pattern as for matrices with entries in $F$. For example, we can talk of the adjoint of a matrix and the determinant of a matrix, and the relation $A^{\mathrm{adj}} \cdot A = \det(A)I_n$ also holds for matrices with entries in the commutative ring $F[x]$ (the proof goes along exactly the same lines). The following is one of the most fundamental results in linear algebra.

Theorem 5.4.16 (Cayley–Hamilton Theorem) Every square matrix satisfies its own characteristic polynomial. More precisely, if $A$ is a square matrix, then $\phi_A(A) = 0$.

Proof φA(x) = Det(xI − A). The matrix xI − A is a matrix with entries in F[x]. From the discussion above, it follows that

$$(xI_n - A)^{\mathrm{adj}} \cdot (xI_n - A) = \det(xI_n - A)I_n = \phi_A(x)I_n.$$

Hence,
$$\phi_A(A) \cdot X^t = \phi_A(x) \cdot X^t = \phi_A(x)I_n \cdot X^t = (xI_n - A)^{\mathrm{adj}}(xI_n - A) \cdot X^t = 0$$

(see the discussion in the paragraph just above the theorem). This shows that the matrix $\phi_A(A) = 0$. □
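The Cayley–Hamilton theorem can be checked on examples. The sketch below is not the method of the text: to obtain the coefficients of $\phi_A(x)$ it swaps in the Faddeev–LeVerrier recursion ($M_0 = 0$, $c_n = 1$, $M_k = AM_{k-1} + c_{n-k+1}I_n$, $c_{n-k} = -\frac{1}{k}\mathrm{tr}(AM_k)$), which divides by $k = 1, \ldots, n$ and so is intended for fields of characteristic 0 such as $\mathbb{Q}$. It then evaluates $\phi_A(A)$ by Horner's rule. The function names are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def faddeev_leverrier(a):
    """Coefficients [c_0, ..., c_n] of det(x I_n - A), lowest degree first.

    Recursion: M_0 = 0, c_n = 1, M_k = A M_(k-1) + c_(n-k+1) I, c_(n-k) = -tr(A M_k) / k.
    The division by k requires characteristic 0 (here: exact rationals).
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        am = mat_mul(a, m)
        m = [[am[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = mat_mul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


def evaluate_at_matrix(coeffs, a):
    """Horner's rule: c_0 I + c_1 A + ... + c_n A^n (coefficients lowest degree first)."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


A = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # Example 5.4.2
phi = faddeev_leverrier(A)
print([str(c) for c in phi])  # ['0', '2', '-3', '1']
print(evaluate_at_matrix(phi, A) == [[0] * 3 for _ in range(3)])  # True
```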

Let $A$ be a $3 \times 3$ unitriangular matrix. Then its characteristic polynomial is $\phi_A(x) = (x - 1)^3$. By the Cayley–Hamilton theorem, $(A - I_3)^3 = 0$. In other words, $A^3 - 3A^2 + 3A - I_3 = 0$. This shows that $A(A^2 - 3A + 3I_3) = I_3$, and so $A^2 - 3A + 3I_3$ is the inverse of $A$. A similar result holds for any $n \times n$ unitriangular matrix. This also shows that if $A$ is a strictly triangular $n \times n$ matrix, then $A^n = 0$ (its characteristic polynomial is $x^n$).

If $A$ is nonsingular, then the constant term $(-1)^na_n$ of the characteristic polynomial $\phi_A(x)$, $a_n$ being the determinant of $A$, is nonzero. Since $\phi_A(A)$ is the zero matrix,
$$(-1)^na_nI_n = -\big(A^n - a_1A^{n-1} + a_2A^{n-2} - \cdots + (-1)^ra_rA^{n-r} + \cdots + (-1)^{n-1}a_{n-1}A\big),$$
where $a_r$ is the sum of the principal $r$-minors of $A$. It follows that the inverse of $A$, if it exists, is a polynomial in $A$:
$$A^{-1} = \frac{(-1)^{n+1}}{a_n}\big(A^{n-1} - a_1A^{n-2} + \cdots + (-1)^{n-1}a_{n-1}I_n\big).$$
This also gives an algorithm to find the inverse of $A$.

A linear transformation $T$ on $V$ is said to be triangulable if there is a basis of $V$ with respect to which the matrix of $T$ is triangular. A matrix $A$ with entries in $F$ is said to be triangulable if $L_A$ is triangulable. This is equivalent to saying that $A$ is similar to a triangular matrix. In general, a matrix over a field need not be similar to a triangular matrix. Consider the matrix
$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$
over the field $\mathbb{R}$ of real numbers. It is not similar to any triangular matrix over $\mathbb{R}$: if it were, its eigenvalues would be real (namely, the diagonal entries of the triangular matrix to which it is similar). But it has no real eigenvalues.

Theorem 5.4.17 A linear transformation $T$ on an $n$-dimensional vector space $V$ is triangulable if and only if there exists an ascending chain
$$\{0\} = V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_n = V$$
of subspaces invariant under $T$, called a flag of $V$, such that the dimension of $V_i$ is $i$.

Proof Suppose that such a chain of invariant subspaces exists. By induction, we show the existence of a basis $\{x_1, x_2, \ldots, x_n\}$ of $V$ such that $\{x_{n-r+1}, x_{n-r+2}, \ldots, x_n\}$ is a basis of $V_r$ for each $r$. Let $\{x_n\}$ be a basis of $V_1$. Since $\{x_n\}$ is a linearly independent subset of $V_2$, it can be extended to a basis $\{x_{n-1}, x_n\}$ of $V_2$. Proceeding inductively, we find a basis $\{x_1, x_2, \ldots, x_n\}$ of $V$ with the required property. Since each $V_{n-r+1}$, which has basis $\{x_r, x_{r+1}, \ldots, x_n\}$, is invariant under $T$, it follows that
$$T(x_r) = a_{rr}x_r + a_{r\,r+1}x_{r+1} + \cdots + a_{rn}x_n$$
for each $r$. This means that the matrix of $T$ with respect to this basis is triangular. Conversely, suppose that $T$ is triangulable. Then there is a basis $\{x_1, x_2, \ldots, x_n\}$ with respect to which the matrix of $T$ is upper triangular. But then
$$T(x_r) = a_{rr}x_r + a_{r\,r+1}x_{r+1} + \cdots + a_{rn}x_n$$
for each $r$. Let $V_r$ be the subspace of $V$ generated by $\{x_{n-r+1}, x_{n-r+2}, \ldots, x_n\}$. Then $V_r$ is invariant under $T$, the dimension of $V_r$ is $r$, and we have the chain
$$\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_n = V. \qquad \square$$

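Corollary 5.4.18 An $n \times n$ matrix $A$ with entries in $F$ is triangulable if and only if there is a chain
$$\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_n = F^n$$
of subspaces of $F^n$ such that the dimension of $V_r$ is $r$ and $A \cdot X^t = L_A(X^t) \in V_r$ for all $r$ and all $X^t \in V_r$. □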
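The remark made before Theorem 5.4.17, that $A^{-1}$ is a polynomial in $A$, can be turned directly into code. The sketch below (hypothetical names; exact arithmetic) computes the sums $a_r$ of principal $r$-minors, evaluates $A^{n-1} - a_1A^{n-2} + \cdots + (-1)^{n-1}a_{n-1}I_n$ by Horner's rule, and scales the result by $(-1)^{n+1}/a_n$. It illustrates the formula rather than offering an efficient inversion method: summing all principal minors is far more expensive than Gauss–Jordan elimination.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def mat_mul(a, b):
    return [[sum(Fraction(a[i][k]) * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse_via_cayley_hamilton(a):
    """A^(-1) = (-1)^(n+1)/a_n * (A^(n-1) - a_1 A^(n-2) + ... + (-1)^(n-1) a_(n-1) I)."""
    n = len(a)
    # s[r] = a_r, the sum of the principal r-minors of A (s[0] = 1).
    s = [Fraction(1)] + [
        sum((det([[a[i][j] for j in idx] for i in idx]) for idx in combinations(range(n), r)),
            Fraction(0))
        for r in range(1, n + 1)
    ]
    if s[n] == 0:
        raise ValueError("A is singular: a_n = det(A) = 0")
    q = identity(n)  # Horner's rule, starting from the leading term A^(n-1)
    for r in range(1, n):
        q = mat_mul(q, a)
        q = [[q[i][j] + (-1) ** r * s[r] * (i == j) for j in range(n)] for i in range(n)]
    scale = Fraction((-1) ** (n + 1)) / s[n]
    return [[scale * x for x in row] for row in q]


U = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]  # unitriangular: the inverse is U^2 - 3U + 3I
print(mat_mul(U, inverse_via_cayley_hamilton(U)) == identity(3))  # True
```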

As observed earlier, a matrix need not be triangulable; the reason was that it need not have any eigenvalue. A field $F$ is called algebraically closed if every nonconstant polynomial in $F[x]$ has all its roots in $F$, that is, splits into linear factors over $F$. It is a fact that every field can be enlarged to an algebraically closed field (see Chap. 9).

Theorem 5.4.19 Every linear transformation on a finite-dimensional vector space $V$ over an algebraically closed field is triangulable.

Proof Let $T$ be a linear transformation on a vector space $V$ over an algebraically closed field $F$. By Theorem 5.4.17, we have to show the existence of a chain
$$\{0\} \subset V_1 \subset V_2 \subset \cdots \subset V_n = V$$
of invariant subspaces with $\dim V_r = r$. The proof is by induction on the dimension of $V$. If $\dim V = 1$, then there is nothing to prove. Assume that the result is true for all vector spaces of dimension less than $n$, and suppose that the dimension of $V$ is $n \ge 2$. Since $F$ is algebraically closed, the characteristic polynomial $\phi_T(x)$ has a root $\lambda \in F$. Then $\lambda$ is an eigenvalue, so there exists a nonzero vector $v$ in $V$ such that $T(v) = \lambda v$. Let $V_1$ be the subspace generated by $v$. Then $V_1$ is of dimension 1, and since $T(v) = \lambda v \in V_1$, $V_1$ is an invariant subspace. Consider the vector space $W = V/V_1$. Then the dimension of $W$ is $n - 1$, and since $T(V_1) \subseteq V_1$, $T$ induces a linear transformation $\bar{T}$ on $W$ defined by $\bar{T}(w + V_1) = T(w) + V_1$. By the induction hypothesis, there is a chain
$$\{V_1\} = W_0 \subset W_1 = V_2/V_1 \subset W_2 = V_3/V_1 \subset \cdots \subset W_{n-1} = V_n/V_1 = W$$
of subspaces invariant under $\bar{T}$ such that the dimension of $W_r$ is $r$; here $V_{r+1}$ denotes the subspace of $V$ containing $V_1$ that corresponds to $W_r$, so the dimension of $V_{r+1}$ is $r + 1$. We show that $V_r$ is invariant under $T$ for each $r$. Already, $V_1$ is invariant under $T$. Let $x \in V_r$, $r \ge 2$. Then $\bar{T}(x + V_1) = T(x) + V_1$ belongs to $W_{r-1} = V_r/V_1$. This implies that $T(x) \in V_r$. □

Corollary 5.4.20 Every square matrix $A$ with entries in an algebraically closed field is similar to a triangular matrix.

Proof To say that $A$ is similar to a triangular matrix is to say that $L_A$ is triangulable. The result follows from the above theorem. □

5.5 Spectral Theorem, and Orthogonal Reduction

Theorem 5.5.1 Let $V$ be a complex inner product space, and $T$ a Hermitian linear transformation on $V$. Then all the eigenvalues of $T$ are real.

Proof Let $\lambda$ be an eigenvalue of $T$. Then there exists a nonzero vector $x \in V$ such that $T(x) = \lambda x$. Since $T$ is Hermitian, $\langle T(u), v \rangle = \langle u, T(v) \rangle$ for all $u, v \in V$, and hence
$$\lambda\langle x, x \rangle = \langle \lambda x, x \rangle = \langle T(x), x \rangle = \langle x, T(x) \rangle = \langle x, \lambda x \rangle = \bar{\lambda}\langle x, x \rangle.$$

Since $x \ne 0$, $\langle x, x \rangle \ne 0$, and so $\lambda = \bar{\lambda}$, that is, $\lambda$ is real. □

Corollary 5.5.2 Let $V$ be a finite-dimensional complex inner product space, and $T$ a Hermitian linear transformation on $V$. Then all the roots of the characteristic polynomial of $T$ are real.

Proof Since the field $\mathbb{C}$ of complex numbers is algebraically closed, all the roots of the characteristic polynomial of $T$ lie in $\mathbb{C}$, and they are the eigenvalues of $T$. The result follows from the above theorem. □

Corollary 5.5.3 All eigenvalues of Hermitian matrices are real.

Proof The matrix $A$ is Hermitian if and only if $L_A$ is a Hermitian linear transformation on the standard complex inner product space $\mathbb{C}^n$. □

Corollary 5.5.4 All roots of the characteristic polynomial of a real symmetric matrix are real. In particular, every real symmetric matrix has a (real) eigenvalue.

Proof A real symmetric matrix can also be regarded as a complex Hermitian matrix. The result follows from the above corollary. □

Corollary 5.5.5 All roots of the characteristic polynomial of a symmetric linear transformation $T$ on a finite-dimensional real inner product space $V$ are real. In particular, there is a real number $\lambda$ and a vector $x \ne 0$ such that $T(x) = \lambda x$. □
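For a $2 \times 2$ Hermitian matrix, the reality of the eigenvalues (Corollary 5.5.3) can also be seen directly from the quadratic formula. The following short computation is only an illustration of that special case:
$$A = \begin{bmatrix} a & b \\ \bar{b} & d \end{bmatrix}, \qquad a, d \in \mathbb{R},\; b \in \mathbb{C},$$
$$\phi_A(x) = (x - a)(x - d) - b\bar{b} = x^2 - (a + d)x + (ad - |b|^2),$$
$$(a + d)^2 - 4(ad - |b|^2) = (a - d)^2 + 4|b|^2 \ge 0.$$
Hence both roots of $\phi_A(x)$ are real, and they coincide exactly when $a = d$ and $b = 0$, that is, when $A = aI_2$.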

Corollary 5.5.6 All nonzero eigenvalues of a skew-Hermitian matrix are purely imaginary.

Proof We know that $A$ is skew-Hermitian if and only if $iA$ is Hermitian. Now, $\lambda$ is an eigenvalue of $A$ if and only if $i\lambda$ is an eigenvalue of $iA$. This shows that $i\lambda$ is real, and so $\lambda$ is purely imaginary. □

Corollary 5.5.7 Let $A$ be a real skew-symmetric matrix. Then $A$ has no nonzero real eigenvalue. In other words, if $A \cdot X^t = \lambda X^t$, where $\lambda$ is real, then $\lambda = 0$ or $X^t = 0$.

Proof A real skew-symmetric matrix is also a skew-Hermitian matrix. Hence, all the nonzero roots of its characteristic polynomial are purely imaginary. Thus, there is no $X^t \ne 0$ and no real number $\lambda \ne 0$ such that $A \cdot X^t = \lambda X^t$. □

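Proposition 5.5.8 Let $A$ be a unitary linear transformation (matrix) on a complex inner product space, and $\lambda$ an eigenvalue of $A$. Then $|\lambda| = 1$.

Proof Let $x \ne 0$ be an eigenvector corresponding to the eigenvalue $\lambda$. Then, since $A^*A = I$,
$$|\lambda|^2\langle x, x \rangle = \lambda\bar{\lambda}\langle x, x \rangle = \langle \lambda x, \lambda x \rangle = \langle A(x), A(x) \rangle = \langle A^*A(x), x \rangle = \langle x, x \rangle.$$

Since $x \ne 0$, $\langle x, x \rangle \ne 0$. Hence $|\lambda|^2 = 1$. □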

Proposition 5.5.9 Let $A$ be an orthogonal linear transformation (matrix) on a real inner product space, and $\lambda$ a real eigenvalue of $A$. Then $\lambda = \pm 1$.

Proof Let $x \ne 0$ be an eigenvector corresponding to a real eigenvalue $\lambda$. Then, as in the previous proposition, $\lambda^2\langle x, x \rangle = \langle x, x \rangle$, and so $\lambda^2 = 1$. Hence $\lambda = \pm 1$. □

Proposition 5.5.10 Let $V$ be an inner product space, $T$ a linear transformation on $V$, and $W$ a subspace of $V$ which is invariant under $T$. Then the orthogonal complement $W^{\perp}$ of $W$ is invariant under the adjoint $T^*$.

Proof Since $W$ is invariant under $T$, $T(y) \in W$ for each $y \in W$. Let $x \in W^{\perp}$. Then for each $y \in W$, $\langle y, T^*(x) \rangle = \langle T(y), x \rangle = 0$. This shows that $T^*(x) \in W^{\perp}$. □

Corollary 5.5.11 Let $T$ be a Hermitian linear transformation on an inner product space $V$, and $W$ a subspace invariant under $T$. Then $W^{\perp}$ is also invariant under $T$ (since $T^* = T$). □

Proposition 5.5.12 Let $T$ be a Hermitian linear transformation on a complex (real) inner product space $V$. Let $x_1$ and $x_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of $T$. Then $\langle x_1, x_2 \rangle = 0$. In other words, $V_{\lambda_1} \perp V_{\lambda_2}$.

Proof By the previous results, $\lambda_1$ and $\lambda_2$ are real. Further,
$$\lambda_1\langle x_1, x_2 \rangle = \langle \lambda_1x_1, x_2 \rangle = \langle T(x_1), x_2 \rangle = \langle x_1, T(x_2) \rangle = \langle x_1, \lambda_2x_2 \rangle = \bar{\lambda}_2\langle x_1, x_2 \rangle = \lambda_2\langle x_1, x_2 \rangle.$$

Since $\lambda_1 \ne \lambda_2$, we see that $\langle x_1, x_2 \rangle = 0$. □

If we apply the above result to $L_A$ on the standard inner product space, then we have the following corollary.

Corollary 5.5.13 Let $A$ be a Hermitian (real symmetric) matrix with eigenvectors $X_1$ and $X_2$ corresponding to distinct eigenvalues. Then $X_1\overline{X_2}^{\,t} = 0$ ($X_1X_2^t = 0$).

Theorem 5.5.14 (Spectral Theorem) Let $T$ be a Hermitian linear transformation on a finite-dimensional complex (real) inner product space $V$. Then there is an orthonormal basis of $V$ consisting of eigenvectors of $T$.

Proof The proof is by induction on $\dim V$. If $\dim V = 1$, then take any nonzero vector of $V$ and divide it by its length to get a unit vector $v$. Clearly, $\{v\}$ is an orthonormal basis of $V$, and since $\dim V = 1$, $v$ is an eigenvector of $T$. Assume that the result holds for all Hermitian linear transformations on inner product spaces of dimension less than $n$. Let $T$ be a Hermitian linear transformation on a complex (real) inner product space $V$ of dimension $n$. Being a Hermitian linear transformation on a finite-dimensional complex (real) inner product space, $T$ has an eigenvector $x_1$ (Corollaries 5.5.2 and 5.5.5). Dividing $x_1$ by its length, we may assume that $x_1$ is a unit vector. Let $W$ be the subspace of $V$ generated by $x_1$. Then $V = W \oplus W^{\perp}$. Clearly, $W$ is invariant under $T$. Since $T$ is Hermitian (real symmetric), $W^{\perp}$ is also invariant under $T$. The restriction $T|_{W^{\perp}}$ of $T$ to $W^{\perp}$ is also Hermitian (real symmetric), and the dimension of $W^{\perp}$ is $n - 1$. By the induction hypothesis, $W^{\perp}$ has an orthonormal basis $\{x_2, x_3, \ldots, x_n\}$ consisting of eigenvectors of $T|_{W^{\perp}}$ (and so of $T$). Then $\{x_1, x_2, \ldots, x_n\}$ is an orthonormal basis of $V$ consisting of eigenvectors of $T$. □

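Corollary 5.5.15 Let $T$ be a Hermitian linear transformation on a finite-dimensional complex (real) inner product space $V$. Let $\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ be the set of all distinct eigenvalues of $T$. Then
$$V = V_{\lambda_1} \oplus V_{\lambda_2} \oplus \cdots \oplus V_{\lambda_r},$$
and the eigenspaces $V_{\lambda_i}$ are mutually orthogonal.

Proof By the above theorem, $V$ has a basis consisting of eigenvectors of $T$, so $V$ is the sum of the eigenspaces $V_{\lambda_i}$. Since eigenspaces corresponding to distinct eigenvalues are orthogonal (Proposition 5.5.12), this sum is direct. □

Corollary 5.5.16 The matrix representation of a Hermitian linear transformation with respect to a suitable orthonormal basis is a diagonal matrix. □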

Corollary 5.5.17 Let $A$ be a Hermitian (real symmetric) matrix. Then $A$ is similar to a diagonal matrix. In fact, there exists a unitary matrix $U$ (an orthogonal matrix $O$) such that $UAU^*$ ($OAO^t$) is a diagonal matrix; replacing $U$ by $U^*$ ($O$ by $O^t$), equivalently $U^*AU$ ($O^tAO$) is diagonal for a suitable unitary (orthogonal) matrix.

Proof $A$ is a Hermitian (real symmetric) matrix if and only if $L_A$ is a Hermitian (real symmetric) linear transformation on the standard complex (real) inner product space. Thus, there exists an orthonormal basis $\{X_1^t, X_2^t, \ldots, X_n^t\}$ of the standard complex (real) inner product space consisting of eigenvectors of $L_A$ (and so of $A$), say $AX_j^t = \lambda_jX_j^t$. Now, the standard inner product is given by $\langle X_i^t, X_j^t \rangle = \overline{X_j} \cdot X_i^t$ in the complex case, and by $\langle X_i^t, X_j^t \rangle = X_j \cdot X_i^t$ in the real case, where $\cdot$ on the right-hand side is matrix multiplication. Thus, $\overline{X_i} \cdot X_i^t = 1$ ($X_i \cdot X_i^t = 1$), and for $i \ne j$, $\overline{X_i} \cdot X_j^t = 0$ ($X_i \cdot X_j^t = 0$). Let $U$ (respectively $O$) denote the matrix whose $i$th row is $\overline{X_i}$ (respectively $X_i$). Then the above observation says that $U$ is unitary ($O$ is orthogonal), and
$$UAU^* = \begin{bmatrix} \overline{X_1} \\ \overline{X_2} \\ \vdots \\ \overline{X_n} \end{bmatrix} \cdot A \cdot [X_1^t, X_2^t, \ldots, X_n^t] = \begin{bmatrix} \overline{X_1} \\ \overline{X_2} \\ \vdots \\ \overline{X_n} \end{bmatrix} \cdot [\lambda_1X_1^t, \lambda_2X_2^t, \ldots, \lambda_nX_n^t] = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Similarly, if $A$ is a real symmetric matrix, then $OAO^t = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. □

Remark 5.5.18 We have an algorithm to find an orthonormal basis of $V$ consisting of eigenvectors of $T$, provided that we have an algorithm to find the roots of the characteristic polynomial of $T$ (indeed, we have algorithms to solve equations of degree $n$ for $n \le 4$; see Chap. 9 on Fields and Galois theory). After finding the distinct eigenvalues, we can find the corresponding eigenspaces and then use the Gram–Schmidt process to find an orthonormal basis of each eigenspace. Since eigenspaces for distinct eigenvalues are orthogonal, together these give an orthonormal basis consisting of eigenvectors. This also gives us a method to diagonalise a Hermitian matrix, and also a real symmetric matrix. We illustrate it by means of an example.

Example 5.5.19 Consider the matrix
$$A = \begin{bmatrix} \frac{3}{2} & 0 & -\frac{1}{2} \\ 0 & 1 & 0 \\ -\frac{1}{2} & 0 & \frac{3}{2} \end{bmatrix}.$$

This is a real symmetric matrix. We find an orthogonal matrix $O$ such that $O^tAO$ is a diagonal matrix. The characteristic polynomial $\phi_A(x)$ is given by
$$\phi_A(x) = \mathrm{Det}(xI - A) = \left(x - \tfrac{3}{2}\right)(x - 1)\left(x - \tfrac{3}{2}\right) - \frac{x - 1}{4} = (x - 1)^2(x - 2).$$
5.5 Spectral Theorem, and Orthogonal Reduction 163

The roots of the characteristic polynomial are 1, 1, and 2. We find the eigenspace $\mathbb{R}^3_1$. Suppose that $[u, v, w]^t$ belongs to $\mathbb{R}^3_1$. Then
$$A\begin{bmatrix}u\\ v\\ w\end{bmatrix} = \begin{bmatrix}u\\ v\\ w\end{bmatrix}.$$
Equating rows, we get that $u = w$. Thus
$$\mathbb{R}^3_1 = \left\{\begin{bmatrix}u\\ v\\ w\end{bmatrix} \;\middle|\; u = w\right\}.$$
This subspace is clearly of dimension 2. Putting $u = v = w = 1$, we get a nonzero member of $\mathbb{R}^3_1$. Another nonzero member of $\mathbb{R}^3_1$ which is not a multiple of the previous one is obtained by taking $u = 0 = w$ and $v = 1$. Hence
$$\left\{\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}, \begin{bmatrix}0\\ 1\\ 0\end{bmatrix}\right\}$$

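is a basis of $\mathbb{R}^3_1$. Using the Gram–Schmidt process, we find the orthonormal basis
$$\left\{\begin{bmatrix}\frac{1}{\sqrt3}\\ \frac{1}{\sqrt3}\\ \frac{1}{\sqrt3}\end{bmatrix}, \begin{bmatrix}-\frac{1}{\sqrt6}\\ \sqrt{\frac{2}{3}}\\ -\frac{1}{\sqrt6}\end{bmatrix}\right\}$$
of $\mathbb{R}^3_1$. Similarly, $\mathbb{R}^3_2$ is a subspace of dimension 1 which has the singleton
$$\left\{\begin{bmatrix}-\frac{1}{\sqrt2}\\ 0\\ \frac{1}{\sqrt2}\end{bmatrix}\right\}$$
as an orthonormal basis. This gives us an orthonormal basis
$$\left\{\begin{bmatrix}\frac{1}{\sqrt3}\\ \frac{1}{\sqrt3}\\ \frac{1}{\sqrt3}\end{bmatrix}, \begin{bmatrix}-\frac{1}{\sqrt6}\\ \sqrt{\frac{2}{3}}\\ -\frac{1}{\sqrt6}\end{bmatrix}, \begin{bmatrix}-\frac{1}{\sqrt2}\\ 0\\ \frac{1}{\sqrt2}\end{bmatrix}\right\}$$
of $\mathbb{R}^3$ consisting of eigenvectors of $A$.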
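The Gram–Schmidt step used above (and in Remark 5.5.18) is easy to mechanize. The following is a minimal sketch in Python, standard library only; the helper names `dot` and `gram_schmidt` are ours, not the text's. It applies the classical Gram–Schmidt process to the basis $\{[1,1,1]^t, [0,1,0]^t\}$ of $\mathbb{R}^3_1$ and should reproduce, up to rounding, the two orthonormal vectors found above.

```python
import math


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize linearly independent real vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the components of v along the orthonormal vectors found so far.
        for e in basis:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


# Basis {[1, 1, 1], [0, 1, 0]} of the eigenspace R^3_1 from Example 5.5.19.
e1, e2 = gram_schmidt([[1, 1, 1], [0, 1, 0]])
print(e1)
print(e2)
```

Note that $\sqrt{2/3} = 2/\sqrt6$, which is how the middle entry of the second vector appears in floating point.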

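In turn, we get the orthogonal matrix
$$O = \begin{bmatrix} \frac{1}{\sqrt3} & -\frac{1}{\sqrt6} & -\frac{1}{\sqrt2}\\ \frac{1}{\sqrt3} & \sqrt{\frac{2}{3}} & 0\\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt6} & \frac{1}{\sqrt2}\end{bmatrix},$$
whose columns are these eigenvectors, such that $O^tAO = \mathrm{Diag}(1, 1, 2)$.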
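As a sanity check on Example 5.5.19, the following sketch (plain Python lists as matrices; the helpers `matmul` and `transpose` are our own) forms $O^tAO$ and $O^tO$ in floating point. Both should agree with $\mathrm{Diag}(1, 1, 2)$ and $I_3$ up to rounding error.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1.5, 0.0, -0.5],
     [0.0, 1.0, 0.0],
     [-0.5, 0.0, 1.5]]
# Columns of O: orthonormal eigenvectors for 1, 1, 2 (sqrt(2/3) written as 2/sqrt(6)).
O = [[1 / r3, -1 / r6, -1 / r2],
     [1 / r3, 2 / r6, 0.0],
     [1 / r3, -1 / r6, 1 / r2]]

D = matmul(matmul(transpose(O), A), O)   # should be Diag(1, 1, 2)
OtO = matmul(transpose(O), O)            # should be the identity
for row in D:
    print([round(x, 12) for x in row])
```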

Example 5.5.20 Let $A$ be an $n \times n$ real symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ counted with their multiplicities. Let $f(x)$ be a polynomial with real coefficients, and let $\mu_1, \mu_2, \ldots, \mu_n$ be real numbers with $f(\mu_i) = \lambda_i$ for all $i$. Then we can find a real symmetric matrix $B$ such that $f(B) = A$ as follows. By Corollary 5.5.17, there exists an orthogonal matrix $O$ such that $O^tAO = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Let $B = O\,\mathrm{Diag}(\mu_1, \mu_2, \ldots, \mu_n)\,O^t$. Since $O^tO = I$, we have $f(B) = O\,\mathrm{Diag}(f(\mu_1), f(\mu_2), \ldots, f(\mu_n))\,O^t = O\,\mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\,O^t = A$. Thus, $B$ is a solution of $f(X) = A$. In particular, if we take $B = O\,\mathrm{Diag}(1, 1, \sqrt2)\,O^t$, where $O$ is as in the above example, then $B^2 = A$, where $A$ is as in the above example. Can we count the number of solutions of $X^2 = A$, where $X$ is an unknown in the set of real symmetric matrices?
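The construction of Example 5.5.20 can be tried on the matrix of Example 5.5.19 with $f(x) = x^2$ and $(\mu_1, \mu_2, \mu_3) = (1, 1, \sqrt2)$. This is only a floating-point sketch with our own helper names; it builds $B = O\,\mathrm{Diag}(1, 1, \sqrt2)\,O^t$ and compares $B^2$ with $A$.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1.5, 0.0, -0.5], [0.0, 1.0, 0.0], [-0.5, 0.0, 1.5]]
O = [[1 / r3, -1 / r6, -1 / r2],
     [1 / r3, 2 / r6, 0.0],
     [1 / r3, -1 / r6, 1 / r2]]

# f(x) = x^2 and mu = (1, 1, sqrt(2)) satisfy f(mu_i) = lambda_i for lambda = (1, 1, 2).
B = matmul(matmul(O, diag([1.0, 1.0, math.sqrt(2)])), transpose(O))
B_squared = matmul(B, B)
print(B)
```

Choosing $-1$ or $-\sqrt2$ for some of the $\mu_i$ gives other real symmetric solutions of $X^2 = A$, which bears on the counting question above.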

Let $B$ be any nonsingular complex (real) matrix. Then the matrix $A = BB^{\star}$ ($BB^t$) is a Hermitian (real symmetric) matrix. Further, let $\lambda$ be an eigenvalue of $A$, and $X$ a corresponding row eigenvector, so that $XA = \lambda X$. Since $B$ is nonsingular, $XB \neq 0$, and so $\lambda\langle X, X\rangle = \langle XBB^{\star}, X\rangle = \langle XB, XB\rangle > 0$. It follows that all the eigenvalues of $A$ are positive. Conversely, if all the eigenvalues of a Hermitian (real symmetric) matrix $A$ are positive (non-negative), then, as described above, we can find a positive definite (positive) Hermitian (real symmetric) matrix $B$ such that $A = B^2 = BB^{\star}$ ($BB^t$).

Polar Decomposition

Every nonzero complex number $z = a + ib$ is a nonsingular $1 \times 1$ matrix which is uniquely expressible in polar form as $z = |z|\,u = re^{i\theta}$, where $r$ is a positive definite $1 \times 1$ Hermitian matrix, and $u = e^{i\theta}$ is a $1 \times 1$ unitary matrix. More generally, generalizing $z = a + ib$, every complex square matrix $A$ can be uniquely expressed as $A = B + iC$, where $B$ and $C$ are Hermitian matrices (namely, $B = \frac{1}{2}(A + A^{\star})$ and $C = \frac{1}{2i}(A - A^{\star})$). The following is the multiplicative analog of this identity, called the polar decomposition. Proposition 5.5.21 Every nonsingular square complex matrix $A$ can be uniquely expressed as $A = PU$, where $P$ is a positive definite Hermitian matrix, and $U$ is a unitary matrix.

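Proof Consider the matrix $AA^{\star}$. Since $A$ is nonsingular, $AA^{\star}$ is a Hermitian matrix all of whose eigenvalues are positive. As such, $AA^{\star}$ is a positive definite Hermitian matrix. Let $P$ be a positive definite Hermitian matrix which is a square root of $AA^{\star}$. Take $U = P^{-1}A$. Then $UU^{\star} = P^{-1}AA^{\star}(P^{-1})^{\star}$. Again, since $P$ is Hermitian, $P^{-1}$ is also Hermitian (indeed, $(P^{-1})^{\star} = (P^{\star})^{-1} = P^{-1}$). Thus, $UU^{\star} = P^{-1}AA^{\star}P^{-1} = P^{-1}P^2P^{-1} = I$. This shows that $U$ is unitary, and $A = PU$. For uniqueness, if also $A = P_1U_1$ with $P_1$ positive definite Hermitian and $U_1$ unitary, then $AA^{\star} = P_1U_1U_1^{\star}P_1^{\star} = P_1^2$; since a positive definite Hermitian matrix has a unique positive definite square root, $P_1 = P$, and then $U_1 = P^{-1}A = U$. ∎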

Corollary 5.5.22 Every nonsingular matrix $A$ with real entries can be uniquely expressed as $A = PO$, where $P$ is a positive definite real symmetric matrix, and $O$ is an orthogonal matrix. ∎
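For $2 \times 2$ real matrices, the square root needed in the proof of Proposition 5.5.21 and Corollary 5.5.22 has a closed form, which makes a short self-contained sketch possible. The sketch assumes $A$ is a nonsingular real $2 \times 2$ matrix and uses the identity $\sqrt{S} = (S + \sqrt{\det S}\,I)/\sqrt{\mathrm{tr}\,S + 2\sqrt{\det S}}$ for a positive definite symmetric $S$ (a consequence of the Cayley–Hamilton theorem). This shortcut is our choice, not the book's method, which diagonalizes $AA^t$ instead; the test matrix and all function names are illustrative.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def sqrt_spd_2x2(S):
    """Positive definite square root of a 2 x 2 positive definite symmetric matrix.

    With s = sqrt(det S) and t = sqrt(tr S + 2s), Cayley-Hamilton
    (S^2 = (tr S) S - (det S) I) gives ((S + sI)/t)^2 = S.
    """
    s = math.sqrt(S[0][0] * S[1][1] - S[0][1] * S[1][0])
    t = math.sqrt(S[0][0] + S[1][1] + 2 * s)
    return [[(S[0][0] + s) / t, S[0][1] / t],
            [S[1][0] / t, (S[1][1] + s) / t]]


def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def polar_2x2(A):
    """Return (P, O) with A = P O, P positive definite symmetric, O orthogonal."""
    P = sqrt_spd_2x2(matmul(A, transpose(A)))  # P^2 = A A^t
    O = matmul(inverse_2x2(P), A)              # O = P^{-1} A, as in the proof
    return P, O


A = [[3.0, 1.0], [0.0, 2.0]]   # an illustrative nonsingular matrix
P, O = polar_2x2(A)
print(P)
print(O)
```

For larger matrices one would instead diagonalize $AA^t$ orthogonally, as in Remark 5.5.18, and take the square roots of its eigenvalues.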

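Example 5.5.23 Consider the matrix
$$A = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{2}{\sqrt3}\\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{2}{\sqrt3}\\ 0 & -\frac{2}{\sqrt6} & \frac{2}{\sqrt3}\end{bmatrix}.$$

This matrix is nonsingular. We find its polar decomposition. The matrix $B = AA^t$ is given by
$$B = \begin{bmatrix} 2&1&1\\ 1&2&1\\ 1&1&2\end{bmatrix}.$$

The eigenvalues of $B$ are 1, 1, 4. Using the algorithm described in Remark 5.5.18 (see Example 5.5.19), we find the orthogonal matrix
$$O = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3}\\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3}\\ 0 & -\frac{2}{\sqrt6} & \frac{1}{\sqrt3}\end{bmatrix},$$
whose columns are eigenvectors of $B$, such that $B = O\,\mathrm{Diag}[1, 1, 4]\,O^t$. The positive square root $P$ of $B$ is given by $P = O\,\mathrm{Diag}[1, 1, 2]\,O^t$. As described in Proposition 5.5.21, $U = P^{-1}A$ is orthogonal, and $A = PU$ is the polar decomposition of $A$. In fact, $A = O\,\mathrm{Diag}[1, 1, 2]$, so $U = O\,\mathrm{Diag}[1, 1, \tfrac12]\,O^tO\,\mathrm{Diag}[1, 1, 2] = O$.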
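The numbers in Example 5.5.23 can be checked in floating point. In this sketch (helper names ours) we form $B = AA^t$, the square root $P = O\,\mathrm{Diag}[1,1,2]\,O^t$, its inverse $O\,\mathrm{Diag}[1,1,\frac12]\,O^t$, and $U = P^{-1}A$; one expects $P^2 = B$, $U = O$ and $PU = A$ up to rounding.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
O = [[1 / r2, 1 / r6, 1 / r3],
     [-1 / r2, 1 / r6, 1 / r3],
     [0.0, -2 / r6, 1 / r3]]
A = [[1 / r2, 1 / r6, 2 / r3],
     [-1 / r2, 1 / r6, 2 / r3],
     [0.0, -2 / r6, 2 / r3]]

B = matmul(A, transpose(A))                                     # expect [[2,1,1],[1,2,1],[1,1,2]]
P = matmul(matmul(O, diag([1.0, 1.0, 2.0])), transpose(O))      # positive square root of B
P_inv = matmul(matmul(O, diag([1.0, 1.0, 0.5])), transpose(O))  # inverse of P
U = matmul(P_inv, A)                                            # orthogonal factor U = P^{-1} A
print(P)
```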

Singular Value Decomposition

Let $A$ be an $n \times m$ matrix with entries in a field $F$, where $F$ is $\mathbb{C}$ or $\mathbb{R}$. The matrix $AA^{\star}$ is a positive matrix in the sense that all its eigenvalues are non-negative. This is because if $\lambda$ is an eigenvalue of $AA^{\star}$, then there is a nonzero row vector $x$ such that $xAA^{\star} = \lambda x$. Hence $(xA)(xA)^{\star} = \lambda xx^{\star}$. Since $x$ is a nonzero vector, $xx^{\star} > 0$, while $(xA)(xA)^{\star} \geq 0$; it follows that $\lambda$ is non-negative. Definition 5.5.24 The non-negative square roots of the eigenvalues of $AA^{\star}$ are called the singular values of $A$. If $A$ is Hermitian, then $AA^{\star} = A^2$, and the eigenvalues of $A^2$, counted with multiplicities, are precisely the squares of the eigenvalues of $A$. As such, the singular values of a Hermitian matrix are precisely the absolute values of its eigenvalues. Example 5.5.25 The singular values of the matrix
$$A = \begin{bmatrix} 1&1\\ -1&1\end{bmatrix}$$
are $\sqrt2, \sqrt2$, for the eigenvalues of
$$AA^t = \begin{bmatrix} 2&0\\ 0&2\end{bmatrix}$$
are 2, 2. Proposition 5.5.26 Let $A$ be an $n \times m$ matrix, and $x$ a unit row eigenvector of $AA^{\star}$ with $\lambda$ as the corresponding eigenvalue. Then $\|xA\|^2 = \lambda$. In turn, $\|xA\|$ is the corresponding singular value. Further, if $x$ is a row eigenvector of $AA^{\star}$, and $y$ is a row vector orthogonal to $x$, then $xA$ and $yA$ are orthogonal to each other. Proof Under the hypothesis, $\|xA\|^2 = xA(xA)^{\star} = xAA^{\star}x^{\star} = \lambda xx^{\star} = \lambda$. Next, suppose that $x$ is a row eigenvector of $AA^{\star}$ with associated eigenvalue $\lambda$. Then $xAA^{\star} = \lambda x$. In turn, since $AA^{\star}$ is Hermitian and $\lambda$ is real,
$$yA(xA)^{\star} = yAA^{\star}x^{\star} = y(xAA^{\star})^{\star} = \lambda yx^{\star} = 0.$$
This proves the result. ∎ Corollary 5.5.27 Let $A$ be an $n \times m$ matrix. Then the rank of $A$ is $\rho$ if and only if $AA^{\star}$ has exactly $\rho$ strictly positive eigenvalues, counted with multiplicities. Equivalently, $A$ has exactly $\rho$ nonzero singular values.

Proof Suppose that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_\rho$ are the nonzero eigenvalues of $AA^{\star}$, and the remaining $n - \rho$ eigenvalues are 0. Let $\{r_1^{\star}, r_2^{\star}, \ldots, r_\rho^{\star}, \ldots, r_n^{\star}\}$ be an orthonormal basis of $F^n$ (considering $F^n$ as the space of columns) consisting of column eigenvectors of $AA^{\star}$, with $r_1^{\star}, r_2^{\star}, \ldots, r_\rho^{\star}$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_\rho$, respectively; equivalently, the $r_i$ are orthonormal row eigenvectors of $AA^{\star}$. It follows from the above proposition that $\|r_iA\|$ is nonzero if and only if $i \leq \rho$. Further, $r_iA(r_jA)^{\star} = r_iAA^{\star}r_j^{\star} = \lambda_jr_ir_j^{\star}$, which is $\lambda_i$ for $i = j$ and 0 otherwise. This shows that $\{r_1A, r_2A, \ldots, r_\rho A\}$, being an orthogonal set of nonzero vectors, is linearly independent. Since $r_1, \ldots, r_n$ form a basis of the space of row vectors, the row space $\{xA\}$ of $A$ is spanned by $r_1A, \ldots, r_nA$, and $r_iA = 0$ for $i > \rho$; it follows that $\rho$ is the rank of $A$. ∎ Theorem 5.5.28 (Singular value decomposition) Let $A$ be an $n \times m$ matrix. Then there exist an $n \times n$ unitary/orthogonal matrix $U$ and an $m \times m$ unitary/orthogonal matrix $V$ such that $UAV = \Sigma$, where $\Sigma$ is an $n \times m$ matrix whose first $\rho$ diagonal entries are the nonzero singular values $\sigma_1, \sigma_2, \ldots, \sigma_\rho$ of $A$ in non-ascending order, and the rest of the entries are 0. In turn, $A = U^{\star}\Sigma V^{\star}$, where $U^{\star}$ and $V^{\star}$ are again unitary/orthogonal matrices, and $\Sigma$ is as described.

Proof Let $\{r_1^{\star}, r_2^{\star}, \ldots, r_\rho^{\star}, \ldots, r_n^{\star}\}$ be an orthonormal basis of $F^n$ (considering $F^n$ as the space of columns) consisting of column eigenvectors of $AA^{\star}$, with $r_1^{\star}, r_2^{\star}, \ldots, r_\rho^{\star}$ corresponding to the nonzero eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_\rho$. From Proposition 5.5.26 and Corollary 5.5.27, it follows that $\{A^{\star}r_1^{\star}, A^{\star}r_2^{\star}, \ldots, A^{\star}r_\rho^{\star}\}$ is an orthogonal set of nonzero column vectors in $F^m$. Let $s_j$ denote the column vector $\frac{1}{\sigma_j}A^{\star}r_j^{\star}$, where $\sigma_j = \sqrt{\lambda_j}$. Then $\{s_1, s_2, \ldots, s_\rho\}$ is an orthonormal set of column vectors in $F^m$. Embed it into an orthonormal basis $\{s_1, s_2, \ldots, s_m\}$ of $F^m$. Take $U$ to be the $n \times n$ matrix whose $i$th row is $r_i$, and $V$ the $m \times m$ matrix whose $j$th column is $s_j$. Then $U$ and $V$ are unitary/orthogonal matrices. The $i$th row $j$th column entry $c_{ij}$ of $UAV$ is given by $r_iAs_j$. Further, $r_iA = 0$ for all $i > \rho$, and $r_iA = \sigma_is_i^{\star}$ for $i \leq \rho$ (recall that $r_iA(r_iA)^{\star} = \lambda_i = \sigma_i^2$). Hence, for $i \leq \rho$, $c_{ij} = \sigma_is_i^{\star}s_j$, which is $\sigma_i$ if $j = i$ and 0 otherwise, while $c_{ij} = 0$ for $i > \rho$. Thus $UAV = \Sigma$. ∎

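The proof of the theorem is algorithmic. We illustrate it by means of the following example. Example 5.5.29 Consider the matrix $A$ given by
$$A = \begin{bmatrix} 0&1&1\\ 1&1&0\end{bmatrix}.$$

Then
$$AA^t = \begin{bmatrix} 2&1\\ 1&2\end{bmatrix}.$$
The eigenvalues of $AA^t$ are 3, 1. Thus, the singular values of $A$ are $\sqrt3$, 1. A unit eigenvector $r_1$ corresponding to 3 is $[\frac{1}{\sqrt2}, \frac{1}{\sqrt2}]$, and a unit eigenvector $r_2$ corresponding to 1 is $[\frac{1}{\sqrt2}, -\frac{1}{\sqrt2}]$. Now $r_1A = [\frac{1}{\sqrt2}, \sqrt2, \frac{1}{\sqrt2}]$ and $r_2A = [-\frac{1}{\sqrt2}, 0, \frac{1}{\sqrt2}]$. Thus, $s_1 = \frac{1}{\sqrt3}r_1A = [\frac{1}{\sqrt6}, \sqrt{\frac{2}{3}}, \frac{1}{\sqrt6}]$ and $s_2 = r_2A = [-\frac{1}{\sqrt2}, 0, \frac{1}{\sqrt2}]$ (written as rows). We extend $\{s_1, s_2\}$ to an orthonormal basis by adjoining $s_3 = [\frac{1}{\sqrt3}, -\frac{1}{\sqrt3}, \frac{1}{\sqrt3}]$. The matrix $U$ is given by
$$U = \begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2}\\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2}\end{bmatrix},$$
the matrix $V$ is the transpose of
$$\begin{bmatrix} \frac{1}{\sqrt6} & \sqrt{\frac{2}{3}} & \frac{1}{\sqrt6}\\ -\frac{1}{\sqrt2} & 0 & \frac{1}{\sqrt2}\\ \frac{1}{\sqrt3} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt3}\end{bmatrix},$$
and the matrix $\Sigma$ is given by
$$\Sigma = \begin{bmatrix} \sqrt3&0&0\\ 0&1&0\end{bmatrix}.$$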
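The matrices $U$, $V$ and $\Sigma$ of Example 5.5.29 can be checked numerically. This sketch (helper names ours) multiplies out $UAV$ and $U^t\Sigma V^t$; up to rounding, the first should equal $\Sigma$ and the second should recover $A$.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[0, 1, 1], [1, 1, 0]]
U = [[1 / r2, 1 / r2], [1 / r2, -1 / r2]]   # rows r_1, r_2
s1 = [1 / r6, 2 / r6, 1 / r6]               # sqrt(2/3) = 2/sqrt(6)
s2 = [-1 / r2, 0.0, 1 / r2]
s3 = [1 / r3, -1 / r3, 1 / r3]
V = transpose([s1, s2, s3])                 # columns s_1, s_2, s_3

Sigma = matmul(matmul(U, A), V)                              # should be [[sqrt 3, 0, 0], [0, 1, 0]]
A_back = matmul(matmul(transpose(U), Sigma), transpose(V))   # should recover A
for row in Sigma:
    print([round(x, 12) for x in row])
```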

Further, $A = U^t\Sigma V^t$.

Geometry of Orthogonal Transformation

Recall that a subspace $H$ of dimension $n - 1$ of the Euclidean space $\mathbb{R}^n$ is called a hyperplane, and a translate $x + H$ is called an affine hyperplane.

Proposition 5.5.30 Let $H$ be a hyperplane in the Euclidean space $\mathbb{R}^n$. Then there is a unit vector $x \in H^{\perp}$. Further, if $y$ is any unit vector in $H^{\perp}$, then $y = \pm x$.

Proof Let $\{s_1, s_2, \ldots, s_{n-1}\}$ be an orthonormal basis of $H$. Using the Gram–Schmidt process, enlarge it to an orthonormal basis $\{s_1, s_2, \ldots, s_{n-1}, x\}$ of $\mathbb{R}^n$. Then $x$ is a unit vector in $H^{\perp}$. Let
$$y = a_1s_1 + a_2s_2 + \cdots + a_{n-1}s_{n-1} + ax$$
be a unit vector which is a member of $H^{\perp}$. Then $0 = \langle y, s_i\rangle = a_i$ for all $i$, so $y = ax$, and
$$1 = \langle y, y\rangle = a^2\langle x, x\rangle = a^2.$$
Hence $a = \pm 1$. This shows that $y = \pm x$. ∎

If $x$ is a unit vector, the hyperplane consisting of vectors orthogonal to $x$ is denoted by $H_x$. It follows that $H_x = H_y$ if and only if $x = \pm y$. Proposition 5.5.31 Let $x$ be a unit vector in $\mathbb{R}^n$, and $H_x$ the corresponding hyperplane. Then the map $\sigma_x$ from $\mathbb{R}^n$ to itself defined by
$$\sigma_x(y) = y - 2\langle y, x\rangle x$$
is the unique linear transformation which fixes the members of $H_x$ and takes $x$ to its negative. Further, $\sigma_x$ is an orthogonal transformation with determinant $-1$ (refer to Exercises 4.4.35–4.4.40).

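Proof Clearly, $\sigma_x$ is a linear transformation. If $\langle y, x\rangle = 0$, then by definition $\sigma_x(y) = y$. Also $\sigma_x(x) = -x$. Further, using $\langle x, x\rangle = 1$,
$$\langle \sigma_x(y), \sigma_x(z)\rangle = \langle y - 2\langle y, x\rangle x,\; z - 2\langle z, x\rangle x\rangle = \langle y, z\rangle - 4\langle y, x\rangle\langle z, x\rangle + 4\langle y, x\rangle\langle z, x\rangle\langle x, x\rangle = \langle y, z\rangle.$$
This shows that $\sigma_x$ is an orthogonal transformation. Also, the matrix representation of $\sigma_x$ with respect to the orthonormal basis $\{s_1, s_2, \ldots, s_{n-1}, x\}$ of $\mathbb{R}^n$, where $\{s_1, s_2, \ldots, s_{n-1}\}$ is an orthonormal basis of $H_x$, is the diagonal matrix $\mathrm{diag}(1, 1, \ldots, 1, -1)$. Hence, its determinant is $-1$. Finally, if $\tau$ is any linear transformation with the required property, then again the matrix representation of $\tau$ with respect to the basis $\{s_1, s_2, \ldots, s_{n-1}, x\}$ is $\mathrm{diag}(1, 1, \ldots, 1, -1)$. Hence, $\tau = \sigma_x$. ∎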
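The properties in Proposition 5.5.31 can be spot-checked numerically. In this sketch the unit vector $x = (\frac13, \frac23, \frac23)$, the vector $h \in H_x$, and the test vectors are illustrative choices of ours, as are the function names; the code verifies that $\sigma_x$ fixes $h$, negates $x$, preserves one inner product, and has determinant $-1$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(x, y):
    """sigma_x(y) = y - 2<y, x> x, for a unit vector x."""
    c = 2 * dot(y, x)
    return [yi - c * xi for yi, xi in zip(y, x)]


def reflection_matrix(x):
    """Matrix whose i-th row is sigma_x(e_i); it is symmetric."""
    n = len(x)
    return [reflect(x, [1.0 if j == i else 0.0 for j in range(n)]) for i in range(n)]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


x = [1 / 3, 2 / 3, 2 / 3]      # a unit vector
h = [2.0, -1.0, 0.0]           # <h, x> = 0, so h lies in H_x
y, z = [1.0, 2.0, 3.0], [-1.0, 0.0, 4.0]
print(reflect(x, h), reflect(x, x))
print(dot(reflect(x, y), reflect(x, z)), dot(y, z))
print(det3(reflection_matrix(x)))
```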

Definition 5.5.32 The transformation $\sigma_x$ is called a hyperplane reflection (indeed, it is the reflection about the hyperplane $H_x$).

Note that $\sigma_x = \sigma_y$ if and only if $x = \pm y$. Recall that an inner product space $V$ is said to be the orthogonal sum of its subspaces $V_1, V_2, \ldots, V_r$ if $V = V_1 + V_2 + \cdots + V_r$, and for $i \neq j$, the elements of $V_i$ are orthogonal to those of $V_j$. Symbolically, we write it as
$$V = V_1 \perp V_2 \perp \cdots \perp V_r.$$

In particular, if $W$ is a subspace of $V$, then $V = W \perp W^{\perp}$. Theorem 5.5.33 Let $T$ be a Euclidean orthogonal transformation on $\mathbb{R}^n$. Then there exist subspaces $V$, $W$, and two-dimensional subspaces $P_1, P_2, \ldots, P_l$, together with angles $0 < \theta_1 \leq \theta_2 \leq \cdots \leq \theta_l < \pi$, such that the following hold:
1. $\mathbb{R}^n = V \perp W \perp P_1 \perp P_2 \perp \cdots \perp P_l$.
2. $T(x) = x$ for all $x \in V$.
3. $T(x) = -x$ for all $x \in W$.
4. The restriction of $T$ to $P_k$ is a rotation through the angle $\theta_k$ in the plane $P_k$. In other words, there is an orthonormal basis $\{u_k, v_k\}$ of $P_k$ such that (i) $T(u_k) = \cos\theta_k\,u_k + \sin\theta_k\,v_k$, and (ii) $T(v_k) = -\sin\theta_k\,u_k + \cos\theta_k\,v_k$, for all $k \leq l$.

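Proof The proof is by induction on $n$. For $n = 1$, it follows trivially. Assume that the result is true for all $m \leq n$. Let $T$ be an orthogonal transformation on $\mathbb{R}^{n+1}$. Then $T$ induces a unique linear transformation $\tilde{T}$ from the standard complex vector space $\mathbb{C}^{n+1}$ to itself by putting $\tilde{T}(x) = xM(T)$, where $x = [x_1\ x_2\ \cdots\ x_{n+1}]$ is a row vector in $\mathbb{C}^{n+1}$, and $M(T)$ is the matrix representation of $T$ with respect to the standard basis of $\mathbb{R}^{n+1}$. Note that the matrix representation of $\tilde{T}$ with respect to the standard basis of $\mathbb{C}^{n+1}$ is the same as $M(T)$. Let $\lambda$ be an eigenvalue of $M(T)$, which may be a complex number. Then there is a nonzero complex unit vector $x = [x_1\ x_2\ \cdots\ x_{n+1}]$ such that $xM(T) = \lambda x$. Since $M(T)$ is a real matrix,
$$\overline{x}M(T) = \overline{xM(T)} = \overline{\lambda}\,\overline{x},$$
where $\overline{x} = [\overline{x_1}\ \overline{x_2}\ \cdots\ \overline{x_{n+1}}]$ denotes the complex conjugate of the vector $x$, and $\overline{\lambda}$ the complex conjugate of $\lambda$. This shows that $\overline{\lambda}$ is also an eigenvalue of $M(T)$, and $\overline{x}$ is a corresponding eigenvector. Since $M(T)$ is orthogonal, $M(T)M(T)^t = I$, and so
$$|\lambda|^2 = (\lambda x)\big(\overline{\lambda x}\big)^t = xM(T)\big(\overline{xM(T)}\big)^t = xM(T)M(T)^t\,\overline{x}^{\,t} = x\,\overline{x}^{\,t} = \|x\|^2 = 1.$$
This shows that $|\lambda| = 1$, and so $\lambda = e^{i\theta}$ for some $\theta$. Now, suppose that $M(T)$ has a real eigenvalue $\lambda$. Then $\lambda = \pm 1$. Suppose again that $\lambda = 1$ is an eigenvalue of $M(T)$. Since $M(T) - I$ is a singular real matrix, we may choose a real unit eigenvector $x$ associated to 1. Consider the corresponding hyperplane $H_x$. The dimension of $H_x$ is $n$. Further, if $y \in H_x$, then
$$\langle x, T(y)\rangle = \langle T(x), T(y)\rangle = \langle x, y\rangle = 0$$
(for $T$ is orthogonal and $T(x) = x$). This shows that $T$ maps $H_x$ into itself, and its restriction to $H_x$ is an orthogonal transformation on $H_x$. Also $\mathbb{R}^{n+1} = \langle x\rangle \perp H_x$. By the induction hypothesis, there exist subspaces $V'$, $W$, and two-dimensional subspaces $P_1, P_2, \ldots, P_l$ of $H_x$, together with angles $0 < \theta_1 \leq \theta_2 \leq \cdots \leq \theta_l < \pi$, such that the following hold:
1. $H_x = V' \perp W \perp P_1 \perp P_2 \perp \cdots \perp P_l$.
2. $T(z) = z$ for all $z \in V'$.
3. $T(z) = -z$ for all $z \in W$.
4. The restriction of $T$ to $P_k$ is a rotation through the angle $\theta_k$ in the plane $P_k$ for each $k \leq l$.
Taking $V = \langle x\rangle \perp V'$, the result holds for $\mathbb{R}^{n+1}$. If 1 is not an eigenvalue but $-1$ is an eigenvalue, then a similar argument proves the result with $V = \{0\}$. Assume that $M(T)$ has no real eigenvalues. Note that in this case $n + 1$ is even, say $n + 1 = 2m$, since the characteristic polynomial of $M(T)$ is a real polynomial, and a real polynomial of odd degree has a real root. Let $\theta_1$ be the smallest positive real number such that $\lambda = e^{i\theta_1}$ is an eigenvalue, and $x$ a corresponding complex eigenvector. As observed earlier, $e^{-i\theta_1}$ is also an eigenvalue, with $\overline{x}$ a corresponding eigenvector. Since $x$ and $\overline{x}$ are eigenvectors of the unitary matrix $M(T)$ belonging to the distinct eigenvalues $e^{\pm i\theta_1}$, they are orthogonal, that is, $xx^t = 0$. Writing $x = a + ib$ with $a, b$ real, this gives $\|a\| = \|b\|$ and $\langle a, b\rangle = 0$; in particular, $x + \overline{x} = 2a$ and $i(x - \overline{x}) = -2b$ are nonzero real vectors. Take
$$u_1 = \frac{x + \overline{x}}{\|x + \overline{x}\|}, \qquad v_1 = \frac{i(x - \overline{x})}{\|x - \overline{x}\|}.$$
Then $\{u_1, v_1\}$ is an orthonormal subset of $\mathbb{R}^{n+1}$ which generates a subspace $P_1$. Then
$$T(u_1) = u_1M(T) = \frac{1}{\|x + \overline{x}\|}\big(e^{i\theta_1}x + e^{-i\theta_1}\overline{x}\big) = \cos\theta_1\,u_1 + \sin\theta_1\,v_1,$$
and similarly, $T(v_1) = -\sin\theta_1\,u_1 + \cos\theta_1\,v_1$. Let $U = P_1^{\perp}$. Then $U$ is of dimension $2(m - 1)$, $\mathbb{R}^{n+1} = P_1 \perp U$, and $T$ restricted to $U$ is an orthogonal transformation on $U$. By the induction hypothesis, there exist two-dimensional subspaces $P_2, P_3, \ldots, P_m$ together with angles $0 < \theta_2 \leq \theta_3 \leq \cdots \leq \theta_m < \pi$, with $\theta_1 \leq \theta_2$, such that $U = P_2 \perp P_3 \perp \cdots \perp P_m$, and $T$ restricted to each $P_k$ is a rotation through the angle $\theta_k$. Clearly, $\mathbb{R}^{n+1} = P_1 \perp P_2 \perp P_3 \perp \cdots \perp P_m$. ∎ Corollary 5.5.34 Suppose that the dimension of $W$ in the above theorem is $m$, and $\{w_1, w_2, \ldots, w_m\}$ is an orthonormal basis of $W$. Then
$$T = \sigma_{w_1} \circ \sigma_{w_2} \circ \cdots \circ \sigma_{w_m} \circ \rho_{P_1} \circ \rho_{P_2} \circ \cdots \circ \rho_{P_l},$$
where $\sigma_{w_i}$ is the reflection about the hyperplane $H_{w_i}$, and $\rho_{P_j}$ denotes the rotation through the angle $\theta_j$ in the plane $P_j$, given by
(i) $\rho_{P_j}(x) = x$ for all $x$ in $V$, $W$, and $P_k$, $k \neq j$,
(ii) $\rho_{P_j}(u_j) = \cos\theta_j\,u_j + \sin\theta_j\,v_j$,
(iii) $\rho_{P_j}(v_j) = -\sin\theta_j\,u_j + \cos\theta_j\,v_j$,
where $\{u_j, v_j\}$ is the orthonormal basis of $P_j$ given by the theorem, $j \leq l$. ∎ Corollary 5.5.35 The transformation $T$ in Theorem 5.5.33 is a composition of $m + 2l$ hyperplane reflections, where $m = \dim W$.

Proof From the above corollary, it is sufficient to show that each rotation $\rho_{P_j}$ is a composition of two hyperplane reflections. With respect to the ordered basis $\{u_j, v_j\}$ of $P_j$,
$$\begin{bmatrix} \cos\theta_j & \sin\theta_j\\ -\sin\theta_j & \cos\theta_j\end{bmatrix} = \begin{bmatrix} 1&0\\ 0&-1\end{bmatrix}\cdot\begin{bmatrix} \cos\theta_j & \sin\theta_j\\ \sin\theta_j & -\cos\theta_j\end{bmatrix}.$$
Here vectors are rows acted on from the right, so in $xM_1M_2 = (xM_1)M_2$ the first factor acts first. The first factor is the matrix of $\sigma_{v_j}$ restricted to $P_j$, and the second is the matrix of $\sigma_{\nu_j}$ restricted to $P_j$, where $\nu_j = -\sin\frac{\theta_j}{2}\,u_j + \cos\frac{\theta_j}{2}\,v_j$; both reflections fix $P_j^{\perp}$ pointwise. It follows that $\rho_{P_j} = \sigma_{\nu_j} \circ \sigma_{v_j}$. ∎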
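The factorization of a plane rotation into two hyperplane reflections (Corollary 5.5.35) can be tested numerically. In this sketch the plane, the angle, and all function names are illustrative choices of ours. We compare the rotation through $\theta$ in the plane spanned by orthonormal $u, v$ in $\mathbb{R}^3$ with two compositions of reflections: $\sigma_{\nu} \circ \sigma_{v}$ with $\nu = -\sin\frac{\theta}{2}u + \cos\frac{\theta}{2}v$, and $\sigma_{\nu'} \circ \sigma_{u}$ with $\nu' = \cos\frac{\theta}{2}u + \sin\frac{\theta}{2}v$. In both, the right-hand reflection is applied first.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(n, x):
    """sigma_n(x) = x - 2<x, n> n for a unit vector n."""
    c = 2 * dot(x, n)
    return [xi - c * ni for xi, ni in zip(x, n)]


def rotate_in_plane(theta, u, v, x):
    """Rotation through theta in the plane spanned by orthonormal u, v.

    u -> cos(theta) u + sin(theta) v, v -> -sin(theta) u + cos(theta) v,
    and the orthogonal complement of the plane is fixed pointwise.
    """
    a, b = dot(x, u), dot(x, v)
    c, s = math.cos(theta), math.sin(theta)
    du = a * c - b * s - a   # change of the u-coordinate
    dv = a * s + b * c - b   # change of the v-coordinate
    return [xi + du * ui + dv * vi for xi, ui, vi in zip(x, u, v)]


theta = 1.1
u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
v = [0.0, 0.0, 1.0]
hc, hs = math.cos(theta / 2), math.sin(theta / 2)
nu1 = [-hs * ui + hc * vi for ui, vi in zip(u, v)]   # used after sigma_v
nu2 = [hc * ui + hs * vi for ui, vi in zip(u, v)]    # used after sigma_u

samples = [[1.0, 0.0, 0.0], [0.3, -2.0, 0.5], [1.0, -1.0, 4.0]]
for x in samples:
    print(rotate_in_plane(theta, u, v, x),
          reflect(nu1, reflect(v, x)),
          reflect(nu2, reflect(u, x)))
```

Either pair of mirror directions works; what matters is that the two mirrors in $P_j$ meet at the angle $\theta_j/2$ in the correct orientation.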

Corollary 5.5.36 The matrix representation of the orthogonal transformation $T$ on $\mathbb{R}^n$ described in Theorem 5.5.33, with respect to a suitable orthonormal basis, is $A = [a_{ij}]$, where
(i) $a_{ii} = 1$ for all $i \leq r = \dim V$ ($r$ may also be 0),
(ii) $a_{ii} = -1$ for all $i = r + j \leq r + m$, where $m = \dim W$ ($m$ may also be 0),
(iii) $a_{ii} = \cos\theta_j$ for $i = r + m + 2j - 1$ and $i = r + m + 2j$, $j \leq l$,
(iv) $a_{i\,i+1} = \sin\theta_j$ and $a_{i+1\,i} = -\sin\theta_j$ for $i = r + m + 2j - 1$, $j \leq l$,
(v) the rest of the entries are 0.

Proof Taking orthonormal bases of $V$, $W$, and of the two-dimensional subspaces $P_1, P_2, \ldots, P_l$ (the latter as in the theorem) together, we get an orthonormal basis of $\mathbb{R}^n$ with respect to which the matrix representation is the required one. ∎
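To see the normal form of Corollary 5.5.36 concretely, the following sketch builds the block matrix from $r = \dim V$, $m = \dim W$ and the angles $\theta_1, \ldots, \theta_l$, then checks that it is orthogonal and that its determinant is $(-1)^m$ (each rotation block has determinant 1). Function names and sample parameters are illustrative; the determinant is computed by ordinary Gaussian elimination with partial pivoting.

```python
import math


def canonical_orthogonal(r, m, angles):
    """Matrix of Corollary 5.5.36: I_r, then -I_m, then one 2 x 2 rotation block per angle."""
    n = r + m + 2 * len(angles)
    A = [[0.0] * n for _ in range(n)]
    for i in range(r):
        A[i][i] = 1.0
    for i in range(r, r + m):
        A[i][i] = -1.0
    for j, theta in enumerate(angles):
        i = r + m + 2 * j   # 0-based; the text's 1-based index is r + m + 2j - 1 with j from 1
        c, s = math.cos(theta), math.sin(theta)
        A[i][i], A[i][i + 1] = c, s
        A[i + 1][i], A[i + 1][i + 1] = -s, c
    return A


def is_orthogonal(A, tol=1e-12):
    """Check that the rows of A are orthonormal, i.e. A A^t = I."""
    n = len(A)
    return all(abs(sum(A[i][k] * A[j][k] for k in range(n)) - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n = len(M)
    result = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0:
            return 0.0
        if p != k:
            M[k], M[p] = M[p], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
    return result


A = canonical_orthogonal(1, 2, [0.4, 2.0])   # r = 1, m = 2, l = 2, so n = 7
print(is_orthogonal(A), det(A))              # the determinant should equal (-1)**m
```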

Corollary 5.5.37 Every orthogonal matrix $A$ is orthogonally similar to a matrix of the form described in the above corollary. More explicitly, there is an orthogonal matrix $O$ such that $OAO^t$ is a matrix of the form described in the above corollary. ∎

Corollary 5.5.38 An orthogonal transformation with determinant 1 is a composition of an even number of hyperplane reflections, and one with determinant $-1$ is a composition of an odd number of hyperplane reflections. ∎

Corollary 5.5.39 Two orthogonal matrices $A$ and $B$ are similar if and only if they have the same eigenvalues, counted with their multiplicities. ∎

Exercises

5.5.1 Find the invariant subspaces of the differential operator $D$ on the space $\wp_n$ of polynomials of degree at most $n$ over the reals. Find its characteristic polynomial, and also its eigenvalues.

5.5.2 Consider the matrix
$$A = \begin{bmatrix} 0&0&1\\ 1&0&0\\ 0&1&0\end{bmatrix}.$$

Show that the cube roots of 1 are precisely the eigenvalues of $A$. Show that the matrix is diagonalisable over the field $\mathbb{C}$ of complex numbers. Find a complex matrix $P$ such that $P^{-1}AP$ is a diagonal matrix. Is it diagonalisable over the field $\mathbb{R}$ of reals?

5.5.3 Show that the matrix
$$A = \begin{bmatrix} 1&0&2\\ 0&4&7\\ 0&0&2\end{bmatrix}$$
is diagonalisable over the field $\mathbb{R}$ of real numbers, and find $P$ such that $P^{-1}AP$ is a diagonal matrix.

5.5.4 Show that the matrix
$$A = \begin{bmatrix} 2&3&4\\ 0&2&5\\ 0&0&2\end{bmatrix}$$
is not diagonalisable even over the field $\mathbb{C}$ of complex numbers.

5.5.5 Show that the matrix
$$A = \begin{bmatrix} 1 - \cos\alpha\sin\alpha & \cos^2\alpha\\ -\sin^2\alpha & 1 + \sin\alpha\cos\alpha\end{bmatrix}$$
is similar to an upper triangular matrix over $\mathbb{R}$. Find $P$ such that $PAP^{-1}$ is upper triangular. What is $PAP^{-1}$? Show that $A$ is not similar to a diagonal matrix even over the field $\mathbb{C}$ of complex numbers.

5.5.6 Show that all the eigenvalues of a $2 \times 2$ real matrix whose off-diagonal entries are positive are real. Determine a necessary and sufficient condition on the entries of a $2 \times 2$ matrix for its diagonalisability.

5.5.7 Suppose that $T^2 - 5T + 6 = 0$. Determine all possible eigenvalues of $T$.

5.5.8 Suppose that $A$ is nonsingular. Show that $AB$ and $BA$ have the same eigenvalues.

5.5.9 Let $A$ be an $n \times n$ matrix. Suppose that $A^m = 0$ for some $m \geq 1$. Show that $A^n = 0$.

5.5.10 Let $A$ be a nilpotent $n \times n$ matrix. Show that $I_n + A$ is invertible, and $\det(I_n + A) = 1$.

5.5.11 Let $A$ and $B$ be complex $n \times n$ matrices such that $AB - BA$ commutes with $A$. Show that $(AB - BA)^n = 0$.

5.5.12 Let $A$ be an $n \times n$ matrix. Define a map $M_A$ from $M_n(F)$ to itself by $M_A(B) = A \cdot B$. Show that $M_A$ is a linear transformation. Relate the trace of $A$ and that of $M_A$.

5.5.13 Suppose that $A^t = A^2$. What are the possible eigenvalues of $A$?

5.5.14 Show that the determinant of a Hermitian matrix is always real.

5.5.15 Show that the determinant of a real skew-symmetric matrix of odd order is 0.

5.5.16 Let $A$ be an $n \times n$ skew-Hermitian matrix. Show that if $n$ is odd, then the determinant of $A$ is either 0 or purely imaginary, and that if $n$ is even, then it is real.

5.5.17 Show that if $A$ is an $n \times n$ Hermitian matrix, then $iI_n + A$ is invertible.

5.5.18 Show that if $A$ is an $n \times n$ real skew-symmetric (skew-Hermitian) matrix, then $I_n + A$ is nonsingular.

5.5.19 Show that if $X^{\star}AX$ is real for all complex column vectors $X$, then $A$ is Hermitian.

5.5.20 Find an orthogonal matrix $O$ such that $O^tAO$ is diagonal, where
$$A = \begin{bmatrix} 1&0&1\\ 0&1&1\\ 1&1&3\end{bmatrix}.$$

Find a real symmetric matrix $B$, if possible, such that $B^2 = A$.

5.5.21 Show that all eigenvalues of $AA^{\star}$ are real, and $AA^{\star}$ is unitarily similar to a diagonal matrix.

5.5.22 Let $A$ be a real matrix. Show that $A^tA$ is similar to a diagonal matrix.

5.5.23 Show that every real symmetric matrix $A$ can be expressed as $A = PDP^t$, where $D$ is a diagonal matrix.

5.5.24 Show that every nonsingular real symmetric matrix $A$ can be expressed as $A = LDL^t$, where $L$ is a lower triangular matrix, and $D$ a diagonal matrix.

5.5.25 Let $A$ be an $n \times n$ matrix with entries in $\mathbb{R}$. Show that the map $\langle\, ,\rangle$ from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$ defined by $\langle x, y\rangle = xAy^t$ is an inner product if and only if $A$ is symmetric, and all the eigenvalues of $A$ are positive.

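5.5.26 Which of the following matrices are positive or positive definite? (i)
$$A = \begin{bmatrix} 1&2&1\\ 2&1&1\\ 1&1&3\end{bmatrix}$$

(ii)
$$A = \begin{bmatrix} 2&3&1\\ 3&1&1\\ 1&1&3\end{bmatrix}$$

(iii)
$$A = \begin{bmatrix} 1&0&5\\ 0&1&1\\ 5&1&3\end{bmatrix}$$

(iv)
$$A = \begin{bmatrix} 3&0&-1\\ 0&1&0\\ -1&0&3\end{bmatrix}$$

(v)
$$A = \begin{bmatrix} 2&1&1\\ 1&2&1\\ 1&1&2\end{bmatrix}$$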

For each of the above matrices, find an orthogonal matrix $O$ such that $O^tAO$ is a diagonal matrix. Express the positive definite matrices as $BB^t$.

5.5.27 Find the cube roots of the matrices in Exercise 5.5.26.

5.5.28 Show that $I + iAA^{\star}$ is nonsingular for all complex matrices $A$.

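5.5.29 Show that the matrix
$$A = \begin{bmatrix} \cos\alpha & \sin\alpha\\ -\sin\alpha & \cos\alpha\end{bmatrix}$$
is diagonalisable over $\mathbb{R}$ if and only if $\alpha = n\pi$ for some integer $n$. Diagonalize it over the field $\mathbb{C}$ of complex numbers.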

5.5.30 Show that every orthogonal 2 × 2 matrix with determinant 1 is a matrix of the form given in the above exercise.

5.5.31 Show that the group $O(2)$ is isomorphic to the semidirect product $SO(2) \rtimes \mathbb{Z}_2$. Show that the group $SO(2)$ is isomorphic to the circle group $S^1$.

5.5.32 Show that O(3) is isomorphic to SO(3) × Z2.

5.5.33 Let $A \in SO(3)$. Show that $A - I_3$ is singular. Deduce that 1 is always an eigenvalue of $A$. What are the other possible eigenvalues of $A$? Hint. $\mathrm{Det}(A - I_3) = \mathrm{Det}(A^t(A - I_3)) = \mathrm{Det}(I_3 - A^t) = \mathrm{Det}(I_3 - A)$.

5.5.34 Use the above exercise to show that every matrix $A$ in $SO(3)$ is similar to a matrix of the form
$$\begin{bmatrix} 1&0&0\\ 0&\cos\alpha&\sin\alpha\\ 0&-\sin\alpha&\cos\alpha\end{bmatrix}.$$

Deduce that every matrix $A$ in $SO(3)$ represents a rotation in $\mathbb{R}^3$ about an axis through an angle $\alpha$, where the trace of $A$ is $2\cos\alpha + 1$. In particular, deduce that $-1 \leq \mathrm{Tr}(A) \leq 3$. This justifies the name rotation group for $SO(3)$.

5.5.35 Show that two orthogonal $3 \times 3$ matrices are similar if and only if they have the same trace and the same determinant.

5.5.36 Show that SO(3) is generated by reflections.

5.5.37 Describe the closed subgroups of $SO(3)$.

5.5.38 Show that the group $O(n)$ acts transitively on the set $V(n, r)$ of $r$-dimensional subspaces of $\mathbb{R}^n$. Describe the isotropy subgroup of the subspace $W \in V(n, r)$ consisting of vectors with last $n - r$ coordinates 0.

5.5.39 Show that the group $O(n)$ acts transitively on the $(n-1)$-sphere $S^{n-1} = \{x \in \mathbb{R}^n \mid \|x\| = 1\}$ in $\mathbb{R}^n$. Describe the isotropy group $O(n)_{e_1}$.

5.5.40 Consider the sphere $S^2 = \{x \in \mathbb{R}^3 \mid \|x\| = 1\}$ in $\mathbb{R}^3$. Define a map $d_{S^2}$ from $S^2 \times S^2$ to $\mathbb{R}^+ \cup \{0\}$ by $\cos d_{S^2}(x, y) = \langle x, y\rangle$, $0 \leq d_{S^2}(x, y) \leq \pi$. Show that $d_{S^2}$ is a metric, called the spherical metric. Use Exercise 5.5.38 to show that the map $d_{S^{n-1}}$ from $S^{n-1} \times S^{n-1}$ to $\mathbb{R}^+ \cup \{0\}$ defined by $\cos d_{S^{n-1}}(x, y) = \langle x, y\rangle$ is a metric. The metric space $(S^n, d_{S^n})$ is called the spherical $n$-space. Describe the geodesics (paths of shortest distance) in $S^n$.

5.5.41 Consider the upper sheet of the hyperboloid $H^2 = \{x = (x_1, x_2, x_3) \mid x_1^2 - x_2^2 - x_3^2 = 1,\ x_1 > 0\}$. Show that $x, y \in H^2$ implies that $x_1y_1 - x_2y_2 - x_3y_3 \geq 1$. Show that the map $d_{H^2}$ from $H^2 \times H^2$ to $\mathbb{R}^+ \cup \{0\}$ defined by $\cosh(d_{H^2}(x, y)) = x_1y_1 - x_2y_2 - x_3y_3$ is a metric (called the hyperbolic metric). (How can this be generalized to arbitrary $n$?)

5.5.42 Show that every matrix in SO(3) is similar to a diagonal matrix over the field C of complex numbers.

5.5.43 Show that a square matrix $A$ with entries in the field $\mathbb{C}$ of complex numbers is unitarily similar to a diagonal matrix if and only if it is a normal matrix in the sense that $AA^{\star} = A^{\star}A$.

5.5.44 Find the polar decomposition of the following complex matrix:
$$\begin{bmatrix} 1&i\\ 1&1\end{bmatrix}.$$

5.5.45 Find the polar decomposition of the following real matrix:
$$\begin{bmatrix} 1&1\\ 2&1\end{bmatrix}.$$

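5.5.46 Find a singular value decomposition of each of the following matrices:
$$\begin{bmatrix} 1&i\\ 2&1\end{bmatrix}, \quad \begin{bmatrix} 1&1\\ 2&2\end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 0&1\\ 1&1\\ 1&0\end{bmatrix}.$$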

5.5.47 The map $\times$ from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}^3$ defined by $a \times b = (a_2b_3 - a_3b_2,\ a_3b_1 - a_1b_3,\ a_1b_2 - a_2b_1)$ is called the vector product on $\mathbb{R}^3$. Show that the vector product is uniquely characterized by the requirement that it is bilinear, and it satisfies
(i) $a \times a = 0$,
(ii) $\langle a \times b, a\rangle = 0 = \langle a \times b, b\rangle$ for all $a, b \in \mathbb{R}^3$,
(iii) $\langle e_1 \times e_2, e_3\rangle = 1$.
Further, show that a nonzero alternating map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$ exists if and only if $n = 3$. In particular, the concept of vector product exists only on $\mathbb{R}^3$.

5.6 Bilinear and Quadratic Forms

In this section, we discuss bilinear and quadratic forms, and their canonical reduction.

Definition 5.6.1 Let $V$ be a finite-dimensional vector space over a field $F$. A map $f$ from $V \times V$ to $F$ is called a bilinear form if $f$ is linear in each coordinate in the sense that (i) $f(ax + by, z) = af(x, z) + bf(y, z)$, and (ii) $f(x, ay + bz) = af(x, y) + bf(x, z)$ for all $x, y, z \in V$, and $a, b \in F$.

Thus, a map $f$ from $V \times V$ to $F$ is a bilinear form if and only if, for all $x, y \in V$, the maps $f_x$ and $f^y$ from $V$ to $F$ defined by $f_x(y) = f(x, y)$ and $f^y(x) = f(x, y)$ are linear functionals. Further, then the maps $x \mapsto f_x$ and $y \mapsto f^y$, denoted by $L_f$ and $R_f$, respectively, are linear transformations from $V$ to the dual space $V^{*}$ (verify). The zero map from $V \times V$ to $F$ is a bilinear form on $V$. Any inner product on a real vector space is a bilinear form. The determinant map from $F^2 \times F^2$ to $F$ is another bilinear form on $F^2$, where $F$ is a field. Example 5.6.2 Consider the vector space $F^n$ of column vectors over $F$. Let $A$ be an $n \times n$ matrix over $F$. The map $f$ from $F^n \times F^n$ to $F$ defined by
$$f(X, Y) = X^tAY$$
is a bilinear form on $F^n$ (verify). We shall see below that these are all the bilinear forms on $F^n$. In fact, given any bilinear form $f$ on a vector space $V$ of dimension $n$ over $F$, there exists an isomorphism $T$ from $V$ to $F^n$ (corresponding to each choice of basis), and a matrix $A$ such that $f(x, y) = T(x)^tAT(y)$. Thus, essentially, these are all the bilinear forms on a vector space of dimension $n$.

Example 5.6.3 Let $\phi$ and $\psi$ be linear functionals on a vector space $V$ of dimension $n$. Then the map $f$ from $V \times V$ to $F$ defined by $f(x, y) = \phi(x)\psi(y)$ is a bilinear form on $V$. Is it true that every bilinear form on $V$ is of this form? Support your answer.

Let $f$ and $g$ be bilinear forms on $V$ and $a, b \in F$. Then it is easily seen that $af + bg$, defined by $(af + bg)(x, y) = af(x, y) + bg(x, y)$, is a bilinear form on $V$. Further, the zero map, which takes everything to 0, is a bilinear form. Thus, the set $BL(V, F)$ of bilinear forms on $V$ is a vector space over $F$ with respect to the operations defined above. Let us fix an ordered basis $\{u_1, u_2, \ldots, u_n\}$ of $V$. Define a map $M_{u_1, u_2, \ldots, u_n}$ from $BL(V, F)$ to $M_n(F)$ by
$$M_{u_1, u_2, \ldots, u_n}(f) = [a_{ij}],$$
where $a_{ij} = f(u_i, u_j)$. This map is a linear transformation (verify), and it is called the matrix representation map relative to the ordered basis $\{u_1, u_2, \ldots, u_n\}$. The matrix $M_{u_1, u_2, \ldots, u_n}(f)$ is called the matrix representation of the bilinear form $f$.
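The matrix representation map, and the identity $f(x, y) = X^tAY$ established in the proof of Theorem 5.6.4, can be illustrated exactly over $\mathbb{Q}$. In this sketch we take the determinant form $f(x, y) = x_1y_2 - x_2y_1$ on $\mathbb{Q}^2$ mentioned above and the ordered basis $u_1 = (1, 1)$, $u_2 = (1, 2)$; the basis, the test vectors, and all function names are illustrative choices of ours.

```python
from fractions import Fraction


def det_form(x, y):
    """The determinant map F^2 x F^2 -> F, (x, y) -> x1*y2 - x2*y1."""
    return x[0] * y[1] - x[1] * y[0]


def matrix_of_form(f, basis):
    """M_{u_1,...,u_n}(f) = [a_ij] with a_ij = f(u_i, u_j)."""
    return [[f(ui, uj) for uj in basis] for ui in basis]


def coordinates_2d(basis, x):
    """Coordinates (x_1, x_2) of x relative to an ordered basis {u_1, u_2} of Q^2."""
    (a, c), (b, d) = basis          # u_1 = (a, c), u_2 = (b, d)
    det = a * d - b * c
    return [Fraction(d * x[0] - b * x[1], det), Fraction(a * x[1] - c * x[0], det)]


def form_from_matrix(A, X, Y):
    """X^t A Y for coordinate vectors X and Y."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))


basis = [(1, 1), (1, 2)]
A = matrix_of_form(det_form, basis)
x, y = (3, -2), (5, 7)
lhs = det_form(x, y)
rhs = form_from_matrix(A, coordinates_2d(basis, x), coordinates_2d(basis, y))
print(A, lhs, rhs)
```

Exact `Fraction` arithmetic is used so that the comparison between $f(x, y)$ and $X^tAY$ is an equality, not a floating-point approximation.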

Theorem 5.6.4 The matrix representation map $M_{u_1, u_2, \ldots, u_n}$ is a vector space isomorphism from $BL(V, F)$ to $M_n(F)$.

Proof It is already seen to be a linear transformation. Thus, it is sufficient to show that $M_{u_1, u_2, \ldots, u_n}$ is bijective. Suppose that $M_{u_1, u_2, \ldots, u_n}(f) = M_{u_1, u_2, \ldots, u_n}(g)$. Then $f(u_i, u_j) = g(u_i, u_j)$ for all $i, j$. Let $x = \sum_{i=1}^n x_iu_i$ and $y = \sum_{i=1}^n y_iu_i$ be any two members of $V$. Then, using the bilinearity of $f$ and $g$, we get
$$f(x, y) = f\Big(\sum_{i=1}^n x_iu_i, \sum_{j=1}^n y_ju_j\Big) = \sum_{i=1}^n x_i\Big(\sum_{j=1}^n f(u_i, u_j)y_j\Big) = \sum_{i,j} x_if(u_i, u_j)y_j = \sum_{i,j} x_ig(u_i, u_j)y_j = g(x, y).$$

This shows that $f = g$. Observe that $f(x, y) = X^tAY$, where $A = [a_{ij}]$ is the matrix representation of $f$ with respect to the ordered basis $\{u_1, u_2, \ldots, u_n\}$, and $X$ and $Y$ are the column vectors whose $i$th row entries are $x_i$ and $y_i$, respectively. Conversely, let $A = [a_{ij}]$ be a matrix in $M_n(F)$. Define a map $f$ from $V \times V$ to $F$ by
$$f(x, y) = X^tAY,$$
where $x = \sum_{i=1}^n x_iu_i$, $y = \sum_{i=1}^n y_iu_i$, $X$ is the column vector with $i$th row entry $x_i$, and $Y$ is the column vector with $i$th row entry $y_i$. It is easy to observe that $f$ defined above is a bilinear form. Since $f(u_i, u_j) = a_{ij}$, the matrix of $f$ is $A$. This shows that the matrix representation map is an isomorphism. ∎

Corollary 5.6.5 Let $V$ be a vector space and $\{u_1, u_2, \ldots, u_n\}$ an ordered basis of $V$. Let $\{u_1^{*}, u_2^{*}, \ldots, u_n^{*}\}$ be the dual basis. Then $\{f_{ij} = u_i^{*}u_j^{*} \mid 1 \leq i \leq n, 1 \leq j \leq n\}$, where $f_{ij}(x, y) = u_i^{*}(x)\,u_j^{*}(y)$, forms a basis of $BL(V, F)$. Proof The matrix representation map $M_{u_1, u_2, \ldots, u_n}$ takes $f_{ij}$ to $e_{ij}$, and since $\{e_{ij} \mid 1 \leq i, j \leq n\}$ forms a basis of $M_n(F)$, the result follows from the above theorem. ∎

Effect of Change of Basis on Matrix Representation

Theorem 5.6.6 The matrix representations of a bilinear form on a vector space $V$ with respect to different choices of bases are all congruent.

Proof Let $f$ be a bilinear form on $V$. Let $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ be ordered bases of $V$. Let $P = [p_{ij}]$ be the matrix of transformation from the ordered basis $\{u_1, u_2, \ldots, u_n\}$ to the basis $\{v_1, v_2, \ldots, v_n\}$. This means that $v_j = \sum_{i=1}^n p_{ij}u_i$. Clearly, $P$ is nonsingular. Further, suppose that
$$M_{u_1, u_2, \ldots, u_n}(f) = [a_{ij}] \quad\text{and}\quad M_{v_1, v_2, \ldots, v_n}(f) = [b_{ij}].$$

Then $a_{ij} = f(u_i, u_j)$ and $b_{ij} = f(v_i, v_j)$. Thus, using the bilinearity of $f$, we get
$$b_{ij} = f(v_i, v_j) = f\Big(\sum_{k=1}^n p_{ki}u_k, \sum_{l=1}^n p_{lj}u_l\Big) = \sum_{k=1}^n p_{ki}\Big(\sum_{l=1}^n f(u_k, u_l)p_{lj}\Big).$$

This shows that $b_{ij}$ is the $i$th row $j$th column entry of $P^tM_{u_1, u_2, \ldots, u_n}(f)P$. Hence
$$M_{v_1, v_2, \ldots, v_n}(f) = P^tM_{u_1, u_2, \ldots, u_n}(f)P,$$
where $P$ is the matrix of transformation from the ordered basis $\{u_1, u_2, \ldots, u_n\}$ to $\{v_1, v_2, \ldots, v_n\}$. ∎
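The change-of-basis formula of Theorem 5.6.6 can be checked with exact integer arithmetic. In this sketch we take $V = \mathbb{Q}^3$, $\{u_i\}$ the standard basis (so $M_{u}(f) = A$), and an illustrative nonsingular transition matrix $P$; both $A$ and $P$ are arbitrary choices of ours. Computing $b_{ij} = f(v_i, v_j)$ directly should agree with $P^tAP$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


A = [[1, 2, 0], [-1, 3, 4], [5, 0, 2]]   # matrix of f in the standard basis (illustrative)
P = [[1, 1, 0], [0, 1, 2], [1, 0, 1]]    # transition matrix, det(P) = 3 (illustrative)


def f(x, y):
    """f(x, y) = x^t A y; integer entries keep the arithmetic exact."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))


# v_j = sum_i p_ij u_i, i.e. v_j is the j-th column of P.
v = [[P[i][j] for i in range(3)] for j in range(3)]
B_direct = [[f(v[i], v[j]) for j in range(3)] for i in range(3)]   # b_ij = f(v_i, v_j)
B_formula = matmul(matmul(transpose(P), A), P)                     # P^t A P
print(B_direct == B_formula)
```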

Definition 5.6.7 Let f and g be bilinear forms on V . We say that f is congruent to g if there is an isomorphism T from V to V such that g(x, y) = f (T(x), T(y)) for all x, y ∈ V .

The main problem in the theory of bilinear forms is the classification of bilinear forms up to congruence over different fields. We shall give a solution to this problem for symmetric ($f(x, y) = f(y, x)$) bilinear forms. The following theorem reduces this problem to the problem of classifying matrices up to congruence. More precisely, we need to determine a unique member of each congruence class of matrices, and an algorithm to reduce a matrix to the unique representative of its congruence class. Theorem 5.6.8 A bilinear form $f$ is congruent to a bilinear form $g$ on $V$ if and only if their matrix representations corresponding to any choice of ordered bases are congruent.

Proof Let $f$ and $g$ be congruent bilinear forms on $V$. Then there is an isomorphism $T$ from $V$ to $V$ such that $g(x, y) = f(T(x), T(y))$ for all $x, y \in V$. Let $\{u_1, u_2, \ldots, u_n\}$ be an ordered basis of $V$. Then $\{T(u_1), T(u_2), \ldots, T(u_n)\}$ is also an ordered basis of $V$. Also $g(u_i, u_j) = f(T(u_i), T(u_j))$ for all $i, j$. This shows that
$$M_{u_1, u_2, \ldots, u_n}(g) = M_{T(u_1), T(u_2), \ldots, T(u_n)}(f).$$

Since the matrices of a bilinear form associated to any two ordered bases of $V$ are congruent, it follows that the matrices of $f$ and $g$ corresponding to any choice of ordered bases are congruent. Conversely, suppose that $M_{u_1, u_2, \ldots, u_n}(g) = P^tM_{v_1, v_2, \ldots, v_n}(f)P$ for some nonsingular $P = [p_{ij}]$. Then
$$g(u_i, u_j) = \sum_{k=1}^n p_{ki}\Big(\sum_{l=1}^n f(v_k, v_l)p_{lj}\Big) = f\Big(\sum_{k=1}^n p_{ki}v_k, \sum_{l=1}^n p_{lj}v_l\Big) = f(w_i, w_j),$$
where $w_i = \sum_{k=1}^n p_{ki}v_k$ and $w_j = \sum_{l=1}^n p_{lj}v_l$. Since $P$ is nonsingular, $\{w_1, w_2, \ldots, w_n\}$ forms an ordered basis of $V$. Thus, if $T$ denotes the isomorphism from $V$ to $V$ which takes $u_i$ to $w_i$, then $g(u_i, u_j) = f(T(u_i), T(u_j))$ for all $i, j$. Since $\{u_1, u_2, \ldots, u_n\}$ is an ordered basis of $V$, and $f$ and $g$ are bilinear forms, it follows that $g(x, y) = f(T(x), T(y))$ for all $x, y \in V$. ∎

Example 5.6.9 We have seen above that any bilinear form on F^n is given by $f(X,Y) = X^tAY = \sum_{i,j} x_ia_{ij}y_j$, where A = [a_{ij}], X is the column vector whose ith row is x_i, and Y is the column vector whose ith row is y_i. Note that the matrix of this bilinear form with respect to the standard ordered basis is A. Consider the bilinear form on R^3 given by

f (X, Y) = x1y2 + 2x1y3 + x2y1 + x2y3 + 2x3y1 + x3y2.

The matrix A of this bilinear form with respect to the standard ordered basis is

$$A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}.$$

From Example 2.7.5, it follows that this matrix is congruent to the diagonal matrix $\mathrm{Diag}(2, -\tfrac{1}{2}, -4)$, and the matrix P of transformation is

$$P = \begin{bmatrix} 1 & -\tfrac{1}{2} & -1 \\ 1 & \tfrac{1}{2} & -2 \\ 0 & 0 & 1 \end{bmatrix}.$$
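The congruence $P^tAP = \mathrm{Diag}(2, -\tfrac{1}{2}, -4)$ can be checked mechanically. The following minimal Python sketch (the helper names `mat_mul`, `transpose` and `times` are ours, not the book's) computes $P^tAP$ in exact rational arithmetic with the standard `fractions` module, and spot-checks that g(X, Y) = f(PX, PY) for the diagonal form g of this example on all vectors with entries in {−1, 0, 1}.

```python
from fractions import Fraction as Fr
from itertools import product

A = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]          # matrix of f (Example 5.6.9)
P = [[Fr(1), Fr(-1, 2), Fr(-1)],
     [Fr(1), Fr(1, 2), Fr(-2)],
     [Fr(0), Fr(0), Fr(1)]]                      # matrix of transformation

def mat_mul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def f(X, Y):
    """f(X, Y) = X^t A Y."""
    return sum(X[i] * A[i][j] * Y[j] for i in range(3) for j in range(3))

def g(X, Y):
    """The diagonal form 2 x1 y1 - (1/2) x2 y2 - 4 x3 y3."""
    return 2 * X[0] * Y[0] - Fr(1, 2) * X[1] * Y[1] - 4 * X[2] * Y[2]

def times(M, X):
    """The column vector M X, i.e. T(X) when M = P."""
    return [sum(M[i][k] * X[k] for k in range(3)) for i in range(3)]

D = mat_mul(mat_mul(transpose(P), A), P)
print(D == [[2, 0, 0], [0, Fr(-1, 2), 0], [0, 0, -4]])   # True

# Spot-check g(X, Y) = f(T(X), T(Y)) on all vectors with entries in {-1, 0, 1}.
box = list(product((-1, 0, 1), repeat=3))
print(all(g(X, Y) == f(times(P, X), times(P, Y)) for X in box for Y in box))  # True
```

Exact fractions are used so that the comparison with Diag(2, −1/2, −4) is an equality test rather than a floating-point approximation.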

Thus, the bilinear form g on R3 given by

$$g(X,Y) = 2x_1y_1 - \tfrac{1}{2}x_2y_2 - 4x_3y_3$$

is congruent to f. The isomorphism T from R^3 to itself which takes e_i to the ith column of P is an isomorphism such that g(X, Y) = f(T(X), T(Y)) for all X, Y ∈ R^3. In fact, the substitution $x_1 \mapsto x_1 - \tfrac{1}{2}x_2 - x_3$, $x_2 \mapsto x_1 + \tfrac{1}{2}x_2 - 2x_3$, $x_3 \mapsto x_3$, and $y_1 \mapsto y_1 - \tfrac{1}{2}y_2 - y_3$, $y_2 \mapsto y_1 + \tfrac{1}{2}y_2 - 2y_3$, $y_3 \mapsto y_3$ transforms f to g.

Theorem 5.6.10 Let f be a bilinear form on a vector space V of finite dimension n over a field F. Then ρ(L_f) = ρ(A) = ρ(R_f), where A is a matrix of f corresponding to any choice of basis, and ρ denotes the rank.

Proof Let us first observe that matrices of f corresponding to different ordered bases are congruent, and so they all have the same rank. Because of the rank-nullity theorem, it is sufficient to show that ν(Lf ) = ν(A) = ν(Rf ), where ν denotes the nullity. Now,

ν(Lf ) = dim({x ∈ V | Lf (x) = fx = 0}) = dim({x ∈ V | f (x, y) = 0 for all y ∈ V }).

If we fix an ordered basis {u_1, u_2, ..., u_n} of V, then the map T from V to the vector space F^n, which associates to $x = \sum_{i=1}^n x_iu_i$ the column vector X whose ith row is x_i, is an isomorphism, and then $f(x,y) = T(x)^tAT(y) = X^tAY$. This isomorphism takes the subspace {x | f(x, y) = 0 for all y ∈ V} of V isomorphically onto the subspace {X ∈ F^n | X^tAY = 0 for all Y ∈ F^n}. Thus ν(L_f) = dim({X ∈ F^n | X^tAY = 0 for all Y ∈ F^n}). Next, over any field, if C is a column vector such that C^tY = 0 for all Y ∈ F^n, then C = 0 (verify). Hence {X ∈ F^n | X^tAY = 0 for all Y ∈ F^n} = {X ∈ F^n | X^tA = 0} = {X ∈ F^n | A^tX = 0}. This shows that ν(L_f) = ν(A^t). Since A is a square matrix, ν(A^t) = ν(A). This shows that ρ(L_f) = ρ(A). Similarly, we can show that ρ(R_f) = ρ(A). ∎
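To see Theorem 5.6.10 on a concrete matrix, here is a minimal Python sketch (the helper names are ours) that computes ranks exactly over Q by Gauss-Jordan elimination. For one sample non-symmetric matrix A of a bilinear form on Q^3 (chosen for illustration), it checks that ρ(A) = ρ(A^t), that n − ν(L_f), computed as in the proof through the nullity of A^t, equals ρ(A), and that the congruent matrix P^tAP for a nonsingular P has the same rank.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix over Q, by Gauss-Jordan elimination in exact arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(n_rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c] / R[r][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == n_rows:
            break
    return r

def transpose(X):
    return [list(row) for row in zip(*X)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]      # a non-symmetric matrix of rank 2
P = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]      # nonsingular (determinant 1)
n = len(A)
nullity_L = n - rank(transpose(A))          # dim{X : X^t A = 0} = nullity of A^t
print(rank(A), rank(transpose(A)), rank(mat_mul(mat_mul(transpose(P), A), P)), n - nullity_L)
# 2 2 2 2
```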

Definition 5.6.11 Let f be a bilinear form on V. Then the common number ρ(L_f) = ρ(A) = ρ(R_f) is called the rank of f.

Corollary 5.6.12 Let f be a bilinear form on a vector space V of dimension n. Then the following conditions are equivalent.
1. The rank of f is n.
2. f_x(y) = 0 for all y implies that x = 0.
3. f^y(x) = 0 for all x implies that y = 0. ∎

Definition 5.6.13 A bilinear form f on V is called a non-degenerate, or nonsingular, bilinear form if it satisfies any one (and hence all) of the three conditions in the above corollary.

Symmetric Bilinear Forms

Now, we try to describe bilinear forms which have a nice diagonal representation

$$f(x,y) = \sum_{i=1}^{n} a_ix_iy_i$$

with respect to an ordered basis {u_1, u_2, ..., u_n}, where $x = \sum_{i=1}^n x_iu_i$ and $y = \sum_{i=1}^n y_iu_i$. This is equivalent to saying that the matrix representation of f with respect to some ordered basis is diagonal. Since the matrix representations of a bilinear form with respect to any two bases are congruent, we need to characterize those bilinear forms whose matrix representations are congruent to diagonal matrices. If A is a matrix which is congruent to a diagonal matrix D, then there is a nonsingular matrix P such that P^tAP = D, or equivalently, A = QDQ^t, where Q = (P^{-1})^t. Thus, A^t = QD^tQ^t = QDQ^t = A. This shows that A is a symmetric matrix. In other words, all matrices associated to the bilinear form f are symmetric. This is so if and only if f(x, y) = f(y, x) for all x, y ∈ V (verify). Such a bilinear form is called a symmetric bilinear form.

Corollary 5.6.14 A necessary condition for a bilinear form f to have a diagonal representation is that it is a symmetric bilinear form. ∎

Let f be a symmetric bilinear form on a finite-dimensional vector space V. A pair of vectors x, y ∈ V is said to be orthogonal to each other if f(x, y) = 0. Let W be a subspace of V. Then W^⊥ = {x ∈ V | f(x, y) = 0 for all y ∈ W} is a subspace of V (verify), and it is called the orthogonal complement of W with respect to the bilinear form f. Observe that, unlike the case of an inner product space, W ∩ W^⊥ may be different from {0}. Clearly, ker L_f = ker R_f = {x ∈ V | f_x = 0} = V^⊥. If V is a vector space over the field R of real numbers, then f is called positive (negative) if f(x, x) ≥ 0 (≤ 0) for all x ∈ V. It is said to be positive (negative) definite if it is positive (negative), and f(x, x) = 0 if and only if x = 0. To say that f has a diagonal representation with respect to the basis S = {x_1, x_2, ..., x_n} is to say that S is an orthogonal basis, in the sense that the members of S are pairwise orthogonal.
The following result is the converse of the above corollary in case the field is of characteristic different from 2. Theorem 5.6.15 Let f be a symmetric bilinear form on a finite-dimensional vector space V over a field F of characteristic different from 2. Then there is an orthogonal basis of V , or equivalently, there is an ordered basis with respect to which f has diagonal representation.

Proof The proof is by induction on the dimension of V. If the dimension of V is 0 or 1, then there is nothing to do. Assume that the result is true for symmetric bilinear forms on vector spaces of dimension n. Let f be a symmetric bilinear form on a vector space V of dimension n + 1. If f is the zero bilinear form, then there is nothing to do. Suppose that f ≠ 0. We claim that there is an x ∈ V − {0} such that f(x, x) ≠ 0. Suppose not. Then f(x, x) = 0 for all x ∈ V. Now, since f is a symmetric bilinear form,

$$f(x+y,x+y) - f(x-y,x-y) = 2f(x,y) + 2f(y,x) = 4f(x,y)$$

for all x, y ∈ V. Hence 4f(x, y) = 0 for all x, y ∈ V. Since the field is of characteristic different from 2, f(x, y) = 0 for all x, y ∈ V. This is a contradiction to the supposition that f ≠ 0. Let u_1 ∈ V − {0} be such that f(u_1, u_1) ≠ 0. Let W = {au_1 | a ∈ F} be the subspace generated by u_1. Then the dimension of W is 1. Consider W^⊥ = {v ∈ V | f(u_1, v) = 0}. Then W^⊥ is a subspace of V (verify). Suppose that au_1 ∈ W^⊥. Then 0 = f(u_1, au_1) = af(u_1, u_1). Since f(u_1, u_1) ≠ 0, it follows that a = 0. Thus, W ∩ W^⊥ = {0}. Further, let v ∈ V. Then

$$f\Big(u_1, v - \frac{f(u_1,v)}{f(u_1,u_1)}\,u_1\Big) = f(u_1,v) - f(u_1,v) = 0.$$

Put $w = v - \frac{f(u_1,v)}{f(u_1,u_1)}\,u_1$. Then w ∈ W^⊥, and

$$v = \frac{f(u_1,v)}{f(u_1,u_1)}\,u_1 + w$$

belongs to W + W^⊥. This shows that V = W ⊕ W^⊥. Since the restriction of f to W^⊥ is also a symmetric bilinear form on W^⊥, and the dimension of W^⊥ is n, it follows from the induction hypothesis that there is an ordered basis {u_2, u_3, ..., u_{n+1}} of W^⊥ with respect to which f restricted to W^⊥ has a diagonal representation. In other words, f(u_i, u_j) = 0 for all i ≠ j, i ≥ 2 and j ≥ 2. Already f(u_1, u_j) = 0 for all j ≥ 2. Thus, f has a diagonal representation with respect to the ordered basis {u_1, u_2, ..., u_{n+1}}. ∎

Corollary 5.6.16 Let A be a symmetric matrix with entries in a field F of characteristic different from 2. Then there is a nonsingular matrix P such that P^tAP is a diagonal matrix. ∎
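The inductive proof translates into an algorithm. The Python sketch below is one way to implement it over Q (characteristic 0, so different from 2), under our own naming: the columns of P are the current basis vectors, and D is kept equal to the matrix [f(u_i, u_j)], updated by a column operation followed by the matching row operation each time a basis vector is changed. Where the proof only shows that a vector u with f(u, u) ≠ 0 exists, the code takes u_i + u_l for a pair with f(u_i, u_l) ≠ 0, using f(u_i + u_l, u_i + u_l) = 2f(u_i, u_l) when f(u_i, u_i) = f(u_l, u_l) = 0. This is a sketch, not necessarily the procedure of Theorem 2.7.2.

```python
from fractions import Fraction

def congruent_diagonalization(A):
    """Return (P, D) with P nonsingular and P^t A P = D diagonal.

    A is a symmetric matrix with rational entries. The columns of P are the new
    basis vectors u_1, ..., u_n, and D is the matrix [f(u_i, u_j)].
    """
    n = len(A)
    D = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_multiple(i, l, c):
        # Replace u_i by u_i + c*u_l: column operation, then the matching row operation.
        for r in range(n):
            D[r][i] += c * D[r][l]
            P[r][i] += c * P[r][l]
        for s in range(n):
            D[i][s] += c * D[l][s]

    def swap(a, b):
        D[a], D[b] = D[b], D[a]
        for row in D:
            row[a], row[b] = row[b], row[a]
        for row in P:
            row[a], row[b] = row[b], row[a]

    for k in range(n):
        j = next((j for j in range(k, n) if D[j][j] != 0), None)
        if j is None:
            # Every remaining f(u_j, u_j) is 0; if f is nonzero on the remaining
            # subspace, some f(u_i, u_l) != 0 and f(u_i + u_l, u_i + u_l) != 0.
            pair = next(((i, l) for i in range(k, n) for l in range(i + 1, n)
                         if D[i][l] != 0), None)
            if pair is None:
                break          # f vanishes on the remaining subspace
            i, l = pair
            add_multiple(i, l, Fraction(1))
            j = i
        swap(k, j)
        # u_m <- u_m - (f(u_k, u_m) / f(u_k, u_k)) u_k makes u_m orthogonal to u_k.
        for m in range(k + 1, n):
            if D[k][m] != 0:
                add_multiple(m, k, -D[k][m] / D[k][k])
    return P, D

A = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # the matrix of Example 5.6.9
P, D = congruent_diagonalization(A)
print([D[i][i] for i in range(3)])      # diagonal entries 2, -1/2, -4
print(P)                                # columns: the new basis vectors
```

On the matrix of Example 5.6.9 this sketch produces the same matrix of transformation P and the same diagonal matrix Diag(2, −1/2, −4) as that example.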

Remark 5.6.17 The proof of the above theorem and the corollary is algorithmic. It gives an algorithm to reduce a symmetric bilinear form (symmetric matrix) over a field of characteristic different from 2 to a diagonal bilinear form (matrix). An algorithm to reduce a symmetric matrix over a field of characteristic different from 2 congruently to a diagonal matrix is given in the proof of Theorem 2.7.2, which is further illustrated in Example 2.7.5. This also gives an algorithm (see Example 5.6.9) to reduce a symmetric bilinear form to diagonal form.

Corollary 5.6.18 Let f be a symmetric bilinear form on a finite-dimensional vector space V over the field C of complex numbers (or over a field F of characteristic different from 2 which contains a square root of each of its elements). Then there is a basis {u_1, u_2, ..., u_n} of V such that (i) f(u_i, u_j) = 0 for i ≠ j, (ii) f(u_i, u_i) = 1 for i ≤ r, where r is the rank of f, and (iii) f(u_i, u_i) = 0 for i ≥ r + 1. More precisely, the matrix of f with respect to some ordered basis is in normal form.

Proof From Theorem 5.6.15, we can find an ordered basis {v_1, v_2, ..., v_n} such that the matrix of f with respect to this ordered basis is diagonal. Suppose that the rank of f is r. We may assume that the first r diagonal entries are different from 0, and the rest of the diagonal entries are 0. This means that f(v_i, v_j) = 0 for i ≠ j, f(v_i, v_i) ≠ 0 for i ≤ r, and f(v_i, v_i) = 0 for i ≥ r + 1. Since the field contains a square root of each of its elements, $\sqrt{f(v_i,v_i)}$ lies in the field. Take $u_i = \frac{1}{\sqrt{f(v_i,v_i)}}\,v_i$ for i ≤ r, and u_j = v_j for j ≥ r + 1. Clearly, the ordered basis {u_1, u_2, ..., u_n} has the required properties. ∎

Corollary 5.6.19 Any symmetric matrix over C is congruent to a matrix in normal form. ∎

Since any two congruent bilinear forms (matrices) have the same rank, we have the following corollary.

Corollary 5.6.20 Two symmetric bilinear forms (matrices) over the field C of complex numbers are congruent if and only if they have the same rank. ∎

The above corollary is not true over the field R of real numbers: I_n and −I_n have the same rank, whereas they are not congruent over R (verify). However, over the field R of real numbers we have the following results.

Proposition 5.6.21 Let f be a symmetric bilinear form of rank r on a finite-dimensional real vector space V. Let {u_1, u_2, ..., u_n} be an orthogonal basis of V with f(u_i, u_i) ≠ 0 for all i ≤ r, and f(u_i, u_i) = 0 for all i ≥ r + 1. Then f is positive (negative) if and only if f(u_i, u_i) ≥ 0 (≤ 0) for all i. It is positive (negative) definite if and only if f(u_i, u_i) > 0 (< 0) for all i.

Proof If $x = \sum_{i=1}^n a_iu_i$, then $f(x,x) = \sum_{i=1}^n a_i^2 f(u_i,u_i)$. Since $a_i^2$ is always non-negative, the result follows. ∎

Theorem 5.6.22 (Sylvester) Let f be a symmetric bilinear form on a finite-dimensional real vector space V (more generally, over a subfield of the field of real numbers in which all positive members have square roots). Then there is an ordered basis {u_1, u_2, ..., u_n} of V, and non-negative integers r, p and q, such that (i) f(u_i, u_i) = 1 for all i ≤ p, (ii) f(u_i, u_i) = −1 for all i with p + 1 ≤ i ≤ r, (iii) f(u_i, u_i) = 0 for all i ≥ r + 1, and (iv) f(u_i, u_j) = 0 for all i ≠ j. Further, the integers r, p, and q are independent of the choice of such bases.

Proof From Theorem 5.6.15, it follows that there is an ordered basis {v_1, v_2, ..., v_n} such that f(v_i, v_j) = 0 for i ≠ j, f(v_i, v_i) ≠ 0 for i ≤ r, and f(v_i, v_i) = 0 for all i ≥ r + 1, where r is the rank of f. Changing the order of {v_1, v_2, ..., v_n}, we may assume that f(v_i, v_i) > 0 for i ≤ p, and f(v_i, v_i) < 0 for p + 1 ≤ i ≤ r. Take $u_i = \frac{1}{\sqrt{|f(v_i,v_i)|}}\,v_i$ for i ≤ r, and u_i = v_i for all i ≥ r + 1. Then it is clear that {u_1, u_2, ..., u_n} has the required property, where q = r − p. It remains to show that r, p and q are independent of the choice of basis. Since r is the rank of f, which is invariant (congruent matrices have the same rank), it is independent of the choice of basis. Since q = r − p, it is sufficient to show that p is independent of the choice of basis. Let {v_1, v_2, ..., v_n} be another ordered orthogonal basis such that f(v_i, v_i) = 1 for i ≤ p', f(v_i, v_i) = −1 for p' + 1 ≤ i ≤ r, and f(v_i, v_i) = 0 for i ≥ r + 1. It is clear (since both bases are orthogonal) that V^⊥ has {u_{r+1}, u_{r+2}, ..., u_n} and {v_{r+1}, v_{r+2}, ..., v_n} as bases. We show that

$$X = \{v_1, v_2, \ldots, v_{p'}, u_{p+1}, u_{p+2}, \ldots, u_r, u_{r+1}, \ldots, u_n\}$$

is linearly independent. Suppose that

$$a_1v_1 + a_2v_2 + \cdots + a_{p'}v_{p'} + b_{p+1}u_{p+1} + \cdots + b_ru_r + b_{r+1}u_{r+1} + \cdots + b_nu_n = 0.$$

Then u + v + w = 0, where $u = a_1v_1 + \cdots + a_{p'}v_{p'}$, $v = b_{p+1}u_{p+1} + \cdots + b_ru_r$, and $w = b_{r+1}u_{r+1} + \cdots + b_nu_n$. Clearly, f(u, u) ≥ 0, f(v, v) ≤ 0, and f(x, w) = 0 for all x ∈ V (this is because w ∈ V^⊥). Thus,

$$0 = f(u, u + v + w) = f(u,u) + f(u,v) + f(u,w) = f(u,u) + f(u,v),$$

and similarly, since f is symmetric,

$$0 = f(v, u + v + w) = f(v,v) + f(u,v).$$

From the above two equations, f(u, u) = f(v, v). Since f(u, u) ≥ 0 ≥ f(v, v), it follows that f(u, u) = 0 = f(v, v). Since f restricted to the subspace generated by {v_1, v_2, ..., v_{p'}} is positive definite, and f restricted to the subspace generated by {u_{p+1}, u_{p+2}, ..., u_r} is negative definite, this, in turn, implies that u = 0 = v, and so also w = 0. Since {v_1, v_2, ..., v_{p'}}, {u_{p+1}, u_{p+2}, ..., u_r}, and {u_{r+1}, u_{r+2}, ..., u_n} are linearly independent, all the coefficients are 0, and so X is linearly independent. Hence n − p + p' ≤ n, and so p' ≤ p. Interchanging the roles of the bases, we see that p ≤ p'. The result follows. ∎
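Over R, Sylvester's invariants can also be read off from eigenvalues: by the spectral theorem a real symmetric matrix A equals O diag(λ_1, ..., λ_n) O^t with O orthogonal, so A is congruent to the diagonal matrix of its eigenvalues, and by the theorem just proved p and q are the numbers of positive and negative eigenvalues. The minimal sketch below uses NumPy's `numpy.linalg.eigvalsh`; the function name `inertia` and the floating-point tolerance `tol` for deciding when an eigenvalue counts as 0 are our own assumptions.

```python
import numpy as np

def inertia(A, tol=1e-9):
    """Return (r, p, q) = (rank, #positive, #negative) for a real symmetric A.

    A = O diag(eigenvalues) O^t with O orthogonal, so A is congruent to the
    diagonal matrix of its eigenvalues; by Sylvester's theorem the numbers of
    positive and negative entries there are p and q.
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    p = int(np.sum(eigenvalues > tol))
    q = int(np.sum(eigenvalues < -tol))
    return p + q, p, q

A = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # the matrix of Example 5.6.9
r, p, q = inertia(A)
print(r, p, q, p - q)                   # 3 1 2 -1 (rank, p, q, signature)
```

The result agrees with the diagonal form Diag(2, −1/2, −4) obtained in Example 5.6.9, which has one positive and two negative entries.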

Remark 5.6.23 p is the largest among the dimensions of subspaces of V over which f is positive definite. Similarly, q is the largest among the dimensions of the subspaces of V over which f is negative definite.

Corollary 5.6.24 Every real symmetric matrix is congruent to a unique diagonal matrix of the form

$$\mathrm{Diag}(\underbrace{1, 1, \ldots, 1}_{p}, \underbrace{-1, -1, \ldots, -1}_{q}, \underbrace{0, 0, \ldots, 0}_{n-p-q}).$$

Since for any r ≤ n there are r + 1 pairs of non-negative integers p, q such that p + q = r, and $\sum_{r=0}^{n}(r+1) = \frac{(n+1)(n+2)}{2}$, we have the following corollary.

Corollary 5.6.25 There are $\frac{(n+1)(n+2)}{2}$ distinct congruence classes of n × n real symmetric matrices. ∎
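The count in Corollary 5.6.25 can be confirmed by listing the pairs (p, q) directly, since by Corollary 5.6.24 each pair with p + q ≤ n corresponds to exactly one congruence class. A short Python sketch (the function name is ours):

```python
def sylvester_types(n):
    """All (p, q) with p + q <= n: one pair per congruence class of n x n real symmetric matrices."""
    return [(p, r - p) for r in range(n + 1) for p in range(r + 1)]

for n in range(1, 6):
    assert len(sylvester_types(n)) == (n + 1) * (n + 2) // 2   # Corollary 5.6.25
print(sylvester_types(2))
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```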

Definition 5.6.26 If f (respectively A) is a real symmetric bilinear form (respectively matrix), the uniquely determined integer p − q is called the signature of f (respectively A).

Real Skew-Symmetric Forms (Matrices)

Recall that a bilinear form f is skew-symmetric if f(x, y) = −f(y, x) for all x, y ∈ V. It follows that f is skew-symmetric if and only if its matrix with respect to any basis is skew-symmetric. If the field F is of characteristic 0, then f(x, x) = 0 for all x ∈ V (since 2f(x, x) = 0).

Proposition 5.6.27 Let f be a skew-symmetric bilinear form on a finite-dimensional vector space V over a field of characteristic 0. Suppose that f(x, y) ≠ 0. Then {x, y} is linearly independent. Further, if z ∈ W, where W is the subspace generated by {x, y}, then

$$z = \frac{f(z,y)}{f(x,y)}\,x - \frac{f(z,x)}{f(x,y)}\,y.$$

Proof Suppose that ax + by = 0. Then 0 = f(ax + by, x) = af(x, x) + bf(y, x) = −bf(x, y). Since f(x, y) ≠ 0, b = 0. Similarly, 0 = f(ax + by, y) = af(x, y) + bf(y, y) = af(x, y). Again, since f(x, y) ≠ 0, a = 0. This proves that {x, y} is linearly independent. Next, if z = ax + by, then f(z, x) = af(x, x) + bf(y, x) = −bf(x, y). Thus, $b = -\frac{f(z,x)}{f(x,y)}$. Similarly, $a = \frac{f(z,y)}{f(x,y)}$. Hence

$$z = \frac{f(z,y)}{f(x,y)}\,x - \frac{f(z,x)}{f(x,y)}\,y.$$ ∎

Proposition 5.6.28 Under the hypothesis of the above proposition, V = W ⊕ W^⊥.

Proof Let v ∈ V and $w = \frac{f(v,y)}{f(x,y)}\,x - \frac{f(v,x)}{f(x,y)}\,y$. Then, since f(x, x) = 0,

$$f(v-w, x) = f(v,x) + \frac{f(v,x)}{f(x,y)}\,f(y,x) = f(v,x) - f(v,x) = 0.$$

Similarly, f(v − w, y) = 0. This shows that v − w ∈ W^⊥. Thus V = W + W^⊥. Suppose that v = ax + by ∈ W ∩ W^⊥. Then 0 = f(v, x) = −bf(x, y), and 0 = f(v, y) = af(x, y). Since f(x, y) ≠ 0, it follows that a = 0 = b. Hence v = 0. ∎

Theorem 5.6.29 Let f be a skew-symmetric bilinear form on a finite-dimensional vector space V over a field F of characteristic 0. Then there is a non-negative integer r, and an ordered basis

{u_1, v_1, u_2, v_2, ..., u_r, v_r, u_{r+1}, u_{r+2}, ..., u_{n−r}}

of V such that (i) f(u_i, v_i) = 1 = −f(v_i, u_i) for all i ≤ r, (ii) f(u_i, u_j) = 0 = f(v_i, v_j) for all i, j, and (iii) f(u_i, v_j) = 0 for all i ≠ j.

Proof The proof is by induction on the dimension of V. If the dimension of V is 0, there is nothing to do. If the dimension of V is 1, then any skew-symmetric bilinear form on V is 0, and again there is nothing to do. Assume that the result is true for all vector spaces of dimension less than n. Let V be a vector space of dimension n, and f a skew-symmetric bilinear form on V. If f is 0, then there is nothing to do. Suppose that f ≠ 0. Then there exist u_1, v_1 ∈ V such that f(u_1, v_1) ≠ 0. Multiplying u_1 by a suitable scalar, we may suppose that f(u_1, v_1) = 1. From the above results, it follows that {u_1, v_1} is linearly independent, and if W is the subspace generated by {u_1, v_1}, then V = W ⊕ W^⊥. Clearly, the dimension of W^⊥ is n − 2, and the restriction of f to W^⊥ is skew-symmetric. By the induction hypothesis, there exists an ordered basis {u_2, v_2, u_3, v_3, ..., u_r, v_r, u_{r+1}, u_{r+2}, ..., u_{n−r}} of W^⊥ such that (i) f(u_i, v_i) = 1 = −f(v_i, u_i) for all i, 2 ≤ i ≤ r, (ii) f(u_i, u_j) = 0 = f(v_i, v_j) for all i, j ≥ 2, and (iii) f(u_i, v_j) = 0 for all i ≠ j, i, j ≥ 2. Clearly, {u_1, v_1, u_2, v_2, ..., u_r, v_r, u_{r+1}, u_{r+2}, ..., u_{n−r}} has the required properties. ∎

Corollary 5.6.30 The rank of a skew-symmetric bilinear form on a vector space over a field of characteristic 0 is always even. ∎
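The induction in Theorem 5.6.29 is again constructive. The following Python sketch, with hypothetical names of our own and exact rational arithmetic, repeatedly picks a pair x, y with f(x, y) ≠ 0 among the remaining vectors, rescales so that f(u, v) = 1, and replaces every other remaining vector z by z − (f(z, v)u − f(z, u)v), which by Propositions 5.6.27 and 5.6.28 lies in W^⊥ for W = span{u, v}. It stops when f vanishes on the span of what is left.

```python
from fractions import Fraction

def form(A, x, y):
    """f(x, y) = x^t A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def symplectic_basis(A):
    """Return (pairs, rest) as in Theorem 5.6.29 for a skew-symmetric rational A.

    pairs = [(u_1, v_1), ..., (u_r, v_r)] with f(u_i, v_i) = 1; f vanishes on span(rest);
    all listed vectors are otherwise mutually f-orthogonal.
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    rest = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]  # e_1, ..., e_n
    pairs = []
    while True:
        hit = next(((x, y) for x in rest for y in rest if form(A, x, y) != 0), None)
        if hit is None:          # f is zero on span(rest): the induction stops
            break
        x, y = hit
        c = form(A, x, y)
        u, v = [a / c for a in x], y           # rescale so that f(u, v) = 1
        others = [z for z in rest if z is not x and z is not y]
        # Project each remaining z into W-perp: z - (f(z, v) u - f(z, u) v).
        rest = [[zi - form(A, z, v) * ui + form(A, z, u) * vi
                 for zi, ui, vi in zip(z, u, v)] for z in others]
        pairs.append((u, v))
    return pairs, rest

def gram(A, basis):
    return [[form(A, x, y) for y in basis] for x in basis]

A = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
pairs, rest = symplectic_basis(A)
basis = [w for pair in pairs for w in pair] + rest
print(len(pairs), len(rest))   # 2 0, so the rank is 4
print(gram(A, basis))          # two diagonal blocks [[0, 1], [-1, 0]]
```

The sample matrix is an arbitrary nonsingular 4 × 4 skew-symmetric matrix chosen for illustration; the number of pairs found is half the rank, in line with Corollary 5.6.30.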

Following is the matrix form of the theorem.

Corollary 5.6.31 Every n × n skew-symmetric matrix A with entries in a field F of characteristic 0 is congruent to the block-diagonal matrix

$$\mathrm{Diag}(S, S, \ldots, S, 0, \ldots, 0), \qquad S = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$

with r copies of S followed by n − 2r zeros on the diagonal, where 2r is the rank of A. ∎

It is clear that there is no nondegenerate skew-symmetric bilinear form on a vector space of odd dimension. Suppose that f is a nondegenerate skew-symmetric bilinear form on a vector space V of even dimension 2n over a field of characteristic 0. Arranging the basis vectors u_1, v_1, u_2, v_2, ..., u_n, v_n of the theorem as u_1, u_2, ..., u_n, v_n, v_{n−1}, ..., v_1 and looking at the matrix representation, we get the following corollary.

Corollary 5.6.32 Every nonsingular 2n × 2n skew-symmetric matrix with entries in a field F of characteristic 0 is congruent to a matrix of the form

$$\begin{bmatrix} 0_n & J \\ -J & 0_n \end{bmatrix},$$

where 0_n is the n × n zero matrix and J is the n × n matrix with 1 on the anti-diagonal and 0 elsewhere:

$$J = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}.$$ ∎

Quadratic Forms, Orthogonal Reduction

Let V be a finite-dimensional vector space over a field F. A map q from V to F is called a quadratic form if there is a bilinear form f on V such that q(v) = f(v, v) for all v ∈ V. If f is skew-symmetric, then q(v) = f(v, v) = 0. We assume that the field is of characteristic different from 2. Then every bilinear form f = f_s + f_{ss}, where f_s is symmetric and f_{ss} is skew-symmetric. But then q(v) = f_s(v, v). Thus, for every quadratic form q, there is a symmetric bilinear form f such that q(v) = f(v, v) for all v ∈ V. The following proposition says that this symmetric bilinear form is uniquely determined by the quadratic form.

Proposition 5.6.33 Let q be a quadratic form corresponding to a symmetric bilinear form f. Then

$$f(v,w) = \frac{1}{4}\big(q(v+w) - q(v-w)\big).$$

Proof f(v + w, v + w) − f(v − w, v − w) = 4f(v, w). ∎
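Proposition 5.6.33 can be used literally to recover the matrix of the symmetric bilinear form from the quadratic form alone. In the minimal Python sketch below (names ours), q is the quadratic form 2x_1x_2 + 4x_1x_3 + 2x_2x_3 of the symmetric matrix of Example 5.6.9, and `polar` evaluates (q(v + w) − q(v − w))/4 on pairs of standard basis vectors.

```python
from fractions import Fraction

A = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # symmetric matrix of Example 5.6.9

def q(X):
    """The quadratic form q(X) = X^t A X = 2 x1 x2 + 4 x1 x3 + 2 x2 x3."""
    return sum(X[i] * A[i][j] * X[j] for i in range(3) for j in range(3))

def polar(v, w):
    """f(v, w) = (q(v + w) - q(v - w)) / 4, as in Proposition 5.6.33."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return Fraction(q(plus) - q(minus), 4)

e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
recovered = [[polar(e[i], e[j]) for j in range(3)] for i in range(3)]
print(recovered == A)   # True: q determines its symmetric bilinear form
```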

A quadratic form on a vector space V with respect to an ordered basis {u_1, u_2, ..., u_n} is given by $q(v) = \sum_{i,j} x_ia_{ij}x_j$, where a_{ij} = f(u_i, u_j), f being the symmetric bilinear form representing q, and $v = \sum_{i=1}^n x_iu_i$. The following two results follow from the corresponding results on symmetric bilinear forms over the field C of complex numbers, and over the field R of real numbers.

Corollary 5.6.34 Let q be a quadratic form on a vector space V over the field C of complex numbers. Then there is an ordered basis {u_1, u_2, ..., u_n} of V such that the representation of q with respect to this basis is

$$q(v) = x_1^2 + x_2^2 + \cdots + x_r^2,$$

where r is the rank of q (the rank of q is defined to be the rank of the corresponding f), and $v = \sum_{i=1}^n x_iu_i$. ∎

Corollary 5.6.35 Let q be a quadratic form on a real vector space V. Then there is an ordered basis {u_1, u_2, ..., u_n} of V such that

$$q(v) = x_1^2 + x_2^2 + \cdots + x_p^2 - x_{p+1}^2 - x_{p+2}^2 - \cdots - x_r^2,$$

where r is the rank of q, and 2p − r is the signature of q (the signature of q is defined to be the signature of the corresponding symmetric bilinear form). ∎

Example 5.6.36 Consider the bilinear form f on R^3 given in Example 5.6.9. Its reduced diagonal form and the matrix P of transformation are given in that example. Clearly, the form is further congruent to x_1y_1 − x_2y_2 − x_3y_3, and the corresponding matrix is congruent to diag(1, −1, −1). The rank is 3, and the signature is −1. The matrix of transformation is given by $P\,\mathrm{diag}(\tfrac{1}{\sqrt{2}}, \sqrt{2}, \tfrac{1}{2})$ (check it). Let q be the quadratic form on R^3 given by

q(X) = 2x1x2 + 4x1x3 + 2x2x3.

It can be seen easily that the symmetric bilinear form of q is the bilinear form given in Example 5.6.9. The congruent reduction of q to the normal form is

$$q(X) = x_1^2 - x_2^2 - x_3^2.$$

The matrix of transformation is as above. Using it, one can obtain the ordered basis of R^3 with respect to which the quadratic form is in the reduced form given above: its members are the columns of the matrix of transformation.
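A quick numerical check of the normal form in Example 5.6.36: rescaling the columns of the matrix P of Example 5.6.9 by 1/√2, √2 and 1/2, that is, taking Q = P diag(1/√2, √2, 1/2), should give Q^tAQ = diag(1, −1, −1). The sketch below uses NumPy in floating point, so it compares with `numpy.allclose` rather than exact equality; the name Q is ours.

```python
import numpy as np

A = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
P = np.array([[1, -0.5, -1], [1, 0.5, -2], [0, 0, 1]])       # from Example 5.6.9
Q = P @ np.diag([1 / np.sqrt(2), np.sqrt(2), 0.5])           # rescale the columns of P
print(np.allclose(Q.T @ A @ Q, np.diag([1.0, -1.0, -1.0])))   # True
```

The columns of Q form an ordered basis of R^3 in which q(X) = x_1^2 − x_2^2 − x_3^2.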

Using Corollary 5.5.17 (orthogonal reduction), we get the following proposition.

Proposition 5.6.37 Let q be a quadratic form on a real inner product space V. Then there is an orthonormal ordered basis {u_1, u_2, ..., u_n} of V such that

$$q(v) = \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2,$$

where $v = \sum_{i=1}^n x_iu_i$, and λ_1, λ_2, ..., λ_n are the eigenvalues of the matrix of q with respect to an orthonormal basis of V. ∎

Remark 5.6.38 Corollary 5.5.17 gives an algorithm to find an orthogonal transformation which reduces the given quadratic form to diagonal form. This reduction is called an orthogonal reduction.
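For numerical work, an orthogonal reduction can be sketched with NumPy's `numpy.linalg.eigh`, which returns the eigenvalues of a real symmetric matrix in ascending order together with an orthogonal matrix whose columns are corresponding orthonormal eigenvectors. This is a floating-point stand-in for the procedure of Corollary 5.5.17, not that procedure itself. Below it is applied to the matrix of q(X) = 2x_1x_2 + 4x_1x_3 + 2x_2x_3 from Example 5.6.36.

```python
import numpy as np

A = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)  # q(X) = 2x1x2 + 4x1x3 + 2x2x3
lam, O = np.linalg.eigh(A)       # ascending eigenvalues, orthonormal eigenvectors as columns
print(np.allclose(O.T @ O, np.eye(3)))            # True: O is orthogonal
print(np.allclose(O.T @ A @ O, np.diag(lam)))      # True: q(O X') = sum of lam_i x_i'^2
print(lam)
```

Unlike the congruent reduction of Example 5.6.36, the diagonal entries here are the eigenvalues of A; they need not be ±1, but their signs again give one positive and two negative entries.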

Surfaces in R3 Represented by the Equations of Second Degree A general equation of second degree representing a surface in R3 is given by

f (x, y, z) = ax2 + by2 + cz2 + 2hxy + 2fyz + 2gxz + 2ux + 2vy + 2wz + d = 0.

Let us denote the column vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ by X. Consider the quadratic form q on R^3 given by

q(X) = ax^2 + by^2 + cz^2 + 2hxy + 2fyz + 2gxz.

Then f(x, y, z) = q(X) + 2ux + 2vy + 2wz + d. The matrix A of the quadratic form q is given by

$$A = \begin{bmatrix} a & h & g \\ h & b & f \\ g & f & c \end{bmatrix},$$

and q(X) = X^tAX. Using Corollary 5.5.17, we can find an orthogonal matrix O such that O^tAO = diag(λ_1, λ_2, λ_3). Put X' = O^tX, with coordinates x', y', z'. Then X = OX'. Substituting X = OX', the quadratic form reduces to the form

$$q(OX') = \lambda_1x'^2 + \lambda_2y'^2 + \lambda_3z'^2.$$

Suppose that

$$O = \begin{bmatrix} l_1 & m_1 & n_1 \\ l_2 & m_2 & n_2 \\ l_3 & m_3 & n_3 \end{bmatrix}.$$

The fact that O is orthogonal means that the rows represent the direction cosines of three mutually perpendicular axes, and the columns also represent the direction cosines of three mutually perpendicular axes. Further, f(x, y, z) then reduces to

$$\lambda_1x'^2 + \lambda_2y'^2 + \lambda_3z'^2 + 2u'x' + 2v'y' + 2w'z' + d,$$

where u' = ul_1 + vl_2 + wl_3, v' = um_1 + vm_2 + wm_3, and w' = un_1 + vn_2 + wn_3. If the quadratic form q is nondegenerate, then all λ_i are nonzero, and then, completing the squares, f(x, y, z) reduces to

$$\lambda_1\Big(x' + \frac{u'}{\lambda_1}\Big)^2 + \lambda_2\Big(y' + \frac{v'}{\lambda_2}\Big)^2 + \lambda_3\Big(z' + \frac{w'}{\lambda_3}\Big)^2 + d',$$

where $d' = d - \frac{u'^2}{\lambda_1} - \frac{v'^2}{\lambda_2} - \frac{w'^2}{\lambda_3}$. Substituting $x'' = x' + \frac{u'}{\lambda_1}$, $y'' = y' + \frac{v'}{\lambda_2}$, $z'' = z' + \frac{w'}{\lambda_3}$, the equation reduces to

$$\lambda_1x''^2 + \lambda_2y''^2 + \lambda_3z''^2 = -d'.$$
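In the nondegenerate case the center can also be found without first rotating: writing the equation as X^tAX + 2b^tX + d = 0 with b = (u, v, w)^t, the gradient 2(AX + b) vanishes at the center, so the center solves AX = −b, and the value of the left side there equals b^tX + d, which is the constant d' above. The Python sketch below (the function name, the returned labels and the tolerance `tol` are our own choices) computes the center and d', then classifies the surface by the signs of the eigenvalues and of d', following the nondegenerate cases discussed next; it first multiplies the equation by −1 when most eigenvalues are negative.

```python
import numpy as np

def central_quadric(A, b, d, tol=1e-9):
    """Classify X^t A X + 2 b^t X + d = 0 for a nonsingular symmetric 3 x 3 matrix A.

    Returns (center, d_prime, kind); d_prime is the value of the left side at the
    center, so the reduced equation is lam_1 x''^2 + lam_2 y''^2 + lam_3 z''^2 = -d_prime.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    center = np.linalg.solve(A, -b)          # the gradient 2(AX + b) vanishes here
    d_prime = float(b @ center + d)          # X^t A X + 2 b^t X + d at X = center
    lam = np.linalg.eigvalsh(A)
    s = d_prime
    if np.sum(lam < 0) > np.sum(lam > 0):    # multiply the equation by -1 if needed
        lam, s = -lam, -s
    positive = int(np.sum(lam > 0))          # 3 or 2 after this normalization
    if abs(s) < tol:
        kind = "point" if positive == 3 else "cone"
    elif positive == 3:
        kind = "ellipsoid" if s < 0 else "empty"
    else:
        kind = "hyperboloid of one sheet" if s < 0 else "hyperboloid of two sheets"
    return center, d_prime, kind

# (3/2)x^2 + y^2 + (3/2)z^2 - xz + x - 1 = 0, so b = (1/2, 0, 0) and d = -1.
A = [[1.5, 0, -0.5], [0, 1, 0], [-0.5, 0, 1.5]]
center, d_prime, kind = central_quadric(A, [0.5, 0, 0], -1)
print(center, d_prime, kind)   # center (-3/8, 0, -1/8), d' = -19/16, 'ellipsoid'
```

The sample equation is the one treated in Example 5.6.39. The sketch does not handle the degenerate cases (some λ_i = 0); there `numpy.linalg.solve` raises an error for an exactly singular A.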

If d' = 0, then it represents a cone (degenerating to the single point x'' = y'' = z'' = 0 when all λ_i have the same sign). If all λ_i together with d' are positive, then no such surface exists. Suppose that all λ_i are positive, and d' is negative. Then it represents an ellipsoid with center given by x'' = y'' = z'' = 0, and principal planes x'' = 0, y'' = 0, z'' = 0 (the principal axes are the coordinate axes of the x''y''z''-system). Expressing x'', y'', and z'' in terms of x', y', z', and then, in turn, expressing x', y', z' in terms of x, y, z with the help of the orthogonal transformation O, we get the center, principal axes, and principal planes in terms of the original coordinate system. If two of the λ_i are positive, the other is negative, and d' is negative, then it represents a hyperboloid of one sheet, whose invariants can be obtained as above. If one of them is positive, the other two are negative, and d' is negative, then it represents a hyperboloid of two sheets.

Next, suppose that λ_3 = 0. Then the equation reduces either to λ_1x''^2 + λ_2y''^2 = 2az'' or to λ_1x''^2 + λ_2y''^2 = a. In case 1 it represents an elliptic paraboloid if λ_1 and λ_2 have the same sign, and a hyperbolic paraboloid otherwise. In case 2 it represents an elliptic cylinder or a hyperbolic cylinder. If two eigenvalues are 0, then it reduces to the form x''^2 = 4y'' or to the form x''^2 = a. In case 1 it represents a parabolic cylinder, and in case 2 it represents a pair of parallel planes. We illustrate the above discussion by means of an example.

Example 5.6.39 Consider the second-degree equation

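$$\tfrac{3}{2}x^2 + y^2 + \tfrac{3}{2}z^2 - xz + x - 1 = 0.$$

The matrix of the quadratic form associated to this equation is the matrix

$$A = \begin{bmatrix} \tfrac{3}{2} & 0 & -\tfrac{1}{2} \\ 0 & 1 & 0 \\ -\tfrac{1}{2} & 0 & \tfrac{3}{2} \end{bmatrix}$$

of Example 5.5.19. Its eigenvalues are 1, 1, 2, and an orthogonal matrix O with O^tAO = diag(1, 1, 2) (compare Example 5.5.19) is

$$O = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \end{bmatrix}.$$

If we transform the equation using the orthogonal transformation X = OX', that is, $x = \frac{x'-z'}{\sqrt{2}}$, y = y', $z = \frac{x'+z'}{\sqrt{2}}$, the equation is transformed to

$$x'^2 + y'^2 + 2z'^2 + \tfrac{1}{\sqrt{2}}x' - \tfrac{1}{\sqrt{2}}z' = 1.$$

Completing the squares, it reduces to

$$\Big(x' + \tfrac{1}{2\sqrt{2}}\Big)^2 + y'^2 + 2\Big(z' - \tfrac{1}{4\sqrt{2}}\Big)^2 = \tfrac{19}{16}.$$

This represents an ellipsoid (of revolution, since the eigenvalue 1 is repeated) with semi-axes $\frac{\sqrt{19}}{4}$, $\frac{\sqrt{19}}{4}$, $\frac{\sqrt{38}}{8}$. The center is given by $x' = -\frac{1}{2\sqrt{2}}$, y' = 0, and $z' = \frac{1}{4\sqrt{2}}$. Since X = OX', substituting these values we get that x = −3/8, y = 0, z = −1/8. The principal planes are given by $x' = -\frac{1}{2\sqrt{2}}$, y' = 0, and $z' = \frac{1}{4\sqrt{2}}$. Using the transformation X' = O^tX, we see that the principal planes are $\frac{x+z}{\sqrt{2}} = -\frac{1}{2\sqrt{2}}$, y = 0, and $\frac{-x+z}{\sqrt{2}} = \frac{1}{4\sqrt{2}}$, that is, x + z = −1/2, y = 0, and z − x = 1/4.

Exercises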

5.6.1 Show that f (X, Y) = x1y1 + 2x2y1 + 3x1y2 + x2y3 + 4x3y3 defines a bilinear form on R3. Find its matrix representation with respect to the standard basis, and also with respect to the ordered basis {e1 + e2, e2 + e3, e1 + e3}. Conclude that these two matrices are congruent to each other. Is this bilinear form symmetric? Find its rank. Is this nondegenerate?

5.6.2 Let V = Mn(F) denote the vector space of all n × n matrices with entries in F. Define a map f from V × V to F by f (A, B) = Tr(AtCB), where C is a fixed matrix. Is this symmetric? Find its rank in terms of the matrix C.

5.6.3 Determine which of the following define a bilinear form on R^3. (i) f(X, Y) = x_1^2 + y_1^2 + x_1y_2 + x_3y_3. (ii) f(X, Y) = 2 for all X, Y ∈ R^3. (iii) f(X, Y) = x_1x_2 + y_1y_2 + x_1y_3. (iv) f(X, Y) = x_1y_3 − x_2y_2 + x_3y_2.

5.6.4 Let V be the vector space of 3 × 3 matrices over R and

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 5 \end{bmatrix}.$$

Define a map f from V × V to R by f(X, Y) = Tr(X^tAY). Show that f is a bilinear form. Is it a symmetric bilinear form? (Observe that A is symmetric.) Find the matrix of f relative to the ordered basis {e_11, e_12, e_13, e_21, e_22, e_23, e_31, e_32, e_33}, where e_ij is the matrix whose ith row jth column entry is 1 and the rest of the entries are 0. Find its rank.

5.6.5 Let V be as above. Define a map f from V × V to R by f (A, B) = Tr(AB) + Tr(A)Tr(B). Show that it is a symmetric bilinear form. Find its rank and signature.

5.6.6 Show that a bilinear form on V is product of linear functionals if and only if it is of rank 1.

5.6.7 Let f be a nondegenerate bilinear form on a finite-dimensional vector space V, and g a bilinear form on V. Show that there exists a unique linear transformation T_1 on V such that g(v, w) = f(T_1(v), w), and also there exists a unique linear transformation T_2 on V such that g(v, w) = f(v, T_2(w)).

5.6.8 Reduce the following symmetric bilinear forms on C3 congruently to the normal form, and find the matrices of transformations. (i) f (X, Y) = x1y2 + ix1y3 + x2y1 + ix3y1. (ii) g(X, Y) = (1 + i)x1y1 + x1y3 + x3y1 + ix2y3 + ix3y2. (iii) h(X, Y) = x1y3 + x2y3 + x3y1 + x3y2.

5.6.9 Check if the following pairs of bilinear forms on C^3 are congruent. (i) (f, g). (ii) (g, h). (iii) (f, h), where f, g, h are the bilinear forms defined in Exercise 5.6.8.

5.6.10 Reduce the following complex symmetric matrix congruently to a matrix in normal form.

$$\begin{bmatrix} 1 & 2 & i \\ 2 & 4 & 7 \\ i & 7 & -i \end{bmatrix}.$$

5.6.11 Reduce the following symmetric bilinear forms over R^3 congruently to normal form. Find the matrix of transformation in each case, and also the ranks and signatures: (i) f(X, Y) = 2x_1y_1 + 3x_1y_2 + x_1y_3 + 3x_2y_1 + x_2y_2 − x_2y_3 + x_3y_1 − x_3y_2. (ii) g(X, Y) = x_1y_2 + x_2y_3 + x_2y_1 + x_3y_2. (iii) h(X, Y) = x_1y_3 − x_2y_2 + x_3y_1.

5.6.12 Check if the following pairs of real symmetric bilinear forms are congruent. (i) (f, g). (ii) (g, h). (iii) (f, h), where f, g, h are as in the above exercise.

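5.6.13 Find the ranks and signatures of the following matrices by congruently reducing them into the standard canonical forms.

(i) $$A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & 4 \\ 2 & 4 & 6 \end{bmatrix}$$

(ii) $$B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}$$

(iii) $$C = \begin{bmatrix} 3 & 4 & 5 \\ 4 & 5 & 6 \\ 5 & 6 & 7 \end{bmatrix}$$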

5.6.14 Determine which of the following pairs of matrices are congruent. (i) (A, B). (ii) (A, C). (iii) (B, C), where A, B, C are as in the above exercise.

5.6.15 Can we have a nondegenerate skew-symmetric bilinear form on a complex vector space of dimension 3? Justify your answer.

5.6.16 Find the number of congruence classes of skew-symmetric bilinear forms on a real vector space of dimension 3.

5.6.17 Show that any two nondegenerate skew-symmetric 2n × 2n real matrices are congruent.

5.6.18 Reduce, orthogonally, the following quadratic forms into standard canonical form. (i) y^2 + z^2 + yz + zx − xy. (ii) 4x^2 + 3y^2 + 2z^2 + 4yz − 4xy. (iii) xy + yz + zx.

5.6.19 Find the eigenvalues of the real symmetric matrix A, and also an orthogonal matrix O such that O^tAO is diagonal, where

$$A = \begin{bmatrix} 2 & -2 & 5 \\ -2 & -1 & 10 \\ 5 & 10 & -22 \end{bmatrix}.$$

5.6.20 Reduce the following surfaces into standard form, find their nature, and also their invariants such as center, principal axes, and principal planes.

(i) y^2 + z^2 + yz + zx − xy − 2x + 2y − 2z + 1 = 0. (ii) 4x^2 + 3y^2 + 2z^2 + 4yz − 4xy − 4x − 6y − 8z − 6 = 0. (iii) 3x^2 + 6y^2 + 11z^2 + 8yz + 10zx + x − y + z − 4 = 0.

5.6.21 Let f be a nondegenerate bilinear form on a vector space V. Let O(f) denote the set of all linear transformations T which preserve f in the sense that f(T(x), T(y)) = f(x, y). Show that O(f) is a group.

5.6.22 The bilinear form ⟨ , ⟩_L on R^{n+1} defined by

⟨x, y⟩_L = x_1y_1 + x_2y_2 + ··· + x_ny_n − x_{n+1}y_{n+1}

is called the Lorentz inner product. Show that ⟨ , ⟩_L is a symmetric nondegenerate bilinear form of signature n − 1. The transformations preserving the Lorentz inner product are called the Lorentz transformations. Show that the set O(n, 1) of all Lorentz transformations forms a group under composition of maps. This group is called the Lorentz group.

5.6.23 A set {u_1, u_2, ..., u_r} is called a Lorentz orthonormal set if ⟨u_i, u_j⟩_L = 0 for i ≠ j, and ⟨u_i, u_i⟩_L = ±1. Use Sylvester's law to show that there is at most one i such that ⟨u_i, u_i⟩_L = −1.

5.6.24 Show that every Lorentz orthonormal set is linearly independent. Show also that any Lorentz orthonormal set can be enlarged to a Lorentz orthonormal basis.

5.6.25 An (n + 1) × (n + 1) matrix A is called a Lorentz matrix if AJA^t = J, where J = Diag(1, 1, ..., 1, −1). Show that a linear transformation is a Lorentz transformation if and only if its matrix representation with respect to any Lorentz orthonormal basis is a Lorentz matrix.

5.6.26 Show that the determinant of a Lorentz matrix is ±1.

5.6.27 Call a Lorentz matrix A a positive Lorentz matrix if ⟨Ae_{n+1}, e_{n+1}⟩ > 0. Show that the set PSO(n, 1) of positive Lorentz matrices of determinant 1 is a group under the product of matrices. This group is called the special positive Lorentz group.

5.6.28 Let A ∈ PSO(1, 1). Show that there is a unique x ∈ R such that

$$A = \begin{bmatrix} \cosh x & \sinh x \\ \sinh x & \cosh x \end{bmatrix}.$$

Describe the group PSO(1, 1).

5.6.29 Try to describe the geometry of Lorentz transformations. More explicitly, show that any matrix A ∈ PSO(n, 1) is similar to a matrix of the form

$$\begin{bmatrix} B & 0_{(n-1)\times 2} \\ 0_{2\times(n-1)} & C \end{bmatrix},$$

where B ∈ SO(n − 1) and C ∈ PSO(1, 1). Compare with the geometry of orthogonal transformations.

Chapter 6 Canonical Forms, Jordan and Rational Forms

In the previous chapter, we studied congruence classes of matrices over some special types of fields. This chapter is devoted to describing similarity classes of matrices with entries in some special types of fields. For this purpose, we first introduce the concept of a module over a ring, and obtain the structure theory of modules over a principal ideal domain. The reader is referred to Algebra 1 for the definition and some basic properties of rings.

6.1 Concept of a Module over a Ring

A module over a ring R is a structure obtained by replacing the field F in the definition of a vector space over F by a ring R. Thus:

Definition 6.1.1 Let R be a ring with identity 1. A left R-module is an abelian group (M, +) together with a map · from R × M to M (the image of (a, x) under · is denoted by a · x) such that
(i) (a + b) · x = a · x + b · x,
(ii) a · (x + y) = a · x + a · y,
(iii) (ab) · x = a · (b · x),
(iv) 1 · x = x
for all a, b ∈ R and x, y ∈ M. In a similar manner we can define right modules. We also say that M is a left (right) R-module, or that M is a left (right) module over R.

Remark 6.1.2 If a left R-module structure on M is such that (ab) · x = (ba) · x for all a, b ∈ R and x ∈ M, then this left R-module M can also be viewed as a right R-module by defining x · a = a · x. In particular, if R is a commutative ring, then every left R-module can also be considered as a right R-module. In this case we simply say that M is an R-module.

© Springer Nature Singapore Pte Ltd. 2017
R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_6

A module over a field F is simply a vector space over F. We shall develop the theory of left modules; the theory of right modules can be developed along the same lines. Let M be a left R-module, and let a ∈ R. Define a map $f_a$ from M to M by $f_a(x) = a \cdot x$. Since a · (x + y) = a · x + a · y for all x, y ∈ M, $f_a \in$ End(M, +). Thus, we have a map f from R to End(M, +) given by $f(a) = f_a$. Since

$$f_{a+b}(x) = (a+b)\cdot x = a\cdot x + b\cdot x = f_a(x) + f_b(x) = (f_a + f_b)(x),$$ and

$$f_{ab}(x) = (ab)\cdot x = a\cdot(b\cdot x) = f_a(f_b(x)) = (f_a \circ f_b)(x)$$ for all a, b ∈ R and x ∈ M, we see that

f(a + b) = f(a) + f(b), and f(ab) = f(a) ∘ f(b) for all a, b ∈ R. Also

$f(1)(x) = f_1(x) = 1 \cdot x = x = I_M(x)$ for all x ∈ M. Hence $f(1) = I_M$, the identity of the ring End(M, +). Thus, every left R-module M gives rise to a ring homomorphism f from R to End(M, +) defined by f(a)(x) = a · x. Conversely, given any ring homomorphism f from R to End(M, +) (a ring homomorphism is assumed to preserve the identity), (M, +) becomes a left R-module with respect to the external product · given by a · x = f(a)(x), and this, in turn, induces the same homomorphism f. Thus, a left R-module can be viewed as a representation of R in the ring of endomorphisms of an abelian group. Similarly, a right R-module determines and is uniquely determined by an anti-homomorphism f from R to End(M, +), in the sense that f(ab) = f(b) ∘ f(a) for all a, b ∈ R. Let M be a left R-module and a ∈ R. Then the map $f_a$ is a group homomorphism from (M, +) to itself. Thus,

$a \cdot 0 = f_a(0) = 0$, and $a \cdot (-x) = f_a(-x) = -f_a(x) = -(a \cdot x)$ for all a ∈ R and x ∈ M. Also, since the map f : R → End(M, +) defined by $f(a) = f_a$ is a homomorphism, $f(0) = f_0$ is the zero map. Thus, we have

$0 \cdot x = f_0(x) = 0$, and

$(-a) \cdot x = f_{-a}(x) = -f_a(x) = -(a \cdot x)$ for all a ∈ R and x ∈ M.
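The correspondence between left R-modules and ring homomorphisms R → End(M, +) can be checked by brute force on a small example. The sketch below assumes R = Z acting on M = Z6 (a choice made only for illustration), stores each endomorphism $f_a$ as the tuple of its values, and verifies f(a + b) = f(a) + f(b), f(ab) = f(a) ∘ f(b), f(1) = I_M, and the identities a · 0 = 0, 0 · x = 0, (−a) · x = −(a · x) for a finite range of integers a. The helper names are hypothetical.

```python
n = 6
M = range(n)                  # the elements 0, 1, ..., 5 of Z_6

def act(a, x):
    """Scalar action of an integer a on x in Z_n."""
    return (a * x) % n

def f(a):
    """The endomorphism f_a of (Z_n, +), stored as the tuple of its values."""
    return tuple(act(a, x) for x in M)

def add_maps(g, h):
    """Pointwise sum g + h in End(Z_n, +)."""
    return tuple((g[x] + h[x]) % n for x in M)

def compose(g, h):
    """Composition g o h in End(Z_n, +)."""
    return tuple(g[h[x]] for x in M)

scalars = range(-7, 8)
assert all(f(a + b) == add_maps(f(a), f(b)) for a in scalars for b in scalars)
assert all(f(a * b) == compose(f(a), f(b)) for a in scalars for b in scalars)
assert f(1) == tuple(M)                                   # f(1) = I_M
assert all(act(a, 0) == 0 for a in scalars)               # a . 0 = 0
assert all(act(0, x) == 0 for x in M)                     # 0 . x = 0
assert all(act(-a, x) == (-act(a, x)) % n for a in scalars for x in M)
print("all identities hold on Z_6")
```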

Remark 6.1.3 Unlike in vector spaces, α · x = 0 need not imply that α = 0 or x = 0. If this implication holds in a module, then we say that the module is a torsion-free module. We say that it is a torsion module if for each x ≠ 0 there is an element a ≠ 0 in R such that ax = 0.

Example 6.1.4 Every abelian group is naturally a Z-module. It is a torsion-free module if and only if every nonzero element is of infinite order, and it is a torsion module if and only if all elements are of finite order.

Example 6.1.5 Let R be a ring with identity. Then (R, +) is a left R-module, where · is the ring multiplication.

Example 6.1.6 Let R be a ring with identity. Let $R^n$ denote the Cartesian product of n copies of R. Thus, $R^n = \{(a_1, a_2, \ldots, a_n) \mid a_i \in R\}$. Clearly, $R^n$ is an abelian group with respect to coordinate-wise addition. Define the external operation · from $R \times R^n$ to $R^n$ by

a · (a1, a2,...,an) = (a · a1, a · a2,...,a · an).

Then $(R^n, +)$ is a left R-module. It can also be made into a right R-module.

Definition 6.1.7 Let M be a left R-module. A subset N of M is called a submodule of M if (i) N is a subgroup of (M, +), and (ii) the map · from R × M to M induces a map from R × N to N; in other words, a · x ∈ N for all a ∈ R and x ∈ N.

Thus, a submodule is a module in its own right.

Remark 6.1.8 If we consider a ring R with identity as a left (right) module over itself, then its submodules are, by definition, the left (right) ideals of R.

The proofs of the following propositions are imitations of the corresponding results in vector spaces.

Proposition 6.1.9 Let M be a left R-module. Then a nonempty subset N of M is a submodule if and only if ax + by ∈ N for all a, b ∈ R and x, y ∈ N. □

Proposition 6.1.10 The intersection of a family of submodules is a submodule. □

Proposition 6.1.11 Let $N_1$ and $N_2$ be submodules of an R-module M. Then $N_1 \cup N_2$ is a submodule if and only if $N_1 \subseteq N_2$ or $N_2 \subseteq N_1$. □

Proposition 6.1.12 Let $N_1$ and $N_2$ be submodules of a left R-module. Then $N_1 + N_2 = \{x + y \mid x \in N_1,\ y \in N_2\}$ is also a submodule (called the sum of $N_1$ and $N_2$), and it is the submodule generated by $N_1 \cup N_2$. □

Proposition 6.1.13 The union of a chain of submodules is a submodule. □

Let M be a left R-module, and S a subset of M. Then the smallest submodule of M containing S exists; it is the intersection of all submodules of M containing S. This submodule is called the submodule generated by S, or the submodule spanned by S, and it is denoted by <S>. If <S> = M, then we say that S generates M, or that S is a set of generators of M. A module M is said to be finitely generated if it has a finite set of generators. Note that <∅> = {0}.
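For the Z-module Zn, the submodule <S> can be computed directly, and it agrees with a standard fact about cyclic groups: <S> is the cyclic subgroup generated by gcd(n, S). The sketch below (Python; the helper names span and multiples, and the choice n = 12, are ours and only illustrative) computes <S> as the set of all Z-linear combinations of members of S, in the spirit of Proposition 6.1.15.

```python
from functools import reduce
from math import gcd

def span(S, n):
    """<S> in the Z-module Z_n: the closure of {0} under adding elements of S.
    In the finite group Z_n, -s = (n - 1)s, so closure under addition is enough."""
    generated, frontier = {0}, [0]
    while frontier:
        y = frontier.pop()
        for s in S:
            z = (y + s) % n
            if z not in generated:
                generated.add(z)
                frontier.append(z)
    return generated

def multiples(g, n):
    """The cyclic submodule of Z_n generated by g."""
    return {(k * g) % n for k in range(n)}

S = {4, 6}
print(sorted(span(S, 12)))                                # [0, 2, 4, 6, 8, 10]
print(span(S, 12) == multiples(reduce(gcd, S, 12), 12))  # True
```

With S = ∅ the closure is {0}, matching <∅> = {0}.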

Definition 6.1.14 Let S be a nonempty subset of a left R-module M. An element x ∈ M is called a linear combination of members of S if

x = a1x1 + a2x2 + ··· + anxn for some a1, a2,...,an ∈ R and x1, x2,...,xn ∈ S.

Proposition 6.1.15 Let S be a nonempty subset of a left R-module M. Then <S> is the set of all linear combinations of members of S.

Proof Imitate the proof of the corresponding result in the vector space case. □

A subset S of a left R-module M is called independent if (i) 0 ∉ S, and (ii) given any finite subset {x1, x2, ..., xn} of S with xi ≠ xj for i ≠ j,

a1x1 + a2x2 +···+anxn = 0 implies that aixi = 0 for all i.

A subset which is not independent is called a dependent set.

Definition 6.1.16 A subset S of a module M is called linearly independent if, given any finite subset {x1, x2, ..., xn} of S with xi ≠ xj for i ≠ j,

a1x1 + a2x2 +···+anxn = 0 implies that ai = 0 for all i.

A subset S which is not linearly independent is called a linearly dependent subset.

Clearly, a subset of a linearly independent (independent) set is always linearly independent (independent).

Proposition 6.1.17 Every linearly independent subset is independent.

Proof Let S be a linearly independent subset. Then 0 ∉ S, for otherwise 1 · 0 = 0 with 1 ≠ 0. Next, since ai = 0 for all i implies that aixi = 0 for all i, the result follows. □

Remark 6.1.18 An independent subset need not be a linearly independent subset. For example, consider the Z-module Z6. Then {2, 3} is independent but not linearly independent (verify; note that 3 · 2 = 0 in Z6).
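The "(verify)" in Remark 6.1.18 is a finite check. The Python sketch below (our own helper names) tests independence in the Z-module Zn exhaustively, using that a · x mod n depends only on a mod n, and searches a bounded box of integer coefficients for a witness of linear dependence. A returned witness proves dependence; a result of None would prove nothing, since only a bounded box is searched.

```python
from itertools import product

def is_independent(S, n):
    """Independence of a list S of distinct elements of the Z-module Z_n:
    0 is not in S, and a_1 x_1 + ... + a_k x_k = 0 forces every a_i x_i = 0.
    Since a * x mod n depends only on a mod n, testing a_i in range(n) suffices."""
    if 0 in S:
        return False
    for a in product(range(n), repeat=len(S)):
        total = sum(ai * xi for ai, xi in zip(a, S)) % n
        if total == 0 and any((ai * xi) % n != 0 for ai, xi in zip(a, S)):
            return False
    return True

def linear_dependence_witness(S, n, bound):
    """Search integer coefficients in [-bound, bound], not all zero, with
    a_1 x_1 + ... + a_k x_k = 0 in Z_n; return the first found, or None."""
    for a in product(range(-bound, bound + 1), repeat=len(S)):
        if any(a) and sum(ai * xi for ai, xi in zip(a, S)) % n == 0:
            return a
    return None

print(is_independent([2, 3], 6))                  # True
print(linear_dependence_witness([2, 3], 6, 3))    # (-3, -2)
```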

Since in a vector space aixi = 0 and xi ≠ 0 imply that ai = 0, we have the following proposition.

Proposition 6.1.19 A subset S of a vector space V is linearly independent if and only if it is independent. □

Proposition 6.1.20 The union of a chain of linearly independent (independent) subsets is linearly independent (independent).

Proof Let $\{S_\alpha \mid \alpha \in \Lambda\}$ be a chain of linearly independent (independent) subsets. Then $0 \notin S_\alpha$ for all α, and hence $0 \notin \bigcup_{\alpha\in\Lambda} S_\alpha$. Let $x_1, x_2, \ldots, x_n$ be distinct elements of $\bigcup_{\alpha\in\Lambda} S_\alpha$, and suppose that $x_i \in S_{\alpha_i}$. Since $\{S_\alpha \mid \alpha \in \Lambda\}$ is a chain, there exists $\alpha_r$ such that $S_{\alpha_i} \subseteq S_{\alpha_r}$ for all i. Thus, $x_1, x_2, \ldots, x_n$ all belong to $S_{\alpha_r}$. Since $S_{\alpha_r}$ is linearly independent (independent),

a1x1 + a2x2 +···+anxn = 0 implies that ai = 0(aixi = 0) for all i.

This shows that the union is linearly independent (independent). □

Proposition 6.1.21 Every linearly independent (independent) subset can be embedded in a maximal linearly independent (independent) subset.

Proof Let S be a linearly independent (independent) subset of a module M. Let X be the set of all linearly independent (independent) subsets of M which contain S. Then X ≠ ∅, for S ∈ X. Thus, (X, ⊆) is a nonempty partially ordered set. From the previous proposition, it follows that every chain in X has an upper bound in X (namely, its union). By Zorn's lemma, X has a maximal element T (say). Clearly, T is a maximal linearly independent (independent) subset of M containing S. □

Remark 6.1.22 A maximal linearly independent subset of a module may be far from being a set of generators. For example, the additive group (Q, +) is a Z-module. Every singleton {α} with α ≠ 0 is a maximal linearly independent subset: given α = m/n ≠ 0 and any β = p/q, we have (pn)α − (qm)β = 0 with qm ≠ 0. We also know that (Q, +) has no finite set of generators. However, we have the following proposition.
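The computation in Remark 6.1.22 can be mirrored with Python's standard fractions module, which keeps rationals in lowest terms and exposes numerator and denominator attributes. The helper below is our own illustration: for α = m/n ≠ 0 and any β = p/q it returns the integer coefficients pn and −qm, the second of which is nonzero, so no two-element subset {α, β} of Q is linearly independent over Z.

```python
from fractions import Fraction

def dependence_coefficients(alpha, beta):
    """For alpha = m/n != 0 and beta = p/q in Q, return the integers (pn, -qm),
    which are not both zero and satisfy (pn) alpha + (-qm) beta = 0."""
    m, n = alpha.numerator, alpha.denominator
    p, q = beta.numerator, beta.denominator
    return p * n, -q * m

alpha, beta = Fraction(3, 4), Fraction(-5, 6)
c, d = dependence_coefficients(alpha, beta)
print(c, d, c * alpha + d * beta)    # -20 -18 0
```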

Proposition 6.1.23 Let S be a maximal independent subset of a left R-module M, and let x ∈ M. Then there exists α ∈ R − {0} such that αx ∈ <S>.

Proof If x ∈ S, then 1x = x ∈ S ⊆ <S>. If x = 0, then 1 · 0 = 0 ∈ <S>. Suppose that x ≠ 0 and x ∉ S. Since S is a maximal independent subset, S ∪ {x} is dependent. Thus, there exist α, α1, ..., αn in R, and x1, x2, ..., xn in S with xi ≠ xj for i ≠ j, such that

αx + α1x1 + α2x2 + ··· + αnxn = 0, where not all of αx, α1x1, α2x2, ..., αnxn are zero. If αx = 0, then

α1x1 + α2x2 + ··· + αnxn = 0, where not all of α1x1, α2x2, ..., αnxn are 0. This contradicts the supposition that S is independent. Thus, αx ≠ 0, and

αx = −α1x1 − α2x2 − ··· − αnxn belongs to <S>. Since αx ≠ 0, α ≠ 0. □

Proposition 6.1.24 Let M be a left R-module and S a set of generators for M. If T is a subset of M such that S is properly contained in T, then T is dependent.

Proof If 0 ∈ T, then there is nothing to do. Let x ∈ T − S, x ≠ 0. Since <S> = M, there exist x1, x2, ..., xn ∈ S with xi ≠ xj for i ≠ j, and α1, α2, ..., αn ∈ R, such that x = α1x1 + α2x2 + ··· + αnxn. But then

1x + (−α1)x1 + (−α2)x2 + ··· + (−αn)xn = 0, where 1x ≠ 0. Since x, x1, ..., xn are distinct elements of T, T is dependent. □

Definition 6.1.25 Let M be a left R-module. A subset S of M is called a minimal or irreducible set of generators of M if <S> = M, and no proper subset of S generates M.

Remark 6.1.26 A set of generators need not contain any minimal set of generators. In fact, a module need not have any minimal set of generators. For example, the Z-module (Q, +) does not have any minimal set of generators. However, we have already seen that this is true in vector spaces.

Direct Sum of Modules

Let M1, M2, ..., Mr be left R-modules. Then M = M1 × M2 × ··· × Mr is an abelian group with respect to coordinate-wise addition. If we define the external multiplication · by α · (x1, x2, ..., xr) = (αx1, αx2, ..., αxr), then M becomes a left R-module. This is called the external direct sum of M1, M2, ..., Mr. A module M is said to be the direct sum (internal direct sum) of its submodules M1, M2, ..., Mr if every element x of M has a unique representation as

x = x1 + x2 + ··· + xr, where xi ∈ Mi for all i. The notation

M = M1 ⊕ M2 ⊕ ··· ⊕ Mr indicates that M is the direct sum of its submodules M1, M2, ..., Mr.

Proposition 6.1.27 Let M1, M2, ..., Mr be submodules of a module M. Then the following conditions are equivalent.
(1) $M = M_1 \oplus M_2 \oplus \cdots \oplus M_r$.
(2) (i) $M = M_1 + M_2 + \cdots + M_r$, and (ii) $M_i \cap M^i = \{0\}$ for all i, where $M^i$ denotes the submodule $M_1 + \cdots + M_{i-1} + M_{i+1} + \cdots + M_r$.

Proof (1) ⟹ (2). Assume (1). Since every element x of M has a unique representation as $x = x_1 + x_2 + \cdots + x_r$, 2(i) is evident. Let $x \in M_i \cap M^i$. Then $x = x_1 + \cdots + x_{i-1} + x_{i+1} + \cdots + x_r$, where $x_j \in M_j$ for $j \neq i$. Thus,

$$0 = x_1 + \cdots + x_{i-1} + (-x) + x_{i+1} + \cdots + x_r = 0 + 0 + \cdots + 0.$$

From the uniqueness of the representation of 0, it follows that each $x_j$ is zero, and so x is also 0.
(2) ⟹ (1). Assume (2). Since every element of M is a sum of elements of M1, M2, ..., Mr, every element x of M has a representation $x = x_1 + x_2 + \cdots + x_r$ with $x_i \in M_i$ for all i. Now, we prove the uniqueness of the representation. Suppose that

$$x = x_1 + x_2 + \cdots + x_r = y_1 + y_2 + \cdots + y_r,$$

where $x_i, y_i \in M_i$ for all i. Then $x_i - y_i = \sum_{j \neq i} (y_j - x_j) \in M_i \cap M^i = \{0\}$. This shows that $x_i = y_i$ for all i. □
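Proposition 6.1.27 can be tested exhaustively on small examples. The following Python sketch assumes the Z-module Zn with submodules given as sets of residues (helper names are hypothetical); it compares the definition (unique representation) with condition (2). In Z12 the submodules <2> and <3> sum to the whole module but intersect in {0, 6}, so both tests fail, as the proposition predicts.

```python
from itertools import product

def representations(parts, n):
    """Map each x in Z_n to the list of tuples (x_1, ..., x_r), x_i in parts[i],
    with x_1 + ... + x_r = x."""
    reps = {x: [] for x in range(n)}
    for combo in product(*parts):
        reps[sum(combo) % n].append(combo)
    return reps

def is_direct_sum(parts, n):
    """Definition: every x in Z_n has exactly one representation."""
    return all(len(r) == 1 for r in representations(parts, n).values())

def condition_2(parts, n):
    """(i) the parts sum to Z_n, and (ii) each M_i meets the sum M^i of the others only in 0."""
    if any(len(r) == 0 for r in representations(parts, n).values()):
        return False
    for i, Mi in enumerate(parts):
        others = [P for j, P in enumerate(parts) if j != i]
        Mi_complement = {sum(c) % n for c in product(*others)}
        if set(Mi) & Mi_complement != {0}:
            return False
    return True

print(is_direct_sum([{0, 3}, {0, 2, 4}], 6), condition_2([{0, 3}, {0, 2, 4}], 6))  # True True
evens, threes = set(range(0, 12, 2)), set(range(0, 12, 3))
print(is_direct_sum([evens, threes], 12), condition_2([evens, threes], 12))          # False False
```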

Remark 6.1.28 If M is the direct sum of M1, M2, ..., Mr, then $M^i$, as defined in the above proposition, is the direct sum of $M_1, \ldots, M_{i-1}, M_{i+1}, \ldots, M_r$.

Quotient Modules

Let M be a left R-module, and N a submodule of M. Then N is a subgroup of (M, +). Consider the quotient group

M/N ={x + N | x ∈ M}.

Here the coset x + N ={x + n | n ∈ N}, and the addition is defined by

(x + N) + (y + N) = (x + y) + N.

Define the external multiplication ·:R × M/N −→ M/N by

α · (x + N) = (α · x) + N.

This is well defined: if x + N = x′ + N, then x − x′ ∈ N, and so αx − αx′ = α(x − x′) ∈ N. With this operation, M/N is a left R-module, called the quotient module of M modulo N.

Remark 6.1.29 In general, a submodule of a finitely generated module need not be finitely generated. For example, the polynomial ring Z[X1, X2, ...] over Z in infinitely many indeterminates is a module over itself, generated by the identity of the ring, whereas the submodule generated by the set of all indeterminates is not finitely generated (verify).

Module Homomorphisms, Isomorphisms

Let M1 and M2 be left R-modules over a ring R. A map f from M1 to M2 is called an R-module homomorphism if f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M1. A bijective homomorphism is called an isomorphism. The proofs of the following results are imitations of the proofs of the corresponding results in vector space theory, and are left as simple exercises.

Proposition 6.1.30 Let f be a homomorphism from a left R-module M1 to a left R-module M2. Then (i) f(0) = 0, (ii) f(−x) = −f(x) for all x ∈ M1, (iii) the image of a submodule of M1 under the map f is a submodule of M2, and (iv) the inverse image of a submodule of M2 under the map f is a submodule of M1. □

In particular, $f^{-1}(\{0\})$ is a submodule of M1, called the kernel of the homomorphism and denoted by ker f.

Proposition 6.1.31 A homomorphism f is injective if and only if ker f = {0}. □

Theorem 6.1.32 (Fundamental theorem of homomorphism) Let f be a homomorphism from a left R-module M1 to a left R-module M2, let N1 be a submodule of M1, and let ν : M1 → M1/N1 be the quotient map. Then there exists a homomorphism $\bar f$ from M1/N1 to M2 such that $\bar f \circ \nu = f$ if and only if N1 ⊆ ker f. Also, if such a homomorphism exists, then it is unique. Further, $\bar f$ is injective if and only if N1 = ker f. Finally, $\bar f$ is an isomorphism if and only if f is surjective and N1 = ker f. □

Theorem 6.1.33 (Noether isomorphism theorem) Let N1 and N2 be submodules of a left R-module M. Then (N1 + N2)/N2 is isomorphic to N1/(N1 ∩ N2). □
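The map behind Theorem 6.1.33 is x ↦ x + N2 from N1 to (N1 + N2)/N2: it is surjective with kernel N1 ∩ N2, and Theorem 6.1.32 then yields the isomorphism. The Python sketch below (hypothetical helper names; the ambient module Zn and its cyclic submodules are illustrative choices) checks both properties exhaustively, representing cosets as frozensets.

```python
def cyclic_submodule(g, n):
    """The submodule <g> of the Z-module Z_n."""
    return {(k * g) % n for k in range(n)}

def coset(x, N, n):
    """The coset x + N in Z_n, as a frozenset."""
    return frozenset((x + y) % n for y in N)

def noether_check(n, a, b):
    """For N1 = <a>, N2 = <b> in Z_n, check that x -> x + N2 maps N1 onto
    (N1 + N2)/N2 and that its kernel is N1 & N2."""
    N1, N2 = cyclic_submodule(a, n), cyclic_submodule(b, n)
    N1_plus_N2 = {(x + y) % n for x in N1 for y in N2}
    target = {coset(s, N2, n) for s in N1_plus_N2}      # (N1 + N2)/N2
    image = {coset(x, N2, n) for x in N1}               # image of N1
    kernel = {x for x in N1 if coset(x, N2, n) == frozenset(N2)}
    return image == target and kernel == N1 & N2

print(all(noether_check(12, a, b) for a in range(12) for b in range(12)))  # True
```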

Proposition 6.1.34 Let M1 and M2 be left R-modules and f a homomorphism from M1 to M2. Then (i) f is surjective if and only if it takes a set of generators to a set of generators, and (ii) f is injective if and only if it takes an independent set to an independent set. □

6.2 Modules over P.I.D

In this section, we shall be mainly interested in finitely generated modules. Let M be a finitely generated R-module. We say that a finite subset S ={x1, x2,...,xn} is a basis of M if every element x ∈ M can be written uniquely as

x = a1x1 + a2x2 + ··· + anxn.

This amounts to saying that S generates M and S is linearly independent. An R-module M is said to be a free module over R if it has a basis. Thus, every vector space V over a field F is a free F-module. This is not true for modules over an arbitrary ring. For example, Z2 is a Z4-module but not a free Z4-module: 2 · x = 0 for every x ∈ Z2 while 2 ≠ 0 in Z4, so Z2 has no nonempty linearly independent subset.
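The claim that Z2 is not a free Z4-module reduces to a finite check. In this Python sketch (our own illustration), Z4 acts on Z2 by reducing a modulo 2, and the code confirms that no element of Z2 can belong to a basis, since a basis element x would need a · x = 0 to force a = 0. A counting argument gives the same conclusion: a free Z4-module with a basis of k elements has 4^k elements, and 2 is not a power of 4.

```python
Z4 = range(4)
Z2 = range(2)

def act(a, x):
    """Z_4 acts on Z_2 by a . x = (a * x) mod 2; well defined because 2 divides 4."""
    return (a * x) % 2

# A basis element x must satisfy: a . x = 0 in Z_2 forces a = 0 in Z_4.
candidates = [x for x in Z2 if all(a == 0 for a in Z4 if act(a, x) == 0)]
print(candidates)   # []
```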

Proposition 6.2.1 Let M be a free R-module with S as a basis. Then every map f from S to an R-module N can be extended uniquely to a homomorphism from M to N.

Proof Let S = {x1, x2, ..., xn} be a basis of M, and f a map from S to N. Since every element of M can be written uniquely as a1x1 + a2x2 + ··· + anxn, f can be extended to a map $\bar f$ from M to N defined by

$$\bar f(a_1x_1 + a_2x_2 + \cdots + a_nx_n) = a_1f(x_1) + a_2f(x_2) + \cdots + a_nf(x_n).$$

Clearly, $\bar f$ is a homomorphism which extends f. Moreover, any homomorphism extending f must be given by this formula, so $\bar f$ is unique. □

Corollary 6.2.2 An R-module M is free with a basis containing n elements if and only if it is isomorphic to $R^n$.

Proof The R-module $R^n$ has {e1, e2, ..., en} as a basis, where ei is the row with n columns whose ith entry is 1 and whose remaining entries are 0 (this basis is called the standard basis). Thus, $R^n$ is a free R-module. Since an isomorphism takes a basis to a basis, any isomorphic image of $R^n$ is a free R-module with a basis containing n elements. Conversely, if M is a free R-module with a basis {x1, x2, ..., xn}, then the map which takes xi to ei extends to an isomorphism from M to $R^n$. □

Remark 6.2.3 Unlike in the case of vector spaces, for an arbitrary ring R, $R^n$ being isomorphic to $R^m$ does not imply that n = m. For example, if R is the ring of endomorphisms of an infinite-dimensional vector space V, then $R^2$ is isomorphic to R (verify). However, if R is a P.I.D., then $R^n$ isomorphic to $R^m$ implies that n = m. The proof of this fact will follow soon.

Proposition 6.2.4 Let M be an R-module, and N a submodule such that M/N is free. Then M is isomorphic to N ⊕ M/N.

Proof Let T = {x1 + N, x2 + N, ..., xr + N} be a basis of M/N. If a1x1 + a2x2 + ··· + arxr = 0, then a1(x1 + N) + a2(x2 + N) + ··· + ar(xr + N) = N (the zero of M/N), and since T is a basis of M/N, it follows that ai = 0 for all i. Thus, S = {x1, x2, ..., xr} is a linearly independent subset of M. Let L be the submodule of M generated by S. Then L is free with S as a basis, and the map xi ↦ xi + N extends to an isomorphism from L to M/N. Let x ∈ M. Then x + N = a1(x1 + N) + a2(x2 + N) + ··· + ar(xr + N) for some a1, a2, ..., ar ∈ R. But then x − (a1x1 + a2x2 + ··· + arxr) belongs to N. Hence M = N + L. Next, suppose that y + z = 0, where y ∈ N and z ∈ L. Then z + N = z + y + N = N, and since z ↦ z + N is an isomorphism from L to M/N, it follows that z = 0. In turn, y = 0. This shows that M = N ⊕ L ≈ N ⊕ M/N. □

Proposition 6.2.5 If L and N are free submodules of M such that M = L ⊕ N, then M is a free module.

Proof Suppose that S1 is a basis of L and S2 is a basis of N. Then S1 ∪ S2 is linearly independent, and it also generates M (verify). □

Recall that a commutative ring R is a principal ideal domain (P.I.D.) if it is an integral domain and every ideal of R is of the form Ra for some a ∈ R. Note that Ra = Rb if and only if a and b differ by a unit in R. For further details, see Algebra 1.

Theorem 6.2.6 Let R be a principal ideal domain, and M a free R-module with a finite basis containing n elements. Then every nonzero submodule of M is a free module with a basis containing at most n elements.

Proof The proof is by induction on n. If n = 1, then M = <x1> = Rx1, where x1 ≠ 0, and ax1 = 0 implies that a = 0. The map a ↦ ax1 is clearly an R-isomorphism from R to M. Therefore, it is sufficient to show that every nonzero submodule of R, considered as a module over R, is free. A nonzero submodule of R is an ideal of R, and it is of the form Ra with a ≠ 0. Clearly, Ra is free with {a} as a basis (since R is an integral domain, ra = 0 implies r = 0). Assume that the result is true for n < m. Let M be a free module with S = {x1, x2, ..., xm} as a basis of M. Then the submodule <x1> generated by x1 is free with {x1} as a basis. Consider the quotient module L = M/<x1> and the quotient map ν. Since S generates M, ν(S) generates L. Since ν(x1) is the zero of

L, it follows that S′ = {ν(x2), ν(x3), ..., ν(xm)} also generates L. We show that S′ is linearly independent. Suppose that

a2ν(x2) + a3ν(x3) + ··· + amν(xm) = < x1 >.

Then

a2x2 + a3x3 + ··· + amxm + <x1> = <x1>, or equivalently,

a2x2 + a3x3 +···+amxm =−a1x1 for some a1 ∈ R. But, then

a1x1 + a2x2 +···+amxm = 0.

Since S is linearly independent, each ai = 0. This shows that L is free, and S′ is a basis of L. By the induction hypothesis, every nonzero submodule of L = M/<x1> is free with a basis containing at most m − 1 elements. Now, let N be a nonzero submodule of M. If ν(N) is the zero submodule of L, then N is a nonzero submodule of <x1>, and so, by the case n = 1, it is free with a singleton basis. Suppose that ν(N) is nonzero, so that it is free with a basis containing at most m − 1 elements. The restriction ν|N of ν to N is a surjective homomorphism from N to ν(N) whose kernel is N ∩ <x1>. Thus, N/(N ∩ <x1>) ≅ ν(N), a submodule of M/<x1>, is free with a basis containing at most m − 1 elements. Moreover, N ∩ <x1> is a submodule of <x1>, so it is either {0} or free with a singleton basis. By Proposition 6.2.4, N ≅ (N ∩ <x1>) ⊕ N/(N ∩ <x1>), and by Proposition 6.2.5, N is free with a basis containing at most 1 + (m − 1) = m elements. □

Remark 6.2.7 In fact, over a P.I.D., every submodule of a free module (with a basis of arbitrary cardinality) is free. The proof uses transfinite induction.

Z-modules are abelian groups, and free Z-modules are called free abelian groups. Since Z is a P.I.D., we have the following corollary.

Corollary 6.2.8 Every subgroup of a finitely generated free abelian group is free. □

Proposition 6.2.9 Every finitely generated module over R is isomorphic to a quotient of a free module.

Proof If M is generated by {x1, x2, ..., xn}, then the map f from $R^n$ to M defined by f(a1, a2, ..., an) = a1x1 + a2x2 + ··· + anxn is a surjective homomorphism. Since $R^n$ is free, the result follows from the fundamental theorem of homomorphism. □

Corollary 6.2.10 Every submodule of a finitely generated module over a P.I.D. is finitely generated.

Proof Let M = <{x1, x2, ..., xn}> be a finitely generated module. Then the map f from $R^n$ to M defined by f(a1, a2, ..., an) = a1x1 + a2x2 + ··· + anxn is a surjective homomorphism from the free module $R^n$ to M. Let N be a submodule of M. Then $f^{-1}(N)$ is a submodule of $R^n$. Since $R^n$ is a free module with a basis consisting of n elements, it follows from Theorem 6.2.6 that $f^{-1}(N)$ is a free submodule of $R^n$ with a basis containing at most n elements. Since f is surjective, $N = f(f^{-1}(N))$ is also generated by at most n elements. □

A module M is called a cyclic module if M is generated by a single element. Thus, M is cyclic if there exists an element x ∈ M such that M = <x> = Rx.

The map f defined by f(a) = ax is a surjective homomorphism from R to M. Since R is cyclic over R, and homomorphic images of cyclic modules are cyclic, it follows that a module M is cyclic if and only if it is a homomorphic image of R. Suppose that R is a P.I.D. and M = Rx is a cyclic module. Let N be a submodule of M. Then $f^{-1}(N)$ is a submodule of R, and so it is an ideal Ra for some a ∈ R. Thus, $f^{-1}(N)$ is cyclic. Hence $N = f(f^{-1}(N))$ is cyclic. This shows that a submodule of a cyclic module over a P.I.D. is cyclic. Is this assertion true if R is not a P.I.D.?

Let M be an R-module, where R is a P.I.D., and let x ∈ M. Consider the map f from R to M given by f(a) = a · x. Then f is clearly an R-homomorphism. The kernel of f is an ideal of R. Since R is a principal ideal domain, ker f = Ra for some a ∈ R. If a = 0, then the kernel of f is {0}, and in this case the submodule <x> generated by x is isomorphic to R. In this case we say that x is of period 0. Thus, x is of period 0 if and only if ax = 0 implies that a = 0. Such an element is also called a torsion-free element of M. If ker f = Ra is nonzero, then a ≠ 0 and ax = 0. In this case x is called a torsion element, and a, where ker f = Ra, is called a period of x. If a and b are periods of a torsion element x, then Ra = Rb, and so a ∼ b. Thus, the period of a torsion element is unique up to associates. It is clear that if a is a period of x in M, then bx = 0 if and only if a | b. A period of an element x is denoted by o(x).

Suppose that M = <x> is a cyclic module generated by x. If x is of period 0, then M is isomorphic to R, and if a period of x is a ≠ 0, then M is isomorphic to R/Ra. Thus, a cyclic R-module is isomorphic to R or to R/Ra for some a ≠ 0. In the case of abelian groups (Z-modules), the period corresponds to the order of an element.
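For abelian groups the period is the order, and in Zn it is given by the standard formula o(x) = n/gcd(x, n), taken positive. The following Python sketch (R = Z and M = Z12 are illustrative choices; the function name is ours) checks two statements of the paragraph above: bx = 0 if and only if o(x) divides b, and <x> ≅ Z/o(x)Z, so that <x> has o(x) elements.

```python
from math import gcd

def period(x, n):
    """The positive generator of the annihilator ideal {b in Z : b x = 0} of x in Z_n."""
    return n // gcd(x, n)

n = 12
for x in range(n):
    o = period(x, n)
    # b x = 0 in Z_n  if and only if  o(x) divides b
    assert all(((b * x) % n == 0) == (b % o == 0) for b in range(-3 * n, 3 * n))
    # the cyclic submodule <x> is isomorphic to Z/o(x)Z, so it has o(x) elements
    assert len({(k * x) % n for k in range(n)}) == o
print([period(x, n) for x in range(n)])   # [1, 12, 6, 4, 3, 12, 2, 12, 3, 4, 6, 12]
```

The zero element has period 1 (a unit), reflecting that its annihilator is the whole ring.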

Definition 6.2.11 A module M over a ring R is called a torsion module if every element of M is a torsion element. It is said to be torsion free, if every nonzero element of M is torsion free. A module which is neither torsion nor torsion free is called a mixed module.

Every finite abelian group is a torsion Z-module. A torsion Z-module is also called a periodic group. The additive group Z of integers is a torsion-free Z-module.

Proposition 6.2.12 Let M be an R-module, and let T(M) denote the set of all torsion elements of M. Then T(M) is a torsion submodule of M, and M/T(M) is a torsion-free module.

Proof Suppose that x, y ∈ T(M), and a, b ∈ R. Since x, y are torsion elements, there exist nonzero elements c and d such that cx = 0 = dy. Since R is an integral domain, cd ≠ 0, and cd(ax + by) = 0. This shows that ax + by ∈ T(M). Thus, T(M) is a submodule of M. Next, let x + T(M) be a nonzero element of M/T(M). Then x + T(M) ≠ T(M), which means that x is not a torsion element of M. Suppose that a(x + T(M)) = T(M). Then ax ∈ T(M), and hence there exists b ≠ 0 such that bax = 0. Since x is torsion free, ba = 0. Again, since b ≠ 0, a = 0. This shows that M/T(M) is torsion free. □

Definition 6.2.13 T(M) is called the torsion part of M, and M/T(M) is called the torsion free part of M.

If M is finitely generated over a P.I.D., then so are T(M) and M/T(M).

Theorem 6.2.14 Every finitely generated torsion-free module over a P.I.D. is free.

Proof Let S = {x1, x2, ..., xn} be a set of generators of M. We may assume without loss of generality that each xi ≠ 0. Since M is torsion free, each {xi} is linearly independent. Let T be a maximal linearly independent subset of S. Without loss of generality, we may suppose that T = {x1, x2, ..., xr}, r ≤ n. Let N be the submodule of M generated by T. Then N is free. Since T is a maximal linearly independent subset of S, {x1, x2, ..., xr, x_{r+i}} is linearly dependent for all i, 1 ≤ i ≤ n − r. Hence, there are a1, a2, ..., ar, a_{r+i} in R, not all 0, such that

a1x1 + a2x2 +···+arxr + ar+ixr+i = 0.

Since T is linearly independent, a_{r+i} ≠ 0, for otherwise each ai = 0. Also a_{r+i}x_{r+i} = −a1x1 − a2x2 − ··· − arxr belongs to N. Let a = a_{r+1}a_{r+2} ··· a_n. Then a ≠ 0 (R is an integral domain), and axi ∈ N for all i. Since S generates M, ax ∈ N for all x ∈ M. Define a map f from M to N by f(x) = ax. Clearly, this is a module homomorphism. Since M is torsion free and a ≠ 0, ax = 0 implies that x = 0. This means that f is injective. Thus, M is isomorphic to a submodule of N. Since N is free with a finite basis, and a submodule of a free module with a finite basis is free (Theorem 6.2.6), it follows that M is free. □

Corollary 6.2.15 If M is a finitely generated module over a P.I.D., then M is isomorphic to T(M) ⊕ M/T(M).

Proof Since M is finitely generated, M/T(M) is finitely generated and torsion free. By the above theorem, M/T(M) is free. From Proposition 6.2.4, it follows that M ≅ T(M) ⊕ M/T(M). □

Since every finitely generated free module over R is isomorphic to Rn for some n, we have the following corollary.

Corollary 6.2.16 Every finitely generated module over a P.I.D. is isomorphic to the direct sum of a finitely generated torsion module and $R^n$ for some n. □

Since every finitely generated torsion abelian group is finite, we have the following:

Corollary 6.2.17 Every finitely generated abelian group is isomorphic to the direct sum of a finite abelian group and $\mathbb{Z}^n$ for some n. □

Thus, to study the structure of finitely generated modules over principal ideal domains, it is sufficient to study the structure of finitely generated torsion modules over principal ideal domains.

Proposition 6.2.18 Let M be a finitely generated torsion module. Then there exists a ≠ 0 such that ax = 0 for all x ∈ M.

Proof Suppose that M = <{x1, x2, ..., xn}>. Let ai be a period of xi. Then ai ≠ 0 and aixi = 0. Let a = a1a2 ··· an. Then a ≠ 0, and since every x ∈ M is a linear combination of x1, x2, ..., xn and R is commutative, ax = 0 for all x ∈ M. □

Let A = {a ∈ R | ax = 0 for all x ∈ M}. Then A is an ideal of R, and since R is a P.I.D., A = Rm for some m ∈ R. Such an m is called an exponent of M. It is clear that an exponent of M is unique up to associates.

Let M be a torsion module over R, and p a prime element of R. We say that M is a p-module if for each element x ∈ M there exists n ∈ N such that $p^n \cdot x = 0$. Let M be a torsion module and p a prime of R. Let $M_p = \{x \in M \mid p^n x = 0 \text{ for some } n \in \mathbb{N}\}$. Then $M_p$ is a submodule of M, and it is called the p-part of M.
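Exponents and p-parts are easy to compute for a finite abelian group $\mathbb{Z}_{n_1} \oplus \cdots \oplus \mathbb{Z}_{n_k}$, viewed as a Z-module. The sketch below (Python; the helpers and the example Z4 ⊕ Z6 are our own choices) uses the fact that the order of a tuple is the l.c.m. of the orders of its coordinates. For Z4 ⊕ Z6 it finds exponent 12, a 2-part with 8 elements and a 3-part with 3 elements; note that 8 · 3 = 24 = |M|, in line with Theorem 6.2.19 below.

```python
from itertools import product
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def elements(moduli):
    """All elements of Z_{n_1} + ... + Z_{n_k}, as tuples."""
    return list(product(*(range(m) for m in moduli)))

def period(x, moduli):
    """Order of x: the l.c.m. of the orders of its coordinates."""
    o = 1
    for xi, m in zip(x, moduli):
        o = lcm(o, m // gcd(xi, m))
    return o

def exponent(moduli):
    """Positive generator of {a : a x = 0 for all x}: the l.c.m. of all periods."""
    e = 1
    for x in elements(moduli):
        e = lcm(e, period(x, moduli))
    return e

def is_power_of(o, p):
    """True if o = p**k for some k >= 0."""
    while o % p == 0:
        o //= p
    return o == 1

def p_part(moduli, p):
    """M_p: the elements killed by some power of p."""
    return [x for x in elements(moduli) if is_power_of(period(x, moduli), p)]

M = (4, 6)
print(exponent(M), len(p_part(M, 2)), len(p_part(M, 3)))   # 12 8 3
```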

Theorem 6.2.19 Let M be a torsion module, and a an exponent of M. Let {p1, p2, ..., pn} be a set of primes dividing a such that pi and pj are not associates for i ≠ j, and each prime divisor of a is an associate of pi for some i. Then

$$M = M_{p_1} \oplus M_{p_2} \oplus \cdots \oplus M_{p_n}.$$

Proof Let x ∈ M. Since a is an exponent of M, the period of x divides a. We may suppose that

$$o(x) = p_1^{t_1} p_2^{t_2} \cdots p_n^{t_n} \quad (t_i \ge 0).$$

Let $q_i = o(x)/p_i^{t_i}$. Then $(q_1, q_2, \ldots, q_n) \sim 1$. Since R is a P.I.D., there exist $u_1, u_2, \ldots, u_n$ in R such that

$$u_1q_1 + u_2q_2 + \cdots + u_nq_n = 1.$$

Hence

$$x = u_1q_1x + u_2q_2x + \cdots + u_nq_nx.$$

Now, $p_i^{t_i}(u_iq_ix) = u_i(p_i^{t_i}q_i)x = u_i\,o(x)\,x = 0$. This shows that $u_iq_ix \in M_{p_i}$. Hence

$$M = M_{p_1} + M_{p_2} + \cdots + M_{p_n}.$$

Further, suppose that

$$x_1 + x_2 + \cdots + x_n = 0,$$

where $x_i \in M_{p_i}$. Suppose that $o(x_i) \sim p_i^{t_i}$, and let $q_i = p_1^{t_1} \cdots p_{i-1}^{t_{i-1}} p_{i+1}^{t_{i+1}} \cdots p_n^{t_n}$. Then $0 = q_i(x_1 + x_2 + \cdots + x_n) = q_ix_i$. Since $(p_i^{t_i}, q_i) \sim 1$, there exist $u_i, v_i$ such that $u_ip_i^{t_i} + v_iq_i = 1$. Hence $x_i = u_ip_i^{t_i}x_i + v_iq_ix_i = 0$. This shows that the representation of an element x as a sum of elements of the $M_{p_i}$ is unique. Hence M is the direct sum $M_{p_1} \oplus M_{p_2} \oplus \cdots \oplus M_{p_n}$. □

Now, we describe the structure of finitely generated p-modules, where p is a prime. First observe that if M is a torsion module generated by {x1, x2, ..., xn}, then the exponent of M is the l.c.m. of o(x1), o(x2), ..., o(xn). Thus, if M is a p-module generated by {x1, x2, ..., xn}, where $o(x_i) \sim p^{n_i}$, and m is the maximum of the $n_i$, then $p^m$ is an exponent of M.
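The proof of Theorem 6.2.19 is constructive, and for the cyclic Z-module Zn it can be run directly. The Python sketch below (helper names are ours) uses the exponent n in place of o(x): with $q_i = n/p_i^{t_i}$ and an extended-Euclid coefficient $u_i$ satisfying $u_iq_i \equiv 1 \pmod{p_i^{t_i}}$, the element $u_iq_ix$ lies in the $p_i$-part. These $u_i$ satisfy $\sum u_iq_i \equiv 1 \pmod n$ rather than $\sum u_iq_i = 1$ exactly, which is all that is needed inside Zn.

```python
def factorize(n):
    """Trial division: return {p: t} with n equal to the product of p**t."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def primary_components(x, n):
    """Write x in Z_n as a sum of components, one in each p-part M_p."""
    components = {}
    for p, t in factorize(n).items():
        pt = p ** t
        q = n // pt
        _, u, _ = ext_gcd(q, pt)          # u*q + v*pt == 1 for some v
        components[p] = (u * q * x) % n   # killed by pt, since pt * q == n
    return components

print(primary_components(7, 12))   # {2: 3, 3: 4}
```

Here 7 = 3 + 4 in Z12, where 4 · 3 = 0 and 3 · 4 = 0, so 3 lies in the 2-part and 4 in the 3-part.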

Theorem 6.2.20 Let M be a finitely generated p-module over a P.I.D. (p a prime). Then M is a direct sum

<x1> ⊕ <x2> ⊕ ··· ⊕ <xm>

of cyclic modules, where $o(x_i) \sim p^{n_i}$ and $n_1 \ge n_2 \ge \cdots \ge n_m$.

Proof (The proof imitates that of Theorem 7.3.1 of Algebra 1.) Let M be a p-module generated by {x1, x2, ..., xm}, where xi ≠ 0 for all i. The proof is by induction on m. If m = 1, then M = <x1>, and there is nothing to do. Assume that the result is true for p-modules generated by at most r elements. We prove it for r + 1. Let S = {x1, x2, ..., x_{r+1}} be a set of generators for M, where xi ≠ 0 for all i. Suppose that $o(x_i) \sim p^{n_i}$. We may assume that $n_1 \ge n_2 \ge \cdots \ge n_{r+1}$. Thus, $p^{n_1}$ is an exponent of M. Consider the quotient module M/<x1> and the quotient map ν from M to M/<x1>. Then M/<x1> is generated by {ν(x2), ν(x3), ..., ν(x_{r+1})}. Clearly, M/<x1> is a p-module of exponent $p^t$, where t ≤ n1. By the induction hypothesis,

$$M/\langle x_1\rangle = \langle \nu(y_2)\rangle \oplus \langle \nu(y_3)\rangle \oplus \cdots \oplus \langle \nu(y_s)\rangle$$

for some y2, y3, ..., ys in M such that $o(\nu(y_i)) = p^{n_i}$, $n_1 \ge n_2 \ge n_3 \ge \cdots \ge n_s$. We show that for each i ≥ 2 there exists $z_i \in M$ such that $\nu(z_i) = \nu(y_i)$ and $o(z_i) = o(\nu(y_i)) = o(\nu(z_i))$. Since $by_i = 0$ implies that $b\nu(y_i) = \langle x_1\rangle$ (the zero of M/<x1>) for every b ∈ R, it follows that $o(\nu(y_i))$ divides $o(y_i)$. Since $o(\nu(y_i)) = p^{n_i}$, $p^{n_i}\nu(y_i) = \langle x_1\rangle$. This means that $p^{n_i}y_i \in \langle x_1\rangle$. Suppose that $p^{n_i}y_i = p^{t_i}a_ix_1$, where $(p, a_i) \sim 1$ and $t_i \le n_1$. If $t_i = n_1$, then $p^{n_i}y_i = 0$, and $o(y_i)$ divides $p^{n_i}$. Hence $o(y_i) \sim p^{n_i}$, and we may take $z_i = y_i$. Suppose that $t_i < n_1$. Since $(a_i, p^{n_1}) \sim 1$, there exist u, v ∈ R such that $ua_i + vp^{n_1} = 1$. But then $x_1 = ua_ix_1$. Hence $o(a_ix_1) = o(x_1) = p^{n_1}$. Thus, $o(p^{n_i}y_i) = o(p^{t_i}a_ix_1) = p^{n_1 - t_i}$. This shows that $o(y_i) = p^{n_1 - t_i + n_i}$. Since $p^{n_1}$ is an exponent of M, it follows that $n_1 - t_i + n_i \le n_1$. Hence $n_i \le t_i$. Take $z_i = y_i - p^{t_i - n_i}a_ix_1$. Then $\nu(z_i) = \nu(y_i)$, and $p^{n_i}z_i = p^{n_i}y_i - p^{t_i}a_ix_1 = 0$, so $o(z_i) = p^{n_i} = o(\nu(z_i))$. Now, we show that

M =< x1 > ⊕ < z2 > ⊕ < z3 > ⊕···⊕< zs >.

Let x ∈ M. Since {ν(z2), ν(z3), ..., ν(zs)} generates M/<x1>, it follows that ν(x) = a2ν(z2) + a3ν(z3) + ··· + asν(zs) for some a2, a3, ..., as in R. Hence

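$x - a_2z_2 - a_3z_3 - \cdots - a_sz_s$ lies in the kernel $\langle x_1\rangle$ of $\nu$, that is, $x - a_2z_2 - a_3z_3 - \cdots - a_sz_s = a_1x_1$ for some $a_1 \in R$. Thus,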

$$x = a_1x_1 + a_2z_2 + a_3z_3 + \cdots + a_sz_s.$$

Next, suppose that

$$a_1x_1 + a_2z_2 + a_3z_3 + \cdots + a_sz_s = 0.$$

Then

$$a_2\nu(z_2) + a_3\nu(z_3) + \cdots + a_s\nu(z_s) = \langle x_1\rangle.$$

Since

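$$M/\langle x_1\rangle = \langle \nu(z_2)\rangle \oplus \langle \nu(z_3)\rangle \oplus \cdots \oplus \langle \nu(z_s)\rangle,$$
we have $a_i\nu(z_i) = \langle x_1\rangle$ for all $i \ge 2$. But then $o(z_i) = o(\nu(z_i))$ divides $a_i$ for all $i \ge 2$. Hence $a_iz_i = 0$ for all $i \ge 2$. In turn, $a_1x_1$ is also $0$. Thus, every element $x$ can be written uniquely as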

$$x = w_1 + w_2 + \cdots + w_s,$$
where $w_1 \in \langle x_1\rangle$ and $w_i \in \langle z_i\rangle$ for $i \ge 2$.

Combining the above results, we obtain the following:

Corollary 6.2.21 Every finitely generated module $M$ over a P.I.D. is isomorphic to a direct sum of finitely many cyclic modules, some of them isomorphic to $R$, and some of them isomorphic to $R/Rp^n$ for various primes $p$ and various $n \in \mathbb{N}$.

Proposition 6.2.22 Let $R$ be an integral domain. Then $R^m$ is isomorphic to $R^n$ as $R$-modules if and only if $n = m$.

Proof Let $\{e_1, e_2, \ldots, e_m\}$ be the standard basis of $R^m$. Let $f$ be an isomorphism from $R^m$ to $R^n$. Let $F$ be the field of quotients of $R$. Then $f$ can be extended to a vector space homomorphism $\bar{f}$ from $F^m$ to $F^n$ by

$$\bar{f}(a_1e_1 + a_2e_2 + \cdots + a_me_m) = a_1f(e_1) + a_2f(e_2) + \cdots + a_mf(e_m), \qquad a_i \in F.$$

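The map $\bar{f}$ is injective: if $\bar{f}(a_1e_1 + \cdots + a_me_m) = 0$ and $d \neq 0$ is a common denominator of the $a_i$, then $f(da_1e_1 + \cdots + da_me_m) = d\,\bar{f}(a_1e_1 + \cdots + a_me_m) = 0$ with $da_i \in R$, so that $da_i = 0$, and hence $a_i = 0$, for all $i$. Hence $m \le n$. Similarly, considering $f^{-1}$, we see that $n \le m$.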

The proof of the following proposition is a straightforward verification.

6.2 Modules over P.I.D 211

Proposition 6.2.23 Let $R$ be a principal ideal domain. Then two finitely generated $R$-modules $M$ and $M'$ are isomorphic if and only if $T(M)$ is isomorphic to $T(M')$, and $M/T(M)$ is isomorphic to $M'/T(M')$.

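It follows from Proposition 6.2.22 that, for a finitely generated module $M$ over a P.I.D., there is a unique nonnegative integer $n$ such that $M/T(M)$ is isomorphic to $R^n$ (a finitely generated torsion-free module over a P.I.D. being free). This $n$ is called the rank of $M$. The following proposition is also easy to observe.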

Proposition 6.2.24 A finitely generated torsion module $M$ over a P.I.D. is isomorphic to a finitely generated torsion module $M'$ if and only if $M_p$ is isomorphic to $M'_p$ for all primes $p$.

The proof of the following proposition is also an imitation of the proof of Theorem 7.3.3 of Algebra 1.

Proposition 6.2.25 Let $M$ and $M'$ be finitely generated $p$-modules. Suppose that

$$M = \langle x_1\rangle \oplus \langle x_2\rangle \oplus \cdots \oplus \langle x_m\rangle,$$

where $o(x_i) \sim p^{r_i}$, $r_1 \ge r_2 \ge \cdots \ge r_m$, and

$$M' = \langle y_1\rangle \oplus \langle y_2\rangle \oplus \cdots \oplus \langle y_n\rangle,$$

where $o(y_j) \sim p^{s_j}$, $s_1 \ge s_2 \ge \cdots \ge s_n$. Then $M$ is isomorphic to $M'$ if and only if $m = n$ and $r_i = s_i$ for all $i$.

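Proof Suppose that $m = n$ and $r_i = s_i$ for all $i$. Then $\langle x_i\rangle \approx R/Rp^{r_i} \approx \langle y_i\rangle$ for all $i$. Further, if $P \approx P'$ and $Q \approx Q'$, then $P \oplus Q \approx P' \oplus Q'$. Thus, $\langle x_1\rangle \approx \langle y_1\rangle$ and $\langle x_1\rangle \oplus \langle x_2\rangle \approx \langle y_1\rangle \oplus \langle y_2\rangle$. Proceeding inductively, we find that $M$ is isomorphic to $M'$. The proof of the converse is by induction on $\max(m, n)$. If $\max(m, n) = 1$, then $M = \langle x_1\rangle$ is cyclic of exponent $p^{r_1}$, and $M' = \langle y_1\rangle$ is cyclic of exponent $p^{s_1}$. Since isomorphic modules have the same exponent, it follows that $r_1 = s_1$. Assume that the result is true whenever each of the two decompositions has at most $m$ summands. Let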

$$M = \langle x_1\rangle \oplus \langle x_2\rangle \oplus \cdots \oplus \langle x_{m+1}\rangle,$$

where $o(x_i) \sim p^{r_i}$, $r_1 \ge r_2 \ge \cdots \ge r_{m+1}$, and

$$M' = \langle y_1\rangle \oplus \langle y_2\rangle \oplus \cdots \oplus \langle y_k\rangle,$$

where $k \le m + 1$, $o(y_j) \sim p^{s_j}$ and $s_1 \ge s_2 \ge \cdots \ge s_k$, be isomorphic modules. Let $\sigma$ be an isomorphism from $M$ to $M'$. Clearly, the exponent of $M$ is $p^{r_1}$, and the exponent of $M'$ is $p^{s_1}$. Since isomorphic modules have the same exponents, it follows that $r_1 = s_1$. Since $\sigma$ is an isomorphism, $p^{r_1} \sim o(x_1) = o(\sigma(x_1))$, and also $o(y_1) \sim p^{s_1} = p^{r_1}$. Suppose that

$$\sigma(x_1) = \beta_1y_1 + \beta_2y_2 + \cdots + \beta_ky_k. \qquad (6.1)$$

Since $o(\sigma(x_1)) = p^{r_1}$, we have $o(\beta_jy_j) = p^{r_1}$ for some $j$. After rearranging, we may assume that $o(\beta_1y_1) = p^{r_1}$. We show that

$$M' = \langle \sigma(x_1)\rangle \oplus \langle y_2\rangle \oplus \cdots \oplus \langle y_k\rangle.$$

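Since $p^{r_1}$ is an exponent of $M'$, and $o(\beta_1y_1)$ divides $o(y_1)$, it follows that $o(y_1) \sim o(\beta_1y_1) \sim p^{r_1}$. Hence $(\beta_1, p^{r_1}) \sim 1$ (if $p$ divided $\beta_1$, then $p^{r_1-1}\beta_1y_1$ would be $0$). Since $R$ is a principal ideal domain, there exist $u, v \in R$ such that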

$$u\beta_1 + vp^{r_1} = 1.$$

Hence

$$y_1 = u\beta_1y_1 = u(\sigma(x_1) - \beta_2y_2 - \beta_3y_3 - \cdots - \beta_ky_k).$$

Since $\{y_1, y_2, \ldots, y_k\}$ generates $M'$, $\{\sigma(x_1), y_2, \ldots, y_k\}$ also generates $M'$. Next, suppose that

$$\delta_1\sigma(x_1) + \delta_2y_2 + \cdots + \delta_ky_k = 0.$$

Substituting the value of σ(x1) from (6.1), we find that

$$\delta_1\beta_1y_1 + (\delta_1\beta_2 + \delta_2)y_2 + \cdots + (\delta_1\beta_k + \delta_k)y_k = 0.$$

Since

$$M' = \langle y_1\rangle \oplus \langle y_2\rangle \oplus \cdots \oplus \langle y_k\rangle,$$

$$\delta_1\beta_1y_1 = (\delta_1\beta_2 + \delta_2)y_2 = \cdots = (\delta_1\beta_k + \delta_k)y_k = 0.$$

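Since $o(\beta_1y_1) = p^{r_1}$, $p^{r_1}$ divides $\delta_1$. Hence $\delta_1y_j = 0$ for all $j \ge 2$ ($p^{r_1}$ being an exponent of $M'$). In turn, $\delta_jy_j = 0$ for all $j \ge 2$. Consequently, $\delta_1\sigma(x_1) = -(\delta_2y_2 + \cdots + \delta_ky_k)$ is also $0$. This shows that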

$$M' = \langle \sigma(x_1)\rangle \oplus \langle y_2\rangle \oplus \cdots \oplus \langle y_k\rangle.$$

Since $\sigma$ is an isomorphism from $M$ to $M'$ such that $\sigma(\langle x_1\rangle) = \langle \sigma(x_1)\rangle$, it induces an isomorphism from $M/\langle x_1\rangle$ to $M'/\langle \sigma(x_1)\rangle$. Clearly,

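$$M/\langle x_1\rangle \approx \langle x_2\rangle \oplus \langle x_3\rangle \oplus \cdots \oplus \langle x_{m+1}\rangle, \quad \text{and}$$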

$$M'/\langle \sigma(x_1)\rangle \approx \langle y_2\rangle \oplus \langle y_3\rangle \oplus \cdots \oplus \langle y_k\rangle.$$

By the induction assumption, $k - 1 = m$ and $r_i = s_i$ for all $i \ge 2$. Together with $r_1 = s_1$, this gives $k = m + 1$ and $r_i = s_i$ for all $i$.

If $\{m_1, m_2, \ldots, m_r\}$ is a set of pairwise co-prime members of $R$ and $m = m_1m_2\cdots m_r$, then it follows from the Chinese remainder theorem that $R/Rm$ is isomorphic to $R/Rm_1 \oplus R/Rm_2 \oplus \cdots \oplus R/Rm_r$. Using this fact and the above results, we obtain the following theorems.

Theorem 6.2.26 Let M be a finitely generated torsion module over R, where R is a P.I.D. Then there exists an ordered set {a1, a2,...,at} of elements of R such that ai divides ai+1 for all i, and M is isomorphic to

R/Ra1 ⊕ R/Ra2 ⊕···⊕R/Rat.

Further, such an ordered set {a1, a2,...,at} is unique in the sense that if M is also isomorphic to

R/Rb1 ⊕ R/Rb2 ⊕···⊕R/Rbs, where bj divides bj+1 for all j, then t = s and ai is an associate of bi for all i. 

Theorem 6.2.27 Let $M$ be a finitely generated module over a principal ideal domain $R$. Then there exist a nonnegative integer $n$ and an ordered set $\{a_1, a_2, \ldots, a_t\}$ of elements of $R$ such that $a_i$ divides $a_{i+1}$ for all $i$, and $M$ is isomorphic to

$$R^n \oplus R/Ra_1 \oplus R/Ra_2 \oplus \cdots \oplus R/Ra_t.$$

Further, $n$ and the ordered set $\{a_1, a_2, \ldots, a_t\}$ are unique in the sense that if $M$ is also isomorphic to

$$R^m \oplus R/Rb_1 \oplus R/Rb_2 \oplus \cdots \oplus R/Rb_s,$$
where $b_j$ divides $b_{j+1}$ for all $j$, then $m = n$, $t = s$, and $a_i$ is an associate of $b_i$ for all $i$.
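For $R = \mathbb{Z}$, the two ingredients behind Theorems 6.2.26 and 6.2.27 (the Chinese remainder theorem and the regrouping of cyclic $p$-modules) can be checked mechanically. The Python sketch below is only an illustration under that assumption: it works over $\mathbb{Z}$ rather than a general P.I.D., and the helper names `pairwise_coprime`, `crt_is_bijection` and `invariant_factors` are our own. The last function takes the orders $p^k$ of the cyclic $p$-summands and combines them into $a_1 \mid a_2 \mid \cdots \mid a_t$ by putting the largest power of each prime into $a_t$, the next largest into $a_{t-1}$, and so on.

```python
from collections import defaultdict
from math import gcd, prod


def pairwise_coprime(moduli):
    """True if gcd(mi, mj) = 1 for all i < j."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])


def crt_is_bijection(moduli):
    """Brute-force check that x -> (x mod m1, ..., x mod mr) maps Z/mZ
    bijectively onto Z/m1 (+) ... (+) Z/mr, where m = m1 * ... * mr."""
    m = prod(moduli)
    images = {tuple(x % mi for mi in moduli) for x in range(m)}
    # Both sides have m elements, so injectivity already gives bijectivity.
    return len(images) == m


def invariant_factors(elementary_divisors):
    """Combine prime powers p**k, given as (p, k) pairs, into a1 | a2 | ... | at."""
    by_prime = defaultdict(list)
    for p, k in elementary_divisors:
        by_prime[p].append(p ** k)
    t = max((len(powers) for powers in by_prime.values()), default=0)
    factors = [1] * t
    for powers in by_prime.values():
        powers.sort(reverse=True)
        # Largest power of p goes into a_t, the next largest into a_{t-1}, ...
        for j, q in enumerate(powers):
            factors[t - 1 - j] *= q
    return factors


print(invariant_factors([(2, 1), (2, 2), (3, 1), (3, 2), (5, 1)]))  # [6, 180]
print(pairwise_coprime([4, 9, 5]), crt_is_bijection([4, 9, 5]))     # True True
print(pairwise_coprime([4, 6]), crt_is_bijection([4, 6]))           # False False
```

So, for instance, $\mathbb{Z}/2 \oplus \mathbb{Z}/4 \oplus \mathbb{Z}/3 \oplus \mathbb{Z}/9 \oplus \mathbb{Z}/5 \approx \mathbb{Z}/6 \oplus \mathbb{Z}/180$, while for the non-co-prime pair $4, 6$ the reduction map fails to be injective.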

Exercises

6.2.1 Describe all torsion abelian groups of exponent 24 which are generated by at most three elements.

6.2.2 Describe all torsion $\mathbb{R}[X]$-modules which are of exponent $(X - 1)^2(X^2 + 1)^2$, and which are generated by at most two elements.

6.2.3 Describe the torsion modules of exponent $(X^2 + 1)^3$ which cannot be generated by fewer than three elements.

6.3 Rational and Jordan Forms

Let $V$ be a finite-dimensional vector space over a field $F$, and $T$ be a linear transformation on $V$. Then $V$ becomes an $F[X]$-module with respect to the external operation $\cdot$ defined by $f(X)\cdot v = f(T)(v)$ (verify that it is indeed an $F[X]$-module). This module will be referred to as the $F[X]$-module associated to the linear transformation $T$. By the Cayley–Hamilton theorem, $\phi_T(T) = 0$, where $\phi_T(X)$ is the characteristic polynomial of $T$. Hence $\phi_T(X)\cdot v = \phi_T(T)(v) = 0$ for all $v \in V$. Thus, $V$ (being finite-dimensional) is a finitely generated $F[X]$-module which is a torsion module. Since $F[X]$ is a P.I.D., we can use the structure theory of finitely generated torsion modules over a P.I.D. developed in the previous section to study the linear transformation $T$ by looking at its matrix representations with respect to suitable bases. Let $m_T(X)$ be an exponent of the module $V$. Then $m_T(T) = 0$, and whenever $f(T) = 0$, $m_T(X)$ divides $f(X)$. In particular, $m_T(X)$ divides $\phi_T(X)$. If $m_T(X)$ is assumed to be monic (leading coefficient $1$), then $m_T(X)$ is unique, and it is called the minimum polynomial of $T$.

Proposition 6.3.1 Let $T_1$ and $T_2$ be linear transformations on $V$. Then $T_1$ is similar to $T_2$ if and only if the $F[X]$-module $V$ associated to $T_1$ is isomorphic to the $F[X]$-module $V$ associated to $T_2$.

Proof Suppose that $T_2 = PT_1P^{-1}$, where $P$ is a nonsingular linear transformation on $V$. Then, given any polynomial $f(X) \in F[X]$, we have $f(T_2) = Pf(T_1)P^{-1}$, that is, $P(f(T_1)(v)) = f(T_2)(P(v))$ for all $v \in V$. It follows that $P$ is in fact a module isomorphism from the module $V$ associated to $T_1$ to the module $V$ associated to $T_2$. Conversely, suppose that $P$ is an isomorphism from the $F[X]$-module $V$ associated to $T_1$ to the $F[X]$-module $V$ associated to $T_2$. Then $P$ is clearly a nonsingular linear transformation on $V$, and $P(T_1(v)) = P(X\cdot v) = X\cdot P(v) = T_2(P(v))$ for all $v \in V$. Hence $T_2 = PT_1P^{-1}$.

Let $T$ be a linear transformation on a finite-dimensional vector space $V$ over a field $F$ such that the associated $F[X]$-module $V$ is a cyclic module generated by $v \in V$. Then the period $o(v)$ of $v$ is precisely the minimum polynomial $m_T(X)$ of $T$ (every element of $V$ is of the form $f(T)(v)$, so a polynomial annihilates $v$ if and only if it annihilates all of $V$). Thus, we have an $F[X]$-module isomorphism $\eta$ from $F[X]/F[X]m_T(X)$ to the $F[X]$-module $V$ given by $\eta(f(X) + F[X]m_T(X)) = f(X)\cdot v = f(T)(v)$. Suppose that

$$m_T(X) = a_0 + a_1X + a_2X^2 + \cdots + a_{n-1}X^{n-1} + X^n.$$

Every element of $F[X]/F[X]m_T(X)$ is uniquely expressible as $r(X) + F[X]m_T(X)$, where $r(X)$ is a polynomial of degree at most $n - 1$. Thus, $\{1 + F[X]m_T(X), X + F[X]m_T(X), \ldots, X^{n-1} + F[X]m_T(X)\}$ is a basis of the vector space $F[X]/F[X]m_T(X)$ over $F$. Since $\eta$ (being an $F[X]$-module isomorphism) is an $F$-isomorphism, and $\eta(X^i + F[X]m_T(X)) = T^i(v)$, it follows that $\{v, T(v), T^2(v), \ldots, T^{n-1}(v)\}$ is a basis of $V$. Since $m_T(T) = 0$, we have

$$T^n(v) = -a_0v - a_1T(v) - a_2T^2(v) - \cdots - a_{n-1}T^{n-1}(v).$$

Hence the matrix representation of $T$ with respect to this ordered basis is
$$\begin{bmatrix} 0 & 0 & \cdots & 0 & 0 & -a_0\\ 1 & 0 & \cdots & 0 & 0 & -a_1\\ 0 & 1 & \cdots & 0 & 0 & -a_2\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & 0 & -a_{n-2}\\ 0 & 0 & \cdots & 0 & 1 & -a_{n-1} \end{bmatrix}$$

The above matrix is termed the companion matrix of the polynomial

$$a_0 + a_1X + a_2X^2 + \cdots + a_{n-1}X^{n-1} + X^n.$$
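To make the layout of the companion matrix concrete, here is a small Python sketch (the helper names are illustrative, and integer coefficients are assumed for simplicity). It builds the matrix exactly as displayed above, with $1$'s just below the diagonal and $-a_0, \ldots, -a_{n-1}$ down the last column, and checks by Horner's rule that the matrix is annihilated by its polynomial. The polynomial $X^3 - 4X^2 + 5X - 2 = (X-1)^2(X-2)$ used here is the one that reappears in Example 6.3.5.

```python
def companion(coeffs):
    """Companion matrix of a0 + a1*X + ... + a_{n-1}*X^(n-1) + X^n,
    where coeffs = [a0, a1, ..., a_{n-1}] (the leading 1 is omitted)."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1      # column i-1 has a 1 in row i: T sends T^(i-1)(v) to T^i(v)
    for i, a in enumerate(coeffs):
        C[i][n - 1] = -a     # T^n(v) = -a0 v - a1 T(v) - ... - a_{n-1} T^(n-1)(v)
    return C


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def poly_at_matrix(coeffs, A):
    """Evaluate c0*I + c1*A + ... + cd*A^d by Horner's rule, coeffs = [c0, ..., cd]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


C = companion([-2, 5, -4])                # X^3 - 4X^2 + 5X - 2
print(C)                                  # [[0, 0, 2], [1, 0, -5], [0, 1, 4]]
print(poly_at_matrix([-2, 5, -4, 1], C))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```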

Definition 6.3.2 A matrix $A$ with entries in a field $F$ is said to be in rational canonical form if there exists an ordered set $\{f_1(X), f_2(X), \ldots, f_r(X)\}$ of monic polynomials such that $f_i(X)$ divides $f_{i+1}(X)$ for all $i$ and
$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & A_r \end{bmatrix},$$
where $A_i$ is the companion matrix of $f_i(X)$, and each $0$ is a zero matrix of appropriate order.

Using Theorem 6.2.26 and the above discussion, we obtain the following theorem.

Theorem 6.3.3 Let T be a linear transformation on V . Then there is a basis of V such that the matrix of T with respect to this basis is in rational canonical form. 

Corollary 6.3.4 Every square matrix with entries in a field F is similar to a unique matrix in rational canonical form. 

The following example illustrates how to reduce a matrix having all its eigenvalues in the field to rational canonical form. We choose a simple triangular matrix for convenience of computation.

Example 6.3.5 Consider the matrix $A$ given by
$$A = \begin{bmatrix} 1 & 2 & 2\\ 0 & 1 & 1\\ 0 & 0 & 2 \end{bmatrix}.$$

The eigenvalues of $A$ are $1, 1, 2$, and the characteristic polynomial of $A$ is $\phi_A(x) = (x - 1)^2(x - 2)$. The minimum polynomial $m_A(x)$ is a divisor of the characteristic polynomial having all eigenvalues as roots. Thus, the possibilities for $m_A(x)$ are $(x - 1)^2(x - 2)$ and $(x - 1)(x - 2)$. Since $(A - I)(A - 2I) \neq 0$, it follows that $m_A(x) = \phi_A(x) = (x - 1)^2(x - 2)$. Now, $\mathbb{R}^3$ is the $\mathbb{R}[x]$-module associated to the matrix $A$. The exponent of this module is the minimum polynomial $m_A(x)$. The primes dividing the exponent are $x - 1$, with multiplicity 2, and $x - 2$, with multiplicity 1. The $(x - 1)$-part of the module is given by
$$\mathbb{R}^3_{(x-1)} = \{v \in \mathbb{R}^3 \mid (x - 1)^2\cdot v = 0\} = \{v \mid (A - I)^2v^t = 0\} = \{(v_1, v_2, v_3) \mid v_3 = 0\} = \mathbb{R}^2 \times \{0\}.$$
It can be easily checked that this is a cyclic submodule generated by $(1, 1, 0)$. This submodule is isomorphic to $\mathbb{R}[x]/\mathbb{R}[x](x - 1)^2$. Further, the $(x - 2)$-part $\mathbb{R}^3_{(x-2)}$ is the null space $\{(4\alpha, \alpha, \alpha) \mid \alpha \in \mathbb{R}\}$ of $A - 2I$, and it is a cyclic submodule of $\mathbb{R}^3$ isomorphic to $\mathbb{R}[x]/\mathbb{R}[x](x - 2)$. Indeed, since $(x - 1)^2$ and $x - 2$ are co-prime, $\mathbb{R}^3$ itself is a cyclic module (generated, for example, by $(5, 2, 1) = (1, 1, 0) + (4, 1, 1)$), and it is isomorphic to the module $\mathbb{R}[x]/\mathbb{R}[x]m_A(x)$. Thus, the rational form of $A$ is the companion matrix of the minimum polynomial $m_A(x) = x^3 - 4x^2 + 5x - 2$ of $A$. As such, the rational form of the matrix is given by
$$\begin{bmatrix} 0 & 0 & 2\\ 1 & 0 & -5\\ 0 & 1 & 4 \end{bmatrix}.$$

To get a matrix $P$ such that $P^{-1}AP$ is the rational form of $A$, we take $P$ to be the matrix of transformation from the standard basis to the basis $\{v^t, Av^t, A^2v^t\}$, where $v = (5, 2, 1)$ is the generator of the module. Indeed, $v^t, Av^t, A^2v^t$ are the columns of the matrix $P$.
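The computations of Example 6.3.5 can be double-checked with exact integer arithmetic. The Python sketch below (helper names are our own) confirms that $(A - I)(A - 2I) \neq 0$ while $(A - I)^2(A - 2I) = 0$, builds $P$ from the columns $v^t, Av^t, A^2v^t$ for $v = (5, 2, 1)$, and checks $AP = PC$ with $C$ the companion matrix; since $\det P \neq 0$, this is the same as $P^{-1}AP = C$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def shift(A, c):
    """Return A - c*I."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 2, 2], [0, 1, 1], [0, 0, 2]]
ZERO = [[0] * 3 for _ in range(3)]

# (A - I)(A - 2I) != 0 but (A - I)^2 (A - 2I) = 0, so m_A(x) = (x - 1)^2 (x - 2).
print(mat_mul(shift(A, 1), shift(A, 2)) != ZERO)                        # True
print(mat_mul(mat_mul(shift(A, 1), shift(A, 1)), shift(A, 2)) == ZERO)  # True

v = [5, 2, 1]
Av = mat_vec(A, v)
A2v = mat_vec(A, Av)
P = [[v[i], Av[i], A2v[i]] for i in range(3)]   # columns v^t, A v^t, A^2 v^t
C = [[0, 0, 2], [1, 0, -5], [0, 1, 4]]          # companion matrix of m_A(x)

# det P != 0 and A P = P C, i.e. P^{-1} A P = C.
print(det3(P), mat_mul(A, P) == mat_mul(P, C))  # -2 True
```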

Let $T$ be a linear transformation on a finite-dimensional vector space $V$ over a field $F$. Suppose that $m_T(X) = (X - \lambda)^n$, where $\lambda \in F$, and the $F[X]$-module $V$ associated to $T$ is a cyclic module (note that $X - \lambda$ is a prime element of $F[X]$) generated by $v$. Then $V = F[X]v = \{f(X)\cdot v \mid f(X) \in F[X]\}$. Since the period of $v$ is $(X - \lambda)^n$, it follows that $f(X)\cdot v$ is uniquely expressible as $r(X)\cdot v$, where $r(X)$ is the remainder obtained when $f(X)$ is divided by $(X - \lambda)^n$. Further, every polynomial $r(X)$ is uniquely expressible as a polynomial $s(X - \lambda)$ in $X - \lambda$ of the same degree as $r(X)$ (write $r(X) = r((X - \lambda) + \lambda)$, and use the binomial theorem). This shows that every element of $V$ is a unique $F$-linear combination of $\{v, (X - \lambda)\cdot v, (X - \lambda)^2\cdot v, \ldots, (X - \lambda)^{n-1}\cdot v\}$. Also, by the definition of the $F[X]$-module, $(X - \lambda)^i\cdot v = (T - \lambda I)^i(v)$. Hence $\{v, (T - \lambda I)(v), (T - \lambda I)^2(v), \ldots, (T - \lambda I)^{n-1}(v)\}$ is a basis of $V$. Further, $T((T - \lambda I)^i(v)) = (T - \lambda I)^{i+1}(v) + \lambda(T - \lambda I)^i(v)$, where $(T - \lambda I)^n(v) = 0$. This shows that the matrix of $T$ with respect to the above ordered basis is the $n \times n$ matrix $A$ given by
$$A = \begin{bmatrix} \lambda & 0 & 0 & \cdots & 0 & 0\\ 1 & \lambda & 0 & \cdots & 0 & 0\\ 0 & 1 & \lambda & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1 & \lambda & 0\\ 0 & 0 & \cdots & 0 & 1 & \lambda \end{bmatrix}$$

Such a matrix is called a Jordan block of order n. We have established the following proposition.

Proposition 6.3.6 Let $T$ be a linear transformation on $V$ with minimum polynomial $(X - \lambda)^n$ such that the corresponding $F[X]$-module $V$ is cyclic. Then there is a basis of $V$ with respect to which the matrix of $T$ is a Jordan block as given above.

Example 6.3.7 Consider the nonidentity uni-upper triangular matrix $A$ given by
$$A = \begin{bmatrix} 1 & \alpha & \beta\\ 0 & 1 & \gamma\\ 0 & 0 & 1 \end{bmatrix}.$$

Clearly, the characteristic polynomial $\phi_A(X)$ of $A$ is $(X - 1)^3$. Further, $A - I \neq 0$, and
$$(A - I)^2 = \begin{bmatrix} 0 & 0 & \alpha\gamma\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$

Suppose that $\alpha \neq 0 \neq \gamma$. Then $(A - I)^2 \neq 0$. This means that $m_A(X) = \phi_A(X) = (X - 1)^3$. Let $v = [v_1, v_2, v_3]^t$ be a nonzero vector in $\mathbb{R}^3$. Then $(A - I)\cdot v = [\alpha v_2 + \beta v_3, \gamma v_3, 0]^t$, and $(A - I)^2\cdot v = [\alpha\gamma v_3, 0, 0]^t$. This shows that the period $o(e_3^t)$ of the column vector $e_3^t$ is $(X - 1)^3$. Let us consider the $\mathbb{R}[X]$-submodule of $\mathbb{R}^3$ generated by $e_3^t$. Since the set

$$\{e_3^t,\ A\cdot e_3^t = [\beta, \gamma, 1]^t,\ A^2\cdot e_3^t = [2\beta + \alpha\gamma, 2\gamma, 1]^t\}$$

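is a basis of $\mathbb{R}^3$ (the matrix with these three columns has determinant $-\alpha\gamma^2 \neq 0$), it follows that $\mathbb{R}^3$ is a cyclic $\mathbb{R}[X]$-module generated by $e_3^t$. Hence $A$ is similar to the Jordan block of order 3 all of whose diagonal entries are $1$. In particular, all such $3 \times 3$ matrices with $\alpha \neq 0 \neq \gamma$ are similar.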
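The conclusion of Example 6.3.7 can be spot-checked for particular values. In the Python sketch below, the sample values $\alpha = 2$, $\beta = 3$, $\gamma = 5$ and the helper names are our own choices; the columns of $P$ are $e_3^t, (A - I)e_3^t, (A - I)^2e_3^t$, and $J$ is the Jordan block of order 3 in the convention of this section ($1$'s just below the diagonal). Since $(A - I)^3 = 0$, one gets $AP = PJ$, and $\det P = -\alpha\gamma^2 \neq 0$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def jordan_data(alpha, beta, gamma):
    """For A = [[1, alpha, beta], [0, 1, gamma], [0, 0, 1]] with alpha, gamma nonzero,
    return A, the matrix P with columns e3, (A - I)e3, (A - I)^2 e3, and the
    Jordan block J of order 3 with 1's on and just below the diagonal."""
    A = [[1, alpha, beta], [0, 1, gamma], [0, 0, 1]]
    N = [[0, alpha, beta], [0, 0, gamma], [0, 0, 0]]   # N = A - I
    e3 = [0, 0, 1]
    Ne3 = mat_vec(N, e3)       # [beta, gamma, 0]
    N2e3 = mat_vec(N, Ne3)     # [alpha * gamma, 0, 0]
    P = [[e3[i], Ne3[i], N2e3[i]] for i in range(3)]
    J = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
    return A, P, J


A, P, J = jordan_data(2, 3, 5)
print(det3(P), mat_mul(A, P) == mat_mul(P, J))   # -50 True
```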

From the structure Theorem 6.2.20 of finitely generated $p$-modules over a P.I.D., and Proposition 6.3.6, we have the following more general result.

Corollary 6.3.8 Let $T$ be a linear transformation on a finite-dimensional vector space $V$ of dimension $n$ over a field $F$ such that the minimum polynomial is $m_T(X) = (X - \lambda)^{m_1}$. Then there exist integers $m_2 \ge m_3 \ge \cdots \ge m_r$ with $m_1 \ge m_2$, and a basis

$$\{v_1, v_2, \ldots, v_{m_1}, v_{m_1+1}, v_{m_1+2}, \ldots, v_{m_1+m_2}, v_{m_1+m_2+1}, \ldots, v_{m_1+m_2+\cdots+m_r}\}$$
of $V$ with respect to which the matrix of $T$ is
$$\begin{bmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & A_r \end{bmatrix},$$
where $A_i$ is a Jordan block of order $m_i$ all of whose diagonal entries are $\lambda$.

Example 6.3.9 Consider the nonidentity uni-upper triangular matrix $A$ given by
$$A = \begin{bmatrix} 1 & 0 & \beta\\ 0 & 1 & \gamma\\ 0 & 0 & 1 \end{bmatrix}.$$

Clearly, the characteristic polynomial $\phi_A(X)$ of $A$ is $(X - 1)^3$. Further, $A - I \neq 0$ and $(A - I)^2 = 0$. This means that $m_A(X) = (X - 1)^2$. Let $v = [v_1, v_2, v_3]^t$ be a nonzero vector in $\mathbb{R}^3$. Then $(A - I)\cdot v = [\beta v_3, \gamma v_3, 0]^t$. Since $(\beta, \gamma) \neq (0, 0)$, it follows that the period $o(e_3^t)$ of $e_3^t$ in the corresponding $\mathbb{R}[X]$-module $\mathbb{R}^3$ is $(X - 1)^2$. Thus, the $\mathbb{R}[X]$-submodule of $\mathbb{R}^3$ generated by $e_3^t$ is the subspace $W$ of $\mathbb{R}^3$ generated by the set

$$\{e_3^t,\ (A - I)\cdot e_3^t = [\beta, \gamma, 0]^t\}.$$

Clearly, the dimension of $W$ is $2$. Consider a vector $[u, v, 0]^t$ such that $\beta v - u\gamma \neq 0$. Then $\{e_3^t, [\beta, \gamma, 0]^t, [u, v, 0]^t\}$ is a basis of $\mathbb{R}^3$. Also, the period of $[u, v, 0]^t$ is $X - 1$. The subspace $U$ generated by $[u, v, 0]^t$ is an $\mathbb{R}[X]$-submodule such that the module $\mathbb{R}^3$ is the direct sum $W \oplus U$. Evidently, the matrix $A$ is similar to the matrix

$$\begin{bmatrix} A_1 & 0\\ 0 & A_2 \end{bmatrix},$$
where $A_1$ is the Jordan block of order 2 with diagonal entries $1$, $A_2$ is the Jordan block of order 1 with diagonal entry $1$, and the zeros are zero matrices of appropriate orders. If $P$ is the matrix with first, second, and third columns $e_3^t$, $[\beta, \gamma, 0]^t$, and $[u, v, 0]^t$, respectively, then $P^{-1}AP$ is the matrix

$$\begin{bmatrix} A_1 & 0\\ 0 & A_2 \end{bmatrix}.$$

Consequently, all such matrices are similar.

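Definition 6.3.10 A matrix $A$ is said to be in Jordan canonical form if there exist distinct elements $\lambda_1, \lambda_2, \ldots, \lambda_r$ of $F$, integers $0 = t_0 < t_1 < t_2 < \cdots < t_r$, positive integers
$$m_1, m_2, \ldots, m_{t_1}, m_{t_1+1}, m_{t_1+2}, \ldots, m_{t_2}, \ldots, m_{t_r}$$
such that $m_k \ge m_l$ whenever $t_{i-1} + 1 \le k \le l \le t_i$ for some $i$, and Jordan blocks $A_1, A_2, \ldots, A_{t_r}$, where $A_j$ is an $m_j \times m_j$ Jordan block with diagonal entries $\lambda_i$ for $t_{i-1} + 1 \le j \le t_i$, such that
$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & A_{t_r} \end{bmatrix}.$$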

The following result is an immediate consequence of the structure theorems (Theorems 6.2.19 and 6.2.20) for finitely generated torsion modules over a P.I.D.

Theorem 6.3.11 Let $T$ be a linear transformation on a finite-dimensional vector space $V$ over a field $F$ such that all the characteristic roots of $T$ are in $F$. Then there is a basis of $V$ with respect to which the matrix of $T$ is in Jordan canonical form.

Proof Since all the characteristic roots of T are in F, the minimum polynomial mT (X) of T is given by

$$m_T(X) = (X - \lambda_1)^{m_1}(X - \lambda_2)^{m_2}\cdots(X - \lambda_r)^{m_r},$$
where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are the distinct characteristic roots of $T$. Thus, the $F[X]$-module $V$ associated to $T$ is a finitely generated torsion module of exponent $m_T(X)$. Using the structure theorem of finitely generated torsion modules over a P.I.D., together with the above results (Corollary 6.3.8 applied to each $(X - \lambda_i)$-component), we find that there is a basis of $V$ with respect to which the matrix of $T$ is in Jordan canonical form.

Corollary 6.3.12 Let T be a linear transformation on a finite-dimensional vector space V over an algebraically closed field F. Then there is a basis of V with respect to which the matrix of T is in Jordan canonical form. 

Since the matrices of a linear transformation with respect to different bases are similar, we have the following corollary.

Corollary 6.3.13 Every square matrix with entries in an algebraically closed field is similar to a matrix in Jordan canonical form.

The following corollary is a consequence of the uniqueness theorem for the decomposition of finitely generated torsion modules over a P.I.D. as a direct sum of cyclic $p$-modules for different primes $p$.

Corollary 6.3.14 Let $A$ and $B$ be two $n \times n$ matrices with entries in an algebraically closed field $F$. Then $A$ and $B$ are similar if and only if they are similar to matrices in Jordan canonical form with the same Jordan blocks (counted with multiplicity).

Corollary 6.3.15 A square matrix $A$ with entries in an algebraically closed field $F$ is similar to a diagonal matrix if and only if its minimum polynomial has distinct roots.

We illustrate the reduction of a matrix to its Jordan canonical form by means of an example.

Example 6.3.16 Consider the matrix
$$A = \begin{bmatrix} 1 & 2 & 2\\ 0 & 1 & 1\\ 0 & 0 & 2 \end{bmatrix}$$
of Example 6.3.5. As already observed in Example 6.3.5, the $\mathbb{R}[x]$-module $\mathbb{R}^3$ associated to the matrix $A$ is the direct sum of the cyclic $(x - 1)$-submodule $\mathbb{R}^2 \times \{0\}$ with generator $(1, 1, 0)$ (isomorphic to $\mathbb{R}[x]/\mathbb{R}[x](x - 1)^2$), and the cyclic $(x - 2)$-submodule $\{(4\alpha, \alpha, \alpha) \mid \alpha \in \mathbb{R}\}$ (isomorphic to $\mathbb{R}[x]/\mathbb{R}[x](x - 2)$). As such, the representation of the matrix relative to the basis $\{(1, 1, 0)^t, (A - I)(1, 1, 0)^t = (2, 0, 0)^t, (4, 1, 1)^t\}$ is the Jordan canonical form
$$\begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 0 & 0 & 2 \end{bmatrix}$$
of the matrix $A$. The matrix $P$ of transformation is the matrix with columns $(1, 1, 0)^t$, $(A - I)(1, 1, 0)^t$, $(4, 1, 1)^t$, so that $P^{-1}AP$ is the above Jordan canonical form.
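A quick exact check of Example 6.3.16, in the same spirit as the one for Example 6.3.5 (helper names are our own): the columns of $P$ are $(1, 1, 0)^t$, $(A - I)(1, 1, 0)^t$ and $(4, 1, 1)^t$, and $AP = PJ$ together with $\det P \neq 0$ means $P^{-1}AP = J$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[1, 2, 2], [0, 1, 1], [0, 0, 2]]
N = [[A[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]  # A - I
w1 = [1, 1, 0]            # generator of the (x - 1)-part R^2 x {0}
w2 = mat_vec(N, w1)       # (A - I)(1, 1, 0)^t = (2, 0, 0)^t
w3 = [4, 1, 1]            # spans the (x - 2)-part
P = [[w1[i], w2[i], w3[i]] for i in range(3)]
J = [[1, 0, 0], [1, 1, 0], [0, 0, 2]]

print(det3(P), mat_mul(A, P) == mat_mul(P, J))   # -2 True
```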

Recall that a linear transformation $T$ is said to be a semi-simple linear transformation, or a diagonalizable linear transformation, if its matrix representation with respect to some basis is diagonal. $T$ is said to be nilpotent if $T^n = 0$ for some $n$. A square matrix $A$ is said to be a semi-simple matrix, or a diagonalizable matrix, if it is similar to a diagonal matrix. It is said to be nilpotent if $A^n = 0$ for some $n$.

Theorem 6.3.17 (Jordan–Chevalley) Let $T$ be a linear transformation on a finite-dimensional vector space $V$ over an algebraically closed field $F$ (or at least suppose that all characteristic roots of $T$ are in $F$). Then $T$ can be expressed uniquely as $T = T_s + T_n$, where $T_s$ is semi-simple, $T_n$ is nilpotent, and $T_s$ and $T_n$ commute. Further, there are polynomials $g(X)$ and $h(X)$ without constant terms such that $g(T) = T_s$ and $h(T) = T_n$.

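Proof Suppose that $m_T(X) = \prod_{i=1}^{r}(X - \lambda_i)^{m_i}$, where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are the distinct eigenvalues of $T$. Since every finitely generated torsion module over a P.I.D. is the direct sum of its $p$-submodules for the different primes $p$ dividing the exponent of the module, we have $V = V_1 \oplus V_2 \oplus \cdots \oplus V_r$, where $V$ is the $F[X]$-module associated to the linear transformation $T$, and $V_i$ is the $(X - \lambda_i)$-submodule of $V$. Clearly, $V_i = \operatorname{Ker}(T - \lambda_iI)^{m_i}$. Let $T_s$ be the linear transformation defined on $V$ by the requirement that $T_s(x) = \lambda_ix$ for all $x \in V_i$, $1 \le i \le r$. Then $T_s$ is clearly a semi-simple linear transformation. Take $T_n = T - T_s$. Then $T_n$ is nilpotent, for $T_n$ agrees with $T - \lambda_iI$ on $V_i$ and $(T - \lambda_iI)^{m_i} = 0$ on $V_i$; equivalently, the matrix of $T_n$ relative to the basis of $V$ obtained by taking the union of Jordan bases of the $V_i$ is strictly lower triangular. Thus, $T = T_s + T_n$. We show that $T_s$ and $T_n$ have the required properties. Since $\lambda_1, \lambda_2, \ldots, \lambda_r$ are all distinct, the set $\{(X - \lambda_1)^{m_1}, (X - \lambda_2)^{m_2}, \ldots, (X - \lambda_r)^{m_r}\}$ is a set of pairwise co-prime elements of $F[X]$. By the Chinese remainder theorem, there exists a polynomial $g(X)$ such that $g(X) \equiv \lambda_i \pmod{(X - \lambda_i)^{m_i}}$ for all $i$, and also $g(X) \equiv 0 \pmod{X}$. (If no $\lambda_i$ is $0$, then $X$ is co-prime to every $(X - \lambda_i)^{m_i}$; if $\lambda_i = 0$ for some $i$, the last congruence is already implied by the $i$th one.) Writing $g(X) - \lambda_i = q_i(X)(X - \lambda_i)^{m_i}$, we see that $g(T) - \lambda_iI = q_i(T)(T - \lambda_iI)^{m_i}$ vanishes on $V_i$, so that $T_s = g(T)$; and if we take $h(X) = X - g(X)$, then $T_n = T - T_s = h(T)$. Neither $g(X)$ nor $h(X)$ has a constant term. Since any two polynomials in $T$ commute with each other, $T_s$ and $T_n$ commute. Next, suppose that $T = T_1 + T_2$ is another such decomposition, with $T_1$ semi-simple, $T_2$ nilpotent, and $T_1T_2 = T_2T_1$. Then $T_1$ and $T_2$ commute with $T = T_1 + T_2$, and hence with the polynomials $T_s = g(T)$ and $T_n = h(T)$ in $T$. Now $T_s - T_1 = T_2 - T_n$. Since commuting semi-simple transformations are simultaneously diagonalizable, $T_s - T_1$ is semi-simple; since the difference of two commuting nilpotent transformations is nilpotent, $T_2 - T_n$ is nilpotent. Thus, $T_s - T_1$ is semi-simple as well as nilpotent, and so $T_s - T_1 = 0$ (note that a diagonal matrix is nilpotent if and only if it is $0$). Hence $T_s = T_1$, and so also $T_n = T_2$.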
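As a concrete instance of the proof, take the matrix $A$ of Example 6.3.5, for which $m_A(X) = (X - 1)^2(X - 2)$. The polynomial $g(X) = X^3 - 3X^2 + 3X = (X - 1)^3 + 1$ is one choice we worked out by hand (it is not taken from the text): $g(X) - 1$ is divisible by $(X - 1)^2$, $g(2) = 2$ and $g(0) = 0$, so it satisfies the three congruences used above. The Python sketch below (helper names are illustrative) computes $A_s = g(A)$ and $A_n = A - g(A)$ and checks that $A_n^2 = 0$, that the two parts commute, and that $(A_s - I)(A_s - 2I) = 0$, so that $A_s$ is diagonalizable by Corollary 6.3.15.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def lin_comb(*pairs):
    """Return the sum of c * M over the (c, M) pairs (all M square, same size)."""
    n = len(pairs[0][1])
    return [[sum(c * M[i][j] for c, M in pairs) for j in range(n)] for i in range(n)]


A = [[1, 2, 2], [0, 1, 1], [0, 0, 2]]
I = [[int(i == j) for j in range(3)] for i in range(3)]
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)
ZERO = [[0] * 3 for _ in range(3)]

# g(X) = X^3 - 3X^2 + 3X, so A_s = g(A) and A_n = A - g(A) = h(A).
As = lin_comb((1, A3), (-3, A2), (3, A))
An = lin_comb((1, A), (-1, As))

print(As)                                    # [[1, 0, 4], [0, 1, 1], [0, 0, 2]]
print(An)                                    # [[0, 2, -2], [0, 0, 0], [0, 0, 0]]
print(mat_mul(An, An) == ZERO)               # True: A_n is nilpotent
print(mat_mul(As, An) == mat_mul(An, As))    # True: the parts commute
print(mat_mul(lin_comb((1, As), (-1, I)), lin_comb((1, As), (-2, I))) == ZERO)  # True
```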

Definition 6.3.18 The linear transformation $T_s$ in the above theorem is called the semi-simple part of $T$, and $T_n$ is called the nilpotent part of $T$.

Corollary 6.3.19 (Jordan–Chevalley) Let $A$ be a square matrix with entries in an algebraically closed field $F$ (or at least suppose that all the characteristic roots of $A$ are in $F$). Then $A$ can be expressed uniquely as $A = A_s + A_n$, where $A_s$ is diagonalizable, $A_n$ is nilpotent, and $A_s$ and $A_n$ commute. Further, there exist polynomials $g(X)$ and $h(X)$ without constant terms such that $A_s = g(A)$ and $A_n = h(A)$.

Corollary 6.3.20 Let $T$ be a linear transformation on a finite-dimensional vector space $V$ over an algebraically closed field $F$. Then a linear transformation $S$ on $V$ commutes with $T$ if and only if it commutes with the semi-simple and nilpotent parts of $T$.

Proof Clearly, if $S$ commutes with $T_s$ and $T_n$, then it commutes with $T = T_s + T_n$. Conversely, if $S$ commutes with $T$, then it commutes with $f(T)$ for all polynomials $f(X)$, and since $T_s$ and $T_n$ are polynomials in $T$, it commutes with $T_s$ as well as with $T_n$.

Recall that a linear transformation $T$ is unipotent if all of its characteristic roots are $1$.

Corollary 6.3.21 (Multiplicative Jordan–Chevalley theorem) Let $T$ be a nonsingular linear transformation on a finite-dimensional vector space $V$ over an algebraically closed field $F$ (or at least suppose that all characteristic roots of $T$ are in $F$). Then $T$ is uniquely expressible as $T = T_sT_u$, where $T_s$ is semi-simple, $T_u$ is unipotent, and $T_sT_u = T_uT_s$. Further, $T_s$ and $T_u$ are polynomials in $T$.

Proof We know that $T$ is uniquely expressible as $T = T_s + T_n$, where $T_s$ is semi-simple, $T_n$ is nilpotent, $T_sT_n = T_nT_s$, and also $T_s$, $T_n$ are polynomials in $T$. Since $T$ is nonsingular, its characteristic roots are nonzero; these are also the characteristic roots of $T_s$, so $T_s$ is nonsingular. Set $T_u = I + T_s^{-1}T_n$. Since $T_s$ and $T_n$ commute, so do $T_s^{-1}$ and $T_n$; as $T_n$ is nilpotent, it follows that $T_s^{-1}T_n$ is nilpotent. Hence $T_u$ is unipotent. Clearly, $T = T_sT_u$. The rest follows from the properties of $T_s$ and $T_n$.
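For the nonsingular matrix $A$ of Example 6.3.5 one can carry out this construction explicitly. The values $A_s = g(A)$ and $A_n = A - g(A)$ below come from our own hand computation with $g(X) = X^3 - 3X^2 + 3X$ (not from the text). The Python sketch inverts $A_s$ exactly over $\mathbb{Q}$ by Gauss–Jordan elimination (the `inverse` helper is our own), forms $A_u = I + A_s^{-1}A_n$, and checks that $A_sA_u = A_uA_s = A$ and that $A_u - I$ is nilpotent, so $A_u$ is unipotent.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse(M):
    """Inverse of a square matrix over Q by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = [[1, 2, 2], [0, 1, 1], [0, 0, 2]]
As = [[1, 0, 4], [0, 1, 1], [0, 0, 2]]      # semi-simple part g(A)
An = [[0, 2, -2], [0, 0, 0], [0, 0, 0]]     # nilpotent part A - g(A)
I = [[int(i == j) for j in range(3)] for i in range(3)]

S = mat_mul(inverse(As), An)                                     # A_s^{-1} A_n
Au = [[I[i][j] + S[i][j] for j in range(3)] for i in range(3)]   # A_u = I + A_s^{-1} A_n
N = [[Au[i][j] - I[i][j] for j in range(3)] for i in range(3)]   # A_u - I

print(Au == [[1, 2, -2], [0, 1, 0], [0, 0, 1]])       # True
print(mat_mul(As, Au) == A, mat_mul(Au, As) == A)     # True True
print(mat_mul(N, N) == [[0] * 3 for _ in range(3)])   # True: A_u is unipotent
```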

Application to Differential Equations

Consider the first-order linear differential equation

$$\frac{dx}{dt} = ax.$$

The general solution to the above differential equation is $x(t) = ce^{at}$, where $c$ is an arbitrary constant. In complete analogy with the above differential equation, we discuss the solution of a system of $n$ homogeneous first-order linear differential equations with constant coefficients. In matrix form, this system of equations can be expressed as

$$\frac{dX(t)}{dt} = A\cdot X(t),$$
where $X(t)$ is a smooth column-vector-valued function (a smooth function from $\mathbb{R}$ to $\mathbb{R}^n$) and $A$ is an $n \times n$ real matrix. First, let us introduce $e^A$. Identify $M_n(\mathbb{R})$ with the Euclidean space $\mathbb{R}^{n^2}$ with the Euclidean metric. Consider the sequence $\{T_m\}$ of functions from the metric space $M_n(\mathbb{R})$ to itself defined by

$$T_m(A) = I + A + \frac{A^2}{2!} + \cdots + \frac{A^m}{m!}.$$
It can be seen easily that the above sequence is uniformly convergent on any compact subset of $M_n(\mathbb{R})$. Let us denote by $e^A$ the limit

$$\lim_{m \to \infty} T_m(A).$$

This defines a map $\exp$ from $M_n(\mathbb{R})$ to $M_n(\mathbb{R})$ by $\exp(A) = e^A$. Using elementary analysis, we observe that $\exp$ is continuous, in fact differentiable, and its Jacobian at $0$ is the identity matrix of order $n^2$. By the inverse function theorem, it follows that $\exp$ is a local diffeomorphism at $0$: it maps some neighbourhood of $0$ diffeomorphically onto a neighbourhood of $I$. Again, using Abel's result on the product of absolutely convergent series, we can show that $e^{A+B} = e^A\cdot e^B$ provided that $AB = BA$. In particular, it follows that

$$e^{-A}\cdot e^{A} = e^{-A+A} = e^{0} = I = e^{A}\cdot e^{-A}.$$

Hence $e^A$ is always nonsingular. Thus, $\exp$ is a map from $M_n(\mathbb{R})$ to $GL(n, \mathbb{R})$ which is a local diffeomorphism at $0$. The map $t \mapsto e^{tA}$ is a group homomorphism from $(\mathbb{R}, +)$ to $GL(n, \mathbb{R})$ for all $A \in M_n(\mathbb{R})$. These are called one-parameter subgroups of $GL(n, \mathbb{R})$.
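The partial sums $T_m(A)$ can be computed directly. The following Python sketch (pure floating point; the cut-off $m = 30$ and the helper names are our own choices, and this is an illustration rather than a robust matrix-exponential routine) evaluates $T_m(A)$ for $A = \begin{bmatrix} 0 & 1\\ -1 & 0 \end{bmatrix}$. Since $A^2 = -I$, the series gives $e^A = (\cos 1)I + (\sin 1)A$, and the product $e^{-A}e^{A}$ comes out as $I$ up to rounding, as predicted above.

```python
import math


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def exp_series(A, m=30):
    """Partial sum T_m(A) = I + A + A^2/2! + ... + A^m/m! of the series for e^A."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, m + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]  # A^k / k!
        total = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(total, term)]
    return total


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


A = [[0.0, 1.0], [-1.0, 0.0]]                          # A^2 = -I
E = exp_series(A)
exact = [[math.cos(1), math.sin(1)], [-math.sin(1), math.cos(1)]]
neg = exp_series([[-x for x in row] for row in A])     # approximates e^{-A}

print(max_abs_diff(E, exact) < 1e-12)                                   # True
print(max_abs_diff(mat_mul(neg, E), [[1.0, 0.0], [0.0, 1.0]]) < 1e-12)  # True
```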

Theorem 6.3.22 The columns of the matrix $e^{tA}$ form a basis of the space of solutions of the system of homogeneous linear differential equations expressed in matrix form by

$$\frac{dX(t)}{dt} = A\cdot X(t),$$
where $X(t)$ is a smooth column-vector-valued function.

Proof Using the theorem on term by term differentiation of a uniformly convergent series, it follows that

$$\frac{d\,e^{tA}}{dt} = A\cdot e^{tA}.$$

Since $\frac{d}{dt}[a_{ij}(t)] = [b_{ij}(t)]$, where $b_{ij}(t) = \frac{da_{ij}(t)}{dt}$, each column $Y_i(t)$ of $e^{tA}$ satisfies

$$\frac{dY_i(t)}{dt} = A\cdot Y_i(t).$$

Thus, each column of $e^{tA}$ is a solution of the given system of differential equations. Conversely, suppose that $X(t)$ is a solution. Then

$$\frac{d\,(e^{-tA}X(t))}{dt} = -Ae^{-tA}X(t) + e^{-tA}\frac{dX(t)}{dt} = -Ae^{-tA}X(t) + e^{-tA}AX(t).$$

Since $A$ and $e^{-tA}$ commute, it follows that

$$\frac{d\,(e^{-tA}X(t))}{dt} = 0.$$

Hence $e^{-tA}X(t) = C$, where $C$ is a constant column vector. It follows that $X(t) = e^{tA}\cdot C$, and so every solution of the system of equations is a linear combination of the columns of $e^{tA}$. Since $e^{tA}$ is nonsingular, the columns are linearly independent.

Thus, the problem is to compute $e^{tA}$. Let us observe the following:

(i) If $A = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, then $e^A = \mathrm{Diag}(e^{\lambda_1}, e^{\lambda_2}, \ldots, e^{\lambda_n})$.

(ii) If
$$A = \begin{bmatrix} 0 & t & 0 & \cdots & 0\\ 0 & 0 & t & \cdots & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 0 & t\\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix},$$
then
$$e^A = \begin{bmatrix} 1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{n-1}}{(n-1)!}\\ 0 & 1 & t & \cdots & \frac{t^{n-2}}{(n-2)!}\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & t\\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$

(iii) If

$$A = \begin{bmatrix} B_1 & 0\\ 0 & B_2 \end{bmatrix},$$
then

$$e^A = \begin{bmatrix} e^{B_1} & 0\\ 0 & e^{B_2} \end{bmatrix}.$$

(iv) If $A = CBC^{-1}$, then $e^A = Ce^BC^{-1}$.

(v) If $A$ and $B$ commute, then $e^{A+B} = e^A\cdot e^B$.

(vi) If $A$ is a real $n \times n$ matrix, and $X(t)$ is a complex-valued solution of the system of differential equations

$$\frac{dX(t)}{dt} = A\cdot X(t),$$
then the real and imaginary parts of $X(t)$ are also solutions of the above system of equations.

Consider the Jordan–Chevalley decomposition $A = A_s + A_n$, where $A_s$ is similar to a diagonal matrix, and $A_n$ is similar to a direct sum of matrices of the form
$$\begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 0 & 1\\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.$$

Further, $A_s$ and $A_n$ commute, and so $e^A = e^{A_s}\cdot e^{A_n}$. Using the above observations, we can compute $e^{tA}$, and thereby get the general solution of the given homogeneous system of linear differential equations. We illustrate the above discussion by means of an example.

Example 6.3.23 Consider the system of differential equations given in the matrix form by

$$\frac{dX(t)}{dt} = AX(t),$$
where
$$A = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{bmatrix}.$$

Then $A_s = I$, and
$$A_n = \begin{bmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 0 \end{bmatrix}.$$

Since $tA = tA_s + tA_n$, it follows that $(tA)_s = tA_s$ and $(tA)_n = tA_n$. Again, since $tA_s$ and $tA_n$ commute, $e^{tA} = e^{tA_s}\cdot e^{tA_n}$. Clearly, $e^{tA_s} = e^tI$, and, as discussed above,
$$e^{tA_n} = \begin{bmatrix} 1 & t & \frac{t^2}{2!}\\ 0 & 1 & t\\ 0 & 0 & 1 \end{bmatrix}.$$

Thus,
$$e^{tA} = \begin{bmatrix} e^t & te^t & \frac{t^2}{2!}e^t\\ 0 & e^t & te^t\\ 0 & 0 & e^t \end{bmatrix}.$$

The columns of the above matrix form a basis of the space of solutions.
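The closed form in Example 6.3.23 can be compared numerically with the defining series, and the differential equation can be tested with a difference quotient. In this Python sketch, the evaluation point $t = 0.7$, the step $h = 10^{-5}$, the tolerances and the helper names are our own choices: it checks that the truncated series for $e^{tA}$ agrees with the matrix obtained above, and that a central difference quotient of that matrix is close to $A\,e^{tA}$, as Theorem 6.3.22 requires of each column.

```python
import math


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def exp_series(A, m=40):
    """Partial sum I + A + A^2/2! + ... + A^m/m! of the series for e^A."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, m + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(total, term)]
    return total


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def closed_form(t):
    """The matrix e^{tA} obtained in Example 6.3.23."""
    e = math.exp(t)
    return [[e, t * e, t * t / 2 * e], [0.0, e, t * e], [0.0, 0.0, e]]


A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
t = 0.7
tA = [[t * x for x in row] for row in A]
print(max_abs_diff(exp_series(tA), closed_form(t)) < 1e-12)    # True

# Columns of e^{tA} solve X' = AX: compare a central difference quotient with A e^{tA}.
h = 1e-5
deriv = [[(p - q) / (2 * h) for p, q in zip(rp, rq)]
         for rp, rq in zip(closed_form(t + h), closed_form(t - h))]
print(max_abs_diff(deriv, mat_mul(A, closed_form(t))) < 1e-6)  # True
```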

Exercises

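6.3.1 Consider the following linear transformations on $\mathbb{R}^3$ whose matrix representations with respect to the standard ordered basis are given by

(i) $$\begin{bmatrix} 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3 \end{bmatrix},$$

(ii) $$\begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 0 & 0 & 3 \end{bmatrix},$$

(iii) $$\begin{bmatrix} 0 & 1 & 1\\ -1 & 0 & 0\\ 1 & 0 & 0 \end{bmatrix},$$

(iv) $$\begin{bmatrix} 2 & 0 & 1\\ 0 & 2 & -1\\ 1 & -1 & 2 \end{bmatrix},$$ and

(v) $$\begin{bmatrix} 1 & 3 & 3\\ 3 & 1 & 3\\ -3 & -3 & -5 \end{bmatrix}.$$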

In each case, find the minimum polynomial, the decomposition of the corresponding $\mathbb{R}[X]$-module $\mathbb{R}^3$ as a direct sum of cyclic modules, and a basis of $\mathbb{R}^3$ with respect to which the matrix representation is in rational canonical form.

6.3.2 Reduce the matrices in the Exercise 6.3.1 to rational canonical form.

6.3.3 Determine the pairs of matrices in Exercise 6.3.1 which are similar.

6.3.4 Reduce the following two matrices over $\mathbb{Z}_5$ to rational canonical form, and determine whether they are similar to each other.

(i) $$\begin{bmatrix} 1 & 3 & 7\\ 2 & 0 & 4\\ 0 & 4 & 1 \end{bmatrix}$$

(ii) $$\begin{bmatrix} 1 & 2 & 0\\ 0 & 1 & 3\\ 1 & 0 & 1 \end{bmatrix}$$

6.3.5 Reduce the matrices in Exercise 6.3.1 to Jordan canonical form, considering them as matrices over the field $\mathbb{C}$ of complex numbers. Determine which pairs are similar over $\mathbb{C}$.

6.3.6 Reduce the following complex matrices to Jordan canonical form, and also, in each case, find a nonsingular matrix $P$ such that $PAP^{-1}$ is in Jordan canonical form. Determine which pairs of matrices are similar to each other.

(i) $$\begin{bmatrix} i & 1 & 0\\ 0 & 2i & -1\\ 1 & 0 & 1 + i \end{bmatrix}$$

(ii) $$\begin{bmatrix} i & 1 + i & 0\\ 1 & i & 0\\ 0 & 1 & i \end{bmatrix}$$

(iii) $$\begin{bmatrix} 0 & 1 & i\\ i & i & 1\\ 0 & 0 & i \end{bmatrix}$$

6.3.7 Show that a linear transformation $S$ commutes with $T$ if and only if it commutes with $T_s$ as well as with $T_n$.

6.3.8 Let $T_1$ and $T_2$ be linear transformations on a vector space $V$ of dimension 3 over a field $F$. Show that the $F[X]$-module $V$ associated to $T_1$ is isomorphic to the $F[X]$-module $V$ associated to $T_2$ if and only if they have the same characteristic polynomial and the same minimum polynomial. Deduce that two $3 \times 3$ matrices over $F$ are similar if and only if they have the same characteristic and minimum polynomials. Is this result true for $4 \times 4$ matrices? Justify your answer.

6.3.9 Let $A$ be a complex matrix all of whose characteristic roots are real. Show that $A$ is similar to a real matrix in Jordan form.

6.3.10 Let $A$ be an $n \times n$ real matrix such that $A^2 + I = 0$. Show that $n = 2r$ is even. Show also that $A$ is similar to
$$\begin{bmatrix} 0_r & -I_r\\ I_r & 0_r \end{bmatrix}.$$

6.3.11 Let $T$ be a nilpotent transformation on a complex finite-dimensional vector space $V$. Let $f(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n$. Find the semi-simple part of $f(T)$.

6.3.12 Find $e^A$ for
$$A = \begin{bmatrix} 1 & 0 & 2\\ 0 & 1 & 0\\ 2 & 0 & 2 \end{bmatrix},$$
and also the solution of the system of linear differential equations given in matrix form by
$$\frac{dX(t)}{dt} = AX(t).$$

6.3.13 By reducing the matrix
$$A = \begin{bmatrix} 1 & 1 & 0\\ 0 & 1 & 0\\ 0 & 1 & 1 \end{bmatrix}$$
to Jordan canonical form, find $e^A$, and then solve the corresponding system of differential equations.

6.3.14 If $\lambda$ is an eigenvalue of $A$, show that $e^{\lambda}$ is an eigenvalue of $e^A$. Deduce from this fact that $e^A$ is nonsingular.

6.3.15 Show that $\mathrm{Det}(e^A) = e^{\mathrm{tr}A}$.

6.3.16 Show that the map $\exp$ induces a map from the space $sl(n, \mathbb{R})$ of $n \times n$ matrices with trace 0 to the group $SL(n, \mathbb{R})$ of matrices of determinant 1. Show further that it is a local diffeomorphism. Determine the dimension of the group $SL(n, \mathbb{R})$.

6.3.17 Show that the map exp induces a local diffeomorphism near $0$ from the space $SS_n(\mathbb{R})$ of skew-symmetric matrices to the group $O(n)$ of orthogonal matrices. Determine the dimension of $O(n)$.

6.3.18 Is exp surjective from $M_n(\mathbb{R})$ to $GL(n, \mathbb{R})$? Support.

6.3.19 Give an example to show that $e^{A+B}$ need not be equal to $e^A \cdot e^B$.

Chapter 7 General Linear Algebra

The present chapter is devoted to the study of noetherian rings, projective modules, injective modules, tensor products of modules, and the Grothendieck and Whitehead groups of rings.

7.1 Noetherian Rings and Modules

Over an arbitrary ring, we note that left and right modules are, in general, distinct. Recall, further, that every subspace of $F^n$ over a field $F$, together with a chosen basis of it, determines and is uniquely determined by a matrix with entries in $F$ (whose rows are the basis vectors). This is a consequence of the fact that every subspace of $F^n$ is finitely generated. However, this fact is not true over general rings. Rings over which all submodules of the left module $R^n$ can be described by matrices are essentially the left noetherian rings, which we describe in this section. The theory of right noetherian modules and right noetherian rings can be developed along exactly the same lines. A module will always mean a left module, unless stated otherwise. A module $M$ over $R$ is said to satisfy the ascending chain condition (A.C.C.) if, given any chain
$$N_1 \subseteq N_2 \subseteq \cdots \subseteq N_r \subseteq N_{r+1} \subseteq \cdots$$
of submodules of $M$, there exists $n_0 \in \mathbb{N}$ such that $N_r = N_{n_0}$ for all $r \ge n_0$. A module $M$ is said to satisfy the maximal condition if every nonempty family $\{M_\alpha \mid \alpha \in \Lambda\}$ of submodules of $M$ has a maximal member.

Theorem 7.1.1 Let $M$ be a module over $R$. Then the following conditions are equivalent.

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_7

1. $M$ satisfies A.C.C.
2. Every submodule of $M$ is finitely generated.
3. $M$ satisfies the maximal condition.

Proof 1 ⇒ 3. Let $X = \{M_\alpha \mid \alpha \in \Lambda\}$ be a nonempty family of submodules of $M$. Suppose that $X$ has no maximal member. Let $M_{\alpha_1} \in X$. Since $M_{\alpha_1}$ is not a maximal member of the family, there is a member $M_{\alpha_2} \in X$ such that $M_{\alpha_1} \subset M_{\alpha_2}$. Again, since $M_{\alpha_2}$ is not a maximal member, there is a member $M_{\alpha_3} \in X$ such that $M_{\alpha_2} \subset M_{\alpha_3}$. Proceeding inductively, we arrive at a properly ascending chain of submodules of $M$ which never stabilizes. This contradicts 1 (note that we have used the axiom of choice in some form).

3 ⇒ 2. Assume 3. Let $L$ be a submodule of $M$, and let $X$ be the family of all finitely generated submodules of $L$. Clearly, $\{0\} \in X$, and so $X$ is a nonempty family. By 3, it has a maximal member $L_0$ (say). We claim that $L_0 = L$. Suppose not. Then there is an $x \in L - L_0$. But then $L_0 + \langle x \rangle$ is also finitely generated, and so it belongs to $X$ and properly contains $L_0$. This contradicts the choice of $L_0$. Thus, $L$ is finitely generated.

2 ⇒ 1. Assume 2. Let

$$M_1 \subseteq M_2 \subseteq \cdots \subseteq M_r \subseteq M_{r+1} \subseteq \cdots$$
be an ascending chain of submodules of $M$. Then $L = \bigcup_{r \ge 1} M_r$ is a submodule (the union of a chain of submodules). By 2, $L$ is finitely generated. Suppose that $L = \langle \{x_1, x_2, \ldots, x_n\} \rangle$. Then $x_i \in M_{r_i}$ for some $r_i$. Let $r_0$ be the maximum of all $r_i$. Then $x_i \in M_{r_0}$ for each $i$. It follows that $L = M_{r_0}$, and so $M_r = M_{r_0}$ for all $r \ge r_0$. ∎

A module $M$ over $R$ is said to be a noetherian module if it satisfies any one, and hence all, of the conditions in the above theorem. A ring $R$ is said to be a noetherian ring if it is a noetherian module over itself. If we consider a ring $R$ as a left module over itself, then its submodules are precisely the left ideals. Thus, a ring $R$ is left noetherian if and only if all its left ideals are finitely generated. Since a submodule of a submodule is again a submodule of the module, we have

Proposition 7.1.2 Every submodule of a noetherian module is a noetherian module. ∎

Proposition 7.1.3 Any homomorphic image of a noetherian module is noetherian.

Proof Let $f: M_1 \to M_2$ be a surjective homomorphism, where $M_1$ is a noetherian module. Let $L$ be a submodule of $M_2$. Then $f^{-1}(L)$ is a submodule of $M_1$. Since $M_1$ is noetherian, $f^{-1}(L)$ is finitely generated. Since the image of a finitely generated module under a homomorphism is finitely generated, $L = f(f^{-1}(L))$ (as $f$ is surjective) is finitely generated. Thus, every submodule of $M_2$ is finitely generated. Hence $M_2$ is a noetherian module. ∎

The argument used in the proof of the above proposition is valid for rings (the inverse image of an ideal under a ring homomorphism is an ideal), and so we have the following proposition.

Proposition 7.1.4 Any homomorphic image of a noetherian ring is noetherian. ∎

Corollary 7.1.5 A quotient of a noetherian module (ring) is a noetherian module (ring). ∎

Proposition 7.1.6 Let $M$ be a module over a ring $R$, and let $L$ be a submodule of $M$ such that $L$ and $M/L$ are noetherian. Then $M$ is noetherian.

Proof Let $U$ be a submodule of $M$. Then $(U + L)/L$, being a submodule of $M/L$, is finitely generated. By the second isomorphism theorem, $U/(U \cap L)$ is isomorphic to $(U + L)/L$. Hence $U/(U \cap L)$ is finitely generated. Further, $U \cap L$, being a submodule of the noetherian module $L$, is noetherian. Hence $U \cap L$ is finitely generated. We know that if $N$ and $M/N$ are finitely generated, then $M$ is also finitely generated; applying this to the submodule $U \cap L$ of $U$, we see that $U$ is finitely generated. This shows that $M$ is noetherian. ∎

Proposition 7.1.7 Let M1, M2,...,Mr be modules over R. Then M = M1 × M2 × ···×Mr is noetherian if and only if each Mi is noetherian.

Proof For each $i$, the projection $p_i$ is a surjective homomorphism from $M$ to $M_i$. Since a homomorphic image of a noetherian module is noetherian, if $M$ is noetherian, then each $M_i$ is noetherian. Conversely, suppose that each $M_i$ is noetherian. We have to show that $M$ is noetherian. By induction, it is sufficient to prove the result for $r = 2$. Suppose that $M_1$ and $M_2$ are noetherian. The projection $p_2$ of $M$ onto $M_2$ is a surjective homomorphism whose kernel is $M_1 \times \{0\}$. By the fundamental theorem of homomorphism, $M/(M_1 \times \{0\})$ is isomorphic to $M_2$. Thus, $M/(M_1 \times \{0\})$ is noetherian. Also, $M_1 \times \{0\}$ is isomorphic to $M_1$ (the map $x \mapsto (x, 0)$ is an isomorphism), and so it is noetherian. By the previous proposition, $M$ is noetherian. ∎

Theorem 7.1.8 Let $R$ be a noetherian ring. Then every finitely generated module over $R$ is a noetherian module.

Proof Let $R$ be a noetherian ring, and $M$ a finitely generated module over $R$ which is generated by $S = \{x_1, x_2, \ldots, x_r\}$. Define a map $\eta$ from $R^r$ to $M$ by

η(α1,α2,...,αr ) = α1x1 + α2x2 +···+αr xr .

Clearly, $\eta$ is a surjective homomorphism. Since $R$ is noetherian, by the previous result, $R^r$ is also noetherian. Since a homomorphic image of a noetherian module is noetherian, $M$ is noetherian. ∎

Remark 7.1.9 It follows that $R$ is noetherian if and only if every finitely generated module over $R$ is noetherian.

Example 7.1.10 Every P.I.D. is a noetherian ring, because every ideal is generated by a single element. Thus, every finitely generated module over a P.I.D. is noetherian. In particular, every submodule of a finitely generated module over a P.I.D. is finitely generated. Note that this is not true over an arbitrary ring (give an example). In turn, every subgroup of a finitely generated abelian group (a $\mathbb{Z}$-module) is finitely generated.
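For $R = \mathbb{Z}$ the ascending chain condition can be watched directly. The ideal $\langle a_1, \ldots, a_k \rangle$ equals $d_k\mathbb{Z}$ with $d_k = \gcd(a_1, \ldots, a_k)$, and each $d_{k+1}$ divides $d_k$, so a properly ascending chain would require an infinite strictly decreasing sequence of positive divisors. The Python sketch below (the function names and the sample sequence are illustrative choices, not from the text) computes the generators $d_k$ of the chain $\langle a_1 \rangle \subseteq \langle a_1, a_2 \rangle \subseteq \cdots$ and the position from which the computed part of the chain is constant.

```python
from math import gcd


def chain_generators(seq):
    """Return d_1, d_2, ..., where <a_1, ..., a_k> = d_k Z is the k-th ideal of the chain."""
    generators = []
    d = 0
    for a in seq:
        d = gcd(d, a)  # <d, a> = gcd(d, a) Z, since Z is a P.I.D.
        generators.append(d)
    return generators


def stabilization_index(generators):
    """Smallest 0-based index k0 such that the (finite) list is constant from position k0 on."""
    k0 = len(generators) - 1
    while k0 > 0 and generators[k0 - 1] == generators[k0]:
        k0 -= 1
    return k0


gens = chain_generators([360, 84, 90, 35, 1000])
print(gens)                       # [360, 12, 6, 1, 1]
print(stabilization_index(gens))  # 3
```

Each entry divides the previous one, which is exactly the statement that the ideals $d_k\mathbb{Z}$ ascend.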

Example 7.1.11 The polynomial ring $\mathbb{Z}[X_1, X_2, \ldots, X_n, \ldots]$ over $\mathbb{Z}$ in a countably infinite set $\{X_1, X_2, \ldots, X_n, \ldots\}$ of indeterminates is not a noetherian ring, for the ideal generated by $\{X_1, X_2, \ldots, X_n, \ldots\}$ is not finitely generated. This also shows that, in general, a submodule of a finitely generated module need not be finitely generated. Check whether it is a U.F.D.

Example 7.1.12 A subring of a noetherian ring need not be a noetherian ring: the ring $\mathbb{Z}[X_1, X_2, \ldots, X_n, \ldots]$ is an integral domain which is not noetherian. However, its field of fractions, being a field, is noetherian.

Example 7.1.13 Let $R$ be a noetherian integral domain. Then every nonzero nonunit element of $R$ can be written as a finite product of irreducible elements of $R$. To prove this, it is sufficient to show that there is no infinite sequence $a_1, a_2, \ldots, a_n, \ldots$ such that $a_{n+1}$ is a proper divisor of $a_n$ for all $n$, or equivalently, that there is no infinite properly ascending chain of principal ideals. This is true, for $R$ is a noetherian ring.

Theorem 7.1.14 (Hilbert Basis Theorem) Let $R$ be a commutative ring with identity. Then the polynomial ring $R[X]$ is noetherian if and only if $R$ is noetherian.

Proof Suppose that $R[X]$ is noetherian. Define a map $\eta$ from $R[X]$ to $R$ by $\eta(f(X)) = f(0)$ (the constant term of $f(X)$). Then $\eta$ is a surjective homomorphism. Since a homomorphic image of a noetherian ring is a noetherian ring, $R$ is noetherian. Conversely, suppose that $R$ is noetherian. We have to show that $R[X]$ is noetherian. Let $A$ be an ideal of $R[X]$. We show that $A$ is finitely generated. For $n \in \mathbb{N} \cup \{0\}$, define $A_n$ by

$$A_n = \{a \in R \mid \text{there exists } f(X) = a_0 + a_1X + \cdots + a_{n-1}X^{n-1} + aX^n \in A\}.$$

Clearly, $0 \in A_n$, for $0 = 0 + 0X + \cdots + 0X^n \in A$. Let $a, b \in A_n$ and $\alpha \in R$. Then there exist $f(X) = a_0 + a_1X + \cdots + a_{n-1}X^{n-1} + aX^n$ and $g(X) = b_0 + b_1X + \cdots + b_{n-1}X^{n-1} + bX^n$ in $A$. Since $A$ is an ideal, $f(X) - g(X) \in A$, and also $\alpha f(X) \in A$. Hence $a - b \in A_n$, and also $\alpha a \in A_n$. This shows that $A_n$ is an ideal of $R$. Further, let $a \in A_n$, and let $f(X) \in A$ be such that $aX^n$ is the leading term of $f(X)$. Since $Xf(X) \in A$, $a \in A_{n+1}$. Thus, $A_n \subseteq A_{n+1}$ for all $n$, and we have an ascending chain
$$A_0 \subseteq A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq \cdots$$
of ideals of $R$. Since $R$ is noetherian, there exists $m \in \mathbb{N}$ such that $A_n = A_m$ for all $n \ge m$. Again, since $R$ is noetherian, $A_n$ is a finitely generated ideal of $R$ for all $n$. Let $\{a_{i1}, a_{i2}, \ldots, a_{in_i}\}$ be a set of generators of the ideal $A_i$, and, for $0 \le i \le m$ and $1 \le j \le n_i$, let $f_{ij}(X)$ be a polynomial in $A$ whose leading term is $a_{ij}X^i$. We show that $S = \{f_{ij}(X) \mid 0 \le i \le m, 1 \le j \le n_i\}$ is a set of generators of the ideal $A$ of $R[X]$. Let $f(X) \in A$. We show that $f(X)$ is a linear combination of members of $S$ with coefficients in $R[X]$. The proof is by induction on the degree of $f(X)$ (clearly, $0$ is a linear combination of members of $S$). If the degree of $f(X)$ is 0, then $f(X)$ is constant, and so it belongs to $A_0$. But then it is a linear combination of $\{a_{01}, a_{02}, \ldots, a_{0n_0}\}$ with coefficients in $R$. Clearly, $a_{0j} = f_{0j}$, and so in this case $f(X)$ is a linear combination of members of $S$ with coefficients in $R \subset R[X]$. Thus, the result is true if the degree of $f(X)$ is 0. Assume that the result is true for all those polynomials in $A$ whose degree is less than $r$. Let $f(X) \in A$ be of degree $r$. There are two cases:

(i) $r \ge m$.
(ii) $r \le m - 1$.

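Consider the case (i). Let $f(X) = a_0 + a_1X + \cdots + a_rX^r$ be a member of $A$ of degree $r$. Then $a_r \in A_r = A_m$. Since $A_m$ is generated by $\{a_{m1}, a_{m2}, \ldots, a_{mn_m}\}$, we have $a_r = \alpha_{m1}a_{m1} + \alpha_{m2}a_{m2} + \cdots + \alpha_{mn_m}a_{mn_m}$ for some $\alpha_{m1}, \alpha_{m2}, \ldots, \alpha_{mn_m}$ in $R$. Then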

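$$f_1(X) = f(X) - X^{r-m}\left(\alpha_{m1}f_{m1}(X) + \alpha_{m2}f_{m2}(X) + \cdots + \alpha_{mn_m}f_{mn_m}(X)\right)$$
is a member of $A$, and it is either 0 or of degree less than $r$. By the induction hypothesis, $f_1(X)$ is a linear combination of members of $S$ with coefficients in $R[X]$. In turn, $f(X)$ is also a linear combination of members of $S$ with coefficients in $R[X]$. Consider the case (ii). In this case $r \le m - 1$, and so $f_{rj}(X)$, $1 \le j \le n_r$, are in $S$. Since $a_r \in A_r$, we can write $a_r = \alpha_{r1}a_{r1} + \alpha_{r2}a_{r2} + \cdots + \alpha_{rn_r}a_{rn_r}$ with $\alpha_{rj} \in R$. Now,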

$$f_1(X) = f(X) - \alpha_{r1}f_{r1}(X) - \alpha_{r2}f_{r2}(X) - \cdots - \alpha_{rn_r}f_{rn_r}(X)$$
belongs to $A$, and it is either 0 or of degree less than $r$. Again, by the induction hypothesis, $f_1(X)$ is a linear combination of members of $S$. Hence, $f(X)$ is also a linear combination of members of $S$ with coefficients in $R[X]$. ∎
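The degree-lowering step of case (i) is ordinary polynomial arithmetic. The Python sketch below takes $R = \mathbb{Z}$ and two hypothetical polynomials $g_1 = 1 + 2X^2$ and $g_2 = X + 3X^2$ in the role of $f_{m1}, f_{m2}$ with $m = 2$; since $1 = (-1)\cdot 2 + 1 \cdot 3$, the leading coefficient of $f = 5 + X^3$ lies in the ideal generated by their leading coefficients. It illustrates only the arithmetic of the step and does not check that $f$ lies in the ideal $A$. Polynomials are stored as coefficient lists, lowest degree first.

```python
def trim(p):
    """Remove trailing zero coefficients; coefficients are listed from degree 0 upward."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def degree(p):
    """Degree of p; the zero polynomial (empty list) is given degree -1."""
    return len(trim(p)) - 1


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)])


def scale(c, p):
    return trim([c * a for a in p])


def shift(p, k):
    """Multiply p by X^k."""
    return trim([0] * k + list(p))


def reduction_step(f, gens, alphas):
    """Return f - X^(r-m) (alpha_1 g_1 + ... + alpha_k g_k), with r = deg f and deg g_j = m."""
    r, m = degree(f), degree(gens[0])
    assert r >= m and all(degree(g) == m for g in gens)
    # The alphas must express the leading coefficient of f via those of the g_j.
    assert sum(a * trim(g)[-1] for a, g in zip(alphas, gens)) == trim(f)[-1]
    combo = []
    for a, g in zip(alphas, gens):
        combo = add(combo, scale(a, g))
    return add(f, scale(-1, shift(combo, r - m)))


f1 = reduction_step([5, 0, 0, 1], [[1, 0, 2], [0, 1, 3]], [-1, 1])
print(f1, degree(f1))  # [5, 1, -1] 2, i.e. f1 = 5 + X - X^2
```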

By induction on $n$, we get the following corollary.

Corollary 7.1.15 If $R$ is a noetherian ring, then $R[X_1, X_2, \ldots, X_n]$ is also noetherian. ∎

Remark 7.1.16 Although in a noetherian domain every nonzero nonunit element is a product of irreducible elements, a noetherian domain need not be a U.F.D. For example, consider $\mathbb{Z}[\sqrt{-5}]$. This is not a U.F.D. Now, $\mathbb{Z}[X]$ is a noetherian ring (by the Hilbert basis theorem), and the map $f(X) \mapsto f(\sqrt{-5})$ is a surjective homomorphism of rings (verify). Since a homomorphic image of a noetherian ring is noetherian, $\mathbb{Z}[\sqrt{-5}]$ is a noetherian ring. Also observe that a U.F.D. need not be a noetherian ring. For example, $\mathbb{Z}[X_1, X_2, \ldots, X_n, \ldots]$ is a U.F.D., but it is not noetherian.

Exercises

7.1.1 Show that a noetherian domain $R$ is a U.F.D. if and only if any two elements of $R$ have a g.c.d. in $R$.

7.1.2 Show that every proper ideal of a noetherian ring is contained in a maximal ideal.

7.1.3 Give an example of an integral domain in which every nonzero nonunit element is expressible as a product of irreducible elements, but which is not a noetherian ring.

7.1.4 Is a direct product of noetherian rings a noetherian ring? Support.

7.1.5 Show that a vector space is noetherian if and only if it is finite dimensional.

7.1.6 Show that $\mathbb{Z}[\sqrt{n}]$ is a noetherian ring for all integers $n$.

7.1.7 Show that an abelian group is a noetherian Z-module if and only if it is finitely generated.

7.1.8 Give an example of a noetherian module which does not satisfy D.C.C. for submodules.

7.1.9 Let $G$ be a finite commutative semigroup with identity. Show that the semigroup ring $R(G)$ is noetherian if and only if $R$ is noetherian.

7.1.10 Suppose that $R$ is noetherian, and $G$ is a group which is also noetherian in the sense that every subgroup of $G$ is finitely generated. Is $R(G)$ also noetherian? Support.

7.1.11 Show that if R is noetherian, then the ring R[[X]] of formal power series is also noetherian. Hint. Imitate the proof of the Hilbert basis theorem by taking order function on the power series instead of degree of a polynomial.

7.2 Free, Projective, and Injective Modules

In the last section, we described rings over which all finitely generated modules possess one of the most important and crucial properties of vector spaces (modules over fields), viz., all their submodules are finitely generated. The following are two other important properties of vector spaces: (i) Given a vector space $W$ over a field $F$ and a surjective homomorphism $f$ from a vector space $V$ over $F$ to $W$, there is a vector space homomorphism $t$ from $W$ to $V$ such that $f \circ t = I_W$. (ii) Given an injective homomorphism $i$ from $W$ to $V$, there is a homomorphism $s$ from $V$ to $W$ such that $s \circ i = I_W$. In this section, we discuss modules over arbitrary rings with these crucial properties. Later, in Chap. 9 on representation theory of finite groups, we shall describe rings (semi-simple rings) over which all modules have both of these properties.

Let $R$ be a ring (not necessarily commutative) with identity, and $X$ a set. We have the following universal problem: "Does there exist a pair $(M, i)$, where $M$ is a left $R$-module and $i$ a map from $X$ to $M$, with the property that, given any such pair $(N, j)$, there is a unique $R$-homomorphism $\varphi$ from $M$ to $N$ such that $\varphi \circ i = j$?"

As in the case of free groups, the solution to this problem is unique up to isomorphism. More precisely, if $(M, i)$ and $(N, j)$ are solutions to the above problem, then there exists an isomorphism $\varphi$ from $M$ to $N$ such that the triangle formed by $i: X \to M$, $j: X \to N$, and $\varphi: M \to N$ is commutative, that is,
$$\varphi \circ i = j$$
(imitate the proof in the case of free groups).

We now show that a solution to the above problem exists. If $X = \{x_1, x_2, \ldots, x_n\}$ is a finite set containing $n$ elements, then the pair $(R^n, i)$, where $i(x_j) = e_j$, is a solution to the problem. We construct a solution for an arbitrary set $X$. Let $F(X)$ denote the set of all maps from $X$ to $R$ which vanish at all but finitely many points of $X$. More precisely,
$$F(X) = \{f: X \to R \mid \text{there is a finite subset } J \text{ of } X \text{ such that } f(x) = 0 \text{ for all } x \in X - J\}.$$
Let $f, g \in F(X)$. Define a map $f + g$ from $X$ to $R$ by $(f + g)(x) = f(x) + g(x)$. Observe that $f + g \in F(X)$. This defines a binary operation $+$ on $F(X)$ such that $(F(X), +)$ is an abelian group. Define a multiplication $\cdot$ of elements of $F(X)$ by elements of $R$ by $(a \cdot f)(x) = a \cdot f(x)$, where the $\cdot$ on the right-hand side is the product in the ring. It is easy to see that $F(X)$ is a left $R$-module. Define a map $i$ from $X$ to $F(X)$ by $i(x)(y) = 1$ if $x = y$, and $0$ otherwise. Clearly, $i$ is an injective map. We show that $(F(X), i)$ is a solution to the above problem. Let $f \in F(X) - \{0\}$, and let $\{x_1, x_2, \ldots, x_n\}$ be the finite subset of $X$ such that $f(x_i) = a_i \ne 0$ and $f(x) = 0$ for $x \notin \{x_1, \ldots, x_n\}$. Then $f = a_1i(x_1) + a_2i(x_2) + \cdots + a_ni(x_n)$, and such a representation is unique. More precisely, $i(X)$ is a basis for $F(X)$. Let $N$ be a left $R$-module, and $j$ a map from $X$ to $N$. Define a map $\varphi$ from $F(X)$ to $N$ by

$$\varphi(a_1i(x_1) + a_2i(x_2) + \cdots + a_ni(x_n)) = a_1j(x_1) + a_2j(x_2) + \cdots + a_nj(x_n).$$

Then $\varphi$ is a homomorphism such that $\varphi \circ i = j$. Since $i(X)$ is a basis, such a map is unique. This completes the proof of the existence of a solution to the above problem.

Definition 7.2.1 The solution to the above universal problem is called the free left $R$-module on $X$. Thus, $(F(X), i)$ is the free left $R$-module on $X$.

Proposition 7.2.2 Every left $R$-module is a quotient of a free left $R$-module.

Proof Let $M$ be a left $R$-module, and $(F(M), i)$ the free left $R$-module on $M$. The identity map $I_M$ is a map from the set $M$ to the left $R$-module $M$. From the universal property of the free left $R$-module, there is a unique homomorphism $\varphi$ from $F(M)$ to $M$ such that $\varphi \circ i = I_M$. This shows that $\varphi$ is a surjective homomorphism. By the fundamental theorem of homomorphism, $M$ is isomorphic to $F(M)/\ker\varphi$. ∎

In Chap. 6 Sect. 6.1, we defined the direct sum of finitely many $R$-modules. Now, we define the direct sum of an arbitrary family of modules. Let $\{M_\alpha \mid \alpha \in \Lambda\}$ be a family of $R$-modules. Then the Cartesian product
$$\prod_{\alpha \in \Lambda} M_\alpha = \Big\{x: \Lambda \to \bigcup_{\alpha \in \Lambda} M_\alpha \;\Big|\; x(\alpha) \in M_\alpha \text{ for all } \alpha\Big\}$$
is a left $R$-module with respect to the operations defined by $(x + y)(\alpha) = x(\alpha) + y(\alpha)$ and $(a \cdot x)(\alpha) = a \cdot x(\alpha)$. Define a map $i_\alpha$ from $M_\alpha$ to $\prod_{\alpha \in \Lambda} M_\alpha$ by $i_\alpha(x)(\beta) = x$ if $\beta = \alpha$, and $0$ otherwise. Clearly, $i_\alpha$ is an injective homomorphism. Further, the $\alpha$th projection $p_\alpha$ from $\prod_{\alpha \in \Lambda} M_\alpha$ to $M_\alpha$ defined by $p_\alpha(x) = x(\alpha)$ is a surjective homomorphism such that $p_\alpha \circ i_\alpha = I_{M_\alpha}$. The submodule of $\prod_{\alpha \in \Lambda} M_\alpha$ generated by $\bigcup_{\alpha \in \Lambda} i_\alpha(M_\alpha)$ is clearly
$$\Big\{x \in \prod_{\alpha \in \Lambda} M_\alpha \;\Big|\; x(\alpha) = 0 \text{ except for finitely many } \alpha\Big\}.$$

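This submodule is denoted by $\bigoplus_{\alpha \in \Lambda} M_\alpha$, and it is called the external direct sum of the family $\{M_\alpha \mid \alpha \in \Lambda\}$. If $M^\alpha$ denotes the submodule of $\bigoplus_{\alpha \in \Lambda} M_\alpha$ generated by $\bigcup_{\beta \ne \alpha} i_\beta(M_\beta)$, then $i_\alpha(M_\alpha) \cap M^\alpha = \{0\}$.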

Proposition 7.2.3 Let $M$ be a module over a ring $R$, and let $\{M_\alpha \mid \alpha \in \Lambda\}$ be a family of its submodules. Then the following conditions are equivalent.

1. (i) $M$ is generated by $\bigcup_{\alpha \in \Lambda} M_\alpha$.
(ii) $M_\alpha \cap M^\alpha = \{0\}$, where $M^\alpha$ is the submodule of $M$ generated by $\bigcup_{\beta \ne \alpha} M_\beta$.
2. For every nonzero element $x \in M$, there is a unique finite subset $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ of distinct elements of $\Lambda$ together with unique nonzero elements $x_{\alpha_i} \in M_{\alpha_i}$, $1 \le i \le r$, such that
$$x = x_{\alpha_1} + x_{\alpha_2} + \cdots + x_{\alpha_r}.$$

Proof (1 ⇒ 2) Assume 1. Let $x$ be a nonzero element of $M$. By 1(i), there exist a finite subset $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ of $\Lambda$ together with nonzero elements $x_{\alpha_i} \in M_{\alpha_i}$, $1 \le i \le r$, such that
$$x = x_{\alpha_1} + x_{\alpha_2} + \cdots + x_{\alpha_r}.$$

Next, we prove the uniqueness. Suppose that

$$x = x_{\alpha_1} + x_{\alpha_2} + \cdots + x_{\alpha_r} = y_{\beta_1} + y_{\beta_2} + \cdots + y_{\beta_s},$$

where $\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ and $\{\beta_1, \beta_2, \ldots, \beta_s\}$ are sets of distinct elements of $\Lambda$, $x_{\alpha_i} \in M_{\alpha_i} - \{0\}$ for all $i$, $1 \le i \le r$, and $y_{\beta_j} \in M_{\beta_j} - \{0\}$ for all $j$, $1 \le j \le s$. We need to show the following: (i) $r = s$, (ii) after some rearrangement, $\alpha_i = \beta_i$ for all $i$, and (iii) $x_{\alpha_i} = y_{\alpha_i}$ for all $i$. We prove it by induction on $\max(r, s)$. Suppose that $\max(r, s) = 1$. Clearly, $r = 1 = s$. If $\alpha_1 \ne \beta_1$, then $x \in M_{\alpha_1} \cap M^{\alpha_1} = \{0\}$. This means that $x = 0$, a contradiction. Hence $\alpha_1 = \beta_1$, and then $x = x_{\alpha_1} = y_{\beta_1}$. Thus, the result is true for $\max(r, s) = 1$. Assume that the result is true whenever $\max(r, s) \le n$. Let $x$ be a nonzero element having representations

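$$x = x_{\alpha_1} + x_{\alpha_2} + \cdots + x_{\alpha_{n+1}} = y_{\beta_1} + y_{\beta_2} + \cdots + y_{\beta_m},$$
where $\{\alpha_1, \alpha_2, \ldots, \alpha_{n+1}\}$ and $\{\beta_1, \beta_2, \ldots, \beta_m\}$ are sets of distinct elements of $\Lambda$, $m \le n + 1$, $x_{\alpha_i} \in M_{\alpha_i} - \{0\}$ for all $i$, $1 \le i \le n + 1$, and $y_{\beta_j} \in M_{\beta_j} - \{0\}$ for all $j$, $1 \le j \le m$. We show that $\alpha_1 = \beta_j$ for some $j$. Suppose not. Then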

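$$x_{\alpha_1} = -x_{\alpha_2} - \cdots - x_{\alpha_{n+1}} + y_{\beta_1} + y_{\beta_2} + \cdots + y_{\beta_m}$$
belongs to $M_{\alpha_1} \cap M^{\alpha_1} = \{0\}$. Hence $x_{\alpha_1} = 0$. This contradicts the supposition that $x_{\alpha_1} \ne 0$. Thus, $\alpha_1 = \beta_j$ for some $j$. After rearranging, we may assume that $\alpha_1 = \beta_1$. Further,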

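$$x_{\alpha_1} - y_{\alpha_1} = -x_{\alpha_2} - \cdots - x_{\alpha_{n+1}} + y_{\beta_2} + \cdots + y_{\beta_m}.$$
Hence $x_{\alpha_1} - y_{\alpha_1}$ belongs to $M_{\alpha_1} \cap M^{\alpha_1} = \{0\}$. This shows that $x_{\alpha_1} = y_{\alpha_1}$. In turn,
$$x_{\alpha_2} + x_{\alpha_3} + \cdots + x_{\alpha_{n+1}} = y_{\beta_2} + y_{\beta_3} + \cdots + y_{\beta_m}.$$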

By the induction hypothesis, $n + 1 = m$, $\alpha_i = \beta_i$, and $x_{\alpha_i} = y_{\beta_i}$ for all $i$.

(2 ⇒ 1) Assume 2. Evidently, 1(i) follows. From the uniqueness of the representation of a nonzero element in $M$, it follows that $M_\alpha \cap M^\alpha$ cannot contain any nonzero element of $M$. Thus, $M_\alpha \cap M^\alpha = \{0\}$. ∎

Definition 7.2.4 We say that a module $M$ over a ring $R$ is an internal direct sum of the family $\{M_\alpha \mid \alpha \in \Lambda\}$ of its submodules if it satisfies any one, and hence both, of the conditions in the above proposition.
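As a small finite check of Proposition 7.2.3 (an illustrative example, not taken from the text), take the $\mathbb{Z}$-module $\mathbb{Z}_6$ with submodules $M_1 = \{0, 3\}$ and $M_2 = \{0, 2, 4\}$. With only two submodules, $M^1 = M_2$, and condition 2 amounts to every element having exactly one decomposition $x = x_1 + x_2$ with $x_i \in M_i$ (zero components allowed). The Python sketch below verifies both conditions by enumeration.

```python
from itertools import product

n = 6
M1 = {0, 3}      # the submodule 3Z_6
M2 = {0, 2, 4}   # the submodule 2Z_6

# Condition 1(i): M1 and M2 generate Z_6, i.e. M1 + M2 = Z_6.
generates = {(a + b) % n for a, b in product(M1, M2)} == set(range(n))
# Condition 1(ii): M1 ∩ M2 = {0}.
trivial_intersection = (M1 & M2) == {0}
# Condition 2: each x in Z_6 has exactly one decomposition x = x1 + x2 with x1 in M1, x2 in M2.
decompositions = {x: [(a, b) for a, b in product(M1, M2) if (a + b) % n == x] for x in range(n)}
unique = all(len(pairs) == 1 for pairs in decompositions.values())

print(generates, trivial_intersection, unique)  # True True True
```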

Proposition 7.2.5 Let $M$ be an internal direct sum of the family $\{M_\alpha \mid \alpha \in \Lambda\}$ of its submodules. Then $M$ is isomorphic to the external direct sum $\bigoplus_{\alpha \in \Lambda} M_\alpha$.

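Proof The map $\eta$ from the external direct sum $\bigoplus_{\alpha \in \Lambda} M_\alpha$ to $M$ defined by $\eta(x) = \sum_{\alpha \in \Lambda} x(\alpha)$ (a finite sum, since $x(\alpha) = 0$ for all but finitely many $\alpha$) is easily seen to be an isomorphism. ∎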

From now onward, we shall not distinguish between internal and external direct sums. It follows from the construction of the free $R$-module $F(X)$ on a set $X$ that $F(X)$ is isomorphic to the direct sum $\bigoplus_{\alpha \in X} M_\alpha$, where $M_\alpha = R$ for all $\alpha$. Thus, $F(X)$ is precisely the direct sum of $X$ copies of $R$.
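The construction of $F(X)$ translates directly into code: an element is a finitely supported function $X \to R$, stored below as a Python dictionary that keeps only its nonzero values, and the homomorphism $\varphi$ of the universal property is obtained by extending $j$ linearly. In this sketch $R = \mathbb{Z}$, $X = \{u, v, w\}$, $N = \mathbb{Z}^2$, the map $j$, and the function name `basis` (playing the role of $i$) are hypothetical choices made for illustration.

```python
def basis(x):
    """The map i: X -> F(X); i(x) is the function equal to 1 at x and 0 elsewhere."""
    return {x: 1}


def add(f, g):
    """Pointwise sum of finitely supported functions, keeping only nonzero values."""
    h = dict(f)
    for x, a in g.items():
        h[x] = h.get(x, 0) + a
    return {x: a for x, a in h.items() if a != 0}


def smul(a, f):
    """(a . f)(x) = a f(x)."""
    return {x: a * c for x, c in f.items() if a * c != 0}


def induced_hom(j, zero, n_add, n_smul):
    """phi(a_1 i(x_1) + ... + a_n i(x_n)) = a_1 j(x_1) + ... + a_n j(x_n)."""
    def phi(f):
        total = zero
        for x, a in f.items():
            total = n_add(total, n_smul(a, j(x)))
        return total
    return phi


# Hypothetical data: N = Z^2 with componentwise operations, and j: X -> N.
j = {"u": (1, 0), "v": (0, 1), "w": (1, 1)}
vadd = lambda p, q: (p[0] + q[0], p[1] + q[1])
vsmul = lambda a, p: (a * p[0], a * p[1])
phi = induced_hom(j.__getitem__, (0, 0), vadd, vsmul)

f = add(smul(2, basis("u")), smul(-3, basis("w")))  # 2 i(u) - 3 i(w)
g = add(basis("v"), basis("w"))                     # i(v) + i(w)
print(phi(f))                                       # (-1, -3)
print(phi(add(f, g)) == vadd(phi(f), phi(g)))       # True
```

Storing only nonzero values mirrors the finite-support condition in the definition of $F(X)$.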

Consider a chain

$$\cdots \longrightarrow M_{n+1} \xrightarrow{\alpha_{n+1}} M_n \xrightarrow{\alpha_n} M_{n-1} \longrightarrow \cdots$$

where $M_n$ is an $R$-module and $\alpha_n$ is a homomorphism for all $n$. This chain is said to be exact at $M_n$ if $\ker\alpha_n = \operatorname{image}\alpha_{n+1}$. It is said to be an exact sequence if it is exact at every $M_n$. An exact sequence
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0,$$
where $0$ is the trivial module, is called a short exact sequence. Clearly, the above sequence is a short exact sequence if and only if (i) $\alpha$ is injective, (ii) $\beta$ is surjective, and (iii) $\ker\beta = \operatorname{image}\alpha$. If $N$ is a submodule of a module $M$, then
$$0 \longrightarrow N \xrightarrow{i} M \xrightarrow{\nu} M/N \longrightarrow 0$$
is a short exact sequence, where $i$ is the inclusion map and $\nu$ is the quotient map. The sequence $0 \to M_1 \to M_2$ is exact if and only if $M_1 \to M_2$ is injective. The sequence $M_2 \to M_3 \to 0$ is exact if and only if $M_2 \to M_3$ is surjective, and the sequence $0 \to M_1 \to M_2 \to 0$ is exact if and only if $M_1 \to M_2$ is an isomorphism.

Theorem 7.2.6 (Five Lemma) Consider the following commutative diagram, where the rows are exact and the vertical maps are homomorphisms.

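$$\begin{array}{ccccccccc}
M_1 & \xrightarrow{\alpha_1} & M_2 & \xrightarrow{\alpha_2} & M_3 & \xrightarrow{\alpha_3} & M_4 & \xrightarrow{\alpha_4} & M_5 \\
\downarrow{\scriptstyle f_1} & & \downarrow{\scriptstyle f_2} & & \downarrow{\scriptstyle f_3} & & \downarrow{\scriptstyle f_4} & & \downarrow{\scriptstyle f_5} \\
N_1 & \xrightarrow{\beta_1} & N_2 & \xrightarrow{\beta_2} & N_3 & \xrightarrow{\beta_3} & N_4 & \xrightarrow{\beta_4} & N_5
\end{array}$$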

(i) If $f_1$ is surjective and $f_2$, $f_4$ are injective, then $f_3$ is injective.
(ii) If $f_5$ is injective and $f_2$, $f_4$ are surjective, then $f_3$ is surjective.
(iii) If $f_1$, $f_2$, $f_4$, $f_5$ are isomorphisms, then $f_3$ is also an isomorphism.

Proof (i). Suppose that $f_1$ is surjective, and $f_2$ and $f_4$ are injective. We have to show that $f_3$ is injective. Suppose that $f_3(m) = 0$. Then $f_4(\alpha_3(m)) = \beta_3(f_3(m))$ (commutativity of the diagram) $= \beta_3(0) = 0$. Since $f_4$ is injective, $\alpha_3(m) = 0$. Thus, $m \in \ker\alpha_3 = \operatorname{image}\alpha_2$ (exactness), and hence there is an element $m_2 \in M_2$ such that $\alpha_2(m_2) = m$. Further, $0 = f_3(m) = f_3(\alpha_2(m_2)) = \beta_2(f_2(m_2))$ (commutativity of the diagram). Thus, $f_2(m_2) \in \ker\beta_2 = \operatorname{image}\beta_1$ (exactness). Hence, there exists $n_1 \in N_1$ such that $\beta_1(n_1) = f_2(m_2)$. Since $f_1$ is surjective, there is an element $m_1 \in M_1$ such that $f_1(m_1) = n_1$. Now, $f_2(\alpha_1(m_1)) = \beta_1(f_1(m_1)) = \beta_1(n_1) = f_2(m_2)$. Since $f_2$ is injective, $\alpha_1(m_1) = m_2$. But already $\alpha_2(m_2) = m$. Hence $m = \alpha_2(\alpha_1(m_1)) = 0$, for $\operatorname{image}\alpha_1 = \ker\alpha_2$. This shows that $f_3$ is injective.

(ii). Suppose that $f_5$ is injective, and $f_2$ and $f_4$ are surjective. We have to show that $f_3$ is surjective. Let $n \in N_3$. We have to show the existence of an element $m \in M_3$ such that $f_3(m) = n$. Now, $\beta_3(n) \in N_4$. Since $f_4$ is surjective, there is an element $m_4 \in M_4$ such that $f_4(m_4) = \beta_3(n)$. Now $f_5(\alpha_4(m_4)) = \beta_4(f_4(m_4))$ (commutativity of the diagram) $= \beta_4(\beta_3(n)) = 0$ (exactness). Since $f_5$ is injective, $\alpha_4(m_4) = 0$. Thus, $m_4 \in \ker\alpha_4 = \operatorname{image}\alpha_3$. Hence, there is an element $m_3 \in M_3$ such that $\alpha_3(m_3) = m_4$. Since $\beta_3(f_3(m_3)) = f_4(\alpha_3(m_3)) = f_4(m_4) = \beta_3(n)$, we get $\beta_3(n - f_3(m_3)) = 0$. Thus, $n - f_3(m_3) \in \ker\beta_3 = \operatorname{image}\beta_2$. Hence there exists $n_2 \in N_2$ such that $\beta_2(n_2) = n - f_3(m_3)$. Since $f_2$ is surjective, there is an element $m_2 \in M_2$ such that $f_2(m_2) = n_2$. Now $n - f_3(m_3) = \beta_2(n_2) = \beta_2(f_2(m_2)) = f_3(\alpha_2(m_2))$. This shows that $f_3(m_3 + \alpha_2(m_2)) = n$, and so $f_3$ is surjective.

(iii). Follows from (i) and (ii). ∎

Remark 7.2.7 The technique used in the proof of the above theorem is known as diagram chasing.

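Corollary 7.2.8 Consider the following commutative diagram
$$\begin{array}{ccccccccc}
0 & \to & M_1 & \to & M_2 & \to & M_3 & \to & 0 \\
 & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \to & N_1 & \to & N_2 & \to & N_3 & \to & 0
\end{array}$$
where the rows are exact, the vertical arrows are homomorphisms, and the extreme vertical arrows are isomorphisms. Then the middle vertical arrow is also an isomorphism. ∎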

A short exact sequence
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0$$
is said to be a split exact sequence if there exists a homomorphism $t$ from $M_3$ to $M_2$ such that $\beta \circ t = I_{M_3}$. The homomorphism $t$ is called a splitting of the exact sequence.
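These definitions can be checked by brute force on small finite modules. The Python sketch below (an illustrative example, not from the text) verifies that $0 \to \mathbb{Z}_2 \xrightarrow{\alpha} \mathbb{Z}_4 \xrightarrow{\beta} \mathbb{Z}_2 \to 0$, with $\alpha(x) = 2x$ and $\beta(y) = y \bmod 2$, is a short exact sequence, and then shows that it does not split: a homomorphism $t: \mathbb{Z}_2 \to \mathbb{Z}_4$ is determined by $k = t(1)$ subject to $2k = 0$ in $\mathbb{Z}_4$, and no such $k$ satisfies $\beta(k) = 1$.

```python
def is_hom(h, dom_n, cod_n):
    """Check that h: Z_dom_n -> Z_cod_n is additive."""
    return all(h((a + b) % dom_n) == (h(a) + h(b)) % cod_n
               for a in range(dom_n) for b in range(dom_n))


alpha = lambda x: (2 * x) % 4   # Z_2 -> Z_4
beta = lambda y: y % 2          # Z_4 -> Z_2

image_alpha = {alpha(x) for x in range(2)}
kernel_beta = {y for y in range(4) if beta(y) == 0}

injective = len(image_alpha) == 2
surjective = {beta(y) for y in range(4)} == {0, 1}
exact_at_middle = image_alpha == kernel_beta

# Candidate splittings t: Z_2 -> Z_4 are determined by k = t(1), subject to 2k = 0 in Z_4.
candidates = [k for k in range(4) if (2 * k) % 4 == 0]
splittings = [k for k in candidates if beta(k) == 1]  # we would need beta(t(1)) = 1

print(is_hom(alpha, 2, 4), is_hom(beta, 4, 2))  # True True
print(injective, surjective, exact_at_middle)   # True True True
print(candidates, splittings)                   # [0, 2] []
```

This is consistent with the fact that $\mathbb{Z}_4$ has an element of order 4, whereas $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ does not.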

Proposition 7.2.9 A short exact sequence
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0$$
is split exact if and only if there exists a homomorphism $s$ from $M_2$ to $M_1$ such that $s \circ \alpha = I_{M_1}$. Further, there is a bijective correspondence between the set of splittings of the short exact sequence and the set of all homomorphisms $s$ from $M_2$ to $M_1$ satisfying $s \circ \alpha = I_{M_1}$.

Proof Let $t$ be a splitting. Then $\beta \circ t = I_{M_3}$. Let $x \in M_2$. Then $\beta(x - t(\beta(x))) = \beta(x) - \beta(t(\beta(x))) = \beta(x) - \beta(x) = 0$. Hence $x - t(\beta(x)) \in \ker\beta = \operatorname{image}\alpha$. Since $\alpha$ is injective, there is a unique $s(x) \in M_1$ such that $\alpha(s(x)) = x - t(\beta(x))$. Using the defining property of $s$ and the injectivity of $\alpha$, it can be seen that $s$ is a homomorphism from $M_2$ to $M_1$. Also, $\alpha(s(\alpha(y))) = \alpha(y) - t(\beta(\alpha(y))) = \alpha(y)$ (for $\beta \circ \alpha = 0$). Since $\alpha$ is injective, $s(\alpha(y)) = y$ for all $y \in M_1$. Hence $s \circ \alpha = I_{M_1}$. We show that the correspondence which associates to a splitting $t$ the homomorphism $s$ defined above is bijective. Suppose that $t_1$ and $t_2$ are splittings which are associated to the same $s$. Then $\alpha(s(x)) = x - t_1(\beta(x)) = x - t_2(\beta(x))$ for all $x \in M_2$. Since $\beta$ is surjective, $t_1 = t_2$. Next, let $s$ be a homomorphism from $M_2$ to $M_1$ such that $s \circ \alpha = I_{M_1}$. Let $y \in M_3$. Since $\beta$ is surjective, there is an element $x \in M_2$ such that $\beta(x) = y$. Define a binary relation $t$ from $M_3$ to $M_2$ by $t(\beta(x)) = x - \alpha(s(x))$. Suppose that $\beta(x_1) = \beta(x_2)$. Then $x_1 - x_2 \in \ker\beta = \operatorname{image}\alpha$. Hence there exists $z \in M_1$ such that $\alpha(z) = x_1 - x_2$. Now, $s(x_1 - x_2) = s(\alpha(z)) = z$. Hence $\alpha(s(x_1 - x_2)) = \alpha(z) = x_1 - x_2$. Thus, $x_1 - \alpha(s(x_1)) = x_2 - \alpha(s(x_2))$. This shows that $t$ is a map from $M_3$ to $M_2$. It can easily be seen that $t$ is a homomorphism. Also, $\beta(t(\beta(x))) = \beta(x - \alpha(s(x))) = \beta(x) - \beta(\alpha(s(x))) = \beta(x)$, for $\beta \circ \alpha = 0$. Thus, $t$ is a splitting, and since $x - t(\beta(x)) = \alpha(s(x))$, the homomorphism from $M_2$ to $M_1$ associated to the splitting $t$ is $s$. ∎

A homomorphism $s$ such that $s \circ \alpha = I_{M_1}$ is also called a splitting. Let $M_1$ and $M_3$ be $R$-modules. Then

$$0 \longrightarrow M_1 \xrightarrow{i_1} M_1 \oplus M_3 \xrightarrow{p_2} M_3 \longrightarrow 0$$
is a split exact sequence, where $i_1(x) = (x, 0)$ and $p_2(x, y) = y$. The map $i_2$ from $M_3$ to $M_1 \oplus M_3$ given by $i_2(y) = (0, y)$ is a splitting, and the associated splitting from $M_1 \oplus M_3$ to $M_1$ is the first projection $p_1$.

Proposition 7.2.10 Let
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0$$

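be a split exact sequence. Then there exists an isomorphism $f$ from $M_2$ to $M_1 \oplus M_3$ such that the diagram
$$\begin{array}{ccccccccc}
0 & \to & M_1 & \xrightarrow{\alpha} & M_2 & \xrightarrow{\beta} & M_3 & \to & 0 \\
 & & \downarrow{\scriptstyle I_{M_1}} & & \downarrow{\scriptstyle f} & & \downarrow{\scriptstyle I_{M_3}} & & \\
0 & \to & M_1 & \xrightarrow{i_1} & M_1 \oplus M_3 & \xrightarrow{p_2} & M_3 & \to & 0
\end{array}$$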

is commutative. Further, if $t$ and $s$ are associated splittings, then
$$0 \longrightarrow M_3 \xrightarrow{t} M_2 \xrightarrow{s} M_1 \longrightarrow 0$$
is also a split exact sequence.

Proof Let $t$ be a homomorphism from $M_3$ to $M_2$ which is a splitting, and $s$ the associated splitting. Define a map $f$ from $M_2$ to $M_1 \oplus M_3$ by $f(x) = (s(x), \beta(x))$. Then $f$ is a homomorphism which makes the diagram commutative (verify). By the five lemma, $f$ is an isomorphism. Finally, since $f$ is an isomorphism, $t = f^{-1} \circ i_2$ and $s = p_1 \circ f$. The result follows if we observe that
$$0 \longrightarrow M_3 \xrightarrow{i_2} M_1 \oplus M_3 \xrightarrow{p_1} M_1 \longrightarrow 0$$

is split exact. 
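For a split example, the Python sketch below (again an illustrative choice, not from the text) takes $0 \to \mathbb{Z}_2 \xrightarrow{\alpha} \mathbb{Z}_6 \xrightarrow{\beta} \mathbb{Z}_3 \to 0$ with $\alpha(x) = 3x$, $\beta(y) = y \bmod 3$, and the splitting $s(y) = y \bmod 2$. It recovers the associated splitting $t$ from the formula $t(\beta(x)) = x - \alpha(s(x))$ in the proof of Proposition 7.2.9, and checks that $f(x) = (s(x), \beta(x))$ is an isomorphism $\mathbb{Z}_6 \to \mathbb{Z}_2 \oplus \mathbb{Z}_3$ making the diagram of Proposition 7.2.10 commute.

```python
alpha = lambda x: (3 * x) % 6   # Z_2 -> Z_6
beta = lambda y: y % 3          # Z_6 -> Z_3
s = lambda y: y % 2             # Z_6 -> Z_2

s_splits = all(s(alpha(x)) == x for x in range(2))  # s o alpha = identity on Z_2

# t(beta(x)) = x - alpha(s(x)); collect all values to confirm that t is well defined.
t_values = {z: {(x - alpha(s(x))) % 6 for x in range(6) if beta(x) == z} for z in range(3)}
t_well_defined = all(len(v) == 1 for v in t_values.values())
t = {z: next(iter(v)) for z, v in t_values.items()}
t_splits = all(beta(t[z]) == z for z in range(3))    # beta o t = identity on Z_3

f = lambda y: (s(y), beta(y))                       # Z_6 -> Z_2 (+) Z_3
f_hom = all(f((a + b) % 6) == ((f(a)[0] + f(b)[0]) % 2, (f(a)[1] + f(b)[1]) % 3)
            for a in range(6) for b in range(6))
f_bijective = len({f(y) for y in range(6)}) == 6
diagram_commutes = (all(f(alpha(x)) == (x, 0) for x in range(2))
                    and all(f(y)[1] == beta(y) for y in range(6)))

print(t)  # {0: 0, 1: 4, 2: 2}
print(s_splits, t_well_defined, t_splits, f_hom, f_bijective, diagram_commutes)
# True True True True True True
```

Here $f$ is the familiar Chinese remainder isomorphism $y \mapsto (y \bmod 2, y \bmod 3)$.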

Let $M$ and $N$ be left $R$-modules. Let $\operatorname{Hom}_R(M, N)$ denote the set of all $R$-homomorphisms from $M$ to $N$. Let $f, g \in \operatorname{Hom}_R(M, N)$. Define a map $f + g$ from $M$ to $N$ by $(f + g)(x) = f(x) + g(x)$. It is easy to observe that $f + g$ is also a member of $\operatorname{Hom}_R(M, N)$. This defines an addition in $\operatorname{Hom}_R(M, N)$ with respect to which it is an abelian group. We may be tempted to define a module structure on $\operatorname{Hom}_R(M, N)$ by defining $(a \cdot f)(x) = a \cdot f(x)$. But $a \cdot f$ need not be a member of $\operatorname{Hom}_R(M, N)$ (for $(a \cdot f)(bx) = abf(x)$, whereas $b \cdot (a \cdot f)(x) = baf(x)$), and so this does not work in general. However, if $R$ is a commutative ring, then it is indeed a member of $\operatorname{Hom}_R(M, N)$, and then $\operatorname{Hom}_R(M, N)$ becomes an $R$-module. Note that every $R$-module is also a $\mathbb{Z}$-module, and $\operatorname{Hom}_R(M, N)$ is a subgroup of $\operatorname{Hom}_{\mathbb{Z}}(M, N)$. Let $f \in \operatorname{Hom}_{\mathbb{Z}}(M, N)$ and $a \in R$. Define a map $f \cdot a$ from $M$ to $N$ by $(f \cdot a)(x) = f(a \cdot x)$. Clearly, $f \cdot a \in \operatorname{Hom}_{\mathbb{Z}}(M, N)$. It is easy to observe that $\operatorname{Hom}_{\mathbb{Z}}(M, N)$ is a right $R$-module with respect to the above right multiplication. Also, if $R$ is a commutative ring, then $\operatorname{Hom}_R(M, N)$ is a right $R$-submodule of $\operatorname{Hom}_{\mathbb{Z}}(M, N)$. Let $M_1$ and $M_2$ be left $R$-modules, and $\alpha$ an $R$-homomorphism from $M_1$ to $M_2$. Let

$N$ be a left $R$-module. Then we have a map $\alpha^*$ from $\operatorname{Hom}_R(M_2, N)$ to $\operatorname{Hom}_R(M_1, N)$ defined by $\alpha^*(f) = f \circ \alpha$. It can easily be seen that $\alpha^*$ is a group homomorphism. Similarly, we have a group homomorphism $\alpha_*$ from $\operatorname{Hom}_R(N, M_1)$ to $\operatorname{Hom}_R(N, M_2)$ given by $\alpha_*(f) = \alpha \circ f$. Let $\beta$ be a homomorphism from $M_2$ to $M_3$. We leave it to the reader to verify that (i) $(\beta \circ \alpha)^* = \alpha^* \circ \beta^*$, and (ii) $(\beta \circ \alpha)_* = \beta_* \circ \alpha_*$. It is also clear that $0^*$ and $0_*$ are the corresponding zero homomorphisms. Further, it is straightforward to see that $(I_M)^*$ and $(I_M)_*$ are the corresponding identity maps.

Theorem 7.2.11 Hom is a left exact functor in the following sense. (i) If
$$M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0$$

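is an exact sequence of left $R$-modules, and $N$ a left $R$-module, then the sequence
$$0 \longrightarrow \operatorname{Hom}_R(M_3, N) \xrightarrow{\beta^*} \operatorname{Hom}_R(M_2, N) \xrightarrow{\alpha^*} \operatorname{Hom}_R(M_1, N)$$
is exact. (ii) If
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3$$
is an exact sequence, then
$$0 \longrightarrow \operatorname{Hom}_R(N, M_1) \xrightarrow{\alpha_*} \operatorname{Hom}_R(N, M_2) \xrightarrow{\beta_*} \operatorname{Hom}_R(N, M_3)$$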

is also exact.

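Proof (i). Let $f \in \operatorname{Hom}_R(M_3, N)$ be such that $\beta^*(f) = 0$. Then $f \circ \beta = 0$. Since $\beta$ is surjective (exactness), it follows that $f = 0$. This shows that $\beta^*$ is injective. Next, since $\beta \circ \alpha = 0$ (exactness of the given sequence), $\alpha^* \circ \beta^* = (\beta \circ \alpha)^* = 0^* = 0$. Hence $\operatorname{image}\beta^* \subseteq \ker\alpha^*$. Let $f \in \ker\alpha^*$. Then $\alpha^*(f) = f \circ \alpha = 0$, and so $\ker f \supseteq \operatorname{image}\alpha = \ker\beta$. By the fundamental theorem of homomorphism, there is a unique homomorphism $\bar{f}$ from $M_2/\ker\beta$ to $N$ such that $\bar{f} \circ \nu = f$, where $\nu$ is the quotient map. Also, since $\beta$ is surjective, we have an isomorphism $\bar{\beta}$ from $M_2/\ker\beta$ to $M_3$ such that $\bar{\beta} \circ \nu = \beta$. Then $\beta^*(\bar{f} \circ \bar{\beta}^{-1}) = \bar{f} \circ \bar{\beta}^{-1} \circ \beta = \bar{f} \circ \nu = f$. Thus $f \in \operatorname{image}\beta^*$. This shows that $\ker\alpha^* = \operatorname{image}\beta^*$. The proof of (ii) is similar, and it is left as an exercise. ∎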

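Remark 7.2.12 Hom is not a right exact functor, for even if $\alpha$ is injective, $\alpha^*$ need not be surjective, and even if $\beta$ is surjective, $\beta_*$ need not be surjective. Consider, for example, the multiplication map $f_m$ from $\mathbb{Z}$ to $\mathbb{Z}$ by $m$, where $m > 1$. Then $f_m^*$ from $\operatorname{Hom}_{\mathbb{Z}}(\mathbb{Z}, \mathbb{Z})$ to itself is not surjective (verify). The quotient map $\nu$ from $\mathbb{Z}$ to $\mathbb{Z}_m$ is a surjective homomorphism, but $\nu_*$ from $\operatorname{Hom}_{\mathbb{Z}}(\mathbb{Z}_m, \mathbb{Z})$ to $\operatorname{Hom}_{\mathbb{Z}}(\mathbb{Z}_m, \mathbb{Z}_m)$ is not surjective, for the simple reason that $\operatorname{Hom}_{\mathbb{Z}}(\mathbb{Z}_m, \mathbb{Z}) = \{0\}$, whereas $\operatorname{Hom}_{\mathbb{Z}}(\mathbb{Z}_m, \mathbb{Z}_m) \approx \mathbb{Z}_m \ne \{0\}$.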
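The two failures described in the remark can be observed numerically. A homomorphism from $\mathbb{Z}_m$ (or from $\mathbb{Z}$) is determined by the image $g$ of $1$, and for $\mathbb{Z}_m$ it must satisfy $mg = 0$ in the target. In the Python sketch below, $m = 4$ is an arbitrary choice, and because $\mathbb{Z}$ is infinite only a finite window of candidate images is searched, so the code illustrates the claims rather than proving them.

```python
m = 4                        # any m > 1 behaves the same way
WINDOW = range(-1000, 1001)  # finite search window in Z (illustration only)

# Hom(Z_m, Z_m): 1 may go to any g in Z_m, since m*g = 0 in Z_m always holds.
hom_Zm_Zm = [g for g in range(m) if (m * g) % m == 0]
# Hom(Z_m, Z): we need m*g = 0 in Z, which forces g = 0.
hom_Zm_Z = [g for g in WINDOW if m * g == 0]
# nu_*: Hom(Z_m, Z) -> Hom(Z_m, Z_m) sends g to g mod m; its image is only {0}.
image_nu_star = {g % m for g in hom_Zm_Z}

# f_m^*: Hom(Z, Z) -> Hom(Z, Z); the hom h with h(1) = k goes to h o f_m, and (h o f_m)(1) = m*k.
image_fm_star = {m * k for k in WINDOW}

print(len(hom_Zm_Zm), hom_Zm_Z, image_nu_star)  # 4 [0] {0}
print(1 in image_fm_star)                        # False: the identity of Z is not hit
```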

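Proposition 7.2.13 If
$$0 \longrightarrow M_1 \xrightarrow{\alpha} M_2 \xrightarrow{\beta} M_3 \longrightarrow 0$$
is a split exact sequence, and $N$ a module, then
$$0 \longrightarrow \operatorname{Hom}_R(M_3, N) \xrightarrow{\beta^*} \operatorname{Hom}_R(M_2, N) \xrightarrow{\alpha^*} \operatorname{Hom}_R(M_1, N) \longrightarrow 0$$
and
$$0 \longrightarrow \operatorname{Hom}_R(N, M_1) \xrightarrow{\alpha_*} \operatorname{Hom}_R(N, M_2) \xrightarrow{\beta_*} \operatorname{Hom}_R(N, M_3) \longrightarrow 0$$
are also split exact sequences.

Proof Let $t$ and $s$ be associated splittings. Then $\alpha^* \circ s^* = (s \circ \alpha)^* = (I_{M_1})^* = I_{\operatorname{Hom}_R(M_1, N)}$, and similarly, $\beta_* \circ t_* = (\beta \circ t)_*$ is the identity map. This shows that $\alpha^*$ and $\beta_*$ are surjective maps, and, together with Theorem 7.2.11, that the above sequences are exact and split. ∎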

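Definition 7.2.14 A left $R$-module $P$ is called a projective left $R$-module if, given an exact sequence
$$M \xrightarrow{\beta} N \longrightarrow 0$$
(or equivalently, a surjective homomorphism $\beta$ from $M$ to $N$) and a homomorphism $f$ from $P$ to $N$, there is a homomorphism $g$ from $P$ to $M$ such that $\beta \circ g = f$, that is, the diagram
$$\begin{array}{ccccc}
 & & P & & \\
 & \swarrow{\scriptstyle g} & \downarrow{\scriptstyle f} & & \\
M & \xrightarrow{\beta} & N & \longrightarrow & 0
\end{array}$$
is commutative.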
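For a free module the lift required in this definition can be written down explicitly: choose a preimage under $\beta$ of the image of each basis vector and extend linearly, which is the argument used in the proof of Proposition 7.2.18. The Python sketch below does this for $P = \mathbb{Z}^2$, the reduction map $\beta: \mathbb{Z} \to \mathbb{Z}_5$, and a hypothetical $f: \mathbb{Z}^2 \to \mathbb{Z}_5$ with $f(e_1) = 3$ and $f(e_2) = 4$; commutativity $\beta \circ g = f$ is checked only on a finite grid of vectors.

```python
m = 5
beta = lambda n: n % m          # surjective homomorphism Z -> Z_m

f_on_basis = (3, 4)             # hypothetical values f(e_1), f(e_2) in Z_m


def f(v):
    """The homomorphism f: Z^2 -> Z_m determined by f_on_basis."""
    return (v[0] * f_on_basis[0] + v[1] * f_on_basis[1]) % m


# Choose one preimage c_k in Z of each f(e_k); any choice of representative works.
c = [next(n for n in range(m) if beta(n) == value) for value in f_on_basis]


def g(v):
    """The lift g: Z^2 -> Z, g(a, b) = a*c_1 + b*c_2."""
    return v[0] * c[0] + v[1] * c[1]


grid = [(a, b) for a in range(-10, 11) for b in range(-10, 11)]
print(c, all(beta(g(v)) == f(v) for v in grid))  # [3, 4] True
```

The lift is not unique (adding any multiple of $5$ to $c_1$ or $c_2$ gives another), which is why the definition asks only for existence.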

Dually, we have

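Definition 7.2.15 A left $R$-module $I$ is called an injective left $R$-module if, given any exact sequence
$$0 \longrightarrow N \xrightarrow{\alpha} M$$
(or equivalently, an injective homomorphism $\alpha$ from $N$ to $M$) and a homomorphism $f$ from $N$ to $I$, there is a homomorphism $g$ from $M$ to $I$ such that $g \circ \alpha = f$, that is, the diagram
$$\begin{array}{ccccc}
0 & \longrightarrow & N & \xrightarrow{\alpha} & M \\
 & & \downarrow{\scriptstyle f} & \swarrow{\scriptstyle g} & \\
 & & I & &
\end{array}$$
is commutative.

Proposition 7.2.16 If $P$ is a projective $R$-module, then every short exact sequence
$$0 \longrightarrow M \xrightarrow{\alpha} N \xrightarrow{\beta} P \longrightarrow 0$$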

splits. Dually, if M is injective, then also it splits. Proof Suppose that P is projective. Since β N P 0

is exact, and I_P is a homomorphism from P to P, there is a homomorphism t from P to N such that βot = I_P. Thus, t is a splitting. The rest also follows similarly.

The following result follows from Propositions 7.2.10 and 7.2.13.

Corollary 7.2.17 If the last but one term in a short exact sequence is a projective module, or the second term in a short exact sequence is an injective module, then Hom takes the short exact sequence to a split exact sequence.

Proposition 7.2.18 Every free R-module is projective.

Proof Let (F(X), i) be a free R-module on X, β a surjective homomorphism from M to N, and f a homomorphism from F(X) to N. Then β⁻¹{(foi)(x)} ≠ ∅ for all x ∈ X. From the axiom of choice, there is a map c from X to M such that c(x) ∈ β⁻¹{(foi)(x)} for all x ∈ X. This means βoc = foi. Since (F(X), i) is a free R-module on X, there is a unique homomorphism φ from F(X) to M such that φoi = c. Hence βoφ and f both make the triangle

$$X \xrightarrow{\;i\;} F(X) \xrightarrow{\;f,\ \beta\circ\phi\;} N, \qquad (\beta\circ\phi)\circ i = f\circ i$$

commutative. Since (F(X), i) is a free R-module, the uniqueness in its universal property gives βoφ = f. This shows that F(X) is a projective module.

A submodule N of a module M over a ring R is called a direct summand of M if there is a submodule L of M such that M = N ⊕ L.

Proposition 7.2.19 A direct summand of a projective (injective) module is projective (injective).

Proof Suppose that P ⊕ Q is projective. We show that P is projective. Let β be a surjective homomorphism from M to N, and f a homomorphism from P to N. Then fop1 is a homomorphism from P ⊕ Q to N, where p1 is the first projection. Since P ⊕ Q is projective, there is a homomorphism φ from P ⊕ Q to M such that βoφ = fop1. We have the homomorphism φoi1 from P to M, where i1 is the first inclusion, such that βoφoi1 = fop1oi1 = foI_P = f. This shows that P is projective. The rest can be proved similarly.

Theorem 7.2.20 A left R-module P is projective if and only if it is a direct summand of a free R-module.

Proof Since a free R-module is projective, and a direct summand of a projective module is projective, a direct summand of a free module is projective. Next, suppose that P is projective. From Proposition 7.2.2 we have a surjective homomorphism β from F(P) to P. This gives us an exact sequence

0 −→ Kerβ −→ F(P) −→ P −→ 0.

Since P is projective, the sequence splits. Hence, from Proposition 7.2.10, P is a direct summand of F(P). We have the following corollary.

Corollary 7.2.21 A left R-module P is projective if and only if every short exact sequence 0 −→ M −→ N −→ P −→ 0 splits.

Proof Suppose that every short exact sequence

0 −→ M −→ N −→ P −→ 0 splits. Then, in particular,

0 −→ Kerβ −→ F(P) −→ P −→ 0 splits. From Proposition 7.2.10, P is a direct summand of F(P). From Theorem 7.2.20, it follows that P is projective. The converse follows from Proposition 7.2.16.

Corollary 7.2.22 Let {Pα | α ∈ Λ} be a family of left R-modules. Then P = ⊕α∈Λ Pα is projective if and only if each Pα is projective.

Proof If P is projective, then, since each Pα is a direct summand of P, each Pα is projective. Conversely, suppose that each Pα is projective. Then each Pα is a direct summand of a free module Fα, say Fα = Pα ⊕ Qα. But then ⊕α∈Λ Fα ≈ P ⊕ (⊕α∈Λ Qα), so P is a direct summand of ⊕α∈Λ Fα. Since a direct sum of free modules is free, the result follows from Theorem 7.2.20.

Example 7.2.23 Every vector space (being free) is projective. It is also injective.

Example 7.2.24 Direct sums of infinite cyclic groups are Z-projective, for they are free.

Example 7.2.25 Since submodules of free modules over a P.I.D. are free, projective modules and free modules over a P.I.D. are the same. In particular, all projective modules over F[X], where F is a field, are free. It is a fact that every finitely generated projective module over F[X1, X2, ..., Xn] is free. This was conjectured by J.P. Serre, and it was proved by D. Quillen and Suslin, independently, in 1976.

Example 7.2.26 For m > 1, Zm is not Z-projective, for it is not free. Z is Z-projective, but it is not injective (it is not divisible; see Corollary 7.2.31).

Example 7.2.27 A submodule of a free module need not be free. For example, Z6 is free over Z6, but Z3, identified with the ideal {0, 2, 4} of Z6, is a submodule, and it is not free (a nonzero free Z6-module has at least 6 elements). Also Z6 = Z3 ⊕ Z2, and so Z3 is projective but not free.

Example 7.2.28 A submodule of a projective module need not be a projective module. Z4 is a free module over Z4, and so it is projective. However, Z2 (the ideal {0, 2} of Z4) is a submodule of Z4 which is not a direct summand of a free module, and so it cannot be a projective module. A quotient of a projective module need not be projective, for otherwise every module would be projective, every module being a quotient of a free module.
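The decomposition Z6 = Z3 ⊕ Z2 of Example 7.2.27 can be realized inside Z6 by the complementary idempotents e = 4 and 1 − e = 3, which is the one-dimensional case of the idempotent description in Exercise 7.2.7. The following Python sketch (helper names are ours, chosen for illustration) checks that the ideals Z6·4 = {0, 2, 4} and Z6·3 = {0, 3} form an internal direct sum.

```python
def principal_ideal(gen: int, n: int) -> set[int]:
    """The ideal generated by gen in the ring Z_n."""
    return {(gen * k) % n for k in range(n)}


def is_idempotent(e: int, n: int) -> bool:
    return (e * e) % n == e


n = 6
e = 4              # 4 * 4 = 16 = 4 in Z_6
f = (1 - e) % n    # the complementary idempotent 1 - e = 3 in Z_6
assert is_idempotent(e, n) and is_idempotent(f, n) and (e * f) % n == 0

P = principal_ideal(e, n)   # {0, 2, 4}, a copy of Z_3
Q = principal_ideal(f, n)   # {0, 3}, a copy of Z_2

# Every x in Z_6 is uniquely p + q with p in P and q in Q,
# and the components are x*e and x*f.
for x in range(n):
    decompositions = [(p, q) for p in P for q in Q if (p + q) % n == x]
    assert decompositions == [((x * e) % n, (x * f) % n)]
```

Because Z6·4 is a direct summand of the free module Z6, it is projective, while it has only 3 elements and so cannot be free.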

Theorem 7.2.29 Let I be a left R-module. Then I is injective if and only if, given any left ideal A of R and an R-homomorphism f from A (considered as an R-module) to I, there exists an R-homomorphism f̄ from R to I such that f̄/A = f.

Proof If I is injective, then by the definition, every homomorphism from A to I can be extended to a homomorphism from R to I. Conversely, suppose that every R-homomorphism from every left ideal A to I can be extended to a homomorphism from R to I. Let ξ be an injective homomorphism from a left R-module M to a left R-module N, and φ an R-homomorphism from M to I. We have to show that there is a homomorphism from N to I whose composite with ξ is φ. Let X be the set of all pairs (L, ψ) such that L is a submodule of N containing ξ(M), and ψ a homomorphism from L to I such that ψoξ = φ. Further, X ≠ ∅, for (ξ(M), ψ) belongs to X, where ψ(ξ(x)) = φ(x) for all x ∈ M (ψ is well defined since ξ is injective). Define a relation ≤ on X by (L, ψ) ≤ (L′, χ) if L ⊆ L′ and χ/L = ψ. Clearly, (X, ≤) is a nonempty partially ordered set. Let {(Lα, ψα) | α ∈ Λ} be a chain in X. Since the union of a chain of submodules is a submodule, L = ∪α∈Λ Lα is a submodule of N containing ξ(M). We have a unique homomorphism ψ from L to I defined by the property that ψ/Lα = ψα for all α. Clearly, (L, ψ) is an upper bound of the chain. By Zorn's Lemma, X has a maximal element (L0, ψ0) (say). We show that L0 = N. Suppose that L0 ≠ N, and x0 ∈ N − L0. Let A = {λ ∈ R | λx0 ∈ L0}. Then A, being the inverse image of L0 under the homomorphism λ ↦ λx0 from R to N, is a left ideal of R, and the map f from A to I defined by f(λ) = ψ0(λx0) is a homomorphism. From our supposition, we have an R-homomorphism f̄ from R to I such that f̄/A = f. Let L1 = L0 + ⟨x0⟩ be the submodule of N generated by L0 ∪ {x0}. Then L0 is a proper submodule of L1. Any element of L1 is of the form u + λx0, where u ∈ L0 and λ ∈ R. Suppose that u1 + λ1x0 = u2 + λ2x0, where u1, u2 ∈ L0 and λ1, λ2 ∈ R. Then (λ2 − λ1)x0 = u1 − u2 ∈ L0. Hence λ2 − λ1 ∈ A, and f̄(λ2 − λ1) = f(λ2 − λ1) = ψ0((λ2 − λ1)x0) = ψ0(u1 − u2) = ψ0(u1) − ψ0(u2). Hence f̄(λ2) − f̄(λ1) = ψ0(u1) − ψ0(u2), and so ψ0(u1) + f̄(λ1) = ψ0(u2) + f̄(λ2). Thus, we have a well-defined map ψ1 from L1 to I given by ψ1(u + λx0) = ψ0(u) + f̄(λ).
Clearly, ψ1 is a homomorphism, and ψ1/L0 = ψ0. Hence (L1, ψ1) ∈ X, and (L1, ψ1) > (L0, ψ0). This is a contradiction to the maximality of (L0, ψ0). Thus, L0 = N, and ψ0 is a homomorphism from N to I such that ψ0oξ = φ. This proves that I is injective.

Now, we describe Z-injective modules. Recall the following:

Definition 7.2.30 An abelian group A is called divisible if for all a ∈ A and n ∈ Z − {0}, there is a b ∈ A such that nb = a.

Corollary 7.2.31 An abelian group A is Z-injective if and only if it is divisible.

Proof An ideal of Z is of the form mZ for some nonnegative integer m. From the above theorem, an abelian group A is Z-injective if and only if, for each m, every homomorphism from mZ to A can be extended to a homomorphism from Z to A. Suppose that A is Z-injective. Let a ∈ A and n ∈ Z − {0}. The map f from nZ to A defined by f(nm) = ma is a homomorphism. Since A is injective, there is a homomorphism f̄ from Z to A such that f̄/nZ = f. Now f̄(n) = f(n) = 1 · a = a. Suppose that f̄(1) = b. Then a = f̄(n) = n · f̄(1) = n · b. This shows that A is divisible. Conversely, suppose that A is divisible. Let f from nZ to A be a homomorphism, where n ≠ 0 (a homomorphism from the zero ideal extends by the zero map). Suppose that f(n) = a. Since A is divisible, there is b ∈ A such that nb = a. We have a homomorphism f̄ from Z to A defined by f̄(m) = mb. Also, if m = nr, then f̄(m) = nrb = rnb = ra = f(nr) = f(m). This means that f̄/nZ = f. It follows from the above theorem that A is injective.

Example 7.2.32 The groups (Q, +), (R, +), (C, +), and (S1, ·) are all divisible (verify), and so they are all Z-injective.

Example 7.2.33 Homomorphic images, in particular quotients, of divisible groups are divisible groups (verify). In particular, Q/Z ≈ P is Z-injective. A submodule of an injective module need not be injective, for (Z, +) is not injective, whereas (Q, +) is injective.

Example 7.2.34 No nontrivial finite group can be divisible, for if |A| = n and a ≠ 0, then we cannot find a b ∈ A such that nb = a, since nb = 0 for every b ∈ A.
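Examples 7.2.32 and 7.2.34 can be contrasted numerically. The sketch below (function names are ours) decides divisibility of the finite cyclic group Zn by trying multipliers k = 1, ..., n, which is enough because k = n already fails for n ≥ 2, and it solves k·b = a in (Q, +) exactly using Python's fractions.Fraction.

```python
from fractions import Fraction


def solve_in_Zn(n: int, k: int, a: int) -> list[int]:
    """All b in Z_n with k * b = a (mod n)."""
    return [b for b in range(n) if (k * b - a) % n == 0]


def is_divisible_Zn(n: int) -> bool:
    """Decide whether the finite cyclic group Z_n is divisible.

    Only multipliers k = 1, ..., n are tried: for n >= 2 the multiplier
    k = n already fails at a = 1, since n * b = 0 for every b, and the
    trivial group Z_1 is divisible.
    """
    return all(solve_in_Zn(n, k, a) for k in range(1, n + 1) for a in range(n))


def divide_in_Q(a: Fraction, k: int) -> Fraction:
    """In (Q, +) the equation k * b = a (k != 0) has the solution b = a / k."""
    return a / k


assert is_divisible_Zn(1)
assert not any(is_divisible_Zn(n) for n in range(2, 20))
b = divide_in_Q(Fraction(3, 7), 5)
assert 5 * b == Fraction(3, 7)
```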

We have seen that every module is a quotient of a projective module. Dually, we show that every module is a submodule of an injective module.

Proposition 7.2.35 A left R-module I is injective if and only if, for every left ideal A of R and every R-homomorphism f from A to I, there exists an x ∈ I such that f(a) = ax for all a ∈ A.

Proof Suppose that I is injective, A a left ideal of R, and f an R-homomorphism from A to I. Then there exists an R-homomorphism φ from R to I such that φ/A = f. Suppose that φ(1) = x. Then φ(a) = φ(a · 1) = a · φ(1) = a · x for all a ∈ R. In particular, f(a) = a · x for all a ∈ A. Conversely, suppose that such an x ∈ I exists. Then the map φ from R to I defined by φ(a) = a · x is a homomorphism which is an extension of f. The result follows from Theorem 7.2.29.

Let R be a ring with identity, and A an abelian group. Then the set HomZ(R, A) of all additive group homomorphisms from (R, +) to A is an abelian group with respect to pointwise addition. HomZ(R, A) becomes a left R-module with respect to the external multiplication · defined by (a · f)(b) = f(ba). If A is also a left R-module, then HomR(R, A) is a subgroup of HomZ(R, A). If f ∈ HomR(R, A), then a · f ∈ HomR(R, A), for (a · f)(bc) = f(bca) = b f(ca) (for f is an R-homomorphism) = b (a · f)(c). This shows that HomR(R, A) is a submodule of HomZ(R, A) for every left R-module A.

Proposition 7.2.36 Let M be a left R-module. Then the map φ from HomR(R, M) to M defined by φ(f) = f(1) is an isomorphism of R-modules.

Proof Clearly, φ(f + g) = ( f + g)(1) = f (1) + g(1) = φ(f ) + φ(g), and φ(a · f ) = (a · f )(1) = f (1 · a) = f (a) = a · f (1) (for f is a R- homomorphism)= a · φ(f ). This shows that φ is a homomorphism. Next, suppose that φ(f ) = φ(g). Then f (1) = g(1). Since f and g are R-homomorphisms, f (a) = a · f (1) = a · g(1) = g(a) for all a ∈ R. This shows that f = g, and so φ is injective. Lastly, let x ∈ M. Define a map f from R to M by f (a) = a · x. Then f ∈ HomR(R, M) and φ(f ) = x. This shows that φ is also surjective. 

Proposition 7.2.37 Let A be a divisible group and R a ring with identity. Then the left R-module HomZ(R, A) is an injective left R-module.

Proof Let B be a left ideal of R, and f an R-homomorphism from B to HomZ(R, A). Then the map χ from B to A defined by χ(b) = f(b)(1) is clearly a group homomorphism from (B, +) to A. Since A, being divisible, is Z-injective, we can extend χ to a group homomorphism χ̄ from the group (R, +) to A. Now, for b ∈ B and r ∈ R, since f is an R-homomorphism and rb ∈ B, we have f(b)(r) = (r · f(b))(1) = f(rb)(1) = χ(rb) = χ̄(rb) = (b · χ̄)(r). This shows that f(b) = b · χ̄ for all b ∈ B. From Proposition 7.2.35, the result follows.

The proof of the following proposition is an easy verification. Proposition 7.2.38 Direct sums of divisible groups are divisible.

Proposition 7.2.39 Every abelian group can be embedded into a divisible group.

Proof Let A be an abelian group. Then A is a quotient of the free abelian group F(A) on A. Suppose that A ≈ F(A)/L. Now, F(A) is a direct sum of |A| copies of Z, and so it is a subgroup of the direct sum D of |A| copies of (Q, +). Thus, A ≈ F(A)/L is isomorphic to a subgroup of the quotient group D/L. Since direct sums of divisible groups are divisible, and quotients of divisible groups are divisible, D/L is divisible, and the result follows.
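For a finite cyclic group there is a much more direct embedding than the one in the proof above: the class of k in Zm goes to k/m + Z in Q/Z, which is divisible by Example 7.2.33. The following Python sketch checks this concrete instance; it is not the construction used in the proof. Elements of Q/Z are modelled by their representatives in [0, 1) using fractions.Fraction, and the helper names are ours.

```python
from fractions import Fraction


def mod_one(q: Fraction) -> Fraction:
    """The representative in [0, 1) of the coset q + Z in Q/Z."""
    return q - (q.numerator // q.denominator)


def embed(m: int, k: int) -> Fraction:
    """Image of the class of k under the map Z_m -> Q/Z, k |-> k/m + Z."""
    return mod_one(Fraction(k, m))


m = 12
images = [embed(m, k) for k in range(m)]
assert len(set(images)) == m          # injective on Z_m

for j in range(m):
    for k in range(m):
        # homomorphism: addition in Z_m matches addition in Q/Z
        assert embed(m, (j + k) % m) == mod_one(embed(m, j) + embed(m, k))

# Q/Z is divisible: b = a / n solves n * b = a modulo Z.
a, n = embed(m, 5), 7
assert mod_one(n * mod_one(a / n)) == a
```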

Theorem 7.2.40 Every left R-module can be embedded in a left injective R-module.

Proof Let M be a left R-module. From the above proposition, (M, +) is a subgroup of a divisible group D. Since Hom is a left exact functor, HomZ(R, M) is isomorphic to a submodule of HomZ(R, D). Since D is divisible, HomZ(R, D) is injective over R (Proposition 7.2.37). Also, M ≈ HomR(R, M) is a submodule of HomZ(R, M). The result follows.

Corollary 7.2.41 A left R-module I is injective if and only if every short exact sequence of the type

$$0 \longrightarrow I \longrightarrow M \longrightarrow N \longrightarrow 0$$

splits.

Proof If I is injective, then it has already been seen that the sequence splits. Conversely, suppose that every such exact sequence splits. Since every module can be embedded in an injective module, there is an injective module M such that I is a submodule of M. This gives us an exact sequence

0 −→ I −→ M −→ M/I −→ 0.

By our hypothesis, the above exact sequence splits. Hence I is a direct summand of the injective module M. Since a direct summand of an injective module is an injective module, I is an injective module.

Exercises

7.2.1 State and prove the Five lemma for groups.

7.2.2 Develop the concept and the theory of projective and injective groups. Try to characterize them.

7.2.3 A commutative integral domain R is said to be a Dedekind domain if, given any pair of ideals A and B of R such that B ⊆ A, there is an ideal C such that B = AC. For example, every PID is a Dedekind domain. Z[√−5] is a Dedekind domain (prove it). Indeed, if F is a subfield of C which is a finite-dimensional vector space over its subfield Q (such a field is called a number field), and R is the set of elements of F which are roots of monic polynomials with integer coefficients (called the ring of algebraic integers of F), then R is a Dedekind domain. Let R be a Dedekind domain, and let A be an ideal of R. Show that A, considered as a module over R, is a finitely generated projective module. Hint. Let a ∈ A, a ≠ 0. Then Ra ⊆ A. Let B be an ideal such that Ra = BA. Suppose that a = b1a1 + b2a2 + ··· + bnan, where bi ∈ B and ai ∈ A. Check that (u1, u2, ..., un) ↦ u1a1 + u2a2 + ··· + unan is a module homomorphism from R^n onto A, with a right inverse (a splitting) given by x ↦ (v1, v2, ..., vn), where vi a = x bi.

7.2.4 Let R be a Dedekind domain. Using induction on n and the fact that projection maps from R^n to R are module homomorphisms, show that every finitely generated projective module is a direct sum of finitely many ideals of R.

7.2.5 A ring R with identity is called a local ring if the set M = R − R^× of non-units (R^× denoting the group of units) forms a left ideal of R. Show that M is a two-sided ideal which is a maximal ideal. Deduce that R/M is a division ring. Let A = [aij] be an n × n matrix with entries in R such that the matrix [aij + M] is invertible over R/M. Show that A is invertible. Hint. If [aij + M] is the identity matrix over R/M, then using elementary operations [aij] can be reduced to the identity matrix.

7.2.6 Use Exercise 7.2.5 to show that every finitely generated projective module over a local ring is free.

7.2.7 Let R be a ring with identity, and A an n × n idempotent matrix with entries in R. Show that R^n A is a finitely generated projective module, where elements of R^n are treated as row matrices. Conversely, show that any finitely generated projective module is isomorphic to such a module.

7.2.8 Let A and B be m × m idempotent matrices with entries in a ring R. Suppose that there is an invertible m × m matrix P such that PAP⁻¹ = B. Show that the projective modules R^m A and R^m B are isomorphic as modules over R.

7.3 Tensor Product and Exterior Power

Let R be a ring with identity. Let M be a right R-module, N a left R-module, and L an abelian group. A map f from M × N to L is called a balanced map if it satisfies the following two conditions:

(i) The map f is additive in both the coordinates in the sense that f (x + y, u) = f (x, u) + f (y, u), f (x, u + v) = f (x, u) + f (x,v)for all x, y ∈ M, and for all u,v ∈ N. (ii) f (xa, u) = f (x, au) for all x ∈ M, a ∈ R, and u ∈ N.

If, further, M, N, and L are both sided R-modules, and in addition to (i) and (ii), we have f(xa, u) = af(x, u), then we say that f is a bilinear map. If f is a balanced map, then it follows from the additivity that f(0, u) = 0 = f(x, 0). Let M be a right R-module, and N a left R-module. We have the following universal problem: “Does there exist a pair (L, f), where L is an abelian group and f a balanced map from M × N to L, with the property that if (L′, f′) is another such pair, then there is a unique homomorphism φ from L to L′ such that φof = f′?” As in earlier cases, a solution to the above problem, if it exists, is unique up to isomorphism. For the existence, consider the free abelian group (F(M × N), i) on M × N. Let A be the subgroup of F(M × N) generated by the elements of the types

(i) i(x + y, u) − i(x, u) − i(y, u), (ii) i(x, u + v) − i(x, u) − i(x,v), and

(iii) i(xa, u) − i(x, au).

Let L = F(M × N)/A, and f = νoi, where ν is the quotient map. We show that (L, f) is a solution to the above problem. Let L′ be an abelian group, and g a balanced map from M × N to L′. From the universal property of a free abelian group, there is a unique homomorphism φ from F(M × N) to L′ such that φoi = g. Since g is a balanced map, φ(i(x + y, u) − i(x, u) − i(y, u)) = φ(i(x + y, u)) − φ(i(x, u)) − φ(i(y, u)) = g(x + y, u) − g(x, u) − g(y, u) = 0. Thus, the elements of the type (i) are contained in the kernel of φ. Similarly, elements of the types (ii) and (iii) are also contained in the kernel of φ. This shows that A is contained in the kernel of φ. From the fundamental theorem of homomorphism, there is a unique homomorphism η from L = F(M × N)/A to L′ such that ηoν = φ. But then ηof = ηoνoi = φoi = g. Since L is generated by the elements f(m, n), η is the only homomorphism with this property. This completes the proof of the fact that (L, f) is the solution to the above universal problem. The abelian group L is denoted by M ⊗R N, and it is called the tensor product of M and N. The image f(m, n) = i(m, n) + A is denoted by m ⊗ n. Thus, (m, n) ↦ m ⊗ n is a balanced map, and hence

(i) (x + y) ⊗ u = x ⊗ u + y ⊗ u, (ii) x ⊗ (u + v) = x ⊗ u + x ⊗ v, and

(iii) xa ⊗ u = x ⊗ au for all x, y ∈ M, a ∈ R, and u, v ∈ N.

Also 0 ⊗ u = 0 = x ⊗ 0 for all x ∈ M and u ∈ N. Further, if L is an abelian group, and g a balanced map from M × N to L, then we have a unique homomorphism φ from M ⊗R N to L defined by the property φ(m ⊗ n) = g(m, n).

Definition 7.3.1 Let R and S be rings with identities. An abelian group M which is a left R-module, and also a right S-module, is called a bi-(R, S) module if (a · x) · b = a · (x · b) for all x ∈ M, a ∈ R, and b ∈ S. Observe that if R is a commutative ring with identity, then a left R-module M is also a right R-module (define x · a = a · x). In fact, it is a bi-(R, R) module.

Proposition 7.3.2 Let M be a right R-module and N a bi-(R, S) module. Then M ⊗R N has a unique right S-module structure defined by (x ⊗ u) · b = x ⊗ (u · b). If M is a bi-(S, R) module, and N a left R-module, then M ⊗R N is a left S-module.

Proof Let M be a right R-module, and N a bi-(R, S) module. Let b ∈ S. Define a map fb from M × N to M ⊗R N by fb(x, u) = x ⊗ ub. It is easy to observe (using the fact that N is a bi-(R, S) module) that fb is a balanced map. From the universal property of the tensor product, we have a unique homomorphism φb from M ⊗R N to itself defined by the property φb(x ⊗ u) = x ⊗ ub. Define an external multiplication on M ⊗R N by elements of S from the right by z · b = φb(z) for all z ∈ M ⊗R N and b ∈ S. Since φb is a homomorphism for all b ∈ S, and φ_{b1b2} = φ_{b2}oφ_{b1} for all b1, b2 ∈ S, it follows that M ⊗R N is a right S-module with respect to the external multiplication defined above. The rest can be proved similarly.

In particular, we have the following corollary.

Corollary 7.3.3 If R is a commutative ring, then M ⊗R N is a both sided R-module. 

Proposition 7.3.4 Let M and N be bi-(R, R) modules. Then we have a unique isomorphism f from M ⊗R N to N ⊗R M such that f(x ⊗ y) = y ⊗ x.

Proof The map φ from M × N to N ⊗R M defined by φ(x, y) = y ⊗ x is a balanced (in fact bilinear) map. From the universal property of the tensor product, we have a unique homomorphism f subject to the condition f(x ⊗ y) = y ⊗ x. Similarly, we have a unique homomorphism g from N ⊗R M to M ⊗R N subject to the condition g(y ⊗ x) = x ⊗ y. Clearly, gof(x ⊗ y) = x ⊗ y for all x ∈ M and y ∈ N. Since {x ⊗ y | x ∈ M, y ∈ N} is a set of generators of M ⊗R N, it follows that gof = I_{M⊗R N}. Similarly, fog is also the identity map. This shows that f is an isomorphism.

In particular, we have the following corollary. Corollary 7.3.5 Let R be a commutative ring, and M and N be R-modules. Then M ⊗R N is isomorphic to N ⊗R M.

Proposition 7.3.6 Let M be a right R-module, N a bi-(R, S) module, and L a left S-module. Then there is a unique isomorphism φ from (M ⊗R N) ⊗S L to M ⊗R (N ⊗S L) subject to the condition φ((x ⊗ y) ⊗ z) = x ⊗ (y ⊗ z) for all x ∈ M, y ∈ N, and z ∈ L.

Proof Let x ∈ M. The map (y, z) ↦ (x ⊗ y) ⊗ z defines a balanced map from N × L to (M ⊗R N) ⊗S L. Hence, there is a unique homomorphism ψx from N ⊗S L to (M ⊗R N) ⊗S L subject to the condition ψx(y ⊗ z) = (x ⊗ y) ⊗ z. The map (x, u) ↦ ψx(u), where u ∈ N ⊗S L, is also a balanced map from M × (N ⊗S L) to (M ⊗R N) ⊗S L. Thus, there is a unique homomorphism ψ from M ⊗R (N ⊗S L) to (M ⊗R N) ⊗S L subject to the condition ψ(x ⊗ (y ⊗ z)) = (x ⊗ y) ⊗ z. Similarly, we have a unique homomorphism φ from (M ⊗R N) ⊗S L to M ⊗R (N ⊗S L) subject to the condition φ((x ⊗ y) ⊗ z) = x ⊗ (y ⊗ z). It is clear that φ and ψ are inverses of each other.

Remark 7.3.7 The above result, in particular, says that if R is a commutative ring with identity and M1, M2, ..., Mn are R-modules, then the tensor products of M1, M2, ..., Mn, taken in the same order, with respect to any two bracket arrangements are naturally isomorphic. Thus, we can define the tensor product M1 ⊗R M2 ⊗R ··· ⊗R Mn unambiguously. It is universal with respect to n-linear maps in the sense that if φ is an n-linear map from M1 × M2 × ··· × Mn to an R-module L, then there is a unique homomorphism ψ from M1 ⊗R M2 ⊗R ··· ⊗R Mn to L subject to ψ(x1 ⊗ x2 ⊗ ··· ⊗ xn) = φ(x1, x2, ..., xn).

Proposition 7.3.8 Let R be a ring with identity. Then there is a unique R-isomorphism f from R ⊗R M to M defined by f(a ⊗ x) = ax.

Proof The map (a, x) ↦ ax is clearly a balanced map from R × M to M. Hence there is a unique homomorphism φ from R ⊗R M to M such that φ(a ⊗ x) = ax. Also the map ψ from M to R ⊗R M defined by ψ(x) = 1 ⊗ x is a homomorphism. Now, (ψoφ)(a ⊗ x) = ψ(ax) = 1 ⊗ ax = 1a ⊗ x = a ⊗ x. Thus, ψoφ = I_{R⊗R M}. Similarly, φoψ = I_M. This shows that φ is a group isomorphism.
Further, ψ(ax) = 1 ⊗ ax = a ⊗ x = a · (1 ⊗ x) = aψ(x). This shows that ψ is an R-isomorphism.

Proposition 7.3.9 Let {Mα | α ∈ Λ} be a family of right R-modules and N a left R-module. Then there is a unique isomorphism φ from (⊕α∈Λ Mα) ⊗R N to ⊕α∈Λ (Mα ⊗R N) such that φ(f ⊗ n)(α) = f(α) ⊗ n. A similar result holds if N is a right R-module and Mα is a left R-module for each α.

Proof The map φ̃ from (⊕α∈Λ Mα) × N to ⊕α∈Λ (Mα ⊗R N) defined by φ̃(f, n)(α) = f(α) ⊗ n is easily seen to be a balanced map. Hence there is a unique homomorphism φ such that φ(f ⊗ n)(α) = f(α) ⊗ n. The inverse map is the obvious one, induced on each summand Mα ⊗R N by the inclusion of Mα in ⊕α∈Λ Mα. The proof of the second part is similar.

Remark A free left R-module is isomorphic to a direct sum of copies of R, each of which is also a bi-(R, R) module. Thus, a free left (right) R-module is also a free bi-(R, R) module.

Corollary 7.3.10 Tensor product of free left R-modules is a free left R-module. In turn, the tensor product P ⊗ Q of a projective bi-(R, R) module P with a projective left R-module Q is a projective left R-module.

Proof Since a free left R-module is a direct sum of copies of R, and since R ⊗R R is isomorphic to R, the first part of the result follows from the above proposition. Further, let P be a projective bi-(R, R) module, and Q a projective left R-module. Then there exist a right R-module L and a left R-module M such that P ⊕ L is a free R-module, and Q ⊕ M is also a free R-module. Since tensor products of free R-modules are free R-modules, (P ⊕ L) ⊗ (Q ⊕ M) is a free R-module. From the previous proposition, (P ⊗ Q) ⊕ U is free, where U = (P ⊗ M) ⊕ (L ⊗ Q) ⊕ (L ⊗ M). Hence P ⊗ Q is a projective module.

Corollary 7.3.11 Let V and W be finite-dimensional vector spaces over a field F. Then dim(V ⊗F W) = dim(V ) · dim(W).

Proof Suppose that dim(V) = n, and dim(W) = m. Then V is isomorphic to the direct sum of n copies of F, and W is isomorphic to the direct sum of m copies of F. From Propositions 7.3.8 and 7.3.9, it follows that V ⊗ W is isomorphic to the direct sum of nm copies of F ⊗F F ≈ F. The result follows.

Remark 7.3.12 Let {e1, e2, ..., en} be a basis of V, and {f1, f2, ..., fm} a basis of W. Then {ei ⊗ fj | 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a set of generators of V ⊗ W. Since it has at most nm elements and V ⊗ W has dimension nm, it is a basis of V ⊗ W.
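To make Corollary 7.3.11 and Remark 7.3.12 concrete, one may identify V with F^n and W with F^m and model x ⊗ y by the Kronecker product of coordinate vectors. This identification is an assumption of the sketch (a standard one), and integer entries are used for simplicity; the helper names are ours. The check confirms that the nm tensors ei ⊗ fj land exactly on the nm standard basis vectors of F^{nm}, and that the model is bilinear.

```python
def kron_vec(u: list[int], v: list[int]) -> list[int]:
    """Coordinates of u ⊗ v in the basis e_i ⊗ f_j, ordered with j varying fastest."""
    return [a * b for a in u for b in v]


def std_basis(k: int) -> list[list[int]]:
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def add(u: list[int], v: list[int]) -> list[int]:
    return [a + b for a, b in zip(u, v)]


n, m = 3, 4
products = [kron_vec(e, f) for e in std_basis(n) for f in std_basis(m)]
# The nm tensors e_i ⊗ f_j are exactly the standard basis of F^(nm).
assert products == std_basis(n * m)

# Bilinearity: (u1 + u2) ⊗ v = u1 ⊗ v + u2 ⊗ v.
u1, u2, v = [1, 2, 0], [0, -1, 5], [2, 0, 1, 3]
assert kron_vec(add(u1, u2), v) == add(kron_vec(u1, v), kron_vec(u2, v))
```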

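Example 7.3.13 We show that Zm ⊗Z Zn is isomorphic to Zd, where d is the g.c.d. of m and n. Suppose that ā = ā′ in Zm, and b̄ = b̄′ in Zn. Then m divides a − a′, and n divides b − b′. This means that d divides a − a′, and it also divides b − b′. In turn, since ab − a′b′ = (a − a′)b + a′(b − b′), d divides ab − a′b′. Hence ab and a′b′ define the same element of Zd. Thus, we have a map f from Zm × Zn to Zd which sends (ā, b̄) to the class of ab modulo d. Evidently, f is a balanced map, and so it induces a unique homomorphism f̄ from Zm ⊗Z Zn to Zd which sends ā ⊗ b̄ to the class of ab modulo d. Clearly, f̄ is surjective. Since ā ⊗ b̄ = ab(1̄ ⊗ 1̄) and k(1̄ ⊗ 1̄) = k̄ ⊗ 1̄, every element of Zm ⊗Z Zn is of the form ā ⊗ b̄, so it suffices to show that ā ⊗ b̄ = 0 whenever f̄(ā ⊗ b̄) = 0. Suppose that the class of ab in Zd is 0. Then d divides ab. By the Euclidean algorithm, there are integers u and v such that d = um + vn. Since d divides ab, there are integers r and s such that rm + sn = ab. But then

$$\bar a \otimes \bar b = \bar a \otimes b\,\bar 1 = \overline{ab} \otimes \bar 1 = \overline{rm + sn} \otimes \bar 1 = \overline{sn} \otimes \bar 1 = \bar s \otimes n\,\bar 1 = \bar s \otimes \bar 0 = 0.$$

This shows that the kernel of f̄ is {0}, and so f̄ is also injective.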
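The two computations in Example 7.3.13, well-definedness of the balanced map (a, b) ↦ ab mod d and the Bézout identity rm + sn = ab used to kill ā ⊗ b̄, can be checked numerically for small m and n. The sketch below uses the textbook extended Euclidean algorithm; the function names are ours, and the check runs over a finite range, so it is an illustration rather than a proof.

```python
from math import gcd


def ext_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b (extended Euclid)."""
    if b == 0:
        return a, 1, 0
    d, u, v = ext_gcd(b, a % b)
    return d, v, u - (a // b) * v


def check_balanced_map(m: int, n: int) -> int:
    """Check that f(a, b) = ab mod d, with d = gcd(m, n), is well defined,
    biadditive and surjective on Z_m x Z_n; return d."""
    d = gcd(m, n)

    def f(a: int, b: int) -> int:
        return (a * b) % d

    for a in range(m):
        for b in range(n):
            # independent of the chosen representatives of a mod m and b mod n
            assert f(a + m, b) == f(a, b) and f(a, b + n) == f(a, b)
            for a2 in range(m):
                assert f((a + a2) % m, b) == (f(a, b) + f(a2, b)) % d
            for b2 in range(n):
                assert f(a, (b + b2) % n) == (f(a, b) + f(a, b2)) % d
    assert {f(k, 1) for k in range(m)} == set(range(d))
    return d


def kernel_witness(m: int, n: int, a: int, b: int) -> tuple[int, int]:
    """If d | ab, return (r, s) with r*m + s*n = ab, as in Example 7.3.13."""
    d, u, v = ext_gcd(m, n)
    assert (a * b) % d == 0
    q = (a * b) // d
    return q * u, q * v
```

For instance, check_balanced_map(12, 18) returns 6, and kernel_witness(12, 18, 2, 3) gives integers r, s with 12r + 18s = 6.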

Let f1 be an R-homomorphism from a right R-module M1 to a right R-module M2, and g1 an R-homomorphism from a left R-module N1 to a left R-module N2. The map f1 × g1 from M1 × N1 to M2 ⊗R N2 defined by (f1 × g1)(x, y) = f1(x) ⊗ g1(y) is a balanced map (verify). Hence it induces a unique homomorphism f1 ⊗ g1, called the tensor product of f1 and g1, from M1 ⊗R N1 to M2 ⊗R N2 such that (f1 ⊗ g1)(x ⊗ y) = f1(x) ⊗ g1(y). Since {u ⊗ v | u ∈ M2, v ∈ N2} is a set of generators of M2 ⊗R N2, it follows that the tensor product of any two surjective homomorphisms is a surjective homomorphism. Suppose further that f2 is a homomorphism from M2 to M3, and g2 one from N2 to N3. Then

( f2of1) ⊗ (g2og1) = ( f2 ⊗ g2)o( f1 ⊗ g1).
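For finite-dimensional vector spaces with chosen bases, the identity above can be checked on matrices: under the identification used for Remark 7.3.12, the matrix of f ⊗ g in the basis ei ⊗ fj (ordered lexicographically) is the Kronecker product of the matrices of f and g, and the identity becomes the mixed-product rule. This is a sketch under that identification, with integer matrices and helper names of our choosing.

```python
def matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Kronecker product: entry ((i, k), (j, l)) is A[i][j] * B[k][l]."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# f1: F^2 -> F^3 (matrix C), f2: F^3 -> F^2 (matrix A),
# g1: F^1 -> F^2 (matrix D), g2: F^2 -> F^3 (matrix B).
C = [[1, 2], [0, 1], [3, -1]]
A = [[2, 0, 1], [1, 1, 0]]
D = [[4], [-2]]
B = [[1, 3], [0, 2], [5, 1]]

# (f2 o f1) ⊗ (g2 o g1) = (f2 ⊗ g2) o (f1 ⊗ g1)
assert kron(matmul(A, C), matmul(B, D)) == matmul(kron(A, B), kron(C, D))
```

Rectangular matrices are used on purpose, so that the check also exercises the ordering of rows and columns in the Kronecker product.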

If N is a left R-module, then the homomorphism f ⊗ I_N from M1 ⊗R N to M2 ⊗R N is denoted by f̄. It is clear that the homomorphism induced by gof is ḡof̄, and 0̄ = 0. Also I_M ⊗ I_N = I_{M⊗R N}. Tensoring is a right exact functor in the sense of the following theorem.

Theorem 7.3.14 Let

$$M_1 \xrightarrow{\;\alpha\;} M_2 \xrightarrow{\;\beta\;} M_3 \longrightarrow 0$$

be an exact sequence of right R-modules, and N a left R-module. Then the sequence

$$M_1 \otimes_R N \xrightarrow{\;\bar\alpha\;} M_2 \otimes_R N \xrightarrow{\;\bar\beta\;} M_3 \otimes_R N \longrightarrow 0$$

is also exact.

Proof Since β is surjective, and the tensor product of surjective homomorphisms is surjective, β̄ is surjective. Thus, we need to show that ker β̄ = image ᾱ. Again, since βoα = 0, we have 0 = (βoα)‾ = β̄oᾱ. Hence image ᾱ ⊆ ker β̄. Put image ᾱ = L. By the fundamental theorem of homomorphism, we have a unique homomorphism φ from (M2 ⊗R N)/L to M3 ⊗R N defined by φ((m2 ⊗ n) + L) = β(m2) ⊗ n. It is sufficient to show that φ is an isomorphism. We construct its inverse. If β(m2) = β(m2′), then m2 − m2′ belongs to ker β = image α, say m2 − m2′ = α(m1). Hence (m2 ⊗ n) − (m2′ ⊗ n) = (m2 − m2′) ⊗ n = ᾱ(m1 ⊗ n) is in L. This ensures that we have a well-defined map (m3, n) ↦ m2 ⊗ n + L from M3 × N to (M2 ⊗R N)/L, where β(m2) = m3. This is a balanced map (verify). Hence we have a unique homomorphism ψ from M3 ⊗R N to (M2 ⊗R N)/L such that ψ(m3 ⊗ n) = m2 ⊗ n + L, where β(m2) = m3. Now (φoψ)(m3 ⊗ n) = φ(m2 ⊗ n + L) = β(m2) ⊗ n = m3 ⊗ n. This shows that φoψ is the identity map. Similarly, ψoφ is also the identity map. This proves that φ is an isomorphism, and so L = ker β̄.

Remark 7.3.15 Consider the homomorphism f from Z to Z defined by f(a) = 5a. Then f is injective, but f̄ = f ⊗ I from Z ⊗Z Z5 to itself is the zero map, for f̄(m ⊗ a) = f(m) ⊗ a = 5m ⊗ a = m ⊗ 5a = 0. Since Z ⊗Z Z5 ≈ Z5 is nontrivial, f̄ is not injective. This shows that tensoring is not left exact.

Example 7.3.16 Let A be an abelian group. Then Zm ⊗Z A is isomorphic to A/mA. Consider the exact sequence

$$0 \longrightarrow \mathbb{Z} \xrightarrow{\;\alpha\;} \mathbb{Z} \xrightarrow{\;\nu\;} \mathbb{Z}_m \longrightarrow 0,$$

where α is the multiplication by m. Taking tensor product with A, and observing the fact that tensoring is right exact, we see that Zm ⊗Z A is isomorphic to (Z ⊗Z A)/ker ν̄. Again, ker ν̄ = image ᾱ. The isomorphism f from Z ⊗Z A to A given by f(n ⊗ a) = na takes image ᾱ to mA. The assertion follows from the fundamental theorem of homomorphism.
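Taking A = Zn in Example 7.3.16 gives Zm ⊗Z Zn ≈ Zn/mZn, so its order should agree with Example 7.3.13. The small Python check below (a helper of our own, meant only as a sanity check) computes |Zn/mZn| by listing the subgroup mZn.

```python
from math import gcd


def quotient_order(m: int, n: int) -> int:
    """Order of A/mA for the cyclic group A = Z_n, computed as |A| / |mA|."""
    mA = {(m * a) % n for a in range(n)}
    return n // len(mA)


for m in range(1, 15):
    for n in range(1, 15):
        # Z_m ⊗ Z_n ≈ Z_n / mZ_n, which Example 7.3.13 identifies with Z_gcd(m, n)
        assert quotient_order(m, n) == gcd(m, n)
```

In particular quotient_order(5, 3) is 1, consistent with Exercise 7.3.1, and quotient_order(m, n) = n whenever m is a multiple of n, consistent with Exercise 7.3.3.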

Let V be a vector space over a field F. Let ⊗^s V denote the s-fold tensor product of V with itself, and ⊗^r V* the r-fold tensor product of the dual space V* with itself. Let V^r_s denote the tensor product (⊗^s V) ⊗ (⊗^r V*). The members of V^r_s are called tensors of the type (r, s). Tensor product induces a multiplication in T(V) = ⊕_{r,s} V^r_s with respect to which it is an associative algebra, called the tensor algebra of V. The Riemannian metric tensor is an example of a tensor of type (2, 0).

Let V be a vector space over F. Let W denote the subspace of ⊗^r V generated by {x1 ⊗ x2 ⊗ ··· ⊗ xr | xi = xj for some i ≠ j}, and let Λ^r V denote the quotient space (⊗^r V)/W. Let us denote the coset x1 ⊗ x2 ⊗ ··· ⊗ xr + W by x1 ∧ x2 ∧ ··· ∧ xr. The map f from V^r to Λ^r V defined by

f(x1, x2, ..., xr) = x1 ∧ x2 ∧ ··· ∧ xr

is r-alternating, and the pair (Λ^r V, f) is universal in the sense that if g is any r-alternating map from V^r to a space U, then there is a unique linear transformation η from Λ^r V to U such that

η(x1 ∧ x2 ∧ ··· ∧ xr) = g(x1, x2, ..., xr).

The pair (Λ^r V, f) is called the rth exterior power of V.

If T is a linear transformation from V to a vector space W′, then the map ×^r T from V^r to Λ^r W′ defined by (×^r T)(x1, x2, ..., xr) = T(x1) ∧ T(x2) ∧ ··· ∧ T(xr) is an r-alternating map, and so it induces a unique linear transformation Λ^r T from Λ^r V to Λ^r W′ which takes x1 ∧ x2 ∧ ··· ∧ xr to T(x1) ∧ T(x2) ∧ ··· ∧ T(xr). It is easy to verify that Λ^r(T′oT) = (Λ^r T′)o(Λ^r T) for any linear transformation T′ defined on W′, and Λ^r(I_V) = I_{Λ^r V}. In particular, if T is an isomorphism, then Λ^r T is an isomorphism for all r. If V is a vector space of dimension n, then any m-alternating map on V for m > n is the zero map (for an alternating map takes any linearly dependent tuple to 0, and any m > n vectors of V are linearly dependent). Thus, we have the following proposition.

Proposition 7.3.17 Let V be a vector space of dimension n. Then Λ^m V = {0} for all m > n.

Theorem 7.3.18 Let V be a vector space of dimension n. Then dim Λ^n V = 1.

Proof Let {e1, e2, ..., en} be an ordered basis of V. If f is an n-alternating map, then for any ordered n-tuple (x1, x2, ..., xn), f(x1, x2, ..., xn) = det A · f(e1, e2, ..., en), where xj = Σ_{i=1}^n aij ei and A = [aij]. Thus, f is determined uniquely by its value f(e1, e2, ..., en). Since, by the universal property, linear functionals on Λ^n V correspond bijectively to n-alternating maps from V^n to F, this shows that the dual of Λ^n V, and hence Λ^n V itself, has dimension at most 1. Also, the map f defined by f(x1, x2, ..., xn) = det A is a nonzero n-alternating map (indeed, f(e1, e2, ..., en) = 1). This shows that the dimension of Λ^n V is 1.

Let V be a vector space of dimension n, and T a linear transformation on V. Then Λ^n T is a linear transformation on Λ^n V. Since Λ^n V is of dimension 1, the linear transformation Λ^n T is multiplication by a scalar. This scalar is called the determinant of T. It is easy to observe that this definition of determinant agrees with the definition of determinant in Chap. 5.
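The proof of Theorem 7.3.18 rests on the formula f(x1, ..., xn) = det A · f(e1, ..., en), and the determinant of T is the scalar by which ΛⁿT acts. The sketch below writes det A via the Leibniz permutation formula (a standard expansion, used here as an assumption rather than taken from the text) and checks numerically that the functoriality Λⁿ(SoT) = (ΛⁿS)o(ΛⁿT) turns into multiplicativity of the determinant. Helper names are ours.

```python
from itertools import permutations


def sign(p: tuple[int, ...]) -> int:
    """Sign of a permutation of 0, ..., n-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(A: list[list[int]]) -> int:
    """Leibniz formula: the sum over permutations p of sign(p) * prod_j A[p(j)][j].

    This is the value on the columns x_1, ..., x_n of A of the n-alternating
    map normalized by f(e_1, ..., e_n) = 1.
    """
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for j in range(n):
            term *= A[p[j]][j]
        total += term
    return total


def matmul(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


S = [[2, 1, 0], [1, 3, 1], [0, -1, 4]]
T = [[1, 0, 2], [3, 1, 0], [0, 2, 1]]
# Λ^n(S o T) = Λ^n(S) o Λ^n(T) becomes det(ST) = det(S) det(T).
assert det(matmul(S, T)) == det(S) * det(T)
# Alternating: a repeated column gives 0.
assert det([[1, 1, 5], [2, 2, 7], [3, 3, 9]]) == 0
```

The permutation expansion costs n! terms, so it is only meant for small illustrative matrices.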

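Theorem 7.3.19 Let $V$ be a vector space of dimension $n$, and $r \le n$. Then $\dim \bigwedge^r V = {}^nC_r$.

Proof For $r = n$, the result is the content of the above theorem. Let $\{e_1, e_2, \ldots, e_n\}$ be a basis of $V$. Then, as observed in the above theorem, $e_1 \wedge e_2 \wedge \cdots \wedge e_n$ is nonzero. Consider the subset $S = \{e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_r} \mid i_1 < i_2 < \cdots < i_r\}$ of $\bigwedge^r V$. Clearly, every member of $\bigwedge^r V$ is a linear combination of members of $S$, and so $S$ is a set of generators. We show that $S$ is linearly independent. Suppose that
$$\sum_{i_1 < i_2 < \cdots < i_r} a_{i_1 i_2 \cdots i_r}\,(e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_r}) = 0.$$

Fix $j_1 < j_2 < \cdots < j_r$. Suppose that
$$\{1, 2, \ldots, n\} - \{j_1, j_2, \ldots, j_r\} = \{j_{r+1}, j_{r+2}, \ldots, j_n\}.$$
Taking the exterior product with $e_{j_{r+1}} \wedge e_{j_{r+2}} \wedge \cdots \wedge e_{j_n}$, every term other than the one indexed by $(j_1, j_2, \ldots, j_r)$ contains a repeated factor $e_k$ and so vanishes; we obtain
$$a_{j_1 j_2 \cdots j_r}\, e_{j_1} \wedge e_{j_2} \wedge \cdots \wedge e_{j_r} \wedge e_{j_{r+1}} \wedge \cdots \wedge e_{j_n} = 0.$$
Since $j_k \neq j_l$ for all $k \neq l$, it follows that
$$e_{j_1} \wedge e_{j_2} \wedge \cdots \wedge e_{j_n} = \pm (e_1 \wedge e_2 \wedge \cdots \wedge e_n) \neq 0.$$
Hence $a_{j_1 j_2 \cdots j_r} = 0$. This shows that $S$ is linearly independent, and so it is a basis. Clearly, the number of elements in $S$ is ${}^nC_r$. ∎

The exterior product gives us an external multiplication from $\bigwedge^r V \times \bigwedge^s V$ to $\bigwedge^{r+s} V$, and this can be extended linearly to a multiplication on $E(V) = \bigoplus_{r=0}^{\infty} \bigwedge^r V$ with respect to which it is an associative algebra, called the exterior algebra of $V$. Also,
$$\dim E(V) = \sum_{r=0}^{n} \dim \bigwedge^r V = {}^nC_0 + {}^nC_1 + \cdots + {}^nC_n = 2^n.$$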
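The basis $S$ in the proof is indexed by strictly increasing $r$-tuples of indices, so counting it amounts to counting $r$-element subsets of $\{1, \ldots, n\}$. A small Python sketch (the function name `exterior_basis` is hypothetical) that lists these index tuples and compares their number with ${}^nC_r$; the case $r = n + 1$ yields the empty list, in line with Proposition 7.3.17, and the total over $0 \le r \le n$ is $2^n$.

```python
from itertools import combinations
from math import comb


def exterior_basis(n, r):
    """Index tuples (i_1 < i_2 < ... < i_r) labelling the basis
    e_{i_1} ^ e_{i_2} ^ ... ^ e_{i_r} of wedge^r V, where dim V = n."""
    return list(combinations(range(1, n + 1), r))


n = 4
for r in range(n + 2):  # r = n + 1 is included to show wedge^{n+1} V = 0
    print(r, len(exterior_basis(n, r)), comb(n, r))

total = sum(len(exterior_basis(n, r)) for r in range(n + 1))
print("dim E(V) =", total, "and 2**n =", 2 ** n)
```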

Exercises

7.3.1 Show that Z5 ⊗Z Z3 is the trivial group.

7.3.2 Show that A ⊗Z Zm is trivial whenever A is divisible. Deduce that Q ⊗Z Zm is trivial.

7.3.3 Let A be an abelian group of exponent m. Show that A ⊗Z Zm is isomorphic to A.

7.3.4 Show that tensoring takes a split exact sequence to a split exact sequence.

7.3.5 Find Q ⊗Z Q.

7.3.6 Let R be a commutative ring. Show that HomR(A ⊗R B, C) is isomorphic to HomR(A, HomR(B, C)).

7.3.7 Show that the definition of determinant in this section agrees with the definition of determinant given in the previous chapter, and establish all properties of determinant using this definition.

7.3.8 Let V be a vector space of dimension at least 2. Show that a linear transformation T on V is an isomorphism if and only if $\bigwedge^r T$ is an isomorphism for all r ≥ 2.

7.4 Lower K-theory

In this section, we shall introduce and discuss the functors $K_0$ and $K_1$ from the category of rings to the category of abelian groups. Let $R$ be a ring with identity. Let $\wp(R)$ denote the set of isomorphism classes of finitely generated projective left $R$-modules (note that $\wp(R)$ is, indeed, a set, since every such module is isomorphic to a direct summand of some $R^n$). The isomorphism class of a projective module $P$ will be denoted by $[P]$. Let $K_0(R)$ denote the abelian group generated by $\wp(R)$ subject to the relation

[P]+[Q]=[P ⊕ Q].

More precisely, $K_0(R) = F/N$, where $F$ is the free abelian group with basis $\wp(R)$, and $N$ is the subgroup of $F$ generated by the set of elements of the type $[P] + [Q] - [P \oplus Q]$. The group $K_0(R)$ is also called the Grothendieck group of the ring $R$. It is also called the Grothendieck group of the category $\mathcal{P}_R$ of finitely generated projective left modules over $R$. The coset $[P] + N$ is denoted by $\langle P \rangle$. Thus, the elements of the type $\langle P \rangle$ generate $K_0(R)$. Since $\langle P \rangle + \langle Q \rangle = \langle P \oplus Q \rangle$, any element of $K_0(R)$ is expressible as $\langle P \rangle - \langle Q \rangle$.

Definition 7.4.1 Two finitely generated projective $R$-modules $P$ and $Q$ are said to be stably isomorphic if $P \oplus R^n$ is isomorphic to $Q \oplus R^n$ for some $n$. A projective $R$-module $P$ is said to be stably free if it is stably isomorphic to a free $R$-module.

Remark 7.4.2 Clearly, isomorphic projective modules are stably isomorphic. However, two stably isomorphic projective modules need not be isomorphic. For example, let $V$ be an infinite-dimensional vector space over a field $F$, and $R$ the ring of endomorphisms of $V$. Then the module $R$ over $R$ is isomorphic to the module $R \oplus R$ (check it). As such, the trivial module is stably isomorphic to the module $R$ (since $\{0\} \oplus R \approx R \oplus R$), but $R$ is not isomorphic to the trivial module.

Proposition 7.4.3 $\langle P \rangle = \langle Q \rangle$ if and only if $P$ and $Q$ are stably isomorphic. In turn, $\langle P \rangle - \langle Q \rangle = \langle P' \rangle - \langle Q' \rangle$ if and only if $P \oplus Q'$ is stably isomorphic to $P' \oplus Q$.

Proof Suppose that $P$ is stably isomorphic to $Q$. Then $[P \oplus R^n] = [Q \oplus R^n]$ for some $n$. This means that $\langle P \rangle + \langle R^n \rangle = \langle Q \rangle + \langle R^n \rangle$, and cancelling in the group $K_0(R)$ shows that $\langle P \rangle = \langle Q \rangle$. Conversely, suppose that $\langle P \rangle = \langle Q \rangle$. Then $[P] - [Q] \in N$. Since $N$ is the subgroup of $F$ generated by the elements of the type $[P] + [Q] - [P \oplus Q]$, there exist elements $[P_i], [Q_i]$, $i = 1, 2, \ldots, n$, and $[L_j], [M_j]$, $j = 1, 2, \ldots, m$, in $\wp(R)$ such that
$$[P] - [Q] = \sum_{i=1}^{n} ([P_i] + [Q_i] - [P_i \oplus Q_i]) - \sum_{j=1}^{m} ([L_j] + [M_j] - [L_j \oplus M_j])$$
in $F$. Equivalently,
$$[P] + \sum_{i=1}^{n} [P_i \oplus Q_i] + \sum_{j=1}^{m} ([L_j] + [M_j]) = [Q] + \sum_{j=1}^{m} [L_j \oplus M_j] + \sum_{i=1}^{n} ([P_i] + [Q_i]).$$

Since $F$ is free abelian on $\wp(R)$, there is a bijective correspondence between the terms of the sum on the LHS and the terms of the sum on the RHS such that corresponding terms represent the same element of $\wp(R)$. Taking the direct sum of the modules on each side, this ensures the existence of a finitely generated projective module $U$ (the direct sum of all the $P_i, Q_i, L_j, M_j$) such that $P \oplus U$ is isomorphic to $Q \oplus U$. Since $U$ is finitely generated and projective, there is a module $V$ such that $U \oplus V$ is isomorphic to $R^n$ for some $n$. But then $P \oplus R^n$ is isomorphic to $Q \oplus R^n$. This shows that $P$ is stably isomorphic to $Q$. The rest follows immediately. ∎
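Proposition 7.4.3 says that equality in $K_0(R)$ is equality after adding a common module $U$. The same recipe defines the Grothendieck group of any commutative monoid: $\langle p \rangle - \langle q \rangle = \langle p' \rangle - \langle q' \rangle$ exactly when $p + q' + u = p' + q + u$ for some $u$. In the Python sketch below, a finite commutative monoid (a list of elements plus an addition function) stands in for the isomorphism classes in $\wp(R)$ under $\oplus$; this modelling is an assumption made for illustration, and all names are hypothetical. In the capped monoid the absorbing element $3$ behaves like a module $U$ with $P \oplus U \approx U$, so formal differences collapse, in the spirit of the isomorphism $R \approx R \oplus R$ of Remark 7.4.2.

```python
def same_class(diff1, diff2, elements, add):
    """Decide whether <p> - <q> equals <p2> - <q2> in the Grothendieck group.

    With diff1 = (p, q) and diff2 = (p2, q2), the test is
    p + q2 + u == p2 + q + u for some u in the finite commutative monoid.
    """
    (p, q), (p2, q2) = diff1, diff2
    return any(add(add(p, q2), u) == add(add(p2, q), u) for u in elements)


def add_mod3(a, b):
    """Z/3 under addition: cancellative, so no classes collapse."""
    return (a + b) % 3


def add_capped(a, b):
    """{0, 1, 2, 3} with addition capped at 3: the element 3 absorbs everything."""
    return min(a + b, 3)


print(same_class((1, 0), (0, 0), range(3), add_mod3))    # False
print(same_class((2, 1), (1, 0), range(3), add_mod3))    # True
print(same_class((1, 0), (0, 0), range(4), add_capped))  # True
```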

Let $f$ be a homomorphism from a ring $R_1$ to a ring $R_2$. A ring homomorphism is always assumed to preserve the identities of the rings. $R_2$ can be treated as a right $R_1$-module by defining $r \cdot a = rf(a)$, $r \in R_2$, $a \in R_1$. In fact, $R_2$ is then an $(R_2, R_1)$-bimodule. If $M$ is a left $R_1$-module, then $R_2 \otimes_{R_1} M$ is a left $R_2$-module.

We denote this R2-module by f(M). Since tensor product distributes over direct sum, the following assertions can be easily verified.

(i) $f(P \oplus Q) \approx f(P) \oplus f(Q)$,
(ii) $f$ takes finitely generated modules to finitely generated modules,
(iii) $f$ takes free modules to free modules,
(iv) $f$ takes projective modules to projective modules,
(v) $f$ defines a map from $\wp(R_1)$ to $\wp(R_2)$, and it respects the relation $[P] + [Q] = [P \oplus Q]$.

In turn, $f$ induces a homomorphism $K_0(f)$ from $K_0(R_1)$ to $K_0(R_2)$ given by $K_0(f)(\langle P \rangle - \langle Q \rangle) = \langle f(P) \rangle - \langle f(Q) \rangle$. Further, (i) $K_0(I_R) = I_{K_0(R)}$, and (ii) $K_0(g \circ f) = K_0(g) \circ K_0(f)$. In the language of category theory, this says that $K_0$ is a functor from the category of rings to the category of abelian groups. If $R$ is a commutative ring and $P$ and $Q$ are finitely generated projective modules, then $P \otimes Q$ is again a finitely generated projective module. Since tensor product distributes over direct sum, we have a product $\cdot$ on $K_0(R)$ given by
$$(\langle P \rangle - \langle Q \rangle) \cdot (\langle P' \rangle - \langle Q' \rangle) = \langle P \otimes P' \rangle + \langle Q \otimes Q' \rangle - \langle P \otimes Q' \rangle - \langle Q \otimes P' \rangle.$$
It follows that $K_0(R)$ is a commutative ring. Thus, $K_0$ also defines a functor from the category of commutative rings to the category of commutative rings.

Proposition 7.4.4 Let $R$ be a ring such that the following hold:
(i) $R^n$ is isomorphic to $R^m$ if and only if $n = m$.
(ii) Every finitely generated projective module is free.

Then K0(R) is isomorphic to the group of integers.

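Proof Under the hypothesis of the proposition, $\wp(R) = \{[R^n] \mid n \ge 0\}$, and $R^n$ is stably isomorphic to $R^m$ if and only if $n = m$ (since $R^n \oplus R^k \approx R^m \oplus R^k$ forces $n + k = m + k$). Thus, distinct classes $[R^n]$ give distinct elements $\langle R^n \rangle$. The map $\eta$ from the set $\{\langle R^n \rangle \mid n \ge 0\}$ to $\mathbb{N} \cup \{0\}$ defined by $\eta(\langle R^n \rangle) = n$ is a bijective map such that $\eta(\langle R^n \rangle + \langle R^m \rangle) = \eta(\langle R^n \rangle) + \eta(\langle R^m \rangle)$. Suppose that $\langle R^n \rangle - \langle R^m \rangle = \langle R^r \rangle - \langle R^s \rangle$. Then $\langle R^{n+s} \rangle = \langle R^{m+r} \rangle$. This means that $n + s = m + r$. Thus, $\eta$ can be extended to a map $\eta$ from $K_0(R)$ to $\mathbb{Z}$ given by $\eta(\langle R^n \rangle - \langle R^m \rangle) = n - m$. Clearly, $\eta$ is an isomorphism. ∎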

Corollary 7.4.5 (i) If D is a division ring, then K0(D) is the group of integers. (ii) If R is a principal ideal domain, then K0(R) is the group of integers. (iii) If R is a local ring, then K0(R) is the group of integers.

Proof In each of the cases, the hypothesis of the above proposition is satisfied (for local rings, see Exercises 5, 6, and 7 of Sect. 7.2). ∎

Now, we introduce the functor $K_1$ from the category of rings to the category of abelian groups. Let $R$ be a ring with identity. Let us denote by $GL(n, R)$ the group of invertible $n \times n$ matrices with entries in $R$. There is a natural embedding of $GL(n, R)$ into $GL(n + 1, R)$ given by
$$A \mapsto \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix}.$$

Under this embedding, $GL(n, R)$ can be treated as a subgroup of $GL(n + 1, R)$. We get a chain of groups

GL(1, R) ⊆ GL(2, R) ⊆···⊆ GL(n, R) ⊆ GL(n + 1, R) ⊆······

The union of this chain is a group denoted by $GL(R)$, and it is called the general linear group over the ring $R$. The elementary $n \times n$ matrices $E_{ij}^{\lambda}$ ($i \neq j$, $\lambda \in R$; the identity matrix with $\lambda$ placed in the $(i, j)$ position), also called transvections, are members of $GL(n, R)$. The Steinberg relations still hold among these matrices. Let $E(n, R)$ denote the subgroup of $GL(n, R)$ generated by these elementary matrices. We have a chain

$$E(1, R) \subseteq E(2, R) \subseteq \cdots \subseteq E(n, R) \subseteq E(n + 1, R) \subseteq \cdots$$
of subgroups of $GL(R)$. The union $E(R)$ of this chain is a subgroup of $GL(R)$. Recall that for a field $F$, every matrix of determinant 1 can be reduced to the identity matrix using the elementary operations corresponding to transvections. In other words, the special linear group $SL(F) \subseteq E(F)$. Conversely, every element of $E(F)$ has determinant 1. Thus, $SL(F) = E(F)$.

Proposition 7.4.6 $E(R)$ is a perfect group in the sense that $[E(R), E(R)] = E(R)$.

Proof One of the Steinberg relations is $[E_{ij}^{\lambda}, E_{jl}^{\mu}] = E_{il}^{\lambda\mu}$ for distinct $i, j, l$. Taking $\lambda = 1$, we observe that every transvection $E_{il}^{\mu}$ is a member of $[E(R), E(R)]$ (passing to a larger $n$ if necessary, a third index $j$ distinct from $i$ and $l$ is always available). Hence, $[E(R), E(R)] = E(R)$. ∎
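The Steinberg relation used in this proof is easy to verify on small integer matrices. A minimal Python sketch, with 0-based indices and the commutator convention $[A, B] = ABA^{-1}B^{-1}$ (the convention used in the Whitehead lemma); the helper names are hypothetical:

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transvection(n, i, j, lam):
    """E_ij^lam: the n x n identity with lam in position (i, j), i != j (0-based)."""
    m = identity(n)
    m[i][j] = lam
    return m


def commutator(n, i, j, l, lam, mu):
    """[E_ij^lam, E_jl^mu] = E_ij^lam E_jl^mu (E_ij^lam)^(-1) (E_jl^mu)^(-1)."""
    a, b = transvection(n, i, j, lam), transvection(n, j, l, mu)
    # the inverse of E_ij^lam is E_ij^(-lam)
    a_inv, b_inv = transvection(n, i, j, -lam), transvection(n, j, l, -mu)
    return matmul(matmul(matmul(a, b), a_inv), b_inv)


# Steinberg relation for distinct i, j, l: [E_ij^lam, E_jl^mu] = E_il^(lam*mu)
print(commutator(4, 0, 1, 3, 2, 5) == transvection(4, 0, 3, 10))  # True
```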

Proposition 7.4.7 Matrices of the type

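$$\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}, \quad \begin{pmatrix} I & 0 \\ A & I \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$$
are members of $E(R)$.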

Proof The result follows if we observe that the matrices described in the proposition can be reduced to the identity matrix by applying the elementary operations associated to the matrices $E_{ij}^{\lambda}$. ∎

Corollary 7.4.8 Let $A \in GL(R)$. Then the matrix
$$\begin{pmatrix} A & 0 \\ 0 & A^{-1} \end{pmatrix}$$
is a member of $E(R)$.

Proof Follows from the above proposition and the identity

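$$\begin{pmatrix} A & 0 \\ 0 & A^{-1} \end{pmatrix} = \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} \begin{pmatrix} I & 0 \\ -A^{-1} & I \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}.$$
∎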
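The factorisation in the proof of Corollary 7.4.8 can be checked for a sample matrix. In this Python sketch, $A$ is an arbitrarily chosen $2 \times 2$ integer matrix of determinant 1 (an assumption that keeps $A^{-1}$ integral), and the helpers are hypothetical; the four factors are, in order, the matrices of Proposition 7.4.7 appearing in the identity.

```python
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def block(tl, tr, bl, br):
    """The block matrix [[tl, tr], [bl, br]] built from equal-size square blocks."""
    return [r1 + r2 for r1, r2 in zip(tl, tr)] + [r1 + r2 for r1, r2 in zip(bl, br)]


def neg(m):
    return [[-x for x in row] for row in m]


ident = [[1, 0], [0, 1]]
zero = [[0, 0], [0, 0]]
A = [[2, 1], [1, 1]]        # a sample invertible integer matrix
A_inv = [[1, -1], [-1, 2]]  # its inverse

factors = [
    block(ident, A, zero, ident),
    block(ident, zero, neg(A_inv), ident),
    block(ident, A, zero, ident),
    block(zero, neg(ident), ident, zero),
]
print(reduce(matmul, factors) == block(A, zero, zero, A_inv))  # True
```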



Lemma 7.4.9 Whitehead Lemma. [GL(R), GL(R)]=E(R).

Proof Already $E(R) = [E(R), E(R)] \subseteq [GL(R), GL(R)]$. Thus, it is sufficient to observe that any commutator $ABA^{-1}B^{-1}$ in $GL(n, R)$, treated as an element of $GL(2n, R)$, can be expressed as

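$$\begin{pmatrix} A & 0 \\ 0 & A^{-1} \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & B^{-1} \end{pmatrix} \begin{pmatrix} (BA)^{-1} & 0 \\ 0 & BA \end{pmatrix} = \begin{pmatrix} ABA^{-1}B^{-1} & 0 \\ 0 & I \end{pmatrix},$$
which is a member of $E(R)$ by Corollary 7.4.8. ∎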
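The identity in the proof can be confirmed on a concrete pair of non-commuting matrices. The Python sketch below (helper names hypothetical) uses two unimodular $2 \times 2$ integer matrices, so every inverse is again an integer matrix, and checks that the product of the three block-diagonal matrices equals $\mathrm{diag}(ABA^{-1}B^{-1}, I)$.

```python
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def block_diag(x, y):
    """diag(x, y) for equal-size square blocks x and y."""
    zeros = [0] * len(x)
    return [row + zeros for row in x] + [zeros + row for row in y]


def inverse_det1(m):
    """Inverse of a 2 x 2 integer matrix of determinant 1 (adjugate formula)."""
    (a, b), (c, d) = m
    assert a * d - b * c == 1, "this helper only handles determinant 1"
    return [[d, -b], [-c, a]]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
BA = matmul(B, A)
commutator = reduce(matmul, [A, B, inverse_det1(A), inverse_det1(B)])  # A B A^-1 B^-1

lhs = reduce(
    matmul,
    [block_diag(A, inverse_det1(A)), block_diag(B, inverse_det1(B)), block_diag(inverse_det1(BA), BA)],
)
print(commutator)                                       # [[3, -1], [1, 0]]
print(lhs == block_diag(commutator, [[1, 0], [0, 1]]))  # True
```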



Definition 7.4.10 The abelian group GL(R)/E(R) is called the Whitehead group of the ring R, and it is denoted by K1(R).
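As recalled in the discussion of $SL(F) = E(F)$, over a field an invertible matrix can be brought to $\mathrm{diag}(1, \ldots, 1, \det A)$ using only the operations $\text{row}_i \leftarrow \text{row}_i + \lambda\,\text{row}_j$ ($i \neq j$), that is, left multiplication by transvections $E_{ij}^{\lambda}$; this is why, over a field, the class of a matrix in $K_1$ is governed by its determinant. The Python sketch below is one way to carry out such a reduction over $\mathbb{Q}$ with exact `fractions.Fraction` arithmetic; the pivoting strategy and the names `add_row_multiple` and `reduce_by_transvections` are assumptions for illustration, not a procedure taken from the text. Indices are 0-based.

```python
from fractions import Fraction


def add_row_multiple(m, i, j, lam, ops):
    """Left-multiply by E_ij^lam: row_i <- row_i + lam * row_j (i != j)."""
    m[i] = [a + lam * b for a, b in zip(m[i], m[j])]
    ops.append((i, j, lam))


def reduce_by_transvections(matrix):
    """Reduce an invertible rational matrix to diag(1, ..., 1, det).

    Only transvection row operations are used, so the determinant is
    unchanged. Returns the reduced matrix and the operations (i, j, lam).
    """
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    ops = []
    for k in range(n - 1):
        below = [r for r in range(k + 1, n) if m[r][k] != 0]
        if not below:
            # among rows >= k only the pivot m[k][k] is nonzero in column k:
            # copy row k into row k + 1 to create a usable entry below it
            add_row_multiple(m, k + 1, k, Fraction(1), ops)
            below = [k + 1]
        r = below[0]
        if m[k][k] != 1:
            # choose lam so that the new pivot m[k][k] + lam * m[r][k] is 1
            add_row_multiple(m, k, r, (1 - m[k][k]) / m[r][k], ops)
        for i in range(n):
            if i != k and m[i][k] != 0:
                add_row_multiple(m, i, k, -m[i][k], ops)
    last = n - 1
    for i in range(last):
        if m[i][last] != 0:
            add_row_multiple(m, i, last, -m[i][last] / m[last][last], ops)
    return m, ops


reduced, ops = reduce_by_transvections([[0, 2, 1], [1, 0, 0], [3, 1, 1]])
print([[int(x) for x in row] for row in reduced])  # [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
```

The final diagonal entry is the determinant because every operation used has determinant 1.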

Thus, the Whitehead group $K_1(R)$ can be viewed as the group of equivalence classes of matrices in $GL(R)$, where two matrices $A$ and $B$ in $GL(R)$ are said to be equivalent if one can be obtained from the other by using elementary operations associated to transvections. For example, over fields, two matrices are equivalent if and only if they have the same determinant. Note that a nonsingular matrix $A$ with entries in a field $F$ can be reduced to the matrix $\mathrm{diag}(1, 1, \ldots, 1, \det A)$ by using the elementary operations associated to transvections. If $f$ is a homomorphism from a ring $R_1$ to a ring $R_2$, then it induces a map from $M_n(R_1)$ to $M_n(R_2)$ which takes $A = [a_{ij}]$ to $f(A) = [b_{ij}]$, where $b_{ij} = f(a_{ij})$. In fact, it maps $GL(R_1)$ to $GL(R_2)$, and $E(R_1)$ to $E(R_2)$. In turn, it induces a homomorphism $K_1(f)$ from $K_1(R_1)$ to $K_1(R_2)$. It can be easily observed that (i) $K_1(g \circ f) = K_1(g) \circ K_1(f)$, and (ii) $K_1(I_R) = I_{K_1(R)}$. In the language of category theory, $K_1$ defines another functor from the category of rings to the category of abelian groups. If $R$ is a commutative ring, then the determinant of a square matrix with entries in $R$ makes sense, and every element of $E(R)$ is of determinant 1. Thus, $E(R) \subseteq SL(R)$. In general, $E(R) \neq SL(R)$. We denote the group $SL(R)/E(R)$ by $SK_1(R)$. It follows that $K_1(R) = SK_1(R) \oplus U(R)$, where $U(R)$ is the group of units of $R$. For many commutative rings $R$, $SL(R) = E(R)$, and in such cases $K_1(R) = U(R)$. For example, if $R$ is a field, or a local ring, or a Euclidean domain, or the ring of integers in a number field, the matrices of determinant 1 can be reduced to the identity matrix by using elementary operations associated to the transvections $E_{ij}^{\lambda}$. Thus, in these cases $SL(R) = E(R)$, and so $K_1(R) = U(R)$. However, there are commutative rings in which a matrix of determinant 1 may not be reducible to the identity matrix by using elementary operations associated to transvections. For example, consider the ring $R = \mathbb{R}[x, y]/\mathfrak{a}$, where $\mathfrak{a}$ is the ideal generated by $x^2 + y^2 - 1$. Then the matrix
$$A = \begin{pmatrix} \tilde{x} & -\tilde{y} \\ \tilde{y} & \tilde{x} \end{pmatrix},$$
where $\tilde{x} = x + \mathfrak{a}$ and $\tilde{y} = y + \mathfrak{a}$, is a matrix of determinant $\tilde{x}^2 + \tilde{y}^2 = 1$. Using topological arguments (see "Algebraic K-Theory" by Milnor, p. 58), it can be shown that no power $A^k$ with $k \neq 0$ lies in $E(R)$. Thus, $SK_1(R)$ contains an element of infinite order.

Exercises

7.4.1 Compute K0(Z6), and also K1(Z6).

7.4.2 Find K0(R), where R is the ring of endomorphisms of an infinite-dimensional vector space V .

7.4.3 Determine K0(M2(C)).

7.4.4 Show that K0(R1 × R2) ≈ K0(R1) × K0(R2).

7.4.5 Determine K0(Z[i]) and K1(Z[i]).

Chapter 8 Field Theory, Galois Theory

This chapter is devoted to the theory of fields, Galois theory, geometric constructions by ruler and compass, and the theorem of Abel–Ruffini about polynomial equations of degree n, n ≥ 5. We also discuss cubic and biquadratic equations.

8.1 Field Extensions

Let $K$ be a subfield of a field $L$. Then we say that $L$ is a field extension of $K$. The notation $L/K$ is used to say that $L$ is a field extension of $K$. If $L$ is a field extension of $K$, then $(L, +)$ is a vector space over $K$ (the multiplication by scalars being the field multiplication). If the dimension of $(L, +)$ over $K$ is infinite, then we say that $L$ is an infinite extension of $K$. If the dimension of $(L, +)$ over $K$ is finite, then the dimension of $(L, +)$ over $K$ is called the degree of the extension, and it is denoted by $[L : K]$.

Proposition 8.1.1 Let $K$ be a finite field. Then the number of elements in $K$ is $p^n$ for some prime $p$ and some $n \in \mathbb{N}$.

Proof Since $K$ is a finite field, its characteristic is a prime $p$. The map $i \mapsto i \cdot 1$ is an injective homomorphism from the field $\mathbb{Z}_p$ to the field $K$. Thus, $\mathbb{Z}_p$ can be considered as a subfield of $K$, and since $K$ is finite, it is a finite-dimensional vector space over $\mathbb{Z}_p$. Suppose that the dimension of $K$ over $\mathbb{Z}_p$ is $n$. Then $K$, as a vector space over $\mathbb{Z}_p$, is isomorphic to $\mathbb{Z}_p^n$. This shows that $|K| = p^n$. ∎

Proposition 8.1.2 Let $L/K$ and $K/F$ be finite field extensions. Then $L/F$ is also a finite field extension, and $[L : F] = [L : K][K : F]$.

Proof Suppose that $[L : K] = n$ and $[K : F] = m$. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of the vector space $L$ over $K$, and $\{y_1, y_2, \ldots, y_m\}$ be a basis of $K$ over $F$. We show that $S = \{x_iy_j \mid 1 \le i \le n, 1 \le j \le m\}$ is a basis of $L$ over $F$. Let $x \in L$. Since $\{x_1, x_2, \ldots, x_n\}$ is a basis of $L$ over $K$, $x = a_1x_1 + a_2x_2 + \cdots + a_nx_n$ for some $a_1, a_2, \ldots, a_n$ in $K$. Further, since $\{y_1, y_2, \ldots, y_m\}$ is a basis of $K$ over $F$, $a_i = \sum_{j=1}^{m} \alpha_{ji}y_j$ for some $\alpha_{ji} \in F$. Thus, $x = \sum_{j,i} \alpha_{ji}x_iy_j$. This shows that $S$ generates the vector space $L$ over $F$. Now, we show that $S$ is linearly independent over $F$. Suppose that $\sum_{j,i} \alpha_{ji}x_iy_j = 0$. Then $\sum_{i=1}^{n} \left(\sum_{j=1}^{m} \alpha_{ji}y_j\right)x_i = 0$. Since $\{x_1, x_2, \ldots, x_n\}$ is linearly independent over $K$, $\sum_{j=1}^{m} \alpha_{ji}y_j = 0$ for all $i$. Again, since $\{y_1, y_2, \ldots, y_m\}$ is linearly independent over $F$, $\alpha_{ji} = 0$ for all $j, i$. ∎

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_8

Corollary 8.1.3 Let $L$ be a finite extension of $K$, and $F$ a subfield of $L$ containing $K$. Then $[L : F]$ divides $[L : K]$, and also $[F : K]$ divides $[L : K]$. ∎

Corollary 8.1.4 Let $L$ be a finite field containing $p^n$ elements. Let $K$ be a subfield of $L$. Then $K$ contains $p^m$ elements, where $m$ divides $n$.

Proof Since $K$ is a subfield of $L$, $\mathrm{char}\,K = \mathrm{char}\,L = p$. Thus, by Proposition 8.1.1, $K$ contains $p^m$ elements for some $m$. Suppose that $[L : K] = r$. Then $L$, as a vector space over $K$, is isomorphic to $K^r$, and so $p^n = |L| = (p^m)^r = p^{mr}$. Hence $n = mr$. ∎

Let $L$ be a field extension of $K$. Let $S$ be a subset of $L$. The subring of $L$ generated by $K \cup S$ will be denoted by $K[S]$. Clearly, $K[S]$ is the intersection of all subrings of $L$ containing $K \cup S$. The subfield of $L$ generated by $K \cup S$ is the intersection of all subfields containing $K \cup S$, and it is denoted by $K(S)$. Clearly, $K(S)$ is the field of fractions of $K[S]$. If $S$ is finite and $K(S) = L$, then we say that $L$ is a finitely generated field extension of $K$. In this case, the ring $K[S]$ is called a finitely generated ring extension of $K$. Let $S = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ be a finite subset of $L$, and let $f(X_1, X_2, \ldots, X_n)$ be a polynomial in $K[X_1, X_2, \ldots, X_n]$. Then $f(\alpha_1, \alpha_2, \ldots, \alpha_n)$ denotes the element of $L$ which is obtained by substituting $\alpha_i$ for $X_i$ in the polynomial $f(X_1, X_2, \ldots, X_n)$ for each $i$. It is clear that $f(\alpha_1, \alpha_2, \ldots, \alpha_n)$ belongs to each subring which contains $K \cup S$. Consider the map $\eta$ from $K[X_1, X_2, \ldots, X_n]$ to $L$ defined by $\eta(f(X_1, X_2, \ldots, X_n)) = f(\alpha_1, \alpha_2, \ldots, \alpha_n)$. Clearly, the map $\eta$ is a ring homomorphism whose image is the subring of $L$ generated by $K \cup S$. This subring is denoted by $K[\alpha_1, \alpha_2, \ldots, \alpha_n]$. By the fundamental theorem of homomorphism, $K[X_1, X_2, \ldots, X_n]/\ker\eta$ is isomorphic to $K[\alpha_1, \alpha_2, \ldots, \alpha_n]$. If $\ker\eta = \{0\}$, then we say that the set $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ is algebraically independent. More explicitly, $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ is said to be algebraically independent if there is no nonzero polynomial $f(X_1, X_2, \ldots, X_n)$ such that $f(\alpha_1, \alpha_2, \ldots, \alpha_n) = 0$. It is said to be algebraically dependent otherwise. It follows that $K[\alpha_1, \alpha_2, \ldots, \alpha_n]$ is the subfield $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$ of $L$ generated by $K \cup \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ if and only if $\ker\eta$ is a maximal ideal. We shall see that this is, indeed, the case if and only if each $\alpha_i$ is a root of a nonzero polynomial in $K[X]$. Such elements are called algebraic elements over $K$. We first consider the case when $n = 1$. Let $L$ be a field extension of $K$ and $\alpha \in L$. Consider the map $\eta$ from $K[X]$ to $K[\alpha]$ defined by $\eta(f(X)) = f(\alpha)$.
Then $\eta$ is a surjective ring homomorphism. Thus, by the fundamental theorem of homomorphism, $K[X]/\ker\eta$ is isomorphic to $K[\alpha]$. There are two cases:

(i) $\ker\eta = \{0\}$.
(ii) $\ker\eta \neq \{0\}$.

In case (i), $\eta$ is an isomorphism from $K[X]$ to $K[\alpha]$, and we say that $\alpha$ is a transcendental element over $K$. More explicitly, $\alpha$ is a transcendental element over $K$ if $\alpha$ is not a root of any nonzero polynomial $f(X) \in K[X]$. For example, $\pi$ and the exponential $e$ are transcendental over $\mathbb{Q}$. In case (ii), there is a nonzero polynomial $f(X) \in K[X]$ such that $f(\alpha) = 0$, and in this case we say that $\alpha$ is an algebraic element over $K$. For example, a primitive cube root of unity $\omega = e^{2\pi i/3}$, being a root of the nonzero polynomial $X^2 + X + 1 \in \mathbb{Q}[X]$, is algebraic over $\mathbb{Q}$. Suppose that $\alpha$ is an algebraic element over $K$. Then $\ker\eta \neq \{0\}$. Since $K[X]$ is a P.I.D., $\ker\eta$ is a nontrivial principal ideal. Suppose that $\ker\eta = \langle p(X) \rangle = K[X]p(X)$. Let $p'(X)$ be another polynomial in $K[X]$ such that $\ker\eta = \langle p'(X) \rangle = K[X]p'(X)$. Then $p(X)$ divides $p'(X)$, and also $p'(X)$ divides $p(X)$. In turn, $p'(X) = ap(X)$ for some nonzero element $a \in K$. Thus, there is a unique monic polynomial $p(X) \in K[X]$ such that $\ker\eta = \langle p(X) \rangle$. The unique monic polynomial $p(X)$, thus obtained, is called the minimum polynomial of $\alpha$ over $K$, and it is denoted by $\min_K(\alpha)(X)$. Clearly, the minimum polynomial $\min_K(\alpha)(X)$ of $\alpha$ over $K$ is the monic polynomial of smallest degree having $\alpha$ as a root. Conversely, suppose that $f(X)$ is a monic polynomial of smallest degree having $\alpha$ as a root. Then $\min_K(\alpha)(X)$ divides $f(X)$, since $f(X) \in \ker\eta$. By the division algorithm, there exist polynomials $q(X)$ and $r(X)$ such that $\min_K(\alpha)(X) = q(X)f(X) + r(X)$, where $r(X) = 0$ or else $\deg r(X) < \deg f(X)$. Hence $r(\alpha) = 0$. This implies that $r(X) = 0$, for otherwise we arrive at a contradiction to the supposition that $f(X)$ is a polynomial of smallest degree having $\alpha$ as a root. Hence $f(X)$ divides $\min_K(\alpha)(X)$. Since both are monic, it follows that $f(X) = \min_K(\alpha)(X)$.

Proposition 8.1.5 Let $L$ be a field extension of $K$, and $\alpha \in L$ be an algebraic element over $K$. Then the minimum polynomial $\min_K(\alpha)(X)$ of $\alpha$ over $K$ is an irreducible polynomial in $K[X]$, and the ideal $\langle \min_K(\alpha)(X) \rangle = K[X]\min_K(\alpha)(X)$ is a maximal ideal of $K[X]$.

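Proof Suppose that $\min_K(\alpha)(X) = f(X)g(X)$, where $f(X)$ and $g(X)$ are nonconstant polynomials in $K[X]$. Then $\deg f(X) < \deg \min_K(\alpha)(X)$ and $\deg g(X) < \deg \min_K(\alpha)(X)$. Since $f(\alpha)g(\alpha) = 0$, either $f(\alpha) = 0$ or $g(\alpha) = 0$, contradicting the fact that $\min_K(\alpha)(X)$ is a polynomial of smallest degree having $\alpha$ as a root. Hence $\min_K(\alpha)(X)$ is irreducible, and since $K[X]$ is a P.I.D., the ideal $\langle \min_K(\alpha)(X) \rangle$ is a maximal ideal. ∎

Proposition 8.1.6 Let $L$ be a field extension of $K$, and $\alpha \in L$. Then $\alpha$ is algebraic if and only if $K[\alpha] = K(\alpha)$. Further, if $\alpha$ is algebraic, then $[K(\alpha) : K] = \deg \min_K(\alpha)(X)$.

Proof Suppose that $\alpha$ is algebraic. The map $\eta$ from $K[X]$ to $K[\alpha]$ given by $\eta(f(X)) = f(\alpha)$ is a surjective homomorphism. From the fundamental theorem of homomorphism, $K[X]/\ker\eta$ is isomorphic to $K[\alpha]$. By the above proposition, $\ker\eta = \langle \min_K(\alpha)(X) \rangle$ is a maximal ideal. Hence $K[\alpha]$ is a field, and so $K[\alpha] = K(\alpha)$. Conversely, suppose that $K[\alpha] = K(\alpha)$. If $\alpha = 0$, then $\alpha$ is a root of $X$ and there is nothing to prove, so assume that $\alpha \neq 0$. Then $K[\alpha]$ is a subfield of $L$, and hence $\alpha^{-1} \in K[\alpha]$. Suppose that
$$\alpha^{-1} = a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_n\alpha^n.$$
Clearly,
$$f(X) = -1 + a_0X + a_1X^2 + \cdots + a_nX^{n+1}$$
is a nonzero polynomial (its constant term is $-1$) in $K[X]$ such that $f(\alpha) = 0$. It follows that $\alpha$ is algebraic. Finally, suppose that $\alpha$ is algebraic, and
$$\min_K(\alpha)(X) = a_0 + a_1X + a_2X^2 + \cdots + a_{n-1}X^{n-1} + X^n.$$
It is sufficient to show that the set $S = \{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}$ is a basis of $K(\alpha)$ over $K$. Since $K(\alpha) = K[\alpha]$, every element of $K(\alpha)$ is of the form $f(\alpha)$, where $f(X)$ is a polynomial in $K[X]$. By the division algorithm, there exist polynomials $q(X)$ and $r(X)$ such that $f(X) = q(X)\min_K(\alpha)(X) + r(X)$, where $r(X) = 0$, or else $\deg r(X) \le n - 1$. But then $f(\alpha) = r(\alpha)$. Thus, $f(\alpha) = r(\alpha)$ is a linear combination of members of $S$. This shows that $S$ generates the vector space $K(\alpha)$ over $K$. Next, suppose that $b_0 + b_1\alpha + b_2\alpha^2 + \cdots + b_{n-1}\alpha^{n-1} = 0$, where $b_i \in K$ for each $i$. Then each $b_i$ is $0$, for otherwise we get a nonzero polynomial
$$g(X) = b_0 + b_1X + b_2X^2 + \cdots + b_{n-1}X^{n-1}$$
of degree less than the degree of $\min_K(\alpha)(X)$ such that $g(\alpha) = 0$. This shows that $S$ is also linearly independent. ∎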
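The proof identifies $K(\alpha)$ with $K[X]/\langle \min_K(\alpha)(X) \rangle$: an element $f(\alpha)$ is represented by the remainder $r(X)$ of $f(X)$ on division by the minimum polynomial, and $\alpha$ is invertible in $K[\alpha]$ because the ideal is maximal. A minimal Python sketch over $K = \mathbb{Q}$, assuming polynomials are stored as coefficient lists from degree 0 upward (all function names are hypothetical). Inverses are computed with the extended Euclidean algorithm, which is one standard way, not necessarily the book's, of writing $f(\alpha)^{-1}$ as a polynomial in $\alpha$. The example uses $\min_{\mathbb{Q}}(\sqrt[3]{2})(X) = X^3 - 2$.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (p[i] is the coefficient of X^i)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Division algorithm in Q[X]: f = q*g + r with r = 0 or deg r < deg g."""
    f, g = trim(f), trim(g)
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = [Fraction(c) for c in f]
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = r[-1] / g[-1]
        q[shift] = c
        for i, gc in enumerate(g):
            r[shift + i] -= c * gc
        r = trim(r)
    return trim(q), r


def poly_mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def poly_sub(f, g):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([Fraction(a) - b for a, b in zip(f, g)])


def inverse_mod(f, m):
    """Coefficients of f(alpha)^(-1) in the basis 1, alpha, ..., alpha^(deg m - 1),
    where m is the (irreducible) minimum polynomial of alpha. Extended Euclid
    keeps the invariant r_k = s_k * f (mod m)."""
    r0, r1 = trim(m), poly_divmod(f, m)[1]
    if not r1:
        raise ZeroDivisionError("f(alpha) = 0 has no inverse")
    s0, s1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
    assert len(r0) == 1, "m must be irreducible, so the gcd is a nonzero constant"
    c = r0[0]
    return [a / c for a in poly_divmod(s0, m)[1]]


m = [-2, 0, 0, 1]  # X^3 - 2, the minimum polynomial of the real cube root of 2
print([str(c) for c in poly_divmod([0, 0, 0, 0, 0, 1], m)[1]])  # ['0', '0', '2']
print([str(c) for c in inverse_mod([1, 1], m)])                 # ['1/3', '-1/3', '1/3']
```

The two printed lines say that $\alpha^5 = 2\alpha^2$ and $(1 + \alpha)^{-1} = (1 - \alpha + \alpha^2)/3$ in $\mathbb{Q}(\sqrt[3]{2})$.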

Proposition 8.1.7 Let L be a field extension of K. An element α ∈ L is algebraic over K if and only if K(α) is a finite extension of K.

Proof Suppose that $\alpha$ is algebraic over $K$. From the above proposition, it follows that $K(\alpha)$ is a finite extension of $K$. Conversely, suppose that $K(\alpha)$ is a finite extension of degree $n$ over $K$. Then the dimension of $K(\alpha)$ over $K$ is $n$. Hence the $n + 1$ elements $1, \alpha, \alpha^2, \ldots, \alpha^n$ are linearly dependent. Thus, there exist $a_0, a_1, \ldots, a_n$, not all $0$, such that
$$a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_n\alpha^n = 0.$$

This gives us a nonzero polynomial
$$g(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n$$
such that $g(\alpha) = 0$. It follows that $\alpha$ is algebraic over $K$. ∎

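Proposition 8.1.8 Let $L$ be a field extension of $K$, and $\alpha_1, \alpha_2, \ldots, \alpha_n$ be elements of $L$ which are algebraic over $K$. Then $K(\alpha_1, \alpha_2, \ldots, \alpha_n) = K[\alpha_1, \alpha_2, \ldots, \alpha_n]$. In particular, $\ker\eta$ is a maximal ideal of $K[X_1, X_2, \ldots, X_n]$.

Proof The proof is by induction on $n$. For $n = 1$, the result is Proposition 8.1.6. Assume that the result is true for $n$. Let $\alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_{n+1}$ be elements of $L$ which are algebraic over $K$. By the induction hypothesis,
$$K[\alpha_1, \alpha_2, \ldots, \alpha_{n+1}] = K[\alpha_1, \alpha_2, \ldots, \alpha_n][\alpha_{n+1}] = K(\alpha_1, \alpha_2, \ldots, \alpha_n)[\alpha_{n+1}].$$
Since $\alpha_{n+1}$ is algebraic over $K$, it is also algebraic over $K(\alpha_1, \alpha_2, \ldots, \alpha_n)$. From Proposition 8.1.6,
$$K(\alpha_1, \alpha_2, \ldots, \alpha_n)[\alpha_{n+1}] = K(\alpha_1, \alpha_2, \ldots, \alpha_n)(\alpha_{n+1}) = K(\alpha_1, \alpha_2, \ldots, \alpha_{n+1}).$$
Hence $K[\alpha_1, \alpha_2, \ldots, \alpha_{n+1}] = K(\alpha_1, \alpha_2, \ldots, \alpha_{n+1})$. ∎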

Proposition 8.1.9 Let L be a field extension of K, and F be a subfield of L containing K. Let α ∈ L be algebraic over K. Then α is algebraic over F, and minF (α)(X) divides minK (α)(X) in F[X].

Proof Since $\min_K(\alpha)(X)$ belongs to $K[X]$, it also belongs to $F[X]$. Again, since $\alpha$ is a root of $\min_K(\alpha)(X)$, it is algebraic over $F$, and $\min_F(\alpha)(X)$ divides $\min_K(\alpha)(X)$ in $F[X]$. ∎

Definition 8.1.10 A field extension L of K is said to be an algebraic extension of K if every element of L is algebraic over K.

Evidently, we have the following proposition:

Proposition 8.1.11 Let $L$ be an algebraic extension of $K$ and $F$ be a subfield of $L$ containing $K$. Then $L$ is an algebraic extension of $F$. ∎

Proposition 8.1.12 Let L be a finite extension of K. Then L is an algebraic extension of K.

Proof Let $L$ be a finite extension of $K$ and $\alpha \in L$. Then $[K(\alpha) : K] \le [L : K] < \infty$. Thus, by Proposition 8.1.7, $\alpha$ is algebraic over $K$. ∎

Proposition 8.1.13 Let $L$ be a field extension of $K$. Let $L_0$ be the set of all elements of $L$ which are algebraic over $K$. Then $L_0$ is a subfield of $L$ containing $K$ which is the largest algebraic extension of $K$ contained in $L$.

Proof Every element $\alpha \in K$ is algebraic over $K$, for it is a root of $X - \alpha$. Thus $K \subseteq L_0$. Let $\alpha, \beta \in L_0$. Since $\alpha$ is algebraic over $K$, $K(\alpha)$ is a finite extension of $K$. Further, since $\beta$ is algebraic over $K$, it is also algebraic over $K(\alpha)$. Thus $K(\alpha, \beta) = K(\alpha)(\beta)$ is a finite extension of $K(\alpha)$. It follows (Proposition 8.1.2) that $K(\alpha, \beta)$ is also a finite extension of $K$, and hence (Proposition 8.1.12) an algebraic extension of $K$. Hence $\alpha \pm \beta$ and, for $\beta \neq 0$, $\alpha\beta^{-1}$ are also algebraic over $K$, and so all of them belong to $L_0$. This shows that $L_0$ is a subfield. Clearly, this is the largest subfield of $L$ which is algebraic over $K$. ∎

Definition 8.1.14 The subfield $L_0$ of $L$ in the above proposition is called the algebraic closure of $K$ in $L$.

Corollary 8.1.15 Let $L$ be an algebraic extension of $F$, and $F$ be an algebraic extension of $K$. Then $L$ is an algebraic extension of $K$.

Proof Let $\alpha \in L$. Since $L$ is algebraic over $F$, $\alpha$ is algebraic over $F$. Let
$$\min_F(\alpha)(X) = X^n + a_1X^{n-1} + \cdots + a_n.$$
Then $\alpha$ is algebraic over $F' = K(a_1, a_2, \ldots, a_n)$. Clearly, $F'(\alpha)$ is a finite extension of $F'$, and $[F'(\alpha) : F'] = n$. Further, since $a_1$ is algebraic over $K$, $K(a_1)$ is a finite extension of $K$. Again, since $a_2$ is algebraic over $K$, and so also over $K(a_1)$, it follows that $K(a_1)(a_2) = K(a_1, a_2)$ is a finite extension of $K(a_1)$. In turn, it follows that $K(a_1, a_2)$ is a finite extension of $K$. Proceeding inductively, we find that $F'$ is a finite extension of $K$. But then $F'(\alpha)$ is also a finite extension of $K$. Hence every element of $F'(\alpha)$ is algebraic over $K$. In particular, $\alpha$ is algebraic over $K$. ∎

Definition 8.1.16 An extension $L$ of $K$ is called a simple extension if there is an element $\alpha \in L$ such that $L = K(\alpha)$. Such an element $\alpha$ is called a primitive element of the extension.

Theorem 8.1.17 Let $L$ be a finite extension of $K$. Then $L$ is simple over $K$ if and only if there are only finitely many intermediate fields between $K$ and $L$.

Proof Suppose that $L = K(\alpha)$ is a finite simple extension of $K$. Let $F$ be a subfield of $L$ containing $K$. Then $\alpha$ is algebraic over $K$ and also over $F$. Clearly, $\min_F(\alpha)(X)$ is a divisor of $\min_K(\alpha)(X)$. We show that $F$ is uniquely determined by the factor $\min_F(\alpha)(X)$ of $\min_K(\alpha)(X)$. Suppose that
$$\min_F(\alpha)(X) = a_0 + a_1X + a_2X^2 + \cdots + a_{n-1}X^{n-1} + X^n,$$
where $a_i \in F$. Consider the subfield $K' = K(a_0, a_1, \ldots, a_{n-1})$ of $F$. Then $\min_{K'}(\alpha)(X) = \min_F(\alpha)(X)$, and so, since $L = K'(\alpha) = F(\alpha)$, $[L : F] = [L : K']$. As $K' \subseteq F$, it follows that $F = K'$. This shows that $F$ is uniquely determined by $\min_F(\alpha)(X)$. Since $\min_K(\alpha)(X)$ has only finitely many monic divisors in $L[X]$, there are only finitely many intermediate fields between $K$ and $L$.

Conversely, suppose that there are only finitely many intermediate fields. We have to show that $L$ is a simple extension of $K$. Suppose first that $K$ is a finite field, and $L$ is a finite extension of $K$. Then $L$ is also a finite field. Further, we know that the multiplicative group $L^*$ of nonzero elements of $L$ is cyclic. Suppose that it is generated by $\alpha$. Then it is clear that $L = K(\alpha)$. Assume, now, that $K$ is infinite. Since $[L : K]$ is finite, $L$ is a finitely generated extension of $K$. Suppose that $L = K(\alpha_1, \alpha_2, \ldots, \alpha_n)$. The proof is by induction on $n$. If $n = 1$, then there is nothing to do. Suppose that the result is true for $n$, and that $L = K(\alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_{n+1})$. Then, by the induction hypothesis (there are also only finitely many intermediate fields between $K$ and $K(\alpha_1, \ldots, \alpha_n)$), $K(\alpha_1, \alpha_2, \ldots, \alpha_n) = K(\alpha)$ for some $\alpha \in L$. Clearly, $L = K(\alpha, \alpha_{n+1})$. Consider the set $\{K(a\alpha + \alpha_{n+1}) \mid a \in K\}$. Since there are only finitely many intermediate fields between $K$ and $L$, and $K$ is infinite, there are distinct elements $a, b \in K$ such that $K(a\alpha + \alpha_{n+1}) = K(b\alpha + \alpha_{n+1}) = F$ (say). But then $\alpha = ((a\alpha + \alpha_{n+1}) - (b\alpha + \alpha_{n+1}))(a - b)^{-1}$ belongs to $F = K(a\alpha + \alpha_{n+1})$. This also shows that $\alpha_{n+1} = (a\alpha + \alpha_{n+1}) - a\alpha \in K(a\alpha + \alpha_{n+1})$. Hence $L = K(\alpha, \alpha_{n+1}) = K(a\alpha + \alpha_{n+1})$. The proof is complete. ∎

Now, we have some examples.

Example 8.1.18 The field $\mathbb{C}$ of complex numbers is an extension field of $\mathbb{R}$. Since $\{1, i\}$ is a basis of the vector space $\mathbb{C}$ over $\mathbb{R}$, we have $[\mathbb{C} : \mathbb{R}] = 2$. The field $\mathbb{R}$ of real numbers is an extension of the field $\mathbb{Q}$ of rational numbers. The dimension of $\mathbb{R}$ considered as a vector space over $\mathbb{Q}$ is infinite (since $\mathbb{Q}^n$ is countable, any finite-dimensional vector space over $\mathbb{Q}$ is countable, but $\mathbb{R}$ is uncountable).
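$\sqrt{2}$ is an element of $\mathbb{R}$ which is algebraic over $\mathbb{Q}$, for it is a root of the polynomial $X^2 - 2$. Also, since $X^2 - 2$ is the smallest degree monic polynomial over $\mathbb{Q}$ of which $\sqrt{2}$ is a root, $\min_{\mathbb{Q}}(\sqrt{2})(X) = X^2 - 2$. Further, $\mathbb{Q}(\sqrt{2}) = \mathbb{Q}[\sqrt{2}] = \{a + b\sqrt{2} \mid a, b \in \mathbb{Q}\}$. Note that $[\mathbb{Q}(\sqrt{2}) : \mathbb{Q}] = 2$.

Example 8.1.19 Let $\omega$ be a primitive cube root of unity. Then $\omega$ is algebraic over $\mathbb{Q}$, for it is a root of the polynomial $X^3 - 1$. Further, since $\omega \notin \mathbb{Q}$, and it is also a root of $X^2 + X + 1$, it follows that $X^2 + X + 1$ is the minimum polynomial of $\omega$ over $\mathbb{Q}$. Clearly, $\mathbb{Q}(\omega) = \mathbb{Q}[\omega] = \{a + b\omega \mid a, b \in \mathbb{Q}\}$, and $[\mathbb{Q}(\omega) : \mathbb{Q}] = 2$.

Example 8.1.20 $\sqrt[3]{2}$ and $\omega$ are algebraic over $\mathbb{Q}$. Hence $\mathbb{Q}[\sqrt[3]{2}, \omega] = \mathbb{Q}(\sqrt[3]{2}, \omega)$ is a finite extension of $\mathbb{Q}$. We show that it is a simple extension of $\mathbb{Q}$ (in fact, we shall show later that every finite extension of a field of characteristic 0 is simple). We find an element $\alpha \in \mathbb{C}$ such that $\mathbb{Q}(\alpha) = \mathbb{Q}(\sqrt[3]{2}, \omega)$. Observe that $\sqrt[3]{2} + \omega$ belongs to $\mathbb{Q}(\sqrt[3]{2}, \omega)$. Put $\alpha = \sqrt[3]{2} + \omega$. We show that $\mathbb{Q}(\alpha) = \mathbb{Q}(\sqrt[3]{2}, \omega)$. We have $(\alpha - \omega)^3 = 2$. Using the binomial expansion and the facts that $\omega^3 = 1$ and $\omega^2 = -1 - \omega$, we find that
$$\omega = (\alpha^3 - 3\alpha - 3)(3\alpha^2 + 3\alpha)^{-1}.$$
Note that $\alpha^2 + \alpha \neq 0$. Thus, $\omega \in \mathbb{Q}(\alpha)$, and so $\sqrt[3]{2} = \alpha - \omega$ also belongs to $\mathbb{Q}(\alpha)$. This shows that $\mathbb{Q}(\alpha) = \mathbb{Q}(\sqrt[3]{2}, \omega)$. Next, we find the minimum polynomial of $\alpha$, and also the degree $[\mathbb{Q}(\alpha) : \mathbb{Q}]$. Note that $\mathbb{Q}(\alpha)$ is an extension of $\mathbb{Q}(\sqrt[3]{2})$, and it is also an extension of $\mathbb{Q}(\omega)$. Since $[\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}] = 3$ ($X^3 - 2$ is the minimum polynomial of $\sqrt[3]{2}$) and $[\mathbb{Q}(\omega) : \mathbb{Q}] = 2$, it follows that 2 and 3 both divide $[\mathbb{Q}(\alpha) : \mathbb{Q}]$. Thus, 6 divides $[\mathbb{Q}(\alpha) : \mathbb{Q}]$. Next,
$$\alpha^3 - 3\alpha - 3 = \omega(3\alpha^2 + 3\alpha).$$
Putting the value of $\omega$ from the above equation in the equation $\omega^2 + \omega + 1 = 0$, and multiplying by $(3\alpha^2 + 3\alpha)^2$, we get that
$$(\alpha^3 - 3\alpha - 3)^2 + (3\alpha^2 + 3\alpha)^2 + (3\alpha^2 + 3\alpha)(\alpha^3 - 3\alpha - 3) = 0.$$
This gives us the monic polynomial of degree six
$$(X^3 - 3X - 3)^2 + (3X^2 + 3X)^2 + (3X^2 + 3X)(X^3 - 3X - 3) = X^6 + 3X^5 + 6X^4 + 3X^3 + 9X + 9$$
of which $\alpha$ is a root. Since $[\mathbb{Q}(\alpha) : \mathbb{Q}]$ is at least 6, it follows that this is the minimum polynomial of $\alpha$, and $[\mathbb{Q}(\alpha) : \mathbb{Q}] = 6$.

Example 8.1.21 Let $p$ and $q$ be distinct prime numbers. Then $\sqrt{p}$ and $\sqrt{q}$ are algebraic over $\mathbb{Q}$ with minimum polynomials $X^2 - p$ and $X^2 - q$, respectively. Clearly, $\mathbb{Q}(\sqrt{p}) = \{a + b\sqrt{p} \mid a, b \in \mathbb{Q}\}$, and $[\mathbb{Q}(\sqrt{p}) : \mathbb{Q}] = 2$. Also, $\sqrt{q} \notin \mathbb{Q}(\sqrt{p})$, and so $\sqrt{q}$ is algebraic over $\mathbb{Q}(\sqrt{p})$ with minimum polynomial $X^2 - q$. Thus,
$$[\mathbb{Q}(\sqrt{p}, \sqrt{q}) : \mathbb{Q}] = [\mathbb{Q}(\sqrt{p}, \sqrt{q}) : \mathbb{Q}(\sqrt{p})][\mathbb{Q}(\sqrt{p}) : \mathbb{Q}] = 4.$$
We show that $\mathbb{Q}(\sqrt{p}, \sqrt{q}) = \mathbb{Q}(\sqrt{p} + \sqrt{q})$. Note that $\mathbb{Q}(\sqrt{p} + \sqrt{q}) \subseteq \mathbb{Q}(\sqrt{p}, \sqrt{q})$. Put $\alpha = \sqrt{p} + \sqrt{q}$. Then
$$(\alpha - \sqrt{p})^2 = \alpha^2 + p - 2\alpha\sqrt{p} = q.$$
This shows that $2\alpha\sqrt{p} = \alpha^2 + p - q$ belongs to $\mathbb{Q}(\alpha)$. Since $\alpha \neq 0$, $\sqrt{p} \in \mathbb{Q}(\alpha)$. Similarly, $\sqrt{q} \in \mathbb{Q}(\alpha)$. This shows that $\mathbb{Q}(\alpha) = \mathbb{Q}(\sqrt{p}, \sqrt{q})$. Next,
$$(\alpha^2 + p - q)^2 - 4p\alpha^2 = 0.$$
This shows that $\alpha$ is a root of the monic polynomial $X^4 - 2(p + q)X^2 + (p - q)^2$ of degree 4, which, since $[\mathbb{Q}(\alpha) : \mathbb{Q}] = 4$, is the minimum polynomial of $\alpha$.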
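The identities in Examples 8.1.20 and 8.1.21 can be sanity-checked numerically. The Python sketch below works in floating point, so it only confirms them up to rounding error, not as exact algebraic identities. It takes the sample primes $p = 2$, $q = 3$ for Example 8.1.21, and uses the expanded forms $X^6 + 3X^5 + 6X^4 + 3X^3 + 9X + 9$ of the sextic in Example 8.1.20 and $X^4 - 2(p+q)X^2 + (p-q)^2$ of $(X^2 + p - q)^2 - 4pX^2$.

```python
import cmath
import math

omega = cmath.exp(2j * math.pi / 3)  # primitive cube root of unity
alpha = 2 ** (1 / 3) + omega         # real cube root of 2 plus omega

# Example 8.1.20: omega recovered from alpha alone
omega_from_alpha = (alpha**3 - 3 * alpha - 3) / (3 * alpha**2 + 3 * alpha)


def sextic(x):
    """Expanded form of (x^3-3x-3)^2 + (3x^2+3x)^2 + (3x^2+3x)(x^3-3x-3)."""
    return x**6 + 3 * x**5 + 6 * x**4 + 3 * x**3 + 9 * x + 9


# Example 8.1.21 with the sample primes p = 2, q = 3
p, q = 2, 3
beta = math.sqrt(p) + math.sqrt(q)
quartic_value = beta**4 - 2 * (p + q) * beta**2 + (p - q) ** 2

# each value below is zero up to floating-point rounding
print(abs(omega_from_alpha - omega))
print(abs(sextic(alpha)))
print(abs(quartic_value))
```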

Example 8.1.22 Let $p$ be a positive prime integer. By the Eisenstein irreducibility criterion, $X^n - p$ is irreducible in $\mathbb{Q}[X]$. Hence $\sqrt[n]{p}$ is an algebraic element over $\mathbb{Q}$ of which $X^n - p$ is the minimum polynomial. It follows that $[\mathbb{Q}(\sqrt[n]{p}) : \mathbb{Q}] = n$.
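The Eisenstein criterion used in Example 8.1.22 is a finite check on coefficients. A short Python sketch (the function names are hypothetical) that tests the criterion at a prime $p$ for an integer polynomial given by its coefficient list, and confirms it for $X^n - p$ over a small range of $n$ and $p$. A failed test proves nothing about reducibility; only a successful one gives irreducibility over $\mathbb{Q}$.

```python
def is_eisenstein(coeffs, p):
    """Eisenstein criterion at the prime p for an integer polynomial.

    coeffs[i] is the coefficient of X^i. Returns True when p does not divide
    the leading coefficient, p divides every other coefficient, and p^2 does
    not divide the constant term.
    """
    *lower, leading = coeffs
    return (
        leading % p != 0
        and all(c % p == 0 for c in lower)
        and coeffs[0] % (p * p) != 0
    )


def x_n_minus_p(n, p):
    """Coefficient list of X^n - p."""
    return [-p] + [0] * (n - 1) + [1]


print(all(is_eisenstein(x_n_minus_p(n, p), p) for n in range(1, 8) for p in (2, 3, 5, 7)))  # True
print(is_eisenstein([-4, 0, 1], 2))  # False: X^2 - 4 fails, since 2^2 divides 4
```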

Example 8.1.23 $\pi$ and the exponential $e$ are transcendental over $\mathbb{Q}$ (not algebraic). It was Hermite who proved the transcendence of $e$ in 1873. The irrationality of $e$ had been established much earlier, by Euler. The transcendence of $\pi$ was established by Lindemann. The proof of this fact can be found in Algebra by S. Lang. You may also refer to the corollary following Theorem 8.7.5. It is not known if $\pi$ is transcendental over $\mathbb{Q}(e)$, or equivalently, it is not known if $e$ is transcendental over $\mathbb{Q}(\pi)$.

Proposition 8.1.24 Let $K(X)$ be the field of fractions of the polynomial ring $K[X]$ (this is also called the function field over $K$ in one variable). Let $T \in K(X) - K$. Then $K(X)$ is an algebraic extension of $K(T)$, and $T$ is transcendental over $K$. Further, if $T = f(X)/g(X)$, where $f(X)$ and $g(X)$ are co-prime in $K[X]$, then $h(Y) = Tg(Y) - f(Y) \in K(T)[Y]$ is a minimum degree polynomial over $K(T)$ of which $X$ is a root. In turn, $[K(X) : K(T)] = \max(\deg f(X), \deg g(X))$.

Proof Put $L = K(T)$, where $T = f(X)/g(X)$. We find the minimum degree polynomial in $K(T)[Y]$ of which $X$ is a root. Consider the polynomial $h(Y) = Tg(Y) - f(Y)$ in $K(T)[Y]$. We first observe that $h(Y) \neq 0$ and $\deg h(Y) = \max(\deg f(X), \deg g(X))$. If not, then the leading terms of $Tg(Y)$ and $f(Y)$ would cancel. Now, the leading coefficient of $Tg(Y)$ is $Ta$ for some $a \in K$, and the leading coefficient of $f(Y)$ is $b$ for some $b \in K$. This would mean that $T = ba^{-1} \in K$, a contradiction to the supposition that $T \notin K$. Clearly, $X$ is a root of $h(Y)$, and so $X$ is algebraic over $K(T)$. This also shows that $K(X)$ is an algebraic extension of $K(T)$. It is sufficient, therefore, to show that $h(Y)$ is irreducible in $K(T)[Y]$. We first observe that $T$ is transcendental over $K$ (in other words, every element of $K(X) - K$ is transcendental over $K$). For if not, then $T$ would be algebraic over $K$, so that $K(T)$ would be an algebraic extension of $K$. Since $K(X)$ is already an algebraic extension of $K(T)$, this would mean (Corollary 8.1.15) that $K(X)$ is an algebraic extension of $K$, a contradiction to the fact that $X$ is transcendental over $K$. Thus, $K[T]$ is isomorphic to the polynomial ring over $K$. Since $f(X)$ and $g(X)$ are co-prime in $K[X]$, $h(Y) = Tg(Y) - f(Y)$ is a primitive polynomial of degree 1 in $K[Y][T]$. Hence it is irreducible in $K[Y][T] = K[T][Y]$, and so, by Gauss's lemma, it is also irreducible in $K(T)[Y]$. The result follows. ∎

Remark 8.1.25 Lüroth proved in 1876 a fundamental result in the theory of algebraic function fields: any subfield F of L = K(X) properly containing K is again of the form K(T), where, of course, T is transcendental over K. When K is an algebraically closed field, in the sense that there is no proper algebraic extension of K, a result of Castelnuovo says that every subfield of K(X, Y) properly containing K is of the form K(T) or of the form K(T, S). A result of this kind is not true for algebraic function fields in 3 variables.

Example 8.1.26 Let L = K(X) be the field of rational functions over K in one variable. We determine the Galois group of K(X) over K (the group of automorphisms of K(X) fixing the members of K). Every K-automorphism σ of K(X) is determined by T = σ(X), and K(T) = σ(K(X)) = K(X). So let T ∈ K(X) be such that K(X) = K(T). Then T ∉ K. Suppose that $T = \frac{f(X)}{g(X)}$, where f(X) and g(X) are co-prime in K[X]. Then [K(X) : K(T)] = 1, and so, from the previous proposition, max(deg f(X), deg g(X)) = 1. Thus, $T = \frac{aX+b}{cX+d}$ for some a, b, c, d ∈ K. Interchanging the roles of X and T, we observe that $X = \frac{uT+v}{pT+q}$ for some u, v, p, q ∈ K. Thus

$$X\left(p\,\frac{aX+b}{cX+d} + q\right) = u\,\frac{aX+b}{cX+d} + v,$$

or equivalently,

$$p(aX^2 + bX) + q(cX^2 + dX) = uaX + ub + vcX + vd.$$

Comparing the coefficients of the same powers of X, we obtain pa + qc = 0 = ub + vd and pb + qd = ua + vc. Call this common value λ. The left-hand side of the last equation equals X(pT + q)(cX + d), which is a product of nonzero elements of K(X). Hence λ ≠ 0. In matrix form,

$$\begin{pmatrix} p & q \\ u & v \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \lambda\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

and the right-hand side has determinant −λ² ≠ 0. This shows that the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a nonsingular 2 × 2 matrix, and so it belongs to the group GL(2, K). Conversely, given such a matrix in GL(2, K), we can solve for X in terms of $T = \frac{aX+b}{cX+d}$, namely $X = \frac{dT-b}{a-cT}$, and so K(X) = K(T). It follows that any element $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of GL(2, K) determines an element η(A) of G(K(X)/K) which takes X to $\frac{aX+b}{cX+d}$. This defines a surjective map η (say) from GL(2, K) to G(K(X)/K). Since substitution reverses the order of products, η(A)∘η(B) = η(BA). Hence A ↦ η(A⁻¹) is a surjective homomorphism from GL(2, K) to G(K(X)/K). Since $\frac{aX+b}{cX+d} = X$ if and only if a = d and b = 0 = c, the kernel of this homomorphism is the normal subgroup of GL(2, K) consisting of the scalar matrices. The subgroup of the scalar matrices is precisely the center of GL(2, K). The quotient group of GL(2, K) modulo its center is called the projective general linear group, and it is denoted by PGL(2, K). Thus, the group G(K(X)/K) of K-automorphisms of K(X) is isomorphic to PGL(2, K).

Exercises

8.1.1 Let K be a field, and let $K((X)) = \{\sum_{n=m}^{\infty} a_nX^n \mid a_n \in K,\ m \in \mathbb{Z}\}$ be the set of formal Laurent series with coefficients in K. Define + by

$$\sum_{n} a_nX^n + \sum_{n} b_nX^n = \sum_{n} (a_n + b_n)X^n,$$

and the multiplication · by

$$\left(\sum_{n=m}^{\infty} a_nX^n\right) \cdot \left(\sum_{n=r}^{\infty} b_nX^n\right) = \sum_{n=m+r}^{\infty} \left(\sum_{l=m}^{n-r} a_lb_{n-l}\right)X^n.$$

Show that K((X)) is a field with respect to the above operations, and that it is a field extension of K(X).

8.1.2 Give an example of an algebraic extension which is not a finite extension.

8.1.3 Show that $\mathbb{Q}(\sqrt[4]{5}, \sqrt{7})$ is a finite extension of Q. Find its degree. Find also a primitive element of the extension together with the minimum polynomial of that primitive element.

8.1.4 Find the minimum polynomial of a primitive 11th root of unity over Q.

8.1.5 Let L be a field extension of K. Let α1, α2, ..., αr be elements of L which are algebraic over K. Show that

$$[K(\alpha_1, \alpha_2, \ldots, \alpha_r) : K] \le \prod_{i=1}^{r} [K(\alpha_i) : K].$$

Show, by means of an example, that the strict inequality may hold. Find a sufficient condition for equality.

8.1.6 Show that the fields $\mathbb{Q}(\sqrt{p})$ and $\mathbb{Q}(\sqrt{q})$, where p and q are distinct primes, are not isomorphic, whereas they are isomorphic as vector spaces over Q.

8.1.7 Let L be an algebraic extension of K. Show that any subring of L which contains K is a field.

8.1.8 Let L be a finite field extension of K. Let F1 and F2 be intermediary fields. The smallest subfield of L containing F1 and F2 is called the composite of F1 and F2, and it is denoted by F1F2. Show that

[F1F2 : K] ≤ [F1 : K][F2 : K].

Further, show that equality holds provided that [F1 : K] and [F2 : K] are co-prime. Give an example to show that the strict inequality may hold.

8.1.9 Find the degree of $\mathbb{Q}(\sqrt[5]{2}, \sqrt{7})$ over Q, and also find a primitive element for the extension.

8.1.10 Let $\overline{\mathbb{Q}}$ be the algebraic closure of Q in C. Show that $[\overline{\mathbb{Q}} : \mathbb{Q}]$ is infinite.

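8.1.11 Let α be algebraic over K, and let m be a positive integer such that every prime divisor of [K(α) : K] is greater than m (for instance, m = 2 and [K(α) : K] odd). Show that $K(\alpha) = K(\alpha^m)$.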

8.1.12 Let F1 and F2 be field extensions of K contained in a common field. Show that the composite F1F2 is algebraic over K if and only if both F1 and F2 are algebraic over K.

8.1.13 Show, by means of an example, that [L : K] = 3 need not mean that $L = K(\sqrt[3]{\alpha})$ for some α ∈ K.

8.1.14 Show that sin(m°) is algebraic over Q for every rational number m.

8.2 Galois Extensions

To each polynomial f(X) ∈ K[X], Galois attached a group, called the Galois group of the polynomial f(X). It is essentially a group of permutations of the roots of f(X) in a certain extension field L of K over which f(X) splits into linear factors. He showed that the polynomial equation f(X) = 0 can be solved using field operations and radicals if and only if the Galois group of f(X) is a solvable group. Here, we follow a slightly different but equivalent approach due to Artin, in which we work with the Galois group of a field extension.

Let L and L′ be two extensions of K. A ring homomorphism f from L to L′ such that f(a) = a for all a ∈ K is called a K-homomorphism. Clearly, a K-homomorphism f from L to L′ is also a vector space homomorphism from L to L′, considered as vector spaces over K, for f(αa) = f(α)f(a) = αf(a) for all α ∈ K and a ∈ L. Since L is a field, ker f = {0} or ker f = L. Since f takes identity to identity, ker f ≠ L, and so ker f = {0}. This shows that f is injective. If f is also a bijection, then f is called a K-isomorphism. Thus, if [L : K] = [L′ : K] < ∞, then f is an isomorphism (an injective linear transformation between vector spaces of the same finite dimension is an isomorphism). In particular, if L is a finite field extension of K, then any K-homomorphism from L to L is an isomorphism. Such a map is called a K-automorphism of L.

Definition 8.2.1 Let L be a field extension of K. Then the set of all K-automorphisms of L forms a group under composition of maps. This group is called the Galois group of the extension L of K, and it is denoted by G(L/K).

Definition 8.2.2 Let L be a field, and X a subset of the group Aut(L). Let F(X) denote the set of all elements a ∈ L such that σ(a) = a for all σ ∈ X. Then F(X) is a subfield of L, and it is called the fixed field of X. It is clear that F(X) = F(<X>).

Observe that if X ⊆ G(L/K), then F(X) is an intermediary field of the field extension L of K. Let S(G(L/K)) denote the set of all subgroups of the Galois group G(L/K) of the field extension L of K, and let SF(L/K) denote the set of all intermediary fields. Then we have a map Φ from S(G(L/K)) to SF(L/K) defined by Φ(H) = F(H) (the fixed field of H), and a map Ψ from SF(L/K) to S(G(L/K)) defined by Ψ(M) = G(L/M). The aim of this and the following two sections is to show that, in case L is a finite extension of K, these two maps are inverses of each other if and only if K = F(G(L/K)).

Definition 8.2.3 Let L be an algebraic extension of K. We say that L is a Galois extension of K if K = F(G(L/K)).

Proposition 8.2.4 Let L be a field extension of K. Then,

1. K1 ⊆ K2, K1, K2 ∈ SF(L/K) ⟹ G(L/K2) ⊆ G(L/K1).
2. H1 ⊆ H2, H1, H2 ∈ S(G(L/K)) ⟹ F(H2) ⊆ F(H1).
3. S ⊆ G(L/K) ⟹ F(S) = F(<S>).
4. M ∈ SF(L/K) ⟹ M ⊆ F(G(L/M)).
5. S ⊆ G(L/K) ⟹ <S> ⊆ G(L/F(<S>)).
6. F(H) = F(G(L/F(H))) for all H ∈ S(G(L/K)).
7. G(L/M) = G(L/F(G(L/M))) for all M ∈ SF(L/K).

Proof 1 and 2 follow from the definitions. Since S ⊆ <S>, if a is fixed by every element of <S>, then it is also fixed by every element of S. Hence F(<S>) ⊆ F(S). Further, suppose that a is fixed by every element of S. Then it is fixed by every power (positive or negative) of elements of S, and so also by all products of such powers, that is, by every element of <S>. Thus, F(S) ⊆ F(<S>). This proves 3. Items 4 and 5 also follow from the definitions.

Now, we prove 6. Let a ∈ F(H). If σ ∈ G(L/F(H)), then by the definition σ(a) = a. This shows that a ∈ F(G(L/F(H))). Thus, F(H) ⊆ F(G(L/F(H))). Next, by 5, H ⊆ G(L/F(H)). By 2, F(G(L/F(H))) ⊆ F(H). This completes the proof of 6. Finally, we prove 7. From 4, M ⊆ F(G(L/M)), and so, by 1, G(L/F(G(L/M))) ⊆ G(L/M). Let σ ∈ G(L/M). Then, by the definition, σ fixes every member of F(G(L/M)), and hence it belongs to G(L/F(G(L/M))). This completes the proof of 7.

Example 8.2.5 G(C/R) = {I_C, σ}, where σ denotes the complex conjugation of C. To see this, let σ be a nonidentity R-automorphism of C, and let a + ib ∈ C. Then σ(a + ib) = σ(a) + σ(i)σ(b) = a + σ(i)b. Since σ is an automorphism, −1 = σ(−1) = σ(i²) = σ(i)σ(i) = (σ(i))². This shows that σ(i) = ±i. Since σ is nonidentity, σ(i) = −i. Thus, σ(a + ib) = a − ib, the complex conjugate of a + ib. The elements fixed by σ are exactly those with b = 0. It follows that R = F(G(C/R)), and so the extension C of R is a Galois extension.

Example 8.2.6 As in the above example, it can be seen that $G(\mathbb{Q}(\sqrt{2})/\mathbb{Q}) = \{I_{\mathbb{Q}(\sqrt{2})}, \sigma\}$, where $\sigma(a + b\sqrt{2}) = a - b\sqrt{2}$. It follows that $\mathbb{Q}(\sqrt{2})$ is also a Galois extension of Q.

Example 8.2.7 Consider the extension $\mathbb{Q}(\sqrt[3]{5})$ of Q, where $\sqrt[3]{5}$ is the real cube root of 5. It is not a Galois extension. We first show that $G(\mathbb{Q}(\sqrt[3]{5})/\mathbb{Q}) = \{I_{\mathbb{Q}(\sqrt[3]{5})}\}$. Clearly, $\mathbb{Q}(\sqrt[3]{5}) = \{a + b5^{1/3} + c5^{2/3} \mid a, b, c \in \mathbb{Q}\}$ is contained in R. The other cube roots of 5 are not in $\mathbb{Q}(\sqrt[3]{5})$, for they are not real numbers. If σ is an automorphism of $\mathbb{Q}(\sqrt[3]{5})$, then $5 = \sigma(5) = (\sigma(5^{1/3}))^3$, so $\sigma(5^{1/3})$ is a cube root of 5 lying in $\mathbb{Q}(\sqrt[3]{5})$. Thus, $\sigma(5^{1/3}) = 5^{1/3}$. Hence σ is the identity map. Thus, the Galois group of the extension is trivial, and $F(G(\mathbb{Q}(\sqrt[3]{5})/\mathbb{Q})) = \mathbb{Q}(\sqrt[3]{5}) \neq \mathbb{Q}$. Hence it is not a Galois extension.
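The key step in Example 8.2.7 is that only one of the three complex cube roots of 5 is real. The quick floating-point check below lists the roots $5^{1/3}\omega^k$, k = 0, 1, 2, and counts the real ones. It is an illustration, not a proof, and the tolerance 1e-12 is our own choice.

```python
import cmath

r = 5 ** (1 / 3)                    # the real cube root of 5
w = cmath.exp(2j * cmath.pi / 3)    # a primitive cube root of unity
roots = [r * w ** k for k in range(3)]

# Each r * w**k is a root of X^3 - 5 (up to rounding error).
assert all(abs(z ** 3 - 5) < 1e-9 for z in roots)

# Only k = 0 gives a real number, so only one root lies in Q(5^(1/3)), a subfield of R.
real_roots = [z for z in roots if abs(z.imag) < 1e-12]
print(len(real_roots))  # 1
```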

Any field automorphism σ of the field R of real numbers fixes Q. It also preserves order, because it maps squares to squares and hence positive numbers to positive numbers. Since Q is dense in R, σ is the identity automorphism. Thus, the Galois group of R over Q is trivial. In turn, it follows that R is not a Galois extension of Q.

Proposition 8.2.8 Let L1 and L2 be field extensions of K. Let σ be a K-homomorphism from L1 to L2. Let α be an element of L1 which is algebraic over K. Then σ(α) is also algebraic over K, and minK(α)(X) = minK(σ(α))(X). Further, if L1 = L2, then σ permutes the roots of minK(α)(X) which lie in L1.

Proof Since α is algebraic over K, there exists a nonzero polynomial f (X) ∈ K[X] such that f (α) = 0. Suppose that

$$f(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n,$$

where each ai ∈ K. Then, since σ is a K-homomorphism,

$$0 = \sigma(0) = \sigma(f(\alpha)) = \sum_{i=0}^{n} \sigma(a_i)\sigma(\alpha)^i = \sum_{i=0}^{n} a_i\sigma(\alpha)^i = f(\sigma(\alpha)).$$

Thus, σ(α) is also algebraic. Further, minK(α)(σ(α)) = 0, and so minK(σ(α))(X) divides minK(α)(X). Since the latter is irreducible and both are monic, minK(α)(X) = minK(σ(α))(X).

Definition 8.2.9 Let L be a field extension of K. We say that an element α ∈ L is K-conjugate to an element β ∈ L if there exists an element σ ∈ G(L/K) such that σ(α) = β.

Corollary 8.2.10 If α is an algebraic element of L over K, then there are at most deg(minK(α)(X)) conjugates of α in L over K.

Proof It is clear from the above result that if β is conjugate to α, then β is a root of minK(α)(X), and minK(α)(X) = minK(β)(X). Since a nonzero polynomial has at most as many roots as its degree, the result follows.

Example 8.2.11 The number of conjugates of an algebraic element may be strictly less than deg(minK(α)(X)). Consider the field Zp(X) of rational functions in one variable over the prime field Zp. Let

$$L = \mathbb{Z}_p(X)(X^{1/p}) = \mathbb{Z}_p(X)[X^{1/p}] = \{a_0(X) + a_1(X)X^{1/p} + \cdots + a_r(X)X^{r/p} \mid a_i(X) \in \mathbb{Z}_p(X),\ 0 \le r \le p-1\}.$$

Then it can easily be seen that L is a field with respect to the usual addition and multiplication of polynomials (subject to $(X^{1/p})^p = X$), and that it is a field extension of Zp(X). The element $X^{1/p}$ is algebraic over Zp(X), for it is a root of $Y^p - X$ in Zp(X)[Y]. By the Eisenstein irreducibility criterion (applied with the prime X of Zp[X]), $Y^p - X$ is irreducible in Zp(X)[Y], and so it is the minimum polynomial of $X^{1/p}$. Since $Y^p - X = (Y - X^{1/p})^p$ in characteristic p, it follows that $X^{1/p}$ is self-conjugate, and no other element is conjugate to it.

Proposition 8.2.12 Let L be a field extension of K. Let α and β be elements of L which are algebraic over K. Suppose that they are K-conjugate. Then there is a K-isomorphism from K(α) to K(β) which takes α to β.

Proof It follows from the results above that if α and β are conjugates, then minK(α)(X) = minK(β)(X). The map f(X) ↦ f(α) defines a surjective homomorphism from K[X] to K[α] = K(α) whose kernel is the ideal <minK(α)(X)>. Thus, by the fundamental theorem of homomorphism, we have an isomorphism σ from K[X]/<minK(α)(X)> to K(α) such that σ(f(X) + <minK(α)(X)>) = f(α). Clearly, σ(a + <minK(α)(X)>) = a for all a ∈ K, and σ(X + <minK(α)(X)>) = α. Similarly, we have an isomorphism τ from K[X]/<minK(β)(X)> = K[X]/<minK(α)(X)> to K(β) such that τ(f(X) + <minK(α)(X)>) = f(β). Clearly, τ∘σ⁻¹ is a K-isomorphism from K(α) to K(β) which takes α to β.

Corollary 8.2.13 Let α be an algebraic element of L over K. Then there is a bijective map η from G(K(α)/K) to the set of roots of minK(α)(X) which lie in K(α), defined by η(σ) = σ(α). In particular, |G(K(α)/K)| is the number of distinct roots of minK(α)(X) which are in K(α).

Proof If σ is a K-automorphism of K(α), then σ is uniquely determined by σ(α), which has the same minimum polynomial as α. Hence η is a well-defined injective map. Conversely, if β is a root of the minimum polynomial of α which belongs to K(α), then it follows from the above proposition that there is a K-isomorphism from K(α) to K(β) taking α to β. Since β ∈ K(α) and [K(α) : K] = deg(minK(α)(X)) = deg(minK(β)(X)) = [K(β) : K], it follows that K(β) = K(α). Thus, this K-isomorphism is a K-automorphism of K(α), and η maps it to β. The result follows.
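The identity $Y^p - X = (Y - X^{1/p})^p$ used in Example 8.2.11 rests on the fact that, in characteristic p, all the middle binomial coefficients of $(Y - a)^p$ vanish. In addition, $(-a)^p = -a^p$, which also holds for p = 2. The small Python check below (the function name is our own) tests this divisibility. It also shows that the property fails for composite exponents such as 4 and 6.

```python
from math import comb

def middle_binomials_vanish(n):
    """True when n divides C(n, k) for every 0 < k < n."""
    return all(comb(n, k) % n == 0 for k in range(1, n))

# For a prime p, (Y - a)^p = Y^p + (-a)^p in characteristic p.
print([p for p in range(2, 20) if middle_binomials_vanish(p)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```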

Example 8.2.14 In this example, we calculate the Galois group of $\mathbb{Q}(2^{1/3}, \omega) = \mathbb{Q}(2^{1/3} + \omega)$ over Q. Looking at the minimum polynomial of $2^{1/3} + \omega$ over Q, which we have already found in Example 8.1.20, we see that its other 5 roots are $2^{1/3} + \omega^2$, $2^{1/3}\omega + \omega$, $2^{1/3}\omega + \omega^2$, $2^{1/3}\omega^2 + \omega^2$, and $2^{1/3}\omega^2 + \omega$. All of them lie in $\mathbb{Q}(2^{1/3}, \omega)$, so, by Corollary 8.2.13, the Galois group of this extension is of order 6. Denote these six roots, in the order listed and beginning with $2^{1/3} + \omega$, by α1, α2, α3, α4, α5, α6. Let σi be the automorphism which takes α1 to αi. Then σ1 is the identity automorphism. It can be checked that $\sigma_2^2, \sigma_3^3, \sigma_4^2, \sigma_5^2, \sigma_6^3$ are all identity maps. A group of order 6 with three elements of order 2 is not cyclic. Thus, the Galois group of this extension is the symmetric group S3 of degree 3.
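As a numerical cross-check of Example 8.2.14, the Python sketch below multiplies out $\prod (X - \alpha_i)$ in floating point over the six values $2^{1/3}\omega^j + \omega^k$ (j = 0, 1, 2 and k = 1, 2), and then rounds the coefficients. The result is a monic polynomial with integer coefficients, $X^6 + 3X^5 + 6X^4 + 3X^3 + 9X + 9$. It should agree with the minimum polynomial found in Example 8.1.20. Floating-point agreement is evidence, not a proof.

```python
import cmath

c = 2 ** (1 / 3)                    # real cube root of 2
w = cmath.exp(2j * cmath.pi / 3)    # omega, a primitive cube root of unity

# The six values 2^(1/3) * w^j + w^k; the first one (j = 0, k = 1) is alpha_1.
alphas = [c * w ** j + w ** k for j in range(3) for k in (1, 2)]

coeffs = [1]  # coefficients of prod (X - alpha), highest degree first
for a in alphas:
    # multiply the current polynomial by (X - a)
    shifted = coeffs + [0]
    for i, b in enumerate(coeffs):
        shifted[i + 1] -= a * b
    coeffs = shifted

rounded = [round(z.real) for z in coeffs]
assert all(abs(z - r) < 1e-9 for z, r in zip(coeffs, rounded))  # all (near) integers
print(rounded)  # [1, 3, 6, 3, 0, 9, 9]
```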

Proposition 8.2.15 Let L be a finite field extension of K. Then the Galois group G(L/K) is finite.

Proof Let {α1, α2, ..., αr} be a basis of L over K. Clearly, L = K(α1, α2, ..., αr), and any K-automorphism of L is uniquely determined by its effect on α1, α2, ..., αr. Any K-automorphism of L takes an element of L to one of its conjugates, and an algebraic element α has at most deg(minK(α)(X)) conjugates. Hence a K-automorphism of L can take only finitely many values on each αi, and G(L/K) is finite.

Definition 8.2.16 Let K be a field and G a group. A group homomorphism from G to K* (the multiplicative group of nonzero members of K) is called a character of G in K. Thus, a character of G over K is just a one-dimensional representation of G over K.

Consider the set F(G, K) of all maps from G to K. This is a vector space over K with respect to pointwise addition and multiplication by scalars. If G is finite, the dimension of this space is the order |G| of G (verify). Let Ch(G, K) denote the set of all characters of G in K. Then Ch(G, K) ⊆ F(G, K). We have the following result due to Dedekind.

Theorem 8.2.17 (Dedekind) Ch(G, K) is a linearly independent subset of the vector space F(G, K).

Proof Suppose the contrary. Then there is a finite subset of Ch(G, K) which is linearly dependent. Let S = {σ1, σ2, ..., σn}, with σi ≠ σj for i ≠ j, be a minimal finite linearly dependent subset of Ch(G, K). Then there exist α1, α2, ..., αn in K, not all zero, such that

$$\alpha_1\sigma_1 + \alpha_2\sigma_2 + \cdots + \alpha_n\sigma_n = 0, \qquad (8.2.1)$$

where 0 in the RHS is the zero of F(G, K). This means that

$$\alpha_1\sigma_1(g) + \alpha_2\sigma_2(g) + \cdots + \alpha_n\sigma_n(g) = 0 \qquad (8.2.2)$$

for all g ∈ G. Indeed, all αi are nonzero, for otherwise we shall get a proper subset of S which is linearly dependent, a contradiction to the minimality of S. Moreover, n ≥ 2, since a single character takes only nonzero values. Since σ1 ≠ σ2, there exists h ∈ G such that σ1(h) ≠ σ2(h). Multiplying Eq. (8.2.2) by σ1(h), we get that

$$\sum_{i=1}^{n} \sigma_1(h)\alpha_i\sigma_i(g) = 0 \qquad (8.2.3)$$

for all g ∈ G. Further, substituting hg in place of g in Eq. (8.2.2), we get

$$\sum_{i=1}^{n} \alpha_i\sigma_i(hg) = 0$$

for all g ∈ G. Since each σi is a homomorphism, we have

$$\sum_{i=1}^{n} \alpha_i\sigma_i(h)\sigma_i(g) = 0 \qquad (8.2.4)$$

for all g ∈ G. Subtracting Eq. (8.2.4) from Eq. (8.2.3), we get that

$$\sum_{i=2}^{n} (\sigma_1(h) - \sigma_i(h))\alpha_i\sigma_i(g) = 0$$

for all g ∈ G. Put bi = (σ1(h) − σi(h))αi, i ≥ 2. Then, since σ1(h) ≠ σ2(h) and each αi ≠ 0, it follows that b2 ≠ 0. Also

b2σ2(g) + b3σ3(g) + ··· + bnσn(g) = 0 for all g ∈ G, where b2 ≠ 0. This means that

b2σ2 + b3σ3 + ··· + bnσn = 0, where 0 in the RHS is the zero of F(G, K). Thus, the proper subset {σ2, ..., σn} of S is linearly dependent. This is a contradiction to the minimality of S.

Corollary 8.2.18 Let L be a finite field extension of K. Then |G(L/K)| ≤ [L : K].

Proof We have already seen that, under the hypothesis of the corollary, G(L/K) is finite. Suppose that G(L/K) = {σ1, σ2, ..., σn}, where σi ≠ σj for i ≠ j. Suppose the contrary. Then [L : K] < n. Suppose that [L : K] = m and that {x1, x2, ..., xm} is a basis of L over K. Consider the elements C1, C2, ..., Cn of the set L^m consisting of the row vectors given by

Ci = (σi(x1), σi(x2),...,σi(xm)).

Since L^m is an m-dimensional vector space over L and n > m, the set {C1, C2, ..., Cn} is linearly dependent. Thus, there exist α1, α2, ..., αn in L, not all zero, such that

α1C1 + α2C2 + ··· + αnCn = 0, where 0 in the RHS is the zero of L^m. This means that

α1σ1(xj) + α2σ2(xj) + ··· + αnσn(xj) = 0

for all j. Let x ∈ L. Then, since {x1, x2, ..., xm} is a basis of L over K, $x = \sum_{j=1}^{m} \beta_jx_j$ for some βj ∈ K. But then, since each σi is a K-automorphism, we have

$$\sum_{i=1}^{n} \alpha_i\sigma_i(x) = \sum_{j=1}^{m} \beta_j \sum_{i=1}^{n} \alpha_i\sigma_i(x_j) = 0.$$

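Since not all αi are zero, the restrictions of σ1, σ2, ..., σn to the multiplicative group L* of nonzero elements of L are linearly dependent in F(L*, L). But each such restriction is a character of L* in L, and distinct automorphisms have distinct restrictions to L*. Hence, by the Dedekind theorem, these restrictions are linearly independent in F(L*, L). This is a contradiction. Hence the assumption that m < n is false.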
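Dedekind's theorem can be seen concretely for the cyclic group G = Z/nZ and K = C. The characters of G into C* are $\chi_k(g) = e^{2\pi i kg/n}$, k = 0, ..., n − 1. The Python sketch below is our own illustration. It checks numerically that these maps are homomorphisms, and that they are pairwise orthogonal for the Hermitian inner product. Orthogonal nonzero vectors are linearly independent. Note that this route via orthogonality is special to C; it is not the minimal-relation argument used in the proof above.

```python
import cmath

def characters(n):
    """Value tables of chi_k(g) = exp(2*pi*i*k*g/n), k, g = 0, ..., n-1."""
    return [[cmath.exp(2j * cmath.pi * k * g / n) for g in range(n)]
            for k in range(n)]

def inner(u, v):
    """Hermitian inner product sum_g u(g) * conj(v(g))."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

n = 6
chi = characters(n)

# Each chi_k is a homomorphism from Z/nZ (written additively) to C*.
for row in chi:
    for g in range(n):
        for h in range(n):
            assert abs(row[(g + h) % n] - row[g] * row[h]) < 1e-9

# The Gram matrix is n times the identity, so the characters are linearly independent.
for k in range(n):
    for l in range(n):
        assert abs(inner(chi[k], chi[l]) - (n if k == l else 0)) < 1e-9
print("independent")
```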

Theorem 8.2.19 Let K be the fixed field of a finite group G of automorphisms of a field L. Then | G |= [L : K].

Proof By the definition G ⊆ G(L/K). It suffices to show that [L : K] ≤ |G|. Then L is a finite extension of K, Corollary 8.2.18 gives |G| ≤ |G(L/K)| ≤ [L : K], and equality holds. Suppose, to the contrary, that |G| < [L : K]. Let G = {σ1, σ2, ..., σn}, where σi ≠ σj for i ≠ j. Then, since n < [L : K], there exists a subset $\{x_1, x_2, \ldots, x_{n+1}\}$ of L which contains n + 1 elements and which is linearly independent over K. Consider the subset $\{C_1, C_2, \ldots, C_{n+1}\}$ of L^n consisting of rows with n columns and entries in L, where

$$C_i = (\sigma_1(x_i), \sigma_2(x_i), \ldots, \sigma_n(x_i)), \quad i = 1, 2, \ldots, n+1.$$

Since the dimension of L^n over L is n, this set is linearly dependent. Rearranging if necessary, we may assume that {C1, C2, ..., Cm} is a minimal subset which is linearly dependent. Then there exist α1, α2, ..., αm in L, not all zero, such that α1C1 + α2C2 + ··· + αmCm = 0,

where 0 in the RHS is the zero of L^n. The minimality assumption implies that all αi are nonzero. Dividing by α1, we may assume that α1 = 1. Thus, we have

$$\sum_{i=1}^{m} \alpha_i\sigma_j(x_i) = 0 \qquad (8.2.5)$$

for all j, 1 ≤ j ≤ n, where α1 = 1 and no αi is zero. Let σ be an arbitrary element of G. Then, applying σ to the above equation, we get that

$$\sum_{i=1}^{m} \sigma(\alpha_i)\,\sigma\sigma_j(x_i) = 0$$

for all j. Since left multiplication by an element of the group permutes the elements of the group, the maps σσ1, ..., σσn run through σ1, ..., σn, and so we have

$$\sum_{i=1}^{m} \sigma(\alpha_i)\sigma_j(x_i) = 0 \qquad (8.2.6)$$

for all j. Subtracting Eq. (8.2.6) from Eq. (8.2.5), and observing that α1 = 1 = σ(α1), we see that

$$\sum_{i=2}^{m} (\alpha_i - \sigma(\alpha_i))\sigma_j(x_i) = 0$$

for all j. If some αi − σ(αi) were nonzero, this would give a linear dependence among C2, ..., Cm, contradicting the minimality of m. Hence αi − σ(αi) = 0, that is, σ(αi) = αi, for all i. Since σ is an arbitrary element of G, each αi belongs to K, the fixed field of G. But then, since each σj ∈ G fixes the members of K, we see that

$$\sigma_j\left(\sum_{i=1}^{m} \alpha_ix_i\right) = 0$$

for all j. Since σj is an automorphism, we have

$$\sum_{i=1}^{m} \alpha_ix_i = 0.$$

This is a contradiction to the fact that each αi is nonzero and $\{x_1, x_2, \ldots, x_{n+1}\}$ is linearly independent over K.

Corollary 8.2.20 Under the hypothesis of the above theorem, G = G(L/K). In other words, G(L/F(G)) = G.

Proof Already G ⊆ G(L/K). Also, from Corollary 8.2.18 and the above theorem, we have |G(L/K)| ≤ [L : K] = |G|. The result follows.

Corollary 8.2.21 Let L be a finite field extension of K. Then L is a Galois extension of K if and only if |G(L/K)| = [L : K].

Proof Suppose that L is a Galois extension of K. Then, by the definition, K = F(G(L/K)). Hence, from the above theorem, |G(L/K)| = [L : K]. Conversely, suppose that |G(L/K)| = [L : K]. Let L1 = F(G(L/K)). Then K ⊆ L1. From the hypothesis and from Theorem 8.2.19, [L : K] = |G(L/K)| = [L : L1] ≤ [L : K]. Thus, [L : L1] = [L : K], so [L1 : K] = 1, and K = L1.

Corollary 8.2.22 Let L be a field extension of K, and α an element of L which is algebraic over K. Then K(α) is a Galois extension of K if and only if minK(α)(X) has deg(minK(α)(X)) distinct roots, all belonging to K(α).

Proof We have already seen that |G(K(α)/K)| is equal to the number of distinct roots of minK(α)(X) which belong to K(α). Further, we know that [K(α) : K] = deg(minK(α)(X)). From Corollary 8.2.21, it follows that K(α) is a Galois extension of K if and only if |G(K(α)/K)| = [K(α) : K]. The result follows.
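Theorem 8.2.19 and its corollaries can be illustrated with L = Q(√2, √3), which has Q-basis 1, √2, √3, √6. We assume here (this is part of Exercise 8.2.1(iii)) that its Q-automorphisms are the four sign changes √2 ↦ ±√2, √3 ↦ ±√3. Each of them acts diagonally on this basis. Hence the fixed field of a subgroup H is spanned by the basis vectors fixed by every element of H. The Python sketch below compares |H| with [L : F(H)] = 4 / [F(H) : Q] for every subgroup H; the encoding and the names are our own.

```python
from itertools import product

# (s, t) encodes the automorphism sqrt2 -> s*sqrt2, sqrt3 -> t*sqrt3 (so sqrt6 -> s*t*sqrt6).
GROUP = list(product((1, -1), repeat=2))

def signs(s, t):
    """Action of (s, t) on the basis 1, sqrt2, sqrt3, sqrt6."""
    return (1, s, t, s * t)

def fixed_dim(H):
    """[F(H) : Q]: the number of basis vectors fixed by every element of H."""
    return sum(all(signs(s, t)[i] == 1 for s, t in H) for i in range(4))

SUBGROUPS = {
    "trivial": [(1, 1)],
    "fixes sqrt3": [(1, 1), (-1, 1)],
    "fixes sqrt2": [(1, 1), (1, -1)],
    "fixes sqrt6": [(1, 1), (-1, -1)],
    "whole group": GROUP,
}
for name, H in SUBGROUPS.items():
    # tower law: [L : F(H)] = [L : Q] / [F(H) : Q] = 4 / fixed_dim(H)
    print(name, len(H), 4 // fixed_dim(H))
```

In each case the two printed numbers agree, as Theorem 8.2.19 predicts. The whole group has fixed field Q (fixed_dim is 1), so L is a Galois extension of Q.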

Example 8.2.23 $\mathbb{Q}(2^{1/3}, \omega) = \mathbb{Q}(2^{1/3} + \omega)$ is a Galois extension of Q, for $|G(\mathbb{Q}(2^{1/3} + \omega)/\mathbb{Q})| = |S_3| = 6 = [\mathbb{Q}(2^{1/3} + \omega) : \mathbb{Q}]$.

Example 8.2.24 Consider the field L = K(X1, X2, ..., Xn) of rational functions in n variables over the field K. It is the field of fractions of the polynomial ring K[X1, X2, ..., Xn]. If p ∈ Sn is a permutation of degree n, then we have a unique isomorphism from K[X1, X2, ..., Xn] to itself given by $f(X_1, X_2, \ldots, X_n) \mapsto f(X_{p(1)}, X_{p(2)}, \ldots, X_{p(n)})$, which extends uniquely to an automorphism of L. Thus, Sn is isomorphic to a subgroup of Aut(L). Let G be a subgroup of Sn. Then, from Theorem 8.2.19 and Corollary 8.2.20, L is a Galois extension of F(G), and the Galois group of this extension is G. By Cayley's theorem, every finite group is isomorphic to a subgroup of some Sn. In particular, therefore, every finite group appears as the Galois group of some Galois extension.
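The last step of Example 8.2.24 uses Cayley's theorem: a finite group G embeds in the symmetric group on its own elements via g ↦ (x ↦ gx). The Python sketch below carries this out for G = S3, giving an injective homomorphism from S3 to S6. The names are our own.

```python
from itertools import permutations

# Elements of S3 as tuples: p[i] is the image of i.
S3 = list(permutations(range(3)))
INDEX = {x: n for n, x in enumerate(S3)}

def compose(p, q):
    """The permutation p o q, that is, i -> p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def left_regular(g):
    """x -> g x, written as a permutation of the positions 0..5 of the list S3."""
    return tuple(INDEX[compose(g, x)] for x in S3)

image = {g: left_regular(g) for g in S3}

# Homomorphism: L(g h) = L(g) o L(h); injective: six distinct images.
assert all(image[compose(g, h)] == compose(image[g], image[h]) for g in S3 for h in S3)
print(len(set(image.values())))  # 6
```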

Exercises

8.2.1 Determine the Galois group of the following extensions. Which of them are Galois extensions:

(i) $\mathbb{Q}(e^{2\pi i/5})$ over Q. (ii) $\mathbb{Q}(5^{1/3})$ over Q. (iii) $\mathbb{Q}(\sqrt{p}, \sqrt{q})$ over Q, where p and q are distinct primes. (iv) R over Q. (v) C over Q.

8.2.2 We have seen that $\mathbb{Q}(2^{1/3}, \omega)$ is a Galois extension of Q with Galois group S3. Find the fixed field K of A3. Is K a Galois extension of Q? Support your answer. Find also the fixed field of a subgroup of order 2 in S3. Is that a Galois extension of Q? Support your answer.

8.2.3 Suppose that L is a Galois extension of K, and F a subfield of L containing K. Show that L is also a Galois extension of F. Show, by means of an example, that F need not be a Galois extension of K.

8.2.4 Is $\mathbb{Q}(\sqrt{3}\,i)$ isomorphic to $\mathbb{Q}(\sqrt{3})$? Support your answer.

8.2.5 Let K and L be two extensions of Q of degree 2. Are they necessarily isomorphic? Support your answer.

8.2.6 Find the group of automorphisms of Zp(X). Show that it is finite. Find its order.

8.2.7 Let ρ be a primitive 4th root of unity. Find the Galois group of Q(ρ)/Q. Is this a Galois extension? Support your answer.

8.2.8 Show that any field extension L of degree 2 of a field K of characteristic different from 2 is a Galois extension.

8.2.9 Let L be a finite field extension of K, and L′ any extension of K. Let Hom_K(L, L′) be the set of all field homomorphisms from L to L′ which fix the members of K (in particular, Hom_K(L, L) = G(L/K)). Use the Dedekind theorem and imitate the proof of Corollary 8.2.18 to show that |Hom_K(L, L′)| ≤ [L : K].

8.2.10 Show that a finite field extension L of K is a Galois extension if and only if the following two conditions hold.

(i) There is a field extension L′ of L such that |Hom_K(L, L′)| = [L : K]. (ii) Given any field homomorphism σ from L to a field extension L′ of L which fixes every element of K, σ(L) = L.

8.2.11 Prove the following generalization of the result in Exercise 8.2.9: Let K and K′ be fields, and σ a field homomorphism from K to K′. Let L be a finite extension of K and L′ an extension of K′. Show that the number of extensions of σ to homomorphisms from L to L′ is at most [L : K].

8.3 Splitting Field, Normal Extensions

A finite field extension L of K may fail to be a Galois extension (see Exercise 8.2.10 of the previous section) for either of the following two reasons: (i) There is a field extension L′ of L and a field homomorphism σ from L to L′ fixing each element of K such that σ(L) ≠ L. (ii) For every field extension L′ of L, |Hom_K(L, L′)| < [L : K]. In this section, we study those finite extensions L of K for which (i) is not the reason.

Definition 8.3.1 A finite extension L of K is called a normal extension if, given any field extension L′ of L and any field homomorphism σ from L to L′ which fixes every element of K, σ(L) = L.

Definition 8.3.2 A finite extension L of K is called a separable extension if there exists an extension L′ of L such that |Hom_K(L, L′)| = [L : K].

Thus, a finite extension is a Galois extension if and only if it is separable as well as normal. Separable extensions will be the subject matter of study of the next section. Theorem 8.3.3 Let K be a field, and f (X) be a nonconstant polynomial of degree n over K. Then there exists a field extension L of K such that [L : K]≤n, and f (X) has a root in L.

Proof Every nonconstant polynomial in K[X] is a product of irreducible polynomials of positive degree in K[X]. If an element α in a field extension L of K is a root of an irreducible factor of f(X), then it is also a root of f(X). Since such a factor has degree at most n, it is sufficient to prove the result for irreducible polynomials in K[X] of positive degree. Let p(X) be an irreducible polynomial in K[X] of degree n. Since K[X] is a principal ideal domain, the ideal <p(X)> generated by p(X) is a maximal ideal. Thus, the quotient ring K[X]/<p(X)> is a field. Let us denote it by F. Define a map η from K to F by η(a) = a + <p(X)>. It is easily seen that η is a ring homomorphism.

Suppose that η(a) = η(b). Then a + <p(X)> = b + <p(X)>. This means that a − b belongs to <p(X)>. Every nonzero element of <p(X)> has degree at least deg p(X) ≥ 1, so this is possible if and only if a − b = 0. Thus a = b. This shows that η is an injective homomorphism, and so it is an embedding. Any element of F is of the form f(X) + <p(X)> = r(X) + <p(X)>, where r(X) is the remainder obtained when we divide f(X) by p(X). It is also clear that if r1(X) + <p(X)> = r2(X) + <p(X)> and deg(ri(X)) < deg(p(X)), i = 1, 2, then r1(X) = r2(X). Let ℘n(K) denote the set of all polynomials in K[X] whose degrees are less than n. Then, from what we have proved above, the map ρ from ℘n(K) to K[X]/<p(X)> defined by ρ(r(X)) = r(X) + <p(X)> is a bijective mapping whose restriction to K is η. Pull back the operations of K[X]/<p(X)> to ℘n(K) with the help of the map ρ. The operations ⊕ and ⊙ on ℘n(K), thus obtained, are given by r(X) ⊕ s(X) = r(X) + s(X) and r(X) ⊙ s(X) = t(X), where t(X) is the remainder obtained when r(X) · s(X) is divided by p(X). Clearly, ℘n(K) becomes a field such that ρ is an isomorphism, and K is a subfield of ℘n(K). Further, when n ≥ 2, X ∈ ℘n(K) and X is a root of p(X), for p(X) represents 0 in ℘n(K). (When n = 1, p(X) already has a root in K.)

Let f(X) be a polynomial in K[X]. We say that f(X) has all its roots in a field extension L of K if f(X) splits into a product of linear factors in L[X], in the sense that f(X) = a(X − a1)(X − a2)···(X − an) for some a ∈ K and a1, a2, ..., an in L. We also say that f(X) splits over L.

Corollary 8.3.4 Let f(X) ∈ K[X] be a polynomial of degree n. Then there is a field extension L of K of degree at most n! such that f(X) has all its roots in L.

Proof The proof is by induction on the degree of f(X). If f(X) is of degree 1, then it has only one root, which is in K. Assume that the result is true for all polynomials whose degrees are less than n. Let f(X) be a polynomial of degree n.
From the above theorem, there is an extension L of K which contains a root α (say) of f(X), with [L : K] at most n. Then f(X) = (X − α)g(X), where g(X) is a polynomial in L[X] of degree n − 1. By the induction hypothesis, there is an extension F of L of degree at most (n − 1)! in which g(X) has all its roots. Clearly, f(X) has all its roots in F, and [F : K] = [F : L][L : K] ≤ (n − 1)! · n = n!.

Definition 8.3.5 Let K be a field, and f(X) a polynomial in K[X]. A minimal field extension L of K such that f(X) splits over L is called a splitting field of f(X) over K. We have seen that, given any polynomial f(X) ∈ K[X], there is a field extension L of K such that f(X) splits over L. Let a1, a2, ..., an be all the roots of f(X) in L. Then K(a1, a2, ..., an) is a minimal field extension of K in which f(X) splits. Thus, we have

Corollary 8.3.6 Let K be a field, and f(X) a polynomial over K. Then f(X) has a splitting field over K.

The above result says that a splitting field of a polynomial exists. Our next aim is to show that it is unique up to K-isomorphism.

Proposition 8.3.7 Let K and K′ be fields, and σ an isomorphism from K to K′. For p(X) ∈ K[X], let $p^{\sigma}(X)$ denote the polynomial obtained by applying σ to the coefficients of p(X). Let L be a field extension of K, and L′ an extension of K′. Let a be an element of L which is algebraic over K with minimal polynomial p(X), and b an element of L′ which is a root of $p^{\sigma}(X)$. Then there is an extension $\tilde{\sigma}$ of σ to an isomorphism from K(a) to K′(b) such that $\tilde{\sigma}(a) = b$.

Proof Since σ is an isomorphism and p(X) is irreducible, $p^{\sigma}(X)$ is also irreducible in K′[X]. In particular, $p^{\sigma}(X)$ is the minimum polynomial of b. The map η from K[X]/<p(X)> to K′[X]/<$p^{\sigma}(X)$> defined by η(f(X) + <p(X)>) = $f^{\sigma}(X)$ + <$p^{\sigma}(X)$> is an isomorphism (verify). Further, we have an isomorphism φ from K(a) to K[X]/<p(X)> which fixes the members of K and takes a to X + <p(X)>. Similarly, we have an isomorphism ψ from K′(b) to K′[X]/<$p^{\sigma}(X)$> which fixes the members of K′ and takes b to X + <$p^{\sigma}(X)$>. It is clear that the map ψ⁻¹∘η∘φ has the desired property.
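The quotient K[X]/<p(X)>, realized in the proof of Theorem 8.3.3 as ℘n(K) with the operations ⊕ and ⊙, is easy to compute with. The Python sketch below assumes K = Z2 and p(X) = X² + X + 1, which is irreducible over Z2 because it has degree 2 and no root in Z2. These choices and the helper names are our own. The sketch checks that X is a root of p(X) and that every nonzero residue has an inverse.

```python
from itertools import product

P = 2                  # K = Z_2
MOD = [1, 1, 1]        # p(X) = 1 + X + X^2, coefficients lowest degree first
N = len(MOD) - 1       # residues are polynomials of degree < N

def add(r, s):
    """r (+) s: ordinary addition of residues."""
    return tuple((a + b) % P for a, b in zip(r, s))

def mul(r, s):
    """r (.) s: multiply, then keep the remainder modulo p(X)."""
    prod_ = [0] * (len(r) + len(s) - 1)
    for i, a in enumerate(r):
        for j, b in enumerate(s):
            prod_[i + j] = (prod_[i + j] + a * b) % P
    # p(X) is monic: clear each top coefficient by subtracting c * X^(k-N) * p(X)
    for k in range(len(prod_) - 1, N - 1, -1):
        c = prod_[k]
        for i, m in enumerate(MOD):
            prod_[k - N + i] = (prod_[k - N + i] - c * m) % P
    return tuple(prod_[:N])

ZERO, ONE, X = (0, 0), (1, 0), (0, 1)
ELEMENTS = list(product(range(P), repeat=N))

# X is a root of p(X): 1 + X + X (.) X = 0.
assert add(add(ONE, X), mul(X, X)) == ZERO
# Every nonzero residue has a multiplicative inverse, so the 4 residues form a field.
inverse = {r: s for r in ELEMENTS for s in ELEMENTS if mul(r, s) == ONE}
print(len(ELEMENTS), len(inverse))  # 4 3
```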

Corollary 8.3.8 Let K and K′ be fields, and σ an isomorphism from K to K′. Let f(X) ∈ K[X]. Let L be a splitting field of f(X) over K, and L′ a splitting field of $f^{\sigma}(X)$ over K′. Then σ can be extended to an isomorphism from L to L′.

Proof The proof is by induction on the degree of f(X). If the degree of f(X) is 1, then L = K and L′ = K′, and there is nothing to do. Assume that the result is true for all polynomials whose degrees are less than n. Let f(X) be a polynomial of degree n. Suppose that f(X) has a root a ∈ K. Then f(X) = (X − a)g(X), where g(X) is a polynomial in K[X] of degree n − 1, and $f^{\sigma}(X) = (X - \sigma(a))g^{\sigma}(X)$. Clearly, L is a splitting field of g(X) over K, and L′ is a splitting field of $g^{\sigma}(X)$ over K′. By the induction hypothesis, σ can be extended to an isomorphism from L to L′. Suppose that f(X) has no root in K. Let p(X) be an irreducible factor of f(X). Then the degree of p(X) is greater than 1, and $p^{\sigma}(X)$ is an irreducible factor of $f^{\sigma}(X)$. Let a be a root of p(X) in L, and b a root of $p^{\sigma}(X)$ in L′. From the previous result, σ can be extended to an isomorphism $\tilde{\sigma}$ from K(a) to K′(b) such that $\tilde{\sigma}(a) = b$. Now X − a divides f(X) over K(a), and X − b divides $f^{\sigma}(X)$ over K′(b). Suppose that f(X) = (X − a)g(X) and $f^{\sigma}(X) = (X - b)h(X)$, where g(X) ∈ K(a)[X] and $h(X) = g^{\tilde{\sigma}}(X)$ in K′(b)[X]. Clearly, L is a splitting field of g(X) over K(a), and L′ is a splitting field of $h(X) = g^{\tilde{\sigma}}(X)$ over K′(b). By the induction hypothesis, $\tilde{\sigma}$ can be extended to an isomorphism from L to L′. Thus, σ can be extended to an isomorphism from L to L′.

Corollary 8.3.9 The splitting field of a polynomial f(X) in K[X] is unique up to K-isomorphism.

Proof Take σ = IK the identity map on K, and apply the above corollary. 

Corollary 8.3.10 If L is a splitting field of a polynomial f (X) ∈ K[X] of degree n, then [L : K]≤n!. 

Example 8.3.11 The equality may also hold in the above corollary. Consider the field L = K(X1, X2,...,Xn) of rational functions in n indeterminates over the field K. Define a map η from the symmetric group Sn of degree n to the group Aut(L) of automorphisms of L by

$$\eta(p)\left(\frac{f(X_1, X_2, \ldots, X_n)}{g(X_1, X_2, \ldots, X_n)}\right) = \frac{f(X_{p(1)}, X_{p(2)}, \ldots, X_{p(n)})}{g(X_{p(1)}, X_{p(2)}, \ldots, X_{p(n)})},$$

where p ∈ Sn. It is clear that η is an injective homomorphism. Let F = F(η(Sn)), the fixed field of η(Sn). Let us denote η(Sn) by G. Let s1, s2, ..., sn be the n elementary symmetric polynomials given by

=  ... , sk i1 < i2 < ··· < ik xi1 xi2 xik where summation is taken over all k-tuples of distinct elements i1 < i2 < ··· < ik. In particular, s1 = x1 + x2 + ··· + xn, and sn = x1x2 ...xn. We show that F = K(s1, s2,...,sn), and L is the splitting field of the polynomial

n n−1 n−2 n f (X) = X − s1X − s2X − ··· − (−1) sn over K(s1, s2,...,sn). Clearly, each sk ∈ F(G) = F, and so K(s1, s2,...,sn) ⊆ F. Further, since F is the fixed field of G,wehave[L : F]=|G |=|Sn |= n!.Next, we note that f (X) = (X − x1)(X − x2)...(X − xn)

(proof follows from easy expansion). Thus, L is the splitting field of f (X) over K(s1, s2,...,sn). In turn, we have

n!=[L : F]≤[L : K(s1, s2,...,sn)]≤n!.

The first inequality is true because K(s1, s2,...,sn) ⊆ F, and the second inequality is true because of the previous corollary. This shows that F = K(s1, s2,...,sn). 
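The identity $f(X) = (X - X_1)(X - X_2)\cdots(X - X_n)$ can be spot-checked by specializing the indeterminates to integers. The sketch below (the helper names are hypothetical, introduced only for this illustration) expands $(X - x_1)\cdots(X - x_n)$ and compares the coefficient of $X^{n-k}$ with $(-1)^k s_k(x_1, \ldots, x_n)$. Agreement at a few integer points illustrates the expansion; it is not a proof.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """s_k evaluated at `values`: sum of x_{i1} x_{i2} ... x_{ik} over i1 < i2 < ... < ik."""
    return sum(prod(c) for c in combinations(values, k))


def expand_linear_factors(values):
    """Coefficients of (X - x_1)(X - x_2)...(X - x_n), highest degree first."""
    coeffs = [1]
    for x in values:
        # Multiply the current polynomial by (X - x).
        times_x = coeffs + [0]
        times_minus_x = [0] + [-x * c for c in coeffs]
        coeffs = [a + b for a, b in zip(times_x, times_minus_x)]
    return coeffs


def vieta_holds(values):
    """Check that the coefficient of X^(n-k) equals (-1)^k s_k for k = 0, ..., n."""
    n = len(values)
    expected = [(-1) ** k * elementary_symmetric(values, k) for k in range(n + 1)]
    return expand_linear_factors(values) == expected


print(expand_linear_factors([1, 2, 3]))  # [1, -6, 11, -6], i.e. X^3 - 6X^2 + 11X - 6
print(vieta_holds([2, -3, 5, 7]))
```

The alternating signs $(-1)^k$ are exactly the signs appearing in f(X) above.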

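Example 8.3.12 The field $\mathbb{C}$ of complex numbers is the splitting field of the polynomial $X^2 + 1$ over the field $\mathbb{R}$ of real numbers. It is also the splitting field of $X^2 + 3$ over $\mathbb{R}$, since $\mathbb{C} = \mathbb{R}(i) = \mathbb{R}(i\sqrt{3})$.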

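Example 8.3.13 Let ω be a primitive cube root of unity. Then $K = \mathbb{Q}(2^{1/3}, \omega)$ is the splitting field of $X^3 - 2$ over $\mathbb{Q}$. This is because the roots of $X^3 - 2$ are $2^{1/3}$, $2^{1/3}\omega$, and $2^{1/3}\omega^2$, and the smallest subfield of $\mathbb{C}$ containing them contains $2^{1/3}$ and $\omega = 2^{1/3}\omega / 2^{1/3}$; hence it is exactly K.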
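A floating-point sanity check of Example 8.3.13, assuming only Python's `cmath` module: it confirms numerically that the three listed numbers are roots of $X^3 - 2$ and that the quotient of two of them is ω, which is why the field generated by the roots contains ω. The function name is a hypothetical helper; this is an illustration, not a proof.

```python
import cmath


def cube_roots_of_two():
    """Return the roots r, r*w, r*w^2 of X^3 - 2 and the cube root of unity w."""
    r = 2 ** (1 / 3)                  # real cube root of 2
    w = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
    return [r, r * w, r * w * w], w


roots, w = cube_roots_of_two()
for z in roots:
    print(z, abs(z ** 3 - 2))        # residuals are at rounding-error level
print(abs(roots[1] / roots[0] - w))  # the quotient of two roots recovers w
print(abs(w * w + w + 1))            # w is a root of X^2 + X + 1
```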

Algebraically Closed Field and Algebraic Closure

Definition 8.3.14 A field K is said to be an algebraically closed field if every polynomial f(X) in K[X] has all its roots in K, or equivalently, every irreducible element of K[X] is a linear polynomial of the type aX + b, where a, b ∈ K and a ≠ 0.

Proposition 8.3.15 The following conditions on a field K are equivalent.
1. K is an algebraically closed field.
2. If f(X) is any polynomial in K[X], then K is the splitting field of f(X).
3. Every nonconstant polynomial f(X) ∈ K[X] has a root in K.
4. There is no proper algebraic extension of K.
5. There is no proper finite extension of K.

Proof 1 ⇒ 2. Assume 1. Let f(X) be a polynomial in K[X]. Since K is algebraically closed, f(X) has all its roots in K, and so K is the splitting field of f(X).
2 ⇒ 3. Assume 2. Let f(X) be a nonconstant polynomial in K[X]. By 2, K is the splitting field of f(X), and so K contains all roots of f(X). Since f(X) is nonconstant, it has at least one root.
3 ⇒ 4. Assume 3. Let L be an algebraic extension of K, and let a ∈ L. Suppose that a ∉ K. Then the minimum polynomial min_K(a)(X) is an irreducible polynomial of degree greater than 1, and so it has no root in K. This is a contradiction to the assumption. Hence L = K.
4 ⇒ 5. Assume 4. Since every finite extension is an algebraic extension, there is no proper finite extension of K.
5 ⇒ 1. Assume 5. Let f(X) ∈ K[X] be a polynomial of positive degree n, and let L be the splitting field of f(X) over K. Then L is a finite extension of K of degree at most n!. By 5, L = K, and so K contains all roots of f(X). By the definition, K is algebraically closed.

Example 8.3.16 We shall see later that the field $\mathbb{C}$ of complex numbers is an algebraically closed field. This is known as the fundamental theorem of algebra, which was first proved by Gauss.

Proposition 8.3.17 No finite field can be algebraically closed.

Proof Let K = {a_1, a_2, ..., a_n} be a finite field. Then $f(X) = (X - a_1)(X - a_2)\cdots(X - a_n) + 1$ has no root in K, since f(a_i) = 1 ≠ 0 for each i.

Now, we shall show that every field can be enlarged to an algebraically closed field.

Definition 8.3.18 An algebraic extension L of K is called an algebraic closure of K if L is an algebraically closed field. Thus, by Proposition 8.3.15, an algebraic closure of K is the same as a maximal algebraic extension of K (an algebraic extension of K having no proper algebraic extension), if one exists. We shall show that every field K has an algebraic closure, and that it is unique up to K-isomorphism.

The following proposition is essential to avoid certain set-theoretic (logical) problems in proving the existence of an algebraic closure.

Proposition 8.3.19 Let L be an algebraic extension of K. If K is finite, then the cardinality |L| of L is at most that of the set $\mathbb{N}$ of natural numbers (i.e., L is finite or countably infinite). If K is infinite, then the cardinality |L| of L is the same as the cardinality |K| of K.

Proof Let X_K denote the set of monic irreducible polynomials over the field K. To every member p(X) ∈ X_K, we associate the subset Y_{p(X)} consisting of the roots of p(X) in L; Y_{p(X)} may also be the empty set. Clearly, Y_{p(X)} is a finite subset of L containing at most n elements, where n is the degree of p(X). Since every element of L is algebraic over K, it is a root of its minimum polynomial, and so

$$L = \bigcup_{p(X) \in X_K} Y_{p(X)}.$$

Now, each monic polynomial of degree n is determined uniquely by the ordered n-tuple in K^n of its non-leading coefficients. Thus, the cardinality of X_K is at most that of the set $\bigcup_{n \in \mathbb{N}} K^n$. If K is finite, then, since a countable union of finite sets is countable, X_K is at most countable. If K is infinite, then K^n has the same cardinality as K, and a countable union of sets each having the same cardinality as K again has the same cardinality as K; so the cardinality of X_K is at most that of K. Since each Y_{p(X)} is finite, the same argument shows that if K is finite, then the cardinality of L is at most that of $\mathbb{N}$, and if K is infinite, then the cardinality of L is at most that of K; as K ⊆ L, it is then exactly that of K.
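For a finite field K = $\mathbb{Z}_p$, the proof covers X_K by the sets of monic polynomials of each degree n, and there are exactly p^n of these for each n, so X_K is a countable union of finite sets. The brute-force sketch below (hypothetical helper names; practical only for small p and n) counts how many of the p^n monic polynomials of degree n over $\mathbb{Z}_p$ are irreducible, by removing every product of two monic factors of positive degree.

```python
from itertools import product


def poly_mul_mod(f, g, p):
    """Product of two polynomials over Z_p (coefficient tuples, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)


def monic_polys(n, p):
    """Yield all p**n monic polynomials of degree n over Z_p."""
    for lower in product(range(p), repeat=n):
        yield lower + (1,)


def count_monic_irreducible(n, p):
    """p**n monic polynomials of degree n minus the reducible ones."""
    reducible = set()
    for d in range(1, n // 2 + 1):
        for f in monic_polys(d, p):
            for g in monic_polys(n - d, p):
                reducible.add(poly_mul_mod(f, g, p))
    return p ** n - len(reducible)


for n in range(1, 6):
    print(n, 2 ** n, count_monic_irreducible(n, 2))
```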

Theorem 8.3.20 Every field has an algebraic closure.

Proof Let K be a field. Observe that there is no set containing all algebraic extensions of K. Instead, we need to consider a set of algebraic extensions of K such that every algebraic extension of K is K-isomorphic to a member of the set. For this purpose, let Ω be a set containing K whose cardinality is strictly larger than that of K and that of $\mathbb{N}$. This is possible because the power set of a set always has larger cardinality than the set itself. Let 𝓕 be the set of all fields which are algebraic extensions of K and whose underlying sets are contained in Ω. Clearly, 𝓕 is a nonempty set, for K ∈ 𝓕. Define a partial order ≤ on 𝓕 by L_1 ≤ L_2 if L_1 is a subfield of L_2. Thus, (𝓕, ≤) is a nonempty partially ordered set. Let {L_α | α ∈ Λ} be a chain in 𝓕, and let $L_0 = \bigcup_{\alpha \in \Lambda} L_\alpha$. Then L_0 is also a field contained in Ω, of which all L_α are subfields, and which is an algebraic extension of K. Thus, L_0 ∈ 𝓕 is an upper bound of the chain. By Zorn's lemma, 𝓕 has a maximal member L (say). Then L is an algebraic extension of K. We show that L is an algebraic closure of K by showing that it is an algebraically closed field. Suppose not. Then there is a proper algebraic extension F of L. Since F is also an algebraic extension of K, it follows from the above proposition that the cardinality of F is strictly less than that of Ω. Hence there is a subset L' of Ω containing L properly, and a bijective map η from L' to F which is the identity on L. We can pull back the operations of F to L' so that L' becomes a proper algebraic extension of L. Clearly, L' is also an algebraic extension of K, and L' ∈ 𝓕. This contradicts the supposition that L is a maximal member of 𝓕. This completes the proof of the fact that L is an algebraic closure of K.

Theorem 8.3.21 Let σ be an isomorphism from a field K to a field K'. Let $\overline{K}$ be an algebraic closure of K, and $\overline{K'}$ an algebraic closure of K'. Then σ can be extended to an isomorphism from $\overline{K}$ to $\overline{K'}$.

Proof Let Σ be the set of triples (L, σ', L'), where L is a subfield of $\overline{K}$ containing K, L' is a subfield of $\overline{K'}$ containing K', and σ' is an extension of σ to an isomorphism from L to L'. The set Σ is nonempty, for (K, σ, K') is a member of Σ. Define a relation ≤ on Σ by

$$(L_1, \sigma_1, L_1') \le (L_2, \sigma_2, L_2') \iff L_1 \subseteq L_2,\ L_1' \subseteq L_2', \text{ and } \sigma_2 \text{ is an extension of } \sigma_1.$$

Thus, (Σ, ≤) is a nonempty partially ordered set. Let {(L_α, σ_α, L_α') | α ∈ Λ} be a chain in Σ. Let $L_0 = \bigcup_{\alpha \in \Lambda} L_\alpha$ and $L_0' = \bigcup_{\alpha \in \Lambda} L_\alpha'$, and let σ_0 be the map from L_0 to L_0' defined by the property that the restriction of σ_0 to L_α is σ_α. Then it is easy to observe that the triple (L_0, σ_0, L_0') is a member of Σ which is an upper bound of the chain. By Zorn's lemma, Σ has a maximal member $(L, \tilde{\sigma}, L')$ (say). We show that $L = \overline{K}$ and $L' = \overline{K'}$, which will complete the proof of the theorem. Suppose that $a \in \overline{K} - L$. Then a is algebraic over L. Let p(X) be the minimum polynomial of a over L. Since a ∉ L, p(X) is an irreducible polynomial of degree greater than 1. Since $\tilde{\sigma}$ is an isomorphism, $p^{\tilde{\sigma}}(X)$ is an irreducible polynomial over L' of degree greater than 1. Since $\overline{K'}$ is algebraically closed, there is a root b of $p^{\tilde{\sigma}}(X)$ in $\overline{K'}$. By Proposition 8.3.7, we get an extension τ of $\tilde{\sigma}$ to an isomorphism from L(a) to L'(b). Thus, the triple (L(a), τ, L'(b)) belongs to Σ, and this contradicts the maximality of $(L, \tilde{\sigma}, L')$. Hence $L = \overline{K}$. But then $L' = \tilde{\sigma}(\overline{K})$ is also an algebraically closed field, of which $\overline{K'}$ is an algebraic extension. By Proposition 8.3.15, this means that $L' = \overline{K'}$. This completes the proof.

Taking K' = K and σ = I_K in the above theorem, we get the following corollary:

Corollary 8.3.22 The algebraic closure of K is unique up to K-isomorphism.

The algebraic closure of K will usually be denoted by $\overline{K}$.

Corollary 8.3.23 Let L be an algebraic extension of K. Then the algebraic closure $\overline{L}$ of L is K-isomorphic to the algebraic closure $\overline{K}$ of K.

Proof Follows from the fact that the algebraic closure $\overline{L}$ of L is also an algebraic extension of K (being algebraic over L, which is algebraic over K), and it is algebraically closed.

Corollary 8.3.24 Let K be a field and S a set of polynomials over K. Then there is a field extension L of K over which all the polynomials in S split.
Further, given any two field extensions $K_1$ and $K_2$ of K such that all the polynomials in S split over $K_1$ as well as over $K_2$, let $L_1$ be the subfield of $K_1$ generated by K and the roots of the members of S belonging to $K_1$, and let $L_2$ be the subfield of $K_2$ generated by K and the roots of the members of S belonging to $K_2$. Then $L_1$ and $L_2$ are K-isomorphic.

Proof Let $\overline{K}$ be an algebraic closure of K. Then all polynomials in S split over $\overline{K}$; this proves the first assertion. Let $L_1$ and $L_2$ be as in the statement of the corollary. Then $L_1$ and $L_2$ are both algebraic extensions of K. Let $\overline{L_1}$ be an algebraic closure of $L_1$, and $\overline{L_2}$ an algebraic closure of $L_2$. Then both of them are also algebraic closures of K. Hence there exists a K-isomorphism σ from $\overline{L_1}$ to $\overline{L_2}$. Clearly, σ takes each root of a polynomial in S to a root of the same polynomial. Since every polynomial in S splits over $L_1$ as well as over $L_2$, σ maps the roots in $L_1$ of each member of S bijectively onto its roots in $L_2$. Thus, σ restricted to $L_1$ is a K-isomorphism from $L_1$ onto $L_2$.

Definition 8.3.25 The field described in the above corollary, which is unique up to K-isomorphism, is called the splitting field of the set S of polynomials over K. It is in fact the smallest such field, in the sense that it admits a K-embedding into every extension of K over which all polynomials in S split.

Remark 8.3.26 The algebraic closure of K is the splitting field of the set of all polynomials over K.

Theorem 8.3.27 Let L be an algebraic extension of K. Then the following conditions are equivalent.
1. L is the splitting field of a family S of polynomials over K.
2. If σ is a K-homomorphism from L to its algebraic closure $\overline{L}$, then σ(L) = L.
3. Every member σ ∈ $G(\overline{L}/K)$ restricted to L is a member of G(L/K).
4. If f(X) is an irreducible polynomial in K[X] which has a root in L, then it has all its roots in L.

Proof 1 ⇒ 2. Assume 1. Then, by the definition of the splitting field of a family of polynomials, L is the subfield of $\overline{L}$ generated by K and the roots of the members of S. Let σ be a K-homomorphism from L to $\overline{L}$. Since σ fixes K, it takes a root of a polynomial in S to a root of the same polynomial. As σ is injective and the finitely many roots in $\overline{L}$ of each member of S all lie in L, σ permutes the roots of each member of S. Hence σ(L) = L.
2 ⇒ 3. Assume 2. If σ ∈ $G(\overline{L}/K)$, then σ restricted to L is a K-homomorphism from L to $\overline{L}$. By 2, σ(L) = L, and so σ restricted to L is a member of G(L/K).
3 ⇒ 4. Assume 3. Let f(X) be an irreducible polynomial in K[X] which has a root a ∈ L. Let b be another root of f(X) in $\overline{L}$. Then, by Proposition 8.3.7, there is a K-isomorphism σ from K(a) to K(b) such that σ(a) = b. Clearly, $\overline{L}$ is an algebraic closure of K(a) as well as of K(b). By Theorem 8.3.21, there is an automorphism τ of $\overline{L}$ which extends σ; in particular, τ ∈ $G(\overline{L}/K)$. From 3, τ(L) = L. Hence b = τ(a) ∈ L.
4 ⇒ 1. Assume 4. Let S be the set of all irreducible polynomials in K[X] having a root in L. Then, from 4, all the roots of the members of S are in L. Further, since L is algebraic over K, every element of L is a root of some member of S (namely, its minimum polynomial). It follows that L is the splitting field of S over K.

Definition 8.3.28 An algebraic extension L of K is called a normal extension if it satisfies any one (and hence all) of the above conditions.

Corollary 8.3.29 A finite extension L of K is a normal extension if and only if L is the splitting field of a polynomial over K.

Proof If L is the splitting field of a polynomial over K, then, by the definition, L is a normal extension of K. Conversely, suppose that L is a finite normal extension of K. Then L = K(a_1, a_2, ..., a_n) for some a_1, a_2, ..., a_n in L. Let f_i(X) be the minimum polynomial of a_i over K, and let f(X) be the product f_1(X)f_2(X)···f_n(X). By condition 4 of Theorem 8.3.27, L contains all the roots of each f_i(X). Since L is generated over K by some of these roots, L is the splitting field of f(X) over K.

Corollary 8.3.30 Every Galois extension is a normal extension.

Proof Follows from Exercise 8.2.10.

Remark 8.3.31 A normal extension need not be a Galois extension. Let $K = \mathbb{Z}_p(X)$ be the field of rational functions in one variable over the field $\mathbb{Z}_p$. Consider the polynomial $Y^p - X$ in K[Y], and let L be the splitting field of $Y^p - X$ over K. Then, by the definition, L is a normal extension of K. The polynomial $Y^p - X$ is irreducible in K[Y] by the Eisenstein irreducibility criterion (for X is a prime element of $\mathbb{Z}_p[X]$, of which K is the field of fractions). Let a be a root of this polynomial in L. Then $a^p = X$. Since K has characteristic p, $Y^p - X = Y^p - a^p = (Y - a)^p$. Hence a is the only root of $Y^p - X$; it is denoted by $X^{1/p}$. This means that $L = K(X^{1/p})$. If σ is any automorphism in G(L/K), then σ permutes the roots of $Y^p - X$, and hence $\sigma(X^{1/p}) = X^{1/p}$. This shows that σ is the identity map. Thus, G(L/K) is the trivial group, whereas the degree $[L : K] = \deg(Y^p - X) = p$. Hence this extension is not a Galois extension.
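The step $Y^p - a^p = (Y - a)^p$ rests on the fact that the binomial coefficients $\binom{p}{k}$ with $0 < k < p$ are divisible by the prime p. The sketch below checks this with Python's `math.comb` and expands $(Y - a)^p$ over $\mathbb{Z}_p$ for an integer a; the helper names are hypothetical, and the same binomial argument is what makes the identity hold in any field of characteristic p.

```python
from math import comb


def middle_binomials_vanish(n):
    """True if C(n, k) is divisible by n for every 0 < k < n."""
    return all(comb(n, k) % n == 0 for k in range(1, n))


def power_of_linear_mod_p(a, p):
    """Coefficients of (Y - a)^p over Z_p, highest degree first: C(p, k) * (-a)^k mod p."""
    return [comb(p, k) * (-a) ** k % p for k in range(p + 1)]


for p in [2, 3, 5, 7, 11]:
    print(p, middle_binomials_vanish(p), power_of_linear_mod_p(3, p))
print(4, middle_binomials_vanish(4))  # fails for the composite 4: C(4, 2) = 6
```

Only the first and last coefficients survive, so $(Y - a)^p$ reduces to $Y^p + (-a)^p = Y^p - a^p$ in characteristic p.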

Corollary 8.3.32 Let L be a normal extension of K. Let F be any intermediary field. Then L is also a normal extension of F whereas F need not be a normal extension of K.

Proof Let L be a normal extension of K and F an intermediary field. Then L is the splitting field of a family S of polynomials over K. Since every polynomial over K is also a polynomial over F, L is also the splitting field of the same set S of polynomials over F. This shows that L is a normal extension of F. Next, we show that F need not be a normal extension of K. Consider the extension $\mathbb{Q}(2^{1/3}, \omega)$ of $\mathbb{Q}$. This is the splitting field over $\mathbb{Q}$ of the polynomial $X^3 - 2$, and so it is a normal extension (in fact, it has already been seen to be a Galois extension). Further, $\mathbb{Q}(2^{1/3})$ is an intermediary field which is not normal over $\mathbb{Q}$. Indeed, $X^3 - 2$ has a root in $\mathbb{Q}(2^{1/3}) \subseteq \mathbb{R}$, but its other two roots are not real, and so not all its roots are in $\mathbb{Q}(2^{1/3})$.

Remark 8.3.33 If L is a normal extension of F, and F is a normal extension of K, then L need not be a normal extension of K. For example, $\mathbb{Q}(\sqrt{2})$ is a normal extension of $\mathbb{Q}$ (it is the splitting field of $X^2 - 2$), and $\mathbb{Q}(\sqrt[4]{2})$ is a normal extension of $\mathbb{Q}(\sqrt{2})$ (it is the splitting field of $X^2 - \sqrt{2}$), whereas $\mathbb{Q}(\sqrt[4]{2})$ is not a normal extension of $\mathbb{Q}$. Clearly, $X^4 - 2$ is the minimum polynomial of $\sqrt[4]{2}$ over $\mathbb{Q}$, whereas not all roots of $X^4 - 2$ are in $\mathbb{Q}(\sqrt[4]{2})$. For example, $\sqrt[4]{2}\,i$ is also a root of the polynomial $X^4 - 2$, and it is not in this field, which is contained in $\mathbb{R}$ (find the splitting field of this polynomial over $\mathbb{Q}$).

Corollary 8.3.34 Let F be a finite extension of K. Then there is a smallest subfield L of $\overline{F}$ which contains F and which is a normal extension of K.

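Proof Suppose that F = K(a_1, a_2, ..., a_n). Let f_i(X) be the minimum polynomial of a_i over K, let f(X) be the product of these polynomials, and let L ⊆ $\overline{F}$ be the splitting field of f(X) over F. Since F is generated over K by roots of f(X), L is also the splitting field of f(X) over K, and hence a normal extension of K. Moreover, any subfield of $\overline{F}$ which contains F and is normal over K contains all roots of each f_i(X) (condition 4 of Theorem 8.3.27), and so contains L. Thus, L is the smallest normal extension of K in $\overline{F}$ containing F.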

Definition 8.3.35 The field L described in the above corollary is called the normal closure of F over K.

Proposition 8.3.36 Let F be a finite extension of K of degree m, and f (X) be an irreducible polynomial in K[X] of degree n. Suppose that m and n are co-prime. Then f (X) is also irreducible in F[X].

Proof Let L be the splitting field of the polynomial f(X) over F. We may assume that n > 1. Now, f(X) cannot have any of its roots in F, for if a is a root of f(X) which is in F, then K(a) ⊆ F. But then n = deg f(X) = [K(a) : K] would divide [F : K] = m, which contradicts the supposition that m and n are co-prime. Let a be a root of f(X) in L which, as shown above, is not in F. Consider F(a). Since K(a) ⊆ F(a), both [K(a) : K] = n and [F : K] = m divide [F(a) : K] = [F(a) : F][F : K]. Since m and n are co-prime, m · n divides [F(a) : F] · m. This shows that n divides [F(a) : F]. Since a is a root of f(X), [F(a) : F] ≤ n. Hence [F(a) : F] = n = deg f(X). Thus, the minimum polynomial of a over F has degree n and divides f(X), and therefore f(X) is irreducible over F.
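To apply the proposition, one needs f(X) irreducible over K; for K = $\mathbb{Q}$ and the polynomial of Example 8.3.37, this comes from the Eisenstein criterion. The sketch below (a hypothetical helper, integer coefficients listed lowest degree first) only checks the Eisenstein hypotheses at a given prime; a `False` result says nothing about reducibility.

```python
def eisenstein_applies(coeffs, p):
    """Eisenstein hypotheses at the prime p for integer coefficients a_0, ..., a_n:
    p divides a_0, ..., a_{n-1}, p does not divide a_n, and p^2 does not divide a_0."""
    *lower, leading = coeffs
    return (
        leading % p != 0
        and all(c % p == 0 for c in lower)
        and coeffs[0] % (p * p) != 0
    )


# f(X) = X^7 - 10X^4 + 5X^2 + 15X + 10, coefficients a_0, ..., a_7
f = [10, 15, 5, 0, -10, 0, 0, 1]
print([p for p in (2, 3, 5, 7) if eisenstein_applies(f, p)])  # [5]
```

Since deg f(X) = 7 is co-prime to 6, the proposition then transfers irreducibility to any extension of degree 6.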

As an application of the above proposition, we have the following example.

Example 8.3.37 The polynomial $f(X) = X^7 - 10X^4 + 5X^2 + 15X + 10$ is irreducible over $\mathbb{Q}(2^{1/3}, \omega)$. By the Eisenstein irreducibility criterion (with the prime 5), f(X) is irreducible over $\mathbb{Q}$. Next, we have already seen that $\mathbb{Q}(2^{1/3}, \omega)$ is a Galois extension of $\mathbb{Q}$ of degree 6, which is co-prime to 7. The assertion follows from the above proposition.

Exercises

8.3.1 Show that $\mathbb{Q}(\omega)$ is the splitting field of $X^4 + X^2 + 1$ over $\mathbb{Q}$, where ω is a primitive cube root of unity.

8.3.2 Show that the splitting field of $X^n - 1$ over $\mathbb{Q}$ is $\mathbb{Q}(\rho_n)$, where $\rho_n = e^{2\pi i/n}$ is a primitive nth root of unity. Show that, in case n = p is a prime, its degree is p − 1. More generally, we shall show that its degree is φ(n).

8.3.3 Determine the degrees of the splitting fields over $\mathbb{Q}$ of the following polynomials: (i) $X^4 + 1$. (ii) $X^3 + X + 1$. (iii) $X^9 + X^3 + 1$. (iv) $X^4 - 5$.

8.3.4 Determine the degrees of the splitting fields of the following polynomials:

(i) $X^3 + X + 1$ over $\mathbb{Z}_5$. (ii) $X^7 - 5$ over $\mathbb{Z}_{11}$.

8.3.5 Give examples of rational numbers r and s such that the splitting field of $X^3 + rX + s$ has degree 3 over $\mathbb{Q}$. Can we characterise such r and s?

8.3.6 Use the fact that if σ is a K-automorphism of a field extension L of K, and f(X) is a polynomial in K[X], then σ takes the roots of f(X) to roots of f(X), to show the following:

(i) If z is a complex number which is a root of a polynomial f(X) with real coefficients, then the conjugate $\bar{z}$ is also a root of f(X).
(ii) If r is a rational number which is not the square of a rational number, and $a + b\sqrt{r}$ (a, b ∈ $\mathbb{Q}$) is a root of a polynomial f(X) in $\mathbb{Q}[X]$, then $a - b\sqrt{r}$ is also a root of f(X).

8.3.7 Show that $K = \mathbb{Q}(\sqrt{2} + \sqrt{3})$ is a normal extension of $\mathbb{Q}$ of degree 4. Show that every irreducible polynomial of odd degree over $\mathbb{Q}$ is also irreducible over K.

8.3.8 Let K be a field of characteristic p. Show that

(i) $X^p - a$, where a ∈ K, is either irreducible, or has all its roots in K. (ii) $X^p - X - a$ is either irreducible, or has all its roots in K. Deduce that if a ≠ 0, then $X^p - X - a$ is irreducible over $\mathbb{Z}_p$.

8.4 Separable Extensions

Now, we describe the finite extensions L of K for which the number of K-embeddings of L into $\overline{L} = \overline{K}$ is [L : K]. Consider first the case L = K(a). Let p(X) be the minimum polynomial of a over K. We know that [K(a) : K] = deg p(X). Since any K-embedding of K(a) takes a to a root of p(X), and is determined by the image of a, there are at most as many K-embeddings of K(a) into $\overline{K}$ as there are distinct roots of p(X). Further, given any root b of p(X) in $\overline{K}$, there is a unique K-isomorphism σ from K(a) to K(b) ⊆ $\overline{K}$ which takes a to b. This shows that the number of K-embeddings of K(a) into $\overline{K}$ is exactly the number of distinct roots of the minimum polynomial p(X) of a. Thus, we have proved the following:

Proposition 8.4.1 The number of K-embeddings of K(a) into $\overline{K}$ is [K(a) : K] if and only if all the roots of p(X) are distinct.

The above proposition motivates the following definition.

Definition 8.4.2 An irreducible polynomial p(X) in K[X] is called a separable polynomial if all its roots are distinct in its splitting field. A polynomial f(X) (not necessarily irreducible) is said to be separable if all its irreducible factors are separable. An algebraic element a of an extension field L of K is said to be separable if the minimum polynomial of a is separable. An algebraic extension L of K is said to be a separable extension if every element of L is separable over K. An algebraic extension L of K which is not separable is said to be an inseparable extension.

Proposition 8.4.3 Let L be a separable extension of K and F be an intermediary field. Then L is a separable extension of F, and F is also a separable extension of K.

Proof Suppose that L is a separable extension of K. Then every element of L is separable over K. In particular every element of F is separable over K. This shows that F is separable over K. Further, if a is an element of L, and p(X) is the minimum polynomial of a over K, then the minimum polynomial of a over F is a factor of p(X). Since p(X) has all its roots distinct, any factor of p(X) will also have its roots distinct. Thus, a is separable over F also. 

Definition 8.4.4 Let L be a finite extension of K. Let [L : K]_s denote the number of distinct K-embeddings of L into $\overline{L}$. The number [L : K]_s is called the degree of separability of the extension L of K (the justification for this terminology will be clear a little later).

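Thus, Proposition 8.4.1 says that a ∈ L is separable if and only if [K(a) : K]_s = [K(a) : K]. We shall show that L is a separable extension of K if and only if [L : K]_s = [L : K].

Proposition 8.4.5 Let L be a finite extension of K, and let σ be an isomorphism from K to a field K'. Then the number of extensions of σ to an embedding of L into $\overline{K'}$ is [L : K]_s.

Proof Let $\overline{K}$ be an algebraic closure of K containing L. Then it follows from Theorem 8.3.21 that σ can be extended to an isomorphism $\bar{\sigma}$ from $\overline{K}$ to $\overline{K'}$. Let $\sigma(L, \overline{K'})$ be the set of extensions of σ to homomorphisms from L to $\overline{K'}$, and let $K(L, \overline{K})$ be the set of K-embeddings of L into $\overline{K}$. We have to show that $|\sigma(L, \overline{K'})| = [L : K]_s$. Define a map η from $\sigma(L, \overline{K'})$ to $K(L, \overline{K})$ by $\eta(g) = \bar{\sigma}^{-1} \circ g$ (this fixes K, since g and $\bar{\sigma}$ agree with σ on K). Now, η(g_1) = η(g_2) implies that $\bar{\sigma}^{-1} \circ g_1 = \bar{\sigma}^{-1} \circ g_2$. Since $\bar{\sigma}^{-1}$ is bijective, g_1 = g_2. Thus, η is injective. Let $h \in K(L, \overline{K})$. Then $g = \bar{\sigma} \circ h \in \sigma(L, \overline{K'})$ and η(g) = h. This shows that η is surjective. Thus, $|\sigma(L, \overline{K'})| = |K(L, \overline{K})|$. By the definition, $|K(L, \overline{K})| = [L : K]_s$. The result follows.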

Corollary 8.4.6 Let L be a finite field extension of K, and let F be an intermediary field. Then [L : K]s =[L : F]s ·[F : K]s.

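Proof Let $\overline{K}$ be an algebraic closure of K containing L. Since $\overline{K}$ is also an algebraic closure of the image of any K-embedding of F, it follows from the above proposition that every K-embedding of F into $\overline{K}$ has exactly [L : F]_s extensions to embeddings of L into $\overline{K}$. Further, since there are [F : K]_s K-embeddings of F into $\overline{K}$, and every K-embedding of L into $\overline{K}$ restricts to one of them, there are [L : F]_s · [F : K]_s K-embeddings of L into $\overline{K}$. The result follows from the definition.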

Corollary 8.4.7 Let L be a finite extension of K. Then L is a separable extension of K if and only if [L : K]s =[L : K].

Proof The proof is by induction on [L : K]. If [L : K] = 1, then L = K, and there is nothing to do. Suppose that the result is true for all extensions whose degrees are less than n. Let L be a separable extension of K of degree n > 1. Then every element of L is separable over K. Let a ∈ L − K. Then, by the definition, a is separable over K. Let $\overline{K}$ be an algebraic closure of K containing L. We have already seen that the number of K-embeddings of K(a) into $\overline{K}$ is the number of distinct roots of the minimum polynomial p(X) of a, and it is the same as deg(p(X)) = [K(a) : K]. Thus, [K(a) : K]_s = [K(a) : K]. We have also observed that L is separable over any intermediary field, and so L is separable over K(a). Since [L : K(a)] < n, the induction assumption gives [L : K(a)]_s = [L : K(a)]. From the previous corollary, we have

$$[L : K]_s = [L : K(a)]_s \cdot [K(a) : K]_s = [L : K(a)] \cdot [K(a) : K] = [L : K].$$

Conversely, suppose that [L : K]_s = [L : K]. Let a ∈ L. We have to show that a is separable over K. It is sufficient to show that [K(a) : K]_s = [K(a) : K]. Suppose that [K(a) : K]_s < [K(a) : K]. Then, since [L : K(a)]_s ≤ [L : K(a)], we see that [L : K]_s = [L : K(a)]_s · [K(a) : K]_s is strictly less than [L : K]. This is a contradiction to the hypothesis.

Corollary 8.4.8 Let L be a finite separable extension of F, and F a finite separable extension of K. Then L is a separable extension of K.

Proof Under the hypothesis of the corollary, $[L : K]_s = [L : F]_s \cdot [F : K]_s = [L : F] \cdot [F : K] = [L : K]$, and so L is separable over K by Corollary 8.4.7.

Corollary 8.4.9 Let L be a finite extension of K, and let $K_s^L$ denote the set of all elements of L which are separable over K. Then $K_s^L$ is a subfield of L.

Proof Let a, b ∈ $K_s^L$. By Proposition 8.4.1 and Corollary 8.4.7, K(a) and K(b) are separable extensions of K. Since b is separable over K, it is also separable over K(a) (its minimum polynomial over K(a) divides its minimum polynomial over K), and so K(a)(b) = K(a, b) is a separable extension of K(a). It follows from the previous result that K(a, b) is a separable extension of K. Thus, K(a, b) ⊆ $K_s^L$. Hence a − b, a · b, and $a^{-1}$ (for a ≠ 0) are in $K_s^L$. This shows that $K_s^L$ is a subfield of L.

Definition 8.4.10 The subfield $K_s^L$ is called the separable closure of K in L. The separable closure of K in its algebraic closure is called the separable closure of K.

Corollary 8.4.11 A finite extension L of K is a Galois extension if and only if it is separable as well as normal.

Proof We know that a finite extension L of K is a Galois extension if and only if |G(L/K)| = [L : K]. Suppose that L is a finite Galois extension. Since each member of G(L/K) can be viewed as a K-embedding of L into $\overline{K}$, there are at least [L : K] K-embeddings of L into $\overline{K}$. There cannot be more. Hence [L : K]_s = [L : K]. This shows that L is a separable extension of K. We have already seen that a Galois extension of K is also a normal extension. Conversely, suppose that L is a separable normal extension of K. Since it is separable, there are [L : K] K-embeddings of L into $\overline{K}$. Since L is also a normal extension of K, any K-embedding of L into $\overline{K}$ maps L onto itself, and so is a member of G(L/K). This shows that |G(L/K)| = [L : K], and so L is a Galois extension of K.

Corollary 8.4.12 A finite field extension L of K is a Galois extension if and only if it is the splitting field of a separable polynomial over K.

Proof A finite extension is a normal extension if and only if it is the splitting field of a polynomial. Thus, it is sufficient to show that the splitting field L of a polynomial f(X) over K is a separable extension if and only if f(X) is a separable polynomial. Suppose that L is the splitting field of a separable polynomial f(X) ∈ K[X]. Let L = K(a_1, a_2, ..., a_n), where a_1, a_2, ..., a_n are all the roots of f(X). Then the minimum polynomial of each a_i is an irreducible factor of f(X). Since every irreducible factor of a separable polynomial is separable, each a_i is separable. This shows that the separable closure of K in L contains K(a_1, a_2, ..., a_n) = L. In other words, L is a separable extension of K. Conversely, suppose that L = K(a_1, a_2, ..., a_n) is the splitting field of f(X), where a_1, a_2, ..., a_n are the roots of f(X), and that L is a separable extension of K. Then the nonconstant irreducible factors of f(X) are, up to associates, precisely the minimum polynomials of a_1, a_2, ..., a_n, which are separable since each a_i is a separable element. Hence f(X) is separable.

Corollary 8.4.13 A finite extension F of K is a separable extension if and only if there is a Galois extension L of K such that F is contained in L.

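Proof Suppose that F is separable over K. Since F is a finite extension, F = K(a_1, a_2, ..., a_n), and since it is separable, the minimum polynomial f_i(X) of a_i over K is separable for each i. Let f(X) = f_1(X)f_2(X)···f_n(X). Then f(X) is separable. Let L be a splitting field of f(X) over K which contains F (for example, the splitting field of f(X) over F). Then, from the above corollary, L is a Galois extension of K. Conversely, if F is contained in a Galois extension L of K, then L is separable over K by Corollary 8.4.11, and so F is separable over K by Proposition 8.4.3.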

Our next aim is to have a test for the separability of a polynomial. We first define the formal derivative of a polynomial.

Definition 8.4.14 The formal derivative $f'(X)$ of a polynomial

$$f(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n$$

is defined to be the polynomial

$$f'(X) = a_1 + 2a_2X + 3a_3X^2 + \cdots + na_nX^{n-1}.$$

The proof of the following proposition is straightforward and it is left as an exercise.

Proposition 8.4.15 Let f(X), g(X) be polynomials in K[X] and a, b ∈ K. Then
(i) $(af + bg)'(X) = af'(X) + bg'(X)$.
(ii) $(f \cdot g)'(X) = f'(X)g(X) + f(X)g'(X)$.
(iii) $(f \circ g)'(X) = f'(g(X))g'(X)$.
(iv) $f(X + a) = f(a) + Xf'(a) + \frac{X^2}{2!}f''(a) + \cdots + \frac{X^n}{n!}f^{(n)}(a)$, where n = deg f(X), provided that n! is invertible in K (for example, when K has characteristic 0).

Proposition 8.4.16 Let f(X) be a nonconstant polynomial in K[X]. Then a is a multiple root of f(X) in a splitting field L of f(X) over K if and only if it is a common root of f(X) and f'(X).

Proof Let a be a root of f(X) in L, and write f(X) = (X − a)g(X). Then f'(X) = (X − a)g'(X) + g(X), so that f'(a) = g(a). Thus, a is a multiple root of f(X), that is, a root of g(X), if and only if a is a common root of f(X) and f'(X).
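Because Definition 8.4.14 is purely algebraic (no limits), the formal derivative makes sense over any field, including $\mathbb{Z}_p$, where for instance $(X^p)' = pX^{p-1} = 0$. A minimal sketch, assuming coefficient lists written lowest degree first with the empty list as the zero polynomial (all helper names are hypothetical), also spot-checks the product rule of Proposition 8.4.15 (ii) on one example; a spot check is not a proof.

```python
def _normalize(coeffs, p=None):
    """Reduce mod p (if given) and strip zero leading coefficients."""
    out = [c % p for c in coeffs] if p is not None else list(coeffs)
    while out and out[-1] == 0:
        out.pop()
    return out


def formal_derivative(coeffs, p=None):
    """a_0 + a_1 X + ... + a_n X^n  ->  a_1 + 2 a_2 X + ... + n a_n X^(n-1)."""
    return _normalize([i * c for i, c in enumerate(coeffs)][1:], p)


def poly_add(f, g, p=None):
    """Sum of two coefficient lists, optionally over Z_p."""
    n = max(len(f), len(g))
    padded_f = list(f) + [0] * (n - len(f))
    padded_g = list(g) + [0] * (n - len(g))
    return _normalize([a + b for a, b in zip(padded_f, padded_g)], p)


def poly_mul(f, g, p=None):
    """Product of two coefficient lists, optionally over Z_p."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return _normalize(out, p)


print(formal_derivative([5, 1, 1, 1]))       # [1, 2, 3]
print(formal_derivative([0, 0, 0, 1], p=3))  # []: (X^3)' = 3X^2 = 0 over Z_3

f, g = [1, 2, 3], [4, 0, 5]
lhs = formal_derivative(poly_mul(f, g))
rhs = poly_add(poly_mul(formal_derivative(f), g), poly_mul(f, formal_derivative(g)))
print(lhs == rhs)  # True
```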

Proposition 8.4.17 Let f (X) and g(X) be polynomials in K[X]. Let L be a field extension of K. Then f (X) and g(X) are co-prime in K[X] if and only if they are co-prime in L[X].

Proof Suppose that f(X) and g(X) are co-prime in K[X]. Then, by the Euclidean algorithm, there are polynomials u(X) and v(X) in K[X] such that u(X)f(X) + v(X)g(X) = 1. Since polynomials in K[X] are also polynomials in L[X], this is an identity in L[X] also. This shows that f(X) and g(X) are co-prime in L[X]. Conversely, suppose that they are co-prime in L[X]. Then, again by the Euclidean algorithm, there are polynomials h(X) and k(X) in L[X] such that h(X)f(X) + k(X)g(X) = 1. If d(X) in K[X] divides f(X) as well as g(X) in K[X], then it divides f(X) and g(X) in L[X] also. Hence d(X) divides 1 in L[X], and so d(X) is a unit. This means that f(X) and g(X) are co-prime in K[X] also.

Proposition 8.4.18 Let f(X) ∈ K[X] be a nonconstant polynomial, and let L be a splitting field of f(X) over K. Then all the roots of f(X) in L are distinct if and only if f(X) and f'(X) are co-prime in K[X].

Proof Suppose that f(X) and f'(X) are co-prime in K[X]. Suppose that a is a multiple root of f(X). Then f(X) = (X − a)^2 g(X) for some g(X) ∈ L[X]. Now f'(X) = 2(X − a)g(X) + (X − a)^2 g'(X). This shows that X − a divides f(X) as well as f'(X). Hence f(X) and f'(X) are not co-prime in L[X]. From the above proposition, f(X) and f'(X) are not co-prime in K[X], a contradiction. Conversely, suppose that all roots of f(X) are distinct in L. Let $\overline{L}$ be the splitting field of f'(X) over L. Suppose that f(X) and f'(X) have a common root a in $\overline{L}$; then a ∈ L, since all roots of f(X) are in L. Write f(X) = (X − a)g(X). Then f'(X) = g(X) + (X − a)g'(X). Since a is also a root of f'(X), it follows that g(a) = 0, so X − a divides g(X). But then (X − a)^2 divides f(X). This is a contradiction to the supposition that f(X) has no multiple root. Hence f(X) and f'(X) have no common root in $\overline{L}$. Since both split over $\overline{L}$, this shows that f(X) and f'(X) are co-prime in $\overline{L}[X]$, and so (by the above proposition) they are also co-prime in K[X].
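Proposition 8.4.18, together with Proposition 8.4.17, gives a test for repeated roots that never leaves K: compute gcd(f, f') in K[X] by the Euclidean algorithm. A minimal sketch for K = $\mathbb{Q}$, assuming exact arithmetic with Python's `fractions.Fraction` and coefficient lists written lowest degree first (helper names are hypothetical):

```python
from fractions import Fraction


def trim(p):
    """Strip zero leading coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def derivative(p):
    """Formal derivative of a_0 + a_1 X + ... + a_n X^n."""
    return trim([i * c for i, c in enumerate(p)][1:])


def poly_rem(a, b):
    """Remainder of a on division by a nonzero b, over Q."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    while a and len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[i + shift] -= factor * c
        a = trim(a)  # the leading term has been cancelled exactly
    return a


def poly_gcd(a, b):
    """Monic gcd over Q via the Euclidean algorithm (empty list = zero polynomial)."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    while b:
        a, b = b, poly_rem(a, b)
    return [c / a[-1] for c in a] if a else []


def has_repeated_root(f):
    """f has a multiple root in a splitting field iff gcd(f, f') is nonconstant."""
    return len(poly_gcd(f, derivative(f))) > 1


print(has_repeated_root([2, -3, 0, 1]))  # True:  X^3 - 3X + 2 = (X - 1)^2 (X + 2)
print(has_repeated_root([-2, 0, 0, 1]))  # False: X^3 - 2
```

Exact rational arithmetic is used because a floating-point remainder that should vanish may not be exactly zero.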

Corollary 8.4.19 Let f(X) be an irreducible polynomial in K[X]. Then f(X) is separable (i.e., all its roots are distinct) if and only if f(X) does not divide f'(X).

Proof Since f(X) is irreducible, the greatest common divisor (f(X), f'(X)) is either a unit or an associate of f(X). By the definition, f(X) is separable if and only if it has no multiple roots. From the above proposition, this is equivalent to saying that (f(X), f'(X)) is a unit. This, in turn, is equivalent to saying that f(X) does not divide f'(X).

Corollary 8.4.20 An irreducible polynomial f(X) in K[X] is separable if and only if f'(X) ≠ 0.

Proof If f'(X) ≠ 0, then it is a nonzero polynomial of lower degree than f(X), and so f(X) cannot divide f'(X). If f'(X) = 0, then f(X) divides f'(X). The result follows from the above corollary.
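Corollary 8.4.20 reduces separability of an irreducible polynomial to whether f'(X) = 0. Over $\mathbb{Z}_p$ a nonconstant polynomial can have zero derivative, and this happens exactly when only exponents divisible by p occur, that is, when f(X) = g(X^p); this is the criterion proved in Corollary 8.4.23 below. The sketch (hypothetical helper names, coefficients lowest degree first) checks that equivalence exhaustively for small p and degree; it works over $\mathbb{Z}_p$ only, although the argument in the text applies to any field of characteristic p.

```python
from itertools import product


def derivative_mod_p(coeffs, p):
    """Formal derivative over Z_p, with zero leading coefficients stripped."""
    d = [(i * c) % p for i, c in enumerate(coeffs)][1:]
    while d and d[-1] == 0:
        d.pop()
    return d


def as_polynomial_in_x_to_the_p(coeffs, p):
    """Return the coefficients of g with f(X) = g(X^p) over Z_p, or None if none exists."""
    coeffs = [c % p for c in coeffs]
    if any(c != 0 for i, c in enumerate(coeffs) if i % p != 0):
        return None
    return coeffs[::p]


def check_equivalence(p, max_degree):
    """Confirm f' = 0 over Z_p  <=>  f(X) = g(X^p), for every f with deg f <= max_degree."""
    for coeffs in product(range(p), repeat=max_degree + 1):
        derivative_is_zero = derivative_mod_p(coeffs, p) == []
        is_in_x_to_the_p = as_polynomial_in_x_to_the_p(coeffs, p) is not None
        if derivative_is_zero != is_in_x_to_the_p:
            return False
    return True


print(as_polynomial_in_x_to_the_p([1, 0, 0, 2, 0, 0, 1], 3))  # [1, 2, 1]: 1 + 2X^3 + X^6 = g(X^3)
print(check_equivalence(3, 6), check_equivalence(2, 8))
```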

Since, in a field of characteristic 0, f'(X) = 0 if and only if f(X) is a constant polynomial (verify), the following corollary is immediate.

Corollary 8.4.21 Every polynomial over a field K of characteristic 0 is separable.

Corollary 8.4.22 Let K be a field of characteristic 0. Then every algebraic extension L of K is separable.

Proof By the definition, L is separable over K if and only if all elements of L are separable over K. This, in turn, means that the minimum polynomial over K of each element of L is separable. The result follows from the previous corollary.

Corollary 8.4.23 Let K be a field of characteristic p ≠ 0, and let f(X) be an irreducible polynomial in K[X]. Then f(X) is separable if and only if there is no polynomial g(X) ∈ K[X] such that f(X) = g(X^p).

Proof We have seen that an irreducible polynomial f(X) is separable if and only if f'(X) ≠ 0. Let K be a field of characteristic p ≠ 0. Let

2 n f (X) = a0 + a1X + a2X + ··· + anX be an irreducible polynomial in K[X]. Then

 2 n−1 f (X) = a1 + 2a2X + 3a3X + ··· + nanX = 0 if and only if iai = 0 for all i. This means that if p does not divide i, ai = 0. This shows that f (X) = 0 if and only if f (X) is of the form

p 2p rp f (X) = a0 + apX + a2pX + ··· + arpX for some r. This is equivalent to say that f (X) = g(Xp), where

2 r g(X) = a0 + apX + a2pX + ··· + arpX .

The result follows. ∎

Proposition 8.4.24 Let $K$ be a field of characteristic $p \neq 0$, and let $a \in K$. Then the polynomial $X^p - a$ is irreducible over $K$ if and only if it has no root in $K$.

Proof If $X^p - a$ is irreducible, then, being of degree $p \geq 2$, it obviously has no root in $K$. Conversely, suppose that it has no root in $K$. Let $L$ be a splitting field of this polynomial, and let $b \in L$ be a root of $X^p - a$. Then $b \notin K$ and $b^p = a$. Further,
$$X^p - a = X^p - b^p = (X - b)^p.$$
Hence any monic nonunit factor of $X^p - a$ in $K[X]$ is of the form $(X - b)^t$ for some $t$, $1 \leq t \leq p$. Suppose that $(X - b)^t \in K[X]$ with $t < p$. Then $1 < t$, for $b \notin K$. The coefficient of $X^{t-1}$ in $(X - b)^t$ is $-tb$, so $tb \in K$. Since $t$ is co-prime to $p$, $t$ is invertible in $K$, and so $b \in K$. This is a contradiction to the supposition. Hence $X^p - a$ has no monic nonunit factor of degree less than $p$, that is, it is irreducible. ∎

Corollary 8.4.25 Let $K$ be a field of characteristic $p \neq 0$. Then every algebraic extension of $K$ is separable over $K$ if and only if $K^p = \{a^p \mid a \in K\} = K$.

Proof To say that every algebraic extension of $K$ is separable is equivalent to saying that every polynomial over $K$ is separable, which in turn is equivalent to saying that every irreducible polynomial over $K$ is separable. Suppose that every irreducible polynomial over $K$ is separable. Let $a$ be a nonzero element of $K$, and consider the polynomial $X^p - a$. If it were irreducible, then, since its derivative is 0, it would not be separable. Hence $X^p - a$ is not irreducible, and by the previous proposition it has a root $b \in K$. This shows that $a = b^p \in K^p$ (and trivially $0 = 0^p$). Conversely, suppose that $K^p = K$. We show that every irreducible polynomial over $K$ is separable. Suppose the contrary, and let $f(X)$ be an irreducible polynomial in $K[X]$ which is not separable. Then, by Corollary 8.4.23, $f(X) = g(X^p)$ for some polynomial $g(X) \in K[X]$. Suppose that
$$g(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n.$$
Since $K^p = K$, there are $b_i \in K$ such that $b_i^p = a_i$ for all $i$. Hence, since $(u + v)^p = u^p + v^p$ in characteristic $p$,
$$f(X) = g(X^p) = (b_0 + b_1X + b_2X^2 + \cdots + b_nX^n)^p.$$
This contradicts the supposition that $f(X)$ is irreducible. ∎

Definition 8.4.26 A field $K$ is said to be perfect if every algebraic extension of $K$ is separable.

The above results can be restated in the light of the above definition:

Corollary 8.4.27 Every field of characteristic 0 is a perfect field. A field $K$ of characteristic $p \neq 0$ is perfect if and only if $K^p = K$. ∎

Corollary 8.4.28 Every finite field is perfect.

Proof Let $K$ be a finite field of characteristic $p$. Consider the map $\sigma$ from $K$ to $K$ defined by $\sigma(a) = a^p$. If $\sigma(a) = \sigma(b)$, then $a^p = b^p$, and so $(a - b)^p = a^p - b^p = 0$. Since $K$ is a field, $a - b = 0$. Thus, $\sigma$ is injective, and since $K$ is finite, it is also surjective. This means that $K^p = K$, and by the previous result, $K$ is perfect. ∎
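The proof uses only that $a \mapsto a^p$ is an injective map of a finite field into itself. A small Python check makes this concrete. It represents $\mathbb{Z}_p[X]/\langle m(X)\rangle$ by tuples of coefficients, lowest degree first (our own encoding; the helper names are hypothetical), and uses the moduli $X^3 + X + 1$ over $\mathbb{Z}_2$ and $X^2 + 1$ over $\mathbb{Z}_3$, which are irreducible and so give fields of order 8 and 9. It is a sketch for experimentation, not an efficient implementation.

```python
from itertools import product


def gf_mul(a, b, modulus, p):
    """Multiply a, b in Z_p[X]/(modulus).

    Elements are tuples of n = deg(modulus) coefficients, lowest degree
    first; modulus is a monic coefficient tuple of length n + 1.
    """
    n = len(modulus) - 1
    prod = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    # Reduce from the top using X^n = -(m_0 + m_1 X + ... + m_{n-1} X^(n-1)).
    for k in range(2 * n - 2, n - 1, -1):
        c = prod[k]
        if c:
            prod[k] = 0
            for i in range(n):
                prod[k - n + i] = (prod[k - n + i] - c * modulus[i]) % p
    return tuple(prod[:n])


def gf_pow(a, e, modulus, p):
    """a**e by repeated multiplication (adequate for the small e used here)."""
    result = tuple([1] + [0] * (len(modulus) - 2))
    for _ in range(e):
        result = gf_mul(result, a, modulus, p)
    return result


def frobenius_is_bijective(modulus, p):
    """Check whether a -> a^p permutes the elements of Z_p[X]/(modulus)."""
    n = len(modulus) - 1
    elements = list(product(range(p), repeat=n))
    images = {gf_pow(a, p, modulus, p) for a in elements}
    return len(images) == len(elements)


print(frobenius_is_bijective((1, 1, 0, 1), 2))  # field of order 8
print(frobenius_is_bijective((1, 0, 1), 3))     # field of order 9
```

For contrast, `frobenius_is_bijective((0, 0, 1), 2)` works in the ring $\mathbb{Z}_2[X]/\langle X^2\rangle$, which is not a field. There $X^2 = 0$, and the function returns False, which shows where the hypothesis that $K$ is a field enters the proof.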

We know that the order of every finite field is $p^n$ for some prime $p$ and $n \geq 1$.

Theorem 8.4.29 Given any prime $p$ and $n \geq 1$, there is one and only one (up to isomorphism) field of order $p^n$.

Proof Consider the field $\mathbb{Z}_p$ of residue classes modulo $p$, and the polynomial $X^{p^n} - X$ in $\mathbb{Z}_p[X]$. Since its derivative is $-1 \neq 0$, it is co-prime to its derivative, and so all its roots in its splitting field $L$ are distinct. We show that $L$ is precisely the set of roots of $X^{p^n} - X$. Since $L$ is generated over $\mathbb{Z}_p$ by these roots, it is sufficient to show that the set of roots of $X^{p^n} - X$ forms a field. Let $a$ and $b$ be roots of this polynomial. Then $a^{p^n} = a$ and $b^{p^n} = b$. Now, since the characteristic is $p$,
$$(a + b)^{p^n} = a^{p^n} + b^{p^n} = a + b,$$
$$(ab)^{p^n} = a^{p^n} b^{p^n} = ab,$$
and if $a \neq 0$, then
$$(a^{-1})^{p^n} = (a^{p^n})^{-1} = a^{-1}.$$
This shows that $a + b$, $ab$ and $a^{-1}$ are all roots of the above polynomial. This completes the proof of the existence of a field of order $p^n$. For uniqueness, let $L_1$ and $L_2$ be fields of order $p^n$. Since the multiplicative group of each has order $p^n - 1$, every element of each is a root of $X^{p^n} - X$, and so both are splitting fields of $X^{p^n} - X$ over their respective prime subfields. Since their prime subfields are isomorphic (to $\mathbb{Z}_p$), it follows from Corollary 8.3.9 that $L_1$ and $L_2$ are isomorphic. ∎

Corollary 8.4.30 Every finite extension of a finite field is a Galois extension. If $K$ is a field containing $q$ elements, and $L$ is a field extension of $K$ of degree $n$, then the Galois group $G(L/K)$ is a cyclic group of order $n$ generated by $\sigma$, where $\sigma$ is defined by $\sigma(a) = a^q$.

Proof Since $L$ has $q^n$ elements, every element of $L$ is a root of $X^{q^n} - X$, and so $L$ is the splitting field of $X^{q^n} - X$ over $K$. Hence $L$ is a normal as well as a separable extension of $K$, and by Corollary 8.4.11 it is a Galois extension. Further, since $a^q = a$ for all $a \in K$, the map $\sigma$ defined by $\sigma(a) = a^q$ (an injective, and hence bijective, ring homomorphism of the finite field $L$) is a member of $G(L/K)$. Also $\sigma^n(a) = a^{q^n} = a$ for all $a \in L$, and so $\sigma^n = I_L$. If $m < n$, then $\sigma^m$ cannot be the identity map, for otherwise every element of $L$ would be a root of $X^{q^m} - X$, which has at most $q^m < q^n$ roots. This shows that $\sigma$ generates a cyclic subgroup of $G(L/K)$ of order $n$. Since $L$ is Galois over $K$, $|G(L/K)| = [L : K] = n$. Thus, $\sigma$ generates $G(L/K)$. ∎
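The corollary can be checked experimentally for small fields. The Python sketch below uses the same hypothetical coefficient-tuple encoding of $\mathbb{Z}_p[X]/\langle m(X)\rangle$ as a model of a finite field, with helper names of our own. It computes the smallest $r \geq 1$ with $a^{q^r} = a$ for all $a$, that is, the order of $\sigma(a) = a^q$. We rely on $X^4 + X + 1$ being irreducible over $\mathbb{Z}_2$ (it has no root, and it is not divisible by $X^2 + X + 1$), so that it models the field of order 16; for the subfield $K$ of order 4, the role of $\sigma$ is played by $a \mapsto a^4$.

```python
from itertools import product


def gf_mul(a, b, modulus, p):
    """Multiply a, b in Z_p[X]/(modulus); tuples, lowest degree first."""
    n = len(modulus) - 1
    prod = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    for k in range(2 * n - 2, n - 1, -1):  # reduce X^k for k >= n
        c = prod[k]
        if c:
            prod[k] = 0
            for i in range(n):
                prod[k - n + i] = (prod[k - n + i] - c * modulus[i]) % p
    return tuple(prod[:n])


def gf_pow(a, e, modulus, p):
    """a**e by repeated multiplication (adequate for small e)."""
    result = tuple([1] + [0] * (len(modulus) - 2))
    for _ in range(e):
        result = gf_mul(result, a, modulus, p)
    return result


def frobenius_order(modulus, p, q):
    """Smallest r >= 1 with a^(q^r) = a for every a in Z_p[X]/(modulus),
    i.e. the order of the map sigma(a) = a^q."""
    n = len(modulus) - 1
    elements = list(product(range(p), repeat=n))
    image = {a: a for a in elements}  # image[a] holds sigma^r(a)
    r = 0
    while True:
        image = {a: gf_pow(x, q, modulus, p) for a, x in image.items()}
        r += 1
        if all(image[a] == a for a in elements):
            return r


print(frobenius_order((1, 1, 0, 1), 2, 2))     # order 8 field over Z_2
print(frobenius_order((1, 1, 0, 0, 1), 2, 2))  # order 16 field over Z_2
print(frobenius_order((1, 1, 0, 0, 1), 2, 4))  # order 16 over the subfield of order 4
```

The three values are the degrees $3$, $4$ and $2 = 4/2$, in agreement with $|G(L/K)| = [L : K]$.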

Remark 8.4.31 Consider the algebraic closure $L$ of $\mathbb{Z}_p$. Let $K$ be a subfield of $L$ of order $p^n$, and $F$ a subfield of order $p^m$. Then $K \subseteq F$ if and only if $n$ divides $m$. This is clear, for if $K \subseteq F$, then $F$ is a vector space over $K$ of dimension $r$ (say), and then $F$ contains exactly $(p^n)^r = p^{nr}$ elements. This means that $m = nr$. Conversely, if $m = nr$, then $X^{p^n} - X$ divides $X^{p^m} - X$. In fact, $p^n - 1$ divides $p^{nr} - 1$, and writing $k = (p^{nr} - 1)/(p^n - 1)$ we have

$$X^{p^{nr}} - X = (X^{p^n} - X)\left(1 + X^{p^n - 1} + X^{2(p^n - 1)} + \cdots + X^{(k-1)(p^n - 1)}\right).$$

This shows that the splitting field $K$ of $X^{p^n} - X$ is contained in the splitting field $F$ of $X^{p^m} - X$.
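The divisibility criterion of this remark can be tested for small exponents by long division in $\mathbb{Z}_p[X]$. In the Python sketch below, polynomials are coefficient lists, lowest degree first, and the helper names are our own. `divides(p, n, m)` reports whether $X^{p^n} - X$ divides $X^{p^m} - X$. For $p = 2$ and $1 \leq n, m \leq 6$ it agrees with the condition $n \mid m$.

```python
def trim(f):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_mod(f, g, p):
    """Remainder of f on division by a nonzero g in Z_p[X] (p prime)."""
    f, g = trim(f), trim(g)
    inv = pow(g[-1], p - 2, p)
    while len(f) >= len(g):
        c = (f[-1] * inv) % p
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        f = trim(f)
    return f


def frobenius_poly(p, n):
    """X^(p^n) - X in Z_p[X], lowest degree first."""
    f = [0] * (p ** n + 1)
    f[p ** n] = 1
    f[1] = p - 1  # the coefficient -1, reduced modulo p
    return f


def divides(p, n, m):
    """True when X^(p^n) - X divides X^(p^m) - X in Z_p[X]."""
    return poly_mod(frobenius_poly(p, m), frobenius_poly(p, n), p) == []


for n in range(1, 5):
    print(n, [m for m in range(1, 7) if divides(2, n, m)])
```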

Proposition 8.4.32 Let $K$ be a finite field containing $q = p^m$ elements. Let $f(X)$ be a monic irreducible polynomial of degree $n$ over $K$, and let $a$ be a root of $f(X)$ in a field extension $L$ of $K$. Then $K(a)$ is the splitting field of $f(X)$, and all roots of $f(X)$ are of the form $a^{q^r}$, $r \geq 1$.

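Proof Clearly, $[K(a) : K] = \deg f(X) = n$. Thus, $K(a)$ is a field extension of $K$ of degree $n$. From the above results, $K(a)$ is a Galois extension of $K$, and it is the splitting field of $X^{q^n} - X$ over $K$. Being a normal extension, $K(a)$ contains all the roots of the irreducible polynomial $f(X)$. Further, $G(K(a)/K)$ is the cyclic group of order $n$ generated by $\sigma$, where $\sigma(b) = b^q$. Each $\sigma^r(a)$ is a root of $f(X)$, and these are pairwise distinct for $1 \leq r \leq n$ (an automorphism fixing $a$ fixes all of $K(a)$). This shows that $\{\sigma^r(a) \mid 1 \leq r \leq n\} = \{a^{q^r} \mid 1 \leq r \leq n\}$ is the set of all $n$ distinct roots of $f(X)$. ∎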

These results say that to find a field $K$ of order $p^n$, it is sufficient to find an irreducible polynomial $f(X)$ of degree $n$ over $\mathbb{Z}_p$; then $K$ is isomorphic to the field $\mathbb{Z}_p[X]/\langle f(X)\rangle$. There is an effective procedure, using the division algorithm in $\mathbb{Z}_p[X]$, to enumerate the elements of $K$ and to determine the addition and multiplication in $K$. How do we determine the irreducible polynomials of degree $n$ in $\mathbb{Z}_p[X]$? As observed, if $f(X)$ is a monic irreducible polynomial over $\mathbb{Z}_p$ of degree $n$, then the splitting field of $f(X)$ is the same as the splitting field of $X^{p^n} - X$. If $a$ is a root of $f(X)$, then it is also a root of $X^{p^n} - X$. Since $f(X)$ is the minimum polynomial of $a$, it divides $X^{p^n} - X$. This shows that all monic irreducible polynomials of degree $n$ are irreducible factors of $X^{p^n} - X$. Further, let $g(X)$ be any monic irreducible polynomial in $\mathbb{Z}_p[X]$ of degree $m$, where $m$ divides $n$. Then the splitting field of $g(X)$ is also the splitting field of $X^{p^m} - X$, and it is contained in the splitting field of $X^{p^n} - X$. Hence any root of $g(X)$ is also a root of $X^{p^n} - X$, and so $g(X)$ divides $X^{p^n} - X$. Conversely, if $g(X)$ is a monic irreducible polynomial of degree $m$ which is a factor of $X^{p^n} - X$, then the splitting field of $g(X)$ is of order $p^m$, and it is contained in the splitting field of $X^{p^n} - X$. Thus, $m$ divides $n$. This shows that the irreducible polynomials of degrees $m$, where $m$ divides $n$, are factors of $X^{p^n} - X$, and they are the only (up to associates) irreducible factors of $X^{p^n} - X$. Since the roots of this polynomial are all distinct, it has no repeated irreducible factors. The arguments above combine to give the following:

Theorem 8.4.33 The polynomial $X^{p^n} - X$ in $\mathbb{Z}_p[X]$ is the product of the distinct monic irreducible polynomials in $\mathbb{Z}_p[X]$ whose degrees are divisors of $n$. ∎

One can develop a computer program to factorize $X^{p^n} - X$ in $\mathbb{Z}_p[X]$ for small primes $p$ and small $n$.

Example 8.4.34 Consider the case $p = 2$ and $n = 3$. We wish to find all monic irreducible polynomials of degree 3 in $\mathbb{Z}_2[X]$, and also to factorize $X^{2^3} - X$ as a product of irreducible factors. Since 1 and 3 are the only divisors of 3, the irreducible factors of $X^8 - X$ are the irreducible polynomials of degree 1 and those of degree 3. The irreducible polynomials of degree 1 are $X$ and $X + 1$ only. Now consider the polynomials of degree 3. They are $X^3$, $X^3 + X^2$, $X^3 + X$, $X^3 + 1$, $X^3 + X^2 + X$, $X^3 + X^2 + 1$, $X^3 + X + 1$ and $X^3 + X^2 + X + 1$. The polynomials of degree 3 which have no roots in $\mathbb{Z}_2$ are $X^3 + X^2 + 1$ and $X^3 + X + 1$. Since a reducible cubic has a linear factor, they are both irreducible. Thus,

$$X^8 - X = X(X + 1)(X^3 + X^2 + 1)(X^3 + X + 1).$$
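Following the remark before Example 8.4.34, here is one possible program, written as a Python sketch under our own conventions (coefficient lists over $\mathbb{Z}_p$, lowest degree first; the helper names are hypothetical). It finds the monic irreducible polynomials of each degree $d$ dividing $n$ by trial division. This method is simple but slow and only suitable for small $p$ and $n$. The sketch then checks Theorem 8.4.33 by multiplying the factors together and comparing the result with $X^{p^n} - X$.

```python
from itertools import product


def trim(f):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_mod(f, g, p):
    """Remainder of f on division by a nonzero g in Z_p[X] (p prime)."""
    f, g = trim(f), trim(g)
    inv = pow(g[-1], p - 2, p)
    while len(f) >= len(g):
        c = (f[-1] * inv) % p
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        f = trim(f)
    return f


def poly_mul(f, g, p):
    """Product of two polynomials in Z_p[X]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def monic_polys(d, p):
    """All monic polynomials of degree d over Z_p."""
    for coeffs in product(range(p), repeat=d):
        yield list(coeffs) + [1]


def is_irreducible(f, p):
    """Trial division by every monic polynomial of degree 1, ..., deg(f) // 2."""
    d = len(f) - 1
    return all(poly_mod(f, g, p) != []
               for k in range(1, d // 2 + 1) for g in monic_polys(k, p))


def irreducible_factors(p, n):
    """Monic irreducibles over Z_p whose degree divides n."""
    return [f for d in range(1, n + 1) if n % d == 0
            for f in monic_polys(d, p) if is_irreducible(f, p)]


def check_theorem(p, n):
    """Does the product of irreducible_factors(p, n) equal X^(p^n) - X?"""
    result = [1]
    for f in irreducible_factors(p, n):
        result = poly_mul(result, f, p)
    target = [0] * (p ** n + 1)
    target[p ** n] = 1
    target[1] = p - 1  # the coefficient -1, reduced modulo p
    return result == target


for f in irreducible_factors(2, 3):
    print(f)
print(check_theorem(2, 3), check_theorem(2, 5), check_theorem(3, 2))
```

For $p = 2$, $n = 3$ it finds $X$, $X + 1$, $X^3 + X^2 + 1$ and $X^3 + X + 1$, and for $p = 2$, $n = 5$ it finds exactly the six quintics listed in Example 8.4.35.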

If $a$ is a root of $X^3 + X^2 + 1$, then $a^3 = a^2 + 1$. Note that $a^2$ and $a^4$ are also roots of $X^3 + X^2 + 1$. If $b$ is a root of $X^3 + X + 1$, then $b^3 = b + 1$. Thus, $\mathbb{Z}_2[X]/\langle X^3 + X^2 + 1\rangle$ and $\mathbb{Z}_2[X]/\langle X^3 + X + 1\rangle$ are both fields of order 8. Determine an explicit isomorphism between them. Compare this with the example of a field of order 8 given in Chap. 7 of Algebra 1.

Example 8.4.35 Consider the case when $p = 2$ and $n = 5$. Since 1 and 5 are the only divisors of 5, only irreducible polynomials of degrees 1 and 5 are factors of $X^{32} - X$. The irreducible polynomials of degree 1 (as above) are $X$ and $X + 1$. The irreducible polynomials in $\mathbb{Z}_2[X]$ of degree 5 are precisely
$$X^5 + X^3 + 1,\quad X^5 + X^2 + 1,\quad X^5 + X^4 + X^3 + X + 1,$$
$$X^5 + X^4 + X^2 + X + 1,\quad X^5 + X^4 + X^3 + X^2 + 1,\quad X^5 + X^3 + X^2 + X + 1.$$
It is easily seen that $X^{32} - X$ is the product of these eight irreducible polynomials.

Exercises

8.4.1 Give an example of an inseparable polynomial over some field. Hint. Consider the field $K = \mathbb{Z}_p(X)$ and take the polynomial $Y^p - X$ in $K[Y]$.

8.4.2 Let $K$ be a field of characteristic $p \neq 0$. Show that the field $K(X)$ is not a perfect field.

8.4.3 Let $K$ be a field of characteristic $p \neq 0$. Let $f(X)$ be an irreducible polynomial in $K[X]$. Show that there exist $n \geq 0$ and a separable polynomial $g(X)$ in $K[X]$ such that $f(X) = g(X^{p^n})$.

8.4.4 Let $L$ be a field extension of $K$, where the characteristic of $K$ is $p \neq 0$. Let $a$ be an element of $L$ which is algebraic over $K$. Show that there is a positive integer $n$ such that $a^{p^n}$ is separable over $K$.

8.4.5 Call an element $a$ of an extension $L$ of a field $K$ of characteristic $p \neq 0$ a purely inseparable element if it is algebraic over $K$ and its minimum polynomial has only one root, namely $a$. Show that $a$ is purely inseparable over $K$ if and only if $a^{p^n} \in K$ for some $n$.

8.4.6 Show that an element $a$ of $L$ is separable as well as purely inseparable if and only if $a \in K$.

8.4.7 Show that if $a \in L$ is purely inseparable over $K$, then $K(a)$ is the splitting field of the minimum polynomial of $a$, and $G(K(a)/K)$ is trivial.

8.4.8 Let $L$ be an algebraic extension of $K$. Show that every element of $L$ is purely inseparable over the separable closure $K_s$ of $K$ in $L$.

8.4.9 Call an algebraic extension $L$ of $K$ a purely inseparable extension if every element of $L$ is purely inseparable over $K$. Show that $L$ is purely inseparable over $K_s$.

8.4.10 Show that if $L$ is a finite purely inseparable extension of $K$, then $[L : K] = p^n$ for some prime $p$ and some $n \geq 0$.

8.4.11 Let $L$ be a finite extension of $K$, and $F$ an intermediary field. Show that $L$ is a purely inseparable extension of $K$ if and only if $L$ is purely inseparable over $F$, and $F$ is purely inseparable over $K$.

8.4.12 Let $L$ be a field extension of $K$. Let $K_i$ denote the set of all purely inseparable elements of $L$. Show that $K_i$ forms a subfield of $L$, called the purely inseparable closure of $K$ in $L$. Observe that $L$ need not be separable over $K_i$.

8.4.13 Define $[L : K]_i = [L : K_i]$ and call it the degree of inseparability. Show that $[L : K]_i = [L : F]_i[F : K]_i$, where $F$ is an intermediary field.

8.4.14 Let $L$ be a finite normal extension of $K$. Show that $K_s$ is a Galois extension of $K$, and $G(L/K)$ is isomorphic to $G(K_s/K)$. Deduce that $|G(L/K)| = [L : K]_s$.

8.4.15 Let $K$ be a field of characteristic $p \neq 0$, and $a \in K$ such that $a \notin K^p$. Show that $X^p - a$ is irreducible over $K$.

8.4.16 Find all irreducible polynomials of degree 4 over $\mathbb{Z}_2$. Factorize $X^{16} - X$ as a product of irreducible factors over $\mathbb{Z}_2$. Determine the structure of a field of order 16.

8.4.17 Determine the cubic irreducible polynomials over $\mathbb{Z}_3$, and factorize $X^{27} - X$ over $\mathbb{Z}_3$. Determine the structure of a field of order 27.

8.4.18 Express $X^4 + 1$ as a product of irreducible elements in $\mathbb{Z}_3[X]$. Determine its splitting field.

8.4.19 Show that $X^4 - 7$ is irreducible over $\mathbb{Z}_5$. Hint. Observe that 7 is not a quadratic residue mod 5.

8.4.20 Determine the irreducible polynomials of degree 5 over $\mathbb{Z}_3$, and factorize $X^{243} - X$ as a product of irreducible elements in $\mathbb{Z}_3[X]$.

8.4.21 Let $\psi(q, d)$ denote the number of monic irreducible polynomials of degree $d$ over a field $K_q$ containing $q$ elements. Show that

$$q^n = \sum_{d \mid n} d\,\psi(q, d).$$

Use the Möbius inversion formula to show that

$$n\,\psi(q, n) = \sum_{d \mid n} \mu(d)\, q^{n/d}.$$

8.4.22 Find the number of irreducible polynomials of degree 9 over a field $K$ of order 125.
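The two formulas of Exercise 8.4.21 can be checked numerically. In the Python sketch below (function names are our own), `num_irreducible(q, n)` evaluates the Möbius inversion formula for the number of monic irreducible polynomials of degree $n$. For $q = 2$ it returns the counts 2, 1, 2, 3, 6 for degrees 1 to 5, in agreement with Examples 8.4.34 and 8.4.35.

```python
def mobius(n):
    """Möbius function mu(n), computed by trial division."""
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:  # k^2 divides n, so mu(n) = 0
                return 0
            result = -result
        k += 1
    return -result if n > 1 else result


def num_irreducible(q, n):
    """psi(q, n) = (1/n) * sum over d | n of mu(d) * q^(n/d)."""
    total = sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n


print([num_irreducible(2, n) for n in range(1, 6)])
```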

8.5 Fundamental Theorem of Galois Theory

In this section, we relate the intermediary fields of Galois extensions with the subgroups of their Galois groups. This translates problems in field theory into problems in group theory. As a simple application, we prove the fundamental theorem of algebra. Other applications of the fundamental theorem of Galois theory will follow in the subsequent sections.

Theorem 8.5.1 (Fundamental theorem of Galois theory). Let $L$ be a finite Galois extension of $K$. Let $S(G(L/K))$ denote the set of all subgroups of the Galois group $G(L/K)$, and let $S(L/K)$ denote the set of all intermediary subfields of the field extension $L$ of $K$. Then we have a bijective map $\phi$ from $S(G(L/K))$ to $S(L/K)$ given by $\phi(H) = F(H)$ (the fixed field of $H$), and a bijective map $\psi$ from $S(L/K)$ to $S(G(L/K))$ given by $\psi(F) = G(L/F)$, such that $\phi$ and $\psi$ are inverses of each other. Further, the following conditions hold.

(i) $H_1 \subseteq H_2 \implies F(H_2) \subseteq F(H_1)$.
(ii) $F_1 \subseteq F_2 \implies G(L/F_2) \subseteq G(L/F_1)$.
(iii) $|H| = [L : F(H)]$, and $[F(H) : K] = [G(L/K) : H]$.
(iv) $H$ is a normal subgroup of $G(L/K)$ if and only if $F(H)$ is a Galois extension of $K$, and then $G(F(H)/K)$ is isomorphic to the quotient group $G(L/K)/H$.

Proof Clearly, $\phi$ and $\psi$ are maps. By Corollary 8.2.20, $H = G(L/F(H))$. Also, since $L$ is a Galois extension of $K$, for any intermediary field $F$, $L$ is also a Galois extension of $F$, and so it follows from the definition of Galois extension (Definition 8.2.3) that $F(G(L/F)) = F$. This shows that $\phi$ and $\psi$ are inverses of each other; in particular, they are bijective maps. (i) and (ii) are restatements of Proposition 8.2.4. Part (iii) follows from Theorem 8.2.19. (iv) Suppose that $H$ is a normal subgroup of $G(L/K)$. We have to show that $F(H)$ is a Galois extension of $K$. Since $L$ is a Galois, and so separable, extension of $K$, every element of $L$ is separable over $K$. In particular, every element of $F(H)$ is separable over $K$. Thus, in any case $F(H)$ is a separable extension of $K$. We show that under the assumption that $H$ is a normal subgroup of $G(L/K)$, $F(H)$ is also a normal extension of $K$. It suffices to show that $\sigma(F(H)) \subseteq F(H)$ for all $\sigma \in G(L/K)$. Let $a \in F(H)$, $\sigma \in G(L/K)$ and $\tau \in H$. Then $\tau(\sigma(a)) = \sigma((\sigma^{-1} \circ \tau \circ \sigma)(a)) = \sigma(a)$, for $\sigma^{-1} \circ \tau \circ \sigma$ belongs to $H$ and $a \in F(H)$. This shows that $\sigma(a) \in F(H)$ for all $\sigma \in G(L/K)$. Thus, $F(H)$ is a Galois extension of $K$. Conversely, suppose that $F(H)$ is a Galois extension of $K$. Then any $\sigma \in G(L/K)$ restricted to $F(H)$ is an automorphism of $F(H)$. This enables us to define a map $\eta$ from $G(L/K)$ to $G(F(H)/K)$ by $\eta(\sigma) = \sigma|_{F(H)}$ (the restriction of $\sigma$ to $F(H)$). Clearly, this is a homomorphism. Since $L$ is a Galois extension of $K$, any element of $G(F(H)/K)$ can be extended to an element of $G(L/K)$. Thus, $\eta$ is a surjective homomorphism. Further, $\ker \eta = G(L/F(H)) = H$. This shows that $H$ is a normal subgroup of $G(L/K)$, and, by the fundamental theorem of homomorphism, $G(L/K)/H$ is isomorphic to $G(F(H)/K)$. ∎

Corollary 8.5.2 Let $L$ be a finite Galois extension of $K$. Then there are only finitely many intermediary fields. In particular, $L$ is a simple extension of $K$.

Proof Since $G(L/K)$ is finite of order $[L : K]$, it has only finitely many subgroups. By the fundamental theorem of Galois theory, there is a bijective correspondence between the set of subgroups of $G(L/K)$ and the set of intermediary fields. The result follows. Finally, by Theorem 8.1.17, $L$ is a simple extension of $K$. ∎

Corollary 8.5.3 Every finite separable extension is simple.

Proof Let $L$ be a finite separable extension of $K$. Suppose that $L = K(a_1, a_2, \ldots, a_n)$, and let $f_i(X)$ be the minimum polynomial of $a_i$ over $K$. Then each $f_i(X)$ is separable. Let $\tilde{L}$ be the splitting field of $f(X) = f_1(X)f_2(X)\cdots f_n(X)$ containing $L$. Then $\tilde{L}$ is a finite Galois extension of $K$. Hence there are only finitely many intermediary fields between $\tilde{L}$ and $K$. In particular, there are only finitely many intermediary fields between $L$ and $K$. Again by Theorem 8.1.17, $L$ is a simple extension of $K$. ∎

Corollary 8.5.4 Every finite field extension of a field K of characteristic 0 is simple.

Proof Since every finite extension of a field of characteristic 0 is separable, the result follows from Corollary 8.5.3. ∎

Our next aim is to use the fundamental theorem of Galois theory to enumerate the intermediary fields of a Galois extension $L$ of $K$ by calculating the Galois group $G(L/K)$, enumerating all subgroups of $G(L/K)$, and finding the fixed fields of these subgroups. In this section we give some simple examples to illustrate this.

Example 8.5.5 Recall Examples 8.2.14, 8.2.23, and 8.3.13. The extension $L = \mathbb{Q}(2^{1/3}, \omega) = \mathbb{Q}(2^{1/3} + \omega)$ is a Galois extension of $\mathbb{Q}$ with Galois group isomorphic to $S_3$. Since $S_3$ has 6 subgroups, there are 6 intermediary fields. We enumerate them. Clearly, the fixed field $F(S_3)$ of $S_3$ is $\mathbb{Q}$. The fixed field of the trivial subgroup is the field $\mathbb{Q}(2^{1/3} + \omega) = L$. Consider the subgroup $\langle\sigma\rangle$ of $G(L/\mathbb{Q})$ generated by the element $\sigma$, where $\sigma$ takes $2^{1/3}$ to $\omega 2^{1/3}$ and takes $\omega$ to itself. Clearly, $\langle\sigma\rangle$ is the unique subgroup of $G(L/\mathbb{Q})$ of order 3 (hence normal), and it is isomorphic to $A_3$. Thus, $F(\langle\sigma\rangle)$ is a Galois extension of $\mathbb{Q}$ of degree $[G(L/\mathbb{Q}) : \langle\sigma\rangle] = 2$. Also $\omega \in F(\langle\sigma\rangle)$, so $\mathbb{Q}(\omega) \subseteq F(\langle\sigma\rangle)$. Since $[\mathbb{Q}(\omega) : \mathbb{Q}] = 2$, it follows that $F(\langle\sigma\rangle) = \mathbb{Q}(\omega) = \mathbb{Q}(\omega^2)$. Let $\tau_1, \tau_2, \tau_3$ be the elements of $G(L/\mathbb{Q})$ given by $\tau_1(2^{1/3}) = 2^{1/3}$, $\tau_1(\omega) = \omega^2$; $\tau_2(2^{1/3}) = \omega 2^{1/3}$, $\tau_2(\omega) = \omega^2$; and $\tau_3(2^{1/3}) = \omega^2 2^{1/3}$, $\tau_3(\omega) = \omega^2$. Then $\langle\tau_1\rangle$, $\langle\tau_2\rangle$ and $\langle\tau_3\rangle$ are all the subgroups of order 2 of $G(L/\mathbb{Q})$. Now, $\tau_1$ fixes the field $\mathbb{Q}(2^{1/3})$, and $[\mathbb{Q}(2^{1/3}) : \mathbb{Q}] = 3 = [G(L/\mathbb{Q}) : \langle\tau_1\rangle]$. Thus, $F(\langle\tau_1\rangle) = \mathbb{Q}(2^{1/3})$. Similarly, $F(\langle\tau_2\rangle) = \mathbb{Q}(\omega^2 2^{1/3})$, and $F(\langle\tau_3\rangle) = \mathbb{Q}(\omega 2^{1/3})$. This determines all intermediary fields.
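The recipe described above (compute the Galois group, list its subgroups, take fixed fields) begins with a purely group-theoretic step that is easy to automate. The Python sketch below labels the roots $2^{1/3}, \omega 2^{1/3}, \omega^2 2^{1/3}$ as 0, 1, 2 (our own convention) and lists the subgroups of $S_3$ by brute force over subsets. For each subgroup it reports the order, the index (by Theorem 8.5.1(iii), the degree over $\mathbb{Q}$ of its fixed field) and whether the subgroup is normal (by part (iv), whether the fixed field is Galois over $\mathbb{Q}$). The brute-force search is exponential and only meant for very small groups.

```python
from itertools import combinations, permutations


def compose(s, t):
    """(s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)


def subgroups(group):
    """All subsets containing the identity and closed under composition;
    in a finite group these are exactly the subgroups."""
    identity = tuple(range(len(next(iter(group)))))
    others = [g for g in group if g != identity]
    found = []
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            h = set(subset) | {identity}
            if all(compose(a, b) in h for a in h for b in h):
                found.append(frozenset(h))
    return found


def is_normal(h, group):
    return all(compose(compose(g, x), inverse(g)) in h for g in group for x in h)


G = list(permutations(range(3)))  # S_3 acting on the three cube roots of 2
subs = subgroups(G)
for h in sorted(subs, key=len):
    print("order", len(h), "index", len(G) // len(h), "normal", is_normal(h, G))
```

The output lists six subgroups, of which exactly three (the trivial group, $A_3$ and $S_3$) are normal, matching the Galois extensions $L$, $\mathbb{Q}(\omega)$ and $\mathbb{Q}$ found above.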

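Example 8.5.6 Consider the polynomial $X^4 - 5$ in $\mathbb{Q}[X]$. By the Eisenstein irreducibility criterion (with the prime 5), it is irreducible over $\mathbb{Q}$. Let $L$ be the splitting field of this polynomial over $\mathbb{Q}$. Then $L$ is a Galois extension of $\mathbb{Q}$. We find its Galois group, and also all intermediary fields. Let $\alpha$ be the positive real fourth root of 5. Then $\pm\alpha, \pm i\alpha$ are the four roots of $X^4 - 5$. Thus, $i = (i\alpha)/\alpha \in L$. Hence $L = \mathbb{Q}(\alpha, i)$. It is easy to verify that $L = \mathbb{Q}(\alpha + i)$. Clearly, $[\mathbb{Q}(i) : \mathbb{Q}] = 2$. Consider the polynomial $X^4 - 5$ over $\mathbb{Q}(i)$. Note that $\mathbb{Q}(i)$ is the field of fractions of $\mathbb{Z}[i]$. It is easy to observe that $1 + 2i$ is an irreducible element of $\mathbb{Z}[i]$ which divides 5 ($5 = (1 + 2i)(1 - 2i)$), but $(1 + 2i)^2$ does not divide 5. By the Eisenstein irreducibility criterion, $X^4 - 5$ is irreducible over $\mathbb{Q}(i)$. Since $\alpha$ is a root of $X^4 - 5$, $L = \mathbb{Q}(i)(\alpha)$ is of degree 4 over $\mathbb{Q}(i)$. Thus, $[L : \mathbb{Q}] = [L : \mathbb{Q}(i)][\mathbb{Q}(i) : \mathbb{Q}] = 8$. Now, we find $G(L/\mathbb{Q})$. We first find $G(L/\mathbb{Q}(i))$. Consider the automorphism $\sigma$ defined by $\sigma(\alpha) = i\alpha$ and $\sigma(i) = i$. Clearly, $\sigma \in G(L/\mathbb{Q}(i))$, and it generates a cyclic group of order 4. Thus, $\langle\sigma\rangle = G(L/\mathbb{Q}(i))$. It has a unique proper nontrivial subgroup $\langle\sigma^2\rangle$, of order 2. Since $\sigma^2(\alpha^2) = \alpha^2$, it follows that the fixed field of $\langle\sigma^2\rangle$ is $\mathbb{Q}(\alpha^2, i)$ (find $\beta$ such that $\mathbb{Q}(\alpha^2, i) = \mathbb{Q}(\beta)$). Thus, we have a maximal tower $\mathbb{Q} \subseteq \mathbb{Q}(i) \subseteq \mathbb{Q}(\alpha^2, i) \subseteq L$ of intermediary subfields. Next, consider $\mathbb{Q}(\alpha)$, which is the fixed field of the subgroup $\langle\tau\rangle$ of order 2, where $\tau$ is defined by $\tau(i) = -i$ and $\tau(\alpha) = \alpha$ (in fact, $\tau$ is complex conjugation); it is an extension of $\mathbb{Q}$ of degree 4. Clearly, $\mathbb{Q}(\alpha)$ is not a Galois extension of $\mathbb{Q}$: it is contained in $\mathbb{R}$, and so it does not contain the root $i\alpha$ of the irreducible polynomial $X^4 - 5$. Hence $\langle\tau\rangle$ is not a normal subgroup, and so $\tau$ does not lie in the center. Therefore, the Galois group is neither abelian nor the quaternion group (whose only element of order 2 is central). It is, therefore, the dihedral group of order 8 with presentation $\langle \sigma, \tau;\ \sigma^4 = I_L = \tau^2,\ \tau\sigma\tau = \sigma^3 \rangle$. The fixed field of the subgroup $\langle\sigma^2, \tau\rangle$, which is a normal subgroup of $G(L/\mathbb{Q})$ of order 4, is clearly $\mathbb{Q}(\alpha^2)$. This gives us another maximal tower $\mathbb{Q} \subseteq \mathbb{Q}(\alpha^2) \subseteq \mathbb{Q}(\alpha) \subseteq L$ of intermediary fields.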
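Consider the element $\sigma^2\tau$. This element takes $\alpha$ to $-\alpha$ and $i$ to $-i$. It is an element of order 2. The fixed field of $\langle\sigma^2\tau\rangle$ is $\mathbb{Q}(i\alpha)$, which is a field extension of $\mathbb{Q}$ of degree 4, and it is isomorphic to $\mathbb{Q}(\alpha)$. Clearly, it contains $\mathbb{Q}(\alpha^2)$, since $(i\alpha)^2 = -\alpha^2$. This gives another maximal tower $\mathbb{Q} \subseteq \mathbb{Q}(\alpha^2) \subseteq \mathbb{Q}(i\alpha) \subseteq L$. Similarly, by looking at the other towers of subgroups of the group $G(L/\mathbb{Q})$, we can find all towers of intermediary subfields. This is left as an exercise.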
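The group computations of this example can be checked in the same brute-force way. In the Python sketch below we label the roots $\alpha, i\alpha, -\alpha, -i\alpha$ as 0, 1, 2, 3 (our own convention), so that $\sigma$ and $\tau$ become the permutations shown. The code generates $G(L/\mathbb{Q})$, verifies the relation $\tau\sigma\tau = \sigma^3$, and lists the subgroups, which by the fundamental theorem are in bijection with the intermediary fields.

```python
from itertools import combinations


def compose(s, t):
    """(s o t)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def generate(gens):
    """The (finite) group of permutations generated by gens."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def subgroups(group):
    """Subsets containing the identity and closed under composition."""
    identity = tuple(range(len(next(iter(group)))))
    others = [g for g in group if g != identity]
    found = []
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            h = set(subset) | {identity}
            if all(compose(a, b) in h for a in h for b in h):
                found.append(frozenset(h))
    return found


# Roots of X^4 - 5: 0 = alpha, 1 = i*alpha, 2 = -alpha, 3 = -i*alpha.
sigma = (1, 2, 3, 0)  # alpha -> i*alpha, i -> i
tau = (0, 3, 2, 1)    # complex conjugation
G = generate([sigma, tau])
subs = subgroups(G)
print(len(G), sorted(len(h) for h in subs))
```

It finds 10 subgroups (one of order 1, five of order 2, three of order 4 and one of order 8), so $L$ has exactly 10 intermediary fields, $\mathbb{Q}$ and $L$ included. This gives a way to check the exercise left above.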

Example 8.5.7 Let $K = \mathbb{C}(X)$, the field of fractions of $\mathbb{C}[X]$. Since $X$ is a prime element of $\mathbb{C}[X]$, by the Eisenstein irreducibility criterion $Y^n - X$ is irreducible in $K[Y]$. Let $L$ be the splitting field of this polynomial over $K$. We determine the Galois group, and also all the intermediary fields. Let $\alpha$ be a root of $Y^n - X$, and let $\rho = e^{2\pi i/n}$ be a primitive $n$th root of unity. Then $\rho^i\alpha$, $1 \leq i \leq n$, are all the roots of the polynomial $Y^n - X$. Since $\rho \in \mathbb{C} \subseteq K$, $L = K(\alpha)$ is the splitting field of $Y^n - X$. This shows that $L$ is a Galois extension of $K$, and the Galois group is of order $[L : K] = n$. We show that it is a cyclic group of order $n$. Then there will be $\tau(n)$ (the number of positive divisors of $n$) subgroups of the Galois group, and so also $\tau(n)$ intermediary subfields. We determine them. We have an automorphism $\sigma$ in $G(L/K)$ given by $\sigma(\alpha) = \rho\alpha$. Then $\sigma^i(\alpha) = \rho^i\alpha$. This shows that $\sigma$ is an element of order $n$. Thus, $G(L/K) = \langle\sigma\rangle$. Corresponding to any positive divisor $m$ of $n$, there is a unique subgroup $\langle\sigma^r\rangle$ of order $m$, where $n = mr$. Now $\sigma^r(\alpha^k) = \rho^{rk}\alpha^k$; in particular, $\sigma^r(\alpha^m) = \rho^n\alpha^m = \alpha^m$. This shows that $K(\alpha^m)$ is contained in the fixed field $F(\langle\sigma^r\rangle)$. Since $\alpha^m$ is a root of $Y^r - X$, and $Y^r - X$ is irreducible, it follows that $[K(\alpha^m) : K] = r = [G(L/K) : \langle\sigma^r\rangle] = [F(\langle\sigma^r\rangle) : K]$. This shows that $K(\alpha^m)$ is the fixed field of $\langle\sigma^r\rangle$. This determines all $\tau(n)$ intermediary fields.
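Since $\sigma^j(\alpha^k) = \rho^{jk}\alpha^k$, the correspondence in this example is pure arithmetic modulo $n$: $\sigma^r$ fixes $\alpha^k$ exactly when $n$ divides $rk$, that is, when $m$ divides $k$. The small Python sketch below (names are our own) tabulates, for each divisor $m$ of $n$, the order of $\langle\sigma^r\rangle$ and the exponents $k < n$ for which $\alpha^k$ is fixed. These are $k = 0, m, \ldots, m(r-1)$, and the corresponding powers $1, \alpha^m, \ldots, \alpha^{m(r-1)}$ form a $K$-basis of $K(\alpha^m)$.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def correspondence(n):
    """For each divisor m of n (with r = n // m) return
    (m, r, order of <sigma^r>, exponents k < n with sigma^r(alpha^k) = alpha^k).
    sigma^j multiplies alpha^k by rho^(j*k), so it fixes alpha^k iff n | j*k."""
    rows = []
    for m in divisors(n):
        r = n // m
        subgroup = {(r * j) % n for j in range(n)}  # <sigma^r> as powers of sigma
        fixed = [k for k in range(n) if (r * k) % n == 0]
        rows.append((m, r, len(subgroup), fixed))
    return rows


for m, r, order, fixed in correspondence(12):
    print(f"<sigma^{r}> has order {order}; fixed powers of alpha: {fixed}")
```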

Now, we prove the fundamental theorem of algebra as an application of the fundamental theorem of Galois theory. The fundamental theorem of algebra states that the field $\mathbb{C}$ of complex numbers is algebraically closed. This result was first proved by Gauss. The usual proof, given in a standard complex analysis course, uses the fact that there is no nonconstant bounded function which is analytic throughout the complex plane (Liouville's theorem). We first prove some basic results in the form of lemmas.

Lemma 8.5.8 There is no proper odd degree extension of the field $\mathbb{R}$ of real numbers.

Proof Let $L$ be a field extension of the field $\mathbb{R}$ of reals such that $[L : \mathbb{R}]$ is odd and greater than 1. Since $\mathbb{R}$ is of characteristic 0, $L$ is a separable extension of $\mathbb{R}$, and since every finite separable extension is simple, there is an element $a \in L$ such that $L = \mathbb{R}(a)$. Let $p(X)$ be the minimum polynomial of $a$. Then $p(X)$ is an irreducible polynomial of odd degree greater than 1. It is therefore sufficient to show that no polynomial of odd degree greater than 1 is irreducible over $\mathbb{R}$. Let $f(X)$ be a polynomial of degree $2n + 1$ over $\mathbb{R}$, with positive leading coefficient (after multiplying by $-1$ if necessary). Then
$$\lim_{X \to -\infty} \frac{f(X)}{X^{2n}} = -\infty \quad \text{and} \quad \lim_{X \to \infty} \frac{f(X)}{X^{2n}} = \infty.$$
Thus, there exists $a$ such that $f(a) > 0$ and $f(-a) < 0$. By the intermediate value theorem, there is a $c \in \mathbb{R}$ such that $f(c) = 0$. Then $X - c$ divides $f(X)$, and so $f(X)$ cannot be irreducible. ∎
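The intermediate value argument in this proof is exactly the bisection method. The Python sketch below is our own illustration in floating-point arithmetic. It brackets a root of an odd-degree real polynomial between $\pm B$, where $B = 1 + \max_i |c_i/c_0|$ is Cauchy's bound, which every root satisfies strictly in absolute value. At $\pm B$ the polynomial therefore has the opposite signs of its leading term, and the sketch then halves the bracket repeatedly.

```python
def real_root_odd_degree(coeffs, steps=200):
    """Approximate a real root of an odd-degree real polynomial by bisection.

    coeffs lists the coefficients from the highest degree down, with
    coeffs[0] != 0 and len(coeffs) - 1 odd.
    """
    def f(x):
        value = 0.0
        for c in coeffs:  # Horner's rule
            value = value * x + c
        return value

    # Cauchy's bound: every root satisfies |x| < 1 + max |c_i / c_0|, so f does
    # not vanish at +-bound and, the degree being odd, has opposite signs there.
    bound = 1 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    lo, hi = -bound, bound
    if f(lo) > 0:  # arrange f(lo) < 0 < f(hi)
        lo, hi = hi, lo
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


r = real_root_odd_degree([1, 0, 0, -2])  # X^3 - 2
print(r, r ** 3)
```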

Lemma 8.5.9 There is no Galois extension $L$ of the field $\mathbb{C}$ of complex numbers such that $[L : \mathbb{C}] = 2^n$, $n \geq 1$.

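Proof Suppose the contrary. Let $L$ be a Galois extension of $\mathbb{C}$ such that $|G(L/\mathbb{C})| = 2^n$, where $n \geq 1$. Then $G(L/\mathbb{C})$ has a maximal normal subgroup $H$ of order $2^{n-1}$. Since $H$ is a normal subgroup of index 2, $F(H)$ is an extension of $\mathbb{C}$ of degree 2. Suppose that $F(H) = \mathbb{C}(a)$. Completing the square in the minimum polynomial of $a$, we may assume that $a \notin \mathbb{C}$ and $a^2 \in \mathbb{C}$. This is impossible, for if $a^2 = re^{i\theta}$, then $a = \pm r^{1/2}e^{i\theta/2}$ belongs to $\mathbb{C}$. ∎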

Theorem 8.5.10 (Fundamental Theorem of Algebra). The field $\mathbb{C}$ of complex numbers is algebraically closed.

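Proof Let $K$ be a finite extension of $\mathbb{C}$. We have to show that $K = \mathbb{C}$. Suppose not. Then $K = \mathbb{C}(a_1, a_2, \ldots, a_n) = \mathbb{R}(i, a_1, a_2, \ldots, a_n)$, and $[K : \mathbb{C}] \geq 2$. Let $f_i(X)$ be the minimum polynomial of $a_i$ over $\mathbb{R}$, and let $f(X) = (X^2 + 1)f_1(X)f_2(X)\cdots f_n(X)$. Let $L$ be the splitting field of $f(X)$ over $K$. Then $L$ is also a splitting field of $f(X)$ over $\mathbb{R}$, and so it is a Galois extension of $\mathbb{R}$ containing $K$. Since $[L : \mathbb{R}] = [L : \mathbb{C}][\mathbb{C} : \mathbb{R}] = 2[L : \mathbb{C}]$ is even, $G(L/\mathbb{R})$ is of even order, and so it has a Sylow 2-subgroup $H$ of order $2^m$ (say). Consider the fixed field $F(H)$ of $H$. Then $[F(H) : \mathbb{R}] = [G(L/\mathbb{R}) : H]$ is odd. Since $\mathbb{R}$ has no proper extension of odd degree, we have $F(H) = \mathbb{R}$. This means that $G(L/\mathbb{R}) = H$ is a 2-group. Hence $G(L/\mathbb{C}) \subseteq G(L/\mathbb{R})$ is also of order $2^r$, and $r \geq 1$ because $[L : \mathbb{C}] \geq [K : \mathbb{C}] \geq 2$. Since $L$ is a Galois extension of $\mathbb{C}$, this contradicts the above lemma. ∎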

Exercises

8.5.1 Let $K$ be a field, and $f(X) \in K[X]$. Find the splitting field $L$, the Galois group $G(L/K)$, and all intermediary subfields in each of the following cases.

(i) $K = \mathbb{Q}$, $f(X) = X^4 - 11$.
(ii) $K = \mathbb{Q}$, $f(X) = X^8 - 10$.
(iii) $K = \mathbb{Z}_5$, $f(X) = X^4 - 2$.
(iv) $K = \mathbb{Z}_2$, $f(X) = X^3 + X + 1$.
(v) $K = \mathbb{Q}$, $f(X) = X^5 - 11$.

8.5.2 Find the Galois group of $L = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ over $\mathbb{Q}$. Find all intermediary fields.

8.5.3 Let $L$ be a Galois extension of $K$ with Galois group $\mathbb{Z}_{100}$. Find the number of intermediary subfields and also find them.

8.5.4 Let $L$ be a finite Galois extension of $K$. Show that there is an element $a \in L$ such that $\{\sigma(a) \mid \sigma \in G(L/K)\}$ is a basis of $L$ over $K$ (this result is known as the normal basis theorem).

8.5.5 Let $L$ be a finite Galois extension of $K$. Let $L_1 \subseteq L_2$ be intermediary fields which are Galois extensions of $K$. Show that $G(L_1/K)$ is isomorphic to $G(L_2/K)/G(L_2/L_1)$.

8.5.6 Let $L_1$ and $L_2$ be intermediary fields of a Galois extension $L$ of $K$. Suppose that $L_2$ is a Galois extension of $K$. Show that $L_1L_2$ (the smallest subfield of $L$ containing $L_1$ and $L_2$) is a Galois extension of $L_1$, and $G(L_1L_2/L_1)$ is isomorphic to $G(L_2/L_1 \cap L_2)$.

8.5.7 Let $L$ be a Galois extension of $K$. Let

$$K = K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n = L \qquad (8.5.1)$$

and

$$K = L_1 \subseteq L_2 \subseteq \cdots \subseteq L_m = L \qquad (8.5.2)$$

be two towers of intermediary extensions of the Galois extension $L$ of $K$, where $K_{i+1}$ is a Galois extension of $K_i$, and $L_{j+1}$ is a Galois extension of $L_j$ for all $i$ and $j$. Show that there are refinements

$$K = F_1 \subseteq F_2 \subseteq \cdots \subseteq F_t = L \quad \text{and} \quad K = F'_1 \subseteq F'_2 \subseteq \cdots \subseteq F'_t = L$$

of (8.5.1) and (8.5.2), respectively, such that, after some rearrangement, $G(F_{i+1}/F_i)$ is isomorphic to $G(F'_{i+1}/F'_i)$ for all $i$.

8.5.8 State and prove the analogue of the Jordan–Hölder theorem for towers of intermediary subfields of a Galois extension.

8.5.9 Let L be a finite Galois extension of K with no proper intermediary fields. Show that G(L/K) is a cyclic group of prime order.

8.5.10 Let $L$ be a finite Galois extension of $K$ with $G(L/K)$ simple. Let $F$ be an intermediary field such that $K \neq F \neq L$. Show that there is a $K$-automorphism $\sigma$ of $L$ such that $\sigma(F) \neq F$.

8.5.11 Let L be a Galois extension of degree 15 over a field K. Find the number of intermediary fields.

8.5.12 Let L be a finite Galois extension of K, and H a subgroup of G(L/K). Show that F(N(H)) is the smallest subfield contained in F(H) such that F(H) is a Galois extension of F(N(H)).

8.5.13 Let $L$ be a finite Galois extension of $K$ of degree 35. Let $F$ be any intermediary field. Show that $F$ is a Galois extension of $K$.

8.5.14 Let $L$ be a finite Galois extension of $K$ such that $p^n$ is the largest power of a prime $p$ dividing $[L : K]$. Show that any two intermediary fields $F_1$ and $F_2$ such that $[L : F_1] = [L : F_2] = p^n$ are $K$-isomorphic. If $m$ is the number of intermediary fields $F$ with $[L : F] = p^n$, then show that $m$ divides $[L : K]$, and it is of the form $1 + kp$.

8.5.15 Let $L$ be a finite Galois extension of $K$, and $H$ a subgroup of $G(L/K)$. Let $F$ be the normal closure of $F(H)$. Show that $G(L/F) = \bigcap_{\sigma \in G(L/K)} \sigma H \sigma^{-1}$.

8.5.16 Let $L$ be a finite Galois extension of $K$, and $H$ a subgroup of $G(L/K)$. Let $F(H) = K(a)$, and let $S$ be a left transversal of $G(L/K)$ modulo $H$. Show that
$$\prod_{\sigma \in S} (X - \sigma(a))$$
is the minimum polynomial of $a$. Deduce that
$$\prod_{\sigma \in G(L/K)} (X - \sigma(a)) = p(X)^{n/r},$$
where $n$ is the degree $[L : K]$, $r$ is the index of $H$ in $G(L/K)$, and $p(X)$ is the minimum polynomial of $a$ over $K$. Further, deduce the normal basis theorem (Exercise 8.5.4).

8.5.17 Let $F$ be a field of characteristic $p \neq 0$. Let $L = F(X, Y)$ be the field of fractions of $F[X, Y]$, and $K = F(X^p, Y^p)$. Show that $L$ is a finite extension of $K$ with $[L : K] = p^2$ which is not simple. Show that there are infinitely many intermediary fields.

8.6 Cyclotomic Extensions

Definition 8.6.1 A finite extension $L$ of $K$ is called a cyclotomic extension of $K$ if there is an element $\xi \in L$ such that $L = K(\xi)$ and $\xi^n = 1$ for some $n$. A Galois extension $L$ of $K$ is called an abelian extension if the Galois group $G(L/K)$ is abelian. It is called cyclic if the Galois group is cyclic.

A root of $X^n - 1$ in a field $K$ is called an $n$th root of unity. An element $\alpha \in K$ is called a primitive $n$th root of unity if its order in the multiplicative group $K^{\star}$ is $n$. It follows that any root of unity is a primitive $n$th root of unity for some $n$. Let $K$ be a field, and consider the polynomial $X^n - 1$ in $K[X]$. Suppose first that the characteristic of $K$ is $p \neq 0$, that $p$ divides $n$, and that $n = p^rm$, where $p$ and $m$ are co-prime. Then $X^n - 1 = (X^m - 1)^{p^r}$. Thus, the roots of $X^n - 1$ and those of $X^m - 1$ in any extension field are the same. In other words, the splitting field of $X^n - 1$ and the splitting field of $X^m - 1$ are the same. Let $L$ be the splitting field of $X^n - 1$, and so also of $X^m - 1$. Since $p$ does not divide $m$, all the roots of $X^m - 1$ are distinct. Let $G$ be the group of roots of $X^m - 1$ in $L$. Since it is a finite subgroup of $L^{\star}$ of order $m$, it is a cyclic group of order $m$. A generator $\xi_m$ of $G$ is a primitive $m$th root of unity. Thus, $G = \{\xi_m^r \mid 0 \leq r \leq m - 1\}$. Note that if $\xi_m$ is a primitive $m$th root of unity, then any primitive $m$th root of unity is of the form $\xi_m^r$, where $1 \leq r \leq m - 1$ and $(r, m) = 1$. Clearly, $L = K(\xi_m)$. Any $K$-automorphism $\eta$ of $L$ is uniquely determined by its effect on $\xi_m$, and its restriction to $G$ is an automorphism of $G$. This defines an injective group homomorphism from the Galois group $G(L/K)$ to $\mathrm{Aut}(G)$. From the theory of cyclic groups, $\mathrm{Aut}(G)$ is isomorphic to the group $U_m$ of prime residue classes modulo $m$. Next, if the characteristic of $K$ is 0 (or a prime not dividing $n$), then all roots of $X^n - 1$ are distinct (in $\mathbb{C}$, for example, $e^{2\pi i/n}$ is a primitive $n$th root of unity), and the above argument is valid with $m$ replaced by $n$. We can summarize the above discussion in the following theorem.

Theorem 8.6.2 Let $K$ be a field, and $\xi$ a root of unity. Then $K(\xi)$ is a Galois extension of $K$. Further, if $\xi_m$ is a primitive $m$th root of unity, then $K(\xi_m)$ is an abelian extension of $K$, and $G(K(\xi_m)/K)$ is isomorphic to a subgroup of $U_m$. In particular, $[K(\xi_m) : K]$ divides $\phi(m)$. ∎
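When $K = \mathbb{Z}_p$ with $p \nmid m$, the image of $G(K(\xi_m)/K)$ in $U_m$ can be made explicit. By Corollary 8.4.30 the Galois group is generated by $a \mapsto a^p$, which sends $\xi_m$ to $\xi_m^p$ and so corresponds to the class of $p$ in $U_m$. Hence $[\mathbb{Z}_p(\xi_m) : \mathbb{Z}_p]$ equals the order of $p$ modulo $m$, a divisor of $\phi(m)$. The Python sketch below (function names are our own) computes $U_m$ and this order. The value 3 for $m = 7$, $p = 2$ agrees with Remark 8.6.3, where the primitive 7th roots of unity have minimum polynomials of degree 3 over $\mathbb{Z}_2$.

```python
from math import gcd


def units(m):
    """The group U_m of prime residue classes modulo m."""
    return [a for a in range(m) if gcd(a, m) == 1]


def euler_phi(m):
    return len(units(m))


def multiplicative_order(a, m):
    """Order of a in U_m; assumes gcd(a, m) == 1."""
    k, x = 1, a % m
    while x != 1 % m:
        x = (x * a) % m
        k += 1
    return k


def degree_over_Zp(p, m):
    """[Z_p(xi_m) : Z_p] for a prime p not dividing m: the order of the
    Frobenius a -> a^p, i.e. of the class of p in U_m."""
    if m % p == 0:
        raise ValueError("p must not divide m")
    return multiplicative_order(p, m)


print(units(8), degree_over_Zp(2, 7), degree_over_Zp(3, 8))
```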

In general, the Galois group $G(K(\xi_m)/K)$ need not be all of $U_m$; it depends on the field $K$. For example, $G(\mathbb{Q}(i)(\xi_8)/\mathbb{Q}(i))$ is a subgroup of order 2 of the group $U_8$, which is of order 4. Note that $G(\mathbb{Q}(\xi_8)/\mathbb{Q})$ is isomorphic to $U_8$.

Remark 8.6.3 Over fields of characteristic $p \neq 0$, distinct primitive $n$th roots of unity can have different minimum polynomials. For example, we have the factorization

$$X^7 - 1 = (X - 1)(X^3 + X + 1)(X^3 + X^2 + 1)$$

over $\mathbb{Z}_2$. Clearly, $X^3 + X + 1$ is the minimum polynomial of its three roots in the splitting field, which are all primitive 7th roots of unity, and similarly, $X^3 + X^2 + 1$ is the minimum polynomial of the remaining three primitive 7th roots of unity. We shall see below that all the primitive $n$th roots of unity over $\mathbb{Q}$ have the same minimum polynomial, and we shall describe it.
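The factorization in this remark can be confirmed by multiplying out in $\mathbb{Z}_2[X]$. The Python sketch below (helper names are our own; coefficient lists, lowest degree first) also checks that neither cubic has a root in $\mathbb{Z}_2$, which is what makes them irreducible.

```python
def poly_mul(f, g, p):
    """Product in Z_p[X]; coefficient lists, lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def has_root(f, p):
    """Does f vanish at some element of Z_p?"""
    return any(sum(c * x ** i for i, c in enumerate(f)) % p == 0 for x in range(p))


x_minus_1 = [1, 1]      # X - 1 = X + 1 over Z_2
cubic_a = [1, 1, 0, 1]  # X^3 + X + 1
cubic_b = [1, 0, 1, 1]  # X^3 + X^2 + 1
lhs = poly_mul(poly_mul(x_minus_1, cubic_a, 2), cubic_b, 2)
print(lhs)  # coefficients of X^7 + 1 = X^7 - 1 over Z_2
```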

Definition 8.6.4 Let $\{\rho_1, \rho_2, \ldots, \rho_{\phi(n)}\}$ denote the set of all primitive $n$th roots of unity in the field $\mathbb{C}$ of complex numbers. The polynomial

$$\phi_n(X) = \prod_{i=1}^{\phi(n)} (X - \rho_i)$$
is called the $n$th cyclotomic polynomial.

Thus, the degree of $\phi_n(X)$ is $\phi(n)$. Further, every $n$th root of unity is a primitive $d$th root of unity for a unique positive divisor $d$ of $n$. Hence we have the following.

Proposition 8.6.5 $X^n - 1 = \prod_{d \mid n} \phi_d(X)$.
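Comparing degrees in Proposition 8.6.5 gives $\sum_{d \mid n} \phi(d) = n$. The sketch below is a hedged illustration with our own helper names. It groups the exponents $k$ of the $n$th roots $e^{2\pi i k/n}$ by the order $n/\gcd(k, n)$ of the root. It then checks that exactly $\phi(d)$ of them have order $d$.

```python
from collections import Counter
from math import gcd


def euler_phi(n):
    return sum(1 for r in range(1, n + 1) if gcd(r, n) == 1)


def orders_of_nth_roots(n):
    """Counter d -> number of k in [0, n) such that exp(2*pi*i*k/n) has order d."""
    return Counter(n // gcd(k, n) for k in range(n))


n = 12
counts = orders_of_nth_roots(n)
for d in sorted(counts):
    # each n-th root of unity is a primitive d-th root for exactly one d | n
    assert n % d == 0 and counts[d] == euler_phi(d)
print(sum(euler_phi(d) for d in range(1, n + 1) if n % d == 0))  # 12
```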

Proposition 8.6.6 $\phi_n(X) \in \mathbb{Z}[X]$.

Proof Consider $L = \mathbb{Q}(\rho_1)$, where $\rho_1$ is a primitive $n$th root of unity. Then $L$ is the splitting field of $X^n - 1$ over $\mathbb{Q}$. Any $\mathbb{Q}$-automorphism of $L$ permutes the primitive $n$th roots of unity. Hence the coefficients of $\phi_n(X)$ are fixed by all members of the Galois group $G(L/\mathbb{Q})$. Since $L$ is a Galois extension of $\mathbb{Q}$, it follows that $\phi_n(X)$ is a polynomial in $\mathbb{Q}[X]$ whose leading coefficient is 1.

First, observe the following. Suppose $f(X), g(X) \in \mathbb{Z}[X]$ and $h(X) \in \mathbb{Q}[X]$ are polynomials with leading coefficient 1 such that $f(X) = g(X)h(X)$. Then $h(X) \in \mathbb{Z}[X]$: divide $f(X)$ by the monic polynomial $g(X)$ in $\mathbb{Z}[X]$, and use the uniqueness of the quotient in $\mathbb{Q}[X]$.

Now, we prove, by induction on $n$, that $\phi_n(X) \in \mathbb{Z}[X]$ for each $n$. Clearly, $\phi_1(X) = X - 1 \in \mathbb{Z}[X]$. Assume that the result holds for all $m$ less than $n$. From the previous proposition, we have
$$X^n - 1 = \Big(\prod_{d \mid n,\ d < n} \phi_d(X)\Big)\phi_n(X).$$

The left-hand side is a monic polynomial in $\mathbb{Z}[X]$. By the induction hypothesis, the first factor on the right-hand side is a product of monic polynomials in $\mathbb{Z}[X]$, and so is itself a monic polynomial in $\mathbb{Z}[X]$. From the earlier observation, $\phi_n(X) \in \mathbb{Z}[X]$. □

Example 8.6.7 The above proposition gives an inductive procedure to find the $n$th cyclotomic polynomial. We illustrate it in this example. If $p$ is prime, then all $p$th roots of 1 except 1 are primitive $p$th roots. Thus,

$$\phi_p(X) = \frac{X^p - 1}{X - 1} = 1 + X + X^2 + \cdots + X^{p-1}.$$

Clearly, $\phi_1(X) = X - 1$, $\phi_2(X) = X + 1$, $\phi_3(X) = X^2 + X + 1$. Hence

$$X^6 - 1 = (X - 1)(X + 1)(X^2 + X + 1)\phi_6(X).$$

Thus,
$$\phi_6(X) = \frac{X^6 - 1}{(X^2 - 1)(X^2 + X + 1)} = X^2 - X + 1.$$
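The inductive procedure of Example 8.6.7 is easy to mechanize: divide $X^n - 1$ by the product of the $\phi_d(X)$ over the proper divisors $d$ of $n$. Every divisor is monic with integer coefficients, so the long division stays inside $\mathbb{Z}[X]$, as in the proof of Proposition 8.6.6.

The Python below is a minimal sketch of this recursion, and the function names are hypothetical. Coefficient tuples list the constant term first.

```python
from functools import lru_cache


def poly_mul(f, g):
    """Product in Z[X]; coefficient lists, constant term first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_divmod_monic(f, g):
    """Quotient and remainder of f by a monic g in Z[X] (no denominators arise)."""
    f = list(f)
    q = [0] * (len(f) - len(g) + 1)
    for k in range(len(q) - 1, -1, -1):
        c = f[k + len(g) - 1]          # current leading coefficient; g is monic
        q[k] = c
        for j, b in enumerate(g):
            f[k + j] -= c * b
    return q, f[: len(g) - 1]


@lru_cache(maxsize=None)
def cyclotomic(n):
    """phi_n(X) as a tuple of integers, constant term first."""
    x_n_minus_1 = [-1] + [0] * (n - 1) + [1]
    denominator = [1]
    for d in range(1, n):
        if n % d == 0:
            denominator = poly_mul(denominator, list(cyclotomic(d)))
    q, r = poly_divmod_monic(x_n_minus_1, denominator)
    assert not any(r)                   # Proposition 8.6.5: the division is exact
    return tuple(q)


print(cyclotomic(6))    # (1, -1, 1), i.e. X^2 - X + 1
```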

Theorem 8.6.8 $\phi_n(X)$ is irreducible over $\mathbb{Q}$ for each positive integer $n$.

Proof Suppose the contrary. Then $\phi_n(X)$ is reducible in $\mathbb{Q}[X]$. Since $\phi_n(X)$ is a monic polynomial in $\mathbb{Z}[X]$, it follows (Gauss lemma) that $\phi_n(X)$ is reducible in $\mathbb{Z}[X]$. Let $f(X)$ be a monic irreducible factor of $\phi_n(X)$ in $\mathbb{Z}[X]$, and write $\phi_n(X) = f(X)g(X)$, where $g(X)$ is also a monic polynomial in $\mathbb{Z}[X]$ of positive degree. Let $\rho$ be a root of $f(X)$ in the splitting field. Then $\rho$ is a primitive $n$th root of 1.

We show that $\rho^r$ is a root of $f(X)$ for all $r$ co-prime to $n$. The argument below applies to every root $\rho$ of $f(X)$. The proof is by induction on $r$. If $r = 1$, then there is nothing to prove. Let $r \ge 2$ be co-prime to $n$, and assume that the statement is true for all $t < r$ co-prime to $n$. Let $p$ be a prime dividing $r$, and write $r = ps$. Since $r$ is co-prime to $n$, $p$ does not divide $n$. Thus, $\rho^p$ is also a primitive $n$th root of 1, and so it is a root of $\phi_n(X)$.

We claim that $\rho^p$ is a root of $f(X)$. Suppose not. Then $\rho^p$ is a root of $g(X)$, and so $\rho$ is a root of $g(X^p)$. Since $f(X)$ is the minimum polynomial of $\rho$, it follows that $f(X)$ divides $g(X^p)$ in $\mathbb{Q}[X]$. Since $f(X)$ is monic, $f(X)$ divides $g(X^p)$ in $\mathbb{Z}[X]$. Reducing modulo $p$, and denoting reductions by bars, we see that $\bar{f}(X)$ divides $\bar{g}(X^p) = (\bar{g}(X))^p$ in $\mathbb{Z}_p[X]$. Let $h(X)$ be an irreducible factor of $\bar{f}(X)$; note that $\bar{f}(X)$ is monic of positive degree, so it is neither zero nor a unit. Then $h(X)$ divides $\bar{g}(X)$ in $\mathbb{Z}_p[X]$. This means that $h(X)^2$ divides $\bar{\phi}_n(X) = \bar{f}(X)\bar{g}(X)$. But then $\bar{\phi}_n(X)$ has repeated roots in the splitting field of $X^n - 1$ over $\mathbb{Z}_p$. Since $\bar{\phi}_n(X)$ divides $X^n - 1$ in $\mathbb{Z}_p[X]$, it follows that $X^n - 1$ has repeated roots in its splitting field. This is impossible, for $p$ does not divide $n$. Thus, $\rho^p$ is a root of $f(X)$.

Since $s < r$ and $s$ is co-prime to $n$, the induction hypothesis applies to the root $\rho^p$ of $f(X)$. It shows that $\rho^r = \rho^{ps} = (\rho^p)^s$ is a root of $f(X)$. Thus, all roots of $\phi_n(X)$ are also roots of $f(X)$, and since all the roots are distinct, $f(X) = \phi_n(X)$. This contradicts the assumption that $g(X)$ has positive degree. □

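Corollary 8.6.9 All the primitive $n$th roots of 1 have the same minimum polynomial over $\mathbb{Q}$, namely $\phi_n(X)$. If $\rho$ is a primitive $n$th root of 1, then $\mathbb{Q}(\rho)$ is a Galois extension of $\mathbb{Q}$ with Galois group isomorphic to $U_n$. In particular, $\mathbb{Q}(\rho)$ is an abelian extension, and $[\mathbb{Q}(\rho) : \mathbb{Q}] = \phi(n)$.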

Proof We have already seen that $\mathbb{Q}(\rho)$ is a Galois extension with Galois group isomorphic to a subgroup of $U_n$. From the above result, $\phi_n(X)$ is the minimum polynomial of $\rho$, and so $[\mathbb{Q}(\rho) : \mathbb{Q}] = \phi(n) = |U_n|$. The result follows. □

Now, we shall describe cyclic extensions. Recall that a Galois extension $L$ of $K$ is called a cyclic extension if the Galois group is cyclic. Since the structure of a cyclic group is easy to describe, one can describe the intermediate fields easily. We shall first discuss cyclic extensions $L$ of $K$ of degree $n$, where $K$ contains a primitive $n$th root of 1.

Theorem 8.6.10 Let $K$ be a field which contains a primitive $n$th root $\rho$ of 1. Let $L$ be a Galois extension of $K$ such that $G(L/K)$ is a cyclic group of order $n$ generated by $\sigma$. Then there is a nonzero element $\alpha \in L$ such that $\rho = \sigma(\alpha)\alpha^{-1}$. Further, $L = K(\alpha)$, where $\alpha^n = \beta \in K$, i.e., $L = K(\sqrt[n]{\beta})$ for some $\beta \in K$.

Proof We first show that $\rho \in K$ is an eigenvalue of the $K$-linear map $\sigma : L \to L$, or equivalently, that $\rho$ is a root of the characteristic polynomial of $\sigma$. Since $\sigma$ is of order $n$, it satisfies the polynomial $X^n - 1$. Further, by the Dedekind theorem, $G(L/K) = \{I, \sigma, \sigma^2, \ldots, \sigma^{n-1}\}$ is linearly independent. Hence $\sigma$ cannot be a root of a nonzero polynomial of lower degree. This shows that $X^n - 1$ is the minimum polynomial of $\sigma$. Since the dimension of $L$ over $K$ is $n$, the characteristic polynomial is also $X^n - 1$. Since $\rho$ is a primitive $n$th root of 1, it is a root of the characteristic polynomial of $\sigma$. Hence, $\sigma(\alpha) = \rho\alpha$ for some nonzero $\alpha \in L$, i.e., $\rho = \sigma(\alpha)\alpha^{-1}$.

It follows that $\sigma^r(\alpha) = \rho^r\alpha = \alpha$ if and only if $n$ divides $r$. This means that $G(L/K(\alpha)) = \{I\}$, which shows that $L = K(\alpha)$. Also $\sigma(\alpha^n) = (\sigma(\alpha))^n = \rho^n\alpha^n = \alpha^n$. This shows that $\alpha^n \in F(G(L/K)) = K$. Take $\beta = \alpha^n$. □

The following theorem is, in some sense, a converse of the above theorem.

Theorem 8.6.11 Let $K$ be a field which contains a primitive $n$th root of 1. Let $L$ be a finite extension of $K$, and let $\alpha \in L$ be such that $\alpha^n \in K$. Then $K(\alpha)$ is a cyclic extension of $K$. Further, if $r$ is the order of the Galois group of $K(\alpha)$ over $K$, then $r$ divides $n$ and $\alpha^r \in K$, i.e., $K(\alpha) = K(\sqrt[r]{\beta})$ for some $\beta$ in $K$.

Proof Suppose that $\alpha^n = a \in K$; we may assume that $\alpha \neq 0$, so that $a \neq 0$. Then $\alpha$ is a root of $X^n - a$. Since $K$ has a primitive $n$th root of 1, the characteristic of $K$ does not divide $n$. Hence all the roots of $X^n - a$ in its splitting field are distinct: the derivative $nX^{n-1}$ is nonzero and has no root in common with $X^n - a$. Let $\rho$ be a primitive $n$th root of 1. Then each $\rho^i\alpha$ is a root of $X^n - a$. It follows that $K(\alpha)$ is the splitting field of $X^n - a$, and so it is a Galois extension.

Let $G$ denote the group of $n$th roots of 1 in $K$. Then $G$ is a cyclic group of order $n$. Define a map $\eta$ from $G(K(\alpha)/K)$ to $G$ by $\eta(\sigma) = \sigma(\alpha)\alpha^{-1}$; note that $\sigma(\alpha) = \rho^s\alpha$ for some $s$. It is easy to see that $\eta$ is a homomorphism. It is injective, because a member of $G(K(\alpha)/K)$ is uniquely determined by its effect on $\alpha$. Since a subgroup of a cyclic group is cyclic, it follows that the Galois group of $K(\alpha)$ over $K$ is cyclic and is isomorphic to a subgroup of $G$.

Suppose that the order of the Galois group $G(K(\alpha)/K)$ is $r$. Then $r$ divides $n$. Let $\sigma$ be a generator of $G(K(\alpha)/K)$, so that $\sigma^m = I$ if and only if $r$ divides $m$. Suppose that $\sigma(\alpha) = \rho^s\alpha$. Since $\eta$ is injective, $\rho^s = \eta(\sigma)$ is of order $r$, i.e., $\rho^s$ is a primitive $r$th root of 1. Hence $\sigma(\alpha^r) = \rho^{sr}\alpha^r = \alpha^r$. So $\alpha^r$ is fixed by $G(K(\alpha)/K)$ and therefore lies in $K$. The result follows if we take $\beta = \alpha^r$. □

Theorem 8.6.12 (Artin–Schreier) Let $K$ be a field of characteristic $p \neq 0$. Let $L$ be a cyclic Galois extension of $K$ of degree $p$. Then there are $a \in K$ and $\alpha \in L$ such that $L = K(\alpha)$, where $\alpha^p - \alpha - a = 0$. Conversely, let $a \in K$ be such that there is no $\alpha \in K$ with $\alpha^p - \alpha = a$. Then $X^p - X - a$ is irreducible in $K[X]$, and its splitting field is a cyclic Galois extension of $K$ of degree $p$.

Proof Suppose that $L$ is a cyclic Galois extension of $K$ of degree $p$. Let $\sigma$ be a generator of $G(L/K)$. Then $\sigma^p - I = 0$. Thus, $\sigma$ satisfies the polynomial $X^p - 1$. Since $G(L/K) = \{I, \sigma, \sigma^2, \ldots, \sigma^{p-1}\}$ is linearly independent (Dedekind theorem), $\sigma$ cannot satisfy a nonzero polynomial of lower degree. In other words, $X^p - 1$ is the minimum polynomial of $\sigma$.

Let $T$ denote the $K$-linear transformation $\sigma - I$. Since the characteristic is $p$ and $\sigma$ commutes with $I$, we have $T^p = (\sigma - I)^p = \sigma^p - I = 0$. Thus, $\mathrm{image}\, T^{p-1} \subseteq \ker T$. Moreover, $T^{p-1} \neq 0$, for otherwise $\sigma$ would satisfy a nonzero polynomial of degree $p - 1$. Further,
$$\ker T = \{a \in L \mid 0 = T(a) = \sigma(a) - a\} = \{a \in L \mid \sigma(a) = a\} = F(G(L/K)) = K.$$
Since $\mathrm{image}\, T^{p-1}$ is a nonzero subspace of the one-dimensional $K$-space $\ker T = K$, it follows that $\mathrm{image}\, T^{p-1} = K$.

Let $\beta \in L$ be such that $T^{p-1}(\beta) = 1$, and take $\alpha = T^{p-2}(\beta)$. Then $T(\alpha) = 1$, and so $\sigma(\alpha) = \alpha + 1$. Since $\alpha$ is not fixed by $\sigma$, $\alpha \notin K$. Since $[K(\alpha) : K]$ divides $[L : K] = p$, it follows that $L = K(\alpha)$. Further,
$$\sigma(\alpha^p - \alpha) = \sigma(\alpha)^p - \sigma(\alpha) = (\alpha + 1)^p - (\alpha + 1) = \alpha^p + 1 - \alpha - 1 = \alpha^p - \alpha.$$
This shows that $\alpha^p - \alpha$ belongs to $K$. Put $a = \alpha^p - \alpha$. Then $X^p - X - a$ is a monic polynomial of degree $p = [K(\alpha) : K]$ having $\alpha$ as a root, so it is the minimum polynomial of $\alpha$.

Conversely, let $a \in K$ be such that there is no $\beta \in K$ with $\beta^p - \beta = a$. Let $L$ be the splitting field of $f(X) = X^p - X - a$, and let $\alpha \in L$ be a root of $f(X)$. By the hypothesis, $\alpha \notin K$. Since $i^p = i$ for all $i$ in the prime field of $K$, $f(X)$ has the $p$ distinct roots $\alpha, \alpha + 1, \alpha + 2, \ldots, \alpha + p - 1$. Thus, all the roots of $f(X)$ are in $K(\alpha)$. Hence $L = K(\alpha)$, and $L$ is a Galois extension of $K$, being the splitting field of a polynomial with distinct roots.

Now, we show that $f(X)$ is irreducible. Suppose not. Let $f(X) = p_1(X)p_2(X)\cdots p_m(X)$, $m > 1$, be the factorization of $f(X)$ as a product of monic irreducible factors. Let $\alpha_i$ be a root of $p_i(X)$. Then $\alpha_i$ is also a root of $f(X)$, and $\alpha_i \notin K$. From the earlier argument, $L = K(\alpha_i)$ for each $i$. Thus, $\deg p_i(X) = [L : K]$ for each $i$. This shows that $p = \deg f(X) = [L : K]\,m$. This is a contradiction, for $p$ is prime, $[L : K] > 1$ and $m > 1$. Hence $f(X)$ is irreducible, and so $[L : K] = p$. Therefore the Galois group of $L$ over $K$ is cyclic of prime order $p$. □
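Over $K = \mathbb{F}_p$, every $a \neq 0$ satisfies the hypothesis of the converse, because $\beta^p = \beta$ for all $\beta \in \mathbb{F}_p$. The following small Python check is an illustration under that assumption, and its helper names are made up. For a few primes it confirms two facts. First, $f(X) = X^p - X - a$ has no root in $\mathbb{F}_p$. Second, $f(X + 1) = f(X)$ in $\mathbb{F}_p[X]$, which is why the roots are $\alpha, \alpha + 1, \ldots, \alpha + p - 1$.

```python
from math import comb


def artin_schreier(p, a):
    """X^p - X - a over Z_p as a coefficient list, constant term first."""
    f = [0] * (p + 1)
    f[0] = (-a) % p
    f[1] = (-1) % p
    f[p] = 1
    return f


def shift_by_one(f, p):
    """Coefficients of f(X + 1) over Z_p, by expanding each (X + 1)^k binomially."""
    g = [0] * len(f)
    for k, c in enumerate(f):
        for j in range(k + 1):
            g[j] = (g[j] + c * comb(k, j)) % p
    return g


def roots_mod(f, p):
    return [x for x in range(p)
            if sum(c * pow(x, k, p) for k, c in enumerate(f)) % p == 0]


for p in (2, 3, 5, 7):
    for a in range(1, p):
        f = artin_schreier(p, a)
        assert roots_mod(f, p) == []        # no beta in F_p with beta^p - beta = a
        assert shift_by_one(f, p) == f      # alpha a root implies alpha + 1 a root
```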

Finally, in this section, we prove an important and useful result known as Hilbert's Theorem 90. First, we need some definitions. Let $L$ be a finite field extension of $K$, and let $a \in L$. The map $L_a$ from $L$ to $L$ defined by $L_a(x) = ax$ is a $K$-linear endomorphism of the vector space $L$ over $K$.

Definition 8.6.13 Let $L$ be a finite field extension of $K$. We define two functions $N_{L/K}$ and $T_{L/K}$ from $L$ to $K$, the norm and the trace functions, by $N_{L/K}(a) = \det L_a$ and $T_{L/K}(a) = \mathrm{Trace}\, L_a$.

It follows easily that $L_{ab} = L_aL_b$, $L_{a+b} = L_a + L_b$ and $L_{\alpha a} = \alpha L_a$ for all $a, b \in L$ and $\alpha \in K$. The determinant of a product of two linear transformations is the product of their determinants, and the trace is a linear functional. Hence $N_{L/K}(ab) = N_{L/K}(a)N_{L/K}(b)$, and $T_{L/K}$ is a linear functional on $L$. Also $L_{a^{-1}} = (L_a)^{-1}$ for $a \neq 0$, and so $N_{L/K}$ is a group homomorphism from $L^*$ to $K^*$.

Proposition 8.6.14 Let $L$ be a finite extension of $K$ of degree $n$, and $a \in L$. Let $p(X)$ be the minimum polynomial of $a$ over $K$. Then $p(X)$ is also the minimum polynomial of the linear transformation $L_a$. Further, if $\deg p(X) = m$, then the characteristic polynomial of $L_a$ is $p(X)^{n/m}$.

Proof The map $\eta$ from $L$ to $\mathrm{End}_K(L)$ defined by $\eta(a) = L_a$ is easily seen to be an injective algebra homomorphism; observe that both sides are algebras over $K$. Hence $q(L_a) = L_{q(a)}$ for every $q(X) \in K[X]$, and $q(L_a) = 0$ if and only if $q(a) = 0$. This shows that the minimum polynomial of $a$ over $K$ is the same as the minimum polynomial of $L_a$.

Let $\chi(X)$ be the characteristic polynomial of $L_a$. By elementary linear algebra (the Cayley–Hamilton theorem), the minimum polynomial $p(X)$ of $L_a$ divides $\chi(X)$, and the two have the same irreducible factors. Since $p(X)$ is irreducible, this shows that $\chi(X) = p(X)^r$ for some $r$. The degree of the characteristic polynomial is the dimension $n$ of the vector space $L$ over $K$. Comparing degrees gives $rm = n$, i.e., $\chi(X) = p(X)^{n/m}$. □
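For a quadratic field $L = \mathbb{Q}(\sqrt{d})$, with $d$ a non-square integer, the definitions can be made completely explicit. In the basis $\{1, \sqrt{d}\}$, multiplication by $a = x + y\sqrt{d}$ has matrix $\begin{pmatrix} x & yd \\ y & x \end{pmatrix}$. So $N_{L/\mathbb{Q}}(a) = x^2 - dy^2$ and $T_{L/\mathbb{Q}}(a) = 2x$.

The sketch below computes these with exact rational arithmetic, and the names are illustrative. For $y = 0$ the characteristic polynomial is $(X - x)^2 = p(X)^{n/m}$ with $n/m = 2$, as Proposition 8.6.14 predicts.

```python
from fractions import Fraction as Fr


def mult_matrix(x, y, d):
    """Matrix of L_a (z -> a*z) on Q(sqrt d) in the basis (1, sqrt d), for a = x + y*sqrt(d)."""
    # L_a(1) = x + y*sqrt(d) and L_a(sqrt d) = y*d + x*sqrt(d) give the two columns.
    return [[x, y * d],
            [y, x]]


def norm_and_trace(x, y, d):
    (m00, m01), (m10, m11) = mult_matrix(x, y, d)
    return m00 * m11 - m01 * m10, m00 + m11      # (det L_a, trace L_a)


def char_poly(x, y, d):
    """det(X*I - L_a) = X^2 - T*X + N, returned as (N, -T, 1), constant term first."""
    n, t = norm_and_trace(x, y, d)
    return (n, -t, 1)


print(norm_and_trace(Fr(3), Fr(2), 2))   # norm 1 and trace 6 for 3 + 2*sqrt(2)
print(char_poly(Fr(5), Fr(0), 2))        # (X - 5)^2, since a = 5 has minimum polynomial X - 5
```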

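Theorem 8.6.15 Let $L$ be a finite Galois extension of $K$ of degree $n$, and let $a \in L$. Then the characteristic polynomial $\chi(X)$ of $L_a$ is given by
$$\chi(X) = \prod_{\sigma \in G(L/K)} (X - \sigma(a)).$$

Proof Let us denote the polynomial $\prod_{\sigma \in G(L/K)} (X - \sigma(a))$ by $\psi(X)$. Every $\tau \in G(L/K)$ permutes the factors of $\psi(X)$. So the coefficients of $\psi(X)$ are fixed by $G(L/K)$, and $\psi(X) \in K[X]$. Since $a$ is a root of $\psi(X)$, it follows that the minimum polynomial $p(X)$ of $a$ divides $\psi(X)$. Also, given any $\sigma \in G(L/K)$, $p^\sigma(X) = p(X)$, and so $\sigma(a)$ is a root of $p(X)$ for all $\sigma \in G(L/K)$. In other words, each $\sigma(a)$ has the same minimum polynomial $p(X)$. Hence every monic irreducible factor of $\psi(X)$ in $K[X]$ has some $\sigma(a)$ as a root, and so equals $p(X)$. Comparing the degrees, we obtain $\psi(X) = p(X)^{n/m}$, where $m = \deg p(X)$. From the previous proposition, it follows that $\chi(X) = \psi(X)$. □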

Corollary 8.6.16 Let $L$ be a finite Galois extension of $K$, and $a \in L$. Then
$$N_{L/K}(a) = \prod_{\sigma \in G(L/K)} \sigma(a) \quad \text{and} \quad T_{L/K}(a) = \sum_{\sigma \in G(L/K)} \sigma(a).$$

Proof The determinant of a linear transformation is the product of its characteristic roots, and the trace is their sum, counted with multiplicity. The result therefore follows from the above theorem. □
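Corollary 8.6.16 can be tested numerically on a cyclotomic field. Take $L = \mathbb{Q}(\zeta)$ with $\zeta = e^{2\pi i/n}$. Corollary 8.6.9 identifies $G(L/\mathbb{Q})$ with the maps $\sigma_k : \zeta \mapsto \zeta^k$ for $k \in U_n$. For an element written as a polynomial $g(\zeta)$ with rational coefficients, $\sigma_k(g(\zeta)) = g(\zeta^k)$.

The floating-point sketch below forms the products and sums of the conjugates, and the helper names are hypothetical. For $a = 1 - \zeta$ the norm is $\prod_{k \in U_n}(1 - \zeta^k) = \phi_n(1)$.

```python
import cmath
from math import gcd, prod


def conjugates(g, n):
    """[sigma_k(g(zeta)) for k in U_n], where zeta = exp(2*pi*i/n) and sigma_k: zeta -> zeta^k.

    Assumes g is a polynomial with rational coefficients, so sigma_k(g(zeta)) = g(zeta^k)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [g(zeta ** k) for k in range(1, n + 1) if gcd(k, n) == 1]


def norm(g, n):
    return prod(conjugates(g, n))


def trace(g, n):
    return sum(conjugates(g, n))


print(round(norm(lambda z: 1 - z, 5).real, 9))   # phi_5(1) = 5
print(round(trace(lambda z: z, 5).real, 9))      # sum of the primitive 5th roots = -1
```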

Definition 8.6.17 Let $L$ be a Galois extension of $K$. A map $f$ from $G(L/K)$ to $L^*$ is called a 1-cocycle of $G(L/K)$ in $L$ if $f$ satisfies the following Emmy Noether equation:
$$f(\sigma\tau) = f(\sigma)\,\sigma(f(\tau)) \quad \text{for all } \sigma, \tau \in G(L/K).$$

A 1-cocycle is also called a crossed homomorphism.

Proposition 8.6.18 Let $L$ be a finite Galois extension of $K$, and $f$ be a 1-cocycle of $G(L/K)$ in $L$. Then there is an element $a \in L^*$ such that $f(\sigma) = \sigma(a)a^{-1}$ for all $\sigma \in G(L/K)$.

Proof We have $f(\sigma) \neq 0$ for all $\sigma \in G(L/K)$, and $G(L/K)$ is linearly independent by the Dedekind theorem. Hence $\sum_{\sigma \in G(L/K)} f(\sigma)\sigma \neq 0$, and there is an element $b \in L$ such that $\sum_{\sigma \in G(L/K)} f(\sigma)\sigma(b) \neq 0$. Let $a = \big(\sum_{\sigma \in G(L/K)} f(\sigma)\sigma(b)\big)^{-1}$. Since $f$ is a 1-cocycle, for any $\tau \in G(L/K)$ we have

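$$f(\tau)(\tau(a))^{-1} = f(\tau)\tau(a^{-1}) = f(\tau)\tau\Big(\sum_{\sigma} f(\sigma)\sigma(b)\Big) = \sum_{\sigma} f(\tau)\tau(f(\sigma))(\tau\sigma)(b) = \sum_{\sigma} f(\tau\sigma)(\tau\sigma)(b) = a^{-1}.$$
Here the sums run over $\sigma \in G(L/K)$. The last equality holds because $\tau\sigma$ runs over $G(L/K)$ as $\sigma$ does.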

This shows that $f(\tau) = \tau(a)a^{-1}$ for all $\tau \in G(L/K)$. □

Theorem 8.6.19 (Hilbert's Theorem 90) Let $L$ be a cyclic Galois extension of $K$ of degree $n$, and let $\sigma$ be a generator of $G(L/K)$. Then the kernel of the homomorphism $N_{L/K}$ from $L^*$ to $K^*$ is $\{\sigma(a)a^{-1} \mid a \in L^*\}$.
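As a concrete instance of Theorem 8.6.19, take the cyclic extension $\mathbb{Q}(i)/\mathbb{Q}$ of degree 2. Here $\sigma$ is complex conjugation and $N(x + yi) = x^2 + y^2$. For $u$ of norm 1 the cocycle is $f(I) = 1$, $f(\sigma) = u$. The proof of Proposition 8.6.18 then produces $a^{-1} = b + u\,\sigma(b)$ for any $b$ making this nonzero.

The Python below is a minimal sketch of that recipe with exact fractions. The pair representation and the function names are our own choices. It tries $b = 1$ and then $b = i$.

```python
from fractions import Fraction as Fr

# Elements x + y*i of Q(i) are stored as pairs (x, y) of Fractions.


def mul(z, w):
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def sigma(z):
    """Complex conjugation, the generator of G(Q(i)/Q)."""
    return (z[0], -z[1])


def norm(z):
    """N(z) = z * sigma(z) = x^2 + y^2."""
    return z[0] ** 2 + z[1] ** 2


def inv(z):
    n = norm(z)
    return (z[0] / n, -z[1] / n)


def hilbert90_witness(u):
    """For u with N(u) = 1, return a with sigma(a) * a^(-1) = u, following the proof of
    Proposition 8.6.18 with the cocycle f(I) = 1, f(sigma) = u: a^(-1) = b + u*sigma(b)."""
    assert norm(u) == 1
    for b in [(Fr(1), Fr(0)), (Fr(0), Fr(1))]:          # try b = 1, then b = i
        t = mul(u, sigma(b))
        s = (b[0] + t[0], b[1] + t[1])
        if s != (0, 0):
            return inv(s)
    raise ValueError("no suitable b found")  # not reached: b = 1 fails only for u = -1


u = (Fr(3, 5), Fr(4, 5))
a = hilbert90_witness(u)
print(a, mul(sigma(a), inv(a)) == u)
```

For $u = (3 + 4i)/5$ this gives $a = \frac12 - \frac14 i$, and indeed $\sigma(a)a^{-1} = u$.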

Proof It follows from Corollary 8.6.16 that $N_{L/K}(a) = N_{L/K}(\sigma(a))$ for all $a \in L$ and $\sigma \in G(L/K)$. Hence $\sigma(a)a^{-1}$ is in the kernel of $N_{L/K}$ for all $a \in L^*$ and $\sigma \in G(L/K)$.

Conversely, suppose that $u$ belongs to the kernel of $N_{L/K}$, i.e., $N_{L/K}(u) = 1$. Define a map $f$ from $G(L/K)$ to $L^*$ by $f(I_L) = 1$, and for $1 \le i \le n - 1$,
$$f(\sigma^i) = u\sigma(u)\sigma^2(u)\cdots\sigma^{i-1}(u).$$
We show that $f$ is a 1-cocycle of $G(L/K)$ in $L$. Suppose that $0 \le i, j \le n - 1$. There are two cases: (i) $i + j \le n - 1$, and (ii) $i + j \ge n$. In case (i),

$$f(\sigma^i\sigma^j) = f(\sigma^{i+j}) = u\sigma(u)\sigma^2(u)\cdots\sigma^{i+j-1}(u) = f(\sigma^i)\,\sigma^i(f(\sigma^j)).$$

Now, consider the case (ii). In this case

$$f(\sigma^i\sigma^j) = f(\sigma^{i+j-n}) = u\sigma(u)\sigma^2(u)\cdots\sigma^{i+j-n-1}(u).$$

Also

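$$\begin{aligned} f(\sigma^i)\,\sigma^i(f(\sigma^j)) &= u\sigma(u)\cdots\sigma^{i-1}(u)\,\sigma^i\big(u\sigma(u)\cdots\sigma^{j-1}(u)\big) \\ &= u\sigma(u)\cdots\sigma^{i+j-n-1}(u)\,\sigma^{i+j-n}\big(u\sigma(u)\cdots\sigma^{n-1}(u)\big) \\ &= f(\sigma^i\sigma^j)\,\sigma^{i+j-n}(N_{L/K}(u)) = f(\sigma^i\sigma^j). \end{aligned}$$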

This shows that $f$ is a 1-cocycle. From the earlier proposition, there is an $a \in L^*$ such that $f(\sigma^i) = \sigma^i(a)a^{-1}$ for all $i$. In particular, $u = f(\sigma) = \sigma(a)a^{-1}$. □

Exercises

8.6.1 Find a primitive 16th root of 1, and also its minimum polynomial over $\mathbb{Q}$. What is the degree of the corresponding cyclotomic extension over $\mathbb{Q}$? Find also the intermediate subfields.

8.6.2 Let $\mathbb{Q}_n$ denote the splitting field of $X^n - 1$ over $\mathbb{Q}$, and $\mathbb{Q}_m$ the splitting field of $X^m - 1$. Show that $\mathbb{Q}_n \cap \mathbb{Q}_m = \mathbb{Q}_d$, where $d$ is the g.c.d. of $n$ and $m$.

8.6.3 Show that the $n$th cyclotomic polynomial $\phi_n(X)$ over $\mathbb{Q}$ is also given by
$$\phi_n(X) = \prod_{d \mid n} \big(X^{n/d} - 1\big)^{\mu(d)},$$
where $\mu$ is the Möbius function. Hint: Use the Möbius inversion formula and Proposition 8.6.5.

8.6.4 Find all subfields of $\mathbb{Q}_{20}$.

8.6.5 Show that $\cos\frac{2\pi}{n}$ and $\sin\frac{2\pi}{n}$ are both algebraic numbers.

8.6.6 Is $\mathbb{Q}(\cos\frac{2\pi}{n})$ Galois over $\mathbb{Q}$? If yes, what is the Galois group?

8.6.7 Let $L$ be a Galois extension of $K$, and $F$ be an intermediate field such that $F$ is a Galois extension of $K$. Show that $N_{L/K} = N_{F/K} \circ N_{L/F}$ and $T_{L/K} = T_{F/K} \circ T_{L/F}$.

8.6.8 Use Hilbert's Theorem 90 to prove Theorem 8.6.10.

8.6.9 Let $L$ be a cyclic Galois extension of $K$, with $G(L/K)$ generated by $\sigma$. A map $f$ from $G(L/K)$ to $L$ is called an additive 1-cocycle of $G(L/K)$ in $L$ if $f(\tau\tau') = f(\tau) + \tau(f(\tau'))$ for all $\tau, \tau' \in G(L/K)$. Use the methods of Proposition 8.6.18 and Theorem 8.6.12 to prove that the null space of $T_{L/K}$ is $\{\sigma(a) - a \mid a \in L\}$. This is known as the additive Hilbert Theorem 90.

8.7 Geometric Constructions

We shall be interested in the problem of constructions using straightedge and compass. First, we try to understand the meaning of geometric constructions. We start with two points $O$ and $P$ in a plane and take the length of the segment $OP$ as the unit. We extend the line $OP$ indefinitely and take it as the $X$-axis, with $O$ as the origin and $P$ as the marked point $(1, 0)$.

We construct points, lines, and circles inductively. At each step of the construction, we draw a line through two already constructed points, or draw a circle whose center is a constructed point and which passes through another constructed point. We then take the intersections of the newly constructed line or circle with the already existing lines and circles to construct new points.

We can construct the line through $O$ perpendicular to the line $OP$ to draw the $Y$-axis. By drawing the circle with center $O$ and radius $OP$, we get a point $Q$ on the $Y$-axis whose coordinates are $(0, 1)$. Then we draw the perpendicular to $OQ$ at $Q$ and the perpendicular to $OP$ at $P$; their intersection is the point $(1, 1)$. This is how we proceed and do the constructions. A real number $a$ is said to be constructible if we can construct two points which are at a distance $|a|$ apart.

We recall some standard school-level geometric constructions by ruler and compass.
(i). We can draw the line perpendicular to a given line at any point on that line.
(ii). We can draw the perpendicular to a line from any point outside the line.
(iii). We can draw the perpendicular bisector of any line segment.

(iv). Given any line segment, we can draw an equilateral triangle with the given segment as a base. In particular, we can construct a $60^\circ$ angle.
(v). Given a quadrilateral, we can construct a triangle which has the same area as the given quadrilateral.
(vi). Given segments of lengths $a$ and $b$, we can construct segments of lengths $a + b$, $a - b$ (if $a > b$), $ab$, and $a/b$. In particular, given a unit segment, we can construct a segment of length $r$ for every positive rational number $r$.
(vii). Given any segment of length $a$, we can construct a segment of length $\sqrt{a}$. In particular, given segments of lengths $|a|$ and $|b|$, we can construct segments of lengths $|\alpha|$ and $|\beta|$, where $\alpha$ and $\beta$ are the real solutions (when they exist) of the equation $X^2 + aX + b = 0$.

If a point $(a, b)$ is constructible, then by drawing the perpendiculars from that point to the $X$-axis and to the $Y$-axis, we see that $(a, 0)$ and $(0, b)$ are also constructible points. Conversely, suppose $(a, 0)$ and $(0, b)$ are constructible points. Draw the perpendicular to the $X$-axis through $(a, 0)$ and the perpendicular to the $Y$-axis through $(0, b)$; their intersection is the point $(a, b)$.

Next, suppose that $U$ and $V$ are constructible points, and the length of the segment $UV$ is $l$. Then we can construct the point $(l, 0)$ (and also $(0, l)$) as follows. Draw the parallelogram with sides $OU$ and $UV$, and let $W$ be its fourth vertex. Then the length of the segment $OW$ is $l$. Draw the circle with center $O$ and radius $OW$. A point of intersection of this circle with the $X$-axis is $(l, 0)$. The above discussion leads to the following proposition.

Proposition 8.7.1 Let $\mathcal{L}$ denote the set of all constructible numbers. Then $\mathcal{L} \times \mathcal{L}$ is the set of all constructible points. Further, $\mathcal{L}$ is a subfield of $\mathbb{R}$, and every positive member of $\mathcal{L}$ has a square root in $\mathcal{L}$. In other words, $\mathcal{L}$ has no proper quadratic extension inside $\mathbb{R}$. □
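Construction (vii) is the one used most often below. The text does not spell it out, so we assume a common method here. Lay off segments of lengths 1 and $a$ end to end along the $X$-axis, from $(-1, 0)$ to $(a, 0)$. Draw the circle on this segment as diameter and intersect it with the $Y$-axis. The circle has center $((a-1)/2, 0)$ and radius $(a+1)/2$, so the intersection has height $\sqrt{((a+1)^2 - (a-1)^2)/4} = \sqrt{a}$.

The floating-point sketch below simply evaluates this intersection, and its function name is hypothetical.

```python
import math


def sqrt_by_construction(a):
    """Height at which the circle on the diameter from (-1, 0) to (a, 0) meets the Y-axis."""
    center_x = (a - 1) / 2          # midpoint of (-1, 0) and (a, 0)
    radius = (a + 1) / 2
    # Points of the circle (X - center_x)^2 + Y^2 = radius^2 on the line X = 0:
    return math.sqrt(radius ** 2 - center_x ** 2)


for a in (2, 3, 9, 0.25):
    print(a, sqrt_by_construction(a), math.sqrt(a))
```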

Proposition 8.7.2 Let $a$ be a positive real number such that there is a tower $\mathbb{Q} = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n$ of extensions with $K_n \subseteq \mathbb{R}$, $[K_{i+1} : K_i] \le 2$, and $a \in K_n$. Then $a$ is constructible.

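Proof The proof is by induction on $n$. If $n = 0$, then $a \in \mathbb{Q}$, and since the set of all constructible numbers is a subfield, $a$ is constructible. Assume the result for $n$. Let $a \in K_{n+1}$, where $\mathbb{Q} = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_n \subseteq K_{n+1} \subseteq \mathbb{R}$ is a tower of extensions such that $[K_{i+1} : K_i] \le 2$. By the induction hypothesis, each member of $K_n$ is constructible, since a real number is constructible if and only if its absolute value is.

Since $[K_{n+1} : K_n] \le 2$, either $K_{n+1} = K_n$ or $[K_{n+1} : K_n] = 2$. If $a \in K_n$, then there is nothing to do. Suppose that $a \notin K_n$. Then $[K_{n+1} : K_n] = 2$. Completing the square in the minimum polynomial of $a$ over $K_n$ shows that $K_{n+1} = K_n(\sqrt{b})$ for some $b \in K_n$, where $b > 0$ because $K_{n+1} \subseteq \mathbb{R}$. Thus, $a = c + d\sqrt{b}$ for some $b, c, d \in K_n$. By the induction hypothesis, $b$, $c$ and $d$ are constructible. Hence, from our basic constructions, $\sqrt{b}$, and therefore $a$, is also constructible. □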

Now, we aim to prove the converse of the above proposition. First, we need some definitions.

Definition 8.7.3 Let $K$ be a subfield of $\mathbb{R}$. An element of $K \times K$ will be called a point in the plane of $K$. A line passing through two points of the plane of $K$ is called a line in the plane of $K$. A circle with center in the plane of $K$, and radius a positive member of $K$, is called a circle in the plane of $K$.

Proposition 8.7.4 A line $aX + bY = c$ in $\mathbb{R}^2$ is a line in the plane of $K$ if and only if there is a nonzero $\lambda \in \mathbb{R}$ such that $\lambda a, \lambda b, \lambda c \in K$.

Proof Suppose that aX + bY = c is a line in the plane of K. Then there are distinct points (α1, β1) and (α2, β2) in the plane of K which lie on the line. Thus,

aα1 + bβ1 = c = aα2 + bβ2.

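Then $a(\alpha_1 - \alpha_2) = b(\beta_2 - \beta_1)$. Since the points are distinct, $\alpha_1 \neq \alpha_2$ or $\beta_1 \neq \beta_2$. Suppose that $\alpha_1 \neq \alpha_2$. Then
$$a = b\,\frac{\beta_2 - \beta_1}{\alpha_1 - \alpha_2} \quad \text{and} \quad c = a\alpha_2 + b\beta_2 = b\Big(\frac{\beta_2 - \beta_1}{\alpha_1 - \alpha_2}\,\alpha_2 + \beta_2\Big).$$
Here $b \neq 0$, for otherwise $a = 0 = c$, which is not possible. Now, it is clear that $\lambda = b^{-1}$ will serve the purpose. If $\alpha_1 = \alpha_2$, then $\beta_1 \neq \beta_2$. In that case $b = 0$, $a \neq 0$ and $c = a\alpha_1$, and $\lambda = a^{-1}$ serves the purpose.

For the converse, we may assume that $a, b, c \in K$. Suppose that $a \neq 0 \neq b$. Then the line passes through the points $(ca^{-1}, 0)$ and $(0, cb^{-1})$ in the plane of $K$; if $c = 0$ these coincide, and we use $(0, 0)$ and $(b, -a)$ instead. Suppose that $a = 0 \neq b$. Then the line passes through $(0, cb^{-1})$ and $(1, cb^{-1})$, which are in the plane of $K$. Similarly, if $a \neq 0 = b$, then the line passes through the points $(ca^{-1}, 0)$ and $(ca^{-1}, 1)$ in the plane of $K$. □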

Proposition 8.7.5 Suppose that $X^2 + Y^2 + 2uX + 2vY + w = 0$ is a circle in the plane of $K$. Then $u, v, w \in K$. Conversely, suppose that $u, v, w \in K$ and $u^2 + v^2 - w > 0$. Then the equation represents a circle with center in the plane of $K$ and radius in $K(\sqrt{u^2 + v^2 - w})$.

Proof If the given circle is in the plane of $K$, then the center $(-u, -v)$ is a point in the plane of $K$, and the radius $\sqrt{u^2 + v^2 - w}$ belongs to $K$. This means that $u$, $v$ and $u^2 + v^2 - w$ belong to $K$, and so $u, v, w \in K$. Conversely, suppose that $u, v, w \in K$. Then the center $(-u, -v)$ is in the plane of $K$, and the radius is $\sqrt{u^2 + v^2 - w}$. □

Proposition 8.7.6 If two distinct lines in the plane of $K$ intersect, then they intersect in a point of the plane of $K$. □

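Proof Let $aX + bY = c$ and $a'X + b'Y = c'$ be two lines in the plane of $K$. By Proposition 8.7.4, we may assume that all the coefficients are in $K$. These lines are distinct and intersect, so they are not parallel, and $ab' - a'b \neq 0$. Hence the simultaneous equations have the unique solution $\big((cb' - c'b)/(ab' - a'b),\ (ac' - a'c)/(ab' - a'b)\big)$, whose coordinates lie in $K$. Thus, these two lines intersect in the plane of $K$. □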

Proposition 8.7.7 The intersection of a line in the plane of $K$ and a circle in the plane of $K$ is one of the following: the empty set; a point in the plane of $K$; or two points in the plane of $K(\sqrt{d})$ for some positive $d$ in $K$.

Proof Let aX + bY = c be a line in the plane of K, and

$$X^2 + Y^2 + uX + vY + w = 0$$
a circle in the plane of $K$. We may assume that all the coefficients are in $K$ and that $b \neq 0$; otherwise, interchange the roles of $X$ and $Y$. Substituting $Y = (-aX + c)/b$ in the equation of the circle, we obtain a quadratic equation in $X$ with coefficients in $K$. Let $d$ be its discriminant. There are three cases.
(i). If $d < 0$, the solutions are non-real, and the line and the circle do not intersect.
(ii). If $d = 0$, they intersect at one point in the plane of $K$.
(iii). If $d > 0$, they intersect at two points in the plane of $K(\sqrt{d})$; the $Y$-coordinates $(-aX + c)/b$ also lie in $K(\sqrt{d})$. □

Proposition 8.7.8 Given two distinct circles in the plane of $K$, one and only one of the following holds: (i). They do not intersect. (ii). They touch each other at a point in the plane of $K$. (iii). They intersect at two points in the plane of $K(\sqrt{d})$ for some positive $d \in K$.

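Proof Let $X^2 + Y^2 + 2uX + 2vY + w = 0$ and $X^2 + Y^2 + 2u'X + 2v'Y + w' = 0$ be two distinct circles in the plane of $K$. Then $u, v, w, u', v', w'$ belong to $K$. If $u = u'$ and $v = v'$, the circles are concentric and do not intersect. Otherwise, subtracting the two equations shows that the intersection of these circles is the same as the intersection of the line
$$2(u - u')X + 2(v - v')Y + w - w' = 0$$
with either of the two given circles. This line is in the plane of $K$, and the result now follows from the previous proposition. □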
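Propositions 8.7.7 and 8.7.8 can be turned into exact computations. The sketch below is an illustration with hypothetical function names. It takes rational inputs and circles in the form $X^2 + Y^2 + uX + vY + w = 0$ used in the proof of Proposition 8.7.7. It returns the discriminant $D$ of the quadratic it solves, and each intersection point as $(p_x + q_x\sqrt{D},\ p_y + q_y\sqrt{D})$ with rational $p$'s and $q$'s. This shows directly that the points lie in the plane of $\mathbb{Q}(\sqrt{D})$.

Two circles are handled as in the proof of Proposition 8.7.8: one of them is intersected with the line obtained by subtracting the two equations.

```python
from fractions import Fraction as Fr


def line_circle(a, b, c, u, v, w):
    """Intersect aX + bY = c with X^2 + Y^2 + uX + vY + w = 0 (rational coefficients).

    Returns (D, points); each point is ((px, qx), (py, qy)), meaning
    X = px + qx*sqrt(D) and Y = py + qy*sqrt(D). points is empty when D < 0."""
    if b == 0:  # vertical line: interchange the roles of X and Y, then swap back
        D, pts = line_circle(b, a, c, v, u, w)
        return D, [(pt[1], pt[0]) for pt in pts]
    # Substituting Y = (c - aX)/b and multiplying by b^2 gives A X^2 + B X + C = 0.
    A = a * a + b * b
    B = -2 * a * c + u * b * b - v * a * b
    C = c * c + v * b * c + w * b * b
    D = B * B - 4 * A * C
    if D < 0:
        return D, []
    points = []
    for s in ([1, -1] if D > 0 else [1]):
        px = Fr(-B) / (2 * A)
        qx = Fr(s) / (2 * A) if D > 0 else Fr(0)
        # Y = (c - aX)/b = (c - a*px)/b - (a*qx/b)*sqrt(D)
        points.append(((px, qx), ((c - a * px) / b, -a * qx / b)))
    return D, points


def circle_circle(c1, c2):
    """Intersect two non-concentric circles given as (u, v, w)."""
    (u1, v1, w1), (u2, v2, w2) = c1, c2
    # Subtracting the equations gives the line (u1 - u2)X + (v1 - v2)Y = w2 - w1.
    return line_circle(u1 - u2, v1 - v2, w2 - w1, u1, v1, w1)


print(line_circle(0, 1, Fr(1, 2), 0, 0, -1))   # unit circle and the line Y = 1/2
print(circle_circle((0, 0, -1), (-2, 0, 0)))   # unit circle and the circle of radius 1 about (1, 0)
```

In the second example $D = 48$, and the points $(\frac12, \pm\frac18\sqrt{48}) = (\frac12, \pm\frac{\sqrt{3}}{2})$ lie in the plane of $\mathbb{Q}(\sqrt{3})$.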

Theorem 8.7.9 A real number $a$ is constructible if and only if there is a tower
$$\mathbb{Q} = K_0 \subseteq K_1 \subseteq K_2 \subseteq \cdots \subseteq K_n \subseteq \mathbb{R}$$
of field extensions such that $a \in K_n$ and $[K_{i+1} : K_i] \le 2$.

Proof If $a$ lies in such a tower, then it is constructible, by Proposition 8.7.2 applied to $|a|$ when $a \neq 0$. We prove the converse. Suppose that $a$ is constructible. Then the point $(a, 0)$ can be constructed in a finite number of steps, starting from the points in the plane of $\mathbb{Q}$, by taking intersections of constructible lines and circles. By Propositions 8.7.6, 8.7.7 and 8.7.8, the points obtained in the first step lie in the plane of $\mathbb{Q}$ or in the plane of $K_1 = \mathbb{Q}(\sqrt{a_1})$ for some positive $a_1 \in \mathbb{Q}$. In the second step, the points lie in the plane of $K_1$ or in the plane of $K_2 = K_1(\sqrt{a_2})$ for some positive $a_2 \in K_1$. Proceeding inductively, we arrive at the result. □

Corollary 8.7.10 A real number $a$ is constructible only if $[\mathbb{Q}(a) : \mathbb{Q}]$ is a power of 2.

Proof Suppose that $a$ is constructible. From the above theorem, $a \in K_n$, where $[K_n : \mathbb{Q}] = 2^m$ for some $m$. Since $[\mathbb{Q}(a) : \mathbb{Q}]$ divides $[K_n : \mathbb{Q}]$, the result follows. □

We are now ready to answer the classical problems on geometric constructions.

Proposition 8.7.11 An angle $\theta$ is constructible if and only if $\cos\theta$, or equivalently, $\sin\theta$, is constructible.

Proof Suppose that $\cos\theta$ is constructible. Then we can construct the point $P = (\cos\theta, 0)$. Draw the line perpendicular to the $X$-axis through this point, and also the unit circle with center at the origin. Let $R$ be a point of intersection of this line with the unit circle. Then the angle $\angle POR$ is the required angle.

Conversely, suppose that the angle $\theta$ is constructible. Let $OQ$ be a line making angle $\theta$ with the $X$-axis. Draw the unit circle with center $O$, and let $R$ be the intersection of this circle with the line $OQ$. Draw the perpendicular from $R$ to the $X$-axis, meeting it at a point $P$ (say). Then $OP$ is a segment of length $|\cos\theta|$. □

Theorem 8.7.12 It is impossible to construct a $20^\circ$ angle using ruler and compass only.

Proof From the previous result, if we can construct a $20^\circ$ angle, then $a = \cos 20^\circ$ is constructible. Now
$$\frac{1}{2} = \cos 60^\circ = 4\cos^3 20^\circ - 3\cos 20^\circ = 4a^3 - 3a.$$

Thus, $a$ is a root of the polynomial $8X^3 - 6X - 1$. This cubic polynomial has no rational root (prove it), so it is irreducible over $\mathbb{Q}$. Hence $[\mathbb{Q}(a) : \mathbb{Q}] = 3$, which is not a power of 2. From Corollary 8.7.10, $a$ is not constructible. □
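The two computational facts in this proof are easy to check. The Python sketch below verifies the triple-angle identity numerically, and its helper names are illustrative. It also lists the rational roots of $8X^3 - 6X - 1$ using the rational root theorem, whose candidates are $\pm 1, \pm\frac12, \pm\frac14, \pm\frac18$. An empty list confirms that the cubic is irreducible over $\mathbb{Q}$.

The same function shows that $X^3 - 2$, which arises in the duplication of the cube, has no rational root either.

```python
import math
from fractions import Fraction as Fr


def rational_roots(coeffs):
    """Rational roots of an integer polynomial, coefficients listed from the highest degree,
    by the rational root theorem (assumes a nonzero constant term)."""
    lead, const = coeffs[0], coeffs[-1]

    def divisors(k):
        return [d for d in range(1, abs(k) + 1) if k % d == 0]

    def value(x):
        acc = Fr(0)
        for c in coeffs:          # Horner's rule
            acc = acc * x + c
        return acc

    candidates = {s * Fr(p, q) for p in divisors(const) for q in divisors(lead) for s in (1, -1)}
    return sorted(x for x in candidates if value(x) == 0)


a = math.cos(math.radians(20))
print(abs(4 * a ** 3 - 3 * a - 0.5) < 1e-12)   # True: cos 60 = 4 cos^3 20 - 3 cos 20
print(rational_roots([8, 0, -6, -1]))          # []: no rational root, so the cubic is irreducible
print(rational_roots([1, 0, 0, -2]))           # []: X^3 - 2 has no rational root either
```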

Corollary 8.7.13 Trisection of a 60◦ angle is impossible by ruler and compass construction.

Proof A $60^\circ$ angle can be constructed by ruler and compass. So if it were possible to trisect a $60^\circ$ angle, a $20^\circ$ angle would be constructible. This is impossible by the above theorem. □

Corollary 8.7.14 It is impossible to duplicate a unit cube by ruler and compass constructions.

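Proof If we can duplicate a unit cube by ruler and compass, then we can construct a segment of length $2^{1/3}$. But $X^3 - 2$ is irreducible over $\mathbb{Q}$ (Eisenstein criterion), so $[\mathbb{Q}(2^{1/3}) : \mathbb{Q}] = 3$. This is not a power of 2, and so the construction is impossible. □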

Recall that a complex number $a$ is called an algebraic number if it is algebraic over $\mathbb{Q}$. We state the following result without proof.

Theorem 8.7.15 (Lindemann–Weierstrass) Let $a_1, a_2, \ldots, a_n$ be $n$ distinct algebraic numbers. Then $\{e^{a_1}, e^{a_2}, \ldots, e^{a_n}\}$ is a linearly independent set over $\mathbb{Q}$. □

Corollary 8.7.16 $\pi$ and $e$ are not algebraic over $\mathbb{Q}$.

Proof If $e$ is a root of a nonzero polynomial over $\mathbb{Q}$, then there are rational numbers $\alpha_0, \alpha_1, \ldots, \alpha_n$, not all zero, such that

$$\alpha_0 + \alpha_1 e + \cdots + \alpha_n e^n = 0.$$

This means that $\{e^0, e^1, \ldots, e^n\}$ is linearly dependent over $\mathbb{Q}$. Since $0, 1, 2, \ldots, n$ are distinct algebraic numbers, this contradicts the theorem of Lindemann and Weierstrass. Thus, $e$ is not algebraic over $\mathbb{Q}$.

Next, $e^0 = 1$ and $e^{\pi i} = -1$ are rational numbers, so they are linearly dependent over $\mathbb{Q}$. By the theorem of Lindemann and Weierstrass, $0$ and $\pi i$ cannot both be algebraic. Since $0$ is algebraic, it follows that $\pi i$ is not algebraic. Again, since $i$ is algebraic, it follows that $\pi$ is not algebraic. □

Remark 8.7.17 It is not known whether $\pi$ is algebraic over $\mathbb{Q}(e)$.

Theorem 8.7.18 It is impossible to construct, by ruler and compass, a square whose area is the area bounded by the unit circle.

Proof Suppose that it is possible to construct such a square. Then $\sqrt{\pi}$ will be a constructible number. Since $\pi$ is not algebraic, $\sqrt{\pi}$ is also not algebraic. But then $\mathbb{Q}(\sqrt{\pi})$ is of infinite degree over $\mathbb{Q}$, contradicting Corollary 8.7.10. □

Theorem 8.7.19 A regular polygon of $n$ sides ($n \ge 3$) is constructible (by ruler and compass) if and only if $\phi(n)$ is a power of 2.

Proof A regular polygon of $n$ sides is constructible if and only if the angle $\frac{2\pi}{n}$ at the center is constructible. This is possible if and only if $\cos\frac{2\pi}{n}$ is constructible. Let $\rho = e^{2\pi i/n}$ denote the primitive $n$th root of 1. Since $\cos\frac{2\pi}{n} = \frac{\rho + \rho^{-1}}{2}$, it follows that $\cos\frac{2\pi}{n} \in \mathbb{Q}(\rho)$. Since $\rho \notin \mathbb{R}$ and $\cos\frac{2\pi}{n} \in \mathbb{R}$, the field $\mathbb{Q}(\cos\frac{2\pi}{n})$ is a proper subfield of $\mathbb{Q}(\rho)$. Further, $\rho$ is a root of the polynomial $X^2 - 2\cos\frac{2\pi}{n}X + 1$, and so $[\mathbb{Q}(\rho) : \mathbb{Q}(\cos\frac{2\pi}{n})] = 2$.

Hence, suppose $\cos\frac{2\pi}{n}$ is constructible. Then $[\mathbb{Q}(\cos\frac{2\pi}{n}) : \mathbb{Q}]$ is a power of 2 by Corollary 8.7.10. So $\phi(n) = [\mathbb{Q}(\rho) : \mathbb{Q}] = 2[\mathbb{Q}(\cos\frac{2\pi}{n}) : \mathbb{Q}]$ is a power of 2.

Conversely, suppose that $\phi(n) = 2^m$. The extension $\mathbb{Q}(\rho)$ of $\mathbb{Q}$ is Galois and abelian, with Galois group isomorphic to $U_n$, of degree $\phi(n) = 2^m$. So all the subgroups of the Galois group $G(\mathbb{Q}(\rho)/\mathbb{Q})$ are normal. In particular, $G(\mathbb{Q}(\rho)/\mathbb{Q}(\cos\frac{2\pi}{n}))$ is normal, and $H = G(\mathbb{Q}(\cos\frac{2\pi}{n})/\mathbb{Q})$ is an abelian group of order $2^{m-1}$. From the theory of abelian groups, we have a normal series

$$H = H_1 \rhd H_2 \rhd \cdots \rhd H_m = \{I\},$$
where $[H_i : H_{i+1}] = 2$. Taking the fixed fields, we obtain a chain

$$\mathbb{Q} = F(H_1) \subset F(H_2) \subset \cdots \subset F(H_m) = \mathbb{Q}\Big(\cos\frac{2\pi}{n}\Big)$$
of subfields of $\mathbb{R}$ such that $[F(H_i) : F(H_{i-1})] = 2$. The result follows from Proposition 8.7.2. □
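Theorem 8.7.19 reduces the constructibility of regular polygons to arithmetic. The short Python sketch below lists the $n$ between 3 and a given bound for which $\phi(n)$ is a power of 2; the names are illustrative. The list contains the familiar cases $n = 3, 4, 5, 6, 8$, includes $n = 17$, and excludes $n = 9$.

```python
from math import gcd


def euler_phi(n):
    return sum(1 for r in range(1, n + 1) if gcd(r, n) == 1)


def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0


def constructible_polygons(limit):
    """All n with 3 <= n <= limit such that phi(n) is a power of 2 (Theorem 8.7.19)."""
    return [n for n in range(3, limit + 1) if is_power_of_two(euler_phi(n))]


print(constructible_polygons(40))
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20, 24, 30, 32, 34, 40]
print(euler_phi(9), euler_phi(17))   # 6 16
```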

In particular, since $\phi(9) = 6$, it is impossible to construct a regular polygon with 9 sides by ruler and compass. On the other hand, since $\phi(17) = 2^4$, it is possible to construct a regular polygon with 17 sides. An explicit algorithm to construct a regular polygon of 17 sides was given by Gauss in 1801.

Exercises

8.7.1 Can we divide a right angle into 10 equal parts by ruler and compass? Support your answer.

8.7.2 Let $a$ and $b$ be positive rationals. Can we construct a square whose area is the same as that of the ellipse with major axis $2a$ and minor axis $2b$? Support your answer.

8.7.3 Can we construct an angle of $\frac{2\pi}{9}$? Support your answer.

8.7.4 Can we construct a circular arc of length $\frac{2\pi}{17}$? Support your answer.

8.7.5 Show that a regular polygon of $n$ sides is constructible by ruler and compass if and only if $n = 2^k p_1 p_2 \cdots p_r$, where $k \ge 0$ and $p_1, \ldots, p_r$ are distinct odd primes of the form $2^m + 1$.

8.7.6 Think of a machine which can construct cube roots of rational numbers.

8.8 Galois Theory of Equation

In this section, we determine a necessary and sufficient condition (due to Galois) for the solvability of polynomial equations by the field and the radical operations. Definition 8.8.1 Let L be a field extension of K. We say that L is a radical extension of K if there exists elements a1, a2,...,ar in L, and positive integers n1, n2,...,nr = ( , ,..., ) ni ∈ ( , ,... ) ≥ such that L K a1 a2 ar , where ai K a1 a2 ai−1 for all i 1. If we can take n = ni for all i, then we say that it is n - radical extension of K. Thus, every radical extension is an algebraic extension. To say that L is a radical extension of K is to say that there is tower of finite extensions K = K0 ⊆ K1 ⊆ K2 ⊆ ··· ⊆ Kr = L of finite simple extensions. Further, any radical extension is n radical extension for n = n1n2 ...nr. Thus, every cyclotomic extension is a radical extension. Also observe that if L is a n-radical extension of F, and F is also a n-radical extension of K, then L is n-radical extension of K. Proposition 8.8.2 Let L be a n-radical extension of K, and L the normal closure of the extension L of K. The L is also n-radical extension of K. = ( , ,..., ) n ∈ ( , ,..., ) Proof Let L K a1 a2 ar , where ai K a1 a2 ai−1 . The proof is by = ( ) n ∈ ( ) the induction on r. Suppose that L K a1 such that a1 K.Letp1 X be the n − = n ∈ minimum polynomial of a1 over K. Since a1 is a root of X a, where a a1 K, n it follows that p1(X) divides X − a. Thus, every root β of p1(X) satisfies the relation n  β = a ∈ K. The normal closure L = K(β1, β2,...βs), where βj are all roots 8.8 Galois Theory of Equation 325 of p1(X) over K is therefore n-radical extension of K. Assume that the result is true for r.LetL = K(a1, a2,...,ar, ar+1) be n-radical extension of K.LetL0 be the normal closure of K(a1, a2,...,ar) over K. Then L0 is the splitting field of the set {pi(X) | 1 ≤ i ≤ r}, where pi(X) is irreducible polynomial of ai over K, and by the  induction hypothesis L0 is n-radical extension of K. 
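Then the normal closure $\overline{L}$ of L over K is the splitting field of $p_{r+1}(X)$ over $L_0$. Let $\beta_1, \beta_2, \ldots, \beta_t$ be the roots of $p_{r+1}(X)$. Each $\beta_j$ is of the form $\sigma(a_{r+1})$ for some K-automorphism σ of $\overline{L}$, and so $\beta_j^n = \sigma(a_{r+1}^n) \in \sigma(K(a_1, a_2, \ldots, a_r)) \subseteq L_0$, because $L_0$ is normal over K. Thus $\overline{L} = L_0(\beta_1, \beta_2, \ldots, \beta_t)$ is an n-radical extension of $L_0$, and $L_0$ is an n-radical extension of K. This shows that $\overline{L}$ is an n-radical extension of K.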

Definition 8.8.3 Let K be a field and $f(X) \in K[X]$. Then f(X) is said to be solvable by radical operations (or solvable by radicals) if the splitting field of f(X) is contained in a radical extension of K.

Proposition 8.8.4 Let K be a field containing a primitive nth root of 1, and L an abelian Galois extension of K such that the exponent of G(L/K) divides n. Then L is an n-radical extension of K.

Proof From the structure theorem of finite abelian groups, we have

$G(L/K) = C_1 \times C_2 \times \cdots \times C_r$, where the $C_i$ are cyclic groups of prime power order, and the order of each $C_i$ divides n. Let $L_i$ denote the fixed field of

$$H_i = C_1 \times C_2 \times \cdots \times C_{i-1} \times C_{i+1} \times \cdots \times C_r.$$

Then $G(L_i/K) \approx G(L/K)/H_i$ is isomorphic to $C_i$ for each i. Thus, $L_i$ is a cyclic Galois extension of K of degree $m_i = |C_i|$, and since $m_i$ divides n, K also contains a primitive $m_i$th root of 1. From Theorem 8.6.10, we see that $L_i = K(a_i)$, where $a_i^{m_i} \in K$, and so $a_i^n \in K$ for all i. Clearly, $K(a_1, a_2, \ldots, a_r) = L_1 L_2 \cdots L_r$ is the fixed field of $\bigcap_i H_i = \{I\}$. Hence $K(a_1, a_2, \ldots, a_r) = L$. This shows that L is an n-radical extension of K.

Theorem 8.8.5 (Galois) Let K be a field of characteristic 0 and $f(X) \in K[X]$. Let L be the splitting field of f(X). Then f(X) is solvable by radicals if and only if the Galois group G(L/K) is a solvable group.

Proof Suppose that f(X) is solvable by radical operations. Let F be an n-radical extension of K containing the splitting field L of f(X). Since char K = 0, we may assume, by Proposition 8.8.2, that F is also a Galois extension of K. Let $L'$ be the splitting field of $X^n - 1$ over F. Since the characteristic of K is 0, $L' = F(\rho)$, where ρ is a primitive nth root of 1. Write $F = K(a_1, a_2, \ldots, a_{r-1})$ with $a_i^n \in K(a_1, \ldots, a_{i-1})$; then $L' = K(\rho)(a_1, a_2, \ldots, a_{r-1})$ is an n-radical extension of $K(\rho)$. Thus, we have a tower

$$K = K_0 \subseteq K_1 = K(\rho) \subseteq K_2 \subseteq K_3 \subseteq \cdots \subseteq K_r = L',$$

where $K_{i+1} = K_i(a_i)$, $i \geq 1$, with $a_i^n \in K_i$. Since each $K_i$, $i \geq 1$, contains a primitive nth root of 1, it follows by Theorem 8.6.11 that $K_{i+1}$ is a cyclic Galois extension of $K_i$ for all $i \geq 1$. Since $K_1$ is a cyclotomic extension of K, it is abelian. Moreover, $L'$ is a Galois extension of K (if F is the splitting field of g(X) over K, then $L'$ is the splitting field of $g(X)(X^n - 1)$), and hence of each $K_i$, and we have a normal series

$$G(L'/K) \trianglerighteq G(L'/K_1) \trianglerighteq G(L'/K_2) \trianglerighteq \cdots \trianglerighteq G(L'/K_r) = \{I\}$$

whose factors $G(L'/K_i)/G(L'/K_{i+1}) \approx G(K_{i+1}/K_i)$ are all abelian (as observed above). Thus, $G(L'/K)$ is a solvable group. Since L is a Galois extension of K contained in $L'$, the fundamental theorem of Galois theory shows that G(L/K) is isomorphic to the quotient $G(L'/K)/G(L'/L)$. This shows that G(L/K) is solvable. Conversely, suppose that G(L/K) is solvable, and

$$G(L/K) = G_0 \trianglerighteq G_1 \trianglerighteq \cdots \trianglerighteq G_r = \{I\}$$

is a normal series of G(L/K) with abelian factors. Let $F_i = F(G_i)$ be the fixed field of $G_i$. Then, from the fundamental theorem of Galois theory, $F_{i+1}$ is Galois over $F_i$ with Galois group isomorphic to the abelian group $G_i/G_{i+1}$. Let ρ be a primitive nth root of 1, where n is the exponent of G(L/K); it exists because the characteristic of K is 0. Let $L_i = F_i(\rho)$. Since $F_0 = K$ and $F_r = L$, we get a tower of extensions

K ⊆ L0 ⊆ L1 ⊆···⊆Lr = L(ρ).

It follows, by induction, that $L_{i+1} = L_i F_{i+1}$, and by Exercise 8.6.6, $G(L_{i+1}/L_i)$ is isomorphic to a subgroup of the abelian group $G(F_{i+1}/F_i)$. Clearly, the exponent of $G(L_{i+1}/L_i)$ divides n (it embeds in $G_i/G_{i+1}$, a quotient of a subgroup of G(L/K)). Since $L_i$ contains a primitive nth root of 1, it follows from Proposition 8.8.4 that $L_{i+1}$ is an n-radical extension of $L_i$ for each i. Since $L_0 = K(\rho)$ is already an n-radical extension of K, it follows that $L(\rho)$ is an n-radical extension of K, and so L is contained in a radical extension of K. This means that f(X) is solvable by radical operations.

Let L denote the field $K(X_1, X_2, \ldots, X_n)$ of fractions of the polynomial ring $K[X_1, X_2, \ldots, X_n]$, where K is a field of characteristic 0. Then every permutation $p \in S_n$ defines uniquely an automorphism of L (permuting the variables). Indeed, $S_n$ is isomorphic to a subgroup of Aut(L) (see Example 8.3.11), which we again denote by $S_n$. Let F denote the fixed field of $S_n$. We have seen (Example 8.3.11) that $F = K(s_1, s_2, \ldots, s_n)$, where $s_1, s_2, \ldots, s_n$ are the elementary symmetric polynomials in $X_1, X_2, \ldots, X_n$. The polynomial

$$f(X) = X^n - s_1X^{n-1} + s_2X^{n-2} - \cdots + (-1)^n s_n = (X - X_1)(X - X_2)\cdots(X - X_n)$$

in F[X] is called the general nth degree polynomial over K. The question is: can we express $X_1, X_2, \ldots, X_n$ in terms of the elementary symmetric polynomials $s_1, s_2, \ldots, s_n$ using field and radical operations? In other words, can we solve the general nth degree equation over K by the field and the radical operations? Since the Galois group of f(X) over F is $S_n$, and $S_n$ is not a solvable group for $n \geq 5$, the following result of Abel and Ruffini follows from the above theorem of Galois.

Theorem 8.8.6 (Abel–Ruffini) A general nth degree equation, $n \geq 5$, over a field K of characteristic 0 is not solvable by radicals.
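The group-theoretic input used here, namely that $S_n$ is solvable for $n \leq 4$ but not for $n \geq 5$, can be checked by machine for small n, since a finite group is solvable exactly when its derived series $G \supseteq G' \supseteq G'' \supseteq \cdots$ reaches {I}. The following is a minimal brute-force sketch in Python under our own conventions (permutations of {0, ..., n − 1} stored as tuples; all helper names are hypothetical). It only illustrates the cases it computes and proves nothing for general n.

```python
from itertools import permutations


def compose(p, q):
    """(p o q)(i) = p(q(i)); permutations of {0, ..., n-1} are tuples."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(gens, n):
    """Subgroup of S_n generated by gens: close {identity} under right multiplication."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group


def derived_series_orders(n):
    """Orders of S_n, S_n', S_n'', ... until the series becomes stationary."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        commutators = {compose(compose(g, h), compose(inverse(g), inverse(h)))
                       for g in group for h in group}
        derived = generated_subgroup(commutators, n)
        if len(derived) == len(group):  # G' = G: the series has stopped
            return orders
        group = derived
        orders.append(len(group))
```

Here `derived_series_orders(4)` returns `[24, 12, 4, 1]` (the chain $S_4 \supset A_4 \supset V_4 \supset \{I\}$), whereas `derived_series_orders(5)` returns `[120, 60]`: the series stops at $A_5$, which equals its own commutator subgroup, so $S_5$ is not solvable.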

Example 8.8.7 Let f(X) be an irreducible polynomial over $\mathbb{Q}$ of degree p, where $p \geq 5$ is a prime, such that f(X) has exactly two non-real roots (which are, of course, complex conjugates of each other). Then the Galois group of f(X) over $\mathbb{Q}$ is $S_p$. We prove it. Let L be the splitting field of f(X). Since f(X) is irreducible, p divides $|G(L/\mathbb{Q})|$ (for any root α, $[\mathbb{Q}(\alpha):\mathbb{Q}] = p$ divides $[L:\mathbb{Q}] = |G(L/\mathbb{Q})|$). Also, $G(L/\mathbb{Q})$ is a subgroup of $S_p$ (it acts faithfully on the p roots of f(X), which are all distinct). Since p divides $|G(L/\mathbb{Q})|$, by Cauchy's theorem $G(L/\mathbb{Q})$ has an element of order p. An element of order p in $S_p$ is a cycle of length p. Further, complex conjugation restricts to a member of $G(L/\mathbb{Q})$ which interchanges the two non-real roots and fixes the real ones, and therefore represents a transposition. Since a transposition together with a cycle of length p generates $S_p$ (p prime), it follows that the Galois group $G(L/\mathbb{Q})$ is $S_p$. Since $S_p$ is not solvable, by the above theorem of Galois, f(X) is not solvable by radicals. In particular, using elementary calculus and the Eisenstein irreducibility criterion, we see that $X^5 - 4X + 2$ satisfies the conditions mentioned above, and so it is not solvable by radicals.
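The facts about $X^5 - 4X + 2$ used in the example, and the generation of $S_5$ by a transposition and a 5-cycle, can be checked mechanically. The sketch below is ours (Python; helper names are hypothetical). It tests Eisenstein's criterion at 2, counts sign changes of the polynomial on an exact rational grid of [−5, 5] (every real root lies there, since $|x| < 1 + \max|a_i| = 5$ by Cauchy's bound), and closes {transposition, 5-cycle} under composition. Three sign changes give at least three real roots; since the derivative $5X^4 - 4$ has only two real zeros, Rolle's theorem gives at most three, so exactly two roots are non-real.

```python
from fractions import Fraction

COEFFS = [1, 0, 0, 0, -4, 2]  # X^5 - 4X + 2, leading coefficient first


def evaluate(coeffs, x):
    """Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc


def eisenstein(coeffs, p):
    """Eisenstein's criterion at the prime p for an integer polynomial."""
    return (coeffs[0] % p != 0
            and all(c % p == 0 for c in coeffs[1:])
            and coeffs[-1] % (p * p) != 0)


def sign_changes(coeffs, lo, hi, steps):
    """Sign changes on an exact rational grid; each one brackets a real root."""
    xs = [Fraction(lo) + Fraction(hi - lo, steps) * k for k in range(steps + 1)]
    values = [evaluate(coeffs, x) for x in xs]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations stored as tuples."""
    return tuple(p[i] for i in q)


def generated(gens):
    """Subgroup of S_n generated by gens, by closing under multiplication."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group


transposition = (1, 0, 2, 3, 4)  # swaps 0 and 1
five_cycle = (1, 2, 3, 4, 0)     # 0 -> 1 -> 2 -> 3 -> 4 -> 0
```

With 1000 steps the grid spacing 1/100 separates the three real roots (roughly −1.5, 0.5 and 1.2), and grid points never hit a root because, by the rational root test, the polynomial has no rational roots. The closure of {`transposition`, `five_cycle`} has 120 elements, i.e. it is all of $S_5$.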

Since $S_n$ is solvable for $n \leq 4$, every general equation of degree $n \leq 4$ is solvable by radicals. In what follows, we determine formulas for these lower degree general polynomial equations.

1. Quadratic Equations. For the general quadratic polynomial $X^2 - s_1X + s_2$, the formula for $X_1$ and $X_2$ is given by the Sridharacharya formula:

$$X_i = \frac{s_1 \pm \sqrt{s_1^2 - 4s_2}}{2}.$$

2. Cubic Equations. We give Cardano's method to solve a cubic equation by radical operations. Consider the general cubic equation

$$X^3 - s_1X^2 + s_2X - s_3 = 0 \qquad (8.8.1)$$

Substituting $X = Y + \frac{s_1}{3}$ in the above equation, we get

$$Y^3 + pY + q = 0 \qquad (8.8.2)$$

where

$$p = s_2 - \frac{s_1^2}{3} \qquad (8.8.3)$$

and

$$q = -\frac{2s_1^3}{27} + \frac{s_1s_2}{3} - s_3 \qquad (8.8.4)$$

If p = 0, then the roots are α, αω and αω², where $\alpha = (-q)^{1/3}$ and ω is a primitive cube root of 1. Suppose that $p \neq 0$. Substituting Y = U + V in (8.8.2), the equation reduces to

$$U^3 + V^3 + q + (3UV + p)(U + V) = 0 \qquad (8.8.5)$$

We set up the two equations

$$U^3 + V^3 + q = 0 \qquad (8.8.6)$$

and

$$3UV + p = 0 \qquad (8.8.7)$$

in U and V. Since $p \neq 0$, U and V are both nonzero. Substituting the value $V = -\frac{p}{3U}$ obtained from Eq. 8.8.7 in Eq. 8.8.6, we get

$$U^6 + qU^3 - \frac{p^3}{27} = 0 \qquad (8.8.8)$$

This is a quadratic equation in $U^3$. Solving it, we get $U^3 = -\frac{q}{2} \pm \sqrt{-\frac{D}{108}}$, where $D = -4p^3 - 27q^2$ (called the discriminant of $Y^3 + pY + q$). If we take $U^3 = -\frac{q}{2} + \sqrt{-\frac{D}{108}}$ and denote it by P, then, by symmetry, $V^3 = -\frac{q}{2} - \sqrt{-\frac{D}{108}}$, and this we denote by Q. Pairing the three cube roots of P with those of Q so that their product is $-\frac{p}{3}$, we get the roots

$$\sqrt[3]{P} + \sqrt[3]{Q}, \qquad \omega\sqrt[3]{P} + \omega^2\sqrt[3]{Q}, \qquad \omega^2\sqrt[3]{P} + \omega\sqrt[3]{Q}$$

of $Y^3 + pY + q = 0$. Adding $\frac{s_1}{3}$ to each of them gives all the roots of the given cubic equation (8.8.1).

3. Bi-quadratic equations. We describe Ferrari's method of solving the general quartic equation

$$X^4 - s_1X^3 + s_2X^2 - s_3X + s_4 = 0 \qquad (8.8.9)$$

The above equation can also be written as

$$\left(X^2 + \frac{-s_1X + Y}{2}\right)^2 = AX^2 + BX + C,$$

where $A = \frac{s_1^2}{4} - s_2 + Y$, $B = -\frac{s_1Y}{2} + s_3$, and $C = \frac{Y^2}{4} - s_4$; this holds for every value of the auxiliary unknown Y. We choose Y so that the right-hand side becomes a perfect square. This is so if $B^2 - 4AC = 0$. This gives a cubic equation in Y, which can be solved by Cardano's method. Suppose that the right-hand side then becomes $(PX + Q)^2$. Then

$$X^2 + \frac{-s_1X + Y}{2} = \pm(PX + Q).$$

Finally, these quadratic equations can be solved, and we get the solutions of the given bi-quadratic equation.
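Cardano's and Ferrari's procedures translate directly into floating-point code. The following is a minimal sketch in Python, assuming complex floating-point arithmetic from the standard `cmath` module; the function names `cardano` and `ferrari` and the tolerance `eps` are our own, and the output is numerical rather than an exact radical expression. It uses (8.8.3) and (8.8.4) for p and q, the pairing $uv = -p/3$, and the cubic obtained by expanding $B^2 - 4AC = 0$, namely $Y^3 - s_2Y^2 + (s_1s_3 - 4s_4)Y - (s_1^2s_4 + s_3^2 - 4s_2s_4) = 0$.

```python
import cmath

OMEGA = complex(-0.5, 3 ** 0.5 / 2)  # a primitive cube root of 1


def cardano(s1, s2, s3, eps=1e-12):
    """Roots of X^3 - s1 X^2 + s2 X - s3 = 0, following (8.8.1)-(8.8.8)."""
    # X = Y + s1/3 gives Y^3 + pY + q = 0 with p, q as in (8.8.3), (8.8.4).
    p = s2 - s1 ** 2 / 3
    q = -2 * s1 ** 3 / 27 + s1 * s2 / 3 - s3
    if abs(p) < eps:
        alpha = complex(-q) ** (1 / 3)  # a cube root of -q
        ys = [alpha, alpha * OMEGA, alpha * OMEGA ** 2]
    else:
        # U^3 = -q/2 +/- sqrt(-D/108) with D = -4p^3 - 27q^2.
        r = cmath.sqrt(-(-4 * p ** 3 - 27 * q ** 2) / 108)
        # Both signs are valid; the larger |P| avoids numerical cancellation.
        P = max(-q / 2 + r, -q / 2 - r, key=abs)
        u = P ** (1 / 3)   # one cube root of P
        v = -p / (3 * u)   # paired so that u * v = -p/3 (then v^3 = Q)
        ys = [u + v, OMEGA * u + OMEGA ** 2 * v, OMEGA ** 2 * u + OMEGA * v]
    return [y + s1 / 3 for y in ys]


def ferrari(s1, s2, s3, s4, eps=1e-12):
    """Roots of X^4 - s1 X^3 + s2 X^2 - s3 X + s4 = 0, following (8.8.9)."""
    # B^2 - 4AC = 0 expands to the cubic
    # Y^3 - s2 Y^2 + (s1 s3 - 4 s4) Y - (s1^2 s4 + s3^2 - 4 s2 s4) = 0.
    ys = cardano(s2, s1 * s3 - 4 * s4, s1 ** 2 * s4 + s3 ** 2 - 4 * s2 * s4)
    # Any root Y works exactly; the one with the largest |A| is numerically safest.
    Y = max(ys, key=lambda y: abs(s1 ** 2 / 4 - s2 + y))
    A = s1 ** 2 / 4 - s2 + Y
    B = s3 - s1 * Y / 2
    C = Y ** 2 / 4 - s4
    P = cmath.sqrt(A)
    # (PX + Q)^2 = AX^2 + BX + C because B^2 = 4AC (if A = 0 then B = 0 too).
    Q = B / (2 * P) if abs(P) > eps else cmath.sqrt(C)
    roots = []
    for sign in (1, -1):
        # X^2 + (-s1 X + Y)/2 = sign * (PX + Q), i.e. X^2 + bX + c = 0.
        b = -s1 / 2 - sign * P
        c = Y / 2 - sign * Q
        d = cmath.sqrt(b * b - 4 * c)
        roots += [(-b + d) / 2, (-b - d) / 2]
    return roots
```

For instance, `cardano(6, 11, 6)` returns, up to rounding and order, the roots 1, 2, 3 of $X^3 - 6X^2 + 11X - 6$, and `ferrari(10, 35, 50, 24)` returns the roots 1, 2, 3, 4 of $(X-1)(X-2)(X-3)(X-4)$. Choosing the sign in P and the root Y of largest |A| is only a numerical-stability choice; in exact arithmetic every choice gives the same roots.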

Exercises

8.8.1 Solve the cubic equation $X^3 - 3X^2 + 2X + 1 = 0$.

8.8.2 Solve $X^4 + 4X - 1 = 0$.

8.8.3 Find a polynomial of degree 7 over $\mathbb{Q}$ whose Galois group is $S_7$.

Chapter 9 Representation Theory of Finite Groups

In this chapter, we develop the elementary theory of linear representations of finite groups over a field F. We shall assume that the characteristic of F does not divide the order |G| of G. Representations over fields F whose characteristic divides the order of G are called modular representations or Brauer representations, and their theory was developed by Brauer. We shall occasionally, though rarely, make some comments about modular representation theory.

9.1 Semi-simple Rings and Modules

One of the crucial properties of a field (or skew field) F is that, given any module M over F and a submodule N of M, there is a submodule L of M such that M is the direct sum of N and L. In this section, we study rings with this property. All the rings in this section are assumed to have an identity.

Definition 9.1.1 A left R-module M is called a simple left module if $M \neq \{0\}$ and M has no submodules other than {0} and M.

Nonzero simple left modules over a division ring R are precisely the one-dimensional vector spaces. Simple $\mathbb{Z}$-modules are precisely the cyclic groups of prime order.

Example 9.1.2 Let D be a division ring, and $M_n(D)$ the ring of $n \times n$ matrices with entries in D. Let $D^n$ denote the additive group of column vectors. Then $D^n$ is a left $M_n(D)$-module with respect to matrix multiplication. Let X be a nonzero column vector in $D^n$, and Y any column vector. It is an elementary fact of linear algebra that there is a matrix A in $M_n(D)$ such that $A \cdot X = Y$. Thus, $D^n$ has no proper nonzero $M_n(D)$-submodules, and so it is simple. Let $D_i$ denote the subset of $M_n(D)$ consisting of those matrices all of whose columns, except possibly the ith column, are zero. Then an easy calculation shows that $D_i$ is a minimal nonzero left ideal. As a module, it is isomorphic to the module $D^n$. Also observe that the subset $D_i'$ of $M_n(D)$ consisting of the matrices whose ith column is zero is also a left ideal, and it is maximal because $M_n(D)/D_i'$ is isomorphic to the simple left module $D^n$. An elementary calculation shows that $M_n(D)$ has no proper nonzero two-sided ideal.

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_9
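The "elementary fact of linear algebra" used in the example can be made explicit: if $X_j \neq 0$, the matrix $A = Y X_j^{-1} e_j^t$, whose only nonzero column is the jth one, satisfies $A \cdot X = Y$, and it lies in $D_j$. The sketch below is ours and takes D = $\mathbb{Q}$ for concreteness (an assumption; for a noncommutative D the same formula works with $X_j^{-1}$ written to the right of Y), using Python's `fractions` module for exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def matrix_sending(x, y):
    """An n x n rational matrix A with A x = y, assuming x != 0.

    Only column j (a coordinate with x_j != 0) is nonzero: A = y e_j^T / x_j,
    so A lies in the minimal left ideal D_j of the example.
    """
    n = len(x)
    j = next(k for k in range(n) if x[k] != 0)  # a nonzero coordinate of x
    return [[Fraction(y[i]) / x[j] if k == j else Fraction(0) for k in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]
```

For x = (0, 2, 3) and y = (1, −1, 5), the matrix returned has only its second column nonzero, and multiplying it by x gives back y.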

Proposition 9.1.3 A left R-module M is simple if and only if there is a maximal left ideal A of R such that M as a module is isomorphic to R/A.

Proof Since a submodule of R/A is of the form B/A, where B is a left ideal containing A, it follows that R/A is simple if and only if A is a maximal left ideal. Let M be a simple left R-module, and let $x \neq 0$ be an element of M. Since Rx is a nonzero submodule of M (it contains $x = 1 \cdot x$) and M is simple, Rx = M. Thus, the map $a \mapsto ax$ is a surjective R-homomorphism from R to M. By the fundamental theorem of homomorphisms, R/A is isomorphic to M, where A is the kernel of the map. Since M is simple, A is a maximal left ideal.

Proposition 9.1.4 (Schur's Lemma) Let M and N be simple left R-modules. Then a nonzero homomorphism from M to N is an isomorphism. In particular, $End_R(M)$ is a division ring with respect to the addition of endomorphisms, and composition of maps as the product.

Proof Let f be a nonzero homomorphism from M to N. Then f(M) is a nonzero submodule of N. Since N is simple, f(M) = N, and so f is surjective. Next, Ker f is a submodule of M different from M, and since M is simple, Ker f = {0}. This means that f is injective. In particular, every nonzero element of $End_R(M)$ is invertible, and so $End_R(M)$ is a division ring.

Theorem 9.1.5 Let M be a left R-module. Then the following conditions are equivalent.

1. M is the sum of its simple submodules (equivalently, M is generated by its simple submodules).
2. M is a direct sum of simple submodules.
3. Every submodule of M is a direct summand.
4. Every short exact sequence of the type $0 \longrightarrow N \xrightarrow{f} M \xrightarrow{g} L \longrightarrow 0$ splits.

Proof (1 ⟹ 2). Suppose that $M = \sum_{\alpha \in \Lambda} M_\alpha$, where $\{M_\alpha \mid \alpha \in \Lambda\}$ is the family of all simple submodules of M. Let

$$X = \left\{J \subseteq \Lambda \;\middle|\; \sum_{\alpha \in J} M_\alpha = \bigoplus_{\alpha \in J} M_\alpha\right\}.$$

Then each $\{\alpha\} \in X$. Hence $X \neq \emptyset$, and X with inclusion is a nonempty partially ordered set. Let $\{J_\mu \mid \mu \in \Delta\}$ be a chain in X. Then $J = \bigcup_{\mu \in \Delta} J_\mu$ is a member of X, for $\sum_{\alpha \in J} M_\alpha = \bigoplus_{\alpha \in J} M_\alpha$ (verify: a relation among finitely many of the summands involves indices from a single $J_\mu$). Thus, J is an upper bound of the chain. By Zorn's Lemma, X has a maximal element $J_0$ (say). We show that $M = \bigoplus_{\alpha \in J_0} M_\alpha$. Put $N = \bigoplus_{\alpha \in J_0} M_\alpha$. Suppose that $M_\beta \not\subseteq N$ for some $\beta \in \Lambda$. Then $\beta \notin J_0$. Since $M_\beta$ is simple, $M_\beta \cap N$ is {0} or it is $M_\beta$. Since it is assumed that $M_\beta \not\subseteq N$, it follows that $M_\beta \cap N = \{0\}$. Thus, $M_\beta + N = M_\beta \oplus N = \bigoplus_{\alpha \in J_0 \cup \{\beta\}} M_\alpha$. This shows that $J_0 \cup \{\beta\} \in X$, a contradiction to the supposition that $J_0$ is a maximal element of X. Hence $M_\alpha \subseteq N$ for each $\alpha \in \Lambda$. This shows that N = M.

(2 ⟹ 1) is obvious.

(2 ⟹ 3). Assume 2. Let $M = \bigoplus_{\alpha \in J} M_\alpha$, where each $M_\alpha$ is a simple submodule of M. Let N be a proper submodule of M. As in the proof of 1 ⟹ 2, consider

$$X = \left\{F \subseteq J \;\middle|\; N + \sum_{\alpha \in F} M_\alpha = N \oplus \bigoplus_{\alpha \in F} M_\alpha\right\}.$$

Since $N \neq M$, there is $M_\alpha$, $\alpha \in J$, such that $M_\alpha$ is not contained in N. Since $M_\alpha$ is simple, $N \cap M_\alpha = \{0\}$, and so $N + M_\alpha = N \oplus M_\alpha$. Hence such a $\{\alpha\}$ is in X, and so X is a nonempty set. Order it by inclusion. As in the proof of 1 ⟹ 2, X has a maximal element $F_0$ (say), and then $N \oplus \bigoplus_{\alpha \in F_0} M_\alpha = M$. This shows that N is a direct summand.

(3 ⟹ 4). Assume 3. Then f(N) is a submodule of M, and $M = f(N) \oplus K$ for some submodule K. Since f is injective, every element of M is uniquely expressible as f(n) + k, where $n \in N$ and $k \in K$. It is clear that the map s from M to N defined by $s(f(n) + k) = n$ is a homomorphism with $s \circ f = I_N$, and so it defines a splitting.

(4 ⟹ 3). Assume 4. Let N be a submodule of M. Then we have a short exact sequence

$$0 \longrightarrow N \longrightarrow M \longrightarrow M/N \longrightarrow 0,$$

where the map from N to M is the inclusion, and the map from M to M/N is the quotient map. By 4, it is a split exact sequence, and so $M = N \oplus K$ for some submodule K of M (isomorphic to M/N).

(3 ⟹ 1). Let $M \neq \{0\}$ be a left R-module such that every submodule of M is a direct summand of M. We first show that every submodule N of M also has this property. Let K be a submodule of N. Then K is also a submodule of M, and hence there is a submodule L of M such that $M = K \oplus L$. We show that $N = K \oplus (L \cap N)$. Clearly, $K \cap (L \cap N) = \{0\}$. Let $x \in N$. Then x = y + z, where $y \in K$ and $z \in L$. Since $x, y \in N$, $z \in N$. Hence $z \in L \cap N$, and so $N = K + (L \cap N)$. Thus, every submodule of N is a direct summand of N. Next, we show that every nonzero submodule N of M contains a nonzero simple submodule. Let $x \in N$, $x \neq 0$. Then $Rx \neq \{0\}$ is a submodule of N. Consider the surjective homomorphism f from R to Rx defined by f(a) = ax. Since f is surjective and $Rx \neq \{0\}$, it follows that $\operatorname{Ker} f \neq R$. Let $A = \operatorname{Ker} f$. Then A is a proper left ideal of R. By Krull's theorem, A is contained in a maximal left ideal B (say). By the isomorphism theorems, $R/B \approx Rx/f(B) = Rx/Bx$. Since B is maximal, R/B is simple, and so Rx/Bx is also simple. Since Rx is a submodule of M, and Bx is a submodule of Rx, $Rx = Bx \oplus T$ for some nonzero submodule T of Rx. T, being isomorphic to Rx/Bx, is simple. Thus, N contains a nonzero simple submodule of M. Let $M_0$ be the sum of all simple submodules of M. Suppose that $M_0 \neq M$. Then, from 3, there exists a nonzero submodule $N_0$ of M such that $M = M_0 \oplus N_0$. From what we have proved above, $N_0$ contains a nonzero simple submodule $L_0$ of M. But then $L_0$ is a simple submodule of M not contained in $M_0$. This is a contradiction to the choice of $M_0$. Hence $M_0 = M$.

Definition 9.1.6 A left R-module M is said to be semi-simple if it satisfies any one (and hence all) of the four conditions in the above theorem.
Every vector space is semi-simple, for all subspaces are direct summands. Since simple $\mathbb{Z}$-modules are cyclic groups of prime order, the semi-simple $\mathbb{Z}$-modules are precisely the direct sums of cyclic groups of prime order. More generally, simple modules over a P.I.D. R are isomorphic to R/Rp, where p is an irreducible element of R. In particular, a P.I.D. R is a semi-simple module over itself if and only if it is a field.

Proposition 9.1.7 Every submodule of a semi-simple left R-module is semi-simple. Every homomorphic image (and so every quotient) of a semi-simple left module is semi-simple.

Proof Let M be a semi-simple left module, and N a submodule of M. As observed in the proof of 3 ⟹ 1 (Theorem 9.1.5), every submodule of N is a direct summand of N. Hence N is semi-simple. Let β be a surjective homomorphism from M to N. Since M is semi-simple, the exact sequence $0 \longrightarrow \operatorname{Ker}\beta \longrightarrow M \xrightarrow{\beta} N \longrightarrow 0$ splits. Hence, there exists a homomorphism t from N to M such that $\beta \circ t = I_N$. Clearly, t is an injective homomorphism, and as a result N is isomorphic to the submodule t(N) of M. Since a submodule of a semi-simple left module is semi-simple, t(N), and so N, is semi-simple.

Since a direct sum of direct sums of simple left modules is a direct sum of simple modules, we have the following proposition.

Proposition 9.1.8 The direct sum of a family of semi-simple left modules is a semi-simple left module.

Definition 9.1.9 A ring R is said to be a left semi-simple ring if it is a semi-simple left module over itself.

A field is a semi-simple ring, for it is a simple module over itself. $\mathbb{Z}$ is not semi-simple, for it cannot be a direct sum of simple left ideals (it has none: every nonzero ideal $n\mathbb{Z}$ properly contains the nonzero ideal $2n\mathbb{Z}$). A subring of a left semi-simple ring need not be left semi-simple. For example, $\mathbb{Q}$ is a semi-simple ring but $\mathbb{Z}$ is not.

Theorem 9.1.10 A ring R is a left semi-simple ring if and only if every left module over R is semi-simple.

Proof If every left module over R is semi-simple, then, in particular, R is a semi-simple left module over itself. By definition, R is left semi-simple. Conversely, suppose that R is a left semi-simple ring. Then every free left R-module, being a direct sum of copies of R, is a semi-simple left module (Proposition 9.1.8). Since every left module is a quotient of a free left module, and a quotient of a semi-simple left module is semi-simple (Proposition 9.1.7), it follows that every left module over R is semi-simple.

The following corollary is immediate from the previous results.

Corollary 9.1.11 Let R be a ring. Then the following conditions are equivalent.

1. R is left semi-simple.
2. Every short exact sequence of left R-modules splits.
3. Every left R-module is semi-simple.
4. Every left R-module is projective.
5. Every left R-module is injective.

It follows from the definition of a left semi-simple ring that a ring R is left semi-simple if and only if it is a direct sum of minimal nonzero left ideals. Since $M_n(D)$, where D is a division ring, is the direct sum of its minimal nonzero left ideals $D_i$, it follows that $M_n(D)$ is left semi-simple.

Definition 9.1.12 A left semi-simple ring is said to be simple if it has no nonzero proper two-sided ideals.

Example 9.1.13 Since Mn(D) has no nonzero proper two-sided ideals, it follows from the above discussion that Mn(D) is a left simple ring. We shall see that every simple ring is isomorphic to Mn(D) for some n, and for some division ring D.

Let $R_1$ and $R_2$ be rings. Then left ideals of $R_1 \times \{0\}$, and those of $\{0\} \times R_2$, are also left ideals of $R_1 \times R_2$. Thus, if $R_1$ and $R_2$ are direct sums of minimal nonzero left ideals, then $R_1 \times R_2$ is also a direct sum of minimal nonzero left ideals. This proves the following proposition.

Proposition 9.1.14 Direct product of left semi-simple rings is a left semi-simple ring. 

In particular, we have the following corollary.

Corollary 9.1.15 Let $D_1, D_2, \ldots, D_r$ be division rings, and $n_1, n_2, \ldots, n_r$ positive integers. Then

$$M_{n_1}(D_1) \times M_{n_2}(D_2) \times \cdots \times M_{n_r}(D_r)$$

is a left semi-simple ring.

One of the main results of this section shows that any left semi-simple ring is isomorphic to such a ring. In particular, left semi-simple rings and right semi-simple rings are the same. Let F be a field and G a group. Recall (see Sect. 7.6 on polynomial rings, Algebra 1) the definition of the group ring F(G). F(G) is, to begin with, a vector space over F with the members of G as a basis. Thus, the members of F(G) are formal sums $\sum_{g \in G} \alpha_g g$, where all but finitely many $\alpha_g$ are 0. The multiplication · in F(G) defined by

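$$\left(\sum_{g \in G} \alpha_g g\right) \cdot \left(\sum_{g \in G} \beta_g g\right) = \sum_{g \in G} \gamma_g g, \qquad \text{where } \gamma_g = \sum_{hk = g} \alpha_h \beta_k,$$

makes F(G) a ring. Thus, F(G) is an algebra over F, and it is called a group algebra. Maschke's theorem (Theorem 9.1.16) is the first basic result in the representation theory of groups.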
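The multiplication rule $\gamma_g = \sum_{hk = g} \alpha_h \beta_k$ is straightforward to implement. In the following sketch (ours), an element of F(G) with F = $\mathbb{Q}$ is stored as a Python dictionary from group elements to coefficients, and the group is supplied as a multiplication function; both encodings, and the names `group_algebra_mul` and `z3`, are assumptions made for illustration. The cyclic group of order 3 is written additively as {0, 1, 2}.

```python
from collections import defaultdict
from fractions import Fraction


def group_algebra_mul(u, v, mul):
    """Product in F(G) with F = Q.

    u, v map group elements to coefficients (finitely supported formal sums);
    mul(h, k) is the group operation. The coefficient of g in the product is
    gamma_g = sum over h*k = g of alpha_h * beta_k.
    """
    gamma = defaultdict(Fraction)
    for h, alpha in u.items():
        for k, beta in v.items():
            gamma[mul(h, k)] += Fraction(alpha) * Fraction(beta)
    return {g: c for g, c in gamma.items() if c != 0}


def z3(h, k):
    """The cyclic group of order 3, written additively as {0, 1, 2}."""
    return (h + k) % 3
```

For example, with `z3` as the group operation, multiplying `{0: 1, 1: -1}` by `{0: 1, 1: 1, 2: 1}` gives the empty dictionary: $(1 - g)(1 + g + g^2) = 1 - g^3 = 0$ for a generator g, so a group algebra can have zero divisors. Squaring `{0: 1, 1: 1, 2: 1}` gives coefficient 3 at every group element, i.e. $e^2 = 3e$ for $e = 1 + g + g^2$.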

Theorem 9.1.16 (Maschke) Let F be a field, and G be a finite group such that the characteristic of the field F does not divide the order of the group G. Then F(G) is a semi-simple ring.

Proof Assume that the characteristic of F does not divide the order of the group G. We shall show that every left F(G)-module is semi-simple. Let M be a left F(G)-module and N a submodule of M. Since F is a subfield of F(G), M is a vector space over F, and N is a subspace of M. Hence, there is an F-subspace L of M such that M is the vector space direct sum $N \oplus_F L$ of N and L. Let $p_1$ be the first projection from M to N. Then $p_1$ is an F-homomorphism from M to N such that $p_1(x) = x$ for all $x \in N$. We average $p_1$ to make it an F(G)-homomorphism. Define a map $\tilde{p}_1$ from M to N by

$$\tilde{p}_1(m) = (n \cdot 1)^{-1} \sum_{g \in G} g \cdot p_1(g^{-1} \cdot m),$$

where n is the order of G (note that the characteristic of F does not divide n, and so $n \cdot 1 \neq 0$ in F). Further,

$$\tilde{p}_1(m_1 + m_2) = (n \cdot 1)^{-1} \sum_{g \in G} g \cdot p_1(g^{-1}(m_1 + m_2)) = (n \cdot 1)^{-1} \sum_{g \in G} \left(g \cdot p_1(g^{-1}m_1) + g \cdot p_1(g^{-1}m_2)\right) = \tilde{p}_1(m_1) + \tilde{p}_1(m_2),$$

and

$$\tilde{p}_1(\alpha h \cdot m) = (n \cdot 1)^{-1} \sum_{g \in G} g \cdot p_1(g^{-1}\alpha h m).$$

Putting $x = h^{-1}g$ (so that $g = hx$ and $g^{-1}h = x^{-1}$) in the above equation, and using the F-linearity of $p_1$, we get that

$$\tilde{p}_1(\alpha h m) = (n \cdot 1)^{-1} \alpha h \sum_{x \in G} x \cdot p_1(x^{-1} m) = \alpha h \, \tilde{p}_1(m)$$

for all $\alpha \in F$, $h \in G$, and $m \in M$. This shows that $\tilde{p}_1$ is an F(G)-homomorphism. Also, since N is an F(G)-submodule, for each $x \in N$, $g^{-1}x \in N$, and so $p_1(g^{-1}x) = g^{-1}x$. Hence, for each $x \in N$, we have

$$\tilde{p}_1(x) = (n \cdot 1)^{-1} \sum_{g \in G} g \cdot p_1(g^{-1}x) = (n \cdot 1)^{-1} \sum_{g \in G} gg^{-1}x = (n \cdot 1)^{-1}(n \cdot 1)x = x.$$

Thus, $\tilde{p}_1 \circ i = I_N$, where i is the inclusion map from N to M. Hence, M is the F(G)-direct sum of N and $\operatorname{Ker}\tilde{p}_1$. This completes the proof of the fact that M is a semi-simple F(G)-module.
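The averaging step of the proof can be watched on a small case. The sketch below is ours: it takes F = $\mathbb{Q}$, G = $S_3$ acting on $M = \mathbb{Q}^3$ by permuting coordinates, N the line spanned by (1, 1, 1), and the (arbitrarily chosen) F-complement L spanned by $e_2, e_3$. The resulting projection $p_1(m) = (m_1, m_1, m_1)$ is F-linear but not G-linear. Averaging it as in the proof yields the map sending m to the constant vector whose entries are the mean of the coordinates of m, which is G-linear and restricts to the identity on N. Function names are hypothetical; coordinates are 0-indexed in the code.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S_3; g sends i to g[i]


def act(g, m):
    """Permutation action on Q^3: (g . m)_{g(i)} = m_i."""
    out = [None] * len(m)
    for i, mi in enumerate(m):
        out[g[i]] = mi
    return tuple(out)


def inverse(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)


def p1(m):
    """Projection onto N = span{(1,1,1)} along L = span{e2, e3}; F-linear only."""
    return (m[0], m[0], m[0])


def averaged(p, m):
    """Maschke's average (1/|G|) * sum over g of g . p(g^{-1} . m)."""
    total = [Fraction(0)] * len(m)
    for g in G:
        w = act(g, p(act(inverse(g), m)))
        total = [t + wi for t, wi in zip(total, w)]
    return tuple(t / len(G) for t in total)
```

For m = (1, 2, 6), `averaged(p1, m)` is (3, 3, 3), whereas $p_1$ itself does not commute with the transposition swapping the first two coordinates. Here $|G| = 6$ is invertible in $\mathbb{Q}$, which is exactly what the hypothesis on the characteristic guarantees.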

Our next aim is to determine the structure of a semi-simple ring. Let

$$M = M_1 \oplus M_2 \oplus \cdots \oplus M_n$$

and

$$N = N_1 \oplus N_2 \oplus \cdots \oplus N_m$$

be direct sum decompositions of the left R-modules M and N. Let $\mathcal{M}_{MN}$ denote the set of $m \times n$ matrices $[\phi_{sr}]$, where $\phi_{sr} \in Hom_R(M_r, N_s)$. For $f \in Hom_R(M, N)$, let $f_{sr}$ denote the homomorphism $p_s \circ f \circ i_r$ from $M_r$ to $N_s$, where $i_r$ is the natural inclusion of $M_r$ in M, and $p_s$ the sth projection of N onto $N_s$. It is easy to verify that the map $\eta_{MN}$ from $Hom_R(M, N)$ to $\mathcal{M}_{MN}$ defined by

$$\eta_{MN}(f) = [f_{sr}]$$

is a bijective map. In fact, every $x \in M$ is uniquely expressed as $x_1 + x_2 + \cdots + x_n$, where $x_i \in M_i$, and then the component of f(x) in $N_s$ is $\sum_{r=1}^{n} f_{sr}(x_r)$. In matrix form, this is expressed by

$$f(x) = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & & \vdots \\ f_{m1} & f_{m2} & \cdots & f_{mn} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Further, let $\mathcal{M}_{NL}$ denote the set of matrices with respect to the above given direct sum decomposition of N and a direct sum decomposition of L. Then matrix multiplication induces an external multiplication from $\mathcal{M}_{NL} \times \mathcal{M}_{MN}$ to $\mathcal{M}_{ML}$ (multiplication of entries being composition of maps). It is also easy to observe that $\eta_{ML}(g \circ f) = \eta_{NL}(g) \cdot \eta_{MN}(f)$. In particular, if L is a left R-module with a given direct sum decomposition, and $\mathcal{M}_{LL}$ denotes the corresponding set of matrices, then $\mathcal{M}_{LL}$ is a ring with respect to matrix addition and multiplication. Clearly, this ring is isomorphic to the ring $End_R(L)$. Let us denote by $M^n$ the direct sum of n copies of M.

Proposition 9.1.17 Let $M = M_1^{n_1} \oplus M_2^{n_2} \oplus \cdots \oplus M_r^{n_r}$, where $\{M_1, M_2, \ldots, M_r\}$ is a set of pairwise nonisomorphic simple left R-modules. Then the ring $End_R(M)$ is isomorphic to the direct product

$$M_{n_1}(D_1) \times M_{n_2}(D_2) \times \cdots \times M_{n_r}(D_r),$$

where $D_i = End_R(M_i)$ is a division ring, and $M_{n_i}(D_i)$ is the ring of $n_i \times n_i$ matrices with entries in $D_i$.

Proof Since each $M_i$ is a simple left R-module, by Schur's Lemma $End_R(M_i)$ is a division ring, and for $i \neq j$, $Hom_R(M_i, M_j) = \{0\}$. It follows from the discussion prior to the proposition that $End_R(M)$ is isomorphic to the ring of block diagonal matrices of the type

$$\begin{bmatrix} A_1 & 0 & \cdots & 0 & 0 \\ 0 & A_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & A_{r-1} & 0 \\ 0 & 0 & \cdots & 0 & A_r \end{bmatrix},$$

where $A_i \in M_{n_i}(D_i)$. The map which associates to each matrix of the above type the tuple $(A_1, A_2, \ldots, A_r)$ defines an isomorphism from $End_R(M)$ to $M_{n_1}(D_1) \times M_{n_2}(D_2) \times \cdots \times M_{n_r}(D_r)$.

Proposition 9.1.18 Let R be a ring with identity. Let a ∈ R. Then the map fa from R to R defined by fa(x) = x · a is a member of EndR(R, +), where (R, +) is treated as a left R-module. Further, the map f from R to EndR(R, +) defined by f (a) = fa is an anti-isomorphism.

Proof $f_a(x + y) = (x + y) \cdot a = x \cdot a + y \cdot a = f_a(x) + f_a(y)$. Also, $f_a(b \cdot x) = b \cdot x \cdot a = b \cdot f_a(x)$. Thus, $f_a \in End_R(R, +)$. Next, $f(a + b)(x) = f_{a+b}(x) = x \cdot (a + b) = x \cdot a + x \cdot b = f_a(x) + f_b(x) = (f(a) + f(b))(x)$ for all $a, b, x \in R$. Thus, $f(a + b) = f(a) + f(b)$. Next, $f_{ab}(x) = x \cdot ab = f_b(f_a(x))$ for all $a, b, x \in R$. This shows that $f(ab) = f(b) \circ f(a)$ for all $a, b \in R$, and so f is an anti-homomorphism. Suppose that f(a) = f(b). Then $a = 1 \cdot a = f_a(1) = f_b(1) = 1 \cdot b = b$. Thus, f is an injective anti-homomorphism. Further, given any $\phi \in End_R(R, +)$, we have $\phi(x) = \phi(x \cdot 1) = x \cdot \phi(1) = f_{\phi(1)}(x)$ for all $x \in R$, so that $\phi = f(\phi(1))$. This shows that f is also surjective.
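The order reversal $f_{ab} = f_b \circ f_a$ is easy to see on a noncommutative example. The following sketch (ours) takes R = $M_2(\mathbb{Z})$, with matrices stored as nested tuples; `right_mult(a)` plays the role of $f_a$, and the sample matrices are arbitrary choices.

```python
def mul(x, y):
    """Product of 2 x 2 integer matrices stored as nested tuples."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def right_mult(a):
    """f_a : R -> R, x |-> x * a, for R = M_2(Z)."""
    return lambda x: mul(x, a)


ONE = ((1, 0), (0, 1))
a = ((1, 1), (0, 1))
b = ((0, 1), (1, 0))
x = ((1, 2), (3, 4))
```

With these values, `right_mult(mul(a, b))(x)` equals `right_mult(b)(right_mult(a)(x))`, while composing in the other order gives a different matrix, since ab ≠ ba. Also `right_mult(a)(mul(b, x))` equals `mul(b, right_mult(a)(x))`, so $f_a$ is left R-linear, and `right_mult(a)(ONE)` is a itself, which is the identity $a = f_a(1)$ used for injectivity.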

Theorem 9.1.19 Let R be a left semi-simple ring. Then there are only finitely many nonisomorphic simple left R-modules, each isomorphic to a simple left ideal of R. Let $\{M_1, M_2, \ldots, M_r\}$ be a set of pairwise nonisomorphic simple left R-modules such that each simple left R-module is isomorphic to $M_i$ for some i. Let $D_i$ denote the division ring $End_R(M_i)$, and $D_i^{op}$ its opposite ring (again a division ring). Then there exist positive integers $n_1, n_2, \ldots, n_r$ such that the ring R is isomorphic to the ring

$$M_{n_1}(D_1^{op}) \times M_{n_2}(D_2^{op}) \times \cdots \times M_{n_r}(D_r^{op}).$$

Proof Let R be a left semi-simple ring, and let M be a nonzero simple left R-module. Then there is a maximal left ideal A of R such that R/A is isomorphic to M (Proposition 9.1.3). Since R is left semi-simple, there is a left ideal B of R such that $R = A \oplus B$. But then B, being isomorphic to R/A, is a simple (minimal nonzero) left ideal, and it is isomorphic to M. Since R is a left semi-simple ring, it is a direct sum of simple left ideals. Suppose that

$$R = \bigoplus_{\alpha \in \Lambda} A_\alpha,$$

where $A_\alpha$ is a simple left ideal for each $\alpha \in \Lambda$. Then $1 \in R$ can be uniquely expressed as

$$1 = e_{\alpha_1} + e_{\alpha_2} + \cdots + e_{\alpha_t},$$

where $\alpha_i \in \Lambda$, and each $e_{\alpha_i} \neq 0$ is a member of $A_{\alpha_i}$. Since each $A_{\alpha_i}$ is a simple left ideal, $Re_{\alpha_i} = A_{\alpha_i}$. Since $x = x \cdot 1 = \sum_i xe_{\alpha_i}$ for every $x \in R$,

$$R = Re_{\alpha_1} + Re_{\alpha_2} + \cdots + Re_{\alpha_t} = A_{\alpha_1} \oplus A_{\alpha_2} \oplus \cdots \oplus A_{\alpha_t}.$$

In turn, R is a direct sum of finitely many simple left ideals. Suppose that

$$R = A_1 \oplus A_2 \oplus \cdots \oplus A_t,$$

where each $A_i$ is a simple left ideal of R. We show that any nonzero simple left module is isomorphic to $A_i$ for some i. Let M be a nonzero simple left R-module. Then $\{0\} \neq M = RM$. Hence $A_iM \neq \{0\}$ for some i. Since $A_iM$ is a nonzero submodule of M and M is simple, it follows that $A_iM = M$. In turn, $A_ix \neq \{0\}$ for some $x \in M$. Since $A_ix$ is also a submodule of M, we have $A_ix = M$. Define a map f from $A_i$ to M by $f(a) = a \cdot x$. Then f is clearly a surjective R-homomorphism. Since $A_i$ is simple and $\operatorname{Ker} f \neq A_i$, it follows that $\operatorname{Ker} f = \{0\}$. Thus, f is an isomorphism from $A_i$ to M. This shows that every simple left R-module is isomorphic to $A_i$ for some i, and so there are only finitely many nonisomorphic simple left R-modules. Let $\{M_1, M_2, \ldots, M_r\}$ be a set of pairwise nonisomorphic simple left R-modules such that every simple left R-module is isomorphic to $M_i$ for some i. Suppose that $n_i$ of $A_1, A_2, \ldots, A_t$ are isomorphic to $M_i$. Then, as a left R-module, R is isomorphic to

$$M_1^{n_1} \oplus M_2^{n_2} \oplus \cdots \oplus M_r^{n_r}.$$

Thus, by Proposition 9.1.17, $End_R(R, +)$ is isomorphic to

$$M_{n_1}(D_1) \times M_{n_2}(D_2) \times \cdots \times M_{n_r}(D_r).$$

The map $A \mapsto A^t$ (transpose) defines an anti-isomorphism from $M_{n_i}(D_i)$ onto $M_{n_i}(D_i^{op})$. Thus, the above ring is anti-isomorphic to $M_{n_1}(D_1^{op}) \times M_{n_2}(D_2^{op}) \times \cdots \times M_{n_r}(D_r^{op})$. Further, we have seen (Proposition 9.1.18) that R is anti-isomorphic to $End_R(R, +)$. Since the composition of two anti-isomorphisms is an isomorphism, the result follows.

Corollary 9.1.20 The opposite ring of a left semi-simple ring is left semi-simple. Indeed, with the notation of Theorem 9.1.19, the transpose map shows that $R^{op}$ is isomorphic to $M_{n_1}(D_1) \times M_{n_2}(D_2) \times \cdots \times M_{n_r}(D_r)$, which is left semi-simple by Corollary 9.1.15.

It is clear that R is left semi-simple if and only if the opposite ring of R is right semi-simple. Thus, we have the following corollary. Corollary 9.1.21 Left semi-simple rings and right semi-simple rings are same. 

From now onward, we shall simply write semi-simple ring instead of left semi-simple or right semi-simple ring.

Corollary 9.1.22 A semi-simple ring is commutative if and only if it is a direct product of fields.

Proof The result follows if we observe that Mn(D) is commutative if and only if D is a field, and n = 1. 

Let M be a left R-module, and $S = End_R(M)$. Then M is also a left S-module with respect to the product · defined by $f \cdot x = f(x)$.

Theorem 9.1.23 (Jacobson Density Theorem) Let M be a simple left R-module and $S = End_R(M)$. Let $f \in End_S(M)$, and $x_1, x_2, \ldots, x_r \in M$. Then there exists $a \in R$ such that $f(x_i) = ax_i$ for all i.

Proof We first prove the result for r = 1 and for a semi-simple left R-module M. Assume that M is semi-simple. Then there is a submodule N of M such that $M = Rx_1 \oplus N$. The first projection $p_1$ is a member of $S = End_R(M)$. Since f is S-linear, $f(x_1) = f(p_1(x_1)) = f(p_1 \cdot x_1) = p_1 \cdot f(x_1) = p_1(f(x_1))$. Thus, $f(x_1) \in Rx_1$, and so $f(x_1) = ax_1$ for some $a \in R$. Next, assuming that M is simple, we prove the result for arbitrary r. Consider the map $f^r$ from $M^r$ to $M^r$ defined by

$$f^r(u_1, u_2, \ldots, u_r) = (f(u_1), f(u_2), \ldots, f(u_r)).$$

Clearly, $S' = End_R(M^r) = M_r(S)$ is the ring of $r \times r$ matrices with entries in S. Now, $f^r$ preserves addition, and

$$f^r(h \cdot (u_1, u_2, \ldots, u_r)) = f^r\left(\sum_{j=1}^{r} h_{1j} \cdot u_j, \sum_{j=1}^{r} h_{2j} \cdot u_j, \ldots, \sum_{j=1}^{r} h_{rj} \cdot u_j\right),$$

where $h = [h_{ij}] \in S'$. Applying the definition of $f^r$, and observing that $f \in End_S(M)$, it follows that the above is the same as $h \cdot f^r(u_1, u_2, \ldots, u_r)$. This shows that $f^r \in End_{S'}(M^r)$. Since $M^r$ is semi-simple, the case r = 1, applied to $M^r$ and the element $(x_1, x_2, \ldots, x_r) \in M^r$, gives an $a \in R$ such that $f^r(x_1, x_2, \ldots, x_r) = a \cdot (x_1, x_2, \ldots, x_r)$. This shows that $f(x_i) = ax_i$ for all i.

Remark 9.1.24 The term density theorem for the above result is justified in the following sense. Give M the discrete topology. The set $M^M$ of all maps from M to M can be considered as the product of M copies of M. Give $M^M$ the product topology. $End_S(M)$ is a subset of $M^M$; give it the subspace topology. Let $a \in R$. The map $f_a$ from M to M defined by $f_a(x) = a \cdot x$ is easily seen to be a member of $End_S(M)$, and the map f from R to $End_S(M)$ defined by $f(a) = f_a$ is a ring homomorphism. The Jacobson density theorem can be restated by saying that f(R) is dense in $End_S(M)$ (justify).

Corollary 9.1.25 (Burnside) Let V be a finite-dimensional vector space over an algebraically closed field F. Then V is a simple $End_F(V)$-module, and it is not simple over any proper sub-algebra of $End_F(V)$.

Proof Let v and w be nonzero members of V. Then there is a member T of $End_F(V)$ such that T(v) = w. This says that V is a simple left $End_F(V)$-module. Now, let R be a sub-algebra of $End_F(V)$ such that V is a simple left R-module. We show that $R = End_F(V)$. Since V is a simple left R-module, $End_R(V)$ is a division ring. Let $a \in F$. The map $f_a$ from V to V defined by $f_a(v) = a \cdot v$ is a member of $End_R(V)$, for $f_a(g \cdot v) = a \cdot g(v) = g(a \cdot v) = g \cdot f_a(v)$ for all $g \in R$. The map f from F to $End_R(V)$ defined by $f(a) = f_a$ is an embedding of F into $End_R(V)$, for $f_a = f_b$ implies that $a \cdot v = b \cdot v$ for all $v \in V$, and this, in turn, implies that a = b. Also, if $h \in End_R(V)$, then $(f_a \circ h)(v) = a \cdot h(v) = h(a \cdot v) = (h \circ f_a)(v)$ for all $a \in F$, because the scalar map $a \cdot I_V$ belongs to the sub-algebra R. This shows that F is embedded as a subfield of $End_R(V)$ contained in its center. We identify the embedded subfield with F; more precisely, we identify a and $f_a$. Let $h \in End_R(V)$, and let F(h) denote the subfield of $End_R(V)$ generated by F and h (note that h commutes with each element of F). Further, observe that $End_R(V)$ is an F-subspace of $End_F(V)$, which is finite dimensional. Hence, F(h) is also finite dimensional over F. Thus, F(h) is a finite field extension of F. Since F is algebraically closed, it follows that $h \in F$. This shows that $F = End_R(V)$. Let $\{v_1, v_2, \ldots, v_r\}$ be a basis of V over F, and let $T \in End_F(V) = End_{End_R(V)}(V)$. By the Jacobson density theorem, there is an $h \in R$ such that $T(v_i) = h(v_i)$ for all i, and so T = h. This shows that $R = End_F(V)$.

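Let $F$ be a field, and $G$ be a finite group. Let $V$ be a finite-dimensional vector space over $F$ which is also an $F(G)$-module. Then the map $f$ from $F(G)$ to $\operatorname{End}_F(V)$ defined by $f\left(\sum_{g\in G} \alpha_g g\right)(v) = \left(\sum_{g\in G} \alpha_g g\right)\cdot v$ is an algebra homomorphism (injective precisely when $V$ is a faithful $F(G)$-module). Thus, the image of $F(G)$ is a subalgebra of $\operatorname{End}_F(V)$, and $V$ is simple over $F(G)$ if and only if it is simple over this image. The following corollary is a restatement of the above corollary in this situation.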

Corollary 9.1.26 Let $F$ be an algebraically closed field, and $G$ be a finite group. Let $V$ be a simple left $F(G)$-module which, as an $F$-space, is of dimension $n$. Then the set $\{f_g \mid g \in G\}$ spans $\operatorname{End}_F(V)$ as an $F$-space, where $f_g$ is the linear transformation given by $f_g(v) = g \cdot v$. In particular, $G$ contains at least $n^2$ elements, and there is a subset $S$ of $G$ containing $n^2$ elements such that $\{f_g \mid g \in S\}$ is a basis of $\operatorname{End}_F(V)$. ∎
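A quick computational check of Corollary 9.1.26, offered as a sketch: take $G = S_3$ acting on $W = \{x \in \mathbb{Q}^3 : x_1 + x_2 + x_3 = 0\}$ by permuting coordinates (the two-dimensional representation studied in Example 9.2.9). Working over $\mathbb{Q}$ instead of an algebraically closed field is an assumption made for exact arithmetic; the rank of a set of rational matrices does not change over an extension field. The helper names `act`, `rho3` and `rank` are illustrative. The six maps $f_g$ should span the $4$-dimensional space $\operatorname{End}(W)$.

```python
from fractions import Fraction
from itertools import permutations

# Basis of W = {x in Q^3 : x1 + x2 + x3 = 0}: w1 = e1 - e2, w2 = e2 - e3.
W_BASIS = ([1, -1, 0], [0, 1, -1])


def act(p, x):
    """p . (x1 e1 + x2 e2 + x3 e3) = sum x_i e_{p(i)}, where p[i] is the image of i."""
    y = [0] * len(x)
    for i, xi in enumerate(x):
        y[p[i]] += xi
    return y


def rho3(p):
    """Matrix of x -> p . x on W with respect to (w1, w2)."""
    # x = c1*w1 + c2*w2 = (c1, c2 - c1, -c2), hence c1 = x1 and c2 = -x3.
    cols = [(y[0], -y[2]) for y in (act(p, w) for w in W_BASIS)]
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def rank(rows):
    """Rank over Q via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


G = list(permutations(range(3)))
flat = [[a for row in rho3(p) for a in row] for p in G]  # each f_g as a vector in Q^4
print(rank(flat))  # 4 = 2**2, so {f_g} spans End(W)
```

Since $|S_3| = 6 \ge 2^2$, the count in the corollary is consistent, and four of the six permutations already give a basis.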

Let $F$ be a field, and $G$ be a subgroup of the general linear group $GL(n, F)$. Then every element $\sum_{A\in G} \alpha_A A$ of $F(G)$ can also be viewed as a member of $M_n(F) = \operatorname{End}_F(F^n)$. In other words, we have an algebra homomorphism from $F(G)$ to $M_n(F)$. This makes $F^n$ a left $F(G)$-module. We say that $G$ is an irreducible subgroup if this $F(G)$-module is simple. This amounts to saying that, given $v \neq 0$ in $F^n$ and any $w \in F^n$, there is an element $\sum_{A\in G} \alpha_A A$ in $F(G)$ such that $\left(\sum_{A\in G} \alpha_A A\right)\cdot v = w$, or equivalently, that the subspace spanned by $\{A \cdot v \mid A \in G\}$ is $F^n$. Another way to express this is to say that $F^n$ has no nontrivial proper $G$-invariant subspace.

Corollary 9.1.27 Let $F$ be an algebraically closed field, and $G$ be an irreducible subgroup of $GL(n, F)$. Suppose further that the set $\{\operatorname{Tr}(A) \mid A \in G\}$ is finite and contains $m$ elements. Then $G$ is finite and contains at most $m^{n^2}$ elements.

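Proof It follows from the above corollary that the $F$-subspace of $M_n(F)$ spanned by $G$ is $M_n(F)$. We can therefore choose a basis of $M_n(F)$ consisting of elements of $G$. Let $\{A^1, A^2, \ldots, A^{n^2}\}$ be such a basis, where $A^p = [a^p_{ij}] \in G$ for all $p \le n^2$. Let $A = [a_{ij}]$ be an arbitrary element of $G$, and denote the trace of $A^p A$ by $\alpha^p_A$. Thus,

$$\sum_{i,j=1}^{n} a^p_{ij}\, a_{ji} = \alpha^p_A, \qquad 1 \le p \le n^2.$$

This shows that the entries $a_{ji}$ of $A$ are a solution to the system of $n^2$ linear equations

$$\sum_{i,j=1}^{n} a^p_{ij}\, x_{ji} = \alpha^p_A, \qquad 1 \le p \le n^2,$$

in the $n^2$ unknowns $x_{ji}$. Since $\{A^1, A^2, \ldots, A^{n^2}\}$ is a basis of $M_n(F)$, this system has a unique solution: if $X = [x_{ij}]$ satisfies $\operatorname{Tr}(A^p X) = 0$ for all $p$, then $\operatorname{Tr}(YX) = 0$ for all $Y \in M_n(F)$, and taking $Y = E_{ji}$ (the matrix with $1$ in position $(j, i)$ and $0$ elsewhere) gives $x_{ij} = 0$ for all $i, j$. Hence, $A$ is uniquely determined by the traces $\alpha^p_A$ of $A^p A$, $1 \le p \le n^2$. Since each $A^p A$ lies in $G$ and $G$ has at most $m$ distinct traces, we have at most $m$ choices for $\alpha^p_A$ for each $p$. The choices for $A$, therefore, are at most $m^{n^2}$. This shows that $G$ contains at most $m^{n^2}$ elements. ∎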
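The proof recovers $A$ from the traces $\operatorname{Tr}(A^p A)$ by solving an $n^2 \times n^2$ linear system. The sketch below replays this for the six $2 \times 2$ rational matrices of the two-dimensional representation of $S_3$, listed by hand in the basis $e_1 - e_2$, $e_2 - e_3$ (this example and the helper names are assumptions for illustration). Here $m = 3$ traces occur, and the first four matrices serve as the basis $\{A^1, \ldots, A^4\}$. The unknown $x_{rc}$ has coefficient $a^p_{cr}$ in $\operatorname{Tr}(A^p X)$, which is what `coeffs` encodes.

```python
from fractions import Fraction

# The six matrices of the 2-dimensional representation of S3 (over Q).
G = [
    [[1, 0], [0, 1]], [[-1, 1], [0, 1]], [[1, 0], [1, -1]],
    [[0, -1], [1, -1]], [[-1, 1], [-1, 0]], [[0, -1], [-1, 0]],
]
BASIS = G[:4]  # four elements of G that form a basis of M_2(Q)


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def solve(M, b):
    """Solve M x = b over Q by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * p for a, p in zip(aug[i], aug[c])]
    return [row[n] for row in aug]


def recover_from_traces(X, basis=BASIS):
    """Rebuild X from the numbers alpha_p = Tr(A^p X), as in the proof."""
    n = len(X)
    alphas = [trace(mat_mul(Ap, X)) for Ap in basis]
    # Tr(A^p X) = sum_{i,j} a^p_{ij} x_{ji}: unknown x_{rc} (row-major) gets coefficient a^p_{cr}.
    coeffs = [[Ap[c][r] for r in range(n) for c in range(n)] for Ap in basis]
    x = solve(coeffs, alphas)
    return [x[r * n:(r + 1) * n] for r in range(n)]


print(sorted({trace(X) for X in G}))  # [-1, 0, 2]: the m = 3 traces
```

Every element of $G$ is reconstructed from its four traces, and $|G| = 6$ is well below the bound $m^{n^2} = 3^4 = 81$.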

Corollary 9.1.28 Let G be a subgroup of GL(n, F) with finitely many conjugacy classes, where F is an arbitrary field. Then G is finite.

Proof Since $GL(n, F)$ is a subgroup of $GL(n, \overline{F})$, where $\overline{F}$ is the algebraic closure of $F$, we may assume that $F$ is algebraically closed. We may assume that $G$ is irreducible. Since $G$ has only finitely many conjugacy classes, and conjugate elements have the same trace, there are only finitely many traces of elements of $G$. The result follows from the above corollary. ∎

We give a few applications of the above results to linear groups.

Theorem 9.1.29 (Burnside) Let $F$ be a field of characteristic $0$. Then every subgroup of $GL(n, F)$ of finite exponent is finite. Indeed, if $G$ is of exponent $m$, then it contains at most $m^{n^3}$ elements.

Proof Clearly, $GL(n, F)$ is a subgroup of $GL(n, \overline{F})$, where $\overline{F}$ is the algebraic closure of $F$. Thus, there is no loss in assuming that $F$ is algebraically closed. The proof is by induction on $n$. If $n = 1$, then $G$ is a subgroup of $F^{\times}$ of finite exponent $m$. Since the equation $X^m = 1$ has at most $m$ solutions in $F$, the order of a subgroup of $F^{\times}$ of exponent $m$ is at most $m = m^{1^3}$. Assume that the result is true for subgroups of $GL(r, F)$, where $r < n$. We prove it for subgroups of $GL(n, F)$. Let $G$ be a subgroup of $GL(n, F)$ of finite exponent $m$. Suppose first that $G$ is irreducible. Since $A^m = I$ for all $A \in G$, all eigenvalues of elements of $G$ are $m$th roots of $1$. Since the trace of a matrix is the sum of its $n$ eigenvalues (counted with multiplicity), there are at most $m^n$ traces of members of $G$. It follows from Corollary 9.1.27 that $G$ is of order at most $(m^n)^{n^2} = m^{n^3}$. Now, suppose that $G$ is reducible. Then there is a nontrivial proper subspace $W$ of $F^n$ which is invariant under multiplication by elements of $G$. Clearly, $\dim W = s < n$ and $\dim F^n/W = t < n$. We have a homomorphism $\rho$ from $G$ to $GL(W)$ defined by $\rho(A) = A|_W$, where $A|_W$ is the restriction to $W$ of multiplication by $A$ on $F^n$. Then $\rho(G)$ is a subgroup of $GL(W)$ whose exponent divides $m$. By the induction hypothesis, $\rho(G)$ is finite of order at most $m^{s^3}$. Let $H_1$ be the kernel of $\rho$. Then, by the fundamental theorem of homomorphism, $H_1$ is a normal subgroup of $G$ of index at most $m^{s^3}$. Clearly, $H_1 = \{A \in G \mid A \cdot w = w \text{ for all } w \in W\}$. Next, since $W$ is invariant under multiplication by members of $G$, we have a homomorphism $\eta$ from $G$ to $GL(F^n/W)$ defined by $\eta(A)(v + W) = A \cdot v + W$. By the induction hypothesis, $\eta(G)$, being of exponent dividing $m$, is finite of order at most $m^{t^3}$. Let $H_2$ be the kernel of $\eta$. Then $H_2$ is also a normal subgroup of $G$ of index at most $m^{t^3}$. Let $H = H_1 \cap H_2$. Then $H$ is also normal, of index at most $m^{s^3} \cdot m^{t^3} = m^{s^3 + t^3}$.
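Since $s + t = n$ and $s^3 + t^3 \le (s + t)^3$, it follows that $G/H$ is of order at most $m^{n^3}$. Also, $H$ acts trivially on $W$ as well as on $F^n/W$. Hence, we can find a basis of $F^n$ with respect to which the matrices of all elements of $H$ are upper triangular with all diagonal entries equal to $1$; in fact, each has the block form $\begin{pmatrix} I_s & B \\ 0 & I_t \end{pmatrix}$, whose $k$th power is $\begin{pmatrix} I_s & kB \\ 0 & I_t \end{pmatrix}$. Since $F$ is of characteristic $0$, $kB \neq 0$ for all $k \ge 1$ whenever $B \neq 0$, so every nonidentity element of $H$ has infinite order. As $H$ has finite exponent, $H$ is trivial. This shows that $G$ is of order at most $m^{n^3}$. ∎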
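The last step of this proof uses characteristic $0$ through the identity $\begin{pmatrix} I_s & B \\ 0 & I_t \end{pmatrix}^k = \begin{pmatrix} I_s & kB \\ 0 & I_t \end{pmatrix}$. The sketch below takes the smallest case $s = t = 1$ (an assumption for brevity, with $B = b$ a scalar) and contrasts integer arithmetic with arithmetic modulo a prime $p$, where $U^p = I$, so the argument breaks down in characteristic $p$. The helper name `unitri_power` is hypothetical.

```python
def unitri_power(b, k, p=None):
    """Return U**k for U = [[1, b], [0, 1]]; reduce the entries mod p when p is given."""
    U = [[1, b], [0, 1]]
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = [[sum(result[i][t] * U[t][j] for t in range(2)) for j in range(2)]
                  for i in range(2)]
        if p is not None:
            result = [[v % p for v in row] for row in result]
    return result


print(unitri_power(1, 5))       # [[1, 5], [0, 1]]: never the identity over Q
print(unitri_power(1, 5, p=5))  # [[1, 0], [0, 1]]: order 5 over Z_5
```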

The group $GL(n, F)$ can be thought of as a subgroup of $GL(n + 1, F)$ by identifying an $n \times n$ matrix $A$ with

$$\begin{pmatrix} A & 0_{n,1} \\ 0_{1,n} & 1 \end{pmatrix}.$$

The union of the chain

$$GL(1, F) \subset GL(2, F) \subset \cdots \subset GL(n, F) \subset GL(n + 1, F) \subset \cdots$$

is a group denoted by $GL(F)$. A subgroup of $GL(F)$ is called a linear group. Every finitely generated linear group is a subgroup of $GL(n, F)$ for sufficiently large $n$, since each of its finitely many generators lies in some $GL(n_i, F)$. Thus, we have the following corollary.

Corollary 9.1.30 A finitely generated linear group over a field of characteristic $0$ is finite if and only if it is of finite exponent. ∎

Burnside conjectured that all finitely generated groups of finite exponent are finite. This conjecture turns out to be false. In fact, there are uncountably many 2-generator infinite simple groups all of whose nontrivial proper subgroups are cyclic groups of the same prime order $p$ (for $p$ sufficiently large). As such, a modified conjecture, known as the restricted Burnside conjecture, was framed. The restricted Burnside conjecture asserts that for all $n$ and $r$, there is a finite group $RB(n, r)$ of exponent $r$, generated by $n$ elements, such that every $n$-generator finite group of exponent $r$ is a quotient of $RB(n, r)$. This conjecture was finally settled by Zelmanov, who was awarded the Fields Medal in 1994 for this work.

Theorem 9.1.31 (Schur) Every torsion subgroup of $GL(n, \mathbb{Q})$ is finite. In fact, there is a function $f$ from $\mathbb{N}$ to $\mathbb{N}$ such that the order of every torsion subgroup of $GL(n, \mathbb{Q})$ is at most $f(n)$.

Proof It is sufficient to find a function $g$ on $\mathbb{N}$ such that the order of each element of finite order in $GL(n, \mathbb{Q})$ divides $g(n)$. Then every torsion subgroup of $GL(n, \mathbb{Q})$ has exponent dividing $g(n)$, and so, by the above theorem of Burnside, its order is at most $f(n) = g(n)^{n^3}$. Let $x \in GL(n, \mathbb{Q})$ be an element of finite order $m$, and let $G = \langle x \rangle$. First, suppose that $G$ is irreducible, that is, $\mathbb{Q}^n$ is a simple $\mathbb{Q}(G)$-module. Then $D = \operatorname{End}_{\mathbb{Q}(G)}(\mathbb{Q}^n)$ is a division algebra over $\mathbb{Q}$. Since $x$ commutes with every element of $G$, multiplication by $x$ belongs to $D$, and the subring $\mathbb{Q}[x]$ of $D$ is commutative and finite dimensional over $\mathbb{Q}$, hence a field. In this field $x$ has multiplicative order $m$, so $x$ is a primitive $m$th root of unity and hence a root of the cyclotomic polynomial $\Phi_m(X)$. Since $\Phi_m(X)$ is irreducible of degree $\varphi(m)$ over $\mathbb{Q}$, where $\varphi$ is Euler's phi function, it is the minimal polynomial of $x$. By the Cayley-Hamilton theorem, the minimal polynomial of $x$ has degree at most $n$, and so $\varphi(m) \le n$. In general, since $\mathbb{Q}$ is of characteristic $0$, Maschke's theorem gives a decomposition $\mathbb{Q}^n = V_1 \oplus V_2 \oplus \cdots \oplus V_k$ into simple $\mathbb{Q}(G)$-submodules. If $m_i$ is the order of the restriction of $x$ to $V_i$, the argument above shows that $\varphi(m_i) \le \dim V_i \le n$, and $m$ is the least common multiple of $m_1, m_2, \ldots, m_k$. Since $\varphi(d) \to \infty$ as $d \to \infty$, only finitely many $d$ satisfy $\varphi(d) \le n$, and we may take $g(n)$ to be the least common multiple of all such $d$. Then $m$ divides $g(n)$. This completes the proof of the theorem. ∎
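The proof only needs some function $g$ bounding the orders of elements of finite order in $GL(n, \mathbb{Q})$. One explicit choice, used here as an illustrative assumption, is the least common multiple of all $d$ with $\varphi(d) \le n$. The search below stops at $d \le 2n^2$, which is justified by the elementary bound $\varphi(d) \ge \sqrt{d/2}$. For $n = 2$ the admissible $d$ are $1, 2, 3, 4, 6$, matching the well-known list of possible orders of elements of finite order in $GL(2, \mathbb{Q})$. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def phi(d):
    """Euler's phi function by trial division."""
    result, m, q = d, d, 2
    while q * q <= m:
        if m % q == 0:
            while m % q == 0:
                m //= q
            result -= result // q
        q += 1
    if m > 1:
        result -= result // m
    return result


def admissible_orders(n):
    """All d with phi(d) <= n; the bound d <= 2 n^2 follows from phi(d) >= sqrt(d / 2)."""
    return [d for d in range(1, 2 * n * n + 1) if phi(d) <= n]


def g(n):
    """lcm of all d with phi(d) <= n."""
    return reduce(lambda a, b: a * b // gcd(a, b), admissible_orders(n), 1)


print(admissible_orders(2), g(2))  # [1, 2, 3, 4, 6] 12
```

The resulting bound $f(n) = g(n)^{n^3}$ is far from sharp; it only illustrates that such a function exists.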

A matrix $A$ in $M_n(F)$ is called unipotent if all its characteristic roots are $1$. Clearly, unipotent matrices are nonsingular.

Proposition 9.1.32 Let F be an algebraically closed field. Every subgroup G of GL(n, F) consisting of unipotent matrices is conjugate to a subgroup of the subgroup U(n, F) of uni-upper triangular matrices.

Proof To say that $G$ is conjugate to a subgroup of $U(n, F)$ is to say that there is a basis of $F^n$ such that the matrices, with respect to this basis, of the linear transformations of $F^n$ given by multiplication by elements of $G$ are uni-upper triangular. The proof is by induction on $n$. If $n = 1$, then there is nothing to do. Assume that the result is true for all groups of unipotent transformations in $GL(r, F)$ for $r < n$. We prove it for a subgroup $G$ of $GL(n, F)$. Suppose that $G$ is irreducible. Then, since each member of $G$ is unipotent, the trace of each member of $G$ is $n$. Hence, by Corollary 9.1.27 with $m = 1$, $G$ contains at most $1^{n^2} = 1$ element. But then $G$ is the trivial group, and there is nothing to prove. Assume that $G$ is not irreducible. Then there is a nontrivial proper subspace $W$ of $F^n$ which is invariant under multiplication by elements of $G$. This gives a homomorphism $\rho$ from $G$ to $GL(W)$ defined by $\rho(A) = A|_W$, where $A|_W$ is the restriction of multiplication by $A$ to $W$. Clearly, $\rho(G)$ consists of unipotent transformations in $GL(W)$. By the induction hypothesis, we can find a basis $\{w_1, w_2, \ldots, w_s\}$ of $W$ such that the matrix of each element of $\rho(G)$ with respect to this basis is uni-upper triangular. Also, elements of $G$ induce unipotent transformations on $F^n/W$. This gives us a homomorphism $\eta$ from $G$ to $GL(F^n/W)$ such that $\eta(G)$ is a group of unipotent transformations. By the induction hypothesis, we can find a basis $\{v_1 + W, v_2 + W, \ldots, v_t + W\}$ of $F^n/W$ so that the matrix of each member of $\eta(G)$ with respect to this basis is uni-upper triangular. Clearly, $\{w_1, w_2, \ldots, w_s, v_1, v_2, \ldots, v_t\}$ is a basis of $F^n$ with respect to which all members of $G$ are uni-upper triangular. ∎

Exercises

9.1.1 Describe all semi-simple modules over a P.I.D.

9.1.2 Describe all simple modules over R[X].

9.1.3 What are integral domains which are semi-simple rings?

9.1.4 Let $G$ be a cyclic group of order $p$ generated by $a$. Show that $\mathbb{Z}_p(G)$ is not semi-simple. Hint. $\mathbb{Z}_p^2$ is a $\mathbb{Z}_p$-vector space. It is a $G$-module with respect to the multiplication defined by $a^i \cdot (u, v) = (u + iv, v)$. Show that $\mathbb{Z}_p \times \{0\}$ is a $\mathbb{Z}_p(G)$-submodule but it is not a direct summand.

9.1.5 Let $G$ be a finite $p$-group. Show that if a $\mathbb{Z}_p(G)$-module $M$ is simple, then it is of dimension $1$ over $\mathbb{Z}_p$.

9.1.6 Show that the theorem of Schur is not true in $GL(n, \mathbb{C})$. Is it true in $GL(n, \mathbb{R})$?

9.1.7 Generalize the last result of the section for arbitrary fields.

9.1.8 Let $V$ be a vector space of dimension $n$ over a field $F$ with basis $\{e_1, e_2, \ldots, e_n\}$. Define an $F(S_n)$-module structure on $V$ by $p \cdot \sum_{i=1}^n a_i e_i = \sum_{i=1}^n a_i e_{p(i)}$. Show that it is not simple. Determine a simple submodule of $V$ and its direct complement.

9.2 Representations and Group Algebras

Let $G$ be a group and $V$ a vector space over a field $F$. A homomorphism $\rho$ from $G$ to $GL(V)$ is called a linear representation of $G$ over $F$. We shall be interested in the case when $V$ is finite dimensional. Such representations are called finite-dimensional representations. The dimension of $V$ is called the degree of the representation. If we fix a basis of $V$, then we get an isomorphism from $GL(V)$ to $GL(n, F)$, and so a homomorphism from $G$ to $GL(n, F)$. This is also called a representation, or a matrix representation, of $G$ of degree $n$. Let $\rho$ be a representation of a group $G$ on a vector space $V$ over a field $F$. Then $V$ becomes a left $F(G)$-module with respect to the external multiplication $\cdot$ defined by $\left(\sum_{g\in G} \alpha_g g\right)\cdot v = \sum_{g\in G} \alpha_g \rho(g)(v)$. This module will be termed the module associated to the representation $\rho$. Conversely, suppose that $V$ is a left $F(G)$-module. Then $V$ is already a vector space over $F$, and for each $g \in G$, the map $\rho(g)$ from $V$ to $V$ defined by $\rho(g)(v) = g \cdot v$ is a linear transformation such that $\rho(g_1 g_2) = \rho(g_1) \circ \rho(g_2)$ for all $g_1, g_2 \in G$, and $\rho(e) = I_V$. It follows that $\rho(g)$ is bijective and $(\rho(g))^{-1} = \rho(g^{-1})$ for all $g \in G$. This says that $\rho$ is a representation of $G$ on $V$. This representation will be termed the representation associated to the $F(G)$-module. This correspondence between representations over $F$ and $F(G)$-modules is faithful in the sense that each can be recovered from the other. We have already developed the language of modules. The language of module theory and that of representations correspond in the following manner.
1. $F(G)$-module ←→ representation over $F$.
2. $F(G)$-submodule ←→ subrepresentation.
3. Simple $F(G)$-module ←→ irreducible representation.
4. Direct sum of $F(G)$-modules ←→ direct sum of representations.
5. If $V$ and $W$ are left $F(G)$-modules, then both of them are vector spaces over $F$. We can make $V \otimes W$ an $F(G)$-module by defining $g \cdot (v \otimes w) = (g \cdot v) \otimes (g \cdot w)$ (extended linearly). The representation thus obtained is called the tensor product of the representations corresponding to the $F(G)$-modules $V$ and $W$.
6. Let $\rho$ be the representation of $G$ associated to the $F(G)$-module $V$. Then the $r$th exterior power $\Lambda^r V$ is a vector space of dimension $\binom{n}{r}$, where $n$ is the dimension of $V$. This can be made an $F(G)$-module by defining

$$g \cdot (v_1 \wedge v_2 \wedge \cdots \wedge v_r) = g \cdot v_1 \wedge g \cdot v_2 \wedge \cdots \wedge g \cdot v_r.$$

The representation thus obtained is called the $r$th exterior power of $\rho$, and it is denoted by $\Lambda^r \rho$.
7. Let $\rho$ be the representation associated to the $F(G)$-module $V$. Let $S^r(V)$ denote the $r$th symmetric power of $V$. More precisely, $S^r(V) = (\otimes^r V)/A_r$, where $A_r$ is the subspace of $\otimes^r V$ generated by the elements of the type

$$v_1 \otimes v_2 \otimes \cdots \otimes v_r - v_{p(1)} \otimes v_{p(2)} \otimes \cdots \otimes v_{p(r)},$$

where $p$ is a permutation in $S_r$. We already have an $F(G)$-module structure on $\otimes^r V$, defined above, which affords the $r$th tensor power of the representation $\rho$ associated to the $F(G)$-module $V$. It is easily seen that $A_r$ is an $F(G)$-submodule of $\otimes^r V$. Hence, $S^r(V)$ is also an $F(G)$-module. The associated representation is called the $r$th symmetric power of $\rho$, and it is denoted by $S^r \rho$. Consider the quotient map $\nu$ from $\otimes^2 V$ to $S^2 V$. Its kernel $A_2$ is isomorphic to $\Lambda^2 V$ as an $F(G)$-module when the characteristic of $F$ is different from $2$ (verify). Thus, in that case, $\otimes^2 \rho = S^2 \rho \oplus \Lambda^2 \rho$.
8. Representations associated to isomorphic $F(G)$-modules are called equivalent. Suppose that $\rho$ is the representation of $G$ associated to the $F(G)$-module $V$, and $\eta$ the representation associated to the $F(G)$-module $W$. Then $\rho$ is equivalent to $\eta$ if and only if there is an $F(G)$-module isomorphism $T$ from $V$ to $W$. Clearly, such a $T$ is a vector space isomorphism from $V$ to $W$ such that $T(g \cdot v) = g \cdot T(v)$ for all $g \in G$ and $v \in V$, or equivalently, $T(\rho(g)(v)) = \eta(g)(T(v))$ for all $g \in G$ and $v \in V$. This means that $\eta(g) = T\rho(g)T^{-1}$ for all $g \in G$. Thus, $\rho$ is equivalent to $\eta$ if and only if there is a nonsingular linear transformation $T$ from $V$ to $W$ such that $\eta(g) = T\rho(g)T^{-1}$ for all $g \in G$.

Given a representation $\rho$ from $G$ to $GL(V)$, a representation $\eta$ from $G$ to $GL(W)$, where $W$ is a subspace of $V$, is called a subrepresentation if $\rho(g)(W) \subseteq W$ for all $g \in G$ and $\eta(g)$ is the restriction $\rho(g)|_W$ for all $g \in G$. A representation $\rho$ from $G$ to $GL(V)$ is irreducible if there is no nontrivial proper subspace $W$ of $V$ such that $\rho(g)(W) \subseteq W$ for all $g \in G$. Maschke's theorem can be restated as follows:

Theorem 9.2.1 Let $G$ be a finite group and $F$ a field. Suppose that the characteristic of $F$ does not divide $|G|$. Then every representation of $G$ over $F$ is a direct sum of irreducible representations over $F$. ∎
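Maschke's theorem (Theorem 9.2.1) rests on averaging an arbitrary projection over $G$. A minimal sketch, assuming $F = \mathbb{Q}$ and the permutation representation of $S_3$ on $\mathbb{Q}^3$ (the one used in Example 9.2.9): start from the projection $\pi(x) = x_1(e_1 + e_2 + e_3)$ onto $U = \operatorname{span}(e_1 + e_2 + e_3)$, which is not $G$-equivariant, and form $\bar{\pi} = |G|^{-1}\sum_{g} \rho(g)\,\pi\,\rho(g)^{-1}$. The names `perm_matrix`, `pi` and `pibar` are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def perm_matrix(p):
    """Matrix of e_j -> e_{p(j)}, where p[j] is the image of j."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def inverse_perm(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


G = list(permutations(range(3)))

# pi(x) = x1 * (e1 + e2 + e3): a projection onto U that does not commute with the G-action.
pi = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# Averaged map: pibar = |G|^{-1} * sum over g of rho(g) pi rho(g)^{-1}.
total = [[0] * 3 for _ in range(3)]
for p in G:
    term = mat_mul(mat_mul(perm_matrix(p), pi), perm_matrix(inverse_perm(p)))
    total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
pibar = [[Fraction(v, len(G)) for v in row] for row in total]
print(pibar)  # every entry is 1/3; the kernel is W = {x1 + x2 + x3 = 0}
```

The averaged map is still a projection onto $U$ but now commutes with every $\rho(g)$, so its kernel $W$ is a $G$-invariant complement, which is exactly the splitting the theorem asserts.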

Thus, to determine all representations of a finite group $G$ over a field whose characteristic does not divide $|G|$, it is sufficient to determine the nonequivalent irreducible representations of $G$.

Example 9.2.2 Let $G$ be a group, and $V$ be a vector space over $F$. The trivial homomorphism which takes every element of $G$ to the identity map $I_V$ on $V$ is a representation called the trivial representation on $V$. It is irreducible if and only if $\dim V = 1$.

A one-dimensional representation of a group $G$ over a field $F$ is exactly a homomorphism from $G$ to $F^{\times}$. These are also irreducible representations of $G$. Distinct one-dimensional representations over $F$ are all nonequivalent (why?).

Proposition 9.2.3 Every finite-dimensional irreducible representation of an abelian group over an algebraically closed field is one dimensional.

Proof Let $G$ be an abelian group, $F$ an algebraically closed field, and $\rho$ an irreducible representation of $G$ on a finite-dimensional space $V$. Fix $g \in G$. Since $F$ is algebraically closed, $\rho(g)$ has an eigenvalue $\lambda_g \in F$ (say), and the corresponding eigenspace $V_{\lambda_g} \neq \{0\}$. Let $h \in G$ and $v \in V_{\lambda_g}$. Then $\rho(g)(\rho(h)(v)) = \rho(gh)(v) = \rho(hg)(v) = \rho(h)(\rho(g)(v)) = \lambda_g \rho(h)(v)$. Thus, $V_{\lambda_g}$ is invariant under $G$. Since $\rho$ is irreducible, $V_{\lambda_g} = V$, and so $\rho(g)$ is multiplication by a scalar for each $g \in G$. This shows that each subspace of $V$ is invariant under $G$. Since $\rho$ is irreducible, $V$ has no nontrivial proper subspace, and so $V$ is one dimensional. ∎
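By Proposition 9.2.3, the irreducible representations of a finite abelian group over $\mathbb{C}$ are exactly the homomorphisms into $\mathbb{C}^{\times}$. A minimal sketch for $G = \mathbb{Z}/m \times \mathbb{Z}/n$ (taking $F = \mathbb{C}$ and this particular group are assumptions): the maps $(a, b) \mapsto \zeta_m^{ja}\zeta_n^{kb}$ with $0 \le j < m$, $0 \le k < n$ and $\zeta_r = e^{2\pi i/r}$ give $mn = |G|$ distinct homomorphisms, and these are all of them. Floating-point values are compared with a tolerance; `characters` is a hypothetical helper name.

```python
import cmath
from itertools import product


def characters(m, n):
    """All homomorphisms Z/m x Z/n -> C^x, as dicts (a, b) -> complex value.

    chi_{j,k}(a, b) = zeta_m**(j*a) * zeta_n**(k*b), where zeta_r = exp(2*pi*i/r).
    """
    zm = cmath.exp(2j * cmath.pi / m)
    zn = cmath.exp(2j * cmath.pi / n)
    group = list(product(range(m), range(n)))
    chars = [{(a, b): zm ** (j * a) * zn ** (k * b) for (a, b) in group}
             for j in range(m) for k in range(n)]
    return group, chars


group, chars = characters(2, 4)
print(len(chars))  # 8 = |G| one-dimensional representations
```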

Thus, to find the irreducible representations of an abelian group $G$ over an algebraically closed field $F$, it is sufficient to find all homomorphisms from $G$ to $F^{\times}$. By Maschke's theorem, $F(G)$ is semi-simple if the characteristic of $F$ does not divide $|G|$. By Theorem 9.1.19, it follows that every simple $F(G)$-module is isomorphic to a left ideal of $F(G)$ which is also a direct summand of $F(G)$ considered as a left module. It is, therefore, necessary to find the structure of the group algebra $F(G)$. First, we find the division rings $\operatorname{End}_{F(G)}(V)$, where $V$ is a simple $F(G)$-module. In case $F$ is algebraically closed, we have the following proposition.

Proposition 9.2.4 Let $F$ be an algebraically closed field, $G$ a finite group, and $V$ a simple $F(G)$-module. Then for any $T \in \operatorname{End}_{F(G)}(V)$, there exists a unique $\lambda_T \in F$ such that $T$ is multiplication by $\lambda_T$. Further, the map $\lambda$ from $\operatorname{End}_{F(G)}(V)$ to $F$ defined by $\lambda(T) = \lambda_T$ is an isomorphism.

Proof Let $T \in \operatorname{End}_{F(G)}(V)$. Then $T$ is a linear transformation on $V$. Since $F$ is algebraically closed, $T$ has an eigenvalue $\lambda_T$ (say). Consider the eigenspace $V_{\lambda_T} \neq \{0\}$. Given any $v \in V_{\lambda_T}$ and $h \in G$, $T(h \cdot v) = h \cdot T(v) = h \cdot \lambda_T v = \lambda_T\, h \cdot v$. This shows that $V_{\lambda_T}$ is an $F(G)$-submodule of $V$. Since $V$ is simple, $V = V_{\lambda_T}$, and so $T$ is multiplication by $\lambda_T$. The map $\lambda$ which takes $T$ to $\lambda_T$ is clearly an isomorphism. ∎
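Proposition 9.2.4 can be observed numerically: for an irreducible representation, the matrices commuting with every $\rho(g)$ form a $1$-dimensional space. The sketch below works over $\mathbb{Q}$ (an assumption for exact arithmetic; the helper names are illustrative). It computes this dimension for the two-dimensional representation of $S_3$, with its six matrices listed by hand in the basis $e_1 - e_2$, $e_2 - e_3$, and, for contrast, for the reducible permutation representation on $\mathbb{Q}^3$, where the answer is $2$ because $\mathbb{Q}^3 = U \oplus W$ with $U \not\cong W$.

```python
from fractions import Fraction
from itertools import permutations

# The 2-dimensional representation of S3 over Q, in the basis e1 - e2, e2 - e3.
RHO3 = [[[1, 0], [0, 1]], [[-1, 1], [0, 1]], [[1, 0], [1, -1]],
        [[0, -1], [1, -1]], [[-1, 1], [-1, 0]], [[0, -1], [-1, 0]]]


def perm_matrix(p):
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


PERM = [perm_matrix(p) for p in permutations(range(3))]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def rank(rows):
    """Rank over Q via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def commutant_dim(mats):
    """Dimension over Q of {T : T M = M T for every M in mats}."""
    n = len(mats[0])
    units = [[[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]
             for i in range(n) for j in range(n)]
    # Column u lists the entries of E_u M - M E_u for every M; T = sum t_u E_u
    # commutes with all M exactly when (t_u) lies in the null space of this system.
    cols = []
    for E in units:
        col = []
        for M in mats:
            EM, ME = mat_mul(E, M), mat_mul(M, E)
            col.extend(EM[r][c] - ME[r][c] for r in range(n) for c in range(n))
        cols.append(col)
    rows = [list(r) for r in zip(*cols)]
    return n * n - rank(rows)


print(commutant_dim(RHO3), commutant_dim(PERM))  # 1 2
```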

Theorem 9.2.5 Let $F$ be an algebraically closed field, and $G$ be a finite group such that the characteristic of $F$ does not divide $|G|$. Then $G$ has only finitely many nonequivalent irreducible representations over $F$, say of degrees $n_1, n_2, \ldots, n_r$, and the following hold.

(i) The group algebra $F(G)$ is isomorphic as an $F$-algebra to

$$M_{n_1}(F) \times M_{n_2}(F) \times \cdots \times M_{n_r}(F).$$

(ii) $n_1^2 + n_2^2 + \cdots + n_r^2 = |G|$.
(iii) $n_1 = 1$ corresponds to the degree of the trivial representation.
(iv) The number $r$ of nonequivalent irreducible representations is the number of conjugacy classes of $G$ (called the class number of $G$).

Proof From Theorem 9.1.19 and the above proposition, it follows that there are positive integers $n_1, n_2, \ldots, n_r$ such that $F(G)$ as an $F$-algebra is isomorphic to

$$M_{n_1}(F) \times M_{n_2}(F) \times \cdots \times M_{n_r}(F).$$

Comparing the $F$-dimension of $F(G)$ and that of

$$M_{n_1}(F) \times M_{n_2}(F) \times \cdots \times M_{n_r}(F),$$

we obtain (ii). Clearly, the simple left ideals of the above algebra are isomorphic to the simple left ideals of $M_{n_i}(F)$ for $i = 1, 2, \ldots, r$. All simple left ideals of $M_{n_i}(F)$ are isomorphic, and are of dimension $n_i$. Thus, $n_1, n_2, \ldots, n_r$ represent the dimensions over $F$ of the simple left $F(G)$-modules, and so they represent the degrees of the irreducible representations of $G$ over $F$. Since the trivial representation is of degree $1$, we may assume that $n_1 = 1$. Finally, we prove (iv) by comparing the dimension of the center of $F(G)$ and that of the algebra

$$M_{n_1}(F) \times M_{n_2}(F) \times \cdots \times M_{n_r}(F).$$

The center of $M_n(F)$ is the ring of all scalar matrices, which is a vector space of dimension $1$ over $F$. Hence, the dimension of the center of

$$M_{n_1}(F) \times M_{n_2}(F) \times \cdots \times M_{n_r}(F)$$

is $r$. Let $\{C_1, C_2, \ldots, C_t\}$ be the set of all distinct conjugacy classes of $G$, and let $u_i = \sum_{x \in C_i} x$. Since $g u_i g^{-1} = \sum_{x \in C_i} g x g^{-1} = \sum_{y \in C_i} y = u_i$ for all $g \in G$, it follows that $u_i$ is in the center of $F(G)$ for each $i$. We show that $\{u_1, u_2, \ldots, u_t\}$ is a basis of the center of $F(G)$. Since distinct conjugacy classes are disjoint and the set $G$ is linearly independent in $F(G)$, it follows that $\{u_1, u_2, \ldots, u_t\}$ is linearly independent. Let $\sum_{g\in G} \alpha_g g$ be a member of the center of $F(G)$. Then $h\left(\sum_{g\in G} \alpha_g g\right)h^{-1} = \sum_{g\in G} \alpha_g g$ for each $h \in G$. Since $G$ is linearly independent, comparing the coefficients of $hgh^{-1}$, we get that $\alpha_{hgh^{-1}} = \alpha_g$ for all $g, h \in G$. Thus, $\alpha_g = \alpha_k$ whenever $g$ and $k$ are in the same conjugacy class. Let $\alpha_i = \alpha_g$ for $g \in C_i$. Then $\sum_{g\in G} \alpha_g g = \sum_{i=1}^{t} \alpha_i u_i$. This shows that $\{u_1, u_2, \ldots, u_t\}$ is a basis of the center of $F(G)$. Thus, $t$ is the dimension of the center of $F(G)$. Hence, $r = t$ is the number of conjugacy classes of $G$. ∎
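Parts (ii) to (iv) of Theorem 9.2.5 already constrain the degrees strongly. The sketch below, with hypothetical helper names, computes the class number of $S_3$ by brute-force conjugation and then lists every nondecreasing tuple $(1, n_2, \ldots, n_r)$ with $1 + n_2^2 + \cdots + n_r^2 = |G|$. For $S_3$ (and for $|G| = 8$ with $5$ classes, as for $Q_8$) exactly one tuple survives; for larger groups the enumeration may return several candidates, and further information is then needed.

```python
from itertools import permutations


def compose(p, q):
    """(p q)(i) = p(q(i)); p[i] is the image of i."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugacy_classes(G):
    classes, seen = [], set()
    for x in G:
        if x not in seen:
            cls = {compose(compose(g, x), inverse(g)) for g in G}
            seen |= cls
            classes.append(cls)
    return classes


def degree_candidates(order, r):
    """Tuples (1, n2, ..., nr) with 1 <= n2 <= ... <= nr and 1 + sum of n_i^2 == order."""
    def extend(prefix, remaining, slots):
        if slots == 0:
            if remaining == 0:
                yield prefix
            return
        d = prefix[-1]
        while d * d <= remaining:
            yield from extend(prefix + (d,), remaining - d * d, slots - 1)
            d += 1
    return list(extend((1,), order - 1, r - 1))


S3 = list(permutations(range(3)))
r = len(conjugacy_classes(S3))
print(r, degree_candidates(len(S3), r))  # 3 [(1, 1, 2)]
```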

Remark 9.2.6 We shall show later that the degrees of irreducible representations divide the order of the group.

Example 9.2.7 Let $F$ be an algebraically closed field of characteristic different from $2$. We find the irreducible representations of the Klein four group $V_4 = \{e, a, b, c\}$. From the above results, it follows that there are $4$ irreducible representations of $V_4$, and they are all of degree $1$. Thus, we have to find all $4$ distinct group homomorphisms from $V_4$ to $F^{\times}$. We list them. Let $\rho_1$ denote the trivial homomorphism which maps each element of $V_4$ to $1$. Let $\rho_2$ denote the map which takes $e$ and $a$ to $1$, and $b$ and $c$ to $-1$. Check that it is indeed a homomorphism. Let $\rho_3$ denote the map which takes $e$ and $b$ to $1$ and the rest to $-1$. Let $\rho_4$ be the map which takes $e$ and $c$ to $1$ and the rest to $-1$. These are the only irreducible representations of $V_4$. Note that all these irreducible representations of $V_4$ are realized over any field of characteristic different from $2$.
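Since every element of $V_4$ has order at most $2$, a homomorphism $V_4 \to F^{\times}$ takes values in $\{x : x^2 = 1\} = \{1, -1\}$, and these two values are distinct exactly when the characteristic is not $2$. So integer arithmetic with $\pm 1$ models every such field. The sketch below (modelling $V_4$ as $\mathbb{Z}/2 \times \mathbb{Z}/2$ with $a = (1,0)$, $b = (0,1)$, $c = (1,1)$, an identification assumed for illustration) brute-forces all $16$ maps and keeps the homomorphisms.

```python
from itertools import product

# V4 = {e, a, b, c} modelled as Z/2 x Z/2 with e=(0,0), a=(1,0), b=(0,1), c=(1,1).
COORDS = {"e": (0, 0), "a": (1, 0), "b": (0, 1), "c": (1, 1)}
NAMES = {v: k for k, v in COORDS.items()}
ELEMENTS = "eabc"


def mul(x, y):
    (u1, u2), (v1, v2) = COORDS[x], COORDS[y]
    return NAMES[((u1 + v1) % 2, (u2 + v2) % 2)]


def homomorphisms_to_pm1():
    """Brute force over all 16 maps V4 -> {1, -1}, keeping those with f(xy) = f(x) f(y)."""
    homs = []
    for values in product((1, -1), repeat=4):
        f = dict(zip(ELEMENTS, values))
        if all(f[mul(x, y)] == f[x] * f[y] for x in ELEMENTS for y in ELEMENTS):
            homs.append(f)
    return homs


for f in homomorphisms_to_pm1():
    print(f)
```

Exactly four maps survive, matching $\rho_1, \ldots, \rho_4$ above.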

Example 9.2.8 We find all irreducible representations of the quaternion group $Q_8$ over an algebraically closed field $F$ of characteristic different from $2$. There are as many irreducible representations of $Q_8$ as there are conjugacy classes of $Q_8$. There are $5$ conjugacy classes of $Q_8$. They are $\{1\}$, $\{-1\}$, $\{i, -i\}$, $\{j, -j\}$ and $\{k, -k\}$. Thus, there are $5$ irreducible representations, of degrees $1, n_2, n_3, n_4, n_5$, such that $1 + n_2^2 + n_3^2 + n_4^2 + n_5^2 = 8$. The only possible solution is $n_2 = n_3 = n_4 = 1$ and $n_5 = 2$. In other words, there are $4$ irreducible representations of degree $1$, including the trivial representation, and there is a unique two-dimensional irreducible representation. We list them. All one-dimensional representations are just homomorphisms from $Q_8$ to $F^{\times}$. Note that the kernel of any homomorphism from $Q_8$ to $F^{\times}$ contains the commutator subgroup $\{1, -1\}$ of $Q_8$ (for $F^{\times}$ is abelian). Since $Q_8/\{1, -1\}$ is isomorphic to the Klein four group, we get four homomorphisms from $Q_8$ to $F^{\times}$ as in the above example. Thus, we have $4$ one-dimensional representations, viz., $\rho_1$ the trivial representation, $\rho_2$ the homomorphism which takes $1, -1, i, -i$ to $1$ and the rest of the elements to $-1$, and two other homomorphisms from $Q_8$ to $F^{\times}$ obtained similarly. Finally, we determine the two-dimensional irreducible representation. Since $F$ is an algebraically closed field of characteristic different from $2$, $X^4 - 1 = 0$ has $4$ distinct roots, which form a cyclic group of order $4$. Let $\xi$ denote a primitive $4$th root of unity. Then the map

$$i \mapsto \begin{pmatrix} \xi & 0 \\ 0 & -\xi \end{pmatrix}, \qquad j \mapsto \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad k \mapsto \begin{pmatrix} 0 & \xi \\ \xi & 0 \end{pmatrix}$$

defines a representation which is irreducible.
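The two-dimensional representation of $Q_8$ above can be checked directly. A sketch, assuming $F = \mathbb{C}$ and $\xi = i$ (Python's `1j`): it verifies $\rho(i)^2 = \rho(j)^2 = \rho(k)^2 = -I$ and $\rho(i)\rho(j) = \rho(k)$, generates a group of order $8$, and shows that $I, \rho(i), \rho(j), \rho(k)$ are linearly independent via the determinant of the $4 \times 4$ matrix of their entries. Since these four matrices then span $M_2(F)$, any invariant subspace would be invariant under all of $M_2(F)$, which forces it to be $0$ or $F^2$. The determinant is $4$, which is nonzero precisely when the characteristic is not $2$. The helper names are illustrative.

```python
from itertools import permutations

XI = 1j  # a primitive 4th root of unity; taking F = C is an assumption

I2 = ((1, 0), (0, 1))
QI = ((XI, 0), (0, -XI))
QJ = ((0, 1), (-1, 0))
QK = ((0, XI), (XI, 0))


def mul(X, Y):
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def neg(X):
    return tuple(tuple(-v for v in row) for row in X)


def generated_group(gens):
    """Closure of {I2} under right multiplication by the generators (finite case)."""
    group, frontier = {I2}, [I2]
    while frontier:
        new = []
        for x in frontier:
            for s in gens:
                y = mul(x, s)
                if y not in group:
                    group.add(y)
                    new.append(y)
        frontier = new
    return group


def det(M):
    """Leibniz formula; exact here since all entries are Gaussian integers."""
    n = len(M)
    total = 0
    for sigma in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n) if sigma[a] > sigma[b])
        term = (-1) ** inversions
        for row in range(n):
            term *= M[row][sigma[row]]
        total += term
    return total


flat = [[v for row in X for v in row] for X in (I2, QI, QJ, QK)]
print(len(generated_group([QI, QJ])), det(flat))  # 8, and a determinant equal to 4
```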

Example 9.2.9 Let $F$ be an algebraically closed field of characteristic different from $2$ and $3$. We find all the irreducible representations of the symmetric group $S_3$ over $F$. Since there are $3$ conjugacy classes of $S_3$, there are $3$ irreducible representations of $S_3$ over $F$. Suppose that their degrees are $1, n_2, n_3$. Then $1 + n_2^2 + n_3^2 = 6$. The only possible solution is $n_2 = 1$ and $n_3 = 2$. Thus, there are $2$ one-dimensional irreducible representations $\rho_1, \rho_2$, and one two-dimensional irreducible representation $\rho_3$. The one-dimensional representations $\rho_1$ and $\rho_2$ are just homomorphisms from $S_3$ to $F^{\times}$. We have the trivial homomorphism $\rho_1$ from $S_3$ to $F^{\times}$ which maps every member of $S_3$ to $1$, and a nontrivial homomorphism $\rho_2$ from $S_3$ to $F^{\times}$ given by $\rho_2(p) = \chi(p)$, where $\chi$ is the alternating map. Now, we describe the two-dimensional irreducible representation $\rho_3$. Let $V$ be a vector space over $F$ of dimension $3$ with a basis $\{e_1, e_2, e_3\}$. Consider the representation $\rho$ from $S_3$ to $GL(V)$ defined by $\rho(p)(x_1 e_1 + x_2 e_2 + x_3 e_3) = x_1 e_{p(1)} + x_2 e_{p(2)} + x_3 e_{p(3)}$. Consider the subspace $U = \{\alpha(e_1 + e_2 + e_3) \mid \alpha \in F\}$ of $V$. Clearly, $\rho(g)(U) \subseteq U$ for all $g \in S_3$. In the language of modules, $U$ is an $F(S_3)$-submodule of $V$. The subrepresentation thus obtained is the trivial representation $\rho_1$. Consider the subspace $W = \{x_1 e_1 + x_2 e_2 + x_3 e_3 \mid x_1 + x_2 + x_3 = 0\}$ of $V$. Clearly, $W$ is of dimension $2$, and it is also an $F(S_3)$-submodule. The corresponding representation $\rho_3$ is $2$-dimensional. We show that it is irreducible by showing that $W$ is a simple $F(S_3)$-module. Let $w = x_1 e_1 + x_2 e_2 + x_3 e_3$ be a nonzero element of $W$. Then at least two of $x_1, x_2, x_3$ are nonzero, and $x_1 + x_2 + x_3 = 0$. Suppose, without loss of generality, that $x_1 \neq 0 \neq x_2$. We show that there is a permutation $p \in S_3$ such that $w$ and $\rho_3(p)(w) = p \cdot w$ are linearly independent. Suppose not. Then $w$ and $p \cdot w$ are linearly dependent for all $p \in S_3$. Thus, for each $p \in S_3$, there is a scalar $\alpha_p$ such that $w = \alpha_p\, p \cdot w$.
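Taking $p = (2, 3)$ and comparing the coefficients of $e_1, e_2, e_3$, we find that $\alpha_p = 1$ (as $x_1 \neq 0$) and $x_2 = x_3$. Similarly, taking $p = (1, 3)$ and comparing coefficients (now using $x_2 \neq 0$), we find that $x_1 = x_3$. Thus, $x_1 = x_2 = x_3$, and since $x_1 + x_2 + x_3 = 3x_1 = 0$ and the characteristic of $F$ is not $3$, we see that $x_i = 0$ for all $i$. This is a contradiction. Hence, there is a $p \in S_3$ such that $w$ and $p \cdot w$ are linearly independent, so the submodule generated by any nonzero $w$ is all of $W$. This shows that $W$ has no nontrivial proper $F(S_3)$-submodule, and so $\rho_3$ is the two-dimensional irreducible representation. The exterior power $\Lambda^2 \rho_3$ is a one-dimensional representation which maps even permutations to $1$ and odd permutations to $-1$ (this representation is called the sign representation).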
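The last claim of Example 9.2.9 says that $\Lambda^2 \rho_3$ is the sign representation. On the line $\Lambda^2 W$, spanned by $w_1 \wedge w_2$, a linear map of $W$ acts by its determinant, so the claim amounts to $\det \rho_3(p) = \operatorname{sign}(p)$. A sketch over $\mathbb{Q}$ (an assumption for exact arithmetic), using the basis $w_1 = e_1 - e_2$, $w_2 = e_2 - e_3$ of $W$ and illustrative helper names, checks this for all six permutations.

```python
from itertools import permutations

W_BASIS = ([1, -1, 0], [0, 1, -1])  # w1 = e1 - e2, w2 = e2 - e3


def act(p, x):
    """p . (sum x_i e_i) = sum x_i e_{p(i)}, where p[i] is the image of i."""
    y = [0] * len(x)
    for i, xi in enumerate(x):
        y[p[i]] += xi
    return y


def rho3(p):
    # Coordinates of x in W with respect to (w1, w2) are (x1, -x3).
    cols = [(y[0], -y[2]) for y in (act(p, w) for w in W_BASIS)]
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def sign(p):
    n = len(p)
    return (-1) ** sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])


def wedge2(M):
    """A 2x2 matrix M acts on the line Lambda^2 W = F (w1 ^ w2) by det(M)."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


for p in permutations(range(3)):
    print(p, sign(p), wedge2(rho3(p)))  # the last two columns agree
```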

Exercises

9.2.1 Find all irreducible representations of a group of order 15 over the field C of complex numbers as well as over the field Q of rational numbers.

9.2.2 Find all irreducible representations of the dihedral group $D_8$ and of $A_4$, over $\mathbb{C}$ and over $\mathbb{Q}$.

9.2.3 Find the number of irreducible representations of $S_4$ over $\mathbb{C}$. Find also their degrees. Determine the structure of $\mathbb{C}(S_4)$. Find some of the irreducible representations of $S_4$ using the method of the last example of this section.

9.2.4 Show that over any field the number of irreducible representations of a group $G$ is at most the class number of $G$.

9.2.5 Find the number of nonequivalent complex irreducible representations of each of the extra special $p$-groups of order $p^3$. Find also their degrees.

9.3 Characters, Orthogonality Relations

Let $\rho$ be a representation of a group $G$ on a finite-dimensional vector space $V$ over a field $F$. The map $\chi_\rho$ from $G$ to $F$ defined by $\chi_\rho(g) = \operatorname{trace} \rho(g)$ is called the character of $G$ afforded by the representation $\rho$. Characters afforded by irreducible representations are called irreducible characters.

Proposition 9.3.1 Characters are class functions in the sense that they are constant on the conjugacy classes of $G$.

Proof Let $\chi_\rho$ be the character afforded by the representation $\rho$. Then $\rho(ghg^{-1}) = \rho(g)\rho(h)\rho(g)^{-1}$ for all $g, h \in G$. Since similar transformations have the same trace, it follows that $\chi_\rho(h) = \chi_\rho(ghg^{-1})$ for all $g, h \in G$. ∎

Proposition 9.3.2 Equivalent representations afford the same characters.

Proof Let $\rho$ and $\eta$ be equivalent representations on vector spaces $V$ and $W$, respectively. Then there is an isomorphism $T$ from $V$ to $W$ such that $\eta(g) = T\rho(g)T^{-1}$ for all $g \in G$. Hence, $\chi_\eta(g) = \operatorname{trace}(\eta(g)) = \operatorname{trace}(T\rho(g)T^{-1}) = \operatorname{trace}(\rho(g)) = \chi_\rho(g)$ for all $g \in G$. ∎

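Proposition 9.3.3 Let $\rho_1$ and $\rho_2$ be representations. Then $\chi_{\rho_1 \oplus \rho_2} = \chi_{\rho_1} + \chi_{\rho_2}$ and $\chi_{\rho_1 \otimes \rho_2} = \chi_{\rho_1} \cdot \chi_{\rho_2}$.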

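Proof The result follows from the facts that $\operatorname{trace}(T_1 \oplus T_2) = \operatorname{trace} T_1 + \operatorname{trace} T_2$ and $\operatorname{trace}(T_1 \otimes T_2) = \operatorname{trace}(T_1) \cdot \operatorname{trace}(T_2)$. ∎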

From the above result, it follows that sums and products of characters are characters, and the set of characters forms a semi-ring. We complete this semi-ring to a ring by adjoining the negatives of characters; the elements of the resulting ring (differences of characters) are called virtual characters. The ring thus obtained is called the character ring, and it is denoted by $Ch(G)$. Let $\rho$ be the representation afforded by the $F(G)$-module $M$, $\mu$ the representation afforded by an $F(G)$-submodule $N$ of $M$, and $\nu$ the representation afforded by the quotient module $M/N$. It follows from elementary linear algebra (using a basis of $M$ that extends a basis of $N$) that $\operatorname{trace}\rho(g) = \operatorname{trace}\mu(g) + \operatorname{trace}\nu(g)$ for all $g \in G$. Next, since $M$ is a finite-dimensional vector space, $M$ has a composition series as an $F(G)$-module, whose factors are simple. This proves the following proposition.

Proposition 9.3.4 Every character (even if the characteristic of $F$ divides the order of the group) is a sum of irreducible characters. ∎
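Proposition 9.3.3 reduces to two matrix identities: the matrix of $\rho_1 \oplus \rho_2$ is block diagonal, and the matrix of $\rho_1 \otimes \rho_2$ in the basis $\{x_i \otimes y_k\}$, ordered lexicographically, is the Kronecker product of the two matrices. A minimal sketch with arbitrary integer matrices (the particular matrices and helper names are assumptions chosen for illustration):

```python
def direct_sum(A, B):
    """Block-diagonal matrix diag(A, B): the matrix of rho1 (+) rho2."""
    n, m = len(A), len(B)
    return [row + [0] * m for row in A] + [[0] * n + row for row in B]


def kron(A, B):
    """Kronecker product: the matrix of A (x) B in the basis x_i (x) y_k, ordered by (i, k)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]
B = [[2, 0, 1], [0, -1, 3], [4, 0, 5]]
print(trace(direct_sum(A, B)), trace(A) + trace(B))  # 11 11
print(trace(kron(A, B)), trace(A) * trace(B))        # 30 30
```

The diagonal entry of the Kronecker product at position $(i, k)$ is $a_{ii} b_{kk}$, so summing over $i$ and $k$ gives the product of the traces.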

The members of $F(G)$ can be viewed as functions from $G$ to $F$. Indeed, we identify the member $\sum_{g\in G} \alpha_g g$ with the function $\alpha$ from $G$ to $F$ defined by $\alpha(g) = \alpha_g$. A character $\chi$ of $G$ is, therefore, a member of $F(G)$. Since characters are class functions, they belong to the center of $F(G)$ (see the proof of Theorem 9.2.5). Let $\rho$ be a representation of $G$ on a finite-dimensional vector space $V$ over a field $F$, and let $\{x_1, x_2, \ldots, x_n\}$ be a basis of $V$. Then we get $n^2$ functions $\rho_{ij}$ from $G$ to $F$ defined by

$$\rho(g)(x_j) = \sum_{i=1}^{n} \rho_{ij}(g)\, x_i.$$

The character $\chi_\rho$ of $\rho$ is given by $\chi_\rho(g) = \sum_{i=1}^{n} \rho_{ii}(g)$. Suppose that the characteristic of $F$ does not divide $|G|$. Define a map $\langle\, ,\rangle$ from $F(G) \times F(G)$ to $F$ by

$$\langle \alpha, \beta \rangle = (|G| \cdot 1)^{-1} \sum_{g\in G} \alpha(g)\, \beta(g^{-1}),$$

where $1$ denotes the identity of the field $F$. It is easy to observe that $\langle\, ,\rangle$ is a symmetric bilinear form on $F(G)$. Suppose that $\langle \alpha, \beta \rangle = 0$ for all $\beta$ in $F(G)$. Then for each $h \in G$, we have $(|G| \cdot 1)^{-1} \alpha(h) = \langle \alpha, i_{h^{-1}} \rangle = 0$, where $i_{h^{-1}}$ is the map from $G$ to $F$ which takes $h^{-1}$ to $1$ and the rest of the elements to $0$. This shows that $\alpha = 0$, and so $\langle\, ,\rangle$ is a nondegenerate symmetric bilinear form on $F(G)$. Such a bilinear form is also called an inner product on $F(G)$. Now, we shall show that, when $F$ is algebraically closed, the irreducible characters of $G$ over $F$ form an orthonormal basis of the center of $F(G)$. The results which follow are due to Frobenius and are called the orthogonality relations.
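For $G = S_3$ the orthonormality can be seen directly. A sketch over $\mathbb{Q}$ (an assumption for exact arithmetic): the three irreducible characters are the trivial character, the sign, and $\chi_2(p) = (\text{number of fixed points of } p) - 1$. The last one is the character of the two-dimensional representation on $W = \{x_1 + x_2 + x_3 = 0\}$, because the permutation representation on $\mathbb{Q}^3$ has character "number of fixed points" and splits as the trivial representation plus $W$. The Gram matrix under $\langle\, ,\rangle$ should be the identity. Helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S3; p[i] is the image of i


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def sign(p):
    n = len(p)
    return (-1) ** sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])


def fixed_points(p):
    return sum(1 for i, image in enumerate(p) if i == image)


def chi_trivial(p):
    return 1


def chi_2(p):
    # Character of the 2-dimensional representation on W = {x1 + x2 + x3 = 0}.
    return fixed_points(p) - 1


def inner(alpha, beta):
    """<alpha, beta> = |G|^{-1} * sum over g of alpha(g) * beta(g^{-1})."""
    return Fraction(sum(alpha(g) * beta(inverse(g)) for g in G), len(G))


irreducible = [chi_trivial, sign, chi_2]
gram = [[inner(a, b) for b in irreducible] for a in irreducible]
print(gram)  # the 3 x 3 identity matrix (entries are Fractions)
print(inner(fixed_points, fixed_points))  # 2: permutation character = trivial + chi_2
```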

Theorem 9.3.5 Let $G$ be a finite group, and $F$ be an algebraically closed field whose characteristic does not divide the order of $G$. Let $\rho$ and $\eta$ be nonequivalent irreducible representations of $G$ over $F$. Then

$$(|G| \cdot 1)^{-1} \sum_{g\in G} \rho_{ik}(g^{-1})\, \eta_{pj}(g) = 0$$

for all $i, j, k$ and $p$.

Proof Let $V$ and $W$ be the $F(G)$-modules corresponding to the representations $\rho$ and $\eta$, respectively. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of $V$, and $\{y_1, y_2, \ldots, y_m\}$ a basis of $W$. Let $T_{ji}$ be the linear transformation from $V$ to $W$ which takes $x_i$ to $y_j$ and $x_k$ to $0$ for $k\neq i$. $T_{ji}$ need not be an $F(G)$-module homomorphism. We average it to make an $F(G)$-module homomorphism. Define $\overline{T_{ji}}$ by

$$\overline{T_{ji}}(v)=(|G|\cdot 1)^{-1}\sum_{g\in G}\eta(g)\bigl(T_{ji}(\rho(g^{-1})(v))\bigr).$$

Then $\overline{T_{ji}}$ is an $F(G)$-module homomorphism from $V$ to $W$ (see the proof of Maschke's theorem). Since $V$ and $W$ are simple and nonisomorphic $F(G)$-modules, every $F(G)$-homomorphism from $V$ to $W$ is the zero map. In particular, $\overline{T_{ji}}=0$. Hence,

$$0=\overline{T_{ji}}(x_k)=(|G|\cdot 1)^{-1}\sum_{g\in G}\eta(g)\Bigl(T_{ji}\Bigl(\sum_{l=1}^{n}\rho_{lk}(g^{-1})x_l\Bigr)\Bigr).$$

Since $T_{ji}$ kills every $x_l$ with $l\neq i$ and sends $x_i$ to $y_j$, this in turn gives

$$(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{ik}(g^{-1})\sum_{p=1}^{m}\eta_{pj}(g)\,y_p=0.$$

Since $\{y_1, y_2, \ldots, y_m\}$ is a basis, we see that

$$(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{ik}(g^{-1})\,\eta_{pj}(g)=0\quad\text{for all } i, j, k \text{ and } p.\qquad\square$$
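The relation can be checked numerically in a small case. The sketch below is our own illustration, and its helper names are hypothetical. It takes $G=S_3$ and $\rho$ the sign representation, so that $\rho_{11}(g)=\operatorname{sign}(g)$. For $\eta$ it takes the two-dimensional representation of $S_3$ on $\{v\in\mathbb{Q}^3: v_1+v_2+v_3=0\}$, written in the basis $e_1-e_2$, $e_2-e_3$ (indices start at 0 in the code). We assume the standard fact that $\rho$ and $\eta$ are nonequivalent irreducible representations. The sketch prints the $2\times 2$ array $\sum_{g}\rho_{11}(g^{-1})\eta_{pj}(g)$, which Theorem 9.3.5 predicts to be zero.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S3: p[i] is the image of i

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def standard_matrix(p):
    """Matrix of p acting on {v : v[0] + v[1] + v[2] = 0} by e_i -> e_{p[i]}.

    Basis: v1 = e0 - e1, v2 = e1 - e2.  A vector (a, b, c) with zero sum
    equals a*v1 + (-c)*v2, so its coordinates are (a, -c).
    """
    columns = []
    for u, w in [(0, 1), (1, 2)]:
        image = [0, 0, 0]
        image[p[u]] += 1
        image[p[w]] -= 1
        columns.append((image[0], -image[2]))
    # entry [i][j] = i-th coordinate of eta(g)(v_j), matching eta(g)(x_j) = sum_i eta_ij(g) x_i
    return [[columns[j][i] for j in range(2)] for i in range(2)]

# rho = sign has the single matrix coefficient rho_11(g) = sign(g).
sums = [[sum(sign(inverse(g)) * standard_matrix(g)[p][j] for g in G)
         for j in range(2)] for p in range(2)]
print(sums)  # [[0, 0], [0, 0]]
```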

Corollary 9.3.6 Let $\chi_\rho$ and $\chi_\eta$ be distinct irreducible characters of $G$ over an algebraically closed field whose characteristic does not divide $|G|$. Then $\langle\chi_\rho,\chi_\eta\rangle=0$.

Proof Since $\chi_\rho\neq\chi_\eta$, the representations $\rho$ and $\eta$ are nonequivalent. In the above theorem, putting $k=i$ and $p=j$, and then summing over all $i$ and $j$, we see that $\langle\chi_\rho,\chi_\eta\rangle=0$. □

Theorem 9.3.7 Let $\rho$ be an irreducible representation of a finite group $G$ on a vector space $V$ of dimension $n$ over an algebraically closed field whose characteristic does not divide $|G|$. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis of $V$, and $[\rho_{ij}(g)]$ the matrix of $\rho(g)$ with respect to this basis. Then

(i) $\sum_{g\in G}\rho_{ij}(g^{-1})\rho_{kl}(g)=0$ if $j\neq k$ or $i\neq l$, and
(ii) $n\cdot\sum_{g\in G}\rho_{ij}(g^{-1})\rho_{ji}(g)=|G|\cdot 1$.

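Proof $V$ is a simple $F(G)$-module. By Proposition 9.2.4 every $F(G)$-endomorphism of $V$ is multiplication by a scalar. Fix $i$ and $j$, and consider the average $\overline{T_{ji}}$ of the linear transformation $T_{ji}$ which maps $x_i$ to $x_j$ and $x_k$ to $0$ for $k\neq i$. Suppose that this endomorphism $\overline{T_{ji}}$ of $V$ is multiplication by $\alpha_{ji}$. Then $\overline{T_{ji}}(x_k)=\alpha_{ji}x_k$ for all $k$. Now

$$\overline{T_{ji}}(x_k)=(|G|\cdot 1)^{-1}\sum_{g\in G}\rho(g)\bigl(T_{ji}(\rho(g^{-1})(x_k))\bigr).$$

In turn, we get that

$$\alpha_{ji}x_k=(|G|\cdot 1)^{-1}\sum_{g\in G}\sum_{l=1}^{n}\rho_{lj}(g)\rho_{ik}(g^{-1})\,x_l.$$

Since $\{x_1, x_2, \ldots, x_n\}$ is a basis, we get that

$$(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{lj}(g)\rho_{ik}(g^{-1})=0\quad\text{if } k\neq l,$$

and

$$\alpha_{ji}=(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{kj}(g)\rho_{ik}(g^{-1}).$$

Replacing $g$ by $g^{-1}$ in the first of these relations and renaming indices, we see that $\alpha_{ji}=0$ if $i\neq j$. Also

$$\alpha_{ii}=(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{ki}(g)\rho_{ik}(g^{-1})=\alpha_{kk}\quad\text{for all } i, k.$$

Since the above expression for $\alpha_{ii}$ holds for every $k$, summing over $k$ gives

$$n\alpha_{ii}=(|G|\cdot 1)^{-1}\sum_{k=1}^{n}\sum_{g\in G}\rho_{ki}(g)\rho_{ik}(g^{-1})=(|G|\cdot 1)^{-1}\sum_{g\in G}\sum_{k=1}^{n}\rho_{ki}(g)\rho_{ik}(g^{-1}).$$

Next, $1=\rho_{ii}(g^{-1}g)=\sum_{k=1}^{n}\rho_{ik}(g^{-1})\rho_{ki}(g)$ for every $g\in G$. Hence

$$n\alpha_{ii}=(|G|\cdot 1)^{-1}(|G|\cdot 1)=1.$$

This shows that

$$n\cdot(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{ki}(g)\rho_{ik}(g^{-1})=1.$$

Multiplying by $|G|\cdot 1$, we get the result. □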
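For a concrete check of both parts, the following sketch tests every index quadruple $(i,j,k,l)$ with exact integer arithmetic. It is an illustration under our own conventions, not the author's code. It reuses the two-dimensional representation of $S_3$ in the basis $e_1-e_2$, $e_2-e_3$ (indices start at 0 in the code). Here $n=2$ and $|G|=6$, so part (ii) says each sum $\sum_g\rho_{ij}(g^{-1})\rho_{ji}(g)$ equals $3$.

```python
from itertools import permutations, product

G = list(permutations(range(3)))  # S3: p[i] is the image of i

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def standard_matrix(p):
    """Two-dimensional irreducible representation of S3 in the basis e0 - e1, e1 - e2."""
    columns = []
    for u, w in [(0, 1), (1, 2)]:
        image = [0, 0, 0]
        image[p[u]] += 1
        image[p[w]] -= 1
        columns.append((image[0], -image[2]))  # coordinates (a, -c) of a zero-sum vector (a, b, c)
    return [[columns[j][i] for j in range(2)] for i in range(2)]

n, order = 2, len(G)
ok = True
for i, j, k, l in product(range(n), repeat=4):
    total = sum(standard_matrix(inverse(g))[i][j] * standard_matrix(g)[k][l] for g in G)
    if j == k and i == l:
        ok = ok and n * total == order   # part (ii)
    else:
        ok = ok and total == 0           # part (i)
print(ok)  # True
```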

Corollary 9.3.8 (Orthogonality relation) Let $G$ be a finite group and $F$ an algebraically closed field whose characteristic does not divide the order of $G$. Then the set of irreducible characters of $G$ over $F$ forms an orthonormal basis of the center of $F(G)$ (which can be interpreted as the vector space of class functions on $G$).

Proof By Corollary 9.3.6, distinct irreducible characters are orthogonal. Next, if $\chi_\rho$ is an irreducible character afforded by the irreducible representation $\rho$ of degree $n$, then

$$\langle\chi_\rho,\chi_\rho\rangle=(|G|\cdot 1)^{-1}\sum_{g\in G}\chi_\rho(g)\chi_\rho(g^{-1})=(|G|\cdot 1)^{-1}\sum_{g\in G}\Bigl(\sum_{i=1}^{n}\rho_{ii}(g)\Bigr)\Bigl(\sum_{k=1}^{n}\rho_{kk}(g^{-1})\Bigr)=\sum_{i=1}^{n}(|G|\cdot 1)^{-1}\sum_{g\in G}\rho_{ii}(g)\rho_{ii}(g^{-1})=n\cdot(n\cdot 1)^{-1}=1,$$

where the terms with $i\neq k$ vanish by part (i) of the above theorem, and each of the $n$ remaining terms equals $(n\cdot 1)^{-1}$ by part (ii).

This shows that the set of irreducible characters forms an orthonormal set. Since the number of irreducible characters is the class number of $G$, and the class number is the dimension of the center of $F(G)$ (the space of class functions on $G$), it follows that the set of irreducible characters forms an orthonormal basis of the center of $F(G)$. □

Corollary 9.3.9 Let $G$ be a finite group, and $F$ an algebraically closed field of characteristic $0$. Then a representation $\rho$ over $F$ is equivalent to a representation $\eta$ over $F$ if and only if $\chi_\rho=\chi_\eta$.

Proof Clearly, equivalent representations have the same character. Conversely, suppose that $\rho$ and $\eta$ are representations such that $\chi_\rho=\chi_\eta$. Let $\{\rho_1, \rho_2, \ldots, \rho_r\}$ be a set of pairwise nonequivalent irreducible representations such that each irreducible representation is equivalent to one of them ($r$ is the class number of $G$). Then there exist nonnegative integers $n_1, n_2, \ldots, n_r$ and $m_1, m_2, \ldots, m_r$ such that $\rho$ is equivalent to $n_1\rho_1\oplus n_2\rho_2\oplus\cdots\oplus n_r\rho_r$, and $\eta$ is equivalent to $m_1\rho_1\oplus m_2\rho_2\oplus\cdots\oplus m_r\rho_r$. Then $\chi_\rho=n_1\chi_{\rho_1}+n_2\chi_{\rho_2}+\cdots+n_r\chi_{\rho_r}$ and $\chi_\eta=m_1\chi_{\rho_1}+m_2\chi_{\rho_2}+\cdots+m_r\chi_{\rho_r}$. Since $\chi_\rho=\chi_\eta$, by the orthogonality relation we have $n_i\cdot 1=\langle\chi_\rho,\chi_{\rho_i}\rangle=\langle\chi_\eta,\chi_{\rho_i}\rangle=m_i\cdot 1$. Since $F$ is of characteristic $0$, we get $n_i=m_i$ for all $i$. This shows that $\rho$ is equivalent to $\eta$. □

Corollary 9.3.10 Let $\rho$ be a representation of $G$ over an algebraically closed field of characteristic $0$. Then $\rho$ is irreducible if and only if $\langle\chi_\rho,\chi_\rho\rangle=1$.

Proof Let $\{\rho_1, \rho_2, \ldots, \rho_r\}$ be a complete set of pairwise nonequivalent irreducible representations. Then $\rho$ is equivalent to $m_1\rho_1\oplus m_2\rho_2\oplus\cdots\oplus m_r\rho_r$, where each $m_i$ is a nonnegative integer, and $\chi_\rho=m_1\chi_{\rho_1}+m_2\chi_{\rho_2}+\cdots+m_r\chi_{\rho_r}$. From the orthogonality relation, we find that $\langle\chi_\rho,\chi_\rho\rangle=m_1^2+m_2^2+\cdots+m_r^2$. Thus, $\langle\chi_\rho,\chi_\rho\rangle=1$ if and only if $m_i=1$ for a unique $i$ and $m_j=0$ for $j\neq i$, that is, if and only if $\rho$ is equivalent to $\rho_i$ for some $i$. □

Let $\rho$ be an irreducible representation of a finite group $G$ on $V$ over a field $F$ whose characteristic does not divide $|G|$. Then $V$ is a simple $F(G)$-module, and we have a ring homomorphism, again denoted by $\rho$, from $F(G)$ to $\mathrm{End}_F(V)$ defined by $\rho(\sum_{g\in G}\alpha_g g)=\sum_{g\in G}\alpha_g\rho(g)$. If $\sum_{g\in G}\alpha_g g$ is in the center of $F(G)$, then $\rho(\sum_{g\in G}\alpha_g g)$ commutes with $\rho(g)$ for all $g\in G$, and so it belongs to $\mathrm{End}_{F(G)}(V)$. Since $V$ is a simple $F(G)$-module (and $F$ is algebraically closed), members of $\mathrm{End}_{F(G)}(V)$ are multiplications by scalars. Let $\{C_1, C_2, \ldots, C_r\}$ be the set of distinct conjugacy classes of $G$, and let $u_i=\sum_{g\in C_i}g$. Then, as observed, $\{u_1, u_2, \ldots, u_r\}$ forms a basis of the center of $F(G)$. From the previous observation, $\rho(u_i)$ is multiplication by a scalar $\alpha_i$ (say).
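The sketch below gives a concrete instance of this observation. It is our own illustration, and its helper names are hypothetical. It takes the two-dimensional irreducible representation of $S_3$ in the basis $e_1-e_2$, $e_2-e_3$ (indices start at 0 in the code), sums $\rho(g)$ over each conjugacy class, and checks that every class sum $\rho(u_i)$ is a scalar matrix. Each entry of the output records the class size, the scalar and the check.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S3: p[i] is the image of i

def compose(p, q):
    # (p q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def standard_matrix(p):
    """2-dimensional irreducible representation of S3 in the basis e0 - e1, e1 - e2."""
    columns = []
    for u, w in [(0, 1), (1, 2)]:
        image = [0, 0, 0]
        image[p[u]] += 1
        image[p[w]] -= 1
        columns.append((image[0], -image[2]))  # coordinates of a zero-sum vector
    return [[columns[j][i] for j in range(2)] for i in range(2)]

# conjugacy classes C_i, in order of first appearance in G
classes = []
for g in G:
    if not any(g in c for c in classes):
        classes.append({compose(compose(x, g), inverse(x)) for x in G})

scalars = []
for c in classes:
    # matrix of rho(u_i), where u_i is the sum of the elements of C_i
    M = [[sum(standard_matrix(g)[i][j] for g in c) for j in range(2)] for i in range(2)]
    is_scalar = M[0][1] == 0 and M[1][0] == 0 and M[0][0] == M[1][1]
    scalars.append((len(c), M[0][0], is_scalar))
print(scalars)  # [(1, 1, True), (3, 0, True), (2, -1, True)]
```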

Proposition 9.3.11 Let $G$ be a finite group, $F$ an algebraically closed field of characteristic $0$, and $\rho$ an irreducible representation of $G$ over $F$. Then the scalars $\alpha_i$ described in the above paragraph are algebraic integers.

Proof Let $u_i=\sum_{g\in C_i}g$, where $\{C_1, C_2, \ldots, C_r\}$ is the set of all distinct conjugacy classes of $G$. Let $v\in C_k$, and let $a_{ij}^k$ denote the cardinality of the set $X_{ij}^v=\{(g,h)\in C_i\times C_j\mid gh=v\}$. If $w\in C_k$, then there is $x\in G$ such that $w=xvx^{-1}$. The map $(g,h)\mapsto(xgx^{-1},xhx^{-1})$ is clearly a bijective map from $X_{ij}^v$ to $X_{ij}^w$. Thus, the integer $a_{ij}^k$ depends only on $i$, $j$, $k$, and not on the choice of $v\in C_k$. This also shows that

$$u_iu_j=\sum_{k=1}^{r}a_{ij}^k u_k.$$

Thus,

$$\rho(u_i)\rho(u_j)=\rho(u_iu_j)=\sum_{k=1}^{r}a_{ij}^k\rho(u_k).$$

The left-hand side is multiplication by $\alpha_i\alpha_j$, and the right-hand side is multiplication by $\sum_{k=1}^{r}a_{ij}^k\alpha_k$. This shows that

$$\alpha_i\alpha_j=\sum_{k=1}^{r}a_{ij}^k\alpha_k$$

for all $i$ and $j$. We can take $C_1$ to be the conjugacy class $\{e\}$, and so $u_1=e$. Since $\rho(u_1)=\rho(e)=I_V$, $\alpha_1=1\neq 0$. The above equation shows that the column vector $(\alpha_1, \alpha_2, \ldots, \alpha_r)^t$ is an eigenvector of the matrix $[b_{jk}]$, where $b_{jk}=a_{ij}^k$, and the corresponding eigenvalue is $\alpha_i$. Thus, $\alpha_i$ is a root of the monic polynomial $\det(xI_r-[b_{jk}])$, whose coefficients are all integers (note that the $b_{jk}=a_{ij}^k$ are nonnegative integers). This shows that each $\alpha_i$ is an algebraic integer. □

Corollary 9.3.12 Let $G$ be a finite group, and $F$ an algebraically closed field of characteristic $0$. Let $\rho$ be an irreducible representation over $F$ of degree $n$. Let $g\in G$, and let $m=[G:C_G(g)]$ be the number of conjugates of $g$. Then $\frac{m\chi_\rho(g)}{n}$ is an algebraic integer (observe that every algebraically closed field of characteristic $0$ contains the field of algebraic numbers).

Proof Let $C_i$ be the conjugacy class determined by $g$. Then $\operatorname{trace}\rho(g)=\operatorname{trace}\rho(x)$ for each $x\in C_i$. Thus, $\operatorname{trace}\rho(u_i)=m\cdot\operatorname{trace}\rho(g)=m\cdot\chi_\rho(g)$. Since $\rho(u_i)$ is multiplication by $\alpha_i$, and the degree of $\rho$ is $n$, it follows that $\operatorname{trace}\rho(u_i)=n\cdot\alpha_i$. Thus, $\alpha_i=\frac{m\chi_\rho(g)}{n}$. The result follows from the above proposition. □

Corollary 9.3.13 The degree of every irreducible representation of a finite group $G$ over an algebraically closed field $F$ of characteristic $0$ divides the order of the group.

Proof Let $\rho$ be an irreducible representation of a finite group $G$ of degree $n$ over an algebraically closed field $F$ of characteristic $0$. Let $\{C_1, C_2, \ldots, C_r\}$ be the set of all distinct conjugacy classes of $G$. Then $\chi_\rho$ is constant on each $C_i$. Let $\beta_i$ be the value of $\chi_\rho$ on $C_i$. From the above corollary, it follows that $\frac{m_i\beta_i}{n}$ is an algebraic integer, where $m_i$ is the number of elements in $C_i$. Let $p$ be the permutation of $\{1, 2, \ldots, r\}$ defined by $C_i^{-1}=C_{p(i)}$, where $C_i^{-1}$ is the set of inverses of the members of $C_i$ (note that $C_i^{-1}$ is again a conjugacy class). By the orthogonality theorem, we have

$$1=\langle\chi_\rho,\chi_\rho\rangle=\frac{1}{|G|}\sum_{g\in G}\chi_\rho(g)\chi_\rho(g^{-1}).$$

Thus, $|G|=\sum_{i=1}^{r}m_i\beta_i\beta_{p(i)}$. In turn,

$$\frac{|G|}{n}=\sum_{i=1}^{r}\frac{m_i\beta_i}{n}\,\beta_{p(i)}.$$

From the previous result, $\frac{m_i\beta_i}{n}$ is an algebraic integer. Also, each $\beta_j$ is the trace of $\rho(g)$ for $g\in C_j$. Since $G$ is finite, $g^t=e$ for some $t$, and so $\rho(g)^t=I_V$. This shows that the eigenvalues of $\rho(g)$, being roots of unity, are algebraic integers. Since sums of algebraic integers are algebraic integers, $\beta_{p(i)}=\operatorname{trace}\rho(g)$, the sum of the eigenvalues of $\rho(g)$ for $g\in C_{p(i)}$, is an algebraic integer. Again, since sums and products of algebraic integers are algebraic integers, it follows from the above identity that $\frac{|G|}{n}$ is an algebraic integer. We also know that a rational number is an algebraic integer if and only if it is an integer. Thus, $\frac{|G|}{n}$ is an integer. This means that $n$ divides $|G|$. □
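The arguments of Proposition 9.3.11 and Corollaries 9.3.12 and 9.3.13 can be traced in the smallest non-abelian case. The sketch below is our own illustration. The dictionary of characters encodes standard facts about $S_3$ and is not computed from the text. For each irreducible character, the sketch:

- computes the structure constants $a_{ij}^k$ by counting pairs;
- forms $\alpha_i=m_i\chi_\rho(g_i)/n$;
- checks the identity $\alpha_i\alpha_j=\sum_k a_{ij}^k\alpha_k$, that the $\alpha_i$ are integers, and that $n$ divides $|G|$.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S3: p[i] is the image of i

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def fixed_points(p):
    return sum(1 for i, image in enumerate(p) if i == image)

classes = []
for g in G:
    if not any(g in c for c in classes):
        classes.append({compose(compose(x, g), inverse(x)) for x in G})
r = len(classes)
reps = [min(c) for c in classes]  # any element of C_k would do for v

# a[i][j][k] = number of pairs (g, h) in C_i x C_j with g h = v, where v = reps[k]
a = [[[sum(1 for g in classes[i] for h in classes[j] if compose(g, h) == reps[k])
       for k in range(r)] for j in range(r)] for i in range(r)]

characters = {  # name: (character, degree n); standard facts about S3
    "trivial": (lambda g: 1, 1),
    "sign": (sign, 1),
    "standard": (lambda g: fixed_points(g) - 1, 2),
}

results = {}
for name, (chi, n) in characters.items():
    # alpha_i = m_i * chi(g_i) / n, as in the proof of Corollary 9.3.12
    alpha = [Fraction(len(classes[i]) * chi(reps[i]), n) for i in range(r)]
    relation = all(alpha[i] * alpha[j] == sum(a[i][j][k] * alpha[k] for k in range(r))
                   for i in range(r) for j in range(r))
    integral = all(x.denominator == 1 for x in alpha)
    results[name] = ([int(x) for x in alpha], relation, integral, len(G) % n == 0)
    print(name, results[name])
```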

The following is a simple application of representation theory.

Proposition 9.3.14 Let $G$ be a finite simple group of order $n$, and $p$ a prime such that the number of conjugacy classes of $G$ is greater than $\frac{n}{p^2}$. Then the Sylow $p$-subgroups of $G$ are abelian.

Proof We may assume that $p^2$ divides $n$ (otherwise the Sylow $p$-subgroups have order at most $p$, and so are abelian). Since $G$ is simple, every nontrivial complex representation of $G$ is injective. Now, $n=1+n_2^2+n_3^2+\cdots+n_r^2$, where $n_2, n_3, \ldots, n_r$ are the degrees of the nontrivial irreducible representations, and $r$ is the class number of $G$. Since the class number $r$ is greater than $\frac{n}{p^2}$, there is $i\geq 2$ such that $n_i<p$ (otherwise $n\geq 1+(r-1)p^2$, and since $p^2$ divides $n$, this forces $n\geq rp^2$, contradicting $r>\frac{n}{p^2}$). Consider the corresponding irreducible representation $\rho_i$. Let $P$ be a Sylow $p$-subgroup of $G$, and consider the restriction $\rho_i|_P$ of $\rho_i$ to $P$. Then $\rho_i|_P$ is a faithful representation of $P$ of degree less than $p$. The degrees of the irreducible components of $\rho_i|_P$ must divide $|P|=p^t$, where $t\geq 2$. Since the degree of $\rho_i|_P$ is less than $p$, all irreducible components of $\rho_i|_P$ are of degree $1$. Since $\rho_i|_P$ is faithful, $P$ embeds into a product of abelian groups, and so $P$ is abelian. □

Proposition 9.3.15 Let $\chi_\rho$ be an irreducible character afforded by the irreducible representation $\rho$ of degree $n$ of a finite group $G$ over an algebraically closed field $F$ of characteristic $0$. Let $g$ be an element of $G$ with $m$ conjugates such that $m$ and $n$ are co-prime. Then $\frac{\chi_\rho(g)}{n}$ is an algebraic integer.

Proof By the Euclidean algorithm, there exist integers $u$ and $v$ such that $um+vn=1$. Hence

$$\frac{\chi_\rho(g)}{n}=u\cdot\frac{m\chi_\rho(g)}{n}+v\cdot\chi_\rho(g).$$

By Corollary 9.3.12, $\frac{m\chi_\rho(g)}{n}$ is an algebraic integer. Also, since the eigenvalues of $\rho(g)$ are roots of unity (note that $\rho(g)^{|G|}=I_V$), $\chi_\rho(g)$ is an algebraic integer. Since sums and products of algebraic integers are algebraic integers, it follows that $\frac{\chi_\rho(g)}{n}$ is an algebraic integer. □

Proposition 9.3.16 Under the hypothesis of the above proposition, either $\chi_\rho(g)=0$, or $\rho(g)$ is multiplication by a scalar.

Proof We first show that $\rho(g)$ is multiplication by a scalar if and only if all the eigenvalues of $\rho(g)$ are the same. One way is evident. Suppose that all the eigenvalues of $\rho(g)$ are the same. Consider the restriction $\rho|_{\langle g\rangle}$ of $\rho$ to the cyclic subgroup $\langle g\rangle$ generated by $g$. By Maschke's theorem, $\rho|_{\langle g\rangle}$ is a direct sum of irreducible representations of $\langle g\rangle$. Since the irreducible representations of the abelian group $\langle g\rangle$ over $F$ are one-dimensional, $\rho(g)$ is diagonalizable. Since all the eigenvalues of $\rho(g)$ are the same, it is multiplication by a scalar.

Now, suppose that $\rho(g)$ is not multiplication by a scalar. Then the eigenvalues of $\rho(g)$ are not all the same. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $\rho(g)$. Then each $\lambda_i$ is a root of unity, and so $|\lambda_i|=1$. Since they are not all the same, $|\chi_\rho(g)|=|\sum_{i=1}^{n}\lambda_i|<n$. By the previous proposition, $\frac{\chi_\rho(g)}{n}$ is an algebraic integer such that $\bigl|\frac{\chi_\rho(g)}{n}\bigr|<1$. Let $\sigma$ be an automorphism of a finite Galois extension $K$ of $\mathbb{Q}$ containing each $\lambda_i$. Then $\sigma(\frac{\chi_\rho(g)}{n})=\frac{1}{n}\sum_{i=1}^{n}\sigma(\lambda_i)$ is also an algebraic integer whose modulus is less than $1$, since the $\sigma(\lambda_i)$ are again roots of unity, not all equal. Let $z=\prod_{\sigma\in\mathrm{Aut}(K)}\sigma(\frac{\chi_\rho(g)}{n})$. Then $z$ is an algebraic integer and $|z|<1$. It is clear that $\sigma(z)=z$ for all $\sigma\in\mathrm{Aut}(K)$. Since $K$ is a Galois extension of $\mathbb{Q}$, it follows that $z\in\mathbb{Q}$. The only rational algebraic integers are integers, and therefore $z\in\mathbb{Z}$. Since $|z|<1$, it follows that $z=0$. Hence some factor $\sigma(\frac{\chi_\rho(g)}{n})$ is $0$, and since $\sigma$ is injective, $\frac{\chi_\rho(g)}{n}=0$. Thus $\chi_\rho(g)=0$. □

The following is a criterion, due to Burnside, for the non-simplicity of a finite group.

Theorem 9.3.17 Let $G$ be a finite group which has a conjugacy class containing $p^m$ elements, where $p$ is a prime and $m\geq 1$. Then $G$ cannot be simple.

Proof If $G$ is abelian, every conjugacy class has one element, and there is nothing to do. Assume that $G$ is non-abelian. Suppose that $G$ is simple, and let $g\in G$ be such that there are exactly $p^m$ conjugates of $g$. Let $\rho$ be a nontrivial irreducible complex representation of degree $n$ such that $p$ does not divide $n$. Since $G$ is simple and $\rho$ is nontrivial, $\rho$ is injective, and so $G$ is isomorphic to $\rho(G)$. Suppose that $\chi_\rho(g)\neq 0$. Since $p^m$ and $n$ are co-prime, the previous result shows that $\rho(g)$ is multiplication by a scalar. This means that $\rho(g)$ is in the center of $\rho(G)$. Since $\rho(G)$ is simple and non-abelian, its center is trivial, and so $\rho(g)=I_V$. Again, since $\rho$ is injective, $g=e$, the identity of $G$. This contradicts the supposition that $g$ has exactly $p^m>1$ conjugates. Hence, $\chi_\rho(g)=0$ for every nontrivial irreducible $\rho$ whose degree is not divisible by $p$. Let $\chi_{reg}$ denote the character of the regular representation $\rho_{reg}$. Then

$$\chi_{reg}=\chi_1+n_2\chi_2+\cdots+n_r\chi_r,$$

where $\chi_1$ is the trivial character, and $\chi_i$ is the irreducible character of degree $n_i$. From Proposition 9.3.16 and the previous observation, every term $n_i\chi_i(g)$ with $p\nmid n_i$ vanishes. Hence $\chi_{reg}(g)=1+p\beta$, where $\beta=\sum_{p\mid n_i}\frac{n_i}{p}\chi_i(g)$ is an algebraic integer; that is, $\chi_{reg}(g)\equiv 1\pmod p$. On the other hand, since $g\neq e$, the matrix of $\rho_{reg}(g)$ with respect to the basis $G$ of $F(G)$ has no nonzero entry on the diagonal. Hence, $\chi_{reg}(g)=0$. Then $\beta=-\frac{1}{p}$ would be a rational algebraic integer which is not an integer. This is a contradiction. □
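Proposition 9.3.16, and its coprimality hypothesis, can be seen at work in $G=S_4$. The sketch below uses the three-dimensional representation of $S_4$ on $\{v\in\mathbb{Q}^4:\sum v_i=0\}$. We assume, as a standard fact not proved here, that its character is $g\mapsto(\text{fixed points of } g)-1$. The sketch first confirms $\langle\chi,\chi\rangle=1$ (Corollary 9.3.10). Then, for every conjugacy class whose size $m$ is prime to $n=3$, it checks that either $\chi(g)=0$ or $g=e$. This representation is faithful and $S_4$ has trivial center, so $\rho(g)$ is scalar only for $g=e$. The class of 3-cycles has $8=2^3$ elements, so by Theorem 9.3.17 $S_4$ is not simple, as expected.

```python
from fractions import Fraction
from itertools import permutations
from math import gcd

G = list(permutations(range(4)))  # S4: p[i] is the image of i
identity = tuple(range(4))

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def chi(p):
    # assumed character of S4 on {v in Q^4 : sum of coordinates = 0}
    return sum(1 for i, image in enumerate(p) if i == image) - 1

n = chi(identity)  # degree 3
norm = Fraction(sum(chi(g) * chi(inverse(g)) for g in G), len(G))  # 1 means irreducible

classes = []
for g in G:
    if not any(g in c for c in classes):
        classes.append({compose(compose(x, g), inverse(x)) for x in G})

checked = []
for c in classes:
    g, m = min(c), len(c)
    if gcd(m, n) == 1:
        # Proposition 9.3.16: chi(g) = 0 or rho(g) is a scalar.  This representation
        # is faithful and S4 has trivial center, so rho(g) is scalar only for g = e.
        checked.append((g, m, chi(g), chi(g) == 0 or c == {identity}))
print(norm, checked)
```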

Corollary 9.3.18 (Burnside) Every group of order $p^rq^s$, where $p$ and $q$ are primes, is solvable.

Proof Assume the contrary. Let $G$ be a counterexample of the smallest order. If $H$ is a nontrivial proper normal subgroup of $G$, then $H$ and $G/H$ are both solvable (by the minimality of $|G|$), and this would mean that $G$ is solvable. Hence, $G$ is simple (and non-abelian). Suppose that $|G|=p^rq^s$. Clearly, $r, s\geq 1$, since groups of prime power order are solvable. Let $Q$ be a Sylow $q$-subgroup of $G$, and let $g\neq e$ be a member of the center of $Q$. Then $C_G(g)$ contains $Q$, and it is not $G$ (for then $g$ would be in the center of $G$, contradicting the simplicity of $G$). This shows that $[G:C_G(g)]=p^m$ for some $m\geq 1$. Again, from the previous theorem, $G$ cannot be simple, a contradiction. □

Remark 9.3.19 The representation-theoretic proof of the above result came quite early in the twentieth century. A proof of the result which avoids representation theory was given by Thompson, Bender, and Goldschmidt much later, around 1976. Now, we have a more general result due to Kegel and Wielandt, which says that a finite group which is the product of two nilpotent subgroups is solvable.

Exercises

9.3.1 Show by means of an example that nonequivalent representations over a field of positive characteristic may have the same character.

9.3.2 Show by means of an example that the degree of an irreducible representation over a field $F$ need not divide the order of the group.

9.3.3 Find all irreducible representations of $Q_8$ and $D_8$ over $\mathbb{C}$.

9.3.4 Can we realize all complex irreducible representations of a finite group $G$ over $\mathbb{Q}$?

9.3.5 Determine the number of irreducible complex representations of $S_4$, and also their degrees. Find them explicitly.

9.3.6 Determine the number of irreducible complex representations of $A_4$, and also find their degrees.

9.3.7 Let $F$ be an algebraically closed field of characteristic $0$, and $G$ a finite group. Let $\rho$ and $\eta$ be the representations associated to $F(G)$-modules $V$ and $W$, respectively. Show that $\langle\chi_\rho,\chi_\eta\rangle=\dim_F(\mathrm{Hom}_{F(G)}(V,W))$.

9.3.8 Let $G=\langle a\rangle$ be a cyclic group of prime order $p$. Consider the group algebra $\mathbb{Q}(G)$. Let $\sigma=\frac{1}{p}\sum_{i=0}^{p-1}a^i$ and $\tau=a-\sigma$. Show that the following hold.
(i) $\sigma^2=\sigma$.
(ii) $\mathbb{Q}\cdot\sigma$ is a subfield isomorphic to $\mathbb{Q}$.
(iii) $(1-\sigma)^2=1-\sigma$.
(iv) $\mathbb{Q}\cdot(1-\sigma)$ is also a subfield isomorphic to $\mathbb{Q}$.
(v) $\tau$ is a root of the irreducible polynomial $X^{p-1}+X^{p-2}+\cdots+X+1$ over $\mathbb{Q}\cdot(1-\sigma)$.
(vi) The subring $F$ of $\mathbb{Q}(G)$ generated by $\mathbb{Q}\cdot(1-\sigma)$ and $\tau$ is isomorphic to $\mathbb{Q}(e^{\frac{2\pi i}{p}})$.
(vii) Every element of $\mathbb{Q}(G)$ can be written uniquely as the sum of an element of $\mathbb{Q}\cdot\sigma$ and an element of $F$.
(viii) The product of any element of $\mathbb{Q}\cdot\sigma$ with an element of $F$ is $0$.
(ix) $\mathbb{Q}(G)$ is isomorphic to $\mathbb{Q}\times\mathbb{Q}(e^{\frac{2\pi i}{p}})$.

9.3.9 Determine irreducible representations of a cyclic group of order 3 over Q.

9.3.10 Let $G=H\times K$ be the direct product of finite groups $H$ and $K$. Let $F$ be an algebraically closed field of characteristic $0$. Let $\rho$ and $\eta$ be irreducible representations of $H$ and $K$ on vector spaces $V$ and $W$, respectively. Let $\rho\otimes\eta$ be the representation of $G$ on $V\otimes W$ defined by $(\rho\otimes\eta)(h,k)(v\otimes w)=\rho(h)(v)\otimes\eta(k)(w)$. Suppose that $\mu$ and $\nu$ are also irreducible representations of $H$ and $K$, respectively. Show that

$$\langle\chi_{\rho\otimes\eta},\chi_{\mu\otimes\nu}\rangle=1 \text{ if } \rho \text{ is equivalent to } \mu \text{ and } \eta \text{ is equivalent to } \nu,$$

and

$$\langle\chi_{\rho\otimes\eta},\chi_{\mu\otimes\nu}\rangle=0 \text{ otherwise}.$$

Deduce that these are irreducible representations, and every irreducible representation of $G$ is obtained in this manner.

9.3.11 Show that the Grothendieck group of the group algebra $\mathbb{C}(G)$ is the character ring $Ch(G)$ of $G$. Find the Grothendieck groups of $\mathbb{C}(\mathbb{Z}_m)$, $\mathbb{C}(V_4)$, $\mathbb{C}(S_3)$, $\mathbb{C}(Q_8)$ and $\mathbb{C}(D_8)$.

9.4 Induced Representations

Let $H$ be a subgroup of a group $G$, and $\rho$ a representation of $G$. Then the restriction of $\rho$ to $H$, denoted by $\rho_H$, is a representation of $H$. One may observe, by means of an example, that the restriction of an irreducible representation need not be irreducible (the two-dimensional irreducible representation of $S_3$, when restricted to $A_3$, is not irreducible). Now, we describe the adjoint to the restriction. Let $H$ be a subgroup of $G$, and $\rho$ a representation of $H$ on $W$. Then $W$ is a left $F(H)$-module. Since $F(H)$ is a subalgebra of $F(G)$, we see that $F(G)$ is an $(F(G), F(H))$-bimodule. Hence $V=F(G)\otimes_{F(H)}W$ is a left $F(G)$-module. This gives us a representation of $G$ which we denote by $\rho^G$, and call the representation of $G$ induced by the representation $\rho$ of the subgroup $H$ of $G$. Let $S$ be a left transversal to $H$ in $G$. Then $F(G)$, as a right $F(H)$-module, can be written as $\bigoplus_{x\in S}xF(H)$. Thus, $V$ can be written as

$$V=\bigoplus_{x\in S}x\otimes W.$$

Consider an element $x\otimes w$, $w\in W$, $x\in S$, in one of the direct summands of $V$. Suppose that $gx=yh$, $h\in H$ and $y\in S$. Then

$$\rho^G(g)(x\otimes w)=g(x\otimes w)=gx\otimes w=yh\otimes w=y\otimes hw=y\otimes w',$$

where $w'=hw=\rho(h)(w)$. Clearly, $\dim V=\dim W\cdot[G:H]$. Thus, $\deg\rho^G=\deg\rho\cdot[G:H]$. If $\{w_1, w_2, \ldots, w_r\}$ is a basis of $W$ and $S=\{x_1, x_2, \ldots, x_s\}$, then $\{x_i\otimes w_j\mid 1\leq i\leq s,\ 1\leq j\leq r\}$ is a basis of $V$. The character $\chi_{\rho^G}$ of $G$ is denoted by $\chi_\rho^G$, and it is called the induced character.

Proposition 9.4.1 Let $H$ be a subgroup of a finite group $G$ and $F$ a field whose characteristic does not divide $|G|$. Let $\rho$ be a representation of $H$. Then

$$\chi_\rho^G(g)=\frac{1}{|H|}\sum_{x\in G}\dot{\chi}_\rho(xgx^{-1}),$$

where $\dot{\chi}_\rho=\chi_\rho$ on $H$ and $\dot{\chi}_\rho=0$ on $G-H$.

Proof Let $\rho$ be a representation of $H$ on $W$, and $\{w_1, w_2, \ldots, w_r\}$ a basis of $W$. Let $S=\{x_1, x_2, \ldots, x_s\}$ be a left transversal to $H$ in $G$, and $V=F(G)\otimes_{F(H)}W$. Then, as observed, $\{x_i\otimes w_j\}$ forms a basis of $V$. Now, the basis element $x_i\otimes w_j$ contributes to the diagonal entries of $\rho^G(g)$ only if $gx_i=x_ih$ for some $h\in H$, and then $\rho^G(g)(x_i\otimes w_j)=x_i\otimes\rho(h)(w_j)$. Thus, for such an $x_i$, the sum of the contributions to the diagonal entries of $\rho^G(g)$ corresponding to the set $\{x_i\otimes w_j\mid 1\leq j\leq r\}$ is $\chi_\rho(h)=\chi_\rho(x_i^{-1}gx_i)$. This shows that

$$\chi_\rho^G(g)=\sum_{i=1}^{s}\dot{\chi}_\rho(x_i^{-1}gx_i)=\frac{1}{|H|}\sum_{x\in G}\dot{\chi}_\rho(x^{-1}gx).$$

The last equality holds because $\chi_\rho$ is a class function on $H$. If $x=x_ih$ with $h\in H$, then $x^{-1}gx=h^{-1}(x_i^{-1}gx_i)h$, so all $|H|$ elements of $x_iH$ contribute the same term. Replacing $x$ by $x^{-1}$ turns this sum into the one in the statement. □
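The construction of $\rho^G$ through a left transversal is easy to mechanize. The sketch below is our own illustration, and all its names are hypothetical. It uses:

- $G=S_3$;
- $H=\{e,(1\,2)\}$, written in the code as the transposition swapping points 0 and 1;
- $\rho=\psi$, the sign character of $H$.

For each $g$, the sketch builds the matrix of $\psi^G(g)$ on the basis $\{x\otimes 1: x\in S\}$ using $gx=yh$. It then checks that $g\mapsto\psi^G(g)$ is multiplicative and compares each trace with the formula of Proposition 9.4.1.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S3: p[i] is the image of i
identity = (0, 1, 2)
H = [identity, (1, 0, 2)]         # subgroup generated by the transposition of 0 and 1

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def psi(h):
    # sign character of H (a one-dimensional representation)
    return 1 if h == identity else -1

# left transversal S: one representative of each left coset xH
S = []
for g in G:
    if not any(compose(inverse(x), g) in H for x in S):
        S.append(g)

def induced_matrix(g):
    """Matrix of psi^G(g) in the basis x (x) 1, x in S.

    If g x = y h with y in S and h in H, then g(x (x) 1) = y (x) psi(h),
    so column x has the single nonzero entry psi(h), in row y.
    """
    M = [[0] * len(S) for _ in S]
    for col, x in enumerate(S):
        gx = compose(g, x)
        for row, y in enumerate(S):
            h = compose(inverse(y), gx)
            if h in H:
                M[row][col] = psi(h)
    return M

def induced_character_formula(g):
    """Proposition 9.4.1: |H|^{-1} * sum over x in G of psi-dot(x^{-1} g x)."""
    total = 0
    for x in G:
        c = compose(compose(inverse(x), g), x)
        if c in H:  # psi-dot vanishes outside H
            total += psi(c)
    return Fraction(total, len(H))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

is_hom = all(induced_matrix(compose(g1, g2)) == matmul(induced_matrix(g1), induced_matrix(g2))
             for g1 in G for g2 in G)
traces_match = all(sum(induced_matrix(g)[i][i] for i in range(len(S)))
                   == induced_character_formula(g) for g in G)
print(len(S), is_hom, traces_match)  # 3 True True
```

The degree of $\psi^G$ is $|S|=[G:H]=3$, in line with $\deg\rho^G=\deg\rho\cdot[G:H]$.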

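Theorem 9.4.2 (Frobenius reciprocity law) Let $H$ be a subgroup of a finite group $G$. Let $\rho$ be a representation of $H$, and $\eta$ a representation of $G$ over a field whose characteristic does not divide $|G|$. Then

$$\langle\chi_\rho^G,\chi_\eta\rangle_G=\langle\chi_\rho,\chi_{\eta_H}\rangle_H,$$

where $\eta_H$ denotes the restriction of $\eta$ to $H$, $\langle\ ,\ \rangle_G$ denotes the inner product in $F(G)$, and $\langle\ ,\ \rangle_H$ the inner product in $F(H)$.

Proof We have

$$\begin{aligned}\langle\chi_\rho^G,\chi_\eta\rangle_G&=\frac{1}{|G|}\sum_{g\in G}\chi_\rho^G(g)\chi_\eta(g^{-1})=\frac{1}{|G|}\sum_{g\in G}\Bigl(\frac{1}{|H|}\sum_{x\in G}\dot{\chi}_\rho(x^{-1}gx)\chi_\eta(x^{-1}g^{-1}x)\Bigr)\\&=\frac{1}{|H|}\sum_{y\in G}\dot{\chi}_\rho(y)\chi_\eta(y^{-1})=\frac{1}{|H|}\sum_{h\in H}\chi_\rho(h)\chi_{\eta_H}(h^{-1})=\langle\chi_\rho,\chi_{\eta_H}\rangle_H.\end{aligned}$$

Here we used that $\chi_\eta$ is a class function, and that for each fixed $x$, the element $y=x^{-1}gx$ runs over $G$ as $g$ does. □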
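Frobenius reciprocity is easy to test numerically. The sketch below is our own illustration, with the same conventions as before: $G=S_3$, $H$ generated by a transposition, and $\psi$ the sign character of $H$. The left-hand side uses the induced-character formula of Proposition 9.4.1. The right-hand side restricts each character $\eta$ of $G$ to $H$. The characters of $S_3$ are again the standard ones, assumed rather than derived.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S3: p[i] is the image of i
identity = (0, 1, 2)
H = [identity, (1, 0, 2)]         # generated by the transposition of 0 and 1

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def fixed_points(p):
    return sum(1 for i, image in enumerate(p) if i == image)

def psi(h):
    return 1 if h == identity else -1  # sign character of H

def induced(g):
    """Proposition 9.4.1: |H|^{-1} * sum over x in G of psi-dot(x^{-1} g x)."""
    total = 0
    for x in G:
        c = compose(compose(inverse(x), g), x)
        if c in H:
            total += psi(c)
    return Fraction(total, len(H))

def inner(alpha, beta, group):
    return Fraction(sum(alpha(g) * beta(inverse(g)) for g in group), len(group))

etas = {"trivial": lambda g: 1, "sign": sign, "standard": lambda g: fixed_points(g) - 1}
results = {}
for name, eta in etas.items():
    lhs = inner(induced, eta, G)  # <psi^G, eta>_G
    rhs = inner(psi, eta, H)      # <psi, eta_H>_H: summing over H restricts eta
    results[name] = (lhs, rhs)
    print(name, lhs, rhs)
```

The three lines printed should show equal pairs. Read as multiplicities, they also say that $\psi^G$ decomposes as the sign character plus the two-dimensional character.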

In practice, to determine the irreducible representations of a group $G$, we look at the representations of some special types of subgroups, induce them to $G$, and then decompose the induced representations into irreducible ones.

Remark 9.4.3 Observe that Frobenius reciprocity holds even if we replace characters by class functions.

Example 9.4.4 Let $H$ be a subgroup of a finite group $G$. Let $S$ be a left transversal to $H$ in $G$. Let $1_H$ denote the trivial representation of $H$ over a field $F$ whose characteristic does not divide $|G|$, and $V=\bigoplus_{x\in S}x\otimes F$ the vector space over $F$ with $\tilde{S}=\{x\otimes 1\mid x\in S\}$ as a basis. Then the induced representation $1_H^G$ is the representation of $G$ on $V$ given by

$$1_H^G(g)(x\otimes 1)=gx\otimes 1=yk\otimes 1=y\otimes 1_H(k)(1)=y\otimes 1,$$

where $gx=yk$, $y\in S$, $k\in H$. The character $\chi_{1_H^G}$ is given by

$$\chi_{1_H^G}(g)=\operatorname{trace}1_H^G(g)=|\{x\in S\mid gx=xk \text{ for some } k\in H\}|=|\{x\in S\mid gxH=xH\}|$$

for all $g\in G$. Using the Frobenius reciprocity law,

$$\langle\chi_{1_H^G},\chi_{1_G}\rangle_G=\langle\chi_{1_H},\chi_{1_G}|_H\rangle_H=\langle\chi_{1_H},\chi_{1_H}\rangle_H=1.$$

It follows that the trivial representation $1_G$ of $G$ appears exactly once in the decomposition of $1_H^G$ as a direct sum of irreducible representations. More explicitly, $1_H^G=1_G\oplus s_H(G)$, where $s_H(G)$ is a representation of $G$ having no summand equivalent to $1_G$. We shall call $s_H(G)$ the standard representation of $G$ induced by the subgroup $H$ of $G$. What is $s_{\{e\}}(G)$? Describe the representation $s_H(G)$. Further,

$$\frac{1}{|G|}\sum_{g\in G}\chi_{1_H^G}(g)\chi_{1_G}(g^{-1})=1.$$

In turn,

$$\frac{1}{|G|}\sum_{g\in G}|\{x\in S\mid gxH=xH\}|=1.$$

Now, let $\theta$ be a transitive left action of $G$ on $X$. Then $\theta$ induces a representation $\rho$ of $G$ on the vector space $FX$ over $F$ with $X$ as a basis. If $H$ is the isotropy subgroup of the action at a point $x_1\in X$, then $X$ can be identified with a left transversal to $H$ in $G$ (via $xH\mapsto x\theta x_1$), and the representation $\rho$ is equivalent to $1_H^G$. Thus, in this case,

$$\frac{1}{|G|}\sum_{g\in G}|\{x\in X\mid g\theta x=x\}|=1.$$

More generally, let $G$ be a finite group which acts on a finite set $X$ through a left action $\theta$. Let $F$ be a field whose characteristic does not divide $|G|$, and $V$ a vector space over $F$ with $X$ as a basis. The action $\theta$ of $G$ on $X$ determines a representation $\rho$ of $G$ on $V$. Let $\{X_1, X_2, \ldots, X_r\}$ be the set of distinct orbits of the action. The action of $G$ on $X$ induces transitive actions of $G$ on each $X_i$. Further, $V=FX=FX_1\oplus FX_2\oplus\cdots\oplus FX_r$, and $\rho$ induces representations $\rho_i$ of $G$ on $FX_i$ for each $i$, with $\rho=\rho_1\oplus\rho_2\oplus\cdots\oplus\rho_r$. Let $H_i$ denote the isotropy subgroup of the action at a point $x_i\in X_i$. Then, as observed above, $\rho_i$ is equivalent to $1_{H_i}^G$ for each $i$, and $\chi_\rho=\sum_i\chi_{\rho_i}$. In turn, using Frobenius reciprocity,

$$\langle\chi_\rho,\chi_{1_G}\rangle_G=\sum_{i}\langle\chi_{1_{H_i}^G},\chi_{1_G}\rangle_G=\sum_{i}\langle\chi_{1_{H_i}},\chi_{1_{H_i}}\rangle_{H_i}=r.$$

We get

$$\frac{1}{|G|}\sum_{g\in G}|\{x\in X\mid g\theta x=x\}|=r,$$

where $r$ is the number of orbits of the action (see Exercise 9.1.11 of Algebra 1). Also note that $1_{\{e\}}^G$ is the regular representation $\rho_{reg}$ of $G$.

Example 9.4.5 Let $G$ be a finite group which acts transitively on a finite set $X$ through a left action $\theta$. Let $H$ denote the isotropy subgroup of the action at a point $x\in X$. Then $H$ also acts on $X$. Let $\rho$ denote the representation of $G$ associated to the action $\theta$. Then $\rho=1_H^G$, and the representation of $H$ associated to the induced action of $H$ on $X$ is the restriction $1_H^G|_H$. It follows from the discussion in the above example that the number $r$ of $H$-orbits of the action is given by

$$r=\langle\chi_{1_H^G|_H},\chi_{1_H}\rangle_H=\langle\chi_{1_H^G},\chi_{1_H^G}\rangle_G=\frac{1}{|G|}\sum_{g\in G}\chi_\rho(g)\chi_\rho(g^{-1}).$$

Further, $\chi_\rho(g)=\operatorname{trace}\rho(g)$ is the number of fixed points of the action of the element $g$ on $X$, which is the same as the number of fixed points of $g^{-1}$. This shows that $\chi_\rho(g)=\chi_\rho(g^{-1})$. Thus,

$$r=\frac{1}{|G|}\sum_{g\in G}(\chi_\rho(g))^2=\frac{1}{|G|}\sum_{g\in G}\bigl(|\{x\in X\mid g\theta x=x\}|\bigr)^2.$$

Let us further assume that $G$ acts doubly transitively on $X$. Then the isotropy subgroup $H$ of the action at $x_0\in X$ acts transitively on $X-\{x_0\}$, and so the number of orbits of the action of $H$ on $X$ is $2$. From the above discussion, it follows that

$$\frac{1}{|G|}\sum_{g\in G}\bigl(|\{x\in X\mid g\theta x=x\}|\bigr)^2=2$$

(see Exercise 9.1.23, Algebra 1). Also $1_H^G=1_G\oplus s_H(G)$, where $1_G$ does not appear as a summand in $s_H(G)$. Hence, since $\langle\chi_{1_H^G},\chi_{1_H^G}\rangle_G=2$ and $\langle\chi_{1_G},\chi_{s_H(G)}\rangle_G=0$,

$$\langle\chi_{s_H(G)},\chi_{s_H(G)}\rangle_G=1.$$

This shows that, over an algebraically closed field of characteristic $0$, the standard representation $s_H(G)$ of $G$ is irreducible provided that the action of $G$ on $X$ is doubly transitive (for example, $S_n$ and $A_n$, $n\geq 4$, act doubly transitively on a set containing $n$ elements).
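The counting formulas of Examples 9.4.4 and 9.4.5 can be checked by brute force. The sketch below, our own illustration, uses two actions on four points:

- the natural action of $S_4$, which is doubly transitive;
- the action of the subgroup of $S_4$ fixing one point, which has two orbits.

It averages the numbers of fixed points and their squares. It also averages the squares of the character $g\mapsto|\mathrm{Fix}(g)|-1$ of $s_H(S_4)$, which gives its norm because this character is real-valued.

```python
from fractions import Fraction
from itertools import permutations

def fixed(p):
    # number of points fixed by the permutation p
    return sum(1 for i, image in enumerate(p) if i == image)

def average(values):
    values = list(values)
    return Fraction(sum(values), len(values))

S4 = list(permutations(range(4)))
# subgroup of S4 fixing the point 3: its orbits on {0, 1, 2, 3} are {0, 1, 2} and {3}
S3_in_S4 = [p for p in S4 if p[3] == 3]

orbits_S4 = average(fixed(g) for g in S4)          # transitive action: 1 orbit
orbits_S3 = average(fixed(g) for g in S3_in_S4)    # two orbits
second_moment = average(fixed(g) ** 2 for g in S4) # doubly transitive: 2
# character of s_H(S4) is fixed(g) - 1; it satisfies chi(g) = chi(g^{-1})
norm_standard = average((fixed(g) - 1) ** 2 for g in S4)  # 1, so irreducible

print(orbits_S4, orbits_S3, second_moment, norm_standard)  # 1 2 2 1
```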

Let $H$ and $K$ be subgroups of a group $G$. A subset $KgH=\{kgh\mid k\in K \text{ and } h\in H\}$ is called a $(K,H)$ double coset. The set of all $(K,H)$ double cosets will be denoted by $[K,G,H]$. What are $[\{e\},G,H]$, $[H,G,\{e\}]$ and $[G,G,H]$? It is easily observed that $[K,G,H]$ is a partition of $G$. A set obtained by choosing one and only one member from each $(K,H)$ double coset is called a double coset representative system. For convenience, we choose $e$ to represent the double coset $KH$. Let $S$ be a left transversal to $H$ in $G$. Then $[K,G,H]=\{KsH\mid s\in S\}$. Further, $G$, and so also $K$, acts on $S$ (identified with the set of left cosets of $H$) in a natural manner, and the $K$-orbit of $s\in S$ corresponds to the double coset $KsH$. It follows that the number of $K$-orbits of this action is precisely the number $|[K,G,H]|$ of $(K,H)$ double cosets. Using the arguments in Examples 9.4.4 and 9.4.5, we see that

$$|[K,G,H]|=\frac{1}{|K|}\sum_{k\in K}|\{x\in S\mid k\theta x=x\}|.$$

Since $k\theta x=x$ if and only if $kxH=xH$, that is, if and only if $k\in xHx^{-1}\cap K$, it follows that

$$|[K,G,H]|=\frac{1}{|K|}\sum_{x\in S}|xHx^{-1}\cap K|.$$
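The double coset count can be checked directly. The sketch below, our own illustration with hypothetical helper names, enumerates the sets $KgH$. It compares their number with the formula $|K|^{-1}\sum_{x\in S}|xHx^{-1}\cap K|$ for a few subgroups of $S_3$ and $S_4$.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def double_cosets(G, K, H):
    """All (K, H) double cosets KgH, computed directly as sets."""
    return {frozenset(compose(compose(k, g), h) for k in K for h in H) for g in G}

def left_transversal(G, H):
    S = []
    for g in G:
        if not any(compose(inverse(x), g) in H for x in S):
            S.append(g)
    return S

def count_by_formula(G, K, H):
    """|K|^{-1} * sum over x in S of |x H x^{-1} intersected with K|."""
    K = set(K)
    total = 0
    for x in left_transversal(G, H):
        conjugate = {compose(compose(x, h), inverse(x)) for h in H}
        total += len(conjugate & K)
    return Fraction(total, len(K))

S3 = list(permutations(range(3)))
S4 = list(permutations(range(4)))
t3 = [(0, 1, 2), (1, 0, 2)]         # <(0 1)> in S3
t4 = [(0, 1, 2, 3), (1, 0, 2, 3)]   # <(0 1)> in S4
stab = [p for p in S4 if p[3] == 3] # stabilizer of the point 3 in S4

results = {}
for name, (G, K, H) in {"S3, <(0 1)>": (S3, t3, t3),
                        "S4, <(0 1)>": (S4, t4, t4),
                        "S4, stabilizer of 3": (S4, stab, stab)}.items():
    results[name] = (len(double_cosets(G, K, H)), count_by_formula(G, K, H))
    print(name, results[name])
```

Each printed pair holds the direct count followed by the value of the formula, and the two entries of every pair should agree.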

We state a few results due to Brauer and Artin without proof.

Theorem 9.4.6 (Artin) Every character of $G$ over $\mathbb{C}$ is a rational linear combination of characters induced from characters of cyclic subgroups. □

Theorem 9.4.7 (Brauer) Every character of $G$ is an integral linear combination of characters induced from one-dimensional characters of subgroups of $G$. □

Exercises

9.4.1 Show that the regular representation $\rho_{reg}$ of $G$ is the same as the representation induced by the trivial representation of the trivial subgroup.

9.4.2 Let $H$ be a subgroup of $G$ of finite index. Let $W$ be the $F$-vector space with $(G/H)_l$ as a basis. Then the action of $G$ on $(G/H)_l$ gives rise to a representation of $G$. Show that this representation is the representation induced by the trivial representation of $H$. When can this representation be irreducible?

9.4.3 Describe the irreducible components of all representations of $S_3$ induced by the representations of proper subgroups.

9.4.4 Call a group $G$ a Frobenius group if it has a proper subgroup $H$ such that $H\cap xHx^{-1}=\{e\}$ for all $x\in G-H$. It is a fact, which can be proved using induced representation theory, that $N=G-\bigcup_{g\in G}g(H-\{e\})g^{-1}$ is a normal subgroup, called the Frobenius kernel, such that $G=HN$ and $H\cap N=\{e\}$. Show that a finite group $G$ is a Frobenius group if and only if it is a transitive nonregular permutation group in which no nonidentity element has more than one fixed point.

9.4.5 Show that $D_{4n+2}$ is a Frobenius group.

9.4.6 Let $G$ be a Frobenius group with Frobenius kernel $N$ and Frobenius complement $H$. Show that $|H|$ divides $|N|-1$.

9.4.7 Let $H$ be a cyclic group of order $n_H$. Define a map $\mu_H$ from $H$ to $\mathbb{Z}$ by $\mu_H(h)=n_H$ if $h$ is a generator of $H$, and $0$ otherwise. Show that $\mu_H$ is a class function. Let $\nu_H=\varphi(n_H)\chi_{reg}-\mu_H$, where $\chi_{reg}$ is the regular character of $H$ (note that $\nu_H$ is the zero map on the trivial cyclic group). Show that $\nu_H$ is also a class function on $H$.

9.4.8 Let $G$ be a finite group of order $m$. Let $\chi=\chi_{reg}-\chi_{1_G}$. Using Frobenius reciprocity, show that for any class function $\eta$ on $G$,

$$m\langle\chi,\eta\rangle_G=\sum_{H\in\mathcal{C}}\langle\nu_H^G,\eta\rangle_G,$$

where $\mathcal{C}$ is the set of cyclic subgroups of $G$. Deduce that $m\chi=\sum_{H\in\mathcal{C}}\nu_H^G$.

9.4.9* Let $\eta$ be a degree $1$ character of a cyclic group $H$. Show that

$$\langle\nu_H,\eta\rangle_H=\sum_{h\in X}(1-\eta(h)),$$

where $X$ is the set of generators of $H$. Using the fact that $\eta(h)$ is an algebraic integer, deduce that $\langle\nu_H,\eta\rangle_H$ is a nonnegative integer.

9.4.10 Using the above exercises, show that $\chi$ (defined in Exercise 9.4.8) is a linear combination, with nonnegative integer coefficients, of characters induced by degree $1$ characters of cyclic subgroups of $G$.

9.4.11 Let $H$ be a subgroup of a finite group $G$, and $\rho$ a representation of $H$ on a finite-dimensional vector space $W$. Let $\rho^G$ be the induced representation of $G$. Let $K$ be a subgroup of $G$ and $S$ a $(K,H)$ double coset representative system. Let $H_x$ denote the subgroup $xHx^{-1}\cap K$. Let $\rho_x$ denote the representation of $H_x$ defined by $\rho_x(a)=\rho(x^{-1}ax)$, and $\rho_x^K$ the representation of $K$ induced by $\rho_x$. Let $\mathrm{res}_K(\rho^G)$ denote the restriction of $\rho^G$ to $K$. Show that

$$\mathrm{res}_K(\rho^G)=\bigoplus_{x\in S}\rho_x^K.$$

9.4.12 Refer to Exercise 9.4.11 with $K=H$. Show that $\rho^G$ is irreducible if and only if $\rho$ is irreducible and $\langle\chi_{\rho_x},\chi_{\mathrm{res}_{H_x}(\rho)}\rangle_{H_x}=0$ for every $x\in S$ with $x\notin H$. This result is termed Mackey's irreducibility criterion.

Chapter 10 Group Extensions and Schur Multiplier

Chapter 8 was devoted to field extensions and Galois theory. This chapter centers around the study of group extensions and the Schur multiplier. The guiding problem in group theory is to classify groups up to isomorphism. The solution, in general, is beyond the reach of mathematicians. However, mathematicians keep returning to this problem. Let us restrict ourselves to the problem of classifying finite groups up to isomorphism. Every finite group has a composition series, and the composition length is an invariant of the group. If

$$G=G_1\triangleright G_2\triangleright\cdots\triangleright G_n\triangleright G_{n+1}=\{e\}$$

is a composition series of $G$, then $G_i/G_{i+1}$ is a finite simple group for each $i$. As such, the problem of classifying finite groups reduces to the following two problems:

1. Classify all finite simple groups.
2. Given a finite group $H$ and a finite simple group $K$, classify all groups $G$ (up to isomorphism) having $H$ as a normal subgroup such that $G/H$ is isomorphic to $K$.

Finite simple groups have been classified. They are of the following four types:

(i) cyclic groups of prime order,
(ii) the alternating groups $A_n$, $n\geq 5$,
(iii) finite simple groups of Lie type, such as $PSL(n,q)$,
(iv) the 26 sporadic simple groups.

The reader is referred to the book "Finite Simple Groups: An Introduction to Their Classification" by D. Gorenstein for their detailed description. The solution to problem 2 is still beyond the reach of mathematicians, and it is addressed in the theory of extensions of groups and the cohomology theory of groups. In this chapter, for convenience, we shall frequently use the language of category theory. The reader may refer to the appendix of Algebra 1 for the purpose.

© Springer Nature Singapore Pte Ltd. 2017. R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0_10

10.1 Schreier Group Extensions

In this section, we shall describe Schreier theory of group extensions. A sequence αn−2 αn−1 αn αn+1 ··· → Gn−1 → Gn → Gn+1 → ··· of groups Gn together with homomorphisms αn is said to be an exact sequence at Gn if image αn−1 = kerαn. The sequence is said to be exact if it is exact at each Gn. A finite term exact sequence of the type

α β 1 −→ H → G → K −→ 1 with 1 representing the trivial group is called a short exact sequence. Thus, to say that the above sequence is exact is to say that α is injective, β is surjective, and image α = ker β. In particular, α(H) is a normal subgroup of G such that β induces an isomorphism from G/α(H) to K. The above short exact sequence is also termed as an extension of H by K. By the abuse of language, we also say that G is an extension of H by K. Example 10.1.1 For any positive integer m, we have the short exact sequence

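$$\{0\} \longrightarrow m\mathbb{Z} \xrightarrow{\ i\ } \mathbb{Z} \xrightarrow{\ \nu\ } \mathbb{Z}_m \longrightarrow \{0\},$$
where $i$ is the inclusion map and $\nu$ is the quotient map. Thus, this is an extension of $m\mathbb{Z}$ by $\mathbb{Z}_m$. We have another extension of $m\mathbb{Z}$ by $\mathbb{Z}_m$, given by the short exact sequence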

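$$\{0\} \longrightarrow m\mathbb{Z} \xrightarrow{\ i_1\ } m\mathbb{Z} \oplus \mathbb{Z}_m \xrightarrow{\ p_2\ } \mathbb{Z}_m \longrightarrow \{0\},$$
where $i_1$ is the inclusion into the first component and $p_2$ is the second projection. Note that, for $m > 1$, $\mathbb{Z}$ is not isomorphic to $m\mathbb{Z} \oplus \mathbb{Z}_m$: the group $\mathbb{Z}$ is torsion-free, whereas $(0, \bar{1})$ is a nonzero element of finite order in $m\mathbb{Z} \oplus \mathbb{Z}_m$.

Example 10.1.2 We have the exact sequence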

$$1 \longrightarrow A_3 \xrightarrow{\ i\ } S_3 \xrightarrow{\ \chi\ } \{1, -1\} \longrightarrow 1,$$
where $i$ is the inclusion map and $\chi$ is the alternating (sign) map. Note that $A_3$ is a cyclic group of order 3, and $\{1, -1\}$ is a cyclic group of order 2. We have another extension of a cyclic group of order 3 by a cyclic group of order 2, given by the exact sequence

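$$\{0\} \longrightarrow \mathbb{Z}_3 \xrightarrow{\ i\ } \mathbb{Z}_6 \xrightarrow{\ \nu\ } \mathbb{Z}_2 \longrightarrow \{0\},$$
where $\mathbb{Z}_3$ is included in $\mathbb{Z}_6$ as its cyclic subgroup of order 3, and $\nu$ is the corresponding quotient map. Note that $S_3$ is not isomorphic to $\mathbb{Z}_6$, since $S_3$ is non-abelian.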
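For the finite sequences in Examples 10.1.1 and 10.1.2, exactness can be checked by brute force: injectivity on the left, surjectivity on the right, and image equal to kernel in the middle. The following Python sketch does this. The helper `is_short_exact` is a hypothetical function written only for this illustration, and the maps passed to it are assumed (not checked) to be homomorphisms. $\mathbb{Z}_3$ is modelled inside $\mathbb{Z}_6$ by $a \mapsto 2a$, and permutations of $\{0, 1, 2\}$ are stored as tuples.

```python
from itertools import permutations

def is_short_exact(A, B, C, i, nu, e_C):
    """Check exactness of 1 -> A -i-> B -nu-> C -> 1 for finite groups.

    A, B, C are lists of elements; i and nu are functions assumed (not checked)
    to be homomorphisms; e_C is the identity element of C.
    """
    image_i = {i(a) for a in A}
    injective = len(image_i) == len(A)               # exactness at A
    surjective = {nu(b) for b in B} == set(C)        # exactness at C
    kernel_nu = {b for b in B if nu(b) == e_C}
    return injective and surjective and image_i == kernel_nu   # exactness at B

def compose(p, q):
    """(p o q)(k) = p[q[k]] for permutations stored as tuples."""
    return tuple(p[q[k]] for k in range(len(q)))

def sign(p):
    """Sign of a permutation via its number of inversions."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inversions % 2 else 1

# 0 -> Z3 -> Z6 -> Z2 -> 0, with Z3 embedded as {0, 2, 4}.
Z3, Z6, Z2 = list(range(3)), list(range(6)), list(range(2))
cyclic_ok = is_short_exact(Z3, Z6, Z2, lambda a: (2 * a) % 6, lambda b: b % 2, 0)

# 1 -> A3 -> S3 -> {1, -1} -> 1, with chi the sign map.
S3 = list(permutations(range(3)))
A3 = [p for p in S3 if sign(p) == 1]
sym_ok = is_short_exact(A3, S3, [1, -1], lambda p: p, sign, 1)

# Same kernel and quotient, different middle groups: Z6 is abelian, S3 is not.
S3_abelian = all(compose(p, q) == compose(q, p) for p in S3 for q in S3)
print(cyclic_ok, sym_ok, S3_abelian)   # True True False
```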

Example 10.1.3 Let $G$ be a group, $\eta$ the natural map from $G$ to $\operatorname{Aut}(G)$ given by $\eta(g) = f_g$, the inner automorphism determined by $g$ (that is, $f_g(x) = gxg^{-1}$), and $\nu$ the natural quotient map from $\operatorname{Aut}(G)$ to $\operatorname{Out}(G)$. Then the sequence

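$$\{e\} \longrightarrow Z(G) \xrightarrow{\ i\ } G \xrightarrow{\ \eta\ } \operatorname{Aut}(G) \xrightarrow{\ \nu\ } \operatorname{Out}(G) \longrightarrow 1$$
is an exact sequence.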
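Exactness at $Z(G)$ and at $G$ in Example 10.1.3 says that $\ker\eta = Z(G)$. For a finite group this gives $|G| = |Z(G)| \cdot |\operatorname{Inn}(G)|$. The sketch below checks this for the dihedral group of order 8. We realise that group, as our own choice, as permutations of the vertices $0, 1, 2, 3$ of a square. The helpers `compose`, `inverse`, `closure` and `eta` are hypothetical names used only for this illustration.

```python
def compose(p, q):
    """(p o q)(k) = p[q[k]] for permutations stored as tuples."""
    return tuple(p[q[k]] for k in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for k, pk in enumerate(p):
        inv[pk] = k
    return tuple(inv)

def closure(gens):
    """The (finite) subgroup of Sym(n) generated by gens."""
    group = {tuple(range(len(gens[0])))}
    frontier = list(group)
    while frontier:
        candidates = {compose(g, s) for g in frontier for s in gens}
        frontier = [h for h in candidates if h not in group]
        group.update(frontier)
    return sorted(group)

# Rotation 0->1->2->3->0 and the reflection fixing vertices 0 and 2.
G = closure([(1, 2, 3, 0), (0, 3, 2, 1)])
center = [z for z in G if all(compose(z, x) == compose(x, z) for x in G)]

def eta(g):
    """The inner automorphism f_g(x) = g x g^{-1}, recorded by its values on G."""
    return tuple(compose(compose(g, x), inverse(g)) for x in G)

kernel_eta = [g for g in G if eta(g) == tuple(G)]   # g with f_g = identity
inn = {eta(g) for g in G}                            # the image of eta, i.e. Inn(G)
print(len(G), len(center), len(inn), kernel_eta == center)   # 8 2 4 True
```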

Example 10.1.4 The sequences

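$$\{0\} \longrightarrow \mathbb{Z} \xrightarrow{\ (i_1, 0)\ } \mathbb{Z} \oplus \mathbb{Z} \xrightarrow{\ p_2\ } \mathbb{Z} \longrightarrow \{0\}$$
and
$$\{0\} \longrightarrow \mathbb{Z} \xrightarrow{\ (i_1, -i_2)\ } \mathbb{Z} \oplus \mathbb{Z} \xrightarrow{\ p_1 + p_2\ } \mathbb{Z} \longrightarrow \{0\},$$
where $(i_1, 0)$ is the first inclusion $n \mapsto (n, 0)$, $p_2$ is the second projection, $(i_1, -i_2)$ is the map $n \mapsto (n, -n)$, and $p_1 + p_2$ is the map $(n, m) \mapsto n + m$, are short exact sequences. As such, both are extensions of $\mathbb{Z}$ by $\mathbb{Z}$. Note that the middle terms are also the same.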

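Let EXT denote the category (see the appendix of Algebra 1 for the notions of category theory) whose objects are short exact sequences
$$1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
of groups. A morphism between two extensions $E_1$ and $E_2$, given by the short exact sequences
$$1 \longrightarrow H_1 \xrightarrow{\ \alpha_1\ } G_1 \xrightarrow{\ \beta_1\ } K_1 \longrightarrow 1 \quad\text{and}\quad 1 \longrightarrow H_2 \xrightarrow{\ \alpha_2\ } G_2 \xrightarrow{\ \beta_2\ } K_2 \longrightarrow 1,$$
is a triple $(\lambda, \mu, \nu)$, where $\lambda$ is a homomorphism from $H_1$ to $H_2$, $\mu$ is a homomorphism from $G_1$ to $G_2$, and $\nu$ is a homomorphism from $K_1$ to $K_2$, such that the following diagram is commutative:
$$\begin{array}{ccccccccc}
1 & \longrightarrow & H_1 & \xrightarrow{\ \alpha_1\ } & G_1 & \xrightarrow{\ \beta_1\ } & K_1 & \longrightarrow & 1\\
 & & \big\downarrow{\scriptstyle \lambda} & & \big\downarrow{\scriptstyle \mu} & & \big\downarrow{\scriptstyle \nu} & & \\
1 & \longrightarrow & H_2 & \xrightarrow{\ \alpha_2\ } & G_2 & \xrightarrow{\ \beta_2\ } & K_2 & \longrightarrow & 1
\end{array}$$
that is, $\mu \circ \alpha_1 = \alpha_2 \circ \lambda$ and $\beta_2 \circ \mu = \nu \circ \beta_1$.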

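The category EXT is called the category of Schreier extensions of groups. The isomorphisms in this category are called the equivalences of extensions.

Theorem 10.1.5 (Five Lemma) Consider the following commutative diagram
$$\begin{array}{ccccccccc}
G_1 & \xrightarrow{\ \alpha_1\ } & G_2 & \xrightarrow{\ \alpha_2\ } & G_3 & \xrightarrow{\ \alpha_3\ } & G_4 & \xrightarrow{\ \alpha_4\ } & G_5\\
\big\downarrow{\scriptstyle f_1} & & \big\downarrow{\scriptstyle f_2} & & \big\downarrow{\scriptstyle f_3} & & \big\downarrow{\scriptstyle f_4} & & \big\downarrow{\scriptstyle f_5}\\
H_1 & \xrightarrow{\ \beta_1\ } & H_2 & \xrightarrow{\ \beta_2\ } & H_3 & \xrightarrow{\ \beta_3\ } & H_4 & \xrightarrow{\ \beta_4\ } & H_5
\end{array}$$
where the rows are exact sequences of groups and the vertical maps are homomorphisms.
(i) If $f_1$ is surjective and $f_2, f_4$ are injective, then $f_3$ is injective.
(ii) If $f_5$ is injective and $f_2, f_4$ are surjective, then $f_3$ is surjective.
(iii) If $f_1, f_2, f_4, f_5$ are isomorphisms, then $f_3$ is also an isomorphism.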

Proof (i) The proof imitates the proof of the five lemma for modules (Theorem 7.2.3); however, we repeat the arguments here. Suppose that $f_1$ is surjective, and $f_2$ and $f_4$ are injective. We have to show that $f_3$ is injective. Let $g_3 \in G_3$ be such that $f_3(g_3) = e$ ($e$ will denote the identity of each of the groups under consideration). Then $e = \beta_3(e) = \beta_3(f_3(g_3)) = f_4(\alpha_3(g_3))$, by the commutativity of the diagram. Since $f_4$ is injective, $\alpha_3(g_3) = e$. Thus, $g_3 \in \ker\alpha_3 = \operatorname{image}\alpha_2$ (exactness), and hence there is an element $g_2 \in G_2$ such that $\alpha_2(g_2) = g_3$. Further, $e = f_3(g_3) = f_3(\alpha_2(g_2)) = \beta_2(f_2(g_2))$ (commutativity). Thus, $f_2(g_2) \in \ker\beta_2 = \operatorname{image}\beta_1$ (exactness). Hence, there exists $h_1 \in H_1$ such that $\beta_1(h_1) = f_2(g_2)$. Since $f_1$ is surjective, there is an element $g_1 \in G_1$ such that $f_1(g_1) = h_1$. Now, $f_2(\alpha_1(g_1)) = \beta_1(f_1(g_1)) = \beta_1(h_1) = f_2(g_2)$. Since $f_2$ is injective, $\alpha_1(g_1) = g_2$. But $\alpha_2(g_2) = g_3$. Hence $g_3 = \alpha_2(\alpha_1(g_1)) = e$, since $\operatorname{image}\alpha_1 = \ker\alpha_2$. This shows that $f_3$ is injective.

(ii) Suppose that $f_5$ is injective, and $f_2$ and $f_4$ are surjective. Let $h_3 \in H_3$. We have to show that $h_3$ lies in the image of $f_3$. Now, $\beta_3(h_3) \in H_4$. Since $f_4$ is surjective, there is an element $g_4 \in G_4$ such that $f_4(g_4) = \beta_3(h_3)$. Then $f_5(\alpha_4(g_4)) = \beta_4(f_4(g_4)) = \beta_4(\beta_3(h_3)) = e$, by commutativity and exactness. Since $f_5$ is injective, $\alpha_4(g_4) = e$. Thus, $g_4 \in \ker\alpha_4 = \operatorname{image}\alpha_3$, and hence there is an element $g_3 \in G_3$ such that $\alpha_3(g_3) = g_4$. Since $\beta_3(f_3(g_3)) = f_4(\alpha_3(g_3)) = f_4(g_4) = \beta_3(h_3)$, we have $\beta_3(h_3 f_3(g_3)^{-1}) = e$. Thus, $h_3 f_3(g_3)^{-1} \in \ker\beta_3 = \operatorname{image}\beta_2$. Hence there exists $h_2 \in H_2$ such that $\beta_2(h_2) = h_3 f_3(g_3)^{-1}$. Since $f_2$ is surjective, there is an element $g_2 \in G_2$ such that $f_2(g_2) = h_2$. Now $h_3 f_3(g_3)^{-1} = \beta_2(h_2) = \beta_2(f_2(g_2)) = f_3(\alpha_2(g_2))$. This shows that $f_3(\alpha_2(g_2)g_3) = h_3$, and so $f_3$ is surjective.

(iii) This follows from (i) and (ii). ∎

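Corollary 10.1.6 Let the triple $(\lambda, \mu, \nu)$ be a morphism between two extensions $E_1$ and $E_2$. Suppose that $\lambda$ and $\nu$ are isomorphisms. Then $\mu$ is also an isomorphism.

Proof Apply Theorem 10.1.5 (iii) to the two rows $1 \to H_i \to G_i \to K_i \to 1$ ($i = 1, 2$), regarded as five-term exact sequences, with $f_1$ and $f_5$ the identity maps of the trivial group, $f_2 = \lambda$, $f_3 = \mu$ and $f_4 = \nu$. ∎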

Corollary 10.1.7 The triple $(\lambda, \mu, \nu)$ is an isomorphism in the category EXT if and only if $\lambda$ and $\nu$ are isomorphisms between the corresponding groups. ∎

Now, we shall give another description of an equivalence class in this category. Let
$$1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1 \qquad (10.1)$$
be an extension of $H$ by $K$. Since $\beta$ is surjective, there is a map $t$ (not necessarily a homomorphism) from $K$ to $G$ with $t(e) = e$ (called a section or a transversal) such that $\beta \circ t = I_K$ (note that we are using the axiom of choice). Now $\alpha(H) = \ker\beta$ is a normal subgroup of $G$. Thus, for each $x \in K$ and $h \in H$, $t(x)\alpha(h)t(x)^{-1}$ belongs to $\alpha(H)$. Since $\alpha$ is injective, there is a unique element $\sigma^t_x(h)$ in $H$, depending on $x$ and $h$, such that
$$t(x)\,\alpha(h)\,t(x)^{-1} = \alpha(\sigma^t_x(h)). \qquad (10.2)$$

This gives us a map $\sigma^t_x$ from $H$ to $H$ given by (10.2). Suppose that $\sigma^t_x(h_1) = \sigma^t_x(h_2)$. Then
$$t(x)\alpha(h_1)t(x)^{-1} = \alpha(\sigma^t_x(h_1)) = \alpha(\sigma^t_x(h_2)) = t(x)\alpha(h_2)t(x)^{-1}.$$
Hence $\alpha(h_1) = \alpha(h_2)$. Since $\alpha$ is injective, $h_1 = h_2$. This shows that $\sigma^t_x$ is an injective map from $H$ to $H$. Next, let $h \in H$. Since $\alpha(H)$ is normal in $G$, there is an element $a \in H$ such that $t(x)^{-1}\alpha(h)t(x) = \alpha(a)$. Now $t(x)\alpha(a)t(x)^{-1} = \alpha(h)$, so, by definition, $\sigma^t_x(a) = h$. This shows that $\sigma^t_x$ is also a surjective map from $H$ to $H$. Again,
$$\alpha(\sigma^t_x(h_1h_2)) = t(x)\alpha(h_1h_2)t(x)^{-1} = t(x)\alpha(h_1)\alpha(h_2)t(x)^{-1} = t(x)\alpha(h_1)t(x)^{-1}\,t(x)\alpha(h_2)t(x)^{-1} = \alpha(\sigma^t_x(h_1))\,\alpha(\sigma^t_x(h_2)) = \alpha(\sigma^t_x(h_1)\sigma^t_x(h_2)).$$
Since $\alpha$ is injective, $\sigma^t_x(h_1h_2) = \sigma^t_x(h_1)\sigma^t_x(h_2)$. This shows that $\sigma^t_x$ is an automorphism of $H$. Thus, given an extension

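$$1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
of $H$ by $K$, every section $t$ from $K$ to $G$ determines a map $\sigma^t$ from $K$ to $\operatorname{Aut}(H)$ given by Eq. (10.2) (note that $\sigma^t$ depends on the chosen section $t$). Since $t(e) = e$,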

$$\sigma^t_e = I_H. \qquad (10.3)$$

Further, $\beta(t(x)t(y)) = \beta(t(x))\beta(t(y)) = xy = \beta(t(xy))$. Hence $t(x)t(y)\,t(xy)^{-1}$ belongs to $\ker\beta = \operatorname{image}\alpha$. Thus, there is a unique element $f^t(x, y) \in H$, depending on $t$, $x$ and $y$, such that

$$t(x)t(y) = \alpha(f^t(x, y))\,t(xy). \qquad (10.4)$$

Again, as $t(e) = e$,
$$f^t(e, y) = e = f^t(x, e) \qquad (10.5)$$
for all $x, y \in K$. For $x, y, z \in K$,

$$(t(x)t(y))t(z) = \alpha(f^t(x, y))\,t(xy)t(z) = \alpha(f^t(x, y))\,\alpha(f^t(xy, z))\,t((xy)z) = \alpha(f^t(x, y)f^t(xy, z))\,t((xy)z).$$

On the other hand,

$$t(x)(t(y)t(z)) = t(x)\alpha(f^t(y, z))t(yz) = t(x)\alpha(f^t(y, z))t(x)^{-1}\,t(x)t(yz) = \alpha(\sigma^t_x(f^t(y, z)))\,\alpha(f^t(x, yz))\,t(x(yz)) = \alpha(\sigma^t_x(f^t(y, z))f^t(x, yz))\,t(x(yz)).$$

Equating the two expressions for $t(x)t(y)t(z)$ and cancelling $t((xy)z) = t(x(yz))$, we find that

$$\alpha(f^t(x, y)f^t(xy, z)) = \alpha(\sigma^t_x(f^t(y, z))f^t(x, yz)).$$

Since α is injective,

$$f^t(x, y)f^t(xy, z) = \sigma^t_x(f^t(y, z))f^t(x, yz). \qquad (10.6)$$

Next, for x, y ∈ K and h ∈ H

$$(t(x)t(y))\alpha(h) = \alpha(f^t(x, y))\,t(xy)\alpha(h) = \alpha(f^t(x, y))\,t(xy)\alpha(h)t(xy)^{-1}\,t(xy) = \alpha(f^t(x, y))\,\alpha(\sigma^t_{xy}(h))\,t(xy) = \alpha(f^t(x, y)\sigma^t_{xy}(h))\,t(xy).$$

On the other hand,

$$t(x)(t(y)\alpha(h)) = t(x)\big(t(y)\alpha(h)t(y)^{-1}\,t(y)\big) = t(x)\,\alpha(\sigma^t_y(h))\,t(y) = t(x)\alpha(\sigma^t_y(h))t(x)^{-1}\,t(x)t(y) = \alpha(\sigma^t_x(\sigma^t_y(h)))\,\alpha(f^t(x, y))\,t(xy) = \alpha(\sigma^t_x(\sigma^t_y(h))f^t(x, y))\,t(xy).$$

Equating the two expressions for $t(x)t(y)\alpha(h)$, and using the injectivity of $\alpha$, we find that

$$f^t(x, y)\,\sigma^t_{xy}(h) = \sigma^t_x(\sigma^t_y(h))\,f^t(x, y). \qquad (10.7)$$
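To see Eqs. (10.2) through (10.7) in a concrete non-split case, the Python sketch below takes $G = Q_8$ and $H = \langle i \rangle$, with $\alpha$ the inclusion. It writes $K = G/H$ additively as $\{0, 1\}$ and uses the section with $t(1) = j$. The encoding of $Q_8$ as pairs (sign, unit), the choice of section, and all function names are our own assumptions for this illustration. The sketch computes $\sigma^t$ and $f^t$ directly from (10.2) and (10.4) and checks (10.3), (10.5), (10.6) and (10.7) by brute force.

```python
# Q8: an element is (s, u) with s in {1, -1} and u in {"1", "i", "j", "k"}.
UNIT = {("i", "i"): (-1, "1"), ("j", "j"): (-1, "1"), ("k", "k"): (-1, "1"),
        ("i", "j"): (1, "k"), ("j", "k"): (1, "i"), ("k", "i"): (1, "j"),
        ("j", "i"): (-1, "k"), ("k", "j"): (-1, "i"), ("i", "k"): (-1, "j")}

def mul(p, q):
    (s, u), (r, v) = p, q
    if u == "1":
        return (s * r, v)
    if v == "1":
        return (s * r, u)
    c, w = UNIT[(u, v)]
    return (s * r * c, w)

def inv(p):
    s, u = p
    return p if u == "1" else (-s, u)     # u^{-1} = -u for u in {i, j, k}

one = (1, "1")
H = [(1, "1"), (1, "i"), (-1, "1"), (-1, "i")]    # H = <i>; alpha is the inclusion
K = [0, 1]                                         # K = Q8/H, written additively

def kmul(x, y):
    return (x + y) % 2

def beta(g):
    return 0 if g[1] in ("1", "i") else 1

t = {0: one, 1: (1, "j")}                          # our chosen section, t(0) = identity

def sigma(x, h):                                   # Eq. (10.2)
    return mul(mul(t[x], h), inv(t[x]))

def f(x, y):                                       # Eq. (10.4): t(x) t(y) t(xy)^{-1}
    return mul(mul(t[x], t[y]), inv(t[kmul(x, y)]))

checks = {
    "section": all(beta(t[x]) == x for x in K),
    "10.3": all(sigma(0, h) == h for h in H),
    "10.5": all(f(0, y) == one == f(y, 0) for y in K),
    "10.6": all(mul(f(x, y), f(kmul(x, y), z)) == mul(sigma(x, f(y, z)), f(x, kmul(y, z)))
                for x in K for y in K for z in K),
    "10.7": all(mul(f(x, y), sigma(kmul(x, y), h)) == mul(sigma(x, sigma(y, h)), f(x, y))
                for x in K for y in K for h in H),
}
print(checks)
print(f(1, 1), sigma(1, (1, "i")))   # (-1, '1') (-1, 'i')
```

Here $f^t(1, 1) = j^2 = -1$ is nontrivial, and $\sigma^t_1$ inverts $i$.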

This prompts the following definition.

Definition 10.1.8 A quadruple $(K, H, \sigma, f)$, where $K$ and $H$ are groups, $\sigma$ is a map from $K$ to $\operatorname{Aut}(H)$, and $f$ is a map from $K \times K$ to $H$, is called a factor system if the following conditions hold:
(i) $\sigma_e = I_H$ (for convenience, we denote the image of $x$ under the map $\sigma$ by $\sigma_x$);
(ii) $f(x, e) = 1 = f(e, y)$ for all $x, y \in K$, where $1$ denotes the identity of $H$ and $e$ denotes the identity of $K$;
(iii) $f(x, y)f(xy, z) = \sigma_x(f(y, z))f(x, yz)$ for all $x, y, z \in K$;
(iv) $f(x, y)\sigma_{xy}(h) = \sigma_x(\sigma_y(h))f(x, y)$ for all $x, y \in K$ and $h \in H$.

Remark 10.1.9 Condition (iii) can be viewed as the non-abelian version of the 2-cocycle condition.

The proof of the following proposition follows from the discussion which motivated Definition 10.1.8.

Proposition 10.1.10 Every extension $E$ of $H$ by $K$, together with a choice of a section $t$, determines a factor system $\operatorname{Fac}(E, t) \equiv (K, H, \sigma^t, f^t)$, where $\sigma^t$ and $f^t$ are described by Eqs. (10.2) and (10.4) above. This factor system is called a factor system associated to the extension $E$. ∎

Conversely, we have the following proposition.

Proposition 10.1.11 Let $(K, H, \sigma, f)$ be a factor system. Then there exist an extension $E$ of $H$ by $K$ and a section $t$ of $E$ such that $\operatorname{Fac}(E, t) = (K, H, \sigma, f)$.

Proof Let $G = H \times K$. Define a product $\cdot$ in $G$ by

$$(a, x) \cdot (b, y) = (a\,\sigma_x(b)\,f(x, y),\ xy).$$

Using (i) and (ii) of the definition, $(1, e)\cdot(b, y) = (1\,\sigma_e(b)f(e, y), ey) = (b, y)$, and also $(a, x)\cdot(1, e) = (a\,\sigma_x(1)f(x, e), xe) = (a, x)$. This shows that $(1, e)$ is the identity of $G$. Let $(a, x) \in G$. To find an inverse of $(a, x)$, we have to find $(b, y)$ such that $(1, e) = (a, x)\cdot(b, y) = (a\,\sigma_x(b)f(x, y), xy)$. Then, necessarily, $y = x^{-1}$, and $b$ should satisfy $a\,\sigma_x(b)f(x, x^{-1}) = 1$. Since $\sigma_x$ is a bijective map on $H$, $b = \sigma_x^{-1}(a^{-1}f(x, x^{-1})^{-1})$. Thus, $(\sigma_x^{-1}(a^{-1}f(x, x^{-1})^{-1}), x^{-1})$ is a right inverse of $(a, x)$. Since a monoid in which every element has a right inverse is a group, it only remains to establish the associativity of $\cdot$. Now,

$$((a, x)\cdot(b, y))\cdot(c, z) = (a\,\sigma_x(b)f(x, y), xy)\cdot(c, z) = (a\,\sigma_x(b)f(x, y)\sigma_{xy}(c)f(xy, z),\ (xy)z).$$

On the other hand,

$$(a, x)\cdot((b, y)\cdot(c, z)) = (a, x)\cdot(b\,\sigma_y(c)f(y, z), yz) = (a\,\sigma_x(b\,\sigma_y(c)f(y, z))f(x, yz),\ x(yz)).$$

Since the multiplication in $K$ is associative, to ensure associativity in $G$ we need to show that

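$$a\,\sigma_x(b)f(x, y)\sigma_{xy}(c)f(xy, z) = a\,\sigma_x(b\,\sigma_y(c)f(y, z))f(x, yz)$$
for all $a, b, c \in H$ and $x, y, z \in K$. Since $\sigma_x$ is an automorphism, the right-hand side equals $a\,\sigma_x(b)\,\sigma_x(\sigma_y(c))\,\sigma_x(f(y, z))f(x, yz)$. Cancelling $a\,\sigma_x(b)$, we need to verify that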

$$f(x, y)\sigma_{xy}(c)f(xy, z) = \sigma_x(\sigma_y(c))\sigma_x(f(y, z))f(x, yz).$$

By part (iii) of Definition 10.1.8, the right-hand side becomes

$$\sigma_x(\sigma_y(c))f(x, y)f(xy, z).$$

Cancelling $f(xy, z)$ on the right, we need to verify that

$$f(x, y)\sigma_{xy}(c) = \sigma_x(\sigma_y(c))f(x, y).$$

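This holds by part (iv) of Definition 10.1.8. Thus, $G$ is a group with respect to the product $\cdot$ defined above. The map $\alpha$ from $H$ to $G$ defined by $\alpha(h) = (h, e)$ and the map $\beta$ from $G$ to $K$ defined by $\beta(a, x) = x$ are easily seen to be homomorphisms. Moreover, $\alpha$ is injective, $\beta$ is surjective, and $\operatorname{image}\alpha = \{(a, e) \mid a \in H\} = \ker\beta$. In turn, we get the extension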

$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
of $H$ by $K$. Consider the section $t$ of $E$ given by $t(x) = (1, x)$. Then
$$t(x)\cdot t(y) = (1, x)\cdot(1, y) = (f(x, y), xy) = (f(x, y), e)\cdot(1, xy) = \alpha(f(x, y))\cdot t(xy).$$
This shows that $f^t(x, y) = f(x, y)$ for all $x, y \in K$, that is, $f^t = f$. Again, noting that $(1, x)^{-1} = (\sigma_x^{-1}(f(x, x^{-1})^{-1}), x^{-1})$, we have
$$t(x)\cdot\alpha(h)\cdot t(x)^{-1} = (1, x)\cdot(h, e)\cdot(1, x)^{-1} = (\sigma_x(h)f(x, e), x)\cdot(\sigma_x^{-1}(f(x, x^{-1})^{-1}), x^{-1}) = (\sigma_x(h)f(x, x^{-1})^{-1}f(x, x^{-1}), e) = (\sigma_x(h), e) = \alpha(\sigma_x(h)).$$
This shows that $\sigma^t_x(h) = \sigma_x(h)$ for all $x \in K$ and $h \in H$. It follows that $\sigma^t = \sigma$ and $f^t = f$. Thus, $\operatorname{Fac}(E, t)$ is the given factor system $(K, H, \sigma, f)$. ∎
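The proof of Proposition 10.1.11 is constructive, so it can be tried out on small groups. The following sketch checks conditions (i) to (iv) of Definition 10.1.8 by brute force. It then builds the group $H \times K$ with the product $(a, x)\cdot(b, y) = (a\,\sigma_x(b)f(x, y), xy)$ and verifies the group axioms. All function names are hypothetical and serve only as an illustration. Take $H = K = \mathbb{Z}_2$, written additively, with trivial $\sigma$. The trivial factor system then gives a group of exponent 2, while $f(1, 1) = 1$ gives a cyclic group of order 4. Since the resulting groups are not isomorphic, the two factor systems yield non-equivalent extensions.

```python
from itertools import product

def is_factor_system(H, K, h_mul, k_mul, h_e, k_e, sigma, f):
    """Brute-force check of (i)-(iv) of Definition 10.1.8 for finite H and K.
    sigma(x, h) stands for sigma_x(h); each sigma_x is assumed to be an automorphism."""
    c1 = all(sigma(k_e, h) == h for h in H)
    c2 = all(f(x, k_e) == h_e == f(k_e, x) for x in K)
    c3 = all(h_mul(f(x, y), f(k_mul(x, y), z)) == h_mul(sigma(x, f(y, z)), f(x, k_mul(y, z)))
             for x, y, z in product(K, repeat=3))
    c4 = all(h_mul(f(x, y), sigma(k_mul(x, y), h)) == h_mul(sigma(x, sigma(y, h)), f(x, y))
             for x, y in product(K, repeat=2) for h in H)
    return c1 and c2 and c3 and c4

def crossed_product(H, K, h_mul, k_mul, sigma, f):
    """The set H x K with (a, x).(b, y) = (a sigma_x(b) f(x, y), xy)."""
    def g_mul(p, q):
        (a, x), (b, y) = p, q
        return (h_mul(h_mul(a, sigma(x, b)), f(x, y)), k_mul(x, y))
    return list(product(H, K)), g_mul

def is_group(G, mul, e):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = set(G)
    return (all(mul(p, q) in elements for p in G for q in G)
            and all(mul(mul(p, q), r) == mul(p, mul(q, r)) for p, q, r in product(G, repeat=3))
            and all(mul(e, p) == p == mul(p, e) for p in G)
            and all(any(mul(p, q) == e for q in G) for p in G))

def element_order(g, mul, e):
    n, p = 1, g
    while p != e:
        p, n = mul(p, g), n + 1
    return n

Z2 = [0, 1]
def add2(a, b):
    return (a + b) % 2
def trivial(x, h):                # sigma_x = identity for every x
    return h
def f_zero(x, y):
    return 0
def f_twist(x, y):                # f(1, 1) = 1, all other values 0
    return 1 if x == y == 1 else 0

for name, f in [("f_zero", f_zero), ("f_twist", f_twist)]:
    G, mul = crossed_product(Z2, Z2, add2, add2, trivial, f)
    print(name,
          is_factor_system(Z2, Z2, add2, add2, 0, 0, trivial, f),
          is_group(G, mul, (0, 0)),
          max(element_order(g, mul, (0, 0)) for g in G))
# f_zero True True 2
# f_twist True True 4
```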

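Now, we describe the category EXT of extensions as the category of factor systems. Let $(\lambda, \mu, \nu)$ be a morphism between the extensions $E_1$ and $E_2$, given by the following commutative diagram:
$$\begin{array}{ccccccccc}
1 & \longrightarrow & H_1 & \xrightarrow{\ \alpha_1\ } & G_1 & \xrightarrow{\ \beta_1\ } & K_1 & \longrightarrow & 1\\
 & & \big\downarrow{\scriptstyle \lambda} & & \big\downarrow{\scriptstyle \mu} & & \big\downarrow{\scriptstyle \nu} & & \\
1 & \longrightarrow & H_2 & \xrightarrow{\ \alpha_2\ } & G_2 & \xrightarrow{\ \beta_2\ } & K_2 & \longrightarrow & 1
\end{array}$$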

Let $t_1$ be a section of $E_1$ and $t_2$ a section of $E_2$, and let $(K_1, H_1, \sigma^{t_1}, f^{t_1})$ and $(K_2, H_2, \sigma^{t_2}, f^{t_2})$ be the corresponding factor systems. Let $x \in K_1$. Then $\mu(t_1(x)) \in G_2$ and $\beta_2(\mu(t_1(x))) = \nu(\beta_1(t_1(x))) = \nu(x) = \beta_2(t_2(\nu(x)))$. Thus, $\mu(t_1(x))\,t_2(\nu(x))^{-1} \in \ker\beta_2 = \operatorname{image}\alpha_2$. In turn, there is a unique $g(x) \in H_2$ such that $\alpha_2(g(x)) = \mu(t_1(x))\,t_2(\nu(x))^{-1}$. Equivalently,

$$\mu(t_1(x)) = \alpha_2(g(x))\,t_2(\nu(x)). \qquad (10.8)$$

Since t1(e) = e = t2(e), it follows that

$$g(e) = 1. \qquad (10.9)$$

Now, using the commutativity of the diagram and Eq. (10.8), we have

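$$\mu(t_1(x)t_1(y)) = \mu(\alpha_1(f^{t_1}(x, y))\,t_1(xy)) = \alpha_2(\lambda(f^{t_1}(x, y)))\,\mu(t_1(xy)) = \alpha_2(\lambda(f^{t_1}(x, y)))\,\alpha_2(g(xy))\,t_2(\nu(xy)) = \alpha_2(\lambda(f^{t_1}(x, y)))\,\alpha_2(g(xy))\,t_2(\nu(x)\nu(y)).$$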

On the other hand, since $\mu$ is a homomorphism, using Eq. (10.8) again together with Eqs. (10.2) and (10.4) for $E_2$, we have

$$\begin{aligned}
\mu(t_1(x)t_1(y)) &= \mu(t_1(x))\,\mu(t_1(y)) = \alpha_2(g(x))\,t_2(\nu(x))\,\alpha_2(g(y))\,t_2(\nu(y))\\
&= \alpha_2(g(x))\,t_2(\nu(x))\alpha_2(g(y))t_2(\nu(x))^{-1}\,t_2(\nu(x))t_2(\nu(y))\\
&= \alpha_2(g(x))\,\alpha_2(\sigma^{t_2}_{\nu(x)}(g(y)))\,t_2(\nu(x))t_2(\nu(y))\\
&= \alpha_2(g(x))\,\alpha_2(\sigma^{t_2}_{\nu(x)}(g(y)))\,\alpha_2(f^{t_2}(\nu(x), \nu(y)))\,t_2(\nu(x)\nu(y)).
\end{aligned}$$

Equating the two expressions for μ(t1(x)t1(y)), and observing that α2 is an injective homomorphism, we obtain the following identity:

$$\lambda(f^{t_1}(x, y))\,g(xy) = g(x)\,\sigma^{t_2}_{\nu(x)}(g(y))\,f^{t_2}(\nu(x), \nu(y)). \qquad (10.10)$$

Further, by Eq. 10.2,

$$t_1(x)\,\alpha_1(h)\,t_1(x)^{-1} = \alpha_1(\sigma^{t_1}_x(h)).$$

Applying the homomorphism μ on the above equation, and using the commutativity of the diagram, we get

$$\mu(t_1(x))\,\alpha_2(\lambda(h))\,\mu(t_1(x))^{-1} = \alpha_2(\lambda(\sigma^{t_1}_x(h))).$$

Using Eq. (10.8),
$$\alpha_2(g(x))\,t_2(\nu(x))\,\alpha_2(\lambda(h))\,t_2(\nu(x))^{-1}\,\alpha_2(g(x)^{-1}) = \alpha_2(\lambda(\sigma^{t_1}_x(h))).$$

Using Eq. (10.2) for the extension $E_2$, we get
$$\alpha_2(g(x))\,\alpha_2(\sigma^{t_2}_{\nu(x)}(\lambda(h)))\,\alpha_2(g(x)^{-1}) = \alpha_2(\lambda(\sigma^{t_1}_x(h))).$$

Since α2 is an injective homomorphism,

$$g(x)\,\sigma^{t_2}_{\nu(x)}(\lambda(h))\,g(x)^{-1} = \lambda(\sigma^{t_1}_x(h)). \qquad (10.11)$$

Thus, a morphism $(\lambda, \mu, \nu)$ between extensions $E_1$ and $E_2$, together with choices of sections $t_1$ and $t_2$ of the corresponding extensions, induces a map $g$ from $K_1$ to $H_2$ such that the triple $(\nu, g, \lambda)$ satisfies (10.9), (10.10) and (10.11). This triple may be viewed as a morphism from the factor system $(K_1, H_1, \sigma^{t_1}, f^{t_1})$ to $(K_2, H_2, \sigma^{t_2}, f^{t_2})$. Let $(\lambda_1, \mu_1, \nu_1)$ be a morphism from an extension

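$$E_1 \equiv 1 \longrightarrow H_1 \xrightarrow{\ \alpha_1\ } G_1 \xrightarrow{\ \beta_1\ } K_1 \longrightarrow 1$$
to an extension
$$E_2 \equiv 1 \longrightarrow H_2 \xrightarrow{\ \alpha_2\ } G_2 \xrightarrow{\ \beta_2\ } K_2 \longrightarrow 1,$$
and let $(\lambda_2, \mu_2, \nu_2)$ be a morphism from the extension $E_2$ to
$$E_3 \equiv 1 \longrightarrow H_3 \xrightarrow{\ \alpha_3\ } G_3 \xrightarrow{\ \beta_3\ } K_3 \longrightarrow 1.$$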

Let $t_1$, $t_2$ and $t_3$ be the corresponding choices of sections. Then, as in (10.8),

$$\mu_1(t_1(x)) = \alpha_2(g_1(x))\,t_2(\nu_1(x)) \quad\text{and}\quad \mu_2(t_2(u)) = \alpha_3(g_2(u))\,t_3(\nu_2(u)),$$
where $g_1$ is the uniquely determined map from $K_1$ to $H_2$, and $g_2$ is that from $K_2$ to $H_3$. In turn,

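$$\begin{aligned}
\mu_2(\mu_1(t_1(x))) &= \mu_2(\alpha_2(g_1(x)))\,\mu_2(t_2(\nu_1(x))) = \mu_2(\alpha_2(g_1(x)))\,\alpha_3(g_2(\nu_1(x)))\,t_3(\nu_2(\nu_1(x)))\\
&= \alpha_3(\lambda_2(g_1(x)))\,\alpha_3(g_2(\nu_1(x)))\,t_3(\nu_2(\nu_1(x))) = \alpha_3(g_3(x))\,t_3(\nu_2(\nu_1(x))).
\end{aligned}$$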

It follows that the composition $(\lambda_2 \circ \lambda_1, \mu_2 \circ \mu_1, \nu_2 \circ \nu_1)$ induces the triple $(\nu_2 \circ \nu_1, g_3, \lambda_2 \circ \lambda_1)$, where $g_3(x) = \lambda_2(g_1(x))\,g_2(\nu_1(x))$ for each $x \in K_1$.

Prompted by the above discussion, we introduce the category FACS whose objects are factor systems. A morphism from $(K_1, H_1, \sigma^1, f^1)$ to $(K_2, H_2, \sigma^2, f^2)$ in FACS is a triple $(\nu, g, \lambda)$, where $\nu$ is a homomorphism from $K_1$ to $K_2$, $\lambda$ a homomorphism from $H_1$ to $H_2$, and $g$ a map from $K_1$ to $H_2$, such that, for all $x, y \in K_1$ and $h \in H_1$,
(i) $g(e) = 1$;
(ii) $\lambda(f^1(x, y))\,g(xy) = g(x)\,\sigma^2_{\nu(x)}(g(y))\,f^2(\nu(x), \nu(y))$;
(iii) $g(x)\,\sigma^2_{\nu(x)}(\lambda(h))\,g(x)^{-1} = \lambda(\sigma^1_x(h))$.
The composition of the morphism $(\nu_1, g_1, \lambda_1)$ from $(K_1, H_1, \sigma^1, f^1)$ to $(K_2, H_2, \sigma^2, f^2)$ and the morphism $(\nu_2, g_2, \lambda_2)$ from $(K_2, H_2, \sigma^2, f^2)$ to $(K_3, H_3, \sigma^3, f^3)$ is the triple

$(\nu_2 \circ \nu_1, g_3, \lambda_2 \circ \lambda_1)$, where $g_3$ is given by $g_3(x) = \lambda_2(g_1(x))\,g_2(\nu_1(x))$ for each $x \in K_1$. The following theorem is a consequence of the above discussion.

Theorem 10.1.12 For each extension $E$ of a group by another group, let $t_E$ be a choice of a section of $E$ (such a choice exists by the axiom of choice). Then the association $\operatorname{Fac}$, which associates to each extension $E$ the factor system $\operatorname{Fac}(E, t_E)$, is an equivalence from the category EXT of extensions to the category FACS of factor systems. ∎

Fix a pair $H$ and $K$ of groups. We try to describe the equivalence classes of extensions of $H$ by $K$. Let $G$ be an extension of $H$ by $K$ given by the exact sequence

$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1.$$

Let $(\lambda, \mu, \nu)$ be an equivalence from this extension to another extension $G'$ of $H$ by $K$, given by the exact sequence

$$E' \equiv 1 \longrightarrow H \xrightarrow{\ \alpha'\ } G' \xrightarrow{\ \beta'\ } K \longrightarrow 1.$$

Then $G'$ is also an extension of $H$ by $K$, given by the exact sequence

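$$E'' \equiv 1 \longrightarrow H \xrightarrow{\ \alpha' \circ \lambda\ } G' \xrightarrow{\ \nu^{-1} \circ \beta'\ } K \longrightarrow 1,$$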

such that $(I_H, \mu, I_K)$ is an equivalence from $E$ to $E''$. Also, $(\lambda, I_{G'}, \nu)$ is an equivalence from $E''$ to $E'$. As such, there is no loss of generality in restricting the concept of equivalence to the class $\mathcal{E}(H, K)$ of all extensions of $H$ by $K$, by saying that two extensions

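$$E_1 \equiv 1 \longrightarrow H \xrightarrow{\ \alpha_1\ } G_1 \xrightarrow{\ \beta_1\ } K \longrightarrow 1 \quad\text{and}\quad E_2 \equiv 1 \longrightarrow H \xrightarrow{\ \alpha_2\ } G_2 \xrightarrow{\ \beta_2\ } K \longrightarrow 1$$
in $\mathcal{E}(H, K)$ are equivalent if there is an isomorphism $\phi$ from $G_1$ to $G_2$ such that the diagram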

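$$\begin{array}{ccccccccc}
1 & \longrightarrow & H & \xrightarrow{\ \alpha_1\ } & G_1 & \xrightarrow{\ \beta_1\ } & K & \longrightarrow & 1\\
 & & \big\downarrow{\scriptstyle I_H} & & \big\downarrow{\scriptstyle \phi} & & \big\downarrow{\scriptstyle I_K} & & \\
1 & \longrightarrow & H & \xrightarrow{\ \alpha_2\ } & G_2 & \xrightarrow{\ \beta_2\ } & K & \longrightarrow & 1
\end{array}$$
is commutative. Indeed, for any extension $E'$ in EXT which is equivalent to a member $E$ of $\mathcal{E}(H, K)$, there is a member $E''$ of $\mathcal{E}(H, K)$ such that $E'$ is equivalent to $E''$ in the category EXT, and $E''$ is equivalent to $E$ in $\mathcal{E}(H, K)$ in the sense described above.

Let
$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
be an extension of $H$ by $K$, and let $t$ be a section of the extension. Then $t$ induces a map $\sigma^t$ from $K$ to $\operatorname{Aut}(H)$ as described by Eq. (10.2). In turn, it induces a map $\psi^t_E$ from $K$ to $\operatorname{Out}(H)$ given by $\psi^t_E(x) = \sigma^t_x\operatorname{Inn}(H)$. Let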

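$$E' \equiv 1 \longrightarrow H \xrightarrow{\ \alpha'\ } G' \xrightarrow{\ \beta'\ } K \longrightarrow 1$$
be another extension in $\mathcal{E}(H, K)$ equivalent to $E$. Let $(I_H, \phi, I_K)$ be an equivalence from $E$ to $E'$, and let $t'$ be a section of $E'$. This equivalence induces an equivalence $(I_K, g, I_H)$ from the factor system $(K, H, \sigma^t, f^t)$ to the factor system $(K, H, \sigma^{t'}, f^{t'})$, where $g$ is a map from $K$ to $H$. From (10.10) and (10.11), we have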

$$f^t(x, y)\,g(xy) = g(x)\,\sigma^{t'}_x(g(y))\,f^{t'}(x, y) \qquad (10.12)$$
and
$$g(x)\,\sigma^{t'}_x(h)\,g(x)^{-1} = \sigma^t_x(h) \qquad (10.13)$$

for all $x, y \in K$ and $h \in H$. Thus, by (10.13), $\sigma^t_x$ and $\sigma^{t'}_x$ differ by an inner automorphism of $H$. Hence $\psi^t_E(x) = \sigma^t_x\operatorname{Inn}(H) = \sigma^{t'}_x\operatorname{Inn}(H) = \psi^{t'}_{E'}(x)$ for each $x \in K$. This shows that the map $\psi^t_E$ from $K$ to $\operatorname{Out}(H)$ is independent of the representative $E$ of the equivalence class, and also independent of the choice of the section $t$. Thus, without any ambiguity, we can denote $\psi^t_E$ by $\psi_{[E]}$, where $[E]$ denotes the equivalence class determined by the extension $E$. Further, from (10.7), it follows that $\sigma^t_{xy}$ and $\sigma^t_x \circ \sigma^t_y$ differ by an inner automorphism of $H$. Hence $\psi_{[E]}(xy) = \psi_{[E]}(x)\psi_{[E]}(y)$ for all $x, y \in K$.

Definition 10.1.13 A homomorphism from $K$ to $\operatorname{Out}(H) = \operatorname{Aut}(H)/\operatorname{Inn}(H)$ is called a coupling or an abstract kernel of $K$ to $H$.

We have established the following theorem:

Theorem 10.1.14 Let $\operatorname{Ext}(H, K)$ denote the set of all equivalence classes of extensions in $\mathcal{E}(H, K)$. Then there is a natural map $\Psi$ from $\operatorname{Ext}(H, K)$ to the set $\operatorname{Hom}(K, \operatorname{Out}(H))$ of all abstract kernels (couplings) of $K$ to $H$, given by $\Psi([E]) = \psi_{[E]}$ as defined above. ∎

The map $\Psi$ described in the above theorem is called the abstract kernel map or the coupling map.

Example 10.1.15 The abstract kernel map $\Psi$ need not be injective. In other words, two non-equivalent extensions of $H$ by $K$ may induce the same abstract kernel of $K$ to $H$. For example, consider the extensions

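$$E_1 \equiv \{0\} \longrightarrow \mathbb{Z} \xrightarrow{\ (i_1, 0)\ } \mathbb{Z} \oplus \mathbb{Z}_3 \xrightarrow{\ p_2\ } \mathbb{Z}_3 \longrightarrow \{0\} \quad\text{and}\quad E_2 \equiv \{0\} \longrightarrow \mathbb{Z} \xrightarrow{\ m_3\ } \mathbb{Z} \xrightarrow{\ \nu\ } \mathbb{Z}_3 \longrightarrow \{0\}$$
of the group $\mathbb{Z}$ by $\mathbb{Z}_3$, where $m_3$ is multiplication by 3 and $\nu$ is the natural quotient map. These two extensions $E_1$ and $E_2$ are not equivalent, as $\mathbb{Z} \oplus \mathbb{Z}_3$ (which has elements of order 3) and the torsion-free group $\mathbb{Z}$ are not isomorphic. Since $\operatorname{Out}(\mathbb{Z})$ is a group of order 2 and $\mathbb{Z}_3$ is a group of order 3, the only abstract kernel of $\mathbb{Z}_3$ to $\mathbb{Z}$ is the trivial map. As such, $[E_1] \neq [E_2]$, whereas $\Psi([E_1]) = \Psi([E_2])$. We shall see that the map $\Psi$ need not be surjective either. Indeed, we have two basic problems in the theory of extensions of groups:
1. To determine the abstract kernels $\eta \in \operatorname{Hom}(K, \operatorname{Out}(H))$ which are realizable by an extension $E$ of $H$ by $K$, in the sense that $\Psi([E]) = \eta$.
2. Given an abstract kernel $\eta \in \operatorname{Hom}(K, \operatorname{Out}(H))$ which is realizable by an extension, to determine and classify all extensions $E$, up to equivalence, such that $\Psi([E]) = \eta$.
Such abstract kernels are called couplings.

Theorem 10.1.16 Let $H$ be a group with $Z(H) = \{e\}$. Then the map $\Psi$ from $\operatorname{Ext}(H, K)$ to the set $\operatorname{Hom}(K, \operatorname{Out}(H))$ is bijective. More explicitly, every abstract kernel $\eta$ of $K$ to $H$ determines, and is uniquely determined by, an equivalence class of extensions in $\operatorname{Ext}(H, K)$.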

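Proof Let $\eta \in \operatorname{Hom}(K, \operatorname{Out}(H))$ be an abstract kernel of $K$ to $H$. Consider the pullback diagram
$$\begin{array}{ccc}
G & \xrightarrow{\ p_2\ } & K\\
\big\downarrow{\scriptstyle p_1} & & \big\downarrow{\scriptstyle \eta}\\
\operatorname{Aut}(H) & \xrightarrow{\ \nu\ } & \operatorname{Out}(H)
\end{array}$$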

More explicitly, $G$ is the subgroup of the direct product $\operatorname{Aut}(H) \times K$ given by $G = \{(\sigma, x) \mid \sigma \in \operatorname{Aut}(H) \text{ and } \sigma\operatorname{Inn}(H) = \eta(x)\}$, $p_1$ is the first projection, and $p_2$ is the second projection. Clearly, $p_2$ is a surjective homomorphism from $G$ to $K$, and $\ker p_2 = \{(\sigma, e) \mid \sigma\operatorname{Inn}(H) = \eta(e) = \operatorname{Inn}(H)\} = \operatorname{Inn}(H) \times \{e\}$. Since the center $Z(H)$ of $H$ is trivial, the map $\alpha$ from $H$ to $G$ defined by $\alpha(h) = (i_h, e)$ ($i_h$ denotes the inner automorphism determined by $h$) is an injective homomorphism with $\operatorname{image}\alpha = \ker p_2$. This gives an extension $E$ of $H$ by $K$ given by the exact sequence

$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ p_2\ } K \longrightarrow 1.$$

Using the axiom of choice, there is a map $\xi$ from $K$ to $\operatorname{Aut}(H)$, with $\xi(e) = I_H$, such that $\xi(x)\operatorname{Inn}(H) = \eta(x)$ for all $x \in K$. This determines a section $t$ of the extension $E$ given by $t(x) = (\xi(x), x)$.

Recall that the abstract kernel $\Psi(E)$ associated to the extension $E$ is given by $\Psi(E)(x) = \sigma^t_x\operatorname{Inn}(H)$, where $\sigma^t_x$ is given by (see Eq. (10.2))

$$t(x)\,\alpha(h)\,t(x)^{-1} = \alpha(\sigma^t_x(h)).$$

Now,
$$\alpha(\sigma^t_x(h)) = t(x)\alpha(h)t(x)^{-1} = (\xi(x), x)(i_h, e)(\xi(x), x)^{-1} = (\xi(x)\,i_h\,\xi(x)^{-1}, e) = (i_{\xi(x)(h)}, e) = \alpha(\xi(x)(h)).$$

Thus, $\sigma^t_x(h) = \xi(x)(h)$ for all $h \in H$; in turn, $\sigma^t_x = \xi(x)$. By definition, $\Psi(E)(x) = \sigma^t_x\operatorname{Inn}(H) = \xi(x)\operatorname{Inn}(H) = \eta(x)$ for all $x \in K$. This shows that $\Psi(E) = \eta$, and so $\Psi$ is surjective. To prove injectivity, suppose that $\Psi(E_1) = \Psi(E_2)$, where $E_1$ and $E_2$ are extensions of $H$ by $K$ given by

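$$E_1 \equiv 1 \longrightarrow H \xrightarrow{\ \alpha_1\ } G_1 \xrightarrow{\ \beta_1\ } K \longrightarrow 1 \quad\text{and}\quad E_2 \equiv 1 \longrightarrow H \xrightarrow{\ \alpha_2\ } G_2 \xrightarrow{\ \beta_2\ } K \longrightarrow 1.$$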

Let $t_1$ be a section of $E_1$ with the corresponding factor system $(K, H, \sigma^{t_1}, f^{t_1})$, and $t_2$ a section of $E_2$ with the corresponding factor system $(K, H, \sigma^{t_2}, f^{t_2})$. Under our assumption,

$$\sigma^{t_1}_x\operatorname{Inn}(H) = \Psi(E_1)(x) = \Psi(E_2)(x) = \sigma^{t_2}_x\operatorname{Inn}(H)$$

for all $x \in K$, where $\sigma^{t_1}_x$ and $\sigma^{t_2}_x$ are given by the equations

$$t_1(x)\,\alpha_1(h)\,t_1(x)^{-1} = \alpha_1(\sigma^{t_1}_x(h)) \quad\text{and}\quad t_2(x)\,\alpha_2(h)\,t_2(x)^{-1} = \alpha_2(\sigma^{t_2}_x(h)).$$

Since $\sigma^{t_1}_x\operatorname{Inn}(H) = \sigma^{t_2}_x\operatorname{Inn}(H)$ for all $x \in K$, and since $H$ is centerless (so that $h \mapsto i_h$ is injective), there is a unique map $g$ from $K$ to $H$ such that

$$\sigma^{t_1}_x = i_{g(x)} \circ \sigma^{t_2}_x \qquad (10.14)$$
for all $x \in K$. Again, by (10.7), we have

$$f^{t_1}(x, y)\,\sigma^{t_1}_{xy}(h) = \sigma^{t_1}_x(\sigma^{t_1}_y(h))\,f^{t_1}(x, y) \qquad (10.15)$$
and
$$f^{t_2}(x, y)\,\sigma^{t_2}_{xy}(h) = \sigma^{t_2}_x(\sigma^{t_2}_y(h))\,f^{t_2}(x, y). \qquad (10.16)$$

Using (10.14), (10.15), and (10.16), we get

$$\begin{aligned}
f^{t_1}(x, y)\,g(xy)\,\sigma^{t_2}_{xy}(h)\,g(xy)^{-1} &= f^{t_1}(x, y)\,\sigma^{t_1}_{xy}(h)\\
&= \sigma^{t_1}_x(\sigma^{t_1}_y(h))\,f^{t_1}(x, y)\\
&= g(x)\,\sigma^{t_2}_x(\sigma^{t_1}_y(h))\,g(x)^{-1}\,f^{t_1}(x, y)\\
&= g(x)\,\sigma^{t_2}_x\big(g(y)\,\sigma^{t_2}_y(h)\,g(y)^{-1}\big)\,g(x)^{-1}\,f^{t_1}(x, y)\\
&= g(x)\,\sigma^{t_2}_x(g(y))\,\sigma^{t_2}_x(\sigma^{t_2}_y(h))\,\sigma^{t_2}_x(g(y)^{-1})\,g(x)^{-1}\,f^{t_1}(x, y).
\end{aligned} \qquad (10.17)$$

In turn,
$$\big(\sigma^{t_2}_x(g(y)^{-1})\,g(x)^{-1}\,f^{t_1}(x, y)\,g(xy)\big)\,\sigma^{t_2}_{xy}(h)\,\big(\sigma^{t_2}_x(g(y)^{-1})\,g(x)^{-1}\,f^{t_1}(x, y)\,g(xy)\big)^{-1} = \sigma^{t_2}_x(\sigma^{t_2}_y(h)) = f^{t_2}(x, y)\,\sigma^{t_2}_{xy}(h)\,f^{t_2}(x, y)^{-1}$$
for all $h \in H$, the last equality by (10.16). Since $\sigma^{t_2}_{xy}$ is a bijective map on $H$, and $H$ is centerless,

$$f^{t_2}(x, y) = \sigma^{t_2}_x(g(y)^{-1})\,g(x)^{-1}\,f^{t_1}(x, y)\,g(xy),$$
or equivalently,

$$f^{t_1}(x, y)\,g(xy) = g(x)\,\sigma^{t_2}_x(g(y))\,f^{t_2}(x, y). \qquad (10.18)$$

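Eqs. (10.14) and (10.18), together with $g(e) = 1$ (which holds because $\sigma^{t_1}_e = I_H = \sigma^{t_2}_e$ and $H$ is centerless), tell us that $(I_K, g, I_H)$ is an isomorphism from the factor system $(K, H, \sigma^{t_1}, f^{t_1})$ to $(K, H, \sigma^{t_2}, f^{t_2})$ in the category FACS. By Theorem 10.1.12, $E_1$ is equivalent to $E_2$. ∎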

Indeed, the proof of the above theorem establishes the following more general result.

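Proposition 10.1.17 Let $E_1$ and $E_2$ be extensions of $H$ by $K$ with $\Psi(E_1) = \Psi(E_2)$. Then the following induced extensions $\overline{E}_1$ and $\overline{E}_2$ of $H/Z(H)$ by $K$ are equivalent:
$$\overline{E}_1 \equiv 1 \longrightarrow H/Z(H) \xrightarrow{\ \overline{\alpha}_1\ } G_1/\alpha_1(Z(H)) \xrightarrow{\ \overline{\beta}_1\ } K \longrightarrow 1,$$
$$\overline{E}_2 \equiv 1 \longrightarrow H/Z(H) \xrightarrow{\ \overline{\alpha}_2\ } G_2/\alpha_2(Z(H)) \xrightarrow{\ \overline{\beta}_2\ } K \longrightarrow 1.$$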

Split Extensions, Semi-direct Products

Definition 10.1.18 An extension

$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
of $H$ by $K$ is called a split extension if there is a section $t$ which is a homomorphism. Such a section $t$ is called a splitting of the extension. The corresponding factor system $(K, H, \sigma^t, f^t)$ is such that $f^t$ is trivial, in the sense that $f^t(x, y) = 1$ for all $x, y \in K$, and then, by (10.7), $\sigma^t$ is a homomorphism from $K$ to $\operatorname{Aut}(H)$.

Both extensions in Example 10.1.2 and both extensions in Example 10.1.4 are split extensions, whereas the extension

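$$\{0\} \longrightarrow m\mathbb{Z} \xrightarrow{\ i\ } \mathbb{Z} \xrightarrow{\ \nu\ } \mathbb{Z}_m \longrightarrow \{0\}$$
(with $m > 1$) is not a split extension, as the only homomorphism from $\mathbb{Z}_m$ to $\mathbb{Z}$ is the zero homomorphism. Recall that a group $G$ is said to be the semi-direct product of a normal subgroup $H$ of $G$ with a subgroup $K$ of $G$ if (i) $G = HK$, and (ii) $H \cap K = \{e\}$. Symbolically, we write $G = H \rtimes K$.

Proposition 10.1.19 Let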

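$$E \equiv 1 \longrightarrow H \xrightarrow{\ \alpha\ } G \xrightarrow{\ \beta\ } K \longrightarrow 1$$
be a split extension of $H$ by $K$ with a splitting $t$. Then $G = \alpha(H) \rtimes t(K)$ is the semi-direct product of $\alpha(H)$ with $t(K)$. Conversely, if $G = H \rtimes K$, then there is a natural projection $p$ from $G$ to $K$ such that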

$$1 \longrightarrow H \xrightarrow{\ i\ } G \xrightarrow{\ p\ } K \longrightarrow 1$$
is a split extension of $H$ by $K$.

Proof Suppose that $E$ is a split extension with splitting $t$. Clearly, $\alpha(H) = \ker\beta$ is a normal subgroup of $G$. Let $g \in G$. Then $\beta(g\,t(\beta(g^{-1}))) = \beta(g)\,\beta(t(\beta(g^{-1}))) = \beta(g)\beta(g^{-1}) = e$. This shows that $g\,t(\beta(g^{-1})) \in \ker\beta = \operatorname{image}\alpha$. Hence there is a unique $h \in H$ such that $g\,t(\beta(g^{-1})) = \alpha(h)$. Since $t$ is a homomorphism, $t(\beta(g^{-1}))^{-1} = t(\beta(g))$, and so $g = \alpha(h)\,t(\beta(g))$. Thus, $G = \alpha(H)t(K)$. Since $\beta \circ t = I_K$, $t$ is an injective homomorphism, and so $t(K)$ is a subgroup of $G$ isomorphic to $K$. Let $\alpha(h) \in \alpha(H) \cap t(K)$, $h \in H$. There is a $k \in K$ such that $\alpha(h) = t(k)$. But then $e = \beta(\alpha(h)) = \beta(t(k)) = k$. Since $t$ is a homomorphism, $e = t(e) = t(k) = \alpha(h)$. This shows that $\alpha(H) \cap t(K) = \{e\}$.

Conversely, suppose that $G = H \rtimes K$. Every element $g \in G$ is expressible as $g = hk$, where $h \in H$ and $k \in K$. Suppose that $h_1k_1 = h_2k_2$. Then $h_2^{-1}h_1 = k_2k_1^{-1} \in H \cap K = \{e\}$. This implies that $h_1 = h_2$ and $k_1 = k_2$. Thus, every element $g \in G$ is uniquely expressible as $g = hk$, where $h \in H$ and $k \in K$. This gives us a surjective map $p$ from $G$ to $K$ given by $p(g) = k$, where $g = hk$. Also, $(h_1k_1)(h_2k_2) = (h_1k_1h_2k_1^{-1})(k_1k_2)$, where $h_1k_1h_2k_1^{-1} \in H$ (as $H$ is normal) and $k_1k_2 \in K$. It follows that $p$ is a surjective homomorphism from $G$ to $K$ with kernel $H$. We get the extension

$$1 \longrightarrow H \xrightarrow{\ i\ } G \xrightarrow{\ p\ } K \longrightarrow 1$$
of $H$ by $K$, with the inclusion map from $K$ to $G$ as a splitting. ∎
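Proposition 10.1.19 can be seen at work in the smallest non-abelian case. The sketch below, our own illustration with hypothetical names, builds the semi-direct product of $H = \mathbb{Z}_3$ by $K = \mathbb{Z}_2$ from the factor system with $f = 0$ and $\sigma_1(b) = -b$. It then checks four things: $t(x) = (0, x)$ is a splitting; $\alpha(H) \cap t(K)$ is trivial; $G = \alpha(H)t(K)$; and $G$ is non-abelian. Since $S_3$ is the only non-abelian group of order 6, $G$ is isomorphic to $S_3$.

```python
from itertools import product

n = 3                                   # H = Z_3 and K = Z_2 = {0, 1}, both written additively
H, K = range(n), range(2)

def sigma(x, b):
    """sigma_0 = identity and sigma_1 = inversion, a homomorphism Z_2 -> Aut(Z_3)."""
    return b % n if x == 0 else (-b) % n

def mul(p, q):
    """Product for the factor system with f = 0: (a, x)(b, y) = (a + sigma_x(b), x + y)."""
    (a, x), (b, y) = p, q
    return ((a + sigma(x, b)) % n, (x + y) % 2)

def alpha(h):
    return (h, 0)

def t(x):                               # candidate splitting K -> G
    return (0, x)

G = list(product(H, K))
splitting_is_hom = all(mul(t(x), t(y)) == t((x + y) % 2) for x in K for y in K)
alpha_H, t_K = {alpha(h) for h in H}, {t(x) for x in K}
trivial_intersection = (alpha_H & t_K) == {(0, 0)}
G_equals_HK = {mul(a, b) for a in alpha_H for b in t_K} == set(G)
abelian = all(mul(p, q) == mul(q, p) for p in G for q in G)
print(splitting_is_hom, trivial_intersection, G_equals_HK, abelian)   # True True True False
```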

The following are some applications of the above results.

Definition 10.1.20 A group $H$ is called a complete group if the homomorphism $h \mapsto i_h$ ($i_h$ being the inner automorphism determined by $h$) is an isomorphism from $H$ to $\operatorname{Aut}(H)$. More explicitly, the center $Z(H)$ of $H$ is trivial, and all automorphisms of $H$ are inner.

Example 10.1.21 There are many complete groups. The symmetric groups $S_n$, $n \neq 2, 6$, and the group $\operatorname{Aut}(G)$ of automorphisms of a non-abelian simple group $G$ (or, more generally, automorphism groups of direct products of non-abelian simple groups) are all complete groups. Let $H$ be a cyclic group of odd order. Consider the symmetric group $\operatorname{Sym}(H)$ on the set $H$, and let $\rho(H)$ denote the image of the Cayley representation of $H$ in $\operatorname{Sym}(H)$. Then the subgroup $G$ of $\operatorname{Sym}(H)$ generated by $\rho(H) \cup \operatorname{Aut}(H)$ is also a complete group. We shall prove some of these assertions.
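The claim about $S_n$ can be tested by brute force for very small $n$. Such a test says nothing about $n = 6$ or about general $n$. The sketch below uses hypothetical helper names and enumerates $\operatorname{Aut}(G)$ for a finite permutation group $G$. It tries every assignment of images to a generating set and keeps those that extend to bijective homomorphisms. It relies on the fact that $G$ is complete exactly when $Z(G)$ is trivial and $|\operatorname{Aut}(G)| = |G|$, since then $\operatorname{Inn}(G) \cong G$ already fills up $\operatorname{Aut}(G)$.

```python
from itertools import permutations, product

def compose(p, q):
    """(p o q)(k) = p[q[k]] for permutations stored as tuples."""
    return tuple(p[q[k]] for k in range(len(q)))

def automorphisms(G, gens):
    """All automorphisms of the finite permutation group G generated by gens.

    An automorphism is determined by the images of the generators, so each
    assignment of images is extended along words in the generators and kept
    only if it is well defined, bijective and multiplicative."""
    e = tuple(range(len(G[0])))
    autos = []
    for images in product(G, repeat=len(gens)):
        phi, frontier, ok = {e: e}, [e], True
        while frontier and ok:
            new = []
            for g in frontier:
                for s, s_img in zip(gens, images):
                    h, h_img = compose(g, s), compose(phi[g], s_img)
                    if h not in phi:
                        phi[h] = h_img
                        new.append(h)
                    elif phi[h] != h_img:
                        ok = False          # the assignment is not well defined
            frontier = new
        if (ok and len(phi) == len(G) and len(set(phi.values())) == len(G)
                and all(phi[compose(a, b)] == compose(phi[a], phi[b]) for a in G for b in G)):
            autos.append(phi)
    return autos

def is_complete_symmetric(n):
    """Brute-force completeness test for S_n, generated by (0 1) and an n-cycle."""
    G = list(permutations(range(n)))
    transposition = tuple([1, 0] + list(range(2, n)))
    n_cycle = tuple(list(range(1, n)) + [0])
    center = [z for z in G if all(compose(z, x) == compose(x, z) for x in G)]
    return len(center) == 1 and len(automorphisms(G, [transposition, n_cycle])) == len(G)

print([is_complete_symmetric(n) for n in (2, 3, 4)])   # [False, True, True]
```

$S_2$ fails because its center is the whole group.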

Proposition 10.1.22 Let $H$ be a complete group. Then any extension of $H$ by $K$ is equivalent to the direct product extension
$$1 \longrightarrow H \xrightarrow{\ i_1\ } H \times K \xrightarrow{\ p_2\ } K \longrightarrow 1,$$
where $i_1$ is the inclusion $h \mapsto (h, e)$ and $p_2$ is the second projection. More explicitly, if $G$ is any group containing $H$ as a normal subgroup, then there is a subgroup $L$ of $G$ such that $G$ is the direct product of $H$ and $L$.

Proof Since $H$ is complete, the center $Z(H)$ of $H$ is trivial, and $\operatorname{Out}(H)$ is also trivial. Thus, for any group $K$ there is only one abstract kernel of $K$ to $H$, namely the trivial homomorphism from $K$ to $\operatorname{Out}(H)$. It follows from Theorem 10.1.16 that there is only one extension of $H$ by $K$ (up to equivalence), which, of course, is the one given in the proposition. ∎

Conversely, we have the following result due to Baer.

Proposition 10.1.23 (Baer) Let $H$ be a centerless group. Suppose that, for every group $K$, there is only one extension (up to equivalence) of $H$ by $K$. Then $H$ is a complete group.

Proof Let $H$ be a centerless group such that all extensions of $H$ are direct product extensions. Let $\alpha \in \operatorname{Aut}(H)$. We wish to show that $\alpha$ is an inner automorphism of $H$. Consider the symmetric group $\operatorname{Sym}(H)$ of permutations of the set $H$. For $h \in H$, let $L_h$ denote the left multiplication by $h$ on $H$. The map $\chi$ from $H$ to $\operatorname{Sym}(H)$ defined by $\chi(h) = L_h$ is an injective homomorphism. Let $G$ denote the subgroup of $\operatorname{Sym}(H)$ generated by $\chi(H) \cup \{\alpha\}$. Let $L_h \in \chi(H)$. Then $\alpha \circ L_h \circ \alpha^{-1} = L_{\alpha(h)} \in \chi(H)$, and similarly $\alpha^{-1} \circ L_h \circ \alpha = L_{\alpha^{-1}(h)} \in \chi(H)$. This shows that $\chi(H)$ is a normal subgroup of $G$, and we have an extension

$$1 \longrightarrow H \xrightarrow{\ \chi\ } G \xrightarrow{\ \nu\ } G/\chi(H) \longrightarrow 1.$$

By our assumption, this is a direct product extension. Hence there exists a subgroup $K$ of $G$, isomorphic to $G/\chi(H)$, such that $G = \chi(H) \times K$ (internal direct product). As such, elements of $\chi(H)$ commute with elements of $K$, and every element of $G$ is uniquely expressible as a product of an element of $\chi(H)$ and an element of $K$. Suppose that $\alpha = L_x k$, with $x \in H$ and $k \in K$. Now, $L_{\alpha(h)} = \alpha L_h \alpha^{-1} = L_x k L_h k^{-1} L_x^{-1} = L_x L_h L_{x^{-1}} = L_{xhx^{-1}}$. Since $\chi$ is injective, $\alpha(h) = xhx^{-1}$ for all $h \in H$. This shows that $\alpha$ is the inner automorphism determined by $x$. ∎

The following is another characterization of a complete group.

Proposition 10.1.24 A group $H$ is a complete group if and only if it has a characteristic subgroup $K$ with trivial centralizer in $H$ such that all the automorphisms of $K$ are induced by inner automorphisms of $H$.

Proof Suppose that $H$ is complete. Then we can take $K = H$, which is a characteristic subgroup of $H$; since $H$ is complete, every automorphism of $H$ is an inner automorphism of $H$. Also, $C_H(H) = Z(H) = \{e\}$. Conversely, let $H$ be a group with a characteristic subgroup $K$ whose centralizer $C_H(K)$ in $H$ is trivial, and all of whose automorphisms are induced by inner automorphisms of $H$. Since $Z(H) \subseteq C_H(K)$ and $C_H(K)$ is trivial, $H$ is centerless. It is therefore sufficient (Proposition 10.1.23) to show that, for any group $G$ containing $H$ as a normal subgroup, there is a subgroup $L$ of $G$ such that $G$ is the direct product of $H$ and $L$. Let $G$ be a group containing $H$ as a normal subgroup. Then $K$, being a characteristic subgroup of $H$, is normal in $G$. Let $g$ be any element of $G$. Then the inner automorphism $i_g$ restricted to $K$ is an automorphism of $K$. By our hypothesis, there is an element $h \in H$ such that $i_h$ and $i_g$ agree on $K$. This means that $h^{-1}g \in C_G(K)$. Thus, $G = H\,C_G(K)$. Since $K$ is a normal subgroup of $G$, $C_G(K)$ is a normal subgroup of $G$. Also, $H \cap C_G(K) = C_H(K) = \{e\}$. This shows that $G$ is the direct product of $H$ and $C_G(K)$. ∎

Corollary 10.1.25 Let G be a non-abelian simple group. Then Aut(G) is a complete group.

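Proof Since $G$ is non-abelian simple, its center $Z(G)$ (a proper normal subgroup) is trivial, so $\operatorname{Inn}(G)$ is isomorphic to $G$, and hence $\operatorname{Inn}(G)$ is simple. In light of the above proposition, it is sufficient to show that $\operatorname{Inn}(G)$ is a characteristic subgroup of $\operatorname{Aut}(G)$ whose centralizer in $\operatorname{Aut}(G)$ is trivial, and that all automorphisms of $\operatorname{Inn}(G)$ are induced by inner automorphisms of $\operatorname{Aut}(G)$. We already know that $\operatorname{Inn}(G)$ is a normal subgroup of $\operatorname{Aut}(G)$. Let $\alpha \in C_{\operatorname{Aut}(G)}(\operatorname{Inn}(G))$. Then $\alpha \circ i_g = i_g \circ \alpha$ for all $g \in G$. Hence $\alpha(g)\alpha(x)\alpha(g)^{-1} = g\alpha(x)g^{-1}$ for all $x, g \in G$. Since $\alpha$ is surjective, this shows that $g^{-1}\alpha(g) \in Z(G)$ for all $g \in G$. Since $G$ is centerless, $\alpha(g) = g$ for all $g \in G$. This means that $\alpha = I_G$. Thus, $C_{\operatorname{Aut}(G)}(\operatorname{Inn}(G)) = \{I_G\}$.

Next, we show that $\operatorname{Inn}(G)$ is a characteristic subgroup of $\operatorname{Aut}(G)$. Let $\alpha \in \operatorname{Aut}(\operatorname{Aut}(G))$. Since $\operatorname{Inn}(G)$ is a normal subgroup of $\operatorname{Aut}(G)$, $\alpha(\operatorname{Inn}(G))$ is also a normal subgroup of $\operatorname{Aut}(G)$. Hence $\alpha(\operatorname{Inn}(G)) \cap \operatorname{Inn}(G)$ is a normal subgroup of $\operatorname{Inn}(G)$, and also of $\alpha(\operatorname{Inn}(G))$. Since $\operatorname{Inn}(G)$ and $\alpha(\operatorname{Inn}(G))$ are simple, either $\alpha(\operatorname{Inn}(G)) \cap \operatorname{Inn}(G) = \{I_G\}$, or else $\alpha(\operatorname{Inn}(G)) \cap \operatorname{Inn}(G) = \operatorname{Inn}(G) = \alpha(\operatorname{Inn}(G))$. Suppose that $\alpha(\operatorname{Inn}(G)) \cap \operatorname{Inn}(G) = \{I_G\}$. Then, being normal subgroups with trivial intersection, $\alpha(\operatorname{Inn}(G))$ and $\operatorname{Inn}(G)$ commute elementwise. So the nontrivial group $\alpha(\operatorname{Inn}(G))$ lies in $C_{\operatorname{Aut}(G)}(\operatorname{Inn}(G)) = \{I_G\}$, a contradiction. Hence $\alpha(\operatorname{Inn}(G)) = \operatorname{Inn}(G)$. This shows that $\operatorname{Inn}(G)$ is a characteristic subgroup of $\operatorname{Aut}(G)$.

Let $\chi \in \operatorname{Aut}(\operatorname{Inn}(G))$. Since $g \mapsto i_g$ is a bijection from $G$ to $\operatorname{Inn}(G)$, there is a bijective map $\mu$ from $G$ to $G$ such that $\chi(i_g) = i_{\mu(g)}$ for all $g \in G$. Further, $i_{\mu(g_1g_2)} = \chi(i_{g_1g_2}) = \chi(i_{g_1}i_{g_2}) = \chi(i_{g_1})\chi(i_{g_2}) = i_{\mu(g_1)}i_{\mu(g_2)} = i_{\mu(g_1)\mu(g_2)}$. This shows that $\mu(g_1g_2) = \mu(g_1)\mu(g_2)$ for all $g_1, g_2 \in G$. Thus, $\mu \in \operatorname{Aut}(G)$. Now, $(\mu\, i_g\, \mu^{-1})(x) = i_{\mu(g)}(x) = \chi(i_g)(x)$ for all $g, x \in G$. This shows that $\chi(i_g) = \mu\, i_g\, \mu^{-1}$ for all $g \in G$. Thus, $\chi$ is the automorphism of $\operatorname{Inn}(G)$ induced by the inner automorphism of $\operatorname{Aut}(G)$ determined by $\mu$. ∎

We have described the extensions of groups with trivial centers. Let us now consider the other extreme case, when the center of the group is the group itself.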
More explicitly, we describe the extensions of abelian groups. Let $H$ be an abelian group. We shall use the additive notation $+$ for the binary operation of $H$. Since $\mathrm{Inn}(H)$ is trivial, the group $\mathrm{Out}(H)$ is naturally identified with $\mathrm{Aut}(H)$. Thus, an abstract kernel of $K$ to $H$ is simply a homomorphism $\sigma$ from $K$ to $\mathrm{Aut}(H)$. We discuss the following problem.

Problem Let $H$ be an abelian group. Classify all extensions of $H$ by $K$ (up to equivalence) with a given abstract kernel $\sigma$.

Let us denote by $\mathrm{EXT}_\sigma(H,K)$ the set of equivalence classes of extensions of an abelian group $H$ by a group $K$ with the given abstract kernel $\sigma$. There is at least one such extension, viz., the semi-direct product extension of $H$ by $K$ associated to the homomorphism $\sigma$. Clearly, the factor system associated to this split extension is $(K,H,\sigma,f_0)$, where $f_0$ is trivial in the sense that $f_0(x,y)=0$ for all $x,y\in K$. Let $Z^2_\sigma(K,H)$ denote the set of factor systems $(K,H,\sigma,f)$ associated to the abstract kernel $\sigma$. Indeed, a factor system in $Z^2_\sigma(K,H)$ determines, and is uniquely determined by, the corresponding map $f$, which satisfies the condition

$$f(x,y) + f(xy,z) = \sigma_x(f(y,z)) + f(x,yz)\quad\text{for all } x,y,z\in K.$$
By abuse of language, we shall call such an $f$ a factor system in $Z^2_\sigma(K,H)$; $f$ is also called a 2-co-cycle associated to $(K,H,\sigma)$. The justification for the notation $Z^2_\sigma(K,H)$, and for the 2-co-cycle terminology, will follow later. Suppose that $f$ and $f'$ are two members of $Z^2_\sigma(K,H)$. Then
$$\begin{aligned}
(f+f')(x,y) + (f+f')(xy,z) &= f(x,y) + f(xy,z) + f'(x,y) + f'(xy,z)\\
&= \sigma_x(f(y,z)) + f(x,yz) + \sigma_x(f'(y,z)) + f'(x,yz)\\
&= \sigma_x((f+f')(y,z)) + (f+f')(x,yz).
\end{aligned}$$
This shows that $f+f'$ is also a factor system associated to the abstract kernel $\sigma$. Also, $-f\in Z^2_\sigma(K,H)$ for all $f\in Z^2_\sigma(K,H)$. Thus, $Z^2_\sigma(K,H)$ is an abelian group with respect to the operation defined above, and $f_0$ is the identity of the group. Let $B^2_\sigma(K,H)$ denote the set of factor systems which are equivalent to the trivial factor system $f_0$. More precisely, from (10.12) written additively, $f\in B^2_\sigma(K,H)$ if and only if there is a map $g$ from $K$ to $H$ with $g(e)=0$ such that
$$f(x,y) = \sigma_x(g(y)) - g(xy) + g(x)\quad\text{for all } x,y\in K.$$
Note that for any map $g$ with $g(e)=0$, the map $f$ defined by $f(x,y) = \sigma_x(g(y)) - g(xy) + g(x)$ is a factor system. The members of $B^2_\sigma(K,H)$ are called the 2-co-boundaries associated to $(K,H,\sigma)$. The quotient group $Z^2_\sigma(K,H)/B^2_\sigma(K,H)$ is called the second co-homology group associated to $(K,H,\sigma)$, and it is denoted by $H^2_\sigma(K,H)$.

Theorem 10.1.26 Let $H$ be an abelian group, $K$ a group, and $\sigma$ an abstract kernel of $K$ to $H$. Then there is a natural bijective correspondence $\Phi$ between the set $\mathrm{EXT}_\sigma(H,K)$ of equivalence classes of extensions of $H$ by $K$ with the given abstract kernel $\sigma$ and the second co-homology group $H^2_\sigma(K,H)$.

Proof Let $E$ be an extension of $H$ by $K$ with the abstract kernel $\sigma$. Let $t$ be a section of the extension, and $(K,H,\sigma,f^t)$ the corresponding factor system. Then $f^t\in Z^2_\sigma(K,H)$. Let $E'$ be another extension of $H$ by $K$ equivalent to $E$, let $t'$ be a section of $E'$, and let $(K,H,\sigma,f^{t'})$ be the corresponding factor system. Then (see Eq. 10.18) there is a map $g$ from $K$ to $H$ with $g(e)=0$ such that
$$f^t(x,y) + g(xy) = \sigma_x(g(y)) + g(x) + f^{t'}(x,y)\quad\text{for all } x,y\in K.$$
This shows that $f^t + B^2_\sigma(K,H) = f^{t'} + B^2_\sigma(K,H)$. Thus, the association $(E,t)\mapsto f^t$ induces a map $\Phi$ from $\mathrm{EXT}_\sigma(H,K)$ to $H^2_\sigma(K,H)$ given by $\Phi([E]) = f^t + B^2_\sigma(K,H)$, where $t$ is a section of $E$. Let $f\in Z^2_\sigma(K,H)$. Then, by Theorem 10.1.11, there is an extension $E$ of $H$ by $K$, and a section $t$, such that $f = f^t$. This shows that $\Phi$ is surjective. Let $E_1$ and $E_2$ be extensions of $H$ by $K$ with sections $t_1$ and $t_2$ and abstract kernel $\sigma$ such that $\Phi([E_1]) = \Phi([E_2])$.
Then $f^{t_1} + B^2_\sigma(K,H) = f^{t_2} + B^2_\sigma(K,H)$. Hence there exists a map $g$ from $K$ to $H$ with $g(e) = 0$ such that $f^{t_1}(x,y) + g(xy) = \sigma_x(g(y)) + g(x) + f^{t_2}(x,y)$ for all $x,y\in K$. It follows that the factor system $f^{t_1}$ is equivalent to $f^{t_2}$. Hence the corresponding extensions $E_1$ and $E_2$ are equivalent, and $\Phi$ is injective. $\square$

Let $H$ be a group (not necessarily abelian), and $K$ a group. Though $H$ may not be abelian, we use the additive notation $+$ for the operation in $H$, and also for the operation in any extension $G$ of $H$ by $K$. The operation of $K$ is denoted by juxtaposition. Thus, the identity of $H$ is denoted by $0$, and that of $K$ by $e$. Let $\psi : K\longrightarrow\mathrm{Out}(H) = \mathrm{Aut}(H)/\mathrm{Inn}(H)$ be an abstract kernel. Since the center $Z(H)$ of $H$ is a characteristic subgroup of $H$, we have a homomorphism $\chi : \mathrm{Aut}(H)\longrightarrow\mathrm{Aut}(Z(H))$ given by $\chi(\alpha) = \alpha|_{Z(H)}$. Let $\sigma : K\longrightarrow\mathrm{Aut}(H)$ be a lifting of $\psi$ with $\sigma(e) = I_H$, in the sense that $\nu\circ\sigma = \psi$, where $\nu$ is the quotient map from $\mathrm{Aut}(H)$ to $\mathrm{Out}(H)$. Since $\psi$ is a homomorphism, $\sigma(xy)\mathrm{Inn}(H) = (\sigma(x)\sigma(y))\mathrm{Inn}(H)$. Hence there is a map $f$ from $K\times K$ to $H$ such that $\sigma(x)\sigma(y) = i_{f(x,y)}\sigma(xy)$ (recall that $i_h$ denotes the inner automorphism determined by $h$). Since inner automorphisms restrict to the identity on $Z(H)$, it follows that $(\chi\circ\sigma)(xy) = (\chi\circ\sigma)(x)(\chi\circ\sigma)(y)$ for all $x,y\in K$. This means that $\chi\circ\sigma$ is a homomorphism from $K$ to $\mathrm{Aut}(Z(H))$. Let $\tau$ be another lifting of $\psi$. Then $\sigma(x)\mathrm{Inn}(H) = \tau(x)\mathrm{Inn}(H)$ for all $x\in K$. Hence there is a map $g$ from $K$ to $H$ with $g(e) = 0$ such that $\sigma(x) = i_{g(x)}\tau(x)$ for all $x\in K$. But then $(\chi\circ\sigma)(x) = (\chi\circ\tau)(x)$ for all $x\in K$. Thus, $\chi\circ\sigma$ depends only on $\psi$, and not on the particular lifting $\sigma$. In turn, $\chi$ induces a map $\bar\chi$ from the set $\mathrm{Hom}(K,\mathrm{Out}(H))$ of abstract kernels from $K$ to $H$ to the set $\mathrm{Hom}(K,\mathrm{Aut}(Z(H)))$ of abstract kernels from $K$ to $Z(H)$, given by $\bar\chi(\psi) = \chi\circ\sigma$, where $\sigma$ is a lifting of $\psi$.

Proposition 10.1.27 Let

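$$E\equiv 1\longrightarrow H\xrightarrow{\ \alpha\ } G\xrightarrow{\ \beta\ } K\longrightarrow 1\quad\text{and}\quad E'\equiv 1\longrightarrow H\xrightarrow{\ \alpha'\ } G'\xrightarrow{\ \beta'\ } K\longrightarrow 1$$
be extensions of $H$ by $K$ such that $\psi_E = \psi_{E'} = \psi$. Then there is a section $t$ of $E$ and a section $t'$ of $E'$ such that $\sigma^t = \sigma^{t'}$ (so that, restricted to $Z(H)$, both induce $\bar\chi(\psi)$), and $-f^t(x,y) + f^{t'}(x,y)\in Z(H)$ for all $x,y\in K$. Further, the map $h$ from $K\times K$ to $Z(H)$ defined by $h(x,y) = -f^t(x,y) + f^{t'}(x,y)$ is a 2-co-cycle in $Z^2_{\bar\chi(\psi)}(K,Z(H))$.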

Proof Let $t$ be a section of $E$, and $s$ a section of $E'$. Since $\psi_E = \psi_{E'}$, $\sigma^t(x)\mathrm{Inn}(H) = \sigma^s(x)\mathrm{Inn}(H)$ for all $x\in K$. This means that there is a function $g$ from $K$ to $H$ with $g(e) = 0$ such that $\sigma^t(x) = i_{g(x)}\sigma^s(x)$ for all $x\in K$. The map $t'$ from $K$ to $G'$ given by $t'(x) = g(x) + s(x)$ is also a section of $E'$. Further, $\sigma^{t'}(x) = i_{g(x)}\sigma^s(x) = \sigma^t(x)$ for all $x$ in $K$. This shows that $\sigma^{t'} = \sigma^t$. Now, $f^t(x,y) = t(x) + t(y) - t(xy)$ and $f^{t'}(x,y) = t'(x) + t'(y) - t'(xy)$. Hence
$$i_{f^t(x,y)} = \sigma^t(x)\sigma^t(y)(\sigma^t(xy))^{-1} = \sigma^{t'}(x)\sigma^{t'}(y)(\sigma^{t'}(xy))^{-1} = i_{f^{t'}(x,y)}\quad\text{for all } x,y\in K.$$

Thus, $i_{-f^t(x,y) + f^{t'}(x,y)} = I_H$. This shows that $-f^t(x,y) + f^{t'}(x,y)\in Z(H)$ for all $x,y\in K$. Put $h(x,y) = -f^t(x,y) + f^{t'}(x,y)$. Then, using that the values of $h$ are central and that $\sigma^{t'} = \sigma^t$,
$$\begin{aligned}
h(x,y) + h(xy,z) &= -f^t(x,y) + f^{t'}(x,y) - f^t(xy,z) + f^{t'}(xy,z)\\
&= -f^t(xy,z) - f^t(x,y) + f^{t'}(x,y) + f^{t'}(xy,z)\\
&= -\big(f^t(x,y) + f^t(xy,z)\big) + f^{t'}(x,y) + f^{t'}(xy,z)\\
&= -\big(\sigma^t_x(f^t(y,z)) + f^t(x,yz)\big) + \sigma^{t'}_x(f^{t'}(y,z)) + f^{t'}(x,yz)\\
&= -f^t(x,yz) - \sigma^t_x(f^t(y,z)) + \sigma^t_x(f^{t'}(y,z)) + f^{t'}(x,yz)\\
&= -f^t(x,yz) + \sigma^t_x\big(-f^t(y,z) + f^{t'}(y,z)\big) + f^{t'}(x,yz)\\
&= -f^t(x,yz) + f^{t'}(x,yz) + \sigma^t_x\big(-f^t(y,z) + f^{t'}(y,z)\big)\\
&= \sigma^t_x(h(y,z)) + h(x,yz)
\end{aligned}$$
for all $x,y,z\in K$. Since $\sigma^t_x$ restricted to $Z(H)$ is $\bar\chi(\psi)(x)$, this shows that $h\in Z^2_{\bar\chi(\psi)}(K,Z(H))$. $\square$

Theorem 10.1.28 Let $\psi : K\longrightarrow\mathrm{Out}(H)$ be an abstract kernel from $K$ to $H$ which is realizable by an extension of $H$ by $K$. Then the second co-homology group $H^2_{\bar\chi(\psi)}(K,Z(H))$ acts sharply transitively on the set $\mathrm{EXT}_\psi(H,K)$ of equivalence classes of extensions of $H$ by $K$ associated to the abstract kernel $\psi$.

Proof Let $E$ be an extension of $H$ by $K$ which realizes the abstract kernel $\psi$, let $t$ be a section of $E$, and let $(K,H,\sigma^t,f^t)$ be the corresponding factor system. Then $\psi(x) = \sigma^t_x\mathrm{Inn}(H)$ for all $x\in K$. Let $h\in Z^2_{\bar\chi(\psi)}(K,Z(H))$. It is easily seen that $(K,H,\sigma^t,f^t+h)$ is again a factor system. Let $E\star h$ denote the corresponding extension. Clearly, $E\star h$ also realizes $\psi$. Let $h'$ be another 2-co-cycle in $Z^2_{\bar\chi(\psi)}(K,Z(H))$ such that the co-homology classes $[h] = h + B^2_{\bar\chi(\psi)}(K,Z(H))$ and $[h'] = h' + B^2_{\bar\chi(\psi)}(K,Z(H))$ in $H^2_{\bar\chi(\psi)}(K,Z(H))$ are equal. Then there is a map $g : K\longrightarrow Z(H)\subseteq H$ with $g(e) = 0$ such that $h'(x,y) = \partial g(x,y) + h(x,y)$ for all $x,y\in K$, where $\partial g(x,y) = \sigma^t_x(g(y)) - g(xy) + g(x)$. Clearly, $f^t + h' = f^t + h + \partial g$. Hence $(K,H,\sigma^t,f^t+h)$ is equivalent to $(K,H,\sigma^t,f^t+h')$. This shows that $[E\star h] = [E\star h']$. Let $E$ and $E'$ be equivalent extensions of $H$ by $K$ which realize $\psi$, and let $h\in Z^2_{\bar\chi(\psi)}(K,Z(H))$. By Theorem 10.1.12, we have sections $t$ and $t'$ of $E$ and $E'$, respectively, such that $(K,H,\sigma^t,f^t)$ is equivalent to $(K,H,\sigma^{t'},f^{t'})$. Hence, there is a map $g$ from $K$ to $H$ with $g(e) = 0$ such that
$$f^t(x,y) + g(xy) = g(x) + \sigma^t_x(g(y)) + f^{t'}(x,y)\quad\text{for all } x,y\in K.$$
Clearly, $(K,H,\sigma^t,f^t+h)$ is equivalent to $(K,H,\sigma^{t'},f^{t'}+h)$, and so $[E\star h] = [E'\star h]$. Thus, we get an action $\star$ of $H^2_{\bar\chi(\psi)}(K,Z(H))$ on $\mathrm{EXT}_\psi(H,K)$ given by $[E]\star[h] = [E\star h]$.

We show that this action is sharply transitive. Let $E$ and $E'$ be extensions realizing the abstract kernel $\psi$. By Proposition 10.1.27, there is a section $t$ of $E$ and a section $t'$ of $E'$ such that $\sigma^t = \sigma^{t'}$, and the map $h$ from $K\times K$ to $Z(H)$ defined by $h(x,y) = -f^t(x,y) + f^{t'}(x,y)$ is a 2-co-cycle in $Z^2_{\bar\chi(\psi)}(K,Z(H))$. Since $f^t + h = f^{t'}$, clearly $[E]\star[h] = [E']$. This shows that the action $\star$ is transitive. Next, suppose that $[E]\star[h] = [E]$. Then there is a section $t$ of $E$ such that the factor system $(K,H,\sigma^t,f^t)$ is equivalent to $(K,H,\sigma^t,f^t+h)$. Hence there is a map $g$ from $K$ to $H$ with $g(e) = 0$ such that
$$f^t(x,y) + h(x,y) + g(xy) = g(x) + \sigma^t_x(g(y)) + f^t(x,y)\quad\text{for all } x,y\in K,$$
and also $g(x) + \sigma^t_x(k) - g(x) = \sigma^t_x(k)$ for all $x\in K$ and $k\in H$. Since $\sigma^t_x$ is an automorphism of $H$, it follows that $g(x)\in Z(H)$ for all $x\in K$. Thus, $h(x,y) = \sigma^t_x(g(y)) - g(xy) + g(x)$ for all $x,y\in K$. This shows that $h = \partial g$, where $g$ is a map from $K$ to $Z(H)$ with $g(e) = 0$. It follows that $[h] = 0$. This completes the proof of the fact that the action $\star$ is sharply transitive. $\square$
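When $H$ is abelian, $Z(H) = H$ and the action of Theorem 10.1.28 reduces to the correspondence of Theorem 10.1.26. The Python sketch below illustrates this in the smallest nontrivial case, under stated assumptions: $H = \mathbb{Z}_2$ and $K = \mathbb{Z}_2$ are written additively, the action is trivial, and the extension attached to a normalized factor system $f$ is assumed to be the usual group on $H\times K$ with product $(a,x)(b,y) = (a + \sigma_x(b) + f(x,y), xy)$. The helper names (`extension_group`, `is_cocycle`, and so on) are hypothetical. The trivial factor system gives the split extension $\mathbb{Z}_2\times\mathbb{Z}_2$, while the factor system with $f(1,1) = 1$ gives $\mathbb{Z}_4$. Since these groups are not isomorphic, the two factor systems lie in different co-homology classes.

```python
from itertools import product


def extension_group(n, m, sigma, f):
    """Group on Z_n x Z_m from a normalized 2-co-cycle f (assumed construction):
    (a, x)(b, y) = (a + sigma(x, b) + f(x, y), x + y), computed mod n and mod m."""
    elems = list(product(range(n), range(m)))

    def mul(g, h):
        (a, x), (b, y) = g, h
        return ((a + sigma(x, b) + f(x, y)) % n, (x + y) % m)

    return elems, mul


def is_cocycle(n, m, sigma, f):
    """Checks f(x, y) + f(xy, z) = sigma_x(f(y, z)) + f(x, yz) for all x, y, z in Z_m."""
    return all((f(x, y) + f((x + y) % m, z)) % n
               == (sigma(x, f(y, z)) + f(x, (y + z) % m)) % n
               for x in range(m) for y in range(m) for z in range(m))


def is_group(elems, mul, e):
    """Brute-force check of associativity, two-sided identity e and inverses."""
    assoc = all(mul(mul(g, h), k) == mul(g, mul(h, k))
                for g in elems for h in elems for k in elems)
    ident = all(mul(e, g) == g == mul(g, e) for g in elems)
    inverses = all(any(mul(g, h) == e for h in elems) for g in elems)
    return assoc and ident and inverses


def element_orders(elems, mul, e):
    """Sorted list of element orders (an isomorphism invariant)."""
    orders = []
    for g in elems:
        power, k = g, 1
        while power != e:
            power, k = mul(power, g), k + 1
        orders.append(k)
    return sorted(orders)


def trivial(x, b):      # sigma_x is the identity for every x
    return b


def zero(x, y):         # the trivial factor system f_0
    return 0


def carry(x, y):        # f(1, 1) = 1 and f = 0 elsewhere
    return 1 if x + y >= 2 else 0


for f in (zero, carry):
    elems, mul = extension_group(2, 2, trivial, f)
    print(is_cocycle(2, 2, trivial, f), is_group(elems, mul, (0, 0)),
          element_orders(elems, mul, (0, 0)))
# True True [1, 2, 2, 2]   (Z_2 x Z_2, the split extension)
# True True [1, 2, 4, 4]   (Z_4)
```

Element orders are used only as a quick way to tell the two groups apart; they are not a complete isomorphism test in general.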

Corollary 10.1.29 If there is an extension of $H$ by $K$ which realizes $\psi$, then there is a bijective correspondence between $\mathrm{EXT}_\psi(H,K)$ and $H^2_{\bar\chi(\psi)}(K,Z(H))$. $\square$

Let $H$ be an abelian group and $K$ a group. As $H^2_\sigma(K,H)$ is an abelian group, the bijective map $\Phi$ (see Theorem 10.1.26) induces a group structure on the set $\mathrm{EXT}_\sigma(H,K)$ of equivalence classes of extensions of $H$ by $K$ with the given abstract kernel $\sigma$. We shall describe the induced addition, called the Baer sum, on the class of extensions. Let
$$E_1\equiv 1\longrightarrow H\xrightarrow{\ \alpha_1\ } G_1\xrightarrow{\ \beta_1\ } K\longrightarrow 1\quad\text{and}\quad E_2\equiv 1\longrightarrow H\xrightarrow{\ \alpha_2\ } G_2\xrightarrow{\ \beta_2\ } K\longrightarrow 1$$
be two extensions of $H$ by $K$ with abstract kernel $\sigma$. We have the extension $E_1\oplus E_2$ of $H\oplus H$ by $K\oplus K$ given by
$$E_1\oplus E_2\equiv 1\longrightarrow H\oplus H\xrightarrow{\ \alpha_1\oplus\alpha_2\ } G_1\oplus G_2\xrightarrow{\ \beta_1\oplus\beta_2\ } K\oplus K\longrightarrow 1.$$

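Using this, we construct another extension of $H$ by $K$. Let $\Delta$ denote the diagonal map from $K$ to $K\oplus K$ defined by $\Delta(x) = (x,x)$. We have the pull-back diagram
$$\begin{array}{ccc}
L & \xrightarrow{\ \ \chi\ \ } & K\\
{\scriptstyle i}\big\downarrow & & \big\downarrow{\scriptstyle\Delta}\\
G_1\oplus G_2 & \xrightarrow{\ \beta_1\oplus\beta_2\ } & K\oplus K
\end{array}$$
where $L = \{(g_1,g_2)\mid\beta_1(g_1) = \beta_2(g_2)\}$, $i$ is the inclusion map, and $\chi$ is given by $\chi(g_1,g_2) = \beta_1(g_1)$ (verify that it is a pull-back diagram). Clearly, $\chi$ is a surjective homomorphism, and
$$\ker\chi = \{(g_1,g_2)\in L\mid\chi(g_1,g_2) = \beta_1(g_1) = e\} = \{(g_1,g_2)\mid e = \beta_1(g_1) = \beta_2(g_2)\} = H\oplus H.$$
Thus, we have an extension $\Delta^*(E_1\oplus E_2)$ of $H\oplus H$ by $K$ given by
$$\Delta^*(E_1\oplus E_2)\equiv 1\longrightarrow H\oplus H\xrightarrow{\ \alpha_1\oplus\alpha_2\ } L\xrightarrow{\ \chi\ } K\longrightarrow 1.$$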

Now, let $\nabla$ denote the co-diagonal map from $H\oplus H$ to $H$ given by $\nabla(h_1,h_2) = h_1 + h_2$. Since $H$ is an abelian group, $\nabla$ is a homomorphism. Let $D = \{(h,-h)\mid h\in H\}$. Then $D$ is a normal subgroup of $L$, and we have the push-out diagram (verify)
$$\begin{array}{ccc}
H\oplus H & \xrightarrow{\ \alpha_1\oplus\alpha_2\ } & L\\
{\scriptstyle\nabla}\big\downarrow & & \big\downarrow{\scriptstyle\nu}\\
H & \xrightarrow{\ \ \eta\ \ } & G
\end{array}$$
where $G = L/D$, $\nu$ is the quotient map, and $\eta$ is given by $\eta(h) = (h,0) + D$. Clearly, $\eta$ is a homomorphism. Suppose that $\eta(h) = D$. Then $(h,0)\in D$. This implies that $h = 0$. Hence $\eta$ is injective. Again, the map $\chi$ from $L$ to $K$ takes $(h,-h)$ to $\beta_1(h) = e$. This shows that $\chi$ induces a surjective homomorphism $\hat\chi$ from $G$ to $K$ given by $\hat\chi((g_1,g_2)+D) = \beta_1(g_1)$. Also,
$$\ker\hat\chi = \{(g_1,g_2)+D\mid e = \beta_1(g_1) = \beta_2(g_2)\} = \{(h_1,h_2)+D\mid h_1,h_2\in H\} = \{(h_1+h_2,0)+D\mid h_1,h_2\in H\} = \operatorname{image}\eta.$$
Thus, we get another extension
$$1\longrightarrow H\xrightarrow{\ \eta\ } G\xrightarrow{\ \hat\chi\ } K\longrightarrow 1$$
of $H$ by $K$, called the Baer sum of $E_1$ and $E_2$, and it is denoted by $E_1\boxplus E_2$. Further, let $t_1$ be a section of $E_1$, and $t_2$ a section of $E_2$, with corresponding factor systems $f^{t_1}$ and $f^{t_2}$. Then we have the section $t_1+t_2$ of $E_1\boxplus E_2$ given by $(t_1+t_2)(x) = (t_1(x),t_2(x)) + D$. It can easily be seen that $f^{t_1+t_2} = f^{t_1} + f^{t_2}$. It follows that the bijective map $\Phi$ from the set $\mathrm{EXT}_\sigma(H,K)$ of equivalence classes of extensions of $H$ by $K$ to the second co-homology group $H^2_\sigma(K,H)$ respects the addition. In turn, $\mathrm{EXT}_\sigma(H,K)$ is an abelian group with respect to the Baer sum, and it is isomorphic to $H^2_\sigma(K,H)$. $\square$

Proposition 10.1.30 Let $H$ be an abelian group, and $K$ a group of order $m$. Then $mH^2_\sigma(K,H) = \{0\}$ for any abstract kernel $\sigma : K\longrightarrow\mathrm{Aut}(H)$.

Proof Let $f\in Z^2_\sigma(K,H)$. Consider the map $g$ from $K$ to $H$ given by $g(x) = \sum_{z\in K} f(x,z)$. Clearly, $g(e) = 0$. Now,
$$\begin{aligned}
\partial g(x,y) &= \sigma_x(g(y)) - g(xy) + g(x)\\
&= \sigma_x\Big(\sum_{z\in K} f(y,z)\Big) - \sum_{z\in K} f(xy,z) + \sum_{z\in K} f(x,z)\\
&= \sum_{z\in K}\big(\sigma_x(f(y,z)) - f(xy,z) + f(x,z)\big)\\
&= mf(x,y) + \sum_{z\in K}\big(\sigma_x(f(y,z)) - f(xy,z) + f(x,z) - f(x,y)\big)\\
&= mf(x,y) + \sum_{z\in K}\big(\sigma_x(f(y,z)) - f(xy,z) + f(x,yz) - f(x,y)\big)\\
&= mf(x,y) + \sum_{z\in K}\partial f(x,y,z)\\
&= mf(x,y),
\end{aligned}$$
where we have used that $H$ is abelian, that $z\mapsto yz$ is a bijection of $K$ (so that $\sum_z f(x,z) = \sum_z f(x,yz)$), and, in the last step, that $f\in Z^2_\sigma(K,H)$. Hence $mf\in B^2_\sigma(K,H)$. This shows that $m[f] = 0$. $\square$

Corollary 10.1.31 Let $H$ be an abelian group of order $n$, and $K$ a group of order $m$ such that $(m,n) = 1$. Then $H^2_\sigma(K,H) = \{0\}$ for every abstract kernel $\sigma : K\longrightarrow\mathrm{Aut}(H)$.

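Proof From the above proposition, $mH^2_\sigma(K,H) = \{0\}$. Since $nf = 0$ for all maps $f$ from $K\times K$ to $H$, it follows that $nH^2_\sigma(K,H) = \{0\}$. Since $m$ and $n$ are co-prime, there are integers $a,b$ with $am + bn = 1$, and so every $c\in H^2_\sigma(K,H)$ satisfies $c = a(mc) + b(nc) = 0$. Thus, $H^2_\sigma(K,H) = \{0\}$. $\square$

Corollary 10.1.32 Let $H$ be a finite abelian group of order $n$, and $K$ a group of order $m$, where $(m,n) = 1$. Then every extension of $H$ by $K$ splits.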

Proof It follows from Theorem 10.1.26 that $\mathrm{EXT}_\sigma(H,K)$ is in bijective correspondence with $H^2_\sigma(K,H)$. From the above corollary, it is evident that there is only one equivalence class of extensions, and indeed, it is the class of the split extension. $\square$
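The averaging argument in the proof of Proposition 10.1.30, and the vanishing in Corollary 10.1.31, can be checked numerically for small cyclic groups. The Python sketch below assumes $K = \mathbb{Z}_m$ and $H = \mathbb{Z}_n$, both written additively, with the action passed as a function `sigma(x, b)`; all function names are hypothetical. It verifies $\partial g = mf$ for $g(x) = \sum_z f(x,z)$ on a few co-cycles, and computes $|Z^2/B^2|$ for the trivial action by brute force over normalized cochains. The computed orders agree with $\gcd(m,n)$, the order of $\mathbb{Z}_n/m\mathbb{Z}_n$ described at the end of Example 10.2.6, and equal $1$ when $(m,n) = 1$.

```python
from itertools import product
from math import gcd


def coboundary(g, m, n, sigma):
    """(∂g)(x, y) = sigma_x(g(y)) - g(xy) + g(x), for K = Z_m acting on H = Z_n."""
    return {(x, y): (sigma(x, g[y]) - g[(x + y) % m] + g[x]) % n
            for x in range(m) for y in range(m)}


def averaging_holds(f, m, n, sigma):
    """Proof of Proposition 10.1.30: g(x) = sum over z of f(x, z) satisfies ∂g = m f."""
    g = [sum(f[(x, z)] for z in range(m)) % n for x in range(m)]
    dg = coboundary(g, m, n, sigma)
    return all(dg[key] == (m * f[key]) % n for key in f)


def carry(m, h0):
    """f(x, y) = h0 if x + y >= m, else 0; a 2-co-cycle whenever sigma_1 fixes h0."""
    return {(x, y): (h0 if x + y >= m else 0) for x in range(m) for y in range(m)}


def h2_order_trivial(m, n):
    """|Z^2 / B^2| for Z_m acting trivially on Z_n, by brute force over
    normalized cochains (f(0, y) = f(x, 0) = 0 and g(0) = 0)."""
    def trivial(x, b):
        return b

    inner = [(x, y) for x in range(1, m) for y in range(1, m)]
    n_cocycles = 0
    for values in product(range(n), repeat=len(inner)):
        f = {(x, y): 0 for x in range(m) for y in range(m)}
        f.update(zip(inner, values))
        if all((f[(x, y)] + f[((x + y) % m, z)]) % n
               == (f[(y, z)] + f[(x, (y + z) % m)]) % n
               for x in range(m) for y in range(m) for z in range(m)):
            n_cocycles += 1
    boundaries = {tuple(sorted(coboundary([0, *vals], m, n, trivial).items()))
                  for vals in product(range(n), repeat=m - 1)}
    return n_cocycles // len(boundaries)


# Averaging identity for the trivial action of Z_4 on Z_6:
print(averaging_holds(carry(4, 1), 4, 6, lambda x, b: b))
# Z_2 acting on Z_4 by inversion (b -> 3b); h0 = 2 is fixed by the inversion:
print(averaging_holds(carry(2, 2), 2, 4, lambda x, b: (pow(3, x) * b) % 4))
# Brute-force order of H^2(Z_m, Z_n) next to gcd(m, n):
for m, n in [(2, 2), (2, 3), (3, 3), (4, 2), (2, 4), (3, 4)]:
    print(m, n, h2_order_trivial(m, n), gcd(m, n))
```

The brute force enumerates all $n^{(m-1)^2}$ normalized cochains, so it is only meant for very small $m$ and $n$.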

Let $H$ and $K$ be finite groups of co-prime orders. Then $Z(H)$ and $K$ are also of co-prime orders. It follows from Theorem 10.1.28 and the above corollary that $\mathrm{EXT}_\psi(H,K)$ contains at most one element. More explicitly, an abstract kernel $\psi : K\longrightarrow\mathrm{Out}(H)$ either is not realizable by an extension of $H$ by $K$, or there is only one equivalence class of extensions of $H$ by $K$ associated to the abstract kernel $\psi$. However, it is not clear whether this unique extension is a split extension. The following theorem asserts that it is, indeed, a split extension even if $H$ is non-abelian.

Theorem 10.1.33 (Schur–Zassenhaus) Let $G$ be a finite group having a normal subgroup $H$ such that $|H|$ and $|G/H|$ are co-prime. Then $G$ is a split extension of $H$ by $G/H$ (equivalently, $G$ is a semi-direct product of $H$ with $G/H$).

Proof If $H$ is an abelian subgroup of $G$, then the result follows from the above corollary. We prove it in the general case. The proof is by induction on the order $|G|$ of $G$. If $|G| = 1$, then there is nothing to do. Assume that the result is true for all groups $L$ with $|L| < |G|$. We prove it for $G$. Let $H$ be a normal subgroup of $G$ such that $|H|$ is co-prime to $|G/H|$; we may assume that $H\neq\{e\}$. Suppose that there is a proper subgroup $K$ of $G$ such that $G = HK$. Then $K\cap H$ is a normal subgroup of $K$ with $(|K\cap H|, |K/(K\cap H)|) = 1$ (note that $|K\cap H|$ divides $|H|$, and $K/(K\cap H)$ is isomorphic to $KH/H = G/H$). Since $|K| < |G|$, by the induction hypothesis there is a complement $L$ of $K\cap H$ in $K$. But then $K = (K\cap H)L$ and $(K\cap H)\cap L = \{e\}$. Hence $G = H(K\cap H)L = HL$ and $H\cap L = \{e\}$ (for $L\subseteq K$). This proves the result for $G$ in case $G$ has a proper subgroup $K$ such that $G = HK$. Next, assume that there is no proper subgroup $K$ of $G$ such that $G = HK$. Suppose that there is a nontrivial normal subgroup $M$ of $G$ which is properly contained in $H$. Then $H/M$ is a normal subgroup of $G/M$ such that $H/M$ and $(G/M)/(H/M)\approx G/H$ are of co-prime orders. By the induction hypothesis, there is a subgroup $L/M$ of $G/M$ such that $G/M = (H/M)(L/M)$ and $(H/M)\cap(L/M) = \{M\}$. In other words, $G = HL$ and $H\cap L = M$. But then, by our assumption, $L = G$, and so $M = H\cap L = H\cap G = H$, a contradiction to the supposition that $M$ is properly contained in $H$. Thus, $H$ is a minimal normal subgroup of $G$. Let $p$ be a prime dividing the order $|H|$ of $H$, and let $P$ be a Sylow $p$-subgroup of $H$. Since $(p, |G/H|) = 1$, $P$ is a Sylow $p$-subgroup of $G$ also. Further, since all the Sylow $p$-subgroups of $G$ are conjugate, and they are all contained in $H$, we have $G = HN_G(P)$ (the Frattini argument). Hence $N_G(P) = G$. In other words, $P$ is a normal subgroup of $G$ which is contained in $H$. Since $H$ is a minimal normal subgroup of $G$, $H = P$ is a $p$-group. Thus, the center $Z(H)\neq\{e\}$. Since $Z(H)$ is a characteristic subgroup of $H$, and $H$ is normal in $G$, it follows that $Z(H)$ is normal in $G$. Again, the minimality of $H$ ensures that $H = Z(H)$ is abelian.
From the previous corollary, $G$ is a split extension of $H$ by $G/H$. $\square$
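For small permutation groups, the conclusion of the Schur–Zassenhaus theorem can be confirmed by brute force. The Python sketch below, with hypothetical helper names, takes $G = A_4$ with $H = V_4$ (orders $4$ and $3$) and $G = S_3$ with $H = A_3$ (orders $3$ and $2$), and searches for subgroups $L$ with $|L| = |G/H|$ and $H\cap L = \{e\}$, so that $G = HL$ is a semi-direct product. As a shortcut that suffices for these examples (and is not part of the theorem), only subgroups generated by at most two elements are searched.

```python
def compose(p, q):
    """Composition p∘q of permutations of {0, ..., n-1} stored as tuples."""
    return tuple(p[i] for i in q)


def closure(gens, n):
    """Subgroup of S_n generated by gens; in a finite group the closure
    under products already contains all inverses."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return frozenset(group)


def complements(G, H, n):
    """Subgroups L of G with |L| = |G|/|H| and L ∩ H = {e}, so that G = HL.
    Only subgroups generated by at most two elements are searched."""
    identity = tuple(range(n))
    target = len(G) // len(H)
    found = set()
    for a in G:
        for b in G:
            L = closure([a, b], n)
            if len(L) == target and L & H == {identity}:
                found.add(L)
    return found


A4 = closure([(1, 2, 0, 3), (1, 0, 3, 2)], 4)   # generated by (0 1 2) and (0 1)(2 3)
V4 = closure([(1, 0, 3, 2), (2, 3, 0, 1)], 4)   # the Klein four group, normal in A4
S3 = closure([(1, 0, 2), (1, 2, 0)], 3)
A3 = closure([(1, 2, 0)], 3)
print(len(A4), len(V4), len(complements(A4, V4, 4)))   # 12 4 4
print(len(S3), len(A3), len(complements(S3, A3, 3)))   # 6 3 3
```

In $A_4$ the four complements are the four Sylow $3$-subgroups, which are conjugate, as the conjugacy part of the Schur–Zassenhaus theorem would lead one to expect.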

10.2 Obstructions and Extensions

Let us discuss the conditions under which an abstract kernel $\psi$ of $K$ to $H$ can be realized by an extension of $H$ by $K$. Here, $H$ is not assumed to be an abelian group. Let $\sigma$ be a map from $K$ to $\mathrm{Aut}(H)$ with $\sigma_e = I_H$ such that $\psi(x) = \sigma_x\mathrm{Inn}(H)$ for each $x\in K$ (the axiom of choice ensures that such a map exists). Such a map $\sigma$ will be termed a lifting of $\psi$. Since $\psi$ is a homomorphism, $\sigma_x\sigma_y\mathrm{Inn}(H) = \sigma_{xy}\mathrm{Inn}(H)$. Let $f$ be a map from $K\times K$ to $H$ with $f(e,x) = 1 = f(x,e)$ for all $x\in K$ such that

$$\sigma_x\sigma_y = i_{f(x,y)}\sigma_{xy}\tag{10.19}$$
for all $x,y\in K$ (the existence of such an $f$ is ensured by the axiom of choice). Now,

$$\sigma_x\sigma_y\sigma_z = i_{f(x,y)}\sigma_{xy}\sigma_z = i_{f(x,y)}i_{f(xy,z)}\sigma_{xyz}.$$

On the other hand,

$$\sigma_x\sigma_y\sigma_z = \sigma_x i_{f(y,z)}\sigma_{yz} = \sigma_x i_{f(y,z)}(\sigma_x)^{-1}\sigma_x\sigma_{yz} = i_{\sigma_x(f(y,z))}i_{f(x,yz)}\sigma_{xyz}.$$

Equating the two expressions, we get

$$i_{f(x,y)f(xy,z)} = i_{\sigma_x(f(y,z))f(x,yz)}$$
for all $x,y,z\in K$. Hence there is a map $\phi$ from $K\times K\times K$ to $Z(H)$ with $1 = \phi(e,y,z) = \phi(x,e,z) = \phi(x,y,e)$ such that

$$f(x,y)f(xy,z) = \phi(x,y,z)\,\sigma_x(f(y,z))\,f(x,yz).\tag{10.20}$$

Clearly, $(K,H,\sigma,f)$ is a factor system if and only if $\phi$ is identically trivial. It is natural, therefore, to call $\phi$ the obstruction to $f$ being a factor system. It is also called an obstruction associated to the abstract kernel $\psi$. We have the following proposition.

Proposition 10.2.1 An abstract kernel $\psi$ can be realized by an extension if and only if one of its obstructions is trivial. $\square$

We further analyze the obstruction $\phi$, and its dependence on the choice of $\sigma$ and of the function $f$. First, $\phi(x,y,z)\in Z(H)$, and the center is a characteristic subgroup. Thus, $\sigma_x(\phi(y,z,t))\in Z(H)$ for all $x,y,z,t\in K$. For convenience, we adopt the additive notation for the operation of $H$. Note that $H$ need not be abelian; however, $Z(H)$ is abelian. Thus, Eq. (10.20) reads as

$$f(x,y) + f(xy,z) = \phi(x,y,z) + \sigma_x(f(y,z)) + f(x,yz).\tag{10.21}$$

Proposition 10.2.2 An obstruction φ associated to an abstract kernel ψ is a 3-cocycle in the sense that

$$\sigma_x(\phi(y,z,t)) - \phi(xy,z,t) + \phi(x,yz,t) - \phi(x,y,zt) + \phi(x,y,z) = 0\quad\text{for all } x,y,z,t\in K.$$

Proof Using (10.21), we express $f(x,y) + f(xy,z) + f(xyz,t)$ in two different ways. Recall that the values of $\phi$ lie in the center, so they may be moved freely. First,
$$\begin{aligned}
f(x,y) + f(xy,z) + f(xyz,t) &= \phi(x,y,z) + \sigma_x(f(y,z)) + f(x,yz) + f(xyz,t)\\
&= \phi(x,y,z) + \sigma_x(f(y,z)) + \phi(x,yz,t) + \sigma_x(f(yz,t)) + f(x,yzt)\\
&= \phi(x,y,z) + \phi(x,yz,t) + \sigma_x(f(y,z) + f(yz,t)) + f(x,yzt)\\
&= \phi(x,y,z) + \phi(x,yz,t) + \sigma_x\big(\phi(y,z,t) + \sigma_y(f(z,t)) + f(y,zt)\big) + f(x,yzt)\\
&= \phi(x,y,z) + \phi(x,yz,t) + \sigma_x(\phi(y,z,t)) + \sigma_x(\sigma_y(f(z,t))) + \sigma_x(f(y,zt)) + f(x,yzt)\\
&= \phi(x,y,z) + \phi(x,yz,t) + \sigma_x(\phi(y,z,t)) + \sigma_x(\sigma_y(f(z,t))) - \phi(x,y,zt) + f(x,y) + f(xy,zt)\\
&= \phi(x,y,z) + \phi(x,yz,t) + \sigma_x(\phi(y,z,t)) - \phi(x,y,zt) + f(x,y) + \sigma_{xy}(f(z,t)) + f(xy,zt).
\end{aligned}$$
In the last two steps, we have used (10.21) in the form $\sigma_x(f(y,zt)) + f(x,yzt) = -\phi(x,y,zt) + f(x,y) + f(xy,zt)$, and the relation $\sigma_x\sigma_y = i_{f(x,y)}\sigma_{xy}$. On the other hand,
$$\begin{aligned}
f(x,y) + f(xy,z) + f(xyz,t) &= f(x,y) + \phi(xy,z,t) + \sigma_{xy}(f(z,t)) + f(xy,zt)\\
&= \phi(xy,z,t) + f(x,y) + \sigma_{xy}(f(z,t)) + f(xy,zt).
\end{aligned}$$
Equating the two expressions for $f(x,y) + f(xy,z) + f(xyz,t)$, we get the desired result. $\square$
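When $H$ is abelian, inner automorphisms are trivial, so for a homomorphism $\sigma$ every normalized map $f$ satisfies (10.19), and (10.21) gives $\phi(x,y,z) = f(x,y) + f(xy,z) - \sigma_x(f(y,z)) - f(x,yz)$. Proposition 10.2.2 then says that this $\phi$ satisfies the 3-co-cycle identity even when $f$ itself is not a factor system. The Python sketch below checks this numerically for the assumed example $K = \mathbb{Z}_4$ acting on $H = \mathbb{Z}_5$ by $\sigma_1(b) = 2b$; the function names are illustrative only, and the check covers one special case rather than proving anything.

```python
import random


def obstruction(f, m, n, sigma):
    """phi(x, y, z) = f(x, y) + f(xy, z) - sigma_x(f(y, z)) - f(x, yz), solved from (10.21).
    K = Z_m and H = Z_n are abelian, so the values of phi are elements of Z_n."""
    return {(x, y, z): (f[(x, y)] + f[((x + y) % m, z)]
                        - sigma(x, f[(y, z)]) - f[(x, (y + z) % m)]) % n
            for x in range(m) for y in range(m) for z in range(m)}


def is_3_cocycle(phi, m, n, sigma):
    """Identity of Proposition 10.2.2, checked for all x, y, z, t in Z_m."""
    return all((sigma(x, phi[(y, z, t)]) - phi[((x + y) % m, z, t)]
                + phi[(x, (y + z) % m, t)] - phi[(x, y, (z + t) % m)]
                + phi[(x, y, z)]) % n == 0
               for x in range(m) for y in range(m)
               for z in range(m) for t in range(m))


def random_normalized(m, n, rng):
    """A random map f: Z_m x Z_m -> Z_n with f(0, y) = f(x, 0) = 0."""
    return {(x, y): 0 if x == 0 or y == 0 else rng.randrange(n)
            for x in range(m) for y in range(m)}


def sigma(x, b):
    """Z_4 acting on Z_5 by sigma_1(b) = 2b; a homomorphism since 2**4 = 16 = 1 mod 5."""
    return (pow(2, x, 5) * b) % 5


rng = random.Random(0)
print(all(is_3_cocycle(obstruction(random_normalized(4, 5, rng), 4, 5, sigma),
                       4, 5, sigma)
          for _ in range(20)))   # True
```

In this abelian setting $\phi = -\partial f$, so the check amounts to the familiar fact that applying the co-boundary twice gives zero.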

Since $Z(H)$ is a characteristic subgroup of $H$, and $\sigma_x$ is an automorphism of $H$, $\sigma_x|_{Z(H)}\in\mathrm{Aut}(Z(H))$. Again, since $\sigma_x\sigma_y = i_{f(x,y)}\sigma_{xy}$ and inner automorphisms fix $Z(H)$ elementwise, $\sigma_x|_{Z(H)}\,\sigma_y|_{Z(H)} = \sigma_{xy}|_{Z(H)}$. Thus, $\sigma$ induces a homomorphism from $K$ to $\mathrm{Aut}(Z(H))$ which associates to $x$ the automorphism $\sigma_x|_{Z(H)}$. This induced homomorphism is again denoted by $\sigma$. Further, if $\tau$ is another lifting of $\psi$, then $\sigma_x\mathrm{Inn}(H) = \tau_x\mathrm{Inn}(H)$ for all $x\in K$. Hence there is a map $g$ from $K$ to $H$ with $g(e) = 0$ such that $\sigma_x = i_{g(x)}\tau_x$ for all $x$. This means that the induced homomorphisms $\sigma$ and $\tau$ from $K$ to $\mathrm{Aut}(Z(H))$ are the same. It follows that the induced homomorphism $\sigma$ from $K$ to $\mathrm{Aut}(Z(H))$ is independent of the lifting $\sigma$, and it depends only on the abstract kernel $\psi$. Let $Z^3_\psi(K,Z(H)) = Z^3_\sigma(K,Z(H))$ denote the set of 3-co-cycles associated to the abstract kernel $\psi$. This is an abelian group with respect to the obvious addition, and it is called the group of 3-co-cycles. Thus, for each choice of $f$ satisfying Eq. (10.19), we obtain an obstruction $\phi$ in $Z^3_\sigma(K,Z(H))$ described by Eq. (10.21). Now, we examine how the obstruction changes with different choices of the function $f$ satisfying (10.19). Let $f'$ be another map from $K\times K$ to $H$ with $f'(e,y) = f'(x,e) = 0$ such that $\sigma_x\sigma_y = i_{f'(x,y)}\sigma_{xy}$, and let $\phi'$ be the obstruction to $f'$. Then $i_{f'(x,y)} = i_{f(x,y)}$, and so there is a map $p$ from $K\times K$ to $Z(H)$ such that

$$f'(x,y) = f(x,y) + p(x,y)\quad\text{for all } x,y\in K.$$
Also,

$$f'(x,y) + f'(xy,z) = \phi'(x,y,z) + \sigma_x(f'(y,z)) + f'(x,yz).$$

Substituting the values of $f'(x,y)$, and using that the values of $p$ are central, we get

$$f(x,y) + f(xy,z) + p(x,y) + p(xy,z) = \phi'(x,y,z) + \sigma_x(f(y,z) + p(y,z)) + f(x,yz) + p(x,yz).$$

In turn,

$$\phi(x,y,z) + \sigma_x(f(y,z)) + f(x,yz) + p(x,y) + p(xy,z) = \phi'(x,y,z) + \sigma_x(f(y,z)) + \sigma_x(p(y,z)) + f(x,yz) + p(x,yz).$$

Thus, there is a map p from K × K to Z(H) with p(e, y) = 0 = p(x, e) for all x, y ∈ K such that

$$\phi(x,y,z) - \phi'(x,y,z) = \sigma_x(p(y,z)) - p(xy,z) + p(x,yz) - p(x,y).$$

Let us call a map $\delta$ from $K\times K\times K$ to $Z(H)$ a 3-co-boundary if there is a map $p$ from $K\times K$ to $Z(H)$ with $p(e,y) = 0 = p(x,e)$ for all $x,y\in K$ such that

$$\delta(x,y,z) = \partial p(x,y,z) = \sigma_x(p(y,z)) - p(xy,z) + p(x,yz) - p(x,y)$$
for all $x,y,z\in K$. It can easily be verified that a 3-co-boundary is also a 3-co-cycle, and the set $B^3_\sigma(K,Z(H))$ of 3-co-boundaries is a subgroup of the group $Z^3_\sigma(K,Z(H))$. The quotient group $Z^3_\sigma(K,Z(H))/B^3_\sigma(K,Z(H))$ is called the third co-homology group, and it is denoted by $H^3_\sigma(K,Z(H))$. It follows that $\phi + B^3_\sigma(K,Z(H)) = \phi' + B^3_\sigma(K,Z(H))$. Thus, $\phi + B^3_\sigma(K,Z(H))$ is independent of the choice of $f$, and we have a map $\mathrm{Obs}$ (called the obstruction map) from the set $\mathrm{Hom}(K,\mathrm{Out}(H))$ of abstract kernels to the third co-homology group $H^3_\sigma(K,Z(H))$, defined by $\mathrm{Obs}(\psi) = \phi + B^3_\sigma(K,Z(H))$, where $\phi$ is the obstruction to the choice $f$.

Proposition 10.2.3 An abstract kernel $\psi$ can be realized by an extension if and only if $\mathrm{Obs}(\psi) = 0$.

Proof Suppose that $\psi$ can be realized by an extension $E$ of $H$ by $K$. Let $t$ be a section of $E$. Then it gives rise to a factor system $(K,H,\sigma^t,f^t)$ with $\psi(x) = \sigma^t_x\mathrm{Inn}(H)$ (note that $\sigma^t$ restricted to the center $Z(H)$ is independent of $t$). In turn,
$$\sigma^t_x\sigma^t_y = i_{f^t(x,y)}\sigma^t_{xy}\quad\text{and}\quad f^t(x,y) + f^t(xy,z) = \sigma^t_x(f^t(y,z)) + f^t(x,yz).$$

This shows that the obstruction $\phi$ to the choice $f^t$ is $0$. Hence $\mathrm{Obs}(\psi) = 0$. Conversely, suppose that $\mathrm{Obs}(\psi) = 0$. Take a map $\sigma$ from $K$ to $\mathrm{Aut}(H)$, and a map $f$ from $K\times K$ to $H$, such that $\sigma_e = I_H$, $f(e,y) = 0 = f(x,e)$, $\psi(x) = \sigma_x\mathrm{Inn}(H)$ and $\sigma_x\sigma_y = i_{f(x,y)}\sigma_{xy}$ for all $x,y\in K$. By our assumption, the obstruction $\phi$ to $f$ is a co-boundary. Hence there is a map $p$ from $K\times K$ to $Z(H)$ with $p(e,y) = 0 = p(x,e)$ for all $x,y\in K$ such that

$$\phi(x,y,z) = \sigma_x(p(y,z)) - p(xy,z) + p(x,yz) - p(x,y).$$

In other words, by (10.21),

$$f(x,y) + f(xy,z) = \sigma_x(p(y,z)) - p(xy,z) + p(x,yz) - p(x,y) + \sigma_x(f(y,z)) + f(x,yz)$$
for all $x,y,z\in K$. Take $f' = f + p$. Then $i_{f'(x,y)} = i_{f(x,y)}$ for all $x,y\in K$, since the values of $p$ are central, and, adding $p(x,y) + p(xy,z)$ to both sides of the above relation,

$$f'(x,y) + f'(xy,z) = \sigma_x(f'(y,z)) + f'(x,yz).$$

This shows that $(K,H,\sigma,f')$ is a factor system. Let $E$ be the corresponding extension of $H$ by $K$. Clearly, the associated abstract kernel is the given abstract kernel $\psi$. $\square$
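The relation $\phi - \phi' = \partial p$ obtained above for $f' = f + p$, together with the remark that every 3-co-boundary is a 3-co-cycle, can be sanity-checked in the same assumed abelian setting as before ($K = \mathbb{Z}_4$ acting on $H = \mathbb{Z}_5$ by $\sigma_1(b) = 2b$). In the Python sketch below the function names are hypothetical, and random normalized maps stand in for $f$ and $p$. Because $H$ is abelian here, the relation holds exactly for every choice, which is what the check confirms.

```python
import random


def obstruction(f, m, n, sigma):
    """phi from (10.21) for abelian H = Z_n and K = Z_m."""
    return {(x, y, z): (f[(x, y)] + f[((x + y) % m, z)]
                        - sigma(x, f[(y, z)]) - f[(x, (y + z) % m)]) % n
            for x in range(m) for y in range(m) for z in range(m)}


def coboundary3(p, m, n, sigma):
    """The 3-co-boundary ∂p(x, y, z) = sigma_x(p(y, z)) - p(xy, z) + p(x, yz) - p(x, y)."""
    return {(x, y, z): (sigma(x, p[(y, z)]) - p[((x + y) % m, z)]
                        + p[(x, (y + z) % m)] - p[(x, y)]) % n
            for x in range(m) for y in range(m) for z in range(m)}


def is_3_cocycle(phi, m, n, sigma):
    """The identity of Proposition 10.2.2 for all x, y, z, t in Z_m."""
    return all((sigma(x, phi[(y, z, t)]) - phi[((x + y) % m, z, t)]
                + phi[(x, (y + z) % m, t)] - phi[(x, y, (z + t) % m)]
                + phi[(x, y, z)]) % n == 0
               for x in range(m) for y in range(m)
               for z in range(m) for t in range(m))


def random_normalized(m, n, rng):
    """A random map Z_m x Z_m -> Z_n vanishing when either argument is 0."""
    return {(x, y): 0 if x == 0 or y == 0 else rng.randrange(n)
            for x in range(m) for y in range(m)}


def sigma(x, b):
    return (pow(2, x, 5) * b) % 5     # Z_4 acting on Z_5 by sigma_1(b) = 2b


rng = random.Random(1)
f, p = random_normalized(4, 5, rng), random_normalized(4, 5, rng)
f_new = {key: (f[key] + p[key]) % 5 for key in f}              # f' = f + p
phi, phi_new = obstruction(f, 4, 5, sigma), obstruction(f_new, 4, 5, sigma)
dp = coboundary3(p, 4, 5, sigma)
print(all((phi[k] - phi_new[k]) % 5 == dp[k] for k in phi))    # phi - phi' = ∂p: True
print(is_3_cocycle(dp, 4, 5, sigma))                           # ∂p is a 3-co-cycle: True
```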

Proposition 10.2.4 Let $H$ and $K$ be groups, and $\phi$ a 3-co-cycle representing an element of $H^3_\sigma(K,Z(H))$, where $\sigma$ is a homomorphism from $K$ to $\mathrm{Aut}(Z(H))$. Assume that $K$ contains more than two elements. Then there is a group $G$ (free in some sense) with $Z(G) = Z(H)$, and an abstract kernel $\psi\in\mathrm{Hom}(K,\mathrm{Out}(G))$ inducing the homomorphism $\sigma$ from $K$ to $\mathrm{Aut}(Z(G))$, such that $\mathrm{Obs}(\psi) = [\phi]$.

Proof Let $\phi$ be a 3-co-cycle in $Z^3_\sigma(K,Z(H))$. Let $X = K\times K - (K\times\{e\}\cup\{e\}\times K)$, and let $F(X)$ be the free group on $X$. For convenience, we use the additive notation for $F(X)$ also, although it is non-commutative. Let $G = Z(H)\times F(X)$ be the direct product of $Z(H)$ and $F(X)$. Then $Z(H)$ and $F(X)$ are naturally identified with subgroups of $G$. For convenience, an element $(a,u)\in G$, $a\in Z(H)$, $u\in F(X)$, will be written as $a + u$. Since $K$ has more than two elements, $X$ contains more than one element, so $F(X)$ has trivial center, and it follows that $Z(G) = Z(H)$. For each $x\in K$, we extend the map $\sigma_x$ to an endomorphism $\tilde\sigma_x$ of $G$ by defining it on the free generating set $X$ of $F(X)$ as follows:

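$$\tilde\sigma_x((y,z)) = -\phi(x,y,z) + (x,y) + (xy,z) - (x,yz)\tag{10.22}$$
for all $x,y,z\in K-\{e\}$. Since $\phi(x,y,z) = 0$ whenever any of $x,y,z$ is $e$, the identity (10.22) makes sense for all $x,y,z\in K$ if we identify $(x,e)$ and $(e,y)$ with the identity of $F(X)$ for all $x,y\in K$. With this sign, (10.22) is exactly Eq. (10.21) for $f(x,y) = (x,y)$, so that $\phi$ will be the obstruction to this choice of $f$. Clearly, $\tilde\sigma_e = I_G$. We show that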

$$\tilde\sigma_x\tilde\sigma_y = i_{(x,y)}\tilde\sigma_{xy}.\tag{10.23}$$

Since $\tilde\sigma_x$ is an extension of $\sigma_x$ for all $x\in K$, and $\sigma$ is a homomorphism from $K$ to $\mathrm{Aut}(Z(H))$, for $a\in Z(H)$ we have

$$\tilde\sigma_x(\tilde\sigma_y(a)) = \sigma_x(\sigma_y(a)) = \sigma_{xy}(a) = (x,y) + \sigma_{xy}(a) - (x,y).$$

Thus, both sides of (10.23) agree on $Z(H) = Z(G)$. It is sufficient, therefore, to show that both sides of (10.23) evaluated on $(z,t)$ give the same result for all $z,t\in K$. Using (10.22) and the fact that $\phi$ is a 3-co-cycle, we get

$$\begin{aligned}
\tilde\sigma_x(\tilde\sigma_y((z,t))) &= \tilde\sigma_x\big(-\phi(y,z,t) + (y,z) + (yz,t) - (y,zt)\big)\\
&= -\sigma_x(\phi(y,z,t)) + \tilde\sigma_x((y,z)) + \tilde\sigma_x((yz,t)) - \tilde\sigma_x((y,zt))\\
&= -\sigma_x(\phi(y,z,t)) - \phi(x,y,z) + (x,y) + (xy,z) - (x,yz) - \phi(x,yz,t) + (x,yz) + (xyz,t) - (x,yzt)\\
&\qquad - \big(-\phi(x,y,zt) + (x,y) + (xy,zt) - (x,yzt)\big)\\
&= -\big(\sigma_x(\phi(y,z,t)) + \phi(x,y,z) + \phi(x,yz,t) - \phi(x,y,zt)\big) + (x,y) + (xy,z) + (xyz,t) - (xy,zt) - (x,y)\\
&= (x,y) - \phi(xy,z,t) + (xy,z) + (xyz,t) - (xy,zt) - (x,y)\\
&= i_{(x,y)}\big(\tilde\sigma_{xy}((z,t))\big),
\end{aligned}$$
where the values of $\phi$, being central, have been collected together, and the 3-co-cycle identity $\sigma_x(\phi(y,z,t)) + \phi(x,y,z) + \phi(x,yz,t) - \phi(x,y,zt) = \phi(xy,z,t)$ has been used. Thus, $\tilde\sigma_x\tilde\sigma_y = i_{(x,y)}\tilde\sigma_{xy}$ for all $x,y\in K$. In particular,

$$\tilde\sigma_x\tilde\sigma_{x^{-1}} = i_{(x,x^{-1})}\tilde\sigma_{xx^{-1}} = i_{(x,x^{-1})}.$$

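This shows that $\tilde\sigma_x$ is surjective, and $\tilde\sigma_{x^{-1}}$ is injective, for all $x\in K$; replacing $x$ by $x^{-1}$, $\tilde\sigma_x$ is also injective. Hence $\tilde\sigma_x$ is an automorphism of $G$. It follows that $x\mapsto\tilde\sigma_x\mathrm{Inn}(G)$ is a homomorphism $\psi$ from $K$ to $\mathrm{Out}(G)$, and, by (10.22), the obstruction to the choice $f(x,y) = (x,y)$ is the given co-cycle $\phi$. $\square$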

The abstract kernel $\psi$, thus obtained, is universal in some sense. Let $\chi$ be a homomorphism from $K$ to $\mathrm{Out}(H)$ such that $Z(H) = Z(G)$, and the obstruction of $\chi$ is the obstruction of $\psi$. Then there is a map $\tau$ from $K$ to $\mathrm{Aut}(H)$ with $\tau_e = I_H$, and a map $f$ from $K\times K$ to $H$ with $f(e,y) = 0 = f(x,e)$, such that

(i) $\tau_x\tau_y = i_{f(x,y)}\tau_{xy}$, (ii) $\tau_x = \sigma_x$ when restricted to $Z(H) = Z(G)$, and (iii) $f(x,y) + f(xy,z) - f(x,yz) - \tau_x(f(y,z)) = \phi(x,y,z) = (x,y) + (xy,z) - (x,yz) - \tilde\sigma_x((y,z))$ for all $x,y,z\in K$. Using the universal property of the free group $F(X)$, we get a unique homomorphism $\rho$ from $G$ to $H$ subject to $\rho((x,y)) = f(x,y)$ and $\rho(a) = a$ for all $a\in Z(H) = Z(G)$. In turn, $\rho\circ\tilde\sigma_x = \tau_x\circ\rho$ for all $x\in K$.

Example 10.2.5 In this example, we discuss the extensions of a group $H$ by a free group $F$. Let $\psi$ be an abstract kernel of $F$ to $H$; in other words, $\psi$ is a homomorphism from $F$ to $\mathrm{Out}(H)$. Since $F$ is free, we have a homomorphism $\sigma$ from $F$ to $\mathrm{Aut}(H)$ such that $\nu\circ\sigma = \psi$. The semi-direct product extension induced by $\sigma$ is an extension of $H$ by $F$ with abstract kernel $\psi$. Further, since $F$ is free (a splitting can be defined on a free basis of $F$), any extension $E$ given by

$$E\equiv 1\longrightarrow H\xrightarrow{\ \alpha\ } G\xrightarrow{\ \beta\ } F\longrightarrow 1$$
splits. Thus, corresponding to any abstract kernel of $F$ to $H$, there is one and only one equivalence class of extensions of $H$ by $F$, and it is the class of a split extension. In particular, $H^2_\sigma(F,H) = \{0\}$ for every abelian group $H$ and every homomorphism $\sigma : F\longrightarrow\mathrm{Aut}(H)$.

Example 10.2.6 In this example, we discuss the extensions of a group by a cyclic group. The case of the infinite cyclic group is already included in the above example. We discuss the extensions of a group $H$ by the cyclic group $\mathbb{Z}_m$, $m\ge 2$. Let $\psi$ be an abstract kernel of $\mathbb{Z}_m$ to $H$, and let $\sigma_1$ be an automorphism of $H$ such that $\sigma_1\mathrm{Inn}(H) = \psi(1)$. Then the map $\sigma$ from $\mathbb{Z}_m$ to $\mathrm{Aut}(H)$ given by $\sigma_i = (\sigma_1)^i$, $0\le i\le m-1$, is a lifting of the abstract kernel $\psi$. Clearly, $(\sigma_1)^m\in\mathrm{Inn}(H)$. Note that, for convenience, the image of $i$ under the map $\sigma$ is being denoted by $\sigma_i$. Let $G = H\times\mathbb{Z}_m$, and let $h_0\in H$ be such that $\sigma_1(h_0) = h_0$.

Define a product in $G$ by
$$(a,i)(b,j) = \begin{cases}(a\sigma_i(b),\ i+j) & \text{if } i+j\le m-1,\\ (a\sigma_i(b)h_0,\ i+j-m) & \text{otherwise.}\end{cases}$$
For this product to be associative, we also need $(\sigma_1)^m = i_{h_0}$: indeed, $((1,m-1)(1,1))(b,0) = (h_0b,0)$, while $(1,m-1)((1,1)(b,0)) = ((\sigma_1)^m(b)h_0,0)$. For a given lifting, an element $h_0$ with both properties need not exist; however, if $(\sigma_1)^m = I_H$ (for instance, if $H$ is abelian), then $h_0 = 1$ will do. Assuming that $h_0$ has both properties, it can easily be seen that $G$ is a group. The identity is $(1,0)$, the inverse of $(h,0)$ is $(h^{-1},0)$, and, for $i\neq 0$, the inverse of $(h,i)$ is $(k,m-i)$, where $\sigma_i(k) = h^{-1}h_0^{-1}$. We have an extension $E$ of $H$ by $\mathbb{Z}_m$ given by
$$E\equiv 1\longrightarrow H\xrightarrow{\ \alpha\ } G\xrightarrow{\ \beta\ }\mathbb{Z}_m\longrightarrow 1,$$
where $\alpha(a) = (a,0)$ and $\beta$ is the second projection. Consider the section $t$ of $E$ given by $t(i) = (1,i)$. Then, for $i\neq 0$,
$$(\sigma^t_i(h),0) = t(i)(h,0)(t(i))^{-1} = (1,i)(h,0)(1,i)^{-1} = (\sigma_i(h),i)(h_0^{-1},m-i) = (\sigma_i(h)h_0^{-1}h_0,0) = (\sigma_i(h),0).$$
This shows that $\sigma^t = \sigma$. It follows that the abstract kernel associated to this extension is $\psi$. Thus, every abstract kernel of $\mathbb{Z}_m$ to $H$ which admits a lifting $\sigma$ and an element $h_0$ as above (in particular, every abstract kernel when $H$ is abelian) can be realized by an extension of $H$ by $\mathbb{Z}_m$. Note that $f^t(i,j) = 1$ for $i+j\le m-1$, and $f^t(i,j) = h_0$ otherwise. Observe that $(t(1))^m = ((1,1))^m = (h_0,0)$ and $(\sigma_1)^m = i_{h_0}$. It also follows from the above discussion that any extension of $H$ by $\mathbb{Z}_m$ with the given abstract kernel $\psi$ is determined by a choice of an element $h_0$ of $H$ which is fixed by $\sigma_1$ and satisfies $(\sigma_1)^m = i_{h_0}$: there is a corresponding section $t$ such that $t(1)^m = h_0$. Let $t'$ be another section of the extension with $\sigma^{t'} = \sigma^t = \sigma$. Then there is an element $a\in H$ such that $t'(1) = (a,0)t(1) = (a,0)(1,1) = (a,1)$, and
$$(\sigma_1(h),0) = i_{(1,1)}((h,0)) = i_{(a,1)}((h,0)) = (a,1)(h,0)(a,1)^{-1} = (a\sigma_1(h)a^{-1},0).$$
This shows that $a\sigma_1(h)a^{-1} = \sigma_1(h)$ for all $h\in H$. This means that $a\in Z(H)$. Also, $t'(1)^m = (a,1)^m = (N_\sigma(a)h_0,0)$, where $N_\sigma$ is the map from $H$ to $H$ given by $N_\sigma(h) = h\sigma_1(h)\sigma_1^2(h)\cdots\sigma_1^{m-1}(h)$. The map $N_\sigma$ is called the norm of $\sigma$. Clearly, $N_\sigma$ maps $Z(H)$ to itself, and when restricted to $Z(H)$ it is a homomorphism. Also, $N_\sigma(a)$ and $N_\sigma(a)h_0$ are invariant under $\sigma_1$. Clearly, the choice $h_1 = N_\sigma(a)h_0$ determines another equivalent extension of $H$ by $\mathbb{Z}_m$ with the prescribed abstract kernel $\psi$ in the manner described above, and also any equivalent extension determines such an element $a$ in $Z(H)$. To summarize, we have the following: “Let $H$ be a group, and $m$ a natural number.
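Let $\psi$ be an abstract kernel of $\mathbb{Z}_m$ to $H$, i.e., $\psi$ is a homomorphism from $\mathbb{Z}_m$ to $\mathrm{Out}(H)$. There is a lifting map $\sigma$ (not necessarily a homomorphism) from $\mathbb{Z}_m$ to $\mathrm{Aut}(H)$ with $\sigma_0 = I_H$ such that $\sigma_i\mathrm{Inn}(H) = \psi(i)$ for all $i\in\mathbb{Z}_m$. Let $\mathrm{Fix}(\sigma)$ denote the set $\{h\in H\mid\sigma_1(h) = h\}$ of elements of $H$ fixed by $\sigma_1$. Clearly, $\mathrm{Fix}(\sigma)$ is a subgroup of $H$. Also, $N_\sigma(Z(H))$ is a normal subgroup of $\mathrm{Fix}(\sigma)$. Suppose that $\psi$ is realized as above by some admissible $h_0$ (that is, $\sigma_1(h_0) = h_0$ and $(\sigma_1)^m = i_{h_0}$); the admissible elements are then exactly those of the coset $(\mathrm{Fix}(\sigma)\cap Z(H))h_0$. The construction described above, which produces an extension of $H$ by $\mathbb{Z}_m$ for each admissible element, induces a natural bijection from $(\mathrm{Fix}(\sigma)\cap Z(H))/N_\sigma(Z(H))$ to the set $\mathrm{EXT}_\psi(H,\mathbb{Z}_m)$ of equivalence classes of extensions of $H$ by $\mathbb{Z}_m$ with abstract kernel $\psi$. In turn, if $H$ is an abelian group, then it induces an isomorphism from $\mathrm{Fix}(\sigma)/N_\sigma(H)$ to the second co-homology group $H^2_\sigma(\mathbb{Z}_m,H)$. In particular, if $\sigma$ is trivial, then $\mathrm{Fix}(\sigma) = H$ and $N_\sigma(H) = \{h^m\mid h\in H\}$. Thus, in this case $H^2(\mathbb{Z}_m,H)$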

m 2 is the quotient group H/H . In particular, H (Zm, Z) is isomorphic to Zm.Alsoif th 2 D is a group such that every element of H is a m power, then H (Zm, H) ={0}” Exercises 10.2.1 Find all extensions of (i) Z by Z, (ii) Q by Z, (iii) Z by Z × Z, (iv) a finite group by Z, (v) Q8 by Z2, (vi) D8 by Zm, (vii) S5 by Z2, (viii) A4 by Z2, (ix) Z2 by Z2 × Z2, (x) Zp × Zp by Zp, (xi) Q8 by V4 up to equivalence. 10.2.2 Characterize groups all of whose extensions are split extensions. 10.2.3 There are several splittings of a split extension. How are they related? 10.2.4 Give a proof of the Proposition10.1.17. 10.2.5 Show that the kernel of the natural homomorphism η from Aut(G) to Aut(G/Z(G)) is CAut(G)(Inn(G)). 10.2.6 Show that a group G is free if and only if every extension by G splits. 10.2.7 Show, by means of an example, that the number of non-isomorphic groups G having a normal subgroup H such that G/H is isomorphic to a fixed group K may be strictly less than the number of equivalence classes of extensions of H by K. Hint. Look at the Exercise 10.2.1 (ix).

10.2.8 Describe the extensions of $H$ by $\mathbb{Z}_m \times \mathbb{Z}_n$.
10.2.9 Describe all extensions of a group of order 5 by a group of order 4. Hence describe all groups of order 20.

10.2.10 Let $\tau$ be the automorphism of the Klein four group $V_4 = \{e, a, b, c\}$ given by $\tau(a) = b$, $\tau(b) = c$. Find $H^2_\sigma(\mathbb{Z}_3, V_4)$, where $\sigma$ is the homomorphism from $\mathbb{Z}_3$ to $Aut(V_4)$ given by $\sigma_1 = \tau$.
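The construction of extensions of $H$ by $\mathbb{Z}_m$ from a lifting $\sigma$ and an element $h_0$ fixed by $\sigma$, described before the exercises, can be checked by brute force in small cases. The following Python sketch is only an illustration under simplifying assumptions: $H = \mathbb{Z}_n$ is written additively, $\sigma(a) = ua$ for a unit $u$ with $u^m \equiv 1 \pmod n$ (so that $\sigma^m$ is the identity, which is the inner automorphism $i_{h_0}$ of the abelian group $H$), and all function names are our own. With $n = 4$, $m = 2$ and $\sigma(a) = -a$, the choices $h_0 = 0$ and $h_0 = 2$ should give the dihedral group $D_8$ (five involutions) and the quaternion group $Q_8$ (one involution), in agreement with $|Fix(\sigma)/N_\sigma(H)| = 2$.

```python
from itertools import product


def cyclic_extension(n, m, u, h0):
    """Extension of H = Z_n (written additively) by Z_m with sigma(a) = u*a.

    Product rule from the text, written additively in H:
      (a, i)(b, j) = (a + sigma^i(b), i + j)          if i + j <= m - 1,
      (a, i)(b, j) = (a + sigma^i(b) + h0, i + j - m) otherwise.
    """
    # H is abelian, so Inn(H) is trivial and sigma^m must be the identity.
    assert pow(u, m, n) == 1 % n, "need u^m = 1 (mod n)"
    assert (u * h0 - h0) % n == 0, "h0 must be fixed by sigma"
    elements = list(product(range(n), range(m)))

    def mul(x, y):
        (a, i), (b, j) = x, y
        c = (a + pow(u, i, n) * b) % n      # a * sigma^i(b)
        if i + j >= m:
            c = (c + h0) % n                # extra factor h0 in the "otherwise" case
        return (c, (i + j) % m)

    return elements, mul


def is_group(elements, mul, e):
    """Brute-force check of associativity, two-sided identity e, and inverses."""
    assoc = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                for x, y, z in product(elements, repeat=3))
    ident = all(mul(e, x) == x == mul(x, e) for x in elements)
    invs = all(any(mul(x, y) == e for y in elements) for x in elements)
    return assoc and ident and invs


def involutions(elements, mul, e):
    return sum(1 for x in elements if x != e and mul(x, x) == e)


def fix_mod_norm(n, m, u):
    """|Fix(sigma) / N_sigma(H)| for H = Z_n and sigma(a) = u*a with u^m = 1 (mod n)."""
    fix = {a for a in range(n) if (u * a - a) % n == 0}
    norm = {sum(pow(u, k, n) * a for k in range(m)) % n for a in range(n)}
    return len(fix) // len(norm)
```

For example, `cyclic_extension(4, 2, 3, 0)` and `cyclic_extension(4, 2, 3, 2)` produce groups with 5 and 1 involutions respectively, while `fix_mod_norm(6, 4, 1)` illustrates the trivial-action case $H/H^m = \mathbb{Z}_6/4\mathbb{Z}_6$, of order 2.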

10.3 Central Extensions, Schur Multiplier

In this section we study those extensions of abelian groups for which the associated abstract kernels are trivial homomorphisms. In other words, we are interested in extensions $G$ of an abelian group $A$ by a group $K$ for which $A$ is a subgroup of the center of $G$. More precisely, we have the following definition.

Definition 10.3.1 An extension E of H by K given by

$$E \equiv 1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
is called a central extension if $\alpha(H) \subseteq Z(G)$.

Example 10.3.2 Any group $G$ is naturally a central extension of its center $Z(G)$ by the group $Inn(G) \cong G/Z(G)$ of its inner automorphisms. Thus, the quaternion group $Q_8$ is a central extension of $\mathbb{Z}_2$ by the Klein four group $V_4$. Let $F$ be a field, and $GL(n, F)$ be the general linear group of invertible $n \times n$ matrices with entries in the field $F$. The center $Z(GL(n, F))$ of $GL(n, F)$ consists of all scalar matrices $aI_n$, $a \in F^{\star} = F - \{0\}$. The quotient group $GL(n, F)/Z(GL(n, F))$ is called the projective general linear group, and it is denoted by $PGL(n, F)$. Thus, $GL(n, F)$ is a central extension of its center by the projective general linear group $PGL(n, F)$. We have the short exact sequence

$$1 \longrightarrow F^{\star} \xrightarrow{\alpha} GL(n, F) \xrightarrow{\nu} PGL(n, F) \longrightarrow 1, \tag{10.24}$$

where $F^{\star}$ is the multiplicative group of the field, $\alpha$ is the map given by $\alpha(a) = aI_n$, and $\nu$ is the quotient map. The above exact sequence represents a central extension of $F^{\star}$ by $PGL(n, F)$. Many groups can be represented as subgroups of linear groups. Recall that a homomorphism $\rho$ from a group $G$ to the group $GL(n, F)$ is called a linear representation of $G$ of degree $n$ over the field $F$ (see Chap. 9). A homomorphism from a group $K$ to $PGL(n, F)$ is called a projective representation of $K$ over $F$. It may not be possible to lift a projective representation $\rho$ from a group $K$ to $PGL(n, F)$ to a linear representation from $K$ to $GL(n, F)$. For example, the identity projective representation from $PGL(n, F)$ to itself can be lifted to a linear representation only if the exact sequence (10.24) splits, which is not the case in general (for instance, when $F = \mathbb{C}$ and $n \geq 2$). A natural question is: when can we lift a projective representation to a linear representation? This question was first tackled by Schur at the beginning of the twentieth century.
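To see concretely why the identity projective representation of $PGL(n, \mathbb{C})$ cannot be lifted when $n \geq 2$, one may use the clock matrix $Z = diag(1, \omega, \ldots, \omega^{n-1})$ with $\omega = e^{2\pi i/n}$ and the cyclic shift matrix $X$. Their images in $PGL(n, \mathbb{C})$ commute, but $ZX = \omega XZ$; hence for any nonzero scalars $\lambda, \mu$ the lifts $\lambda Z$ and $\mu X$ still fail to commute, so no homomorphism from $PGL(n, \mathbb{C})$ to $GL(n, \mathbb{C})$ can split (10.24). This argument and the Python sketch below, which only checks $ZX = \omega XZ$ numerically, are our own illustration; the function names are ad hoc.

```python
import cmath


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def clock_and_shift(n):
    """Clock Z = diag(1, w, ..., w^(n-1)) with w = exp(2 pi i / n), and shift X with X e_j = e_{j+1}."""
    w = cmath.exp(2j * cmath.pi / n)
    Z = [[w ** i if i == j else 0 for j in range(n)] for i in range(n)]
    X = [[1 if i == (j + 1) % n else 0 for j in range(n)] for i in range(n)]
    return w, Z, X


def twisted_commutation_holds(n, tol=1e-12):
    """Check ZX = w XZ entrywise; then (lZ)(mX) = w (mX)(lZ) for all nonzero scalars l, m."""
    w, Z, X = clock_and_shift(n)
    ZX, XZ = matmul(Z, X), matmul(X, Z)
    return all(abs(ZX[i][j] - w * XZ[i][j]) < tol for i in range(n) for j in range(n))
```

Since $\omega \neq 1$ for $n \geq 2$, the relation $ZX = \omega XZ$ is exactly the obstruction: rescaling by scalars cannot remove the factor $\omega$.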

Let $A$ be an abelian group, and $G$ be a group. Then $Hom(G, A)$ is again an abelian group with respect to the pointwise operation. Let $\alpha$ be a homomorphism from $G$ to a group $K$. Then $\alpha$ induces a homomorphism $\alpha^{\star}$ from $Hom(K, A)$ to $Hom(G, A)$ defined by $\alpha^{\star}(\eta) = \eta \circ \alpha$. Proposition 10.3.3 Let

$$1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
be an exact sequence of groups, and $A$ be an abelian group. Then the sequence

$$0 \longrightarrow Hom(K, A) \xrightarrow{\beta^{\star}} Hom(G, A) \xrightarrow{\alpha^{\star}} Hom(H, A)$$
is exact.

Proof The proof of the proposition is again an imitation of the proof of Theorem 7.2.11, and it is left as an exercise. □

Remark 10.3.4 (i) As observed in Remark 7.2.12, the sequence, in general, is not exact if we adjoin $0$ on the right side of the sequence. (ii) In the language of category theory, the above proposition is expressed by saying that for every abelian group $A$, $Hom(-, A)$ is a left exact contravariant functor from the category of groups to the category of abelian groups.

Let
$$1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1 \tag{10.25}$$
be a central extension of $H$ by $K$, and $A$ be an abelian group. Let us denote by $H^2(K, A)$ the second cohomology group $H^2_\sigma(K, A)$ in case the homomorphism $\sigma$ from $K$ to $Aut(A)$ is trivial. We define a homomorphism $\delta$ (called a connecting homomorphism) from $Hom(H, A)$ to $H^2(K, A)$ as follows. Let $t$ be a section of (10.25) with $t(e) = e$. Since the extension is central, $\sigma^t$ is trivial in the sense that $\sigma^t_x(h) = h$ for all $x \in K$ and $h \in H$. The function $f^t$ from $K \times K$ to $H$ is given by $t(x)t(y) = \alpha(f^t(x, y))t(xy)$. Then $f^t$ belongs to the group $Z^2(K, H)$ of 2-cocycles. Though $\sigma^t$ is independent of the choice of the section $t$, $f^t$ depends on the choice of $t$. However, if $t'$ is another section of the extension, then $f^t$ and $f^{t'}$ differ by a 2-coboundary in $B^2(K, H)$. Let $\eta \in Hom(H, A)$. Then $\eta \circ f^t$ is a map from $K \times K$ to $A$. Since $\eta$ is a homomorphism and $f^t$ is a 2-cocycle in $Z^2(K, H)$,

$\eta \circ f^t$ is a 2-cocycle in $Z^2(K, A)$. Also, $\eta \circ f^t$ and $\eta \circ f^{t'}$ differ by a 2-coboundary. This defines, unambiguously, an element $\eta \circ f^t + B^2(K, A)$ in $H^2(K, A)$. We define $\delta(\eta) = \eta \circ f^t + B^2(K, A)$. Clearly, $\delta$ is a homomorphism. We have the following fundamental exact sequence associated to the central extension (10.25). Proposition 10.3.5 For any abelian group $A$, we have the following natural fundamental exact sequence

$$0 \longrightarrow Hom(K, A) \xrightarrow{\beta^{\star}} Hom(G, A) \xrightarrow{\alpha^{\star}} Hom(H, A) \xrightarrow{\delta} H^2(K, A) \tag{10.26}$$
associated to the central extension given by (10.25).

Proof In the light of Proposition 10.3.3, it is sufficient to prove the exactness at $Hom(H, A)$. Let $\chi \in Hom(G, A)$. Then, by definition, $\delta(\alpha^{\star}(\chi)) = (\chi \circ \alpha) \circ f^t + B^2(K, A)$. Now, $t(x)t(y) = \alpha(f^t(x, y))t(xy)$. Hence $\chi(t(x)) + \chi(t(y)) = \chi(\alpha(f^t(x, y))) + \chi(t(xy))$. Thus, we have the map $g$ from $K$ to $A$ given by $g(x) = \chi(t(x))$, with $g(e) = 0$, such that $\chi(\alpha(f^t(x, y))) = g(y) - g(xy) + g(x)$. This shows that $(\chi \circ \alpha) \circ f^t \in B^2(K, A)$, and so $\delta \circ \alpha^{\star} = 0$. It follows that $\mathrm{image}\,\alpha^{\star} \subseteq \ker\delta$. Let $\eta \in \ker\delta$. Then $\eta \circ f^t \in B^2(K, A)$. Let $g$ be a map from $K$ to $A$ with $g(e) = 0$ such that
$$\eta(f^t(x, y)) = g(y) - g(xy) + g(x). \tag{10.27}$$

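Every element of $G$ is uniquely expressed as $\alpha(a)t(x)$ for unique $a \in H$ and $x \in K$. Define a map $\chi$ from $G$ to $A$ by $\chi(\alpha(a)t(x)) = \eta(a) + g(x)$. Then, since $\sigma^t_x(b) = b$,
$$\begin{aligned} \chi(\alpha(a)t(x)\alpha(b)t(y)) &= \chi(\alpha(a)\alpha(\sigma^t_x(b))\alpha(f^t(x, y))t(xy)) = \chi(\alpha(ab f^t(x, y))t(xy)) \\ &= \eta(ab f^t(x, y)) + g(xy) = \eta(a) + \eta(b) + \eta(f^t(x, y)) + g(xy) \\ &= \eta(a) + \eta(b) + g(x) + g(y) \quad (\text{by } (10.27)) \\ &= \chi(\alpha(a)t(x)) + \chi(\alpha(b)t(y)). \end{aligned}$$
This shows that $\chi$ is a homomorphism. Also, $\chi(\alpha(a)) = \eta(a)$ for all $a \in H$. Thus, $\eta = \chi \circ \alpha = \alpha^{\star}(\chi)$. It follows that $\ker\delta \subseteq \mathrm{image}\,\alpha^{\star}$. □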
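The cocycle and coboundary conventions used in defining $\delta$ (with trivial action, $f(x, y) + f(xy, z) = f(y, z) + f(x, yz)$ and $(\partial g)(x, y) = g(y) - g(xy) + g(x)$) can be made concrete by counting. The Python sketch below, whose names are our own, computes $|H^2(K, \mathbb{Z}_k)|$ for a small group $K$ acting trivially on $A = \mathbb{Z}_k$ as the number of normalized 2-cocycles divided by the number of normalized 2-coboundaries. It is exponential in $|K|^2$ and meant only for tiny examples; for $K = \mathbb{Z}_m$ it should agree with $H^2(\mathbb{Z}_m, \mathbb{Z}_k) \cong \mathbb{Z}_k/m\mathbb{Z}_k$.

```python
from itertools import product


def h2_order(elements, mul, e, k):
    """|H^2(K, Z_k)| for a finite group K acting trivially on A = Z_k (brute force).

    Normalized cochains vanish whenever one argument is the identity e, so only
    the values on pairs of non-identity elements are free.
    """
    nonid = [x for x in elements if x != e]
    pairs = [(x, y) for x in nonid for y in nonid]

    def as_cochain(values):
        f = {(x, y): 0 for x in elements for y in elements}
        f.update(zip(pairs, values))
        return f

    def is_cocycle(f):
        # f(x, y) + f(xy, z) = f(y, z) + f(x, yz) in Z_k
        return all((f[x, y] + f[mul(x, y), z] - f[y, z] - f[x, mul(y, z)]) % k == 0
                   for x in elements for y in elements for z in elements)

    n_cocycles = sum(1 for vals in product(range(k), repeat=len(pairs))
                     if is_cocycle(as_cochain(vals)))

    # coboundaries (dg)(x, y) = g(y) - g(xy) + g(x) of maps g with g(e) = 0
    coboundaries = set()
    for vals in product(range(k), repeat=len(nonid)):
        g = dict(zip(nonid, vals))
        g[e] = 0
        coboundaries.add(tuple((g[y] - g[mul(x, y)] + g[x]) % k for x, y in pairs))
    return n_cocycles // len(coboundaries)


def cyclic(m):
    return list(range(m)), (lambda x, y: (x + y) % m), 0


V4 = ([(0, 0), (0, 1), (1, 0), (1, 1)],
      lambda x, y: ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2),
      (0, 0))
```

For instance, `h2_order(*cyclic(3), 3)` counts the classes of central extensions of $\mathbb{Z}_3$ by $\mathbb{Z}_3$, and `h2_order(*V4, 2)` does the same for central extensions of $\mathbb{Z}_2$ by $V_4$.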

In particular, we have the following exact sequence.

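$$0 \longrightarrow Hom(K, H) \xrightarrow{\beta^{\star}} Hom(G, H) \xrightarrow{\alpha^{\star}} Hom(H, H) \xrightarrow{\delta} H^2(K, H). \tag{10.28}$$
Here we have taken $A = H$, which is abelian because $\alpha(H) \subseteq Z(G)$.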

The main problem is to determine all central extensions by a group $K$ up to equivalence. Some authors call such an extension an extension of $K$ instead of by $K$, but we shall stick to our earlier terminology and call it an extension by $K$. Definition 10.3.6 A central extension

$$E \equiv 1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
by $K$ is said to be a free central extension if, given any central extension

$$E' \equiv 1 \longrightarrow H' \xrightarrow{\alpha'} G' \xrightarrow{\beta'} K' \longrightarrow 1,$$
and a homomorphism $\eta$ from $K$ to $K'$, there exists a morphism $(\rho, \tau, \eta)$ (not necessarily unique) from the extension $E$ to the extension $E'$. Let
$$1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1 \tag{10.29}$$
be a presentation of $K$, i.e., $F$ is a free group, $R$ a normal subgroup of $F$, $i$ the inclusion, and $\nu$ a surjective homomorphism with kernel $R$ (note that $R$ is also free by the Nielsen–Schreier theorem). The subgroup $[R, F] = \langle \{[r, x] = rxr^{-1}x^{-1} \mid r \in R, x \in F\} \rangle$ is a normal subgroup of $F$ contained in $R$. As such, we get an extension

$$1 \longrightarrow R/[R, F] \xrightarrow{i} F/[R, F] \xrightarrow{\nu} K \longrightarrow 1 \tag{10.30}$$
by $K$ induced by the presentation (10.29) of $K$. Clearly, this is a central extension by $K$.

Proposition 10.3.7 The central extension given by (10.30) is a free central extension by K.

Proof Let
$$1 \longrightarrow H' \xrightarrow{\alpha'} G' \xrightarrow{\beta'} K' \longrightarrow 1 \tag{10.31}$$
be a central extension by $K'$, and $\eta$ be a homomorphism from $K$ to $K'$. Since $F$ is a free group, by the projective property of a free group, there is a homomorphism $\gamma$ from $F$ to $G'$ such that $\beta' \circ \gamma = \eta \circ \nu$. Since $\beta'(\gamma(R)) = \eta(\nu(R)) = \{e\}$, $\gamma(R) \subseteq \ker\beta'$. By the exactness of (10.31), $\gamma(R) \subseteq \alpha'(H')$. Again, since $\alpha'(H') \subseteq Z(G')$, it follows that $\gamma([R, F]) = [\gamma(R), \gamma(F)] \subseteq [\alpha'(H'), G'] = \{e\}$. Thus, $\gamma$ induces a homomorphism $\tau$ from $F/[R, F]$ to $G'$ such that $\beta' \circ \tau = \eta \circ \nu$. Clearly, $\beta'(\tau(R/[R, F])) = \{e\}$. By the exactness, $\tau(R/[R, F]) \subseteq \alpha'(H')$. Since $\alpha'$ is injective, it induces a homomorphism $\rho$ from $R/[R, F]$ to $H'$ such that $\alpha' \circ \rho$ is $\tau$ restricted to $R/[R, F]$. Thus, $(\rho, \tau, \eta)$ is a morphism from the extension (10.30) to (10.31). □

Proposition 10.3.8 Let

$$E \equiv 1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
be a free central extension. Then the map $\delta$ in the associated fundamental sequence described in Proposition 10.3.5 is surjective. More explicitly, for any abelian group $A$, the sequence

$$0 \longrightarrow Hom(K, A) \xrightarrow{\beta^{\star}} Hom(G, A) \xrightarrow{\alpha^{\star}} Hom(H, A) \xrightarrow{\delta} H^2(K, A) \longrightarrow 0$$
is exact.

Proof Let $\mu$ be a 2-cocycle in $Z^2(K, A)$. Then $(K, A, \sigma, \mu)$, with $\sigma$ the trivial map from $K$ to $Aut(A)$, is a factor system. The corresponding associated extension

$$E' \equiv 1 \longrightarrow A \xrightarrow{\alpha'} G' \xrightarrow{\beta'} K \longrightarrow 1$$
is a central extension with a section $t'$ such that $t'(x)t'(y) = \alpha'(\mu(x, y))t'(xy)$. Since $E$ is a free central extension, we have a homomorphism $\rho$ from $H$ to $A$ and a homomorphism $\tau$ from $G$ to $G'$ such that the diagram

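$$\begin{array}{ccccccccc}
1 & \longrightarrow & H & \xrightarrow{\alpha} & G & \xrightarrow{\beta} & K & \longrightarrow & 1 \\
 & & \Big\downarrow{\scriptstyle \rho} & & \Big\downarrow{\scriptstyle \tau} & & \Big\downarrow{\scriptstyle I_K} & & \\
1 & \longrightarrow & A & \xrightarrow{\alpha'} & G' & \xrightarrow{\beta'} & K & \longrightarrow & 1
\end{array}$$

is commutative. Let $t$ be a section of $E$ with $t(e) = e$, and put $s = \tau \circ t$. Then $\beta'(s(x)) = \beta(t(x)) = x$, so $s$ is a section of $E'$. Now, $t(x)t(y) = \alpha(f^t(x, y))t(xy)$, where $f^t$ is a 2-cocycle in $Z^2(K, H)$. Applying $\tau$, we get
$$s(x)s(y) = \tau(t(x)t(y)) = \tau(\alpha(f^t(x, y)))\tau(t(xy)) = \alpha'(\rho(f^t(x, y)))s(xy).$$
This shows that $f^{s} = \rho \circ f^t$. Since $s$ and $t'$ are both sections of $E'$, the 2-cocycles $f^{s} = \rho \circ f^t$ and $f^{t'} = \mu$ differ by a 2-coboundary. By definition, $\delta(\rho) = \rho \circ f^t + B^2(K, A) = \mu + B^2(K, A)$. It follows that $\delta$ is surjective. □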

Next, we try to describe the image of the connecting homomorphism $\delta$ in the fundamental exact sequence associated to a central extension under the assumption that $A$ is a divisible group. Recall that an abelian group $A$ is called a divisible group if for any $a \in A$ and any nonzero integer $n$, there is an element $b \in A$ such that $nb = a$. For example, $(\mathbb{Q}, +)$, $(\mathbb{Q}/\mathbb{Z}, +)$, $(\mathbb{R}, +)$, $(\mathbb{C}, +)$, $(\mathbb{C}^{\star}, \cdot)$, and the circle group $(S^1, \cdot)$ are all divisible groups. By Corollary 7.2.31, an abelian group $D$ is divisible if and only if, given any subgroup $H$ of an abelian group $G$, every homomorphism $f$ from $H$ to $D$ is the restriction of a homomorphism from $G$ to $D$. Equivalently, the functor $Hom(-, D)$ from the category of abelian groups to itself takes a short exact sequence to a short exact sequence. Proposition 10.3.9 Let

$$E \equiv 1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
be a central extension, and $D$ be a divisible group. Then the image of $\delta$ in the fundamental exact sequence

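$$0 \longrightarrow Hom(K, D) \xrightarrow{\beta^{\star}} Hom(G, D) \xrightarrow{\alpha^{\star}} Hom(H, D) \xrightarrow{\delta} H^2(K, D)$$
is isomorphic to $Hom([G, G] \cap \alpha(H), D)$. In particular, if the central extension is a free central extension, then $H^2(K, D)$ is isomorphic to $Hom([G, G] \cap \alpha(H), D)$.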

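Proof By the fundamental theorem of homomorphisms, $\mathrm{image}\,\delta \cong Hom(H, D)/\ker\delta = Hom(H, D)/\mathrm{image}\,\alpha^{\star}$. The map $\alpha$ induces an injective homomorphism $\bar{\alpha}$ from $H/\alpha^{-1}([G, G])$ to $G/[G, G]$. Since $D$ is divisible, $\bar{\alpha}^{\star}$ is a surjective homomorphism from $Hom(G/[G, G], D)$ to $Hom(H/\alpha^{-1}([G, G]), D)$. Also, since $D$ is abelian, $\nu^{\star}$ from $Hom(G/[G, G], D)$ to $Hom(G, D)$ is an isomorphism, where $\nu$ is the quotient map. Further, with $\nu'$ denoting the quotient map from $H$ to $H/\alpha^{-1}([G, G])$, the diagram
$$\begin{array}{ccc}
Hom(G/[G, G], D) & \xrightarrow{\bar{\alpha}^{\star}} & Hom(H/\alpha^{-1}([G, G]), D) \\
\Big\downarrow{\scriptstyle \nu^{\star}} & & \Big\downarrow{\scriptstyle \nu'^{\star}} \\
Hom(G, D) & \xrightarrow{\alpha^{\star}} & Hom(H, D)
\end{array}$$

is commutative. It follows that the image of $\alpha^{\star}$ is the image of $\nu'^{\star}$. Again, since $D$ is divisible, the following sequence is exact:
$$0 \longrightarrow Hom(H/\alpha^{-1}([G, G]), D) \xrightarrow{\nu'^{\star}} Hom(H, D) \xrightarrow{i^{\star}} Hom(\alpha^{-1}([G, G]), D) \longrightarrow 0.$$
Thus, $Hom(H, D)/\mathrm{image}\,\alpha^{\star}$ is isomorphic to $Hom(\alpha^{-1}([G, G]), D)$. Clearly, $Hom(\alpha^{-1}([G, G]), D)$ is isomorphic to $Hom([G, G] \cap \alpha(H), D)$, since $\alpha$ maps $\alpha^{-1}([G, G])$ isomorphically onto $[G, G] \cap \alpha(H)$. This shows that $\mathrm{image}\,\delta$ is isomorphic to $Hom([G, G] \cap \alpha(H), D)$. The last assertion follows from Proposition 10.3.8. □ Corollary 10.3.10 Given a central extension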

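$$E \equiv 1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1$$
by $K$, $Hom([G, G] \cap \alpha(H), \mathbb{C}^{\star})$ is isomorphic to a subgroup of $H^2(K, \mathbb{C}^{\star})$. □ Corollary 10.3.11 Given a presentation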

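$$1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1$$
of $K$, $H^2(K, \mathbb{C}^{\star})$ is isomorphic to $Hom(([F, F] \cap R)/[R, F], \mathbb{C}^{\star})$. Proof The given presentation induces the free central extension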

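$$1 \longrightarrow R/[R, F] \xrightarrow{i} F/[R, F] \xrightarrow{\nu} K \longrightarrow 1$$
by $K$. Since $[R, F] \subseteq [F, F]$, the commutator subgroup of $F/[R, F]$ is $[F, F]/[R, F]$, and its intersection with $R/[R, F]$ is $([F, F] \cap R)/[R, F]$. The result follows from Proposition 10.3.9 (with $D = \mathbb{C}^{\star}$). □ Proposition 10.3.12 Let $K$ be a finite group of order $n$. Then $H^2(K, \mathbb{C}^{\star})$ is also a finite abelian group in which the order of each element divides $n$. Proof Let $f \in Z^2(K, \mathbb{C}^{\star})$ be a normalized 2-cocycle, so that $f(e, y) = f(x, e) = 1$. Then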

$$f(x, y)f(xy, z) = f(y, z)f(x, yz)$$
for all $x, y, z \in K$. Taking the product of the equation over all $z \in K$, we get
$$f(x, y)^n \prod_{z \in K} f(xy, z) = \prod_{z \in K} f(y, z) \prod_{z \in K} f(x, yz) = \prod_{z \in K} f(y, z) \prod_{z \in K} f(x, z).$$
Define a map $g$ from $K$ to $\mathbb{C}^{\star}$ by $g(x) = \prod_{z \in K} f(x, z)$. Then $g(e) = 1$, and the above equation reads as $f(x, y)^n = g(y)g(xy)^{-1}g(x)$.

This means that $f^n$ is a coboundary. Hence the order of each element of $H^2(K, \mathbb{C}^{\star})$ divides $n$. Selecting an $n$th root $u(x)$ of $g(x)$ for each $x \in K$, with $u(e) = 1$, we get a map $u$ from $K$ to $\mathbb{C}^{\star}$. Define a map $f'$ from $K \times K$ to $\mathbb{C}^{\star}$ by

$$f'(x, y) = f(x, y)u(y)^{-1}u(xy)u(x)^{-1}.$$

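It follows that $f'B^2(K, \mathbb{C}^{\star}) = fB^2(K, \mathbb{C}^{\star})$ and $f'(x, y)^n = 1$. Thus, for each $x, y \in K$, $f'(x, y)$ is an $n$th root of unity, so there are only finitely many possibilities for $f'(x, y)$. Since $K$ is finite, there are only finitely many such maps $f'$, and every class in $H^2(K, \mathbb{C}^{\star})$ contains one of them. Hence $H^2(K, \mathbb{C}^{\star})$ is finite. □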
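The normalization step in the proof of Proposition 10.3.12 can be tried out numerically. In the Python sketch below (our own illustration; the sample cocycle is an assumption chosen for the demonstration), $K = V_4 = \mathbb{Z}_2 \times \mathbb{Z}_2$, so $n = 4$, and $f$ is the $\pm 1$-valued cocycle $(-1)^{x_1y_2}$ multiplied by the coboundary $h(y)h(xy)^{-1}h(x)$ of a randomly chosen map $h$ with $h(e) = 1$. The function `normalize` computes $g(x) = \prod_{z} f(x, z)$, takes principal $n$th roots $u(x)$, and returns $f'(x, y) = f(x, y)u(y)^{-1}u(xy)u(x)^{-1}$, whose values should be 4th roots of unity up to rounding.

```python
import cmath
import random

V4 = [(0, 0), (0, 1), (1, 0), (1, 1)]
E = (0, 0)


def v4_mul(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def sample_cocycle(seed=0):
    """Assumed example: (-1)^(x1*y2) times the coboundary h(y) h(xy)^-1 h(x)
    of a random map h: V4 -> C* with h(e) = 1; the result is normalized."""
    rng = random.Random(seed)
    h = {x: cmath.rect(rng.uniform(0.5, 2.0), rng.uniform(0.0, 2 * cmath.pi)) for x in V4}
    h[E] = 1
    return {(x, y): (-1) ** (x[0] * y[1]) * h[y] / h[v4_mul(x, y)] * h[x]
            for x in V4 for y in V4}


def normalize(f, K, mul, e):
    """Return f'(x, y) = f(x, y) u(y)^-1 u(xy) u(x)^-1 with u(x)^n = prod_z f(x, z)."""
    n = len(K)
    g = {x: 1 for x in K}
    for x in K:
        for z in K:
            g[x] *= f[x, z]
    u = {x: g[x] ** (1 / n) for x in K}   # principal n-th roots
    u[e] = 1                               # g(e) = 1, so we may take u(e) = 1
    return {(x, y): f[x, y] / u[y] * u[mul(x, y)] / u[x] for x in K for y in K}
```

The choice of principal roots is arbitrary; any choice of $n$th roots with $u(e) = 1$ yields a cohomologous cocycle with values in the $n$th roots of unity.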

Corollary 10.3.13 (Schur) Let G be a group such that G/Z(G) is finite. Then the commutator subgroup [G, G] of G is finite.

Proof Suppose that $n = |G/Z(G)|$. Then, by Proposition 10.3.12, $H^2(G/Z(G), \mathbb{C}^{\star})$ is finite, and the order of each of its elements divides $n$. By Corollary 10.3.10, applied to the central extension $1 \to Z(G) \to G \to G/Z(G) \to 1$, $Hom([G, G] \cap Z(G), \mathbb{C}^{\star})$ is isomorphic to a subgroup of $H^2(G/Z(G), \mathbb{C}^{\star})$. Thus, $Hom([G, G] \cap Z(G), \mathbb{C}^{\star})$ is finite, and the order of each of its elements divides $n$. If $[G, G] \cap Z(G)$ contains an element $a$ of infinite order, then $Hom(\langle a \rangle, \mathbb{C}^{\star}) \cong \mathbb{C}^{\star}$ is infinite. Since $\mathbb{C}^{\star}$ is a divisible group, the restriction map $i^{\star}$ from $Hom([G, G] \cap Z(G), \mathbb{C}^{\star})$ to $Hom(\langle a \rangle, \mathbb{C}^{\star})$ is surjective, and so $Hom([G, G] \cap Z(G), \mathbb{C}^{\star})$ is infinite. This contradicts the fact that $Hom([G, G] \cap Z(G), \mathbb{C}^{\star})$ is finite. This shows that $[G, G] \cap Z(G)$ is a torsion group. Suppose that $G$ is finitely generated. Then $Z(G)$, being a subgroup of finite index, is also finitely generated. Hence $[G, G] \cap Z(G)$ is a finitely generated torsion abelian group, and so it is finite. Since $[G, G]/([G, G] \cap Z(G)) \cong [G, G]Z(G)/Z(G)$ is a subgroup of the finite group $G/Z(G)$, it follows that $[G, G]$ is finite. Suppose, now, that $G$ is not finitely generated. Let $S$ be a transversal to $Z(G)$ in $G$. Then $S$ is finite. Let $L = \langle S \rangle$ be the subgroup generated by $S$. Then $L$ is a finitely generated subgroup of $G$. Let $x, y \in G$. Then there are elements $a, b \in Z(G)$ and $u, v \in S$ such that $x = au$ and $y = bv$. But then $[x, y] = [u, v]$. This shows that $[L, L] = [G, G]$. Since $G = Z(G)L$, the center $Z(L)$ of $L$ is contained in $Z(G)$. Thus, $Z(G) \cap L = Z(L)$. Further, $G/Z(G) = LZ(G)/Z(G) \cong L/(Z(G) \cap L) = L/Z(L)$. This means that $L/Z(L)$ is finite. It follows from the case already proved that $[L, L]$ is finite. Hence $[G, G]$ is finite. □

Corollary 10.3.14 (Schur–Hopf Formula) Let

$$1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1$$

be a free presentation of a finite group $K$. Then $H^2(K, \mathbb{C}^{\star})$ is isomorphic to $([F, F] \cap R)/[R, F]$. Proof By Corollary 10.3.11, $H^2(K, \mathbb{C}^{\star})$ is isomorphic to $Hom(([F, F] \cap R)/[R, F], \mathbb{C}^{\star})$. Further, $R/[R, F] \subseteq Z(F/[R, F])$. Since $F/R$, being isomorphic to $K$, is finite, it follows that $(F/[R, F])/Z(F/[R, F])$ is finite. From Corollary 10.3.13, it follows that the commutator subgroup $[F, F]/[R, F]$ of $F/[R, F]$ is finite. In turn, $([F, F] \cap R)/[R, F]$ is a finite abelian group. Clearly, $Hom(\mathbb{Z}_m, \mathbb{C}^{\star}) \cong \mathbb{Z}_m$. Also, $Hom$ respects finite direct sums in the sense that $Hom(A \oplus B, \mathbb{C}^{\star}) \cong Hom(A, \mathbb{C}^{\star}) \oplus Hom(B, \mathbb{C}^{\star})$. Since every finite abelian group is a direct sum of finite cyclic groups, it follows that for any finite abelian group $A$, $Hom(A, \mathbb{C}^{\star}) \cong A$. This shows that for finite groups $K$, $H^2(K, \mathbb{C}^{\star})$ is isomorphic to $([F, F] \cap R)/[R, F]$. □ It follows from the above result that for a finite group $K$, the group $([F, F] \cap R)/[R, F]$ is independent of the choice of a free presentation of $K$. Indeed, we show that for any group (not necessarily finite), it is independent of the choice of a free presentation of the group. Proposition 10.3.15 Let

$$E_F \equiv 1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1$$
be an extension by $K$ representing a free presentation of $K$, and

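$$E \equiv 1 \longrightarrow H \xrightarrow{i} G \xrightarrow{\beta} L \longrightarrow 1$$
an extension by $L$, where $i$ denotes the inclusion map. Let $\gamma$ be a homomorphism from $K$ to $L$. Then there is a homomorphism $\tau$ from $F$ to $G$ (not necessarily unique) such that $(\tau|_R, \tau, \gamma)$ is a morphism from the extension $E_F$ to $E$. Further, the morphism $(\tau|_R, \tau, \gamma)$ induces a homomorphism $\bar{\tau}$ from $[F, F]/[R, F]$ to $[G, G]/[H, G]$ such that the diagram

$$\begin{array}{ccccccccc}
1 & \longrightarrow & ([F, F] \cap R)/[R, F] & \xrightarrow{i} & [F, F]/[R, F] & \xrightarrow{\bar{\nu}} & [K, K] & \longrightarrow & 1 \\
 & & \Big\downarrow{\scriptstyle \rho} & & \Big\downarrow{\scriptstyle \bar{\tau}} & & \Big\downarrow{\scriptstyle \bar{\gamma}} & & \\
1 & \longrightarrow & ([G, G] \cap H)/[H, G] & \xrightarrow{i} & [G, G]/[H, G] & \xrightarrow{\bar{\beta}} & [L, L] & \longrightarrow & 1
\end{array}$$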

is commutative, where the maps $i$, $\bar{\nu}$ and $\bar{\beta}$ in the rows are the obvious induced maps, while $\rho$ and $\bar{\gamma}$ are the restrictions of $\bar{\tau}$ and $\gamma$, respectively. Further, if $\lambda$ is another homomorphism from $F$ to $G$ such that $(\lambda|_R, \lambda, \gamma)$ is a morphism from the extension $E_F$ to $E$, then the induced homomorphism $\bar{\lambda}$ is the same as $\bar{\tau}$.

Proof Since $F$ is free, we have a homomorphism (not necessarily unique) $\tau$ from $F$ to $G$ such that $\beta \circ \tau = \gamma \circ \nu$. Clearly, $(\tau|_R, \tau, \gamma)$ is a morphism from the extension $E_F$ to $E$. Also, $\tau$ maps $[R, F]$ into $[H, G]$. Thus, $\tau$ induces a homomorphism $\bar{\tau}$ from $F/[R, F]$ to $G/[H, G]$ such that the diagram

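$$\begin{array}{ccccccccc}
1 & \longrightarrow & R/[R, F] & \xrightarrow{i} & F/[R, F] & \xrightarrow{\nu} & K & \longrightarrow & 1 \\
 & & \Big\downarrow{\scriptstyle \rho} & & \Big\downarrow{\scriptstyle \bar{\tau}} & & \Big\downarrow{\scriptstyle \gamma} & & \\
1 & \longrightarrow & H/[H, G] & \xrightarrow{i} & G/[H, G] & \xrightarrow{\beta} & L & \longrightarrow & 1
\end{array}$$

is commutative, where $\rho$ is the restriction of $\bar{\tau}$. Again, since $\nu$ maps $[F, F]$ onto $[K, K]$, $\beta$ maps $[G, G]$ onto $[L, L]$, and $\bar{\tau}$ maps $[F, F]/[R, F]$ into $[G, G]/[H, G]$, the diagram in the statement of the proposition is commutative. Suppose that there is another homomorphism $\lambda$ from $F$ to $G$ such that $(\lambda|_R, \lambda, \gamma)$ is a morphism from the extension $E_F$ to $E$. Then, as for $\tau$, $\lambda$ induces a homomorphism $\bar{\lambda}$ from $F/[R, F]$ to $G/[H, G]$ such that the diagram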

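$$\begin{array}{ccccccccc}
1 & \longrightarrow & R/[R, F] & \xrightarrow{i} & F/[R, F] & \xrightarrow{\nu} & K & \longrightarrow & 1 \\
 & & \Big\downarrow{\scriptstyle \theta} & & \Big\downarrow{\scriptstyle \bar{\lambda}} & & \Big\downarrow{\scriptstyle \gamma} & & \\
1 & \longrightarrow & H/[H, G] & \xrightarrow{i} & G/[H, G] & \xrightarrow{\beta} & L & \longrightarrow & 1
\end{array}$$

is commutative, where $\theta$ is the restriction of $\bar{\lambda}$. Let $x \in F/[R, F]$. Then $\beta(\bar{\lambda}(x)) = \gamma(\nu(x)) = \beta(\bar{\tau}(x))$. Hence $\bar{\lambda}(x) = u(x)\bar{\tau}(x)$ for some $u(x) \in H/[H, G]$. Since $H/[H, G]$ is contained in the center of $G/[H, G]$, for all $x, y \in F/[R, F]$,
$$\bar{\lambda}([x, y]) = [\bar{\lambda}(x), \bar{\lambda}(y)] = [u(x)\bar{\tau}(x), u(y)\bar{\tau}(y)] = [\bar{\tau}(x), \bar{\tau}(y)] = \bar{\tau}([x, y]).$$
This shows that the induced homomorphisms $\bar{\lambda}$ and $\bar{\tau}$ agree on $[F, F]/[R, F]$, and so also on $([F, F] \cap R)/[R, F]$. □ Corollary 10.3.16 Given two free presentations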

$$E_F \equiv 1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1$$
and

$$E_{F'} \equiv 1 \longrightarrow R' \xrightarrow{i} F' \xrightarrow{\nu'} K \longrightarrow 1$$
of $K$, the groups $([F, F] \cap R)/[R, F]$ and $([F', F'] \cap R')/[R', F']$ are naturally isomorphic.

Proof From Proposition 10.3.15, for the identity map $I_K$ from $K$ to $K$, we have a unique homomorphism $\rho$ from $([F, F] \cap R)/[R, F]$ to $([F', F'] \cap R')/[R', F']$ which is induced by a morphism $(\tau|_R, \tau, I_K)$ from $E_F$ to $E_{F'}$, and also a unique homomorphism $\rho'$ from $([F', F'] \cap R')/[R', F']$ to $([F, F] \cap R)/[R, F]$ which is induced by a morphism $(\tau'|_{R'}, \tau', I_K)$ from $E_{F'}$ to $E_F$. Thus, $((\tau' \circ \tau)|_R, \tau' \circ \tau, I_K)$ and $(I_F|_R, I_F, I_K)$ are both morphisms from $E_F$ to itself, and so they induce the same homomorphism from $([F, F] \cap R)/[R, F]$ to itself. This means that $\rho' \circ \rho$, which is the homomorphism from $([F, F] \cap R)/[R, F]$ to itself induced by $((\tau' \circ \tau)|_R, \tau' \circ \tau, I_K)$, is also the one induced by $(I_F|_R, I_F, I_K)$. Hence $\rho' \circ \rho$ is the identity map on $([F, F] \cap R)/[R, F]$. Similarly, $\rho \circ \rho'$ is the identity map on $([F', F'] \cap R')/[R', F']$. This shows that $\rho$ and $\rho'$ are isomorphisms. □ Since the group $([F, F] \cap R)/[R, F]$ is independent of the particular choice of a free presentation of $K$, the following definition is justified. Definition 10.3.17 Let

$$1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\nu} K \longrightarrow 1$$
be a free presentation of a group $K$. Then $([F, F] \cap R)/[R, F]$ is called the Schur multiplier of $K$, and it is denoted by $M(K)$. Corollary 10.3.18 For finite groups $K$, $M(K)$ is finite, and it is isomorphic to $H^2(K, \mathbb{C}^{\star})$. If the order of $K$ is $n$, then the order of each element of $M(K)$ divides $n$.

Proof Follows from Corollary 10.3.14 and Proposition 10.3.12. □ Corollary 10.3.19 The Schur multiplier $M$ defines a covariant functor from the category of groups to the category of abelian groups.

Proof Let $E_K$ denote the standard free presentation of $K$. More precisely,
$$E_K \equiv 1 \longrightarrow R_K \xrightarrow{i} F_K \xrightarrow{\mu} K \longrightarrow 1,$$
where $F_K$ is the free group on $K$, $\mu$ the unique homomorphism from $F_K$ to $K$ induced by $I_K$, and $R_K$ the kernel of $\mu$. Then the Schur multiplier $M(K) = (R_K \cap [F_K, F_K])/[R_K, F_K]$. Let $\lambda$ be a homomorphism from $K$ to $L$. Then, by Proposition 10.3.15, $\lambda$ induces a unique homomorphism $M(\lambda)$ from $M(K) = (R_K \cap [F_K, F_K])/[R_K, F_K]$ to $M(L) = (R_L \cap [F_L, F_L])/[R_L, F_L]$. Further, if $\eta$ is a homomorphism from the group $L$ to a group $U$, then $M(\eta) \circ M(\lambda)$ is the unique homomorphism which is induced by $\eta \circ \lambda$. Hence $M(\eta \circ \lambda) = M(\eta) \circ M(\lambda)$. Clearly, $M(I_K) = I_{M(K)}$. □ Proposition 10.3.20 Let

$$E \equiv 1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\mu} K \longrightarrow 1$$

be a free presentation of a finite group $K$, where $F$ is the free group on a set $\{x_1, x_2, \ldots, x_n\}$ consisting of $n$ elements. Then $M(K) = (R \cap [F, F])/[R, F]$ is the finite torsion subgroup of $R/[R, F]$, and the torsion-free part $(R/[R, F])/((R \cap [F, F])/[R, F]) \cong R/(R \cap [F, F])$ is free abelian of rank $n$. In turn, $R/[R, F]$ is isomorphic to the direct sum of $M(K)$ and $R/(R \cap [F, F])$.

Proof Since $R/[R, F]$ is contained in the center of $F/[R, F]$, it is abelian. Also, $F/[F, F]$ is free abelian of rank $n$. Further, $R/(R \cap [F, F])$ is isomorphic to $R[F, F]/[F, F]$, which is a subgroup of $F/[F, F]$. Since a subgroup of a free abelian group of rank $n$ is free abelian of rank at most $n$, $R/(R \cap [F, F])$ is free abelian of rank at most $n$. Also, $(F/[F, F])/(R[F, F]/[F, F]) \cong F/R[F, F]$, which is a quotient of the finite group $F/R \cong K$. Hence $(F/[F, F])/(R[F, F]/[F, F])$ is finite. This means that $R[F, F]/[F, F]$, and so also $R/(R \cap [F, F])$, is free abelian of rank $n$. Next, by Corollary 10.3.18, $M(K) = (R \cap [F, F])/[R, F]$ is a finite subgroup of $R/[R, F]$ such that $(R/[R, F])/((R \cap [F, F])/[R, F]) \cong R/(R \cap [F, F])$ is free abelian. This shows that $M(K)$ is the torsion part of $R/[R, F]$, and $R/(R \cap [F, F])$ is the torsion-free part of $R/[R, F]$. Since the quotient $R/(R \cap [F, F])$ is free abelian, the extension splits, and so $R/[R, F]$ is the direct sum of $M(K)$ and a subgroup isomorphic to $R/(R \cap [F, F])$. □

Corollary 10.3.21 Let

$$E \equiv 1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\mu} K \longrightarrow 1$$
be a free presentation of a finite group $K$, where $F$ is the free group on a set $\{x_1, x_2, \ldots, x_n\}$ consisting of $n$ elements, and $R$ is the normal subgroup of $F$ generated, as a normal subgroup, by a set $\{w_1, w_2, \ldots, w_r\}$ consisting of $r$ relators. Suppose that $m$ is the minimum number of generators of $M(K)$. Then $r \geq n + m$. Equivalently, $M(K)$ can be generated by $r - n$ elements.

Proof Since $R$ is generated as a subgroup by the set $\{w_1, w_2, \ldots, w_r\}$ and its conjugates, and $ww_iw^{-1}[R, F] = w_i[R, F]$ for all $w \in F$, it follows that $R/[R, F]$ is generated by the set $\{w_1[R, F], w_2[R, F], \ldots, w_r[R, F]\}$. By Proposition 10.3.20, $R/[R, F] \cong M(K) \oplus \mathbb{Z}^n$, which cannot be generated by fewer than $n + m$ elements. Thus, $r \geq n + m$. □

Corollary 10.3.22 Let $K$ be a finite group having a presentation with generating set $\{x_1, x_2, \ldots, x_n\}$ and the set $\{w_1, w_2, \ldots, w_r\}$ as an irreducible set of defining relations. Then $r \geq n$. If $r = n$, then the Schur multiplier $M(K)$ is trivial. Further, if $r = n + 1$, then $M(K)$ is cyclic. If $r = n + 2$, then $M(K)$ is a direct product of at most two cyclic groups.

Proof From the above corollary, $r \geq n + m \geq n$, where $m$ is the minimum number of generators of $M(K)$. If $r = n$, then $m = 0$. Hence $M(K)$ is trivial. Suppose that $r = n + 1$. Then $m \leq 1$, and so $M(K)$ is finite cyclic. Suppose that $r = n + 2$. Then $m \leq 2$. Since $M(K)$ is a finite abelian group which can be generated by at most two elements, it is a direct product of at most two finite cyclic groups. □

Example 10.3.23 If F is a free group, then

$$1 \longrightarrow \{e\} \xrightarrow{i} F \xrightarrow{I_F} F \longrightarrow 1$$
is a free presentation of $F$. Hence, by definition, $M(F) = \{0\}$. In particular, the Schur multiplier of an infinite cyclic group is trivial. Further,

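$$0 \longrightarrow m\mathbb{Z} \xrightarrow{i} \mathbb{Z} \xrightarrow{\nu} \mathbb{Z}_m \longrightarrow 0$$
is a free presentation of $\mathbb{Z}_m$. Since $[\mathbb{Z}, \mathbb{Z}] = \{0\}$, the Schur multiplier $M(\mathbb{Z}_m) = \{0\}$. Alternatively, using Corollary 10.3.18, $M(\mathbb{Z}_m) \cong H^2(\mathbb{Z}_m, \mathbb{C}^{\star})$, and then, using Example 10.2.6, we see that $H^2(\mathbb{Z}_m, \mathbb{C}^{\star}) = \{0\}$. Using the fundamental theorem of finite abelian groups, we can find the Schur multipliers of all finite abelian groups, provided we have a formula which relates $M(A \times B)$, $M(A)$ and $M(B)$ for all finite abelian groups $A$ and $B$. Such a formula will be obtained in the sequel.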

Example 10.3.24 Consider the quaternion group $Q_8$. It has a presentation $\langle i, j;\ i^4, i^2j^{-2}, iji^{-1} = j^{-1} \rangle$. Indeed, $i^4$ is derivable from the other two relators as follows: $i^2 = i\,i^2\,i^{-1} = ij^2i^{-1} = (iji^{-1})^2 = j^{-2} = i^{-2}$. Hence $i^4 = 1$. Thus, $Q_8$ has a presentation $\langle i, j;\ i^2j^{-2}, iji^{-1} = j^{-1} \rangle$ with two generators and two defining relations. As such, by Corollary 10.3.22, $M(Q_8)$ is trivial. More generally, the generalized quaternion group $Q_{4n}$ of order $4n$ has a presentation $\langle x, y;\ x^{2n}, x^ny^{-2}, yxy^{-1}x \rangle$. It is easily seen that this is a group of order $4n$. Here also $x^{2n}$ is derivable from the other two relations as follows. We have $x^n = y^2 = yy^2y^{-1} = yx^ny^{-1} = x^{-n}$. Hence $x^{2n} = e$. Thus, $Q_{4n}$ is a finite group given by two generators and two defining relations. As such, by Corollary 10.3.22, $M(Q_{4n})$ is trivial.
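The claim that $\langle x, y;\ x^{2n}, x^ny^{-2}, yxy^{-1}x \rangle$ defines a group of order $4n$ can be supported by a matrix model. As an assumption not taken from the text, we use $x = diag(\zeta, \zeta^{-1})$ with $\zeta = e^{\pi i/n}$ and $y = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which satisfy $x^n = y^2 = -I$ and $yxy^{-1} = x^{-1}$. The Python sketch below (function names are ours) enumerates the group these matrices generate, comparing entries after rounding. It only shows that the presented group has a quotient with $4n$ elements, which together with the normal form $x^iy^j$ from the text gives the order; it is not a substitute for that argument.

```python
import cmath


def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def key(A, digits=6):
    # Round entries so that equal matrices computed along different paths compare equal.
    return tuple((round(z.real, digits), round(z.imag, digits)) for row in A for z in row)


def generated_group_order(gens):
    """Number of elements of the (finite) matrix group generated by gens."""
    identity = ((1 + 0j, 0j), (0j, 1 + 0j))
    seen = {key(identity)}
    frontier = [identity]
    while frontier:
        new = []
        for A in frontier:
            for g in gens:
                B = mat_mul(A, g)
                if key(B) not in seen:
                    seen.add(key(B))
                    new.append(B)
        frontier = new
    return len(seen)


def quaternion_generators(n):
    """Assumed model of Q_{4n}: x = diag(zeta, 1/zeta), zeta = exp(pi i / n), y = [[0, -1], [1, 0]]."""
    zeta = cmath.exp(1j * cmath.pi / n)
    x = ((zeta, 0j), (0j, 1 / zeta))
    y = ((0j, -1 + 0j), (1 + 0j, 0j))
    return x, y


def power(A, k):
    P = ((1 + 0j, 0j), (0j, 1 + 0j))
    for _ in range(k):
        P = mat_mul(P, A)
    return P
```

For $n = 2$ this model is the familiar quaternion group $Q_8$ of order 8.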

Proposition 10.3.25 Let

$$1 \longrightarrow R \xrightarrow{i} F \xrightarrow{\mu} K \longrightarrow 1$$
be a free presentation of a finite group $K$, where $F$ is a free group of rank $n$. Suppose that $L$ is a finite subgroup of $R/[R, F]$ such that the quotient of $R/[R, F]$ modulo $L$ is generated by $n$ elements. Then $M(K) \cong L$.

Proof Let $A$ be the torsion part of $R/[R, F]$ and $B$ the torsion-free part of $R/[R, F]$. Then $B$ is free abelian of rank $n$. Also, $R/[R, F] \cong A \oplus B$. Since $L$ is finite, $L \subseteq A$. Thus, $(R/[R, F])/L \cong (A/L) \oplus B$. If $L \neq A$, then $(R/[R, F])/L$ cannot be generated by $n$ elements. This shows that $L$ is the torsion subgroup of $R/[R, F]$. The result follows from Proposition 10.3.20. □

Example 10.3.26 The dihedral group $D_{2n}$ of order $2n$ is given by the presentation $\langle x, y;\ x^n, y^2, yxy^{-1} = x^{-1} \rangle$. This has two generators and three defining relations. Thus, by Corollary 10.3.22, $M(D_{2n})$ has to be cyclic. Every element of $D_{2n}$ has a unique representation as $x^iy^j$, $0 \leq i \leq n - 1$, $0 \leq j \leq 1$, and so $D_{2n}$ is a group of order $2n$. $D_{2n}$ has a free central extension given by

$$1 \longrightarrow R/[R, F] \xrightarrow{i} F/[R, F] \xrightarrow{\mu} D_{2n} \longrightarrow 1,$$
where $F$ is the free group on $\{x, y\}$, and $R$ is the subgroup of $F$ generated by $\{x^n, y^2, yxy^{-1}x\}$ and their conjugates. As already described in Proposition 10.3.20, $R/[R, F]$ is in the center of $F/[R, F]$. The torsion-free part of $R/[R, F]$ is a free abelian group of rank 2, and its torsion part is $M(D_{2n})$. Since $uwu^{-1}[R, F] = w[R, F]$ for all $u \in F$ and $w \in R$, it follows that $R/[R, F]$ is an abelian subgroup of $F/[R, F]$ which is generated by $\{x^n[R, F], y^2[R, F], yxy^{-1}x[R, F]\}$. Denote $x^n[R, F]$, $y^2[R, F]$ and $yxy^{-1}x[R, F]$ by $a$, $b$ and $c$, respectively. Then $R/[R, F]$ is generated by $\{a, b, c\}$. Now, since $a$ and $c$ are central in $F/[R, F]$,
$$a = x^n[R, F] = yx^ny^{-1}[R, F] = (yxy^{-1})^n[R, F] = (yxy^{-1}x\,x^{-1})^n[R, F] = (c\,x^{-1}[R, F])^n = c^n x^{-n}[R, F] = c^na^{-1}.$$
Thus, $a^2 = c^n$. Suppose that $n = 2m$ is even. Put $d = a^{-1}c^m$. Then $d^2 = e$. Suppose that $d = e$. Then $a = c^m$, and so $x^n(yxy^{-1}x)^{-m} \in [R, F]$. Since $[R, F] \subseteq R$, it would follow that $x^n$ is derivable from the relation $yxy^{-1}x$. But the group given by the presentation $\langle x, y;\ y^2, yxy^{-1} = x^{-1} \rangle$ is the infinite dihedral group. Hence $d$ is an element of order 2 in $R/[R, F]$. Also, the quotient group of $R/[R, F]$ modulo the subgroup $\langle d \rangle$ generated by $d$ is generated by $\{b\langle d \rangle, c\langle d \rangle\}$. It follows from Proposition 10.3.25 that $\langle d \rangle$ is the torsion part of $R/[R, F]$. This shows that $M(D_{4m})$ is the cyclic group of order 2. Next, suppose that $n = 2m + 1$ is odd. Now, put $d = ac^{-m}$. Then $d^2 = c$. Hence $a, c \in \langle d \rangle$. Thus, in this case, $R/[R, F]$ is generated by $\{b, d\}$. We already know that $R/[R, F]$ is the direct sum of $M(D_{4m+2})$ with a free abelian group of rank 2. If $M(D_{4m+2})$ were nontrivial, $R/[R, F]$ could not be generated by two elements. Hence $M(D_{4m+2})$ is trivial.
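The exponent-sum homomorphism from $R/[R, F]$ to $F/[F, F] \cong \mathbb{Z}^2$ has kernel $(R \cap [F, F])/[R, F] = M(D_{2n})$, so a torsion candidate such as $d = a^{-1}c^m$ must map to $0$. The Python sketch below (helper names are ours; capital letters stand for inverse generators) checks this bookkeeping: the image of $a^2c^{-n}$ is $0$, the images of $b$ and $c$ are independent, $d$ maps to $0$ when $n = 2m$ and to a nonzero vector when $n = 2m + 1$, and the index of the image of $R$ in $\mathbb{Z}^2$ (the gcd of the $2 \times 2$ minors) equals $|F/R[F, F]| = |D_{2n}/[D_{2n}, D_{2n}]|$, namely 4 for even $n$ and 2 for odd $n$. It checks images modulo torsion only and does not prove that $d \neq e$.

```python
from functools import reduce
from math import gcd


def exponent_sums(word):
    """Image of a word in F/[F, F] = Z^2 for F free on {x, y}; 'X', 'Y' denote x^-1, y^-1."""
    return (word.count("x") - word.count("X"), word.count("y") - word.count("Y"))


def combine(*terms):
    """Image of a product of powers; terms are (vector, exponent) pairs."""
    return (sum(k * v[0] for v, k in terms), sum(k * v[1] for v, k in terms))


def lattice_index(vectors):
    """Index in Z^2 of the subgroup spanned by the vectors (gcd of 2x2 minors)."""
    minors = [u[0] * v[1] - u[1] * v[0] for u in vectors for v in vectors]
    return reduce(gcd, (abs(m) for m in minors), 0)


def dihedral_bookkeeping(n):
    a = exponent_sums("x" * n)   # a = x^n [R, F]
    b = exponent_sums("yy")      # b = y^2 [R, F]
    c = exponent_sums("yxYx")    # c = y x y^-1 x [R, F]
    m = n // 2
    if n % 2 == 0:
        d = combine((a, -1), (c, m))    # d = a^-1 c^m
    else:
        d = combine((a, 1), (c, -m))    # d = a c^-m
    return {
        "a^2 c^-n": combine((a, 2), (c, -n)),
        "b, c independent": b[0] * c[1] - b[1] * c[0] != 0,
        "d": d,
        "index": lattice_index([a, b, c]),
    }
```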

Example 10.3.27 Consider the group $G$ having presentation

$$\langle x, y;\ x^5, y^3, (xy)^2 \rangle.$$
If we take $x = (1\,2\,3\,4\,5)$ and $y = (1\,5\,3)$ in $A_5$, or $x = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $y = \begin{pmatrix} 0 & -1 \\ 1 & -1 \end{pmatrix}$ in $PSL(2, 5)$, then $x^5$, $y^3$ and $(xy)^2$ represent the identities in the respective groups. Also, they generate the respective groups. As such, there is a surjective homomorphism from $G$ to $A_5$, and also a surjective homomorphism from $G$ to $PSL(2, 5)$. Using the coset enumeration method of Todd and Coxeter, one finds that the order of $G$ is 60, which is the same as that of $A_5$, and also that of $PSL(2, 5)$. It follows that $\langle x, y;\ x^5, y^3, (xy)^2 \rangle$ is a presentation of $A_5$, and also a presentation of $PSL(2, 5)$. It also turns out that $A_5$ is isomorphic to $PSL(2, 5)$. Let $F$ denote the free group on $\{x, y\}$ and $R$ the normal subgroup of $F$ generated by $\{x^5, y^3, (xy)^2\}$. To find the Schur multiplier, we need to find the torsion part of $R/[R, F]$. Put $a = x^5[R, F]$, $b = y^3[R, F]$ and $c = (xy)^2[R, F]$. Then $R/[R, F]$ is a central subgroup of $F/[R, F]$ generated by $\{a, b, c\}$. We find relations between $a$, $b$, and $c$ in $R/[R, F]$. Indeed, we show that $c^{30} = a^{12}b^{20}$. Since $c$ is in the center of $F/[R, F]$,
$$c^2 = (xy)^2[R, F]\,(xy)^2[R, F] = x(xy)^2yxy[R, F] = x^2yxy^2xy[R, F].$$
Again inserting $(xy)^2$ between $x^2$ and $y$ in the above expression, we find that $c^3 = x^3y(xy^2)^2xy[R, F]$. Repeating this process by putting $(xy)^2$ between $x^3$ and $y$, and again in turn putting $(xy)^2$ between $x^4$ and $y$ in the resulting expression, we find that $c^5 = x^5y(xy^2)^4xy[R, F] = ay(xy^2)^4xy[R, F]$. Since $a$ and $c$ commute with all the elements of $F/[R, F]$, conjugating by $y$ gives $c^5 = y^{-1}c^5y = a(xy^2)^4xy^2[R, F]$. Hence
$$c^5 = a(xy^2)^5[R, F].$$

Putting again $(xy)^2$ between $x$ and $y^2$ in the above expression, we get
$$c^{10} = a(x^2yxy^3[R, F])^5 = a(y^3[R, F])^5(x^2yx[R, F])^5 = ab^5(x^2yx[R, F])^5 = ab^5\,x^2y(x^3y)^4x[R, F].$$
Further, since $a$, $b$, $c$ commute with all elements of $F/[R, F]$, conjugating by $x$ gives, in turn,

$$c^{10} = ab^5(x^3y[R, F])^5.$$

Again, iterating the same procedure, i.e., putting $c = (xy)^2[R, F]$ between $x^3$ and $y$ in the above expression, we find

$$c^{15} = ab^5(x^4yxy^2[R, F])^5.$$

Iterating finally, we get

$$c^{30} = a^{12}b^{20}.$$

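Thus, if we put $p = a^6b^{10}c^{-15}$, then $p^2$ is the identity of $R/[R, F]$. Also, if we put $q = a^2b^3c^{-5}$ and $r = ab^2c^{-3}$, then $R/[R, F] = \langle a, b, c \rangle = \langle p, q, r \rangle$, since the integer matrix of exponents of $p$, $q$, $r$ with respect to $a$, $b$, $c$ has determinant 1. Hence the quotient group $(R/[R, F])/\langle p \rangle$ is generated by two elements. Since the torsion-free part of $R/[R, F]$ is a free abelian group of rank 2, $\langle p \rangle$ is the torsion part of $R/[R, F]$ (Proposition 10.3.25). This shows that the Schur multiplier of $A_5$ $(\cong PSL(2, 5))$ is a group of order at most 2. Further, we have a central extension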

$$1 \longrightarrow A \xrightarrow{i} SL(2, 5) \xrightarrow{\mu} PSL(2, 5) \longrightarrow 1,$$

where $A$ is the center of $SL(2, 5)$. Thus, $A$ is the group $\{\alpha I_2 \mid \alpha \in \mathbb{Z}_5, \alpha^2 = 1\} = \{I_2, -I_2\}$. This means that $A$ is a group of order 2. Since $SL(2, 5)$ is a perfect group, $[SL(2, 5), SL(2, 5)] \cap A = A$, and so, by Corollary 10.3.10, $Hom(A, \mathbb{C}^{\star}) \cong A$ is isomorphic to a subgroup of the Schur multiplier $H^2(PSL(2, 5), \mathbb{C}^{\star})$ of $PSL(2, 5)$. Thus, the Schur multiplier $M(PSL(2, 5)) \cong M(A_5)$ is a group of order 2.
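Two finite checks support this example. First, the integer matrix expressing $p$, $q$, $r$ in terms of $a$, $b$, $c$ has determinant 1, and the exponent sums of $x^5$, $y^3$, $(xy)^2$ send $p$ and $c^{30}a^{-12}b^{-20}$ to $0$, while $q$ and $r$ map onto a basis of $\mathbb{Z}^2$; the images of $a$, $b$, $c$ span all of $\mathbb{Z}^2$, matching the fact that $A_5$ is perfect. Second, a brute-force enumeration of $SL(2, 5)$ confirms that it has 120 elements, that its center is $\{I_2, -I_2\}$, and that it is perfect. The Python sketch below is our own illustration: matrices are encoded as 4-tuples $(a, b, c, d)$ for $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, and the identity $c^{30} = a^{12}b^{20}$ is only checked for consistency modulo torsion, not proved.

```python
from functools import reduce
from itertools import product
from math import gcd

# Part 1: integer bookkeeping for p = a^6 b^10 c^-15, q = a^2 b^3 c^-5, r = a b^2 c^-3.
P_EXP, Q_EXP, R_EXP = (6, 10, -15), (2, 3, -5), (1, 2, -3)
IMAGES = {"a": (5, 0), "b": (0, 3), "c": (2, 2)}  # exponent sums of x^5, y^3, (xy)^2


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def image(exps):
    """Image in F/[F, F] = Z^2 of a^i b^j c^k, where exps = (i, j, k)."""
    return tuple(sum(e * IMAGES[g][t] for e, g in zip(exps, "abc")) for t in range(2))


def lattice_index(vectors):
    """Index in Z^2 of the subgroup spanned by the vectors (gcd of 2x2 minors)."""
    minors = [u[0] * v[1] - u[1] * v[0] for u in vectors for v in vectors]
    return reduce(gcd, (abs(m) for m in minors), 0)


# Part 2: brute force over SL(2, 5), matrices encoded as (a, b, c, d).
MOD = 5


def mul(A, B):
    (a, b, c, d), (e, f, g, h) = A, B
    return ((a * e + b * g) % MOD, (a * f + b * h) % MOD,
            (c * e + d * g) % MOD, (c * f + d * h) % MOD)


def inv(A):
    a, b, c, d = A  # determinant 1, so the inverse is the adjugate
    return (d, -b % MOD, -c % MOD, a)


SL25 = [A for A in product(range(MOD), repeat=4) if (A[0] * A[3] - A[1] * A[2]) % MOD == 1]
CENTER = [Z for Z in SL25 if all(mul(Z, A) == mul(A, Z) for A in SL25)]


def derived_subgroup(G):
    """Subgroup generated by all commutators A B A^-1 B^-1."""
    comms = {mul(mul(A, B), mul(inv(A), inv(B))) for A in G for B in G}
    H = set(comms)
    while True:
        new = {mul(x, y) for x in H for y in comms} - H
        if not new:
            return H
        H |= new
```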

Five-Term Exact Sequence

For convenience, in all the group extensions of the type

$$1 \longrightarrow H \xrightarrow{\alpha} G \xrightarrow{\beta} K \longrightarrow 1,$$

$\alpha$ will be treated as the inclusion map $i$. Thus, $H$ is treated as a normal subgroup of $G$. Needless to say, there is no loss of generality.

Theorem 10.3.28 To every group extension E given by

$$E \equiv 1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1,$$
there is an associated connecting homomorphism $\delta(E)$ from $M(K)$ to $H/[H,G]$, and in turn the five-term exact sequence

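$$M(G) \xrightarrow{M(\beta)} M(K) \xrightarrow{\delta(E)} H/[H,G] \xrightarrow{\;\bar{i}\;} G_{ab} \xrightarrow{\;\bar{\beta}\;} K_{ab} \longrightarrow 1,$$
which is natural in the sense that, given any extension $E'$ of $H'$ by $K'$ and a morphism $(\mu|_H, \mu, \nu)$ from $E$ to $E'$, the diagram
$$\begin{array}{ccccccccc}
M(G) & \xrightarrow{M(\beta)} & M(K) & \xrightarrow{\delta(E)} & H/[H,G] & \xrightarrow{\bar{i}} & G_{ab} & \xrightarrow{\bar{\beta}} & K_{ab} \longrightarrow 1\\
\downarrow{\scriptstyle M(\mu)} & & \downarrow{\scriptstyle M(\nu)} & & \downarrow{\scriptstyle \bar{\mu}} & & \downarrow{\scriptstyle \bar{\mu}} & & \downarrow{\scriptstyle \bar{\nu}}\\
M(G') & \xrightarrow{M(\beta')} & M(K') & \xrightarrow{\delta(E')} & H'/[H',G'] & \xrightarrow{\bar{i}'} & G'_{ab} & \xrightarrow{\bar{\beta}'} & K'_{ab} \longrightarrow 1
\end{array}$$
is commutative, where $G_{ab} = G/[G,G]$ and $K_{ab} = K/[K,K]$ are the abelianizers of $G$ and $K$, respectively.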

Proof By Corollaries 10.3.16 and 10.3.17, $M$ is a functor from the category of groups to the category of abelian groups. Thus, it is sufficient to establish the five-term exact sequence for a choice of a free presentation of $G$ and one of $K$. Let

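$$1 \longrightarrow R \xrightarrow{\;i\;} F \xrightarrow{\;\mu\;} G \longrightarrow 1$$
be a free presentation of $G$. Let $S = \mu^{-1}(H)$. Then $R \subseteq S$, and we have a free presentation
$$1 \longrightarrow S \xrightarrow{\;i\;} F \xrightarrow{\;\beta\circ\mu\;} K \longrightarrow 1$$
of $K$. Clearly, $\mu$ maps $S$ onto $H$, and indeed $[S,F]$ onto $[H,G]$. Recall that $M(G) \approx (R\cap[F,F])/[R,F]$ and $M(K) \approx (S\cap[F,F])/[S,F]$, and that $M(\beta)$ is given by $M(\beta)(r[R,F]) = r[S,F]$. In turn, we have a natural map $\delta(E)$ from $M(K) \approx (S\cap[F,F])/[S,F]$ to $H/[H,G]$ given by $\delta(E)(s[S,F]) = \mu(s)[H,G]$. This gives us a five-term sequence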

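$$M(G) \xrightarrow{M(\beta)} M(K) \xrightarrow{\delta(E)} H/[H,G] \xrightarrow{\;\bar{i}\;} G_{ab} \xrightarrow{\;\bar{\beta}\;} K_{ab} \longrightarrow 1.$$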

We prove the exactness of the sequence. Since $\beta$ is surjective, $\bar\beta$ is surjective. Again, since $\beta\circ i$ is the trivial map, image $\bar{i} \subseteq \ker\bar\beta$. Suppose that $\bar\beta(g[G,G]) = [K,K]$. Then $\beta(g) \in [K,K] = \beta([G,G])$. Hence there is a $u \in [G,G]$ such that $\beta(g) = \beta(u)$. In turn, there is an $h \in H$ such that $g = hu$. Clearly, $\bar{i}(h[H,G]) = h[G,G] = g[G,G]$. This proves the exactness at $G_{ab}$. Next, since $\mu(s) \in H\cap[G,G]$ for all $s \in S\cap[F,F]$, it follows that

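$$\bar{i}(\delta(E)(s[S,F])) = \bar{i}(\mu(s)[H,G]) = \mu(s)[G,G] = [G,G].$$
This shows that image $\delta(E) \subseteq \ker\bar{i}$. Let $h[H,G] \in \ker\bar{i}$. Then $h \in H\cap[G,G]$. Since $[G,G] = \mu([F,F])$, we have $h = \mu(w)$ for some $w \in [F,F]$, and $w \in \mu^{-1}(H) = S$; that is, $h \in \mu(S\cap[F,F])$. It follows that $h[H,G] \in$ image $\delta(E)$. Finally, we prove the exactness at $M(K)$. Let $r[R,F] \in M(G)$, $r \in R\cap[F,F]$. Then, by definition, $\delta(E)(M(\beta)(r[R,F])) = \delta(E)(r[S,F]) = \mu(r)[H,G] = [H,G]$, for $\mu(r) = e$. This shows that image $M(\beta) \subseteq \ker\delta(E)$. Further, suppose that $\delta(E)(s[S,F]) = \mu(s)[H,G] = [H,G]$, where $s \in S\cap[F,F]$. Then $\mu(s) \in [H,G] = \mu([S,F])$. Hence there is a $t \in [S,F]$ such that $\mu(s) = \mu(t)$. In turn, $\mu(st^{-1}) = e$. Thus, $s = rt$ for some $r \in R$; moreover, $r = st^{-1} \in [F,F]$, as $s \in [F,F]$ and $t \in [S,F] \subseteq [F,F]$. But then $s[S,F] = rt[S,F] = r[S,F] = M(\beta)(r[R,F])$. It follows that image $M(\beta) = \ker\delta(E)$.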
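As a worked instance (an illustration added here, using only the theorem above), apply the five-term sequence to the central extension $1 \to A \to SL(2,5) \xrightarrow{\mu} PSL(2,5) \to 1$ with $A = \{I_2, -I_2\}$. Since $A$ is central, $[A, SL(2,5)] = \{I_2\}$, and since $SL(2,5)$ is perfect, its abelianizer is trivial:

```latex
M(SL(2,5)) \xrightarrow{M(\mu)} M(PSL(2,5)) \xrightarrow{\delta(E)} A/[A, SL(2,5)] = A
  \xrightarrow{\bar{i}} SL(2,5)_{ab} = \{1\} \xrightarrow{\bar{\mu}} PSL(2,5)_{ab} = \{1\} \longrightarrow 1
```

Exactness at $A$ gives image $\delta(E) = \ker\bar{i} = A$, so $M(PSL(2,5))$ maps onto a group of order 2 and $|M(PSL(2,5))| \geq 2$, in agreement with the conclusion drawn from Proposition 10.3.9.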

We give another interpretation of the group $M(G)$ as the group of commutator relations. For any group $G$, the commutator operation $[x,y] = xyx^{-1}y^{-1}$ can easily be seen to satisfy the following relations, called the trivial commutator relations:
(i) $[x,x] = e$.
(ii) $[x,y][y,x] = e$.
(iii) $[xyx^{-1}, xzx^{-1}][x,z][z,xy] = e$.
(iv) $[xyx^{-1}, xzx^{-1}][z,y][yzy^{-1}z^{-1}, x] = e$.
The group $M(G)$ can be viewed as the group of nontrivial commutator relations in $G$. More precisely, consider the free group $F(X)$ on $X$, where $X = G\times G - (G\times\{e\} \cup \{e\}\times G)$. We identify $(x,e)$ and $(e,x)$ with the identity of $F(X)$. From the universal property of a free group, we have a unique homomorphism $\eta$ from $F(X)$ to $G$ given by $\eta((x,y)) = [x,y] = xyx^{-1}y^{-1}$. Let $(G)$ denote the normal subgroup of $F(X)$ generated by the set of elements of the types $(x,x)$, $(x,y)(y,x)$, $(xyx^{-1}, xzx^{-1})(x,z)(z,xy)$ and $(xyx^{-1}, xzx^{-1})(z,y)(yzy^{-1}z^{-1}, x)$. Clearly, $(G)$ is contained in the kernel of $\eta$. As such, $\eta$ induces a unique homomorphism, again denoted by $\eta$, from $F(X)/(G)$ onto the commutator subgroup $[G,G]$ of $G$. The proof of the following theorem involves some computations, which we omit; for the details, we refer to the book "Schur Multiplier" by Karpilovsky.

Theorem 10.3.29 The kernel of the above-described map $\eta$ is the Schur multiplier $M(G)$, and we have the natural short exact sequence
$$1 \longrightarrow M(G) \xrightarrow{\;i\;} F(X)/(G) \xrightarrow{\;\eta\;} [G,G] \longrightarrow 1.$$

Thus, $M(G)$ can be viewed as the group of commutator relations in $G$ modulo the trivial commutator relations. Definition 10.3.30 The group $F(X)/(G)$ introduced above is called the non-abelian exterior power of $G$, and it is denoted by $G\wedge G$.
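The trivial commutator relations (i) to (iv) hold in every group, with the convention $[x,y] = xyx^{-1}y^{-1}$. As a quick sanity check (not a proof), the following Python sketch verifies all four for every choice of $x$, $y$, $z$ in the symmetric group $S_4$; the helper names are ours.

```python
from itertools import permutations

def mul(p, q):
    """Permutation product p q, acting right to left: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def prod(*ps):
    """Product of the given permutations, in the order written."""
    out = tuple(range(len(ps[0])))
    for p in ps:
        out = mul(out, p)
    return out

def comm(x, y):
    """[x, y] = x y x^-1 y^-1, the convention used in the text."""
    return prod(x, y, inv(x), inv(y))

def conj(x, y):
    """x y x^-1."""
    return prod(x, y, inv(x))

G = list(permutations(range(4)))   # the symmetric group S_4
e = tuple(range(4))

ok = True
for x in G:
    ok &= comm(x, x) == e                                          # (i)
    for y in G:
        ok &= prod(comm(x, y), comm(y, x)) == e                    # (ii)
        for z in G:
            lhs = comm(conj(x, y), conj(x, z))
            ok &= prod(lhs, comm(x, z), comm(z, mul(x, y))) == e   # (iii)
            ok &= prod(lhs, comm(z, y), comm(comm(y, z), x)) == e  # (iv)
print(ok)  # True
```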

Corollary 10.3.31 If G is an abelian group, then M(G) ≈ G ∧ G. 

Tensor Product and Exterior Power of Groups Let $K$ and $L$ be groups, and let $G$ be another group. A map $\eta$ from $K\times L$ to $G$ is called a bi-multiplicative map if (i) $\eta(kk', l) = \eta(k,l)\eta(k',l)$ and (ii) $\eta(k, ll') = \eta(k,l)\eta(k,l')$ for all $k, k' \in K$ and $l, l' \in L$. Note that if $\eta$ is a bi-multiplicative map, then $\eta(e,l) = e = \eta(k,e)$ for all $k \in K$ and $l \in L$. Proposition 10.3.32 For any pair of groups $K$ and $L$, there is a pair $(K\otimes L, \eta)$, where $K\otimes L$ is a group and $\eta$ is a bi-multiplicative map from $K\times L$ to $K\otimes L$, which is universal in the sense that for any pair $(G', \eta')$ with $\eta'$ a bi-multiplicative map from $K\times L$ to $G'$, there is a unique homomorphism $\mu$ from $K\otimes L$ to $G'$ such that $\mu\circ\eta = \eta'$.

Proof Take $K\otimes L$ to be the group with presentation $\langle X; R\rangle$, where the generating set is $X = \{(k,l) \in K\times L \mid k \neq e \neq l\}$ (pairs $(k,e)$ and $(e,l)$ being identified with the identity of $F(X)$), and the set $R$ of relators is given by
$$R = \{(kk', l)((k',l))^{-1}((k,l))^{-1},\ (k, ll')((k,l'))^{-1}((k,l))^{-1} \mid k, k' \in K \text{ and } l, l' \in L\}.$$
Thus, $K\otimes L = F(X)/H$, where $H$ is the normal subgroup generated by $R$. The map $\eta$ is given by $\eta(k,l) = (k,l)H$. We denote $\eta(k,l)$ by $k\otimes l$. Clearly, $\eta$ is a bi-multiplicative map, that is, $kk'\otimes l = (k\otimes l)(k'\otimes l)$ and also $k\otimes ll' = (k\otimes l)(k\otimes l')$. Let $(G', \eta')$, with $\eta'$ a bi-multiplicative map from $K\times L$ to $G'$, be another pair. From the universal property of a free group, we have a unique homomorphism $\chi$ from $F(X)$ to $G'$ such that $\chi(k,l) = \eta'(k,l)$. The supposition that $\eta'$ is a bi-multiplicative map from $K\times L$ to $G'$ ensures that $\chi$ takes the relators in $R$ to $e$. This means that $H$ is contained in $\ker\chi$. By the fundamental theorem of homomorphism, $\chi$ induces a unique homomorphism $\mu$ from $K\otimes L$ to $G'$ such that $\mu(\eta(k,l)) = \mu((k,l)H) = \chi(k,l) = \eta'(k,l)$.

Proposition 10.3.33 The pair $(K\otimes L, \eta)$ introduced above is unique in the sense that if $(G', \eta')$ is another such pair, then there is a unique isomorphism $\mu$ from $K\otimes L$ to $G'$ such that $\mu\circ\eta = \eta'$.

Proof From the universal property of the pair $(K\otimes L, \eta)$ established in the above proposition, there is a unique homomorphism $\mu$ from $K\otimes L$ to $G'$ such that $\mu\circ\eta = \eta'$. Since the pair $(G', \eta')$ is also assumed to have the same universal property, there is a unique homomorphism $\nu$ from $G'$ to $K\otimes L$ such that $\nu\circ\eta' = \eta$. Thus, $\nu\circ\mu$ and $I_{K\otimes L}$ are both homomorphisms from $K\otimes L$ to itself such that $(\nu\circ\mu)\circ\eta = \eta = I_{K\otimes L}\circ\eta$. From the universal property of the pair $(K\otimes L, \eta)$, $\nu\circ\mu = I_{K\otimes L}$.

Reversing the roles of $(K\otimes L, \eta)$ and $(G', \eta')$, we get $\mu\circ\nu = I_{G'}$. Thus, $\mu$ is an isomorphism with the required property.

Definition 10.3.34 The pair $(K\otimes L, \eta)$ is called the tensor product of $K$ and $L$. By abuse of language, we also say that $K\otimes L$ is the tensor product of $K$ and $L$. The image $\eta((k,l))$ is denoted by $k\otimes l$.

Thus, (i) $kk'\otimes l = (k\otimes l)(k'\otimes l)$, (ii) $k\otimes ll' = (k\otimes l)(k\otimes l')$. In turn, $e\otimes l = e = k\otimes e$ and $k^{-1}\otimes l = (k\otimes l)^{-1} = k\otimes l^{-1}$. Proposition 10.3.35 Let $\eta$ be a bi-multiplicative map from $K\times L$ to $G$. Then the image $\eta(K\times L)$ of $\eta$ generates an abelian subgroup of $G$.

Proof For $k, k' \in K$ and $l, l' \in L$,

$$\eta(kk', ll') = \eta(kk', l)\eta(kk', l') = \eta(k,l)\eta(k',l)\eta(k,l')\eta(k',l').$$

On the other hand,

$$\eta(kk', ll') = \eta(k, ll')\eta(k', ll') = \eta(k,l)\eta(k,l')\eta(k',l)\eta(k',l').$$

Comparing (and cancelling $\eta(k,l)$ on the left and $\eta(k',l')$ on the right), $\eta(k',l)\eta(k,l') = \eta(k,l')\eta(k',l)$ for all $k, k' \in K$ and $l, l' \in L$. This means that the elements of $\eta(K\times L)$ commute pairwise.

Corollary 10.3.36 The tensor product $K\otimes L$ of any two groups $K$ and $L$ is an abelian group, and it is isomorphic to $K_{ab}\otimes L_{ab}$, where $K_{ab}$ denotes the abelianizer $K/[K,K]$ of $K$.

Proof Since $K\otimes L$ is generated by $\{k\otimes l \mid k \in K, l \in L\}$, it follows from the above proposition that $K\otimes L$ is abelian. Let us denote the coset $k[K,K]$ by $\bar{k}$ (and similarly $l[L,L]$ by $\bar{l}$). Define a map $\bar\eta$ from $K\times L$ to $K_{ab}\otimes L_{ab}$ by $\bar\eta(k,l) = \bar{k}\otimes\bar{l}$. Evidently, $\bar\eta$ is a bi-multiplicative map. As such, it induces a unique homomorphism $\tilde\eta$ from $K\otimes L$ to $K_{ab}\otimes L_{ab}$ subject to $\tilde\eta(k\otimes l) = \bar{k}\otimes\bar{l}$, which is clearly surjective. We show that it is bijective by constructing its inverse. Since $K\otimes L$ is abelian,
$$[k,k']\otimes l = kk'k^{-1}k'^{-1}\otimes l = (k\otimes l)(k'\otimes l)(k^{-1}\otimes l)(k'^{-1}\otimes l) = e$$
for all $k, k' \in K$ and $l \in L$. Since every element of $[K,K]$ is a product of commutators, and the tensor product is bi-multiplicative, it follows that $u\otimes l = e$ for all $u \in [K,K]$. Similarly, $k\otimes v = e$ for all $k \in K$ and $v \in [L,L]$. It follows that $\bar{k} = \bar{k}'$ implies $k\otimes l = k'\otimes l$ for all $l \in L$, and also $\bar{l} = \bar{l}'$ implies $k\otimes l = k\otimes l'$ for all $k \in K$. Thus, $\bar{k} = \bar{k}'$ and $\bar{l} = \bar{l}'$ imply $k\otimes l = k'\otimes l'$. This ensures that we have a well-defined map $\chi$ from $K_{ab}\times L_{ab}$ to $K\otimes L$ given by $\chi(\bar{k}, \bar{l}) = k\otimes l$. Clearly, $\chi$ is a bi-multiplicative map, and as such it induces a homomorphism $\tilde\chi$ from $K_{ab}\otimes L_{ab}$ to $K\otimes L$ given by $\tilde\chi(\bar{k}\otimes\bar{l}) = k\otimes l$. Clearly, $\tilde\chi$ is the inverse of $\tilde\eta$.
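Corollary 10.3.36 reduces tensor products of groups to tensor products of abelian groups. Assuming the standard facts $\mathbb{Z}_m\otimes\mathbb{Z}_n \approx \mathbb{Z}_{\gcd(m,n)}$ and additivity (Proposition 10.3.39), the Python sketch below computes $S_3\otimes S_3$ and $S_3\otimes A_4$: it finds the abelianizers by brute force (their orders, 2 and 3, are prime, so they are cyclic) and then takes gcds. The function names are ours.

```python
from itertools import permutations
from math import gcd

def mul(p, q):
    """Permutation product, acting right to left."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def derived_subgroup(G):
    """[G, G]: closure of the set of commutators x y x^-1 y^-1."""
    e = tuple(range(len(G[0])))
    gens = {mul(mul(x, y), mul(inv(x), inv(y))) for x in G for y in G}
    sub, frontier = {e}, [e]
    while frontier:
        new = []
        for h in frontier:
            for g in gens:
                m = mul(h, g)
                if m not in sub:
                    sub.add(m)
                    new.append(m)
        frontier = new
    return sub

def is_even(p):
    return sum(1 for i in range(len(p)) for j in range(i) if p[j] > p[i]) % 2 == 0

def tensor_of_cyclics(A, B):
    """Orders of the nontrivial cyclic factors of A (x) B, where A and B are
    given as lists of orders of finite cyclic direct factors."""
    factors = (gcd(m, n) for m in A for n in B)
    return sorted(g for g in factors if g > 1)

S3 = list(permutations(range(3)))
A4 = [p for p in permutations(range(4)) if is_even(p)]
ab_S3 = len(S3) // len(derived_subgroup(S3))
ab_A4 = len(A4) // len(derived_subgroup(A4))

print(ab_S3, ab_A4)                          # 2 3
print(tensor_of_cyclics([ab_S3], [ab_S3]))   # [2]: S_3 (x) S_3 = Z_2
print(tensor_of_cyclics([ab_S3], [ab_A4]))   # []:  S_3 (x) A_4 is trivial
```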

It is evident from the above corollary that the theory of tensor products of groups reduces to the theory of tensor products of abelian groups through their abelianizers. As such, we state a few results which follow from the corresponding results on tensor products of abelian groups (modules over $\mathbb{Z}$) (refer to Chap. 7 of the book). Corollary 10.3.37 $K\otimes L$ is isomorphic to $L\otimes K$.

Proposition 10.3.38 Let

$$1 \longrightarrow H \xrightarrow{\;\alpha\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1$$
be an exact sequence, and $L$ be a group. Then the sequence

$$H\otimes L \xrightarrow{\alpha\otimes I_L} G\otimes L \xrightarrow{\beta\otimes I_L} K\otimes L \longrightarrow 1$$
is exact. Proposition 10.3.39 Let $H$, $K$ and $L$ be groups. Then $(H\oplus K)\otimes L$ is naturally isomorphic to $(H\otimes L)\oplus(K\otimes L)$. Proposition 10.3.40 Let $H$, $K$ and $L$ be groups. There is a tautological isomorphism from $(H\otimes K)\otimes L$ to $H\otimes(K\otimes L)$ which maps $(h\otimes k)\otimes l$ to $h\otimes(k\otimes l)$. Proposition 10.3.41 Let $H$, $K$ and $L$ be groups with $L$ abelian. Then there is a natural isomorphism from $Hom(H, Hom(K,L))$ to $Hom(H\otimes K, L)$. We further state a few results, without proof, which can be used for some computations. For the proofs, we refer to the book "Schur Multiplier" by Karpilovsky.

Proposition 10.3.42 Let $H$ and $K$ be groups. Then $M(H\oplus K) \approx M(H)\oplus M(K)\oplus(H\otimes K)$. Since the Schur multiplier of a cyclic group is trivial, it follows that $M(A\oplus B) \approx A\otimes B$ for cyclic groups $A$ and $B$, and, by induction on the number of cyclic factors, the Schur multiplier of a finitely generated abelian group is easily determined. For free products, we have the following. Proposition 10.3.43 $M(H \ast K) \approx M(H)\oplus M(K)$.

Exercises

10.3.1 Let $K$ be a subgroup of a group $G$, and $A$ be an abelian group which is a trivial $G$-module. Show that we have a homomorphism $res_{(G,K)}$ from $H^2(G,A)$ to $H^2(K,A)$ given by $res_{(G,K)}(f + B^2(G,A)) = f|_{K\times K} + B^2(K,A)$. The homomorphism $res_{(G,K)}$ is called the restriction homomorphism from $G$ to $K$. Observe that if $L$ is a subgroup of $K$, then $res_{(K,L)}\circ res_{(G,K)} = res_{(G,L)}$.

10.3.2 Let $K$ be a subgroup of a group $G$ of finite index $n$. Let $S = \{e = x_1, x_2, \ldots, x_n\}$ be a right transversal to $K$ in $G$. Given any element $g \in G$, for each $x_i \in S$ there is a unique element $\sigma_{x_i}(g) \in K$ and a unique element $\overline{x_ig} \in S$ such that $x_ig = \sigma_{x_i}(g)\,\overline{x_ig}$. Let $f$ be a 2-co-cycle in $Z^2(K,A)$, where $A$ is a trivial $K$-module. Show that the map $\hat{f}$ from $G\times G$ to $A$ defined by
$$\hat{f}(x,y) = \sum_{i=1}^{n} f\big(\sigma_{x_i}(x), \sigma_{\overline{x_ix}}(y)\big)$$
is a 2-co-cycle in $Z^2(G,A)$. Show also that $f \in B^2(K,A)$ implies that $\hat{f} \in B^2(G,A)$. Deduce that we have a co-restriction homomorphism $cores_{(K,G)}$ from $H^2(K,A)$ to $H^2(G,A)$ given by $cores_{(K,G)}(f + B^2(K,A)) = \hat{f} + B^2(G,A)$. Show also that $cores_{(K,G)}\circ cores_{(L,K)} = cores_{(L,G)}$.

10.3.3 Let $a = f + B^2(G,A)$ be an element of $H^2(G,A)$. Show that $(cores_{(K,G)}\circ res_{(G,K)})(a) = a^n$, where $K$ is a subgroup of index $n$.

10.3.4 Let $K$ be a normal subgroup of $G$ of index $n$, and $a \in H^2(K,A)$. Show that $(res_{(G,K)}\circ cores_{(K,G)})(a) = a^n$.

10.3.5 Use the fact that the group $\mathbb{R}^*$ has only one nonidentity element of finite order, namely $-1$, to show that $H^2(G, \mathbb{R}^*) \approx H^2(G, \mathbb{Z}_2)$ for any finite group $G$.

10.3.6 Let $G$ be a finite group, and $D$ be a divisible group. Show that $H^2(G,D) = H^2(G,T(D))$, where $T(D)$ is the torsion part of $D$. Deduce that $M(G) = H^2(G, \mathbb{Q}/\mathbb{Z})$ for all finite groups $G$.

10.3.7 Compute $Q_8\wedge Q_8$, and hence also $M(Q_8)$.

10.3.8 Use the five-term exact sequence associated to the extension
$$1 \longrightarrow \{1, -1\} \xrightarrow{\;i\;} Q_8 \xrightarrow{\;\nu\;} V_4 \longrightarrow 1$$
to show that $M(Q_8) = \{0\}$.

10.3.9 Compute $P\wedge P$, where $P$ is a non-abelian group of order $p^3$, $p$ a prime. Hence compute $M(P)$.

10.3.10 Compute the Schur multiplier of a non-abelian group of order $pq$, where $p$ and $q$ are primes.

10.3.11 Find the Schur multipliers of $A_4$ and of $S_4$.

10.3.12 Let $G$ be a finite nilpotent group. Show that the Schur multiplier of $G$ is the direct product of the Schur multipliers of its Sylow subgroups.

10.3.13 Let $m, n, r$ be positive integers such that $r^n \equiv 1 \pmod m$, $(m,n) = 1 = (n, r-1)$. Let $G$ be a group having a presentation $\langle x, y;\ x^m = 1 = y^n = y^{-1}xyx^{-r}\rangle$. Using Tietze transformations (see Algebra 1), reduce the number of relators to 2, and then show that $M(G)$ is trivial.

10.3.14 Show that there is a surjective homomorphism from M(GL(R)) to M(K1(R)).
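Proposition 10.3.42, together with the triviality of the Schur multiplier of a cyclic group and $\mathbb{Z}_m\otimes\mathbb{Z}_n \approx \mathbb{Z}_{\gcd(m,n)}$, gives by induction $M(\mathbb{Z}_{n_1}\oplus\cdots\oplus\mathbb{Z}_{n_k}) \approx \bigoplus_{i<j}\mathbb{Z}_{\gcd(n_i,n_j)}$. This turns the Schur multiplier of a finite abelian group into a gcd computation. A minimal Python sketch (the function name is ours):

```python
from itertools import combinations
from math import gcd

def schur_multiplier_finite_abelian(orders):
    """Orders of the nontrivial cyclic factors of M(Z_{n_1} + ... + Z_{n_k}).

    Uses M(A + B) = M(A) + M(B) + (A (x) B), M(Z_n) = 0 and
    Z_m (x) Z_n = Z_gcd(m, n); by induction this gives the direct sum
    of Z_gcd(n_i, n_j) over all pairs i < j."""
    factors = (gcd(m, n) for m, n in combinations(orders, 2))
    return sorted(g for g in factors if g > 1)

print(schur_multiplier_finite_abelian([2, 2]))      # [2]: M(V_4) = Z_2
print(schur_multiplier_finite_abelian([4, 6]))      # [2]
print(schur_multiplier_finite_abelian([2, 2, 2]))   # [2, 2, 2]
print(schur_multiplier_finite_abelian([7]))         # []: cyclic groups
```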

10.4 Lower K-Theory Revisited

In Chap. 7, Sect. 7.4, we introduced the Grothendieck group $K_0(R)$ and the Whitehead group $K_1(R)$ of a ring $R$. Recall that $K_1(R) = GL(R)/E(R)$, where $E(R)$ is the group generated by the elementary matrices. By the Whitehead lemma, $E(R)$ is the commutator subgroup $[GL(R), GL(R)]$ of $GL(R)$. In this section, we introduce the Milnor group $K_2(R)$ of a ring $R$, which can be viewed in two ways: (i) as the Schur multiplier $M(E(R))$, that is, the group of commutator relations among the elements of $E(R)$ modulo the trivial commutator relations, and (ii) as the group of relations among the transvections $E^{\lambda}_{ij}$ modulo the group of trivial relations, viz., the Steinberg relations. We describe it in detail. Note: Usually, in the literature, an extension

$$E \equiv 1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1$$
is termed an extension of $K$, but we shall adhere to our terminology by calling it an extension by $K$. Proposition 10.4.1 Let $K$ be a perfect group in the sense that $[K,K] = K$, and let

$$E \equiv 1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1$$
be a central extension by $K$. Then the commutator subgroup $[G,G]$ is perfect, and
$$1 \longrightarrow H\cap[G,G] \xrightarrow{\;i\;} [G,G] \xrightarrow{\;\beta\;} K \longrightarrow 1$$
is also a central extension by $K$. Proof Since $K$ is perfect, $\beta([G,G]) = [K,K] = K$. Thus, given any element $a \in G$, there is an element $u \in [G,G]$ such that $\beta(a) = \beta(u)$. This means that every element of $G$ is of the form $hu$ for some $h \in H$ and $u \in [G,G]$. Let $a = hu$ and $b = h'u'$, where $h, h' \in H \subseteq Z(G)$ and $u, u' \in [G,G]$, be arbitrary elements of $G$. Then $[a,b] = [u,u'] \in [[G,G],[G,G]]$. This shows that $[G,G] \subseteq [[G,G],[G,G]]$. Hence $[G,G]$ is perfect. The rest is evident.

$$E \equiv 1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1$$
be a central extension by $K$, where $G$ is perfect. Let $\lambda$ and $\mu$ be homomorphisms from $G$ to $G'$ inducing the morphisms $(\lambda|_H, \lambda, I_K)$ and $(\mu|_H, \mu, I_K)$ from $E$ to a central extension $E'$ given by

$$E' \equiv 1 \longrightarrow H' \xrightarrow{\;i\;} G' \xrightarrow{\;\beta'\;} K \longrightarrow 1.$$

Then $\lambda = \mu$. Proof Since $G$ is perfect, the commutators generate $G$. Thus, it is sufficient to show that $\lambda([a,b]) = \mu([a,b])$ for all $a, b \in G$. For each $x \in G$, $\beta'(\lambda(x)) = \beta(x) = \beta'(\mu(x))$, and so there is an element $u(x) \in H'$ such that $\lambda(x) = u(x)\mu(x)$. Since $H' \subseteq Z(G')$,
$$\lambda([a,b]) = [\lambda(a), \lambda(b)] = [u(a)\mu(a), u(b)\mu(b)] = [\mu(a), \mu(b)] = \mu([a,b]).$$

Definition 10.4.3 A central extension

$$\mathcal{K} \equiv 1 \longrightarrow H \xrightarrow{\;\alpha\;} U \xrightarrow{\;\beta\;} K \longrightarrow 1$$
is called a universal central extension by $K$ if, given any central extension

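$$E \equiv 1 \longrightarrow L \xrightarrow{\;\alpha'\;} G \xrightarrow{\;\beta'\;} K \longrightarrow 1$$
by $K$, there is a unique homomorphism $\phi$ from $U$ to $G$ inducing a morphism $(\xi, \phi, I_K)$ from $\mathcal{K}$ to $E$.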

Proposition 10.4.4 A universal central extension by $K$ is unique up to equivalence.

Proof Let

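$$\mathcal{K}' \equiv 1 \longrightarrow H' \xrightarrow{\;\alpha'\;} U' \xrightarrow{\;\beta'\;} K \longrightarrow 1$$
be another universal central extension by $K$. Then there is a unique homomorphism $\phi$ from $U$ to $U'$ inducing a morphism $(\xi, \phi, I_K)$ from $\mathcal{K}$ to $\mathcal{K}'$, and there is a unique homomorphism $\phi'$ from $U'$ to $U$ inducing a morphism $(\xi', \phi', I_K)$ from $\mathcal{K}'$ to $\mathcal{K}$.

But then $\phi'\circ\phi$ and $I_U$ are homomorphisms from $U$ to $U$ inducing the morphisms $(\xi'\circ\xi, \phi'\circ\phi, I_K)$ and $(I_H, I_U, I_K)$, respectively, from $\mathcal{K}$ to itself. From the universal property of $\mathcal{K}$, $\phi'\circ\phi = I_U$. Similarly, using the universal property of $\mathcal{K}'$, $\phi\circ\phi' = I_{U'}$. This shows that $\mathcal{K}$ is equivalent to $\mathcal{K}'$. Proposition 10.4.5 If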

$$\mathcal{K} \equiv 1 \longrightarrow H \xrightarrow{\;\alpha\;} U \xrightarrow{\;\beta\;} K \longrightarrow 1$$
is a universal central extension, then $U$ is perfect ($U = [U,U]$).

Proof Suppose that U is not perfect. Then U/[U, U] is a nontrivial abelian group. Consider the direct product extension

$$1 \longrightarrow U/[U,U] \xrightarrow{\;i_1\;} U/[U,U]\times K \xrightarrow{\;p_2\;} K \longrightarrow 1$$
by $K$, where $i_1$ is the first inclusion and $p_2$ is the second projection. Clearly, this is a central extension by $K$. Further, the map $(\nu, \beta)$ defined by $(\nu,\beta)(u) = (u[U,U], \beta(u))$ and the map $(0,\beta)$ defined by $(0,\beta)(u) = ([U,U], \beta(u))$ are two distinct homomorphisms (distinct because $U/[U,U]$ is nontrivial) from $U$ to $U/[U,U]\times K$, both of which induce morphisms from $\mathcal{K}$ to the given direct product extension. Hence $\mathcal{K}$ cannot be a universal central extension.

Since a homomorphic image of a perfect group is a perfect group, we have

Corollary 10.4.6 If $K$ admits a universal central extension by $K$, then $K$ is a perfect group. Conversely, we have the following. Proposition 10.4.7 Every perfect group $K$ admits a (of course, unique) universal central extension by $K$. Proof Suppose that $K$ is perfect. Let

$$F_K \equiv 1 \longrightarrow R \xrightarrow{\;i\;} F \xrightarrow{\;\eta\;} K \longrightarrow 1$$
be a free presentation of $K$. We have a central extension

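$$E_K \equiv 1 \longrightarrow R/[R,F] \xrightarrow{\;i\;} F/[R,F] \xrightarrow{\;\bar\eta\;} K \longrightarrow 1$$
by $K$. Since $K$ is perfect, $[F/[R,F], F/[R,F]] = [F,F]/[R,F]$ is perfect (by Proposition 10.4.1), and we have a central extension
$$\mathcal{K} \equiv 1 \longrightarrow (R\cap[F,F])/[R,F] \xrightarrow{\;i\;} [F,F]/[R,F] \xrightarrow{\;\bar\eta\;} K \longrightarrow 1$$
by $K$. We prove that this is a universal central extension by $K$. Let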

$$E \equiv 1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\beta\;} K \longrightarrow 1$$
be a central extension by $K$. Since $F$ is free, there is a homomorphism $\phi$ from $F$ to $G$ inducing a morphism $(\phi|_R, \phi, I_K)$ from $F_K$ to $E$. Since $E$ is a central extension by $K$, $\phi([R,F]) \subseteq [H,G] = \{e\}$, so $\phi$ induces a morphism $(\bar\phi|_{R/[R,F]}, \bar\phi, I_K)$ from $E_K$ to $E$, which, in turn, restricts to a morphism from $\mathcal{K}$ to $E$. Since $[F,F]/[R,F]$ is perfect, by Proposition 10.4.2 such a morphism is unique. Corollary 10.4.8 If $K$ is perfect, then $K\wedge K$ is also a perfect group, and it is the universal central extension of $M(K)$ by $K$. More precisely,

$$1 \longrightarrow M(K) \xrightarrow{\;i\;} K\wedge K \xrightarrow{\;c\;} K \longrightarrow 1$$
is a universal central extension, where $c$ is the commutator map given by $c(x\wedge y) = [x,y]$. Proposition 10.4.9 A central extension

$$1 \longrightarrow H \xrightarrow{\;i\;} U \xrightarrow{\;\beta\;} K \longrightarrow 1$$
by $K$ is a universal central extension by $K$ if and only if $U$ is perfect and every central extension by $U$ splits.

Proof Suppose that the given extension is a universal central extension. Then, by Proposition 10.4.5, $U$ is perfect. Let

$$E \equiv 1 \longrightarrow L \xrightarrow{\;i\;} G \xrightarrow{\;\delta\;} U \longrightarrow 1$$
be a central extension by $U$. Then consider the extension

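$$E' \equiv 1 \longrightarrow \ker(\beta\circ\delta) \xrightarrow{\;i\;} G \xrightarrow{\;\beta\circ\delta\;} K \longrightarrow 1$$
by $K$. We first show that this is a central extension by $K$. Let $g \in \ker(\beta\circ\delta)$. We need to show that $g \in Z(G)$. Since $\beta(\delta(g)) = e$, $\delta(g) \in \ker\beta = H \subseteq Z(U)$. Let $x \in G$. Since $\delta(g) \in Z(U)$, $\delta(xgx^{-1}g^{-1}) = \delta(x)\delta(g)\delta(x)^{-1}\delta(g)^{-1} = e$. This shows that $[x,g] \in L \subseteq Z(G)$ for all $x \in G$. Since these commutators are central, $[xy,g] = x[y,g]x^{-1}[x,g] = [x,g][y,g]$, so $x \mapsto [x,g]$ is a homomorphism from $G$ to the abelian group $L$. It is trivial on $L$ (as $L \subseteq Z(G)$), so it induces a homomorphism from $G/L \approx U$ to $L$, which must be trivial because $U$ is perfect and $L$ is abelian. Hence $[x,g] = e$ for all $x \in G$, that is, $g \in Z(G)$. In turn, it follows that $E'$ is a central extension by $K$. Since the given extension is a universal central extension, there is a unique homomorphism $\phi$ from $U$ to $G$ such that $(\beta\circ\delta)\circ\phi = \beta$. This shows that $\delta\circ\phi$ and $I_U$ are homomorphisms from $U$ to $U$ which induce morphisms from the given extension to itself. Since the given extension is universal, it follows that $\delta\circ\phi = I_U$. This shows that $E$ is a split exact sequence. Conversely, suppose that $U$ is perfect and every central extension by $U$ splits. Let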

$$E' \equiv 1 \longrightarrow P \xrightarrow{\;i\;} G \xrightarrow{\;\delta\;} K \longrightarrow 1$$
be a central extension by $K$. In the light of Proposition 10.4.2, it is sufficient to show the existence of a homomorphism $\eta$ from $U$ to $G$ inducing a morphism from

the given extension to $E'$. Consider the subgroup $U\times_K G = \{(u,g) \mid \beta(u) = \delta(g)\}$ of $U\times G$. We have the extension

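$$1 \longrightarrow P \xrightarrow{\;j\;} U\times_K G \xrightarrow{\;p_1\;} U \longrightarrow 1,$$
where $j(z) = (e, z)$, which is clearly a central extension by $U$. From our hypothesis, the sequence splits. Let $t$ be a splitting. Then there is a homomorphism $\phi$ from $U$ to $G$ such that $t(u) = (u, \phi(u)) \in U\times_K G$. But then $\delta(\phi(u)) = \beta(u)$ for all $u \in U$. Thus, $\phi$ induces a morphism from the given extension to $E'$.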

Let R be a commutative ring. Recall that the group E(R) is perfect. As such, we have the universal central extension

$$1 \longrightarrow M(E(R)) \xrightarrow{\;i\;} E(R)\wedge E(R) \xrightarrow{\;c\;} E(R) \longrightarrow 1$$
by $E(R)$. The group $E(R)\wedge E(R)$ represents the group of commutator relations in the group $E(R)$ modulo the trivial commutator relations. We shall give another interpretation of this group. We have the following definition.

Definition 10.4.10 The group M(E(R)) is called the Milnor group of the ring R, and it is denoted by K2(R).

We shall also see the group $K_2(R)$ in another way. If $f$ is a homomorphism from a ring $R$ to a ring $R'$, then $f$ induces a natural homomorphism $E(f)$ from $E(R)$ to $E(R')$ given by $E(f)([a_{ij}]) = [b_{ij}]$, where $b_{ij} = f(a_{ij})$. Clearly, $E(g\circ f) = E(g)\circ E(f)$ and $E(I_R) = I_{E(R)}$. Further, since $M$ defines a functor from the category of groups to the category of abelian groups, it follows that $K_2$ is a functor from the category of rings to the category of abelian groups in the sense that if $f$ is a homomorphism from a ring $R$ to $R'$, it induces a homomorphism $K_2(f)$ from $K_2(R)$ to $K_2(R')$ such that (i) $K_2(g\circ f) = K_2(g)\circ K_2(f)$ and (ii) $K_2(I_R) = I_{K_2(R)}$. The following natural exact sequence relates the functors $K_1$ and $K_2$.

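$$1 \longrightarrow K_2(R) \xrightarrow{\;i\;} E(R)\wedge E(R) \xrightarrow{\;c\;} GL(R) \xrightarrow{\;\nu\;} K_1(R) \longrightarrow 1,$$
where $c$ represents the commutator map given by $c(x\wedge y) = [x,y]$. Recall that the $n\times n$ ($n \geq 3$) elementary matrices $E^{\lambda}_{ij}$ with entries in a ring $R$ with identity satisfy the following relations, termed the Steinberg relations:
(i) $E^{\lambda}_{ij}E^{\mu}_{ij} = E^{\lambda+\mu}_{ij}$.
(ii) $[E^{\lambda}_{ij}, E^{\mu}_{kl}] = I_n$ for $i \neq l$ and $j \neq k$.
(iii) $[E^{\lambda}_{ij}, E^{\mu}_{jl}] = E^{\lambda\mu}_{il}$, $i \neq l$.
(iv) $[E^{\lambda}_{ij}, E^{\mu}_{ki}] = E^{-\mu\lambda}_{kj}$, $j \neq k$.
For each $n \geq 3$, let $St(n,R)$ denote the group generated by the set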

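$$\{x^{\lambda}_{ij} \mid 1 \leq i \neq j \leq n,\ \lambda \in R\}$$
subject to the relations
(i) $x^{\lambda}_{ij}x^{\mu}_{ij} = x^{\lambda+\mu}_{ij}$.
(ii) $[x^{\lambda}_{ij}, x^{\mu}_{kl}] = e$ for $i \neq l$ and $j \neq k$.
(iii) $[x^{\lambda}_{ij}, x^{\mu}_{jl}] = x^{\lambda\mu}_{il}$, $i \neq l$.
(iv) $[x^{\lambda}_{ij}, x^{\mu}_{ki}] = x^{-\mu\lambda}_{kj}$, $j \neq k$.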
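The Steinberg relations (i) to (iv) for elementary matrices can be confirmed numerically. The Python sketch below (helper names are ours) checks them for all $4\times 4$ elementary matrices over $\mathbb{Z}$ with a few sample scalars, using $(E^{\lambda}_{ij})^{-1} = E^{-\lambda}_{ij}$ and $[A,B] = ABA^{-1}B^{-1}$. Because $\mathbb{Z}$ is commutative, it cannot distinguish $-\mu\lambda$ from $-\lambda\mu$ in (iv).

```python
from itertools import product

N = 4  # size of the matrices; indices run over 0, ..., N-1

def identity():
    return [[int(i == j) for j in range(N)] for i in range(N)]

def elem(i, j, lam):
    """Elementary matrix E_ij^lam = I + lam * e_ij (i != j)."""
    m = identity()
    m[i][j] = lam
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def comm(i, j, lam, k, l, mu):
    """[E_ij^lam, E_kl^mu] = A B A^-1 B^-1, using (E_ij^lam)^-1 = E_ij^-lam."""
    a, b = elem(i, j, lam), elem(k, l, mu)
    return matmul(matmul(a, b), matmul(elem(i, j, -lam), elem(k, l, -mu)))

pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
scalars = [2, -3, 5]

def check_steinberg():
    for (i, j), (k, l), lam, mu in product(pairs, pairs, scalars, scalars):
        if (i, j) == (k, l) and matmul(elem(i, j, lam), elem(i, j, mu)) != elem(i, j, lam + mu):
            return False                                   # (i)
        if j != k and i != l and comm(i, j, lam, k, l, mu) != identity():
            return False                                   # (ii)
        if k == j and i != l and comm(i, j, lam, k, l, mu) != elem(i, l, lam * mu):
            return False                                   # (iii)
        if l == i and j != k and comm(i, j, lam, k, l, mu) != elem(k, j, -mu * lam):
            return False                                   # (iv)
    return True

print(check_steinberg())  # True
```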

For each $n$, we have the natural surjective homomorphism $\phi_n$ from $St(n,R)$ to $E(n,R)$ given by $x^{\lambda}_{ij} \mapsto E^{\lambda}_{ij}$. Clearly, $St(n,R)$ is a subgroup of $St(n+1,R)$ in a natural way, and we have a chain

$$St(3,R) \subseteq St(4,R) \subseteq \cdots \subseteq St(n,R) \subseteq St(n+1,R) \subseteq \cdots$$
of groups. The union $St(R)$ of the chain is a group, called the Steinberg group of the ring $R$. Note that the maps $\phi_n$ respect the inclusion maps in the sense that $i_n\circ\phi_n = \phi_{n+1}\circ i_n$, where $i_n$ denote the respective inclusion maps. In turn, in the limit, the $\phi_n$ induce a surjective homomorphism $\phi$ from $St(R)$ to $E(R)$.

Theorem 10.4.11 The short exact sequence

$$1 \longrightarrow \ker\phi \xrightarrow{\;i\;} St(R) \xrightarrow{\;\phi\;} E(R) \longrightarrow 1$$
is a universal central extension by $E(R)$. Before proceeding to prove the above theorem, let us state a corollary.

Corollary 10.4.12 We have natural isomorphisms $\ker\phi \approx K_2(R) = M(E(R))$ and $St(R) \approx E(R)\wedge E(R)$.

Thus, $K_2(R)$ can be viewed as the group of nontrivial relations satisfied by the elementary matrices $E^{\lambda}_{ij}$. Further, we have the exact sequence

$$1 \longrightarrow K_2(R) \xrightarrow{\;i\;} St(R) \xrightarrow{\;\phi\;} GL(R) \xrightarrow{\;\nu\;} K_1(R) \longrightarrow 1.$$

Lemma 10.4.13 $\ker\phi$ is the center $Z(St(R))$ of $St(R)$. Proof Let $a \in Z(St(R))$. Then $\phi(a) \in Z(E(R))$. Since $E(R)$ is centerless (a matrix commutes with all elementary matrices if and only if it is a scalar matrix), $\phi(a)$ is the identity. This means that $a \in \ker\phi$. Thus, $Z(St(R)) \subseteq \ker\phi$. Conversely, suppose that $\phi(a) = e$. We need to show that $a \in Z(St(R))$. Let $C_n$ denote the subgroup of $St(R)$ generated by the elements of the form $x^{\lambda}_{ij}$, where $i \neq n \neq j$. Clearly, there is an $n \in \mathbb{N}$ such that $a \in C_n$. Fix such an $n$. Let $X_n$ denote the subgroup of $St(R)$ generated by the elements of the form $x^{\lambda}_{in}$, where $i \neq n$. Since any two such elements commute (see the Steinberg relations), $X_n$ is abelian. Further, any nonidentity element $x$ of $X_n$ is expressible as
$$x = x^{\lambda_1}_{i_1n}x^{\lambda_2}_{i_2n}\cdots x^{\lambda_r}_{i_rn}, \quad i_1 < i_2 < \cdots < i_r,$$
where $i_k \neq n$. Now, $\phi(x) = E^{\lambda_1}_{i_1n}E^{\lambda_2}_{i_2n}\cdots E^{\lambda_r}_{i_rn}$ is the matrix all of whose diagonal entries are 1, whose entry in the $i_k$th row and $n$th column is $\lambda_k$, $k \leq r$, and whose remaining entries are 0. This shows that the representation of an element $x$ as given above is unique, and $\phi$ is injective when restricted to $X_n$. Similarly, let $Y_n$ denote the subgroup of $St(R)$ generated by the elements of the form $x^{\lambda}_{nj}$, where $j \neq n$. Then $Y_n$ is also abelian, and $\phi$ restricted to $Y_n$ is injective. Consider $x^{\lambda}_{ij}$, where $i \neq n \neq j$. Then for any $x^{\mu}_{kn} \in X_n$, $x^{\lambda}_{ij}x^{\mu}_{kn}x^{-\lambda}_{ij}$ is $x^{\lambda\mu}_{in}x^{\mu}_{kn}$ if $j = k$, and $x^{\mu}_{kn}$ otherwise. This means that $x^{\lambda}_{ij}X_nx^{-\lambda}_{ij} \subseteq X_n$. Similarly, $x^{\lambda}_{ij}Y_nx^{-\lambda}_{ij} \subseteq Y_n$. It follows that $C_n$ is contained in the normalizer of $X_n$ as well as in the normalizer of $Y_n$. Thus, $aX_na^{-1} \subseteq X_n$, and also $aY_na^{-1} \subseteq Y_n$. Let $u \in X_n$. Then $\phi(aua^{-1}) = \phi(a)\phi(u)\phi(a)^{-1} = \phi(u)$. Since $\phi$ is injective when restricted to $X_n$, it follows that $aua^{-1} = u$ for all $u \in X_n$. Thus, $a$ commutes with each element of $X_n$. Similarly, $a$ commutes with all elements of $Y_n$. It follows that $a$ commutes with $x^{\lambda}_{in}$ and also with $x^{\lambda}_{nj}$ for all $i \neq n \neq j$. Consider $x^{\lambda}_{kl}$, where $k \neq n$ and $l \neq n$. Then $x^{\lambda}_{kl} = [x^{\lambda}_{kn}, x^{1}_{nl}]$, and so $a$ commutes with $x^{\lambda}_{kl}$ also. This means that $a$ commutes with all the generators of $St(R)$. Hence $a \in Z(St(R))$.
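The key computation in this proof, that $E^{\lambda_1}_{i_1n}\cdots E^{\lambda_r}_{i_rn}$ is the identity matrix with $\lambda_k$ placed in row $i_k$ and column $n$ (so that the $\lambda_k$ can be read off and $\phi$ is injective on $X_n$), can be illustrated as follows. This Python sketch uses $5\times 5$ integer matrices with $0$-based indices, takes $n$ to be the last index, and uses sample rows and entries of our own choosing; it also checks that the factors may be multiplied in any order.

```python
from itertools import permutations

N = 5
n = N - 1  # the distinguished index n (0-based)

def identity():
    return [[int(i == j) for j in range(N)] for i in range(N)]

def elem(i, j, lam):
    """Elementary matrix E_ij^lam = I + lam * e_ij (i != j)."""
    m = identity()
    m[i][j] = lam
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def product_of(factors):
    """Multiply the matrices E_{i n}^{lam} for (i, lam) in factors, in order."""
    m = identity()
    for i, lam in factors:
        m = matmul(m, elem(i, n, lam))
    return m

lams = {0: 3, 2: -1, 3: 7}           # rows i_k != n and entries lambda_k
factors = sorted(lams.items())
M = product_of(factors)

expected = identity()
for i, lam in lams.items():
    expected[i][n] = lam              # lambda_k sits in row i_k, column n

same_in_any_order = all(product_of(list(p)) == M for p in permutations(factors))
print(M == expected, same_in_any_order)  # True True
```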

In the light of Proposition 10.4.9, to complete the proof of Theorem 10.4.11, it is sufficient to establish the following lemma (note that $St(R)$ is perfect by relation (iii), and that $\ker\phi$ is central by Lemma 10.4.13). Lemma 10.4.14 Every central extension by $St(R)$ splits.

Proof Let
$$1 \longrightarrow H \xrightarrow{\;i\;} G \xrightarrow{\;\xi\;} St(R) \longrightarrow 1$$
be a central extension by $St(R)$. To show the existence of a splitting, it is sufficient to show the existence of a set $\{s^{\lambda}_{ij} \in G \mid \lambda \in R\}$ of elements of $G$ which satisfy the Steinberg relations and $\xi(s^{\lambda}_{ij}) = x^{\lambda}_{ij}$. Let $t$ be a section of the extension. Then
$$\xi([t(x^{\lambda}_{ik}), t(x^{1}_{kj})]) = [\xi(t(x^{\lambda}_{ik})), \xi(t(x^{1}_{kj}))] = [x^{\lambda}_{ik}, x^{1}_{kj}] = x^{\lambda}_{ij},$$
where $i$, $j$, $k$ are distinct. Further, if $t'$ is another section of the extension, then $t'(x^{\lambda}_{ik}) = u^{\lambda}_{ik}\,t(x^{\lambda}_{ik})$ for some $u^{\lambda}_{ik} \in H$. Since $H \subseteq Z(G)$, it follows that $[t'(x^{\lambda}_{ik}), t'(x^{1}_{kj})] = [t(x^{\lambda}_{ik}), t(x^{1}_{kj})]$. Thus, $[t(x^{\lambda}_{ik}), t(x^{1}_{kj})]$ is independent of the choice of a section. In fact, using the trivial commutator relations, and observing that $[t(x^{\lambda}_{ij}), t(x^{\mu}_{kl})] \in H \subseteq Z(G)$ whenever $j \neq k$ and $i \neq l$, and that $[t(x^{\lambda}_{ij}), t(x^{\mu}_{jl})]([t(x^{\lambda}_{ip}), t(x^{\mu}_{pl})])^{-1} \in H \subseteq Z(G)$ whenever $i, j, l$ are distinct and $i, p, l$ are distinct (both commutators map to $x^{\lambda\mu}_{il}$), it can be shown that $[t(x^{\lambda}_{ik}), t(x^{1}_{kj})]$ is also independent of $k$, $k \neq i$, $k \neq j$. Take $s^{\lambda}_{ij} = [t(x^{\lambda}_{ik}), t(x^{1}_{kj})]$. Using the basic commutator relations, it may be further verified that $\{s^{\lambda}_{ij}\}$ respects the Steinberg relations.

Bibliography

1. Artin, M.: Algebra. Pearson Education (2008)
2. Artin, E.: Galois Theory, New edn. Dover Publications (1998)
3. Birkhoff, G., Mac Lane, S.: A Survey of Modern Algebra, 3rd edn. Macmillan, New York (1965)
4. Curtis, C.W., Reiner, I.: Representation Theory of Finite Groups and Associative Algebras, New edn. AMS (2006)
5. Curtis, M.L.: Matrix Groups. Springer (1984)
6. Fulton, W., Harris, J.: Representation Theory. GTM, Springer, Berlin (1999)
7. Halmos, P.R.: Linear Algebra. UTM, Springer, Berlin (1958)
8. Herstein, I.N.: Topics in Algebra, 2nd edn. Wiley, New York (1975)
9. Hoffman, K., Kunze, R.: Linear Algebra, 2nd edn. Prentice-Hall (1998)
10. Hungerford, T.W.: Algebra, 8th edn. GTM, Springer, Berlin (2003)
11. Jacobson, N.: Basic Algebra I, II. Freeman, San Francisco (1980)
12. Lang, S.: Algebra, 2nd edn. Addison-Wesley, Boston, MA (1965)
13. Morandi, P.: Field and Galois Theory. GTM, Springer, Berlin (1996)
14. Robinson, D.J.S.: A Course in the Theory of Groups, 2nd edn. Springer (1995)
15. Rotman, J.J.: An Introduction to the Theory of Groups, 4th edn. GTM, Springer, Berlin (1999)
16. Saikia, P.: Linear Algebra. Pearson (2009)
17. Serre, J.P.: Linear Representations of Finite Groups. GTM, Springer, Berlin (1996)
18. Suzuki, M.: Group Theory I and II. Springer (1980)

© Springer Nature Singapore Pte Ltd. 2017 427 R. Lal, Algebra 2, Infosys Science Foundation Series in Mathematical Sciences, DOI 10.1007/978-981-10-4256-0 Index

A
Abel–Ruffini, 327
Abelian extension, 311
Abstract Kernel, 378
Adjoint of a linear transformation, 117
Adjoint of a matrix, 133
Affine lines, 114
Affine planes, 114
Algebra, 34
Algebraic closure, 270, 288
Algebraic element, 266, 267
Algebraic extension, 269
Algebraically closed field, 288
Algebraically dependent, 266
Alternating map, 137, 140
Angle, 103
Artin-Schreier, 314
Augmented matrix, 41

B
Baer Sum, 389
Basis, 18
Bessel's inequality, 106
Bilinear form, 176
Block multiplication, 39
Brauer representation, 331
Burnside, 341, 343
Burnside Theorem, 359

C
Cardano solution, 327
Cauchy–Schwarz, 100
Central extension, 399
Chain conditions, 229
Character, 279
Character afforded by representation, 351
Characteristic of a field, 6
Characteristic polynomial, 150
Characteristic value, 150
Characteristic vector, 150
Character ring, 352
Coefficient matrix, 41
Co-factor matrix, 133
Column rank, 43
Column space, 43
Companion matrix, 215
Complete Group, 383
Composite field extension, 275
Congruent bilinear forms, 178
Congruent reduction, 65
Connecting homomorphism, 400
Consistent, 41
Constructible number, 318
Constructible points, 318
Coupling, 378
Cramer's rule, 134
Crossed homomorphism, 316
Cycles, 136
Cyclic extension, 311
Cyclic module, 205
Cyclotomic extension, 311

D
Dedekind domain, 250
Dedekind theorem, 279
Degree of extension, 265
Degree of separability, 295
Determinant of a matrix, 131
Diagonalisable, 153
Dimension, 19
Direct sum of modules, 200

Direct sum of spaces, 23
Direct sum representation, 346
Divisible group, 247
Dual basis, 83
Dual space, 80

E
Eigenspace, 155
Eigenvalue, 150
Eigenvector, 150
Elementary matrices, 54
Elementary operations, 44
Equivalence of extensions, 370
Equivalent system, 43
Euclidean inner product, 98
Euclidean metric, 102
Euclidean n-space, 8
Even permutation, 139
Exact sequence, 238, 368
Exponent, 208
Extension of a group by a group, 368
Exterior algebra, 258
Exterior power, 256
Exterior power representation, 346

F
Factor system associated to an extension, 373
Factor systems, 373
Ferrari solution, 328
Field, 1
Field extension, 265
Finitely generated space, 13
Five lemma, 238
Five lemma for groups, 370
Five-term exact sequence, 413
Fixed field, 276
Free abelian groups, 205
Free central extension, 401
Free module, 203
Free variable, 45
Frobenius reciprocity, 362
Function field, 272
Fundamental Exact Sequence, 400
Fundamental theorem of algebra, 308
Fundamental theorem of Galois theory, 305
Fundamental theorem of homomorphism, 75

G
Galois extension, 276
Galois group, 276
Gaussian elimination, 44
General linear group, 78
Geometry of orthogonal transformation, 167
Gram–Schmidt process, 107
Grothendieck group, 259
Group algebra, 336
Group of rigid motions, 123

H
Hermitian linear transformation, 118
Hermitian matrix, 37
Hermitian conjugate, 35
Hilbert Basis Theorem, 232
Hilbert Satz 90, 317
Homogeneous system, 42
Hyperbolic metric, 175
Hyperplane, 28
Hyperplane reflection, 168

I
Idempotent linear transformation, 94
Induced character, 361
Induced representation, 361
Injective module, 244
Inner product, 97
Inner product space, 97
Inseparable extension, 294
Invariant subspaces, 150
Inverse of a matrix, 38
Irreducible representation, 346
Isometry, 120

J
Jacobson Density, 340
Jordan block, 217
Jordan–Chevalley, 220
Jordan–Chevalley decomposition, 221

K
K-automorphism, 276
Kernel of a linear transformation, 74
K-isomorphism, 276

L
Length of a vector, 100
Linear combination, 13
Linear functional, 80
Linear independence, 14
Linear representation, 346

Linear space, 9
Linear transformation, 73
Local ring, 250
Lorentz Group, 193
Lorentz inner product, 193
Lorentz matrix, 193
Lorentz Transformation, 193
LU factorization, 58

M
Mackey irreducibility criteria, 366
Maschke Theorem, 336
Matrices, 31
Matrix addition, 32
Matrix multiplication, 34
Matrix of transformation, 87
Matrix representation map, 85
Milnor group, 419, 423
Minimum polynomial, 214
Minimum polynomial of a linear transformation, 94
Minkowski space, 8
Modular representation, 331
Multilinear map, 139

N
Negative bilinear form, 181
Nilpotent endomorphism, 91
Noether equation, 316
Noetherian module, 230
Noetherian ring, 230
Non-abelian exterior power, 414
Nondegenerate bilinear form, 180
Nonsingular bilinear form, 180
Nonsingular matrix, 37
Norm of a field extension, 315
Normal basis theorem, 309
Normal closure, 293
Normal extension, 284, 291
Normal form, 60
Normal transformation, 118
Null space, 42, 74
Nullity, 42
Nullity of a linear transformation, 83
Number field, 250

O
Obstruction, 392
Obstruction map, 394
Odd permutation, 139
Ordered basis, 21
Orthogonal complement, 115
Orthogonal group, 110
Orthogonal matrix, 109
Orthogonal Projection, 115
Orthogonal reduction, 186
Orthogonal sum, 168
Orthogonality of vectors, 104
Orthogonality relation, 352, 354
Orthonormal basis, 106
Orthonormal set, 106

P
Parallelogram Law, 104
Perfect field, 300
Period of an element, 206
Permutation, 135
Pivot, 45
Pivot variable, 45
Polar decomposition, 164
Positive bilinear form, 181
Positive definite symmetric matrix, 126
Prime fields, 5
Primitive element, 270
Principal minors, 151
Projective General Linear Group, 399
Projective module, 243
Projective representation, 399
Proper value, 150
Proper vector, 150
Purely inseparable extension, 303
Pythagoras Theorem, 104

Q
Quadratic form, 186
Quotient modules, 201

R
Radical extension, 324
Rank, 43, 50
Rank of a bilinear form, 180
Rank of a linear transformation, 83
Rational canonical form, 215
Reduced row echelon form, 45
Reflection about a hyperplane, 129
Reflexive, 81
Restricted Burnside Conjecture, 344
Rigid motion, 122
Root system, 129
Rotation group, 174
Row rank, 43
Row space, 43

S
Scalar matrix, 36
Schreier extensions, 370
Schur, 344
Schur Lemma, 332
Schur multiplier, 408
Schur-Zassenhaus, 390
Second co-homology group, 386
Self-adjoint, 37, 118
Semi-direct product, 382
Semi-simple linear transformation, 153
Semi-simple module, 334
Semi-simple ring, 334
Separable closure, 296
Separable element, 294
Separable extension, 284
Separable polynomial, 294
Set of generators, 13
Shortest distance, 112
Signature of a real symmetric matrix, 184
Signature of a symmetric bilinear form, 184
Similar matrices, 88
Simple extension, 270
Simple module, 331
Singular value, 165
Singular value decomposition, 166
Skew-Hermitian linear transformation, 118
Skew-Hermitian matrix, 37
Skew-symmetric bilinear form, 184
Skew-symmetric matrix, 36
Solution space, 42
Space of linear transformations, 79
Space of matrices, 33
Spectral theorem, 161
Spherical n-space, 175
Split exact sequence, 239
Split extension, 382
Splitting, 382
Splitting field, 285
Stably isomorphic, 259
Steinberg group, 423
Steinberg relations, 55
Structure theorem of semi-simple rings, 337
Subfield, 4
Sub-representation, 346
Subspace, 11
Subspace generated by a set, 13
Sylvester law, 183
Symmetric bilinear form, 181
Symmetric matrix, 36
Symmetric power of a representation, 346
System of linear equations, 40

T
Tensor algebra, 256
Tensor product, 251
Tensor product representation, 346
Tensors, 256
Third co-homology group, 394
Torsion-free module, 197
Torsion module, 197
Trace, 151
Trace form, 95
Trace of a field extension, 315
Transcendental, 267
Transpose of a linear transformation, 80
Transpose of a matrix, 34
Transpositions, 136
Transvections, 55
Triangulable matrix, 157

U
Unitary group, 110
Unitary matrix, 109
Unitary space, 99
Unitary transformation, 118
Universal central extension, 420

V
Vandermonde matrix, 145
Vector space, 9

W
Whitehead group, 262

Z
Zelmanov, 344