
Graduate Texts in Mathematics 135

Editorial Board: S. Axler, F.W. Gehring, K.A. Ribet


Steven Roman

Advanced Linear Algebra

Second Edition

Springer

Steven Roman
University of California, Irvine
Irvine, California 92697-3875
USA
[email protected]

Editorial Board:

S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA
[email protected]

F.W. Gehring
Mathematics Department
East Hall
University of Michigan
Ann Arbor, MI 48109
USA
[email protected]

K.A. Ribet
Mathematics Department
University of California, Berkeley
Berkeley, CA 94720-3840
USA
[email protected]

Mathematics Subject Classification (2000): 15-xx

Library of Congress Cataloging-in-Publication Data Roman, Steven. Advanced linear algebra / Steven Roman.--2nd ed. p. cm. Includes bibliographical references and index. ISBN 0-387-24766-1 (acid-free paper) 1. Algebras, Linear. I. Title. QA184.2.R66 2005 512'.5-- dc22 2005040244

ISBN 0-387-24766-1 Printed on acid-free paper.

© 2005 Steven Roman. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2

springer.com

To Donna
and to my poker buddies Rachelle, Carol and Dan

Preface to the Second Edition

Let me begin by thanking the readers of the first edition for their many helpful comments and suggestions. The second edition represents a major change from the first edition. Indeed, one might say that it is a totally new book, with the exception of the general range of topics covered.

The text has been completely rewritten. I hope that an additional 12 years and roughly 20 books' worth of experience have enabled me to improve the quality of my exposition. Also, the exercise sets have been completely rewritten.

The second edition contains two new chapters: a chapter on convexity, separation and positive solutions to linear systems (Chapter 15) and a chapter on the QR decomposition, singular values and pseudoinverses (Chapter 17). The treatments of tensor products and the umbral calculus have been greatly expanded and I have included discussions of determinants (in the chapter on tensor products), the complexification of a real vector space, Schur's lemma and Geršgorin disks.

Steven Roman
Irvine, California
February 2005

Preface to the First Edition

This book is a thorough introduction to linear algebra, for the graduate or advanced undergraduate student. Prerequisites are limited to a knowledge of the basic properties of matrices and determinants. However, since we cover the basics of vector spaces and linear transformations rather rapidly, a prior course in linear algebra (even at the sophomore level), along with a certain measure of “mathematical maturity,” is highly desirable.

Chapter 0 contains a summary of certain topics in modern algebra that are required for the sequel. This chapter should be skimmed quickly and then used primarily as a reference. Chapters 1–3 contain a discussion of the basic properties of vector spaces and linear transformations.

Chapter 4 is devoted to a discussion of modules, emphasizing a comparison between the properties of modules and those of vector spaces. Chapter 5 provides more on modules. The main goals of this chapter are to prove that any two bases of a free module have the same cardinality and to introduce Noetherian modules. However, the instructor may simply skim over this chapter, omitting all proofs. Chapter 6 is devoted to the theory of modules over a principal ideal domain, establishing the cyclic decomposition theorem for finitely generated modules. This theorem is the key to the structure theorems for finite-dimensional linear operators, discussed in Chapters 7 and 8.

Chapter 9 is devoted to real and complex inner product spaces. The emphasis here is on the finite-dimensional case, in order to arrive as quickly as possible at the finite-dimensional spectral theorem for normal operators, in Chapter 10. However, we have endeavored to state as many results as is convenient for vector spaces of arbitrary dimension.

The second part of the book consists of a collection of independent topics, with the one exception that Chapter 13 requires Chapter 12. Chapter 11 is on metric vector spaces, where we describe the structure of symplectic and orthogonal geometries over various base fields. Chapter 12 contains enough material on metric spaces to allow a unified treatment of topological issues for the basic Hilbert space theory of Chapter 13. The rather lengthy proof that every metric space can be embedded in its completion may be omitted.

Chapter 14 contains a brief introduction to tensor products. In order to motivate the universal property of tensor products, without getting too involved in categorical terminology, we first treat both free vector spaces and the familiar direct sum, in a universal way. Chapter 15 [Chapter 16 in the second edition] is on affine geometry, emphasizing algebraic, rather than geometric, concepts.

The final chapter provides an introduction to a relatively new subject, called the umbral calculus. This is an algebraic theory used to study certain types of functions that play an important role in applied mathematics. We give only a brief introduction to the subject, emphasizing the algebraic aspects, rather than the applications. This is the first time that this subject has appeared in a true textbook.

One final comment. Unless otherwise mentioned, omission of a proof in the text is a tacit suggestion that the reader attempt to supply one.

Steven Roman
Irvine, California

Contents

Preface to the Second Edition, vii
Preface to the First Edition, ix

Preliminaries, 1
Part 1 Preliminaries, 1
Part 2 Algebraic Structures, 16

Part I—Basic Linear Algebra, 31

1 Vector Spaces, 33
Vector Spaces, 33
Subspaces, 35
Direct Sums, 38
Spanning Sets and Linear Independence, 41
The Dimension of a Vector Space, 44
Ordered Bases and Coordinate Matrices, 47
The Row and Column Spaces of a Matrix, 48
The Complexification of a Real Vector Space, 49
Exercises, 51

2 Linear Transformations, 55
Linear Transformations, 55
Isomorphisms, 57
The Kernel and Image of a Linear Transformation, 57
Linear Transformations from $F^n$ to $F^m$, 59
The Rank Plus Nullity Theorem, 59
Change of Basis Matrices, 60
The Matrix of a Linear Transformation, 61
Change of Bases for Linear Transformations, 63
Equivalence of Matrices, 64
Similarity of Matrices, 65
Similarity of Operators, 66
Invariant Subspaces and Reducing Pairs, 68
Topological Vector Spaces, 68
Linear Operators on $V^{\mathbb{C}}$, 71
Exercises, 72

3 The Isomorphism Theorems, 75
Quotient Spaces, 75
The Universal Property of Quotients and the First Isomorphism Theorem, 77
Quotient Spaces, Complements and Codimension, 79
Additional Isomorphism Theorems, 80
Linear Functionals, 82
Dual Bases, 83
Reflexivity, 84
Annihilators, 86
Operator Adjoints, 88
Exercises, 90

4 Modules I: Basic Properties, 93
Modules, 93
Motivation, 93
Submodules, 95
Spanning Sets, 96
Linear Independence, 98
Torsion Elements, 99
Annihilators, 99
Free Modules, 99
Homomorphisms, 100
Quotient Modules, 101
The Correspondence and Isomorphism Theorems, 102
Direct Sums and Direct Summands, 102
Modules Are Not As Nice As Vector Spaces, 106
Exercises, 106

5 Modules II: Free and Noetherian Modules, 109
The Rank of a Free Module, 109
Free Modules and Epimorphisms, 114
Noetherian Modules, 115
The Hilbert Basis Theorem, 118
Exercises, 119

6 Modules over a Principal Ideal Domain, 121
Annihilators and Orders, 121
Cyclic Modules, 122
Free Modules over a Principal Ideal Domain, 123
Torsion-Free and Free Modules, 125
Prelude to Decomposition: Cyclic Modules, 126
The First Decomposition, 127
A Look Ahead, 127
The Primary Decomposition, 128
The Cyclic Decomposition of a Primary Module, 130
The Primary Cyclic Decomposition Theorem, 134
The Invariant Factor Decomposition, 135
Exercises, 138

7 The Structure of a Linear Operator, 141
A Brief Review, 141
The Module Associated with a Linear Operator, 142
Orders and the Minimal Polynomial, 144
Cyclic Submodules and Cyclic Subspaces, 145
Summary, 147
The Decomposition of $V_{\tau}$, 147
The Rational Canonical Form, 148
Exercises, 151

8 Eigenvalues and Eigenvectors, 153
The Characteristic Polynomial of an Operator, 153
Eigenvalues and Eigenvectors, 155
Geometric and Algebraic Multiplicities, 157
The Jordan Canonical Form, 158
Triangularizability and Schur's Lemma, 160
Diagonalizable Operators, 165
Projections, 166
The Algebra of Projections, 167
Resolutions of the Identity, 170
Spectral Resolutions, 172
Projections and Invariance, 173
Exercises, 174

9 Real and Complex Inner Product Spaces, 181
Norm and Distance, 183
Isometries, 186
Orthogonality, 187
Orthogonal and Orthonormal Sets, 188
The Projection Theorem and Best Approximations, 192
Orthogonal Direct Sums, 194
The Riesz Representation Theorem, 195
Exercises, 196

10 Structure Theory for Normal Operators, 201
The Adjoint of a Linear Operator, 201
Unitary Diagonalizability, 204
Normal Operators, 205
Special Types of Normal Operators, 207
Self-Adjoint Operators, 208
Unitary Operators and Isometries, 210
The Structure of Normal Operators, 215
Matrix Versions, 222
Orthogonal Projections, 223
Orthogonal Resolutions of the Identity, 226
The Spectral Theorem, 227
Spectral Resolutions and Functional Calculus, 228
Positive Operators, 230
The Polar Decomposition of an Operator, 232
Exercises, 234

Part II—Topics, 235

11 Metric Vector Spaces: The Theory of Bilinear Forms, 239
Symmetric, Skew-Symmetric and Alternate Forms, 239
The Matrix of a Bilinear Form, 242
Quadratic Forms, 244
Orthogonality, 245
Linear Functionals, 248
Orthogonal Complements and Orthogonal Direct Sums, 249
Isometries, 252
Hyperbolic Spaces, 253
Nonsingular Completions of a Subspace, 254
The Witt Theorems: A Preview, 256
The Classification Problem for Metric Vector Spaces, 257
Symplectic Geometry, 258
The Structure of Orthogonal Geometries: Orthogonal Bases, 264
The Classification of Orthogonal Geometries: Canonical Forms, 266
The Orthogonal Group, 272
The Witt Theorems for Orthogonal Geometries, 275
Maximal Hyperbolic Subspaces of an Orthogonal Geometry, 277
Exercises, 279

12 Metric Spaces, 283
The Definition, 283
Open and Closed Sets, 286
Convergence in a Metric Space, 287
The Closure of a Set, 288
Dense Subsets, 290
Continuity, 292
Completeness, 293
Isometries, 297
The Completion of a Metric Space, 298
Exercises, 303

13 Hilbert Spaces, 307
A Brief Review, 307
Hilbert Spaces, 308
Infinite Series, 312
An Approximation Problem, 313
Hilbert Bases, 317
Fourier Expansions, 318
A Characterization of Hilbert Bases, 328
Hilbert Dimension, 328
A Characterization of Hilbert Spaces, 329
The Riesz Representation Theorem, 331
Exercises, 334

14 Tensor Products, 337
Universality, 337
Bilinear Maps, 341
Tensor Products, 343
When Is a Tensor Product Zero?, 348
Coordinate Matrices and Rank, 350
Characterizing Vectors in a Tensor Product, 354
Defining Linear Transformations on a Tensor Product, 355
The Tensor Product of Linear Transformations, 357
Change of Base Field, 359
Multilinear Maps and Iterated Tensor Products, 363
Tensor Spaces, 366
Special Multilinear Maps, 371
Graded Algebras, 372
The Symmetric Tensor Algebra, 374
The Antisymmetric Tensor Algebra: The Exterior Product Space, 380
The Determinant, 387
Exercises, 391

15 Positive Solutions to Linear Systems: Convexity and Separation, 395
Convex, Closed and Compact Sets, 398
Convex Hulls, 399
Linear and Affine Hyperplanes, 400
Separation, 402
Exercises, 407

16 Affine Geometry, 409
Affine Geometry, 409
Affine Combinations, 411
Affine Hulls, 412
The Lattice of Flats, 413
Affine Independence, 416
Affine Transformations, 417
Projective Geometry, 419
Exercises, 423

17 Operator Factorizations: QR and Singular Value, 425
The QR Decomposition, 425
Singular Values, 428
The Moore–Penrose Generalized Inverse, 430
Least Squares Approximation, 433
Exercises, 434

18 The Umbral Calculus, 437
Formal Power Series, 437
The Umbral Algebra, 439
Formal Power Series as Linear Operators, 443
Sheffer Sequences, 446
Examples of Sheffer Sequences, 454
Umbral Operators and Umbral Shifts, 456
Continuous Operators on the Umbral Algebra, 458
Operator Adjoints, 459
Umbral Operators and Automorphisms of the Umbral Algebra, 460
Umbral Shifts and Derivations of the Umbral Algebra, 465
The Transfer Formulas, 470
A Final Remark, 471
Exercises, 472

References, 473
Index, 475

Chapter 0
Preliminaries

In this chapter, we briefly discuss some topics that are needed for the sequel. This chapter should be skimmed quickly and used primarily as a reference.

Part 1 Preliminaries

Multisets

The following simple concept is much more useful than its infrequent appearance would indicate.

Definition Let $S$ be a nonempty set. A multiset $M$ with underlying set $S$ is a set of ordered pairs
$$M = \{(s_i, n_i) \mid s_i \in S,\ n_i \in \mathbb{Z}^+,\ s_i \ne s_j \text{ for } i \ne j\}$$
where $\mathbb{Z}^+ = \{1, 2, \ldots\}$. The number $n_i$ is referred to as the multiplicity of the element $s_i$ in $M$. If the underlying set of a multiset is finite, we say that the multiset is finite. The size of a finite multiset $M$ is the sum of the multiplicities of all of its elements.

For example, $M = \{(1, 2), (2, 3), (3, 1)\}$ is a multiset with underlying set $S = \{1, 2, 3\}$. The element $1$ has multiplicity $2$. One often writes out the elements of a multiset according to multiplicities, as in $M = \{1, 1, 2, 2, 2, 3\}$.
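For readers who want to experiment, a finite multiset is conveniently modeled as a map from the underlying set to multiplicities. A minimal Python sketch using the standard library (the particular multiset is the example above):

```python
from collections import Counter

# The multiset {1,1,2,2,2,3}, stored as element -> multiplicity.
M = Counter({1: 2, 2: 3, 3: 1})

underlying_set = set(M)             # {1, 2, 3}
size = sum(M.values())              # sum of multiplicities: 6
written_out = sorted(M.elements())  # [1, 1, 2, 2, 2, 3]

# Two multisets are equal iff their multiplicity functions agree.
assert M == Counter([1, 1, 2, 2, 2, 3])
```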

Of course, two multisets are equal if their underlying sets are equal and if the multiplicity of each element in the common underlying set is the same in both multisets.

Matrices

The set of $m \times n$ matrices with entries in a field $F$ is denoted by $\mathcal{M}_{m,n}(F)$, or by $\mathcal{M}_{m,n}$ when the field does not require mention. The set $\mathcal{M}_{n,n}(F)$ is denoted by $\mathcal{M}_n(F)$ or $\mathcal{M}_n$. If $A \in \mathcal{M}$, the $(i,j)$th entry of $A$ will be denoted by $A_{i,j}$. The identity matrix of size $n \times n$ is denoted by $I_n$. The elements of the base field $F$ are called scalars. We expect that the reader is familiar with the basic properties of matrices, including matrix addition and multiplication.

The main diagonal of an $m \times n$ matrix $A$ is the sequence of entries
$$A_{1,1}, A_{2,2}, \ldots, A_{k,k}$$
where $k = \min\{m, n\}$.

Definition The transpose of $A \in \mathcal{M}_{m,n}$ is the matrix $A^t$ defined by
$$(A^t)_{i,j} = A_{j,i}$$
A matrix $A$ is symmetric if $A^t = A$ and skew-symmetric if $A^t = -A$.

Theorem 0.1 (Properties of the transpose) Let $A, B \in \mathcal{M}_{m,n}$. Then
1) $(A^t)^t = A$
2) $(A + B)^t = A^t + B^t$
3) $(rA)^t = rA^t$ for all $r \in F$
4) $(AB)^t = B^t A^t$, provided that the product $AB$ is defined
5) $\det(A^t) = \det(A)$.

Partitioning and Matrix Multiplication

Let $M$ be a matrix of size $m \times n$. If $B \subseteq \{1, \ldots, m\}$ and $C \subseteq \{1, \ldots, n\}$ then the submatrix $M[B, C]$ is the matrix obtained from $M$ by keeping only the rows with index in $B$ and the columns with index in $C$. Thus, all other rows and columns are discarded and $M[B, C]$ has size $|B| \times |C|$.

Suppose that 4CCÁ and 5 Á . Let

1) F ~¸) ÁÃÁ) ¹ be a partition of ¸ÁÃÁ¹ 2) G ~¸* ÁÃÁ* ¹ be a partition of ¸ÁÃÁ¹ 3) H ~¸+ ÁÃÁ+ ¹ be a partition of ¸ÁÃÁ¹

(Partitions are defined formally later in this chapter.) Then it is a very useful fact that matrix multiplication can be performed at the block level as well as at the entry level. In particular, we have

´45µ´) Á + µ ~ 4´)  Á * µ5´*  Á + µ

* G When the partitions in question contain only single-element blocks, this is precisely the usual formula for matrix multiplication

 ´45µÁ ~ 4 Á 5 Á ~ Preliminaries 3
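As a quick numerical illustration of the block formula, here is a NumPy sketch; the sizes and the partitions into index blocks are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-5, 5, size=(4, 6))
N = rng.integers(-5, 5, size=(6, 3))

B = [range(0, 2), range(2, 4)]   # partition of the row indices of M
C = [range(0, 3), range(3, 6)]   # partition of the middle indices
D = [range(0, 1), range(1, 3)]   # partition of the column indices of N

P = M @ N
for Bi in B:
    for Dj in D:
        # [MN][Bi, Dj] = sum over blocks Ck of M[Bi, Ck] N[Ck, Dj]
        block = sum(M[np.ix_(Bi, Ck)] @ N[np.ix_(Ck, Dj)] for Ck in C)
        assert np.array_equal(P[np.ix_(Bi, Dj)], block)
```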

Block Matrices

It will be convenient to introduce the notational device of a block matrix. If $B_{i,j}$ are matrices of the appropriate sizes then by the block matrix

$$M = \begin{pmatrix} B_{1,1} & B_{1,2} & \cdots & B_{1,n} \\ \vdots & \vdots & & \vdots \\ B_{m,1} & B_{m,2} & \cdots & B_{m,n} \end{pmatrix}_{\text{block}}$$

we mean the matrix whose upper left submatrix is $B_{1,1}$, and so on. Thus, the $B_{i,j}$'s are submatrices of $M$ and not entries. A square matrix of the form

$$M = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{pmatrix}_{\text{block}}$$

where each $B_i$ is square and $0$ is a zero submatrix, is said to be a block diagonal matrix.

Elementary Row Operations

Recall that there are three types of elementary row operations. Type 1 operations consist of multiplying a row of $A$ by a nonzero scalar. Type 2 operations consist of interchanging two rows of $A$. Type 3 operations consist of adding a scalar multiple of one row of $A$ to another row of $A$.

If we perform an elementary operation of type $k$ ($k = 1, 2, 3$) to an identity matrix $I_n$, the result is called an elementary matrix of type $k$. It is easy to see that all elementary matrices are invertible.

In order to perform an elementary row operation on $A \in \mathcal{M}_{m,n}$, we can perform that operation on the identity $I_m$ to obtain an elementary matrix $E$ and then take the product $EA$. Note that multiplying $A$ on the right by $E$ has the effect of performing column operations.

Definition A matrix $R$ is said to be in reduced row echelon form if
1) All rows consisting only of $0$'s appear at the bottom of the matrix.
2) In any nonzero row, the first nonzero entry is a $1$. This entry is called a leading entry.
3) For any two consecutive rows, the leading entry of the lower row is to the right of the leading entry of the upper row.
4) Any column that contains a leading entry has $0$'s in all other positions.

Here are the basic facts concerning reduced row echelon form.

Theorem 0.2 Matrices $A, B \in \mathcal{M}_{m,n}$ are row equivalent, denoted by $A \equiv B$, if either one can be obtained from the other by a series of elementary row operations.
1) Row equivalence is an equivalence relation. That is,
a) $A \equiv A$
b) $A \equiv B \Rightarrow B \equiv A$
c) $A \equiv B$, $B \equiv C \Rightarrow A \equiv C$.
2) A matrix $A$ is row equivalent to one and only one matrix $R$ that is in reduced row echelon form. The matrix $R$ is called the reduced row echelon form of $A$. Furthermore,
$$A = E_1 \cdots E_k R$$
where $E_1, \ldots, E_k$ are the elementary matrices required to reduce $A$ to reduced row echelon form.
3) $A$ is invertible if and only if its reduced row echelon form is an identity matrix. Hence, a matrix is invertible if and only if it is the product of elementary matrices.
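The reduction itself is the usual Gauss–Jordan elimination, built from the three elementary row operations. A minimal sketch over the rationals using exact arithmetic (the function name is mine; a library such as SymPy provides the same computation):

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]     # type 2: swap rows
        p = A[r][c]
        A[r] = [x / p for x in A[r]]        # type 1: make the leading entry 1
        for i in range(rows):               # type 3: clear the pivot column
            if i != r and A[i][c] != 0:
                A[i] = [x - A[i][c] * y for x, y in zip(A[i], A[r])]
        r += 1
    return A

# Rows of the result (as Fractions): [1, 2, 0], [0, 0, 1], [0, 0, 0]
print(rref([[1, 2, 3], [2, 4, 7], [1, 2, 4]]))
```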

The following definition is probably well known to the reader.

Definition A square matrix is upper triangular if all of its entries below the main diagonal are $0$. Similarly, a square matrix is lower triangular if all of its entries above the main diagonal are $0$. A square matrix is diagonal if all of its entries off the main diagonal are $0$.

Determinants

We assume that the reader is familiar with the following basic properties of determinants.

Theorem 0.3 Let $A \in \mathcal{M}_n(F)$. Then $\det(A)$ is an element of $F$. Furthermore,
1) For any $B \in \mathcal{M}_n(F)$,
$$\det(AB) = \det(A)\det(B)$$
2) $A$ is nonsingular (invertible) if and only if $\det(A) \ne 0$.
3) The determinant of an upper triangular or lower triangular matrix is the product of the entries on its main diagonal.
4) If a square matrix $M$ has the block diagonal form
$$M = \begin{pmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{pmatrix}_{\text{block}}$$
then $\det(M) = \prod_i \det(B_i)$.

Polynomials

The set of all polynomials in the variable $x$ with coefficients from a field $F$ is denoted by $F[x]$. If $p(x) \in F[x]$, we say that $p(x)$ is a polynomial over $F$. If
$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$
is a polynomial with $a_n \ne 0$ then $a_n$ is called the leading coefficient of $p(x)$ and the degree of $p(x)$ is $n$, written $\deg p(x) = n$. For convenience, the degree of the zero polynomial is $-\infty$. A polynomial is monic if its leading coefficient is $1$.

Theorem 0.4 (Division algorithm) Let $f(x), g(x) \in F[x]$ where $\deg g(x) > 0$. Then there exist unique polynomials $q(x), r(x) \in F[x]$ for which
$$f(x) = q(x)g(x) + r(x)$$
where $r(x) = 0$ or $0 \le \deg r(x) < \deg g(x)$.

If $p(x)$ divides $q(x)$, that is, if there exists a polynomial $a(x)$ for which $q(x) = a(x)p(x)$, then we write $p(x) \mid q(x)$.

Theorem 0.5 Let $p(x), q(x) \in F[x]$. The greatest common divisor of $p(x)$ and $q(x)$, denoted by $\gcd(p(x), q(x))$, is the unique monic polynomial $d(x)$ over $F$ for which
1) $d(x) \mid p(x)$ and $d(x) \mid q(x)$
2) if $e(x) \mid p(x)$ and $e(x) \mid q(x)$ then $e(x) \mid d(x)$.
Furthermore, there exist polynomials $a(x)$ and $b(x)$ over $F$ for which
$$\gcd(p(x), q(x)) = a(x)p(x) + b(x)q(x)$$
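The gcd, together with the polynomials $a(x)$ and $b(x)$, can be computed by the extended Euclidean algorithm, exactly as for integers. A sketch using SymPy's gcdex over $\mathbb{Q}[x]$ (the particular polynomials are my example):

```python
from sympy import symbols, gcdex, expand

x = symbols('x')
p = x**3 - 1
q = x**2 - 1

# gcdex returns (a, b, d) with a*p + b*q = d = gcd(p, q), d monic.
a, b, d = gcdex(p, q, x)
print(d)                            # x - 1
assert expand(a*p + b*q - d) == 0   # the identity of Theorem 0.5
```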

Definition The polynomials $p(x), q(x) \in F[x]$ are relatively prime if $\gcd(p(x), q(x)) = 1$. In particular, $p(x)$ and $q(x)$ are relatively prime if and only if there exist polynomials $a(x)$ and $b(x)$ over $F$ for which
$$a(x)p(x) + b(x)q(x) = 1$$

Definition A nonconstant polynomial $p(x) \in F[x]$ is irreducible if whenever $p(x) = f(x)g(x)$, then one of $f(x)$ and $g(x)$ must be constant.

The following two theorems support the view that irreducible polynomials behave like prime numbers.

Theorem 0.6 A nonconstant polynomial $p(x)$ is irreducible if and only if it has the property that whenever $p(x) \mid f(x)g(x)$ then either $p(x) \mid f(x)$ or $p(x) \mid g(x)$.

Theorem 0.7 Every nonconstant polynomial in $F[x]$ can be written as a product of irreducible polynomials. Moreover, this expression is unique up to order of the factors and multiplication by a scalar.

Functions

To set our notation, we should make a few comments about functions.

Definition Let $f\colon S \to T$ be a function from a set $S$ to a set $T$.
1) The domain of $f$ is the set $S$.
2) The image or range of $f$ is the set $\operatorname{im}(f) = \{f(s) \mid s \in S\}$.
3) $f$ is injective (one-to-one), or an injection, if $x \ne y \Rightarrow f(x) \ne f(y)$.
4) $f$ is surjective (onto $T$), or a surjection, if $\operatorname{im}(f) = T$.
5) $f$ is bijective, or a bijection, if it is both injective and surjective.
6) Assuming that $0 \in T$, the support of $f$ is
$$\operatorname{supp}(f) = \{s \in S \mid f(s) \ne 0\}$$
If $f\colon S \to T$ is injective then its inverse $f^{-1}\colon \operatorname{im}(f) \to S$ exists and is well-defined as a function on $\operatorname{im}(f)$.

It will be convenient to apply $f$ to subsets of $S$ and $T$. In particular, if $X \subseteq S$ and if $Y \subseteq T$, we set
$$f(X) = \{f(x) \mid x \in X\} \quad\text{and}\quad f^{-1}(Y) = \{s \in S \mid f(s) \in Y\}$$
Note that the latter is defined even if $f$ is not injective.

Let $f\colon S \to T$. If $A \subseteq S$, the restriction of $f$ to $A$ is the function $f|_A\colon A \to T$ defined by
$$f|_A(a) = f(a)$$
for all $a \in A$. Clearly, the restriction of an injective map is injective.

Equivalence Relations

The concept of an equivalence relation plays a major role in the study of matrices and linear transformations.

Definition Let $S$ be a nonempty set. A binary relation $\sim$ on $S$ is called an equivalence relation on $S$ if it satisfies the following conditions:
1) (Reflexivity) $a \sim a$ for all $a \in S$.
2) (Symmetry) $a \sim b \Rightarrow b \sim a$ for all $a, b \in S$.
3) (Transitivity) $a \sim b,\ b \sim c \Rightarrow a \sim c$ for all $a, b, c \in S$.

Definition Let $\sim$ be an equivalence relation on $S$. For $a \in S$, the set of all elements equivalent to $a$ is denoted by
$$[a] = \{b \in S \mid b \sim a\}$$
and called the equivalence class of $a$.

Theorem 0.8 Let $\sim$ be an equivalence relation on $S$. Then
1) $b \in [a] \Leftrightarrow a \in [b] \Leftrightarrow [a] = [b]$
2) For any $a, b \in S$, we have either $[a] = [b]$ or $[a] \cap [b] = \emptyset$.

Definition A partition of a nonempty set $S$ is a collection $\{A_1, \ldots, A_n\}$ of nonempty subsets of $S$, called the blocks of the partition, for which
1) $A_i \cap A_j = \emptyset$ for all $i \ne j$
2) $S = A_1 \cup \cdots \cup A_n$.

The following theorem sheds considerable light on the concept of an equivalence relation.

Theorem 0.9
1) Let $\sim$ be an equivalence relation on $S$. Then the set of distinct equivalence classes with respect to $\sim$ are the blocks of a partition of $S$.
2) Conversely, if $\mathcal{F}$ is a partition of $S$, the binary relation $\sim$ defined by
$$a \sim b \ \text{ if } a \text{ and } b \text{ lie in the same block of } \mathcal{F}$$
is an equivalence relation on $S$, whose equivalence classes are the blocks of $\mathcal{F}$. This establishes a one-to-one correspondence between equivalence relations on $S$ and partitions of $S$.
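For a finite set, part 1) of this theorem is easy to realize computationally: sweeping through the set and grouping each element with an equivalent representative produces the blocks. A sketch (the relation, congruence modulo 3, is my example):

```python
def partition(S, related):
    """Blocks of the partition induced by an equivalence relation on a finite set S."""
    blocks = []
    for s in S:
        for block in blocks:
            if related(s, block[0]):   # testing one representative suffices
                block.append(s)
                break
        else:
            blocks.append([s])         # s starts a new equivalence class
    return blocks

# Congruence modulo 3 on {0,...,9}: a ~ b iff 3 | (a - b).
print(partition(range(10), lambda a, b: (a - b) % 3 == 0))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```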

The most important problem related to equivalence relations is that of finding an efficient way to determine when two elements are equivalent. Unfortunately, in most cases, the definition does not provide an efficient test for equivalence and so we are led to the following concepts.

Definition Let $\sim$ be an equivalence relation on $S$. A function $f\colon S \to T$, where $T$ is any set, is called an invariant of $\sim$ if it is constant on the equivalence classes of $\sim$, that is,
$$a \sim b \Rightarrow f(a) = f(b)$$
and a complete invariant if it is constant and distinct on the equivalence classes of $\sim$, that is,
$$a \sim b \Leftrightarrow f(a) = f(b)$$

A collection $\{f_1, \ldots, f_n\}$ of invariants is called a complete system of invariants if
$$a \sim b \Leftrightarrow f_i(a) = f_i(b) \text{ for all } i = 1, \ldots, n$$

Definition Let $\sim$ be an equivalence relation on $S$. A subset $C \subseteq S$ is said to be a set of canonical forms (or just a canonical form) for $\sim$ if for every $s \in S$, there is exactly one $c \in C$ such that $c \sim s$. Put another way, each equivalence class under $\sim$ contains exactly one member of $C$.

Example 0.1 Define a binary relation $\sim$ on $F[x]$ by letting $p(x) \sim q(x)$ if and only if $p(x) = cq(x)$ for some nonzero constant $c \in F$. This is easily seen to be an equivalence relation. The function that assigns to each polynomial its degree is an invariant, since
$$p(x) \sim q(x) \Rightarrow \deg(p(x)) = \deg(q(x))$$
However, it is not a complete invariant, since there are inequivalent polynomials with the same degree. The set of all monic polynomials is a set of canonical forms for this equivalence relation.

Example 0.2 We have remarked that row equivalence is an equivalence relation on $\mathcal{M}_{m,n}(F)$. Moreover, the subset of reduced row echelon form matrices is a set of canonical forms for row equivalence, since every matrix is row equivalent to a unique matrix in reduced row echelon form.

Example 0.3 Two matrices $A, B \in \mathcal{M}_{m,n}(F)$ are row equivalent if and only if there is an invertible matrix $P$ such that $A = PB$. Similarly, $A$ and $B$ are column equivalent, that is, $A$ can be reduced to $B$ using elementary column operations, if and only if there exists an invertible matrix $Q$ such that $A = BQ$.

Two matrices $A$ and $B$ are said to be equivalent if there exist invertible matrices $P$ and $Q$ for which
$$A = PBQ$$
Put another way, $A$ and $B$ are equivalent if $A$ can be reduced to $B$ by performing a series of elementary row and/or column operations. (The use of the term equivalent is unfortunate, since it applies to all equivalence relations, not just this one. However, the terminology is standard, so we use it here.)

It is not hard to see that an $m \times n$ matrix $R$ that is in both reduced row echelon form and reduced column echelon form must have the block form
$$J_k = \begin{pmatrix} I_k & 0_{k, n-k} \\ 0_{m-k, k} & 0_{m-k, n-k} \end{pmatrix}_{\text{block}}$$
We leave it to the reader to show that every matrix $A$ in $\mathcal{M}_{m,n}$ is equivalent to exactly one matrix of the form $J_k$ and so the set of these matrices is a set of canonical forms for equivalence. Moreover, the function $f$ defined by $f(A) = k$, where $A \sim J_k$, is a complete invariant for equivalence.

Since the rank of $J_k$ is $k$ and since neither row nor column operations affect the rank, we deduce that the rank of $A$ is $k$. Hence, rank is a complete invariant for equivalence. In other words, two matrices (of the same size) are equivalent if and only if they have the same rank.
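This complete invariant is directly testable in software: two matrices of the same size are equivalent precisely when their ranks agree. A quick NumPy check (the matrices are my example):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])      # rank 1
J1 = np.array([[1., 0., 0.],
               [0., 0., 0.]])     # the canonical form J_1, also rank 1

equivalent = (A.shape == J1.shape and
              np.linalg.matrix_rank(A) == np.linalg.matrix_rank(J1))
print(equivalent)                 # True: A = P J_1 Q for some invertible P, Q
```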

Example 0.4 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be similar if there exists an invertible matrix $P$ such that
$$A = PBP^{-1}$$
Similarity is easily seen to be an equivalence relation on $\mathcal{M}_n$. As we will learn, two matrices are similar if and only if they represent the same linear operator on a given $n$-dimensional vector space $V$. Hence, similarity is extremely important for studying the structure of linear operators. One of the main goals of this book is to develop canonical forms for similarity.

We leave it to the reader to show that the determinant function and the trace function are invariants for similarity. However, these two invariants do not, in general, form a complete system of invariants.

Example 0.5 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be congruent if there exists an invertible matrix $P$ for which
$$A = PBP^t$$
where $P^t$ is the transpose of $P$. This relation is easily seen to be an equivalence relation and we will devote some effort to finding canonical forms for congruence. For some base fields $F$ (such as $\mathbb{R}$, $\mathbb{C}$ or a finite field), this is relatively easy to do, but for other base fields (such as $\mathbb{Q}$), it is extremely difficult.

Zorn's Lemma

In order to show that any vector space has a basis, we require a result known as Zorn's lemma. To state this lemma, we need some preliminary definitions.

Definition A partially ordered set is a pair $(P, \le)$ where $P$ is a nonempty set and $\le$ is a binary relation called a partial order, read "less than or equal to," with the following properties:
1) (Reflexivity) For all $a \in P$, $a \le a$
2) (Antisymmetry) For all $a, b \in P$, $a \le b$ and $b \le a$ implies $a = b$
3) (Transitivity) For all $a, b, c \in P$, $a \le b$ and $b \le c$ implies $a \le c$
Partially ordered sets are also called posets.

It is customary to use a phrase such as "Let $P$ be a partially ordered set" when the partial order is understood. Here are some key terms related to partially ordered sets.

Definition Let $P$ be a partially ordered set.
1) A maximal element is an element $m \in P$ with the property that there is no larger element in $P$, that is,
$$p \in P,\ m \le p \Rightarrow m = p$$
2) A minimal element is an element $m \in P$ with the property that there is no smaller element in $P$, that is,
$$p \in P,\ p \le m \Rightarrow m = p$$
3) Let $a, b \in P$. Then $u \in P$ is an upper bound for $a$ and $b$ if
$$a \le u \quad\text{and}\quad b \le u$$
The unique smallest upper bound for $a$ and $b$, if it exists, is called the least upper bound of $a$ and $b$ and is denoted by $\operatorname{lub}\{a, b\}$.
4) Let $a, b \in P$. Then $\ell \in P$ is a lower bound for $a$ and $b$ if
$$\ell \le a \quad\text{and}\quad \ell \le b$$
The unique largest lower bound for $a$ and $b$, if it exists, is called the greatest lower bound of $a$ and $b$ and is denoted by $\operatorname{glb}\{a, b\}$.

Let $S$ be a subset of a partially ordered set $P$. We say that an element $u \in P$ is an upper bound for $S$ if $s \le u$ for all $s \in S$. Lower bounds are defined similarly.

Note that in a partially ordered set, it is possible that not all elements are comparable. In other words, it is possible to have $x, y \in P$ with the property that $x \not\le y$ and $y \not\le x$.

Definition A partially ordered set in which every pair of elements is comparable is called a totally ordered set, or a linearly ordered set. Any totally ordered subset of a partially ordered set $P$ is called a chain in $P$.

Example 0.6
1) The set $\mathbb{R}$ of real numbers, with the usual binary relation $\le$, is a partially ordered set. It is also a totally ordered set. It has no maximal elements.
2) The set $\mathbb{N} = \{1, 2, \ldots\}$ of natural numbers, together with the binary relation of divides, is a partially ordered set. It is customary to write $n \mid m$ to indicate that $n$ divides $m$. The subset $S$ of $\mathbb{N}$ consisting of all powers of $2$ is a totally ordered subset of $\mathbb{N}$, that is, it is a chain in $\mathbb{N}$. The set $P = \{2, 4, 8, 3, 9, 27\}$ is a partially ordered set under $\mid$. It has two maximal elements, namely $8$ and $27$. The subset $Q = \{2, 3, 5, 7, 11\}$ is a partially ordered set in which every element is both maximal and minimal!
3) Let $S$ be any set and let $\mathcal{P}(S)$ be the power set of $S$, that is, the set of all subsets of $S$. Then $\mathcal{P}(S)$, together with the subset relation $\subseteq$, is a partially ordered set.

Now we can state Zorn's lemma, which gives a condition under which a partially ordered set has a maximal element.

Theorem 0.10 (Zorn's lemma) If $P$ is a partially ordered set in which every chain has an upper bound then $P$ has a maximal element.

We will not prove Zorn's lemma. Indeed, Zorn's lemma is a result that is so fundamental that it cannot be proved or disproved in the context of ordinary set theory. (It is equivalent to the famous Axiom of Choice.) Therefore, Zorn's lemma (along with the Axiom of Choice) must either be accepted or rejected as an axiom of set theory. Since almost all mathematicians accept it, we will do so as well. Indeed, we will use Zorn's lemma to prove that every vector space has a basis.

Cardinality

Two sets $S$ and $T$ have the same cardinality, written $|S| = |T|$, if there is a bijective function (a one-to-one correspondence) between the sets.

The reader is probably aware of the fact that $|\mathbb{Z}| = |\mathbb{N}|$ and $|\mathbb{Q}| = |\mathbb{N}|$, where $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{Q}$ denote the natural numbers, the integers and the rational numbers, respectively.

If :;:; is in one-to-one correspondence with a subset of , we write (( (( . If :;; is in one-to-one correspondence with a proper subset of but not all of then we write ((:; (( . The second condition is necessary, since, for instance, o{o is in one-to-one correspondence with a proper subset of and yet is also in one-to-one correspondence with {o{ itself. Hence, ((~ (( .

This is not the place to enter into a detailed discussion of cardinal numbers. The intention here is that the cardinality of a set, whatever that is, represents the “size” of the set. It is actually easier to talk about two sets having the same, or different, size (cardinality) than it is to explicitly define the size (cardinality) of a given set.

Be that as it may, we associate to each set $S$ a cardinal number, denoted by $|S|$ or $\operatorname{card}(S)$, that is intended to measure the size of the set. Actually, cardinal numbers are just very special types of sets. However, we can simply think of them as vague amorphous objects that measure the size of sets.

Definition
1) A set is finite if it can be put in one-to-one correspondence with a set of the form $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$, for some nonnegative integer $n$. A set that is not finite is infinite. The cardinal number (or cardinality) of a finite set is just the number of elements in the set.
2) The cardinal number of the set $\mathbb{N}$ of natural numbers is $\aleph_0$ (read "aleph nought"), where $\aleph$ is the first letter of the Hebrew alphabet. Hence,
$$|\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0$$
3) Any set with cardinality $\aleph_0$ is called a countably infinite set and any finite or countably infinite set is called a countable set. An infinite set that is not countable is said to be uncountable.

Since it can be shown that $|\mathbb{R}| > |\mathbb{N}|$, the real numbers are uncountable.

If $S$ and $T$ are finite sets then it is well known that
$$|S| \le |T| \ \text{ and } \ |T| \le |S| \Rightarrow |S| = |T|$$
The first part of the next theorem tells us that this is also true for infinite sets.

The reader will no doubt recall that the power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$. For finite sets, the power set of $S$ is always bigger than the set itself. In fact,
$$|S| = n \Rightarrow |\mathcal{P}(S)| = 2^n$$
The second part of the next theorem says that the power set of any set $S$ is bigger (has larger cardinality) than $S$ itself. On the other hand, the third part of this theorem says that, for infinite sets $S$, the set of all finite subsets of $S$ is the same size as $S$.
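For a small finite set, the count $|\mathcal{P}(S)| = 2^n$ is easy to verify directly (a sketch using itertools):

```python
from itertools import combinations

S = {1, 2, 3, 4}
power_set = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]
assert len(power_set) == 2 ** len(S)   # 16 subsets, including {} and S itself
```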

Theorem 0.11
1) (Schröder–Bernstein theorem) For any sets $S$ and $T$,
$$|S| \le |T| \ \text{ and } \ |T| \le |S| \Rightarrow |S| = |T|$$
2) (Cantor's theorem) If $\mathcal{P}(S)$ denotes the power set of $S$ then
$$|S| < |\mathcal{P}(S)|$$
3) If $\mathcal{P}_0(S)$ denotes the set of all finite subsets of $S$ and if $S$ is an infinite set then
$$|S| = |\mathcal{P}_0(S)|$$
Proof. We prove only parts 1) and 2). Let $f\colon S \to T$ be an injective function from $S$ into $T$ and let $g\colon T \to S$ be an injective function from $T$ into $S$. We want to use these functions to create a bijective function from $S$ to $T$. For this purpose, we make the following definitions. The descendants of an element $s \in S$ are the elements obtained by repeated alternate applications of the functions $f$ and $g$, namely
$$f(s),\ g(f(s)),\ f(g(f(s))),\ \ldots$$

If $t$ is a descendant of $s$ then $s$ is an ancestor of $t$. Descendants and ancestors of elements of $T$ are defined similarly.

Now, by tracing an element's ancestry to its beginning, we find that there are three possibilities: the element may originate in $S$, or in $T$, or it may have no point of origin. Accordingly, we can write $S$ as the union of three disjoint sets
$$\mathcal{S}_S = \{s \in S \mid s \text{ originates in } S\}$$
$$\mathcal{S}_T = \{s \in S \mid s \text{ originates in } T\}$$
$$\mathcal{S}_\infty = \{s \in S \mid s \text{ has no originator}\}$$
Similarly, $T$ is the disjoint union of $\mathcal{T}_S$, $\mathcal{T}_T$ and $\mathcal{T}_\infty$.

Now, the restriction
$$f|_{\mathcal{S}_S}\colon \mathcal{S}_S \to \mathcal{T}_S$$
is a bijection. To see this, note that if $t \in \mathcal{T}_S$ then $t$ originated in $S$ and therefore must have the form $f(s)$ for some $s \in S$. But $t$ and its ancestor $s$ have the same point of origin and so $t \in \mathcal{T}_S$ implies $s \in \mathcal{S}_S$. Thus, $f|_{\mathcal{S}_S}$ is surjective and hence bijective. We leave it to the reader to show that the functions
$$(g|_{\mathcal{T}_T})^{-1}\colon \mathcal{S}_T \to \mathcal{T}_T \quad\text{and}\quad f|_{\mathcal{S}_\infty}\colon \mathcal{S}_\infty \to \mathcal{T}_\infty$$
are also bijections. Putting these three bijections together gives a bijection between $S$ and $T$. Hence, $|S| = |T|$, as desired.

We now prove Cantor's theorem. The map $\iota\colon S \to \mathcal{P}(S)$ defined by $\iota(s) = \{s\}$ is an injection from $S$ to $\mathcal{P}(S)$ and so $|S| \le |\mathcal{P}(S)|$. To complete the proof we must show that no injective map $f\colon S \to \mathcal{P}(S)$ can be surjective. To this end, let
$$X = \{s \in S \mid s \notin f(s)\} \in \mathcal{P}(S)$$
We claim that $X$ is not in $\operatorname{im}(f)$. For suppose that $X = f(x)$ for some $x \in S$. Then if $x \in X$, we have by the definition of $X$ that $x \notin X$. On the other hand, if $x \notin X$, we have again by the definition of $X$ that $x \in X$. This contradiction implies that $X \notin \operatorname{im}(f)$ and so $f$ is not surjective.

Cardinal Arithmetic

Now let us define addition, multiplication and exponentiation of cardinal numbers. If $S$ and $T$ are sets, the cartesian product $S \times T$ is the set of all ordered pairs
$$S \times T = \{(s, t) \mid s \in S,\ t \in T\}$$
The set of all functions from $T$ to $S$ is denoted by $S^T$.

Definition Let $\kappa$ and $\lambda$ denote cardinal numbers. Let $S$ and $T$ be any sets for which $|S| = \kappa$ and $|T| = \lambda$.
1) The sum $\kappa + \lambda$ is the cardinal number of $S \cup T$ (where $S$ and $T$ are taken to be disjoint).
2) The product $\kappa\lambda$ is the cardinal number of $S \times T$.
3) The power $\kappa^\lambda$ is the cardinal number of $S^T$.

We will not go into the details of why these definitions make sense. (For instance, they seem to depend on the sets :; and , but in fact they do not.) It can be shown, using these definitions, that cardinal addition and multiplication are associative and commutative and that multiplication distributes over addition.

Theorem 0.12 Let $\kappa$, $\lambda$ and $\mu$ be cardinal numbers. Then the following properties hold:
1) (Associativity)
$$\kappa + (\lambda + \mu) = (\kappa + \lambda) + \mu \quad\text{and}\quad \kappa(\lambda\mu) = (\kappa\lambda)\mu$$
2) (Commutativity)
$$\kappa + \lambda = \lambda + \kappa \quad\text{and}\quad \kappa\lambda = \lambda\kappa$$
3) (Distributivity)
$$\kappa(\lambda + \mu) = \kappa\lambda + \kappa\mu$$
4) (Properties of Exponents)
a) $\kappa^{\lambda + \mu} = \kappa^\lambda \kappa^\mu$
b) $(\kappa^\lambda)^\mu = \kappa^{\lambda\mu}$
c) $(\kappa\lambda)^\mu = \kappa^\mu \lambda^\mu$

On the other hand, the arithmetic of cardinal numbers can seem a bit strange, as the next theorem shows.

Theorem 0.13 Let $\kappa$ and $\lambda$ be cardinal numbers, at least one of which is infinite. Then
$$\kappa + \lambda = \kappa\lambda = \max\{\kappa, \lambda\}$$

It is not hard to see that there is a one-to-one correspondence between the power set $\mathcal{P}(S)$ of a set $S$ and the set of all functions from $S$ to $\{0, 1\}$. This leads to the following theorem.

Theorem 0.14 For any cardinal $\kappa$,
1) If $|S| = \kappa$ then $|\mathcal{P}(S)| = 2^\kappa$
2) $\kappa < 2^\kappa$

We have already observed that $|\mathbb{N}| = \aleph_0$. It can be shown that $\aleph_0$ is the smallest infinite cardinal, that is,
$$\kappa < \aleph_0 \Rightarrow \kappa \text{ is a natural number}$$
It can also be shown that the set $\mathbb{R}$ of real numbers is in one-to-one correspondence with the power set $\mathcal{P}(\mathbb{N})$ of the natural numbers. Therefore,
$$|\mathbb{R}| = 2^{\aleph_0}$$
The set of all points on the real line is sometimes called the continuum and so $2^{\aleph_0}$ is sometimes called the power of the continuum and denoted by $c$.

Theorem 0.13 shows that cardinal addition and multiplication have a kind of “absorption” quality, which makes it hard to produce larger cardinals from smaller ones. The next theorem demonstrates this more dramatically.

Theorem 0.15
1) Addition applied a countable number of times or multiplication applied a finite number of times to the cardinal number $\aleph_0$ does not yield anything more than $\aleph_0$. Specifically, for any nonzero $n \in \mathbb{N}$, we have
$$\aleph_0 \cdot \aleph_0 = \aleph_0 \quad\text{and}\quad \aleph_0^{\,n} = \aleph_0$$
2) Addition and multiplication, applied a countable number of times to the cardinal number $2^{\aleph_0}$, do not yield more than $2^{\aleph_0}$. Specifically, we have
$$\aleph_0 \cdot 2^{\aleph_0} = 2^{\aleph_0} \quad\text{and}\quad (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$$

Using this theorem, we can establish other relationships, such as

$$2^{\aleph_0} \le \aleph_0^{\,\aleph_0} \le (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$$
which, by the Schröder–Bernstein theorem, implies that
$$\aleph_0^{\,\aleph_0} = 2^{\aleph_0}$$
We mention that the problem of evaluating $\kappa^\lambda$ in general is a very difficult one and would take us far beyond the scope of this book.

We will have use for the following reasonable-sounding result, whose proof is omitted.

Theorem 0.16 Let $\{A_k \mid k \in K\}$ be a collection of sets, indexed by the set $K$, with $|K| = \kappa$. If $|A_k| \le \lambda$ for all $k \in K$ then
$$\Bigl|\bigcup_{k \in K} A_k\Bigr| \le \kappa\lambda$$

Let us conclude by describing the cardinality of some famous sets.

Theorem 0.17
1) The following sets have cardinality $\aleph_0$.
a) The rational numbers $\mathbb{Q}$.
b) The set of all finite subsets of $\mathbb{N}$.
c) The union of a countable number of countable sets.
d) The set $\mathbb{Z}^n$ of all ordered $n$-tuples of integers.
2) The following sets have cardinality $2^{\aleph_0}$.
a) The set of all points in $\mathbb{R}^n$.
b) The set of all infinite sequences of natural numbers.
c) The set of all infinite sequences of real numbers.
d) The set of all finite subsets of $\mathbb{R}$.
e) The set of all irrational numbers.

Part 2 Algebraic Structures

We now turn to a discussion of some of the many algebraic structures that play a role in the study of linear algebra.

Groups

Definition A group is a nonempty set $G$, together with a binary operation denoted by $*$, that satisfies the following properties:
1) (Associativity) For all $a, b, c \in G$,
$$(a * b) * c = a * (b * c)$$
2) (Identity) There exists an element $e \in G$ for which
$$e * a = a * e = a$$
for all $a \in G$.
3) (Inverses) For each $a \in G$, there is an element $a^{-1} \in G$ for which
$$a * a^{-1} = a^{-1} * a = e$$

Definition A group $G$ is abelian, or commutative, if
$$a * b = b * a$$
for all $a, b \in G$. When a group is abelian, it is customary to denote the operation $*$ by $+$, thus writing $a * b$ as $a + b$. It is also customary to refer to the identity as the zero element and to denote the inverse $a^{-1}$ by $-a$, referred to as the negative of $a$.

Example 0.7 The set of all bijective functions from a set $S$ to $S$ is a group under composition of functions. However, in general, it is not abelian.

Example 0.8 The set $\mathcal{M}_{m,n}(F)$ is an abelian group under addition of matrices. The identity is the zero matrix $0_{m,n}$ of size $m \times n$. The set $\mathcal{M}_n(F)$ is not a group under multiplication of matrices, since not all matrices have multiplicative inverses. However, the set of invertible matrices of size $n \times n$ is a (nonabelian) group under multiplication.

A group $G$ is finite if it contains only a finite number of elements. The cardinality of a finite group $G$ is called its order and is denoted by $o(G)$ or simply $|G|$. Thus, for example, $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ is a finite group under addition modulo $n$, but $\mathcal{M}_{m,n}(\mathbb{R})$ is not finite.
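The group axioms for $\mathbb{Z}_n$ under addition modulo $n$ can be verified exhaustively for a small modulus; a sketch with $n = 5$:

```python
n = 5
G = range(n)
op = lambda a, b: (a + b) % n

# Associativity, identity 0, inverses, and commutativity.
assert all(op(op(a, b), c) == op(a, op(b, c)) for a in G for b in G for c in G)
assert all(op(0, a) == a == op(a, 0) for a in G)
assert all(any(op(a, b) == 0 for b in G) for a in G)
assert all(op(a, b) == op(b, a) for a in G for b in G)   # abelian
```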

Definition A subgroup of a group $G$ is a nonempty subset $S$ of $G$ that is a group in its own right, using the same operations as defined on $G$.

Rings

Definition A ring is a nonempty set $R$, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $R$ is an abelian group under addition

2) (Associativity) For all $a, b, c \in R$,
$$(ab)c = a(bc)$$
3) (Distributivity) For all $a, b, c \in R$,
$$(a + b)c = ac + bc \quad\text{and}\quad a(b + c) = ab + ac$$

A ring $R$ is said to be commutative if $ab = ba$ for all $a, b \in R$. If a ring $R$ contains an element $e$ with the property that
$$ea = ae = a$$
for all $a \in R$, we say that $R$ is a ring with identity. The identity is usually denoted by $1$.

Example 0.9 The set $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ is a commutative ring under addition and multiplication modulo $n$:
$$a \oplus b = (a + b) \bmod n, \qquad a \odot b = ab \bmod n$$
The element $1 \in \mathbb{Z}_n$ is the identity.

Example 0.10 The set $E$ of even integers is a commutative ring under the usual operations on $\mathbb{Z}$, but it has no identity.

Example 0.11 The set $\mathcal{M}_n(F)$ is a noncommutative ring under matrix addition and multiplication. The identity matrix $I_n$ is the identity for $\mathcal{M}_n(F)$.

Example 0.12 Let $F$ be a field. The set $F[x]$ of all polynomials in a single variable $x$, with coefficients in $F$, is a commutative ring, under the usual operations of polynomial addition and multiplication. What is the identity for $F[x]$? Similarly, the set $F[x_1, \ldots, x_n]$ of polynomials in $n$ variables is a commutative ring under the usual addition and multiplication of polynomials.

Definition A subring of a ring $R$ is a subset $S$ of $R$ that is a ring in its own right, using the same operations as defined on $R$ and having the same multiplicative identity as $R$.

The condition that a subring $S$ have the same multiplicative identity as $R$ is required. For example, the set $S$ of all $2 \times 2$ matrices of the form
$$A_a = \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix}$$
for $a \in F$ is a ring under addition and multiplication of matrices (isomorphic to $F$). The multiplicative identity in $S$ is the matrix $A_1$, which is not the identity of $\mathcal{M}_2(F)$. Hence, $S$ is a ring under the same operations as $\mathcal{M}_2(F)$ but it is not a subring of $\mathcal{M}_2(F)$.

Applying the definition is not generally the easiest way to show that a subset of a ring is a subring. The following characterization is usually easier to apply.

Theorem 0.18 A nonempty subset $S$ of a ring $R$ is a subring if and only if
1) The multiplicative identity $1$ of $R$ is in $S$
2) $S$ is closed under subtraction, that is,
$$a, b \in S \Rightarrow a - b \in S$$
3) $S$ is closed under multiplication, that is,
$$a, b \in S \Rightarrow ab \in S$$

Ideals

Rings have another important substructure besides subrings.

Definition Let $R$ be a ring. A nonempty subset $\mathcal{I}$ of $R$ is called an ideal if
1) $\mathcal{I}$ is a subgroup of the abelian group $R$, that is, $\mathcal{I}$ is closed under subtraction:
$$a, b \in \mathcal{I} \Rightarrow a - b \in \mathcal{I}$$
2) $\mathcal{I}$ is closed under multiplication by any ring element, that is,
$$a \in \mathcal{I},\ r \in R \Rightarrow ra \in \mathcal{I} \ \text{ and } \ ar \in \mathcal{I}$$
Note that if an ideal $\mathcal{I}$ contains the unit element $1$ then $\mathcal{I} = R$.

Example 0.13 Let $p(x)$ be a polynomial in $F[x]$. The set of all multiples of $p(x)$,
$$\langle p(x) \rangle = \{q(x)p(x) \mid q(x) \in F[x]\}$$
is an ideal in $F[x]$, called the ideal generated by $p(x)$.

Definition Let $S$ be a subset of a ring $R$ with identity. The set
$$\langle S \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R,\ s_i \in S,\ n \ge 1\}$$
of all finite linear combinations of elements of $S$, with coefficients in $R$, is an ideal in $R$, called the ideal generated by $S$. It is the smallest (in the sense of set inclusion) ideal of $R$ containing $S$. If $S = \{s_1, \ldots, s_n\}$ is a finite set, we write
$$\langle s_1, \ldots, s_n \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R\}$$
Note that in the previous definition, we require that $R$ have an identity. This is to ensure that $S \subseteq \langle S \rangle$.

Theorem 0.19 Let $R$ be a ring.
1) The intersection of any collection $\{\mathcal{I}_k \mid k \in K\}$ of ideals is an ideal.
2) If $\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \cdots$ is an ascending sequence of ideals, each one contained in the next, then the union $\bigcup \mathcal{I}_k$ is also an ideal.
3) More generally, if
$$\mathcal{C} = \{\mathcal{I}_i \mid i \in I\}$$
is a chain of ideals in $R$ then the union $\mathcal{J} = \bigcup_{i \in I} \mathcal{I}_i$ is also an ideal in $R$.
Proof. To prove 1), let $\mathcal{J} = \bigcap \mathcal{I}_k$. Then if $a, b \in \mathcal{J}$, we have $a, b \in \mathcal{I}_k$ for all $k \in K$. Hence, $a - b \in \mathcal{I}_k$ for all $k \in K$ and so $a - b \in \mathcal{J}$. Hence, $\mathcal{J}$ is closed under subtraction. Also, if $r \in R$ then $ra \in \mathcal{I}_k$ for all $k \in K$ and so $ra \in \mathcal{J}$. Of course, part 2) is a special case of part 3). To prove 3), if $a, b \in \mathcal{J}$ then $a \in \mathcal{I}_i$ and $b \in \mathcal{I}_j$ for some $i, j \in I$. Since one of $\mathcal{I}_i$ and $\mathcal{I}_j$ is contained in the other, we may assume that $\mathcal{I}_i \subseteq \mathcal{I}_j$. It follows that $a - b \in \mathcal{I}_j \subseteq \mathcal{J}$, and if $r \in R$ then $ra \in \mathcal{I}_j \subseteq \mathcal{J}$. Thus $\mathcal{J}$ is an ideal.

Note that in general, the union of ideals is not an ideal. However, as we have just proved, the union of any chain of ideals is an ideal.

Quotient Rings and Maximal Ideals

Let $S$ be a subset of a commutative ring $R$ with identity. Let $\equiv$ be the binary relation on $R$ defined by
$$a \equiv b \Leftrightarrow a - b \in S$$
It is easy to see that $\equiv$ is an equivalence relation. When $a \equiv b$, we say that $a$ and $b$ are congruent modulo $S$. The term "mod" is used as a colloquialism for modulo and $a \equiv b$ is often written
$$a \equiv b \bmod S$$
As shorthand, we write $a \equiv b$.

To see what the equivalence classes look like, observe that
$$[a] = \{b \in R \mid b \equiv a\} = \{b \in R \mid b - a \in S\} = \{b \in R \mid b = a + s \text{ for some } s \in S\} = \{a + s \mid s \in S\} = a + S$$
The set $a + S = \{a + s \mid s \in S\}$ is called a coset of $S$ in $R$. The element $a$ is called a coset representative for $a + S$.

Thus, the equivalence classes for congruence mod $S$ are the cosets of $S$ in $R$. The set of all cosets is denoted by
$$R/S = \{a + S \mid a \in R\}$$
This is read "$R$ mod $S$." We would like to place a ring structure on $R/S$. Indeed, if $S$ is a subgroup of the abelian group $R$ then $R/S$ is easily seen to be an abelian group as well under coset addition defined by
$$(a + S) + (b + S) = (a + b) + S$$
In order for the product
$$(a + S)(b + S) = ab + S$$
to be well defined, we must have
$$b + S = b' + S \Rightarrow ab + S = ab' + S$$
or, equivalently,
$$b - b' \in S \Rightarrow a(b - b') \in S$$
But $b - b'$ may be any element of $S$ and $a$ may be any element of $R$ and so this condition implies that $S$ must be an ideal. Conversely, if $S$ is an ideal then coset multiplication is well defined.

Theorem 0.20 Let $R$ be a commutative ring with identity. Then the quotient $R/S$ is a ring under coset addition and multiplication if and only if $S$ is an ideal of $R$. In this case, $R/S$ is called the quotient ring of $R$ modulo $S$, where addition and multiplication are defined by
$$(a + S) + (b + S) = (a + b) + S$$
$$(a + S)(b + S) = ab + S$$

Definition An ideal $\mathcal{I}$ in a ring $R$ is a maximal ideal if $\mathcal{I} \ne R$ and if whenever $\mathcal{J}$ is an ideal satisfying $\mathcal{I} \subseteq \mathcal{J} \subseteq R$ then either $\mathcal{J} = \mathcal{I}$ or $\mathcal{J} = R$.

Here is one reason why maximal ideals are important.

Theorem 0.21 Let $R$ be a commutative ring with identity. Then the quotient ring $R/\mathcal{I}$ is a field if and only if $\mathcal{I}$ is a maximal ideal.
Proof. First, note that for any ideal $\mathcal{I}$ of $R$, the ideals of $R/\mathcal{I}$ are precisely the quotients $\mathcal{J}/\mathcal{I}$ where $\mathcal{J}$ is an ideal for which $\mathcal{I} \subseteq \mathcal{J} \subseteq R$. It is clear that $\mathcal{J}/\mathcal{I}$ is an ideal of $R/\mathcal{I}$. Conversely, if $\mathcal{J}'$ is an ideal of $R/\mathcal{I}$ then let
$$\mathcal{J} = \{a \in R \mid a + \mathcal{I} \in \mathcal{J}'\}$$
It is easy to see that $\mathcal{J}$ is an ideal of $R$ for which $\mathcal{I} \subseteq \mathcal{J} \subseteq R$.

Next, observe that a commutative ring $S$ with identity is a field if and only if $S$ has no nonzero proper ideals. For if $S$ is a field and $\mathcal{I}$ is an ideal of $S$ containing a nonzero element $a$ then $1 = a^{-1}a \in \mathcal{I}$ and so $\mathcal{I} = S$. Conversely, if $S$ has no nonzero proper ideals and $0 \ne a \in S$ then the ideal $\langle a \rangle$ must be $S$ and so there is a $b \in S$ for which $ab = 1$. Hence, $S$ is a field.

Putting these two facts together proves the theorem.

The following result says that maximal ideals always exist.

Theorem 0.22 Any nonzero commutative ring $R$ with identity contains a maximal ideal.
Proof. Since $R$ is not the zero ring, the ideal $\{0\}$ is a proper ideal of $R$. Hence, the set $\mathcal{S}$ of all proper ideals of $R$ is nonempty. If
$$\mathcal{C} = \{\mathcal{I}_i \mid i \in I\}$$
is a chain of proper ideals in $R$ then the union $\mathcal{J} = \bigcup_{i \in I} \mathcal{I}_i$ is also an ideal. Furthermore, if $\mathcal{J}$ is not proper, that is, if $\mathcal{J} = R$, then $1 \in \mathcal{J}$ and so $1 \in \mathcal{I}_i$ for some $i \in I$, which implies that $\mathcal{I}_i = R$ is not proper. Hence, $\mathcal{J} \in \mathcal{S}$. Thus, any chain in $\mathcal{S}$ has an upper bound in $\mathcal{S}$ and so Zorn's lemma implies that $\mathcal{S}$ has a maximal element. This shows that $R$ has a maximal ideal.

Integral Domains

Definition Let $R$ be a ring. A nonzero element $r \in R$ is called a zero divisor if there exists a nonzero $s \in R$ for which $rs = 0$. A commutative ring $R$ with identity is called an integral domain if it contains no zero divisors.

Example 0.14 If $n$ is not a prime number then the ring $\mathbb{Z}_n$ has zero divisors and so is not an integral domain. To see this, observe that if $n$ is not prime then $n = ab$ in $\mathbb{Z}$, where $a, b \ge 2$. But in $\mathbb{Z}_n$, we have
$$a \odot b = ab \bmod n = 0$$
and so $a$ and $b$ are both zero divisors. As we will see later, if $n$ is a prime then $\mathbb{Z}_n$ is a field (which is an integral domain, of course).
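Listing the zero divisors of $\mathbb{Z}_n$ for a composite modulus is a one-line search; a sketch with $n = 12$:

```python
n = 12
zero_divisors = [a for a in range(1, n)
                 if any(a * b % n == 0 for b in range(1, n))]
print(zero_divisors)   # [2, 3, 4, 6, 8, 9, 10] -- exactly the a with gcd(a, n) > 1
```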

Example 0.15 The ring $F[x]$ is an integral domain, since $p(x)q(x) = 0$ implies that $p(x) = 0$ or $q(x) = 0$.

If $R$ is a ring and $rx = ry$ where $r, x, y \in R$ then we cannot in general cancel the $r$'s and conclude that $x = y$. For instance, in $\mathbb{Z}_6$, we have $3 \cdot 2 = 3 \cdot 4$, but canceling the $3$'s gives $2 = 4$. However, it is precisely the integral domains in which we can cancel. The simple proof is left to the reader.

Theorem 0.23 Let $R$ be a commutative ring with identity. Then $R$ is an integral domain if and only if the cancellation law
$$rx = ry,\ r \ne 0 \Rightarrow x = y$$
holds.

The Field of Quotients of an Integral Domain

Any integral domain $R$ can be embedded in a field. The quotient field (or field of quotients) of $R$ is a field that is constructed from $R$ just as the field of rational numbers is constructed from the ring of integers. In particular, we set
$$R^+ = \{(a, b) \mid a, b \in R,\ b \ne 0\}$$
Thinking of $(a, b)$ as the "fraction" $a/b$, we define addition and multiplication of fractions in the same way as for rational numbers:
$$(a, b) + (c, d) = (ad + bc, bd) \quad\text{and}\quad (a, b) \cdot (c, d) = (ac, bd)$$
It is customary to write $(a, b)$ in the form $a/b$. Note that if $R$ has zero divisors, then these definitions do not make sense, because $bd$ may be $0$ even if $b$ and $d$ are not. This is why we require that $R$ be an integral domain.
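For $R = \mathbb{Z}$ this construction produces exactly $\mathbb{Q}$, and Python's fractions.Fraction implements precisely the arithmetic rules above:

```python
from fractions import Fraction

a = Fraction(1, 2)   # the pair (1, 2), thought of as 1/2
b = Fraction(3, 4)

assert a + b == Fraction(1*4 + 3*2, 2*4)   # (a,b) + (c,d) = (ad + bc, bd)
assert a * b == Fraction(1*3, 2*4)         # (a,b) . (c,d) = (ac, bd)
```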

Principal Ideal Domains

Definition Let $R$ be a ring with identity and let $a \in R$. The principal ideal generated by $a$ is the ideal
$$\langle a \rangle = \{ra \mid r \in R\}$$
An integral domain $R$ in which every ideal is a principal ideal is called a principal ideal domain.

Theorem 0.24 The integers form a principal ideal domain. In fact, any ideal $\mathcal{I}$ in $\mathbb{Z}$ is generated by the smallest positive integer that is contained in $\mathcal{I}$.

Theorem 0.25 The ring $F[x]$ is a principal ideal domain. In fact, any ideal $\mathcal{I}$ is generated by the unique monic polynomial of smallest degree contained in $\mathcal{I}$. Moreover, for polynomials $p_1(x), \ldots, p_n(x)$,
$$\langle p_1(x), \ldots, p_n(x) \rangle = \langle \gcd\{p_1(x), \ldots, p_n(x)\} \rangle$$
Proof. Let $\mathcal{I}$ be an ideal in $F[x]$ and let $m(x)$ be a monic polynomial of smallest degree in $\mathcal{I}$. First, we observe that there is only one such polynomial in $\mathcal{I}$. For if $n(x) \in \mathcal{I}$ is monic and $\deg(n(x)) = \deg(m(x))$ then
$$q(x) = m(x) - n(x) \in \mathcal{I}$$
and since $\deg(q(x)) < \deg(m(x))$, we must have $q(x) = 0$ and so $n(x) = m(x)$.

We show that $\mathcal{I} = \langle m(x) \rangle$. Since $m(x) \in \mathcal{I}$, we have $\langle m(x) \rangle \subseteq \mathcal{I}$. To establish the reverse inclusion, if $p(x) \in \mathcal{I}$ then dividing $p(x)$ by $m(x)$ gives
$$p(x) = q(x)m(x) + r(x)$$
where $r(x) = 0$ or $0 \le \deg r(x) < \deg m(x)$. But since $\mathcal{I}$ is an ideal,
$$r(x) = p(x) - q(x)m(x) \in \mathcal{I}$$
and so $0 \le \deg r(x) < \deg m(x)$ is impossible. Hence, $r(x) = 0$ and
$$p(x) = q(x)m(x) \in \langle m(x) \rangle$$
This shows that $\mathcal{I} \subseteq \langle m(x) \rangle$ and so $\mathcal{I} = \langle m(x) \rangle$.

To prove the second statement, let $\mathcal{I} = \langle p_1(x), \ldots, p_n(x) \rangle$. Then, by what we have just shown,
$$\mathcal{I} = \langle p_1(x), \ldots, p_n(x) \rangle = \langle m(x) \rangle$$
where $m(x)$ is the unique monic polynomial in $\mathcal{I}$ of smallest degree. In particular, since $p_i(x) \in \langle m(x) \rangle$, we have $m(x) \mid p_i(x)$ for each $i = 1, \ldots, n$. In other words, $m(x)$ is a common divisor of the $p_i(x)$'s.

Moreover, if $q(x) \mid p_i(x)$ for all $i$, then $p_i(x) \in \langle q(x) \rangle$ for all $i$, which implies that
$$m(x) \in \langle m(x) \rangle = \langle p_1(x), \ldots, p_n(x) \rangle \subseteq \langle q(x) \rangle$$
and so $q(x) \mid m(x)$. This shows that $m(x)$ is the greatest common divisor of the $p_i(x)$'s and completes the proof.

Example 0.16 The ring $R = F[x, y]$ of polynomials in two variables $x$ and $y$ is not a principal ideal domain. To see this, observe that the set $\mathcal{I}$ of all polynomials with zero constant term is an ideal in $R$. Now, suppose that $\mathcal{I}$ is the principal ideal $\mathcal{I} = \langle p(x, y) \rangle$. Since $x, y \in \mathcal{I}$, there exist polynomials $a(x, y)$ and $b(x, y)$ for which
$$x = a(x, y)p(x, y) \quad\text{and}\quad y = b(x, y)p(x, y) \tag{0.1}$$
But $p(x, y)$ cannot be a constant, for then we would have $\mathcal{I} = R$. Hence, $\deg(p(x, y)) \ge 1$ and so $a(x, y)$ and $b(x, y)$ must both be constants, which implies that (0.1) cannot hold.

Theorem 0.26 Any principal ideal domain $R$ satisfies the ascending chain condition, that is, $R$ cannot have a strictly increasing sequence of ideals
$$\mathcal{I}_1 \subsetneq \mathcal{I}_2 \subsetneq \cdots$$
where each ideal is properly contained in the next one.

Proof. Suppose to the contrary that there is such an increasing sequence of ideals. Consider the ideal
$$\mathcal{U} = \bigcup_i \mathcal{I}_i$$
which must have the form $\mathcal{U} = \langle a \rangle$ for some $a \in \mathcal{U}$. Since $a \in \mathcal{I}_k$ for some $k$, we have $\mathcal{I}_j = \mathcal{I}_k$ for all $j \ge k$, contradicting the fact that the inclusions are proper.

Prime and Irreducible Elements

We can define the notion of a prime element in any integral domain. For $a, b \in R$, we say that $a$ divides $b$ (written $a \mid b$) if there exists an $x \in R$ for which $b = ax$.

Definition Let 9 be an integral domain. 1) An invertible element of 9"9"#~ is called a unit . Thus, is a unit if for some #9 . 2) Two elements Á   9 are said to be associates if there exists a unit " for which ~". 3) A nonzero nonunit 9 is said to be prime if “¬“ or “

4) A nonzero nonunit $r \in R$ is said to be irreducible if $r = ab \Rightarrow a$ or $b$ is a unit.
Note that if $p$ is prime or irreducible then so is $up$ for any unit $u$.

Theorem 0.27 Let $R$ be a ring.
1) An element $u \in R$ is a unit if and only if $\langle u \rangle = R$.
2) $a$ and $b$ are associates if and only if $\langle a \rangle = \langle b \rangle$.
3) $a$ divides $b$ if and only if $\langle b \rangle \subseteq \langle a \rangle$.
4) $a$ properly divides $b$, that is, $b = xa$ where $x$ is not a unit, if and only if $\langle b \rangle \subsetneq \langle a \rangle$.

In the case of the integers, an integer is prime if and only if it is irreducible. In any integral domain, prime elements are irreducible, but the converse need not hold. (In the ring $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} \mid a, b \in \mathbb{Z}\}$, the irreducible element $2$ divides the product $(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6$ but does not divide either factor.)
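The text does not carry out the verification; a sketch using the standard multiplicative norm on $\mathbb{Z}[\sqrt{-5}]$ runs as follows:

$$N(a + b\sqrt{-5}) = a^2 + 5b^2, \qquad N(xy) = N(x)\,N(y)$$

If $2 = xy$ with neither factor a unit, then $4 = N(x)N(y)$ with $N(x), N(y) > 1$, forcing $N(x) = 2$; but $a^2 + 5b^2 = 2$ has no integer solutions, so $2$ is irreducible. On the other hand, $(1 \pm \sqrt{-5})/2 \notin \mathbb{Z}[\sqrt{-5}]$, so $2$ divides the product $6$ without dividing either factor, and hence $2$ is not prime.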

However, in principal ideal domains, the two concepts are equivalent.

Theorem 0.28 Let $R$ be a principal ideal domain.
1) An $r \in R$ is irreducible if and only if the ideal $\langle r \rangle$ is maximal.
2) An element in $R$ is prime if and only if it is irreducible.

3) The elements $a, b \in R$ are relatively prime, that is, have no common nonunit factors, if and only if there exist $r, s \in R$ for which $ra + sb = 1$.
Proof. To prove 1), suppose that $r$ is irreducible and that $\langle r \rangle \subseteq \langle a \rangle \subseteq R$. Then $r \in \langle a \rangle$ and so $r = xa$ for some $x \in R$. The irreducibility of $r$ implies that $a$ or $x$ is a unit. If $a$ is a unit then $\langle a \rangle = R$ and if $x$ is a unit then $\langle a \rangle = \langle xa \rangle = \langle r \rangle$. This shows that $\langle r \rangle$ is maximal. (We have $\langle r \rangle \ne R$, since $r$ is not a unit.) Conversely, suppose that $r$ is not irreducible, that is, $r = ab$ where neither $a$ nor $b$ is a unit. Then $\langle r \rangle \subseteq \langle a \rangle \subseteq R$. But if $\langle a \rangle = \langle r \rangle$ then $r$ and $a$ are associates, which implies that $b$ is a unit. Hence $\langle a \rangle \ne \langle r \rangle$. Also, if $\langle a \rangle = R$ then $a$ must be a unit, which is not the case. So we conclude that $\langle r \rangle$ is not maximal, as desired.

To prove 2), assume first that $p$ is prime and $p = ab$. Then $p \mid ab$ and so $p \mid a$ or $p \mid b$. We may assume that $p \mid a$. Therefore, $a = xp$ and so $p = ab = xpb$. Canceling $p$'s gives $1 = xb$ and so $b$ is a unit. Hence, $p$ is irreducible. (Note that this argument applies in any integral domain.)

Conversely, suppose that $p$ is irreducible and let $p \mid ab$. We wish to prove that $p \mid a$ or $p \mid b$. The ideal $\langle p \rangle$ is maximal and so $\langle p, a \rangle = \langle p \rangle$ or $\langle p, a \rangle = R$. In the former case, $p \mid a$ and we are done. In the latter case, we have $1 = xp + ya$ for some $x, y \in R$. Thus, $b = xpb + yab$ and since $p$ divides both terms on the right, we have $p \mid b$.

To prove 3), it is clear that if $ra + sb = 1$ then $a$ and $b$ are relatively prime. For the converse, consider the ideal $\langle a, b \rangle$, which must be principal, say $\langle a, b \rangle = \langle x \rangle$. Then $x \mid a$ and $x \mid b$ and so $x$ must be a unit, which implies that $\langle a, b \rangle = R$. Hence, there exist $r, s \in R$ for which $ra + sb = 1$.

Unique Factorization Domains
Definition An integral domain $R$ is said to be a unique factorization domain if it has the following factorization properties:
1) Every nonzero nonunit element $a \in R$ can be written as a product of a finite number of irreducible elements $a = p_1 \cdots p_n$.
2) The factorization into irreducible elements is unique in the sense that if $a = p_1 \cdots p_n$ and $a = q_1 \cdots q_m$ are two such factorizations then $n = m$ and after a suitable reindexing of the factors, $p_i$ and $q_i$ are associates.

Unique factorization is clearly a desirable property. Fortunately, principal ideal domains have this property.

Theorem 0.29 Every principal ideal domain $R$ is a unique factorization domain.
Proof. Let $a \in R$ be a nonzero nonunit. If $a$ is irreducible then we are done. If not then $a = a_1 a_2$, where neither factor is a unit. If $a_1$ and $a_2$ are irreducible, we are done. If not, suppose that $a_1$ is not irreducible. Then $a_1 = a_3 a_4$, where neither $a_3$ nor $a_4$ is a unit. Continuing in this way, we obtain a factorization of the form (after renumbering if necessary)

$$a = a_1 a_2 = (a_3 a_4)a_2 = (a_3 a_4)(a_5 a_6) = (a_7 a_8)(a_5 a_6) = \cdots$$

Each step is a factorization of $a$ into a product of nonunits. However, this process must stop after a finite number of steps, for otherwise it will produce an infinite sequence $b_1, b_2, \dots$ of nonunits of $R$ for which $b_{n+1}$ properly divides $b_n$. But this gives the ascending chain of ideals

$$\langle b_1 \rangle \subsetneq \langle b_2 \rangle \subsetneq \langle b_3 \rangle \subsetneq \langle b_4 \rangle \subsetneq \cdots$$

where the inclusions are proper. But this contradicts the fact that a principal ideal domain satisfies the ascending chain condition. Thus, we conclude that every nonzero nonunit has a factorization into irreducible elements.

As to uniqueness, if $a = p_1 \cdots p_n$ and $a = q_1 \cdots q_m$ are two such factorizations then because $R$ is an integral domain, we may equate them and cancel like factors, so let us assume this has been done. Thus, $p_i \ne q_j$ for all $i, j$. If there are no factors on either side, we are done. If exactly one side has no factors left then we have expressed $1$ as a product of irreducible elements, which is not possible since irreducible elements are nonunits.

Suppose that both sides have factors left, that is,

$$p_1 \cdots p_n = q_1 \cdots q_m$$

where $p_i \ne q_j$. Then $p_1 \mid q_1 \cdots q_m$, which implies that $p_1 \mid q_j$ for some $j$. We can assume by reindexing if necessary that $j = 1$. Since $q_1$ is irreducible, $q_1 = u p_1$ where $u$ must be a unit. Replacing $q_1$ by $u p_1$ and canceling $p_1$ gives

$$p_2 \cdots p_n = u q_2 \cdots q_m$$

This process can be repeated until we run out of $p$'s or $q$'s. If we run out of $p$'s first then we have an equation of the form $u q_k \cdots q_m = 1$ where $u$ is a unit, which is not possible since the $q$'s are not units. By the same reasoning, we cannot run out of $q$'s first and so $n = m$ and the $p$'s and $q$'s can be paired off as associates.

Fields
For the record, let us give the definition of a field (a concept that we have been using).

Definition A field is a set $F$, containing at least two elements, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $F$ is an abelian group under addition.
2) The set $F^*$ of all nonzero elements in $F$ is an abelian group under multiplication.
3) (Distributivity) For all $a, b, c \in F$,

$$(a + b)c = ac + bc \quad\text{and}\quad a(b + c) = ab + ac$$

We require that $F$ have at least two elements to avoid the pathological case in which $0 = 1$.

Example 0.17 The sets $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$, of all rational, real and complex numbers, respectively, are fields under the usual operations of addition and multiplication of numbers.

Example 0.18 The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime number. We have already seen that $\mathbb{Z}_n$ is not a field if $n$ is not prime, since a field is also an integral domain. Now suppose that $n = p$ is a prime.

We have seen that $\mathbb{Z}_p$ is an integral domain and so it remains to show that every nonzero element in $\mathbb{Z}_p$ has a multiplicative inverse. Let $0 \ne a \in \mathbb{Z}_p$. Since $a < p$, we know that $a$ and $p$ are relatively prime. It follows that there exist integers $u$ and $v$ for which

$$ua + vp = 1$$

Hence, $ua \equiv 1 \bmod p$ and so $ua = 1$ in $\mathbb{Z}_p$, that is, $u$ is the multiplicative inverse of $a$.
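The proof is constructive: the integers $u$ and $v$ come from the extended Euclidean algorithm. A minimal Python sketch (the function name is ours, not the text's):

def inverse_mod(a, p):
    # extended Euclidean algorithm, tracking the coefficient of a
    old_r, r = a, p
    old_u, u = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
    assert old_r == 1, "a and p must be relatively prime"
    return old_u % p

print(inverse_mod(4, 7))   # 2, since 4 * 2 = 8 = 1 in Z_7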

The previous example shows that not all fields are infinite sets. In fact, finite fields play an extremely important role in many areas of abstract and applied mathematics.

A field $F$ is said to be algebraically closed if every nonconstant polynomial over $F$ has a root in $F$. This is equivalent to saying that every nonconstant polynomial splits into linear factors over $F$. For example, the complex field $\mathbb{C}$ is algebraically closed but the real field $\mathbb{R}$ is not. We mention without proof that every field $F$ is contained in an algebraically closed field $\overline{F}$, called the algebraic closure of $F$.

The Characteristic of a Ring
Let $R$ be a ring with identity. If $n$ is a positive integer then by $n \cdot r$, we simply mean

$$n \cdot r = \underbrace{r + \cdots + r}_{n\ \text{terms}}$$

Now, it may happen that there is a positive integer $n$ for which $n \cdot 1 = 0$.

For instance, in $\mathbb{Z}_n$, we have $n \cdot 1 = n = 0$. On the other hand, in $\mathbb{Z}$, the equation $n \cdot 1 = 0$ implies $n = 0$ and so no such positive integer exists.

Notice that, in any finite ring, there must exist such a positive integer $n$, since the infinite sequence of numbers $1 \cdot 1,\ 2 \cdot 1,\ 3 \cdot 1, \dots$ cannot be distinct and so $i \cdot 1 = j \cdot 1$ for some $i < j$, whence $(j - i) \cdot 1 = 0$.

Definition Let $R$ be a ring with identity. The smallest positive integer $n$ for which $n \cdot 1 = 0$ is called the characteristic of $R$. If no such number $n$ exists, we say that $R$ has characteristic $0$. The characteristic of $R$ is denoted by $\operatorname{char}(R)$.

If $\operatorname{char}(R) = n$ then for any $r \in R$, we have

$$n \cdot r = \underbrace{r + \cdots + r}_{n\ \text{terms}} = (\underbrace{1 + \cdots + 1}_{n\ \text{terms}})\,r = (n \cdot 1)r = 0$$

Theorem 0.30 Any finite ring has nonzero characteristic. Any finite field has prime characteristic.
Proof. We have already seen that a finite ring has nonzero characteristic. Let $F$ be a finite field and suppose that $\operatorname{char}(F) = n > 0$. If $n = ab$, where $a, b < n$, then $n \cdot 1 = 0$. Hence, $(a \cdot 1)(b \cdot 1) = 0$, implying that $a \cdot 1 = 0$ or $b \cdot 1 = 0$. In either case, we have a contradiction to the fact that $n$ is the smallest positive integer such that $n \cdot 1 = 0$. Hence, $n$ must be prime.

Notice that in any field $F$ of characteristic $2$, we have $a + a = 0$ for all $a \in F$. Thus, in $F$,

$$a = -a \quad\text{for all } a \in F$$

This property takes a bit of getting used to and makes fields of characteristic $2$ quite exceptional. (As it happens, there are many important uses for fields of characteristic $2$.)

Algebras
The final algebraic structure of which we will have use is a combination of a vector space and a ring. (We have not yet officially defined vector spaces, but we will do so before needing the following definition, which is placed here for easy reference.)

Definition An algebra $\mathcal{A}$ over a field $F$ is a nonempty set $\mathcal{A}$, together with three operations, called addition (denoted by $+$), multiplication (denoted by juxtaposition) and scalar multiplication (also denoted by juxtaposition), for which the following properties hold:
1) $\mathcal{A}$ is a vector space over $F$ under addition and scalar multiplication.
2) $\mathcal{A}$ is a ring under addition and multiplication.
3) If $r \in F$ and $a, b \in \mathcal{A}$ then

$$r(ab) = (ra)b = a(rb)$$

Thus, an algebra is a vector space in which we can take the product of vectors, or a ring in which we can multiply each element by a scalar (subject, of course, to additional requirements as given in the definition).

Part I—Basic Linear Algebra

Chapter 1
Vector Spaces

Vector Spaces
Let us begin with the definition of one of our principal objects of study.

Definition Let $F$ be a field, whose elements are referred to as scalars. A vector space over $F$ is a nonempty set $V$, whose elements are referred to as vectors, together with two operations. The first operation, called addition and denoted by $+$, assigns to each pair $(u, v)$ of vectors in $V$ a vector $u + v$ in $V$. The second operation, called scalar multiplication and denoted by juxtaposition, assigns to each pair $(r, u) \in F \times V$ a vector $ru$ in $V$. Furthermore, the following properties must be satisfied:
1) (Associativity of addition) For all vectors $u, v, w \in V$,

$$u + (v + w) = (u + v) + w$$

2) (Commutativity of addition) For all vectors $u, v \in V$, $u + v = v + u$.
3) (Existence of a zero) There is a vector $0 \in V$ with the property that $0 + u = u + 0 = u$ for all vectors $u \in V$.
4) (Existence of additive inverses) For each vector $u \in V$, there is a vector in $V$, denoted by $-u$, with the property that $u + (-u) = (-u) + u = 0$.

5) (Properties of scalar multiplication) For all scalars $r, s \in F$ and for all vectors $u, v \in V$,

$$r(u + v) = ru + rv,\quad (r + s)u = ru + su,\quad (rs)u = r(su),\quad 1u = u$$

Note that the first four properties in the definition of vector space can be summarized by saying that $V$ is an abelian group under addition.

Any expression of the form

$$r_1 v_1 + \cdots + r_n v_n$$

where $r_i \in F$ and $v_i \in V$ for all $i$, is called a linear combination of the vectors $v_1, \dots, v_n$. If at least one of the scalars $r_i$ is nonzero, then the linear combination is nontrivial.

Example 1.1
1) Let $F$ be a field. The set $F^F$ of all functions from $F$ to $F$ is a vector space over $F$, under the operations of ordinary addition and scalar multiplication of functions:

$$(f + g)(x) = f(x) + g(x) \quad\text{and}\quad (rf)(x) = r(f(x))$$

2) The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries in a field $F$ is a vector space over $F$, under the operations of matrix addition and scalar multiplication.
3) The set $F^n$ of all ordered $n$-tuples, whose components lie in a field $F$, is a vector space over $F$, with addition and scalar multiplication defined componentwise:

$$(a_1, \dots, a_n) + (b_1, \dots, b_n) = (a_1 + b_1, \dots, a_n + b_n)$$

and

² Áà Á ³ ~ ²  Áà Á  ³ When convenient, we will also write the elements of -  in column form.  When --=²Á³- is a finite field  with elements, we write for  . 4) Many sequence spaces are vector spaces. The set Seq²- ³ of all infinite sequences with members from a field - is a vector space under componentwise operations

$$(s_n) + (t_n) = (s_n + t_n)$$

and

$$r(s_n) = (r s_n)$$

In a similar way, the set $c_0$ of all sequences of complex numbers that converge to $0$ is a vector space, as is the set $\ell^\infty$ of all bounded complex sequences. Also, if $p$ is a positive integer then the set $\ell^p$ of all complex sequences $(s_n)$ for which $\sum_{n=1}^{\infty} |s_n|^p < \infty$ is a vector space under componentwise operations. To see that addition is a binary operation on $\ell^p$, one verifies Minkowski's inequality

$$\Big(\sum_{n=1}^{\infty} |s_n + t_n|^p\Big)^{1/p} \le \Big(\sum_{n=1}^{\infty} |s_n|^p\Big)^{1/p} + \Big(\sum_{n=1}^{\infty} |t_n|^p\Big)^{1/p}$$

which we will not do here.
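For readers who want to experiment, here is a quick numerical spot-check of Minkowski's inequality in Python (a check on truncated sample sequences with $p = 2$, not a proof):

s = [1 + 2j, 0.5j, -3]
t = [2 - 1j, 1, 0.25]
p = 2
norm = lambda seq: sum(abs(z) ** p for z in seq) ** (1 / p)
total = [a + b for a, b in zip(s, t)]
print(norm(total) <= norm(s) + norm(t))   # True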

Subspaces
Most algebraic structures contain substructures, and vector spaces are no exception.

Definition A subspace of a vector space $V$ is a subset $S$ of $V$ that is a vector space in its own right under the operations obtained by restricting the operations of $V$ to $S$.

Since many of the properties of addition and scalar multiplication hold a fortiori in a nonempty subset $S$, we can establish that $S$ is a subspace merely by checking that $S$ is closed under the operations of $V$.

Theorem 1.1 A nonempty subset $S$ of a vector space $V$ is a subspace of $V$ if and only if $S$ is closed under addition and scalar multiplication or, equivalently, $S$ is closed under linear combinations, that is,

$$r, s \in F,\ u, v \in S \ \Rightarrow\ ru + sv \in S$$

Example 1.2 Consider the vector space $V(n,2)$ of all binary $n$-tuples, that is, $n$-tuples of $0$'s and $1$'s. The weight $w(v)$ of a vector $v \in V(n,2)$ is the number of nonzero coordinates in $v$. For instance, $w(101010) = 3$. Let $E_n$ be the set of all vectors in $V(n,2)$ of even weight. Then $E_n$ is a subspace of $V(n,2)$.

To see this, note that

$$w(u + v) = w(u) + w(v) - 2\,w(u \cap v)$$

where $u \cap v$ is the vector in $V(n,2)$ whose $i$th component is the product of the $i$th components of $u$ and $v$, that is,

$$(u \cap v)_i = u_i \cdot v_i$$

Hence, if $w(u)$ and $w(v)$ are both even, so is $w(u + v)$. Finally, scalar multiplication over $\mathbb{F}_2$ is trivial and so $E_n$ is a subspace of $V(n,2)$, known as the even weight subspace of $V(n,2)$.

Example 1.3 Any subspace of the vector space $V(n,q)$ is called a linear code. Linear codes are among the most important and most studied types of codes, because their structure allows for efficient encoding and decoding of information.

The Lattice of Subspaces
The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is partially ordered by set inclusion. The zero subspace $\{0\}$ is the smallest element in $\mathcal{S}(V)$ and the entire space $V$ is the largest element.

If $S, T \in \mathcal{S}(V)$ then $S \cap T$ is the largest subspace of $V$ that is contained in both $S$ and $T$. In terms of set inclusion, $S \cap T$ is the greatest lower bound of $S$ and $T$:

$$S \cap T = \operatorname{glb}\{S, T\}$$

Similarly, if $\{S_i \mid i \in K\}$ is any collection of subspaces of $V$ then their intersection is the greatest lower bound of the subspaces:

$$\bigcap_{i \in K} S_i = \operatorname{glb}\{S_i \mid i \in K\}$$

On the other hand, if $S, T \in \mathcal{S}(V)$ (and $F$ is infinite) then $S \cup T \in \mathcal{S}(V)$ if and only if $S \subseteq T$ or $T \subseteq S$. Thus, the union of two subspaces is never a subspace in any "interesting" case. We also have the following.

Theorem 1.2 A nontrivial vector space $V$ over an infinite field $F$ is not the union of a finite number of proper subspaces.
Proof. Suppose that $V = S_1 \cup \cdots \cup S_n$, where we may assume that

$$S_1 \not\subseteq S_2 \cup \cdots \cup S_n$$

Let $: ±²: rÄr:  ³ and let #¤:  . Consider the infinite set (~¸ $b#“ -¹ which is the “line” through #$ , parallel to . We want to show that each : contains at most one vector from the infinite set ( , which is contrary to the fact that =~:rÄr: . This will prove the theorem. Vector Spaces 37

If $b#: for £ then $: implies #: , contrary to assumption. Next, suppose that $b#: and $b#: , for ‚ , where  £ . Then

:  ² $ b #³ c ²  $ b #³ ~ ²  c ³$ and so $: , which is also contrary to assumption.

To determine the smallest subspace of $V$ containing the subspaces $S$ and $T$, we make the following definition.

Definition Let $S$ and $T$ be subspaces of $V$. The sum $S + T$ is defined by

$$S + T = \{u + v \mid u \in S,\ v \in T\}$$

More generally, the sum of any collection $\{S_i \mid i \in K\}$ of subspaces is the set of all finite sums of vectors from the union $\bigcup S_i$:

$$\sum_{i \in K} S_i = \Big\{ s_1 + \cdots + s_n \ \Big|\ s_j \in \bigcup_{i \in K} S_i \Big\}$$

It is not hard to show that the sum of any collection of subspaces of $V$ is a subspace of $V$ and that in terms of set inclusion, the sum is the least upper bound:

$$S + T = \operatorname{lub}\{S, T\}$$

More generally,

$$\sum_{i \in K} S_i = \operatorname{lub}\{S_i \mid i \in K\}$$

If a partially ordered set $P$ has the property that every pair of elements has a least upper bound and greatest lower bound, then $P$ is called a lattice. If $P$ has a smallest element and a largest element and has the property that every collection of elements has a least upper bound and greatest lower bound, then $P$ is called a complete lattice.

Theorem 1.3 The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is a complete lattice under set inclusion, with smallest element $\{0\}$, largest element $V$,

$$\operatorname{glb}\{S_i \mid i \in K\} = \bigcap_{i \in K} S_i$$

$$\operatorname{lub}\{S_i \mid i \in K\} = \sum_{i \in K} S_i$$

Direct Sums
As we will see, there are many ways to construct new vector spaces from old ones.

External Direct Sums

Definition Let $V_1, \dots, V_n$ be vector spaces over a field $F$. The external direct sum of $V_1, \dots, V_n$, denoted by

$$V = V_1 \boxplus \cdots \boxplus V_n$$

is the vector space $V$ whose elements are ordered $n$-tuples:

$$V = \{(v_1, \dots, v_n) \mid v_i \in V_i,\ i = 1, \dots, n\}$$

with componentwise operations

²" ÁÃ Á" ³b²# ÁÃ Á# ³~²" b# ÁÃ Á" b# ³ and

$$r(v_1, \dots, v_n) = (r v_1, \dots, r v_n)$$

Example 1.4 The vector space $F^n$ is the external direct sum of $n$ copies of $F$, that is,

$$F^n = F \boxplus \cdots \boxplus F$$

where there are $n$ summands on the right-hand side.

This construction can be generalized to any collection of vector spaces by generalizing the idea that an ordered $n$-tuple $(v_1, \dots, v_n)$ is just a function $f : \{1, \dots, n\} \to \bigcup V_i$ from the index set $\{1, \dots, n\}$ to the union of the spaces, with the property that $f(i) \in V_i$.

Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be any family of vector spaces over $F$. The direct product of $\mathcal{F}$ is the vector space

$$\prod_{i \in K} V_i = \Big\{ f : K \to \bigcup_{i \in K} V_i \ \Big|\ f(i) \in V_i \Big\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$.

It will prove more useful to restrict the set of functions to those with finite support.

Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be a family of vector spaces over $F$. The support of a function $f : K \to \bigcup V_i$ is the set

$$\operatorname{supp}(f) = \{i \in K \mid f(i) \ne 0\}$$

Thus, a function $f$ has finite support if $f(i) = 0$ for all but a finite number of $i \in K$. The external direct sum of the family $\mathcal{F}$ is the vector space

$$\sideset{}{^{\mathrm{ext}}}\boxplus_{i \in K} V_i = \Big\{ f : K \to \bigcup_{i \in K} V_i \ \Big|\ f(i) \in V_i,\ f \text{ has finite support} \Big\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$.

An important special case occurs when $V_i = V$ for all $i \in K$. If we let $V^K$ denote the set of all functions from $K$ to $V$ and $(V^K)_0$ denote the set of all functions in $V^K$ that have finite support then

$$\prod_{i \in K} V = V^K \quad\text{and}\quad \sideset{}{^{\mathrm{ext}}}\boxplus_{i \in K} V = (V^K)_0$$

Note that the direct product and the external direct sum are the same for a finite family of vector spaces.

Internal Direct Sums
An internal version of the direct sum construction is often more relevant.

Definition Let $V$ be a vector space. We say that $V$ is the (internal) direct sum of a family $\mathcal{F} = \{S_i \mid i \in K\}$ of subspaces of $V$ if every vector $v \in V$ can be written, in a unique way (except for order), as a finite sum of vectors from the subspaces in $\mathcal{F}$, that is, if for all $v \in V$,

#~" bÄb" for ": and furthermore, if

$$v = w_1 + \cdots + w_m \quad\text{where } w_i \in S_{j_i}$$

then $m = n$ and (after reindexing if necessary) $w_i = u_i$ for all $i = 1, \dots, n$.

If $V$ is the direct sum of $\mathcal{F}$, we write

$$V = \bigoplus_{i \in K} S_i$$

and refer to each $S_i$ as a direct summand of $V$. If $\mathcal{F} = \{S_1, \dots, S_n\}$ is a finite family, we write

$$V = S_1 \oplus \cdots \oplus S_n$$

If $V = S \oplus T$ then $T$ is called a complement of $S$ in $V$.

Note that a sum is direct if and only if whenever $u_1 + \cdots + u_n = 0$ where $u_i \in S_{k_i}$ and $k_i \ne k_j$ for $i \ne j$, then $u_i = 0$ for all $i$, that is, if and only if $0$ has a unique representation as a sum of vectors from distinct subspaces.

The reader will be asked in a later chapter to show that the concepts of internal and external direct sum are essentially equivalent (isomorphic). For this reason, we often use the term “direct sum” without qualification. Once we have discussed the concept of a basis, the following theorem can be easily proved.

Theorem 1.4 Any subspace of a vector space has a complement, that is, if $S$ is a subspace of $V$ then there exists a subspace $T$ for which $V = S \oplus T$.

It should be emphasized that a subspace generally has many complements (although they are isomorphic). The reader can easily find examples of this in $\mathbb{R}^2$. We will have more to say about the existence and uniqueness of complements later in the book.

The following characterization of direct sums is quite useful.

Theorem 1.5 A vector space $V$ is the direct sum of a family $\mathcal{F} = \{S_i \mid i \in K\}$ of subspaces if and only if
1) $V$ is the sum of the $S_i$:

$$V = \sum_{i \in K} S_i$$

2) For each $i \in K$,

$$S_i \cap \Big( \sum_{j \ne i} S_j \Big) = \{0\}$$

Proof. Suppose first that $V$ is the direct sum of $\mathcal{F}$. Then 1) certainly holds and if

$$v \in S_i \cap \Big( \sum_{j \ne i} S_j \Big)$$

then $v = s_i$ for some $s_i \in S_i$ and

$$v = s_{j_1} + \cdots + s_{j_n}$$

where : and £  for all ~ÁÃÁ . Hence, by the uniqueness of direct sum representations,  ~ and so #~ . Thus, 2) holds.

For the converse, suppose that 1) and 2) hold. We need only verify the uniqueness condition. If

$$v = s_1 + \cdots + s_n$$

and

$$v = t_1 + \cdots + t_m$$

where : and !:    then by including additional terms equal to  we may assume that the index sets ¸ ÁÃÁ ¹ and ¸  ÁÃÁ  ¹ are the same set ¸ ÁÃÁ ¹, that is

$$v = s_1 + \cdots + s_n$$

and

$$v = t_1 + \cdots + t_n$$

Thus,

$$(s_1 - t_1) + \cdots + (s_n - t_n) = 0$$

Hence, each term $s_i - t_i \in S_{k_i}$ is a sum of vectors from subspaces other than $S_{k_i}$, which can happen only if $s_i - t_i = 0$. Thus, $s_i = t_i$ for all $i$ and $V$ is the direct sum of $\mathcal{F}$.

Example 1.5 Any matrix $A \in \mathcal{M}_n$ can be written in the form

$$A = \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t) = B + C \tag{1.1}$$

where $A^t$ is the transpose of $A$. It is easy to verify that $B$ is symmetric and $C$ is skew-symmetric and so (1.1) is a decomposition of $A$ as the sum of a symmetric matrix and a skew-symmetric matrix.

Since the sets $\operatorname{Sym}$ and $\operatorname{SkewSym}$ of all symmetric and skew-symmetric matrices in $\mathcal{M}_n$ are subspaces of $\mathcal{M}_n$, we have

$$\mathcal{M}_n = \operatorname{Sym} + \operatorname{SkewSym}$$

Furthermore, if $S + T = S' + T'$, where $S$ and $S'$ are symmetric and $T$ and $T'$ are skew-symmetric, then the matrix $U = S - S' = T' - T$ is both symmetric and skew-symmetric. Hence, provided that $\operatorname{char}(F) \ne 2$, we must have $U = 0$ and so $S = S'$ and $T = T'$. Thus,

$$\mathcal{M}_n = \operatorname{Sym} \oplus \operatorname{SkewSym}$$
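A minimal numpy sketch of decomposition (1.1) (the sample matrix is our own):

import numpy as np

A = np.array([[1.0, 2.0], [5.0, 3.0]])
B = (A + A.T) / 2          # symmetric part
C = (A - A.T) / 2          # skew-symmetric part
assert np.allclose(B, B.T) and np.allclose(C, -C.T)
assert np.allclose(A, B + C)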

Spanning Sets and Linear Independence
A set of vectors spans a vector space if every vector can be written as a linear combination of some of the vectors in that set. Here is the formal definition.

Definition The subspace spanned (or subspace generated) by a set $S$ of vectors in $V$ is the set of all linear combinations of vectors from $S$:

$$\langle S \rangle = \operatorname{span}(S) = \{ r_1 v_1 + \cdots + r_n v_n \mid r_i \in F,\ v_i \in S \}$$

When $S = \{v_1, \dots, v_n\}$ is a finite set, we use the notation $\langle v_1, \dots, v_n \rangle$ or $\operatorname{span}(v_1, \dots, v_n)$. A set $S$ of vectors in $V$ is said to span $V$, or generate $V$, if $V = \operatorname{span}(S)$, that is, if every vector $v \in V$ can be written in the form

$$v = r_1 v_1 + \cdots + r_n v_n$$

for some scalars $r_1, \dots, r_n$ and vectors $v_1, \dots, v_n \in S$.

It is clear that any superset of a spanning set is also a spanning set. Note also that all vector spaces have spanning sets, since the entire space is a spanning set.

Definition A nonempty set $S$ of vectors in $V$ is linearly independent if for any $v_1, \dots, v_n$ in $S$, we have

$$r_1 v_1 + \cdots + r_n v_n = 0 \ \Rightarrow\ r_1 = \cdots = r_n = 0$$

If a set of vectors is not linearly independent, it is said to be linearly dependent.

It follows from the definition that any nonempty subset of a linearly independent set is linearly independent.

Theorem 1.6 Let $S$ be a set of vectors in $V$. The following are equivalent:
1) $S$ is linearly independent.
2) Every vector in $\operatorname{span}(S)$ has a unique expression as a linear combination of the vectors in $S$.
3) No vector in $S$ is a linear combination of the other vectors in $S$.

The following key theorem relates the notions of spanning set and linear independence.

Theorem 1.7 Let $S$ be a set of vectors in $V$. The following are equivalent:
1) $S$ is linearly independent and spans $V$.
2) For every vector $v \in V$, there is a unique set of vectors $v_1, \dots, v_n$ in $S$, along with a unique set of scalars $r_1, \dots, r_n$ in $F$, for which

$$v = r_1 v_1 + \cdots + r_n v_n$$

3) $S$ is a minimal spanning set, that is, $S$ spans $V$ but any proper subset of $S$ does not span $V$.
4) $S$ is a maximal linearly independent set, that is, $S$ is linearly independent, but any proper superset of $S$ is not linearly independent.
Proof. We leave it to the reader to show that 1) and 2) are equivalent. Now suppose 1) holds. Then $S$ is a spanning set. If some proper subset $S'$ of $S$ also spanned $V$ then any vector in $S \setminus S'$ would be a linear combination of the vectors in $S'$, contradicting the fact that the vectors in $S$ are linearly independent. Hence 1) implies 3).

Conversely, if $S$ is a minimal spanning set then it must be linearly independent. For if not, some vector $s \in S$ would be a linear combination of the other vectors in $S$ and so $S \setminus \{s\}$ would be a proper spanning subset of $S$, which is not possible. Hence 3) implies 1).

Suppose again that 1) holds. If $S$ were not maximal, there would be a vector $v \in V \setminus S$ for which the set $S \cup \{v\}$ is linearly independent. But then $v$ is not in the span of $S$, contradicting the fact that $S$ is a spanning set. Hence, $S$ is a maximal linearly independent set and so 1) implies 4).

Conversely, if $S$ is a maximal linearly independent set then $S$ must span $V$, for if not, we could find a vector $v \in V \setminus S$ that is not a linear combination of the vectors in $S$. Hence, $S \cup \{v\}$ would be a linearly independent proper superset of $S$, which is a contradiction. Thus, 4) implies 1).

Definition A set of vectors in $V$ that satisfies any (and hence all) of the equivalent conditions in Theorem 1.7 is called a basis for $V$.

Corollary 1.8 A finite set $S = \{v_1, \dots, v_n\}$ of vectors in $V$ is a basis for $V$ if and only if

$$V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle$$

Example 1.6 The $i$th standard vector in $F^n$ is the vector $e_i$ that has $0$'s in all coordinate positions except the $i$th, where it has a $1$. Thus,

$$e_1 = (1, 0, \dots, 0),\quad e_2 = (0, 1, \dots, 0),\ \dots,\ e_n = (0, \dots, 0, 1)$$

The set $\{e_1, \dots, e_n\}$ is called the standard basis for $F^n$.

The proof that every nontrivial vector space has a basis is a classic example of the use of Zorn's lemma.

Theorem 1.9 Let $V$ be a nonzero vector space. Let $I$ be a linearly independent set in $V$ and let $S$ be a spanning set in $V$ containing $I$. Then there is a basis $\mathcal{B}$ for $V$ for which $I \subseteq \mathcal{B} \subseteq S$. In particular,
1) Any vector space, except the zero space $\{0\}$, has a basis.
2) Any linearly independent set in $V$ is contained in a basis.
3) Any spanning set in $V$ contains a basis.
Proof. Consider the collection $\mathcal{A}$ of all linearly independent subsets of $V$ containing $I$ and contained in $S$. This collection is not empty, since $I \in \mathcal{A}$. Now, if

$\mathcal{C} = \{I_k \mid k \in K\}$ is a chain in $\mathcal{A}$ then the union

$$U = \bigcup_{k \in K} I_k$$

is linearly independent and satisfies $I \subseteq U \subseteq S$, that is, $U \in \mathcal{A}$. Hence, every chain in $\mathcal{A}$ has an upper bound in $\mathcal{A}$ and according to Zorn's lemma, $\mathcal{A}$ must contain a maximal element $\mathcal{B}$, which is linearly independent.

Now, $\mathcal{B}$ is a basis for the vector space $\langle S \rangle = V$, for if any $s \in S$ is not a linear combination of the elements of $\mathcal{B}$ then $\mathcal{B} \cup \{s\} \subseteq S$ is linearly independent, contradicting the maximality of $\mathcal{B}$. Hence $S \subseteq \langle \mathcal{B} \rangle$ and so $V = \langle S \rangle \subseteq \langle \mathcal{B} \rangle$.

The reader can now show, using Theorem 1.9, that any subspace of a vector space has a complement.

The Dimension of a Vector Space
The next result, with its classical elegant proof, says that if a vector space $V$ has a finite spanning set $S$ then the size of any linearly independent set cannot exceed the size of $S$.

Theorem 1.10 Let $V$ be a vector space and assume that the vectors $v_1, \dots, v_n$ are linearly independent and the vectors $s_1, \dots, s_m$ span $V$. Then $n \le m$.
Proof. First, we list the two sets of vectors: the spanning set followed by the linearly independent set:

$$s_1, \dots, s_m;\quad v_1, \dots, v_n$$

Then we move the first vector $v_1$ to the front of the first list:

$$v_1, s_1, \dots, s_m;\quad v_2, \dots, v_n$$

Since $s_1, \dots, s_m$ span $V$, $v_1$ is a linear combination of the $s_i$'s. This implies that we may remove one of the $s_i$'s, which by reindexing if necessary can be $s_1$, from the first list and still have a spanning set:

$$v_1, s_2, \dots, s_m;\quad v_2, \dots, v_n$$

Note that the first set of vectors still spans $V$ and the second set is still linearly independent.

Now we repeat the process, moving $v_2$ from the second list to the first list:

$$v_2, v_1, s_2, \dots, s_m;\quad v_3, \dots, v_n$$

As before, the vectors in the first list are linearly dependent, since they spanned $V$ before the inclusion of $v_2$. However, since the $v_i$'s are linearly independent, any nontrivial linear combination of the vectors in the first list that equals $0$

must involve at least one of the $s_i$'s. Hence, we may remove that vector, which again by reindexing if necessary may be taken to be $s_2$, and still have a spanning set:

$$v_2, v_1, s_3, \dots, s_m;\quad v_3, \dots, v_n$$

Once again, the first set of vectors spans $V$ and the second set is still linearly independent.

Now, if $n > m$, then this process will eventually exhaust the $s_i$'s and lead to the list

$$v_m, \dots, v_1;\quad v_{m+1}, \dots, v_n$$

where $v_1, \dots, v_m$ span $V$, which is clearly not possible since $v_{m+1}$ is not in the span of $v_1, \dots, v_m$. Hence, $n \le m$.

Corollary 1.11 If $V$ has a finite spanning set then any two bases of $V$ have the same size.

Now let us prove Corollary 1.11 for arbitrary vector spaces.

Theorem 1.12 If $V$ is a vector space then any two bases for $V$ have the same cardinality.
Proof. We may assume that all bases for $V$ are infinite sets, for if any basis is finite then $V$ has a finite spanning set and so Corollary 1.11 applies.

Let $\mathcal{B} = \{b_i \mid i \in I\}$ be a basis for $V$ and let $\mathcal{C}$ be another basis for $V$. Then any vector $c \in \mathcal{C}$ can be written as a finite linear combination of the vectors in $\mathcal{B}$, where all of the coefficients are nonzero, say

$$c = \sum_{b \in U_c} r_b\, b$$

where $U_c$ is a finite subset of $\mathcal{B}$. But because $\mathcal{C}$ is a basis, we must have

$$\bigcup_{c \in \mathcal{C}} U_c = \mathcal{B}$$

for if the vectors in $\mathcal{C}$ can be expressed as finite linear combinations of the vectors in a proper subset $\mathcal{B}'$ of $\mathcal{B}$, then $\mathcal{B}'$ spans $V$, which is not the case.

Since $|U_c| < \aleph_0$ for all $c \in \mathcal{C}$, Theorem 0.16 implies that

$$|\mathcal{B}| = \Big| \bigcup_{c \in \mathcal{C}} U_c \Big| \le \aleph_0 |\mathcal{C}| = |\mathcal{C}|$$

But we may also reverse the roles of $\mathcal{B}$ and $\mathcal{C}$, to conclude that $|\mathcal{C}| \le |\mathcal{B}|$ and so $|\mathcal{B}| = |\mathcal{C}|$ by the Schröder–Bernstein theorem.

Theorem 1.12 allows us to make the following definition.

Definition A vector space $V$ is finite-dimensional if it is the zero space $\{0\}$, or if it has a finite basis. All other vector spaces are infinite-dimensional. The dimension of the zero space is $0$ and the dimension of any nonzero vector space $V$ is the cardinality of any basis for $V$. If a vector space $V$ has a basis of cardinality $\kappa$, we say that $V$ is $\kappa$-dimensional and write $\dim(V) = \kappa$.

It is easy to see that if $S$ is a subspace of $V$ then $\dim(S) \le \dim(V)$. If in addition, $\dim(S) = \dim(V) < \infty$ then $S = V$.

Theorem 1.13 Let $V$ be a vector space.
1) If $\mathcal{B}$ is a basis for $V$ and if $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ where $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$, then

$$V = \langle \mathcal{B}_1 \rangle \oplus \langle \mathcal{B}_2 \rangle$$

2) Let $V = S \oplus T$. If $\mathcal{B}_1$ is a basis for $S$ and $\mathcal{B}_2$ is a basis for $T$ then $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$ and $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ is a basis for $V$.

Theorem 1.14 Let $S$ and $T$ be subspaces of a vector space $V$. Then

$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$

In particular, if $T$ is any complement of $S$ in $V$ then

$$\dim(S) + \dim(T) = \dim(V)$$

that is,

$$\dim(S \oplus T) = \dim(S) + \dim(T)$$

Proof. Suppose that $\mathcal{B} = \{b_i \mid i \in I\}$ is a basis for $S \cap T$. Extend this to a basis $\mathcal{A} \cup \mathcal{B}$ for $S$, where $\mathcal{A} = \{a_j \mid j \in J\}$ is disjoint from $\mathcal{B}$. Also, extend $\mathcal{B}$ to a basis $\mathcal{B} \cup \mathcal{C}$ for $T$, where $\mathcal{C} = \{c_k \mid k \in K\}$ is disjoint from $\mathcal{B}$. We claim that $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is a basis for $S + T$. It is clear that $\langle \mathcal{A} \cup \mathcal{B} \cup \mathcal{C} \rangle = S + T$.

To see that $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is linearly independent, suppose to the contrary that

$$r_1 v_1 + \cdots + r_n v_n = 0$$

where $v_i \in \mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ and $r_i \ne 0$ for all $i$. There must be vectors $v_i$ in this expression from both $\mathcal{A}$ and $\mathcal{C}$, since $\mathcal{A} \cup \mathcal{B}$ and $\mathcal{B} \cup \mathcal{C}$ are linearly independent. Isolating the terms involving the vectors from $\mathcal{A}$ on one side of the equality shows that there is a nonzero vector $x \in \langle \mathcal{A} \rangle \cap \langle \mathcal{B} \cup \mathcal{C} \rangle$. But then $x \in S \cap T$ and so $x \in \langle \mathcal{A} \rangle \cap \langle \mathcal{B} \rangle$, which implies that $x = 0$, a contradiction. Hence, $\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}$ is linearly independent and a basis for $S + T$.

Now,

$$\dim(S) + \dim(T) = |\mathcal{A} \cup \mathcal{B}| + |\mathcal{B} \cup \mathcal{C}| = |\mathcal{A}| + |\mathcal{B}| + |\mathcal{B}| + |\mathcal{C}| = |\mathcal{A} \cup \mathcal{B} \cup \mathcal{C}| + |\mathcal{B}| = \dim(S + T) + \dim(S \cap T)$$

as desired.
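A concrete instance of Theorem 1.14 (our own example, not the text's): take two coordinate planes in $\mathbb{R}^3$,

$$S = \langle e_1, e_2 \rangle,\quad T = \langle e_2, e_3 \rangle:\qquad \dim(S) + \dim(T) = 2 + 2 = 4 = \underbrace{\dim(S + T)}_{3} + \underbrace{\dim(S \cap T)}_{1}$$

since $S + T = \mathbb{R}^3$ and $S \cap T = \langle e_2 \rangle$.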

It is worth emphasizing that while the equation

$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$

holds for all vector spaces, we cannot write

$$\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$$

unless $S + T$ is finite-dimensional.

Ordered Bases and Coordinate Matrices
It will be convenient to consider bases that have an order imposed upon their members.

Definition Let $V$ be a vector space of dimension $n$. An ordered basis for $V$ is an ordered $n$-tuple $(v_1, \dots, v_n)$ of vectors for which the set $\{v_1, \dots, v_n\}$ is a basis for $V$.

If $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for $V$ then for each $v \in V$ there is a unique ordered $n$-tuple $(r_1, \dots, r_n)$ of scalars for which

$$v = r_1 v_1 + \cdots + r_n v_n$$

Accordingly, we can define the coordinate map $\phi_{\mathcal{B}} : V \to F^n$ by

$$\phi_{\mathcal{B}}(v) = [v]_{\mathcal{B}} = \begin{pmatrix} r_1 \\ \vdots \\ r_n \end{pmatrix} \tag{1.3}$$

where the column matrix $[v]_{\mathcal{B}}$ is known as the coordinate matrix of $v$ with respect to the ordered basis $\mathcal{B}$. Clearly, knowing $[v]_{\mathcal{B}}$ is equivalent to knowing $v$ (assuming knowledge of $\mathcal{B}$).
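Computationally, finding $[v]_{\mathcal{B}}$ amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A minimal numpy sketch (the basis of $\mathbb{R}^3$ is an assumed example):

import numpy as np

# columns of B are the ordered basis vectors b1 = (1,1,0), b2 = (0,1,1), b3 = (0,0,1)
B = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
v = np.array([2.0, 3.0, 4.0])
coords = np.linalg.solve(B, v)      # the coordinate matrix [v]_B
assert np.allclose(B @ coords, v)   # v = r1*b1 + r2*b2 + r3*b3
print(coords)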

Furthermore, it is easy to see that the coordinate map $\phi_{\mathcal{B}}$ is bijective and preserves the vector space operations, that is,

$$\phi_{\mathcal{B}}(r_1 v_1 + \cdots + r_n v_n) = r_1 \phi_{\mathcal{B}}(v_1) + \cdots + r_n \phi_{\mathcal{B}}(v_n)$$

or equivalently

$$[r_1 v_1 + \cdots + r_n v_n]_{\mathcal{B}} = r_1 [v_1]_{\mathcal{B}} + \cdots + r_n [v_n]_{\mathcal{B}}$$

Functions from one vector space to another that preserve the vector space operations are called linear transformations and form the objects of study in the next chapter.

The Row and Column Spaces of a Matrix
Let $A$ be an $m \times n$ matrix over $F$. The rows of $A$ span a subspace of $F^n$ known as the row space of $A$ and the columns of $A$ span a subspace of $F^m$ known as the column space of $A$. The dimensions of these spaces are called the row rank and column rank, respectively. We denote the row space and row rank by $\operatorname{rs}(A)$ and $\operatorname{rrk}(A)$ and the column space and column rank by $\operatorname{cs}(A)$ and $\operatorname{crk}(A)$.

It is a remarkable and useful fact that the row rank of a matrix is always equal to its column rank, despite the fact that if $m \ne n$, the row space and column space are not even in the same vector space!

Our proof of this fact hinges upon the following simple observation about matrices.

Lemma 1.15 Let $A$ be an $m \times n$ matrix. Then elementary column operations do not affect the row rank of $A$. Similarly, elementary row operations do not affect the column rank of $A$.
Proof. The second statement follows from the first by taking transposes. As to the first, the row space of $A$ is

$$\operatorname{rs}(A) = \langle e_1 A, \dots, e_m A \rangle$$

where the $e_i$ are the standard basis (row) vectors in $F^m$. Performing an elementary column operation on $A$ is equivalent to multiplying $A$ on the right by an elementary matrix $E$. Hence the row space of $AE$ is

$$\operatorname{rs}(AE) = \langle e_1 AE, \dots, e_m AE \rangle$$

and since $E$ is invertible,

$$\operatorname{rrk}(A) = \dim(\operatorname{rs}(A)) = \dim(\operatorname{rs}(AE)) = \operatorname{rrk}(AE)$$

as desired.
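A numerical illustration of the lemma (not a proof; the matrix $A$ and the elementary matrix $E$ are our own examples):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
E = np.eye(3)
E[0, 2] = 5.0   # elementary column operation: add 5 * column 1 to column 3
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A @ E))   # 2 2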

Theorem 1.16 If $A \in \mathcal{M}_{m,n}$ then $\operatorname{rrk}(A) = \operatorname{crk}(A)$. This number is called the rank of $A$ and is denoted by $\operatorname{rk}(A)$.
Proof. According to the previous lemma, we may reduce $A$ to reduced column echelon form without affecting the row rank. But this reduction does not affect the column rank either. Then we may further reduce $A$ to reduced row echelon form without affecting either rank. The resulting matrix $M$ has the same row and column ranks as $A$. But $M$ is a matrix with $1$'s followed by $0$'s on the main diagonal (entries $M_{11}, M_{22}, \dots$) and $0$'s elsewhere. Hence,

$$\operatorname{rrk}(A) = \operatorname{rrk}(M) = \operatorname{crk}(M) = \operatorname{crk}(A)$$

as desired.

The Complexification of a Real Vector Space
If $W$ is a complex vector space (that is, a vector space over $\mathbb{C}$), then we can think of $W$ as a real vector space simply by restricting all scalars to the field $\mathbb{R}$. Let us denote this real vector space by $W_{\mathbb{R}}$ and call it the real version of $W$.

On the other hand, to each real vector space $V$, we can associate a complex vector space $V^{\mathbb{C}}$. This "complexification" process will play a useful role when we discuss the structure of linear operators on a real vector space. (Throughout our discussion $V$ will denote a real vector space.)

Definition If $V$ is a real vector space then the set $V^{\mathbb{C}} = V \times V$ of ordered pairs, with componentwise addition

$$(u, v) + (x, y) = (u + x,\ v + y)$$

and scalar multiplication over $\mathbb{C}$ defined by

$$(a + bi)(u, v) = (au - bv,\ av + bu) \quad\text{for } a, b \in \mathbb{R}$$

is a complex vector space, called the complexification of $V$.

It is convenient to introduce a notation for vectors in $V^{\mathbb{C}}$ that resembles complex numbers. In particular, we denote $(u, v) \in V^{\mathbb{C}}$ by $u + iv$ and so

$$V^{\mathbb{C}} = \{u + iv \mid u, v \in V\}$$

Addition now looks like ordinary addition of complex numbers:

$$(u + iv) + (x + iy) = (u + x) + i(v + y)$$

and scalar multiplication looks like ordinary multiplication of complex numbers:

$$(a + bi)(u + iv) = (au - bv) + i(av + bu)$$

Thus, for example, we immediately have for $a, b \in \mathbb{R}$:

$$a(u + iv) = au + iav,\quad i(u + iv) = -v + iu,\quad (a + bi)u = au + ibu,\quad (a + bi)(iv) = -bv + iav$$

The real part of $z = u + iv$ is $u \in V$ and the imaginary part of $z$ is $v \in V$. The essence of the fact that $z = u + iv \in V^{\mathbb{C}}$ is really an ordered pair is that $z$ is $0$ if and only if its real and imaginary parts are both $0$.

We can define the complexification map $\operatorname{cpx} : V \to V^{\mathbb{C}}$ by

$$\operatorname{cpx}(v) = v + i0$$

Let us refer to $v + i0$ as the complexification, or complex version, of $v \in V$. Note that this map is a group homomorphism, that is,

$$\operatorname{cpx}(0) = 0 + i0 \quad\text{and}\quad \operatorname{cpx}(u \pm v) = \operatorname{cpx}(u) \pm \operatorname{cpx}(v)$$

and it is injective:

$$\operatorname{cpx}(u) = \operatorname{cpx}(v) \Leftrightarrow u = v$$

Also, it preserves multiplication by real scalars:

$$\operatorname{cpx}(ru) = ru + i0 = r(u + i0) = r\operatorname{cpx}(u)$$

for $r \in \mathbb{R}$. However, the complexification map is not surjective, since it gives only "real" vectors in $V^{\mathbb{C}}$.

The complexification map is an injective linear transformation from the real vector space $V$ to the real version $(V^{\mathbb{C}})_{\mathbb{R}}$ of the complexification, that is, to the complex vector space $V^{\mathbb{C}}$ provided that scalars are restricted to real numbers. In this way, we see that $V^{\mathbb{C}}$ contains an embedded copy of $V$.

The Dimension of $V^{\mathbb{C}}$
The vector-space dimensions of $V$ and $V^{\mathbb{C}}$ are the same. This should not necessarily come as a surprise because although $V^{\mathbb{C}}$ may seem "bigger" than $V$, the field of scalars is also "bigger."

Theorem 1.17 If $\mathcal{B} = \{v_i \mid i \in I\}$ is a basis for $V$ over $\mathbb{R}$ then the complexification of $\mathcal{B}$,

$$\operatorname{cpx}(\mathcal{B}) = \{v + i0 \mid v \in \mathcal{B}\}$$

is a basis for the vector space $V^{\mathbb{C}}$ over $\mathbb{C}$. Hence,

$$\dim(V^{\mathbb{C}}) = \dim(V)$$

Proof. To see that $\operatorname{cpx}(\mathcal{B})$ spans $V^{\mathbb{C}}$ over $\mathbb{C}$, let $x + iy \in V^{\mathbb{C}}$. Then $x, y \in V$ and so there exist real numbers $r_j$ and $s_j$ (some of which may be $0$) for which

$$x + iy = \sum_{j=1}^{n} r_j v_j + i\Big(\sum_{j=1}^{n} s_j v_j\Big) = \sum_{j=1}^{n} (r_j v_j + i s_j v_j) = \sum_{j=1}^{n} (r_j + i s_j)(v_j + i0)$$

To see that $\operatorname{cpx}(\mathcal{B})$ is linearly independent, if

$$\sum_{j=1}^{n} (r_j + i s_j)(v_j + i0) = 0 + i0$$

then the previous computations show that

$$\sum_{j=1}^{n} r_j v_j = 0 \quad\text{and}\quad \sum_{j=1}^{n} s_j v_j = 0$$

The independence of $\mathcal{B}$ then implies that $r_j = 0$ and $s_j = 0$ for all $j$.

If #= and 8 is a basis for = then we may write  #~ # ~ for  s . Since the coefficients are real, we have  #b~ ²# b³ ~ and so the coordinate matrices are equal

$$[v + i0]_{\operatorname{cpx}(\mathcal{B})} = [v]_{\mathcal{B}}$$

Exercises
1. Let $V$ be a vector space over $F$. Prove that $0v = 0$ and $r0 = 0$ for all $v \in V$ and $r \in F$. Describe the different $0$'s in these equations. Prove that if $rv = 0$ then $r = 0$ or $v = 0$. Prove that $rv = v$ implies that $v = 0$ or $r = 1$.
2. Prove Theorem 1.3.
3. a) Find an abelian group $V$ and a field $F$ for which $V$ is a vector space over $F$ in at least two different ways, that is, there are two different definitions of scalar multiplication making $V$ a vector space over $F$.

b) Find a vector space $V$ over $F$ and a subset $S$ of $V$ that is (1) not a subspace of $V$ and (2) a vector space using operations that differ from those of $V$.
4. Suppose that $V$ is a vector space with basis $\mathcal{B} = \{b_i \mid i \in I\}$ and $S$ is a subspace of $V$. Let $\{\mathcal{B}_1, \dots, \mathcal{B}_k\}$ be a partition of $\mathcal{B}$. Then is it true that

$$S = \bigoplus_{j=1}^{k} (S \cap \langle \mathcal{B}_j \rangle)?$$

What if $S \cap \langle \mathcal{B}_j \rangle \ne \{0\}$ for all $j$?
5. Prove Corollary 1.8.
6. Let $S, T, U \in \mathcal{S}(V)$. Show that if $U \subseteq S$ then

$$S \cap (T + U) = (S \cap T) + U$$

This is called the modular law for the lattice $\mathcal{S}(V)$.
7. For what vector spaces does the distributive law of subspaces

$$S \cap (T + U) = (S \cap T) + (S \cap U)$$

hold?
8. A vector $v = (r_1, \dots, r_n) \in \mathbb{R}^n$ is called strongly positive if $r_i > 0$ for all $i = 1, \dots, n$.
a) Suppose that $v$ is strongly positive. Show that any vector that is "close enough" to $v$ is also strongly positive. (Formulate carefully what "close enough" should mean.)
b) Prove that if a subspace $S$ of $\mathbb{R}^n$ contains a strongly positive vector, then $S$ has a basis of strongly positive vectors.
9. Let $M$ be an $m \times n$ matrix whose rows are linearly independent. Suppose

that the $k$ columns $c_{i_1}, \dots, c_{i_k}$ of $M$ span the column space of $M$. Let $C$ be

the matrix obtained from $M$ by deleting all columns except $c_{i_1}, \dots, c_{i_k}$. Show that the rows of $C$ are also linearly independent.
10. Prove that the first two statements in Theorem 1.7 are equivalent.
11. Show that if $S$ is a subspace of a vector space $V$ then $\dim(S) \le \dim(V)$. Furthermore, if $\dim(S) = \dim(V) < \infty$ then $S = V$. Give an example to show that the finiteness is required in the second statement.
12. Let $\dim(V) < \infty$ and suppose that $V = \dots$

15. Prove that the vector space of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ is infinite-dimensional.
16. Show that Theorem 1.2 need not hold if the base field $F$ is finite.
17. Let $S$ be a subspace of $V$. The set $v + S = \{v + s \mid s \in S\}$ is called an affine subspace of $V$.
a) Under what conditions is an affine subspace of $V$ a subspace of $V$?
b) Show that any two affine subspaces of the form $v + S$ and $w + S$ are either equal or disjoint.
18. If $V$ and $W$ are vector spaces over $F$ for which $|V| = |W|$, then does it follow that $\dim(V) = \dim(W)$?
19. Let $V$ be an $n$-dimensional real vector space and suppose that $S$ is a subspace of $V$ with $\dim(S) = n - 1$. Define an equivalence relation $\equiv$ on the set $V \setminus S$ by $v \equiv w$ if the "line segment"

$$L(v, w) = \{rv + (1 - r)w \mid 0 \le r \le 1\}$$

has the property that $L(v, w) \cap S = \emptyset$. Prove that $\equiv$ is an equivalence relation and that it has exactly two equivalence classes.
20. Let $F$ be a field. A subfield of $F$ is a subset $K$ of $F$ that is a field in its own right using the same operations as defined on $F$.
a) Show that $F$ is a vector space over any subfield $K$ of $F$.
b) Suppose that $F$ is an $m$-dimensional vector space over a subfield $K$ of $F$. If $V$ is an $n$-dimensional vector space over $F$, show that $V$ is also a vector space over $K$. What is the dimension of $V$ as a vector space over $K$?
21. Let $F$ be a finite field of size $q$ and let $V$ be an $n$-dimensional vector space over $F$. The purpose of this exercise is to show that the number of subspaces of $V$ of dimension $k$ is

$$\binom{n}{k}_q = \frac{(q^n - 1)(q^n - q)\cdots(q^n - q^{k-1})}{(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})}$$

The expressions $\binom{n}{k}_q$ are called Gaussian coefficients and have properties similar to those of the binomial coefficients. Let $S(n, k)$ be the number of $k$-dimensional subspaces of $V$.
a) Let $N(n, k)$ be the number of $k$-tuples of linearly independent vectors $(v_1, \dots, v_k)$ in $V$. Show that

$$N(n, k) = (q^n - 1)(q^n - q)\cdots(q^n - q^{k-1})$$

b) Now, each of the $k$-tuples in a) can be obtained by first choosing a subspace of $V$ of dimension $k$ and then selecting the vectors from this subspace. Show that for any $k$-dimensional subspace of $V$, the number of $k$-tuples of independent vectors in this subspace is

$$(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$

c) Show that

$$N(n, k) = S(n, k)(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$

How does this complete the proof?
22. Prove that any subspace $S$ of $\mathbb{R}^n$ is a closed set or, equivalently, that its set complement $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, \rho)$ centered at $x$ with radius $\rho > 0$ for which $B(x, \rho) \subseteq S^c$.
23. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ and $\mathcal{C} = \{w_1, \dots, w_n\}$ be bases for a vector space $V$. Let $1 \le k \le n - 1$. Show that there is a permutation $\sigma$ of $\{1, \dots, n\}$ such that

$$v_1, \dots, v_k,\ w_{\sigma(k+1)}, \dots, w_{\sigma(n)}$$

and

$$w_{\sigma(1)}, \dots, w_{\sigma(k)},\ v_{k+1}, \dots, v_n$$

are both bases for $V$.
24. Let $\dim(V) = n$ and suppose that $S_1, \dots, S_k$ are subspaces of $V$ with $\dim(S_i) \le m < n$. Prove that there is a subspace $T$ of $V$ of dimension $n - m$ for which $T \cap S_i = \{0\}$ for all $i$.
25. What is the dimension of the complexification $V^{\mathbb{C}}$ thought of as a real vector space?
26. (When is a subspace of a complex vector space a complexification?) Let $V$ be a real vector space with complexification $V^{\mathbb{C}}$ and let $U$ be a subspace of $V^{\mathbb{C}}$. Prove that there is a subspace $S$ of $V$ for which $U = \{s + it \mid s, t \in S\}$

if and only if $U$ is closed under complex conjugation $\sigma : V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined by $\sigma(u + iv) = u - iv$.

Chapter 2
Linear Transformations

Linear Transformations
Loosely speaking, a linear transformation is a function from one vector space to another that preserves the vector space operations. Let us be more precise.

Definition Let $V$ and $W$ be vector spaces over a field $F$. A function $\tau : V \to W$ is a linear transformation if

$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$

for all scalars $r, s \in F$ and vectors $u, v \in V$. A linear transformation $\tau : V \to V$ is called a linear operator on $V$. The set of all linear transformations from $V$ to $W$ is denoted by $\mathcal{L}(V, W)$ and the set of all linear operators on $V$ is denoted by $\mathcal{L}(V)$.

We should mention that some authors use the term linear operator for any linear transformation from $V$ to $W$.

Definition The following terms are also employed:
1) homomorphism for linear transformation
2) endomorphism for linear operator
3) monomorphism (or embedding) for injective linear transformation
4) epimorphism for surjective linear transformation
5) isomorphism for bijective linear transformation
6) automorphism for bijective linear operator.

Example 2.1
1) The derivative $D : V \to V$ is a linear operator on the vector space $V$ of all infinitely differentiable functions on $\mathbb{R}$.

2) The integral operator $\tau : F[x] \to F[x]$ defined by

$$\tau(f) = \int_0^x f(t)\,dt$$

is a linear operator on $F[x]$.
3) Let $A$ be an $m \times n$ matrix over $F$. The function $\tau_A : F^n \to F^m$ defined by $\tau_A(v) = Av$, where all vectors are written as column vectors, is a linear transformation from $F^n$ to $F^m$. This function is just multiplication by $A$.
4) The coordinate map $\phi : V \to F^n$ of an $n$-dimensional vector space is a linear transformation from $V$ to $F^n$.

The set $\mathcal{L}(V, W)$ is a vector space in its own right and $\mathcal{L}(V)$ has the structure of an algebra, as defined in Chapter 0.

Theorem 2.1
1) The set $\mathcal{L}(V, W)$ is a vector space under ordinary addition of functions and scalar multiplication of functions by elements of $F$.
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$ then the composition $\tau\sigma$ is in $\mathcal{L}(U, W)$.
3) If $\tau \in \mathcal{L}(V, W)$ is bijective then $\tau^{-1} \in \mathcal{L}(W, V)$.
4) The vector space $\mathcal{L}(V)$ is an algebra, where multiplication is composition of functions. The identity map $\iota \in \mathcal{L}(V)$ is the multiplicative identity and the zero map $0 \in \mathcal{L}(V)$ is the additive identity.
Proof. We prove only part 3). Let $\tau : V \to W$ be a bijective linear transformation. Then $\tau^{-1} : W \to V$ is a well-defined function and since any two vectors $w_1$ and $w_2$ in $W$ have the form $w_1 = \tau(v_1)$ and $w_2 = \tau(v_2)$, we have

$$\tau^{-1}(rw_1 + sw_2) = \tau^{-1}(r\tau(v_1) + s\tau(v_2)) = \tau^{-1}(\tau(rv_1 + sv_2)) = rv_1 + sv_2 = r\tau^{-1}(w_1) + s\tau^{-1}(w_2)$$

which shows that $\tau^{-1}$ is linear.

One of the easiest ways to define a linear transformation is to give its values on a basis. The following theorem says that we may assign these values arbitrarily and obtain a unique linear transformation by linear extension to the entire domain.

Theorem 2.2 Let $V$ and $W$ be vector spaces and let $\mathcal{B} = \{v_i \mid i \in I\}$ be a basis for $V$. Then we can define a linear transformation $\tau \in \mathcal{L}(V, W)$ by specifying the values of $\tau(v_i) \in W$ arbitrarily for all $v_i \in \mathcal{B}$ and extending the domain of $\tau$ to $V$ using linearity, that is,

$$\tau(r_1 v_1 + \cdots + r_n v_n) = r_1 \tau(v_1) + \cdots + r_n \tau(v_n)$$

This process uniquely defines a linear transformation, that is, if $\tau, \sigma \in \mathcal{L}(V, W)$ satisfy $\tau(v_i) = \sigma(v_i)$ for all $v_i \in \mathcal{B}$ then $\tau = \sigma$.
Proof. The crucial point is that the extension by linearity is well-defined, since each vector in $V$ has a unique representation as a linear combination of a finite number of vectors in $\mathcal{B}$. We leave the details to the reader.
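A minimal Python sketch of extension by linearity in $\mathbb{R}^2$ (the basis and the assigned values are assumed examples, not from the text):

import numpy as np

b1, b2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])   # a basis of R^2
t1, t2 = np.array([0.0, 2.0]), np.array([3.0, 0.0])    # arbitrary values tau(b1), tau(b2)

def tau(v):
    # write v = r1*b1 + r2*b2, then map to r1*tau(b1) + r2*tau(b2)
    r1, r2 = np.linalg.solve(np.column_stack([b1, b2]), v)
    return r1 * t1 + r2 * t2

print(tau(b1 + 2 * b2))   # equals tau(b1) + 2*tau(b2)
print(t1 + 2 * t2)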

Note that if $\tau \in \mathcal{L}(V, W)$ and if $S$ is a subspace of $V$, then the restriction $\tau|_S$ of $\tau$ to $S$ is a linear transformation from $S$ to $W$.

The Kernel and Image of a Linear Transformation
There are two very important vector spaces associated with a linear transformation $\tau$ from $V$ to $W$.

Definition Let $\tau \in \mathcal{L}(V, W)$. The subspace

$$\ker(\tau) = \{v \in V \mid \tau(v) = 0\}$$

is called the kernel of $\tau$ and the subspace

$$\operatorname{im}(\tau) = \{\tau(v) \mid v \in V\}$$

is called the image of $\tau$. The dimension of $\ker(\tau)$ is called the nullity of $\tau$ and is denoted by $\operatorname{null}(\tau)$. The dimension of $\operatorname{im}(\tau)$ is called the rank of $\tau$ and is denoted by $\operatorname{rk}(\tau)$.

It is routine to show that $\ker(\tau)$ is a subspace of $V$ and $\operatorname{im}(\tau)$ is a subspace of $W$. Moreover, we have the following.

Theorem 2.3 Let $\tau \in \mathcal{L}(V, W)$. Then
1) $\tau$ is surjective if and only if $\operatorname{im}(\tau) = W$;
2) $\tau$ is injective if and only if $\ker(\tau) = \{0\}$.
Proof. The first statement is merely a restatement of the definition of surjectivity. To see the validity of the second statement, observe that

$$\tau(u) = \tau(v) \ \Leftrightarrow\ \tau(u - v) = 0 \ \Leftrightarrow\ u - v \in \ker(\tau)$$

Hence, if $\ker(\tau) = \{0\}$ then $\tau(u) = \tau(v) \Leftrightarrow u = v$, which shows that $\tau$ is injective. Conversely, if $\tau$ is injective and $u \in \ker(\tau)$ then $\tau(u) = \tau(0)$ and so $u = 0$. This shows that $\ker(\tau) = \{0\}$.

Isomorphisms
Definition A bijective linear transformation $\tau : V \to W$ is called an isomorphism from $V$ to $W$. When an isomorphism from $V$ to $W$ exists, we say that $V$ and $W$ are isomorphic and write $V \cong W$.

Example 2.2 Let $\dim(V) = n$. For any ordered basis $\mathcal{B}$ of $V$, the coordinate map $\phi_{\mathcal{B}} : V \to F^n$ that sends each vector $v \in V$ to its coordinate matrix $[v]_{\mathcal{B}} \in F^n$ is an isomorphism. Hence, any $n$-dimensional vector space over $F$ is isomorphic to $F^n$.

Isomorphic vector spaces share many properties, as the next theorem shows. If $\tau \in \mathcal{L}(V, W)$ and $S \subseteq V$ we write

$$\tau(S) = \{\tau(s) \mid s \in S\}$$

Theorem 2.4 Let $\tau \in \mathcal{L}(V, W)$ be an isomorphism. Let $S \subseteq V$. Then
1) $S$ spans $V$ if and only if $\tau(S)$ spans $W$.
2) $S$ is linearly independent in $V$ if and only if $\tau(S)$ is linearly independent in $W$.
3) $S$ is a basis for $V$ if and only if $\tau(S)$ is a basis for $W$.

An isomorphism can be characterized as a linear transformation $\tau : V \to W$ that maps a basis for $V$ to a basis for $W$.

Theorem 2.5 A linear transformation $\tau \in \mathcal{L}(V, W)$ is an isomorphism if and only if there is a basis $\mathcal{B}$ of $V$ for which $\tau(\mathcal{B})$ is a basis of $W$. In this case, $\tau$ maps any basis of $V$ to a basis of $W$.

The following theorem says that, up to isomorphism, there is only one vector space of any given dimension.

Theorem 2.6 Let $V$ and $W$ be vector spaces over $F$. Then $V \cong W$ if and only if $\dim(V) = \dim(W)$.

In Example 2.2, we saw that any $n$-dimensional vector space is isomorphic to $F^n$. Now suppose that $B$ is a set of cardinality $\kappa$ and let $(F^B)_0$ be the vector space of all functions from $B$ to $F$ with finite support. We leave it to the reader to show that the functions $\delta_b \in (F^B)_0$, defined for all $b \in B$ by

$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \ne b \end{cases}$$

form a basis for $(F^B)_0$, called the standard basis. Hence, $\dim((F^B)_0) = |B|$.

It follows that for any cardinal number $\kappa$, there is a vector space of dimension $\kappa$. Also, any vector space of dimension $\kappa$ is isomorphic to $(F^B)_0$.

Theorem 2.7 If $n$ is a natural number then any $n$-dimensional vector space over $F$ is isomorphic to $F^n$. If $\kappa$ is any cardinal number and if $B$ is a set of cardinality $\kappa$ then any $\kappa$-dimensional vector space over $F$ is isomorphic to the vector space $(F^B)_0$ of all functions from $B$ to $F$ with finite support.

The Rank Plus Nullity Theorem
Let $\tau \in \mathcal{L}(V, W)$. Since any subspace of $V$ has a complement, we can write

$$V = \ker(\tau) \oplus \ker(\tau)^c$$

where $\ker(\tau)^c$ is a complement of $\ker(\tau)$ in $V$. It follows that

$$\dim(V) = \dim(\ker(\tau)) + \dim(\ker(\tau)^c)$$

Now, the restriction of $\tau$ to $\ker(\tau)^c$, say $\tau^c : \ker(\tau)^c \to W$, is injective, since

$$\ker(\tau^c) = \ker(\tau)^c \cap \ker(\tau) = \{0\}$$

Also, $\operatorname{im}(\tau^c) \subseteq \operatorname{im}(\tau)$. For the reverse inclusion, if $\tau(v) \in \operatorname{im}(\tau)$ then since $v = u + w$ for $u \in \ker(\tau)$ and $w \in \ker(\tau)^c$, we have

$$\tau(v) = \tau(u) + \tau(w) = \tau(w) = \tau^c(w) \in \operatorname{im}(\tau^c)$$

Thus $\operatorname{im}(\tau^c) = \operatorname{im}(\tau)$. It follows that

$$\ker(\tau)^c \cong \operatorname{im}(\tau)$$

From this, we deduce the following theorem.

Theorem 2.8 Let $\tau \in \mathcal{L}(V, W)$.
1) Any complement of $\ker(\tau)$ is isomorphic to $\operatorname{im}(\tau)$.
2) (The rank plus nullity theorem)

$$\dim(\ker(\tau)) + \dim(\operatorname{im}(\tau)) = \dim(V)$$

or, in other notation,

$$\operatorname{rk}(\tau) + \operatorname{null}(\tau) = \dim(V)$$

Theorem 2.8 has an important corollary.

Corollary 2.9 Let $\tau \in \mathcal{L}(V, W)$, where $\dim(V) = \dim(W) < \infty$. Then $\tau$ is injective if and only if it is surjective.

Note that this result fails if the vector spaces are not finite-dimensional.

Linear Transformations from $F^n$ to $F^m$
Recall that for any $m \times n$ matrix $A$ over $F$ the multiplication map

$$\tau_A(v) = Av$$

is a linear transformation. In fact, any linear transformation $\tau \in \mathcal{L}(F^n, F^m)$ has this form, that is, $\tau$ is just multiplication by a matrix, for we have

$$\tau(v) = \tau\Big(\sum_i v_i e_i\Big) = \sum_i v_i \tau(e_i) = \big(\tau(e_1) \mid \cdots \mid \tau(e_n)\big)\,v$$

and so $\tau = \tau_A$ where

$$A = \big(\tau(e_1) \mid \cdots \mid \tau(e_n)\big)$$

Theorem 2.10
1) If $A$ is an $m \times n$ matrix over $F$ then $\tau_A \in \mathcal{L}(F^n, F^m)$.
2) If $\tau \in \mathcal{L}(F^n, F^m)$ then $\tau = \tau_A$ where

$$A = \big(\tau(e_1) \mid \cdots \mid \tau(e_n)\big)$$

The matrix $A$ is called the matrix of $\tau$.

Example 2.3 Consider the linear transformation $\tau : F^3 \to F^3$ defined by

$$\tau(x, y, z) = (x - y,\ z,\ x + y + z)$$

Then we have, in column form,

$$\tau\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y \\ z \\ x + y + z \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

and so the standard matrix of $\tau$ is

$$A = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$
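A quick numerical check of this matrix in Python (the sample values are our own):

import numpy as np

A = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0],
              [1.0,  1.0, 1.0]])
x, y, z = 2.0, 5.0, -1.0
print(A @ np.array([x, y, z]))            # [-3. -1.  6.]
print(np.array([x - y, z, x + y + z]))    # [-3. -1.  6.]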

If (CÁ then since the image of ( is the column space of ( , we have  dim²²³³b²(³~²-³ ker( rk dim This gives the following useful result.

Theorem 2.11 Let $A$ be an $m \times n$ matrix over $F$.
1) $\tau_A : F^n \to F^m$ is injective if and only if $\operatorname{rk}(A) = n$.
2) $\tau_A : F^n \to F^m$ is surjective if and only if $\operatorname{rk}(A) = m$.

Change of Basis Matrices

Suppose that $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C} = (c_1, \dots, c_n)$ are ordered bases for a vector space $V$. It is natural to ask how the coordinate matrices $[v]_{\mathcal{B}}$ and $[v]_{\mathcal{C}}$ are related. The map that takes $[v]_{\mathcal{B}}$ to $[v]_{\mathcal{C}}$ is $\tau_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}} \circ \phi_{\mathcal{B}}^{-1}$ and is called the change of basis operator (or change of coordinates operator). Since $\tau_{\mathcal{B},\mathcal{C}}$ is an operator on $F^n$, it has the form $\tau_A$ where

$$A = \big(\tau_{\mathcal{B},\mathcal{C}}(e_1) \mid \cdots \mid \tau_{\mathcal{B},\mathcal{C}}(e_n)\big) = \big(\phi_{\mathcal{C}}(\phi_{\mathcal{B}}^{-1}(e_1)) \mid \cdots \mid \phi_{\mathcal{C}}(\phi_{\mathcal{B}}^{-1}(e_n))\big) = \big([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}}\big)$$

We denote $A$ by $M_{\mathcal{B},\mathcal{C}}$ and call it the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$.

Theorem 2.12 Let $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C}$ be ordered bases for a vector space $V$. Then the change of basis operator $\tau_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}} \circ \phi_{\mathcal{B}}^{-1}$ is an automorphism of $F^n$, whose standard matrix is

$$M_{\mathcal{B},\mathcal{C}} = \big([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}}\big)$$

Hence

$$[v]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}}$$

and $M_{\mathcal{C},\mathcal{B}} = M_{\mathcal{B},\mathcal{C}}^{-1}$.
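A minimal numpy sketch of Theorem 2.12 in $\mathbb{R}^2$ (the bases are assumed examples): the columns of $M_{\mathcal{B},\mathcal{C}}$ are the $\mathcal{C}$-coordinates of the $\mathcal{B}$-basis vectors.

import numpy as np

B = np.column_stack([[1.0, 0.0], [1.0, 1.0]])   # basis B as columns
C = np.column_stack([[2.0, 0.0], [0.0, 1.0]])   # basis C as columns
M = np.linalg.solve(C, B)       # columns are [b_i]_C, so M = M_{B,C}
vB = np.array([3.0, -1.0])      # coordinates of some v relative to B
v = B @ vB                      # the vector itself
vC = np.linalg.solve(C, v)      # coordinates relative to C
assert np.allclose(vC, M @ vB)  # [v]_C = M_{B,C} [v]_B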

Consider the equation

$$A = M_{\mathcal{B},\mathcal{C}}$$

or equivalently,

$$A = \big([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}}\big)$$

Then given any two of $A$ (an invertible $n \times n$ matrix), $\mathcal{B}$ (an ordered basis for $F^n$) and $\mathcal{C}$ (an ordered basis for $F^n$), the third component is uniquely determined by this equation. This is clear if $\mathcal{B}$ and $\mathcal{C}$ are given or if $A$ and $\mathcal{C}$ are given. If $A$ and $\mathcal{B}$ are given then there is a unique $\mathcal{C}$ for which $A^{-1} = M_{\mathcal{C},\mathcal{B}}$ and so there is a unique $\mathcal{C}$ for which $A = M_{\mathcal{B},\mathcal{C}}$.

Theorem 2.13 If we are given any two of the following:
1) an invertible $n \times n$ matrix $A$,
2) an ordered basis $\mathcal{B}$ for $F^n$,
3) an ordered basis $\mathcal{C}$ for $F^n$,
then the third is uniquely determined by the equation

$$A = M_{\mathcal{B},\mathcal{C}}$$

The Matrix of a Linear Transformation
Let $\tau : V \to W$ be a linear transformation, where $\dim(V) = n$ and $\dim(W) = m$, and let $\mathcal{B} = (b_1, \dots, b_n)$ be an ordered basis for $V$ and $\mathcal{C}$ an ordered basis for $W$. Then the map

$$[v]_{\mathcal{B}} \mapsto [\tau(v)]_{\mathcal{C}}$$

is a representation of $\tau$ as a linear transformation from $F^n$ to $F^m$, in the sense that knowing this map (along with $\mathcal{B}$ and $\mathcal{C}$, of course) is equivalent to knowing $\tau$. Of course, this representation depends on the choice of ordered bases $\mathcal{B}$ and $\mathcal{C}$.

Since this map is a linear transformation from $F^n$ to $F^m$, it is just multiplication by an $m \times n$ matrix $A$, that is,

$$[\tau(v)]_{\mathcal{C}} = A\,[v]_{\mathcal{B}}$$

Indeed, since $[b_i]_{\mathcal{B}} = e_i$, we get the columns of $A$ as follows:

$$A^{(i)} = A e_i = A\,[b_i]_{\mathcal{B}} = [\tau(b_i)]_{\mathcal{C}}$$

Theorem 2.14 Let $\tau \in \mathcal{L}(V, W)$ and let $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C}$ be ordered bases for $V$ and $W$, respectively. Then $\tau$ can be represented with respect to $\mathcal{B}$ and $\mathcal{C}$ as matrix multiplication, that is,

$$[\tau(v)]_{\mathcal{C}} = [\tau]_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}}$$

where

$$[\tau]_{\mathcal{B},\mathcal{C}} = \big([\tau(b_1)]_{\mathcal{C}} \mid \cdots \mid [\tau(b_n)]_{\mathcal{C}}\big)$$

is called the matrix of $\tau$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. When $V = W$ and $\mathcal{B} = \mathcal{C}$, we denote $[\tau]_{\mathcal{B},\mathcal{B}}$ by $[\tau]_{\mathcal{B}}$ and so

$$[\tau(v)]_{\mathcal{B}} = [\tau]_{\mathcal{B}}\,[v]_{\mathcal{B}}$$

Example 2.4 Let $D : \mathcal{P}_2 \to \mathcal{P}_2$ be the derivative operator, defined on the vector space of all polynomials of degree at most $2$. Let $\mathcal{B} = \mathcal{C} = (1, x, x^2)$. Then

$$[D(1)]_{\mathcal{C}} = [0]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\quad [D(x)]_{\mathcal{C}} = [1]_{\mathcal{C}} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad [D(x^2)]_{\mathcal{C}} = [2x]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$

and so

$$[D]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$

Hence, for example, if $p(x) = 5 + x + 2x^2$ then

$$[Dp(x)]_{\mathcal{C}} = [D]_{\mathcal{B}}\,[p(x)]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}$$

and so $Dp(x) = 1 + 4x$.
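A quick check of this computation in Python, mirroring Example 2.4 ($[D]_{\mathcal{B}}$ acts on coefficient vectors relative to the ordered basis $(1, x, x^2)$):

import numpy as np

D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
p = np.array([5.0, 1.0, 2.0])   # p(x) = 5 + x + 2x^2
print(D @ p)                    # [1. 4. 0.], i.e. Dp(x) = 1 + 4x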

The following result shows that we may work equally well with linear transformations or with the matrices that represent them (with respect to fixed ordered bases $\mathcal{B}$ and $\mathcal{C}$). This applies not only to addition and scalar multiplication, but also to matrix multiplication.

Theorem 2.15 Let $V$ and $W$ be vector spaces over $F$, with ordered bases $\mathcal{B} = (b_1, \dots, b_n)$ and $\mathcal{C} = (c_1, \dots, c_m)$, respectively.
1) The map $\mu : \mathcal{L}(V, W) \to \mathcal{M}_{m,n}(F)$ defined by

$$\mu(\tau) = [\tau]_{\mathcal{B},\mathcal{C}}$$

is an isomorphism and so $\mathcal{L}(V, W) \cong \mathcal{M}_{m,n}(F)$.
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$ and if $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$ are ordered bases for $U$, $V$ and $W$, respectively, then

$$[\tau\sigma]_{\mathcal{B},\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma]_{\mathcal{B},\mathcal{C}}$$

Thus, the matrix of the product (composition) $\tau\sigma$ is the product of the matrices of $\tau$ and $\sigma$. In fact, this is the primary motivation for the definition of matrix multiplication.
Proof. To see that $\mu$ is linear, observe that for all $i$,

$$[s\tau + t\sigma]_{\mathcal{B},\mathcal{C}}\,[b_i]_{\mathcal{B}} = [(s\tau + t\sigma)(b_i)]_{\mathcal{C}} = [s\tau(b_i) + t\sigma(b_i)]_{\mathcal{C}} = s[\tau(b_i)]_{\mathcal{C}} + t[\sigma(b_i)]_{\mathcal{C}} = s[\tau]_{\mathcal{B},\mathcal{C}}\,[b_i]_{\mathcal{B}} + t[\sigma]_{\mathcal{B},\mathcal{C}}\,[b_i]_{\mathcal{B}} = \big(s[\tau]_{\mathcal{B},\mathcal{C}} + t[\sigma]_{\mathcal{B},\mathcal{C}}\big)[b_i]_{\mathcal{B}}$$

and since $[b_i]_{\mathcal{B}} = e_i$ is a standard basis vector, we conclude that

$$[s\tau + t\sigma]_{\mathcal{B},\mathcal{C}} = s[\tau]_{\mathcal{B},\mathcal{C}} + t[\sigma]_{\mathcal{B},\mathcal{C}}$$

and so $\mu$ is linear. If $A \in \mathcal{M}_{m,n}$, we define $\tau$ by the condition $[\tau(b_i)]_{\mathcal{C}} = A^{(i)}$, whence $\mu(\tau) = A$ and $\mu$ is surjective. Since $\dim(\mathcal{L}(V, W)) = \dim(\mathcal{M}_{m,n}(F))$, the map $\mu$ is an isomorphism. To prove part 2), we have

$$[\tau\sigma]_{\mathcal{B},\mathcal{D}}\,[v]_{\mathcal{B}} = [(\tau\sigma)(v)]_{\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma(v)]_{\mathcal{C}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma]_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}}$$

Change of Bases for Linear Transformations

Since the matrix $[\tau]_{\mathcal{B},\mathcal{C}}$ that represents $\tau$ depends on the ordered bases $\mathcal{B}$ and $\mathcal{C}$, it is natural to wonder how to choose these bases in order to make this matrix as simple as possible. For instance, can we always choose the bases so that $\tau$ is represented by a diagonal matrix?

As we will see in Chapter 7, the answer to this question is no. In that chapter, we will take up the general question of how best to represent a linear operator by a matrix. For now, let us take the first step and describe the relationship between the matrices $[\tau]_{\mathcal{B},\mathcal{C}}$ and $[\tau]_{\mathcal{B}',\mathcal{C}'}$ of $\tau$ with respect to two different pairs $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ of ordered bases. Multiplication by $[\tau]_{\mathcal{B}',\mathcal{C}'}$ sends $[v]_{\mathcal{B}'}$ to

$[\tau(v)]_{\mathcal{C}'}$. This can be reproduced by first switching from $\mathcal{B}'$ to $\mathcal{B}$, then applying $[\tau]_{\mathcal{B},\mathcal{C}}$ and finally switching from $\mathcal{C}$ to $\mathcal{C}'$, that is,

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}} = M_{\mathcal{C}',\mathcal{C}}^{-1}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}}$$

Theorem 2.16 Let $\tau \in \mathcal{L}(V, W)$, and let $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ be pairs of ordered bases of $V$ and $W$, respectively. Then

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}} \tag{2.1}$$

When $\tau \in \mathcal{L}(V)$ is a linear operator on $V$, it is generally more convenient to represent $\tau$ by matrices of the form $[\tau]_{\mathcal{B}}$, where the ordered bases used to represent vectors in the domain and image are the same. When $\mathcal{B} = \mathcal{C}$, Theorem 2.16 takes the following important form.

Corollary 2.17 Let B²=³ and let 8 and 9 be ordered bases for = . Then the matrix of 9 with respect to can be expressed in terms of the matrix of  with respect to 8 as follows

c ´µ9898 ~4Á ´µ489Á (2.2)

Equivalence of Matrices Since the change of basis matrices are precisely the invertible matrices, (2.1) has the form

c ´µ89ZZÁÁ ~7´µ 89 8 where 78 and are invertible matrices. This motivates the following definition.

Definition Two matrices () and are equivalent if there exist invertible matrices 78 and for which )~7(8c We remarked in Chapter 0 that )() is equivalent to if and only if can be obtained from ( by a series of elementary row and column operations. Performing the row operations is equivalent to multiplying the matrix ( on the left by 7( and performing the column operations is equivalent to multiplying on the right by 8c .

In terms of (2.1), we see that performing row operations (premultiplying by 7 ) is equivalent to changing the basis used to represent vectors in the image and performing column operations (postmultiplying by 8c ) is equivalent to changing the basis used to represent vectors in the domain. Linear Transformations 65

According to Theorem 2.16, if () and are matrices that represent  with respect to possibly different ordered bases then () and are equivalent. The converse of this also holds.

Theorem 2.18 Let = and > be vector spaces with dim ²= ³ ~  and dim²> ³ ~ . Then two  d  matrices ( and ) are equivalent if and only if they represent the same linear transformation B²=Á>³ , but possibly with respect to different ordered bases. In this case, () and represent exactly the same set of linear transformations in B²= Á > ³ . Proof. If () and represent  , that is, if

(~´µ89,, and )~´µ 8ZZ 9 for ordered bases 898ÁÁZZ and 9 then Theorem 2.16 shows that ( and ) are equivalent. Now suppose that () and are equivalent, say )~7(8c where 78 and are invertible. Suppose also that ( represents a linear transformation B²=Á>³ for some ordered bases 8 and 9 , that is,

(~´ µ89Á Theorem 2.13 implies that there is a unique ordered basis 8Z for = for which Z 8~488Á Z and a unique ordered basis 9 for > for which 7 ~499Á Z . Hence

)~499ÁÁÁZZZZ ´ µ 89 4 88 ~´ µ 89 Á Hence, )() also represents  . By symmetry, we see that and represent the same set of linear transformations. This completes the proof.

We remarked in Example 0.3 that every matrix is equivalent to exactly one matrix of the block form

0Ác 1~ >? cÁ cÁc block Hence, the set of these matrices is a set of canonical forms for equivalence. Moreover, the rank is a complete invariant for equivalence. In other words, two matrices are equivalent if and only if they have the same rank. Similarity of Matrices

When a linear operator B²=³ is represented by a matrix of the form ´µ 8 , equation (2.2) has the form

c ´µ88Z ~7´µ7 where 7 is an invertible matrix. This motivates the following definition. 66 Advanced Linear Algebra

Definition Two matrices () and are similar if there exists an invertible matrix 7 for which )~7(7c The equivalence classes associated with similarity are called similarity classes.

The analog of Theorem 2.18 for square matrices is the following.

Theorem 2.19 Let =d be a vector space of dimension . Then two matrices () and are similar if and only if they represent the same linear operator B²=³ , but possibly with respect to different ordered bases. In this case, ( and ) represent exactly the same set of linear operators in B ²= ³ . Proof. If () and represent B ²=³ , that is, if

(~´µ89 and )~´µ for ordered bases 89 and then Corollary 2.17 shows that () and are similar. Now suppose that () and are similar, say )~7(7c Suppose also that ( represents a linear operator B  ²= ³ for some ordered basis 8 , that is,

(~´ µ8 Theorem 2.13 implies that there is a unique ordered basis 9 for = for which 7~489Á . Hence

c )~489Á ´ µ 8 489Á ~´ µ 9 Hence, )() also represents  . By symmetry, we see that and represent the same set of linear operators. This completes the proof.

We will devote much effort in Chapter 7 to finding a canonical form for similarity. Similarity of Operators We can also define similarity of operators.

Definition Two linear operators Á  B ²= ³ are similar if there exists an automorphism B²=³ for which ~ c The equivalence classes associated with similarity are called similarity classes. Linear Transformations 67

The analog of Theorem 2.19 in this case is the following.

Theorem 2.20 Let = be a vector space of dimension . Then two linear operators  and on =( are similar if and only if there is a matrix C that represents both operators (but with respect to possibly different ordered bases). In this case,  and are represented by exactly the same set of matrices in C . Proof. If  and are represented by ( C , that is, if

´ µ89 ~(~´ µ for ordered bases 89 and then

´µ~´µ9898989 ~4ÁÁ ´µ4 

Let B²=³ be the automorphism of = defined by  ²³~ , where 89~¸ÁÃÁ ¹ and ~¸ÁÃÁ  ¹ . Then

´µ~²´²³µ“Ä“´²³µ³~²´µ“Ä“´µ³~499Á  99989 and so

c c ´µ~´µ99999 ´µ´µ~´ µ from which it follows that  and are similar. Conversely, suppose that  and are similar, say ~ c

Suppose also that C is represented by the matrix (  , that is,

(~´ µ8 for some ordered basis 8 . Then

c c ´µ8888 ~´ µ ~´µ´µ´µ8

If we set  ~9 ² ³ then ~ ²  Á Ã Á  ³ is an ordered basis for = and

´µ88 ~²´²³µ“Ä“´²³µ³~²´µ“Ä“´µ³~4Á  88898 Hence

c ´µ8988 ~4Á ´µ498Á It follows that

c (~´µ8898 ~4Á ´µ489Á ~´µ 9 and so ( also represents  . By symmetry, we see that and are represented by the same set of matrices. This completes the proof. 68 Advanced Linear Algebra

Invariant Subspaces and Reducing Pairs The restriction of a linear operator B²=³ to a subspace : of = is not necessarily a linear operator on : . This prompts the following definition.

Definition Let B²=³ . A subspace : of = is said to be invariant under  or -invariant if ²:³ ‹ : , that is, if  ² ³  : for all  : . Put another way, : is invariant under  if the restriction O:: is a linear operator on .

If =~:l; then the fact that :; is  -invariant does not imply that the complement is also s-invariant. (The reader may wish to supply a simple example with =~  .)

Definition Let B²=³=~:l; . If and if both : and ; are  -invariant, we say that the pair ²:Á ; ³ reduces  .

A reducing pair can be used to decompose a linear operator into a direct sum as follows.

Definition Let B ²= ³ . If ²:Á ; ³ reduces  we write

~OlO:;  and call  the direct sum of OO:; and . Thus, the expression ~l means that there exist subspaces :;= and of for which ²:Á;³ reduces  and

~O:; and  ~O The concept of the direct sum of linear operators will play a key role in the study of the structure of a linear operator. Topological Vector Spaces This section is for readers with some familiarity with point-set topology. The standard topology on s is the topology for which the set of open rectangles

)~¸0 dÄd0 “0's are open intervals in s ¹ is a basis (in the sense of topology), that is, a subset of s is open if and only if it is a union of sets in ) . The standard topology is the topology induced by the Euclidean metric on s . Linear Transformations 69

The standard topology on s has the property that the addition function 7s¢ d s ¦ s  ¢ ²#Á $³ ¦ # b $ and the scalar multiplication function Cs¢d s ¦ s ¢² Á#³¦ # are continuous. As such, s is a . Also, any linear functional ¢ss ¦ is a continuous map.

More generally, any real vector space = endowed with a topology J is called a topological vector space if the operations of addition 7¢= d= ¦= and scalar multiplication Cs¢d=¦= are continuous under J .

Let = be a real vector space of dimension and fix an ordered basis 8 ~²#ÁÃÁ# ³ for = . Consider the coordinate map  ~¢=¦¢#¦´#µ88 s and its inverse

c  s8 ~¢¦=¢²ÁÃÁ³¦#8  

We claim that there is precisely one topology JJ~== on for which = becomes a topological vector space and for which all linear functionals are continuous. This is called the natural topology on = . In fact, the natural topology is the topology for which  88 (and therefore also ) is a homeomorphism, for any basis 8 . (Recall that a homeomorphism is a bijective map that is continuous and has a continuous inverse.)

Once this has been established, it will follow that the open sets in J are  precisely the images of the open sets in s under the map 8 . A basis for the natural topology is given by

¸²0dÄd0³“0 s8 's are open intervals in ¹

~0DE # “ 's are open intervals in s

0 First, we show that if = is a topological vector space under a topology J then  is continuous. Since ~¢¦=  where s is defined by

² ÁÃÁ  ³~#  it is sufficient to show that these maps are continuous. (The sum of continuous maps is continuous.) Let 6 be an open set in J . Then Csc²6³ ~ ¸² Á %³  d = “ %  6¹ is open in s d= . We need to show that the set 70 Advanced Linear Algebra

c  s ²6³ ~ ¸² Á à Á  ³  “   #  6¹  c is open in s , so let ² ÁÃÁ ³  ²6³ . Thus, #  6 . It follows that c ² Á # ³ Cs ²6³, which is open, and so there is an open interval 0 ‹ and an open set )J of = for which

c ² Á # ³  0 d ) ‹C ²6³ Then the open set < ~ssss dÄd d0d dÄd , where the factor 0 is in the ²<³‹6 th position, has the property that  . Thus c ² ÁÃÁ ³< ‹  ²6³ c and so  ²6³ is open. Hence,  , and therefore also , is continuous.

Next we show that if every linear functional on = is continuous under a topology J on =#= then the coordinate map is continuous. If denote by ´#µ88Á the  th coordinate of ´#µ . The map s ¢ = ¦ defined by  ²#³ ~ ´#µ 8Á is a linear functional and so is continuous by assumption. Hence, for any open interval 0 s the set

(Á ~¸#= “´#µ8 0¹ is open. Now, if 0 are open intervals in s then

c  ²0 dÄd0 ³~¸#= “´#µ8 0  dÄd0 ¹~ ( is open. Thus,  is continuous.

Thus, if a topology J has the property that = is a topological vector space and every linear functional is continuous, then   and ~ c are homeomorphisms. This means that J , if it exists, must be unique.

It remains to prove that the topology J on = that makes a homeomorphism has the property that = is a topological vector space under J and that any linear functional = on continuous.

As to addition, the maps s¢= ¦ and ²  d ³¢= d= ¦ ss d are homeomorphisms and the map 7sZ¢d¦ s  s  is continuous and so the map 77¢=d=¦=, being equal to c k Z k²d³ , is also continuous.

As to scalar multiplication, the maps s¢= ¦  and ²ds ³¢d=¦ ss d  are homeomorphisms and the map CsZ¢d s ¦ s is continuous and so the map C ¢=d=¦= , being equal to Cckk²d³ Z , is also continuous.

Now let k be a linear functional. Since  is continuous if and only if c is  continuous, we can confine attention to =~s . In this case, if ÁÃÁ is the Linear Transformations 71

 standard basis for ss and ((²³ 4 , then for any %~² ÁÃÁ ³ we have

((²%³ ~cc  ²³ (((   ²³  (  4  ((  

Now, if ((%²%³°4 then ((  °4 and so (  ( , which implies that is continuous.

According to the Riesz representation theorem and the Cauchy–Schwarz inequality, we have

))))))²%³ H %

Hence, %¦ implies ²%³¦ and so by linearity, %¦%  implies ²% ³ ¦ % and so  is continuous.

Theorem 2.21 Let = be a real vector space of dimension . There is a unique topology on == , called the natural topology for which is a topological vector space and for which all linear functionals on = are continuous. This topology is determined by the fact that the coordinate map s¢= ¦  is a homeomorphism. Linear Operators on = d A linear operator  on a real vector space = can be extended to a linear operator  dd on the complexification = by defining d²" b #³ ~ ²"³ b ²#³

Here are the basic properties of this complexification of  .

Theorem 2.22 If Á  B ²= ³ then 1) ²s ³dd ~ ,   2) ²b³~ddd  b  3) ²³~ddd   4) ´ ²#³µddd ~ ²# ³.

Let us recall that for any ordered basis 8 for =#= and any vector we have

´# b µcpx²³8 ~ ´#µ8

Now, if 8 is a basis for = , then the th column of ´µ8 is d ´²³µ~´²³bµ8 cpx²³88 ~´  ²b³µ  cpx ²³ which is the  th column of the coordinate matrix of  d with respect to the basis cpx²³8 . Thus we have the following theorem. 72 Advanced Linear Algebra

Theorem 2.23 Let B²=³ where = is a real vector space. The matrix of d with respect to the basis cpx²³8 is equal to the matrix of with respect to the basis 8

d ´µcpx²³8 ~´µ8 Hence, if a real matrix (=( represents a linear operator  on then also represents the complexification dd of on = . Exercises

1. Let (CCÁ have rank  . Prove that there are matrices ? Á and @CÁ, both of rank  , for which (~?@ . Prove that ( has rank  if and only if it has the form (~%&! where % and & are row matrices. 2. Prove Corollary 2.9 and find an example to show that the corollary does not hold without the finiteness condition. 3. Let B²=Á>³ . Prove that  is an isomorphism if and only if it carries a basis for => to a basis for . 4. If B²=Á>³ and B ²=Á>³  we define the external direct sum ^²==Á>>³ B ^ ^ by

²^ ³²²# Á # ³³ ~ ²  ²#  ³Á  ²#  ³³ Show that ^ is a linear transformation. 5. Let =~:l; . Prove that :l;š:^ ; . Thus, internal and external direct sums are equivalent up to isomorphism. 6. Let =~(b) and consider the external direct sum ,~(^ ) . Define a map ^¢( ) ¦= by  ²#Á$³~#b$ . Show that  is linear. What is the kernel of  ? When is an isomorphism? 7. Let JB be a subset of ²= ³ . A subspace : of = is J-invariant if : is  - invariant for every J= . Also, is J-irreducible if the only J -invariant subspaces of = are ¸¹ and = . Prove the following form of Schur's lemma. Suppose that JB=>‹²=³ and J ‹²>³ B and = is J = -irreducible and > is JBJJ>=> -irreducible. Let ²=Á>³ satisfy ~ , that is, for any J=> there is a J such that  ~ . Prove that  ~ or  is an isomorphism. 8. Let B ²=³ where dim ²=³B . If rk ²  ³~ rk ²  ³ show that im² ³ qker ² ³ ~ ¸¹ . 9. Let B ²< , = ³ and B  ²= Á > ³ . Show that rk²³¸²³Á²³¹min rk  rk  10. Let B²<Á=³ and B ²=Á>³ . Show that null²³²³b²³ null  null  11. Let Á  B ²= ³ where  is invertible. Show that rk²³~²³~²³ rk  rk  Linear Transformations 73

12. Let Á  B ²= Á > ³ . Show that rk²b³²³b²³ rk  rk  13. Let := be a subspace of . Show that there is a B ²=³ for which ker²B ³~:. Show also that there exists a  ²=³ for which im ² ³~: . 14. Suppose that Á  B ²= ³ . a) Show that ~²³‹²³ for some ²=³B if and only if im  im  . b) Show that ~²³‹²³ for some ²=³B if and only if ker  ker  . 15. Let = ~ : l : . Define linear operators   on = by  ² b ³ ~  for ~Á. These are referred to as projection operators . Show that  1)  ~  2) b~0 , where 0 is the identity map on = . 3) ~ for £ where  is the zero map. 4) =~im ² ³l im ² ³ 16. Let dim²= ³  B and suppose that B  ²= ³ satisfies  ~  . Show that rk ² ³ dim ²= ³ . 17. Let (d- be an matrix over . What is the relationship between the  linear transformation (¢- ¦- and the system of equations (?~) ? Use your knowledge of linear transformations to state and prove various results concerning the system (? ~ ) , especially when ) ~  . 18. Let = have basis 8 ~¸# ÁÃÁ# ¹ . Suppose that for each Á we define BÁ ²=³ by

#£ if Á²#  ³ ~ F #b#if ~

Prove that the BÁ are invertible and form a basis for ²= ³ . 19. Let B²=³: . If is a  - of = must there be a subspace ; of = for which ²:Á ; ³ reduces  ? 20. Find an example of a vector space =:= and a proper subspace of for which =š:. 21. Let dim²= ³  B . If  ,  B ²= ³ prove that  ~ implies that  and  are invertible and that ~²³ for some polynomial ²%³-´%µ . 22. Let B²=³ where dim ²=³B . If  ~ for all B ²=³ show that ~, for some - , where  is the identity map. 23. Let (Á ) C ²- ³ . Let 2 be a field containing - . Show that if ( and ) c are similar over 2)~7(77²2³( , that is, if where C then and ) are also similar over - , that is, there exists 8 C ²- ³ for which )~8(8c. Hint : consider the equation ?(c)?~ as a homogeneous system of linear equations with coefficients in - . Does it have a solution? Where? 24. Let ¢ss ¦ be a continuous function with the property that ²%b&³~ ²%³b²&³ Prove that  is a linear functional on s . 74 Advanced Linear Algebra

25. Prove that any linear functional ¢ss ¦ is a continuous map. 26. Prove that any subspace : of s is a closed set or, equivalently, that :~s ±: is open, that is, for any %:  there is an open ball )² Á³ centered at %€)²%Á³‹: with radius  for which  . 27. Prove that any linear transformation ¢= ¦> is continuous under the natural topologies of => and . 28. Prove that any surjective linear transformation  from => to (both finite- dimensional topological vector spaces under the natural topology) is an open map, that is,  maps open sets to open sets. 29. Prove that any subspace := of a finite-dimensional vector space is a closed set or, equivalently, that :%: is open, that is, for any there is an open ball )² Á ³ centered at % with radius €  for which )²%Á ³ ‹ :. 30. Let : be a subspace of = with dim ²= ³  B . a) Show that the subspace topology on := inherited from is the natural topology. b) Show that the natural topology on =°: is the topology for which the natural projection map ¢= ¦=°: continuous and open. 31. If == is a real vector space then d is a complex vector space. Thinking of dd d = as a vector space ²= ³ss over s , show that ²= ³ is isomorphic to the external direct product ==^ . 34. (When is a complex a complexification?) Let = be a real vector space with complexification =dd and let B  ²= ³ . Prove that  is a complexification, that is, B has the form d for some ²=³ if and only if  commutes with the conjugate map ¢=dd ¦= defined by ²"b#³~"c#. 35. Let > be a complex vector space. a) Consider replacing the scalar multiplication on > by the operation ²'Á $³ ¦ '$ where 'd and $> . Show that the resulting set with the addition defined for the vector space > and with this scalar multiplication is a complex vector space, which we denote by > . d b) Show, without using dimension arguments, that ²>s ³ š >^ > . 36. a) Let  be a linear operator on the real vector space < with the property that  ~c . Define a scalar multiplication on < by complex numbers as follows ² b ³ h # ~ # b  ²#³ for Á  s and #  < . Prove that under the addition of < and this scalar multiplication < is a complex vector space, which we denote by < . d b) What is the relationship between <= and ? Hint: consider < ~ =^ = and ²"Á #³ ~ ²c#Á "³ . Chapter 3 The Isomorphism Theorems

Quotient Spaces Let := be a subspace of a vector space . It is easy to see that the binary relation on = defined by "–#¯"c#: is an equivalence relation. When "– # , we say that " and # are congruent modulo :"–# . The term mod is used as a colloquialism for modulo and is often written "–#mod : When the subspace in question is clear, we will simply write "–# .

To see what the equivalence classes look like, observe that ´#µ~¸"=“"–#¹ ~¸"=“"c#:¹ ~¸"=“"~#b for some :¹ ~¸#b “ :¹ ~#b: The set ´#µ~#b:~¸#b “ :¹ is called a coset of := in and # is called a coset representative for #b: . (Thus, any member of a cost is a coset representative.)

The set of all cosets of := in is denoted by = ~¸#b:“#=¹ : This is read “=: mod ” and is called the quotient space of = modulo : . Of 76 Advanced Linear Algebra course, the term space is a hint that we intend to define vector space operations on =°:.

Note that congruence modulo : is preserved under the vector space operations on ="–#"–# , for if  and  then

"c#:Á"c#:¬ ²"c#³b   ²"c#³:  ¬ ² " b "³c² #  b #³: ¬ " b " – # b # A natural choice for vector space operations on =°: is ²"b:³~ "b: ²"b:³b²#b:³~²"b#³b: However, in order to show that these operations are well-defined, it is necessary to show that they do not depend on the choice of coset representatives, that is, if

"b:~"b: and #b:~#b:  then

" b : ~ " b : ²" b# ³b:~²"  b# ³b: The straightforward details of this are left to the reader. Let us summarize.

Theorem 3.1 Let := be a subspace of . The binary relation ""–#¯ c#: is an equivalence relation on = , whose equivalence classes are the cosets #:~#bb¹¸ “ : of ::: in ==° . The set of all cosets of in = , called the quotient space of = modulo : , is a vector space under the well-defined operations

" b : ~ " b : ²" b# ³b:~²"  b# ³b: The zero vector in =°: is the coset b:~: . The Natural Projection and the Correspondence Theorem

If := is a subspace of then we can define a map : ¢=¦=°: by sending each vector to the coset containing it

: ²#³ ~ # b : This map is called the canonical projection or natural projection of = onto =°:, or simply projection modulo : . It is easily seen to be linear, for we have The Isomorphism Theorems 77

(writing  for : ) ² " b #³ ~ ² " b #³ b : ~ ²" b :³ b ²# b :³ ~ ²"³ b ²#³ The canonical projection is clearly surjective. To determine the kernel of  , note that #ker ² ³¯ ²#³~¯#b:~:¯#: and so ker²³~:

Theorem 3.2 The canonical projection : ¢= ¦=°: defined by

: ²#³ ~ # b : is a surjective linear transformation with ker²³~:: .

If := is a subspace of then the subspaces of the quotient space =°: have the form ;°: for some intermediate subspace ; satisfying : ‹; ‹= . In fact, as shown in Figure 3.1, the projection map : provides a one-to-one correspondence between intermediate subspaces :‹;‹= and subspaces of the quotient space =°: . The proof of the following theorem is left as an exercise.

V

V/S T

S T/S

{0} {0}

Figure 3.1: The correspondence theorem Theorem 3.3 ( The correspondence theorem ) Let := be a subspace of . Then the function that assigns to each intermediate subspace :‹;‹= the subspace ;°: of =°: is an order preserving (with respect to set inclusion) one-to-one correspondence between the set of all subspaces of =: containing and the set of all subspaces of =°: . The Universal Property of Quotients and the First Isomorphism Theorem

Let :=²=°:Á³ be a subspace of . The pair : has a very special property, known as the universal property —a term that comes from the world of category theory. 78 Advanced Linear Algebra

Figure 3.2 shows a linear transformation B²=Á>³ , along with the canonical projection : from ==°: to the quotient space .

W V W

Ss W'

V/S

Figure 3.2: The universal property The universal property states that if ker²³Œ: then there is a unique  Z¢=°:¦> for which Z k~:  Another way to say this is that any such B²=Á>³ can be factored through the canonical projection : .

Theorem 3.4 Let :=²=Á>³ be a subspace of and let B satisfy :‹ker ² ³. Then, as pictured in Figure 3.2, there is a unique linear transformation  Z¢=°:¦> with the property that Z k~:  Moreover, ker²³~ZZ ker ²³°: and im ²³~²³  im . ZZ Proof. We have no other choice but to define  by the condition k~: , that is, Z²# b :³ ~ ²#³ This function is well-defined if and only if # b : ~ " b : ¬ZZ ²# b :³ ~ ²" b :³ which is equivalent to each of the following statements: # b : ~ " b : ¬ ²#³ ~ ²"³ #c":¬ ²#c"³~ %:¬ ²%³~ :‹ker ² ³

Thus,  Z¢=°:¦> is well-defined. Also, im²ZZ ³~¸ ²#b:³“#=¹~¸  ²#³“#=¹~ im ²  ³ and The Isomorphism Theorems 79

ker²³~¸#b:“²#b:³~¹ZZ ~¸#b:“ ²#³~¹ ~¸#b:“#ker ² ³¹ ~ker ² ³°: The uniqueness of  Z is evident.

Theorem 3.4 has a very important corollary, which is often called the first isomorphism theorem and is obtained by taking :~ker ² ³ .

Theorem 3.5 (The first isomorphism theorem ) Let ¢= ¦> be a linear transformation. Then the linear transformation Z¢=°ker ² ³¦> defined by Z²# bker ² ³³ ~ ²#³ is injective and = š²³im  ker²³ According to Theorem 3.5, the image of any linear transformation on = is isomorphic to a quotient space of ==°:= . Conversely, any quotient space of is the image of a linear transformation on = : the canonical projection : . Thus, up to isomorphism, quotient spaces are equivalent to homomorphic images. Quotient Spaces, Complements and Codimension The first isomorphism theorem gives some insight into the relationship between complements and quotient spaces. Let :=; be a subspace of and let be a complement of : , that is =~:l; Since every vector #= has the form #~ b! , for unique vectors : and !;, we can define a linear operator  ¢= ¦= by setting ² b !³ ~ !

Because and ! are unique,  is well-defined. It is called projection onto ; along : . (Note the word onto, rather than modulo, in the definition; this is not the same as projection modulo a subspace.) It is clear that im²³~; and ker² ³~¸ b!= “!~¹~: 80 Advanced Linear Algebra

Hence, the first isomorphism theorem implies that = ;š : Theorem 3.6 Let := be a subspace of . All complements of := in are isomorphic to =°: and hence to each other.

The previous theorem can be rephrased by writing (l)~(l* ¬)š* On the other hand, quotients and complements do not behave as nicely with respect to isomorphisms as one might casually think. We leave it to the reader to show the following: 1) It is possible that (l)~*l+ with (š* but )š+° . Hence, (š* does not imply that a complement of (* is isomorphic to a complement of . 2) It is possible that =š> and =~:l) and >~:l+ but )š+°° . Hence, = š> does not imply that =°:š>°: . (However, according to the previous theorem, if =>)š+ equals then .)

Corollary 3.7 Let := be a subspace of a vector space . Then dim²= ³ ~ dim ²:³ b dim ²= °:³

Definition If : is a subspace of = then dim ²= °:³ is called the codimension of := in and is denoted by codim ²:³ or codim= ²:³ .

Thus, the codimension of := in is the dimension of any complement of := in and when = is finite-dimensional , we have

codim= ²:³ ~dim ²= ³ c dim ²:³ (This makes no sense, in general, if = is not finite-dimensional, since infinite cardinal numbers cannot be subtracted.) Additional Isomorphism Theorems There are several other isomorphism theorems that are consequences of the first isomorphism theorem. As we have seen, if =~:l; then =°;š: . This can be written The Isomorphism Theorems 81

:l; : š ;:q; This applies to nondirect sums as well.

Theorem 3.8 (The second isomorphism theorem ) Let = be a vector space and let :; and be subspaces of = . Then :b; : š ;:q; Proof. Let ¢²:b;³¦:°²:q;³ be defined by ² b !³ ~ b ²: q ; ³ We leave it to the reader to show that  is a well-defined surjective linear transformation, with kernel ; . An application of the first isomorphism theorem then completes the proof.

The following theorem demonstrates one way in which the expression =°: behaves like a fraction.

Theorem 3.9 (The third isomorphism theorem ) Let = be a vector space and suppose that :‹;‹= are subspaces of = . Then =°: = š ;:° ;

Proof. Let ¢ = °: ¦ = °; be defined by ²# b :³ ~ # b ; . We leave it to the reader to show that  is a well-defined surjective linear transformation whose kernel is ;°: . The rest follows from the first isomorphism theorem.

The following theorem demonstrates one way in which the expression =°: does not behave like a fraction.

Theorem 3.10 Let =:= be a vector space and let be a subspace of . Suppose that =~=l= and :~:l:  with :‹=   . Then =====l ~š ^  ::l::  : 

Proof. Let ^¢=¦²=°:³² =°:³  be defined by

²# b# ³~²#  b: Á# b: ³

This map is well-defined, since the sum =~=l= is direct. We leave it to the reader to show that  is a surjective linear transformation, whose kernel is :l:. The rest follows from the first isomorphism theorem. 82 Advanced Linear Algebra

Linear Functionals Linear transformations from =- to the base field (thought of as a vector space over itself) are extremely important.

Definition Let =- be a vector space over . A linear transformation B ²=Á-³, whose values lie in the base field - is called a linear functional (or simply functional ) on = . (Some authors use the term linear function .) The vector space of all linear functionals on == is denoted by * and is called the algebraic of = .

The adjective algebraic is needed here, since there is another type of dual space that is defined on general normed vector spaces, where continuity of linear transformations makes sense. We will discuss the so-called continuous dual space briefly in Chapter 13. However, until then, the term “dual space” will refer to the algebraic dual space.

To help distinguish linear functionals from other types of linear transformations, we will usually denote linear functionals by lower case italic letters, such as  , and .

Example 3.1 The map ¢-´%µ¦ - , defined by ²²%³³~ ²³ is a linear functional, known as evaluation at  .

Example 3.2 Let 9´Á µ denote the vector space of all continuous functions on ´Á µ ‹s9. Let ¢ ´Á µ ¦ s be defined by

 ² ²%³³ ~ ²%³ %  Then 9 ´Áµi.

For any =* , the rank plus nullity theorem is dim² ker ²³³b dim ²im ²³³~ dim ²=³ But since im²³ ‹ - , we have either im ²³ ~ ¸¹ , in which case  is the zero linear functional, or im²³ ~ - , in which case  is surjective. In other words, a nonzero linear functional is surjective. Moreover, if £ then = codim²²³³~ker dim67 ~ ker²³ and if dim²= ³  B then dim² ker ²³³ ~ dim ²= ³ c  The Isomorphism Theorems 83

Thus, in dimensional terms, the kernel of a linear functional is a very “large” subspace of the domain = .

The following theorem will prove very useful.

Theorem 3.11 1) For any nonzero vector #= , there exists a linear functional =* for which ²#³£ . 2) A vector #= is zero if and only if ²#³~ for all =* . 3) Let   =i . If ²%³ £  then = ~ º%» lker ²³ 4) Two nonzero linear functionals Á  = i have the same kernel if and only if there is a nonzero scalar  such that ~  . Proof. For part 3), if  £ #  º%» qker ²³ then ²#³ ~  and # ~ % for  £   -, whence ²%³ ~  , which is false. Hence, º%» qker ²³ ~ ¸¹ and the direct sum : ~ º%» lker ²³ exists. Also, for any #  = we have ²#³ ²#³ # ~ % b67 # c %  º%» bker ²³ ²%³ ²%³ and so = ~ º%» lker ²³ .

For part 4), if ~  for £ then ker ²³~ ker ²³ . Conversely, if 2 ~ker ²³ ~ ker ²³ then for % ¤ 2 we have by part 3), = ~ º%» l 2

Of course, O22 ~ O for any . Therefore, if  ~ ²%³°²%³ , it follows that ²%³ ~ ²%³ and hence  ~  . Dual Bases

Let = be a vector space with basis 8 ~¸#“0¹ . For each 0 , we can i * define a linear functional #= , by the orthogonality condition i #²#³~ Á where Á is the Kronecker delta function , defined by ~if  ~ Á F £if

ii Then the set 8 ~¸# “0¹ is linearly independent, since applying the equation ii ~ # bÄb #

to the basis vector # gives 84 Advanced Linear Algebra

 i ~ #²#³~  Á  ~ ~ ~ for all  .

Theorem 3.12 Let =~¸#“0¹ be a vector space with basis 8  . ii 1) The set 8 ~¸# “0¹ is linearly independent. 2) If == is finite-dimensional then 8ii is a basis for , called the of 8. Proof. For part 2), for any =i , we have

i ²# ³# ²#³ ~ ²# Á ³ ~ ²#³   ii* i and so ~ ²#³#  is in the span of 88 . Hence, is a basis for = .

Corollary 3.13 If dim²= ³  B then dim ²=i ³ ~ dim ²= ³ .

The next example shows that Corollary 3.13 does not hold without the finiteness condition.

Example 3.3 Let = be an infinite-dimensional vector space over the field - ~{8 ~ ¸Á ¹, with basis . Since the only coefficients in - are  and  , a finite linear combination over -= is just a finite sum. Hence, is the set of all finite sums of vectors in 8 and so according to Theorem 0.11,

((= (F8 ²³~ ( (( 8 On the other hand, each linear functional =i is uniquely defined by specifying its values on the basis 8 . Since these values must be either  or , specifying a linear functional is equivalent to specifying the subset of 8 on which  takes the value . In other words, there is a one-to-one correspondence between linear functionals on = and all subsets of 8 . Hence, (((=~²³€‚=i F8 ((((( 8 This shows that ==i cannot be isomorphic to , nor to any proper subset of = . Hence, dim²=i ³ € dim ²= ³. Reflexivity If == is a vector space then so is the dual space i . Hence, we may form the double (algebraic) dual space = ii , which consists of all linear functionals ¢=i ¦-. In other words, an element of =** is a linear map that assigns a scalar to each linear functional on = . The Isomorphism Theorems 85

With this firmly in mind, there is one rather obvious way to obtain an element of =#=#¢=¦-ii. Namely, if , consider the map i defined by #²³ ~ ²#³ which sends the linear functional ²#³# to the scalar . The map is called evaluation at #. To see that #=ii , if Á= i and Á- then #² b ³ ~ ² b ³²#³ ~ ²#³ b ²#³ ~ #²³ b #²³ and so # is indeed linear.

We can now define a map ¢= ¦=ii by ²#³ ~ #

This is called the canonical map (or the natural map ) from == to ii . This map is injective and hence in the finite-dimensional case, it is also surjective.

Theorem 3.14 The canonical map ¢= ¦=ii defined by ²#³~# , where # is evaluation at #= , is a monomorphism. If is finite-dimensional then  is an isomorphism. Proof. The map  is linear since " b #²³ ~ ²" b #³ ~ ²"³ b ²#³ ~ ²" b #³²³ for all =i . To determine the kernel of  , observe that ²#³~¬#~ ¬#²³~ for all =i ¬²#³~ for all =i ¬#~ by Theorem 3.11 and so ker² ³ ~ ¸¹ .

In the finite-dimensional case, since dim²=ii ³ ~ dim ²= i ³ ~ dim ²= ³ , it follows that  is also surjective, hence an isomorphism.

Note that if dim²= ³  B then since the dimensions of = and = ii are the same, we deduce immediately that =š=ii . This is not the point of Theorem 3.14. The point is that the natural map #¦# is an isomorphism. Because of this, = is said to be algebraically reflexive . Thus, Theorem 3.14 implies that all finite- dimensional vector spaces are algebraically reflexive.

If == is finite-dimensional, it is customary to identify the double dual space ii with === and to think of the elements of ii simply as vectors in . Let us consider an example of a vector space that is not algebraically reflexive. 86 Advanced Linear Algebra

Example 3.4 Let = be the vector space over { with basis

 ~²ÁÃÁÁÁÁó where the  is in the th . Thus, = is the set of all infinite binary sequences with a finite number of ²#³#= 's. Define the order of any to be the largest coordinate of #²#³B#= with value . Then for all .

i Consider the dual vectors  , defined (as usual) by i ²³~ Á For any #= , the evaluation functional # has the property that ii #²³~ ²#³~ if €²#³ i However, since the dual vectors  are linearly independent, there is a linear functional =ii for which i ² ³ ~  for all ‚ . Hence,  does not have the form # for any #= . This shows that the canonical map is not surjective and so = is not algebraically reflexive. Annihilators The functions =i are defined on vectors in = , but we may also define  on subsets 4= of by letting ²4³~ ¸²#³“ #  4¹

Definition Let 4= be a nonempty subset of a vector space . The annihilator 44 of is 4i ~ ¸  = “ ²4³ ~ ¸¹¹ The term annihilator is quite descriptive, since 4  consists of all linear functionals that annihilate (send to 4 ) every vector in . It is not hard to see that 4=4=i is a subspace of , even when is not a subspace of .

The basic properties of annihilators are contained in the following theorem, whose proof is left to the reader.

Theorem 3.15 1) ()Order-reversing For any subsets 45= and of , 4‹5¬ 5 ‹4 The Isomorphism Theorems 87

2) If dim²= ³  B then we have 4 šspan ²4³ under the natural map. In particular, if :=:š: is a subspace of then  . 3) If dim²= ³  B and : and ; are subspaces of = then ²:q;³ ~: b; and ²:b;³  ~: q; Consider a direct sum decomposition =~:l; Then any linear functional ;i can be extended to a linear functional  on = by setting ²:³ ~  . Let us call this extension by  . Clearly,   : and it is easy to see that the extension by ¦ map is an isomorphism from ;i to :;, whose inverse is restriction to .

Theorem 3.16 Let =~:l; . a) The extension by ;: map is an isomorphism from i to and so ;š:i b) If = is finite-dimensional then

 dim²: ³ ~codim= ²:³ ~ dim ²= ³ c dim ²:³ Example 3.5 Part b) of Theorem 3.16 may fail in the infinite-dimensional case, since it may easily happen that :š=i . As an example, let = be the vector space over {8 with a countably infinite ordered basis ~²ÁÁó . Let ii :~º» and ;~ºÁÁû . It is easy to see that : š; š= and that dim²=i ³ € dim ²= ³.

The annihilator provides a way to describe the dual space of a direct sum.

Theorem 3.17 A linear functional on the direct sum =~:l; can be written as a direct sum of a linear functional that annihilates : and a linear functional that annihilates ; , that is, ²: l ; ³i ~ : l ;

Proof. Clearly : q ; ~ ¸¹ , since any functional that annihilates both : and ;:l;~=:b;= must annihilate . Hence, the sum  is direct. If i then we can write

  ~ ² k;: ³ b ² k ³  : l ; and so =~:l; . 88 Advanced Linear Algebra

Operator Adjoints If B²=Á>³ then we may define a map di ¢>¦=* by d²³ ~  k ~  for >* . (We will write composition as juxtaposition.) Thus, for any #= ´d ²³µ²#³ ~ ² ²#³³

The map d is called the operator adjoint of and can be described by the phrase “apply  first.”

Theorem 3.18 (Properties of the Operator Adjoint) 1) For Á  B ²=Á>³ and Á- ² b  ³ddd ~   b   2) For B²=Á>³ and B ²>Á<³ ²³~ddd   3) For any invertible B²=³ ²³~²³cdd c

Proof. Proof of part 1) is left for the reader. For part 2), we have for all 

If B²=Á>³ then diiddiiii  B ( >Á= ) and so  ²=Á>³ B . Of course, dd is not equal to . However, in the finite-dimensional case, if we use the natural maps to identify ==>>ii with and ii with then we can think of  dd as being in B²= Á > ³ . Using these identifications, we do have equality in the finite-dimensional case.

Theorem 3.19 Let => and be finite-dimensional and let B ²=Á>³ . If we identify ==>>ii with and ii with using the natural maps then  dd is identified with . Proof. For any %= let the corresponding element of =ii be denoted by % and similarly for >#= . Then before making any identifications, we have for dd²#³²³ ~ #´ d ²³µ ~ #² ³ ~ ² ²#³³ ~ ²#³²³ for all >* and so The Isomorphism Theorems 89

dd²#³~ ²#³> ii Therefore, using the canonical identifications for both =>ii and ii we have dd²#³ ~ ²#³ for all #= .

The next result describes the kernel and image of the operator adjoint.

Theorem 3.20 Let B²=Á>³ . Then 1) ker²³~²³d im  2) im²³~²³d ker  Proof. For part 1), ker²³~¸>“²³~¹did ~ ¸  >i “ ² ²= ³³ ~ ¸¹¹ ~ ¸  >i “ ²im ² ³³ ~ ¸¹¹ ~²³im   For part 2), if ~ ~dd im ²  ³ then ker ²  ³‹ ker ²³ and so ker ² ³.

For the reverse inclusion, let ker ²³i ‹= . On 2~ ker ²³ , there is no problem since ~2 and di agree on for any >: . Let be a complement of ker²³ . Then maps a basis 8 ~¸“0¹ for : to a linearly independent set

8²³~¸²³“0¹   in >>²³ and so we can define i any way we want on 8 . In particular, let >i be defined by setting

² ² ³³ ~ ² ³ and extending in any manner to all of >~~id . Then  on 8 and therefore on :~²³ . Thus, ddim .

Corollary 3.21 Let B²=Á>³ , where = and > are finite-dimensional. Then rk²³~ rk ²d ³.

In the finite-dimensional case,  and d can both be represented by matrices. Let

89~²ÁÃÁ ³ and ~²ÁÃÁ  ³ be ordered bases for => and , respectively and let 90 Advanced Linear Algebra

ii i ii i 89~²ÁÃÁ ³ and ~²ÁÃÁ  ³ be the corresponding dual bases. Then i ²´µ89ÁÁ ³ ~²´²³µ³  9  ~ ´  ²³µ  and ddiiididii ²´ µ98iiÁÁ ³ ~ ²´ ² ³µ 8 i ³  ~  ´  ² ³µ ~ ² ³²  ³ ~  ² ²  ³³ Comparing the last two expressions we see that they are the same except that the roles of  and are reversed. Hence, the matrices in question are transposes.

Theorem 3.22 Let B²=Á>³ , where = and > are finite-dimensional. If 8 and 989 are ordered bases for => and , respectively and ii and are the corresponding dual bases then

d! ´µ98iiÁÁ ~²´µ³ 89 In words, the matrices of  and its operator adjoint d are transposes of one another. Exercises 1. If =: is infinite-dimensional and is an infinite-dimensional subspace, must the dimension of =°: be finite? Explain. 2. Prove the correspondence theorem. 3. Prove the first isomorphism theorem. 4. Complete the proof of Theorem 3.10. 5. Let : be a subspace of = . Starting with a basis ¸  ÁÃÁ ¹ for :Á how would you find a basis for =°: ? 6. Use the first isomorphism theorem to prove the rank-plus-nullity theorem rk² ³ b null ² ³ ~dim ²= ³ for B²=Á>³. 7. Let B²=³ and suppose that : is a subspace of = . Define a map  Z¢ = °: ¦ = °: by Z²# b :³ ~ ²#³ b : When is ZZ well-defined? If is well-defined, is it a linear transformation? What are im²³ZZ and ker ²³ ? 8. Show that for any nonzero vector #= , there exists a linear functional   =i for which ²#³ £  . 9. Show that a vector #= is zero if and only if ²#³~ for all =i . 10. Let := be a proper subspace of a finite-dimensional vector space and let #= ±:. Show that there is a linear functional =i for which ²#³~  and ² ³~  for all  : . The Isomorphism Theorems 91

11. Find a vector space = and decompositions =~(l)~*l+ with (š* but )š+° . Hence, (š* does not imply that ( š* . 12. Find isomorphic vectors spaces => and with =~:l) and >~:l+ but )š+°° . Hence, = š> does not imply that =°:š>°: . 13. Let = be a vector space with

=~:l;~:l; 

Prove that if :: and have finite codimension in = then so does :q:  and

codim²: q : ³ dim ²;  ³ b dim ²;  ³ 14. Let = be a vector space with

=~:l;~:l; 

Suppose that :: and have finite codimension. Hence, by the previous exercise, so does :q: . Find a direct sum decomposition =~>l? for which (1) >>‹:q: has finite codimension, (2)  and (3) ?Œ; b;. 15. Let 8 be a basis for an infinite-dimensional vector space = and define, for all  8 , the map Zi  = by  Z ²³ ~  if  ~  and  otherwise. Does ¸Zi “  8 ¹ form a basis for = ? What do you conclude about the concept of a dual basis? 16. Prove that ²: l ; ³iii š : l ; . 17. Prove that ~dd and  ~ where  is the zero linear operator and  is the identity. 18. Let : be a subspace of = . Prove that ²= °:³i š : . 19. Verify that a) ²b³~ddd  b  for B Á  ²=Á>³ . b) ²  ³dd ~ for any  - and B  ²= Á > ³ 20. Let B²=Á>³ , where = and > are finite-dimensional. Prove that rk²³~ rk ²d ³. 21. Prove that if B²=³ has the property that

=~im ²³lker ²³ and Oim²³ ~ then  is projection on im²³ along ker ²³ . 22. a) Let ¢= ¦: be projection onto a subspace : of = along a subspace ;= of . Show that  is idempotent , that is  ~ . b) Prove that if B²=³ is idempotent then it is a projection. c) Is the adjoint of a projection also a projection? Chapter 4 Modules I: Basic Properties

Motivation Let =-²=³ be a vector space over a field and let B . Then for any polynomial ²%³  -´%µ , the operator ² ³ is well-defined. For instance, if ²%³ ~  b % b % then ² ³ ~ b  b  where  is the identity operator and  is the threefold composition kk .

Thus, using the operator  we can define the product of a polynomial ²%³-´%µ and a vector #= by ²%³# ~ ² ³²#³ (4.1) This product satisfies the usual properties of scalar multiplication, namely, for all ²%³Á ²%³  - ´%µ and "Á #  = , ²%³²" b #³ ~ ²%³" b ²%³# ² ²%³ b ²%³³" ~ ²%³" b ²%³" ´ ²%³ ²%³µ" ~ ²%³´ ²%³"µ " ~ " Thus, for a fixed B²=³ , we can think of = as being endowed with the operations of (the usual) addition along with multiplication of an element of = by a polynomial in -´%µ . However, since -´%µ is not a field, these two operations do not make = into a vector space. Nevertheless, the situation in which the scalars form a ring but not a field is extremely important, not only in our context but in many others. Modules Definition Let 9 be a commutative ring with identity, whose elements are called scalars . An 994 -module (or a module over ) is a nonempty set , 94 Advanced Linear Algebra together with two operations. The first operation, called addition and denoted by b²"Á#³4d4"b#4 , assigns to each pair , an element . The second operation, denoted by juxtaposition, assigns to each pair ² Á"³9d4, an element #4 . Furthermore, the following properties must hold: 1) 4 is an abelian group under addition. 2) For all Á  9 and "Á #  4 ²"b#³~ "b # ² b ³" ~ " b " ² ³" ~ ² "³ " ~ "

The ring 94 is called the base ring of .

Note that vector spaces are just special types of modules: a vector space is a module over a field.

When we turn in a later chapter to the study of the structure of a linear transformation B²=³ , we will think of = as having the structure of a vector space over --´%µ= as well as a module over . Put another way, is an abelian group under addition, with two scalar multiplications—one whose scalars are elements of -- and one whose scalars are polynomials over . This viewpoint will be of tremendous benefit for the study of  . For now, we concentrate only on modules.

Example 4.1 1) If 99 is a ring, the set  of all ordered -tuples, whose components lie in 99, is an -module, with addition and scalar multiplication defined componentwise (just as in -  ),

² ÁÃÁ ³b² ÁÃÁ ³~² b ÁÃÁ b ³ and

² ÁÃÁ ³~²   ÁÃÁ   ³

 for Á 9 , . For example, {{ is the -module of all ordered  -tuples of integers. 2) If 9²9³d9 is a ring, the set CÁ of all matrices of size , is an - module, under the usual operations of matrix addition and scalar multiplication over 99 . Since is a ring, we can also take the product of matrices in CÁ²9³ . One important example is 9 ~ - ´%µ , whence CÁ²- ´%µ³ is the - ´%µ -module of all  d  matrices whose entries are polynomials. 3) Any commutative ring 99 with identity is a module over itself, that is, is an 9 -module. In this case, scalar multiplication is just multiplication by Modules I: Basic Properties 95

elements of 9 , that is, scalar multiplication is the ring multiplication. The defining properties of a ring imply that the defining properties of an 9 - module are satisfied. We shall use this example many times in the sequel.

Importance of the Base Ring Our definition of a module requires that the ring 9 of scalars be commutative. Modules over noncommutative rings can exhibit quite a bit more unusual behavior than modules over commutative rings. Indeed, as one would expect, the general behavior of 9 -modules improves as we impose more structure on the base ring 9 . If we impose the very strict structure of a field, the result is the very well-behaved vector space structure.

To illustrate, if we allow the base ring 9 to be noncommutative then, as we will see, it is possible for an 9 -module to have bases of different sizes! Since modules over noncommutative rings will not be needed for the sequel, we require commutativity in the definition of module.

As another example, if the base ring is an integral domain then whenever # ÁÃÁ# are linearly independent over 9 so are #  ÁÃÁ #  for any nonzero 9. This fails when 9 is not an integral domain.

We will also consider the property on the base ring 9 that all of its ideals are finitely generated. In this case, any finitely generated 94 -module has the desirable property that all of its submodules are also finitely generated. This property of 99 -modules fails if does not have the stated property.

When 9-´%µ is a principal ideal domain (such as { or ), not only are all of its ideals finitely generated, but each is generated by a single element. In this case, the 9 -modules are “reasonably” well behaved. For instance, in general a module may have a basis but one or more of its submodules may not. However, if 9 is a principal ideal domain, this cannot happen.

Nevertheless, even when 99 is a principal ideal domain, -modules are less well behaved than vector spaces. For example, there are modules over a principal ideal domain that do not have any linearly independent elements. Of course, such modules cannot have a basis.

Many of the basic concepts that we defined for vector spaces can also be defined for modules, although their properties are often quite different. We begin with submodules. Submodules The definition of submodule parallels that of subspace. 96 Advanced Linear Algebra

Definition A submodule of an 94 -module is a nonempty subset :4 of that is an 9 -module in its own right, under the operations obtained by restricting the operations of 4: to .

Theorem 4.1 A nonempty subset :9 of an -module 4 is a submodule if and only if it is closed under the taking of linear combinations, that is, Á  9Á "Á #  : ¬ " b #  : Theorem 4.2 If :; and are submodules of 4 then :q;:b; and are also submodules of 4 .

We have remarked that a commutative ring 9 with identity is a module over itself. As we will see, this type of module provides some good examples of non- vector space like behavior.

When we think of a ring 99 as an -module rather than as a ring, multiplication is treated as scalar multiplication. This has some important implications. In particular, if :9 is a submodule of then it is closed under scalar multiplication, which means that it is closed under multiplication by all elements of the ring 9 . In other words, :9 is an ideal of the ring . Conversely, if ? is an ideal of the ring 99 then ? is also a submodule of the module . Hence, the submodules of the 99 -module are precisely the ideals of the ring 9 . Spanning Sets The concept of spanning set carries over to modules as well.

Definition The submodule spanned (or generated ) by a subset : of a module 4: is the set of all linear combinations of elements of :

º:» ~span ²:³ ~ ¸  # b Ä b  # “   9Á #   :Á  ‚ ¹ A subset :‹4 is said to span 4 or generate 4 if 4 ~span ²:³ One very important point to note is that if a nontrivial linear combination of the elements #ÁÃÁ# in an 9 -module 4 is  ,

 # bÄb  # ~ where not all of the coefficients are  then we cannot conclude, as we could in a vector space, that one of the elements # is a linear combination of the others. After all, this involves dividing by one of the coefficients, which may not be possible in a ring. For instance, for the {{{ -module d we have ²Á ³ c ²Á ³ ~ ²Á ³ but neither ²Á ³ nor ²Á ³ is an integer multiple of the other. Modules I: Basic Properties 97

The following simple submodules play a special role in the theory.

Definition Let 49 be an -module. A submodule of the form º#»~9#~¸ #“ 9¹ for #4 is called the cyclic submodule generated by # .

Of course, any finite-dimensional vector space is the direct sum of cyclic submodules, that is, one-dimensional subspaces. One of our main goals is to show that a finitely generated module over a principal ideal domain has this property as well.

For reasons that will become clear soon, we need the following definition.

Definition An 94 -module is said to be finitely generated if it contains a finite set that generates 4 .

Of course, a vector space is finitely generated if and only if it has a finite basis, that is, if and only if it is finite-dimensional. For modules, life is more complicated. The following is an example of a finitely generated module that has a submodule that is not finitely generated.

Example 4.2 Let 9 be the ring -´% Á% Áõ of all polynomials in infinitely many variables over a field -? . It will be convenient to use to denote %Á%ÁÃ and write a polynomial in 9 in the form ²?³ . (Each polynomial in 99, being a finite sum, involves only finitely many variables, however.) Then is an 9 -module and as such, is finitely generated by the identity element ²?³ ~ .

Now, consider the submodule : of all polynomials with zero constant term. This module is generated by the variables themselves,

: ~º% Á% Áû

However, :.~¸ÁÃÁ¹ is not finitely generated. To see this, suppose that  is a finite generating set for :% . Choose a variable  that does not appear in any of the polynomials in .. . Then no linear combination of the polynomials in can be equal to % . For if  % ~  ²?³ ²?³ ~ 98 Advanced Linear Algebra

then let  ²?³ ~ %  ²?³ b ²?³ where  ²?³ does not involve %  . This gives  % ~ ´%  ²?³ b ²?³µ ²?³ ~  ~ %  ²?³  ²?³ b  ²?³  ²?³ ~ ~

The last sum does not involve % and so it must equal . Hence, the first sum must equal  , which is not possible since  ²?³ has no constant term. Linear Independence The concept of linear independence also carries over to modules.

Definition A subset :4 of a module is linearly independent if for any # ÁÃÁ# : and  ÁÃÁ 9 , we have

 # bÄb  # ~¬  ~ for all  A set : that is not linearly independent is linearly dependent .

It is clear from the definition that any subset of a linearly independent set is linearly independent.

Recall that, in a vector space, a set : of vectors is linearly dependent if and only if some vector in :: is a linear combination of the other vectors in . For arbitrary modules, this is not true.

Example 4.3 Consider {{ as a -module. The elements Á   { are linearly dependent, since ²³ c ²³ ~  but neither one is a linear combination (i.e., integer multiple) of the other.

The problem in the previous example (as noted earlier) is that

 # bÄb  # ~ implies that

# ~c #cÄc #   but, in general, we cannot divide both sides by  , since it may not have a multiplicative inverse in the ring 9 . Modules I: Basic Properties 99

Torsion Elements In a vector space =- over a field , singleton sets ¸#¹#£ where are linearly independent. Put another way, £ and #£ imply #£ . However, in a module, this need not be the case.

Example 4.4 The abelian group {{ ~ ¸Á Á à Á c¹ is a -module, with scalar multiplication defined by ' ~ ²' h ³mod  , for all  {{ and    . However, since  ~  for all  { , no singleton set ¸¹ is linearly independent. Indeed, { has no linearly independent sets.

This example motivates the following definition.

Definition Let 4 be an 9 -module. A nonzero element #4 for which #~ for some nonzero 9 is called a torsion element of 4 . A module that has no nonzero torsion elements is said to be torsion-free . If all elements of 4 are torsion elements then 4 is a torsion module . The set of all torsion elements of 44, together with the zero element, is denoted by tor .

If 44 is a module over an integral domain , it is not hard to see that tor is a submodule of 44°4 and that tor is torsion-free. (We will define quotient modules shortly: they are defined in the same way as for vector spaces.) Annihilators Closely associated with the notion of a torsion element is that of an annihilator.

Definition Let 49 be an -module. The annihilator of an element #4 is ann²#³~¸ 9“ #~¹ and the annihilator of a submodule 54 of is ann²5³ ~ ¸  9 “ 5 ~ ¸¹¹ where 5 ~ ¸ # “ #  5¹ . Annihilators are also called order ideals .

It is easy to see that ann²#³ and ann ²5³ are ideals of 9 . Clearly, #  4 is a torsion element if and only if ann²#³ £ ¸¹ .

Let 4 ~º" ÁÃÁ" » be a finitely generated torsion module over an integral domain 9 . Then for each  there is a nonzero  ann ²" ³ . Hence, the nonzero product ~Ä annihilates each generator of 4 and therefore every element of 4 , that is,  ann ²4³ . This shows that ann ²4³ £ ¸¹ . Free Modules The definition of a basis for a module parallels that of a basis for a vector space. 100 Advanced Linear Algebra

Definition Let 49 be an -module. A subset 88 of 4 is a basis if is linearly independent and spans 4 . An 9 -module 4 is said to be free if 4 ~ ¸¹ or if 444 has a basis. If 88 is a basis for , we say that is free on .

Theorem 4.3 A subset 8 of a module 4 is a basis if and only if for every #4, there are unique elements # ÁÃÁ# 8 and unique scalars ÁÃÁ  9 for which

#~  # bÄb  # In a vector space, a set of vectors is a basis if and only if it is a minimal spanning set, or equivalently, a maximal linearly independent set. For modules, the following is the best we can do in general. We leave proof to the reader.

Theorem 4.4 Let 8 be a basis for an 94 -module . Then 1) 8 is a minimal spanning set. 2) 8 is a maximal linearly independent set.

The {{ -module  has no basis since it has no linearly independent sets. But since the entire module is a spanning set, we deduce that a minimal spanning set need not be a basis. In the exercises, the reader is asked to give an example of a module 4 that has a finite basis, but with the property that not every spanning set in 44 contains a basis and not every linearly independent set in is contained in a basis. It follows in this case that a maximal linearly independent set need not be a basis.

The next example shows that even free modules are not very much like vector spaces. It is an example of a free module that has a submodule that is not free.

Example 4.5 The set {{d is a free module over itself, using componentwise scalar multiplication ²Á ³²Á ³ ~ ²Á ³ with basis ¸²Á ³¹ . But the submodule { d ¸¹ is not free since it has no linearly independent elements and hence no basis. Homomorphisms The term linear transformation is special to vector spaces. However, the concept applies to most algebraic structures.

Definition Let 459 and be -modules. A function  ¢4¦59 is an - homomorphism if it preserves the module operations, that is, ² " b #³ ~ ²"³ b ²#³ for all Á  9 and "Á #  4 . The set of all 9 -homomorphisms from 4 to 5 is denoted by hom9²4Á 5³ . The following terms are also employed: Modules I: Basic Properties 101

1) An 994 -endomorphism is an -homomorphism from to itself. 2) A 99 -monomorphism or - embedding is an injective 9 -homomorphism. 3) An 99 -epimorphism is a surjective -homomorphism. 4) An 99 -isomorphism is a bijective -homomorphism.

It is easy to see that hom9²4Á 5³ is itself an 9 -module under addition of functions and scalar multiplication defined by ²  ³²#³ ~ ² ²#³³ ~ ² #³

Theorem 4.5 Let ²4Á5³hom9 . The kernel and image of , defined as for linear transformations by ker²³~¸#4“ ²#³~¹ and im²³~¸²#³“#4¹ are submodules of 45 and , respectively. Moreover,  is a monomorphism if and only if ker² ³ ~ ¸¹ .

If 594¢5¦4 is a submodule of the -module then the map defined by ²#³ ~ # is evidently an 9 -monomorphism, called injection of 5 into 4 . Quotient Modules The procedure for defining quotient modules is the same as that for defining quotient vector spaces. We summarize in the following theorem.

Theorem 4.6 Let :94 be a submodule of an -module . The binary relation "–#¯"c#: is an equivalence relation on 4 , whose equivalence classes are the cosets #b:~¸#b “ :¹ of :4 in . The set 4°: of all cosets of :4 in , called the quotient module of 4:9 modulo , is an -module under the well-defined operations ²"b:³b²#b:³~²"b#³b: ²"b:³~ "b: The zero element in 4°: is the coset b: ~ : .

One question that immediately comes to mind is whether a quotient space of a free module need be free. As the next example shows, the answer is no.

Example 4.6 As a module over itself, { is free on the set ¸¹ . For any  €  , the set {{~¸'“' ¹ is a free cyclic submodule of { , but the quotient { - 102 Advanced Linear Algebra

module {{° is isomorphic to { via the map {²" b ³ ~ "mod  and since {{ is not free as a -module, neither is {{° . The Correspondence and Isomorphism Theorems The correspondence and isomorphism theorems for vector spaces have analogs for modules.

Theorem 4.7 ( The correspondence theorem ) Let :4 be a submodule of . Then the function that assigns to each intermediate submodule :‹;‹4 the quotient submodule ;°: of 4°: is an order-preserving (with respect to set inclusion) one-to-one correspondence between submodules of 4: containing and all submodules of 4°: .

Theorem 4.8 (The first isomorphism theorem ) Let ¢4 ¦5 be an 9 - homomorphism. Then the map Z¢4°ker ² ³¦5 defined by Z²# bker ² ³³ ~ ²#³ is an 9 -embedding and so 4 š²³im  ker²³

Theorem 4.9 (The second isomorphism theorem ) Let 49 be an -module and let :; and be submodules of 4 . Then :b; : š ;:q; Theorem 4.10 (The third isomorphism theorem ) Let 49 be an -module and suppose that :‹; are submodules of 4 . Then 4°: 4 š ;:° ;

Direct Sums and Direct Summands The definition of direct sum is the same for modules as for vector spaces. We will confine our attention to the direct sum of a finite number of modules.

Definition An 94 -module is the direct sum of the submodules :ÁÃÁ: , written

4~: lÄl: if every #4 can be written, in a unique way (except for order), as a sum of Modules I: Basic Properties 103

one element from each of the submodules :": , that is, there are unique for which

#~" bÄb"

In this case, each :44~:l;: is called a direct summand of . If then is said to be complemented and ;:4 is called a complement of in .

Note that a sum is direct if and only if whenever "bÄb"~ where ": then "~  for all  , that is, if and only if  has a unique representation as a sum of vectors from distinct submodules.

As with vector spaces, we have the following useful characterization of direct sums.

Theorem 4.11 A module 4 is the direct sum of submodules : ÁÃÁ: if and only if 1) 4~: bÄb: 2) For each ~ÁÃÁ

: q45 : ~ ¸¹ £

In the case of vector spaces, every subspace is a direct summand, that is, every subspace has a complement. However, as the next example shows, this is not true for modules.

Example 4.7 The set {{ of integers is a -module. Since the submodules of { are precisely the ideals of the ring {{ and since is a principal ideal domain, the submodules of { are the sets º»~{{ ~¸'“' ¹ Hence, any two nonzero proper submodules of { have nonzero intersection, for if £€ then {{q ~ {  where ~lcm ¸Á¹ . It follows that the only complemented submodules of { are { and ¸¹ .

In the case of vector spaces, there is an intimate connection between subspaces and quotient spaces, as we saw in Theorem 3.6. The problem we face in generalizing this to modules in general is that not all submodules have a complement. However, this is the only problem.

Theorem 4.12 Let :4 be a complemented submodule of . All complements of :4°: are isomorphic to and hence to each other. 104 Advanced Linear Algebra

Proof. For any complement ;: of , the first isomorphism theorem applied to the projection ¢4 ¦; onto ; along : gives ; š4°: . Direct Summands and One-Sided Invertibility The next theorem characterizes direct summands, but first a definition.

Definition A submodule :9 of an -module 4 is a retract of 4 by an 9 - homomorphism ¢4 ¦: if fixes each element of : , that is,  ² ³~ for all :.

Note that the homomorphism  in the definition of a retract is similar to a projection map onto : , in that both types of maps are the identity when restricted to :: . In fact, if is complemented and ; is a complement of :: then is a retract of 4:; by the projection onto along . Indeed, more can be said.

Theorem 4.13 A submodule :9 of the -module 4 is complemented if and only if it is a retract of 4:4 . In this case, if is a retract of by  then is projection onto :²³ along ker  and so 4~:lker ²³~im ²³l ker ²³ 

Proof. If 4~:l; then : is a retract of 4 by the projection map : ¢4 ¦:. Conversely, if ¢4 ¦: is an 9 -homomorphism that fixes : then clearly  is surjective and :~im ² ³ . Also, #:qker ² ³¬#~ ²#³~ and for any #4 we have # ~ ´# c ²#³µ b ²#³ ker ²  ³ b : Hence, 4~:lker ² ³.

Definition Let ¢(¦) be a module homomorphism. Then a left inverse of is a module homomorphism 33¢) ¦( for which k ~ . A right inverse of  is a module homomorphism 99¢) ¦( for which k ~ .

It is easy to see that in order for  to have a left inverse 3 , it must be injective since

²³ ~  ²³ ¬ 33 k ²³ ~  k ²³ ¬  ~  and in order for  to have a right inverse 9 , it must be surjective, since if ) then ~ ´9 ²³µim ²  ³.

Now, if we were dealing with functions between sets, then the converses of these statements would hold:  is left-invertible if and only if it is injective and Modules I: Basic Properties 105

 is right-invertible if and only if it is surjective. However, for modules things are more complicated.

Theorem 4.14 1) An 9¢(¦) -homomorphism  has a left inverse 3 if and only if it is injective and im²³ is a direct summand of ) , in which case

)~²³l²³š²³l²³imker33 im  ker  3

2) An 9¢(¦) -homomorphism  has a right inverse 9 if and only if it is surjective and ker²³ is a direct summand of ) , in which case

(~ker ²³lim ²9 ³š ker ²³l  im ²³

Proof. For part 1), suppose first that 3 ~ ( . Then  is injective since applying 3 to the expression ²%³ ~ ²&³ gives % ~ & . Also, %im ² ³qker ²3 ³ implies that %~  ²³ and

~33 ²%³~ ² ²³³~ and so %~ . Hence, the direct sum im ² ³lker ²3 ³ exists. For any ) , we can write

 ~33 ²³ b ´ c  ²³µ where 3333²³ im ²  ³ and  c  ²³ ker ²  ³ and so ) ~ im ²  ³ l ker ²  ³ .

Conversely, if  is injective and )~2lim ² ³ for some submodule 2 then let

c 3 ~k im²³ where im²³ is projection onto im²(³ . This is well-defined since ¢ ( ¦ im ² ³ is an isomorphism. Then

c c 3 k²³~  k im²³ k²³~   k²³~  and so 33 is a left-inverse of . It is clear that 2~ker ²  ³ and since 33¢²³¦²³im im  is injective, im ²³š²³  im  3 .

For part 2), if  has a right inverse 99 then has a left inverse and so 9 must be injective and

(~²³l²³š²³l²³im9 ker im ker Conversely, suppose that (~?lker ² ³ . Since the elements of different cosets of ker²³ are mapped to different elements of ) , it is natural to define 99¢ ) ¦ ( by taking ²³ to be a particular element in the coset  b ker ²  ³ that is sent to  by  . However, in general we cannot pick just any element of 106 Advanced Linear Algebra the coset and expect to get a module morphism. (We get a right inverse but only as a set function.)

However, the condition (~?lker ² ³ is precisely what we need, because it says that the elements of the submodule ? form a set of distinct coset representatives; that is, each %? belongs to exactly one coset and each coset contains exactly one element of ? .

In addition, if ²%³~ and ²%³~  for %Á%?  then

² % b %³~ ²%³b  ²%³~    b 

Thus, we can define 9 as follows. For any ) there is a unique %? for which ²%³ ~  . Let 9 ²³ ~ % . Then we have

²9 ²³³~  ²%³~ and so k~9)  . Also,

9²  b ³~ %   b %  ~ 9 ²³b 9 ²³ and so 9 is a module morphism. Thus is right-invertible.

The last part of the previous theorem is worth further comment. Recall that if ¢= ¦> is a linear transformation on vector spaces then =šker ²³lim ²³ This does not hold in general for modules, but it does hold if ker²³ is a direct summand. Modules Are Not As Nice As Vector Spaces Here is a list of some of the properties of modules (over commutative rings with identity) that emphasize the differences between modules and vector spaces.

1) A submodule of a module need not have a complement. 2) A submodule of a finitely generated module need not be finitely generated. 3) There exist modules with no linearly independent elements and hence with no basis. 4) A minimal spanning set or maximal linearly independent set is not necessarily a basis. 5) There exist free modules with submodules that are not free. 6) There exist free modules with linearly independent sets that are not contained in a basis and spanning sets that do not contain a basis.

Recall also that a module over a noncommutative ring may have bases of different sizes. However, all bases for a free module over a commutative ring with identity have the same size, as we will prove in the next chapter. Modules I: Basic Properties 107

Exercises 1. Give the details to show that any commutative ring with identity is a module over itself. 2. Let : ~¸# ÁÃÁ# ¹ be a subset of a module 4 . Prove that 5 ~º:» is the smallest submodule of 4: containing . First you will need to formulate precisely what it means to be the smallest submodule of 4: containing . 3. Let 49 be an -module and let 0 be an ideal in 904 . Let be the set of all finite sums of the form

 # bÄb  #

where 0 and #4 . Is 04 a submodule of 4 ? 4. Show that if :; and are submodules of 4 then (with respect to set inclusion) :q;~glb ¸:Á;¹ and :b;~ lub ¸:Á;¹

5. Let :‹:‹Ä be an ascending sequence of submodules of an 9 - module 4:4 . Prove that the union   is a submodule of . 6. Give an example of a module 4 that has a finite basis but with the property that not every spanning set in 4 contains a basis and not every linearly independent set in 4 is contained in a basis. 7. Show that, just as in the case of vector spaces, an 9 -homomorphism can be defined by assigning arbitrary values on the elements of a basis and extending by linearity. 8. Let 8²4Á5³9hom9 be an -isomorphism. If is a basis for 4 , prove that 8²³~¸²³“¹  8 is a basis for 5 . 9. Let 4 be an 9 -module and let  hom9 ²4Á4³ be an 9 -endomorphism. If  is idempotent , that is, if  ~ show that 4~ker ² ³lim ² ³ Does the converse hold? 10. Consider the ring 9~-´%Á&µ of polynomials in two variables. Show that the set 49 consisting of all polynomials in that have zero constant term is an 949 -module. Show that is not a free -module. 11. Prove that 994 is an integral domain if and only if all -modules have the following property: If #ÁÃÁ# is linearly independent over 9 then so is # ÁÃÁ # for any nonzero 9 . 12. Prove that if a commutative ring 9 with identity has the property that every finitely generated 99 -module is free then is a field. 13. Let 459 and be -modules. If : is a submodule of 4; and is a submodule of 5 show that 4l5 4 5 šl :l; : ; 108 Advanced Linear Algebra

14. If 99 is a commutative ring with identity and ?? is an ideal of then is an 9-module. What is the maximum size of a linearly independent set in ? ? Under what conditions is ? free? 15. a) Show that for any module 44 over an integral domain the set tor of all torsion elements in a module 44 is a submodule of . b) Find an example of a ring 99 with the property that for some -module 44 the set tor is not a submodule. c) Show that for any module 4 over an integral domain, the quotient module 4°4tor is torsion-free. 16. Fix a prime  and let  4~FG “Á{ Á‚ 

Show that 4 is a { -module and that the set , ~ ¸° “  ‚ ¹ is a minimal spanning set for 4, . Is the set linearly independent? 17. Let 5 be an abelian group together with a scalar multiplication over a ring 99# that satisfies all of the properties of an -module except that does not necessarily equal ##55 for all . Show that can be written as a direct sum of an 95 -module  and another “pseudo 9 -module” 5 . 18. Prove that hom9²4Á 5³ is an 9 -module under addition of functions and scalar multiplication defined by ²  ³²#³ ~ ² ²#³³ ~ ² #³

19. Prove that any 94 -module is isomorphic to the 9 -module hom9 ²9Á4³ . 20. Let 9: and be commutative rings with identity and let ¢9¦: be a ring homomorphism. Show that any :9 -module is also an -module under the scalar multiplication # ~ ² ³#

21. Prove that hom{²{{ Á ³ š {  where  ~ gcd ²Á ³ . 22. Suppose that 9 is a commutative ring with identity. If ?@ and are ideals of 99°š9°9 for which ?@ as -modules then prove that ?@ ~ . Is the result true if 9°?@ š 9° as rings? Chapter 5 Modules II: Free and Noetherian Modules

The Rank of a Free Module Since all bases for a vector space = have the same cardinality, the concept of vector space dimension is well-defined. A similar statement holds for free 9 - modules when the base ring is commutative (but not otherwise).

Theorem 5.1 Let 49 be a free module over a commutative ring with identity. 1) Then any two bases of 4 have the same cardinality. 2) The cardinality of a spanning set is greater than or equal to that of a basis. Proof. The plan is to find a vector space = with the property that, for any basis for 4= , there is a basis of the same cardinality for . Then we can appeal to the corresponding result for vector spaces.

Let ?? be a maximal ideal of 99° , which exists by Theorem 0.22. Then is a field. Our first thought might be that 49° is a vector space over ? but that is not the case. In fact, scalar multiplication using the field 9°? ² b? ³# ~ # is not even well-defined, since this would require that ?4 ~ ¸¹ . On the other hand, we can fix precisely this problem by factoring out the submodule

??4~¸# bÄb#  “  Á#4¹  Indeed, 4°?? 4 is a vector space over 9° , with scalar multiplication defined by ² b?? ³²" b 4³ ~ " b ? 4 To see that this is well-defined, we must show that the conditions b?? ~ Z b "b?? 4 ~"Z b 4 imply 110 Advanced Linear Algebra

" b?? 4 ~ ZZ " b 4 But this follows from the fact that "c "ZZ ~ ²"c"³b² c ³" Z Z Z ? 4 Hence, scalar multiplication is well-defined. We leave it to the reader to show that 4°?? 4 is a vector space over 9° .

Consider now a set 8 ~¸“0¹‹4 and the corresponding set 4 8?b4~¸b4“0¹‹ ?  ?4 If 88??? spans 49 over then b44°49° spans over . To see this, note that any #4 has the form #~ '  for  9 and so

# b????? 4 ~45   b 4 ~  ²  b 4³ ~  ²  b ³²  b 4³ which shows that 8?b4 spans 4°4 ? .

Now suppose that 8 ~¸“0¹ is a basis for 4 over 9 . We claim that 8?b4 is a basis for 4°4 ? over 9° ? . We have seen that 8? b4 spans 4°? 4. Also, if

 ²  b??? ³² b 4³ ~ 4 then   ? 4 and so    ~   ~ ~ where  ?8? . From the linear independence of we deduce that  for all  and so b ?? ~ . Hence 8? b 4 is linearly independent and therefore a basis, as desired.

To see that ((88?~ ( b4 ( , note that if b4~b4 ? ? then  c~   ~ where ?? . If £ then ~ , which is not possible since ? is a maximal ideal. Hence, ~ .

Thus, if 8 is a basis for 49 over then

((88?~ ( b 4 ( ~dim9°? ²4° ? 4³ Modules II: Free and Noetherian Modules 111 and so all bases for 49 over have the same cardinality, which proves part 1). Moreover, if 88?? spans 49 over then b44°4 spans and so

dim9°? ²4°?8?8 4³ (((( b 4  Thus, 8 has cardinality at least as great as that of any basis for 49 over .

The previous theorem allows us to define the rank of a free module. (The term dimension is not used for modules in general.)

Definition Let 94³ be a commutative ring with identity. The rank rk² of a nonzero free 94 -module is the cardinality of any basis for 4 . The rank of the trivial module ¸¹ is  .

Theorem 5.1 fails if the underlying ring of scalars is not commutative. The next example describes a module over a noncommutative ring that has the remarkable property of possessing a basis of size  for any positive integer .

Example 5.1 Let =- be a vector space over with a countably infinite basis 8B~¸ÁÁù . Let ²=³ be the ring of linear operators on = . Observe that B²= ³ is not commutative, since composition of functions is not commutative.

The ring BB²= ³ is an ²= ³ -module and as such, the identity map  forms a basis for BB²= ³ . However, we can also construct a basis for ²= ³ of any desired finite size ~ . To understand the idea, consider the case and define the operators  and by

² ³ ~   Á b ² ³ ~  and

² ³ ~ Á b ² ³ ~   These operators are linearly independent essentially because they are surjective and their supports are disjoint. In particular, if

b~ then

 ~ ² b  ³² ³ ~ ² ³ and

 ~ ²b b  ³² ³ ~ ² ³ which shows that ~ and ~ . Moreover, if B ²=³ then we define  and  by 112 Advanced Linear Algebra

² ³ ~ ² ³ ²b ³ ~ ² ³ from which it follows easily that

~ b which shows that ¸Á ¹ is a basis for B ²=³ .

More generally, we begin by partitioning 8 into  blocks. For each ~ÁÃÁc, let

8 ~¸“– mod ¹

Now we define elements B ²=³ by

 ²b! ³ ~ !Á  where ! and where !Á is the Kronecker delta function. These functions are surjective and have disjoint support. It follows that 9c~¸0 ÁÃÁ  ¹ is linearly independent. For if B ²=³ and

~ bÄb  cc  then, applying this to b! gives

 ~!! ² b! ³ ~  ! ²  ³ for all ~ . Hence, ! .

Also, 9B spans ²= ³ , for if B  ²= ³ , we define B  ²= ³ by

 ² ³ ~ ² + ³ to get

² b Ä b  cc!  ³²++ ³ ~  !! ² ! ³ ~  ! ²  ³ ~  ² ! + ³ and so

~bÄb  cc

Thus, 9c~¸0 ÁÃÁ  ¹ is a basis for B ²=³ of size  .

We have spoken about the cardinality of minimal spanning sets. Let us now speak about the cardinality of maximal linearly independent sets.

Theorem 5.2 Let 949 be an integral domain and let be a free -module of finite rank ²4³ . Then all linearly independent sets have size at most rk . Modules II: Free and Noetherian Modules 113

Proof. Since 4š9b if we prove the result for 9 it will hold for 4 . Let 9 be the field of quotients of 9 . Then 9b is a subset of the vector space ²9 ³ and

1) 9²9³b is a subgroup of under addition 2) scalar multiplication by elements of 999 is defined and  is an -module, 3) scalar multiplication by elements of 99b is defined but is not closed under this scalar multiplication.

b Now, if 88~¸#ÁÃÁ#¹‹9 is linearly dependent over 9 then is clearly linearly dependent over 9 . Conversely, suppose that 8 is linearly independent over 9 and

 #bÄb #~  where £ for all  and £ for some  . Multiplying by ~ Ä £ produces a nontrivial linear dependency over 9

# bÄb #  ~  which implies that ~ for all  . Thus 8 is linearly dependent over 9 if and only if it is linearly dependent over 9²9³bb . Of course, in the vector space all sets of size b or larger are linearly dependent over 9b and hence all subsets of 9b of size or larger are linearly dependent over 9 .

Recall that if )=-= is a basis for a vector space over then is isomorphic to ) the vector space ²- ³ of all functions from ) to - that have finite support. A ) similar result holds for free 9²9³ -modules. We begin with the fact that  is a free 9 -module. The simple proof is left to the reader.

) Theorem 5.3 Let ) be any set and let 9 be a ring. The set ²9 ³ of all functions from )9 to that have finite support is a free 9 -module of rank (() with basis 8~¸ ¹ where %~if  ²%³ ~  F %£if

) This basis is referred to as the standard basis for ²9 ³ .

Theorem 5.4 Let 49 be an -module. If ) is a basis for 44 then is ) isomorphic to ²9 ³ . ) Proof. Consider the map ¢4 ¦²9 ³ defined by setting

²³ ~  where  is defined in Theorem 5.3 and extending this to all of 4 by linearity, 114 Advanced Linear Algebra that is,

²  bÄb   ³~  bÄb 

) Since 8 maps a basis for 4 to a basis ~ ¸ ¹ for ²9 ³ , it follows that is ) an isomorphism from 4²9³ to  .

Theorem 5.5 Two free 9 -modules (over a commutative ring) are isomorphic if and only if they have the same rank. Proof. If 4š5 then any isomorphism  from 4 to 5 maps a basis for 4 to a basis for 5²4³~²5³ . Since  is a bijection, we have rk rk . Conversely, suppose that rk²4³ ~ rk ²5³ . Let 89 be a basis for 4 and let be a basis for 5 . Since ((89~¢¦ (( , there is a bijective map 89 . This map can be extended by linearity to an isomorphism of 45 onto and so 4š5 . Free Modules and Epimorphisms Homomorphic images that are free have some behavior reminiscent of vector spaces.

Theorem 5.6 1) If ¢4 ¦- is a surjective 9 -homomorphism and - is free then ker ² ³ is complemented and 4~ker ²^ ³l5š ker ² ³ - where 5š-. 2) If :44°:: is a submodule of and if is free then is complemented and 4 4š:^ : If in addition, 4Á: and 4°: are free then 4 rk²4³ ~ rk ²:³ b rk67 : and if the ranks are all finite then 4 rk67~²4³c²:³ rk rk :

Proof. For part 1), we prove that 8 is right-invertible. Let ~¸#“0¹ be a basis for -¢-¦)²#³ . Define 99 by setting equal to any member of the c nonempty set ²#9 ³ and extending to an 9 -homomorphism. Then  9 is a right inverse of  and so Theorem 4.14 implies that ker²³ is a direct summand of 44š²³- and ker ^ . Part 2) follows from part 1), where  ~ : is projection onto : . Modules II: Free and Noetherian Modules 115

Noetherian Modules One of the most desirable properties of a finitely generated 94 -module is that all of its submodules be finitely generated. Example 4.2 shows that this is not always the case and leads us to search for conditions on the ring 9 that will guarantee that all submodules of a finitely generated module are themselves finitely generated.

Definition An 94 -module is said to satisfy the ascending chain condition on submodules (abbreviated a.c.c.) if any ascending sequence of submodules

:‹:‹:‹Ä of 4 is eventually constant, that is, there exists an index for which

:~: b ~:k b ~Ä Modules with the ascending chain condition on submodules are also called noetherian modules (after Emmy Noether, one of the pioneers of module theory).

Theorem 5.7 An 94 -module is noetherian if and only if every submodule of 4 is finitely generated. Proof. Suppose that all submodules of 44 are finitely generated and that contains an infinite ascending sequence

:‹:‹:‹Ä3 (5.1) of submodules. Then the union

:~ :  is easily seen to be a submodule of 4: . Hence, is finitely generated, say

: ~º" ÁÃÁ" ». Since "  : , there exists an index   such that "  :  . Therefore, if  ~max ¸ ÁÃÁ ¹ , we have

¸" ÁÃÁ" ¹‹: and so

: ~º"bb ÁÃÁ" »‹: ‹: ‹: ‹Ä‹: which shows that the chain (5.1) is eventually constant.

For the converse, suppose that 4: satisfies the a.c.c on submodules and let be a submodule of 4 . Pick "  : and consider the submodule : ~ º" » ‹ : generated by " . If : ~: then : is finitely generated. If :  £: then there is a " :c:. Now let :  ~º" , "» . If : ~: then : is finitely generated. If :£: then pick ":c:333 and consider the submodule :~º"""»  , , . 116 Advanced Linear Algebra

Continuing in this way, we get an ascending chain of submodules

º" » ‹ º" Á " » ‹ º" Á " Á " » ‹ Ä ‹ : If none of these submodules is equal to : , we would have an infinite ascending chain of submodules, each properly contained in the next, which contradicts the fact that 4:~º"ÁÃÁ"» satisfies the a.c.c. on submodules. Hence,  , for some : and so is finitely generated.

Since a ring 99 is a module over itself and since the submodules of the module are precisely the ideals of the ring 9 , the preceding discussion may be formulated for rings as follows.

Definition A ring 9 is said to satisfy the ascending chain condition on ideals if any ascending sequence

???‹‹‹Ä of ideals of 9 is eventually constant, that is, there exists an index for which

??bb~~~Ä ?2 A ring that satisfies the ascending chain condition on ideals is called a noetherian ring.

Theorem 5.8 A ring 99 is noetherian if and only if every ideal of is finitely generated.

Note that a ring 9 is noetherian as a ring if and only if it is noetherian as a module over itself. More generally, a ring 9 is noetherian if and only if every finitely generated 9 -module is noetherian.

Theorem 5.9 Let 9 be a commutative ring with identity. 1) 99 is noetherian if and only if every finitely generated -module is noetherian. 2) If, in addition, 94 is a principal ideal domain then if is generated by elements any submodule of 4 is generated by at most elements. Proof. For part 1), one direction is evident. Assume that 9 is noetherian and let 4 ~º" ÁÃÁ" » be a finitely generated 9 -module. Consider the epimorphism ¢9 ¦4 defined by

²  ÁÃÁ ³~ " bÄb " Let :4 be a submodule of . Then c²:³ ~ ¸"  9  “ ²"³  :¹ is a submodule of 9c and  ² ²:³³ ~ : . If every submodule of 9  is finitely c c generated, then ²:³ is finitely generated and so ²:³ ~ º# Á à Á # » . Then Modules II: Free and Noetherian Modules 117

: is finitely generated by ¸ ²# ³Á à Á ²# ³¹ . Hence, it is sufficient to show that every submodule 49 of  is finitely generated. We proceed by induction on  .

If ~ , then 4 is an ideal of 9 and is thus finitely generated by assumption. Assume that every submodule of 9 is finitely generated for all  and let :9 be a submodule of  .

If € , we can extract from : something that is isomorphic to an ideal of 9 and so will be finitely generated. In particular, let : be the “last coordinates” in :, specifically, let

:cc ~ ¸²Á à Á Á  ³ “ ² Á à Á  Á  ³  : for some  Á à Á   9¹

The set :9 is isomorphic to an ideal of and is therefore finitely generated, say :~º»==, where  ~¸ÁÃÁ¹  is a finite subset of :  .

Also, let

: ~ ¸#  : “ # ~ ²  Á à Á  c Á ³ for some   Á à Á  c  9¹ be the set of all elements of :: that have last coordinate equal to . Note that  is a nonempty submodule of 99c and is isomorphic to a submodule of . Hence, the inductive hypothesis implies that : is finitely generated, say :~º»==, where  is a finite subset of : .

By definition of : , each = has the form

Á ~²ÁÃÁÁ ³ for 9Á where there is a : of the form

 ~²Á ÁÃÁ Ác Á Á ³

Let === ~¸ÁÃÁ¹ . We claim that : is generated by the finite set  r  .

To see this, let #~² ÁÃÁ ³: . Then ²ÁÃÁÁ  ³: and so

 ²ÁÃÁÁ ³~  ~ for 9 . Consider now the sum

 $~   º= » ~ The last coordinate of this sum is 118 Advanced Linear Algebra

  Á ~  ~ and so the difference #c$ has last coordinate  and is thus in : ~º= » . Hence

#~²#c$³b$º==== »bº »~º r » as desired.

For part 2), we leave it to the reader to review the proof and make the necessary changes. The key fact is that :9 is isomorphic to an ideal of , which is principal. Hence, :4 is generated by a single element of . The Hilbert Basis Theorem Theorem 5.9 naturally leads us to ask which familiar rings are noetherian. The following famous theorem describes one very important case.

Theorem 5.10 ( Hilbert basis theorem ) If a ring 9 is noetherian then so is the polynomial ring 9´%µ . Proof. We wish to show that any ideal ? in 9´%µ is finitely generated. Let 3 denote the set of all leading coefficients of polynomials in ? , together with the  element of 93 . Then is an ideal of 9 .

To see this, observe that if ?3 is the leading coefficient of ²%³ and if 9 then either  ~ or else is the leading coefficient of ²%³ ? . In either case, 3 . Similarly, suppose that 3 is the leading coefficient of ²%³ ?. We may assume that deg ²%³ ~  and deg ²%³ ~  , with    . Then ²%³ ~ %c ²%³ is in ? , has leading coefficient and has the same degree as ²%³. Hence,  c is either  or it is the leading coefficient of ²%³ c ²%³ ?. In either case c  3 .

Since 39 is an ideal of the noetherian ring , it must be finitely generated, say 3~º ÁÃÁ ». Since   3 , there exist polynomials ²%³ ? with leading coefficient ²%³% . By multiplying each by a suitable power of , we may assume that

deg²%³~~ max ¸ deg ²%³¹ for all ~ÁÃÁ .

Now for ~ÁÃÁc let 3 be the set of all leading coefficients of polynomials in ? of degree 9 , together with the element of . A similar argument shows that 393 is an ideal of and so is also finitely generated.

Hence, we can find polynomials 7ÁÁ ~ ¸ ²%³Á à Á  ²%³¹ in ? whose leading coefficients constitute a generating set for 3 . Modules II: Free and Noetherian Modules 119

Consider now the finite set

c 7 ~89 7 r ¸ ²%³Á à Á   ²%³¹ ~ If @@? is the ideal generated by 7‹ then . An induction argument can be used to show that @?~ . If ²%³  ? has degree  then it is a linear combination of the elements of 7 (which are constants) and is thus in @ . Assume that any polynomial in ?@? of degree less than ²%³ is in and let have degree  .

If    then some linear combination ²%³ over 9 of the polynomials in 7 has the same leading coefficient as ²%³ and if  ‚  then some linear combination ²%³ of the polynomials

c c BC%  ²%³Á à Á %  ²%³ ‹ @ has the same leading coefficient as ²%³ . In either case, there is a polynomial ²%³ @? that has the same leading coefficient as ²%³ . Since ²%³ c ²%³  has degree strictly smaller than that of ²%³ the induction hypothesis implies that ²%³ c ²%³  @ and so ²%³ ~ ´²%³ c ²%³µ b ²%³  @ This completes the induction and shows that ?@~ is finitely generated. Exercises 1. If 49 is a free -module and  ¢4¦5 is an epimorphism then must 5 also be free? 2. Let ??? be an ideal of 99°9 . Prove that if is a free -module then is the zero ideal. 3. Prove that the union of an ascending chain of submodules is a submodule. 4. Let :944 be a submodule of an -module . Show that if is finitely generated, so is the quotient module 4°: . 5. Let :9 be a submodule of an -module. Show that if both :4°: and are finitely generated then so is 4 . 6. Show that an 94 -module satisfies the a.c.c. for submodules if and only if the following condition holds. Every nonempty collection I of submodules of 4 has a maximal element. That is, for every nonempty collection I of submodules of 4: there is an I with the property that ;I ¬;‹:. 7. Let ¢4 ¦5 be an 9 -homomorphism. a) Show that if 4²³ is finitely generated then so is im  . 120 Advanced Linear Algebra

b) Show that if ker²³ and im ²³ are finitely generated then 4~ker ² ³b: where : is a finitely generated submodule of 4 . Hence, 4 is finitely generated. 8. If 999° is noetherian and ?? is an ideal of show that is also noetherian. 9. Prove that if 9 is noetherian then so is 9´% ÁÃÁ% µ . 10. Find an example of a commutative ring with identity that does not satisfy the ascending chain condition. 11. a) Prove that an 94 -module is cyclic if and only if it is isomorphic to 9°?? where is an ideal of 9 . b) Prove that an 9 -module 4 is simple ( 4 £ ¸¹ and 4 has no proper nonzero submodules) if and only if it is isomorphic to 9°?? where is a maximal ideal of 9 . c) Prove that for any nonzero commutative ring 9 with identity, a simple 9-module exists. 12. Prove that the condition that 9 be a principal ideal domain in part 2) of Theorem 5.9 is required. 13. Prove Theorem 5.9 in the following way. a) Show that if ;‹: are submodules of 4 and if ; and :°; are finitely generated then so is : . b) The proof is again by induction. Assuming it true for any module generated by 4~º#ÁÃÁ#» elements, let b and let ZZ 4 ~º# ÁÃÁ# ». Then let ; ~: q4 in part a). 14. Prove that any 94 -module is isomorphic to the quotient of a free module -4. If is finitely generated then - can also be taken to be finitely generated. 15. Prove that if :; and are isomorphic submodules of a module 4 it does not necessarily follow that the quotient modules 4°: and 4°; are isomorphic. Prove also that if :l; š:l; as modules it does not necessarily follow that ;š; . Prove that these statements do hold if all modules are free and have finite rank. Chapter 6 Modules over a Principal Ideal Domain

We remind the reader of a few of the basic properties of principal ideal domains.

Theorem 6.1 Let 9 be a principal ideal domain. 1) An element 9 is irreducible if and only if the ideal º » is maximal. 2) An element in 9 is prime if and only if it is irreducible. 3) 9 is a unique factorization domain. 4) 9 satisfies the ascending chain condition on ideals. Hence, so does any finitely generated 94 -module . Moreover, if 4 is generated by  elements any submodule of 4 is generated by at most elements. Annihilators and Orders When 9 is a principal ideal domain all annihilators are generated by a single element. This permits the following definition.

Definition Let 949 be a principal ideal domain and let be an -module, with submodule 5²5³5 . Any generator of ann is called an order of . An order of an element #4 is an order of the submodule º#» .

Note that any two orders  and of 5#4 (or of an element ) are associates, since º»~º» . Hence, an order of 5 is uniquely determined up to multiplication by a unit of 9 . For this reason, we may occasionally abuse the terminology and refer to “the” order of an element or submodule.

Also, if ( ‹ ) ‹ 4 are submodules of 4 then ann ²)³ ‹ ann ²(³ and so any order of () divides any order of . Thus, just as with finite groups, the order of an element/submodule divides the order of the module. 122 Advanced Linear Algebra

Cyclic Modules The simplest type of nonzero module is clearly a cyclic module. Despite their simplicity, cyclic modules are extremely important and so we want to explore some of their basic properties.

Theorem 6.2 Let 9 be a principal ideal domain. 1) If º#» is a cyclic 9 -module with ann ²#³ ~ º » then the map ¢ 9 ¦ º#» defined by ² ³ ~ # is a surjective 9 -homomorphism with kernel º » . Hence 9 º#» š º» In other words, cyclic 9 -modules are isomorphic to quotient modules of the base ring 9º»99°º» . If  is a prime then is a maximal ideal in and so  is a field. 2) Any submodule of a cyclic 9 -module is cyclic. 3) Let º#» be a cyclic submodule of 4 of order  . Then º #» has order °²Á³gcd . Hence, if  and are relatively prime then º#»  also has order . 4) If " ÁÃÁ" are nonzero elements of M with orders   ÁÃÁ that are pairwise relatively prime, then the sum

#~" bÄb"

has order ~Ä . Consequently, if 4 is an 9 -module and

4~( bÄb(

where the submodules ( have orders  that are pairwise relatively prime, then the sum is direct. Proof. We leave proof of part 1) as an exercise. Part 2) follows from part 2) of Theorem 5.9. For part 3), we first consider the two extremes: when  is relatively prime to  and when “ . As to the first, let and be relatively prime. If ² #³~ then  “  . Hence, ²  Á ³~ implies that  “  and so ~°gcd ²Á³  is an order of  # .

Next, if ~  then ²  #³ ~  and so any order ²  #³ of  # divides  . But if ² #³ properly divides  then properly divides  ~ and yet annihilates # , which contradicts the fact that  is an order of #²#³ . Hence and are associates and so ~ ° ~  °gcd ²  Á ³ is an order of  # .

Now we can combine these two extremes to finish the proof. Write ~² °³ where ~gcd ² Á ³ divides  and  ° is relatively prime to  . Using the previous results, we find that ²°³# has order and so #~²°³# has order °. Modules Over a Principal Ideal Domain 123

For part 4), since  annihilates ## , the order of divides . If the order of # is a proper divisor of  then for some index  , there is a prime dividing  for which ° annihilates # . But ° annihilates each " for  £  and so

  ~ #~ " ~67 "  

However,  and °²°³" are relatively prime and so the order of   is equal to the order of " , which contradicts the equation above. Hence, the order of # is  . Finally, to see that the sum above is direct, note that if

#bÄb#~ where #( then each #  must be  , for otherwise the order of the sum on the left would be different from  . Free Modules over a Principal Ideal Domain Example 4.5 showed that a submodule of a free module need not be free. (The submodule {{{d ¸¹ of d is not free.) However, if 9 is a principal ideal domain this cannot happen.

Theorem 6.3 Let 49 be a free module over a principal ideal domain . Then any submodule : of 4 is also free and rk ²:³  rk ²4³ . Proof. We will give the proof only for modules of finite rank, although the theorem is true for all free modules. Thus, since 4š9 where ~rk ²4³ we may in fact assume that 4~9 . Our plan is to proceed by induction on  .

For ~ , we have 4~9 and any submodule : of 9 is just an ideal of 9 . If : ~ ¸¹ then : is free by definition. Otherwise, : ~ º» for some  £  . But since 9 £ £¸¹ is an integral domain, we have for all and so is a basis for : . Thus, : is free and rk ²:³ ~  ~ rk ²4³ .

Now assume that if  then any submodule : of 9 is free and rk ²:³ . Let :9 be a submodule of  . Let

: ~ ¸#  : “ # ~ ²  Á à Á  c Á ³ for some   Á à Á  c  9¹ and

:cc ~ ¸²Á à Á Á  ³ “ ² Á à Á  Á  ³  : for some  Á à Á   9¹

Note that :: and are nonempty.

c Since :9 is isomorphic to a submodule of , the inductive hypothesis implies that : is free. Let 88 be a basis for : with ((   c  . If : ~ ¸¹ then take 8 ~J . 124 Advanced Linear Algebra

Now, :9 is isomorphic to a submodule (ideal) of and is therefore also free of rank at most  . If : ~ ¸¹ then all elements of : have zero final coordinate, which means that :~: , which is free with rank at most  , as desired. So assume that :¸¹: is not trivial and let be a basis for where ~²ÁÃÁÁ ³ for £ 9 . Let : satisfy

~² c ÁÃÁ Á ³ We claim that 88r¸¹ is a basis for : . To see that r¸¹ generates : let #~² ÁÃÁ ³:. Then ²ÁÃÁÁ  ³: and so

²ÁÃÁÁ ³~ ~²ÁÃÁÁ ³ for 9 . Thus  ~ and

 ~ ² cc Á Ã Á Á ³ ~ ² Á Ã Á Á  ³

Hence the difference #c  is in : ~º8 » . We then have #~²#c ³b º88 »bº»~º r¸¹» and so 88r¸¹ generates : . Finally, to see that r¸¹ is linearly independent, note that if 8 ~¸#ÁÃÁ#¹ and if

 # bÄb c # b~ then comparing  ~9 th coefficients gives . Since is an integral domain and £ we deduce that ~ . It follows that  ~ for all  . Thus 8 r¸¹ is a basis for : and the proof is complete.

If = is a vector space of dimension then any set of linearly independent vectors in == is a basis for . This fails for modules. For example, {{ is a - module of rank  but the independent set ¸¹ is not a basis. On the other hand, the fact that a spanning set of size  is a basis does hold for modules over a principal ideal domain, as we now show.

Theorem 6.4 Let 49 be a free -module of rank 9 , where is a principal ideal domain. Let : ~¸  ÁÃÁ ¹ be a spanning set for 4 . Then : is a basis for 4. Proof. Let 8~¸ÁÃÁ ¹ be a basis for 4 and define the map ¢4¦4 by ² ³ ~ and extending to a surjective 9 -homomorphism. Since 4 is free, Theorem 5.6 implies that 4šker ²³^im ²³~  ker ²³ ^ 4 Since ker²³ is a submodule of the free module and since 9 is a principal ideal domain, we know that ker²³ is free of rank at most  . It follows that Modules Over a Principal Ideal Domain 125

rk²4³ ~ rk ²ker ² ³³ b rk ²4³ and so rk²ker ² ³³~ , that is, ker ² ³~¸¹ , which implies that  is an 9 - isomorphism and so : is a basis.

In general, a basis for a submodule of a free module over a principal ideal domain cannot be extended to a basis for the entire module. For example, the set ¸¹ is a basis for the submodule {{ of the -module { , but this set cannot be extended to a basis for { itself. We state without proof the following result along these lines.

Theorem 6.5 Let 49 be a free -module of rank 9 , where is a principal ideal domain. Let 54 be a submodule of that is free of rank . Then there is a basis 8 for 4:~¸#ÁÃÁ#¹ that contains a subset  for which ¸  # ÁÃÁ  # ¹ is a basis for 5 , for some nonzero elements  ÁÃÁ  of 9 . Torsion-Free and Free Modules Let us explore the relationship between the concepts of torsion-free and free. It is not hard to see that any free module over an integral domain is torsion-free. The converse does not hold, unless we strengthen the hypotheses by requiring that the module be finitely generated.

Theorem 6.6 Let 4 be a torsion-free finitely generated module over a principal ideal domain 94 . Then is free. Thus, a finitely generated module over a principal ideal domain is free if and only if it is torsion free. Proof. Let . ~¸# ÁÃÁ# ¹ be a generating set for 4 . Consider first the case ~, whence .~¸#¹ . Then . is a basis for 4 since singleton sets are linearly independent in a torson-free module. Hence, 4 is free.

Now suppose that . ~¸"Á#¹ is a generating set with "Á#£ . If . is linearly independent, we are done. If not, then there exist nonzero Á  9 for which "~ #. It follows that 4~ º"Á#»‹º"» and so 4 is a submodule of a free module and is therefore free by Theorem 6.3. But the map ¢4 ¦ 4 defined by ²#³ ~ # is an isomorphism because 4 is torsion-free. Thus 4 is also free.

Now we can do the general case. Write

. ~¸"c ÁÃÁ" Á# ÁÃÁ# ¹ where : ~¸" ÁÃÁ" ¹ is a maximal linearly independent subset of . . (Note that : is nonempty because singleton sets are linearly independent.)

For each #¸"ÁÃÁ"Á#¹ , the set is linearly dependent and so there exist  9 and ÁÃÁ 9 for which 126 Advanced Linear Algebra

# b "   bÄb  "  ~

If ~Äc then

4~º"c ÁÃÁ" Á# ÁÃÁ# »‹º" ÁÃÁ" » and since the latter is a free module, so is 4 , and therefore so is 4 . Prelude to Decomposition: Cyclic Modules The following result shows how cyclic modules can be composed and decomposed.

Theorem 6.7 Let 49 be an -module. 1) If " ÁÃÁ" are nonzero elements of M with orders   ÁÃÁ that are pairwise relatively prime, then

º" b Ä b " » ~ º" » l Ä l º" »

2) If #4 has order  ~ Ä where   ÁÃÁ   are pairwise relatively prime, then # can be written in the form

#~" bÄb"

where " has order  . Moreover,

º#»~º"»lÄlº"» Proof. According to Theorem 6.2, the order of # is  and the sum on the right is direct. It is clear that º" bÄb" »‹º"»lÄlº"» . For the reverse inclusion, since  and ° Á are relatively prime, there exist 9 for which  b ~  Hence  "~67  b "~ "~ ²"bÄb"³º"bÄb"» 

Similarly, " º" bÄb"  » for all  and so we get the reverse inclusion.

For part 2), the scalars ~° are relatively prime and so there exist 9  for which

bÄb~  Hence,

#~² bÄb  ³#~  #bÄb  #

Since the order of # divides  , these orders are pairwise relatively prime. Modules Over a Principal Ideal Domain 127

Hence, the order of the sum on the right is the product of the orders of the terms and so # must have order  . The second statement follows from part 1). The First Decomposition The first step in the decomposition of a finitely generated module 4 over a principal ideal domain 9 is an easy one.

Theorem 6.8 Any finitely generated module 49 over a principal ideal domain is the direct sum of a free 99 -module and a torsion -module

4~4free l4 tor

As to uniqueness, the torsion part 4tor is unique (it must be the set of all torsion elements of 44 ) whereas the free part free is not unique. However, all possible free summands are isomorphic and thus have the same rank. Proof. As to existence, the set 4tor of all torsion elements is easily seen to be a submodule of 44 . Since is finitely generated, so is the torsion-free quotient module 4°4tor . Hence, according to Theorem 6.6, 4°4 tor is free. Consider now the canonical projection ¢ 4 ¦ 4°4tor onto 4 tor. Since 4°4tor is free, Theorem 5.6 implies that

4~4tor l- where -š4°4tor is free.

As to uniqueness, suppose that 4~;l. where ; is torsion and . is free. Then ; ‹4tor . But if #4 tor and #~!b where !; and . then # ~  and ! ~  for some nonzero Á   9 and so ²³ ~  , which implies that ~ , that is, #; . Thus, ; ~4tor .

For the free part, since 4~4l-~4l.tor tor , the submodules - and . are both complements of 4tor and hence are isomorphic. Hence, all free summands are isomorphic and therefore have the same rank.

Note that if ¸$ ÁÃÁ$ ¹ is a basis for 4free we can write

4 ~ º$ » l Ä l º$ » l 4tor where each cyclic submodule º$ » has zero annihilator. This is a partial decomposition of 4 into a direct sum of cyclic submodules. A Look Ahead So now we turn our attention to the decomposition of finitely generated torsion modules 4 over a principal ideal domain. We will develop two decompositions. One decomposition has the form 128 Advanced Linear Algebra

4 ~ º# » l Ä l º# » where the annihilators of the cyclic submodules form an ascending chain

ann²²º# »³ ‹ Ä ‹ ann º# »³ This decomposition is called an invariant factor decomposition of 4 .

Although we will not approach it in quite this manner, the second decomposition can be obtained by further decomposing each cyclic submodule in the invariant factor decomposition into cyclic submodules whose annihilators have the form º » where  is a prime. Submodules with annihilators of this form are called primary submodules and so the second decomposition is referred to as a primary cyclic decomposition .

Our plan will be to derive the primary cyclic decomposition first and then obtain the invariant factor decomposition from the primary cyclic decomposition by a piecing-together process, as described in Theorem 6.7.

As we will see, while neither of these decompositions is unique, the sequences of annihilators are unique, that is, these sequences are completely determined by the module 4 . The Primary Decomposition The first step in the primary cyclic decomposition is to decompose the torsion module into a direct sum of primary submodules.

Definition Let 9 be a prime in . A -primary (or just primary ) module is a module whose order is a power of  .

Note that a 4 -primary module with order  must have an element of order .

Theorem 6.9 ( The primary decomposition theorem ) Let 4 be a nonzero torsion module over a principal ideal domain 9 , with order

   ~ Ä where the 9 's are distinct nonassociate primes in . 1) Then 4 is the direct sum

4~4 lÄl4 where

 4 ~ ¸#  4 “  # ~ ¹ Modules Over a Principal Ideal Domain 129

 is a primary submodule with order  and annihilator

 ann²4 ³ ~ º » 2) This decomposition of 4 into primary submodules is unique up to order of the summands. That is, if

4~5 lÄl5

 where 5ÁÃÁ is primary of order  and are distinct nonassociate primes then ~ and after a suitable reindexing of the summands we

have 5~4 . Hence,   and   are associates and ~  (and so   ~ Ä is also a prime factorization of ).  Proof. For part 1), let us write  ~° . We claim that

4~  4~¸#“#4¹ 

 Since #4‹4 is annihilated by  , we have  . On the other hand, since   and Á9 are relatively prime, there exist for which

 b~  and so if %4 then

 %~%~² b ³%~ % 4

Hence 4‹  4.

Now, since gcd²ÁÃÁ³~ , there exist scalars   for which

bÄb~  and so for any %4  %~%~² bÄb  ³%  4 ~

 Moreover, since the order of 4 divides  and the  's are pairwise relatively prime, it follows that the sum of the submodules 4 is direct, that is,

4~ 4lÄl 4~4 lÄl4

 As to the annihilators, it is clear that º » ‹ann ² 4³ . For the reverse  inclusion, if ann ² 4³ then  ann ²4³ and so    “ , that is,  “  and so º» . Thus ann ² 4³~º» .

  As to uniqueness, we claim that ~ Ä is an order of 4 . This follows  from the fact that 5" contains an element of order  and so the sum #~" bÄb" has order  . Hence,  divides  . But divides  and so  and  are associates. 130 Advanced Linear Algebra

Unique factorization in 9~ now implies that and, after a suitable reindexing, that ~ and   and   are associates. Hence, 5  is primary of  order 55 . For convenience, we can write  as . Hence,

 5 ‹ ¸#  4 “  # ~ ¹ ~ 4 But if

5 lÄl5 ~4 lÄl4

and 5‹4 for all  , we must have 5~4   for all  . The Cyclic Decomposition of a Primary Module The next step in the decomposition process is to show that a primary module can be decomposed into a direct sum of cyclic submodules. While this decomposition is not unique (see the exercises), the set of annihilator ideals is unique, as we will see. To establish this uniqueness, we use the following result.

Lemma 6.10 Let 49 be a module over a principal ideal domain and let 9 be a prime. 1) If 4 ~ ¸¹ then 4 is a vector space over the field 9°º» with scalar multiplication defined by ² b º»³# ~ # for all #4 . 2) For any submodule :4 of the set :²³ ~ ¸#  : “ # ~ ¹ is also a submodule of 44~:l; and if then 4~:l;²³ ²³ ²³ Proof. For part 1), since  is prime, the ideal º» is maximal and so 9°º» is a field. We leave the proof that 49°º» is a vector space over to the reader. For part 2), it is straightforward to show that :4²³ is a submodule of . Since :²³ ‹ : and ; ²³ ‹ ; we see that : ²³ q ; ²³ ~ ¸¹ . Also, if #  4 ²³ then #~. But #~ b! for some : and !; and so ~#~ b! . Since   : and !  ; we deduce that  ~ ! ~  , whence #  :²³ l ; ²³ . Thus, 4‹:l;²³ ²³ ²³ and since the reverse inequality is manifest, the result is proved.

Theorem 6.11 () The cyclic decomposition theorem of a primary module Let 4 be a nonzero primary finitely generated torsion module over a principal ideal domain 9 , with order  . Modules Over a Principal Ideal Domain 131

1) Then 4 is the direct sum

4 ~ º# » l Ä l º# » (6.1)

 of cyclic submodules with annihilators ann²º# »³ ~ º » that can be arranged in ascending order

ann²²º# »³ ‹ Ä ‹ ann º# »³ or equivalently

~‚‚Ä‚  2) As to uniqueness, suppose that 4 is also the direct sum

4 ~ º" » l Ä l º" »

 of cyclic submodules with annihilators ann²º" »³ ~ º » arranged in ascending order

ann²²º" »³ ‹ Ä ‹ ann º" »³ or equivalently

‚‚Ä‚  Then the two chains of annihilators are identical, that is

ann²²º" »³ ~ ann º# »³

for all ~ . Thus, , and are associates and ~ for all .  Proof. Note first that if (6.1) holds then # ~ . Hence, the order of # divides  #4 and so must have the form for  . To prove (6.1), let be an element with order equal to the order of 4 , that is ann²#³~ ann ²4³~º» (We remarked earlier that such an element must exist.)

If we show that º#» is complemented, that is, 4 ~ º#» l : for some submodule : then since :9 is also a finitely generated primary torsion module over , we can repeat the process to get

4 ~º#»lº#»lÄlº#»l:

 where ann²# ³ ~ º » . We can continue this decomposition as long as : £ ¸¹. But the ascending sequence of submodules

º#» ‹ º#» l º# » ‹ Ä must terminate since 4 is noetherian and so eventually we must have : ~ ¸¹, giving (6.1). 132 Advanced Linear Algebra

Now, the direct sum 4 ~ º#» l ¸¹ clearly exists. Suppose that the direct sum

4 ~ º#» l : exists. We claim that if 4£4b then it is possible to find a submodule : that properly contains :4~º#»l:bb for which the direct sum exists.

Once this claim is established, then since 4 is finitely generated, this process must stop after a finite number of steps, leading to 4 ~ º#» l : for some submodule : , as desired.

If 4 £4 then there is a "4±4 . We claim that for some scalar  9 , we can take :~º:Á"c#»b  , which is a proper superset of :  since " ¤ º#» l :.

Thus, we must show that there is an  9 for which

%º#»qº:Á"c  #»¬%~ Now, there exist scalars  and for which % ~ # ~ b ²" c #³ What can we say about the scalars  and ?

First, although "¤4 , we do have

" ~ ² b ³# c  º#» l : So let us consider the ideal of all such scalars

? ~¸ 9“ "º#»l: ¹ Since  ?? and is principal, we have

 ? ~º»~¸ 9“ "º#»l: ¹ for some  . Moreover,  is not  since that would imply that ? ~9 and so "~"º#»l:, contrary to assumption.

 Since ? , we have "~#b! for some !: . Then ~"~²"³~#b! c  c c and since º#» q : ~ ¸¹ , it follows that c # ~  . Hence  ~   for some  9 and "~#b!~ #b! Since ? , we have ~  and so Modules Over a Principal Ideal Domain 133

" ~  " ~  # b ! Thus, %~#~ b²"c #³~ b  #b !c  # Now it appears that ~ would be a good choice, since then

%~#~ b !: and since º#» q : ~ ¸¹ we get % ~  . This completes the proof of (6.1).

For uniqueness, note first that 4 has orders  and and so and are associates and ~ . Next we show that ~ . According to part 2) of Lemma 6.10,

²³ ²³ ²³ 4 ~ º# » l Ä l º# » and

²³ ²³ ²³ 4 ~ º" » l Ä l º" » where all summands are nonzero. Since 4²³ ~ ¸¹ , it follows from part 1) of Lemma 6.10 that 49°º»²³ is a vector space over and so each of the preceding decompositions expresses 4 ²³ as a direct sum of one-dimensional vector subspaces. Hence, ~dim ²4²³ ³~ .

Finally, we show that the exponents  and are equal using induction on   . If  ~ then  ~ for all  and since   ~ , we also have  ~ for all  . Suppose the result is true whenever c and let ~ . Suppose that

² ÁÃÁ ³~² ÁÃÁ ÁÁÃÁ³Á € and

²! ÁÃÁ ³~² ÁÃÁ ÁÁÃÁ³Á ! € Then

4 ~ º# » l Ä l º# » and

4 ~ º"! » l Ä l º" »

c But º# » ~ º# » is a cyclic submodule of 4 with annihilator º » and so by the induction hypothesis

~! and  ~ ÁÃÁ ~ which concludes the proof of uniqueness. 134 Advanced Linear Algebra

The Primary Cyclic Decomposition Theorem Now we can combine the various decompositions.

Theorem 6.12 () The primary cyclic decomposition theorem Let 4 be a nonzero finitely generated module over a principal ideal domain 9 . 1) Then

4~4free l4 tor

where 44free is free and tor is a torsion module. If 4 tor has order

   ~ Ä

where the 9 's are distinct nonassociate primes in then 4tor can be uniquely decomposed (up to the order of the summands) into the direct sum

4~4 lÄl4 where

 4 ~ ¸#  4tor “  # ~ ¹

 is a primary submodule with annihilator º » . Finally, each primary

submodule 4 can be written as a direct sum of cyclic submodules, so that

4 ~4free l´º#’••••••••“••••••••”ÁÁÁÁ »lÄlº# »µlÄl´º# ’••••••••“••••••••” »lÄlº# »µ

44 

Á where ann²º#Á »³ ~ º » and the terms in each cyclic decomposition can be arranged so that, for each  ,

ann²²º#Á »³ ‹ Ä ‹ ann º# Á »³ or, equivalently,

~‚‚Ä‚ Á Á Á 2) As for uniqueness, suppose that

4 ~ 5free l º%M » l Ä l º% »

is a decomposition of 45 into the direct sum of a free module free and primary cyclic submodules º% » . Then a) rk²5free ³ ~ rk ²4 free ³ b) The number of summands is the same in both decompositions, that is M~ bÄb c) The summands in this decomposition can be reordered to get

4 ~5free l´º"ÁÁÁÁ »lÄlº" »µlÄl´º" »lÄlº"n »µ where the primary submodules are the same Modules Over a Principal Ideal Domain 135

º"Á » l Ä l º" Á » ~ º# Á » l Ä l º# Á » for ~ÁÃÁ and the annihilator chains are the same, that is,

ann²º"Á »³ ~ ann ²º# Á »³ for all Á  . In summary, the free rank, primary submodules and annihilator chain are uniquely determined by the module 4 . Proof. We need only prove the uniqueness. We have seen that 54free and free are isomorphic and thus have the same rank. Let us look at the torsion part.

Since the order of each primary cyclic submodule º% » must divide the order of 4, this order is a power of one of the primes  ÁÃÁ . Let us group the summands by like primes  to get

4 ~5free l´º"’••••••••“••••••••”ÁÁÁÁ »lÄlº" »µlÄl´º" ’••••••••“••••••••” »lÄlº" »µ  5²primary of order ³ 5² primary of order ³ 

 Then each group 54 is a primary submodule of with order  . The uniqueness of primary decompositions of 45~4tor implies that  . Then the uniqueness of cyclic decompositions implies that the annihilator chains for the decompositions of 54 and are the same.

We have seen that in a primary cyclic decomposition

4 ~4free l´º#ÁÁÁÁ »lÄlº# »µlÄl´º# »lÄlº# »µ the chain of annihilators

Á ann²º#Á »³ ~ º »

Á is unique except for order. The sequence  of generators is uniquely determined up to order and multiplication by units. This sequence is called the sequence of elementary divisors of 4 . Note that the elementary divisors are not quite as unique as the annihilators: the multiset of annihilators is unique but Á Á the multiset of generators is not since if " is a generator then so is for any unit "9 in . The Invariant Factor Decomposition According to Theorem 6.7, if :; and are cyclic submodules with relatively prime orders, then :l; is a cyclic submodule whose order is the product of the orders of :; and . Accordingly, in the primary cyclic decomposition of 4

4 ~4free l´º#’••••••••“••••••••”ÁÁÁÁ »lÄlº# »µlÄl´º# ’••••••••“••••••••” »lÄlº# »µ

44  136 Advanced Linear Algebra

Á with elementary divisors  satisfying

~‚‚Ä‚ Á Á Á (6.4) we can feel free to regroup and combine cyclic summands with relatively prime orders. One judicious way to do this is to take the leftmost (highest order) cyclic submodules from each group to get

+Á ~ º# » l Ä l º# Á » and repeat the process

+Á ~ º# » l Ä l º# Á » +Á ~ º# » l Ä l º# Á » Å

Of course, some summands may be missing here since the primary modules 4 do not necessarily have the same number of summands. In any case, the result of this regrouping and combining is a decomposition of the form

4~4free l+lÄl+ which is called an invariant factor decomposition of 4 .

For example, suppose that

4~4l´º#»lº#»µl´º#»µl´º#»lº#»lº#»µfree Á Á Á Á Á Á Then the resulting regrouping and combining gives

4~4l´º#»lº#»lº#»µl´º#»lº#»µl´º#»µfree ’•••••••••“•••••••••”Á Á Á ’••••“••••” Á Á  Á

+++

As to the orders of the summands, referring to (6.4), if + has order then since the highest powers of each prime  are taken for , the second–highest for  and so on, we conclude that

“c “Ä““  (6.5) or equivalently,

ann²+ ³ ‹ ann ²+ ³ ‹ Ä

The numbers  are called invariant factors of the decomposition.

For instance, in the example above suppose that the elementary divisors are

  ÁÁÁÁÁ  Then the invariant factors are Modules Over a Principal Ideal Domain 137

 ~  ~  ~

Á The process described above that passes from a sequence  of elementary divisors in order (6.4) to a sequence of invariant factors in order (6.5) is reversible. The inverse process takes a sequence ÁÃÁ satisfying (6.5), factors each  into a product of distinct nonassociate prime powers with the primes in the same order and then “peels off” like prime powers from the left. (The reader may wish to try it on the example above.)

This fact, together with Theorem 6.7, implies that primary cyclic decompositions and invariant factor decompositions are essentially equivalent. In particular, given a primary cyclic decomposition of $M$ we can produce an invariant factor decomposition of $M$ whose invariant factors are products of the elementary divisors and in which each elementary divisor appears in exactly one invariant factor. Conversely, given an invariant factor decomposition of $M$ we can obtain a primary cyclic decomposition of $M$ whose elementary divisors are precisely the multiset of prime power factors of the invariant factors.

It follows that since the elementary divisors of $M$ are unique up to multiplication by units, the invariant factors of $M$ are also unique up to multiplication by units.

Theorem 6.13 (The invariant factor decomposition theorem) Let $M$ be a finitely generated module over a principal ideal domain $R$. Then

$$M = M_{\rm free} \oplus D_1 \oplus \cdots \oplus D_m$$

where $M_{\rm free}$ is a free submodule and $D_i$ is a cyclic submodule of $M$ with order $q_i$, where

$$q_m \mid q_{m-1} \mid \cdots \mid q_1$$

This decomposition is called an invariant factor decomposition of $M$ and the scalars $q_i$ are called the invariant factors of $M$. The invariant factors are uniquely determined, up to multiplication by a unit, by the module $M$. Also, the rank of $M_{\rm free}$ is uniquely determined by $M$.

The annihilators of an invariant factor decomposition are called the invariant ideals of $M$. The chain of invariant ideals is unique, as is the chain of annihilators in the primary cyclic decomposition. Note that $q_1$ is an order of $M_{\rm tor}$, that is,

$$\operatorname{ann}(M_{\rm tor}) = \langle q_1\rangle$$

Note also that the product

$$\gamma = q_1\cdots q_m$$

of the invariant factors of $M$ has some nice properties. For example, $\gamma$ is the product of all the elementary divisors of $M$. We will see in a later chapter that, in the context of a linear operator $\tau$ on a vector space, $\gamma$ is the characteristic polynomial of $\tau$.

Exercises
1. Show that any free module over an integral domain is torsion-free.
2. Let $R$ be a principal ideal domain and let $F$ be its field of quotients. Then $F$ is an $R$-module. Prove that any nonzero finitely generated submodule of $F$ is a free module of rank $1$.
3. Let $R$ be a principal ideal domain and let $M$ be a finitely generated torsion-free $R$-module. Suppose that $N$ is a submodule of $M$ that is a free $R$-module of rank $k$ and for which $M/N$ is a torsion module. Prove that $M$ is a free $R$-module of rank $k$. Hint: Use the results of the previous exercise.
4. Show that the primary cyclic decomposition of a torsion module over a principal ideal domain is not unique (even though the elementary divisors are).
5. Show that if $M$ is a finitely generated $R$-module, where $R$ is a principal ideal domain, then the free summand in the decomposition $M = M_{\rm free} \oplus M_{\rm tor}$ need not be unique.
6. If $\langle v\rangle$ is a cyclic $R$-module of order $\mu$, show that the map $\sigma\colon R \to \langle v\rangle$ defined by $\sigma(r) = rv$ is a surjective $R$-homomorphism with kernel $\langle\mu\rangle$, and so $\langle v\rangle \cong R/\langle\mu\rangle$.
7. If $R$ is a ring with the property that all submodules of cyclic $R$-modules are cyclic, show that $R$ is a principal ideal domain.
8. Suppose that $F$ is a finite field and let $F^*$ be the set of all nonzero elements of $F$.
a) Show that $F^*$ is an abelian group under multiplication.
b) Show that if $p(x) \in F[x]$ is a nonconstant polynomial over $F$ and $\alpha \in F$ is a root of $p(x)$, then $x - \alpha$ is a factor of $p(x)$.
c) Prove that a nonconstant polynomial $p(x) \in F[x]$ of degree $n$ can have at most $n$ distinct roots in $F$.
d) Use the invariant factor or primary cyclic decomposition of a finite $\mathbb{Z}$-module to prove that $F^*$ is cyclic.
9. Let $R$ be a principal ideal domain and let $M = \langle v\rangle$ be a cyclic $R$-module with order $\mu$. We have seen that any submodule of $M$ is cyclic. Prove that for each $r \in R$ such that $r \mid \mu$ there is a unique submodule of $M$ of order $r$.
10. Suppose that $M$ is a free module of finite rank over a principal ideal domain $R$ and let $N$ be a submodule of $M$. If $M/N$ is torsion, prove that $\operatorname{rk}(N) = \operatorname{rk}(M)$.

11. Let $F[x]$ be the ring of polynomials over a field $F$ and let $F'[x]$ be the ring of all polynomials in $F[x]$ whose coefficient of $x$ is $0$. Then $F[x]$ is an $F'[x]$-module. Show that $F[x]$ is finitely generated and torsion-free but not free as an $F'[x]$-module. Is $F'[x]$ a principal ideal domain?
12. Show that the rational numbers $\mathbb{Q}$ form a torsion-free $\mathbb{Z}$-module that is not free.

More on Complemented Submodules
13. Let $R$ be a principal ideal domain and let $M$ be a free $R$-module.
a) Prove that a submodule $N$ of $M$ is complemented if and only if $M/N$ is free.
b) If $M$ is also finitely generated, prove that $N$ is complemented if and only if $M/N$ is torsion-free.
14. Let $M$ be a free module of finite rank over a principal ideal domain $R$.
a) Prove that if $N$ is a complemented submodule of $M$ then $\operatorname{rk}(N) = \operatorname{rk}(M)$ if and only if $N = M$.
b) Show that this need not hold if $N$ is not complemented.
c) Prove that $N$ is complemented if and only if any basis for $N$ can be extended to a basis for $M$.
15. Let $M$ and $N$ be free modules of finite rank over a principal ideal domain $R$ and let $\tau\colon M \to N$ be an $R$-homomorphism.
a) Prove that $\ker(\tau)$ is complemented.
b) What about $\operatorname{im}(\tau)$?
c) Prove that
$$\operatorname{rk}(M) = \operatorname{rk}(\ker(\tau)) + \operatorname{rk}(\operatorname{im}(\tau)) = \operatorname{rk}(\ker(\tau)) + \operatorname{rk}\!\left(\frac{M}{\ker(\tau)}\right)$$
d) If $\tau$ is surjective then $\tau$ is an isomorphism if and only if $\operatorname{rk}(M) = \operatorname{rk}(N)$.
e) If $M/L$ is free then
$$\operatorname{rk}\!\left(\frac{M}{L}\right) = \operatorname{rk}(M) - \operatorname{rk}(L)$$

16. A submodule $N$ of a module $M$ is said to be pure in $M$ if whenever $v \in M \setminus N$ then $rv \notin N$ for all nonzero $r \in R$.
a) Show that $N$ is pure in $M$ if and only if $rv \in N$ for $v \in M$ and nonzero $r \in R$ implies $v \in N$.
b) Show that $N$ is pure if and only if $M/N$ is torsion-free.
c) If $R$ is a principal ideal domain and $M$ is finitely generated, prove that $N$ is pure if and only if $M/N$ is free.
d) If $L$ and $N$ are pure submodules of $M$, then so is $L \cap N$. What about $L + N$?
e) If $N$ is pure in $M$, show that $L \cap N$ is pure in $L$ for any submodule $L$ of $M$.

17. Let $M$ be a free module of finite rank over a principal ideal domain $R$ and let $L$ and $N$ be submodules of $M$, with $L$ complemented in $M$. Prove that
$$\operatorname{rk}(L + N) + \operatorname{rk}(L \cap N) = \operatorname{rk}(L) + \operatorname{rk}(N)$$

Chapter 7
The Structure of a Linear Operator

In this chapter, we study the structure of a linear operator on a finite-dimensional vector space, using the powerful module decomposition theorems of the previous chapter. Unless otherwise noted, all vector spaces will be assumed to be finite-dimensional.

A Brief Review
We have seen that any linear operator on a finite-dimensional vector space can be represented by matrix multiplication. Let us restate Theorem 2.14 for linear operators.

Theorem 7.1 Let $\tau \in \mathcal{L}(V)$ and let $\mathcal{B} = (b_1,\ldots,b_n)$ be an ordered basis for $V$. Then $\tau$ can be represented by matrix multiplication

$$[\tau(v)]_{\mathcal{B}} = [\tau]_{\mathcal{B}}[v]_{\mathcal{B}}$$

where

$$[\tau]_{\mathcal{B}} = ([\tau(b_1)]_{\mathcal{B}} \mid \cdots \mid [\tau(b_n)]_{\mathcal{B}})$$

Since the matrix $[\tau]_{\mathcal{B}}$ depends on the ordered basis $\mathcal{B}$, it is natural to wonder how to choose this basis in order to make the matrix $[\tau]_{\mathcal{B}}$ as simple as possible. That is the subject of this chapter.

Let us also restate the relationship between the matrices of $\tau$ with respect to different ordered bases.

Theorem 7.2 Let $\tau \in \mathcal{L}(V)$ and let $\mathcal{B}$ and $\mathcal{B}'$ be ordered bases for $V$. Then the matrix of $\tau$ with respect to $\mathcal{B}'$ can be expressed in terms of the matrix of $\tau$ with respect to $\mathcal{B}$ as follows:

$$[\tau]_{\mathcal{B}'} = M_{\mathcal{B},\mathcal{B}'}\,[\tau]_{\mathcal{B}}\,(M_{\mathcal{B},\mathcal{B}'})^{-1}$$

where

$$M_{\mathcal{B},\mathcal{B}'} = ([b_1]_{\mathcal{B}'} \mid \cdots \mid [b_n]_{\mathcal{B}'})$$

is the change of basis matrix. Finally, we recall the definition of similarity and its relevance to the current discussion.

Definition Two matrices $A$ and $B$ are similar if there exists an invertible matrix $P$ for which

$$B = PAP^{-1}$$

The equivalence classes associated with similarity are called similarity classes.

Theorem 7.3 Let $V$ be a vector space of dimension $n$. Then two $n \times n$ matrices $A$ and $B$ are similar if and only if they represent the same linear operator $\tau \in \mathcal{L}(V)$, but possibly with respect to different ordered bases. In this case, $A$ and $B$ represent exactly the same set of linear operators in $\mathcal{L}(V)$.

According to Theorem 7.3, the matrices that represent a given linear operator $\tau \in \mathcal{L}(V)$ are precisely the matrices that lie in one particular similarity class. Hence, in order to uniquely represent all linear operators on $V$, we would like to find a simple representative of each similarity class, that is, a set of simple canonical forms for similarity.

The simplest type of useful matrix is the diagonal matrix. However, not all linear operators can be represented by diagonal matrices; that is, the set of diagonal matrices does not form a set of canonical forms for similarity.

This gives rise to two different directions for further study. First, we can search for a characterization of those linear operators that can be represented by diagonal matrices. Such operators are called diagonalizable. Second, we can search for a different type of "simple" matrix that does provide a set of canonical forms for similarity. We will pursue both of these directions at the same time.

The Module Associated with a Linear Operator
Throughout this chapter, we fix a nonzero linear operator $\tau \in \mathcal{L}(V)$ and think of $V$ not only as a vector space over a field $F$ but also as a module over $F[x]$, with scalar multiplication defined by

$$p(x)v = p(\tau)(v)$$

We call $V$ the $F[x]$-module defined by $\tau$ and write $V_\tau$ to indicate the dependence on $\tau$ (when necessary). Thus, $V_\tau$ and $V_\sigma$ are modules over the same ring $F[x]$, although the scalar multiplication is different if $\tau \ne \sigma$.

Our plan is to interpret the concepts of the previous chapter for the module/vector space $V$. First, since $V$ is a finite-dimensional vector space, so is $\mathcal{L}(V)$. It follows that $V_\tau$ is a torsion module. To see this, note that since $\dim(\mathcal{L}(V)) = n^2$ is finite, the $n^2 + 1$ operators

$$\iota, \tau, \tau^2, \ldots, \tau^{n^2}$$

are linearly dependent in $\mathcal{L}(V)$, which implies that $p(\tau) = 0$ for some nonzero polynomial $p(x) \in F[x]$. Hence $p(x)V = \{0\}$, which shows that $\operatorname{ann}(V_\tau) \ne \{0\}$.

Also, since $V$ is finitely generated as a vector space, it is, a fortiori, finitely generated as an $F[x]$-module. Thus, $V_\tau$ is a finitely generated torsion module over the principal ideal domain $F[x]$ and so we may apply the decomposition theorems of the previous chapter.

Next we take a look at the connection between module isomorphisms and vector space isomorphisms. This also describes the connection with similarity.

Theorem 7.4 Let $\tau$ and $\sigma$ be linear operators on $V$. Then $V_\tau$ and $V_\sigma$ are isomorphic as $F[x]$-modules if and only if $\tau$ and $\sigma$ are similar as linear operators. In particular, a function $\varphi\colon V_\tau \to V_\sigma$ is a module isomorphism if and only if it is a vector space automorphism of $V$ satisfying

$$\sigma = \varphi\tau\varphi^{-1}$$

Proof. Suppose that $\varphi\colon V_\tau \to V_\sigma$ is a module isomorphism. Then for $v \in V$,

$$\varphi(xv) = x\varphi(v)$$

which is equivalent to $\varphi(\tau(v)) = \sigma(\varphi(v))$, and since $\varphi$ is bijective, this is equivalent to

$$(\varphi\tau\varphi^{-1})(v) = \sigma(v)$$

that is, $\sigma = \varphi\tau\varphi^{-1}$. Since a module isomorphism from $V_\tau$ to $V_\sigma$ is a vector space isomorphism as well, the result follows.

For the converse, suppose that $\sigma = \varphi\tau\varphi^{-1}$ for a vector space automorphism $\varphi$ of $V$. This condition is equivalent to $\varphi\tau = \sigma\varphi$, and so

$$\varphi(xv) = \varphi(\tau(v)) = \sigma(\varphi(v)) = x\varphi(v)$$

and by the $F$-linearity of $\varphi$, for any polynomial $p(x)$ we have

$$\varphi(p(\tau)v) = p(\sigma)\varphi(v)$$

which shows that $\varphi$ is a module isomorphism from $V_\tau$ to $V_\sigma$.

Submodules and Invariant Subspaces

There is a simple connection between the submodules of the $F[x]$-module $V_\tau$ and the subspaces of the vector space $V$. Recall that a subspace $S$ of $V$ is $\tau$-invariant if $\tau(S) \subseteq S$.

Theorem 7.5 A subset $S$ of $V$ is a submodule of the $F[x]$-module $V_\tau$ if and only if it is a $\tau$-invariant subspace of the vector space $V$.

Orders and the Minimal Polynomial
We have seen that since $V$ is finite-dimensional, the annihilator

$$\operatorname{ann}(V) = \{p(x) \in F[x] \mid p(x)V = \{0\}\}$$

of $V$ is a nonzero ideal of $F[x]$, and since $F[x]$ is a principal ideal domain, this ideal is principal, say

$$\operatorname{ann}(V) = \langle m(x)\rangle$$

Since all orders of $V$ are associates and since the units of $F[x]$ are precisely the nonzero elements of $F$, there is a unique monic order of $V$.

Definition Let $V$ be an $F[x]$-module defined by $\tau$. The unique monic order of $V$, that is, the unique monic polynomial that generates $\operatorname{ann}(V)$, is called the minimal polynomial for $\tau$ and is denoted by $m_\tau(x)$ or $\min(\tau)$. Thus,

$$\operatorname{ann}(V) = \langle m_\tau(x)\rangle$$

and

$$p(x)V = \{0\} \iff p(\tau) = 0 \iff m_\tau(x) \mid p(x)$$

In treatments of linear algebra that do not emphasize the role of the module $V_\tau$, the minimal polynomial of a linear operator $\tau$ is simply defined as the unique monic polynomial $m_\tau(x)$ of smallest degree for which $m_\tau(\tau) = 0$. It is not hard to see that this definition is equivalent to the previous one.

The concept of minimal polynomial is also defined for matrices. If $A$ is a square matrix over $F$, the minimal polynomial $m_A(x)$ of $A$ is defined as the unique monic polynomial $p(x) \in F[x]$ of smallest degree for which $p(A) = 0$. We leave it to the reader to verify that this concept is well-defined and that the following holds.

Theorem 7.6
1) If $A$ and $B$ are similar matrices then $m_A(x) = m_B(x)$. Thus, the minimal polynomial is an invariant under similarity.
2) The minimal polynomial of $\tau \in \mathcal{L}(V)$ is the same as the minimal polynomial of any matrix that represents $\tau$.

Cyclic Submodules and Cyclic Subspaces

For an $F[x]$-module $V_\tau$, consider the cyclic submodule

$$\langle v\rangle = \{p(x)v \mid p(x) \in F[x]\} = \{p(\tau)(v) \mid p(x) \in F[x]\}$$

We would like to characterize these simple but important submodules in terms of vector space notions.

As we have seen, $\langle v\rangle$ is a $\tau$-invariant subspace of $V$, but more can be said. Let $m(x)$ be the minimal polynomial of $\tau|_{\langle v\rangle}$ and suppose that $\deg(m(x)) = d$. Any element of $\langle v\rangle$ has the form $p(x)v$. Dividing $p(x)$ by $m(x)$ gives

$$p(x) = q(x)m(x) + r(x)$$

where $\deg(r(x)) < \deg(m(x))$. Since $m(x)v = 0$, we have

$$p(x)v = q(x)m(x)v + r(x)v = r(x)v$$

Thus,

$$\langle v\rangle = \{r(x)v \mid \deg(r(x)) < d\}$$

Put another way, the ordered set

$$\mathcal{B} = (v, xv,\ldots,x^{d-1}v) = (v, \tau(v),\ldots,\tau^{d-1}(v))$$

spans $\langle v\rangle$ as a vector space over $F$. But it is also the case that $\mathcal{B}$ is linearly independent over $F$, for if

$$r_0v + r_1xv + \cdots + r_{d-1}x^{d-1}v = 0$$

then $r(x)v = 0$ where

$$r(x) = r_0 + r_1x + \cdots + r_{d-1}x^{d-1}$$

has degree less than $d$. Hence $r(x) = 0$, that is, $r_i = 0$ for all $i = 0,\ldots,d-1$. Thus, $\mathcal{B}$ is an ordered basis for $\langle v\rangle$.

To determine the matrix of $\tau|_{\langle v\rangle}$ with respect to $\mathcal{B}$, write $w_i = \tau^i(v)$. Then

$$\tau(w_i) = \tau(\tau^i(v)) = \tau^{i+1}(v) = w_{i+1}$$

for $i = 0,\ldots,d-2$, and so $\tau$ simply "shifts" each basis vector in $\mathcal{B}$, except the last one, to the next basis vector in $\mathcal{B}$. For the last vector $w_{d-1}$, if

$$m(x) = a_0 + a_1x + \cdots + a_{d-1}x^{d-1} + x^d$$

then since $m(\tau) = 0$ we have

$$0 = m(\tau)(v) = a_0v + a_1\tau(v) + \cdots + a_{d-1}\tau^{d-1}(v) + \tau^d(v)$$

and so

$$\tau(w_{d-1}) = \tau^d(v) = -(a_0 + a_1\tau + \cdots + a_{d-1}\tau^{d-1})(v) = -a_0w_0 - a_1w_1 - \cdots - a_{d-1}w_{d-1}$$

Hence, the matrix of $\tau|_{\langle v\rangle}$ with respect to $\mathcal{B}$ is

$$C[m(x)] = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & & \ddots & 0 & -a_{d-2} \\ 0 & 0 & \cdots & 1 & -a_{d-1} \end{pmatrix}$$

This is known as the companion matrix for the polynomial $m(x)$. Note that companion matrices are defined only for monic polynomials.
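As a quick computational check (our own sketch, not part of the text), the following numpy snippet builds $C[m(x)]$ with this convention and verifies that $m(C) = 0$, as it must be since $m(x)$ is the minimal polynomial of its companion matrix.

```python
# Sketch: companion matrix of a monic polynomial, subdiagonal-1's convention.
import numpy as np

def companion(a):
    """a = [a_0, ..., a_(d-1)]: lower coefficients of a monic degree-d poly."""
    d = len(a)
    C = np.zeros((d, d))
    C[1:, :-1] = np.eye(d - 1)        # shift: C e_i = e_(i+1) for i = 0..d-2
    C[:, -1] = -np.asarray(a)         # last column: -a_0, ..., -a_(d-1)
    return C

def evaluate_monic(a, A):
    """Evaluate a_0 I + a_1 A + ... + a_(d-1) A^(d-1) + A^d at the matrix A."""
    acc = np.linalg.matrix_power(A, len(a))
    for k, ak in enumerate(a):
        acc = acc + ak * np.linalg.matrix_power(A, k)
    return acc

C = companion([1.0, 0.0, 2.0, 0.0])   # m(x) = 1 + 2x^2 + x^4 = (x^2 + 1)^2
print(np.allclose(evaluate_monic([1.0, 0.0, 2.0, 0.0], C), 0))  # True
```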

Definition Let B²=³ . A subspace : of = is  -cyclic if there exists a vector #: for which the set ¸#Á ²#³Á à Ác ²#³¹ is a basis for : .

Theorem 7.7 Let $V$ be an $F[x]$-module defined by $\tau \in \mathcal{L}(V)$.
1) (Characterization of cyclic submodules) A subset $S \subseteq V$ is a cyclic submodule of $V_\tau$ if and only if it is a $\tau$-cyclic subspace of the vector space $V$.
2) Suppose that $\langle v\rangle$ is a cyclic submodule of $V$. If the monic order of $\langle v\rangle$ is

$$m(x) = a_0 + a_1x + \cdots + a_{d-1}x^{d-1} + x^d$$

then

$$\mathcal{B} = (v, xv,\ldots,x^{d-1}v) = (v, \tau(v),\ldots,\tau^{d-1}(v))$$

is an ordered basis for $\langle v\rangle$ and the matrix $[\tau|_{\langle v\rangle}]_{\mathcal{B}}$ is the companion matrix $C[m(x)]$ of $m(x)$. Hence,

$$\dim(\langle v\rangle) = \deg(m(x))$$

Summary
The following table summarizes the connection between the module concepts and the vector space concepts that we have discussed.

| $F[x]$-module $V$ | $F$-vector space $V$ |
| --- | --- |
| Scalar multiplication: $p(x)v$ | Action of $p(\tau)$: $p(\tau)(v)$ |
| Submodule of $V$ | $\tau$-invariant subspace of $V$ |
| Annihilator: $\operatorname{ann}(V) = \{p(x) \mid p(x)V = \{0\}\}$ | Annihilator: $\operatorname{ann}(V) = \{p(x) \mid p(\tau)(V) = \{0\}\}$ |
| Monic order $m(x)$ of $V$: $\operatorname{ann}(V) = \langle m(x)\rangle$ | Minimal polynomial of $\tau$: $m_\tau(x)$ has smallest degree with $m_\tau(\tau) = 0$ |
| Cyclic submodule of $V$: $\langle v\rangle = \{p(x)v \mid \deg p(x) < \deg m(x)\}$ | $\tau$-cyclic subspace of $V$: $\langle v\rangle = \operatorname{span}\{v, \tau(v),\ldots,\tau^{d-1}(v)\}$ |

The Decomposition of $V_\tau$
We are now ready to translate the cyclic decomposition theorem into the language of $V_\tau$. First, we define the elementary divisors and invariant factors of an operator $\tau$ to be the elementary divisors and invariant factors, respectively, of the module $V_\tau$. Also, the elementary divisors and invariant factors of a matrix $A$ are defined to be the elementary divisors and invariant factors, respectively, of the operator of multiplication by $A$.

We will soon see that the multiset of elementary divisors and the multiset of invariant factors are complete invariants under similarity, and so the multiset of elementary divisors (or invariant factors) of an operator $\tau$ is the same as the multiset of elementary divisors (or invariant factors) of any matrix that represents $\tau$.

Theorem 7.8 (The cyclic decomposition theorem for $V_\tau$) Let $\tau$ be a linear operator on a finite-dimensional vector space $V$ and let

$$m_\tau(x) = p_1^{e_1}(x)\cdots p_n^{e_n}(x)$$

be the minimal polynomial of $\tau$, where the monic polynomials $p_i(x)$ are distinct and irreducible.
1) (Primary decomposition) The $F[x]$-module $V$ is the direct sum

$$V = V_1 \oplus \cdots \oplus V_n$$

where

$$V_i = \{v \in V \mid p_i^{e_i}(\tau)(v) = 0\}$$

is a primary submodule of $V$ of order $p_i^{e_i}(x)$. In vector space terms, $V_i$ is a $\tau$-invariant subspace of $V$ and the minimal polynomial of $\tau|_{V_i}$ is

$$\min(\tau|_{V_i}) = p_i^{e_i}(x)$$

2) (Cyclic decomposition) Each primary summand $V_i$ can be decomposed into a direct sum

$$V_i = \langle v_{i,1}\rangle \oplus \cdots \oplus \langle v_{i,k_i}\rangle$$

of cyclic submodules $\langle v_{i,j}\rangle$ of order $p_i^{e_{i,j}}(x)$, with

$$e_i = e_{i,1} \ge e_{i,2} \ge \cdots \ge e_{i,k_i}$$

In vector space terms, $\langle v_{i,j}\rangle$ is a $\tau$-cyclic subspace of $V_i$ and the minimal polynomial of $\tau|_{\langle v_{i,j}\rangle}$ is

$$\min(\tau|_{\langle v_{i,j}\rangle}) = p_i^{e_{i,j}}(x)$$

3) (The complete decomposition) This yields the decomposition of $V$ into a direct sum of $\tau$-cyclic subspaces

$$V = (\langle v_{1,1}\rangle \oplus \cdots \oplus \langle v_{1,k_1}\rangle) \oplus \cdots \oplus (\langle v_{n,1}\rangle \oplus \cdots \oplus \langle v_{n,k_n}\rangle)$$

4) (Elementary divisors and dimensions) The multiset of elementary divisors $\{p_i^{e_{i,j}}(x)\}$ of $\tau$ is uniquely determined by $\tau$. If $\deg(p_i^{e_{i,j}}(x)) = d_{i,j}$ then the $\tau$-cyclic subspace $\langle v_{i,j}\rangle$ has basis

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau(v_{i,j}),\ldots,\tau^{d_{i,j}-1}(v_{i,j}))$$

and so $\dim(\langle v_{i,j}\rangle) = d_{i,j}$. Hence,

$$\dim(V_i) = \sum_j \deg(p_i^{e_{i,j}}(x)) = \sum_j d_{i,j}$$

The Rational Canonical Form
The cyclic decomposition theorem can be used to determine a set of canonical forms for similarity. Recall that if $V = S \oplus T$ and if both $S$ and $T$ are invariant under $\tau$, the pair $(S, T)$ is said to reduce $\tau$. Put another way, $(S, T)$ reduces $\tau$ if the restrictions $\tau|_S$ and $\tau|_T$ are linear operators on $S$ and $T$, respectively.

Recall also that we write $\tau = \rho \oplus \sigma$ if there exist subspaces $S$ and $T$ of $V$ for which $(S, T)$ reduces $\tau$ and

$$\rho = \tau|_S \quad\text{and}\quad \sigma = \tau|_T$$

If $\tau = \rho \oplus \sigma$ then any matrix representations of $\rho$ and $\sigma$ can be used to construct a matrix representation of $\tau$.

Theorem 7.9 Suppose that $\tau = \rho \oplus \sigma \in \mathcal{L}(V)$ has a reducing pair $(S, T)$. Let

$$\mathcal{B} = (s_1,\ldots,s_k, t_1,\ldots,t_m)$$

be an ordered basis for $V$, where $\mathcal{R} = (s_1,\ldots,s_k)$ is an ordered basis for $S$ and $\mathcal{S} = (t_1,\ldots,t_m)$ is an ordered basis for $T$. Then the matrix $[\tau]_{\mathcal{B}}$ has the block diagonal form

$$[\tau]_{\mathcal{B}} = \begin{pmatrix} [\rho]_{\mathcal{R}} & 0 \\ 0 & [\sigma]_{\mathcal{S}} \end{pmatrix}_{\rm block}$$

Of course, this theorem may be extended to apply to multiple direct summands, and this is especially relevant to our situation, since according to Theorem 7.8,

$$\tau = \tau|_{\langle v_{1,1}\rangle} \oplus \cdots \oplus \tau|_{\langle v_{n,k_n}\rangle}$$

In particular, if $\mathcal{B}_{i,j}$ is an ordered basis for the cyclic submodule $\langle v_{i,j}\rangle$ and if

$$\mathcal{B} = (\mathcal{B}_{1,1},\ldots,\mathcal{B}_{n,k_n})$$

denotes the ordered basis for $V$ obtained by concatenating these ordered bases (as we did in Theorem 7.9), then

$$[\tau]_{\mathcal{B}} = \begin{pmatrix} [\tau_{1,1}]_{\mathcal{B}_{1,1}} & & \\ & \ddots & \\ & & [\tau_{n,k_n}]_{\mathcal{B}_{n,k_n}} \end{pmatrix}_{\rm block}$$

where $\tau_{i,j} = \tau|_{\langle v_{i,j}\rangle}$.

According to Theorem 7.8, the cyclic submodule $\langle v_{i,j}\rangle$ has ordered basis

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau(v_{i,j}),\ldots,\tau^{d_{i,j}-1}(v_{i,j}))$$

where $d_{i,j} = \deg(p_i^{e_{i,j}}(x))$. Hence, we arrive at the matrix representation of $\tau$ described in the following theorem.

Theorem 7.10 (The rational canonical form) Let $\dim(V)$ be finite and suppose that $\tau \in \mathcal{L}(V)$ has minimal polynomial

$$m_\tau(x) = p_1^{e_1}(x)\cdots p_n^{e_n}(x)$$

where the monic polynomials $p_i(x)$ are distinct and irreducible. Then $V$ has an ordered basis $\mathcal{B}$ under which

$$[\tau]_{\mathcal{B}} = \operatorname{diag}\big(C[p_1^{e_{1,1}}(x)],\ldots,C[p_1^{e_{1,k_1}}(x)],\ldots,C[p_n^{e_{n,1}}(x)],\ldots,C[p_n^{e_{n,k_n}}(x)]\big)$$

where the polynomials $p_i^{e_{i,j}}(x)$ are the elementary divisors of $\tau$. This block diagonal matrix is said to be in rational canonical form and is called a rational canonical form of $\tau$. Except for the order of the blocks in the matrix, the rational canonical form is a canonical form for similarity; that is, up to order of the blocks, each similarity class contains exactly one matrix in rational canonical form. Put another way, the multiset of elementary divisors is a complete invariant for similarity.
Proof. It remains to prove that if two matrices in rational canonical form are similar, then they must be equal, up to order of the blocks. Let $A$ be the matrix $[\tau]_{\mathcal{B}}$ above. The ordered basis $\mathcal{B}$ clearly gives a decomposition of $V$ into a direct sum of primary cyclic submodules for which the elementary divisors are the polynomials $p_i^{e_{i,j}}(x)$.

Now suppose that $B$ is another matrix in rational canonical form,

$$B = \operatorname{diag}\big(C[q_1^{f_{1,1}}(x)],\ldots,C[q_1^{f_{1,l_1}}(x)],\ldots,C[q_m^{f_{m,1}}(x)],\ldots,C[q_m^{f_{m,l_m}}(x)]\big)$$

If $B$ is similar to $A$ then we get another primary cyclic decomposition of $V_\tau$ for which the elementary divisors are the polynomials $q_i^{f_{i,j}}(x)$. It follows that the two multisets of elementary divisors are the same, and so $A$ and $B$ are the same up to the order of their blocks.

Corollary 7.11
1) Any square matrix $A$ is similar to a unique (except for the order of the blocks on the diagonal) matrix that is in rational canonical form. Any such matrix is called a rational canonical form of $A$.
2) Two square matrices over the same field $F$ are similar if and only if they have the same multiset of elementary divisors.

Here are some examples of rational canonical forms.

Example 7.1 Let $\tau$ be a linear operator on the vector space $\mathbb{R}^7$ and suppose that $\tau$ has minimal polynomial

$$m_\tau(x) = (x-1)(x^2+1)^2$$

Noting that $x-1$ and $(x^2+1)^2$ must appear among the elementary divisors and that the sum of the degrees of all elementary divisors must equal $7$, we have two possibilities:

1) $x-1,\ (x^2+1)^2,\ x^2+1$
2) $x-1,\ x-1,\ x-1,\ (x^2+1)^2$

These correspond to the following rational canonical forms:

1) $$\begin{pmatrix} 1 & & & & & & \\ & 0 & 0 & 0 & -1 & & \\ & 1 & 0 & 0 & 0 & & \\ & 0 & 1 & 0 & -2 & & \\ & 0 & 0 & 1 & 0 & & \\ & & & & & 0 & -1 \\ & & & & & 1 & 0 \end{pmatrix}$$

2) $$\begin{pmatrix} 1 & & & & & & \\ & 1 & & & & & \\ & & 1 & & & & \\ & & & 0 & 0 & 0 & -1 \\ & & & 1 & 0 & 0 & 0 \\ & & & 0 & 1 & 0 & -2 \\ & & & 0 & 0 & 1 & 0 \end{pmatrix}$$
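A small sympy check of this example (our own sketch; companion and block_diag are hypothetical helpers written to match the convention above): both matrices satisfy $m_\tau(x) = (x-1)(x^2+1)^2$, yet their characteristic polynomials, the products of their elementary divisors, differ, so the two forms are not similar.

```python
# Sketch: the two rational canonical forms of Example 7.1 (assumed helpers).
from sympy import symbols, eye, zeros, factor

x = symbols('x')

def companion(poly):
    coeffs = poly.as_poly(x).all_coeffs()[::-1]    # [a_0, ..., a_(d-1), 1]
    d = len(coeffs) - 1
    C = zeros(d, d)
    for i in range(d - 1):
        C[i + 1, i] = 1                            # subdiagonal 1's
    for i in range(d):
        C[i, d - 1] = -coeffs[i]                   # last column: -a_i
    return C

def block_diag(*blocks):
    n = sum(b.rows for b in blocks)
    M, r = zeros(n, n), 0
    for b in blocks:
        M[r:r + b.rows, r:r + b.rows] = b
        r += b.rows
    return M

R1 = block_diag(companion(x - 1), companion((x**2 + 1)**2), companion(x**2 + 1))
R2 = block_diag(companion(x - 1), companion(x - 1), companion(x - 1),
                companion((x**2 + 1)**2))
for R in (R1, R2):
    m_at_R = (R - eye(7)) * (R**2 + eye(7))**2     # evaluate m at the matrix
    print(m_at_R.is_zero_matrix, factor(R.charpoly(x).as_expr()))
# True (x - 1)*(x**2 + 1)**3   and   True (x - 1)**3*(x**2 + 1)**2
```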

Exercises
1. We have seen that any $\tau \in \mathcal{L}(V)$ can be used to make $V$ into an $F[x]$-module. Does every $F[x]$-module $V$ come from some $\tau \in \mathcal{L}(V)$? Explain.
2. Show that if $A$ and $B$ are block diagonal matrices with the same blocks, but in possibly different order, then $A$ and $B$ are similar.
3. Let $A$ be a square matrix over a field $F$ and let $K$ be the smallest subfield of $F$ containing the entries of $A$. Prove that any rational canonical form of $A$ has entries in the field $K$. This means that the coefficients of any rational canonical form of $A$ are "rational" expressions in the coefficients of $A$, hence the origin of the term "rational canonical form." Given an operator $\tau \in \mathcal{L}(V)$, what is the smallest field $K$ for which any rational canonical form of $\tau$ must have entries in $K$?
4. Let $K$ be a subfield of $F$. Prove that two matrices $A, B \in \mathcal{M}_n(K)$ are similar over $K$ if and only if they are similar over $F$; that is, $A = PBP^{-1}$ for some $P \in \mathcal{M}_n(K)$ if and only if $A = QBQ^{-1}$ for some $Q \in \mathcal{M}_n(F)$. Hint: Use the results of the previous exercise.
5. Prove that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is the least common multiple of its elementary divisors.
6. Let $\mathbb{Q}$ be the field of rational numbers. Consider the linear operator $\tau \in \mathcal{L}(\mathbb{Q}^2)$ defined by $\tau(e_1) = e_2$, $\tau(e_2) = -e_1$.
a) Find the minimal polynomial of $\tau$ and show that the rational canonical form of $\tau$ is
$$R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
What are the elementary divisors of $\tau$?
b) Now consider the map $\sigma \in \mathcal{L}(\mathbb{C}^2)$ defined by the same rules as $\tau$, namely, $\sigma(e_1) = e_2$, $\sigma(e_2) = -e_1$. Find the minimal polynomial of $\sigma$ and the rational canonical form of $\sigma$. What are the elementary divisors of $\sigma$?
c) The invariant factors of $\tau$ are defined using the elementary divisors of $\tau$ in the same way as we did at the end of Chapter 6, for a module $M$. Describe the invariant factors of the operators in parts a) and b).
7. Find all rational canonical forms (up to the order of the blocks on the diagonal) for a linear operator on $\mathbb{R}^6$ having minimal polynomial $(x-1)^2(x+1)$.
8. How many possible rational canonical forms (up to order of the blocks) are there for linear operators on $\mathbb{R}^6$ with minimal polynomial $(x-1)(x+1)$?
9. Prove that if $C$ is the companion matrix of a monic polynomial $p(x)$ then $p(C) = 0$ and $C$ has minimal polynomial $p(x)$.
10. Let $\tau$ be a linear operator on $F^4$ with minimal polynomial $m_\tau(x) = (x^2+1)(x^2-2)$. Find the rational canonical form of $\tau$ if $F = \mathbb{Q}$, $F = \mathbb{R}$ or $F = \mathbb{C}$.
11. Suppose that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is irreducible. What can you say about the dimension of $V$?

Chapter 8
Eigenvalues and Eigenvectors

Unless otherwise noted, we will assume throughout this chapter that all vector spaces are finite-dimensional.

The Characteristic Polynomial of an Operator
It is clear from our discussion of the rational canonical form that elementary divisors and their companion matrices are important. Let $C[p(x)]$ be the companion matrix of a monic polynomial

$$p(x; a_0,\ldots,a_{d-1}) = a_0 + a_1x + \cdots + a_{d-1}x^{d-1} + x^d$$

By way of motivation, note that when $d = 2$, we can write the polynomial $p(x)$ as follows:

$$p(x; a_0, a_1) = a_0 + a_1x + x^2 = x(x + a_1) + a_0$$

which looks suspiciously like a determinant, namely,

$$p(x; a_0, a_1) = \det\begin{pmatrix} x & a_0 \\ -1 & x + a_1 \end{pmatrix} = \det\left(xI_2 - \begin{pmatrix} 0 & -a_0 \\ 1 & -a_1 \end{pmatrix}\right) = \det(xI - C[p(x)])$$

So, let us define

$$A(x; a_0,\ldots,a_{d-1}) = xI - C[p(x)] = \begin{pmatrix} x & 0 & \cdots & 0 & a_0 \\ -1 & x & \cdots & 0 & a_1 \\ 0 & -1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & x & a_{d-2} \\ 0 & 0 & \cdots & -1 & x + a_{d-1} \end{pmatrix}$$

where $x$ is an independent variable. The determinant of this matrix is a polynomial in $x$ whose degree $d$ equals the number of parameters $a_0,\ldots,a_{d-1}$. We have just seen that

$$\det(A(x; a_0, a_1)) = p(x; a_0, a_1)$$

and this is also true for $d = 1$. As a basis for induction, suppose that

$$\det(A(x; a_1,\ldots,a_d)) = p(x; a_1,\ldots,a_d)$$

Then, expanding $\det(A(x; a_0,\ldots,a_d))$ along the first row gives

$$\det(A(x; a_0,\ldots,a_d)) = x\det(A(x; a_1,\ldots,a_d)) + (-1)^{d+2}a_0\det\begin{pmatrix} -1 & x & \cdots & 0 \\ 0 & -1 & \ddots & \vdots \\ \vdots & & \ddots & x \\ 0 & \cdots & 0 & -1 \end{pmatrix}$$

The last determinant is of a triangular matrix with $-1$'s on the diagonal and so equals $(-1)^d$. Hence,

$$\det(A(x; a_0,\ldots,a_d)) = x\,p(x; a_1,\ldots,a_d) + a_0 = x(a_1 + a_2x + \cdots + a_dx^{d-1} + x^d) + a_0$$
$$= a_0 + a_1x + \cdots + a_dx^d + x^{d+1} = p(x; a_0,\ldots,a_d)$$

We have proved the following.

Lemma 8.1 If $C[p(x)]$ is the companion matrix of the monic polynomial $p(x)$, then

$$\det(xI - C[p(x)]) = p(x)$$
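The following sympy computation (ours, not the text's) verifies the lemma symbolically in degree $4$.

```python
# Sketch: symbolic check of Lemma 8.1 for d = 4.
from sympy import symbols, Matrix, eye, expand

x, a0, a1, a2, a3 = symbols('x a0 a1 a2 a3')
C = Matrix([[0, 0, 0, -a0],
            [1, 0, 0, -a1],
            [0, 1, 0, -a2],
            [0, 0, 1, -a3]])
print(expand((x * eye(4) - C).det()))
# a0 + a1*x + a2*x**2 + a3*x**3 + x**4
```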

Since the determinant of a block diagonal matrix is the product of the determinants of the blocks on the diagonal, if $R$ is a matrix in rational canonical form then

$$\det(xI - R) = \prod_{i,j} p_i^{e_{i,j}}(x)$$

is the product of the elementary divisors. Moreover, if $M$ is similar to $R$, say $M = PRP^{-1}$, then

$$\det(xI - M) = \det(xI - PRP^{-1}) = \det[P(xI - R)P^{-1}] = \det(P)\det(xI - R)\det(P^{-1}) = \det(xI - R)$$

and so $\det(xI - M)$ is the product of the elementary divisors of $M$. The polynomial

$$C_M(x) = \det(xI - M)$$

is known as the characteristic polynomial of $M$. Since the characteristic polynomial is an invariant under similarity, we have the following.

Theorem 8.2 Let $\tau$ be a linear operator on a finite-dimensional vector space $V$. The characteristic polynomial $C_\tau(x)$ of $\tau$ is defined to be the product of the elementary divisors of $\tau$. If $M$ is any matrix that represents $\tau$, then

$$C_\tau(x) = C_M(x) = \det(xI - M)$$

Note that the characteristic polynomial is not a complete invariant under similarity. For example, the matrices

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

have the same characteristic polynomial but are not similar. (The reader might wish to provide an example of two nonsimilar matrices with the same characteristic and minimal polynomials.)

We shall have several occasions to use the fact that the minimal polynomial

$$m_\tau(x) = p_1^{e_1}(x)\cdots p_n^{e_n}(x)$$

and the characteristic polynomial

$$C_\tau(x) = \prod_{i,j} p_i^{e_{i,j}}(x)$$

of a linear operator $\tau \in \mathcal{L}(V)$ have the same set of prime factors. This implies, for example, that these two polynomials have the same set of roots (not counting multiplicity).

Eigenvalues and Eigenvectors
Let $\tau \in \mathcal{L}(V)$ and let $M$ be a matrix that represents $\tau$. A scalar $\lambda \in F$ is a root of the characteristic polynomial $C_\tau(x)$ of $\tau$ if and only if

$$\det(\lambda I - M) = 0 \tag{8.1}$$

that is, if and only if the matrix $\lambda I - M$ is singular. In particular, if $\dim(V) = n$ then (8.1) holds if and only if there exists a nonzero vector $x \in F^n$ for which

$$(\lambda I - M)x = 0$$

or, equivalently,

$$Mx = \lambda x$$

If $M = [\tau]_{\mathcal{B}}$ and $[v]_{\mathcal{B}} = x$, then this is equivalent to

$$[\tau]_{\mathcal{B}}[v]_{\mathcal{B}} = \lambda[v]_{\mathcal{B}}$$

or, in operator language,

$$\tau(v) = \lambda v$$

This prompts the following definition.

Definition Let $V$ be a vector space over $F$.
1) A scalar $\lambda \in F$ is an eigenvalue (or characteristic value) of an operator $\tau \in \mathcal{L}(V)$ if there exists a nonzero vector $v \in V$ for which

$$\tau(v) = \lambda v$$

In this case, $v$ is an eigenvector (or characteristic vector) of $\tau$ associated with $\lambda$.
2) A scalar $\lambda \in F$ is an eigenvalue for a matrix $A$ if there exists a nonzero column vector $x$ for which

$$Ax = \lambda x$$

In this case, $x$ is an eigenvector (or characteristic vector) for $A$ associated with $\lambda$.
3) The set of all eigenvectors associated with a given eigenvalue $\lambda$, together with the zero vector, forms a subspace of $V$, called the eigenspace of $\lambda$ and denoted by $E_\lambda$. This applies to both linear operators and matrices.
4) The set of all eigenvalues of an operator or matrix is called the spectrum of the operator or matrix.

The following theorem summarizes some key facts.

Theorem 8.3 Let $\tau \in \mathcal{L}(V)$ have minimal polynomial $m_\tau(x)$ and characteristic polynomial $C_\tau(x)$.
1) The polynomials $m_\tau(x)$ and $C_\tau(x)$ have the same prime factors and hence the same set of roots, called the spectrum of $\tau$.
2) (The Cayley–Hamilton theorem) The minimal polynomial divides the characteristic polynomial. Another way to say this is that an operator $\tau$ satisfies its own characteristic polynomial, that is,

$$C_\tau(\tau) = 0$$

3) The eigenvalues of a matrix are invariants under similarity.
4) If $\lambda$ is an eigenvalue of a matrix $A$ then the eigenspace $E_\lambda$ is the solution space of the homogeneous system of equations

$$(\lambda I - A)(x) = 0$$

One way to compute the eigenvalues of a linear operator $\tau$ is first to represent $\tau$ by a matrix $A$ and then solve the characteristic equation

$$C_A(x) = 0$$

Unfortunately, it is quite likely that this equation cannot be solved algebraically when $\dim(V) \ge 5$. As a result, the art of approximating the eigenvalues of a matrix is a very important area of applied linear algebra.
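In practice one therefore approximates eigenvalues with iterative numerical methods; the following numpy sketch (ours) approximates a spectrum and checks the defining equation.

```python
# Sketch: numerical eigenvalue approximation with numpy.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, np.allclose(A @ v, lam * v))   # each pair satisfies A v = lam v
```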

The following theorem describes the relationship between the eigenspaces and eigenvectors of distinct eigenvalues.

Theorem 8.4 Suppose that $\lambda_1,\ldots,\lambda_k$ are distinct eigenvalues of a linear operator $\tau \in \mathcal{L}(V)$.
1) The eigenspaces meet only in the $0$ vector, that is, for $i \ne j$,
$$E_{\lambda_i} \cap E_{\lambda_j} = \{0\}$$
2) Eigenvectors associated with distinct eigenvalues are linearly independent. That is, if $0 \ne v_i \in E_{\lambda_i}$ then the vectors $\{v_1,\ldots,v_k\}$ are linearly independent.
Proof. We leave the proof of part 1) to the reader. For part 2), assume that the eigenvectors $v_i$ are linearly dependent. By renumbering if necessary, we may also assume that among all nontrivial linear combinations of these vectors that equal $0$, the equation

$$a_1v_1 + \cdots + a_mv_m = 0 \tag{8.2}$$

has the fewest number of terms. Applying $\tau$ gives

$$a_1\lambda_1v_1 + \cdots + a_m\lambda_mv_m = 0 \tag{8.3}$$

Now we multiply (8.2) by $\lambda_1$ and subtract from (8.3), to get

$$a_2(\lambda_2 - \lambda_1)v_2 + \cdots + a_m(\lambda_m - \lambda_1)v_m = 0$$

But this equation has fewer terms than (8.2) and so all of its coefficients must equal $0$. Since the $\lambda_i$'s are distinct, we deduce that $a_i = 0$ for $i = 2,\ldots,m$ and so $a_1 = 0$ as well. This contradiction implies that the $v_i$'s are linearly independent.

Geometric and Algebraic Multiplicities
Eigenvalues have two forms of multiplicity, as described in the next definition.

Definition Let $\lambda$ be an eigenvalue of a linear operator $\tau \in \mathcal{L}(V)$.
1) The algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of the characteristic polynomial $C_\tau(x)$.
2) The geometric multiplicity of $\lambda$ is the dimension of the eigenspace $E_\lambda$.
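For a concrete instance (our own example, not the text's), the matrix below has eigenvalue $2$ with algebraic multiplicity $2$ but geometric multiplicity $1$:

```python
# Sketch: algebraic vs. geometric multiplicity on a 3x3 example.
from sympy import Matrix, symbols, factor, eye

x = symbols('x')
A = Matrix([[2, 0, 0],
            [1, 2, 0],
            [0, 0, 3]])
print(factor(A.charpoly(x).as_expr()))   # (x - 3)*(x - 2)**2
E2 = (A - 2 * eye(3)).nullspace()        # a basis for the eigenspace E_2
print(len(E2))                           # 1: geometric multiplicity of 2
```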

Theorem 8.5 The geometric multiplicity of an eigenvalue $\lambda$ of $\tau \in \mathcal{L}(V)$ is less than or equal to its algebraic multiplicity.
Proof. Suppose that $\lambda$ is an eigenvalue of $\tau$ with eigenspace $E_\lambda$. Given any basis $\{v_1,\ldots,v_k\}$ of $E_\lambda$, we can extend it to a basis $\mathcal{B}$ for $V$. Since $E_\lambda$ is invariant under $\tau$, the matrix of $\tau$ with respect to $\mathcal{B}$ has the block form

$$[\tau]_{\mathcal{B}} = \begin{pmatrix} \lambda I_k & A \\ 0 & B \end{pmatrix}_{\rm block}$$

where $A$ and $B$ are matrices of the appropriate sizes, and so

$$C_\tau(x) = \det(xI_n - [\tau]_{\mathcal{B}}) = \det(xI_k - \lambda I_k)\det(xI_{n-k} - B) = (x - \lambda)^k\det(xI_{n-k} - B)$$

(Here $n$ is the dimension of $V$.) Hence, the algebraic multiplicity of $\lambda$ is at least $k$, which is the geometric multiplicity of $\lambda$.

The Jordan Canonical Form
One of the virtues of the rational canonical form is that every linear operator $\tau$ on a finite-dimensional vector space has a rational canonical form. However, the rational canonical form may be far from the ideal of simplicity that we had in mind for a set of simple canonical forms.

We can do better when the minimal polynomial of $\tau$ splits over $F$, that is, factors into a product of linear factors

$$m_\tau(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_n)^{e_n} \tag{8.4}$$

In some sense, the difficulty in the rational canonical form lies in the choice of basis for the cyclic submodules $\langle v_{i,j}\rangle$. Recall that since $\langle v_{i,j}\rangle$ is a $\tau$-cyclic subspace of $V$, we have chosen the ordered basis

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau(v_{i,j}),\ldots,\tau^{d_{i,j}-1}(v_{i,j}))$$

where $d_{i,j} = \deg(p_i^{e_{i,j}})$. With this basis, all of the complexity comes at the end, when we attempt to express

$$\tau(\tau^{d_{i,j}-1}(v_{i,j})) = \tau^{d_{i,j}}(v_{i,j})$$

as a linear combination of the basis vectors.

When the minimal polynomial $m_\tau(x)$ has the form (8.4), the elementary divisors are

$$p_i^{e_{i,j}}(x) = (x - \lambda_i)^{e_{i,j}}$$

In this case, we can choose the ordered basis

$$\mathcal{C}_{i,j} = \big(v_{i,j},\ (\tau - \lambda_i)(v_{i,j}),\ldots,(\tau - \lambda_i)^{e_{i,j}-1}(v_{i,j})\big)$$

for $\langle v_{i,j}\rangle$. Denoting the $k$th basis vector in $\mathcal{C}_{i,j}$ by $u_k = (\tau - \lambda_i)^{k-1}(v_{i,j})$, we have, for $k = 1,\ldots,e_{i,j}-1$,

$$\tau(u_k) = \tau[(\tau - \lambda_i)^{k-1}(v_{i,j})] = (\tau - \lambda_i + \lambda_i)[(\tau - \lambda_i)^{k-1}(v_{i,j})] = (\tau - \lambda_i)^k(v_{i,j}) + \lambda_i(\tau - \lambda_i)^{k-1}(v_{i,j}) = u_{k+1} + \lambda_iu_k$$

For $k = e_{i,j}$, a similar computation, using the fact that

$$(\tau - \lambda_i)^{e_{i,j}}(v_{i,j}) = 0$$

gives

$$\tau(u_{e_{i,j}}) = \lambda_iu_{e_{i,j}}$$

In this case, the complexity is more or less spread out evenly, and the matrix of $\tau|_{\langle v_{i,j}\rangle}$ with respect to $\mathcal{C}_{i,j}$ is the $e_{i,j} \times e_{i,j}$ matrix

$$J(\lambda_i, e_{i,j}) = \begin{pmatrix} \lambda_i & 0 & \cdots & 0 & 0 \\ 1 & \lambda_i & \ddots & & 0 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & & \ddots & \lambda_i & 0 \\ 0 & 0 & \cdots & 1 & \lambda_i \end{pmatrix}$$

which is called a Jordan block associated with the scalar $\lambda_i$. Note that a Jordan block has $\lambda_i$'s on the main diagonal, $1$'s on the subdiagonal and $0$'s elsewhere. This matrix is, in general, simpler (or at least more aesthetically pleasing) than a companion matrix.

Now we can state the analog of Theorem 7.10 for this choice of ordered basis.

Theorem 8.6 (The Jordan canonical form) Let $\dim(V)$ be finite and suppose that the minimal polynomial of $\tau \in \mathcal{L}(V)$ splits over the base field $F$, that is,

$$m_\tau(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_n)^{e_n}$$

Then $V$ has an ordered basis $\mathcal{C}$ under which

$$[\tau]_{\mathcal{C}} = \operatorname{diag}\big(J(\lambda_1, e_{1,1}),\ldots,J(\lambda_1, e_{1,k_1}),\ldots,J(\lambda_n, e_{n,1}),\ldots,J(\lambda_n, e_{n,k_n})\big)$$

where the polynomials $(x - \lambda_i)^{e_{i,j}}$ are the elementary divisors of $\tau$. This block diagonal matrix is said to be in Jordan canonical form and is called the Jordan canonical form of $\tau$.

If the base field $F$ is algebraically closed, then except for the order of the blocks in the matrix, the Jordan canonical form is a canonical form for similarity; that is, up to order of the blocks, each similarity class contains exactly one matrix in Jordan canonical form.
Proof. As to the uniqueness, suppose that $J$ is a matrix in Jordan canonical form that represents the operator $\tau$ with respect to some ordered basis $\mathcal{B}$, and that $J$ has Jordan blocks $J(\lambda_1, f_1),\ldots,J(\lambda_m, f_m)$, where the $\lambda_i$'s may not be distinct. Then $V$ is the direct sum of $\tau$-invariant subspaces, that is, submodules of $V_\tau$, say

$$V = U_1 \oplus \cdots \oplus U_m$$

Consider a particular submodule $U_i$. It is easy to see from the matrix representation that $\tau|_{U_i}$ satisfies the polynomial $(x - \lambda_i)^{f_i}$ on $U_i$, but no polynomial of the form $(x - \lambda_i)^s$ for $s < f_i$, and so the order of $U_i$ is $(x - \lambda_i)^{f_i}$. In particular, each $U_i$ is a primary submodule of $V$.

We claim that $U_i$ is also a cyclic submodule of $V$. To see this, let $(v_1,\ldots,v_{f_i})$ be the ordered basis vectors that give the Jordan block $J(\lambda_i, f_i)$. Then it is easy to see by induction that $\tau^k(v_1)$ is a linear combination of $v_1,\ldots,v_{k+1}$, with the coefficient of $v_{k+1}$ equal to $1$. Hence, the set

$$\{v_1, \tau(v_1), \tau^2(v_1),\ldots,\tau^{f_i-1}(v_1)\}$$

is also a basis for $U_i$, from which it follows that $U_i$ is a $\tau$-cyclic subspace of $V$, that is, a cyclic submodule of $V$.

Thus, the Jordan matrix $J$ corresponds to a primary cyclic decomposition of $V$ with elementary divisors $(x - \lambda_i)^{f_i}$. Since the multiset of elementary divisors is unique, so is the Jordan matrix representation of $\tau$, up to order of the blocks.
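sympy can compute Jordan canonical forms directly; the sketch below (ours) uses a standard $4 \times 4$ example whose Jordan form has blocks for eigenvalues $1$, $2$ and $4$, the last of size $2$. Note that sympy places the $1$'s on the superdiagonal, the transpose of the convention used here; the two conventions give similar matrices.

```python
# Sketch: Jordan canonical form via sympy (superdiagonal-1's convention).
from sympy import Matrix

A = Matrix([[ 5,  4,  2,  1],
            [ 0,  1, -1, -1],
            [-1, -1,  3,  0],
            [ 1,  1, -1,  2]])
P, J = A.jordan_form()          # A = P J P^(-1)
print(J)                        # blocks J(1,1), J(2,1) and J(4,2)
print(P * J * P.inv() == A)     # True
```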

Note that if $\tau$ has Jordan canonical form $J$ then the diagonal elements of $J$ are precisely the eigenvalues of $\tau$, each appearing a number of times equal to its algebraic multiplicity.

Triangularizability and Schur's Lemma
We have now discussed two different canonical forms for similarity: the rational canonical form, which applies in all cases, and the Jordan canonical form, which applies only when the minimal polynomial splits over the base field (in particular, when the base field is algebraically closed). Let us now drop the rather strict requirements of canonical forms and look at two classes of matrices that are too large to be canonical forms (the upper triangular matrices and the almost upper triangular matrices) and a class of matrices that is too small to be a canonical form (the diagonal matrices).

The upper triangular matrices (or lower triangular matrices) have some nice properties and it is of interest to know when an arbitrary matrix is similar to a triangular matrix. We confine our attention to upper triangular matrices, since there are direct analogs for lower triangular matrices.

It will be convenient to make the following, somewhat nonstandard, definition.

Definition A linear operator $\tau$ on $V$ is upper triangular with respect to an ordered basis $\mathcal{B} = (v_1,\ldots,v_n)$ if the matrix $[\tau]_{\mathcal{B}}$ is upper triangular, that is, if for all $i = 1,\ldots,n$,

$$\tau(v_i) \in \langle v_1,\ldots,v_i\rangle$$

The operator $\tau$ is upper triangularizable if there is an ordered basis with respect to which $\tau$ is upper triangular.

As we will see next, when the base field is algebraically closed, all operators are upper triangularizable. However, since two distinct upper triangular matrices can be similar, the class of upper triangular matrices is not a canonical form for similarity. Simply put, there are just too many upper triangular matrices.

Theorem 8.7 (Schur's lemma) Let $V$ be a finite-dimensional vector space over a field $F$.
1) If $\tau \in \mathcal{L}(V)$ has the property that its characteristic polynomial $C_\tau(x)$ splits over $F$, then $\tau$ is upper triangularizable.
2) If $F$ is algebraically closed then all operators are upper triangularizable.
Proof. Part 2) follows from part 1). The proof of part 1) is most easily accomplished by matrix means, namely, by proving that every square matrix $A \in \mathcal{M}_n(F)$ whose characteristic polynomial splits over $F$ is similar to an upper triangular matrix. If $n = 1$ there is nothing to prove, since all $1 \times 1$ matrices are upper triangular. Assume the result is true for $n - 1$ and let $A \in \mathcal{M}_n(F)$.

Let $v_1 \in F^n$ be an eigenvector associated with an eigenvalue $\lambda_1$ of $A$ and extend $\{v_1\}$ to an ordered basis $\mathcal{B} = (v_1,\ldots,v_n)$ for $F^n$. The matrix of $A$ with respect to $\mathcal{B}$ has the form

$$[A]_{\mathcal{B}} = \begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix}_{\rm block}$$

for some $A_1 \in \mathcal{M}_{n-1}(F)$. Since $[A]_{\mathcal{B}}$ and $A$ are similar, we have

$$\det(xI - A) = \det(xI - [A]_{\mathcal{B}}) = (x - \lambda_1)\det(xI - A_1)$$

Hence, the characteristic polynomial of $A_1$ also splits over $F$ and, by the induction hypothesis, there exists an invertible matrix $P \in \mathcal{M}_{n-1}(F)$ for which $U = PA_1P^{-1}$ is upper triangular. Hence, if

$$Q = \begin{pmatrix} 1 & 0 \\ 0 & P \end{pmatrix}_{\rm block}$$

then $Q$ is invertible and

$$Q[A]_{\mathcal{B}}Q^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & P \end{pmatrix}\begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & P^{-1} \end{pmatrix} = \begin{pmatrix} \lambda_1 & * \\ 0 & U \end{pmatrix}$$

is upper triangular. This completes the proof.

When the base field is $F = \mathbb{R}$, not all operators are triangularizable. We can, however, achieve a form that is close to triangular. For the sake of the exposition, we make the following nonstandard definition (that is, the reader should not expect to find this definition in other books).

Definition A matrix $A \in \mathcal{M}_n(F)$ is almost upper triangular if it has the form

$$A = \begin{pmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_k \end{pmatrix}_{\rm block}$$

where each matrix $A_i$ either has size $1 \times 1$ or else has size $2 \times 2$ with an irreducible characteristic polynomial. A linear operator $\tau \in \mathcal{L}(V)$ is almost upper triangularizable if there is an ordered basis $\mathcal{B}$ for which $[\tau]_{\mathcal{B}}$ is almost upper triangular.

We will prove that every real linear operator is almost upper triangularizable. In the case of a complex vector space $V$, any complex linear operator $\tau \in \mathcal{L}(V)$ has an eigenvalue and hence $V$ contains a one-dimensional $\tau$-invariant subspace. The analog for the real case is that for any real linear operator $\tau \in \mathcal{L}(V)$, the vector space $V$ contains either a one-dimensional or a "nonreducible" two-dimensional $\tau$-invariant subspace.

Theorem 8.8 Let $\tau \in \mathcal{L}(V)$ be a real linear operator. Then $V$ contains at least one of the following:
1) a one-dimensional $\tau$-invariant subspace,
2) a two-dimensional $\tau$-invariant subspace $W$ for which $\sigma = \tau|_W$ has the property that $m_\sigma(x) = C_\sigma(x)$ is an irreducible quadratic. Hence, $W$ is not the direct sum of two one-dimensional $\tau$-invariant subspaces.
Proof. The minimal polynomial $m_\tau(x)$ of $\tau$ factors into a product of linear and quadratic factors over $\mathbb{R}$. If there is a linear factor $x - \lambda$, then $\lambda$ is an eigenvalue for $\tau$ and if $\tau(v) = \lambda v$ for some $v \ne 0$, then $\langle v\rangle$ is the desired one-dimensional $\tau$-invariant subspace.

Otherwise, let $q(x) = x^2 + ax + b$ be an irreducible quadratic factor of $m_\tau(x)$ and write

$$m_\tau(x) = q(x)r(x)$$

Since $r(\tau) \ne 0$, we may choose a nonzero vector $v \in V$ such that $r(\tau)v \ne 0$. Let

$$W = \langle r(\tau)v,\ \tau r(\tau)v\rangle$$

This subspace is $\tau$-invariant, for we have $\tau[r(\tau)v] \in W$ and, since $q(\tau)r(\tau)v = m_\tau(\tau)v = 0$,

$$\tau[\tau r(\tau)v] = \tau^2r(\tau)v = -(a\tau + b)r(\tau)v \in W$$

Hence, $\sigma = \tau|_W$ is a linear operator on $W$. Also, $q(\sigma)W = \{0\}$ and so $\sigma$ has minimal polynomial dividing $q(x)$. But since $q(x)$ is irreducible and monic, $m_\sigma(x) = q(x)$ is quadratic. It follows that $W$ is two-dimensional, for if $s\,r(\tau)v + t\,\tau r(\tau)v = 0$ with $s$ and $t$ not both $0$, then $s + t\sigma = 0$ on $W$, which is not the case. Finally, the characteristic polynomial $C_\sigma(x)$ has degree $2$ and is divisible by $m_\sigma(x)$, whence $C_\sigma(x) = m_\sigma(x) = q(x)$ is irreducible. Thus, $W$ satisfies condition 2).

Now we can prove Schur's lemma for real operators.

Theorem 8.9 (Schur's lemma: real case) Every real linear operator $\tau \in \mathcal{L}(V)$ is almost upper triangularizable.
Proof. As with the complex case, it is simpler to proceed using matrices, by showing that any $n \times n$ real matrix $A$ is similar to an almost upper triangular matrix. The result is clear for $n = 1$ or if $A$ is the zero matrix.

For $n = 2$, the characteristic polynomial $C_A(x)$ of $A$ has degree $2$ and is divisible by the minimal polynomial $m_A(x)$. If $m_A(x) = x - \lambda$ is linear then $A = \lambda I$ is diagonal. If $m_A(x) = (x - \lambda)^2$ then $A$ is similar to an upper triangular matrix with diagonal elements $\lambda$, and if $m_A(x) = (x - \lambda)(x - \mu)$ with $\lambda \ne \mu$ then $A$ is similar to a diagonal matrix with diagonal entries $\lambda$ and $\mu$. Finally, if $m_A(x) = C_A(x)$ is irreducible then $A$ is itself almost upper triangular, so the result still holds.

Assume for the purposes of induction that any square real matrix of size less than $n \times n$ is almost upper triangularizable. We wish to show that the same is true for any $n \times n$ matrix $A$. We may assume that $n \ge 3$.

If $A$ has an eigenvector $v \in \mathbb{R}^n$, then let $W = \langle v\rangle$. If not, then according to Theorem 8.8, there is a pair of vectors $w_1, w_2 \in \mathbb{R}^n$ for which $W = \langle w_1, w_2\rangle$ is two-dimensional and $A$-invariant and the characteristic and minimal polynomials of $A|_W$ are equal and irreducible. Let $U$ be a complement of $W$. If $\dim(W) = 1$ then let $\mathcal{B} = (v, u_1,\ldots,u_{n-1})$ be an ordered basis for $\mathbb{R}^n$, and if $\dim(W) = 2$ then let $\mathcal{B} = (w_1, w_2, u_1,\ldots,u_{n-2})$ be an ordered basis for $\mathbb{R}^n$. In either case, $A$ is similar to a matrix of the form

$$[A]_{\mathcal{B}} = \begin{pmatrix} B & * \\ 0 & A_1 \end{pmatrix}_{\rm block}$$

where $B$ has size $1 \times 1$, or $B$ has size $2 \times 2$ with irreducible quadratic minimal polynomial. Also, $A_1$ has size $k \times k$, where $k = n-1$ or $k = n-2$. Hence, the induction hypothesis applies to $A_1$ and there exists an invertible matrix $P \in \mathcal{M}_k(\mathbb{R})$ for which $U_1 = PA_1P^{-1}$ is almost upper triangular. Hence, if

$$Q = \begin{pmatrix} I & 0 \\ 0 & P \end{pmatrix}_{\rm block}$$

then $Q$ is invertible and

$$Q[A]_{\mathcal{B}}Q^{-1} = \begin{pmatrix} I & 0 \\ 0 & P \end{pmatrix}\begin{pmatrix} B & * \\ 0 & A_1 \end{pmatrix}\begin{pmatrix} I & 0 \\ 0 & P^{-1} \end{pmatrix} = \begin{pmatrix} B & * \\ 0 & U_1 \end{pmatrix}$$

is almost upper triangular. This completes the proof.

Unitary Triangularizability
Although we have not yet discussed inner product spaces and orthonormal bases, the reader is no doubt familiar with these concepts. So let us mention that when $V$ is a real or complex inner product space, if an operator $\tau$ on $V$ can be triangularized (or almost triangularized) using an ordered basis $\mathcal{B}$, then it can also be triangularized (or almost triangularized) using an orthonormal ordered basis $\mathcal{O}$.

To see this, suppose we apply the Gram–Schmidt process to $\mathcal{B} = (v_1,\ldots,v_n)$. The resulting ordered basis $\mathcal{O} = (u_1,\ldots,u_n)$ has the property that

$$\langle v_1,\ldots,v_i\rangle = \langle u_1,\ldots,u_i\rangle$$

for all $i$. Since $[\tau]_{\mathcal{B}}$ is upper triangular, that is,

$$\tau(v_i) \in \langle v_1,\ldots,v_i\rangle$$

for all $i$, it follows that

$$\tau(u_i) \in \tau\langle v_1,\ldots,v_i\rangle \subseteq \langle v_1,\ldots,v_i\rangle = \langle u_1,\ldots,u_i\rangle$$

and so the matrix $[\tau]_{\mathcal{O}}$ is also upper triangular. (A similar argument holds in the almost upper triangular case.)
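Numerically, this unitary triangularization is the Schur decomposition; here is a sketch (ours) using scipy:

```python
# Sketch: complex Schur form A = Z T Z* with Z unitary, T upper triangular.
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])                # a rotation: no real eigenvalues
T, Z = schur(A, output='complex')
print(np.allclose(Z @ T @ Z.conj().T, A))  # True
print(np.diag(T))                          # approximately i and -i
```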

A linear operator $\tau$ is unitarily upper triangularizable if there is an ordered orthonormal basis with respect to which $\tau$ is upper triangular. Accordingly, when $V$ is an inner product space, we can replace the term "upper triangularizable" with "unitarily upper triangularizable" in Schur's lemma. (A similar statement holds for almost upper triangular matrices.)

Diagonalizable Operators
A linear operator $\tau \in \mathcal{L}(V)$ is diagonalizable if there is an ordered basis $\mathcal{B}$ for which $[\tau]_{\mathcal{B}}$ is diagonal. In the case of an algebraically closed field, we have seen that all operators are upper triangularizable. However, even for such fields, not all operators are diagonalizable.

Our first characterization of diagonalizability amounts to little more than the definitions of the concepts involved.

Theorem 8.10 An operator $\tau \in \mathcal{L}(V)$ is diagonalizable if and only if there is a basis for $V$ that consists entirely of eigenvectors of $\tau$, that is, if and only if

$$V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$$

where $\lambda_1,\ldots,\lambda_k$ are the distinct eigenvalues of $\tau$.

Diagonalizability can also be characterized via minimal polynomials. Suppose that $\tau$ is diagonalizable and that $\mathcal{B} = \{v_1,\ldots,v_n\}$ is a basis for $V$ consisting of eigenvectors of $\tau$. Let $\lambda_1,\ldots,\lambda_k$ be a list of the distinct eigenvalues of $\tau$. Then each basis vector $v_i$ is an eigenvector for one of these eigenvalues and so

$$\prod_{j=1}^{k}(\tau - \lambda_j)(v_i) = 0$$

for all basis vectors $v_i$. Hence, if

$$p(x) = \prod_{j=1}^{k}(x - \lambda_j)$$

then $p(\tau) = 0$ and so $m_\tau(x) \mid p(x)$. But every eigenvalue $\lambda_j$ is a root of the minimal polynomial of $\tau$ and so $p(x) \mid m_\tau(x)$, whence $m_\tau(x) = p(x)$.

Conversely, if the minimal polynomial of $\tau$ is a product of distinct linear factors, $m_\tau(x) = (x - \lambda_1)\cdots(x - \lambda_k)$, then the primary decomposition of $V$ looks like

$$V = V_1 \oplus \cdots \oplus V_k$$

where

$$V_i = \{v \in V \mid (\tau - \lambda_i)v = 0\} = E_{\lambda_i}$$

By Theorem 8.10, $\tau$ is diagonalizable. We have established the following result.

Theorem 8.11 A linear operator $\tau \in \mathcal{L}(V)$ on a finite-dimensional vector space is diagonalizable if and only if its minimal polynomial is the product of distinct linear factors.
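Computationally (our sketch, not the text's), Theorem 8.11 says that diagonalizability can be tested by checking whether $m_\tau(x)$ is squarefree, for example via $\gcd(m_\tau, m_\tau')$; sympy's built-in test agrees:

```python
# Sketch: diagonalizable iff the minimal polynomial is squarefree.
from sympy import Matrix, symbols, gcd, diff

x = symbols('x')
A = Matrix([[2, 0],
            [1, 2]])               # one Jordan block: m_A(x) = (x - 2)**2
m = (x - 2)**2
print(gcd(m, diff(m, x)))          # x - 2: a repeated factor
print(A.is_diagonalizable())       # False, as Theorem 8.11 predicts
```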

Projections
We have met the following type of operator before.

Definition Let $V = S \oplus T$. The linear operator $\rho\colon V \to V$ defined by

$$\rho(s + t) = s$$

where $s \in S$ and $t \in T$, is called projection on $S$ along $T$.

The following theorem describes projection operators.

Theorem 8.12
1) Let $\rho$ be projection on $S$ along $T$. Then
a) $\operatorname{im}(\rho) = S$ and $\ker(\rho) = T$
b) $V = \operatorname{im}(\rho) \oplus \ker(\rho)$
c) $v \in \operatorname{im}(\rho) \iff \rho(v) = v$
Note that the last condition says that a vector is in the image of $\rho$ if and only if it is fixed by $\rho$.
2) Conversely, if $\rho \in \mathcal{L}(V)$ has the property that

$$V = \operatorname{im}(\rho) \oplus \ker(\rho) \quad\text{and}\quad \rho|_{\operatorname{im}(\rho)} = \iota$$

then $\rho$ is projection on $\operatorname{im}(\rho)$ along $\ker(\rho)$.

Projection operators play a major role in the spectral theory of linear operators, which we will discuss in Chapter 10. Now we turn to some of the basic properties of these operators.

Theorem 8.13 A linear operator $\rho \in \mathcal{L}(V)$ is a projection if and only if it is idempotent, that is, if and only if $\rho^2 = \rho$.
Proof. If $\rho$ is projection on $S$ along $T$ then for any $s \in S$ and $t \in T$,

$$\rho^2(s + t) = \rho(s) = s = \rho(s + t)$$

and so $\rho^2 = \rho$. Conversely, suppose that $\rho$ is idempotent. If $v \in \operatorname{im}(\rho) \cap \ker(\rho)$ then $v = \rho(x)$ for some $x \in V$, and since $\rho(v) = 0$,

$$0 = \rho(v) = \rho^2(x) = \rho(x) = v$$

Hence $\operatorname{im}(\rho) \cap \ker(\rho) = \{0\}$. Moreover, if $v \in V$ then

$$v = [v - \rho(v)] + \rho(v) \in \ker(\rho) \oplus \operatorname{im}(\rho)$$

and so $V = \ker(\rho) \oplus \operatorname{im}(\rho)$. Finally, $\rho[\rho(x)] = \rho(x)$ and so $\rho|_{\operatorname{im}(\rho)} = \iota$. Hence, $\rho$ is projection on $\operatorname{im}(\rho)$ along $\ker(\rho)$.

The Algebra of Projections
If $\rho$ is projection on $S$ along $T$ then $\iota - \rho$ is idempotent, since

$$(\iota - \rho)^2 = \iota - 2\rho + \rho^2 = \iota - \rho$$

Hence, $\iota - \rho$ is also a projection. Since $\ker(\iota - \rho) = \operatorname{im}(\rho)$ and $\operatorname{im}(\iota - \rho) = \ker(\rho)$, it follows that $\iota - \rho$ is projection on $T$ along $S$.

Orthogonal Projections
Definition The projections $\rho, \sigma \in \mathcal{L}(V)$ are orthogonal, written $\rho \perp \sigma$, if

$$\rho\sigma = \sigma\rho = 0$$

Note that $\rho \perp \sigma$ if and only if

$$\operatorname{im}(\rho) \subseteq \ker(\sigma) \quad\text{and}\quad \operatorname{im}(\sigma) \subseteq \ker(\rho)$$

The following example shows that it is not enough to have $\rho\sigma = 0$ in the definition of orthogonality. In fact, it is possible for $\rho\sigma = 0$ and yet $\sigma\rho$ is not even a projection.

Example 8.1 Let $V = F^2$ and let

$$D = \{(x, x) \mid x \in F\},\qquad X = \{(x, 0) \mid x \in F\},\qquad Y = \{(0, y) \mid y \in F\}$$

Thus, $D$ is the diagonal, $X$ is the $x$-axis and $Y$ is the $y$-axis in $F^2$. (The reader may wish to draw pictures in $\mathbb{R}^2$.) Using the notation $\rho_{A,B}$ for the projection on $A$ along $B$, we have

$$\rho_{D,X}\rho_{D,Y} = \rho_{D,Y} \ne \rho_{D,X} = \rho_{D,Y}\rho_{D,X}$$

From this we deduce that if $\rho$ and $\sigma$ are projections, it may happen that both products $\rho\sigma$ and $\sigma\rho$ are projections, but that they are not equal.

We leave it to the reader to show that $\rho_{Y,X}\rho_{X,D} = 0$ (which is a projection), but that $\rho_{X,D}\rho_{Y,X}$ is not a projection. Thus, it may also happen that $\rho\sigma$ is a projection but $\sigma\rho$ is not.
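These claims are easy to check with matrices (our own computation) in the standard basis of $F^2$:

```python
# Sketch: the projections of Example 8.1 as matrices on F^2.
import numpy as np

rho_DX = np.array([[0, 1], [0, 1]])    # on D along X: (x, y) -> (y, y)
rho_DY = np.array([[1, 0], [1, 0]])    # on D along Y: (x, y) -> (x, x)
rho_XD = np.array([[1, -1], [0, 0]])   # on X along D: (x, y) -> (x - y, 0)
rho_YX = np.array([[0, 0], [0, 1]])    # on Y along X: (x, y) -> (0, y)

print(rho_YX @ rho_XD)                 # the zero matrix (a projection)
P = rho_XD @ rho_YX                    # [[0, -1], [0, 0]]
print(np.array_equal(P @ P, P))        # False: not idempotent, not a projection
```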

If $\rho$ and $\sigma$ are projections, it does not necessarily follow that $\rho + \sigma$, $\rho - \sigma$ or $\rho\sigma$ is a projection. Let us consider these one by one.

The Sum of Projections
The sum $\rho + \sigma$ is a projection if and only if $(\rho + \sigma)^2 = \rho + \sigma$, or

$$\rho\sigma + \sigma\rho = 0 \tag{8.5}$$

Of course, this holds if $\rho\sigma = \sigma\rho = 0$, that is, if $\rho \perp \sigma$. We contend that the converse is also true, namely, that (8.5) implies $\rho \perp \sigma$.

Multiplying (8.5) on the left by $\rho$ and on the right by $\rho$ gives the pair of equations

$$\rho\sigma + \rho\sigma\rho = 0, \qquad \rho\sigma\rho + \sigma\rho = 0$$

Hence $\rho\sigma = \sigma\rho$, which together with (8.5) gives $2\rho\sigma = 0$. Therefore, if $\operatorname{char}(F) \ne 2$ then $\rho\sigma = 0$ and therefore $\sigma\rho = 0$, that is, $\rho \perp \sigma$. We have proven that $\rho + \sigma$ is a projection if and only if $\rho \perp \sigma$ (assuming that $F$ has characteristic different from $2$).

Now suppose that $\rho + \sigma$ is a projection. To determine $\ker(\rho + \sigma)$, suppose that

$$(\rho + \sigma)(v) = 0$$

Applying $\rho$ and noting that $\rho\sigma = 0$, we get $\rho(v) = 0$. Similarly, $\sigma(v) = 0$ and so $\ker(\rho + \sigma) \subseteq \ker(\rho) \cap \ker(\sigma)$. But the reverse inclusion is obvious and so

$$\ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$

As to the image of $\rho + \sigma$, we have

$$v \in \operatorname{im}(\rho + \sigma) \implies v = (\rho + \sigma)(w) = \rho(w) + \sigma(w) \in \operatorname{im}(\rho) + \operatorname{im}(\sigma)$$

and so $\operatorname{im}(\rho + \sigma) \subseteq \operatorname{im}(\rho) + \operatorname{im}(\sigma)$. But $\rho\sigma = 0$ implies $\operatorname{im}(\sigma) \subseteq \ker(\rho)$ and so the sum is direct and

$$\operatorname{im}(\rho + \sigma) \subseteq \operatorname{im}(\rho) \oplus \operatorname{im}(\sigma)$$

For the reverse inclusion, if $v = \rho(a) + \sigma(b)$, where $\rho(a) \in \operatorname{im}(\rho)$ and $\sigma(b) \in \operatorname{im}(\sigma)$, then

$$(\rho + \sigma)(v) = (\rho + \sigma)\rho(a) + (\rho + \sigma)\sigma(b) = \rho(a) + \sigma(b) = v$$

and so $v \in \operatorname{im}(\rho + \sigma)$. Let us summarize.

Theorem 8.14 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections, where $V$ is a vector space over a field of characteristic $\ne 2$. Then $\rho + \sigma$ is a projection if and only if $\rho \perp \sigma$, in which case $\rho + \sigma$ is projection on $\operatorname{im}(\rho) \oplus \operatorname{im}(\sigma)$ along $\ker(\rho) \cap \ker(\sigma)$.

The Difference of Projections
Let $\rho$ and $\sigma$ be projections. The difference $\rho - \sigma$ is a projection if and only if

$$\iota - (\rho - \sigma) = (\iota - \rho) + \sigma$$

is a projection. Hence, we may apply the previous theorem to deduce that $\rho - \sigma$ is a projection if and only if

$$(\iota - \rho)\sigma = \sigma(\iota - \rho) = 0$$

or, equivalently,

$$\rho\sigma = \sigma\rho = \sigma$$

Moreover, in this case, $(\iota - \rho) + \sigma$ is projection on $\operatorname{im}(\iota - \rho) \oplus \operatorname{im}(\sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$ along $\ker(\iota - \rho) \cap \ker(\sigma) = \operatorname{im}(\rho) \cap \ker(\sigma)$. Since the image of a projection is the kernel of its complementary projection and vice versa, Theorem 8.14 also implies that

$$\operatorname{im}(\rho - \sigma) = \ker((\iota - \rho) + \sigma) = \operatorname{im}(\rho) \cap \ker(\sigma)$$

and

$$\ker(\rho - \sigma) = \operatorname{im}((\iota - \rho) + \sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$$

Theorem 8.15 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections, where $V$ is a vector space over a field of characteristic $\ne 2$. Then $\rho - \sigma$ is a projection if and only if

$$\rho\sigma = \sigma\rho = \sigma$$

in which case $\rho - \sigma$ is projection on $\operatorname{im}(\rho) \cap \ker(\sigma)$ along $\ker(\rho) \oplus \operatorname{im}(\sigma)$.

The Product of Projections
Finally, let us consider the product $\rho\sigma$ of two projections.

Theorem 8.16 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections. If $\rho$ and $\sigma$ commute, that is, if $\rho\sigma = \sigma\rho$, then $\rho\sigma$ is a projection. In this case, $\rho\sigma$ is projection on $\operatorname{im}(\rho) \cap \operatorname{im}(\sigma)$ along $\ker(\rho) + \ker(\sigma)$. (Example 8.1 shows that the converse may be false.)
Proof. If $\rho\sigma = \sigma\rho$ then

$$(\rho\sigma)^2 = \rho\sigma\rho\sigma = \rho\rho\sigma\sigma = \rho\sigma$$

and so $\rho\sigma$ is a projection. To find the image of $\rho\sigma$, observe that if $v \in \operatorname{im}(\rho\sigma)$, say $v = \rho\sigma(w)$, then

$$\rho(v) = \rho^2\sigma(w) = \rho\sigma(w) = v$$

and so $v \in \operatorname{im}(\rho)$. Similarly, $v = \sigma\rho(w)$ gives $v \in \operatorname{im}(\sigma)$ and so

$$\operatorname{im}(\rho\sigma) \subseteq \operatorname{im}(\rho) \cap \operatorname{im}(\sigma)$$

For the reverse inclusion, if $x \in \operatorname{im}(\rho) \cap \operatorname{im}(\sigma)$ then $\rho(x) = x$ and $\sigma(x) = x$, so $\rho\sigma(x) = \rho(x) = x$ and hence $x \in \operatorname{im}(\rho\sigma)$. Thus,

$$\operatorname{im}(\rho\sigma) = \operatorname{im}(\rho) \cap \operatorname{im}(\sigma)$$

Next, we observe that if $v \in \ker(\rho\sigma)$ then $\rho(\sigma(v)) = 0$, that is, $\sigma(v) \in \ker(\rho)$. Hence,

$$v = \sigma(v) + (v - \sigma(v)) \in \ker(\rho) + \ker(\sigma)$$

Moreover, if $v = a + b \in \ker(\rho) + \ker(\sigma)$ then

$$\rho\sigma(v) = \rho\sigma(a) + \rho\sigma(b) = \sigma\rho(a) + \rho\sigma(b) = 0 + 0 = 0$$

and so $v \in \ker(\rho\sigma)$. Thus,

$$\ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$$

We should remark that the sum above need not be direct. For example, if $\rho = \sigma$ then $\ker(\rho\sigma) = \ker(\rho)$.

Resolutions of the Identity
If $\rho$ is a projection then $\rho \perp (\iota - \rho)$ and

$$\rho + (\iota - \rho) = \iota$$

Let us generalize this to more than two projections.

Definition If $\rho_1,\ldots,\rho_k$ are mutually orthogonal projections, that is, $\rho_i \perp \rho_j$ for $i \ne j$, and if

$$\rho_1 + \cdots + \rho_k = \iota$$

where $\iota$ is the identity operator, then we refer to this sum as a resolution of the identity.

There is a connection between the resolutions of the identity map on $V$ and the direct sum decompositions of $V$. In general, if the linear operators $\rho_i$ on $V$ satisfy

$$\rho_1 + \cdots + \rho_k = \iota$$

then for any $v \in V$ we have

$$v = \iota(v) = \rho_1(v) + \cdots + \rho_k(v) \in \operatorname{im}(\rho_1) + \cdots + \operatorname{im}(\rho_k)$$

and so

$$V = \operatorname{im}(\rho_1) + \cdots + \operatorname{im}(\rho_k)$$

However, the sum need not be direct. The next theorem describes when the sum is direct.

Theorem 8.17 Resolutions of the identity correspond to direct sum decompositions of $V$ in the following sense:
1) If $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity then

$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$

and $\rho_i$ is projection on $\operatorname{im}(\rho_i)$ along

$$\ker(\rho_i) = \bigoplus_{j \ne i}\operatorname{im}(\rho_j)$$

2) Conversely, suppose that

$$V = S_1 \oplus \cdots \oplus S_k$$

If $\rho_i$ is projection on $S_i$ along the direct sum of the other subspaces

$$\bigoplus_{j \ne i} S_j$$

then $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity.
Proof. To prove 1), suppose that $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity. Then, as we have seen,

$$V = \operatorname{im}(\rho_1) + \cdots + \operatorname{im}(\rho_k)$$

To see that the sum is direct, suppose that

$$x_1 + \cdots + x_k = 0$$

where $x_j \in \operatorname{im}(\rho_j)$. Applying $\rho_i$ and using orthogonality gives $\rho_i(x_i) = x_i = 0$ for all $i$. Hence, the sum is direct. Finally, we have

$$\operatorname{im}(\rho_i) \oplus \bigoplus_{j \ne i}\operatorname{im}(\rho_j) = V = \operatorname{im}(\rho_i) \oplus \ker(\rho_i)$$

and since $\bigoplus_{j \ne i}\operatorname{im}(\rho_j) \subseteq \ker(\rho_i)$, this implies that

$$\ker(\rho_i) = \bigoplus_{j \ne i}\operatorname{im}(\rho_j)$$

To prove part 2), observe that for $i \ne j$,

$$\operatorname{im}(\rho_j) = S_j \subseteq \ker(\rho_i)$$

and similarly $\operatorname{im}(\rho_i) \subseteq \ker(\rho_j)$. Hence, $\rho_i \perp \rho_j$. Also, if $v = s_1 + \cdots + s_k$ where $s_i \in S_i$, then

$$v = s_1 + \cdots + s_k = \rho_1(v) + \cdots + \rho_k(v)$$

and so $\iota = \rho_1 + \cdots + \rho_k$ is a resolution of the identity.

Spectral Resolutions
Let us try to do something similar to Theorem 8.17 for an arbitrary linear operator $\tau$ on $V$ (rather than just the identity $\iota$). Suppose that $\tau$ can be resolved as follows:

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

where $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity and the $\lambda_i \in F$ are distinct (as we may assume, by combining projections with equal coefficients). Then

$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$

Moreover, if $v \in \operatorname{im}(\rho_i)$ then $v = \rho_i(w)$ and so

$$\tau(v) = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)\rho_i(w) = \lambda_i\rho_i(w) = \lambda_iv$$

Hence, $\operatorname{im}(\rho_i) \subseteq E_{\lambda_i}$. But the reverse inclusion is also true, since the equation $\tau(v) = \lambda_iv$ is

$$(\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)(v) = \lambda_i(\rho_1 + \cdots + \rho_k)(v)$$

or

$$(\lambda_1 - \lambda_i)\rho_1(v) + \cdots + (\lambda_k - \lambda_i)\rho_k(v) = 0$$

But since $(\lambda_j - \lambda_i)\rho_j(v) \in \operatorname{im}(\rho_j)$ and the sum of the images is direct, we deduce that $\rho_j(v) = 0$ for $j \ne i$ and so

$$v = (\rho_1 + \cdots + \rho_k)(v) = \rho_i(v) \in \operatorname{im}(\rho_i)$$

Thus, $\operatorname{im}(\rho_i) = E_{\lambda_i}$ and we can conclude that

$$V = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}$$

that is, $\tau$ is diagonalizable. The converse also holds, for if $V$ is the direct sum of the eigenspaces of $\tau$ and if $\rho_i$ is projection on $E_{\lambda_i}$ along the direct sum of the other eigenspaces, then

$$\rho_1 + \cdots + \rho_k = \iota$$

But for any $v_i \in E_{\lambda_i}$, we have

$$\tau(v_i) = \lambda_iv_i = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)(v_i)$$

and so

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

Theorem 8.18 A linear operator $\tau \in \mathcal{L}(V)$ is diagonalizable if and only if it can be written in the form

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k \tag{8.6}$$

where the $\lambda_i$'s are distinct and $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity. In this case, $\{\lambda_1,\ldots,\lambda_k\}$ is the spectrum of $\tau$ and the projections $\rho_i$ satisfy

$$\operatorname{im}(\rho_i) = E_{\lambda_i} \quad\text{and}\quad \ker(\rho_i) = \bigoplus_{j \ne i}E_{\lambda_j}$$

Equation (8.6) is referred to as the spectral resolution of $\tau$.
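For a real symmetric matrix with distinct eigenvalues, a spectral resolution can be computed directly (our own sketch): the projections are $\rho_i = u_iu_i^t$ for an orthonormal eigenbasis $u_1,\ldots,u_k$.

```python
# Sketch: spectral resolution of a symmetric matrix with distinct eigenvalues.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lams, U = np.linalg.eigh(A)                  # orthonormal eigenvector columns
rhos = [np.outer(U[:, i], U[:, i]) for i in range(len(lams))]
print(np.allclose(sum(rhos), np.eye(2)))     # resolution of the identity
print(np.allclose(sum(l * r for l, r in zip(lams, rhos)), A))  # (8.6) holds
print(all(np.allclose(r @ r, r) for r in rhos))                # each idempotent
```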

Projections and Invariance
There is a connection between projections and invariant subspaces. Suppose that $S$ is a $\tau$-invariant subspace of $V$ and let $\rho$ be any projection onto $S$ (along any complement of $S$). Then for any $v \in V$, we have $\rho(v) \in S$ and so $\tau(\rho(v)) \in S$. Hence, $\tau(\rho(v))$ is fixed by $\rho$, that is,

$$\rho\tau\rho(v) = \tau\rho(v)$$

Thus $\rho\tau\rho = \tau\rho$. Conversely, if $\rho\tau\rho = \tau\rho$ then for any $s \in S$ we have $s = \rho(s)$, whence

$$\tau(s) = \tau\rho(s) = \rho\tau\rho(s) = \rho\tau(s)$$

and so $\tau(s)$ is fixed by $\rho$, from which it follows that $\tau(s) \in S$. In other words, $S$ is $\tau$-invariant.

Theorem 8.19 Let $\tau \in \mathcal{L}(V)$. A subspace $S$ of $V$ is $\tau$-invariant if and only if $\rho\tau\rho = \tau\rho$ for some projection $\rho$ on $S$.

We also have the following relationship between projections and reducing pairs.

Theorem 8.20 Let $V = S \oplus T$. Then a linear operator $\tau \in \mathcal{L}(V)$ is reduced by the pair $(S, T)$ if and only if $\rho\tau = \tau\rho$, where $\rho$ is projection on $S$ along $T$.
Proof. Suppose first that $\rho\tau = \tau\rho$, where $\rho$ is projection on $S$ along $T$. For $s \in S$ we have

$$\rho\tau(s) = \tau\rho(s) = \tau(s)$$

and so $\rho$ fixes $\tau(s)$, which implies that $\tau(s) \in S$. Hence $S$ is invariant under $\tau$. Also, for $t \in T$,

$$\rho\tau(t) = \tau\rho(t) = 0$$

and so $\tau(t) \in \ker(\rho) = T$. Hence, $T$ is invariant under $\tau$.

Conversely, suppose that $(S, T)$ reduces $\tau$. The projection operator $\rho$ fixes vectors in $S$ and sends vectors in $T$ to $0$. Hence, for $s \in S$ and $t \in T$ we have

$$\rho\tau(s) = \tau(s) = \tau\rho(s)$$

and

$$\rho\tau(t) = 0 = \tau\rho(t)$$

which imply that $\rho\tau = \tau\rho$.

Exercises
1. Let $J$ be the $n \times n$ matrix all of whose entries are equal to $1$. Find the minimal polynomial and characteristic polynomial of $J$ and the eigenvalues of $J$.
2. A linear operator $\tau \in \mathcal{L}(V)$ is said to be nonderogatory if its minimal polynomial is equal to its characteristic polynomial. Prove that $\tau$ is nonderogatory if and only if $V_\tau$ is a cyclic module.
3. Prove that the eigenvalues of a matrix do not form a complete set of invariants under similarity.
4. Show that $\tau \in \mathcal{L}(V)$ is invertible if and only if $0$ is not an eigenvalue of $\tau$.
5. Let $A$ be an $n \times n$ matrix over a field $F$ that contains all roots of the characteristic polynomial of $A$. Prove that $\det(A)$ is the product of the eigenvalues of $A$, counting multiplicity.
6. Show that if $\lambda$ is an eigenvalue of $\tau$ then $p(\lambda)$ is an eigenvalue of $p(\tau)$, for any polynomial $p(x)$. Also, if $\lambda \ne 0$ then $\lambda^{-1}$ is an eigenvalue for $\tau^{-1}$.
7. An operator $\tau \in \mathcal{L}(V)$ is nilpotent if $\tau^n = 0$ for some positive $n \in \mathbb{N}$.
a) Show that if $\tau$ is nilpotent then the spectrum of $\tau$ is $\{0\}$.
b) Find a nonnilpotent operator $\tau$ with spectrum $\{0\}$.
8. Show that if $\sigma, \tau \in \mathcal{L}(V)$ then $\sigma\tau$ and $\tau\sigma$ have the same eigenvalues.
9. (Halmos)
a) Find a linear operator $\tau$ that is not idempotent but for which $\tau^2(\iota - \tau) = 0$.
b) Find a linear operator $\tau$ that is not idempotent but for which $\tau(\iota - \tau)^2 = 0$.
c) Prove that if $\tau^2(\iota - \tau) = \tau(\iota - \tau)^2 = 0$ then $\tau$ is idempotent.
10. An involution is a linear operator $\sigma$ for which $\sigma^2 = \iota$. If $\tau$ is idempotent, what can you say about $2\tau - \iota$? Construct a one-to-one correspondence between the set of idempotents on $V$ and the set of involutions.
11. Let $A, B \in \mathcal{M}_2(\mathbb{C})$ and suppose that $A^2 = B^2 = I$ and $ABA = B$, but $A \ne \pm I$ and $B \ne \pm I$. Show that if $C \in \mathcal{M}_2(\mathbb{C})$ commutes with both $A$ and $B$ then $C = \lambda I$ for some scalar $\lambda \in \mathbb{C}$.
12. Suppose that $J$ and $K$ are matrices in Jordan canonical form. Prove that if $J$ and $K$ are similar then they are the same except for the order of the Jordan blocks. Hence, Jordan form is a canonical form for similarity (up to order of the blocks).
13. Fix $\epsilon > 0$. Show that any complex matrix is similar to a matrix that looks just like a Jordan matrix except that the entries that are equal to $1$ are replaced by entries with value $\epsilon$. Thus, any complex matrix is similar to a matrix that is "almost" diagonal. Hint:

consider the fact that

vyvyvyvy      c ~   wzwzwzwz c  14. Show that the Jordan canonical form is not very robust in the sense that a small change in the entries of a matrix ( may result in a large jump in the entries of the Jordan form 1 . Hint : consider the matrix   (~  >?

What happens to the Jordan form of (¦ as  ? 15. Give an example of a complex nonreal matrix all of whose eigenvalues are real. Show that any such matrix is similar to a real matrix. What about the type of the invertible matrices that are used to bring the matrix to Jordan form? 16. Let 1~´µB8 be the Jordan form of a linear operator  ²=³ . For a given Jordan block of 1² Á³ let < be the subspace of = spanned by the basis vectors of 8 associated with that block. a) Show that O< has a single eigenvalue with geometric multiplicity . In other words, there is essentially only one eigenvector (up to scalar multiple) associated with each Jordan block. Hence, the geometric multiplicity of  for is the number of Jordan blocks for  . Show that the algebraic multiplicity is the sum of the dimensions of the Jordan blocks associated with  . b) Show that the number of Jordan blocks in 1 is the maximum number of linearly independent eigenvectors of  . c) What can you say about the Jordan blocks if the algebraic multiplicity of every eigenvalue is equal to its geometric multiplicity? 17. Assume that the base field - is algebraically closed. Then assuming that the eigenvalues of (1 are known, it is possible to determine the Jordan form of a matrix () by looking at the rank of various matrix powers. A matrix is nilpotent if )~ for some € . The smallest such exponent is called the index of nilpotence . a) Let 1~1²Á³ be a single Jordan block of size d . Show that 1c 0 is nilpotent of index  . Thus,  is the smallest integer for which rk²1 c 0³ ~ . Now let 1 be a matrix in Jordan form but possessing only one eigenvalue . b) Show that 1c 0 is nilpotent. Let  be its index of nilpotence. Show that 1 is the maximum size of the Jordan blocks of and that rk²1 c 0³c is the number of Jordan blocks in 1 of maximum size. c) Show that rk²1 c 0³c is equal to  times the number of Jordan blocks of maximum size plus the number of Jordan blocks of size one less than the maximum. 176 Advanced Linear Algebra

d) Show that the sequence rk²1c 0³ for  ~ÁÃÁ uniquely determines the number and size of all of the Jordan blocks in 1 , that is, it uniquely determines 1 up to the order of the blocks. e) Now let 11 be an arbitrary Jordan matrix. If  is an eigenvalue for show that the sequence rk²1c 0³ for  ~ÁÃÁ where  is the first integer for which rk²1 c 0³b ~ rk ²1 c 0³ uniquely determines 1 up to the order of the blocks. f) Prove that for any matrix (¸ÁÃÁ¹ with spectrum  the sequence  rk²(c 0³ for ~ÁÃÁ and  ~ÁÃÁ where  is the first integer for which rk²( c 0³b ~ rk ²( c 0³ uniquely determines the Jordan matrix 1( for up to the order of the blocks. 18. Let ( C ²- ³ . a) If all the roots of the characteristic polynomial of (- lie in prove that (() is similar to its transpose ! . Hint: Let be the matrix

vyÄ   x{ÅÇ   )~x{ ÇÇÅ wzÄ

that has  's on the diagonal that moves up from left to right and 's elsewhere. Let 1) be a Jordan block of the same size as . Show that )1)c ~ 1 !. b) Let (Á ) C ²-³ . Let 2 be a field containing - . Show that if ( and c )2)~7(77²2³ are similar over , that is, if where C then () and are also similar over - , that is, there exists 8²-³C for which )~8(8c . Hint : consider the equation ?(c)?~ as a homogeneous system of linear equations with coefficients in - . Does it have a solution? Where? c) Show that any matrix is similar to its transpose. 19. Prove Theorem 8.8 using the complexification of = . The Trace of a Matrix 20. Let ( be an  d  matrix over a field - . The trace of ( , denoted by tr ²(³ , is the sum of the elements on the main diagonal of ( . Verify the following statements: a) tr² A ³ ~ tr ²(³ , for  - b) tr²( b )³ ~ tr ²(³ b tr ²)³ c) tr²()³ ~ tr ²)(³ d) Prove that tr²()*³ ~ tr ²*()³ ~ tr ²)*(³ . Find an example to show that tr²()*³ may not equal tr ²(*)³ . e) The trace is an invariant under similarity f) If -( is algebraically closed then the trace of is the sum of the eigenvalues of ( . Formulate a definition of the trace of a linear operator, show that it is well- defined and relate this concept to the eigenvalues of the operator. Eigenvalues and Eigenvectors 177

21. Use the concept of the trace of a matrix, as defined in the previous exercise, to prove that there are no matrices () , Cd ² ³ for which () c )( ~ 0

22. Let ; ¢C ²- ³ ¦ - be a function with the following properties. For all matrices (Á ) C ²-³ and  - , 1) ;² A ³~ ;²(³ 2) ;²(b)³~;²(³b;²)³ 3) ;²()³~;²)(³ Show that there exists  - for which ; ²(³ ~ tr ²(³ , for all ( C ²- ³. Simultaneous Diagonalizability 23. A pair of linear operators Á  B ²= ³ is simultaneously diagonalizable if there is an ordered basis 8 for =´µ´µ for which 88 and are both diagonal, that is, 8 is an ordered basis of eigenvectors for both and . Prove that two diagonalizable operators  and are simultaneously diagonalizable if and only if they commute, that is, ~~  . Hint : If   then the eigenspaces of  are invariant under . Common Eigenvectors It is often of interest to know whether a family

<~¸ B²= ³ “ ? ¹ of linear operators on = has a common eigenvector , that is, a single vector #= that is an eigenvector for every operator in < (the corresponding eigenvalues may be different for each operator, however).

A commuting family < of operators is a family in which each pair of operators commutes, that is, Á < implies  ~  . We say that a subspace <= of is < -invariant if it is < -invariant for every  .

24. Let Á  B ²= ³ . Prove that if  and  commute then every eigenspace of  is < -invariant. Thus, if is a commuting family then every eigenspace of any member of << is -invariant. 25. Let ?c  >? c 

have to do with the issue of common eigenvectors? 178 Advanced Linear Algebra

Geršgorin Disks It is generally impossible to determine precisely the eigenvalues of a given complex operator or matrix (C²³d , f or if ‚ then the characteristic equation has degree and cannot in general be solved. As a result, the approximation of eigenvalues is big business. Here we consider one aspect of this approximation problem, which also has some interesting theoretical consequences.

! Let (C²³d and suppose that (#~# where #~²ÁÃÁ ³. Comparing th rows gives  (~    ~ which can also be written in the form

 ² c( ³~ (  ~ £

If ‚ has the property that (( (( for all , we have  (((c((( ( ( ((( (( ( ( ~ ~ £ £ and thus

 (((( c(  (  (8.7) ~ £

The right-hand side is the sum of the absolute values of all entries in the  th row of (( except the diagonal entry  . This sum 9²(³ is the th deleted absolute row sum of ( . The inequality (8.7) says that, in the complex , the eigenvalue  lies in the disk centered at the diagonal entry ( with radius equal to 9²(³ . This disk

GR²(³ ~ ¸'  d “(( ' c (  9  ²(³¹ is called the Geršgorin row disk for the ( th row of . The union of all of the Geršgorin row disks is called the Geršgorin row region for ( .

Since there is no way to know in general which is the index  for which ((‚ ((, the best we can say in general is that the eigenvalues of ( lie in the union of all Geršgorin row disks, that is, in the Geršgorin row region of ( . Eigenvalues and Eigenvectors 179

Similar definitions can be made for columns and since a matrix has the same eigenvalues as its transpose, we can say that the eigenvalues of ( lie in the Geršgorin column region of ( . The Geršgorin region .²(³ of a matrix (4 ²-³ is the intersection of the Geršgorin row region and the Geršgorin column region and we can say that all eigenvalues of ( lie in the Geršgorin region of ( . In symbols,  ²(³ ‹ .²(³ .

27. Find and sketch the Geršgorin region and the eigenvalues for the matrix

vy (~  wz

28. A matrix (4 ²d ³ is diagonally dominant if for each  ~ÁÃÁ

(((‚9²(³  and it is strictly diagonally dominant if strict inequality holds. Prove that if ( is strictly diagonally dominant then it is invertible. 29. Find a matrix (4 ²d ³ that is diagonally dominant but not invertible. 30. Find a matrix (4 ²d ³ that is invertible but not strictly diagonally dominant. Chapter 9 Real and Complex Inner Product Spaces

We now turn to a discussion of real and complex vector spaces that have an additional function defined on them, called an inner product , as described in the upcoming definition. Thus, in this chapter, - will denote either the real or complex field. If is a complex number then the complex conjugate of is denoted by .

Definition Let =-~-~ be a vector space over sd or . An inner product on = is a function ºÁ»¢= d= ¦- with the following properties: 1) ()Positive definiteness For all #  = , the inner product º#Á #» is real and º#Á #» ‚  and º#Á #» ~  ¯ # ~ 

2) For -~d:( Conjugate symmetry ) º"Á #» ~ º#Á "»

For -~s:(Symmetry ) º"Á #» ~ º#Á "»

3) ()Linearity in the first coordinate For all "Á #  = and Á  - º " b #Á $» ~ º"Á $» b º#Á $» A real (or complex) vector space = , together with an inner product, is called a real (or complex ) inner product space .

We will study bilinear forms ²³ also called inner products on vector spaces over fields other than sd or in Chapter 11. Note that property 1) implies that the quantity º#Á #» is always real, even if = is a complex vector space.

Combining properties 2) and 3), we get, in the complex case

º$Á " b #» ~ º " b #Á $» ~ º"Á $» b º#Á $» ~ º$Á "» b º$Á #» 182 Advanced Linear Algebra

This is referred to as conjugate linearity in the second coordinate. Thus, a complex inner product is linear in its first coordinate and conjugate linear in its second coordinate. This is often described by saying that the inner product is sesquilinear. (Sesqui means “one and a half times.”³-~ In the real case (s ), the inner product is linear in both coordinates—a property referred to as bilinearity.

Example 9.1 ) The vector space s is an inner product space under the standard inner product, or , defined by

º²  Á à Á ³Á ² Á à Á ³» ~ b Ä b The inner product space s is often called -dimensional . 2) The vector space d is an inner product space under the standard inner product defined by

º²  Á à Á ³Á ² Á à Á ³» ~ b Ä b This inner product space is often called -dimensional unitary space . 3) The vector space *´Áµ of all continuous complex-valued functions on the closed interval ´Á µ is a complex inner product space under the inner product

b ºÁ » ~ ²%³²%³ dx a Example 9.2 One of the most important inner product spaces is the vector space   M² of all real (or complex) sequences ³ with the property that (( B , under the inner product

B º²  ³Á ²! ³» ~  ! ~ Of course, for this inner product to make sense, the sum on the right must  converge. To see this, note that if ²  ³Á ²! ³  M then

  ²((  c (( ! ³ ~ ((  c ((((  ! b (( ! and so

 ((((((  !   b !  Real and Complex Inner Product Spaces 183 which gives

BBBB  ee !(( ! (( b!B ((  ~ ~ ~ ~ We leave it to the reader to verify that M is an inner product space.

The following simple result is quite useful and easy to prove.

Lemma 9.1 If = is an inner product space and º"Á %» ~ º#Á %» for all %  = then "~#.

Note that a vector subspace := of an inner product space is also an inner product space under the restriction of the inner product of =: to . Norm and Distance If =#= is an inner product space, the norm , or length of is defined by ))# ~j º#Á #» (9.1)

A vector ##~ is a unit vector if )) . Here are the basic properties of the norm.

Theorem 9.2 1) ))#‚ and )) #~ if and only if #~ . 2) )) # ~ (()) # for all -Á#= 3) ()The Cauchy-Schwarz inequality For all "Á #  = , (())))º"Á #»  " # with equality if and only if one of "# and is a scalar multiple of the other. 4) ()The triangle inequality For all "Á #  = ))))))"b#  " b # with equality if and only if one of "# and is a scalar multiple of the other. 5) For all "Á #Á %  = ))))))"c#  "c% b %c# 6) For all "Á #  = (())))"c# )) "c#

7) ()The parallelogram law For all "Á #  = ))))))))"b# b "c# ~ " b # Proof. We prove only Cauchy-Schwarz and the triangle inequality. For Cauchy- Schwarz, if either "# or is zero the result follows, so assume that "Á#£ . Then, for any scalar - , 184 Advanced Linear Algebra

)) "c #  ~º"c #Á"c #» ~ º"Á "» c º"Á #» c ´º#Á "» c º#Á #»µ Choosing ~ º#Á "»°º#Á #» makes the value in the square brackets equal to  and so

º#Á "»º"Á #»(( º"Á #»    º"Á "» c ~)) " c º#Á #» ))#  which is equivalent to the Cauchy-Schwarz inequality. Furthermore, equality holds if and only if ))"c # ~ , that is, if and only if "c #~ , which is equivalent to "# and being scalar multiples of one another.

To prove the triangle inequality, the Cauchy-Schwarz inequality gives

))"b# ~º"b#Á"b#» ~ º"Á "» b º"Á #» b º#Á "» b º#Á #» ")) b"#b# )))) )) ~²)) " b )) # ³ from which the triangle inequality follows. The proof of the statement concerning equality is left to the reader.

Any vector space =h¢=¦ , together with a function )) s that satisfies properties 1), 2) and 4) of Theorem 9.2, is called a normed linear space . (And the function ))h³ is called a norm . Thus, any inner product space is a normed linear space, under the norm given by (9.1).

It is interesting to observe that the inner product on = can be recovered from the norm.

Theorem 9.3 ()The polarization identities 1) If = is a real inner product space, then  º"Á#»~ ²)))) "b# c "c# ³  2) If = is a complex inner product space, then  º"Á#»~ ²)))) "b# c "c# ³b ²"b# ) )))  c "c#  ³  The formulas in Theorem 9.3 are known as the polarization identities .

The norm can be used to define the distance between any two vectors in an inner product space. Real and Complex Inner Product Spaces 185

Definition Let = be an inner product space. The distance ²"Á #³ between any two vectors "#= and in is ²"Á #³ ~)) " c # (9.2) Here are the basic properties of distance.

Theorem 9.4 1) ²"Á #³ ‚  and ²"Á #³ ~  if and only if " ~ # 2) ()Symmetry ²"Á #³ ~ ²#Á "³ 3) ()The triangle inequality ²"Á #³  ²"Á $³ b ²$Á #³ Any nonempty set =¢=d=¦ , together with a function s that satisfies the properties of Theorem 9.4, is called a metric space and the function  is called a metric on = . Thus, any inner product space is a metric space under the metric (9.2).

Before continuing, we should make a few remarks about our goals in this and the next chapter. The presence of an inner product (and hence a metric) raises a host of topological issues related to the notion of convergence. We say that a sequence ²# ³ of vectors in an inner product space converges to #  = if

lim ²# Á #³ ~  ¦B that is, if

lim ))#c#~ ¦B Some of the more important concepts related to convergence are closedness and closures, completeness and the continuity of linear operators and linear functionals.

In the finite-dimensional case, the situation is very straightforward: all subspaces are closed, all inner product spaces are complete and all linear operators and functionals are continuous. However, in the infinite-dimensional case, things are not as simple.

Our goals in this chapter and the next are to describe some of the basic properties of inner product spaces—both finite and infinite-dimensional—and then discuss certain special types of operators (normal, unitary and self-adjoint) in the finite-dimensional case only. To achieve the latter goal as rapidly as possible, we will postpone a discussion of topological properties until Chapter 186 Advanced Linear Algebra

13. This means that we must state some results only for the finite-dimensional case in this chapter, deferring the infinite-dimensional case to Chapter 13. Isometries An isomorphism of vector spaces preserves the vector space operations. The corresponding concept for inner product spaces is the following.

Definition Let => and be inner product spaces and let B ²=Á>³ . 1)  is an isometry if it preserves the inner product, that is, if º ²"³Á ²#³» ~ º"Á #» for all "Á #  = . 2) A bijective isometry is called an isometric isomorphism . When ¢= ¦> is a bijective isometry, we say that => and are isometrically isomorphic.

It is not hard to show that an isometry is injective and so it is an isometric isomorphism provided it is also surjective. Moreover, if dim²= ³ ~ dim ²> ³  B injectivity implies surjectivity and so the concepts of isometry and isometric isomorphism are equivalent. On the other hand, the following example shows that this is not the case for infinite-dimensional inner product spaces.

Example 9.3 Let ¢M ¦M be defined by

²% Á% Á% ó~²Á%  Á% Áó (This is the right shift operator .³ Then  is an isometry, but it is clearly not surjective.

Theorem 9.5 A linear transformation B²=Á>³ is an isometry if and only if it preserves the norm, that is, if and only if ))))²#³ ~ # for all #= . Proof. Clearly, an isometry preserves the norm. The converse follows from the polarization identities. In the real case, we have Real and Complex Inner Product Spaces 187

 º ²"³Á ²#³» ~ ²))))  ²"³ b  ²#³ c  ²"³ c  ²#³ ³   ~ ²)))) ²" b #³ c ²" c #³ ³   ~ ²)))) "b# c "c# ³  ~º"Á#» and so  is an isometry. The complex case is similar.

The next result points out one of the main differences between real and complex inner product spaces.

Theorem 9.6 Let = be an inner product space and let B  ²= ³ . 1) If º²#³Á$»~ for all #Á$= then ~ . 2) If = is a complex inner product space and º ²#³Á #» ~  for all #  = then  ~. 3) Part 2³ does not hold in general for real inner product spaces. Proof. Part 1) follows directly from Lemma 9.1. As for part 2), let #~ %b& , for %Á &  = and  - . Then  ~ º ² % b &³Á % b &» ~ (( º²%³Á%»bº²&³Á&»b º²%³Á&»b º²&³Á%»   ~ º²%³Á&»b º²&³Á%» Setting ~ gives º²%³Á&»bº²&³Á%»~ and setting ~ gives º²%³Á&»cº²&³Á%»~ These two equations imply that º ²%³Á&»~ for all %Á&= and so ~ by part 1). As for part 3), rotation by s° in the real plane  has the property that º²#³Á#»~ for all # , yet is not zero. Orthogonality The presence of an inner product allows us to define the concept of orthogonality.

Definition Let = be an inner product space. 1) Two vectors "Á #  = are orthogonal , written " ž # , if º"Á #» ~  . 2) Two subsets ?Á@ ‹ = are orthogonal , written ? ž @ , if % ž & for all %? and &@ . 188 Advanced Linear Algebra

3) The of a subset ?‹= is the set ?~¸#=“¸#¹ž?¹ž The following result is easily proved.

Theorem 9.7 Let = be an inner product space. 1) The orthogonal complement ??‹==ž of any subset is a subspace of . 2) For any subspace : of = , : q :ž ~ ¸¹ . Orthogonal and Orthonormal Sets

Definition A nonempty set E ~¸"“2¹ of vectors in an inner product space is said to be an orthogonal set if "ž" for all £2 . If, in addition, each vector " is a unit vector, the set E is an orthonormal set . Thus, a set is orthonormal if

º" Á " » ~  Á for all Á   2 , where Á is the Kronecker delta function.

Of course, given any nonzero vector #= , we may obtain a unit vector " by multiplying # by the reciprocal of its norm  "~ # ))# Thus, it is a simple matter to construct an orthonormal set from an orthogonal set of nonzero vectors.

Note that if "ž# then ))))))"b# ~ " b # and the converse holds if -~s .

Theorem 9.8 Any orthogonal set of nonzero vectors in = is linearly independent. Proof. Let E ~¸"“2¹ be an orthogonal set of nonzero vectors and suppose that

 " bÄb  " ~ Then, for any ~ÁÃÁ ,

~º " bÄb "Á"»~ º"Á"»      and so ~ , for all  . Hence, E is linearly independent. Real and Complex Inner Product Spaces 189

Definition A maximal orthonormal set in an inner product space = is called a Hilbert basis for = .

Zorn's lemma can be used to show that any nontrivial inner product space has a Hilbert basis. We leave the details to the reader.

Extreme care must be taken here not to confuse the concepts of a basis for a vector space and a Hilbert basis for an inner product space. To avoid confusion, a vector space basis, that is, a maximal linearly independent set of vectors, is referred to as a Hamel basis . An orthonormal Hamel basis will be called an orthonormal basis, to distinguish it from a Hilbert basis.

The following example shows that, in general, the two concepts of basis are not the same.

Example 9.4 Let =~M and let 4 be the set of all vectors of the form

 ~²ÁÃÁÁÁÁó where  has a in the th coordinate and  's elsewhere. Clearly, 4 is an  orthonormal set. Moreover, it is maximal. For if #~²%³M has the property that #ž4 then

% ~ º#Á  » ~  for all #~ and so . Hence, no nonzero vector #¤4 is orthogonal to 4 . This shows that 4M is a Hilbert basis for the inner product space  .

On the other hand, the vector space span of 4: is the subspace of all sequences in M that have finite support, that is, have only a finite number of nonzero terms and since span²4³ ~ : £ M , we see that 4 is not a Hamel basis for the vector space M .

We will show in Chapter 13 that all Hilbert bases for an inner product space have the same cardinality and so we can define the Hilbert dimension of an inner product space to be that cardinality. Once again, to avoid confusion, the cardinality of any Hamel basis for = is referred to as the Hamel dimension of = . The Hamel dimension is, in general, not equal to the Hilbert dimension. However, as we will now show, they are equal when either dimension is finite.

Theorem 9.9 Let = be an inner product space. 1) ()Gram–Schmidt orthogonalization If 8 ~²#Á#Áó is a linearly independent sequence in = , then there is an orthogonal sequence E ~²"Á"Áó in = for which

span²" ÁÃÁ" ³~ span ²#  ÁÃÁ# ³ for all € . 190 Advanced Linear Algebra

2) If dim²= ³ ~  is finite then = has a Hilbert basis of size  and all Hilbert bases for = have size . 3) If = has a finite Hilbert basis of size  , then dim ²= ³ ~  . Proof. To prove part 1), first let "~# . Once the orthogonal set ¸"ÁÃÁ"¹   of nonzero vectors has been chosen so that

span²" ÁÃÁ" ³~ span ²#  ÁÃÁ# ³ the next vector "b is chosen by setting

" ~#  b  " bÄb cc " and requiring that ""b be orthogonal to each  for , that is,

 ~ º" Á " » ~ º#  b  " b Ä b cc " Á " » ~ º#  Á " » b  º" Á " » or, finally,

º# Á " » ~c º" Á " » for all ~ÁÃÁ .

For part 2), applying the Gram–Schmidt orthogonalization process to a Hamel basis gives a Hilbert basis of the same size = . Moreover, if has a Hilbert basis of size greater than  , it must also have a Hamel basis of size greater than , which is not possible. Finally, if = has a Hilbert basis 8 of size less than then 89 can be extended to a proper superset that is also linearly independent. The Gram–Schmidt process applied to 98 gives a proper superset of that is orthonormal, which is not possible. Hence, all Hilbert bases have size  .

For part 3), suppose that dim²= ³ €  . Since a Hilbert basis > of size  is linearly independent, we can adjoin a new vector to > to get a linearly independent set of size b . Applying the Gram–Schmidt process to this set gives an orthonormal set that properly contains > , which is not possible.

For reference, let us state the Gram–Schmidt orthogonalization process separately and give an example of its use.

Theorem 9.10 () The Gram–Schmidt orthogonalization process If 8 ~²#Á#Áó is a sequence of linearly independent vectors in an inner product space = , then the sequence E ~²" Á" Áó defined by

c º# Á " » "~#c "  ~ º" Á " » Real and Complex Inner Product Spaces 191 is an orthogonal sequence in = with the property that

span²" ÁÃÁ" ³~ span ²#  ÁÃÁ# ³ for all € .

Of course, from the orthogonal sequence ²" ³ , we get the orthonormal sequence ²$ ³, where $ ~ " °)) " .

Example 9.5 Consider the inner product space s´%µ of real polynomials, with inner product defined by

 º²%³Á ²%³» ~ ²%³²%³% c Applying the Gram–Schmidt process to the sequence 8 ~²Á%Á%Á%Áó gives

"²%³~   c%% "²%³~%c h~%   c%  c%% c %%  "²%³~%3 c hc h%~% c   c% c % %    cc%% c %% %²%c³%3   "²%³~%4 c hc h%c h%c67     c% c % % c²% c3 ³ %  ~% c % and so on. The polynomials in this sequence are (at least up to multiplicative constants) the Legendre polynomials .

Orthonormal bases have a great advantage over arbitrary bases. From a computational point of view, if 8 ~¸#ÁÃÁ# ¹ is a basis for = then each #= has the form

#~  # bÄb  #

In general, however, determining the coordinates  requires solving a system of linear equations of size d .

On the other hand, if E ~¸"ÁÃÁ" ¹ is an orthonormal basis for = and

#~  " bÄb  " then the coefficients are quite easily computed:

º#Á"»~º " bÄb " Á"»~ º"Á"»~ 192 Advanced Linear Algebra

Even if E ~¸"ÁÃÁ" ¹ is not a basis (but just an orthonormal set), we can still consider the expansion

V# ~ º#Á " »" b Ä b º#Á "  »" Proof of the following characterization of orthonormal (Hamel) bases is left to the reader.

Theorem 9.11 Let E ~¸"ÁÃÁ"¹ be an orthonormal set of vectors in a finite-dimensional inner product space =#= . For any , the Fourier expansion of # with respect to E is

V#~º#Á" »" bÄbº#Á"  »" In this case, Bessel's inequality holds for all #= , that is ))V## )) Moreover, the following are equivalent: 1) The set E is an orthonormal basis for = . 2) Every vector is equal to its Fourier expansion, that is, for all #= V#~# 3) Bessel's identity holds for all #= , that is ))V#~# )) 4) Parseval's identity holds for all #Á $  = , that is

º#Á $» ~ º#Á " »º$Á " » b Ä b º#Á "  »º$Á " »

The Projection Theorem and Best Approximations We have seen that if := is a subspace of an inner product space then : q :ž ~ ¸¹. This raises the question of whether or not the orthogonal complement ::ž is a vector space complement of , that is, whether or not =~:l:ž.

If := is a finite-dimensional subspace of , the answer is yes, but for infinite- dimensional subspaces, : must have the topological property of being complete . Hence, in accordance with our goals in this chapter, we will postpone a discussion of the general case to Chapter 13, contenting ourselves here with an example to show that, in general, =£:l:ž .

Example 9.6 As in Example 9.4, let =~M and let : be the subspace of all sequences of finite support, that is, : is spanned by the vectors

 ~²ÁÃÁÁÁÁó

ž where  has a  in the  th coordinate and  s elsewhere. If % ~ ²% ³  : then Real and Complex Inner Product Spaces 193

ž % ~ º%Á  » ~  for all  and so % ~  . Therefore, : ~ ¸¹ . However, :l:ž ~:£M As the next theorem shows, in the finite-dimensional case, orthogonal complements are also vector space complements. This theorem is often called the projection theorem , for reasons that will become apparent when we discuss projection operators. (We will discuss the projection theorem in the infinite- dimensional case in Chapter 13.³

Theorem 9.12 ( The projection theorem ) If : is a finite-dimensional subspace of an inner product space = (which need not be finite-dimensional) then =~:l:ž

Proof. Let E ~¸"ÁÃÁ"¹ be an orthonormal basis for : . For each #= , consider the Fourier expansion

V#~º#Á" »" bÄbº#Á"  »" with respect to E . We may write #~#b²#c#³VV where VV#: . Moreover, #c#:ž , since

º# cVV #Á " » ~ º#Á " » c º#Á " » ~  Hence = ~ : b :žž . We have already observed that : q : ~ ¸¹ and so =~:l:ž.

According to the proof of the projection theorem, the component of # that lies in :# is just the Fourier expansion of with respect to any orthonormal basis E for :. Best Approximations The projection theorem implies that if #~#bVV žžž where #: and : then VV#:## is the element of that is closest to , that is, is the best approximation to #:!:#c#: from within . For if then since V ž we have ²# cVV #³ ž ²# c !³ and so )))#c! ~ #c#b#c!VV ))))) ~ #c# V b V#c! It follows that ))#c! is smallest when !~#VV . Also, note that # is the unique vector in :#c#ž: for which V . Thus, we can say that the best approximation to #: from within is the unique vector :²#c for which ³ž: and that this vector is the Fourier expansion V## of . 194 Advanced Linear Algebra

Orthogonal Direct Sums

Definition Let = be an inner product space and let : ÁÃÁ: be subspaces of ==. Then is the orthogonal direct sum of :ÁÃÁ: , written

:~: pÄp: if 1) =~:lÄl: 2) :ž: for £ In general, to say that the orthogonal direct sum :pÄp: of subspaces exists is to say that the direct sum :lÄl: exists and that 2) holds.

Theorem 9.12 states that =~:p:ž , for any finite-dimensional subspace : of a vector space = . The following simple result is very useful.

Theorem 9.13 Let = be an inner product space. The following are equivalent. 1) =~:p; 2) =~:l; and ;~:ž 3) =~:l; and ;‹:ž Proof. Suppose 1) holds. Then =~:l; and :ž; , which implies that ; ‹:žž. But if $: then $~ b! for :Á!; and so  ~ º Á $» ~ º Á » b º Á !» ~ º Á » Hence ~ and $; , which implies that :žž ‹; . Hence, : ~; , which gives 2). Of course, 2) implies 3). Finally, if 3) holds then ;‹:ž , which implies that :ž; and so 1) holds.

Theorem 9.14 Let = be an inner product space. 1) If dim²= ³  B and : is a subspace of = then dim²:ž ³ ~ dim ²= ³ c dim ²:³ 2) If := is a finite-dimensional subspace of then :~:žž 3) If ?=²²?³³B is a subset of and dim span then ?žž ~span ²?³

Proof. Since =~:l:žž , we have dim ²=³~ dim ²:³b dim ²:³ , which proves part 1). As for part 2), it is clear that :‹:žž . On the other hand, if #:žž then by the projection theorem #~ b Z Real and Complex Inner Product Spaces 195 where : and Zž : . But #: žž implies that ~º#Á Z»~º ZZÁ » and so Zžžžž~, showing that #: . Therefore, : ‹: and : ~: . We leave the proof of part 3) as an exercise. The Riesz Representation Theorem

If %=¢=¦- is a vector in an inner product space then the function % defined by

%²#³ ~ º#Á %» is easily seen to be a linear functional on = . The following theorem shows that all linear functionals on a finite-dimensional inner product space = have this form. (We will see in Chapter 13 that, in the infinite-dimensional case, all continuous linear functionals on =³ have this form.

Theorem 9.15 ( The Riesz representation theorem ) Let = be a finite- dimensional inner product space and let =i be a linear functional on = . Then there exists a unique vector %= for which ²#³ ~ º#Á %» (9.3) for all #= . Let us call % the Riesz vector for  and denote it by 9 . (This term and notation are not standard.) Proof. If %~ is the zero functional, we may take , so let us assume that £. Then 2~ker ²³ has codimension  and so = ~ º$» p 2 for $2ž . If %~ $ for some - , then (9.3) holds if and only if ²#³ ~ º#Á $» and since any #= has the form #~ $b for - and 2 , this is equivalent to ² $³ ~ º $Á $» or

²$³ ~ º$Á $» ~)) $ 

Hence, we may take  ~ ²$³°)) $  and

²$³ %~ $ ))$  Proof of uniqueness is left as an exercise.

 If = ~s , then it is easy to see that 9 ~ ²² ³Á à Á ²  ³³ where  ² ÁÃÁ ³ is the standard basis for s . 196 Advanced Linear Algebra

Using the Riesz representation theorem, we can define a map ¢=i ¦= by setting ²³ ~ 9 , where 9 is the Riesz vector for  . Since º#Á ²  b ³» ~ ²  b ³²#³ ~ ²#³ b ²#³ ~ º#Á  ²³» b º#Á ²³» ~ º#Á  ²³ b ²³» for all #= , we have ²  b ³ ~ ²³ b ²³ and so  is conjugate linear . Since is bijective, the map ¢=i ¦= is a “conjugate isomorphism.” Exercises 1. Verify the statement concerning equality in the triangle inequality. 2. Prove the parallelogram law. 3. Prove the Appolonius identity

   ))))))$c" b $c# ~ "c# bhh $c ²"b#³  4. Let = be an inner product space with basis 8 . Show that the inner product is uniquely defined by the values º"Á#» , for all "Á#8 . 5. Prove that two vectors "# and in a real inner product space = are orthogonal if and only if

))))))"b# ~ " b # 6. Show that an isometry is injective. 7. Use Zorn's lemma to show that any nontrivial inner product space has a Hilbert basis. 8. Prove Bessel's inequality. 9. Prove that an orthonormal set E is a basis for =#~# if and only if V , for all #=. 10. Prove that an orthonormal set E is a basis for = if and only if Bessel's identity holds for all #= , that is, if and only if ))V#~# )) for all #= . 11. Prove that an orthonormal set E is a basis for = if and only if Parseval's identity holds for all #Á $  = , that is, if and only if

º#Á $» ~ º#Á " »º$Á " » b Ä b º#Á "  »º$Á " » for all #Á $  = . Real and Complex Inner Product Spaces 197

 12. Let "~²  ÁÃÁ ³ and #~²  ÁÃÁ ³ be in s . The Cauchy-Schwarz inequality states that

  ((  bÄb  ²  bÄb ³² bÄb ³ Prove that we can do better:

   ² (( bÄb (  ( ³ ²  bÄb ³² bÄb ³ 13. Let = be a finite-dimensional inner product space. Prove that for any subset ? of = , we have ?žž ~span ²?³ . 14. Let F3 be the inner product of all polynomials of degree at most 3, under the inner product B  º²%³Á ²%³» ~ ²%³²%³c% % cB Apply the Gram–Schmidt process to the basis ¸Á%Á%Á%¹ , thereby computing the first four (at least up to a multiplicative constant). 15. Verify uniqueness in the Riesz representation theorem. 16. Let =:= be a complex inner product space and let be a subspace of . Suppose that #  = is a vector for which º#Á » b º Á #»  º Á » for all :. Prove that #:ž . 17. If => and are inner product spaces, consider the function on =>^ defined by

º²# Á $ ³Á ²#  Á $ ³» ~ º#  Á # » b º$  Á $ » Is this an inner product on =>^ ? 18. A over sd or is a vector space (over sd or ) together with a function ))¢= ¦s for which for all "Á#= and scalars we have a) )) # ~ (()) # b) ))))))"b#  " b # c) ))#~ if and only if #~ If = is a real normed space (over s ) and if the norm satisfies the parallelogram law

))))))))"b# b "c# ~ " b # prove that the polarization identity  º"Á#»~ ²)))) "b# c "c# ³  defines an inner product on = . 19. Let := be a subspace of an inner product space . Prove that each coset in =°: contains exactly one vector that is orthogonal to : . 198 Advanced Linear Algebra

Extensions of Linear Functionals 20. Let : be a linear functional on a subspace of a finite-dimensional inner i product space = . Let ²#³ ~ º#Á 9 » . Suppose that   = is an extension of O~ , that is, : . What is the relationship between the Riesz vectors 9 and 9? 21. Let : be a linear functional on a subspace of a finite-dimensional inner product space =2~²³= and let ker . Show that if i is an extension žž žž of  then 9 2 ±I . Moreover, for each vector "2 ±: there is exactly one scalar  for which the linear functional ²?³ ~ º?Á "» is an extension of  . Positive Linear Functionals on s  A vector #~² ÁÃÁ ³ in s is nonnegative (also called positive ), written #‚, if  ‚ for all  . The vector # is strictly positive , written #€ , if # is  nonnegative but not  . The set ssb of all strictly positive vectors in is called the nonnegative orthant in sÀ# The vector is strongly positive , written  #ˆ, if  € for all  . The set ssbb , of all strongly positive vectors in is the strongly positive orthant in sÀ

Let ¢: ¦ss be a linear functional on a subspace : of  . Then  is nonnegative (also called positive ), written ‚ , if #€¬²#³‚ for all #: and  is strictly positive , written € , if #€¬²#³€ for all #:À

 22. Prove that a linear functional 9€ on s is positive if and only if  and  strictly positive if and only if 9ˆ . If : is a subspace of s is it true that a linear functional : on is nonnegative if and only if 9€ ? 23. Let ¢: ¦ss be a strictly positive linear functional on a subspace : of  . Prove that  has a strictly positive extension to s . Use the fact that if  < qsb ~ ¸¹, where  sb ~¸² ÁÃÁ ³“ ‚ all ¹ and << is a subspace of sž then contains a strongly positive vector. 24. If = is a real inner product space, then we can define an inner product on its complexification = d as follows (this is the same formula as for the ordinary inner product on a complex vector space) º" b #Á % b &» ~ º"Á %» b º#Á &» b ²º#Á %» c º"Á &»³ Show that Real and Complex Inner Product Spaces 199

))))))²" b #³ ~ " b # where the norm on the left is induced by the inner product on = d and the norm on the right is induced by the inner product on = . Chapter 10 Structure Theory for Normal Operators

Throughout this chapter, all vector spaces are assumed to be finite-dimensional unless otherwise noted. The Adjoint of a Linear Operator The purpose of this chapter is to study the structure of certain special types of linear operators on finite-dimensional inner product spaces. In order to define these operators, we introduce another type of adjoint (different from the operator adjoint of Chapter 3).

Theorem 10.1 Let => and be finite-dimensional inner product spaces over - and let B²=Á>³ . Then there is a unique function i ¢>¦= , defined by the condition º ²#³Á $» ~ º#Ái ²$³» for all #= and $> . This function is in B ²>Á=³ and is called the adjoint of . Proof. For a fixed $> , consider the function $ ¢= ¦- defined by

$²#³ ~ º ²#³Á $»

It is easy to verify that $ is a linear functional on = and so, by the Riesz representation theorem, there exists a unique vector %= for which

$²#³ ~ º#Á %» for all #= . Hence, if  i ²$³~% then º ²#³Á $» ~ º#Ái ²$³» for all #= . This establishes the existence and uniqueness of  i . To show that  i is linear, observe that 202 Advanced Linear Algebra

º#Ái ² $ b "³» ~ º ²#³Á $ b "» ~ º ²#³Á $» b º ²#³Á "» ~ º#Áii ²$³» b º#Á ²"³» ~ º#Á ii ²#³» b º#Á ²"³» ~ º#Á ii ²$³ b ²"³» for all #= and so iii² $ b "³ ~ ²$³ b ²"³ Hence Bi ²=Á>³.

Here are some of the basic properties of the adjoint. Proof is left to the reader.

Theorem 10.2 Let => and be finite-dimensional inner product spaces. For every Á²=Á>³ B and - 1) ²b³~iii  b  2) ²  ³ii ~ 3) ii ~ 4) If =~> then ² ³iii ~   5) If  is invertible then ²³~²³c i i c 6) If =~> and ²%³s ´%µ then ²³ii ~²³ .

Now let us relate the kernel and image of a linear transformation to those of its adjoint.

Theorem 10.3 Let B²=Á>³ where = and > are finite-dimensional inner product spaces. Then 1) ker²³~²³ižim 2) im²³~²³ižker 3)  is injective if and only if i is surjective. 4)  is surjective if and only if i is injective. 5) ker²³~²³i ker  6) ker²³~²³ii ker  7) im²³~²³ii im  8) im²³~²³i im  9) If  is projection onto im²³ along ker ²³ then i is projection onto ker²³žž along im ²³ . Proof. For part 1), "ker ²ii ³¯ ²"³~ ¯ º i ²"³Á #» ~  for all # ¯ º"Á ²#³» ~  for all # ¯"im² ³ž Part 2) follows from part 1) by replacing  by i and taking orthogonal Structure Theory for Normal Operators 203 complements. Parts 3) and 4) follow from parts 1) and 2). For part 5), it is clear that ²"³ ~  implies that i ²"³ ~  . For the converse, we have ii²"³ ~  ¬ º  ²"³Á "» ~  ¬ º ²"³Á ²"³» ~  ¬²"³~

Part 6) follows from part 5) by replacing  with i . We leave parts 7)–9) for the reader. The Operator Adjoint and the Hilbert Space Adjoint We should make some remarks about the relationship between the operator adjoint di of , as defined in Chapter 3 and the adjoint  that we have just defined, which is sometimes called the Hilbert space adjoint . In the first place, if ¢= ¦> then di and have different domains and ranges  di¢> ¦= i but  i¢> ¦= ii These maps are shown in Figure 10.1, where ¢= ¦= and ¢> ¦> are the conjugate linear maps defined by the conditions

²#³ ~ º#Á ²³» for all =i and #= and

²$³ ~ º$Á ²³» for all >i and $> , and whose existence is guaranteed by the Riesz representation theorem. Wx VW**

I I W* V W W Figure 10.1 The map ¢>ii ¦= defined by

c i ~² ³ 204 Advanced Linear Algebra is linear. Moreover, for all >i and #=

c i ´ ²³µ²#³ ~ ´² ³ ²³µ²#³ c i ~ ¸² ³ ´ ²³µ¹²#³ i ~ º#Á ²³» ~ º ²#³Á ²³» ~² ²#³³ ~ d ²³²#³ and so ~ ddi . Hence, the relationship between  and  is

dci ~² ³ The functions  are like “change of variables” functions from linear functionals to vectors, and we can say, loosely speaking, that  i does to Riesz vectors what  d does to the corresponding linear functional.

In Chapter 3, we showed that the matrix of the operator adjoint  d is the transpose of the matrix of the map  . For Hilbert space adjoints, the situation is slightly different (due to the conjugate linearity of the inner product). Suppose that 8 ~²ÁÃÁ ³ is an ordered orthonormal basis for = and 9 ~²ÁÃÁ ³ is an ordered orthonormal basis for > . Then

ii ²´ µ98Á ³ Á ~ º ²  ³Á   » ~ º  Á  ²  ³» ~ º ²  ³Á   » ~ ²´ µ 89 Á ³ Á

i and so ´ µ98ÁÁ and ´ µ 89 are conjugate transposes. If ( ~ ² Á ³ is a matrix over - , let us write the conjugate transpose as i! ( ~ ²Á ³ Theorem 10.4 Let B²=Á>³ , where = and > are finite-dimensional inner product spaces. 1) The operator adjoint di and the Hilbert space adjoint are related by

dci ~² ³

where  maps a linear functional 9 to its Riesz vector . 2) If 89 and are ordered orthonormal bases for => and , respectively, then ii ´µ98ÁÁ ~²´µ³ 89 In words, the matrix of the adjoint  i is the conjugate transpose of the matrix of  . Unitary Diagonalizability Recall that a linear operator B²=³ on a finite-dimensional vector space = is diagonalizable if and only if = has a basis consisting entirely of eigenvectors of , or equivalently, = can be written as a direct sum of the eigenspaces of Structure Theory for Normal Operators 205

=~;; lÄl

Of course, each eigenspace ;E has an orthonormal basis  , but the union of these bases, while certainly a basis for = , need not be orthonormal.

Definition Let = be a finite-dimensional inner product space and let B  ²= ³ . If there is an orthonormal basis E for =´µ for which E is a diagonal matrix, we say that  is unitarily diagonalizable when = is complex and orthogonally diagonalizable when = is real.

For simplicity in exposition, we will tend to use the term unitarily diagonalizable for both cases. It is clear that the following statements are equivalent:

1)  is unitarily diagonalizable 2) There is an orthonormal basis for = consisting entirely of eigenvectors of  3) = can be written as an orthogonal direct sum of eigenspaces

=~;; pÄp Since unitarily diagonalizable operators are so well behaved, it is natural to seek a characterization of such operators. Remarkably, there is a simple one.

Let us first suppose that E is unitarily diagonalizable and that is an ordered orthonormal basis of eigenvectors for  . Then the matrix ´µE is diagonal

´µE ~diag ² ÁÃÁ ³ and so

i ´µ~E diag ² ÁÃÁ³

ii Clearly, ´µEE and ´ µ commute. Hence,  and also commute, that is ii~   It is a surprising fact that the converse also holds on a complex inner product space, that is, if  and i commute then  is unitarily diagonalizable. (Something similar holds for real inner product spaces, as we will see.) Normal Operators It is clear from the preceding discussion that the following concept is key.

Definition A linear operator  on an inner product space = is normal if it commutes with its adjoint ii~   Normal operators have some very nice properties. 206 Advanced Linear Algebra

Theorem 10.5 Let D be the set of normal operators on a finite-dimensional inner product space = . Then D satisfies the following properties: 1) ()Closure under linear combinations Á  - Á Á  D ¬  b   D 2) ()Closure under multiplication, under a commutivity requirement Á D Á ii ~  ¬   D 3) ()Closure under inverses D¬, invertible c D 4) ()Closure under polynomials D ¬ ²  ³  D for any ²%³  -´%µ Moreover, if D then 5) ²#³ ~  ¯i ²#³ ~  6) ²#³~¯ ²#³~ 7) The minimal polynomial of  is a product of distinct irreducible monic polynomials. 8) ²#³ ~ # ¯i ²#³ ~ # 9) Let :; and be submodules of = , whose orders are relatively prime real polynomials. Then :ž; . 10) If  and are distinct eigenvalues of  then ;;ž . Proof. We leave parts 1)–4) for the reader. Normality implies that

))²#³ ~ º ²#³Á ²#³» ~ ºii ²#³Á ²#³» ~ ) i ²#³ ) and so part 5) follows. For part 6) let ~ i . Then  has the property that º ²#³Á $» ~ ºii ²#³Á $» ~ º ²#³Á ²$³» ~ º#Á ²$³» ~ º#Á ²$³» and so i ~ . (We will discuss this property of being self-adjoint in detail later.) Now we can easily prove part 6) for  . For if ²#³ ~  for  €  then  ~ º ²#³Á c ²#³» ~ º  c ²#³Á  c ²#³» and so c²#³~ . Continuing in this way gives ²#³~ . Now, if  ²#³~ for € , then the normality of  implies that i²#³~² ³ ²#³~ and so ²#³ ~  . Hence  ~ º ²#³Á #» ~ ºi ²#³Á #» ~ º ²#³Á ²#³» and so ²#³ ~  . Structure Theory for Normal Operators 207

For part 7), suppose that

   ²%³ ~  ²%³Ä ²%³ where  ²%³ÁÃÁ ²%³ are distinct, irreducible and monic. If   € then for any #=

 ²³  ² ³´ ² ³#µ ~ 

²³  where  ²%³ ~  ²%³° ²%³ . Hence, since  ² ³ is also normal, part 6) implies that

²³  ² ³´ ² ³#µ ~ 

²³ for all #= and so ²  ³ ² ³~ , which is false since the polynomial ²³  ²%³ ²%³ has degree less than that of  ²%³ . Hence,  ~  for all  .

For part 8), using part 5) we have ²#³ ~ # ¯ ² c ³²#³ ~  ¯² c ³²#³~i ¯i ²#³ ~ ²#³ For part 9), let ann²:³ ~ º²%³» and ann ²; ³ ~ º²%³» . Then there are real polynomials ²%³ and ²%³ for which ²%³²%³ b ²%³²%³ ~  . If "  : and #; then since ²³"~ implies that ²i ³"~ , we have º"Á #» ~ º´²ii ³² ³ b ²  ii ³² ³µ"Á #» ~º²ii ³² ³"Á#» ~º"Á²³²³#» ~

Hence, :ž; . For part 10), we have for #;; and $ º#Á $» ~ º ²#³Á $» ~ º#Á i ²$³» ~ º#Á  $» ~ º#Á $» and since £º#Á$»~ we get . Special Types of Normal Operators Before discussing the structure of normal operators, we want to introduce some special types of normal operators that will play an important role in the theory.

Definition Let = be an inner product space and let B  ²= ³ . 1)  is self-adjoint (also called Hermitian in the complex case and symmetric in the real case), if i ~ 208 Advanced Linear Algebra

2)  is called skew-Hermitian in the complex case and skew-symmetric in the real case, if i ~c 3)  is called unitary in the complex case and orthogonal in the real case if is invertible and ic~ There are also matrix versions of these definitions, obtained simply by replacing the operator  by a matrix ( . In the finite-dimensional case, we have seen that ii ´µ~´µE E for any ordered orthonormal basis E of = and so if is normal then iiiii i ´µ´µEEEEEEEEEE ~´µ´  µ ~´  µ ~´  µ ~´  µ´µ  ~´µ´µ  which implies that the matrix ´µE of is normal. The converse holds as well. In fact, we can say that  is normal (respectively: Hermitian, symmetric, skew- Hermitian, unitary, orthogonal) if and only if any matrix that represents  , with respect to an ordered orthonormal basis E , is normal (respectively: Hermitian, symmetric, skew-Hermitian, unitary, orthogonal).

In some sense, square complex matrices are a generalization of complex numbers. Also, the adjoint (conjugate transpose) of a matrix seems to be a generalization of the complex conjugate. In looking for a tighter analogy—one that will lead to some useful mnemonics, we could consider just the diagonal matrices, but this is a bit too tight. The next logical choice is the normal operators.

Among the complex numbers, there are some special subsets: the real numbers, the positive numbers and the numbers on the unit circle. We will soon see that a normal operator is self-adjoint if and only if its complex eigenvalues are all real. This would suggest that the analog of the set of real numbers is the set of self- adjoint operators. Also, we will see that a normal operator is unitary if and only if all of its eigenvalues have norm  , so numbers on the unit circle seem to correspond to the set of unitary operators. Of course, this is just an analogy. Self-Adjoint Operators Let us consider some properties of self-adjoint operators. The quadratic form associated with the linear operator  is the function 8¢=¦- defined by

8 ²#³ ~ º ²#³Á #»

We have seen that in a complex inner product space  ~ if and only if 8 ~ but this does not hold, in general, for real inner product spaces. However, it does hold for symmetric operators on a real inner product space. Structure Theory for Normal Operators 209

Theorem 10.6 Let > be the set of self-adjoint operators on a finite-dimensional inner product space = . Then > satisfies the following properties: 1) ()Closure under addition Á > ¬b   > 2) ()Closure under real scalar multiplication s Á  > ¬   > 3) ()Closure under multiplication if the factors commute Á > Á ~  ¬   > 4) ()Closure under inverses >¬, invertible c > 5) ()Closure under real polynomials > ¬ ²  ³  > for any ²%³  s ´%µ

6) A complex operator  is Hermitian if and only if 8²#³ is real for all #=. 7) If -~ds or if -~ and is symmetric then  ~ if and only if 8 ~ 8) If s is self-adjoint, then the characteristic polynomial of splits over and so all complex eigenvalues are real. Proof. For part 6), if  is Hermitian then º ²#³Á #» ~ º#Á ²#³» ~ º ²#³Á #» and so 8 ²#³ ~ ºs ²#³Á #» is real. Conversely, if º ²#³Á #»  then º#Á ²#³» ~ º ²#³Á #» ~ º#Á i ²#³» and so º#Á ² cii ³²#³» ~  for all #  = , whence  c ~  , which shows that  is Hermitian.

As for part 7), the first case (-~d ) is just Theorem 9.6 so we need only consider the real case, for which  ~ º ²% b &³Á % b &» ~ º ²%³Á %» b º ²&³Á &» b º ²%³Á &» b º ²&³Á %» ~ º ²%³Á &» b º ²&³Á %» ~ º ²%³Á &» b º%Á ²&³» ~ º ²%³Á &» b º ²%³Á &» ~º²%³Á&» and so  ~ . 210 Advanced Linear Algebra

For part 8), if d is Hermitian (- ~ ) and ²#³ ~ # then

º#Á#»~º ²#³Á#»~8 ²#³ is real by part 5) and so  must be real. If is symmetric (-~ s ) , we must be a bit careful, since if  is a complex root of *²%³ , it does not follow that ²#³ ~ # for some  £ #  = . However, we can proceed as follows. Let  be represented by the matrix (= , with respect to some ordered basis for . Then *²%³~*²%³ ( . Now, ( is a real symmetric matrix, but can be thought of as a complex Hermitian matrix that happens to have real entries. As such, it represents a Hermitian linear operator on the complex space d and so, by what we have just shown, all (complex) roots of its characteristic polynomial are real. But the characteristic polynomial of (( is the same, whether we think of as a real or a complex matrix and so the result follows. Unitary Operators and Isometries We now turn to the basic properties of unitary operators. These are the workhorse operators, in that a unitary operator is precisely a normal operator that maps orthonormal bases to orthonormal bases.

Note that  is unitary if and only if º ²#³Á $» ~ º#Ác ²$³» for all #Á $  = .

Theorem 10.7 Let K be the set of unitary operators on a finite-dimensional inner product space = . Then K satisfies the following properties: 1) ()Closure under scalar multiplication by complex numbers of norm  dKK Á(( ~ and  ¬  2) ()Closure under multiplication Á K ¬   K 3) ()Closure under inverses K¬ c  K 4)  is unitary/orthogonal if and only it is an isometric isomorphism. 5)  is unitary/orthogonal if and only if it takes an orthonormal basis to an orthonormal basis. 6) If  is unitary/orthogonal then the eigenvalues of have absolute value  . Proof. We leave the proofs of 1)–3) to the reader. For part 4), a unitary/orthogonal map is injective and since the range and domain have the same finite dimension, it is also surjective. Moreover, for a bijective linear map , we have Structure Theory for Normal Operators 211

 is an isometry¯ º ²#³Á ²$³» ~ º#Á $» for all #Á $  = ¯ º#Ái ²$³» ~ º#Á $» for all #Á $  = ¯²$³~$$=i for all ¯~i  ¯~ic ¯  is unitary/orthogonal

For part 5), suppose that E is unitary/orthogonal and that ~¸"ÁÃÁ" ¹ is an orthonormal basis for = . Then

º ²"³Á ²"³»~º"Á"»~ Á  and so E²³ is an orthonormal basis for = . Conversely, suppose that E and E²³ are orthonormal bases for = . Then

º ²"³Á ²"³»~  Á ~º"Á"» Using the conjugate linearity/bilinearity of the inner product, we get º ²#³Á ²$³» ~ º#Á $» and so  is unitary/orthogonal.

For part 6), if  is unitary and ²#³ ~ # then º#Á #» ~ º  #Á  #» ~ º  ²#³Á  ²#³» ~ º#Á #» and so (( ~~ , which implies that ((  ~ .

We also have the following theorem concerning unitary (and orthogonal) matrices.

Theorem 10.8 Let (d be an matrix. 1) The following are equivalent: a) ( is unitary b) The columns of ( form an orthonormal set in d . c) The rows of ( form an orthonormal set in d . 2) If (²(³~( is unitary then ((det . In particular, if is orthogonal then det²(³ ~ f . Proof. The matrix (((~0 is unitary if and only if i , which is equivalent to saying that the rows of (( are orthonormal. Similarly, is unitary if and only if ((~0i , which is equivalent to saying that the columns of ( are orthonormal. As for part 2), we have

((ii ~ 0 ¬det ²(³ det ²( ³ ~  ¬ det ²(³ det ²(³ ~  from which the result follows.

Unitary/orthogonal matrices play the role of change of basis matrices when we restrict attention to orthonormal bases. Let us first note that if 8 ~²"ÁÃÁ" ³ 212 Advanced Linear Algebra is an ordered orthonormal basis and

#~ " bÄb  " $~ " bÄb  " then

º#Á$»~  bÄb   ~´#µ88 h´$µ and so # ž $ if and only if ´#µ88 ž ´$µ .

We can now state the analog of Theorem 2.13.

Theorem 10.9 If we are given any two of the following: 1) A unitary/orthogonal d matrix ( . 2) An ordered orthonormal basis 8 for -  . 3) An ordered orthonormal basis 9 for -  . then the third is uniquely determined by the equation

(~489Á

Unitary Similarity We have seen that the change of basis formula for operators is given by

c ´µ88Z ~7´µ7 where 7 is an invertible matrix. What happens when the bases are orthonormal?

Definition Two complex matrices () and are unitarily similar (also called unitarily equivalent) if there exists a unitary matrix < for which ) ~ <(

The analog of Theorem 2.19 is the following.

Theorem 10 .10 Let = be an inner product space of dimension . Then two d matrices ( and ) are unitarily/orthogonally similar if and only if they represent the same linear operator B²=³ , but possibly with respect to different ordered orthonormal bases. In this case, () and represent exactly the Structure Theory for Normal Operators 213 same set of linear operators in B²= ³ , when we restrict attention to orthonormal bases. Proof. If () and represent B ²=³ , that is, if

(~´µ89 and )~´µ for ordered orthonormal bases 89 and then

)~489ÁÁ (4 98 and according to Theorem 10.9, 4()89Á is unitary/orthogonal. Hence, and are unitarily/orthogonally similar.

Now suppose that () and are unitarily/orthogonally similar, say )~<(

(~´ µ8 Theorem 10.9 implies that there is a unique ordered orthonormal basis 9 for = for which <~489Á . Hence

c )~489Á ´ µ 8 489Á ~´ µ 9 Hence, )() also represents  . By symmetry, we see that and represent the same set of linear operators, under all possible ordered orthonormal bases.

Unfortunately, canonical forms for unitary similarity are rather complicated and not well discussed. We have shown in Chapter 8 that any complex matrix ( is unitarily similar to an upper triangular matrix, that is, that ( is unitarily upper triangularizable. (This is Schur's lemma.) However, just as in the nonunitary case, upper triangular matrices do not form a canonical form for unitary similarity. We will soon show that every complex normal matrix is unitarily diagonalizable. However, we will not discuss canonical forms for unitary similarity in this book, but instead refer the reader to the survey article [Sh]. Reflections The following defines a very special type of unitary operator.

Definition For a nonzero #= , the unique operator /# for which

/## # ~ c#Á ²/ ³Oº#»ž ~  is called a reflection or a Householder transformation . 214 Advanced Linear Algebra

It is easy to verify that º%Á #» /²%³~%c # # º#Á #»

Note also that if  is a reflection then ~/% if and only if  ²%³~c% . For if ~/# and ²%³~c% then we can write %~#b' where 'ž# and so

c²# b '³ ~ c% ~ ²%³ ~ /# ²# b '³ ~ c# b ' which implies that '~ , whence /#% ~/%c ~/ .

If /## is reflection and we extend to an ordered orthonormal basis 8 for = then ´/# µ8 is the matrix obtained from the identity matrix by replacing the ²Á ³ entry by c . Thus, we see that a reflection is unitary, Hermitian and idempotent  ()./~# 

Theorem 10.11 Let #Á $  = with )) # ~ ) $ ) £  . Then /#c$ is the unique reflection sending #$ to , that is, /#c$ ²#³~$ . Proof. If ))#~ ) $ ) then ²c³ž²b³ #$ #$ and so

/²c³~c#$c #$ $# /²b³~b#$c #$ #$ from which it follows that /²³~#$c%#$ . As to uniqueness, suppose / is a c reflection for which /²³~%%%#$ . Since /% ~/ , we have /² $# ³~ and so

/% ²#$ c ³ ~ c²# c $³ which implies that /~/%#c$.

Reflections can be used to characterize unitary operators.

Theorem 10.12 An operator  on a finite-dimensional inner product space = is unitary (for -~ds ) or orthogonal (for -~ ) if and only if it is a product of reflections. Proof. Since reflections are unitary (orthogonal) and the product of unitary (orthogonal) operators is unitary, one direction is easy.

For the converse, let 8 be unitary. Let ~²"" ÁÃÁ ³ be an orthonormal basis for =²³ . Hence 8 is also an orthonormal basis for = . We make repeated use of the fact that /²%³~&%c& . For example, if

%²""~³c then

²/%  ³²"" ³ ~ 

and so /%  is the identity on º" » . Next, if Structure Theory for Normal Operators 215

%% ~ ²/  ³²" ³ c " then

²/%% / ³²""  ³ ~  Also, we claim that . Since c , it follows that %ž" /"~/"~²"³ % %  

º²/%% ³²"" ³ c " Á " » ~ º²/ ³² ³Á " »

~º ²"% ³Á/ "» ~º ²" ³Á ²"³» ~º" Á"» ~ Hence

²/%% / ³²"  ³ ~ / %  ²"  ³ ~ " 

and so //%% is the identity on º" Á " » . Now let us generalize.

Assume that for  € , we have found reflections /%%c ÁÃÁ/  for which

/Ä/%%c   is the identity on ºÁÃÁ"» uc . If

%% ~ ²/c Ä/ %   ³²" ³ c " then

²/%% Ä/ ³²"" ³ ~ 

Also, we claim that %ž" for all  . Since / %c Ä/"~ %  ²" ³ it follows that

º²/%%%%c Ä/  ³²"" ³ c " Á " » ~ º²/ c Ä/  ³² ³Á " »

~º ²"% ³Á/c Ä/ %  "» ~º ²" ³Á ²"³» ~º" Á"» ~ Hence

²/%% Ä/ ³²"" ³ ~ / %  ²" ³ ~

and so /Ä/%% is the identity on º" ÁÃÁ" » .

Thus, for ~ we have /Ä/%% ~ and so  ~ /Ä/ %% , as desired . The Structure of Normal Operators We are now ready to consider the structure of a normal operator  on a finite- dimensional inner product space. According to Theorem 10.5, the minimal 216 Advanced Linear Algebra polynomial of  has the form

  ²%³ ~  ²%³Ä ²%³ where the  's are distinct monic irreducible polynomials.

If -~d , then each ²%³ is linear. Theorem 8.11 then implies that is diagonalizable and

=~;; lÄl where ÁÃÁ are the distinct eigenvalues of  . Theorem 10.5 also tells us that if £ž then ; ; and so

=~;; pÄp This is equivalent to saying that = has an orthonormal basis of eigenvectors of .

The converse is also true, that is, if 8 ~²"ÁÃÁ" ³ is an ordered orthonormal basis of eigenvectors of  then

i º "Á"»~º"ÁÁ "»~º"Á "»~ ~º "Á"»

i and so "~ " . It follows that ii "~   "~   "  and so  is normal.

Theorem 10.13 (The structure theorem for normal operators: complex case) Let = be a finite-dimensional complex inner product space. Then a linear operator 8 on == is normal if and only if has an orthonormal basis consisting entirely of eigenvectors of  , that is

=~;; pÄp  where ¸ÁÃÁ¹ is the spectrum of  . Put another way,  is normal if and only if it is unitarily diagonalizable.

Now let us consider the real case, which is more complex than the complex case. However, we can take advantage of the corresponding result for -~d , by using the complexification process (which we will review in a moment).

First, let us observe that when -~s , the minimal polynomial of is a product of distinct real linear and real quadratic factors, say

 ²%³ ~ ²% c  ³Ä²% c ³ ²%³Ä ²%³ where the  's are distinct and the  ²%³ 's are distinct real irreducible quadratics. Structure Theory for Normal Operators 217

Hence, according to part 9) of Theorem 10.5, the primary decomposition of = has the form

=~;; pÄp  p>pÄp> where ¸ÁÃÁ¹ is the spectrum of  and where

> ~ ¸#  = “  ² ³# ~ ¹ are  -invariant subspaces. Accordingly, we can focus on the subspace >~>pÄp> , upon which  is normal, with a minimal polynomial that is the product of distinct irreducible quadratic factors.

Let us briefly review the complexification process. Recall that if = is a real vector space, then the set =~¸"b#“"Á#=¹d is a complex vector space under addition and scalar multiplication “borrowed” from the field of complex numbers, that is, ²"b#³b²%b&³~²"b%³b²#b&³ ²b³²"b#³~²"c#³b²#b"³

Recall also that the complexification map cpx¢= ¦=d defined by cpx²#³ ~ # b  is an injective linear transformation from the real vector space = to the real dd version ²= ³s of the complexification = .

If 8s8~¸#“0¹ is a basis for = over , then the complexification of

cpx²³~¸#b“#88 ¹ is a basis for the vector space = d over d and so dim²=d ³ ~ dim ²= ³

For any linear operator  on == , we can define a linear operator dd on by d²" b #³ ~ ²"³ b ²#³ Note that ²³~ddd  

Also, if ²%³ is a real polynomial then ²dd ³ ~ ´² ³µ . 218 Advanced Linear Algebra

For any ordered basis 8 of = , we have

d ´µcpx²³8 ~´µ8 Hence, if a real matrix (=( represents a linear operator  on then also represents the complexification of  on = d . In particular, the polynomial ²%³ ~det ²%0 c (³ is the characteristic polynomial of both  and d À

If = is a real inner product space, then we can define an inner product on the complexification = d as follows (this is the same formula as for the ordinary inner product on a complex vector space) º" b #Á % b &» ~ º"Á %» b º#Á &» b ²º#Á %» c º"Á &»³ From this, it follows that if "Á %  = then º"dd Á % » ~ º"Á %» and, in particular, "ž% in = if and only if "dd ž% in = d .

Next, we have ²³~²³iidd , since º" b #Á ²iii ³d ²% b &³» ~ º" b #Á ²%³ b ²&³» ~ º"Áii ²%³» b º#Á ²&³» b ²º#Á  ii ²%³» c º"Á ²&³»³ ~ º ²"³Á %» b º ²#³Á &» b ²º  ²#³Á %» c º ²"³Á &»³ ~º ²"³b ²#³Á % b &» ~º d ²"b#³Á%b&» ~º"b#Á² d ³i ²%b&³»

It follows that  is normal if and only if d is normal.

Now consider a normal linear operator  on a real vector space = and suppose that the minimal polynomial ²%³ of  is the product of distinct irreducible quadratic factors

 ²%³ ~  ²%³Ä ²%³

Hence, ²%³ has distinct roots, all of which are nonreal, say

ÁÁÃÁÁ  

d and since the characteristic polynomial ²%³ of  and is a multiple of  ²%³ , these scalars are characteristic roots of  d .

Also, since ²%³ is real, it follows that

dd ² ³~´²³µ~ Structure Theory for Normal Operators 219

d and so ²%³“²%³ d  . However, the eigenvalues  Á of  are roots of ddd ²%³ and so  ²%³ “  ²%³ . Thus,  ²%³ ~  ²%³ .

Since ²%³ d is the product of distinct linear factors over d , we deduce immediately that  dd is diagonalizable. Hence, = has a basis of eigenvectors of  d, that is, =~d ;; p pÄpp ;;  Note also that since  d is normal, these eigenspaces are orthogonal under the inner product on = d .

Let us consider a particular eigenvalue pair  and and the subspace ;; p  . (For convenience, we have dropped the subscript.) Suppose that  ~b and that

E ~²" b# ÁÃÁ"  b# ³ is an ordered orthonormal basis for ; . Then for any ~ÁÃÁ , d  ²" b # ³ ~ ² b ³²"  b # ³ and so

²" ³ ~ " c # ²# ³ ~ " b # It follows that

d ²" c # ³ ~ ²"  ³ c ²#  ³ ~ " c # c ²"  b # ³ ~ ² c ³²" c # ³

~²"c#³ 

d which shows that "c# is an eigenvector for  associated with and so

E;~²" c# ÁÃÁ"  c# ³‹  But the set E is easily seen to be linearly independent and so dim²³‚;; dim ²³ . Using the same argument with  replaced by  , we see that this inequality is an equality. Hence E; is an ordered orthonormal basis for  .

It follows that

;; p~

< ~span ²" b # Á " c # ³ 220 Advanced Linear Algebra

is two-dimensional, because the eigenvectors "b# and "c#  are associated with distinct eigenvalues and are therefore linearly independent.

Hence, = d is the orthogonal direct sum of  -invariant two-dimensional subspaces

d =~

d where ~²=³~²=³dim dim and where each subspace < has the property that

²" ³ ~  " c  # ²# ³ ~  " b  # and where the scalars  ~ b range over the distinct eigenvalues d ÁÁÃÁÁ  of .

Now we wish to drop down to = . For each ~ÁÃÁ , let : ~span ²"Á# ³ be the subspace of = spanned by the real and imaginary parts of the eigenvectors "b# and "c#  that span <  . To see that :  is two- dimensional, consider its complexification

d :~¸%b&“%Á&:¹ 

d Since <‹:  , we have

d  ~dim ²< ³  dim ²: ³ ~ dim ²: ³  

d (This can also be seen directly by applying  to the equation " b # ~  and solving the resulting pair of equations in "# and .)

d Next, we observe that if %: and &: with £ , then since % <  and ddd & <, we have % ž& and so %ž& . Thus, : ž: .

In summary, if 8~²"Á#³ , then the subspaces :  are two-dimensional, - invariant and pairwise orthogonal, with matrix

c ´µ 8 ~>? 

It follows that :pÄp:‹= but since the dimensions of both sides are equal, we have equality

=~:pÄp: Theorem 10.14 ()The structure theorem for normal operators: real case Let == be a finite-dimensional real inner product space. A linear operator  on is normal if and only if Structure Theory for Normal Operators 221

=~;; pÄp p:pÄp: where ¸ÁÃÁ¹ is the spectrum of  and each :  is a two-dimensional  - invariant subspace for which there exists an ordered basis 8~²"Á#³ for which

c ´µ 8 ~>?  for Á s. Proof. We need only show that if = has such a decomposition, then  is normal. ! But since ´ µ88 ²´ µ ³ ~ ² b  ³0 , it is clear that ´  µ 8  is real normal. it follows easily that  is real normal.

The following theorem includes the structure theorems stated above for the real and complex cases, along with some further refinements related to self-adjoint and unitary/orthogonal operators.

Theorem 10.15 ()The structure theorem for normal operators 1) ()Complex case Let = be a finite-dimensional complex inner product space. Then a) An operator  on == is normal if and only if has an orthonormal basis 8 consisting entirely of eigenvectors of , that is

=~;; pÄp 

where ¸ÁÃÁ¹ is the spectrum of  . Put another way,  is normal if and only if it is unitarily diagonalizable. b) Among the normal operators, the Hermitian operators are precisely those for which all complex eigenvalues are real. c) Among the normal operators, the unitary operators are precisely those for which all eigenvalues have norm  . 2) ()Real case Let = be a finite-dimensional real inner product space. Then a) A linear operator  on = is normal if and only if

=~;; pÄp p:pÄp:

where ¸ÁÃÁ¹ is the spectrum of  and each :  is a two- dimensional  -invariant subspace for which there exists an ordered basis 8~²"Á#³ for which

c ´µ 8 ~>? 

for Á s. b) Among the real normal operators, the symmetric operators are precisely those for which there are no subspaces < in the 222 Advanced Linear Algebra

decomposition of part 2b). Hence, an operator is symmetric if and only if it is orthogonally diagonalizable. c) Among the real normal operators, the orthogonal operators are precisely those for which the eigenvalues are equal to f and the

matrices ´µ 8 described in part 2a) have rows (and columns) of norm , that is, sin c cos ´µ ~ 8 >?cos sin

for some s . Proof. We have proved part 1a). As to part 1b), it is only necessary to look at the matrix ( of 8 with respect to a basis consisting of eigenvectors for  . This matrix is diagonal and so it is Hermitian ((~(i ) if and only if the diagonal entries are real. Similarly, ((~( is unitary (c i ) if and only if the diagonal entries have absolute value equal to  .

We have proved part 2a). Parts 2b) and 2c) are seen to be true by looking at the ! matrix (~´ µ8 , which is symmetric ( (~( ) if and only if ( is diagonal and (~f4 is orthogonal if and only if  and the matrices have orthonormal rows. Matrix Versions We can formulate matrix versions of the structure theorem for normal operators.

Theorem 10.16 ()The structure theorem for normal matrices 1) ()Complex case a) A complex matrix (< is normal if and only if there is a unitary matrix for which

c <(< ~diag ² ÁÃÁ ³

where ¸ÁÃÁ¹ is the spectrum of  . Put another way, ( is normal if and only if it is unitarily diagonalizable . b) A complex matrix ( is Hermitian if and only if 1a) holds where all eigenvalues  are real. c) A complex matrix ( is unitary if and only if 1a) holds where all eigenvalues  have norm  . 2) ()Real case a) A real matrix ( is real normal if and only if there is an orthogonal matrix 66(6 for which c has the block diagonal form

c c   c 6(6 ~diag67 ÁÃÁ Á>?>? ÁÃÁ    

b) A real matrix ( is symmetric if and only if there is an orthogonal matrix 66(6 for which c has the block diagonal form Structure Theory for Normal Operators 223

c 6(6 ~diag ² Á Ã Á ³ That is, a real matrix ( is symmetric if and only if it is orthogonally diagonalizable. c) A real matrix ( is orthogonal if and only if there is an orthogonal matrix 66(6 for which c has the block diagonal form 6(6c

sin cc cos sin  cos ~diag67 ÁÃÁ Á>?>? ÁÃÁ cos  sin cos  sin

for some sÁÃÁ  . Orthogonal Projections We now wish to characterize unitary diagonalizability in terms of projection operators.

Definition Let =~:p:žž . The projection map  ¢=¦: on : along : is called orthogonal projection onto : . Put another way, a projection map  is an orthogonal projection if im²³žker ²³ .

Note that some care must be taken to avoid confusion between the term orthogonal projection and the concept of projections that are orthogonal to each other, that is, for which ~~  .

We saw in Chapter 8 that an operator  is a projection operator if and only if it is idempotent. Here is the analogous characterization of orthogonal projections.

Theorem 10.17 Let = be a finite-dimensional inner product space. The following are equivalent for an operator  on = : 1)  is an orthogonal projection 2)  is idempotent and self-adjoint 3)  is idempotent and does not expand lengths, that is ))))²#³  # for all #= . Proof. To see that 1) and 2) are equivalent, we have ~¯²³~²³iiim  im  and ker ²³~²³  ker  i ¯im ²³~ker ²³žž and ker ²³~  im ²³ ¯²³ž²³im ker To prove that 1) implies 3), note that # ~ ²#³ b ' where ' ker ² ³ and since ²#³ ž ' we have 224 Advanced Linear Algebra

))))))))# ~ ²#³ b ' ‚ ²#³ from which the result follows.

Now suppose that 3) holds. We know that =~im ²³lker ²³ and wish to show that this sum is orthogonal. According to Theorem 9.13, it is sufficient to show that im²³‹²³keržž . Let $²³ im  . Then since =~²³p²³ ker  ker , we have $~%b& , where %²³&²³ker and ker ž and so $~²$³~²%³b²&³~²&³ Hence

))% b )) & ~ ) $ )  ~ ) ²&³ )   )) &  which implies that ))%~ and hence that %~ . Thus, $~&ker ²³ ž and so im²³‹ker ²³ž , as desired.

Note that for an orthogonal projection  , we have º#Á ²#³» ~ º#Á ²#³» ~ º ²#³Á ²#³» ~)) ²#³  Next we give some additional properties of orthogonal projections.

Theorem 10.18 Let = be an inner product space over a field of characteristic£ . Let  ,  ÁÃÁ  and  be projections, each of which is orthogonal. Then 1) ~ if and only if  ~ . 2) bžb is an orthogonal projection if and only if  , in which case  is projection on im²³p im ²³ along ker ²³q  ker ²³ . 3) ~bÄb  is an orthogonal projection if and only if   ž  for all £. 4) c is an orthogonal projection if and only if ~~   in which case c²³q²³²³p²³ is projection on im ker  along ker  im  . 5) If ~  then  is an orthogonal projection. In this case,  is projection on im²³q im ²³ along ker ²³p  ker ²³  . 6) a) ižž is orthogonal projection onto ker²³ along im ²³ b) iž is orthogonal projection onto ker²³  along ker ²³  c) iž is orthogonal projection onto im²³  along im ²³  Proof. We prove only part 3). If the  's are orthogonal projections and if ž for all £ then   ~ for all £ and so it is straightforward to check that i~~ and that  . Hence,  is an orthogonal projection. Conversely, suppose that  is an orthogonal projection and that %im ² ³ for a fixed ²%³~% . Then  and so Structure Theory for Normal Operators 225

))% ‚ ) ²%³ ) ~ º ²%³Á ²%³» ~ º ²%³Á %»  ~ º ²%³Á %» ~)))))) ²%³ ‚ ²%³ ~ %  which implies that ²%³ ~  for  £  . In other words,

ž im²³‹ker ²³~²³ im Therefore,

 ~ º ²#³Á ²$³» ~ º   ²#³Á $» for all #Á $  = , which shows that  ~  , that is,   ž   .

For orthogonal projections  and , we can define a partial order by defining ²³‹²³ to be im  im  . it is easy to verify that this is a reflexive, antisymmetric, transitive relation on the set of all orthogonal projections. Furthermore, we have the following characterizations.

Theorem 10.19 The following statements are equivalent for orthogonal projections  and : 1)  2) ~  3) ~  4) ))))²#³  ²#³ for all #  = . 5) 8²#³‚c , for all #= . Proof. First, we show that 2) and 3) are equivalent. If 2) holds then ~~²³~~ ii   i  i  and so 3) holds. Similarly, 3) implies 2). Next, note that 4) and 5) are equivalent, since

8c ²#³ ~ º ²#³Á #» c º ²#³Á #» ~ º ²#³Á ²#³» c º  ²#³Á ²#³» ~)))) ²#³ c ²#³

Now, 2) is equivalent to the statement that  fixes each element of im²³ , which is equivalent to the statement that im²³‹ im ²³ , which is 1). Hence, 1)–3) are equivalent.

Finally, if 2) holds, then 3) also holds and so by Theorem 10.18, the difference c is an orthogonal projection, from which it follows that

8c ²#³ ~ º² c ³²#³Á #» ~ º²  c ³²#³Á ²  c ³²#³» ‚  which is 5). Also, if 4) holds, then for any # im ² ³ , we have #~%b& , where 226 Advanced Linear Algebra

%im ² ³Á&ker ² ³ and %ž& . Then, ))%b&~#~²#³²#³~% )) )) ) )  ) )  ))  and so &~ , that is, # im ² ³ . Hence, 1) holds. Orthogonal Resolutions of the Identity Recall that resolutions of the identity

bÄb ~ correspond to direct sum decompositions of the space = . It is the mutual orthogonality of the projections that gives the directness of the sum. If, in addition, the projections are themselves orthogonal, then the direct sum is an orthogonal sum.

Definition If bÄb ~ is a resolution of the identity and if each   is orthogonal, then we say that bÄb ~ is an orthogonal resolution of the identity.

The following theorem displays a correspondence between orthogonal direct sum decompositions of = and orthogonal resolutions of the identity.

Theorem 10.20 1) If bÄb ~ is an orthogonal resolution of the identity then

=~im ² ³pÄp im ² ³

2) Conversely, if =~:pÄp: and  is projection on :  along : pÄp:V pÄp: , where the hat ^ means that the corresponding term is missing from the direct sum, then

bÄb ~ is an orthogonal resolution of the identity. Proof. To prove 1) suppose that bÄb ~ is an orthogonal resolution of the identity. According to Theorem 8.17, we have

=~im ² ³lÄl im ² ³

However, since the  's are orthogonal, they are self-adjoint and so for £ ,

º ²#³Á ²$³»~º#Á   ²$³»~º#Á»~ Hence

=~im ² ³pÄp im ² ³

For the converse, we know from Theorem 8.17 that bÄb ~ is a resolution of the identity and we need only show that each  is an orthogonal Structure Theory for Normal Operators 227 projection. But this follows from the fact that

ž im² ³ ~ im ² ³pÄp im ²  cb ³p im ²  ³pÄp im ²  ³~ker ² ³

The Spectral Theorem We can now characterize the normal (unitarily diagonalizable) operators on a finite-dimensional complex inner product space using projections.

Theorem 10.21 ( The spectral theorem for normal operators ) Let  be an operator on a finite-dimensional complex inner product space = . The following statements are equivalent: 1)  is normal 2)  is unitarily diagonalizable, that is,

=~;; pÄp 3)  has the orthogonal spectral resolution

~bÄb   (10.1)

where dbÄb~ and where    is an orthogonal resolution of the identity. Moreover, if  has the form (10.1), where the  's are distinct and the 's are nonzero then the  's are the eigenvalues of and im²³ is the eigenspace associated with  . Proof. We have seen that 1) and 2) are equivalent. Suppose that  is unitarily diagonalizable. Let ; be orthogonal projection onto  . Then any #= can be written as a sum of orthogonal eigenvectors

#~# bÄb# and so

²#³ ~ # b Ä b   # ~ ²   b Ä b  ³²#³ Hence, 3) holds. Conversely, if (10.1) holds, we have

=~im ² ³pÄp im ² ³

But Theorem 8.18 implies that im²³~;  and so  is unitarily diagonalizable.

In the real case, we have the following.

Theorem 10.22 ( The spectral theorem for self-adjoint operators ) Let  be an operator on a finite-dimensional real inner product space = . The following statements are equivalent: 1)  is self-adjoint 228 Advanced Linear Algebra

2)  is orthogonally diagonalizable, that is,

=~;; pÄp 3)  has the orthogonal spectral resolution

~bÄb   (10.2)

where sbÄb~ and    is an orthogonal resolution of the identity. Moreover, if  has the form (10.2), where the  's are distinct and the 's are nonzero then the  's are the eigenvalues of and im²³ is the eigenspace associated with  . Spectral Resolutions and Functional Calculus Let  be a linear operator on a finite-dimensional inner product space = , and let  have spectral resolution

~bÄb  

 Since  is idempotent, we have  ~‚ for all . The mutual orthogonality of the projections means that ~ for all £ and so  ~² bÄb  ³ ~  bÄb  More generally, for any polynomial ²%³ over - , we have

² ³ ~ ² ³ b Ä b ²   ³ Now, we can extend this further by defining, for any function

¢¸ ÁÃÁ ¹ ¦ - the linear operator ² ³ by setting

² ³ ~ ² ³ bÄb²  ³ For example, we may define jÁÁ c  and so on. Notice, however, that since the spectral resolution of  is a finite sum, we gain nothing (but convenience) by using functions other than polynomials, for we can always find a polynomial ²%³ for which ² ³ ~ ² ³ for  ~ Á Ã Á  and so

² ³ ~ ²  ³ bÄb²   ³ ~ ²  ³ bÄb²   ³ ~ ² ³ The study of the properties of functions of an operator  is referred to as the functional calculus of  .

According to the spectral theorem, if =²³ is complex and  is normal then is a normal operator whose eigenvalues are ² ³ . Similarly, if = is real and is self-adjoint then ² ³ is self-adjoint, with eigenvalues ² ³ . Let us consider some special cases of this construction. Structure Theory for Normal Operators 229

If ²%³ is a polynomial for which

² ³~ Á for ~ÁÃÁ, then

² ³~ and so we see that each projection  in the spectral resolution is a polynomial function of  .

c If  is invertible then  £ for all  and so we may take ²%³~% , giving c c c ~bÄb  as can easily be verified by direct calculation.

If ² ³ ~ and if  is normal then each   is self-adjoint and so i ² ³ ~ bÄb   ~

Commutativity The functional calculus can be applied to the study of the commutativity properties of operators. Here are two simple examples.

Theorem 10.23 Let = be a finite-dimensional complex inner product space. For Á²=³ B , let us write  —  to denote the fact that  and  commute. Let  and have spectral resolutions

~bÄb   ~bÄb   Then 1) For any B²=³ , we have  — if and only if  — for all  . 2) ——Á if and only if   , for all . 3) If ¢¸ ÁÃÁ ¹¦- and ¢¸   ÁÃÁ ¹¦- are injective functions, then ² ³ — ² ³ if and only if  — . Proof. The proof is based on the fact that if  and are operators then — implies that ² ³ — ² ³ for any polynomials ²%³ and ²%³ , and hence ² ³ — ² ³ for any functions  and  .

For 1), it is clear that — for all implies that  — . The converse follows from the fact that  is a polynomial in . Part 2) is similar. For part 3), — clearly implies ² ³ — ² ³ . For the converse, let $ ~ ¸ ÁÃÁ ¹ . Since  is injective, the inverse function ¢²³¦c $$ is well-defined and 230 Advanced Linear Algebra

c ²² ³³ ~. Thus,  is a function of ²  ³ . Similarly,  is a function of ²  ³ . it follows that ² ³ — ² ³ implies  — .

Theorem 10.24 Let = be a finite-dimensional complex inner product space and let  and be normal operators on = . Then  and commute if and only if they have the form ~ ² ² Á ³³ ~ ² ² Á ³³ where ²%³Á ²%³ and ²%Á &³ are polynomials. Proof. If  and are polynomials in then they clearly commute. For the converse, suppose that ~  and let

~bÄb   and

~bÄb   be the orthogonal spectral resolutions of  and . Then according to Theorem 10.23, ~   . Now, let us choose any polynomial ²%Á &³ with the property that Á~ ²  Á  ³ are distinct. Since each   and  is self-adjoint, we may set ~ ²Á ³ and deduce (after some algebra) that

~ ²Á ³~ Á   Á

We also choose ²%³ and ²%³ so that ²Á ³ ~  for all  and ²  Á ³ ~  for all . Then

² ³ ~ ² Á ³    ~     ~ ²    ³²   ³ ~    ~  Á Á    and similarly, ²  ³ ~ . Positive Operators One of the most important cases of the functional calculus is when ²%³~j % . First, we need some definitions. Recall that the quadratic form associated with a linear operator  is

8 ²#³ ~ º ²#³Á #» Definition A self-adjoint linear operator B²=³ is 1) positive if 8²#³‚ for all #= 2) positive definite if 8²#³€ for all #£ .

Theorem 10.25 A self-adjoint operator  on a finite-dimensional inner product space is Structure Theory for Normal Operators 231

1) positive if and only if all of its eigenvalues are nonnegative 2) positive definite if and only if all of its eigenvalues are positive. Proof. If 8 ²#³ ‚  and  ²#³ ~ # then º ²#³Á#»~ º#Á#» and so ‚ . Conversely, if all eigenvalues of are nonnegative then we have

~bÄbÁ‚    and since ~bÄb  ,  º ²#³Á #» ~ º ²#³Á  ²#³» ~ ))  ²#³ ‚  Á  and so  is positive. Part 2) is proved similarly.

If  is a positive operator, with spectral resolution

~bÄbÁ‚    then we may take the positive square root of  ,

j~bÄbjj   where j is the nonnegative square root of .

It is clear that ²³~j and it is not hard to see that j is the only positive operator whose square is . In other words, every positive operator has a unique positive square root. Conversely, if  has a positive square root, that is, if ~  , for some positive operator  then is positive. Hence, an operator  is positive if and only if it has a positive square root.

If  is positive then j is self-adjoint and so

²³jji ~

Conversely, if ~ i for some operator  then  is positive, since it is clearly self-adjoint and º ²#³Á #» ~ ºi ²#³Á #» ~ º ²#³Á ²#³» ‚  Thus,  is positive if and only if it has the form ~ i for some operator .

Here is an application of square roots. 232 Advanced Linear Algebra

Theorem 10.26 If  and  are positive operators and ~  then  is positive. Proof. Since  is a positive operator, it has a positive square root j , which is a polynomial in  . A similar statement holds for . Therefore, since and commute, so do jj and . Hence,

²³~²³²³~jj j  j  

Since jj and are self-adjoint and commute, their product is self-adjoint and so  is positive. The Polar Decomposition of an Operator It is well known that any nonzero complex number ' can be written in the polar form '~  , where is a positive number and is real. We can do the same for any nonzero linear operator  on a finite-dimensional complex inner product space.

Theorem 10.27 Let  be a nonzero linear operator on a finite-dimensional complex inner product space = . Then 1) There exists a positive operator  and a unitary operator for which ~ . Moreover,  is unique and if  is invertible then  is also unique. 2) There exists a positive operator  and a unitary operator for which ~ . Moreover,  is unique and if  is invertible then  is also unique. Proof. Let us suppose for a moment that ~ . Then iiiic~² ³ ~ ~ and so ic~~   Also, if #= then ²#³ ~ ² ²#³³ These equations give us a clue as to how to define  and .

Let us define  to be the unique positive square root of the positive operator i . Then ))²#³ ~ º ²#³Á ²#³» ~ ºi ²#³Á #» ~ º  ²#³Á #» ~ )) ²#³ (10.3) Let us define  on im²³ by ² ²#³³ ~  ²#³ (10.4) for all #  = . Equation (10.3) shows that  ²%³ ~ ²&³ implies that  ²%³ ~ ²&³ and so this definition of  on im²³ is well-defined. Structure Theory for Normal Operators 233

Moreover,  is an isometry on im²³ , since (10.3) gives )² ²#³³ ))))) ~  ²#³ ~  ²#³

Thus, if 8~¸ÁÃÁ¹ is an orthonormal basis for im ² ³ , then 8² ³ ~ ¸  ² ³Á à Á  ² ³¹ is an orthonormal basis for  ²im ²  ³³ ~ im ²  ³ . Finally, we may extend both orthonormal bases to orthonormal bases for = and then extend the definition of  to an isometry on =~ , for which .

As for the uniqueness, we have seen that  must satisfy i~ and since  has a unique positive square root, we deduce that  is uniquely defined. Finally, if  is invertible then so is since ker²³‹ ker ²³ . Hence, ~ c is uniquely determined by  .

Part 2) can be proved by applying the previous theorem to the map  i , to get ~² ii ³ ~²  ³ i ~  c ~  where  is unitary.

We leave it as an exercise to show that any unitary operator  has the form ~, where is a self-adjoint operator. This gives the following corollary.

Corollary 10.27 (Polar decomposition ) Let  be a nonzero linear operator on a finite-dimensional complex inner product space. Then there is a positive operator  and a self-adjoint operator for which has the polar decomposition ~ Moreover,  is unique and if is invertible then  is also unique.

Normal operators can be characterized using the polar decomposition.

Theorem 10.28 Let ~ be a polar decomposition of a nonzero linear operator  . Then is normal if and only if ~ . Proof. Since ic~   ~  and icc~   ~    we see that  is normal if and only if ~c    234 Advanced Linear Algebra or equivalently, ~   (10.5) Now,  is a polynomial in  and is a polynomial in   and so (10.5) holds if and only if ~  . Exercises 1. Let B²<Á=³ . If  is surjective, find a formula for the right inverse of  in terms of i . If is injective, find a formula for the left inverse of  in terms of iii . Hint : Consider ker²³ and ker ²³ . 2. Let B²=³ where = is a complex vector space and let  ~²b³ii and ~ ²c³  

Show that  and are self-adjoint and that i ~b  and  ~c    What can you say about the uniqueness of these representations of  and  i? 3. Prove that all of the roots of the characteristic polynomial of a skew- Hermitian matrix are pure imaginary. 4. Give an example of a normal operator that is neither self-adjoint nor unitary. 5. Prove that if ))))²#³ ~i ²#³ for all #  = , where = is complex then  is normal. 6. a) Show that if  is a normal operator on a finite-dimensional inner product space then i ~²³ , for some polynomial ²%³-´%µ . b) Show that if  is normal and ~~ then ii . In other words, i commutes with all operators that commute with . 7. Show that a linear operator  on a finite-dimensional complex inner product space =: is normal if and only if whenever is an invariant subspace under , so is :ž . 8. Let = be a finite-dimensional inner product space and let  be a normal operator on = . a) Prove that if  is idempotent then it is also self-adjoint. b) Prove that if  is nilpotent then ~ . c) Prove that if ~ then  is idempotent. 9. Show that if  is a normal operator on a finite-dimensional complex inner product space then the algebraic multiplicity is equal to the geometric multiplicity for all eigenvalues of  . 10. Show that two orthogonal projections  and are orthogonal to each other if and only if im²³ž im ²³ . 11. Let  be a normal operator and let be any operator on = . If the eigenspaces of  are -invariant, show that  and  commute. Structure Theory for Normal Operators 235

12. Prove that if  and are normal operators on a finite-dimensional inner complex product space and if  ~~  for some operator then ii  . 13. Prove that if two normal d complex matrices are similar then they are unitarily similar, that is, similar via a unitary matrix. 14. If  is a unitary operator on a complex inner product space, show that there exists a self-adjoint operator  for which ~ . 15. Show that a positive operator has a unique positive square root. 16. Let  , be complex numbers, for ~ÁÃÁ . Construct a polynomial ²%³ for which ² ³ ~ for all  . 17. Prove that if  has a square root, that is, if ~  , for some positive operator  then is positive. 18. Prove that if  and if is a positive operator that commutes with both  and then    . 19. Does every self-adjoint operator on a finite-dimensional real inner product space have a square root?  20. Let d be a linear operator on and let ÁÃÁ be the eigenvalues of , each one written a number of times equal to its algebraic multiplicity. Show that

 i (( ²tr ³  where tr is the trace. Show also that equality holds if and only if  is normal. 21. If B²=³ where = is a real inner product space, show that the Hilbert space adjoint satisfies ²³~²³iidd . Part II—Topics Chapter 11 Metric Vector Spaces: The Theory of Bilinear Forms

In this chapter, we study vector spaces over arbitrary fields that have a bilinear form defined upon them.

Unless otherwise mentioned, all vector spaces are assumed to be finite- dimensional. The symbol -- denotes an arbitrary field and  denotes a finite field of size  . Symmetric, Skew-Symmetric and Alternate Forms We begin with the basic definition.

Definition Let = be a vector space over - . A mapping ºÁ »¢ = d = ¦ - is called a bilinear form if it is a linear function of each coordinate, that is, if º %b &Á'»~  º%Á'»b  º&Á'» and º'Á % b &» ~  º'Á %» b  º'Á &» A bilinear form is 1) symmetric if º%Á &» ~ º&Á %» for all %Á &  = . 2) skew-symmetric (or antisymmetric ) if º%Á &» ~ cº&Á %» for all %Á &  = . 240 Advanced Linear Algebra

3) alternate (or alternating ) if º%Á %» ~  for all %= . A bilinear form that is either symmetric, skew-symmetric, or alternate is referred to as an inner product and a pair ²=ÁºÁ»³ , where = is a vector space and ºÁ » is an inner product on = , is called a metric vector space or inner product space. If ºÁ » is symmetric then ²= Á ºÁ »³ (or just = ) is called an orthogonal geometry over - and if ºÁ» is alternate then ²=ÁºÁ»³ (or just = ) is called a symplectic geometry over - .

As an aside, the term symplectic , from the Greek for “intertwined” was introduced in 1939 by the famous mathematician Hermann Weyl in his book The Classical Groups, as a substitute for the term complex . According to the dictionary, symplectic means “relating to or being an intergrowth of two different minerals.” An example is ophicalcite , which is marble spotted with green serpentine.

Example 11.1 Minkowski space 44 is the four-dimensional real orthogonal geometry s with inner product defined by

º Á  » ~ º  Á  » ~ º33 Á  » ~  º44 Á  » ~ c º Á  » ~  for  £ 

 where ÁÃÁ 4 is the standard basis for s .

As is traditional, when the inner product is understood, we will use the phrase “let = be a metric vector space.”

The real inner products discussed in Chapter 9 are inner products in the present sense and have the additional property of being positive definite —a notion that does not even make sense if the base field is not ordered. Thus, a real inner product space is an orthogonal geometry. On the other hand, the complex inner products of Chapter 9, being sesquilinear, are not inner products in the present sense. For this reason, we prefer to use the term metric vector space rather than inner product space.

If :=: is a vector subspace of a metric vector space then inherits the metric structure from =:= . With this structure, we refer to as a subspace of .

The concepts of being symmetric, skew-symmetric and alternate are not independent. However, their relationship depends on the characteristic of the base field - , as do many other properties of metric vector spaces. In fact, the next theorem tells us that we do not need to consider skew-symmetric forms per Metric Vector Spaces: The Theory of Bilinear Forms 241 se, since skew-symmetry is always equivalent to either symmetry or alternateness.

Theorem 11.1 Let =- be a vector space over a field . 1) If char²- ³ ~  then symmetric¯ skew-symmetric alternate¬ skew-symmetric

2) If char²- ³ £  then alternate¯ skew-symmetric Also, the only form that is both alternate and symmetric is the zero form: º%Á&»~ for all %Á&= . Proof. First note that for any base field, if ºÁ » is alternate then  ~ º% b &Á % b &» ~ º%Á %» b º%Á &» b º&Á %» b º&Á &» ~ º%Á &» b º&Á %» Thus, º%Á &» b º&Á %» ~  or º%Á &» ~ cº&Á %» which shows that ºÁ » is skew-symmetric. Thus, alternate always implies skew- symmetric.

If char²- ³ ~  then c ~  and so the definitions of symmetric and skew- symmetric are equivalent. This proves 1). If char²- ³ £  and ºÁ » is skew- symmetric, then for any %  = , we have º%Á %» ~ cº%Á %» or º%Á %» ~  , which implies that º%Á%»~ . Hence, ºÁ» is alternate. Finally, if the form is alternate and symmetric, then it is also skew-symmetric and so º"Á #» ~ cº"Á #» for all "Á #  = and so º"Á #» ~  for all "Á #  = .

Example 11.2 The standard inner product on =²Á³ , defined by

²% ÁÃÁ% ³h²& ÁÃÁ& ³~% & bÄb% & is symmetric, but not alternate, since ²ÁÁÃÁ³h²ÁÁÃÁ³~£ 242 Advanced Linear Algebra

The Matrix of a Bilinear Form

If 8 ~²ÁÃÁ ³ is an ordered basis for a metric vector space = then the form ºÁ » is completely determined by the  d  matrix of values

48 ~ ²Á ³ ~ ²º  Á   »³ This is referred to as the matrix of the form ºÁ » with respect to the ordered basis 88 . We also refer to 4=8 as the matrix of with respect to and write 4²=³8 when the space needs emphasis.

Observe that multiplication of the coordinate matrix of a vector by 48 produces a vector of inner products, to wit, if %~'  then

vyº Á %» 4´%µ~88 Å wzº Á %» and

! ´%µ8 48 ~  º%Á  » Ä º%Á  »

It follows that if &~  then

vy  ! ´%µ8 488 ´&µ ~ º%Á  » Ä º%Á  » Å ~ º%Á &» wz 

! and this uniquely defines the matrix 488 , that is, if ´%µ8 (´&µ ~ º%Á &» then (~48.

Notice also that a form is symmetric if and only if the matrix 48 is symmetric, skew-symmetric if and only if 48 is skew-symmetric and alternate if and only if 48 is skew-symmetric and has 's on the main diagonal. The latter type of matrix is referred to as alternate .

Now let us see how the matrix of a form behaves with respect to a change of basis. Let 9 ~²ÁÃÁ ³ be an ordered basis for = . Recall from Chapter 2 that the change of basis matrix 498Á , whose th column is ´µ 8 , satisfies

´#µ8989 ~ 4Á ´#µ Hence,

! º%Á &» ~ ´%µ8 488 ´&µ !! ~ ²´%µ998 4Á ³48989 ²4Á ´&µ ³ !! ~ ´%µ998 ²4Á 48989 4Á ³´&µ Metric Vector Spaces: The Theory of Bilinear Forms 243 and so

! 4~4989898Á 44Á This prompts the following definition.

Definition Two matrices (Á ) C ²- ³ are said to be congruent if there exists an invertible matrix 7 for which (~7! )7 The equivalence classes under congruence are called congruence classes .

Let us summarize.

Theorem 11.2 If the matrix of a bilinear form on = with respect to an ordered basis 8 ~²ÁÃÁ ³ is

48 ~ ²º Á  »³ then

! º%Á &» ~ ´%µ8 488 ´&µ

Furthermore, if 9 ~²ÁÃÁ ³ is an ordered basis for = then ! 4~4989898Á 44Á where 498Á is the change of basis matrix from 98 to .

We have shown that if two matrices represent the same bilinear form on = , they must be congruent. Conversely, congruent matrices represent the same bilinear form on =)~4= . For suppose that 8 represents a bilinear form on , with respect to the ordered basis 8 and that (~7! )7 where 7 is nonsingular. We saw in Chapter 2 that there is an ordered basis 9 for = with the property that

7~498Á and so

! (~498Á 4898 4 Á

Thus, (~49 represents the same form with respect to 9 .

Theorem 11.3 Two matrices () and represent a bilinear form ºÁ»= on if and only if they are congruent, in which case they represent the same set of bilinear forms on = . 244 Advanced Linear Algebra

In view of the fact that congruent matrices have the same rank, we may define the rank of a bilinear form (or of = ) to be the rank of any matrix that represents that form. The Discriminant of a Form If () and are congruent matrices then det²(³ ~ det ²7! (7 ³ ~ det ²7 ³ det ²)³ and so det²(³ and det ²)³ differ by a square factor. The discriminant of a bilinear form is the set of all determinants of the matrices that represent the form under all choices of ordered bases. Thus, if det²(³ ~  for some matrix ( representing the form then the discriminant of the form is the set " ~- ~¸ “£ -¹

Quadratic Forms There is a close link between symmetric bilinear forms and another important type of function defined on a vector space.

Definition A quadratic form on a vector space =8¢=¦- is a map with the following properties: 1) For all -Á#= 8² #³ ~  8²#³ 2) The map

º"Á #»8 ~ 8²" b #³ c 8²"³ c 8²#³ is a (symmetric) bilinear form.

Thus, every quadratic form 8º"Á#» defines a symmetric bilinear form 8 . On the other hand, if char²- ³ £  and if ºÁ » is a symmetric bilinear form on = then we can define a quadratic form 8 by  8²%³ ~ º%Á %»  We leave it to the reader to verify that this is indeed a quadratic form. Moreover, if 8 is defined from a bilinear form in this way then the bilinear form associated with 8 is Metric Vector Spaces: The Theory of Bilinear Forms 245

º"Á #»8 ~ 8²" b #³ c 8²"³ c 8²#³  ~ º" b #Á " b #» c º"Á "» c º#Á #»   ~ º"Á #» b º#Á "» ~ º"Á #»  which is the original bilinear form. In other words, the maps ºÁ » ¦ 8 and 8 ¦ ºÁ »8 are inverses and so there is a one-to-one correspondence between symmetric bilinear forms on == and quadratic forms on . Put another way, knowing the quadratic form is equivalent to knowing the corresponding bilinear form.

Again assuming that char²- ³ £  , if 8 ~ ²# Á à Á # ³ is an ordered basis for an orthogonal geometry == and if the matrix of the symmetric form on is 48 ~ ²Á ³ then for % ~' %  #  ,

!  8²%³~ º%Á%»~ ´%µ8 488 ´%µ ~ Á %%   Á  and so 8²%³ is a homogeneous polynomial of degree 2 in the coordinates % . (The term “form” means homogeneous polynomial —hence the term quadratic form.) Orthogonality As we will see, not all metric vector spaces behave as nicely as real inner product spaces and this necessitates the introduction of a new set of terminology to cover various types of behavior. (The base field - is the culprit, of course.) The most striking differences stem from the possibility that º%Á %» ~  for a nonzero vector %= .

The following terminology should be familiar.

Definition A vector %&%ž&º%Á&»~ is orthogonal to a vector , written , if . A vector %= is orthogonal to a subset : of = , written %ž: , if º%Á »~ for all : . A subset : of = is orthogonal to a subset ; of = , written :ž; , if º Á !» ~  for all  : and !  ; . The orthogonal complement of a subset ?=? of a metric vector space , denoted by ž , is the subspace ?~¸#=“#ž?¹ž Note that regardless of whether the form is symmetric or alternate (and hence skew-symmetric), orthogonality is a symmetric relation, that is, %ž& implies &ž%. Indeed, this is precisely why we restrict attention to these two types of bilinear forms. We will have more to say about this issue momentarily. 246 Advanced Linear Algebra

There are two types of degenerate behaviors that a vector may possess: It may be orthogonal to itself or, worse yet, it may be orthogonal to every vector in = . With respect to the former, we have the following terminology.

Definition Let ²=ÁºÁ»³ be a metric vector space. 1) A nonzero %  = is isotropic (or null ) if º%Á %» ~  ; otherwise it is nonisotropic. 2) = is nonisotropic (also called anisotropic ) if it contains no isotropic vectors. 3) = is isotropic if it contains at least one isotropic vector. 4) == is totally isotropic (that is, symplectic) if all vectors in are isotropic.

With respect to the latter (and more severe) form of degeneracy, we have the following terminology.

Definition Let ²=ÁºÁ»³ be a metric vector space. 1) The set ==ž of all degenerate vectors is called the radical of and written rad²= ³ ~ = ž

2) = is nonsingular , or nondegenerate , if rad ²= ³ ~ ¸¹ . 3) = is singular , or degenerate , if rad ²= ³ £ ¸¹ . 4) =²=³~= is totally singular or totally degenerate if rad .

Let us make a few remarks about these terms. Some of the above terminology is not entirely standard, so care should be exercised in reading the literature. Also, it is not hard to see that a metric vector space = is nonsingular if and only if the matrix 48 is nonsingular, for any ordered basis 8 .

If ##- is an isotropic vector then so is for all . This can be expressed by saying that the set 0== of isotropic vectors in is a cone in .

With respect to subspaces, to say that a subspace := of is totally degenerate, for example, means that : is totally degenerate as a metric vector space in its own right and so each vector in :: is orthogonal to all other vectors in , not necessarily in = . In fact, we have

rad²:³ ~ :žž: ~ : q : = where the symbols žž:= and refer to the orthogonal complements with respect to := and , respectively. Metric Vector Spaces: The Theory of Bilinear Forms 247

Example 11.3 Recall that =²Á³ is the set of all ordered  -tuples, whose components come from the finite field - . It is easy to see that the subspace : ~ ¸Á Á Á ¹ of = ²Á ³ has the property that : ~ :ž . Note also that = ²Á ³ is nonsingular and yet the subspace : is totally singular.

The following result explains why we restrict attention to symmetric or alternate forms (which includes skew-symmetric forms).

Theorem 11.4 Let ºÁ » be a bilinear form on = . Then orthogonality is a symmetric relation, that is, %ž& ¬ &ž% (11.1) if and only if ºÁ » is either symmetric or alternate, that is, if and only if = is a metric vector space. Proof. It is clear that (11.1) holds if ºÁ » is symmetric. If ºÁ » is alternate then it is skew-symmetric and so (11.1) also holds. For the converse, assume that (11.1) holds.

Let us introduce the notation % › & to mean that º%Á &» ~ º&Á %» and the notation % › = to mean that º%Á #» ~ º#Á %» for all #  = . For %Á &Á '  = , let $ ~ º%Á &»' c º%Á '»& Then %ž$ and so by (11.1) we have $ž% , that is, º%Á &»º'Á %» c º%Á '»º&Á %» ~  From this we deduce that % › & ¬ º%Á &»²º'Á %» c º%Á '»³ ~  ¬º%Á&»~ or %›' for all '= It follows that %›&¬%ž& or ²%›= and &›=³ (11.2) Of course, %›% and so for all %= %ž% or %›= (11.3) Now we can show that if == is not symmetric, it must be alternate. If is not symmetric, there exist "Á#= for which "›#\\\ . Thus, "›= and #›= , which implies by (11.3) that "# and are both isotropic. We wish to show that all vectors in = are isotropic. 248 Advanced Linear Algebra

According to (11.3), if $›=\ , then $ is isotropic. On the other hand, if $›= then to see that $%& is also isotropic, we use the fact that if and are orthogonal isotropic vectors, then %c& is also isotropic.

In particular, consider the vectors $b" and " . We have seen that " is isotropic. The fact that $›= implies $›" and $›# and so (11.2) gives $ž" and $ž#. Hence, ²$b"³ž" . To see that $b" is isotropic, note that º$ b "Á #» ~ º"Á #» £ º#Á "» ~ º#Á $ b "» and so ²$b"³›=\ . Hence, (11.3) implies that $b" is isotropic. Thus, " and $b" are orthogonal isotropic vectors, and so $~²$b"³c" is also isotropic. Linear Functionals Recall that the Riesz representation theorem says that for any linear functional  on a finite-dimensional real or complex inner product space = , there is a vector 9= , which we called the Riesz vector for  , that represents  , in the sense that

²#³ ~ º#Á 9 » for all #= . A similar result holds for nonsingular metric vector spaces.

Let =-%= be a metric vector space over . Let and consider the “inner product on the right” map %¢= ¦- defined by

%²#³ ~ º#Á %» This is easily seen to be a linear functional and so we can define a function ¢= ¦=i by

²%³ ~ % The bilinearity of the form insures that  is linear and the kernel of is

ž ker² ³~¸%=“% ~¹~¸%=“º#Á%»~ for all #=¹~= Hence, if = is nonsingular then ker ² ³ ~ =ž ~ ¸¹ and so is injective. Moreover, since dim²= ³ ~ dim ²=i ³ , it follows that  is surjective and so is an isomorphism from == onto i . This implies that every linear functional on = has the form % , for a unique %= . We have proved the Riesz representation theorem for finite-dimensional nonsingular metric vector spaces.

Theorem 11.5 ( The Riesz representation theorem ) Let = be a finite- dimensional nonsingular metric vector space. The linear functional ¢= ¦=i defined by

²%³ ~ % Metric Vector Spaces: The Theory of Bilinear Forms 249

i where %²#³ ~ º#Á %» for all #  = , is an isomorphism from = to = . It follows i that for each = there exists a unique vector %= for which ~% , that is, ²#³ ~ º#Á %» for all #= .

The requirement that == be nonsingular is necessary. As a simple example, if is totally singular, then no nonzero linear functional could possibly be represented by an inner product.

We would like to extend the Riesz representation theorem to the case of subspaces of a metric vector space. The Riesz representation theorem applies to nonsingular metric vector spaces. Thus, if := is a nonsingular subspace of , the Riesz representation theorem applies to :: and so all linear functionals on have the form of an inner product by a (unique) element of : . This is nothing new.

As long as =: is nonsingular, even if is singular, we can still say something very useful. The reason is that any linear functional :i can be extended to a linear functional = on (perhaps in many ways) and since = is nonsingular, the extension = has the form of inner product by a vector in , that is, ²#³ ~ º#Á %» for some %= . Hence,  also has this form, where its “Riesz vector” is an element of =: , not necessarily . Here is the formal statement.

Theorem 11.6 Let =:= be a metric vector space and let be a subspace of . If either =: or is nonsingular, the linear transformation  ¢=¦:i defined by

²%³ ~%: O

i where %²#³ ~ º#Á %» , is surjective. Hence, for any linear functional   : there is a (not necessarily unique) vector %  = for which ² ³ ~ º Á %» . Moreover, if :%: is nonsingular then can be taken from , in which case it is unique. Orthogonal Complements and Orthogonal Direct Sums If : is a subspace of a real inner product space, then the projection theorem says that the orthogonal complement ::ž of is a true vector space complement of :, that is, =~:p:ž Hence, the term orthogonal complement is justified. However, in general metric vector spaces, an orthogonal complement may not be a vector space 250 Advanced Linear Algebra complement. In fact, Example 11.3 shows that we may have the opposite extreme, that is, :~:ž . As we will see, the orthogonal complement of : is a true complement if and only if : is nonsingular.

Definition A metric vector space = is the orthogonal direct sum of the subspaces :; and , written =~:p; if =~:l; and :ž;.

In a real inner product space, if =~:p; then ;~:ž . However, in a metric vector space in general, we may have a proper inclusion ;‰:žž . (In fact, : may be all of = .)

Many nice properties of orthogonality in real inner product spaces do carry over to nonsingular metric vector spaces. The next result shows that the restriction to nonsingular spaces is not that severe.

Theorem 11.7 Let = be a metric vector space. Then = ~rad ²= ³ p : where :²=³ is nonsingular and rad is totally singular. Proof. Ignoring the metric structure for a moment, we know that all subspaces of a vector space, including rad²= ³ , have a complement, say = ~ rad ²= ³ l : . But rad²= ³ ž : and so = ~ rad ²= ³ p : . To see that : is nonsingular, if %rad ²:³ then %ž: and so %ž= , which implies that % rad ²= ³ q : ~ ¸¹, that is, % ~  . Hence, rad ²:³ ~ ¸¹ and : is nonsingular.

Under the assumption of nonsingularity of = , we get many nice properties, just short of the projection theorem. The first property in the next theorem is key: It says that if := is a subspace of a nonsingular space , then the orthogonal complement of : always has the “correct” dimension, even if it is not well behaved with respect to its intersection with : , that is, dim²:ž ³ ~ dim ²= ³ c dim ²:³ just as in the case of a real inner product space.

Theorem 11.8 Let ==: be a nonsingular metric vector space and let be any subspace of = . Then 1) dim²:³ b dim ²:ž ³ ~ dim ²= ³ 2) if =~:b:žž then =~:p: 3) :~:žž 4) rad²:³ ~ rad ²:ž ³ 5) :: is nonsingular if and only if ž is nonsingular Metric Vector Spaces: The Theory of Bilinear Forms 251

Proof. For part 1), the map ¢= ¦:i of Theorem 11.6 is surjective and

ž ker² ³~¸%=“%: O ~¹~¸%=“º Á%»~ for all :¹~: Thus, the rank-plus-nullity theorem implies that dim²:iž ³ b dim ²: ³ ~ dim ²= ³ However, dim²:i ³ ~ dim ²:³ and so part 1) follows.

For part 2), we have using part 1) dim²= ³ ~ dim ²: b :ž ³ ~²:³b²:³c²:q:³dim dimžž dim ~²=³c²:q:³dim dim ž and so : q :ž ~ ¸¹ .

For part 3), part 1) implies that dim²:³ b dim ²:ž ³ ~ dim ²= ³ and dim²:žžž ³ b dim ²: ³ ~ dim ²= ³ and so dim²:žž ³ ~ dim ²:³ . But : ‹ : žž and so equality holds.

For part 4), we have rad²:³ ~ : q :žžžžž ~ : q : ~ rad ²: ³ and part 5) follows from part 4).

The previous theorem cannot in general be strengthened. Consider the two- dimensional metric vector space =~span²"Á #³ where º"Á "» ~ Á º"Á #» ~ Á º#Á #» ~  Let :~span ²"³ . Then :žž ~ span ²#³ . Now, : is nonsingular but : is singular and so 5) does not hold. Also, rad²:³ ~ ¸¹ and rad ²:žž ³ ~ : and so 4) fails. Finally, :~=£:žž and so 3) fails.

On the other hand, we should note that if :: is singular, then so is ž , regardless of whether =: is singular or nonsingular. To see this, note that if is singular then there is a nonzero rad²:³ ~ :q:žžž . Hence, :‹ : and so :žžž q: ~rad ²: ž ³ , which implies that : ž is singular.

Now let us state the projection theorem for arbitrary metric vector spaces. 252 Advanced Linear Algebra

Theorem 11.9 Let : be a subspace of a finite-dimensional metric vector space = . Then =~:p:ž if and only if : is nonsingular, that is, if and only if : q :ž ~ ¸¹ . Proof. If =~:p:ž then by definition of orthogonal direct sum, we have rad²:³ ~ : q :ž ~ ¸¹ and so : is nonsingular. Conversely, if : is nonsingular, then : q :ž ~ ¸¹ and so :p:ž exists. Now, the same proof used in part 1) of the previous theorem works if := is nonsingular (even if is singular). To wit, the map ¢= ¦:i of Theorem 11.6 is surjective and

ž ker² ³~¸%=“%: O ~¹~¸%=“º Á%»~ for all :¹~: Thus, the rank-plus-nullity theorem gives dim²:iž ³ b dim ²: ³ ~ dim ²= ³ But dim²:i ³ ~ dim ²:³ and so dim²: p :žž ³ ~ dim ²:³ b dim ²: ³ ~ dim ²= ³ It follows that =~:p:ž . Isometries We now turn to a discussion of structure-preserving maps on metric vector spaces.

Definition Let = and > be metric vector spaces. We use the same notation ºÁ » for the bilinear form on each space. A bijective linear map ¢= ¦> is called an isometry if º "Á #» ~ º"Á #» for all vectors "#= and in . If an isometry exists from => to , we say that = and >=š> are isometric and write . It is evident that the set of all isometries from == to forms a group under composition.

If == is a nonsingular orthogonal geometry, an isometry of is called an orthogonal transformation. The set E²= ³ of all orthogonal transformations on == is a group under composition, known as the orthogonal group of .

If == is a nonsingular symplectic geometry, an isometry of is called a symplectic transformation. The set Sp²= ³ of all symplectic transformations on == is a group under composition, known as the symplectic group of . Metric Vector Spaces: The Theory of Bilinear Forms 253

Here are a few of the basic properties of isometries.

Theorem 11.10 Let B²=Á>³ be a linear transformation between finite- dimensional metric vector spaces => and . 1) Let 8~¸#ÁÃÁ# ¹ be a basis for = . Then is an isometry if and only if is bijective and

º # Á # » ~ º#  Á # » for all Á  . 2) If =²-³£ is orthogonal and char then  is an isometry if and only if it is bijective and º ²#³Á ²#³» ~ º#Á #» for all #= . 3) Suppose that  is an isometry and =~:p:žž and >~;p; . If ²:³ ~ ; then ²:žž ³ ~ ; . In particular, if B  ²= ³ is an isometry and =~:p:žž then if : is  -invariant, so is : . Proof. We prove part 3) only. To see that ²:žž ³ ~ ; , if '  : ž and !  ; then since ; ~ ²:³ , we can write ! ~ ² ³ for some  : and so º ²'³Á!»~º²'³Á ² ³»~º'Á »~ whence ²:žž ³ ‹ ; . But since dim ² ²: ž ³³ ~ dim ²; ž ³ , we deduce that ²:žž ³ ~ ; . Hyperbolic Spaces A special type of two-dimensional metric vector space plays an important role in the structure theory of metric vector spaces.

Definition Let ="Á#= be a metric vector space. If have the property that º"Á "» ~ º#Á #» ~ Á º"Á #» ~  the ordered pair ²"Á #³ is called a hyperbolic pair . Note that º#Á "» ~  if = is an orthogonal geometry and º#Á "» ~ c if = is symplectic. In either case, the subspace / ~span ²"Á #³ is called a hyperbolic plane and any space of the form

> ~/ pÄp/ where each /²"Á#³ is a hyperbolic plane, is called a hyperbolic space . If  is a hyperbolic pair for /²"Á#ÁÃÁ"Á#³ then we refer to the basis for > as a hyperbolic basis. (In the symplectic case, the usual term is symplectic basis .)

Note that any hyperbolic space > is nonsingular. 254 Advanced Linear Algebra

In the orthogonal case, hyperbolic planes can be characterized by their degree of isotropy, so to speak. (In the symplectic case, all spaces are totally isotropic by definition.) Indeed, we leave it as an exercise to prove that a two-dimensional nonsingular orthogonal geometry $V$ is a hyperbolic plane if and only if $V$ contains exactly two one-dimensional totally isotropic (equivalently: totally degenerate) subspaces. Put another way, the cone of isotropic vectors is the union of two one-dimensional subspaces of $V$.

Nonsingular Completions of a Subspace

Let $U$ be a subspace of a nonsingular metric vector space $V$. If $U$ is singular, it is of interest to find a minimal nonsingular extension of $U$, that is, a minimal nonsingular subspace of $V$ containing $U$. Such extensions of $U$ are called nonsingular completions of $U$.

Theorem 11.11 (Nonsingular extension theorem) Let $V$ be a nonsingular metric vector space over $F$. We assume that $\operatorname{char}(F) \neq 2$ when $V$ is orthogonal.
1) Let $S$ be a subspace of $V$. For each isotropic vector $v \in S^{\perp} \setminus S$, there is a hyperbolic plane $H = \operatorname{span}(v, z)$ contained in $S^{\perp}$. Hence, $H \oplus S$ is an extension of $S$ containing $v$.
2) Let $U$ be a subspace of $V$ and write $U = \operatorname{rad}(U) \oplus W$, where $W$ is nonsingular and $\{v_1, \dots, v_k\}$ is a basis for $\operatorname{rad}(U)$. Then there is a hyperbolic space $H_1 \oplus \dots \oplus H_k$ with hyperbolic basis $(v_1, z_1, \dots, v_k, z_k)$ for which
$$\overline{U} = H_1 \oplus \dots \oplus H_k \oplus W$$
is a nonsingular extension of $U$, called a nonsingular completion of $U$.
Proof. For part 1), the nonsingularity of $V$ implies that $S^{\perp\perp} = S$, and so $v \notin S$ is equivalent to $v \notin S^{\perp\perp}$. Hence, there is a vector $x \in S^{\perp}$ for which $\langle v, x\rangle \neq 0$. If $V$ is symplectic, then we can take $z = (1/\langle v, x\rangle)x$. If $V$ is orthogonal, let $z = \alpha v + \beta x$. The conditions defining $(v, z)$ as a hyperbolic pair are (since $v$ is isotropic)
$$1 = \langle v, z\rangle = \langle v, \alpha v + \beta x\rangle = \beta\langle v, x\rangle$$
and
$$0 = \langle z, z\rangle = \langle \alpha v + \beta x, \alpha v + \beta x\rangle = 2\alpha\beta\langle v, x\rangle + \beta^2\langle x, x\rangle$$
Since $\langle v, x\rangle \neq 0$, the first of these equations can be solved for $\beta$, and since $\operatorname{char}(F) \neq 2$, the second can then be solved for $\alpha$. Thus, in either case, a vector $z \in S^{\perp}$ exists for which $(v, z)$ is a hyperbolic pair and $\operatorname{span}(v, z) \subseteq S^{\perp}$.

For part 2), we proceed by induction on $k = \dim(\operatorname{rad}(U))$. If $k = 1$ then $v_1$ is isotropic and $v_1 \in W^{\perp} \setminus W$. Hence, part 1) applied to $S = W$ implies that there is a hyperbolic plane $H_1 = \operatorname{span}(v_1, z_1)$ for which $H_1 \oplus W$ is an extension of $W$ containing $\operatorname{span}(v_1) = \operatorname{rad}(U)$. Hence, part 2) holds when $k = 1$.

Let us assume that the result is true when $\dim(\operatorname{rad}(U)) < k$ and suppose that $\dim(\operatorname{rad}(U)) = k$. Let
$$U_1 = \operatorname{span}(v_2, \dots, v_k) \oplus W$$
Then $v_1$ is (still) isotropic and $v_1 \in U_1^{\perp} \setminus U_1$. Hence, we may apply part 1) to the subspace $S = U_1$ to deduce the existence of a hyperbolic plane $H_1 = \operatorname{span}(v_1, z_1)$ contained in $U_1^{\perp}$.

Now, since $H_1$ is nonsingular, we have
$$V = H_1 \oplus H_1^{\perp}$$
Since $H_1 \subseteq U_1^{\perp}$, it follows that $U_1 = U_1^{\perp\perp} \subseteq H_1^{\perp}$. Thus, we may apply the induction hypothesis to the subspace $U_1$ of the nonsingular space $H_1^{\perp}$, whose radical $\operatorname{span}(v_2, \dots, v_k)$ has dimension $k - 1$. It follows that $H_1 \oplus H_2 \oplus \dots \oplus H_k \oplus W$ is the desired extension of $U$.

Note that if $\overline{U}$ is a nonsingular completion of $U$ then
$$\dim(\overline{U}) = \dim(U) + \dim(\operatorname{rad}(U))$$
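For example, if $H = \operatorname{span}(u, z)$ is a hyperbolic plane and $U = \operatorname{span}(u)$, then $U = \operatorname{rad}(U)$ is totally degenerate and $H$ itself is a nonsingular completion of $U$, in agreement with the formula: $\dim(H) = 2 = \dim(U) + \dim(\operatorname{rad}(U))$.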

Theorem 11.12 Let $V$ be a nonsingular metric vector space and let $U$ be a subspace of $V$. The following are equivalent:
1) $W$ is a nonsingular completion of $U$
2) $W$ is a minimal nonsingular extension of $U$
3) $W$ is a nonsingular extension of $U$ and $\dim(W) = \dim(U) + \dim(\operatorname{rad}(U))$
Moreover, any two nonsingular completions of $U$ are isometric.
Proof. If 1) holds and if $U \subseteq X \subseteq W$ where $X$ is nonsingular, then we may apply the nonsingular extension theorem to $U$ as a subspace of $X$, obtaining a nonsingular completion $\overline{U}$ of $U$ inside $X$. But $\overline{U}$ and $W$ have the same dimension and so must be equal. Hence, $X = W$. Thus $W$ is a minimal nonsingular extension of $U$ and 2) holds. If 2) holds then we have $U \subseteq \overline{U} \subseteq W$, where $\overline{U}$ is a nonsingular completion of $U$. But the minimality of $W$ implies that $\overline{U} = W$ and so 3) holds. If 3) holds then again we have $U \subseteq \overline{U} \subseteq W$. But $\dim(\overline{U}) = \dim(W)$ and so $W = \overline{U}$ is a nonsingular completion of $U$ and 1) holds.

If $X = \mathcal{H} \oplus W$ and $Y = \mathcal{H}' \oplus W$ are nonsingular completions of $U = W \oplus \operatorname{rad}(U)$, then $\mathcal{H}$ and $\mathcal{H}'$ are hyperbolic spaces of the same dimension and are therefore isometric. It follows that $X$ and $Y$ are isometric.

Extending Isometries to Nonsingular Completions

Let $V$ and $V'$ be isometric nonsingular metric vector spaces and let $U = W \oplus \operatorname{rad}(U)$ be a subspace of $V$, with nonsingular completion $\overline{U}$. If $\tau: U \to \tau(U)$ is an isometry, then it is a simple matter to extend $\tau$ to an isometry $\overline{\tau}$ from $\overline{U}$ onto a nonsingular completion of $\tau(U)$. To see this, let $\overline{U} = \mathcal{H} \oplus W$, where $W$ is nonsingular and $(u_1, v_1, \dots, u_k, v_k)$ is a hyperbolic basis for $\mathcal{H}$. Since $(u_1, \dots, u_k)$ is a basis for $\operatorname{rad}(U)$, it follows that $(\tau(u_1), \dots, \tau(u_k))$ is a basis for $\operatorname{rad}(\tau(U))$.

Now we complete $\tau(U) = \tau(W) \oplus \operatorname{rad}(\tau(U))$ to get
$$\overline{\tau(U)} = \mathcal{H}' \oplus \tau(W)$$
where $\mathcal{H}'$ has hyperbolic basis $(\tau(u_1), z_1, \dots, \tau(u_k), z_k)$. To extend $\tau$, simply set $\overline{\tau}(v_i) = z_i$ for all $i = 1, \dots, k$.

Theorem 11.13 Let $V$ and $V'$ be isometric nonsingular metric vector spaces and let $U$ be a subspace of $V$, with nonsingular completion $\overline{U}$. Any isometry $\tau: U \to \tau(U)$ can be extended to an isometry from $\overline{U}$ onto a nonsingular completion of $\tau(U)$.

The Witt Theorems: A Preview

There are two important theorems that are quite easy to prove in the case of real inner product spaces, but require more work in the case of metric vector spaces in general. Let $V$ and $V'$ be nonsingular isometric metric vector spaces over a field $F$. We assume that $\operatorname{char}(F) \neq 2$ if $V$ is orthogonal.

The Witt extension theorem says that if $S$ is a subspace of $V$ and $\tau: S \to \tau(S) \subseteq V'$ is an isometry, then $\tau$ can be extended to an isometry from $V$ to $V'$. The Witt cancellation theorem says that if $V = S \oplus S^{\perp}$ and $V' = T \oplus T^{\perp}$ then
$$S \approx T \implies S^{\perp} \approx T^{\perp}$$

We will prove these theorems in both the orthogonal and symplectic cases later in the chapter. For now, we simply want to show that it is easy to prove one Witt theorem from the other.

Suppose that the Witt extension theorem holds and assume that $V = S \oplus S^{\perp}$, $V' = T \oplus T^{\perp}$ and $S \approx T$. Then any isometry $\tau: S \to T$ can be extended to an isometry $\sigma$ from $V$ to $V'$. According to Theorem 11.10, we have $\sigma(S^{\perp}) = T^{\perp}$ and so $S^{\perp} \approx T^{\perp}$. Hence, the Witt cancellation theorem holds.

Conversely, suppose that the Witt cancellation theorem holds and let $\tau: S \to \tau(S) \subseteq V'$ be an isometry. Then we may extend $\tau$ to a nonsingular completion of $S$. Hence, we may assume that $S$ is nonsingular. Then
$$V = S \oplus S^{\perp}$$
Since $\tau$ is an isometry, $\tau(S)$ is also nonsingular and we can write
$$V' = \tau(S) \oplus \tau(S)^{\perp}$$
Since $S \approx \tau(S)$, Witt's cancellation theorem implies that $S^{\perp} \approx \tau(S)^{\perp}$. If $\sigma: S^{\perp} \to \tau(S)^{\perp}$ is an isometry, then the map $\rho: V \to V'$ defined by
$$\rho(u + v) = \tau(u) + \sigma(v)$$
for $u \in S$ and $v \in S^{\perp}$ is an isometry that extends $\tau$. Hence Witt's extension theorem holds.

The Classification Problem for Metric Vector Spaces

The classification problem for a class of metric vector spaces (such as the orthogonal or symplectic spaces) is the problem of determining when two metric vector spaces in the class are isometric. The classification problem is considered "solved," at least in a theoretical sense, by finding a set of canonical forms or a complete set of invariants for matrices under congruence.

To see why, suppose that $\tau: V \to W$ is an isometry and $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for $V$. Then $\mathcal{C} = (\tau(v_1), \dots, \tau(v_n))$ is an ordered basis for $W$ and
$$M_{\mathcal{B}}(V) = (\langle v_i, v_j\rangle) = (\langle \tau(v_i), \tau(v_j)\rangle) = M_{\mathcal{C}}(W)$$
Thus, the congruence class of matrices representing $V$ is identical to the congruence class of matrices representing $W$.

Conversely, suppose that $V$ and $W$ are metric vector spaces with the same congruence class of representing matrices. Then if $\mathcal{B} = (v_1, \dots, v_n)$ is an ordered basis for $V$, there is an ordered basis $\mathcal{C} = (w_1, \dots, w_n)$ for $W$ for which
$$(\langle v_i, v_j\rangle) = M_{\mathcal{B}}(V) = M_{\mathcal{C}}(W) = (\langle w_i, w_j\rangle)$$
Hence, the map $\tau: V \to W$ defined by $\tau(v_i) = w_i$ is an isometry from $V$ to $W$.

We have shown that two metric vector spaces are isometric if and only if they have the same congruence class of representing matrices. Thus, we can determine whether two metric vector spaces are isometric by representing each space with a matrix and determining whether these matrices are congruent, using a set of canonical forms or a complete set of invariants.

Symplectic Geometry

We now turn to a study of the structure of orthogonal and symplectic geometries and their isometries. Since the structure of symplectic geometries (and the study of that structure) is simpler than that of orthogonal geometries, we begin with the symplectic case. The reader who is interested only in the orthogonal case may omit this section.

Throughout this section, let $V$ be a nonsingular symplectic geometry.

The Classification of Symplectic Geometries

Among the simplest types of metric vector spaces are those that possess an orthogonal basis, that is, a basis $\mathcal{B} = \{u_1, \dots, u_n\}$ for which $\langle u_i, u_j\rangle = 0$ when $i \neq j$. For in this case, we may write $V$ as an orthogonal direct sum of one-dimensional subspaces
$$V = \operatorname{span}(u_1) \oplus \dots \oplus \operatorname{span}(u_n)$$
However, it is easy to see that a symplectic geometry $V$ has an orthogonal basis if and only if it is totally degenerate. For if $\mathcal{B}$ is an orthogonal basis for $V$ then $\langle u_i, u_j\rangle = 0$ for $i = j$ since the form is alternate, and for $i \neq j$ since the basis is orthogonal. It follows that $V$ is totally degenerate. Thus, no "interesting" symplectic geometries have orthogonal bases.

Thus, in searching for an orthogonal decomposition of $V$, we must look not at one-dimensional subspaces, but at two-dimensional subspaces. Given a nonzero $u \in V$, the nonsingularity of $V$ implies that there must exist a vector $v \in V$ for which $\langle u, v\rangle = \alpha \neq 0$. Replacing $v$ by $\alpha^{-1}v$, we have
$$\langle u, u\rangle = \langle v, v\rangle = 0, \qquad \langle u, v\rangle = 1 \quad\text{and}\quad \langle v, u\rangle = -1$$
and so $H = \operatorname{span}(u, v)$ is a hyperbolic plane and the matrix of $H$ with respect to the hyperbolic pair $\mathcal{B} = (u, v)$ is
$$M_{\mathcal{B}} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

Since $H$ is nonsingular, we can write
$$V = H \oplus H^{\perp}$$
where $H^{\perp}$ is also nonsingular. Hence, we may repeat the preceding decomposition in $H^{\perp}$, eventually obtaining an orthogonal decomposition of $V$ of the form
$$V = H_1 \oplus H_2 \oplus \dots \oplus H_k$$
where each $H_i$ is a hyperbolic plane. This proves the following structure theorem for symplectic geometries.

Theorem 11.14
1) A symplectic geometry has an orthogonal basis if and only if it is totally degenerate.
2) Any nonsingular symplectic geometry $V$ is a hyperbolic space, that is,
$$V = H_1 \oplus H_2 \oplus \dots \oplus H_k$$
where each $H_i$ is a hyperbolic plane. Thus, there is a basis for $V$ for which the matrix of the form is
$$J = \begin{pmatrix} 0 & 1 & & & \\ -1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & 1 \\ & & & -1 & 0 \end{pmatrix}$$
In particular, the dimension of $V$ is even.
3) Any symplectic geometry $V$ has the form $V = \operatorname{rad}(V) \oplus \mathcal{H}$, where $\mathcal{H}$ is a hyperbolic space and $\operatorname{rad}(V)$ is a totally degenerate space. The rank of the form is $\dim(\mathcal{H})$ and $V$ is uniquely determined up to isometry by its rank and its dimension. Put another way, up to isometry, there is precisely one symplectic geometry of each rank and dimension.

Symplectic forms are represented by alternate matrices, that is, skew-symmetric matrices with zero diagonal. Moreover, according to Theorem 11.14, each $n \times n$ alternate matrix is congruent to a matrix of the form
$$X_{2k,n-2k} = \begin{pmatrix} J_{2k} & 0 \\ 0 & 0_{n-2k} \end{pmatrix}$$
where $J_{2k}$ is the $2k \times 2k$ matrix $J$ above. Since the rank of $X_{2k,n-2k}$ is $2k$, no two such matrices are congruent.

Theorem 11.15 The set of $n \times n$ matrices of the form $X_{2k,n-2k}$ is a set of canonical forms for alternate matrices under congruence.
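For example, any $2 \times 2$ alternate matrix $M = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}$ with $a \neq 0$ is congruent to $X_{2,0} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$: taking $P = \operatorname{diag}(1, 1/a)$ gives
$$P^t M P = \begin{pmatrix} 1 & 0 \\ 0 & 1/a \end{pmatrix}\begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1/a \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$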

The previous theorems solve the classification problem for symplectic geometries: the rank and dimension of $V$ form a complete set of invariants, and the set of all matrices of the form $X_{2k,n-2k}$ is a set of canonical forms for alternate matrices under congruence.

Witt's Extension and Cancellation Theorems

We now prove the Witt theorems for symplectic geometries.

Theorem 11.16 (Witt's extension theorem) Let $V$ and $V'$ be nonsingular isometric symplectic geometries over a field $F$. Suppose that $S$ is a subspace of $V$ and $\tau: S \to \tau(S) \subseteq V'$ is an isometry. Then $\tau$ can be extended to an isometry from $V$ to $V'$.
Proof. According to Theorem 11.13, we can extend $\tau$ to a nonsingular completion of $S$, so we may simply assume that $S$ and $\tau(S)$ are nonsingular. Hence,
$$V = S \oplus S^{\perp} \quad\text{and}\quad V' = \tau(S) \oplus \tau(S)^{\perp}$$
To complete the extension of $\tau$ to $V$, we need only choose a hyperbolic basis
$$(s_1, t_1, \dots, s_k, t_k)$$
for $S^{\perp}$ and a hyperbolic basis
$$(s_1', t_1', \dots, s_k', t_k')$$
for $\tau(S)^{\perp}$ and define the extension by setting $\tau(s_i) = s_i'$ and $\tau(t_i) = t_i'$.

As a corollary to Witt's extension theorem, we have Witt's cancellation theorem.

Theorem 11.17 (Witt's cancellation theorem) Let $V$ and $V'$ be isometric nonsingular symplectic geometries over a field $F$. If $V = S \oplus S^{\perp}$ and $V' = T \oplus T^{\perp}$ then
$$S \approx T \implies S^{\perp} \approx T^{\perp}$$

The Structure of the Symplectic Group: Symplectic Transvections

To understand the nature of symplectic transformations on a nonsingular symplectic geometry $V$, we begin with the following definition.

Definition Let $V$ be a nonsingular symplectic geometry over $F$. Let $v \in V$ be nonzero and let $\lambda \in F$. The map $\tau_{v,\lambda}: V \to V$ defined by
$$\tau_{v,\lambda}(x) = x + \lambda\langle x, v\rangle v$$
is called the symplectic transvection determined by $v$ and $\lambda$.

The first thing to notice about a symplectic transvection $\tau_{v,\lambda}$ is that if $\lambda = 0$ then $\tau_{v,\lambda}$ is the identity, and if $\lambda \neq 0$ then $\tau_{v,\lambda}$ is the identity precisely on the subspace $\operatorname{span}(v)^{\perp}$, which is very large, in the sense of having codimension 1. Thus, despite the name, symplectic transvections are not highly complex maps. On the other hand, we should point out that since $v$ is isotropic, the subspace $\operatorname{span}(v)$ is singular and $\operatorname{span}(v) \cap \operatorname{span}(v)^{\perp} = \operatorname{span}(v)$. Hence, $\operatorname{span}(v)$ is not a vector space complement of the space $\operatorname{span}(v)^{\perp}$ upon which $\tau_{v,\lambda}$ is the identity. In other words, while we can write $V = \operatorname{span}(v)^{\perp} + U$ as a (nonorthogonal) direct sum, where $\dim(U) = 1$ and $\tau_{v,\lambda}|_{\operatorname{span}(v)^{\perp}} = \iota$, we cannot say that $U = \operatorname{span}(v)$.

Here are the basic properties of symplectic transvections.

Theorem 11.18 Let $\tau_{v,\lambda}$ be a symplectic transvection on $V$. Then
1) $\tau_{v,\lambda}$ is a symplectic transformation.
2) $\tau_{v,\lambda} = \iota$ if and only if $\lambda = 0$.
3) If $x \perp v$ then $\tau_{v,\lambda}(x) = x$. For $\lambda \neq 0$, $x \perp v$ if and only if $\tau_{v,\lambda}(x) = x$.
4) $\tau_{v,\lambda}\tau_{v,\mu} = \tau_{v,\lambda+\mu}$
5) $\tau_{v,\lambda}^{-1} = \tau_{v,-\lambda}$
6) For any symplectic transformation $\sigma$,
$$\sigma\tau_{v,\lambda}\sigma^{-1} = \tau_{\sigma(v),\lambda}$$
7) For $\rho \in F^{*}$,
$$\tau_{\rho v,\lambda} = \tau_{v,\rho^2\lambda}$$
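Property 4), for instance, follows from a direct computation using the fact that $v$ is isotropic: since $\langle \tau_{v,\mu}(x), v\rangle = \langle x, v\rangle + \mu\langle x, v\rangle\langle v, v\rangle = \langle x, v\rangle$, we have
$$\tau_{v,\lambda}\tau_{v,\mu}(x) = \tau_{v,\mu}(x) + \lambda\langle x, v\rangle v = x + (\lambda + \mu)\langle x, v\rangle v = \tau_{v,\lambda+\mu}(x)$$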

Note that if $U$ is a subspace of $V$ and if $\tau_{u,\lambda}$ is a symplectic transvection on $U$ then, by definition, $u \in U$. However, the map $\tau_{u,\lambda}$ can also be thought of as a symplectic transvection on $V$, defined by the same formula
$$\tau_{u,\lambda}(x) = x + \lambda\langle x, u\rangle u$$
where $x$ can now be any vector in $V$. Moreover, for any $z \in U^{\perp}$ we have $\tau_{u,\lambda}(z) = z$ and so $\tau_{u,\lambda}$ is the identity on $U^{\perp}$.

We now wish to prove that any symplectic transformation on a nonsingular symplectic geometry $V$ is the product of symplectic transvections. The proof is not difficult, but it is a bit lengthy, so we break it up into parts. Our first goal is to show that we can get from any hyperbolic pair to any other hyperbolic pair using a product of symplectic transvections.

Let us say that two hyperbolic pairs $(x, y)$ and $(w, z)$ are connected if there is a product $\sigma$ of symplectic transvections that carries $x$ to $w$ and $y$ to $z$, and write $\sigma: (x, y) \leftrightarrow (w, z)$, or just $(x, y) \leftrightarrow (w, z)$. It is clear that connectedness is an equivalence relation on the set of hyperbolic pairs.

Theorem 11.19 Let $V$ be a nonsingular symplectic geometry.
1) For every hyperbolic pair $(u, v)$ and nonzero vector $w \in V$, there is a vector $x$ for which $(u, v) \leftrightarrow (w, x)$.
2) Any two hyperbolic pairs $(u, v)$ and $(u, w)$ with the same first coordinate are connected.
3) Every pair $(u, v)$ and $(w, z)$ of hyperbolic pairs is connected.
Proof. For part 1), all we need to do is find a product $\sigma$ of symplectic transvections for which $\sigma(u) = w$, because an isometry maps hyperbolic pairs to hyperbolic pairs and so we can simply set $x = \sigma(v)$.

If $\langle u, w\rangle \neq 0$ then $u \neq w$ and
$$\tau_{u-w,\lambda}(u) = u + \lambda\langle u, u - w\rangle(u - w) = u - \lambda\langle u, w\rangle(u - w)$$
Taking $\lambda = 1/\langle u, w\rangle$ gives $\tau_{u-w,\lambda}(u) = w$, as desired.

Now suppose that $\langle u, w\rangle = 0$. If there is a vector $y$ that is not orthogonal to either $u$ or $w$, then by what we have just proved, there is a vector $x_1$ such that $(u, v) \leftrightarrow (y, x_1)$ and a vector $x$ for which $(y, x_1) \leftrightarrow (w, x)$. Then transitivity implies that $(u, v) \leftrightarrow (w, x)$.

But there is a nonzero vector $y \in V$ that is not orthogonal to either $u$ or $w$: there is a linear functional $f$ on $V$ for which $f(u) \neq 0$ and $f(w) \neq 0$, and the Riesz representation theorem implies that there is a nonzero vector $y$ such that $f(x) = \langle x, y\rangle$ for all $x \in V$.

For part 2), suppose first that $\langle v, w\rangle \neq 0$. Then $v \neq w$ and since $\langle u, v - w\rangle = 0$, we know that $\tau_{v-w,\lambda}(u) = u$. Also,
$$\tau_{v-w,\lambda}(v) = v + \lambda\langle v, v - w\rangle(v - w) = v - \lambda\langle v, w\rangle(v - w)$$
Taking $\lambda = 1/\langle v, w\rangle$ gives $\tau_{v-w,\lambda}(v) = w$, as desired. If $\langle v, w\rangle = 0$, then $\langle v, u + v\rangle \neq 0$ implies that $(u, v) \leftrightarrow (u, u + v)$, and $\langle u + v, w\rangle \neq 0$ implies that $(u, u + v) \leftrightarrow (u, w)$. It follows by transitivity that $(u, v) \leftrightarrow (u, w)$.

For part 3), parts 1) and 2) imply that there is a vector $x$ for which
$$(u, v) \leftrightarrow (w, x) \leftrightarrow (w, z)$$
as desired.

We can now show that the symplectic transvections generate the symplectic group.

Theorem 11.20 Every symplectic transformation on a nonsingular symplectic geometry $V$ is the product of symplectic transvections.
Proof. Let $\tau$ be a symplectic transformation on $V$. We proceed by induction on $n = \dim(V)$, which must be even.

If $n = 2$ then $V = H = \operatorname{span}(u, v)$ is a hyperbolic plane and, by the previous theorem, there is a product $\sigma$ of symplectic transvections on $V$ for which
$$\sigma: (u, v) \leftrightarrow (\tau(u), \tau(v))$$
Hence $\tau = \sigma$. This proves the result if $n = 2$. Assume that the result holds for all even dimensions less than $n$ and let $\dim(V) = n$.

Let $H = \operatorname{span}(u, v)$ be a hyperbolic plane in $V$ and write
$$V = H \oplus H^{\perp}$$
where $H^{\perp}$ is a nonsingular symplectic geometry of dimension $n - 2$.

Since $(\tau(u), \tau(v))$ is a hyperbolic pair, we again have a product $\sigma$ of symplectic transvections on $V$ for which
$$\sigma: (u, v) \leftrightarrow (\tau(u), \tau(v))$$
Thus $\sigma = \tau$ on the subspace $H$. Also, since $H$ is invariant under $\sigma^{-1}\tau$, so is $H^{\perp}$.

If we restrict $\sigma^{-1}\tau$ to $H^{\perp}$, we may apply the induction hypothesis to get a product $\rho$ of symplectic transvections on $H^{\perp}$ for which $\rho = \sigma^{-1}\tau$ on $H^{\perp}$. But since the vectors that define the symplectic transvections making up $\rho$ belong to $H^{\perp}$, we may extend $\rho$ to $V$, and $\rho = \iota$ on $H$. Hence, $\sigma\rho = \tau$ on $H^{\perp}$, and since $\sigma = \tau$ on $H$, we also have $\sigma\rho = \sigma = \tau$ on $H$. Thus, $\sigma\rho = \tau$ on $V$.

The Structure of Orthogonal Geometries: Orthogonal Bases

We have seen that no interesting (not totally degenerate) symplectic geometries have orthogonal bases. In contradistinction to the symplectic case, almost all interesting orthogonal geometries $V$ have orthogonal bases. The only problem arises when $V$ is also symplectic and $\operatorname{char}(F) = 2$.

In particular, if $V$ is orthogonal and symplectic, then it cannot possess an orthogonal basis unless it is totally degenerate. When $\operatorname{char}(F) \neq 2$, the only orthogonal, symplectic geometries are the totally degenerate ones, since the matrix of $V$ with respect to any basis is both symmetric and skew-symmetric, with zeros on the main diagonal, and so must be the zero matrix. However, when $\operatorname{char}(F) = 2$, such a nonzero matrix exists, for example
$$M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

Thus, there are orthogonal, symplectic geometries that are not totally degenerate when $\operatorname{char}(F) = 2$. These geometries do not have orthogonal bases and we will not consider them further.

Once we have an orthogonal basis for $V$, the natural question is: "How close can we come to obtaining an orthonormal basis?" Clearly, this is possible only if $V$ is nonsingular. As we will see, the answer to this question depends on the nature of the base field, and is different for algebraically closed fields, the real field and finite fields, the three cases that we will consider in this book.

We should mention that, even when $V$ has an orthogonal basis, the Gram-Schmidt orthogonalization process may not apply, because even nonsingular orthogonal geometries may have isotropic vectors, and so division by $\langle u, u\rangle$ is problematic.

For example, consider an orthogonal hyperbolic plane $H = \operatorname{span}(u, v)$ and assume that $\operatorname{char}(F) \neq 2$. Thus, $u$ and $v$ are isotropic and $\langle u, v\rangle = \langle v, u\rangle = 1$. The vector $u$ cannot be extended to an orthogonal basis, as would be possible for a real inner product space using the Gram-Schmidt process, for it is easy to see that the set $\{u, \alpha u + \beta v\}$ cannot be an orthogonal basis for any $\alpha, \beta \in F$. However, $H$ has an orthogonal basis, namely, $\{u + v, u - v\}$.

Orthogonal Bases

Let $V$ be an orthogonal geometry. If $V$ is also symplectic, then $V$ has an orthogonal basis if and only if it is totally degenerate. Moreover, when $\operatorname{char}(F) \neq 2$, these are the only types of orthogonal, symplectic geometries.

Now, let $V$ be an orthogonal geometry that is not symplectic. Hence, $V$ contains a nonisotropic vector $u_1$, the subspace $\operatorname{span}(u_1)$ is nonsingular and
$$V = \operatorname{span}(u_1) \oplus V_1$$
where $V_1 = \operatorname{span}(u_1)^{\perp}$. If $V_1$ is not symplectic, then we may decompose $V_1$ to get
$$V = \operatorname{span}(u_1) \oplus \operatorname{span}(u_2) \oplus V_2$$
This process may be continued until we reach a decomposition
$$V = \operatorname{span}(u_1) \oplus \dots \oplus \operatorname{span}(u_k) \oplus U$$
where $U$ is symplectic as well as orthogonal. (This includes the case $U = \{0\}$.)

If $\operatorname{char}(F) \neq 2$, then $U$ is totally degenerate. Thus, if $\mathcal{B} = (u_1, \dots, u_k)$ and $\mathcal{C}$ is any basis for $U$, then $\mathcal{B} \cup \mathcal{C}$ is an orthogonal basis for $V$.

When $\operatorname{char}(F) = 2$, we must work a bit harder. Since $U$ is symplectic, it has the form $U = \mathcal{H} \oplus \operatorname{rad}(U)$, where $\mathcal{H}$ is a hyperbolic space, and so
$$V = \operatorname{span}(u_1) \oplus \dots \oplus \operatorname{span}(u_k) \oplus \mathcal{H} \oplus \mathcal{D}$$
where $\mathcal{D}$ is totally degenerate and the $u_i$ are nonisotropic. If $\mathcal{B} = (u_1, \dots, u_k)$ and $\mathcal{C} = (x_1, y_1, \dots, x_h, y_h)$ is a hyperbolic basis for $\mathcal{H}$ and $\mathcal{S} = (z_1, \dots, z_m)$ is an ordered basis for $\mathcal{D}$, then the union
$$\mathcal{T} = \mathcal{B} \cup \mathcal{C} \cup \mathcal{S} = (u_1, \dots, u_k, x_1, y_1, \dots, x_h, y_h, z_1, \dots, z_m)$$
is an ordered basis for $V$. However, we can do better.

The following lemma says that, when $\operatorname{char}(F) = 2$, a pair of isotropic basis vectors (such as $x_1, y_1$) can be replaced by a pair of nonisotropic basis vectors, in the presence of a nonisotropic basis vector (such as $u_1$).

Lemma 11.21 Suppose that $\operatorname{char}(F) = 2$. Let $W$ be a three-dimensional orthogonal geometry with ordered basis $\mathcal{B} = (u, x, y)$ for which the matrix of the form with respect to $\mathcal{B}$ is
$$M_{\mathcal{B}} = \begin{pmatrix} \alpha & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
where $\alpha \neq 0$. Then the vectors
$$v_1 = u + x + y, \qquad v_2 = u + \alpha x, \qquad v_3 = u + (1 - \alpha)x + y$$
form an orthogonal basis of $W$ consisting of nonisotropic vectors.
Proof. It is straightforward to check that the vectors $v_1, v_2$ and $v_3$ are linearly independent and mutually orthogonal. Details are left to the reader.

Using the previous lemma, we can replace the vectors $\{u_k, x_1, y_1\}$ with the nonisotropic vectors $\{v_k, v_{k+1}, v_{k+2}\}$ and still have an ordered basis
$$\mathcal{T}_1 = (u_1, \dots, u_{k-1}, v_k, v_{k+1}, v_{k+2}, x_2, y_2, \dots, x_h, y_h, z_1, \dots, z_m)$$
for $V$. The replacement process can be repeated until the isotropic hyperbolic-basis vectors are absorbed, leaving an orthogonal basis of nonisotropic vectors together with the basis of the totally degenerate part.

Let us summarize.

Theorem 11.22 Let $V$ be an orthogonal geometry.
1) If $V$ is also symplectic, then $V$ has an orthogonal basis if and only if it is totally degenerate. (When $\operatorname{char}(F) \neq 2$, these are the only types of orthogonal, symplectic geometries. When $\operatorname{char}(F) = 2$, orthogonal, symplectic geometries that are not totally degenerate do exist.)
2) If $V$ is not symplectic, then $V$ has an ordered orthogonal basis $\mathcal{B} = (u_1, \dots, u_r, z_1, \dots, z_s)$ for which $\langle u_i, u_i\rangle = \alpha_i \neq 0$ and $\langle z_j, z_j\rangle = 0$. Hence, $M_{\mathcal{B}}$ has the diagonal form
$$M_{\mathcal{B}} = \operatorname{diag}(\alpha_1, \dots, \alpha_r, 0, \dots, 0)$$
with $r = \operatorname{rk}(M_{\mathcal{B}})$ nonzero entries and $s$ zeros on the diagonal.

As a corollary, we get a nice theorem about symmetric matrices.

Corollary 11.23 Let $M$ be a symmetric matrix that is not alternate if $\operatorname{char}(F) = 2$. Then $M$ is congruent to a diagonal matrix.

The Classification of Orthogonal Geometries: Canonical Forms

We now want to consider the question of improving upon Theorem 11.22. The diagonal matrices of this theorem do not form a set of canonical forms for congruence. In fact, if $\beta_1, \dots, \beta_r$ are nonzero scalars, then the matrix of $V$ with respect to the basis $\mathcal{C} = (\beta_1 u_1, \dots, \beta_r u_r, z_1, \dots, z_s)$ is
$$M_{\mathcal{C}} = \operatorname{diag}(\beta_1^2\alpha_1, \dots, \beta_r^2\alpha_r, 0, \dots, 0) \tag{11.5}$$
Hence, $M_{\mathcal{B}}$ and $M_{\mathcal{C}}$ are congruent diagonal matrices and, by a simple change of basis, we can multiply any diagonal entry by a nonzero square in $F$.

The determination of a set of canonical forms for symmetric (nonalternate when $\operatorname{char}(F) = 2$) matrices under congruence depends on the properties of the base field. Our plan is to consider three types of base fields: algebraically closed fields, the real field $\mathbb{R}$ and finite fields. Here is a preview of the forthcoming results.

1) When the base field $F$ is algebraically closed, there is an ordered basis $\mathcal{B}$ for which
$$M_{\mathcal{B}} = Z_{r,s} = \operatorname{diag}(\underbrace{1, \dots, 1}_{r}, \underbrace{0, \dots, 0}_{s})$$
If $V$ is nonsingular, then $M_{\mathcal{B}}$ is an identity matrix and $V$ has an orthonormal basis.
2) Over the real base field, there is an ordered basis $\mathcal{B}$ for which
$$M_{\mathcal{B}} = Z_{p,q,s} = \operatorname{diag}(\underbrace{1, \dots, 1}_{p}, \underbrace{-1, \dots, -1}_{q}, \underbrace{0, \dots, 0}_{s})$$
3) If $F$ is a finite field, there is an ordered basis $\mathcal{B}$ for which
$$M_{\mathcal{B}} = X_r(\lambda) = \operatorname{diag}(\underbrace{1, \dots, 1}_{r-1}, \lambda, 0, \dots, 0)$$
where $\lambda \in F$ is unique up to multiplication by a nonzero square, and if $\operatorname{char}(F) = 2$ then we can take $\lambda = 1$.

Now let us turn to the details.

Algebraically Closed Fields

If $F$ is algebraically closed then for every $\lambda \in F$, the polynomial $x^2 - \lambda$ has a root in $F$; that is, every element of $F$ has a square root in $F$. Therefore, we may choose $\beta_i = 1/\sqrt{\alpha_i}$ in (11.5), which leads to the following result.

Theorem 11.24 Let $V$ be an orthogonal geometry over an algebraically closed field $F$. Provided that $V$ is not symplectic as well when $\operatorname{char}(F) = 2$, $V$ has an ordered orthogonal basis $\mathcal{B} = (u_1, \dots, u_r, z_1, \dots, z_s)$ for which $\langle u_i, u_i\rangle = 1$ and $\langle z_j, z_j\rangle = 0$. Hence, $M_{\mathcal{B}}$ has the diagonal form
$$M_{\mathcal{B}} = Z_{r,s} = \operatorname{diag}(\underbrace{1, \dots, 1}_{r}, \underbrace{0, \dots, 0}_{s})$$
with $r$ ones and $s$ zeros on the diagonal. In particular, if $V$ is nonsingular then $V$ has an orthonormal basis.
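For example, over $\mathbb{C}$ the $1 \times 1$ matrix $(-1)$ is congruent to $(1)$: if $\langle u, u\rangle = -1$ then the vector $u' = u/i$ satisfies
$$\langle u', u'\rangle = \frac{1}{i^2}\langle u, u\rangle = \frac{-1}{-1} = 1$$
More generally, rescaling each $u_i$ by $1/\sqrt{\alpha_i}$ converts any nonsingular diagonal form into the identity.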

The matrix version of Theorem 11.24 follows.

Theorem 11.25 Let $\mathcal{S}$ be the set of all $n \times n$ symmetric matrices over an algebraically closed field $F$. If $\operatorname{char}(F) = 2$, we restrict $\mathcal{S}$ to the set of all symmetric matrices with at least one nonzero entry on the main diagonal.
1) Any matrix $M$ in $\mathcal{S}$ is congruent to a unique matrix of the form $Z_{r,s}$; in fact, $r = \operatorname{rk}(M)$ and $s = n - \operatorname{rk}(M)$.
2) The set of all matrices of the form $Z_{r,s}$ for $r + s = n$ is a set of canonical forms for congruence on $\mathcal{S}$.
3) The rank of a matrix is a complete invariant for congruence on $\mathcal{S}$.

The Real Field $\mathbb{R}$

If $F = \mathbb{R}$, we can choose $\beta_i = 1/\sqrt{|\alpha_i|}$, so that all nonzero diagonal elements in (11.5) will be either $1$ or $-1$.

Theorem 11.26 (Sylvester's law of inertia) Any orthogonal geometry $V$ over the real field $\mathbb{R}$ has an ordered orthogonal basis
$$\mathcal{B} = (u_1, \dots, u_p, v_1, \dots, v_q, z_1, \dots, z_s)$$
for which $\langle u_i, u_i\rangle = 1$, $\langle v_j, v_j\rangle = -1$ and $\langle z_l, z_l\rangle = 0$. Hence, the matrix $M_{\mathcal{B}}$ has the diagonal form
$$M_{\mathcal{B}} = Z_{p,q,s} = \operatorname{diag}(\underbrace{1, \dots, 1}_{p}, \underbrace{-1, \dots, -1}_{q}, \underbrace{0, \dots, 0}_{s})$$
with $p$ ones, $q$ negative ones and $s$ zeros on the diagonal.
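For example, over $\mathbb{R}$ the symmetric matrix $M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ represents the form $\langle x, y\rangle = x_1y_2 + x_2y_1$. With respect to the orthogonal basis $u_1 = (e_1 + e_2)/\sqrt{2}$, $v_1 = (e_1 - e_2)/\sqrt{2}$ we have $\langle u_1, u_1\rangle = 1$ and $\langle v_1, v_1\rangle = -1$, so $M$ is congruent to $Z_{1,1,0} = \operatorname{diag}(1, -1)$: the rank is 2, the signature is 0 and the inertia is $(1, 1, 0)$.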

Here is the matrix version of Theorem 11.26.

Theorem 11.27 Let $\mathcal{S}$ be the set of all $n \times n$ symmetric matrices over the real field $\mathbb{R}$.
1) Any matrix in $\mathcal{S}$ is congruent to a unique matrix of the form $Z_{p,q,s}$, for some $p, q$ and $s = n - p - q$.
2) The set of all matrices of the form $Z_{p,q,s}$ for $p + q + s = n$ is a set of canonical forms for congruence on $\mathcal{S}$.
3) Let $M \in \mathcal{S}$ and let $M$ be congruent to $Z_{p,q,s}$. The number $p + q$ is the rank of $M$, the number $p - q$ is the signature of $M$ and the triple $(p, q, s)$ is the inertia of $M$. The pair $(p, q)$, or equivalently the pair $(p + q, p - q)$, is a complete invariant under congruence on $\mathcal{S}$.
Proof. We need only prove the uniqueness statement in part 1). Let
$$\mathcal{B} = (u_1, \dots, u_p, v_1, \dots, v_q, z_1, \dots, z_s)$$
and
$$\mathcal{C} = (u_1', \dots, u_{p'}', v_1', \dots, v_{q'}', z_1', \dots, z_{s'}')$$
be ordered bases for which the matrices $M_{\mathcal{B}}$ and $M_{\mathcal{C}}$ have the form shown in Theorem 11.26. Since the ranks of these matrices must be equal, we have $p + q = p' + q'$ and so $s = s'$.

If $x \in \operatorname{span}(u_1, \dots, u_p)$ and $x \neq 0$ then
$$\langle x, x\rangle = \Big\langle \sum_i r_i u_i, \sum_j r_j u_j \Big\rangle = \sum_{i,j} r_i r_j \langle u_i, u_j\rangle = \sum_i r_i^2 > 0$$

On the other hand, if $y \in \operatorname{span}(v_1', \dots, v_{q'}')$ and $y \neq 0$ then
$$\langle y, y\rangle = \Big\langle \sum_i s_i v_i', \sum_j s_j v_j' \Big\rangle = \sum_{i,j} s_i s_j \langle v_i', v_j'\rangle = -\sum_i s_i^2 < 0$$

Hence, if $y \in \operatorname{span}(v_1', \dots, v_{q'}', z_1', \dots, z_{s'}')$ then $\langle y, y\rangle \leq 0$. It follows that
$$\operatorname{span}(u_1, \dots, u_p) \cap \operatorname{span}(v_1', \dots, v_{q'}', z_1', \dots, z_{s'}') = \{0\}$$
and so $p + (q' + s') \leq n$, that is, $p \leq p'$. By symmetry, $p' \leq p$ and so $p = p'$. Finally, since $p + q = p' + q'$, it follows that $q = q'$.

Finite Fields

To deal with the case of finite fields, we must know something about the distribution of squares in finite fields, as well as the possible values of the scalars $\langle v, v\rangle$.

Theorem 11.28 Let $F$ be a finite field with $q$ elements.
1) If $\operatorname{char}(F) = 2$ then every element of $F$ is a square.
2) If $\operatorname{char}(F) \neq 2$ then exactly half of the nonzero elements of $F$ are squares, that is, there are $(q - 1)/2$ nonzero squares in $F$. Moreover, if $x$ is any nonsquare in $F$ then all nonsquares have the form $r^2x$, for some $r \in F$.
Proof. Let $F^{*}$ be the multiplicative group of all nonzero elements of $F$ and let $(F^{*})^2 = \{a^2 \mid a \in F^{*}\}$ be the subgroup of all nonzero squares in $F$. The Frobenius map $\varphi: F^{*} \to (F^{*})^2$ defined by $\varphi(a) = a^2$ is a surjective group homomorphism, with kernel
$$\ker(\varphi) = \{a \in F \mid a^2 = 1\} = \{-1, 1\}$$
If $\operatorname{char}(F) = 2$, then $\ker(\varphi) = \{1\}$ and so $\varphi$ is bijective and $|F^{*}| = |(F^{*})^2|$, which proves part 1). If $\operatorname{char}(F) \neq 2$, then $|\ker(\varphi)| = 2$ and so $|F^{*}| = 2|(F^{*})^2|$, which proves the first statement of part 2). We leave proof of the last statement to the reader.
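For example, in $F = \mathbb{F}_7$ the nonzero squares are $1^2 = 6^2 = 1$, $2^2 = 5^2 = 4$ and $3^2 = 4^2 = 2$, giving $(7 - 1)/2 = 3$ nonzero squares $\{1, 2, 4\}$. The nonsquares are $\{3, 5, 6\} = \{3r^2 \mid r \in \mathbb{F}_7^{*}\}$, since $3 \cdot 1 = 3$, $3 \cdot 4 = 5$ and $3 \cdot 2 = 6$.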

Definition A bilinear form on $V$ is universal if for any nonzero $\beta \in F$ there exists a vector $v \in V$ for which $\langle v, v\rangle = \beta$.

Theorem 11.29 Let $V$ be an orthogonal geometry over a finite field $F$ with $\operatorname{char}(F) \neq 2$ and assume that $V$ has a nonsingular subspace of dimension at least 2. Then the bilinear form of $V$ is universal.
Proof. Theorem 11.22 implies that $V$ contains two linearly independent vectors $u$ and $v$ for which
$$\langle u, u\rangle = \alpha \neq 0, \quad \langle v, v\rangle = \gamma \neq 0, \quad \langle u, v\rangle = 0$$
Given any nonzero $\beta \in F$, we want to find $r$ and $s$ for which
$$\beta = \langle ru + sv, ru + sv\rangle = r^2\alpha + s^2\gamma$$
or $r^2\alpha = \beta - s^2\gamma$. If $A = \{r^2\alpha \mid r \in F\}$ then $|A| = (q + 1)/2$, since there are $(q - 1)/2$ nonzero squares and also we must consider $r = 0$. Also, if $B = \{\beta - s^2\gamma \mid s \in F\}$ then for the same reasons $|B| = (q + 1)/2$. It follows that $A \cap B$ cannot be the empty set and so there are $r$ and $s$ for which $r^2\alpha = \beta - s^2\gamma$, as desired.
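To illustrate, let $V = \mathbb{F}_3^2$ with the form given by $\operatorname{diag}(1, 1)$, so that $\alpha = \gamma = 1$, and let $\beta = 2$. Here $A = \{r^2 \mid r \in \mathbb{F}_3\} = \{0, 1\}$ and $B = \{2 - s^2 \mid s \in \mathbb{F}_3\} = \{2, 1\}$, so $A \cap B = \{1\}$; taking $r = s = 1$ gives $\langle (1,1), (1,1)\rangle = 1^2 + 1^2 = 2$, as promised.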

Now we can proceed with the business at hand.

Theorem 11.30 Let $V$ be an orthogonal geometry over a finite field $F$ and assume that $V$ is not symplectic if $\operatorname{char}(F) = 2$. If $\operatorname{char}(F) \neq 2$, let $\lambda$ be a fixed nonsquare in $F$. For any nonzero $\alpha \in F$, write
$$X_r(\alpha) = \operatorname{diag}(\underbrace{1, \dots, 1}_{r-1}, \alpha, 0, \dots, 0)$$
where $\operatorname{rk}(X_r(\alpha)) = r$.
1) If $\operatorname{char}(F) = 2$ then there is an ordered basis $\mathcal{B}$ for which $M_{\mathcal{B}} = X_r(1)$.
2) If $\operatorname{char}(F) \neq 2$, then there is an ordered basis $\mathcal{B}$ for which $M_{\mathcal{B}}$ equals $X_r(1)$ or $X_r(\lambda)$.
Proof. We can dispose of the case $\operatorname{char}(F) = 2$ quite easily: Referring to (11.5), since every element of $F$ has a square root, we may take $\beta_i = (\sqrt{\alpha_i})^{-1}$.

If $\operatorname{char}(F) \neq 2$, then Theorem 11.22 implies that there is an ordered orthogonal basis
$$\mathcal{B} = (u_1, \dots, u_r, z_1, \dots, z_s)$$
for which $\langle u_i, u_i\rangle = \alpha_i \neq 0$ and $\langle z_j, z_j\rangle = 0$. Hence, $M_{\mathcal{B}}$ has the diagonal form
$$M_{\mathcal{B}} = \operatorname{diag}(\alpha_1, \dots, \alpha_r, 0, \dots, 0)$$

Now, consider the nonsingular orthogonal geometry $V_1 = \operatorname{span}(u_1, u_2)$. According to Theorem 11.29, the form is universal when restricted to $V_1$. Hence, there exists a $v_1 \in V_1$ for which $\langle v_1, v_1\rangle = 1$.

Now, $v_1 = ru_1 + su_2$ for $r, s \in F$ not both 0, and we may swap $u_1$ and $u_2$ if necessary to ensure that $r \neq 0$. Hence,
$$\mathcal{B}_1 = (v_1, u_2, \dots, u_r, z_1, \dots, z_s)$$
is an ordered basis for $V$. Replacing $u_2$ by $u_2' = u_2 - \langle u_2, v_1\rangle v_1$ if necessary (which leaves the span unchanged, and is possible since $\langle v_1, v_1\rangle = 1 \neq 0$), we may further assume that the matrix of the form with respect to $\mathcal{B}_1$ is diagonal, with a 1 in the upper left entry. We can repeat the process with the subspace spanned by the next two basis vectors. Continuing in this way, we can find an ordered basis
$$\mathcal{C} = (v_1, v_2, \dots, v_r, z_1, \dots, z_s)$$
for which $M_{\mathcal{C}} = X_r(\mu)$ for some nonzero $\mu \in F$. Now, if $\mu$ is a square in $F$ then we can replace $v_r$ by $(1/\sqrt{\mu})v_r$ to get a basis $\mathcal{S}$ for which $M_{\mathcal{S}} = X_r(1)$. If $\mu$ is not a square in $F$, then $\mu = \rho^2\lambda$ for some $\rho \in F$, and so replacing $v_r$ by $(1/\rho)v_r$ gives a basis $\mathcal{S}$ for which $M_{\mathcal{S}} = X_r(\lambda)$.
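For example, over $\mathbb{F}_5$, where the squares are $\{1, 4\}$ and we may take $\lambda = 2$, the $1 \times 1$ matrix $(3)$ is congruent to $X_1(2)$: since $3 = 2 \cdot 2^2$, replacing a basis vector $u$ with $\langle u, u\rangle = 3$ by $(1/2)u = 3u$ gives $\langle 3u, 3u\rangle = 9 \cdot 3 = 27 = 2$ in $\mathbb{F}_5$.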

Theorem 11.31 Let $\mathcal{S}$ be the set of all $n \times n$ symmetric matrices over a finite field $F$. If $\operatorname{char}(F) = 2$, we restrict $\mathcal{S}$ to the set of all symmetric matrices with at least one nonzero entry on the main diagonal.
1) If $\operatorname{char}(F) = 2$ then any matrix in $\mathcal{S}$ is congruent to a unique matrix of the form $X_r(1)$, and the matrices $\{X_r(1) \mid r = 0, \dots, n\}$ form a set of canonical forms for $\mathcal{S}$ under congruence. Also, the rank is a complete invariant.
2) If $\operatorname{char}(F) \neq 2$, let $\lambda$ be a fixed nonsquare in $F$. Then any matrix in $\mathcal{S}$ is congruent to a unique matrix of the form $X_r(1)$ or $X_r(\lambda)$. The set $\{X_r(1), X_r(\lambda) \mid r = 0, \dots, n\}$ is a set of canonical forms for congruence on $\mathcal{S}$. (Thus, there are exactly two congruence classes for each rank $r \geq 1$.)

The Orthogonal Group

Having "settled" the classification question for orthogonal geometries over certain types of fields, let us turn to a discussion of the structure-preserving maps, that is, the isometries.

Rotations and Reflections

We have seen that if $\mathcal{B}$ is an ordered basis for $V$, then for any $x, y \in V$,
$$\langle x, y\rangle = [x]^t_{\mathcal{B}} M_{\mathcal{B}} [y]_{\mathcal{B}}$$
Now, for any $\tau \in \mathcal{L}(V)$, we have
$$\langle \tau x, \tau y\rangle = [\tau x]^t_{\mathcal{B}} M_{\mathcal{B}} [\tau y]_{\mathcal{B}} = [x]^t_{\mathcal{B}}([\tau]^t_{\mathcal{B}} M_{\mathcal{B}} [\tau]_{\mathcal{B}})[y]_{\mathcal{B}}$$
and so $\tau$ is an isometry if and only if
$$[\tau]^t_{\mathcal{B}} M_{\mathcal{B}} [\tau]_{\mathcal{B}} = M_{\mathcal{B}}$$
Taking determinants gives
$$\det(M_{\mathcal{B}}) = \det([\tau]_{\mathcal{B}})^2 \det(M_{\mathcal{B}})$$
Therefore, if $V$ is nonsingular then
$$\det([\tau]_{\mathcal{B}}) = \pm 1$$
Since the determinant is an invariant under similarity, we can make the following definition.

Definition Let $\tau$ be an isometry on a nonsingular orthogonal geometry $V$. The determinant of $\tau$ is the determinant of any matrix $[\tau]_{\mathcal{B}}$ representing $\tau$. If $\det(\tau) = 1$ then $\tau$ is called a rotation and if $\det(\tau) = -1$ then $\tau$ is called a reflection.

The set $\mathcal{O}^{+}(V)$ of rotations forms a subgroup of the orthogonal group $\mathcal{O}(V)$, and the surjective determinant map $\det: \mathcal{O}(V) \to \{-1, 1\}$ has kernel $\mathcal{O}^{+}(V)$. Hence, if $\operatorname{char}(F) \neq 2$, then $\mathcal{O}^{+}(V)$ is a normal subgroup of $\mathcal{O}(V)$ of index 2.

Symmetries

Recall that for a real (or complex) inner product space $V$, we defined a reflection to be a linear map $H_v$ for which

$$H_v(v) = -v, \qquad H_v|_{\operatorname{span}(v)^{\perp}} = \iota$$
The term symmetry is often used in the context of general orthogonal geometries.

In particular, suppose that $V$ is a nonsingular orthogonal geometry over $F$, where $\operatorname{char}(F) \neq 2$, and let $u \in V$ be nonisotropic. Write
$$V = \operatorname{span}(u) \oplus \operatorname{span}(u)^{\perp}$$

Then there is a unique isometry $\sigma_u$ with the properties

1) $\sigma_u(u) = -u$
2) $\sigma_u(x) = x$ for all $x \in \operatorname{span}(u)^{\perp}$

We can also write $\sigma_u = (-\iota) \oplus \iota$, that is,
$$\sigma_u(x + y) = -x + y$$
for all $x \in \operatorname{span}(u)$ and $y \in \operatorname{span}(u)^{\perp}$. It is easy to see that
$$\sigma_u(v) = v - \frac{2\langle v, u\rangle}{\langle u, u\rangle}u$$

The map $\sigma_u$ is called the symmetry determined by $u$.

Note that $u$ is required to be nonisotropic, since otherwise we would have $u \in \operatorname{span}(u)^{\perp}$ and so $-u = \sigma_u(u) = u$, which implies that $u = 0$. (Thus, symplectic geometries do not have symmetries.)

In the context of real inner product spaces, Theorem 10.11 says that if $\|v\| = \|w\| \neq 0$, then $H_{v-w}$ is the unique reflection sending $v$ to $w$, that is, $H_{v-w}(v) = w$. In the present context, we must be careful, since symmetries are defined for nonisotropic vectors only. Here is what we can say.

Theorem 11.32 Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$. If $u, v \in V$ have the same nonzero "length," that is, if
$$\langle u, u\rangle = \langle v, v\rangle \neq 0$$
then there exists a symmetry $\sigma$ for which
$$\sigma(u) = v \quad\text{or}\quad \sigma(u) = -v$$

Proof. In general, if $x$ and $y$ are orthogonal isotropic vectors, then $x + y$ and $x - y$ are also isotropic. Hence, since $u$ and $v$ are not isotropic, it follows that one of $u - v$ and $u + v$ must be nonisotropic. (Apply the general remark to $x = u + v$ and $y = u - v$, which are orthogonal since $\langle u + v, u - v\rangle = \langle u, u\rangle - \langle v, v\rangle = 0$; if both were isotropic, their sum $2u$ would be isotropic as well.) If $u + v$ is nonisotropic, then
$$\sigma_{u+v}(u + v) = -(u + v)$$
and
$$\sigma_{u+v}(u - v) = u - v$$
Combining these two gives $\sigma_{u+v}(u) = -v$. On the other hand, if $u - v$ is nonisotropic, then
$$\sigma_{u-v}(u - v) = -(u - v)$$
and
$$\sigma_{u-v}(u + v) = u + v$$
These equations give $\sigma_{u-v}(u) = v$.

Recall that an operator on a real inner product space is unitary if and only if it is a product of reflections. Here is the generalization to nonsingular orthogonal geometries.

Theorem 11.33 Let $V$ be a nonsingular orthogonal geometry over a field $F$ with $\operatorname{char}(F) \neq 2$. A linear transformation $\tau$ on $V$ is an orthogonal transformation (an isometry) if and only if $\tau$ is the product of symmetries on $V$.
Proof. We proceed by induction on $n = \dim(V)$. If $n = 1$ then $V = \operatorname{span}(v)$ where $\langle v, v\rangle \neq 0$. Let $\tau(v) = av$ where $a \in F$. Since $\tau$ is an isometry,
$$\langle v, v\rangle = \langle \tau(v), \tau(v)\rangle = \langle av, av\rangle = a^2\langle v, v\rangle$$
and so $a = \pm 1$. If $a = 1$ then $\tau$ is the identity, which is equal to $\sigma_v^2$. On the other hand, if $a = -1$ then $\tau = \sigma_v$. In either case, $\tau$ is a product of symmetries.

Assume now that the theorem is true for dimensions less than $n$ and let $\dim(V) = n$. Let $v \in V$ be nonisotropic. Since $\langle \tau(v), \tau(v)\rangle = \langle v, v\rangle \neq 0$, Theorem 11.32 implies the existence of a symmetry $\sigma$ on $V$ for which $\sigma(\tau(v)) = \epsilon v$, where $\epsilon = \pm 1$. Thus, $\sigma\tau = \pm\iota$ on $\operatorname{span}(v)$. Since Theorem 11.10 implies that $\operatorname{span}(v)^{\perp}$ is $\sigma\tau$-invariant, we may apply the induction hypothesis to $\sigma\tau$ on $\operatorname{span}(v)^{\perp}$ to get
$$(\sigma\tau)|_{\operatorname{span}(v)^{\perp}} = \sigma_{w_1} \cdots \sigma_{w_k}$$
where $w_i \in \operatorname{span}(v)^{\perp}$ and $\sigma_{w_i}$ is a symmetry on $\operatorname{span}(v)^{\perp}$. But each $\sigma_{w_i}$ can be extended to a symmetry on $V$ by setting $\sigma_{w_i}(v) = v$. Let $\rho = \sigma_{w_1} \cdots \sigma_{w_k}$, where the factors are now regarded as symmetries on $V$. Hence, $\rho = \sigma\tau$ on $\operatorname{span}(v)^{\perp}$ and $\rho = \iota$ on $\operatorname{span}(v)$.

If $\epsilon = 1$ then $\sigma\tau = \rho$ on $\operatorname{span}(v)$ as well, and so $\sigma\tau = \rho$ on $V$, whence $\tau = \sigma\rho$, which completes the proof. If $\epsilon = -1$ then $\sigma_v\rho = \rho = \sigma\tau$ on $\operatorname{span}(v)^{\perp}$, since $\sigma_v$ is the identity on $\operatorname{span}(v)^{\perp}$, and $\sigma_v\rho(v) = -v = \sigma\tau(v)$ on $\operatorname{span}(v)$. Hence, $\sigma\tau = \sigma_v\rho$ on $V$ and so $\tau = \sigma\sigma_v\rho$ on $V$.

The Witt Theorems for Orthogonal Geometries

We are now ready to consider the Witt theorems for orthogonal geometries.

Theorem 11.34 (Witt's cancellation theorem) Let $V$ and $W$ be isometric nonsingular orthogonal geometries over a field $F$ with $\operatorname{char}(F) \neq 2$. Suppose that
$$V = S \oplus S^{\perp} \quad\text{and}\quad W = T \oplus T^{\perp}$$
Then
$$S \approx T \implies S^{\perp} \approx T^{\perp}$$
Proof. First, we prove that it is sufficient to consider the case $V = W$. Suppose that the result holds when $V = W$ and that $\sigma: V \to W$ is an isometry. Then
$$\sigma(S) \oplus \sigma(S^{\perp}) = \sigma(S \oplus S^{\perp}) = \sigma(V) = W = T \oplus T^{\perp}$$
Furthermore, $\sigma(S) \approx S \approx T$. We can therefore apply the theorem to $W$ to get $\sigma(S^{\perp}) \approx T^{\perp}$ and hence $S^{\perp} \approx \sigma(S^{\perp}) \approx T^{\perp}$, as desired.

To prove the theorem when $V = W$, assume that $V = S \oplus S^{\perp} = T \oplus T^{\perp}$, where $S$ and $T$ are nonsingular and $S \approx T$. Let $\sigma: S \to T$ be an isometry. We proceed by induction on $\dim(S)$.

Suppose first that $\dim(S) = 1$ and that $S = \operatorname{span}(s)$. Since
$$\langle \sigma(s), \sigma(s)\rangle = \langle s, s\rangle \neq 0$$
Theorem 11.32 implies that there is a symmetry $\rho$ for which $\rho(s) = \epsilon\sigma(s)$, where $\epsilon = \pm 1$. Hence, $\rho$ is an isometry of $V$ for which $T = \rho(S)$, and Theorem 11.10 implies that $T^{\perp} = \rho(S^{\perp})$. Thus, $\rho|_{S^{\perp}}$ is the desired isometry.

Now suppose the theorem is true for $\dim(S) < k$ and let $\dim(S) = k$. Let $\sigma: S \to T$ be an isometry. Since $S$ is nonsingular, we can choose a nonisotropic vector $s \in S$ and write $S = \operatorname{span}(s) \oplus U$, where $U$ is nonsingular. It follows that
$$V = S \oplus S^{\perp} = \operatorname{span}(s) \oplus U \oplus S^{\perp}$$
and
$$V = T \oplus T^{\perp} = \operatorname{span}(\sigma(s)) \oplus \sigma(U) \oplus T^{\perp}$$
Now we may apply the one-dimensional case to deduce that
$$U \oplus S^{\perp} \approx \sigma(U) \oplus T^{\perp}$$
If $\rho: U \oplus S^{\perp} \to \sigma(U) \oplus T^{\perp}$ is an isometry then
$$\rho(U) \oplus \rho(S^{\perp}) = \rho(U \oplus S^{\perp}) = \sigma(U) \oplus T^{\perp}$$
But $\rho(U) \approx \sigma(U)$ and since $\dim(\rho(U)) = \dim(U) < k$, the induction hypothesis implies that $S^{\perp} \approx \rho(S^{\perp}) \approx T^{\perp}$.

As we have seen, Witt's extension theorem is a corollary of Witt's cancellation theorem.

Theorem 11.35 (Witt's extension theorem) Let $V$ and $V'$ be isometric nonsingular orthogonal geometries over a field $F$, with $\operatorname{char}(F) \neq 2$. Suppose that $U$ is a subspace of $V$ and $\tau: U \to \tau(U) \subseteq V'$ is an isometry. Then $\tau$ can be extended to an isometry from $V$ to $V'$.

Maximal Hyperbolic Subspaces of an Orthogonal Geometry

We have seen that any orthogonal geometry $V$ can be written in the form $V = U \oplus \operatorname{rad}(V)$, where $U$ is nonsingular. Nonsingular spaces are better behaved than singular ones, but they can still possess isotropic vectors.

We can improve upon the preceding decomposition by noticing that if $u \in U$ is isotropic, then $\operatorname{span}(u)$ is totally degenerate and so it can be "captured" in a hyperbolic plane $H = \operatorname{span}(u, x)$, namely, the nonsingular extension of $\operatorname{span}(u)$. Then we can write
$$V = H \oplus H^{\perp_U} \oplus \operatorname{rad}(V)$$
where $H^{\perp_U}$ is the orthogonal complement of $H$ in $U$, and $H^{\perp_U}$ has "one fewer" isotropic vector.

In order to generalize this process, we first discuss maximal totally degenerate subspaces.

Maximal Totally Degenerate Subspaces

Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$. Suppose that $U$ and $U'$ are maximal totally degenerate subspaces of $V$. We claim that $\dim(U) = \dim(U')$. For suppose that $\dim(U) \leq \dim(U')$ and let $\tau: U \to U'$ be an injective linear map. Since $U$ and $U'$ are totally degenerate, $\tau$ is an isometry, and so by Witt's extension theorem $\tau$ extends to an isometry $\sigma$ of $V$. Then $\sigma^{-1}(U')$ is a totally degenerate subspace of $V$ containing $U$, and the maximality of $U$ implies that $\sigma^{-1}(U') = U$, whence $\dim(U') = \dim(U)$.

We have proved the following.

Theorem 11.36 Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$.
1) All maximal totally degenerate subspaces of $V$ have the same dimension, which is called the Witt index of $V$ and is denoted by $\omega(V)$.
2) Any totally degenerate subspace of $V$ of dimension $\omega(V)$ is maximal.

Maximal Hyperbolic Subspaces

We can prove by a similar argument that all maximal hyperbolic subspaces of $V$ have the same dimension. Let
$$\mathcal{H} = H_1 \oplus \dots \oplus H_k$$
and
$$\mathcal{K} = K_1 \oplus \dots \oplus K_m$$
be maximal hyperbolic subspaces of $V$, and suppose that $H_i = \operatorname{span}(u_i, v_i)$ and $K_i = \operatorname{span}(x_i, y_i)$. We may assume that $\dim(\mathcal{H}) \leq \dim(\mathcal{K})$.

The linear map $\tau: \mathcal{H} \to \mathcal{K}$ defined by
$$\tau(u_i) = x_i, \qquad \tau(v_i) = y_i$$
is clearly an isometry from $\mathcal{H}$ to $\tau(\mathcal{H})$. Thus, Witt's extension theorem implies the existence of an isometry $\sigma: V \to V$ that extends $\tau$. In particular, $\sigma^{-1}(\mathcal{K})$ is a hyperbolic space that contains $\mathcal{H}$, and so $\sigma^{-1}(\mathcal{K}) = \mathcal{H}$. It follows that $\dim(\mathcal{K}) = \dim(\mathcal{H})$.

It is not hard to see that the maximum dimension $h(V)$ of a hyperbolic subspace of $V$ is $2\omega(V)$, where $\omega(V)$ is the Witt index of $V$. First, the nonsingular extension of a maximal totally degenerate subspace $U$ of $V$ is a hyperbolic space of dimension $2\omega(V)$, and so $h(V) \geq 2\omega(V)$. On the other hand, any hyperbolic space $\mathcal{H} \subseteq V$ contains a totally degenerate subspace of dimension $\dim(\mathcal{H})/2$, and so $\dim(\mathcal{H})/2 \leq \omega(V)$, that is, $\dim(\mathcal{H}) \leq 2\omega(V)$. Hence $h(V) \leq 2\omega(V)$ and so $h(V) = 2\omega(V)$.

Theorem 11.37 Let $V$ be a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$.
1) All maximal hyperbolic subspaces of $V$ have dimension $2\omega(V)$.
2) Any hyperbolic subspace of dimension $2\omega(V)$ must be maximal.
3) The Witt index of a hyperbolic space $\mathcal{H}$ is $\dim(\mathcal{H})/2$.

The Anisotropic Decomposition of an Orthogonal Geometry

If $\mathcal{H}$ is a maximal hyperbolic subspace of $V$ then
$$V = \mathcal{H} \oplus \mathcal{H}^{\perp}$$
Since $\mathcal{H}$ is maximal, $\mathcal{H}^{\perp}$ is anisotropic, for if $u \in \mathcal{H}^{\perp}$ is isotropic then the nonsingular extension of $\mathcal{H} \oplus \operatorname{span}(u)$ would be a hyperbolic space strictly larger than $\mathcal{H}$.

Thus, we arrive at the following decomposition theorem for orthogonal geometries.

Theorem 11.38 (The anisotropic decomposition of an orthogonal geometry) Let $V = U \oplus \operatorname{rad}(V)$ be an orthogonal geometry over $F$, with $\operatorname{char}(F) \neq 2$. Let $\mathcal{H}$ be a maximal hyperbolic subspace of $U$, where $\mathcal{H} = \{0\}$ if $U$ has no isotropic vectors. Then
$$V = S \oplus \mathcal{H} \oplus \operatorname{rad}(V)$$
where $S$ is anisotropic, $\mathcal{H}$ is hyperbolic of dimension $2\omega(V)$ and $\operatorname{rad}(V)$ is totally degenerate.

Exercises
1. Let $U$, $W$ be subspaces of a metric vector space $V$. Show that
   a) $U \subseteq W \implies W^{\perp} \subseteq U^{\perp}$
   b) $U \subseteq U^{\perp\perp}$
   c) $U^{\perp} = U^{\perp\perp\perp}$
2. Let $U$, $W$ be subspaces of a metric vector space $V$. Show that
   a) $(U + W)^{\perp} = U^{\perp} \cap W^{\perp}$
   b) $(U \cap W)^{\perp} = U^{\perp} + W^{\perp}$
3. Prove that the following are equivalent:
   a) $V$ is nonsingular
   b) $\langle u, x\rangle = \langle v, x\rangle$ for all $x \in V$ implies $u = v$
4. Show that a metric vector space $V$ is nonsingular if and only if the matrix $M_{\mathcal{B}}$ of the form is nonsingular, for every ordered basis $\mathcal{B}$.
5. Let $V$ be a finite-dimensional vector space with a bilinear form $\langle\cdot,\cdot\rangle$. We do not assume that the form is symmetric or alternate. Show that the following are equivalent:
   a) $\{v \in V \mid \langle v, w\rangle = 0 \text{ for all } w \in V\} = \{0\}$
   b) $\{v \in V \mid \langle w, v\rangle = 0 \text{ for all } w \in V\} = \{0\}$
   Hint: Consider the singularity of the matrix of the form.
6. Find a diagonal matrix congruent to

$$\begin{pmatrix} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & -1 \end{pmatrix}$$

7. Prove that the matrices
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad M = \begin{pmatrix} \cdot & \cdot \\ \cdot & \cdot \end{pmatrix}$$
are congruent over the base field $F = \mathbb{Q}$ of rational numbers. Find an invertible matrix $P$ such that $P^t I P = M$.

8. Let $V$ be an orthogonal geometry over a field $F$ with $\operatorname{char}(F) \neq 2$. We wish to construct an orthogonal basis $\mathcal{E} = (u_1, \dots, u_n)$ for $V$, starting with any generating set $(v_1, \dots, v_m)$. Justify the following steps, essentially due to Lagrange. We may assume that $V$ is not totally degenerate.
   a) If $\langle v_i, v_i\rangle \neq 0$ for some $i$ then let $u_1 = v_i$. Otherwise, there are indices $i \neq j$ for which $\langle v_i, v_j\rangle \neq 0$. Let $u_1 = v_i + v_j$.
   b) Assume we have found an ordered set of vectors $\mathcal{E}_k = (u_1, \dots, u_k)$ that form an orthogonal basis for a subspace $V_k$ of $V$ and that none of the $u_i$'s are isotropic. Then $V = V_k \oplus V_k^{\perp}$.
   c) For each $v_i$, let
$$w_i = v_i - \sum_{j=1}^{k} \frac{\langle v_i, u_j\rangle}{\langle u_j, u_j\rangle} u_j$$
   Then the vectors $w_i$ span $V_k^{\perp}$. If $V_k^{\perp}$ is totally degenerate, take any basis for $V_k^{\perp}$ and append it to $\mathcal{E}_k$. Otherwise, repeat step a) on $V_k^{\perp}$ to get another vector $u_{k+1}$ and let $\mathcal{E}_{k+1} = (u_1, \dots, u_{k+1})$. Eventually, we arrive at an orthogonal basis $\mathcal{E}$ for $V$.
9. Prove that orthogonal hyperbolic planes may be characterized as two-dimensional nonsingular orthogonal geometries that have exactly two one-dimensional totally isotropic (equivalently: totally degenerate) subspaces.
10. Prove that a two-dimensional nonsingular orthogonal geometry is a hyperbolic plane if and only if its discriminant is $-1$, up to multiplication by a nonzero square in $F$.
11. Does Minkowski space contain any isotropic vectors? If so, find them.
12. Is Minkowski space isometric to Euclidean space $\mathbb{R}^4$?
13. If $\langle\cdot,\cdot\rangle$ is a symmetric bilinear form on $V$ and $\operatorname{char}(F) \neq 2$, show that $Q(x) = \langle x, x\rangle/2$ is a quadratic form.
14. Let $V$ be a vector space over a field $F$, with ordered basis $\mathcal{B} = (v_1, \dots, v_n)$. Let $p(x_1, \dots, x_n)$ be a homogeneous polynomial of degree 2 over $F$, that is, a polynomial each of whose terms has degree 2. The 2-form defined by $p$ is the function from $V$ to $F$ defined as follows. If $v = \sum r_i v_i$ then
$$p(v) = p(r_1, \dots, r_n)$$
(We use the same notation for the form and the polynomial.) Prove that 2-forms are the same as quadratic forms.
15. Show that $\tau$ is an isometry on $V$ if and only if $Q(\tau(v)) = Q(v)$, where $Q$ is the quadratic form associated with the bilinear form on $V$. (Assume that $\operatorname{char}(F) \neq 2$.)
16. Show that a quadratic form $Q$ on $V$ satisfies the parallelogram law:
$$Q(x + y) + Q(x - y) = 2[Q(x) + Q(y)]$$
17. Show that if $V$ is a nonsingular orthogonal geometry over a field $F$, with $\operatorname{char}(F) \neq 2$, then any totally isotropic subspace of $V$ is also a totally degenerate space.
18. Is it true that $V = \operatorname{rad}(V) \oplus \operatorname{rad}(V)^{\perp}$?

19. Let $V$ be a nonsingular symplectic geometry and let $\tau_{v,\lambda}$ be a symplectic transvection. Prove that
   a) $\tau_{v,\lambda}\tau_{v,\mu} = \tau_{v,\lambda+\mu}$
   b) For any symplectic transformation $\sigma$,
$$\sigma\tau_{v,\lambda}\sigma^{-1} = \tau_{\sigma(v),\lambda}$$
   c) For $\rho \in F^{*}$,
$$\tau_{\rho v,\lambda} = \tau_{v,\rho^2\lambda}$$
   d) For a fixed $v \neq 0$, the map $\lambda \mapsto \tau_{v,\lambda}$ is an isomorphism from the additive group of $F$ onto the group $\{\tau_{v,\lambda} \mid \lambda \in F\} \subseteq \operatorname{Sp}(V)$.
20. Prove that if $x$ is any nonsquare in a finite field $F$ then all nonsquares have the form $r^2x$, for some $r \in F$. Hence, the product of any two nonsquares in $F$ is a square.
21. Formulate Sylvester's law of inertia in terms of quadratic forms on $V$.
22. Show that a two-dimensional space is a hyperbolic plane if and only if it is nonsingular and contains an isotropic vector. Assume that $\operatorname{char}(F) \neq 2$.
23. Prove directly that a hyperbolic plane in an orthogonal geometry cannot have an orthogonal basis when $\operatorname{char}(F) = 2$.
24. a) Let $U$ be a subspace of $V$. Show that the inner product $\langle x + U, y + U\rangle = \langle x, y\rangle$ on the quotient space $V/U$ is well-defined if and only if $U \subseteq \operatorname{rad}(V)$.
    b) If $U \subseteq \operatorname{rad}(V)$, when is $V/U$ nonsingular?
25. Let $V = N \oplus S$, where $N$ is a totally degenerate space.
    a) Prove that $N = \operatorname{rad}(V)$ if and only if $S$ is nonsingular.
    b) If $S$ is nonsingular, prove that $S \approx V/\operatorname{rad}(V)$.
26. Let $\dim(V) = \dim(W)$. Prove that $V/\operatorname{rad}(V) \approx W/\operatorname{rad}(W)$ implies $V \approx W$.
27. Let $V = S \oplus T$. Prove that
    a) $\operatorname{rad}(V) = \operatorname{rad}(S) \oplus \operatorname{rad}(T)$
    b) $V/\operatorname{rad}(V) \approx S/\operatorname{rad}(S) \oplus T/\operatorname{rad}(T)$
    c) $\dim(\operatorname{rad}(V)) = \dim(\operatorname{rad}(S)) + \dim(\operatorname{rad}(T))$
    d) $V$ is nonsingular if and only if $S$ and $T$ are both nonsingular.
28. Let $V$ be a nonsingular metric vector space. Because the Riesz representation theorem is valid in $V$, we can define the adjoint $\tau^{*}$ of a linear map $\tau \in \mathcal{L}(V)$ exactly as in the case of real inner product spaces. Prove that $\tau$ is an isometry if and only if it is bijective and unitary (that is, $\tau^{*}\tau = \iota$).
29. If $\operatorname{char}(F) \neq 2$, prove that $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if it is bijective and $\langle \tau(v), \tau(v)\rangle = \langle v, v\rangle$ for all $v \in V$.
30. Let $\mathcal{B} = \{v_1, \dots, v_n\}$ be a basis for $V$. Prove that $\tau \in \mathcal{L}(V, W)$ is an isometry if and only if it is bijective and $\langle \tau v_i, \tau v_j\rangle = \langle v_i, v_j\rangle$ for all $i, j$.
31. Let $\tau$ be a linear operator on a metric vector space $V$. Let $\mathcal{B} = (v_1, \dots, v_n)$ be an ordered basis for $V$ and let $M_{\mathcal{B}}$ be the matrix of the form relative to $\mathcal{B}$. Prove that $\tau$ is an isometry if and only if
$$[\tau]^t_{\mathcal{B}} M_{\mathcal{B}} [\tau]_{\mathcal{B}} = M_{\mathcal{B}}$$
32. Let $V$ be a nonsingular orthogonal geometry and let $\tau \in \mathcal{L}(V)$ be an isometry.
    a) Show that $\dim(\ker(\tau - \iota)) = \dim(\operatorname{im}(\tau - \iota)^{\perp})$.
    b) Show that $\ker(\tau - \iota) = \operatorname{im}(\tau - \iota)^{\perp}$. How would you describe $\ker(\tau - \iota)$ in words?
    c) If $\tau$ is a symmetry, what is $\dim(\ker(\tau - \iota))$?
    d) Can you characterize symmetries by means of $\dim(\ker(\tau - \iota))$?
33. A linear transformation $\tau \in \mathcal{L}(V)$ is called unipotent if $\tau - \iota$ is nilpotent. Suppose that $V$ is a nonisotropic metric vector space and that $\tau$ is unipotent and isometric. Show that $\tau = \iota$.
34. Let $V$ be a hyperbolic space of dimension $2k$ and let $U$ be a hyperbolic subspace of $V$ of dimension $2m$. Show that for each $i$ with $m \leq i \leq k$, there is a hyperbolic subspace $\mathcal{H}_i$ of $V$ of dimension $2i$ for which $U \subseteq \mathcal{H}_i \subseteq V$.
35. Let $\operatorname{char}(F) \neq 2$. Prove that if $X$ is a totally degenerate subspace of a nonsingular orthogonal geometry $V$ then $\dim(X) \leq \dim(V)/2$.
36. Prove that an orthogonal geometry $V$ of dimension $n$ is a hyperbolic space if and only if $V$ is nonsingular, $n$ is even and $V$ contains a totally degenerate subspace of dimension $n/2$.
37. Prove that a symplectic transformation has determinant equal to 1.

Chapter 12
Metric Spaces

The Definition

In Chapter 9, we studied the basic properties of real and complex inner product spaces. Much of what we did does not depend on whether the space in question is finite- or infinite-dimensional. However, as we discussed in Chapter 9, the presence of an inner product, and hence a metric, on a vector space raises a host of new issues related to convergence. In this chapter, we discuss briefly the concept of a metric space. This will enable us to study the convergence properties of real and complex inner product spaces.

A metric space is not an algebraic structure. Rather it is designed to model the abstract properties of distance.

Definition A metric space is a pair $(M, d)$, where $M$ is a nonempty set and $d: M \times M \to \mathbb{R}$ is a real-valued function, called a metric on $M$, with the following properties. The expression $d(x, y)$ is read "the distance from $x$ to $y$."
1) (Positive definiteness) For all $x, y \in M$, $d(x, y) \geq 0$ and $d(x, y) = 0$ if and only if $x = y$.
2) (Symmetry) For all $x, y \in M$, $d(x, y) = d(y, x)$
3) (Triangle inequality) For all $x, y, z \in M$, $d(x, y) \leq d(x, z) + d(z, y)$
As is customary, when there is no cause for confusion, we simply say "let $M$ be a metric space."

Example 12.1 Any nonempty set $M$ is a metric space under the discrete metric, defined by
$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

Example 12.2
1) The set $\mathbb{R}^n$ is a metric space under the metric defined for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ by
$$d(x, y) = \sqrt{(x_1 - y_1)^2 + \dots + (x_n - y_n)^2}$$
This is called the Euclidean metric on $\mathbb{R}^n$. We note that $\mathbb{R}^n$ is also a metric space under the metric
$$d_1(x, y) = |x_1 - y_1| + \dots + |x_n - y_n|$$
Of course, $(\mathbb{R}^n, d)$ and $(\mathbb{R}^n, d_1)$ are different metric spaces.
2) The set $\mathbb{C}^n$ is a metric space under the unitary metric
$$d(x, y) = \sqrt{|x_1 - y_1|^2 + \dots + |x_n - y_n|^2}$$
where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are in $\mathbb{C}^n$.

Example 12.3
1) The set $C[a, b]$ of all real-valued (or complex-valued) continuous functions on $[a, b]$ is a metric space, under the metric
$$d(f, g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$
We refer to this metric as the sup metric.
2) The set $C[a, b]$ of all real-valued (or complex-valued) continuous functions on $[a, b]$ is a metric space, under the metric
$$d(f, g) = \int_a^b |f(x) - g(x)|\, dx$$

Example 12.4 Many important sequence spaces are metric spaces. We will often use boldface roman letters to denote sequences, as in $\mathbf{x} = (x_n)$ and $\mathbf{y} = (y_n)$.
1) The set $\ell_{\mathbb{R}}^{\infty}$ of all bounded sequences of real numbers is a metric space under the metric defined by

$$d(\mathbf{x}, \mathbf{y}) = \sup_n |x_n - y_n|$$

The set $\ell_{\mathbb{C}}^{\infty}$ of all bounded complex sequences, with the same metric, is also a metric space. As is customary, we will usually denote both of these spaces by $\ell^{\infty}$.

2) For $p \geq 1$, let $\ell^p$ be the set of all sequences $\mathbf{x} = (x_n)$ of real (or complex) numbers for which
$$\sum_{n=1}^{\infty} |x_n|^p < \infty$$
We define the $p$-norm of $\mathbf{x}$ by
$$\|\mathbf{x}\|_p = \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p}$$
Then $\ell^p$ is a metric space, under the metric
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_p = \Big(\sum_{n=1}^{\infty} |x_n - y_n|^p\Big)^{1/p}$$
The fact that $d$ is a metric follows from some rather famous results about sequences of real or complex numbers, whose proofs we leave as (well-hinted) exercises.

Hölder's inequality Let $p, q \geq 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. If $\mathbf{x} \in \ell^p$ and $\mathbf{y} \in \ell^q$ then the product sequence $\mathbf{x}\mathbf{y} = (x_ny_n)$ is in $\ell^1$ and
$$\|\mathbf{x}\mathbf{y}\|_1 \leq \|\mathbf{x}\|_p \|\mathbf{y}\|_q$$
that is,
$$\sum_{n=1}^{\infty} |x_ny_n| \leq \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p}\Big(\sum_{n=1}^{\infty} |y_n|^q\Big)^{1/q}$$

A special case of this (with $p = q = 2$) is the Cauchy-Schwarz inequality
$$\sum_{n=1}^{\infty} |x_ny_n| \leq \sqrt{\sum_{n=1}^{\infty} |x_n|^2}\sqrt{\sum_{n=1}^{\infty} |y_n|^2}$$

Minkowski's inequality For $p \geq 1$, if $\mathbf{x}, \mathbf{y} \in \ell^p$ then the sum $\mathbf{x} + \mathbf{y} = (x_n + y_n)$ is in $\ell^p$ and
$$\|\mathbf{x} + \mathbf{y}\|_p \leq \|\mathbf{x}\|_p + \|\mathbf{y}\|_p$$
that is,
$$\Big(\sum_{n=1}^{\infty} |x_n + y_n|^p\Big)^{1/p} \leq \Big(\sum_{n=1}^{\infty} |x_n|^p\Big)^{1/p} + \Big(\sum_{n=1}^{\infty} |y_n|^p\Big)^{1/p}$$
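Minkowski's inequality is precisely what yields the triangle inequality for the metric $d$ on $\ell^p$: for $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \ell^p$,
$$d(\mathbf{x}, \mathbf{z}) = \|\mathbf{x} - \mathbf{z}\|_p = \|(\mathbf{x} - \mathbf{y}) + (\mathbf{y} - \mathbf{z})\|_p \leq \|\mathbf{x} - \mathbf{y}\|_p + \|\mathbf{y} - \mathbf{z}\|_p = d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})$$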

If $M$ is a metric space under a metric $d$, then any nonempty subset $S$ of $M$ is also a metric space under the restriction of $d$ to $S \times S$. The metric space $S$ thus obtained is called a subspace of $M$.

Open and Closed Sets

Definition Let 4%4 be a metric space. Let  and let be a positive . 1) The open ball centered at %  , with radius , is

)²%Á ³~¸%4“²%Á%³ ¹

2) The closed ball centered at %  , with radius , is

)²%Á ³~¸%4“²%Á%³ ¹

3) The sphere centered at %  , with radius , is

:²%Á ³~¸%4“²%Á%³~ ¹ Definition A subset :4 of a metric space is said to be open if each point of : is the center of an open ball that is contained completely in : . More specifically, :%: € is open if for all , there exists an such that )²%Á ³ ‹ :. Note that the empty set is open. A set ; ‹ 4 is closed if its complement ;4 in is open.

It is easy to show that an open ball is an open set and a closed ball is a closed set. If $x \in M$, we refer to any open set $S$ containing $x$ as an open neighborhood of $x$. It is also easy to see that a set is open if and only if it contains an open neighborhood of each of its points.
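To see that an open ball $B(x_0, r)$ is open, let $y \in B(x_0, r)$ and put $\epsilon = r - d(x_0, y) > 0$. If $z \in B(y, \epsilon)$ then the triangle inequality gives
$$d(x_0, z) \leq d(x_0, y) + d(y, z) < d(x_0, y) + \epsilon = r$$
and so $B(y, \epsilon) \subseteq B(x_0, r)$.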

The next example shows that it is possible for a set to be both open and closed, or neither open nor closed.

Example 12.5 In the metric space $\mathbb{R}$ with the usual Euclidean metric, the open balls are just the open intervals
$$B(x_0, r) = (x_0 - r, x_0 + r)$$
and the closed balls are the closed intervals
$$\overline{B}(x_0, r) = [x_0 - r, x_0 + r]$$
Consider the half-open interval $S = (a, b]$, for $a < b$. This set is not open, since it contains no open ball centered at $b \in S$, and it is not closed, since its complement $S^c = (-\infty, a] \cup (b, \infty)$ is not open, since it contains no open ball about $a$.

Observe also that the empty set is both open and closed, as is the entire space $\mathbb{R}$. (Although we will not do so, it is possible to show that these are the only two sets that are both open and closed in $\mathbb{R}$.)

It is not our intention to enter into a detailed discussion of open and closed sets, the subject of which belongs to the branch of mathematics known as topology . In order to put these concepts in perspective, however, we have the following result, whose proof is left to the reader.

Theorem 12.1 The collection $\mathcal{O}$ of all open subsets of a metric space $M$ has the following properties:
1) $\emptyset \in \mathcal{O}$, $M \in \mathcal{O}$
2) If $S, T \in \mathcal{O}$ then $S \cap T \in \mathcal{O}$
3) If $\{S_k \mid k \in K\}$ is any collection of open sets then $\bigcup_{k \in K} S_k \in \mathcal{O}$.
These three properties form the basis for an axiom system that is designed to generalize notions such as convergence and continuity and leads to the following definition.

Definition Let $X$ be a nonempty set. A collection $\mathcal{O}$ of subsets of $X$ is called a topology for $X$ if it has the following properties:
1) $\emptyset, X \in \mathcal{O}$
2) If $S, T \in \mathcal{O}$ then $S \cap T \in \mathcal{O}$
3) If $\{S_k \mid k \in K\}$ is any collection of sets in $\mathcal{O}$ then $\bigcup_{k \in K} S_k \in \mathcal{O}$.
We refer to subsets in $\mathcal{O}$ as open sets and the pair $(X, \mathcal{O})$ as a topological space.

According to Theorem 12.1, the open sets (as we defined them earlier) in a metric space $M$ form a topology for $M$, called the topology induced by the metric.

Topological spaces are the most general setting in which we can define concepts such as convergence and continuity, which is why these concepts are called topological concepts. However, since the topologies with which we will be dealing are induced by a metric, we will generally phrase the definitions of the topological properties that we will need directly in terms of the metric.

Convergence in a Metric Space

Convergence of sequences in a metric space is defined as follows.

Definition A sequence $(x_n)$ in a metric space $M$ converges to $x \in M$, written $(x_n) \to x$, if
$$\lim_{n \to \infty} d(x_n, x) = 0$$

Equivalently, $(x_n) \to x$ if for any $\epsilon > 0$, there exists an $N > 0$ such that
$$n > N \implies d(x_n, x) < \epsilon$$
or, equivalently,
$$n > N \implies x_n \in B(x, \epsilon)$$

In this case, $x$ is called the limit of the sequence $(x_n)$.

If $M$ is a metric space and $S$ is a subset of $M$, then by a sequence in $S$ we mean a sequence whose terms all lie in $S$. We next characterize closed sets, and therefore also open sets, using convergence.

Theorem 12.2 Let $M$ be a metric space. A subset $S \subseteq M$ is closed if and only if whenever $(x_n)$ is a sequence in $S$ and $(x_n) \to x$ then $x \in S$. In loose terms, a subset $S$ is closed if it is closed under the taking of sequential limits.
Proof. Suppose that $S$ is closed and let $(x_n) \to x$, where $x_n \in S$ for all $n$. Suppose that $x \notin S$. Then since $x \in S^c$ and $S^c$ is open, there exists an $\epsilon > 0$ for which $x \in B(x, \epsilon) \subseteq S^c$. But this implies that
$$B(x, \epsilon) \cap \{x_n\} = \emptyset$$
which contradicts the fact that $(x_n) \to x$. Hence, $x \in S$.

Conversely, suppose that $S$ is closed under the taking of limits. We show that $S^c$ is open. Let $x \in S^c$ and suppose to the contrary that no open ball about $x$ is contained in $S^c$. Consider the open balls $B(x, 1/n)$, for all $n \geq 1$. Since none of these balls is contained in $S^c$, for each $n$, there is an $x_n \in S \cap B(x, 1/n)$. It is clear that $(x_n) \to x$ and so $x \in S$. But $x$ cannot be in both $S$ and $S^c$. This contradiction implies that $S^c$ is open. Thus, $S$ is closed.

The Closure of a Set

Definition Let $S$ be any subset of a metric space $M$. The closure of $S$, denoted by $\operatorname{cl}(S)$, is the smallest closed set containing $S$.

We should hasten to add that, since the entire space $M$ is closed and since the intersection of any collection of closed sets is closed (exercise), the closure of any set $S$ does exist and is the intersection of all closed sets containing $S$. The following definition will allow us to characterize the closure in another way.

Definition Let $S$ be a nonempty subset of a metric space $M$. An element $x \in M$ is said to be a limit point, or accumulation point, of $S$ if every open ball centered at $x$ meets $S$ at a point other than $x$ itself. Let us denote the set of all limit points of $S$ by $\ell(S)$.

Here are some key facts concerning limit points and closures.

Theorem 12.3 Let $S$ be a nonempty subset of a metric space $M$.
1) $x \in \ell(S)$ if and only if there is a sequence $(x_n)$ in $S$ for which $x_n \neq x$ for all $n$ and $(x_n) \to x$.
2) $S$ is closed if and only if $\ell(S) \subseteq S$. In words, $S$ is closed if and only if it contains all of its limit points.
3) $\operatorname{cl}(S) = S \cup \ell(S)$.
4) $x \in \operatorname{cl}(S)$ if and only if there is a sequence $(x_n)$ in $S$ for which $(x_n) \to x$.
Proof. For part 1), assume first that $x \in \ell(S)$. For each $n$, there exists a point $x_n \neq x$ such that $x_n \in B(x, 1/n) \cap S$. Thus, we have
$$d(x_n, x) < 1/n$$
and so $(x_n) \to x$. For the converse, suppose that $(x_n) \to x$, where $x \neq x_n \in S$. If $B(x, \epsilon)$ is any ball centered at $x$ then there is some $N$ such that $n > N$ implies $x_n \in B(x, \epsilon)$. Hence, for any ball $B(x, \epsilon)$ centered at $x$, there is a point $x_n \neq x$ such that $x_n \in S \cap B(x, \epsilon)$. Thus, $x$ is a limit point of $S$.

As for part 2), if $S$ is closed then, by part 1), any $x \in \ell(S)$ is the limit of a sequence $(x_n)$ in $S$ and so must be in $S$. Hence, $\ell(S) \subseteq S$. Conversely, if $\ell(S) \subseteq S$ then $S$ is closed. For if $(x_n)$ is any sequence in $S$ and $(x_n) \to x$ then there are two possibilities. First, we might have $x_n = x$ for some $n$, in which case $x = x_n \in S$. Second, we might have $x_n \neq x$ for all $n$, in which case $(x_n) \to x$ implies that $x \in \ell(S) \subseteq S$. In either case, $x \in S$ and so $S$ is closed under the taking of limits, which implies that $S$ is closed.

For part 3), let $T = S \cup \ell(S)$. Clearly, $S \subseteq T$. To show that $T$ is closed, we show that it contains all of its limit points. So let $x \in \ell(T)$. Hence, there is a sequence $(x_n)$ in $T$ for which $x_n \neq x$ and $(x_n) \to x$. Of course, each $x_n$ is either in $S$, or is a limit point of $S$. We must show that $x \in T$, that is, that $x$ is either in $S$ or is a limit point of $S$.

Suppose for the purposes of contradiction that $x \notin S$ and $x \notin \ell(S)$. Then there is a ball $B(x, \epsilon)$ for which $B(x, \epsilon) \cap S = \emptyset$. However, since $(x_n) \to x$, there must be an $x_n \in B(x, \epsilon)$. Since $x_n$ cannot be in $S$, it must be a limit point of $S$. Referring to Figure 12.1, if $d(x_n, x) = \delta$ then consider the ball $B(x_n, (\epsilon - \delta)/2)$. This ball is completely contained in $B(x, \epsilon)$ and must contain an element $y$ of $S$, since its center $x_n$ is a limit point of $S$. But then $y \in S \cap B(x, \epsilon)$, a contradiction. Hence, $x \in S$ or $x \in \ell(S)$. In either case, $x \in T = S \cup \ell(S)$ and so $T$ is closed.

Thus, $T$ is closed and contains $S$ and so $\operatorname{cl}(S) \subseteq T$. On the other hand, $T = S \cup \ell(S) \subseteq \operatorname{cl}(S)$ and so $\operatorname{cl}(S) = T$.

Figure 12.1 For part 4), if %cl ²:³ then there are two possibilities. If %: then the constant sequence ²% ³ , with % ~ % for all % , is a sequence in : that converges to %%¤:%M²:³ . If then and so there is a sequence ²%³: in for which %£% and ²%³¦% . In either case, there is a sequence in : converging to % . Conversely, if there is a sequence ²% ³ in : for which ²% ³ ¦ % then either % ~ % for some  , in which case %  : ‹cl ²:³ , or else % £ % for all  , in which case %  M²:³ ‹cl ²:³ . Dense Subsets The following concept is meant to convey the idea of a subset :‹4 being “arbitrarily close” to every point in 4 .

Definition A subset :44²:³~4 of a metric space is dense in if cl . A metric space is said to be separable if it contains a countable dense subset.

Thus, a subset :4 of is dense if every open ball about any point %4 contains at least one point of : .

Certainly, any metric space contains a dense subset, namely, the space itself. However, as the next examples show, not every metric space contains a countable dense subset.

Example 12.6 1) The real line sr is separable, since the rational numbers form a countable dense subset. Similarly, sr is separable, since the set is countable and dense. 2) The complex plane dd is separable, as is  for all  . 3) A discrete metric space is separable if and only if it is countable. We leave proof of this as an exercise. Metric Spaces 291

Example 12.7 The space MMBB is not separable. Recall that is the set of all bounded sequences of real numbers (or complex numbers), with metric

²%& Á ³ ~sup(( % c &  To see that this space is not separable, consider the set : of all binary sequences

:~¸²%³“% ~ or  for all ¹ This set is in one-to-one correspondence with the set of all subsets of o and so L is uncountable. (It has cardinality 2€L . ³ Now, each sequence in : is certainly bounded and so lies in M£MBB . Moreover, if %& then the two sequences must differ in at least one position and so ²%Á &³ ~  .

In other words, we have a subset :M of B that is uncountable and for which the distance between any two distinct elements is  . This implies that the uncountable collection of balls ¸)² Á ° ) “  :¹ is mutually disjoint. Hence, no countable set can meet every ball, which implies that no countable set can be dense in MB .

Example 12.8 The metric spaces M‚: are separable, for . The set of all sequences of the form

~² ÁÃÁ ÁÁó for all € , where the  's are rational, is a countable set. Let us show that it is dense in M%M . Any satisfies B   ((%B ~ Hence, for any  € , there exists an 5 such that B    ((% ~5b 

Since the rational numbers are dense in s , we can find rational numbers  for which  ((%c   5 for all ~ÁÃÁ5 . Hence, if ~²5 ÁÃÁ ÁÁó then

5B   ²%Á ³~(( %cb (( %  b~ ~ ~5b  which shows that there is an element of :M arbitrarily close to any element of  . Thus, :MM is dense in  and so is separable. 292 Advanced Linear Algebra

Continuity Continuity plays a central role in the study of linear operators on infinite- dimensional inner product spaces.

Definition Let ¢4 ¦ 4Z be a function from the metric space ²4Á³ to the ZZ metric space ²4 Á  ³ . We say that  is continuous at %  4 if for any  €  , there exists a  € such that Z ²%Á % ³  ¬  ²²%³Á ²% ³³  or, equivalently,

45 )²% Á ³ ‹ )²²% ³Á ³

(See Figure 12.2.³ A function is continuous if it is continuous at every %4 .

Figure 12.2 We can use the notion of convergence to characterize continuity for functions between metric spaces.

Theorem 12.4 A function ¢4 ¦ 4Z is continuous if and only if whenever ²% ³ is a sequence in 4 that converges to %  4 then the sequence ²²% ³³ converges to ²% ³ , in short,

²% ³ ¦ % ¬ ²²%  ³³ ¦ ²%  ³

Proof. Suppose first that  is continuous at % and let ²% ³ ¦ % . Then, given €, the continuity of  implies the existence of a € such that

 ²)²% Á ³³ ‹ )²²% ³Á ³

Since ²% ³ ¦ % , there exists an 5 €  such that %  )²% Á ³ for  € 5 and so

 € 5 ¬ ²% ³  )²²% ³Á ³

Thus, ²% ³ ¦ ²% ³.

Conversely, suppose that ²% ³ ¦ % implies ²²%  ³³ ¦ ²%  ³ . Suppose, for the purposes of contradiction, that % is not continuous at  . Then there exists an Metric Spaces 293

€ such that, for all €

45 )²% Á ³ ‹\ )²²% ³Á ³

Thus, for all € ,  )%Á4567 ‹\ )²²% ³Á ³  and so we may construct a sequence ²% ³ by choosing each term % with the property that  %)%Á67, but ²%³¤)²²%³Á³   

Hence, ²% ³ ¦ % , but ²%  ³ does not converge to ²%  ³ . This contradiction implies that % must be continuous at  .

The next theorem says that the distance function is a continuous function in both variables.

Theorem 12.5 Let ²4Á ³ be a metric space. If ²% ³ ¦ % and ²& ³ ¦ & then ²% Á & ³ ¦ ²%Á &³. Proof. We leave it as an exercise to show that

((²% Á & ³ c ²%Á &³  ²%  Á %³ b ²&  Á &³

But the right side tends to  as ¦B and so ²%Á& ³¦²%Á&³ . Completeness The reader who has studied analysis will recognize the following definitions.

Definition A sequence ²% ³ in a metric space 4 is a Cauchy sequence if, for any  € , there exists an 5€ for which

Á  € 5 ¬ ²% Á % ³   We leave it to the reader to show that any convergent sequence is a Cauchy sequence. When the converse holds, the space is said to be complete .

Definition Let 4 be a metric space. 1) 444 is said to be complete if every Cauchy sequence in converges in . 2) A subspace :4 of is complete if it is complete as a metric space. Thus, : is complete if every Cauchy sequence ²  ³ in : converges to an element in :.

Before considering examples, we prove a very useful result about completeness of subspaces. 294 Advanced Linear Algebra

Theorem 12.6 Let 4 be a metric space. 1) Any complete subspace of 4 is closed. 2) If 4:4 is complete then a subspace of is complete if and only if it is closed. Proof. To prove 1), assume that : is a complete subspace of 4 . Let ²% ³ be a sequence in :²%³¦%4²%³ for which  . Then is a Cauchy sequence in : and since :²%³ is complete,  must converge to an element of : . Since limits of sequences are unique, we have %: . Hence, : is closed.

To prove part 2), first assume that :: is complete. Then part 1) shows that is closed. Conversely, suppose that :²%³ is closed and let  be a Cauchy sequence in :²%³ . Since  is also a Cauchy sequence in the complete space 4 , it must converge to some %4 . But since : is closed, we have ²%³¦%: . Hence, : is complete.

Now let us consider some examples of complete (and incomplete) metric spaces.

Example 12.9 It is well known that the metric space s is complete. (However, a proof of this fact would lead us outside the scope of this book.³ Similarly, the complex numbers d are complete.

Example 12.10 The Euclidean space sd and the unitary space are complete.  Let us prove this for ss . Suppose that ²% ³ is a Cauchy sequence in , where

%ÁÁ ~²% ÁÃÁ% ³ Thus,

  ²% Á % ³ ~ ²% ÁÁ c % ³ ¦  as Á  ¦ B ~ and so, for each coordinate position  ,

 ²%Á c % Á ³  ²%  Á %  ³ ¦  which shows that the sequence ²%Á ³ ~Á2 Áà of  th coordinates is a Cauchy sequence in ss . Since is complete, we must have

²%Á ³ ¦ &  as  ¦ B

If &~²& ÁÃÁ& ³ then   ²%Á Á &³ ~ ²% c & ³ ¦  as  ¦ B ~

 and so ²% ³ ¦ &  ss . This proves that is complete. Metric Spaces 295

Example 12.11 The metric space ²*´Á µÁ ³ of all real-valued (or complex- valued) continuous functions on ´Á µ , with metric ²Á ³ ~sup (( ²%³ c ²%³ %´Áµ is complete. To see this, we first observe that the limit with respect to  is the uniform limit on ´Áµ , that is ² Á³¦ if and only if for any  € , there is an 5€ for which

€5 ¬((  ²%³c²%³ for all %´Áµ

Now, let ² ³ be a Cauchy sequence in ²*´Á µÁ ³ . Thus, for any  €  , there is an 5 for which

Á  € 5 ¬((  ²%³ c  ²%³  for all %  ´Á µ (12.1)

This implies that, for each %  ´Á µ , the sequence ² ²%³³ is a Cauchy sequence of real (or complex) numbers and so it converges. We can therefore define a function ´Áµ on by

²%³ ~lim  ²%³ ¦B Letting ¦B in (12.1), we get

€5¬((  ²%³c²%³ for all %´Áµ

Thus,  ²%³ converges to ²%³ uniformly. It is well known that the uniform limit of continuous functions is continuous and so ²%³ *´Áµ . Thus, ² ²%³³ ¦ ²%³  *´Á µ and so ²*´Á µÁ ³ is complete.

Example 12.12 The metric space ²*´Á µÁ  ³ of all real-valued (or complex- valued) continuous functions on ´Á µ , with metric

  ²²%³Á ²%³³ ~ (( ²%³ c ²%³ % a is not complete. For convenience, we take ´Á µ ~ ´Á µ and leave the general case for the reader. Consider the sequence of functions ²%³ whose graphs are shown in Figure 12.3. (The definition of ²%³ should be clear from the graph. ³ 296 Advanced Linear Algebra

Figure 12.3

We leave it to the reader to show that the sequence ² ²%³³ is Cauchy, but does not converge in ²*´Á µÁ  ³ . (The sequence converges to a function that is not continuous.)

B Example 12.13 The metric space M²%³ is complete. To see this, suppose that  is a Cauchy sequence in MB , where

%ÁÁ ~²% Á%2 Áó Then, for each coordinate position  , we have

((((%c%Á Ásup %c%¦Á¦B Á Á as (12.2) 

Hence, for each ²%³ , the sequence Á of th coordinates is a Cauchy sequence in sd (or ). Since sd (or ) is complete, we have

²%Á ³ ¦ &  as  ¦ B

B for each coordinate position &~²&³M . We want to show that  and that ²% ³ ¦ &.

Letting ¦B in ² 12.2) gives

sup ((%c&¦¦BÁ  as (12.3)  and so, for some  ,

((%c&Á  for all  and so

((&b%Á ( ( for all 

B But since %M , it is a bounded sequence and therefore so is ²&³ . That is, B B & ~ ²& ³  M. Since ² 12.3) implies that ²% ³ ¦ & , we see that M is complete. Metric Spaces 297

 Example 12.14 The metric space M²%³ is complete. To prove this, let  be a Cauchy sequence in M , where

%ÁÁ ~²% Á%2 Áó Then, for each coordinate position  , B  ((((%c%Á Á %c% Á Á ~²%Á%³¦   ~ which shows that the sequence ²%Á ³ of  th coordinates is a Cauchy sequence in sd (or ). Since sd (or ) is complete, we have

²%Á ³ ¦ &  as  ¦ B

 We want to show that &~²&³M and that ²%³¦& .

To this end, observe that for any  € , there is an 5 for which  Á  € 5 ¬(( %Á c % Á   ~ for all € . Now, we let ¦B , to get  €5¬(( %Á c&   ~ for all € . Letting ¦B , we get, for any €5 , B  ((%c&Á   ~

 which implies that ²% ³ c &  M and so & ~ & c ²% ³ b ²% ³  M and in addition, ²% ³ ¦ &.

As we will see in the next chapter, the property of completeness plays a major role in the theory of inner product spaces. Inner product spaces for which the induced metric space is complete are called Hilbert spaces . Isometries A function between two metric spaces that preserves distance is called an isometry. Here is the formal definition.

Definition Let ²4Á ³ and ²4ZZ Á  ³ be metric spaces. A function ¢ 4 ¦ 4 Z is called an isometry if Z ²²%³Á ²&³³ ~ ²%Á &³ 298 Advanced Linear Algebra for all %Á &  4 . If ¢ 4 ¦ 4ZZ is a bijective isometry from 4 to 4 , we say that 44 and ZZ are isometric and write 4š4 .

Theorem 12.7 Let ¢ ²4Á ³ ¦ ²4ZZ Á  ³ be an isometry. Then 1)  is injective 2)  is continuous 3) ¢²4³¦4c is also an isometry and hence also continuous. Proof. To prove 1), we observe that ²%³ ~ ²&³ ¯ Z ²²%³Á ²&³³ ~  ¯ ²%Á &³ ~  ¯ % ~ &

To prove 2), let ²% ³ ¦ % in 4 then Z  ²²% ³Á ²%³³ ~ ²% Á %³ ¦  as  ¦ B and so ²²% ³³ ¦ ²%³ , which proves that  is continuous. Finally, we have ²c ²²%³³Á  c ²²&³³ ~ ²%Á &³ ~  Z ²²%³Á ²&³³ and so ¢²4³¦4c is an isometry. The Completion of a Metric Space While not all metric spaces are complete, any metric space can be embedded in a complete metric space. To be more specific, we have the following important theorem.

Theorem 12.8 Let ²4Á ³ be any metric space. Then there is a complete metric space ²4ZZ Á  ³ and an isometry  ¢ 4 ¦ ²4³ ‹ 4 Z for which  ²4³ is dense in 4ZZZ . The metric space ²4 Á  ³ is called a completion of ²4Á ³ . Moreover, ²4ZZ Á  ³ is unique, up to bijective isometry. Proof. The proof is a bit lengthy, so we divide it into various parts. We can simplify the notation considerably by thinking of sequences ²% ³ in 4 as functions ¢o ¦ 4, where ²³~ %. Cauchy Sequences in 4 The basic idea is to let the elements of 4 Z be equivalence classes of Cauchy sequences in 4²4³ . So let CS denote the set of all Cauchy sequences in 4 . If Á  CS ²4³ then, intuitively speaking, the terms ²³ get closer together as ¦B and so do the terms ²³ . Therefore, it seems reasonable that ²²³Á ²³³ should approach a finite limit as  ¦ B . Indeed, since ((²²³Á ²³³ c ²²³Á ²³³  ²²³Á ²³³ b ²²³Á ²³³ ¦  as Á  ¦ B it follows that ²²³Á ²³³ is a Cauchy sequence of real numbers, which implies that Metric Spaces 299

lim ²²³Á ²³³  B (12.4) ¦B (That is, the limit exists and is finite.³ Equivalence Classes of Cauchy Sequences in 4 We would like to define a metric ²4³Z on the set CS by Z ²Á ³ ~lim ²²³Á ²³³ ¦B However, it is possible that lim ²²³Á ²³³ ~  ¦B for distinct sequences  and , so this does not define a metric. Thus, we are led to define an equivalence relation on CS²4³ by  —  ¯lim ²²³Á ²³³ ~  ¦B

Let CS²4³ be the set of all equivalence classes of Cauchy sequences and define, for Á  CS ²4³ Z ²Á ³ ~lim ²²³Á ²³³ (12.5) ¦B where  and .

To see that ZZZ is well-defined, suppose that and . Then since —ZZ and — , we have ((²ZZ ²³Á  ²³³ c ²²³Á ²³³  ² Z ²³Á ²³³ b ² Z ²³Á ²³³ ¦  as ¦B. Thus, ZZ —  and  —  ¬ lim ² ZZ ²³Á  ²³³ ~ lim ²²³Á ²³³ ¦B ¦B ¬²Á³~²Á³ZZZ Z which shows that ZZ is well-defined. To see that is a metric, we verify the triangle inequality, leaving the rest to the reader. If Á and  are Cauchy sequences then ²²³Á ²³³  ²²³Á ²³³ b ²²³Á ²³³ Taking limits gives lim²²³Á ²³³  lim ²²³Á ²³³ b lim ²²³Á ²³³ ¦B ¦B ¦B 300 Advanced Linear Algebra and so

²Á³²Á³b²Á³ZZZ

Embedding ²4Á ³ in ²4ZZ Á  ³ For each %  4 , consider the constant Cauchy sequence ´%µ , where ´%µ²³ ~ % for all ¢4¦4 . The map  Z defined by ²%³ ~ ´%µ is an isometry, since

ZZ ² ²%³Á ²&³³ ~  ²´%µÁ ´&µ³ ~lim ²´%µ²³Á ´&µ²³³ ~ ²%Á &³ ¦B Moreover, ²4³ is dense in 4 Z . This follows from the fact that we can approximate any Cauchy sequence in 4 by a constant sequence. In particular, let 4Z . Since  is a Cauchy sequence, for any  € , there exists an 5 such that Á  ‚ 5 ¬ ²²³Á ²³³   Now, for the constant sequence ´²5³µ we have

Z45 ´²5³µÁ ~lim ²²5³Á²³³  ¦B and so ²4³ is dense in 4 Z . ²4ZZ Á  ³ Is Complete Suppose that

ÁÁÁÃ3 is a Cauchy sequence in 44Z . We wish to find a Cauchy sequence in for which

Z  ² Á ³ ~lim ² ²³Á ²³³ ¦  as  ¦ B ¦B

ZZ Since 4 and since  ²4³ is dense in 4 , there is a constant sequence

´ µ~² Á Áó for which  Z ² Á ´ µ³    Metric Spaces 301

We can think of  as a constant approximation to  , with error at most ° . Let  be the sequence of these constant approximations

²³ ~ 

This is a Cauchy sequence in 4 . Intuitively speaking, since the  's get closer to each other as ¦B , so do the constant approximations. In particular, we have

Z ²Á³~²´µÁ´µ³   ZZZ   ²´ µÁ  ³ b  ²  Á  ³ b  ²  Á ´ µ³  b²Á³b¦Z  as Á ¦ B . To see that  converges to  , observe that

ZZ Z   ² Á ³   ² Á ´ µ³ b  ²´ µÁ ³  blim ²  Á ²³³  ¦B  ~blim ²Á³  ¦B Now, since €5 is a Cauchy sequence, for any  , there is an such that

Á ‚ 5 ¬ ² Á ³   In particular,

‚5¬lim ²Á³  ¦B and so  ‚5¬²Á³Z b   which implies that ¦ , as desired. Uniqueness Finally, we must show that if ²4Z Á  Z ³ and ²4 ZZ Á  ZZ ³ are both completions of ²4Á ³ then 4ZZZ š 4 . Note that we have bijective isometries ¢4 ¦ ²4³‹4ZZZ and  ¢4 ¦ ²4³‹4 Hence, the map ~c ¢ ²4³ ¦  ²4³ is a bijective isometry from ²4³ onto ²4³ , where  ²4³ is dense in 4 Z . ²³See Figure 12.4. 302 Advanced Linear Algebra

Figure 12.4 Our goal is to show that  can be extended to a bijective isometry from 4 Z to 4 ZZ.

Z Let %4 . Then there is a sequence ²³ in  ²4³ for which ²³¦% . Since ² ³ is a Cauchy sequence in  ²4³ , ² ² ³³ is a Cauchy sequence in ZZ ZZ ²4³ ‹ 4 and since 4 is complete, we have ² ² ³³ ¦ & for some &4ZZ. Let us define  ²%³~& .

To see that  is well-defined, suppose that ² ³ ¦ % and ² ³ ¦ % , where both sequences lie in ²4³ . Then ZZ Z  ² ² ³Á ² ³³ ~  ²  Á  ³ ¦  as  ¦ B

ZZ and so ² ² ³³ and ² ² ³³ converge to the same element of 4 , which implies that ²%³ does not depend on the choice of sequence in ²4³ converging to % . Thus,  is well-defined. Moreover, if  ²4³ then the constant sequence ´µ converges to  and so  ²³ ~ lim ²³ ~ ²³ , which shows that  is an extension of  .

To see that  is an isometry, suppose that ² ³ ¦ % and ² ³ ¦ & . Then ZZ ² ² ³³ ¦ ²%³ and ² ² ³³ ¦ ²&³ and since  is continuous, we have ZZ ZZ Z Z  ² ²%³Á ²&³³ ~lim  ²  ² ³Á  ² ³³ ~ lim  ²  Á  ³ ~  ²%Á &³ ¦B ¦B Thus, we need only show that  is surjective. Note first that ²4³ ~im ² ³ ‹ im ² ³. Thus, if im ² ³ is closed, we can deduce from the fact ZZ ZZ that ²4³ is dense in 4 that im ²³~4 . So, suppose that ²²%³³ is a sequence in im²³ and ²²%³³¦' . Then ²²%³³  is a Cauchy sequence and Z therefore so is ²% ³ . Thus, ²% ³ ¦ %  4 . But  is continuous and so ²²%³³¦²%³ , which implies that  ²%³~' and so 'im ²³  . Hence,  is surjective and 4š4ZZZ . Metric Spaces 303

Exercises 1. Prove the generalized triangle inequality

²%Á%³²%Á%³b²%Á%³bÄb²%  3 c Á%³ 2. a) Use the triangle inequality to prove that ((²%Á &³ c ²Á ³  ²%Á ³ b ²&Á ³ b) Prove that ((²%Á '³ c ²&Á '³  ²%Á &³ 3. Let :‹MB be the subspace of all binary sequences (sequences of  's and :'s). Describe the metric on . 4. Let 4 ~ ¸Á ¹ be the set of all binary  -tuples. Define a function ¢ : d : ¦s by letting ²%Á &³ be the number of positions in which % and & differ. For example, ´²³Á ²³µ ~  . Prove that  is a metric. (It is called the Hamming distance function and plays an important role in the theory of error-correcting codes.³ 5. Let B .  a) If % ~²%³M show that % ¦ b) Find a sequence that converges to M but is not an element of any  for B.  6. a) Show that if %%~²%³M then M for all € .  b) Find a sequence % ~²%³ that is in M for € , but is not in M . 7. Show that a subset :4 of a metric space is open if and only if : contains an open neighborhood of each of its points. 8. Show that the intersection of any collection of closed sets in a metric space is closed. 9. Let ²4Á ³ be a metric space. The diameter of a nonempty subset : ‹ 4 is ²:³ ~sup ²%Á &³ %Á&:

A set :²:³B is bounded if  . a) Prove that :%4  is bounded if and only if there is some and s for which :‹)²%Á ³ . b) Prove that ²:³ ~  if and only if : consists of a single point. c) Prove that :‹; implies  ²:³ ²;³ . d) If :; and are bounded, show that :r; is also bounded. 10. Let ²4Á ³ be a metric space. Let Z be the function defined by ²%Á &³ ²%Á&³~Z b²%Á&³ 304 Advanced Linear Algebra

a) Show that ²4Á Z ³ is a metric space and that 4 is bounded under this metric, even if it is not bounded under the metric  . b) Show that the metric spaces ²4Á ³ and ²4Á Z ³ have the same open sets. 11. If : and ; are subsets of a metric space ²4Á ³ , we define the distance between :; and by ²:Á ; ³ ~ inf ²%Á &³ %:Á!;

a) Is it true that ²:Á ; ³ ~  if and only if : ~ ; ? Is a metric? b) Show that % cl ²:³ if and only if  ²¸%¹Á :³ ~  . 12. Prove that %4 is a limit point of :‹4 if and only if every neighborhood of %: meets in a point other than % itself. 13. Prove that %4 is a limit point of :‹4 if and only if every open ball )²%Á ³ contains infinitely many points of : . 14. Prove that limits are unique, that is, ²% ³ ¦ % , ²% ³ ¦ & implies that %~&. 15. Let : be a subset of a metric space 4 . Prove that % cl ²:³ if and only if there exists a sequence ²% ³ in : that converges to % . 16. Prove that the closure has the following properties: a) : ‹cl ²:³ b) clcl²²:³³~: c) cl²: r ; ³ ~ cl ²:³ r cl ²; ³ d) cl²: q ; ³ ‹ cl ²:³ q cl ²; ³ Can the last part be strengthened to equality? 17. a) Prove that the closed ball )²%Á ³ is always a closed subset. b) Find an example of a metric space in which the closure of an open ball )²%Á ³ is not equal to the closed ball )²%Á ³ . 18. Provide the details to show that s is separable. 19. Prove that d is separable. 20. Prove that a discrete metric space is separable if and only if it is countable. 21. Prove that the metric space 8´Á µ of all bounded functions on ´Á µ , with metric ²Á ³ ~sup (( ²%³ c ²%³ %´Áµ

is not separable. 22. Show that a function ¢ ²4Á ³ ¦ ²4ZZ Á  ³ is continuous if and only if the inverse image of any open set is open, that is, if and only if ²<³~¸%4“²%³<¹c is open in 4 whenever < is an open set in 4 Z. 23. Repeat the previous exercise, replacing the word open by the word closed. 24. Give an example to show that if ¢ ²4Á ³ ¦ ²4ZZ Á  ³ is a continuous function and <4 is an open set in , it need not be the case that ²<³ is open in 4 Z . Metric Spaces 305

25. Show that any convergent sequence is a Cauchy sequence.

26. If ²% ³ ¦ % in a metric space 4 , show that any subsequence ²% ³ of ²% ³ also converges to % . 27. Suppose that ²% ³ is a Cauchy sequence in a metric space 4 and that some

subsequence ²% ³ of ²% ³ converges. Prove that ²%  ³ converges to the same limit as the subsequence. 28. Prove that if ²% ³ is a Cauchy sequence then the set ¸% ¹ is bounded. What about the converse? Is a bounded sequence necessarily a Cauchy sequence? 29. Let ²% ³ and ²& ³ be Cauchy sequences in a metric space 4 . Prove that the sequence ~²%Á&³ converges. 30. Show that the space of all convergent sequences of real numbers ² or complex numbers) is complete as a subspace of MB . 31. Let Fd denote the metric space of all polynomials over , with metric ²Á³ ~sup (( ²%³ c ²%³ %´Áµ

Is F complete? 32. Let :‹MB be the subspace of all sequences with finite support (that is, with a finite number of nonzero terms). Is : complete? 33. Prove that the metric space { of all integers, with metric ²Á ³ ~((  c  , is complete. 34. Show that the subspace :*´Áµ of the metric space (under the sup metric) consisting of all functions   *´Á µ for which ²³ ~ ²³ is complete. 35. If 4š4ZZ and 4 is complete, show that 4 is also complete. 36. Show that the metric spaces *´Áµ and *´Áµ , under the sup metric, are isometric. 37. Prove Ho¨lder's inequality

BB° B °  ((%& 8989 (( %   (( &  ~ ~ ~ as follows. a) Show that ~!c ¬!~ c b) Let "# and be positive real numbers and consider the rectangle 9 in s with corners ²Á ³ , ²"Á ³ , ²Á #³ and ²"Á #³ , with area "# . Argue geometrically (that is, draw a picture) to show that "# "#  !c ! b c   and so "# "#  b   c) Now let ?~''(( % B and @ ~ (( & B . Apply the results of part b), to 306 Advanced Linear Algebra

((%& (( "~ Á #~ ?@° ° and then sum on  to deduce Ho¨lder's inequality. 38. Prove Minkowski's inequality

BBB° ° °  898989((%b&  (( %  b (( &  ~ ~ ~ as follows. a) Prove it for ~ first. b) Assume € . Show that

cc ((((((((((%b& %  %b& b&  %b& c) Sum this from ~ to  and apply Ho¨lder's inequality to each sum on the right, to get

  ((%b& ~ ° ° °   %b&H8(( 9 8 (( 9 I8 ( %b& ( 9 ~ ~ ~ Divide both sides of this by the last factor on the right and let ¦B to deduce Minkowski's inequality. 39. Prove that M is a metric space. Chapter 13 Hilbert Spaces

Now that we have the necessary background on the topological properties of metric spaces, we can resume our study of inner product spaces without qualification as to dimension. As in Chapter 9, we restrict attention to real and complex inner product spaces. Hence - will denote either sd or . A Brief Review Let us begin by reviewing some of the results from Chapter 9. Recall that an inner product space =- over is a vector space = , together with an inner product ºÁ »¢ = d = ¦ - . If - ~ s then the inner product is bilinear and if -~d, the inner product is sesquilinear.

An inner product induces a norm on = , defined by ))# ~j º#Á #» We recall in particular the following properties of the norm.

Theorem 13.1 1) ()The Cauchy-Schwarz inequality For all "Á #  = , (())))º"Á #»  " # with equality if and only if "~ # for some - . 2) ()The triangle inequality For all "Á #  = , ))))))"b#  " b # with equality if and only if "~ # for some - . 3) ()The parallelogram law

))))))))"b# b "c# ~ " b # We have seen that the inner product can be recovered from the norm, as follows. 308 Advanced Linear Algebra

Theorem 13.2 1) If = is a real inner product space then  º"Á#»~ ²)))) "b# c "c# ³  2) If = is a complex inner product space then  º"Á#»~ ²)))) "b# c "c# ³b ²"b# ) )))  c "c#  ³  The inner product also induces a metric on = defined by ²"Á #³ ~)) " c # Thus, any inner product space is a metric space.

Definition Let => and be inner product spaces and let B ²=Á>³ . 1)  is an isometry if it preserves the inner product, that is, if º ²"³Á ²#³» ~ º"Á #» for all "Á #  = . 2) A bijective isometry is called an isometric isomorphism . When ¢= ¦> is an isometric isomorphism, we say that => and are isometrically isomorphic.

It is easy to see that an isometry is always injective but need not be surjective, even if =~> . (See Example 10.3. ³

Theorem 13.3 A linear transformation B²=Á>³ is an isometry if and only if it preserves the norm, that is, if and only if ))))²#³ ~ # for all #= .

The following result points out one of the main differences between real and complex inner product spaces.

Theorem 13.4 Let = be an inner product space and let B  ²= ³ . 1) If º²#³Á$»~ for all #Á$= then ~ . 2) If = is a complex inner product space and 8 ²#³ ~ º ²#³Á #» ~  for all #= then  ~ . 3) Part 2) does not hold in general for real inner product spaces. Hilbert Spaces Since an inner product space is a metric space, all that we learned about metric spaces applies to inner product spaces. In particular, if ²% ³ is a sequence of Hilbert Spaces 309 vectors in an inner product space = then

²% ³ ¦ % if and only if )) % c % ¦  as  ¦ B The fact that the inner product is continuous as a function of either of its coordinates is extremely useful.

Theorem 13.5 Let = be an inner product space. Then 1) ²% ³ ¦ %Á ²& ³ ¦ & ¬ º%  Á & » ¦ º%Á &» 2) ²% ³ ¦ % ¬)) % ¦ )) %

Complete inner product spaces play an especially important role in both theory and practice.

Definition An inner product space that is complete under the metric induced by the inner product is said to be a Hilbert space .

Example 13.1 One of the most important examples of a Hilbert space is the space M of Example 10.2. Recall that the inner product is defined by B ºÁ»~%&  %&  ~ (In the real case, the conjugate is unnecessary.³ The metric induced by this inner product is

2 B °  ²%& Á ³ ~)) % c & ~89 ( % c & ( ~ which agrees with the definition of the metric space M given in Chapter 12. In other words, the metric in Chapter 12 is induced by this inner product. As we saw in Chapter 12, this inner product space is complete and so it is a Hilbert space. (In fact, it is the prototype of all Hilbert spaces, introduced by David Hilbert in 1912, even before the axiomatic definition of Hilbert space was given by John von Neumann in 1927.³

The previous example raises the question of whether or not the other metric spaces M²£ ), with distance given by

B °  ²%& Á ³ ~)) % c & ~89 ( % c & ( (13.1) ~ are complete inner product spaces. The fact is that they are not even inner product spaces! More specifically, there is no inner product whose induced metric is given by (13.1). To see this, observe that, according to Theorem 13.1, 310 Advanced Linear Algebra any norm that comes from an inner product must satisfy the parallelogram law

))))))))%&bbc~b %& % & But the norm in (13.1) does not satisfy this law. To see this, take %&~ ²Á Á à ³ and ~ ²Á cÁ à ³ . Then

))%&b~Ác~ )) %& and

° ° ))%&~ Á )) ~

Thus, the left side of the parallelogram law is h and the right side is 2° , which equals ~ if and only if .

Just as any metric space has a completion, so does any inner product space.

Theorem 13.6 Let = be an inner product space. Then there exists a Hilbert space /¢=¦/²=³/ and an isometry  for which is dense in . Moreover, / is unique up to isometric isomorphism. Proof. We know that the metric space ²= Á ³ , where  is induced by the inner product, has a unique completion ²=ZZ Á  ³ , which consists of equivalence classes ZZ of Cauchy sequences in = . If ²% ³  ²% ³  = and ²&  ³  ²& ³  = then we set

²% ³ b ²& ³ ~ ²%  b & ³Á ²% ³ ~ ² %  ³ and

º²% ³Á ²& ³» ~lim º%  Á & » ¦B

It is easy to see that, since ²% ³ and ²& ³ are Cauchy sequences, so are ²% b & ³ and ² %  ³ . In addition, these definitions are well-defined, that is, they are independent of the choice of representative from each equivalence class. For instance, if ²%V ³  ²% ³ then

lim ))%c%V ~ ¦B and so

(((())))º% Á & » c º%VVV  Á & » ~ º%  c %  Á & »  %  c %  &  ¦ 

(The Cauchy sequence ²& ³ is bounded. ³ Hence,

º²% ³Á ²&  ³» ~lim º%  Á & » ~ lim º%VV  Á & » ~ º²%  ³Á ²&  ³» ¦B ¦B We leave it to the reader to show that = Z is an inner product space under these operations. Hilbert Spaces 311

Moreover, the inner product on =ZZ induces the metric , since

º²% c & ³Á ²% c & ³» ~lim º%  c & Á % c & » ¦B  ~lim ²% Á & ³ ¦B Z ~  ²²% ³Á ²& ³³ Hence, the metric space isometry ¢= ¦=Z is an isometry of inner product spaces, since º ²%³Á ²&³» ~ Z ²  ²%³Á ²&³³ ~ ²%Á &³ ~ º%Á &» Thus, =²=³=Z is a complete inner product space and  is a dense subspace of Z that is isometrically isomorphic to = . We leave the issue of uniqueness to the reader.

The next result concerns subspaces of inner product spaces.

Theorem 13.7 1) Any complete subspace of an inner product space is closed. 2) A subspace of a Hilbert space is a Hilbert space if and only if it is closed. 3) Any finite-dimensional subspace of an inner product space is closed and complete. Proof. Parts 1) and 2) follow from Theorem 12.6. Let us prove that a finite- dimensional subspace := of an inner product space is closed. Suppose that ²% ³ is a sequence in : , ²% ³ ¦ % and % ¤ : . Let 8 ~ ¸  Á Ã Á  ¹ be an orthonormal Hamel basis for : . The Fourier expansion  ~ º%Á  » ~ in :%c has the property that £ but

º% c Á  » ~ º%Á  » c º Á  » ~ 

Thus, if we write & ~ % c and & ~ % c  : , the sequence ²&  ³ , which is in :&: , converges to a vector that is orthogonal to . But this is impossible, because &ž& implies that  )&c&~&b&‚&¦ ) ) ) )) )) ° This proves that : is closed.

To see that any finite-dimensional subspace : of an inner product space is complete, let us embed : (as an inner product space in its own right) in its completion ::Z . Then (or rather an isometric copy of : ) is a finite-dimensional 312 Advanced Linear Algebra subspace of a complete inner product space :Z and as such it is closed. However, :::~:: is dense in ZZ and so , which shows that is complete. Infinite Series Since an inner product space allows both addition of vectors and convergence of sequences, we can define the concept of infinite sums, or infinite series.

Definition Let = be an inner product space. The th partial sum of the sequence ²% ³ in = is

~%bÄb% 

If the sequence ²  ³ of partial sums converges to a vector  = , that is, if

))  c ¦ as ¦B then we say that the series  % converges to and write B  %~ ~ We can also define absolute convergence.

Definition A series  % is said to be absolutely convergent if the series B ))% ~ converges.

The key relationship between convergence and absolute convergence is given in the next theorem. Note that completeness is required to guarantee that absolute convergence implies convergence.

Theorem 13.8 Let == be an inner product space. Then is complete if and only if absolute convergence of a series implies convergence. Proof. Suppose that =%B is complete and that )) . Then the sequence of partial sums is a Cauchy sequence, for if € , we have

 )) c ~ii %   )) %  ¦ ~b ~b

Hence, the sequence ²  ³ converges, that is, the series  % converges.

Conversely, suppose that absolute convergence implies convergence and let ²% ³ be a Cauchy sequence in = . We wish to show that this sequence converges. Since ²% ³ is a Cauchy sequence, for each  €  , there exists an 5 Hilbert Spaces 313 with the property that  Á  ‚ 5 ¬)) % c %  

Clearly, we can choose 55Ä , in which case  ))%c% 55b   and so BB  ))%c%55b   B ~ ~  Thus, according to hypothesis, the series

B

²%55b c %  ³ ~ converges. But this is a telescoping series, whose  th partial sum is

%c%55b 

and so the subsequence ²%5 ³ converges. Since any Cauchy sequence that has a convergent subsequence must itself converge, the sequence ²% ³ converges and so = is complete. An Approximation Problem Suppose that =:= is an inner product space and that is a subset of . It is of considerable interest to be able to find, for any %= , a vector in : that is closest to % in the metric induced by the inner product, should such a vector exist. This is the approximation problem for = .

Suppose that %= and let  ~%cinf )) :

Then there is a sequence  for which

~%c)) ¦ as shown in Figure 13.1. 314 Advanced Linear Algebra

Figure 13.1

Let us see what we can learn about this sequence. First, if we let &~%c then according to the parallelogram law

 ))))))))&b& b&c&  ~²&  b&  ³ or

 &b& ))))))&c& ~²&  b&  ³chh (13.2) 

Now, if the set : is convex , that is, if %Á &  : ¬ % b ² c ³&  : for all     (in words : contains the line segment between any two of its points) then ²  b ³°  : and so

&b& b hhhh~%c ‚  Thus, (13.2) gives

 ))))))&c& ²&  b&  ³c ¦ as Á  ¦ B . Hence, if : is convex then the sequence ²& ³ ~ ²% c ³ is a Cauchy sequence and therefore so is ²  ³ .

If we also require that :² be complete then the Cauchy sequence ³ converges to a vector %:VV and by the continuity of the norm, we must have )) %c% ~ . Let us summarize and add a remark about uniqueness.

Theorem 13.9 Let =: be an inner product space and let be a complete convex subset of =%=%: . Then for any , there exists a unique V for which ))%c%V ~inf )) %c :

The vector %%:V is called the best approximation to in . Hilbert Spaces 315

Proof. Only the uniqueness remains to be established. Suppose that ))%c%V ~ ~ ) %c%Z ) Then, by the parallelogram law,

 )))%VV c %ZZ ~ ²% c % ³ c ²% c %³ )  ~))))) %c%VV b %c%ZZ c %c%c% ) Z   %b%V ~)))) %c%V  b %c%Z c %c hh2  b c ~ and so %~%V Z .

Since any subspace := of an inner product space is convex, Theorem 13.9 applies to complete subspaces. However, in this case, we can say more.

Theorem 13.10 Let =: be an inner product space and let be a complete subspace of =%=%: . Then for any , the best approximation to in is the unique vector %:ZZ for which %c%ž: . Proof. Suppose that %c%ZZ ž: , where % : . Then for any : , we have %c%ZZ ž c% and so

))))))))%c  ~ %c%ZZ b % c ‚ %c% Z 

Hence %~%Z V is the best approximation to % in : . Now we need only show that %c%ž:VV, where % is the best approximation to % in : . For any : , a little computation reminiscent of completing the square gives

))%c 2 ~º%c Á%c » ~)) % c º%Á »c º Á%»b ))

 º%Á » º%Á » ~)) % b )) 89 c  c )) ))  º%Á » º%Á »(( º%Á » ~%)) b )) 8989 c c c  )) )) ))

  º%Á »(( º%Á » ~%)) b )) ee c c )) ))

Now, this is smallest when º%Á » ~  • ))  316 Advanced Linear Algebra in which case

 ((º%Á » ))))%c  ~ % c )) 

Replacing %%c% by V gives  ((º% c %ÁV » ))))%c%c VV ~ %c% c )) 

But %%:%c VV is the best approximation to in and since  : we must have  ))))%c%c VV ‚ %c% Hence,

((º% c %ÁV »  ~ ))  or, equivalently, º% c %ÁV » ~  Hence, %c%ž:V .

According to Theorem 13.10, if : is a complete subspace of an inner product space =%= then for any , we may write % ~ %VV b ²% c %³ where %VV  : and % c %  :žžž . Hence, = ~ : b : and since : q : ~ ¸¹ , we also have =~:p:ž . This is the projection theorem for arbitrary inner product spaces.

Theorem 13.11 ( The projection theorem ) If : is a complete subspace of an inner product space = then =~:p:ž In particular, if :/ is a closed subspace of a Hilbert space then /~:p:ž Theorem 13.12 Let :; , and ;Z be subspaces of an inner product space = . 1) If =~:p; then ;~:ž . 2) If :p;~:p;ZZ then ;~; . Proof. If =~:p; then ;‹:ž by definition of orthogonal direct sum. On the other hand, if ':ž then '~ b! , for some : and !; . Hence,  ~ º'Á » ~ º Á » b º!Á » ~ º Á » Hilbert Spaces 317 and so ~ , implying that '~!; . Thus, :ž ‹; . Part 2) follows from part 1).

Let us denote the closure of the span of a set :²:³ of vectors by cspan .

Theorem 13.13 Let / be a Hilbert space. 1) If (/ is a subset of then cspan²(³ ~ (žž 2) If :/ is a subspace of then cl²:³ ~ :žž 3) If 2/ is a closed subspace of then 2~2žž Proof. We leave it as an exercise to show that ´²(³µ~(cspan žž . Hence / ~cspan ²(³ p ´ cspan ²(³µžž ~ cspan ²(³ p ( But since (ž is closed, we also have /~(žžž p( and so by Theorem 13.12, cspan²(³ ~ (žž . The rest follows easily from part 1).

In the exercises, we provide an example of a closed subspace 2 of an inner product space =2£2 for which žž . Hence, we cannot drop the requirement that / be a Hilbert space in Theorem 13.13.

Corollary 13.14 If ( is a subset of a Hilbert space / then span ²(³ is dense in / if and only if (ž ~ ¸¹ . Proof. As in the previous proof, / ~cspan ²(³ p (ž and so (ž ~ ¸¹ if and only if / ~cspan ²(³ . Hilbert Bases We recall the following definition from Chapter 9.

Definition A maximal orthonormal set in a Hilbert space / is called a Hilbert basis for / .

Zorn's lemma can be used to show that any nontrivial Hilbert space has a Hilbert basis. Again, we should mention that the concepts of Hilbert basis and Hamel basis (a maximal linearly independent set) are quite different. We will show 318 Advanced Linear Algebra later in this chapter that any two Hilbert bases for a Hilbert space have the same cardinality.

Since an orthonormal set EE is maximal if and only if ž ~ ¸¹ , Corollary 13.14 gives the following characterization of Hilbert bases.

Theorem 13.15 Let E be an orthonormal subset of a Hilbert space / . The following are equivalent: 1) E is a Hilbert basis 2) Ež ~ ¸¹ 3) EE is a total subset of /²³~/ , that is, cspan .

Part 3) of this theorem says that a subset of a Hilbert space is a Hilbert basis if and only if it is a total orthonormal set. Fourier Expansions We now want to take a closer look at best approximations. Our goal is to find an explicit expression for the best approximation to any vector % from within a closed subspace :/ of a Hilbert space . We will find it convenient to consider three cases, depending on whether : has finite, countably infinite, or uncountable dimension. The Finite-Dimensional Case

Suppose that E ~¸"ÁÃÁ" ¹ is an orthonormal set in a Hilbert space / . Recall that the Fourier expansion of any %/ , with respect to E , is given by  %V ~ º%Á " »" ~ where º%Á " » is the Fourier coefficient of % with respect to " . Observe that

º% c %ÁVV " » ~ º%Á " » c º%Á " » ~  and so %c%žV span ²E ³ . Thus, according to Theorem 13.10, the Fourier expansion %%²³V is the best approximation to in span E . Moreover, since %c%ž%VV, we have ))%~%c%c%%VV )) ) )  )) and so ))%%V )) with equality if and only if %~%V , which happens if and only if %span ²E ³ . Let us summarize. Hilbert Spaces 319

Theorem 13.16 Let E ~¸"ÁÃÁ" ¹ be a finite orthonormal set in a Hilbert space /%/ . For any , the Fourier expansion %%V of is the best approximation to %²³ in span E . We also have Bessel's inequality ))%%V )) or, equivalently

  (())º%Á " »  % (13.3) ~ with equality if and only if %span ²E ³ . The Countably Infinite-Dimensional Case In the countably infinite case, we will be dealing with infinite sums and so questions of convergence will arise. Thus, we begin with the following.

Theorem 13.17 Let E ~¸"Á"Áù be a countably infinite orthonormal set in a Hilbert space / . The series B  " (13.4) ~ converges in / if and only if the series B  ((  (13.5) ~ converges in s . If these series converge then they converge unconditionally (that is, any series formed by rearranging the order of the terms also converges). Finally, if the series (13.4) converges then

BB  ii " ~((  ~ ~

Proof. Denote the partial sums of the first series by  and the partial sums of the second series by  . Then for

  )) c ~ii "  ~ ((((  ~c  ~b ~b

Hence ²  ³ is a Cauchy sequence in / if and only if ² ³ is a Cauchy sequence in ss . Since both / and are complete, ²  ³ converges if and only if ² ³ converges.

If the series (13.5) converges then it converges absolutely and hence unconditionally. (A real series converges unconditionally if and only if it 320 Advanced Linear Algebra converges absolutely.³ But if (13.5) converges unconditionally then so does (13.4). The last part of the theorem follows from the continuity of the norm.

Now let E ~¸"Á"Áù be a countably infinite orthonormal set in / . The Fourier expansion of a vector %/ is defined to be the sum B %V ~ º%Á " »" (13.6) ~ To see that this sum converges, observe that, for any € , (13.3) gives   (())º%Á " »  % ~ and so

B  (())º%Á " »  % ~ which shows that the series on the left converges. Hence, according to Theorem 13.17, the Fourier expansion (13.6) converges unconditionally.

Moreover, since the inner product is continuous,

º% c %ÁVV " » ~ º%Á " » c º%Á " » ~  and so %c%´VVspan ²EE ³µžž ~´ cspan ² ³µ . Hence, % is the best approximation to %²³%c%ž% in cspan E . Finally, since VV , we again have ))%~%c%c%%VV )) ) )  )) and so ))%%V )) with equality if and only if %~%V , which happens if and only if %cspan ²E ³ . Thus, the following analog of Theorem 13.16 holds.

Theorem 13.18 Let E ~¸"Á"Áù be a countably infinite orthonormal set in a Hilbert space /%/ . For any , the Fourier expansion B %V ~ º%Á " »" ~ of %%²³ converges unconditionally and is the best approximation to in cspan E . We also have Bessel's inequality ))%%V )) Hilbert Spaces 321 or, equivalently

B  (())º%Á " »  % ~ with equality if and only if %cspan ²E ³ . The Arbitrary Case

To discuss the case of an arbitrary orthonormal set E ~¸" “2¹ , let us first define and discuss the concept of the sum of an arbitrary number of terms. (This is a bit of a digression, since we could proceed without all of the coming detailsc³ but they are interesting.

Definition Let A ~¸% “2¹ be an arbitrary family of vectors in an inner product space = . The sum

 % 2 is said to converge to a vector %= and we write

%~ % (13.7) 2 if for any  € , there exists a finite set :‹2 for which

;Š:Á; finite ¬ii % c%  ; For those readers familiar with the language of convergence of nets, the set F²2³ of all finite subsets of 2 is a directed set under inclusion (for every (Á ) FF ²2³ there is a *  ²2³ containing ( and ) ) and the function

:¦ % : is a net in /² . Convergence of 13.7) is convergence of this net. In any case, we will refer to the preceding definition as the net definition of convergence.

It is not hard to verify the following basic properties of net convergence for arbitrary sums.

Theorem 13.19 Let A ~¸% “2¹ be an arbitrary family of vectors in an inner product space = . If

%~% and &~& 2 2 then 322 Advanced Linear Algebra

1) ()Linearity

 ² % b & ³ ~ % b & 2 for any Á  - 2) ()Continuity

º% Á&»~º%Á&» and º&Á% »~º&Á%» 2 2 The next result gives a useful “Cauchy type” description of convergence.

Theorem 13.20 Let A ~¸% “2¹ be an arbitrary family of vectors in an inner product space = . 1) If the sum

% 2 converges then for any  € , there exists a finite set 0‹2 such that

1q0~JÁ1 finite ¬ii %  1 2) If = is a Hilbert space then the converse of 1) also holds. Proof. For part 1), given  € , let :‹2 , : finite, be such that

 ;Š:Á; finite ¬ % c%  ii 2 ; If 1q:~J1 , finite then

iii%~²%b%c%³c²%c%³ i 11::   % c% b % c%  b ~ iiii22 1r: :

As for part 2), for each € , let 0 ‹2 be a finite set for which

 1q0 ~JÁ1 finite ¬ii %  1  and let

&~ %

0 Hilbert Spaces 323

Then ²& ³ is a Cauchy sequence, since

))&c& ~iiii %c  %  ~  %c   %  00 0c00c0   %b%b¦ iiii 0c0 0  c0

Since =²&³¦& is assumed complete, we have  .

Now, given  € , there exists an 5 such that

 ‚5 ¬)) & c& ~ % c&  ii2 0

Setting ~max ¸5Á°¹ gives for ; Š0 Á ; finite

iii%c&~ %c&b % i ; 0;c0   %c&b%b  iiii 0;c0  and so 2%& converges to .

The following theorem tells us that convergence of an arbitrary sum implies that only countably many terms can be nonzero so, in some sense, there is no such thing as a nontrivial uncountable sum.

Theorem 13.21 Let A ~¸% “2¹ be an arbitrary family of vectors in an inner product space = . If the sum

% 2 converges then at most a countable number of terms % can be nonzero. Proof. According to Theorem 13.20, for each € , we can let 0 ‹2 , 0 finite, be such that

 1q0 ~JÁ1 finite ¬ii %  1   Let 0~ 0 . Then 0 is countable and  ¤0¬¸¹q0 ~J for all ¬)) %  for all ¬% ~   324 Advanced Linear Algebra

Here is the analog of Theorem 13.17.

Theorem 13.22 Let E ~¸" “2¹ be an arbitrary orthonormal family of vectors in a Hilbert space / . The two series

  " and ((  2 2 converge or diverge together. If these series converge then

  ii " ~((  2 2 Proof. The first series converges if and only if for any  € , there exists a finite set 0‹2 such that

  1q0~JÁ1 finite ¬ii "  1 or, equivalently

  1q0~JÁ1 finite ¬((   1 and this is precisely what it means for the second series to converge. We leave proof of the remaining statement to the reader.

The following is a useful characterization of arbitrary sums of nonnegative real terms.

Theorem 13.23 Let ¸  “   2¹ be a collection of nonnegative real numbers. Then

 ~sup (13.8) 1 finite 21‹2 1 provided that either of the preceding expressions is finite. Proof. Suppose that

sup  ~9B 1 finite 1‹2 1

Then, for any  € , there exists a finite set :‹2 such that

9‚  ‚9c : Hilbert Spaces 325

Hence, if ;‹2 is a finite set for which ;Š: then since  ‚ ,

9‚ ‚ ‚9c ; : and so

ii9c   ; which shows that  9 converges to . Finally, if the sum on the left of (13.8) converges then the supremum on the right is finite and so (13.8) holds.

The reader may have noticed that we have two definitions of convergence for countably infinite series: the net version and the traditional version involving the limit of partial sums. Let us write

B %% and ob ~ for the net version and the partial sum version, respectively. Here is the relationship between these two definitions.

Theorem 13.24 Let /%/ be a Hilbert space. If  then the following are equivalent:

1)  %% converges (net version) to ob B 2)  %% converges unconditionally to ~

Proof. Assume that 1) holds. Suppose that o is any permutation of b . Given any o€ , there is a finite set :‹ b for which

;Š:Á; finite ¬ii % c%  ;

Let us denote the set of integers ¸Á à Á ¹ by 0 and choose a positive integer  such that ²0 ³ Š : . Then for  ‚  we have



²0 ³ Š ²0 ³ Š : ¬ % c % ~ % c %   ii²³  ~  ²0 ³ and so 2) holds. 326 Advanced Linear Algebra

Next, assume that 2) holds, but that the series in 1) does not converge. Then there exists an o€ such that, for any finite subset 0‹ b , there exists a finite subset 11q0~J with for which

ii %€  1

From this, we deduce the existence of a countably infinite sequence 1 of mutually disjoint finite subsets of ob with the property that

max²1bb ³ ~ 4   ~ min ²1 ³ and

ii %€  1 Now, we choose any permutation o¢¦bb o with the following properties 1) ²´ Á 4 µ³ ‹ ´  Á 4 µ

2) if 1ÁÁ" ~¸ ÁÃÁ ¹ then

² ³ ~  Á Á ²  b1) ~  Á2 Á Ã Á  ²  b "  c 1) ~  Á" The intention in property 2) is that, for each  ,  takes a set of consecutive integers to the integers in 1 .

For any such permutation  , we have

b"c iiii%~²³ %€  ~ 1 which shows that the sequence of partial sums of the series

B %²³ ~ is not Cauchy and so this series does not converge. This contradicts 2) and shows that 2) implies at least that 1) converges. But if 1) converges to &/ then since 1) implies 2) and since unconditional limits are unique, we have &~%. Hence, 2) implies 1).

Now we can return to the discussion of Fourier expansions. Let E ~¸" “2¹ be an arbitrary orthonormal set in a Hilbert space / . Given any %/ , we may apply Theorem 13.16 to all finite subsets of E , to deduce Hilbert Spaces 327 that

 sup (())º%Á " »  % 1 finite 1‹2 1 and so Theorem 13.23 tells us that the sum

 ((º%Á " » 2 converges. Hence, according to Theorem 13.22, the Fourier expansion

%V ~ º%Á " »" 2 of % also converges and

 ))%V ~ ( º%Á " » ( 2 Note that, according to Theorem 13.21, %V is a countably infinite sum of terms of the form º%Á " »" and so is in cspan ²E ³ .

The continuity of infinite sums with respect to the inner product (Theorem 13.19) implies that

º% c %ÁVV " » ~ º%Á " » c º%Á " » ~  and so %c%´VVspan ²EE ³µžž ~´ cspan ² ³µ . Hence, Theorem 3.10 tells us that % is the best approximation to %²³%c%ž% in cspan E . Finally, since VV , we again have

))%~%c%c%%VV )) ) )  )) and so ))%%V )) with equality if and only if %~%V , which happens if and only if %cspan ²E ³ . Thus, we arrive at the most general form of a key theorem about Hilbert spaces.

Theorem 13.25 Let E ~¸" “2¹ be an orthonormal family of vectors in a Hilbert space /%/ . For any , the Fourier expansion

%V ~ º%Á " »" 2 of %/ converges in and is the unique best approximation to %²³ in cspan E . Moreover, we have Bessel's inequality ))%%V )) 328 Advanced Linear Algebra or, equivalently

 (())º%Á " »  % 2 with equality if and only if %cspan ²E ³ . A Characterization of Hilbert Bases

Recall from Theorem 13.15 that an orthonormal set E ~¸" “2¹ in a Hilbert space / is a Hilbert basis if and only if cspan²³~/E Theorem 13.25 then leads to the following characterization of Hilbert bases.

Theorem 13.26 Let E ~¸" “2¹ be an orthonormal family in a Hilbert space / . The following are equivalent: 1) E is a Hilbert basis (a maximal orthonormal set) 2) Ež ~ ¸¹ 3) EE is total (that is, cspan²³~/ ) 4) %~%V for all %/ 5) Equality holds in Bessel's inequality for all %/ , that is, ))%~% ))V for all %/ . 6) Parseval's identity º%Á &» ~ º%ÁVV &» holds for all %Á &  / , that is,

º%Á &» ~ º%Á " »º&Á " » 2 Proof. Parts 1), 2) and 3) are equivalent by Theorem 13.15. Part 4) implies part 3), since %V cspan ²E ³ and 3) implies 4) since the unique best approximation of any %cspan ²E ³ is itself and so %~%V . Parts 3) and 5) are equivalent by Theorem 13.25. Parseval's identity follows from part 4) using Theorem 13.19. Finally, Parseval's identity for &~% implies that equality holds in Bessel's inequality. Hilbert Dimension We now wish to show that all Hilbert bases for a Hilbert space / have the same cardinality and so we can define the Hilbert dimension of / to be that cardinality. Hilbert Spaces 329

Theorem 13.27 All Hilbert bases for a Hilbert space / have the same cardinality. This cardinality is called the Hilbert dimension of / , which we denote by hdim²/³ . Proof. If / has a finite Hilbert basis then that set is also a Hamel basis and so all finite Hilbert bases have size dim²/³ . Suppose next that 8 ~ ¸ “   2¹ and 9 ~¸“1¹ are infinite Hilbert bases for / . Then for each  , we have

 ~ º Á  »

1 where 1 is the countable set ¸ “ º Á  » £ ¹ . Moreover, since no  can be  orthogonal to every 1~11 , we have 2 . Thus, since each  is countable, we have

((1~ee 1 L2~2 (( (( 2 By symmetry, we also have ((21 (( and so the Schro¨der-Bernstein theorem implies that ((1~2 ( ( .

Theorem 13.28 Two Hilbert spaces are isometrically isomorphic if and only if they have the same Hilbert dimension. Proof. Suppose that hdim²/ ³ ~ hdim ²/ ³ . Let E ~ ¸" “   2¹ be a Hilbert basis for /~¸#“2¹ and E a Hilbert basis for /  . We may define a map ¢/ ¦/ as follows

45 " ~ #  2 2 We leave it as an exercise to verify that  is a bijective isometry. The converse is also left as an exercise. A Characterization of Hilbert Spaces ) We have seen that any vector space =²-³ is isomorphic to a vector space  of all functions from )- to that have finite support. There is a corresponding result for Hilbert spaces. Let 2 be any nonempty set and let

M ²2³ ~Dc ¢ 2 ¦d (( ²³  B E 2 The functions in M²2³ are referred to as square summable functions . ² We can also define a real version of this set by replacing ds by .³ We define an inner product on M²2³ by

ºÁ» ~ ²³²³ 2 The proof that M²2³ is a Hilbert space is quite similar to the proof that 330 Advanced Linear Algebra

M~M²³o is a Hilbert space and the details are left to the reader. If we define    M ²2³ by ~if ²³ ~ ~ ÁF £if then the collection

E~¸ “2¹ is a Hilbert basis for M²2³ , of cardinality (( 2 . To see this, observe that

º Á » ~   ²³   ²³ ~  Á 2 and so E is orthonormal. Moreover, if M²2³ then ²³£ for only a Z countable number of 2 , say ¸ÁÁù . If we define  by B Z ~ ²³  ~ then ZZ cspan ²E ³ and  ²³ ~ ²³ for all   2 , which implies that  ~  Z . This shows that M²2³~ cspan ²EE ³ and so is a total orthonormal set, that is, a Hilbert basis for M²2³ .

Now let /~¸"“2¹ be a Hilbert space, with Hilbert basis 8  . We define a map 8¢/ ¦M ²2³ as follows. Since is a Hilbert basis, any %/ has the form

% ~ º%Á " »" 2 Since the series on the right converges, Theorem 13.22 implies that the series

 ((º%Á " » 2 converges. Hence, another application of Theorem 13.22 implies that the following series converges

²%³ ~ º%Á " » 2 It follows from Theorem 13.19 that  is linear and it is not hard to see that it is also bijective. Notice that ²" ³ ~ and so  takes the Hilbert basis 8 for / to the Hilbert basis E for M²2³ . Hilbert Spaces 331

Notice also that

  ))²%³ ~ º ²%³Á ²%³» ~ ( º%Á " » ( ~ii º%Á " »" ~ )) % 2 2 and so  is an isometric isomorphism. We have proved the following theorem.

Theorem 13.29 If /2 is a Hilbert space of Hilbert dimension  and if is any set of cardinality  then / is isometrically isomorphic to M ²2³ . The Riesz Representation Theorem We conclude our discussion of Hilbert spaces by discussing the Riesz representation theorem. As it happens, not all linear functionals on a Hilbert space have the form “take the inner product withà ,” as in the finite- dimensional case. To see this, observe that if &/ then the function

& ²%³ ~ º%Á &» is certainly a linear functional on / . However, it has a special property. In particular, the Cauchy-Schwarz inequality gives, for all %/

(((())))& ²%³ ~ º%Á &»  % & or, for all %£ , ((²%³ & &)) ))% Noticing that equality holds if %~& , we have ((²%³ sup & ~&)) %£ ))%

This prompts us to make the following definition, which we do for linear transformations between Hilbert spaces (this covers the case of linear functionals).

Definition Let ¢/ ¦/ be a linear transformation from /  to / . Then is said to be bounded if ))²%³ sup B %£ ))%

If the supremum on the left is finite, we denote it by )) and call it the norm of . 332 Advanced Linear Algebra

Of course, if ¢/ ¦ - is a bounded linear functional on / then ((²%³ ))~sup %£ ))%

The set of all bounded linear functionals on a Hilbert space / is called the continuous dual space, or conjugate space , of // and denoted by i . Note that this differs from the algebraic dual of / , which is the set of all linear functionals on / . In the finite-dimensional case, however, since all linear functionals are bounded (exercise), the two concepts agree. (Unfortunately, there is no universal agreement on the notation for the algebraic dual versus the continuous dual. Since we will discuss only the continuous dual in this section, no confusion should arise.³

The following theorem gives some simple reformulations of the definition of norm.

Theorem 13.30 Let ¢/ ¦/ be a bounded linear transformation. 1) ))~²%³sup ) ) ))%~ 2) ))~²%³sup ) ) ))% 3) ))s~¸“²%³%inf ) ) )) for all %/¹

The following theorem explains the importance of bounded linear transformations.

Theorem 13.31 Let ¢/ ¦/ be a linear transformation. The following are equivalent: 1)  is bounded 2)  is continuous at any point %/ 3)  is continuous. Proof. Suppose that  is bounded. Then

))))))))²%³c ²%³~  ²%c%³   %c%  ¦ as %¦% . Hence,  is continuous at % . Thus, 1) implies 2). If 2) holds then for any &/ , we have

)))²%³c²&³~  ²%c&b%³c²%³¦  ) as %¦& , since  is continuous at % and %c&b% ¦% as &¦% . Hence, is continuous at any &/ and 3) holds. Finally, suppose that 3) holds. Thus,  is continuous at € and so there exists a  such that ))% ¬ ) ²%³ ) Hilbert Spaces 333

In particular, ))²%³  ))%~ ¬  ))%  and so ))²%³²%³ ))  ))%~¬%~ ) ) ¬  ¬  ))%% ))  Thus,  is bounded.

Now we can state and prove the Riesz representation theorem.

Theorem 13.32 ( The Riesz representation theorem ) Let / be a Hilbert space. For any bounded linear functional / on , there is a unique '/ such that

²%³ ~ º%Á ' » for all %/ . Moreover, )) ' ~ ))  . Proof. If ~ , we may take ' ~ , so let us assume that £ . Hence, 2 ~ker ²³ £ / and since  is continuous, 2 is closed. Thus /~2p2ž Now, the first isomorphism theorem, applied to the linear functional ¢/ ¦ - , implies that /°2 š - (as vector spaces). In addition, Theorem 3.5 implies that /°2 š 2žž and so 2 š - . In particular, dim ²2 ž ³ ~  .

For any '2ž , we have %2¬²%³~~º%Á'» Since dim²2žž ³ ~  , all we need do is find a  £ '  2 for which ²'³ ~ º'Á'» for then ² '³ ~ ²'³ ~ º'Á '» ~ º 'Á '» for all  - , showing that ²%³ ~ º%Á '» for %  2 as well.

But if £'2ž then

²'³ '~ '  º'Á '» has this property, as can be easily checked. The fact that ))'~ )) has already been established. 334 Advanced Linear Algebra

Exercises 1. Prove that the sup metric on the metric space *´Áµ of continuous functions on ´Á µ does not come from an inner product. Hint: let ²!³ ~  and ²!³ ~ ²! c a ³°² c a ³ and consider the parallelogram law. 2. Prove that any Cauchy sequence that has a convergent subsequence must itself converge. 3. Let =()= be an inner product space and let and be subsets of . Show that a) (‹)¬)žž ‹( b) (=ž is a closed subspace of c) ´²(³µ~(cspan žž 4. Let =:‹= be an inner product space and . Under what conditions is :~:žžž ž? 5. Prove that a subspace :/ of a Hilbert space is closed if and only if :~:žž. 6. Let =M be the subspace of  consisting of all sequences of real numbers, with the property that each sequence has only a finite number of nonzero terms. Thus, =2= is an inner product space. Let be the subspace of consisting of all sequences %~²%³ in = with the property that žž '%°~ . Show that 2 is closed, but that 2 £2 . Hint: For the latter, show that 2ž ~¸¹ by considering the sequences "~²ÁÃÁcÁó , where the term c is in the nth coordinate position. 7. Let E'~¸"Á"Áù be an orthonormal set in / . If %~ "  converges, show that

B  ))%~ (  ( ~ 8. Prove that if an infinite series

B % ~ converges absolutely in a Hilbert space / then it also converges in the sense of the “net” definition given in this section. 9. Let ¸  “   2¹ be a collection of nonnegative real numbers. If the sum on the left below converges, show that

 ~sup 1 finite 21‹2 1 10. Find a countably infinite sum of real numbers that converges in the sense of partial sums, but not in the sense of nets. 11. Prove that if a Hilbert space / has infinite Hilbert dimension then no Hilbert basis for / is a Hamel basis. 12. Prove that M²2³ is a Hilbert space for any nonempty set 2 . Hilbert Spaces 335

13. Prove that any linear transformation between finite-dimensional Hilbert spaces is bounded. 14. Prove that if /i then ker ²³ is a closed subspace of / . 15. Prove that a Hilbert space is separable if and only if hdim²/³  L . 16. Can a Hilbert space have countably infinite Hamel dimension? 17. What is the Hamel dimension of M² o ³ ? 18. Let  and be bounded linear operators on / . Verify the following: a) )) ~  (()) b) ))))))b  b  c) )) ))))   19. Use the Riesz representation theorem to show that /š/i for any Hilbert space /. Chapter 14 Tensor Products

In the preceding chapters, we have seen several ways to construct new vector spaces from old ones. Two of the most important such constructions are the direct sum

Referring to Figure 14.1, consider a set ( and two functions and , with domain (. f A S

W g

X

Figure 14.1 Suppose that this diagram commutes , that is, that there exists a unique function ¢: ¦? for which ~ k What does this say about the relationship between the functions  and ? 338 Advanced Linear Algebra

Let us think of the “information” contained in a function ¢ ( ¦ ) to be the way in which () distinguishes elements of using labels from . The relationship above implies that ²³ £ ²³ ¬ ²³ £ ²³ This can be phrased by saying that whatever ability  has to distinguish elements of ( is also possessed by . Put another way, except for labeling differences, any information contained in  is also contained in . This is sometimes expressed by saying that  can be factored through .

If  happens to be injective, then the only difference between  and is the values of the labels. That is, the two functions have equal ability to distinguish elements of ( . However, in general,  is not required to be injective and so may contain more information than  .

Suppose now that for all sets ?: in some family I of sets that includes and for all functions ¢ ( ¦ ? in some family < of functions that includes  , the diagram in Figure 14.1 commutes. This says that the information contained in every function in < is also contained in  . In other words, captures and preserves the information in every member of < . In this sense, ¢( ¦ : is universal among all functions ¢ ( ¦ ? in < .

Moreover, since  is a member of the family < , we can also say that contains the information in < but no more . In other words, the information in the universal function  is precisely the same as the information in the entire family <. In this way, a single function ¢( ¦ : (more precisely, a single pair ²:Á³ ) can capture a concept, as described by a family of functions, such as the concepts of basis, quotient space, direct sum and bilinearity (as we will see)!

Let us make a formal definition.

Definition Referring to Figure 14.2, let ( be a set and let I be a family of sets. Let be a family of functions from ( to members of . Let be a family of functions on members of I> . We assume that has the following structure: 1) >I contains the identity function for each member of 2) > is closed under composition of functions 3) Composition of functions in > is associative. We also assume that for any > and < , the composition  k is defined and is a member of < . Tensor Products 339

S1 f 1 W1 W A S 2 f 2 2 W3 f 3 S3

Figure 14.2 Let us refer to > as the measuring set and its members as measuring functions.

A pair ²:Á¢(¦:³ , where : I< and   has the universal property for <> as measured by , or is a universal pair for ²Á <>I ³ , if for any ? and any ¢ ( ¦ ? in <> , there is a unique ¢ : ¦ ? in for which the diagram in Figure 14.1 commutes, that is, ~ k Another way to express this is to say that any < can be factored through , or that any <> can be lifted to a function : on .

Universal pairs are essentially unique, as the following describes.

Theorem 14.1 Let ²:Á¢(¦:³ and ²;Á¢(¦;³ be universal pairs for ²Á<> ³. Then there is a bijective measuring function   > for which ²:³ ~ ; . Proof. With reference to Figure 14.3, there are unique measuring functions ¢: ¦; and ¢; ¦: for which ~ k ~ k Hence, ~² k ³k ~² k ³k

However, referring to the third diagram in Figure 14.3, both k¢:¦: and the identity map >¢: ¦: are members of that make the diagram commute. Hence, the uniqueness requirement implies that k~  . Similarly  k~  and so  and are inverses of one another, making ~ the desired bijection. 340 Advanced Linear Algebra

f g f A S A T A S

W V VW L g f f

T S S

Figure 14.3

Now let us look at some examples of the universal property. Let Vect²- ³ denote the family of all vector spaces over the base field - . (We use the term family informally to represent what in set theory is formally referred to as a class. A class is a “collection” that is too large to be considered a set. For example, Vect²- ³ is a class.)

Example 14.1 (Bases: the universal property for set functions from a set ( into a vector space, as measured by linearity) Let ( be a nonempty set. Let I<~²-³Vect and let be the family of set functions with domain ( . The measuring set > is the family of linear transformations.

If =(( is the vector space with basis , that is, the set of all formal linear combinations of members of (- with coefficients in , then the pair ²=(( Á¢(¦= ³ where  is the injection map ²³~ , is universal for ²<> Á ³ . To see this, note that the condition that < can be factored through  ~ k is equivalent to the statement that ²³ ~ ²³ for each basis vector   ( . But this uniquely defines a linear transformation  . Note also that Theorem 14.1 implies that if ²>Á¢(¦>³ is also universal for ²<> Á ³ , then there is a bijective measuring function from =>(( to , that is, > and = are isomorphic.

Example 14.2 (Quotient spaces and canonical projections: the universal property for linear transformations from = whose kernel contains a particular subspace 2= of , as measured by linearity) Let = be a vector space with subspace 2 . Let I< ~Vect ²- ³ . Let be the family of linear transformations with domain =2 whose kernel contains . The measuring set > is the family of linear transformations. Then Theorem 3.4 says precisely that the pair ²= °2Á ¢ = ¦ = °2³, where is the canonical projection map, has the universal property for <> as measured by .

Example 14.3 (Direct sums: the universal property for pairs ²Á ³ of linear maps with the same range, where <= has domain and has domain , as measured by linearity) Let <= and be vector spaces and consider the ordered Tensor Products 341 pair ²< Á = ³ . Let I< ~Vect ²- ³ . A member of is an ordered pair ²¢< ¦>Á¢= ¦>³ of linear transformations, written ²Á ³¢ ²< Á = ³ ¦ > for which ²Á ³²"Á #³ ~ ²"³ b ²#³ The measure set >> is the set of all linear transformations. For  and ²Á ³  <, we set k²Á³~² kÁ k³

We claim that the pair ²<^^ = Á ² Á  ³¢ ²< Á = ³ ¦ < = ³ , where

 ²"³ ~ ²"Á ³  ²#³ ~ ²Á #³ are the canonical injections, has the universal property for ²Á<> ³ .

To see this, observe that for any function ²Á ³¢ ²< Á = ³ ¦ > , that is, for any pair of linear transformations ¢< ¦ > and ¢= ¦ > , the condition

²Á³~ k²Á³ is equivalent to

²Á ³ ~ ² k  Á k  ³ or ²"³ b ²#³ ~ ²"Á ³ b ²Á #³ ~ ²"Á #³ But the condition ²"Á #³ ~ ²"³ b ²#³ does indeed define a unique linear transformation ^¢< = ¦> . (For those familiar with category theory, we have essentially defined the coproduct of vector spaces, which is equivalent to the product, or direct sum.)

It should be clear from these examples that the notion of universal property is, well, universal. In fact, it happens that the most useful definition of tensor product is through its universal property. Bilinear Maps The universality that defines tensor products rests on the notion of a .

Definition Let <= , and > be vector spaces over - . Let

¢< d= ¦ > is bilinear if it is linear in both variables separately, that is, if ² "b "Á#³~ZZ ²"Á#³b ²"Á#³ and ²"Á #b #³ZZ ~ ²"Á#³b ²"Á#³ The set of all bilinear functions from is denoted by hom- ²<Á=Â>³. A bilinear function ¢< d= ¦- , with values in the base field -

It is important to emphasize that, in the definition of bilinear function,

In fact, if ==d= is a vector space, we have two classes of functions from to >, the linear maps B ²=d=Á>³ where =d= is the direct product of vector spaces, and the bilinear maps hom²=Á=Â>³ , where = d= is just the cartesian product of sets. We leave it as an exercise to show that these two classes have only the zero map in common. In other words, the only map that is both linear and bilinear is the zero map.

We made a thorough study of bilinear forms on a finite-dimensional vector space = in Chapter 11 (although this material is not assumed here). However, bilinearity is far more important and far reaching than its application to metric vector spaces, as the following examples show. Indeed, both multiplication and evaluation are bilinear.

Example 14.4 (Multiplication is bilinear ) If ( is an algebra, the product map ¢(d(¦( defined by ²Á ³ ~  is bilinear. Put another way, multiplication is linear in each position.

Example 14.5 (Evaluation is bilinear ) If => and are vector spaces, then the evaluation map B¢²=Á>³d=¦> defined by Tensor Products 343

²Á #³ ~ ²#³ is bilinear. In particular, the evaluation map ¢=i d= ¦- defined by ²Á #³ ~ ²#³ is a bilinear form on =i d = .

Example 14.6 If => and are vector spaces, and =>ii and then the product map ¢= d> ¦- defined by ²#Á $³ ~ ²#³²$³ is bilinear. Dually, if #= and $> then the map  ¢=ii d> ¦- defined by ²Á ³ ~ ²#³²$³ is bilinear.

It is precisely the tensor product that will allow us to generalize the previous example. In particular, if B²<Á>³ and B ²=Á>³ then we would like to consider a “product” map ¢< d= ¦> defined by ²"Á #³ ~ ²"³? ²#³ The tensor productn is just the thing to replace the question mark, because it has the desired bilinearity property, as we will see. In fact, the tensor product is bilinear and nothing else, so it is exactly what we need! Tensor Products Let <= and be vector spaces. Our guide for the definition of the tensor product

Referring to Figure 14.4, we seek to define a vector space ; and a bilinear map !¢

t bilinear U V T

W linear f bilinear

W

Figure 14.4 344 Advanced Linear Algebra

Definition Let

¹“>  ¹ be the family of all bilinear maps from . The measuring set > is the family of all linear transformations.

A pair ²;Á!¢ ³, that is, if for every bilinear map ¢ , there is a unique linear transformation ¢< n= ¦> for which ~ k! Let us now turn to the question of the existence of a universal pair for bilinearity. Existence I: Intuitive but Not Coordinate Free The universal property for bilinearity captures the essence of bilinearity and nothing more (as is the intent for all universal properties). To understand better how this can be done, let 8 ~¸“0¹ be a basis for < and let 9 ~¸“1¹ be a basis for = . Then a bilinear map ! on

The answer is that we should define !²Á³ on the pairs  in such a way that the images !² Á  ³ do not interact and then extend by bilinearity.

In particular, for each ordered pair ² Á  ³ , we invent a new formal symbol, say n and define ; to be the vector space with basis

:89~¸n “  Á   ¹

Then define the map ! by setting !² Á  ³ ~   n   and extending by bilinearity. This uniquely defines a bilinear map ! that is as “universal” as possible among bilinear maps.

Indeed, if ¢ < d = ¦ > is bilinear, the condition  ~ k ! is equivalent to

² n  ³ ~ ²  Á  ³ which uniquely defines a linear map ¢; ¦> . Hence, ²;Á!³ has the universal property for bilinearity. Tensor Products 345

A typical element of ; is a finite linear combination 

 Á²  n   ³ Á~ and if "~  and #~   then

" n # ~ !²"Á #³ ~ !45  Á   ~     ²  n   ³

Note that, as is customary, we have used the notation "n# for the image of any pair ²"Á #³ under ! . Strictly speaking, this is an abuse of the notation n as we have defined it. While it may seem innocent, it does contribute to the reputation that tensor products have for being a bit difficult to fathom.

Confusion may arise because while the elements "n# form a basis for ; (by definition), the larger set of elements of the form "n# span ; , but are definitely not linearly independent. This raises various questions, such as when a sum of the form "n# is equal to  , or when we can define a map  on ; by specifying the values ²" n #³ arbitrarily. The first question seems more obviously challenging when we phrase it by asking when a sum of the form !²" Á # ³ is  , since there is no algebraic structure on the cartesian product

The notation n;

It is also worth emphasizing that the tensor productn is not a product in the sense of a binary operation on a set, as is the case in rings and fields, for example. In fact, even when =~< , the tensor product "n" is not in < , but rather in

Let --

²"Á $³ b ²#Á $³ c ² " b #Á $³ (14.1) and ²"Á #³ b ²"Á $³ c ²"Á # b $³ (14.2) where Á  - and "Á # and $ are in the appropriate spaces. Note that these vectors are  if we replace the ordered pairs by according to our previous definition.

The quotient space -

45 ²"Á#³ b:~  ²"Á#³b:  !

However, since ²"Á#³c² "Á#³: and ²"Á#³c²"Á #³: , we can absorb the scalar in either coordinate, that is, ´²"Á#³b:µ~² "Á#³b: ~²"Á #³b: and so the elements of

´²" Á # ³ b :µ

It is customary to denote the coset ²"Á #³ b : by " n # and so any element of

"n# as before. Finally, the map !¢

Theorem 14.2 Let <= and be vector spaces. The pair ²

t

j S U V FU V U V

f V W

W Figure 14.5

Since !²"Á #³ ~ " n # ~ k ²"Á #³ , we have !~ k The universal property of vector spaces described in Example 14.1 implies that there is a unique linear transformation ¢- for which  k~ Note that  sends any of the vectors (14.1) and (14.2) that generate : to the zero vector and so :‹ker ² ³ . For example, ´ ²"Á $³ b ²#Á $³ c ² " b #Á $³µ ~ ´ ²"Á $³ b ²#Á $³ c ² " b #Á $³µ ~  ²"Á $³ b ²#Á $³ c ² " b #Á $³ ~ ²"Á$³b ²#Á$³c² "b #Á$³ ~ and similarly for the second coordinate. Hence, we may apply Theorem 3.4 (the universal property described in Example 14.2), to deduce the existence of a unique linear transformation ¢< n= ¦> for which k~  Hence, k!~ k k~ k~ As to uniqueness, if ZZZk!~ then ~ k satisfies ZZk~ k k~ Z k!~ The uniqueness of  then implies that Z ~ , which in turn implies that ZZk~ ~~k  and the uniqueness of  implies that Z ~ .

We now have two definitions of tensor product that are equivalent, since under the second definition the tensors 348 Advanced Linear Algebra

² Á  ³ b : ~   n   form a basis for -°:, there corresponds a unique linear function  ¢ . This establishes a correspondence B¢hom ²<Á=Â>³¦ ²³ given by ²³ ~ . In other words,  ²³¢ < n = ¦ > is the unique linear map for which ²³²" n #³ ~ ²"Á #³ Observe that  is linear, since if Áhom ²<Á=Â>³ then ´  ²³ b ²³µ²" n #³ ~ ²"Á #³ b ²"Á #³ ~ ²  b ³²"Á #³ and so the uniqueness part of the universal property implies that  ²³ b ²³ ~ ²  b ³ Also,  is surjective, since if ¢< n= ¦> is any linear map then ~ k!¢ is bilinear and by the uniqueness part of the universal property, we have ²³ ~ . Finally,  is injective, for if  ²³ ~  then  ~ ²³ k ! ~ . We have established the following result.

Theorem 14.3 Let <= , and > be vector spaces over - . Then the map B¢hom ²<Á=Â>³¦ ²³, where  ²³ is the unique linear map satisfying  ~ ²³ k ! , is an isomorphism. Thus, hom²<Á=Â>³šB ²³ Armed with the definition and the universal property, we can now discuss some of the basic properties of tensor products. When Is a Tensor Product Zero?

Let us consider the question of when a tensor  "n# is zero. The universal property proves to be very helpful in deciding this question.

First, note that the bilinearity of the tensor product gives n#~²b³n#~n#bn# and so n#~ . Similarly, "n~ . Tensor Products 349

Now suppose that

 "n#~  where we may assume that none of the vectors "# and are . According to the universal property of the tensor product, for any bilinear function ¢, there is a unique linear transformation  ¢ for which  k!~. Hence

 ~45 " n #  ~ ² k !³²"  Á # ³ ~  ²"  Á # ³   The key point is that this holds for any bilinear function ¢< d= ¦ > . One possibility for <= is to take two linear functionals ii and and multiply them ²"Á #³ ~ ²"³ ²#³ which is easily seen to be bilinear and gives

 ²" ³ ²# ³ ~  

If, for example, the vectors " are linearly independent, then we can consider the ii i dual vectors ""²"³~~" , for which Á . Setting  gives i  ~ " ²" ³ ²# ³ ~ ²#  ³ 

i for all linear functionals  = . This implies that # ~ . We have proved the following useful result.

Theorem 14.4 If "ÁÃÁ" are linearly independent vectors in < and #ÁÃÁ# are arbitrary vectors in = then

"n#~¬#~  for all 

In particular, "n#~ if and only if "~ or #~ .

The next result says that we can get a basis for the tensor product

Theorem 14.5 Let 89~¸"“0¹ be a basis for < and let ~¸#“1¹ be a basis for = . Then the set

: ~¸"n# “0Á1¹ is a basis for

Proof. To see that the : is linearly independent, suppose that

 ²"n#³~Á   Á

This can be written

"nÁ45 # ~  and so, by Theorem 14.4, we must have

 #~Á   for all  ~ and hence Á for all and . To see that : spans

"n#~ " n # 

~ "n 45 # 

~ 45 ²"n#³ 

~   ²"n#³   Á

Hence, any sum of elements of the form "n# is a linear combination of the vectors "n# , as desired.

Corollary 14.6 For finite-dimensional vector spaces, dim²< n = ³ ~ dim ²< ³ h dim ²= ³

Coordinate Matrices and Rank

If 89~¸"“0¹ is a basis for < and ~¸#“1¹ is a basis for = , then any vector '

'~ Á ²"n#³   0 1 where only a finite number of the coefficients Á are nonzero. In fact, for a fixed '

 '~ Á ²"n#³   ~ ~ Tensor Products 351

where none of the rows or columns of the matrix 9~² Á ³ consists only of  's. The matrix 9~² Á ³ is called a coordinate matrix of ' with respect to the bases 89 and .

Note that a coordinate matrix 9 is determined only up to the order of its rows and columns. We could remove this ambiguity by considering ordered bases, but this is not necessary for our discussion and adds a complication since the bases may be infinite.

Suppose that MN~¸$“0¹ and ~¸% “1¹ are also bases for < and = , respectively and that

 '~ Á ²$n%³   ~ ~ where :~² Á ³ is a coordinate matrix of ' with respect to these bases. We claim that the coordinate matrices 9: and have the same rank, which can then be defined as the rank of the tensor '

Each $ÁÃÁ$ is a finite linear combination of basis vectors in 8 , perhaps involving some of "ÁÃÁ" and perhaps involving other vectors in 8 . We can further reindex 8 so that each $ is a linear combination of the vectors "ÁÃÁ", where  and set

< ~span ²" ÁÃÁ" ³

Z Next, extend ¸$ ÁÃÁ$ ¹ to a basis M ~¸$ b ÁÃÁ$ Á$ ÁÃÁ$ ¹ for < . (Since we no longer need the rest of the basis M , we have commandeered the symbols $ÁÃÁ$b  , for simplicity.) Hence  $~Á  " for ~ÁÃÁ ~ where (~²Á ³ is invertible of size d .

Now repeat this process on the second coordinate. Reindex the basis 9 so that the subspace =~ span²# ÁÃÁ# ³ contains %  ÁÃÁ% and extend to a basis Z N ~¸%b ÁÃÁ% Á% ÁÃÁ% ¹ for = . Then  %~Á  # for ~ÁÃÁ ~ where )~²Á ³ is invertible of size d . 352 Advanced Linear Algebra

Next, write

 '~ Á ²"n#³   ~ ~ by setting Á ~  for  €  or  €  . Thus, the  d  matrix 9  ~ ² Á ³ comes from 9 by adding c rows of  's to the bottom and then c columns of 99's. In particular,  and have the same rank.

The expression for ' in terms of the basis vectors $ ÁÃÁ$ and % ÁÃÁ% can also be extended using  coefficients to  '~ Á ²$n%³   ~ ~ where the d matrix :Á ~² ³ has the same rank as : .

Now at last, we can compute

     Á²$n%³~    Á89   Á "n    Á #  ~ ~ ~ ~ ~ ~  ~ Á Á"n# Á   ~ ~ ~ ~  ! ~² Á Á³"n# Á   ~ ~ ~ ~  ! ~²(:³"n# ÁÁ   ~ ~ ~  ! ~²(:)³"n# Á  ~ ~ and so

  !  ²"n#³~Á    ²(:)³"n#  Á   ~ ~ ~ ~

! It follows that 9~(:) . Since ( and ) are invertible, we deduce that

rk²9³ ~ rk ²9 ³ ~ rk ²: ³ ~ rk ²:³ In block terms 9 : 9~and :~ >? >?  and Tensor Products 353

! ! (i )iÁ (~>?Á and )~ >? ii ii

! Then 9~(:) implies that, for the original coordinate matrices, ! 9~(Á :)Á

! where rk²(Á ³ ‚ rk ²9³ and rk ²)Á ³ ‚ rk ²9³.

We shall soon have use for the following special case. If

'~ " n# ~ $  n% (14.3) ~ ~ then, in the preceding argument, ~~~~ and 9~:~0 and 0 0 9~ and :~ >? >? 

! Hence, the equation 9~(Á :)Á becomes ! 0~() Á Á and we further have

$~Á  " for ~ÁÃÁ ~ where ( Á ~ ² Á ³ and %~Á  # for ~ÁÃÁ ~ where ) Á ~ ² Á ³. The Rank of a Decomposable Tensor

Recall that a tensor of the form "n# is said to be decomposable. If ¸" “0¹ is a basis for <¸#“1¹ and  is a basis for = then any decomposable vector has the form

" n # ~  ²"  n #  ³ Á

Hence, the rank of a decomposable vector is  . This implies that the set of decomposable vectors is quite “small” in

Characterizing Vectors in a Tensor Product There are several very useful representations of the tensors in

Theorem 14.7 Let ¸" “   0¹ be a basis for < and let ¸# “   1 ¹ be a basis for = . By a “unique” sum, we mean unique up to order and presence of zero terms. Then 1) Every '

 "n#Á   Á

where -Á . 2) Every '

"n& 

where &= . 3) Every '

%n# 

where %< . 4) Every nonzero '

where ¸% ¹ ‹ < and ¸& ¹ ‹ = are linearly independent sets. As to uniqueness, ' is the rank of and so it is unique. Also, we have %n&~ $n'  ~ ~

where ¸$ ¹ ‹ < and ¸' ¹ ‹ = are linearly independent sets, if and only if ! there exist d matrices (~²Á ³ and )~² Á ³ for which ()~0 and $~ÁÁ  %and '~  & ~ ~

for ~ÁÃÁ . Proof. Part 1) merely expresses the fact that ¸" n#¹ is a basis for < n= . From part 1), we write

 "n#~Á  @A "n  # Á  ~ "n&   Á    Tensor Products 355 which is part 2). Uniqueness follows from Theorem 14.4. Part 3) is proved similarly. As to part 4), we start with the expression from part 2)

 "n& ~ where we may assume that none of the & 's are . If the set ¸&¹ is linearly independent, we are done. If not, then we may suppose (after reindexing if necessary) that

c &~ & ~ Then

cc "n&~ "n&b 89 "  n  &  ~ ~ ~ c c ~"n&b "n&  ~ ~ c ~²"b "³n&  ~

But the vectors ¸" b " “      c ¹ are linearly independent. This reduction can be repeated until the second coordinates are linearly independent. Moreover, the identity matrix 0' is a coordinate matrix for and so  ~rk ²0 ³ ~ rk ²'³. As to uniqueness, one direction was proved earlier; see (14.3) and the other direction is left to the reader. Defining Linear Transformations on a Tensor Product One of the simplest and most useful ways to define a linear transformation  on the tensor product

Theorem 14.8 Let <= and be finite-dimensional vector spaces. Then

Proof. Informally, for fixed  and , the function ²"Á#³¦²"³²#³ is bilinear in " and # and so there is a unique linear map Á taking " n # to ²"³²#³ . 356 Advanced Linear Algebra

The function ²Á ³ ¦Á is bilinear in  and  since, as functions , bÁ~ Á b Á and so there is a unique linear map  taking n to Á.

A bit more formally, for fixed  and , the map -¢

-Á ²"Á #³ ~ ²"³²#³ is bilinear and so the universal property of tensor products implies that there exists a unique linear functional Á on

Á²" n #³ ~ - Á ²"Á #³ ~ ²"³²#³ Next, the map .¢

.²Á ³ ~ Á is bilinear since, for example,

.² b Á³²"n#³~ b Á ²"n#³ ~ ²  b ³²"³ h ²#³ ~ ²"³²#³ b ²"³²#³ ~ ² Á b Á ³²"Á #³ ~ ² .²Á ³ b .²Á ³³²" n #³ and so .²  b Á ³ ~ .²Á ³ b .²Á ³ which shows that . is linear in its first coordinate. Hence, the universal property implies that there exists a unique linear map ¢

² n ³ ~ .²Á ³ ~ Á that is,

² n ³²" n #³ ~Á ²" n #³ ~ ²"³²#³

Finally, we must show that  is bijective. Let ¸ ¹ be a basis for < , with dual basis ¸¹ and let ¸¹ be a basis for = , with dual basis ¸¹  . Then

²"# n  ³² n  ³ ~  "#Á"Á# ² ³  ² ³ ~   ~ ²Á³Á²"Á#³

i and so ¸² n  ³¹‹²

Combining the isomorphisms of Theorem 14.3 and Theorem 14.8, we have, for finite-dimensional vector spaces <= and ,

The Tensor Product of Linear Transformations We wish to generalize Theorem 14.8 to arbitrary linear transformations. Let B²<Á<³ZZ and B ²=Á=³ . While the product  ²"³²#³  does not make sense, the tensor product ²"³ n ²#³ does and is bilinear in " and # ²"Á #³ ~ ²"³ n ²#³ The same informal argument that we used in the proof of Theorem 14.8 will work here. Namely, the expression ²"³ n ²#³ 

²"³~ for all "< , that is, ~ . Hence, n ~ . In either case, we see that is injective.

Thus, is an embedding (injective linear transformation) and if all vector spaces are finite-dimensional, then dim²²<Á<³n²=Á=³³~BBZZ dim ²²

The embedding of BB²< Á

Theorem 14.9 The linear transformation B¢ ²<Á

There are several special cases of this result that are of importance.

— Corollary 14.10 Let us use the symbol ?@Æ to denote the fact that there is an embedding of ?@ into that is an isomorphism if ?@ and are finite- dimensional. 1) Taking <~-Z gives —

2) Taking <~-ZZ and =~- gives — ~=Z) — BÆB²< Á ²< Á < n > ³ where ² n $³²"³ ~ ²"³ n $ 4) Taking <~-ZZ and =~- gives (letting >~= ) — i ÆB ²<Á>³ where ² n $³²"³ ~ ²"³$

Change of Base Field The tensor product gives us a convenient way to extend the base field of a vector space. (We have already discussed the complexification of a real vector space.) For convenience, let us refer to a vector space over a field -- as an - space and write =- . Actually, there are several approaches to “upgrading” the base field of a vector space. For instance, suppose that 2 is an extension field of --‹2¸¹=, that is, . If - is a basis for then every %= - has the form

%~  where -2 . We can define an 2 -space = simply by taking all formal linear combinations of the form

%~  where 22 . Note that the dimension of = as a 2 -space is the same as the dimension of =--2 as an -space. Also, =- is an -space (just restrict the scalars to -¢=¦=%= ) and as such, the inclusion map -2 sending - to ²%³ ~ %  =2 , is an - -monomorphism.

The approach described in the previous paragraph uses an arbitrarily chosen basis for =- and is therefore not coordinate free. However, we can give a coordinate–free approach using tensor products as follows. Since 2 is a vector 360 Advanced Linear Algebra space over - , we can consider the tensor product

>~2n=---

It is customary to include the subscript -n on - to denote the fact that the tensor product is taken with respect to the base field - . (All relevant maps are --³=2-bilinear and -linear. However, since - is not a -space, the only tensor product that makes sense in 2n=- is the - -tensor product and so we will drop the subscript - .

The vector space >-- is an -space by definition of tensor product, but we may make it into a 22 -space as follows. For  , the temptation is to “absorb” the scalar  into the first coordinate ²n#³~²³n#  But we must be certain that this is well-defined, that is, that n#~  n$¬²  ³n#~²  ³n$ This becomes easy if we turn to the universal property for bilinearity. In particular, consider the map ¢ ²2d=-- ³¦²2n= ³ defined by

²Á#³~²  ³n# This map is obviously well-defined and since it is also bilinear, the universal property of tensor products implies that there is a unique (and well-defined!) - - linear map ¢²2n=-- ³¦²2n= ³ for which

²n#³~²³n# 

Note also that since  is - -linear, it is additive and so

´² n#³b²  n$³µ~  ² n#³b  ² n$³ that is, ´² n#³b²  n$³µ~  ² n#³b  ² n$³ which is one of the properties required of a scalar multiplication. Since the other defining properties of scalar multiplication are satisfied, the set 2n=- is indeed a 2> -space under this operation (and addition), which we denote by 2 .

To be absolutely clear, we have three distinct vector spaces: the -= -spaces - and >-- ~2n= and the 2 -space > 2- ~2n= , where the tensor product in both cases is with respect to ->> . The spaces -2 and are identical as sets and as abelian groups. It is only the “permission to multiply by” that is different. Even though in >-- we can multiply only by scalars from , we still get the same set of vectors. Accordingly, we can recover >>-2 from simply by restricting scalar multiplication to scalars from - . Tensor Products 361

It follows that we can speak of “-=> -linear” maps  from -2 into , with the expected meaning, that is, ² " b #³ ~ ²"³ b ²#³ for all scalars Á  - (not in 2 ).

If the dimension of 2- as a vector space over is then

dim²>-- ³ ~ dim ²2 n = ³ ~ dim ²2³ h dim ²= -- ³ ~  h dim ²= ³

As to the dimension of >¸¹=2- , it is not hard to see that if is a basis for then ¸ n 2 ¹ is a basis for > . Hence

dim²>2- ³ ~ dim ²= ³ even when =- is infinite-dimensional.

The map ¢ =-- ¦ > defined by ²#³ ~  n # is easily seen to be injective and -> -linear and so -- contains an isomorphic copy of = . We can also think of  as mapping =>-2 into , in which case is called the 2-extension map of =- . This map has a universal property of its own, as described in the next theorem.

Theorem 14.11 The 2¢=¦> -extension map  -2 has the universal property for the family of all -=2 -linear maps with domain - and range a -space, as measured by 2-¢=¦@ -linear maps. In particular, for any -linear map - , where @2 is a -space, there exists a unique 2 -linear map  ¢>¦@2 for which the diagram in Figure 14.6 commutes, that is, k~

Proof. If such a map ¢2n=- ¦> is to exist then it must satisfy ²  n #³ ~  ² n #³ ~  ²#³ ~  ²#³ (14.4) This shows that if  exists, it is uniquely determined by  . As usual, when searching for a linear map  on a tensor product such as >~2n=2- , we look for a bilinear map. Let ¢ ²2 d =- ³ ¦ @ be defined by ² Á #³ ~ ²#³ Since this is bilinear, there exists a unique - -linear map  for which (14.4) holds. It is easy to see that  is also 22 -linear, since if then ´  ²  n#³µ~  ²  n#³~  ²#³~  ²  n#³ 362 Advanced Linear Algebra

VF WK

W f

Y

Figure 14.6 Theorem 14.11 is the key to describing how to extend an -2 -linear map to a - linear map. Figure 14.7 shows an -¢=¦>-= -linear map  between -spaces and >2 . It also shows the -extensions for both spaces, where 2n= and 2n> are 2 -spaces.

W V W

PV PW

W K V K W Figure 14.7 If there is a unique 2 -linear map  that makes the diagram in Figure 14.7 commute, then this would be the obvious choice for the extension of the - - linear map "2 to a -linear map.

Consider the -~²k³¢=¦2n>2 -linear map > into the -space 2n>. Theorem 14.11 implies that there is a unique 2 -linear map ¢2n= ¦2n> for which

k~=  that is,

k~=>  k  Now,  satisfies ² n #³ ~  ² n #³ ~ ² k = ³²#³ ~ ²> k  ³²#³ ~ ² n ²#³³ ~ n ²#³ ~²2 n ³² n#³ and so ~n2  . Tensor Products 363

Theorem 14.12 Let =>- and be -spaces, with 2 -extension maps = and > , respectively. (See Figure 14.7.³-¢=¦> Then for any -linear map , the map ~n2  is the unique 2 -linear map that makes the diagram in Figure 14.7 commute, that is, for which k~k 

Multilinear Maps and Iterated Tensor Products The tensor product operation can easily be extended to more than two vector spaces. We begin with the extension of the concept of bilinearity.

Definition If =ÁÃÁ= and > are vector spaces over - , a function ¢= dÄd= ¦ > is said to be multilinear if it is linear in each variable separately, that is, if Z ²"ÁÃÁ"Á #bcb #Á"ÁÃÁ"³~ Z ²"cbcb ÁÃÁ" Á#Á" ÁÃÁ" ³b ²" ÁÃÁ" Á#Á" ÁÃÁ" ³ for all ~ÁÃÁ . A multilinear function of  variables is also referred to as an -linear function . The set of all -linear functions as defined above will be denoted by hom²= ÁÃÁ= Â>³ . A multilinear function from =  dÄd=  to the base field - is called a multilinear form or -form .

Example 14.7 1) If (¢(dÄd(¦( is an algebra then the product map  defined by ² ÁÃÁ ³~ Ä is  -linear. 2) The determinant function det¢¦-C is an - on the columns of the matrices in C .

We can extend the quotient space definition of the tensor product to  -linear functions as follows.

Let 8Á~¸ “1¹ be a basis for =  for ~ÁÃÁ . For each ordered  - tuple ²Á ÁÃÁ Á ³ , we invent a new formal symbol  Á nÄn Á and define ; to be the vector space with basis

:8~¸Á nÄn Á “ Á   ¹

Then define the map ! by setting !²Á ÁÃÁ Á ³~ Á nÄn Á  and extending by multilinearity. This uniquely defines a multilinear map ! that is as “universal” as possible among multilinear maps.

Indeed, if ¢ = d Ä d = ¦ > is multilinear, the condition  ~ k ! is equivalent to 364 Advanced Linear Algebra

²Á nÄn Á ³~² Á ÁÃÁ Á ³ which uniquely defines a linear map ¢; ¦> . Hence, ²;Á!³ has the universal property for bilinearity.

Alternatively, we may take the coordinate-free quotient space approach as follows.

Definition Let =ÁÃÁ= be vector spaces over - and let ; be the subspace of the free vector space < on =dÄd= , generated by all vectors of the form Z ²#cbcb ÁÃÁ# Á"Á# ÁÃÁ# ³b ²# ÁÃÁ# Á"Á# ÁÃÁ# ³ Z c²#ÁÃÁ#cb Á "b "Á# ÁÃÁ# ³

Z for all Á  - , "Á "  < and # Á à Á #  = . The quotient space < °; is called the tensor product of = ÁÃÁ= and denoted by =  nÄn=  .

As before, we denote the coset ²# ÁÃÁ# ³b; by #  nÄn#  and so any element of =nÄn= is a sum of decomposable tensors, that is,

#nÄn# where the vector space operations are linear in each variable.

Let us formally state the universal property for multilinear functions.

Theorem 14.13 ( The universal property for multilinear functions as measured by linearity) Let =ÁÃÁ= be vector spaces over the field - . The pair ²= nÄn= Á!³, where

!¢=dÄd=¦=nÄn= is the multilinear map defined by

!²#ÁÃÁ# ³~# nÄn#  has the following property. If ¢= dÄd= ¦ > is any multilinear function from =dÄd= to a vector space > over - then there is a unique linear transformation ¢= nÄn= ¦> that makes the diagram in Figure 14.8 commute, that is, for which  k!~

Moreover, =nÄn= is unique in the sense that if a pair ²?Á ³ also has this property then ? is isomorphic to = nÄn= . Tensor Products 365

t V1u˜˜˜uVn V1 ˜˜˜ Vn

W f

W

Figure 14.8 Here are some of the basic properties of multiple tensor products. Proof is left to the reader.

Theorem 14.14 The tensor product has the following properties. Note that all vector spaces are over the same field - . 1) ()Associativity There exists an isomorphism

¢²= nÄn= ³n²> nÄn> ³¦= nÄn= n> nÄn> for which

´²# nÄn# ³n²$ nÄn$ ³µ~# nÄn# n$ nÄn$ In particular, ² š< n²=n>³š< n= n>

2) ()Commutativity Let  be any permutation of the indices ¸Á à Á ¹ . Then there is an isomorphism

¢= nÄn= ¦=²³ nÄn= ²³ for which

²# nÄn# ³~#²³ nÄn# ²³

3) There is an isomorphism ¢- n= ¦= for which

² n #³ ~ #

and similarly, there is an isomorphism ¢= n- ¦= for which

²# n ³ ~ # Hence, -n=== š š n-.

The analog of Theorem 14.3 is the following.

Theorem 14.15 Let =ÁÃÁ= and > be vector spaces over - . Then the map B¢hom ²=ÁÃÁ=Â>³¦ ²=nÄn=Á>³  , defined by the fact that  ²³ is the unique linear map for which  ~ ²³ k ! , is an isomorphism. Thus, 366 Advanced Linear Algebra

hom²= ÁÃÁ= Â>³šB ²=  nÄn=  Á>³ Moreover, if all vector spaces are finite-dimensional then

 dim´ hom ²= Á Ã Á = Â > ³µ ~ dim ²> ³ h dim ²=  ³ ~ Theorem 14.9 and its corollary can also be extended.

Theorem 14.16 The linear transformation ZZ ZZ B¢ ²<Á<³nÄn B ²<Á<³¦ B ²

² n Ä n  ³²" n Ä n " ³ ~  ²" ³ n Ä n  ²" ³ is an embedding, and is an isomorphism if all vector spaces are finite- dimensional. Thus, the tensor product nÄn of linear transformations is (via this embedding) a linear transformation on tensor products. Two important special cases of this are

ii— i < nÄn<Æ ²< nÄn< ³ where

² nÄn ³²" nÄn" ³~ ²"³Ä²"³ and

ii—

² nÄn n#³²" nÄn" ³~ ²"³Ä²"³#

Tensor Spaces Let = be a finite-dimensional vector space. For nonnegative integers and , the tensor product

iinin ; ²=³~’•••“•••” = nÄn= n ’••••“••••” = nÄn= ~= n²= ³  factors factors is called the space of tensors of type ²Á ³ , where  is the contravariant type  and  is the covariant type . If ~~ then ; ²=³~- , the base field. Here we use the notation =n for the -fold tensor product of = with itself. We will also write =d for the -fold cartesian product of = with itself. Tensor Products 367

Since all vector spaces are finite-dimensional, == and ii are isomorphic and so  n in in ni id d ; ²= ³ ~ = n ²= ³ š ²²= ³ n = ³ šhom- ²²= ³ d = Á - ³ This is the space of all multilinear functionals on ’••••“••••”=ii dÄd= d ’•••“•••” = dÄd=  factors factors

In fact, tensors of type ²Á ³ are often defined as multilinear functionals in this way.

Note that

b dim²; ²= ³³ ~ ´ dim ²= ³µ Also, the associativity and commutativity of tensor products allows us to write

 b ; ²=³n; ²=³~;b ²=³ at least up to isomorphism.

Tensors of type ²Á ³ are called contravariant tensors   ; ²= ³ ~ ; ²= ³ ~’•••“•••” = n Ä n =  factors and tensors of type ²Á ³ are called covariant tensors

i i ; ²= ³ ~ ; ²= ³ ~’••••“••••” = n Ä n =  factors

Tensors with both contravariant and covariant indices are called mixed tensors .

In general, a tensor can be interpreted in a variety of ways as a multilinear map on a cartesian product, or a linear map on a tensor product. (The interpretation we mentioned above that is sometimes used as the definition is only one possibility.) We simply need to decide how many of the contravariant factors and how many of the covariant factors should be “active participants” and how many should be “passive participants.”

More specifically, consider a tensor of type ²Á ³ , written  # nÄn# nÄn# n nÄn nÄn ; ²=³ where  and  . Here we are choosing the first  vectors and the first  linear functionals as active participants. This determines the number of arguments of the map. In fact, we define a map from the cartesian product 368 Advanced Linear Algebra

’••••“••••”=ii dÄd= d ’•••“•••” = dÄd=  factors factors to the tensor product ’•••“•••”= nÄn= n ’••••“••••” =ii nÄn= c factors c factors of the remaining factors by

²# nÄn#  n nÄn  ³² ÁÃÁ Á% ÁÃÁ% ³ ~  ²# ³Ä  ²# ³ ²% ³Ä  ²% ³#b nÄn#  n b nÄn 

In words, the first group #nÄn# of (active) vectors interacts with the first set ÁÃÁ of arguments to produce the scalar ²#³Ä²#³ . The first group nÄn of (active) functionals interacts with the second group %ÁÃÁ% of arguments to produce the scalar  ²% ³Ä  ²% ³ . The remaining (passive) vectors #b nÄn#  and functionals  b nÄn  are just “copied” to the image vector.

It is easy to see that this map is multilinear and so there is a unique linear map from the tensor product ’••••“••••”=ii nÄn= n ’•••“•••” = nÄn=  factors factors to the tensor product ’•••“•••”= nÄn= n ’••••“••••” =ii nÄn= c factors c factors defined by

²# nÄn# n nÄn ³² nÄn n% nÄn% ³ ~  ²# ³Ä  ²# ³ ²% ³Ä  ²% ³#b nÄn#  n b nÄn 

(What justifies the notation # nÄn# n nÄn for this map?)

Let us look at some special cases. For a contravariant tensor of type ²Á ³  #nÄn#;²=³ we get a linear map ’••••“••••”=ii nÄn= ¦ ’•••“•••” = nÄn= c factors factors

(where ³ defined by

²# nÄn# ³² nÄn ³~  ²# ³Ä  ²# ³ # b nÄn# Tensor Products 369

For a covariant tensor of type ²Á ³

 nÄn;²=³ we get a linear map from ’•••“•••”= nÄn= ¦ ’••••“••••” =ii nÄn= c factors factors

(where  ) defined by

² nÄn ³²% nÄn% ³ ~  ²% ³Ä  ²% ³  b nÄn The special case ~ gives a linear functional on =n , that is, each element of ²=in ³ is a distinct member of ²= ni ³ , whence the embedding

ii— i = nÄn=Æ ²= nÄn= ³ that we described earlier.

Let us consider some small values of  and . For a mixed tensor #n of type ²Á ³ here are the possibilities. When  ~  and  ~  we get the linear map ²# n ³¢ = ¦ = defined by ²# n ³²$³ ~ ²$³# When  ~  and  ~  we get the linear map ²# n ³¢ =ii ¦ = defined by ²# n ³²³ ~ ²#³ Finally, when ~~ we get a multilinear form ²#n³¢=i d= ¦- defined by ²# n ³²Á $³ ~ ²#³²$³ Consider also a tensor  n  of type ²Á ³ . When  ~  ~  we get a multilinear functional  n ¢ ²= d = ³ ¦ - defined by ² n ³²#Á $³ ~ ²#³²$³ This is just a bilinear form on =~ . When we get a multilinear map ² n ³¢ = ¦ = i defined by ² n ³²#³ ~ ²#³

Contraction Covariant and contravariant factors can be “combined” in the following way. Consider the map

d i d c ¢ = d ²= ³ ¦ ;c ²= ³ 370 Advanced Linear Algebra defined by

²#ÁÃÁ# Á ÁÃÁ ³~ ²#³²# nÄn#  n nÄn  ³ This is easily seen to be multilinear and so there is a unique linear map

 c ¢; ²=³¦;c ²=³ defined by

²# nÄn# n nÄn ³~ ²#³²# nÄn# n nÄn ³ This is called the contraction in the contravariant index  and covariant index . Of course, contraction in other indices (one contravariant and one covariant) can be defined similarly.

 Example 14.8 Consider the tensor space ; ²= ³ , which is isomorphic to B ²= ³ via the fact that ²# n ³²$³ ~ ²$³# For ~~ , the contraction takes the form ²# n ³ ~ ²#³ Now, for #£ , the operator #n has kernel equal to ker ²³ , which has codimension "= and so there is a nonzero vector for which = ~ º"» lker ²³ .

Now, if ²#³ £  then = ~ º#» lker ²³ and ²# n ³²#³ ~ ²#³# and so # is an eigenvector for the nonzero eigenvalue ²#³ . Hence, =~;;²#³ l and so the trace of #n is precisely ²#³ . Since the trace is linear, we deduce that the trace of any linear operator on = is the contraction of  the corresponding vector in ;²=³ . The Tensor Algebra of = Consider the contravariant tensor spaces

n ; ²= ³ ~ ; ²= ³ ~ = For  ~  we take ; ²= ³ ~ - . The external direct sum B ; ²= ³ ~ ; ²= ³ ~ of these tensor spaces is a vector space with the property that Tensor Products 371

; ²=³n; ²=³~; b ²=³ This is an example of a graded algebra , where ;²=³ are the elements of grade ;²=³=. The graded algebra is called the tensor algebra over . (We will formally define graded structures a bit later in the chapter.)

Since iii ; ²=³~’••••“••••” = nÄn= ~; ²= ³  factors there is no need to look separately at ;²=³ . Special Multilinear Maps The following definitions describe some special types of multilinear maps.

Definition 1) A multilinear map ¢= ¦ > is symmetric if interchanging any two coordinate positions changes nothing, that is, if

²# ÁÃÁ#ÁÃÁ# ÁÃÁ# ³~²#  ÁÃÁ# ÁÃÁ#ÁÃÁ# ³ for any £ . 2) A multilinear map ¢= ¦ > is antisymmetric or skew-symmetric if interchanging any two coordinate positions introduces a factor of c , that is, if

²# ÁÃÁ#ÁÃÁ# ÁÃÁ# ³~c²#  ÁÃÁ# ÁÃÁ#ÁÃÁ# ³ for £. 3) A multilinear map ¢= ¦ > is alternate or alternating if

²# ÁÃÁ# ³ ~ 

whenever any two of the vectors # are equal.

As in the case of bilinear forms, we have some relationships between these concepts. In particular, if char²- ³ ~  then alternate¬¯ symmetric skew-symmetric and if char²- ³ £  then alternate¯ skew-symmetric A few remarks about permutations, with which the reader may very well be familiar, are in order. A permutation of the set 5 ~ ¸Á à Á ¹ is a bijective function ¢5 ¦5 . We denote the group (under composition) of all such permutations by : . This is the symmetric group on symbols. A cycle of 372 Advanced Linear Algebra

length  is a permutation of the form ²ÁÁÃÁ³  , which sends   to Á   to ÁÃÁ3 c to   and   to   . All other elements of 5 are left fixed. Every permutation is the product (composition) of disjoint cycles.

A transposition is a cycle ²Á ³ of length  . Every cycle (and therefore every permutation) is the product of transpositions. In general, a permutation can be expressed as a product of transpositions in many ways. However, no matter how one represents a given permutation as such a product, the number of transpositions is either always even or always odd. Therefore, we can define the parity of a permutation  : to be the parity of the number of transpositions in any decomposition of  as a product of transpositions. The sign of a permutation is defined by

sg²³~²c³ parity²³

If sg²³~ then is an even permutation and if sg ²³~c  then  is an odd permutation. The sign of  is often written ²c³ .

With these facts in mind, it is apparent that  is symmetric if and only if

²# ÁÃÁ# ³~²#²²³1) ÁÃÁ# ³ for all permutations  : and that  is alternating if and only if  ²# ÁÃÁ# ³~²c³ ²#²²³1) ÁÃÁ# ³ for all permutations  : . Graded Algebras We need to pause for a few definitions that are useful when discussing tensor algebra. An algebra (- over is said to be a graded algebra if as a vector space over -( , can be written in the form B (~ ( ~ for subspaces (( of , and where multiplication behaves nicely, that is,

(( ‹( b

The elements of (( are said to be homogeneous of degree . If is written

~ bÄb

for (£ , , then    is called the homogeneous component of  of degree .

The ring of polynomials -´%µ provides a prime example of a graded algebra, since Tensor Products 373

B -´%µ~ - ´%µ ~

 where - ´%µ is the subspace of - ´%µ consisting of all scalar multiples of % .

More generally, the ring -´% ÁÃÁ% µ of polynomials in several variables is a graded algebra, since it is the direct sum of the subspaces of homogeneous polynomials of degree  . (A polynomial is homogeneous of degree if each  term has degree  . For example, ~%% b%%%  is homogeneous of degree .) Graded Ideals

A graded ideal 0(~(0 in a graded algebra   is an ideal for which, as a subspace of ( , B 0 ~ ²0 q ( ³ ~

(Note that 0q( is not, in general, an ideal.) For example, the ideal 0 of -´%µ consisting of all polynomials with zero constant term is graded. However, the ideal 1 ~ º b %» ~ ¸²%³² b %³ “ ²%³  - ´%µ¹ generated by b% is not graded, since -´%µ contains only monomials and so 1 q - ´%µ ~ ¸¹.

Theorem 14.17 Let (0( be a graded algebra. An ideal of is graded if and only if it is generated by homogeneous elements of ( . Proof. If 0 is graded then it is generated by the elements of the direct summands 0q(, which are homogeneous. Conversely, suppose that 0~º “2» where each "0 is homogeneous. Any has the form

"~ "#  where "Á# ( . Since ( is graded, each "  and #  is a sum of homogeneous terms and we can expand "# into a sum of homogeneous terms of the form  where   and   (and ³  are homogeneous. Hence, if

deg²   ³ ~ deg ²  ³ h deg ²  ³ h deg ²  ³ ~  then  (  q0 and so 374 Advanced Linear Algebra

B 0 ~ ²0 q ( ³ ~ is graded.

If 0( is a graded ideal in , then the quotient ring (°0 is also graded, since it is easy to show that ((b0B ~   00~

Moreover, for %Á &  0 and   ( , (b0 ´² b%³b0µ´² b&³b0µ~² b0³² b0³~ b0    0

The Symmetric Tensor Algebra We wish to study tensors in ;²=³ for ‚ that enjoy a symmetry property. Let : be the symmetric group on ¸Á à Á ¹ . For each   : , the multilinear d  map ¢= ¦;²=³ defined by

 ²#ÁÃÁ# ³~#²³ nÄn# ²³

 determines (by universality) a unique linear operator  on ;²=³ for which

²# nÄn# ³~#²³ nÄn# ²³

Let 8 ~¸ÁÃÁ ¹ be a basis for = . Since the set

88~¸ nÄn “  ¹

 is a basis for ;²=³ and 8 is a bijection of , it follows that is an   isomorphism of ;²=³ . A tensor !; ²=³ is symmetric if  ²!³~! for all permutations  :.

A word of caution is in order with respect to the definition of  . The permutation  permutes the coordinate positions in a decomposable tensor, not the indices. Suppose, for example, that  ~  and  ~ ²³ . If ¸ Á  ¹ is a basis for = then

²³² n  ³ ~   n  because  permutes the positions, not the indices. Thus, the following is not true no ²³² n  ³ ~  ²³ n  ²³ ~   n  The set of all symmetric tensors Tensor Products 375

 :; ²=³~¸!;²=³“ ²!³~! for all : ¹ is a subspace of ;²=³ .

 To study :; ²=³ in more detail, let  ÁÃÁ be a basis for ; ²=³ . Any tensor #; ²=³ has the form 

#~ ÁÃÁ   nÄn   ÁÃÁ~

where ÁÃÁ£ . It will help if we group the terms in such a sum according to the multiset of indices. Specifically, for each nonempty subset : of indices ¸Á à Á ¹ and each multiset 4 ~ ¸ Á à Á  ¹ of size  with underlying set : , let .4 consist of all possible decomposable tensors

nÄn where ² ÁÃÁ ³ is a permutation of ¸  ÁÃÁ ¹ . For example, if 4 ~ ¸Á Á ¹ then

.4 ~¸  n n Á n n Á n n ¹ Now, ignoring coefficients, the terms in the expression for # can be organized into the groups .#4 . Let us denote the set of terms of (without coefficients) that lie in .44 by . ²#³ . For example, if

#~nnbnnbnn  then

.¸ÁÁ¹ ²#³~¸nnÁnn¹ and

.¸ÁÁ¹ ²#³~¸nn¹

Further, let :4 ²#³ denote the sum of the terms in # that belong to .4 ²#³ (including the coefficients). For example,

:¸ÁÁ¹ ²#³~nnbnn  Thus,

#~ :4 ²#³ multisets 4

Note that each permutation  is a permutation of the elements of .4 , for each multiset 4# . It follows that is a symmetric tensor if and only if the following conditions are satisfied: 376 Advanced Linear Algebra

1) (All or none ) For each multiset 4 of size with underlying set : ‹ ¸Á à Á ¹, we have

.²#³~J444or .²#³~.

2) If .44 ²#³ ~ . , then the coefficients in the sum : 4 ²#³ are the same and so

:44 ²#³ ~ ²#³ h !

!.4

where 4 ²#³ is the common coefficient.

Hence, for a symmetric tensor # , we have

#~894 ²#³ ! multisets 4 !.4 Now, symmetric tensors act as though the tensor product was commutative. Of course, it is not, but we can deal with this as follows.

 Let ¢; ²=³¦- ´ ÁÃÁ  µ be the function from ; ²=³ to the vector space -´ÁÃÁµ  of all homogeneous polynomials of degree  in the formal variables ÁÃÁ , defined by

45 ÁÃÁnÄn    ~    ÁÃÁ Ä  

In this context, the product in -´ÁÃÁµ  is often denoted by the symbol v , so we have

45 ÁÃÁ  nÄn   ~    ÁÃÁ ²  vÄv   ³

It is clear that  is well-defined, linear and surjective. We want to use to explore the properties of symmetric tensors, first as a subspace of ;²=³ and then as a quotient space. (The subspace approach is a bit simpler, but works only when char²- ³ ~  .) The Case char²- ³ ~ 

Note that  takes every member of a group .4 to the same monomial, whose indices are precisely 4#:;²=³ . Hence, if  is symmetric then

²#³ ~894 ²#³ ²!³ multisets 4 !.4

~ ²44 ²#³(( . ³² v Ä v   ³ Ä Tensor Products 377

(Here we identify each multiset 4 with the nondecreasing sequence Ä of its members.)

 As to the kernel of  , if ²#³ ~  for #  :; ²= ³ then 44 ²#³(( . ~  for all multisets 4 and so, if char ²-³~ , we may conclude that 4 ²#³~ for all multisets 4#~²-³~ , that is, . Hence, if char , the restricted map  O:; ²= ³ is injective and so it is an isomorphism. We have proved the following.

Theorem 14.18 Let =- be a finite-dimensional vector space over a field with char²- ³ ~ . Then the vector space :; ²= ³ of symmetric tensors of degree  is isomorphic to the vector space of homogeneous polynomials -´ÁÃÁµ  , via the isomorphism

45 ÁÃÁ  nÄn   ~    ÁÃÁ ²  vÄv   ³

The vector space :; ²= ³ of symmetric tensors of degree  is often called the symmetric tensor space of degree = for . However, this term is also used for an isomorphic vector space that we will study next.

The direct sum

B :; ²= ³ ~ :; ²= ³ ~ is sometimes called the symmetric tensor algebra of = , although this term is also used for a slightly different (but isomorphic) algebra that we will define momentarily.

We can use the vector space isomorphisms described in the previous theorem to move the product from the algebra of polynomials -´ ÁÃÁ µ to the symmetric tensor space :; ²= ³ . In other words, if char ²- ³ ~  then :; ²= ³ is a graded algebra isomorphic to the algebra of polynomials -´ ÁÃÁ µ . The Arbitrary Case We can define the symmetric tensor space in a different, although perhaps slightly more complex, manner that holds regardless of the characteristic of the base field. This is important, since many important fields (such as finite fields) have nonzero characteristic.

Consider again the kernel of the map  , but this time as defined on all of ;²=³ ,  not just :; ²= ³ . The map  sends elements of different groups .4 ²#³ to different monomials in -´ÁÃÁµ  , and so #ker ²³ if and only if

²:4 ²#³³ ~ 

Hence, the sum of the coefficients of the elements in .²#³4 must be  . 378 Advanced Linear Algebra

Conversely, if the sum of the coefficients of the elements in .²#³4 is for all multisets 4#²³, then ker  .

Suppose that 4 ~¸ ÁÃÁ ¹ is a multiset for which

!~4 nÄn . ²#³

Then each decomposable tensor in .44 ²#³ is a permutation of ! and so : ²#³ may be written in the form

:4 ²#³~  nÄn  b   ² nÄn ³  where the sum is over a subset of the symmetric group : , corresponding to the terms that appear in :²#³4 and where

b~  

Substituting for  in the expression for :²#³4 gives

:4 ²#³~   ´ ² nÄn  ³c²  nÄn  ³µ 

 It follows that #0;²=³ is in the subspace  of generated by tensors of the form ²!³ c !, that is  0 ~ º ²!³ c ! “ !  ; ²= ³Á  : » and so ker²³‹0  . Conversely,

²²nÄn³c²nÄn³³~  and so 0‹ ker ²³ .

Theorem 14.19 Let =- be a finite-dimensional vector space over a field . For  ‚, the surjective linear map  ¢; ²=³¦- ´ ÁÃÁ  µ defined by

45 ÁÃÁnÄn    ~    ÁÃÁ vÄv    has kernel  0 ~ º ²!³ c ! “ !  ; ²= ³Á  : » and so ;²=³ š- ´ ÁÃÁ  µ 0 The vector space ;²=³°0 is also referred to as the symmetric tensor space of degree = of . The ideal of ;²=³ defined by Tensor Products 379

 0 ~ º ²!³ c ! “ !  ; ²= ³Á  : Á  ‚ » being generated by homogeneous elements, is graded, so that

B 0~ 0 ~ where 0 ~ ¸¹ . The graded algebra ;²=³B ; ²=³b0 ~  00~ is also called the symmetric tensor algebra for = and is isomorphic to -´ ÁÃÁ µ.

Before proceeding to the universal property, we note that the dimension of the symmetric tensor space :; ²= ³ is equal to the number of monomials of degree  in the variables  ÁÃÁ and this is

 bc dim²:; ²= ³³ ~ 67 

The Universal Property for Symmetric  -Linear Maps

The vector space -´%ÁÃÁ%µ  of homogeneous polynomials, and therefore  also the isomorphic spaces of symmetric tensors :; ²= ³ and ; ²= ³°0 , have the universal property for symmetric  -linear maps.

Theorem 14.20 ( The universal property for symmetric multilinear maps, as measured by linearity) Let = be a finite-dimensional vector space. Then the d pair ²- ´% ÁÃÁ%  µÁ!¢= ¦-  ´% ÁÃÁ%  µ³, where

!²#ÁÃÁ# ³~# vÄv#  has the universal property for symmetric = -linear maps with domain d , as measured by linearity. That is, for any symmetric ¢=¦< -linear map d where <¢-´%ÁÃÁ%µ¦< is a vector space, there is a unique linear map    for which

²# vÄv# ³~²#ÁÃÁ# ³ for any vectors #= . Proof. The universal property requires that

² vÄv ³~² ÁÃÁ ³ and this does indeed uniquely define a linear transformation  , provided that it is well-defined. However, 380 Advanced Linear Algebra

 vÄv ~ vÄv

if and only if the multisets ¸ ÁÃÁ ¹ and ¸   ÁÃÁ ¹ are the same, which implies that ² ÁÃÁ ³~²   ÁÃÁ ³ , since  is symmetric. The Symmetrization Map When char²-³~ , we can define a linear map :¢;²=³¦:;²=³ , called the symmetrization map, by  :²!³ ~  ²!³ [  :

(Since char²- ³ ~  we have [ £  .)

Since ~   , we have  ²:²!³³ ~ ²!³ ~ ²!³ ~ ²!³ ~ :²!³ [  [ [ : : : and so :²!³ is, in fact, symmetric. The reason for the factor °[ is that if # is a symmetric tensor, then ²#³ ~ # and so  :²#³ ~ ²#³ ~ # ~ # [ [ : : that is, the symmetrization map fixes all symmetric tensors.

It follows that for any tensor !; ²=³ : ²!³ ~ :²:²!³³ ~ :²!³ Thus, : is idempotent and is therefore the projection map of ; ²= ³ onto im²:³ ~ :; ²= ³ The Antisymmetric Tensor Algebra: The Exterior Product Space Let us repeat our discussion for antisymmetric tensors. Before beginning officially, we want to introduce a very useful and very simple concept.

Definition Let ,~¸“0¹ be a set, which we refer to as an alphabet . A word, or string over ,€$~%Ä% of finite length is a sequence  where %, . There is one word of length  over , , denoted by  and called the empty word. Let M²,³ be the set of all words over , of length  and let M(E) be the set of all words over , (of finite length). Tensor Products 381

Concatenation of words is done by placing one word after another: If #~&Ä& then $#~%Ä%&Ä&  . Also,  $~$~$ .

If the alphabet ,$, is an ordered set, we say that a word over is in ascending order if each , appears at most once in $ and if the order of the letters in $, is that given by the order of . (For example,  is in ascending order but  and   are not.) The empty word is in ascending order by definition. Let 7²,³ be the set of all words in ascending order over , of length , and let 7 (E) be the set of all words in ascending order over .

We will assume throughout that char(F) ≠ 2. For each p ∈ S_k, the multilinear map f_p : V^k → T^k(V) defined by

    f_p(v_1,…,v_k) = (−1)^p v_{p(1)} ⊗ ⋯ ⊗ v_{p(k)}

where (−1)^p is the sign of p, determines (by universality) a unique linear operator λ_p on T^k(V) for which

    λ_p(v_1 ⊗ ⋯ ⊗ v_k) = (−1)^p v_{p(1)} ⊗ ⋯ ⊗ v_{p(k)}

Note that if v_i = v_j for some i ≠ j, then for the transposition p = (i, j), we have

    λ_p(v_1 ⊗ ⋯ ⊗ v_i ⊗ ⋯ ⊗ v_j ⊗ ⋯ ⊗ v_k) = −v_1 ⊗ ⋯ ⊗ v_j ⊗ ⋯ ⊗ v_i ⊗ ⋯ ⊗ v_k = −v_1 ⊗ ⋯ ⊗ v_i ⊗ ⋯ ⊗ v_j ⊗ ⋯ ⊗ v_k

so that λ_p(v_1 ⊗ ⋯ ⊗ v_k) = −(v_1 ⊗ ⋯ ⊗ v_k). Since char(F) ≠ 2, we conclude that a tensor with a repeated factor cannot be fixed by λ_p unless it is 0.

Let ℬ = {e_1,…,e_n} be a basis for V. Since the set

    ℬ^⊗k = {e_{i_1} ⊗ ⋯ ⊗ e_{i_k} | 1 ≤ i_j ≤ n}

is a basis for T^k(V), and λ_p maps this basis bijectively to itself up to sign, it follows that λ_p is an isomorphism of T^k(V). A tensor t ∈ T^k(V) is antisymmetric if

    σ_p(t) = (−1)^p t

for all permutations p ∈ S_k.

The set of all antisymmetric tensors

    AT^k(V) = {t ∈ T^k(V) | σ_p(t) = (−1)^p t for all p ∈ S_k}

is a subspace of T^k(V).

Let e_1,…,e_n be a basis for V. Any tensor v ∈ T^k(V) has the form

    v = Σ_{i_1,…,i_k=1}^n a_{i_1,…,i_k} e_{i_1} ⊗ ⋯ ⊗ e_{i_k}

where a_{i_1,…,i_k} ∈ F. As before, we define the groups G_M(v) and the sums S_M(v). Each permutation σ_p sends an element t of G_M to another element of G_M, and antisymmetry requires that the corresponding coefficients differ by the factor (−1)^p. It follows that v is an antisymmetric tensor if and only if the following hold:

1) If M is a multiset of size k with underlying set S ⊆ {1,…,n} and if at least one element of M has multiplicity greater than 1, that is, if M is not a set, then G_M(v) = ∅.
2) For each subset M of size k of {1,…,n}, we have G_M(v) = ∅ or G_M(v) = G_M.
3) If G_M(v) = G_M, then since M is a set, there is a unique member u = e_{i_1} ⊗ ⋯ ⊗ e_{i_k} of G_M for which i_1 < ⋯ < i_k. If p_{u,t} denotes the permutation in S_k for which σ_{p_{u,t}} takes u to t, then

    S_M(v) = a_M(v) · Σ_{t∈G_M} sg(p_{u,t}) t

where a_M(v) is the coefficient of u.

Hence, for an antisymmetric tensor v, we have

    v = Σ_M a_M(v) Σ_{t∈G_M} sg(p_{u,t}) t

Next, we need the counterpart of the polynomials F[e_1,…,e_n] in which multiplication acts anticommutatively, that is, e_i e_j = −e_j e_i. To this end, we define a map δ : ℒ(E) → 𝒜(E) ∪ {0} as follows.

For t = e_{i_1} ⋯ e_{i_k}, let δ(t) = 0 if t has any repeated letters. Otherwise, there is exactly one permutation of positions that will reorder t in ascending order. If the resulting word in ascending order is u, denote this permutation by p_{t,u} and set

    δ(t) = sg(p_{t,u}) u
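In computational terms, δ is just "sort the word and track the sign of the sort." Here is a small Python sketch of one possible implementation (ours, not the book's), representing a word as a tuple of indices and 0 as None:

    # delta: send a word to (sign, sorted word), or to None if a letter repeats.
    def delta(word):
        if len(set(word)) < len(word):
            return None                     # repeated letter: delta(t) = 0
        # count inversions to get the sign of the sorting permutation
        inversions = sum(1 for i in range(len(word))
                           for j in range(i + 1, len(word))
                           if word[i] > word[j])
        sign = -1 if inversions % 2 else 1
        return sign, tuple(sorted(word))

    assert delta((1, 1, 2)) is None
    assert delta((2, 1)) == (-1, (1, 2))    # e2 e1 = -(e1 e2)
    assert delta((3, 1, 2)) == (1, (1, 2, 3))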

We can now define our anticommutative "polynomials."

Definition Let E = (e_1,…,e_n) be a sequence of independent variables. Let F_k^−[e_1,…,e_n] be the vector space over F generated by all words of length k over E in ascending order. For k = 0, this is Fε, which we identify with F. Define a multiplication on the direct sum

    F^− = F^−[e_1,…,e_n] = ⊕_{k=0}^∞ F_k^−

as follows. For monomials s = x_1 ⋯ x_k ∈ F_k^− and t = y_1 ⋯ y_m ∈ F_m^−, set

    st = δ(x_1 ⋯ x_k y_1 ⋯ y_m)

and extend by distributivity to F^−. The resulting multiplication makes F^−[e_1,…,e_n] into a (noncommutative) algebra over F.

It is customary to use the notation ∧ for the product in F^−[e_1,…,e_n]. This product is called the wedge product or exterior product. We will not name the algebra F^−, but we will name an isomorphic algebra to be defined soon.

Now we can define a function μ_k : T^k(V) → F_k^−[e_1,…,e_n] by

    μ_k( Σ a_{i_1,…,i_k} e_{i_1} ⊗ ⋯ ⊗ e_{i_k} ) = Σ a_{i_1,…,i_k} (e_{i_1} ∧ ⋯ ∧ e_{i_k})

It is clear that μ_k is well-defined, linear and surjective.

The Case char(F) = 0

Just as with the symmetric tensors, if char(F) = 0, we can benefit by restricting μ_k to the space AT^k(V). Let

    s = e_{i_1} ⊗ ⋯ ⊗ e_{i_k} and t = e_{j_1} ⊗ ⋯ ⊗ e_{j_k}

belong to the same group G_M and suppose that u = e_{m_1} ⊗ ⋯ ⊗ e_{m_k} ∈ G_M is in ascending order, that is, m_1 < ⋯ < m_k.

If v ∈ AT^k(V) is antisymmetric, then

    μ_k(v) = Σ_M a_M(v) Σ_{t∈G_M} sg(p_{u,t}) μ_k(t)
           = Σ_M a_M(v) Σ_{t∈G_M} sg(p_{u,t}) sg(p_{t,u}) u
           = Σ_M a_M(v) Σ_{t∈G_M} u
           = Σ_M a_M(v) |G_M| u

Now, if μ_k(v) = 0 for v ∈ AT^k(V), then a_M(v)|G_M| = 0 for all sets M, and so, since |G_M| = k! and char(F) = 0, we may conclude that a_M(v) = 0 for all sets M, that is, v = 0. Hence, if char(F) = 0, the restricted map μ_k|AT^k(V) is injective and so it is an isomorphism. We have proved the following.

Theorem 14.21 Let V be a finite-dimensional vector space over a field F with char(F) = 0. Then the vector space AT^k(V) of antisymmetric tensors of degree k is isomorphic to the vector space F_k^−[e_1,…,e_n], via the isomorphism

    μ_k( Σ a_{i_1,…,i_k} e_{i_1} ⊗ ⋯ ⊗ e_{i_k} ) = Σ a_{i_1,…,i_k} e_{i_1} ∧ ⋯ ∧ e_{i_k}

The vector space AT^k(V) of antisymmetric tensors of degree k is called the antisymmetric tensor space of degree k for V, or the exterior product space of degree k over V.

The direct sum

    AT(V) = ⊕_{k=0}^∞ AT^k(V)

is called the antisymmetric tensor algebra of V, or the exterior algebra of V.

We can use the vector space isomorphisms described in the previous theorem to move the product from the algebra F^−[e_1,…,e_n] to the antisymmetric tensor space AT(V). In other words, if char(F) = 0 then AT(V) is a graded algebra isomorphic to the algebra F^−[e_1,…,e_n].

The Arbitrary Case

We can define the antisymmetric tensor space in a different manner that holds regardless of the characteristic of the base field.

Consider the kernel of the map μ_k, as defined on all of T^k(V). Suppose that v ∈ ker(μ_k). Since μ_k sends elements of different groups G_M(v) to different monomials in F_k^−[e_1,…,e_n], it follows that μ_k must send each sum S_M(v) to 0:

    μ_k(S_M(v)) = 0

Hence, for each M, the sum of the coefficients of the elements in G_M(v), taken with the signs sg(p_{u,t}), must be 0. Conversely, if this sum is 0 for all multisets M, then v ∈ ker(μ_k).

Suppose that M = {i_1,…,i_k} is a set for which u = e_{i_1} ⊗ ⋯ ⊗ e_{i_k} ∈ G_M(v). Then

    S_M(v) = a·(e_{i_1} ⊗ ⋯ ⊗ e_{i_k}) + Σ_p a_p (−1)^p σ_p(e_{i_1} ⊗ ⋯ ⊗ e_{i_k})

where the sum is over a subset of the symmetric group S_k, corresponding to the terms that appear in S_M(v), and where

    a + Σ_p (−1)^p a_p = 0

Substituting for a in the expression for S_M(v) gives

    S_M(v) = Σ_p a_p [ (−1)^p σ_p(e_{i_1} ⊗ ⋯ ⊗ e_{i_k}) − (e_{i_1} ⊗ ⋯ ⊗ e_{i_k}) ]

It follows that v is in the subspace I_k of T^k(V) generated by tensors of the form (−1)^p σ_p(t) − t, that is,

    I_k = ⟨(−1)^p σ_p(t) − t | t ∈ T^k(V), p ∈ S_k⟩

and so ker(μ_k) ⊆ I_k. Conversely,

    μ_k( (−1)^p σ_p(e_{i_1} ⊗ ⋯ ⊗ e_{i_k}) − (e_{i_1} ⊗ ⋯ ⊗ e_{i_k}) ) = 0

and so I_k ⊆ ker(μ_k).

Theorem 14.22 Let V be a finite-dimensional vector space over a field F. For k ≥ 2, the surjective linear map μ_k : T^k(V) → F_k^−[e_1,…,e_n] defined by

    μ_k( Σ a_{i_1,…,i_k} e_{i_1} ⊗ ⋯ ⊗ e_{i_k} ) = Σ a_{i_1,…,i_k} e_{i_1} ∧ ⋯ ∧ e_{i_k}

has kernel

    I_k = ⟨(−1)^p σ_p(t) − t | t ∈ T^k(V), p ∈ S_k⟩

and so

    T^k(V)/I_k ≅ F_k^−[e_1,…,e_n]

The vector space T^k(V)/I_k is also referred to as the antisymmetric tensor space of degree k of V, or the exterior product space of degree k of V. The ideal of T(V) defined by

    I = ⟨(−1)^p σ_p(t) − t | t ∈ T^k(V), p ∈ S_k, k ≥ 2⟩

being generated by homogeneous elements, is graded, so that

    I = ⊕_{k=0}^∞ I_k

where I_0 = I_1 = {0}. The graded algebra

    T(V)/I = ⊕_{k=0}^∞ (T^k(V) + I)/I

is also called the antisymmetric tensor algebra of V, or the exterior algebra of V, and is isomorphic to F^−[e_1,…,e_n].

The isomorphic exterior spaces AT^k(V) and T^k(V)/I_k are usually denoted by ⋀^k V, and the isomorphic exterior algebras AT(V) and T(V)/I are usually denoted by ⋀V.

Before proceeding to the universal property, we note that the dimension of the exterior space ⋀^k(V) is equal to the number of words of length k in ascending order over the alphabet E = {e_1,…,e_n}, and this is

    dim(⋀^k(V)) = C(n, k)
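Correspondingly, the basis of ⋀^k(V) can be enumerated directly: ascending words of length k over an n-letter ordered alphabet are exactly the k-element subsets of that alphabet. A quick Python check (ours):

    # Ascending words of length k over an n-letter ordered alphabet
    # are k-subsets, so there are C(n, k) of them.
    from math import comb
    from itertools import combinations

    n, k = 5, 3
    ascending_words = list(combinations(range(1, n + 1), k))
    assert len(ascending_words) == comb(n, k) == 10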

The Universal Property for Antisymmetric k-Linear Maps

The vector space F_k^−[x_1,…,x_n], and therefore also the isomorphic spaces of antisymmetric tensors ⋀^k(V) and T^k(V)/I_k, have the universal property for antisymmetric k-linear maps.

Theorem 14.23 (The universal property for antisymmetric multilinear maps, as measured by linearity) Let V be a finite-dimensional vector space over a field F. Then the pair (F_k^−[x_1,…,x_n], t : V^k → F_k^−[x_1,…,x_n]), where

    t(v_1,…,v_k) = v_1 ∧ ⋯ ∧ v_k

has the universal property for antisymmetric k-linear maps with domain V^k, as measured by linearity. That is, for any antisymmetric k-linear map f : V^k → U, where U is a vector space, there is a unique linear map τ : F_k^−[x_1,…,x_n] → U for which

    τ(v_1 ∧ ⋯ ∧ v_k) = f(v_1,…,v_k)

for any vectors v_i ∈ V.
Proof. Since f is antisymmetric, it is completely determined by the fact that it is alternating and by its values on ascending words e_{i_1} ∧ ⋯ ∧ e_{i_k}, where i_1 < ⋯ < i_k. Accordingly, we can define τ by

    τ(e_{i_1} ∧ ⋯ ∧ e_{i_k}) = f(e_{i_1},…,e_{i_k})

and this does indeed uniquely define a well-defined linear transformation τ.

The Determinant

The universal property for antisymmetric multilinear maps has the following corollary.

Corollary 14.24 Let V be a vector space of dimension n over a field F. Let E = (e_1,…,e_n) be an ordered basis for V. Then there is a unique antisymmetric n-linear form f : V^n → F for which

    f(e_1,…,e_n) = 1

Proof. According to the universal property for antisymmetric n-linear forms, for every such form f : V^n → F, there is a unique linear map τ_f : ⋀^n V → F for which

    τ_f(e_1 ∧ ⋯ ∧ e_n) = f(e_1,…,e_n) = 1

But the dimension of ⋀^n V is C(n, n) = 1 and {e_1 ∧ ⋯ ∧ e_n} is a basis for ⋀^n(V). Hence, there is only one linear map τ : ⋀^n V → F with τ(e_1 ∧ ⋯ ∧ e_n) = 1. It follows that if f and g are two such forms, then

    f(e_1,…,e_n) = τ(e_1 ∧ ⋯ ∧ e_n) = g(e_1,…,e_n)

and the antisymmetry of f and g implies that f and g agree on every permutation of (e_1,…,e_n). Since f and g are multilinear, we must have f = g.

We now wish to construct the unique antisymmetric form f guaranteed by the previous result. For any v ∈ V, write [v]_{E,i} for the i-th coordinate of the coordinate matrix [v]_E. Thus,

    v = Σ_i [v]_{E,i} e_i

For clarity, and since we will not change the basis, let us write [v]_i for [v]_{E,i}.

Consider the map f : V^n → F defined by

    f(v_1,…,v_n) = Σ_{p∈S_n} (−1)^p [v_1]_{p(1)} ⋯ [v_n]_{p(n)}

Then f is multilinear, since

    f(v_1 + u_1, v_2,…,v_n) = Σ_{p∈S_n} (−1)^p [v_1 + u_1]_{p(1)} ⋯ [v_n]_{p(n)}
        = Σ_{p∈S_n} (−1)^p ([v_1]_{p(1)} + [u_1]_{p(1)}) ⋯ [v_n]_{p(n)}
        = Σ_{p∈S_n} (−1)^p [v_1]_{p(1)} ⋯ [v_n]_{p(n)} + Σ_{p∈S_n} (−1)^p [u_1]_{p(1)} ⋯ [v_n]_{p(n)}
        = f(v_1,…,v_n) + f(u_1, v_2,…,v_n)

and similarly for scalar multiples and for any other coordinate position.

The map f is alternating, and therefore antisymmetric, since char(F) ≠ 2. To see this, suppose for instance that v_1 = v_2. For any permutation p ∈ S_n, let p(1) = i and p(2) = j. Then the permutation p′ = p ∘ (1 2) satisfies

1) For x ≠ 1 and x ≠ 2, p′(x) = p(x)
2) p′(1) = p(2) = j
3) p′(2) = p(1) = i.

Hence, p′ ≠ p, and it is easy to check that (p′)′ = p. It follows that if the sets {p, p′} and {q, q′} intersect, then they are identical. In other words, the distinct sets {p, p′} form a partition of S_n.

Hence,

    f(v_1, v_1, v_3,…,v_n) = Σ_{p∈S_n} (−1)^p [v_1]_{p(1)} [v_1]_{p(2)} ⋯ [v_n]_{p(n)}
        = Σ_{pairs {p,p′}} [ (−1)^p [v_1]_{p(1)} [v_1]_{p(2)} ⋯ [v_n]_{p(n)} + (−1)^{p′} [v_1]_{p′(1)} [v_1]_{p′(2)} ⋯ [v_n]_{p′(n)} ]

But

    [v_1]_{p(1)} [v_1]_{p(2)} = [v_1]_{p′(1)} [v_1]_{p′(2)}

and since (−1)^{p′} = −(−1)^p, the sum of the two terms involving the pair {p, p′} is 0. Hence, f(v_1, v_1, v_3,…,v_n) = 0. A similar argument holds for any coordinate pair.

Finally, we have

    f(e_1,…,e_n) = Σ_{p∈S_n} (−1)^p [e_1]_{p(1)} ⋯ [e_n]_{p(n)} = Σ_{p∈S_n} (−1)^p δ_{1,p(1)} ⋯ δ_{n,p(n)} = 1

since only the identity permutation contributes a nonzero term. Thus, the map f is indeed the unique antisymmetric n-linear form on V^n for which f(e_1,…,e_n) = 1.

Given the ordered basis E = (e_1,…,e_n), we can view V as the space F^n of coordinate vectors and view V^n as the space M_n(F) of n × n matrices, via the isomorphism

    (v_1,…,v_n) ↦ ( [v_1] ⋯ [v_n] )

the matrix whose j-th column is the coordinate matrix [v_j], where all coordinate matrices are with respect to E.

With this viewpoint, f becomes an antisymmetric n-form on the columns of a matrix A = (a_{i,j}), given by

    f(A) = Σ_{p∈S_n} (−1)^p a_{p(1),1} ⋯ a_{p(n),n}

This is called the determinant det(A) of the matrix A.
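The defining sum has n! terms, so it is practical only for small matrices, but it transcribes directly into code. Here is a Python sketch (ours) compared against numpy's determinant:

    # det(A) = sum over permutations p of sign(p) * a_{p(1),1} ... a_{p(n),n}
    import numpy as np
    from itertools import permutations

    def sign(p):
        # parity of a permutation (given as a tuple) via inversion count
        inv = sum(1 for i in range(len(p))
                    for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inv % 2 else 1

    def det_by_permutations(A):
        n = A.shape[0]
        total = 0.0
        for p in permutations(range(n)):
            term = sign(p)
            for col in range(n):
                term *= A[p[col], col]       # row p(col), column col
            total += term
        return total

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
    assert np.isclose(det_by_permutations(A), np.linalg.det(A))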

Properties of the Determinant

Let us explore some of the properties of the determinant function.

Theorem 14.25 If A ∈ M_n(F), then det(A) = det(Aᵗ).
Proof. We know that

    det(A) = Σ_{p∈S_n} (−1)^p a_{p(1),1} ⋯ a_{p(n),n}

But we can reorder the factors in each term so that the first (row) indices are in ascending order: the factor with row index j is a_{j,p^{−1}(j)}, and so

    det(A) = Σ_{p∈S_n} (−1)^p a_{1,p^{−1}(1)} ⋯ a_{n,p^{−1}(n)}
           = Σ_{q∈S_n} (−1)^q a_{1,q(1)} ⋯ a_{n,q(n)}
           = det(Aᵗ)

where we have set q = p^{−1} and used the fact that (−1)^{p^{−1}} = (−1)^p.

Theorem 14.26 If A, B ∈ M_n(F), then det(AB) = det(A)det(B).
Proof. Consider the map f_A : M_n(F) → F defined by

    f_A(X) = det(AX)

We can consider f_A as a function of the columns of X and write

    f_A : (X^{(1)},…,X^{(n)}) ↦ (AX^{(1)},…,AX^{(n)}) ↦ det(AX)

Now, this map is multilinear, since multiplication by A is distributive and the determinant is multilinear. For example, let y ∈ F^n and let X′ come from X by replacing the first column by y. Then

    f_A(X^{(1)} + y, X^{(2)},…,X^{(n)}) ↦ det(AX^{(1)} + Ay, AX^{(2)},…,AX^{(n)}) = det(AX) + det(AX′) = f_A(X) + f_A(X′)

The map f_A is also alternating, since det is alternating and interchanging two coordinates in (X^{(1)},…,X^{(n)}) is equivalent to interchanging the corresponding columns of AX.

Thus, f_A is an antisymmetric n-linear form and so must be a scalar multiple of the determinant function, say f_A(X) = r·det(X). Then

    det(AX) = f_A(X) = r·det(X)

Setting X = I gives det(A) = r, and so det(AX) = det(A)det(X), as desired.

If P ∈ M_n(F) is invertible, then PP^{−1} = I, and so

    det(P)det(P^{−1}) = 1

which shows that det(P) ≠ 0 and det(P^{−1}) = 1/det(P).

But any matrix A ∈ M_n(F) is equivalent to a diagonal matrix A = PDQ, where P and Q are invertible and D is diagonal with 0's and 1's on the main diagonal. Hence, det(A) = det(P)det(D)det(Q), and so if det(A) ≠ 0 then det(D) ≠ 0. But this can happen only if D = I, whence A is invertible. We have proved the following.

Theorem 14.27 A matrix A ∈ M_n(F) is invertible if and only if det(A) ≠ 0.

Exercises
1. Show that if f : W → X is a linear map and g : U × V → W is bilinear, then f ∘ g : U × V → X is bilinear.
2. Show that the only map that is both linear and n-linear (for n ≥ 2) is the zero map.
3. Find an example of a bilinear map f : V × V → W whose image im(f) = {f(u, v) | u, v ∈ V} is not a subspace of W.
4. Prove that the universal property of tensor products defines the tensor product up to isomorphism only. That is, if a pair (X, s), with s : U × V → X bilinear, also has the universal property, then X ≅ U ⊗ V.
5. … if {u_i} is a basis for U and {v_j} is a basis for V, then {t(u_i, v_j)} is a basis for W.
6. Prove that … can be extended to a linear function τ : U ⊗ V → W. Deduce that the function can be extended in a unique way to a bilinear map f̂ : U × V → W. Show that all bilinear maps are obtained in this way.
10. Let S_1, S_2 be subspaces of U. Show that

    (S_1 ⊗ V) ∩ (S_2 ⊗ V) ≅ (S_1 ∩ S_2) ⊗ V

11. Let S ⊆ U and T ⊆ V be subspaces of vector spaces U and V, respectively. Show that (S ⊗ V) ∩ (U ⊗ T) ≅ S ⊗ T.
12. Let S_1, S_2 ⊆ U and T_1, T_2 ⊆ V be subspaces of U and V, respectively. Show that

    (S_1 ⊗ T_1) ∩ (S_2 ⊗ T_2) ≅ (S_1 ∩ S_2) ⊗ (T_1 ∩ T_2)

13. Find an example of two vector spaces U and V and a nonzero vector x ∈ U ⊗ V that has two distinct representations

    x = Σ_{i=1}^n u_i ⊗ v_i

where the u_i's are linearly independent and so are the v_i's.
14. Let ι_X denote the identity operator on a vector space X. Prove that ι_V ⊗ ι_W = ι_{V⊗W}.
15. Suppose that f : U → V, g : V → W and f′ : U′ → V′, g′ : V′ → W′. Prove that

    (g ∘ f) ⊗ (g′ ∘ f′) = (g ⊗ g′) ∘ (f ⊗ f′)

16. Connect the two approaches to extending the base field of an F-space to K (at least in the finite-dimensional case) by showing that F^n ⊗_F K ≅ K^n.
17. Prove that for a block triangular matrix M = ( A * ; 0 C ) we have det(M) = det(A)det(C).
19. Let A, B ∈ M_n(F). Prove that if either A or B is invertible, then the matrices A + rB are invertible except for a finite number of r's.

The Tensor Product of Matrices

20. Let A = (a_{i,j}) be the matrix of a linear operator τ ∈ ℒ(V) with respect to the ordered basis (u_1,…,u_n). Let B = (b_{i,j}) be the matrix of a linear operator σ ∈ ℒ(W) with respect to the ordered basis (v_1,…,v_m). Consider the ordered basis (u_i ⊗ v_j), ordered lexicographically, that is, u_i ⊗ v_j ≤ u_k ⊗ v_l if i < k, or i = k and j ≤ l. Show that the matrix of τ ⊗ σ with respect to this basis is the block matrix

    A ⊗ B = ( a_{1,1}B a_{1,2}B ⋯ a_{1,n}B
              a_{2,1}B a_{2,2}B ⋯ a_{2,n}B
                 ⋮        ⋮           ⋮
              a_{n,1}B a_{n,2}B ⋯ a_{n,n}B )

This matrix is called the tensor product, or direct product, of the matrix A with the matrix B.
21. Show that the tensor product of matrices is not, in general, commutative.
22. Show that the tensor product A ⊗ B is bilinear in both A and B.
23. Show that A ⊗ B = 0 if and only if A = 0 or B = 0.
24. Show that
a) (A ⊗ B)ᵗ = Aᵗ ⊗ Bᵗ
b) (A ⊗ B)* = A* ⊗ B* (when F = ℂ)
25. Show that if u, v ∈ F^n are row vectors, then uᵗv = u ⊗ v.
26. Suppose that A, B, C and D are matrices of sizes for which the products below are defined. Prove that

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)

Discuss the case m = n = 1.
27. Prove that if A and B are nonsingular, then so is A ⊗ B, and

    (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}

28. Prove that tr(A ⊗ B) = tr(A) · tr(B).
29. Suppose that F is algebraically closed. Prove that if A has eigenvalues λ_1,…,λ_n and B has eigenvalues μ_1,…,μ_m, both lists including multiplicity, then A ⊗ B has eigenvalues {λ_iμ_j | i, j}, again counting multiplicity.
30. Prove that det(A_{n×n} ⊗ B_{m×m}) = (det A)^m (det B)^n.
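These matrix exercises are easy to experiment with numerically, since the block matrix A ⊗ B is exactly numpy's np.kron. The following sketch (ours) spot-checks the mixed-product rule of Exercise 26 and the trace and determinant identities of Exercises 28 and 30:

    # The tensor (Kronecker) product of matrices and some of its identities.
    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 2
    A, C = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    B, D = rng.standard_normal((m, m)), rng.standard_normal((m, m))

    # Exercise 26: (A (x) B)(C (x) D) = (AC) (x) (BD)
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
    # Exercise 28: tr(A (x) B) = tr(A) tr(B)
    assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
    # Exercise 30: det(A (x) B) = det(A)^m det(B)^n
    assert np.isclose(np.linalg.det(np.kron(A, B)),
                      np.linalg.det(A) ** m * np.linalg.det(B) ** n)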

Given a matrix (CsÁ ² ³ consider the homogeneous system of linear equations (% ~  It is of obvious interest to determine conditions that guarantee the existence of positive solutions to this system, in a manner made precise by the following definition.

 Definition Let #~² ÁÃÁ ³s . Then 1) ##‚ is nonnegative , written if

 ‚ for all ~ÁÃÁ (Note that the term positive is also used in the literature for this property.) The set of all nonnegative vectors in ss is the nonnegative orthant in À 2) ##€# is strictly positive , written if is nonnegative but not , that is, if

 ‚ for all ~ÁÃÁ and  € for at least one ~ÁÃÁ

 The set ssb of all strictly positive vectors in is the strictly positive orthant in sÀ 3) ##ˆ is strongly positive , written if

 € for all ~ÁÃÁ

 The set ssbb of all strongly positive vectors in is the strongly positive orthant in sÀ

We are interested in conditions under which the system (% ~  has strictly positive or strongly positive solutions. Since the strictly and strongly positive orthants in ss are not subspaces of , it is difficult to use strictly linear 396 Advanced Linear Algebra methods in studying this issue: we must also use geometric methods, in particular, methods of convexity.

Let us pause briefly to consider an important application of strictly positive solutions to the system Ax = 0. If x = (x_1,…,x_n) is a strictly positive solution, then so is the vector

    y = (1/Σx_i) x = (x_1/Σx_i,…,x_n/Σx_i)

which is a probability distribution. Note that if we replace "strictly" with "strongly," then the probability distribution has the property that each probability is positive.

Now, the product Ay is the expected value of the columns of A with respect to the probability distribution y. Hence, Ax = 0 has a strictly (strongly) positive solution if and only if there is a strictly (strongly) positive probability distribution for which the columns of A have expected value 0. If these columns represent the payoffs from a game of chance, then the game is fair when the expected value of the columns is 0. Thus, Ax = 0 has a strictly (strongly) positive solution if and only if the "game" A is fair, where in the strongly positive case, all outcomes are possible.

As another (related) example, in discrete option pricing models of mathematical finance, the absence of arbitrage opportunities in the model is equivalent to the fact that a certain vector describing the gains in a portfolio does not intersect the strictly positive orthant in ℝ^n. As we will see in this chapter, this is equivalent to the existence of a strongly positive solution to a homogeneous system of equations. This solution, when normalized to a probability distribution, is called a martingale measure.

Of course, the equation Ax = 0 has a strictly positive solution if and only if ker(A) contains a strictly positive vector, that is, if and only if ker(A) = RowSpace(A)^⊥ meets the strictly positive orthant in ℝ^n. Thus, we wish to characterize the subspaces S of ℝ^n for which S^⊥ meets the strictly positive orthant in ℝ^n, in symbols,

    S^⊥ ∩ ℝ^n_+ ≠ ∅

for these are precisely the row spaces of the matrices A for which Ax = 0 has a strictly positive solution. A similar statement holds for strongly positive solutions.

Looking at the real plane ℝ², we can divine the answer with a picture. A one-dimensional subspace S of ℝ² has the property that its orthogonal complement S^⊥ meets the strictly positive orthant (quadrant) in ℝ² if and only if S is the x-axis, the y-axis or a line with negative slope. For the case of the strongly positive orthant, S must have negative slope. Our task is to generalize this to ℝ^n.

This will lead us to the following results, which are quite intuitive in ℝ² and ℝ³:

    S^⊥ ∩ ℝ^n_{++} ≠ ∅ if and only if S ∩ ℝ^n_+ = ∅    (15.1)

and

    S^⊥ ∩ ℝ^n_+ ≠ ∅ if and only if S ∩ ℝ^n_{++} = ∅    (15.2)

Let us apply this to the matrix equation Ax = 0. If S = RowSpace(A), then S^⊥ = ker(A) and so we have

    ker(A) ∩ ℝ^n_{++} ≠ ∅ if and only if RowSpace(A) ∩ ℝ^n_+ = ∅

and

    ker(A) ∩ ℝ^n_+ ≠ ∅ if and only if RowSpace(A) ∩ ℝ^n_{++} = ∅

Now,

    RowSpace(A) ∩ ℝ^n_+ = {vA | v ∈ ℝ^m, vA > 0}

and

    RowSpace(A) ∩ ℝ^n_{++} = {vA | v ∈ ℝ^m, vA ≫ 0}

and so these statements become: Ax = 0 has a strongly positive solution if and only if

    {vA | vA > 0} = ∅

and Ax = 0 has a strictly positive solution if and only if

    {vA | vA ≫ 0} = ∅

We can rephrase these results in the form of a theorem of the alternative, that is, a theorem that says that exactly one of two conditions holds.

Theorem 15.1 Let A ∈ M_{m,n}(ℝ).
1) Exactly one of the following holds:
a) Au = 0 for some strongly positive u ∈ ℝ^n
b) vA > 0 for some v ∈ ℝ^m
2) Exactly one of the following holds:
a) Au = 0 for some strictly positive u ∈ ℝ^n
b) vA ≫ 0 for some v ∈ ℝ^m.
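Computationally, alternative 1a) can be tested as a linear-programming feasibility problem: since the solution set of Au = 0 is closed under positive scaling, a strongly positive solution exists if and only if there is a solution with every coordinate at least 1. Here is a hedged sketch using scipy (our encoding, not the book's):

    # Test for a strongly positive solution of Au = 0 (Theorem 15.1, 1a):
    # Au = 0 has u >> 0 iff it has a solution with u_i >= 1 for all i,
    # because the solution set is a cone.
    import numpy as np
    from scipy.optimize import linprog

    def has_strongly_positive_kernel_vector(A):
        m, n = A.shape
        res = linprog(c=np.zeros(n),            # pure feasibility problem
                      A_eq=A, b_eq=np.zeros(m),
                      bounds=[(1, None)] * n)
        return res.success

    A = np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])
    print(has_strongly_positive_kernel_vector(A))   # True: u = (1, 1, 1)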

Before proving statements (15.1) and (15.2), we require some background.

Convex, Closed and Compact Sets

We shall need the following concepts.

Definition
1) Let x_1,…,x_m ∈ ℝ^n. Any linear combination of the form

    t_1x_1 + ⋯ + t_mx_m

where t_1 + ⋯ + t_m = 1 and t_i ≥ 0, is called a convex combination of the vectors x_1,…,x_m.
2) A subset X ⊆ ℝ^n is convex if whenever x, y ∈ X, then the entire line segment between x and y also lies in X; in symbols,

    {sx + ty | s + t = 1, s, t ≥ 0} ⊆ X

3) A subset X ⊆ ℝ^n is closed if whenever (x_k) is a convergent sequence of elements of X, then the limit is also in X. Simply put, a subset is closed if it is closed under the taking of limits.
4) A subset X ⊆ ℝ^n is compact if it is both closed and bounded.
5) A subset X ⊆ ℝ^n is a cone if x ∈ X implies that rx ∈ X for all r ≥ 0.

We will also have need of the following facts from analysis.

1) A continuous function that is defined on a compact set X in ℝ^n takes on its maximum and minimum values at some points within the set X.
2) A subset X of ℝ^n is compact if and only if every sequence in X has a subsequence that converges in X.

Theorem 15.2 Let X and Y be subsets of ℝ^n. Define

    X + Y = {x + y | x ∈ X, y ∈ Y}

1) If X and Y are convex, then so is X + Y.
2) If X is compact and Y is closed, then X + Y is closed.
Proof. For 1), let x_1 + y_1 and x_2 + y_2 be in X + Y. For 0 ≤ t ≤ 1, any point on the line segment between these two points has the form

    t(x_1 + y_1) + (1 − t)(x_2 + y_2) = [tx_1 + (1 − t)x_2] + [ty_1 + (1 − t)y_2] ∈ X + Y

and so X + Y is convex.

For part 2), let (x_k + y_k) be a convergent sequence in X + Y, say x_k + y_k → z. We must show that z ∈ X + Y. Since (x_k) is a sequence in the compact set X, it has a convergent subsequence (x_{k_j}) whose limit x lies in X. Since x_{k_j} + y_{k_j} → z and x_{k_j} → x, we can conclude that y_{k_j} → z − x. Since Y is closed, we must have z − x ∈ Y, and so z = x + (z − x) ∈ X + Y, as desired.

Convex Hulls

We will have use for the notion of the convex hull of a set.

Definition The convex hull of a set S = {x_1,…,x_m} of vectors in ℝ^n is the smallest convex set in ℝ^n that contains the vectors x_1,…,x_m. We denote the convex hull of S by C(S).

Here is a characterization of convex hulls.

Theorem 15.3 Let S = {x_1,…,x_m} be a set of vectors in ℝ^n. Then the convex hull C(S) is the set of all convex combinations of vectors in S, that is,

    C(S) = Δ := {t_1x_1 + ⋯ + t_mx_m | t_i ≥ 0, Σ t_i = 1}

Proof. First, we show that Δ is convex. Let

    X = t_1x_1 + ⋯ + t_mx_m and Y = s_1x_1 + ⋯ + s_mx_m

be convex combinations of S and let a + b = 1 with a, b ≥ 0. Then

    aX + bY = a(t_1x_1 + ⋯ + t_mx_m) + b(s_1x_1 + ⋯ + s_mx_m) = (at_1 + bs_1)x_1 + ⋯ + (at_m + bs_m)x_m

But this is also a convex combination of the vectors in S, because

    0 ≤ at_i + bs_i ≤ (a + b) max(s_i, t_i) = max(s_i, t_i) ≤ 1

and

    Σ_{i=1}^m (at_i + bs_i) = a Σ_{i=1}^m t_i + b Σ_{i=1}^m s_i = a + b = 1

Thus, X, Y ∈ Δ implies aX + bY ∈ Δ, which says that Δ is convex. Since S ⊆ Δ, we have C(S) ⊆ Δ. Clearly, if D is a convex set that contains S, then D also contains Δ. Hence Δ ⊆ C(S).

Theorem 15.4 The convex hull C(S) of a finite set S = {x_1,…,x_m} of vectors in ℝ^n is a compact set.
Proof. Let

    Δ = {(t_1,…,t_m) | t_i ≥ 0 and Σ_{i=1}^m t_i = 1}

and define a function f : Δ → ℝ^n as follows: if t = (t_1,…,t_m), then

    f(t) = t_1x_1 + ⋯ + t_mx_m

To see that f is continuous, let s = (s_1,…,s_m) and let M = max‖x_i‖. For ε > 0, if |s_i − t_i| ≤ ε/(mM) for each i, then

    ‖f(s) − f(t)‖ = ‖(s_1 − t_1)x_1 + ⋯ + (s_m − t_m)x_m‖ ≤ |s_1 − t_1|‖x_1‖ + ⋯ + |s_m − t_m|‖x_m‖ ≤ mM · ε/(mM) = ε

Finally, since f maps the compact set Δ onto C(S), we deduce that C(S) is compact.

Linear and Affine Hyperplanes

We next discuss hyperplanes in ℝ^n. A linear hyperplane in ℝ^n is an (n − 1)-dimensional subspace of ℝ^n. As such, it is the solution set of a linear equation of the form

    a_1x_1 + ⋯ + a_nx_n = 0, or ⟨N, x⟩ = 0

where N = (a_1,…,a_n) is nonzero and x = (x_1,…,x_n). Geometrically speaking, this is the set of all vectors in ℝ^n that are perpendicular (normal) to the vector N.

An (affine) hyperplane is a linear hyperplane that has been translated by a vector. Thus, it is the solution set to an equation of the form

    a_1(x_1 − b_1) + ⋯ + a_n(x_n − b_n) = 0

or

    a_1x_1 + ⋯ + a_nx_n = a_1b_1 + ⋯ + a_nb_n

or finally

    ⟨N, x⟩ = ⟨N, B⟩

where B = (b_1,…,b_n).

Let us write ℋ(N, a), where N ∈ ℝ^n and a ∈ ℝ, to denote the hyperplane

    ℋ(N, a) = {x ∈ ℝ^n | ⟨N, x⟩ = a}

Note that the hyperplane

    ℋ(N, ‖N‖²) = {x ∈ ℝ^n | ⟨N, x⟩ = ‖N‖²}

contains the point N, which is the point of ℋ(N, ‖N‖²) closest to the origin, since Cauchy's inequality gives

    ‖N‖² = ⟨N, x⟩ ≤ ‖N‖‖x‖

and so ‖N‖ ≤ ‖x‖ for all x ∈ ℋ(N, ‖N‖²). Moreover, any hyperplane has the form ℋ(N, ‖N‖²) for an appropriate vector N.

A hyperplane defines two (nondisjoint) closed half-spaces

    ℋ₊(N, a) = {x ∈ ℝ^n | ⟨N, x⟩ ≥ a}
    ℋ₋(N, a) = {x ∈ ℝ^n | ⟨N, x⟩ ≤ a}

and two (disjoint) open half-spaces

    ℋ°₊(N, a) = {x ∈ ℝ^n | ⟨N, x⟩ > a}
    ℋ°₋(N, a) = {x ∈ ℝ^n | ⟨N, x⟩ < a}

It is not hard to show that

    ℋ₊(N, a) ∩ ℋ₋(N, a) = ℋ(N, a)

and that ℋ°₊(N, a), ℋ°₋(N, a) and ℋ(N, a) are pairwise disjoint, with

    ℋ°₊(N, a) ∪ ℋ°₋(N, a) ∪ ℋ(N, a) = ℝ^n

Definition The subsets X and Y of ℝ^n are strictly separated by a hyperplane ℋ(N, a) if X lies in one open half-space determined by ℋ(N, a) and Y lies in the other. Thus, one of the following holds:

1) ⟨N, x⟩ < a < ⟨N, y⟩ for all x ∈ X, y ∈ Y
2) ⟨N, y⟩ < a < ⟨N, x⟩ for all x ∈ X, y ∈ Y.

Note that 1) holds for N and a if and only if 2) holds for −N and −a, and so we need only consider one of the conditions to demonstrate that two sets X and Y are not strictly separated. In particular, suppose that 1) fails for all N and a. Then the condition

    ⟨−N, y⟩ < −a < ⟨−N, x⟩

also fails, and so 1) and 2) both fail for all N and a, and X and Y are not strictly separated.

The following type of separation is stronger than strict separation.

Definition The subsets X and Y of ℝ^n are strongly separated by a hyperplane ℋ(N, a) if there is an ε > 0 for which one of the following holds:

1) ⟨N, x⟩ ≤ a − ε < a + ε ≤ ⟨N, y⟩ for all x ∈ X, y ∈ Y
2) ⟨N, y⟩ ≤ a − ε < a + ε ≤ ⟨N, x⟩ for all x ∈ X, y ∈ Y

Note that, as before, we need only consider one of the conditions to show that two sets are not strongly separated.

Separation

Now that we have the preliminaries out of the way, we can get down to some theorems. The first is a well-known separation theorem that is the basis for many other separation theorems. It says that if a closed convex set C in ℝ^n does not contain a vector x, then x and C can be strongly separated by a hyperplane.

Theorem 15.5 Let C be a closed convex subset of ℝ^n.
1) C contains a unique vector of minimum norm, that is, there is a unique vector N ∈ C for which ‖N‖ < ‖x‖ for all x ∈ C, x ≠ N.
2) If C does not contain the origin, then C lies in the closed half-space

    ⟨N, x⟩ ≥ ‖N‖² > 0

where N ≠ 0 is the vector in C of minimum norm. Hence, 0 and C are strongly separated by the hyperplane ℋ(N, ‖N‖²/2).
3) If x ∉ C, then x and C are strongly separated.
Proof. For part 1), we first show that C contains a vector of minimum norm. Recall that the Euclidean norm is a continuous function. Although C need not be compact, if we choose a real number r such that the closed ball

    B(r) = {z ∈ ℝ^n | ‖z‖ ≤ r}

intersects C, then that intersection C′ = C ∩ B(r) is both closed and bounded and so is compact. The norm function therefore achieves its minimum on C′, say at the point N ∈ C′ ⊆ C. It is clear that if ‖v‖ < ‖N‖ for some v ∈ C, then v ∈ B(r) ∩ C = C′, which is a contradiction to the minimality of ‖N‖ on C′. Hence, N is a vector of minimum norm in C. Let us write ‖N‖ = m.

Suppose now that x ≠ N is another vector in C with ‖x‖ = m. Since C is convex, the line segment from N to x must be contained in C. In particular, the vector z = (1/2)(x + N) is in C. Since x cannot be a scalar multiple of N, the Cauchy-Schwarz inequality is strict:

    ⟨N, x⟩ < ‖N‖‖x‖ = m²

Hence,

    ‖z‖² = ‖(1/2)(x + N)‖² = (1/4)(‖N‖² + 2⟨N, x⟩ + ‖x‖²) < (1/4)(m² + 2m² + m²) = m²

But this contradicts the minimality of m, and so x = N. Thus, C has a unique vector of minimum norm.

For part 2), suppose there is an x ∈ C for which ⟨N, x⟩ < ‖N‖² = m². For 0 ≤ t ≤ 1, the point z_t = N + t(x − N) lies in C, and

    ‖z_t‖² = m² + 2t(⟨N, x⟩ − m²) + t²‖x − N‖²

which is less than m² for all sufficiently small t > 0, which is not possible. Hence,

    ⟨N, x⟩ ≥ ‖N‖² for all x ∈ C.

For part 3), if x ∉ C, then 0 is not in the closed convex set

    C − x = {c − x | c ∈ C}

Hence, by part 2), there is a nonzero N for which

    ⟨N, y⟩ ≥ ‖N‖² for all y ∈ C − x

But as c ranges over C, the vector c − x ranges over C − x, so we have

    ⟨N, c − x⟩ ≥ ‖N‖² for all c ∈ C

This can be written

    ⟨N, c⟩ ≥ ‖N‖² + ⟨N, x⟩ for all c ∈ C

Hence,

    ⟨N, x⟩ < ‖N‖² + ⟨N, x⟩ ≤ ⟨N, c⟩

from which it follows that x and C are strongly separated.

The next result brings us closer to our goal by replacing the origin with a subspace S disjoint from C. However, we must strengthen the requirements on C a bit.

Theorem 15.6 Let C be a compact convex subset of ℝ^n and let S be a subspace of ℝ^n such that C ∩ S = ∅. Then there exists a nonzero N ∈ S^⊥ such that

    ⟨N, x⟩ ≥ ‖N‖² > 0

for all x ∈ C. Hence, the hyperplane ℋ(N, ‖N‖²/2) strongly separates S and C.
Proof. Consider the set S + C, which is closed since S is closed and C is compact. It is also convex since S and C are convex. Furthermore, 0 ∉ S + C, because if 0 = s + c, then c = −s would be in C ∩ S = ∅.

According to Theorem 15.5, the set S + C can be strongly separated from the origin. Hence, there is a nonzero N ∈ ℝ^n such that ⟨N, x⟩ ≥ ‖N‖² for all x = s + c ∈ S + C, that is,

    ⟨N, s⟩ + ⟨N, c⟩ = ⟨N, s + c⟩ ≥ ‖N‖²

for all s ∈ S and c ∈ C. Now, if ⟨N, s⟩ were nonzero for some s ∈ S, we could replace s by an appropriate scalar multiple of s to make the left side negative, which is impossible. Hence, we must have ⟨N, s⟩ = 0 for all s ∈ S. Thus, N ∈ S^⊥ and ⟨N, c⟩ ≥ ‖N‖² for all c ∈ C, as desired.

We can now prove (15.1) and (15.2).

Theorem 15.7 Let S be a subspace of ℝ^n. Then
1) S ∩ ℝ^n_+ = ∅ if and only if S^⊥ ∩ ℝ^n_{++} ≠ ∅
2) S ∩ ℝ^n_{++} = ∅ if and only if S^⊥ ∩ ℝ^n_+ ≠ ∅
Proof. For part 1), it is clear that there cannot exist orthogonal vectors u ∈ ℝ^n_{++} and v ∈ ℝ^n_+. Hence, S ∩ ℝ^n_+ and S^⊥ ∩ ℝ^n_{++} cannot both be nonempty, so if S^⊥ ∩ ℝ^n_{++} ≠ ∅, then S ∩ ℝ^n_+ = ∅. The converse is more interesting.

Suppose that S ∩ ℝ^n_+ = ∅. A good candidate for an element of S^⊥ ∩ ℝ^n_{++} would be a normal to a hyperplane that separates S from a subset of ℝ^n_+. Note that our theorems do not allow us to separate S from ℝ^n_+, because it is not compact. So consider instead the convex hull Δ of the standard basis vectors e_1,…,e_n in ℝ^n_+:

    Δ = {t_1e_1 + ⋯ + t_ne_n | t_i ≥ 0, Σ t_i = 1}

It is clear that Δ is convex and Δ ⊆ ℝ^n_+, and so Δ ∩ S = ∅. Also, Δ is closed and bounded and therefore compact. Hence, by Theorem 15.6, there is a nonzero vector N = (a_1,…,a_n) ∈ S^⊥ such that

    ⟨N, d⟩ ≥ ‖N‖² for all d ∈ Δ

Taking d = e_i gives

    a_i = ⟨N, e_i⟩ ≥ ‖N‖² > 0

and so N ∈ S^⊥ ∩ ℝ^n_{++}, which is therefore nonempty.

To prove part 2), again we note that there cannot exist orthogonal vectors u ∈ ℝ^n_{++} and v ∈ ℝ^n_+. Hence, S ∩ ℝ^n_{++} and S^⊥ ∩ ℝ^n_+ cannot both be nonempty, so if S^⊥ ∩ ℝ^n_+ ≠ ∅, then S ∩ ℝ^n_{++} = ∅.

To prove that

    S ∩ ℝ^n_{++} = ∅ ⟹ S^⊥ ∩ ℝ^n_+ ≠ ∅

note first that a subspace contains a strictly positive vector if and only if it contains a strictly positive vector whose coordinates sum to 1.

Let ℬ = {b_1,…,b_k} be a basis for S and consider the matrix

    M = (m_{i,j}) = (b_1 | ⋯ | b_k)

whose columns are the basis vectors in ℬ. Let the rows of M be denoted by R_1,…,R_n. Note that R_i ∈ ℝ^k, where k = dim(S).

Now, S^⊥ contains a strictly positive vector N = (a_1,…,a_n) if and only if

    a_1R_1 + ⋯ + a_nR_n = 0

for coefficients a_i ≥ 0 satisfying Σ a_i = 1, that is, if and only if 0 is contained in the convex hull C of the vectors R_1,…,R_n in ℝ^k. Hence,

    S^⊥ ∩ ℝ^n_+ ≠ ∅ if and only if 0 ∈ C

Thus, we wish to prove that

    S ∩ ℝ^n_{++} = ∅ ⟹ 0 ∈ C

or, equivalently,

    0 ∉ C ⟹ S ∩ ℝ^n_{++} ≠ ∅

Now we have something to separate. Since C is closed and convex and 0 ∉ C, it follows from Theorem 15.5 that there is a nonzero vector B = (β_1,…,β_k) ∈ ℝ^k for which

    ⟨B, x⟩ ≥ ‖B‖² > 0 for all x ∈ C

Consider the vector

    v = β_1b_1 + ⋯ + β_kb_k ∈ S

The i-th coordinate of v is

    β_1m_{i,1} + ⋯ + β_km_{i,k} = ⟨B, R_i⟩ ≥ ‖B‖² > 0

and so v is strongly positive. Hence, v ∈ S ∩ ℝ^n_{++} and so this set is nonempty. This completes the proof.

Nonhomogeneous Systems

We now turn our attention to nonhomogeneous systems

    Ax = b

The following lemma is required.

Lemma 15.8 Let A ∈ M_{m,n}(ℝ). Then the set

    C = {Ay | y ∈ ℝ^n, y ≥ 0}

is a closed convex cone.
Proof. We leave it as an exercise to prove that C is a convex cone and omit the proof that C is closed.

Theorem 15.9 (Farkas's lemma) Let A ∈ M_{m,n}(ℝ) and let b ∈ ℝ^m be nonzero. Then exactly one of the following holds:
1) There is a strictly positive solution u ∈ ℝ^n to the system Ax = b.
2) There is a vector v ∈ ℝ^m for which vA ≤ 0 and ⟨v, b⟩ > 0.
Proof. Suppose first that 1) holds. If 2) also holds, then

    (vA)u = v(Au) = ⟨v, b⟩ > 0

However, vA ≤ 0 and u > 0 imply that (vA)u ≤ 0. This contradiction implies that 2) cannot hold.

Assume now that 1) fails to hold. By Lemma 15.8, the set

    C = {Ay | y ∈ ℝ^n, y ≥ 0} ⊆ ℝ^m

is closed and convex. The fact that 1) fails to hold is equivalent to b ∉ C.

Hence, there is a hyperplane that strongly separates C and b. All we require is that C and b be strictly separated, that is, for some a ∈ ℝ and v ∈ ℝ^m,

    ⟨v, x⟩ < a < ⟨v, b⟩ for all x ∈ C

Since 0 ∈ C, it follows that a > 0 and so ⟨v, b⟩ > 0. Also, the first inequality is equivalent to ⟨v, Ay⟩ < a, that is, ⟨Aᵗv, y⟩ < a, for all y ∈ ℝ^n with y ≥ 0. We claim that this implies that Aᵗv cannot have any positive coordinates, and thus vA ≤ 0. For if the i-th coordinate (Aᵗv)_i is positive, then taking y = re_i for r > 0 gives r(Aᵗv)_i < a, which does not hold for large r. Thus, 2) holds.
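Farkas's lemma is the theoretical engine behind linear-programming duality, and alternative 1) can be checked numerically as a feasibility problem. A minimal sketch (ours, using scipy; the function name is our own):

    # Farkas alternative 1): does Ax = b have a solution with x >= 0?
    # If not, Theorem 15.9 guarantees a v with vA <= 0 and <v, b> > 0.
    import numpy as np
    from scipy.optimize import linprog

    def farkas_alternative_one(A, b):
        m, n = A.shape
        res = linprog(c=np.zeros(n), A_eq=A, b_eq=b,
                      bounds=[(0, None)] * n)
        return res.success

    A = np.array([[1.0, 1.0], [1.0, -1.0]])
    print(farkas_alternative_one(A, np.array([2.0, 0.0])))   # True: x = (1, 1)
    print(farkas_alternative_one(A, np.array([-1.0, 0.0])))  # False: take v = (-1, 0)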

In the exercises, we ask the reader to show that the previous result cannot be improved by replacing vA ≤ 0 in statement 2) with vA ≪ 0.

Exercises
1. If A is an m × n matrix, prove that the set {Ax | x ∈ ℝ^n, x > 0} is a convex cone in ℝ^m.
2. If A and B are strictly separated subsets of ℝ^n and if A is finite, prove that A and B are strongly separated as well.
3. Let V be a vector space over a field F with char(F) ≠ 2. Show that a subset X of V is closed under the taking of convex combinations of any two of its points if and only if X is closed under the taking of arbitrary convex combinations, that is, for all n ≥ 2,

    x_1,…,x_n ∈ X, t_i ≥ 0, Σ t_i = 1 ⟹ Σ t_ix_i ∈ X

4. Explain why an (n − 1)-dimensional subspace of ℝ^n is the solution set of a linear equation of the form a_1x_1 + ⋯ + a_nx_n = 0.
5. Show that

    ℋ₊(N, a) ∩ ℋ₋(N, a) = ℋ(N, a)

and that ℋ°₊(N, a), ℋ°₋(N, a) and ℋ(N, a) are pairwise disjoint, with

    ℋ°₊(N, a) ∪ ℋ°₋(N, a) ∪ ℋ(N, a) = ℝ^n

6. A function T : ℝ^n → ℝ^m is affine if it has the form T(v) = f(v) + b for b ∈ ℝ^m, where f ∈ ℒ(ℝ^n, ℝ^m). Prove that if C ⊆ ℝ^n is convex, then so is T(C) ⊆ ℝ^m.
7. Find a cone in ℝ² that is not convex. Prove that a subset X of ℝ^n is a convex cone if and only if x, y ∈ X implies that ax + by ∈ X for all a, b ≥ 0.
8. Prove that the convex hull of a set {x_1,…,x_m} in ℝ^n is bounded, without using the fact that it is compact.
9. Suppose that a vector x ∈ ℝ^n has two distinct representations as convex combinations of the vectors v_1,…,v_m. Prove that the vectors v_2 − v_1,…,v_m − v_1 are linearly dependent.
10. Suppose that C is a nonempty convex subset of ℝ^n and that ℋ(N, a) is a hyperplane disjoint from C. Prove that C lies in one of the open half-spaces determined by ℋ(N, a).
11. Prove that the conclusion of Theorem 15.6 may fail if we assume only that C is closed and convex.
12. Find two nonempty convex subsets of ℝ² that are strictly separated but not strongly separated.
13. Prove that X and Y are strongly separated by ℋ(N, a) if and only if there is an ε > 0 for which

    ⟨N, x′⟩ > a for all x′ ∈ X′ and ⟨N, y′⟩ < a for all y′ ∈ Y′

where X′ = X + B(0, ε) and Y′ = Y + B(0, ε), and where B(0, ε) is the closed ball of radius ε centered at the origin.
14. Show that Farkas's lemma cannot be improved by replacing vA ≤ 0 in statement 2) with vA ≪ 0. Hint: a nice counterexample exists for m = n = 2.

where ? ~ ? b )²Á ³ and @ ~ @ b )²Á ³ and where )²Á ³ is the closed unit ball. 14. Show that Farkas's lemma cannot be improved by replacing #(   in statement 2) with #( ‡  . Hint : A nice counterexample exists for ~Á~. Chapter 16 Affine Geometry

In this chapter, we will study the geometry of a finite-dimensional vector space = , along with its structure-preserving maps. Throughout this chapter, all vector spaces are assumed to be finite-dimensional. Affine Geometry The cosets of a quotient space have a special geometric name.

Definition Let := be a subspace of a vector space . The coset #b:~¸#b “ :¹ is called a in =: with base . We also refer to #b:: as a translate of . The set 7²= ³ of all flats in = is called the affine geometry of = . The dimension dim²77 ²=³³ of ²=³ is defined to be dim ²=³ .

Here are some simple yet useful observations about flats.

1) A flat ?~%b: is a subspace if and only if %~ , that is, if and only if ?~:. 2) A subset ?%?:~c%b? is a flat if and only if for any the translate is a subspace. 3) If ?~%b: is a flat and 'b? then 'b?~: .

Definition Two flats %~%b: and @ ~&b; are said to be parallel if :‹; or ;‹: . This is denoted by ?”@ .

We will denote subspaces of = by the letters :Á;ÁÃ and flats in = by ?Á@ ÁÃ.

Here are some of the basic intersection properties of flats. 410 Advanced Linear Algebra

Theorem 16.1 Let :; and be subspaces of = and let ?~%b: and @~&b; be flats in = . 1) The following are equivalent: a) %b:~&b: %&b:) c) %c&: 2) The following are equivalent: a) $b?‹@ for some $= b) #b:‹; for some #= c) :‹; 3) The following are equivalent: a) $b?~@ for some $= b) #b:~; for some #= c) :~; 4) ?q@ £JÁ:‹; ¯?‹@ 5) ?q@ £JÁ:~; ¯?~@ 6) If ?”@ then ?‹@, @ ‹? or ?q@ ~J 7) ?”@ if and only if some translation of one of these flats is contained in the other. Proof. We leave proof of part 1) for the reader. To prove 2), if 2a) holds then c&b$b%b:‹; and so 2b) holds. Conversely, if 2b) holds then &b#c%b²%b:³‹&b; ~@ and so 2a) holds. Now, if 2b) holds then #~#b; and so :‹c#b;‹;, which is 2c). Finally, if :‹; then just take #~ to get 2b). Part 3) is proved in a similar manner.

For part 4), we know that $b?‹@ for some $= . However, if '?q@ then $b'@ and so $c'b@ ~@ , whence ?‹@ . Part 5) follows similarly. We leave proof of 6) and 7) to the reader.

Part 1) of the previous theorem says that, in general, a flat can be represented in many ways as a translate of the base :?~%b:% . If , then is called a flat representative, or coset representative of ? . Any element of a flat is a flat representative.

On the other hand, if %b:~&b; then %c&b:~; and the previous theorem tells us that :~; . Thus, the base of a flat is uniquely determined by the flat and we can make the following definition.

Definition The dimension of a flat %b: is dim ²:³ . A flat of dimension  is called a -flat . A -flat is a point , a  -flat is a line and a 2-flat is a plane . A flat of dimension dim²²=³³c7 is called a hyperplane . Affine Geometry 411

Affine Combinations

If a_i ∈ F and a_1 + ⋯ + a_n = 1, then the linear combination

    a_1x_1 + ⋯ + a_nx_n

is referred to as an affine combination of the vectors x_1,…,x_n.

Our immediate goal is to show that, while the subspaces of V are precisely the subsets of V that are closed under the taking of linear combinations, the flats of V are precisely the subsets of V that are closed under the taking of affine combinations.

First, we need the following.

Theorem 16.2 If char(F) ≠ 2, then the following are equivalent for a subset X of V.
1) X is closed under the taking of affine combinations of any two of its points, that is,

    x, y ∈ X ⟹ ax + (1 − a)y ∈ X

2) X is closed under the taking of arbitrary affine combinations, that is,

    x_1,…,x_n ∈ X, a_1 + ⋯ + a_n = 1 ⟹ a_1x_1 + ⋯ + a_nx_n ∈ X

Proof. It is clear that 2) implies 1). For the converse, we proceed by induction on n ≥ 2. Part 1) is the case n = 2. Assume the result true for n − 1 and consider the affine combination

    z = a_1x_1 + ⋯ + a_nx_n

If one of a_2,…,a_n is different from 1, say a_n ≠ 1, then we may write

    z = a_nx_n + (1 − a_n)[ (a_1/(1 − a_n))x_1 + ⋯ + (a_{n−1}/(1 − a_n))x_{n−1} ]

and since the sum of the coefficients inside the large parentheses is 1, the induction hypothesis implies that this sum is in X. Then 1) shows that z ∈ X. On the other hand, if a_2 = ⋯ = a_n = 1, then since char(F) ≠ 2, we may write

    z = a_1x_1 + 2[ (1/2)x_2 + (1/2)x_3 ] + x_4 + ⋯ + x_n

and since 1) implies that (1/2)x_2 + (1/2)x_3 ∈ X, we may again deduce from the induction hypothesis that z ∈ X. In any case, z ∈ X and so 2) holds.

Note that the requirement char(F) ≠ 2 is necessary, for if F = ℤ_2, then the subset

    X = {(0,0), (0,1), (1,0)}

of (ℤ_2)² satisfies condition 1) but not condition 2). We can now characterize flats.

Theorem 16.3
1) A subset X of V is a flat in V if and only if it is closed under the taking of affine combinations, that is, if and only if

    x_1,…,x_n ∈ X, a_1 + ⋯ + a_n = 1 ⟹ a_1x_1 + ⋯ + a_nx_n ∈ X

2) If char(F) ≠ 2, a subset X of V is a flat if and only if X contains the line through any two of its points, that is, if and only if

    x, y ∈ X ⟹ ax + (1 − a)y ∈ X

Proof. Suppose that X = x + S is a flat and x_1,…,x_n ∈ X. Then x_i = x + s_i for s_i ∈ S, and so if Σ a_i = 1, we have

    Σ a_ix_i = Σ a_i(x + s_i) = x + Σ a_is_i ∈ x + S

and so X is closed under affine combinations. Conversely, suppose that X is closed under the taking of affine combinations. It is sufficient (and necessary) to show that for a given x ∈ X, the set S = −x + X is a subspace of V. But if −x + x_1, −x + x_2 ∈ S, then for any a, b ∈ F,

    a(−x + x_1) + b(−x + x_2) = −(a + b)x + ax_1 + bx_2 = −x + [(1 − a − b)x + ax_1 + bx_2] ∈ −x + X

Hence, S is a subspace of V. Part 2) follows from part 1) and Theorem 16.2.

Affine Hulls

The following definition gives the analog of the subspace spanned by a collection of vectors.

²c%b%³b ²c%b%³~c² b ³%b %b %     ~c%b´²c c ³%b %b %µ      c% b? Hence, := is a subspace of . Part 2) follows from part 1) and Theorem 16.2. Affine Hulls The following definition gives the analog of the subspace spanned by a collection of vectors.

Definition Let *=* be a nonempty set of vectors in . The of , denoted by hull²*³ , is the smallest flat containing * .

Theorem 16.4 The affine hull of a nonempty subset *= of is the affine span of ** , that is, the set of all affine combinations of vectors in  hull²*³ ~Dc  %  ‚ Á %  Á à Á %   *Á  ~  E ~ ~ Affine Geometry 413

Proof. According to Theorem 16.3, any flat containing * must contain all affine combinations of vectors in *? . It remains to show that the set of all affine combinations of *&? is a flat, or equivalently, that for any , the set :~?c& is a subspace of = . To this end, let   &~ %Á&~ %Á  Áand &~ %  2 Á ~ ~ ~ where %* and ''' ~ ~ ~ Á Á Á . Hence, any linear combination of &c& and &c& has the form

'~ ²& c&³b!²& c&³  ~  %b! %c²Á 2 Á  b!³& ~ ~  ~ ² Á b! 2 Á ³%  c² b!c1) &c& ~  ~45 Á b! 2 Á c² b!c1) Á %  c& ~

But, since ''' ~ ~ ~Á Á Á , the sum of the coefficients of %  is equal to ':: and so the sum is an affine sum, which shows that . Hence, is a subspace of = .

The affine hull of a finite set of vectors is denoted by hull¸% ÁÃÁ% ¹ . We leave it as an exercise to show that for any 

hull¸% ÁÃÁ% ¹ (16.1) ~% bº%c%ÁÃÁ%   c c%Á%  b c%ÁÃÁ%   c%»  where º» denotes the subspace spanned by the vectors within the brackets. It follows that

dim²¸%ÁÃÁ%¹³chull  The affine hull of a pair of distinct points is the line through those points, denoted by %& ~ ¸ % b ² c ³& “  - ¹ ~ & b º% c &»

The Lattice of Flats

Since flats are subsets of V, they are partially ordered by set inclusion.

Theorem 16.5 The intersection of a nonempty collection ℱ = {x_k + S_k | k ∈ K} of flats in V is either empty or is a flat. If the intersection is nonempty, then

    ⋂_{k∈K} (x_k + S_k) = x + ⋂_{k∈K} S_k

for any vector x in the intersection. In other words, the base of the intersection is the intersection of the bases.
Proof. If

    x ∈ ⋂_{k∈K} (x_k + S_k)

then x_k + S_k = x + S_k for all k ∈ K, and so

    ⋂_{k∈K} (x_k + S_k) = ⋂_{k∈K} (x + S_k) = x + ⋂_{k∈K} S_k

Definition The join of a nonempty collection ℱ = {x_k + S_k | k ∈ K} of flats in V is the smallest flat containing all flats in ℱ. We denote the join of the collection ℱ by ⋁ℱ, or by

    ⋁_{k∈K} (x_k + S_k)

The join of two flats is written (x + S) ∨ (y + T).

Theorem 16.6 Let ℱ = {x_k + S_k | k ∈ K} be a nonempty collection of flats in the vector space V.
1) ⋁ℱ is the intersection of all flats that contain all flats in ℱ.
2) ⋁ℱ is hull(⋃ℱ), where ⋃ℱ is the union of all flats in ℱ.

Theorem 16.7 Let X = x + S and Y = y + T be flats in V. Then
1) X ∨ Y = x + (⟨x − y⟩ + S + T)
2) If X ∩ Y ≠ ∅, then X ∨ Y = x + (S + T)
Proof. For part 1), let X ∨ Y = (x + S) ∨ (y + T) = z + U for some z ∈ V and subspace U of V. Of course, S + T ⊆ U. But since x, y ∈ z + U, we also have x − y ∈ U. (Note that x − y is not necessarily in S + T.)

Let W = ⟨x − y⟩ + S + T. Then W ⊆ U, and so x + W ⊆ x + U = X ∨ Y. On the other hand,

    X = x + S ⊆ x + W

and

    Y = y + T = x − (x − y) + T ⊆ x + W

and so X ∨ Y ⊆ x + W. Thus, X ∨ Y = x + W, as desired.

For part 2), if X ∩ Y ≠ ∅, then we may take the flat representatives for X and Y to be any element z ∈ X ∩ Y, in which case part 1) gives

    X ∨ Y = z + (⟨z − z⟩ + S + T) = z + S + T

and since x ∈ X ∨ Y, we also have X ∨ Y = x + S + T.

We can now describe the dimension of the join of two flats.

Theorem 16.8 Let X = x + S and Y = y + T be flats in V.
1) If X ∩ Y ≠ ∅, then

    dim(X ∨ Y) = dim(S + T) = dim(X) + dim(Y) − dim(X ∩ Y)

2) If X ∩ Y = ∅, then

    dim(X ∨ Y) = dim(S + T) + 1

Proof. According to Theorem 16.7, if X ∩ Y ≠ ∅, then X ∨ Y = x + S + T, and so, by definition of the dimension of a flat,

    dim(X ∨ Y) = dim(S + T)

On the other hand, if X ∩ Y = ∅, then X ∨ Y = x + ⟨x − y⟩ + S + T, and since dim(⟨x − y⟩) = 1, we get

    dim(X ∨ Y) = dim(S + T) + 1

Finally, we have

    dim(S + T) = dim(S) + dim(T) − dim(S ∩ T)

Therefore, if z ∈ (x + S) ∩ (y + T), then x + S = z + S and y + T = z + T, and so

    dim(S ∩ T) = dim(z + [S ∩ T]) = dim([z + S] ∩ [z + T]) = dim(X ∩ Y)

Affine Independence

We now discuss the affine counterpart of linear independence.

Theorem 16.9 Let X = {x_1,…,x_n} be a nonempty set of vectors in V. The following are equivalent:
1) H = hull{x_1,…,x_n} has dimension n − 1.
2) The set

    {x_1 − x_i,…,x_{i−1} − x_i, x_{i+1} − x_i,…,x_n − x_i}

is linearly independent for all i = 1,…,n.
3) x_i ∉ hull{x_1,…,x_{i−1}, x_{i+1},…,x_n} for all i = 1,…,n.
4) If Σ r_ix_i and Σ s_ix_i are affine combinations, then

    Σ r_ix_i = Σ s_ix_i ⟹ r_i = s_i for all i

A set X = {x_1,…,x_n} of vectors satisfying any (and hence all) of these conditions is said to be affinely independent.
Proof. The fact that 1) and 2) are equivalent follows directly from (16.1). If 3) does not hold, we have

    hull{x_1,…,x_n} = hull{x_1,…,x_{i−1}, x_{i+1},…,x_n}

where by (16.1), the latter has dimension at most n − 2. Hence, 1) cannot hold, and so 1) implies 3).

Next we show that 3) implies 4). Suppose that 3) holds and that Σ r_ix_i = Σ s_ix_i. Setting t_i = r_i − s_i gives

    Σ t_ix_i = 0 and Σ t_i = 0

But if any of the t_i's are nonzero, say t_1 ≠ 0, then dividing by t_1 gives

    x_1 + Σ_{i>1} (t_i/t_1)x_i = 0

or

    x_1 = − Σ_{i>1} (t_i/t_1)x_i

where

    − Σ_{i>1} (t_i/t_1) = 1

Hence, x_1 ∈ hull{x_2,…,x_n}. This contradiction implies that t_i = 0 for all i, that is, r_i = s_i for all i. Thus, 3) implies 4).

Finally, we show that 4) implies 2). For concreteness, let us show that 4) implies that {x_2 − x_1,…,x_n − x_1} is linearly independent. Indeed, if r_2,…,r_n ∈ F and Σ_{i≥2} r_i(x_i − x_1) = 0, then

    Σ_{i≥2} r_i(x_i − x_1) = 0 ⟹ Σ_{i≥2} r_ix_i = (Σ_{i≥2} r_i)x_1 ⟹ (1 − Σ_{i≥2} r_i)x_1 + Σ_{i≥2} r_ix_i = x_1

But the latter is an equality between two affine combinations, and so corresponding coefficients must be equal, which implies that r_i = 0 for all i = 2,…,n. This shows that 4) implies 2).

Affinely independent sets enjoy some of the basic properties of linearly independent sets. For example, a nonempty subset of an affinely independent set is affinely independent. Also, any nonempty set X contains an affinely independent subset.

Since the affine hull H = hull(X) of an affinely independent set X is not the affine hull of any proper subset of X, we deduce that X is a minimal affine spanning set of its affine hull.

Note that if H = hull(X), where X = {x_1,…,x_n} is affinely independent, then for any i, the set

    {x_1 − x_i,…,x_{i−1} − x_i, x_{i+1} − x_i,…,x_n − x_i}

is a basis for the base subspace of H. Conversely, if H = x + S and B = {b_1,…,b_k} is a basis for S, then

    B′ = {x, x + b_1,…,x + b_k}

is affinely independent, and since hull(B′) has dimension k and is contained in x + S, we must have hull(B′) = H. This provides a way to go between "affine bases" B′ of a flat and linear bases B of the base subspace of the flat.

Theorem 16.10 If X is a flat of dimension k, then there exist vectors x_1,…,x_{k+1} for which every vector x ∈ X has a unique expression as an affine combination

    x = a_1x_1 + ⋯ + a_{k+1}x_{k+1}

The coefficients a_i are called the barycentric coordinates of x with respect to the vectors x_1,…,x_{k+1}.

The coefficients % are called the barycentric coordinates of with respect to the vectors %ÁÃÁ%b . Affine Transformations Now let us discuss some properties of maps that preserve affine structure. 418 Advanced Linear Algebra

Definition A function ¢= ¦ = that preserves affine combinations, that is, for which

  ~  ¬ 45 % ~ ²% ³  is called an affine transformation (or affine map , or affinity ).

We should mention that some authors require that  be bijective in order to be an affine map. The following theorem is the analog of Theorem 16.2.

Theorem 16.11 If char²- ³ £  then a function ¢ = ¦ = is an affine transformation if and only if it preserves affine combinations of any two of its points, that is, if and only if ² % b ² c ³&³ ~r ²%³ b ² c ³²&³ Thus, if char²- ³ £  then a map  is an affine transformation if and only if it sends the line through % and & to the line through ²%³ and ²&³ . It is clear that linear transformations are affine transformations. So are the following maps.

Definition Let #= . The affine map ;¢=# ¦= defined by

;²%³~%b## for all %= , is called translation by # .

It is not hard to see that any map of the form “linear operator followed by translation,” that is, ;k# B , where  ²=³ , is affine. Conversely, any affine map must have this form.

Theorem 16.12 A function ¢= ¦ = is an affine transformation if and only if it is a linear operator followed by a translation,

~;# k where #= and B  ²=³. Proof. We leave proof that ;k#  is an affine transformation to the reader. Conversely, suppose that  is an affine map. If we expect to have the form ;### k then ²³ will equal ; k ²³ ~ ; ²³ ~ # . So let # ~ ²³ . We must show that  ~;c²³ k is a linear operator on = . However, for any %= ²%³ ~ ²%³ c ²³ and so Affine Geometry 419

² " b #³ ~ ² " b #³ c ²³ ~ ² " b # b ² c c ³³ c ²³ ~ ²"³b ²#³b²c c ³²³c²³ ~  ²"³b ²#³ Thus,  is linear.

Corollary 16.13
1) The composition of two affine transformations is an affine transformation.
2) An affine transformation f = T_v ∘ τ is bijective if and only if τ is bijective.
3) The set aff(V) of all bijective affine transformations on V is a group under composition of maps, called the affine group of V.

Let us make a few group-theoretic remarks about aff(V). The set trans(V) of all translations of V is a subgroup of aff(V). We can define a function ψ from aff(V) to the group of bijective operators in ℒ(V) by

    ψ(T_v ∘ τ) = τ

It is not hard to see that ψ is a well-defined group homomorphism from aff(V) onto the group of bijective linear operators on V, with kernel trans(V). Hence, trans(V) is a normal subgroup of aff(V) and

    aff(V)/trans(V) ≅ {bijective linear operators on V}

Projective Geometry

If dim(V) = 2, the join (affine hull) of any two distinct points in V is a line. On the other hand, it is not the case that the intersection of any two lines is a point, since the lines may be parallel. Thus, there is a certain asymmetry between the concepts of points and lines in V. This asymmetry can be removed by constructing the projective plane. Our plan here is to very briefly describe one possible construction of projective geometries of all dimensions.

By way of motivation, let us consider Figure 16.1.

[Figure 16.1: a hyperplane H, not through the origin, in a 3-dimensional vector space V, together with the linear spans of flats lying in H.]

Note that H is a hyperplane in a 3-dimensional vector space V and that 0 ∉ H. Now, the set 𝒜(H) of all flats of V that lie in H is an affine geometry of dimension 2. (According to our definition of affine geometry, H must be a vector space in order to define 𝒜(H). However, we hereby extend the definition of affine geometry to include the collection of all flats contained in a flat of V.)

Figure 16.1 shows a one-dimensional flat X and its linear span ⟨X⟩, as well as a zero-dimensional flat Y and its span ⟨Y⟩. Note that, for any flat X in H, we have

    dim(⟨X⟩) = dim(X) + 1

Note also that if L_1 and L_2 are any two distinct lines in H, the corresponding planes ⟨L_1⟩ and ⟨L_2⟩ have the property that their intersection is a line through the origin, even if the lines are parallel. We are now ready to define projective geometries.

Let V be a vector space of any dimension and let H be a hyperplane in V not containing the origin. To each flat X in H, we associate the subspace ⟨X⟩ of V generated by X. Thus, the linear span function span : 𝒜(H) → 𝒫(V) maps affine subspaces X of H to subspaces ⟨X⟩ of V. The span function is not surjective: its image is the set of all subspaces that are not contained in the base subspace K of the flat H.

The linear span function is one-to-one, and its inverse is intersection with H:

    span^{−1}(U) = U ∩ H

The affine geometry 𝒜(H) is, as we have remarked, somewhat incomplete. In the case dim(H) = 2, every pair of points determines a line but not every pair of lines determines a point.

Now, since the linear span function is injective, we can identify 𝒜(H) with its image span(𝒜(H)), which is the set of all subspaces of V not contained in the base subspace K. This view of 𝒜(H) allows us to "complete" 𝒜(H) by including the subspaces of the base subspace K. In the three-dimensional case of Figure 16.1, the base plane, in effect, adds a projective line at infinity. With this inclusion, every pair of lines intersects, parallel lines intersecting at a point on the line at infinity. This two-dimensional projective geometry is called the projective plane.

Definition Let V be a vector space. The set 𝒫(V) of all subspaces of V is called the projective geometry of V. The projective dimension pdim(S) of S ∈ 𝒫(V) is defined as

    pdim(S) = dim(S) − 1

The projective dimension of 𝒫(V) is defined to be pdim(V) = dim(V) − 1. A subspace of projective dimension 0 is called a projective point, and a subspace of projective dimension 1 is called a projective line.

Thus, referring to Figure 16.1, a projective point is a line through the origin and, provided that it is not contained in the base plane K, it meets H in an affine point. Similarly, a projective line is a plane through the origin and, provided that it is not K, it will meet H in an affine line. In short,

    span(affine point) = line through the origin = projective point
    span(affine line) = plane through the origin = projective line

The linear span function has the following properties.

Theorem 16.14 The linear span function span : 𝒜(H) → 𝒫(V) from the affine geometry 𝒜(H) to the projective geometry 𝒫(V), defined by span(X) = ⟨X⟩, satisfies the following properties:
1) The linear span function is injective, with inverse given by span^{−1}(U) = U ∩ H.
…
4) For any collection {X_k | k ∈ K} of flats in H,

    span( ⋂_{k∈K} X_k ) = ⋂_{k∈K} span(X_k)

5) For any collection {X_k | k ∈ K} of flats in H,

    span( ⋁_{k∈K} X_k ) = Σ_{k∈K} span(X_k)

6) The linear span function preserves dimension, in the sense that

    pdim(span(X)) = dim(X)

7) X ∥ Y if and only if one of ⟨X⟩ ∩ K and ⟨Y⟩ ∩ K is contained in the other.
Proof. To prove part 1), let x + S be a flat in H. Then x ∈ H, and so H = x + K, which implies that S ⊆ K. Note also that ⟨x + S⟩ = ⟨x⟩ + S and

    z ∈ ⟨x + S⟩ ∩ H = (⟨x⟩ + S) ∩ (x + K) ⟹ z = rx + s = x + k

for some s ∈ S, k ∈ K and r ∈ F. This implies that (r − 1)x ∈ K, which implies that either x ∈ K or r = 1. But x ∈ H implies x ∉ K, and so r = 1, which implies that z = x + s ∈ x + S. In other words,

    ⟨x + S⟩ ∩ H ⊆ x + S

Since the reverse inclusion is clear, we have

    ⟨x + S⟩ ∩ H = x + S

This establishes 1).

To prove 2), let U be a subspace of V that is not contained in K. We wish to show that U is in the image of the linear span function. Note first that since U ⊄ K and dim(K) = dim(V) − 1, we have U + K = V, and so

    dim(U ∩ K) = dim(U) + dim(K) − dim(U + K) = dim(U) − 1

Now, let 0 ≠ x ∈ U with x ∉ K. Thus, rx ∈ U ∩ H for some 0 ≠ r ∈ F. Hence, the flat rx + (U ∩ K) lies in H, and its linear span is ⟨rx⟩ + (U ∩ K) = U, so that U is in the image of the linear span function.

Exercises
1. Show that if x_1,…,x_n ∈ V, then the set S = { Σ a_ix_i | Σ a_i = 0 } is a subspace of V.
2. Prove that hull{x_1,…,x_n} = x_1 + ⟨x_2 − x_1,…,x_n − x_1⟩.
3. Prove that the set X = {(0,0), (0,1), (1,0)} in (ℤ_2)² is closed under the formation of lines, but not affine hulls.
4. Prove that a flat contains the origin 0 if and only if it is a subspace.
5. Prove that a flat X is a subspace if and only if for some x ∈ X we have rx ∈ X for some 1 ≠ r ∈ F.
6. Show that the join of a collection ℱ = {x_k + S_k | k ∈ K} of flats in V is the intersection of all flats that contain all flats in ℱ.
7. Is the collection of all flats in V a lattice under set inclusion? If not, how can you "fix" this?
8. Suppose that X = x + S and Y = y + T. Prove that if dim(X) = dim(Y) and X ∥ Y, then S = T.
9. Suppose that X = x + S and Y = y + T are disjoint hyperplanes in V. Show that S = T.
10. (The parallel postulate) Let X be a flat in V and v ∉ X. Show that there is exactly one flat containing v, parallel to X and having the same dimension as X.
11. a) Find an example to show that the join X ∨ Y of two flats may not be the set of all lines connecting all points in the union of these flats.
b) Show that if X and Y are flats with X ∩ Y ≠ ∅, then X ∨ Y is the union of all lines xy where x ∈ X and y ∈ Y.
12. Show that if X ∥ Y and X ∩ Y = ∅, then

    dim(X ∨ Y) = max{dim(X), dim(Y)} + 1

13. Let dim(V) = 2. Prove the following:
a) The join of any two distinct points is a line.
b) The intersection of any two nonparallel lines is a point.
14. Let dim(V) = 3. Prove the following:
a) The join of any two distinct points is a line.
b) The intersection of any two nonparallel planes is a line.
c) The join of any two lines whose intersection is a point is a plane.
d) The intersection of two coplanar nonparallel lines is a point.
e) The join of any two distinct parallel lines is a plane.
f) The join of a line and a point not on that line is a plane.
g) The intersection of a plane and a line not on that plane is a point.
15. Prove that f : V → V is a surjective affine transformation if and only if f = T_w ∘ τ for some w ∈ V and surjective τ ∈ ℒ(V).
16. Verify the group-theoretic remarks about the group homomorphism ψ and the subgroup trans(V) of aff(V).

The QR Decomposition Let =--~ be a finite-dimensional inner product space over , where s or -~d. Let us recall a definition.

Definition A linear operator  on = is upper triangular with respect to an ordered basis 8~²#ÁÃÁ# ³ if the matrix ´ µ8 is upper triangular, that is, if for all ~ÁÃÁ

²#³º# ÁÃÁ#»  The operator  is upper triangularizable if there is an ordered basis with respect to which  is upper triangular.

Given any orthonormal basis 8 for = , it is possible to find a unitary operator for which  is upper triangular with respect to 8 . In matrix terms, this is equivalent to the fact that any matrix ((~89 can be factored into a product where 89 is unitary (orthogonal) and is upper triangular. This is the well known QR factorization of a matrix. Before proving this fact, let us repeat one more definition.

Definition For a nonzero #= , the unique operator /# for which

/## # ~ c#Á ²/ ³Oº#»ž ~  is called a reflection or a Householder transformation .

According to Theorem 10.11, if ))#~$£ ) ) , then /#c$ is the unique reflection sending #$ to , that is, /#c$ ²#³~$ . 426 Advanced Linear Algebra

Theorem 17.1 (QR-Factorization of an operator ) Let  be a linear operator on a finite-dimensional real or complex vector space = . Then for any ordered orthonormal basis 8 ~²"ÁÃÁ" ³ for = , there is a unitary (orthogonal) operator  and an operator that is upper triangular with respect to 8 , that is,

²"³º" ÁÃÁ"»  for all ~ÁÃÁ , for which ~k Moreover, if  is invertible, then can be chosen with positive eigenvalues, in which case both  and are unique. Proof. Let ~))"%" . If ~ c " then

²/% k ³²"""" ³ ~ ²/"" c k ³² ³ ~   º » where, if  is invertible then  is positive.

Assume for the purposes of induction that, for a given  , we have ²³ found reflections /ÁÃÁ/%% for which, setting / ~/Ä/ %%  ²³ ²/ k ³""  º Á à Á "  » for all  . Assume also that if  is invertible, the coefficient of " in ²³ ²/ k ³" is positive.

We seek a reflection /%b for which ²/ k /²³ k ³""  º Á à Á " » %b   for b and for which, if  is invertible, the coefficient of " in ²/ k /²³ k ³" is positive. %b 

Note that if %"bº b ÁÃÁ "  » then /%b is the identity on º ""  ÁÃÁ  » and so, at least for  we have

²³ ²/%%b k / k ³"""""  / b ²º Á à Á »³ ~ º Á à Á » as desired. But we also want to choose %b so that

²³ ²/%b k / k ³"""b  º  Á à Á b » Let us write

²³ ²/ k ³"b ~ # b $ where #º"" ÁÃÁ » and $º " b ÁÃÁ " » . We can accomplish our goal by reflecting $º"»%~$c$" onto the subspace b . In particular, let b)) b . Operator Factorizations: QR and Singular Value 427

Since %b º"" b ÁÃÁ  » , the operator / %b is the identity on º ""  ÁÃÁ  » and so as noted earlier

²³ ²/%b k / k ³""  º Á à Á "  », for    Also

²³ ²/%b k / k ³"b ~ /$c)) $ "b ²# b $³ ~#b)) $"""b º  ÁÃÁ b » and if  is invertible, then $£ and so )) $ € . Thus, we have found /²b³ and by induction,

 ~/%% Ä/  is upper triangular with respect to 8 , which proves the first part of the theorem. It remains only to prove the uniqueness statement.

Suppose that  is invertible and that ~~    and that the coefficients c c of "" in  and " are positive. Then  ~  ~ is both unitary and upper triangular with respect to 8 and the coefficient of "" in is positive. We leave it to the reader to show that  must be the identity and so ~ and ~ .

Here is the matrix version of the preceding theorem.

Theorem 17.2 (The QR factorization ) Any real or complex matrix ( can be written in the form (~89 where 8 is unitary (orthogonal) and 9 is upper triangular. Moreover, if (9 is nonsingular then the diagonal entries of may be taken to be positive, in which case the factorization is unique. Proof. According to Theorem 17.1, there is a unitary (orthogonal) operator < for which ´<( µ; ~ 9 is upper triangular. Hence i (~´(µ~; ´<µ; 9~89 where 8 is a unitary (orthogonal) matrix.

The QR decomposition has important applications. For example, a system of linear equations (% ~ " can be written in the form 89% ~ " and since 8~8c i , we have 9% ~ 8i" This is an upper triangular system, which is easily solved by back substitution that is, starting from the bottom and working up. 428 Advanced Linear Algebra

Singular Values Let <= and be finite-dimensional inner product spaces over ds or . The spectral theorem can be of considerable help in understanding the relationship between a linear transformation B²<Á=³ and its adjoint i ²=Á<³ B . This relationship is shown in Figure 17.1. (We assume that <= and are finite- dimensional.)

im(W*) im(W)

W u1 ds1 v1 W W ONB of ONB of ur dsr vr eigenvectors W eigenvectors for * for W*W W W WW ur+1 0 vr+1

W W un 0 vm

ker(W)ker(W*) Figure 17.1 We begin with a simple observation: If ²Á³BB<= then i ²³ < is a positive Hermitian operator. Hence, if ~rk²² ³~ rk i ³ then < has an ordered orthonormal basis 8 ~²" b ÁÃÁ" Á" ÁÃÁ" ³ of eigenvectors for i , where the corresponding (not necessarily unique) eigenvalues satisfy

 b‚Ä‚ €~ ~Ä~ 

The numbers ~bj , for ~ÁÃÁ are called the singular values of  and for ~ÁÃÁ we have

i "~  " where  ~ for € .

It is not hard to show that ²" b ÁÃÁ"  ³ is an ordered orthonormal basis for ž ker² ³ and so ²" ÁÃÁ" ³ is an ordered orthonormal basis for ker ² ³ ~³im² i . For if € then i º"" Á »~º  ""  Á »~ Operator Factorizations: QR and Singular Value 429

and so ²" ³ ~  , that is, " ker ² ³ . On the other hand, if % ~ '   " is in ker²³ then

  ~ º ²%³Á ²%³» ~LM   ²" ³Á    ²" ³ ~ bb    and so  b ~ for  and so ker ² ³‹º" ÁÃÁ" » .

We can achieve some “symmetry” here between  and i by setting #~²° ³" for each  , giving #  " ~   F € and "   i# ~   F €

The vectors #ÁÃÁ# are orthonormal, since if Á then

i  º## Á »~ º "  Á "  »~ º  ""  Á »~ º ""  Á »~  Á   

iž Hence, ²ÁÃÁ³## is an orthonormal basis for im ² ³~²³ker , which can be extended to an orthonormal basis 9 ~²#ÁÃÁ# ³ for = , the extension i ²# b ÁÃÁ#  ³ being an orthonormal basis for ker ² ³ . The vectors " are called the right singular vectors for  and the vectors # are called the left singular vectors for  .

Moreover, since

i #~~ "  #  i the vectors ## ÁÃÁ are eigenvectors for  with the same eigenvalues i  ~  as for . This completes the picture in Figure 17.1.

Theorem 17.3 Let <= and be finite-dimensional inner product spaces over d or sB and let ²<Á=³ have rank . Then there is an ordered orthonormal basis 8 ~²" b ÁÃÁ" Á" ÁÃÁ" ³ of < and an ordered orthonormal basis 9 ~²# b ÁÃÁ# Á# ÁÃÁ# ³ of = with the following properties: ži 1) 8 ~²"ÁÃÁ"³ is an orthonormal basis for ker ² ³ ~ im ² ³ 2) ²" b ÁÃÁ"  ³ is an orthonormal basis for ker ² ³ iž 3) 9 ~²#ÁÃÁ#³ is an orthonormal basis for ker ² ³ ~ im ² ³ i 4) ²# b ÁÃÁ#  ³ is an orthonormal basis for ker ² ³ i 5) The operators  and behave “symmetrically” on 8 and 9 , specifically, for  , 430 Advanced Linear Algebra

²" ³ ~ # i  ²# ³ ~ "

where  € are called the singular values of  . The vectors "# are called the right singular vectors for  and the vectors  are called the left singular vectors for  .

The matrix version of the previous discussion leads to the well known singular value decomposition of a matrix. The matrix of  under the ordered orthonormal bases 89~²"ÁÃÁ" ³ and ~²#ÁÃÁ#  ³ is

´' µ89Á ~ ~diag² ÁÁ ÃÁ ÁÁ ÃÁ³

Given any matrix (CÁ of rank , let ~( be multiplication by ( . Then

(~´;;( µ;;Á where and are the standard bases for <= and , respectively. By changing orthonormal bases to 89 and we get i (~´µ'((;;ÁÁÁÁ ~4´µ4 9;  89 ;8  ~78

where 7~49;Á  is unitary (orthogonal for -~s ) with  th column equal to

´#Á µ;8; and 8 ~ 4 is unitary (orthogonal for - ~s ) with  th column equal to ´" µ; .

As to uniqueness, if (~7' 8i is a singular value decomposition then ((~²78³7iiiiii'' 8 ~8 '' 8

i    and since ''~ÃÁÃÁ³diag² ÁÁ ÁÁ , it follows that  is an eigenvalue i of (( . Hence, since  € , we deduce that the singular values are uniquely determined by ( .

We state without proof the following uniqueness facts. For a proof, the reader may wish to consult reference [HJ1]. If  and if the eigenvalues  are distinct then 7 is uniquely determined up to multiplication on the right by a diagonal matrix of the form +~diag²' ÁÃÁ' ³ with (( '  ~ . If  then 8 is never uniquely determined. If ~~ then for any given 7 there is a unique 8 . Thus, we see that, in general, the singular value decomposition is not unique. The Moore–Penrose Generalized Inverse Singular values lead to a generalization of the inverse of an operator that applies to all linear transformations. The setup is the same as in Figure 17.1. Referring to that figure, we are prompted to define a linear transformation  b¢¦=< by

 b " for   # ~ F  € for Operator Factorizations: QR and Singular Value 431 for then

b ²³O~ºÁÃÁ»""  b ²³OºÁÃÁ»"" b  ~ and

b ²³O~ ºÁÃÁ»##  b ²³O ºÁÃÁ»## b  ~ Hence, if ~~ then bb ~ c . The transformation  is called the Moore–Penrose generalized inverse or Moore–Penrose pseudoinverse of  . We abbreviate this as MP inverse .

Note that the composition b is the identity on the largest possible subspace of < upon which any composition of the form  could be the identity, namely, the orthogonal complement of the kernel of  . A similar statement holds for the composition bb . Hence,  is as “close” to an inverse for  as is possible.

We have said that if  is invertible then bc~ . More is true: If is injective then bb~  and so  is a left inverse for  . Also, if  is surjective then  b is a right inverse for  . Hence the MP inverse b generalizes the one-sided inverses as well.

Here is a characterization of the MP inverse.

Theorem 17.4 Let B²<Á=³ . The MP inverse b of  is completely characterized by the following four properties: 1) b ~  2) bb~  b 3) b is Hermitian 4) b is Hermitian Proof. We leave it to the reader to show that  b does indeed satisfy conditions 1)–4) and prove only the uniqueness. Suppose that  and satisfy 1)–4) when substituted for  b . Then ~  ~² ³i ~  ii ~² ³ii ~ iiii ~² ³iii ~ ii ~  ~  432 Advanced Linear Algebra and ~  ~² ³i ~ ii ~²³ii  ~ iiii  ~²³ii  i ~ ii  ~  ~  which shows that ~ .

The MP inverse can also be defined for matrices. In particular, if (4Á ²-³ b then the matrix operator ( has an MP inverse ( . Since this is a linear  b transformation from -- to , it is just multiplication by a matrix ( ~ ) . This matrix ) is the MP inverse for (( and is denoted by b .

b Since ( ~~(()(b and  ) , the matrix version of Theorem 17.4 implies that (b is completely characterized by the four conditions

1) ((b ( ~ ( 2) (((~(bb b 3) ((b is Hermitian 4) ((b is Hermitian

Moreover, if i (~<' < is the singular value decomposition of the matrix ( then b Zi (~<<'  where ''Z is obtained from by replacing all nonzero entries by their multiplicative inverses. This follows from the characterization above and also from the fact that, for 

Zi Z c c <<''# ~<~  <~  " and for € Zi Z <'' <# ~< ~ Operator Factorizations: QR and Singular Value 433

Least Squares Approximation Let us now discuss the most important use of the MP inverse. Consider the system of linear equations (% ~ # where (4Á ²-³ . (As usual, -~ds or -~ .) Of course, this system has a solution if and only if #²³im ( . If the system has no solution, then it is of considerable practical importance to be able to solve the system (% ~V # where V#²# is the unique vector in im (³ that is closest to , as measured by the unitary (or Euclidean) distance. This problem is called the linear least squares problem. Any solution to the system (% ~V # is called a least squares solution to the system (% ~## . Put another way, a least squares solution to (% ~ is a vector %(%c for which ))# is minimized.

Suppose that $' and are least squares solutions to (%~# . Then ($~#~('V and so $c'ker ²(³ . (We will write ( for ( .) Thus, if $ is a particular least squares solution, then the set of all least squares solutions is $ bker ²(³ . Among all solutions, the most interesting is the solution of minimum norm. Note that if there is a least squares solution $²(³ that lies in ker ž , then for any ' ker ²(³ , we have )$b'~$b'‚$ ) )) )) )) and so $ will be the unique least squares solution of minimum norm.

Before proceeding, we remind the reader of our discussion related to the projection theorem (Theorem 9.12) to the effect that if : is a subspace of a finite-dimensional inner product space = , then the best approximation to a vector #= from within : is the unique vector VV #: for which #c#ž: .

Now we can see how the MP inverse comes into play.

Theorem 17.5 Let (4Á ²-³ . Among the least squares solutions to the system (% ~V # there is a unique solution of minimum norm, given by (#bb , where ( is the MP inverse of ( . Proof. A vector $($~# is a least squares solution if and only if V . Using the characterization of the best approximation V#$ , we see that is a solution to ($ ~V # if and only if 434 Advanced Linear Algebra

(cž$#im ² (³ Since im²(³ži ~ker ²( ³ this is equivalent to (²(i $# c ³~ or ((ii$# ~( This system of equations is called the normal equations for (% ~ # . Its solutions are precisely the least squares solutions to the system (% ~ # .

To see that $#~(b is a least squares solution, recall that, in the notation of Figure 17.1 # ((b # ~   F € and so (#i  (ib (²( # ³ ~ ~ ( i#  F € 

b and since 9 ~²#ÁÃÁ# ³ is a basis for = we conclude that ( # satisfies the normal equations.

Finally, since (#bžker ²(³ , we deduce by the preceding remarks that (# b is the unique least squares solution of minimum norm. Exercises

QR-Factorization 1. Suppose that  is unitary and upper triangular with respect to an orthonormal basis 8 and that the coefficient of "" in is positive. Show that  must be the identity. 2. Assume that  is a nonsingular operator on a finite-dimensional inner product space. Use the Gram–Schmidt process to obtain the QR- factorization of  . 3. Prove that for reflections, /~/"# if and only if " is a scalar multiple of # .  4. For any nonzero #- , show that the reflection /# is given by ##i /~0c# ))#  5. Use the QR factorization to show that any matrix that is similar to an upper triangular matrix is also similar to an upper triangular matrix via a unitary (orthogonal) matrix. Operator Factorizations: QR and Singular Value 435

6. Let :Á²=³B and suppose that :~: . Let  ÁÃÁ be the eigenvalues for :ÁÃÁ and  be the eigenvalues for  . Show that the eigenvalues of :b are

bÁÃÁb  

where ² ÁÃÁ ³ is a permutation of ²ÁÃÁ³ . Hence, for commuting operators, ²: b ³ ‹ ²:³ b ² ³

7. Let :Á ²=³B be commuting operators. Let  ÁÃÁ be the eigenvalues for :ÁÃÁ and  be the eigenvalues for  . Using the previous exercise, show that if all of the sums b are nonzero, then :b is invertible. 8. Let 1 be the matrix

vyÄ   x{ÅÇ   1~x{ ÇÇÅ wzÄ

that has  's on the diagonal that moves up from left to right and 's elsewhere. Find 11c and i . Compare 1(( with . Compare (1( with . Compare 1(1i with ( . Show that any upper triangular matrix is unitarily equivalent to a lower triangular matrix. 9. If  ²=³B8 and ~²ÁÃÁ³"" is a basis for which

""º ÁÃÁ "  »

then find a basis 9 ~²%% ÁÃÁ ³ for which

%%º ÁÃÁ %  » 10. (Cholsky decomposition ) We have seen that a linear operator  is positive if and only if it has the form ~ i for some operator  . Using the QR- factorization of  , prove the following result, known as the Cholsky decomposition. A linear operator B²=³ is positive if and only if it has the form ~ i where  is upper triangularizable. Moreover, if  is invertible then  can be chosen with positive eigenvalues, in which case the factorization is unique. Singular Values 11. Let B²<³ . Show that the singular values of i are the same as those of . 12. Find the singular values and the singular value decomposition of the matrix 436 Advanced Linear Algebra

 (~ >? 

Find (b. 13. Find the singular values and the singular value decomposition of the matrix  (~ >?

Find (((((bii . Hint : is it better to work with or ? ! 14. Let ?~²%%Ä%³  be a column matrix over d . Find a singular value decomposition of ? . 15. Let (  4Á ²-³ and let )  4 bÁb ²-³ be the square matrix ( )~>?i (block Show that, counting multiplicity, the nonzero eigenvalues of ) are precisely the singular values of ( together with their negatives. Hint : Let i (~<' < be a singular–value decomposition of ( and try factoring ) into a product <:

ii < ' < )~>?>?>?i < ' < 

What are the eigenvalues of the middle factor on the right? (Try bb and bc .) 16. Use the results of the previous exercise to show that a matrix i! (4Á ²-³, its adjoint ( , its transpose ( and its conjugate ( all have the same singular values. Show also that if << and Z are unitary then ( and <(

°  ))(~- 89 Á Á

   Show that ))(~-  is the sum of the squares of the singular values of (. Chapter 18 The Umbral Calculus

In this chapter, we give a brief introduction to an area called the umbral calculus. This is a linear-algebraic theory used to study certain types of polynomial functions that play an important role in applied mathematics. We give only a brief introduction to the subject, emphasizing the algebraic aspects rather than the applications. For more on the umbral calculus, may we suggest The Umbral Calculus, by Roman ´µ 1984 ?

One bit of notation: The lower factorial numbers are defined by

²³ ~²c³Ä²cb³

Formal Power Series We begin with a few remarks concerning formal power series. Let < denote the algebra of formal power series in the variable ! , with complex coefficients. Thus, < is the set of all formal sums of the form B  ²!³~  ! (18.1) ~ where  d (the complex numbers). Addition and multiplication are purely formal

BBB    ! b  ! ~ ²  b  ³! ~ ~ ~ and

BB B   45454! ! ~   c 5 ! ~ ~ ~ ~

The order ²³ of  is the smallest exponent of ! that appears with a nonzero coefficient. The order of the zero series is defined to be bB . Note that a series 438 Advanced Linear Algebra

²³~ has a multiplicative inverse, denoted by c , if and only if . We leave it to the reader to show that ²³ ~ ²³ b ²³ and ² b ³ ‚min ¸²³Á ²³¹

If ²³¦B¦ is a sequence in < with as then for any series B  ²!³ ~  ! ~

 we may substitute ! for to get the series B ²!³ ~   ²!³ ~ which is well-defined since the coefficient of each power of ! is a finite sum. In particular, if ²³ ‚  then ² ³ ¦ B and so the composition B  ² k ³²!³ ~ ²²!³³ ~   ²!³ ~ is well-defined. It is easy to see that ² k ³ ~ ²³²³ .

If ²³ ~  then  has a compositional inverse, denoted by  and satisfying ² k ³²!³ ~ ² k ³²!³ ~ !

A series  with ²³ ~  is called a delta series .

The sequence of powers  of a delta series forms a pseudobasis for < , in the sense that for any < , there exists a unique sequence of constants  for which

B  ²!³ ~   ²!³ ~ Finally, we note that the formal derivative of the series (18.1) is given by

B Zc C²!³~²!³~! ! ~

The operator C! is a derivation, that is,

C!! ²³~ C ²³bC ! ²³ The Umbral Calculus 439

The Umbral Algebra Let Fd~´%µ denote the algebra of polynomials in a single variable % over the complex field. One of the starting points of the umbral calculus is the fact that any formal power series in < can play three different roles: as a formal power series, as a linear functional on FF and as a linear operator on . Let us first explore the connection between formal power series and linear functionals.

Let FFFi denote the vector space of all linear functionals on . Note that i is the algebraic dual space of F , as defined in Chapter 2. It will be convenient to denote the action of 3FFi on ²%³ by º3 “ ²%³» (This is the “bra-ket” notation of Paul Dirac.) The vector space operations on Fi then take the form º3 b 4 “ ²%³» ~ º3 “ ²%³» b º4 “ ²%³» and º 3 “ ²%³» ~ º3 “ ²%³»Á  d Note also that since any linear functional on F is uniquely determined by its values on a basis for FFÁ3 the functional i is uniquely determined by the values º3 “ % » for  ‚ .

Now, any formal series in < can be written in the form B  ²!³~  ! ! ~  and we can use this to define a linear functional ²!³ by setting  º²!³ “ % » ~  for  ‚  . In other words, the linear functional ²!³ is defined by B º²!³ “ % » ²!³~ ! ! ~  where the expression ²!³ on the left is just a formal power series. Note in particular that

 º! “ % » ~ !Á where Á is the Kronecker delta function. This implies that º!²³ “ ²%³» ~  ²³ and so !! is the functional “ th derivative at .” Also, is evaluation at . 440 Advanced Linear Algebra

As it happens, any linear functional 3 on F has the form ²!³ . To see this, we simply note that if B º3 “ % » ²!³~ ! 3 ! ~  then  º3 ²!³ “ % » ~ º3 “ % » for all  ‚  and so as linear functionals, 3 ~ 3 ²!³ .

i Thus, we can define a map F¢¦ < by  ²3³~²!³3 .

i Theorem 18.1 The map F¢¦ < defined by  ²3³~²!³3 is a vector space isomorphism from F

Let us consider an example.

i Example 18.1 For dF , the evaluation functional   is defined by

º “ ²%³» ~ ²³ The Umbral Calculus 441

 In particular, º“%»~ and so the formal power series representation for this functional is BBº“%»   ²!³~ !~! !~  !! ~ ~ which is the exponential series. If ! is evaluation at then ! ! ~ ²b³! and so the product of evaluation at  and evaluation at is evaluation at b.

When we are thinking of a delta series < as a linear functional, we refer to it as a delta functional . Similarly, an invertible series < is referred to as an invertible functional. Here are some simple consequences of the development so far.

Theorem 18.2 1) For any < , B º²!³ “ % » ²!³~ ! ~ [ 2) For any F , º! “ ²%³» ²%³ ~ % ‚ [

3) For any Á  < ,   º²!³²!³ “ %c » ~ 45 º²!³ “ % »º²!³ “ % » ~  4) ²²!³³€deg ²%³¬ º²!³“ ²%³»~  5) If ² ³ ~  for all  ‚  then B LcM  ²!³ ²%³ ~   º  ²!³ “ ²%³» ~ ‚ where the sum on the right is a finite one. 6) If ² ³ ~  for all  ‚  then

º ²!³ “ ²%³» ~ º ²!³ “ ²%³» for all  ‚  ¬ ²%³ ~ ²%³

7) If deg ²%³~ for all ‚ then

º²!³ “  ²%³» ~ º²!³ “  ²%³» for all  ‚  ¬ ²!³ ~ ²!³ 442 Advanced Linear Algebra

Proof. We prove only part 3). Let BB  ²!³ ~ ! and ²!³ ~ ! !! ~ ~ Then B ²!³²!³ ~4455   ! ! c ~ ~ and applying both sides of this ²% as linear functionals) to  gives    º²!³²!³ “ % » ~45 c  ~ 

 The result now follows from the fact that part 1) implies  ~ º²!³ “ % » and c c ~ º²!³ “ % ».

We can now present our first “umbral” result.

Theorem 18.3 For any ²!³

º²!³ “ %²%³» ~ ºC! ²!³ “ ²%³» Proof. By linearity, we need only establish this for ²%³ ~ % . But, if B  ²!³~  ! ! ~  then B  ºC ²!³ “ %c » ~LcM  ! % ! ! ~ ² c ³ B  ~    ! cÁ ~ ² c ³ ~b b ~º²!³“% » Let us consider a few examples of important linear functionals and their power series representations.

Example 18.2 1) We have already encountered the evaluation functional ! , satisfying º! “ ²%³» ~ ²³ The Umbral Calculus 443

2) The forward difference functional is the delta functional c! , satisfying º! c  “ ²%³» ~ ²³ c ²³

3) The Abel functional is the delta functional ! e! , satisfying º!e! “ ²%³» ~  Z ²³ 4) The invertible functional ² c !³c satisfies B º² c !³c “ ²%³» ~ ²"³ c" "  as can be seen by setting ²%³ ~ % and expanding the expression ² c !³c. 5) To determine the linear functional  satisfying  º²!³ “ ²%³» ~ ²"³ "  we observe that BBº²!³ “ %b! »  ! c  ²!³~ ! ~ ! ~ ! ~²b³[! ~ The inverse !°²! c ³ of this functional is associated with the , which play a very important role in mathematics and its applications. In fact, the numbers ! )~LcM %  c! are known as the Bernoulli numbers . Formal Power Series as Linear Operators We now turn to the connection between formal power series and linear operators on FF . Let us denote the ! th derivative operator on by  . Thus, !²%³~²³ ²%³ We can then extend this to formal series in ! B  ²!³~  ! (18.2) ! ~  444 Advanced Linear Algebra by defining the linear operator ²!³¢FF ¦ by B  ²!³²%³ ~ ´!²³ ²%³µ ~  ²%³ !! ~ ‚ the latter sum being a finite one. Note in particular that

 c ²!³% ~45  % (18.3) ~  With this definition, we see that each formal power series < plays three roles in the umbral calculus, namely, as a formal power series, as a linear functional and as a linear operator. The two notations º²!³ “ ²%³» and ²!³²%³ will make it clear whether we are thinking of  as a functional or as an operator.

It is important to note that ~ in < if and only if ~ as linear functionals, which holds if and only if ~ as linear operators. It is also worth noting that ´²!³²!³µ²%³ ~ ²!³´²!³²%³µ and so we may write ²!³²!³²%³ without ambiguity. In addition, ²!³²!³²%³ ~ ²!³²!³²%³ for all Á 

When we are thinking of a delta series  as an operator, we call it a . The following theorem describes the key relationship between linear functionals and linear operators of the form ²!³ .

Theorem 18.4 If Á  < then º²!³²!³ “ ²%³» ~ º²!³ “ ²!³²%³» for all polynomials ²%³  F . Proof. If  has the form (18.2) then by (18.3),   c  º! “( !³%»~Lc ! 45  % M ~ ~º²!³“% » ² 18.4) ~  By linearity, this holds for % replaced by any polynomial ²%³ . Hence, applying this to the product  gives º²!³²!³ “ ²%³» ~ º! “ ²!³²!³²%³» ~ º! “ ²!³´²!³²%³µ» ~ º²!³ “ ²!³²%³» Equation (18.4) shows that applying the linear functional !³ ( is equivalent to applying the operator ²!³ and then following by evaluation at % ~  . The Umbral Calculus 445

Here are the operator versions of the functionals in Example 18.2.

Example 18.3 1) The operator ! satisfies B %~!  !%~  45 %  c ~²%b³  ! ~ ~ and so ²%³~²%b³!

for all F . Thus ! is a translation operator . 2) The forward difference operator is the delta operator c! , where ²! c ) ²%³ ~ ²% b ³ c ²³

3) The Abel operator is the delta operator ! e! , where !²%³~²%b³e! Z 4) The invertible operator ² c !³c satisfies B ² c !³c ²%³ ~ ²% b "³ c"du  5) The operator ²! c  ) °! is easily seen to satisfy c! %b ²%³ ~ ²"³ " ! % We have seen that all linear functionals on F< have the form ²!³ , for   . However, not all linear operators on F have this form. To see this, observe that deg´²!³²%³µ  deg ²%³ but the linear operator F¢ ¦ F defined by  ²²%³³ ~ %²%³ does not have this property.

Let us characterize the linear operators of the form ²!³ . First, we need a lemma.

Lemma 18.5 If ; is a linear operator on 7 and ; ²!³ ~ ²!³; for some delta series ²!³ then deg ²; ²%³³  deg ²²%³³ . Proof. For any ‚ deg²; % ³ c  ~ deg ²²!³; % ³ ~ deg ²; ²!³% ³ and so 446 Advanced Linear Algebra

deg²; % ³ ~ deg ²; ²!³% ³ b  Since deg²²!³% ³ ~  c  we have the basis for an induction. When  ~  we get deg²; ³ ~  . Assume that the result is true for  c  . Then deg²; % ³ ~ deg ²; ²!³% ³ b    c  b  ~ 

Theorem 18.6 The following are equivalent for a linear operator ;¢FF ¦ . 1) ; has the form ²!³ , that is, there exists an  < for which ; ~ ²!³ , as linear operators. 2) ;;!~!; commutes with the derivative operator, that is, . 3) ; commutes with any delta operator ²!³ , that is, ; ²!³ ~ ²!³; . 4) ;;~; commutes with any translation operator, that is, ! ! . Proof. It is clear that 1) implies 2). For the converse, let B º! “ ; % » ²!³ ~ ! ! ~  Then º²!³ “ % » ~ º! “ ; % » Now, since ;! commutes with , we have º!“;%»~º!“!;%»  ~º! “;!% » c ~²³º! “;% » c ~ ²³ º! “ ²!³% » ~º! “²!³%» and since this holds for all  and we get ;~²!³ . We leave the rest of the proof as an exercise. Sheffer Sequences We can now define the principal object of study in the umbral calculus. When referring to a sequence ²%³ in F , we shall always assume that deg ²%³~ for all ‚ .

Theorem 18.7 Let  be a delta series, let be an invertible series and consider the geometric sequence Á Á  Á  Á Ã in

 º²!³ ²!³ “ Á ²%³» ~ [ (18.5) for all Á  ‚  . Proof. The uniqueness follows from Theorem 18.2. For the existence, if we set

 Á²%³~  % ~ and

B  ²!³ ²!³ ~ Á ! ~ where £Á then (18.5) is B  [Á ~Lc  Á !  Á %  M ~ ~ B  ~º!%» Á Ác  ~ ~  ~[ Á Á ~ Taking ~ we get  ~Á Á For ~c we have

 ~ cÁc  Ác ² c ³[ b  cÁ  Á [ and using the fact that Á ~ ° Á we can solve this for  Ác . By successively taking ~ÁcÁcÁÃ we can solve the resulting equations for the coefficients Á of the sequence  ²%³ .

Definition The sequence ²%³ in (18.5) is called the Sheffer sequence for the ordered pair ²²!³Á ²!³³ . We shorten this by saying that  ²%³ is Sheffer for ²²!³Á ²!³³.

Two special types of Sheffer sequences deserve explicit mention.

Definition The Sheffer sequence for a pair of the form ²Á ²!³³ is called the associated sequence for ²!³ . The Sheffer sequence for a pair of the form ²²!³Á !³ is called the for ²!³ . 448 Advanced Linear Algebra

Note that the sequence  ²%³ is Sheffer for ²²!³Á ²!³³ if and only if  º²!³ ²!³ “ Á ²%³» ~ [ which is equivalent to

 º ²!³ “ ²!³ Á ²%³» ~ [ which, in turn, is equivalent to saying that the sequence  ²%³ ~ ²!³ ²%³ is the associated sequence for ²!³ .

Theorem 18.8 The sequence  ²%³ is Sheffer for ²²!³Á ²!³³ if and only if the sequence  ²%³ ~ ²!³ ²%³ is the associated sequence for ²!³ .

Before considering examples, we wish to describe several characterizations of Sheffer sequences. First, we require a key result.

Theorem 18.9 ( The expansion theorems ) Let  ²%³ be Sheffer for ²²!³Á ²!³³ . 1) For any < , B º²!³ “ ²%³» ²!³ ~  ²!³ ²!³ ! ~  2) For any F , º²!³ ²!³ “ ²%³ ²%³ ~ ²%³ !  ‚  Proof. Part 1) follows from Theorem 18.2, since BBº²!³ “ ²%³» º²!³ “ ²%³» LcM²!³ ²!³ ²%³ ~ ! !!Á ~ ~ ~ º²!³ “  ²%³» Part 2) follows in a similar way from Theorem 18.2.

We can now begin our characterization of Sheffer sequences, starting with the . The idea of a generating function is quite simple. If ²%³ is a sequence of polynomials, we may define a formal power series of the form B ²%³ ²!Á %³ ~  ! ! ~  This is referred to as the (exponential ) generating function for the sequence ²%³ . (The term exponential refers to the presence of  ! in this series. When this is not present, we have an ordinary generating function.³ Since the series is a formal one, knowing ²!Á %³ is equivalent (in theory, if not always in practice) The Umbral Calculus 449

to knowing the polynomials ²%³ . Moreover, a knowledge of the generating function of a sequence of polynomials can often lead to a deeper understanding of the sequence itself, that might not be otherwise easily accessible. For this reason, generating functions are studied quite extensively.

For the proofs of the following characterizations, we refer the reader to Roman ´µ1984 .

Theorem 18.10 () Generating function 1) The sequence ²%³ is the associated sequence for a delta series ²!³ if and only if B ²&³ ~&²!³  !  ! ~ 

where ²!³ is the compositional inverse of ²!³ . 2) The sequence  ²%³ is Sheffer for ²²!³Á ²!³³ if and only if B ²&³ ~&²!³  !  ! ²²!³³ ~ 

The sum on the right is called the generating function of ²%³ . Proof. Part 1) is a special case of part 2). For part 2), the expression above is equivalent to B ²&³ ~&!  ²!³  ! ²!³~  which is equivalent to B ²&³ &! ~  ²!³  ²!³ ! ~ 

But if  ²%³ is Sheffer for ²²!³Á ²!³³ then this is just the expansion theorem for &!. Conversely, this expression implies that B ²&³ ²&³ ~ º&! “ ²%³» ~  º²!³  ²!³ “ ²%³» !  ~ 

 and so º²!³ ²!³ “ Á ²%³» ~ [ , which says that ²%³ is Sheffer for ²Á ³.

We can now give a representation for Sheffer sequences.

Theorem 18.11 () Conjugate representation 450 Advanced Linear Algebra

1) A sequence ²%³ is the associated sequence for ²!³ if and only if    ²%³~  º²!³“%»% ~ [

2) A sequence  ²%³ is Sheffer for ²²!³Á ²!³³ if and only if   c    ²%³~ º²²!³³²!³“%»% ~ [

Proof. We need only prove part 2). We know that ²%³ is Sheffer for ²²!³Á ²!³³ if and only if B ²&³ ~&²!³  !  ! ²²!³³ ~  But this is equivalent to B ²&³ &²!³ “ %  ~LcM  !  %  ~ ²&³ NO!  ²²!³³ ~  Expanding the exponential on the left gives

BBº²²!³³c ²!³  “ %  » ²&³ & ~LcM ! % ~ ²&³ !  ~[ ~  Replacing &% by gives the result.

Sheffer sequences can also be characterized by means of linear operators.

Theorem 18.12 ² Operator characterization) 1) A sequence ²%³ is the associated sequence for ²!³ if and only if a) ²³~Á b) ²!³c ²%³ ~  ²%³ for  ‚  2) A sequence  ²%³ is Sheffer for ²²!³Á ²!³³ , for some invertible series ²!³ if and only if

²!³ c ²%³ ~  ²%³ for all ‚ . Proof. For part 1), if ²%³ is associated with ²!³ then !  ²³~º“²%³»~º²!³“²%³»~[ Á and The Umbral Calculus 451

b º²!³ “ ²!³ ²%³» ~ º²!³ “  ²%³» ~[Áb ~ ² c ³[cÁ  ~ º²!³ “ c ²%³» and since this holds for all ‚ we get 1b). Conversely, if 1a) and 1b) hold then

 º²!³ “  ²%³» ~ º! “ ²!³  ²%³» ~²³c ²³ ~²³cÁ ~[Á and so ²%³ is the associated sequence for ²!³ .

As for part 2), if  ²%³ is Sheffer for ²²!³Á ²!³³ then b º²!³²!³ “ ²!³  ²%³» ~ º²!³²!³ “ ²%³» ~[Áb ~ ² c ³[cÁ  ~ º²!³²!³ “ c ²%³» and so ²!³ c ²%³ ~  ²%³ , as desired. Conversely, suppose that

²!³ c ²%³ ~  ²%³ and let ²%³ be the associated sequence for ²!³ . Let ; be the invertible linear operator on = defined by

;  ²%³~ ²%³ Then

; ²!³ cc ²%³ ~ ; ²%³ ~  ²%³ ~ ²!³ ²%³ ~ ²!³;  ²%³ and so Theorem 18.5 implies that ; ~ ²!³ for some invertible series ²!³ . Then

 º²!³²!³ “  ²%³» ~ º²!³ “ ²!³ ²%³»  ~º! “²!³ ²%³» ~²³c ²³ ~²³cÁ ~[Á and so  ²%³ is Sheffer for ²²!³Á ²!³³ .

We next give a formula for the action of a linear operator ²!³ on a Sheffer sequence. 452 Advanced Linear Algebra

Theorem 18.13 Let  ²%³ be a Sheffer sequence for ²²!³Á ²!³³ and let  ²%³ be associated with ²!³ . Then for any ²!³ we have   ²!³ c ²%³ ~45 º²!³ “ ²%³» ²%³ ~  Proof. By the expansion theorem B º²!³ “ ²%³» ²!³ ~  ²!³ ²!³ ! ~  we have B º²!³ “ ²%³» ²!³ ²%³ ~  ²!³ ²!³ ²%³ ! ~  B º²!³ “ ²%³» ~²³²%³  ! c ~  which is the desired formula.

Theorem 18.14 1) ()The binomial identity A sequence ²%³ is the associated sequence for a delta series ²!³ if and only if it is of , that is, if and only if it satisfies the identity   c ²% b &³ ~45  ²&³ ²%³ ~  for all &d . 2) ()The Sheffer identity A sequence  ²%³ is Sheffer for ²²!³Á ²!³³ , for some invertible ²!³ if and only if   c ²% b &³ ~45  ²&³ ²%³ ~ 

for all &d , where  ²%³ is the associated sequence for ²!³ . Proof. To prove part 1), if ²%³ is an associated sequence then taking ²!³ ~ &! in Theorem 18.13 gives the binomial identity. Conversely, suppose that the sequence ²%³ is of binomial type. We will use the operator characterization to show that ²%³ is an associated sequence. Taking %~&~ we have for ~

 ²³ ~  ²³ ²³ and so ²³~ . Also, The Umbral Calculus 453

²³~²³²³b²³²³~²³ and so  ²³ ~  . Assuming that  ²³ ~  for  ~ Á Ã Á  c  we have

 ²³ ~  ²³ ²³ b  ²³ ²³ ~  ²³ and so Á ²³ ~  . Thus,  ²³ ~  .

Next, define a linear functional ²!³ by

º²!³ “ Á ²%³» ~ 

Since º²!³ “ » ~ º²!³ “  ²%³» ~  and º²!³ “  ²%³» ~  £  we deduce that ²!³ is a delta series. Now, the binomial identity gives  &!  º²!³ “  c ²%³» ~45  ²&³º²!³ “  ²%³» ~    ~45 cÁ ²&³ ~  ~c ²&³ and so

&! &! º “ ²!³c ²%³» ~ º “  ²%³» and since this holds for all & , we get ²!³c ²%³ ~  ²%³ . Thus,  ²%³ is the associated sequence for ²!³ .

&! For part 2), if ²%³ is a Sheffer sequence then taking ²!³~ in Theorem 18.13 gives the Sheffer identity. Conversely, suppose that the Sheffer identity holds, where ²%³ is the associated sequence for ²!³ . It suffices to show that ²!³  ²%³ ~  ²%³ for some invertible ²!³ . Define a linear operator ; by

;  ²%³~ ²%³ Then

&! &!  ;  ²%³ ~   ²%³ ~  ²% b &³ and by the Sheffer identity

 &!  ;   ²%³ ~45   ²&³; c ²%³ ~ 45   ²&³ c ²%³ ~ ~ and the two are equal by part 1). Hence, ; commutes with &! and is therefore of the form ²!³ , as desired. 454 Advanced Linear Algebra

Examples of Sheffer Sequences We can now give some examples of Sheffer sequences. While it is often a relatively straightforward matter to verify that a given sequence is Sheffer for a given pair ²²!³Á ²!³³ , it is quite another matter to find the Sheffer sequence for a given pair. The umbral calculus provides two formulas for this purpose, one of which is direct, but requires the usually very difficult computation of the series c ²²!³°!³. The other is a recurrence relation that expresses each  ²%³ in terms of previous terms in the Sheffer sequence. Unfortunately, space does not permit us to discuss these formulas in detail. However, we will discuss the recurrence formula for associated sequences later in this chapter.

 Example 18.4 The sequence ²%³~% is the associated sequence for the delta series ²!³~ ! . The generating function for this sequence is B & ~&! !  ~ [ and the binomial identity is the well known binomial formula   ²% b &³c ~45 % & ~  Example 18.5 The lower factorial polynomials

²%³ ~%²%c³Ä²%cb³ form the associated sequence for the forward difference functional ²!³~ ! c discussed in Example 18.2. To see this, we simply compute, using Theorem 18.12. Since ²³Á is defined to be  , we have ²³ ~  . Also, ! ² c ³²%³ ~ ²% b ³ c ²%³ ~ ´²% b ³%²% c ³Ä²% c  b ³µ c ´%²% c ³Ä²% c  b ³µ ~%²%c³Ä²%cb³´²%b³c²%cb³µ ~%²%c³Ä²%cb³ ~ ²%³c The generating function for the lower factorial polynomials is B ²&³ ~&²b!³log   !  ~ [ The Umbral Calculus 455 which can be rewritten in the more familiar form B & ² b !³& ~45 ! ~  Of course, this is a formal identity, so there is no need to make any restrictions on ! . The binomial identity in this case is   ²% b &³c ~45 ²%³ ²&³ ~  which can also be written in the form %b& % & 45~  4545 c~ This is known as the Vandermonde convolution formula .

Example 18.6 The Abel polynomials

c (²%³~%²%c³ form the associated sequence for the Abel functional ²!³~ !e! also discussed in Example 18.2. We leave verification of this to the reader. The generating function for the Abel polynomials is B &²& c ³c ~&²!³ !  ~ [ Taking the formal derivative of this with respect to & gives B ²& c ³²& c ³c ²!³&²!³ ~ !  ~ [ which, for &~ , gives a formula for the compositional inverse of the series ²!³~ !!, B ²c³c  ²!³~ ! ~ ²c³[

Example 18.7 The famous Hermite polynomials /²%³ form the Appell sequence for the invertible functional

 ²!³ ~ !° 456 Advanced Linear Algebra

We ask the reader to show that ²%³ is the Appell sequence for ²!³ if and only c  if ²%³~²!³% . Using this fact, we get

c! ° ²³  c /²%³~ % ~ ²c³ % ‚ [ The generating function for the Hermite polynomials is

B  /²&³ ~&!c! °  !  ~ [ and the Sheffer identity is

  c / ²% b &³ ~45 / ²%³& ~  We should remark that the Hermite polynomials, as defined in the literature, often differ from our definition by a multiplicative constant.

²³ Example 18.8 The well known and important 3²%³ of order  form the Sheffer sequence for the pair ! ²!³ ~ ² c !³cc Á ²!³ ~ !c It is possible to show ²³ although we will not do so here that  ²³ [ b   3²%³~  45 ²c%³ ~ [  c  The generating function of the Laguerre polynomials is

 B 3²%³²³ &!°²!c³   b ~ ! ² c !³~ [ As with the Hermite polynomials, some definitions of the Laguerre polynomials differ by a multiplicative constant.

We presume that the few examples we have given here indicate that the umbral calculus applies to a significant range of important polynomial sequences. In Roman ´µ 1984 , we discuss approximately 30 different sequences of polynomials that are (or are closely related to) Sheffer sequences. Umbral Operators and Umbral Shifts We have now established the basic framework of the umbral calculus. As we have seen, the umbral algebra plays three roles: as the algebra of formal power series in a single variable, as the algebra of all linear functionals on F and as the The Umbral Calculus 457 algebra of all linear operators on F that commute with the derivative operator. Moreover, since < is an algebra, we can consider geometric sequences Á Á  Á  Á Ã in < , where ²³ ~  and ²³ ~  . We have seen by example that the orthogonality conditions

 º²!³ ²!³ “ Á ²%³» ~ [ define important families of polynomial sequences.

While the machinery that we have developed so far does unify a number of topics from the classical study of polynomial sequences (for example, special cases of the expansion theorem include Taylor's expansion, the Euler- MacLaurin formula and Boole's summation formula), it does not provide much new insight into their study. Our plan now is to take a brief look at some of the deeper results in the umbral calculus, which center around the interplay between operators on F and their adjoints, which are operators on the umbral algebra

We begin by defining two important operators on F associated with each Sheffer sequence.

Definition Let  ²%³ be Sheffer for ²²!³Á ²!³³ . The linear operator FÁ ¢¦ F defined by  Á²% ³ ~  ²%³ is called the Sheffer operator for the pair ²²!³Á ²!³³ , or for the sequence  ²%³. If  ²%³ is the associated sequence for ²!³ , the Sheffer operator  ²% ³ ~  ²%³ is called the umbral operator for ²!³ , or for  ²%³ .

Definition Let  ²%³ be Sheffer for ²²!³Á ²!³³ . The linear operator FÁ ¢¦ F defined by

Á´  ²%³µ ~ b ²%³ is called the Sheffer shift for the pair ²²!³Á ²!³³ , or for the sequence  ²%³ . If ²%³ is the associated sequence for ²!³ , the Sheffer operator

´ ²%³µ ~  b ²%³ is called the umbral shift for ²!³ , or for  ²%³ . 458 Advanced Linear Algebra

It is clear that each Sheffer sequence uniquely determines a Sheffer operator and vice versa. Hence, knowing the Sheffer operator of a sequence is equivalent to knowing the sequence. Continuous Operators on the Umbral Algebra It is clearly desirable that a linear operator ; on the umbral algebra < pass under infinite sums, that is, that

BB ;45 ²!³~ ;´²!³µ   (18.6) ~ ~ whenever the sum on the left is defined, which is precisely when ² ²!³³ ¦ B as ¦B . Not all operators on < have this property, which leads to the following definition.

Definition A linear operator ; on the umbral algebra < is continuous if it satisfies ²18.6).

The term continuous can be justified by defining a topology on < . However, since no additional topological concepts will be needed, we will not do so here. Note that in order for (18.6) to make sense, we must have ²;´ ²!³µ³ ¦ B . It turns out that this condition is also sufficient.

Theorem 18.15 A linear operator ; on < is continuous if and only if

² ³ ¦ B ¬ ²;² ³³ ¦ B (18.7)

Proof. The necessity is clear. Suppose that (18.7) holds and that ² ³ ¦ B . For any ‚ , we have B  LcMLcMLcM; ²!³% ~ ; ²!³%  b ; ²!³%  (18.8) ~ ~ € Since

²!³¦B89  € (18.7) implies that we may choose  large enough so that

45 ;   ²!³ €  € and

²;´ ²!³µ³ €  for  €  The Umbral Calculus 459

Hence, (18.8) gives

B  LcMLcM;²!³%~;²!³%  ~ ~   ~;´²!³µ%LcM  ~ B  ~;´²!³µ%LcM  ~ which implies the desired result. Operator Adjoints If F¢¦ F is a linear operator on F then its (operator) adjoint d is an operator on F

Let us recall the basic properties of the adjoint from Chapter 3.

Theorem 18.16 For Á²³ BF, 1) ²b³~ddd  b  2) ²  ³dd ~ for any  d 3) ²³~ddd   4) ²³~²³cdd c for any invertible BF ²³

Thus, the map BF¢²³¦²³ B< that sends F ¢ ¦ F to its adjoint d ¢ < ¦ < is a linear transformation from BF²³ to B< ²³ . Moreover, since d ~ implies that º²!³“

Theorem 18.17 A linear operator ;B< ² ³ is the adjoint of a linear operator BBF²³ if and only if ; is continuous. d Proof. First, suppose that ; ~BF for some  ² ³ and let ² ²!³³ ¦ B . If ‚ then for all  we have

d  º  ²!³ “ % » ~ º ²!³ “ % »

 and so it is only necessary to take  large enough so that ² ²!³³ €deg  ²% ³ for all  , whence 460 Advanced Linear Algebra

d º²!³“%»~ 

ddd for all      and so ²  ²!³³ €  . Thus, ²  ²!³³ ¦ B and is continuous.

For the converse, assume that ;; is continuous. If did have the form  d then º;!“%»~º!“%»~º!“%» d  and since º! “ % » %~ % ‚ [ we are prompted to define  by º; ! “ % » %~ % ‚ [

This makes sense since ²;! ³ ¦ B as  ¦ B and so the sum on the right is a finite sum. Then º; ! “ % » ºd ! “%  »~º!  “ %  »~ º!  “%  »~º;!  “%  » ‚ [ which implies that ;!d ~ ! for all  ‚ . Finally, since ; and d are both continuous, we have ;~ d . Umbral Operators and Automorphisms of the Umbral Algebra Figure 18.1 shows the map  , which is an isomorphism from the vector space BF²³ onto the space of all continuous linear operators on < . We are interested in determining the images under this isomorphism of the set of umbral operators and the set of umbral shifts, as pictured in Figure 18.1. The Umbral Calculus 461

Figure 18.1

Let us begin with umbral operators. Suppose that  is the umbral operator for the associated sequence ²%³ , with delta series ²!³< . Then d  º ²!³ “ % » ~ º²!³ “Á % » ~ º²!³ “  ²%³» ~ [  ~ º! “ % »

d d for all  and . Hence,  ²!³~! and the continuity of implies that

d   !~²!³ More generally, for any ²!³  < ,

d  ²!³ ~ ²²!³³ (18.9)

d In words,  is composition by ²!³ .

d From (18.9), we deduce that  is a vector space isomorphism and that

ddd ´²!³²!³µ ~ ²²!³³²²!³³ ~ ²!³ ²!³

d Hence, < is an automorphism of the umbral algebra . It is a pleasant fact that this characterizes umbral operators. The first step in the proof of this is the following, whose proof is left as an exercise.

Theorem 18.18 If ;; is an automorphism of the umbral algebra then preserves order, that is, ²; ²!³³ ~ ²²!³³ . In particular, ; is continuous.

Theorem 18.19 A linear operator F on is an umbral operator if and only if its adjoint is an automorphism of the umbral algebra < . Moreover, if  is an umbral operator then 462 Advanced Linear Algebra

d  ²!³ ~ ²²!³³

d for all ²!³ < . In particular,  ²!³ ~ ! . Proof. We have already shown that the adjoint of  is an automorphism satisfying (18.9). For the converse, suppose that 

 d    [Á ~º! “% »~º ²!³ “% »~º²!³ “  % » which shows that % is the associated sequence for ²!³ and hence that is an umbral operator.

Theorem 18.19 allows us to fill in one of the boxes on the right side of Figure 18.1. Let us see how we might use Theorem 18.19 to advantage in the study of associated sequences.

We have seen that the isomorphism ª d maps the set K of umbral operators on F< onto the set aut²³ of automorphisms of

kk~!  ~~  

c we have ~ .

Thus, the set K of umbral operators is a group under composition with

k~  k and

c  ~  Let us see how this plays out with respect to associated sequences. If the The Umbral Calculus 463 associated sequence for ²!³ is   ²%³~Á  % ~

 then ¢ % ¦  ²%³ and so k ~ k is the umbral operator for the associated sequence

  ² k  ³% ~     ²%³ ~  Á   % ~  Á   ²%³ ~ ~ This sequence, denoted by

 Á ² ²%³³ ~   ²%³ (18.10) ~ is called the umbral composition of  ²%³ with  ²%³ . The umbral operator c  ~ ²%³~ % is the umbral operator for the associated sequence Á ' where

c   %~ ²%³ and so

  % ~ Á   ²%³ ~ Let us summarize.

Theorem 18.20 1) The set KF of umbral operators on is a group under composition, with

c k~  kand  ~  2) The set of associated sequences forms a group under umbral composition

 Á ² ²%³³ ~   ²%³ ~

In particular, the umbral composition  ² ²%³³ is the associated sequence for the composition k , that is  k¢% ¦  ² ²%³³

 The identity is the sequence % and the inverse of  ²%³ is the associated sequence for the compositional inverse ²!³ . 464 Advanced Linear Algebra

3) Let K ²!³ and < . Then as operators

c ²!³ ~ ²!³

4) Let K ²!³ and < . Then

²²!³³ ~ ²!³ Proof. We prove 3) as follows. For any ²!³ < and ²%³  7 º²!³ “dd ²!³²%³» ~ º´ ²!³µ²!³ “ ²%³» ~ ºdcd ´²!³² ³ ²!³µ “ ²%³» ~ º²!³²c ³ d ²!³ “ ²%³» ~ º²c ³ d ²!³ “ ²!³ ²%³» ~ º²!³ “c ²!³ ²%³» which gives the desired result. Part 4) follows immediately from part 3) since  is composition by  . Sheffer Operators

If Á ²%³ is Sheffer for ²Á ³ then the linear operator  defined by  Á²% ³ ~  ²%³ is called a Sheffer operator . Sheffer operators are closely related to umbral operators, since if ²%³ is associated with ²!³ then c c   ²%³ ~  ²!³ ²%³ ~  ²!³ % and so

c Á~  ²!³  It follows that the Sheffer operators form a group with composition

c c Ák Á ~  ²!³    ²!³   c c ~  ²!³ ²²!³³ c ~ ´²!³²²!³³µ k ~ h²k³Ák and inverse

c Á ~ ²³Ác From this, we deduce that the umbral composition of Sheffer sequences is a Sheffer sequence. In particular, if  ²%³ is Sheffer for ²Á ³ and  !²%³~Á' ! % is Sheffer for ²Á³ then The Umbral Calculus 465

  Ák²%³~! Á Á  Á % ~  ~! Á ²%³ ~ ~! ² ²%³³ is Sheffer for ² h ² k ³Á  k ³ . Umbral Shifts and Derivations of the Umbral Algebra We have seen that an operator on F is an umbral operator if and only if its adjoint is an automorphism of < BF . Now suppose that  ²³ is the umbral shift for the associated sequence ²%³ , associated with the delta series ²!³ <. Then

d  º  ²!³ “  ²%³» ~ º²!³ “  ²%³»  ~ º²!³ “ b ²%³» ~²b1) [bÁ ~ ² c ³[Ác c ~ º²!³ “  ²%³» and so

d c  ²!³ ~ ²!³ (18.11) This implies that

dd d ´²!³ ²!³ µ ~ ´²!³ µ²!³ b ²!³  ´²!³ µ (18.12) and further, by continuity, that dd d ´²!³²!³µ ~ ´ ²!³µ²!³ b ²!³´  ²!³µ (18.13) Let us pause for a definition.

Definition Let 77 be an algebra. A linear operator C on is a derivation if C²³ ~ ²C³ b Cb for all Á   7 .

Thus, we have shown that the adjoint of an umbral shift is a derivation of the d umbral algebra < . Moreover, the expansion theorem and (18.11) show that  is surjective. This characterizes umbral shifts. First we need a preliminary result on surjective derivations. 466 Advanced Linear Algebra

Theorem 18.21 Let C be a surjective derivation on the umbral algebra < . Then Cc ~  for any constant  < and ²C f ²!³³ ~ ²²!³³ c  , if ²²!³³ ‚  . In particular, C is continuous. Proof. We begin by noting that C ~ C ~ C b C ~ C and so C ~ C ~  for all constants  < . Since C is surjective, there must exist an ²!³  < for which C²!³ ~ 

Writing ²!³ ~  b ! ²!³ , we have

 ~ C´ b ! ²!³µ ~ ²C!³  ²!³ b !C  ²!³

 which implies that ²C!³ ~  . Finally, if ²²!³³ ~  ‚  then ²!³ ~ !  ²!³ , where ² ²!³³ ~  and so c ´C²!³µ ~ ´C!  ²!³µ ~ ´! C²!³ b !  ²!³C!µ ~  c  Theorem 18.22 A linear operator F on is an umbral shift if and only if its adjoint is a surjective derivation of the umbral algebra < . Moreover, if  is an d umbral shift then  ~C is derivation with respect to ²!³ , that is,

d c  ²!³ ~ ²!³

d for all  ‚  . In particular,  ²!³ ~  . d Proof. We have already seen that  is derivation with respect to ²!³ . For the converse, suppose that d is a surjective derivation. Theorem 18.21 implies that d there is a delta functional ²!³ such that ²!³~  . If  ²%³ is the associated sequence for ²!³ then

d º²!³ “  ²%³» ~ º ²!³ “  ²%³» c d ~ º²!³ ²!³ “  ²%³» c ~ º²!³ “  ²%³» ~²b1) [bÁ  ~ º²!³ “ b ²%³»

Hence, ²%³~b ²%³ , that is, ~  is the umbral shift for ²%³  .

We have seen that the fact that the set of all automorphisms on < is a group under composition shows that the set of all associated sequences is a group under umbral composition. The set of all surjective derivations on < does not form a group. However, we do have the chain rule for derivations[ The Umbral Calculus 467

Theorem 18.23 ( The chain rule ) Let CC and be surjective derivations on < . Then

C ~ ²C ²!³³C  Proof. This follows from

c  C ²!³ ~ ²!³ C ²!³ ~ ²C ²!³³C ²!³ and so continuity implies the result.

The chain rule leads to the following umbral result.

Theorem 18.24 If  and are umbral shifts then

~ k C ²!³ Proof. Taking adjoints in the chain rule gives d ~ k²C  ²!³³ ~  kC ²!³

c We leave it as an exercise to show that C ²!³ ~ ´C ²!³µ . Now, by taking b ²!³ ~ ! in Theorem 18.24 and observing that !! % ~ % and so is multiplication by % , we get

c Z c ~%C!~%´C²!³µ ! ~%´²!³µ

Applying this to the associated sequence ²%³ for ²!³ gives the following important recurrence relation for ²%³ .

Theorem 18.25 ( The recurrence formula ) Let ²%³ be the associated sequence for ²!³ . Then Zc 1) b ²%³ ~ %´ ²!³µ   ²%³ Z 2) b ²%³ ~ %  ´²!³µ % Proof. The first part is proved. As to the second, using Theorem 18.20 we have

Zc b ²%³ ~ %´ ²!³µ   ²%³ Zc ~ %´ ²!³µ % Zc ~ % ´ ²²!³³µ % Z ~ % ´²!³µ % Example 18.9 The recurrence relation can be used to find the associated sequence for the forward difference functional ²!³ ~ !Z! c  . Since  ²!³ ~  , the recurrence relation is

c! b ²%³ ~ %   ²%³ ~ %  ²% c 1) 468 Advanced Linear Algebra

Using the fact that ²%³~ , we have

 ²%³ ~ %Á  ²%³ ~ %²% c ³Á 3 ²%³ ~ %²% c ³²% c ³ and so on, leading easily to the lower factorial polynomials

 ²%³ ~ %²% c ³Ä²% c  b ³ ~ ²%³ Example 18.10 Consider the delta functional ²!³ ~log ² b !³

Since ²!³~ ! c is the forward difference functional, Theorem 18.20 implies that the associated sequence ²%³ for ²!³ is the inverse, under umbral composition, of the lower factorial polynomials. Thus, if we write

  ²%³ ~ :²Á ³% ~ then

  % ~ :²Á ³²%³ ~ The coefficients :²Á³ in this equation are known as the Stirling numbers of the second kind and have great combinatorial significance. In fact, :²Á³ is the number of partitions of a set of size  into  blocks. The polynomials  ²%³ are called the exponential polynomials .

The recurrence relation for the exponential polynomials is Z b²%³ ~ %² b !³  ²%³ ~ %²  ²%³ b ²%³³ Equating coefficients of % on both sides of this gives the well known formula for the Stirling numbers :² b Á ³ ~ :²Á  c ³ b :²Á ³ Many other properties of the Stirling numbers can be derived by umbral means.

Now we have the analog of part 3) of Theorem 18.20.

Theorem 18.26 Let  be an umbral shift. Then d  ²!³ ~ ²!³ c ²!³ The Umbral Calculus 469

Proof. We have

d d º ²!³ “  ²!³ ²%³» ~ º´ ²!³µ ²!³ “  ²%³» d d ~ º  ´²!³ ²!³µ c ²!³  ²!³ “  ²%³» d c ~ º  ´²!³ ²!³µ “  ²%³» c º²!³ ²!³ “  ²%³»  c ~º²!³ “ ²!³   ²%³» c º ²!³ “ ²!³  ²%³» d ~ º ²!³ “ ²!³   ²%³» c º  ²!³ “ ²!³  ²%³»  ~ º ²!³ “ ²!³   ²%³» c º ²!³ “  ²!³  ²%³» from which the result follows.

d If ²!³~ ! then  is multiplication by % and  is the derivative with respect to ! and so the previous result becomes Z ²!³ ~ ²!³% c %²!³ as operators on F . The right side of this is called the Pincherle derivative of the operator ²!³ . (See [Pin].) Sheffer Shifts Recall that the linear map

Á´  ²%³µ ~ b ²%³ where  ²%³ is Sheffer for ²²!³Á ²!³³ is called a Sheffer shift. If  ²%³ is associated with ²!³ then ²!³  ²%³ ~  ²%³ and so c c ²!³²%³~b Á ´²!³²%³µ  and so

c Á~  ²!³  ²!³ From Theorem 18.26, the recurrence formula and the chain rule, we have

c Á~  ²!³  ²!³ c d ~  ²!³´²!³  c ²!³µ c ~  c  ²!³C ²!³ c ~  c  ²!³C ²!³ c ~ ! c  ²!³C !C ²!³ ~ %´ZccZcZ ²!³µ c  ²!³´ ²!³µ  ²!³ ²!³Z  ~%c>? ²!³ Z ²!³ We have proved the following. 470 Advanced Linear Algebra

Theorem 18.27 Let Á be a Sheffer shift. Then ²!³Z  1) Á ~%c<=²!³ Z ²!³ ²!³Z  2) b ²%³ ~<= % c²!³ Z ²!³  ²%³ The Transfer Formulas We conclude with a pair of formulas for the computation of associated sequences.

Theorem 18.28 (The transfer formulas) Let ²%³ be the associated sequence for ²!³. Then cc Z²!³ 1) ²%³~²!³ 45! % c ²!³ c 2) ²%³~% 45! % Proof. First we show that 1) and 2) are equivalent. Write ²!³ ~ ²!³°! . Then Z ²!³²!³ cc %  ~ ´!²!³µ Z ²!³ cc %  ~ ²!³c %  b ! Z ²!³²!³ cc %  ~ ²!³c %  b  Z ²!³²!³ cc % c ~²!³c %  b´²!³ c µ% Z c ~²!³c %  c´²!³ c% c %²!³c µ% c ~%²!³c % c To prove 1), we verify the operation conditions for an associated sequence for Zcc the sequence  ²%³ ~  ²!³²!³ % . First, when  ‚  the fourth equality above gives

Zcc º! “  ²%³» ~ º! “  ²!³²!³ % » ~º!ccZc “²!³ % c´²!³ µ% » ~ º²!³c “ %  » c º´²!³ c µ Z “ % c » ~ º²!³c “ %  » c º²!³ c “ %  » ~

 If ~ then º!“²%³»~Á and so, in general, we have º!“²%³»~ as required.

For the second required condition,

Zcc ²!³ ²%³ ~ ²!³ ²!³²!³ % ~ !²!³Zcc ²!³²!³ % ~ Zccc ²!³²!³ % ~c ²%³

Thus, ²%³ is the associated sequence for ²!³ . The Umbral Calculus 471

A Final Remark Unfortunately, space does not permit a detailed discussion of examples of Sheffer sequences nor the application of the umbral calculus to various classical problems. In [Rom1], one can find a discussion of the following polynomial sequences:

The lower factorial polynomials and Stirling numbers The exponential polynomials and Dobinski's formula The Gould polynomials The central factorial polynomials The Abel polynomials The Mittag–Leffler polynomials The Bessel polynomials The Bell polynomials The Hermite polynomials The Bernoulli polynomials and the Euler–Maclaurin expansion The Euler polynomials The Laguerre polynomials The Bernoulli polynomials of the second kind The Poisson–Charlier polynomials The actuarial polynomials The Meixner polynomials of the first and second kinds The Pidduck polynomials The Narumi polynomials The Boole polynomials The Peters polynomials The squared Hermite polynomials The The Mahler polynomials The Mott polynomials and more. In [Rom1], we also find a discussion of how the umbral calculus can be used to approach the following types of problems:

The connection constants problem Duplication formulas The Lagrange inversion formula Cross sequences Steffensen sequences Operational formulas Inverse relations Sheffer sequence solutions to recurrence relations Binomial convolution 472 Advanced Linear Algebra

Finally, it is possible to generalize the classical umbral calculus that we have described in this chapter to provide a context for studying polynomial sequences such as those of the name Gegenbauer, Chebyshev and Jacobi. Also, there is a q-version of the umbral calculus that involves the q-binomial coefficients (also known as the Gaussian coefficients ) ²c³Ä²c³ 45~  ² c ³Ä² c c ³² c ³Ä² c  ³ in place of the binomial coefficients. There is also a logarithmic version of the umbral calculus, which studies the harmonic logarithms and sequences of logarithmic type. For more on these topics, please see [LR], [Rom2] and [Rom3]. Exercises 1. Prove that ²³ ~ ²³ b ²³ , for any Á   < . 2. Prove that ² b ³ ‚ min ¸²³Á ²³¹ , for any Á   < . 3. Show that any delta series has a compositional inverse. 4. Show that for any delta series  , the sequence  is a pseudobasis. 5. Prove that C! is a derivation. 6. Show that  < is a delta functional if and only if º “ » ~  and º “ %» £ . 7. Show that  < is invertible if and only if º “ » £  . 8. Show that º²!³ “ ²%³» ~ º²!³ “ ²%³» for any a d< ,   and F. 9. Show that º! e! “ ²%³» ~  Z ² a ³ for any polynomial ²%³  F . 10. Show that ~ in < if and only if ~ as linear functionals, which holds if and only if ~ as linear operators. 11. Prove that if c ²%³ is Sheffer for ²²!³Á ²!³³ then ²!³ ²%³ ~  ²%³ . Hint: Apply the functionals ²!³ ²!³ to both sides. 12. Verify that the Abel polynomials form the associated sequence for the Abel functional. 13. Show that a sequence ²%³ is the Appell sequence for ²!³ if and only if c  ²%³~²!³%. d 14. If  is a delta series, show that the adjoint  of the umbral operator  is a vector space isomorphism of < . 15. Prove that if ;; is an automorphism of the umbral algebra then preserves order, that is, ²; ²!³³ ~ ²²!³³ . In particular, ; is continuous. 16. Show that an umbral operator maps associated sequences to associated sequences. 17. Let  ²%³ and  ²%³ be associated sequences. Define a linear operator  by ¢ ²%³¦ ²%³. Show that is an umbral operator. 18. Prove that if CC and are surjective derivations on < then c C ²!³ ~ ´C ²!³µ . References

References

General References

[HJ1] Horn, R. and Johnson, C., Matrix Analysis, Cambridge University Press, 1985.

[HJ2] Horn, R. and Johnson, C., Topics in Matrix Analysis, Cambridge University Press, 1991.

[J1] Jacobson, N., Basic Algebra I, Second Edition, W.H. Freeman, 1985.

[KM] Kostrikin, A. and Manin, Y., Linear Algebra and Geometry, Gordon and Breach Science Publishers, 1997.

[MM1] Marcus, M., Finite Dimensional Multilinear Algebra, Part I, Marcel Dekker, 1971.

[MM2] Marcus, M., Finite Dimensional Multilinear Algebra, Part II, Marcel Dekker, 1975.

[Sh] Shapiro, H., A survey of canonical forms and invariants for unitary similarity, Linear Algebra and Its Applications 147 (1991) 101–167.

[ST] Snapper, E. and Troyer, R., Metric Affine Geometry, Dover Publications, 1971.

References on the Umbral Calculus

[LR] Loeb, D. and Rota, G.-C., Formal power series of logarithmic type, Advances in Mathematics 75 (1989) 1–118.

[Pin] Pincherle, S., Operatori lineari e coefficienti di fattoriali, Atti Accad. Naz. Lincei, Rend. Cl. Sci. Fis. Mat. Nat. (6) 18 (1933) 417–519.

[Rom1] Roman, S., The Umbral Calculus, Pure and Applied Mathematics Vol. 111, Academic Press, 1984.

[Rom2] Roman, S., The logarithmic binomial formula, American Mathematical Monthly 99 (1992) 641–648.

[Rom3] Roman, S., The harmonic logarithms and the binomial formula, Journal of Combinatorial Theory, Series A, 63 (1992) 143–163.

Index

Abel functional 443,455 ascending order 381 characteristic 29 Abel operator 445 associated sequence 447 characteristic equation 156 Abel polynomials 455 associates 25 characteristic polynomial 154,155 abelian 17 automorphism 55 characteristic value 156 absolutely convergent 312 characteristic vector 156 accumulation point 288 barycentric coordinates 417 Characterization of cyclic addition 17,28,30,33,94 base 409 submodules 146 adjoint 201 base ring 94 Cholsky decomposition 435 affine 407 basis 43,100 classification problem 257 affine combination 411 Bernoulli numbers 443 closed 286,398 affine geometry 409 Bernstein theorem 13 closed ball 286 affine group 419 Bessel's identity 192 closure 288 affine hull 412 Bessel's inequality codimension 80 affine map 418 192,319,320,327 column equivalent 8 affine span 412 best approximation 193,314 column rank 48 affine subspace 53 bijection 6 column space 48 affine transformation 418 bijective 6 common eigenvector 177 affinely independent 416 bilinear 342 commutative 17,18 affinity 418 bilinear form 239,342 commuting family 177 algebra 30 bilinearity 182 compact 398 algebraic closure 28 binomial identity 452 companion matrix 146 algebraic dual space 82 binomial type 452 complement 39,103 algebraic multiplicity 157 block diagonal matrix 3 complemented 103 algebraically closed 28 block matrix 3 complete 37,293 algebraically reflexive 85 blocks 7 complete decomposition 148 alphabet 380 bounded 303,331 complete invariant 8 alternate 240,242,371 complete system of invariants 8 alternating 240,371 canonical forms 8 completion 298 ancestor 13 canonical map 85 complexification 49,50,71 anisotropic 246 canonical projection 76 complexification map 50 annihilator 86,99 Cantor's theorem 13 composition 438 antisymmetric 239,371,381 cardinal number 12 cone 246,398 antisymmetric tensor algebra 384 cardinality 11,12 congruence classes 243 antisymmetric tensor space cartesian product 14 congruent 9,243 384,385,386 Cauchy sequence 293 congruent modulo 20,75 Antisymmetry 10 Cayley-Hamilton theorem 156 conjugate linear 196 Appell sequence 447 chain 11 conjugate linearity 182 Appolonius identity 196 chain rule 467 conjugate representation 449 approximation problem 313 change of basis matrix 61 conjugate space 332 ascending chain condition change of basis operator 60 conjugate symmetry 181 24,115,116 change of coordinates operator 60 connected 262 476 Index continuous dual space 332 distance 185,304 forward difference operator 445 continuum 15 divides 5,25 Fourier expansion 192,320,327 contraction 370 division algorithm 5 free 100 contravariant tensors 367 domain 6 Frobenius norm 436 contravariant type 366 dot product 182 functional 82 converge 321 double (algebraic) dual space 84 functional calculus 228 converges 185,287,312 dual basis 84 convex 314,398 Gaussian coefficients 53,472 convex combination 398 eigenspace 156 generate 42 convex hull 399 eigenvalue 156 generated 96 coordinate map 47 eigenvector 156 generating function 448,449 coordinate matrix 47,351 elementary divisors 135,147 geometric multiplicity 157 correspondence theorem 77,102 elementary matrix 3 Gersgorin region 179 coset 20,75 embedding 55,101 Gersgorin row disk 178 coset representative 20,75,410 empty word 380 Gersgorin row region 178 cosets 76,101 endomorphism 55,101 graded algebra 372 countable 12 epimorphism 55,101 graded ideal 373 countably infinite 12 equivalence class 7 Gram-Schmidt orthogonalization covariant tensors 367 equivalence relation 6 189 covariant type 366 equivalent 8,64 greatest common 
divisor 5 cycle 371 Euclidean metric 284 greatest lower bound 10 cyclic 146 evaluation at 82,85 group 17 cyclic decomposition 130, 148 evaluation functional 440,442 cyclic submodule 97 even permutation 372 Hamel basis 189 even weight subspace 36 Hamel dimension 189 decomposable 345 expansion theorems 448 Hamming distance function 303 defined by 142 exponential 448 Hermite polynomials 197,455 degenerate 246 exponential polynomials 468 Hermitian 207 deleted absolute row sum 178 extension by 87 Hilbert basis 189,317 delta functional 441 extension map 361 Hilbert basis theorem 118 delta operator 444 exterior algebra 384,385,386 Hilbert dimension 189,329 delta series 438 exterior product 383 Hilbert space 309 dense 290 exterior product space 384 Hilbert space adjoint 203 derivation 465 external direct sum 38,39 Hilbert spaces 297 descendants 13 homeomorphism 69 determinant 273,389 factored through 338,339 homogeneous component 372 diagonal 4 Farkas's lemma 406 homogeneous of degree 372,373 diagonalizable 165 field 28 homomorphism 55,100 diagonally dominant 179 field of quotients 23 Householder transformation diameter 303 finite 1,12,17 213,425 dimension 409,410 finite support 39 hyperbolic basis 253 direct product 38,393 finitely generated 97 hyperbolic pair 253 direct sum 39,68,102 first isomorphism theorem 79,102 hyperbolic plane 253 direct summand 39,103 flat 409 hyperbolic space 253 discrete metric 284 flat representative 410 hyperplane 410 discriminant 244 forward difference functional 443 Index 477 ideal 19 least upper bound 10 modular law 52 idempotent 91,107 left inverse 104 module 147 Identity 17 left singular vectors 430 monic 5 image 6,57 Legendre polynomials 191 monomorphism 55,101 imaginary part 49 length 183 multilinear 363 index of nilpotence 175 lifted 339 multilinear form 363 induced 287 limit 288 multiplication 17,28,30 inertia 269 limit point 288 multiplicity 1 injection 6,101 line 410 multiset 1 injective 6 linear code 36 inner product 181,240 linear combination 34 natural map 85 inner product space 181,240 linear combinations 96 natural projection 76 integral domain 22 linear functional 82 natural topology 69,71 internal 39 linear hyperplane 400 negative 17 invariant 8,68,72 linear operator 55 net definition 321 invariant factor decomposition 137 linear transformation 55 nilpotent 174,175 invariant factor decomposition linearity 322 noetherian 115 theorem 137 linearly dependent 42,98 noetherian ring 116 invariant factors 137,147 linearly independent 42,98 nondegenerate 246 invariant ideals 137 linearly ordered set 11 nonderogatory 174 invariant under 68 lower bound 10 nonisotropic 246 Inverses 17 lower factorial numbers 437 nonnegative 198,395 invertible functional 441 lower factorial polynomials 454 nonnegative orthant 198,395 involution 174 lower triangular 4 nonsingular 246 irreducible 5,25,72 nonsingular completion 254 isometric 252,298 main diagonal 2 nonsingular completions 254 isometric isomorphism 186,308 matrix 60 nonsingular extension theorem 254 isometrically isomorphic 186,308 matrix of 62,242 nontrivial 34 isometry 186,252,297,308 matrix of the form 242 norm 183,184,331 isomorphic 57 maximal element 10 normal 205 isomorphism 55,57,101 maximal ideal 21 normed linear space 184 isotropic 246 maximal linearly independent set normed vector space 197 42 null 246 join 414 maximal orthonormal set 189 nullity 57 Jordan block 159 measured by 339 Jordan canonical form 159 measuring functions 339 odd permutation 372 Jordan canonical form of 159 measuring set 339 onto 6 
metric 185,283 open 286 kernel 57 metric space 185,283 open ball 286 Kronecker delta function 83 metric vector space 240 open neighborhood 286 Kronecker product 393 minimal element 10 open rectangles 68 Laguerre polynomials 456 minimal polynomial 144 open sets 287 minimal spanning set 42 operator adjoint 88 lattice 37 Minkowski space 240 order 17,86,121,437 leading coefficient 5 Minkowski's inequality 35,285 order ideals 99 leading entry 3 mixed tensors 367 ordered basis 47 478 Index orthogonal 167,187,208,245 128 right inverse 104 orthogonal complement 188,245 prime 25 right singular vectors 430 orthogonal direct sum 194,250 principal ideal 23 ring 17 orthogonal geometry 240 principal ideal domain 23 ring with identity 18 orthogonal group 252 product 14 rotation 273 orthogonal projection 223 projection 166 row equivalent 4 orthogonal resolution of the projection modulo 76 row rank 48 identity 226 projection onto 79 row space 48 orthogonal set 188 projection operators 73 scalar multiplication 30,33 orthogonal similarity classes 212 projection theorem 193,316 scalars 33,93 orthogonal spectral resolution projective dimension 421 separable 290 227,228 projective geometry 421 sesquilinear 182 orthogonal transformation 252 projective line 421 Sheffer for 447 orthogonality conditions 446 projective plane 421 Sheffer identity 452 orthogonally diagonalizable 205 projective point 421 Sheffer operator 457,464 orthogonally equivalent 212 properly divides 25 Sheffer sequence 447 orthogonally similar 212 pseudobasis 438 Sheffer shift 457 orthonormal basis 189 pure in 139 sign 372 orthonormal set 188 signature 269 QR factorization 425 similar 9,66,142 parallel 409 quadratic form 208,244 similarity classes 66,142 parallelogram law 183,307 quotient field 23 simple 120 parity 372 quotient module 101 simultaneously diagonalizable 177 Parseval's identity 192,328 quotient ring 21 singular 246 partial order 10 quotient space 76 singular value decomposition 430 partial sum 312 quotient space of 75 singular values 430 partially ordered set 10 size 1 partition 7 radical 246 space 359 permutation 371 range 6 span 42,96 Pincherle derivative 469 rank 48,57,111,244,351 spectral resolution 173 plane 410 rank plus nullity theorem 59 spectral theorem for normal point 410 rational canonical form 149,150 operators 227 polar decomposition 233 rational canonical form of 150 spectrum 156 polarization identities 184 recurrence formula 467 sphere 286 posets 10 reduce 148 splits 158 positive definite 230 reduced row echelon form 3,4 square summable functions 329 positive definiteness 181,283 reduces 68 standard basis 43,58,113 positive square root 231 reflection 213,273,425 standard inner product 182 power 14 reflexivity 7,10 standard topology 68 power of the continuum 15 relatively prime 5,26 standard vector 43 power set 12 resolution of the identity 170 Stirling numbers of the second primary 128 restriction 6 kind 468 primary cyclic decomposition retract 104 strictly diagonally dominant 179 theorem 134 Riesz representation theorem strictly positive 198,395 primary decomposition 147 195,248,333 strictly positive orthant 395 primary decomposition theorem Riesz vector 195 strictly separated 401 Index 479 string 380 topology 287 strongly positive 52,198,395 torsion element 99 wedge product 383 strongly positive orthant 198,395 torsion module 99 weight 35 strongly separated 402 total subset 318 with respect to 192 structure theorem for normal totally degenerate 246 with respect to the bases 62 matrices 222 totally isotropic 246 
Witt index 278 structure theorem for normal totally ordered set 11 Witt's cancellation theorem operators 221 totally singular 246 260,275 structure theorem for normal trace 176 Witt's extension theorem 260,277 operators: complex case 216 translate 409 word 380 structure theorem for normal translation 418 operators: real case 220 translation operator 445 zero divisor 22 subfield 53 transpose 2 zero element 17 subgroup 17 transposition 372 zero subspace 36 submatrix 2 triangle inequality Zorn's lemma 11 submodule 96 183,185,283,307 submodule spanned 96 subring 18 umbral algebra 440 subspace 35,240,286 umbral composition 463 subspace generated 42 umbral operator 457 subspace spanned 42 umbral shift 457 sum 14,37 uncountable 12 sup metric 284 underlying set 1 support 6,38 unipotent 282 surjection 6 unique factorization domain 26 surjective 6 unit 25 Sylvester's law of inertia 269 unit vector 183 symmetric 2,207,239,371,374 unitarily diagonalizable 205 symmetric group 371 unitarily equivalent 212 symmetric tensor algebra 377,379 unitarily similar 212 symmetric tensor space 377,378 unitarily upper triangularizable symmetrization map 380 165 Symmetry 7,181,185,274,283 unitary 208 symplectic basis 253 unitary metric 284 symplectic geometry 240 unitary similarity classes 212 symplectic group 252 universal 271 symplectic transformation 252 universal for bilinearity 344 symplectic transvection 261 universal pair for 339 upper bound 10,11 tensor algebra 371 upper triangular 4,161,425 tensor product 345,346,364,393 upper triangularizable 161,425 tensors 345 tensors of type 366 Vandermonde convolution formula third isomorphism theorem 81,102 455 topological space 287 vector space 33,147 topological vector space 69 vectors 33 480 Index
