Preface to the Second Edition

The basic structure of the first edition has been preserved in the second because it remains congruent with the goal of writing “a book that would be a useful modern treatment of a broad range of topics...[that] may be used as an undergraduate or graduate text and as a self-contained reference for a variety of audiences.” The quotation is from the Preface to the First Edition, whose declaration of goals for the work remains unchanged. What is different in the second edition? The core role of canonical forms has been expanded as a unifying element in understanding similarity (complex, real, and simultaneous), unitary equivalence, uni- tary similarity, congruence, *congruence, unitary congruence, triangular equivalence, and other equivalence relations. More attention is paid to cases of equality in the many inequalities considered in the book. Block matrices are a ubiquitous feature of the exposition in the new edition. Learning has never been a spectator sport, so the new edition continues to emphasize the value of exercises and problems for the active reader. Numerous 2-by-2 examples illustrate concepts throughout the book. Problem threads (some span several chapters) develop special topics as the foundation for them evolves in the text. For example, there are threads involving the adjugate , the compound matrix, finite-dimensional quantum systems, the Loewner ellipsoid and the Loewner–John matrix, and normalizable matrices; see the index for page references for these threads. The first edition had about 690 problems; the second edition has more than 1,100. Many problems have hints; they may be found in an appendix that appears just before the index. A comprehensive index is essential for a book that is intended for sustained use as a reference after initial use as a text. The index to the first edition had about 1,200 entries; the current index has more than 3,500 entries. An unfamiliar term encountered in the text should be looked up in the index, where a pointer to a definition (in Chapter 0 or elsewhere) is likely to be found. New discoveries since 1985 have shaped the presentation of many topics and have stimulated inclusion of some new ones. A few examples of the latter are the Jordan

canonical form of a rank-one perturbation, motivated by enduring student interest in the Google matrix; a generalization of real normal matrices (normal matrices A such that AĀ is real); computable criteria for simultaneous unitary similarity or simultaneous unitary congruence; G. Belitskii’s discovery that a matrix commutes with a Weyr canonical form if and only if it is block upper triangular and has a special structure; the discovery by K. C. O’Meara and C. Vinsonhaler that, unlike the corresponding situation for the Jordan canonical form, a commuting family can be simultaneously upper triangularized by similarity in such a way that any one specified matrix in the family is in Weyr canonical form; and canonical forms for congruence and ∗congruence. Queries from many readers have motivated changes in the way that some topics are presented. For example, discussion of Lidskii’s eigenvalue majorization inequalities was moved from a section primarily devoted to singular value inequalities to the section where majorization is discussed. Fortunately, a splendid new proof of Lidskii’s inequalities by C. K. Li and R. Mathias became available and was perfectly aligned with Chapter 4’s new approach to eigenvalue inequalities for Hermitian matrices. A second example is a new proof of Birkhoff’s theorem, which has a very different flavor from the proof in the first edition. Instructors accustomed to the order of topics in the first edition may be interested in a chapter-by-chapter summary of what is different in the new edition:

0. Chapter 0 has been expanded by about 75% to include a more comprehensive summary of useful concepts and facts. It is intended to serve as an as-needed reference. Definitions of terms and notations used throughout the book can be found here, but it has no exercises or problems. Formal courses and reading for self-study typically begin with Chapter 1.
1. Chapter 1 contains new examples related to similarity and the characteristic polynomial, as well as an enhanced emphasis on the role of left eigenvectors in matrix analysis.
2. Chapter 2 contains a detailed presentation of real orthogonal similarity, an exposition of McCoy’s theorem on simultaneous triangularization, and a rigorous treatment of continuity of eigenvalues that makes essential use of both the unitary and triangular aspects of Schur’s unitary triangularization theorem. Section 2.4 (Consequences of Schur’s triangularization theorem) is almost twice the length of the corresponding section in the first edition. There are two new sections, one devoted to the singular value decomposition and one devoted to the CS decomposition. Early introduction of the singular value decomposition permits this essential tool of matrix analysis to be used throughout the rest of the book.
3. Chapter 3 approaches the Jordan canonical form via the Weyr characteristic; it contains an exposition of the Weyr canonical form and its unitary variant that were not in the first edition. Section 3.2 (Consequences of the Jordan canonical form) discusses many new applications; it contains 60% more material than the corresponding section in the first edition.
4. Chapter 4 now has a modern presentation of variational principles and eigenvalue inequalities for Hermitian matrices via subspace intersections. It contains an expanded treatment of inverse problems associated with interlacing and other classical results. Its detailed treatment of unitary congruence includes Youla’s theorem (a normal form for a square complex matrix A under unitary congruence that is associated with the eigenstructure of AĀ), as well as canonical forms for conjugate normal, congruence normal, and squared normal matrices. It also has an exposition of recently discovered canonical forms for congruence and ∗congruence and new algorithms to construct a basis of a coneigenspace.
5. Chapter 5 contains an expanded discussion of norm duality, many new problems, and a treatment of semi-inner products that finds application in a discussion of finite-dimensional quantum systems in Chapter 7.
6. Chapter 6 has a new treatment of the “disjoint discs” aspect of Geršgorin’s theorem and a reorganized discussion of eigenvalue perturbations, including differentiability of a simple eigenvalue.
7. Chapter 7 has been reorganized now that the singular value decomposition is introduced in Chapter 2. There is a new treatment of the polar decomposition, new factorizations related to the singular value decomposition, and special emphasis on row and column inclusion. The von Neumann trace theorem (proved via Birkhoff’s theorem) is now the foundation on which many applications of the singular value decomposition are built. The Loewner partial order and block matrices are treated in detail with new techniques, as are the classical determinant inequalities for positive definite matrices.
8. Chapter 8 uses facts about left eigenvectors developed in Chapter 1 to streamline its exposition of the Perron–Frobenius theory of positive and nonnegative matrices.
D. Appendix D contains new explicit perturbation bounds for the zeroes of a polynomial and the eigenvalues of a matrix.
F. Appendix F tabulates a modern list of canonical forms for a pair of Hermitian matrices, or a pair of matrices, one of which is symmetric and the other is skew symmetric. These canonical pairs are applications of the canonical forms for congruence and ∗congruence presented in Chapter 4.

Readers who are curious about the technology of book making may be interested to know that this book began as a set of LaTeX files created manually by a company in India from hard copy of the first edition. Those files were edited and revised using the Scientific WorkPlace® graphical user interface and typesetting system. The cover art for the second edition was the result of a lucky encounter on a Delta flight from Salt Lake City to Los Angeles in spring 2003. The young man in the middle seat said he was an artist who paints abstract paintings that are sometimes mathematically inspired. In the course of friendly conversation, he revealed that his special area of mathematical enjoyment was linear algebra, and that he had studied Matrix Analysis. After mutual expressions of surprise at the chance nature of our meeting, and a pleasant discussion, we agreed that appropriate cover art would enhance the visual appeal of the second edition; he said he would send something to consider. In due course a packet arrived from Seattle. It contained a letter and a stunning 4.5- by 5-inch color photograph, identified on the back as an image of a 72- by 66-inch oil on canvas, painted in 2002. The letter said that “the painting is entitled Surprised Again on the Diagonal and is inspired by the recurring prevalence of the diagonal in math whether it be in geometry, analysis, algebra, set theory or logic. I think that it would be an attractive addition to your wonderful book.” Thank you, Lun-Yi Tsai, for your wonderful cover art!
A great many students, instructors, and professional colleagues have contributed to the evolution of this new edition since its predecessor appeared in 1985. Special thanks are hereby acknowledged to T. Ando, Wayne Barrett, Ignat Domanov, Jim Fill, Carlos Martins da Fonseca, Tatiana Gerasimova, Geoffrey Goodson, Robert Guralnick, Thomas Hawkins, Eugene Herman, Khakim Ikramov, Ilse Ipsen, Dennis C. Jespersen, Hideki Kosaki, Zhongshan Li, Teck C. Lim, Ross A. Lippert, Roy Mathias, Dennis Merino, Arnold Neumaier, Kevin O’Meara, Peter Rosenthal, Vladimir Sergeichuk, Wasin So, Hugo Woerdeman, and Fuzhen Zhang.
R.A.H.

Preface to the First Edition

Linear algebra and matrix theory have long been fundamental tools in mathematical disciplines as well as fertile fields for research in their own right. In this book, and in the companion volume, Topics in Matrix Analysis, we present classical and recent results of matrix analysis that have proved to be important to applied mathematics. The book may be used as an undergraduate or graduate text and as a self-contained reference for a variety of audiences. We assume background equivalent to a one-semester elementary linear algebra course and knowledge of rudimentary analytical concepts. We begin with the notions of eigenvalues and eigenvectors; no prior knowledge of these concepts is assumed.
Facts about matrices, beyond those found in an elementary linear algebra course, are necessary to understand virtually any area of mathematical science, whether it be differential equations; probability and statistics; optimization; or applications in theoretical and applied economics, the engineering disciplines, or operations research, to name only a few. But until recently, much of the necessary material has occurred sporadically (or not at all) in the undergraduate and graduate curricula. As interest in applied mathematics has grown and more courses have been devoted to advanced matrix theory, the need for a text offering a broad selection of topics has become more apparent, as has the need for a modern reference on the subject.
There are several well-loved classics in matrix theory, but they are not well suited for general classroom use, nor for systematic individual study. A lack of problems, applications, and motivation; an inadequate index; and a dated approach are among the difficulties confronting readers of some traditional references. More recent books tend to be either elementary texts or treatises devoted to special topics. Our goal was to write a book that would be a useful modern treatment of a broad range of topics.
One view of “matrix analysis” is that it consists of those topics in linear algebra that have arisen out of the needs of mathematical analysis, such as multivariable calculus, complex variables, differential equations, optimization, and approximation theory. Another view is that matrix analysis is an approach to real and complex linear

algebraic problems that does not hesitate to use notions from analysis – such as limits, continuity, and power series – when these seem more efficient or natural than a purely algebraic approach. Both views of matrix analysis are reflected in the choice and treatment of topics in this book. We prefer the term matrix analysis to linear algebra as an accurate reflection of the broad scope and methodology of the field.
For review and convenience in reference, Chapter 0 contains a summary of necessary facts from elementary linear algebra, as well as other useful, though not necessarily elementary, facts. Chapters 1, 2, and 3 contain mainly core material likely to be included in any second course in linear algebra or matrix theory: a basic treatment of eigenvalues, eigenvectors, and similarity; unitary similarity, Schur triangularization and its implications, and normal matrices; and canonical forms and factorizations, including the Jordan form, LU factorization, QR factorization, and companion matrices. Beyond this, each chapter is developed substantially independently and treats in some depth a major topic:
1. Hermitian and complex symmetric matrices (Chapter 4). We give special emphasis to variational methods for studying eigenvalues of Hermitian matrices and include an introduction to the notion of majorization.
2. Norms on vectors and matrices (Chapter 5) are essential for error analyses of numerical linear algebraic algorithms and for the study of matrix power series and iterative processes. We discuss the algebraic, geometric, and analytic properties of norms in some detail and make a careful distinction between those norm results for matrices that depend on the submultiplicativity axiom for matrix norms and those that do not.
3. Eigenvalue location and perturbation results (Chapter 6) for general (not necessarily Hermitian) matrices are important for many applications. We give a detailed treatment of the theory of Geršgorin regions, and some of its modern refinements, and of relevant graph theoretic concepts.
4. Positive definite matrices (Chapter 7) and their applications, including inequalities, are considered at some length. A discussion of the polar and singular value decompositions is included, along with applications to matrix approximation problems.
5. Entry-wise nonnegative and positive matrices (Chapter 8) arise in many applications in which nonnegative quantities necessarily occur (probability, economics, engineering, etc.), and their remarkable theory reflects the applications. Our development of the theory of nonnegative, positive, primitive, and irreducible matrices proceeds in elementary steps based on the use of norms.
In the companion volume, further topics of similar interest are treated: the field of values and generalizations; inertia, stable matrices, M-matrices and related special classes; matrix equations, Kronecker and Hadamard products; and various ways in which functions and matrices may be linked.
This book provides the basis for a variety of one- or two-semester courses through selection of chapters and sections appropriate to a particular audience. We recommend that an instructor make a careful preselection of sections and portions of sections of the book for the needs of a particular course. This would probably include Chapter 1, much of Chapters 2 and 3, and facts about Hermitian matrices and norms from Chapters 4 and 5.

Most chapters contain some relatively specialized or nontraditional material. For example, Chapter 2 includes not only Schur’s basic theorem on unitary triangularization of a single matrix but also a discussion of simultaneous triangularization of families of matrices. In the section on unitary equivalence, our presentation of the usual facts is followed by a discussion of trace conditions for two matrices to be unitarily equivalent. A discussion of complex symmetric matrices in Chapter 4 provides a counterpoint to the development of the classical theory of Hermitian matrices. Basic aspects of a topic appear in the initial sections of each chapter, while more elaborate discussions occur at the ends of sections or in later sections. This strategy has the advantage of presenting topics in a sequence that enhances the book’s utility as a reference. It also provides a rich variety of options to the instructor. Many of the results discussed are valid or can be generalized to be valid for matrices over other fields or in some broader algebraic setting. However, we deliberately confine our domain to the real and complex fields where familiar methods of classical analysis as well as formal algebraic techniques may be employed. Though we generally consider matrices to have complex entries, most examples are confined to real matrices, and no deep knowledge of complex analysis is required. Acquaintance with the arithmetic of complex numbers is necessary for an understanding of matrix analysis and is covered to the extent necessary in an appendix. Other brief appendices cover several peripheral, but essential, topics such as Weierstrass’s theorem and convexity. We have included many exercises and problems because we feel these are essential to the development of an understanding of the subject and its implications. The exercises occur throughout as part of the development of each section; they are generally elementary and of immediate use in understanding the concepts. We rec- ommend that the reader work at least a broad selection of these. Problems are listed (in no particular order) at the end of each section; they cover a range of difficulties and types (from theoretical to computational) and they may extend the topic, develop special aspects, or suggest alternate proofs of major ideas. Significant hints are given for the more difficult problems. The results of some problems are referred to in other problems or in the text itself. We cannot overemphasize the importance of the reader’s active involvement in carrying out the exercises and solving problems. While the book itself is not about applications, we have, for motivational purposes, begun each chapter with a section outlining a few applications to introduce the topic of the chapter. Readers who wish to consult alternative treatments of a topic for additional information are referred to the books listed in the References section following the appendices. The list of book references is not exhaustive. As a practical concession to the limits of space in a general multitopic book, we have minimized the number of citations in the text. A small selection of references to papers – such as those we have explicitly used – does occur at the end of most sections accompanied by a brief discussion, but we have made no attempt to collect historical references to classical results. Extensive bibliographies are provided in the more specialized books we have referenced. 
We appreciate the helpful suggestions of our colleagues and students who have taken the time to convey their reactions to the class notes and preliminary manuscripts that were the precursors of the book. They include Wayne Barrett, Leroy Beasley, Bryan Cain, David Carlson, Dipa Choudhury, Risana Chowdhury, Yoo Pyo Hong, Dmitry Krass, Dale Olesky, Stephen Pierce, Leiba Rodman, and Pauline van den Driessche.
R.A.H.
C.R.J.

CHAPTER 0

Review and Miscellanea

0.0 Introduction

In this initial chapter we summarize many useful concepts and facts, some of which provide a foundation for the material in the rest of the book. Some of this material is included in a typical elementary course in linear algebra, but we also include additional useful items, even though they do not arise in our subsequent exposition. The reader may use this chapter as a review before beginning the main part of the book in Chapter 1; subsequently, it can serve as a convenient reference for notation and definitions that are encountered in later chapters. We assume that the reader is familiar with the basic concepts of linear algebra and with mechanical aspects of matrix manipulations, such as multiplication and addition.

0.1 Vector spaces

A finite dimensional vector space is the fundamental setting for matrix analysis.

0.1.1 Scalar field. Underlying a vector space is its field, or set of scalars. For our purposes, that underlying field is typically the real numbers R or the complex numbers C (see Appendix A), but it could be the rational numbers, the integers modulo a specified prime number, or some other field. When the field is unspecified, we denote it by the symbol F. To qualify as a field, a set must be closed under two binary operations: “addition” and “multiplication.” Both operations must be associative and commutative, and each must have an identity element in the set; inverses must exist in the set for all elements under addition and for all elements except the additive identity under multiplication; multiplication must be distributive over addition.

0.1.2 Vector spaces. A vector space V over a field F is a set V of objects (called vectors) that is closed under a binary operation (“addition”) that is associative and commutative and has an identity (the zero vector, denoted by 0) and additive inverses

in the set. The set is also closed under an operation of “scalar multiplication” of the vectors by elements of the scalar field F, with the following properties for all a, b ∈ F and all x, y ∈ V: a(x + y) = ax + ay, (a + b)x = ax + bx, a(bx) = (ab)x, and ex = x for the multiplicative identity e ∈ F.
For a given field F and a given positive integer n, the set Fn of n-tuples with entries from F forms a vector space over F under entrywise addition in Fn. Our convention is that elements of Fn are always presented as column vectors; we often call them n-vectors. The special cases Rn and Cn are the basic vector spaces of this book; Rn is a real vector space (that is, a vector space over the real field), while Cn is both a real vector space and a complex vector space (a vector space over the complex field). The set of polynomials with real or with complex coefficients (of no more than a specified degree or of arbitrary degree) and the set of real-valued or complex-valued functions on subsets of R or C (all with the usual notions of addition of functions and multiplication of a function by a scalar) are also examples of real or complex vector spaces.

0.1.3 Subspaces, span, and linear combinations. A subspace of a vector space V over a field F is a subset of V that is, by itself, a vector space over F using the same operations of vector addition and scalar multiplication as in V. A subset of V is a subspace precisely when it is closed under these two operations. For example, {[a, b, 0]^T : a, b ∈ R} is a subspace of R3; see (0.2.5) for the transpose notation. An intersection of subspaces is always a subspace; a union of subspaces need not be a subspace. The subsets {0} and V are always subspaces of V, so they are often called trivial subspaces; a subspace of V is said to be nontrivial if it is different from both {0} and V. A subspace of V is said to be a proper subspace if it is not equal to V. We call {0} the zero vector space. Since a vector space always contains the zero vector, a subspace cannot be empty.
If S is a subset of a vector space V over a field F, span S is the intersection of all subspaces of V that contain S. If S is nonempty, then span S = {a1v1 + ··· + akvk : v1, ..., vk ∈ S, a1, ..., ak ∈ F, and k = 1, 2, ...}. If S is empty, it is contained in every subspace of V; since the intersection of every subspace of V is the subspace {0}, the definition ensures that span S = {0}. Notice that span S is always a subspace even if S is not a subspace; S is said to span V if span S = V.
A linear combination of vectors in a vector space V over a field F is any expression of the form a1v1 + ··· + akvk in which k is a positive integer, a1, ..., ak ∈ F, and v1, ..., vk ∈ V. Thus, the span of a nonempty subset S of V consists of all linear combinations of finitely many vectors in S. A linear combination a1v1 + ··· + akvk is trivial if a1 = ··· = ak = 0; otherwise, it is nontrivial. A linear combination is by definition a sum of finitely many elements of a vector space.
Let S1 and S2 be subspaces of a vector space over a field F. The sum of S1 and S2 is the subspace

S1 + S2 = span{S1 ∪ S2} = {x + y : x ∈ S1, y ∈ S2}

If S1 ∩ S2 = {0}, we say that the sum of S1 and S2 is a direct sum and write it as S1 ⊕ S2; every z ∈ S1 ⊕ S2 can be written as z = x + y with x ∈ S1 and y ∈ S2 in one and only one way.

0.1.4 Linear dependence and linear independence. We say that a finite list of vectors v1, ..., vk in a vector space V over a field F is linearly dependent if and only if there are scalars a1, ..., ak ∈ F, not all zero, such that a1v1 + ··· + akvk = 0. Thus, a list of vectors v1, ..., vk is linearly dependent if and only if some nontrivial linear combination of v1, ..., vk is the zero vector. It is often convenient to say that “v1, ..., vk are linearly dependent” instead of the more formal statement “the list of vectors v1, ..., vk is linearly dependent.” A list of vectors v1, ..., vk is said to have length k. A list of two or more vectors is linearly dependent if one of the vectors is a linear combination of some of the others; in particular, a list of two or more vectors in which two of the vectors in the list are identical is linearly dependent. Two vectors are linearly dependent if and only if one of the vectors is a scalar multiple of the other. A list consisting only of the zero vector is linearly dependent since a1·0 = 0 for a1 = 1.
A finite list of vectors v1, ..., vk in a vector space V over a field F is linearly independent if it is not linearly dependent. Again, it can be convenient to say that “v1, ..., vk are linearly independent” instead of “the list of vectors v1, ..., vk is linearly independent.” Sometimes one encounters natural lists of vectors that have infinitely many elements, for example, the monomials 1, t, t2, t3, ... in the vector space of all polynomials with real coefficients or the complex exponentials 1, eit, e2it, e3it, ... in the vector space of complex-valued continuous functions that are periodic on [0, 2π]. If certain vectors in a list (finite or infinite) are deleted, the resulting list is a sublist of the original list. An infinite list of vectors is said to be linearly dependent if some finite sublist is linearly dependent; it is said to be linearly independent if every finite sublist is linearly independent. Any sublist of a linearly independent list of vectors is linearly independent; any list of vectors that has a linearly dependent sublist is linearly dependent. Since a list consisting only of the zero vector is linearly dependent, any list of vectors that contains the zero vector is linearly dependent. A list of vectors can be linearly dependent, while any proper sublist is linearly independent; see (1.4.P12). An empty list of vectors is not linearly dependent, so it is linearly independent.
The cardinality of a finite set is the number of its (necessarily distinct) elements. For a given list of vectors v1, ..., vk in a vector space V, the cardinality of the set {v1, ..., vk} is less than k if and only if two or more vectors in the list are identical; if v1, ..., vk are linearly independent, then the cardinality of the set {v1, ..., vk} is k. The span of a list of vectors (finite or not) is the span of the set of elements of the list; a list of vectors spans V if V is the span of the list. A set S of vectors is said to be linearly independent if every finite list of distinct vectors in S is linearly independent; S is said to be linearly dependent if some finite list of distinct vectors in S is linearly dependent.
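Whether a given finite list of real or complex vectors is linearly independent can be tested numerically with the rank criterion of (0.4.4): the list v1, ..., vk is linearly independent exactly when the matrix [v1 ... vk] has rank k. A minimal sketch, assuming NumPy (a tool not used in the text) and working over R rather than a general field F:

```python
import numpy as np

def is_linearly_independent(vectors):
    """True when the list of real/complex vectors is linearly independent.

    The list v1, ..., vk is independent exactly when the matrix whose
    columns are v1, ..., vk has rank k.
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2   # dependent on v1 and v2 by construction

print(is_linearly_independent([v1, v2]))       # True
print(is_linearly_independent([v1, v2, v3]))   # False
```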

0.1.5 Basis. A linearly independent list of vectors in a vector space V whose span is V is a basis for V. Each element of V can be represented as a linear combination of vectors in a basis in one and only one way; this is no longer true if any element whatsoever is appended to or deleted from the basis. A linearly independent list of vectors in V is a basis of V if and only if no list of vectors that properly contains it is linearly independent. A list of vectors that spans V is a basis for V if and only if no proper sublist of it spans V. The empty list is a basis for the zero vector space.

0.1.6 Extension to a basis. Any linearly independent list of vectors in a vector space V may be extended, perhaps in more than one way, to a basis of V. A vector space can have a basis that is not finite; for example, the infinite list of monomials 1, t, t2, t3, ... is a basis for the real vector space of all polynomials with real coefficients; each polynomial is a unique linear combination of (finitely many) elements in the basis.

0.1.7 Dimension. If there is a positive integer n such that a basis of the vector space V contains exactly n vectors, then every basis of V consists of exactly n vectors; this common cardinality of bases is the dimension of the vector space V and is denoted by dim V. In this event, V is finite-dimensional; otherwise V is infinite-dimensional. In the infinite-dimensional case, there is a one-to-one correspondence between the elements of any two bases. The real vector space Rn has dimension n. The vector space Cn has dimension n over the field C but dimension 2n over the field R. The basis e1, ..., en of Fn in which each n-vector ei has a 1 as its ith entry and 0s elsewhere is called the standard basis. It is convenient to say “V is an n-dimensional vector space” as a shorthand for “V is a finite-dimensional vector space whose dimension is n.” Any subspace of an n-dimensional vector space is finite-dimensional; its dimension is strictly less than n if it is a proper subspace.
Let V be a finite-dimensional vector space and let S1 and S2 be two given subspaces of V. The subspace intersection theorem is

dim(S1 ∩ S2) + dim(S1 + S2) = dim S1 + dim S2    (0.1.7.1)

Rewriting this identity as

dim(S1 ∩ S2) = dim S1 + dim S2 − dim(S1 + S2)
             ≥ dim S1 + dim S2 − dim V    (0.1.7.2)

reveals the useful fact that if δ = dim S1 + dim S2 − dim V ≥ 1, then the subspace S1 ∩ S2 has dimension at least δ, and hence it contains δ linearly independent vectors, namely, any δ elements of a basis of S1 ∩ S2. In particular, S1 ∩ S2 contains a nonzero vector. An induction argument shows that if S1, ..., Sk are subspaces of V, and if δ = dim S1 + ··· + dim Sk − (k − 1) dim V ≥ 1, then

dim(S1 ∩ ··· ∩ Sk) ≥ δ    (0.1.7.3)

and hence S1 ∩ ··· ∩ Sk contains δ linearly independent vectors; in particular, it contains a nonzero vector.
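For subspaces of Rn presented by spanning matrices, the quantities in (0.1.7.1) and (0.1.7.2) can be computed with ranks: dim(S1 + S2) is the rank of the two spanning matrices placed side by side, and dim(S1 ∩ S2) then follows from the identity. A small numerical sketch, assuming NumPy and randomly chosen subspaces:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, 4))   # columns span S1
Y = rng.standard_normal((n, 3))   # columns span S2

dim_S1 = np.linalg.matrix_rank(X)
dim_S2 = np.linalg.matrix_rank(Y)
dim_sum = np.linalg.matrix_rank(np.hstack([X, Y]))   # dim(S1 + S2)

# (0.1.7.1): dim(S1 intersect S2) = dim S1 + dim S2 - dim(S1 + S2)
dim_intersection = dim_S1 + dim_S2 - dim_sum
print(dim_S1, dim_S2, dim_sum, dim_intersection)   # typically 4 3 6 1
# (0.1.7.2): the intersection has dimension at least dim S1 + dim S2 - dim V
print(dim_intersection >= dim_S1 + dim_S2 - n)     # True
```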

0.1.8 Isomorphism. If U and V are vector spaces over the same scalar field F, and if f : U → V is an invertible function such that f(ax + by) = af(x) + bf(y) for all x, y ∈ U and all a, b ∈ F, then f is said to be an isomorphism and U and V are said to be isomorphic (“same structure”). Two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension; thus, any n-dimensional vector space over F is isomorphic to Fn. Any n-dimensional real vector space is, therefore, isomorphic to Rn, and any n-dimensional complex vector space is isomorphic to Cn. Specifically, if V is an n-dimensional vector space over a field F with specified basis B = {x1, ..., xn}, then, since any element x ∈ V may be written uniquely as x = a1x1 + ··· + anxn in which each ai ∈ F, we may identify x with the n-vector [x]B = [a1 ... an]^T. For any basis B, the mapping x → [x]B is an isomorphism between V and Fn.

0.2 Matrices

The fundamental object of study here may be thought of in two important ways: as a rectangular array of scalars and as a linear transformation between two vector spaces, given specified bases for each space.

0.2.1 Rectangular arrays. A matrix is an m-by-n array of scalars from a field F. If m = n, the matrix is said to be square. The set of all m-by-n matrices over F is denoted by Mm,n(F), and Mn,n(F) is often denoted by Mn(F). The vector spaces Mn,1(F) and Fn are identical. If F = C, then Mn(C) is further abbreviated to Mn, and Mm,n(C) to Mm,n. Matrices are typically denoted by capital letters, and their scalar entries are typically denoted by doubly subscripted lowercase letters. For example, if

A = [aij] = \begin{bmatrix} 2 & -\frac{3}{2} & 0 \\ -1 & \pi & 4 \end{bmatrix}

then A ∈ M2,3(R) has entries a11 = 2, a12 = −3/2, a13 = 0, a21 = −1, a22 = π, a23 = 4. A submatrix of a given matrix is a rectangular array lying in specified subsets of the rows and columns of a given matrix. For example, [π 4] is a submatrix (lying in row 2 and columns 2 and 3) of A.
Suppose that A = [aij] ∈ Mn,m(F). The main diagonal of A is the list of entries a11, a22, ..., aqq, in which q = min{n, m}. It is sometimes convenient to express the main diagonal of A as a vector diag A = [aii]_{i=1}^{q} ∈ Fq. The pth superdiagonal of A is the list a_{1,p+1}, a_{2,p+2}, ..., a_{k,p+k}, in which k = min{n, m − p}, p = 0, 1, 2, ..., m − 1; the pth subdiagonal of A is the list a_{p+1,1}, a_{p+2,2}, ..., a_{p+ℓ,ℓ}, in which ℓ = min{n − p, m}, p = 0, 1, 2, ..., n − 1.

0.2.2 Linear transformations. Let U be an n-dimensional vector space and let V be an m-dimensional vector space, both over the same field F; let BU be a basis of U and let BV be a basis of V. We may use the isomorphisms x → [x]BU and y → [y]BV to represent vectors in U and V as n-vectors and m-vectors over F, respectively. A linear transformation is a function T : U → V such that T(a1x1 + a2x2) = a1T(x1) + a2T(x2) for any scalars a1, a2 and vectors x1, x2. A matrix A ∈ Mm,n(F) corresponds to a linear transformation T : U → V in the following way: y = T(x) if and only if [y]BV = A[x]BU. The matrix A is said to represent the linear transformation T (relative to the bases BU and BV); the representing matrix A depends on the bases chosen. When we study a matrix A, we realize that we are studying a linear transformation relative to a particular choice of bases, but explicit appeal to the bases is usually not necessary.

0.2.3 Vector spaces associated with a matrix or linear transformation. Any n-dimensional vector space over F may be identified with Fn; we may think of

A ∈ Mm,n(F) as a linear transformation x → Ax from Fn to Fm (and also as an array). The domain of this linear transformation is Fn; its range is range A = {y ∈ Fm : y = Ax for some x ∈ Fn}; its null space is nullspace A = {x ∈ Fn : Ax = 0}. The range of A is a subspace of Fm, and the null space of A is a subspace of Fn. The dimension of nullspace A is denoted by nullity A; the dimension of range A is denoted by rank A. These numbers are related by the rank-nullity theorem

dim(range A) + dim(nullspace A) = rank A + nullity A = n    (0.2.3.1)

for A ∈ Mm,n(F). The null space of A is a set of vectors in Fn whose entries satisfy m homogeneous linear equations.
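The rank-nullity theorem (0.2.3.1) is easy to check numerically: compute rank A and a basis of nullspace A and confirm that the two dimensions sum to n. A sketch assuming NumPy and SciPy (neither is part of the text):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],   # a multiple of the first row
              [0., 1., 1., 0.]])
m, n = A.shape

rank = np.linalg.matrix_rank(A)      # dim(range A)
nullity = null_space(A).shape[1]     # dim(nullspace A); columns form an orthonormal basis

print(rank, nullity)                 # 2 2
print(rank + nullity == n)           # True: the rank-nullity theorem (0.2.3.1)
```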

0.2.4 Matrix operations. Matrix addition is defined entrywise for arrays of the same dimensions and is denoted by + (“A + B”). It corresponds to addition of linear transformations (relative to the same basis), and it inherits commutativity and associativity from the scalar field. The zero matrix (all entries are zero) is the additive identity, and Mm,n(F) is a vector space over F. Matrix multiplication is denoted by juxtaposition (“AB”) and corresponds to the composition of linear transformations. Therefore, it is defined only when A ∈ Mm,n(F) and B ∈ Mn,q(F). It is associative, but not always commutative. For example,

\begin{bmatrix} 1 & 2 \\ 6 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \ne \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 3 & 8 \end{bmatrix}

The identity matrix

I = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} \in M_n(F)

is the multiplicative identity in Mn(F); its main diagonal entries are 1, and all other entries are 0. The identity matrix and any scalar multiple of it (a scalar matrix) commute with every matrix in Mn(F); they are the only matrices that do so. Matrix multiplication is distributive over matrix addition. The symbol 0 is used throughout the book to denote each of the following: the zero scalar of a field, the zero vector of a vector space, the zero n-vector in Fn (all entries equal to the zero scalar in F), and the zero matrix in Mm,n(F) (all entries equal to the zero scalar). The symbol I denotes the identity matrix of any size. If there is potential for confusion, we indicate the dimension of a zero or identity matrix with subscripts, for example, 0p,q ,0k, or Ik.
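The 2-by-2 example above, and the fact that scalar matrices commute with every matrix of the same size, can be verified directly; a quick check, assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
D = np.array([[1, 0], [0, 2]])

print(D @ A)   # [[1 2] [6 8]]
print(A @ D)   # [[1 4] [3 8]] -- matrix multiplication is not commutative

# A scalar matrix cI commutes with every matrix of the same size.
S = 5 * np.eye(2, dtype=int)
print(np.array_equal(S @ A, A @ S))   # True
```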

0.2.5 The transpose, conjugate transpose, and trace. If A = [aij] ∈ Mm,n(F), the transpose of A, denoted by A^T, is the matrix in Mn,m(F) whose i, j entry is aji; that is, rows are exchanged for columns and vice versa. For example,

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}

Of course, (A^T)^T = A. The conjugate transpose (sometimes called the adjoint or Hermitian adjoint) of A ∈ Mm,n(C) is denoted by A^∗ and defined by A^∗ = Ā^T, in which Ā is the entrywise conjugate. For example,

\begin{bmatrix} 1+i & -2-i \\ 3 & 2i \end{bmatrix}^* = \begin{bmatrix} 1-i & 3 \\ -2+i & -2i \end{bmatrix}

Both the transpose and the conjugate transpose obey the reverse-order law: (AB)^∗ = B^∗A^∗ and (AB)^T = B^T A^T. For the complex conjugate of a product, there is no reversing: \overline{AB} = Ā B̄. If x, y are real or complex vectors of the same size, then y^∗x is a scalar and its conjugate transpose and complex conjugate are the same: (y^∗x)^∗ = \overline{y^∗x} = x^∗y = y^T x̄.
Many important classes of matrices are defined by identities involving the transpose or conjugate transpose. For example, A ∈ Mn(F) is said to be symmetric if A^T = A, skew symmetric if A^T = −A, and orthogonal if A^T A = I; A ∈ Mn(C) is said to be Hermitian if A^∗ = A, skew Hermitian if A^∗ = −A, essentially Hermitian if e^{iθ}A is Hermitian for some θ ∈ R, unitary if A^∗A = I, and normal if A^∗A = AA^∗.
Each A ∈ Mn(F) can be written in exactly one way as A = S(A) + C(A), in which S(A) is symmetric and C(A) is skew symmetric: S(A) = (1/2)(A + A^T) is the symmetric part of A; C(A) = (1/2)(A − A^T) is the skew-symmetric part of A.
Each A ∈ Mm,n(C) can be written in exactly one way as A = B + iC, in which B, C ∈ Mm,n(R): B = (1/2)(A + Ā) is the real part of A; C = (1/(2i))(A − Ā) is the imaginary part of A.
Each A ∈ Mn(C) can be written in exactly one way as A = H(A) + iK(A), in which H(A) and K(A) are Hermitian: H(A) = (1/2)(A + A^∗) is the Hermitian part of A; iK(A) = (1/2)(A − A^∗) is the skew-Hermitian part of A. The representation A = H(A) + iK(A) of a complex or real matrix is its Toeplitz decomposition.
The trace of A = [aij] ∈ Mm,n(F) is the sum of its main diagonal entries: tr A = a11 + ··· + aqq, in which q = min{m, n}. For any A = [aij] ∈ Mm,n(C), tr AA^∗ = tr A^∗A = Σ_{i,j} |aij|², so

tr AA^∗ = 0 if and only if A = 0    (0.2.5.1)

A vector x ∈ Fn is isotropic if x^T x = 0. For example, [1 i]^T ∈ C² is a nonzero isotropic vector. There are no nonzero isotropic vectors in Rn.
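The Toeplitz decomposition A = H(A) + iK(A) and the trace identity (0.2.5.1) lend themselves to a direct numerical check. A sketch assuming NumPy; the matrix chosen here is arbitrary:

```python
import numpy as np

A = np.array([[1 + 1j, -2 - 1j],
              [3 + 0j, 0 + 2j]])

H = (A + A.conj().T) / 2     # Hermitian part H(A)
K = (A - A.conj().T) / 2j    # K(A), also Hermitian

print(np.allclose(A, H + 1j * K))                                 # True: Toeplitz decomposition
print(np.allclose(H, H.conj().T) and np.allclose(K, K.conj().T))  # True

# (0.2.5.1): tr AA* equals the sum of |a_ij|^2, so it is 0 only when A = 0
print(np.isclose(np.trace(A @ A.conj().T), np.sum(np.abs(A) ** 2)))  # True
```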

0.2.6 Metamechanics of matrix multiplication. In addition to the conventional definition of matrix-vector and matrix-matrix multiplication, several alternative viewpoints can be useful.

1. If A ∈ Mm,n(F), x ∈ Fn, and y ∈ Fm, then the (column) vector Ax is a linear combination of the columns of A; the coefficients of the linear combination are the entries of x. The row vector y^T A is a linear combination of the rows of A; the coefficients of the linear combination are the entries of y.
2. If bj is the jth column of B and ai^T is the ith row of A, then the jth column of AB is Abj and the ith row of AB is ai^T B. To paraphrase, in the matrix product AB, left multiplication by A multiplies the columns of B and right multiplication by B multiplies the rows of A. See (0.9.1) for an important special case of this observation when one of the factors is a diagonal matrix.

Suppose that A ∈ Mm,p(F) and B ∈ Mn,q(F). Let ak be the kth column of A and let bk be the kth column of B. Then

3. If m = n, then A^T B = [ai^T bj]: the i, j entry of A^T B is the scalar ai^T bj.
4. If p = q, then AB^T = Σ_{k=1}^{p} ak bk^T: each summand is an m-by-n matrix, the outer product of ak and bk.
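Viewpoint 4, the expansion of AB^T as a sum of outer products of corresponding columns, can be confirmed numerically. A sketch assuming NumPy and randomly generated factors:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 3, 4, 5
A = rng.standard_normal((m, p))
B = rng.standard_normal((n, p))   # p = q, so AB^T is defined and is m-by-n

# AB^T is the sum of the outer products a_k b_k^T over the p columns.
outer_sum = sum(np.outer(A[:, k], B[:, k]) for k in range(p))
print(np.allclose(A @ B.T, outer_sum))   # True
```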

0.2.7 Column space and row space of a matrix. The range of A ∈ Mm,n(F) is also called its column space because Ax is a linear combination of the columns of A for any x ∈ Fn (the entries of x are the coefficients in the linear combination); range A is the span of the columns of A. Analogously, {y^T A : y ∈ Fm} is called the row space of A. If the column space of A ∈ Mm,n(F) is contained in the column space of B ∈ Mm,k(F), then there is some X ∈ Mk,n(F) such that A = BX (and conversely); the entries in column j of X tell how to express column j of A as a linear combination of the columns of B. If A ∈ Mm,n(F) and B ∈ Mm,q(F), then

range A + range B = range [A  B]    (0.2.7.1)

If A ∈ Mm,n(F) and B ∈ Mp,n(F), then

nullspace A ∩ nullspace B = nullspace \begin{bmatrix} A \\ B \end{bmatrix}    (0.2.7.2)

0.2.8 The all-ones matrix and vector. In Fn, every entry of the vector e = e1 + ··· + en is 1. Every entry of the matrix Jn = ee^T is 1.

0.3 Determinants

Often in mathematics, it is useful to summarize a multivariate phenomenon with a single number, and the determinant function is an example of this. Its domain is Mn(F) (square matrices only), and it may be presented in several different ways. We denote the determinant of A ∈ Mn(F) by det A.

0.3.1 Laplace expansion by minors along a row or column. The determinant may be defined inductively for A = [aij] ∈ Mn(F) in the following way. Assume that the determinant is defined over Mn−1(F) and let Aij ∈ Mn−1(F) denote the submatrix of A ∈ Mn(F) obtained by deleting row i and column j of A. Then, for any i, j ∈ {1, ..., n}, we have

det A = Σ_{k=1}^{n} (−1)^{i+k} a_{ik} det A_{ik} = Σ_{k=1}^{n} (−1)^{k+j} a_{kj} det A_{kj}    (0.3.1.1)

The first sum is the Laplace expansion by minors along row i; the second sum is the Laplace expansion by minors along column j. This inductive presentation begins by

defining the determinant of a 1-by-1 matrix to be the value of the single entry. Thus,

det [a11] = a11

det \begin{bmatrix} a11 & a12 \\ a21 & a22 \end{bmatrix} = a11 a22 − a12 a21

det \begin{bmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{bmatrix} = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31

and so on. Notice that det A^T = det A, det A^∗ = \overline{det A} if A ∈ Mn(C), and det I = 1.

0.3.2 Alternating sums and permutations. A permutation of {1, ..., n} is a one-to-one function σ : {1, ..., n} → {1, ..., n}. The identity permutation satisfies σ(i) = i for each i = 1, ..., n. There are n! distinct permutations of {1, ..., n}, and the collection of all such permutations forms a group under composition of functions. Consistent with the low-dimensional examples in (0.3.1), for A = [aij] ∈ Mn(F) we have the alternative presentation

det A = Σ_σ (sgn σ) Π_{i=1}^{n} a_{iσ(i)}    (0.3.2.1)

in which the sum is over all n! permutations of {1, ..., n} and sgn σ, the “sign” or “signum” of a permutation σ, is +1 or −1 according to whether the minimum number of transpositions (pairwise interchanges) necessary to achieve it starting from {1, ..., n} is even or odd. We say that a permutation σ is even if sgn σ = +1; σ is odd if sgn σ = −1.
If sgn σ in (0.3.2.1) is replaced by certain other functions of σ, one obtains generalized matrix functions in place of det A. For example, the permanent of A, denoted by per A, is obtained by replacing sgn σ by the function that is identically +1.
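The presentation (0.3.2.1) can be turned directly into (very inefficient) code: sum over all n! permutations, weight each product of entries by the sign of the permutation, and compare with a standard determinant routine. A sketch assuming NumPy; the sign is computed by counting inversions, whose parity matches the parity of the number of transpositions:

```python
import numpy as np
from itertools import permutations

def det_by_permutations(A):
    """Evaluate (0.3.2.1) literally: sum of sgn(sigma) * prod a_{i,sigma(i)}.

    Exponential cost -- for illustration only on small matrices.
    """
    n = A.shape[0]
    total = 0.0
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        sign = -1.0 if inversions % 2 else 1.0
        total += sign * np.prod([A[i, sigma[i]] for i in range(n)])
    return total

A = np.array([[2., -1., 0.],
              [1.,  3., 4.],
              [0.,  5., 1.]])
print(det_by_permutations(A), np.linalg.det(A))   # the two values agree
```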

0.3.3 Elementary row and column operations. Three simple and fundamental operations on rows or columns, called elementary row and column operations, can be used to transform a matrix (square or not) into a simple form that facilitates such tasks as solving linear equations, determining rank, and calculating determinants and inverses of square matrices. We focus on row operations, which are implemented by matrices that act on the left. Column operations are defined and used in a similar fashion; the matrices that implement them act on the right.

Type 1: Interchange of two rows. For i ≠ j, interchange of rows i and j of A results from left multiplication of A by

\begin{bmatrix}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        & 0      & \cdots & 1      &        &   \\
  &        & \vdots & \ddots & \vdots &        &   \\
  &        & 1      & \cdots & 0      &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{bmatrix}

The two off-diagonal 1s are in the i, j and j, i positions, the two diagonal 0s are in positions i, i and j, j, and all unspecified entries are 0.

Type 2: Multiplication of a row by a nonzero scalar. Multiplication of row i of A by a nonzero scalar c results from left multiplication of A by

\begin{bmatrix}
1 &        &   &        &   \\
  & \ddots &   &        &   \\
  &        & c &        &   \\
  &        &   & \ddots &   \\
  &        &   &        & 1
\end{bmatrix}

The i, i entry is c, all other main diagonal entries are 1, and all unspecified entries are 0.

Type 3: Addition of a scalar multiple of one row to another row. For i ≠ j, addition of c times row i of A to row j of A results from left multiplication of A by

\begin{bmatrix}
1 &        &   &        &   \\
  & \ddots &   &        &   \\
  &        & 1 &        &   \\
  &        & c & \ddots &   \\
  &        &   &        & 1
\end{bmatrix}

The j, i entry is c, all main diagonal entries are 1, and all unspecified entries are 0. The displayed matrix illustrates the case in which j > i. The matrices of each of the three elementary row (or column) operations are just the result of applying the respective operation to the identity matrix I (on the left for a row operation; on the right for a column operation). The effect of a type 1 operation on the determinant is to multiply it by −1; the effect of a type 2 operation is to multiply it by the nonzero scalar c; a type 3 operation does not change the determinant. The determinant of a square matrix with a zero row is zero. The determinant of a square matrix is zero if and only if some subset of the rows of the matrix is linearly dependent.
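The three elementary matrices are obtained by applying the corresponding row operation to I, and their determinants are −1, c, and 1, respectively, as stated above. A small sketch assuming NumPy:

```python
import numpy as np

def type1(n, i, j):
    """Interchange rows i and j (determinant -1)."""
    E = np.eye(n); E[[i, j]] = E[[j, i]]; return E

def type2(n, i, c):
    """Multiply row i by the nonzero scalar c (determinant c)."""
    E = np.eye(n); E[i, i] = c; return E

def type3(n, i, j, c):
    """Add c times row i to row j (determinant 1)."""
    E = np.eye(n); E[j, i] = c; return E

n = 4
print(np.linalg.det(type1(n, 0, 2)))       # -1.0
print(np.linalg.det(type2(n, 1, 7.0)))     #  7.0
print(np.linalg.det(type3(n, 0, 3, 5.0)))  #  1.0

A = np.arange(16, dtype=float).reshape(n, n) + np.eye(n)
print(np.allclose(type1(n, 0, 2) @ A, A[[2, 1, 0, 3], :]))   # rows 0 and 2 swapped
```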

0.3.4 Reduced row echelon form. To each A = [aij] ∈ Mm,n(F) there corresponds a (unique) canonical form in Mm,n(F), the reduced row echelon form, also known as the Hermite normal form. If a row of A is nonzero, its leading entry is its first nonzero entry. The defining specifications of the RREF are as follows:

(a) Any zero rows occur at the bottom of the matrix.
(b) The leading entry of any nonzero row is a 1.
(c) All other entries in the column of a leading entry are zero.
(d) The leading entries occur in a stairstep pattern, left to right; that is, if row i is nonzero and aik is its leading entry, then either i = m, or row i + 1 is zero, or the leading entry in row i + 1 is a_{i+1,ℓ}, in which ℓ > k.

For example,

\begin{bmatrix}
0 & 1 & 1 & 0 & 0 & -2 \\
0 & 0 & 0 & 1 & 0 & \pi \\
0 & 0 & 0 & 0 & 1 & 4 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}

is in RREF. If R ∈ Mm,n(F) is the RREF of A, then R = EA, in which the nonsingular matrix E ∈ Mm(F) is a product of type 1, type 2, and type 3 elementary matrices corresponding to the sequence of elementary row operations performed to reduce A to RREF. The determinant of A ∈ Mn(F) is nonzero if and only if its RREF is In. The value of det A may be calculated by recording the effects on the determinant of each of the elementary operations that lead to the RREF.
For the system of linear equations Ax = b, with A ∈ Mm,n(F) and b ∈ Fm given and x ∈ Fn unknown, the set of solutions is unchanged if the same sequence of elementary row operations is performed on both A and b. The solutions of Ax = b are revealed by inspection of the RREF of [A b]. Since the RREF is unique, for given A1, A2 ∈ Mm,n and given b1, b2 ∈ Fm, consistent systems of linear equations A1x = b1 and A2x = b2 have the same set of solutions if and only if [A1 b1] and [A2 b2] have the same RREF.
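Computer algebra systems compute the RREF exactly over the rationals, which avoids the round-off issues mentioned in (0.4.3). A sketch assuming SymPy (not part of the text); rref() returns the reduced form together with the indices of the pivot columns:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 4],
               [2, 4, 0, 6],
               [1, 2, 2, 5]])

R, pivot_columns = A.rref()   # exact reduced row echelon form and pivot column indices
print(R)                      # Matrix([[1, 2, 0, 3], [0, 0, 1, 1], [0, 0, 0, 0]])
print(pivot_columns)          # (0, 2)
print(A.rank() == len(pivot_columns))   # True: rank = number of nonzero rows of the RREF
```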

0.3.5 Multiplicativity. A key property of the determinant function is that it is multiplicative: For A, B ∈ Mn(F)

det AB = det A det B

This may be proved using elementary operations that row-reduce both A and B.

0.3.6 Functional characterization of the determinant. If we think of the determinant as a function of each row (or column) of a matrix separately with the others fixed, the Laplace expansion (0.3.1.1) reveals that the determinant is a linear function of the entries in any one given row (column). We summarize this property by saying that the function A → det A is multilinear in the rows (columns) of A.

The determinant function A → det A is the unique function f : Mn(F) → F that is
(a) Multilinear in the rows of its argument
(b) Alternating: any type 1 operation on A changes the sign of f(A)
(c) Normalized: f(I) = 1
The permanent function is also multilinear (as are other generalized matrix functions), and it is normalized, but it is not alternating.

0.4 Rank

0.4.1 Definition. If A ∈ Mm,n(F), rank A = dim range A is the length of a longest linearly independent list of columns of A. There can be more than one linearly independent list of columns whose length equals the rank. It is a remarkable fact that rank A^T = rank A. Therefore, an equivalent definition of rank is the length of a longest linearly independent list of rows of A: row rank = column rank.

0.4.2 Rank and linear systems. Let A ∈ Mm,n(F) and b ∈ Fm be given. The linear system Ax = b may have no solution, exactly one solution, or infinitely many solutions; these are the only possibilities. If there is at least one solution, the linear system is consistent; if there is no solution, the linear system is inconsistent. The linear system Ax = b is consistent if and only if rank[A b] = rank A. The matrix [A b] ∈ Mm,n+1(F) is the augmented matrix. To say that the augmented matrix and the coefficient matrix A of a linear system have the same rank is just to say that b is a linear combination of the columns of A. In this case, appending b to the columns of A does not increase the rank. A solution of the linear system Ax = b is a vector x whose entries are the coefficients in a representation of b as a linear combination of the columns of A.

0.4.3 RREF and rank. Elementary operations do not change the rank of a matrix, and thus rank A is the same as the rank of the RREF of A, which is just the number of nonzero rows in the RREF. As a practical matter, however, numerical calculation of the rank by calculation of the RREF is unwise. Round-off errors in intermediate numerical calculations can make zero rows of the RREF appear to be nonzero, thereby affecting perception of the rank.

0.4.4 Characterizations of rank. The following statements about a given matrix A ∈ Mm,n(F) are equivalent; each can be useful in a different context. Note that in (b) and (c) the key issue is linear independence of lists of columns or rows of a matrix:

(a) rank A = k.
(b) k, and no more than k, rows of A are linearly independent.
(c) k, and no more than k, columns of A are linearly independent.
(d) Some k-by-k submatrix of A has nonzero determinant, and every (k + 1)-by-(k + 1) submatrix of A has zero determinant.
(e) dim(range A) = k.

(f) There are k, but no more than k, linearly independent vectors b1, ..., bk such that the linear system Ax = bj is consistent for each j = 1, ..., k.
(g) k = n − dim(nullspace A) (the rank-nullity theorem).
(h) k = min{p : A = XY^T for some X ∈ Mm,p(F), Y ∈ Mn,p(F)}.
(i) k = min{p : A = x1 y1^T + ··· + xp yp^T for some x1, ..., xp ∈ Fm and y1, ..., yp ∈ Fn}.

0.4.5 Rank inequalities. Some fundamental inequalities involving rank are:

(a) If A ∈ Mm,n(F), then rank A ≤ min{m, n}.
(b) If one or more rows and/or columns are deleted from a matrix, the rank of the resulting submatrix is not greater than the rank of the original matrix.
(c) Sylvester inequality: If A ∈ Mm,k(F) and B ∈ Mk,n(F), then

(rank A + rank B) − k ≤ rank AB ≤ min{rank A, rank B}

(d) The rank-sum inequality: If A, B ∈ Mm,n(F), then

|rank A − rank B| ≤ rank(A + B) ≤ rank A + rank B    (0.4.5.1)

with equality in the second inequality if and only if (range A) ∩ (range B) = {0} and (range A^T) ∩ (range B^T) = {0}. If rank B = 1, then

|rank(A + B) − rank A| ≤ 1    (0.4.5.2)

in particular, changing one entry of a matrix can change its rank by at most 1.
(e) Frobenius inequality: If A ∈ Mm,k(F), B ∈ Mk,p(F), and C ∈ Mp,n(F), then

rank AB + rank BC ≤ rank B + rank ABC

with equality if and only if there are matrices X and Y such that B = BCX + YAB.

0.4.6 Rank equalities. Some fundamental equalities involving rank are:

(a) If A ∈ Mm,n(C), then rank A^∗ = rank A^T = rank Ā = rank A.
(b) If A ∈ Mm(F) and C ∈ Mn(F) are nonsingular and B ∈ Mm,n(F), then rank AB = rank B = rank BC = rank ABC; that is, left or right multiplication by a nonsingular matrix leaves rank unchanged.
(c) If A, B ∈ Mm,n(F), then rank A = rank B if and only if there exist a nonsingular X ∈ Mm(F) and a nonsingular Y ∈ Mn(F) such that B = XAY.
(d) If A ∈ Mm,n(C), then rank A^∗A = rank A.
(e) Full-rank factorization: If A ∈ Mm,n(F), then rank A = k if and only if A = XY^T for some X ∈ Mm,k(F) and Y ∈ Mn,k(F) that each have independent columns. The equivalent factorization A = XBY^T for some nonsingular B ∈ Mk(F) can also be useful. In particular, rank A = 1 if and only if A = xy^T for some nonzero vectors x ∈ Fm and y ∈ Fn.
(f) If A ∈ Mm,n(F), then rank A = k if and only if there exist nonsingular matrices S ∈ Mm(F) and T ∈ Mn(F) such that

A = S \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} T

(g) Let A ∈ Mm,n(F). If X ∈ Mn,k(F) and Y ∈ Mm,k(F), and if W = Y^T AX is nonsingular, then

rank(A − AXW^{−1}Y^T A) = rank A − rank(AXW^{−1}Y^T A)    (0.4.6.1)

When k = 1, this is Wedderburn’s rank-one reduction formula: If x ∈ Fn and y ∈ Fm, and if ω = y^T Ax ≠ 0, then

rank(A − ω^{−1}Axy^T A) = rank A − 1    (0.4.6.2)

Conversely, if σ ∈ F, u ∈ Fn, v ∈ Fm, and rank(A − σuv^T) < rank A, then rank(A − σuv^T) = rank A − 1 and there are x ∈ Fn and y ∈ Fm such that u = Ax, v = A^T y, y^T Ax ≠ 0, and σ = (y^T Ax)^{−1}.
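Wedderburn’s formula (0.4.6.2) is easy to observe numerically: for generic x and y the scalar ω = y^T Ax is nonzero, and the rank-one correction lowers the rank by exactly 1. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
x = rng.standard_normal(4)
y = rng.standard_normal(5)

omega = y @ A @ x                        # nonzero for generic x, y
B = A - np.outer(A @ x, y @ A) / omega   # A - omega^{-1} A x y^T A

print(np.linalg.matrix_rank(A))   # 4
print(np.linalg.matrix_rank(B))   # 3: the rank drops by exactly one, as in (0.4.6.2)
```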

0.5 Nonsingularity

A linear transformation or matrix is said to be nonsingular if it produces the output 0 only for the input 0. Otherwise, it is singular. If A ∈ Mm,n(F) and m < n, then A is necessarily singular. An A ∈ Mn(F) is invertible if there is a matrix A^{−1} ∈ Mn(F) (the inverse of A) such that A^{−1}A = I. If A ∈ Mn and A^{−1}A = I, then AA^{−1} = I; that is, A^{−1} is a left inverse if and only if it is a right inverse; A^{−1} is unique whenever it exists.
It is useful to be able to call on a variety of criteria for a square matrix to be nonsingular. The following are equivalent for a given A ∈ Mn(F):
(a) A is nonsingular.
(b) A^{−1} exists.
(c) rank A = n.
(d) The rows of A are linearly independent.
(e) The columns of A are linearly independent.
(f) det A ≠ 0.
(g) The dimension of the range of A is n.
(h) The dimension of the null space of A is 0.
(i) Ax = b is consistent for each b ∈ Fn.
(j) If Ax = b is consistent, then the solution is unique.
(k) Ax = b has a unique solution for each b ∈ Fn.
(l) The only solution to Ax = 0 is x = 0.
(m) 0 is not an eigenvalue of A (see Chapter 1).
The conditions (g) and (h) are equivalent for a linear transformation T : V → V on a finite dimensional vector space V; that is, Tx = y has a solution x for every y ∈ V if and only if the only x such that Tx = 0 is x = 0 if and only if Tx = y has a unique solution x for every y ∈ V.
The nonsingular matrices in Mn(F) form a group, the general linear group, often denoted by GL(n, F).
If A ∈ Mn(F) is nonsingular, then ((A^{−1})^T A^T)^T = A(A^{−1}) = I, so (A^{−1})^T A^T = I, which means that (A^{−1})^T = (A^T)^{−1}. It is convenient to write either (A^{−1})^T or (A^T)^{−1} as A^{−T}. If A ∈ Mn(C) is nonsingular, then (A^{−1})^∗ = (A^∗)^{−1}, and we may safely write either as A^{−∗}.

0.6 The Euclidean inner product and norm

0.6.1 Definitions. The scalar ⟨x, y⟩ = y^∗x is the Euclidean inner product (standard inner product, usual inner product, scalar product, dot product) of x, y ∈ Cn. The Euclidean norm (usual norm, Euclidean length) function on Cn is the real-valued function ‖x‖₂ = ⟨x, x⟩^{1/2} = (x^∗x)^{1/2}; two important properties of this function are that ‖x‖₂ > 0 for all nonzero x ∈ Cn and ‖αx‖₂ = |α| ‖x‖₂ for all x ∈ Cn and all α ∈ C. The function ⟨·, ·⟩ : Cn × Cn → C is linear in the first argument and conjugate linear in the second; that is, ⟨αx1 + βx2, y⟩ = α⟨x1, y⟩ + β⟨x2, y⟩ and ⟨x, αy1 + βy2⟩ = ᾱ⟨x, y1⟩ + β̄⟨x, y2⟩ for all α, β ∈ C and y1, y2 ∈ Cn. If V is a real or complex vector space and f : V × V → C is a function that is linear in its first argument and conjugate linear in its second argument, we say that f is sesquilinear on V; f is a semi-inner product on V if it is sesquilinear on V and f(x, x) ≥ 0 for every x ∈ V; f is an inner product on V if it is sesquilinear on V and f(x, x) > 0 for every nonzero x ∈ V. An inner product space is a pair (V, f) in which V is a real or complex vector space and f is an inner product on V.

0.6.2 Orthogonality and orthonormality. Two vectors x, y ∈ Cn are orthogonal if ⟨x, y⟩ = 0. In R2 and R3, “orthogonal” has the conventional geometric interpretation of “perpendicular.” A list of vectors x1, ..., xm ∈ Cn is said to be orthogonal if ⟨xi, xj⟩ = 0 for all distinct i, j ∈ {1, ..., m}. An orthogonal list of nonzero vectors is linearly independent. A vector whose Euclidean norm is 1 is said to be normalized (a unit vector). For any nonzero x ∈ Cn, x/‖x‖₂ is a unit vector. An orthogonal list of vectors is an orthonormal list if each of its elements is a unit vector. An orthonormal list of vectors is linearly independent. Each of these concepts has a straightforward generalization to the context of an inner product space.

0.6.3 The Cauchy–Schwarz inequality. The Cauchy–Schwarz inequality states that
$$|\langle x, y\rangle| \le \|x\|_2\,\|y\|_2$$
for all $x, y \in \mathbb{C}^n$, with equality if and only if one of the vectors is a scalar multiple of the other. The angle $\theta$ between two nonzero real vectors $x, y \in \mathbb{R}^n$ is defined by
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|_2\,\|y\|_2}, \qquad 0 \le \theta \le \pi \qquad (0.6.3.1)$$

0.6.4 Gram–Schmidt orthonormalization. Any finite independent list of vectors in an inner product space may be replaced by an orthonormal list with the same span. This replacement may be carried out in many ways, but there is a systematic way to do so that has a useful special property. The Gram–Schmidt process starts with a list of vectors $x_1, \dots, x_n$ and (if the given list is linearly independent) produces an orthonormal list of vectors $z_1, \dots, z_n$ such that $\operatorname{span}\{z_1, \dots, z_k\} = \operatorname{span}\{x_1, \dots, x_k\}$ for each $k = 1, \dots, n$. The vectors $z_i$ may be calculated in turn as follows: Let $y_1 = x_1$ and normalize it: $z_1 = y_1/\|y_1\|_2$. Let $y_2 = x_2 - \langle x_2, z_1\rangle z_1$ ($y_2$ is orthogonal to $z_1$) and normalize it: $z_2 = y_2/\|y_2\|_2$. Once $z_1, \dots, z_{k-1}$ have been determined, the vector
$$y_k = x_k - \langle x_k, z_{k-1}\rangle z_{k-1} - \langle x_k, z_{k-2}\rangle z_{k-2} - \cdots - \langle x_k, z_1\rangle z_1$$
is orthogonal to $z_1, \dots, z_{k-1}$; normalize it: $z_k = y_k/\|y_k\|_2$. Continue until $k = n$. If we denote $Z = [z_1\ \dots\ z_n]$ and $X = [x_1\ \dots\ x_n]$, the Gram–Schmidt process gives a factorization $X = ZR$, in which the square matrix $R = [r_{ij}]$ is nonsingular and upper triangular; that is, $r_{ij} = 0$ whenever $i > j$.

If the vectors $x_1, \dots, x_k$ are orthonormal and the vectors $x_1, \dots, x_k, x_{k+1}, \dots, x_n$ are linearly independent, applying the Gram–Schmidt process to the latter list produces the list $x_1, \dots, x_k, z_{k+1}, \dots, z_n$ of orthonormal vectors.

The Gram–Schmidt process may be applied to any finite list of vectors, independent or not. If $x_1, \dots, x_n$ are linearly dependent, the Gram–Schmidt process produces a vector $y_k = 0$ for the least value of $k$ for which $x_k$ is a linear combination of $x_1, \dots, x_{k-1}$.

0.6.5 Orthonormal bases. An orthonormal basis of an inner product space is a basis whose elements constitute an orthonormal list. Since any finite ordered basis may be transformed with the Gram–Schmidt process to an orthonormal basis, any finite-dimensional inner product space has an orthonormal basis, and any orthonormal list may be extended to an orthonormal basis. Such a basis is pleasant to work with, since the cross terms in inner product calculations all vanish.
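The Gram–Schmidt process translates directly into code. The sketch below is an added illustration (the helper name gram_schmidt is hypothetical, and NumPy is assumed); it produces the orthonormal list $Z$ and the nonsingular upper triangular $R$ with $X = ZR$.

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt on the columns of X (assumed linearly independent).
    Returns Z with orthonormal columns and upper-triangular R such that X = Z R."""
    n_rows, n_cols = X.shape
    Z = np.zeros((n_rows, n_cols), dtype=complex)
    R = np.zeros((n_cols, n_cols), dtype=complex)
    for k in range(n_cols):
        y = X[:, k].astype(complex)
        for j in range(k):
            R[j, k] = np.vdot(Z[:, j], X[:, k])   # <x_k, z_j> = z_j^* x_k
            y = y - R[j, k] * Z[:, j]
        R[k, k] = np.linalg.norm(y)
        Z[:, k] = y / R[k, k]
    return Z, R

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Z, R = gram_schmidt(X)
assert np.allclose(Z.conj().T @ Z, np.eye(3))   # orthonormal columns
assert np.allclose(Z @ R, X)                    # X = Z R, R upper triangular
```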

0.6.6 Orthogonal complements. Given any set $S \subset \mathbb{C}^n$, its orthogonal complement is the set $S^\perp = \{x \in \mathbb{C}^n : x^*y = 0 \text{ for all } y \in S\}$ if $S$ is nonempty; if $S$ is empty, then $S^\perp = \mathbb{C}^n$. In either case, $S^\perp = (\operatorname{span} S)^\perp$. Even if $S$ is not a subspace, $S^\perp$ is always a subspace. We have $(S^\perp)^\perp = \operatorname{span} S$, and $(S^\perp)^\perp = S$ if $S$ is a subspace. It is always the case that $\dim(\operatorname{span} S) + \dim(S^\perp) = n$. If $S_1$ and $S_2$ are subspaces, then $(S_1 + S_2)^\perp = S_1^\perp \cap S_2^\perp$.

For a given $A \in M_{m,n}$, $\operatorname{range} A$ is the orthogonal complement of $\operatorname{nullspace} A^*$. Therefore, for a given $b \in \mathbb{C}^m$, the linear system $Ax = b$ has a solution (not necessarily unique) if and only if $b^*z = 0$ for every $z \in \mathbb{C}^m$ such that $A^*z = 0$. This equivalence is sometimes stated as the Fredholm alternative (theorem of the alternative): exactly one of the following two statements is true: either (1) $Ax = b$ has a solution, or (2) $A^*y = 0$ has a solution such that $y^*b \ne 0$.

If $A \in M_{m,n}$ and $B \in M_{m,q}$, if $X \in M_{m,r}$ and $Y \in M_{m,s}$, and if $\operatorname{range} X = \operatorname{nullspace} A^*$ and $\operatorname{range} Y = \operatorname{nullspace} B^*$, then we have the following companion to (0.2.7.1) and (0.2.7.2):
$$\operatorname{range} A \cap \operatorname{range} B = \operatorname{nullspace}\begin{bmatrix} X^* \\ Y^* \end{bmatrix} \qquad (0.6.6.1)$$

0.7 Partitioned sets and matrices

A partition of a set $\mathcal{S}$ is a collection of subsets of $\mathcal{S}$ such that each element of $\mathcal{S}$ is a member of one and only one of the subsets. For example, a partition of the set $\{1, 2, \dots, n\}$ is a collection of subsets $\alpha_1, \dots, \alpha_t$ (called index sets) such that each integer between 1 and $n$ is in one and only one of the index sets. A sequential partition of $\{1, 2, \dots, n\}$ is a partition in which the index sets have the special form $\alpha_1 = \{1, \dots, i_1\}$, $\alpha_2 = \{i_1 + 1, \dots, i_2\}$, ..., $\alpha_t = \{i_{t-1} + 1, \dots, n\}$.

A partition of a matrix is a decomposition of the matrix into submatrices such that each entry of the original matrix is in one and only one of the submatrices. Partitioning of matrices is often a convenient device for perception of useful structure. For example, partitioning $B = [b_1\ \dots\ b_n] \in M_n(\mathbf{F})$ according to its columns reveals the presentation $AB = [Ab_1\ \dots\ Ab_n]$ of the matrix product, partitioned according to the columns of $AB$.

0.7.1 Submatrices. Let $A \in M_{m,n}(\mathbf{F})$. For index sets $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$, we denote by $A[\alpha, \beta]$ the (sub)matrix of entries that lie in the rows of $A$ indexed by $\alpha$ and the columns indexed by $\beta$. For example,
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}[\{1, 3\}, \{1, 2, 3\}] = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}$$
If $\alpha = \beta$, the submatrix $A[\alpha] = A[\alpha, \alpha]$ is a principal submatrix of $A$. An $n$-by-$n$ matrix has $\binom{n}{k}$ distinct principal submatrices of size $k$. For $A \in M_n(\mathbf{F})$ and $k \in \{1, \dots, n\}$, $A[\{1, \dots, k\}]$ is a leading principal submatrix and $A[\{k, \dots, n\}]$ is a trailing principal submatrix.

It is often convenient to indicate a submatrix or principal submatrix via deletion, rather than inclusion, of rows or columns. This may be accomplished by complementing the index sets. Let $\alpha^c = \{1, \dots, m\} \setminus \alpha$ and $\beta^c = \{1, \dots, n\} \setminus \beta$ denote the index sets complementary to $\alpha$ and $\beta$, respectively. Then $A[\alpha^c, \beta^c]$ is the submatrix obtained by deleting the rows indexed by $\alpha$ and the columns indexed by $\beta$. For example, the submatrix $A[\alpha, \emptyset^c]$ contains the rows of $A$ indexed by $\alpha$; $A[\emptyset^c, \beta]$ contains the columns of $A$ indexed by $\beta$.

The determinant of an $r$-by-$r$ submatrix of $A$ is called a minor; if we wish to indicate the size of the submatrix, we call its determinant a minor of size $r$. If the $r$-by-$r$ submatrix is a principal submatrix, then its determinant is a principal minor (of size $r$); if the submatrix is a leading principal submatrix, then its determinant is a leading principal minor; if the submatrix is a trailing principal submatrix, then its determinant is a trailing principal minor. By convention, the empty principal minor is 1; that is, $\det A[\emptyset] = 1$.

A signed minor, such as those appearing in the Laplace expansion (0.3.1.1), $(-1)^{i+j}\det A_{ij}$, is called a cofactor; if we wish to indicate the size of the submatrix, we call its signed determinant a cofactor of size $r$.

0.7.2 Partitions, block matrices, and multiplication. If $\alpha_1, \dots, \alpha_t$ constitute a partition of $\{1, \dots, m\}$ and $\beta_1, \dots, \beta_s$ constitute a partition of $\{1, \dots, n\}$, then the matrices $A[\alpha_i, \beta_j]$, $1 \le i \le t$, $1 \le j \le s$, form a partition of the matrix $A \in M_{m,n}(\mathbf{F})$. If $A \in M_{m,n}(\mathbf{F})$ and $B \in M_{n,p}(\mathbf{F})$ are partitioned so that the two partitions of $\{1, \dots, n\}$ coincide, the two matrix partitions are said to be conformal. In this event,
$$(AB)[\alpha_i, \gamma_j] = \sum_{k=1}^{s}A[\alpha_i, \beta_k]\,B[\beta_k, \gamma_j] \qquad (0.7.2.1)$$
in which the respective collections of submatrices $A[\alpha_i, \beta_k]$ and $B[\beta_k, \gamma_j]$ are conformal partitions of $A$ and $B$, respectively. The left-hand side of (0.7.2.1) is a submatrix of the product $AB$ (calculated in the usual way), and each summand on the right-hand side is a standard matrix product. Thus, multiplication of conformally partitioned matrices mimics usual matrix multiplication. The sum of two partitioned matrices $A, B \in M_{m,n}(\mathbf{F})$ of the same size has a similarly pleasant representation if the partitions of their rows (respectively, of their columns) are the same:
$$(A + B)[\alpha_i, \beta_j] = A[\alpha_i, \beta_j] + B[\alpha_i, \beta_j]$$
If a matrix is partitioned by sequential partitions of its rows and columns, the resulting partitioned matrix is called a block matrix. For example, if the rows and columns of $A \in M_n(\mathbf{F})$ are partitioned by the same sequential partition $\alpha_1 = \{1, \dots, k\}$, $\alpha_2 = \{k+1, \dots, n\}$, the resulting block matrix is
$$A = \begin{bmatrix} A[\alpha_1, \alpha_1] & A[\alpha_1, \alpha_2] \\ A[\alpha_2, \alpha_1] & A[\alpha_2, \alpha_2] \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
in which the blocks are $A_{ij} = A[\alpha_i, \alpha_j]$. Computations with block matrices are employed throughout the book; 2-by-2 block matrices are the most important and useful.

0.7.3 The inverse of a partitioned matrix. It can be useful to know the corresponding blocks in the inverse of a partitioned nonsingular matrix $A$, that is, to present the inverse of a partitioned matrix in conformally partitioned form. This may be done in a variety of apparently different, but equivalent, ways, assuming that certain submatrices of $A \in M_n(\mathbf{F})$ and $A^{-1}$ are also nonsingular. For simplicity, let $A$ be partitioned as a 2-by-2 block matrix
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$
with $A_{ii} \in M_{n_i}(\mathbf{F})$, $i = 1, 2$, and $n_1 + n_2 = n$. A useful expression for the correspondingly partitioned presentation of $A^{-1}$ is
$$A^{-1} = \begin{bmatrix} \left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)^{-1} & -A_{11}^{-1}A_{12}\left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)^{-1} \\ -A_{22}^{-1}A_{21}\left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)^{-1} & \left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)^{-1} \end{bmatrix} \qquad (0.7.3.1)$$
assuming that all the relevant inverses exist. This expression for $A^{-1}$ may be verified by doing a partitioned multiplication by $A$ and then simplifying. In general index set notation, we may write
$$A^{-1}[\alpha] = \left(A[\alpha] - A[\alpha, \alpha^c]A[\alpha^c]^{-1}A[\alpha^c, \alpha]\right)^{-1}$$
and
$$A^{-1}[\alpha, \alpha^c] = -A[\alpha]^{-1}A[\alpha, \alpha^c]\left(A[\alpha^c] - A[\alpha^c, \alpha]A[\alpha]^{-1}A[\alpha, \alpha^c]\right)^{-1} = -\left(A[\alpha] - A[\alpha, \alpha^c]A[\alpha^c]^{-1}A[\alpha^c, \alpha]\right)^{-1}A[\alpha, \alpha^c]A[\alpha^c]^{-1}$$
again assuming that the relevant inverses exist. There is an intimate relationship between these representations and the Schur complement; see (0.8.5). Notice that $A^{-1}[\alpha]$ is a submatrix of $A^{-1}$, while $A[\alpha]^{-1}$ is the inverse of a submatrix of $A$; these two objects are not, in general, the same.
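The 2-by-2 block formula (0.7.3.1) can be verified numerically. The sketch below is an added NumPy illustration on a randomly generated matrix; each block of the inverse is assembled from the formula and compared with a direct inversion.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 2, 3
A = rng.standard_normal((n1 + n2, n1 + n2))
A11, A12 = A[:n1, :n1], A[:n1, n1:]
A21, A22 = A[n1:, :n1], A[n1:, n1:]

inv = np.linalg.inv
# Blocks of A^{-1} as in (0.7.3.1), assuming the relevant inverses exist
B11 = inv(A11 - A12 @ inv(A22) @ A21)
B12 = -inv(A11) @ A12 @ inv(A22 - A21 @ inv(A11) @ A12)
B21 = -inv(A22) @ A21 @ inv(A11 - A12 @ inv(A22) @ A21)
B22 = inv(A22 - A21 @ inv(A11) @ A12)

block_inverse = np.block([[B11, B12], [B21, B22]])
assert np.allclose(block_inverse, inv(A))
```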

0.7.4 The Sherman–Morrison–Woodbury formula. Suppose that a nonsingular matrix $A \in M_n(\mathbf{F})$ has a known inverse $A^{-1}$ and consider $B = A + XRY$, in which $X$ is $n$-by-$r$, $Y$ is $r$-by-$n$, and $R$ is $r$-by-$r$ and nonsingular. If $B$ and $R^{-1} + YA^{-1}X$ are nonsingular, then
$$B^{-1} = A^{-1} - A^{-1}X\left(R^{-1} + YA^{-1}X\right)^{-1}YA^{-1} \qquad (0.7.4.1)$$
If $r$ is much smaller than $n$, then $R$ and $R^{-1} + YA^{-1}X$ may be much easier to invert than $B$. For example, if $x, y \in \mathbf{F}^n$ are nonzero vectors, $X = x$, $Y = y^T$, $y^TA^{-1}x \ne -1$, and $R = [1]$, then (0.7.4.1) becomes a formula for the inverse of a rank-1 adjustment to $A$:
$$\left(A + xy^T\right)^{-1} = A^{-1} - \left(1 + y^TA^{-1}x\right)^{-1}A^{-1}xy^TA^{-1} \qquad (0.7.4.2)$$
In particular, if $B = I + xy^T$ for $x, y \in \mathbf{F}^n$ and $y^Tx \ne -1$, then $B^{-1} = I - (1 + y^Tx)^{-1}xy^T$.
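A small numerical check of (0.7.4.1) and (0.7.4.2) follows; it is an added NumPy illustration with arbitrary matrices chosen so that the relevant inverses exist.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
X = rng.standard_normal((n, r))
Y = rng.standard_normal((r, n))
R = np.eye(r)

A_inv = np.linalg.inv(A)
B = A + X @ R @ Y

# (0.7.4.1): update A^{-1} by inverting only the small r-by-r matrix R^{-1} + Y A^{-1} X
small = np.linalg.inv(np.linalg.inv(R) + Y @ A_inv @ X)
B_inv = A_inv - A_inv @ X @ small @ Y @ A_inv
assert np.allclose(B_inv, np.linalg.inv(B))

# Rank-1 special case (0.7.4.2)
x, y = rng.standard_normal(n), rng.standard_normal(n)
lhs = np.linalg.inv(A + np.outer(x, y))
rhs = A_inv - (A_inv @ np.outer(x, y) @ A_inv) / (1 + y @ A_inv @ x)
assert np.allclose(lhs, rhs)
```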

0.7.5 Complementary nullities. Suppose that $A \in M_n(\mathbf{F})$ is nonsingular, let $\alpha$ and $\beta$ be nonempty subsets of $\{1, \dots, n\}$, and write $|\alpha| = r$ and $|\beta| = s$ for the cardinalities of $\alpha$ and $\beta$. The law of complementary nullities is
$$\operatorname{nullity}\left(A[\alpha, \beta]\right) = \operatorname{nullity}\left(A^{-1}[\beta^c, \alpha^c]\right) \qquad (0.7.5.1)$$
which is equivalent to the rank identity
$$\operatorname{rank}\left(A[\alpha, \beta]\right) = \operatorname{rank}\left(A^{-1}[\beta^c, \alpha^c]\right) + r + s - n \qquad (0.7.5.2)$$
Since we can permute rows and columns to place first the $r$ rows indexed by $\alpha$ and the $s$ columns indexed by $\beta$, it suffices to consider the presentations
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad\text{and}\quad A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$
in which $A_{11}$ and $B_{11}^T$ are $r$-by-$s$ and $A_{22}$ and $B_{22}^T$ are $(n-r)$-by-$(n-s)$. Then (0.7.5.1) says that $\operatorname{nullity} A_{11} = \operatorname{nullity} B_{22}$.

The underlying principle here is very simple. Suppose that the nullity of $A_{11}$ is $k$. If $k \ge 1$, let the columns of $X \in M_{s,k}(\mathbf{F})$ be a basis for the null space of $A_{11}$. Since $A$ is nonsingular,
$$A\begin{bmatrix} X \\ 0 \end{bmatrix} = \begin{bmatrix} A_{11}X \\ A_{21}X \end{bmatrix} = \begin{bmatrix} 0 \\ A_{21}X \end{bmatrix}$$
has full rank, so $A_{21}X$ has $k$ independent columns. But
$$\begin{bmatrix} B_{12}(A_{21}X) \\ B_{22}(A_{21}X) \end{bmatrix} = A^{-1}\begin{bmatrix} 0 \\ A_{21}X \end{bmatrix} = A^{-1}A\begin{bmatrix} X \\ 0 \end{bmatrix} = \begin{bmatrix} X \\ 0 \end{bmatrix}$$
so $B_{22}(A_{21}X) = 0$ and hence $\operatorname{nullity} B_{22} \ge k = \operatorname{nullity} A_{11}$, a statement that is trivially correct if $k = 0$. A similar argument starting with $B_{22}$ shows that $\operatorname{nullity} A_{11} \ge \operatorname{nullity} B_{22}$. For a different approach, see (3.5.P13).

Of course, (0.7.5.1) also tells us that $\operatorname{nullity} A_{12} = \operatorname{nullity} B_{12}$, $\operatorname{nullity} A_{21} = \operatorname{nullity} B_{21}$, and $\operatorname{nullity} A_{22} = \operatorname{nullity} B_{11}$. If $r + s = n$, then $\operatorname{rank} A_{11} = \operatorname{rank} B_{22}$ and $\operatorname{rank} A_{22} = \operatorname{rank} B_{11}$, while if $n = 2r = 2s$, then we also have $\operatorname{rank} A_{12} = \operatorname{rank} B_{12}$ and $\operatorname{rank} A_{21} = \operatorname{rank} B_{21}$. Finally, (0.7.5.2) tells us that the rank of an $r$-by-$s$ submatrix of an $n$-by-$n$ nonsingular matrix is at least $r + s - n$.
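The rank identity (0.7.5.2) is easy to test numerically. The following sketch is an added illustration (NumPy; a random nonsingular matrix and all index sets of the chosen cardinalities), not part of the text.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
A_inv = np.linalg.inv(A)

def check(alpha, beta):
    # rank A[alpha, beta] = rank A^{-1}[beta^c, alpha^c] + r + s - n   (0.7.5.2)
    alpha_c = [i for i in range(n) if i not in alpha]
    beta_c = [j for j in range(n) if j not in beta]
    lhs = np.linalg.matrix_rank(A[np.ix_(alpha, beta)])
    rhs = np.linalg.matrix_rank(A_inv[np.ix_(beta_c, alpha_c)]) + len(alpha) + len(beta) - n
    return lhs == rhs

assert all(check(list(a), list(b))
           for a in combinations(range(n), 2)
           for b in combinations(range(n), 3))
```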

0.7.6 Rank in a partitioned matrix and rank-principal matrices. Partition $A \in M_n(\mathbf{F})$ as
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in M_r(\mathbf{F}),\ A_{22} \in M_{n-r}(\mathbf{F})$$
If $A_{11}$ is nonsingular, then of course $\operatorname{rank}[A_{11}\ A_{12}] = r$ and $\operatorname{rank}\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} = r$. Remarkably, the converse is true:
$$\text{if } \operatorname{rank} A = \operatorname{rank}[A_{11}\ A_{12}] = \operatorname{rank}\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}, \text{ then } A_{11} \text{ is nonsingular} \qquad (0.7.6.1)$$
This follows from (0.4.6(c)): If $A_{11}$ is singular, then $\operatorname{rank} A_{11} = k < r$, and there are nonsingular $S, T \in M_r(\mathbf{F})$ such that
$$SA_{11}T = \begin{bmatrix} I_k & 0 \\ 0 & 0_{r-k} \end{bmatrix}$$
Therefore,
$$\hat{A} = \begin{bmatrix} S & 0 \\ 0 & I_{n-r} \end{bmatrix}A\begin{bmatrix} T & 0 \\ 0 & I_{n-r} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & 0_{r-k} \end{bmatrix} & SA_{12} \\ A_{21}T & A_{22} \end{bmatrix}$$
has rank $r$, as do its first block row and column. Because the $r$th row of the first block column of $\hat{A}$ is zero, there must be some column in $SA_{12}$ whose $r$th entry is not zero, which means that $\hat{A}$ has at least $r + 1$ independent columns. This contradicts $\operatorname{rank}\hat{A} = \operatorname{rank} A = r$, so $A_{11}$ must be nonsingular.

Let $A \in M_{m,n}(\mathbf{F})$ and suppose that $\operatorname{rank} A = r > 0$. Let $A = XY^T$ be a full-rank factorization with $X \in M_{m,r}(\mathbf{F})$ and $Y \in M_{n,r}(\mathbf{F})$; see (0.4.6c). Let $\alpha, \beta \subseteq \{1, \dots, m\}$ and $\gamma, \delta \subseteq \{1, \dots, n\}$ be index sets of cardinality $r$. Then $A[\alpha, \gamma] = X[\alpha, \emptyset^c]\,Y[\gamma, \emptyset^c]^T \in M_r(\mathbf{F})$, which is nonsingular whenever $\operatorname{rank} X[\alpha, \emptyset^c] = \operatorname{rank} Y[\gamma, \emptyset^c] = r$. The multiplicativity property (0.3.5) ensures that
$$\det A[\alpha, \gamma]\,\det A[\beta, \delta] = \det A[\alpha, \delta]\,\det A[\beta, \gamma] \qquad (0.7.6.2)$$
Suppose that $A \in M_n(\mathbf{F})$ and $\operatorname{rank} A = r$. We say that $A$ is rank principal if it has a nonsingular $r$-by-$r$ principal submatrix. It follows from (0.7.6.1) that if there is some index set $\alpha \subset \{1, \dots, n\}$ such that
$$\operatorname{rank} A = \operatorname{rank} A[\alpha, \emptyset^c] = \operatorname{rank} A[\emptyset^c, \alpha] \qquad (0.7.6.3)$$
(that is, if there are $r$ linearly independent rows of $A$ such that the corresponding $r$ columns are linearly independent), then $A$ is rank principal; moreover, $A[\alpha]$ is nonsingular.

If $A \in M_n(\mathbf{F})$ is symmetric or skew symmetric, or if $A \in M_n(\mathbb{C})$ is Hermitian or skew Hermitian, then $\operatorname{rank} A[\alpha, \emptyset^c] = \operatorname{rank} A[\emptyset^c, \alpha]$ for every index set $\alpha$, so $A$ satisfies (0.7.6.3) and is therefore rank principal.

0.7.7 Commutativity, anticommutativity, and block diagonal matrices. Two matrices $A, B \in M_n(\mathbf{F})$ are said to commute if $AB = BA$. Commutativity is not typical, but one important instance is encountered frequently. Suppose that $\Lambda = [\Lambda_{ij}]_{i,j=1}^{s} \in M_n(\mathbf{F})$ is a block matrix in which $\Lambda_{ij} = 0$ if $i \ne j$; $\Lambda_{ii} = \lambda_iI_{n_i}$ for some $\lambda_i \in \mathbf{F}$ for each $i = 1, \dots, s$; and $\lambda_i \ne \lambda_j$ if $i \ne j$. Partition $B = [B_{ij}]_{i,j=1}^{s} \in M_n(\mathbf{F})$ conformally with $\Lambda$. Then $\Lambda B = B\Lambda$ if and only if $\lambda_iB_{ij} = B_{ij}\lambda_j$ for each $i, j = 1, \dots, s$, that is, $(\lambda_i - \lambda_j)B_{ij} = 0$ for each $i, j = 1, \dots, s$. These identities are satisfied if and only if $B_{ij} = 0$ whenever $i \ne j$. Thus, $\Lambda$ commutes with $B$ if and only if $B$ is block diagonal conformal with $\Lambda$; see (0.9.2).

Two matrices $A, B \in M_n(\mathbf{F})$ are said to anticommute if $AB = -BA$. For example, the matrices $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ anticommute.

0.7.8 The vec mapping. Partition a matrix $A \in M_{m,n}(\mathbf{F})$ according to its columns: $A = [a_1\ \dots\ a_n]$. The mapping $\operatorname{vec}: M_{m,n}(\mathbf{F}) \to \mathbf{F}^{mn}$ is
$$\operatorname{vec} A = [a_1^T\ \dots\ a_n^T]^T$$
that is, $\operatorname{vec} A$ is the vector obtained by stacking the columns of $A$, left to right. The vec operator can be a convenient tool in problems involving matrix equations.

0.8 Determinants again

Some additional facts about and identities for the determinant are useful for reference.

0.8.1 Compound matrices. Let $A \in M_{m,n}(\mathbf{F})$. Let $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$ be index sets of cardinality $r \le \min\{m, n\}$. The $\binom{m}{r}$-by-$\binom{n}{r}$ matrix whose $\alpha, \beta$ entry is $\det A[\alpha, \beta]$ is called the $r$th compound matrix of $A$ and is denoted by $C_r(A)$. In forming the rows and columns of $C_r(A)$, we arrange the index sets lexicographically, that is, $\{1, 2, 4\}$ before $\{1, 2, 5\}$ before $\{1, 3, 4\}$, and so on. For example, if
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix} \qquad (0.8.1.0)$$
then
$$C_2(A) = \begin{bmatrix} \det\begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} & \det\begin{bmatrix} 1 & 3 \\ 4 & 6 \end{bmatrix} & \det\begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix} \\[4pt] \det\begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix} & \det\begin{bmatrix} 1 & 3 \\ 7 & 10 \end{bmatrix} & \det\begin{bmatrix} 2 & 3 \\ 8 & 10 \end{bmatrix} \\[4pt] \det\begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} & \det\begin{bmatrix} 4 & 6 \\ 7 & 10 \end{bmatrix} & \det\begin{bmatrix} 5 & 6 \\ 8 & 10 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} -3 & -6 & -3 \\ -6 & -11 & -4 \\ -3 & -2 & 2 \end{bmatrix}$$
If $A \in M_{m,k}(\mathbf{F})$, $B \in M_{k,n}(\mathbf{F})$, and $r \le \min\{m, k, n\}$, it follows from the Cauchy–Binet formula (0.8.7) that
$$C_r(AB) = C_r(A)C_r(B) \qquad (0.8.1.1)$$
which is the multiplicativity property of the $r$th compound matrix.

We define $C_0(A) = 1$. We have $C_1(A) = A$; if $A \in M_n(\mathbf{F})$, then $C_n(A) = \det A$.
If $A \in M_{m,k}(\mathbf{F})$ and $t \in \mathbf{F}$, then $C_r(tA) = t^rC_r(A)$.
If $1 \le r \le n$, then $C_r(I_n) = I_{\binom{n}{r}} \in M_{\binom{n}{r}}$.
If $A \in M_n$ is nonsingular and $1 \le r \le n$, then $C_r(A)^{-1} = C_r(A^{-1})$.
If $A \in M_n$ and $1 \le r \le n$, then $\det C_r(A) = (\det A)^{\binom{n-1}{r-1}}$.
If $A \in M_{m,n}(\mathbf{F})$ and $r = \operatorname{rank} A$, then $\operatorname{rank} C_r(A) = 1$.
If $A \in M_{m,n}(\mathbf{F})$ and $1 \le r \le \min\{m, n\}$, then $C_r(A^T) = C_r(A)^T$.
If $A \in M_{m,n}(\mathbb{C})$ and $1 \le r \le \min\{m, n\}$, then $C_r(A^*) = C_r(A)^*$.
If $\Delta = [d_{ij}] \in M_n(\mathbf{F})$ is upper (respectively, lower) triangular (see (0.9.3)), then $C_r(\Delta)$ is upper (respectively, lower) triangular; its main diagonal entries are the $\binom{n}{r}$ possible products of $r$ entries chosen from the list $d_{11}, \dots, d_{nn}$, that is, they are the $\binom{n}{r}$ scalars $d_{i_1i_1}\cdots d_{i_ri_r}$ such that $1 \le i_1 < \cdots < i_r \le n$, arranged lexicographically. Consequently, if $D = \operatorname{diag}(d_1, \dots, d_n) \in M_n(\mathbf{F})$ is diagonal, then so is $C_r(D)$; its main diagonal entries are the $\binom{n}{r}$ possible products of $r$ entries chosen from the list $d_1, \dots, d_n$, that is, they are the $\binom{n}{r}$ scalars $d_{i_1}\cdots d_{i_r}$ such that $1 \le i_1 < \cdots < i_r \le n$, arranged lexicographically. See chapter 6 of (Fiedler, 1986) for a detailed discussion of compound matrices.
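The definition of $C_r(A)$ and the multiplicativity property (0.8.1.1) can be checked with a few lines of code. The helper compound below is a hypothetical NumPy illustration added here, not a library routine.

```python
import numpy as np
from itertools import combinations
from math import comb

def compound(A, r):
    """r-th compound matrix: determinants of all r-by-r submatrices of A,
    with index sets ordered lexicographically."""
    m, n = A.shape
    rows = list(combinations(range(m), r))
    cols = list(combinations(range(n), r))
    C = np.empty((len(rows), len(cols)))
    for i, alpha in enumerate(rows):
        for j, beta in enumerate(cols):
            C[i, j] = np.linalg.det(A[np.ix_(alpha, beta)])
    return C

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([[2.0, 0, 1], [1, 1, 0], [0, 3, 1]])

# Multiplicativity (0.8.1.1): C_2(AB) = C_2(A) C_2(B)
assert np.allclose(compound(A @ B, 2), compound(A, 2) @ compound(B, 2))
# det C_r(A) = (det A)^C(n-1, r-1)  (here n = 3, r = 2)
assert np.isclose(np.linalg.det(compound(A, 2)),
                  np.linalg.det(A) ** comb(2, 1))
```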

0.8.2 The adjugate and the inverse. If $A \in M_n(\mathbf{F})$ and $n \ge 2$, the transposed matrix of cofactors of $A$,
$$\operatorname{adj} A = \left[(-1)^{i+j}\det A[\{j\}^c, \{i\}^c]\right] \qquad (0.8.2.0)$$
is the adjugate of $A$; it is also called the classical adjoint of $A$. For example, $\operatorname{adj}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. A calculation using the Laplace expansion for the determinant reveals the basic property of the adjugate:
$$(\operatorname{adj} A)A = A(\operatorname{adj} A) = (\det A)I \qquad (0.8.2.1)$$
Thus, $\operatorname{adj} A$ is nonsingular if $A$ is nonsingular, and $\det(\operatorname{adj} A) = (\det A)^{n-1}$. If $A$ is nonsingular, then
$$\operatorname{adj} A = (\det A)A^{-1}, \quad\text{that is,}\quad A^{-1} = (\det A)^{-1}\operatorname{adj} A \qquad (0.8.2.2)$$
For example, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = (ad - bc)^{-1}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$ if $ad \ne bc$. In particular, $\operatorname{adj}(A^{-1}) = A/\det A = (\operatorname{adj} A)^{-1}$.

If $A$ is singular and $\operatorname{rank} A \le n - 2$, then every minor of $A$ of size $n - 1$ is zero, so $\operatorname{adj} A = 0$.

If $A$ is singular and $\operatorname{rank} A = n - 1$, then some minor of $A$ of size $n - 1$ is nonzero, so $\operatorname{adj} A \ne 0$ and $\operatorname{rank}\operatorname{adj} A \ge 1$. Moreover, some list of $n - 1$ columns of $A$ is linearly independent, so the identity $(\operatorname{adj} A)A = (\det A)I = 0$ ensures that the null space of $\operatorname{adj} A$ has dimension at least $n - 1$ and hence $\operatorname{rank}\operatorname{adj} A \le 1$. We conclude that $\operatorname{rank}\operatorname{adj} A = 1$. The full-rank factorization (0.4.6(e)) ensures that $\operatorname{adj} A = \alpha xy^T$ for some nonzero $\alpha \in \mathbf{F}$ and nonzero $x, y \in \mathbf{F}^n$ that are determined as follows: Compute
$$(Ax)y^T = A(\operatorname{adj} A) = 0 = (\operatorname{adj} A)A = x(y^TA)$$
and conclude that $Ax = 0$ and $y^TA = 0$, that is, $x$ (respectively, $y$) is determined up to a nonzero scalar factor as a nonzero element of the one-dimensional null space of $A$ (respectively, $A^T$).

The function $A \to \operatorname{adj} A$ is continuous on $M_n$ (each entry of $\operatorname{adj} A$ is a multinomial in the entries of $A$) and every matrix in $M_n$ is a limit of nonsingular matrices, so properties of the adjugate can be deduced from continuity and properties of the inverse function. For example, if $A, B \in M_n$ are nonsingular, then $\operatorname{adj}(AB) = (\det AB)(AB)^{-1} = (\det A)(\det B)B^{-1}A^{-1} = (\det B)B^{-1}(\det A)A^{-1} = (\operatorname{adj} B)(\operatorname{adj} A)$. Continuity then ensures that
$$\operatorname{adj}(AB) = (\operatorname{adj} B)(\operatorname{adj} A) \quad\text{for all } A, B \in M_n \qquad (0.8.2.3)$$
For any $c \in \mathbf{F}$ and any $A \in M_n(\mathbf{F})$, $\operatorname{adj}(cA) = c^{n-1}\operatorname{adj} A$. In particular, $\operatorname{adj}(cI) = c^{n-1}I$ and $\operatorname{adj} 0 = 0$.

If $A$ is nonsingular, then
$$\operatorname{adj}(\operatorname{adj} A) = \operatorname{adj}\left((\det A)A^{-1}\right) = (\det A)^{n-1}\operatorname{adj}(A^{-1}) = (\det A)^{n-1}(A/\det A) = (\det A)^{n-2}A$$
so continuity ensures that
$$\operatorname{adj}(\operatorname{adj} A) = (\det A)^{n-2}A \quad\text{for all } A \in M_n \qquad (0.8.2.4)$$
If $A + B$ is nonsingular, then $A(A + B)^{-1}B = B(A + B)^{-1}A$, so continuity ensures that

$$A\operatorname{adj}(A + B)B = B\operatorname{adj}(A + B)A \quad\text{for all } A, B \in M_n \qquad (0.8.2.5)$$
Let $A, B \in M_n$ and suppose that $A$ commutes with $B$. If $A$ is nonsingular, then $BA^{-1} = A^{-1}ABA^{-1} = A^{-1}BAA^{-1} = A^{-1}B$, so $A^{-1}$ commutes with $B$. But $BA^{-1} = (\det A)^{-1}B\operatorname{adj} A$ and $A^{-1}B = (\det A)^{-1}(\operatorname{adj} A)B$, so $\operatorname{adj} A$ commutes with $B$. Continuity ensures that $\operatorname{adj} A$ commutes with $B$ whenever $A$ commutes with $B$, even if $A$ is singular.

If $A = [a_{ij}]$ is upper triangular, then $\operatorname{adj} A = [b_{ij}]$ is upper triangular and each $b_{ii} = \prod_{j \ne i}a_{jj}$; if $A$ is diagonal, then so is $\operatorname{adj} A$.

The adjugate is the transpose of the gradient of $\det A$:
$$\operatorname{adj} A = \left[\frac{\partial}{\partial a_{ij}}\det A\right]^T \qquad (0.8.2.6)$$
If $A$ is nonsingular, it follows from (0.8.2.6) that
$$\left[\frac{\partial}{\partial a_{ij}}\det A\right]^T = (\det A)A^{-1} \qquad (0.8.2.7)$$
If $A \in M_n$ is nonsingular, then $\operatorname{adj} A^T = (\det A^T)A^{-T} = (\det A)A^{-T} = ((\det A)A^{-1})^T = (\operatorname{adj} A)^T$. Continuity ensures that
$$\operatorname{adj} A^T = (\operatorname{adj} A)^T \quad\text{for all } A \in M_n(\mathbf{F}) \qquad (0.8.2.8)$$
A similar argument shows that
$$\operatorname{adj} A^* = (\operatorname{adj} A)^* \quad\text{for all } A \in M_n \qquad (0.8.2.9)$$
Let $A = [a_1\ \dots\ a_n] \in M_n(\mathbf{F})$ be partitioned according to its columns and let $b \in \mathbf{F}^n$. Define
$$A \underset{i}{\leftarrow} b = [a_1\ \dots\ a_{i-1}\ b\ a_{i+1}\ \dots\ a_n]$$
that is, $A \underset{i}{\leftarrow} b$ denotes the matrix whose $i$th column is $b$ and whose remaining columns coincide with those of $A$. Examination of the Laplace expansion (0.3.1.1) of $\det(A \underset{i}{\leftarrow} b)$ by minors along column $i$ reveals that it is the $i$th entry of the vector $(\operatorname{adj} A)b$, that is,
$$\left[\det(A \underset{i}{\leftarrow} b)\right]_{i=1}^{n} = (\operatorname{adj} A)b \qquad (0.8.2.10)$$
Applying this vector identity to each column of $C = [c_1\ \dots\ c_n] \in M_n(\mathbf{F})$ gives the matrix identity
$$\left[\det(A \underset{i}{\leftarrow} c_j)\right]_{i,j=1}^{n} = (\operatorname{adj} A)C \qquad (0.8.2.11)$$

0.8.3 Cramer's rule. Cramer's rule is a useful way to present analytically a particular entry of the solution to $Ax = b$ when $A \in M_n(\mathbf{F})$ is nonsingular. The identity
$$A\left[\det(A \underset{i}{\leftarrow} b)\right]_{i=1}^{n} = A(\operatorname{adj} A)b = (\det A)b$$
follows from (0.8.2.10). If $\det A \ne 0$, we obtain Cramer's rule

$$x_i = \frac{\det(A \underset{i}{\leftarrow} b)}{\det A}$$
for the $i$th entry $x_i$ of the solution vector $x$. Cramer's rule also follows directly from multiplicativity of the determinant. The system $Ax = b$ may be rewritten as
$$A(I \underset{i}{\leftarrow} x) = A \underset{i}{\leftarrow} b$$
and taking determinants of both sides (using multiplicativity) gives
$$(\det A)\det(I \underset{i}{\leftarrow} x) = \det(A \underset{i}{\leftarrow} b)$$
But $\det(I \underset{i}{\leftarrow} x) = x_i$, and the formula follows.

0.8.4 Minors of the inverse. Jacobi's identity generalizes the adjugate formula for the inverse of a nonsingular matrix and relates the minors of $A^{-1}$ to those of $A \in M_n(\mathbf{F})$:
$$\det A^{-1}[\alpha^c, \beta^c] = (-1)^{p(\alpha,\beta)}\frac{\det A[\beta, \alpha]}{\det A} \qquad (0.8.4.1)$$
in which $p(\alpha, \beta) = \sum_{i \in \alpha}i + \sum_{j \in \beta}j$. Our universal convention is that $\det A[\emptyset] = 1$. For principal submatrices, Jacobi's identity assumes the simple form
$$\det A^{-1}[\alpha^c] = \frac{\det A[\alpha]}{\det A} \qquad (0.8.4.2)$$
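The adjugate identities (0.8.2.1) and (0.8.2.3) and Cramer's rule are easy to illustrate numerically. The helper adjugate below is hypothetical and added here as an illustration (NumPy); it builds the transposed matrix of cofactors directly from (0.8.2.0).

```python
import numpy as np

def adjugate(A):
    """Transposed matrix of cofactors, as in (0.8.2.0)."""
    n = A.shape[0]
    adj = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)  # delete row j, column i
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 10]])
B = np.array([[2.0, 0, 1], [1, 1, 0], [0, 3, 1]])
b = np.array([1.0, 0, 2])
n = A.shape[0]

# (0.8.2.1): (adj A) A = (det A) I, and (0.8.2.3): adj(AB) = (adj B)(adj A)
assert np.allclose(adjugate(A) @ A, np.linalg.det(A) * np.eye(n))
assert np.allclose(adjugate(A @ B), adjugate(B) @ adjugate(A))

# Cramer's rule: x_i = det(A with column i replaced by b) / det A
x = np.empty(n)
for i in range(n):
    A_i = A.copy()
    A_i[:, i] = b
    x[i] = np.linalg.det(A_i) / np.linalg.det(A)
assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.solve(A, b))
```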

0.8.5 Schur complements and determinantal formulae. Let $A = [a_{ij}] \in M_n(\mathbf{F})$ be given and suppose that $\alpha \subseteq \{1, \dots, n\}$ is an index set such that $A[\alpha]$ is nonsingular. An important formula for $\det A$, based on the 2-partition of $A$ using $\alpha$ and $\alpha^c$, is
$$\det A = \det A[\alpha]\,\det\!\left(A[\alpha^c] - A[\alpha^c, \alpha]A[\alpha]^{-1}A[\alpha, \alpha^c]\right) \qquad (0.8.5.1)$$
which generalizes the familiar formula for the determinant of a 2-by-2 matrix. The special matrix
$$A/A[\alpha] = A[\alpha^c] - A[\alpha^c, \alpha]A[\alpha]^{-1}A[\alpha, \alpha^c] \qquad (0.8.5.2)$$
which also appears in the partitioned form for the inverse in (0.7.3.1), is called the Schur complement of $A[\alpha]$ in $A$. When convenient, we take $\alpha = \{1, \dots, k\}$ and write $A$ as a 2-by-2 block matrix $A = [A_{ij}]$ with $A_{11} = A[\alpha]$, $A_{22} = A[\alpha^c]$, $A_{12} = A[\alpha, \alpha^c]$, and $A_{21} = A[\alpha^c, \alpha]$. The formula (0.8.5.1) may be verified by computing the determinant of both sides of the identity
$$\begin{bmatrix} I & 0 \\ -A_{21}A_{11}^{-1} & I \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} I & -A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix} = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{bmatrix} \qquad (0.8.5.3)$$
which contains a wealth of information about the Schur complement $S = [s_{ij}] = A/A_{11} = A_{22} - A_{21}A_{11}^{-1}A_{12}$:

(a) The Schur complement $S$ arises (uniquely) in the lower right corner if linear combinations of the first $k$ rows (respectively, columns) of $A$ are added to the last $n - k$ rows (respectively, columns) in such a way as to produce a zero block in the lower left (respectively, upper right) corner; this is block Gaussian elimination, and it is (uniquely) possible because $A_{11}$ is nonsingular. Any submatrix of $A$ that includes $A_{11}$ as a principal submatrix has the same determinant before and after the block eliminations that produce the block diagonal form in (0.8.5.3). Thus, for any index sets $\beta = \{i_1, \dots, i_m\} \subseteq \{1, \dots, n-k\}$ and $\gamma \subseteq \{1, \dots, n-k\}$, if we construct the shifted index sets $\tilde{\beta} = \{i_1 + k, \dots, i_m + k\}$ and $\tilde{\gamma}$ (similarly), then $\det A[\alpha \cup \tilde{\beta}, \alpha \cup \tilde{\gamma}]$ (before) $= \det(A_{11} \oplus S[\beta, \gamma])$ (after), so
$$\det S[\beta, \gamma] = \det A[\alpha \cup \tilde{\beta}, \alpha \cup \tilde{\gamma}]/\det A[\alpha] \qquad (0.8.5.4)$$
For example, if $\beta = \{i\}$ and $\gamma = \{j\}$, then with $\alpha = \{1, \dots, k\}$, we have
$$\det S[\beta, \gamma] = s_{ij} = \det A[\{1, \dots, k, k+i\}, \{1, \dots, k, k+j\}]/\det A_{11} \qquad (0.8.5.5)$$
so all the entries of $S$ are ratios of minors of $A$.

(b) $\operatorname{rank} A = \operatorname{rank} A_{11} + \operatorname{rank} S \ge \operatorname{rank} A_{11}$, and $\operatorname{rank} A = \operatorname{rank} A_{11}$ if and only if $A_{22} = A_{21}A_{11}^{-1}A_{12}$.

(c) $A$ is nonsingular if and only if $S$ is nonsingular, since $\det A = \det A_{11}\det S$. If $A$ is nonsingular, then $\det S = \det A/\det A_{11}$.

Suppose that $A$ is nonsingular. Then inverting both sides of (0.8.5.3) gives a presentation of the inverse different from that in (0.7.3.1):
$$A^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\ -S^{-1}A_{21}A_{11}^{-1} & S^{-1} \end{bmatrix} \qquad (0.8.5.6)$$
Among other things, this tells us that $A^{-1}[\{k+1, \dots, n\}] = S^{-1}$, so
$$\det A^{-1}[\{k+1, \dots, n\}] = \det A_{11}/\det A \qquad (0.8.5.7)$$

This is a form of Jacobi's identity (0.8.4.1). Another form results from using the adjugate to write the inverse, which gives
$$\det\left((\operatorname{adj} A)[\{k+1, \dots, n\}]\right) = (\det A)^{n-k-1}\det A_{11} \qquad (0.8.5.8)$$
When $\alpha^c$ consists of a single element, the Schur complement of $A[\alpha]$ in $A$ is a scalar and (0.8.5.1) reduces to the identity
$$\det A = A[\alpha^c]\det A[\alpha] - A[\alpha^c, \alpha]\left(\operatorname{adj} A[\alpha]\right)A[\alpha, \alpha^c] \qquad (0.8.5.9)$$
which is valid even if $A[\alpha]$ is singular. For example, if $\alpha = \{1, \dots, n-1\}$, then $\alpha^c = \{n\}$ and $A$ is presented as a bordered matrix
$$A = \begin{bmatrix} \tilde{A} & x \\ y^T & a \end{bmatrix}$$
with $a \in \mathbf{F}$, $x, y \in \mathbf{F}^{n-1}$, and $\tilde{A} \in M_{n-1}(\mathbf{F})$; (0.8.5.9) is the Cauchy expansion of the determinant of a bordered matrix
$$\det\begin{bmatrix} \tilde{A} & x \\ y^T & a \end{bmatrix} = a\det\tilde{A} - y^T\left(\operatorname{adj}\tilde{A}\right)x \qquad (0.8.5.10)$$
The Cauchy expansion (0.8.5.10) involves signed minors of $A$ of size $n - 2$ (the entries of $\operatorname{adj}\tilde{A}$) and a bilinear form in the entries of a row and column; the Laplace expansion (0.3.1.1) involves signed minors of $A$ of size $n - 1$ and a linear form in the entries of a row or column. If $a \ne 0$, we can use the Schur complement of $[a]$ in $A$ to express
$$\det\begin{bmatrix} \tilde{A} & x \\ y^T & a \end{bmatrix} = a\det\left(\tilde{A} - a^{-1}xy^T\right)$$
Equating the right-hand side of this identity to that of (0.8.5.10) and setting $a = -1$ gives Cauchy's formula for the determinant of a rank-one perturbation
$$\det\left(\tilde{A} + xy^T\right) = \det\tilde{A} + y^T\left(\operatorname{adj}\tilde{A}\right)x \qquad (0.8.5.11)$$
The uniqueness property of the Schur complement discussed in (a) can be used to derive an identity involving a Schur complement within a Schur complement. Suppose that the nonsingular $k$-by-$k$ block $A_{11}$ is partitioned as a 2-by-2 block matrix $A_{11} = [\mathcal{A}_{ij}]$ in which the upper left $\ell$-by-$\ell$ block $\mathcal{A}_{11}$ is nonsingular. Write $A_{21} = [\mathcal{A}_1\ \mathcal{A}_2]$, in which $\mathcal{A}_1$ is $(n-k)$-by-$\ell$, and write $A_{12} = [\mathcal{B}_1^T\ \mathcal{B}_2^T]^T$, in which $\mathcal{B}_1$ is $\ell$-by-$(n-k)$; this gives the refined partition
$$A = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} & \mathcal{B}_1 \\ \mathcal{A}_{21} & \mathcal{A}_{22} & \mathcal{B}_2 \\ \mathcal{A}_1 & \mathcal{A}_2 & A_{22} \end{bmatrix}$$
Now add linear combinations of the first $\ell$ rows of $A$ to the next $k - \ell$ rows to reduce $\mathcal{A}_{21}$ to a zero block. The result is
$$A' = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} & \mathcal{B}_1 \\ 0 & A_{11}/\mathcal{A}_{11} & \mathcal{B}_2' \\ \mathcal{A}_1 & \mathcal{A}_2 & A_{22} \end{bmatrix}$$
in which we have identified the resulting 2,2 block of $A'$ as the (necessarily nonsingular) Schur complement of $\mathcal{A}_{11}$ in $A_{11}$. Now add linear combinations of the first $k$ rows of $A'$ to the last $n - k$ rows to reduce $[\mathcal{A}_1\ \mathcal{A}_2]$ to a zero block. The result is
$$A'' = \begin{bmatrix} \mathcal{A}_{11} & \mathcal{A}_{12} & \mathcal{B}_1 \\ 0 & A_{11}/\mathcal{A}_{11} & \mathcal{B}_2' \\ 0 & 0 & A/A_{11} \end{bmatrix}$$
in which we have identified the resulting 3,3 block of $A''$ as the Schur complement of $A_{11}$ in $A$. The lower right 2-by-2 block of $A''$ must be $A/\mathcal{A}_{11}$, the Schur complement of $\mathcal{A}_{11}$ in $A$. Moreover, the lower right block of $A/\mathcal{A}_{11}$ must be the Schur complement of $A_{11}/\mathcal{A}_{11}$ in $A/\mathcal{A}_{11}$. This observation is the quotient property of Schur complements:
$$A/A_{11} = \left(A/\mathcal{A}_{11}\right)/\left(A_{11}/\mathcal{A}_{11}\right) \qquad (0.8.5.12)$$
If the four blocks $A_{ij}$ in (0.8.5.3) are square and the same size, and if $A_{11}$ commutes with $A_{21}$, then

$$\det A = \det A_{11}\det S = \det(A_{11}S) = \det\!\left(A_{11}A_{22} - A_{11}A_{21}A_{11}^{-1}A_{12}\right) = \det\!\left(A_{11}A_{22} - A_{21}A_{12}\right)$$
If $A_{11}$ commutes with $A_{12}$, computing $\det A = (\det S)(\det A_{11}) = \det(SA_{11})$ in the same way gives $\det A = \det(A_{22}A_{11} - A_{21}A_{12})$. By continuity, the identity
$$\det A = \det\!\left(A_{11}A_{22} - A_{21}A_{12}\right) \qquad (0.8.5.13)$$
is valid whenever $A_{11}$ commutes with $A_{21}$, even if it is singular. If $A_{22}$ commutes with $A_{21}$, a similar argument using the Schur complement of $A_{22}$ shows that
$$\det A = \det\!\left(A_{11}A_{22} - A_{12}A_{21}\right) \qquad (0.8.5.14)$$
whenever $A_{22}$ commutes with $A_{21}$, even if it is singular (if $A_{22}$ commutes with $A_{12}$, the corresponding identity is $\det A = \det(A_{22}A_{11} - A_{12}A_{21})$).
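A quick numerical illustration of the Schur complement identities (0.8.5.1) and (0.8.5.7) follows (an added NumPy sketch; the random matrix is assumed to have a nonsingular leading block).

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 2, 5
A = rng.standard_normal((n, n))
A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]

# Schur complement of A11 in A and the determinant formula (0.8.5.1)
S = A22 - A21 @ np.linalg.inv(A11) @ A12
assert np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(S))

# (0.8.5.6)/(0.8.5.7): the trailing principal submatrix of A^{-1} is S^{-1}
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv[k:, k:], np.linalg.inv(S))
```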

0.8.6 Determinantal identities of Sylvester and Kronecker. We consider two consequences of (0.8.5.4). If we set
$$B = [b_{ij}] = \left[\det A[\{1, \dots, k, k+i\}, \{1, \dots, k, k+j\}]\right]_{i,j=1}^{n-k}$$
then each entry of $B$ is the determinant of a bordered matrix of the form (0.8.5.10): $\tilde{A}$ is $A_{11}$, $x$ is the $j$th column of $A_{12}$, $y^T$ is the $i$th row of $A_{21}$, and $a$ is the $i, j$ entry of $A_{22}$. The identity (0.8.5.5) tells us that $B = (\det A_{11})S$, so
$$\det B = (\det A_{11})^{n-k}\det S = (\det A_{11})^{n-k}\left(\det A/\det A_{11}\right) = (\det A_{11})^{n-k-1}\det A$$
This observation about $B$ is Sylvester's identity for bordered determinants:
$$\det B = \left(\det A[\alpha]\right)^{n-k-1}\det A \qquad (0.8.6.1)$$
in which $B = \left[\det A[\alpha \cup \{i\}, \alpha \cup \{j\}]\right]$ and $i, j$ are indices not contained in $\alpha$.

If $A_{22} = 0$, then each entry of $B$ is the determinant of a bordered matrix of the form (0.8.5.10) with $a = 0$. In this case, the Schur complement $A/A_{11} = -A_{21}A_{11}^{-1}A_{12}$ has rank at most $k$, so the determinant of every $(k+1)$-by-$(k+1)$ submatrix of $B$ is zero; this observation about $B$ is Kronecker's theorem for bordered determinants.

0.8.7 The Cauchy–Binet formula. This useful formula can be remembered because of its similarity in appearance to the formula for matrix multiplication. This is no accident, since it is equivalent to multiplicativity of the compound matrix (0.8.1.1). Let $A \in M_{m,k}(\mathbf{F})$, $B \in M_{k,n}(\mathbf{F})$, and $C = AB$. Furthermore, let $1 \le r \le \min\{m, k, n\}$, and let $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$ be index sets, each of cardinality $r$. An expression for the $\alpha, \beta$ minor of $C$ is
$$\det C[\alpha, \beta] = \sum_{\gamma}\det A[\alpha, \gamma]\,\det B[\gamma, \beta]$$
in which the sum is taken over all index sets $\gamma \subseteq \{1, \dots, k\}$ of cardinality $r$.
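The Cauchy–Binet formula can be verified directly by summing over all index sets $\gamma$. The sketch below is an added NumPy illustration with arbitrarily chosen sizes and index sets (written 0-based).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
m, k, n, r = 3, 4, 3, 2
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
C = A @ B

alpha, beta = (0, 2), (1, 2)   # index sets of cardinality r
lhs = np.linalg.det(C[np.ix_(alpha, beta)])
rhs = sum(np.linalg.det(A[np.ix_(alpha, gamma)]) * np.linalg.det(B[np.ix_(gamma, beta)])
          for gamma in combinations(range(k), r))
assert np.isclose(lhs, rhs)
```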

0.8.8 Relations among minors. Let $A \in M_{m,n}(\mathbf{F})$ be given and let a fixed index set $\alpha \subseteq \{1, \dots, m\}$ of cardinality $k$ be given. The minors $\det A[\alpha, \omega]$, as $\omega \subseteq \{1, \dots, n\}$ runs over ordered index sets of cardinality $k$, are not algebraically independent since there are more minors than there are distinct entries among the submatrices. Quadratic relations are known among these minors. Let $i_1, i_2, \dots, i_k \in \{1, \dots, n\}$ be $k$ distinct indices, not necessarily in natural order, and let $A[\alpha; i_1, \dots, i_k]$ denote the matrix whose rows are indicated by $\alpha$ and whose $j$th column is column $i_j$ of $A[\alpha, \{1, \dots, n\}]$. The difference between this and our previous notation is that columns might not occur in natural order, as in $A[\{1, 3\}; 4, 2]$, whose first column has the 1,4 and 3,4 entries of $A$. We then have the relations
$$\det A[\alpha; i_1, \dots, i_k]\,\det A[\alpha; j_1, \dots, j_k] = \sum_{t=1}^{k}\det A[\alpha; i_1, \dots, i_{s-1}, j_t, i_{s+1}, \dots, i_k]\,\det A[\alpha; j_1, \dots, j_{t-1}, i_s, j_{t+1}, \dots, j_k]$$
for each $s = 1, \dots, k$ and all sequences of distinct indices $i_1, \dots, i_k \in \{1, \dots, n\}$ and $j_1, \dots, j_k \in \{1, \dots, n\}$.

0.8.9 The Laplace expansion theorem. The Laplace expansion (0.3.1.1) by minors along a given row or column is included in a natural family of expressions for the determinant. Let $A \in M_n(\mathbf{F})$, let $k \in \{1, \dots, n\}$ be given, and let $\beta \subseteq \{1, \dots, n\}$ be any given index set of cardinality $k$. Then
$$\det A = \sum_{\alpha}(-1)^{p(\alpha,\beta)}\det A[\alpha, \beta]\,\det A[\alpha^c, \beta^c] = \sum_{\alpha}(-1)^{p(\alpha,\beta)}\det A[\beta, \alpha]\,\det A[\beta^c, \alpha^c]$$
in which the sums are over all index sets $\alpha \subseteq \{1, \dots, n\}$ of cardinality $k$, and $p(\alpha, \beta) = \sum_{i \in \alpha}i + \sum_{j \in \beta}j$. Choosing $k = 1$ and $\beta = \{i\}$ or $\{j\}$ gives the expansions in (0.3.1.1).

0.8.10 Derivative of the determinant. Let $A(t) = [a_1(t)\ \dots\ a_n(t)] = [a_{ij}(t)]$ be an $n$-by-$n$ complex matrix whose entries are differentiable functions of $t$, and define $A'(t) = [a_{ij}'(t)]$. It follows from multilinearity of the determinant (0.3.6(a)) and the definition of the derivative that
$$\frac{d}{dt}\det A(t) = \sum_{j=1}^{n}\det\!\left(A(t) \underset{j}{\leftarrow} a_j'(t)\right) = \sum_{j=1}^{n}\sum_{i=1}^{n}\left((\operatorname{adj} A(t))^T\right)_{ij}a_{ij}'(t) = \operatorname{tr}\!\left((\operatorname{adj} A(t))A'(t)\right) \qquad (0.8.10.1)$$

For example, if $A \in M_n$ and $A(t) = tI - A$, then $A'(t) = I$ and
$$\frac{d}{dt}\det(tI - A) = \operatorname{tr}\!\left((\operatorname{adj} A(t))I\right) = \operatorname{tr}\operatorname{adj}(tI - A) \qquad (0.8.10.2)$$

0.8.11 Dodgson's identity. Let $A \in M_n(\mathbf{F})$. Define $a = \det A[\{n\}^c]$, $b = \det A[\{n\}^c, \{1\}^c]$, $c = \det A[\{1\}^c, \{n\}^c]$, $d = \det A[\{1\}^c]$, and $e = \det A[\{1, n\}^c]$. If $e \ne 0$, then $\det A = (ad - bc)/e$.

0.8.12 Adjugates and compounds. Let $A, B \in M_n(\mathbf{F})$. Let $\alpha \subseteq \{1, \dots, n\}$ and $\beta \subseteq \{1, \dots, n\}$ be index sets of cardinality $r \le n$. The $\alpha, \beta$ entry of the $r$th adjugate matrix $\operatorname{adj}_r(A) \in M_{\binom{n}{r}}(\mathbf{F})$ is
$$(-1)^{p(\alpha,\beta)}\det A[\beta^c, \alpha^c] \qquad (0.8.12.1)$$
in which $p(\alpha, \beta) = \sum_{i \in \alpha}i + \sum_{j \in \beta}j$. The rows and columns of $\operatorname{adj}_r(A)$ are formed by arranging the index sets lexicographically, just as for the $r$th compound matrix. For example, using the matrix $A$ in (0.8.1.0), we have
$$\operatorname{adj}_2(A) = \begin{bmatrix} 10 & -6 & 3 \\ -8 & 5 & -2 \\ 7 & -4 & 1 \end{bmatrix}$$
The multiplicativity property of the $r$th adjugate matrix is
$$\operatorname{adj}_r(AB) = \operatorname{adj}_r(B)\operatorname{adj}_r(A) \qquad (0.8.12.2)$$
We define $\operatorname{adj}_n(A) = 1$. We have $\operatorname{adj}_0(A) = \det A$ and $\operatorname{adj}_1(A) = \operatorname{adj} A$. The $r$th adjugate and $r$th compound matrices are related by the identity
$$\operatorname{adj}_r(A)C_r(A) = C_r(A)\operatorname{adj}_r(A) = (\det A)I_{\binom{n}{r}}$$
of which the identities in (0.8.9) are special cases. In particular, $C_r(A)^{-1} = (\det A)^{-1}\operatorname{adj}_r(A)$ if $A$ is nonsingular. The determinant of a sum of matrices can be expressed using the $r$th adjugate and $r$th compound matrices:
$$\det(sA + tB) = \sum_{k=0}^{n}s^{n-k}t^k\operatorname{tr}\!\left(\operatorname{adj}_k(A)C_k(B)\right) \qquad (0.8.12.3)$$
In particular, $\det(A + I) = \sum_{k=0}^{n}\operatorname{tr}\operatorname{adj}_k(A) = \sum_{k=0}^{n}\operatorname{tr} C_k(A)$.

0.9 Special types of matrices

Certain matrices of special form arise frequently and have important properties. Some of these are cataloged here for reference and terminology.

0.9.1 Diagonal matrices. A matrix $D = [d_{ij}] \in M_{n,m}(\mathbf{F})$ is diagonal if $d_{ij} = 0$ whenever $j \ne i$. If all the diagonal entries of a diagonal matrix are positive (nonnegative) real numbers, we refer to it as a positive (nonnegative) diagonal matrix. The term positive diagonal matrix means that the matrix is diagonal and has positive diagonal entries; it does not refer to a general matrix with positive diagonal entries. The identity matrix $I \in M_n$ is a positive diagonal matrix. A square diagonal matrix $D$ is a scalar matrix if its diagonal entries are all equal, that is, $D = \alpha I$ for some $\alpha \in \mathbf{F}$. Left or right multiplication of a matrix by a scalar matrix has the same effect as multiplying it by the corresponding scalar.

If $A = [a_{ij}] \in M_{n,m}(\mathbf{F})$ and $q = \min\{m, n\}$, then $\operatorname{diag} A = [a_{11}, \dots, a_{qq}]^T \in \mathbf{F}^q$ denotes the vector of diagonal entries of $A$ (0.2.1). Conversely, if $x \in \mathbf{F}^q$ and if $m$ and $n$ are positive integers such that $\min\{m, n\} = q$, then $\operatorname{diag} x \in M_{n,m}(\mathbf{F})$ denotes the $n$-by-$m$ diagonal matrix $A$ such that $\operatorname{diag} A = x$; for $\operatorname{diag} x$ to be well-defined, both $m$ and $n$ must be specified. For any $a_1, \dots, a_n \in \mathbf{F}$, $\operatorname{diag}(a_1, \dots, a_n)$ always denotes the matrix $A = [a_{ij}] \in M_n(\mathbf{F})$ such that $a_{ii} = a_i$ for each $i = 1, \dots, n$ and $a_{ij} = 0$ if $i \ne j$.

Suppose that $D = [d_{ij}], E = [e_{ij}] \in M_n(\mathbf{F})$ are diagonal and let $A = [a_{ij}] \in M_n(\mathbf{F})$ be given. Then (a) $\det D = \prod_{i=1}^{n}d_{ii}$; (b) $D$ is nonsingular if and only if all $d_{ii} \ne 0$; (c) left multiplication of $A$ by $D$ multiplies the rows of $A$ by the diagonal entries of $D$ (the $i$th row of $DA$ is $d_{ii}$ times the $i$th row of $A$); (d) right multiplication of $A$ by $D$ multiplies the columns of $A$ by the diagonal entries of $D$, that is, the $j$th column of $AD$ is $d_{jj}$ times the $j$th column of $A$; (e) $DA = AD$ if and only if $a_{ij} = 0$ whenever $d_{ii} \ne d_{jj}$; (f) if all the diagonal entries of $D$ are distinct and $DA = AD$, then $A$ is diagonal; (g) for any positive integer $k$, $D^k = \operatorname{diag}(d_{11}^k, \dots, d_{nn}^k)$; and (h) any two diagonal matrices $D$ and $E$ of the same size commute: $DE = \operatorname{diag}(d_{11}e_{11}, \dots, d_{nn}e_{nn}) = ED$.

0.9.2 Block diagonal matrices and direct sums. A matrix $A \in M_n(\mathbf{F})$ of the form
$$A = \begin{bmatrix} A_{11} & & 0 \\ & \ddots & \\ 0 & & A_{kk} \end{bmatrix}$$
in which $A_{ii} \in M_{n_i}(\mathbf{F})$, $i = 1, \dots, k$, $\sum_{i=1}^{k}n_i = n$, and all blocks above and below the block diagonal are zero blocks, is called block diagonal. It is convenient to write such a matrix as
$$A = A_{11} \oplus A_{22} \oplus \cdots \oplus A_{kk} = \bigoplus_{i=1}^{k}A_{ii}$$
This is the direct sum of the matrices $A_{11}, \dots, A_{kk}$. Many properties of block diagonal matrices generalize those of diagonal matrices. For example, $\det\left(\oplus_{i=1}^{k}A_{ii}\right) = \prod_{i=1}^{k}\det A_{ii}$, so that $A = \oplus A_{ii}$ is nonsingular if and only if each $A_{ii}$ is nonsingular, $i = 1, \dots, k$. Furthermore, two direct sums $A = \oplus_{i=1}^{k}A_{ii}$ and $B = \oplus_{i=1}^{k}B_{ii}$, in which each $A_{ii}$ is the same size as $B_{ii}$, commute if and only if each pair $A_{ii}$ and $B_{ii}$ commutes, $i = 1, \dots, k$. Also, $\operatorname{rank}\left(\oplus_{i=1}^{k}A_{ii}\right) = \sum_{i=1}^{k}\operatorname{rank} A_{ii}$.

If $A \in M_n$ and $B \in M_m$ are nonsingular, then $(A \oplus B)^{-1} = A^{-1} \oplus B^{-1}$ and $(\det(A \oplus B))(A \oplus B)^{-1} = (\det A)(\det B)\left(A^{-1} \oplus B^{-1}\right) = \left((\det B)(\det A)A^{-1}\right) \oplus \left((\det A)(\det B)B^{-1}\right)$, so a continuity argument ensures that
$$\operatorname{adj}(A \oplus B) = \left((\det B)\operatorname{adj} A\right) \oplus \left((\det A)\operatorname{adj} B\right) \qquad (0.9.2.1)$$

0.9.3 Triangular matrices. A matrix $T = [t_{ij}] \in M_{n,m}(\mathbf{F})$ is upper triangular if $t_{ij} = 0$ whenever $i > j$. If $t_{ij} = 0$ whenever $i \ge j$, then $T$ is said to be strictly upper triangular. Analogously, $T$ is lower triangular (or strictly lower triangular) if its transpose is upper triangular (or strictly upper triangular). A triangular matrix is either lower or upper triangular; a strictly triangular matrix is either strictly upper triangular or strictly lower triangular. A unit triangular matrix is a triangular matrix (upper or lower) that has ones on its main diagonal. Sometimes the terms right (in place of upper) and left (in place of lower) are used to describe triangular matrices.

Let $T \in M_{n,m}(\mathbf{F})$ be given. If $T$ is upper triangular, then $T = [R\ T_2]$ if $n \le m$, whereas $T = \begin{bmatrix} R \\ 0 \end{bmatrix}$ if $n \ge m$; $R \in M_{\min\{n,m\}}(\mathbf{F})$ is upper triangular and $T_2$ is arbitrary (empty if $n = m$). If $T$ is lower triangular, then $T = [L\ 0]$ if $n \le m$, whereas $T = \begin{bmatrix} L \\ T_2 \end{bmatrix}$ if $n \ge m$; $L \in M_{\min\{n,m\}}(\mathbf{F})$ is lower triangular and $T_2$ is arbitrary (empty if $n = m$).

A square triangular matrix shares with a square diagonal matrix the property that its determinant is the product of its diagonal entries. Square triangular matrices need not commute with other square triangular matrices of the same size. However, if $T \in M_n$ is triangular, has distinct diagonal entries, and commutes with $B \in M_n$, then $B$ must be triangular of the same type as $T$ (2.4.5.1).

For each $i = 1, \dots, n$, left multiplication of $A \in M_n(\mathbf{F})$ by a lower triangular matrix $L$ ($A \to LA$) replaces the $i$th row of $A$ by a linear combination of the first through $i$th rows of $A$. The result of performing a finite number of type 3 row operations on $A$ (0.3.3) is a matrix $LA$, in which $L$ is a unit lower triangular matrix. Corresponding statements may be made about column operations and right multiplication by an upper triangular matrix.

The rank of a triangular matrix is at least, and can be greater than, the number of nonzero entries on the main diagonal. If a square triangular matrix is nonsingular, its inverse is a triangular matrix of the same type. A product of square triangular matrices of the same size and type is a triangular matrix of the same type; each $i, i$ diagonal entry of such a matrix product is the product of the $i, i$ entries of the factors.

0.9.4 Block triangular matrices. A matrix $A \in M_n(\mathbf{F})$ of the form
$$A = \begin{bmatrix} A_{11} & & \\ & \ddots & \\ 0 & & A_{kk} \end{bmatrix} \qquad (0.9.4.1)$$
in which $A_{ii} \in M_{n_i}(\mathbf{F})$, $i = 1, \dots, k$, $\sum_{i=1}^{k}n_i = n$, and all blocks below the block diagonal are zero, is block upper triangular; it is strictly block upper triangular if, in addition, all the diagonal blocks are zero blocks. A matrix is block lower triangular if its transpose is block upper triangular; it is strictly block lower triangular if its transpose is strictly block upper triangular. We say that a matrix is block triangular if it is either block lower triangular or block upper triangular; a matrix is both block lower triangular and block upper triangular if and only if it is block diagonal.

A block upper triangular matrix in which all the diagonal blocks are 1-by-1 or 2-by-2 is said to be upper quasitriangular. A matrix is lower quasitriangular if its transpose is upper quasitriangular; it is quasitriangular if it is either upper quasitriangular or lower quasitriangular. A matrix that is both upper quasitriangular and lower quasitriangular is said to be quasidiagonal.

Consider the square block triangular matrix $A$ in (0.9.4.1). We have $\det A = \det A_{11}\cdots\det A_{kk}$ and $\operatorname{rank} A \ge \operatorname{rank} A_{11} + \cdots + \operatorname{rank} A_{kk}$. If $A$ is nonsingular (that is, if $A_{ii}$ is nonsingular for all $i = 1, \dots, k$), then $A^{-1}$ is a block triangular matrix partitioned conformally to $A$ whose diagonal blocks are $A_{11}^{-1}, \dots, A_{kk}^{-1}$.

If $A \in M_n(\mathbf{F})$ is upper triangular, then $[A[\alpha_i, \alpha_j]]_{i,j=1}^{t}$ is block upper triangular for any sequential partition $\alpha_1, \dots, \alpha_t$ of $\{1, \dots, n\}$ (0.7.2).

0.9.5 Permutation matrices. A square matrix $P$ is a permutation matrix if exactly one entry in each row and column is equal to 1 and all other entries are 0. Multiplication by such matrices effects a permutation of the rows or columns of the matrix multiplied. For example,
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$$
illustrates how a permutation matrix produces a permutation of the rows (entries) of a vector: it sends the first entry to the second position, sends the second entry to the first position, and leaves the third entry in the third position. Left multiplication of a matrix $A \in M_{m,n}$ by an $m$-by-$m$ permutation matrix $P$ permutes the rows of $A$, while right multiplication of $A$ by an $n$-by-$n$ permutation matrix $P$ permutes the columns of $A$.

The matrix that carries out a type 1 elementary operation (0.3.3) is an example of a special type of permutation matrix called a transposition. Any permutation matrix is a product of transpositions.

The determinant of a permutation matrix is $\pm 1$, so permutation matrices are nonsingular. Although permutation matrices need not commute, the product of two permutation matrices is again a permutation matrix. Since the identity is a permutation matrix and $P^T = P^{-1}$ for every permutation matrix $P$, the set of $n$-by-$n$ permutation matrices is a subgroup of $GL(n, \mathbb{C})$ with cardinality $n!$.

Since right multiplication by $P^T = P^{-1}$ permutes columns in the same way that left multiplication by $P$ permutes rows, the transformation $A \to PAP^T$ permutes the rows and columns (and hence also the main diagonal entries) of $A \in M_n$ in the same way. In the context of linear equations with coefficient matrix $A$, this transformation amounts to renumbering the variables and the equations in the same way. A matrix $A \in M_n$ such that $PAP^T$ is triangular for some permutation matrix $P$ is called essentially triangular; these matrices have much in common with triangular matrices.

If $\Lambda \in M_n$ is diagonal and $P \in M_n$ is a permutation matrix, then $P\Lambda P^T$ is a diagonal matrix. The $n$-by-$n$ reversal matrix is the permutation matrix
$$K_n = \begin{bmatrix} & & 1 \\ & \iddots & \\ 1 & & \end{bmatrix} = [\kappa_{ij}] \in M_n \qquad (0.9.5.1)$$
in which $\kappa_{i,n-i+1} = 1$ for $i = 1, \dots, n$ and all other entries are zero. The rows of $K_nA$ are the rows of $A$ presented in reverse order; the columns of $AK_n$ are the columns of $A$ presented in reverse order. The reversal matrix is sometimes called the sip matrix (standard involutory permutation), the backward identity, or the exchange matrix. For any $n$-by-$n$ matrix $A = [a_{ij}]$, the entries $a_{i,n-i+1}$ for $i = 1, \dots, n$ comprise its counterdiagonal (sometimes called the secondary diagonal, backward diagonal, cross diagonal, dexter-diagonal, or antidiagonal).

A generalized permutation matrix is a matrix of the form $G = PD$, in which $P, D \in M_n$, $P$ is a permutation matrix, and $D$ is a nonsingular diagonal matrix. The set of $n$-by-$n$ generalized permutation matrices is a subgroup of $GL(n, \mathbb{C})$.

0.9.6 Circulant matrices. A matrix $A \in M_n(\mathbf{F})$ of the form
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \\ a_{n-1} & a_n & a_1 & \cdots & a_{n-2} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ a_2 & a_3 & \cdots & a_n & a_1 \end{bmatrix} \qquad (0.9.6.1)$$
is a circulant matrix. Each row is the previous row cycled forward one step; the entries in each row are a cyclic permutation of those in the first. The $n$-by-$n$ permutation matrix
$$C_n = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ \vdots & 0 & 1 & \ddots & \vdots \\ & & \ddots & \ddots & 0 \\ 0 & & & 0 & 1 \\ 1 & 0 & \cdots & & 0 \end{bmatrix} = \begin{bmatrix} 0 & I_{n-1} \\ 1 & 0_{1,n-1} \end{bmatrix} \qquad (0.9.6.2)$$
is the basic circulant permutation matrix. A matrix $A \in M_n(\mathbf{F})$ can be written in the form
$$A = \sum_{k=0}^{n-1}a_{k+1}C_n^k \qquad (0.9.6.3)$$
(a polynomial in the matrix $C_n$) if and only if it is a circulant. We have $C_n^0 = I = C_n^n$, and the coefficients $a_1, \dots, a_n$ are the entries of the first row of $A$. This representation reveals that the circulant matrices of size $n$ are a commutative algebra: linear combinations and products of circulants are circulants; the inverse of a nonsingular circulant is a circulant; any two circulants of the same size commute.
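The representation (0.9.6.3) of a circulant as a polynomial in $C_n$ is easy to exercise in code. The following NumPy sketch is an added illustration only; the first-row entries are arbitrary.

```python
import numpy as np

n = 4
a = np.array([2.0, 3.0, 5.0, 7.0])          # first row a_1, ..., a_n

# Basic circulant permutation matrix C_n of (0.9.6.2)
C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, 0] = 1.0

# (0.9.6.3): the circulant with first row a is the polynomial sum a_{k+1} C^k
A = sum(a[k] * np.linalg.matrix_power(C, k) for k in range(n))
assert np.allclose(A[0], a)                  # first row is a_1, ..., a_n
assert np.allclose(A @ C, C @ A)             # circulants commute with C_n
assert np.allclose(np.roll(A[0], 1), A[1])   # each row is the previous row cycled
```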

0.9.7 Toeplitz matrices. A matrix $A = [a_{ij}] \in M_{n+1}(\mathbf{F})$ of the form
$$A = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & \cdots & a_n \\ a_{-1} & a_0 & a_1 & a_2 & \cdots & a_{n-1} \\ a_{-2} & a_{-1} & a_0 & a_1 & \cdots & a_{n-2} \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & \ddots & a_1 \\ a_{-n} & a_{-n+1} & \cdots & \cdots & a_{-1} & a_0 \end{bmatrix}$$
is a Toeplitz matrix. The entry $a_{ij}$ is equal to $a_{j-i}$ for some given sequence $a_{-n}, a_{-n+1}, \dots, a_{-1}, a_0, a_1, a_2, \dots, a_{n-1}, a_n \in \mathbb{C}$. The entries of $A$ are constant down the diagonals parallel to the main diagonal. The Toeplitz matrices
$$B = \begin{bmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} \quad\text{and}\quad F = \begin{bmatrix} 0 & & & 0 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ 0 & & 1 & 0 \end{bmatrix}$$
are called the backward shift and forward shift because of their effect on the elements of the standard basis $\{e_1, \dots, e_{n+1}\}$. Moreover, $F = B^T$ and $B = F^T$. A matrix $A \in M_{n+1}$ can be written in the form
$$A = \sum_{k=1}^{n}a_{-k}F^k + \sum_{k=0}^{n}a_kB^k \qquad (0.9.7.1)$$
if and only if it is a Toeplitz matrix. Toeplitz matrices arise naturally in problems involving trigonometric moments.

Using a reversal matrix $K$ of appropriate size (0.9.5.1), notice that the forward and backward shift matrices are related: $F = KBK = B^T$ and $B = KFK = F^T$. The representation (0.9.7.1) ensures that $KA = A^TK$ for any Toeplitz matrix $A$, that is, $A^T = KAK = KAK^{-1}$.

An upper triangular Toeplitz matrix $A \in M_{n+1}(\mathbf{F})$ can be represented as a polynomial in $B$:
$$A = a_0I + a_1B + \cdots + a_nB^n$$
This representation (and the fact that $B^{n+1} = 0$) makes it clear why the upper triangular Toeplitz matrices of size $n + 1$ are a commutative algebra: Linear combinations and products of upper triangular Toeplitz matrices are upper triangular Toeplitz matrices; $A$ is nonsingular if and only if $a_0 \ne 0$, in which case $A^{-1} = b_0I + b_1B + \cdots + b_nB^n$ is also an upper triangular Toeplitz matrix with $b_0 = a_0^{-1}$ and $b_k = -a_0^{-1}\left(\sum_{m=0}^{k-1}a_{k-m}b_m\right)$ for $k = 1, \dots, n$. Any two upper triangular Toeplitz matrices of the same size commute.

0.9.8 Hankel matrices. A matrix $A \in M_{n+1}(\mathbf{F})$ of the form
$$A = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & a_n \\ a_1 & a_2 & & \cdots & a_{n+1} \\ a_2 & & & & \vdots \\ \vdots & & & & a_{2n-1} \\ a_n & a_{n+1} & \cdots & a_{2n-1} & a_{2n} \end{bmatrix}$$
is a Hankel matrix. Each entry $a_{ij}$ is equal to $a_{i+j-2}$ for some given sequence $a_0, a_1, a_2, \dots, a_{2n-1}, a_{2n}$. The entries of $A$ are constant along the diagonals perpendicular to the main diagonal. Hankel matrices arise naturally in problems involving power moments.

Using a reversal matrix $K$ of appropriate size (0.9.5.1), notice that $KA$ and $AK$ are Hankel matrices for any Toeplitz matrix $A$; $KH$ and $HK$ are Toeplitz matrices for any Hankel matrix $H$. Since $K = K^T = K^{-1}$ and Hankel matrices are symmetric, this means that any Toeplitz matrix is a product of two symmetric matrices with special structure: a reversal matrix and a Hankel matrix.

0.9.9 Hessenberg matrices. A matrix $A = [a_{ij}] \in M_n(\mathbf{F})$ is said to be in upper Hessenberg form or to be an upper Hessenberg matrix if $a_{ij} = 0$ for all $i > j + 1$:
$$A = \begin{bmatrix} a_{11} & & & & \\ a_{21} & a_{22} & & & \\ & a_{32} & \ddots & & \\ & & \ddots & \ddots & \\ 0 & & & a_{n,n-1} & a_{nn} \end{bmatrix}$$
An upper Hessenberg matrix $A$ is said to be unreduced if all its subdiagonal entries are nonzero, that is, if $a_{i+1,i} \ne 0$ for all $i = 1, \dots, n-1$; the rank of such a matrix is at least $n - 1$ since its first $n - 1$ columns are independent.

Let $A \in M_n(\mathbf{F})$ be unreduced upper Hessenberg. Then $A - \lambda I$ is unreduced upper Hessenberg for all $\lambda \in \mathbf{F}$, so $\operatorname{rank}(A - \lambda I) \ge n - 1$ for all $\lambda \in \mathbf{F}$.

A matrix $A \in M_n(\mathbf{F})$ is lower Hessenberg if $A^T$ is upper Hessenberg.

0.9.10 Tridiagonal, bidiagonal, and other structured matrices. A matrix $A = [a_{ij}] \in M_n(\mathbf{F})$ that is both upper and lower Hessenberg is called tridiagonal, that is, $A$ is tridiagonal if $a_{ij} = 0$ whenever $|i - j| > 1$:
$$A = \begin{bmatrix} a_1 & b_1 & & 0 \\ c_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{n-1} \\ 0 & & c_{n-1} & a_n \end{bmatrix} \qquad (0.9.10.1)$$
The determinant of $A$ can be calculated inductively starting with $\det A_1 = a_1$ and $\det A_2 = a_1a_2 - b_1c_1$ (here $A_k$ denotes the leading principal $k$-by-$k$ submatrix of $A$), and then computing a sequence of 2-by-2 matrix products
$$\begin{bmatrix} \det A_{k+1} & 0 \\ \det A_k & 0 \end{bmatrix} = \begin{bmatrix} a_{k+1} & -b_kc_k \\ 1 & 0 \end{bmatrix}\begin{bmatrix} \det A_k & 0 \\ \det A_{k-1} & 0 \end{bmatrix}, \qquad k = 2, \dots, n-1$$

A Jacobi matrix is a real symmetric tridiagonal matrix with positive subdiagonal entries. An upper bidiagonal matrix $A \in M_n(\mathbf{F})$ is a tridiagonal matrix (0.9.10.1) in which $c_1 = \cdots = c_{n-1} = 0$. A matrix $A \in M_n(\mathbf{F})$ is lower bidiagonal if $A^T$ is upper bidiagonal.

A block tridiagonal or block bidiagonal matrix has a block structure like the pattern in (0.9.10.1); the diagonal blocks are square and the sizes of the superdiagonal and subdiagonal blocks are determined by the sizes of their nearest diagonal blocks.

A matrix $A = [a_{ij}] \in M_n(\mathbf{F})$ is persymmetric if $a_{ij} = a_{n+1-j,\,n+1-i}$ for all $i, j = 1, \dots, n$; that is, a persymmetric matrix is symmetric with respect to the counterdiagonal. An alternative, and very useful, characterization is that $A$ is persymmetric if $K_nA = A^TK_n$, in which $K_n$ is the reversal matrix (0.9.5.1). If $A$ is persymmetric and invertible, then $A^{-1}$ is also persymmetric since $K_nA^{-1} = (AK_n)^{-1} = (K_nA^T)^{-1} = A^{-T}K_n$. Toeplitz matrices are persymmetric. We say that $A \in M_n(\mathbf{F})$ is skew persymmetric if $K_nA = -A^TK_n$; the inverse of a nonsingular skew-persymmetric matrix is skew persymmetric.

A complex matrix $A \in M_n$ such that $K_nA = A^*K_n$ is perhermitian; $A$ is skew perhermitian if $K_nA = -A^*K_n$. The inverse of a nonsingular perhermitian (respectively, skew perhermitian) matrix is perhermitian (respectively, skew perhermitian).

A matrix $A = [a_{ij}] \in M_n(\mathbf{F})$ is centrosymmetric if $a_{ij} = a_{n+1-i,\,n+1-j}$ for all $i, j = 1, \dots, n$. Equivalently, $A$ is centrosymmetric if $K_nA = AK_n$; $A$ is skew centrosymmetric if $K_nA = -AK_n$. A centrosymmetric matrix is symmetric about its geometric center, as illustrated by the example
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 6 & 7 & 8 & 9 \\ -1 & -2 & -3 & -2 & -1 \\ 9 & 8 & 7 & 6 & 0 \\ 5 & 4 & 3 & 2 & 1 \end{bmatrix}$$
If $A$ is nonsingular and centrosymmetric (respectively, skew centrosymmetric), then $A^{-1}$ is also centrosymmetric (respectively, skew centrosymmetric) since $K_nA^{-1} = (AK_n)^{-1} = (K_nA)^{-1} = A^{-1}K_n$. If $A$ and $B$ are centrosymmetric, then $AB$ is centrosymmetric since $K_nAB = AK_nB = ABK_n$. If $A$ and $B$ are skew centrosymmetric, then $AB$ is centrosymmetric.

A centrosymmetric matrix $A \in M_n(\mathbf{F})$ has a special block structure. If $n = 2m$, then
$$A = \begin{bmatrix} B & K_mCK_m \\ C & K_mBK_m \end{bmatrix}, \qquad B, C \in M_m(\mathbf{F}) \qquad (0.9.10.2)$$
If $n = 2m + 1$, then
$$A = \begin{bmatrix} B & K_my & K_mCK_m \\ x^T & \alpha & x^TK_m \\ C & y & K_mBK_m \end{bmatrix}, \qquad B, C \in M_m(\mathbf{F}),\ x, y \in \mathbf{F}^m,\ \alpha \in \mathbf{F} \qquad (0.9.10.3)$$
A complex matrix $A \in M_n$ such that $K_nA = \bar{A}K_n$ is centrohermitian; it is skew centrohermitian if $K_nA = -\bar{A}K_n$. The inverse of a nonsingular centrohermitian (respectively, skew centrohermitian) matrix is centrohermitian (respectively, skew centrohermitian). A product of centrohermitian matrices is centrohermitian.

0.9.11 Vandermonde matrices and Lagrange interpolation. A Vandermonde matrix $A \in M_n(\mathbf{F})$ has the form
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix} \qquad (0.9.11.1)$$
in which $x_1, \dots, x_n \in \mathbf{F}$; that is, $A = [a_{ij}]$ with $a_{ij} = x_i^{j-1}$. It is a fact that
$$\det A = \prod_{\substack{i,j=1 \\ i>j}}^{n}(x_i - x_j) \qquad (0.9.11.2)$$
so a Vandermonde matrix is nonsingular if and only if the parameters $x_1, \dots, x_n$ are distinct.

If $x_1, \dots, x_n$ are distinct, the entries of the inverse $A^{-1} = [\alpha_{ij}]$ of the Vandermonde matrix (0.9.11.1) are
$$\alpha_{ij} = (-1)^{i-1}\frac{S_{n-i}(x_1, \dots, \hat{x}_j, \dots, x_n)}{\prod_{k \ne j}(x_k - x_j)}, \qquad i, j = 1, \dots, n$$
in which $S_0 \equiv 1$, and if $m > 0$, then $S_m(x_1, \dots, \hat{x}_j, \dots, x_n)$ is the $m$th elementary symmetric function of the $n - 1$ variables $x_k$, $k = 1, \dots, n$, $k \ne j$; see (1.2.14).

The Vandermonde matrix arises in the interpolation problem of finding a polynomial $p(x) = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1x + a_0$ of degree at most $n - 1$ with coefficients from $\mathbf{F}$ such that
$$\begin{aligned} p(x_1) &= a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_{n-1}x_1^{n-1} = y_1 \\ p(x_2) &= a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_{n-1}x_2^{n-1} = y_2 \\ &\ \ \vdots \\ p(x_n) &= a_0 + a_1x_n + a_2x_n^2 + \cdots + a_{n-1}x_n^{n-1} = y_n \end{aligned} \qquad (0.9.11.3)$$
in which $x_1, \dots, x_n$ and $y_1, \dots, y_n$ are given elements of $\mathbf{F}$. The interpolation conditions (0.9.11.3) are a system of $n$ equations for the $n$ unknown coefficients $a_0, \dots, a_{n-1}$, and they have the form $Aa = y$, in which $a = [a_0\ \dots\ a_{n-1}]^T \in \mathbf{F}^n$, $y = [y_1\ \dots\ y_n]^T \in \mathbf{F}^n$, and $A \in M_n(\mathbf{F})$ is the Vandermonde matrix (0.9.11.1). This interpolation problem always has a solution if the points $x_1, x_2, \dots, x_n$ are distinct, since $A$ is nonsingular in this event.

If the points $x_1, \dots, x_n$ are distinct, the coefficients of the interpolating polynomial could in principle be obtained by solving the system (0.9.11.3), but it is usually more useful to represent the interpolating polynomial $p(x)$ as a linear combination of the

Lagrange interpolating polynomials

$$L_i(x) = \frac{\prod_{j \ne i}(x - x_j)}{\prod_{j \ne i}(x_i - x_j)}, \qquad i = 1, \dots, n$$
Each polynomial $L_i(x)$ has degree $n - 1$ and has the property that $L_i(x_k) = 0$ if $k \ne i$, but $L_i(x_i) = 1$. Lagrange's interpolation formula

$$p(x) = y_1L_1(x) + \cdots + y_nL_n(x) \qquad (0.9.11.4)$$
provides a polynomial of degree at most $n - 1$ that satisfies the equations (0.9.11.3).
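Both descriptions of the interpolating polynomial (solving the Vandermonde system (0.9.11.3) and the Lagrange formula (0.9.11.4)) can be compared numerically. The sketch below is an added NumPy illustration; the points and values are arbitrary.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0])     # distinct interpolation points
y = np.array([1.0, 3.0, 2.0, 5.0])     # prescribed values y_i = p(x_i)
n = len(x)

# Vandermonde matrix (0.9.11.1): a_ij = x_i^(j-1)
A = np.vander(x, N=n, increasing=True)
coeffs = np.linalg.solve(A, y)          # a_0, ..., a_{n-1} from Aa = y

def lagrange_eval(t):
    # Lagrange form (0.9.11.4): p(t) = sum_i y_i L_i(t)
    total = 0.0
    for i in range(n):
        L = np.prod([(t - x[j]) / (x[i] - x[j]) for j in range(n) if j != i])
        total += y[i] * L
    return total

# Both representations give the same interpolating polynomial
t = 3.0
p_monomial = sum(coeffs[k] * t**k for k in range(n))
assert np.isclose(p_monomial, lagrange_eval(t))
assert np.allclose(A @ coeffs, y)
```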

0.9.12 Cauchy matrices. A Cauchy matrix $A \in M_n(\mathbf{F})$ is a matrix of the form $A = [(a_i + b_j)^{-1}]_{i,j=1}^{n}$, in which $a_1, \dots, a_n, b_1, \dots, b_n$ are scalars such that $a_i + b_j \ne 0$ for all $i, j = 1, \dots, n$. It is a fact that
$$\det A = \frac{\prod_{1 \le i < j \le n}(a_j - a_i)(b_j - b_i)}{\prod_{1 \le i,\,j \le n}(a_i + b_j)} \qquad (0.9.12.1)$$
so $A$ is nonsingular if and only if $a_i \ne a_j$ and $b_i \ne b_j$ for all $i \ne j$. A Hilbert matrix $H_n = [(i + j - 1)^{-1}]_{i,j=1}^{n}$ is a Cauchy matrix that is also a Hankel matrix. It is a fact that
$$\det H_n = \frac{\left(1!\,2!\cdots(n-1)!\right)^4}{1!\,2!\cdots(2n-1)!} \qquad (0.9.12.2)$$
so a Hilbert matrix is always nonsingular. The entries of its inverse $H_n^{-1} = [h_{ij}]_{i,j=1}^{n}$ are
$$h_{ij} = \frac{(-1)^{i+j}(n+i-1)!\,(n+j-1)!}{\left((i-1)!\,(j-1)!\right)^2(n-i)!\,(n-j)!\,(i+j-1)} \qquad (0.9.12.3)$$
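The determinant and inverse formulas (0.9.12.2) and (0.9.12.3) for the Hilbert matrix can be checked for a small $n$. The following sketch is an added illustration (Python with NumPy and math.factorial), not part of the text.

```python
import numpy as np
from math import factorial, prod

n = 4
H = np.array([[1.0 / (i + j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)])

# (0.9.12.2): det H_n = (1! 2! ... (n-1)!)^4 / (1! 2! ... (2n-1)!)
det_formula = (prod(factorial(k) for k in range(1, n)) ** 4
               / prod(factorial(k) for k in range(1, 2 * n)))
assert np.isclose(np.linalg.det(H), det_formula)

# (0.9.12.3): closed form for the entries of H_n^{-1}
H_inv = np.array([[(-1) ** (i + j) * factorial(n + i - 1) * factorial(n + j - 1)
                   / ((factorial(i - 1) * factorial(j - 1)) ** 2
                      * factorial(n - i) * factorial(n - j) * (i + j - 1))
                   for j in range(1, n + 1)] for i in range(1, n + 1)])
assert np.allclose(H_inv, np.linalg.inv(H))
```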

0.9.13 Involution, nilpotent, projection, coninvolution. A matrix $A \in M_n(\mathbf{F})$ is
• an involution if $A^2 = I$, that is, if $A = A^{-1}$ (the term involutory is also used)
• nilpotent if $A^k = 0$ for some positive integer $k$; the least such $k$ is the index of nilpotence of $A$
• a projection if $A^2 = A$ (the term idempotent is also used)
Now suppose that $\mathbf{F} = \mathbb{C}$. A matrix $A \in M_n$ is
• a Hermitian projection if $A^* = A$ and $A^2 = A$ (the term orthogonal projection is also used; see (4.1.P19))
• a coninvolution if $A\bar{A} = I$, that is, if $\bar{A} = A^{-1}$ (the term coninvolutory is also used)

0.10 Change of basis

Let $V$ be an $n$-dimensional vector space over the field $F$, and let the list $\mathcal{B}_1 = v_1, v_2, \ldots, v_n$ be a basis for $V$. Any vector $x \in V$ can be represented as $x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$ because $\mathcal{B}_1$ spans $V$. If there were some other representation of $x = \beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_n v_n$ in the same basis, then

$$0 = x - x = (\alpha_1 - \beta_1)v_1 + (\alpha_2 - \beta_2)v_2 + \cdots + (\alpha_n - \beta_n)v_n$$

from which it follows that all $\alpha_i - \beta_i = 0$ because the list $\mathcal{B}_1$ is independent. Given the basis $\mathcal{B}_1$, the linear mapping

$$x \to [x]_{\mathcal{B}_1} = \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}, \quad \text{in which } x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$$

from $V$ to $F^n$ is well-defined, one-to-one, and onto. The scalars $\alpha_i$ are the coordinates of $x$ with respect to the basis $\mathcal{B}_1$, and the column vector $[x]_{\mathcal{B}_1}$ is the unique $\mathcal{B}_1$-coordinate representation of $x$.
Let $T: V \to V$ be a given linear transformation. The action of $T$ on any $x \in V$ is determined once one knows the $n$ vectors $Tv_1, Tv_2, \ldots, Tv_n$, because any $x \in V$ has a unique representation $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$ and $Tx = T(\alpha_1 v_1 + \cdots + \alpha_n v_n) = T(\alpha_1 v_1) + \cdots + T(\alpha_n v_n) = \alpha_1 T v_1 + \cdots + \alpha_n T v_n$ by linearity. Thus, the value of $Tx$ is determined once $[x]_{\mathcal{B}_1}$ is known.
Let $\mathcal{B}_2 = \{w_1, w_2, \ldots, w_n\}$ also be a basis for $V$ (either different from or the same as $\mathcal{B}_1$) and suppose that the $\mathcal{B}_2$-coordinate representation of $Tv_j$ is

$$[Tv_j]_{\mathcal{B}_2} = \begin{bmatrix} t_{1j} \\ \vdots \\ t_{nj} \end{bmatrix}, \qquad j = 1, 2, \ldots, n$$

Then, for any $x \in V$, we have

$$[Tx]_{\mathcal{B}_2} = \Bigl[\,\sum_{j=1}^{n} \alpha_j T v_j\Bigr]_{\mathcal{B}_2} = \sum_{j=1}^{n} \alpha_j \,[T v_j]_{\mathcal{B}_2} = \sum_{j=1}^{n} \alpha_j \begin{bmatrix} t_{1j} \\ \vdots \\ t_{nj} \end{bmatrix} = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & \ddots & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$

The $n$-by-$n$ array $[t_{ij}]$ depends on $T$ and on the choice of the bases $\mathcal{B}_1$ and $\mathcal{B}_2$, but it does not depend on $x$. We define the $\mathcal{B}_1$-$\mathcal{B}_2$ basis representation of $T$ to be

$$_{\mathcal{B}_2}[T]_{\mathcal{B}_1} = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & \ddots & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix} = \bigl[\,[Tv_1]_{\mathcal{B}_2} \;\ldots\; [Tv_n]_{\mathcal{B}_2}\,\bigr]$$

We have just shown that $[Tx]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[T]_{\mathcal{B}_1}[x]_{\mathcal{B}_1}$ for any $x \in V$. In the important special case $\mathcal{B}_2 = \mathcal{B}_1$, we have ${}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}$, which is called the $\mathcal{B}_1$ basis representation of $T$.
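To make the definition concrete, here is a minimal Python sketch (not from the text; it assumes $V = \mathbb{R}^3$, that every vector is written in the standard basis, and that the bases and the matrix of $T$ are arbitrary illustrative choices). It builds ${}_{\mathcal{B}_2}[T]_{\mathcal{B}_1}$ column by column and checks that $[Tx]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[T]_{\mathcal{B}_1}[x]_{\mathcal{B}_1}$:

    import numpy as np

    # Columns of V1 are the basis B_1 = {v_1, v_2, v_3}; columns of W2 are B_2 = {w_1, w_2, w_3}.
    V1 = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])
    W2 = np.array([[1., 0., 0.], [1., 1., 0.], [0., 1., 1.]])
    M  = np.array([[2., 1., 0.], [0., 3., 1.], [0., 0., 1.]])   # T acts as x -> M x

    def coords(basis, y):
        # B-coordinate representation [y]_B: solve basis @ c = y for c
        return np.linalg.solve(basis, y)

    # Columns of T_21 are [T v_j]_{B_2}, so T_21 is the B_1-B_2 basis representation of T
    T_21 = np.column_stack([coords(W2, M @ V1[:, j]) for j in range(3)])

    x = np.array([1., -2., 5.])
    print(np.allclose(coords(W2, M @ x), T_21 @ coords(V1, x)))   # expected: True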

Consider the identity linear transformation $I: V \to V$ defined by $Ix = x$ for all $x$. Then

$$[x]_{\mathcal{B}_2} = [Ix]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}[x]_{\mathcal{B}_1} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}[Ix]_{\mathcal{B}_1} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}[x]_{\mathcal{B}_2}$$

for all $x \in V$. By successively choosing $x = w_1, w_2, \ldots, w_n$, this identity permits us to identify each column of ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$ and shows that

$$_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[I]_{\mathcal{B}_2} = I_n$$

If we do the same calculation starting with $[x]_{\mathcal{B}_1} = [Ix]_{\mathcal{B}_1} = \cdots$, we find that

$$_{\mathcal{B}_1}[I]_{\mathcal{B}_2}\,{}_{\mathcal{B}_2}[I]_{\mathcal{B}_1} = I_n$$

Thus, every matrix of the form ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ is invertible and ${}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$ is its inverse. Conversely, every invertible $S = [s_1 \; s_2 \; \ldots \; s_n] \in M_n(F)$ has the form ${}_{\mathcal{B}_1}[I]_{\mathcal{B}}$ for some basis $\mathcal{B}$. We may take $\mathcal{B}$ to be the list $\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n$ of vectors defined by $[\tilde{s}_i]_{\mathcal{B}_1} = s_i$, $i = 1, 2, \ldots, n$. The list $\mathcal{B}$ is independent because $S$ is invertible. Notice that

$$_{\mathcal{B}_2}[I]_{\mathcal{B}_1} = \bigl[\,[Iv_1]_{\mathcal{B}_2} \;\ldots\; [Iv_n]_{\mathcal{B}_2}\,\bigr] = \bigl[\,[v_1]_{\mathcal{B}_2} \;\ldots\; [v_n]_{\mathcal{B}_2}\,\bigr]$$

so ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ describes how the elements of the basis $\mathcal{B}_1$ are formed from elements of the basis $\mathcal{B}_2$. Now let $x \in V$ and compute

$$\begin{aligned} {}_{\mathcal{B}_2}[T]_{\mathcal{B}_2}[x]_{\mathcal{B}_2} &= [Tx]_{\mathcal{B}_2} = [I(Tx)]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}[Tx]_{\mathcal{B}_1} \\ &= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}[x]_{\mathcal{B}_1} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}[Ix]_{\mathcal{B}_1} \\ &= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}[x]_{\mathcal{B}_2} \end{aligned}$$

By choosing $x = w_1, w_2, \ldots, w_n$ successively, we conclude that

$$_{\mathcal{B}_2}[T]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}\,{}_{\mathcal{B}_1}[I]_{\mathcal{B}_2} \tag{0.10.1.1}$$

This identity shows how the $\mathcal{B}_1$ basis representation of $T$ changes if the basis is changed to $\mathcal{B}_2$. For this reason, the matrix ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ is called the $\mathcal{B}_1 - \mathcal{B}_2$ change of basis matrix. Any matrix $A \in M_n(F)$ is a basis representation of some linear transformation $T: V \to V$, for if $\mathcal{B}$ is any basis of $V$, we can determine $Tx$ by $[Tx]_{\mathcal{B}} = A[x]_{\mathcal{B}}$. For this $T$, a computation reveals that ${}_{\mathcal{B}}[T]_{\mathcal{B}} = A$.
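A corresponding numerical check of (0.10.1.1) (again a sketch with illustrative data, assuming NumPy and $V = \mathbb{R}^3$ with vectors written in the standard basis):

    import numpy as np

    V1 = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])   # columns form the basis B_1
    W2 = np.array([[1., 0., 0.], [1., 1., 0.], [0., 1., 1.]])   # columns form the basis B_2
    M  = np.array([[2., 1., 0.], [0., 3., 1.], [0., 0., 1.]])   # T acts as x -> M x

    change = lambda Bfrom, Bto: np.linalg.solve(Bto, Bfrom)     # the change of basis matrix _Bto[I]_Bfrom
    T11 = np.linalg.solve(V1, M @ V1)                           # _B1[T]_B1
    T22 = np.linalg.solve(W2, M @ W2)                           # _B2[T]_B2

    # (0.10.1.1): _B2[T]_B2 = _B2[I]_B1  _B1[T]_B1  _B1[I]_B2
    print(np.allclose(T22, change(V1, W2) @ T11 @ change(W2, V1)))   # expected: True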

0.11 Equivalence relations

Let $S$ be a given set and let $\Delta$ be a given subset of $S \times S = \{(a, b) : a \in S \text{ and } b \in S\}$. Then $\Delta$ defines a relation on $S$ in the following way: we say that $a$ is related to $b$, written $a \sim b$, if $(a, b) \in \Delta$. A relation on $S$ is said to be an equivalence relation if it is (a) reflexive ($a \sim a$ for every $a \in S$), (b) symmetric ($a \sim b$ whenever $b \sim a$), and (c) transitive ($a \sim c$ whenever $a \sim b$ and $b \sim c$). An equivalence relation on $S$ gives a disjoint partition of $S$ in a natural way: if we define the equivalence class of any $a \in S$ by $S_a = \{b \in S : b \sim a\}$, then $S = \cup_{a \in S} S_a$, and for each $a, b \in S$, either $S_a = S_b$ (if $a \sim b$) or $S_a \cap S_b = \emptyset$ (if $a \not\sim b$). Conversely, any disjoint partition of $S$ can be used to define an equivalence relation on $S$.
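As a generic illustration of how an equivalence relation partitions a set into equivalence classes (a Python sketch, not from the text; the set and the relation are arbitrary choices), take $S = \{0, 1, \ldots, 9\}$ and let $a \sim b$ when $a$ and $b$ have the same remainder modulo 3:

    # a ~ b iff a and b leave the same remainder mod 3 (reflexive, symmetric, transitive)
    S = range(10)
    related = lambda a, b: a % 3 == b % 3

    classes = {}
    for a in S:
        # find an existing representative related to a, or let a start a new class
        rep = next(c for c in list(classes) + [a] if related(c, a))
        classes.setdefault(rep, set()).add(a)

    print(classes)   # {0: {0, 3, 6, 9}, 1: {1, 4, 7}, 2: {2, 5, 8}}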

The following table lists several equivalence relations that arise in matrix analysis. The factors D1, D2, S, T , L, and R are square and nonsingular; U and V are unitary; L is lower triangular; R is upper triangular; D1 and D2 are diagonal; and A and B need not be square for equivalence, unitary equivalence, triangular equivalence, or diagonal equivalence.

Equivalence Relation          A ∼ B
congruence                    A = S B S^T
unitary congruence            A = U B U^T
*congruence                   A = S B S^*
consimilarity                 A = S B̄ S^{-1}
equivalence                   A = S B T
unitary equivalence           A = U B V
diagonal equivalence          A = D_1 B D_2
similarity                    A = S B S^{-1}
unitary similarity            A = U B U^*
triangular equivalence        A = L B R

Whenever an interesting equivalence relation arises in matrix analysis, it can be useful to identify a set of distinguished representatives of the equivalence classes (a canonical form or normal form for the equivalence relation). Alternatively, we often want to have effective criteria (invariants) that can be used to decide whether two given matrices belong to the same equivalence class.
Abstractly, a canonical form for an equivalence relation $\sim$ on a set $S$ is a subset $C$ of $S$ such that $S = \cup_{a \in C} S_a$ and $S_a \cap S_b = \emptyset$ whenever $a, b \in C$ and $a \neq b$; the canonical form of an element $a \in S$ is the unique element $c \in C$ such that $a \in S_c$. For a given equivalence relation in matrix analysis, it is important to make an artful and simple choice of canonical form, and one sometimes does this in more than one way to tailor the canonical form to a specific purpose. For example, the Jordan and Weyr canonical forms are different canonical forms for similarity; the Jordan canonical form works well in problems involving powers of matrices, while the Weyr canonical form works well in problems involving commutativity.
An invariant for an equivalence relation $\sim$ on $S$ is a function $f$ on $S$ such that $f(a) = f(b)$ whenever $a \sim b$. A family $\mathcal{F}$ of invariants for an equivalence relation $\sim$ on $S$ is said to be complete if $f(a) = f(b)$ for all $f \in \mathcal{F}$ if and only if $a \sim b$; a complete family of invariants is often called a complete system of invariants. For example, the singular values of a matrix are a complete system of invariants for unitary equivalence.
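The last statement is easy to test numerically. The sketch below (assuming NumPy; the matrices are random and merely illustrative) forms $B = UAV$ with real orthogonal, hence unitary, factors $U$ and $V$ and checks that $A$ and $B$ have the same singular values:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))

    # Real orthogonal (hence unitary) factors U and V from QR factorizations
    U, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    B = U @ A @ V                                   # B is unitarily equivalent to A

    print(np.allclose(np.linalg.svd(A, compute_uv=False),
                      np.linalg.svd(B, compute_uv=False)))   # expected: True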